Welcome to the Settings — Knowledge Base Details Page!
Use this page to choose the web content the application learns from, and to start and monitor ingestion. Clear inputs here lead to better, more reliable answers for your users.
What this page does
- Define content to ingest: You provide the web addresses (URLs) that make up your knowledge base.
- Start & monitor ingestion: You run a crawl and check its live status from here.
Which URLs would you like to include?
Enter one URL per line. Each URL tells the application where to gather information. You may optionally label each URL with a product name and version (this helps organize answers).
Format
https://docs.solverox.com/TSSAIAgent/|TSSAIAgent|V3.1
https://docs.solverox.com/All-In-1Filter/|All-In-1Filter|V1.0
Separate the parts with a vertical bar (|). The first part is the URL to ingest; the optional second and third parts give the related product and version. If a URL is not specific to a product, leave the labels out:
https://docs.solverox.com/TSSAIAgent/
/ (slash), regardless of how they appear on the web.
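A minimal Python sketch of how a line in the entry format described above could be split into its parts, assuming the vertical bar (|) is the separator and that an omitted or empty label means "no label". `SourceEntry` and `parse_sources` are hypothetical names for illustration, not part of the application.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceEntry:
    """One knowledge base source: a URL plus optional product and version labels."""
    url: str
    product: Optional[str] = None
    version: Optional[str] = None


def parse_sources(text: str) -> list[SourceEntry]:
    """Parse one entry per line in the form URL|product|version (labels optional)."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split into at most three parts: URL, product, version.
        parts = [p.strip() for p in line.split("|", 2)]
        product = parts[1] if len(parts) > 1 and parts[1] else None
        version = parts[2] if len(parts) > 2 and parts[2] else None
        entries.append(SourceEntry(parts[0], product, version))
    return entries
```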
Encrypt knowledge base artifacts with my own AWS KMS key
You can protect your information (content chunks and any JQL queries that might include customer names) with your own encryption key. If you enable this option, provide the ARN of your AWS KMS key so the application can encrypt these artifacts with it.
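Before enabling encryption, you may want to sanity-check the value you paste. The sketch below checks only the general shape of a KMS key ARN (`arn:<partition>:kms:<region>:<account ID>:key/<key ID>`). It assumes the application expects a key ARN rather than an alias ARN, and it cannot tell whether the application is actually permitted to use the key. `looks_like_kms_key_arn` is a hypothetical helper.

```python
import re

# Hypothetical format check for a KMS key ARN:
# arn:<partition>:kms:<region>:<12-digit account ID>:key/<key ID>
KMS_KEY_ARN = re.compile(
    r"arn:aws(?:-cn|-us-gov)?:kms:[a-z0-9-]+:\d{12}:key/[A-Za-z0-9-]+"
)


def looks_like_kms_key_arn(value: str) -> bool:
    """True if value has the shape of a key ARN (alias ARNs and bare key IDs are rejected)."""
    return KMS_KEY_ARN.fullmatch(value.strip()) is not None


print(looks_like_kms_key_arn(
    "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
))  # True
```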
Scope controls (keep ingestion focused)
Include information only under the base domains above
Limits ingestion to the domains of the URLs entered above. For example, if you include https://docs.solverox.com/TSSAIAgent/, links pointing outside docs.solverox.com are ignored.
Include information only under the same URL paths
Limits ingestion to the paths of the URLs entered above. With https://docs.solverox.com/TSSAIAgent/, any link outside the /TSSAIAgent/ path (such as https://docs.solverox.com/All-In-1Filter/) is ignored.
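The two scope options can be read as simple URL filters. This is a hypothetical sketch (the `in_scope` function and its flags are illustrative, not the application's code): it compares the full host name for the domain check and uses a path-prefix match for the path check, assuming the starting URL ends with a slash as in the examples on this page.

```python
from urllib.parse import urlsplit


def in_scope(link: str, base: str, same_domain: bool, same_path: bool) -> bool:
    """Hypothetical filter mirroring the two scope checkboxes."""
    link_parts, base_parts = urlsplit(link), urlsplit(base)
    same_host = link_parts.hostname == base_parts.hostname
    if same_domain and not same_host:
        return False
    # The path check also requires the same host: a matching path on another site is still outside.
    # Prefix matching assumes the base path ends with "/" so /TSSAIAgent/ cannot match /TSSAIAgent2/.
    if same_path and not (same_host and link_parts.path.startswith(base_parts.path)):
        return False
    return True
```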
Which URLs would you like to exclude?
Provide specific URLs you do not want ingested. If the crawler encounters them, those pages are skipped. For example:
https://docs.solverox.com/TSSAIAgent/settings/
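A sketch of how an exclusion list might be applied, assuming one URL per line. The page does not say whether pages below an excluded URL are also skipped, so the hypothetical `should_skip` helper supports both readings: exact match by default, and prefix match when `prefix=True`.

```python
from __future__ import annotations


def parse_exclusions(field: str) -> set[str]:
    """Hypothetical parser: one URL per line; blank lines are ignored."""
    return {line.strip() for line in field.splitlines() if line.strip()}


def should_skip(url: str, excluded: set[str], prefix: bool = False) -> bool:
    """Exact match by default; prefix=True also skips pages below an excluded URL."""
    if prefix:
        return any(url.startswith(e) for e in excluded)
    return url in excluded
```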
Which languages would you like to include?
If your site uses language codes in URLs and you want only selected languages ingested, list them here.
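A hypothetical sketch of language filtering, assuming language codes appear as their own path segment (such as `/en/` or `/de-DE/`) and that pages without any language code are kept. `language_allowed` and the URLs under docs.example.com are illustrative only.

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit

# Hypothetical rule: a path segment such as "en" or "en-us" is treated as a language code.
LANG_SEGMENT = re.compile(r"[a-z]{2}(?:-[a-z]{2})?")


def language_allowed(url: str, allowed: list[str]) -> bool:
    """Keep the page if it has no language code, or if one of its codes is allowed."""
    allowed_codes = {code.lower() for code in allowed}
    segments = [s.lower() for s in urlsplit(url).path.split("/") if s]
    codes = [s for s in segments if LANG_SEGMENT.fullmatch(s)]
    if not codes:
        return True  # assumption: pages without a language code are kept
    return any(code in allowed_codes for code in codes)
```

A heuristic like this can misfire: any two-letter folder name (for example `/ui/`) would be treated as a language code.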
Which keywords would you like to exclude?
Use this field to filter out content that matches regular expressions (regex). Separate the expressions with the three-character sequence [;]. For example:
\bCopyright\b[;]\bAll rights reserved\b[;]\bBranding:\s*Solverox\b
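The separator is a literal text sequence, not a regex, so the field can be split with a plain string split before each piece is compiled. This sketch assumes matches are removed from the page text (rather than causing the whole page to be skipped); `compile_exclusions` and `strip_excluded` are hypothetical names.

```python
from __future__ import annotations

import re

SEPARATOR = "[;]"  # literal three-character separator used by the field


def compile_exclusions(field: str) -> list[re.Pattern[str]]:
    """Split the field on the literal separator and compile each non-empty expression."""
    return [re.compile(part) for part in field.split(SEPARATOR) if part]


def strip_excluded(text: str, patterns: list[re.Pattern[str]]) -> str:
    """Remove every match of every pattern from the text (assumed behavior)."""
    for pattern in patterns:
        text = pattern.sub("", text)
    return text
```

Because this sketch splits on the literal `[;]`, an expression cannot itself contain the character class `[;]`; write a plain `;` to match a semicolon.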
Should we include the text on your images?
If enabled, the application will extract text from images (OCR) and add it to your knowledge base.
The depth from the URL provided
Choose how many levels below the starting URL to follow; each subfolder beneath the starting path counts as one level. For example, with https://docs.solverox.com/TSSAIAgent/ and depth = 1:
https://docs.solverox.com/TSSAIAgent/something.html → OK
https://docs.solverox.com/TSSAIAgent/dept1/something1.html → OK
https://docs.solverox.com/TSSAIAgent/dept1/something2.html → OK
https://docs.solverox.com/TSSAIAgent/dept1/depth2/something.html → NOT OK
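The depth example above can be reproduced by counting the subfolders between the starting path and each page. This sketch assumes depth is measured in path levels, which is how the example reads; if the crawler instead counted link hops, the result would depend on how pages link to each other. `folder_depth` and `within_depth` are hypothetical helpers.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import urlsplit


def folder_depth(url: str, start: str) -> Optional[int]:
    """Count subfolders between the starting URL's path and the page; None if outside it."""
    page, base = urlsplit(url), urlsplit(start)
    if page.hostname != base.hostname or not page.path.startswith(base.path):
        return None
    remainder = page.path[len(base.path):]
    return remainder.count("/")  # each "/" left in the remainder is one subfolder level


def within_depth(url: str, start: str, max_depth: int) -> bool:
    depth = folder_depth(url, start)
    return depth is not None and depth <= max_depth
```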
How many pages should be included?
Set a maximum number of pages to ingest per run (e.g., 100). This caps the volume and helps you stay within your plan.
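To illustrate how a page cap bounds a run, the sketch below walks a small in-memory link map breadth-first and stops once `max_pages` pages have been collected. The breadth-first order and the `crawl_order` function are assumptions; the page does not say which pages are kept when the cap is reached.

```python
from __future__ import annotations

from collections import deque


def crawl_order(start: str, links: dict[str, list[str]], max_pages: int) -> list[str]:
    """Breadth-first visit of a hypothetical link map, stopping after max_pages pages."""
    seen = {start}
    queue = deque([start])
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        for nxt in links.get(page, []):
            if nxt not in seen:  # enqueue each page at most once
                seen.add(nxt)
                queue.append(nxt)
    return visited
```

For example, `crawl_order("a", {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"]}, 3)` returns `["a", "b", "c"]`.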
Save Settings
Click Save Settings to store all choices on this page before starting a crawl.
Extraction Controls
- Run Crawl: Starts the extraction. When the crawl finishes, content matching your settings is ingested.
- Check Status: Shows the latest crawl status.
- Latest Crawl Status: Displays real-time progress while a crawl is running.
Why these settings are helpful
- Precision: Include only the content you trust; exclude noise and boilerplate.
- Compliance: Keep scope aligned with domains/paths and languages you approve.
- Security: Use your own encryption key to ensure only you (and the app) can read sensitive artifacts.
- Efficiency: Control depth and page limits to manage cost, time, and relevance.
Expert tips
- Ingestion-only model: You cannot delete an individual URL from the knowledge base; removing content requires a full re-scrape.
- Refreshing a source: If a specific area changed (e.g., https://docs.solverox.com/TSSAIAgent/dept1/), run a targeted crawl (e.g., depth = 0, same-path enabled) to revisit those pages. The app will show which URLs were added or removed.
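The added/removed report from a refresh can be thought of as a set difference between the URLs of two runs. A minimal sketch, assuming each run yields a list of page URLs; `diff_runs` is a hypothetical helper, not the application's report format.

```python
from __future__ import annotations


def diff_runs(previous: list[str], current: list[str]) -> tuple[list[str], list[str]]:
    """Return (added, removed) URLs between two crawl runs, each sorted."""
    before, after = set(previous), set(current)
    return sorted(after - before), sorted(before - after)
```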
When you’re satisfied with the setup, proceed to the Knowledge Base Sources export to review coverage, then enable the app via the App Status master switch.