Welcome to the Settings — Knowledge Base Details Page!

Use this page to decide what web content the application should learn from, and to start and monitor ingestion. Clear inputs here lead to better, more reliable answers for your users.

Tip: Complete this page before turning the app ON. You can adjust these settings again for future crawls.

First part of the page

What this page does

  1. Define content to ingest: You provide the web addresses (URLs) that make up your knowledge base.
  2. Start & monitor ingestion: You run a crawl and check its live status from here.

Which URLs would you like to include?

Enter one URL per line. Each URL tells the application where to gather information. You can also label each URL with a product name and version, separated by a pipe character (|); the labels help organize answers.

Format

https://docs.solverox.com/TSSAIAgent/|TSSAIAgent|V3.1
https://docs.solverox.com/All-In-1Filter/|All-In-1Filter|V1.0

The first part is the URL to ingest. The second and third parts (optional) indicate the related product and version. If your URL is not specific to a product, you can leave labels empty:

https://docs.solverox.com/TSSAIAgent/

Tip: All URLs must end with a trailing / (slash), regardless of how they appear on the web.
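
The entry format described above can be checked before you paste a list into the field. The following is a minimal Python sketch, not the application's own parser: the names UrlEntry, parse_url_entry, and parse_url_list are hypothetical, and the rules it enforces (pipe-separated labels, trailing slash) are taken only from the description on this page.

```python
from typing import NamedTuple, Optional


class UrlEntry(NamedTuple):
    url: str
    product: Optional[str]
    version: Optional[str]


def parse_url_entry(line: str) -> UrlEntry:
    """Parse one 'URL|product|version' line; the two labels are optional."""
    parts = [part.strip() for part in line.strip().split("|")]
    if len(parts) > 3:
        raise ValueError(f"too many '|' separators: {line!r}")
    url = parts[0]
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"not a web address: {url!r}")
    if not url.endswith("/"):
        raise ValueError(f"URL must end with a trailing slash: {url!r}")
    # Pad missing labels, and treat empty labels as absent.
    product, version = (parts[1:] + ["", ""])[:2]
    return UrlEntry(url, product or None, version or None)


def parse_url_list(text: str) -> list[UrlEntry]:
    """Parse a multi-line field, skipping blank lines."""
    return [parse_url_entry(line) for line in text.splitlines() if line.strip()]
```

Running parse_url_list over your draft list raises a ValueError for the first line that is missing its trailing slash or has too many separators, so problems surface before a crawl starts.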

Encrypt knowledge base artifacts with my own AWS KMS key

You can protect your information (content chunks and any JQL queries that might include customer names) with your own encryption key. If you enable this option, provide the ARN of your AWS KMS key so the application can encrypt with that key.

Tip: Your data is encrypted even if you do not use your own key. For the strongest control (so only you can decrypt), select this option and supply your AWS KMS key.
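
Before pasting an ARN, it can help to confirm that it has the general shape of an AWS KMS key ARN (arn:partition:kms:region:account-id:key/key-id). The sketch below is a shape check only, written in Python. The function name looks_like_kms_key_arn is hypothetical, the check does not contact AWS, and this page does not say whether the application also accepts other ARN types such as aliases.

```python
import re

# General shape of a KMS key ARN; this does not prove the key exists
# or that the application has permission to use it.
KMS_KEY_ARN = re.compile(
    r"arn:(aws|aws-cn|aws-us-gov):kms:"  # partition and service
    r"[a-z0-9-]+:"                       # region, e.g. us-west-2
    r"\d{12}:"                           # 12-digit account ID
    r"key/[A-Za-z0-9-]+"                 # key ID
)


def looks_like_kms_key_arn(value: str) -> bool:
    """Return True when the value has the shape of a KMS key ARN."""
    return KMS_KEY_ARN.fullmatch(value.strip()) is not None
```

A value that fails this check is almost certainly a typo or a bare key ID rather than a full ARN; a value that passes still needs the correct key policy on the AWS side.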

Scope controls (keep ingestion focused)

Include information only under the base domains above

Limits ingestion to the same domain. For example, if you include https://docs.solverox.com/TSSAIAgent/, links pointing outside docs.solverox.com will be ignored.

Include information only under the same URL paths

Limits ingestion to the same path. With https://docs.solverox.com/TSSAIAgent/, any link outside that path is ignored.
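
The two scope options above can be approximated with a pair of URL checks. This is a minimal Python sketch under stated assumptions: same_domain and same_path are hypothetical names, "same domain" is taken to mean an exact host match (this page does not say how subdomains are treated), and "same path" is taken to mean a prefix match on the path of the included URL.

```python
from urllib.parse import urlsplit


def same_domain(base_url: str, link: str) -> bool:
    """True when the link's host matches the base URL's host."""
    base_host = urlsplit(base_url).hostname
    link_host = urlsplit(link).hostname
    return base_host is not None and base_host == link_host


def same_path(base_url: str, link: str) -> bool:
    """True when the link is on the same host and under the base URL's path."""
    base_path = urlsplit(base_url).path
    # The base URL ends with "/", so a prefix match cannot confuse
    # /TSSAIAgent/ with /TSSAIAgentOther/.
    return same_domain(base_url, link) and urlsplit(link).path.startswith(base_path)
```

With https://docs.solverox.com/TSSAIAgent/ as the base, a link to https://docs.solverox.com/All-In-1Filter/ passes same_domain but fails same_path, which mirrors the difference between the two options.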

Tip: These two options prevent off-topic pages from entering your knowledge base and improve answer quality.

Second part of the page

Which URLs would you like to exclude?

Provide specific URLs you do not want ingested. If the crawler encounters them, those pages are skipped.

https://docs.solverox.com/TSSAIAgent/settings/

Which languages would you like to include?

If your site uses language codes in URLs and you want only selected languages ingested, list them here.

Tip: The application supports multiple languages, but the most consistent results are typically achieved in English.

Which keywords would you like to exclude?

Use this field to filter out content that matches regular expressions (regex). Separate each expression with the three-character sequence [;].

\bCopyright\b[;]\bAll rights reserved\b[;]\bBranding:\s*Solverox\b

Tip: Cleaner input produces better answers. Exclude repeated boilerplate such as footers, cookie notices, or branding blocks.
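
You can try your expressions locally before saving them. The Python sketch below splits a field value on the [;] separator and removes every match from a sample text. The helper names compile_exclusions and strip_excluded are hypothetical, and deleting the matched text is an assumption: this page does not state exactly what the application does with a match, and its regex dialect may differ from Python's.

```python
import re


def compile_exclusions(field: str) -> list[re.Pattern]:
    """Split the field on the literal '[;]' separator and compile each part."""
    return [re.compile(expr) for expr in field.split("[;]") if expr.strip()]


def strip_excluded(text: str, patterns: list[re.Pattern]) -> str:
    """Delete every match of every pattern from the text."""
    for pattern in patterns:
        text = pattern.sub("", text)
    return text


field = r"\bCopyright\b[;]\bAll rights reserved\b[;]\bBranding:\s*Solverox\b"
patterns = compile_exclusions(field)
sample = "Install guide. Copyright 2024. All rights reserved."
cleaned = strip_excluded(sample, patterns)
```

Because [;] is reserved as the separator, an expression cannot itself contain that exact sequence; write a semicolon character class another way (for example \x3b) if you need one.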

Should we include the text on your images?

If enabled, the application will extract text from images (OCR) and add it to your knowledge base.

Tip: Use OCR only when images contain meaningful text (e.g., a screenshot of a paragraph). OCR increases ingestion time and is usually unnecessary if your images already have good metadata.

The depth from the URL provided

Choose how many link levels to follow from the starting URL. Example for https://docs.solverox.com/TSSAIAgent/ with depth = 1:

https://docs.solverox.com/TSSAIAgent/something.html                → OK
https://docs.solverox.com/TSSAIAgent/dept1/something1.html         → OK
https://docs.solverox.com/TSSAIAgent/dept1/something2.html         → OK
https://docs.solverox.com/TSSAIAgent/dept1/depth2/something.html   → NOT OK

Tip: Start shallow (e.g., 1) to validate results quickly, then increase if needed.
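
The example above can be reproduced with a small depth calculation. This Python sketch follows the example literally and counts directory levels below the starting URL; path_depth and within_depth are hypothetical names, and the application itself may count link hops rather than path segments.

```python
from urllib.parse import urlsplit


def path_depth(start_url: str, page_url: str) -> int:
    """Count directory levels between the starting URL and the page."""
    start_path = urlsplit(start_url).path
    page_path = urlsplit(page_url).path
    if not page_path.startswith(start_path):
        raise ValueError(f"{page_url!r} is not under {start_url!r}")
    relative = page_path[len(start_path):]
    # Every "/" left in the relative path is one extra directory level.
    return relative.count("/")


def within_depth(start_url: str, page_url: str, max_depth: int) -> bool:
    """True when the page sits no deeper than max_depth levels."""
    return path_depth(start_url, page_url) <= max_depth
```

Under these assumptions, a page directly under the starting URL has depth 0, a page one folder down has depth 1, and a page two folders down has depth 2, so it is rejected when the depth setting is 1.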

How many pages should be included?

Set a maximum number of pages to ingest per run (e.g., 100). This caps the volume and helps you stay within your plan.

Tip: Different packages have different ingestion limits. These settings apply to the current run only; you can change them for future runs.

Save Settings

Click Save Settings to store all choices on this page before starting a crawl.

Third part of the page

Extraction Controls

Tip: When ingestion finishes, you’ll see Crawled URLs and Removed URLs populated, and the status returns to idle.

Why these settings are helpful

Expert tips

Tip: Treat crawls as repeatable jobs. Adjust scope, run again, and review the delta to keep your knowledge current.

When you’re satisfied with the setup, proceed to Knowledge Base Sources export to review coverage, then enable the app via the App Status master switch.