Knowledge Pipeline

The Knowledge Base is the foundation of accurate AI responses. Konrado.AI crawls, indexes, and searches your content to provide contextually relevant answers to customer questions.

How Content Gets Indexed

1. Source Discovery

You provide content sources in the Knowledge Base settings:

Sitemaps — XML sitemaps that point to all pages on your website
Individual URLs — Specific pages you want indexed (documentation, FAQ, help articles)
Ticket History — Past support tickets and their resolutions
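
As an illustration of how a sitemap turns into a list of pages to index, the sketch below reads the page URLs out of a standard XML sitemap using Python's standard library. It is not Konrado.AI's actual code; the function name `urls_from_sitemap` and the example.com URLs are made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return the page URLs listed in a sitemap document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Illustrative sitemap content; the URLs are placeholders.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/getting-started</loc></url>
  <url><loc>https://example.com/faq</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in urls_from_sitemap(EXAMPLE_SITEMAP):
        print(url)
```

Every URL found this way would be treated like an individually added URL in the next step.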

2. Web Crawling

Konrado.AI's crawler fetches the content from each URL:

Extracts meaningful text content from HTML pages
Strips navigation, footers, and other non-content elements
Handles JavaScript-rendered pages
Respects rate limits and robots.txt

If your site uses Cloudflare, a WAF, or bot protection, you may need to whitelist the Konrado.AI crawler. Check the Knowledge Base page for whitelisting instructions.
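
To make the crawling behavior more concrete, here is a minimal sketch of two of the steps above: checking robots.txt before fetching, and extracting page text while skipping navigation and footers. It uses only Python's standard library and is an assumption about how such a crawler could work, not a description of Konrado.AI's implementation. The user agent `ExampleCrawler`, the list of skipped tags, and the helper names are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Tags whose contents are treated as non-content (an assumed list).
SKIPPED_TAGS = {"nav", "footer", "header", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collects text that is not nested inside any skipped tag."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def extract_text(html: str) -> str:
    """Return the page text with navigation, footers, and scripts removed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules that were already downloaded."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)
```

A crawler built this way would call `is_allowed` before each request and pass the fetched HTML to `extract_text`. Pages rendered with JavaScript would additionally need a headless browser, which this sketch leaves out.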

3. Chunking and Embedding

The extracted content is processed for semantic search:

Chunking — Documents are split into meaningful segments that preserve context
Embedding — Each chunk is converted into a vector representation using AI embedding models
Indexing — Vectors are stored in a PostgreSQL database with pgvector for fast similarity search
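
The chunking step can be pictured with the sketch below, which splits text into word-based segments that overlap slightly so that context carries across chunk boundaries. Overlapping windows are one common chunking strategy and are assumed here for illustration; the actual strategy, the function name `chunk_text`, and the sizes shown are not taken from Konrado.AI.

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of up to max_words words.

    Consecutive chunks share `overlap` words (assumed values, for illustration).
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")

    words = text.split()
    chunks: list[str] = []
    step = max_words - overlap  # how far the window advances each time

    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the current chunk already reaches the end of the text
    return chunks
```

Each resulting chunk would then be sent to an embedding model and its vector stored alongside the chunk text. The embedding call is omitted because it depends on the model provider.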

4. Automatic Refresh

Your web content is automatically refreshed once per day during off-peak hours, so changes to your documentation and support articles typically reach the AI within a day.

You can also trigger a manual refresh from the Knowledge Base dashboard at any time.

How Search Works

When the AI needs to answer a customer question, it performs semantic search:

The customer's question is converted to a vector embedding
The system finds the most similar content chunks using vector similarity (cosine distance)
The top-matching chunks are ranked by relevance
The most relevant content is provided to the response generation agent as context

This approach finds conceptually related content even when the exact words don't match. For example, a question about "my website is down" can match documentation about "server uptime monitoring" and "service status checks."
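
The ranking step can be sketched in plain Python as follows. The tiny two-dimensional vectors are invented for readability (real embeddings have hundreds or thousands of dimensions), and the helpers `cosine_similarity` and `top_k` are hypothetical names, not part of any Konrado.AI API.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 3):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy index: (chunk text, made-up embedding).
EXAMPLE_CHUNKS = [
    ("server uptime monitoring", [1.0, 0.0]),
    ("updating billing details", [0.0, 1.0]),
    ("service status checks", [0.8, 0.6]),
]

if __name__ == "__main__":
    query = [1.0, 0.0]  # stands in for the embedding of "my website is down"
    for score, text in top_k(query, EXAMPLE_CHUNKS, k=2):
        print(f"{score:.2f}  {text}")
```

Cosine distance is simply 1 minus this similarity, so ranking by highest similarity and ranking by lowest distance give the same order. In pgvector, cosine distance is available as the `<=>` operator, so a database-side version of this search would order rows by that operator instead of scoring them in application code.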

Content Sources Comparison

Source	Best For	Update Frequency
Sitemaps	Comprehensive coverage of all public pages	Daily automatic refresh
Individual URLs	Specific high-value pages, internal docs	Daily automatic refresh
Ticket History	Learning from past resolutions and common issues	Real-time as tickets close

Best Practices

Add all relevant sitemaps — The more content the AI has access to, the better its responses
Include FAQ and help pages — These are gold for common customer questions
Keep content up to date — The daily refresh ensures changes propagate, but outdated source content leads to outdated AI responses
Monitor processing status — Check the Knowledge Base dashboard for errors or failed crawls
Use descriptive page titles — The AI uses page titles to understand content relevance

Learn how to configure your Knowledge Base in the Knowledge Base guide.
