Knowledge Pipeline
How Konrado.AI indexes and uses your content for AI-powered responses
The Knowledge Base is the foundation of accurate AI responses. Konrado.AI crawls, indexes, and searches your content to provide contextually relevant answers to customer questions.
How Content Gets Indexed
1. Source Discovery
You provide content sources in the Knowledge Base settings:
- Sitemaps — XML sitemaps that point to all pages on your website
- Individual URLs — Specific pages you want indexed (documentation, FAQ, help articles)
- Ticket History — Past support tickets and their resolutions
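To illustrate what a sitemap source contributes, here is a minimal sketch of how page URLs can be read out of a standard XML sitemap. It is not Konrado.AI's actual discovery code; the function name `urls_from_sitemap` and the example URLs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Example sitemap with placeholder URLs.
example_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/getting-started</loc></url>
  <url><loc>https://example.com/faq</loc></url>
</urlset>"""

discovered = urls_from_sitemap(example_sitemap)
```

Each URL found this way would then be handed to the crawler, the same as a page added individually.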
2. Web Crawling
Konrado.AI's crawler fetches the content from each URL:
- Extracts meaningful text content from HTML pages
- Strips navigation, footers, and other non-content elements
- Handles JavaScript-rendered pages
- Respects rate limits and robots.txt
If your site uses Cloudflare, a WAF, or bot protection, you may need to whitelist the Konrado.AI crawler. Check the Knowledge Base page for whitelisting instructions.
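The crawling behavior described above can be sketched with Python's standard library. This is an illustrative approximation, not the Konrado.AI crawler: the set of skipped tags, the helper names, and the `ExampleBot` user agent are all assumptions, and JavaScript rendering is not covered here.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Assumed set of non-content elements; the real crawler's rules are not documented here.
SKIPPED_TAGS = {"nav", "footer", "script", "style"}


class ContentExtractor(HTMLParser):
    """Collects visible text while ignoring anything inside skipped tags."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def extract_text(html: str) -> str:
    """Return the page's content text as a single string (hypothetical helper)."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules that have already been downloaded."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


page = (
    "<html><body><nav>Home | Docs</nav>"
    "<main><h1>Reset your password</h1><p>Open Settings.</p></main>"
    "<footer>Example Inc.</footer></body></html>"
)
content = extract_text(page)

robots = "User-agent: *\nDisallow: /private/\n"
can_crawl_docs = allowed_by_robots(robots, "ExampleBot", "https://example.com/docs")
can_crawl_private = allowed_by_robots(robots, "ExampleBot", "https://example.com/private/page")
```

A page blocked by robots.txt or by bot protection never reaches the extraction step, which is why failed crawls usually trace back to access rules rather than page content.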
3. Chunking and Embedding
The extracted content is processed for semantic search:
- Chunking — Documents are split into meaningful segments that preserve context
- Embedding — Each chunk is converted into a vector representation using AI embedding models
- Indexing — Vectors are stored in a PostgreSQL database with pgvector for fast similarity search
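As a rough illustration of the chunking step, the sketch below packs whole paragraphs into segments up to a size limit so that no paragraph is cut in half. The actual chunking strategy and size limits used by Konrado.AI are not documented here; `chunk_text` and `max_chars` are hypothetical names, and the embedding and pgvector storage steps are left out because they depend on the specific model and schema.

```python
def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than split,
    so that its context is preserved (an assumption of this sketch).
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            # Paragraph fits, or it starts a new chunk on its own.
            current = candidate
        else:
            # Adding this paragraph would exceed the limit: close the chunk.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


document = "A" * 30 + "\n\n" + "B" * 30 + "\n\n" + "C" * 30
segments = chunk_text(document, max_chars=70)
```

Here the first two paragraphs fit together in one chunk and the third starts a new one. Each resulting chunk would then be embedded and stored as one row in the vector index.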
4. Automatic Refresh
Your web content is automatically refreshed once per day during off-peak hours, so changes to your documentation and support articles normally reach the AI within a day.
You can also trigger a manual refresh from the Knowledge Base dashboard at any time.
How Search Works
When the AI needs to answer a customer question, it performs semantic search:
- The customer's question is converted to a vector embedding
- The system finds the content chunks whose vectors are closest to the question's vector, measured by cosine distance
- The top-matching chunks are ranked by relevance
- The most relevant content is provided to the response generation agent as context
This approach finds conceptually related content even when the exact words don't match. For example, a question about "my website is down" can match documentation about "server uptime monitoring" and "service status checks."
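The ranking step can be sketched in a few lines. In production this comparison runs inside PostgreSQL with pgvector; the pure-Python version below only illustrates the idea. The three-dimensional vectors are hand-made toy values, not real embeddings, and `top_chunks` is a hypothetical helper name.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 minus cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # Treat a zero vector as unrelated to everything.
    return 1.0 - dot / (norm_a * norm_b)


def top_chunks(query_vec, indexed, k=3):
    """Return the text of the k chunks closest to the query vector."""
    ranked = sorted(indexed, key=lambda item: cosine_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy index: (chunk text, made-up embedding).
index = [
    ("server uptime monitoring", [0.9, 0.1, 0.0]),
    ("billing and invoices", [0.0, 0.2, 0.9]),
    ("service status checks", [0.8, 0.3, 0.1]),
]

# Made-up embedding standing in for the question "my website is down".
question_vec = [1.0, 0.2, 0.0]
context = top_chunks(question_vec, index, k=2)
```

With these toy values the two uptime-related chunks rank ahead of the billing chunk, even though none of them shares a word with the question.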
Content Sources Comparison
| Source | Best For | Update Frequency |
|---|---|---|
| Sitemaps | Comprehensive coverage of all public pages | Daily automatic refresh |
| Individual URLs | Specific high-value pages, internal docs | Daily automatic refresh |
| Ticket History | Learning from past resolutions and common issues | Real-time as tickets close |
Best Practices
- Add all relevant sitemaps — The more content the AI has access to, the better its responses
- Include FAQ and help pages — These are gold for common customer questions
- Keep content up to date — The daily refresh ensures changes propagate, but outdated source content leads to outdated AI responses
- Monitor processing status — Check the Knowledge Base dashboard for errors or failed crawls
- Use descriptive page titles — The AI uses page titles to understand content relevance
Learn how to configure your Knowledge Base in the Knowledge Base guide.