
Knowledge Pipeline

How Konrado.AI indexes and uses your content for AI-powered responses

The Knowledge Base is the foundation of accurate AI responses. Konrado.AI crawls, indexes, and searches your content to provide contextually relevant answers to customer questions.

How Content Gets Indexed

1. Source Discovery

You provide content sources in the Knowledge Base settings:

  • Sitemaps — XML sitemaps that point to all pages on your website
  • Individual URLs — Specific pages you want indexed (documentation, FAQ, help articles)
  • Ticket History — Past support tickets and their resolutions
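As a rough illustration of how a sitemap source expands into individual pages, the sketch below reads the <loc> entries from a sitemap that follows the sitemaps.org format. The function name and the example URLs are hypothetical; this is not Konrado.AI's actual discovery code.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found in a sitemap document, in order."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


# Hypothetical sitemap used only for illustration.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/getting-started</loc></url>
  <url><loc>https://example.com/faq</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in extract_urls(EXAMPLE_SITEMAP):
        print(url)
```

A sitemap index file also uses <loc> elements, so the same function would return the child sitemap URLs, which would then need to be fetched and parsed in turn.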

2. Web Crawling

Konrado.AI's crawler fetches the content from each URL:

  • Extracts meaningful text content from HTML pages
  • Strips navigation, footers, and other non-content elements
  • Handles JavaScript-rendered pages
  • Respects rate limits and robots.txt
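To make the text-extraction step concrete, here is a minimal sketch that uses Python's standard html.parser module to keep visible text while skipping navigation, footers, scripts, and styles. The set of skipped tags and the class and function names are assumptions made for illustration; the real crawler's rules are not documented here, and this sketch does not render JavaScript.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as non-content in this sketch (an assumption).
SKIPPED_TAGS = {"nav", "footer", "header", "aside", "script", "style"}


class ContentExtractor(HTMLParser):
    """Collects text that is not nested inside any skipped tag."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0          # > 0 while inside a skipped element
        self._parts: list[str] = []   # text fragments kept so far

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._parts.append(data.strip())

    def text(self) -> str:
        return " ".join(self._parts)


def extract_text(html: str) -> str:
    """Return the visible, non-boilerplate text of an HTML document."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = (
        "<html><body><nav><a href='/'>Home</a></nav>"
        "<main><h1>Reset your password</h1>"
        "<p>Open Settings and choose Security.</p></main>"
        "<footer>Copyright</footer></body></html>"
    )
    print(extract_text(page))
```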

If your site uses Cloudflare, a WAF, or bot protection, you may need to whitelist the Konrado.AI crawler. Check the Knowledge Base page for whitelisting instructions.
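If you want to check in advance whether your robots.txt would block a crawler from a given page, Python's standard urllib.robotparser module can evaluate the rules locally. The user agent name "ExampleCrawler" below is a placeholder, not the Konrado.AI crawler's real user agent; use the value given in the whitelisting instructions.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content used only for illustration.
EXAMPLE_ROBOTS = """User-agent: *
Disallow: /admin/
"""

if __name__ == "__main__":
    for url in ("https://example.com/docs/faq", "https://example.com/admin/users"):
        print(url, is_allowed(EXAMPLE_ROBOTS, "ExampleCrawler", url))
```

Note that robots.txt rules and WAF or bot-protection rules are independent: a page allowed by robots.txt can still be blocked at the firewall.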

3. Chunking and Embedding

The extracted content is processed for semantic search:

  • Chunking — Documents are split into meaningful segments that preserve context
  • Embedding — Each chunk is converted into a vector representation using AI embedding models
  • Indexing — Vectors are stored in a PostgreSQL database with pgvector for fast similarity search
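The chunking step can be pictured with a simple paragraph-based splitter. The sketch below groups consecutive paragraphs until a character budget is reached, which is one common way to keep related sentences together. The 500-character default and the function name are assumptions; Konrado.AI's actual chunking strategy and embedding model are not specified in this page.

```python
def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    Paragraphs are separated by blank lines. A single paragraph longer
    than max_chars is kept whole as its own oversized chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            # Still within budget, or this is the first paragraph of a chunk.
            current = candidate
        else:
            # Adding the paragraph would exceed the budget: start a new chunk.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    document = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
    for index, chunk in enumerate(chunk_text(document, max_chars=40)):
        print(index, repr(chunk))
```

Each resulting chunk would then be passed to an embedding model and stored alongside its vector.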

4. Automatic Refresh

Your web content is automatically refreshed once per day during off-peak hours, so changes to your documentation and support articles typically reach the AI within a day of being published.

You can also trigger a manual refresh from the Knowledge Base dashboard at any time.

How Search Works

When the AI needs to answer a customer question, it performs semantic search:

  1. The customer's question is converted to a vector embedding
  2. The system finds the content chunks whose embeddings are closest to the question's embedding, measured by cosine distance (a smaller distance means a closer match)
  3. The top-matching chunks are ranked by relevance
  4. The most relevant content is provided to the response generation agent as context

This approach finds conceptually related content even when the exact words don't match. For example, a question about "my website is down" can match documentation about "server uptime monitoring" and "service status checks."
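The ranking step can be sketched in a few lines of plain Python. The example below computes cosine distance between a query vector and each stored chunk vector, then returns the closest matches. The two-dimensional vectors and chunk labels are toy values chosen for readability; real embeddings have hundreds or thousands of dimensions, and in production the comparison would run inside the database (pgvector exposes cosine distance through its `<=>` operator) rather than in application code.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # Treat a zero vector as unrelated to everything.
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: list[float],
    chunks: list[tuple[str, list[float]]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Return the k chunks closest to the query, smallest distance first."""
    scored = [(text, cosine_distance(query, vector)) for text, vector in chunks]
    scored.sort(key=lambda item: item[1])
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings (hypothetical values, not produced by a real model).
    index = [
        ("server uptime monitoring", [0.9, 0.1]),
        ("billing and invoices", [0.0, 1.0]),
        ("service status checks", [0.7, 0.7]),
    ]
    question_vector = [1.0, 0.0]  # stands in for "my website is down"
    for text, distance in top_k(question_vector, index, k=2):
        print(f"{distance:.3f}  {text}")
```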

Content Sources Comparison

Source           Best For                                           Update Frequency
Sitemaps         Comprehensive coverage of all public pages         Daily automatic refresh
Individual URLs  Specific high-value pages, internal docs           Daily automatic refresh
Ticket History   Learning from past resolutions and common issues   Real-time as tickets close

Best Practices

  • Add all relevant sitemaps — The more content the AI has access to, the better its responses
  • Include FAQ and help pages — These are gold for common customer questions
  • Keep content up to date — The daily refresh ensures changes propagate, but outdated source content leads to outdated AI responses
  • Monitor processing status — Check the Knowledge Base dashboard for errors or failed crawls
  • Use descriptive page titles — The AI uses page titles to understand content relevance

Learn how to configure your Knowledge Base in the Knowledge Base guide.
