AI Agent Knowledge Base: Formats, Chunking & Indexing
Build a better AI agent knowledge base: NVIDIA benchmarks on chunk size, format tradeoffs, hybrid search, and RAG pipelines. A step-by-step guide for engineers.
Frequently Asked Questions
What is an AI agent knowledge base?
An AI agent knowledge base is a structured collection of documents, facts, and data that an agent retrieves from at runtime to answer questions accurately. It is usually served through a RAG (retrieval-augmented generation) pipeline: documents are split into chunks, each chunk is embedded as a vector, and the vectors are stored in a vector database for semantic retrieval. Grounding answers in retrieved sources is how agents stay factual and hallucinate far less.
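The chunk, embed, store, retrieve flow can be sketched in a few lines of Python. This is a minimal, self-contained illustration under stated assumptions, not a production design: `embed` is a toy bag-of-words stand-in for a real embedding model, and `InMemoryVectorStore` is a hypothetical in-memory substitute for a vector database.

```python
import math
import re
from collections import Counter


# Toy stand-in for an embedding model (assumption): a term-count vector.
# A real pipeline would call an embedding model here instead.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: stores (chunk, vector) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self.items.append((chunk, embed(chunk)))

    def search(self, query: str, top_k: int = 2) -> list[str]:
        # Rank every stored chunk by similarity to the query vector.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:top_k]]


def chunk(doc: str) -> list[str]:
    # Simplest possible chunker: one chunk per non-empty paragraph.
    return [p.strip() for p in doc.split("\n\n") if p.strip()]


docs = [
    "Refunds are issued within 14 days.\n\nShipping takes 3 to 5 business days.",
    "Support is available Monday to Friday.",
]
store = InMemoryVectorStore()
for doc in docs:
    for c in chunk(doc):
        store.add(c)

print(store.search("how many days until refunds are issued", top_k=1))
```

Swapping `embed` for a real embedding model and the store for an actual vector database leaves the overall structure unchanged.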
What file formats work best for an AI knowledge base?
Markdown (.md) tends to give the highest retrieval quality because its heading structure maps directly onto meaningful chunks. JSON and CSV excel for structured data such as product catalogs. PDFs work but need preprocessing before ingestion, for example converting them to Markdown with a tool like Docling. For most teams, a Markdown-first strategy with JSON for structured facts is the best default.
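Heading-aware chunking is what makes Markdown retrieve well: each section becomes a chunk that carries its heading path as metadata. A minimal sketch, assuming ATX-style `#` headings and splitting at levels 1 and 2 by default; the function name `chunk_markdown_by_heading` and the `heading_path` field are illustrative, not part of any library.

```python
import re


def chunk_markdown_by_heading(md: str, max_level: int = 2) -> list[dict]:
    """Split Markdown into chunks at headings of level <= max_level.

    Each chunk keeps its heading path as metadata so a retrieved chunk
    can be traced back to its section. Note: this sketch does not skip
    '#' lines that appear inside fenced code blocks.
    """
    heading_re = re.compile(r"^(#{1,6})\s+(.*)$")
    chunks: list[dict] = []
    path: list[str] = []
    lines: list[str] = []

    def flush() -> None:
        # Emit the accumulated section body (heading text lives in metadata).
        body = "\n".join(lines).strip()
        if body:
            chunks.append({"heading_path": " > ".join(path), "text": body})
        lines.clear()

    for line in md.splitlines():
        m = heading_re.match(line)
        if m and len(m.group(1)) <= max_level:
            flush()
            level = len(m.group(1))
            # Truncate the path to the parent level, then append this heading.
            del path[level - 1:]
            path.append(m.group(2).strip())
        else:
            lines.append(line)
    flush()
    return chunks


doc = """# Returns
Refunds are issued within 14 days.

## Exceptions
Final-sale items cannot be returned.

# Shipping
Orders ship in 3 to 5 business days.
"""
for c in chunk_markdown_by_heading(doc):
    print(c["heading_path"], "|", c["text"])
```

Deeper headings (level 3 and below here) stay inside their parent chunk, which keeps chunks from becoming too small to be useful.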
What is the best chunk size for RAG?
NVIDIA research points to 512-token chunks with 50–100 tokens of overlap as the best starting point for most use cases, with average retrieval accuracy of 0.640 or higher. For precise factoid queries, 256–512-token chunks work best; for complex analytical questions, use 1,024-token or page-level chunks.
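Fixed-size chunking with overlap is a sliding window whose step is `chunk_size - overlap`. A minimal sketch, assuming whitespace-split words as a stand-in for real tokens; subword tokenizers usually produce more tokens than words, so in practice count with your embedding model's tokenizer. The 64-token overlap is one value inside the 50–100 range.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Split a token list into fixed-size windows that overlap by `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Whitespace splitting stands in for a real tokenizer here (assumption).
text = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_tokens(text.split(), chunk_size=512, overlap=64)
print([len(c) for c in chunks])
```

With 1,200 words this yields windows of 512, 512, and 304 tokens, and each pair of consecutive windows shares 64 tokens, so a sentence cut at a boundary still appears whole in one of them.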
What is the difference between vector, keyword, and hybrid search?
Vector search finds semantically similar content via embeddings, which makes it strong on conceptual queries. Keyword (BM25) search matches exact terms, so it is reliable for product codes and proper nouns. Hybrid search runs both and merges their result lists, commonly with Reciprocal Rank Fusion (RRF), to get the coverage of each. Most production RAG systems use [hybrid search](/blog/agentic-rag/) because pure semantic search often misses proprietary terminology that the embedding model never learned.
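Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over every result list it appears in, so documents ranked well by both BM25 and vector search rise to the top. A minimal sketch; the document IDs and the two result lists are hypothetical, and k = 60 is the constant from the original RRF paper and a common default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Only ranks are used, so BM25 scores and
    cosine similarities never need to be put on a common scale.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists for a query like "error E-4012 on checkout"
bm25_hits = ["doc_e4012", "doc_checkout", "doc_payments"]
vector_hits = ["doc_checkout", "doc_payments_errors", "doc_e4012"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print([doc_id for doc_id, _ in fused])
```

Here `doc_checkout` ranks first because it sits near the top of both lists, even though BM25 alone put `doc_e4012` first.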
How do I keep an AI knowledge base from going stale?
Implement an incremental ingestion pipeline triggered by source changes (webhooks, git hooks, or scheduled polling). Each update re-chunks and re-embeds only the changed documents. Use timestamp metadata to invalidate stale vectors. Teams using cowork.ink can connect knowledge sources directly to shared agents so the whole team always queries fresh data.
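Timestamp-based invalidation can be sketched as a planning step that compares each source document's modification time with the time recorded at its last ingestion. The `IndexedDoc` record, the function name, and the in-memory `index` are hypothetical; a real pipeline would read this metadata from the vector database, then delete the stale chunk IDs and re-chunk and re-embed the listed documents.

```python
from dataclasses import dataclass


@dataclass
class IndexedDoc:
    updated_at: float      # source modification time recorded at last ingestion
    chunk_ids: list[str]   # vector IDs to delete when the document changes


def plan_incremental_update(
    source_mtimes: dict[str, float],
    index: dict[str, IndexedDoc],
) -> tuple[list[str], list[str]]:
    """Return (docs_to_reembed, stale_chunk_ids) by comparing timestamps."""
    to_reembed: list[str] = []
    stale_chunks: list[str] = []
    for doc_id, mtime in source_mtimes.items():
        indexed = index.get(doc_id)
        if indexed is None:
            to_reembed.append(doc_id)               # new document
        elif mtime > indexed.updated_at:
            to_reembed.append(doc_id)               # changed document
            stale_chunks.extend(indexed.chunk_ids)  # old vectors to invalidate
    for doc_id, indexed in index.items():
        if doc_id not in source_mtimes:
            stale_chunks.extend(indexed.chunk_ids)  # document deleted at the source
    return to_reembed, stale_chunks


index = {
    "faq.md": IndexedDoc(updated_at=100.0, chunk_ids=["faq-0", "faq-1"]),
    "pricing.md": IndexedDoc(updated_at=100.0, chunk_ids=["pricing-0"]),
    "old.md": IndexedDoc(updated_at=50.0, chunk_ids=["old-0"]),
}
source = {"faq.md": 180.0, "pricing.md": 100.0, "new.md": 200.0}
print(plan_incremental_update(source, index))
```

For the sample data this plans to re-embed `faq.md` (changed) and `new.md` (new), and to delete the old chunks of `faq.md` plus those of the removed `old.md`; the unchanged `pricing.md` is skipped entirely.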