Glossary

RAG (Retrieval-Augmented Generation)

Architecture where the LLM fetches relevant documents from a knowledge base before answering. Enables up-to-date, company-specific responses without fine-tuning.

Retrieval-Augmented Generation (RAG) is the architecture that gives LLMs access to company-specific knowledge without fine-tuning. Before every answer, the system searches a knowledge base (documents, FAQs, wikis), retrieves the most relevant passages, and hands them to the LLM as context.

A RAG pipeline has four steps: (1) documents are chunked and embedded into a vector database; (2) the caller’s query is also embedded; (3) the top-k most similar chunks are retrieved; (4) the LLM generates the answer grounded in those chunks.
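
A minimal sketch of those four steps in Python. It uses a toy hashing-trick embedding in place of a real embedding model and an injected `llm` callable in place of a specific provider; every name here is illustrative, not BHOMY's actual implementation:

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy hashing-trick embedding; swap in a real sentence-embedding
    model for production use."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[int(hashlib.md5(token.encode()).hexdigest(), 16) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def chunk(doc: str, size: int = 300, overlap: int = 50) -> list[str]:
    """Step 1: split a document into overlapping character windows."""
    return [doc[i:i + size]
            for i in range(0, max(len(doc) - overlap, 1), size - overlap)]

def build_index(docs: list[str]) -> list[tuple[str, np.ndarray]]:
    """Step 1, continued: embed every chunk. A vector database would
    store these; a flat list stands in for it here."""
    return [(c, embed(c)) for doc in docs for c in chunk(doc)]

def retrieve(query: str, index: list[tuple[str, np.ndarray]],
             k: int = 3) -> list[str]:
    """Steps 2-3: embed the query and return the top-k chunks by cosine
    similarity (vectors are unit-normalised, so dot product == cosine)."""
    q = embed(query)
    scored = sorted(index, key=lambda pair: float(pair[1] @ q), reverse=True)
    return [text for text, _ in scored[:k]]

def answer(query: str, index, llm) -> str:
    """Step 4: generate an answer grounded in the retrieved chunks.
    `llm` is any callable mapping a prompt to a completion string."""
    context = "\n---\n".join(retrieve(query, index))
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
```

In production, the flat list becomes a vector database and `embed` a trained embedding model; the control flow stays the same.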

The decisive lever is rarely the model; it's data quality: chunk size, deduplication, freshness of sources, and a clear hierarchy between binding and informational documents. A good RAG setup is 80% data hygiene and 20% model choice.
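
To make the data-hygiene point concrete, here is a hedged sketch of two of those levers: whitespace-normalised deduplication before embedding, and a source-tier map so binding documents outrank informational ones at retrieval time. The tier names and values are invented for the example:

```python
import hashlib

def dedupe(chunks: list[str]) -> list[str]:
    """Drop duplicate chunks (after normalising case and whitespace) so
    boilerplate repeated across pages is embedded only once."""
    seen: set[str] = set()
    unique = []
    for c in chunks:
        key = hashlib.sha1(" ".join(c.lower().split()).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(c)
    return unique

# Illustrative hierarchy: binding sources outrank informational ones when
# retrieved chunks disagree. The tiers are assumptions, not a standard.
SOURCE_TIER = {"contract": 0, "policy": 1, "faq": 2, "wiki": 3}

def rank(hits: list[tuple[str, float, str]]) -> list[tuple[str, float, str]]:
    """Order retrieved (chunk, score, source) hits by tier first and
    similarity second, so a wiki note never overrides a contract clause."""
    return sorted(hits, key=lambda h: (SOURCE_TIER.get(h[2], 99), -h[1]))
```

Freshness can be handled the same way, for example by storing a last-modified timestamp per chunk and re-embedding stale sources.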

Next step

See BHOMY in a 15-minute demo using a real call example.
