RAG, short for retrieval-augmented generation, is a technique that lets an AI model look up relevant information and use it to answer, instead of relying only on what it memorized during training. The system first retrieves documents from a knowledge base, then feeds those documents into the model alongside your question, so the answer is grounded in real, current, specific sources. That two-step pattern — retrieve, then generate — is what RAG means. This explainer covers how it works, a concrete example, and where it helps versus where it does not.
Why RAG exists
A base language model has two stubborn limits. It only knows what was in its training data, so it goes stale and never saw your private documents. And it generates plausible text, so it can confidently invent facts. RAG addresses both by giving the model real source material at the moment of the question. The model still writes the answer, but now it is reading from a relevant document rather than recalling vague patterns.
How RAG works
- Index — split your documents into chunks and store them in a way that is searchable by meaning, usually as embeddings in a vector database.
- Retrieve — when a question arrives, find the chunks most relevant to it.
- Augment — add those chunks to the prompt, alongside the question.
- Generate — the model answers using the supplied context, ideally citing it.
The quality of the answer depends heavily on the retrieve step. If retrieval pulls the wrong chunks, the model answers from the wrong material, and the result is wrong with full confidence. Good RAG is mostly good retrieval.
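A minimal sketch of the index and retrieve steps may make this concrete. Everything here is an assumption made for illustration: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for a vector database, and the policy sentences are invented sample data.

```python
import math
import re
from collections import Counter


def split_into_chunks(text):
    """Index, part 1: split a document into sentence-sized chunks."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Index, part 2: store each chunk next to its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, k=2):
    """Retrieve: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Invented sample document, used only to exercise the functions above.
document = (
    "Employees may work remotely up to three days per week. "
    "Expense reports are due by the fifth of each month. "
    "The office is closed on public holidays."
)
index = build_index(split_into_chunks(document))
top = retrieve(index, "How many days can I work remotely?", k=1)
print(top)
```

With this sample data the remote-work sentence ranks first because it shares the most words with the question. A real embedding model would match on meaning rather than exact words, which is what lets retrieval cope with paraphrased questions.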
A concrete example
A company wants an internal assistant that answers questions about its own policies. Those policies were never in the model's training data. With RAG, the policy documents are indexed; when an employee asks "how many remote days am I allowed?", the system retrieves the relevant policy section and feeds it to the model, which answers from that exact text and can quote it. No retraining is needed, and updating the answer is as simple as updating the document.
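The augment step for this example could look roughly like the sketch below. The policy wording, the instruction text, and the `build_prompt` helper are all hypothetical; the call that sends the prompt to a model is left out because it depends on the provider you use.

```python
def build_prompt(question, sections):
    """Augment: number each retrieved section so the answer can cite it."""
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(sections, start=1))
    return (
        "Answer the question using only the numbered context. "
        "Quote the passage you relied on and give its number. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


question = "how many remote days am I allowed?"

# Invented policy text, standing in for the section the retriever returned.
retrieved = ["Remote work policy: employees may work remotely up to 3 days per week."]
prompt = build_prompt(question, retrieved)
print(prompt)

# Updating the answer means updating the document: the next prompt carries
# the new text, and the model itself is untouched.
retrieved = ["Remote work policy: employees may work remotely up to 4 days per week."]
updated_prompt = build_prompt(question, retrieved)
```

Numbering the sections gives the model something to cite, and the explicit "say so" instruction is one common way to discourage answers that go beyond the supplied text.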
This is why RAG is the default for grounding a model in private or fast-changing knowledge, and why it is often compared with retraining the model itself. The next section, RAG versus fine-tuning, sets the two side by side.
RAG versus fine-tuning
| Factor | RAG | Fine-tuning |
| --- | --- | --- |
| Adds knowledge | Yes, from a live source | Bakes patterns into the model |
| Updating | Edit the documents | Retrain the model |
| Cost | Lower, ongoing retrieval | Higher, training compute |
| Best for | Facts, docs, changing data | Style, format, behavior |
| Citations | Natural, sources available | Hard, no source to point to |
A rough rule: use RAG to give a model facts and use fine-tuning to change how it behaves. Many real systems use both. For the underlying model, see what a large language model is.
What to skip and watch for
- Do not fine-tune for facts you could retrieve. It is more expensive and harder to update than RAG.
- Do not ignore retrieval quality. Most RAG failures are retrieval failures: wrong or missing chunks.
- Do not assume RAG eliminates hallucination. It reduces it. The model can still misuse or contradict the sources.
- Do not feed it garbage sources. Grounding in bad documents grounds you in bad answers.
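Because most failures are retrieval failures, it can help to measure retrieval on its own before looking at generated answers. The sketch below computes a simple hit rate: the share of test questions whose expected chunk appears in the top k results. The keyword retriever, the sample chunks, and the labeled questions are all invented for illustration; in practice you would plug in your own retriever and a hand-labeled question set.

```python
import re


def tokens(text):
    """Lowercase word set used by the toy retriever below."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_retrieve(chunks, question, k):
    """Hypothetical baseline retriever: rank chunks by shared words."""
    q = tokens(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)
    return ranked[:k]


def hit_rate(chunks, labeled_questions, k=1):
    """Fraction of questions whose expected chunk is in the top-k results."""
    hits = sum(
        expected in keyword_retrieve(chunks, question, k)
        for question, expected in labeled_questions
    )
    return hits / len(labeled_questions)


# Invented sample data: each question is paired with the chunk that answers it.
chunks = [
    "Employees may work remotely up to three days per week.",
    "Expense reports are due by the fifth of each month.",
    "The office is closed on public holidays.",
]
labeled = [
    ("How many days can I work remotely?", chunks[0]),
    ("When are expense reports due?", chunks[1]),
    ("Can I do my job from home on Fridays?", chunks[0]),
]
print(hit_rate(chunks, labeled, k=1))
print(hit_rate(chunks, labeled, k=3))
```

With this data, two of the three questions hit at k=1. The third is a paraphrase that shares no useful words with the remote-work chunk, so the keyword retriever picks the wrong one. That is the kind of miss that would otherwise surface later as a confident wrong answer.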
FAQ
What does RAG stand for?
Retrieval-augmented generation. The model retrieves relevant documents and uses them to generate an answer, rather than relying only on its training data.
How is RAG different from fine-tuning?
RAG supplies facts at query time from a live source and is easy to update. Fine-tuning changes the model's behavior through retraining and is better for style and format than for knowledge.
Does RAG stop AI from making things up?
It reduces hallucination by grounding answers in real sources, but does not eliminate it. Poor retrieval or poor sources still lead to wrong answers, so data quality matters.
When should I use RAG?
When you need a model to answer about private documents, recent information, or facts it never saw in training, and you want answers you can trace back to a source.
Where to go next
Compare RAG and fine-tuning in depth, understand the large language model underneath, and see how generative AI creates answers.