What is Retrieval-Augmented Generation?
Abbreviation: RAG
Retrieval-Augmented Generation (RAG) is the technique of fetching relevant context from a knowledge store at query time and including it in the LLM prompt. osFoundry’s knowledge bases are indexed automatically for RAG, and Maestro retrieves from them on every relevant chat turn.
Detail
RAG addresses two LLM limitations: outdated training data and a limited context window. Instead of trying to fit everything into the prompt, you retrieve only the chunks relevant to the user query and pass those to the model.
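To illustrate the context-window point, here is a minimal sketch of packing already-ranked chunks into a fixed budget. It assumes that token cost can be approximated by a whitespace word count, which is a simplification; the function name `pack_context` and the budget value are hypothetical and not part of osFoundry.

```python
def pack_context(ranked_chunks: list[str], max_tokens: int) -> list[str]:
    """Greedily keep the highest-ranked chunks that still fit in the budget.

    Assumption: one whitespace-separated word counts as one token.
    A real system would use the tokenizer of the target model.
    """
    selected: list[str] = []
    used = 0
    for chunk in ranked_chunks:  # best match first
        cost = len(chunk.split())
        if used + cost > max_tokens:
            continue  # too large for the remaining budget, try the next chunk
        selected.append(chunk)
        used += cost
    return selected


ranked = ["a b c d", "e f g h i j", "k l"]
packed = pack_context(ranked, max_tokens=6)
```

In this example the second chunk is skipped because it would exceed the budget, while the smaller third chunk still fits.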
A RAG pipeline typically runs through these stages: query → embed → vector search → optional reranking → optional filtering → assemble context → call LLM. Each stage has knobs (embedding model, top-k, reranker, threshold) that affect quality and cost.
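The stages above can be sketched in a few lines of Python. This is a toy illustration, not osFoundry's implementation: the embedding is a bag-of-words count standing in for a real embedding model, the vector search is a brute-force cosine comparison, reranking is omitted, and the final LLM call is left out. All function names, the sample chunks, and the default `top_k` and `threshold` values are assumptions made for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2, threshold: float = 0.1) -> list[str]:
    """Embed the query, score every chunk, keep the top-k, then filter by threshold."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(c)), c) for c in chunks), key=lambda p: p[0], reverse=True)
    return [chunk for score, chunk in scored[:top_k] if score >= threshold]


def assemble_prompt(query: str, context: list[str]) -> str:
    """Build the prompt that would be sent to the LLM (the call itself is omitted)."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# Hypothetical knowledge-base chunks.
CHUNKS = [
    "Invoices are emailed on the first day of each month.",
    "Password resets expire after 24 hours.",
    "The office is closed on public holidays.",
]

question = "when do password resets expire"
hits = retrieve(question, CHUNKS)
prompt = assemble_prompt(question, hits)
```

Raising `top_k` or lowering `threshold` passes more chunks to the model, which may improve recall but increases prompt size and therefore cost.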
How osFoundry approaches Retrieval-Augmented Generation
osFoundry’s RAG pipeline is fully configurable per chat path in osStudio: you can drag stages, pick embedding and reranker models, and set thresholds. Different surfaces (for example, code-chat and customer-success) can use different pipelines.
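As a rough mental model of per-path configuration, the idea can be expressed as a lookup from chat path to pipeline settings. The sketch below is purely illustrative: osStudio is configured through its own interface, and the class, field names, model names, and values here are hypothetical and do not reflect osFoundry's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical settings for one retrieval pipeline."""
    embedding_model: str
    top_k: int
    reranker: Optional[str] = None  # None means the reranking stage is skipped
    score_threshold: float = 0.0


# Hypothetical chat paths mapped to their own pipelines.
PIPELINES = {
    "code-chat": PipelineConfig(
        "code-embedder", top_k=8, reranker="code-reranker", score_threshold=0.3
    ),
    "customer-success": PipelineConfig(
        "general-embedder", top_k=4, score_threshold=0.5
    ),
}


def pipeline_for(chat_path: str) -> PipelineConfig:
    """Return the pipeline for a chat path, failing loudly if none is configured."""
    try:
        return PIPELINES[chat_path]
    except KeyError:
        raise ValueError(f"no pipeline configured for chat path {chat_path!r}") from None


code_cfg = pipeline_for("code-chat")
support_cfg = pipeline_for("customer-success")
```

Keeping each path's settings in one immutable record makes it easy to compare pipelines side by side and to see which surfaces skip optional stages such as reranking.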
FAQ
Does osFoundry support RAG?
Yes. Knowledge bases are indexed automatically, retrieval pipelines are configurable in osStudio, and Maestro retrieves on every relevant chat turn.
What is the difference between RAG and fine-tuning?
RAG retrieves external facts at query time. Fine-tuning bakes new behaviour into the model weights. The two are complementary: use RAG for facts and fine-tuning for style or specialised reasoning.
Can I customise the RAG pipeline?
Yes, per chat path in osStudio. Each use case can have its own stages, models, and thresholds.