What is a Context Window?
The context window is the maximum number of tokens an LLM can process in one request (input + output combined). Modern models range from 4 K to 2 M tokens. osFoundry’s catalog lists the context window for every model.
Detail
Tokens are sub-word units, averaging roughly 3-4 characters of English text each. A 128 K context window holds roughly 100,000 words (at about 0.75 words per token). The window includes the system prompt, conversation history, retrieved context, and the model’s generated reply: every part counts against the limit.
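To make the arithmetic concrete, here is a minimal sketch of a token budget check. It assumes the rough 4-characters-per-token average rather than a real tokenizer, so the numbers are estimates only; the function names and sample strings are hypothetical and not part of any osFoundry API.

```python
import math

# Assumption: ~4 characters per token for English text. A real tokenizer
# will give different counts, especially for code or non-English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate derived from character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def remaining_output_budget(context_window: int, parts: list[str]) -> int:
    """Tokens left for the model's reply after every input part is counted."""
    used = sum(estimate_tokens(part) for part in parts)
    return context_window - used


# Every part of the request is counted against the same limit.
system_prompt = "You are a helpful assistant."
history = "User: Hi\nAssistant: Hello, how can I help?"
retrieved = "Context windows are measured in tokens."
question = "What is a context window?"

left = remaining_output_budget(
    128_000, [system_prompt, history, retrieved, question]
)
print(f"Estimated tokens left for the reply: {left}")
```

A negative result would mean the input alone already exceeds the window, so something has to be trimmed or summarized before the request is sent.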
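Bigger windows let you include more context, but they cost more per request and have diminishing returns: output quality often degrades on very long inputs, frequently somewhere past 50-100 K tokens, though the threshold varies by model and task. Strategies like RAG (retrieval-augmented generation) retrieve only the relevant chunks instead of stuffing everything in.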
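The idea of retrieving only the relevant chunks can be sketched as a greedy selection under a token budget. This is an illustration, not osFoundry's pipeline: the keyword-overlap `score` is a stand-in for the embedding similarity a real RAG system would typically use, and the 4-characters-per-token estimate is an assumption.

```python
def score(query: str, chunk: str) -> int:
    """Toy relevance score: number of query words that appear in the chunk.

    Assumption: a real pipeline would use embedding similarity instead.
    """
    query_words = set(query.lower().split())
    return len(query_words & set(chunk.lower().split()))


def select_chunks(
    query: str,
    chunks: list[str],
    token_budget: int,
    chars_per_token: int = 4,  # assumption: rough English average
) -> list[str]:
    """Pick the highest-scoring chunks that fit within the token budget."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    selected: list[str] = []
    used = 0
    for chunk in ranked:
        if score(query, chunk) == 0:
            break  # remaining chunks share no words with the query
        cost = -(-len(chunk) // chars_per_token)  # ceiling division
        if used + cost <= token_budget:
            selected.append(chunk)
            used += cost
    return selected


chunks = [
    "context window limits tokens",
    "bananas are yellow",
    "a context window counts input and output",
]
print(select_chunks("context window tokens", chunks, token_budget=10))
```

With a budget of 10 estimated tokens only the best-matching chunk fits; raising the budget admits the second relevant chunk, while the unrelated one is never included.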
How osFoundry approaches Context Window
osFoundry’s knowledge bases and RAG pipeline retrieve only the relevant chunks for each query, keeping the context window focused. You can also pick a model with a bigger window from the catalog if you need it.