What is a Context Window?

The context window is the maximum number of tokens an LLM can process in a single request, counting input and output together. Depending on the model, it ranges from about 4 K to 2 M tokens. osFoundry’s catalog lists the context window for every model.

Detail

Tokens are sub-word units, averaging about 3-4 characters of English text each; the ratio varies by tokenizer and language. At roughly 0.75 words per token, a 128 K context window holds on the order of 100,000 words. The window includes the system prompt, conversation history, retrieved context, and the model’s generated reply: every part counts against the limit.
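The budgeting arithmetic above can be sketched in a few lines of Python. This is a rough illustration only: `estimate_tokens`, `fits_in_window`, and the 4-characters-per-token constant are assumptions made for this example, not part of osFoundry or any tokenizer library. A real tokenizer will give different counts.

```python
import math

# Assumption: about 4 characters per token for English text.
# Real tokenizers vary by model and language.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (not a real tokenizer)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_window(parts: list[str], max_output_tokens: int, context_window: int) -> bool:
    """Check whether all input parts plus the reserved reply fit in the window.

    parts: system prompt, conversation history, retrieved context, and so on.
    max_output_tokens: tokens reserved for the model's generated reply.
    """
    input_tokens = sum(estimate_tokens(p) for p in parts)
    return input_tokens + max_output_tokens <= context_window
```

For instance, `fits_in_window([system_prompt, history, retrieved], 1024, 128_000)` reserves 1,024 tokens for the reply and checks the rest against a 128 K window. Reserving output space up front matters because the reply is counted against the same limit as the input.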

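Bigger windows let you include more context, but they cost more per request and have diminishing returns: answer quality often degrades on very long inputs, typically somewhere past 50-100 K tokens, though the threshold varies by model and task. Strategies like retrieval-augmented generation (RAG) retrieve only the relevant chunks instead of stuffing everything in.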
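As a minimal sketch of the "retrieve only the relevant chunks" idea, the function below greedily packs the highest-scoring chunks into a fixed token budget. The name `select_chunks`, the `(score, text)` input shape, and the 4-characters-per-token estimate are all hypothetical choices for illustration; a production RAG pipeline would use a real retriever and tokenizer.

```python
def select_chunks(scored_chunks, token_budget, chars_per_token=4):
    """Pick the most relevant chunks that fit within a token budget.

    scored_chunks: list of (relevance_score, text) pairs, e.g. from a retriever.
    token_budget: tokens available for retrieved context.
    chars_per_token: rough heuristic, not a real tokenizer.
    Returns the selected texts, most relevant first.
    """
    selected = []
    used = 0
    # Visit chunks from most to least relevant.
    for _score, text in sorted(scored_chunks, key=lambda c: c[0], reverse=True):
        cost = -(-len(text) // chars_per_token)  # ceiling division
        if used + cost > token_budget:
            continue  # too large for the remaining budget; try smaller chunks
        selected.append(text)
        used += cost
    return selected
```

Greedy packing is only one possible policy. It favors relevance over coverage, and skipping an oversized chunk rather than stopping lets smaller, lower-ranked chunks still use the remaining budget.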

How osFoundry approaches the Context Window

osFoundry’s knowledge bases and RAG pipeline retrieve only the relevant chunks for each query, keeping the context window focused. If you need more room, you can also pick a model with a bigger window from the catalog.