A context window is the amount of text an AI language model can read and keep in mind at one time, measured in tokens. Think of it as the model's short-term working memory: everything you want it to consider for a single response (your question, any pasted documents, the conversation so far, and the answer it generates) has to fit inside that window. When the limit is reached, the oldest content typically falls out of view. In 2026, windows range from tens of thousands of tokens to well over a million on the largest models.
How it works
The model does not see letters or words directly. Its tokenizer first breaks text into tokens, which are common chunks of characters: a short word is often one token, while a long or rare word may split into several. As a rough guide, a token is about three-quarters of an English word. The context window is the maximum number of tokens the model can attend to at once, and that single budget is shared by your input and the model's output, including any hidden system prompt the assistant carries.
| Term | Plain meaning |
| --- | --- |
| Token | A chunk of text, roughly part of a word |
| Context window | Maximum tokens the model can handle in one request, input and output combined |
| Input tokens | Your prompt, files, chat history, and any system prompt |
| Output tokens | The reply the model writes back |
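To make the shared budget concrete, here is a minimal Python sketch that estimates how much room is left for a reply. It assumes the rough three-quarters-of-a-word rule instead of a real tokenizer, and the function names, the 16,000-token window, and the sample prompt are hypothetical; an actual model's tokenizer will give different counts.

```python
import math

# Rule of thumb from the text: a token is about three-quarters of a word,
# so a word is about 4/3 of a token. This is an estimate, not a tokenizer.


def estimate_tokens(text: str) -> int:
    """Roughly estimate tokens from a whitespace word count, rounding up."""
    words = len(text.split())
    return math.ceil(words * 4 / 3)


def remaining_output_budget(context_window: int, system_prompt: str, user_input: str) -> int:
    """Tokens left for the reply once the system prompt and input are counted."""
    used = estimate_tokens(system_prompt) + estimate_tokens(user_input)
    return max(context_window - used, 0)


if __name__ == "__main__":
    system_prompt = "You are a helpful assistant. " * 20  # hypothetical hidden prompt
    document = "word " * 6_000                            # stands in for a 6,000-word paste
    left = remaining_output_budget(16_000, system_prompt, document)
    print(f"Room left for the reply: about {left} tokens")  # about 7866 tokens
```

Shrinking the pasted document or the system prompt frees tokens for the answer; growing either one eats into it.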
Why it matters
The window sets the ceiling on how much the model can reason over at once. A small window means a long document must be split into pieces, and the model never sees the whole thing together. A large window lets you paste an entire report, codebase, or transcript and ask questions across all of it. It also affects cost and speed, since most providers bill per token and longer inputs take longer to process.
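Because billing is per token, the cost of a single request is simple arithmetic. The sketch below uses made-up prices per million tokens (placeholders, not any provider's real rates) and bills input and output at separate rates, as many providers do.

```python
# Placeholder prices in US dollars per million tokens. These are NOT real
# rates for any provider; substitute the numbers from your provider's price list.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float = INPUT_PRICE_PER_MTOK,
                 output_price: float = OUTPUT_PRICE_PER_MTOK) -> float:
    """Dollar cost of one request billed per token, input and output priced separately."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    focused = request_cost(2_000, 1_000)  # a trimmed excerpt plus a short answer
    dump = request_cost(200_000, 1_000)   # a whole report pasted in, same answer
    print(f"focused: ${focused:.3f}, full dump: ${dump:.3f}")  # focused: $0.021, full dump: $0.615
```

With these placeholder prices, the same answer costs roughly thirty times more when the whole report goes in, before counting the extra processing time.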
A concrete example
Say you paste a 50-page contract and ask the model to find every clause about termination. If the contract fits inside the context window, the model can scan the whole thing in one pass and cross-reference clauses. If it does not fit, you must chunk the contract, run several queries, and stitch the answers together yourself, which is slower and easier to get wrong.
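When the contract does not fit, the chunking step might look like the sketch below. It is a minimal illustration that assumes the same rough words-per-token ratio and splits on whitespace; the function name `chunk_text` and its overlap parameter are hypothetical, and real pipelines often split on paragraphs or headings instead.

```python
def chunk_text(text: str, max_tokens: int, overlap_tokens: int = 0) -> list[str]:
    """Split text into word-based chunks that each fit a rough token budget.

    Assumes about 0.75 words per token, so each chunk holds max_tokens * 3 // 4
    words. Neighbouring chunks share overlap_tokens worth of words, so a clause
    split at one boundary has a better chance of appearing whole in a chunk.
    """
    words = text.split()
    max_words = max_tokens * 3 // 4
    overlap_words = overlap_tokens * 3 // 4
    if max_words <= overlap_words:
        raise ValueError("chunk size must be larger than the overlap")
    step = max_words - overlap_words
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    contract = "clause " * 30_000  # stands in for a long contract
    pieces = chunk_text(contract, max_tokens=8_000, overlap_tokens=200)
    print(len(pieces), "chunks")  # 6 chunks
```

Each chunk would then be sent with the same question and the partial answers merged, which is exactly the extra stitching work the example describes.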
Common misconceptions
Bigger windows mean perfect recall. They do not. Models often attend more strongly to the start and end of a long input and can overlook details buried in the middle, a pattern sometimes called the lost-in-the-middle effect.
Context is permanent memory. It is not. Once content scrolls out of the window or the session ends, the model forgets it unless you provide it again or use a retrieval system.
More context is always better. Stuffing in irrelevant text dilutes the signal, costs more, and can lower answer quality. A focused excerpt usually beats a giant dump.
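Content scrolling out of the window often comes from a simple sliding-window policy over the chat history: keep the system prompt, keep the newest turns that fit, and drop the rest. The sketch below assumes that policy, a hypothetical `Message` type, and the rough word-based token estimate; real products may instead summarize old turns or refuse the request.

```python
import math
from dataclasses import dataclass


@dataclass
class Message:
    role: str  # "system", "user", or "assistant"
    text: str


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 tokens per 3 words, rounded up (not a real tokenizer)."""
    return math.ceil(len(text.split()) * 4 / 3)


def fit_history(messages: list[Message], budget: int) -> list[Message]:
    """Keep system messages plus the newest turns that fit; older turns fall out."""
    system = [m for m in messages if m.role == "system"]
    turns = [m for m in messages if m.role != "system"]
    used = sum(estimate_tokens(m.text) for m in system)
    kept: list[Message] = []
    for message in reversed(turns):  # walk from newest to oldest
        cost = estimate_tokens(message.text)
        if used + cost > budget:
            break  # this turn and everything older is dropped
        kept.append(message)
        used += cost
    return system + kept[::-1]  # restore chronological order
```

Anything `fit_history` drops is gone from the model's view until you supply it again or a retrieval system brings it back.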
How to use it well
- Put the most relevant material where it counts. Place the key text near the start or end, not buried in the middle.
- Trim ruthlessly. Include only the passages that bear on your question.
- Use retrieval for large corpora. Pull in just the relevant snippets instead of the whole library.
- Watch the shared budget. Long inputs leave less room for the answer, so reserve headroom for the response.
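The tips above can be combined into a small context builder: score passages against the question, keep the best ones that fit a token budget, and place the strongest at the start and end so the weakest land in the middle. The keyword-overlap score is a deliberately crude stand-in for real retrieval such as embedding search, and every name and parameter here is hypothetical.

```python
import math
import re


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 tokens per 3 words, rounded up (not a real tokenizer)."""
    return math.ceil(len(text.split()) * 4 / 3)


def _words(text: str) -> set[str]:
    """Lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_score(query: str, passage: str) -> int:
    """Crude relevance: how many distinct query words also appear in the passage."""
    return len(_words(query) & _words(passage))


def build_context(query: str, passages: list[str], budget: int) -> list[str]:
    """Keep the most relevant passages that fit the budget, best ones at the edges."""
    ranked = sorted(passages, key=lambda p: keyword_score(query, p), reverse=True)
    chosen: list[str] = []
    used = 0
    for passage in ranked:
        cost = estimate_tokens(passage)
        if keyword_score(query, passage) == 0 or used + cost > budget:
            continue  # skip irrelevant passages and ones that would overflow
        chosen.append(passage)
        used += cost
    # Deal passages alternately to the front and back: rank 1 opens the context,
    # rank 2 closes it, and the lowest-ranked passages end up in the middle.
    front, back = chosen[0::2], chosen[1::2]
    return front + back[::-1]
```

Trimming happens in two places here: passages with no overlap are dropped outright, and the budget check leaves the rest of the window free for the answer.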
FAQ
Is a bigger context window always better?
No. It helps for genuinely large inputs, but it costs more, runs slower, and does not guarantee the model uses every detail well. Relevance beats raw size.
Does the context window remember previous chats?
Only within the current session and only while content stays in the window. For lasting memory across sessions you need a separate memory or retrieval feature.
How many words fit in a context window?
Roughly, divide the token limit by about 1.3 to estimate words. A 100,000-token window holds on the order of 75,000 words.
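The rule of thumb works in both directions. This tiny sketch assumes about 0.75 words per token, which is close to dividing by roughly 1.3; real counts vary with the language and the tokenizer.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb for English text; varies by language and tokenizer


def tokens_to_words(tokens: int) -> int:
    """Approximate how many English words fit in a given token count."""
    return round(tokens * WORDS_PER_TOKEN)


def words_to_tokens(words: int) -> int:
    """Approximate how many tokens a given word count needs, rounding up."""
    return math.ceil(words / WORDS_PER_TOKEN)


if __name__ == "__main__":
    print(tokens_to_words(100_000))  # 75000
    print(round(100_000 / 1.3))      # 76923, the divide-by-1.3 version
```

The two versions land within a few thousand words of each other, which is why the answer is only "on the order of" 75,000.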
Why does my long input get cut off?
You likely exceeded the window. The model drops the oldest or overflowing content, so shorten the input or use retrieval to feed only what matters.
Where to go next
See "What is a token in AI in 2026", "What is a large language model in 2026", and "What is RAG in 2026".