Context window

Also called: context length, context size

The context window is the maximum amount of text, measured in tokens, that a model can consider for a single request. It is a hard ceiling, and it is the main resource you manage when working with an agent.

James Phoenix
Understanding Data | Updated July 2, 2026

Every request to a model has a budget: the context window. It is the total number of tokens the model can look at in one go, covering both what you send in and what it writes back. Modern models advertise large windows, but "large" is not "unlimited," and treating it as unlimited is one of the most common ways agent sessions go wrong.
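Because input and output share the same window, the room left for a reply is simple subtraction. The sketch below makes that arithmetic explicit; the function name and the window size are hypothetical, chosen only for illustration, and real limits vary by model.

```python
def max_output_tokens(window: int, input_tokens: int) -> int:
    """Tokens left for the reply once the input is counted against the window."""
    if input_tokens > window:
        raise ValueError("input alone exceeds the context window")
    return window - input_tokens


# Hypothetical window size, not a figure for any particular model.
WINDOW = 200_000

print(max_output_tokens(WINDOW, 150_000))  # 50000
print(max_output_tokens(WINDOW, 199_000))  # 1000
```

The second call shows the practical risk: a request that technically fits can still leave almost no room for the answer.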

What lives in the window

For a coding agent, the window is shared by a lot of competing tenants:

  • The system prompt and tool definitions that set up the agent.
  • Your instructions and any project rules the agent loads.
  • File contents, command output, and search results the agent has pulled in.
  • The full back-and-forth of the conversation so far.

Every one of those takes space, and space is finite. When the window fills, something has to give, and that is where quality quietly degrades.
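One way to see the competition is to tally each tenant's share of the window. This is a minimal sketch under loud assumptions: `estimate_tokens` uses a rough four-characters-per-token guess rather than a real tokenizer, and the tenant names, sizes, and window are made up for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Rough guess at token count: about 4 characters per token (an assumption)."""
    return len(text) // 4


def window_usage(tenants: dict[str, str], window: int) -> dict[str, float]:
    """Return each tenant's estimated share of the window as a fraction."""
    return {name: estimate_tokens(text) / window for name, text in tenants.items()}


# Placeholder content standing in for real prompts, files, and chat history.
tenants = {
    "system_prompt_and_tools": "x" * 8_000,
    "instructions_and_rules": "x" * 4_000,
    "files_and_command_output": "x" * 60_000,
    "conversation": "x" * 28_000,
}

usage = window_usage(tenants, window=50_000)
for name, share in usage.items():
    print(f"{name}: {share:.0%}")
print(f"total: {sum(usage.values()):.0%}")
```

In this made-up session, pulled-in files and command output take more of the window than everything else combined, which is usually the first place to look when trimming.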

Why it is the resource you manage

Two failure modes follow directly from the ceiling:

  • Overflow. Push past the limit and the oldest or least-relevant content gets dropped or compacted. If the thing that got dropped was the instruction that actually mattered, the agent will confidently do the wrong thing.
  • Dilution. Even well within the limit, a window stuffed with marginally relevant text makes it harder for the model to attend to the few lines that count. More context is not automatically better context.
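The overflow case can be sketched as an oldest-first trim. This is an illustrative policy, not how any particular agent behaves: the `Message` type, the `pinned` flag, and the four-characters-per-token estimate are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str
    pinned: bool = False  # pinned messages are never dropped by the trim


def estimate_tokens(text: str) -> int:
    """Rough guess at token count: about 4 characters per token (an assumption)."""
    return len(text) // 4


def fit_to_budget(messages: list[Message], budget: int) -> list[Message]:
    """Drop the oldest unpinned messages until the estimated total fits.

    If the pinned messages alone exceed the budget, they are still returned.
    """
    kept = list(messages)
    total = sum(estimate_tokens(m.text) for m in kept)
    i = 0
    while total > budget and i < len(kept):
        if kept[i].pinned:
            i += 1  # skip pinned content and look at the next-oldest message
            continue
        total -= estimate_tokens(kept[i].text)
        del kept[i]
    return kept


history = [
    Message("system", "s" * 400, pinned=True),  # about 100 tokens
    Message("user", "r" * 40),                  # an early rule, about 10 tokens
    Message("tool", "o" * 800),                 # command output, about 200 tokens
    Message("user", "q" * 200),                 # latest question, about 50 tokens
]

# The early rule is the oldest unpinned message, so it is the first to go.
print([m.role for m in fit_to_budget(history, budget=355)])

# Pinning the rule protects it; the bulky tool output is dropped instead.
history[1].pinned = True
print([m.role for m in fit_to_budget(history, budget=355)])
```

The first call silently loses the early rule while keeping 200 tokens of command output, which is the "confidently do the wrong thing" scenario in miniature.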

The craft of working with an agent is largely the craft of curating this window: giving it the files that matter, clearing out what is done, and pointing it at sources instead of pasting everything in.

Tip
If an agent starts forgetting an instruction you gave it earlier, the window is usually the culprit. The instruction has been pushed out or buried. Restate it, start a fresh session, or trim what the agent is carrying, rather than repeating yourself louder.

Keeping this window clean is a whole discipline, which is why the context-engineering section of this dictionary exists: memory systems, progressive disclosure, and handoffs are all techniques for getting maximum value from a fixed number of tokens.
