An attention budget is a way of thinking about attention as a finite resource. The model's ability to focus is spread across everything in the context at once, so the more you put in, the thinner that focus gets on any single piece.
Finite focus
The metaphor is not literal accounting, but it captures something real. Cramming a giant context window full does not give you a model that pays full attention to all of it. It gives you attention divided across more material. Fill the window with ten files when the task needs two, and the model is now spending focus on eight files that do not help, at the expense of the two that do.
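To make the dilution concrete, here is a toy sketch, not a description of how any real model allocates attention. It assumes a single softmax over per-file scores, and the scores themselves (2.0 for a relevant file, 0.0 for an irrelevant one) are invented for illustration. Because softmax weights always sum to 1, every extra item takes its share from the rest.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weight_on_relevant(n_relevant, n_irrelevant,
                       relevant_score=2.0, irrelevant_score=0.0):
    """Share of the total weight that lands on the relevant items.

    The scores are hypothetical; they only encode "relevant items
    score higher than irrelevant ones".
    """
    scores = [relevant_score] * n_relevant + [irrelevant_score] * n_irrelevant
    weights = softmax(scores)
    return sum(weights[:n_relevant])

two_files = weight_on_relevant(2, 0)   # only the two files the task needs
ten_files = weight_on_relevant(2, 8)   # the same two plus eight extras

print(f"{two_files:.2f}")  # 1.00
print(f"{ten_files:.2f}")  # 0.65
```

In this toy setup the eight extra files each score far lower than the relevant ones, yet together they still pull about a third of the weight away from the two files that matter.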
Where the budget gets wasted:
- Files, logs, or docs that are not relevant to the current task.
- Finished conversation history nobody needs anymore.
- Long, repetitive tool output that could be trimmed to the point.
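The last item is the easiest to act on mechanically. The following is a minimal sketch of one way to trim tool output before it enters the context. The function name `trim_tool_output`, the `max_lines` parameter, and the bracketed marker text are all hypothetical choices made for this example, not part of any particular agent framework.

```python
def trim_tool_output(text, max_lines=20):
    """Collapse consecutive repeated lines, then keep only the head and tail."""
    if max_lines < 2:
        raise ValueError("max_lines must be at least 2")

    collapsed = []
    previous = None
    repeats = 0
    for line in text.splitlines():
        if line == previous:
            repeats += 1  # same line again: count it instead of keeping it
            continue
        if repeats:
            collapsed.append(f"[previous line repeated {repeats} more times]")
        collapsed.append(line)
        previous = line
        repeats = 0
    if repeats:
        collapsed.append(f"[previous line repeated {repeats} more times]")

    if len(collapsed) <= max_lines:
        return "\n".join(collapsed)

    # Still too long: keep the start and the end, note what was dropped.
    head = max_lines // 2
    tail = max_lines - head
    omitted = len(collapsed) - max_lines
    kept = collapsed[:head] + [f"[{omitted} lines omitted]"] + collapsed[-tail:]
    return "\n".join(kept)

noisy = "start\n" + "retrying...\n" * 50 + "error: timeout\n"
print(trim_tool_output(noisy))
```

Keeping both the head and the tail is a judgment call: command output often states what ran at the top and what failed at the bottom. A real pipeline might instead keep the lines that match the task.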
Spend it deliberately
This is why bigger is not automatically better. A model with plenty of room left in its window can still perform worse than one handed a tight, relevant slice, because a bloated context spreads the budget thin and slides toward attention degradation.
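One way to spend the budget deliberately is to select context by relevance and stop there, even when room remains. The sketch below rests on loud assumptions: `estimate_tokens` counts whitespace-separated words as a rough stand-in for a real tokenizer, `relevance` is a naive word-overlap score, and the file names and contents are made up.

```python
def estimate_tokens(text):
    # Assumption: word count as a crude proxy. Real tokenizers differ.
    return len(text.split())

def relevance(task, text):
    # Assumption: naive overlap between task words and document words.
    task_words = set(task.lower().split())
    doc_words = set(text.lower().split())
    return len(task_words & doc_words)

def select_context(task, documents, token_budget, min_relevance=1):
    """Pick the most relevant documents that fit, and nothing else.

    Leftover budget is left unspent on purpose: irrelevant documents
    are skipped even when they would fit.
    """
    ranked = sorted(
        documents.items(),
        key=lambda item: relevance(task, item[1]),
        reverse=True,
    )
    chosen, used = [], 0
    for name, text in ranked:
        if relevance(task, text) < min_relevance:
            break  # ranked by relevance, so everything after is irrelevant too
        cost = estimate_tokens(text)
        if used + cost > token_budget:
            continue  # too large for what is left; try a smaller document
        chosen.append(name)
        used += cost
    return chosen

# Hypothetical files for illustration only.
documents = {
    "auth.py": "def login user password check token",
    "billing.py": "invoice total tax currency",
    "README": "project overview install steps",
}
print(select_context("fix login token check", documents, token_budget=1000))
# ['auth.py']
```

The budget here is large enough for all three files, but only the one that overlaps with the task is selected.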
Related terms
Attention
Attention is the mechanism a model uses to weigh how strongly each token in its context relates to the others when predicting the next one. It is the basis of how a model actually uses context.
Context window
The context window is the maximum amount of text, measured in tokens, that a model can consider for a single request. It is a hard ceiling, and it is the main resource you manage when working with an agent.
Attention degradation
Attention degradation is the quality drop a model shows as its context grows: recall weakens and it misses or confuses buried details, often well below the hard token limit. It is also called context rot.