Attention degradation is the quality drop you get as a context fills up. Long before you hit the hard token limit, a model's recall weakens: it starts missing details, confusing similar things, and forgetting instructions you gave earlier. People also call it context rot.
Room in the window is not the same as focus
It is tempting to treat the context window as a container: if it fits, you are fine. In practice the model gets less reliable well before the container is full, because a fuller context stretches its attention thinner. Common symptoms:
- It forgets a constraint you stated near the top of a long session.
- It re-introduces a bug you already fixed, having lost track of the change.
- It answers using the wrong file when several similar ones are loaded.
- It misses a fact buried in the middle of the context, the classic "lost in the middle" effect.
Fight it by keeping context lean
The countermeasures all come down to respecting the attention budget: clear out finished work, compact the history or hand off to a fresh session before the window bloats, and keep only the files that matter loaded. A fresh, focused context often outperforms a long, cluttered one on the very same problem.
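As a rough illustration of those countermeasures, here is a minimal Python sketch of a context-trimming pass. Everything in it is hypothetical: the `Item` type, its `pinned` and `finished` flags, the `lean_context` helper, and `estimate_tokens`, a word-count stand-in for a real tokenizer. It clears finished work, always keeps pinned constraints, and then fills the remaining budget with the most recent items.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    pinned: bool = False    # hypothetical flag: standing instructions that must survive
    finished: bool = False  # hypothetical flag: completed work that can be cleared


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly one token per whitespace-separated word.
    return len(text.split())


def lean_context(items: list[Item], budget: int) -> list[Item]:
    """Hypothetical helper: drop finished items, keep pinned ones, then keep
    the newest unpinned items that still fit inside the token budget."""
    live = [it for it in items if not it.finished]
    # Pinned items are always kept, even if they alone exceed the budget.
    keep = {i for i, it in enumerate(live) if it.pinned}
    used = sum(estimate_tokens(live[i].text) for i in keep)
    # Walk newest-first so the latest turns win when space runs out.
    for i in range(len(live) - 1, -1, -1):
        if live[i].pinned:
            continue
        cost = estimate_tokens(live[i].text)
        if used + cost > budget:
            break  # stop here so the kept history is a contiguous recent tail
        keep.add(i)
        used += cost
    # Return survivors in their original order.
    return [it for i, it in enumerate(live) if i in keep]
```

The loop stops at the first item that does not fit instead of skipping ahead to smaller, older ones, so what survives is the pinned items plus a contiguous recent tail. A real compaction step would usually summarize the dropped items rather than discard them outright.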
Related terms
Attention
Attention is the mechanism a model uses to weigh how strongly each token in its context relates to the others when predicting the next one. It is the basis of how a model actually uses context.
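For a concrete picture, the sketch below computes single-query scaled dot-product attention, the form used in standard Transformers, over plain Python lists. It is deliberately stripped down: real models add learned query/key/value projections, many heads, and causal masking, none of which appear here, and `attend` is an illustrative name rather than any library's API.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: list[float], keys: list[list[float]], values: list[list[float]]):
    """Single-query scaled dot-product attention (illustrative sketch)."""
    d = len(query)
    # Score each token: dot product of the query with its key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Softmax turns scores into weights that sum to 1: how strongly each token counts.
    weights = softmax(scores)
    # Output is the weight-averaged mix of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, output
```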
Attention budget
The attention budget is the idea that a model's effective attention is a finite resource spread across the whole context window. The more you put in, the thinner the attention on each piece.
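One way to see the intuition, assuming a single softmax attention head with fixed scores (a toy model, not a measurement of any real system): softmax weights always sum to one, so adding tokens can only take weight away from the ones already there. With one relevant token scored $s$ and $n$ distractors scored $0$:

```latex
\alpha_i = \frac{e^{s_i}}{\sum_{j=1}^{N} e^{s_j}}, \qquad \sum_{i=1}^{N} \alpha_i = 1

\alpha_{\text{rel}} = \frac{e^{s}}{e^{s} + n}, \qquad
s = 3:\quad \alpha_{\text{rel}} \approx 0.67 \ (n = 10), \qquad \alpha_{\text{rel}} \approx 0.020 \ (n = 1000)
```

Real models learn scores that can stay sharply peaked, so this shows why the budget metaphor is plausible rather than how much any given model actually degrades.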
Lost in the middle
Lost in the middle is the well-known tendency for models to attend best to the start and end of a long context and to miss information buried in the middle.
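Taking that tendency at face value, one common mitigation when assembling a long prompt is to put the most important material at the edges and let the least important drift to the middle. A minimal sketch, assuming the chunks already arrive ranked most-relevant first (for example from a retriever); `edge_order` is a hypothetical helper, not a library function:

```python
def edge_order(chunks_by_relevance: list[str]) -> list[str]:
    """Hypothetical helper: input is sorted most-relevant first; output places
    the strongest chunks at the start and end, the weakest in the middle."""
    front: list[str] = []
    back: list[str] = []
    for i, chunk in enumerate(chunks_by_relevance):
        # Alternate: ranks 1, 3, 5, ... fill the front; ranks 2, 4, ... fill the back.
        if i % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    # Reverse the back half so the second-best chunk ends up last.
    return front + back[::-1]
```

For chunks ranked A (most relevant) through E, this yields A, C, E, D, B: the two best chunks sit at the start and end, and the weakest lands in the middle. It reduces exposure to the effect; it does not remove it.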