Lost in the middle describes a well-documented quirk: give a model a long context window and it pays closest attention to the beginning and the end, while information stuck in the middle is the most likely to be overlooked. Put the same fact at the top or the bottom and the model uses it. Bury it halfway down a huge prompt and it can behave as if the fact were never there.
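One way to see whether a particular model shows this effect is to place the same fact at different depths of an otherwise identical prompt and compare the answers. The sketch below only builds those probe prompts; it does not call any model. The helper name `build_probe`, the filler text, and the sample fact are all hypothetical and chosen for illustration.

```python
def build_probe(filler: list[str], fact: str, depth: float, question: str) -> str:
    """Insert `fact` among filler paragraphs at a relative depth.

    depth = 0.0 puts the fact at the top, 1.0 at the bottom,
    and 0.5 roughly in the middle. The question always comes last.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    index = round(depth * len(filler))
    paragraphs = filler[:index] + [fact] + filler[index:]
    return "\n\n".join(paragraphs + [question])


# Hypothetical filler and fact, for illustration only.
filler = [f"Filler paragraph {i}." for i in range(10)]
fact = "The deploy password is 'heron'."
question = "What is the deploy password?"

# One prompt per position: top, middle, bottom.
probes = {d: build_probe(filler, fact, d, question) for d in (0.0, 0.5, 1.0)}
```

Sending each prompt to the model under test and checking which answers recover the fact gives a rough picture of how position affects recall for that model and context length.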
Why it happens
It comes down to how attention is distributed. The model does not read like a person working top to bottom. It weighs every token against every other, and in practice the tokens near the start and end of a long sequence tend to win that competition. As the context grows, the middle gets a smaller share of attention, which is one concrete form of the broader problem of attention degradation: the more text goes in, the less reliably any single piece is recalled.
What to do about it
You can design around it:
- Put the important stuff at the edges. Lead with the key instruction or spec, and restate the critical constraint near the end, right before you ask for the work.
- Do not pad. Every irrelevant page you add pushes real information toward the neglected middle.
- Keep it short enough to matter. A tight context has no bad middle to get lost in.
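The layout described in the list above can be sketched as a small prompt-assembly helper: the key instruction goes first, reference material sits in the middle, and the critical constraint is restated right before the task. The function name `assemble_prompt`, its parameters, and the "Reminder:" wording are assumptions made for illustration, not a standard API.

```python
def assemble_prompt(
    instruction: str,
    documents: list[str],
    constraint: str,
    task: str,
) -> str:
    """Build a prompt with the important parts at the edges.

    Order: key instruction, reference documents, restated constraint, task.
    Empty documents are dropped so they do not pad the middle.
    """
    parts = [instruction.strip()]
    parts.extend(doc.strip() for doc in documents if doc.strip())
    parts.append(f"Reminder: {constraint.strip()}")
    parts.append(task.strip())
    return "\n\n".join(parts)


# Hypothetical inputs, for illustration only.
prompt = assemble_prompt(
    instruction="Follow the API spec below exactly.",
    documents=["Spec section A.", "   ", "Spec section B."],
    constraint="Do not change any public function signatures.",
    task="Now refactor the module.",
)
print(prompt)
```

The helper only controls ordering and drops empty entries. Deciding which documents are relevant enough to include at all is still up to the caller, and that choice matters more than the layout.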
Related terms
Context window
The context window is the maximum amount of text, measured in tokens, that a model can consider for a single request. It is a hard ceiling, and it is the main resource you manage when working with an agent.
Attention degradation
Attention degradation is the quality drop a model shows as its context grows: recall weakens and it misses or confuses buried details, often well below the hard token limit. It is also called context rot.
Attention
Attention is the mechanism a model uses to weigh how strongly each token in its context relates to the others when predicting the next one. It is the basis of how a model actually uses context.