Give a model a long context and it does not attend to all of it evenly. It tends to use what sits at the beginning and the end well, and to overlook what is buried in the middle. The effect was documented in the 2023 paper Lost in the Middle: How Language Models Use Long Contexts, and the name stuck because it describes exactly what you see in practice: the right answer is in the context, but the model reads past it.
Why it bites
It is the reason a bigger context window is not a free win. You can fit a hundred documents in, but if the one that matters lands in the middle of the pile, the model may give it too little weight to use it. "Just put everything in the prompt" fails silently: nothing errors, and the answer looks confident while being simply wrong.
What to do about it
- Retrieve, do not dump. Pull in the few passages that matter with RAG instead of stuffing the whole corpus in and hoping.
- Mind the placement. If you must include a lot, put the most important material at the start or the end, not lost in the middle.
- Keep it short. A tighter context leaves less of a middle to neglect. With fewer tokens competing for attention, more of it lands on what counts.
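The first two points can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes you already have passages with relevance scores from some retriever, and the function names `order_for_edges` and `select_and_order` are hypothetical. It keeps only the top `k` passages, then interleaves them so the strongest sit at the start and the end and the weakest fall in the middle.

```python
def order_for_edges(ranked):
    """Reorder passages that are sorted most-relevant-first.

    Ranks 1, 3, 5, ... fill the front; ranks 2, 4, 6, ... fill the back
    in reverse, so the least relevant passages end up in the middle.
    """
    front, back = [], []
    for i, passage in enumerate(ranked):
        if i % 2 == 0:
            front.append(passage)
        else:
            back.append(passage)
    return front + back[::-1]


def select_and_order(scored, k):
    """Keep the k highest-scoring passages, then place them for the edges.

    `scored` is a list of (score, text) pairs. The scores are assumed to
    come from whatever retriever you use; higher means more relevant.
    """
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    return order_for_edges([text for _, text in top])


if __name__ == "__main__":
    scored = [(0.2, "e"), (0.9, "a"), (0.5, "c"), (0.7, "b"), (0.4, "d"), (0.1, "f")]
    print(select_and_order(scored, k=5))  # ['a', 'c', 'e', 'd', 'b']
```

Here the best passage opens the context, the second best closes it, and the one dropped entirely is the lowest scorer. Whether this ordering helps for a given model is something to measure rather than assume.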
Related terms
Context engineering
Context engineering is the discipline of deciding what a model sees. Since a model can only work from the text in front of it, the quality of any answer is capped by the quality of the context you assemble.
Retrieval-augmented generation (RAG)
RAG is the workhorse pattern of context engineering: retrieve the material relevant to a request, put it in the context, and let the model generate an answer grounded in it rather than guessing from memory.
Chunking
Chunking is splitting a long document into smaller pieces before you embed and retrieve them. The size and overlap of the chunks decide what can be found as a unit, so it quietly makes or breaks a retrieval system.
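To make the size and overlap trade-off in chunking concrete, here is a small Python sketch. It assumes whitespace-separated words are a good enough unit; real systems often split on tokens or sentences instead. The function name `chunk_words` and its default values are illustrative, not taken from any library.

```python
def chunk_words(text, size=200, overlap=40):
    """Split text into chunks of up to `size` words.

    Consecutive chunks share `overlap` words, so a sentence that straddles
    a boundary still appears whole in at least one chunk when it is
    shorter than the overlap.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("one two three four five six seven", size=4, overlap=2):
        print(chunk)
```

Larger chunks keep more surrounding context together but make each retrieved unit longer; a larger overlap reduces the chance of cutting an idea in half at the cost of storing and embedding more duplicated text.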