Attention

Attention is how a model decides which parts of the text in front of it matter for what it is about to say next. For each token it produces, it weighs how strongly every other token in the context relates to the current one, then leans on the relevant ones.

How to picture it

Imagine the model reading your whole context window at once and, at every step, asking "which of these earlier tokens should influence this next one most?" A function name it needs to match, a constraint you stated three paragraphs up, the variable it just declared: attention is what lets a token here connect to the one that matters over there. It is the reason a model can keep a long piece of code coherent instead of treating each line in isolation.

A useful mental model:

Attention is about the relationships between tokens, not just their order.
Every token can, in principle, attend to every other token in the context.
The strength of those connections is learned during training, not hand-written.

Why it matters for coding

You do not tune attention directly, but nearly every context habit is really about helping it. Attention is not infinite: it spreads across everything in the window, which is the idea behind an attention budget, and it grows less reliable as the context fills up, which is attention degradation.

Note

There is no dial labelled "attention." It is an internal part of how the model works. It is worth understanding because it explains why a shorter, sharper context so often beats a giant one: less to spread across means stronger focus on what counts.

How to picture it

Why it matters for coding

Related terms

Context window

Attention budget

Attention degradation

Building with AI agents?