Knowledge & failure modes

Attention

Also called: attention mechanism

Attention is the mechanism a model uses to weigh how strongly each token in its context relates to the others when predicting the next one. It is the basis of how a model actually uses context.

James Phoenix
Understanding Data Updated July 2, 2026

Attention is how a model decides which parts of the text in front of it matter for what it is about to say next. For each token it produces, it weighs how strongly every other token in the context relates to the current one, then leans on the relevant ones.

How to picture it

Imagine the model reading your whole context window at once and, at every step, asking "which of these earlier tokens should influence this next one most?" A function name it needs to match, a constraint you stated three paragraphs up, the variable it just declared: attention is what lets a token here connect to the one that matters over there. It is the reason a model can keep a long piece of code coherent instead of treating each line in isolation.

A useful mental model:

  • Attention is about the relationships between tokens, not just their order.
  • Every token can, in principle, attend to every other token in the context.
  • The strength of those connections is learned during training, not hand-written.

Why it matters for coding

You do not tune attention directly, but nearly every context habit is really about helping it. Attention is not infinite: it spreads across everything in the window, which is the idea behind an attention budget, and it grows less reliable as the context fills up, which is attention degradation.

Note
There is no dial labelled "attention." It is an internal part of how the model works. It is worth understanding because it explains why a shorter, sharper context so often beats a giant one: less to spread across means stronger focus on what counts.

Related terms

Building with AI agents?

This dictionary is part of how I think about agentic engineering. If you want the same thinking applied to your codebase, that is what I do.

See how I can help