
Next-token prediction

Next-token prediction is the one job a language model does: given the text so far, predict a probability for every possible next token, pick one, append it, and repeat. It is both the training objective and the process that runs at inference.

James Phoenix
Understanding Data Updated July 2, 2026

Underneath all the apparent intelligence, a model does exactly one thing: next-token prediction. You give it a sequence of tokens, and it produces a probability for every token in its vocabulary being the next one. Pick one, append it, feed the whole sequence back in, and predict again. A paragraph of prose or a full function of code is just that loop run hundreds of times.
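The loop can be sketched in a few lines of Python. This is a minimal sketch under loud assumptions: `TOY_PROBS`, `predict_next` and `generate` are hypothetical names, and the "model" is a hand-written lookup table keyed on the last token. It stands in for a real network that conditions on the whole sequence and scores a vocabulary of many thousands of tokens.

```python
import random

# Hypothetical toy "model": for each last token, a probability distribution
# over what comes next. "<start>" and "<end>" are made-up marker tokens.
TOY_PROBS = {
    "<start>": {"def": 0.9, "print": 0.1},
    "def": {"add": 0.7, "main": 0.3},
    "add": {"(": 1.0},
    "main": {"(": 1.0},
    "print": {"(": 1.0},
    "(": {"a": 0.6, ")": 0.4},
    "a": {",": 1.0},
    ",": {"b": 1.0},
    "b": {")": 1.0},
    ")": {":": 1.0},
    ":": {"<end>": 1.0},
}

def predict_next(tokens):
    """Return next-token probabilities. A real model reads the whole
    sequence; this toy only looks at the last token."""
    return TOY_PROBS[tokens[-1]]

def generate(prompt, max_new_tokens=20, greedy=True, seed=0):
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = predict_next(tokens)              # 1. predict
        if greedy:
            nxt = max(probs, key=probs.get)       # 2. pick the most likely token...
        else:
            nxt = rng.choices(list(probs), weights=list(probs.values()))[0]  # ...or sample one
        if nxt == "<end>":
            break
        tokens.append(nxt)                        # 3. append it, then loop again
    return tokens

print(" ".join(generate(["<start>"])[1:]))  # def add ( a , b ) :
```

Greedy decoding always takes the most likely token, so the same prompt gives the same output every time. Passing `greedy=False` samples in proportion to the probabilities instead, which is why the same prompt can produce different completions.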

One objective, start to finish

This is not a metaphor for how models work; it is the literal mechanism. During training the model is graded on a single task: predict the next token in real text and code, over and over, until it gets good at it. At inference it runs the same task. There is no separate "reasoning module" that gets switched on. The reasoning you see is what emerges from a system that got extremely good at predicting what comes next.
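During pretraining, that grade is usually cross-entropy: the model is penalized by the negative log of the probability it assigned to the token that actually came next, averaged over positions. A minimal sketch, assuming you already have those probabilities (the helper name `next_token_loss` and the numbers are made up for illustration):

```python
import math

def next_token_loss(probs_of_actual_next):
    """Average negative log-likelihood (cross-entropy) over a sequence.

    probs_of_actual_next[t] is the probability the model assigned to the
    token that really came next at position t.
    """
    return -sum(math.log(p) for p in probs_of_actual_next) / len(probs_of_actual_next)

# Made-up probabilities for a three-token stretch of real text.
confident = next_token_loss([0.9, 0.8, 0.95])
unsure = next_token_loss([0.3, 0.2, 0.25])
print(round(confident, 2), round(unsure, 2))  # 0.13 1.4
```

A perfect prediction (probability 1.0) costs nothing, and the cost grows quickly as the probability on the real next token falls. Training adjusts the model's weights to push this number down across huge amounts of text.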

Why this shapes how you use it

Two practical truths fall straight out of this:

  • It completes, it does not consult. The model is continuing a pattern, not looking anything up. That is why it is fluent, and also why it can produce a confident, well-formed answer that is wrong.
  • What comes before steers everything. Since each token is conditioned on the text so far, the context you provide is not a hint; it is the input the prediction is computed from. Better context, better completion.
Tip
When a model goes off the rails, think of it as pattern-matching the wrong continuation. Often the fix is to change what precedes it, not to argue with the output.
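To see why the preceding text matters so much, here is a deliberately crude stand-in: a count-based n-gram model built from three made-up lines of code. The corpus and the helpers `build_model` and `next_token_probs` are hypothetical, and a real language model conditions on its whole context window rather than a fixed handful of tokens, but the effect shows up in miniature.

```python
from collections import Counter, defaultdict

# Hypothetical tiny "training corpus", split on whitespace into tokens.
CORPUS = [
    "import numpy as np",
    "import pandas as pd",
    "import numpy as np",
]

def build_model(lines, context_size):
    """Count which token follows each run of `context_size` preceding tokens."""
    counts = defaultdict(Counter)
    for line in lines:
        tokens = line.split()
        for i in range(context_size, len(tokens)):
            context = tuple(tokens[i - context_size:i])
            counts[context][tokens[i]] += 1
    return counts

def next_token_probs(counts, context):
    """Turn the counts for one context into next-token probabilities."""
    seen = counts.get(tuple(context), Counter())
    total = sum(seen.values())
    if total == 0:
        return {}  # context never seen in the corpus: no prediction
    return {tok: n / total for tok, n in seen.items()}

short = build_model(CORPUS, context_size=1)
longer = build_model(CORPUS, context_size=2)

print(next_token_probs(short, ["as"]))             # {'np': 0.6666666666666666, 'pd': 0.3333333333333333}
print(next_token_probs(longer, ["pandas", "as"]))  # {'pd': 1.0}
```

Seeing only `as`, the toy model splits its bet between `np` and `pd`. Give it one more token of context and the ambiguity disappears: after `pandas as` it predicts `pd` with probability 1.0. Changing what precedes the prediction changed the prediction; nothing about the model itself moved.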
