Next-token prediction

Underneath all the apparent intelligence, a language model does exactly one thing: next-token prediction. You give it a sequence of tokens, and it produces a probability distribution over what the next token could be. Pick one token from that distribution, append it, feed the whole sequence back in, and predict again. A full paragraph of prose or a block of code is just that loop run hundreds of times.
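The loop described above can be sketched in a few lines of Python. This is a toy illustration, not how a production model is built: the "model" here is a hypothetical bigram counter that only looks at the last token, and the names `train_bigram`, `next_token_probs`, and `generate` are made up for this example. It also always picks the most likely token (greedy selection), whereas real systems often sample from the distribution.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count which token follows which. A toy stand-in for a real model."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, context):
    """Return a probability distribution over the next token.

    Assumption: this toy model conditions only on the last token,
    while a real model conditions on the whole context.
    """
    followers = counts[context[-1]]
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}


def generate(counts, prompt, max_new_tokens):
    """Predict, pick, append, repeat."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(counts, tokens)
        if not probs:  # the toy model has never seen this token followed by anything
            break
        best = max(probs, key=probs.get)  # greedy: take the most likely token
        tokens.append(best)               # append it and feed everything back in
    return tokens


corpus = "the cat sat on the mat and the cat sat on the rug".split()
model = train_bigram(corpus)
print(next_token_probs(model, ["the"]))
print(generate(model, ["the"], 4))
```

The structure is the point: `generate` never plans a sentence, it only asks for the next token, one step at a time.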

One objective, start to finish

This is not a metaphor for how models work; it is the literal mechanism. During training the model is graded on a single task: predict the next token in real text and code, over and over, until it gets good at it. At inference it runs the same task. There is no separate "reasoning module" that gets switched on. The reasoning you see is what emerges from a system that got extremely good at predicting what comes next.
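To make "graded on a single task" concrete, here is a minimal sketch of how a single prediction could be scored. It assumes the grading uses cross-entropy (the negative log of the probability assigned to the token that actually came next), which is the loss commonly used for this objective; the source text does not name a specific loss. The function name and the example probabilities are invented for illustration.

```python
import math


def next_token_loss(predicted_probs, actual_next):
    """Cross-entropy for one prediction: -log(probability given to the true token).

    Lower is better; the loss is 0 only when the model put all
    of its probability on the token that actually came next.
    """
    return -math.log(predicted_probs[actual_next])


# Hypothetical predictions for the token after "the cat".
confident_right = {"sat": 0.9, "ran": 0.1}
confident_wrong = {"sat": 0.1, "ran": 0.9}

# In the real text, the next token was "sat".
print(next_token_loss(confident_right, "sat"))  # small loss
print(next_token_loss(confident_wrong, "sat"))  # large loss
```

Training repeats this scoring over enormous amounts of text and adjusts the model to lower the average loss. Inference skips the scoring and simply uses the predictions.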

Why this shapes how you use it

Two practical truths fall straight out of this:

It completes, it does not consult. The model is continuing a pattern, not looking anything up. That is why it is fluent, and also why it can produce a confident, well-formed answer that is wrong.
What comes before steers everything. Since each token is conditioned on the text so far, the context you provide is not a hint; it is the input that determines the prediction. Better context, better completion.
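The second point can be seen even in a toy model. The sketch below is a hypothetical n-gram counter (the names `train` and `predict` are assumptions for this example) that conditions on up to the last two tokens. With only the ambiguous token "bank" as context, the prediction is split; one extra token of context settles it.

```python
from collections import Counter, defaultdict


def train(tokens, order=2):
    """Count next tokens after every context of 1..order preceding tokens."""
    counts = defaultdict(Counter)
    for i in range(1, len(tokens)):
        for n in range(1, order + 1):
            if i - n >= 0:
                counts[tuple(tokens[i - n:i])][tokens[i]] += 1
    return counts


def predict(counts, context, order=2):
    """Predict from the longest suffix of the context the toy model has seen."""
    for n in range(order, 0, -1):
        key = tuple(context[-n:])
        if len(key) == n and key in counts:
            followers = counts[key]
            total = sum(followers.values())
            return {tok: c / total for tok, c in followers.items()}
    return {}  # nothing in the context was ever seen during training


corpus = "the river bank flooded . the savings bank closed .".split()
model = train(corpus)

print(predict(model, ["bank"]))             # split between "flooded" and "closed"
print(predict(model, ["river", "bank"]))    # all probability on "flooded"
print(predict(model, ["savings", "bank"]))  # all probability on "closed"
```

Nothing about the model changed between the three calls; only the preceding text did.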

Tip

When a model goes off the rails, think of it as pattern-matching the wrong continuation. Often the fix is to change the context that precedes the output, not to argue with the output.

Related terms

Token

Inference

Model
