Underneath all the apparent intelligence, a language model does exactly one thing: next-token prediction. You give it a sequence of tokens, and it produces a probability distribution over what the next token should be. Pick one, append it, feed the whole thing back in, and predict again. A full paragraph of prose, or a whole function of code, is just that loop run hundreds of times.
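Here is that loop as a minimal Python sketch. `TOY_MODEL` is a hypothetical stand-in for a real model, with made-up probabilities, and it only looks at the last token where a real model conditions on the whole context. Always picking the most probable token (greedy decoding) is just one choice; many tools sample from the distribution instead.

```python
# Hypothetical stand-in for a model: for each previous token, a made-up
# probability distribution over the next token.
TOY_MODEL: dict[str, dict[str, float]] = {
    "<start>": {"def": 0.9, "print": 0.1},
    "def": {" add": 0.6, " main": 0.4},
    " add": {"(a": 0.8, "()": 0.2},
    "(a": {", b": 0.7, ")": 0.3},
    ", b": {"):": 1.0},
    "):": {"<end>": 1.0},
}

def predict_next(context: list[str]) -> dict[str, float]:
    """Return a distribution over the next token.
    This toy only looks at the last token; unknown tokens just end the text."""
    return TOY_MODEL.get(context[-1], {"<end>": 1.0})

def generate(max_tokens: int = 20) -> str:
    context = ["<start>"]
    for _ in range(max_tokens):
        dist = predict_next(context)      # 1. predict a distribution
        token = max(dist, key=dist.get)   # 2. pick one (greedy: most probable)
        if token == "<end>":
            break
        context.append(token)             # 3. append and feed it all back in
    return "".join(context[1:])

print(generate())  # def add(a, b):
```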
One objective, start to finish
This is not a metaphor for how models work; it is the literal mechanism. During training, the model is graded on a single task: predict the next token in real text and code, over and over, until it gets good at it. At inference time it runs the same prediction, with its parameters no longer changing. There is no separate "reasoning module" that gets switched on. The reasoning you see is what emerges from a system that got extremely good at predicting what comes next.
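In practice that grade is a cross-entropy loss: the average negative log of the probability the model gave to the token that actually came next. The minimal sketch below uses two hypothetical predictors (`uniform` and `better`, both invented for illustration) to show that putting more probability on the real next token earns a lower loss. Training adjusts the parameters to push this number down; the sketch only computes it.

```python
import math
from typing import Callable

Predictor = Callable[[list[str]], dict[str, float]]

def next_token_loss(tokens: list[str], predict: Predictor) -> float:
    """Average of -log P(actual next token | tokens so far) over every position."""
    losses = []
    for i in range(len(tokens) - 1):
        dist = predict(tokens[: i + 1])          # the model sees only the prefix
        p = dist.get(tokens[i + 1], 0.0)         # probability given to the real next token
        losses.append(-math.log(max(p, 1e-12)))  # clamp so p == 0 does not crash log
    return sum(losses) / len(losses)

VOCAB = ["the", "cat", "sat", "ran"]

# Hypothetical predictor 1: knows nothing, spreads probability evenly.
def uniform(prefix: list[str]) -> dict[str, float]:
    return {tok: 1 / len(VOCAB) for tok in VOCAB}

# Hypothetical predictor 2: a hand-made table keyed on the last token.
TABLE = {"the": {"cat": 0.9, "sat": 0.05, "ran": 0.05},
         "cat": {"sat": 0.9, "ran": 0.1}}

def better(prefix: list[str]) -> dict[str, float]:
    return TABLE.get(prefix[-1], {})

sentence = ["the", "cat", "sat"]
print(f"{next_token_loss(sentence, uniform):.3f}")  # 1.386 (= ln 4)
print(f"{next_token_loss(sentence, better):.3f}")   # 0.105 (= -ln 0.9)
```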
Why this shapes how you use it
Two practical truths fall straight out of this:
- It completes, it does not consult. The model is continuing a pattern, not looking anything up. That is why it is fluent, and also why it can produce a confident, well-formed answer that is wrong.
- What comes before steers everything. Since each token is conditioned on the text so far, the context you provide is not a hint; it is the input the prediction is computed from. Better context, better completion.
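Both points show up even in a tiny counting model. In this minimal sketch, a made-up corpus of three import lines and a count-based predictor stand in for a real model and its training data, and the predictor sees only the last one or two tokens, where a real model sees its whole context window. With only `as` to go on, it favours `np` after `import pandas as`: fluent, confident, and wrong. With one more token of context, the prediction flips to `pd`.

```python
from collections import Counter, defaultdict

# A made-up corpus of three import lines, already split into tokens.
CORPUS = "import numpy as np . import pandas as pd . import numpy as np".split()

def build_counts(tokens: list[str], n_context: int) -> dict[tuple[str, ...], Counter]:
    """Count which token follows each run of n_context tokens."""
    counts: dict[tuple[str, ...], Counter] = defaultdict(Counter)
    for i in range(n_context, len(tokens)):
        counts[tuple(tokens[i - n_context:i])][tokens[i]] += 1
    return counts

# One counting "model" per context length.
MODELS = {n: build_counts(CORPUS, n) for n in (1, 2)}

def predict_next(context: str, n_context: int) -> dict[str, float]:
    """Distribution over the next token, conditioned on the last n_context tokens only."""
    key = tuple(context.split()[-n_context:])
    options = MODELS[n_context].get(key, Counter())
    total = sum(options.values())
    return {tok: n / total for tok, n in options.items()}

print(predict_next("import pandas as", 1))  # np is more likely than pd
print(predict_next("import pandas as", 2))  # {'pd': 1.0}
```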
Related terms
Token
A token is the unit of text a model reads and writes: a chunk that is usually part of a word, not a whole word or a single character. Everything is measured in tokens, including your context window and your bill.
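To see how text breaks into sub-word chunks, here is a greedy longest-match tokenizer over a tiny hypothetical vocabulary. It is a simplification: production tokenizers (byte-pair encoding, for example) learn much larger vocabularies from data, but the output has the same shape, a list of pieces that are mostly smaller than words.

```python
# A tiny hypothetical vocabulary of sub-word pieces (real ones are learned from data).
VOCAB = {"token", "iz", "ation", "un", "break", "able", " "}

def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match: at each position, take the longest vocab entry that fits."""
    tokens = []
    i = 0
    while i < len(text):
        for end in range(len(text), i, -1):  # try the longest candidate first
            if text[i:end] in vocab:
                tokens.append(text[i:end])
                i = end
                break
        else:                                 # nothing matched: fall back to one character
            tokens.append(text[i])
            i += 1
    return tokens

pieces = tokenize("unbreakable tokenization", VOCAB)
print(pieces)       # ['un', 'break', 'able', ' ', 'token', 'iz', 'ation']
print(len(pieces))  # 7 tokens for 2 words
```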
Inference
Inference is the act of running a trained model to get an answer: text goes in, a prediction comes out. Every message you send to a coding agent is an inference. It is the opposite end of the lifecycle from training.
Model
A model is the trained artifact at the centre of every AI coding tool: a large file of numbers (parameters) that, given some text, assigns a probability to every possible next token, from which the continuation is picked one token at a time. When people say "which model are you using," this is the thing they mean.
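A minimal sketch of "a file of numbers that, given some text, produces a continuation": here the parameters are a hypothetical 4×4 table of scores (logits), one row per previous token, turned into probabilities with softmax. Real models compute these scores from billions of parameters and condition on the whole context rather than one token, but the final step, scores to probabilities, works the same way.

```python
import math

VOCAB = ["the", "cat", "sat", "<end>"]

# Hypothetical parameters: one row of scores (logits) per previous token.
PARAMS = [
    [-2.0, 3.0, 0.0, -1.0],  # after "the"
    [-1.0, -2.0, 2.5, 0.5],  # after "cat"
    [0.0, -1.0, -2.0, 2.0],  # after "sat"
    [0.0, 0.0, 0.0, 0.0],    # after "<end>"
]

def softmax(xs: list[float]) -> list[float]:
    """Turn scores into probabilities that sum to 1 (shifted by max for stability)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_probs(prev: str) -> dict[str, float]:
    row = PARAMS[VOCAB.index(prev)]
    return dict(zip(VOCAB, softmax(row)))

def most_likely_next(prev: str) -> str:
    probs = next_token_probs(prev)
    return max(probs, key=probs.get)

print(most_likely_next("the"))            # cat
print(most_likely_next("cat"))            # sat
print(sum(len(row) for row in PARAMS))    # 16 parameters in this toy
```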