Output tokens are the other half of a request: the tokens the model actually generates. The reply you read, the code it writes, the tool calls it emits, and, on reasoning models, a chunk of hidden thinking too: all of it is output. They are sometimes called completion tokens.
Slower and pricier than input
Output tokens behave differently from input tokens in two ways that matter:
- They cost more. Providers almost always price output tokens higher than input, often several times higher. Generating text is the expensive part.
- They are produced one at a time. The model writes output sequentially, each token after the last, which is why a long answer streams in slowly and takes real wall-clock time. Input, by contrast, is read in one pass.
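The two points above can be put into rough numbers. The sketch below is illustrative only: the function name, the per-million-token prices, and the generation speed are all assumptions made for the example, not any provider's real figures.

```python
def estimate_request(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    output_tokens_per_second: float,
) -> tuple[float, float]:
    """Return (cost, generation_seconds) for one request.

    Prices are per million tokens, in whatever currency you pass in.
    Generation time only counts the sequential output phase; reading
    the input is treated as a single pass and ignored here.
    """
    cost = (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
    ) / 1_000_000
    generation_seconds = output_tokens / output_tokens_per_second
    return cost, generation_seconds


# Hypothetical numbers: output priced at 5x input, 50 tokens per second.
cost, seconds = estimate_request(
    input_tokens=10_000,
    output_tokens=1_000,
    input_price_per_mtok=1.0,
    output_price_per_mtok=5.0,
    output_tokens_per_second=50.0,
)
print(f"cost: {cost:.3f}, generation time: {seconds:.0f}s")
```

With these made-up rates, output is about one eleventh of the tokens but one third of the cost, and it accounts for all 20 seconds of generation time.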
Where they quietly pile up
The surprise on modern models is how much output you cannot see. When you raise effort, the model reasons more before answering, and that reasoning is billed as output even if it never reaches your screen. A short final answer can sit on top of a large, invisible pile of reasoning tokens.
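A small sketch makes the invisible pile concrete. The class and field names here are hypothetical; real providers report usage under their own field names, and the token counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class OutputUsage:
    """Hypothetical breakdown of one response's output tokens."""

    visible_tokens: int    # the answer that reaches your screen
    reasoning_tokens: int  # hidden thinking, still billed as output

    @property
    def billed_output_tokens(self) -> int:
        # Both kinds are output, so both are billed.
        return self.visible_tokens + self.reasoning_tokens

    @property
    def hidden_share(self) -> float:
        # Fraction of billed output that the reader never sees.
        total = self.billed_output_tokens
        return self.reasoning_tokens / total if total else 0.0


# Invented example: a short answer after a long stretch of reasoning.
usage = OutputUsage(visible_tokens=150, reasoning_tokens=2_850)
print(usage.billed_output_tokens, f"{usage.hidden_share:.0%}")
```

In this invented case, a 150-token answer is billed as 3,000 output tokens, and 95% of that never appears on screen.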
Because output is the slow, costly side of a request, brevity has value. Asking an agent to write a novel where a sentence would do is not just noise; it costs latency and money.
Related terms
Token
A token is the unit of text a model reads and writes: a chunk that is usually part of a word, not a whole word or a single character. Everything is measured in tokens, including your context window and your bill.
Input tokens
Input tokens are the tokens you send in a request: the system prompt, the conversation history, loaded files, and tool definitions. You are billed for them, and they count against the context window.
Effort
Effort is a dial for how much internal reasoning a model spends before it answers. Turn it up for genuinely hard problems; you pay for it in latency and extra output tokens.