Output tokens

Output tokens are the other half of a request: the tokens the model actually generates. The reply you read, the code it writes, the tool calls it emits, and, on reasoning models, a chunk of hidden thinking are all output. Output tokens are sometimes called completion tokens.

Slower and pricier than input

Output tokens differ from input tokens in two ways that matter:

They cost more. Providers almost always price output tokens higher than input, often several times higher. Generating text is the expensive part.
They are produced one at a time. The model writes output sequentially, each token after the last, which is why a long answer streams in slowly and takes real wall-clock time. Input, by contrast, is read in one pass.
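
To make the two differences concrete, here is a minimal Python sketch that estimates what a request costs and how long its output takes to generate. The prices and the generation speed are made-up illustrative numbers, not any provider's real rates, and the names `Pricing`, `request_cost`, and `generation_seconds` are hypothetical helpers invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price in dollars per one million tokens (illustrative values only)."""
    input_per_million: float
    output_per_million: float


# Hypothetical rates: output priced at 5x input. Real rates vary by provider.
EXAMPLE_PRICING = Pricing(input_per_million=2.0, output_per_million=10.0)


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Return the estimated dollar cost of one request."""
    total = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    )
    return total / 1_000_000


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Rough time to generate the output, assuming a steady generation speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    # Same token count on each side, very different share of the bill.
    print(request_cost(1000, 1000, EXAMPLE_PRICING))
    # Assumed speed of 50 tokens per second, chosen only for illustration.
    print(generation_seconds(500, tokens_per_second=50.0))
```

Under these assumed rates, the 1,000 output tokens account for most of the cost even though the input is the same size, and the generation time grows in direct proportion to the output length.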

Where they quietly pile up

The surprise on modern models is how much output you cannot see. When you raise effort, the model reasons more before answering, and that reasoning is billed as output even if it never reaches your screen. A short final answer can sit on top of a large, invisible pile of reasoning tokens.
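
The arithmetic behind that invisible pile can be sketched in a few lines of Python. The token counts below are invented for illustration, and `billed_output_tokens` and `hidden_share` are hypothetical helper names; how a given provider reports reasoning tokens in its usage data is not assumed here.

```python
def billed_output_tokens(visible_tokens: int, reasoning_tokens: int) -> int:
    """Output tokens billed for a request: the visible answer plus hidden reasoning."""
    if visible_tokens < 0 or reasoning_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return visible_tokens + reasoning_tokens


def hidden_share(visible_tokens: int, reasoning_tokens: int) -> float:
    """Fraction of billed output that never reaches the screen."""
    total = billed_output_tokens(visible_tokens, reasoning_tokens)
    if total == 0:
        return 0.0
    return reasoning_tokens / total


if __name__ == "__main__":
    # Illustrative numbers: a short answer on top of a long reasoning trace.
    visible, reasoning = 200, 1800
    print(billed_output_tokens(visible, reasoning))
    print(hidden_share(visible, reasoning))
```

With these made-up counts, the reader sees 200 tokens but 2,000 are billed as output, so nine tenths of the output is reasoning that stays hidden.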

Because output is the slow, costly side of a request, brevity has value. Asking an agent to write a novel where a sentence would do is not just noise; it costs latency and money.

Tip

If responses feel slow, the length of the output is usually why, not the size of your input. Ask for concise answers, and reserve high effort for problems that genuinely need the extra reasoning.

Related terms

Token

Input tokens

Effort