
Output tokens

Also called: completion tokens

Output tokens are the tokens a model generates in its response, including any hidden reasoning. They are usually priced higher than input tokens, and turning up effort produces more of them.

James Phoenix
Understanding Data Updated July 2, 2026

Output tokens are the other half of a request, alongside input tokens: they are the tokens the model actually generates. The reply you read, the code it writes, the tool calls it emits and, on reasoning models, a chunk of hidden thinking are all output. They are sometimes called completion tokens.

Slower and pricier than input

Output behaves differently from input tokens in two ways that matter:

  • They cost more. Providers almost always price output tokens higher than input, often several times higher. Generating text is the expensive part.
  • They are produced one at a time. The model writes output sequentially, each token after the last, which is why a long answer streams in slowly and takes real wall-clock time. Input, by contrast, is read in one pass.
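To make the two differences concrete, here is a minimal sketch of the arithmetic. The prices and the generation speed are made-up placeholders, not any provider's real figures, and the names `Pricing`, `request_cost` and `generation_seconds` are hypothetical helpers for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # Hypothetical prices in USD per million tokens (placeholders, not real rates).
    input_per_mtok: float
    output_per_mtok: float


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Total cost in USD: each side is billed at its own per-token rate."""
    total = (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
    )
    return total / 1_000_000


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Rough wall-clock time to generate the output, one token after another."""
    return output_tokens / tokens_per_second


# Assumed example: output priced at 5x input, same prompt, two answer lengths.
pricing = Pricing(input_per_mtok=1.0, output_per_mtok=5.0)
short_cost = request_cost(input_tokens=10_000, output_tokens=200, pricing=pricing)
long_cost = request_cost(input_tokens=10_000, output_tokens=4_000, pricing=pricing)

print(f"short answer: ${short_cost:.4f}")
print(f"long answer:  ${long_cost:.4f}")
print(f"long answer at 50 tok/s: {generation_seconds(4_000, 50.0):.0f} s")
```

Under these assumed numbers the prompt is identical in both requests, so the whole difference in cost and waiting time comes from the length of the answer.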

Where they quietly pile up

The surprise on modern models is how much output you cannot see. When you raise effort, the model reasons more before answering, and that reasoning is billed as output even if it never reaches your screen. A short final answer can sit on top of a large, invisible pile of reasoning tokens.
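One way to see that pile is to split a usage record into its visible and hidden parts. This is a sketch only: the `Usage` class and its field names are hypothetical, and it assumes the reported output count already includes reasoning, which you should check against your provider's own usage fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    # Hypothetical field names; real providers report usage under their own names.
    output_tokens: int     # everything billed as output, reasoning included (assumed)
    reasoning_tokens: int  # the hidden share of output_tokens


def visible_tokens(usage: Usage) -> int:
    """Output tokens that actually reach the screen."""
    return usage.output_tokens - usage.reasoning_tokens


def hidden_share(usage: Usage) -> float:
    """Fraction of billed output that was never shown."""
    if usage.output_tokens == 0:
        return 0.0
    return usage.reasoning_tokens / usage.output_tokens


# Assumed example: a short final answer on top of a long reasoning trace.
usage = Usage(output_tokens=2_150, reasoning_tokens=2_000)
print(visible_tokens(usage))         # 150
print(f"{hidden_share(usage):.0%}")  # 93%
```

Logging a number like `hidden_share` per request is a cheap way to notice when a higher effort setting is driving the bill rather than the answers themselves.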

Because output is the slow, costly side of a request, brevity has value. Asking an agent to write a novel where a sentence would do is not just noise; it costs latency and money.

Tip
If responses feel slow, the usual cause is the length of the output, not the size of your input. Ask for concise answers, and reserve high effort for problems that genuinely need the extra reasoning.
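In code, that advice might look like choosing settings per task instead of using one default everywhere. The sketch below is illustrative only: `RequestSettings`, `settings_for`, the effort labels and the token caps are all hypothetical, and real providers expose these controls under their own parameter names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestSettings:
    # Hypothetical settings object; parameter names vary by provider.
    effort: str             # how much reasoning to allow
    max_output_tokens: int  # hard cap on generated tokens
    instruction: str        # length guidance added to the prompt


def settings_for(needs_deep_reasoning: bool) -> RequestSettings:
    """Default to short, low-effort output; opt in to more only when needed."""
    if needs_deep_reasoning:
        return RequestSettings(
            effort="high",
            max_output_tokens=8_000,
            instruction="Work through the problem, then give the final answer.",
        )
    return RequestSettings(
        effort="low",
        max_output_tokens=500,
        instruction="Answer in at most three sentences.",
    )


quick = settings_for(needs_deep_reasoning=False)
hard = settings_for(needs_deep_reasoning=True)
print(quick.effort, quick.max_output_tokens)
print(hard.effort, hard.max_output_tokens)
```

The design choice is that the cheap path is the default, so extra output has to be requested deliberately.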
