Every request to a model provider has two sides: what you send and what comes back. Input tokens are the tokens you send in: the system prompt, the whole conversation so far, any files the agent has read, the tool definitions, and your latest message. They are also called prompt tokens. The provider adds them all up, and that total is what the model reads before it writes a single word back.
What counts as input
It is easy to underestimate how large the input is, because you type very little of it yourself. On a real coding request the input is dominated by:
- The system prompt and every tool definition, sent on every request.
- The full history of the session, which only grows.
- File contents and command output the agent has pulled in.
Your actual instruction is often the smallest part.
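To make the proportions concrete, here is a minimal sketch that estimates how many tokens each part of a request contributes. It assumes a rough rule of thumb of four characters per token, which is only an approximation; real tokenizers vary by model. The part names and sizes are hypothetical, chosen to resemble a coding session.

```python
# Rough illustration only: real tokenizers differ by model and provider.
CHARS_PER_TOKEN = 4  # assumed rule of thumb, not an exact figure


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    if not text:
        return 0
    return max(1, len(text) // CHARS_PER_TOKEN)


def input_breakdown(parts: dict[str, str]) -> dict[str, int]:
    """Estimate the tokens contributed by each named part of a request."""
    return {name: estimate_tokens(text) for name, text in parts.items()}


# Hypothetical request contents; the "x" strings stand in for real text.
request_parts = {
    "system_prompt_and_tools": "x" * 12_000,
    "session_history": "x" * 40_000,
    "files_and_command_output": "x" * 60_000,
    "your_instruction": "Rename the helper and update its callers.",
}

breakdown = input_breakdown(request_parts)
total_input = sum(breakdown.values())

for name, tokens in breakdown.items():
    share = 100 * tokens / total_input
    print(f"{name:28s} {tokens:>7,d} tokens  {share:5.1f}%")
print(f"{'total input':28s} {total_input:>7,d} tokens")
```

With these made-up sizes, the typed instruction accounts for a tiny fraction of the total. For real numbers, use the token counts your provider reports rather than a character-based estimate.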
Why input tokens deserve attention
Two reasons, one about cost and one about quality:
- You pay for them. Input tokens are billed, usually at a lower rate than output tokens, but there are far more of them, so they often dominate the bill on a long session.
- They fill the [context window](/ai-coding-dictionary/context-window). Input is what consumes the window, and a window packed with marginally relevant input both costs more and dilutes the model's attention.
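The cost point can be sketched with simple arithmetic. The example below assumes every request resends the fixed overhead plus the full history so far, and it uses hypothetical prices and token counts, not any provider's real rates. It also ignores any discounts a provider might apply, so treat it as a simplification.

```python
# Hypothetical prices in dollars per million tokens; not real provider rates.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def session_cost(turns: int, fixed_overhead: int, added_per_turn: int,
                 output_per_turn: int) -> dict[str, float]:
    """Total tokens and cost when every request resends the whole history."""
    input_tokens = 0
    output_tokens = 0
    history = 0
    for _ in range(turns):
        # New material (your message, file contents) joins the history.
        history += added_per_turn
        # Each request carries the fixed overhead plus everything so far.
        input_tokens += fixed_overhead + history
        output_tokens += output_per_turn
        # The model's reply becomes part of the history for the next request.
        history += output_per_turn
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "input_cost": input_tokens * INPUT_PRICE_PER_MTOK / 1_000_000,
        "output_cost": output_tokens * OUTPUT_PRICE_PER_MTOK / 1_000_000,
    }


# Hypothetical 30-turn session.
result = session_cost(turns=30, fixed_overhead=3_000,
                      added_per_turn=2_000, output_per_turn=500)

print(f"input:  {result['input_tokens']:>9,d} tokens  ${result['input_cost']:.2f}")
print(f"output: {result['output_tokens']:>9,d} tokens  ${result['output_cost']:.2f}")
```

Under these assumptions the input side costs far more than the output side, even though each input token is priced lower, because the history is resent on every turn and the input total grows with the square of the session length.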
Related terms
Token
A token is the unit of text a model reads and writes: a chunk that is usually part of a word, not a whole word or a single character. Everything is measured in tokens, including your context window and your bill.
Output tokens
Output tokens are the tokens a model generates in its response, including any hidden reasoning. They are usually priced higher than input tokens, and turning up the reasoning effort produces more of them.
Context window
The context window is the maximum amount of text, measured in tokens, that a model can consider for a single request. It is a hard ceiling, and it is the main resource you manage when working with an agent.