Everything an agent does eventually becomes a model provider request: one call to the provider carrying the full state of the conversation, and one response coming back. The request is the atomic unit of work. The model is stateless, so each request has to include everything the model needs: the system prompt, the conversation history, and the tool definitions, all of it, every single time.
What a request looks like
A request to Anthropic's Messages API bundles the model id, a cap on output tokens (max_tokens), the running list of messages, and the tools the model is allowed to call:
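import Anthropic from '@anthropic-ai/sdk'

// Reads the API key from the ANTHROPIC_API_KEY environment variable.
const client = new Anthropic()

// readFileTool and runCommandTool are tool definitions declared elsewhere.
const response = await client.messages.create({
  model: 'claude-sonnet-5',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Fix the failing test in auth.ts' },
  ],
  tools: [readFileTool, runCommandTool],
})

Everything you send counts as input tokens; everything the model generates comes back as output tokens. Both land on the bill for this one call.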
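To make the billing point concrete, here is a minimal sketch of pricing a single request from its token counts. The `input_tokens` and `output_tokens` field names mirror the `usage` object on a Messages API response; the `Pricing` shape, the `requestCostUsd` helper, and the prices themselves are hypothetical values chosen for illustration, not real rates.

```typescript
// Token counts for one request, shaped like the `usage` object on a response.
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

// Hypothetical prices in USD per million tokens. Real prices vary by model.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Cost of one request: input and output tokens are priced separately
// and both are charged for the same call.
function requestCostUsd(usage: Usage, pricing: Pricing): number {
  const inputCost = (usage.input_tokens / 1e6) * pricing.inputPerMillion;
  const outputCost = (usage.output_tokens / 1e6) * pricing.outputPerMillion;
  return inputCost + outputCost;
}

// Illustrative numbers only (assumed, not taken from any price list).
const examplePricing: Pricing = { inputPerMillion: 3, outputPerMillion: 15 };
const exampleUsage: Usage = { input_tokens: 2000, output_tokens: 500 };
const exampleCost = requestCostUsd(exampleUsage, examplePricing);
```

With these assumed prices, a request with 2,000 input tokens and 500 output tokens costs roughly $0.0135. In a real harness you would pass `response.usage` instead of a hand-written object.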
One turn, many requests
The thing that surprises people is that a single turn is usually not one request. When the model wants a tool, it stops and returns a tool call. Your harness runs the tool, appends the result to the messages, and makes a fresh request so the model can continue. A single "fix this bug" instruction can take ten or twenty requests before the agent is done.
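The loop described above can be sketched as follows. This is a simplified illustration, not a real harness: `sendRequest` and `runTool` are hypothetical stand-ins for the provider call and the tool runner, the message and response shapes are reduced to plain strings (the actual API uses structured content blocks for tool calls and results), and the calls are synchronous to keep the example short.

```typescript
// Simplified message and response shapes, assumed for illustration.
type Message = { role: 'user' | 'assistant'; content: string };

type ModelResponse =
  | { kind: 'tool_call'; tool: string; input: string }
  | { kind: 'final'; text: string };

// Hypothetical stand-ins for the provider call and the tool runner.
type SendRequest = (messages: Message[]) => ModelResponse;
type RunTool = (tool: string, input: string) => string;

function runTurn(
  instruction: string,
  sendRequest: SendRequest,
  runTool: RunTool,
  maxRequests = 25,
): { text: string; requests: number; messages: Message[] } {
  const messages: Message[] = [{ role: 'user', content: instruction }];
  for (let requests = 1; requests <= maxRequests; requests++) {
    // Every request carries the full history accumulated so far.
    const response = sendRequest(messages);
    if (response.kind === 'final') {
      messages.push({ role: 'assistant', content: response.text });
      return { text: response.text, requests, messages };
    }
    // Record the tool call, run the tool, and append its result
    // so the next request lets the model continue.
    messages.push({ role: 'assistant', content: `call ${response.tool}(${response.input})` });
    messages.push({ role: 'user', content: runTool(response.tool, response.input) });
  }
  throw new Error(`turn did not finish within ${maxRequests} requests`);
}

// A scripted fake model: asks for two tools, then gives a final answer.
function scriptedModel(): SendRequest {
  const script: ModelResponse[] = [
    { kind: 'tool_call', tool: 'read_file', input: 'auth.ts' },
    { kind: 'tool_call', tool: 'run_command', input: 'npm test' },
    { kind: 'final', text: 'Fixed the failing test.' },
  ];
  let step = 0;
  return () => {
    const next = script[step];
    step += 1;
    return next ?? { kind: 'final', text: 'done' };
  };
}

const result = runTurn(
  'Fix the failing test in auth.ts',
  scriptedModel(),
  (tool, input) => `output of ${tool} on ${input}`,
);
```

Here one instruction produces three requests, and the message list grows with each one, which is why input tokens climb over the course of a turn. The `maxRequests` cap is a design choice in this sketch to keep a misbehaving loop from running forever.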
Related terms
Model provider
A model provider is the company or service that hosts a model behind an API. Your agent sends requests to it and gets completions back; you never run the model yourself.
Harness
The harness is the code wrapped around a model that builds requests, runs tools, manages context, and enforces permissions. It is the agent minus the model, and it is where most of the real engineering lives.
Input tokens
Input tokens are the tokens you send in a request: the system prompt, the conversation history, loaded files, and tool definitions. You are billed for them, and they count against the context window.
Output tokens
Output tokens are the tokens a model generates in its response, including any hidden reasoning. They are usually priced higher than input tokens, and turning up effort produces more of them.