Model provider request

Everything an agent does eventually becomes a model provider request: one call to the provider carrying the full state of the conversation, and one response coming back. It is the atomic unit of work. The model is stateless, so each request has to include everything the model needs: the system prompt, the history, the tool definitions, all of it, every single time.

What a request looks like

A request to Anthropic's Messages API bundles the model id, a cap on output tokens (max_tokens), the running list of messages, and the tools the model is allowed to call:

TypeScript

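import Anthropic from '@anthropic-ai/sdk'

// Reads ANTHROPIC_API_KEY from the environment by default.
const client = new Anthropic()

// readFileTool and runCommandTool are tool definitions declared elsewhere.
const response = await client.messages.create({
  model: 'claude-sonnet-5',
  max_tokens: 1024, // upper bound on output tokens for this one request
  messages: [
    { role: 'user', content: 'Fix the failing test in auth.ts' },
  ],
  tools: [readFileTool, runCommandTool],
})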

Everything you send counts as input tokens; everything the model generates comes back as output tokens. Both land on the bill for this one call.
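To make the billing concrete, here is a minimal sketch that turns one request's token counts into a cost. The `input_tokens` and `output_tokens` fields mirror the `usage` object on a Messages API response; the `Pricing` type, the `requestCostUSD` helper, and the prices themselves are assumptions made up for illustration, not real rates.

```typescript
// Subset of the usage object returned with a Messages API response.
interface Usage {
  input_tokens: number
  output_tokens: number
}

// Hypothetical prices in USD per million tokens; substitute your provider's rates.
interface Pricing {
  inputPerMTok: number
  outputPerMTok: number
}

// Cost of a single request: input and output are priced separately, then summed.
function requestCostUSD(usage: Usage, pricing: Pricing): number {
  return (
    (usage.input_tokens / 1_000_000) * pricing.inputPerMTok +
    (usage.output_tokens / 1_000_000) * pricing.outputPerMTok
  )
}

const examplePricing: Pricing = { inputPerMTok: 3, outputPerMTok: 15 }
const exampleUsage: Usage = { input_tokens: 2_000, output_tokens: 500 }

const cost = requestCostUSD(exampleUsage, examplePricing)
console.log(cost.toFixed(4)) // prints "0.0135"
```

In a real harness you would pass `response.usage` in place of `exampleUsage` and sum the result across every request in the session.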

One turn, many requests

The thing that surprises people is that a single turn is usually not one request. When the model wants a tool, it stops and returns a tool call. Your harness runs the tool, appends the result to the messages, and makes a fresh request so the model can continue. A single "fix this bug" instruction might take ten or twenty requests before the agent is done.
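The loop described above can be sketched as follows. This is a simplified illustration, not the SDK's actual types: `Message`, `ModelReply`, `runTurn`, `fakeSend`, and `fakeRunTool` are all hypothetical names, messages are flattened to plain strings, and the provider is replaced by a fake so the example runs without a network call. The `stop_reason` values `tool_use` and `end_turn` do match what the Messages API returns.

```typescript
// Simplified, hypothetical shapes; the real API uses richer content blocks.
type Message = { role: 'user' | 'assistant'; content: string }
type ModelReply =
  | { stop_reason: 'tool_use'; tool: string; input: string }
  | { stop_reason: 'end_turn'; text: string }

type SendRequest = (messages: Message[]) => Promise<ModelReply>
type ToolRunner = (tool: string, input: string) => Promise<string>

// Run one turn: keep making requests until the model stops asking for tools.
async function runTurn(
  prompt: string,
  send: SendRequest,
  runTool: ToolRunner,
  maxRequests = 20,
): Promise<{ answer: string; requests: number }> {
  const messages: Message[] = [{ role: 'user', content: prompt }]
  for (let requests = 1; requests <= maxRequests; requests++) {
    // Every request carries the whole conversation so far.
    const reply = await send([...messages])
    if (reply.stop_reason === 'end_turn') {
      return { answer: reply.text, requests }
    }
    // The model asked for a tool: run it and append both sides to the history.
    messages.push({ role: 'assistant', content: `call ${reply.tool}(${reply.input})` })
    const result = await runTool(reply.tool, reply.input)
    messages.push({ role: 'user', content: `result: ${result}` })
  }
  throw new Error(`turn did not finish within ${maxRequests} requests`)
}

// Fake provider for demonstration: asks for two tools, then finishes.
const seenLengths: number[] = []
const fakeSend: SendRequest = async (messages): Promise<ModelReply> => {
  seenLengths.push(messages.length)
  if (messages.length === 1) return { stop_reason: 'tool_use', tool: 'read_file', input: 'auth.ts' }
  if (messages.length === 3) return { stop_reason: 'tool_use', tool: 'run_command', input: 'npm test' }
  return { stop_reason: 'end_turn', text: 'Fixed the failing test.' }
}
const fakeRunTool: ToolRunner = async (tool, input) => `${tool} ok on ${input}`

const turn = runTurn('Fix the failing test in auth.ts', fakeSend, fakeRunTool)
turn.then(({ answer, requests }) => console.log(requests, seenLengths, answer))
```

Here one instruction becomes three requests, and the fake provider sees 1, 3, and then 5 messages. The `maxRequests` cap is a design choice for the sketch so that a model that never stops calling tools cannot loop forever.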

Note

Because each request re-sends the whole conversation, cost and latency creep up as a session grows. A long agent run is not one big call; it is many calls that each get a little heavier.
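A rough way to see the growth is to add up the input tokens across a run. The sketch below assumes a fixed starting context and a constant number of tokens appended after every request; both figures, and the helper names `inputTokensPerRequest` and `totalInputTokens`, are invented for illustration. Real sessions grow unevenly, and prompt caching, where available, changes the cost picture.

```typescript
// Input tokens sent on each request, assuming a fixed starting context
// and a constant number of tokens appended after every request.
function inputTokensPerRequest(
  baseTokens: number,
  addedPerStep: number,
  requestCount: number,
): number[] {
  return Array.from({ length: requestCount }, (_, i) => baseTokens + i * addedPerStep)
}

// Total input tokens billed across the whole run.
function totalInputTokens(
  baseTokens: number,
  addedPerStep: number,
  requestCount: number,
): number {
  return inputTokensPerRequest(baseTokens, addedPerStep, requestCount)
    .reduce((sum, tokens) => sum + tokens, 0)
}

// Assumed numbers: 2,000 tokens of starting context, 500 added per step, 10 requests.
const perRequest = inputTokensPerRequest(2_000, 500, 10)
const total = totalInputTokens(2_000, 500, 10)

console.log(perRequest[0], perRequest[9], total) // prints "2000 6500 42500"
```

Under these assumptions the tenth request sends more than three times the input of the first, and the run as a whole bills 42,500 input tokens, against 20,000 if every request had stayed the size of the first.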

Related terms

Model provider

Harness

Input tokens

Output tokens