Cost Control for AI Agents

Principle

Cost is an architecture signal, not only a finance report.

Why it matters

Agents can spend money in loops, retries, long context, duplicate retrievals, and overpowered models. If the system cannot explain spend at the level of a feature and task, optimisation becomes guesswork and budgets turn into after-the-fact surprises.

Build this

Usage accounting for input tokens, output tokens, cache tokens, tool calls, and retries.
Budgets by environment, tenant, user, feature, workflow, and background job.
Model routing rules that reserve expensive models for tasks that justify them.
Kill switches, soft limits, degradation paths, and alerts before spend becomes an incident.
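A minimal Python sketch of layered budgets with soft and hard limits plus tier-based model routing. It assumes spend can be estimated in USD before each call. `Budget`, `BudgetRegistry`, `route`, the scope keys, and the model names in `ROUTES` are all hypothetical. Crossing a soft limit degrades the request to a cheaper model, and crossing a hard limit acts as a kill switch.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = 0    # projected spend is under the soft limit
    DEGRADE = 1  # over the soft limit: caller should alert and use a cheaper model
    BLOCK = 2    # over the hard limit: the kill switch trips


@dataclass
class Budget:
    soft_limit_usd: float
    hard_limit_usd: float
    spent_usd: float = 0.0

    def check(self, estimated_cost_usd: float) -> Decision:
        projected = self.spent_usd + estimated_cost_usd
        if projected > self.hard_limit_usd:
            return Decision.BLOCK
        if projected > self.soft_limit_usd:
            return Decision.DEGRADE
        return Decision.ALLOW


@dataclass
class BudgetRegistry:
    # Budgets keyed by scope, e.g. ("tenant", "acme") or ("feature", "search").
    budgets: dict = field(default_factory=dict)

    def decide(self, scopes, estimated_cost_usd: float) -> Decision:
        # Every scope that has a budget is checked; the strictest decision wins.
        decisions = [self.budgets[s].check(estimated_cost_usd)
                     for s in scopes if s in self.budgets]
        return max(decisions, key=lambda d: d.value, default=Decision.ALLOW)

    def record(self, scopes, cost_usd: float) -> None:
        # Charge the actual cost to every scope that has a budget.
        for s in scopes:
            if s in self.budgets:
                self.budgets[s].spent_usd += cost_usd


# Hypothetical routing table: task kind -> (preferred model, cheaper fallback).
ROUTES = {
    "complex_reasoning": ("large-model", "medium-model"),
    "summarize": ("medium-model", "small-model"),
    "classify": ("small-model", "small-model"),
}


def route(task_kind: str, decision: Decision) -> str:
    if decision is Decision.BLOCK:
        raise RuntimeError(f"budget exhausted; refusing {task_kind!r} task")
    preferred, fallback = ROUTES[task_kind]
    return fallback if decision is Decision.DEGRADE else preferred
```

Checking every applicable scope (tenant, feature, user, background job) and taking the strictest result means a single exhausted budget is enough to degrade or stop a request, even when the other scopes still have headroom.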

Watch for

Tracking only the total provider invoice.
Long agent loops with no max step count or budget ceiling.
Background jobs using frontier models only because the default client is configured with one.
Caching decisions made without measuring freshness or cache hit quality.
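To make caching decisions measurable, a cache can count hits, misses, and lookups that found an expired entry. The sketch below assumes a simple time-to-live freshness rule, which is only one possible policy; `TTLCache` and `CacheStats` are hypothetical names. It measures hit rate and staleness, not whether a cached answer was still correct, which needs a separate quality check.

```python
import time
from dataclasses import dataclass


@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0
    expired: int = 0  # lookups that found an entry older than the TTL

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


class TTLCache:
    """Response cache that records its hit rate and how often entries went stale."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, stored_at)
        self.stats = CacheStats()

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            self.stats.misses += 1
            return None
        value, stored_at = entry
        if self.clock() - stored_at > self.ttl:
            # A stale entry counts as a miss, so hit_rate reflects usable hits only.
            del self._entries[key]
            self.stats.expired += 1
            self.stats.misses += 1
            return None
        self.stats.hits += 1
        return value

    def put(self, key, value):
        self._entries[key] = (value, self.clock())
```

Injecting the clock keeps the freshness rule testable; a high `expired` count relative to `hits` suggests the TTL is shorter than the data's real rate of change, or that the cache is not earning its keep.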

Proof it works

A trace can show the cost of a single task and the reason for each expensive call.
Budget limits are tested for both foreground and background work.
Cost dashboards can rank spend by feature, tenant, and model.
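Ranking spend and explaining a single task both fall out of per-call cost records that carry their dimensions and a reason. A sketch assuming records shaped like the hypothetical `CostRecord` below; a real system would read them from traces or a metrics store rather than an in-memory list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRecord:
    task_id: str
    feature: str
    tenant: str
    model: str
    reason: str  # why this call was made, e.g. "plan" or "retry after tool error"
    cost_usd: float


def rank_spend(records, dimension: str):
    """Sum cost per value of `dimension` ("feature", "tenant", or "model"), highest first."""
    totals = defaultdict(float)
    for r in records:
        totals[getattr(r, dimension)] += r.cost_usd
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def explain_task(records, task_id: str, top_n: int = 3):
    """Total cost of one task plus its most expensive calls and their stated reasons."""
    calls = [r for r in records if r.task_id == task_id]
    top = sorted(calls, key=lambda r: r.cost_usd, reverse=True)[:top_n]
    return sum(r.cost_usd for r in calls), [(r.model, r.reason, r.cost_usd) for r in top]
```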

Implementation checklist

Attach cost metadata to every model request at the serving layer.
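One way to attach cost metadata at the serving layer is to wrap every model call. In this sketch `send` is a hypothetical client function returning text and token usage, the prices in `PRICES_PER_MTOK` are placeholders rather than real rates, and cached input tokens are assumed to be reported separately from regular input tokens (some providers report them as a subset of input tokens, which changes the arithmetic).

```python
from dataclasses import dataclass

# Placeholder USD prices per million tokens; substitute your provider's current rates.
PRICES_PER_MTOK = {
    "small-model": {"input": 0.25, "output": 1.25, "cached_input": 0.03},
    "large-model": {"input": 3.00, "output": 15.00, "cached_input": 0.30},
}


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int
    cached_input_tokens: int = 0  # assumed to be billed separately from input_tokens


@dataclass
class CostMetadata:
    model: str
    feature: str
    tenant: str
    task_id: str
    reason: str
    usage: Usage
    cost_usd: float
    attempt: int  # >1 marks a retry, so retry spend stays visible


def compute_cost(model: str, usage: Usage) -> float:
    p = PRICES_PER_MTOK[model]
    return (usage.input_tokens * p["input"]
            + usage.output_tokens * p["output"]
            + usage.cached_input_tokens * p["cached_input"]) / 1_000_000


def call_with_cost(send, model, prompt, *, feature, tenant, task_id, reason, sink, attempt=1):
    """Call the model via `send` and append one CostMetadata record to `sink`."""
    text, usage = send(model, prompt)
    sink.append(CostMetadata(model, feature, tenant, task_id, reason,
                             usage, compute_cost(model, usage), attempt))
    return text
```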

Set per-task token and step budgets, and make an overrun an explicit task state.
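A bounded agent loop can end in a named state instead of running until something else breaks. `run_agent`, `TaskState`, and the step callback signature are hypothetical; the callback stands in for one model call plus any tool calls it triggers.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Tuple


class TaskState(Enum):
    COMPLETED = "completed"
    STEP_BUDGET_EXCEEDED = "step_budget_exceeded"
    TOKEN_BUDGET_EXCEEDED = "token_budget_exceeded"


@dataclass
class TaskResult:
    state: TaskState
    steps: int
    tokens_used: int
    output: Optional[str] = None


# Hypothetical step callback: step number -> (done, output, tokens used by this step).
StepFn = Callable[[int], Tuple[bool, Optional[str], int]]


def run_agent(step_fn: StepFn, *, max_steps: int, max_tokens: int) -> TaskResult:
    tokens = 0
    for step in range(1, max_steps + 1):
        done, output, step_tokens = step_fn(step)
        tokens += step_tokens
        if done:
            return TaskResult(TaskState.COMPLETED, step, tokens, output)
        # Token usage is known only after a step, so the ceiling can be overshot by one step.
        if tokens > max_tokens:
            return TaskResult(TaskState.TOKEN_BUDGET_EXCEEDED, step, tokens)
    return TaskResult(TaskState.STEP_BUDGET_EXCEEDED, max_steps, tokens)
```

Because each overrun has its own state, callers and dashboards can tell "ran out of steps" apart from "ran out of tokens" and from a normal completion.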

Measure quality before and after routing a task to a cheaper model.
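Before switching a task to a cheaper model, both models can be scored on the same evaluation cases. `run_model`, `grade`, and the 0.02 default tolerance are assumptions for illustration; `grade` is expected to return a score between 0 and 1.

```python
from statistics import mean


def compare_routing(eval_cases, run_model, grade, *, baseline_model, candidate_model,
                    max_quality_drop=0.02):
    """Score the current and the cheaper model on the same cases and gate the switch."""
    baseline = mean(grade(c, run_model(baseline_model, c)) for c in eval_cases)
    candidate = mean(grade(c, run_model(candidate_model, c)) for c in eval_cases)
    return {
        "baseline_score": baseline,
        "candidate_score": candidate,
        # Accept the cheaper route only if quality drops by no more than the tolerance.
        "accept": baseline - candidate <= max_quality_drop,
    }
```

Running both models on identical cases keeps the comparison fair; the tolerance is a product decision and should be set per task, not globally.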

Review high-spend traces regularly, not only when invoices arrive.

Related dictionary terms

Cache tokens, Output tokens, Effort
