Cost is an architecture signal, not only a finance report.
Why it matters
Agents can spend money in loops, retries, long context, duplicate retrievals, and overpowered models. If the system cannot explain spend at the level of a feature and task, optimisation becomes guesswork and budgets turn into after-the-fact surprises.
Build this
- Usage accounting for input tokens, output tokens, cache tokens, tool calls, and retries.
- Budgets by environment, tenant, user, feature, workflow, and background job.
- Model routing rules that reserve expensive models for tasks that justify them.
- Kill switches, soft limits, degradation paths, and alerts before spend becomes an incident.
Watch for
- Only tracking total provider invoice cost.
- Long agent loops with no max step count or budget ceiling.
- Background jobs using frontier models because the default client does.
- Caching decisions made without measuring freshness or cache hit quality.
Proof it works
- A trace can show the cost of a single task and the reason for each expensive call.
- Budget limits are tested for both foreground and background work.
- Cost dashboards can rank spend by feature, tenant, and model.
Implementation checklist
Attach cost metadata to every model request at the serving layer.
Set per-task token and step budgets, then make overruns explicit states.
Measure quality before and after routing a task to a cheaper model.
Review high-spend traces regularly, not only when invoices arrive.