Run / Production Operations

08

Cost Control

Control LLM spend before it surprises you

Track and shape LLM spend by task, model, tenant, user, feature, token type, retry path, and business value.

Principle

Cost is an architecture signal, not only a finance report.

Why it matters

Agents can spend money in loops, retries, long context, duplicate retrievals, and overpowered models. If the system cannot explain spend at the level of a feature and task, optimisation becomes guesswork and budgets turn into after-the-fact surprises.

Build this

  • Usage accounting for input, output, and cached tokens, plus tool calls and retries.
  • Budgets by environment, tenant, user, feature, workflow, and background job.
  • Model routing rules that reserve expensive models for tasks that justify them.
  • Kill switches, soft limits, degradation paths, and alerts that fire before spend becomes an incident.
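The routing and limit bullets above can be combined into a single decision made before each call. The sketch below assumes a per-scope spend figure is already available, and the model names, the `TASKS_NEEDING_LARGE_MODEL` set, and the limit values are all hypothetical. The cheap model is the default, the expensive one is reserved for listed task types, a soft limit degrades to the cheap model, and a hard limit or kill switch blocks the call outright.

```python
from enum import Enum
from typing import Optional, Tuple


class Decision(Enum):
    ALLOW = "allow"
    DEGRADE = "degrade"  # soft limit reached: fall back to the cheaper model
    BLOCK = "block"      # hard limit reached or kill switch on: refuse the call


# Hypothetical model names and routing table (assumptions, not a real provider's models).
EXPENSIVE_MODEL = "large-model"
CHEAP_MODEL = "small-model"
TASKS_NEEDING_LARGE_MODEL = {"contract_review", "multi_step_planning"}


def route(task_type: str, spent_usd: float, soft_limit_usd: float,
          hard_limit_usd: float, kill_switch: bool = False) -> Tuple[Decision, Optional[str]]:
    """Pick a model for one call, given what this scope (tenant, feature, job) has spent."""
    if kill_switch or spent_usd >= hard_limit_usd:
        return Decision.BLOCK, None
    # Default to the cheap model; only listed task types justify the expensive one.
    wanted = EXPENSIVE_MODEL if task_type in TASKS_NEEDING_LARGE_MODEL else CHEAP_MODEL
    if spent_usd >= soft_limit_usd and wanted == EXPENSIVE_MODEL:
        return Decision.DEGRADE, CHEAP_MODEL
    return Decision.ALLOW, wanted
```

Returning an explicit `Decision` rather than silently swapping models makes degradation visible to callers and traces; a production version would also emit an alert the first time a scope crosses its soft limit.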

Watch for

  • Tracking only the total provider invoice.
  • Long agent loops with no max step count or budget ceiling.
  • Background jobs that use frontier models only because the default client is configured with one.
  • Caching decisions made without measuring freshness or cache hit quality.
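Measuring a cache can start with counting hits, misses, and expired lookups. The sketch below is a minimal TTL cache under that assumption. The `MeasuredCache` class and its injectable `clock` are hypothetical names, and it measures freshness and hit rate only. Whether a cached answer was still *correct* (hit quality) would need sampled evaluation on top of this.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class MeasuredCache:
    """A TTL cache that records hits, misses, and expiries so its value can be measured."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._store: Dict[Hashable, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0
        self.expired = 0  # lookups that found an entry older than the TTL

    def get_or_compute(self, key: Hashable, compute: Callable[[], str]) -> str:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None:
            stored_at, value = entry
            if now - stored_at <= self.ttl:
                self.hits += 1
                return value
            self.expired += 1  # stale entry: count it, then recompute below
        self.misses += 1
        value = compute()  # the expensive model call
        self._store[key] = (now, value)
        return value

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A high `expired` count relative to `hits` suggests the TTL is shorter than the reuse pattern; a high hit rate on content known to change quickly is a prompt to check freshness before trusting the savings.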

Proof it works

  • A trace can show the cost of a single task and the reason for each expensive call.
  • Budget limits are tested for both foreground and background work.
  • Cost dashboards can rank spend by feature, tenant, and model.
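Ranking spend is a group-and-sort over per-call records. The sketch below assumes each record is a flat dict with a `cost_usd` field and tag keys such as `feature`, `tenant`, and `model`; this schema is hypothetical. Records missing a tag land in an `untagged` bucket, so gaps in cost metadata show up as spend rather than disappearing.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_spend(records: Iterable[Dict], dimension: str) -> List[Tuple[str, float]]:
    """Sum cost by one tag (e.g. "feature", "tenant", "model"), highest spend first."""
    totals: Dict[str, float] = defaultdict(float)
    for rec in records:
        # Missing tags are grouped rather than dropped, so untracked spend stays visible.
        totals[rec.get(dimension, "untagged")] += rec["cost_usd"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```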

Implementation checklist

01

Attach cost metadata to every model request at the serving layer.
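A minimal sketch of the first checklist item: a serving-layer hook that writes one usage record per model request, retries included, and prices each token type separately. The price table is hypothetical, and the normalised `usage` dict assumes `input_tokens` *excludes* cached tokens. Some providers report cached tokens as a subset of input tokens instead, so the mapping from a real API response needs checking.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical prices in USD per million tokens; real prices vary by provider and model.
PRICES_PER_MTOK = {
    "small-model": {"input": 0.50, "cached_input": 0.05, "output": 1.50},
    "large-model": {"input": 5.00, "cached_input": 0.50, "output": 15.00},
}


@dataclass
class UsageRecord:
    """One model call, tagged with the dimensions spend is later sliced by."""
    model: str
    input_tokens: int         # assumed to exclude cached tokens
    cached_input_tokens: int  # input tokens served from a prompt cache
    output_tokens: int
    tool_calls: int = 0
    is_retry: bool = False
    tags: Dict[str, str] = field(default_factory=dict)  # e.g. tenant, feature, task_id

    def cost_usd(self) -> float:
        p = PRICES_PER_MTOK[self.model]
        return (self.input_tokens * p["input"]
                + self.cached_input_tokens * p["cached_input"]
                + self.output_tokens * p["output"]) / 1_000_000


def record_call(ledger: List[UsageRecord], model: str, usage: Dict[str, int],
                tags: Dict[str, str], is_retry: bool = False) -> UsageRecord:
    """Serving-layer hook: call once per model request, including each retry."""
    rec = UsageRecord(
        model=model,
        input_tokens=usage.get("input_tokens", 0),
        cached_input_tokens=usage.get("cached_input_tokens", 0),
        output_tokens=usage.get("output_tokens", 0),
        tool_calls=usage.get("tool_calls", 0),
        is_retry=is_retry,
        tags=dict(tags),
    )
    ledger.append(rec)
    return rec
```

Because `is_retry` and the tags travel with every record, retry overhead and per-feature spend can be computed from the same ledger without joining against invoices.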

02

Set per-task token and step budgets, and treat overruns as explicit states rather than silent failures.
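One way to make overruns explicit is to have the agent loop return a named end state instead of just stopping. The sketch below assumes each step is a callable returning the tokens it consumed and whether the task is done; `run_with_budget` and `RunState` are hypothetical names. Token usage is only known after a call returns, so the ceiling can be overshot by at most one step.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Tuple


class RunState(Enum):
    COMPLETED = "completed"
    STEP_BUDGET_EXCEEDED = "step_budget_exceeded"
    TOKEN_BUDGET_EXCEEDED = "token_budget_exceeded"


@dataclass
class RunResult:
    state: RunState
    steps: int
    tokens_used: int


# A step receives its index and returns (tokens consumed, task finished?).
Step = Callable[[int], Tuple[int, bool]]


def run_with_budget(step: Step, max_steps: int, max_tokens: int) -> RunResult:
    tokens = 0
    for i in range(max_steps):
        used, done = step(i)
        tokens += used
        # Check the token ceiling before completion so overspend is never hidden.
        if tokens > max_tokens:
            return RunResult(RunState.TOKEN_BUDGET_EXCEEDED, i + 1, tokens)
        if done:
            return RunResult(RunState.COMPLETED, i + 1, tokens)
    return RunResult(RunState.STEP_BUDGET_EXCEEDED, max_steps, tokens)
```

The same wrapper can sit around foreground requests and background jobs, which makes it straightforward to test budget limits for both paths with identical assertions.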

03

Measure quality before and after routing a task to a cheaper model.
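Measuring quality before and after a routing change can be reduced to running the current model and the cheaper candidate over the same eval set and gating on the difference. The sketch below assumes a per-example pass/fail grader; the `max_drop` tolerance of 0.02 is a hypothetical choice, not a recommended value.

```python
from typing import Callable, Sequence, Tuple

Grader = Callable[[str, str], bool]  # (model output, expected answer) -> passed?


def pass_rate(outputs: Sequence[str], expected: Sequence[str], grade: Grader) -> float:
    if len(outputs) != len(expected):
        raise ValueError("outputs and expected must be the same length")
    if not outputs:
        raise ValueError("empty eval set")
    return sum(grade(o, e) for o, e in zip(outputs, expected)) / len(outputs)


def approve_cheaper_route(baseline_outputs: Sequence[str], candidate_outputs: Sequence[str],
                          expected: Sequence[str], grade: Grader,
                          max_drop: float = 0.02) -> Tuple[bool, float, float]:
    """Approve the cheaper model only if its pass rate stays within max_drop of the baseline."""
    base = pass_rate(baseline_outputs, expected, grade)
    cand = pass_rate(candidate_outputs, expected, grade)
    return cand >= base - max_drop, base, cand
```

On small eval sets a point comparison like this is noisy, so a real gate would want enough examples (or a confidence interval) for a two-point drop to be meaningful.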

04

Review high-spend traces regularly, not only when invoices arrive.


Keep the framework connected.

Each factor is useful alone, but the system only becomes production-grade when build, run, and govern controls reinforce each other.
