Providers & requests

Cache tokens

Also called: cached tokens

Cache tokens are input tokens served from the prefix cache at a reduced rate. They are how prompt caching shows up as a separate line in your usage numbers.

James Phoenix
Understanding Data Updated July 2, 2026

When a prefix cache hits, the tokens it covers do not vanish from your bill; they get reclassified. Cache tokens are the portion of your input tokens that the provider served from cache rather than processing fresh. Same tokens, cheaper rate.

Reading them in your usage

This is mostly something you notice in the numbers. Providers break input down into a few buckets in the usage they return on each request, and you will typically see something like:

  • Cache read tokens. Prefix tokens that were already cached and reused, billed at a large discount.
  • Cache write tokens. Tokens being stored into the cache for the first time, sometimes billed at a slight premium.
  • Uncached input tokens. The genuinely new part of the request, at the normal input rate.

Add those up and you get your total input for the call.

Why the number is worth watching

Cache tokens are a direct, honest signal of how well your caching is working. A healthy long session should show most of its input arriving as cache reads, because the big stable prefix is being reused over and over. If that number stays low, something is busting the cache: an unstable system prompt, reordered context, or content that changes near the front on every request.

Note
Cache tokens save you money without changing what the model sees. The output is identical whether a prefix was cached or not; caching only changes the cost and speed of getting there.

Related terms

Building with AI agents?

This dictionary is part of how I think about agentic engineering. If you want the same thinking applied to your codebase, that is what I do.

See how I can help