Cache tokens - AI Coding Dictionary

When a prefix cache hits, the tokens it covers do not vanish from your bill; they get reclassified. Cache tokens are the portion of your input tokens that the provider served from cache rather than processing fresh. Same tokens, cheaper rate.

Reading them in your usage

Cache tokens show up in the usage data a provider returns with each request. Input is typically broken down into a few buckets, something like:

Cache read tokens. Prefix tokens that were already cached and reused, billed at a large discount.
Cache write tokens. Tokens being stored into the cache for the first time, sometimes billed at a slight premium.
Uncached input tokens. The genuinely new part of the request, at the normal input rate.

Add those up and you get your total input for the call.
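As a rough illustration of that sum and of how the buckets are priced differently, here is a minimal Python sketch. The field names, the base rate, and the two multipliers are assumptions made up for the example; real providers use their own field names and publish their own rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputUsage:
    # Hypothetical field names; each provider names these buckets differently.
    cache_read_tokens: int
    cache_write_tokens: int
    uncached_input_tokens: int

    @property
    def total_input_tokens(self) -> int:
        # The three buckets add up to the total input for the call.
        return (
            self.cache_read_tokens
            + self.cache_write_tokens
            + self.uncached_input_tokens
        )


def input_cost(
    usage: InputUsage,
    base_rate_per_million: float,
    read_multiplier: float = 0.1,    # assumed discount, not a real price
    write_multiplier: float = 1.25,  # assumed premium, not a real price
) -> float:
    """Estimate the input cost of one call under the assumed multipliers."""
    weighted_tokens = (
        usage.cache_read_tokens * read_multiplier
        + usage.cache_write_tokens * write_multiplier
        + usage.uncached_input_tokens
    )
    return weighted_tokens * base_rate_per_million / 1_000_000


usage = InputUsage(
    cache_read_tokens=90_000,
    cache_write_tokens=2_000,
    uncached_input_tokens=8_000,
)
print(usage.total_input_tokens)  # 100000
print(input_cost(usage, base_rate_per_million=3.0))
```

With these made-up numbers, the call still counts 100,000 input tokens, but most of them are charged at the discounted read rate rather than the full input rate.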

Why the number is worth watching

Cache tokens are a direct, honest signal of how well your caching is working. A healthy long session should show most of its input arriving as cache reads, because the big stable prefix is being reused over and over. If that number stays low, something is busting the cache: an unstable system prompt, reordered context, or content that changes near the front on every request.
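One way to watch this number is to compute the share of input that arrived as cache reads across a session. The sketch below assumes each request's usage has already been reduced to a `(cache_read, cache_write, uncached)` tuple; that shape and the 0.5 threshold are hypothetical choices for illustration, not a provider convention.

```python
from typing import Iterable, Tuple

# (cache_read, cache_write, uncached) token counts for one request.
# This tuple shape is an assumption made for the example.
RequestUsage = Tuple[int, int, int]


def cache_read_ratio(requests: Iterable[RequestUsage]) -> float:
    """Fraction of all input tokens in the session served from cache."""
    total_read = 0
    total_input = 0
    for cache_read, cache_write, uncached in requests:
        total_read += cache_read
        total_input += cache_read + cache_write + uncached
    # An empty session has no input, so report 0.0 instead of dividing by zero.
    return total_read / total_input if total_input else 0.0


def looks_cache_busted(
    requests: Iterable[RequestUsage], threshold: float = 0.5
) -> bool:
    """Flag a session whose cache-read share is below an arbitrary threshold."""
    return cache_read_ratio(requests) < threshold


session = [
    (0, 10_000, 500),     # first request: prefix written, nothing to read yet
    (10_000, 500, 300),   # later requests reuse the stored prefix
    (10_500, 300, 400),
]
print(f"{cache_read_ratio(session):.2f}")
print(looks_cache_busted(session))
```

The first request in a session will normally show no reads because nothing has been stored yet, so the ratio is more meaningful over a whole session than for any single call.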

Note

Cache tokens save you money without changing what the model sees. The model receives exactly the same input whether a prefix was cached or not; caching only changes the cost and speed of getting there.

Related terms

Prefix cache

A prefix cache lets a provider reuse the unchanged front of your request instead of reprocessing it, so repeated prefixes are cheaper and faster. It is the main reason keeping the start of your prompt stable pays off.

Input tokens

Input tokens are the tokens you send in a request: the system prompt, the conversation history, loaded files, and tool definitions. You are billed for them, and they count against the context window.