The Wise Operator

Prompt Caching

A feature that stores a chunk of your prompt on the provider's side so repeated calls read from the cache instead of reprocessing it. Claude cache reads cost one-tenth the price of fresh input tokens. Hitting the cache is the single biggest cost lever in production AI.


Prompt caching lets you mark part of your prompt as cacheable. On the first call, the provider processes that prefix and stores the result. On subsequent calls with the same prefix, the model reads from the cache at a fraction of the cost: Claude's cache hits cost ten percent of the normal input rate, and GPT-5 offers a similar discount on cached input. For applications with a large, stable context, this is transformative.
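
A minimal sketch of what that looks like with Anthropic's Python SDK, assuming a cache-capable model (the model name, docs file, and prompt text here are illustrative, not a prescribed setup):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical large, stable context you want cached across calls.
PRODUCT_DOCS = open("docs/product_guide.md").read()

response = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative; any cache-capable model works
    max_tokens=1024,
    system=[
        {"type": "text", "text": "You are a support agent. Answer from the docs below."},
        {
            "type": "text",
            "text": PRODUCT_DOCS,
            # Cache breakpoint: everything up to and including this block
            # becomes the cacheable prefix. The first call pays a cache
            # write; later calls with an identical prefix pay the cheaper
            # cache-read rate.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "How do I reset my password?"}],
)
print(response.content[0].text)
```

The key design constraint is that caching is prefix-based: keep the stable material (instructions, documents) at the front of the prompt and the per-request material at the end, or the prefix never matches and nothing is reused.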

The Simple Version

If you send the same 20,000 tokens of instructions and documents on every call, pay full price for them once, then a small fraction on every call after that. That is what caching buys you. Caches are short-lived, typically lasting from about five minutes to an hour depending on the provider and settings.
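
The arithmetic behind that, as a sketch with assumed per-million-token prices following the Claude pattern (a small premium to write the cache, ten percent of the fresh rate to read it; substitute your model's current rates):

```python
# Illustrative prices in USD per million input tokens.
PRICE_INPUT = 3.00        # fresh input tokens
PRICE_CACHE_WRITE = 3.75  # writing the cache costs a small premium
PRICE_CACHE_READ = 0.30   # ten percent of the fresh rate

PREFIX_TOKENS = 20_000
CALLS = 1_000

without_cache = CALLS * PREFIX_TOKENS / 1e6 * PRICE_INPUT
with_cache = (PREFIX_TOKENS / 1e6 * PRICE_CACHE_WRITE          # one write
              + (CALLS - 1) * PREFIX_TOKENS / 1e6 * PRICE_CACHE_READ)

print(f"without caching: ${without_cache:.2f}")  # $60.00
print(f"with caching:    ${with_cache:.2f}")     # about $6.07
```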

Why It Matters

In production apps, the cache hit rate often determines whether the unit economics work. A support chatbot that sends the same product documentation on every call can run at roughly one-tenth the cost with good caching; without it, the same application pays full price for identical tokens on every request and costs ten times more to serve the same traffic. Cache hit rate is the metric that separates profitable AI products from unprofitable ones.
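
To manage that metric you have to measure it. A sketch of computing a per-call hit rate from the usage block the Anthropic API returns; the field names below match the documented response shape at the time of writing, but treat them as an assumption to verify:

```python
def cache_hit_rate(usage) -> float:
    """Fraction of this call's prompt tokens that were served from cache."""
    cached = getattr(usage, "cache_read_input_tokens", 0) or 0   # read from cache
    written = getattr(usage, "cache_creation_input_tokens", 0) or 0  # cache write
    fresh = usage.input_tokens  # uncached tokens processed at full price
    total = cached + written + fresh
    return cached / total if total else 0.0

# Using the response from the earlier sketch:
print(f"cache hit rate this call: {cache_hit_rate(response.usage):.0%}")
```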

How It’s Used on This Site

TWO’s digest pipeline caches the PURPOSE.md voice guide and the prior-digest list across calls, and the cache hit rate stays above eighty-five percent on typical runs. That is why each digest edition costs pennies instead of dollars.