A Scroll
The Economics of Tokens
Every word you speak to an AI costs something. Learn the currency before the bill comes due.
Opening
A friend of mine, a senior executive, spent his first weekend with Claude like a man who had just discovered electricity. He asked it to summarize his inbox, draft his Monday brief, rewrite his deck, and late on a Sunday evening set it loose to research the entire competitive landscape of his industry, one company at a time, in a single conversation that did not end until sometime Monday morning.
On Tuesday, he got the bill.
He called me, a little pale, and asked the same question every operator eventually asks: What actually costs money here, and how did I spend so much of it without noticing?
The answer is the subject of this scroll. The currency of the AI era is not attention. It is not compute. It is not even time. It is the token. And like any currency, it rewards the patient and punishes the careless. Most operators will never read the fine print. You are not going to be one of them.
The Core Teaching
1. A token is a unit of language made measurable
A token is a fragment of text, usually about three to four characters, or roughly three-quarters of a word. The sentence “The wise operator counts the cost” is six words and about eight tokens. A short email is around 200 tokens. A long memo is two thousand. The King James Bible, cover to cover, is about one million.
Every interaction with a large language model is priced in tokens. You are not paying for questions, and you are not paying for answers. You are paying for language, by the piece, in both directions.
This sounds small until you remember that a model working on a real task does not see just your question. It sees your question, plus the full system prompt, plus the conversation so far, plus whatever files or search results it has pulled in, plus its own reasoning as it works. A single “quick question” can carry ten thousand tokens on its back.
A working sense of scale:
| What you’re sending | Roughly |
|---|---|
| One short sentence | ~10 tokens |
| A short email | ~200 tokens |
| A long memo | ~2,000 tokens |
| A ten-page document | ~5,000 tokens |
| A hundred-page PDF | ~50,000 tokens |
| The King James Bible, cover to cover | ~1,000,000 tokens |
Keep those numbers in your pocket. Nearly every cost decision you make in the AI era will start with a silent estimate against this table.
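If you want that estimate without leaving your keyboard, the rule of thumb is enough to script. A minimal sketch using the four-characters-per-token heuristic above (real tokenizers vary by model; the `estimate_tokens` helper is mine, not a library function, and your provider's tokenizer or token-counting endpoint is the tool when the bill depends on being exact):

```python
def estimate_tokens(text: str) -> int:
    """Back-of-envelope estimate using the ~4-characters-per-token rule.

    Real tokenizers vary by model; use your provider's tokenizer or
    token-counting endpoint when precision matters.
    """
    return max(1, round(len(text) / 4))

print(estimate_tokens("The wise operator counts the cost"))  # ~8
print(estimate_tokens("word " * 160))                        # a short email's worth, ~200
```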
2. Input and output are priced differently, and it matters more than you think
Here is the asymmetry that almost nobody learns on day one: input tokens are cheap, and output tokens are expensive.
On today’s frontier models, output typically costs three to five times what input costs. This is not an accident. Input is what the model reads, and it can be digested in a single parallel pass. Output is what the model generates, one token at a time, each new word requiring a full pass through the network. That is the slow, expensive part where the neural network is actually doing the work you came for.
The operator’s first instinct, when a bill arrives, is usually to shorten the prompt. That is the wrong instinct. The scalpel goes on the output side. Ask for bullets instead of paragraphs. Ask for the JSON without the explanation. Say “do not repeat the question back to me.” Say “stop after the third example.” Every word the model does not write is a word you do not pay for.
Token for token, reading is five times cheaper than writing: you can feed a model a ten-page document as input and pay less than for the two-page essay it writes back. Sit with that until it becomes instinct.
What a real request looks like when you price it out. A “quick question” to Claude about a file is rarely just the question:
| Ingredient | Tokens | Side |
|---|---|---|
| System prompt | 800 | input |
| Conversation so far | 6,000 | input |
| The file you attached | 4,000 | input |
| Your question | 50 | input |
| Claude’s answer | 400 | output |
| Total | 11,250 | 10,850 in + 400 out |
At representative frontier pricing of around $3 per million input tokens and $15 per million output, that single exchange costs roughly four cents. Four cents by itself is unremarkable. Ten thousand such calls is four hundred dollars; with caching discipline on the stable prefix, closer to a hundred. And notice where the weight sits: the answer is 400 tokens, but the context it rides in is nearly eleven thousand. The output scalpel trims what the model writes; caching and fresh sessions trim everything else.
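The arithmetic is worth doing by hand once. A sketch of the exchange above at the representative rates (the rates are assumptions for illustration, not anyone's price sheet):

```python
IN_RATE = 3.00 / 1_000_000    # $ per input token (representative, not a quote)
OUT_RATE = 15.00 / 1_000_000  # $ per output token

input_tokens = 800 + 6_000 + 4_000 + 50  # system prompt + history + file + question
output_tokens = 400                      # Claude's answer

per_call = input_tokens * IN_RATE + output_tokens * OUT_RATE
print(f"one call:     ${per_call:.4f}")             # ~$0.0386, call it four cents
print(f"10,000 calls: ${per_call * 10_000:,.0f}")   # ~$386

# With the stable prefix (system prompt + history + file) cached at ~1/10 price:
cached = (800 + 6_000 + 4_000) * IN_RATE / 10 + 50 * IN_RATE + output_tokens * OUT_RATE
print(f"cached call:  ${cached:.4f}")               # ~$0.0094, about a penny
```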
3. The context window is a room with a door that stays open
The context window is the amount of language the model can hold in mind at once. Modern frontier models have windows of 200,000 to a million tokens, staggeringly large by the standards of even two years ago.
But here is the catch: you pay for everything in the room. Every token you load into the context (every file you attach, every prior message, every tool result) is billed again on every subsequent turn, for as long as it stays in the window. If your conversation has accumulated 80,000 tokens of context, your next simple question costs you 80,000 tokens of input before the model types a single word of reply.
Operators with good habits close the door. They start a fresh session when the topic changes. They delete attachments they no longer need. They keep the context exactly as long as it serves the work and no longer. Operators with bad habits run one monstrous thread for weeks and wonder why every exchange feels slower and more expensive than the last. Slower and costlier are the same symptom: everything in the room gets read again, and billed again, on every turn.
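The compounding is easy to underestimate, so here is the open door, priced. A sketch with illustrative numbers (the starting context, the growth per turn, and the rate are assumptions, not measurements):

```python
IN_RATE = 3.00 / 1_000_000   # representative $ per input token

context = 11_000         # tokens already in the room: files, history, system prompt
growth_per_turn = 1_500  # each exchange adds question, answer, and tool results
input_cost = 0.0

for turn in range(40):               # one long-running thread, forty turns
    input_cost += context * IN_RATE  # the entire room is re-read, and re-billed
    context += growth_per_turn

print(f"input cost of the thread: ${input_cost:.2f}")  # ~$4.83
print(f"final context size: {context:,} tokens")       # 71,000 and climbing
```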
4. Caching, batching, and budgets: the three levers
Three mechanisms, once learned, change the economics of your work permanently.
Prompt caching is the discipline of reusing the same stable prefix across many calls. Your system prompt, your style guide, your knowledge base. The parts that do not change from request to request can be cached by the model provider and billed at roughly a tenth of normal price on subsequent reads. If you build any workflow that runs more than a handful of times a day on the same instructions, caching is not optional. It is the difference between a sustainable habit and a leaking faucet.
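In practice the cache is a marker you set, not a system you build. A sketch using the Anthropic Python SDK's prompt-caching control as documented at the time of writing (the model name and the STYLE_GUIDE text are placeholders; check current docs for minimum cacheable prefix sizes and cache lifetimes):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

STYLE_GUIDE = "You are my drafting assistant. House style: short sentences..."  # stand-in

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder; use a current model name
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": STYLE_GUIDE,
            # The stable prefix: cached after the first call, then billed
            # at roughly a tenth of the normal input rate on re-reads.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Draft the Monday brief from these notes: ..."}],
)
print(response.usage)  # cache_creation_input_tokens vs cache_read_input_tokens
```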
Batch inference is a separate pricing tier for work that does not need an answer in the next five seconds. Submit a hundred requests as a batch and the provider will return them within a day, at roughly half price. For overnight work (research sweeps, bulk rewrites, mass classification), this is the single easiest fifty-percent discount in software. Most operators never use it because they do not know it exists.
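Using it is one call. A sketch against the Anthropic Message Batches API (the model name and the documents list are placeholders; other providers offer equivalent batch tiers):

```python
import anthropic

client = anthropic.Anthropic()
documents = ["Q3 board letter ...", "competitor teardown ...", "intake notes ..."]  # stand-ins

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"summary-{i}",
            "params": {
                "model": "claude-haiku-4-5",  # placeholder; cheap models suit bulk work
                "max_tokens": 512,
                "messages": [{"role": "user", "content": f"Summarize in five bullets:\n{doc}"}],
            },
        }
        for i, doc in enumerate(documents)
    ]
)
print(batch.id)  # poll for results later; they arrive within 24 hours at ~half price
```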
A token budget is a target cost you assign to a workflow before you run it. This daily digest should cost no more than 80,000 tokens. This client research run should cost no more than 400,000. The budget is not a rule the model enforces. It is a compass you hold. Without it, every workflow drifts toward maximalism: more sources, more reasoning, more passes. With it, you notice the drift before it empties the account.
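The budget needs no tooling, but a dozen lines make the drift visible. A minimal sketch (the class and its numbers are illustrative, not a library API):

```python
class TokenBudget:
    """A compass, not a fence: tracks spend against a target set before the run."""

    def __init__(self, name: str, limit: int):
        self.name, self.limit, self.spent = name, limit, 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        self.spent += input_tokens + output_tokens
        if self.spent > self.limit:
            print(f"[{self.name}] over budget: {self.spent:,} of {self.limit:,} tokens")

digest = TokenBudget("daily digest", limit=80_000)     # the target from above
digest.record(input_tokens=11_000, output_tokens=400)  # one call at a time
print(f"{digest.spent:,} / {digest.limit:,} tokens used")
```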
5. Agent loops: where bills go to die
An agent is a model given tools and told to keep going until the work is done. It is the most powerful pattern in modern AI. It is also the most expensive pattern in modern AI, and the reason is simple: an agent’s cost is not bounded by its prompt. It is bounded by its willingness to stop.
Every loop the agent takes (read a file, think, call a tool, think again, read another file) adds to the bill. A poorly scoped agent will happily spend a hundred thousand tokens clarifying a problem you could have scoped in a sentence. A well-scoped agent will finish the same job in five thousand.
The wise operator treats an agent the way a foreman treats a skilled contractor: the work is serious, the pay is fair, and the scope is written down before the first hammer swings. The operator’s discipline here is not technical. It is the oldest discipline in the book: count the cost before you build.
The same question, two different bills. Imagine you ask an agent: “Summarize the key differences between A, B, and C.”
| Scope | Steps | Total tokens | Approx. cost |
|---|---|---|---|
| Tight: three sources, one synthesis pass | 6 | ~14,000 | ~$0.07 |
| Loose: twenty-four sources, self-clarifying, four rewrites | ~20 | ~260,000 | ~$1.92 |
Same question. Twenty-seven times the bill. The only difference is whether someone wrote a sentence bounding the scope before the agent began.
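What writing that sentence down looks like in code is just numbers on the loop. A self-contained sketch, where call_model is a hypothetical stand-in for your provider's model-plus-tools API:

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    text: str
    tokens_used: int
    done: bool

def call_model(task: str, context: str) -> StepResult:
    """Hypothetical stand-in for a real model-plus-tools call."""
    return StepResult(text="...synthesis...", tokens_used=2_300, done=False)

MAX_STEPS = 6           # the tight scope: three sources, one synthesis pass
TOKEN_BUDGET = 20_000   # the stop condition, written down before the first step

spent, context = 0, ""
for step in range(MAX_STEPS):
    result = call_model("Summarize the key differences between A, B, and C", context)
    spent += result.tokens_used
    if result.done or spent >= TOKEN_BUDGET:
        break           # done, or budget hit: either way, the loop ends on purpose
    context += result.text

print(f"steps: {step + 1}, tokens: {spent:,}")  # bounded by design, not by luck
```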
6. How a token is actually priced
Providers post prices per million tokens, abbreviated MTok. It is the same convention gasoline and electricity use (per gallon, per kilowatt-hour): the per-unit cost is so small that a single token has no real price tag. You buy and sell them in bulk.
At representative frontier pricing of $3 per MTok input and $15 per MTok output, the arithmetic is small but the habit is important:
| Tokens | Cost if input ($3/MTok) | Cost if output ($15/MTok) |
|---|---|---|
| 100 | $0.0003 | $0.0015 |
| 1,000 | $0.003 | $0.015 |
| 10,000 | $0.03 | $0.15 |
| 100,000 | $0.30 | $1.50 |
| 1,000,000 | $3.00 | $15.00 |
Cheaper models (Haiku-class) run roughly an order of magnitude below these numbers. The most capable models (Opus-class) run three to five times above. Choosing the right model for the job is a cost decision as much as it is a capability decision.
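Priced out, the tier choice looks like this. The rates below follow this section's own rough multipliers and are assumptions, not quotes:

```python
# (input $/MTok, output $/MTok): an order of magnitude below, and five times
# above, the representative $3/$15 midpoint. Check live price sheets before
# relying on these numbers.
RATES = {
    "haiku-class":  (0.30, 1.50),
    "sonnet-class": (3.00, 15.00),
    "opus-class":   (15.00, 75.00),
}

def job_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = RATES[tier]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

for tier in RATES:  # the same 50k-in / 2k-out job at each tier
    print(f"{tier:13s} ${job_cost(tier, 50_000, 2_000):.3f}")
```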
And here is what most operators miss entirely: tokens are not just an API concern. Claude Pro, ChatGPT Plus, Claude Max, Cursor, Windsurf, Claude Code itself. Every monthly subscription you pay for is token-metered underneath the brand name. You do not get an itemized bill at the end of the month. You get throttled, rate-limited, or cut off when you blow through your allowance. The exact allowance varies (Pro plans give you roughly five times a free account’s usage; Max plans roughly twenty), but the mechanism is identical. Every subscription is a pre-paid token budget with a friendly wrapper.
If you use Claude Code in a high-context conversation for two hours, you are not only spending time. You are spending your daily allocation. The operator who understands this plans their context the way a backcountry hiker plans water: rationed, replenished, never assumed.
The Operator’s Application
Four practices, starting this week.
First, learn to read your own bill. Your API provider shows you token usage per request. Your subscription dashboard shows you allowance remaining. Look at both. For one week, glance at the counter after every non-trivial call. You will develop a gut feel for cost faster than any training could give you. A person who cannot feel the cost of a thing spends it carelessly. A person who can, does not.
Second, cache everything that repeats. If you have a system prompt, a style guide, a reference document you consult daily, put them in a cached prefix. One afternoon of setup will return dividends every day for the rest of the year.
Third, move overnight work to batch. Anything that does not need an answer in the moment (research sweeps, catch-up summaries, document processing) should be batched. The fifty-percent discount is a gift. Receive it.
Fourth, scope your agents as you would scope a contractor. Before any agent loop (research, build, draft, investigate), write the scope in one sentence. What is done. What is out of scope. What signals the end. An agent with a clear stop condition is a blessing. An agent without one is a bonfire you paid to light.
Wisdom Close
Seneca, writing in Letters to Lucilius, watched his friends run through fortunes and wrote back a single diagnosis: “No one learns how to live until he has lived almost all his life.” He was not talking about money, though he could have been. He was talking about the human gift for spending carelessly what we cannot replace, and only noticing the loss when the account is empty. The Stoics did not moralize about frugality. They practiced attention.
The deeper current runs through a gospel parable. In Luke 14, Jesus stops the crowd following him and asks a question that has nothing to do with discipleship on its surface and everything to do with it underneath: “Which of you, intending to build a tower, does not first sit down and count the cost, whether he has enough to finish it?” The warning is not against ambition. It is against ambition uncoupled from attention.
Tokens are not moral. The bill is not moral. But the attention you pay to the cost, that is where stewardship lives. Every operator using AI today is building a tower of some kind: a practice, a service, a body of work. The wise ones count the cost before the foundation goes in. The careless ones discover it, like my friend on a Tuesday morning, after the weekend is spent.
Stewardship is not frugality. It is the discipline of treating powerful things with the respect they deserve. A model that can read a library and write a book in thirty seconds deserves to be used by operators who know what it costs to ask.
What You’ll Carry Away
This week, take one workflow you run with AI (a daily summary, a weekly review, a research pattern, a writing assistant) and do two things before you run it: estimate its token cost, and then measure the real number after. Where did your estimate miss? What single change, applied once, would cut that bill in half?
You are not trying to be cheap. You are trying to build an instinct. The operators who thrive in this era will be the ones who know, in their bones, what a thing costs, and who can still choose to pay it, gladly, when the work is worth the price.