Input and Output Tokens
Input tokens are what you send to the model. Output tokens are what the model writes back. Output is almost always priced higher than input because generation is sequential: the model reads your input in a single parallel pass, but produces output one token at a time, each token requiring its own forward pass.
Every API call has two meters running. The input meter counts everything you put in: system prompt, user messages, tool definitions, prior conversation turns, and uploaded files once they are converted to tokens. The output meter counts only what the model generates. A typical output-to-input price ratio is three-to-one or five-to-one, meaning generated text costs several times more per token than the context you pass in.
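A minimal sketch of how the two meters bill, assuming an invented rate card of $3 per million input tokens and $15 per million output tokens, a five-to-one ratio. The numbers are illustrative, not any real provider's prices:

```python
# Assumed rate card for illustration only; no real provider's prices implied.
INPUT_RATE = 3.00 / 1_000_000    # dollars per input token (assumed)
OUTPUT_RATE = 15.00 / 1_000_000  # dollars per output token (assumed, 5:1)

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Each call runs both meters, and each meter bills at its own rate."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Everything in the context window hits the input meter; only the
# generated reply hits the output meter.
print(f"${call_cost(40_000, 1_000):.3f}")  # ~$0.135, almost all of it input
```

Note that even at a five-to-one price ratio, the 40,000 input tokens here cost eight times more than the 1,000 output tokens: volume beats rate.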
The Simple Version
Reading is cheap. Writing is expensive. If you ask the model to summarize a fifty-page document in one paragraph, you pay for fifty pages of input and one paragraph of output. The second number is almost negligible compared to the first. Now flip it: ask the model to write a fifty-page report from a one-paragraph brief and the economics invert completely. Output dominates.
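To put rough numbers on the flip, here is the same assumed rate card applied in both directions, with an invented figure of about 600 tokens per page:

```python
# Same assumed rates as above; page and paragraph sizes are also assumptions.
IN_RATE, OUT_RATE = 3.00 / 1e6, 15.00 / 1e6  # dollars per token (assumed)
PAGE, PARAGRAPH = 600, 150                   # rough token counts (assumed)

# Summarize fifty pages into one paragraph: the input meter dominates.
summarize = 50 * PAGE * IN_RATE + PARAGRAPH * OUT_RATE   # ~$0.092

# Write fifty pages from a one-paragraph brief: the output meter dominates.
generate = PARAGRAPH * IN_RATE + 50 * PAGE * OUT_RATE    # ~$0.450

print(f"summarize ${summarize:.3f}  vs  generate ${generate:.3f}")
```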
Why It Matters
The ratio of input to output defines the cost curve of every AI feature. Chatbots have short inputs and short outputs, so they stay cheap. Research agents have long inputs and short outputs, so they are moderate. Long-form generation has short inputs and long outputs, which can get expensive fast. Knowing this shape tells you where to optimize.
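One way to see the three shapes side by side, with invented per-request token counts and the same assumed rates as above:

```python
IN_RATE, OUT_RATE = 3.00 / 1e6, 15.00 / 1e6  # assumed rates, as above

# (input tokens, output tokens) per request; all counts are invented.
shapes = {
    "chatbot":        (2_000, 300),     # short in, short out: cheap
    "research agent": (80_000, 1_500),  # long in, short out: moderate
    "long-form gen":  (1_000, 20_000),  # short in, long out: expensive
}
for name, (inp, out) in shapes.items():
    cost = inp * IN_RATE + out * OUT_RATE
    share = out * OUT_RATE / cost
    print(f"{name:14s} ${cost:.3f} per request, {share:.0%} of it output")
```

Where the output share is high, the lever is shorter generations; where it is low, the lever is trimming or caching input.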
How It’s Used on This Site
TWO’s digest pipeline runs high-input, low-output. Research pack in, editorial markdown out. That makes it cheap to operate even with frequent cache misses. The Scrolls pipeline will be the opposite, low-input and high-output, so we budget accordingly and limit how often it runs.