Tag
optimization
3 entries tagged optimization · 3 terms.
Dictionary
Agent Loop Cost
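The compounding token cost of a tool-using agent. Each turn of the loop feeds the entire conversation history, plus the latest tool result, back into the model, so cumulative input tokens grow roughly quadratically with the number of steps rather than linearly. If every step adds about as much context as the original prompt, five turns re-read 1 + 2 + 3 + 4 + 5 = 15 prompts' worth of input. That is why a five-step agent can cost fifteen to twenty times as much as a single-prompt equivalent.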
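As a rough illustration of the compounding, the sketch below totals the input tokens billed across an agent loop. The function name and its parameters are hypothetical, and it assumes every step appends a fixed number of tokens (model output plus tool result) to the history. It counts input tokens only and ignores output-token pricing.

```python
def agent_loop_input_tokens(steps: int, prompt_tokens: int, added_per_step: int) -> int:
    """Total input tokens billed when every turn resends the full history.

    Hypothetical helper for illustration: assumes each step appends a fixed
    `added_per_step` tokens (model output + tool result) to the context.
    """
    total = 0
    context = prompt_tokens
    for _ in range(steps):
        total += context           # this turn re-reads everything so far
        context += added_per_step  # history grows before the next turn
    return total


single_prompt = agent_loop_input_tokens(steps=1, prompt_tokens=1000, added_per_step=1000)
five_steps = agent_loop_input_tokens(steps=5, prompt_tokens=1000, added_per_step=1000)
ratio = five_steps / single_prompt  # 15000 / 1000
```

Under these assumptions the five-step loop reads 15,000 input tokens against 1,000 for the single prompt. Larger tool results push the ratio higher.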
Batch Inference
A processing mode in which you submit a batch of requests and the provider returns results asynchronously, over minutes or hours instead of seconds. In exchange for the delay, you pay fifty percent less, which cuts the bill for non-urgent work in half. Anthropic and OpenAI both offer it.
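In practice only part of a workload can wait, so the overall saving depends on how much of it is deferrable. A minimal sketch of that arithmetic follows. The function and parameter names are hypothetical, and the default discount of 0.5 is taken from the fifty percent figure above.

```python
def blended_cost(realtime_cost: float, deferrable_fraction: float,
                 batch_discount: float = 0.5) -> float:
    """Estimated bill after moving the deferrable share of work to batch mode.

    Hypothetical helper: `realtime_cost` is what the whole workload would cost
    at normal prices; `deferrable_fraction` is the share that can wait.
    """
    if not 0.0 <= deferrable_fraction <= 1.0:
        raise ValueError("deferrable_fraction must be between 0 and 1")
    urgent = realtime_cost * (1.0 - deferrable_fraction)
    batched = realtime_cost * deferrable_fraction * (1.0 - batch_discount)
    return urgent + batched


all_batched = blended_cost(1000.0, deferrable_fraction=1.0)   # everything can wait
half_batched = blended_cost(1000.0, deferrable_fraction=0.5)  # half must stay real-time
```

The bill halves only when all of the work can be batched. If half of it must stay real-time, the saving is a quarter.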
Prompt Caching
A feature that stores a chunk of your prompt, typically a stable prefix such as a system prompt or a long reference document, on the provider's side so that repeated calls read it from the cache instead of re-processing it. On Claude, cache reads cost one-tenth the price of fresh input tokens. Hitting the cache is often the single biggest cost lever in production AI.
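To see how much the one-tenth read price can matter, the sketch below compares input cost with and without caching for repeated calls that share a prefix. The function names are hypothetical, and it assumes every call after the first hits the cache, with no expiry. The `write_multiplier` default is also an assumption (a premium on the first call that populates the cache), so check your provider's current pricing.

```python
def uncached_input_cost(prefix_tokens: int, suffix_tokens: int, calls: int,
                        price_per_token: float) -> float:
    """Input cost when every call re-processes the full prompt."""
    return calls * (prefix_tokens + suffix_tokens) * price_per_token


def cached_input_cost(prefix_tokens: int, suffix_tokens: int, calls: int,
                      price_per_token: float,
                      read_multiplier: float = 0.1,     # one-tenth price, as stated above
                      write_multiplier: float = 1.25) -> float:  # assumption, varies by provider
    """Input cost when the shared prefix is cached after the first call.

    Hypothetical helper: assumes every call after the first is a cache hit.
    """
    if calls < 1:
        return 0.0
    first_write = prefix_tokens * price_per_token * write_multiplier
    later_reads = (calls - 1) * prefix_tokens * price_per_token * read_multiplier
    fresh_suffix = calls * suffix_tokens * price_per_token  # the varying part is never cached
    return first_write + later_reads + fresh_suffix


# Prices expressed in "token units" (price_per_token=1.0) to keep the numbers readable.
without_cache = uncached_input_cost(10_000, 500, calls=100, price_per_token=1.0)
with_cache = cached_input_cost(10_000, 500, calls=100, price_per_token=1.0)
```

The saving scales with how large the shared prefix is relative to the part of the prompt that changes on each call, and with how often calls arrive before the cache entry expires.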