Adaptive Thinking
An inference behavior where a frontier model self-scales its reasoning depth and tool-use budget to match the complexity of the prompt, without the operator flipping a manual extended-thinking or test-time-compute switch.
What It Is
Adaptive thinking is the design choice, introduced by Anthropic in Claude Fable 5 on June 9, 2026, to let a model decide on its own how long to reason on a given prompt instead of asking the operator to set a thinking budget in advance. Older frontier models exposed reasoning as a knob. A thinking-mode toggle on Claude. A “high effort” tier on GPT. A test-time-compute slider in the API. Each of those knobs put the burden of estimating task difficulty on the operator, who almost never had enough information to estimate well. Adaptive thinking moves the estimation inside the model.
The mechanism is not a single switch. The model is trained to size up the prompt against its own internal sense of difficulty, allocate chain of thought accordingly, and stop when the next step is no longer adding signal. A trivial prompt gets a short pass. A multi-step refactor gets a long one. A multi-day agent session can dial reasoning up and down across thousands of turns without the operator ever touching a setting. In a release built for long-horizon work, that is the precondition for the agent budgeting itself.
How It Actually Works
There are two layers to it. The first is the reasoning router, a learned policy that reads the prompt and decides how many internal steps to take before drafting an answer. The second is a stopping criterion, also learned, that checks whether the next step is still adding signal. The two together let the same model run a one-shot reply on a simple email and a 200-step plan on a complex codebase, on the same call, without a parameter change from the operator.
The catch is that the operator no longer sees the decision being made. Earlier reasoning surfaces, like a visible scratchpad or an extended-thinking sidebar, let you watch the model spend its compute. Adaptive thinking moves most of that spending into the model’s own loop, where the user does not have a direct view. Anthropic mitigates this by surfacing an effort and confidence indicator after the fact, but the lever itself is gone from the prompt.
Why It Matters Right Now
The thinking knob was always a confession that the model could not estimate its own load. Every operator who used extended thinking on a trivial task and watched the token bill spike, or skipped extended thinking on a hard task and got a glib answer, learned that the knob was hiding a missing skill. Adaptive thinking fills that gap.
It also resets the cost curve. The previous era of token budget discipline assumed the operator should be the budgeter. Long-horizon agentic work made that assumption brittle. A task that spans hours cannot wait for a human to re-set the thinking dial every twenty minutes. The agent that decides its own thinking depth is the only kind of agent that can run unattended for a multi-day audit, a self-written test suite, or a delegated sub-agent fan-out.
How TWO Uses It
TWO’s editorial line on adaptive thinking is that the lever did not disappear, it changed shape. The new lever is the prompt itself. With adaptive thinking on, the model reads the prompt as the brief and the brief as the cost ceiling. A vague prompt invites the model to think more, because it is trying to infer what you meant. A precise prompt lets the model think less, because the goal is already named.
The operator-decision moment is the prompt revision pass before you press send. Most operators will keep their old habit of dashing off a vague request and waiting to see how Claude handles it. With adaptive thinking, that habit is now an invoice. Scott’s working rule is to spend an extra minute restating the prompt as a contract: what the model has, what the model must produce, and what counts as done. Each clarification trims the implicit thinking budget. The minute of prompt revision saves five minutes of agent loop cost on the back end and produces a better answer.
The shift also collapses the gap between “fast” and “smart” model tiers in a single product. Operators who used to route trivial prompts to a cheaper model to save money will find that adaptive thinking does the routing inside one model, removing one decision from the day and quietly killing one routing-cost-saving tactic that has been in the playbook since GPT-4 launched.
A Concrete Operator Scenario
You are running a Friday-evening codebase audit with Claude Fable 5. The first prompt is “review this repo for unused imports and dead routes.” The model takes a short pass, returns a list, and you accept. The second prompt is “propose three refactors that improve startup time without changing public APIs.” With adaptive thinking on, Fable 5 treats the second prompt as license to think much longer, because the constraint (no public-API changes) signals a hard goal and the surface (startup time) signals depth is needed. You get a deeper plan. The bill for the second call is higher. The bill for the first call is not. Without adaptive thinking, both prompts would have cost the same, and one of the two would have been wrong on cost.
What to Watch Next
Three signals tell you adaptive thinking is reshaping the frontier model market. First, when a competing lab ships the same default, watch which one exposes the model’s internal effort estimate to the operator. The lab that shows the dial is selling a tool. The lab that hides it is selling a service. Second, watch pricing. Once a model self-budgets, vendors can sell flat-rate seats instead of token-metered keys, because the model’s own conservation becomes the margin protection. Third, watch the agent platforms. A model that thinks deeper on hard prompts without being asked is a model that finally fits inside an unattended overnight run. The orchestration layers that depended on extended-thinking toggles will need to rewrite their playbooks around the model’s own pacing.