Frontier Model
A frontier model is one of the most capable AI systems in existence at a given moment, advanced enough that its risks are not yet fully understood.
What It Is
A frontier model is one of the handful of AI systems sitting at the leading edge of capability at any given time. The word “frontier” is doing real work. It marks the boundary between what AI has reliably done before and what it is doing for the first time, where the behavior is new enough that even the people who built it cannot fully predict what it will do.
The term entered policy language because regulators needed a way to talk about the small set of models that pose outsized risk without writing rules that accidentally cover every chatbot and spam filter. When the May 2026 executive order referred to “mandatory federal approval of frontier models,” it was naming this top tier specifically: the systems capable enough that a mistake at launch is a national problem, not a product bug.
A frontier model is usually a large language model or a closely related system, trained on enormous datasets with vast amounts of compute. But “frontier” is not a fixed technical threshold. It is a moving line. Today’s frontier model is next year’s ordinary tool. What makes a model “frontier” is not a feature list but its position relative to everything else available at the same moment.
How the Line Gets Drawn
There is no single agreed definition, which is part of the governance problem. In practice, three rough signals are used together.
The first is compute. Frontier models are trained using a quantity of computing power that only a few organizations can afford, so the size of a training run becomes a proxy for capability. The second is benchmark performance: a model that posts leading scores across reasoning, coding, and knowledge tests is treated as frontier almost by definition. The third, and the one regulators care about most, is dangerous capability. A model that can find network vulnerabilities, assist with bioweapon design, or run long autonomous tasks is frontier regardless of where it lands on a leaderboard.
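The three signals can be combined into a rough screening rule. What follows is a minimal sketch under stated assumptions, not any regulator's test. The compute threshold and leaderboard cutoff are hypothetical placeholders, because no agreed values exist. Training compute is estimated with the common rule of thumb for dense transformers of about 6 FLOP per parameter per training token. The capability tag `network_vulnerability_discovery` and the model names are made-up labels. The sketch treats any one signal as sufficient, since a dangerous capability alone makes a model frontier regardless of its leaderboard position.

```python
from dataclasses import dataclass

# HYPOTHETICAL placeholder values: no agreed definition of "frontier" exists.
COMPUTE_THRESHOLD_FLOP = 1e26   # hypothetical training-compute cutoff
LEADERBOARD_CUTOFF = 3          # hypothetical: top-3 overall counts as "leading"


@dataclass
class ModelProfile:
    name: str
    params: float                            # parameter count
    training_tokens: float                   # tokens seen during training
    leaderboard_rank: int                    # 1 = best across reasoning/coding/knowledge tests
    dangerous_capabilities: frozenset[str]   # hypothetical capability tags from evaluations


def estimated_training_flop(params: float, tokens: float) -> float:
    """Rule-of-thumb estimate for dense transformers: ~6 FLOP per parameter per token."""
    return 6.0 * params * tokens


def frontier_signals(m: ModelProfile) -> dict[str, bool]:
    """Evaluate each of the three signals independently."""
    return {
        "compute": estimated_training_flop(m.params, m.training_tokens) >= COMPUTE_THRESHOLD_FLOP,
        "benchmark": m.leaderboard_rank <= LEADERBOARD_CUTOFF,
        "dangerous_capability": len(m.dangerous_capabilities) > 0,
    }


def is_frontier(m: ModelProfile) -> bool:
    """Any single signal counts as sufficient (one possible combination rule, assumed here)."""
    return any(frontier_signals(m).values())


if __name__ == "__main__":
    # Hypothetical example profiles, not real models.
    big = ModelProfile("example-large", 1e12, 2e13, 12, frozenset())
    small_risky = ModelProfile("example-small-cyber", 7e9, 2e12, 40,
                               frozenset({"network_vulnerability_discovery"}))
    small_plain = ModelProfile("example-small", 7e9, 2e12, 40, frozenset())
    for m in (big, small_risky, small_plain):
        print(m.name, is_frontier(m), frontier_signals(m))
```

Running it flags the first profile on compute alone and the second on dangerous capability alone. The third, small and benign, falls below every line. Changing either hypothetical constant moves the line, which is the definitional problem this entry describes.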
That third signal is why Anthropic’s disclosure about its Mythos model mattered. A model unusually good at finding network vulnerabilities is frontier in the sense that policy cares about, because the capability itself is the hazard.
Why It Matters Right Now
The frontier label has become the unit of regulation. Lawmakers cannot sensibly govern “AI” as a whole, because AI now includes everything from autocomplete to autonomous research agents. So the regulatory strategy has narrowed to the frontier: write rules for the dozen or so most capable systems and leave the rest mostly alone.
This is why the May 2026 executive order spent its energy on a pre-launch testing window for frontier models rather than a blanket AI law. The bet is that risk concentrates at the top of the capability curve, so oversight should too. Whether that bet holds depends entirely on whether “frontier” can be defined precisely enough to enforce, and it currently cannot.
How TWO Uses It
At TWO we treat “frontier model” as a risk word, not a marketing word. When a vendor calls its product frontier, we hear a claim about consequences, not quality, and we read it accordingly.
For an operator, the term sharpens one specific decision: which model to actually build on. Frontier models are the most capable option, but they are also the most expensive, the most likely to change under you, and the most likely to attract new compliance requirements mid-project. A frontier model is the right choice when the task genuinely needs leading-edge reasoning. It is the wrong choice when a cheaper, more stable model a tier down would do the same job.
Consider a concrete case. You are choosing a model for a workflow that will run for a year. The frontier option benchmarks two points higher. The non-frontier option costs a third as much and is far less likely to become the subject of a pre-deployment evaluation requirement or a cyber-permissive-model disclosure that forces a sudden change. Unless those two benchmark points decide your product, the non-frontier model is usually the operator’s answer. Frontier is where the research happens. It is not where most real work should live.
What to Watch Next
Watch for a definition. The single biggest open question is whether any government settles on an enforceable threshold for what counts as frontier, likely tied to training compute or a specific dangerous-capability test. The moment that line is drawn, the voluntary frameworks of today become candidates for hard law, and the compute-commitment decisions labs are making now will determine who lands above it.