Computer-Use Model
A foundation model purpose-built to operate software the way a person does, by clicking, typing, navigating menus, and manipulating files, rather than producing chat or code.
What It Is
A computer-use model is a foundation model trained specifically to operate the graphical interfaces of software the way a human operator would. It looks at a screen, decides what to click, types into fields, opens menus, and moves files between applications. That is a different job from a chat model (which produces text) or a coding model (which produces code), and it requires training data and reinforcement signals tuned to screen pixels, mouse paths, and the messy state machines of real desktop and browser software.
The category gained sharp definition this week when Standard Intelligence, a six-person startup, raised $75 million from Sequoia and Spark Capital, with Andrej Karpathy participating as an angel, to scale FDM-1, a model the team demoed designing a metal component in a CAD application and, after an hour of fine-tuning, learning to drive a vehicle through a website’s controls. The point of FDM-1 is not that it can chat about CAD. The point is that it can drive CAD.
In practice, computer-use models often combine a vision encoder that reads the screen, an action policy that proposes the next click or keystroke, and a planning loop that sequences those actions toward a goal. Vendors are investing here because the cost curve of “have an agent click through legacy software for you” looks very different from the cost curve of “have a human do it,” provided the model is reliable enough to leave alone.
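To make that concrete, here is a minimal sketch of the observe-act loop. It is an illustration, not any vendor’s API: `capture_screen`, `propose_action`, and `execute` are hypothetical stand-ins for the screenshot input, the action policy, and OS-level input injection.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # e.g. "click", "type", "done"
    x: int = 0
    y: int = 0
    text: str = ""

def capture_screen() -> bytes:
    """Stand-in for a real screenshot call."""
    return b""  # replace with a platform screenshot API

def propose_action(screen: bytes, goal: str, history: list[Action]) -> Action:
    """Stand-in for the model: vision encoder plus action policy."""
    return Action("done")  # replace with a call to the vendor's model

def execute(action: Action) -> None:
    """Stand-in for OS-level mouse and keyboard injection."""
    pass  # replace with real input injection

def run(goal: str, max_steps: int = 50) -> list[Action]:
    """Planning loop: look at the screen, ask the policy for one
    action, execute it, repeat until the model signals completion."""
    history: list[Action] = []
    for _ in range(max_steps):
        action = propose_action(capture_screen(), goal, history)
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history
```

The property worth noticing is that the model sees a fresh screenshot on every step, which is what lets it adapt when the interface changes underneath it rather than replaying a brittle script.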
Why It Matters
For operators, the structural shift is that a large fraction of internal back-office work today is gated by software interfaces that were designed for humans, not APIs. ERP screens, claims-processing applications, CAD packages, and government portals all assume someone will sit down and click through them. A computer-use model that works reliably collapses the cost of that work and changes what counts as automatable.
That has two business consequences worth tracking. First, the traditional RPA (robotic process automation) category, which scripts the same kind of clicking but breaks every time a vendor changes a button, gets repriced when a model can adapt to interface changes on the fly. Second, software vendors lose part of the lock-in that came from owning the only viable user interface to their data; if a model can drive your competitor’s UI as easily as yours, your moat shifts elsewhere.
It also reshapes the security conversation. A computer-use model with credentials to your laptop is a different threat model from a chatbot with no hands. Vendors and buyers are both still figuring out how to gate, log, and revoke that kind of access at scale.
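One plausible shape for that gating, continuing the hypothetical sketch above (same `Action` type and `execute` stub), is to log every proposed action to an append-only trail and execute only allowlisted kinds. The allowlist contents and log path here are illustrative, not any vendor’s scheme.

```python
import json
import time

# Reuses the hypothetical Action and execute() from the loop above.
ALLOWED_KINDS = {"click", "type", "scroll"}   # e.g. no shell, no file deletion
AUDIT_LOG = "agent_audit.jsonl"

def gated_execute(action: Action, session_id: str) -> None:
    """Log every proposed action, then execute only allowlisted kinds."""
    record = {
        "ts": time.time(),
        "session": session_id,
        "action": vars(action),
        "allowed": action.kind in ALLOWED_KINDS,
    }
    with open(AUDIT_LOG, "a") as f:           # append-only trail for review
        f.write(json.dumps(record) + "\n")
    if not record["allowed"]:
        raise PermissionError(f"action kind {action.kind!r} is not permitted")
    execute(action)
```

Tagging every record with a session identifier also makes revocation tractable: killing access becomes invalidating a session credential rather than hunting down a running process.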
In Practice
If you are evaluating a computer-use model for your own work, run one bounded test before you commit. Pick a single repetitive task in your business that is gated by a clunky software interface, something like reformatting CSVs in a vendor portal, downloading reports out of an ERP, or filling claim forms in a government system. Time how long a human takes today. Then have the model do the same task end to end, on real data, with the model’s logs visible.
The questions you want answered are concrete. Did the model finish the task without intervention? How often did it ask for help, and on what kinds of steps? When the underlying software changed, did the model adapt or did it fail silently? Is there an audit trail you would be willing to show a compliance officer?
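Those questions can be answered mechanically if you instrument the test. Below is a minimal harness under stated assumptions: `run_once` is a hypothetical callable that performs the task end to end and returns how many times a human had to step in, and the result schema is illustrative.

```python
import time
from statistics import median

def bounded_test(run_once, human_minutes: float, runs: int = 5) -> dict:
    """Run the agent several times on the same task and tally the
    numbers the questions above ask about."""
    results = []
    for _ in range(runs):
        start = time.time()
        try:
            interventions = run_once()        # human assists during the run
            ok = True
        except Exception:                     # a failed run lands here
            interventions, ok = None, False
        results.append({
            "ok": ok,
            "minutes": (time.time() - start) / 60,
            "interventions": interventions,
        })
    finished = [r for r in results if r["ok"]]
    return {
        "success_rate": len(finished) / runs,
        "median_minutes": median(r["minutes"] for r in finished) if finished else None,
        "human_minutes": human_minutes,       # the baseline you timed by hand
        "runs": results,
    }
```

Running the same task several times is the point of the design: a single polished demo cannot distinguish a flaky agent from a reliable one, but five runs against a timed human baseline can.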
If the model passes that test on one task, the operator move is not to roll it out across the organization immediately. The move is to identify the next two tasks where the same skill applies, run the same bounded test, and only then start retiring the human-in-the-loop step. Computer-use models will reach production reliability unevenly, by surface and by application. Be the operator who knows where they work and where they do not, before your vendor sells you a license that assumes they work everywhere.