Gemini 3.5 Flash Adds Built-In Computer Use, Hits 78.4 on OSWorld

For two years the agent conversation has been about what models can say. This week it became about what they can do with their own hands on your screen. Google moved a capability that used to live in a separate, slower model into the engine that already answers most of its developer traffic. The line between a model that talks and a model that acts just got thinner, and the operator question changed with it.

The Lead: Google Builds Computer Use Into Gemini 3.5 Flash

Google made computer use a built-in tool inside Gemini 3.5 Flash, letting developers build agents that see, reason, and act across browser, mobile, and desktop without reaching for a separate model.

The capability used to live in a standalone Gemini 2.5 computer-use model, a slower side door you called only when you needed an agent to click. Now it sits in the main Flash model that already serves most of Google’s developer volume. On OSWorld, a benchmark that tests agents operating real software, Gemini 3.5 Flash scores 78.4. That puts it ahead of the older Gemini 3 Flash and GPT-5.4 mini, 0.3 points behind GPT-5.5 at 78.7, and 5 points under Anthropic’s Opus 4.8 at 83.4. With those three scores within five points of one another, benchmark rank matters less than the structural change underneath.

That change is what computer use becomes when it stops being a specialty model and becomes a default tool. A developer no longer chooses between a fast model that chats and a slow one that acts. The same call can reason about a task and then move the mouse. Google ships it with adversarial training plus two optional enterprise safeguards against prompt injection, because a model with hands on a screen is a different risk than a chatbot with none.

The question nobody is asking loudly enough: when acting becomes a tool on the default model rather than a deliberate choice, who is left watching the agent work? Google is selling the capability through the Gemini API and its Gemini Enterprise Agent Platform (The Decoder). The safeguards are optional. The oversight is yours to build.
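One way to picture "oversight you build yourself" is a gate that sits between the agent and the screen. The sketch below is illustrative only and does not use the Gemini API: `ProposedAction`, `OversightGate`, and the `ask_human` callback are hypothetical names invented here, and the allowlist of low-risk action kinds is an assumption about how a team might draw the line.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class ProposedAction:
    """Hypothetical shape of one step an agent wants to take."""
    kind: str    # e.g. "read", "scroll", "click", "type"
    target: str  # human-readable description of what is acted on


@dataclass(frozen=True)
class AuditEntry:
    timestamp: str
    action: ProposedAction
    approved: bool
    reason: str


@dataclass
class OversightGate:
    """Approves allowlisted actions, escalates the rest, logs everything."""
    allowed_kinds: FrozenSet[str]
    ask_human: Callable[[ProposedAction], bool]
    log: List[AuditEntry] = field(default_factory=list)

    def review(self, action: ProposedAction) -> bool:
        if action.kind in self.allowed_kinds:
            approved, reason = True, "allowlisted"
        else:
            # Anything outside the allowlist waits for a person.
            approved = self.ask_human(action)
            reason = "human approved" if approved else "human denied"
        stamp = datetime.now(timezone.utc).isoformat()
        self.log.append(AuditEntry(stamp, action, approved, reason))
        return approved


if __name__ == "__main__":
    # Stand-in reviewer that denies everything it is asked about.
    gate = OversightGate(frozenset({"read", "scroll"}), ask_human=lambda a: False)
    gate.review(ProposedAction("read", "inbox"))
    gate.review(ProposedAction("click", "Send payment"))
    for entry in gate.log:
        print(entry.action.kind, entry.approved, entry.reason)
```

The design choice worth noticing is that denied actions are logged too. The record of what the agent wanted to do is often more useful to a reviewer than the record of what it was allowed to do.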

What It Means for You

The apps you already open are getting models that act on your behalf, not just answer your questions.

The most visible change is the one you will feel without configuring anything. OpenAI again updated GPT-5.5 Instant, the default model behind most ChatGPT conversations, framing the new version as better at reading the intent behind a question, handling complex constraints, and giving shopping and local recommendations. Paid users get it first, free users a day later. You did not pick this model. It is simply what answers you now, which is the whole point of a default.

The same shift shows up where the screen itself moves closer to your face. Meta launched three own-brand smart-glasses styles from $299, about $80 below Ray-Ban Meta Gen 2, plus a $399 Kylie Jenner collaboration. The frames debut Muse Spark, the first model from Meta Superintelligence Labs built for wearables, rolling out to existing Ray-Ban and Oakley Meta glasses by software update. Pedestrian turn-by-turn navigation and 14 new live-translation languages are coming. Early demos were notably glitchy, which is the honest part of the story.

“A default is a decision someone else made for you, running before you wake up.”

For the professional reader, the sharpest version is Perplexity’s Computer for Counsel, a version of its agent aimed at legal teams. It routes among 20-plus frontier models by task, bundles the legal-research tool Midpage plus connectors to Box, Clio, Docusign, Ironclad, and NetDocuments, and works inside Microsoft 365. Available today to Enterprise and Max subscribers, it is a concrete agent a working lawyer can put on a real matter this week.

What’s Moving Underneath

The week’s macro thread is control of the layers most users never see: the compiler beneath the chip, and the law above the interface.

Qualcomm confirmed an all-stock deal worth roughly $3.9 billion to acquire Modular, the AI developer-platform startup led by Chris Lattner. Yesterday the deal was reported as nearing. Today it is confirmed, and the new detail is the target: Modular builds at the software and compiler layer, the same layer that makes Nvidia’s CUDA hard to displace. Modular last raised at a $1.6 billion valuation in September 2025, so $3.9 billion is roughly a 2.4x step-up. The deal, expected to close in the second half of 2026, caps a Qualcomm spree that also includes Alphawave and Ventana; a pursuit of chip startup Tenstorrent has been reported separately.

The other layer is legal. Fresh guidance underscores that the EU AI Act’s Article 50 transparency obligations, disclosing when someone is interacting with AI and labeling AI-generated content, still take effect August 2, 2026, largely untouched by the Digital Omnibus package that pushes high-risk rules into 2027 and 2028. One carve-out: marking obligations for AI-generated audio, image, video, and text placed on the market before August 2 move to December 2, 2026. The European Parliament approved the omnibus amendments on June 16; Council adoption is expected in the coming weeks.

“Whoever owns the compiler and writes the disclosure rule owns the leash the agent runs on.”

The thread connecting the two is the same one running through the lead. As models gain hands, the contest moves to who owns the compiler they run on and who writes the rules for what they must disclose. None of this reaches your screen this week. All of it sets the terms under which next year’s agents are allowed to act, and on whose silicon.

One Tool Worth Knowing

Perplexity Computer for Counsel

This is the cleanest example yet of computer use aimed at a single profession rather than at developers in general. It is not a chatbot bolted onto a legal database. It is an agent that routes among 20-plus frontier models depending on the task, reaches into the systems a legal team already runs (Box, Clio, Docusign, Ironclad, NetDocuments, and 400-plus connectors), and operates inside Microsoft 365 where the work actually happens. The bet is that a lawyer’s day is gated by software interfaces, not by a shortage of answers, and that an agent which can navigate those interfaces is worth more than one that only talks.

If you run a legal or operations team, evaluate it the way you would any agent with access to live systems: bounded scope first. The code-touching next step is to connect it to one repository or matter-management system in a sandbox and watch its logs as it handles a single real task end to end, checking whether it asks for help on the right steps and leaves an audit trail you would show a compliance officer. The non-code next step is to write down, before the first run, the one task you are entrusting it with and the criteria by which you will judge the return, so you are measuring the agent against a decision you made rather than against the demo Perplexity showed you.
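For the code-touching path, the "write it down first, then judge the logs" habit can be sketched in a few lines. Everything here is an assumption for illustration: the log format (dictionaries with `timestamp`, `system`, and `action` keys), the action names, and the three criteria are hypothetical and are not anything Perplexity documents. Adapt them to whatever your sandbox actually records.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

LogEntry = Dict[str, str]  # assumed keys: "timestamp", "system", "action"


@dataclass(frozen=True)
class Criterion:
    name: str
    check: Callable[[List[LogEntry]], bool]


def asked_before(log: List[LogEntry], risky_action: str) -> bool:
    """True if every risky action was preceded by its own 'ask_human' step."""
    asked = False
    for entry in log:
        if entry["action"] == "ask_human":
            asked = True
        elif entry["action"] == risky_action:
            if not asked:
                return False
            asked = False  # each risky step needs a fresh approval
    return True


# Written down BEFORE the first run, so the agent is judged against
# your decision rather than against a demo.
CRITERIA = [
    Criterion("stayed in sandbox",
              lambda log: all(e["system"] == "sandbox" for e in log)),
    Criterion("asked before sending for signature",
              lambda log: asked_before(log, "send_for_signature")),
    Criterion("every step timestamped",
              lambda log: all(e.get("timestamp") for e in log)),
]


def evaluate_run(log: List[LogEntry], criteria: List[Criterion]) -> Dict[str, bool]:
    """Score one finished run against the pre-registered criteria."""
    return {c.name: c.check(log) for c in criteria}


if __name__ == "__main__":
    sample_log = [
        {"timestamp": "2026-01-01T09:00:00Z", "system": "sandbox", "action": "open_document"},
        {"timestamp": "2026-01-01T09:01:00Z", "system": "sandbox", "action": "ask_human"},
        {"timestamp": "2026-01-01T09:02:00Z", "system": "sandbox", "action": "send_for_signature"},
    ]
    for name, passed in evaluate_run(sample_log, CRITERIA).items():
        print(f"{name}: {'pass' if passed else 'FAIL'}")
```

The result is a per-criterion pass or fail rather than a single score, which is closer to what a compliance officer would ask to see.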

Wisdom Speaks

“His lord said unto him, Well done, thou good and faithful servant: thou hast been faithful over a few things, I will make thee ruler over many things: enter thou into the joy of thy lord.” Matthew 25:21, KJV

The Parable of the Talents is the master text on delegated agency. Authority is entrusted to servants who must act while the master is away, and the watching is built into the structure: there is a reckoning when he returns. The servant praised is the one who acted faithfully under entrusted authority. The one judged is the one who buried what he was given and did nothing. In an age handing real tasks to autonomous agents, the parable names a role we now occupy without a word for it. We have become the epitropos, the steward who holds another’s authority and must give an account for how it was used. Delegation was never abandonment. The agent acts; the operator answers.

“The eye sees not itself but by reflection, by some other things.” William Shakespeare, Julius Caesar, Act I, Scene 2

Shakespeare names the blind spot that Scripture’s reckoning assumes. No eye watches itself. No system fully audits its own action without an outside reflection to judge it by. As models gain the literal power to see and act across a screen, the urgent question shifts from whether they can see to who reflects their seeing back. A computer-use agent operating unattended is an eye with no mirror, and an unmirrored eye is exactly the third servant’s mistake in a new form: capability held without account.

Build the reflection before you hand over the screen. The question to carry this week is not what your agents can do, but who is watching them do it, and how you will know what they did.

Yesterday’s digest: OpenAI and Broadcom Unveil Jalapeño, OpenAI’s First Custom Inference Chip, on a lab owning its own silicon. Earlier this week: Gemini 2.5 Pro Deep Think Lands as Fable 5 Stays Behind the Ban, on capability leaps. Today’s move folds those threads together: a capability leap that lets models act, riding on a fight over whose silicon and compiler they run on, the same contest Jalapeño joined from the chip side.