The Wise Operator

Record and Replay

An AI skill creation pattern in which an agent watches a user complete a task once, then saves the demonstration as a reusable, parameterized routine the agent can execute later without further instruction.


What It Is

Record and Replay is a pattern for teaching software agents by demonstration. Instead of describing what you want the agent to do in a prompt, you turn on a recording mode, complete the task yourself one time, and the agent captures the actions you took, the interfaces you touched, and the inputs you provided. The recording is then saved as an editable skill file the agent can replay later, with variables substituted in for the parts that change between runs. The pattern was a research curiosity for years; OpenAI shipped it to the consumer Codex app on Mac on June 20, 2026, making it the first frontier lab to roll the capability out to paying ChatGPT subscribers.

The idea is older than AI. Spreadsheet macros worked this way in the 1980s. Robotic process automation tools sold the pattern to enterprises for a decade. What the agentic version adds is that the recording is no longer a rigid script. The saved skill is a structured set of intent and steps the model can adapt when the interface changes slightly, when an input differs, or when an unexpected dialog appears. The recording captures intent. The model fills in the rest.

How It Actually Works

The recording layer watches three things at once: the user’s clicks and keystrokes, the visual state of the screen at each step, and the natural-language reasoning the user supplies during the recording. OpenAI’s Mac implementation writes the result to a Markdown file with YAML frontmatter, then offers the user a chance to edit it. Variables that appeared in the recording, like the ticket title or the recipient’s email, are marked as inputs. Steps that involved a fixed sequence, like opening a specific menu, are marked as routines. The resulting file is portable: it sits in the user’s repository like any other code artifact, can be version-controlled, and can be shared with teammates who have the same agent.
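To make the file shape concrete, here is a minimal Python sketch of a skill file with frontmatter and marked inputs. The field names (`name`, `inputs`), the `$variable` placeholder syntax, and the helper functions are illustrative assumptions, not OpenAI's actual schema, and the parser handles only flat `key: value` lines rather than full YAML.

```python
from string import Template

# Hypothetical skill file. The field names and $placeholders are assumptions
# made for illustration; they are not the real Codex skill format.
SKILL_TEXT = """---
name: file-ticket
inputs: ticket_title, recipient_email
---
1. Open the ticket form.
2. Enter the title "$ticket_title".
3. Send a confirmation to $recipient_email.
"""


def parse_skill(text):
    """Split a skill file into a metadata dict and a body string.

    Only flat `key: value` frontmatter lines are handled (not full YAML).
    """
    _, frontmatter, body = text.split("---\n", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    # Turn the comma-separated input list into a list of names.
    meta["inputs"] = [
        name.strip() for name in meta.get("inputs", "").split(",") if name.strip()
    ]
    return meta, body


def render_skill(text, **values):
    """Substitute the declared inputs into the recorded steps."""
    meta, body = parse_skill(text)
    missing = [name for name in meta["inputs"] if name not in values]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    return Template(body).substitute(values)


if __name__ == "__main__":
    print(render_skill(
        SKILL_TEXT,
        ticket_title="Login page times out",
        recipient_email="ops@example.com",
    ))
```

Because the skill is plain text, a diff in version control shows exactly which step or input changed between revisions.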

At runtime, the agent reads the saved skill, accepts the variable inputs, and executes the steps using whatever interface is in front of it. The replay is not pixel-perfect playback. The agent uses its computer-use capabilities to handle UI drift, unexpected dialog boxes, and small layout shifts. If the original recording clicked a button at coordinates (340, 220), the replay clicks the button by its label even if the coordinates have moved.
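The difference between coordinate playback and label-based replay can be sketched in a few lines of Python. The `Element` and `RecordedClick` types and the `resolve_click` function are hypothetical stand-ins; a real agent would work from screenshots or an accessibility tree rather than a tidy list of labeled elements.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A clickable element on the current screen (hypothetical model)."""
    label: str
    x: int
    y: int


@dataclass
class RecordedClick:
    """A click captured during recording: what was clicked, and where."""
    label: str
    x: int
    y: int


def resolve_click(step, screen):
    """Return the coordinates to click now, matching by label.

    The recorded coordinates are ignored; they only describe where the
    button happened to be on the day of the recording.
    """
    for element in screen:
        if element.label == step.label:
            return (element.x, element.y)
    return None  # no match; a real agent would re-inspect the screen or ask


if __name__ == "__main__":
    recorded = RecordedClick(label="Submit", x=340, y=220)
    # The layout has shifted: a different button now sits at (340, 220).
    drifted_screen = [Element("Cancel", 340, 220), Element("Submit", 360, 410)]
    print(resolve_click(recorded, drifted_screen))  # prints (360, 410)
```

A pixel-perfect replay would have clicked Cancel here, which is the failure mode label-based resolution avoids.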

Why It Matters Right Now

The bottleneck in agentic adoption has not been model capability. It has been the cost and skill of writing the prompt that gets an agent to do work reliably. Record and Replay removes that bottleneck for a real class of tasks. The user no longer has to translate a workflow into prose. They can simply show the agent and let the agent write the skill on their behalf.

This matters because most operator workflows are not code. They are sequences of clicks across half a dozen tools that the operator has never thought to describe in writing. The pattern lets a non-developer turn muscle memory into automation without learning a new vocabulary.

How TWO Uses It

We use Record and Replay editorially as the canonical example of “show, don’t tell” reaching consumer AI. The internal test is this: a feature that requires you to learn the model’s language is a feature the model has not yet learned yours. Record and Replay reverses that direction. The model meets the operator where the operator already lives, on the screen they already use. We rank consumer AI releases against that bar.

The operator decision is which of your weekly workflows is actually worth recording. Not every repetition is a candidate. The good ones share three traits: they happen at least twice a week, they involve clicking through interfaces nobody enjoys, and the steps are roughly the same every time with only small inputs changing. If a workflow fails any of those tests, the recording will be brittle, and the agent will burn more time fixing the replay than the workflow would have taken. Scott runs the test before every “automate this” reflex: would I describe this to a new hire in three sentences? If yes, it is a recording candidate. If no, the workflow is not yet stable enough to capture.
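The three traits can be written down as a quick checklist. This Python sketch is only one way to encode them: the `Workflow` fields and the `is_recording_candidate` name are made up for illustration, and the two yes/no fields remain judgment calls the operator has to make.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    """A candidate workflow, described by the three traits (hypothetical fields)."""
    name: str
    runs_per_week: float   # how often the workflow happens
    tedious_ui: bool       # clicking through interfaces nobody enjoys
    steps_stable: bool     # same steps each time, only small inputs change


def is_recording_candidate(workflow):
    """Return True only if the workflow passes all three tests."""
    return (
        workflow.runs_per_week >= 2
        and workflow.tedious_ui
        and workflow.steps_stable
    )


if __name__ == "__main__":
    candidates = [
        Workflow("expense report upload", 3, True, True),
        Workflow("quarterly board deck", 0.08, True, False),
    ]
    for workflow in candidates:
        verdict = "record it" if is_recording_candidate(workflow) else "skip for now"
        print(f"{workflow.name}: {verdict}")
```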

A Concrete Operator Scenario

A growth lead at a 20-person company sends a weekly Slack update to four channels every Friday. The message pulls last week’s traffic numbers from Plausible, copies a screenshot, and tags the same three teammates each time. The lead has done this every Friday for a year. The recording captures the entire flow in twelve minutes: open Plausible, copy the chart, switch to Slack, paste in channel one, tag, repeat. The saved skill takes three inputs: the date range, the chart name, and a one-sentence editorial note. The next Friday, the lead types the three inputs in eight seconds. The skill does the rest in ninety. The Friday update goes from a twenty-minute job to under two minutes for the next forty Fridays.
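The arithmetic behind the scenario is worth checking. This short Python sketch uses only the figures given above and makes one assumption: the twelve-minute recording session is counted as pure one-time overhead.

```python
# Figures taken from the scenario above.
MANUAL_SECONDS = 20 * 60     # the twenty-minute manual job
RECORDING_SECONDS = 12 * 60  # one-time recording (assumed to be pure overhead)
REPLAY_SECONDS = 8 + 90      # typing the three inputs, then the skill run
FRIDAYS = 40

saved_per_run = MANUAL_SECONDS - REPLAY_SECONDS
net_saved = FRIDAYS * saved_per_run - RECORDING_SECONDS

print(f"Each replay takes {REPLAY_SECONDS} s and saves {saved_per_run} s")
print(f"Net saving over {FRIDAYS} Fridays: {net_saved / 3600:.1f} hours")
```

Each replay takes 98 seconds, so the recording pays for itself on the first run, and the net saving over forty Fridays comes to roughly twelve hours.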

What to Watch Next

The capability is currently Mac-only and limited to ChatGPT Plus, Pro, Business, Enterprise, and Edu plans outside the EEA, UK, and Switzerland. Watch for two signals. The first is a Windows port, which would expand the addressable base by an order of magnitude. The second is a shared skill marketplace where users can publish skills the way they publish prompts. The day a recording made by one operator can be installed by another with one click is the day Record and Replay stops being a feature and becomes a platform.