I've spent the last nine months using AI coding tools (Claude Code, Cursor, Gemini CLI, Amp, Codex, and others) on my own projects. I'm currently between jobs, which means I have no corporate agenda and no stake in any of these companies.
I have opinions. Some of them might even survive the week.
Part 1: The Harness Is the Product
GPT-5.4 is very good with Cursor. Surprisingly good. I don't even see it showcased this well in ChatGPT, which is OpenAI's own product. That's a tell.
The most interesting thing happening in AI right now isn't the models, it's the harnesses acting as the integration layer: tool calling, UX, agentic orchestration. A great model in a mediocre harness loses to a good model in a great harness. Gemini's models are competitive but feel underwhelming because Google's tooling can't showcase them, which is presumably why they acquired Windsurf and relaunched it as Antigravity. Claude's models shine brightest through Claude Code. The engine matters, but nobody buys an engine.
Hot take: The model is the engine. The harness is the car.
This has implications for where moats form. For a while it looked like scale and training compute were the only defensible positions. That's still true at the absolute frontier, but below that line, models are converging fast enough that harness quality dominates the user experience. Cursor and Claude Code figured this out early. The companies that win will be the ones treating the model as a component and the harness as the product, which is a deeply uncomfortable position for labs that spent billions training the models.
It's worth being specific about what a modern harness actually does, because the shift is easy to miss. Early AI coding tools worked like this:
prompt → completion
You asked a question, got an answer, tried again if it was wrong.
Modern coding systems work like this:
observe repository → plan change → edit files → run tests → inspect errors → iterate
That loop is subtle but it changes everything. The system isn't generating code snippets; it's participating in a continuous cycle over a real project. It reads the codebase, modifies multiple files, runs commands, and adjusts based on results. Less autocomplete, more collaborator.
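The shape of that loop is simple enough to sketch. This is a minimal, hypothetical version (the `propose_edit` and `apply_edit` callbacks stand in for the model and the harness's file editor; real tools are far more elaborate):

```python
import subprocess

def run_tests(cmd):
    """Run the project's test command and capture pass/fail plus output."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def agent_loop(propose_edit, apply_edit, test_cmd, max_iters=5):
    """Observe -> edit -> test -> iterate until tests pass.

    `propose_edit` stands in for the model: given the latest test
    output, it returns an edit (or None to give up). `apply_edit`
    stands in for the harness: it writes that edit into the tree.
    """
    passed, output = run_tests(test_cmd)     # observe repository state
    for _ in range(max_iters):
        if passed:
            return True
        edit = propose_edit(output)          # model plans a change
        if edit is None:
            break
        apply_edit(edit)                     # harness edits files
        passed, output = run_tests(test_cmd) # inspect errors, iterate
    return passed
```

The interesting part is that the model never has to be right on the first try; the loop converges as long as the test output carries enough signal.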
And here's the thing: a lot of the agentic stuff IS just this loop with different tools plugged in. An agent observes state, generates a command or script, runs it, inspects the output, decides what to do next. Even tasks that aren't obviously programming often reduce internally to "write some python, call an API, parse the result, continue." If you solve the coding harness, you've solved a large chunk of the general agentic problem. This is something Anthropic realized relatively recently and took advantage of with CoWork.
Hot take: Agents are mostly code-writing loops with tool access.
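To make the "different tools plugged in" point concrete, here's a toy generalization of the same loop: the model emits tool calls, the harness dispatches them, and each observation feeds the next turn. Everything here (the tool names, the action format) is invented for illustration:

```python
def agent_step(model, tools, history):
    """One turn: the model looks at the history and either calls a
    tool or declares it is done."""
    action = model(history)            # e.g. {"tool": "run_python", "args": {...}}
    if action["tool"] == "done":
        return history, True
    result = tools[action["tool"]](**action["args"])
    history = history + [(action, result)]  # observation feeds the next turn
    return history, False

def run_agent(model, tools, max_steps=10):
    """Drive the loop until the model finishes or the step budget runs out."""
    history, done = [], False
    for _ in range(max_steps):
        history, done = agent_step(model, tools, history)
        if done:
            break
    return history
```

Swap the `tools` dict from "run tests" to "send a message" or "call an API" and you've moved from a coding agent to a general one without touching the loop.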
This also means the IDE is quietly becoming an agent runtime. Editors already provide everything agents need: structured projects, deterministic execution environments, version control, feedback loops. It's not a coincidence that the best agent experiences are happening inside coding tools or on CLIs rather than chat windows.
Hot take: The IDE is becoming the operating system for AI agents.
The Google tragedy
Google is the most painful case study: they have the research, the infrastructure, the talent, and arguably the best foundation model team on earth, yet they keep fumbling the integration layer. The Windsurf acquisition and Antigravity launch tell the story: Google paid $2.4 billion to license Windsurf's code and hire its founders, then launched Antigravity four months later.
That's a strange failure mode for the company that built Gmail, Maps, and Search. Something broke culturally.
I want Google to be good, and to be fair, they have adjacent AI products that are very good in my experience. NotebookLM is great, and AI search is free and genuinely useful. The whole Google Docs ecosystem works well with AI. Google's strength has always been horizontal platform plays, and those products reflect that.
But the coding-centric agentic future is a vertical integration game and Google keeps losing it. Their model quality isn't the problem; their harness is.
If harnesses are becoming the product, the next question is: who builds them?
Part 2: Open Source and the Harness Layer
The leading harnesses right now are proprietary. Cursor is proprietary. Claude Code is proprietary. Antigravity is a $2.4 billion proprietary fork. So: closed source wins?
Not so fast. It's worth noting that the model layer hasn't been won by open source either, despite the narrative. Open-weights models from Meta and others (mostly Chinese labs) are competitive, but the frontier is still closed, and Meta's releases are clearly a strategic weapon against Google and OpenAI dressed up as generosity.
The harness layer is more interesting because it's more contested. OpenClaw blows a hole in the story. Formerly Clawdbot, then Moltbot, it went from 9,000 to 60,000+ GitHub stars in days and now sits above 250,000. It's not a coding harness in the Cursor sense; it's a general agentic harness with message routing across WhatsApp, Telegram, Discord, and dozens of other channels, autonomous task execution, 50+ integrations, running 24/7 on your own hardware. OpenCode is doing something similar for the coding-specific case.
These projects are moving fast, arguably faster than their closed counterparts on raw feature velocity.
The tradeoff is risk. OpenClaw's attack surface is enormous. Security researchers have mapped it against every category in the OWASP Top 10 for Agentic Applications. There are documented cases of agents acting well beyond user intent; one created a dating profile autonomously, which is either impressive or terrifying depending on your perspective. Its creator, Peter Steinberger, joined OpenAI and the project is moving to an open source foundation. That could mean more institutional backing, or it could mean founder departure stalls momentum; it's too early to tell.
So the real picture isn't "open source is winning" or "open source is losing." It's that closed harnesses and open source harnesses are optimizing on different axes:
- Closed (Cursor, Claude Code): safety, polish, tight model integration
- Open (OpenClaw, OpenCode): extensibility, speed, community velocity, accepting more risk
Both are viable today. The question is which axis matters more as agentic tools move from developers to everyone else. My guess: the closed harnesses win the mainstream because most people don't want to manage their own attack surface. But open source keeps pushing the bleeding edge, and ideas flow from bleeding edge to mainstream on roughly a three-month delay.
Hot take: The open source harness ecosystem is about three months ahead of commercial tools. The ideas show up there first; the polish shows up later in closed products.
Hot take: Models may become commodities. Harnesses are the product.
This might be the first major technology wave where open source doesn't clearly own the infrastructure layer, or it might not. Ask me again in a few weeks, when this take will be outdated.
Part 3: Code Is the New Assembler (and Other Predictions)
Code is becoming the new assembler. Nobody writes assembler anymore, but it didn't disappear; it just got generated below the surface. Code is heading the same way. The skill is shifting from "can you write code" to "can you specify intent precisely enough that code gets generated correctly." That's closer to systems architecture than traditional programming.
The agentic loop (human specifies, model generates, harness orchestrates, human validates) is the new unit of work. It now applies well beyond coding: any task that can be decomposed into tool calls and validation steps is fully in agentic territory. Code is just where it showed up first, because code is the easiest thing to validate (it either runs or it doesn't, mostly).
The competence amplifier
Here's something I didn't expect. Over the past nine months I've shipped working tools and apps in Go and JavaScript, backed by Postgres. I don't write Go or JavaScript, although I can read them. I've never administered Postgres in anger. But I have 25+ years of systems experience, and it turns out that's enough. I can read the generated code, spot architectural problems, evaluate whether the error handling makes sense, and steer the iteration loop. I can't write idiomatic Go from scratch, but I can tell when the AI-generated Go is doing something stupid.
This is the real shape of "code as assembler." The AI handles the syntax and idiom; the human provides the judgment layer. My experience with distributed systems, failure modes, and operational patterns transfers directly even when I don't know the language. The harness doesn't replace expertise, it makes expertise portable across languages and frameworks in a way that wasn't possible before.
This has two implications.
- For experienced developers, your value shifts from "I know language X" to "I know how systems work." That's a bigger, more durable moat.
- For non-developers (product managers, designers, domain experts) the barrier to building working software just dropped dramatically. They don't need to learn Go or Python. They need to learn how to specify what they want clearly enough that the loop converges. That's a different skill, and a lot of people already have it without realizing it.
Hot take: AI coding tools don't replace developers. They make systems thinking portable across any language or framework.
The model picker disappears
One near-term prediction: the model picker goes away. Nobody types http:// anymore. Nobody picks which CDN node serves their webpage. The system picks. Model selection is heading the same way.
The fact that I currently care whether I'm running Sonnet 4.6 or GPT-5.4 is a sign of immaturity, not a feature. In two years, maybe less, the harness routes tasks dynamically:
cheap model → routine edits, boilerplate
reasoning model → planning, architecture
coding model → implementation
verifier model → checking, testing
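That routing table is a few lines of code once someone builds a decent task classifier. A toy sketch (the model names and keyword classifier are entirely made up; a real harness would classify with a model, not string matching):

```python
# Hypothetical routing table: task category -> model tier.
ROUTES = {
    "boilerplate": "cheap-model",
    "planning":    "reasoning-model",
    "implement":   "coding-model",
    "verify":      "verifier-model",
}

def classify(task: str) -> str:
    """Crude stand-in for the harness's task classifier."""
    t = task.lower()
    if "plan" in t or "design" in t:
        return "planning"
    if "test" in t or "check" in t:
        return "verify"
    if "rename" in t or "boilerplate" in t:
        return "boilerplate"
    return "implement"

def route(task: str) -> str:
    """Pick a model for a task; the user never sees this decision."""
    return ROUTES[classify(task)]
```

The hard part isn't the dispatch, it's making the classifier good enough that users stop wanting to override it.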
The user interacts with one interface. The model choice becomes an implementation detail, like which CPU core your thread runs on. That'll be a sign the ecosystem has grown up.
Hot take: The model menu will eventually disappear.
(The model picker sticking around for power users and experts is fine. I'm talking about the default experience.)
The rate of change problem
The uncomfortable corollary to all of this is that the rate of change is stupid fast. Expertise about specific model behavior expires in days to weeks. Any opinion formed about a model's capabilities on a given Tuesday is stale by the following Tuesday. Including the opinions in this post, presumably.
The durable skills are meta-skills: evaluating models, designing harnesses, specifying intent, thinking in systems. The specific knowledge of "Claude is good at X but bad at Y" or "GPT-5.4 handles long context better than..." is transient. It's useful for a week, maybe two, then something ships and the landscape shifts.
This favors a certain kind of engineer: the senior generalist who thinks in systems, evaluates tradeoffs, and adapts fast, not the specialist who knows one tool deeply. This is convenient for me, I realize, but I think it's true regardless.
Hot take: The most valuable AI skill is no longer prompting. It's building the loop around the model.
Where this lands
I don't have a neat conclusion. These are hot takes and some of them will age badly. But the harness-as-product thesis feels durable to me, the open source picture is genuinely unsettled, and "code as assembler" is more a description of what's already happening than a prediction.
Interesting times.