I've spent the last nine months using AI coding tools on my own projects:
Claude Code, Cursor, Gemini CLI, Amp, Codex, and others. I'm currently
between jobs, which means I have no corporate agenda and no stake in any of
these companies.
I have opinions. Some of them might even survive the week.
Part 1: The Harness Is the Product
GPT-5.4 is very good with Cursor. Surprisingly good. I don't even see it
showcased this well in ChatGPT, which is OpenAI's own product. That's a
tell.
The most interesting thing happening in AI right now isn't the models, it's
the harnesses acting as the integration layer: tool calling, UX, agentic
orchestration. A great model in a mediocre harness loses to a good model in
a great harness. Gemini's models are competitive but feel underwhelming
because Google's tooling can't showcase them, which is presumably why they
acquired Windsurf and relaunched it as
Antigravity. Claude's models shine brightest
through Claude Code. The engine matters, but nobody buys an engine.
Hot take: The model is the engine. The harness is the car.
This has implications for where moats form. For a while it looked like scale
and training compute were the only defensible positions. That's still true
at the absolute frontier, but below that line, models are converging fast
enough that harness quality dominates the user experience. Cursor and
Claude Code figured this out early. The companies that win will be the ones
treating the model as a component and the harness as the product, which is a
deeply uncomfortable position for labs that spent billions training the
models.
It's worth being specific about what a modern harness actually does, because
the shift is easy to miss. Early AI coding tools worked like this:
You asked a question, got an answer, tried again if it was wrong.
Modern coding systems work like this:
observe repository → plan change → edit files → run tests → inspect errors → iterate
That loop is subtle but it changes everything. The system isn't generating
code snippets; it's participating in a continuous cycle over a real project.
It reads the codebase, modifies multiple files, runs commands, and adjusts
based on results. Less autocomplete, more collaborator.
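That loop is easy to sketch. Here's a toy version in Python: the "model" is a stub that knows exactly one repair, purely to make the control flow concrete. Every name here is illustrative, not any real tool's API.

```python
# Toy version of the observe → plan → edit → run tests → iterate loop.
# stub_model stands in for an LLM call; it knows exactly one repair.

def run_checks(source):
    """'Run tests': execute the candidate code plus a check, report errors."""
    namespace = {}
    try:
        exec(source, namespace)
        assert namespace["add"](2, 3) == 5
        return True, None
    except Exception as e:
        return False, repr(e)

def stub_model(source, error):
    """Stand-in for the model: propose an edited version given the failure."""
    return source.replace("a - b", "a + b")

def agent_loop(source, max_rounds=3):
    for round_num in range(max_rounds):
        passed, error = run_checks(source)   # observe + run + inspect
        if passed:
            return source, round_num         # the loop converged
        source = stub_model(source, error)   # plan + edit, then iterate
    return source, max_rounds

fixed, rounds = agent_loop("def add(a, b):\n    return a - b\n")
```

The point isn't the stub; it's that the harness owns everything except the one line that calls the model.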
And here's the thing: a lot of the agentic stuff IS just this loop with
different tools plugged in. An agent observes state, generates a command or
script, runs it, inspects the output, decides what to do next. Even tasks
that aren't obviously programming often reduce internally to "write some
python, call an API, parse the result, continue." If you solve the coding
harness, you've solved a large chunk of the general agentic problem. This
is something Anthropic realized relatively recently and took advantage of
with
CoWork.
Hot take: Agents are mostly code-writing loops with tool access.
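A minimal illustration of that claim: swap the file-editing and test-running tools for any other callables and the loop shape doesn't change. The tool names and scripted steps below are made up; a real agent's model would choose them.

```python
# The same loop with different tools plugged in. A real agent's model
# would pick the (tool, argument) pairs; here they're scripted.

tools = {
    "search": lambda query: f"results for {query}",
    "python": lambda expr: str(eval(expr)),  # toy only: never eval untrusted input
}

def run_agent(steps):
    transcript = []
    for tool_name, arg in steps:                # decide what to do next
        output = tools[tool_name](arg)          # run the tool
        transcript.append((tool_name, output))  # inspect the result, continue
    return transcript

log = run_agent([("search", "train schedules"), ("python", "6 * 7")])
```

Adding a capability is adding an entry to the dict; the loop itself is untouched.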
This also means the IDE is quietly becoming an agent runtime. Editors
already provide everything agents need: structured projects, deterministic
execution environments, version control, feedback loops. It's not a
coincidence that the best agent experiences are happening inside coding
tools or on CLIs rather than chat windows.
Hot take: The IDE is becoming the operating system for AI agents.
The Google tragedy
Google is the most painful case study: they have the research, the
infrastructure, the talent, and arguably the best foundation model team on
earth, yet they keep fumbling the integration layer. The Windsurf
acquisition and Antigravity launch tell the story: Google paid $2.4 billion
to license Windsurf's code and hire its founders, then launched Antigravity
four months later.
That's a strange failure mode for the company that built Gmail, Maps, and
Search. Something broke culturally.
I want Google to be good, and honestly they do have adjacent AI products
that are very good in my experience. NotebookLM is great, and AI search is
free and genuinely useful. The
whole Google Docs ecosystem works well with AI. Google's strength has
always been horizontal platform plays, and those products reflect that.
But the coding-centric agentic future is a vertical integration game and
Google keeps losing it. Their model quality isn't the problem; their harness
is.
If harnesses are becoming the product, the next question is: who builds
them?
Part 2: Open Source and the Harness Layer
The leading harnesses right now are proprietary. Cursor is proprietary.
Claude Code is proprietary. Antigravity is a $2.4 billion proprietary fork.
So: closed source wins?
Not so fast. It's worth noting that the model layer hasn't been won by open
source either, despite the narrative. Open weights models from Meta and
others (mostly Chinese labs) are competitive but the frontier is still
closed, and Meta's stuff is clearly a strategic weapon against Google and
OpenAI dressed up as generosity.
The harness layer is more interesting because it's more contested.
OpenClaw blows a hole in the story.
Formerly Clawdbot, then Moltbot, it went from 9,000 to 60,000+ GitHub stars
in days and now sits at over 250,000. It's not a coding harness in the Cursor
sense; it's a general agentic harness with message routing across WhatsApp,
Telegram, Discord, and dozens of other channels, autonomous task execution,
50+ integrations, running 24/7 on your own hardware.
OpenCode is doing something similar
for the coding-specific case.
These projects are moving fast, arguably faster than their closed
counterparts on raw feature velocity.
The tradeoff is risk. OpenClaw's attack surface is enormous. Security
researchers have mapped it against every category in the OWASP Top 10 for
Agentic
Applications.
There are documented cases of agents acting well beyond user intent; one
created a dating profile autonomously, which is either impressive or
terrifying depending on your perspective. Its creator, Peter Steinberger,
joined OpenAI and the project is moving to an open source foundation. That
could mean more institutional backing, or it could mean the founder's
departure stalls momentum; it's too early to tell.
So the real picture isn't "open source is winning" or "open source is
losing." It's that closed harnesses and open source harnesses are
optimizing on different axes:
- Closed (Cursor, Claude Code): safety, polish, tight model integration
- Open (OpenClaw, OpenCode): extensibility, speed, community velocity,
accepting more risk
Both are viable today. The question is which axis matters more as agentic
tools move from developers to everyone else. My guess: the closed harnesses
win the mainstream because most people don't want to manage their own attack
surface. But open source keeps pushing the bleeding edge, and ideas flow
from bleeding edge to mainstream on roughly a three-month delay.
Hot take: The open source harness ecosystem is about three months
ahead of commercial tools. The ideas show up there first; the polish
shows up later in closed products.
Hot take: Models may become commodities. Harnesses are the product.
This might be the first major technology wave where open source doesn't
clearly own the infrastructure layer, or it might not. Ask me again in a few
weeks, when this take will be outdated.
Part 3: Code Is the New Assembler (and Other Predictions)
Code is becoming the new assembler. Nobody writes assembler anymore, but it
didn't disappear; it just got generated, below the surface. Code is
heading the same way. The skill is shifting from "can you write code" to
"can you specify intent precisely enough that code gets generated
correctly." That's closer to systems architecture than traditional
programming.
The agentic loop, where a human specifies, a model generates, a harness
orchestrates, and a human validates, is the new unit of work. It now applies
well beyond coding: any task that can be decomposed into tool calls and
validation steps is in agentic territory. Code is just where it showed up
first, because code is the easiest thing to validate (it either runs or it
doesn't, mostly).
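That "runs or it doesn't" property is mechanical enough to automate, which is exactly what the validation step of the loop does. A sketch, using only the standard library:

```python
# Validation by execution: run a candidate snippet in a clean subprocess
# and report pass/fail. This binary signal is what makes code the easiest
# domain for a specify → generate → validate loop.
import subprocess
import sys

def validate(snippet: str) -> bool:
    """Return True iff the snippet runs without raising."""
    proc = subprocess.run([sys.executable, "-c", snippet],
                          capture_output=True, text=True)
    return proc.returncode == 0

ok = validate("assert sum(range(5)) == 10")
bad = validate("assert sum(range(5)) == 11")
```

Most other domains (prose quality, design taste) have no equivalent of `returncode == 0`, which is why they lag.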
The competence amplifier
Here's something I didn't expect. Over the past nine months I've shipped
working tools and apps written in Go, JavaScript, and Postgres. I don't
write Go or JavaScript, although I can read them. I've never administered
Postgres in anger. But I have 25+ years of systems experience, and it turns
out that's enough. I can read the generated code, spot architectural
problems, evaluate whether the error handling makes sense, and steer the
iteration loop. I can't write idiomatic Go from scratch but I can tell when
the AI-generated Go is doing something stupid.
This is the real shape of "code as assembler." The AI handles the syntax
and idiom; the human provides the judgment layer. My experience with
distributed systems, failure modes, and operational patterns transfers
directly even when I don't know the language. The harness doesn't replace
expertise, it makes expertise portable across languages and frameworks in a
way that wasn't possible before.
This has two implications.
- For experienced developers, your value shifts from "I know language X"
to "I know how systems work." That's a bigger, more durable moat.
- For non-developers (product managers, designers, domain experts) the
barrier to building working software just dropped dramatically. They
don't need to learn Go or Python. They need to learn how to specify
what they want clearly enough that the loop converges. That's a
different skill, and a lot of people already have it without realizing it.
Hot take: AI coding tools don't replace developers. They make
systems thinking portable across any language or framework.
The model picker disappears
One near-term prediction: the model picker goes away. Nobody types
http:// anymore. Nobody picks which CDN node serves their webpage. The
system picks. Model selection is heading the same way.
The fact that I currently care whether I'm running Sonnet 4.6 or GPT-5.4 is
a sign of immaturity, not a feature. In two years, maybe less, the harness
routes tasks dynamically:
cheap model → routine edits, boilerplate
reasoning model → planning, architecture
coding model → implementation
verifier model → checking, testing
The user interacts with one interface. The model choice becomes an
implementation detail, like which CPU core your thread runs on. That'll be a
sign the ecosystem has grown up.
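A harness-side router along those lines could be as simple as a lookup table keyed by task type. The tier names below are invented placeholders, not real models or endpoints.

```python
# Routing tasks to model tiers inside the harness. The user sees one
# interface; the tier names here are placeholders.

ROUTES = {
    "routine_edit": "cheap-model",
    "planning":     "reasoning-model",
    "implement":    "coding-model",
    "verify":       "verifier-model",
}

def route(task_type: str) -> str:
    """Pick a model tier for a task; default to the cheap tier."""
    return ROUTES.get(task_type, "cheap-model")

choice = route("planning")
```

Real routers would classify the task dynamically rather than trust a label, but the user-facing contract is the same: one interface, no menu.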
Hot take: The model menu will eventually disappear.
(The model picker sticking around for power users and experts is fine. I'm
talking about the default experience.)
The rate of change problem
The uncomfortable corollary to all of this is that the rate of change is
stupid fast. Expertise about specific model behavior expires in days to
weeks. Any opinion formed about a model's capabilities on a given Tuesday
is stale by the following Tuesday. Including the opinions in this post,
presumably.
The durable skills are meta-skills: evaluating models, designing harnesses,
specifying intent, thinking in systems. The specific knowledge of "Claude is
good at X but bad at Y" or "GPT-5.4 handles long context better than..." is
transient. It's useful for a week, maybe two, then something ships and the
landscape shifts.
This favors a certain kind of engineer. The senior generalist who thinks in
systems, evaluates tradeoffs, and adapts fast. Not the specialist who knows
one tool deeply. This is convenient for me, I realize, but I think it's true
regardless.
Hot take: The most valuable AI skill is no longer prompting. It's
building the loop around the model.
Where this lands
I don't have a neat conclusion. These are hot takes and some of them will
age badly. But the harness-as-product thesis feels durable to me, the open
source picture is genuinely unsettled, and "code as assembler" is more a
description of what's already happening than a prediction.
Interesting times.