Dave Beckett

Eight Coding LLM Tools, One Configuration

2026-03-16 11:23

I currently use eight coding LLM tools at various times: Claude Code, Codex, Cursor's CLI (agent, formerly cursor-agent), Gemini CLI, Amp, Copilot, OpenCode, and Kilo. Each tool has its own configuration format, its own mechanism for custom commands, and its own opinions about where settings live. I want the same behavior from all of them: no emojis in commit messages, run markdownlint on every markdown file, don't be sycophantic.

Getting that consistency across multiple tools on a dozen development machines turned out to be its own project. I started pulling it together in mid-2025 after the third time I fixed a guideline in one tool's config and forgot to update the others.

The problem

Coding LLM CLI tools are multiplying fast, and none of them have agreed on a configuration standard. Claude wants JSON settings and markdown commands with YAML frontmatter. Cursor wants its own JSON format and plain markdown. Gemini wants markdown files with TOML headers. They all have different mechanisms for custom commands, and different places to put project-level vs global rules. And they keep changing!

If you only use one tool on one machine, this is fine. I use several tools across a bunch of machines running Fedora, Gentoo, Debian, Ubuntu, and macOS. Every time a tool updated its config format or I wanted to change a rule, I found myself editing the same content in multiple places; the copies inevitably drifted, and I got tired of it.

Single-source guidelines

The fix was straightforward. I keep one file, data/guidelines.md, that defines how I want all my coding LLM assistants to behave. My dotfiles system templates it into each tool's config format on install:

  • Claude gets them in ~/.claude/CLAUDE.md
  • Cursor gets them in ~/.cursor/cli-config.json
  • Gemini gets them in ~/.gemini/GEMINI.md

Change the guidelines once, run make install, and every tool picks them up.
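
The shape of that install step is simple enough to sketch. This is a minimal illustration, not the actual dotfiles code; in particular the Cursor "rules" key is a stand-in, not Cursor's real schema:

```python
import json
from pathlib import Path

def install_guidelines(source: Path, home: Path) -> None:
    """Template one guidelines file into each tool's expected location.

    A sketch: real installs run through the dotfiles templating system.
    """
    text = source.read_text()
    # Markdown-based tools get the content as-is.
    for target in (".claude/CLAUDE.md", ".gemini/GEMINI.md"):
        dest = home / target
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_text(text)
    # Cursor's config is JSON, so embed the same text there
    # (the "rules" key is illustrative, not Cursor's actual schema).
    cursor = home / ".cursor/cli-config.json"
    cursor.parent.mkdir(parents=True, exist_ok=True)
    cursor.write_text(json.dumps({"rules": text}, indent=2))
```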

Write once, generate three ways

Custom commands were trickier. I have a git-commit command that tells the LLMs how to structure commit messages: conventional commit format, no emojis, no "Generated by Tool" footers, imperative mood. The content is defined once but each tool wants a different wrapper format.

Claude wants YAML frontmatter:

---
allowed-tools: Bash(git *)
description: Make git commits
---
# Command definition

Look at all the git $ARGUMENTS changes...

Cursor and Codex want plain markdown in different places, while Gemini wants TOML. So there's a template per tool that wraps the shared content in the right format. The install process generates all of them from the single source.
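
A sketch of the wrapper idea, with hypothetical templates (the real ones live in the dotfiles repo, and the Gemini TOML shape here is illustrative):

```python
# One shared command body, one wrapper template per tool.
# These templates are illustrative, not the repo's actual ones.
WRAPPERS = {
    # Claude: YAML frontmatter around the markdown body.
    "claude": "---\nallowed-tools: Bash(git *)\ndescription: {desc}\n---\n{body}",
    # Gemini: a TOML header carrying the prompt.
    "gemini": 'description = "{desc}"\nprompt = """\n{body}\n"""\n',
    # Cursor: plain markdown, no wrapper.
    "cursor": "{body}",
}

def render_command(tool: str, desc: str, body: str) -> str:
    """Wrap the shared command content in one tool's expected format."""
    return WRAPPERS[tool].format(desc=desc, body=body)
```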

The custom git-commit command is the main thing I use across all the tools to avoid hype and phrases I hate, and it's what keeps LLM-generated commits looking like they were written by a human who cares about their git history.

Not all tools support custom commands yet, or at least I haven't figured it all out. Currently Claude, Cursor, Gemini, and Codex get generated commands. The rest get the shared guidelines but not the command wrappers. The template approach means adding a new tool is just another wrapper when they catch up.

This pattern extends to longer prompt definitions that Claude calls "skills." Skills can be multi-file: the blog-post skill that was used to write this post includes a ~300-line style guide derived from analyzing dozens of posts spanning two decades of my blog, plus the prompt definition that references it. Other skills handle things like analyzing job descriptions or preparing for interviews.

Skills now deploy to both Claude Code and Codex from shared source data. Claude gets symlinks into ~/.claude/skills/; Codex gets real file copies into ~/.codex/skills/ with OpenAI-format metadata for skill discovery. For claude.ai's web interface, the install process builds zip archives directly.

The settings merge problem

One problem I hadn't anticipated: Claude's settings.json accumulates permission rules as you use it. Every time you approve "allow this tool to run git commands" or "allow writes to this directory" those get saved. If you naively overwrite the settings file with a template on every install, you lose all the permissions you've granted during a session.

The fix was a JSON merge strategy: when installing a templated JSON settings file, the script loads the existing file, unions and deduplicates the permissions.allow, permissions.deny, and permissions.ask arrays with the template's values, preserves any extra top-level keys, and writes the merged result. The template provides the baseline; local usage adds to it.

In practice, those arrays are the tool's "what am I allowed to run and touch?" rules, and preserving them avoids re-approving the same actions after every install.
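
A minimal sketch of that merge, assuming the settings shape described above (the function name and structure are mine; only the permissions arrays come from Claude's settings format):

```python
import json
from pathlib import Path

def merge_settings(template: dict, path: Path) -> dict:
    """Union-merge permission arrays with an existing settings file.

    The template provides the baseline; locally granted permissions
    survive. A sketch of the strategy, not the installer's code.
    """
    existing = json.loads(path.read_text()) if path.exists() else {}
    # Template wins on scalar keys; extra top-level keys are preserved.
    merged = {**existing, **template}
    perms = {}
    for key in ("allow", "deny", "ask"):
        combined = (template.get("permissions", {}).get(key, [])
                    + existing.get("permissions", {}).get(key, []))
        # Deduplicate while preserving order.
        perms[key] = list(dict.fromkeys(combined))
    merged["permissions"] = perms
    return merged
```

The same function handles a fresh install, where no existing file is present yet.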

code-aide: installing the tools themselves

Installing coding LLM CLIs on a dozen Linux and macOS machines is its own annoyance. Some need Node.js and npm. Others have their own native installer scripts. Cursor downloads a tarball directly. Each has different prerequisites, each updates on its own schedule, and some have changed their installation method since they launched.

I wrote code-aide to handle this. It's now open source, installable via uv tool install code-aide or pipx, with zero external dependencies and Python 3.11+ stdlib only. Tool definitions live in a JSON config file (tools.json) with a schema_version field, so adding a new tool means adding a JSON entry rather than editing Python code.

code-aide supports three installer definition types:

  1. npm tools get an npm_package name and optional min_node_version (Gemini, Codex, OpenCode, Kilo, Copilot)
  2. script tools get an install_url and install_sha256 (Claude, Amp)
  3. direct download tools get a tarball URL template with platform and architecture substitution (Cursor)
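
A hypothetical tools.json fragment showing the three shapes; field names follow the descriptions above, but the URLs and values are placeholders, not the real definitions:

```json
{
  "schema_version": 1,
  "tools": {
    "gemini": {
      "type": "npm",
      "npm_package": "@google/gemini-cli",
      "min_node_version": 20
    },
    "claude": {
      "type": "script",
      "install_url": "https://example.com/claude/install.sh",
      "install_sha256": "<sha256-of-reviewed-script>"
    },
    "cursor": {
      "type": "direct_download",
      "url_template": "https://example.com/cursor/{platform}/{arch}/agent.tar.gz"
    }
  }
}
```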

Those types describe how a tool is modeled in tools.json. The VIA column below is the install source detected on this specific machine, which can be brew or cask if that host was provisioned that way.

The snippets below are example output from my setup at the time of writing:

$ code-aide status -c
TOOL      STATE   VERSION                 VIA       PATH
agent     ok      2026.02.27-e7d2ef6      download  /Users/.../.local/bin/agent
claude    newer   2.1.71                  script    /Users/.../.local/bin/claude
gemini    ok      0.32.1                  brew      /opt/homebrew/bin/gemini
amp       ok      0.0.1772734909-g2a936a  script    /Users/.../.local/bin/amp
codex     ok      0.111.0                 cask      /opt/homebrew/bin/codex
copilot   newer   0.0.423                 npm       /opt/homebrew/bin/copilot
opencode  newer   1.2.20                  brew      /opt/homebrew/bin/opencode
kilo      opt-in

Note: The agent row is Cursor's CLI; the binary started out as cursor-agent. The latest-version metadata still uses cursor, which is why the table below shows that name instead.

Some of these tools install via curl | bash and I'd rather not run a script that's changed since I last reviewed it, so for script-type installers the downloaded script is verified against a known SHA256 of a reviewed script, and will not run if it has changed. For direct_download tools like Cursor, the install script changes with every version, so SHA256 verification was dropped in favor of version-string comparison against the cached latest version.
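
The verification step reduces to a hash comparison. A sketch, assuming the caller has already downloaded the script bytes:

```python
import hashlib

def verify_script(data: bytes, expected_sha256: str) -> bytes:
    """Refuse to run an install script whose hash no longer matches.

    A sketch of the check, not code-aide's implementation; the caller
    downloads the script first and only executes what this returns.
    """
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise RuntimeError(f"install script changed: got {digest}")
    return data
```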

Auto-migration

One thing I didn't anticipate: tools keep changing their installation method. Claude Code started as an npm package (@anthropic-ai/claude-code) and later shipped a native installer script. In my own setup, Cursor also moved from shell-script installs to direct tarball downloads managed by code-aide. If you keep older install paths around, things often still work, but the fleet drifts and upgrade behavior becomes less predictable. You may also end up with both packaged installs and self-installs side by side.

code-aide 1.7.0+ detects these deprecated installs and auto-migrates. code-aide status warns you, code-aide upgrade handles the transition: remove the old install, run the new method, verify it worked. If something goes wrong, it tells you what to do manually rather than leaving you with a half-migrated mess.

Keeping up with upstream

The SHA256 hashes go stale, of course. The update-versions command handles that: for npm-installed tools it queries the registry for the latest version and publish date; for script-install tools it downloads the install script, computes the SHA256, and compares against the stored hash. Version extraction is custom per-tool, for example Cursor embeds YYYY.MM.DD-hash patterns in download URLs, Amp has a GCS version endpoint, others use VERSION= patterns in the script itself.

$ code-aide update-versions
Checking 8 tool(s) for updates...

Tool      Check         Version                  Date        Status
--------  ------------  -----------------------  ----------  ------
cursor    script-url    2026.02.27-e7d2ef6       2026-02-27  ok
claude    npm-registry  2.1.71                   2026-03-06  ok
gemini    npm-registry  0.32.1                   2026-03-04  ok
amp       script-url    v0.0.1772937800-g3b2e3d  2026-03-08  ok
codex     npm-registry  0.111.0                  2026-03-05  ok
copilot   npm-registry  1.0.2                    2026-03-06  ok
opencode  npm-registry  1.2.21                   2026-03-07  ok
kilo      npm-registry  7.0.40                   2026-03-06  ok

Updated latest version info in ~/.config/code-aide/versions.json.
No installer checksum updates required (latest version metadata was refreshed).
Note: 'update-versions' checks upstream metadata, not your installed binary versions. Use 'code-aide status' and 'code-aide upgrade' for local installs.

code-aide uses a two-layer version model: bundled definitions ship with the package and contain install methods, URLs, and SHA256 checksums. A local cache at ~/.config/code-aide/versions.json stores the latest versions and dates from update-versions. The cache merges into the bundled data at load time, so you can track upstream changes without waiting for a new code-aide release.
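
The merge itself can be sketched in a few lines, with the cache only contributing latest-version fields (names here are illustrative):

```python
import json
from pathlib import Path

def load_tool_defs(bundled: dict, cache_path: Path) -> dict:
    """Overlay cached latest-version info onto bundled definitions.

    A sketch of the two-layer model: bundled data ships with the
    package; the cache adds fields like latest_version and date.
    """
    if not cache_path.exists():
        return bundled
    cache = json.loads(cache_path.read_text())
    # Copy so the bundled definitions are never mutated in place.
    merged = {name: dict(info) for name, info in bundled.items()}
    for name, latest in cache.get("tools", {}).items():
        merged.setdefault(name, {}).update(latest)
    return merged
```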

When a script-install tool's SHA256 has changed, update-versions flags the mismatch and can write the updated hash back with --yes, or interactively one at a time. Finding the right version endpoint for Amp took a couple of tries: the obvious ampcode.com/version URL returns HTML; the actual version lives at a GCS endpoint buried in the install script. That version string (v0.0.1774123456-gc0ffee) probably also tells you something about Amp's relationship with semantic versioning, if we even care about such things in this fast-moving LLM world.

Prerequisites

code-aide also handles the Node.js dependency problem for npm-based install paths. The minimum Node.js version varies by tool (at time of writing: Gemini wants 20+, Codex and Copilot want 22+ on npm installs). If you use brew / cask / native installers for those tools, Node.js may not be required. code-aide install -p detects your system package manager (apt, dnf, pacman, emerge, and a few others) and installs Node.js and npm if they're missing.

What I'd do differently

The format fragmentation across tools is the real ongoing cost. I've had to update templates multiple times already because a tool changed where it looks for config files or switched its frontmatter format. There's no standard emerging; if anything, each new tool invents another format. The title of this post has changed numbers several times before publishing.

The single-source approach helps, but it only works because the semantic overlap between tools is high; they all want roughly the same information, just arranged differently. If the tools diverge in what they support rather than just how they format it, the shared-content model gets harder to maintain.

Testing has improved since the first version. code-aide has a pytest suite now, though some of the harder-to-test operations (upgrade, remove, prerequisite installation) are still on the TODO list. Progress, at least.

Numbers

  • 8 coding LLM tools managed (4 with custom commands, all 8 with guidelines)
  • 3 install types: npm, script, direct download
  • 0 external Python dependencies

Adding a new coding LLM tool is mostly: add a JSON entry to tools.json, and it shows up everywhere on next install. Unless they invent yet another install mechanism!

Thoughts

The interesting problem here isn't dotfiles management; that's a solved problem with many good tools. What coding LLM assistants have created is a new category of configuration that needs to stay synchronized: guidelines, custom commands, skills, and permissions, across tools that don't share any common format. I've written separately about why I think the harness layer matters more than the model; this is the practical side of that argument. The approach I describe here is straightforward enough for readers to replicate by pointing a coding LLM at this post.

There are still gaps, such as whether prompt style should vary by harness and model combination. I haven't tested that systematically yet, including whether it is necessary to SHOUT in one model's skill text to emphasise.

code-aide is at github.com/dajobe/code-aide and installable via uv tool install code-aide.

This follows my earlier post on Redland Forge, which covers using LLMs for the actual development work. A companion post on the dotfiles system that powers the templating is also published.

Permalink

Zero-Dependency Dotfiles for a Homelab

2026-03-16 11:23

I have a dozen development machines in my homelab: a mix of Fedora, Gentoo, Debian (stable and unstable), Ubuntu LTS, macOS, and a few Turing Pi nodes. I got tired of my configurations drifting apart, so I built a dotfiles management system in Python. No external dependencies, just str.format() templates and JSON config files. It manages shell configs, git settings, Kubernetes credentials, and the configuration for eight different coding LLM CLI tools. That last part turned out to be an interesting use case, but the foundation described here is what makes it work.

The problem

The usual dotfiles approach is a git repo full of symlinks and a bash script to wire them up. That works until you need the same .bashrc to behave differently on macOS versus Debian, or you need API keys templated into config files without committing them to git, or you want your coding LLM assistant to follow the same rules regardless of which tool you're using this week.

I wanted one repo, one install command, and consistent configuration everywhere.

The approach

The core started as a single Python script, dotfiles.py, which I began writing in September 2025. It has since been refactored into a package: dotfiles.py remains the CLI entrypoint, and dotfiles_lib/ holds a dozen modules (config, installer, renderer, generators, platform detection, and others) totaling around 3,800 lines. The split was motivated by code review and testing, in that a 2,500-line monolith is hard to reason about in diffs, and hard to unit-test without importing everything.

The bootstrap was straightforward: I copied the shell dotfiles from all my hosts into one tree of per-host files, pointed an LLM at the pile, and had it analyze them for commonalities and generate the initial files and templates. Most of the per-host differences turned out to be PATH entries and tool availability, which mapped cleanly to OS detection variables.

The tool reads a main JSON config that maps target files to their sources:

{
  ".bashrc": {
    "mode": "templated",
    "template": "bash/bashrc.tmpl"
  },
  ".gitignore_global": {
    "mode": "symlink",
    "source": "git/gitignore_global"
  },
  ".claude/settings.json": {
    "mode": "templated",
    "template": "templates/claude-settings.tmpl"
  }
}

There are three installation modes: symlink for files that don't vary, templated for files that need per-machine or per-secret customization, and copy for root user configs where you don't want a symlink back to a regular user's repo. A fourth mode, obsolete, marks files that should be cleaned up during install, which is useful when tools get renamed or configs move (coding LLMs do this a lot).

Templates use Python's str.format(): no Jinja2, no dependencies. The template data comes from three sources: the JSON config, OS detection at install time, and a ~/.secrets.sh file that holds API keys and credentials. On install, the script parses ~/.secrets.sh, merges it with the computed template data, and renders everything.
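
The pipeline can be sketched like this; build_template_data and render are illustrative names, and the real tool also merges the JSON config as a third data source:

```python
import platform
from pathlib import Path

def build_template_data(secrets: dict) -> dict:
    """Merge OS detection with parsed secrets; a sketch of the pipeline."""
    return {
        "os": platform.system().lower(),  # e.g. "linux" or "darwin"
        "hostname": platform.node(),
        **secrets,
    }

def render(template_path: Path, data: dict) -> str:
    """Render a str.format() template with the merged data."""
    return template_path.read_text().format(**data)
```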

Installation on any machine is:

make install

(make clean handles the build artifacts, cache directories, and other generated files.)

Secrets without the complexity

I didn't want a secrets manager dependency. The approach is a ~/.secrets.sh file that's never committed, with a simple KEY=value format. It's also sourced by shells. The script parses it, strips quotes, and makes the values available as template variables:

# ~/.secrets.sh
ANTHROPIC_API_KEY="sk-ant-..."
GEMINI_API_KEY="..."
GIT_EMAIL="dave@dajobe.org"
...

If a key exists as an environment variable, that takes precedence over the file.
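
A minimal version of that parser, including the environment-variable precedence (the real one lives in the dotfiles tool):

```python
import os
from pathlib import Path

def parse_secrets(path: Path) -> dict[str, str]:
    """Parse KEY=value lines, strip quotes, let env vars win.

    A sketch of the approach: comments and blank lines are skipped,
    and any key already set in the environment takes precedence.
    """
    secrets: dict[str, str] = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        secrets[key.strip()] = value.strip().strip('"').strip("'")
    for key in secrets:
        if key in os.environ:
            secrets[key] = os.environ[key]
    return secrets
```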

This is also where things like KUBE_CA_DATA live: base64-encoded certificate authority data that gets templated into kubeconfig files without committing credentials. A separate script pulls the right variables out of a kubeconfig YAML file so I don't have to do it by hand when cluster certificates rotate.

A separate utility copies the secrets file to all dev hosts over SSH. I keep it mode 0600 and only sync it to machines I trust with those credentials.

Recursive config directories

The dotfiles system doesn't just handle flat config files. Some tools such as coding LLM CLIs want directory trees for commands, skills, and agents rather than a single config file, so the installer walks those directories recursively and deploys them the same way it does ordinary dotfiles.

A skill like blog-post isn't a single file, it's a subdirectory containing a prompt definition and supporting reference materials. The install script had to be extended to handle these recursive structures rather than just flat file listings.

Agent definitions use the same pattern. They live in agents-config.json, which specifies the model, allowed tools, and a reference to the markdown prompt content. At install time that metadata is combined with the prompt text to generate tool-specific agent files, with a parity test to make sure the Claude Code and Amp versions stay in sync.

The git-commit agent is a good example of why this is useful. It can run git add, git diff, git commit, and a few other git commands and nothing else, which means I can point it at a messy working tree and trust it not to get creative.

Skills and agents then deploy with whatever packaging each tool expects: symlinks, file copies, and zip archives depending on what the target supports. The tool-specific details are in the companion post.

Splitting out code-aide

Installing, updating, and checking versions of Claude Code, Cursor, Gemini CLI, and the rest eventually outgrew its corner of dotfiles.py and got extracted into a separate open source tool called code-aide. The dotfiles repo still bootstraps it during make install with a best-effort uv tool install, but the upstream-version churn now lives in its own project rather than bulking up the renderer and installer logic here.

Ghostty terminal support

I started playing with the Ghostty terminal emulator and discovered that its xterm-ghostty terminfo entry isn't installed on most of my remote hosts. SSH into a machine without it and every ncurses-based tool greets you with "missing or unsuitable terminal: xterm-ghostty", which is an annoying bump.

The fix: vendor the Ghostty terminfo source file into the dotfiles repo and compile it into ~/.terminfo during make install using tic. The install script checks whether the entry already exists and skips the compilation if so. Ghostty's own config file is also templated and deployed.

Not glamorous, but it's the kind of thing that makes a dotfiles system earn its keep with one fix deployed everywhere, instead of manually installing terminfo on each host.

Multi-host deployment

With a dozen or so machines, running ssh host 'cd dev/dotfiles && git pull && make install' on each one gets tedious. Instead I have a deploy-dotfiles utility that reads the hosts list from config and runs the install:

$ deploy-dotfiles
host1 ✔
host2 ✔
host3 ✔
host4 ✔
host5  (connection timed out)
...

This is not done in parallel; I considered it, but even with SSH overhead a sequential run is fast enough, and sequential output is easier to read when something fails. It's fine for a small homelab.

After make install, a JSON receipt file is written to ~/.dotfiles-version.json recording the git commit, install timestamp, and hostname. A version subcommand shows the source HEAD alongside the installed receipt, flagging stale installs. The deploy-dotfiles utility has a --check mode that queries receipt files on all remote hosts without deploying, so I can see at a glance which machines are behind.
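
A sketch of the receipt writer; the field names are illustrative, not the real schema:

```python
import json
import socket
import subprocess
from datetime import datetime, timezone
from pathlib import Path

def _git_head() -> str:
    """Best-effort current commit; 'unknown' when git isn't available."""
    try:
        return subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True
        ).stdout.strip() or "unknown"
    except OSError:
        return "unknown"

def write_receipt(path: Path) -> dict:
    """Record commit, timestamp, and hostname for staleness checks."""
    receipt = {
        "commit": _git_head(),
        "installed_at": datetime.now(timezone.utc).isoformat(),
        "hostname": socket.gethostname(),
    }
    path.write_text(json.dumps(receipt, indent=2))
    return receipt
```

A --check mode then only has to compare the receipt's commit against the repo's HEAD.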

What I'd do differently next time

The str.format() template engine has its limits. Anything with literal curly braces (JSON templates, for instance) requires doubling every brace that isn't a variable. I have a 245-line JSON config full of doubled braces. A Jinja2-like syntax would be cleaner, but I'd have to either add a dependency or write a minimal template engine. For now, the doubled braces are ugly but functional. They're also a reliable way to make an LLM lose track of what it's editing.
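
For anyone who hasn't hit this: with str.format(), every structural brace in a JSON template has to be doubled, so even a tiny fragment gets noisy:

```python
# Every structural brace must be doubled; {GIT_EMAIL} is the only
# real template variable here.
template = """{{
  "user": {{ "email": "{GIT_EMAIL}" }}
}}"""
rendered = template.format(GIT_EMAIL="dave@example.org")
```

The rendered output is valid JSON; the template itself is a wall of doubled braces.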

Some kind of file-system-convention approach (drop a .symlink suffix on files you want symlinked) might reduce the config overhead, but I haven't hit enough pain to justify the rewrite.

The test suite now covers config validation, template rendering, file installation, agent generation, skill parity, the markdown formatter, version receipts, utils, and the CLI itself with close to 20 test modules, around 4,000 lines. A pre-submit script runs Black, mypy, and pytest through uv, and the Makefile has targets for each. It's not full CI yet (there's no pipeline triggered on push), but the local workflow catches most regressions before they're committed.

Numbers at a glance

  • 30+ dotfiles managed (15+ templated, 14 symlinked)
  • 3 agents generated for Claude Code and Amp
  • 2 skills deployed to Claude Code and Codex (blog-post, job-prep)
  • 8 coding LLM tools configured with code-aide
  • 12+ development hosts deployed to
  • ~20 test modules, ~4,000 lines
  • 0 external Python dependencies

The whole thing runs with make install and takes about a second. Zero dependencies means it works on a fresh machine with just Python 3.11, which every machine in my fleet already has.

Adding a new machine is: clone, create ~/.secrets.sh by hand, run make install.

Adding a new dotfile is: create the template or source file, add one entry to the config JSON, run make install. I just get an LLM to do those changes with a prompt like "ingest ~/.newdotfile and manage it with dotfiles", then review and approve.

Thoughts

This work is private but the approach is straightforward enough to replicate by pointing an LLM at this blog post. The interesting bits are the template data pipeline (secrets file + OS detection + JSON config merged at install time) and the zero-dependency constraint, not any particularly clever code.

The coding LLM tool configuration is covered in a companion post: Eight Coding LLM Tools, One Configuration.

Permalink

The Harness Is The Product (and Other Hot Takes)

2026-03-06 19:00

I've spent the last nine months using AI coding tools (Claude Code, Cursor, Gemini CLI, Amp, Codex, and others) on my own projects. I'm currently between jobs, which means I have no corporate agenda and no stake in any of these companies.

I have opinions. Some of them might even survive the week.

Part 1: The Harness Is the Product

GPT-5.4 is very good with Cursor. Surprisingly good. I don't even see it showcased this well in ChatGPT, which is OpenAI's own product. That's a tell.

The most interesting thing happening in AI right now isn't the models, it's the harnesses acting as the integration layer: tool calling, UX, agentic orchestration. A great model in a mediocre harness loses to a good model in a great harness. Gemini's models are competitive but feel underwhelming because Google's tooling can't showcase them, which is presumably why they acquired Windsurf and relaunched it as Antigravity. Claude's models shine brightest through Claude Code. The engine matters, but nobody buys an engine.

Hot take: The model is the engine. The harness is the car.

This has implications for where moats form. For a while it looked like scale and training compute were the only defensible positions. That's still true at the absolute frontier, but below that line, models are converging fast enough that harness quality dominates the user experience. Cursor and Claude Code figured this out early. The companies that win will be the ones treating the model as a component and the harness as the product, which is a deeply uncomfortable position for labs that spent billions training the models.

It's worth being specific about what a modern harness actually does, because the shift is easy to miss. Early AI coding tools worked like this:

prompt → completion

You asked a question, got an answer, tried again if it was wrong.

Modern coding systems work like this:

observe repository → plan change → edit files → run tests → inspect errors → iterate

That loop is subtle but it changes everything. The system isn't generating code snippets; it's participating in a continuous cycle over a real project. It reads the codebase, modifies multiple files, runs commands, and adjusts based on results. Less autocomplete, more collaborator.

And here's the thing: a lot of the agentic stuff IS just this loop with different tools plugged in. An agent observes state, generates a command or script, runs it, inspects the output, decides what to do next. Even tasks that aren't obviously programming often reduce internally to "write some python, call an API, parse the result, continue." If you solve the coding harness, you've solved a large chunk of the general agentic problem. This is something Anthropic realized relatively recently and took advantage of with CoWork.
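
Stripped to its skeleton, the loop is almost embarrassingly small. A toy version with the model call stubbed out (everything here is illustrative):

```python
def agent_loop(goal, tools, propose_action, max_steps=10):
    """Observe → act → inspect → iterate; propose_action stands in
    for the model, tools for the harness's tool registry."""
    history = []
    for _ in range(max_steps):
        action, args = propose_action(goal, history)  # "the model"
        if action == "done":
            break
        result = tools[action](*args)                 # run a tool
        history.append((action, args, result))        # feed back observations
    return history
```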

Hot take: Agents are mostly code-writing loops with tool access.

This also means the IDE is quietly becoming an agent runtime. Editors already provide everything agents need: structured projects, deterministic execution environments, version control, feedback loops. It's not a coincidence that the best agent experiences are happening inside coding tools or on CLIs rather than chat windows.

Hot take: The IDE is becoming the operating system for AI agents.

The Google tragedy

Google is the most painful case study: they have the research, the infrastructure, the talent, and arguably the best foundation model team on earth, yet they keep fumbling the integration layer. The Windsurf acquisition and the Antigravity launch tell the story: Google paid $2.4 billion to license Windsurf's code and hire its founders, then launched Antigravity four months later.

That's a strange failure mode for the company that built Gmail, Maps, and Search. Something broke culturally.

I want Google to be good and honestly, they do have adjacent AI products that are very good from my experience. NotebookLM is great, AI search is free and genuinely useful. The whole Google Docs ecosystem works well with AI. Google's strength has always been horizontal platform plays, and those products reflect that.

But the coding-centric agentic future is a vertical integration game and Google keeps losing it. Their model quality isn't the problem; their harness is.

If harnesses are becoming the product, the next question is: who builds them?

Part 2: Open Source and the Harness Layer

The leading harnesses right now are proprietary. Cursor is proprietary. Claude Code is proprietary. Antigravity is a $2.4 billion proprietary fork. So: closed source wins?

Not so fast. It's worth noting that the model layer hasn't been won by open source either, despite the narrative. Open weights models from Meta and others (mostly Chinese labs) are competitive but the frontier is still closed, and Meta's stuff is clearly a strategic weapon against Google and OpenAI dressed up as generosity.

The harness layer is more interesting because it's more contested. OpenClaw blows a hole in the story. Formerly Clawdbot, then Moltbot, it went from 9,000 to 60,000+ GitHub stars in days and now sits at over 250K. It's not a coding harness in the Cursor sense; it's a general agentic harness with message routing across WhatsApp, Telegram, Discord, and dozens of other channels, autonomous task execution, 50+ integrations, running 24/7 on your own hardware. OpenCode is doing something similar for the coding-specific case.

These projects are moving fast and quite arguably faster than their closed counterparts on raw feature velocity.

The tradeoff is risk. OpenClaw's attack surface is enormous. Security researchers have mapped it against every category in the OWASP Top 10 for Agentic Applications. There are documented cases of agents acting well beyond user intent; one created a dating profile autonomously, which is either impressive or terrifying depending on your perspective. Its creator, Peter Steinberger, joined OpenAI and the project is moving to an open source foundation. That could mean more institutional backing, or it could mean founder departure stalls momentum; it's too early to tell.

So the real picture isn't "open source is winning" or "open source is losing." It's that closed harnesses and open source harnesses are optimizing on different axes:

  • Closed (Cursor, Claude Code): safety, polish, tight model integration
  • Open (OpenClaw, OpenCode): extensibility, speed, community velocity, accepting more risk

Both are viable today. The question is which axis matters more as agentic tools move from developers to everyone else. My guess: the closed harnesses win the mainstream because most people don't want to manage their own attack surface. But open source keeps pushing the bleeding edge, and ideas flow from bleeding edge to mainstream on roughly a three-month delay.

Hot take: The open source harness ecosystem is about three months ahead of commercial tools. The ideas show up there first; the polish shows up later in closed products.

Hot take: Models may become commodities. Harnesses are the product.

This might be the first major technology wave where open source doesn't clearly own the infrastructure layer, or it might not. Ask me again in a few weeks, when this take will be outdated.

Part 3: Code Is the New Assembler (and Other Predictions)

Code is becoming the new assembler. Nobody writes assembler anymore, but it didn't disappear; it just got generated, below the surface. Code is heading the same way. The skill is shifting from "can you write code" to "can you specify intent precisely enough that code gets generated correctly." That's closer to systems architecture than traditional programming.

The agentic loop where a human specifies, model generates, harness orchestrates, human validates, is the new unit of work. This now applies well beyond coding, as any task that can be decomposed into tool calls and validation steps is fully in agentic territory. Code is just where it showed up first because code is the easiest thing to validate (it either runs or it doesn't, mostly).

The competence amplifier

Here's something I didn't expect. Over the past nine months I've shipped working tools and apps written in Go, JavaScript, and Postgres. I don't write Go or JavaScript, although I can read them. I've never administered Postgres in anger. But I have 25+ years of systems experience, and it turns out that's enough. I can read the generated code, spot architectural problems, evaluate whether the error handling makes sense, and steer the iteration loop. I can't write idiomatic Go from scratch but I can tell when the AI-generated Go is doing something stupid.

This is the real shape of "code as assembler." The AI handles the syntax and idiom; the human provides the judgment layer. My experience with distributed systems, failure modes, and operational patterns transfers directly even when I don't know the language. The harness doesn't replace expertise, it makes expertise portable across languages and frameworks in a way that wasn't possible before.

This has two implications.

  1. For experienced developers, your value shifts from "I know language X" to "I know how systems work." That's a bigger, more durable moat.
  2. For non-developers (product managers, designers, domain experts) the barrier to building working software just dropped dramatically. They don't need to learn Go or Python. They need to learn how to specify what they want clearly enough that the loop converges. That's a different skill, and a lot of people already have it without realizing it.

Hot take: AI coding tools don't replace developers. They make systems thinking portable across any language or framework.

The model picker disappears

One near-term prediction: the model picker goes away. Nobody types http:// anymore. Nobody picks which CDN node serves their webpage. The system picks. Model selection is heading the same way.

The fact that I currently care whether I'm running Sonnet 4.6 or GPT-5.4 is a sign of immaturity, not a feature. In two years, maybe less, the harness routes tasks dynamically:

cheap model      routine edits, boilerplate
reasoning model  planning, architecture
coding model     implementation
verifier model   checking, testing

The user interacts with one interface. The model choice becomes an implementation detail, like which CPU core your thread runs on. That'll be a sign the ecosystem has grown up.
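As a sketch of what that routing could look like, here is a toy harness-side router. Everything in it (the task categories, the keyword classifier, and the model names) is invented for illustration, not any real tool's API:

```python
# Hypothetical sketch of a harness-side model router: the user never
# picks a model; the harness classifies the task and routes it.
# All model names and categories here are invented for illustration.

ROUTES = {
    "boilerplate": "cheap-model",
    "planning": "reasoning-model",
    "implementation": "coding-model",
    "verification": "verifier-model",
}


def classify(task: str) -> str:
    """Crude keyword matcher standing in for a real task classifier."""
    lowered = task.lower()
    if any(w in lowered for w in ("plan", "design", "architecture")):
        return "planning"
    if any(w in lowered for w in ("test", "verify", "check")):
        return "verification"
    if any(w in lowered for w in ("rename", "format", "boilerplate")):
        return "boilerplate"
    return "implementation"


def route(task: str) -> str:
    """Return the model a task would be dispatched to."""
    return ROUTES[classify(task)]
```

A real harness would classify with a cheap model rather than keywords, but the shape is the same: the routing table is an implementation detail the user never sees.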

Hot take: The model menu will eventually disappear.

(The model picker sticking around for power users and experts is fine. I'm talking about the default experience.)

The rate of change problem

The uncomfortable corollary to all of this is that the rate of change is stupid fast. Expertise about specific model behavior expires in days to weeks. Any opinion formed about a model's capabilities on a given Tuesday is stale by the following Tuesday. Including the opinions in this post, presumably.

The durable skills are meta-skills: evaluating models, designing harnesses, specifying intent, thinking in systems. The specific knowledge of "Claude is good at X but bad at Y" or "GPT-5.4 handles long context better than..." is transient. It's useful for a week, maybe two, then something ships and the landscape shifts.

This favors a certain kind of engineer. The senior generalist who thinks in systems, evaluates tradeoffs, and adapts fast. Not the specialist who knows one tool deeply. This is convenient for me, I realize, but I think it's true regardless.

Hot take: The most valuable AI skill is no longer prompting. It's building the loop around the model.

Where this lands

I don't have a neat conclusion. These are hot takes and some of them will age badly. But the harness-as-product thesis feels durable to me, the open source picture is genuinely unsettled, and "code as assembler" is more a description of what's already happening than a prediction.

Interesting times.

Permalink

22 Years of Code, 2 Months of LLMs: The Redland Forge Story

2025-09-13 12:34

Twenty-two years ago, I wrote some Perl scripts to test Redland RDF library builds across multiple machines with SSH. Two months ago, I asked an LLM to turn those scripts into a modern Python application. The resulting Redland Forge application evolved from simple automation into a full terminal user interface for monitoring parallel builds - a transformation that shows how LLMs can accelerate development from years into weeks.

The Shell Script Years (2003-2023)

The project originated from the need to build and test Redland, an RDF library with language bindings for C, C#, Lua, Perl, Python, PHP, Ruby, TCL and others. The initial scripts handled the basic workflow: SSH into remote machines, transfer source code, run the autoconf build sequence, and collect results.

Early versions focused on the fundamentals:

  • Remote build execution via SSH
  • Basic timing and status reporting
  • Support for the standard autoconf pattern: configure, make, make check, make install
  • JDK detection and path setup for Java bindings
  • Cross-platform compatibility for various Unix systems and macOS

Over the years, the scripts grew more features:

  • Automatic GNU make detection across different systems
  • Berkeley DB version hunting (supporting versions 2, 3, and 4)
  • CPU core detection for parallel make execution
  • Dynamic library path management for different architectures
  • Enhanced error detection and build artifact cleanup

The scripts ended up quite capable, handling everything from config.guess location discovery to integrating compiler output into the build summaries.

The Python Conversion (2024)

The script remained largely the same until 2024, when I decided to revisit it. It was time to move on from Perl and shell scripts, and it seemed like a good opportunity to use the emerging LLM coding agents to do the conversion from a simple prompt. It was relatively easy to do; I forget which LLM I used, but it was probably Gemini.

The conversion to Python brought:

  • Type hints and modern Python 3 features
  • Proper argument parsing with argparse instead of manual option handling
  • Pathlib for cross-platform file operations
  • Structured logging with debug and info levels
  • Better error handling and user feedback

The user experience improved as well:

  • Intelligent color support that detects terminal capabilities
  • Host file support with comment parsing
  • Build summaries with success/failure statistics and emojis. I'm not sure that last one is absolutely an improvement, but 🤷

Terminal User Interface (2025)

A year later, in July 2025, with LLM technology advancing almost weekly, I was inspired to make a big change: prompting the tool into a full text user interface, with parallel builds visible interactively in the terminal.

Continuing from the Python foundation, the tool gained a full terminal user interface. The TUI could monitor multiple builds simultaneously, showing real-time progress across different hosts.

One of the first prompts was to identify which existing Python TUI and other libraries should be used; this quickly led to blessed for the TUI and paramiko for SSH.

A lot of the early work was making the TUI behave properly in a terminal, so the drawn UI did not cause scrolling or overflows and text wrapping and truncation worked correctly. Once something worked, prompting the LLM to write unit tests for it was very helpful to avoid backsliding.

As it grew, the architecture became much more modular:

  • SSH connection management with parallel execution
  • A blessed-based terminal interface for responsive updates
  • Statistics tracking and build step detection
  • Keyboard input handling and navigation

Each of those came from prompting to refactor large classes, sometimes choosing which ones to attack via a prompt to analyze the code and identify candidates, and sometimes by running an external code complexity tool, in this case Lizard.

The features grew quickly at this stage:

  • Live progress updates driven by an event loop
  • Adaptive layouts that resize with the terminal
  • Automatic build phase detection (extract, configure, make, check, install)
  • Color-coded status indicators, both as builds ran and afterwards
  • Host visibility management for large deployments, so if the window was too small you'd see a subset of hosts building in the window

The design used established design patterns such as the observer pattern for state changes, the strategy pattern for layouts, and the manager (factory) pattern for connections. Most of these were picked by the LLM in use at the time, with occasional guidance such as "make a configuration class".

Completing the application (September 2025)

The final phase built the tool into a more complete application, adding release-focused features and additional testing. The tool transformed from an internal development utility into something that could be shared and be useful to anyone with an autoconf project tarball and SSH access.

Major additions included:

  • A build timing cache with persistent JSON storage for previous build times
  • Intelligent progress estimation based on the cached times
  • Configurable auto-exit functionality with countdown display
  • Keyboard-based navigation of hosts and logs, with a full-screen host mode and interactive menus

The testing at this point was quite substantial:

  • Over 400 unit tests covering all components
  • Mock-based testing for external dependencies
  • Integration tests and edge cases

At this point it was doing the job fully and seemed complete, and of broader use than just for Redland dev work.

Learnings

Redland Forge demonstrates how developer tools evolve. What started as pragmatic shell and Perl scripts for a specific need grew into a sophisticated application. Each phase built on the previous one, with the Python conversion serving as the catalyst that enabled the terminal interface.

It also demonstrates how LLMs in 2025 can act as a leverage multiplier for productivity, when used carefully. I did spend a lot of time pasting terminal outputs for debugging the TUI boxes and layout. I used lots of Git commits and tags when the tool worked; I even developed a custom command to make the commits in a way that I preferred, avoiding the hype some models tend toward, but that's another story or blog post. When the LLMs made mistakes, I could always go back to the previous working commit (git reset --hard), or ask them to try again, which worked more often than you'd expect. Or try a different LLM.

I found that coding LLMs can work on their own somewhat, depending on the LLM in question. Some regularly prompt for permissions or end their turn after some progress, whereas others just keep coding without checking back with me. This allowed some semi-asynchronous development where a bunch of work was done, then I reviewed it and adjusted. I did review the code, since I know Python well enough.

The skill I think I learnt the most was writing prompts, or what is now being called spec-driven development, for the later, larger changes. I described what I wanted to one LLM and had it write a markdown specification, sometimes asking a different LLM to review it for gaps, before one of them implemented it. I often asked the LLM to update the spec as it worked, since sometimes the LLMs crashed, hung, or looped with the same output; if the spec was up to date, the chat could be killed and restarted. Sometimes just telling the LLM it was looping was enough.

I'm happy with the final application, and it's nearly 100% written by the LLMs, including the documentation, tests and configuration, although it was 100% prompted by me, 100% tested by me, and 100% of the commits were reviewed by me. After all, it is my name in the commit messages.


Disclaimer: I currently work for Google who make Gemini.

Permalink

Production Chaos

2025-02-01 11:00

Chaos happens a lot in production and in associated roles such as Site Reliability Engineering (SRE). Day to day you can be dealing with a scale of chaos from noise, interruptions, unknowns and mysteries all the way up to incidents, emergencies and disasters. If you are working in that space, you will have to deal with tradeoffs of risk, time, uncertainty and more. The "unknown unknowns", as Donald Rumsfeld put it, or the 1-in-a-million events, can happen regularly if you are operating a lot of code, data or systems.

If this is going to happen all the time, you need support around you, in particular a team, leadership and organization you can trust to back you whatever happens. You have to be able to relax even in a stressful environment, not worrying about your personal safety or career. This leads to the SRE best practice of blamelessness when things are failing: it's the fault of the system, not the person. There is no way you are going to get people working their best if they are going to get blamed for making mistakes. That way leads to hiding things, avoiding responsibility and a negative feedback loop where everyone avoids making things better.

If you have a culture of blame and fear, you are going to get the worst from your people. Which leads me to my experience working at Twitter when Phony Stark aka Space Karen aka Elmo Maga bought it. He did not trust his employees, did not support them, did not communicate with them, and indeed blamed them. He wanted and fostered a culture of fear and uncertainty.

It was so chaotic at the end that I once had two managers message me in the same hour that they were my new manager. I also didn't know at the time that he was my manager for two days:

Picture showing table of my Twitter managers and dates with names redacted except for Elon Musk

Elon Musk is a negative example of how to manage and how to be a grown human. He has many character flaws and a Character Limit.

He is not an example to copy.

It's nauseating seeing him repeating this again at the US Government: Déjà Vu: Elon Musk Takes His Twitter Takeover Tactics to Washington (Gift link)

Permalink

10 Years an SRE

2025-01-08 09:53

I've recently been thinking about my SRE journey and the SRE role I had at Twitter. When I joined in 2015, it was my first SRE-titled position, but I frankly didn't know what the SRE job was really all about.

At the time, Twitter had no specific SRE onboarding - Flight School was focused on software engineers. You were mostly expected to shoulder-surf existing SREs and learn by osmosis, and that's what I did. That's a poor approach in multiple ways: it is unstructured, requires extensive one-to-one time and may not suit everyone's learning style.

Instead, I used the approach of "to learn it, teach it" and started creating some SRE-specific onboarding along with an experienced SRE colleague, Rob Malchow. Over a year or so, we developed 3 SRE onboarding courses that covered technologies, processes and specific help for SREs, in particular how to prepare for and be oncall - including a big DON'T PANIC slide.

I taught this series of 3 courses with Rob and others maybe 5 times to hundreds of people. Every time, I tried to invite an experienced SRE to join, so that the course material kept being improved and corrected, because nothing stays still in tech. I also believe pair or co-teaching works much better, as two people can deliver and check for understanding in parallel. By the end I felt I had a good grasp of the SRE scope. Hopefully the students did too!

With reference to the SRE books from Google, I had read the first one, which was out at the time. I found that Google-scale approaches needed customizing for our environment, although lots was highly relevant in getting to a data-driven approach to reliability using SLAs, SLIs, SLOs and error budgets.

Now I'm at Google and can see the other side of the fence, where SRE training ("SRE EDU") is taken very seriously, with extensive training created and delivered and feedback and evolution built in. Education remains a very interesting area to me, and "always be learning" is an important value of mine. It's also a key part of being an SRE, and of working in tech more generally. Hopefully I can participate here too.

Permalink

Google 3 Times

2024-11-04 00:00

Today I work at Google, but this was my third attempt to join, going back over a decade. Looking back, the first two times I attempted to join as a software engineer, which, in hindsight, was probably where I went wrong.

When I was interviewing at Twitter in 2015 for a software engineering role, the Hadoop team engineering manager Joep Rottinghuis suggested my skills might fit better into a different position called Site Reliability Engineering which was new to me at the time. I switched to interview with the SRE manager, an ex-Yahoo! colleague Pascal Borghino and successfully joined.

It turned out that SRE was the perfect position for me, and was the role I had been doing all along; I just didn't know it existed. I was always curious about the lower levels of software and hardware, and wanted to automate getting things working reliably.

As for Google, I successfully interviewed for an SRE position late in 2022 with Duncan Winn and Andrew Brampton, for a start in March 2023. It really helped this time that I was interviewing for the right role, which took me some time to figure out: pretty much 25 years into my career.

Permalink

Twitter Interviewing

2024-11-01 00:00

At Twitter, I believe we did a great job of interviewing. We interviewed in pairs (one talks, one takes notes), used rubrics to ensure fairness and had lots of checks and balances. Everyone who interviewed had to do training, then shadow, and only then could participate. Although I wasn't involved in the final decision / offer side of it, I did a fair amount of interviewing and even interviewed my new manager a couple of times.

In all those interviews I never ONCE looked at a candidate's school, college, certificates or qualifications. We always asked questions to figure out what they could DO, how they worked, what they wanted from their career and what they had learnt from their journey. Evidence of emotional intelligence, learning and curiosity was key, beyond the technical skills.

I still believe that was a great approach.

Permalink

The Journey to SRE

2023-03-20 18:00

In my career I've had three big fork()s in the road, so far.

My higher education started off with a Computer Science degree back in 1990 from University of Bristol UK, with a class size of less than 20. My final year project was a parallelized graphics renderer written in occam. #code #graphics

At the end of my degree, I had applied to do a PhD in computer graphics, but a couple of days before that offer appeared, I got a job offer for a parallel computing position, which I accepted.

Fork #1: Parallel Computing

If I had started the PhD, the other fork path would likely have ended up as me working as a computer graphics renderer or pipeline engineer working for a big CGI or SFX firm, probably in the US.

Instead, I went to work at the University of Kent at Canterbury (UKC), now called University of Kent, in Canterbury, UK of course. There I worked on the Meiko parallel computer at a blistering 25MHz - a relatively unheard-of speed in 1990 - with dozens of nodes, each capable of running thousands of lightweight processes based on CSP (lighter than threads, look it up). I helped operate the Meiko system: rebooting it and rewiring it (literally wires) between the nodes and racks. #operations #code #learning #teaching

Deeper into that, I got into organizing materials for the Internet Parallel Computing Archive, the software to manage it and hand-compiling Apache on SunOS and IRIX to run it. This led to my first home page location http://www.hensa.ac.uk/parallel/www/djb1.html in 1993. #operations #code

Fork #2: Web and Metadata

If I had continued with parallel computing, the second fork alternate path would have likely been going into research, getting a PhD, working on high performance computing, supercomputers and probably ending up in the US.

Instead, I improved the archive, developed metadata to manage it in IAFA Templates and expanded into web metadata, Dublin Core and onwards to RDF and the Semantic Web. I wrote software in Perl, presented at multiple web conferences starting with WWW3 and at workshops, and attended many Dublin Core working groups. #code #web #rdf #metadata

Meanwhile, around 1996, my day-to-day work changed to be web-focused, working on the UK Mirror Service at Kent, installing machines, operating them, making backups and keeping things running for the entire UK academic network, a network called HENSA. I also ran the computer science department's (first) web site http://www.cs.ukc.ac.uk. This was where I learnt operations, web tech and started using Linux. #web #operations #learning

In 2000, I took up an opportunity to go work at the Institute for Learning and Research Technology (ILRT) at the University of Bristol as a technical researcher entirely on software and metadata in the emerging RDF and semantic web area. At that time, I created the Free Software / Open Source Redland RDF libraries all written in C and supporting multiple language bindings, developed and tested these across multiple OSes via build farms. I worked for several years on the software, RDF, semantic web and other standards work in EU research projects such as SWADE, SKOS, as well as lots of W3C projects and working groups for RDF, SPARQL and Turtle. I learnt so much about organizing my time and working in a fast changing environment. #operations #code #web #learning #metadata

I was asked in 2005 if I'd like to come take the work and experience I'd developed in the semantic web work and deploy the software at Yahoo! in USA. I said yes.

Fork #3: Corporate USA

The third fork's other path would have been continuing in the UK and EU University sector, working on open source and web technologies as they evolved. Possibly, I would have ended up working in some large UK IT firm, deploying web tech or teaching web tech in Universities.

At Yahoo! in Sunnyvale, I entered a whole new world, in which there were highly specialized roles, such as Product Managers and Operations Engineers to go along with Software Engineers. After multiple positions and not working on coding or web technologies, I ended up far away from my happy place. #architect #learning

In 2012, I moved on to software engineering roles at a social news startup, Digg, which closed up shop, then at Rackspace Hosting in San Francisco in 2013. In both cases, I was increasingly building Hadoop big data applications, as well as running and operating Hadoop, which was by then called DevOps. #operations #code #learning #bigdata

That led to joining Twitter in 2016, finally as a Site Reliability Engineer, for the Data Platform, operating the Hadoop clusters with software: addressing the day-to-day issues, automating the routine tasks and working on strategic projects like cloud for the data platform. I had at last arrived at the job title that matched what I'd been doing for a long time, and I loved working in a group of SREs, always learning and helping. #sre #operations #code #learning #teaching #bigdata #cloud

In 2022, Twitter also sold its furniture and, well, that's another story... #chaos

So here we are in 2023 and I'm excited to announce I'm joining Google as a Staff System Engineer in the Site Reliability Engineering part of the Google Cloud organization. #sre #learning #cloud

Permalink

Making Debian Docker Images Smaller

2015-04-18 14:00

TL;DR:

  1. Use one RUN to prepare, configure, make, install and cleanup.
  2. Cleanup with: apt-get remove --purge -y $BUILD_PACKAGES $(apt-mark showauto) && rm -rf /var/lib/apt/lists/*

I've been packaging the nghttp2 HTTP/2.0 proxy and client by Tatsuhiro Tsujikawa both for Debian and for Docker, and noticed it takes some time to get the build dependencies (C++, cough) as well as to do the build.

In the Debian packaging case it's easy to create minimal dependencies thanks to pbuilder and to ensure the binary package contains only the right files. See debian nghttp2

For Docker, since you work with containers it's harder to see what changed, but you still really want the containers as small as possible, both for the download needed to run the app and for disk use. While doing this I kept seeing huge images (480 MB), way larger than the base image I was using (123 MB), which didn't make sense since I was just packaging a few binaries with some small files, plus their dependencies. My estimate was that it should be way less than a 100 MB delta.

I pored over multiple blog posts about Docker images and how to make them small. I even looked at some of the squashing commands like docker-squash that involved import and export, but those seemed not quite the right approach.

It took me a while to really understand that each Dockerfile command creates a new layer containing the deltas. So when you see all those downloaded layers in a docker pull of an image, it can be a lot of data which is mostly unused.

So if you want to make it small, you need to make each Dockerfile command touch the smallest amount of files and use a standard image, so most people do not have to download your custom l33t base.

It doesn't matter if you rm -rf the files in a later command; they continue to exist in some intermediate layer.

So: prepare, configure, build, make install and clean up in one RUN command if you can. If the lines get too long, put the steps in separate scripts and call them.

Lots of Docker images are based on Debian images because they are a small and practical base. The debian:jessie image is smaller than the Ubuntu (and CentOS) images. I haven't checked out the fancy 'cloud' images too much: Ubuntu Cloud Images, Snappy Ubuntu Core, Project Atomic, ...

In a Dockerfile building from some downloaded package, you generally need wget or curl and maybe git. When you install, for example, curl and ca-certificates to get TLS/SSL certificates, it pulls in a lot of extra packages, such as openssl in the standard Debian curl build.

You are pretty unlikely to need curl or git after the build stage of your package. So if you don't need them, you could - and you should - remove them, but that's one of the tricky parts.

If $BUILD_PACKAGES contains the list of build dependency packages such as libxml2-dev and so on, you would think that this would get you back to the start state:

$ apt-get install -y $BUILD_PACKAGES
$ apt-get remove -y $BUILD_PACKAGES

However this isn't enough; you've missed the dependencies that got automatically installed, and their dependencies.

You could try

$ apt-get autoremove -y

but that also doesn't grab them all; it's not clear to me why at this point. What you actually need is to remove all auto-added packages, which you can find with apt-mark showauto.

So what you really need is

$ AUTO_ADDED_PACKAGES=`apt-mark showauto`
$ apt-get remove --purge -y $BUILD_PACKAGES $AUTO_ADDED_PACKAGES

I added --purge too since we don't need any config files in /etc for build packages we aren't using.

Having done that, you might have removed some runtime package dependencies of something you built. Those are harder to find automatically, so you'll have to list and install them by hand:

$ RUNTIME_PACKAGES="...."
$ apt-get install -y $RUNTIME_PACKAGES

Finally you need to clean up apt, which you should do with rm -rf /var/lib/apt/lists/*; this removes all the index files that apt-get update installed. This is in many best practice documents and example Dockerfiles.

You could add apt-get clean which removes any cached downloaded packages, but that's not needed in the official Docker images of debian and ubuntu since the cached package archive feature is disabled.

Finally, don't forget to delete your build tree, and do it in the same RUN where you compiled, so the tree never creates a new layer. This might not make sense for some languages where you work from inside the extracted tree, but why not delete the src dirs? Definitely delete the tarball!
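Pulling all the steps together, a Dockerfile along these lines is what I mean. The package names, URL and directory layout are illustrative placeholders, not the exact nghttp2 build:

```dockerfile
FROM debian:jessie

# Everything in ONE RUN, so the build deps and source tree never
# survive into a committed layer. Package names and the tarball URL
# below are illustrative, not the exact nghttp2 build.
RUN BUILD_PACKAGES="curl ca-certificates gcc g++ make libxml2-dev" && \
    RUNTIME_PACKAGES="libxml2" && \
    apt-get update && \
    apt-get install -y $BUILD_PACKAGES && \
    mkdir /build && cd /build && \
    curl -sL https://example.org/nghttp2.tar.gz | tar xz && \
    cd nghttp2-* && ./configure && make && make install && \
    cd / && rm -rf /build && \
    AUTO_ADDED_PACKAGES=$(apt-mark showauto) && \
    apt-get remove --purge -y $BUILD_PACKAGES $AUTO_ADDED_PACKAGES && \
    apt-get install -y $RUNTIME_PACKAGES && \
    rm -rf /var/lib/apt/lists/*
```

Each line ends with && \ so a failure anywhere aborts the whole RUN, and nothing transient ever lands in a layer of its own.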

This is the delta for what I was working on with dajobe/nghttpx.

479.7 MB  separate prepare, build, cleanup 3x RUNs
186.8 MB  prepare, build and cleanup in one RUN
149.8 MB  after using apt-mark showauto in cleanup

You can use docker history IMAGE to see the detailed horror (edited for width):

...    /bin/sh -c /build/cleanup-nghttp2.sh && rm -r   7.595 MB
...    /bin/sh -c cd /build/nghttp2 && make install    76.92 MB
...    /bin/sh -c /build/prepare-nghttp2.sh            272.4 MB

and the smallest version:

...    /bin/sh -c /build/prepare-nghttp2.sh &&         27.05 MB

The massive difference is the source tree and the 232 MB of build dependencies that apt-get pulls in. If you don't clean all that up before the end of the RUN you end up with a huge transient layer.

The final size of 149.8 MB compared to the 122.8 MB debian/jessie base image size is a delta of 27 MB which for a few servers, a client and their libraries sounds great! I probably could get it down a little more if I just installed the binaries. The runtime libraries I use are 5.9 MB.

You can see my work at github and in the Docker Hub

... and of course this HTTP/2 setup is used on this blog!


Permalink