Dave Beckett

The Harness Is The Product (and Other Hot Takes)

2026-03-06 19:00

I've spent the last nine months using AI coding tools such as Claude Code, Cursor, Gemini CLI, Amp, and Codex on my own projects. I'm currently between jobs, which means I have no corporate agenda and no stake in any of these companies.

I have opinions. Some of them might even survive the week.

Part 1: The Harness Is the Product

GPT-5.4 is very good with Cursor. Surprisingly good. I don't even see it showcased this well in ChatGPT, which is OpenAI's own product. That's a tell.

The most interesting thing happening in AI right now isn't the models, it's the harnesses acting as the integration layer: tool calling, UX, agentic orchestration. A great model in a mediocre harness loses to a good model in a great harness. Gemini's models are competitive but feel underwhelming because Google's tooling can't showcase them, which is presumably why they acquired Windsurf and relaunched it as Antigravity. Claude's models shine brightest through Claude Code. The engine matters, but nobody buys an engine.

Hot take: The model is the engine. The harness is the car.

This has implications for where moats form. For a while it looked like scale and training compute were the only defensible positions. That's still true at the absolute frontier, but below that line, models are converging fast enough that harness quality dominates the user experience. Cursor and Claude Code figured this out early. The companies that win will be the ones treating the model as a component and the harness as the product, which is a deeply uncomfortable position for labs that spent billions training the models.

It's worth being specific about what a modern harness actually does, because the shift is easy to miss. Early AI coding tools worked like this:

prompt → completion

You asked a question, got an answer, tried again if it was wrong.

Modern coding systems work like this:

observe repository → plan change → edit files → run tests → inspect errors → iterate

That loop is subtle but it changes everything. The system isn't generating code snippets; it's participating in a continuous cycle over a real project. It reads the codebase, modifies multiple files, runs commands, and adjusts based on results. Less autocomplete, more collaborator.
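The cycle is short enough to caricature in runnable code. A minimal sketch, where `model_propose_fix` is a hard-coded stand-in for a real LLM call and the one-file "repository" is invented purely for illustration:

```python
# A toy version of the observe / plan / edit / test / iterate loop.
# `model_propose_fix` stands in for a real LLM; it is a hard-coded
# stub here so the loop is runnable end to end.

def run_tests(project):
    """Observe: run the project's checks, return (ok, error message)."""
    try:
        exec(project["main.py"], {})
        return True, ""
    except Exception as e:
        return False, str(e)

def model_propose_fix(project, error):
    """Plan + edit: the 'model', keyed off the observed error text."""
    if "is not defined" in error:
        # The "model" decides to replace the undefined name with data.
        project["main.py"] = "total = sum([1, 2, 3])\nassert total == 6\n"
    return project

def agent_loop(project, max_iters=5):
    for _ in range(max_iters):
        ok, error = run_tests(project)               # observe
        if ok:
            return project                           # converged
        project = model_propose_fix(project, error)  # plan + edit
    raise RuntimeError("did not converge")

# A broken one-file "repository": it uses an undefined name.
project = {"main.py": "total = sum(numbers)\nassert total == 6\n"}
fixed = agent_loop(project)
ok, _ = run_tests(fixed)
print(ok)  # True: the loop iterated until the tests passed
```

Everything interesting in a real harness lives in how much better its observe and edit steps are than these stubs, but the control flow is exactly this shape.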

And here's the thing: a lot of the agentic stuff IS just this loop with different tools plugged in. An agent observes state, generates a command or script, runs it, inspects the output, decides what to do next. Even tasks that aren't obviously programming often reduce internally to "write some python, call an API, parse the result, continue." If you solve the coding harness, you've solved a large chunk of the general agentic problem. This is something Anthropic realized relatively recently and took advantage of with CoWork.

Hot take: Agents are mostly code-writing loops with tool access.

This also means the IDE is quietly becoming an agent runtime. Editors already provide everything agents need: structured projects, deterministic execution environments, version control, feedback loops. It's not a coincidence that the best agent experiences are happening inside coding tools or on CLIs rather than chat windows.

Hot take: The IDE is becoming the operating system for AI agents.

The Google tragedy

Google is the most painful case study: they have the research, the infrastructure, the talent, and arguably the best foundation model team on earth, yet they keep fumbling the integration layer. The Windsurf acquisition and Antigravity launch tell the story: Google paid $2.4 billion to license Windsurf's code and hire its founders, then launched Antigravity four months later.

That's a strange failure mode for the company that built Gmail, Maps, and Search. Something broke culturally.

I want Google to be good, and honestly, in my experience their adjacent AI products are very good. NotebookLM is great, AI search is free and genuinely useful, and the whole Google Docs ecosystem works well with AI. Google's strength has always been horizontal platform plays, and those products reflect that.

But the coding-centric agentic future is a vertical integration game and Google keeps losing it. Their model quality isn't the problem; their harness is.

If harnesses are becoming the product, the next question is: who builds them?

Part 2: Open Source and the Harness Layer

The leading harnesses right now are proprietary. Cursor is proprietary. Claude Code is proprietary. Antigravity is a $2.4 billion proprietary fork. So: closed source wins?

Not so fast. It's worth noting that the model layer hasn't been won by open source either, despite the narrative. Open-weights models from Meta and others (mostly Chinese labs) are competitive, but the frontier is still closed, and Meta's releases are clearly a strategic weapon against Google and OpenAI dressed up as generosity.

The harness layer is more interesting because it's more contested. OpenClaw blows a hole in the story. Formerly Clawdbot, then Moltbot, it went from 9,000 to 60,000+ GitHub stars in days and now sits at over 250,000. It's not a coding harness in the Cursor sense; it's a general agentic harness with message routing across WhatsApp, Telegram, Discord, and dozens of other channels, autonomous task execution, 50+ integrations, all running 24/7 on your own hardware. OpenCode is doing something similar for the coding-specific case.

These projects are moving fast, arguably faster than their closed counterparts on raw feature velocity.

The tradeoff is risk. OpenClaw's attack surface is enormous. Security researchers have mapped it against every category in the OWASP Top 10 for Agentic Applications. There are documented cases of agents acting well beyond user intent; one created a dating profile autonomously, which is either impressive or terrifying depending on your perspective. Its creator, Peter Steinberger, joined OpenAI and the project is moving to an open source foundation. That could mean more institutional backing, or it could mean momentum stalls after the founder's departure; it's too early to tell.

So the real picture isn't "open source is winning" or "open source is losing." It's that closed harnesses and open source harnesses are optimizing on different axes:

  • Closed (Cursor, Claude Code): safety, polish, tight model integration
  • Open (OpenClaw, OpenCode): extensibility, speed, community velocity, accepting more risk

Both are viable today. The question is which axis matters more as agentic tools move from developers to everyone else. My guess: the closed harnesses win the mainstream because most people don't want to manage their own attack surface. But open source keeps pushing the bleeding edge, and ideas flow from bleeding edge to mainstream on roughly a three-month delay.

Hot take: The open source harness ecosystem is about three months ahead of commercial tools. The ideas show up there first; the polish shows up later in closed products.

Hot take: Models may become commodities. Harnesses are the product.

This might be the first major technology wave where open source doesn't clearly own the infrastructure layer, or it might not. Ask me again in a few weeks, when this take will be outdated.

Part 3: Code Is the New Assembler (and Other Predictions)

Code is becoming the new assembler. Nobody writes assembler anymore, but it didn't disappear; it just got generated, below the surface. Code is heading the same way. The skill is shifting from "can you write code" to "can you specify intent precisely enough that code gets generated correctly." That's closer to systems architecture than traditional programming.

The agentic loop, where a human specifies, a model generates, a harness orchestrates, and a human validates, is the new unit of work. It now applies well beyond coding: any task that can be decomposed into tool calls and validation steps is fully in agentic territory. Code is just where it showed up first, because code is the easiest thing to validate (it either runs or it doesn't, mostly).

The competence amplifier

Here's something I didn't expect. Over the past nine months I've shipped working tools and apps written in Go, JavaScript, and Postgres. I don't write Go or JavaScript, although I can read them. I've never administered Postgres in anger. But I have 25+ years of systems experience, and it turns out that's enough. I can read the generated code, spot architectural problems, evaluate whether the error handling makes sense, and steer the iteration loop. I can't write idiomatic Go from scratch but I can tell when the AI-generated Go is doing something stupid.

This is the real shape of "code as assembler." The AI handles the syntax and idiom; the human provides the judgment layer. My experience with distributed systems, failure modes, and operational patterns transfers directly even when I don't know the language. The harness doesn't replace expertise, it makes expertise portable across languages and frameworks in a way that wasn't possible before.

This has two implications.

  1. For experienced developers, your value shifts from "I know language X" to "I know how systems work." That's a bigger, more durable moat.
  2. For non-developers (product managers, designers, domain experts) the barrier to building working software just dropped dramatically. They don't need to learn Go or Python. They need to learn how to specify what they want clearly enough that the loop converges. That's a different skill, and a lot of people already have it without realizing it.

Hot take: AI coding tools don't replace developers. They make systems thinking portable across any language or framework.

The model picker disappears

One near-term prediction: the model picker goes away. Nobody types http:// anymore. Nobody picks which CDN node serves their webpage. The system picks. Model selection is heading the same way.

The fact that I currently care whether I'm running Sonnet 4.6 or GPT-5.4 is a sign of immaturity, not a feature. In two years, maybe less, the harness routes tasks dynamically:

cheap model      routine edits, boilerplate
reasoning model  planning, architecture
coding model     implementation
verifier model   checking, testing

The user interacts with one interface. The model choice becomes an implementation detail, like which CPU core your thread runs on. That'll be a sign the ecosystem has grown up.
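Mechanically, such a router can start as a lookup from task type to model tier. A sketch, with placeholder model names and a deliberately crude keyword classifier, neither taken from any real product:

```python
# A toy task router: the harness classifies the task and picks a model.
# Model names here are placeholders, not real API identifiers.

ROUTES = {
    "edit":      "cheap-model",      # routine edits, boilerplate
    "plan":      "reasoning-model",  # planning, architecture
    "implement": "coding-model",     # implementation
    "verify":    "verifier-model",   # checking, testing
}

def classify(task: str) -> str:
    """Crude keyword classifier standing in for a learned one."""
    t = task.lower()
    if any(w in t for w in ("design", "architecture", "plan")):
        return "plan"
    if any(w in t for w in ("test", "verify", "check")):
        return "verify"
    if any(w in t for w in ("implement", "write", "build")):
        return "implement"
    return "edit"

def route(task: str) -> str:
    return ROUTES[classify(task)]

print(route("plan the architecture for the cache"))  # reasoning-model
print(route("rename this variable everywhere"))      # cheap-model
```

A real harness would classify with a cheap model rather than keywords, and would fold cost, latency, and context-length constraints into the choice, but the user-facing effect is the same: one interface, no menu.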

Hot take: The model menu will eventually disappear.

(The model picker sticking around for power users and experts is fine. I'm talking about the default experience.)

The rate of change problem

The uncomfortable corollary to all of this is that the rate of change is stupid fast. Expertise about specific model behavior expires in days to weeks. Any opinion formed about a model's capabilities on a given Tuesday is stale by the following Tuesday. Including the opinions in this post, presumably.

The durable skills are meta-skills: evaluating models, designing harnesses, specifying intent, thinking in systems. The specific knowledge of "Claude is good at X but bad at Y" or "GPT-5.4 handles long context better than..." is transient. It's useful for a week, maybe two, then something ships and the landscape shifts.

This favors a certain kind of engineer. The senior generalist who thinks in systems, evaluates tradeoffs, and adapts fast. Not the specialist who knows one tool deeply. This is convenient for me, I realize, but I think it's true regardless.

Hot take: The most valuable AI skill is no longer prompting. It's building the loop around the model.

Where this lands

I don't have a neat conclusion. These are hot takes and some of them will age badly. But the harness-as-product thesis feels durable to me, the open source picture is genuinely unsettled, and "code as assembler" is more a description of what's already happening than a prediction.

Interesting times.

Permalink

22 Years of Code, 2 Months of LLMs: The Redland Forge Story

2025-09-13 12:34

Twenty-two years ago, I wrote some Perl scripts to test Redland RDF library builds across multiple machines with SSH. Two months ago, I asked an LLM to turn those scripts into a modern Python application. The resulting Redland Forge application evolved from simple automation into a full terminal user interface for monitoring parallel builds - a transformation that shows how LLMs can accelerate development from years into weeks.

The Shell Script Years (2003-2023)

The project originated from the need to build and test Redland, an RDF library with language bindings for C, C#, Lua, Perl, Python, PHP, Ruby, TCL and others. The initial scripts handled the basic workflow: SSH into remote machines, transfer source code, run the autoconf build sequence, and collect results.

Early versions focused on the fundamentals:

  • Remote build execution via SSH
  • Basic timing and status reporting
  • Support for the standard autoconf pattern: configure, make, make check, make install
  • JDK detection and path setup for Java bindings
  • Cross-platform compatibility for various Unix systems and macOS

Over the years, the scripts grew more features:

  • Automatic GNU make detection across different systems
  • Berkeley DB version hunting (supporting versions 2, 3, and 4)
  • CPU core detection for parallel make execution
  • Dynamic library path management for different architectures
  • Enhanced error detection and build artifact cleanup

The scripts ended up handling everything from config.guess location discovery to integrating compiler output into build summaries.

The Python Conversion (2024)

The script remained largely the same until 2024, when I decided to revisit it. It was time to move on from Perl and shell scripts, and it seemed like a good opportunity to use the emerging LLM coding agents to do that with a simple prompt. It was relatively easy; I forget which LLM I used, but it was probably Gemini.

The conversion to Python brought:

  • Type hints and modern Python 3 features
  • Proper argument parsing with argparse instead of manual option handling
  • Pathlib for cross-platform file operations
  • Structured logging with debug and info levels
  • Better error handling and user feedback

The user experience improved as well:

  • Intelligent color support that detects terminal capabilities
  • Host file support with comment parsing
  • Build summaries with success/failure statistics and emojis. I'm not sure if that's absolutely an improvement, but 🤷

Terminal User Interface (2025)

A year later, in July 2025, with LLM technology advancing almost weekly, I was inspired to make a big change to the tool: prompting the LLM to turn it into a full text user interface, with the parallel builds visible interactively in the terminal.

Continuing from the Python foundation, the tool gained a full terminal user interface. The TUI could monitor multiple builds simultaneously, showing real-time progress across different hosts.

One of the first prompts was to identify which existing Python TUI and SSH libraries should be used, and this quickly led to blessed for the TUI and paramiko for SSH.

A lot of the early work was making the TUI behave properly in a terminal, so the drawn UI did not cause scrolling or overflow, and text wrapping and truncation worked correctly. Once something worked, prompting the LLM to write unit tests for each of these was very helpful to avoid backsliding.
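Those wrapping and truncation helpers are exactly the kind of small, pure functions that unit tests protect well. A hypothetical example of the shape (illustrative only, not the actual Redland Forge code):

```python
# A hypothetical text-truncation helper of the kind a TUI needs, with
# the sort of regression checks the LLM was prompted to write.
# Illustrative only, not the actual Redland Forge code.

def truncate(text: str, width: int, ellipsis: str = "...") -> str:
    """Fit text into `width` columns, never overflowing the box."""
    if width <= 0:
        return ""
    if len(text) <= width:
        return text
    if width <= len(ellipsis):
        return text[:width]
    return text[: width - len(ellipsis)] + ellipsis

# The key invariant for a TUI: output never exceeds the box width.
assert truncate("ok", 10) == "ok"
assert truncate("configure && make", 10) == "configu..."
assert all(len(truncate("making check on host", w)) <= w for w in range(20))
print("truncate tests passed")
```

The value of tests like these is less about correctness of any one case and more about stopping an LLM refactor from quietly reintroducing an overflow.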

As it grew, the architecture became much more modular:

  • SSH connection management with parallel execution
  • A blessed-based terminal interface for responsive updates
  • Statistics tracking and build step detection
  • Keyboard input handling and navigation

Each of those came from prompting to refactor large classes, sometimes identifying which ones to attack by using a prompt to analyze the code state and pick candidates, and sometimes by running external code complexity tools, in this case Lizard.

The features grew quickly at this stage:

  • Live progress updates based on an event loop
  • Adaptive layouts that resize with the terminal
  • Automatic build phase detection (extract, configure, make, check, install)
  • Color-coded status indicators, both as builds ran and afterwards
  • Host visibility management for large deployments, so if the window was too small you'd see a subset of hosts building in the window

The design used established design patterns such as the observer pattern for state changes, the strategy pattern for layouts, and a manager (factory) pattern for connections. Most of these were picked by the LLM in use at the time, with occasional guidance such as "make a configuration class".

Completing the application (September 2025)

The final phase built the tool into a more complete application and added release-focused features and additional testing. The tool transformed from an internal development utility into something that could be shared and be useful to anyone with an autoconf project tarball and SSH.

Major additions included:

  • A build timing cache system with persistent JSON storage, so it could store previous build times
  • Intelligent progress estimation based on the cached times
  • Configurable auto-exit functionality with countdown display
  • Keyboard-based navigation of hosts and logs, with a full-screen host mode and interactive menus

The testing at this point was quite substantial:

  • Over 400 unit tests covering all components
  • Mock-based testing for external dependencies
  • Integration tests and edge cases

At this point it was doing the job fully and seemed complete, and of broader use than just for Redland dev work.

Learnings

Redland Forge demonstrates how developer tools evolve. What started as pragmatic shell and Perl scripts for a specific need grew into a sophisticated application. Each phase built on the previous one, with the Python conversion serving as the catalyst that enabled the terminal interface.

It also demonstrates how LLMs in 2025 can act as a productivity multiplier when used carefully. I did spend a lot of time pasting terminal outputs for debugging the TUI boxes and layout. I used lots of Git commits and tags when the tool worked; I even developed a custom command to make commits the way I prefer, avoiding the hype some models tend to add, but that's another story or blog post. When the LLMs made mistakes, I could always go back to the previous working state (git reset --hard), or ask them to try again, which worked more often than you'd expect. Or try a different LLM.

I found that coding LLMs can work on their own somewhat, depending on the LLM in question. Some regularly prompt for permissions or end their turn after some progress, whereas others just keep coding without checking back with me. This allowed some semi-asynchronous development where a bunch of work was done, then I reviewed it and adjusted. I did review the code, since I know Python well enough.

The skill I think I learnt the most was writing prompts, or what is now being called spec-driven development, for the later, larger changes. I described what I wanted to one LLM and had it write a markdown specification, and sometimes asked a different LLM to review it for gaps, before one of them implemented it. I often asked the LLM to update the spec as it worked, since sometimes the LLMs crashed or hung or looped with the same output; if the spec was up to date, the chat could be killed and restarted. Sometimes just telling the LLM it was looping was enough.

I'm happy with the final application, and it's nearly 100% written by the LLMs, including the documentation, tests and configuration, although it was 100% prompted by me, 100% tested by me, and 100% of the commits were reviewed by me. After all, it is my name in the commit messages.


Disclaimer: I currently work for Google who make Gemini.

Permalink

Production Chaos

2025-02-01 11:00

Chaos happens a lot in production and in associated roles such as Site Reliability Engineering (SRE). Day to day you can be dealing with a scale of chaos from noise, interruptions, unknowns and mysteries all the way up to incidents, emergencies and disasters. If you are working in that space, you will have to deal with tradeoffs of risk, time, uncertainty and more. The "unknown unknowns", as Donald Rumsfeld put it, or the 1-in-a-million events, can happen regularly if you are operating a lot of code, data or systems.

If this is going to happen all the time, you need to have support around you, in particular a team, leadership and organization you can trust to support you whatever happens. You have to be able to relax even in a stressful environment, not worrying about your personal safety or career. This leads to the SRE best practice of blamelessness when things are failing: it's the fault of the system, not the person. There is no way you are going to get people working their best if they are going to get blamed for making mistakes. That way leads to hiding things, avoiding responsibility and a negative feedback loop where everyone avoids making things better.

If you have a culture of blame and fear, you are going to get the worst from your people. Which leads me to my experience working at Twitter when Phony Stark aka Space Karen aka Elmo Maga bought it. He did not trust his employees, did not support them, did not communicate with them, and indeed blamed them. He wanted and fostered a culture of fear and uncertainty.

It was so chaotic at the end that I once had two managers message me in the same hour that they were my new manager. I also didn't know at the time that he was my manager for two days:

Picture showing table of my Twitter managers and dates with names redacted except for Elon Musk

Elon Musk is a negative example of how to manage and how to be a grown human. He has many character flaws and a Character Limit.

He is not an example to copy.

It's nauseating seeing him repeating this again at the US Government: Déjà Vu: Elon Musk Takes His Twitter Takeover Tactics to Washington (Gift link)

Permalink

10 Years an SRE

2025-01-08 09:53

I've recently been thinking about my SRE journey and the SRE role I had at Twitter. When I joined in 2015, it was my first SRE titled position but I frankly didn't know what the SRE job was really all about.

At the time, Twitter had no specific SRE onboarding - Flight School was focused on software engineers. You were mostly expected to shoulder-surf existing SREs and learn by osmosis, and that's what I did. That's a poor approach in multiple ways: it is unstructured, requires extensive one-to-one time and may not suit everyone's learning style.

Instead, I used the approach of "to learn it, teach it" and I started creating some SRE-specific onboarding along with an experienced SRE colleague, Rob Malchow. Over a year or so, we developed 3 SRE onboarding courses that covered technologies, processes and specific help for SREs in particular about how to prepare and be oncall - including a big DON'T PANIC slide.

I taught this series of 3 courses with Rob and others maybe 5 times to hundreds of people, and every time I tried to invite an experienced SRE to join, so that the course material was improved and corrected each round, because nothing stays still in tech. I also believe that pair or co-teaching works much better, as two people can deliver and check for understanding in parallel. At the end I felt I had a good grasp of the SRE scope. Hopefully the students did too!

As for the SRE books from Google, I had read the first one, which was out at the time. I found that Google's scale and approaches needed customizing for our environment, although lots of it was highly relevant for getting to a data-driven approach to reliability using SLAs, SLIs, SLOs and error budgets.
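The error budget arithmetic that makes this data-driven is simple enough to show: for an availability SLO, the budget is just the fraction of the window you are allowed to be unavailable.

```python
# Error budget arithmetic for an availability SLO over a rolling window.

def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO."""
    return window_days * 24 * 60 * (1 - slo)

for slo in (0.99, 0.999, 0.9999):
    print(f"{slo:.2%} SLO -> {error_budget_minutes(slo):.1f} min per 30 days")
```

At 99.9% that works out to about 43 minutes a month, which is the number that turns "be reliable" into a concrete engineering conversation about how much risk a release is worth.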

Now I'm at Google and can see the other side of the fence, where SRE training ("SRE EDU") is taken very seriously, and extensive training is created and delivered with feedback and evolution built in. Education remains a very interesting area to me, and "always be learning" is an important value of mine. It's also a key part of being an SRE, and of working in tech more generally. Hopefully I can participate here too.

Permalink

Google 3 Times

2024-11-04 00:00

Today I work at Google, but this was my third attempt to join, going back over a decade. Looking back, the first two times I attempted to join as a software engineer, which, in hindsight, was probably where I went wrong.

When I was interviewing at Twitter in 2015 for a software engineering role, the Hadoop team engineering manager, Joep Rottinghuis, suggested my skills might fit better into a different position called Site Reliability Engineering, which was new to me at the time. I switched to interviewing with the SRE manager, an ex-Yahoo! colleague, Pascal Borghino, and successfully joined.

It turned out that SRE was the perfect position for me, the role I had been doing all along; I just didn't know it existed. I was always curious about the lower levels of software and hardware, and wanted to automate getting things working reliably.

As for Google, I successfully interviewed for an SRE position late in 2022 with Duncan Winn and Andrew Brampton, for a start in March 2023. It really helped this time that I was interviewing for the right role, something that took me some time to figure out, pretty much 25 years into my career.

Permalink

Twitter Interviewing

2024-11-01 00:00

At Twitter, I believe we did a great job of interviewing. We interviewed in pairs (one talks, one takes notes), we used rubrics to ensure fairness, and we had lots of checks and balances. Everyone who interviewed had to do training, then shadow, and only then participate. Although I wasn't involved in the final decision / offer side of it, I did a fair amount of interviewing and even interviewed my new manager a couple of times.

In all those interviews I never ONCE looked at a candidate's school, college, certificates or qualifications. We always asked questions to figure out what they could DO, how they worked, and their career in the sense of what they wanted and how they learnt from their journey. Evidence of emotional intelligence, learning and curiosity was key, beyond the technical skills.

I still believe that was a great approach.

Permalink

The Journey to SRE

2023-03-20 18:00

In my career I've had three big fork()s in the road, so far.

My higher education started off with a Computer Science degree back in 1990 from University of Bristol UK, with a class size of less than 20. My final year project was a parallelized graphics renderer written in occam. #code #graphics

At the end of my degree, I had applied to do a PhD in computer graphics, but a couple of days before that offer appeared, I got a job offer for a parallel computing position, which I accepted.

Fork #1: Parallel Computing

If I had started the PhD, the other fork path would likely have ended up as me working as a computer graphics renderer or pipeline engineer working for a big CGI or SFX firm, probably in the US.

Instead, I went to work at the University of Kent at Canterbury (UKC), now called the University of Kent, in Canterbury, UK of course. There I worked on the Meiko parallel computer at a blistering 25MHz - a relatively unheard-of speed in 1990 - with dozens of nodes, each capable of running thousands of lightweight processes based on CSP (lighter than threads, look it up). I helped operate the Meiko system: rebooting it and rewiring it (literally wires) between the nodes and racks. #operations #code #learning #teaching

Deeper into that, I got into organizing materials for the Internet Parallel Computing Archive, the software to manage it and hand-compiling Apache on SunOS and IRIX to run it. This led to my first home page location http://www.hensa.ac.uk/parallel/www/djb1.html in 1993. #operations #code

Fork #2: Web and Metadata

If I had continued with parallel computing, the second fork alternate path would have likely been going into research, getting a PhD, working on high performance computing, supercomputers and probably ending up in the US.

Instead, I improved the archive, developed metadata to manage it in IAFA Templates, and expanded into web metadata, Dublin Core, and onwards to RDF and the Semantic Web. I wrote software in Perl, presented at multiple web conferences starting with WWW3 and at workshops, and attended many Dublin Core working groups. #code #web #rdf #metadata

Meanwhile, around 1996, my day-to-day work changed to be web-focused, working on the UK Mirror Service at Kent, installing machines, operating them, making backups and keeping things running for the entire UK academic network, a network called HENSA. I also ran the computer science department's (first) web site http://www.cs.ukc.ac.uk. This was where I learnt operations, web tech and started using Linux. #web #operations #learning

In 2000, I took up an opportunity to work at the Institute for Learning and Research Technology (ILRT) at the University of Bristol as a technical researcher, working entirely on software and metadata in the emerging RDF and semantic web area. At that time, I created the Free Software / Open Source Redland RDF libraries, all written in C and supporting multiple language bindings, and developed and tested them across multiple OSes via build farms. I worked for several years on the software, RDF, semantic web and other standards work in EU research projects such as SWADE and SKOS, as well as lots of W3C projects and working groups for RDF, SPARQL and Turtle. I learnt so much about organizing my time and working in a fast-changing environment. #operations #code #web #learning #metadata

I was asked in 2005 if I'd like to come take the work and experience I'd developed in the semantic web work and deploy the software at Yahoo! in USA. I said yes.

Fork #3: Corporate USA

The third fork's other path would have been continuing in the UK and EU University sector, working on open source and web technologies as they evolved. Possibly, I would have ended up working in some large UK IT firm, deploying web tech or teaching web tech in Universities.

At Yahoo! in Sunnyvale, I entered a whole new world, in which there were highly specialized roles, such as Product Managers and Operations Engineers to go along with Software Engineers. After multiple positions and not working on coding or web technologies, I ended up far away from my happy place. #architect #learning

In 2012, I moved on to software engineering roles, first at a social news startup, Digg, which closed up shop, then at Rackspace Hosting in San Francisco in 2013. In both cases, I was increasingly working on Hadoop big data applications, as well as running and operating Hadoop, which by then was being called DevOps. #operations #code #learning #bigdata

That led to finally joining Twitter in 2016 as a Site Reliability Engineer for the Data Platform, operating the Hadoop clusters with software: addressing the day-to-day issues, automating the routine tasks and working on strategic projects like cloud for the data platform. Finally, I had arrived at the job title that matched what I'd been doing for a long time, and I loved working in a group of SREs, always learning and helping. #sre #operations #code #learning #teaching #bigdata #cloud

In 2022, Twitter also sold its furniture and, well, that's another story... #chaos

So here we are in 2023 and I'm excited to announce I'm joining Google as a Staff System Engineer in the Site Reliability Engineering part of the Google Cloud organization. #sre #learning #cloud

Permalink

Making Debian Docker Images Smaller

2015-04-18 14:00

TL;DR:

  1. Use one RUN to prepare, configure, make, install and cleanup.
  2. Cleanup with: apt-get remove --purge -y $BUILD_PACKAGES $(apt-mark showauto) && rm -rf /var/lib/apt/lists/*

I've been packaging the nghttp2 HTTP/2.0 proxy and client by Tatsuhiro Tsujikawa, both in Debian and with Docker, and noticed it takes some time to get the build dependencies (C++, cough) as well as to do the build.

In the Debian packaging case it's easy to create minimal dependencies thanks to pbuilder and to ensure the binary package contains only the right files. See debian nghttp2

For Docker, since you work with containers, it's harder to see what changed, but you still really want the images as small as possible, since you have to download them to run the app and they take up disk space. While doing this I kept seeing huge images (480 MB), way larger than the base image I was using (123 MB), which didn't make sense since I was just packaging a few binaries with some small files, plus their dependencies. My estimate was that it should be way less than a 100 MB delta.

I pored over multiple blog posts about Docker images and how to make them small. I even looked at some of the squashing commands like docker-squash that involved import and export, but those seemed not quite the right approach.

It took me a while to really understand that each Dockerfile command creates a new layer containing the deltas. So when you see all those downloaded layers in a docker pull of an image, it can be a lot of data that is mostly unused.

So if you want to make it small, you need to make each Dockerfile command touch the smallest amount of files and use a standard image, so most people do not have to download your custom l33t base.

It doesn't matter if you rm -rf the files in a later command; they continue to exist in some intermediate layer.
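A minimal sketch of the trap (the file name and size here are made up purely for illustration):

```dockerfile
FROM debian:jessie

# BAD: this layer permanently contains a 100 MB file...
RUN dd if=/dev/zero of=/bigfile bs=1M count=100

# ...and this later rm does NOT shrink the image; it only
# hides the file behind a new (tiny) layer on top.
RUN rm -f /bigfile

# GOOD alternative: create and delete in the SAME RUN, so the
# committed layer delta never includes the file:
#   RUN dd if=/dev/zero of=/bigfile bs=1M count=100 && rm -f /bigfile
```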

So: prepare, configure, build, make install and clean up in one RUN command if you can. If the lines get too long, put the steps in separate scripts and call them.
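As a sketch of the single-RUN shape (the URL, source paths and package names here are illustrative stand-ins, not the actual nghttp2 build):

```dockerfile
FROM debian:jessie

# Install build deps, fetch + build, then remove the build deps
# plus everything apt auto-installed, re-install runtime libs,
# and clean up -- all in ONE RUN so no layer keeps the waste.
RUN BUILD_PACKAGES="curl ca-certificates gcc g++ make libxml2-dev" && \
    apt-get update && \
    apt-get install -y $BUILD_PACKAGES && \
    mkdir -p /build && \
    curl -sSL http://example.org/src.tar.gz | tar -xz -C /build && \
    cd /build/src && ./configure && make && make install && \
    apt-get remove --purge -y $BUILD_PACKAGES $(apt-mark showauto) && \
    apt-get install -y libxml2 && \
    rm -rf /build /var/lib/apt/lists/*
```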

Lots of Docker images are based on Debian images because they are a small and practical base. The debian:jessie image is smaller than the Ubuntu (and CentOS) images. I haven't checked out the fancy 'cloud' images too much: Ubuntu Cloud Images, Snappy Ubuntu Core, Project Atomic, ...

In a Dockerfile building from some downloaded package, you generally need wget or curl and maybe git. When you install, for example, curl and ca-certificates to get TLS/SSL certificates, it pulls in a lot of extra packages, such as openssl in the standard Debian curl build.

You are pretty unlikely to need curl or git after the build stage of your package. So if you don't need them, you could - and you should - remove them, but that's one of the tricky parts.

If $BUILD_PACKAGES contains the list of build dependency packages, such as libxml2-dev and so on, you would think that this would get you back to the start state:

$ apt-get install -y $BUILD_PACKAGES
$ apt-get remove -y $BUILD_PACKAGES

However this isn't enough; you've missed the dependencies that were automatically installed, and their dependencies.

You could try

$ apt-get autoremove -y

but that doesn't grab them all either. It's not clear to me why at this point. What you actually need is to remove all auto-added packages, which you can find with apt-mark showauto

So what you really need is

$ AUTO_ADDED_PACKAGES=`apt-mark showauto`
$ apt-get remove --purge -y $BUILD_PACKAGES $AUTO_ADDED_PACKAGES

I added --purge too since we don't need any config files in /etc for build packages we aren't using.

Having done that, you might have removed some runtime package dependencies of something you built. That's harder to find automatically, so you'll have to list and install those by hand:

$ RUNTIME_PACKAGES="...."
$ apt-get install -y $RUNTIME_PACKAGES

Finally you need to clean up apt, which you should do with rm -rf /var/lib/apt/lists/*; this removes all the index files that apt-get update downloaded. This is in many best-practice documents and example Dockerfiles.

You could add apt-get clean which removes any cached downloaded packages, but that's not needed in the official Docker images of debian and ubuntu since the cached package archive feature is disabled.

Finally, don't forget to delete your build tree, and do it in the same RUN in which you did the compile, so the tree never ends up in a layer. This might not make sense for some languages where you work from inside the extracted tree; but why not delete the src dirs? Definitely delete the tarball!

This is the delta for what I was working on with dajobe/nghttpx.

479.7 MB  separate prepare, build, cleanup 3x RUNs
186.8 MB  prepare, build and cleanup in one RUN
149.8 MB  after using apt-mark showauto in cleanup

You can use docker history IMAGE to see the detailed horror (edited for width):

...    /bin/sh -c /build/cleanup-nghttp2.sh && rm -r   7.595 MB
...    /bin/sh -c cd /build/nghttp2 && make install    76.92 MB
...    /bin/sh -c /build/prepare-nghttp2.sh            272.4 MB

and the smallest version:

...    /bin/sh -c /build/prepare-nghttp2.sh &&         27.05 MB

The massive difference is the source tree and the 232 MB of build dependencies that apt-get pulls in. If you don't clean all that up before the end of the RUN you end up with a huge transient layer.

The final size of 149.8 MB compared to the 122.8 MB debian/jessie base image size is a delta of 27 MB which for a few servers, a client and their libraries sounds great! I probably could get it down a little more if I just installed the binaries. The runtime libraries I use are 5.9 MB.

You can see my work on GitHub and on Docker Hub

... and of course this HTTP/2 setup is used on this blog!

References

Permalink

Open to Rackspace

2013-06-24 21:34

I'm happy to announce that today I started work at Rackspace in San Francisco in a senior engineering role. These are the aspects I'm most excited about:

  1. The company: a fun, fast moving organization with a culture of innovation and openness
  2. The people: lots of smart engineers and devops to work with
  3. The technologies: OpenStack cloud, Hadoop big data and lots more
  4. The place: San Francisco technology world and nearby Silicon Valley

My personal technology interests and coding projects such as Planet RDF, the Redland librdf libraries and Flickcurl will continue in my own time.

Permalink

Undugg

2012-05-10 08:20

Digg just announced that Digg Engineering Team Joins SocialCode and The Washington Post reported SocialCode hires 15 employees from Digg.com

This acquihire does NOT include me. I will be changing jobs shortly but have nothing further to announce at this time.

I wish my former Digg colleagues the best of luck in their new roles. I had a great time at Digg and learned a lot about working in a small company, social news, analytics, public APIs and the technology stack there.

Permalink