Dave Beckett

22 Years of Code, 2 Months of LLMs: The Redland Forge Story

2025-09-13 12:34

Twenty-two years ago, I wrote some Perl scripts to test Redland RDF library builds across multiple machines with SSH. Two months ago, I asked an LLM to turn those scripts into a modern Python application. The resulting Redland Forge application evolved from simple automation into a full terminal user interface for monitoring parallel builds - a transformation that shows how LLMs can compress years of development into weeks.

The Shell Script Years (2003-2023)

The project originated from the need to build and test Redland, an RDF library with language bindings for C, C#, Lua, Perl, Python, PHP, Ruby, Tcl and others. The initial scripts handled the basic workflow: SSH into remote machines, transfer source code, run the autoconf build sequence, and collect results.

Early versions focused on the fundamentals:

  • Remote build execution via SSH
  • Basic timing and status reporting
  • Support for the standard autoconf pattern: configure, make, make check, make install (sketched below)
  • JDK detection and path setup for Java bindings
  • Cross-platform compatibility for various Unix systems and macOS
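Here is a minimal sketch of the workflow those scripts automated, written in modern Python for illustration (the originals were Perl and shell; the host name, tarball and paths are hypothetical):

    import subprocess

    HOST = "build1.example.org"          # hypothetical build host
    TARBALL = "redland-1.0.17.tar.gz"    # hypothetical source tarball

    def remote(cmd: str) -> None:
        """Run a command on the build host via ssh, failing on errors."""
        subprocess.run(["ssh", HOST, cmd], check=True)

    # Transfer the source, then run the standard autoconf sequence.
    subprocess.run(["scp", TARBALL, f"{HOST}:/tmp/"], check=True)
    remote(f"cd /tmp && tar xzf {TARBALL}")
    remote("cd /tmp/redland-1.0.17 && ./configure && make && make check && make install")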

Over the years, the scripts grew more features:

  • Automatic GNU make detection across different systems (sketched below)
  • Berkeley DB version hunting (supporting versions 2, 3, and 4)
  • CPU core detection for parallel make execution
  • Dynamic library path management for different architectures
  • Enhanced error detection and build artifact cleanup
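To give a flavor of that detection logic, here is a sketch in Python terms (the originals did the equivalent with shell tests):

    import os
    import shutil

    # Prefer GNU make, which some systems install as 'gmake'.
    make = shutil.which("gmake") or shutil.which("make")

    # Use all available CPU cores for parallel make execution.
    jobs = os.cpu_count() or 1
    build_cmd = f"{make} -j{jobs}"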

By the end, the scripts were capable of handling everything from locating config.guess to folding compiler output into the build summaries.

The Python Conversion (2024)

The scripts remained largely the same until 2024, when I decided to revisit them. It was time to move on from Perl and shell, and it seemed like a good opportunity to use the emerging LLM coding agents to do that with a simple prompt. This was relatively easy to do; I forget which LLM I used, but it was probably Gemini.

The conversion to Python brought:

  • Type hints and modern Python 3 features
  • Proper argument parsing with argparse instead of manual option handling
  • pathlib for cross-platform file operations
  • Structured logging with debug and info levels
  • Better error handling and user feedback

The user experience improved as well:

  • Intelligent color support that detects terminal capabilities
  • Host file support with comment parsing
  • Build summaries with success/failure statistics and emojis. I'm not sure if that's absolutely an improvement, but 🤷

Terminal User Interface (2025)

A year later, in July 2025, with LLM technology advancing almost weekly, I was inspired to make a big change to the tool: prompting it into a full text user interface, with the parallel builds visible interactively in the terminal.

Continuing from the Python foundation, the tool gained a full terminal user interface. The TUI could monitor multiple builds simultaneously, showing real-time progress across different hosts.

One of the first prompts was to identify which existing Python TUI and SSH libraries to use, and this quickly led to blessed for the TUI and paramiko for SSH.
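Here is a minimal sketch of the two libraries working together; the host, user and command are hypothetical, and the real tool wraps all of this in connection and display managers:

    import paramiko
    from blessed import Terminal

    term = Terminal()
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect("build1.example.org", username="builder")  # hypothetical

    _, stdout, _ = client.exec_command("cd /tmp/redland && ./configure && make")
    with term.fullscreen(), term.hidden_cursor():
        for line in stdout:  # stream the remote build output line by line
            # Redraw one status line in place rather than letting output scroll.
            print(term.move_xy(0, 0) + term.clear_eol + line.rstrip()[:term.width])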

A lot of the early work was making the TUI behave properly in a terminal, so that the drawn UI did not cause scrolling or overflows, and text wrapping and truncation worked correctly. After something worked, prompting the LLM to write unit tests for each of these behaviors was very helpful to avoid backsliding.

As it grew, the architecture became much more modular:

  • SSH connection management with parallel execution
  • A blessed-based terminal interface for responsive updates
  • Statistics tracking and build step detection
  • Keyboard input handling and navigation

Each of those came from prompting the LLM to refactor large classes, sometimes identifying which ones to attack by asking it to analyze the state of the code and suggest candidates, and sometimes by running external code complexity tools; in this case, Lizard.

The features grew quickly at this stage:

  • Live progress updates driven by an event loop
  • Adaptive layouts that resize with the terminal
  • Automatic build phase detection (extract, configure, make, check, install) - see the sketch after this list
  • Color-coded status indicators, both while builds ran and afterwards
  • Host visibility management for large deployments, so that if the window was too small you'd see a subset of hosts building in the window
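Phase detection, for example, can be as simple as matching marker strings in the build output. A sketch, with illustrative patterns rather than the tool's actual rules:

    import re

    PHASES = [
        ("extract",   re.compile(r"\btar\b")),
        ("configure", re.compile(r"^checking |config\.status")),
        ("make",      re.compile(r"^make(\[\d+\])?: Entering")),
        ("check",     re.compile(r"make +check|^PASS|^FAIL")),
        ("install",   re.compile(r"make +install")),
    ]

    def detect_phase(line: str, current: str) -> str:
        """Return the phase this output line suggests, else keep the current one."""
        for name, pattern in PHASES:
            if pattern.search(line):
                return name
        return current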

The design used established patterns such as the observer pattern for state changes, the strategy pattern for layouts, and a manager (factory) pattern for connections. Most of these were picked by the LLM in use at the time, with occasional guidance such as "make a configuration class".
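As a sketch of the observer part (the class and method names here are mine, not necessarily the tool's):

    from typing import Callable

    class BuildState:
        """Publishes build phase changes to any registered listeners."""

        def __init__(self) -> None:
            self._observers: list[Callable[[str, str], None]] = []

        def subscribe(self, callback: Callable[[str, str], None]) -> None:
            self._observers.append(callback)

        def set_phase(self, host: str, phase: str) -> None:
            for notify in self._observers:  # push the change to every listener
                notify(host, phase)

    state = BuildState()
    state.subscribe(lambda host, phase: print(f"{host}: now {phase}"))
    state.set_phase("build1.example.org", "configure")

The display code subscribes once and redraws when notified, rather than polling every build for changes.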

Completing the application (September 2025)

The final phase built the tool into a more complete application, adding release-focused features and additional testing. The tool transformed from an internal development utility into something that could be shared and be useful to anyone with an autoconf project tarball and SSH access.

Major additions included:

  • A build timing cache with persistent JSON storage of previous build times - see the sketch after this list
  • Intelligent progress estimation based on the cached times
  • Configurable auto-exit functionality with countdown display
  • Keyboard-based navigation of hosts and logs, with a full-screen host mode and interactive menus
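A sketch of what such a cache can look like; the file name and layout are my assumptions, not necessarily what Redland Forge actually uses:

    import json
    from pathlib import Path

    CACHE = Path.home() / ".redland-forge-timings.json"  # hypothetical path

    def load_timings() -> dict:
        return json.loads(CACHE.read_text()) if CACHE.exists() else {}

    def record_timing(host: str, phase: str, seconds: float) -> None:
        timings = load_timings()
        timings.setdefault(host, {})[phase] = seconds
        CACHE.write_text(json.dumps(timings, indent=2))

    def estimate(host: str, phase: str) -> float | None:
        """Previous duration for this host and phase, for progress estimation."""
        return load_timings().get(host, {}).get(phase)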

The testing at this point was quite substantial:

  • Over 400 unit tests covering all components
  • Mock-based testing for external dependencies (example below)
  • Integration tests and edge cases
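A sketch of the mock-based style; run_build here is a toy stand-in for the code that drives one build over SSH:

    from unittest import mock

    def run_build(client, command: str) -> None:
        """Toy stand-in: run one build command over an SSH client."""
        client.exec_command(command)

    def test_build_runs_configure():
        client = mock.Mock()  # replaces a real paramiko.SSHClient
        run_build(client, "./configure && make")
        client.exec_command.assert_called_once_with("./configure && make")

No SSH connection is made; the mock records the call so the test can assert on it.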

At this point it was doing the job fully and seemed complete, and of broader use than just Redland dev work.

Learnings

Redland Forge demonstrates how developer tools evolve. What started as pragmatic shell and Perl scripts for a specific need grew into a sophisticated application. Each phase built on the previous one, with the Python conversion serving as the catalyst that enabled the terminal interface.

It also demonstrates how LLMs in 2025 can act as a productivity multiplier, when used carefully. I did spend a lot of time pasting terminal outputs for debugging the TUI boxes and layout. I used lots of Git commits and tags whenever the tool worked; I even developed a custom command to make the commits the way I preferred, avoiding the hype that some LLMs tend towards, but that's another story or blog post. When the LLMs made mistakes, I could always go back to the previous working state (git reset --hard), or ask them to try again, which worked more often than you'd expect. Or try a different LLM.

I found that coding LLMs can work on their own somewhat, depending on the LLM in question. Some regularly prompt for permissions or end their turn after some progress, whereas others just keep coding without checking back with me. This allowed some semi-asynchronous development where a bunch of work was done, then I reviewed it and adjusted. I did review the code, since I know Python well enough.

The skill I think I learnt the most about was writing prompts, or what is now being called spec-driven development, for the later, larger changes. I described what I wanted to one LLM and made it write a markdown specification, sometimes asking a different LLM to review it for gaps, before one of them implemented it. I often asked the LLM to update the spec as it worked, since the LLMs sometimes crashed or hung or looped with the same output; if the spec was up to date, the chat could be killed and restarted. Sometimes just telling the LLM it was looping was enough.

I'm happy with the final application, and it's nearly 100% written by the LLMs, including the documentation, tests and configuration, although it was 100% prompted by me, 100% tested by me, and 100% of the commits were reviewed by me. After all, it is my name in the commit messages.


Disclaimer: I currently work for Google who make Gemini.

Permalink

Production Chaos

2025-02-01 11:00

Chaos happens a lot in production and in associated roles such as Site Reliability Engineering (SRE). Day to day you can be dealing with a scale of chaos from noise, interruptions, unknowns and mysteries all the way up to incidents, emergencies and disasters. If you are working in that space, you will have to deal with tradeoffs of risk, time, uncertainty and more. The "unknown unknowns", as Donald Rumsfeld put it, or the 1-in-a-million events, can happen regularly if you are operating a lot of code, data or systems.

If this is going to happen all the time, you need to have support around you, in particular a team, leadership and organization you can trust to support you whatever happens. You have to be able to relax even in a stressful environment, not worrying about your personal safety or career. This leads to the SRE best practice of blamelessness when things are failing: it's the fault of the system, not the person. There is no way you are going to get people working their best if they are going to get blamed for making mistakes. That way leads to hiding things, avoiding responsibility and a negative feedback loop where everyone avoids making things better.

If you have a culture of blame and fear, you are going to get the worst from your people. Which leads me to my experience working at Twitter when Phony Stark aka Space Karen aka Elmo Maga bought it. He did not trust his employees, did not support them, did not communicate with them, and indeed blamed them. He wanted and fostered a culture of fear and uncertainty.

It was so chaotic at the end that I once had two managers message me in the same hour that they were my new manager. I also didn't know at the time that he was my manager for two days:

[Image: a table of my Twitter managers and dates, with names redacted except for Elon Musk]

Elon Musk is a negative example of how to manage and how to be a grown human. He has many character flaws and a Character Limit.

He is not an example to copy.

It's nauseating seeing him repeating this again at the US Government: Déjà Vu: Elon Musk Takes His Twitter Takeover Tactics to Washington (Gift link)

Permalink

10 Years an SRE

2025-01-08 09:53

I've recently been thinking about my SRE journey and the SRE role I had at Twitter. When I joined in 2015, it was my first SRE-titled position, but frankly I didn't know what the SRE job was really all about.

At the time, Twitter had no specific SRE onboarding - Flight School was focused on software engineers. You were mostly expected to shoulder-surf existing SREs and learn by osmosis, and that's what I did. That's a poor approach in multiple ways: it is unstructured, requires extensive one-to-one time and may not suit everyone's learning style.

Instead, I used the approach of "to learn it, teach it" and I started creating some SRE-specific onboarding along with an experienced SRE colleague, Rob Malchow. Over a year or so, we developed 3 SRE onboarding courses that covered technologies, processes and specific help for SREs in particular about how to prepare and be oncall - including a big DON'T PANIC slide.

I taught this series of 3 courses with Rob and others maybe 5 times to hundreds of people, and every time I tried to invite an experienced SRE to join, so that each time the course material was improved and corrected, because nothing stays still in tech. I also believe that pair or co-teaching works much better, as two people can deliver and check for understanding in parallel. At the end I felt I had a good grasp of the SRE scope. Hopefully the students did too!

As for the SRE books from Google, I had read the first one, which was out at the time, but I found that Google's scale and approaches needed customizing for our environment, although lots was highly relevant for getting to a data-driven approach to reliability using SLAs, SLIs, SLOs and error budgets.

Now I'm at Google and can see the other side of the fence, where SRE training ("SRE EDU") is taken very seriously, with extensive training created and delivered, and feedback and evolution built in. Education remains a very interesting area to me, and "always be learning" is an important value of mine. It's also a key part of being an SRE, and of working in tech more generally. Hopefully I can participate here too.

Permalink

Google 3 Times

2024-11-04 00:00

Today I work at Google, but this was my third attempt to join, going back over a decade. Looking back, the first two times I attempted to join as a software engineer, which in hindsight was probably where I went wrong.

When I was interviewing at Twitter in 2015 for a software engineering role, the Hadoop team engineering manager, Joep Rottinghuis, suggested my skills might fit better into a different position called Site Reliability Engineering, which was new to me at the time. I switched to interviewing with the SRE manager, an ex-Yahoo! colleague, Pascal Borghino, and successfully joined.

It turned out that SRE was the perfect position for me, and was the role I had been doing all along; I just didn't know it existed. I was always curious about the lower levels of software and hardware, and wanted to automate getting things working reliably.

As for Google, I successfully interviewed for an SRE position late in 2022, with Duncan Winn and Andrew Brampton, for a start in March 2023. It really helped this time that I was interviewing for the right role, which took me some time to figure out - pretty much 25 years into my career.

Permalink

Twitter Interviewing

2024-11-01 00:00

At Twitter, I believe we did a great job of interviewing. We interviewed in pairs (one talks, one takes notes), used rubrics to ensure fairness, and had lots of checks and balances. Everyone who did interviewing had to do training, then shadow, and then could participate. Although I wasn't involved in the final decision / offer side of it, I did a fair amount of interviewing and even interviewed my new manager a couple of times.

In all those interviews I never ONCE looked at a candidate's school / college, certificates or qualifications. We always asked questions to figure out what they could DO, how they worked, and their career in the sense of what they wanted and what they had learnt from their journey. Evidence of emotional intelligence, learning and curiosity was key, beyond the technical skills.

I still believe that was a great approach.

Permalink

The Journey to SRE

2023-03-20 18:00

In my career I've had three big fork()s in the road, so far.

My higher education started off with a Computer Science degree back in 1990 from University of Bristol UK, with a class size of less than 20. My final year project was a parallelized graphics renderer written in occam. #code #graphics

At the end of my degree, I had applied to do a PhD in computer graphics, but a couple of days before that offer appeared, I got a job offer for a parallel computing position, which I accepted.

Fork #1: Parallel Computing

If I had started the PhD, the other fork's path would likely have ended up with me working as a computer graphics renderer or pipeline engineer for a big CGI or SFX firm, probably in the US.

Instead, I went to work at the University of Kent at Canterbury (UKC), now called the University of Kent, in Canterbury, UK of course. There I worked on the Meiko parallel computer at a blistering 25MHz - a relatively unheard-of speed in 1990 - with dozens of nodes, each capable of thousands of lightweight processes based on CSP (lighter than threads, look it up). I helped operate the Meiko system: rebooting it and rewiring it (literally wires) between the nodes and racks. #operations #code #learning #teaching

Deeper into that, I got into organizing materials for the Internet Parallel Computing Archive, writing the software to manage it, and hand-compiling Apache on SunOS and IRIX to run it. This led to my first home page location http://www.hensa.ac.uk/parallel/www/djb1.html in 1993. #operations #code

Fork #2: Web and Metadata

If I had continued with parallel computing, the second fork's alternate path would likely have been going into research, getting a PhD, working on high performance computing and supercomputers, and probably ending up in the US.

Instead, I improved the archive, developed metadata to manage it in IAFA Templates, and expanded into web metadata, Dublin Core and onwards to RDF and the Semantic Web. I wrote software in Perl, presented at multiple web conferences from WWW3 onwards and at workshops, and attended many Dublin Core working groups. #code #web #rdf #metadata

Meanwhile, around 1996, my day-to-day work changed to be web-focused, working on the UK Mirror Service at Kent, installing machines, operating them, making backups and keeping things running for the entire UK academic network, a network called HENSA. I also ran the computer science department's (first) web site http://www.cs.ukc.ac.uk. This was where I learnt operations, web tech and started using Linux. #web #operations #learning

In 2000, I took up an opportunity to work at the Institute for Learning and Research Technology (ILRT) at the University of Bristol as a technical researcher, working entirely on software and metadata in the emerging RDF and semantic web area. At that time, I created the Free Software / Open Source Redland RDF libraries, all written in C and supporting multiple language bindings, and developed and tested them across multiple OSes via build farms. I worked for several years on the software, RDF, semantic web and other standards work in EU research projects such as SWADE and SKOS, as well as lots of W3C projects and working groups for RDF, SPARQL and Turtle. I learnt so much about organizing my time and working in a fast-changing environment. #operations #code #web #learning #metadata

I was asked in 2005 if I'd like to come take the work and experience I'd developed in the semantic web work and deploy the software at Yahoo! in USA. I said yes.

Fork #3: Corporate USA

The third fork's other path would have been continuing in the UK and EU University sector, working on open source and web technologies as they evolved. Possibly, I would have ended up working in some large UK IT firm, deploying web tech or teaching web tech in Universities.

At Yahoo! in Sunnyvale, I entered a whole new world, in which there were highly specialized roles, such as Product Managers and Operations Engineers to go along with Software Engineers. After multiple positions and not working on coding or web technologies, I ended up far away from my happy place. #architect #learning

In 2012, I moved on to software engineering roles, first at a social news startup, Digg, which closed up shop, and then at Rackspace Hosting in San Francisco in 2013. In both cases, I was increasingly building Hadoop big data applications, as well as running and operating Hadoop, which was now being called DevOps. #operations #code #learning #bigdata

That led to finally joining Twitter in 2015 as a Site Reliability Engineer for the Data Platform, operating the Hadoop clusters with software, addressing the day-to-day issues, automating the routine tasks and working on strategic projects like cloud for the data platform. Finally, I had arrived at the job title that matched what I'd been doing for a long time, and I loved working in a group of SREs, always learning and helping. #sre #operations #code #learning #teaching #bigdata #cloud

In 2022, Twitter also sold its furniture and, well, that's another story... #chaos

So here we are in 2023 and I'm excited to announce I'm joining Google as a Staff System Engineer in the Site Reliability Engineering part of the Google Cloud organization. #sre #learning #cloud

Permalink

Making Debian Docker Images Smaller

2015-04-18 14:00

TL;DR:

  1. Use one RUN to prepare, configure, make, install and clean up.
  2. Cleanup with: apt-get remove --purge -y $BUILD_PACKAGES $(apt-mark showauto) && rm -rf /var/lib/apt/lists/*

I've been packaging the nghttp2 HTTP/2.0 proxy and client by Tatsuhiro Tsujikawa for both Debian and Docker, and noticed it takes some time to fetch the build dependencies (C++, cough) as well as to do the build.

In the Debian packaging case it's easy to create minimal dependencies thanks to pbuilder, and to ensure the binary package contains only the right files. See debian nghttp2.

For Docker, since you work with containers, it's harder to see what changed, but you still really want the images to be as small as possible, since you have to download them to run the app and they use up disk. While doing this I kept seeing huge images (480 MB), way larger than the base image I was using (123 MB), which didn't make sense since I was just packaging a few binaries with some small files, plus their dependencies. My estimate was that it should be way less than a 100 MB delta.

I pored over multiple blog posts about Docker images and how to make them small. I even looked at some of the squashing commands like docker-squash that involved import and export, but those seemed not quite the right approach.

It took me a while to really understand that each Dockerfile command creates a new layer containing just the deltas. So when you see all those downloaded layers in a docker pull of an image, it is sometimes a lot of data that is mostly unused.

So if you want to make it small, you need to make each Dockerfile command touch the smallest amount of files and use a standard image, so most people do not have to download your custom l33t base.

It doesn't matter if you rm -rf the files in a later command; they continue to exist in some intermediate layer.

So: prepare, configure, build, make install and clean up in one RUN command if you can. If the lines get too long, put the steps in separate scripts and call them.
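Using the script names from my build (they appear in the docker history output below), the combined step looks roughly like this:

    RUN /build/prepare-nghttp2.sh && \
        cd /build/nghttp2 && make install && \
        /build/cleanup-nghttp2.sh && \
        rm -rf /build/nghttp2

Everything the prepare script downloads, and everything the cleanup script removes, now happens inside a single layer, so none of it persists into the final image.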

Lots of Docker images are based on Debian images because they are a small and practical base. The debian:jessie image is smaller than the Ubuntu (and CentOS) images. I haven't checked out the fancy 'cloud' images too much: Ubuntu Cloud Images, Snappy Ubuntu Core, Project Atomic, ...

In a Dockerfile building from some downloaded package, you generally need wget or curl, and maybe git. When you install, for example, curl and ca-certificates to get TLS/SSL certificates, that pulls in a lot of extra packages, such as openssl in the standard Debian curl build.

You are pretty unlikely to need curl or git after the build stage of your package. So if you don't need them, you could - and you should - remove them, but that's one of the tricky parts.

If $BUILD_PACKAGES contains the list of build dependency packages, such as libxml2-dev and so on, you would think that this would get you back to the starting state:

$ apt-get install -y $BUILD_PACKAGES
$ apt-get remove -y $BUILD_PACKAGES

However this isn't enough; you've missed the dependencies that got automatically installed, and their dependencies.

You could try

$ apt-get autoremove -y

but that also doesn't grab them all; it's not clear to me why at this point. What you actually need is to remove all the automatically added packages, which you can find with apt-mark showauto.

So what you really need is

$ AUTO_ADDED_PACKAGES=`apt-mark showauto`
$ apt-get remove --purge -y $BUILD_PACKAGES $AUTO_ADDED_PACKAGES

I added --purge too since we don't need any config files in /etc for build packages we aren't using.

Having done that, you might have removed some runtime package dependencies of something you built. Those are harder to find automatically, so you'll have to list and install them by hand:

$ RUNTIME_PACKAGES="...."
$ apt-get install -y $RUNTIME_PACKAGES

Finally you need to clean up apt, which you should do with rm -rf /var/lib/apt/lists/*; this removes all the index files that apt-get update downloaded. This is in many best-practice documents and example Dockerfiles.

You could add apt-get clean, which removes any cached downloaded packages, but that's not needed in the official Docker images of debian and ubuntu since the cached package archive feature is disabled.

And don't forget to delete your build tree, doing it in the same RUN where you compiled, so the tree never ends up in a layer. This might not make sense for some languages where you work from inside the extracted tree; but why not delete the src dirs? Definitely delete the tarball!

This is the delta for what I was working on with dajobe/nghttpx.

479.7 MB  separate prepare, build, cleanup 3x RUNs
186.8 MB  prepare, build and cleanup in one RUN
149.8 MB  after using apt-mark showauto in cleanup

You can use docker history IMAGE to see the detailed horror (edited for width):

...    /bin/sh -c /build/cleanup-nghttp2.sh && rm -r   7.595 MB
...    /bin/sh -c cd /build/nghttp2 && make install    76.92 MB
...    /bin/sh -c /build/prepare-nghttp2.sh            272.4 MB

and the smallest version:

...    /bin/sh -c /build/prepare-nghttp2.sh &&         27.05 MB

The massive difference is the source tree and the 232 MB of build dependencies that apt-get pulls in. If you don't clean all that up before the end of the RUN you end up with a huge transient layer.

The final size of 149.8 MB compared to the 122.8 MB debian/jessie base image size is a delta of 27 MB which for a few servers, a client and their libraries sounds great! I probably could get it down a little more if I just installed the binaries. The runtime libraries I use are 5.9 MB.

You can see my work on GitHub and on Docker Hub.

... and of course this HTTP/2 setup is used on this blog!


Permalink

Open to Rackspace

2013-06-24 21:34

I'm happy to announce that today I started work at Rackspace in San Francisco in a senior engineering role. These are the aspects I'm excitedly anticipating:

  1. The company: a fun, fast moving organization with a culture of innovation and openness
  2. The people: lots of smart engineers and devops to work with
  3. The technologies: Openstack cloud, Hadoop big data and lots more
  4. The place: San Francisco technology world and nearby Silicon Valley

My personal technology interests and coding projects such as Planet RDF, the Redland librdf libraries and Flickcurl will continue in my own time.

Permalink

Undugg

2012-05-10 08:20

Digg just announced that Digg Engineering Team Joins SocialCode and The Washington Post reported SocialCode hires 15 employees from Digg.com

This acquihire does NOT include me. I will be changing jobs shortly but have nothing further to announce at this time.

I wish my former Digg colleagues the best of luck in their new roles. I had a great time at Digg and learned a lot about working in a small company, social news, analytics, public APIs and the technology stack there.

Permalink

Releases = Tweets

2011-08-15 12:25

I got tired of posting release announcements to my blog, so I just emailed the announcements to the redland-dev list, tweeted a link to each from @dajobe and announced them on Freshmeat, which a lot of places still pick up.

Here are the tweets for the 13 releases I didn't blog since the start of 2011:

  • 3 Jan: Released Raptor RDF syntax library 2.0.0 at http://librdf.org/raptor/ only 10 years in the making :)
  • 12 Jan: Released Rasqal RDF Query Library 0.9.22: Raptor 2 only, ABI/API break, 16 new SPARQL Query 1.1 builtins and more http://bit.ly/fzb9xW #rdf
  • 27 Jan: Rasqal 0.9.23 RDF query library released with SPARQL update query structure fixes (for @theno23 and 4store ): http://bit.ly/gVDp57
  • 1 Feb: Released Redland librdf 1.0.13 C RDF API and Triplestores with Raptor 2 support + more http://bit.ly/hOr4HA
  • 22 Feb: Released Rasqal RDF Query Library 0.9.25 with many SPARQL 1.1 new things and fixes. RAND() and BIND() away! http://bit.ly/flFDH1
  • 20 Mar: Raptor RDF Syntax Library 2.0.1 released with minor fixes for N-Quads serialializer and internal librdfa parser http://bit.ly/fT3aPX
  • 26 Mar: Released my Flickcurl C API to Flickr 1.21 with some bug fixes and Raptor V2 support (optional) See http://bit.ly/f7QncO
  • 1 Jun: Released Raptor 2.0.3 RDF syntax library: a minor release adding raptor2.h header, Turtle / TRiG and ohter fixes. http://bit.ly/jHKaB8
  • 27 Jun: Rasqal RDF query library 0.9.26 released with better UNION execution, SPARQL 1.1 MD5, SHA* digests and more http://bit.ly/lI7lDW
  • 23 Jul: Released Redland librdf RDF API / triplestore C library 1.0.14: core code cleanups, bug fixes and a few new APIs. http://bit.ly/qqV1Rb
  • 25 Jul: Raptor RDF Syntax C library 2.0.4 released with YAJL V2, and latest curl support, SSL client certs, bug fixes and more http://bit.ly/oCIIDd

(yes 13; I didn't tweet 2 of them: Rasqal 0.9.24 and Raptor 2.0.2)

You know, it's quite tricky to collapse months of changelogs (Git history) into release notes, compress those further into a news summary of a few lines, and even harder to compress that into less than 140 characters. It's way less when you include room for a link URL, space for retweeting, and sometimes a hashtag for context.

So how do you measure a release? Let's try!

Tarballs

Released tarball files from the Redland download site; sizes are in bytes.

date        package    old ver  new ver   old tarball  new tarball  byte diff   %diff
2011-01-03  raptor     1.4.21   2.0.0       1,651,843    1,635,566    -16,277  -0.99%
2011-01-12  rasqal     0.9.21   0.9.22      1,356,923    1,398,581    +41,658  +3.07%
2011-01-27  rasqal     0.9.22   0.9.23      1,398,581    1,404,087     +5,506  +0.39%
2011-01-30  rasqal     0.9.23   0.9.24      1,404,087    1,412,165     +8,078  +0.58%
2011-02-01  redland    1.0.12   1.0.13      1,552,241    1,554,764     +2,523  +0.16%
2011-02-22  rasqal     0.9.24   0.9.25      1,412,165    1,429,683    +17,518  +1.24%
2011-03-20  raptor     2.0.0    2.0.1       1,635,566    1,637,928     +2,362  +0.14%
2011-03-20  raptor     2.0.1    2.0.2       1,637,928    1,633,744     -4,184  -0.26%
2011-03-26  flickcurl  1.20     1.21        1,775,246    1,775,999       +753  +0.04%
2011-06-01  raptor     2.0.2    2.0.3       1,633,744    1,652,679    +18,935  +1.16%
2011-06-27  rasqal     0.9.25   0.9.26      1,429,683    1,451,819    +22,136  +1.55%
2011-07-23  raptor     2.0.3    2.0.4       1,652,679    1,660,320     +7,641  +0.46%
2011-07-25  redland    1.0.13   1.0.14      1,554,764    1,581,695    +26,931  +1.73%
[Bar chart of %diffs between tarball releases. Noticeable differences are raptor 2.0.0 with a big negative change and rasqal 0.9.22 with the largest increase.]

Releases that stand out here are Raptor 2.0.0, which was a major release with lots of changes, and Rasqal 0.9.22, which changed the most upwards; it was both an API break and lots of new functionality.

Sources

Taken from my GitHub repositories by extracting the tagged releases, excluding ChangeLog* files, and running diffstat over the output of a recursive diff -uRN.
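For example, for the last Raptor release that works out as (directory names illustrative):

$ diff -uRN -x 'ChangeLog*' raptor-2.0.3 raptor-2.0.4 | diffstat -s
 33 files changed, 808 insertions(+), 103 deletions(-)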

date        package    old ver  new ver  files changed  lines inserted  lines deleted  lines net
2011-01-03  raptor     1.4.21   2.0.0              215          34,018         30,348     64,366
2011-01-12  rasqal     0.9.21   0.9.22              94          11,641          5,712     17,353
2011-01-27  rasqal     0.9.22   0.9.23              25           5,663          5,199     10,862
2011-01-30  rasqal     0.9.23   0.9.24              48           1,107            227      1,334
2011-02-01  redland    1.0.12   1.0.13              96           3,721          5,627      9,348
2011-02-22  rasqal     0.9.24   0.9.25              64           3,857          1,333      5,190
2011-03-20  raptor     2.0.0    2.0.1               42           6,163          5,833     11,996
2011-03-20  raptor     2.0.1    2.0.2                9              55             12         67
2011-03-26  flickcurl  1.20     1.21                19             737            308      1,045
2011-06-01  raptor     2.0.2    2.0.3               88           2,827          2,232      5,059
2011-06-27  rasqal     0.9.25   0.9.26             116           7,130          4,272     11,402
2011-07-23  raptor     2.0.3    2.0.4               33             808            103        911
2011-07-25  redland    1.0.13   1.0.14              75           3,681          5,477      9,158
Total                                              924          81,408         66,683    148,091
[Bar chart of source lines changed (insertions + deletions) per release. Raptor 2.0.0 stands out as nearly as large as all the others combined.]

Again Raptor 2.0.0 stands out, changing a huge number of files and lines. You can also see the mistake that was Raptor 2.0.1 being corrected the same day by Raptor 2.0.2 with a few changes; that one didn't seem to get tweeted. Note also that several of the Rasqal releases, like 0.9.22 and 0.9.26, changed many files. The 'lines net' column is the sum of the insertions and deletions, although some of those lines are the same line counted twice.

Words

Words from the changelog, the release notes and the news post, comparing the number of words in the rendered output of each.

Word counts are for the changelog, the release notes and the news post; the ratio columns divide the changelog count by each of the others.

date        package    old ver  new ver  changelog  relnotes  chlog:rel  news  chlog:news
2011-01-03  raptor     1.4.21   2.0.0       15,411     2,709       5.69   365       42.22
2011-01-12  rasqal     0.9.21   0.9.22       3,465     1,199       2.89   162       21.39
2011-01-27  rasqal     0.9.22   0.9.23         318       135       2.36    52        6.12
2011-01-30  rasqal     0.9.23   0.9.24         450       254       1.77    59        7.63
2011-02-01  redland    1.0.12   1.0.13         778       235       3.31    73       10.66
2011-02-22  rasqal     0.9.24   0.9.25       1,649       558       2.96   136       12.13
2011-03-20  raptor     2.0.0    2.0.1          247        76       3.25    50        4.94
2011-03-20  raptor     2.0.1    2.0.2           42        27       1.56    42        1.00
2011-03-26  flickcurl  1.20     1.21           119         -          -    68           -
2011-06-01  raptor     2.0.2    2.0.3          872       266       3.28    28       31.14
2011-06-27  rasqal     0.9.25   0.9.26       4,410       970       4.55    96       45.94
2011-07-23  raptor     2.0.3    2.0.4          517       345       1.50    77        6.71
2011-07-25  redland    1.0.13   1.0.14       1,347       620       2.17    88       15.31
Total                                       29,625     7,394            1,296
[Bar chart of the ratio of words in the changelog to words in the release news. Rasqal 0.9.26 and Raptor 2.0.0 are the largest, but the tiny Raptor 2.0.2 bug fix is also notable.]

So now we get to words. Yes, lots of words, most of them by me. The changelog, which is a hand-edited version of the SVN and later Git changes, was over 15K words for Raptor 2.0.0. That gets boiled down a lot into release notes, then news, and finally a terse tweet. Since the changelog corresponds roughly to source changes but the news to user-visible changes like APIs, you can see that the oddity is again Rasqal 0.9.26, where there were lots of changes but not much news; it was mostly internal work.

Now I need to go summarise this blog post in a tweet: Releases = Tweets in 1156 words http://bit.ly/n88ZIQ

Permalink