FogBugz and Kiln Build and Release Report Card

November 3rd, 2011 by Rock Hymas

My wife and I are a couple months into our first year of homeschooling, and we’ve been discussing how the kids are doing with it, and the challenges we’ve faced as their teachers and parents. Those discussions have led us to make some important changes, in the same way that a school report card can get parents and children to make changes that lead to more learning and happier kids. In that spirit, I offer here a report card on the build and release management work we’ve been doing on FogBugz and Kiln [1] over the last 12-18 months.

About a year ago, I was in the middle of writing a series of posts about the practices of build and release management. Many of those practices came from looking at what we were doing with FogBugz and Kiln and identifying what worked, and what didn’t. Others came from what I saw and experienced while working within the Microsoft Office organization, and still more came from discussions within our team at the Creek. Though in some areas we were doing things right, most of the posts were a description of what our existing build and release (non-)system was doing utterly wrong and what I hoped we could change to fix it. So the series was often written in a spirit of hope that we would one day live up to the ideal.

In fact, you could take many of the posts I wrote last fall and read them as the opposite of how things existed at Fog Creek in the summer of 2010. So, in the spirit of Jason Cohen’s recent BoS talk about honesty, I’m going to be brutally honest about how we were doing and how we are now doing.

First, how we were doing. Here’s the report card for June of 2010, divided into subjects, of course.

Report Card for June 2010

Fast builds/deploys (Grade: D)

  • Deploying upgrades to www.fogcreek.com takes over an hour
  • Other builds and deployments often take much longer than they need to
  • Upgrades to FogBugz or Kiln often involve downtime of a few seconds to half an hour or more

No hard dependencies (Grade: F)

  • Checked out source code is tied to absolute paths in some cases
  • The build and deployment scripts are full of hard-coded paths and other hard dependencies

Single commands (automate, automate, automate) (Grade: D)

  • We can’t run most builds with a single command
  • Many of the build and deployment scripts are combined
  • Deploying a new version takes multiple steps
  • Different scripts are used for dev builds vs official builds
  • Different configurations of the build use different build scripts
  • There is no tooling to make it easy to set up your dev system
  • There are no scripts to configure a new server for our hosted environment, or a new dev box for development

Staging Environment (Grade: F)

  • We have no staging environment for our hosted FogBugz and Kiln products

Continuous Integration (Grade: F)

  • There are no continuous integration builds or tests running

Scripts @ production quality (Grade: D)

  • The scripts are old, unmaintained, with lots of cruft
  • Many built binaries are stored directly in the VCS

Onboarding new developers (Grade: D)

  • There is no easy way for devs to build installers
  • There are no build scripts that do incremental builds

Insight into builds and releases (Grade: F)

  • Old builds are not saved for most official configurations
  • We aren’t tracking build statistics like failures, build quality, etc.
  • Logs of builds are not systematically kept and stored anywhere
  • Build failures do not automatically notify anyone, nor do successful builds
  • There are three different types of version numbers for our three different deployment targets
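
To make the "No hard dependencies" subject concrete, here is a minimal Python sketch of one common way to avoid absolute paths in a build script: derive every path from the script's own location, and allow an environment variable to override the output directory. This is an illustration only, not the actual FogBugz or Kiln scripts. The function name, the directory layout, and the BUILD_OUTPUT_DIR variable are all hypothetical.

```python
import os
from pathlib import Path


def resolve_build_paths(script_file, env=None):
    """Derive build paths from the script's location instead of hard-coding them."""
    env = os.environ if env is None else env
    # Assumed layout: the script lives in <repo>/build/, so the repo root is one level up.
    repo_root = Path(script_file).resolve().parent.parent
    # BUILD_OUTPUT_DIR is a made-up variable name; it lets a machine override the default.
    output_dir = Path(env.get("BUILD_OUTPUT_DIR", repo_root / "out"))
    return {
        "repo_root": repo_root,
        "source_dir": repo_root / "src",
        "output_dir": output_dir,
    }
```

A build script would call resolve_build_paths(__file__) once at startup, so the same checkout works wherever it happens to live on disk.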

All in all, things were a mess. “Good grief!” I hear you cry. “So what have you done about these issues? Anything? What are you planning to do?” I’m glad you asked.

My initial goal in the role of build and release manager was to eliminate the position. I wanted to get things running smoothly, and spread the knowledge and ownership of the problems through the team, so that I could go back to doing product development. As such, a decent amount of the progress described here was done by other great developers here at the Creek. Back in January (when I joined FogBugz as the team lead), I thought I’d gotten about halfway through the work required for that to happen. But, of course, new ideas come up constantly, so maybe it never ends.

Since January, I haven’t been able to do as much. In addition to my small efforts, other great Creekers have picked up some of the slack. Here’s our report card for October 2011.

Report Card for October 2011

Fast builds/deploys (Grade: C+)

  • A new, fast build machine
  • Reduced web site deployments from taking ~80 minutes to taking ~8 minutes
  • FogBugz and Kiln deployments now faster, because they don’t also rebuild the product
  • Upgrades to FogBugz or Kiln still involve downtime of a few seconds to half an hour or more

No hard dependencies (Grade: B)

  • Removed all the absolute paths
  • Removed many other hard dependencies
  • Removed some remaining hard dependencies that occasionally caused build failures

Single commands (automate, automate, automate) (Grade: C+)

  • Mortar, an internal website for kicking off builds and deployments
      • Lots of help on this one from Benjamin at bitquabit
  • Separated build and deploy scripts for hosted builds
  • Some of the build and deployment scripts are still combined
  • Deploying a new version takes multiple steps
  • Different scripts are still used for dev builds vs official builds
  • Different configurations of the build still use different build scripts, but some combining work has been done
  • There is still no tooling to make it easy to set up your dev system
  • There are still no scripts to configure a new server for our hosted environment, or a new dev box for development

Staging Environment (Grade: F)

  • Still no progress on the staging environment

Continuous Integration (Grade: B)

  • Continuous integration builds exist, mostly just building the products
  • Basic tests have been added to some continuous integration builds

Scripts @ production quality (Grade: B-)

  • Removed many checked in binaries, building them instead
  • Many scripts converted from FinalBuilder to Python, but a few still remain

Onboarding new developers (Grade: D)

  • Branch builds, for any branch a dev wants to set up
  • bf, a better tool for building FogBugz incrementally on dev machines
      • Aaron kick-started this and continues to be the driving force
  • There is still no easy way for devs to build installers

Insight into builds and releases (Grade: B-)

  • Saving old hosted builds
  • Moved to a single type of version number no matter the config
  • Build logs for all official builds
  • Cases filed on build failure
  • Notifying build owners when builds complete, successfully or not
  • We aren’t tracking build statistics like failures, build quality, etc.
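
The one item still missing under "Insight into builds and releases" is build statistics. As a rough sketch of what tracking them could look like (purely illustrative, and not how Mortar or the existing scripts work), a small Python wrapper can run any build command, keep its log, and record the outcome so that a failure rate can be computed later. The names run_build and failure_rate, and the shape of the record, are assumptions made for this example.

```python
import subprocess
import time


def run_build(command, history):
    """Run one build command, keep its log, and append a result record to history."""
    started = time.time()
    proc = subprocess.run(command, capture_output=True, text=True)
    record = {
        "command": command,
        "succeeded": proc.returncode == 0,  # a non-zero exit code counts as a failed build
        "seconds": time.time() - started,
        "log": proc.stdout + proc.stderr,
    }
    history.append(record)
    return record


def failure_rate(history):
    """Fraction of recorded builds that failed (0.0 when nothing has been recorded)."""
    if not history:
        return 0.0
    failed = sum(1 for record in history if not record["succeeded"])
    return failed / len(history)
```

In a real setup the history would be persisted somewhere durable rather than held in a list, but the idea is the same: every build goes through one entry point that records what happened.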

As you can see, there is still plenty of work to do. The glaring failure in both report cards is our lack of a staging environment. We’re now working on making that a reality, with high hopes that it will help us start catching a new class of bugs before they make it into our customers’ hands. As part of that work, we’ll also be able to work on:

  • Scripts for configuring our servers, by type
  • No-downtime upgrades
  • Moving our internal dogfooding to the staging environment, for better consolidation and testing of deployment scripts

Other possibilities for further work should be pretty clear from the report card, but the staging environment is the key focus when it comes to improving our builds and releases.

One note about the culture for both FogBugz and Kiln. We’ve learned a ton in the last couple of years by seeing at close range what can be done in greenfield development on products like Trello, or by teams that actively maintain and improve their build and release processes, like StackOverflow. Obviously, when working with a legacy product like FogBugz, the costs of implementing certain practices may vary a lot, and we have to take that into consideration, but the inspiration that comes from seeing what is possible keeps us working to make things better.

When I look at this report card, I feel a little bit like my oldest son. He’s our perfectionist, always expecting to get everything right the first time through, and pretty disappointed when that doesn’t happen. We’re hoping that some of the changes we make in our homeschooling will help him relax a bit so he can learn more freely. And yes, it’s somewhat humbling to look at our build and release report card and admit how far we have to go. But I have high hopes that in another six months we’ll have improved our grades again.

 

[1] What about Trello?

Trello has the awesome advantage of being greenfield development. As such, they’ve really started from the ground up by doing the right things when it comes to builds and releases. They frequently release midday with no downtime, they have a nice staging environment, and their releases take almost no time at all. They have run into some interesting load challenges with updating their clients after an upgrade, but I’ll let the Trello team talk about that in their own sweet time.