Fog Creek

Conquering Large Legacy Software – Interview with Mike Long


We interviewed Mike Long, partner at Praqma, and author of ‘Conquering Large Legacy Software’, a field manual for dealing with large, long-lived projects. We discuss how to deal with large legacy codebases, in particular, how to get started, where to focus your attention to maximize returns and how to approach adding tests.

Content and Timings

  • Introduction (0:00)
  • About Mike (0:26)
  • Getting Started with Large, Legacy Codebases (1:38)
  • Realistic Goals of Improving Legacy Software (3:30)
  • Where to Focus your Efforts (4:04)
  • To Re-write or Improve? (5:12)
  • Tools to Understand Legacy Codebases (6:38)
  • Adding Tests to Legacy Code (7:22)



Mike Long is a software consultant and partner at Praqma. He’s a regular conference speaker focusing on areas like clean code, long-life software, and continuous delivery. He’s the author of ‘Conquering Large Legacy Software’, a field manual for dealing with large, long-lived projects. Mike, thank you so much for taking time to join us today. Why don’t you share a bit about yourself?

About Mike

I was born in Australia. I grew up in Scotland, and after university I started my career in England working on drilling tools. After that I moved to Oslo and was there for 5 years working on a giant marine seismic acquisition system. Then I spent the last 3 years in China working on a large legacy system. Since I came back to Norway last summer I’ve been focusing on continuous delivery. I’ve been helping teams adopt modern technical practices, usually on complex development problems.

Problems in dealing with legacy software are well known: out-of-date technologies, code no one understands, and no tests. Yet few organizations seem to actively work to prevent such issues from occurring. Why do you think that is?

One of the biggest core reasons that I see is a big imbalance of power between the business side and the technical side of organizations. When that occurs, when you don’t have strong spokespeople for the software, you tend to get decisions that are not in the best interest of the software and the business in the long term.

Getting Started with Large, Legacy Codebases

Just getting started with improving legacy software can be difficult, so where do you recommend people start?

I guess the challenge for most people who are dealing with large legacy systems is that the problem can be so overwhelming that they find it hard to pick a target. What I recommend is to focus on what you’re changing. There’s no sense in spending effort to improve the code or the development process for a part of the system that isn’t changing. If it’s not changing, then you’re not fixing bugs there and you’re not adding new features there, so the return on improving those areas of the codebase is usually very low. If you focus on where people are actively investing their time, you’ll get much better returns on those investments.
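One way to see where people are actively investing their time is to count how often each file changes. The sketch below is only an illustration and is not something described in the interview. It assumes the history lives in Git and that the input text comes from `git log --name-only --pretty=format:`, which prints the paths touched by each commit separated by blank lines. The function name `churn_by_file` and the sample paths are hypothetical.

```python
from collections import Counter


def churn_by_file(git_log_output: str) -> Counter:
    """Count how many commits touched each path.

    Expects the text produced by:
        git log --name-only --pretty=format:
    where each commit contributes one line per changed path.
    """
    counts = Counter()
    for line in git_log_output.splitlines():
        path = line.strip()
        if path:  # blank lines only separate commits
            counts[path] += 1
    return counts


# Hypothetical sample: three commits, two of them touching several files.
sample_log = """\
src/billing/invoice.c
src/billing/tax.c

src/billing/invoice.c

docs/README
src/billing/invoice.c
"""

churn = churn_by_file(sample_log)
hotspots = churn.most_common(2)  # the most frequently changed paths
print(hotspots)
```

Files that never appear in such a report over, say, the last year are candidates to leave alone. Files at the top are where improvement work is most likely to pay back.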

The second thing is that when you talk about large legacy systems, I think the big problem is not the legacy side of things, it’s the large side of things. Large monolithic systems are usually a pain point. It’s not that the software isn’t as good as it could be, it’s that the software is so big that actually working with it becomes a very slow process. All the tools are very large. Even checking out the changes is a very big process. Doing a build can take a long time. All these frictions in the development process are a big pain. If you can focus on breaking this monolith up into smaller parts that can each have their own release schedule, their own development pipeline, and their own tooling, then you will get much improved development efficiency.

That brings me to the third point, which is to automate the donkey work. All the stuff that happens manually that could be automated is a great investment. People are the most expensive line item in most software organizations, so if you can find ways to free them up to do value-adding work rather than donkey work, you’ll get returns very quickly.

Realistic Goals of Improving Legacy Software

What are some realistic goals of improving legacy software?

Realistically, that’s a challenge because it really depends on what you’re prepared to invest. That in turn depends on what the future of the software is. Say you have 20 million lines of code and you’re not very happy with them. The realistic goal is not to improve 20 million lines of code. That’s not going to fly. The realistic improvements are to automate the donkey work, to break up the monolith, and to focus on the areas that people are already investing their time in, because then you get the return on the investment.

Where to Focus your Efforts

How can you go about measuring the quality of a large legacy system to see where the efforts should be directed?

I think as software people, we have lots of tools. Tools aren’t the problem with software actually. We have very good tools for the way we work. To get feedback on your code is quite simple. If you want to find out what your code coverage is, what your complexity is, all these things are very easy to figure out and there’s great tool support in terms of static and dynamic analysis and this kind of thing.

I think one of the most common feedback mechanisms that people mess up on, not just in legacy software but in software in general, is tagging their change sets when they’re fixing bugs. If you actually tag every change you make to the software in response to a bug, over time you can find out not only your defect rate, but where in your software those defects occur. You can map them out in your codebase. That’s a very powerful thing. That’s a great way to figure out where to focus your efforts. That directly feeds back into return on investment, because preventative work on a place that is error prone will pay back very quickly.
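The idea of tagging bug-fix change sets and mapping them onto the codebase can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tooling Mike describes: it assumes every bug-fix commit message carries a tag of the form `BUG-<number>` (a hypothetical convention), and that commits are already available as simple records. The names `Commit`, `defect_map`, and `defect_rate` are made up for the example.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List

# Hypothetical tagging convention: bug-fix commits mention "BUG-<number>".
BUG_TAG = re.compile(r"\bBUG-\d+\b")


@dataclass
class Commit:
    message: str
    files: List[str]


def is_bug_fix(commit: Commit) -> bool:
    return BUG_TAG.search(commit.message) is not None


def defect_rate(commits: List[Commit]) -> float:
    """Fraction of commits that were made in response to a bug."""
    if not commits:
        return 0.0
    return sum(is_bug_fix(c) for c in commits) / len(commits)


def defect_map(commits: List[Commit], depth: int = 2) -> Counter:
    """Count bug-fix commits per module, where a module is the first
    `depth` components of a changed file's path."""
    fixes = Counter()
    for commit in commits:
        if not is_bug_fix(commit):
            continue
        # A set, so one commit counts at most once per module.
        modules = {"/".join(path.split("/")[:depth]) for path in commit.files}
        for module in modules:
            fixes[module] += 1
    return fixes


# Hypothetical history.
commits = [
    Commit("BUG-101 fix null dereference", ["src/parser/lex.c", "src/parser/ast.c"]),
    Commit("Add CSV export", ["src/export/csv.c"]),
    Commit("BUG-102 off-by-one in buffer", ["src/parser/lex.c"]),
    Commit("BUG-103 wrong tax rounding", ["src/billing/tax.c"]),
]

print(defect_rate(commits))
print(defect_map(commits).most_common())
```

Grouping by module rather than by individual file is a design choice made here to keep the map readable on a large codebase. The `depth` parameter can be raised to drill into a module that stands out.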

To Re-write or Improve?

What factors should you consider when deciding whether to rewrite or improve the current software?

If you were to ask Joel Spolsky, he would say never rewrite. I think that, in general, that’s a good rule of thumb. If you have to pick a rule of thumb, that’s not a bad one, because any significantly valuable piece of software has had a huge investment in it. To assume that you can rewrite it in a way that’s going to provide a return on investment is usually very naïve. In my experience, there are only a couple of cases when it’s a good idea to rewrite. One is if nobody cares, if there’s no financial cost associated. Open source is a great example, where we make libgit2 too. It doesn’t matter that there’s already a Git library out there. We can make another one because we don’t have the financial constraints.

The other example is when, as with some of these very old systems, there’s no source code around anymore, or the tools aren’t available, or the mainframes are being decommissioned, or there’s platform obsolescence. There are times when you have to say, “Well, okay, we’ll just have to suck it up and rewrite it because we really don’t have any way to carry this forward.” One of the interesting things about modern times is that architectures are much more stable. x86 has been around for a long time, and code that was written in the 90s is still easy to run on modern systems, for better or worse. There’s a great deal of effort going into backward compatibility. Platform obsolescence is becoming less and less of a reason for rewriting.

Tools to Understand Legacy Codebases

Are there any tools that one can use to help gain an understanding of the current codebase and the steps to improve it?

Yes, there are. In my experience they have different payoffs and they’re very context dependent. If you are dealing with a very large codebase with unmanaged memory, where you have to do manual memory management, you can get a lot of value out of static and dynamic analysis tools that track whether you’re doing the right things with memory, whether you’re going beyond buffer bounds, whether you’re trashing your stack without even knowing it. There are great tools out there for that and you should definitely give them a try. Again, it’s very context dependent, so don’t rush out and buy tools. Evaluate them and see whether they provide value in your context first.

Adding Tests to Legacy Code

The common problem with legacy software is the lack of tests. How should you approach testing legacy software?

Try to get a safety net around the stuff that you’re changing. Typically, the common body of knowledge says, “Unit tests are the way to go.” In a lot of ways that’s very true. As you write code, writing unit tests is a very low-cost, low-friction way to get high-quality software. In legacy systems it’s usually very, very difficult to get unit tests in, because breaking the dependencies is very hard. On the other hand, you have an awful lot of existing functionality at the user level that you have to keep in a vice. You don’t want to break it. Large legacy software normally tends to have many more end-to-end, full-system, maybe user-interface-driven tests, which are slow. They’re painful. But getting those things up and running is actually not as expensive as you would imagine, and they can provide a lot of value as a safety net if you have enough people hammering on a codebase. Then, as you are changing code, work on the unit tests and lower-level testing.
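One common way to build that kind of safety net before unit tests are practical is a characterization (or "golden master") test: record what the existing code does today for a set of inputs, then check that later versions still produce the same results. The interview does not name this technique, so treat the sketch below as one possible reading of "keeping existing functionality in a vice". The function `legacy_format_invoice` is a made-up stand-in for real legacy code, and the helper names are hypothetical.

```python
def legacy_format_invoice(quantity, unit_price):
    """Made-up stand-in for legacy code whose behavior must not change."""
    total = quantity * unit_price
    if quantity >= 10:
        total = total * 0.9  # bulk discount buried in the legacy logic
    return "TOTAL: %.2f" % total


def record_golden_master(func, inputs):
    """Capture the current output for each input tuple."""
    return {args: func(*args) for args in inputs}


def check_against_golden_master(func, golden):
    """Return (args, expected, actual) for every input whose output changed."""
    mismatches = []
    for args, expected in golden.items():
        actual = func(*args)
        if actual != expected:
            mismatches.append((args, expected, actual))
    return mismatches


# Step 1: pin down today's behavior, quirks included.
inputs = [(1, 5.0), (10, 5.0), (3, 2.5)]
golden = record_golden_master(legacy_format_invoice, inputs)


# Step 2: a hypothetical refactoring that accidentally moves the threshold.
def refactored_format_invoice(quantity, unit_price):
    discount = 0.9 if quantity > 10 else 1.0
    return "TOTAL: %.2f" % (quantity * unit_price * discount)


# Step 3: the safety net reports the behavior change at quantity == 10.
for args, expected, actual in check_against_golden_master(
    refactored_format_invoice, golden
):
    print(args, "expected", expected, "got", actual)
```

In a real system the recorded outputs would usually be stored in files and the inputs driven through the user-level interface, which is slower but needs no changes to the code under test.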

What are some typical opportunities to reduce waste in legacy projects?

Oh, there’s waste everywhere. You don’t have to look very far in a legacy project to get some wins. As software people, we tend to make the same mistake over and over again, in that we wildly overestimate the time it would take to do refactoring and improvement, and we wildly underestimate the time it takes to implement features. When we have to make those short-term trade-offs, we always make the wrong choice. On the other hand, once you’ve come to the realization that something needs to be done, you don’t have to look far to find a quick win. It’s usually around automation, because big systems have usually suffered from poor automation. You can often throw hardware at the problem as well. A simple thing like just getting some really beefy build servers to cut the developer feedback loops is a great way to get a quick win.

Something often overlooked when people are looking at a software problem is that if you go in and spend some money on servers, you can save millions of dollars in engineering time. You have to think like that. You have to go for the big wins that you can get and then slowly chip away at the longer-term payoffs.

Are there any common mistakes people make when trying to tackle legacy software?

I think the biggest mistake people make is that they don’t start. It’s not that they’ve gone in the wrong direction or they’ve improved the wrong thing. In general, if you have some focus on continuous improvement, you’ll get some wins. The biggest mistake I see in large legacy projects is that people just don’t start. They don’t understand that there’s a really valuable payoff in spending the time to work on this stuff and just rolling up their sleeves and getting on with it.

What are some resources you can recommend for those working with legacy software?

I would be remiss if I didn’t mention Michael Feathers’ book. It is actually a very good book, and if you want to learn how to get legacy software under test, it’s a great resource. It’s got lots of good techniques. If you’re working as part of a team, form a study group around it. Just learn it.

Thank you so much for taking time to join us today.

Thanks a lot. No problem.