Wednesday 23 January 2013

Finding the Joy in Legacy Code

Recently, I gave a presentation for NxtGenUG. The main event was Uncle Bob's talk about Clean Architecture, which meant that the event was really well attended (100+ people). I decided to talk about finding the positives in legacy code. The slides from the original presentation are available here.

Legacy code. Words that strike fear and revulsion into most developers. Who wants to work on a code-base full of problems that's incredibly difficult to change and has few or no automated tests? I've spent most of my programming career working in legacy code in one form or another.

Legacy code doesn't have a formal definition.  I've heard various definitions floating around.  Maybe it's the code you just don't want to work on?  Every project I've ever worked on has had some modules that no-one wants to go anywhere near.  One of my favourite examples is:

    if (!fileExists("c:/some/path")) {
       // The file exists?
    }

    #define FILE_EXIST_SUCCESS 0

    if (FILE_EXIST_SUCCESS == fileExists("c:/some/path")) {
       printf("Ah, it makes sense now");
    }

Yes, someone had actually shortened the code like that all over the place, so every time you read fileExists you have to do some crazy double-take to work out that the code means exactly what it doesn't say. I'm pretty sure that counts as legacy code, right? (This file also contained the legendary fileCopyWithRetry7, but that's another story.)
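One way to stop the double-take is to hide the inverted return value behind a name that says what it means. The following is only a minimal sketch: the body of fileExists here is a hypothetical stand-in (the real one isn't shown above, I'm just assuming it returns 0 on success), and file_is_present is a name invented for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

#define FILE_EXIST_SUCCESS 0

/* Hypothetical stand-in for the legacy function: returns 0 when the file
   can be opened for reading, non-zero otherwise. */
int fileExists(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 1;
    fclose(f);
    return FILE_EXIST_SUCCESS;
}

/* Intention-revealing wrapper (hypothetical name): true means the file is
   there, so call sites read the way you would say them out loud. */
bool file_is_present(const char *path)
{
    return fileExists(path) == FILE_EXIST_SUCCESS;
}
```

With the wrapper in place, a call site becomes if (file_is_present(path)), and the old call sites can be migrated one at a time.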

Another definition of legacy code might be code that uses a platform that's on the way out. Perhaps you're the poor person stuck maintaining a COBOL or Fortran application? If you're really unlucky, perhaps you have a J2EE application to maintain. That must count as legacy code, right?

A simple definition of legacy code might just be code that's on the way out? Maybe your airline reservation system is being phased out and replaced with a new one. That's a simple definition and almost certainly covers it.

I thought I'd try to work out what legacy code is by looking at it from the other direction. What's great code? I went round the engineering teams at Red Gate and asked each of them what makes great code.

There was broad agreement on the themes: testable, tested, readable, maintainable. All of these words are associated with confidence; any code that satisfies these properties is going to be fun to work on.

So that brings me to what I consider legacy code:

Legacy code is code that you are scared to change.

That's a hugely broad definition, but it sums it up nicely for me. I'm aware of Michael Feathers' definition, which is that legacy code is code not covered by tests, but that doesn't quite cut it for me. You need to have confidence in the tests, and they also need to get that feedback to you quickly, otherwise you'll still be reluctant to change the code.

If you're working in legacy code, you might be wondering how on earth you can find any joy in working with it. It's all too easy to feel like this guy:

You spend each and every day pushing around this code base, making things a little bit worse every day (metaphor courtesy of Roly Perera way back when). Let's face it, no-one wants to be a dung-beetle. Who wants to push shit around all day?

Unfortunately, it's all too easy to fall into this situation.  If a method already takes 10 parameters, adding another one is all too easy.  If the preferred development methodology is copy and paste rather than abstraction, then some kind of Stockholm Syndrome sets in and it's easy to convince yourself that it's not that bad.

Even if you do push through that barrier, I've often found myself falling into the trap of not really solving the right problems. For example, I'll convince myself that before I start any restructuring of the code I should eliminate all the compiler warnings, or perhaps I'll get rid of those pesky PMD warnings? Increase adherence to FxCop standards? All of these things are easy to do, but do they really increase the quality of the code-base? Do they make engineering the next feature easier than the previous one? Do they increase your confidence when you work with the code? Really?

So how do you transform yourself from a dung beetle into someone who enjoys the challenge of legacy code? It's no easy task. The most important thing for me is to establish the basics:

  • Continuous Build server
  • Tests that give you confidence
  • Fast feedback

None of these things are easy. Tests are really hard. The cost of getting the very first test written for a legacy code base might be obscenely high, but until you invest in it, you'll be destined to forever be pushing that dung around.

What is there to love about legacy code? One thing is progress: it's pretty easy to gamify some aspects of legacy code. As an example, perhaps your team could focus on eliminating global variables one at a time. It's easy to keep score (Find Usages in the IDE) and it's the sort of change that can usually be made piecemeal. Having a small set of engineering goals each sprint helps maintain the sense of progress.

One of the biggest assets a legacy code base has is the code. It might be messy, incomprehensible and thousands of lines long, but there's gold buried in there. Mining the code base for information can often reveal really interesting things. It's all too easy to think that any particular class can be reimplemented in half the time it'd take to understand the existing code. Unfortunately, estimates for rewrites tend to miss all the edge cases and strange behaviour captured in the legacy system. Perhaps the SP3 edition of the Foo/Bar/Baz component has weird behaviour on the Turkish locale? The only reason the legacy code captures this is that it's been around the block: it's used code and it's had real bugs found by real users and squashed. Joel Spolsky's classic article, Things You Should Never Do, covers this problem really well.

In addition to the codebase, the version control system can reveal lots of other details. For example, which files have changed the most? Which have changed the least? This can help guide important decisions, such as which parts of the system are worth getting under test. Chances are that code which hasn't been edited for 10 years works pretty well and doesn't require as much testing as the code that's changed every 5 minutes to handle just another edge case.
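As a rough illustration of the "which files have changed the most?" question, here is a small sketch that tallies how often each path appears in a list of changed files. It assumes the history lives in Git (the post doesn't depend on that) and that the input is one path per line, which is what git log --name-only --pretty=format: prints. The names churn, record_change, tally_stream and by_changes_desc are all invented for this example, and the linear search is only sensible for modest code bases.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PATH_LEN 512

/* One row of the tally: a file path and how many times it was changed. */
struct churn {
    char path[MAX_PATH_LEN];
    int changes;
};

/* Record one change to `path`. Returns the new number of distinct paths.
   Paths beyond `capacity` are silently dropped in this sketch. */
size_t record_change(struct churn *table, size_t count, size_t capacity,
                     const char *path)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].path, path) == 0) {
            table[i].changes++;
            return count;
        }
    }
    if (count < capacity) {
        /* New path: snprintf truncates rather than overflowing the row. */
        snprintf(table[count].path, MAX_PATH_LEN, "%s", path);
        table[count].changes = 1;
        count++;
    }
    return count;
}

/* Treat every non-blank line of `in` as one change to that path.
   Lines longer than MAX_PATH_LEN - 1 are split, a known limitation. */
size_t tally_stream(FILE *in, struct churn *table, size_t capacity)
{
    char line[MAX_PATH_LEN];
    size_t count = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\r\n")] = '\0'; /* strip the line ending */
        if (line[0] != '\0')
            count = record_change(table, count, capacity, line);
    }
    return count;
}

/* qsort comparator: most-changed files first. */
int by_changes_desc(const void *a, const void *b)
{
    const struct churn *x = a;
    const struct churn *y = b;
    return (y->changes > x->changes) - (y->changes < x->changes);
}
```

A main function is left out on purpose. It would call tally_stream on stdin, sort the table with qsort and by_changes_desc, and print the top few rows; the files at the top of that list are the first candidates for getting under test.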

For me though, the biggest advantage of legacy code is the rate at which you can learn new techniques. As a new software developer you're bombarded with information about refactoring, patterns and unit testing. By necessity these are presented on trivial examples, and it's often difficult to see the point. Why should I be disciplined about extract method when the method is only ten lines long? Why on earth would I need that pattern, surely I can just new this object up? Tests coupled to implementation details rather than behaviour? It doesn't matter for this example.

Legacy code forces you to confront these issues head on. You'll discover patterns from first principles. Just by trying to introduce a test you'll quickly realise the dangers of certain code constructs that make things hard to test. Without the safety blanket of unit tests, you'll quickly learn which sorts of refactorings you can confidently apply without tests (yes, sometimes you DO need to refactor without tests, otherwise you're permanently stymied).
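To make "code constructs that make things hard to test" a little more concrete, here is one common example, sketched under my own assumptions rather than taken from any real code base: a function that reaches straight for the system clock, next to a version that has the clock passed in. All the names are hypothetical, and the fake clocks assume time_t counts seconds since 1 January 1970 UTC, as it does on POSIX systems.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hard to test: the answer depends on when the test happens to run. */
bool is_weekend_now(void)
{
    time_t now = time(NULL);
    struct tm *utc = gmtime(&now);
    return utc != NULL && (utc->tm_wday == 0 || utc->tm_wday == 6);
}

/* Easier to test: the clock is a parameter, so a test can supply its own. */
typedef time_t (*clock_fn)(void);

bool is_weekend(clock_fn now_fn)
{
    time_t now = now_fn();
    struct tm *utc = gmtime(&now);
    return utc != NULL && (utc->tm_wday == 0 || utc->tm_wday == 6);
}

/* The clock production code would pass in. */
time_t real_clock(void) { return time(NULL); }

/* Fake clocks for tests (assumes the POSIX epoch, see above). */
time_t fake_thursday(void) { return (time_t)0; }           /* Thu 1 Jan 1970 */
time_t fake_saturday(void) { return (time_t)(2 * 86400); } /* Sat 3 Jan 1970 */
```

Production code calls is_weekend(real_clock), while a test can call is_weekend(fake_saturday) and get the same answer on any day of the week. The same trick applies to file systems, databases and anything else the code grabs for itself.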

The skills you gain with legacy code are entirely transferable.  They aren't a faddish library that no-one will be using next week, they are the meat and potatoes of software engineering.  Master the challenges of legacy code, and you'll have a job for life.