Monday, 1 December 2014

How much time should you spend fixing bugs in legacy code?

How much time should you spend fixing bugs in legacy code?

There's a huge amount written about dealing with greenfield code. You start with practices such as test-driven development, walking skeletons and thin vertical stripes of functionality. Legacy code is much harder. Given hundreds of thousands of lines of poorly structured code, where'd you start? Working Effectively with Legacy Code gives some great pointers; put seams in, get the tests in place and TDD the new feature work. I'm interested in the next level up, how do you balance feature work against bug fixing?

I'm got an interesting problem. We've got a clump of legacy software that product management tell me needs new features, but we also know from support that the number of bugs is a worry. From my point of view as a development manager I want data that allows me to make the right decision and that requires evidence and understanding of the scale and scope of the problem.

It is impossible to find any domain in which humans outperformed crude extrapolation algorithms, less still sophisticated ones (Expert Political Judgement: How good is it? How can we know? [via How to Measure Anything: Finding the Value of Intangibles in Business)

I'd like to move from a faith-based to a science-based approach to balancing new features over bug count.

One field that provides some inspiration is population estimation. Given a small sample size, how do you estimate the total population?

Mark and recapture is a common method for population estimation. Capture 100 animals, tag them and release them. Repeat the process. The number of tagged animals is proportional to the number of tagged animals in the population. If we had no morals whatsoever, we could release an update to 1000 users and sample the number of bugs. We could then release the same update to another 1000 users and see how many bugs we see again.

This isn't a great way to do things, but it does give us some simple formula. If we use the same notation as Wikipedia, then

  • N is the total number of bugs
  • K is the number of bugs found by the first group
  • n is the number of bugs found by the second group
  • k is the number of bugs seen for a second time
This gives us a simple formula that we could use (N = Kn / k). For bugs for released products, it's even simpler. Since we can tally bugs against each other automatically, we can estimate N without doing anything too amoral. We can use the data from the latest release to arbitrarily divide the users in half, calculate how many bugs each side finds and count the number of duplicates.

After a bit of searching around, I found that this isn't a very novel application of the idea. "How many errors are left to find?" talks about this, but from the perspective of software testing (this seems to have generated some controversy in the response, "Another silly quantitative model").

There's a lot of caveats with model-based approaches like this (what exactly is a bug anyway?), but it's better than nothing.

Tuesday, 23 September 2014

The Goal - The Match Bowl Experiment

Recently I've been re-reading The Goal by Eliyahu Goldratt. It's a great little book about manufacturing plants and how to manage them. It introduces the Theory of Constraints and that's relevant for all software developers for an understanding of why our development processes are structured the way they are.

Throughout the book it uses games and metaphors to illustrate faulty thinking about interconnected processes. In this post, I'd like to introduce Goldratt's dice game.

In the dice game there are a number of stations (representing part of a business process). The stations are arranged in a line with the output from one station becoming the input of the next. This arrangement represents a production line. In order to move items through the production line players take it in turn to roll a dice. The number rolled is the maximum you can move to the next station. For example, if you roll a six, but only have three items in your station, then you can only move three to the next station.

Let's imagine a really simple system with 8 stations that starts with 100 units in the left hand bowl with the aim of producing 100 units in the right hand bowl. Each bowl has the same capacity; it'll produce between 1-6 units each work step.

Based on the rules above, what's the flow of work going to look like through the system?

You might expect the flow to be smooth; each workstation has about 0 items at any one time because as soon as they are produced they move onto the next state. The reality is somewhat different.

The movie below shows the system processing a set of 100 units. The bottleneck (that is the work centre with the most items in it) is highlighted in yellow.

What's really interesting is the chaotic nature of the bottleneck. Random fluctuations mean that the bottleneck can appear anywhere. Balancing capacity across each item is clearly not the right answer.

All the diagrams in this page were built using the excellent Diagrams library for Haskell.

Saturday, 26 July 2014

Dynamic Time Warping

Dynamic Time Warping is nothing to do with the Rocky Horror show. It's a dynamic programming algorithm for aligning sequences of data that vary in terms of speed or time. Some typical applications of dynamic time warping are aligning fragments of speech for the purposes of performing speaker recognition.

In this post, we'll look at how simple the algorithm is and visualize some of the output you can get from aligning sequences. The complete code is on github and any flames, comments and critiques are most appreciated. You'll need the Haskell platform installed and a cabal install of the Codec.BMP package if you want to generate some images.

Given two vectors of some symbol a representing a time series (e.g. they both represent the output f(x) = y where x is some time, and y is an output signal) produce as output an array describing the cost. The output array gives the "alignment factor" of the two sequences at difference points in time.

We can find the best alignment path by simply walking back through the matrix from the top right, to the bottom left and taking the minimum choice at each turn. Using this we can visualize the best matching path for two exactly matching sequences. That's dead simple to code up:

Let's look at what happens if we try match the signal against itself and highlight the matching path in white.

The colour demonstrates how well the signals match. Blue highlights the best match (e.g. least cost) and hotter colours (such as red) highlight the worst cost. This pattern matches simple intuition. Since the sequences are exactly aligned, we'd expect a path from the top right to the bottom left, and that's what we get.

What happens if we try to match two completely random signals of integers? First off, let's try with the measure of the cost function being the absolute difference between the values (e.g. the cost function passed in is simply cost x y = abs (x - y)).

Cool patterns. Does this make sense? I think it does. The best match is at the beginning, before the sequences have diverged. As time goes on the match always gets worse because the cumulative absolute difference between the sequences is continuously increasing (albeit randomly).

What if we try to match a sequence against its opposite? Let's visualize that:

That looks odd. What's the intuition describing the image here? The best match of these two signals occurs in the middle (since they are opposite), this feels like this explains the center structure. By the time we reach the end of the signal (the top right) we've got the worst possible match and hence the brightest colour.

This implementation of the algorithm isn't all that practical. It's an O(N^2) algorithm and thus isn't suitable for signals with a high number of samples. However, it's fun to play with!

If you want to find out more about an efficient implementation of dynamic time warping then Fast DTW is a great place to start. As someone who enjoys reading papers, it's fantastic to see the code behind it, quoting from the link:

FastDTW is implemented in Java. If the JVM heap size is not large enough for the cost matrix to fit into memory, the implementation will automatically switch to an on-disk cost matrix. Alternate approaches evaluated in the papers listed below are also implemented: Sakoe-Chiba Band, Abstraction, Piecewise Dynamic Time Warping (PDTW). This is the original/official implementation used in the experiments described in the papers below.