Monday 1 December 2014

How much time should you spend fixing bugs in legacy code?


There's a huge amount written about dealing with greenfield code. You start with practices such as test-driven development, walking skeletons and thin vertical stripes of functionality. Legacy code is much harder. Given hundreds of thousands of lines of poorly structured code, where do you start? Working Effectively with Legacy Code gives some great pointers: put seams in, get the tests in place and TDD the new feature work. I'm interested in the next level up: how do you balance feature work against bug fixing?

I've got an interesting problem. We've got a clump of legacy software that product management tell me needs new features, but we also know from support that the number of bugs is a worry. From my point of view as a development manager, I want data that allows me to make the right decision, and that requires evidence and an understanding of the scale and scope of the problem.

It is impossible to find any domain in which humans outperformed crude extrapolation algorithms, less still sophisticated ones (Expert Political Judgement: How good is it? How can we know? [via How to Measure Anything: Finding the Value of Intangibles in Business])

I'd like to move from a faith-based to a science-based approach to balancing new features over bug count.

One field that provides some inspiration is population estimation. Given a small sample size, how do you estimate the total population?

Mark and recapture is a common method for population estimation. Capture 100 animals, tag them and release them. Later, capture a second sample. The proportion of tagged animals in that second sample is an estimate of the proportion of tagged animals in the whole population. If we had no morals whatsoever, we could release an update to 1000 users and record the bugs they find. We could then release the same update to another 1000 users and see how many of those bugs turn up again.

This isn't a great way to do things, but it does give us a simple formula. If we use the same notation as Wikipedia, then

  • N is the total number of bugs
  • K is the number of bugs found by the first group
  • n is the number of bugs found by the second group
  • k is the number of bugs seen for a second time
This gives us a simple formula that we could use (N = Kn / k). For bugs for released products, it's even simpler. Since we can tally bugs against each other automatically, we can estimate N without doing anything too amoral. We can use the data from the latest release to arbitrarily divide the users in half, calculate how many bugs each side finds and count the number of duplicates.
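To make the arithmetic concrete, here is a minimal sketch of the estimate in Haskell. The reasoning is that the duplicate rate in the second group (k / n) should roughly match the fraction of all bugs found by the first group (K / N), which rearranges to N = Kn / k. The names estimateTotal and estimateFromReports are hypothetical, as is the idea of representing each group's reports as a set of bug identifiers.

```haskell
import qualified Data.Set as Set

-- | Mark and recapture estimate: N = K * n / k.
-- Returns Nothing when no bug was seen by both groups (k = 0),
-- because the estimate is undefined in that case.
estimateTotal :: Int -> Int -> Int -> Maybe Double
estimateTotal bigK n k
  | k <= 0    = Nothing
  | otherwise = Just (fromIntegral bigK * fromIntegral n / fromIntegral k)

-- | Estimate the total number of bugs from the sets of bug identifiers
-- reported by two (hypothetical) halves of the user base.
estimateFromReports :: Ord bug => Set.Set bug -> Set.Set bug -> Maybe Double
estimateFromReports first second =
  estimateTotal (Set.size first)
                (Set.size second)
                (Set.size (Set.intersection first second))
```

For example, if the first half reports 20 distinct bugs, the second half reports 15 and 5 of them are shared, the estimate is 20 * 15 / 5 = 60 bugs in total.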

After a bit of searching around, I found that this isn't a very novel application of the idea. "How many errors are left to find?" talks about this, but from the perspective of software testing (this seems to have generated some controversy in the response, "Another silly quantitative model").

There are a lot of caveats with model-based approaches like this (what exactly is a bug anyway?), but it's better than nothing.

Tuesday 23 September 2014

The Goal - The Match Bowl Experiment

Recently I've been re-reading The Goal by Eliyahu Goldratt. It's a great little book about manufacturing plants and how to manage them. It introduces the Theory of Constraints, which is relevant to any software developer who wants to understand why our development processes are structured the way they are.

Throughout the book it uses games and metaphors to illustrate faulty thinking about interconnected processes. In this post, I'd like to introduce Goldratt's dice game.

In the dice game there are a number of stations (each representing part of a business process). The stations are arranged in a line with the output from one station becoming the input of the next. This arrangement represents a production line. In order to move items through the production line, players take it in turn to roll a die. The number rolled is the maximum you can move to the next station. For example, if you roll a six, but only have three items in your station, then you can only move three to the next station.

Let's imagine a really simple system with 8 stations that starts with 100 units in the left hand bowl with the aim of producing 100 units in the right hand bowl. Each station has the same capacity: it'll move between 1 and 6 units each work step.


Based on the rules above, what's the flow of work going to look like through the system?

You might expect the flow to be smooth; each workstation has about 0 items at any one time because as soon as they are produced they move on to the next station. The reality is somewhat different.

The movie below shows the system processing a set of 100 units. The bottleneck (that is the work centre with the most items in it) is highlighted in yellow.


What's really interesting is the chaotic nature of the bottleneck. Random fluctuations mean that the bottleneck can appear anywhere. Balancing capacity across each station is clearly not the right answer.
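The rules of the game are small enough to sketch in a few lines of Haskell. This is not the code behind the animation, just a minimal simulation under some assumptions: bowls are a list of counts, stations play left to right within a round (so a unit can travel more than one station per round), and the "die" is a toy linear congruential generator so that the sketch needs nothing outside base. All the names here are hypothetical.

```haskell
-- | A tiny linear congruential generator, used only so the sketch has no
-- dependencies outside base. Not suitable for serious simulation.
nextSeed :: Int -> Int
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- | Turn a seed into a die roll between 1 and 6.
roll :: Int -> Int
roll s = (s `div` 65536) `mod` 6 + 1

-- | One round: every bowl except the last rolls the die and passes on at
-- most that many units. Returns the new seed and the new bowl contents.
playRound :: Int -> [Int] -> (Int, [Int])
playRound seed = go seed 0
  where
    go s incoming [b]      = (s, [b + incoming])
    go s incoming (b : bs) =
      let s'          = nextSeed s
          available   = b + incoming
          moved       = min (roll s') available
          (s'', rest) = go s' moved bs
      in (s'', available - moved : rest)
    go s _ []              = (s, [])

-- | Every state of the system, from the start until all units have
-- reached the final bowl.
simulate :: Int -> [Int] -> [[Int]]
simulate seed bowls
  | null bowls || all (== 0) (init bowls) = [bowls]
  | otherwise                             = bowls : simulate seed' bowls'
  where
    (seed', bowls') = playRound seed bowls

-- | Index of the fullest intermediate bowl, ignoring the source and sink.
-- Ties go to the rightmost bowl.
bottleneck :: [Int] -> Maybe Int
bottleneck bowls
  | length bowls < 3 = Nothing
  | otherwise        = Just (snd (maximum (zip interior [1 ..])))
  where
    interior = init (tail bowls)
```

Something like `map bottleneck (simulate 42 (100 : replicate 7 0))` then shows how the fullest station moves around from round to round for one particular seed.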

All the diagrams in this page were built using the excellent Diagrams library for Haskell.

Saturday 26 July 2014

Dynamic Time Warping

Dynamic Time Warping is nothing to do with the Rocky Horror Show. It's a dynamic programming algorithm for aligning sequences of data that vary in terms of speed or time. A typical application of dynamic time warping is aligning fragments of speech for the purposes of speaker recognition.

In this post, we'll look at how simple the algorithm is and visualize some of the output you can get from aligning sequences. The complete code is on github and any flames, comments and critiques are most appreciated. You'll need the Haskell platform installed and a cabal install of the Codec.BMP package if you want to generate some images.

Given two vectors of some symbol a representing a time series (e.g. they both represent the output f(x) = y where x is some time, and y is an output signal) produce as output an array describing the cost. The output array gives the "alignment factor" of the two sequences at different points in time.
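As a rough illustration of that description (a sketch, not the code from the github repository), the cumulative cost array can be built with a lazily evaluated Data.Array, where each cell adds its own cost to the cheapest of its three predecessors. The name dtw and the choice of (0, 0) as the start of both sequences are assumptions.

```haskell
import Data.Array

-- | Cumulative cost matrix for two sequences under a caller-supplied cost
-- function. Cell (i, j) holds the cheapest total cost of aligning the
-- first i + 1 samples of xs with the first j + 1 samples of ys.
dtw :: (Num c, Ord c) => (a -> b -> c) -> [a] -> [b] -> Array (Int, Int) c
dtw cost xs ys = table
  where
    n  = length xs
    m  = length ys
    xa = listArray (0, n - 1) xs
    ya = listArray (0, m - 1) ys

    -- The array is lazy in its elements, so cells can refer to each other.
    table = array ((0, 0), (n - 1, m - 1))
              [ ((i, j), cell i j) | i <- [0 .. n - 1], j <- [0 .. m - 1] ]

    cell i j = cost (xa ! i) (ya ! j) + best i j

    -- Cheapest way of arriving at (i, j) from a neighbouring cell.
    best 0 0 = 0
    best 0 j = table ! (0, j - 1)
    best i 0 = table ! (i - 1, 0)
    best i j = minimum [ table ! (i - 1, j)
                       , table ! (i, j - 1)
                       , table ! (i - 1, j - 1) ]
```

Because every cell is computed once and there are n * m of them, this sketch has the quadratic cost you would expect from the plain algorithm.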



We can find the best alignment path by simply walking back through the matrix from the top right, to the bottom left and taking the minimum choice at each turn. Using this we can visualize the best matching path for two exactly matching sequences. That's dead simple to code up:
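The original listing hasn't survived on this page, so here is a minimal sketch of the walk described above rather than the author's code. It assumes a non-empty cumulative cost matrix stored in a Data.Array whose lower bound is (0, 0), with the end of both sequences at the upper bound; the name warpPath is hypothetical.

```haskell
import Data.Array

-- | Walk back from the final cell to (0, 0), always stepping to the
-- cheapest of the neighbouring predecessor cells. The path is returned
-- in forward order. On a tie the diagonal step is preferred.
warpPath :: Ord c => Array (Int, Int) c -> [(Int, Int)]
warpPath table
  | rangeSize (bounds table) == 0 = []
  | otherwise                     = reverse (go end)
  where
    (_, end) = bounds table

    go (0, 0) = [(0, 0)]
    go (i, j) = (i, j) : go next
      where
        -- Predecessors that are still inside the matrix.
        candidates = [ p | p@(a, b) <- [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
                         , a >= 0, b >= 0 ]
        next = snd (minimum [ (table ! p, p) | p <- candidates ])
```

For two identical sequences every cell on the diagonal has a cumulative cost of zero, so the walk stays on the diagonal.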



Let's look at what happens if we try to match the signal against itself and highlight the matching path in white.


The colour demonstrates how well the signals match. Blue highlights the best match (i.e. least cost) and hotter colours (such as red) highlight the worst cost. This pattern matches simple intuition. Since the sequences are exactly aligned, we'd expect a path from the top right to the bottom left, and that's what we get.

What happens if we try to match two completely random signals of integers? First off, let's try with the measure of the cost function being the absolute difference between the values (i.e. the cost function passed in is simply cost x y = abs (x - y)).


Cool patterns. Does this make sense? I think it does. The best match is at the beginning, before the sequences have diverged. As time goes on the match always gets worse because the cumulative absolute difference between the sequences is continuously increasing (albeit randomly).

What if we try to match a sequence against its opposite? Let's visualize that:


That looks odd. What's the intuition describing the image here? The best match of these two signals occurs in the middle (since they are opposites), which seems to explain the central structure. By the time we reach the end of the signal (the top right) we've got the worst possible match and hence the brightest colour.

This implementation of the algorithm isn't all that practical. It's an O(N^2) algorithm and thus isn't suitable for signals with a high number of samples. However, it's fun to play with!

If you want to find out more about an efficient implementation of dynamic time warping then Fast DTW is a great place to start. As someone who enjoys reading papers, it's fantastic to see the code behind it, quoting from the link:

FastDTW is implemented in Java. If the JVM heap size is not large enough for the cost matrix to fit into memory, the implementation will automatically switch to an on-disk cost matrix. Alternate approaches evaluated in the papers listed below are also implemented: Sakoe-Chiba Band, Abstraction, Piecewise Dynamic Time Warping (PDTW). This is the original/official implementation used in the experiments described in the papers below.

Monday 14 July 2014

The Stable Marriage Problem

It's been far too long since I wrote posts with any real code in, so in an attempt to get back into good habits I'm going to try to write a few more posts and read up a bit more about some algorithms and the history behind them.

The Stable Marriage Problem was originally described by David Gale and Lloyd Shapley in their 1962 paper, "College Admissions and the Stability of Marriage". They describe the problem as follows:

A certain community consists of n men and n women. Each person ranks those of the opposite sex in accordance with his or her preferences for a marriage partner. We seek a satisfactory way of marrying off all members of the community. We call a set of marriages unstable if under it there are a man and a woman who are not married to each other, but prefer each other to their actual mates.

Gale and Shapley show that for any pattern of preferences it's possible to find a stable set of marriages.

On its own, this doesn't sound very interesting. However, bringing together resources is an important economic principle, and this work formed part of the puzzle of Cooperative Game Theory. Shapley was jointly awarded the Nobel Prize for economics in 2012.

So how does the algorithm for Stable Marriages work?

Let's start by defining the problem. Given two lists of preferences, find the match such that there is no unstable match (that is, two pairs that would cooperatively trade partners to make each other better off). The only constraint on the types is that they are equatable. This isn't the ideal representation (to put it mildly) in a strongly typed language (it doesn't enforce any invariants about the structure of the lists), but it's probably the simplest representation for explaining the algorithm.

stableMatch :: (Eq m, Eq w) => [(m,[w])] -> [(w,[m])] -> [(m,w)]

The algorithm continues whilst there are any unmarried men. If there are no unmarried men, then the algorithm terminates.

  import Data.List (find)

  stableMatch :: (Eq m, Eq w) => [(m,[w])] -> [(w,[m])] -> [(m,w)]
  stableMatch ms ws = stableMatch' []
    where       
      stableMatch' ps = case unmarried ms ps of
        Just unmarriedMan  -> stableMatch' (findMatch unmarriedMan ws ps)
        Nothing            -> ps

  unmarried :: Eq m => [(m,[w])] -> [(m,w)] -> Maybe (m,[w])
  unmarried ms ps = find (\(m,_) -> m `notElem` engagedMen) ms
    where
      engagedMen = map fst ps

If there is at least one unmarried man, then we need to find a match. We do this by proposing to each of his preferences in turn. If his first preference is not engaged, then the two become engaged. Otherwise, if his potential partner is already engaged but would prefer him, then her current engagement violates the stable marriage principle, so we break it up and engage her to the proposer instead. If she prefers her current partner, we move on to his next preference.

  findMatch :: (Eq m,Eq w) => (m,[w]) -> [(w,[m])] -> [(m,w)] -> [(m,w)]
  findMatch (m,w:rest) ws ps = case isEngaged w ps of
      
    -- w is already engaged to m' - is there a better match?
    Just m' -> if prefers (getPrefs ws w) m m'
               then engage (breakup m' ps) m w
               else findMatch (m,rest) ws ps
                      
    -- can match with first choice
    Nothing -> engage ps m w
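The snippets above lean on a handful of helpers (isEngaged, getPrefs, prefers, engage and breakup) that aren't shown in this post. As a minimal sketch, here is one way they might look, repeated alongside the functions above so that it loads on its own. The helper bodies are guesses at the obvious implementation rather than the linked code, and the clause for an exhausted preference list is an addition that can't fire when both sides rank everyone.

```haskell
import Data.List (elemIndex, find)
import Data.Maybe (fromMaybe)

stableMatch :: (Eq m, Eq w) => [(m,[w])] -> [(w,[m])] -> [(m,w)]
stableMatch ms ws = stableMatch' []
  where
    stableMatch' ps = case unmarried ms ps of
      Just unmarriedMan -> stableMatch' (findMatch unmarriedMan ws ps)
      Nothing           -> ps

unmarried :: Eq m => [(m,[w])] -> [(m,w)] -> Maybe (m,[w])
unmarried ms ps = find (\(m,_) -> m `notElem` engagedMen) ms
  where
    engagedMen = map fst ps

findMatch :: (Eq m, Eq w) => (m,[w]) -> [(w,[m])] -> [(m,w)] -> [(m,w)]
findMatch (_, []) _ _ = error "findMatch: preference list exhausted"
findMatch (m, w:rest) ws ps = case isEngaged w ps of
  -- w is already engaged to m': does she prefer the new proposer?
  Just m' -> if prefers (getPrefs ws w) m m'
             then engage (breakup m' ps) m w
             else findMatch (m, rest) ws ps
  -- w is free, so the proposal succeeds
  Nothing -> engage ps m w

-- | Who, if anyone, is w currently engaged to?
isEngaged :: Eq w => w -> [(m,w)] -> Maybe m
isEngaged w ps = fst <$> find ((== w) . snd) ps

-- | The preference list for w (empty if w is unknown).
getPrefs :: Eq w => [(w,[m])] -> w -> [m]
getPrefs ws w = fromMaybe [] (lookup w ws)

-- | Does a appear before b in the preference list?
prefers :: Eq m => [m] -> m -> m -> Bool
prefers prefs a b = case (elemIndex a prefs, elemIndex b prefs) of
  (Just i, Just j)  -> i < j
  (Just _, Nothing) -> True
  _                 -> False

-- | Record a new engagement.
engage :: [(m,w)] -> m -> w -> [(m,w)]
engage ps m w = (m, w) : ps

-- | Remove any engagement involving the given man.
breakup :: Eq m => m -> [(m,w)] -> [(m,w)]
breakup m = filter ((/= m) . fst)
```

A jilted man starts again from the top of his list, which is wasteful but harmless: anyone who has already turned him down is engaged to someone she prefers, so she will turn him down again.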

You can see the full code at Stable Marriage Problem. As always flames, comments and critiques gratefully received.

Thursday 12 June 2014

Getting the most out of Extract Class

ReSharper is a wonderful tool. I can't imagine working in the horribleness of legacy code without it.

Every so often you come across a little workflow that makes slicing and dicing code easier. For example, before you could "Move Instance Method" you could "Make Static", "Move Method" and "Make instance method". Knowing you could do this made tearing code apart easier.

Recently I've been using "Extract class" a million and one times to deal with one of those 10K line long classes that no-one ever admits to having. The classes in question look like this:

class DoesEverythingAndThenSome {

       private ThingToDoWithA1 m_ThingToDoWithA1;
       private ThingToDoWithA2 m_ThingToDoWithA2;
       private ThingToDoWithA3 m_ThingToDoWithA3;

       private ThingToDoWithB1 m_ThingToDoWithB1;
       private ThingToDoWithB2 m_ThingToDoWithB2;
       private ThingToDoWithB3 m_ThingToDoWithB3;

       // repeat for thousands of other "things"

       public void DO_ALL_THE_THINGS_WITH_A () {

       }
   
       public void DO_ALL_THE_THINGS_WITH_B () {

       }

       // thousands of lines of random shit
       public void example_of_random_shit () {
          if (incrediblyComplicatedCondition()) {
             foreach (var apple in bananas) {
               m_ThingToDoWithA1.SomethingImportant();
             }
          } else if (auberginesAreLumpy()) {
             m_ThingToDoWithB1.SomethingElse();
          }
       }
    }

Obviously I want to extract out the responsibilities to do with A and B into separate classes. Using "Extract Class" directly doesn't work because I can't pull out the references in example_of_random_shit without making properties public and introducing back references from the extracted class to the parent class.

The simple refactoring is to just extract out each line to do with each field into a single method.

class DoesEverythingAndThenSome {

       /* the rest is the same as above */

       public void SomethingToDoWithA() {
           m_ThingToDoWithA1.SomethingImportant();
       }
 
       public void SomethingToDoWithB() {
           m_ThingToDoWithB1.SomethingElse();
       }

       // thousands of lines of random shit
       public void example_of_random_shit () {
          if (incrediblyComplicatedCondition()) {
             foreach (var apple in bananas) {
                 SomethingToDoWithA();
             }
          } else if (auberginesAreLumpy()) {
             SomethingToDoWithB();
          }
       }
    }

Once I've completed this simple refactoring, "Extract Class" can now do the heavy lifting and I can move all the fields, and all the functions across in a single refactoring. What was previously hard (unpicking the back references) is now incredibly simple and I end up with a mechanical transformation to get data and functions in the right place. Extract class will now give me:

class ThingsToDoWithA {
       private ThingToDoWithA1 m_ThingToDoWithA1;
       private ThingToDoWithA2 m_ThingToDoWithA2;
       private ThingToDoWithA3 m_ThingToDoWithA3;

       // snip constructor

       public void DO_ALL_THE_THINGS_WITH_A () {

       }

       public void SomethingToDoWithA() {
           m_ThingToDoWithA1.SomethingImportant();
       }
    }

    class DoesEverythingAndThenSome {
       
       private ThingsToDoWithA m_ThingsToDoWithA;

       private ThingToDoWithB1 m_ThingToDoWithB1;
       private ThingToDoWithB2 m_ThingToDoWithB2;
       private ThingToDoWithB3 m_ThingToDoWithB3;

       // snip constructor

       // repeat for thousands of other "things"
   
       public void DO_ALL_THE_THINGS_WITH_B () {

       }

       // thousands of lines of random shit
       public void example_of_random_shit () {
          if (incrediblyComplicatedCondition()) {
             foreach (var apple in bananas) {
               m_ThingsToDoWithA.SomethingToDoWithA();
             }
          } else if (auberginesAreLumpy()) {
             m_ThingToDoWithB1.SomethingElse();
          }
       }
    }

Repeat this mechanically for all the other responsibilities. Then the hard work begins of actually working out sensible names...

Friday 11 April 2014

How does it feel to give a terrible conference talk?

Have you been to a conference and sat through an awful presentation and wondered just how the hell someone got there? Me too!

Recently I attended the ACCU conference in Bristol and got to experience what it feels like to deliver something that went down like a lead balloon. One evening many moons ago, I thought I'd send in a proposal. By some small miracle I got accepted and was all set to run a 90 minute introduction to Haskell.

I'd already run through the workshop once at a local user group. The material isn't amazing, but I was confident in delivering it and thought it offered people a chance to get a taste of Haskell and programming with functions.

Then the problems started. It's ACCU. It's full of clever people, therefore I should level-up the material and assume more knowledge. Right? I should make it more hands-on, more interactive and better in every way.

I prepared hard. I updated the slides. I added more and more. I wrote notes, I dug references and I was confident it would kick-ass.

And then the day arrived.

90 minutes seems like a long time. It isn't. I spent a good 15 minutes ensuring that everyone could run "hello world". Very rapidly 90 minutes became 60 minutes.

Then my cleverness got the better of me. The Curry-Howard isomorphism is fascinating, but perhaps it's not the best subject matter within the first 30 minutes of any presentation. Trying to explain it under pressure with questions from an audience eager to learn makes it even worse. I probably lost another 20 minutes trying and failing to explain that const :: a -> b -> a only has one valid implementation in Haskell. And what the hell are the poor attendees going to do with this information? GAH!

And so it continued. On to writing some code. I'd wanted to make it easier to compose higher order functions to produce results, so I'd made the initial data structures in the exercises a bit more complicated than those I'd shown in the example slides. Big mistake. This made it much harder for people to grok the syntax; I'd shown simple syntax but not given enough direction. 30 minutes rapidly disappeared and I'm now *way* behind schedule.

At this point, I'd already realized the situation was going Pete Tong. But what do you do? You can't just down tools and walk out the room (well, I suppose you could, but that'd be worse), so you just have to knuckle down and carry on. And carry on I did, through more examples (well over-egged) and then onto the Universality of Fold (brain, what the hell are you thinking?!).

With 5 minutes left, there's plenty of time to throw a demo of QuickCheck in, right? But then, I realized I'm in an Emacs buffer. How do I increase the font-size so people can read it? GNARGH!! It's over to Notepad and bump the fonts up in that. "Should have used vi!" went the audience. ARGH!

And then the buzzer sounds (well, not really, but it's time to go). Bring things to a halt and escape to a corner of the building. I can't imagine that was particularly fun for the participants. A few people kept up (hurrah!) and there were a couple of positive things said, but I knew it'd gone wrong and boy, that doesn't feel good.

So, at least now I know how it feels (bad, very bad) and I also learnt an important lesson. Keep the message simple! Focus on the single takeaway you want participants to have. I wanted people to leave knowing that Haskell isn't impenetrable and looking at how far you can get just by reading type signatures. However, I lost this in a noise of other random related things and tried (and failed) to communicate a million and one other features.

KISS!

Wednesday 9 April 2014

Agile - What Next?

I'm at ACCU at the moment, and instead of preparing my talk on Haskell for Thursday, I'm writing up my notes from Bob Martin's talk on agile yesterday.

Agile was originally founded by a bunch of programmers over a decade ago. The aim (from Kent Beck) was to devise a system that eliminated the trust divide between programmers and managers (them and us). Transparency was the aim of the game. Programmers would record velocity using story points. Managers would track number of story points per sprint and produce burn-down charts. Everyone is happy.

Unfortunately, burndown and velocity charts track only one part of software development: features. There's a hidden part of software development that isn't captured by these charts: the ability to change. If there's one thing for certain in software development it's that people will change their mind and features will need to adapt. It's no good having your software with the correct features today, if it can't have the correct features tomorrow. Arguably, a code base's ability to respond to change is the primary responsibility of the developers.

In the original light-weight process, XP, this was kept in check by Ron Jeffries' concentric circles.

Concentric Circles

This, again, is part of transparency and trust. At the inner-level, TDD, pair-programming and simple design keep the software honest. A suite of tests gives transparency on the system functionality. Moving further out we reinforce these practices with collective ownership (transparency again, no siloed development). And so on, and so forth.

Fast-forward a decade or so, and where are we now? Agile is the domain of the manager. There are no developers at agile conferences any more, it's all about the secondary value of a software product (shipping features) rather than the primary ability (reacting to change).

The XP Practices have been forgotten. Scrum empowers teams to take ownership of their practices and opt out of ones that don't work. Of course, it's easier (in the short run!) to forget about TDD, simple design and refactoring. However, in the long run productivity grinds to a halt (see Design Stamina Hypothesis).

Bob argues (The Corruption of Agile) that agile doesn't exist without the practices that support it. I agree; most agile teams aren't agile in their ability to react to change. Martin Fowler has a term for it, "Flaccid Scrum", where we adopt the project management side of it, but not the underlying practices for ensuring that the code base becomes malleable and responsive to change.

With all this in mind, the trust issues have reemerged. A drop in velocity (number of story points per sprint) looks bad, so developers have rebelled: just make the stories smaller. The points counted are the same, but the size of the stories is much smaller. Teams are wading through custard, developing features just as slowly as ever.

The thrust against this has come in the form of "software craftsmanship". This is trying to reimagine the circles from the inside out, but it's failed. It's failed because it doesn't attempt to bridge the divide between the managers and the coders. It might help the engineers to "do the right thing" more often, but it doesn't show transparency.

And the talk ended there, no answers for the future and a little depressing. I've definitely seen the scenarios Bob describes, but what's the solution? It's probably not "kill all the project managers" as someone suggested. I'd love to make the "ability to change" a tangible concept that teams can explore and understand. It's not an easily measured property, but I think taking data-driven decisions about code is part of the answer. Project managers need options to meet business constraints. Sometimes it's OK to go quick and dirty, to spike a feature that may not live longer than a week, but you have to accept that recovering from that burst of activity has a remedial cost, and understand how large that cost is.

Right, now to finish off a few slides for this Haskell thing.

Saturday 1 February 2014

The First International Conference on Software Archaeology

I recently attended The First International Conference on Software Archaeology, much more memorably shortened to #ticosa.

It was a slightly strange conference, in that it was never particularly clear what software archaeology was, but that was a good thing as it gave a great variety of talks encompassing everything from metrics, to tools for understanding, to philosophical thoughts on the architecture of information.

Process Echoes in Code


Michael Feathers opened the proceedings with a question, what's the real point of version control systems? The most common answer is that VCS systems help you roll back to previous revisions should something go wrong, or support multiple different product lines. The truth is this doesn't really happen. If your team deploys something to production that goes wrong, then I imagine you'll revert the deploy (not the VCS) and simply deploy again. The real purpose of source control is providing change logging. By looking at those changes we can see the traces of the way we work that are indelibly written in the version control system.

Michael demonstrated a tool (delta-flora) to explore the traces left in the source code. The tool was a simple Ruby program that mapped the Git commit history (SHA1, files changed, author, code diff) into Method event objects (methods added, changed and deleted). This is a simple transformation, but one that seems to yield a vast amount of useful information.

Exploring the temporal correlation of class changes seems like an incredibly useful way of identifying an area of related objects. I'm working on a large, badly understood code-base. We're already finding that adding features requires touching multiple files. By mining information from the past, maybe we can make more educated decisions in the future?
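As a rough sketch of what that mining might look like (this is not delta-flora, which is a Ruby tool working at the method level), the following counts how often pairs of files change in the same commit. It assumes the history has already been reduced to one list of touched file paths per commit; the type Commit and the functions coChanges and mostCoupled are hypothetical names.

```haskell
import Data.List (nub, sort, sortOn)
import qualified Data.Map.Strict as Map

-- | A commit reduced to the files it touched (a hypothetical shape).
type Commit = [FilePath]

-- | All unordered pairs drawn from a list, keeping the list's order.
pairs :: [a] -> [(a, a)]
pairs []       = []
pairs (x : xs) = [ (x, y) | y <- xs ] ++ pairs xs

-- | How many commits touched each pair of files together.
-- Files are deduplicated and sorted so each pair has one canonical key.
coChanges :: [Commit] -> Map.Map (FilePath, FilePath) Int
coChanges commits = Map.fromListWith (+)
  [ (pair, 1) | files <- commits, pair <- pairs (sort (nub files)) ]

-- | The k pairs that changed together most often, most frequent first.
mostCoupled :: Int -> [Commit] -> [((FilePath, FilePath), Int)]
mostCoupled k = take k . sortOn (negate . snd) . Map.toList . coChanges
```

Pairs with a high count that live in different parts of the code base are the ones worth a closer look.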

Another area Michael mentioned that sent my synapses firing was analysing classes by closure date. Even if you have a huge code-base, identifying the closed classes (those that haven't changed) helps reduce the surface area you have to understand. One particular graph he showed (graphing the set of active classes against the open classes) was particularly interesting.



I'd love to plot this on a real code-base, but my understanding is that whilst you've got open classes, chances are you haven't finished a feature and the code-base is in an unstable phase. Looking forward to trying this one out.

Are you a lost raider in the code of doom?


Daniel Brolund followed with a quick overview of the Mikado Method. The Mikado Method provides a pragmatic way of dealing with a big ball of mud. We've probably all experienced the "shockwave" refactoring (or refucktoring?) where we've attempted to make a change, only to find that change requires another change, then another and before you know it you have a change set with 500 files in and little or no confidence that anything works.

The Mikado Method helps you tackle problems like this by recognizing that doing things wrong and reverting is not a no-op. You've gained knowledge. Briefly the method seems to consist of trying the simplest possible thing, using the compiler and more to find pre-requisites (e.g. If only that class was in a separate package...). By repeatedly finding the dependent refactorings you can arrange a safe set of refactorings to tackle larger problems.

I completely agree with this approach. Big bang refactorings on branches are no longer (if they ever were!) acceptable ways to work. Successful refactoring keeps you compiling and keeps you working in the smallest possible batch size. I liked the observation that the pre-requisites form a graph; before I've worked in pairs where we've kept a stack of refactorings (the Yak stack?) but it's an interesting observation that sometimes it's a graph.

How much should I refactor?


Matt Wynne gave a great metaphor for keeping code clean. If you imagine that software engineers are chefs and their output is meals, then the code base is the kitchen. What does your kitchen look like?

Matt had an exemplar code base (Cucumber rewrite), created as greenfield code, test-first, small-team, small commits and no commercial pressures. By analysing commits, a rough and ready guess was that 75% of commits were pure refactoring.

In answer to the question, how much should I refactor? The answer is simple.

More than you currently do.

Code Metrics


Keith Braithwaite gave us a talk about metrics and in particular the dangers of not knowing what you are doing.

He gave some examples from earlier analysis that (allegedly) demonstrated that TDD exhibited bigger methods than test last. This doesn't fit our intuition and indeed analysing the results showed that they based the results on the mean. If we plot method length distribution, we'd find it's not a normal distribution but a power-law distribution. Doing a more statistically sound analysis actually gives the opposite results.

The moral of the story for me was that reducing a data set without knowing what you are doing is very dangerous!

Visualizing Project History


Dmitry Kandalov showed us an amazing analysis of a number of open source projects by mining the version control history (see here). This was the highlight of the conference for me, seeing interactive history of real code bases. Neat!

I really enjoyed seeing the way Scala and Clojure have evolved. Scala has progressively added more complexity and more code. Clojure however, has stabilised. Draw from that what you will!

Tools for Software Business Intelligence


Stephane Ducasse gave us an overview of some of the tools he used for software business intelligence. There was a call to action that we need dedicated tools for understanding code bases and I couldn't agree more with that. There were many interesting links:

Understanding Historical Design Decisions

Stuart Curran gave a presentation on "Understanding Historical Design Decisions". Stuart's perspective was very different as he comes from an information architecture / design background and didn't consider himself a programmer.

Some books to add to my ever-growing reading list:

Confronting Complexity

Robert Smallshire gave a talk on Confronting Complexity and returned us to metrics (see also notes from Software Architect 2013).

We started by analysing how to calculate cyclomatic complexity. One interesting observation was that cyclomatic complexity gives us a minimum bound on the number of tests we need to get code coverage. If you follow this through, then if you add a conditional statement once every fifth line then every five lines of code you write demands another test. Ouch.

We looked at a simpler proxy for code complexity, Whitespace Integrated over Lines of Text (WILT). This is a really simple measure and incredibly quick to calculate so it lends itself to visualizing code data quickly.
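To show just how little there is to calculate, here is a minimal sketch of a WILT-style measure. It assumes the simplest reading of the name: add up the width of the leading whitespace on every non-blank line, counting a tab as a fixed number of columns. The exact normalisation used in the talk may differ, and the function names are hypothetical.

```haskell
import Data.Char (isSpace)

-- | Width of a line's leading whitespace, counting each tab as tabWidth
-- columns and any other whitespace character as one column.
indentWidth :: Int -> String -> Int
indentWidth tabWidth = sum . map width . takeWhile isSpace
  where
    width '\t' = tabWidth
    width _    = 1

-- | Total leading whitespace over all non-blank lines of a source file.
wilt :: Int -> String -> Int
wilt tabWidth =
  sum . map (indentWidth tabWidth) . filter (not . all isSpace) . lines
```

Deeply nested code pushes the total up quickly, which is why such a crude measure can still track complexity reasonably well.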

There was a really good quote attributed to Rob Galanakis (technical director at Eve Online):

How many if statements does it take to add a feature?

Again, this comes back to one of the recurring themes of the conference, Bertrand Meyer's open-closed principle. One of my takeaways from this was to pay much more attention to OCP!

Rob mentioned that Refactoring Reduces Complexity and gave the example of "Replace switch with polymorphism". I'd agree with this for the most part, but there are exceptions. Rename for example preserves code complexity, but increases code comprehensibility: the two don't always align. It'd be interesting to hook in a plugin to refactoring tools to calculate WILT before and after refactorings and report on the cumulative benefits.

Rob finished off by presenting an alternative model-driven approach to software engineering. The visualizations were neat and helped show the range of possibilities. That immediately seems like an improvement over other models such as COCOMO. Interestingly, going back to COCOMO shows that developer half-life isn't considered in the model, nor is complexity of the code produced (I guess the assumption is that complexity of the product => complexity of the code?).

Lightning Talks

Finally, we ended up with a set of lightning talks. Nat Pryce gave a quick demo of using neo4j to analyse a heap dump. Graph databases are cool!

Ivan Moore gave a few opinions on how you can protect your software for archaeologists from the future.

  • Ship your source with your product
  • Put your documentation in source control
  • Put your dependencies in source control (reminded me of nuget package restore considered harmful)
  • Make sure you put instructions to build the product in source control (chef!)

There was a presentation towards the end that showed how adding sound to a running program (initially for the purposes of accessibility) produced some interesting effects. I've done this kind of thing before (creating animations for log files). Sometimes you can just rely on your brain to find the interesting things when you present it in another way.

Conclusion

TICOSA was a great conference. There was a good line up of speakers and lots of interesting content to muse over. What would I like to see next year? I'd really like to hear more war stories. I'd love to hear stories of archaeological digs. I'd especially love to hear about restorations. My general impression is that very few code bases start a restore process and come out better at the end (usually you hear about the big rewrite and sometimes those fail too), but I'd love to hear otherwise!

I'm looking forward to getting back to work on Monday and scraping through the commit logs to see what I can uncover!