Thursday 13 December 2012

Technetium scans with mock objects

I've been doing a lot of work recently with legacy code.  I like Michael Feathers' definition of legacy code, namely that it's code that isn't under test.  This particular code goes further still, by not being under test and having incomprehensible dependencies.  I'm looking at something at the moment that takes 26 parameters as input (and has a million and one hidden dependencies).

With code like this, reasoning about it is really hard.  The particular area I struggle with is capturing all of the second-order dependencies: assumptions about the dependencies that aren't immediately apparent.  For example, if the Law of Demeter is violated (a.b.c), then I need to capture somehow the knowledge that a must return a non-null b.  Once I've got this information, it's a lot easier to act upon it.  (Perhaps I just introduce c on the interface of a, and then I've broken a dependency relationship.)

In medicine, a technetium scan is a technique where a radioactive isotope is injected into a patient, and this tracer allows you to visualize what's happening internally.  We can use mock objects to simulate this technique with code.  A mock object provides a simulated object that you can inspect to see how it is used.  It is the radioactive isotope object, injected in and observed.


I've found it super useful for capturing hidden behaviour in a system.  As a simple example, let's take a hideous class like this.

class HorribleMess {
    private Foo foo;
    private Bar bar;
    private Baz baz;

    HorribleMess(Foo foo, Bar bar, Baz baz) {
        this.foo = foo;
        this.bar = bar;
        this.baz = baz;
    }

    void DoHorribleThings() {
        var xyz = foo.getXYZ();
        bar(xyz.abc(), xyz.def());
    }
}

There's a slightly hidden dependency here. HorribleMess depends not only on Foo, but on the type returned by Foo.getXYZ(). Not only that, but it also depends on the results of methods invoked on those objects. In this example, this is dead simple to see, but in a real legacy code base finding this information is a real challenge.

This is where I think mock objects can help.  The way I've found useful is to create a unit test class and just try to instantiate the object, passing in mocks wherever possible.  If you can construct the object immediately, great: there are no constructor dependencies and you can start to explore how the methods work.  By using strict mocks you can force yourself to spell out the dependencies in the test (the test will fail unless you explicitly set the response of the mock object).

The pattern my tests often end up with is a piece of documentation about the dependencies of the class.  This is similar in spirit to the effect sketching advocated in Working Effectively with Legacy Code, but with strict mocks it has the advantage that it's more difficult to make a mistake.  It also serves as a living record of hidden dependencies for a particular class.

  class HorribleMessTest {

    // 1st order dependencies
    MockFoo mockFoo;
    MockBar mockBar;
    MockBaz mockBaz;

    // 2nd order dependencies
    MockXyz mockXyz;

    void Test() {
      // 1st order
      mockFoo.When(mockFoo.getXYZ()).Return(mockXyz);

      // 2nd order
      // setup on mockXyz

      new HorribleMess(mockFoo, mockBar, mockBaz);
    }
  }

In legacy code, I often find that there are three or more levels of dependencies.  Once these are explicitly spelt out, you have a trace of a particular execution path through the code and you can start to feel slightly more confident about changing it.
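To make this concrete, here's a runnable sketch of the technique in Python.  The names (Foo, Xyz, HorribleMess and friends) just mirror the pseudocode above and are entirely hypothetical; unittest.mock's spec= argument gives a "strict-ish" mock, in that touching an attribute the real type doesn't have raises an error, so the dependencies have to be spelt out.

```python
from unittest import mock

# Hand-written stand-ins for the real types (hypothetical names).
class Xyz:
    def abc(self): ...
    def defn(self): ...

class Foo:
    def get_xyz(self) -> Xyz: ...

class HorribleMess:
    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar

    def do_horrible_things(self):
        xyz = self.foo.get_xyz()
        self.bar(xyz.abc(), xyz.defn())

# 2nd-order dependency: the object returned by get_xyz().
mock_xyz = mock.Mock(spec=Xyz)
mock_xyz.abc.return_value = 1
mock_xyz.defn.return_value = 2

# 1st-order dependency: spec= makes unexpected attribute access fail.
mock_foo = mock.Mock(spec=Foo)
mock_foo.get_xyz.return_value = mock_xyz

mock_bar = mock.Mock()

HorribleMess(mock_foo, mock_bar).do_horrible_things()
mock_bar.assert_called_once_with(1, 2)
```

The final assertion doubles as documentation: it records exactly what flows out of the second-order dependency and into bar.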

The spelt-out dependencies often immediately suggest the refactoring needed to make the solution cleaner.  I've found Remove Middle Man to be a great first step in eliminating the multiple dependencies.

I do have some concerns whether this'll continue to be a useful technique in the future. Perhaps this will create too much baggage in the code base (in terms of tests needing to be kept up to date). Time will tell!

Sunday 9 December 2012

Code Retreat Cambridge

Yesterday I had the pleasure of attending a Global Day of Code Retreat event held at Red Gate, Cambridge hosted by the local software craftsmanship group.

The format of the day was to solve the same problem a number of times with different constraints each time.  Each problem was solved as a pair, generally using test-driven development.  At the end of each session, the code was thrown away.  Although this might seem a little strange, it meant you couldn't form an attachment to the code and could more easily indulge in the given constraints.

The problem used this time was Conway's Game of Life.  I think this is a great choice of problem: it's simple enough to be easily understood, but there's a really rich variety of possible approaches to solving it.

I alternated the sessions between C# and Haskell.  I found the value in pairing directly related to the asymmetry in ability.  The more asymmetry the better!  I found that showing Haskell to people unfamiliar with it was a rewarding experience.  People seemed to grok Haskell very quickly, and were impressed with the readability of the solution.

As a side note, I was really impressed with Mono.  I needed to get a C# development environment up and running on my Linux laptop, and I was slightly dreading doing so.  Turned out that was completely unfounded, it just worked.  The IDE lacks a few niceties (it's certainly not as easy as Visual Studio + Resharper), but it's not bad!

On the C# side, I learnt more about design than anything else.  In particular, one thing that stuck out for me was the difference between the top-down and bottom-up approaches.  Sometimes when I paired, my partner would aim to get a top-level test (evolving a grid produces a new grid, for example).  I found this really hard to drive from a TDD point of view; breaking down from the top is much harder for me.  The other way up, writing the rules first (a dead cell that has three neighbours becomes alive, for example), feels much more natural to me and is certainly a lot easier to drive with tests.  There did seem to be a middle ground that was more fun to work through: get the structure right with a top-level test, and then drive the functionality bottom-up.  This seems to be the same approach as advocated by Growing Object-Oriented Software, Guided by Tests and I look forward to trying it out in anger.
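Driving bottom-up means starting from a pure rule function.  A small Python sketch (names my own) of the kind of thing the rules-first pairs wrote first:

```python
def next_state(alive: bool, live_neighbours: int) -> bool:
    """One cell's fate under Conway's rules.

    A live cell survives with two or three live neighbours; a dead
    cell becomes alive with exactly three; everything else dies or
    stays dead.
    """
    if alive:
        return live_neighbours in (2, 3)
    return live_neighbours == 3
```

Each rule becomes one tiny test (next_state(True, 1) is False, and so on), which is why this end of the problem is so easy to drive with TDD.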

I didn't pair with anyone who even considered mutable objects.  I'm not sure whether this was just a self-selecting group of people, but everyone was firmly in the "mutable state is bad" camp.  I would really like to run with a set of constraints to do with high performance to see how that altered people's perceptions.

The constraints that were chosen were interesting.
  • No primitives (including enums)
  • No if statements
  • No mouse! (use the IDE)
  • No long methods
No mouse wasn't very interesting for me, I've always been a big fan of learning keyboard shortcuts (bashing keys always makes it look like you know what you are doing, so I've found it a good way to hide my incompetence sometimes!). 

The most interesting constraint for me was the no if statements (see the anti-if campaign for some justification). The lack of if statements forced you into generating object hierarchies and using polymorphism.  It was a little artificial for the problem but it did result in cleaner code (just maybe a bit more of it!).
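As a sketch of where the constraint pushes you, here's the single-cell rule with no if statements, in Python (the class names are mine; each state carries its own transition table, so the "decision" becomes polymorphic dispatch plus a lookup):

```python
class Alive:
    # A live cell survives with two or three live neighbours.
    def step(self, live_neighbours):
        return {2: Alive, 3: Alive}.get(live_neighbours, Dead)()

class Dead:
    # A dead cell becomes alive with exactly three live neighbours.
    def step(self, live_neighbours):
        return {3: Alive}.get(live_neighbours, Dead)()
```

It is a little artificial, exactly as the session suggested, but the branching logic disappears from the control flow and moves into the types.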

So now I need to try and take this back to my day job.  Always a bit harder trying to apply techniques learnt in the small to problems faced in the large.  Legacy code makes life harder!

Thursday 6 December 2012

Code Smells - The Lying Interface

I came across a code smell today that I hadn't seen or heard of before.  I've christened it the "Lying Interface".  At work, we've been trying to refactor some code into a shared service accessible over WCF.  To do this, we need to understand the interfaces and then develop an appropriate protocol to model them.

I've been faced with some interfaces like this:

public interface IPoller {   
     void Poll();
  }

This doesn't look too bad.  The Poll method presumably gets called every so often to poll something.  Looking at the constructor, though, shows something a bit hidden.

public class Poller : IPoller {
    private IPollListener m_Listener;

    public Poller(IPollListener listener) {
       m_Listener = listener;
    }

    public void Poll() {
      // run some code and generate some results
      m_Listener.OnPoll(results);
    }
  }

So despite the Poll method looking simple, it's deceptive.  It actually relies on a constructor argument being passed in and pushes the results onto it.  The interface is really lying: it's not showing the true responsibilities of the class.  Perhaps another name for this is "hidden dependency"?

My opinion is that the interface should be self-describing: it should say what the class does without saying how.  In this case, the hidden dependency on the IPollListener class is not expressed properly.  I ended up refactoring the code to be more self-describing, using an event instead; roughly, the new code ends up looking like this.

public interface IPoller {
     void Poll();

     event EventHandler<PollingEventArgs> Polled;
   }

   public class PollingEventArgs : EventArgs {}

It's still not perfect (without a comment I have no way of knowing from the method signatures when events are triggered), but I think it's better than it was.  This has the neat side effect of breaking Poller's dependency on IPollListener and switching over to an Observer pattern.
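The same shape in Python terms (Python has no event keyword, so a plain list of handlers stands in for the Polled event; the names mirror the C# sketch above):

```python
class Poller:
    """A poller whose dependency is visible on its surface: anyone
    interested subscribes, rather than being smuggled in through the
    constructor."""

    def __init__(self):
        self._polled_handlers = []

    def on_polled(self, handler):
        # The equivalent of subscribing to the Polled event.
        self._polled_handlers.append(handler)

    def poll(self):
        results = "results"  # stand-in for the real polling work
        for handler in self._polled_handlers:
            handler(results)
```

Nothing about construction forces a listener on you, and the notification mechanism is plain to see.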

I'm still not sure what the right name for this smell is, maybe it's more to do with hidden dependencies, but it's a start!

Saturday 6 October 2012

Circle CI and Haskell

This was going to be a long post on how to set up CI and Haskell.  Thankfully I found a simpler way.

One of the most essential parts of Continuous Integration is to have a self-testing automated build.

A CI server (such as Jenkins or TeamCity) aims to simplify this procedure.  Events on the version control system (such as a git push) trigger the CI server, which performs a build of the code and runs the automated test suite.  Unfortunately, in my experience, it's always a pain to set these servers up, especially if you aren't using a standard curly-brace language.

Circle CI promises one-click automated continuous integration for code pulled from your git repository.  It sounded too good to be true, and initially it was: there was no out-of-the-box support for Haskell.  However, my e-mail to support was quickly answered, and the latest version of the Haskell platform was installed and available a few days later (awesome support).

Once the Haskell platform was installed it took me literally 3 minutes to get a working build.


Each server that runs the build is a standard Ubuntu 12.04 box with the Haskell platform installed.  This makes it dead simple to install any extra libraries and then run the necessary cabal commands to configure and run the tests (I used the exitcode-stdio test-suite type to run mine).

Being able to setup CI in a few minutes is really useful and takes a large part of the drudgery away.  Awesome sauce.

I hadn't realized that GitHub had an API for hooks.  As someone who's really interested in dev tooling, there are a million and one useful ideas that could come out of it.  I'd love to see more tools like this (e.g. static analysis, security vulnerability scanning, fuzz testing, benchmarking, profiling, memory checking, code metrics and so on).

Tuesday 14 August 2012

NATO on Software Engineering

The NATO Science Committee report on Software Engineering (available here) is a fantastic read.

A study group was formed and given the task of assessing the entire field of computer science.  The name "Software Engineering" was deliberately chosen as being provocative, in implying the need for software manufacture to be based on the types of theoretical foundations and practical disciplines that are traditional in the established branches of engineering.

It was hoped in 1968 that the contents of the paper would be widely distributed, so that the necessities, shortcomings and trends identified could serve as a signpost for manufacturers of computers as well as their users.

Back in 1968, there were only about 10,000 installed computers, with the number increasing at 25-50% a year.  The rate of growth was viewed with alarm even then, with quotes such as:


Particularly alarming is the seemingly unavoidable fallibility of large software, since a malfunction in an advanced hardware-software system can be a matter of life and death.
We undoubtedly produce software by backward techniques 
Programming management will continue to deserve its current poor reputation for cost and schedule effectiveness until such time as a more complete understanding of the program design process is achieved. 
And my particular favourite quote about software development:

Today we tend to go on for years, with tremendous investments to find that the system, which was not well understood to start with, does not work as anticipated. We build systems like the Wright brothers built air planes — build the whole thing, push it off the cliff, let it crash, and start over again.

This embryonic stage of software engineering was an exciting time.  Engineering practices hadn't matured yet (not sure they have now!), but there was spirited discussion about the nature of software engineering.  The model of software engineering described is easily recognizable as the waterfall model.

Notice already the recognition that maintenance is about the same size as implementation and that implementation is an "error-prone translation process".

From the discussion about software engineering came many things that we see as important today.

The need for constant feedback about the system was emphasised many times.  Fraser describes the software development process in a way that sounds very much like agile software development:
Design and implementation proceeded in a number of stages. Each stage was typified by a period of intellectual activity followed by a period of program reconstruction. Each stage produced a usable product and the period between the end of one stage and the start of the next provided the operational experience upon which the next design was based. In general the products of successive stages approached the final design requirement; each stage included more facilities than the last. On three occasions major design changes were made but for the most part the changes were localised and could be described as ‘tuning’
The emphasis on working features after each iteration is something we (as a profession) still struggle with today.  Breaking big features down into bite-sized stories that can be completed successfully is a hard problem.

Nowadays we seem to make less distinction between design and implementation.  Techniques such as TDD (coined in 2002? also known as test-driven design) try to ensure good design (or at least good enough) by driving the design through the tests.  Dijkstra probably wouldn't TDD, but he would argue that tests (or more likely formal proofs!)  are a vital part of the design process:



... I am convinced that the quality of the product can never be established afterwards. Whether the correctness of a piece of software can be guaranteed or not depends greatly on the structure of the thing made. This means that the ability to convince users, or yourself, that the product is good, is closely intertwined with the design process itself.
Perlis summed up the process of software development rather well:
1. A software system can best be designed if the testing is interlaced with the designing instead of being used after the design.
2. A simulation which matches the requirements contains the control which organizes the design of the system.
3. Through successive repetitions of this process of interlaced testing and design the model ultimately becomes the software system itself. I think that it is the key of the approach that has been suggested, that there is no such question as testing things after the fact with simulation models, but that in effect the testing and the replacement of simulations with modules that are deeper and more detailed goes on with the simulation model controlling, as it were, the place and order in which these things are done. 
So how was software design knowledge shared back then?  There was some talk about sharing design decisions (even the wrong ones, to avoid having them repeated).  Naur was already aware of the need for software patterns to be established:
… software designers are in a similar position to architects and civil engineers, particularly those concerned with the design of large heterogeneous constructions, such as towns and industrial plants. It therefore seems natural that we should turn to these subjects for ideas about how to attack the design problem. As one single example of such a source of ideas I would like to mention: Christopher Alexander: Notes on the Synthesis of Form (Harvard Univ. Press, 1964)


It's interesting to read the discussion on high-level languages.  There was overwhelming agreement that high-level languages are a good thing, but at the time I think the implementations and tools available restricted their adoption.  Thankfully, this problem is fixed now (it's difficult to find people arguing against managed languages).

I found this paper an awesome read and recommend everyone read through it.  The cynic in me enjoyed seeing that the wonderful best practices advocated today have been around for years; it's just that they didn't have catchy buzzwords back then.  What I find exciting is that software engineering still hasn't found the silver bullet that improves software development by an order of magnitude.  Software engineering is going to be a challenge for the next 44 years.

I'd really like to see a version 2 of this paper; how has software engineering changed in the last 44 years?  Are there any collated experience reports?

Monday 9 July 2012

On Types and Tests

There's been a couple of posts recently (Confession of a Haskell Hacker and Unit Testing isn't enough) about types and tests. I thought I'd write down some of my half-baked thoughts about how I think about types and tests.

When I'm writing code, I tend to write functions in a functional style.  Given some arguments, produce some output.  Given the same arguments, get the same answer.  This isn't possible everywhere in most applications, but I've found it to be applicable most of the time.  Given such a function, I now need to test it.  I like to think of each function as having a shape.

Let's consider a simple function that scores a Poker hand. Our first signature for this is:

// And so on.
  const int HEARTS = 1;
  const int KING = 10;

  int ScoreHand(PairOfInts[] cards) {}

If I was to visualize this function, I'd imagine something like this:


The set of possible inputs to this function is HUGE.  It's defined over every single possible array of pairs of integers.  The only interesting area is the tiny red dot in the middle (not to scale) that represents the tiny portion of the input space for which it's possible to return a valid value.  As is already clear, this isn't a good way to write a function: you can pass almost anything in, and the chances of getting a sensible response are slim.  The testing burden on this function is huge, whereas the type "burden" on the function is very low.

Let's try and improve things a bit.

   enum Suit { Hearts, Diamonds, Clubs, Spades };
   int ScoreHand(PairOfSuitAndAnInt[] cards) { /* reams of code */}   

This is better (the size of the red circle has increased a tiny amount), but there's still a huge range of inputs that it can't (sensibly) produce a value for.  This is known as a partial function.  We have to test slightly less (no need to test that a Suit is valid), but we still have to consider a huge range of other cases (what if there are six cards?).

What we're after is aligning the function so that for every value we can calculate a suitable score.

   enum Suit { Hearts, Diamonds, Clubs, Spades };
   enum Value { One, Two, Three, Four, Five, Six,
                 Seven, Eight, Nine, Ten,
                 Jack, Queen, King, Ace };

   class Card {
     public final Suit suit;
     public final Value value;
     // And an appropriate constructor
   }

   int ScoreHand(Card a, Card b, Card c, Card d, Card e) { /* reams of code */}


Now I'm closer to a total function.  It guarantees (via the type system) to produce a sensible value for each and every input that's passed to it.  (I'm just going to willfully ignore that Java/C# allows null values.  Let's pretend it doesn't!)  The big hole we have is that we don't check the uniqueness of the cards (e.g. I could ask for the score of five aces!).
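A rough Python rendering of the constrained version (the scoring is a hypothetical stand-in that just sums card values, which is enough to make the function total over its input type):

```python
from dataclasses import dataclass
from enum import Enum

Suit = Enum("Suit", "HEARTS DIAMONDS CLUBS SPADES")
# start=2 makes TWO=2 up through ACE=14 (ace high).
Value = Enum("Value", "TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE TEN "
                      "JACK QUEEN KING ACE", start=2)

@dataclass(frozen=True)
class Card:
    suit: Suit
    value: Value

def score_hand(a: Card, b: Card, c: Card, d: Card, e: Card) -> int:
    # Hypothetical scoring: every possible five-Card input gets an
    # answer, so the function is total (uniqueness aside!).
    return sum(card.value.value for card in (a, b, c, d, e))
```

The types now do the checking that the earlier versions had to do with tests: you simply can't hand this function six cards or an invalid suit.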

If I visualize the function now, it's much (OK, that's an exaggeration) more interesting.  Red defines the area for which the function produces a sensible value, and the area around the outside represents the inputs for which we don't provide a sensible output!



Now I can go about testing this in a much nicer way. Traditional unit testing is existential quantification. Given this set of values, I expect this kind of output. It gives some assurance for some values. I'll need some of these tests just as a sanity check.

Property-based testing extends this.  In property-based testing I can say things like: given any input of the form needed by the function, the following holds.  For example, I could write a test that said: given 5 unique cards, if I compare them (via the output of the ScoreHand function) I'll lose against a Royal Flush (unless I also have a Royal Flush).  If you have a total function, property-based testing allows universal quantification: you can specify an invariant that holds for all values of input.
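A hand-rolled sketch of the idea (real property-based tools like QuickCheck generate the inputs for you; this score_hand is a hypothetical stand-in that simply sums card values):

```python
import random

def score_hand(cards):
    # Hypothetical stand-in for ScoreHand: sum of the card values.
    return sum(value for _suit, value in cards)

# A full deck: 4 suits x values 2..14 (ace high).
DECK = [(suit, value) for suit in range(4) for value in range(2, 15)]

# Property: a hand's score never depends on the order of the cards.
for _ in range(100):
    hand = random.sample(DECK, 5)
    shuffled = random.sample(hand, 5)
    assert score_hand(hand) == score_hand(shuffled)
```

One property like this covers a whole family of example-based tests, which is exactly the universal-quantification point.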

You should use your type system (if you have one) to constrain the inputs to your functions as much as possible.  The more constrained the types, the less you have to test (you still need both!).  If you can define a total function, and a set of invariants that hold for all values, then you have an incredibly powerful test suite.



Friday 29 June 2012

Sensible Not Censorship

The Government has recently started a consultation on Parental Internet Controls. I strongly believe that this is completely the wrong way to address the issue.

The Internet doesn't present any more of a threat to children than the newsagent round the corner, or your older brother's collection of pornographic magazines.  The idea that we can solve the problem of children accessing "unsuitable" content by technical means is laughable.

One of the ideas in the proposal is "active choice":

Active Choice: customers are presented with an unavoidable choice or series of choices through which they consciously choose whether or not they want filters and blocks installed on their internet service or internet-enabled device

This is a poisonous idea that must be stopped! I don't want every ISP in the country to have a record that "Jeff likes his internet unfiltered, but Joe doesn't". The idea of a Government appointed set of goggles on the Internet is not attractive in the slightest!

How's it going to work? Let's imagine that little Joe wants to see some pornography. His parents have opted for filtering, so he's completely safe right? His first search for "breasts" returns no titillating content whatsoever so he gives up. Of course he doesn't! He starts to search for ways around the shield. What's next, do we block all information on proxy servers or Tor, or any other of the million and one ways that any technological solution can be by-passed?

The solution to this problem isn't technological.

It doesn't cost any money.

It doesn't need much resourcing.

It's very simple.

Let parents be responsible for what their children see online.

It's sensible not censorship.

Please read the proposal and give feedback before it's too late.

Sunday 29 April 2012

Thoughts from the final day of ACCU

The final day keynote was a series of lightning keynotes (apparently due to a missing speaker during the Icelandic Volcano problem a few years before and it now being an established tradition).

The opening keynote was by a Lisper, Didier Verna, and was about language obesity.  The central argument was a familiar one from a Lisp expert, but I still found it pretty compelling!  Natural languages have evolved from a few roots by adding new words, but the grammatical structure has remained mostly static (when was the last time a new grammatical structure was added to your language?).  In contrast, the rationale for most new language designs tends to involve taking the best bits from previous languages, and adding a sprinkle of new grammatical elements to simplify things.  For example, both C# and Java have evolved to have for-each loops with new syntax.  C++ is perhaps the best example of language bloat: C++11 is probably the most complicated language I have ever seen, with very unclear semantics.

Lisp doesn't suffer from this problem (it suffers from syntactic anorexia!), and the reason it doesn't is homoiconicity.  This is the property that code is data (and vice versa).  For example, in Lisp (+ 1 1) is both an expression that applies 1 and 1 to the + function and a list of three atoms.  This property gives Lisp the ability to add new syntax and structures without the need for a new language.  CLOS is perhaps the best example of this, adding an object system to the base language without changing the language specification whatsoever.  The loop macro is another good example (though I'm not a fan; I don't want to learn a big DSL for dealing with collections).  Someone will probably argue that Boost Spirit is a great example of how you can use template metaprogramming to do the same thing in C++, but that's a Turing-tarpit style argument!

Lisp will be around forever in one form or another simply because of this property.  I'm not sure whether the same can be said of curly-brace languages.  What horrific new structures will be needed in a decade when we're programming on mega-core machines?  I found the argument compelling, and perhaps in light of the previous keynote on the Requiem for C there's a chance that a new homoiconic language will emerge to deal with our multi-core future.

Charles Bailey talked about the "Important Art of Thinking".  Despite the fact that software should always be about thinking, we sometimes find ourselves developing on auto-pilot and thus end up in a mess.  It's always easier to start tapping on the keyboard than to think about the problem (working in Haskell inverts this for me!).

He mentioned the Dreyfus model of skill acquisition and tried to relate this to programming by peeling back the layers of abstraction on std::endl.

     
  • Novice - std::endl ends a line with a newline
  • Advanced Beginner - std::endl is a manipulator
  • Competent - std::endl can be viewed as a function operating on the stream
  • Proficient - std::endl is defined as a function template
  • Expert - std::endl is redundant when connected to a terminal because it will (by POSIX standards) guarantee at least line-buffering.  Just use '\n' already!


I do like the new C++ lambda syntax.  Being able to type [](){}() and have it mean nothing is an achievement by anyone's standards!  In summary, think before coding...

Next up was Emily Bache talking about "Geek Feminism", more specifically programmer geek feminism.  This was defined as the ability for women to influence the programming community (on merit - no one was asking for a free pass!).  As was evident at the conference, there aren't a lot of women in technology.  In the grand scheme of things, there aren't a lot of software engineers at all, so missing out on 50% of the population seems like a big problem!  Imposter syndrome was something I was familiar with (especially in my first programming gig, having come out of a research background).  It's the feeling that you think you are crap despite external evidence that you are good.  There was anecdotal evidence that women are more prone to suffer from this than men.  There was also a reference to the Wisdom of Crowds, which (rather obviously) states that gathering opinions from a bigger range of backgrounds leads to better results.  Software's gender imbalance stops us taking full advantage of the wisdom of crowds.

Finally, there was a great presentation from Jurgen Appelo on finding your motivation.  Jurgen presented CHAMPFROGS, an acronym for the various things that can motivate developers.

     
  • Curiosity
  • Honour
  • Acceptance
  • Mastery
  • Power
  • Freedom
  • Relatedness
  • Order
  • Goal
  • Status


The Moving Motivators game sounds like something useful to try to see if your needs and desires are aligned with the work you are actually doing.

Next I went to "Usable APIs in Practice" by Giovanni Asproni.  This tried to fill the gap between the principles that we have (Single Responsibility, Principle of Least Astonishment, Don't Repeat Yourself) and the code that we write.  APIs are the biggest asset (or liability) that a company has.  A bad API (even an internal one) limits your ability to change and can act as a productivity drain.  The following principles were presented.


     
  • Code from the user's perspective - one example of how to do this is not just to test your API, but also to test a program written using your API.
  • Naming - keep it simple, don't be cute, and use one word per concept.
  • Control to the caller - don't restrict options.  Previously I've worked somewhere where all the core container functions used their own locks.  This was a terrible decision because the caller didn't have control!
  • Explicit Context - don't use globals to hide context from the user; pass it in at construction time.
  • Error Reporting - one of the harder ones.  Use layers (since an error at one layer is not necessarily an error at the next).
  • Logging - this gave some good discussion on why Logger logger = LoggerFactory.GetLogger() is a bad thing!


I chose to leave it there at ACCU.  I really enjoyed most of the talks.  The key themes this year were multi-core is coming, TDD is good and functional is fun.  I suspect these have been messages from the expert community for quite some time, but hopefully the wider software engineering community will start to realize this!

Friday 27 April 2012

ACCU 2012 - day three

The opening keynote today was "Requiem for C" by (howling mad?) Uncle Bob.  First, the easy dead language beginning with C: COBOL.  It may still run 40% of the code in the world, but no new projects are being written in it and no new developers are being trained in it.  This is an easy one to agree on!

More contentious is that this is the fate that awaits C.  Bob was at pains to point out that he isn't gunning for C and gave us a recap of his programming career from punchcards on the PDP-8 through to his first usage of C in the late 1970's.  The presentation style was fun, though there was a little controversy.

The first nail in C's coffin was C++.  It gave us strong typing, modularity and the wonders of templates, but importantly it still gave developers the chance to see the metal.  If you worked hard enough, you could still reason about what was going on in the underlying machine.  You can still see the metal, though it's a little fuzzy.

Next Java showed us that we don't need to see the raw metal, we can have an idealised machine running on a machine.  You don't need to see the raw metal at all, have a look at the JVM instead!  Looking back, it does seem that the world had gone bonkers at this point.  Instead of running on a machine, we'll run on a software simulation of a machine, running on a machine.  Eh?

And now the final nail (and a familiar theme for the conference): the multi-core revolution.  We need radically new views of the hardware to take full advantage of it.

I agreed with most of the keynote.  C is never going to fully go away, but it's going to be needed by a smaller and smaller percentage of people.  The presentation style was fun, but a little intense when you are nursing a hangover!

Next I went to Kevlin Henney's talk on "Functional Programming you already know".  There's been a lot of hype generally about functional programming, but we've been practicing these concepts for years anyway (I think you could replace functional programming with any of the patterns and practices that have been hyped and still be on safe ground, e.g. dependency injection).

There were some examples of first-class functions from the C language (e.g. qsort), function composition (UNIX pipes) and declarative programming (SQL).  There were some great sound bites too:

  • code generation - do the wrong thing faster!
  • lock is the anti-fast pattern
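The qsort point translates even to pre-lambda Java: a Comparator object passed to Collections.sort plays the same role as the comparison function pointer handed to C's qsort.  A small sketch (the example data is my own):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortDemo {
    // The Comparator object is the behaviour being passed around,
    // playing the same role as the compare function pointer given to qsort.
    static void sortByLength(List<String> words) {
        Collections.sort(words, new Comparator<String>() {
            public int compare(String a, String b) {
                return a.length() - b.length();  // shortest first
            }
        });
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<String>(Arrays.asList("pear", "fig", "banana"));
        sortByLength(words);
        System.out.println(words);  // [fig, pear, banana]
    }
}
```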

There was lots of historical content too.  It's always surprising to read notes from the 1970s and find that they still sound fresh and new today.  For example, Alan Kay's explanation of what object orientation really means.

After this I went to a C++ talk about generic libraries by Andrew Sutton.  Firstly I found that he has the perfect job, researching how to make the perfect data structure and algorithms library.  Unfortunately he has to use C++!  The Origin libraries are the playground for this experimentation.

I learnt a lot of new things about C++ concepts.  A concept is both the syntactic meaning of a type (e.g. the operations it supports) and the semantic meaning (the behaviour of those operations).  I was very interested in the axiom part of concepts, which provides a series of relationships that any implementation of the concept must obey.  For example, you can specify that the equality concept should be transitive, reflexive and symmetric.  I will be interested to see how verifiability evolves in C++, as it certainly doesn't seem like a natural extension for a language as wart-ridden as C++.

Then as a C# / Java / Haskell programmer, it was clear that my next session should be the C++ 11 pub quiz.  This was a well run session about just how complicated C++ is.  Even with compiler writers and language specification contributors in the room, no one managed to understand all the code.  Compiler bugs were found and changes in behaviour over different versions were also observed.  C++ might still be close to the metal, but I don't think there is a person on earth with the necessary mental acuity to see through the complexity!

ACCU day two thoughts

This is my second attempt to write these notes.  Turns out that writing in an online editor whilst connected to hotel wi-fi is a very bad idea!

Today's keynote was given by Phil Nash, and was entitled "The Congruent Programmer".  The message seemed to be that we live our lives according to a set of beliefs, values and principles.  If these are aligned then we are fine, otherwise we find ourselves in a state of cognitive dissonance.  One example I have for this is that I believe in TDD (at least to some degree!), but I very rarely find myself practicing it!

Next I went to a talk on "Go, D, C++ and the multicore revolution" by Russel Winder.  The aims of this talk were to convince me that shared memory multi-threading is not appropriate for application development (I don't need much convincing on this point!) and also that the new threading related features of C++ 11 may have saved it from the dustbin.

The multicore revolution is already here.  I'm writing this on my dual core iPad and chances are you (hi mum!) are reading it on a multicore machine of some description.  Since the multicore revolution, software has started to lag behind hardware.  As software engineers, we're the people holding things up!

The distinction was made between concurrency (a design tool) and parallelism (a tool for improving performance).  Three models of controlling concurrency were then presented.  None of them are new ideas, but they are well-understood tools that are only now beginning to be recognised as important.

The actor model controls concurrency by having multiple independent processes (important to note that this process need not be a separate OS process, it's just a lump of code with an independent address space) communicating via asynchronous messages.  Erlang is an example of a language that has direct support for the actor model.

The data flow model is a spreadsheet model where the computation reacts to changes in inputs.  This has a role in big data as a counterpart to the map/reduce approach.  By running a data flow net as a continuous query that updates based only on the diff, it's possible to get more immediate results.  Lucid is an old, but interesting, example of a data flow language.

Finally, Communicating Sequential Processes (CSP) was presented.  This is a formal model of concurrency where processes communicate via synchronous messages.  Go reinvented CSP and called it goroutines.
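To see how little machinery synchronous message passing needs, here's a rough Java approximation of an unbuffered CSP channel using SynchronousQueue, which blocks the sender until a receiver arrives (this is my own illustration, not from the talk):

```java
import java.util.concurrent.SynchronousQueue;

// Rough approximation of an unbuffered CSP channel in Java.  A
// SynchronousQueue has no capacity: put() blocks until another thread
// calls take(), so every message is a rendezvous between the two sides,
// much like sending on an unbuffered Go channel.
public class CspDemo {
    public static void main(String[] args) throws InterruptedException {
        final SynchronousQueue<String> channel = new SynchronousQueue<String>();

        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    channel.put("hello from the producer");
                } catch (InterruptedException ignored) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();

        // Blocks until the producer hands over the message.
        System.out.println(channel.take());
        producer.join();
    }
}
```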

Next Russel presented a simple example that calculated Pi.  The example was kept consistent as the languages moved to higher level abstractions, from C and pthreads to Go and D with C++ and MPI in between.  The obvious point was higher level abstractions make things easier to understand.  All in all, I enjoyed this talk!

Henrik Berglund ran a quick 45 minute session on "Real Teams".  I have to admit I initially chose this one purely because the door was open, so it was a break from the unbelievably hot rooms!  Luckily for me this turned out to be an excellent choice.  The introduction talked about the curious case of the astronauts that went on strike.  Despite being highly trained, militarily disciplined and very intelligent, a bunch of astronauts went on strike in space simply because the working conditions were so hard.  Building great teams is hard!

Getting the conditions right for a team is very important and Henrik outlined some of the things that he had seen with successful teams.
  • People need influence over their working conditions.  This can be achieved by using self managing teams that have clear authority.  Delegation poker might be one tool to try out for this.
  • Compelling direction.  A vision, a product, a sprint goal.  One of the more interesting points was that goals should be challenging and it was suggested that a 50% failure rate was optimum.  I definitely want to try this when I get back to work!
  • Clear boundaries.  Understand who is on the team.
  • Nobody succeeds unless everyone succeeds.  This avoids the done vs done done dilemma.  Problems should be given to teams not people.
  • Trustworthy feedback based on results.  No proxies between users and the teams.  The role of the product manager is to be the vision guy, not to shield customers from engineers.
  • Need to depend on each others skills.  It's OK to specialize and build up independent skills (got to be careful of the bus number!)
  • Be able to give negative feedback if needed.  This needs trust on the team and this takes time to build up.
  • Team building requires effort.  Understanding the needs and motivations of your colleagues is an important part of working together.
I enjoyed this talk a surprising amount and there are definitely some ideas I can take back to work.

After this talk I caught the tail-end of "When only C will do".  C is supported everywhere and has a standard application binary interface (ABI).  This ubiquity makes it an easy choice.  C++ doesn't have a standard ABI, and embedded devices may choose to drop some features (e.g. runtime type information or exceptions), which makes life harder.  The conclusion seemed to be that most of the time you can use C++, but you may need to expose a C interface to be compatible with the rest of the world.

The afternoon finished with a talk on "Refactoring to Functional".  I don't think I was the target audience for this presentation.  The talk introduced some foundational functional concepts, but through the medium of Google's Guava library for Java (which, curiously, is missing a fold/reduce operator).  This has some of the worst syntax I have ever seen.  The noise-to-useful-code ratio is stupidly high and it just made me so grateful for C#.  Yuck!

They chose this style because it made the core logic simpler (and presumably their brains eventually learned to unsee the syntactic abominations).  They did try Scala, but found that you had to think at both the Scala and Java layers.  I think using the functional style in a language without first-class functions is a mistake.  It's like trying to write C in an object-oriented style.  You can do it, but you end up constantly fighting it, and that's not good.  Maybe Java 8 will resolve these problems in a few decades.

Wednesday 25 April 2012

ACCU 2012 - Day One

I'm lucky enough to be at ACCU 2012 this week, and I thought I'd try and write up my notes after each day so that I have more of a chance to digest the information being thrown at me!

The opening keynote today was entitled "Project Patterns: From Adrenalin Junkies to Template Zombies" by Tim Lister of PeopleWare fame. Tim was keen to point out that this wasn't about patterns in the strong Patterns sense, but more about some of the habits from teams he'd worked with.

The first pattern mentioned was "The Safety Valve".  Successful teams often have a release mechanism, be it making popcorn, riding a pink tricycle (!) or playing foosball.  This pattern rang true.  In good projects I've been involved in I can always identify a safety valve.  The converse is also true: when working on shitty projects it's mostly just been heads down, stress up, writing code until it's done.

The second pattern was mañana.  This describes the problem with long deadlines.  A lack of urgency means apathy.  If something is outside of my mañana window, then I don't really care about it.  As an engineer, my mañana window is about a sprint in length.  If it's not needed for this sprint, then it's off my radar.  In contrast, management types need to have a much longer mañana period.

Next up was "Lessons unlearnt", which made the controversial point that the lessons learnt in a retrospective rarely trigger change in the organisation; they only cause change in those that encountered the problems in the first place.  This was backed up by the observation that long-running software companies are no better at making software than those with little history.

"Project Sluts" or managers that just can't say no was another pattern that struck a chord. I suspect everyone has experienced this one!

Finally there was the "Dead Fish Pattern".  It's a project where everyone knows it is doomed from day one, but nobody says anything.  I've direct experience of this one (it being the major reason why I left a previous job).  We all knew the project was doomed, but the politics of the situation meant this could never be voiced, and the team just hunkered down into a marine corps death march mentality (see Death March by Edward Yourdon, a great book!).  There was a great quote about the heroics of developers keeping such projects running through the use of "code defibrillators".

I really enjoyed the talk, and I think I was successfully not-so-subliminally convinced to buy the book!

Next up was the Code Simplicity talk (I was actively trying to avoid the Goldberg contraption that is C++ 11).  It was a good talk, using examples from the Qt framework to demonstrate complicated code.

Make it simple. Make it memorable. Make it inviting to look at. Make it fun to read

The biggest takeaway for me was looking at code from the perspective of its consumers.  There's no point designing a clever, simple API for yourself if you hoist all the complexity onto the users of the API (for example, by requiring implementation of a huge interface).  I also learnt a new useless acronym, TATUC (throw away that useless code).  Someone from the audience also gave a memorable composition-over-inheritance quote: "if you marry you can get divorced, but parents are forever".

The speaker also tried to encourage the boy scout rule.  Since bad code is easy to spot, you should take every opportunity you can to tidy it up.  This is much too idealistic for me; tidying code is always a risk (most of the time it won't have tests) and it's all too easy to turn a messier working code base into a prettier, but subtly wrong, code base.

A bit of lunch and then onto Parallel Architectures.  This turned out to be an interesting review of parallel architectures, first from the hardware side and then from the software side.  In the beginning there was the Z80, and life was simple.  Then there was pipelining.

Pipelining is a leaky abstraction. Instructions can be reordered by the compiler/hardware and this can break your code in subtle, non-obvious ways. The best example of this is probably double checked locking. Cache coherency is another problematic abstraction. If you write something to cache, and this needs to be available to another core then the memory management unit (MMU) must flush the cache to make that data visible. This causes a huge performance problem (see here for a great example).
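The double-checked locking example deserves spelling out.  In Java, without volatile, the write publishing the instance can be reordered with the writes inside the constructor, so another thread can observe a half-built object.  Here's a sketch of the repaired (Java 5 and later) idiom, with a made-up singleton for illustration:

```java
public class LazySingleton {
    // Without volatile, the write to 'instance' could be reordered with
    // the writes inside the constructor, letting another thread observe
    // a partially constructed object.  volatile forbids that reordering
    // under the Java 5+ memory model.
    private static volatile LazySingleton instance;
    private final int value;

    private LazySingleton() {
        this.value = 42;
    }

    public static LazySingleton getInstance() {
        LazySingleton result = instance;      // first (unsynchronized) check
        if (result == null) {
            synchronized (LazySingleton.class) {
                result = instance;            // second check, under the lock
                if (result == null) {
                    instance = result = new LazySingleton();
                }
            }
        }
        return result;
    }

    public int getValue() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(LazySingleton.getInstance().getValue());  // 42
    }
}
```

Even in its corrected form, the subtlety of this idiom is a good argument for reaching for a simpler tool (a plain synchronized accessor, or eager initialization) unless profiling says otherwise.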

Multithreading is just one damn thing, after, before and during another (Alexandrescu)

On the software side of things, there was recognition that both mutex and atomic based approaches aren't suitable long term.  Both are non-composable, and (apparently) there's evidence that mutex based approaches simply won't scale to large numbers of cores.  Transactional memory was mentioned, though it's slow.  Importantly though, transactional memory is composable and is simple to reason about.  The Intel Haswell processor will provide hardware support, so it'll be interesting to see whether this provides an alternative to the actor models that seem to be in vogue at the moment.  I'm looking forward to seeing how things change with concurrency over the next few years (I hope for a revolution, but I bet we just find more intricate ways of just about making things work).

Finally, I went to the "Objections on TDD and their refutations".  This covered the many excuses that developers come up with for not doing TDD, ranging from the cop-out (my manager wouldn't let me) through to the delusional (I don't make mistakes).  My feelings on this are mixed.  Code without tests is just a lump of syntactically valid code, but not much more.  Tests give substance.  I'm not so sure whether unit tests are the best way to do this.  There was a quote on Twitter today which stated that unit tests provide existential quantification whereas types provide universal quantification.  I dig that!

TDD also seems to be used inappropriately for hard problems.  In this case, it feels like simulated annealing; it seeks out a local solution to the problem, rather than finding a global optimum.  My favourite example of this is Sudoku.  Compare Ron Jeffries' epic adventures with Peter Norvig's application of brain power.

Looking forward to day two.