Monday, 18 January 2016

Evolutionary Design Reading List

Evolving a shared library with an API in flux is a tough problem, but there are plenty of principles, practices and patterns that can help.

Parallel Change - Parallel change, also known as expand and contract, is a pattern for implementing backward-incompatible changes to an interface safely by breaking the change into three distinct phases: expand, migrate, and contract.

Postel’s Principle - Be conservative in what you send; be liberal in what you accept.

Refactoring Module Dependencies - Some patterns for refactoring module dependencies

Package Management Principles - Principles of packages (REP, CCP, CRP, ADP, SDP, SAP).  Think of these as a higher-level version of SOLID.

Strangler Application - A metaphor describing growing a new system around the edges of the old one.

Asset Capture - A strategy for migrating between a strangler application and back again.

On the Criteria to be used in Decomposing Systems into Modules - Parnas’ classic paper on modular systems (referenced by Tim in his recent talk).

Escape Integration Test Syrup - Talk from Agile on the Beach about testing and rapidly changing dependencies.

Semantic Versioning - For completeness!
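To make the expand / migrate / contract phases of Parallel Change a little more concrete, here is a minimal Python sketch. The Grid and Coordinate classes and their method names are hypothetical, invented purely for illustration; they don't come from any of the linked articles.

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class Coordinate:
    """Hypothetical value object that replaces a loose (x, y) pair."""
    x: int
    y: int


class Grid:
    """Hypothetical library class whose add_cell signature we want to change."""

    def __init__(self):
        self._cells = {}

    # Expand: add the new method alongside the old one.
    def add_cell_at(self, coordinate, cell):
        self._cells[coordinate] = cell

    # Migrate: the old method delegates to the new one and warns,
    # so callers can move across one at a time.
    def add_cell(self, x, y, cell):
        warnings.warn(
            "add_cell(x, y, cell) is deprecated; use add_cell_at(coordinate, cell)",
            DeprecationWarning,
            stacklevel=2,
        )
        self.add_cell_at(Coordinate(x, y), cell)

    def cell_at(self, coordinate):
        return self._cells.get(coordinate)

    # Contract: once nothing calls add_cell any more, delete it in a later release.
```

Old and new callers keep working side by side during the migrate phase, which is what makes the final removal safe.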

Monday, 9 November 2015

The (very) Basics of R with the Game of Life

R is a programming language for statistical computing and graphics.  It's also the language of choice amongst pirates.  Arrr! R is increasingly important for big data analysis, and both Oracle and Microsoft have recently announced support for database analytics using R.

So, how do you get started with R?  Well, for the rest of this post I'm going to assume that you already know how to program in a { } language like Java / C#, and I'm going to cover the minimum amount possible to do something vaguely useful. The first step is to download the environment.  You can get this from here.  Once you've got it downloaded and installed you should be able to bring up a terminal and start R.  I really like the built-in demos.  Bring up a list of them with demo() and type demo(graphics) to get an idea of the capabilities of R.

These are the boring syntax bits:
  • R is a case sensitive language
  • Comments start with # and run to the end of the line
  • Functions are called with parentheses e.g. f(x,y,z)
The "standard library" of R is called the R Base Package. When you bring up R, you bring up a workspace.  A workspace is just what is in scope at any one time.  You can examine the workspace by calling the ls function.

    # Initially my workspace is empty
    > ls()
    character(0)

    # Now I set a value and lo-and-behold, it's in my workspace
    > x = "banana"
    > ls()
    [1] "x"

    # I can save my workspace with save.image
    > save.image(file="~/foo.RData")

    # I can load my workspace with load
    > load("~/foo.RData")

    # I can remove elements from the workspace with rm
    > rm(x)
    > ls()
    character(0)

    # I can nuke my workspace with rm(list=ls())
    > x = 'banana'
    > rm(list=ls())
    > ls()
    character(0)

We've seen above that R supports string data, but it also supports vectors, lists, arrays, matrices, tables and data frames. To define a vector you use the c function.  For example:

    > x = c(1,2,3,4,5)
    > x
    [1] 1 2 3 4 5

    > length(x)
    [1] 5

Remember everything in a vector must be of the same type.  Elements are coerced to a common type, so c(1,'1',TRUE) results in a character (string) vector.  Indexing into vectors starts at 1 (not zero).  You can use Python-style range selection, although in R both ends of the range are included:

    > x = c(1,2,3,4,5,6,7,8,9,10)
    > x[7:10] # select 7 thru 10
    [1]  7  8  9 10
    > x[-(1:3)] # - does exclusion
    [1]  4  5  6  7  8  9 10

To define a list, you use, ahem, list.  Items in a list can be named components (see the rules of variable naming).

    > y = list(name="Fred", lastname="Bloggs", age=21)
    > y
    $name
    [1] "Fred"

    $lastname
    [1] "Bloggs"

    $age
    [1] 21

    > y$name # access the name component
    [1] "Fred"

Finally, let's look at matrices.  You construct them with matrix and pass in a vector to construct from, together with the size.

    > m = matrix( c(1,2,3,4,5,6,7,8,9), nrow=3, ncol=3)
    > m
         [,1] [,2] [,3]
    [1,]    1    4    7
    [2,]    2    5    8
    [3,]    3    6    9

OK, that should be enough boring information out of the way to let me write a function for the Game of Life.  All I want to do is take a matrix in, apply the update rules, and return a new one.  How hard can that be? You write R files with the extension ".R" and bring them into your workspace with the source function. Here's an embarrassingly poor go at the Game of Life (note I've only spent 5 minutes with the language, so if you've got any improvements to suggest or more idiomatic ways of doing the same thing, they'd be gratefully received!).

Testing this at the REPL with a simple pattern.

    > blinker = matrix(0,5,5)
    > blinker[3,2:4]=1
    > blinker
         [,1] [,2] [,3] [,4] [,5]
    [1,]    0    0    0    0    0
    [2,]    0    0    0    0    0
    [3,]    0    1    1    1    0
    [4,]    0    0    0    0    0
    [5,]    0    0    0    0    0
    > nextStep(blinker)
         [,1] [,2] [,3] [,4] [,5]
    [1,]    0    0    0    0    0
    [2,]    0    0    1    0    0
    [3,]    0    0    1    0    0
    [4,]    0    0    1    0    0
    [5,]    0    0    0    0    0

Huzzah! Next steps are probably to write some unit tests [PDF] around it, but learning how to install packages can wait till another day!

Saturday, 17 October 2015

Mindless TDD

In this post, I look at a common misinterpretation of TDD I often see at coding dojos/katas.

A quick refresher - TDD is simple, but not easy.

  1. Write a failing test
  2. Write the minimum code to make the test pass
  3. Refactor

You have to think at each step.  This is often overlooked, and TDD is portrayed as a series of mindless steps (if that were true, developers probably wouldn't get paid so well!).

Let's take the Bowling Kata as an example of how easy it is to fall into mindless steps.  What's the right test to do first?

    [Test]
    public void PinExists() {
      Assert.That(new Pin().IsStanding, Is.True);
    }

We're bound to need a Pin class, right?  And we should definitely check whether it's standing up or falling down.  We continue in this vein, and create a Pin that can be knocked down and stood up.  Everything proceeds swimmingly. 15-20 minutes have elapsed and we have a unit-tested equivalent of bool that's no use to anyone.

I've seen similar anti-patterns in the Game of Life kata.  We write some tests for a cell, and the rules (three live neighbours means alive and so on).  We use this to drive a Cell class and we add methods to change the state depending on the number of neighbours.  Some time passes, and then we realize we actually want a grid of these objects, and that they change state based on their neighbours' state, and we're in a bit of a pickle.  Our first guess at an implementation has given us a big problem.

If we somehow manage to solve the problem from this state, we end up with a load of tests that are coupled to the implementation.  Worse, because we've ended up creating lots of classes, we start prematurely applying SOLID, breaking down things into even more waffly collections of useless objects with no coherent basis.  Unsurprisingly, it's difficult to see the value in test-driven development when practiced like this.

So what's the common problem in both these cases?

Slide 9 of Uncle Bob's Bowling Kata PPT describes a similar problem, but attributes it to over-design and suggests TDD as the solution.  I agree, but I think some people pervert TDD to mean test-driven development of "my supposed solution", rather than TDD of the problem itself.

The common problem is simple.  Not starting with the end in mind!

If we'd started the Bowling Kata from the outside in, our first test might have simply bowled 10 gutter balls and verified that we return zero.  We could already ship this to (really) terrible bowlers and it'd work!

Maybe next we could ensure that if we didn't bowl any spares/strikes it'd sum the scores up.  Again, now we can ship this to a wider audience.  Next up, let's solve spares, then strikes and at each stage we can ship!
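As a rough sketch of those first two outside-in steps, written in Python rather than C#: the score function and the test names are my own hypothetical choices, and I've assumed a whole game of gutter balls means 20 rolls (two per frame over ten frames). Note there's no Pin, Frame or Game class in sight yet.

```python
def score(rolls):
    """Total a game in which no frame is a spare or a strike."""
    # Summing the pins is the minimum code that satisfies both tests below.
    return sum(rolls)


def test_gutter_game_scores_zero():
    # Twenty rolls that all miss: the simplest complete game.
    assert score([0] * 20) == 0


def test_open_frames_sum_their_pins():
    # Every frame knocks down 3 then 4 pins, so there are no spares or strikes.
    assert score([3, 4] * 10) == 70


if __name__ == "__main__":
    test_gutter_game_scores_zero()
    test_open_frames_sum_their_pins()
```

The next failing test would bowl a spare, and only at that point does the implementation need to learn anything about frames.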

Each time around the TDD loop we should have solved more of the problem and be closer to fully solving it.  TDD should be continuous delivery: if the first test isn't solving the problem for a simple case, it's probably not the right test.

Similarly for the Game of Life, instead of starting from a supposed solution of a cell class, what happens if your first test is just evolving a grid full of dead cells?  What happens if we add the rules one at a time?  You can ship every test once you've added the boilerplate of the "null" case.
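Here's one way that might look, sketched in Python. The next_step function, the list-of-lists grid of 0s and 1s, and the decision to treat everything beyond the edge as dead are all assumptions of mine. The tests are listed in the order you might write them, starting with the "null" case, and the function is where you could plausibly end up after the last one.

```python
def next_step(grid):
    """Return the next generation of a grid of 0 (dead) / 1 (alive) cells."""
    rows, cols = len(grid), len(grid[0])

    def live_neighbours(r, c):
        # Cells beyond the edge of the grid count as dead (an assumption).
        return sum(
            grid[r + dr][c + dc]
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
            and 0 <= r + dr < rows
            and 0 <= c + dc < cols
        )

    def next_state(r, c):
        n = live_neighbours(r, c)
        # A dead cell is born with 3 neighbours; a live one survives with 2 or 3.
        return 1 if n == 3 or (grid[r][c] == 1 and n == 2) else 0

    return [[next_state(r, c) for c in range(cols)] for r in range(rows)]


def test_dead_grid_stays_dead():
    dead = [[0] * 3 for _ in range(3)]
    assert next_step(dead) == dead


def test_lonely_cell_dies():
    grid = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    assert next_step(grid) == [[0] * 3 for _ in range(3)]


def test_blinker_flips_from_horizontal_to_vertical():
    horizontal = [[0] * 5 for _ in range(5)]
    horizontal[2][1:4] = [1, 1, 1]
    vertical = [[0] * 5 for _ in range(5)]
    for r in (1, 2, 3):
        vertical[r][2] = 1
    assert next_step(horizontal) == vertical
```

Each test talks only about grids going in and grids coming out, so none of them is coupled to whether a Cell class ever exists.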

TDD isn't about testing your possible implementation on the way to solving the problem; it's about writing relevant tests first and driving the implementation from that.  Start from the problem!

TDD done right is vicious - it's a series of surgical strikes (tests) aimed at getting you to solve the problem with the minimum amount of code possible.