Monday, 9 November 2015

The (very) Basics of R with the Game of Life

R is a programming language for statistical computing and graphics.  It's also the language of choice amongst pirates.  Arrr! R is increasingly important for big data analysis, and both Oracle and Microsoft have recently announced support for database analytics using R.

So, how do you get started with R?  Well, for the rest of this post I'm going to assume that you already know how to program in a { } language like Java / C#, and I'm going to cover the minimum amount possible to do something vaguely useful. The first step is to download the environment.  You can get this from here.  Once you've got something downloaded and installed you should be able to bring up a terminal and start R.  I really like the built-in demos.  Bring up a list of them with demo() and type demo(graphics) to get an idea of the capabilities of R.

These are the boring syntax bits:
  • R is a case sensitive language
  • Comments start with # and run to the end of the line
  • Functions are called with parentheses e.g. f(x,y,z)
The "standard library" of R is called the R Base Package. When you bring up R, you bring up a workspace.  A workspace is just what is in scope at any one time.  You can examine the workspace by calling the ls function.

    # Initially my workspace is empty
    > ls()
    character(0)

    # Now I set a value and lo-and-behold, it's in my workspace
    > x = "banana"
    > ls()
    [1] "x"

    # I can save my workspace with
    > save.image(file="~/foo.RData")

    # I can load my workspace with
    > load("~/foo.RData")

    # I can remove elements from the workspace with rm
    > rm(x)
    > ls()
    character(0)

    # I can nuke my workspace with rm(list=ls())
    > x = 'banana'
    > rm(list=ls())
    > ls()
    character(0)

We've seen above that R supports string data, but it also supports vectors, lists, arrays, matrices, tables and data frames. To define a vector you use the c function.  For example:

    > x = c(1,2,3,4,5)
    > x
    [1] 1 2 3 4 5

    > length(x)
    [1] 5

Remember everything in a vector must be of the same type.  Elements are coerced to the same type, so c(1,'1',TRUE) results in a vector of strings.  Indexing into vectors starts at 1 (not zero).  You can use Python-style list selection:

    > x = c(1,2,3,4,5,6,7,8,9,10)
    > x[7:10] # select 7 through 10
    [1]  7  8  9 10
    > x[-(1:3)] # - does exclusion
    [1]  4  5  6  7  8  9 10
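The coercion rule is easy to check for yourself.  Here's a small sketch (the variable name mixed is just something I've made up for illustration) showing that mixing a number, a string and a logical leaves you with a vector of strings:

```r
# Mixing a number, a string and a logical coerces everything to character
mixed <- c(1, '1', TRUE)
print(mixed)         # the three elements are now "1", "1" and "TRUE"
print(class(mixed))  # "character"

# Indexing starts at 1, so this is the first element
print(mixed[1])
```

The class function reports the type of the vector as a whole, which is a quick way to spot an accidental coercion.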

To define a list, you use, ahem, list.  Items in a list can be named components (see the rules of variable naming).

    > y = list(name="Fred", lastname="Bloggs", age=21)
    > y
    $name
    [1] "Fred"

    $lastname
    [1] "Bloggs"

    $age
    [1] 21

    > y$name # access the name component
    [1] "Fred"

Finally, let's look at matrices.  You construct them with matrix and pass in a vector to construct from, together with the size.

    > m = matrix( c(1,2,3,4,5,6,7,8,9), nrow=3, ncol=3)
    > m
         [,1] [,2] [,3]
    [1,]    1    4    7
    [2,]    2    5    8
    [3,]    3    6    9

OK, that should be enough boring information out of the way to let me write a function for the Game of Life.  All I want to do is take a matrix in, apply the update rules, and return a new one.  How hard can that be? You write R files with the extension ".R" and bring them into your workspace with the source function. Here's an embarrassingly poor go at the Game of Life (note I've only spent 5 minutes with the language, so if you've got any improvements to suggest or more idiomatic ways of doing the same thing, they'd be gratefully received!).
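The listing itself isn't reproduced here, so what follows is a minimal sketch of what such a function could look like rather than the original code.  It assumes live cells are 1, dead cells are 0, and anything off the edge of the matrix counts as dead.  The helper name countNeighbours is my own invention; nextStep is the name used at the REPL.

```r
# Count the live neighbours of cell (i, j).
# Assumption: cells beyond the edge of the matrix are treated as dead.
countNeighbours <- function(m, i, j) {
  rows <- max(1, i - 1):min(nrow(m), i + 1)
  cols <- max(1, j - 1):min(ncol(m), j + 1)
  # Sum the surrounding block, then take the cell itself back out
  sum(m[rows, cols]) - m[i, j]
}

# Take a matrix of 0s and 1s and return the next generation
nextStep <- function(m) {
  result <- matrix(0, nrow(m), ncol(m))
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      n <- countNeighbours(m, i, j)
      alive <- m[i, j] == 1
      # Survive with 2 or 3 neighbours; be born with exactly 3
      if ((alive && (n == 2 || n == 3)) || (!alive && n == 3)) {
        result[i, j] <- 1
      }
    }
  }
  result
}
```

Looping over every cell is probably not the most idiomatic R (a vectorised version would be shorter), but it keeps the rules readable.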

Testing this at the REPL with a simple pattern:

    > blinker = matrix(0,5,5)
    > blinker[3,2:4]=1
    > blinker
         [,1] [,2] [,3] [,4] [,5]
    [1,]    0    0    0    0    0
    [2,]    0    0    0    0    0
    [3,]    0    1    1    1    0
    [4,]    0    0    0    0    0
    [5,]    0    0    0    0    0
    > nextStep(blinker)
         [,1] [,2] [,3] [,4] [,5]
    [1,]    0    0    0    0    0
    [2,]    0    0    1    0    0
    [3,]    0    0    1    0    0
    [4,]    0    0    1    0    0
    [5,]    0    0    0    0    0

Huzzah! Next steps are probably to write some unit tests [PDF] around it, but learning how to install packages can wait till another day!

Saturday, 17 October 2015

Mindless TDD

In this post, I look at a common misinterpretation of TDD I often see at coding dojos/katas.

A quick refresher - TDD is simple, but not easy.

  1. Write a failing test
  2. Write the minimum code to make the test pass
  3. Refactor

You have to think at each step.  This is often overlooked, and TDD is portrayed as a series of mindless steps (if that were true, developers probably wouldn't get paid so well!).

Let's take the Bowling Kata as an example of how easy it is to fall into mindless steps.  What's the right test to do first?

    public void PinExists() {
      Assert.That(new Pin().IsStanding, Is.True);
    }

We're bound to need a Pin class, right?  And we should definitely check whether it's standing up or falling down.  We continue in this vein, and create a Pin that can be knocked down and stood up.  Everything proceeds swimmingly. 15 - 20 minutes have elapsed and we have a unit-tested equivalent of bool that's no use to anyone.

I've seen similar anti-patterns in the Game of Life kata.  We write some tests for a cell, and the rules (three live neighbours means alive and so on).  We use this to drive a Cell class and we add methods to change the state depending on the number of neighbours.  Some time passes, and then we realize we actually want a grid of these objects, and they change state based on their neighbours' state, and we're in a bit of a pickle.  Our first guess at an implementation has given us a big problem.

If we somehow manage to solve the problem from this state, we end up with a load of tests that are coupled to the implementation.  Worse, because we've ended up creating lots of classes, we start prematurely applying SOLID, breaking down things into even more waffly collections of useless objects with no coherent basis.  Unsurprisingly, it's difficult to see the value in test-driven development when practiced like this.

So what's the common problem in both these cases?

Uncle Bob has described this behaviour: slide 9 of the Bowling Kata PPT describes a similar problem, but attributes it to over-design and suggests TDD as the solution.  I agree, but I think some people pervert TDD to mean test-driven development of my supposed solution, rather than TDD of the problem itself.

The common problem is simple.  Not starting with the end in mind!

If we'd started the Bowling Kata from the outside-in, our first test might have simply bowled 10 gutter balls and verified we return a zero.  We could already ship this to (really) terrible bowlers and it'd work!

Maybe next we could ensure that if we didn't bowl any spares/strikes it'd sum the scores up.  Again, now we can ship this to a wider audience.  Next up, let's solve spares, then strikes and at each stage we can ship!
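To make those first two steps concrete, here's a rough sketch.  It's written in R with base stopifnot rather than the C# / NUnit of the snippet above, and the function name score and the idea of passing the game in as a vector of pins per roll are both assumptions of mine.  I've also assumed a full game of ten frames with two rolls each.

```r
# Hypothetical scorer: just enough code to pass the first two tests.
# Spares and strikes aren't handled yet; those tests come next.
score <- function(rolls) {
  sum(rolls)
}

# Test 1: a game of nothing but gutter balls scores zero
stopifnot(score(rep(0, 20)) == 0)

# Test 2: with no spares or strikes, the score is just the sum of the pins
stopifnot(score(rep(3, 20)) == 60)
```

There's no Pin class, no Frame class and no Game class yet, and nothing so far has forced one into existence.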

Each time around the TDD loop we should have solved more of the problem and be closer to fully solving it.  TDD should be continuous delivery; if the first test isn't solving the problem for a simple case, it's probably not the right test.

Similarly for the Game of Life, instead of starting from a supposed solution of a cell class, what happens if your first test is just evolving a grid full of dead cells?  What happens if we add the rules one at a time?  You can ship every test once you've added the boilerplate of the "null" case.
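As a sketch of that starting point (in R, using base stopifnot; the function name evolve is hypothetical, and I'm assuming a grid is a matrix with 0 for a dead cell), the "null" case might be no more than this:

```r
# Simplest thing that could possibly work: every cell in the next
# generation is dead. Enough for the first test, and nothing more.
evolve <- function(grid) {
  matrix(0, nrow(grid), ncol(grid))
}

# First test: a grid full of dead cells evolves to a grid full of dead cells
dead <- matrix(0, 3, 3)
stopifnot(all(evolve(dead) == 0))
stopifnot(identical(dim(evolve(dead)), dim(dead)))
```

The next test (say, a live cell with too few neighbours dies) still passes against this, so it's the birth rule that first forces evolve to do some real work.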

TDD isn't about testing your possible implementation on the way to solving the problem, it's about writing relevant tests first and driving the implementation from that.  Start from the problem!

TDD done right is vicious - it's a series of surgical strikes (tests) aimed at getting you to solve the problem with the minimum amount of code possible.

Tuesday, 31 March 2015

Anatomy of a class.

Do you ever view a class and get filled with a sense of dread?  I did today, so I thought a good old-fashioned rant was in order.

I opened up a class today and was greeted with this.   First off, don't worry, I made the Wibble up.  Secondly, if wibble was the first thing you noticed we're probably in trouble.

    public sealed class MultiWibbledEntitiesDataPresenterFactory<TDetectionContext, TWibbledEntity, TProvider> : BasicFactory<Unit, IDataPresenter<MultiWibbledEntitiesContext<TDetectionContext, TWibbledEntity>>>
        where TWibbledEntity : WibbledEntity
        where TProvider : IProvider<TWibbledEntity>
    {
        private readonly IUtcDateTimeProvider m_UtcDateTimeProvider;
        private readonly ILocalDateTimeProvider m_LocalDateTimeProvider;
        private readonly IWibbledEntityDetector<TDetectionContext> m_WibbledEntityDetector;
        private readonly IFactory<TWibbledEntity, TProvider> m_ProviderFactory;
        private readonly IFactory<TWibbledEntity, IDataPresenter<SingleWibbledEntityContext<TWibbledEntity, TProvider>>> m_RawWibbledEntityDataPresenterFactory;
        private readonly Func<IUtcDateTimeProvider, string, IStatusLogger> m_StatusLoggerBuilder;

        public MultiWibbledEntitiesDataPresenterFactory(
            IUtcDateTimeProvider utcDateTimeProvider,
            ILocalDateTimeProvider localDateTimeProvider,
            IWibbledEntityDetector<TDetectionContext> wibbledEntityDetector,
            IFactory<TWibbledEntity, TProvider> providerFactory,
            IFactory<TWibbledEntity, IDataPresenter<SingleWibbledEntityContext<TWibbledEntity, TProvider>>> rawWibbledEntityDataPresenterFactory,
            Func<IUtcDateTimeProvider, string, IStatusLogger> statusLoggerBuilder)
        {
            m_UtcDateTimeProvider = utcDateTimeProvider;
            m_LocalDateTimeProvider = localDateTimeProvider;
            m_WibbledEntityDetector = wibbledEntityDetector;
            m_ProviderFactory = providerFactory;
            m_RawWibbledEntityDataPresenterFactory = rawWibbledEntityDataPresenterFactory;
            m_StatusLoggerBuilder = statusLoggerBuilder;
        }

        protected override IDataPresenter<MultiWibbledEntitiesContext<TDetectionContext, TWibbledEntity>> ConstructItem(Unit key)
        {
            return new MultiWibbledEntitiesDataPresenter<TDetectionContext, TWibbledEntity, TProvider>(

OK, you've read through that.  You've probably died a little inside. What did you learn?  Well, this is a MultiWibbledEntitiesDataPresenterFactory.

What the actual fuck?

A multi wibbled entities data presenter factory.

Spacing it out doesn't help much either.  It's a factory that makes data presenters for multi wibbled things.  OK, that starts to make some sense.  I guess I'd use this class if ever I needed to make a multi-wibbled-entities-data-presenter-factory.

Let's say that's the case. How do I construct one of these factory things?  I need a couple of time providers (UTC and local time, just in case), a detector (no idea what that is), two more factories and a function called "statusLoggerBuilder".  And this is just to create an object (albeit a rather complicated multi-wibbled data presenter object).

What can you do with the class?  Not a lot, there aren't any public methods other than the constructor, and that's pretty boring.  So, in order to make any progress, you'll have to explore a few more classes.  You'll need to visit the "BasicFactory".  You'll need to look at Unit, IDataPresenter and a few more parameterized classes.  In order to work out what this does, I've got to read all these files.

What's with all the generics?  Does this tell me the original developer was a template meta-programming C++ person?  Why all the complexity?

How many files do I need to open in order to understand this class?

What problem is this class solving?  The code doesn't tell me this, there aren't any comments and there aren't any tests.  The only way for me to understand this code is to navigate all the code's friends and work out what each of them does.

But on the plus side, I can create one and test it, so it must be good, right?