Monday 9 November 2015

The (very) Basics of R with the Game of Life

R is a programming language for statistical computing and graphics.  It's also the language of choice amongst pirates.  Arrr! R is increasingly important for big data analysis, and both Oracle and Microsoft have recently announced support for database analytics using R.

So, how do you get started with R?  Well, for the rest of this I'm going to assume that you already know how to program in a { } language like Java / C#, and I'm going to cover the minimum amount possible to do something vaguely useful. The first step is to download the environment, which you can get from CRAN.  Once you've got it downloaded and installed you should be able to bring up a terminal and start R.  I really like the built-in demos.  Bring up a list of them with demo() and type demo(graphics) to get an idea of the capabilities of R.

These are the boring syntax bits:
  • R is a case sensitive language
  • Comments start with # and run to the end of the line
  • Functions are called with parentheses e.g. f(x,y,z)
The "standard library" of R is called the R Base Package. When you bring up R, you bring up a workspace.  A workspace is just what is in scope as any one time.  You can examine the workspace by calling the ls function.

    # Initially my workspace is empty
    > ls()
    character(0)

    # Now I set a value and lo-and-behold, it's in my workspace
    > x = "banana"
    > ls()
    [1] "x"

    # I can save my whole workspace with save.image
    > save.image(file="~/foo.RData")

    # I can load my workspace with
    > load("~/foo.RData")

    # I can remove elements from the workspace with rm
    > rm(x)
    > ls()
    character(0)

    # I can nuke my workspace with rm(list=ls())
    > x = 'banana'
    > rm(list=ls())
    > ls()
    character(0)   

We've seen above that R supports string data, but it also supports vectors, lists, arrays, matrices, tables and data frames. To define a vector you use the c function.  For example:

    > x = c(1,2,3,4,5)
    > x
    [1] 1 2 3 4 5

    > length(x)
    [1] 5

Remember, everything in a vector must be of the same type.  Elements are coerced to the same type, so c(1,'1',TRUE) results in a vector of strings (there's an example after the next snippet).  Indexing into vectors starts at 1 (not zero).  You can use Python-style list selection:

    > x = c(1,2,3,4,5,6,7,8,9,10)
    > x[7:10] # select 7 thru 10
    [1]  7  8  9 10
    > x[-(1:3)] # - does exclusion
    [1]  4  5  6  7  8  9 10
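
And the coercion rule from above in action (everything becomes a string):

    > c(1, '1', TRUE)
    [1] "1"    "1"    "TRUE"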

To define a list, you use, ahem, list.  Items in a list are named components (see the rules of variable naming).

    > y = list(name="Fred", lastname="Bloggs", age=21)
    > y
    $name
    [1] "Fred"

    $lastname
    [1] "Bloggs"

    $age
    [1] 21

    > y$name # access the name property
    [1] "Fred"

Finally, let's look at matrices.  You construct them with matrix and pass in a vector to construct from, together with the size.

    > m = matrix( c(1,2,3,4,5,6,7,8,9), nrow=3, ncol=3)
    > m
         [,1] [,2] [,3]
    [1,]    1    4    7
    [2,]    2    5    8
    [3,]    3    6    9
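
Note the vector fills the matrix column by column.  Indexing is [row, column], and leaving a dimension blank selects a whole row or column:

    > m[2,3]
    [1] 8
    > m[2,]
    [1] 2 5 8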

OK, that should be enough boring information out of the way to let me write a function for the Game of Life.  All I want to do is take a matrix in, apply the update rules, and return a new one.  How hard can that be? You write R files with the extension ".R" and bring them into your workspace with the source function. Here's an embarrassingly poor go at the Game of Life (note I've only spent 5 minutes with the language, so if you've got any improvements to suggest or more idiomatic ways of doing the same thing, they are gratefully received!).
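
A minimal sketch (assuming anything off the edge of the matrix counts as dead):

    nextStep = function(grid) {
      rows = nrow(grid)
      cols = ncol(grid)
      result = matrix(0, rows, cols)
      for (i in 1:rows) {
        for (j in 1:cols) {
          # Sum the 3x3 block around (i,j), clipped at the edges, then
          # subtract the cell itself to get the live neighbour count
          neighbours = sum(grid[max(1,i-1):min(rows,i+1),
                                max(1,j-1):min(cols,j+1)]) - grid[i,j]
          # A cell is alive next step with exactly 3 neighbours,
          # or with 2 neighbours if it's already alive
          if (neighbours == 3 || (neighbours == 2 && grid[i,j] == 1)) {
            result[i,j] = 1
          }
        }
      }
      result
    }

Save that as life.R and bring it into the workspace with source("life.R").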

Testing this at the REPL with a simple pattern.

    > blinker = matrix(0,5,5)
    > blinker[3,2:4]=1
    > blinker
         [,1] [,2] [,3] [,4] [,5]
    [1,]    0    0    0    0    0
    [2,]    0    0    0    0    0
    [3,]    0    1    1    1    0
    [4,]    0    0    0    0    0
    [5,]    0    0    0    0    0
    > nextStep(blinker)
         [,1] [,2] [,3] [,4] [,5]
    [1,]    0    0    0    0    0
    [2,]    0    0    1    0    0
    [3,]    0    0    1    0    0
    [4,]    0    0    1    0    0
    [5,]    0    0    0    0    0

Huzzah! Next steps are probably to write some unit tests [PDF] around it, but learning how to install packages can wait till another day!

Saturday 17 October 2015

Mindless TDD

In this post, I look at a common misinterpretation of TDD I often see at coding dojos/katas.

A quick refresher - TDD is simple, but not easy.

  1. Write a failing test
  2. Write the minimum code to make test pass
  3. Refactor

You have to think at each step.  This is often overlooked, and TDD is portrayed as a series of mindless steps (if that were true, developers probably wouldn't get paid so well!).

Let's take the Bowling Kata as an example of how easy it is to fall into mindless steps.  What's the right test to do first?

    [Test]
    public void PinExists() {
      Assert.That(new Pin().IsStanding, Is.True);
    }

We're bound to need a Pin class, right?  And we should definitely check whether it's standing up or fallen down.  We continue in this vein, and create a Pin that can be knocked down and stood up.  Everything proceeds swimmingly.  15-20 minutes have elapsed and we have a unit-tested equivalent of bool that's no use to anyone.

I've seen similar anti-patterns in the Game of Life kata.  We write some tests for a cell and the rules (three live neighbours means alive, and so on).  We use this to drive a Cell class and we add methods to change the state depending on the number of neighbours.  Some time passes, and then we realize we actually want a grid of these objects, each changing state based on its neighbours' state, and we're in a bit of a pickle.  Our first guess at an implementation has given us a big problem.

If we somehow manage to solve the problem from this state, we end up with a load of tests that are coupled to the implementation.  Worse, because we've ended up creating lots of classes, we start prematurely applying SOLID, breaking down things into even more waffly collections of useless objects with no coherent basis.  Unsurprisingly, it's difficult to see the value in test-driven development when practiced like this.

So what's the common problem in both these cases?

Uncle Bob has described this behaviour: slide 9 of the Bowling Kata PPT describes a similar problem, but attributes it to over-design and suggests TDD as the solution.  I agree, but I think some people pervert TDD to mean test-driven development of my supposed solution, rather than TDD of the problem itself.

The common problem is simple.  Not starting with the end in mind!

If we'd started the Bowling Kata from the outside-in, our first test might simply have bowled a full game of gutter balls and verified we return a score of zero.  We could already ship this to (really) terrible bowlers and it'd work!
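
Something like this, perhaps (Game, Roll and Score are names I'm assuming here, in the style of Uncle Bob's version of the kata):

    [Test]
    public void GutterGame_ScoresZero() {
      var game = new Game();
      // A full game is 10 frames of 2 rolls, and every ball misses
      for (var i = 0; i < 20; i++) {
        game.Roll(0);
      }
      Assert.That(game.Score(), Is.EqualTo(0));
    }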

Maybe next we could ensure that if we didn't bowl any spares/strikes it'd sum the scores up.  Again, now we can ship this to a wider audience.  Next up, let's solve spares, then strikes and at each stage we can ship!

Each time around the TDD loop we should have solved more of the problem and be closer to fully solving it.  TDD should be continuous delivery: if the first test isn't solving the problem for a simple case, it's probably not the right test.

Similarly for the Game of Life, instead of starting from a supposed solution of a cell class, what happens if your first test is just evolving a grid full of dead cells?  What happens if we add the rules one at a time?  You can ship every test once you've added the boilerplate of the "null" case.
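
As a sketch of what that first test might look like (Life.Evolve is an assumed name):

    [Test]
    public void GridOfDeadCells_StaysDead() {
      var grid = new bool[3, 3]; // every cell starts dead
      var next = Life.Evolve(grid);
      Assert.That(next, Is.EqualTo(grid));
    }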

TDD isn't about testing your possible implementation on the way to solving the problem, it's about writing relevant tests first and driving the implementation from that.  Start from the problem!

TDD done right is vicious - it's a series of surgical strikes (tests) aimed at getting you to solve the problem with the minimum amount of code possible.

Tuesday 31 March 2015

Anatomy of a class.

Do you ever view a class and get filled with a sense of dread?  I did today, so I thought a good old-fashioned rant was in order.

I opened up a class today and was greeted with this.  First off, don't worry, I made the Wibble up.  Secondly, if Wibble was the first thing you noticed, we're probably in trouble.


    public sealed class MultiWibbledEntitiesDataPresenterFactory<TDetectionContext, TWibbledEntity, TProvider> : BasicFactory<Unit, IDataPresenter<MultiWibbledEntitiesContext<TDetectionContext, TWibbledEntity>>>
        where TWibbledEntity : WibbledEntity
        where TProvider : IProvider<TWibbledEntity>
    {
        private readonly IUtcDateTimeProvider m_UtcDateTimeProvider;
        private readonly ILocalDateTimeProvider m_LocalDateTimeProvider;
        private readonly IWibbledEntityDetector<TDetectionContext> m_WibbledEntityDetector;
        private readonly IFactory<TWibbledEntity, TProvider> m_ProviderFactory;
        private readonly IFactory<TWibbledEntity, IDataPresenter<SingleWibbledEntityContext<TWibbledEntity, TProvider>>> m_RawWibbledEntityDataPresenterFactory;
        private readonly Func<IUtcDateTimeProvider, string, IStatusLogger> m_StatusLoggerBuilder;

        public MultiWibbledEntitiesDataPresenterFactory(
            IUtcDateTimeProvider utcDateTimeProvider,
            ILocalDateTimeProvider localDateTimeProvider,
            IWibbledEntityDetector<TDetectionContext> wibbledEntityDetector,
            IFactory<TWibbledEntity, TProvider> providerFactory,
            IFactory<TWibbledEntity, IDataPresenter<SingleWibbledEntityContext<TWibbledEntity, TProvider>>> rawWibbledEntityDataPresenterFactory,
            Func<IUtcDateTimeProvider, string, IStatusLogger> statusLoggerBuilder
            )
        {
            m_UtcDateTimeProvider = utcDateTimeProvider;
            m_LocalDateTimeProvider = localDateTimeProvider;
            m_WibbledEntityDetector = wibbledEntityDetector;
            m_ProviderFactory = providerFactory;
            m_RawWibbledEntityDataPresenterFactory = rawWibbledEntityDataPresenterFactory;
            m_StatusLoggerBuilder = statusLoggerBuilder;
        }

        protected override IDataPresenter<MultiWibbledEntitiesContext<TDetectionContext, TWibbledEntity>> ConstructItem(Unit key)
        {
            return new MultiWibbledEntitiesDataPresenter<TDetectionContext, TWibbledEntity, TProvider>(
                m_UtcDateTimeProvider,
                m_LocalDateTimeProvider,
                m_WibbledEntityDetector,
                m_ProviderFactory,
                m_RawWibbledEntityDataPresenterFactory,
                m_StatusLoggerBuilder
                );
        }
    }


OK, you've read through that.  You've probably died a little inside. What did you learn?  Well, this is a MultiWibbledEntitiesDataPresenterFactory.

What the actual fuck?

A multi wibbled entities data presenter factory.

Spacing it out doesn't help much either.  It's a factory that makes data presenters for multi wibbled things.  OK, that starts to make some sense.  I guess I'd use this class if ever I needed to make a multi-wibbled-entities-data-presenter-factory.

Let's say that's the case. How do I construct one of these factory things?  I need a couple of time providers (UTC and local time, just in case), a detector (no idea what that is), two more factories and a function called "statusLoggerBuilder".  And this is just to create an object (albeit a rather complicated multi-wibbled data presenter object).

What can you do with the class?  Not a lot, there aren't any public methods other than the constructor, and that's pretty boring.  So, in order to make any progress, you'll have to explore a few more classes.  You'll need to visit the "BasicFactory".  You'll need to look at Unit, IDataPresenter and a few more parameterized classes.  In order to work out what this does, I've got to read all these files.

What's with all the generics?  Does this tell me the original developer was a template meta-programming C++ person?  Why all the complexity?

How many files do I need to open in order to understand this class?

What problem is this class solving?  The code doesn't tell me this, there aren't any comments and there aren't any tests.  The only way for me to understand this code is to navigate all the code's friends and work out what each of them do.

But on the plus side, I can create one and test it, so it must be good right?

Monday 19 January 2015

The Diamond Square Algorithm

Ever wondered how to generate a landscape?  I've been fascinated by these since the days of Vista Pro on my trusty Amiga.

The diamond-square algorithm is a method for generating heightmaps.  It's a great algorithm because it's amazingly simple and produces something very visual (similar to the emergent behaviour exhibited by the flocking algorithm).  My kind of algorithm!  In this post, I'll try to explain the implementation using Haskell and generate some pretty pictures.

As the paper [PDF] states, previous modelling techniques for graphics were based on the idea that you can simply describe a landscape as some set of deterministic functions.  Bezier and B-spline patches used higher-order polynomials to describe objects, and this approach was good for rendering artificial objects.  Natural objects, such as terrain, don't have regular patterns, so an approach like splines doesn't work.

This algorithm was innovative because it used a stochastic approach.  Given some simple rules (and some randomness!) the algorithm generates a "natural" looking landscape.  The paper describes models for 1D, 2D and 3D surfaces.  We'll just use the simplest possible example, rendering a height map.

We start with a square with each corner given a randomly assigned height.  If the area of this square is 1, then we’re done.  Easy.  We’ll call the corners TL, TR, BL and BR (representing top left, top right, bottom left and bottom right).


If the square is too big, then we recursively divide it into four smaller squares.
Each newly created corner point is assigned a height based on the average of the points surrounding it.  Note there’s nothing stochastic about this approach yet, it’s purely deterministic.

We can model this with Haskell pretty clearly.  We start off by defining a simple type to represent a Square.

type Point = (Int,Int)
data Square = Square
             {
               position :: Point                
             , size    :: Int
             , tl      :: Double -- Height of top left
             , tr      :: Double -- Height of top right
             , bl      :: Double -- Height of bottom left
             , br      :: Double -- Height of bottom right
             } deriving (Show,Eq)

Now all we have to do is write a little function to divide things into four.  Firstly let’s capture the pattern that dividing stops when the size of the square is one.

isUnit :: Square -> Bool
isUnit sq = size sq == 1

allSubSquares :: (Square -> [Square]) -> Square -> [Square]
allSubSquares f sq
 | isUnit sq = [sq]
 | otherwise = concatMap (allSubSquares f) (f sq)

The allSubSquares function simply calls our splitting function repeatedly until things are reduced to the tiniest possible size.

What does our split function look like?  Well, all it has to do is calculate the new squares as described above.  It looks a little like this:

divide :: Double -> Square -> [Square]
divide eps parent = [
   sq                    { tr = at, br = ah, bl = al } -- top left unchanged
 , (move sq (half,0))    { tl = at, bl = ah, br = ar } -- top right unchanged
 , (move sq (0,half))    { tr = ah, br = ab, tl = al } -- bottom left unchanged
 , (move sq (half,half)) { tl = ah, bl = ab, tr = ar } -- bottom right unchanged
 ]
 where    
   half = size parent `div` 2
   sq = parent { size = half }
   at = averageTopHeight parent
   ah = averageHeight eps parent -- height of middle
   ab = averageBottomHeight parent
   ar = averageRightHeight parent
   al = averageLeftHeight parent

OK, this isn’t very exciting (and I’ve left out the boilerplate).  But we have something now: it’s deterministic, yet it creates cool results.
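
For the curious, the boilerplate might look something like this (my guess: the edge helpers average the two corners of each side, and move shifts a square’s position):

-- Average the two corner heights along each edge
averageTopHeight, averageBottomHeight, averageLeftHeight, averageRightHeight :: Square -> Double
averageTopHeight    sq = (tl sq + tr sq) / 2
averageBottomHeight sq = (bl sq + br sq) / 2
averageLeftHeight   sq = (tl sq + bl sq) / 2
averageRightHeight  sq = (tr sq + br sq) / 2

-- Shift a square's position by an (x,y) offset
move :: Square -> (Int, Int) -> Square
move sq (dx, dy) = sq { position = (x + dx, y + dy) }
  where (x, y) = position sq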


Woo. I used JuicyPixels to render the image.  I really wish I’d found this library a long time ago, it’s fabulously simple to use and all I needed to do was use the sexy generateImage function.

So how do we actually generate something that looks vaguely natural?

The answer is randomness.  Tightly controlled.  Let’s look at our original square divider and make one really small change.


I’ll save you the trouble of finding it: it’s that pesky “e” we’ve added to the middle.  What is e?

Well, it’s the stochastic approach.  It’s a random number that displaces the midpoint.  When the square is big, the displacement is big.  When the square is small, the displacement is small.  In fact, we simply define e as a random number in [-0.5, 0.5] scaled by the size of the square.
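
Concretely, the displaced midpoint might be computed something like this (a sketch; the divide function above already threads eps through as its first argument):

averageHeight :: Double -> Square -> Double
averageHeight e sq = midpoint + displacement
  where
    -- The deterministic part: the average of the four corner heights
    midpoint = (tl sq + tr sq + bl sq + br sq) / 4
    -- The stochastic part: e is drawn from [-0.5, 0.5] and scaled by
    -- the square's size, so the displacement shrinks as we subdivide
    displacement = e * fromIntegral (size sq)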

What happens when we add this displacement is kind of cool.  We now get a random surface that smooths itself out and almost looks natural.


I think this is pretty neat.  It’s a smooth landscape that could easily look natural.  We can do even better by adding a bit of colour.  I’ve done this using a simple colour map as described on Stack Overflow.

Using a colour map generated from similar parameters, we get a much prettier result.  If you squint a bit, you could imagine it’s a natural scene, right?


All the code for this is available on my GitHub profile, the diamond-square project.  Some fun extensions to this would be to create some animations, or actually render it in 3D with OpenGL.

Tuesday 13 January 2015

Book review: Lead with Respect - A Novel of Lean Practice

As I said in a previous post, I'm a sucker for a business novel. Lead with Respect is another business novel by father and son team Michael and Freddy Balle.

My goal in reading this was to get an idea of how lean management might apply to software development.

The story starts with Jane, the CEO of Southcape Software, which is working on some software for a famed Lean company called Nexplas. As you might expect, things aren't going well. The software doesn't do what's required and milestones aren't being met. This sets the stage for the sensei/student relationship between Jane (CEO of a software company) and Andrew (VP of a manufacturing company). Throughout the book, Andrew imparts knowledge to Jane (and hopefully the reader too).

The core theme of the book is, as the title suggests, respect. Respect, in the lean sense, is much wider than the dictionary definition of respect. In Lean, respect means:

  • Engage everybody all the time in problem solving, together, by making every effort to understand each other's point of view.
  • Guarantee the quality, productivity and flexibility as we try to cut non-satisfaction and non-value-added work.
  • Share success and reward involvement and initiative which makes our respect promise credible and sustains our long-term growth. Customer satisfaction simply can't happen without employee satisfaction.

"Lead with Respect" rallies against a preconceived notion of Lean as grinding people into the ground, cutting costs and working them till they drop. I've never had this view of lean, but I can see how it would make excellent FUD for competiting philosophies.

So what does a Lean manager actually do? Put simply:

Our job as managers is to create conditions where people can be successful at their job. And what that comes down to is working together to solve the problems we face.

This all sounds easy, right?

What are problems? The book ties problems down to continuous improvement through the familiar equation that a job is the sum of the work itself and the continuous improvement that must be part of employment. We should accept that continuous improvement is part of the job. I don't think this is a difficult case to argue: in software engineering (and other knowledge work) if you aren't constantly learning, then you are falling behind.

Lead with Respect argues that our job is to support people in this journey of continuous improvement. Continuous improvement is about change; change is scary! We should work with people to break larger challenges into smaller, everyday steps. The link is made (again) with kaizen and standards, namely that you can't have continuous improvement without some standards.

To improve performance we have to improve processes. To improve processes we have to improve individuals' competence and their ability to work with others.

In the book, Jane improves her team's performance using things that are familiar to most software engineers who've got any experience of agile: pair programming, test-driven development and listening to customers. None of this is surprising. Towards the end of the book, another tool for conversations is introduced in the form of A3 Problem Solving. This is something that sounds like process nonsense, but the book does a good job of explaining that it's about the scientific method. By following a structured approach it provides a way to have structured conversations, which in turn makes it easier for others to understand the problem and potentially coach people to a solution. This is something that Mike Rother explores in his book, Toyota Kata, which is yet another item on my ever-growing reading list.

Was this book a good read? Well, it was enjoyable enough, the characters were believable at least. I'm not sure I got as much out of it as the earlier book and some of the discussions about software felt a bit unrealistic. The key themes definitely seem transferable to any discipline.

Monday 5 January 2015

Book Review: The Lean Manager - A novel of lean transformation

I've recently taken on a different role at work and as part of that I've tried to force myself to read as many books on management topics as possible.

After reading The Goal and The Phoenix Project, I realized that I'm a sucker for a business novel. After a bit of searching, I settled on the book by Freddy Balle and Michael Balle entitled The Lean Manager: A novel of lean transformation.

The book is set in an automotive parts plant in France that is under threat of closure. Andy, the book's protagonist, agrees a deal with the CEO, Phil (also his mentor), that if the plant becomes competitive it will not close. It's a familiar setting from other books; I suppose the concept is that the best catalyst for change is adversity.

What did this book teach me?

Standardized work and kaizen are two sides of the same coin.

For a long time, I've resisted the idea that standardizing any work to do with programming is a good thing. The best teams I've worked in have always had implicit coding standards, created by being a closely knit team unafraid to voice concerns when standards (even if they are only in the heads of a few) weren't met. The idea of explicitly setting coding standards (and I'm not talking about tabs vs. spaces, more in the style of 101 Coding Guidelines for C++) has always been anathema to me. I think the reasons for this are simple; I used to think standards implied something external to the team influencing how they work.

Standardized work is about agreeing how the work should be done best, to better see the problems. Kaizen is about encouraging operators and frontline supervisors to solve all the problems that appear as gaps to the standard.

I realize this is talking about manufacturing, not software engineering, but the idea of defining a standard and viewing a gap to the standard as a problem is a powerful one. As a stupid example, let's say you define automated acceptance testing as the standard for all new features, but fail to meet it. Ask "why?" five times. What can you learn from this that changes the way you develop software? By stating a standard and holding yourself to account, you see the problems and force a conversation about them. Standardized work encourages problem solving (kaizen) by acting as a tool that allows you to have the right conversations.

Another key theme from the book is the idea of "go and see". The best way to learn is to go and see. This applies everywhere. Go and see (genchi genbutsu) teams; go and see customers. Go to the place where the work happens and magic will happen. Again, this sounds like a very simple thing (management by wandering around) but it's deceptively powerful when adopted as a deliberate technique (or at least, it is in the book!).

Visual Work Management is another tool in the Lean toolbox. Part of go and see is being able to immediately recognize problems. We already have something like this in the software engineering industry with build status monitoring (Siren of Shame!). What else could we visualize? The advantage of the automotive industry is that the takt time is often short (if customers are demanding 10K units a week, the takt time is in minutes). In software engineering, our sprints are often weeks, so it's difficult to know if things are going off the rails. Perhaps some elephant carpaccio is in order?

Does go and see translate to software engineering? Definitely for some parts, namely visiting customers and understanding their requirements (customers want holes, not drills). Does this apply at other times, such as when teams are writing code? I suspect it does; the only way to understand why teams are flying or struggling is to actually see them in action.

The last big theme from the book was that developing people is just as important as developing the product. The idea is simply that once *everyone* is contributing to product improvement and innovation then you've built yourself a significant advantage that is almost impossible to copy.

All in all, The Lean Manager was an enjoyable read. I'm not sure how many of the themes adapt perfectly to software engineering, but it's definitely food for thought!