Tuesday, 23 September 2014

The Goal - The Match Bowl Experiment

Recently I've been re-reading The Goal by Eliyahu Goldratt. It's a great little book about manufacturing plants and how to manage them. It introduces the Theory of Constraints and that's relevant for all software developers for an understanding of why our development processes are structured the way they are.

Throughout the book it uses games and metaphors to illustrate faulty thinking about interconnected processes. In this post, I'd like to introduce Goldratt's dice game.

In the dice game there are a number of stations (representing part of a business process). The stations are arranged in a line with the output from one station becoming the input of the next. This arrangement represents a production line. In order to move items through the production line players take it in turn to roll a dice. The number rolled is the maximum you can move to the next station. For example, if you roll a six, but only have three items in your station, then you can only move three to the next station.

Let's imagine a really simple system with 8 stations that starts with 100 units in the left hand bowl with the aim of producing 100 units in the right hand bowl. Each bowl has the same capacity; it'll produce between 1-6 units each work step.

Based on the rules above, what's the flow of work going to look like through the system?

You might expect the flow to be smooth; each workstation has about 0 items at any one time because as soon as they are produced they move onto the next state. The reality is somewhat different.

The movie below shows the system processing a set of 100 units. The bottleneck (that is the work centre with the most items in it) is highlighted in yellow.

What's really interesting is the chaotic nature of the bottleneck. Random fluctuations mean that the bottleneck can appear anywhere. Balancing capacity across each item is clearly not the right answer.

All the diagrams in this page were built using the excellent Diagrams library for Haskell.

Saturday, 26 July 2014

Dynamic Time Warping

Dynamic Time Warping is nothing to do with the Rocky Horror show. It's a dynamic programming algorithm for aligning sequences of data that vary in terms of speed or time. Some typical applications of dynamic time warping are aligning fragments of speech for the purposes of performing speaker recognition.

In this post, we'll look at how simple the algorithm is and visualize some of the output you can get from aligning sequences. The complete code is on github and any flames, comments and critiques are most appreciated. You'll need the Haskell platform installed and a cabal install of the Codec.BMP package if you want to generate some images.

Given two vectors of some symbol a representing a time series (e.g. they both represent the output f(x) = y where x is some time, and y is an output signal) produce as output an array describing the cost. The output array gives the "alignment factor" of the two sequences at difference points in time.

We can find the best alignment path by simply walking back through the matrix from the top right, to the bottom left and taking the minimum choice at each turn. Using this we can visualize the best matching path for two exactly matching sequences. That's dead simple to code up:

Let's look at what happens if we try match the signal against itself and highlight the matching path in white.

The colour demonstrates how well the signals match. Blue highlights the best match (e.g. least cost) and hotter colours (such as red) highlight the worst cost. This pattern matches simple intuition. Since the sequences are exactly aligned, we'd expect a path from the top right to the bottom left, and that's what we get.

What happens if we try to match two completely random signals of integers? First off, let's try with the measure of the cost function being the absolute difference between the values (e.g. the cost function passed in is simply cost x y = abs (x - y)).

Cool patterns. Does this make sense? I think it does. The best match is at the beginning, before the sequences have diverged. As time goes on the match always gets worse because the cumulative absolute difference between the sequences is continuously increasing (albeit randomly).

What if we try to match a sequence against its opposite? Let's visualize that:

That looks odd. What's the intuition describing the image here? The best match of these two signals occurs in the middle (since they are opposite), this feels like this explains the center structure. By the time we reach the end of the signal (the top right) we've got the worst possible match and hence the brightest colour.

This implementation of the algorithm isn't all that practical. It's an O(N^2) algorithm and thus isn't suitable for signals with a high number of samples. However, it's fun to play with!

If you want to find out more about an efficient implementation of dynamic time warping then Fast DTW is a great place to start. As someone who enjoys reading papers, it's fantastic to see the code behind it, quoting from the link:

FastDTW is implemented in Java. If the JVM heap size is not large enough for the cost matrix to fit into memory, the implementation will automatically switch to an on-disk cost matrix. Alternate approaches evaluated in the papers listed below are also implemented: Sakoe-Chiba Band, Abstraction, Piecewise Dynamic Time Warping (PDTW). This is the original/official implementation used in the experiments described in the papers below.

Monday, 14 July 2014

The Stable Marriage Problem

It's been far too long since I wrote posts with any real code in, so in an attempt to get back into good habits I'm going to try to write a few more posts and read up a bit more about some algorithms and the history behind them.

The Stable Marriage Problem was originally described by David Gale and Lloyd Shapley in their 1962 paper, "College Admissions and the Stability of Marriage". They describe the problem as follows:

A certain community consists of n men and n women. Each person ranks those of the opposite sex in accordance with his or her preferences for a marriage partner. We seek a satisfactory way of marrying off all member of the community. We call a set of marriage unstable if under it there are a man and a woman who are not married to each other, but prefer each other to their actual mates.

Gale and Shapley shows that for any pattern of preferences it's possible to find a stable set of marriages.

On its own, this doesn't sound very interesting. However, bringing together resources is an important economic principle and this work formed part of the puzzle of Cooperative Game Theory and Shapley was jointly awarded the Nobel Prize for economics in 2012.

So how does the algorithm for Stable Marriages work?

Let's start by defining the problem. Given two lists of preferences, find the match such that there is no unstable match (that is two pairs that would cooperatively trade partners to make each other better off). The only constraint the types have is that they have is that they are equatable. This isn't the ideal representation (to put it mildly) in a strongly typed language (it doesn't enforce any invariants about the structure of the lists), but it's probably the simplest representation for explaining the algorithm.

stableMatch :: (Eq m, Eq w) => [(m,[w])] -> [(w,[m])] -> [(m,w)]

The algorithm continues whilst there are any unmarried men. If there are no unmarried men, then the algorithm terminates.

  stableMatch :: (Eq m, Eq w) => [(m,[w])] -> [(w,[m])] -> [(m,w)]
  stableMatch ms ws = stableMatch' []
      stableMatch' ps = case unmarried ms ps of
        Just unmarriedMan  -> stableMatch' (findMatch unmarriedMan ws ps)
        Nothing            -> ps

  unmarried :: Eq m => [(m,[w])] -> [(m,w)] -> Maybe (m,[w])
  unmarried ms ps = find (\(m,_) -> m `notElem` engagedMen) ms
      engagedMen = map fst ps

If there is at least one unmarried man, then we need to find a match. We do this by proposing to each of his preferences in turn. If his first preference is not engaged, then we propose. Otherwise, if his potential partner is already engaged and would prefer him then this violates the stable marriage principle and we breakup the engagement and re-engage with our first choice.

findMatch :: (Eq m,Eq w) => (m,[w]) -> [(w,[m])] -> [(m,w)] -> [(m,w)]
  findMatch (m,w:rest) ws ps = case isEngaged w ps of
    -- w is already engaged to m' - is there a better match?
    Just m' -> if prefers (getPrefs ws w) m m'
               then engage (breakup m' ps) m w
               else findMatch (m,rest) ws ps
    -- can match with first choice
    Nothing -> engage ps m w

You can see the full code at Stable Marriage Problem. As always flames, comments and critiques gratefully received.