Saturday 26 July 2014

Dynamic Time Warping

Dynamic Time Warping has nothing to do with The Rocky Horror Show. It's a dynamic programming algorithm for aligning sequences of data that vary in speed or time. A typical application of dynamic time warping is aligning fragments of speech for the purpose of speaker recognition.

In this post, we'll look at how simple the algorithm is and visualize some of the output you can get from aligning sequences. The complete code is on GitHub, and any flames, comments and critiques are most appreciated. You'll need the Haskell Platform installed, plus a cabal install of the Codec.BMP package if you want to generate some images.

Given two vectors of some symbol a representing a time series (e.g. they both represent the output of f(x) = y, where x is a point in time and y is an output signal), produce as output an array describing the cost. The output array gives the "alignment factor" of the two sequences at different points in time.
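
I won't reproduce the full listing inline, but a minimal sketch of the idea looks something like this (the names, the Double cost type and the use of Data.Array are my choices here, not necessarily those in the repository). Laziness makes the dynamic programming pleasantly direct, because the array can be defined in terms of its own earlier cells:

  import Data.Array

  -- Build the cumulative cost matrix for two sequences, given a cost
  -- function between elements. Cell (i,j) holds the cheapest alignment
  -- of the first i elements of xs with the first j elements of ys.
  dtw :: (a -> a -> Double) -> [a] -> [a] -> Array (Int, Int) Double
  dtw cost xs ys = table
    where
      (n, m) = (length xs, length ys)
      xa = listArray (1, n) xs
      ya = listArray (1, m) ys
      table = array ((0, 0), (n, m))
                [((i, j), cell i j) | i <- [0 .. n], j <- [0 .. m]]
      cell 0 0 = 0
      cell _ 0 = 1 / 0  -- infinite cost: aligning against an empty prefix
      cell 0 _ = 1 / 0
      cell i j = cost (xa ! i) (ya ! j)
               + minimum [ table ! (i - 1, j)       -- insertion
                         , table ! (i,     j - 1)   -- deletion
                         , table ! (i - 1, j - 1) ] -- match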



We can find the best alignment path by simply walking back through the matrix from the top right to the bottom left, taking the minimum-cost choice at each turn. Using this we can visualize the best matching path for two exactly matching sequences. That's dead simple to code up:
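
Here's a sketch of that walk-back over the cost array from the previous snippet (again, treat the names as illustrative rather than the repository's exact code):

  import Data.Array
  import Data.List (minimumBy)
  import Data.Ord (comparing)

  -- Walk back from the top-right corner to the bottom-left, stepping
  -- to the cheapest neighbouring cell each time.
  bestPath :: Array (Int, Int) Double -> [(Int, Int)]
  bestPath table = reverse (walk (n, m))
    where
      (_, (n, m)) = bounds table
      walk (0, 0) = [(0, 0)]
      walk (i, j) = (i, j) : walk next
        where
          neighbours = [(i - 1, j - 1) | i > 0, j > 0]
                    ++ [(i - 1, j)     | i > 0]
                    ++ [(i,     j - 1) | j > 0]
          next = minimumBy (comparing (table !)) neighbours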



Let's look at what happens if we try to match a signal against itself, highlighting the matching path in white.


The colour demonstrates how well the signals match. Blue highlights the best match (i.e. least cost) and hotter colours (such as red) highlight the highest cost. This pattern matches simple intuition: since the sequences are exactly aligned, we'd expect a path from the top right to the bottom left, and that's exactly what we get.
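
If you'd like to reproduce pictures like these, something along these lines will dump a cost matrix to a bitmap using Codec.BMP's packRGBA32ToBMP and writeBMP. The blue-to-red colour ramp is my rough approximation of the scheme described above, not the exact renderer these images came from:

  import Codec.BMP
  import qualified Data.ByteString as BS
  import Data.Array
  import Data.Word (Word8)

  -- Render the cost matrix: blue for low cost, red for high cost.
  writeCostImage :: FilePath -> Array (Int, Int) Double -> IO ()
  writeCostImage path table = writeBMP path bmp
    where
      ((x0, y0), (x1, y1)) = bounds table
      (w, h) = (x1 - x0 + 1, y1 - y0 + 1)
      cells  = [table ! (i, j) | j <- [y0 .. y1], i <- [x0 .. x1]]
      finite = filter (not . isInfinite) cells
      (lo, hi) = (minimum finite, maximum finite)
      scale c | isInfinite c = 1
              | otherwise    = (c - lo) / (hi - lo + 1e-9)
      pixel c = let t = round (255 * scale c) :: Word8
                in [t, 0, 255 - t, 255]  -- RGBA: red rises with cost
      bmp = packRGBA32ToBMP w h (BS.pack (concatMap pixel cells))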

What happens if we try to match two completely random signals of integers? First off, let's use the absolute difference between the values as the cost measure (i.e. the cost function passed in is simply cost x y = abs (x - y)).
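
A hypothetical driver for that experiment, reusing the dtw sketch from earlier together with System.Random, might look like:

  import System.Random
  import Data.Array ((!))

  -- Align two random integer signals with the absolute-difference cost
  -- and print the total cost in the top-right cell.
  main :: IO ()
  main = do
    gen <- getStdGen
    let (g1, g2) = split gen
        xs = take 64 (randomRs (0, 100 :: Int) g1)
        ys = take 64 (randomRs (0, 100) g2)
        cost x y = fromIntegral (abs (x - y))
    print (dtw cost xs ys ! (64, 64))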


Cool patterns. Does this make sense? I think it does. The best match is at the beginning, before the sequences have diverged. As time goes on the match always gets worse because the cumulative absolute difference between the sequences is continuously increasing (albeit randomly).

What if we try to match a sequence against its opposite? Let's visualize that:
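
Taking "opposite" to mean the pointwise negation of the signal (my assumption here), the input could be built like this, again leaning on the earlier dtw sketch:

  -- A sine wave matched against its negation, using the same
  -- absolute-difference cost as before.
  oppositeCosts :: Array (Int, Int) Double
  oppositeCosts = dtw (\x y -> abs (x - y)) signal (map negate signal)
    where
      signal = [sin t | t <- [0, 0.1 .. 2 * pi]]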


That looks odd. What's the intuition behind the image here? The best match of these two signals occurs in the middle (since they are opposites), which explains the central structure. By the time we reach the end of the signal (the top right) we've got the worst possible match and hence the brightest colour.

This implementation of the algorithm isn't all that practical: it's O(N^2) in both time and space, and thus isn't suitable for signals with a large number of samples. However, it's fun to play with!

If you want to find out more about an efficient implementation of dynamic time warping then FastDTW is a great place to start. As someone who enjoys reading papers, I find it fantastic to see the code behind one. Quoting from the link:

FastDTW is implemented in Java. If the JVM heap size is not large enough for the cost matrix to fit into memory, the implementation will automatically switch to an on-disk cost matrix. Alternate approaches evaluated in the papers listed below are also implemented: Sakoe-Chiba Band, Abstraction, Piecewise Dynamic Time Warping (PDTW). This is the original/official implementation used in the experiments described in the papers below.

Monday 14 July 2014

The Stable Marriage Problem

It's been far too long since I wrote posts with any real code in them, so in an attempt to get back into good habits I'm going to try to write a few more posts and read up a bit more on some algorithms and the history behind them.

The Stable Marriage Problem was originally described by David Gale and Lloyd Shapley in their 1962 paper, "College Admissions and the Stability of Marriage". They describe the problem as follows:

A certain community consists of n men and n women. Each person ranks those of the opposite sex in accordance with his or her preferences for a marriage partner. We seek a satisfactory way of marrying off all members of the community. We call a set of marriages unstable if under it there are a man and a woman who are not married to each other, but prefer each other to their actual mates.

Gale and Shapley showed that for any pattern of preferences it's possible to find a stable set of marriages.

On its own, this doesn't sound very interesting. However, bringing together resources is an important economic principle, and this work formed part of the puzzle of cooperative game theory. Shapley was jointly awarded the Nobel Prize in Economics in 2012 (shared with Alvin Roth).

So how does the algorithm for Stable Marriages work?

Let's start by defining the problem. Given two lists of preferences, find a matching such that there are no unstable pairs (that is, two pairs who would cooperatively trade partners to make each other better off). The only constraint on the types is that they are equatable. This isn't the ideal representation (to put it mildly) in a strongly typed language (it doesn't enforce any invariants about the structure of the lists), but it's probably the simplest representation for explaining the algorithm.

stableMatch :: (Eq m, Eq w) => [(m,[w])] -> [(w,[m])] -> [(m,w)]

The algorithm continues whilst there are any unmarried men, and terminates once every man is engaged.

  stableMatch :: (Eq m, Eq w) => [(m,[w])] -> [(w,[m])] -> [(m,w)]
  stableMatch ms ws = stableMatch' []
    where
      -- Keep finding matches until every man is engaged.
      stableMatch' ps = case unmarried ms ps of
        Just unmarriedMan  -> stableMatch' (findMatch unmarriedMan ws ps)
        Nothing            -> ps

  -- The first man who isn't yet engaged, together with his
  -- preferences (find comes from Data.List).
  unmarried :: Eq m => [(m,[w])] -> [(m,w)] -> Maybe (m,[w])
  unmarried ms ps = find (\(m,_) -> m `notElem` engagedMen) ms
    where
      engagedMen = map fst ps

If there is at least one unmarried man, then we need to find him a match. We do this by proposing to each of his preferences in turn. If his first preference is not engaged, then the two become engaged. Otherwise, if his potential partner is already engaged but would prefer him to her current partner, then keeping the existing engagement would violate the stable marriage principle, so we break it up and engage her to the new man.

  findMatch :: (Eq m, Eq w) => (m,[w]) -> [(w,[m])] -> [(m,w)] -> [(m,w)]
  findMatch (m,w:rest) ws ps = case isEngaged w ps of

    -- w is already engaged to m' - is there a better match?
    Just m' -> if prefers (getPrefs ws w) m m'
               then engage (breakup m' ps) m w
               else findMatch (m,rest) ws ps

    -- can match with first choice
    Nothing -> engage ps m w
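
The helpers used above (isEngaged, prefers, getPrefs, engage and breakup) aren't shown in these snippets. Here's a minimal sketch of how they might look with this list-of-pairs representation; treat them as illustrations rather than the repository's exact definitions:

  import Data.List (find, elemIndex)
  import Data.Maybe (fromMaybe)

  -- Is w engaged? If so, return her current partner.
  isEngaged :: Eq w => w -> [(m,w)] -> Maybe m
  isEngaged w ps = fmap fst (find ((== w) . snd) ps)

  -- Does w prefer suitor m to her current partner m'? Appearing
  -- earlier in the preference list means more preferred.
  prefers :: Eq m => [m] -> m -> m -> Bool
  prefers prefList m m' = rank m < rank m'
    where
      rank x = fromMaybe maxBound (elemIndex x prefList)

  -- Look up w's preference list.
  getPrefs :: Eq w => [(w,[m])] -> w -> [m]
  getPrefs ws w = maybe [] snd (find ((== w) . fst) ws)

  -- Record a new engagement.
  engage :: [(m,w)] -> m -> w -> [(m,w)]
  engage ps m w = (m,w) : ps

  -- Dissolve m's engagement, making him unmarried again.
  breakup :: Eq m => m -> [(m,w)] -> [(m,w)]
  breakup m ps = filter ((/= m) . fst) ps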

You can see the full code at Stable Marriage Problem. As always, flames, comments and critiques are gratefully received.