Saturday, 26 July 2014

Dynamic Time Warping

Dynamic Time Warping is nothing to do with the Rocky Horror show. It's a dynamic programming algorithm for aligning sequences of data that vary in terms of speed or time. Some typical applications of dynamic time warping are aligning fragments of speech for the purposes of performing speaker recognition.

In this post, we'll look at how simple the algorithm is and visualize some of the output you can get from aligning sequences. The complete code is on github and any flames, comments and critiques are most appreciated. You'll need the Haskell platform installed and a cabal install of the Codec.BMP package if you want to generate some images.

Given two vectors of some symbol a representing a time series (e.g. they both represent the output f(x) = y where x is some time, and y is an output signal) produce as output an array describing the cost. The output array gives the "alignment factor" of the two sequences at difference points in time.



We can find the best alignment path by simply walking back through the matrix from the top right, to the bottom left and taking the minimum choice at each turn. Using this we can visualize the best matching path for two exactly matching sequences. That's dead simple to code up:



Let's look at what happens if we try match the signal against itself and highlight the matching path in white.


The colour demonstrates how well the signals match. Blue highlights the best match (e.g. least cost) and hotter colours (such as red) highlight the worst cost. This pattern matches simple intuition. Since the sequences are exactly aligned, we'd expect a path from the top right to the bottom left, and that's what we get.

What happens if we try to match two completely random signals of integers? First off, let's try with the measure of the cost function being the absolute difference between the values (e.g. the cost function passed in is simply cost x y = abs (x - y)).


Cool patterns. Does this make sense? I think it does. The best match is at the beginning, before the sequences have diverged. As time goes on the match always gets worse because the cumulative absolute difference between the sequences is continuously increasing (albeit randomly).

What if we try to match a sequence against its opposite? Let's visualize that:


That looks odd. What's the intuition describing the image here? The best match of these two signals occurs in the middle (since they are opposite), this feels like this explains the center structure. By the time we reach the end of the signal (the top right) we've got the worst possible match and hence the brightest colour.

This implementation of the algorithm isn't all that practical. It's an O(N^2) algorithm and thus isn't suitable for signals with a high number of samples. However, it's fun to play with!

If you want to find out more about an efficient implementation of dynamic time warping then Fast DTW is a great place to start. As someone who enjoys reading papers, it's fantastic to see the code behind it, quoting from the link:

FastDTW is implemented in Java. If the JVM heap size is not large enough for the cost matrix to fit into memory, the implementation will automatically switch to an on-disk cost matrix. Alternate approaches evaluated in the papers listed below are also implemented: Sakoe-Chiba Band, Abstraction, Piecewise Dynamic Time Warping (PDTW). This is the original/official implementation used in the experiments described in the papers below.