Wednesday 24 February 2010

Consistent Hashing with Haskell

Let's say you want to distribute a bunch of requests across a number of servers. The simplest way to do this is to take the key for the request (say the URL), generate a numeric hash and modulo it against the number of servers. This means that 1/N of your requests go to each of your N servers - job done! Then you realize you don't have enough capacity and need to add another server - now you find that most of the requests that used to go to server A are going to server B. The only way to get back to where you were (with caches nicely warmed up and so on) is to redistribute nearly all the requests again. D'oh!
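To see how badly naive modulo sharding behaves, here's a quick sketch. The hash function and key names are made up for illustration (a toy polynomial hash rather than anything cryptographic), but the remapping effect is the point:

```haskell
-- Toy demonstration: how many keys change shard when the server
-- count goes from n to m under naive modulo distribution?
-- The hash here is an illustrative stand-in, not a production hash.
simpleHash :: String -> Int
simpleHash = foldl (\h c -> h * 31 + fromEnum c) 17

shard :: Int -> String -> Int
shard n key = simpleHash key `mod` n

-- Count the keys whose shard assignment differs between n and m servers.
moved :: Int -> Int -> [String] -> Int
moved n m keys = length [ k | k <- keys, shard n k /= shard m k ]

main :: IO ()
main = do
  let keys = [ "key" ++ show i | i <- [1 .. 1000 :: Int] ]
  putStrLn $ "keys remapped going from 3 to 4 servers: "
          ++ show (moved 3 4 keys) ++ " of 1000"
```

With a reasonably uniform hash, roughly three quarters of the keys land on a different server after the change - exactly the cache-trashing behaviour described above.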

Consistent Hashing is a hashing scheme devised to solve this problem. Adding or removing nodes does not require redistributing the entire key space - on average only K/N keys (K = number of keys, N = number of servers) need remapping. It's a really simple scheme to understand. First, create a structure called a hash ring: conceptually a circle spanning the full output range of the hash function, with each server placed at the point given by the hash of its identifier. To find the server for a request, hash the request parameters and walk clockwise around the circle until you meet the first server at or past your hash value, wrapping around to the start of the ring if necessary. It's explained in much better detail here.
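The walk-clockwise lookup falls out naturally from an ordered map. Here's a minimal sketch of the idea (not the post's actual code): the server names are placeholders, and a toy hash stands in for SHA1 to keep the example dependency-free:

```haskell
-- Minimal consistent-hash ring: servers keyed by their hash position.
import qualified Data.Map as M
import Data.Char (ord)

-- Illustrative stand-in hash; a real implementation would use SHA1 or similar.
hashKey :: String -> Int
hashKey = foldl (\h c -> (h * 31 + ord c) `mod` 1000003) 17

type Ring = M.Map Int String

-- Place each server on the ring at the hash of its identifier.
mkRing :: [String] -> Ring
mkRing servers = M.fromList [ (hashKey s, s) | s <- servers ]

-- Walk clockwise: the first server at or past the key's hash,
-- wrapping to the ring's smallest position if none is found.
lookupServer :: Ring -> String -> Maybe String
lookupServer ring key
  | M.null ring = Nothing
  | otherwise   = case M.lookupGE (hashKey key) ring of
      Just (_, s) -> Just s
      Nothing     -> Just (snd (M.findMin ring))

main :: IO ()
main = do
  let ring = mkRing ["redis-a", "redis-b", "redis-c"]
  mapM_ (\k -> putStrLn (k ++ " -> " ++ show (lookupServer ring k)))
        ["user:1", "user:2", "session:xyz"]
```

Real implementations usually place each server at many points on the ring ("virtual nodes") so the key space divides up more evenly, but the lookup logic stays the same.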

I've played with adding consistent hashing to the Haskell Redis libraries to get client-side sharding. The below shows my first naïve attempt. It uses a SHA1 library to do the hashing and also relies on a few bits and bobs from redis-haskell (my fork will eventually roll the consistent hashing up into this API).

This implementation was helped by the Ruby and Python implementations that both demonstrated how simple the algorithm is.

So how fast is this algorithm? Not that it's engineered for speed or anything, but it's interesting to see. Enter Criterion, which provides a powerful but simple way to measure the performance of Haskell code.

Getting this installed using cabal involves a small amount of jiggery-pokery due to the reliance on the GTK libraries. At least on Ubuntu these can be installed with sudo apt-get install libghc6-gtk-dev. After that it is as simple as cabal install criterion. Using Criterion simply involves writing a benchmark function.
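A benchmark function might look something like this. The function being measured here is a stand-in for the real target (the post benchmarks its consistent-hash lookup); `defaultMain`, `bench` and `whnf` are Criterion's actual entry points for timing a pure function applied to an argument:

```haskell
import Criterion.Main

-- Illustrative stand-in for the function under test.
hashKey :: String -> Int
hashKey = foldl (\h c -> h * 31 + fromEnum c) 17

main :: IO ()
main = defaultMain
  [ -- whnf evaluates the result to weak head normal form on each run,
    -- which is enough to force the work for a strict Int result.
    bench "hashKey/short" $ whnf hashKey "user:42"
  , bench "hashKey/long"  $ whnf hashKey (replicate 512 'x')
  ]
```

`whnf` (rather than applying the function directly) stops GHC from sharing the result across iterations, which would otherwise make everything look suspiciously fast.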

Once you've done this, compile with ghc -O --make Foo and away you go. Running the executable with "-h" gives you a list of available options. The easiest way to get started is ./test -t win -k win and look at the graphs that get generated. To answer my question, it seems like looking up a server to distribute to takes just over 25ns. Sounds fast enough!