Wednesday 24 February 2010

Consistent Hashing with Haskell

Let's say you want to distribute a bunch of requests to a number of servers. The simplest way to do this would be to get the key for the request (say the URL), generate a numeric hash and modulo it against the number of servers. This means that 1/N of your requests are going to each of your N servers - job done! Next you realize you don't have enough capacity with the servers and you need to add another - now you find that most of the requests you made to server A are now going to server B. The only way to get back to where you were (with caches nicely warmed up and so on) is to redistribute all the requests again. D'oh!

Consistent Hashing is a hashing scheme devised to solve this problem. Adding / removing nodes does not require redistributing the entire key space - in fact only K/N (K = number of keys, N = number of servers) keys need redistributing. It's a really simple scheme to understand. Firstly create a structure called a hash-ring; this is simply a circle with servers placed based on the hash of the server (from +N to -N). To find the server to distribute a request, hash the request parameters and walk clockwise around the circle until you meet the first server greater than your hash value. It's explained in much better detail here.

I've played with adding consistent hashing to the Haskell Redis libraries to get client-side sharding. The below shows my first naïve attempt. It's distributing using a SHA1 library to do the hashing and also relies on a few bits and bobs from redis-haskell (my fork will eventually roll up the consistent hashing into this api)

This implementation was helped by the Ruby and Python implementations that both demonstrated how simple the algorithm is.

So how fast is this algorithm? Not that it's engineered for speed or anything, but it's interesting to see. Enter Criterion which provides a powerful but simple way to measure the performance of Haskell code..

Getting this installed using cabal involves a small amount of jiggery-pokery due to the reliance on the GTK libraries. At least on Ubuntu these can be installed with sudo apt-get install libghc6-gtk-dev. After that it is as simple as cabal install criterion. Using criterion simply involves writing a bench mark function.

Once you've gone this, compile with ghc -O --make Foo and away you go. Running the executable with "-h" gives you a list of available options. Easiest way to get started is ./test -t win -t win and see the graphs that get started. To answer my question, it seems like looking up a server to distribute to takes just over 25ns. Sounds fast enough!

Wednesday 10 February 2010

Organizing functional code for parallel execution or, foldl and foldr considered slightly harmful

Functional programming is often touted as being better for multi-core programming due to the lack of mutability. However, most functional programming languages have the linked list at the centre of their data-structures, and this doesn't lend itself to parallelism.

Guy Steele's paper, Organizing functional code for parallel execution or, foldl and foldr considered slightly harmful [PDF] gives a very readable presentation of designing code for performance.

It advocates using trees instead of lists as the primary data structure. Trees naturally decompose into a structure easily processed by Map/reduce.

The paper introduces "conc lists" which is a way of implementing lists in terms of trees. In comparison with cons lists there are some trade-offs. Cons lists have first / rest as O(1) operations whereas in a tree structure these involve traversing the tree to the left-most leaf. The big win is on functions that traverse the entire list (such as length / map / filter / append) as these can now be processed in logarithmic time because of the opportunity for parallelism.

For example, map and filter can now be defined in terms of map reduce as shown below.

(define (mapreduce f g id xs)
(cond ((null? xs) id)
(singleton? xs) (f item xs)))
(else (split xs (lambda (ys zs)
(g (mapreduce f g id ys)
(mapreduce f g id zs))))))

(define (map f xs)
(mapreduce (lambda (x) (list (f x))) append '() xs))

(define (length xs)
(mapreduce (lambda (x) 1) + 0 xs))

The split function takes a tree, applies a function to both halves and puts it back together again. This gives an opportunity for parallelism which isn't present when working with a normal cons list.

Conc lists reminds me of Clojure's persistent vectors implementation. To my knowledge Clojure currently doesn't take advantage of these, but it's nice to know it could!