Consistent Hashing is a hashing scheme devised to solve this problem. Adding or removing nodes does not require redistributing the entire key space - on average only K/N (K = number of keys, N = number of servers) keys need redistributing. It's a really simple scheme to understand. Firstly create a structure called a hash-ring; this is simply a circle on which each server is placed at a position given by the hash of its identifier. To find the server to distribute a request to, hash the request parameters and walk clockwise around the circle until you meet the first server whose position is at or after your hash value, wrapping back to the start of the circle if necessary. It's explained in much better detail here.
I've played with adding consistent hashing to the Haskell Redis libraries to get client-side sharding. The code below shows my first naïve attempt. It distributes requests using a SHA1 library to do the hashing, and relies on a few bits and bobs from redis-haskell (my fork will eventually roll the consistent hashing up into this API).
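Stripped down, the core of the idea looks something like the sketch below (a simplified version: the ring is a Data.Map keyed by SHA1 hashes, plain server names stand in for real Redis connections, and the function names are my own):

import qualified Data.Map as M
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Digest.Pure.SHA (sha1, integerDigest)
import Data.Maybe (fromMaybe)

-- The ring maps positions on the circle to server names.
type Ring = M.Map Integer String

-- Hash a string onto the circle using SHA1.
hashString :: String -> Integer
hashString = integerDigest . sha1 . L.pack

-- Place each server on the ring at the hash of its name.
makeRing :: [String] -> Ring
makeRing servers = M.fromList [(hashString s, s) | s <- servers]

-- Walk clockwise from the key's position to the first server at or
-- after it, wrapping around to the start of the circle if needed.
lookupServer :: Ring -> String -> Maybe String
lookupServer ring key
  | M.null ring = Nothing
  | otherwise   = Just . snd . fromMaybe (M.findMin ring) $
                  M.lookupGE (hashString key) ring

A real version would place each server at several points on the ring (virtual nodes), so that removing one server spreads its keys across the remainder rather than dumping them all on a single neighbour.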
This implementation was guided by the Ruby and Python implementations, both of which demonstrate how simple the algorithm is.
So how fast is this algorithm? Not that it's engineered for speed or anything, but it's interesting to see. Enter Criterion, which "provides a powerful but simple way to measure the performance of Haskell code".
Getting this installed using cabal involves a small amount of jiggery-pokery due to the reliance on the GTK libraries. At least on Ubuntu these can be installed with

sudo apt-get install libghc6-gtk-dev

After that it is as simple as

cabal install criterion
Using Criterion simply involves writing a benchmark function.
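A benchmark for the ring lookup above might look something like this (a sketch using Criterion's whnf combinator; the server names are made up):

import Criterion.Main

-- makeRing and lookupServer are from the earlier sketch.
main :: IO ()
main = defaultMain
  [ bench "lookupServer" $ whnf (lookupServer ring) "user:1234" ]
  where
    ring = makeRing ["redis-a", "redis-b", "redis-c"]

Once you've done this, compile with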
ghc -O --make Foo
and away you go. Running the executable with "-h" gives you a list of the available options. The easiest way to get started is

./test -t win -k win
and see the graphs that get generated. To answer my question, it seems that looking up a server to distribute to takes just over 25ns. Sounds fast enough!