Saturday 12 June 2010

Orbit Simulator in Haskell

Uncle Bob recently wrote an article showing a simple orbital simulator in Clojure. Aside from the strange formatting, the code is pretty easy to follow, though there is large amounts of duplicate code (see vector.clj vs. position.clj). I guess the reasoning behind this is to try to stop mixing up units?

A phantom type is one way to avoid this duplication in Haskell. A phantom type is only ever used to construct types. Its sole purpose is for validation by the type checker. The example below uses phantom types to encode position, velocity and force. This means we have no code repetition for each of the vector operations.

When we do the physical calculations to move the objects around the phantom types do their job. As an example, if I try to add two unrelated vectors together I get a compile time error showing that I can't. Here's an example from a ghci session.

  Prelude Orbit> let a = Vec 0 0 :: Vec Force
Prelude Orbit> let b = Vec 1 1 :: Vec Position
Prelude Orbit> add a b

Couldn't match expected type `Force'
against inferred type `Position'
Expected type: Vec Force
Inferred type: Vec Position
In the second argument of `add', namely `b'
In the expression: add a b

The code to do the physical calculations is very simple. The record update syntax is new to me and means you can just change the selected fields in an object, rather than creating a new one and copying across all the fields (see the section on named fields).

I think collideAll looks substantially nicer than the Clojure implementation. From the comments on the blog, I suspect it still suffers from a period of inaccuracy if three objects collide simultaneously (they won't get merged in the first round, but almost certainly will in the second). nubBy is an incredibly useful function I hadn't seen before and it allows you to remove duplicates according to your own definition of equality.

The Clojure version has a bunch of unit tests (as you might expect given the author!). The tests are in the 1 + 1 = 2 style by which I mean give some inputs and validate an output. These are the sort of tests you might capture as part of a REPL transcript, but they give you very little assurance that the code is actually correct.

QuickCheck gives an incredibly powerful way of testing applications by making assertions about the code and challenging QuickCheck to falsify them with generated data. In the case of this simple physics model, we can make assertions like the total amount of energy in the system must be conserved and let QuickCheck worry about generating the cases where this doesn't hold.

In order to do this we first have to give QuickCheck a way of generator random objects to simulate. The Arbitrary type class defines generate, so we just need to make Object an instance of this type class and implement the method.

A few simple quick check properties are shown below. This asserts that for any given vector (apart from one with zero length), the magnitude is 1.0 (subject to a rounding error or two) and that for any given set of objects, the kinetic energy is conserved. This is a very simple example of how powerful QuickCheck can be.

Running Quick Check against this property shows that it holds for a large number of automatically generated test cases. This gives a strong hint that this code actually works.

The final task is to visualize this data. I used OpenGL again and knocked up a quick UI, the source code for which is here. The (terrible) video below gives an example of it in action. If anyone knows of a better way of capturing OpenGL I'd love to know. I'm currently using Istanbul which works, but brings my system to its knees.

The complete source for this is available on my git hub page. Any comments or suggested improvements greatly appreciated!

Finally, the company I work for is currently recruiting in Cambridge, UK. If you're the sort of person who is looking for challenges such as analysing petabytes of data, writing high performance servers , or developing the richest pure JavaScript interfaces imaginable (and a whole lot more in between) then drop me a note at jeff.foster AT and I'll give you more information. Oh, and if you're accepted you'll get your choice of an iPad or the latest Android phone.

Wednesday 2 June 2010

Monte Carlo Methods and the World Cup

A Monte Carlo method is a method of solving problems by statistical sampling. A typical approach to solving a Monte-Carlo problem is (according to Wikipedia):

  1. Define a domain of possible inputs.
  2. Generate inputs randomly from the domain using a certain specified probability distribution.
  3. Perform a deterministic computation using the inputs.
  4. Aggregate the results of the individual computations into the final result.

One of the simplest example of a Monte-Carlo method is estimating pi. The approach below hurls darts at a 1x1 square. Some of these land within the unit circle, and some outside. By getting the ratio between the two we can estimate the value of pi.

The accuracy of calculating pi depends on the number of samples taken. As you can see below it's not until very large amounts of samples are taken that the number even begins to approach pi.

10 3.2
100 3.08
1000 3.204
10000 3.142
100000 3.14072
1000000 3.143136
10000000 3.1407732

We can also apply the Monte Carlo method to sporting events. The 2010 World Cup is almost upon us. The current Fifa World Rankings make Brazil or Spain the overwhelming favourities. The draw (PDF) has an element of randomness though, so it's possible that two strong teams could find themselves meeting earlier and thus one of them going home. A Monte Carlo simulation sounds like the ideal way of settling this. Or at least, it's going to be just as wildly inaccurate as the pundits predictions!

The World cup consists of two stages. The group stage, where four teams play each other and the top two advance and the knockout stage in which the remaining teams play until the final is reached.

I've defined a class to represent the model that will be used to simulate the matches. For this example, I've simply based things on the Fifa world Rankings, but there are much more sophisticated techniques that could be used (whether these are demonstrably any more accurate or not, I'm not sure!).

The basic implementation of this type class is incredibly simple. It uses the world rankings and simply compares these to get a result. In the event of a draw, the home team is assumed to win! This could obviously be substantially "improved".

To simulate randomness, I've assumed that each time can play up to 30% better or worse than their ranking would suggest. This means that Brazil will always beat New Zealand, whereas teams that are closer (England) may have some chance!

The code below simulates the world cup - hopefully it makes sense as I've tried to be as self-documenting as possible! rules gives the knockout stage structure - it keeps getting "folded in half" as teams play each other and hopefully accurately represents the real fixtures of the world cup.

So who's going to win the World Cup according to the simulation? I ran for 100K simulations and got the following results:

  • France (124)
  • Argentina (365)
  • Greece (7)
  • England (239)
  • USA (4)
  • Germany (577)
  • Serbia (1)
  • Netherlands (3706)
  • Italy (2240)
  • Brazil (48025)
  • Portugal (5249)
  • Spain (39460)
  • Chile (3)

    Unsurprisingly, this still has Brazil coming up winners almost half the time. The number of times each time wins is mostly related to the ranking (as you'd suspect). Changing the simulation slightly so that teams draw in the group stage if they have a rating within 25 points changes the results a little, but they are still broadly in line with the above.

    Still, doesn't matter what the simulation says! England will still march-forward past the group stage and then lose in "unlucky" circumstances (penalties, that goal, penalties again).