Wednesday, 2 June 2010

Monte Carlo Methods and the World Cup

A Monte Carlo method is a method of solving problems by statistical sampling. A typical approach to solving a Monte-Carlo problem is (according to Wikipedia):

  1. Define a domain of possible inputs.
  2. Generate inputs randomly from the domain using a certain specified probability distribution.
  3. Perform a deterministic computation using the inputs.
  4. Aggregate the results of the individual computations into the final result.

One of the simplest example of a Monte-Carlo method is estimating pi. The approach below hurls darts at a 1x1 square. Some of these land within the unit circle, and some outside. By getting the ratio between the two we can estimate the value of pi.

The accuracy of calculating pi depends on the number of samples taken. As you can see below it's not until very large amounts of samples are taken that the number even begins to approach pi.

10 3.2
100 3.08
1000 3.204
10000 3.142
100000 3.14072
1000000 3.143136
10000000 3.1407732

We can also apply the Monte Carlo method to sporting events. The 2010 World Cup is almost upon us. The current Fifa World Rankings make Brazil or Spain the overwhelming favourities. The draw (PDF) has an element of randomness though, so it's possible that two strong teams could find themselves meeting earlier and thus one of them going home. A Monte Carlo simulation sounds like the ideal way of settling this. Or at least, it's going to be just as wildly inaccurate as the pundits predictions!

The World cup consists of two stages. The group stage, where four teams play each other and the top two advance and the knockout stage in which the remaining teams play until the final is reached.

I've defined a class to represent the model that will be used to simulate the matches. For this example, I've simply based things on the Fifa world Rankings, but there are much more sophisticated techniques that could be used (whether these are demonstrably any more accurate or not, I'm not sure!).

The basic implementation of this type class is incredibly simple. It uses the world rankings and simply compares these to get a result. In the event of a draw, the home team is assumed to win! This could obviously be substantially "improved".

To simulate randomness, I've assumed that each time can play up to 30% better or worse than their ranking would suggest. This means that Brazil will always beat New Zealand, whereas teams that are closer (England) may have some chance!

The code below simulates the world cup - hopefully it makes sense as I've tried to be as self-documenting as possible! rules gives the knockout stage structure - it keeps getting "folded in half" as teams play each other and hopefully accurately represents the real fixtures of the world cup.

So who's going to win the World Cup according to the simulation? I ran for 100K simulations and got the following results:

  • France (124)
  • Argentina (365)
  • Greece (7)
  • England (239)
  • USA (4)
  • Germany (577)
  • Serbia (1)
  • Netherlands (3706)
  • Italy (2240)
  • Brazil (48025)
  • Portugal (5249)
  • Spain (39460)
  • Chile (3)

    Unsurprisingly, this still has Brazil coming up winners almost half the time. The number of times each time wins is mostly related to the ranking (as you'd suspect). Changing the simulation slightly so that teams draw in the group stage if they have a rating within 25 points changes the results a little, but they are still broadly in line with the above.

    Still, doesn't matter what the simulation says! England will still march-forward past the group stage and then lose in "unlucky" circumstances (penalties, that goal, penalties again).