Thursday, 15 July 2010

Foreign Exchange Arbitrage

Arbitrage exploits a price difference between two or more markets to make money with zero risk.

Sporting bets are perhaps the easiest example. Given a sporting event with two possible outcomes you look for an opportunity where two book makers give slightly different odds and try to expoit it. For example, consider a case with two outcomes, say Federer vs. Nadal. One bookie offers odds of 5/2 on Federer, whereas another offers odds of 3/5 on Nadal. If I bet £32 on 5/2, and £68 at 3/5 then regardless of the outcome I'm guaranteed to make a little money (a minimum of 8% in this case. £32 at 5/2 odds gives me $112 and £68 at 3/5 gives me £108). Wikipedia's page on arbitrage betting explains it better than I can!

Foreign exchange (forex) markets are another opportunity for the arbitrageur. By finding mispriced currencies you can get money for nothing. For example I could transfer my money from GBP to USD via CHF and end up with a profit if someone got these exchange rates wrong! Now all I need to do is find a way of finding these opportunities and I'll be rich!



The exchange rates can be represented as a weighted graph with each node representing a currency and each directed edge representing the exchange rate. To find an arbitrage opportunity we need to find a path through the graph (starting and ending at the same node) where the product of edge weights is greater than 1.0. The shortest path is the best one, because we want to make as few trades as possible. The Floyd Warshall algorithm is an efficient algorithm for finding the shortest paths in a weighted graph.

In the graph below we can make a trade 1 -> 2 -> 4 -> 3 -> 1 and make a profit.



Floyd Warshall is an example of dynamic programming. A dynamic programming solution has three components (from The Algorithm Design Manual):

  1. A recurrence relation or recursive algorithm for expressing the answer
  2. Show that the recursive version is bounded by a small polynomial
  3. An order of evaluation for the reccurence that means you always have partial results available when you need them.


We can define the arbitrage problem recursively. First we label the nodes as being number from 1 to N. The shortestPath(i,j,k) gives the shortest path from i to j using only nodes 1 to k as intermediate nodes. The base case is shortestPath(i,j,0) where the value is simply the edge weight between i and j. The recursive relation is shortestPath(i,j,k) = min(shortestPath(i,j,k-1), shortestPath(i,k,k-1) + shortestPath(k,j,k-1). The recurrence relation just says that each new node that we consider only helps if there is a shortest path that goes through it.

In order to code this up in Haskell we'll need some representation of a graph. A graph is simply a collection of vertices and some edges. In order to make things easier I've said that each vertice should be an enum and have an integer representation.



I've needed two language extensions (multi parameter type classes and functional dependencies) which seems overkill! Functional dependencies is used in Graph a b | a -> b to state that b is uniquely defined by a. Any suggestions on the simpler solution would be greatly appreciated!

Dynamic programming solutions can be expressed very neatly in lazy functional programming languages like Haskell. See here for more information, but the basic idea is to define a data structure in terms of itself. If we initialize the data structure in the correct order, then all the previous results are available when they are needed. This makes for a concise solution to the problem.



The Floyd Warshall algorithm has been extended a little above to find the minimum solution and to provide path reconstruction information. Once we have the path reconstruction matrix we can work backwards to reconstruct the series of trades that lead to an arbitrage opportunity. There's only an opportunity if there is a loop from a starting currency that returns with a value greater than 1.



So, I've now got the ability to find an arbitrage path if one exists, now all I need to do is hook my computer up to a live feed of exchange rates, put a little money in and I'll be rich by the end of the day. Just in case it's not quite that easy, I thought I'd better run it on some test data!

There's a good source of test data at GAIN Capital that's available here. I downloaded January's historial data and after a while ended up with 1.3GB of data to sort though. The data is in a simple CSV format, with columns representing the ticket, currencies, data time, bid rate and ask rate. This gave me an opportunity to use Parsec which is apparently an industrial strength, monadic parser combinator library. Thanks to RWH Chapter 16 it was pretty simple to use and understand.



Now that I've got a way of reading in the historical data in, I simply need to convert this data into a form that the Floyd Warshall algorithm can use and look for paths that give a profit. I could have done that directly with the parser code, but I might want to reuse that again for some other cunning plan. Instead I'll convert to a simple map representation, with the keys being pairs of vertices and the values being the exchange rate between those values. The following code takes a list of foreign exchange records, converts them into exchanges and looks for arbitrage opportunities.



Assuming that I've understood the format properly (anyone know any good books to get this kind of basic information?), then running this code through the 21 million (!) ticker updates in January 2010 yields absolutely ZERO arbitrage opportunities. With my limited knowledge, I think this is because of the efficient-market hypothesis. Prices reflect the information available and the updates are incredibly quick (otherwise currency traders would be going bankrupt on a daily basis!).

Back to the drawing board with my money making schemes!



(image taken from Flickr)

Saturday, 12 June 2010

Orbit Simulator in Haskell

Uncle Bob recently wrote an article showing a simple orbital simulator in Clojure. Aside from the strange formatting, the code is pretty easy to follow, though there is large amounts of duplicate code (see vector.clj vs. position.clj). I guess the reasoning behind this is to try to stop mixing up units?

A phantom type is one way to avoid this duplication in Haskell. A phantom type is only ever used to construct types. Its sole purpose is for validation by the type checker. The example below uses phantom types to encode position, velocity and force. This means we have no code repetition for each of the vector operations.



When we do the physical calculations to move the objects around the phantom types do their job. As an example, if I try to add two unrelated vectors together I get a compile time error showing that I can't. Here's an example from a ghci session.

  Prelude Orbit> let a = Vec 0 0 :: Vec Force
Prelude Orbit> let b = Vec 1 1 :: Vec Position
Prelude Orbit> add a b

:1:6:
Couldn't match expected type `Force'
against inferred type `Position'
Expected type: Vec Force
Inferred type: Vec Position
In the second argument of `add', namely `b'
In the expression: add a b


The code to do the physical calculations is very simple. The record update syntax is new to me and means you can just change the selected fields in an object, rather than creating a new one and copying across all the fields (see the section on named fields).

I think collideAll looks substantially nicer than the Clojure implementation. From the comments on the blog, I suspect it still suffers from a period of inaccuracy if three objects collide simultaneously (they won't get merged in the first round, but almost certainly will in the second). nubBy is an incredibly useful function I hadn't seen before and it allows you to remove duplicates according to your own definition of equality.



The Clojure version has a bunch of unit tests (as you might expect given the author!). The tests are in the 1 + 1 = 2 style by which I mean give some inputs and validate an output. These are the sort of tests you might capture as part of a REPL transcript, but they give you very little assurance that the code is actually correct.

QuickCheck gives an incredibly powerful way of testing applications by making assertions about the code and challenging QuickCheck to falsify them with generated data. In the case of this simple physics model, we can make assertions like the total amount of energy in the system must be conserved and let QuickCheck worry about generating the cases where this doesn't hold.

In order to do this we first have to give QuickCheck a way of generator random objects to simulate. The Arbitrary type class defines generate, so we just need to make Object an instance of this type class and implement the method.

A few simple quick check properties are shown below. This asserts that for any given vector (apart from one with zero length), the magnitude is 1.0 (subject to a rounding error or two) and that for any given set of objects, the kinetic energy is conserved. This is a very simple example of how powerful QuickCheck can be.



Running Quick Check against this property shows that it holds for a large number of automatically generated test cases. This gives a strong hint that this code actually works.

The final task is to visualize this data. I used OpenGL again and knocked up a quick UI, the source code for which is here. The (terrible) video below gives an example of it in action. If anyone knows of a better way of capturing OpenGL I'd love to know. I'm currently using Istanbul which works, but brings my system to its knees.



The complete source for this is available on my git hub page. Any comments or suggested improvements greatly appreciated!

Finally, the company I work for is currently recruiting in Cambridge, UK. If you're the sort of person who is looking for challenges such as analysing petabytes of data, writing high performance servers , or developing the richest pure JavaScript interfaces imaginable (and a whole lot more in between) then drop me a note at jeff.foster AT acm.org and I'll give you more information. Oh, and if you're accepted you'll get your choice of an iPad or the latest Android phone.

Wednesday, 2 June 2010

Monte Carlo Methods and the World Cup

A Monte Carlo method is a method of solving problems by statistical sampling. A typical approach to solving a Monte-Carlo problem is (according to Wikipedia):

  1. Define a domain of possible inputs.
  2. Generate inputs randomly from the domain using a certain specified probability distribution.
  3. Perform a deterministic computation using the inputs.
  4. Aggregate the results of the individual computations into the final result.



One of the simplest example of a Monte-Carlo method is estimating pi. The approach below hurls darts at a 1x1 square. Some of these land within the unit circle, and some outside. By getting the ratio between the two we can estimate the value of pi.



The accuracy of calculating pi depends on the number of samples taken. As you can see below it's not until very large amounts of samples are taken that the number even begins to approach pi.


10 3.2
100 3.08
1000 3.204
10000 3.142
100000 3.14072
1000000 3.143136
10000000 3.1407732


We can also apply the Monte Carlo method to sporting events. The 2010 World Cup is almost upon us. The current Fifa World Rankings make Brazil or Spain the overwhelming favourities. The draw (PDF) has an element of randomness though, so it's possible that two strong teams could find themselves meeting earlier and thus one of them going home. A Monte Carlo simulation sounds like the ideal way of settling this. Or at least, it's going to be just as wildly inaccurate as the pundits predictions!



The World cup consists of two stages. The group stage, where four teams play each other and the top two advance and the knockout stage in which the remaining teams play until the final is reached.



I've defined a class to represent the model that will be used to simulate the matches. For this example, I've simply based things on the Fifa world Rankings, but there are much more sophisticated techniques that could be used (whether these are demonstrably any more accurate or not, I'm not sure!).

The basic implementation of this type class is incredibly simple. It uses the world rankings and simply compares these to get a result. In the event of a draw, the home team is assumed to win! This could obviously be substantially "improved".



To simulate randomness, I've assumed that each time can play up to 30% better or worse than their ranking would suggest. This means that Brazil will always beat New Zealand, whereas teams that are closer (England) may have some chance!

The code below simulates the world cup - hopefully it makes sense as I've tried to be as self-documenting as possible! rules gives the knockout stage structure - it keeps getting "folded in half" as teams play each other and hopefully accurately represents the real fixtures of the world cup.



So who's going to win the World Cup according to the simulation? I ran for 100K simulations and got the following results:

  • France (124)
  • Argentina (365)
  • Greece (7)
  • England (239)
  • USA (4)
  • Germany (577)
  • Serbia (1)
  • Netherlands (3706)
  • Italy (2240)
  • Brazil (48025)
  • Portugal (5249)
  • Spain (39460)
  • Chile (3)

    Unsurprisingly, this still has Brazil coming up winners almost half the time. The number of times each time wins is mostly related to the ranking (as you'd suspect). Changing the simulation slightly so that teams draw in the group stage if they have a rating within 25 points changes the results a little, but they are still broadly in line with the above.

    Still, doesn't matter what the simulation says! England will still march-forward past the group stage and then lose in "unlucky" circumstances (penalties, that goal, penalties again).