Sunday 29 August 2010

Speeding up the Ants program

In my previous post, I looked at implementing Ants in Haskell. As comments on the blog indicated, things started off OK, but performance rapidly got worse. This is the first time I've had this kind of problem before, so I thought I'd document how I started to solve it.

Haskell has a huge range of performance testing tools to help find out what goes wrong. Real World Haskell has a greater chapter on "Profiling and Optimization" that helps identify the tools available.

The first tool in the chain is simply to run the application and collect some basic statistics about the runtime. You can do this with the command ./AntsVis +RTS -sstderr. Anything after the "+RTS" is a Haskell runtime parameter. This produced the following output:


1,835,837,744 bytes allocated in the heap
328,944,448 bytes copied during GC
2,908,728 bytes maximum residency (25 sample(s))
142,056 bytes maximum slop
9 MB total memory in use (1 MB lost due to fragmentation)

Generation 0: 3483 collections, 0 parallel, 1.54s, 1.54s elapsed
Generation 1: 25 collections, 0 parallel, 0.09s, 0.07s elapsed

INIT time 0.00s ( 0.00s elapsed)
MUT time 3.04s ( 3.13s elapsed)
GC time 1.63s ( 1.61s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 4.67s ( 4.74s elapsed)

%GC time 34.9% (34.0% elapsed)

Alloc rate 603,893,994 bytes per MUT second

Productivity 65.1% of total user, 64.2% of total elapsed


Ouch... The program spent 35% of the time doing garbage collection and only 65% of the time doing useful stuff. It also allocated 603MB of memory / second! That's clearly not acceptable and I'm definitely allocating more memory than I should!

I lack the intuition to understand where the problems are from looking at the code, so time to break out the profiling program. GHC gives quite a few compiler options, but for profiling these seem to be the important ones:
  • -prof - Enables profiling
  • -caf-all - Constant Applicative form for all top-level items (constant costs, one for each module.)
  • -auto-all - Cost-centre analysis for every top-level function
After you compile with those functions, you can run the program again with ./AntsVis +RTS -hc -p to get heap information together with profiling. That gives me the following picture. Flat things are good, things going up at a steep gradient and bad.

First Run of the Profiling Tool
First Run of the Profiling Tool

As you might be able to see from the image (but click to see the full one) the finger is pointed squarely at the chain of functions involving updateTVar and evaporate, the code for which is shown below:



So how come this tiny inoffensive lump of code is allocating a fuck-ton of memory? The blame lies solely in the record update function (c {pheromone = pheromone c * evapRate}). Laziness means that this isn't evaluated fully, which means it keeps a reference to the previous value of the cell. I'm not interested in the value, but because there is a reference to it, the runtime can't though the value away. Time for some strictness annotations. Note that I've also done a few refactorings (the World data type was just a wrapper around Vector, so I got rid of the wrapper, and typing pheromone in 100 times was driving me nuts).



The seq primitive is of type a -> b -> b and evaluates the first argument to head-normal form and returns the second. Hopefully, this should force the evaluation of the cells and free the memory up immediately. Running with the profiler again gives a much better picture:
A little bit of strictness added

Much better! Maximum memory usage is significantly down and when I run it my CPU usage is significantly down. seq isn't a panacea though. Earlier attempts at adding bang patterns everywhere led me astray - if you make everything strict, then you can get unintended consequences (I was having problems with fromJust being called on Nothing - not entirely sure why). I think I'll stick to putting seq strict annotations based on the information from the profiling tools, at least I until I have a better intuition for how it will affect the final result.

Updates thanks to very helpful comments from augustss, don and gtllama!

seq is not the right way to achieve the strictness, see comments from Don Stewart about funbox-strict-fields and ! patterns for strictness.

I've removed the Criterion graphs as I don't think the way I generated them was reliable.  I believe my use of "seq" was completely bogus and laziness bit me. The faster times almost certainly came from not evaluating the expression correctly once seq was in place.  Rather embarrassing!. From the comments, "Computing a value of type STM () does not execute it. STM code is only executed if it "meets up with" an 'atomically' that is executed."

I also go to the bottom of the slowness updating. Most definitely not an STM bug. The problem occured because I was trying to do a single STM transaction to do evaporation - this meant that evaporation could only succeed if nothing else wrote over the pheromones used in the evaporation.  Since many ants are marching and dropping pheromones this seems to mean that this transaction would need to be retried many times.  Switching this over to use a sequence of atomic operations (one per cell) removed this problem. Performance is now consistent and when I run with profiling I get a flat heap output as shown below:

Flat as a Pancake
This was definitely the most complicated Haskell program I've attempted so far, but also the one I've learnt the most from!  Probably still some more mistakes in the code to find, but I've pushed the latest code to the repository.

Friday 27 August 2010

Ants and Haskell

Software Transactional Memory (STM) is a concurrency control mechanism designed to simplify programming for shared memory computers. Beautiful Code contains a great introduction to the concepts of STM in the Haskell language

In most languages, locks and condition variables are the main mechanisms for controlling access, but these are notoriously hard to get right. Java Concurrency in Practice is a good read to understand just how many ways there are to shoot yourself in the foot (too few locks, too many locks, wrong locks, race conditions, wrong order, error conditions, deadlock, livelock, live stock, brain explosion). STM simplifies shared memory programming by providing database like semantics for changing memory. Reads/Writes to shared memory happen within a transaction - each memory access appears to happens in isolation from the others and appears atomically to observers. If a transaction conflict with another, then one of the transactions is retried. An implementation typically records the memory accesses somehow and then can decide whether there was a conflict. Languages that restrict mutability (like Clojure and Haskell) have a significantly simpler implementation than imperative languages such as C/C++.

Composability is another advantage for STM. For example, take java.util.Hashtable - what if you want to do an insert/delete as a single atomic operation and only make the contents visible to other threads once finished? As the original design didn't do this, you're on your own. In contrast STM composes well.

Both Clojure and Haskell feature support for STM. The canonical example in Clojure is Ants.clj that demonstrates STM via a simple simulation of foraging ants (see also Flocking about with Clojure). As a learning exercise I thought it'd be neat to try to convert this over to Haskell!

Ants in Haskell

To model the ants world, I use the following data structures. Transactional variables (TVars) are used to hold a reference to a mutable variable. For the Ants simulation, I use a Vector of TCell's to represent the ants world.



TVars can only be modified within the STM context. The key thing is that the only way to mutate transactional variables is from within the STM monad. To fiddle with the variables within TVar you can use newTVar, readTVar and writeTVar. Oddly there didn't seem to be a primitive operation to update a TVar based on its current value. updateTVar updates the TVar by applying a function to the value inside.



check verifies that a condition is true and if it isn't true then the transaction is retried. The key point is that the transaction is only retried when there's a reason to do so (e.g. memory read/write) so you aren't just heating the CPU whilst the condition is being validated. As an example, when we move an ant forward, we want to check that there is not an ant in the way. If there is an ant in the way, we'll wait till the coast is clear before moving.



At some point you have to run your STM actions. atomically runs the STM transactions from the IO monad (e.g. the top-level) and returns the result. A very important point is that you want your actions to be in the STM monad as much as possible. If all of your functions are run within the IO monad then you lose the composability aspect. The pattern to use is make everything use the STM monad and glue together randomly and you won't have a threading problem (you still have other problems though, it's not magic).

The Clojure code used agents to represent each ant. I'm not sure what the most idiomatic translation to Haskell is, but I spawned a thread for each ant using Control.Concurrent and forkIO. Haskell threads are incredibly light-weight so spawning even thousands of them is not a problem. Each ant thread simply evaluates the behaviour, moves, sleeps and repeats.

The rest of the code is more or less a direction translation from the Clojure. It's pretty verbose so I won't bother posting it here, but the full code is on my github page. You should be able to compile it with ghc -lglut --make -main-is AntsVis AntsVis.hs. Any hints on how to make it suck less appreciated!

Performance seems very good with the default compiler options, I'm able to run with 100+ ant agents all running concurrently.  The programming is very simple and once I'd found out and added the appropriate check logic into move everything worked properly.

Hurrah for simple concurrent programming!

(update 30/8/2010 - after finding out the performance sucked after a while with a few more ants than I'd tested with I looked at speeding up the Ants program).

Tuesday 17 August 2010

Freebasing with Haskell

Freebase is a collection of structured data, queryable via a powerful API and recently acquired by Google.

Data is categorized according to different types. A Topic represents a thing (such as a physical entity, a substance or a building). Each Topic is associated with a number of Types, for example the Salvador Dali topic might be associated with Painting and Surrealism. A group of related Types are organized into a Domain. For example, the books domain contains types for book, literature subject and publisher. Domains, Types and Properties are further organized into namespaces which can be thought of top-level containers to reference items. For example, the /en namespace contains human-readable IDs and URLs for popular topics. Similarly, the /wikipedia/en namespace gives a key that can be used to formulate a URL representing the corresponding article on Wikipedia. This extra structure allows precise concepts to be expressed, with no ambiguity.

FreeBase has a number of web services you can use to interrogate the database. The web services accept and return JSON. The most basic services just give you information about the status and version in JSON format. The Text.JSON package provides a method for reading / writing JSON in Haskell. This, coupled with Network.HTTP, makes making basic requests very simple.



More advanced requests use the MQL to express queries to Freebase. The underlying database is best thought of as a directed graph of nodes and relationships. Each node has a unique identifier and a record of who created the node. A node has a number of outgoing edges which are either relationships between other nodes of a primitive value. For example the /en/iggy_pop node might be linked to the /en/the_passenger/ via the >/music/album/artist property. Relationships between nodes can have property values associated with them, for example /en/the_passenger might be linked to /music/track/length with 4:44.

The Query Editor provides a way of running these queries in a web browser and this, together with the Schema Explorer allow you to get explore the data available in Freebase. Queries are expressed as JSON and consist of a JSON object from which the blanks are filled in. As an example, if I submit the following query with a blank array, then the returned value has the null filled in with the correct details (apparently Indiana Jones was released 23rd May 1984).


{
query: {
type: "/film/film"
name: 'Indiana Jones and the Temple of Doom',
initial_release_date: null
}
}


The Freebase documentation gives an example of a basic web app built using PHP and I thought it'd be a good learning exercise to convert this over to a functional programming language. Haskell has quite a few web frameworks, including Snap, HappStack and Yesod.

Yesod had the strangest name, so it seemed like the logical choice. Yesod uses some extensions to Haskell, Type Families, quasi-quoting and Template Haskell. Quasi-quoting and TH are particular exciting as they allow a HAML like syntax to be used as statically compiled Haskell.

Each Yesod application has a site type that is passed to all functions and contains arguments applicable to the whole site, such as database connections and global settings. URLs are handled by a routing table which, again, is statically compiled and means you can not create an invalid internal link. Neat!

For the basic album lister app, we define three really simple routes. A home page, a URL to call to get the albums, and a link to the static files. Note that the static files is again checked at compile time. If the relevant JS/CSS files don't exist, then the application fails to compile!

The getHomeR route handler (which ends in capital R by convention) simpler just renders a Hamlet template.



A little bit of JavaScript in script.js makes an Ajax call to retrieve the list of bands.



And the basic example from the MQLRead documentation is converted over. I've only spent a few hours with Yesod but I'm definitely impressed so far, good documentation and easy to get something simple done! The complete code is on my github page

Friday 13 August 2010

A Brief History of Java

In 1995, Sun released the The Java programming language as a part of a broader strategy known as the Java platform. The "write once, run anywhere" (WORA) motto initially promised to make Java the everywhere language, running on everything from wrist watches to cell phones and laptops to supercomputers.

JBlend Watch

The initial reception of Java was mixed. For every Java sucks, an evangelist promised that it would change the world, and Java powered toasters were just around the corner.

As Java evolved, it accrued more baggage as it fought to keep WORA. Deprecated methods sprang up everywhere, but Sun had to keep these features in place to provide backwards compatibility. The java.util.DateTime package became synonymous with dysfunctional design and bad naming conventions were fixed for life (is it size or is it length?).

Microsoft's .NET began to encroach on Java's domain. Microsoft's team introduced delegates, a type-safe function pointer that makes event handling considerably simpler. Java needed a competitor, and fast, so in version 1.1, Java grew inner classes, a way of achieving a similar effect, but in a more limited, laboured fashion. A Java white-paper concluded "Bound method references are simply unnecessary.... They detract from the simplicity and unity of the Java language". Meanwhile, efforts continue to this day for Java to adopt bound method references.

When Java reached version 1.4, Sun decided that a new approach was required to compete with Microsoft's .NET strategy. Sun thought long and hard and branded version 1.4 as "5" in an attempt to move ahead of .NET 2.0

This was accompanied by a decision to implement generics, a way of achieving additional type safety. Unfortunately, type-safety came at the cost of typing. As engineers adopted generics they were often heard cursing as they bashed out yet another List<Foo> foos = new ArrayList<Foo> statement.



Universities quickly adopted Java; no longer did students have to learn the intricacies of manual memory management and pointers, instead they could rely on Java to do the heavy lifting and focus on solving problems. Unfortunately, this led to the production of a league of developers known as the "Patternistas" who only had a hammer and everything was a nail. Under their leadership, Java naming conventions became increasingly ridiculous. When class names such as RequestProcessorFactoryFactory became common place some developers began to question the wisdom of the infinite tower of abstraction.

As developers realized they were just shuffling thousands of lines of code around each day, they needed a word to justify their existence. That word was refactoring. The patternistas rejoiced; not only could they apply factory factories, singletons and visitors to solve problems, but they could repeatedly change their mind and justify it with a buzzword!

A whole industry evolved to satisfy the patternistas. RSI injuries were becoming increasingly common place amongst Java veterans so a new breed of development environment was built. IntelliJ and Eclipse were built with the aim of minimizing the damage to developers. Advanced code completion and refactoring meant that developers merely had to press key combinations instead of typing in verbose code constructs for missing language features.

If all you have is a hammer, everything looks like a nail

Java began the push for the Enterprise by employing some of the leading architecture astronauts to come up with a paradigm-shifting, synergistic approach for empowering enterprise software. The result was nothing short of a revolution; beans were born. Beans are apparently a server-side component architecture for the modular constructor of enterprise applications.

Having softened up resistance to angle brackets through the use of generics, Java moved forward and jumped on the XML bandwagon. By using XML, developers were able to express concise concepts as huge, verbose angular nightmares. This has the advantage that XML files (unlike other files) can be easily read by other computers. The small price of being unreadable for humans was felt to be worth paying. Services such as Ant and JBoss led the way with executable XML and powerful deployment descriptors.

Meanwhile, in a galaxy far away from architecture astronauts, a new breed of programmer decided that getting shit done was more important than typing shit all day long. This led to the birth of frameworks such as Rails which are designed to get out-of-the-way and let you solve problems, an approach popularized as convention over configuration. Rails received positive feedback and at least some former Java addicts kicked the habit for Ruby. The first signs of Java's grip loosening were becoming apparent.

In August 2006 the Java 7 project began. Many developers are pushing for a feature known as lambda expressions that would undoubtably simplify many of the common coding tasks that Java makes so painful. Unfortunately, four years later the Java committee is still arguing over the nuances of this feature and it's possible that this will be dropped. The lack of movement on Java 7 has led to the birth of new, sexier languages such as Clojure and Scala, designed to run within the Java ecosystem, but without the need for the language itself.

The final nail in the coffin started being hammered in April 2009 when Oracle announced plans to acquire Sun. Headed by "Larry, Prince of Darkness", Oracle is an acquisition machine that specializes in Enterprise Software and making money. When Oracle's lawyers uncovered a set of software patents, they picked a big target and started a fight. Targets come no bigger than Google (a leading advertising organization), so Oracle's lawyers pounced and battle has commenced.

Where does this leave Java? From its humble beginnings 15 years ago, Java has risen to the top of the pile of popular programming languages under Sun's stewardship. Under Oracle, it's unclear whether the next version of Java will ever get released, let alone contain the features developers crave. Is this the beginning of the end for Java?

Saturday 7 August 2010

Sniffing out Pacman

How would you write the AI for ghosts in Pac-Man? Your first thought is probably to centralize the logic in a ghost object/function and go from there. This approach has quite a few traps, and a lot of complexity, for example how do you deal with dead-ends? Eventually the algorithm is going to get pretty complicated (something like A* search)

The excellent paper Programming Anti-Objects shows a very simple way to solve this problem with a bit of lateral thinking. Instead of centralizing the logic for ghost behaviour, why not distribute it? Reconceptualizing the problem in this way brings the background objects to the fore - the behaviour does not live in the ghost; it lives in the background tiles. The simple approach described in the paper is based on diffusion - each target has a scent that is distributed via the background tiles. Complex behaviour can emerge because some tiles distribute the scent (paths) whereas some block scent (obstacles).

I've represented the tiles as a map from Point to a list of Agents. The list is really a stack where the top-most agent is the only one that really matters. I've assumed that the tiles are arranged in a square, hence the one size variable and I keep the current location of the pursuers and goal handy.



The algorithm consists of two stages; diffusing the scent over the grid followed by updating the pursuers using a simple hill-climbing approach (e.g. move to the surrounding tile with the highest scent).

I've looked at the diffusion algorithm before in Barely Functional Fluid Dynamics, but I implemented it in an imperative style that makes it complicated to understand. This time around, I looked at a purely functional solution. Calculating the diffusion for any cell is a matter of doing some simple calculations on he four cells around it (also known as the Von-Neumann neighbourhood). The equation below is taken from the paper:

Diffusion Equation from Colloborative Diffusion - Programming AntiObjects

One way to solve this in Haskell would be to use some mutable state and just write some imperative-style code to mutate it in place. This would probably give the best performance, but at the cost of making the code destructive and difficult to follow. Alternatively, we can express the whole thing functionally. When I looked at the Floyd Warshall algorithm, it made use of dynamic programming and in particular defining a data structure recursively. We can use a similar trick to define updates to the grid in terms of previously calculated values.

Grid Diffusion Image

In the image above, if we are calculating the diffusion value for the red-tile, then we can reference the "new" diffused values for the green tiles, but we still must use the original values for the blue tiles.



Ignoring the update pursuers function, we update the board by constructing a map based on the previous environment and itself.

Updating the pursuers is simple, just find the nearest path cell with the highest scent and move to it. Putting this all together and we can run some little tests. The example below shows a pursuer (in green) solving a little maze. It takes a while to get going whilst the scent is diffused, but once it is it homes in on a solution very quickly.



As explained in the paper, collaborative behaviour emerges because pursuers block the scent from the goal. This means if you have multiple paths to the goal, the pursuers will each choose a different route purely because once one pursuer is on a route it blocks out the scent for those that follow. In the example scenario below, the agents will never take the same route and will always ensure there's no escape for the target.



The code for this is available on my github page together with a simple GUI that allows you to build an environment and run the simulation.