Tuesday 17 August 2010

Freebasing with Haskell

Freebase is a collection of structured data, queryable via a powerful API and recently acquired by Google.

Data is categorized according to different types. A Topic represents a thing (such as a physical entity, a substance or a building). Each Topic is associated with a number of Types, for example the Salvador Dali topic might be associated with Painting and Surrealism. A group of related Types are organized into a Domain. For example, the books domain contains types for book, literature subject and publisher. Domains, Types and Properties are further organized into namespaces which can be thought of top-level containers to reference items. For example, the /en namespace contains human-readable IDs and URLs for popular topics. Similarly, the /wikipedia/en namespace gives a key that can be used to formulate a URL representing the corresponding article on Wikipedia. This extra structure allows precise concepts to be expressed, with no ambiguity.

FreeBase has a number of web services you can use to interrogate the database. The web services accept and return JSON. The most basic services just give you information about the status and version in JSON format. The Text.JSON package provides a method for reading / writing JSON in Haskell. This, coupled with Network.HTTP, makes making basic requests very simple.

More advanced requests use the MQL to express queries to Freebase. The underlying database is best thought of as a directed graph of nodes and relationships. Each node has a unique identifier and a record of who created the node. A node has a number of outgoing edges which are either relationships between other nodes of a primitive value. For example the /en/iggy_pop node might be linked to the /en/the_passenger/ via the >/music/album/artist property. Relationships between nodes can have property values associated with them, for example /en/the_passenger might be linked to /music/track/length with 4:44.

The Query Editor provides a way of running these queries in a web browser and this, together with the Schema Explorer allow you to get explore the data available in Freebase. Queries are expressed as JSON and consist of a JSON object from which the blanks are filled in. As an example, if I submit the following query with a blank array, then the returned value has the null filled in with the correct details (apparently Indiana Jones was released 23rd May 1984).

query: {
type: "/film/film"
name: 'Indiana Jones and the Temple of Doom',
initial_release_date: null

The Freebase documentation gives an example of a basic web app built using PHP and I thought it'd be a good learning exercise to convert this over to a functional programming language. Haskell has quite a few web frameworks, including Snap, HappStack and Yesod.

Yesod had the strangest name, so it seemed like the logical choice. Yesod uses some extensions to Haskell, Type Families, quasi-quoting and Template Haskell. Quasi-quoting and TH are particular exciting as they allow a HAML like syntax to be used as statically compiled Haskell.

Each Yesod application has a site type that is passed to all functions and contains arguments applicable to the whole site, such as database connections and global settings. URLs are handled by a routing table which, again, is statically compiled and means you can not create an invalid internal link. Neat!

For the basic album lister app, we define three really simple routes. A home page, a URL to call to get the albums, and a link to the static files. Note that the static files is again checked at compile time. If the relevant JS/CSS files don't exist, then the application fails to compile!

The getHomeR route handler (which ends in capital R by convention) simpler just renders a Hamlet template.

A little bit of JavaScript in script.js makes an Ajax call to retrieve the list of bands.

And the basic example from the MQLRead documentation is converted over. I've only spent a few hours with Yesod but I'm definitely impressed so far, good documentation and easy to get something simple done! The complete code is on my github page