Saturday, 7 November 2009

Haskell, YQL and a bit of JSON

One of the talks I enjoyed at StackOverflow Devdays was about YQL.

The Yahoo! Query Language is an expressive SQL-like language that lets you query, filter, and join data across Web services. With YQL, apps run faster with fewer lines of code and a smaller network footprint.

It provides a standard interface to a whole host of web services and, more importantly, it's extensible to support other data sources. The Data Tables web site contains more information about how to expose your data via YQL.

I'm still trying to learn Haskell, so I thought I'd knock together a quick program to see how you'd make a basic query and process the results. To make the web service call I'll use the HTTP package (Network.HTTP), and to process the results I'll use Text.JSON from the json package. Both can be installed using cabal.

To make a YQL query we need to point at the right URL, select the output format and URL-encode the query text. I've fixed the output format as JSON because it's more lightweight than XML.

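import Data.Maybe (fromJust)
import Network.HTTP
import Text.JSON
import Text.JSON.Types (get_field)

-- Base URL of the YQL endpoint; the encoded query text is appended to it.
yqlurl :: String
yqlurl = ""

-- Query-string suffix that asks for JSON output.
json :: String
json = "&format=json"

-- Run a YQL query and decode the response body as JSON.
yqlRequest :: String -> IO (Result JSValue)
yqlRequest query = do
  rsp <- simpleHTTP (getRequest (yqlurl ++ urlEncode query ++ json))
  body <- getResponseBody rsp
  return (decodeStrict body :: Result JSValue)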
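
The URL-building step can be checked on its own, without touching the network. The following is only a sketch: `buildYqlUrl` and `percentEncode` are hypothetical names, the base URL is passed in as a parameter rather than assumed, and the hand-rolled encoder stands in for `urlEncode` and handles ASCII input only.

```haskell
import Data.Char (isAlphaNum, isAscii, ord, toUpper)
import Numeric (showHex)

-- Percent-encode every character outside the RFC 3986 unreserved set.
-- Assumption: the input is ASCII; other characters would need UTF-8
-- encoding first, which this sketch does not attempt.
percentEncode :: String -> String
percentEncode = concatMap enc
  where
    enc c
      | isAscii c && (isAlphaNum c || c `elem` "-_.~") = [c]
      | otherwise = '%' : pad (map toUpper (showHex (ord c) ""))
    -- Make sure single-digit hex values are written with two digits.
    pad [d] = ['0', d]
    pad ds  = ds

-- Glue together a base URL, the encoded query and the JSON format suffix.
buildYqlUrl :: String -> String -> String
buildYqlUrl base query = base ++ percentEncode query ++ "&format=json"
```

Loaded into the interpreter, `buildYqlUrl "http://example.invalid/yql?q=" "select * from t"` gives `"http://example.invalid/yql?q=select%20%2A%20from%20t&format=json"`, which makes it easy to see what actually goes over the wire.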

So now we have something we can play with in the interpreter to make queries. The really nice property of YQL is being able to do joins with sub-selects. This helps avoid round trips to the server and means less boilerplate code to join items together. For example, let's say we want to find the URLs of Haskell images from Flickr.

*Main> yqlRequest "desc"
-- Returns a description of how to search photos in flickr

*Main> yqlRequest "select * from where text=\"haskell\""
-- Find images where the text is Haskell

*Main> yqlRequest "select urls from where photo_id in (select id from where text=\"haskell\")"
-- Find the URLs for images

That gives us raw JSON back. The next step is to process it into something useful. The following YQL selects upcoming events in Cambridge.

select description from where woeid in
(select woeid from geo.places where text="Cambridge, UK")

A woeid (Where On Earth ID) identifies a place on earth and gives a way of getting details such as its latitude and longitude. It is used consistently across the APIs, so you can feed it in as a sub-select. Very neat!

The goal of this strange task is simply to get a list of strings of the descriptions of events coming up in Cambridge. Firstly I defined a couple of helper functions. These feel really clumsy, so I'm 99% sure that there is a better way to do it, but I can't see it.

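-- Needs fromJust (Data.Maybe) and get_field (Text.JSON.Types).
-- Follow a path of object keys into a JSON value; errors if the path does not exist.
getField :: [String] -> JSValue -> JSValue
getField [] j = j
getField (x:xs) (JSObject j) = getField xs (fromJust (get_field j x))
getField (x:_) _ = error ("getField: cannot look up " ++ show x ++ " in a non-object")

-- Extract the text of a JSON string value; errors on anything else.
toString :: JSValue -> String
toString (JSString x) = fromJSString x
toString _ = error "toString: expected a JSON string"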
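
One possible way to make these helpers less clumsy is to return `Maybe` rather than crash when a key is missing. This is a sketch only, assuming the json package is installed; `lookupPath` and `asString` are hypothetical names and not part of Text.JSON.

```haskell
import Control.Monad (foldM)
import Text.JSON

-- Walk a list of object keys from the root value.
-- Gives Nothing if a key is missing or a non-object is reached mid-path.
lookupPath :: [String] -> JSValue -> Maybe JSValue
lookupPath path root = foldM step root path
  where
    step (JSObject obj) key = lookup key (fromJSObject obj)
    step _ _ = Nothing

-- Extract a string, or Nothing for any other kind of JSON value.
asString :: JSValue -> Maybe String
asString (JSString s) = Just (fromJSString s)
asString _ = Nothing
```

The two compose with `>>=`, for example `lookupPath ["description"] event >>= asString`, so a missing field shows up as `Nothing` instead of an exception from `fromJust`.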

So now all we need to do is hook in a couple of functions to drill down into the JSON, yank the description out, and bundle it into a list.

eventsInCambridge :: String
eventsInCambridge = "Select description from where \
                    \woeid in (select woeid from geo.places where text=\"Cambridge, UK\")"

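-- Fetch the events and return their descriptions.
getEventList :: IO [String]
getEventList = do
  response <- yqlRequest eventsInCambridge
  return (case response of
    Ok value -> processEvents (getField ["query","results","event"] value)
    Error msg -> error ("could not decode the YQL response: " ++ msg))

-- Pull the description string out of each event in the array.
processEvents :: JSValue -> [String]
processEvents (JSArray events) = map (toString . getField ["description"]) events
processEvents _ = error "processEvents: expected a JSON array of events"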
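
The same drill-down can be tried offline by working in the `Result` monad with `valFromObj` from Text.JSON, so that a missing field or a malformed body becomes an `Error` value instead of a crash. This is a sketch under assumptions: the sample document is made up and only mirrors the query/results/event nesting used in the code, and `descriptions` is a hypothetical name.

```haskell
import Text.JSON

-- Made-up sample data with the same nesting as the path
-- ["query","results","event"]; the descriptions are invented.
sample :: String
sample = "{\"query\":{\"results\":{\"event\":[{\"description\":\"First\"},{\"description\":\"Second\"}]}}}"

-- Decode a response body and collect every event description.
-- Each step can fail, and the first failure is returned as an Error.
descriptions :: String -> Result [String]
descriptions body = do
  root    <- decodeStrict body
  query   <- valFromObj "query" root
  results <- valFromObj "results" query
  events  <- valFromObj "event" results
  mapM (valFromObj "description") events
```

Evaluating `descriptions sample` in the interpreter exercises the whole path without a network call, and swapping `sample` for the body returned by the web service would slot it into the request code.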

The output from this is a long list of descriptions of the upcoming events in Cambridge.