Tuesday 10 November 2009

Understanding "Do" Blocks

Just some notes as I thumb through Chapter 7 of Real World Haskell.

Haskell "do" notation is a way of writing code that sequences actions. It is just syntactic sugar for two functions, >>= and >>, which have the following types:

*Main> :t (>>=)
(>>=) :: (Monad m) => m a -> (a -> m b) -> m b

*Main> :t (>>)
(>>) :: (Monad m) => m a -> m b -> m b

>> performs the first action, discarding the result, and then the second. The result of the function is the value of the second action.

*Main System.Environment> getEnv "HOME"

*Main System.Environment> getEnv "USER"

*Main System.Environment> getEnv "HOME" >> getEnv "USER"

Chaining the two getEnv calls (getEnv :: String -> IO String) with >> runs both actions, but the value returned by the first is ignored and only the result of the second is returned.

>>= takes the output of the first action, and feeds it into the second.

*Main System.Directory> canonicalizePath "foo"

*Main System.Directory> makeRelativeToCurrentDirectory "/home/haskell/foo"

*Main System.Directory> canonicalizePath "foo" >>= makeRelativeToCurrentDirectory
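
To make the sugar concrete, here is a small sketch of the same computation written with and without do notation. It uses the Maybe monad and a made-up environment list so that it runs without touching the real environment; sampleEnv, lookupBoth, lookupBothDesugared and secondOnly are illustrative names of my own, not from the book. The same translation applies to IO actions.

```haskell
-- A stand-in for the process environment (hypothetical sample data).
sampleEnv :: [(String, String)]
sampleEnv = [("HOME", "/home/haskell"), ("USER", "haskell")]

-- Written with do notation.
lookupBoth :: [(String, String)] -> Maybe (String, String)
lookupBoth env = do
  home <- lookup "HOME" env
  user <- lookup "USER" env
  return (home, user)

-- The same function desugared by hand: each "x <- action" line becomes
-- "action >>= \x -> rest of the block".
lookupBothDesugared :: [(String, String)] -> Maybe (String, String)
lookupBothDesugared env =
  lookup "HOME" env >>= \home ->
  lookup "USER" env >>= \user ->
  return (home, user)

-- A line without "<-" is sequenced with >>, so its result is thrown away.
secondOnly :: [(String, String)] -> Maybe String
secondOnly env = lookup "HOME" env >> lookup "USER" env
```

In GHCi, lookupBoth sampleEnv and lookupBothDesugared sampleEnv both give Just ("/home/haskell","haskell"), while secondOnly sampleEnv gives Just "haskell".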

As well as sequencing actions using >> and >>=, there is a whole family of functions for working with actions.

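mapM :: (Monad m) => (a -> m b) -> [a] -> m [b] takes a function that returns an action and a list of values. It applies the function to each element, performs the resulting actions in order, and collects the results into a list. Nothing is performed until the combined action is itself run.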

*Main System.Environment> mapM getEnv ["HOME","USER"]

mapM_ :: (Monad m) => (a -> m b) -> [a] -> m () does exactly the same as mapM except that it discards the results. It is useful when you only care about the side effects (for example, IO). Function names that end with an "M" are usually related to monads, and function names that end with an "_" typically discard their results.
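
A minimal sketch of the difference between the two, assuming a hand-written environment list and a mutable log rather than real IO on the environment. The names lookupAll and logAll are my own, chosen for illustration.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- mapM applies the function to every element, runs the resulting actions
-- in order, and collects the results (here in the Maybe monad).
lookupAll :: [(String, String)] -> [String] -> Maybe [String]
lookupAll env keys = mapM (\key -> lookup key env) keys

-- mapM_ runs the actions only for their effects; the effect here is
-- appending each item to a mutable log, and the overall result is ().
logAll :: IORef [String] -> [String] -> IO ()
logAll ref items = mapM_ (\item -> modifyIORef ref (++ [item])) items
```

With env = [("HOME","/h"),("USER","u")], lookupAll env ["HOME","USER"] is Just ["/h","u"], and a single missing key makes the whole result Nothing. After running logAll on a fresh IORef created with newIORef [], reading it back with readIORef shows the items in the order they were visited.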

Back to do blocks. Typically the final expression in a do block is a call to return. As I said previously, return is like the opposite of <-: it takes a pure value and constructs an action out of it. Unlike return in imperative languages, it is an ordinary function and does not exit the block. Now that I understand a bit more Haskell, the type of return makes more sense.

return :: (Monad m) => a -> m a
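
A short sketch of what that type means in practice, using Maybe so the results are easy to inspect. The function names are made up for this example.

```haskell
-- return only wraps a value in the monad; it is not an early exit.
notAnEarlyExit :: Maybe Int
notAnEarlyExit = do
  _ <- return 1   -- builds an action yielding 1, whose result is ignored
  return 2        -- the value of the whole block

-- Binding the result of return with <- hands back the original value.
roundTrip :: Int -> Maybe Int
roundTrip x = do
  y <- return x
  return (y + 1)
```

Here notAnEarlyExit evaluates to Just 2, not Just 1, and roundTrip 41 evaluates to Just 42.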

There's some more information about do notation on the Haskell wiki.

Saturday 7 November 2009

Haskell, YQL and a bit of JSON

One of the talks I enjoyed at StackOverflow Devdays was about YQL.

The Yahoo! Query Language is an expressive SQL-like language that lets you query, filter, and join data across Web services. With YQL, apps run faster with fewer lines of code and a smaller network footprint.

It provides a standard interface to a whole host of web services and, more importantly, it's extensible to support other data sources. The Data Tables web site contains more information about how to expose your data via YQL.

I'm still trying to learn Haskell, so I thought I'd try to knock together a quick program to see how you'd make a basic query and process the results using Haskell. To make a web service call, I'll use the HTTP package (Network.HTTP) and process the results using Text.JSON from the json package. Both of these can be installed using cabal.

To make a YQL query we need to point at the right URL, select the output format and URL-encode the query text. I've fixed the output format as JSON as it's more lightweight.

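-- Network.HTTP comes from the HTTP package, Text.JSON from the json package
import Network.HTTP (simpleHTTP, getRequest, getResponseBody, urlEncode)
import Text.JSON

yqlurl :: String
yqlurl = "http://query.yahooapis.com/v1/public/yql?q="

json :: String
json = "&format=json"

-- Send a YQL query and decode the JSON body of the response
yqlRequest :: String -> IO (Result JSValue)
yqlRequest query = do
  rsp <- simpleHTTP (getRequest (yqlurl ++ urlEncode query ++ json))
  body <- getResponseBody rsp
  return (decodeStrict body :: Result JSValue)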

So now we have something we can play with in the interpreter and make queries with. The really nice property of YQL is being able to do joins with sub-selects. This helps avoid round-trips to the server and means less boilerplate code to join items together. For example, let's say we want to find the URLs of Haskell images from Flickr.

*Main> yqlRequest "desc flickr.photos.search"
-- Returns a description of how to search photos in flickr

*Main> yqlRequest "select * from flickr.photos.search where text=\"haskell\""
-- Find images where the text is Haskell

*Main> yqlRequest "select urls from flickr.photos.info where photo_id in (select id from flickr.photos.search where text=\"haskell\")"
-- Find the URLs for images

That gives us raw JSON back. The next step is to process this into something relevant. The following YQL selects upcoming events in Cambridge.

select description from upcoming.events where woeid in
(select woeid from geo.places where text="Cambridge, UK")

A woeid (Where On Earth ID) is an identifier for a place on earth, so it gives a uniform way of referring to a location. It is used consistently across the APIs, so you can feed it in as a sub-select. Very neat!

The goal of this strange task is simply to get the descriptions of events coming up in Cambridge as a list of strings. First I defined a couple of helper functions. These feel really clumsy, so I'm 99% sure that there is a better way to do it, but I can't see it.

getField :: [String] -> JSValue -> JSValue
getField (x:xs) (JSObject j) = getField xs (fromJust (get_field j x))
getField [] j = j

toString :: JSValue -> String
toString (JSString x) = fromJSString x
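
One possible tidier variant, offered as a sketch rather than the definitive answer: both helpers above crash on unexpected input (a missing key, or a value that is not an object or string), so the versions below return Maybe instead. getFieldMaybe, toStringMaybe and sampleValue are hypothetical names of my own, and I look keys up with fromJSObject and lookup rather than get_field.

```haskell
import Control.Monad (foldM)
import Text.JSON (JSValue (..), fromJSObject, fromJSString, toJSObject, toJSString)

-- Follow a path of keys through nested objects, giving Nothing as soon as
-- a key is missing or a non-object value is reached.
getFieldMaybe :: [String] -> JSValue -> Maybe JSValue
getFieldMaybe path root = foldM step root path
  where
    step (JSObject obj) key = lookup key (fromJSObject obj)
    step _ _ = Nothing

-- Extract a string, giving Nothing for any other kind of JSON value.
toStringMaybe :: JSValue -> Maybe String
toStringMaybe (JSString s) = Just (fromJSString s)
toStringMaybe _ = Nothing

-- A tiny hand-built value for trying the helpers out (made-up data).
sampleValue :: JSValue
sampleValue =
  JSObject (toJSObject
    [("query", JSObject (toJSObject [("count", JSString (toJSString "1"))]))])
```

For example, getFieldMaybe ["query","count"] sampleValue >>= toStringMaybe gives Just "1", and a path containing a missing key gives Nothing instead of an exception. The fold works because foldM threads the current JSValue through each key in turn and stops at the first Nothing.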

So now all we need to do is hook in a couple of functions to drill down into the JSON, yank the description out, and bundle it into a list.

eventsInCambridge :: String
eventsInCambridge = "select description from upcoming.events where " ++
  "woeid in (select woeid from geo.places where text=\"Cambridge, UK\")"

getEventList :: IO [String]
getEventList = do
  response <- yqlRequest eventsInCambridge
  return (case response of
    Ok value  -> processEvents (getField ["query","results","event"] value)
    -- Report why decoding failed instead of a bare undefined
    Error msg -> error ("Could not decode YQL response: " ++ msg))

processEvents :: JSValue -> [String]
processEvents (JSArray events) = map (toString . getField ["description"]) events

And the output from this is a giant list of descriptions of the upcoming events in Cambridge. You can see the example data by clicking here.