## Monday 7 December 2009

### Implementing Klondike

A comment on my last post suggested using arrays would be far simpler to model the tableau (and also the foundation). And they were right!

`  data Index = A | B | C | D | E | F | G              deriving (Eq, Ord, Show, Enum, Ix)  data Slot = Slot       {        shown :: [Card]       , hidden :: [Card]       } deriving (Show,Eq)  type Tableau = Array Index Slot  type Foundation = Array Suit [Card]`

By deriving Index and Suit from the class `Ix` it becomes possible to use this to index items in an array.

To play Klondike we need to deal the deck out. I've assumed the deck is already shuffled.

`  newGame :: [Card] -> Game  newGame cards = Game d emptyFoundation t where      (t,d) = dealTableau cards  emptyFoundation :: Foundation  emptyFoundation = array (Clubs,Spades) [(s,[]) | s <- [Clubs .. Spades]]  dealTableau :: [Card] -> (Tableau,[Card])  dealTableau dk = ((array (A,G) [(A,(Slot [a] as))                                 ,(B,(Slot [b] bs))                                 ,(C,(Slot [c] cs))                                 ,(D,(Slot [d] ds))                                 ,(E,(Slot [e] es))                                 ,(F,(Slot [f] fs))                                 ,(G,(Slot [g] gs))]),                    rest) where       (a:as,h) = splitAt 1 dk      (b:bs,i) = splitAt 2 h      (c:cs,j) = splitAt 3 i      (d:ds,k) = splitAt 4 j      (e:es,l) = splitAt 5 k      (f:fs,m) = splitAt 6 l      (g:gs,rest) = splitAt 7 m`

My goal is to write a simple model for Klondike so that I can experiment with various playing strategies. The first stage in this is writing the rules of the games.

`  data Move = TurnDeck            | ToFoundation Index             | DeckTo Index            | DeckUp            | MoveCards Index Int Index            | GameOver              deriving (Show,Eq)  -- |Can we turn the deck?  turnDeck :: Game -> Bool  turnDeck = null . deck  -- |Is the game complete?  gameWon :: Game -> Bool  gameWon game = all empty (elems (tableau game)) && not (turnDeck game)     -- |Does the second card follow the first?  successor :: Card -> Card -> Bool  successor a b = value a /= King && alternateColors a b && follows a b  -- |Can the card move down from the deck to the given slot?  cardDown :: Card -> Slot -> Bool  cardDown card (Slot s _) | null s = value card == King                           | otherwise = successor card (head s)  -- |Can the card move to foundation?  cardUp :: Card -> Foundation -> Bool  cardUp (Card v suit) f | null cards = v == Ace                         | otherwise = x /=King && succ x == v                          where                           cards = f ! suit                           (Card x _) = head cards  -- |Can the card move from x to y?  slotMove :: Slot -> Slot -> Bool  slotMove (Slot from _) s | null from = False                           | otherwise = cardDown (head from) s`

Once we have a chunky load of predicates defined, we can put these together to get the list of moves possible from a given game state.

The `getMoves` function builds up a list of valid moves.

`  getMoves :: Game -> [Move]  getMoves g  = [DeckUp | (not.null) dk && cardUp (head dk) (foundation g)]                ++ [(DeckTo . fst) x | not.null \$ dk, x <- filter (cardDown (head dk) . snd) slots]                ++ [TurnDeck | not.null \$ dk]                 ++ [ToFoundation is | is <- cardsUp]                ++ slotMoves                 ++ [GameOver | gameWon g] where      dk = deck g      slots = assocs (tableau g)      cardsUp = map fst (filter (flip cardUpFromSlot (foundation g) . snd) slots)      slotMoves = [MoveCards (fst x) 1 (fst y) | x <- slots, y <- slots,                                                  fst x /= fst y && slotMove (snd x) (snd y)]`

There's also a corresponding `move :: Game -> Move -> Game` function that performs the given move and returns a new game state. It's a bit chunky to paste here (which probably indicates it could be improved!), but you can grab it from github.

Now that I have a basic model of the game, I just need to write a routine that picks the best move based on the current state. My current thoughts are just to sort the moves according to some criteria (e.g. moving a card up is always preferable to turning the deck), but initial attempts having been that good. Maybe I just need brute force!

## Sunday 6 December 2009

Following on from here I've made some progress on modelling the game of Klondike Solitaire in Haskell.

The code is really ugly and is definitely not idiomatic Haskell - if you can show me how to improve it then that would be much appreciated - it's the only way I'll learn!

There are several concepts in the game of Solitaire. There are:

• Tableau - A set of seven stacks of cards. Cards can be transferred between each stack as long according to some rules.
• Foundation - The foundation consists of four bases, each representing one suit of cards. The game finishes when all the cards have been placed in order (from Ace to King) onto the foundation.
• Deck - The cards not in the foundation or the tableau are in the deck.

Each item in the tableau has a number of shown cards and some hidden cards.

`  data Slot = Slot       {        shown :: [Card]       , hidden :: [Card]       }deriving (Show,Eq)  data Tableau = Tableau [Slot] deriving (Show)`

One problem I had was that I wanted to represent a Tableau as being exactly 7 slots. I could do this with a tuple, but then my 7 element tuple would need lots of special functions instead of just using the list functions. There are packages for fixed size lists, but I wanted to try and do something minimal.

Foundations are simpler to model. A foundation consists of four bases.

`  data Base = Base Suit [Card] deriving (Eq,Show)  data Foundation = Foundation Base Base Base Base                    deriving (Show)`

I think I'm missing something - how can I change the definition of Foundation so that each base is represented by a different suit?

Finally, a `Game` consists of all of the elements described above.

`  data Game = Game       {        deck :: [Card]      , foundation :: Foundation      , tableau :: Tableau      } deriving (Show)`

In order to make progress in a `Game`, a series of Move's must be made. The possible moves are enumerated below:

`  data Move = TurnDeck            | ToFoundation Slot             | DeckTo Slot            | DeckUp            | MoveCards Slot Int Slot             | MoveCard Slot Slot            | GameOver              deriving (Show,Eq)`

Once the basic data structures are in place (even if they can surely be improved), it's simple to sketch out the remaining functions.

`  -- Make a move  move :: Game -> Move -> Game  -- Enumerate all the possible moves from the given game state  getMoves :: Game -> [Move]  -- Play a game using the given playing function and return the list of moves  playGame :: Game -> (Game -> [Move] -> Move) -> [Move]  -- Replay the list of moves  replayMoves :: Game -> [Move] -> Game  replayMoves = foldl move`

My goal will be to write a function that picks the best move from the given choices of type `Game -> [Move] -> Move` and see how many games it solves. If any!

Before I go much further though, I need to spend some time working out how much more of the logic I can force inside the type system. For example, currently `Foundation` allows bases of the same suit; they should be different. Similarly, `Tableau` is just a list of slots. I know that if I just use `map` on this list to transform it, the list stays the same size, but it's not enforced by the type system (for example, I could use `concatMap` and change the size). Can I define that shown cards must always be lists of cards in alternating colours and successive values?

There's some more code on my git hub repository.

## Tuesday 1 December 2009

Rightly or wrongly, I found myself writing code like this:

`  foo :: [Bar] -> [Baz]  foo xs = if (any f xs) then [] else [Baz 4]`

This can be replaced with a neater list comprehension

`  [Baz 4 | any f xs]`

If the right hand side of the list comprehension evaluation to false, then you get the empty list, else you get the single item in the list.

Neat, though I've no idea whether it's idiomatic Haskell!

As my first attempt at writing a reasonably complex lump of code in Haskell, I thought I'd try and model a card game called Klondike Solitaire. Apparently Solitaire is complex enough that:

... the odds of winning a standard game of non-Thoughtful Klondike are currently unknown. It has been said that the inability for theoreticians to calculate these odds is "one of the embarrassments of applied mathematics

The first step is to try and model the playing cards themselves. Cards have both a Suit and a Value.

`  data Suit = Clubs            | Diamonds            | Hearts            | Spades              deriving (Eq,Enum,Show,Bounded)  data Value = Ace             | Two             | Three             | Four             | Five             | Six             | Seven             | Eight             | Nine             | Ten             | Jack             | Queen             | King             deriving (Eq,Enum,Show,Bounded)  data Card = Card Value Suit              deriving (Eq,Show)`

Suit and Value both derive Eq (for equality), Show (for printing), Enum (for successor and predecessor functions) and Bounded (because they are within bounds). We can now define a complete deck of cards with a simple list comprehension.

`  allCards :: [Card]  allCards = [Card x y | x <- [Ace .. King],  y <- [Clubs .. Spades]]`

Next there are a bunch of helper functions which seem like they'll be generally useful.

`  color :: Card -> Color  color (Card _ s) | s == Clubs || s == Spades = Black                   | otherwise = Red  value :: Card -> Value  value (Card x _) = x  -- |Are the two cards alternate colours?  alternateColors :: Card -> Card -> Bool  alternateColors a b = color a /= color b  -- |Does the second card follow the first?  follows :: Card -> Card -> Bool  follows (Card King _) _ = False  follows (Card v1 _) (Card v2 _) = succ v1 == v2`

`succ` gives the successor of a given function. I don't think I can define Card as an instance of Enum because, depending on the context, the successor could be one of multiple values (e.g. the successor to `Card Ace Spades` might be a `Card Two Spades` or a `Card Two Hearts` depending on the context.

The `-- |` syntax is an indicator for the Haddock documentation tool so that you can associate the comment with the declaration.

Next time, I'll look at defining the types for Klondike Solitaire!

## Tuesday 10 November 2009

### Understanding "Do" Blocks

Just some notes as I thumb through Chapter 7 of Real World Haskell.

Haskell "do" notation is a way of writing code to sequence some actions. However, it's simply just syntactic sugar for two functions, `>>=` and `>>` that have the following types:

`  *Main> :t (>>=)  (>>=) :: (Monad m) => m a -> (a -> m b) -> m b  *Main> :t (>>)  (>>) :: (Monad m) => m a -> m b -> m b`

`>>` performs the first action, discarding the result, and then the second. The result of the function is the value of the second action.

`  *Main System.Environment> getEnv "HOME"                   "/home/jfoster"    *Main System.Environment> getEnv "USER"       "jfoster"  *Main System.Environment> getEnv "HOME" >> getEnv "USER"  "jfoster"`

The result of chaining together the getEnv (`String :: IO(String)`) calls is to ignore the return value of the first action.

`>>=` takes the output of the first action, and feeds it into the second.

`  *Main System.Directory> canonicalizePath "foo"                                          "/home/haskell/foo"  *Main System.Directory> makeRelativeToCurrentDirectory "/home/haskell/foo"  "foo"  *Main System.Directory> canonicalizePath "foo" >>= makeRelativeToCurrentDirectory  "foo"`

As well as sequencing actions using `>>` and `>>=` there are a whole family of functions to do with evaluating actions.

`mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]` takes a list of actions and performs the action specified for each one (when evaluated, remember it's lazy!), collecting together the results.

`  *Main System.Environment> mapM getEnv ["HOME","USER"]  ["/home/jfoster","jfoster"]`

`mapM_ :: (Monad m) => (a -> m b) -> [a] -> m ()` does exactly the same as mapM EXCEPT it discards the results. Useful for ensuring the side effects happen (for example, IO). Function names that end with a "M" are usually related to monads, and function names that end with an "_" typically discard their results.

Back to do blocks. Typically the final construct in a do block is a return statement. As I said previously `return` is like the opposite of <- - it takes a pure value and constructs an action out of it. Now that I understand a bit more Haskell the type definition of return makes more sense.

`  return :: (Monad m) => a -> m a`

## Saturday 7 November 2009

### Haskell, YQL and a bit of JSON

One of the talks I enjoyed at StackOverflow Devdays was about YQL.

The Yahoo! Query Language is an expressive SQL-like language that lets you query, filter, and join data across Web services. With YQL, apps run faster with fewer lines of code and a smaller network footprint.

It provides a standard interface to a whole host of web services and, more importantly, it's extensible to support other data sources. The Data Tables web site contains more information about how to expose your data via YQL.

I'm still trying to learn Haskell so I thought I'd try to knock together a quick program to see how you'd make a basic query and process the results using Haskell. To make a web service call, I'll use Haskell Http and process the results using Text.JSON. Both of these are available to install using cabal.

To make a YQL query we need to point at the right URL, select the output format and URL encode the query text. I've fixed the output format as JSON as it's more light weight.

`  yqlurl :: String  yqlurl = "http://query.yahooapis.com/v1/public/yql?q="  json :: String  json = "&format=json"  yqlRequest :: String -> IO (Result JSValue)  yqlRequest query = do    rsp <- simpleHTTP (getRequest (yqlurl ++ urlEncode query ++ json))    body <- (getResponseBody rsp)    return (decodeStrict body :: Result JSValue)`

So now we have something we can play with in the interpreter and make queries with. The really nice property of YQL is being able to do joins with sub-selects. This helps avoids doing round-trips to the server and means less boilerplate code to join items together. For example, let's say we want to find the URLs of Haskell images from Flickr.

`   *Main> yqlRequest "desc flickr.photos.search"   -- Returns a description of how to search photos in flickr   *Main> yqlRequest "select * from flickr.photos.search where text=\"haskell\""   -- Find images where the text is Haskell   *Main> yqlRequest "select urls from flickr.photos.info where              photo_id in (select id from flickr.photos.search where text=\"haskell\")"   -- Find the URLs for images `

That gives us raw JSON back, the next step is to process this into something relevant. The following YQL selects upcoming events in Cambridge.

`  select description from upcoming.events where woeid in    (select woeid from geo.places where text="Cambridge, UK")`

woeid provides a way of getting the latitude and longitude of any place on earth. This is consistently used in the APIs so you can feed it in as a sub select. Very neat!

The goal of this strange task is simply to get a list of strings of the descriptions of events coming up in Cambridge. Firstly I defined a couple of helper functions. These feel really clumsy, so I'm 99% sure that there is a better way to do it, but I can't see it.

`  getField :: [String] -> JSValue -> JSValue  getField (x:xs) (JSObject j) = getField xs (fromJust (get_field j x))  getField [] j = j      toString :: JSValue -> String  toString (JSString x) = fromJSString x`

So now all we need to do is hook in a couple of functions to drill down into the JSON, yank the description out, and bundle it into a list.

`  eventsInCambridge :: String  eventsInCambridge = "Select description from upcoming.events where         woeid in (select woeid from geo.places where text=\"Cambridge, UK\")"  getEventList = do    response <- yqlRequest eventsInCambridge    return (case response of             Ok value -> (processEvents (getField ["query","results","event"] value))             Error msg -> undefined)    processEvents :: JSValue -> [String]  processEvents (JSArray events) = map (toString .(getField ["description"])) events`

And the output from this is a giant list of descriptions of the upcoming events in Cambridge. You can see the example data by clicking here.

## Friday 30 October 2009

### Cambridge Stack Overflow Dev Days

I've just got back from the Stack Overflow Dev Day. Definitely worth attending (and well worth the price).

The opening key note from Joel focused on the balance of features vs. choice. 37 Signals popularized the idea of "simple software" (illustrated by simple bug tracking software). However, FogBugz figures don't back this up the number of sales is proportional to the number of features.

I got the feeling that this talk was targeted at companies whose business it is to sell directly to the customers. In larger software corporations you aren't targeting real end-users, you are trying to convince a budget manager that your software is better than the competitions. I guess this leads to the explosion in features (e.g. Word vs. Word Perfect, Lotus Notes vs. Excel) and ends up with over complicated solutions that do everything adequately and nothing outstanding.

I thoroughly enjoyed the conference and will definitely be going next time. Also found out about another related event.
Next up was Christian Heilmann who is a "developer evangelist" for Yahoo!. The talk show cased some of Yahoo's developer tech. I was very impressed by YQL which provides a consistent SQL-like API for accessing a whole suite of web api's (not just Yahoo). It was very impressive to see mashups created in a few seconds and I'll definitely need to poke around the APIs. The term JSONP came up together with Caja (a safe subset of JavaScript) - both of which I'd never heard of but sound like things I should know.

Frank Stajano gave a good talk of the human aspects of security, focusing on social engineering and illustrating it with examples from The Real Hustle. It seemed slightly at odds with the rest of the talks, but it was interesting and well presented! And I got some new terminology, sock puppets, astro-turfing and Sybil's.

After lunch we had the fastest introduction possible to ASP.NET MVC. Steve Sanderson went through a few slides of the architecture (very similar to Ruby On Rails!) and then proceeded to build a simple file management application at breakneck speed. Very enjoyable presentation again. And ASP.NET MVC even has Linux support with MonoDevelop.

Remy Sharp gave a similarly fast paced presentation of JQuery. It was impressive to see JQuery in action and the simple steps to build a plugin were good to see. The presentation ended with a heroic attempt to "live-code" a tag cloud application into Twitter!

The penultimate talk was about Python by Michael Foord. I wasn't too keen on this presentation. There were statements like "Dynamic Programming makes it easy to test" and "Dynamic Programming means I type less" that I didn't agree with and there was no follow up to back it up. The presentation gave a quick overview of Python using Norvig's spelling corrector as an example.

Finally, Jeff Atwood gave a quick talk about Stack Overflow which was short on detail but definitely watch-able. Whatever has been written about this guy, he (and a couple of others) are responsible for a huge web site that is apparently the 895th biggest site on the whole web! It's been hugely successful so it was interesting to hear where the ideas came from and so on.

In conclusion, definitely worth going too and roll on the next one.

## Tuesday 20 October 2009

### Pretty Printing Basic JSON

JSON is an incredibly simple format. In fact, it's so simple that the entire spec can be placed on a business card.

It can be described in Haskell (all based on Real World Haskell - highly recommended!) very simply with an algebraic data type

`data JValue = JString String            | JNumber Double            | JBool Bool            | JNull            | JObject [(String, JValue)]            | JArray [JValue]              deriving (Eq, Ord, Show)`

This gives us all we need to be able to create JSON objects within ghci.

`  *Main> JBool True  JBool True  *Main> JObject [("foo", JString "bar"), ("baz", JString "boo")]   JObject [("foo",JString "bar"),("baz",JString "boo")]  *Main> JArray [JNumber 1,JNumber 2,JNumber 3]                                    JArray [JNumber 1.0,JNumber 2.0,JNumber 3.0]`

This isn't hugely useful at the moment - we could write functions that build up JObjects and the results, but at the moment we have no way to turn them into a readable lump of JSON.

Pretty Printing is the name given to an process that applies stylistic formatting to content.

This is actually a pretty complicated problem. One solution we could use is just to write functions for each type of JValue and go from there. The problem with that is that it wouldn't allow us to keep track of the contextual information. For example, if you are printing out Java code then you might want you { and } characters to align or your columns capped at a maximum length and so on.

Pretty Printing has a long history, from GRIND (used to print Lisp) to Eclipse.

For Haskell, there have been two papers which have focused on the neat design of a pretty printing library. The Design of a Pretty-printing Library and A Prettier Printer both take the same approach of creating a document structure representing the output. Real World Haskell presents the JSON pretty printer using the same approach.

A Doc type defines the rendering of data that we'll do. It is simply defined as:

`  data Doc = Empty           | Char Char           | Text String           | Line           | Concat Doc Doc           | Union Doc Doc             deriving (Show,Eq)`

Construction functions are provided to go from primitive types to document objects.

`  empty :: Doc  empty = Empty  char :: Char -> Doc  char c = Char c  text :: String -> Doc  text str = Text str  double :: Double -> Doc  double num = text (show num)  line :: Doc  line = Line`

There are various operations defined for documents. The infix <> operator is used to denote the concatenation of two documents.

`  (<>) :: Doc -> Doc -> Doc  Empty <> y = y  x <> Empty = x  x <> y = x `Concat` y  hcat :: [Doc] -> Doc  hcat = fold (<>)  fold :: (Doc -> Doc -> Doc) -> [Doc] -> Doc  fold f = foldr f empty  empty :: Doc  empty = Empty  () :: Doc -> Doc -> Doc  x  y = x <> softline <> y  fsep :: [Doc] -> Doc  fsep = fold ()  enclose :: Char -> Char -> Doc -> Doc  enclose left right x = char left <> x <> char right  softline :: Doc  softline = group line  group :: Doc -> Doc  group x = flatten x `Union` x  flatten :: Doc -> Doc  flatten (x `Concat` y) = flatten x `Concat` flatten y  flatten Line           = Char ' '  flatten (x `Union` _)  = flatten x  flatten other          = other`

`softline` probably takes some explaining. It maintains two alternative paths (using Union). The `flatten` function replaces new lines with a space, joining two separate lines into one. The other side of the union leaves the document untouched.

All this seems rather abstract; how does it fit into rendering JSON? Firstly we have to define some functions that allow us to create document objects from the JValue instances we declared earlier:

`  renderJValue :: JValue -> Doc  renderJValue (JString s) = string (show s)  renderJValue (JNumber n) = string (show n)  renderJValue (JBool True) = string "true"  renderJValue (JBool False) = string "false"  renderJValue JNull = string "null"  renderJValue (JArray ary) = series '[' ']' renderJValue ary  renderJValue (JObject obj) = series '{' '}' field obj      where field (name,val) = string name                            <> text ": "                            <> renderJValue val`

Rending compound objects (JArray and JObject) makes use of a couple of helper functions which break down the object into individual elements:

`  series :: Char -> Char -> (a -> Doc) -> [a] -> Doc  series open close item = enclose open close                          . fsep . punctuate (char ',') . map item  punctuate :: Doc -> [Doc] -> [Doc]  punctuate p [] = []  punctuate p [d] = [d]  punctuate p (d:ds) = (d <> p) : punctuate p ds`

So now we have a way of turning JValue into Doc objects. As you can see, this is anything but pretty!

`  *Main> renderJValue (JBool True)  Concat (Concat (Char '"') (Concat (Char 't') (Concat (Char 'r')     (Concat (Char 'u') (Char 'e'))))) (Char '"')  *Main> renderJValue (JArray [JNumber 1,JNumber 2,JNumber 3])    Concat (Concat (Char '[') (Concat (Concat (Concat (Concat (Concat (Char '"')     (Concat (Char '1') (Concat (Char '.') (Char '0')))) (Char '"')) (Char ','))     (Union (Char ' ') Line)) (Concat (Concat (Concat (Concat (Concat (Char '"')     (Concat (Char '2') (Concat (Char '.') (Char '0')))) (Char '"')) (Char ','))     (Union (Char ' ') Line)) (Concat (Concat (Concat (Char '"') (Concat (Char '3')     (Concat (Char '.') (Char '0')))) (Char '"')) (Union (Char ' ') Line))))) (Char ']')`

The main point is that now we have the contextual information and can write a function to render the document. The simplest possible way of printing out the document is just to pick one side of each union operator and print.

`  compact :: Doc -> String  compact x = transform [x]      where transform [] = ""            transform (d:ds) =                 case d of                  Empty -> transform ds                  Char c -> c : transform ds                  Text s -> s ++ transform ds                  Line -> '\n' : transform ds                  a `Concat` b -> transform (a:b:ds)                  _ `Union` b -> transform (b:ds)`

This gives back a very basic rendering:

`  *Main> putStrLn (compact (renderJValue (JArray [JNumber 1,JNumber 2,JNumber 3])  ["1.0",  "2.0",  "3.0"  ]`

But we can easily extend this to bound printing within a certain range:

`  pretty :: Int -> Doc -> String  pretty width x = best 0 [x]      where best col (d:ds) =                 case d of                  Empty -> best col ds                  Char c -> c : best (col + 1) ds                  Text s -> s ++ best (col + length s) ds                  Line -> '\n' : best 0 ds                  a `Concat` b -> best col (a:b:ds)                  a `Union` b -> nicest col (best col (a:ds))                                            (best col (b:ds))            best _ _ = ""            nicest col a b | (width - least) `fits` a = a                           | otherwise = b                           where least = min width col  fits :: Int -> String -> Bool  w `fits` _ | w < 0 = False  w `fits` "" = True  w `fits` ('\n':_) = True  w `fits` (c:cs) = (w - 1) `fits` cs`

Now we can render to a fixed width, so:

`  *Main> putStrLn (pretty 30 (renderJValue (JArray [JNumber 1,JNumber 2,JNumber 3])))                        ["1.0", "2.0", "3.0" ]  *Main> putStrLn (pretty 15 (renderJValue (JArray [JNumber 1,JNumber 2,JNumber 3])))                        ["1.0", "2.0",   "3.0" ]`

RWH goes into a lot more detail that the above basic example (for example handling escaping properly and so on), it's definitely worth buying.

## Monday 12 October 2009

A Haskell program is a collection of modules. Each module consists of a declaration and a number of exports. Modules can relate to each other by import statements.

The most basic declaration is

`  module MyModule where    foo x = 1 + x  bar x = foo (foo x)`

If you omit an export list then everything in the module is available to the outside world. Typically you want to limit the exports to provide encapsulation and this is accomplished by explicitly naming the modules you want to expose. In the example below only `bar` is exported, foo is private to this module.

`  module MyModule (bar) where  foo x = 1 + x  bar x = foo (foo x)`

Note that a module name must correspond to the file name of the haskell source, so the above must be in a file called MyModule.hs.

To relate modules to each other the import directive is used.

## Thursday 1 October 2009

### Dealing with Partial Functions

Functions can be either total (e.g. fully defined for all possible input values) or partial. Handling partial functions can be painful, for example if you pass an empty list to `head` then it'll terminate the program with an exception.

Partial functions are usually handled by either pattern matching being used to exclude cases where the function isn't defined e.g.

`  foo [] = 33      -- not safe to call head  foo xs = head xs -- safe to call head`

Or using `Maybe` to indicate (via the type system) that the function might not return a value. The first exercise of chapter 4 of Real World Haskell asks you to rewrite some of the partial list functions so that they never fail.

`  safeHead :: [a] -> Maybe a  safeHead [] = Nothing  safeHead (x:_) = Just x  safeTail :: [a] -> Maybe [a]  safeTail [] = Nothing  safeTail (_:xs) = Just xs  safeLast :: [a] -> Maybe a  safeLast [] = Nothing  safeLast (y:[]) = Just y  safeLast (_:xs) = safeLast xs  safeInit :: [a] -> Maybe [a]  safeInit [] = Nothing  safeInit (x:[]) = Just []  safeInit (x:xs) = Just (x : fromJust(safeInit xs))`

In order to be thorough, I thought I'd explore HUnit a bit more and write a few tests. The funny "~:" and "~=?" are just syntactic sugar for writing assertions. One area that confused me was the need to specify the type explicitly on the functions for undefined values. I'm not entirely sure why this has to be done...

`  testSafeHead = test [ "Safe head non-empty" ~: Just 1 ~=? safeHead [1]                      , "Safe head null" ~: Nothing ~=? (safeHead [] :: Maybe Int) ]    testSafeTail = test [ "Safe tail empty" ~: Just [] ~=? safeTail [1]                      , "Safe tail value" ~: Just [1] ~=? safeTail [2,1]                      , "Safe tail nothing" ~: Nothing ~=? (safeTail [] :: Maybe [Int]) ]  testSafeLast = test [ "Safe last empty" ~: Nothing ~=? (safeLast [] :: Maybe Int)                      , "Safe last 1 element" ~: Just 1 ~=? safeLast [1]                      , "Safe last n element" ~: Just 1 ~=? safeLast [3,2,1] ]  testSafeInit = test [ "Safe Init empty" ~: Nothing ~=? (safeInit [] :: Maybe [Int])                      , "Safe Init singleton" ~: Just [] ~=? safeInit [1]                      , "Safe Init list" ~: Just [1,2] ~=? safeInit [1,2,3] ]  tests = TestList [TestLabel "safeHead" testSafeHead                   ,TestLabel "safeTail" testSafeTail                   ,TestLabel "safeLast" testSafeLast                   ,TestLabel "safeInit" testSafeInit]`

You can run these tests within ghci with
`  *Main> runTestTT tests  Cases: 11  Tried: 11  Errors: 0  Failures: 0  Counts {cases = 11, tried = 11, errors = 0, failures = 0}`

I think these tests would have been better written with QuickCheck where I could have compared them with the non-safe versions. But this'll do for now!

## Tuesday 29 September 2009

Most of the exercises at the end of Chapter 3 of Real World Haskell are fairly simple. Find the height of a list, verify a list is a palindrome and so on. The last few however are pretty tough..

The convex hull is the smallest set of points that contain all the other points in the set. The image below from Wikipedia is a much better visual description:

One approach for finding the convex hull is the Graham Scan algorithm and this is the challenge presented in the book. The algorithm can be broken down into the following steps:

1. Find the minimum point (with the lowest Y co-ordinate). In the event of a tie, the lowest x co-ordinate determines the minimum. Call this point P.
2. Sort the remaining points relative to the angle formed by point and P relative to the x-axis.
3. Consider each point in the sequence. For each point, determine whether it is a left turn or a right turn relative to the previous two points. If it is a "right turn" the second to last point is not in the convex hull. Continue until the list is finished.

To solve this, we'll declare some data structures which should be self explanatory.

`  data Direction = Straight                 | LeftTurn                  | RightTurn                   deriving (Show)  data Point = Point Double Double                deriving (Show,Eq)`

Next we define a function that calculates the turn made by three points and returns a Direction. We don't need to calculate the actual angle as the cotangent is proportional to the angle.

`  cross :: Point -> Point -> Point -> Double  cross (Point x1 y1) (Point x2 y2) (Point x3 y3) =  ((x2-x1)*(y3-y1)-(x3-x1)*(y2-y1))  dist :: Point -> Point -> Double  dist (Point x1 y1) (Point x2 y2) = sqrt((x1-x2)^2 + (y1-y2)^2)  turn :: Point -> Point -> Point -> Direction  turn a b c = makeDirection (cross a b c) where      makeDirection x | x == 0 = Straight                      | x < 0 = LeftTurn                      | x > 0 = RightTurn`

In order to find the minimum Point and sort the list we need to define two comparator functions for use with sortBy and minimumBy (both members of Data.List).

`  compareYPoint :: Point -> Point -> Ordering  compareYPoint (Point x1 y1) (Point x2 y2)      | y1 == y2 = compare x1 x2      | y1 <= y2  =  LT      | otherwise =  GT  compareCross :: Point -> Point -> Point -> Ordering  compareCross pvt a b = if (angle == EQ) then distance else angle where      angle = compare (cross pvt a b) 0                       distance = compare (dist pvt a) (dist pvt b)`

Even though compareCross takes 3 arguments, we can still use this as type `Point -> Point -> Ordering` by using currying.

`  lowestY :: [Point] -> Point  lowestY = minimumBy compareYPoint  grahamScan :: [Point] -> [Point]  grahamScan points = undefined where      p = nub points      pvt = lowestY p      sortedPoints = pvt : (sortBy (compareCross pvt) (delete pvt p))`

`nub` is a function that removes duplicate elements from a list. This gives the skeleton of the program. Now we have to do the complicated bit. From Wikipedia:

The algorithm proceeds by considering each of the points in the sorted array in sequence. For each point, it is determined whether moving from the two previously considered points to this point is a "left turn" or a "right turn". If it is a "right turn", this means that the second-to-last point is not part of the convex hull and should be removed from consideration. This process is continued for as long as the set of the last three points is a "right turn". As soon as a "left turn" is encountered, the algorithm moves on to the next point in the sorted array

Let's take the simplest example possible. For the following points (0,0), (2,0), (2,2), (0,2), (1,1) the convex hull should be everything apart from (1,1).

In order to check how this is going to work, we need to define a quick helper function to make building lists of Point types less verbose.

`  pointsFromTupleList :: [(Double,Double)] -> [Point]  pointsFromTupleList = map (\(x,y) -> Point x y)  examplePoints :: [Point]  examplePoints = pointsFromTupleList [(0,0),(2,0),(2,2),(0,2),(1,1)]`

Now we can explore the functions that we're going to use and verify they work

`  -- The minimum point appears correct  *Main> lowestY examplePoints                                                     Point 0.0 0.0  -- The points are sorted as expected  *Main> let sortedPoints = (Point 0 0):(sortBy (compareCross (Point 0 0)) (delete (Point 0 0) examplePoints))  *Main>sortedPoints  [Point 0.0 0.0,Point 0.0 2.0,Point 1.0 1.0,Point 2.0 2.0,Point 2.0 0.0]`

`  *Main> getTurns sortedPoints  [LeftTurn,RightTurn,LeftTurn]`

That's not much use because I don't know the points which constitute the turns. Let's tuple up the results:

`  getTurns :: [Point] -> [((Point,Point,Point),Direction)]  getTurns (x:y:z:ps) = ((x,y,z),turn x y z) : getTurns (y:z:ps)  getTurns _ = []  *Main> getTurns sortedPoints  [((Point 0.0 0.0,Point 0.0 2.0,Point 1.0 1.0),LeftTurn),   ((Point 0.0 2.0,Point 1.0 1.0,Point 2.0 2.0),RightTurn),   ((Point 1.0 1.0,Point 2.0 2.0,Point 2.0 0.0),LeftTurn)]`

The algorithm states that I should ignore the second to last parameter if it is a right turn, and keep it otherwise. Clearly we're close here because the second to last parameter for a right turn is the only point not in the convex hull. One problem with the sortedPoints is that we're not returning to where we started. Let's change the definition again so that we can get back to where we started. We add two nodes at the end so that the extra padding means we consider all triplets

`  sortedPoints = pvt : (sortBy (compareCross pvt) (delete pvt p)) ++ [pvt,pvt]  *Main> getTurns sortedPoints  [((Point 0.0 0.0,Point 0.0 2.0,Point 1.0 1.0),LeftTurn),   ((Point 0.0 2.0,Point 1.0 1.0,Point 2.0 2.0),RightTurn),   ((Point 1.0 1.0,Point 2.0 2.0,Point 2.0 0.0),LeftTurn),   ((Point 2.0 2.0,Point 2.0 0.0,Point 0.0 0.0),LeftTurn),   ((Point 2.0 0.0,Point 0.0 0.0,Point 0.0 0.0),Straight)]`

So now all I need to do to solve the problem is keep the middle point in all cases where we've got a LeftTurn or Straight. I can simplify this further by just only getting the middle point in the tuple.

`  getTurns :: [Point] -> [Point,Direction)]  getTurns (x:y:z:ps) = (y,turn x y z) : getTurns (y:z:ps)  getTurns _ = []  grahamScan :: [Point] -> [Point]  grahamScan points = map fst (filter (\(x,d) -> d /= RightTurn) (getTurns sortedPoints)) where      p = nub points      pvt = lowestY p      sortedPoints = pvt : (sortBy (compareCross pvt) (delete pvt p)) ++ [pvt,pvt]  -- Hurrah, it finally works (for this one example).  *Main> grahamScan examplePoints   [Point 0.0 2.0,Point 2.0 2.0,Point 2.0 0.0,Point 0.0 0.0]`

One example works, that'll do me as I've learnt enough from this exercise to make it worthwhile. I strongly suspect the way I'm extending the point list by appending the pivot twice at the end is just a hack rather than the right way. There's another definition here which looks cleaner, but that apparently has some issues too.

The complete code looks like this. hlint was run on this and pointed out the "uncurry" function which converts a curried function to a function on pairs. This simplifies the definition of `pointsFromTupleList`.

`  data Direction = Straight                 | LeftTurn                  | RightTurn                   deriving (Show,Eq)  data Point = Point Double Double                deriving (Show,Eq)  turn :: Point -> Point -> Point -> Direction  turn a b c = makeDirection (cross a b c) where      makeDirection x | x == 0 = Straight                      | x < 0 = LeftTurn                      | x > 0 = RightTurn  cross :: Point -> Point -> Point -> Double  cross (Point x1 y1) (Point x2 y2) (Point x3 y3) =  (x2-x1)*(y3-y1)-(x3-x1)*(y2-y1)  dist :: Point -> Point -> Double  dist (Point x1 y1) (Point x2 y2) = sqrt((x1-x2)^2 + (y1-y2)^2)  compareCross :: Point -> Point -> Point -> Ordering  compareCross pvt a b = if angle == EQ then distance else angle where      angle = compare (cross pvt a b) 0                       distance = compare (dist pvt a) (dist pvt b)  getTurns :: [Point] -> [(Point,Direction)]  getTurns (x:y:z:ps) = (y,turn x y z) : getTurns (y:z:ps)  getTurns _ = []  grahamScan :: [Point] -> [Point]  grahamScan points = map fst (filter (\(x,d) -> d /= RightTurn) (getTurns sortedPoints)) where      p = nub points      pvt = lowestY p      sortedPoints = pvt : sortBy (compareCross pvt) (delete pvt p) ++ [pvt,pvt]  compareYPoint :: Point -> Point -> Ordering  compareYPoint (Point x1 y1) (Point x2 y2)      | y1 == y2 = compare x1 x2      | y1 <= y2  =  LT      | otherwise =  GT  compareAngle :: Point -> Point -> Point -> Ordering  compareAngle (Point px py) p1 p2 = compare (angle p2) (angle p1) where      angle (Point x1 y1) = y1-py / x1-px  lowestY :: [Point] -> Point  lowestY = minimumBy compareYPoint  pointsFromTupleList :: [(Double,Double)] -> [Point]  pointsFromTupleList = map (uncurry Point)  examplePoints :: [Point]  examplePoints = pointsFromTupleList [(0,0),(2,0),(2,2),(0,2),(1,1)]`

## Saturday 26 September 2009

### Building Basic Types in Haskell

Types are the building blocks of Haskell programs. Algebraic Data Types (not to be confused with Abstract Data types) are one category of types. An algebraic data type is defined as either an enumeration (a set of distinct types) or a discriminated union (a choice between multiple alternatives). For example:

`  -- An enumeration type  data PrimaryColor = Red                     | Green                    | Blue  *Main> :type Red   Red :: PrimaryColor  -- A discrimated union type  data MyIntList = Empty                 | List Int MyIntList                 deriving Show  *Main> :t Empty  Empty :: MyIntList  *Main> :t List 3 Empty  List 3 Empty :: MyIntList`

Note that MyIntList is a recursively defined data type because it uses itself in the definition. This type of structural type recursion is just the same as calling a function recursively. As long as we have a base case (in this case `Empty` then we can easily build the structure. The way an instance of a type was constructed is available at runtime. Pattern matching is a way of breaking down a value into constiuent parts. For example, we could (but should not!) write a function to calculate the length of the `MyIntList` type as follows:

`  myIntListLength :: MyIntList -> Integer  myIntListLength Empty = 0  myIntListLength (List _ rest) = 1 + myIntListLength rest`

There's a good talk about the use of OCaml for financial trading systems available here. Amongst other things the strengths of the pattern matching idea are discussed. In a large system you can change a type, recompile and find out any redundant, missed or impossible cases. In OCaml this seems to be on by default, with GHC you need to use the "-fwarn-incomplete-patterns" option.

Record syntax is an alternative way of specifying a type such that parameters can be referred to be name.

`  data Person = Person {        firstName :: String      , lastName :: String      , title :: String      } deriving (Show)  -- Normal way of creating a Person  *Main> Person "Bill" "Bob" "Mr"  Person {firstName = "Bill", lastName = "Bob", title = "Mr"}  -- Record syntax allows you to specify parameters by name  -- and therefore order does not matter  *Main> Person { title="Mr", firstName="bill", lastName="bob" }  Person {firstName = "bill", lastName = "bob", title = "Mr"}`

The `MyIntList` type is very poor. It's only defined for lists of type integer. We can improve this by making it a parametrized type. In the example below we defined `MyList` for any type a.

`  data MyList a = Empty                | MyList a (MyList a)                deriving (Show)  -- MyList of numbers  *Main> :t MyList 3 (MyList 4 Empty)   MyList 3 (MyList 4 Empty) :: (Num t) => MyList t  -- MyList of characters  *Main> :t MyList 'a' (MyList 'b' Empty)  MyList 'a' (MyList 'b' Empty) :: MyList Char`

You can have multiple parametrized types. For example, if we wanted to create a list that alternated types, we could:

`  data MyOddList a b = OddEmpty                     | MyOddList a (MyOddList b a)                     deriving (Show)  *Main> :t MyOddList 3 (MyOddList 'c' (MyOddList 4 OddEmpty))    MyOddList 3 (MyOddList 'c' (MyOddList 4 OddEmpty)) :: (Num t) => MyOddList t Char`

## Monday 21 September 2009

Chapter 2 of Real World Haskell gives a quick overview of types and functions. I think I've already picked up most of the bits in here, so this is just some notes on the things I found new / interesting.

One of Haskell's distinguishing features is its strong type system. Real World Haskell says the three interesting aspects are:

1. Strong types - The type system guarantees that the program does not contain certain errors (treating an integer as a float). In comparison, something like C allows you to reinterpret the bytes as you see fit (weak typing)
2. Static Types - At the point the code is built, the compiler knows the type of every expression and value. In comparison dynamically typed languages only know the type of an expression at runtime.
3. Type Inference - The compiler can automatically infer the types of expressions. Type definitions are optional and Haskell can deduce the required types (however, it seems to be good practise to provide the type signature as a documentation aid).

Lists and tuples are the primary way of grouping elements. Items in a list must all be the same type, items in a tuple can be different types. One interesting point made was that each tuple is a unique type and therefore writing a function to "get the second element from any tuple" is almost impossible. Looking at the types of `fst :: (a,b) -> a` and `snd :: (a,b) -> b`, it's easy to see why. In order to get the second element from any tuple, we'd have to define it for an infinite range of tuples (a,b,c), (a,b,c,d) and so on.

When Haskell evaluates an expression it uses normal-order evaluation. Normal-order evaluation simply evaluates expressions as they are needed. This is also known as a lazy evaluation strategy because expressions that don't contribute to the result aren't evaluated. This avoids the special case treatment that short-circuiting operators get in other languages.

## Sunday 20 September 2009

I finally got around to ordering Real World Haskell. Currently the only book on Haskell I have is The Haskell School of Expression which is enjoyable, but I've never got around to installing the relevant packages and passive reading only gets you so far!

I thought I'd try to keep notes as I go through the book as that seems to be a good way to prompt myself to actually use books (working through PAIP was very rewarding, so hopefully RWH will be a similar experience!) rather than just adorn my study with expensive door stops. This isn't meant to make particular sense, rather just a brain dump as I read through it.

So on with the book. Chapter 1 covers setting up the environment. There's a few interesting ghci commands I had not seen like `:set prompt` and `:module + M`. `:set +t` will always print the type of an expression. The "it" that is displayed is the name of a special variable which contains the value of the last expression evaluated. `:info` displays the precedence of operators. For example / has higher precedence than +.

```Prelude> :info (+)
class (Eq a, Show a) => Num a where
(+) :: a -> a -> a
...
-- Defined in GHC.Num
infixl 6 +

Prelude> :info (/)
class (Num a) => Fractional a where
(/) :: a -> a -> a
...
-- Defined in GHC.Real
infixl 7 /
```

An infix operator can be converted to a prefix operator by enclosing the operator in parentheses. Similarly, back-ticks can convert a prefix operator into an infix operator.

Enumerations are a way of producing lists of numbers. The Haskell Cheat Sheet gives a great summary of the possibilities.

```[1..3] -> [1,2,3]
[1..] -> [1,2,3,4,...] (infinite list)
[10..1] -> [] (ranges only go forward)
[0, -1 ..] -> [0, -1, -2, ...] (infinite list)
[1,3..10] -> [1,3,5,7,9] (list from 1 to 10 with diff of 2)

-- Also applies to any Enum class
['a' .. 'e'] -> "abcde"
['a', 'd' .. 'z'] -> "adgjmpsvy"
```

Don't use floating points for enumeration because the behaviour can be "quirky" (rounding errors and so on).

The empty string is a synonym for [] for the type Char,so `'a' : "" -> "a"`.

Rational numbers are defined in the `Data.Ratio` module, and constructed with % (e.g. 22 % 7 is nearly Pi).

The exercises showed me a function I'd not seen before. `interact` is a function which takes a function which takes stdin as input, and returns a string which is written to stdout (e.g. `interact :: (String -> String) -> IO()`). Also, once I grokked that "wc" was referring to the Unix command it all became a little clearer!

## Wednesday 16 September 2009

### To Type Class or not to Type Class

Last time I wrote a very simple log parser, but it was very specific. The next task was to work out how to generalize it.

My first thoughts were to use type classes to make the code more generic. Type classes represent a constraint on a type variable in a parametrically polymorphic type. Parametric polymorphism simply means a generic function can be written so it handles values identically independent of type (the list functions are a good example).

So I started, and ended up typing in gibberish like this:

`  class Report r where      printReport :: r -> String  class (Eq l, Show l) => LogProcessor l where      processLine :: String -> Maybe l      combineUnit :: (Report r) => l -> r -> r`

I then read OOP vs type classes and Learning Haskell Notes which made me think that type classes is the wrong approach.

All I'm actually trying to do is change the behaviour of a couple of functions (parsing a line and combining the results). I'm not really interested in any types and there should be as few constraints as possible in terms of what you can implement. The simplest way (and probably the right way) is to just pass these functions in

`  processFile :: FilePath -> (String -> Maybe t) -> IO([t])  processFile path f = do    a <- readFile path    return (Maybe.mapMaybe f (lines a))  reportFile :: FilePath -> (String -> Maybe t) -> ([t] -> String) -> IO()  reportFile path func comb  = do    a <- processFile path func    print (comb a)    return ()`

There aren't really any constraints here. `processFile` now processes a file path, returning `IO([t])`. A combining function processes this list and generates a string to give the result. Much simpler to use too.

I spend way too much of my time digging through gigabytes of log files looking for interesting events. I've looked at this before using Clojure, so I thought I would try to translate the basics into Haskell and then (over the course of the next few weeks) try to develop into something general purpose and halfway useful.

Building a report from a log file (or collection of log files) typically requires:

1. A representation of the events I'm interested in
2. A predicate for determining whether the line is an event (`String -> Bool`)
3. Converting the line into an event (`String -> Event`)
4. Folding the results into a single report (`[Event] -> Report`)

In this example, I wanted to parse my dpkg logs (/var/log/dpkg.log) to see how many times I've upgraded packages on Ubuntu. Firstly I defined a type to represent an upgrade event:

`  type Package = String  data Upgrade = Upgrade { packageName :: Package                         , updateTime :: UTCTime }   instance Show Upgrade where      show a = show (updateTime a) ++                ":" ++ show (packageName a)`

`instance Show Upgrade` is similar to `deriving Show`, but allows you to customize how the object will be converted to a string.

Next step is to get a predicate and to parse the lines. I decided to combine these into one function and return `Maybe Upgrade` to indicate success / failure. I used the Date.Time modules to parse the data and time. The parsing is terrible but suffices for now as I'm just trying to get an idea of where I need to generalize. Note to self read (and ideally understand!) Monadic Parsing Combinators [PDF] and associated Haskell module.

`  getTime :: String -> UTCTime  getTime = fromJust . parseTime defaultTimeLocale timeFormat  getPackageName :: String -> String  getPackageName = takeWhile (not . Char.isSpace)                 -- Poor mans parsing.  parseLine :: String -> Maybe Upgrade  parseLine s             | isInfixOf " upgrade " s = Just                                         (Upgrade                                          (takeWhile (not . Char.isSpace) (drop 28 s))                                         (getTime (take 20 s)))            | otherwise = Nothing`

All I need to do now is a combining action to perform with foldl. For this I've defined a report of type `Map Day [Package]` which represents an association between a day and all the names of the packages updated on that day.

`  processFile :: FilePath -> IO([Upgrade])  processFile s = do     a <- readFile s    return (Maybe.mapMaybe parseLine (lines a))  type Report = Map Day [Package]  combine :: [Upgrade] -> Report  combine = foldl addToReport Map.empty   addToReport :: Report -> Upgrade -> Report  addToReport r p = Map.insert day packages r where                    day = utctDay (updateTime p)                    initVal = Map.findWithDefault [] day r                    packages = packageName p:initVal  reportFile :: FilePath -> IO()  reportFile f = do    a <- processFile f    print (combine a)    return ()`

Hurrah, so now I get output in the right format and I can see that I really shouldn't have added some of the Firefox 3.5 bleeding edge repositories to my Ubuntu upgrade paths. Upgrading Firefox (or Shiretoko) every few days is a bad thing. D'oh!

The next stage for me to understand is how I can generalize this. What I really want to develop next is a simple pluggable framework. It seems that I need to generalize at least the following bits:

• Parsing a line into a type T
• Combining [T] to produce a single report

## Saturday 12 September 2009

### HLint

I found mention of HLint via this post. Installing it was as simple as the instructions provided. The binary ended up in `\$HOME/.cabal/bin` which confused me for a few minutes.

A quick run over my code showed many improvements that I could make. For example. I don't make as much use of eta reduction as I could and HLint not only tells me this, but suggests a solution.

`  ./randomText.hs:15:1: Warning: Eta reduce  Found  createTrainingSet s    = foldl' updateMap Map.empty (wordPairs (words s))  Why not  createTrainingSet = foldl' updateMap Map.empty . wordPairs . words`

I also use far too many brackets in my code. That's a combination of not knowing the precedence rules and switching over from Clojure! Again, a warning is provided and the minimum brackets required shown.

HLint also finds common patterns in code and suggests rewriting them in terms of fold or map. For example when I was trying to write my own `map`:

`  ./myfunctions.hs:11:1: Error: Use map  Found  mymap f [] = []  mymap f (x : xs) = f x : mymap f xs  Why not  mymap f xs = map f xs`

And similarly for written my version of length:

`  ./myfunctions.hs:3:1: Warning: Use foldl  Found  mylength [] count = count  mylength (x : xs) count = mylength xs (1 + count)  Why not  mylength xs count = foldl (\ count x -> 1 + count) count xs`

I'll definitely be making use of this before I post any more Haskell code!

## Wednesday 9 September 2009

Previously I implemented an example from ANSI Common Lisp that generated a stream of text based on the word frequencies of a sample document. This produces surprisingly realistic looking text.

The data structures that represent the frequency map are typed as below. `type` defines a type synonym and are purely used to improve readability (e.g. like `typedef` in C/C++)

`  type Followers = Map String Int  type WordSuccessors = Map String Followers`

To generate this data structure, we generate all pairs of words, and then add them to an instance of the `WordSuccessors` structure until complete.

`  createTrainingSet :: String -> WordSuccessors  createTrainingSet s =  foldl' updateMap Map.empty (wordPairs (words s))  updateMap :: WordSuccessors -> (String,String) -> WordSuccessors  updateMap m (x,y) = Map.insert x v m where                      q = Map.findWithDefault (Map.singleton y 0) x m                      v = Map.insert y (succ (Map.findWithDefault 0 y q)) q      wordPairs :: [String] -> [(String,String)]  wordPairs l = zip l (tail l)`

One tip I picked up from Real World Haskell was using `undefined`. You can use this to sketch out type signatures of functions, prior to putting the body in. For example, `updateMap` was a pain to get right, but with undefined I can fill in bits of the implementation by replacing undefined with a real implementation. Each time I can use the compiler to validate I haven't gone too far wrong.

`  updateMap :: WordSuccessors -> (String,String) -> WordSuccessors  updateMap m (x,y) = undefined  -- Refine  updateMap m (x,y) = Map.insert x v m where                      q = undefined                       v = undefined   -- Refine   updateMap m (x,y) = Map.insert x v m where                      q = Map.findWithDefault (Map.singleton y 0) x m                      v = undefined   -- Refine  updateMap m (x,y) = Map.insert x v m where                      q = Map.findWithDefault (Map.singleton y 0) x m                      v = Map.insert y (succ (Map.findWithDefault 0 y q)) q`

Types help bound your program to make sure it does what you think. Developing in this way draws parallels with test-driven development. Instead of a suite of tests (which I have to write properly!), I can use the type system to verify the program still does what I think it should do! I realize that types don't pick up everything (maybe dependent types will minimize this in the future), but then neither do unit tests!

Next step is to generate random text based on this word successors information. Random functions aren't pure (obviously) and I went a long way down various wrong roads (mostly involving changing the return type of nextWord to `IO(String)`, propagating everything to type IO and then realizing I'm going down the wrong road). On IRC (#Haskell) someone suggested an idiom of passing an infinite list of random numbers to a function - this keeps it pure (the impurity is handled in generating the list) at the cost of an additional argument.

The algorithm is comically naive, but it's just about fast enough. I couldn't find a better way of selecting an element based on a frequency distribution stored in a map though I'm sure there is one!

`  nextWord :: [Int] -> WordSuccessors -> String -> ([Int],String)  nextWord seeds fm start = (r, (poss !! (mod s count))) where                             successors = fm Map.! start                            count = Map.fold (+) 0 successors                            poss = Map.foldWithKey (\k v acc -> (replicate v k) ++ acc) [] successors                            s = head seeds                            r = drop 1 seeds `

The function returns the next word given the previous, and also shifts the random number sequence along again.

Finally, we need something to drive this. `main` returns a type `IO(String)` because it has to perform two bits of IO (reading the sample file and generating the random number).

`  maxWordCount :: Int  maxWordCount = 1000000  main :: IO(String)  main = do    text <- readFile exampleFile    gen <- newStdGen    let training = createTrainingSet text        seeds = randomRs (0,maxWordCount) gen    return (unwords (map snd (iterate (\(s,w) -> nextWord s training w) (seeds,"by"))))`

So finally, we can generate an infinite sequence of random-ish text. Here's an example, based on "Manual of Surgery". Apparently it was the most popular book downloaded. Weird and slightly disturbing.

The patient from a long bones, and, by the blood threatens to fat. In the lower or near the blood vessel, and a pustular eruption are usually be inserted, and irregularity in Pus from three weeks or to employ considerable size by a soft, rapidly developed over looked. If, however, regain the compression causes suffering from one collection of small saphena system run an ideal manner of the risk of pain; the patient's strength of the best hope of wounds are believed to the toxic symptoms which the suppuration are reduced by ankylosis. In surgical diseases, from repeated on the limb becomes discoloured and is fluid and upwards, and loss of any prospect of age, and is caseation and the existing sinuses with the victims of pes equino-varus. Pressure of calcium salts, and then forms a severe pain, with the ulcer may attain a wound infection leads to enter the end of pus forms, but may be diagnosed before it is then foul and the scalp or destruction of Hands 529 Hectic Fever 62 12. Chart of a synovitis_ with the arsenical preparations of syphilis, must be a stab or bossed, the natural protective contraction which is sometimes of the inflow of cerebral congestion and on the blisters, and irregular, stalactite-like processes of the gum, with the penis, scrotum, excision or by freely open method of the first thirty-six or applies a globular cyst, or spindle-shaped, and sometimes resulting in the myeloma is often met with the shafts. The operation of Contusions._--The usual phenomena associated in primipar, there is seldom be employed with lymph glands, 322 Submental lymph from the dorsum.

Sounds like believable medical gibberish to me!

## Friday 4 September 2009

Haskell's primary data type is the list and there are a rich variety of functions for transforming lists. Most of these functions are familiar to me from Clojure sequences but there are some other useful functions that I haven't seen before.

`last`, as you'd expect, gives you the last item in a list. It has a corresponding function `init` that gives you everything but the last item. `intersperse` allows you to insert an element between each element (e.g. `init (intersperse 'a' "bnnn") => "banana"` . Similarly, `intercalate` inserts the lists between lists and concatenates the results. For different arrangements of the items, there are two functions. `subsequences` gives you a list of all subsequences of the argument and `permuatations` gives all the possible permutations.

`  Prelude> subsequences "abc"            ["","a","b","ab","c","ac","bc","abc"]  Prelude> permutations "abc"  ["abc","bac","cba","bca","cab","acb"]`

To reduce lists to a single value there are many versions of the fold function (in Clojure there was just `reduce`!). `foldl` is the direct equivalent to reduce which reduces a list to a single binary operator. `foldl (+) 0 [1,2,3]` is equivalent to (((0 + 1) + 2) + 3). `foldl1` is a convenience method without the initial argument, that only applies to non-empty lists `foldr` and `foldlr1` are right-associative so `foldr (+) 0 [1,2,3]` evaluates to (0 + (1 + (2 + 3))).

The fold family of functions can be extremely powerful - I need to read "A tutorial on the universality and expressiveness of fold" [PDF]that explores this in more detail. Some example functions that operate on lists defined as folds in Haskell include `and`, `or`, `any`, `all` and `concatMap`.

On a side note, there are strict versions of the foldl functions, that with a '. Why'd you need these? Haskell is lazy by default which can mean you build up a great big thunk (a pending calculation). This can be bad (for example, increased space requirements). By making a strict version you evaluate the values as they becomes available. This stops the huge think building up and can be more efficient. There's a good discussion of this here. `foldr` doesn't have a corresponding strict version, and looking at the expansion it's easy to see why - there's nothing to be lazy with as the first value you can evaluate is right at the end of the list!

`unfoldr` is an interesting function. It can be considered the opposite of `foldr` as it is used to build up a list from a seed value. The first argument is a function that returns `Nothing` if it's finished or `Just (a,b)` otherwise. `a` is prepended to the list, and `b` is used as the next element. For example we can generate the Fibonacci sequence:

`  fibonacci :: [Integer]  fibonacci = unfoldr (\[a,b] -> Just(a+b,[b,b+a])) [0,1]  Prelude> take 10 fibonacci  [1,2,3,5,8,13,21,34,55,89]`

`iterate` can be written as `iterate f == unfoldr (\x -> Just (x, f x))`. Another paper to add to the reading list is "The under appreciated unfold".

`take` and `drop` are functions for getting prefixes and suffixes of (potentially) infinite lists. `splitAt` does both at the same time, returning a tuple `(take n xs, drop n xs)`. `takeWhile` and `dropWhile` take or drop errors whilst some predicate holds. Putting these together we can write a function `groupN` which groups elements into sublists of size N.

`  groupN :: [a] -> Int -> [[a]]  groupN [] _ = []  groupN xs n = a : groupN b n where                (a,b) = splitAt n xs  Prelude> groupN [1..7] 2  [[1,2],[3,4],[5,6],[7]]`

The Haskell list library is very complete and there's definitely some new ideas for me to absorb there. In particular, understanding unfoldr and folding in more detail seems to be an important thing to do!

## Wednesday 2 September 2009

One thing I was finding a little frustrating is that if I was to do any IO in a function, then it's type signature changes. This is good most of the time, but it's a little painful to have to change all your code to print out a number!

Thankfully, there's a number of solutions.

The GHCi debugger provides a way of inspecting code. `:b N` sets a break point in the loaded module at a specific line. Taking the anagrams example I set `:b 30` on `anagrams <-anagramList wordfile`.

`  anagramsOf :: String -> IO ()  anagramsOf word = do    anagrams <- anagramList wordfile    putStrLn (show (Map.lookup (stringToKey word) anagrams))`

When the program is run and the breakpoint is hit we get the following:

`  Stopped at anagrams.hs:31:2-57  _result :: IO () = _  anagrams :: Map String (Set String) = _  word :: String = _  [anagrams.hs:31:2-57] *Main> :list  30    anagrams <- anagramList wordfile  31    putStrLn (show (Map.lookup (stringToKey word) anagrams))  32  `

`:list` is used to list of the free variables in scope (`anagrams` and `word`) and these are available to inspect in the debugger. `_result` is a binding for the result expression. Once you've hit a breakpoint, you can use use `:trace` to continue to the next breakpoint, recording the history as you go along. `:back` and `:forward` allow you to go up and down the list of evaluated expressions and inspect each one.

The Haskell Wiki has a section devoted to debugging which brought me to the Debug.Trace module. This allows you to print some text and return the evaluation of the next expression e.g. `Debug.Trace.trace "1+1=" (1 + 1)`.

Initially, I couldn't understand how the type of `Debug.Trace.trace` could be `String -> a -> a`, but then I found `System.IO.Unsafe`. It comes with a large health warning that it doesn't enforce ordering on IO and it's type unsafe. Evil, yet useful (at least to write the trace functions).

## Tuesday 1 September 2009

### Generating ASCII Art

ASCII art is a way of presenting graphics through standard characters, without the need for images. For example, you could draw a triangle, using just forward and backwards characters.

`  /\ /__\`

Given an arbitrary picture, how do we convert this to ASCII art? The technique is dead simple - convert the image to gray scale and replace each pixel in the image with a character representing the brightness value. For example, a * character is darker than a ! character.

The following Haskell program does just that. It uses the PGM package to load an image (so it's already converted to gray scale). All it does it map the pixel to a character using a function and amap and then do some jiggery pokery to turn it into an image.

`import Graphics.Pgmimport Text.Parsec.Errorimport Data.Array.Basebrightness =  " .`-_':,;^=+/\"|)\\<>)iv%xclrs{*}I?!][1taeo7zjLu" ++               "nT#JCwfy325Fp6mqSghVd4EgXPGZbYkOA&8U\$@KHDBWNMR0Q";loadImage :: String -> IO (UArray (Int,Int) Int)loadImage path = do                    r <- pgmsFromFile path                    case r of                      Left e -> error "Failed to parse file"                      Right i -> return (head i)brightnessToChar :: Int -> Int -> CharbrightnessToChar m b = brightness !!                        (round ((fromIntegral b) / (fromIntegral m) * (fromIntegral ((length brightness) - 1))))imageToAscii :: UArray (Int,Int) Int -> UArray (Int,Int) CharimageToAscii image = amap (brightnessToChar 255)  image     convertImage :: String -> String -> IO ()convertImage image out = do                       img <- loadImage image                       let ((_,_),(h,w)) = bounds img                       let x = imageToAscii img                       writeFile out (unlines [ [ x ! (i,j) | i <- [0..w] ] | j <- [0..h] ])                       return ()`

Interesting learning exercises for me were the nested list comprehension to go through the array and the use of `Either` to represent a choice of return values. This seems to be used when you want to return more data than simply "it failed" (which you'd use `Maybe`).

Because I can't quite work out how to get a tiny font, it's easier to post a screen shot of some ASCII art (kind of defeats the purpose, I know!). Below is the Ubuntu logo rendered (badly!) as ASCII art.

## Sunday 30 August 2009

So far I've seen the list data structure, but Haskell also supports items like Maps and Sets.

An association list is a list of tuples of keys to values. For example:

`  alist :: [(String,Double)]  alist = [("pi", 3.14159265), ("e", 2.71828183), ("phi", 1.61803398874)]  getConstant :: String -> Maybe Double  getConstant name = lookup name alist`

`lookup` is a prelude function that returns the value (if existing) for the supplied key. Association lists are useful for small lists, but the lookup time is O(N) in the size of elements. Enter Map which provides the same key/value abstraction but with efficient lookup. `Data.Map` provides operations for insertion, deletion and inspection of keys. For example:

` -- Given an association list, make a map Map.fromList aList -- Insert a new key/value into the empty map Map.insert "pi" 3.14159265 \$ Map.empty -- Is a key in a map? Map.member key map -- lookup and findWithDefault allow you to find values.`

\$ is the function application operator - it's right associative so it means less brackets.

Similarly, `Data.Set` provides a functional implementation of sets.

We can put these together and change the implementation of the anagrams to create a map of search key to a set of results. This is hideously inefficient initially, but once the data structure is built up finding anagrams should be a little quicker.

`  anagramList :: String -> IO (Map String (Set String))  anagramList file = do    filecontent <- readFile file    return (foldl (\x y -> Map.insertWith Set.union (stringToKey y) (Set.singleton y) x)                  Map.empty                   (filter validWord \$ lines filecontent))  anagramsOf :: String -> IO ()  anagramsOf word = do    anagrams <- anagramList wordfile    putStrLn (show (Map.lookup (stringToKey word) anagrams))`

## Saturday 29 August 2009

All functions in Haskell are pure - how do you deal with side effects such as input/output?

`  Prelude> :t putStr "hello world"        putStr "hello world" :: IO ()`

The IO () means that this returns an IO action that has a result type of () (known as unit).

The `do` syntax can be used to glue actions together. For example:

`  saygoodbye = do    putStrLn "Hi - who are you?"    name <- getLine    putStrLn ("Goodbye Mr. " ++ name)  *Main> saygoodbye  Hi - who are you?  Bond  Goodbye Mr. Bond  *Main> :t saygoodbye  saygoodbye :: IO ()`

`<-` performs the getLine application and binds the resulting value to name. Given that getLine has type IO String this gives name a type of String. IO types and normal types can't be mixed so `"Die " ++ getLine` is an illegal statement (IO String and String don't mix).

`return` is like the opposite of `<-` - it takes a pure value and constructs an action out of it. `return` is nothing like its use in other languages such as Java and C. `return` doesn't do anything with the execution path, code continues to the next line; `return` is purely used to construct actions.

So how'd you escape from your IO action? You don't... There's no escape!:

There's one final detail about IO actions that you should be aware of: there is no escape! The only way to get a result from an IO action is to invoke the IO action (through main) and have its result used to affect the outside world through another IO action. There is no way to take an IO action and extract just its results to a simple value (an inverse-return). The only places where an IO action's results appear unwrapped are within a do-block.

Let's try and put this all together to write a quick program that gives all anagrams of a word. The implementation idea is taken from Programming Pearls - load up a dictionary, sort the characters (so "banana" becomes "aaabnn"), then shove it all into an association list. Then given a word, apply the same sort and simply look up the associations.

Unix distros come with a word list file (/usr/share/dict), but it's full of words with punctuation and so on. We need to filter this list to remove invalid words, then build up an association list (list of tuples).

`  import Data.Char  import List  wordfile = "/usr/share/dict/words"  stringToKey :: String -> String  stringToKey = sort.(map toLower)  validWord :: String -> Bool  validWord s = (not (null s)) &&                 length s <= 10 &&                 not (any (not.isAlpha) s)  anagramList :: String -> IO [(String,String)]  anagramList file = do    filecontent <- readFile file    return (map (\x -> ((stringToKey x),x)) (filter validWord (lines filecontent)))  matchingKeys :: String -> [(String,String)] -> [String]  matchingKeys k l = map snd (filter ((== k).fst) l)  anagramsOf :: String -> IO ()  anagramsOf word = do    anagrams <- anagramList wordfile    putStrLn (show (matchingKeys (stringToKey word) anagrams))`

`stringToKey` is a function of one argument which converts a string to a key by making the string lower-case and then sorting the characters. `readFile` does exactly what is says on the tin (it's lazy too). `lines` is a built in function which breaks up a string into separate lines. And it seems to work first time too!

`  *Main> anagramsOf "least"  ["Stael","Tesla","least","slate","stale","steal","tales","teals"]`

My current understanding (and it's all very foggy!) is that if a function uses IO actions then they will always boil up to the top level. `main` is the only place where these can be hidden. For example, I could put the loading of the anagram list in one place, use <- in main and then not have to riddle the rest of the program with IO actions. Good design isolates the IO in the smallest segment of the program possible.

## Friday 28 August 2009

### Testing Times

My last program didn't have anything in the way of tests, purely because I had no idea to write them in Haskell.

There seem to be two major test frameworks in Haskell. The first one, HUnit, is based on the xUnit family. Create test cases which assert various properties of the functions you're testing, bundle them into a test suite and run them.

`  foo :: (Num a) => a -> a -> a -> a  foo a b c = a * b + c  test1 = TestCase (assertEqual "* has higher precedence" 26 (foo 2 10 6))  tests = TestList [TestLabel "Foo test" test1]  -- From the REPL  *Main> runTestTT tests        Cases: 1  Tried: 1  Errors: 0  Failures: 0  Counts {cases = 1, tried = 1, errors = 0, failures = 0}`

Tests like this always feel a bit smelly - the only way to verify the test is to write the code again twice. Whilst measure twice cut once works for carpentry, it doesn't feel right for programming...

Enter an alternative testing framework, QuickCheck. The idea is simple, instead of testing arbitrary assertions about your code, specify the invariants associated with your function and let QuickCheck see if it can generate a failing test case.

As a simple example, let's say we write a function to add two numbers together:

`  addNum :: (Num a) => a -> a -> a  addNum a b = a + b  prop_AddNum a b = (addNum a b) >= b && (addNum a b) >= a`

We specify the invariant that if we add numbers together the result is bigger than either argument. Running Quick Check shows that this is (obviously wrong!) and gives an example set of arguments that fail the test.

`  *Main> quickCheck prop_AddNum  Falsifiable, after 5 tests:  -1  -2`

The convention is that the invariants are usually specified as beginning with "prop_" in case you're wondering where the weird naming comes from.

QuickCheck generates random instances satisfying the types and validates the properties. Generators exist for the basic types and can be extended to your own.

Taking an example from the ray tracing functions we can specify an invariant that the distance between any two points is constant after a linear transform.

`  square :: (Num a) => a -> a  square x = x * x  distance :: Point -> Point -> Float  distance p1 p2 = sqrt(square ((x p1)-(x p2)) + square ((y p1)-(y p2)))  prop_distance :: Point -> Point -> Float -> Float -> Bool  prop_distance p1 p2 d1 d2 = 0.001 > abs (distance p1 p2 -                                            distance (Point ((x p1) + d1) ((y p1) + d2))                                                    (Point ((x p2) + d1) ((y p2) + d2)))`

Note that the `abs` is just to deal with rounding errors that occur when dealing with floating point types results from the square root. The code won't compile as is, because QuickCheck doesn't know how to generate Point objects. We can solve this problem by creating an instance of `Arbitrary` specialized (is that the right word?) for Point types.

`  instance Arbitrary Point where      arbitrary = do        x <- choose(1,1000) :: Gen Float        y <- choose(1,1000) :: Gen Float        return (Point x y)`

`do` is used to provide sequencing of statements. We can now run quickCheck and verify that the invariant holds.

`  *Main> quickCheck prop_distance  OK, passed 100 tests.`

I'm still not quite understanding some aspects of this (e.g. why can't I write `Point choose(1,1000) choose(1,1000)` instead of sequencing?), but this is a pretty neat way of writing tests and definitely gives me further reason to try and understand Haskell in more depth.

## Wednesday 26 August 2009

may be thought of as functions whose domains are isomorphic to contiguous subsets of the integers.

Arrays in Java:

An array object contains a number of variables. The number of variables may be zero, in which case the array is said to be empty. The variables contained in an array have no names; instead they are referenced by array access expressions that use nonnegative integer index values. These variables are called the components of the array. If an array has n components, we say n is the length of the array; the components of the array are referenced using integer indices from 0 to n - 1, inclusive.

This probably sums up the difference between the languages very well. Haskell says it in one sentence, whereas Java waffles a little bit more!

In Haskell arrays are constructed with the `array` constructor. The first argument specifies a pair of bounds so we can create an array mapping the integers 3, 4 and 5 to 10 times their value.

`  *Main> array (3,5) [ (i,i*10) | i <- [3..5]]  array (3,5) [(3,30),(4,40),(5,50)] `

Arrays can also be multidimensional in which case more bounds need to be provided.

`  *Main> (array ((0,0),(1,1)) [ ((i,j),i*2+j) | i <- [0..1], j <- [0..1]])   array ((0,0),(1,1)) [((0,0),0),((0,1),1),((1,0),2),((1,1),3)]`

I struggled to get the PPM package in Hackage working last time because I didn't understand arrays in the slightest. Now that I have a little understanding (only a little!) I can actually visualize the ray tracing code...

One thing I'm still not understanding is how I should format Haskell code. Here's the code to render the image and save it as a PPM.

`image :: [Sphere] -> Point -> Int -> Int -> Array (Int,Int) Int image world eye width height =     array       ((0,0),(width,height))       [((i,j),truncate (255 * (value (colorAt world eye (fromIntegral i) (fromIntegral j))))) |         i <- [0..width], j<- [0..height]]imageWord16 :: Array (Int,Int) Int -> Array (Int,Int) Word16imageWord16 image = fmap (fromIntegral :: Int -> Word16) imagesaveImage :: String -> [Sphere] -> Point -> Int -> Int -> IO ()saveImage filename world eye width height = arrayToFile filename (imageWord16 (image world eye width height))`

Note the horrible usage of `fromIntegral` and `truncate` to convert an integer to a float and back again. I think this is because I should have been more general in my types on the Point data type and specified it as a number rather than a float.

The type of `saveImage` looks a little funny but this is because it returns an IO action rather than a value. This value is a monad, but again I'll ignore this and hope repeatedly using them will lead to understanding! For now, I just grok that its type indicates it does something rather than returns something.

The finished code weighs in at about 100 lines, which is more or less exactly the same as Clojure, but with the advantage of static typing. I found static typing to be in equal measure incredibly useful and incredibly frustrating! Hopefully it'll lean towards useful as I understand things a bit more.