Fatvat: 2009

Monday, 7 December 2009

Implementing Klondike

A comment on my last post suggested using arrays would be far simpler to model the tableau (and also the foundation). And they were right!


  data Index = A | B | C | D | E | F | G 
             deriving (Eq, Ord, Show, Enum, Ix)

  data Slot = Slot 
      {
        shown :: [Card] 
      , hidden :: [Card] 
      } deriving (Show,Eq)

  type Tableau = Array Index Slot

  type Foundation = Array Suit [Card]

By deriving Index and Suit from the class Ix it becomes possible to use this to index items in an array.

To play Klondike we need to deal the deck out. I've assumed the deck is already shuffled.


  newGame :: [Card] -> Game
  newGame cards = Game d emptyFoundation t where
      (t,d) = dealTableau cards

  emptyFoundation :: Foundation
  emptyFoundation = array (Clubs,Spades) [(s,[]) | s <- [Clubs .. Spades]]

  dealTableau :: [Card] -> (Tableau,[Card])
  dealTableau dk = ((array (A,G) [(A,(Slot [a] as))
                                 ,(B,(Slot [b] bs))
                                 ,(C,(Slot [c] cs))
                                 ,(D,(Slot [d] ds))
                                 ,(E,(Slot [e] es))
                                 ,(F,(Slot [f] fs))
                                 ,(G,(Slot [g] gs))]),
                    rest) where 
      (a:as,h) = splitAt 1 dk
      (b:bs,i) = splitAt 2 h
      (c:cs,j) = splitAt 3 i
      (d:ds,k) = splitAt 4 j
      (e:es,l) = splitAt 5 k
      (f:fs,m) = splitAt 6 l
      (g:gs,rest) = splitAt 7 m

My goal is to write a simple model for Klondike so that I can experiment with various playing strategies. The first stage in this is writing the rules of the games.


  data Move = TurnDeck
            | ToFoundation Index 
            | DeckTo Index
            | DeckUp
            | MoveCards Index Int Index
            | GameOver
              deriving (Show,Eq)

  -- |Can we turn the deck?
  turnDeck :: Game -> Bool
  turnDeck = null . deck

  -- |Is the game complete?
  gameWon :: Game -> Bool
  gameWon game = all empty (elems (tableau game)) && not (turnDeck game)
   
  -- |Does the second card follow the first?
  successor :: Card -> Card -> Bool
  successor a b = value a /= King && alternateColors a b && follows a b

  -- |Can the card move down from the deck to the given slot?
  cardDown :: Card -> Slot -> Bool
  cardDown card (Slot s _) | null s = value card == King
                           | otherwise = successor card (head s)

  -- |Can the card move to foundation?
  cardUp :: Card -> Foundation -> Bool
  cardUp (Card v suit) f | null cards = v == Ace
                         | otherwise = x /=King && succ x == v 
                         where
                           cards = f ! suit
                           (Card x _) = head cards

  -- |Can the card move from x to y?
  slotMove :: Slot -> Slot -> Bool
  slotMove (Slot from _) s | null from = False
                           | otherwise = cardDown (head from) s

Once we have a chunky load of predicates defined, we can put these together to get the list of moves possible from a given game state.

The getMoves function builds up a list of valid moves.


  getMoves :: Game -> [Move]
  getMoves g  = [DeckUp | (not.null) dk && cardUp (head dk) (foundation g)]
                ++ [(DeckTo . fst) x | not.null $ dk, x <- filter (cardDown (head dk) . snd) slots]
                ++ [TurnDeck | not.null $ dk] 
                ++ [ToFoundation is | is <- cardsUp]
                ++ slotMoves 
                ++ [GameOver | gameWon g] where
      dk = deck g
      slots = assocs (tableau g)
      cardsUp = map fst (filter (flip cardUpFromSlot (foundation g) . snd) slots)
      slotMoves = [MoveCards (fst x) 1 (fst y) | x <- slots, y <- slots, 
                                                 fst x /= fst y && slotMove (snd x) (snd y)]

There's also a corresponding move :: Game -> Move -> Game function that performs the given move and returns a new game state. It's a bit chunky to paste here (which probably indicates it could be improved!), but you can grab it from github.

Now that I have a basic model of the game, I just need to write a routine that picks the best move based on the current state. My current thoughts are just to sort the moves according to some criteria (e.g. moving a card up is always preferable to turning the deck), but initial attempts having been that good. Maybe I just need brute force!

Sunday, 6 December 2009

Modelling Klondike in Haskell

Following on from here I've made some progress on modelling the game of Klondike Solitaire in Haskell.

The code is really ugly and is definitely not idiomatic Haskell - if you can show me how to improve it then that would be much appreciated - it's the only way I'll learn!

There are several concepts in the game of Solitaire. There are:

Tableau - A set of seven stacks of cards. Cards can be transferred between each stack as long according to some rules.
Foundation - The foundation consists of four bases, each representing one suit of cards. The game finishes when all the cards have been placed in order (from Ace to King) onto the foundation.
Deck - The cards not in the foundation or the tableau are in the deck.

Each item in the tableau has a number of shown cards and some hidden cards.


  data Slot = Slot 
      {
        shown :: [Card] 
      , hidden :: [Card] 
      }deriving (Show,Eq)

  data Tableau = Tableau [Slot] deriving (Show)

One problem I had was that I wanted to represent a Tableau as being exactly 7 slots. I could do this with a tuple, but then my 7 element tuple would need lots of special functions instead of just using the list functions. There are packages for fixed size lists, but I wanted to try and do something minimal.

Foundations are simpler to model. A foundation consists of four bases.


  data Base = Base Suit [Card] deriving (Eq,Show)

  data Foundation = Foundation Base Base Base Base
                    deriving (Show)

I think I'm missing something - how can I change the definition of Foundation so that each base is represented by a different suit?

Finally, a Game consists of all of the elements described above.


  data Game = Game 
      {
        deck :: [Card]
      , foundation :: Foundation
      , tableau :: Tableau
      } deriving (Show)

In order to make progress in a Game, a series of Move's must be made. The possible moves are enumerated below:


  data Move = TurnDeck
            | ToFoundation Slot 
            | DeckTo Slot
            | DeckUp
            | MoveCards Slot Int Slot 
            | MoveCard Slot Slot
            | GameOver
              deriving (Show,Eq)

Once the basic data structures are in place (even if they can surely be improved), it's simple to sketch out the remaining functions.


  -- Make a move
  move :: Game -> Move -> Game

  -- Enumerate all the possible moves from the given game state
  getMoves :: Game -> [Move]

  -- Play a game using the given playing function and return the list of moves
  playGame :: Game -> (Game -> [Move] -> Move) -> [Move]

  -- Replay the list of moves
  replayMoves :: Game -> [Move] -> Game
  replayMoves = foldl move

My goal will be to write a function that picks the best move from the given choices of type Game -> [Move] -> Move and see how many games it solves. If any!

Before I go much further though, I need to spend some time working out how much more of the logic I can force inside the type system. For example, currently Foundation allows bases of the same suit; they should be different. Similarly, Tableau is just a list of slots. I know that if I just use map on this list to transform it, the list stays the same size, but it's not enforced by the type system (for example, I could use concatMap and change the size). Can I define that shown cards must always be lists of cards in alternating colours and successive values?

There's some more code on my git hub repository.

Tuesday, 1 December 2009

A Haskell Mini-Pattern

Rightly or wrongly, I found myself writing code like this:


  foo :: [Bar] -> [Baz]
  foo xs = if (any f xs) then [] else [Baz 4]

This can be replaced with a neater list comprehension


  [Baz 4 | any f xs]

If the right hand side of the list comprehension evaluation to false, then you get the empty list, else you get the single item in the list.

Neat, though I've no idea whether it's idiomatic Haskell!

Playing Cards with Haskell

As my first attempt at writing a reasonably complex lump of code in Haskell, I thought I'd try and model a card game called Klondike Solitaire. Apparently Solitaire is complex enough that:

... the odds of winning a standard game of non-Thoughtful Klondike are currently unknown. It has been said that the inability for theoreticians to calculate these odds is "one of the embarrassments of applied mathematics

The first step is to try and model the playing cards themselves. Cards have both a Suit and a Value.


  data Suit = Clubs
            | Diamonds
            | Hearts
            | Spades
              deriving (Eq,Enum,Show,Bounded)

  data Value = Ace
             | Two
             | Three
             | Four
             | Five
             | Six
             | Seven
             | Eight
             | Nine
             | Ten
             | Jack
             | Queen
             | King
             deriving (Eq,Enum,Show,Bounded)

  data Card = Card Value Suit
              deriving (Eq,Show)

Suit and Value both derive Eq (for equality), Show (for printing), Enum (for successor and predecessor functions) and Bounded (because they are within bounds). We can now define a complete deck of cards with a simple list comprehension.


  allCards :: [Card]
  allCards = [Card x y | x <- [Ace .. King],  y <- [Clubs .. Spades]]

Next there are a bunch of helper functions which seem like they'll be generally useful.


  color :: Card -> Color
  color (Card _ s) | s == Clubs || s == Spades = Black
                   | otherwise = Red

  value :: Card -> Value
  value (Card x _) = x

  -- |Are the two cards alternate colours?
  alternateColors :: Card -> Card -> Bool
  alternateColors a b = color a /= color b

  -- |Does the second card follow the first?
  follows :: Card -> Card -> Bool
  follows (Card King _) _ = False
  follows (Card v1 _) (Card v2 _) = succ v1 == v2

succ gives the successor of a given function. I don't think I can define Card as an instance of Enum because, depending on the context, the successor could be one of multiple values (e.g. the successor to Card Ace Spades might be a Card Two Spades or a Card Two Hearts depending on the context.

The -- | syntax is an indicator for the Haddock documentation tool so that you can associate the comment with the declaration.

Next time, I'll look at defining the types for Klondike Solitaire!

Tuesday, 10 November 2009

Understanding "Do" Blocks

Just some notes as I thumb through Chapter 7 of Real World Haskell.

Haskell "do" notation is a way of writing code to sequence some actions. However, it's simply just syntactic sugar for two functions, >>= and >> that have the following types:


  *Main> :t (>>=)
  (>>=) :: (Monad m) => m a -> (a -> m b) -> m b

  *Main> :t (>>)
  (>>) :: (Monad m) => m a -> m b -> m b

>> performs the first action, discarding the result, and then the second. The result of the function is the value of the second action.


  *Main System.Environment> getEnv "HOME"                 
  "/home/jfoster"
  
  *Main System.Environment> getEnv "USER"     
  "jfoster"

  *Main System.Environment> getEnv "HOME" >> getEnv "USER"
  "jfoster"

The result of chaining together the getEnv (String :: IO(String)) calls is to ignore the return value of the first action.

>>= takes the output of the first action, and feeds it into the second.


  *Main System.Directory> canonicalizePath "foo"                                        
  "/home/haskell/foo"

  *Main System.Directory> makeRelativeToCurrentDirectory "/home/haskell/foo"
  "foo"

  *Main System.Directory> canonicalizePath "foo" >>= makeRelativeToCurrentDirectory
  "foo"

As well as sequencing actions using >> and >>= there are a whole family of functions to do with evaluating actions.

mapM :: (Monad m) => (a -> m b) -> [a] -> m [b] takes a list of actions and performs the action specified for each one (when evaluated, remember it's lazy!), collecting together the results.


  *Main System.Environment> mapM getEnv ["HOME","USER"]
  ["/home/jfoster","jfoster"]

mapM_ :: (Monad m) => (a -> m b) -> [a] -> m () does exactly the same as mapM EXCEPT it discards the results. Useful for ensuring the side effects happen (for example, IO). Function names that end with a "M" are usually related to monads, and function names that end with an "_" typically discard their results.

Back to do blocks. Typically the final construct in a do block is a return statement. As I said previously return is like the opposite of <- - it takes a pure value and constructs an action out of it. Now that I understand a bit more Haskell the type definition of return makes more sense.


  return :: (Monad m) => a -> m a

There's some more information about do notation on the Haskell wiki.

Saturday, 7 November 2009

Haskell, YQL and a bit of JSON

One of the talks I enjoyed at StackOverflow Devdays was about YQL.

The Yahoo! Query Language is an expressive SQL-like language that lets you query, filter, and join data across Web services. With YQL, apps run faster with fewer lines of code and a smaller network footprint.

It provides a standard interface to a whole host of web services and, more importantly, it's extensible to support other data sources. The Data Tables web site contains more information about how to expose your data via YQL.

I'm still trying to learn Haskell so I thought I'd try to knock together a quick program to see how you'd make a basic query and process the results using Haskell. To make a web service call, I'll use Haskell Http and process the results using Text.JSON. Both of these are available to install using cabal.

To make a YQL query we need to point at the right URL, select the output format and URL encode the query text. I've fixed the output format as JSON as it's more light weight.


  yqlurl :: String
  yqlurl = "http://query.yahooapis.com/v1/public/yql?q="

  json :: String
  json = "&format=json"

  yqlRequest :: String -> IO (Result JSValue)
  yqlRequest query = do
    rsp <- simpleHTTP (getRequest (yqlurl ++ urlEncode query ++ json))
    body <- (getResponseBody rsp)
    return (decodeStrict body :: Result JSValue)

So now we have something we can play with in the interpreter and make queries with. The really nice property of YQL is being able to do joins with sub-selects. This helps avoids doing round-trips to the server and means less boilerplate code to join items together. For example, let's say we want to find the URLs of Haskell images from Flickr.


   *Main> yqlRequest "desc flickr.photos.search"
   -- Returns a description of how to search photos in flickr

   *Main> yqlRequest "select * from flickr.photos.search where text=\"haskell\""
   -- Find images where the text is Haskell

   *Main> yqlRequest "select urls from flickr.photos.info where 
             photo_id in (select id from flickr.photos.search where text=\"haskell\")"
   -- Find the URLs for images

That gives us raw JSON back, the next step is to process this into something relevant. The following YQL selects upcoming events in Cambridge.


  select description from upcoming.events where woeid in
    (select woeid from geo.places where text="Cambridge, UK")

woeid provides a way of getting the latitude and longitude of any place on earth. This is consistently used in the APIs so you can feed it in as a sub select. Very neat!

The goal of this strange task is simply to get a list of strings of the descriptions of events coming up in Cambridge. Firstly I defined a couple of helper functions. These feel really clumsy, so I'm 99% sure that there is a better way to do it, but I can't see it.


  getField :: [String] -> JSValue -> JSValue
  getField (x:xs) (JSObject j) = getField xs (fromJust (get_field j x))
  getField [] j = j
    
  toString :: JSValue -> String
  toString (JSString x) = fromJSString x

So now all we need to do is hook in a couple of functions to drill down into the JSON, yank the description out, and bundle it into a list.


  eventsInCambridge :: String
  eventsInCambridge = "Select description from upcoming.events where 
        woeid in (select woeid from geo.places where text=\"Cambridge, UK\")"

  getEventList = do
    response <- yqlRequest eventsInCambridge
    return (case response of
             Ok value -> (processEvents (getField ["query","results","event"] value))
             Error msg -> undefined)
  
  processEvents :: JSValue -> [String]
  processEvents (JSArray events) = map (toString .(getField ["description"])) events

And the output from this is a giant list of descriptions of the upcoming events in Cambridge. You can see the example data by clicking here.

Friday, 30 October 2009

Cambridge Stack Overflow Dev Days

I've just got back from the Stack Overflow Dev Day. Definitely worth attending (and well worth the price).

The opening key note from Joel focused on the balance of features vs. choice. 37 Signals popularized the idea of "simple software" (illustrated by simple bug tracking software). However, FogBugz figures don't back this up the number of sales is proportional to the number of features.

I got the feeling that this talk was targeted at companies whose business it is to sell directly to the customers. In larger software corporations you aren't targeting real end-users, you are trying to convince a budget manager that your software is better than the competitions. I guess this leads to the explosion in features (e.g. Word vs. Word Perfect, Lotus Notes vs. Excel) and ends up with over complicated solutions that do everything adequately and nothing outstanding.

I thoroughly enjoyed the conference and will definitely be going next time. Also found out about another related event.
Next up was Christian Heilmann who is a "developer evangelist" for Yahoo!. The talk show cased some of Yahoo's developer tech. I was very impressed by YQL which provides a consistent SQL-like API for accessing a whole suite of web api's (not just Yahoo). It was very impressive to see mashups created in a few seconds and I'll definitely need to poke around the APIs. The term JSONP came up together with Caja (a safe subset of JavaScript) - both of which I'd never heard of but sound like things I should know.

Frank Stajano gave a good talk of the human aspects of security, focusing on social engineering and illustrating it with examples from The Real Hustle. It seemed slightly at odds with the rest of the talks, but it was interesting and well presented! And I got some new terminology, sock puppets, astro-turfing and Sybil's.

After lunch we had the fastest introduction possible to ASP.NET MVC. Steve Sanderson went through a few slides of the architecture (very similar to Ruby On Rails!) and then proceeded to build a simple file management application at breakneck speed. Very enjoyable presentation again. And ASP.NET MVC even has Linux support with MonoDevelop.

Remy Sharp gave a similarly fast paced presentation of JQuery. It was impressive to see JQuery in action and the simple steps to build a plugin were good to see. The presentation ended with a heroic attempt to "live-code" a tag cloud application into Twitter!

The penultimate talk was about Python by Michael Foord. I wasn't too keen on this presentation. There were statements like "Dynamic Programming makes it easy to test" and "Dynamic Programming means I type less" that I didn't agree with and there was no follow up to back it up. The presentation gave a quick overview of Python using Norvig's spelling corrector as an example.

Finally, Jeff Atwood gave a quick talk about Stack Overflow which was short on detail but definitely watch-able. Whatever has been written about this guy, he (and a couple of others) are responsible for a huge web site that is apparently the 895th biggest site on the whole web! It's been hugely successful so it was interesting to hear where the ideas came from and so on.

In conclusion, definitely worth going too and roll on the next one.

Tuesday, 20 October 2009

Pretty Printing Basic JSON

JSON is an incredibly simple format. In fact, it's so simple that the entire spec can be placed on a business card.

It can be described in Haskell (all based on Real World Haskell - highly recommended!) very simply with an algebraic data type


data JValue = JString String
            | JNumber Double
            | JBool Bool
            | JNull
            | JObject [(String, JValue)]
            | JArray [JValue]
              deriving (Eq, Ord, Show)

This gives us all we need to be able to create JSON objects within ghci.


  *Main> JBool True
  JBool True

  *Main> JObject [("foo", JString "bar"), ("baz", JString "boo")] 
  JObject [("foo",JString "bar"),("baz",JString "boo")]

  *Main> JArray [JNumber 1,JNumber 2,JNumber 3]                                  
  JArray [JNumber 1.0,JNumber 2.0,JNumber 3.0]

This isn't hugely useful at the moment - we could write functions that build up JObjects and the results, but at the moment we have no way to turn them into a readable lump of JSON.

Pretty Printing is the name given to an process that applies stylistic formatting to content.

This is actually a pretty complicated problem. One solution we could use is just to write functions for each type of JValue and go from there. The problem with that is that it wouldn't allow us to keep track of the contextual information. For example, if you are printing out Java code then you might want you { and } characters to align or your columns capped at a maximum length and so on.

Pretty Printing has a long history, from GRIND (used to print Lisp) to Eclipse.

For Haskell, there have been two papers which have focused on the neat design of a pretty printing library. The Design of a Pretty-printing Library and A Prettier Printer both take the same approach of creating a document structure representing the output. Real World Haskell presents the JSON pretty printer using the same approach.

A Doc type defines the rendering of data that we'll do. It is simply defined as:


  data Doc = Empty
           | Char Char
           | Text String
           | Line
           | Concat Doc Doc
           | Union Doc Doc
             deriving (Show,Eq)

Construction functions are provided to go from primitive types to document objects.


  empty :: Doc
  empty = Empty

  char :: Char -> Doc
  char c = Char c

  text :: String -> Doc
  text str = Text str

  double :: Double -> Doc
  double num = text (show num)

  line :: Doc
  line = Line

There are various operations defined for documents. The infix <> operator is used to denote the concatenation of two documents.


  (<>) :: Doc -> Doc -> Doc
  Empty <> y = y
  x <> Empty = x
  x <> y = x `Concat` y

  hcat :: [Doc] -> Doc
  hcat = fold (<>)

  fold :: (Doc -> Doc -> Doc) -> [Doc] -> Doc
  fold f = foldr f empty

  empty :: Doc
  empty = Empty

  () :: Doc -> Doc -> Doc
  x  y = x <> softline <> y

  fsep :: [Doc] -> Doc
  fsep = fold ()

  enclose :: Char -> Char -> Doc -> Doc
  enclose left right x = char left <> x <> char right

  softline :: Doc
  softline = group line

  group :: Doc -> Doc
  group x = flatten x `Union` x

  flatten :: Doc -> Doc
  flatten (x `Concat` y) = flatten x `Concat` flatten y
  flatten Line           = Char ' '
  flatten (x `Union` _)  = flatten x
  flatten other          = other

softline probably takes some explaining. It maintains two alternative paths (using Union). The flatten function replaces new lines with a space, joining two separate lines into one. The other side of the union leaves the document untouched.

All this seems rather abstract; how does it fit into rendering JSON? Firstly we have to define some functions that allow us to create document objects from the JValue instances we declared earlier:


  renderJValue :: JValue -> Doc
  renderJValue (JString s) = string (show s)
  renderJValue (JNumber n) = string (show n)
  renderJValue (JBool True) = string "true"
  renderJValue (JBool False) = string "false"
  renderJValue JNull = string "null"
  renderJValue (JArray ary) = series '[' ']' renderJValue ary
  renderJValue (JObject obj) = series '{' '}' field obj
      where field (name,val) = string name
                            <> text ": "
                            <> renderJValue val

Rending compound objects (JArray and JObject) makes use of a couple of helper functions which break down the object into individual elements:


  series :: Char -> Char -> (a -> Doc) -> [a] -> Doc
  series open close item = enclose open close 
                         . fsep . punctuate (char ',') . map item

  punctuate :: Doc -> [Doc] -> [Doc]
  punctuate p [] = []
  punctuate p [d] = [d]
  punctuate p (d:ds) = (d <> p) : punctuate p ds

So now we have a way of turning JValue into Doc objects. As you can see, this is anything but pretty!


  *Main> renderJValue (JBool True)
  Concat (Concat (Char '"') (Concat (Char 't') (Concat (Char 'r') 
    (Concat (Char 'u') (Char 'e'))))) (Char '"')

  *Main> renderJValue (JArray [JNumber 1,JNumber 2,JNumber 3])  
  Concat (Concat (Char '[') (Concat (Concat (Concat (Concat (Concat (Char '"') 
    (Concat (Char '1') (Concat (Char '.') (Char '0')))) (Char '"')) (Char ',')) 
    (Union (Char ' ') Line)) (Concat (Concat (Concat (Concat (Concat (Char '"') 
    (Concat (Char '2') (Concat (Char '.') (Char '0')))) (Char '"')) (Char ',')) 
    (Union (Char ' ') Line)) (Concat (Concat (Concat (Char '"') (Concat (Char '3') 
    (Concat (Char '.') (Char '0')))) (Char '"')) (Union (Char ' ') Line))))) (Char ']')

The main point is that now we have the contextual information and can write a function to render the document. The simplest possible way of printing out the document is just to pick one side of each union operator and print.


  compact :: Doc -> String
  compact x = transform [x]
      where transform [] = ""
            transform (d:ds) = 
                case d of
                  Empty -> transform ds
                  Char c -> c : transform ds
                  Text s -> s ++ transform ds
                  Line -> '\n' : transform ds
                  a `Concat` b -> transform (a:b:ds)
                  _ `Union` b -> transform (b:ds)

This gives back a very basic rendering:


  *Main> putStrLn (compact (renderJValue (JArray [JNumber 1,JNumber 2,JNumber 3])
  ["1.0",
  "2.0",
  "3.0"
  ]

But we can easily extend this to bound printing within a certain range:


  pretty :: Int -> Doc -> String
  pretty width x = best 0 [x]
      where best col (d:ds) = 
                case d of
                  Empty -> best col ds
                  Char c -> c : best (col + 1) ds
                  Text s -> s ++ best (col + length s) ds
                  Line -> '\n' : best 0 ds
                  a `Concat` b -> best col (a:b:ds)
                  a `Union` b -> nicest col (best col (a:ds))
                                            (best col (b:ds))
            best _ _ = ""
            nicest col a b | (width - least) `fits` a = a
                           | otherwise = b
                           where least = min width col

  fits :: Int -> String -> Bool
  w `fits` _ | w < 0 = False
  w `fits` "" = True
  w `fits` ('\n':_) = True
  w `fits` (c:cs) = (w - 1) `fits` cs

Now we can render to a fixed width, so:


  *Main> putStrLn (pretty 30 (renderJValue (JArray [JNumber 1,JNumber 2,JNumber 3])))                      
  ["1.0", "2.0", "3.0" ]

  *Main> putStrLn (pretty 15 (renderJValue (JArray [JNumber 1,JNumber 2,JNumber 3])))                      
  ["1.0", "2.0", 
  "3.0" ]

RWH goes into a lot more detail that the above basic example (for example handling escaping properly and so on), it's definitely worth buying.

Monday, 12 October 2009

Declaring Modules in Haskell

A Haskell program is a collection of modules. Each module consists of a declaration and a number of exports. Modules can relate to each other by import statements.

The most basic declaration is


  module MyModule where
  
  foo x = 1 + x
  bar x = foo (foo x)

If you omit an export list then everything in the module is available to the outside world. Typically you want to limit the exports to provide encapsulation and this is accomplished by explicitly naming the modules you want to expose. In the example below only bar is exported, foo is private to this module.


  module MyModule (bar) where

  foo x = 1 + x
  bar x = foo (foo x)

Note that a module name must correspond to the file name of the haskell source, so the above must be in a file called MyModule.hs.

To relate modules to each other the import directive is used.

Thursday, 1 October 2009

Dealing with Partial Functions

Functions can be either total (e.g. fully defined for all possible input values) or partial. Handling partial functions can be painful, for example if you pass an empty list to head then it'll terminate the program with an exception.

Partial functions are usually handled by either pattern matching being used to exclude cases where the function isn't defined e.g.


  foo [] = 33      -- not safe to call head
  foo xs = head xs -- safe to call head

Or using Maybe to indicate (via the type system) that the function might not return a value. The first exercise of chapter 4 of Real World Haskell asks you to rewrite some of the partial list functions so that they never fail.


  safeHead :: [a] -> Maybe a
  safeHead [] = Nothing
  safeHead (x:_) = Just x

  safeTail :: [a] -> Maybe [a]
  safeTail [] = Nothing
  safeTail (_:xs) = Just xs

  safeLast :: [a] -> Maybe a
  safeLast [] = Nothing
  safeLast (y:[]) = Just y
  safeLast (_:xs) = safeLast xs

  safeInit :: [a] -> Maybe [a]
  safeInit [] = Nothing
  safeInit (x:[]) = Just []
  safeInit (x:xs) = Just (x : fromJust(safeInit xs))

In order to be thorough, I thought I'd explore HUnit a bit more and write a few tests. The funny "~:" and "~=?" are just syntactic sugar for writing assertions. One area that confused me was the need to specify the type explicitly on the functions for undefined values. I'm not entirely sure why this has to be done...


  testSafeHead = test [ "Safe head non-empty" ~: Just 1 ~=? safeHead [1]
                      , "Safe head null" ~: Nothing ~=? (safeHead [] :: Maybe Int) ]
  
  testSafeTail = test [ "Safe tail empty" ~: Just [] ~=? safeTail [1]
                      , "Safe tail value" ~: Just [1] ~=? safeTail [2,1]
                      , "Safe tail nothing" ~: Nothing ~=? (safeTail [] :: Maybe [Int]) ]

  testSafeLast = test [ "Safe last empty" ~: Nothing ~=? (safeLast [] :: Maybe Int)
                      , "Safe last 1 element" ~: Just 1 ~=? safeLast [1]
                      , "Safe last n element" ~: Just 1 ~=? safeLast [3,2,1] ]

  testSafeInit = test [ "Safe Init empty" ~: Nothing ~=? (safeInit [] :: Maybe [Int])
                      , "Safe Init singleton" ~: Just [] ~=? safeInit [1]
                      , "Safe Init list" ~: Just [1,2] ~=? safeInit [1,2,3] ]

  tests = TestList [TestLabel "safeHead" testSafeHead
                   ,TestLabel "safeTail" testSafeTail
                   ,TestLabel "safeLast" testSafeLast
                   ,TestLabel "safeInit" testSafeInit]

You can run these tests within ghci with


  *Main> runTestTT tests
  Cases: 11  Tried: 11  Errors: 0  Failures: 0
  Counts {cases = 11, tried = 11, errors = 0, failures = 0}

I think these tests would have been better written with QuickCheck where I could have compared them with the non-safe versions. But this'll do for now!

Tuesday, 29 September 2009

Graham Scan in Haskell

Most of the exercises at the end of Chapter 3 of Real World Haskell are fairly simple. Find the height of a list, verify a list is a palindrome and so on. The last few however are pretty tough..

The convex hull is the smallest set of points that contain all the other points in the set. The image below from Wikipedia is a much better visual description:

Image from Wikipeda

One approach for finding the convex hull is the Graham Scan algorithm and this is the challenge presented in the book. The algorithm can be broken down into the following steps:

Find the minimum point (with the lowest Y co-ordinate). In the event of a tie, the lowest x co-ordinate determines the minimum. Call this point P.
Sort the remaining points relative to the angle formed by point and P relative to the x-axis.
Consider each point in the sequence. For each point, determine whether it is a left turn or a right turn relative to the previous two points. If it is a "right turn" the second to last point is not in the convex hull. Continue until the list is finished.

To solve this, we'll declare some data structures which should be self explanatory.


  data Direction = Straight
                 | LeftTurn 
                 | RightTurn
                   deriving (Show)

  data Point = Point Double Double 
               deriving (Show,Eq)

Next we define a function that calculates the turn made by three points and returns a Direction. We don't need to calculate the actual angle as the cotangent is proportional to the angle.


  cross :: Point -> Point -> Point -> Double
  cross (Point x1 y1) (Point x2 y2) (Point x3 y3) =  ((x2-x1)*(y3-y1)-(x3-x1)*(y2-y1))

  dist :: Point -> Point -> Double
  dist (Point x1 y1) (Point x2 y2) = sqrt((x1-x2)^2 + (y1-y2)^2)

  turn :: Point -> Point -> Point -> Direction
  turn a b c = makeDirection (cross a b c) where
      makeDirection x | x == 0 = Straight
                      | x < 0 = LeftTurn
                      | x > 0 = RightTurn

In order to find the minimum Point and sort the list we need to define two comparator functions for use with sortBy and minimumBy (both members of Data.List).


  compareYPoint :: Point -> Point -> Ordering
  compareYPoint (Point x1 y1) (Point x2 y2)
      | y1 == y2 = compare x1 x2
      | y1 <= y2  =  LT
      | otherwise =  GT

  compareCross :: Point -> Point -> Point -> Ordering
  compareCross pvt a b = if (angle == EQ) then distance else angle where
      angle = compare (cross pvt a b) 0                 
      distance = compare (dist pvt a) (dist pvt b)

Even though compareCross takes 3 arguments, we can still use this as type Point -> Point -> Ordering by using currying.


  lowestY :: [Point] -> Point
  lowestY = minimumBy compareYPoint

  grahamScan :: [Point] -> [Point]
  grahamScan points = undefined where
      p = nub points
      pvt = lowestY p
      sortedPoints = pvt : (sortBy (compareCross pvt) (delete pvt p))

nub is a function that removes duplicate elements from a list. This gives the skeleton of the program. Now we have to do the complicated bit. From Wikipedia:

The algorithm proceeds by considering each of the points in the sorted array in sequence. For each point, it is determined whether moving from the two previously considered points to this point is a "left turn" or a "right turn". If it is a "right turn", this means that the second-to-last point is not part of the convex hull and should be removed from consideration. This process is continued for as long as the set of the last three points is a "right turn". As soon as a "left turn" is encountered, the algorithm moves on to the next point in the sorted array

Let's take the simplest example possible. For the following points (0,0), (2,0), (2,2), (0,2), (1,1) the convex hull should be everything apart from (1,1).

In order to check how this is going to work, we need to define a quick helper function to make building lists of Point types less verbose.


  pointsFromTupleList :: [(Double,Double)] -> [Point]
  pointsFromTupleList = map (\(x,y) -> Point x y)

  examplePoints :: [Point]
  examplePoints = pointsFromTupleList [(0,0),(2,0),(2,2),(0,2),(1,1)]

Now we can explore the functions that we're going to use and verify they work



  -- The minimum point appears correct
  *Main> lowestY examplePoints                                                   
  Point 0.0 0.0

  -- The points are sorted as expected
  *Main> let sortedPoints = (Point 0 0):(sortBy (compareCross (Point 0 0)) (delete (Point 0 0) examplePoints))

  *Main>sortedPoints
  [Point 0.0 0.0,Point 0.0 2.0,Point 1.0 1.0,Point 2.0 2.0,Point 2.0 0.0]

The next task is to calculate the turns made.


  *Main> getTurns sortedPoints
  [LeftTurn,RightTurn,LeftTurn]

That's not much use because I don't know the points which constitute the turns. Let's tuple up the results:


  getTurns :: [Point] -> [((Point,Point,Point),Direction)]
  getTurns (x:y:z:ps) = ((x,y,z),turn x y z) : getTurns (y:z:ps)
  getTurns _ = []

  *Main> getTurns sortedPoints
  [((Point 0.0 0.0,Point 0.0 2.0,Point 1.0 1.0),LeftTurn),
   ((Point 0.0 2.0,Point 1.0 1.0,Point 2.0 2.0),RightTurn),
   ((Point 1.0 1.0,Point 2.0 2.0,Point 2.0 0.0),LeftTurn)]

The algorithm states that I should ignore the second to last parameter if it is a right turn, and keep it otherwise. Clearly we're close here because the second to last parameter for a right turn is the only point not in the convex hull. One problem with the sortedPoints is that we're not returning to where we started. Let's change the definition again so that we can get back to where we started. We add two nodes at the end so that the extra padding means we consider all triplets


  sortedPoints = pvt : (sortBy (compareCross pvt) (delete pvt p)) ++ [pvt,pvt]

  *Main> getTurns sortedPoints
  [((Point 0.0 0.0,Point 0.0 2.0,Point 1.0 1.0),LeftTurn),
   ((Point 0.0 2.0,Point 1.0 1.0,Point 2.0 2.0),RightTurn),
   ((Point 1.0 1.0,Point 2.0 2.0,Point 2.0 0.0),LeftTurn),
   ((Point 2.0 2.0,Point 2.0 0.0,Point 0.0 0.0),LeftTurn),
   ((Point 2.0 0.0,Point 0.0 0.0,Point 0.0 0.0),Straight)]

So now all I need to do to solve the problem is keep the middle point in all cases where we've got a LeftTurn or Straight. I can simplify this further by just only getting the middle point in the tuple.


  getTurns :: [Point] -> [Point,Direction)]
  getTurns (x:y:z:ps) = (y,turn x y z) : getTurns (y:z:ps)
  getTurns _ = []

  grahamScan :: [Point] -> [Point]
  grahamScan points = map fst (filter (\(x,d) -> d /= RightTurn) (getTurns sortedPoints)) where
      p = nub points
      pvt = lowestY p
      sortedPoints = pvt : (sortBy (compareCross pvt) (delete pvt p)) ++ [pvt,pvt]

  -- Hurrah, it finally works (for this one example).
  *Main> grahamScan examplePoints 
  [Point 0.0 2.0,Point 2.0 2.0,Point 2.0 0.0,Point 0.0 0.0]

One example works, that'll do me as I've learnt enough from this exercise to make it worthwhile. I strongly suspect the way I'm extending the point list by appending the pivot twice at the end is just a hack rather than the right way. There's another definition here which looks cleaner, but that apparently has some issues too.

The complete code looks like this. hlint was run on this and pointed out the "uncurry" function which converts a curried function to a function on pairs. This simplifies the definition of pointsFromTupleList.


  data Direction = Straight
                 | LeftTurn 
                 | RightTurn
                   deriving (Show,Eq)

  data Point = Point Double Double 
               deriving (Show,Eq)

  turn :: Point -> Point -> Point -> Direction
  turn a b c = makeDirection (cross a b c) where
      makeDirection x | x == 0 = Straight
                      | x < 0 = LeftTurn
                      | x > 0 = RightTurn

  cross :: Point -> Point -> Point -> Double
  cross (Point x1 y1) (Point x2 y2) (Point x3 y3) =  (x2-x1)*(y3-y1)-(x3-x1)*(y2-y1)

  dist :: Point -> Point -> Double
  dist (Point x1 y1) (Point x2 y2) = sqrt((x1-x2)^2 + (y1-y2)^2)

  compareCross :: Point -> Point -> Point -> Ordering
  compareCross pvt a b = if angle == EQ then distance else angle where
      angle = compare (cross pvt a b) 0                 
      distance = compare (dist pvt a) (dist pvt b)

  getTurns :: [Point] -> [(Point,Direction)]
  getTurns (x:y:z:ps) = (y,turn x y z) : getTurns (y:z:ps)
  getTurns _ = []

  grahamScan :: [Point] -> [Point]
  grahamScan points = map fst (filter (\(x,d) -> d /= RightTurn) (getTurns sortedPoints)) where
      p = nub points
      pvt = lowestY p
      sortedPoints = pvt : sortBy (compareCross pvt) (delete pvt p) ++ [pvt,pvt]

  compareYPoint :: Point -> Point -> Ordering
  compareYPoint (Point x1 y1) (Point x2 y2)
      | y1 == y2 = compare x1 x2
      | y1 <= y2  =  LT
      | otherwise =  GT

  compareAngle :: Point -> Point -> Point -> Ordering
  compareAngle (Point px py) p1 p2 = compare (angle p2) (angle p1) where
      angle (Point x1 y1) = y1-py / x1-px

  lowestY :: [Point] -> Point
  lowestY = minimumBy compareYPoint

  pointsFromTupleList :: [(Double,Double)] -> [Point]
  pointsFromTupleList = map (uncurry Point)

  examplePoints :: [Point]
  examplePoints = pointsFromTupleList [(0,0),(2,0),(2,2),(0,2),(1,1)]

Saturday, 26 September 2009

Building Basic Types in Haskell

Types are the building blocks of Haskell programs. Algebraic Data Types (not to be confused with Abstract Data types) are one category of types. An algebraic data type is defined as either an enumeration (a set of distinct types) or a discriminated union (a choice between multiple alternatives). For example:


  -- An enumeration type
  data PrimaryColor = Red 
                    | Green
                    | Blue

  *Main> :type Red 
  Red :: PrimaryColor

  -- A discrimated union type
  data MyIntList = Empty
                 | List Int MyIntList
                 deriving Show

  *Main> :t Empty
  Empty :: MyIntList

  *Main> :t List 3 Empty
  List 3 Empty :: MyIntList

Note that MyIntList is a recursively defined data type because it uses itself in the definition. This type of structural type recursion is just the same as calling a function recursively. As long as we have a base case (in this case Empty then we can easily build the structure. The way an instance of a type was constructed is available at runtime. Pattern matching is a way of breaking down a value into constiuent parts. For example, we could (but should not!) write a function to calculate the length of the MyIntList type as follows:


  myIntListLength :: MyIntList -> Integer
  myIntListLength Empty = 0
  myIntListLength (List _ rest) = 1 + myIntListLength rest

There's a good talk about the use of OCaml for financial trading systems available here. Amongst other things the strengths of the pattern matching idea are discussed. In a large system you can change a type, recompile and find out any redundant, missed or impossible cases. In OCaml this seems to be on by default, with GHC you need to use the "-fwarn-incomplete-patterns" option.

Record syntax is an alternative way of specifying a type such that parameters can be referred to be name.


  data Person = Person {
        firstName :: String
      , lastName :: String
      , title :: String
      } deriving (Show)

  -- Normal way of creating a Person
  *Main> Person "Bill" "Bob" "Mr"
  Person {firstName = "Bill", lastName = "Bob", title = "Mr"}

  -- Record syntax allows you to specify parameters by name
  -- and therefore order does not matter
  *Main> Person { title="Mr", firstName="bill", lastName="bob" }
  Person {firstName = "bill", lastName = "bob", title = "Mr"}

The MyIntList type is very poor. It's only defined for lists of type integer. We can improve this by making it a parametrized type. In the example below we defined MyList for any type a.


  data MyList a = Empty
                | MyList a (MyList a)
                deriving (Show)

  -- MyList of numbers
  *Main> :t MyList 3 (MyList 4 Empty) 
  MyList 3 (MyList 4 Empty) :: (Num t) => MyList t

  -- MyList of characters
  *Main> :t MyList 'a' (MyList 'b' Empty)
  MyList 'a' (MyList 'b' Empty) :: MyList Char

You can have multiple parametrized types. For example, if we wanted to create a list that alternated types, we could:


  data MyOddList a b = OddEmpty
                     | MyOddList a (MyOddList b a)
                     deriving (Show)

  *Main> :t MyOddList 3 (MyOddList 'c' (MyOddList 4 OddEmpty))  
  MyOddList 3 (MyOddList 'c' (MyOddList 4 OddEmpty)) :: (Num t) => MyOddList t Char

Monday, 21 September 2009

Real World Haskell (2)

Chapter 2 of Real World Haskell gives a quick overview of types and functions. I think I've already picked up most of the bits in here, so this is just some notes on the things I found new / interesting.

One of Haskell's distinguishing features is its strong type system. Real World Haskell says the three interesting aspects are:

Strong types - The type system guarantees that the program does not contain certain errors (treating an integer as a float). In comparison, something like C allows you to reinterpret the bytes as you see fit (weak typing)
Static Types - At the point the code is built, the compiler knows the type of every expression and value. In comparison dynamically typed languages only know the type of an expression at runtime.
Type Inference - The compiler can automatically infer the types of expressions. Type definitions are optional and Haskell can deduce the required types (however, it seems to be good practise to provide the type signature as a documentation aid).

Lists and tuples are the primary way of grouping elements. Items in a list must all be the same type, items in a tuple can be different types. One interesting point made was that each tuple is a unique type and therefore writing a function to "get the second element from any tuple" is almost impossible. Looking at the types of fst :: (a,b) -> a and snd :: (a,b) -> b, it's easy to see why. In order to get the second element from any tuple, we'd have to define it for an infinite range of tuples (a,b,c), (a,b,c,d) and so on.

When Haskell evaluates an expression it uses normal-order evaluation. Normal-order evaluation simply evaluates expressions as they are needed. This is also known as a lazy evaluation strategy because expressions that don't contribute to the result aren't evaluated. This avoids the special case treatment that short-circuiting operators get in other languages.

Sunday, 20 September 2009

Real World Haskell

I finally got around to ordering Real World Haskell. Currently the only book on Haskell I have is The Haskell School of Expression which is enjoyable, but I've never got around to installing the relevant packages and passive reading only gets you so far!

I thought I'd try to keep notes as I go through the book as that seems to be a good way to prompt myself to actually use books (working through PAIP was very rewarding, so hopefully RWH will be a similar experience!) rather than just adorn my study with expensive door stops. This isn't meant to make particular sense, rather just a brain dump as I read through it.

So on with the book. Chapter 1 covers setting up the environment. There's a few interesting ghci commands I had not seen like :set prompt and :module + M. :set +t will always print the type of an expression. The "it" that is displayed is the name of a special variable which contains the value of the last expression evaluated. :info displays the precedence of operators. For example / has higher precedence than +.

Prelude> :info (+)
class (Eq a, Show a) => Num a where
(+) :: a -> a -> a
...
-- Defined in GHC.Num
infixl 6 +

Prelude> :info (/)
class (Num a) => Fractional a where
(/) :: a -> a -> a
...
-- Defined in GHC.Real
infixl 7 /

An infix operator can be converted to a prefix operator by enclosing the operator in parentheses. Similarly, back-ticks can convert a prefix operator into an infix operator.

Enumerations are a way of producing lists of numbers. The Haskell Cheat Sheet gives a great summary of the possibilities.

[1..3] -> [1,2,3]
[1..] -> [1,2,3,4,...] (infinite list)
[10..1] -> [] (ranges only go forward)
[0, -1 ..] -> [0, -1, -2, ...] (infinite list)
[1,3..10] -> [1,3,5,7,9] (list from 1 to 10 with diff of 2)

-- Also applies to any Enum class
['a' .. 'e'] -> "abcde"
['a', 'd' .. 'z'] -> "adgjmpsvy"

Don't use floating points for enumeration because the behaviour can be "quirky" (rounding errors and so on).

The empty string is a synonym for [] for the type Char,so 'a' : "" -> "a".

Rational numbers are defined in the Data.Ratio module, and constructed with % (e.g. 22 % 7 is nearly Pi).

The exercises showed me a function I'd not seen before. interact is a function which takes a function which takes stdin as input, and returns a string which is written to stdout (e.g. interact :: (String -> String) -> IO()). Also, once I grokked that "wc" was referring to the Unix command it all became a little clearer!

Wednesday, 16 September 2009

To Type Class or not to Type Class

Last time I wrote a very simple log parser, but it was very specific. The next task was to work out how to generalize it.

My first thoughts were to use type classes to make the code more generic. Type classes represent a constraint on a type variable in a parametrically polymorphic type. Parametric polymorphism simply means a generic function can be written so it handles values identically independent of type (the list functions are a good example).

So I started, and ended up typing in gibberish like this:


  class Report r where
      printReport :: r -> String

  class (Eq l, Show l) => LogProcessor l where
      processLine :: String -> Maybe l
      combineUnit :: (Report r) => l -> r -> r

I then read OOP vs type classes and Learning Haskell Notes which made me think that type classes is the wrong approach.

All I'm actually trying to do is change the behaviour of a couple of functions (parsing a line and combining the results). I'm not really interested in any types and there should be as few constraints as possible in terms of what you can implement. The simplest way (and probably the right way) is to just pass these functions in


  processFile :: FilePath -> (String -> Maybe t) -> IO([t])
  processFile path f = do
    a <- readFile path
    return (Maybe.mapMaybe f (lines a))

  reportFile :: FilePath -> (String -> Maybe t) -> ([t] -> String) -> IO()
  reportFile path func comb  = do
    a <- processFile path func
    print (comb a)
    return ()

There aren't really any constraints here. processFile now processes a file path, returning IO([t]). A combining function processes this list and generates a string to give the result. Much simpler to use too.

Parsing Logs with Haskell

I spend way too much of my time digging through gigabytes of log files looking for interesting events. I've looked at this before using Clojure, so I thought I would try to translate the basics into Haskell and then (over the course of the next few weeks) try to develop into something general purpose and halfway useful.

Building a report from a log file (or collection of log files) typically requires:

A representation of the events I'm interested in
A predicate for determining whether the line is an event (String -> Bool)
Converting the line into an event (String -> Event)
Folding the results into a single report ([Event] -> Report)

In this example, I wanted to parse my dpkg logs (/var/log/dpkg.log) to see how many times I've upgraded packages on Ubuntu. Firstly I defined a type to represent an upgrade event:


  type Package = String

  data Upgrade = Upgrade { packageName :: Package
                         , updateTime :: UTCTime } 

  instance Show Upgrade where
      show a = show (updateTime a) ++ 
               ":" ++ show (packageName a)

instance Show Upgrade is similar to deriving Show, but allows you to customize how the object will be converted to a string.

Next step is to get a predicate and to parse the lines. I decided to combine these into one function and return Maybe Upgrade to indicate success / failure. I used the Date.Time modules to parse the data and time. The parsing is terrible but suffices for now as I'm just trying to get an idea of where I need to generalize. Note to self read (and ideally understand!) Monadic Parsing Combinators [PDF] and associated Haskell module.


  getTime :: String -> UTCTime
  getTime = fromJust . parseTime defaultTimeLocale timeFormat

  getPackageName :: String -> String
  getPackageName = takeWhile (not . Char.isSpace)
               
  -- Poor mans parsing.
  parseLine :: String -> Maybe Upgrade
  parseLine s 
            | isInfixOf " upgrade " s = Just 
                                        (Upgrade 
                                         (takeWhile (not . Char.isSpace) (drop 28 s))
                                         (getTime (take 20 s)))
            | otherwise = Nothing

All I need to do now is a combining action to perform with foldl. For this I've defined a report of type Map Day [Package] which represents an association between a day and all the names of the packages updated on that day.


  processFile :: FilePath -> IO([Upgrade])
  processFile s = do 
    a <- readFile s
    return (Maybe.mapMaybe parseLine (lines a))

  type Report = Map Day [Package]

  combine :: [Upgrade] -> Report
  combine = foldl addToReport Map.empty 

  addToReport :: Report -> Upgrade -> Report
  addToReport r p = Map.insert day packages r where
                    day = utctDay (updateTime p)
                    initVal = Map.findWithDefault [] day r
                    packages = packageName p:initVal

  reportFile :: FilePath -> IO()
  reportFile f = do
    a <- processFile f
    print (combine a)
    return ()

Hurrah, so now I get output in the right format and I can see that I really shouldn't have added some of the Firefox 3.5 bleeding edge repositories to my Ubuntu upgrade paths. Upgrading Firefox (or Shiretoko) every few days is a bad thing. D'oh!

The next stage for me to understand is how I can generalize this. What I really want to develop next is a simple pluggable framework. It seems that I need to generalize at least the following bits:

Parsing a line into a type T
Combining [T] to produce a single report

Saturday, 12 September 2009

HLint

I found mention of HLint via this post. Installing it was as simple as the instructions provided. The binary ended up in $HOME/.cabal/bin which confused me for a few minutes.

A quick run over my code showed many improvements that I could make. For example. I don't make as much use of eta reduction as I could and HLint not only tells me this, but suggests a solution.


  ./randomText.hs:15:1: Warning: Eta reduce
  Found

  createTrainingSet s
    = foldl' updateMap Map.empty (wordPairs (words s))

  Why not

  createTrainingSet = foldl' updateMap Map.empty . wordPairs . words

I also use far too many brackets in my code. That's a combination of not knowing the precedence rules and switching over from Clojure! Again, a warning is provided and the minimum brackets required shown.

HLint also finds common patterns in code and suggests rewriting them in terms of fold or map. For example when I was trying to write my own map:


  ./myfunctions.hs:11:1: Error: Use map
  Found

  mymap f [] = []
  mymap f (x : xs) = f x : mymap f xs

  Why not

  mymap f xs = map f xs

And similarly for written my version of length:


  ./myfunctions.hs:3:1: Warning: Use foldl
  Found

  mylength [] count = count
  mylength (x : xs) count = mylength xs (1 + count)

  Why not

  mylength xs count = foldl (\ count x -> 1 + count) count xs

I'll definitely be making use of this before I post any more Haskell code!

Wednesday, 9 September 2009

Generating Text in Haskell

Previously I implemented an example from ANSI Common Lisp that generated a stream of text based on the word frequencies of a sample document. This produces surprisingly realistic looking text.

The data structures that represent the frequency map are typed as below. type defines a type synonym and are purely used to improve readability (e.g. like typedef in C/C++)


  type Followers = Map String Int
  type WordSuccessors = Map String Followers

To generate this data structure, we generate all pairs of words, and then add them to an instance of the WordSuccessors structure until complete.


  createTrainingSet :: String -> WordSuccessors
  createTrainingSet s =  foldl' updateMap Map.empty (wordPairs (words s))

  updateMap :: WordSuccessors -> (String,String) -> WordSuccessors
  updateMap m (x,y) = Map.insert x v m where
                      q = Map.findWithDefault (Map.singleton y 0) x m
                      v = Map.insert y (succ (Map.findWithDefault 0 y q)) q
    
  wordPairs :: [String] -> [(String,String)]
  wordPairs l = zip l (tail l)

One tip I picked up from Real World Haskell was using undefined. You can use this to sketch out type signatures of functions, prior to putting the body in. For example, updateMap was a pain to get right, but with undefined I can fill in bits of the implementation by replacing undefined with a real implementation. Each time I can use the compiler to validate I haven't gone too far wrong.


  updateMap :: WordSuccessors -> (String,String) -> WordSuccessors
  updateMap m (x,y) = undefined

  -- Refine
  updateMap m (x,y) = Map.insert x v m where
                      q = undefined 
                      v = undefined 

  -- Refine 
  updateMap m (x,y) = Map.insert x v m where
                      q = Map.findWithDefault (Map.singleton y 0) x m
                      v = undefined 

  -- Refine
  updateMap m (x,y) = Map.insert x v m where
                      q = Map.findWithDefault (Map.singleton y 0) x m
                      v = Map.insert y (succ (Map.findWithDefault 0 y q)) q

Types help bound your program to make sure it does what you think. Developing in this way draws parallels with test-driven development. Instead of a suite of tests (which I have to write properly!), I can use the type system to verify the program still does what I think it should do! I realize that types don't pick up everything (maybe dependent types will minimize this in the future), but then neither do unit tests!

Next step is to generate random text based on this word successors information. Random functions aren't pure (obviously) and I went a long way down various wrong roads (mostly involving changing the return type of nextWord to IO(String), propagating everything to type IO and then realizing I'm going down the wrong road). On IRC (#Haskell) someone suggested an idiom of passing an infinite list of random numbers to a function - this keeps it pure (the impurity is handled in generating the list) at the cost of an additional argument.

The algorithm is comically naive, but it's just about fast enough. I couldn't find a better way of selecting an element based on a frequency distribution stored in a map though I'm sure there is one!


  nextWord :: [Int] -> WordSuccessors -> String -> ([Int],String)
  nextWord seeds fm start = (r, (poss !! (mod s count))) where 
                            successors = fm Map.! start
                            count = Map.fold (+) 0 successors
                            poss = Map.foldWithKey (\k v acc -> (replicate v k) ++ acc) [] successors
                            s = head seeds
                            r = drop 1 seeds

The function returns the next word given the previous, and also shifts the random number sequence along again.

Finally, we need something to drive this. main returns a type IO(String) because it has to perform two bits of IO (reading the sample file and generating the random number).


  maxWordCount :: Int
  maxWordCount = 1000000

  main :: IO(String)
  main = do
    text <- readFile exampleFile
    gen <- newStdGen
    let training = createTrainingSet text
        seeds = randomRs (0,maxWordCount) gen
    return (unwords (map snd (iterate (\(s,w) -> nextWord s training w) (seeds,"by"))))

So finally, we can generate an infinite sequence of random-ish text. Here's an example, based on "Manual of Surgery". Apparently it was the most popular book downloaded. Weird and slightly disturbing.

The patient from a long bones, and, by the blood threatens to fat. In the lower or near the blood vessel, and a pustular eruption are usually be inserted, and irregularity in Pus from three weeks or to employ considerable size by a soft, rapidly developed over looked. If, however, regain the compression causes suffering from one collection of small saphena system run an ideal manner of the risk of pain; the patient's strength of the best hope of wounds are believed to the toxic symptoms which the suppuration are reduced by ankylosis. In surgical diseases, from repeated on the limb becomes discoloured and is fluid and upwards, and loss of any prospect of age, and is caseation and the existing sinuses with the victims of pes equino-varus. Pressure of calcium salts, and then forms a severe pain, with the ulcer may attain a wound infection leads to enter the end of pus forms, but may be diagnosed before it is then foul and the scalp or destruction of Hands 529 Hectic Fever 62 12. Chart of a synovitis_ with the arsenical preparations of syphilis, must be a stab or bossed, the natural protective contraction which is sometimes of the inflow of cerebral congestion and on the blisters, and irregular, stalactite-like processes of the gum, with the penis, scrotum, excision or by freely open method of the first thirty-six or applies a globular cyst, or spindle-shaped, and sometimes resulting in the myeloma is often met with the shafts. The operation of Contusions._--The usual phenomena associated in primipar, there is seldom be employed with lymph glands, 322 Submental lymph from the dorsum.

Sounds like believable medical gibberish to me!

Friday, 4 September 2009

Exploring Haskell's List Functions

Haskell's primary data type is the list and there are a rich variety of functions for transforming lists. Most of these functions are familiar to me from Clojure sequences but there are some other useful functions that I haven't seen before.

last, as you'd expect, gives you the last item in a list. It has a corresponding function init that gives you everything but the last item. intersperse allows you to insert an element between each element (e.g. init (intersperse 'a' "bnnn") => "banana" . Similarly, intercalate inserts the lists between lists and concatenates the results. For different arrangements of the items, there are two functions. subsequences gives you a list of all subsequences of the argument and permuatations gives all the possible permutations.


  Prelude> subsequences "abc"          
  ["","a","b","ab","c","ac","bc","abc"]

  Prelude> permutations "abc"
  ["abc","bac","cba","bca","cab","acb"]

To reduce lists to a single value there are many versions of the fold function (in Clojure there was just reduce!). foldl is the direct equivalent to reduce which reduces a list to a single binary operator. foldl (+) 0 [1,2,3] is equivalent to (((0 + 1) + 2) + 3). foldl1 is a convenience method without the initial argument, that only applies to non-empty lists foldr and foldlr1 are right-associative so foldr (+) 0 [1,2,3] evaluates to (0 + (1 + (2 + 3))).

The fold family of functions can be extremely powerful - I need to read "A tutorial on the universality and expressiveness of fold" [PDF]that explores this in more detail. Some example functions that operate on lists defined as folds in Haskell include and, or, any, all and concatMap.

On a side note, there are strict versions of the foldl functions, that with a '. Why'd you need these? Haskell is lazy by default which can mean you build up a great big thunk (a pending calculation). This can be bad (for example, increased space requirements). By making a strict version you evaluate the values as they becomes available. This stops the huge think building up and can be more efficient. There's a good discussion of this here. foldr doesn't have a corresponding strict version, and looking at the expansion it's easy to see why - there's nothing to be lazy with as the first value you can evaluate is right at the end of the list!

unfoldr is an interesting function. It can be considered the opposite of foldr as it is used to build up a list from a seed value. The first argument is a function that returns Nothing if it's finished or Just (a,b) otherwise. a is prepended to the list, and b is used as the next element. For example we can generate the Fibonacci sequence:


  fibonacci :: [Integer]
  fibonacci = unfoldr (\[a,b] -> Just(a+b,[b,b+a])) [0,1]

  Prelude> take 10 fibonacci
  [1,2,3,5,8,13,21,34,55,89]

iterate can be written as iterate f == unfoldr (\x -> Just (x, f x)). Another paper to add to the reading list is "The under appreciated unfold".

take and drop are functions for getting prefixes and suffixes of (potentially) infinite lists. splitAt does both at the same time, returning a tuple (take n xs, drop n xs). takeWhile and dropWhile take or drop errors whilst some predicate holds. Putting these together we can write a function groupN which groups elements into sublists of size N.


  groupN :: [a] -> Int -> [[a]]
  groupN [] _ = []
  groupN xs n = a : groupN b n where
                (a,b) = splitAt n xs

  Prelude> groupN [1..7] 2
  [[1,2],[3,4],[5,6],[7]]

The Haskell list library is very complete and there's definitely some new ideas for me to absorb there. In particular, understanding unfoldr and folding in more detail seems to be an important thing to do!

Wednesday, 2 September 2009

Debugging in Haskell

One thing I was finding a little frustrating is that if I was to do any IO in a function, then it's type signature changes. This is good most of the time, but it's a little painful to have to change all your code to print out a number!

Thankfully, there's a number of solutions.

The GHCi debugger provides a way of inspecting code. :b N sets a break point in the loaded module at a specific line. Taking the anagrams example I set :b 30 on anagrams <-anagramList wordfile.


  anagramsOf :: String -> IO ()
  anagramsOf word = do
    anagrams <- anagramList wordfile
    putStrLn (show (Map.lookup (stringToKey word) anagrams))

When the program is run and the breakpoint is hit we get the following:


  Stopped at anagrams.hs:31:2-57
  _result :: IO () = _
  anagrams :: Map String (Set String) = _
  word :: String = _
  [anagrams.hs:31:2-57] *Main> :list
  30    anagrams <- anagramList wordfile
  31    putStrLn (show (Map.lookup (stringToKey word) anagrams))
  32

:list is used to list of the free variables in scope (anagrams and word) and these are available to inspect in the debugger. _result is a binding for the result expression. Once you've hit a breakpoint, you can use use :trace to continue to the next breakpoint, recording the history as you go along. :back and :forward allow you to go up and down the list of evaluated expressions and inspect each one.

The Haskell Wiki has a section devoted to debugging which brought me to the Debug.Trace module. This allows you to print some text and return the evaluation of the next expression e.g. Debug.Trace.trace "1+1=" (1 + 1).

Initially, I couldn't understand how the type of Debug.Trace.trace could be String -> a -> a, but then I found System.IO.Unsafe. It comes with a large health warning that it doesn't enforce ordering on IO and it's type unsafe. Evil, yet useful (at least to write the trace functions).

Tuesday, 1 September 2009

Generating ASCII Art

ASCII art is a way of presenting graphics through standard characters, without the need for images. For example, you could draw a triangle, using just forward and backwards characters.


  /\
 /__\

Given an arbitrary picture, how do we convert this to ASCII art? The technique is dead simple - convert the image to gray scale and replace each pixel in the image with a character representing the brightness value. For example, a * character is darker than a ! character.

The following Haskell program does just that. It uses the PGM package to load an image (so it's already converted to gray scale). All it does it map the pixel to a character using a function and amap and then do some jiggery pokery to turn it into an image.


import Graphics.Pgm

import Text.Parsec.Error
import Data.Array.Base

brightness =  " .`-_':,;^=+/\"|)\\<>)iv%xclrs{*}I?!][1taeo7zjLu" ++ 
              "nT#JCwfy325Fp6mqSghVd4EgXPGZbYkOA&8U$@KHDBWNMR0Q";

loadImage :: String -> IO (UArray (Int,Int) Int)
loadImage path = do
                    r <- pgmsFromFile path
                    case r of
                      Left e -> error "Failed to parse file"
                      Right i -> return (head i)

brightnessToChar :: Int -> Int -> Char
brightnessToChar m b = brightness !! 
                       (round ((fromIntegral b) / (fromIntegral m) * (fromIntegral ((length brightness) - 1))))

imageToAscii :: UArray (Int,Int) Int -> UArray (Int,Int) Char
imageToAscii image = amap (brightnessToChar 255)  image
     
convertImage :: String -> String -> IO ()
convertImage image out = do
                       img <- loadImage image
                       let ((_,_),(h,w)) = bounds img
                       let x = imageToAscii img
                       writeFile out (unlines [ [ x ! (i,j) | i <- [0..w] ] | j <- [0..h] ])
                       return ()

Interesting learning exercises for me were the nested list comprehension to go through the array and the use of Either to represent a choice of return values. This seems to be used when you want to return more data than simply "it failed" (which you'd use Maybe).

Because I can't quite work out how to get a tiny font, it's easier to post a screen shot of some ASCII art (kind of defeats the purpose, I know!). Below is the Ubuntu logo rendered (badly!) as ASCII art.

Sunday, 30 August 2009

Some Haskell Data Structures

So far I've seen the list data structure, but Haskell also supports items like Maps and Sets.

An association list is a list of tuples of keys to values. For example:


  alist :: [(String,Double)]
  alist = [("pi", 3.14159265), ("e", 2.71828183), ("phi", 1.61803398874)]

  getConstant :: String -> Maybe Double
  getConstant name = lookup name alist

lookup is a prelude function that returns the value (if existing) for the supplied key. Association lists are useful for small lists, but the lookup time is O(N) in the size of elements. Enter Map which provides the same key/value abstraction but with efficient lookup. Data.Map provides operations for insertion, deletion and inspection of keys. For example:


  -- Given an association list, make a map
  Map.fromList aList

  -- Insert a new key/value into the empty map
  Map.insert "pi" 3.14159265 $ Map.empty

  -- Is a key in a map?
  Map.member key map

  -- lookup and findWithDefault allow you to find values.

$ is the function application operator - it's right associative so it means less brackets.

Similarly, Data.Set provides a functional implementation of sets.

We can put these together and change the implementation of the anagrams to create a map of search key to a set of results. This is hideously inefficient initially, but once the data structure is built up finding anagrams should be a little quicker.


  anagramList :: String -> IO (Map String (Set String))
  anagramList file = do
    filecontent <- readFile file
    return (foldl (\x y -> Map.insertWith Set.union (stringToKey y) (Set.singleton y) x)
                  Map.empty 
                  (filter validWord $ lines filecontent))

  anagramsOf :: String -> IO ()
  anagramsOf word = do
    anagrams <- anagramList wordfile
    putStrLn (show (Map.lookup (stringToKey word) anagrams))

Saturday, 29 August 2009

IO And Haskell

All functions in Haskell are pure - how do you deal with side effects such as input/output?


  Prelude> :t putStr "hello world"      
  putStr "hello world" :: IO ()

The IO () means that this returns an IO action that has a result type of () (known as unit).

The do syntax can be used to glue actions together. For example:


  saygoodbye = do
    putStrLn "Hi - who are you?"
    name <- getLine
    putStrLn ("Goodbye Mr. " ++ name)

  *Main> saygoodbye
  Hi - who are you?
  Bond
  Goodbye Mr. Bond

  *Main> :t saygoodbye
  saygoodbye :: IO ()

<- performs the getLine application and binds the resulting value to name. Given that getLine has type IO String this gives name a type of String. IO types and normal types can't be mixed so "Die " ++ getLine is an illegal statement (IO String and String don't mix).

return is like the opposite of <- - it takes a pure value and constructs an action out of it. return is nothing like its use in other languages such as Java and C. return doesn't do anything with the execution path, code continues to the next line; return is purely used to construct actions.

So how'd you escape from your IO action? You don't... There's no escape!:

There's one final detail about IO actions that you should be aware of: there is no escape! The only way to get a result from an IO action is to invoke the IO action (through main) and have its result used to affect the outside world through another IO action. There is no way to take an IO action and extract just its results to a simple value (an inverse-return). The only places where an IO action's results appear unwrapped are within a do-block.

Let's try and put this all together to write a quick program that gives all anagrams of a word. The implementation idea is taken from Programming Pearls - load up a dictionary, sort the characters (so "banana" becomes "aaabnn"), then shove it all into an association list. Then given a word, apply the same sort and simply look up the associations.

Unix distros come with a word list file (/usr/share/dict), but it's full of words with punctuation and so on. We need to filter this list to remove invalid words, then build up an association list (list of tuples).


  import Data.Char
  import List

  wordfile = "/usr/share/dict/words"

  stringToKey :: String -> String
  stringToKey = sort.(map toLower)

  validWord :: String -> Bool
  validWord s = (not (null s)) && 
                length s <= 10 && 
                not (any (not.isAlpha) s)

  anagramList :: String -> IO [(String,String)]
  anagramList file = do
    filecontent <- readFile file
    return (map (\x -> ((stringToKey x),x)) (filter validWord (lines filecontent)))

  matchingKeys :: String -> [(String,String)] -> [String]
  matchingKeys k l = map snd (filter ((== k).fst) l)

  anagramsOf :: String -> IO ()
  anagramsOf word = do
    anagrams <- anagramList wordfile
    putStrLn (show (matchingKeys (stringToKey word) anagrams))

stringToKey is a function of one argument which converts a string to a key by making the string lower-case and then sorting the characters. readFile does exactly what is says on the tin (it's lazy too). lines is a built in function which breaks up a string into separate lines. And it seems to work first time too!


  *Main> anagramsOf "least"
  ["Stael","Tesla","least","slate","stale","steal","tales","teals"]

My current understanding (and it's all very foggy!) is that if a function uses IO actions then they will always boil up to the top level. main is the only place where these can be hidden. For example, I could put the loading of the anagram list in one place, use <- in main and then not have to riddle the rest of the program with IO actions. Good design isolates the IO in the smallest segment of the program possible.

Friday, 28 August 2009

Testing Times

My last program didn't have anything in the way of tests, purely because I had no idea to write them in Haskell.

There seem to be two major test frameworks in Haskell. The first one, HUnit, is based on the xUnit family. Create test cases which assert various properties of the functions you're testing, bundle them into a test suite and run them.


  foo :: (Num a) => a -> a -> a -> a
  foo a b c = a * b + c

  test1 = TestCase (assertEqual "* has higher precedence" 26 (foo 2 10 6))

  tests = TestList [TestLabel "Foo test" test1]

  -- From the REPL
  *Main> runTestTT tests      
  Cases: 1  Tried: 1  Errors: 0  Failures: 0
  Counts {cases = 1, tried = 1, errors = 0, failures = 0}

Tests like this always feel a bit smelly - the only way to verify the test is to write the code again twice. Whilst measure twice cut once works for carpentry, it doesn't feel right for programming...

Enter an alternative testing framework, QuickCheck. The idea is simple, instead of testing arbitrary assertions about your code, specify the invariants associated with your function and let QuickCheck see if it can generate a failing test case.

As a simple example, let's say we write a function to add two numbers together:


  addNum :: (Num a) => a -> a -> a
  addNum a b = a + b

  prop_AddNum a b = (addNum a b) >= b && (addNum a b) >= a

We specify the invariant that if we add numbers together the result is bigger than either argument. Running Quick Check shows that this is (obviously wrong!) and gives an example set of arguments that fail the test.


  *Main> quickCheck prop_AddNum
  Falsifiable, after 5 tests:
  -1
  -2

The convention is that the invariants are usually specified as beginning with "prop_" in case you're wondering where the weird naming comes from.

QuickCheck generates random instances satisfying the types and validates the properties. Generators exist for the basic types and can be extended to your own.

Taking an example from the ray tracing functions we can specify an invariant that the distance between any two points is constant after a linear transform.


  square :: (Num a) => a -> a
  square x = x * x

  distance :: Point -> Point -> Float
  distance p1 p2 = sqrt(square ((x p1)-(x p2)) + square ((y p1)-(y p2)))

  prop_distance :: Point -> Point -> Float -> Float -> Bool
  prop_distance p1 p2 d1 d2 = 0.001 > abs (distance p1 p2 - 
                                           distance (Point ((x p1) + d1) ((y p1) + d2))
                                                    (Point ((x p2) + d1) ((y p2) + d2)))

Note that the abs is just to deal with rounding errors that occur when dealing with floating point types results from the square root. The code won't compile as is, because QuickCheck doesn't know how to generate Point objects. We can solve this problem by creating an instance of Arbitrary specialized (is that the right word?) for Point types.


  instance Arbitrary Point where
      arbitrary = do
        x <- choose(1,1000) :: Gen Float
        y <- choose(1,1000) :: Gen Float
        return (Point x y)

do is used to provide sequencing of statements. We can now run quickCheck and verify that the invariant holds.


  *Main> quickCheck prop_distance
  OK, passed 100 tests.

I'm still not quite understanding some aspects of this (e.g. why can't I write Point choose(1,1000) choose(1,1000) instead of sequencing?), but this is a pretty neat way of writing tests and definitely gives me further reason to try and understand Haskell in more depth.

Wednesday, 26 August 2009

Haskell Arrays

Arrays in Haskell:

may be thought of as functions whose domains are isomorphic to contiguous subsets of the integers.

Arrays in Java:

An array object contains a number of variables. The number of variables may be zero, in which case the array is said to be empty. The variables contained in an array have no names; instead they are referenced by array access expressions that use nonnegative integer index values. These variables are called the components of the array. If an array has n components, we say n is the length of the array; the components of the array are referenced using integer indices from 0 to n - 1, inclusive.

This probably sums up the difference between the languages very well. Haskell says it in one sentence, whereas Java waffles a little bit more!

In Haskell arrays are constructed with the array constructor. The first argument specifies a pair of bounds so we can create an array mapping the integers 3, 4 and 5 to 10 times their value.


  *Main> array (3,5) [ (i,i*10) | i <- [3..5]]
  array (3,5) [(3,30),(4,40),(5,50)]

Arrays can also be multidimensional in which case more bounds need to be provided.


  *Main> (array ((0,0),(1,1)) [ ((i,j),i*2+j) | i <- [0..1], j <- [0..1]]) 
  array ((0,0),(1,1)) [((0,0),0),((0,1),1),((1,0),2),((1,1),3)]

I struggled to get the PPM package in Hackage working last time because I didn't understand arrays in the slightest. Now that I have a little understanding (only a little!) I can actually visualize the ray tracing code...

One thing I'm still not understanding is how I should format Haskell code. Here's the code to render the image and save it as a PPM.


image :: [Sphere] -> Point -> Int -> Int -> Array (Int,Int) Int 
image world eye width height = 
    array 
      ((0,0),(width,height)) 
      [((i,j),truncate (255 * (value (colorAt world eye (fromIntegral i) (fromIntegral j))))) | 
        i <- [0..width], j<- [0..height]]

imageWord16 :: Array (Int,Int) Int -> Array (Int,Int) Word16
imageWord16 image = fmap (fromIntegral :: Int -> Word16) image

saveImage :: String -> [Sphere] -> Point -> Int -> Int -> IO ()
saveImage filename world eye width height = arrayToFile filename (imageWord16 (image world eye width height))

Note the horrible usage of fromIntegral and truncate to convert an integer to a float and back again. I think this is because I should have been more general in my types on the Point data type and specified it as a number rather than a float.

The type of saveImage looks a little funny but this is because it returns an IO action rather than a value. This value is a monad, but again I'll ignore this and hope repeatedly using them will lead to understanding! For now, I just grok that its type indicates it does something rather than returns something.

The finished code weighs in at about 100 lines, which is more or less exactly the same as Clojure, but with the advantage of static typing. I found static typing to be in equal measure incredibly useful and incredibly frustrating! Hopefully it'll lean towards useful as I understand things a bit more.