Fatvat: October 2009

Friday, 30 October 2009

Cambridge Stack Overflow Dev Days

I've just got back from the Stack Overflow Dev Day. Definitely worth attending (and well worth the price).

The opening key note from Joel focused on the balance of features vs. choice. 37 Signals popularized the idea of "simple software" (illustrated by simple bug tracking software). However, FogBugz figures don't back this up the number of sales is proportional to the number of features.

I got the feeling that this talk was targeted at companies whose business it is to sell directly to the customers. In larger software corporations you aren't targeting real end-users, you are trying to convince a budget manager that your software is better than the competitions. I guess this leads to the explosion in features (e.g. Word vs. Word Perfect, Lotus Notes vs. Excel) and ends up with over complicated solutions that do everything adequately and nothing outstanding.

I thoroughly enjoyed the conference and will definitely be going next time. Also found out about another related event.
Next up was Christian Heilmann who is a "developer evangelist" for Yahoo!. The talk show cased some of Yahoo's developer tech. I was very impressed by YQL which provides a consistent SQL-like API for accessing a whole suite of web api's (not just Yahoo). It was very impressive to see mashups created in a few seconds and I'll definitely need to poke around the APIs. The term JSONP came up together with Caja (a safe subset of JavaScript) - both of which I'd never heard of but sound like things I should know.

Frank Stajano gave a good talk of the human aspects of security, focusing on social engineering and illustrating it with examples from The Real Hustle. It seemed slightly at odds with the rest of the talks, but it was interesting and well presented! And I got some new terminology, sock puppets, astro-turfing and Sybil's.

After lunch we had the fastest introduction possible to ASP.NET MVC. Steve Sanderson went through a few slides of the architecture (very similar to Ruby On Rails!) and then proceeded to build a simple file management application at breakneck speed. Very enjoyable presentation again. And ASP.NET MVC even has Linux support with MonoDevelop.

Remy Sharp gave a similarly fast paced presentation of JQuery. It was impressive to see JQuery in action and the simple steps to build a plugin were good to see. The presentation ended with a heroic attempt to "live-code" a tag cloud application into Twitter!

The penultimate talk was about Python by Michael Foord. I wasn't too keen on this presentation. There were statements like "Dynamic Programming makes it easy to test" and "Dynamic Programming means I type less" that I didn't agree with and there was no follow up to back it up. The presentation gave a quick overview of Python using Norvig's spelling corrector as an example.

Finally, Jeff Atwood gave a quick talk about Stack Overflow which was short on detail but definitely watch-able. Whatever has been written about this guy, he (and a couple of others) are responsible for a huge web site that is apparently the 895th biggest site on the whole web! It's been hugely successful so it was interesting to hear where the ideas came from and so on.

In conclusion, definitely worth going too and roll on the next one.

Tuesday, 20 October 2009

Pretty Printing Basic JSON

JSON is an incredibly simple format. In fact, it's so simple that the entire spec can be placed on a business card.

It can be described in Haskell (all based on Real World Haskell - highly recommended!) very simply with an algebraic data type


data JValue = JString String
            | JNumber Double
            | JBool Bool
            | JNull
            | JObject [(String, JValue)]
            | JArray [JValue]
              deriving (Eq, Ord, Show)

This gives us all we need to be able to create JSON objects within ghci.


  *Main> JBool True
  JBool True

  *Main> JObject [("foo", JString "bar"), ("baz", JString "boo")] 
  JObject [("foo",JString "bar"),("baz",JString "boo")]

  *Main> JArray [JNumber 1,JNumber 2,JNumber 3]                                  
  JArray [JNumber 1.0,JNumber 2.0,JNumber 3.0]

This isn't hugely useful at the moment - we could write functions that build up JObjects and the results, but at the moment we have no way to turn them into a readable lump of JSON.

Pretty Printing is the name given to an process that applies stylistic formatting to content.

This is actually a pretty complicated problem. One solution we could use is just to write functions for each type of JValue and go from there. The problem with that is that it wouldn't allow us to keep track of the contextual information. For example, if you are printing out Java code then you might want you { and } characters to align or your columns capped at a maximum length and so on.

Pretty Printing has a long history, from GRIND (used to print Lisp) to Eclipse.

For Haskell, there have been two papers which have focused on the neat design of a pretty printing library. The Design of a Pretty-printing Library and A Prettier Printer both take the same approach of creating a document structure representing the output. Real World Haskell presents the JSON pretty printer using the same approach.

A Doc type defines the rendering of data that we'll do. It is simply defined as:


  data Doc = Empty
           | Char Char
           | Text String
           | Line
           | Concat Doc Doc
           | Union Doc Doc
             deriving (Show,Eq)

Construction functions are provided to go from primitive types to document objects.


  empty :: Doc
  empty = Empty

  char :: Char -> Doc
  char c = Char c

  text :: String -> Doc
  text str = Text str

  double :: Double -> Doc
  double num = text (show num)

  line :: Doc
  line = Line

There are various operations defined for documents. The infix <> operator is used to denote the concatenation of two documents.


  (<>) :: Doc -> Doc -> Doc
  Empty <> y = y
  x <> Empty = x
  x <> y = x `Concat` y

  hcat :: [Doc] -> Doc
  hcat = fold (<>)

  fold :: (Doc -> Doc -> Doc) -> [Doc] -> Doc
  fold f = foldr f empty

  empty :: Doc
  empty = Empty

  () :: Doc -> Doc -> Doc
  x  y = x <> softline <> y

  fsep :: [Doc] -> Doc
  fsep = fold ()

  enclose :: Char -> Char -> Doc -> Doc
  enclose left right x = char left <> x <> char right

  softline :: Doc
  softline = group line

  group :: Doc -> Doc
  group x = flatten x `Union` x

  flatten :: Doc -> Doc
  flatten (x `Concat` y) = flatten x `Concat` flatten y
  flatten Line           = Char ' '
  flatten (x `Union` _)  = flatten x
  flatten other          = other

softline probably takes some explaining. It maintains two alternative paths (using Union). The flatten function replaces new lines with a space, joining two separate lines into one. The other side of the union leaves the document untouched.

All this seems rather abstract; how does it fit into rendering JSON? Firstly we have to define some functions that allow us to create document objects from the JValue instances we declared earlier:


  renderJValue :: JValue -> Doc
  renderJValue (JString s) = string (show s)
  renderJValue (JNumber n) = string (show n)
  renderJValue (JBool True) = string "true"
  renderJValue (JBool False) = string "false"
  renderJValue JNull = string "null"
  renderJValue (JArray ary) = series '[' ']' renderJValue ary
  renderJValue (JObject obj) = series '{' '}' field obj
      where field (name,val) = string name
                            <> text ": "
                            <> renderJValue val

Rending compound objects (JArray and JObject) makes use of a couple of helper functions which break down the object into individual elements:


  series :: Char -> Char -> (a -> Doc) -> [a] -> Doc
  series open close item = enclose open close 
                         . fsep . punctuate (char ',') . map item

  punctuate :: Doc -> [Doc] -> [Doc]
  punctuate p [] = []
  punctuate p [d] = [d]
  punctuate p (d:ds) = (d <> p) : punctuate p ds

So now we have a way of turning JValue into Doc objects. As you can see, this is anything but pretty!


  *Main> renderJValue (JBool True)
  Concat (Concat (Char '"') (Concat (Char 't') (Concat (Char 'r') 
    (Concat (Char 'u') (Char 'e'))))) (Char '"')

  *Main> renderJValue (JArray [JNumber 1,JNumber 2,JNumber 3])  
  Concat (Concat (Char '[') (Concat (Concat (Concat (Concat (Concat (Char '"') 
    (Concat (Char '1') (Concat (Char '.') (Char '0')))) (Char '"')) (Char ',')) 
    (Union (Char ' ') Line)) (Concat (Concat (Concat (Concat (Concat (Char '"') 
    (Concat (Char '2') (Concat (Char '.') (Char '0')))) (Char '"')) (Char ',')) 
    (Union (Char ' ') Line)) (Concat (Concat (Concat (Char '"') (Concat (Char '3') 
    (Concat (Char '.') (Char '0')))) (Char '"')) (Union (Char ' ') Line))))) (Char ']')

The main point is that now we have the contextual information and can write a function to render the document. The simplest possible way of printing out the document is just to pick one side of each union operator and print.


  compact :: Doc -> String
  compact x = transform [x]
      where transform [] = ""
            transform (d:ds) = 
                case d of
                  Empty -> transform ds
                  Char c -> c : transform ds
                  Text s -> s ++ transform ds
                  Line -> '\n' : transform ds
                  a `Concat` b -> transform (a:b:ds)
                  _ `Union` b -> transform (b:ds)

This gives back a very basic rendering:


  *Main> putStrLn (compact (renderJValue (JArray [JNumber 1,JNumber 2,JNumber 3])
  ["1.0",
  "2.0",
  "3.0"
  ]

But we can easily extend this to bound printing within a certain range:


  pretty :: Int -> Doc -> String
  pretty width x = best 0 [x]
      where best col (d:ds) = 
                case d of
                  Empty -> best col ds
                  Char c -> c : best (col + 1) ds
                  Text s -> s ++ best (col + length s) ds
                  Line -> '\n' : best 0 ds
                  a `Concat` b -> best col (a:b:ds)
                  a `Union` b -> nicest col (best col (a:ds))
                                            (best col (b:ds))
            best _ _ = ""
            nicest col a b | (width - least) `fits` a = a
                           | otherwise = b
                           where least = min width col

  fits :: Int -> String -> Bool
  w `fits` _ | w < 0 = False
  w `fits` "" = True
  w `fits` ('\n':_) = True
  w `fits` (c:cs) = (w - 1) `fits` cs

Now we can render to a fixed width, so:


  *Main> putStrLn (pretty 30 (renderJValue (JArray [JNumber 1,JNumber 2,JNumber 3])))                      
  ["1.0", "2.0", "3.0" ]

  *Main> putStrLn (pretty 15 (renderJValue (JArray [JNumber 1,JNumber 2,JNumber 3])))                      
  ["1.0", "2.0", 
  "3.0" ]

RWH goes into a lot more detail that the above basic example (for example handling escaping properly and so on), it's definitely worth buying.

Monday, 12 October 2009

Declaring Modules in Haskell

A Haskell program is a collection of modules. Each module consists of a declaration and a number of exports. Modules can relate to each other by import statements.

The most basic declaration is


  module MyModule where
  
  foo x = 1 + x
  bar x = foo (foo x)

If you omit an export list then everything in the module is available to the outside world. Typically you want to limit the exports to provide encapsulation and this is accomplished by explicitly naming the modules you want to expose. In the example below only bar is exported, foo is private to this module.


  module MyModule (bar) where

  foo x = 1 + x
  bar x = foo (foo x)

Note that a module name must correspond to the file name of the haskell source, so the above must be in a file called MyModule.hs.

To relate modules to each other the import directive is used.

Thursday, 1 October 2009

Dealing with Partial Functions

Functions can be either total (e.g. fully defined for all possible input values) or partial. Handling partial functions can be painful, for example if you pass an empty list to head then it'll terminate the program with an exception.

Partial functions are usually handled by either pattern matching being used to exclude cases where the function isn't defined e.g.


  foo [] = 33      -- not safe to call head
  foo xs = head xs -- safe to call head

Or using Maybe to indicate (via the type system) that the function might not return a value. The first exercise of chapter 4 of Real World Haskell asks you to rewrite some of the partial list functions so that they never fail.


  safeHead :: [a] -> Maybe a
  safeHead [] = Nothing
  safeHead (x:_) = Just x

  safeTail :: [a] -> Maybe [a]
  safeTail [] = Nothing
  safeTail (_:xs) = Just xs

  safeLast :: [a] -> Maybe a
  safeLast [] = Nothing
  safeLast (y:[]) = Just y
  safeLast (_:xs) = safeLast xs

  safeInit :: [a] -> Maybe [a]
  safeInit [] = Nothing
  safeInit (x:[]) = Just []
  safeInit (x:xs) = Just (x : fromJust(safeInit xs))

In order to be thorough, I thought I'd explore HUnit a bit more and write a few tests. The funny "~:" and "~=?" are just syntactic sugar for writing assertions. One area that confused me was the need to specify the type explicitly on the functions for undefined values. I'm not entirely sure why this has to be done...


  testSafeHead = test [ "Safe head non-empty" ~: Just 1 ~=? safeHead [1]
                      , "Safe head null" ~: Nothing ~=? (safeHead [] :: Maybe Int) ]
  
  testSafeTail = test [ "Safe tail empty" ~: Just [] ~=? safeTail [1]
                      , "Safe tail value" ~: Just [1] ~=? safeTail [2,1]
                      , "Safe tail nothing" ~: Nothing ~=? (safeTail [] :: Maybe [Int]) ]

  testSafeLast = test [ "Safe last empty" ~: Nothing ~=? (safeLast [] :: Maybe Int)
                      , "Safe last 1 element" ~: Just 1 ~=? safeLast [1]
                      , "Safe last n element" ~: Just 1 ~=? safeLast [3,2,1] ]

  testSafeInit = test [ "Safe Init empty" ~: Nothing ~=? (safeInit [] :: Maybe [Int])
                      , "Safe Init singleton" ~: Just [] ~=? safeInit [1]
                      , "Safe Init list" ~: Just [1,2] ~=? safeInit [1,2,3] ]

  tests = TestList [TestLabel "safeHead" testSafeHead
                   ,TestLabel "safeTail" testSafeTail
                   ,TestLabel "safeLast" testSafeLast
                   ,TestLabel "safeInit" testSafeInit]

You can run these tests within ghci with


  *Main> runTestTT tests
  Cases: 11  Tried: 11  Errors: 0  Failures: 0
  Counts {cases = 11, tried = 11, errors = 0, failures = 0}

I think these tests would have been better written with QuickCheck where I could have compared them with the non-safe versions. But this'll do for now!