Tuesday 20 October 2009

Pretty Printing Basic JSON

JSON is an incredibly simple format. In fact, it's so simple that the entire spec can be placed on a business card.

It can be described in Haskell (all based on Real World Haskell - highly recommended!) very simply with an algebraic data type


data JValue = JString String
| JNumber Double
| JBool Bool
| JNull
| JObject [(String, JValue)]
| JArray [JValue]
deriving (Eq, Ord, Show)


This gives us all we need to be able to create JSON objects within ghci.


*Main> JBool True
JBool True

*Main> JObject [("foo", JString "bar"), ("baz", JString "boo")]
JObject [("foo",JString "bar"),("baz",JString "boo")]

*Main> JArray [JNumber 1,JNumber 2,JNumber 3]
JArray [JNumber 1.0,JNumber 2.0,JNumber 3.0]


This isn't hugely useful at the moment - we could write functions that build up JObjects and the results, but at the moment we have no way to turn them into a readable lump of JSON.

Pretty Printing is the name given to an process that applies stylistic formatting to content.

This is actually a pretty complicated problem. One solution we could use is just to write functions for each type of JValue and go from there. The problem with that is that it wouldn't allow us to keep track of the contextual information. For example, if you are printing out Java code then you might want you { and } characters to align or your columns capped at a maximum length and so on.

Pretty Printing has a long history, from GRIND (used to print Lisp) to Eclipse.

For Haskell, there have been two papers which have focused on the neat design of a pretty printing library. The Design of a Pretty-printing Library and A Prettier Printer both take the same approach of creating a document structure representing the output. Real World Haskell presents the JSON pretty printer using the same approach.

A Doc type defines the rendering of data that we'll do. It is simply defined as:


data Doc = Empty
| Char Char
| Text String
| Line
| Concat Doc Doc
| Union Doc Doc
deriving (Show,Eq)


Construction functions are provided to go from primitive types to document objects.


empty :: Doc
empty = Empty

char :: Char -> Doc
char c = Char c

text :: String -> Doc
text str = Text str

double :: Double -> Doc
double num = text (show num)

line :: Doc
line = Line


There are various operations defined for documents. The infix <> operator is used to denote the concatenation of two documents.


(<>) :: Doc -> Doc -> Doc
Empty <> y = y
x <> Empty = x
x <> y = x `Concat` y

hcat :: [Doc] -> Doc
hcat = fold (<>)

fold :: (Doc -> Doc -> Doc) -> [Doc] -> Doc
fold f = foldr f empty

empty :: Doc
empty = Empty

() :: Doc -> Doc -> Doc
x y = x <> softline <> y

fsep :: [Doc] -> Doc
fsep = fold ()

enclose :: Char -> Char -> Doc -> Doc
enclose left right x = char left <> x <> char right

softline :: Doc
softline = group line

group :: Doc -> Doc
group x = flatten x `Union` x

flatten :: Doc -> Doc
flatten (x `Concat` y) = flatten x `Concat` flatten y
flatten Line = Char ' '
flatten (x `Union` _) = flatten x
flatten other = other


softline probably takes some explaining. It maintains two alternative paths (using Union). The flatten function replaces new lines with a space, joining two separate lines into one. The other side of the union leaves the document untouched.

All this seems rather abstract; how does it fit into rendering JSON? Firstly we have to define some functions that allow us to create document objects from the JValue instances we declared earlier:


renderJValue :: JValue -> Doc
renderJValue (JString s) = string (show s)
renderJValue (JNumber n) = string (show n)
renderJValue (JBool True) = string "true"
renderJValue (JBool False) = string "false"
renderJValue JNull = string "null"
renderJValue (JArray ary) = series '[' ']' renderJValue ary
renderJValue (JObject obj) = series '{' '}' field obj
where field (name,val) = string name
<> text ": "
<> renderJValue val


Rending compound objects (JArray and JObject) makes use of a couple of helper functions which break down the object into individual elements:


series :: Char -> Char -> (a -> Doc) -> [a] -> Doc
series open close item = enclose open close
. fsep . punctuate (char ',') . map item

punctuate :: Doc -> [Doc] -> [Doc]
punctuate p [] = []
punctuate p [d] = [d]
punctuate p (d:ds) = (d <> p) : punctuate p ds


So now we have a way of turning JValue into Doc objects. As you can see, this is anything but pretty!


*Main> renderJValue (JBool True)
Concat (Concat (Char '"') (Concat (Char 't') (Concat (Char 'r')
(Concat (Char 'u') (Char 'e'))))) (Char '"')

*Main> renderJValue (JArray [JNumber 1,JNumber 2,JNumber 3])
Concat (Concat (Char '[') (Concat (Concat (Concat (Concat (Concat (Char '"')
(Concat (Char '1') (Concat (Char '.') (Char '0')))) (Char '"')) (Char ','))
(Union (Char ' ') Line)) (Concat (Concat (Concat (Concat (Concat (Char '"')
(Concat (Char '2') (Concat (Char '.') (Char '0')))) (Char '"')) (Char ','))
(Union (Char ' ') Line)) (Concat (Concat (Concat (Char '"') (Concat (Char '3')
(Concat (Char '.') (Char '0')))) (Char '"')) (Union (Char ' ') Line))))) (Char ']')


The main point is that now we have the contextual information and can write a function to render the document. The simplest possible way of printing out the document is just to pick one side of each union operator and print.


compact :: Doc -> String
compact x = transform [x]
where transform [] = ""
transform (d:ds) =
case d of
Empty -> transform ds
Char c -> c : transform ds
Text s -> s ++ transform ds
Line -> '\n' : transform ds
a `Concat` b -> transform (a:b:ds)
_ `Union` b -> transform (b:ds)


This gives back a very basic rendering:


*Main> putStrLn (compact (renderJValue (JArray [JNumber 1,JNumber 2,JNumber 3])
["1.0",
"2.0",
"3.0"
]


But we can easily extend this to bound printing within a certain range:


pretty :: Int -> Doc -> String
pretty width x = best 0 [x]
where best col (d:ds) =
case d of
Empty -> best col ds
Char c -> c : best (col + 1) ds
Text s -> s ++ best (col + length s) ds
Line -> '\n' : best 0 ds
a `Concat` b -> best col (a:b:ds)
a `Union` b -> nicest col (best col (a:ds))
(best col (b:ds))
best _ _ = ""
nicest col a b | (width - least) `fits` a = a
| otherwise = b
where least = min width col

fits :: Int -> String -> Bool
w `fits` _ | w < 0 = False
w `fits` "" = True
w `fits` ('\n':_) = True
w `fits` (c:cs) = (w - 1) `fits` cs


Now we can render to a fixed width, so:


*Main> putStrLn (pretty 30 (renderJValue (JArray [JNumber 1,JNumber 2,JNumber 3])))
["1.0", "2.0", "3.0" ]

*Main> putStrLn (pretty 15 (renderJValue (JArray [JNumber 1,JNumber 2,JNumber 3])))
["1.0", "2.0",
"3.0" ]


RWH goes into a lot more detail that the above basic example (for example handling escaping properly and so on), it's definitely worth buying.