Saturday, 29 August 2009

IO And Haskell

All functions in Haskell are pure - how do you deal with side effects such as input/output?

Prelude> :t putStr "hello world"
putStr "hello world" :: IO ()

The IO () means that this returns an IO action that has a result type of () (known as unit).

The do syntax can be used to glue actions together. For example:

saygoodbye = do
putStrLn "Hi - who are you?"
name <- getLine
putStrLn ("Goodbye Mr. " ++ name)

*Main> saygoodbye
Hi - who are you?
Goodbye Mr. Bond

*Main> :t saygoodbye
saygoodbye :: IO ()

<- performs the getLine application and binds the resulting value to name. Given that getLine has type IO String this gives name a type of String. IO types and normal types can't be mixed so "Die " ++ getLine is an illegal statement (IO String and String don't mix).

return is like the opposite of <- - it takes a pure value and constructs an action out of it. return is nothing like its use in other languages such as Java and C. return doesn't do anything with the execution path, code continues to the next line; return is purely used to construct actions.

So how'd you escape from your IO action? You don't... There's no escape!:

There's one final detail about IO actions that you should be aware of: there is no escape! The only way to get a result from an IO action is to invoke the IO action (through main) and have its result used to affect the outside world through another IO action. There is no way to take an IO action and extract just its results to a simple value (an inverse-return). The only places where an IO action's results appear unwrapped are within a do-block.

Let's try and put this all together to write a quick program that gives all anagrams of a word. The implementation idea is taken from Programming Pearls - load up a dictionary, sort the characters (so "banana" becomes "aaabnn"), then shove it all into an association list. Then given a word, apply the same sort and simply look up the associations.

Unix distros come with a word list file (/usr/share/dict), but it's full of words with punctuation and so on. We need to filter this list to remove invalid words, then build up an association list (list of tuples).

import Data.Char
import List

wordfile = "/usr/share/dict/words"

stringToKey :: String -> String
stringToKey = sort.(map toLower)

validWord :: String -> Bool
validWord s = (not (null s)) &&
length s <= 10 &&
not (any (not.isAlpha) s)

anagramList :: String -> IO [(String,String)]
anagramList file = do
filecontent <- readFile file
return (map (\x -> ((stringToKey x),x)) (filter validWord (lines filecontent)))

matchingKeys :: String -> [(String,String)] -> [String]
matchingKeys k l = map snd (filter ((== k).fst) l)

anagramsOf :: String -> IO ()
anagramsOf word = do
anagrams <- anagramList wordfile
putStrLn (show (matchingKeys (stringToKey word) anagrams))

stringToKey is a function of one argument which converts a string to a key by making the string lower-case and then sorting the characters. readFile does exactly what is says on the tin (it's lazy too). lines is a built in function which breaks up a string into separate lines. And it seems to work first time too!

*Main> anagramsOf "least"

My current understanding (and it's all very foggy!) is that if a function uses IO actions then they will always boil up to the top level. main is the only place where these can be hidden. For example, I could put the loading of the anagram list in one place, use <- in main and then not have to riddle the rest of the program with IO actions. Good design isolates the IO in the smallest segment of the program possible.