Friday 4 September 2009

Exploring Haskell's List Functions

Haskell's primary data type is the list and there are a rich variety of functions for transforming lists. Most of these functions are familiar to me from Clojure sequences but there are some other useful functions that I haven't seen before.

last, as you'd expect, gives you the last item in a list. It has a corresponding function init that gives you everything but the last item. intersperse allows you to insert an element between each element (e.g. init (intersperse 'a' "bnnn") => "banana" . Similarly, intercalate inserts the lists between lists and concatenates the results. For different arrangements of the items, there are two functions. subsequences gives you a list of all subsequences of the argument and permuatations gives all the possible permutations.


Prelude> subsequences "abc"
["","a","b","ab","c","ac","bc","abc"]

Prelude> permutations "abc"
["abc","bac","cba","bca","cab","acb"]


To reduce lists to a single value there are many versions of the fold function (in Clojure there was just reduce!). foldl is the direct equivalent to reduce which reduces a list to a single binary operator. foldl (+) 0 [1,2,3] is equivalent to (((0 + 1) + 2) + 3). foldl1 is a convenience method without the initial argument, that only applies to non-empty lists foldr and foldlr1 are right-associative so foldr (+) 0 [1,2,3] evaluates to (0 + (1 + (2 + 3))).

The fold family of functions can be extremely powerful - I need to read "A tutorial on the universality and expressiveness of fold" [PDF]that explores this in more detail. Some example functions that operate on lists defined as folds in Haskell include and, or, any, all and concatMap.

On a side note, there are strict versions of the foldl functions, that with a '. Why'd you need these? Haskell is lazy by default which can mean you build up a great big thunk (a pending calculation). This can be bad (for example, increased space requirements). By making a strict version you evaluate the values as they becomes available. This stops the huge think building up and can be more efficient. There's a good discussion of this here. foldr doesn't have a corresponding strict version, and looking at the expansion it's easy to see why - there's nothing to be lazy with as the first value you can evaluate is right at the end of the list!

unfoldr is an interesting function. It can be considered the opposite of foldr as it is used to build up a list from a seed value. The first argument is a function that returns Nothing if it's finished or Just (a,b) otherwise. a is prepended to the list, and b is used as the next element. For example we can generate the Fibonacci sequence:


fibonacci :: [Integer]
fibonacci = unfoldr (\[a,b] -> Just(a+b,[b,b+a])) [0,1]

Prelude> take 10 fibonacci
[1,2,3,5,8,13,21,34,55,89]


iterate can be written as iterate f == unfoldr (\x -> Just (x, f x)). Another paper to add to the reading list is "The under appreciated unfold".

take and drop are functions for getting prefixes and suffixes of (potentially) infinite lists. splitAt does both at the same time, returning a tuple (take n xs, drop n xs). takeWhile and dropWhile take or drop errors whilst some predicate holds. Putting these together we can write a function groupN which groups elements into sublists of size N.


groupN :: [a] -> Int -> [[a]]
groupN [] _ = []
groupN xs n = a : groupN b n where
(a,b) = splitAt n xs

Prelude> groupN [1..7] 2
[[1,2],[3,4],[5,6],[7]]


The Haskell list library is very complete and there's definitely some new ideas for me to absorb there. In particular, understanding unfoldr and folding in more detail seems to be an important thing to do!