Guy Steele's paper, Organizing functional code for parallel execution or, foldl and foldr considered slightly harmful [PDF] gives a very readable presentation of designing code for performance.
It advocates using trees instead of lists as the primary data structure. Trees naturally decompose into a structure easily processed by Map/reduce.
The paper introduces "conc lists" which is a way of implementing lists in terms of trees. In comparison with cons lists there are some trade-offs. Cons lists have first / rest as O(1) operations whereas in a tree structure these involve traversing the tree to the left-most leaf. The big win is on functions that traverse the entire list (such as length / map / filter / append) as these can now be processed in logarithmic time because of the opportunity for parallelism.
For example, map and filter can now be defined in terms of map reduce as shown below.
(define (mapreduce f g id xs)
(cond ((null? xs) id)
(singleton? xs) (f item xs)))
(else (split xs (lambda (ys zs)
(g (mapreduce f g id ys)
(mapreduce f g id zs))))))
(define (map f xs)
(mapreduce (lambda (x) (list (f x))) append '() xs))
(define (length xs)
(mapreduce (lambda (x) 1) + 0 xs))
The split function takes a tree, applies a function to both halves and puts it back together again. This gives an opportunity for parallelism which isn't present when working with a normal cons list.
Conc lists reminds me of Clojure's persistent vectors implementation. To my knowledge Clojure currently doesn't take advantage of these, but it's nice to know it could!