Friday 26 December 2008

Bubbling Clojure

There's quite a few sorting algorithms around. I wanted to do something similar to Sorting Algorithms, only in Clojure + Swing as some kind of learning exercise. (99 Problems is still on-going, but to be honest it's boring (!) I'll reach the end one day...).

Perhaps the simplest is bubble sort, go through the list and if you find two adjacent items out of order, swap them. Repeat until done.


(defn bubble [lst]
(if (or (nil? lst) (nil? (second lst)))
lst
(if (> (first lst) (second lst))
(cons (second lst) (cons (first lst) (nthrest lst 2)))
(lazy-cons (first lst) (bubble (rest lst))))))

(defn bubble-sort [lst]
(if (= (bubble lst) lst)
lst
(recur (bubble lst))))


I couldn't find a neater way of doing it. An alternative is to use iterate to apply bubble until no more swaps are necessary. This at least makes the time complexity pretty obvious.


(defn bubble-sort [lst]
(last (take (* (count lst) (count lst)) (iterate bubble lst))))


We can simplify this further. The bubble function sucks because it only applies a single swap and then just rebuilds up the sequence. Applying the function as we go up we get:


(defn bubble [lst]
(if (or (nil? lst) (nil? (second lst)))
lst
(if (> (first lst) (second lst))
(cons (second lst) (lazy-cons (first lst) (bubble (nthrest lst 2))))
(lazy-cons (first lst) (bubble (rest lst))))))


Quicksort is a much better algorithm. The algorithm is:

  • Pick a pivot X

  • Move all items less than the pivot to the left and all greater to the right

  • Apply recursively to each side



This has a very simple implementation in Clojure


(defn qsort [lst]
(if (nil? lst)
lst
(concat
(qsort (filter (partial > (first lst)) (rest lst)))
(list (first lst))
(qsort (filter (partial <= (first lst)) (rest lst))))))


How do the two compare?

First I needed to find out how to generate a big random sequence. repeatedly applies a function of no arguments and generates an infinite list. Using repeatedly and take, you can get a list of random elements like this:


(take 500 (repeatedly (fn [] (rand-int 100)))


So now I can compare the two:


user> (time (count (bubble-sort (take 1000 (repeatedly (fn [] (rand-int 100)))))))
"Elapsed time: 575.713898 msecs"
1000
user> (time (count (qsort (take 1000 (repeatedly (fn [] (rand-int 100)))))))
"Elapsed time: 38.933079 msecs"
1000


Things I still need to work out:

  • Why does my bubble sort implementation blow the stack with large lists? Shouldn't the laziness mean it shouldn't? last looks daft, replace with take/drop combo
  • Multiple return values (or faking them). It'd be nice to stop swapping in the bubble sort when no swaps are done. Clojure doesn't support multiple return values (for compatibility with Java)


Things I have learnt:

  • Iterate is a cool function

8 comments:

  1. Nice!

    I by chance assigned the same task to myself as a first Clojure exercise. My solution is less elegant, a bit longer and a lot slower: http://bit.ly/6mwvh.

    A few questions/comments:
    1. (nthrest lst 2) is the same as (drop lst 2), isn't it?
    2. I am using Clojure 1.1.0 and lazy-cons is not defined any more. (That may be an explanation of the missing nthrest, too)
    3. I think I did everything with lazy sequences. The "Programming Clojure" book advises to not use explicit recursion either to speed things up, but I did not manage to achieve that with bubble-sort.

    Thank you,
    Bálint

    ReplyDelete
  2. Hi Bálint,

    Thanks for your comments. This version of bubble sort is using Clojure pre version 1.0 so some of the items in it are a bit out of date.

    nthrest was replaced when Clojure got lazy sequences. The new function is called nthnext.

    user> (nthnext '(1 2 3 4 5) 2)
    (3 4 5)

    From a list perspective I don't think there's much difference between using that and drop. You are right, lazy-cons is gone now. This is replaced with lazy-seq.

    Explicit recursion is OK if the sequence you are producing is lazy. recur is preferred if you are doing something that isn't lazy.

    The code below updates to something that should work in Clojure 1.1.

    (defn bubble [lst]
    (if (or (empty? lst) (nil? (second lst)))
    lst
    (if (> (first lst) (second lst))
    (cons (second lst) (lazy-seq
    (cons (first lst) (bubble (nthnext lst 2)))))
    (lazy-seq (cons (first lst) (bubble (rest lst)))))))

    (defn bubble-sort
    [lst]
    (last (take (* (count lst) (count lst)) (iterate bubble lst))))

    In the approach I've used, we only ever compare two adjacent elements. The downside is that I have to make potentially O(N^2) passes of the whole list in order to be sure. I could optimize it out by returning extra information (such as the number of swaps) and then just terminating if that were zero.

    I'm not so sure why your approach has nasty time problems, I suspect the concat call is expensive. The "swap-in-coll" function looks like the likely cause. I think the problem is that you're thinking of the sequence as a whole and trying to act on all of it in one go. A typical functional version will walk through a sequence only once, building up a new list as it goes. I realize that's a fairly useless description, but I'm still learning!

    I found the documentation on sequences really helpful. Basically if you can use a standard algorithm *always* do it. I've found few things that aren't expressable with a standard map, reduce, iterate or repeatedly function! Whenever you write your own code with recur, always see if you're doing a standard pattern. If I was looking at bubble again I'd probably try to go through the sequence using reduce, comparing each pair of elements and swapping them if necessary.

    If I looked at this again I'd try to avoid writing my own lazy-seq stuff? Maybe you could do this with repeated (e.g. iterate) applications of map/partition?

    user> (apply concat (map (fn [[x y]] (if (> x y) [y x] [x y])) (partition 2 '(4 3 2 1))))

    You'd have to make sure you altered the partition start place to ensure that everything got swapped. (no idea if that'll work, but worth a punt!).

    One of the cool things about Clojure is you can break open the standard Java bag of tricks. For example, you can use a profiler like JVisualVM if you want to get an idea of where the time is spent (see here).

    Let me know if you get anything cool working :)

    ReplyDelete
  3. Thanks, Jeff.

    Yes, after reading the appropriate chapter of the "Programming Clojure" book, my intention was to (only) use lazy sequences and lazy-seq library functions. (I have now replaced the one concat call with lazy-cat but it did not improve the speed).

    In fact, that chapter says that one should use tail recursion (recur) when dealing with small or fixed size sequences, otherwise one should use lazy sequences. Since in the case of a bubble-sort the collection can be potentially large, I tried to use lazy sequences.

    In fact, there is still an explicit recursion in swap-all and I had experimented with something similar to what you recommend: map, partition, iterate.

    The problem was exactly what you mention: that the swapping index should shift along the collection, and I could not get my head around how to express that with the aforementioned functions. That does not mean, it is not possible, on the contrary :) In fact, just by talking about it, I have a few other ideas, I might try now...

    Anyway, thank you for your insights and the profiler tip, I'll let you know if I manage to improve my algorythm.

    Bálint

    ReplyDelete
  4. Hey Jeff,

    I posted another comment with my improved bubble-sorting code but somehow it did not get through. Did it appear for moderation?

    Thank you,
    Balint

    ReplyDelete
  5. I haven't seen any other comments in the queue, could you try again?

    ReplyDelete
  6. Here is the improved code:

    (defn swap-even [coll]
    (let [swapped (apply concat (map (fn [[x y]] (if (> x y) [y x] [x y])) (partition 2 coll)))]
    (if (odd? (count coll))
    (lazy-cat swapped (take 1 (drop (dec (count coll)) coll)))
    swapped
    )
    )
    )

    (defn swap-odd [coll]
    (let [swapped (apply concat (map (fn [[x y]] (if (> x y) [y x] [x y])) (partition 2 (rest coll))))]
    (if (even? (count coll))
    (lazy-cat (take 1 coll) swapped (take 1 (drop (dec (count coll)) coll)))
    (lazy-cat (take 1 coll) swapped)
    )
    )
    )

    (defn bubble-sort [coll]
    "A bubble sort using lazy sequences"
    (if (empty? coll)
    []
    (let [sorter (fn [[coll n]]
    (let [swapper (if (even? n) swap-even swap-odd)] [(swapper coll) (inc n)]))]
    (first (nth (iterate sorter [coll 0]) (count coll)))
    )
    )
    )

    As you see, I now make n-1 swaps in one iteration on an n long collection which is probably the reason for the speedup.

    It takes ~4 seconds to execute this sort:

    (time (take 5 (bubble-sort (reverse (range 1000)))))

    So it's a whole lot faster than my previous versions but I guess still slower than the original code you posted :)

    Balint

    ReplyDelete
  7. balint - That's looking really nice. Good to see that you can do it using that swapping pattern!

    Maybe you could get rid of the duplication between swap-odd and swap-even by passing the difference as parameters? (e.g. a test function and a rest function)?

    ReplyDelete
  8. I came up with a different way of stopping the sort. Also in Clojure 1.1 you don't need to use lazy-cons.

    (defn bubble [lst]
    (if (or (empty? lst) (nil? (second lst)))
    lst
    (if (> (first lst) (second lst))
    (cons (second lst) (lazy-seq
    (cons (first lst) (bubble (nthnext lst 2)))))
    (lazy-seq (cons (first lst) (bubble (rest lst)))))))

    (defn bubble-sort
    [lst]
    (last (take (* (count lst) (count lst)) (iterate bubble lst))))

    ReplyDelete