Fatvat: February 2009

Tuesday, 24 February 2009

Bloom Filters

Bloom filters are an efficient data structure for determining whether an item is a member of a set. A Bloom filter is a probabilistic set: it is guaranteed never to return a false negative, BUT it can sometimes falsely report that an item is in the set.

The Bloom filter doesn't store the data in the structure; instead it uses a bit array. There are two operations on a Bloom filter:

Add - as the name suggests adds a new element to the filter.
Query - returns whether an item is in the set or not

A series of hash functions (ideally independent) are used to calculate a number of indices within the bit-array. When we add something we set the various bit indices, and when we query we check whether all these bit indices are set.


(defstruct bloom-filter :hashfns :value)

(defn make-bloom-filter
  ([n] (struct bloom-filter md5-hashes (bit-array n)))
  ([n fns] (struct bloom-filter fns (bit-array n))))

(defn add!
  [bloom n]
  (let [hashes (map (fn [x] (x n)) (bloom :hashfns))]
    (doseq [x hashes] (set-bit! (bloom :value) x 1))
    bloom))

(defn query
  [bloom n]
  (let [hashes (map (fn [x] (x n)) (bloom :hashfns))]
    (reduce bit-and (map (fn [z] (get-bit (bloom :value) z)) hashes))))

The gotcha for me was remembering to use doseq for side effects. When I used map instead I was in trouble: map is lazy, so the calls to set-bit! were never evaluated. doseq forces the evaluation.
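
To see the laziness in isolation, here's a minimal sketch. The calls atom and the record! function are my own placeholder names and not part of the Bloom filter code. The map call only describes the work, whereas doseq runs its body straight away.

```clojure
;; Hypothetical names, purely for illustration.
(def calls (atom 0))

(defn record!
  "Side-effecting function: counts how often it has been called."
  [x]
  (swap! calls inc)
  x)

;; map is lazy: this builds a lazy sequence but does not call record! yet.
(def pending (map record! [1 2 3]))
(def calls-after-map @calls)      ; still 0

;; doseq is eager: record! runs once per element.
(doseq [x [1 2 3]] (record! x))
(def calls-after-doseq @calls)    ; now 3
```

Wrapping the lazy sequence in dorun or doall would also force it, but doseq says "this is for side effects" more directly.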

One simple choice for the hashes is to take an MD5 hash value and split it into pieces. MessageDigest allows you to calculate various hash functions.


(ns bloom
  (:use bitarray)
  (:use clojure.contrib.test-is)
  (:import (java.security MessageDigest)))

(defn pad [n s]
  (let [padding (- n (count s))]
    (apply str (concat (apply str (repeat padding "0")) s))))

(defn md5-hash [s]
  (let [m (MessageDigest/getInstance "MD5")]
    (.update m (.getBytes (str s)) 0 (count s))
    (let [x (.toString (BigInteger. 1 (.digest m)) 16)]
      (pad 32 x))))

So how well does this work?


(deftest test-bloom
  (let [teststrs (map (fn [x] (str x)) (range 0 1000))
        bloom (make-bloom-filter 0xFFFF)]
    (doseq [x teststrs]
      (is (= 0 (query bloom x)))
      (add! bloom x)
      (is (= 0 (query bloom (str "not" x))))
      (is (= 1 (query bloom x))))))

In this example, I've used hash functions which break MD5 down into 4 lots of 4 hex characters which gives a range of 65536.
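
The hash functions themselves aren't shown above, so here is a minimal sketch of how four of them might be carved out of one MD5 digest. The names md5-hex, hex-slice-hash and md5-hashes-sketch are placeholders of mine and may well differ from the real code.

```clojure
(import '(java.security MessageDigest))

(defn md5-hex
  "32-character, zero-padded hex MD5 digest of (str s)."
  [s]
  (let [digest (.digest (MessageDigest/getInstance "MD5")
                        (.getBytes (str s) "UTF-8"))]
    (format "%032x" (BigInteger. 1 digest))))

(defn hex-slice-hash
  "Returns a hash function that reads 4 hex characters of the digest,
  starting at offset `start`, as a number between 0 and 65535."
  [start]
  (fn [s]
    (Integer/parseInt (subs (md5-hex s) start (+ start 4)) 16)))

;; Four hash functions taken from the first 16 hex characters.
(def md5-hashes-sketch
  (mapv hex-slice-hash [0 4 8 12]))
```

Each function recomputes the digest, which is wasteful but keeps the sketch short. For a smaller bit array the results would also need reducing (with mod, say) to stay in range.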

Running this test gives


  bloom> (run-tests 'bloom)
  Ran 1 tests containing 3000 assertions.
  0 failures, 0 errors.

Awesome, no false positives. Taking it down to 0xFFF gives 82 false positives, which is in line(ish!) with the figures in the Bloom filter error table.
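
As a rough sanity check, the usual textbook approximation for the false positive rate of a Bloom filter with m bits, n inserted items and k hash functions is (1 - e^(-kn/m))^k. A small sketch of that formula (the function name is mine):

```clojure
(defn false-positive-rate
  "Approximate false positive probability for m bits, n items, k hashes."
  [m n k]
  (Math/pow (- 1.0 (Math/exp (/ (* -1.0 k n) m))) k))

;; Full filter at the two sizes used in the test:
(false-positive-rate 0xFFFF 1000 4)
(false-positive-rate 0xFFF 1000 4)
```

For 0xFFF bits this comes out at around 0.15 once all 1000 items are in. The test queries while the filter is still filling up, so an observed rate below that is what I'd expect.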

The primary use case is caching: check whether something is in some storage mechanism before doing something expensive (BigTable and Squid Cache both use Bloom filters).

Rapleaf write about how using a Bloom filter saved some serious time (see here).

Monday, 23 February 2009

Bit Fields using Clojure

A bit array is a way of getting a very compact array of Boolean values with each value being represented as a single bit. Bit arrays are usually associated with low(er)-level languages like C, but you can do them in Clojure too.

Clojure provides array functions through aget and aset (slightly strange in that it mutates a data structure in place). Using Java arrays isn't very functional, but in my opinion this is about providing the balance between strictness (e.g. Haskell) and leniency (e.g. C).

We define a bit-array as a structure consisting of some array data and the width of each field.


(defstruct bit-field :element-width :array-data)

(defn bit-array
  [n]
  (struct bit-field 31 (int-array (inc (int (/ n 31))))))

Where 31 is the number of usable bits in a Java int (Java doesn't support unsigned integers, so the top bit is the sign bit) (yeah, my thinking is fuzzy here, but I think that's right... Certainly 32 fails all my tests.).

A bit-array only consists of a couple of operations, get-bit and set-bit!. The bang (!) notation at the end of the set-bit function name is an informal way of indicating that the function mutates its data.


(defn set-bit!
  [bitfield bit val]
  (let [r (mod bit (bitfield :element-width))
        n (int (/ bit (bitfield :element-width)))
        x (aget (bitfield :array-data) n)]
    (if (not (zero? val))
      (aset (bitfield :array-data) n (bit-or x (bit-shift-left 1 r)))
      ;; clear the bit; bit-xor would toggle it and so set a bit that was already 0
      (aset (bitfield :array-data) n (bit-and x (bit-not (bit-shift-left 1 r)))))
    bitfield))

(defn get-bit
  [bitfield bit]
  (let [r (mod bit (bitfield :element-width))
        x (aget (bitfield :array-data) (int (/ bit (bitfield :element-width))))]
    (if (= 0 (bit-and x (bit-shift-left 1 r))) 0 1)))

We work out the index in the array to change by dividing the bit to set by the width of each element and use bit-shift-left to identify the bit in question to twiddle.
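
The arithmetic is easy to check by hand. Here's a small sketch with helper names of my own (locate-bit and bit-mask are not part of the code above):

```clojure
(defn locate-bit
  "Returns [element-index bit-offset] for `bit` when each array
  element holds `width` bits."
  [bit width]
  [(quot bit width) (rem bit width)])

(defn bit-mask
  "Mask with only the bit at `offset` set."
  [offset]
  (bit-shift-left 1 offset))

(locate-bit 100 31) ; => [3 7], since 100 = 3 * 31 + 7
(bit-mask 7)        ; => 128
```

So bit 100 lives in element 3 of the array, and 128 is the mask that picks it out.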

How can we be sure this works? Well, we can never be 100% sure without a proof, but we can at least run it against a reasonable set of data! Clojure Contrib has a very simple library for testing. You define tests using the deftest macro; each test consists of a number of is forms that verify assertions.



(deftest test-bits
  (let [n 3000
        f (bit-array n)]
    (is (= 31 (f :element-width)))
    (doseq [x (range 0 n)]
      (is (= 0 (get-bit f x))))
    (doseq [x (range 0 n)]
      (set-bit! f x 1)
      (is (= 1 (get-bit f x))))
    (doseq [x (range 0 n)]
      (set-bit! f x 0)
      (is (= 0 (get-bit f x))))))

Tests are run using run-tests. For example:


  bitarray> (run-tests 'bitarray)

  Testing bitarray

  Ran 1 tests containing 9001 assertions.
  0 failures, 0 errors.
  nil

So it passes the tests therefore it at least vaguely works!

Friday, 20 February 2009

Lazy Clojure, Slime and Emacs

The latest version of Swank Clojure is now compatible with the recent lazy changes to Clojure.

The most visible changes:

Use lazy-seq to construct things (~~lazy-cons~~ is gone), for example:


  user> (lazy-seq (cons 1 (cons 2 (cons 3 ()))))
  (1 2 3)

Nil punning is gone (mentioned before)
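
As a slightly bigger illustration of the first point, here's a sketch of a recursive sequence built with lazy-seq (naturals-from is just a name I've picked). The body is only evaluated when someone asks for the next element, so the sequence can be infinite.

```clojure
(defn naturals-from
  "Infinite lazy sequence n, n+1, n+2, ..."
  [n]
  (lazy-seq (cons n (naturals-from (inc n)))))

(take 5 (naturals-from 1)) ; => (1 2 3 4 5)
```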

JMusic and Clojure

JMusic is a Java library for music composition. I thought it'd be fun to play with music as with images, so I'm looking at trying some algorithmic compositions.

Since it's a Java library, it's easy to use JMusic with Clojure. It's as simple as downloading the JAR file (see here) and then making sure the JAR is on your class path when you start Clojure.

You'll need to import a few classes to get started. Clojure doesn't support importing all members from a package, so it can be a little tedious. (As a side note, in Java I'm so used to IntelliJ auto-importing that I'd forgotten how much crud you have to import these days.)

Here are my basic imports for a simple program which mirrors the basic one in the tutorial.


(ns jmusic
  (:use [clojure.contrib.import-static :only (import-static)])
  (:import jm.JMC)
  (:import (jm.util Write))
  (:import (jm.music.data Note Score Part Phrase)))

(import-static jm.JMC 
               CROTCHET
               C4
               FLUTE)

import-static is a very handy macro - it does exactly what it says on the tin!

JMusic has a simple composite model for music.

A Note is the building block of any composition
A Phrase is 1 or more notes
A Part is a collection of phrases played with a given Instrument
A Score is a collection of parts

Writing functions to compose phrases of notes, and parts of phrases, is very tedious, so we'll use a macro to help avoid duplication. Here are a few helper functions to build the various music domain objects in JMusic. Note that this is just for my "hello world" style application; they are incredibly inflexible functions at the moment!


(defmacro jm-add-children
  [m obj parts]
  `(let [obj# ~obj]
      (doseq [p# ~parts]
        (doto obj#
          (~m p#)))
      obj#))

(defn make-score 
  [name parts]
  (let [sc (Score. name)]
    (jm-add-children .addPart sc parts)))

(defn make-phrase
  [name notes]
  (let [p (Phrase. name)]
    (jm-add-children .addNote p notes)))

(defn make-part 
  [name instrument phrases]
  (let [part (Part. name instrument)]
    (jm-add-children .addPhrase part phrases)))

(defn make-note
  [freq rhythm]
  (Note. freq rhythm))
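
If you don't have JMusic to hand, the same macro shape can be tried against a plain java.util.ArrayList. This is only a sketch: add-children and list-of are names I've made up, and ArrayList stands in for Score, Part and Phrase. The point is that a method name like .add isn't a first-class value, so it has to be spliced in at macro-expansion time.

```clojure
(defmacro add-children
  "Calls (method obj child) for every child, then returns obj."
  [method obj children]
  `(let [obj# ~obj]
     (doseq [child# ~children]
       (~method obj# child#))
     obj#))

(defn list-of
  "Builds an ArrayList from items using the macro above."
  [items]
  (add-children .add (java.util.ArrayList.) items))

(vec (list-of [1 2 3])) ; => [1 2 3]
```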

Once you've got a score together, you need to be able to save it.


(defn save [score output]
  (Write/midi score output))

OK, with all those helper functions out of the way we can now write some music. Taking the first example (chromatic scale) from the JMusic tutorial, we get:


(defn make-noise []
  (let [notes (map (fn [y] (make-note (+ C4 y) CROTCHET)) (range 0 12))
        phrase (make-phrase "Phrase1" notes)
        part   (make-part "Part" FLUTE (list phrase))
        score (make-score "Score" (list part))]
    (save score "chromatic-scale.mid")))

Running this at your REPL should get:


  jmusic> (make-noise)
  ----------------------------- Writing MIDI File ------------------------------
  Converting to SMF data structure...
      Part 0 'Part' to SMF Track on Ch. 0:  Phrase 0:............
  MIDI file 'chromatic-scale.mid' written from score 'Score' in 0.001 seconds.
  ------------------------------------------------------------------------------
  nil

This results in a chromatic scale MIDI file being output. Next, to find something funkier to do!

Wednesday, 18 February 2009

Lazier Clojure

Clojure has become a lazier language. See here for a description of the changes.

One of the main changes is the removal of "nil-punning". This was a technique where functions operating on empty lists returned nil, which evaluated to false in a conditional statement. All of this is explained in much more detail here.
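
A quick sketch of what that means in practice (the names here are just for illustration). An empty lazy sequence is an object, not nil, so it counts as true in a conditional. Calling seq on it is the way to ask "is there anything in here?".

```clojure
(def no-evens (filter even? [1 3 5]))

(if no-evens :truthy :falsey)        ; => :truthy, an empty lazy seq is not nil
(if (seq no-evens) :truthy :falsey)  ; => :falsey, seq returns nil when empty

;; rest always returns a sequence, next returns nil when nothing is left.
(rest [1]) ; => ()
(next [1]) ; => nil
```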

Monday, 16 February 2009

Countdown

Countdown is a Channel 4 game show with standard number and word puzzles. In this post we'll look at the Numbers game. The rules are simple: given 6 numbers (drawn from 1 to 10 and from 25, 50, 75 and 100), calculate the target number (between 100 and 999 inclusive). You can use +, -, / and * to reach the target and you have a 30 second time limit to do so.

To solve this in Clojure we'll start with a brute force search of all the possibilities and see if that's good enough to solve it.

One approach I tried initially was just to build up the Clojure expression tree for all possible examples, using code like this:


(def *operators* ['+ '- '/ '*])

(defn expr
  "A list of expressions for a and b"
  [a b]
  (map (fn [x] (list x a b)) *operators*))

The idea would be that I could then just (map eval (expr 4 5)) and get all the possible results. This turned out to be very slow. Generally you want to avoid calling eval at run-time if you can help it.

To solve this I defined a simple structure to keep track of running expressions and their value. As the expressions are built up, the values are calculated in sync.


(def *operators* {'+ + '- - '/ / '* *})

(defn is-valid [op a b]
  (cond 
   (= + op) true
   (= - op) (> a b)
   (= * op) true
   (= / op) (= (mod a b) 0)))

(defstruct node :expression :value)

(defn value 
  [x]
  (if (map? x)
    (x :value)
    x))

(defn expression
  [x]
  (if (map? x)
    (x :expression)
    x))

(defn expr 
  "A list of expressions for a and b"
  [a b]
  (let [nodea (map? a) nodeb (map? b)]
    (filter (fn [x] (not (nil? x))) 
            (map (fn [x] (when (is-valid (second x) (value a) (value b))
                           (struct node 
                                   (list (first x) (expression a) (expression b)) 
                                   ((second x) (value a) (value b)))))
                 *operators*))))

Why is *operators* a map? That's simply because "+" doesn't print very nicely e.g.


  countdown> +
  #<core$_PLUS___3180 clojure.core$_PLUS___3180@61dd1c39>

I also added a check to prune out entries that result in floating point or negative numbers; that just helps keep the number of combinations down a little.

Armed with a function that calculates all the possible expressions for a pair of expressions, how do we now use that to generate all the possible expressions?


(defn make-expressions-helper 
  "Given a lst, build up all valid Countdown expressions"
  [x]
  (cond
   (< (count x) 2) (list (struct node (first x) (first x)))
   (= 2 (count x)) (apply expr x)
   :else
     (let [exps (apply expr (take 2 x))
           remd (drop 2 x)]
       (mapcat make-expressions-helper (map (fn [x] (cons x remd)) exps)))))

This is a recursive definition with the following logic:

A singleton list (1) can only evaluate to itself so the only possibility is [expr=1 value=1]
A list of size two just uses the expr function to generate all the expressions
Any other list builds all the possible expressions for the first two elements, and then for all of these (mapcat) calls make-expressions-helper on the rest.

Note that this just builds up the possible expressions with the numbers in this particular order. For example:


  countdown> (make-expressions-helper '(1 2 3))
  ({:expression (+ (+ 1 2) 3), :value 6} {:expression (/ (+ 1 2) 3), :value 1}
   {:expression (* (+ 1 2) 3), :value 9} {:expression (+ (* 1 2) 3), :value 5}
   {:expression (* (* 1 2) 3), :value 6})

  countdown> (count (make-expressions-helper '(1 2 3 4 5 6)))
  118

So now we need to apply the helper function to all possible combinations. Thankfully, Clojure Contrib already has a few combinatorics algorithms. permutations returns a lazy list of all possible permutations of the supplied list.


(defn make-expressions [lst]
  (if (nil? lst)
    nil
    (lazy-cat
     (mapcat make-expressions-helper (permutations lst))
     (mapcat make-expressions (drop-one lst)))))

So this algorithm applies the helper function to all permutations of the input, and then applies itself to all combinations of the remainder of the list. drop-one is a helper function which gives a list of all combinations of a list without one element.
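
drop-one isn't shown here, so this is a guess at its shape rather than the version in my repository; the name drop-one-sketch is there to make that clear. It returns one sub-list per position, each with that position's element removed.

```clojure
(defn drop-one-sketch
  "All the ways of removing exactly one element from coll."
  [coll]
  (let [v (vec coll)]
    (for [i (range (count v))]
      (concat (subvec v 0 i) (subvec v (inc i))))))

(drop-one-sketch [1 2 3]) ; => ((2 3) (1 3) (1 2))
```

An empty input gives an empty result, which is what lets the recursion in make-expressions bottom out.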

So how many valid Countdown expressions are there?


  countdown> (count (make-expressions '(1 2 3 4 5 6)))
  300290

  countdown> (time (count (make-expressions '(1 7 8 25 50 75))))
  "Elapsed time: 2653.618442 msecs"
  268175

Note that the number is different because we rule out cases which result in floating point or negative numbers. The elapsed time is just under three seconds which is pretty fast! Remember that this time includes all the calculation of the results too, not just generating the expressions. So finally, all we need is a solver function.


(defn solve 
  "Solve the countdown problem"
  [numbers target]
  (filter (fn [x] (= (x :value) target)) (make-expressions numbers)))

This will return all the combinations that lead to the right result. Let's try it out with a toy example:


  countdown> (time (solve '(4 5 6) 15))
  "Elapsed time: 0.281907 msecs"
  ({:expression (+ (+ 4 5) 6), :value 15} {:expression (+ (+ 4 6) 5), :value 15}
   {:expression (+ (+ 5 4) 6), :value 15} {:expression (+ (+ 5 6) 4), :value 15}
   {:expression (+ (+ 6 4) 5), :value 15} {:expression (+ (+ 6 5) 4), :value 15})

Notice that we've returned all possible + expressions that make 15. We've not taken any notice of the commutative properties of addition. Taking advantage of these properties is explored in "The Countdown Problem" [PDF] by Graham Hutton.

How does it fare on bigger solutions?


  countdown> (time (solve '(7 5 9 25 40 10) 753))
  "Elapsed time: 222.632493 msecs"
  {:expression (- (* (- (- (* 5 25) 9) 40) 10) 7), :value 753}

With the code as it stands we could add additional operators (exponent for example) without any code changes, but more operators would probably require something more sophisticated than brute force.

As usual, any suggestions for making the code clearer (or finding any bugs!) are greatly appreciated. Full code is on my Git repository.

Saturday, 14 February 2009

Huffman Encoding

Huffman Encoding is a simple technique for lossless data compression. The idea is simple: replace frequently occurring symbols with short bit patterns and infrequently occurring symbols with longer ones.

Firstly, we must produce a frequency table that gives a weight for each symbol. ~~Here's a version~~ (update 14/2/9) It turns out there was a much better implementation of this (frequencies) already in Clojure Contrib, so I'll use that and save the world from seeing my bad version. Notes on why it was bad are at the end.

For example:


user> (frequencies "aaaabbbbcdddde")
{\e 1, \d 4, \c 1, \b 4, \a 4}

Once we've got the frequencies, we can construct a Huffman Coding Tree. The algorithm description (from Wikipedia) is:

Create a leaf node for each symbol and add it to the priority queue.
While there is more than one node in the queue:
1. Remove the node of highest priority (lowest probability) twice to get two nodes.
2. Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
3. Add the new node to the queue.
The remaining node is the root node and the tree is complete.

This tree has the property that the path to each leaf is unique and is never a prefix of the path to another leaf. We can translate the algorithm directly into Clojure as:


(defn coding-tree 
  "Given an ordered frequency list, create an encoding tree"
  [open]
  (prn open)
  (if (> (count open) 1)
    (let [new-node (apply tree-node (take 2 open))]
      (recur (add-to-queue new-node (drop 2 open))))
    (first open)))

Where add-to-queue simply inserts a node in the right place. See huffman.clj for full code.
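
For readers without huffman.clj to hand, here's a self-contained sketch of the same loop. The node representation (plain maps with :sym, :weight, :left and :right) and all the -sketch names are my assumptions for illustration; the real file represents nodes differently.

```clojure
(defn leaf
  "Turns a [symbol weight] pair into a leaf node."
  [[sym weight]]
  {:sym sym :weight weight})

(defn tree-node-sketch
  "Internal node whose weight is the sum of its two children."
  [a b]
  {:weight (+ (:weight a) (:weight b)) :left a :right b})

(defn add-to-queue-sketch
  "Inserts node into queue, keeping the queue ordered by ascending weight."
  [node queue]
  (let [[lighter heavier] (split-with #(<= (:weight %) (:weight node)) queue)]
    (concat lighter [node] heavier)))

(defn coding-tree-sketch
  "Builds a Huffman tree from a map of symbol -> frequency."
  [freqs]
  (loop [open (sort-by :weight (map leaf freqs))]
    (if (> (count open) 1)
      (recur (add-to-queue-sketch (apply tree-node-sketch (take 2 open))
                                  (drop 2 open)))
      (first open))))

;; The root's weight is the total number of symbols seen.
(:weight (coding-tree-sketch {\a 4 \b 4 \c 1 \d 4 \e 1})) ; => 14
```

A sorted list is the simplest possible priority queue. It's fine for a few hundred symbols, though a real heap would scale better.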

The coding tree isn't enough on its own; we have to change it into a map from symbol to bit pattern. To get the bit pattern for any symbol we start from the root and follow a route to the symbol in question. Taking a left branch gives a "0" and taking a right branch gives a "1". The lookup function takes an encoding tree and flattens it into a map.


(defn- lookup-helper
  [tree path]
  (if (nil? tree)
    nil
    (let [v (first (first tree))]
      (lazy-cat (if (= v \*) nil (list [v path] ))
                (lookup-helper (left-node tree)  (cons 0 path)) 
                (lookup-helper (right-node tree) (cons 1 path))))))

(defn lookup
  [tree]
  (into {} (lookup-helper tree nil)))

Lazy functions ensure that we don't get a stack overflow. The defn- indicates that lookup-helper is a private function.
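
Here's the same flattening idea on a tiny hand-built tree. Again the map-based node shape and the names code-table and tiny-tree are assumptions of mine, chosen so the example stands alone. This sketch appends to a vector with conj, so each code reads in root-to-leaf order.

```clojure
(defn code-table
  "Flattens a tree of {:sym ..} leaves and {:left .. :right ..} internal
  nodes into a map of symbol -> vector of bits (0 = left, 1 = right)."
  [tree]
  (letfn [(walk [node path]
            (if (:sym node)
              [[(:sym node) path]]
              (concat (walk (:left node) (conj path 0))
                      (walk (:right node) (conj path 1)))))]
    (into {} (walk tree []))))

(def tiny-tree
  {:left {:sym \a}
   :right {:left {:sym \b} :right {:sym \c}}})

(code-table tiny-tree) ; => {\a [0], \b [1 0], \c [1 1]}
```

The most frequent symbol would sit nearest the root and so get the shortest code, which is where the compression comes from.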

Finally we need a function that given a sequence and an encoding table gives us the encoded series of bits.


(defn compress
  [s table]
  (mapcat (partial get table) s))

Note that the sequence and the encoding table don't have to be the same. If, for example, the data to compress was in the English language, then you could use a known Huffman table based on Frequency Analysis of a typical corpus.

So how much compression can we get? Let's look at an example:



  user> (let [x "busy busy bee"]
       (compress x (huffman-coding-table x)))
  (1 0 0 0 1 1 1 0 1 1 1 0 1 1 0 0 0 1 1 1 0 1 1 1 0 1 1 0 0 0 1 0 0 1)

  user> (count *1)
  34

So "busy busy bee" encoded to 34 bits (*1 is used to refer to the last evaluated expression at the REPL). Compared to the 13*8 = 104 bits this would take with ASCII, this is a good saving. How do we fare with bigger texts? Let's try Hamlet.


  user> (time (let [x (slurp "/home/jfoster/Desktop/2ws2610.txt")]
   (count (compress x (huffman-coding-table x)))))
  "Elapsed time: 592.317906 msecs"
  921595
  
  user> (* 8 (count (slurp "2ws2610.txt")))
  1544656

A pretty big saving again (down from ~1.5 million bits to roughly 920,000 bits). Note that in all these savings I'm not including the size of the tree!

In this use a symbol is a character, but we could use words instead to get bigger savings (we wouldn't have to change the code at all). PKZIP uses Huffman in its arsenal of compression techniques (see LZ77 and LZ78 for other examples).

As a side note, why was my version of frequencies less than good? (bad version preserved for posterity here).

I didn't know get took an optional argument representing the default; this means I could have avoided special casing the first occurrence of a symbol.
I didn't use a standard idiom (reduce) when I could have done. Counting frequencies is simply taking a big sequence and boiling it down to a single map.
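
Putting those two points together, a version along these lines is what I should have written. The name frequencies-sketch is mine, to avoid clashing with the real function; treat it as a sketch of the idiom rather than the library's exact source.

```clojure
(defn frequencies-sketch
  "Map of each distinct item in coll to the number of times it appears."
  [coll]
  (reduce (fn [counts x]
            ;; (get counts x 0) supplies 0 the first time x is seen
            (assoc counts x (inc (get counts x 0))))
          {}
          coll))

(frequencies-sketch "aaaabbbbcdddde")
;; => a map with \a 4, \b 4, \c 1, \d 4, \e 1
```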

I should spend some more time reading source code - any other improvements that you can see are gratefully accepted!

Wednesday, 11 February 2009

Base 64 Decoding

For completeness.


(defn decode-num
  [num]
  (let [a (bit-and num 255)
        b (bit-shift-right (bit-and num 65280) 8)
        c (bit-shift-right (bit-and num 16711680) 16)]
    (list (char c) (char b) (char a))))
    
(defn decode
  "Lazily decode a sequence from base64"
  [s]
  (when-not (nil? s)
    (let [x (map (fn [x] (.indexOf *encode-table* (int x))) (take 4 s))
          num (+ (nth x 3) (bit-shift-left (nth x 1) 6) (bit-shift-left (nth x 2) 12) (bit-shift-left (nth x 0) 18))]
      (lazy-cat (decode-num num) (decode (drop 4 s))))))

Obviously base 64 decoding is just what we did previously, only backwards!


  user> (apply str (decode (encode (decode (encode "The quick brown fox jumped over the lazy dog.")))))
 "The quick brown fox jumped over the lazy dog."

Tuesday, 10 February 2009

Emacs Fonts

Came across a Coding Horror article about fonts and programming and that encouraged me to find a sexier font. I settled on Bitstream Vera Sans Mono font.

I found numerous sites giving instructions about installing fonts for Emacs, but none of those seemed to work straight away for me, so I thought I'd document the steps I went through to get things going:

sudo apt-get install ttf-bitstream-vera
M-x customize-face (return)
default (return)
Enter bitstream-bitstream vera sans mono as the font family
Save all

Then use the excellent color-theme to select some nice colours (Arjen in my case) and Emacs starts to look nice!

Sunday, 8 February 2009

Bit Shifting in Clojure

Base64 encoding is a way of converting a stream of binary data into a printable form. The name comes from the 64 allowable characters ([a-z][A-Z][0-9]+/) that are used, with = reserved for padding.

The algorithm is very simple. Get 3 bytes at a time (if you can't, just pad with a character, typically =) and munge them together (making 24 bits). We then split these 24 bits into 4 lots of 6 bits, each of which allows us to pick one of the 64 allowable characters. This involves dealing with a few of the bit operators in Clojure, as shown below:


(def *encode-table*
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=")

; Daft and way too slow
(defn encode-num 
  [num]
  (let [a (bit-and num 63)
        b (bit-shift-right (bit-and num 4032) 6)
        c (bit-shift-right (bit-and num 258048) 12)
        d (bit-shift-right (bit-and num 16515072) 18)]
    (map (fn [x] (nth *encode-table* x )) (list d b c a))))

(defn str-pad [x size padchar]
  (let [d (rem (count x) size)]
    (if (zero? d)
      x
      (concat x (take (- size d) (repeat padchar))))))

                   
(defn encode 
  "Lazily encode a sequence as base64"
  [s]
  (if (nil? s)
    nil
    (let [x (map int (str-pad (take 3 s) 3 \=))
          num (+ (nth x 2) (* 256 (nth x 1)) (* 256 256 (first x)))]
      (lazy-cat (encode-num num) (encode (drop 3 s))))))

The magic numbers in the bit-and allow us to select the right parts of the integer (63 is 111111, 4032 is 111111000000 and so on). This could be improved substantially with more bit-twiddling (see for example String Encoders).
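
To make the masks less magic, here's a small worked sketch (the function name sextets is mine). Shifting first and then masking with 63 picks out the same four 6-bit groups, and each of the original masks is just 63 shifted left by 6, 12 or 18. The groups come out most significant first, which is the order standard Base64 emits them.

```clojure
(defn sextets
  "Splits a 24-bit number into four 6-bit values, most significant first."
  [num]
  (mapv #(bit-and (bit-shift-right num %) 63) [18 12 6 0]))

;; The masks used above are 63 moved into position:
(bit-shift-left 63 6)  ; => 4032
(bit-shift-left 63 12) ; => 258048
(bit-shift-left 63 18) ; => 16515072

;; "Man" is the bytes 77, 97, 110, i.e. 77*65536 + 97*256 + 110.
(sextets 5071214) ; => [19 22 5 46], which index to T, W, F, u
```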

One nice property of this is that by using lazy-cat we can deal with infinite sequences (just as long as you don't try and print the result!)


user> (apply str (take 10 (encode (range 0 1000000000000000000))))
"AEACAQwFBc"

In case you're wondering, I am really scraping the bottom of the barrel for little programming tasks to learn something new at the moment! I've ordered myself a copy of The Princeton Companion to Mathematics which'll hopefully provide more inspiration when it arrives.

(Update 24/2/2009) As Cubix pointed out, there's a bug in the code above as padding should be applied after encoding, not before. The full code is in the comments, but the main point is to change the code such that if we don't get three elements we don't do any encoding and instead use a helper function to do the last bytes. I'm sure there must be a further improvement as padding duplicates functionality in encode.


(defn encode
  "Lazily encode a sequence as base64"
  [s]
  (if s
    (let [x (map int (take 3 s))]
      (if (= 3 (count x))
        (let [num (+ (nth x 2) (* 256 (nth x 1)) (* 256 256 (first x)))]
          (lazy-cat (encode-num num) (encode (drop 3 s))))
        (padding x))))) ;;; helper function, see comments
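
The padding helper lives in the comments, so what follows is only my guess at what such a helper could look like, written to stand alone. The names padding-sketch and base64-alphabet are hypothetical. It assumes it is given the last 0, 1 or 2 byte values, and it follows the standard rule: one leftover byte becomes two characters plus ==, two leftover bytes become three characters plus =.

```clojure
(def base64-alphabet
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/")

(defn padding-sketch
  "Encodes the final 0, 1 or 2 bytes (ints 0-255) of a sequence,
  padding the output with = up to 4 characters."
  [tail]
  (let [[b0 b1] (concat tail [0 0])        ; missing bytes count as 0
        num     (+ (* 65536 b0) (* 256 b1))
        chars   (map #(nth base64-alphabet
                           (bit-and (bit-shift-right num %) 63))
                     [18 12 6])]
    (case (count tail)
      0 nil
      1 (concat (take 2 chars) [\= \=])
      2 (concat (take 3 chars) [\=]))))

(apply str (padding-sketch [77]))    ; => "TQ==" ("M")
(apply str (padding-sketch [77 97])) ; => "TWE=" ("Ma")
```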

Tuesday, 3 February 2009

Speeding up the Mandelbrot app

Previously I had some simple code for calculating the Mandelbrot fractal. Unfortunately, it was dog slow.

The primary reason is the use of BufferedImage. This is fine when you're loading an image from a file, but it's a bit useless to go through a sequence setting each and every pixel with setRGB.

When you've already got all the data in memory, MemoryImageSource is a much better bet. You can create one with a 1D array of pixel data (with a specified width/height). MemoryImageSource implements the ImageProducer interface which means that any AWT derived component can use createImage to make an Image.

Firstly, I converted the (x,y) set to a plain list of numbers, each one a pixel index from which (given the height and width) you can work out the corresponding coordinates. Then I can just apply pmap (see also Ray Tracing) to calculate all the pixels in parallel.


(defn calculate-pixels []
  (let [pixels (range 0 (* *width* *height*))]
    (pmap (fn [p] 
            (let [row (rem p *width*) col (int (/ p *width*))]
              (get-color (process-pixel (/ row (double *width*)) (/ col (double *height*))))))
          pixels)))
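
The index arithmetic on its own, as a sketch (pixel-coords is a helper name of mine): with pixels stored row by row, the remainder by the width gives the position along the row and the quotient by the width gives the row number. For a square image, dividing by the height happens to give the same answer.

```clojure
(defn pixel-coords
  "Returns [x y] for pixel index p in a row-major image of the given width."
  [p width]
  [(rem p width) (quot p width)])

(pixel-coords 5 4) ; => [1 1], second pixel of the second row
(pixel-coords 7 4) ; => [3 1], last pixel of the second row
```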

Next we need the interop to go from my sequence of pixel values to an integer array.


(defn simple-mandlebrot [w h]
  (let [x (int-array (calculate-pixels))]

int-array converts a sequence of integers into a plain old Java array. I could probably use a Java array for the whole thing, and not use seq, but I doubt this would make an appreciable performance difference.
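
For what it's worth, the MemoryImageSource step can be sketched like this. The function name pixels->image-source is mine and this isn't the code from the app; it just uses the five-argument constructor (width, height, pixel array, offset, scan line length) from java.awt.image.

```clojure
(import '(java.awt.image MemoryImageSource))

(defn pixels->image-source
  "Wraps a sequence of ARGB pixel ints in a MemoryImageSource.
  Pixels are expected row by row, width * height of them."
  [width height pixels]
  (MemoryImageSource. (int width) (int height)
                      (int-array pixels)
                      (int 0)        ; offset of the first pixel
                      (int width)))  ; ints per scan line

;; An AWT component (or the default Toolkit) can then turn it into an Image:
;; (.createImage (java.awt.Toolkit/getDefaultToolkit) source)
```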

These changes have made a huge difference: instead of taking minutes to render a 512x512 image it takes a few seconds, and all my CPU cores are occupied.

Monday, 2 February 2009

Clojure Macros

Macros are one of the defining features of Lisp languages. A macro operates prior to compilation, allowing you to shape the code as you wish. Since Lisp code is homoiconic the macro language is the language.

Languages like C/C++ have macros, but they are in no way the same. You have a very limited language and you don't get the fine grained access to the code that you do with Lisp (since the code is just itself a Lisp data structure).

Pl Patterns tries to describe a taxonomy of macro use. One of the examples is debug printing with an example from the Arc language. Converted to Clojure this looks like this:


(defmacro dbg-prn
  "Debugging form that prints out results"
  [& more]
  `(let [start# ~more]
     (print '~more "==>" start# "\n")
     start#))

defmacro defines a new macro with a similar structure to defn. The ` is used to create a template expression, where we can evaluate certain items within the expression by using macro characters (such as # and ~). In this example we used start# to create a uniquely named value for the let expression and ~more to insert the forms passed to the macro.
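
Another small sketch of the template characters, this time including ~@ which splices a whole list of forms into place. The unless macro is my own toy example and has nothing to do with dbg-prn.

```clojure
(defmacro unless
  "Evaluates body only when test is false."
  [test & body]
  `(if ~test nil (do ~@body)))   ; ~ inserts one form, ~@ splices many

(unless false 1 2)                  ; => 2
(unless true (println "not run"))   ; => nil, nothing printed
(macroexpand-1 '(unless false 1 2)) ; => (if false nil (do 1 2))
```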

So now when I'm trying to debug code there's no more repetition of print x, return x I can just edit my function definition by adding dbg-prn (without the normal hardship of wrapping extra brackets around).


  user> (dbg-prn + 1 2 3 4 5)
  (+ 1 2 3 4 5) ==> 15  ;; printed to std-out
  15

  user> (dbg-prn + (* 2 3) (* 4 5))
  (+ (* 2 3) (* 4 5)) ==> 26 
  26

You can use macroexpand-1 to expand out macros to see what they actually do:


  user> (macroexpand-1 '(dbg-prn + 1 1))
    (clojure.core/let [start__2150__auto__ (+ 1 1)] 
      (clojure.core/print (quote (+ 1 1)) "==>" start__2150__auto__ "\n") 
      start__2150__auto__)

Sunday, 1 February 2009

Highlighting Code for the Web using Emacs


(defn process-pixel [x y]
  ((fn [x y xc yc accum]
     (let [x1 (+ (- (* x x) (* y y)) xc)
           y1 (+ (* 2 x y) yc)
           sq (+ (* x1 x1) (* y1 y1))]
       (cond

        (> accum *max-iteration*) *max-iteration*
        (> sq 2.0) accum
        :else (recur x1 y1 xc yc (inc accum))))) x y x y 0))

On Ubuntu you can install a bunch of Emacs goodies by installing the emacs-goodies-el package.

Once you've done this, you'll have Htmlize and can just do M-x htmlize-buffer to get an HTML rendition of the current buffer. This uses CSS, so all you really need to do is whack the definitions in your Blogger template and then paste the body of the code in. Obviously you'll need something like Slime or some other Emacs package that does the highlighting.

Mandelbrot Fractals

The Mandelbrot Set is probably the most famous set of fractals. The maths behind it is dead simple and with just a few lines of code you can get some impressive results.

For any given pixel, you can work out a colour value thus:

(def *max-iteration* 512)

(defn process-pixel [x y]
  ((fn [x y xc yc accum]
     (let [x1 (+ (- (* x x) (* y y)) xc)
           y1 (+ (* 2 x y) yc)
           sq (+ (* x1 x1) (* y1 y1))]
       (cond
        (> accum *max-iteration*) *max-iteration*
        (> sq 2.0) accum
        :else (recur x1 y1 xc yc (inc accum))))) x y x y 0))
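
The same iteration written with loop, as a standalone sketch you can poke at the REPL. The name escape-count and the explicit max-iter argument are mine. Note one deliberate difference: this sketch uses the conventional escape radius of 2, so it compares the squared magnitude against 4.0, whereas the code above compares against 2.0.

```clojure
(defn escape-count
  "Iterates z -> z^2 + c starting from z = c = (xc, yc) and returns how
  many steps it took for |z| to exceed 2, capped at max-iter."
  [xc yc max-iter]
  (loop [x xc y yc n 0]
    (let [x1 (+ (- (* x x) (* y y)) xc)   ; real part of z^2 + c
          y1 (+ (* 2 x y) yc)]            ; imaginary part of z^2 + c
      (cond
        (> n max-iter) max-iter
        (> (+ (* x1 x1) (* y1 y1)) 4.0) n
        :else (recur x1 y1 (inc n))))))

(escape-count 0.0 0.0 50) ; => 50, the origin never escapes
(escape-count 2.0 2.0 50) ; => 0, escapes on the first step
```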

The harder part is translating a number between 0 and *max-iteration* into a decent range of colours. I'll ignore this for now and stick with green!

Source code for version 0.1 is here. Next on the list:

Make it look like the Math - write complex number library
Make it run fast - optimize (use more cores, minimize type coercions)
Make it look nice - find a better colour mapping function

Refactoring is the Java term for doing work.

I think this is a great quote (and has a touch of truth about it).