Monday 29 June 2009

ICFP Contest - This time it's functional...

Last time, I showed the VM implementation for the ICFP 2009 problem [PDF]. I'd used mutable state and hence it was a bugger to try and understand it. For example, looking back, what's this code do?

After I realized that I was not going to go anywhere with actually solving the problem, I thought it'd be more interesting to make a purely functional implementation of the VM and compare the performance with the mutable solution.

In order to refactor to a functional solution, all mutable state needs to be replaced with items that return a new instance of the code. This means that now we can write the main "loop" as a standard functional construct, reduce, which applies the list of operations to the VM and creates a new instance each time.

Adjusting the code to a functional style was a small transformation. Each instance of swap! was replaced with a creation operation instead. No more dereferences (@) signs cluttering up the code, and something that is much clearer to understand.

So how does this affect performance? Previously, the code did 1000 iterations in 3.85 seconds. Running the functional code gives almost exactly the same results (1000 iterations in about 3.75 seconds). I'm not sure how this compares to other implementations of the same code (I suspect poorly!), so I'll hunt around for some comparisons. Either way, I will definitely prefer no mutation wherever possible...

I thoroughly enjoyed the ICFP contest this year. Learnt some bit-fu, a small amount of orbital mechanics and actually managed to make some progress even though ultimately I didn't even submit a single solution :)

ICFP Contest 2009 (2)

In my last post about the ICFP 2009 Programming Contest I'd wrestled with some bits and finally correctly (well, it matched up with this at least!) interpreted the instructions for the virtual machine that was provided.

The next challenge is to actually run the virtual machine. I went through lots of bad designs, but in this post and the next I thought I'd highlight two of these.

For reasons known only to myself, I decided the first approach would use mutable state. This seemed like a good model of a virtual machine, the memory would be represented by a vector of atoms; each atom would be the value of the memory at that address and I could use swap! to update the value of the atom.

Previously, I'd use simple symbols to represent the machine instructions. Now that we've got to process instructions, it makes more sense to represent each instruction as a function of the virtual machine.

;;; Virtual machine specification
(defstruct virtualmachine :mem :counter :inport :outport :status :firstrun)

(defn get-val
[vm x]
@((:mem vm) x))

(defn numeric-op
"D-type General numeric op"
[vm [x y] f]
(let [m (:mem vm)]
(swap! (m @(:counter vm)) (constantly (f @(m x) @(m y))))))

(defn phi
[vm [x y]]
(let [m (:mem vm)]
(trace vm 'Phi (format "%s ? %s : %s --> %s" @(:status vm) @(m x) @(m y) (if @(:status vm) @(m x) @(m y))))
(swap! (m @(:counter vm))
(if @(:status vm)
@(m x)
@(m y))))))

(defn print-args
[vm op x y]
(format "%s %s // %s %s %s" x y (get-val vm x) op (get-val vm y)))

(defn add
"D-type Add instruction"
[vm [x y]]
(trace vm 'Add (print-args vm '+ x y))
(numeric-op vm [x y] +))

(defn sub
"D-type Sub instruction"
[vm [x y]]
(trace vm 'Sub (print-args vm '- x y))
(numeric-op vm [x y] -))

(defn mult
"D-type Multiply instruction"
[vm [x y]]
(trace vm 'Mult (print-args vm '* x y))
(numeric-op vm [x y] *))

(defn div
"D-type Divide"
[vm args]
(trace vm 'Div)
(numeric-op vm args (fn [x y] (if (zero? y) 0 (/ x y)))))

(defn noop
"S-type Noop instruction"
[vm args]
(trace vm 'Noop)

(defn copy
"S-Type: Copy instruction"
[vm [x]]
(trace vm 'Copy (format "%s // %s" x (get-val vm x)))
(swap! ((:mem vm) @(:counter vm)) (constantly (get-val vm x))))

(defn sqrt
"S-Type: Square root instruction: undefined for negative values"
[vm [x]]
(trace vm 'Sqrt)
(assert (not (neg? (get-val vm x))))
(swap! ((:mem vm) @(:counter vm)) (constantly (Math/sqrt (get-val vm x)))))

(defn input
"S-Type: Set the memory from the inport"
[vm [x]]
(trace vm 'Input)
(swap! ((:mem vm) @(:counter vm)) (constantly @((:inport vm) x))))

(defn output
"Output instruction: Set the memory on the outport"
[vm [x y]]
(trace vm 'Output (format "%s %s // %s" x y (get-val vm y)))
(swap! ((:outport vm) x) (constantly (get-val vm y))))

(defn cmpz
"Comparison function"
[vm [cmp y]]
(let [val @((:mem vm) y)
status (cond ;; TODO replace this with functions so it becomes (apply cmp val)
(= cmp 'LTZ) (< val 0)
(= cmp 'LEZ) (<= val 0)
(= cmp 'EQZ) (zero? val)
(= cmp 'GEZ) (> val 0)
(= cmp 'GTZ) (>= val 0)
:else (assert false))]
(trace vm 'Cmpz (format "%s %s --> %s" cmp y status))
(swap! (:status vm) (constantly status))))

(def d-type-instructions {1 add, 2 sub, 3 mult, 4 div, 5 output, 6 phi})
(def s-type-instructions {0 noop, 1 cmpz, 2 sqrt, 3 copy, 4 input})
(def comparison {0 'LTZ, 1 'LEZ, 2 'EQZ, 3 'GEZ, 4 'GTZ})

You'll notice an abundance of trace output! This got added as a found repeated bugs in my code. One thing I learnt this time round at ICFP is that the tracing should be built in from the start (not retrofitted as I did here!).

Originally, I wrote my own trace functions (based on ones mentioned in the GPS Problem Sovler post), but I found that Clojure Contrib has a trace package too.

#clojure helped out once again with how to turn tracing off. (alter-var-root (var clojure.contrib.trace/tracer) (constantly (constantly nil))) alters the bindings of trace/tracer to be a function returning nil. alter-var-root's second argument is a function which takes the previous value of the var and returns the new binding.

The code below shows how the VM was implemented. The virtual machine struct is initialized with 16384 (14 bits of address space) atoms, each representing one memory unit. As we run through the list of operations, we simply apply the operations and increment the program counter.

The virtual machine takes an updater function, the goal of the updater function was to read the outputs and adjust the input functions accordingly. Unfortunately, I realized pretty soon into this, that a team of one is not the way to win this contest so I never got far with implementing the Hohmann transfer implementation.

;;; Virtual machine executing instructions
(defn vector-atoms
"Create a vector of atoms, initialized to zero"
(into [] (take n (repeatedly #(atom 0)))))

(defn init-vm
(let [memory (vector-atoms (count data))]
(doall (map (fn [a d] (swap! a (constantly d))) memory data)))
(struct virtualmachine memory (atom 0) (vector-atoms 16384) (vector-atoms 16384) (atom false) (atom true))))

(defn hohmann-updater
(when @(:firstrun vm)
(swap! ((:inport vm) 0x3E80) (constantly 1001))))

(defn create-vm
(init-vm (map second instructions)))

(def bin1 (read-data (get-bytes bin1)))

;; TODO This could be purely functional, if I just returned
;; a copy of the entire VM after each operation was applied.
(defn run-machine
"Run the virtual machine with the decoded instructions.
Reset the program counter when complete"

[vm ops update-input]
(update-input vm)
(doseq [[op args] ops]
(apply op (list vm args)) ;; dodgy side effect
(swap! (:counter vm) inc))
(swap! (:counter vm) (constantly 0))
(swap! (:firstrun vm) (constantly false))

(defn run []
(let [x (create-vm bin1)
ops (map first bin1)]
(count (take 1 (repeatedly #(run-machine x ops hohmann-updater))))))

The speed of the VM is quite important. If it's too slow, you'll spend far too long waiting for satellite transfers, and not enough time working out the right answers. This first version is pretty fast 1000 iterations on my machine takes about 3.5 seconds (about 285 iterations per second). The very first implementation (the very bad one) used refs rather than atoms, and this was about twice as slow. I guess the extra time comes from the STM implementation.

This was the first time I'd made a conscious effort to program with mutable state in Clojure and I really wish I hadn't! The end result is really ugly code that confuses me each time I look at it. In the next post, I'll compare this against the purely functional version.

Friday 26 June 2009

ICFP Contest 2009

The ICFP Contest 2009 started today, so I thought I'd have an attempt at solving the problem in Clojure.

The contest involves implementing a virtual machine which acts as a control module for a satellite. The fun bit is controlling the satellite (which I haven't got to yet), the really *annoying* bit is decoding the virtual machine instructions.

The code below decodes the first dump and prints out the list of op codes. I've no idea if it prints out the right code though...

(:import [java.lang.reflect Array])
(:import [ FileInputStream File]))

;;; Location of relevant files
(def bin1 "/home/jfoster/clojureprojects/icfp/uk/co/fatvat/bin1.obf")

;;; Boring static definitions
(def d-type-instructions {1 'Add, 2 'Sub, 3 'Mult, 4 'Div, 5 'Output, 6 'Phi})
(def s-type-instructions {0 'Noop, 1 'Cmpz, 2 'Sqrt, 3 'Copy, 4 'Input})
(def comparison {0 'LTZ, 1 'LEZ, 2 'EQZ, 3 'GEZ, 4 'GTZ})

;;; Define what the virtual memory looks like
(defstruct vm :data-memory :instruction-memory :programcounter :status)

(defn get-bytes
"Read the bytes for the given file, stored as a sequence of bytes"
(let [file (File. filename)]
(assert (.exists file))
(with-open [x (FileInputStream. file)]
(into [] (take-while (partial not= -1) (repeatedly #(.read x))))))))

(defn get-op-code
(println "bytes" bytes)
(bit-shift-right (bit-and (byte (first bytes)) 0xF0) 4))

(defn to-double
(reduce bit-or
(map (fn [x shift] (bit-shift-left (bit-and (int x) 0xFF) shift))
(range 0 64 8))))))

(defn to-int
(int (reduce bit-or
(map (fn [x shift] (bit-shift-left (bit-and (int x) 0xFF) shift))
(range 0 32 8)))))

(defn get-op
(let [d-opcode (bit-shift-right (bit-and 0xF0 (last ins)) 4)
s-opcode (bit-and 0x0F (last ins))]
(if (zero? d-opcode)
(let [sins (s-type-instructions s-opcode)]
[sins (s-args sins ins)])
[(d-type-instructions d-opcode) (d-args ins)])))

(defn d-args
(let [x (to-int ins)]
[(bit-shift-right (bit-and x 0xFFFC000) 14) (bit-and x 0x00003FFF)]))

(defn s-args
[op ins]
(if (= 'Cmpz op)
[(comparison (bit-shift-right (bit-and (to-int ins) 0x700000) 21)) (bit-and (to-int ins) 0x00003FFF)]
[(bit-and (to-int ins) 0x00003FFF)
(bit-shift-right (bit-and (last (butlast ins)) 0xF0) 4)]))

(defn get-instruction-data
[image address]
(if (even? (/ address 12))
[(get-op (subvec image (+ address 8) (+ address 8 4)))
(to-double (subvec image address (+ address 8)))]

[(get-op (subvec image address (+ address 4)))
(to-double (subvec image (+ address 4) (+ address 12)))]))

(defn read-data
[image pc]
(map (fn [x] (get-instruction-data image x)) (range 0 (count image) 12)))

Currently you can just see the op codes, but not execute them. For example:> (take 10 (read-data (get-bytes bin1) 0))
([[Noop [0 0]] 1.0]
[[Copy [265 0]] 0.0]
[[Noop [0 0]] 30.0]
[[Noop [0 0]] 0.0]
[[Copy [248 0]] 0.0]
[[Sub [4 3]] 0.0]
[[Cmpz [EQZ 5]] 0.0]
[[Phi [2 1]] 0.0]
[[Sub [7 0]] 0.0]
[[Copy [263 0]] 0.0])

Initially the spec was slightly wrong which meant a little banging of my head against a brick rule. Next plan is to get the VM to execute the instructions. I *think* I can do this by replacing my symbols with the real functions, and then just applying them to the arguments. The tricky bit will be modelling the memory model of the VM in Clojure (e.g. not a functional style but with references and so on).

Tuesday 23 June 2009

Revisiting Eliza

I previously blogged about Eliza here but decided it was worth revisiting. Looking back I was being daft and trying to use strings to represent sentences the user enters. Lisp's symbol handling is much simpler to deal with.

Eliza is one of the most famous "AI" programs. It was written to give the impression of a Rogerian pyschoanalyst by answering questions with further questions in an effort to make the patient understand the problem.

The original Eliza paper is a good read. The introduction to the paper is awesome and explains the human learning process really well.

It is said that to explain is to explain away. This maxim is nowhere so well fulfilled as in the area of computer programming, especially in what is called heuristic programming and artificial intelligence. For in those realms machines are made to behave in wondrous ways, often sufficient to dazzle even the most experienced observer. But once a particular program is unmasked, once its inner workings are explained in language sufficiently plain to induce understanding, its magic crumbles away; it stands revealed as a mere collection of procedures, each quite comprehensible. The observer says to himself "I could have written that". With that thought he moves the program in question from the shelf marked "intelligent" to that reserved for curios, fit to be discussed only with people less enlightened that he.

This explains my learning process for functional programming in general. First everything seems weird and wonderful, then understandable and then I might even have a chance of solving it myself!

Once the curtain is lifted we see that Eliza is nothing more than a pattern matching system. Lists of symbols are parsed and matched against a series of rules. The pattern matching functions associate variables with parts of the match and then substitutions can be made.

The tests below demonstrate the use of the pattern matching functions based on the examples in PAIP. fail and no-bindings are sentinel values that can be used to distinguish between no match and failure. An alternative would have been to use exceptions to represent failure, but exceptions always feel yucky to me in FP.

In Norvig's example, he uses the Lisp reader to provide a REPL for Eliza. Since Clojure has access to the GUI libraries of Java then it makes sense to use this and get a basic UI together:

Basic UI of Eliza

The code to create the UI is shown below. The interesting bit is reading the users input as a series of symbols (and the avoidance of laziness!) and the use of interleave to space things out.

Translating this from PAIP was mostly straight-forward, but there were several gotcha's. Nil-punning caught me out a few times, and some Lisp functions didn't have Clojure equivalents with the same name (so digging through the API taught me a few things e.g. sublis and replace). I'm sure there's still much I could do to improve this code. I'm particularly not impressed with position, I must be missing a trick there? Perhaps I should be using vectors more?

The full code can be found on my Clojure Projects Git repository.

Thursday 18 June 2009

Generalizing the General Problem Solver

The last post has a basic implementation of the General Problem Solver from PAIP, but it features a few big problems.

  1. "Running around the Block Problem" - In order for an action to occur it must have a consequence. How would you represent running around in circles?
  2. "Clobbered Sibling Problem" - The function to achieve the goal in the previous version just focuses on whether the each item in the list of goal states is achieved at least once by the end of the solution. By the time the final solution is reached sibling goals may have been undone.
  3. "Leaping before you look" - If we fail to find a solution we execute all actions on the way (looking before you leak). For example, we might execute jump off cliff before executing wear parachute!
  4. "Recursive Sub Goal" - To achieve A you need B, which needs A thus resulting in an infinite loop (or recursion in FP terms!).
Before we see how to solve these problems, Norvig introduces some nice debug functions, shown below in Clojure form. The functions allow you to start listening to specific tokens. Lines will only be printing if you are listening to the specific keyword. This is an example of a function that should probably be rewritten as a macro so that you only pay the cost at compile time. Of course that assumes that you aren't changing the tokens you want to listen to at run time.
(def *dbg-ids* (ref #{:gps}))

(defn dbg 
[id format-string & args]
"Print debugging info if (DEBUG id) has been specified"
(when (contains? id @*dbg-ids*)
(println (format format-string args))))

(defn debug
[& ids]
"Start dbg output on the given ids"
(alter *dbg-ids* (fn [x] (set (union x ids))))))

(defn undebug
[& ids]
"Stop dbg output on the given ids"
(alter *dbg-ids* (fn [x] (difference x ids)))))

(defn dbg-indent
[id indent format-string & args]
"Print indented debugging info if (DEBUG ID) has been specified"
(when (@*dbg-ids* id)
(println (format (str (apply str (repeat indent \space)) format-string) args))))

The main difference from the previous code is achieve-all which ensures all goals are achieved throughout the process and the use of the executing convention to denote actions that are being executed. As you'd expect the primary difference with Common Lisp is that we've got to use transactions to perform mutability. In achieve-all, I've used an atom to represent the current state. Atoms are used when only one thread needs access to the code and all changes are synchronous. In this case, the atom exists only for one loop hence an atom is the right choice. Clojure's support of sets meant that representing the goal, add and delete lists as sets made sense. Representing the result as a set is wrong though, because I want to return the list of actions that must be performed in the correct order (see the version history on Git to see the catalogue of mistakes I made before realizing this!).
(defn contains-value?
[coll val]
(not (nil? (some (partial = val) coll))))

(defn executing?
"Is x of the form: (executing ...)?"
(and (seq? x) (= 'executing (first x))))

(defn convert-op
"Make op conform the the (EXECUTING op) convention"
(if-not (some executing? (:add-list op))
(struct operation 
(:action op) 
(:preconditions op) 
(set (conj (:add-list op) (list 'executing (:action op))))
(:del-list op))

(defn make-op
[action preconditions add-list del-list]
(convert-op (struct operation action preconditions add-list del-list)))

(defn appropriate?
[goal operation]
"An op is appropriate to a goal if it is in its add list"
(contains-value? (:add-list operation) goal))

(declare achieve-all)

(defn apply-op
[state goal op goal-stack]
"Return a new, transformed state if op is applicable."
(dbg-indent :gps (count goal-stack) "Consider: %s" (:action op))
(let [new-state (achieve-all state (:preconditions op) (cons goal goal-stack))]
(when-not (nil? state)
(dbg-indent :gps (count goal-stack) "Action: %s" (:action op))
(concat (remove (fn [x] (= x (:del-list op))) new-state)
(:add-list op)))))

(defn achieve
[state goal goal-stack]
"A goal is achieved if it already holds,
or if there is an appropriate op for it that is applicable"
(dbg-indent :gps (count goal-stack) "Goal: %s" goal)

(contains-value? state goal) state
(contains-value? goal-stack goal) nil
:else (some (fn [op] (apply-op state goal op goal-stack))
(filter (fn [x] (appropriate? goal x)) @*ops*))))

(defn sequential-subset?
[s1 s2]
(and (<= (count s1) (count s2))
(every? (fn [x] (contains-value? s2 x)) s1)))

(defn achieve-all 
[state goals goal-stack]
"Achieve each goal, and make sure they still hold at the end."
(let [current-state (atom state)]
(if (and (every? (fn [g] (swap! current-state 
(fn [s] (achieve s g goal-stack)))) goals)
(sequential-subset? goals @current-state))

(defn GPS
[state goals ops]
"General Problem Solver: from state, achieve using ops"
(ref-set *ops* ops))
(remove (comp not sequential?) (achieve-all (cons (list 'start) state) goals [])))

GPS is a semi-predicate (a function that returns nil on failure and some useful value otherwise). It returns the list of actions required to reach the goal, or nil. This GPS can solve a larger class of problems, and Norvig gives the example of the "Monkey and Bananas" problem attributed to Saul Amarel. The output from the REPL shows this implementation of the "General Problem Solver" working on this problem:> (monkey-and-bananas)
Goal: (not-hungry)
Consider: (eat-bananas)
Goal: (has-bananas)
Consider: (grasp-bananas)
Goal: (empty-handed)
Consider: (drop-ball)
Goal: (has-ball)
Action: (drop-ball)
Goal: (at-bananas)
Consider: (climb-on-chair)
Goal: (at-middle-room)
Consider: (push-chair-from-door-to-middle-room)
Goal: (at-door)
Goal: (chair-at-door)
Action: (push-chair-from-door-to-middle-room)
Goal: (chair-at-middle-room)
Goal: (on-floor)
Action: (climb-on-chair)
Action: (grasp-bananas)
Action: (eat-bananas)
(executing drop-ball) 
(executing push-chair-from-door-to-middle-room)
(executing climb-on-chair)
(executing grasp-bananas)
(executing eat-bananas))
So does the GPS live up to the promises? As you may have guessed, not quite! It requires a complete specification of the problem to be useful. Even if you do have a perfect description of the problem, it might take forever to find the answer (see NP-hard problems for many examples). Full code is available on my Clojure Project git page.

Monday 15 June 2009

The General Problem Solver

Paradigms of Artificial Intelligence Programming has been sitting on my desk for a few months since I worked through the Eliza chapter, so time for me to brush it off and (finally) work my way through it.

The General Problem Solver is early example of computer AI. Herbert Simon made a speech at the 12th National Meeting of the Operations Research Society where he stated:

It is not my aim to surprise or shock you- if indeed that were possible in an age of nuclear fission and prospective-interplanetary travel. But the simplest way I can summarize the situation is to say that there are now in the world machines that think, that learn, and that create. Moreover, their ability to do these things is going to increase rapidly until - in a visible future -the range of problems they can handle will be coextensive with the range to which the human mind has been applied. (Heuristic Problem Solving: The Next Advance in Operations Research [PDF])

That's a pretty big claim that proves (unsurprisingly) to be slightly wide of the mark!

The GPS solution solves problems by gathering the following information:

  • The current state of the world

  • The available operations that transfer from one-state to the next

  • A goal that we desire to reach

States and goals are represented using symbols. We represent the initial state of the world as a set of symbols, for example, #{'Bored 'At Home'}, indicates that the current state is bored and at home and the goal condition might be #{'Happy 'Inebriated}. Given a series of operators, GPS will attempt to find the way there.

An operator alters the current states. An operator consists of an action (a friendly name for what is taking place), a set of preconditions and an add/remove list which affect the current state of the world. For example, the operation drink might be defined as {:action drink :preconditions #{'have-money} :add-list 'drunk :remove-list 'money}.

The complete code is shown below. Note that it has mutable state and that references are used to hold this. This is enforced by Clojure and is one of the key differences between Lisp and Clojure.

One thing that caught me out was contains? which checks the presence of an INDEX not a value. Hence, the contains-value? definition.

;;; Implementation of a simplified General Problem Solver
;;; Based on code in "Paradigms of Artificial Intelligence Programming"
(:use [clojure.set]))

(defstruct operation :action :preconditions :add-list :delete-list)

(defn make-op
[action preconditions add-list delete-list]
(struct operation action preconditions add-list delete-list))

(def *state* (ref nil))

(def *ops* (ref nil))

(defn contains-value?
[coll val]
(not (nil? (some (partial = val) coll))))

(defn appropriate?
[goal operation]
"An op is appropriate to a goal if it is in its add list"
(contains-value? (:add-list operation) goal))

(defn apply-op
"Print a message and update state if op is applicable"
(when (every? achieve (:preconditions op))
(println "Executing: " (:action op))
(alter *state*
(fn [s]
(difference s (:delete-list op))
(:add-list op)))))))

(defn achieve
"A goal is achieved if it already holds. Or if there is an appropriate
operation for it that is applicable"

(or (contains-value? @*state* goal)
(some apply-op (filter (fn [x] (appropriate? goal x)) @*ops*))))

(defn gps
[state goals ops]
"General Problem Solver: Achieve all goals using the operations available."
(when (every? achieve goals) 'solved))

So having written all this code, how'd we use it? All we have to do is define the problem in terms that GPS can understand. The example below shows a common problem. I'm at home, with money and Facebook, however I'd much rather be at a bar and dancing the night away. However, I need to get to the bar and before I can dance I really need a drink.

(def example-operations
[(make-op 'message-friends

(make-op 'arrange-party

(make-op 'get-on-bus

(make-op 'drink

(make-op 'dance

(make-op 'give-bar-money
#{'have-money 'at-club}
#{'bar-has-money 'have-drink}

(defn example []
(ref-set *state* #{'at-home 'have-money 'have-facebook})
(ref-set *ops* example-operations))
(gps @*state*
#{'dancing 'at-club}

Running this at the REPL shows me the way to go:> (example)
Executing: message-friends
Executing: arrange-party
Executing: get-on-bus
Executing: give-bar-money
Executing: drink
Executing: dance

This isn't very general, but we'll save that till next time!

Sunday 14 June 2009

The Shunting Yard Algorithm (2)

In my previous post, I looked at the Shunting Yard algorithm and implemented a *really* bad version which required you to escape parenthesis. Not very Lispy!

The code below makes a very small change which converts nested expressions (via lists) into RPN.

(defn shunting-yard
(shunting-yard expr []))
([expr stack]
(if (empty? expr)
(let [token (first expr)
remainder (rest expr)]
(number? token) (lazy-seq
(cons token (shunting-yard remainder stack)))

(operator? token) (if (operator? (first stack))
(let [stackfn (partial op-compare token)]
(concat (take-while stackfn stack)
(shunting-yard remainder
(cons token
(drop-while stackfn stack))))))
(shunting-yard remainder (cons token stack)))

(sequential? token) (lazy-seq
(concat (shunting-yard token) (shunting-yard remainder stack)))

:else (assert false))))))

This is much nicer as now, as the tests below show, you can write expressions in a more natural way.

(deftest test-shuntingyard

;; Basic operators defined
(is (= (pretty-print (shunting-yard [100 + 200])) '(100 200 +)))
(is (= (pretty-print (shunting-yard [100 * 200])) '(100 200 *)))
(is (= (pretty-print (shunting-yard [100 / 200])) '(100 200 /)))
(is (= (pretty-print (shunting-yard [100 - 200])) '(100 200 -)))

;; Test some precedence rules
(is (= (pretty-print (shunting-yard [100 * 200 + 300])) '(100 200 * 300 +)))
(is (= (pretty-print (shunting-yard [100 + 200 * 300])) '(100 200 300 * +)))

;; Redundant Parenthesis
(is (= (pretty-print (shunting-yard [[[[1 + 1]]]] '(1 1 +)))))
(is (= (pretty-print (shunting-yard [1 + [1]] '(1 1 +)))))
(is (= (pretty-print (shunting-yard [[1] + [1]] '(1 1 +)))))

;; Bracketing of expressions
(is (= (pretty-print (shunting-yard [[1 + 2 + 3] + 4])) '(1 2 + 3 + 4 +)))
(is (= (pretty-print (shunting-yard [[1 + 2 * 3] + 4])) '(1 2 3 * + 4 +)))
(is (= (pretty-print (shunting-yard [[4 + 5] / [7 - 8]])) '(4 5 + 7 8 - /))))

I was thinking (as was Mark) that I could write a macro in order to simplify writing expressions. The idea would be to write something to convert infix to prefix (rather than postfix as here).

On a slight tangent, in other Lisps, you can write a reader macro which would allow you to annotate an expression with a symbol that your reader would use to convert into the appropriate form. Clojure doesn't allow reader macros (there's a discussion here [see around 19:00] as to the reasons why).

Before we write this as a macro, we'll first need an evaluator loop. This is dead simple and just involves (again) a stack. This is a very basic version that assumes all operators have arity 2.

(defn rpn-evaluator
(rpn-evaluator expr []))
([expr stack]
(if (empty? expr)
(if (= 1 (count stack))
(first stack)
(assert false)) ;; unbalanced
(let [f (first expr)]

(number? f) (recur (rest expr) (cons f stack))

;; Should look up arity of operator!
(operator? f) (recur (rest expr)
(f (second stack) (first stack))
(drop 2 stack))))))))

Having written this code, what does it buy me to write it as a macro? Not much! A macro is simply a translation of some source code to another lump of source code. In this case I'm better off writing a simple function, such as (def infix (comp rpn-evaluator shunting-yard)) to get the job done. The main point is that the macro doesn't have access to the values in the expression (unless they are constant). If the code was (infix [a + b]) then the macro can not evaluation a and b because the values will not be known at macro expansion time.

The one use of the macro that I can see would be for constant folding which I'm sure already occurs as part of the compilation process!

Programming Clojure gives a taxonomy of macros as:

  • Conditional Evaluation
  • Defining vars
  • Java interop
  • Postponing evaluation
  • Wrapping evaluation
  • Avoiding a lambda

None of these seem appropriate here to me, so I *think* writing it as a function is the right way to do it. I need to read On Lisp again since I think my macro understanding is a little off.

Saturday 13 June 2009

The Shunting Yard Algorithm

The Shunting Yard Algorithm converts expressions written in infix notation (i.e. with the operator between the operands) to Reverse Polish Notation (RPN). The algorithm was published by Dijkstra in November 1961 and the original paper can be found here.

RPN is often used in stack-based languages, such as Joy. In this respect, Clojure is to prefix notation as Joy is to stack-based notation.

The algorithm is simple and the description on Wikipedia is very complete; but it still was a great learning exercise for lazy sequences. The code implements a very simple expression parser based on a sequence of elements. Currently nesting expressions isn't supported directly, you have to use \( and \) to escape parenthesis.

;;; Shunting Yard Algorithm
(:use clojure.contrib.test-is))

(defstruct operator :name :precedence :associativity)

(def operators
#{(struct operator + '1 'left)
(struct operator - '1 'left)
(struct operator * '2 'left)
(struct operator / '2 'left)
(struct operator '^ 3 'right)})

(defn lookup-operator
(first (filter (fn [x] (= (:name x) symb)) operators)))

(defn operator?
(not (nil? (lookup-operator op))))

(defn op-compare
[op1 op2]
(let [operand1 (lookup-operator op1)
operand2 (lookup-operator op2)]
(and (= 'left (:associativity operand1)) (<= (:precedence operand1) (:precedence operand2)))
(and (= 'right (:associativity operand1)) (<= (:precedence operand1) (:precedence operand2))))))

(defn- open-bracket? [op]
(= op \())

(defn- close-bracket? [op]
(= op \)))

(defn shunting-yard
(shunting-yard expr []))
([expr stack]
(if (empty? expr)
(if-not (some (partial = \() stack)
(assert false))
(let [token (first expr)
remainder (rest expr)]

(number? token) (lazy-seq
(cons token (shunting-yard remainder stack)))

(operator? token) (if (operator? (first stack))
(concat (take-while (partial op-compare token) stack)
(shunting-yard remainder
(cons token
(drop-while (partial op-compare token) stack)))))
(shunting-yard remainder (cons token stack)))

(open-bracket? token) (shunting-yard remainder (cons token stack))

(close-bracket? token) (let [ops (take-while (comp not open-bracket?) stack)
ret (drop-while (comp not open-bracket?) stack)]
(assert (= (first ret) \())
(concat ops (shunting-yard remainder (rest ret)))))

:else (assert false))))))

(defn pretty-print
(map (fn [x]
(= x +) '+
(= x -) '-
(= x /) '/
(= x *) '*
:else x)) expr))

(deftest test-shuntingyard
(is (= (pretty-print (shunting-yard [100 + 200])) '(100 200 +)))
(is (= (pretty-print (shunting-yard [100 * 200])) '(100 200 *)))
(is (= (pretty-print (shunting-yard [100 / 200])) '(100 200 /)))
(is (= (pretty-print (shunting-yard [100 - 200])) '(100 200 -)))
(is (= (pretty-print (shunting-yard [4 + 5 + \( 6 * 7 \)] '(4 5 + 6 7 * +)))))
(is (= (pretty-print (shunting-yard [\( \( \( 6 * 7 \) \) \)] '(6 7 *)))))
(is (= (pretty-print (shunting-yard [3 + 4 * 5])) '(3 4 5 * +))))

Most of the entries on my blog could be considered Coding Dojo type tasks. I'm basically after problems I can solve in a few hours that I can learn something from!

Wednesday 10 June 2009

Generating Images using Image Magick

In order to play with neural networks for OCR, I needed to create a series of test images.

ImageMagick is a set of command line tools that provide image manipulation functions. It's pretty simple to write a quick script to generate a bucket load of images.

The script below generates a load of images of digits between 0 and 9 using every font available within image magick,

use strict;
use warnings;

# Gives me a list of fonts
my @fonts = split('\n', qx{identify -list font | grep Font: | cut -d: -f2});
my @numbers = (0..9);

foreach my $font (@fonts) {
foreach my $number (@numbers) {
my $filename = sprintf("%s_%s.gif", $font, $number);
my $commandLine = "convert -fill black -pointsize 8 -size 8x8 -font$font label:$number$filename";

5 minutes later, and I've got 340 test images. Neat.

Generated images showing big list of numbers

Sunday 7 June 2009

Learning some lessons from Programming Clojure

My copy of Programming Clojure by Stuart Halloway arrived over the weekend. It's a thoroughly enjoyable read so far and I've definitely learned a few things (for example, I hadn't realized that keywords can act as functions for maps, nor did I realize that a set is a function for determining membership. D'oh

;; A keyword is a function
user> (:houses {:biscuits 4 :houses 7})

;; A map is a function
user> ({:biscuits 4 :houses 7}) :houses})

;; A set is a function on membership
user> (#{4 5 6} 7)

;; On success returns the element
user> (#{4 5 6} 4)

The section on Functional Programming was very interesting, especially with respect to laziness. Stuart presents "Six Rules of Clojure FP", which have a heavy emphasis on laziness. Item 5 is:

Know the sequence library. You can often write code without using recur or the lazy APIs at all

Looking back at my neural network code made me realize I'd reinvented several wheels there and the code could be significantly tidied by just using the standard libraries. Revisiting the sequence documentation helped remind me the various tools in the armoury, particularly iterate and reduce.

As a reminder, the two lumps of code in question are run-patterns and train-network.

(defn run-patterns
[network samples expecteds]
(if (empty? samples)
(let [expected (first expecteds)
sample (first samples)
[ah ao] (run-network sample network)
updated-network (back-propagate expected sample [ah ao] network)]
(recur updated-network (rest samples) (rest expecteds)))))

(defn train-network
([samples expected iterations]
(train-network (create-network (count (first samples))
num-hidden (count (first expected)))
samples expected iterations))
([network samples expected iterations]
(if (zero? iterations)
(recur (run-patterns network samples expected) samples expected (dec iterations)))))

Starting with run-patterns. This takes a sequence of values (the samples and expected values) and reduces it down to a single output (the network) using the supplied function. This can be refactored to be a simple reduce operation (sequence in, value out) and this simplifies the code significantly.

(defn run-patterns
[network samples expecteds]
(fn [n expectations]
(let [[sample expected] expectations
[ah ao] (run-network sample n)]
(back-propagate expected sample [ah ao] n)))
network ;; initial value
(map list samples expecteds)))

Next is train-network. At the moment the implementation is burdened by a hand-rolled looping construct for a specified number of iterations. What if instead of calculating a fixed number, we just calculated an infinite amount of trained neural networks (with laziness obviously!) and let the caller decide what value they'd like?

The new version of train-network drops the number of iterations and returns an infinite sequence of neural networks, each trained one more time than the last. The Clojure function, iterate does the job here:

Returns a lazy sequence of x, (f x), (f (f x)) etc. f must be free of side-effects

Once we refactor to use iterate we get the following. The definition of train-network is considerably simpler, and the only change to the example function is specifying the number of iterations to use.

(defn train-network
([samples expected]
(train-network (create-network (count (first samples))
num-hidden (count (first expected)))
samples expected))
([network samples expected]
(iterate (fn [n] (run-patterns n samples expected)) network)))

(defn example[]
(let [x (nth (apply train-network xor-test-data) 100)]
(println (first (run-network [0 0] x)) "-->" 0)
(println (first (run-network [0 1] x)) "-->" 1)
(println (first (run-network [1 0] x)) "-->" 1)
(println (first (run-network [1 1] x)) "-->" 0)))

I've still got to read the remaining chapters of the book, but I'm looking forward to learning more. Definitely recommended.

Wednesday 3 June 2009

Back propagation algorithm in Clojure

In my previous post I looked at the most basic type of neural network, the single layer perceptron. Next, we'll look at the multi-layer perceptron network. This is a more powerful class of neural network than the single layer because it can handle non linearly separable data (such as the XOR test case which failed last time).

The main difference in a multi-layer perceptron is that each neuron is only activated based on the results of an activation function. This makes use of the kernel trick that maps a non-linear problem into a higher dimensional problem which is linearly separable (see also support vector machines). The mathematics behind this is better explained here [PDF].

A typical activation function is tanh, shown below between -10 and 10 and generated using Wolfram Alpha

Graph of tanh(x) from -10 to 10 from Wolfram Alpha

So how do we model this network in a functional programming language like Clojure? A typical solution using an imperative/OO language (see here for a good Python implementation) would use arrays representing the weights and modify things in situ, but that's not very functional.

We start by defining some constants (including the activation-function and its derivation). bp-nn represents the network itself.

(def activation-function (fn [x] (Math/tanh x)))
(def activation-function-derivation (fn [y] (- 1.0 (* y y))))
(def num-hidden 2)

(def learning-rate 0.5)
(def momentum 0.1)

(defstruct bp-nn :weight-input :weight-output :change-input :change-output)

Next we create some simple initialization functions to create an initial neural network, together with some helper functions for iterating over matrices (which we'll model as lists of lists). Usually you'd use random weights to initialize things, but allowing fixed values makes testing possible.

(defn make-matrix
[width height]
"Create a matrix (list of lists)"
(repeat width (repeat height 0)))

(defn matrix-map
[m func]
"Apply a function to every element in a matrix"
(map (fn [x] (map func x)) m))

(defn rand-range
[l h]
"Return a real number within the given range"
(+ (rand (- h l)) l))

(defn create-network
([input hidden output]
(create-network input hidden output true))
([input hidden output use-random-weights]
"Create a network with the given number of input, hidden and output nodes"
(let [i (inc input)
w-func (if use-random-weights (fn [_] (rand-range -0.2 0.2)) (fn [_] 0.2))
o-func (if use-random-weights (fn [_] (rand-range -2.0 2.0)) (fn [_] 2.0))]
(struct bp-nn
(matrix-map (make-matrix i hidden) w-func)
(matrix-map (make-matrix hidden output) o-func)
(make-matrix i hidden)
(make-matrix hidden output)))))

The first thing we should do is run a pattern through the network and see what comes out the other end. We're not just interested in the output result, we want to know what happened at the hidden layer, so we return a vector of ao (activation output) and ah (activation hidden).

comp is functional composition. ((comp x y) 5) is the equivalent of (x (y 5)) so in the example below we add the numbers together and then apply the activation function. The nested map calls allow us to iterate over the elements in a matrix.

(defn apply-activation-function
[w i]
"Calculate the hidden activations"
(apply map (comp activation-function +) (map (fn [col p] (map (partial * p) col)) w i)))

(defn run-network
[pattern network]
"Run the network with the given pattern and return the output and the hidden values"
(assert (= (count pattern) (dec (count (network :weight-input)))))
(let [p (cons 1 pattern)] ;; ensure bias term added
(let [wi (network :weight-input)
wo (network :weight-output)
ah (apply-activation-function wi p)
ao (apply-activation-function wo ah)]
[ao ah])))

In order to perform backwards-propagation we need a couple of helper functions that work on matrices and vectors to calculate changes in the output that will be used to update the weights.

These helper functions are pretty sick - (no-one wants to read (map (partial reduce +) ...). A better design would probably be to introduce a proper matrix abstraction. There's the beginnings of one here but this is a bit too "Java" syntax heavy for more liking.

(defn calculate-hidden-deltas
[wo ah od]
"Calculate the error terms for the hidden"
(let [errors (map (partial reduce +) (map (fn [x] (map * x od)) wo))] ;; Sick.
(map (fn [h e] (* e (activation-function-derivation h))) ah errors)))

(defn update-weights
[w deltas co ah]
(let [x (map
(fn [wcol ccol h]
(map (fn [wrow crow od]
(let [change (* od h)]
[(+ wrow (* learning-rate change) (* momentum crow)) change]))
wcol ccol deltas))
w co ah)]
[(matrix-map x first) (matrix-map x second)]))

I did warn you...

The next thing we need to implement is the back-propagation algorithm itself. This takes in more parameters than it needs to so that it can be tested standalone (it could be implemented as a local function using a closure to capture some of them). It returns an updated version of the neural network.

(defn back-propagate
[target p results network]
"Back propagate the results to adjust the rates"
(assert (= (count target) (count (first (network :weight-output)))))
(let [pattern (cons 1 p) ;; ensure bias term added
ao (first results)
ah (second results)
error (map - target ao)
wi (network :weight-input)
wo (network :weight-output)
ci (network :change-input)
co (network :change-output)
output-deltas (map (fn [o e] (* e (activation-function-derivation o))) ao error)
hidden-deltas (calculate-hidden-deltas wo ah output-deltas)
updated-output-weights (update-weights wo output-deltas co ah)
updated-input-weights (update-weights wi hidden-deltas ci pattern)]
(struct bp-nn
(first updated-input-weights)
(first updated-output-weights)
(second updated-input-weights)
(second updated-output-weights))

All that remains is to train the network. We need a set of samples with know results, together with a number of iterations to try. I've split these into run-patterns which runs through the patterns once, and train-network> which creates the initial network and runs it through the patterns the specified number of times.

(defn run-patterns
[network samples expecteds]
(if (empty? samples)
(let [expected (first expecteds)
sample (first samples)
[ah ao] (run-network sample network)
updated-network (back-propagate expected sample [ah ao] network)]
(recur updated-network (rest samples) (rest expecteds)))))

(defn train-network
([samples expected iterations]
(train-network (create-network (count (first samples))
num-hidden (count (first expected)))
samples expected iterations))
([network samples expected iterations]
(if (zero? iterations)
(recur (run-patterns network samples expected) samples expected (dec iterations)))))

So how well does it work in practise? Pretty damn good. It correctly converges after a few iterations (this time 100) and consistently gets the XOR test data set correct.

(defn example[]
(let [x (apply train-network (conj xor-test-data 100))]
(println (first (run-network [0 0] x)) "-->" 0)
(println (first (run-network [0 1] x)) "-->" 1)
(println (first (run-network [1 0] x)) "-->" 1)
(println (first (run-network [1 1] x)) "-->" 0)))

;;; Example run through the REPL> (example)
(0.10717792758953508) --> 0
(0.993502708495955) --> 1
(0.9930515903590437) --> 1
(0.00883530181467182) --> 0

Total code weighs in at ~100 of so lines of purely functional code. This made it a doddle to test. Coding this was an exercise in map masochism. Never have I had to construct so many convoluted map expressions. At least it works in the end. It feels like it would be considerably simpler to implement this using mutability, I might try that for comparison purposes. Any suggestions on improving the code much appreciated! Full code is on GitHub.

(update - read some of Programming Clojure and applied some of the lessons learnt - see here.)

Neural networks are pretty interesting and in the next post I'll look at how to implement basic OCR using them.