Saturday 31 January 2009
Thursday 29 January 2009
Eliza
Eliza was one of the first AI programs, written by Weizenbaum in 1966. It's the basis of the Emacs Psychotherapist.
It is very basic and simply works on pattern matching. For example:
By formulating some stock rules (and writing the pattern matching code obviously) we get Eliza.
Code is in my GitHub repository as eliza.clj.
Useful functions I found whilst developing this:
It is very basic and simply works on pattern matching. For example:
Pattern: (I need a ?X)
Input: (I need a vacation)
Output: {?X vacation}
By formulating some stock rules (and writing the pattern matching code obviously) we get Eliza.
Code is in my GitHub repository as eliza.clj.
Useful functions I found whilst developing this:
- Declare for forward declarations
- Some - which returns the first non-true value for fn[x] in any sequence e.g.
user> (some (fn [x] (when (> x 0) x)) '(0 -1 -2 -1 0 1 2 3)) ==> 1 - Replace - takes a map of source->replacement and performs replacements
user> (replace '{a,b b,c c,d} '(a b c)) ==> (b c d)
Tuesday 27 January 2009
Clojure Forward Declarations
Whilst working on Eliza I found I had a couple of functions that were mutually recursive; this meant I couldn't compile them because one didn't have the declaration of the other.
The solution is
(Update after comment from Matt)
As pointed out, this doesn't actually work because the stack can blow:
Using trampoline can fix the issue if you rewrite it so that the tail calls are actually closures
The syntax
The solution is
declare
e.g.
(declare my-odd? my-even?)
(defn my-even? [x]
(if (zero? x)
true
(my-odd? (dec x))))
(defn my-odd? [x]
(if (zero? x)
false
(my-even? (dec x))))
(Update after comment from Matt)
As pointed out, this doesn't actually work because the stack can blow:
user> (my-even? 1000000000) ==> stackoverflow!
Using trampoline can fix the issue if you rewrite it so that the tail calls are actually closures
(declare my-odd?)
(defn my-even? [x] (if (zero? x) true #(my-odd? (dec x))))
(defn my-odd? [x] (if (zero? x) false #(my-even? (dec x))))
user> (trampoline #(my-even? 100000000)) ==> true
The syntax
#(my-even? 1000000000)
creates an anonymous function of no arguments (a thunk to be evaluated later) e.g.
user> (apply #(+ 1 1) ) ==> 2
Passing Parameters in Clojure
Clojure has support for a number of ways of passing parameters. A function body can have multiple function bodies associated with it, each with different arity. This isn't any where near as general as Haskell's pattern matching (see [1]).
Similarly, you can make a var arg function with the
The last option for parameter passing is keywords, the parameter list becomes a map. This is discussed in more detail here.
user> (defn foo ([x] x) ([x y] (+ x y)) ([x y z] (+ x y z)))
#'user/foo
user> (foo 1)
1
user> (foo 1 2)
3
user> (foo 1 2 3)
6
Similarly, you can make a var arg function with the
&
notation.
user> (defn bar ([x] x) ([x & rest-args] (reduce + (cons x rest-args))))
#'user/bar
user> (bar 1)
1
user> (bar 1 2 3 4 5 6 7)
28
The last option for parameter passing is keywords, the parameter list becomes a map. This is discussed in more detail here.
Monday 26 January 2009
Design Patterns in Dynamic Programming
Peter Norvig (of Google fame) has a page Design Patterns in Dynamic Programming which is exactly the kind of reference I've been after, in terms of discussing common object-oriented patterns in a functional context. It also features a number of other FP patterns:
As an aside, Norvig's book (Paradigms of AI Programming) is next on my list to port a few interesting examples across from. Eliza is first on my hit list!
I'm still yet to see something akin to GoF for functional programming. Perhaps this is because it doesn't exist and functional programming "in-the-large" is exactly the same as in the small?
- Coroutines
- Control Abstraction
- Memoization
- Rule-based Translator
As an aside, Norvig's book (Paradigms of AI Programming) is next on my list to port a few interesting examples across from. Eliza is first on my hit list!
I'm still yet to see something akin to GoF for functional programming. Perhaps this is because it doesn't exist and functional programming "in-the-large" is exactly the same as in the small?
Wednesday 21 January 2009
Multi-methods in Clojure
Multimethods (more commonly known as multiple dispatch) means that a method call is dispatched dynamically based upon the type of multiple arguments.
In Java / C++ you'd typically approach this with the visitor pattern, with something like the following.
Why's this not good? Well, the
Clojure multimethods support dispatching on variables other than the runtime type of an argument. From the Clojure docs, you can dispatch on "types, values, attributes and metadata of, and relationships between, one or more arguments".
Clojure multi-methods consist of two things:
Here's a trivial example defining an increment function that dispatches on the class of the arguments.
But, we do not just have to be boring and dispatch only on the type of the argument, we could be dispatching on the value:
The dispatching function isn't limited to just the first argument, you could dispatch on the type of multiple arguments e.g.
This gives about the most general method of method dispatch imaginable!
The documentation explains how this goes further still, allowing relationships to be created with
Reading Programming Clojure shows that multi-methods are very rarely used in most of the current Clojure code (only about 1 occurrence per thousand lines of code). There's some guidelines in the book about choosing whether to use multi-methods, but it's summed up best with "try writing the function in both styles, and pick the one that seems more reliable".
In Java / C++ you'd typically approach this with the visitor pattern, with something like the following.
interface Foo {
// ... lots of exciting methods
void accept( FooVisitor visitor );
}
interface FooVisitor {
void visit( Bar bar );
void visit( Baz baz );
}
class Bar implements Foo {
//... implementation
public void accept ( FooVisitor visitor ) {
visitor.visit( this );
}
}
class Baz implements Foo {
//... implementation
public void accept( FooVisitor visitor ) {
visitor.visit( this );
}
}
Why's this not good? Well, the
FooVisitor
has to know about all the subclasses, which is pretty nasty. In addition, well, it's just a lot of typing - way too much scaffolding. Languages like Smalltalk and Lisp both support multi-methods natively which give you all the benefits of this "pattern", without the crazy verbosity of the Java version.Clojure multimethods support dispatching on variables other than the runtime type of an argument. From the Clojure docs, you can dispatch on "types, values, attributes and metadata of, and relationships between, one or more arguments".
Clojure multi-methods consist of two things:
* Some methods (the multi!)
* A dispatcher function (chooses which one)
Here's a trivial example defining an increment function that dispatches on the class of the arguments.
class
is a built in function within Clojure core that gives the class of the object.
(defmulti my-increment class)
(defmethod my-increment String [s]
(let [x (Integer/valueOf s)]
(str (my-increment x))))
(defmethod my-increment Integer [i]
(+ i 1))
user> (my-increment 4) ==> 5
user> (my-increment "4") ==> "5"
But, we do not just have to be boring and dispatch only on the type of the argument, we could be dispatching on the value:
(defmulti my-decrement identity) ;; identify is built in (fn [x] x)
(defmethod my-decrement 0 [x]
99999)
(defmethod my-decrement :default [x]
(- x 1))
user> (my-decrement 2) ==> 1
user> (my-decrement 1) ==> 0
user> (my-decrement 0) ==> 99999
The dispatching function isn't limited to just the first argument, you could dispatch on the type of multiple arguments e.g.
(defmulti my-add (fn [x y] (and (string? x) (string? y))))
(defmethod my-add true [x y]
(str x y))
(defmethod my-add false [x y]
(+ x y))
user> (my-add 3 4) ==> 7
user> (my-add "3" "4") ==> "34"
This gives about the most general method of method dispatch imaginable!
The documentation explains how this goes further still, allowing relationships to be created with
derive
and how you can add/remove/define an ordering implementations of methods at runtime.Reading Programming Clojure shows that multi-methods are very rarely used in most of the current Clojure code (only about 1 occurrence per thousand lines of code). There's some guidelines in the book about choosing whether to use multi-methods, but it's summed up best with "try writing the function in both styles, and pick the one that seems more reliable".
Tuesday 20 January 2009
Compojure
Compojure is a web-framework for Clojure.
Grab the latest source from Git and build with Ant.
Now you've got to add Compojure into your classpath. I use Emacs and Slime so I edited my
When
After this, I wasted (well wasted probably isn't true, I got some new knowledge) precious minutes of my life wondering why I couldn't get anything working, but on the plus side I found some functions that helped:
Then, I got a bit lost because I'd mismatched versions of compojure.jar and clojure.ar! Rather than using the version of Clojure that came with Compojure, I was using the head revision - bad idea! Use a consistent set of JARs that come from the git repository.
And now I'm able to compile the examples from the Wikibook and all is well.
I also decided to finally by the Programming Clojure book, now all I need is a decent e-book reader!
Grab the latest source from Git and build with Ant.
git clone git://github.com/weavejester/compojure.git
cd compojure
ant
Now you've got to add Compojure into your classpath. I use Emacs and Slime so I edited my
.emacs
file based on some suggestions from IRC and got (in addition to the usual)
(swank-clojure-config
(setq swank-clojure-jar-path (concat clj-root "clojure/trunk/clojure.jar")))
(setq swank-clojure-extra-classpaths (directory-files "~/lisp/clj/" t ".jar$"))
When
~/lisp/clj
contains all my JARs (well symlinks anyway).After this, I wasted (well wasted probably isn't true, I got some new knowledge) precious minutes of my life wondering why I couldn't get anything working, but on the plus side I found some functions that helped:
user> (all-ns)
;; big list of name spaces, check that you've got Compojure on here!
user> (System/getProperty "java.class.path")
;; big class path list, make sure you have Compojure + all the dependent JARs
Then, I got a bit lost because I'd mismatched versions of compojure.jar and clojure.ar! Rather than using the version of Clojure that came with Compojure, I was using the head revision - bad idea! Use a consistent set of JARs that come from the git repository.
And now I'm able to compile the examples from the Wikibook and all is well.
I also decided to finally by the Programming Clojure book, now all I need is a decent e-book reader!
Monday 19 January 2009
Functional Tree Editing
The zipper is a Functional Pearl which allows localized editing of a list-based data structure (most commonly trees).
The idea is very simple, for any list-based data structure you can create a data-structure (the zipper) which allows you to navigate (next, previous, up, down) but also do functional-editing (by which I mean localized editing without having to reconstruct the entire data structure). The original data is untouched.
In the example below (see [1] for docs), we create a zip structure from a vector
This example looks a bit backward; we can make things clearer with
If we apply this code to the previous zip example, we get something much clearer (apart from the namespace gumph).
The main point of zippers is that the original is untouched.
Clojure provides zippers for vectors, sequences and XML elements.
The idea is very simple, for any list-based data structure you can create a data-structure (the zipper) which allows you to navigate (next, previous, up, down) but also do functional-editing (by which I mean localized editing without having to reconstruct the entire data structure). The original data is untouched.
In the example below (see [1] for docs), we create a zip structure from a vector
zip/vector-zip
. Navigating along we move along with zip/next
, zip/remove
and return to the root with zip/root
user> (let [zip (clojure.zip/vector-zip [1 2 3 4 5])]
(clojure.zip/root (clojure.zip/remove (clojure.zip/next zip))))
[2 3 4 5]
This example looks a bit backward; we can make things clearer with
->
. This is a macro that "threads" function calls, we can explain this a bit clearer with macroexpand
as this shows what a macro actually becomes.
user> (macroexpand '(-> [1 2 3] rest rest))
(rest (clojure.core/-> [1 2 3] rest))
user> (macroexpand '(-> [1 2 3] rest))
(rest [1 2 3])
;; Put the two together and we get (rest (rest [1 2 3])) ==> 3
If we apply this code to the previous zip example, we get something much clearer (apart from the namespace gumph).
user> (let [zip (clojure.zip/vector-zip [1 2 3 4 5])]
(-> zip clojure.zip/next clojure.zip/remove clojure.zip/root))
The main point of zippers is that the original is untouched.
user> (let [data [1 2 3 4 5] zip (clojure.zip/vector-zip data)]
(prn "Changed is " (-> zip clojure.zip/next clojure.zip/remove clojure.zip/root))
(prn "Original is " data))
"Changed is " [2 3 4 5]
"Original is " [1 2 3 4 5]
Clojure provides zippers for vectors, sequences and XML elements.
Learning Emacs
Emacs and vi/vim are the two most popular text editors on Unix. I'm currently learning Emacs (yes, learning, there's a bit more to it than Notepad).
The coolest thing I've found so far is Emacs keyboard macros.
You can combine the execution with
The coolest thing I've found so far is Emacs keyboard macros.
- Start recording with
C-x (
- Type keys
- End recording with
C-x )
- Execute recording with
C-x e
You can combine the execution with
C-u
to repeat the command a given number of times.
Wednesday 14 January 2009
Using Agents
I've had a hard time groking Clojure agents, so this is mostly just a series of daft micro-examples to understand them together with restating the bleeding obvious.
An agent is created with
Agents are reactive - they'll only do something if you tell them so. All communication with agents is through functions. There are two commands for sending functions to agents,
Without the invocation of
What if the function you are sending has errors? Let's look at a divide by zero:
Errors in agents can be inspected with
So agents seem incredibly simple - why are they so powerful?
An agent is created with
agent
and the associated data. You can use deref
or @
to access the data associated with an agent (same syntax as I mentioned previously for atoms).
user> (agent '(1 2 3 4 5 6))
#
user> @(agent '(1 2 3 4 5 6))
(1 2 3 4 5 6)
Agents are reactive - they'll only do something if you tell them so. All communication with agents is through functions. There are two commands for sending functions to agents,
send
(used for non-blocking calls) and send-off
(used for potentially blocking calls). Both send
and send-off
return immediately, the difference being that send-off
guarantees the message will be processed in a different thread.
user> (let [a (agent 4)]
(send a + 1) ; schedules a call of (apply + agent_state 1)
(await a)
(prn @a))
5
nil
Without the invocation of
await
, this may return 4 not 5. await
blocks the current thread indefinitely until all actions have been completed (which makes it quite dangerous!).What if the function you are sending has errors? Let's look at a divide by zero:
user> (let [a (agent 4)]
(send a / 0)
(await a)
(prn @a))
; Evaluation aborted.
java.lang.Exception: Agent has errors (NO_SOURCE_FILE:0)
[Thrown class clojure.lang.Compiler$CompilerException]
Errors in agents can be inspected with
agent-errors
which returns a sequence of exceptions. Once an agent is in an error state it can not process any more messages until the errors have been cleared with clear-agent-errors
.
user> (let [a (agent 4)]
(send a / 0)
(await a)
(prn (agent-errors a))
(clear-agent-errors a)
(prn @a))
(#)
4
nil
So agents seem incredibly simple - why are they so powerful?
- Integration with STM, which leads too...
- Always observable - no matter what calculation is going on, you can always get the current (valid) value with
deref
. - No message-loops - agents are purely reactive which (to me) seems a simpler programming model to understand
- Open - agents are mutated by functions and therefore this can be extended by anyone (in contrast with having to add a new type of message for the loop to switch on)
Monday 12 January 2009
Metadata in Clojure
Each Clojure function can have a documentation string associated with it. This must be declared immediately before the parameters, for example:
This is actually short hand for:
Metadata is associated with symbols or collections, and
Note how extra metadata has arrived when it was compiled, including the line number of the original file, name space etc.
There are many cool potential applications of metadata in Clojure, though I haven't seen any implemented yet!:
(defn foo-baz
"Foo-bazes bar"
[bar]
(+ bar 77))
user> (doc foo-baz)
-------------------------
user/foo-baz
([bar])
Foo-bazes bar
This is actually short hand for:
(defn #^{:doc "Foo-bazes bar"} foo-baz
[bar]
(+ bar 77))
#^
indicates to the reader that the following map is metadata to be associated with the next element read. You can use this to document def
'd values too (e.g. (def #^{:doc "This is the value x"} x)
.Metadata is associated with symbols or collections, and
meta
can be used to view the associated values. Initially I was doing (meta count)
and expecting it to come up with the details about agent
. This is wrong because count
is an instance of the function, whereas what I need to pass as an argument to meta is the symbol associated with the function e.g.
user> (meta count)
nil
user> (meta #'count) ; ^#'count also works (^ is a reader character to get the metadat)
{:ns #, :name count, :doc "Returns the number of items in the collection.
(count nil) returns\n 0. Also works on strings, arrays, and Java Collections and Maps",
:arglists ([coll]), :line 842, :file "core.clj"}
Note how extra metadata has arrived when it was compiled, including the line number of the original file, name space etc.
There are many cool potential applications of metadata in Clojure, though I haven't seen any implemented yet!:
- Aspect oriented programming (side note inspiration for this came from Lisp Advice (see Emacsspeak mentioned in the book Beautiful Code.)
- Software tools - knowing the line number and file helps in a number of ways, maybe could be used to do semantic diff (for example, re-ordering my functions wouldn't make any difference because they'd semantically still be the same).
Ants!
The Ants simulation is the de-facto example of agents in Clojure. Grab the full code from the Clojure Concurrency presentation.
Saturday 10 January 2009
Software Transactional Memory
Software Transactional Memory is a technique that aims to simplify developing concurrent programming. It allows you to define units of work (transactions) which obey the ACID properties:
During a unit of work each thread completes modifications to the (shared) data structure. All modifications are recorded in a log. When the transaction reaches the end, the memory logs are analysed; if the logs are consistent (e.g. no read/write of the same variable) then the transaction is commited, if not the transaction is aborted or retried. All functions in a transaction must be pure. IO in a transaction is wrong. A language like Haskell enforces this via its type system - IO operations are performed with a monad. STM is still very new, there's no guarantee that this will be the "winner" in the concurrency programming war...
- Atomic - work either happens completely or not at all.
- Consistent - only valid work is carried out
- Isolation - no-one can see intermediate work, only the final result
- Durable - Once the work has been completed, it's done and will survive failure
During a unit of work each thread completes modifications to the (shared) data structure. All modifications are recorded in a log. When the transaction reaches the end, the memory logs are analysed; if the logs are consistent (e.g. no read/write of the same variable) then the transaction is commited, if not the transaction is aborted or retried. All functions in a transaction must be pure. IO in a transaction is wrong. A language like Haskell enforces this via its type system - IO operations are performed with a monad. STM is still very new, there's no guarantee that this will be the "winner" in the concurrency programming war...
Friday 9 January 2009
Generating Text
Often you need a chunk of vaguely real looking text to test some code, web layout, file handling etc. In ANSI Common Lisp (which I'm currently reading if you hadn't guessed) there's an example of how to generate random text.
The idea is simple - read in a list of words, for each pair of words keep a count of the number of occurrences of that pair. Once you've got that data, you can pick a word at random and then pick from a probability distribution what the next word should be. Apply that pattern until you've generated enough text. There's much more sophisticated work based on same ideas.
The example Lisp code is quite nasty and it's written in what feels like an iterative style. In Clojure we have access to a richer standard library (Java). For example, we can read in a list of words thus:
This takes a file name as an arguments, slurps the entire file into memory and splits it using String.split
Next we need to be a frequency map which has an entry for each word, together with a count of each word that follows it. We use a map of Word => (Word => Count).
This is a beefy function, but I couldn't see how to simplify it. It also assumes that the last word maps to the first, which is daft but then this is only playing around! The idea is to get the two sequences, shift one to the left and then count the two in a map. For example the frequency map of Caroline Kennedy would be, well you know?, something like this:
So once we've generated a frequency map we need to work out what the next work should be based on the probability. "know" is followed by 1 occurrence of either "it's" or "you", so each should have a 50/50 chance of getting picked.
Finally, all we need is a function to apply
There's several nice things here from a functional programming point of view:
* Use of
* Use of iterate/take to have an infinite stream of text (laziness)
So is it any good? Here's 250 words Dracula
"the same condition of those which pretend and said Finis and I agree later period He evidently the geologic and women jumped down one whom each side manifestly as to set in them he had pressed upon us? Be not in through my poor Lucy noticed it up the section their swords can be more and had expected A great brain just finished the third person or sea I do for the belt with so sweet little before he woke bright smile of it afterwards when we did anent something like a night till tonight it and on in paragraph 1 October --As I describe when Arthur I have unchecked sway a low shrill call him what will have not friend John was no opinion there would indeed happiness were parted at rest Happily Van Helsing hypnotized me But you? Whilst we may deign to be at rest seemed to the facts of a cluster and mixed up the wide spread out one by the library of whips The soft voice gave them both quiet joy when we were they come to the big swan thought struck with my eyelids and pondered over to take away I took out of the free Nay sir bethink yourself It would like it so strange tongues fierce voice came close to bear you consated his hands Our dear Lucy after you shall not look into my hands our intent I will grace said nothing The opiate of white cloud in spirit Perhaps it"
Well, it's not going to win the Booker Prize, but it's almost believable! Full code available here
Other applications of this:
Predictive messaging
Generating auto-blog posts by "rewriting" other content
Language recognition based on work occurrence / frequency
The idea is simple - read in a list of words, for each pair of words keep a count of the number of occurrences of that pair. Once you've got that data, you can pick a word at random and then pick from a probability distribution what the next word should be. Apply that pattern until you've generated enough text. There's much more sophisticated work based on same ideas.
The example Lisp code is quite nasty and it's written in what feels like an iterative style. In Clojure we have access to a richer standard library (Java). For example, we can read in a list of words thus:
(defn file-as-wordlist [f]
(filter (fn [x] (> (count x) 0)) (.split (slurp f) "\n|[ ]|\r|[.]|[,]|[\"]")))
This takes a file name as an arguments, slurps the entire file into memory and splits it using String.split
Next we need to be a frequency map which has an entry for each word, together with a count of each word that follows it. We use a map of Word => (Word => Count).
(defn build-frequency-map [words]
(let [word-pairs (mapcat (fn [x y] (list [x y])) (cons (last words) words) words)]
(reduce (fn [accum v]
(let [w1 (first v) w2 (second v) val (get accum w1)]
(if (nil? val)
(assoc accum w1 {w2 1})
(let [currentVal (get val w2)]
(if (nil? currentVal)
(assoc accum w1 (conj val {w2 1}))
(assoc accum w1 (conj val {w2 (inc currentVal)})))))))
{} word-pairs)))
This is a beefy function, but I couldn't see how to simplify it. It also assumes that the last word maps to the first, which is daft but then this is only playing around! The idea is to get the two sequences, shift one to the left and then count the two in a map. For example the frequency map of Caroline Kennedy would be, well you know?, something like this:
user> (build-frequency-map ["you" "know" "it's" "great" "you" "know"])
{"know" {"it's" 1, "you" 1}, "great" {"you" 1}, "it's" {"great" 1}, "you" {"know" 2}}
So once we've generated a frequency map we need to work out what the next work should be based on the probability. "know" is followed by 1 occurrence of either "it's" or "you", so each should have a 50/50 chance of getting picked.
(defn frequency-map-count [m word]
(let [v (get m word)]
(if (nil? v)
0
(reduce (fn [x y] (+ x (second y))) 0 v))))
(defn next-word [m word]
(let [following (get m word) p (rand-int (frequency-map-count m word))]
((fn [words prob]
(let [word-count (second (first words))]
(if (>= word-count prob)
(first (first words))
(recur (rest words) (- prob word-count))))) following p)))
frequency-map-count
counts the number of occurrences of a word. next-word
rolls a dice and selects the next word based on the number of occurrences.Finally, all we need is a function to apply
next-word
(defn generate-text [example n & [start]]
(let [words (file-as-wordlist example) fm (build-frequency-map words)
start-word (if (nil? start) "the" start)]
(apply str (interpose " " (take n (iterate (partial next-word fm) start-word))))))
There's several nice things here from a functional programming point of view:
* Use of
partial
the function of a single argument (currying)* Use of iterate/take to have an infinite stream of text (laziness)
So is it any good? Here's 250 words Dracula
user> (generate-text "/home/jfoster/Documents/345.txt" 250)
"the same condition of those which pretend and said Finis and I agree later period He evidently the geologic and women jumped down one whom each side manifestly as to set in them he had pressed upon us? Be not in through my poor Lucy noticed it up the section their swords can be more and had expected A great brain just finished the third person or sea I do for the belt with so sweet little before he woke bright smile of it afterwards when we did anent something like a night till tonight it and on in paragraph 1 October --As I describe when Arthur I have unchecked sway a low shrill call him what will have not friend John was no opinion there would indeed happiness were parted at rest Happily Van Helsing hypnotized me But you? Whilst we may deign to be at rest seemed to the facts of a cluster and mixed up the wide spread out one by the library of whips The soft voice gave them both quiet joy when we were they come to the big swan thought struck with my eyelids and pondered over to take away I took out of the free Nay sir bethink yourself It would like it so strange tongues fierce voice came close to bear you consated his hands Our dear Lucy after you shall not look into my hands our intent I will grace said nothing The opiate of white cloud in spirit Perhaps it"
Well, it's not going to win the Booker Prize, but it's almost believable! Full code available here
Other applications of this:
Wednesday 7 January 2009
Regular Expressions in Clojure
Regular Expressions are a incredibly powerful language for performing text manipulation.
ANSI Common Lisp doesn't feature regex support, though there are a large number of libraries that do (see here). Clojure uses the Java implementation of regular expressions (see the regex tutorial) and adds some nice syntactic sugar to the mix.
To define a regex pattern in Clojure you use the
The Clojure regex functions are well documented. Unfortunately, I didn't really read the documentation!
When learning regular expressions, I found Regex Coach invaluable (which is actually Lisp powered!). It's an application that lets you immediately see how a regular expression matches some text. Let's do the basics of this in Clojure.
You have two areas, one for the regular expression and one for the text. As you press keys the matching regular expression (if any) is highlighted in the text area.
Firstly we need a function that given a regular expression and some text returns where to do some highlighting:
Next a bit of UI that features a couple of new things:
Each time a key is pressed
Full code here.
ANSI Common Lisp doesn't feature regex support, though there are a large number of libraries that do (see here). Clojure uses the Java implementation of regular expressions (see the regex tutorial) and adds some nice syntactic sugar to the mix.
To define a regex pattern in Clojure you use the
#"<regex>"
syntax. This is actually the regex pattern, so evaluating it at the REPL gives you itself. The pattern is compiled, so putting anything nonsensical results in an error.
user> #"[0-9]+"
#"[0-9]+"
user> #"[]+"
; Evaluation aborted.
; clojure.lang.LispReader$ReaderException: java.util.regex.PatternSyntaxException:
; Unclosed character class near index 2
The Clojure regex functions are well documented. Unfortunately, I didn't really read the documentation!
re-find
returns either the first match (directly as a string) OR a vector of matches if multiple matches. This is optimized for the normal case where a user enters the text and the regex is well known ahead of time e.g.
user>(re-find #"bar" "bar")
"bar"
user>(re-find #"(foo)|(bar)" "foo bar")
["foo" "foo" nil]
When learning regular expressions, I found Regex Coach invaluable (which is actually Lisp powered!). It's an application that lets you immediately see how a regular expression matches some text. Let's do the basics of this in Clojure.
You have two areas, one for the regular expression and one for the text. As you press keys the matching regular expression (if any) is highlighted in the text area.
Firstly we need a function that given a regular expression and some text returns where to do some highlighting:
(defn first-match [m]
(if (coll? m) (first m) m))
(defn match [regex text]
(let [m (first-match (re-find (re-pattern regex) text))]
(if (nil? m)
[0 0]
(let [ind (.indexOf text m) len (.length m)]
[ind (+ ind len)]))))
first-match
is a helper function that gives the first match (handling the case of re-find
returning multiple entries). match
just gives you a vector [x y]
representing the index of the match.Next a bit of UI that features a couple of new things:
- $ syntax for inner classes (a Java thing)
- Exception handling
Each time a key is pressed
match
returns the area to highlight and Highlighter does the rest. The exception handling ensures that if there was an error compiling the regex then that's printed on the status bar.
(defn regexcoach []
(let [frame (JFrame. "Regular Expression Coach") pane (JPanel.) regexText (JTextField.)
targetText (JTextField. "")
statusBar (JLabel. "Match from 0 to 0")
keyHandler (proxy [KeyAdapter] []
(keyTyped [keyEvent]
(try
(let [m (match (.getText regexText) (.getText targetText))
hl (.getHighlighter targetText)
pen (DefaultHighlighter$DefaultHighlightPainter. Color/RED)]
(.removeAllHighlights hl)
(.addHighlight hl (first m) (second m) pen)
(.setText statusBar (format "Match from %s to %s" (first m) (second m))))
(catch PatternSyntaxException e (.setText statusBar (.getMessage e))))))]
(doto regexText
(.addKeyListener keyHandler))
(doto targetText
(.addKeyListener keyHandler))
(doto pane
(.setLayout (BoxLayout. pane BoxLayout/Y_AXIS))
(.add (JLabel. "Regular Expression"))
(.add regexText)
(.add (JLabel. "Target String"))
(.add targetText)
(.add statusBar))
(doto frame
(.add pane)
(.setSize 300 300)
(.setVisible true))))
Full code here.
Tuesday 6 January 2009
Game of Life Part II
A helpful comment on my previous blog entry suggested that instead of crazy zip rubbish, I could instead view the grid as a Point => Value mapping. Is that a good change to make?
Here's the changes from the previous version. Full code on my Git repository.
Most of the functions above are much clearer (apart from
I did sacrifice sparseness for neatness. I needed to have the values populated as dead such that
Overall, I think this is a definite improvement over the previous version.
Here's the changes from the previous version. Full code on my Git repository.
(defstruct point :x :y)
(defn world-at [world point]
(get world point))
(defn toggle-pos [world point]
(if (zero? (world-at world point))
(assoc world point 1)
(assoc world point 0)))
(defn neighbours [p]
(let [x (:x p) y (:y p)]
[(struct point (dec x) (dec y)) (struct point x (dec y)) (struct point (inc x) (dec y))
(struct point (dec x) y) (struct point (inc x) y)
(struct point (dec x) (inc y)) (struct point x (inc y)) (struct point (inc x) (inc y))]))
(defn neighbour-count [world p]
(reduce + (map (fn [x] (let [v (world-at world x)] (if (nil? v) 0 v))) (neighbours p))))
(defn new-state [world p]
(let [neighbours (neighbour-count world p) alive (world-at world p)]
(cond
(and (= alive 1) (< neighbours 2)) 0 ;; under population
(and (= alive 1) (> neighbours 3)) 0 ;; over-crowding
(and (= alive 1) (or (= 2 neighbours) (= 3 neighbours))) 1 ;; unchanged to the next generation
(and (= 3 neighbours)) 1 ;; any tile with exactly 3 live neighbour cells becomes alive
:else 0)))
(defn life-step [w]
(into (hash-map) (map (fn [x] [(first x) (new-state w (first x))]) w)))
(defn create-world [w h]
(let [x (range 0 w) y (range 0 h)]
(apply hash-map (mapcat (fn [a] (mapcat (fn [b] (list (struct point a b) 0)) y)) x))))
Most of the functions above are much clearer (apart from
create-world
) than they were previously. In addition the SLOC has decreased from 74 to 66, so the code is more concise too.I did sacrifice sparseness for neatness. I needed to have the values populated as dead such that
life-step
could just be written as a map function. If the values didn't exist, I'd have to create something from nothing. In this case, I think the trade off is OK.Overall, I think this is a definite improvement over the previous version.
Monday 5 January 2009
The Game of Life in Clojure.
Conway's Game of Life is by far the most famous example of cellular automaton. It's not really a game, you set an initial state and then watch it evolve.
The "game" is played on a grid with each cell having two states (alive or dead). The next state can be computed from the previous one using the following rules:
- Any live cell with fewer than two live neighbours dies, as if by needs caused by under-population.
- Any live cell with more than three live neighbours dies, as if by overcrowding.
- Any live cell with two or three live neighbours lives, unchanged, to the next generation.
- Any tile with exactly three live neighbours cells will be populated with a living cell.
The first thing we need to do is determine a representation for a grid. I've gone for a simple approach of representing a grid as a list of lists.
create-world
creates a blank grid
(defn create-world [w h]
(replicate h (replicate w 0)))
Now we need some helper functions for getting the state of an individual element and toggling values.
(defn world-at [world x y]
(if (and (>= x 0) (>= y 0) (< x (count world)) (< y (count (first world))))
(nth (nth world x) y)
0))
(defn toggle [x]
(if (= x 0) 1 0))
(defn toggle-row-at [row pos]
(map (fn [x] (if (= pos (first x)) (toggle (second x)) (second x))) (zipmap (range 0 (count row)) row)))
(defn toggle-pos [world x y]
(map (fn [v] (if (= (first v) x)
(toggle-row-at (second v) y)
(second v)))
(zipmap (range 0 (count world)) world)))
toggle-row-at
and toggle-pos
both feel wrong to me - I'm not sure of a better way to carry position information into a map function (for example, I want to perform a map operation with positional information - zip is the only way I can see of doing that - wonder if there is a better way?).Armed with functions to explore the world we need to write the logic to transition from one-state to the next.
(defn neighbour-count [world x y]
(+ (world-at world (dec x) (dec y)) (world-at world x (dec y)) (world-at world (inc x) (dec y))
(world-at world (dec x) y) (world-at world (inc x) y)
(world-at world (dec x) (inc y)) (world-at world x (inc y)) (world-at world (inc x) (inc y))))
(defn new-state [world x y]
(let [neighbours (neighbour-count world x y) alive (world-at world x y)]
(cond
(and (= alive 1) (< neighbours 2)) 0 ;; under population
(and (= alive 1) (> neighbours 3)) 0 ;; over-crowding
(and (= alive 1) (or (= 2 neighbours) (= 3 neighbours))) 1 ;; unchanged to the next generation
(and (= 3 neighbours)) 1 ;; any tile with exactly 3 live neighbour cells becomes alive
:else 0)))
new-state
encodes the rules, and neighbour-count
just does exactly what it says on the tin. Now all we need to do is apply new-state
to each cell in the grid and produce the next one. This is suprisingly simple:
(defn life-step [w]
(let [width (count w) height (count (first w))]
(map
(fn [row] (map (fn [col]
(let [x (first row) y (first col)]
(new-state w x y)))
(zipmap (range 0 height) (second row))))
(zipmap (range 0 width) w))))
It again makes use of the
zipmap
idiom for applying mapping function with positional information. I wish I could find a reference which states whether this is a good thing or a bad thing :)In order to implement the GUI around this, we define
*world*
as an atom, and provide a mouse listener which on a left-click toggles elements, and on a right-click moves forward one step. Animating it would be fairly simple too (just hook up a Swing timer a la bubble sort).Full code is availabe on my Git Repository
Sunday 4 January 2009
Ray Tracing in Clojure (Part II)
In my previous post, I ported across a ray-tracing example from ANSI Common Lisp. Now, I'll try and work out how threading functions work.
Each area of the viewing image is independent, so let's divide up the viewing surface into a number of independent buffered images. We have a little helper function which given a width, height and tile size gives us back a list of positions to draw at.
For example, if we want to draw use a tile size of 150x150 with a viewing surface of 300x300 we get the following offsets:
Now we should make the
Now the function is pure because it always gives the same output for the same input and no IO occurs.
Finally, we need to distribute this work across all the cores. Clojure has a built in function, pmap that works the same as map but in parallel.
Initially I performed the drawing to the canvas within a
If you want to force the evaluation there's a number of functions which do that:
Pretty much full CPU utilization across all cores, with only a few lines of code changed, funky!
Each area of the viewing image is independent, so let's divide up the viewing surface into a number of independent buffered images. We have a little helper function which given a width, height and tile size gives us back a list of positions to draw at.
(defn create-work-list [width height unitX unitY] (let [xs (range 0 width unitX) ys (range 0 height unitY)] (mapcat (fn [x] (mapcat (fn [y] (list (list x y))) ys)) xs)))
For example, if we want to draw use a tile size of 150x150 with a viewing surface of 300x300 we get the following offsets:
user> (create-work-list 300 300 150 150) ((0 0) (0 150) (150 0) (150 150))
Now we should make the
ray-trace
function take notice of these co-ordinates. The previous version of the ray-trace function wasn't very functional as it performed IO (drawing to the canvas). Side-effects are the enemy of referential transparency.(defn ray-trace [world w h ox oy] (let [buffered-image (BufferedImage. w h BufferedImage/TYPE_BYTE_GRAY)] (doseq [x (range 0 (dec w))] (doseq [y (range 0 (dec h))] (.setRGB buffered-image x y (color-at (+ x ox) (+ y oy))))) buffered-image))
Now the function is pure because it always gives the same output for the same input and no IO occurs.
Finally, we need to distribute this work across all the cores. Clojure has a built in function, pmap that works the same as map but in parallel.
Initially I performed the drawing to the canvas within a
pmap
expression - this is very wrong! pmap
is lazy, it's not evaluated unless it is needed. Clojure doesn't know that I intended that to always been evaluated, so it was only ever drawing the first four tiles (presumably because that's how many cores my desktop has).If you want to force the evaluation there's a number of functions which do that:
doseq
dorun
doall
ray-trace
in a pmap
expression to produce a list of images, and then use doseq
to perform the side effects. (def canvas (proxy [JPanel] [] (paintComponent [g] (proxy-super paintComponent g) (.setColor g Color/RED) (let [width (.getWidth this) height (.getHeight this) unitX (/ width 16) unitY (/ height 16) work-list (create-work-list width height unitX unitY)] (doseq [image (pmap (fn [pos] (list (apply ray-trace (list world unitX unitY (first pos) (second pos))) (first pos) (second pos))) work-list)] (.drawImage g (first image) (second image) (nth image 2) unitX unitY nil))))))The separation of IO and pure functions is something advocated in "A wish list for the next mainstream programming language". Clojure doesn't force this, whereas something like Haskell does. Haskell uses monads to achieve this, which is something I'll visit at some point. See LtU for some explanations. Ok, enough theory - what difference does this actually make? Well for me, about a 4x difference, and I have 4 cores, so that's good! Timing's aren't that exciting, but you can see the difference with the system monitor.
Pretty much full CPU utilization across all cores, with only a few lines of code changed, funky!
Saturday 3 January 2009
Ray Tracing in Clojure
ANSI Common Lisp, by Paul Graham, is a great introduction to Lisp. Lots of compelling examples on run-length encoding (see previous post), poetry generation and ray-tracing.
This post looks at translating the example (from Chapter 9) into Clojure.
Ray-tracing is a very simple technique. From ACL:
"To generate a 3D image, we need to define at least four things: an eye, one or more light sources, a simulated world consisting of one or more surfaces, and a plane (the image plane) that serves as a window onto this world. The image we generate is the projection of the world onto a region of the image plane."
So how do we generate the pictures? That's pretty simple too, all we do is for every pixel in the image plane just trace the ray that the eye would see from there. Done.
We start off by defining some simple maths functions and an object to represent a point in 3D space.
The original Lisp code mixed the point structure with individual values. I felt this made the code a bit ugly and hard to read, so in here we try to use the point structure as much as possible.
Anyway, the code above is self explanatory,
Next we need to define some of the environment. For this we'll fix the image plan between (0,0) and (300,300) and we'll just render spheres. Each surface has a grey-scale colour associated with it (a surface).
One thing I did find was that Clojure doesn't support the
The following functions determine where a sphere gets hit with a ray from a specific source (in this case a point) and the surface normal of a hit.
That's it for the machinery to actually generate the image - now we need some UI and something to actually draw it. The original code generated a PPM format image, but since Clojure has a decent UI toolkit with Swing, let's just render something in a Window instead. The UI just uses the "canvas" idiom I used for the bubble sort application.
All that remains is to define the
This post looks at translating the example (from Chapter 9) into Clojure.
Ray-tracing is a very simple technique. From ACL:
"To generate a 3D image, we need to define at least four things: an eye, one or more light sources, a simulated world consisting of one or more surfaces, and a plane (the image plane) that serves as a window onto this world. The image we generate is the projection of the world onto a region of the image plane."
So how do we generate the pictures? That's pretty simple too, all we do is for every pixel in the image plane just trace the ray that the eye would see from there. Done.
We start off by defining some simple maths functions and an object to represent a point in 3D space.
(defn square [x] (* x x)) (defstruct point :x :y :z) (defn magnitude [p] (Math/sqrt (+ (square (:x p)) (square (:y p)) (square (:z p))))) (defn unit-vector [p] (let [d (magnitude p)] (struct point (/ (:x p) d) (/ (:y p) d) (/ (:z p) d)))) (defn point-subtract [p1 p2] (struct point (- (:x p1) (:x p2)) (- (:y p1) (:y p2)) (- (:z p1) (:z p2)))) (defn distance [p1 p2] (magnitude (point-subtract p1 p2))) (defn minroot [a b c] (if (zero? a) (/ (- c) b) (let [disc (- (square b) (* 4 a c))] (if (> disc 0) (let [discroot (Math/sqrt disc)] (min (/ (+ (- b) discroot) (* 2 a)) (/ (- (- b) discroot) (* 2 a))))))))
The original Lisp code mixed the point structure with individual values. I felt this made the code a bit ugly and hard to read, so in here we try to use the point structure as much as possible.
(struct point 1 2 3)
feels like clunky syntax, but I was unable to find anything better. Perhaps an alternative is to just use a plain vector / map? Or wait for the future and see if struct support improves? Anyway, the code above is self explanatory,
minroot
is the big one and that's just a solver for the quadratic equation. function.Next we need to define some of the environment. For this we'll fix the image plan between (0,0) and (300,300) and we'll just render spheres. Each surface has a grey-scale colour associated with it (a surface).
(def eye (struct point 150 150 200)) (defstruct surface :color) (defstruct sphere :color :radius :centre) (defn defsphere [point r c] (struct sphere c r point)) (def world [(defsphere (struct point 150 150 -600) 250 0.32) (defsphere (struct point 175 175 -300) 100 0.64)])
One thing I did find was that Clojure doesn't support the
:include
for structs that Common Lisp does. For this example, the world is a couple of spheres one smaller than the other and in front (and slightly brighter).The following functions determine where a sphere gets hit with a ray from a specific source (in this case a point) and the surface normal of a hit.
(defn sphere-normal [s pt] (let [c (:centre s)] (unit-vector (point-subtract c pt)))) (defn sphere-intersect [s pt ray] (let [c (:centre s) n (minroot (+ (square (:x ray)) (square (:y ray)) (square (:z ray))) (* 2 (+ (* (- (:x pt) (:x c)) (:x ray)) (* (- (:y pt) (:y c)) (:y ray)) (* (- (:z pt) (:z c)) (:z ray)))) (+ (square (- (:x pt) (:x c))) (square (- (:y pt) (:y c))) (square (- (:z pt) (:z c))) (- (square (:radius s)))))] (if n (struct point (+ (:x pt) (* n (:x ray))) (+ (:y pt) (* n (:y ray))) (+ (:z pt) (* n (:z ray)))))))
sphere-intersect
can return nil if it doesn't hit. Now we define the Lambert function(defn lambert [s intersection ray] (let [normal (sphere-normal s intersection)] (max 0 (+ (* (:x ray) (:x normal)) (* (:y ray) (:y normal)) (* (:z ray) (:z normal))))))
That's it for the machinery to actually generate the image - now we need some UI and something to actually draw it. The original code generated a PPM format image, but since Clojure has a decent UI toolkit with Swing, let's just render something in a Window instead. The UI just uses the "canvas" idiom I used for the bubble sort application.
(def canvas (proxy [JPanel] [] (paintComponent [g] (proxy-super paintComponent g) (.setColor g Color/RED) (ray-trace world 1 g (.getWidth this) (.getHeight this))))) (defn raytraceapp [] (let [frame (JFrame. "Ray Tracing")] (doto frame (.add canvas) (.setSize 300 300) (.setResizable false) (.setVisible true))))
All that remains is to define the
ray-trace
function;; second item = what we hit ;; first item = where we hit (defn first-hit [pt ray] (reduce (fn [x y] (let [hx (first x) hy (first y)] (cond (nil? hx) y (nil? hy) x :else (let [d1 (distance hx pt) d2 (distance hy pt)] (if (< d1 d2) x y))))) (map (fn [obj] (let [h (sphere-intersect obj pt ray)] (if (not (nil? h)) [h obj]))) world))) (defn send-ray [src ray] (let [hit (first-hit src ray)] (if (not (nil? hit)) (let [int (first hit) s (second hit)] (* (lambert s ray int) (:color s))) 0))) (defn color-at [x y] (let [ray (unit-vector (point-subtract (struct point x y 0) eye))] (* (send-ray eye ray) 255))) (defn ray-trace [world res g w h] (let [buffered-image (BufferedImage. w h BufferedImage/TYPE_BYTE_GRAY)] (doseq [x (range 1 w)] (doseq [y (range 1 h)] (.setRGB buffered-image x y (color-at x y)))) (.drawImage g buffered-image 0 0 Color/RED nil)))The only major difference between this and the ACL code, is prefering to use
map
and reduce
instead of the nested do
code. This feels more functional to me and also opens up parallelism opportunities which I'll look at for the next post. So what does it look like (not very good, but a background of code looks cool!)?
Thursday 1 January 2009
Functional Programming Patterns.
Roundabout presents a set of patterns for writing recursive programs.
The relationship between patterns and refactoring has already been written about ("Refactoring to Patterns"). In this case the relationships are quite clear.
Not all of the patterns are directly applicable to all languages. For example, Program Derivation is unwarranted for Lisp type languages because macros provide all those capabilities. I guess that's another one for the design patterns are a weakness of the language meme.
- Structural Recursion - use recursion to traverse inductively defined data structures (e.g. list ortree)
- Interface Procedure - use a helper method to adapt a function call if you need an extra argument
- Mutual Recursion - use multiple recursive functions to traverse inductively defined data structures where elements in the data structure are also inductively defined
- Accumulator Variable - use an additional parameter to carry calculations through recursion
- Syntax Procedure - extract methods that mean something rather than accessing raw members of data structures (e.g. for trees written as lists use
(lhs foo)
instead of(first foo)
- Local Procedure - use anonymous methods to hide details from the outside world
- Program Derivation - inline code if you want more performance
The relationship between patterns and refactoring has already been written about ("Refactoring to Patterns"). In this case the relationships are quite clear.
- Program Derivation => inline method.
- Interface Procedure => adapter pattern
- Local Procedure => encapsulation
- Syntax Procedure => extract method
- Structural Recursion/ Mutual Recursion => Visitor Pattern (might be a bit tenuous!)
Not all of the patterns are directly applicable to all languages. For example, Program Derivation is unwarranted for Lisp type languages because macros provide all those capabilities. I guess that's another one for the design patterns are a weakness of the language meme.
Subscribe to:
Posts (Atom)