Fatvat: STUDENT

After adventures with Google Wave, back to PAIP. Next on the list is understanding and implementing the STUDENT problem solving system. The abstract of Daniel Bobrow's thesis is shown below:

The STUDENT problem solving system, programmed in LISP, accepts as input a comfortable but restricted subset of English which can express a wide variety of algebra story problems. STUDENT finds the solution to a large class of these problems. STUDENT can utilize a store of global information not specific to any one problem, and may make assumptions about the interpretation of ambiguities in the wording of the problem being solved. If it uses such information or makes any assumptions, STUDENT communicates this fact to the user.

On a side note, being at MIT in the 60's must have been amazing. The acknowledgements name Noam Chomsky, Marvin Minksy, Seymour Papert and Herbert Simon. Wow!

STUDENT consists of two parts. Linguistic analysis (pattern matching) is used to break the sentences into equations. The second part is simply solving those equations. This can be thought of as the precursor to systems like Wolfram Alpha. The Linguistic Analysis searches for certain patterns (kernel sentences). This pattern matching is expressed very simply using the pattern matching tools discussed previously. PAIP makes heavy use of symbols which means you need to learn how to escape. Clojure uses , as a symbol so if you want to put it in a pattern you need to use a back quote and use a ~ to evaluate the inner bit. For example:

`[~[(list 'if '?x* (symbol ",") 'then '?y*) '(?x ?y)]]

Constraint Propagation is a problem solving technique that relies on logical inference, not search. A good example would be Sudoku. Placing a number in a square places constraints on the other square (e.g. once you've put a 1 in a row, no other number in that row can be a 1). There's a good description of using constraint propagation to solve Sudoku here.

The equation solving part of Norvig's implementation of STUDENT is a good example of making a complex sounding problem simple. The algorithm for solving a list of equations is dead simple and the Clojure code itself explains it well.

Find an equation with one unknown, isolate the unknown, substitute and repeat until solved. Isolate itself is a simple rule based approach to isolating a variable on one side.

So how does this constraint propagation work?

(solve-equations '((= (+ 3 4) (* (- 5 (+ 2 x)) 7))
(= (+ (* 3 x) y) 12)))

From the list of equations we figure that the first equation has a single unknown, x. This is isolated by repeatedly applying the simple rules until we determine x is 2. This is then replaced in the second equation and the results returned.

One big difference between my Clojure implementation and the ANSI Common Lisp is the use of the struct to represent the expressions. I (foolishly!) represented mine with named keys which made life very painful. Lisp's defstruct is more powerful than Clojure's because it supports more representations. For example, you can choose whether you want a list or hash representation; in my case I went for hash in Clojure (that's the only choice), but that was painful. This could be addressed by defining a better defstruct for Clojure.

Friday, 24 July 2009

STUDENT