Thursday, 18 June 2009

Generalizing the General Problem Solver

The last post has a basic implementation of the General Problem Solver from PAIP, but it features a few big problems.

  1. "Running around the Block Problem" - In order for an action to occur it must have a consequence. How would you represent running around in circles?
  2. "Clobbered Sibling Problem" - The function to achieve the goal in the previous version just focuses on whether the each item in the list of goal states is achieved at least once by the end of the solution. By the time the final solution is reached sibling goals may have been undone.
  3. "Leaping before you look" - If we fail to find a solution we execute all actions on the way (looking before you leak). For example, we might execute jump off cliff before executing wear parachute!
  4. "Recursive Sub Goal" - To achieve A you need B, which needs A thus resulting in an infinite loop (or recursion in FP terms!).
Before we see how to solve these problems, Norvig introduces some nice debug functions, shown below in Clojure form. The functions allow you to start listening to specific tokens. Lines will only be printing if you are listening to the specific keyword. This is an example of a function that should probably be rewritten as a macro so that you only pay the cost at compile time. Of course that assumes that you aren't changing the tokens you want to listen to at run time.
(def *dbg-ids* (ref #{:gps}))

(defn dbg 
[id format-string & args]
"Print debugging info if (DEBUG id) has been specified"
(when (contains? id @*dbg-ids*)
(println (format format-string args))))

(defn debug
[& ids]
"Start dbg output on the given ids"
(dosync
(alter *dbg-ids* (fn [x] (set (union x ids))))))

(defn undebug
[& ids]
"Stop dbg output on the given ids"
(dosync
(alter *dbg-ids* (fn [x] (difference x ids)))))

(defn dbg-indent
[id indent format-string & args]
"Print indented debugging info if (DEBUG ID) has been specified"
(when (@*dbg-ids* id)
(println (format (str (apply str (repeat indent \space)) format-string) args))))

The main difference from the previous code is achieve-all which ensures all goals are achieved throughout the process and the use of the executing convention to denote actions that are being executed. As you'd expect the primary difference with Common Lisp is that we've got to use transactions to perform mutability. In achieve-all, I've used an atom to represent the current state. Atoms are used when only one thread needs access to the code and all changes are synchronous. In this case, the atom exists only for one loop hence an atom is the right choice. Clojure's support of sets meant that representing the goal, add and delete lists as sets made sense. Representing the result as a set is wrong though, because I want to return the list of actions that must be performed in the correct order (see the version history on Git to see the catalogue of mistakes I made before realizing this!).
(defn contains-value?
[coll val]
(not (nil? (some (partial = val) coll))))

(defn executing?
[x]
"Is x of the form: (executing ...)?"
(and (seq? x) (= 'executing (first x))))

(defn convert-op
[op]
"Make op conform the the (EXECUTING op) convention"
(if-not (some executing? (:add-list op))
(struct operation 
(:action op) 
(:preconditions op) 
(set (conj (:add-list op) (list 'executing (:action op))))
(:del-list op))
op))

(defn make-op
[action preconditions add-list del-list]
(convert-op (struct operation action preconditions add-list del-list)))

(defn appropriate?
[goal operation]
"An op is appropriate to a goal if it is in its add list"
(contains-value? (:add-list operation) goal))

(declare achieve-all)

(defn apply-op
[state goal op goal-stack]
"Return a new, transformed state if op is applicable."
(dbg-indent :gps (count goal-stack) "Consider: %s" (:action op))
(let [new-state (achieve-all state (:preconditions op) (cons goal goal-stack))]
(when-not (nil? state)
(dbg-indent :gps (count goal-stack) "Action: %s" (:action op))
(concat (remove (fn [x] (= x (:del-list op))) new-state)
(:add-list op)))))

(defn achieve
[state goal goal-stack]
"A goal is achieved if it already holds,
or if there is an appropriate op for it that is applicable"
(dbg-indent :gps (count goal-stack) "Goal: %s" goal)

(cond
(contains-value? state goal) state
(contains-value? goal-stack goal) nil
:else (some (fn [op] (apply-op state goal op goal-stack))
(filter (fn [x] (appropriate? goal x)) @*ops*))))

(defn sequential-subset?
[s1 s2]
(and (<= (count s1) (count s2))
(every? (fn [x] (contains-value? s2 x)) s1)))

(defn achieve-all 
[state goals goal-stack]
"Achieve each goal, and make sure they still hold at the end."
(let [current-state (atom state)]
(if (and (every? (fn [g] (swap! current-state 
(fn [s] (achieve s g goal-stack)))) goals)
(sequential-subset? goals @current-state))
@current-state)))


(defn GPS
[state goals ops]
"General Problem Solver: from state, achieve using ops"
(dosync
(ref-set *ops* ops))
(remove (comp not sequential?) (achieve-all (cons (list 'start) state) goals [])))

GPS is a semi-predicate (a function that returns nil on failure and some useful value otherwise). It returns the list of actions required to reach the goal, or nil. This GPS can solve a larger class of problems, and Norvig gives the example of the "Monkey and Bananas" problem attributed to Saul Amarel. The output from the REPL shows this implementation of the "General Problem Solver" working on this problem:
uk.co.fatvat.gps2> (monkey-and-bananas)
Goal: (not-hungry)
Consider: (eat-bananas)
Goal: (has-bananas)
Consider: (grasp-bananas)
Goal: (empty-handed)
Consider: (drop-ball)
Goal: (has-ball)
Action: (drop-ball)
Goal: (at-bananas)
Consider: (climb-on-chair)
Goal: (at-middle-room)
Consider: (push-chair-from-door-to-middle-room)
Goal: (at-door)
Goal: (chair-at-door)
Action: (push-chair-from-door-to-middle-room)
Goal: (chair-at-middle-room)
Goal: (on-floor)
Action: (climb-on-chair)
Action: (grasp-bananas)
Action: (eat-bananas)
((start) 
(executing drop-ball) 
(executing push-chair-from-door-to-middle-room)
(executing climb-on-chair)
(executing grasp-bananas)
(executing eat-bananas))
So does the GPS live up to the promises? As you may have guessed, not quite! It requires a complete specification of the problem to be useful. Even if you do have a perfect description of the problem, it might take forever to find the answer (see NP-hard problems for many examples). Full code is available on my Clojure Project git page.