BEER BEAR BEAD BEAK BEAT BELT WELT WILT WILE WINE
Let's write a simple program that calculates word ladders for a simplified version of the game where each change can only change a single letter at a time (rather than allow deletes and inserts too). Finding a word ladder can be thought about as a graph problem. Each node is a word, and each edge represents a connection to another valid word. The graph below is a partial representation of the graph starting with BEER. The real graph is much more complicated!
First things first, how do we build the graph? For starters we need to know if two words are neighbours or not. The following simple functions determine the difference between two strings. Remember that I'm only working on the simplest possible distance metric at the moment, that is allowing a single character to change.
neighbour :: String -> String -> Bool neighbour x y = difference x y == 1 difference :: String -> String -> Int difference [] [] = 0 difference (x:xs) (y:ys) | x == y = difference xs ys | otherwise = 1 + difference xs ys difference _ _ = error "Two strings must be the same length"
Next thing we need to grab is a huge list of words. We'll store these words in a Set because we'll want to test for membership frequently (an O(lg(N)) operation. An alternative would be to use a perfect hash (this is an option because the dictionary is fixed). This would give O(1) lookup times, and it looks like there is (as almost always) a library on Hackage that does just that. The simple distance metric chosen means we can limit the number of words based on the size of the input word.
import qualified Data.Set as S type WordSet = S.Set String wordListPath :: String wordListPath = "/usr/share/dict/british-english" createDictionary :: Int -> IO WordSet createDictionary n = do file <- readFile wordListPath return $ S.fromList $ filter (\x -> length x == n && all isAlpha x) (map (map toLower) $ words file)
Once we've got a dictionary, all we need to do is build the graph. Since Haskell is lazy, we don't need to worry about the space complexity of the graph - we'll just build it lazily and only the bit that is explored will be resident in memory.
Each node contains the word it represents and the links to the child elements. The graph is built by starting at a root, and filling all the valid neighbours. Each time we place a word in the graph we remove it from the dictionary, otherwise we'll get cycles in the graph.
data Node = Node String [Node] deriving Show buildGraph :: WordSet -> String -> Node buildGraph wordset top = Node top (map (buildGraph smaller) neighbours) where neighbours = S.toList (S.filter (neighbour top) smaller) smaller = S.delete top wordset
The graph is *huge*, so we need to find some way to limit the search space. The most obvious way is to give a restriction on the depth of the search. A word ladder that is 10000 words rungs high is probably not much fun to complete. We can also cut the search short if the word is too many changes away given the maximum depth (for example, if 4 characters need to change in a 5 letter word and the maximum left to search is 3 then we can prune this search branch).
search :: Node -> Int -> String -> [String] search graph maxDepth goal = search' graph maxDepth goal [] search' :: Node -> Int -> String -> [String] -> [String] search' (Node end children) maxDepth goal path | end == goal = end : path | null children = [] | length path >= maxDepth = [] -- too deep | difference end goal >= maxDepth - length path = [] -- too much difference | otherwise = first where childRoutes = filter (not . null) $ map (\child -> search' child maxDepth goal (end : path)) children first | null childRoutes = [] | otherwise = head childRoutes quickest | null childRoutes = [] | otherwise = minimumBy (comparing length) childRoutes
The way we search the children is important. In this case we've used
first
gone for the first available route that satisfies the depth guarantee, but isn't guaranteed to be the shortest route. quickest
on the other hand calculates all child routes and finds the minimum length part.Finally, we can put this all together and write a simple search search function.
makeLadder :: Int-> String -> String -> IO [String] makeLadder maxDepth start end | length start /= length end = error "Only two strings of equal length are currently supported." | otherwise = do dict <- createDictionary (length start) if (S.member start dict && S.member end dict) then return $ search (buildGraph dict start) maxDepth end else return []The complete code for this version available on my git hub repo here. This version has several problems.
- It's too slow - searching for the minimal path can take considerable time
- It's not very flexible
data DistanceMetric = DistanceMetric (Word -> Word -> Int) (Word -> WordSet) difference :: Word -> Word -> Int difference x y | length x /= length y = 999999 | otherwise = sum $ zipWith (\c1 c2 -> if c1 == c2 then 0 else 1) x y transposeChar :: Word -> [Word] transposeChar [] = [] transposeChar (x:xs) = map (:xs) (validChars \\ [x]) deleteChar :: Word -> [Word] deleteChar [] = [] deleteChar (x:xs) = [xs] insertChar :: Word -> [Word] insertChar [] = [] insertChar (x:xs) = map (\y -> y:x:xs) validChars differenceEdit :: Word -> WordSet differenceEdit x = edit' x [transposeChar] editDistanceEdits :: Word -> WordSet editDistanceEdits x = edit' x [insertChar,transposeChar,deleteChar] edit' :: Word -> [Word -> [Word]] -> WordSet edit' w fns = S.fromList $ concat $ zipWith (\x y -> map (\z -> x ++ z) (concatMap (\x -> x y) fns)) (inits w) (tails w) simple :: DistanceMetric simple = DistanceMetric difference differenceEdit edits :: DistanceMetric edits = DistanceMetric editDistance editDistanceEditsThis gives two distance functions and two ways of generating edits. The Levenshtein distance of 1 is generated by transposing, deleting and inserting characters from the original word. This gives us the flexibility, because another distance metric could be put in place (anagrams perhaps?). Next to performance.
buildGraph :: DistanceMetric -> WordSet -> Word -> Node buildGraph d@(DistanceMetric dist edits) wordset top = Node top (map (buildGraph d smaller) neighbours) where possibleNeighbours = edits top neighbours = S.toList (smaller `S.intersection` possibleNeighbours) smaller = S.delete top wordset search :: DistanceMetric -> Node -> Int -> Word -> [Word] search (DistanceMetric dist _) graph maxDepth goal = search' graph [] where search' (Node end children) path | end == goal = end : path | length path >= maxDepth = [] -- too deep | dist end goal >= maxDepth - length path = [] -- too much difference | otherwise = first where -- Find the best node to search by comparing it against the goal costForNextChild :: [(Int,Node)] costForNextChild = zip (map (\(Node x _) -> dist x goal) children) children bestFirst = map snd $ sortBy (comparing fst) costForNextChild -- Best first search childRoutes = filter (not . null) $ map (\child -> search' child (end : path)) bestFirst first | null childRoutes = [] | otherwise = head childRoutesTwo things have changed from the original code. The first is that the graph is built by comparing the edits against the dictionary, rather than the word against the whole dictionary. This is the main saving and makes it *hugely* faster (thanks to jkkramer for the pointer and this post.) The only other change is that we decide which node to search next based on how close it is to the goal (a best-first search). With these changes it can now solve all of the problems I've tried at wordchains.com. Neat. The complete code is available here.