Building a report from a log file (or collection of log files) typically requires:
- A representation of the events I'm interested in
- A predicate for determining whether the line is an event (String -> Bool)
- Converting the line into an event (String -> Event)
- Folding the results into a single report ([Event] -> Report)
In this example, I wanted to parse my dpkg logs (/var/log/dpkg.log) to see how many times I've upgraded packages on Ubuntu. Firstly I defined a type to represent an upgrade event:
type Package = String

data Upgrade = Upgrade { packageName :: Package
                       , updateTime :: UTCTime }

instance Show Upgrade where
    show a = show (updateTime a) ++
             ":" ++ show (packageName a)
Writing instance Show Upgrade is similar to deriving Show, but allows you to customize how the value will be converted to a string.
The next step is to get a predicate and to parse the lines. I decided to combine these into one function and return Maybe Upgrade to indicate success or failure. I used the Data.Time modules to parse the date and time. The parsing is terrible but suffices for now, as I'm just trying to get an idea of where I need to generalize. Note to self: read (and ideally understand!) Monadic Parsing Combinators [PDF] and the associated Haskell module.
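-- Assumed format, matching the "YYYY-MM-DD HH:MM:SS" timestamps in dpkg.log.
timeFormat :: String
timeFormat = "%Y-%m-%d %H:%M:%S"

-- parseTimeM supersedes the deprecated parseTime; True tolerates the
-- trailing space left by take 20. fromJust crashes on a malformed timestamp.
getTime :: String -> UTCTime
getTime = fromJust . parseTimeM True defaultTimeLocale timeFormat

getPackageName :: String -> String
getPackageName = takeWhile (not . Char.isSpace)

-- Poor man's parsing: relies on fixed column offsets.
parseLine :: String -> Maybe Upgrade
parseLine s
    | isInfixOf " upgrade " s = Just
        (Upgrade
            (getPackageName (drop 28 s))
            (getTime (take 20 s)))
    | otherwise = Nothing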
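The fixed offsets (take 20, drop 28) break as soon as a line deviates from the expected layout. A slightly less brittle variant could split on whitespace instead. This is only a sketch: parseLineWords is a hypothetical name, the field order (date, time, action, package, versions) is an assumption about dpkg.log, and the types are repeated (with a derived Show) so the snippet stands alone.

```haskell
import Data.Time (UTCTime, defaultTimeLocale, parseTimeM)

type Package = String

data Upgrade = Upgrade
  { packageName :: Package
  , updateTime  :: UTCTime
  } deriving (Eq, Show)

-- Assumed layout of an upgrade line in dpkg.log:
--   <date> <time> upgrade <package> <old-version> <new-version>
-- Returns Nothing for any other action or for an unparsable timestamp,
-- so there is no fromJust to crash on bad input.
parseLineWords :: String -> Maybe Upgrade
parseLineWords s = case words s of
  (date : time : "upgrade" : pkg : _) ->
    Upgrade pkg
      <$> parseTimeM True defaultTimeLocale "%Y-%m-%d %H:%M:%S" (date ++ " " ++ time)
  _ -> Nothing
```

Because the action is matched as its own field, a package whose name happens to contain "upgrade" no longer produces a false positive.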
All I need now is a combining function to use with foldl. For this I've defined a report of type Map Day [Package], which associates each day with the names of all the packages upgraded on that day.
processFile :: FilePath -> IO [Upgrade]
processFile s = do
    a <- readFile s
    return (Maybe.mapMaybe parseLine (lines a))

type Report = Map Day [Package]

combine :: [Upgrade] -> Report
combine = foldl addToReport Map.empty

addToReport :: Report -> Upgrade -> Report
addToReport r p = Map.insert day packages r where
    day = utctDay (updateTime p)
    initVal = Map.findWithDefault [] day r
    packages = packageName p : initVal

reportFile :: FilePath -> IO ()
reportFile f = do
    a <- processFile f
    print (combine a)
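The lookup-then-insert step in addToReport can also be expressed with Map.fromListWith from Data.Map, which merges values that share a key. The sketch below is one possible alternative, not a replacement for the code above: combineWith, upgradesPerDay and the sample data are hypothetical, and the types are repeated so the snippet stands alone.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Time (Day, UTCTime (..), fromGregorian)

type Package = String
data Upgrade = Upgrade { packageName :: Package, updateTime :: UTCTime }
type Report = Map Day [Package]

-- fromListWith calls (++) with the newer value first, so later upgrades
-- end up at the front of each day's list, as with the foldl version.
combineWith :: [Upgrade] -> Report
combineWith ups =
  Map.fromListWith (++) [ (utctDay (updateTime u), [packageName u]) | u <- ups ]

-- Number of upgrades recorded per day.
upgradesPerDay :: Report -> Map Day Int
upgradesPerDay = Map.map length

-- Made-up sample data, for illustration only.
sample :: [Upgrade]
sample =
  [ Upgrade "firefox-3.5" (UTCTime (fromGregorian 2009 7 1) 0)
  , Upgrade "libc6"       (UTCTime (fromGregorian 2009 7 1) 3600)
  , Upgrade "firefox-3.5" (UTCTime (fromGregorian 2009 7 4) 0)
  ]
```

With this sample, upgradesPerDay (combineWith sample) maps 2009-07-01 to 2 and 2009-07-04 to 1.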
Hurrah, so now I get output in the right format, and I can see that I really shouldn't have added some of the bleeding-edge Firefox 3.5 repositories to my Ubuntu upgrade paths. Upgrading Firefox (or Shiretoko) every few days is a bad thing. D'oh!
The next stage for me to understand is how I can generalize this. What I really want to develop next is a simple pluggable framework. It seems that I need to generalize at least the following bits:
- Parsing a line into a type T
- Combining [T] to produce a single report
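One way the two pluggable pieces listed above might be bundled is a record of functions, parameterized over the event and report types. This is only a rough sketch of the idea: Reporter, runReporter, reportOn and upgradeCount are all hypothetical names, and the toy instance assumes the whitespace-separated dpkg.log layout (date, time, action, package).

```haskell
import Data.Maybe (mapMaybe)

-- The two parts that vary from one kind of report to another.
data Reporter event report = Reporter
  { parseEvent :: String -> Maybe event   -- predicate and conversion in one
  , summarize  :: [event] -> report       -- fold the events into a report
  }

-- Pure core: split the input into lines, keep those that parse, summarize.
runReporter :: Reporter e r -> String -> r
runReporter rep = summarize rep . mapMaybe (parseEvent rep) . lines

-- Thin IO wrapper around the pure core.
reportOn :: Reporter e r -> FilePath -> IO r
reportOn rep path = runReporter rep <$> readFile path

-- Toy instance: count the upgrade lines in a dpkg-style log.
upgradeCount :: Reporter String Int
upgradeCount = Reporter
  { parseEvent = \l -> case words l of
      (_ : _ : "upgrade" : pkg : _) -> Just pkg
      _                             -> Nothing
  , summarize = length
  }
```

Keeping runReporter pure means a report can be tried out on an in-memory string before it is pointed at a real file with reportOn.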