Sunday 12 July 2009

Merging RSS Feeds

To investigate XML parsing in Clojure, I knocked up a quick utility to aggregate RSS feeds. The idea is that a user provides a number of URLs, runs the program, and gets back a single merged RSS feed sorted by publish date.

Firstly, let's write a trivial function to join any number of lists together according to a selection function. It will be used to merge the feeds by picking the item with the earliest publish date each time.

The selection function f takes two arguments and returns one of them. For example:

user> (join-all min [1 2 3] [1 1 3])
(1 1 1 2 3 3)

user> (join-all min [2 4 6 8 10] [1 3 5 7 9])
(1 2 3 4 5 6 7 8 9 10)
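
One way join-all could be written so that it behaves as in the examples above. This is a minimal sketch and not necessarily the original definition; it assumes that f always returns one of its two arguments and that the chosen value can be found again by equality.

```clojure
(defn join-all
  "Lazily merges colls by repeatedly taking the head element chosen by f."
  [f & colls]
  (lazy-seq
    (let [colls (remove empty? colls)]          ; ignore exhausted lists
      (when (seq colls)
        (let [pick (reduce f (map first colls)) ; f picks among the heads
              ;; find the first list whose head is the picked element
              [before [chosen & after]]
              (split-with #(not= (first %) pick) colls)]
          (cons pick
                (apply join-all f
                       (concat before [(rest chosen)] after))))))))
```

Because the result is wrapped in lazy-seq, only as much of each input is consumed as the caller actually reads.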

Next, we want some utilities for working with the RSS format and selecting XML elements. Clojure Contrib provides a lazy XML package, together with some utilities that make XML zip filtering easier (I previously looked at zip filtering here).

Since the examples for these utilities already use RSS, this is really trivial:
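
As a rough sketch of what such utilities can look like, the following uses only clojure.xml from core Clojure rather than the Contrib lazy-xml and zip-filter packages mentioned above. The function names (parse-str, children-tagged, items, child-text) and the sample feed are illustrative assumptions, not the original code.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

(defn parse-str
  "Parses an XML string into clojure.xml's nested-map representation."
  [s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn children-tagged
  "Returns the child elements of node that have the given tag."
  [node tag]
  (filter #(= tag (:tag %)) (:content node)))

(defn items
  "Returns every <item> element found under the <channel> of an RSS root."
  [rss]
  (mapcat #(children-tagged % :item) (children-tagged rss :channel)))

(defn child-text
  "Returns the text of the first child of node with the given tag, or nil."
  [node tag]
  (first (:content (first (children-tagged node tag)))))

;; Hypothetical two-item feed, used only to exercise the functions.
(def sample
  (str "<rss version=\"2.0\"><channel><title>Feed</title>"
       "<item><title>First</title>"
       "<pubDate>Sat, 11 Jul 2009 09:30:00 GMT</pubDate></item>"
       "<item><title>Second</title>"
       "<pubDate>Sun, 12 Jul 2009 10:00:00 GMT</pubDate></item>"
       "</channel></rss>"))

(map #(child-text % :title) (items (parse-str sample)))
;; => ("First" "Second")
```

clojure.xml/parse also accepts a string naming a URI, so the same functions can be pointed at a live feed by calling xml/parse on the URL directly instead of going through parse-str.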

Note that the code above already handles URLs (if the value supplied to parse-trim is a URI, it is resolved, retrieved and parsed). Finally, all we need to do is put it together:

Comparing XML dates is very tiresome in Java because its built-in date/time libraries are painful. Thankfully, Joda Time provides a solution (see here for a description of the best way to parse a date time).
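
To illustrate the kind of date comparison involved, here is a minimal sketch. Note that it swaps Joda Time for java.time (built into Java 8 and later) so that it runs without extra dependencies, and it assumes the feed's pubDate values are in the RFC 822/1123 style that RSS 2.0 uses. The names parse-pub-date and earlier are hypothetical.

```clojure
(import '(java.time ZonedDateTime)
        '(java.time.format DateTimeFormatter))

(defn parse-pub-date
  "Parses an RSS-style date such as \"Sun, 12 Jul 2009 10:00:00 GMT\"."
  [s]
  (ZonedDateTime/parse s DateTimeFormatter/RFC_1123_DATE_TIME))

(defn earlier
  "Returns whichever of the two date strings denotes the earlier instant
  (the first argument if they are equal)."
  [a b]
  (if (.isAfter (parse-pub-date a) (parse-pub-date b)) b a))

(earlier "Sun, 12 Jul 2009 10:00:00 GMT"
         "Sat, 11 Jul 2009 09:30:00 GMT")
;; => "Sat, 11 Jul 2009 09:30:00 GMT"
```

A function like earlier has the shape join-all expects of f: two arguments in, one of them out. For real feeds the comparison would be applied to each item's publish date rather than to bare strings.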