Sunday 12 July 2009

Merging RSS Feeds

In order to investigate XML parsing in Clojure I knocked up a quick utility to aggregate RSS feeds. The idea is that a user provides a number of URLs, runs the program and a merged RSS feed is returned, sorted by publish date.

Firstly, let's write a trivial function to join any number of lists together according to a selection function. This will be used to merge together the XML by selecting the minimum published date each time.
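A minimal sketch of such a function, written to be consistent with the REPL examples that follow. It assumes that f returns one of its two arguments and that elements can be compared with =; the body is an illustration, not necessarily the original implementation.

```clojure
(defn join-all
  "Lazily merges any number of sequences. At each step f picks one element
  from the heads of the non-empty sequences; that element is emitted and
  removed from the sequence it came from."
  [f & colls]
  (lazy-seq
    (let [colls (remove empty? colls)]            ; ignore exhausted inputs
      (when (seq colls)
        (let [chosen (reduce f (map first colls)) ; winner among the heads
              ;; locate the first sequence whose head is the winner
              [before [hit & after]] (split-with #(not= (first %) chosen) colls)]
          (cons chosen
                (apply join-all f (concat before [(rest hit)] after))))))))
```

If every input is already sorted with respect to f (as in the examples with min), the result is sorted too. The function only ever looks at the head of each sequence, so unsorted inputs are interleaved rather than sorted.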



The function f should be a function of two arguments that returns one of the arguments. For example:


user> (join-all min [1 2 3] [1 1 3])
(1 1 1 2 3 3)

user> (join-all min [2 4 6 8 10] [1 3 5 7 9])
(1 2 3 4 5 6 7 8 9 10)


Next, we want some utilities to work with the RSS format and select various XML elements. Clojure Contrib provides a lazy XML package (clojure.contrib.lazy-xml), together with some utilities to make XML zip filtering easier (I previously looked at zip filtering here).

Since the examples for clojure.contrib.zip-filter.xml already use RSS this is really trivial:
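As a rough stand-in that runs without Clojure Contrib, the sketch below uses only clojure.xml and xml-seq from the core library instead of zip-filter. The names sample-feed, parse-feed, items and child-text are hypothetical, and the feed is a made-up two-item document.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; A tiny, invented RSS document used only for illustration.
(def sample-feed
  (str "<rss version=\"2.0\"><channel><title>Example</title>"
       "<item><title>First</title>"
       "<pubDate>Sat, 11 Jul 2009 09:00:00 GMT</pubDate></item>"
       "<item><title>Second</title>"
       "<pubDate>Sun, 12 Jul 2009 09:00:00 GMT</pubDate></item>"
       "</channel></rss>"))

(defn parse-feed
  "Parses an RSS document held in a string into clojure.xml element maps."
  [s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn items
  "Returns every <item> element found anywhere in the parsed feed."
  [feed]
  (filter #(= :item (:tag %)) (xml-seq feed)))

(defn child-text
  "Returns the text of the first direct child of node with the given tag."
  [node tag]
  (->> (:content node)
       (filter #(= tag (:tag %)))
       first
       :content
       first))
```

For example, (map #(child-text % :pubDate) (items (parse-feed sample-feed))) yields the two publish dates in document order.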



Note that the code above already handles URLs: if the argument supplied to parse-trim is a URI, it is resolved, retrieved and parsed. Finally, all we need to do is put it together:



Comparing XML dates is very tiresome in Java because its supplied date/time libraries are painful. Thankfully, Joda Time provides a solution (see here for a description of the best way to parse a date time).
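As a rough illustration of the comparison step, the sketch below uses java.time, the standard-library successor to Joda Time on Java 8 and later, rather than Joda Time itself. The names parse-pub-date and earlier are hypothetical, and the code assumes pubDate strings in the RFC 822/1123 form that RSS 2.0 uses.

```clojure
(import '(java.time ZonedDateTime)
        '(java.time.format DateTimeFormatter))

(defn parse-pub-date
  "Parses an RSS pubDate such as \"Sun, 12 Jul 2009 10:00:00 GMT\"."
  [s]
  (ZonedDateTime/parse s DateTimeFormatter/RFC_1123_DATE_TIME))

(defn earlier
  "Returns whichever of the two pubDate strings denotes the earlier instant.
  Ties go to the first argument."
  [a b]
  (if (.isAfter (parse-pub-date a) (parse-pub-date b)) b a))

;; 23:30 +0100 on the 11th is 22:30 GMT, so it precedes 10:00 GMT on the 12th.
(earlier "Sun, 12 Jul 2009 10:00:00 GMT"
         "Sat, 11 Jul 2009 23:30:00 +0100")
;; => "Sat, 11 Jul 2009 23:30:00 +0100"
```

Because earlier returns one of its two arguments, it has the shape that join-all expects for f.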