Wednesday, August 22, 2007
Twitter RSS in XSLT FTW
I've decided to geek out on Wednesdays. Today, I want to include a Twitter RSS feed (which, inspired by Joshua Allen, I tentatively plan to fill not with news of my clever doings but rather with filth about a fictional evil family) on the front page here at Ftrain. I'll do the coding in XSLT2.
(What I build below is just a toy. A grown-up XSLT2.0 RSS reader with a hardcore RFC 822 munger can be found over here. You can also look at the XSL FAQ.)
The Output and Input
First, here's the HTML output I'm hoping to create:
Tuesday, 21 Aug
7:24 p.m. — Grandpa Dieter up all night screaming take me to Maple St. synagogue so I can apologize. I'm like, for what? He just shakes his head. Crazy.
Now let's look at the input. Here's a typical Twitter RSS item:
<item>
  <title>Paul Ford: Grandpa Dieter up all night screaming take me to Maple St. synagogue so I can apologize. I'm like, for what? He just shakes his head. Crazy.</title>
  <description>Paul Ford: Grandpa Dieter up all night screaming take me to Maple St. synagogue so I can apologize. I'm like, for what? He just shakes his head. Crazy.</description>
  <pubDate>Tue, 21 Aug 2007 23:24:45 +0000</pubDate>
  <guid>http://twitter.com/paul_e_ford/statuses/218935552</guid>
  <link>http://twitter.com/paul_e_ford/statuses/218935552</link>
</item>
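For comparison outside XSLT, here is a minimal Python sketch (not part of the Ftrain setup) that pulls out the three fields the stylesheet actually uses, <pubDate>, <link>, and <description>, with the standard library's ElementTree. `read_item` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# The Twitter <item> shown above, minus <title> and <guid>, which the
# stylesheet never reads.
ITEM_XML = """<item>
  <description>Paul Ford: Grandpa Dieter up all night screaming take me to Maple St. synagogue so I can apologize. I'm like, for what? He just shakes his head. Crazy.</description>
  <pubDate>Tue, 21 Aug 2007 23:24:45 +0000</pubDate>
  <link>http://twitter.com/paul_e_ford/statuses/218935552</link>
</item>"""

def read_item(xml_text):
    """Return the three fields the sidebar uses from one RSS <item>."""
    item = ET.fromstring(xml_text)
    return {
        "pubDate": item.findtext("pubDate"),
        "link": item.findtext("link"),
        "description": item.findtext("description"),
    }

fields = read_item(ITEM_XML)
print(fields["pubDate"])  # Tue, 21 Aug 2007 23:24:45 +0000
```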
You know how the Iraq War solved 9/11? That's how RSS 2.0 solved syndication. As you can see above, dates in RSS 2.0 are mistakenly formatted according to RFC 822, a standard published in August 1982, during Reagan's first term. I'll make this mistake livable by writing some functions that turn RFC 822 dates into strings in the xs:dateTime format that XSLT can understand.
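Python's standard library happens to parse RFC 822 dates directly, which makes it a convenient cross-check for the hand-rolled XSLT functions that follow. A minimal sketch, using the pubDate from the sample item:

```python
from email.utils import parsedate_to_datetime

# Parse the sample pubDate; a numeric +0000 zone yields a timezone-aware
# datetime in UTC.
dt = parsedate_to_datetime("Tue, 21 Aug 2007 23:24:45 +0000")
print(dt.isoformat())  # 2007-08-21T23:24:45+00:00
```

That ISO 8601 string is exactly the shape the XSLT functions below need to produce by hand.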
Date Parser
First, since months in RFC 822 are written as abbreviated English month names, e.g. “Jan” and “Feb,” I need a way to know the numeric value of each month. I create a global variable called $months.
<xsl:variable name="months">
  <xsl:for-each select="tokenize('Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec', ' ')">
    <f:m pos="{format-number(position(),'00')}"><xsl:value-of select="."/></f:m>
  </xsl:for-each>
</xsl:variable>
Don't worry about what that means just yet. What it does is make a temporary XML tree that I can use for my own purposes:
<f:m pos="01">Jan</f:m>
<f:m pos="02">Feb</f:m>
<f:m pos="03">Mar</f:m>
<f:m pos="04">Apr</f:m>
<f:m pos="05">May</f:m>
<f:m pos="06">Jun</f:m>
<f:m pos="07">Jul</f:m>
<f:m pos="08">Aug</f:m>
<f:m pos="09">Sep</f:m>
<f:m pos="10">Oct</f:m>
<f:m pos="11">Nov</f:m>
<f:m pos="12">Dec</f:m>
Now, if I need to know which month Jan represents, I just use this XPath expression:
$months//f:m[.='Jan']/@pos
The question it answers is “what is the value of the @pos attribute of the <f:m> element whose value is equal to Jan?” The answer is always 01.
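The same lookup table can be sketched in Python as a dictionary built the same way: number the abbreviated names 1 through 12 and zero-pad them. This only illustrates the technique; `MONTHS` is a name made up for the example. The dictionary maps name to number, so the lookup is a direct key access rather than a search over elements.

```python
# Equivalent of the $months tree: 'Jan' -> '01', ..., 'Dec' -> '12'.
MONTHS = {
    name: format(pos, "02d")
    for pos, name in enumerate(
        "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(" "), start=1
    )
}

print(MONTHS["Jan"])  # 01
print(MONTHS["Aug"])  # 08
```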
So now that I can find out the numeric value of the month, I'll write a function that takes a more-or-less normal RFC 822 date/time string and returns a string that XSLT can easily convert to a date. The idea here is to take
Tue, 21 Aug 2007 23:24:45 +0000
for example, and turn it into
2007-08-21T23:24:45+00:00.
<xsl:function name="f:rfc822-to-xs">
  <xsl:param name="rfc822"/>
  <!-- 'Tue, 21 Aug 2007 23:24:45 +0000' splits into six items, counted from 1:
       day name, day, month, year, time, zone. -->
  <xsl:variable name="date-list" select="tokenize($rfc822 ,' ')"/>
  <xsl:variable name="year" select="$date-list[4]"/>
  <xsl:variable name="month" select="$months//f:m[.=$date-list[3]]/@pos"/>
  <xsl:variable name="day" select="format-number(xs:integer($date-list[2]),'00')"/>
  <xsl:variable name="time" select="$date-list[5]"/>
  <!-- A zone such as '+0000' is sign and hours (characters 1 to 3) plus minutes (4 and 5). -->
  <xsl:variable name="zone-hour" select="substring($date-list[6],1,3)"/>
  <xsl:variable name="zone-minute" select="substring($date-list[6],4,2)"/>
  <xsl:sequence select="concat($year,'-',$month,'-',$day,'T',$time,$zone-hour,':',$zone-minute)"/>
</xsl:function>
The tokenize() function splits the incoming string at each space into a sequence of strings, which is stored in the variable $date-list. I chop up the items in $date-list a little more, formatting numbers, looking up month values in the $months variable we created above, and cutting substrings out of the zone offset to build the time zone; then I assign the pieces to new variables.
Once I'm done slicing, I put everything into the proper sequence with concat(). Concat is the seaweed paper in my dateTime sushi. The resulting string can be parsed as a date in XSLT. (Of course, this doesn't prepare us for all sorts of things that can go wrong: two-digit years, a missing day name, named time zones like EST, and so forth. But it works for Twitter RSS, so far.)
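Here is a Python port of f:rfc822-to-xs, a sketch for checking the slicing logic by hand. It makes the same assumptions as the XSLT (a leading day name, a four-digit year, a numeric zone offset), and `rfc822_to_xs` and `MONTHS` are illustrative names. The zone minutes live at characters 4 and 5 of an offset like -0430; slicing from character 3 instead would produce -04:43.

```python
# Month lookup, equivalent to the $months tree.
MONTHS = {
    name: format(pos, "02d")
    for pos, name in enumerate(
        "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(" "), start=1
    )
}

def rfc822_to_xs(rfc822):
    """Rewrite 'Tue, 21 Aug 2007 23:24:45 +0000' as '2007-08-21T23:24:45+00:00'."""
    date_list = rfc822.split(" ")           # tokenize($rfc822, ' ')
    # XPath counts from 1 and Python from 0, so $date-list[2] is date_list[1].
    day = format(int(date_list[1]), "02d")  # format-number(..., '00')
    month = MONTHS[date_list[2]]
    year = date_list[3]
    time = date_list[4]
    zone = date_list[5]                     # e.g. '+0000' or '-0430'
    zone_hour = zone[0:3]                   # sign and hours: substring(., 1, 3)
    zone_minute = zone[3:5]                 # minutes: substring(., 4, 2)
    return f"{year}-{month}-{day}T{time}{zone_hour}:{zone_minute}"

print(rfc822_to_xs("Tue, 21 Aug 2007 23:24:45 +0000"))  # 2007-08-21T23:24:45+00:00
```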
That function is never called directly. Instead, I write two functions to produce actual xs:date or xs:dateTime values.
<xsl:function name="f:rfc822-to-dateTime">
  <xsl:param name="rfc822"/>
  <!-- xsl:sequence returns the typed value itself; xsl:value-of would wrap it in a text node. -->
  <xsl:sequence select="adjust-dateTime-to-timezone(xs:dateTime(f:rfc822-to-xs($rfc822)))"/>
</xsl:function>
<xsl:function name="f:rfc822-to-date">
  <xsl:param name="rfc822"/>
  <xsl:sequence select="xs:date(substring(xs:string(f:rfc822-to-dateTime($rfc822)),1,10))"/>
</xsl:function>
The first function, f:rfc822-to-dateTime(), calls f:rfc822-to-xs() to turn the RFC 822 date into a more XSLT-friendly format. Then it turns that into an xs:dateTime and adjusts it to the processor's implicit timezone, which is normally the timezone of the machine running the transform. I'm on Eastern time, which in August is daylight time (four hours behind UTC), so this turns the time we started with from 2007-08-21T23:24:45+00:00 into 2007-08-21T19:24:45-04:00.
The second function, f:rfc822-to-date(), repeats all that by calling the first function (we want to adjust the time zone properly first thing), then slices off the first ten characters of the xs:dateTime and turns them into an xs:date. So an RFC 822 date that becomes 2007-08-21T23:24:45+00:00 is adjusted to the xs:dateTime 2007-08-21T19:24:45-04:00, turned into text, cut down to 2007-08-21, and turned into an xs:date. The adjustment has to come first: a post made at 2:00 a.m. UTC belongs, on the East Coast, to the previous evening. (The string-slicing is the long way around, as it turns out: XPath 2.0 does allow casting an xs:dateTime straight to an xs:date with xs:date($dateTime), which keeps the timezone and drops the time. The serious XSLT practitioner does not ask why but waits to learn.)
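A Python sketch of the same two steps, assuming a fixed UTC-4 offset in place of the processor's implicit timezone (real code would use the machine's zone or a timezone database). The last call shows why the adjustment has to happen before the date is cut off: a post made shortly after midnight UTC lands on the previous day on Eastern time.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-4 offset (US Eastern daylight time) stands in for
# the implicit timezone that adjust-dateTime-to-timezone() would use.
EASTERN_DAYLIGHT = timezone(timedelta(hours=-4))

def to_local(xs_datetime, tz=EASTERN_DAYLIGHT):
    """Same instant, expressed in tz (like adjust-dateTime-to-timezone)."""
    return datetime.fromisoformat(xs_datetime).astimezone(tz)

def local_date(xs_datetime, tz=EASTERN_DAYLIGHT):
    """Adjust first, then keep only the calendar date."""
    return to_local(xs_datetime, tz).date()

print(to_local("2007-08-21T23:24:45+00:00").isoformat())    # 2007-08-21T19:24:45-04:00
print(local_date("2007-08-21T23:24:45+00:00").isoformat())  # 2007-08-21
# Just after midnight UTC is still the previous evening on Eastern time.
print(local_date("2007-08-22T02:10:00+00:00").isoformat())  # 2007-08-21
```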
The Feed
All righty. Now we can deal with the feed itself. First I create two variables, one containing the address for the RSS feed and the other with my name (so that I can strip it out of the text).
<xsl:variable name="rss-uri" select="'http://twitter.com/statuses/user_timeline/6981492.rss'"/>
<xsl:variable name="rss-to-strip" select="'Paul Ford: '"/>
Then I create a skeleton function that will turn an RSS feed into a sidebar here on Ftrain. It takes two parameters corresponding to the two variables we just defined.
<xsl:function name="f:rss-to-sidebar">
  <xsl:param name="rss"/>
  <xsl:param name="strip"/>
</xsl:function>
Now we need a root template to call the function.
<xsl:template match="/">
  <div><xsl:sequence select="f:rss-to-sidebar(document($rss-uri), $rss-to-strip)"/></div>
</xsl:template>
Which says, “fetch the document in $rss-uri and pass that, along with the contents of $rss-to-strip, into f:rss-to-sidebar.”
What I want to do next is take the flat list of RSS items in the feed I passed to f:rss-to-sidebar and group the items by individual day. So, back to f:rss-to-sidebar().
<xsl:function name="f:rss-to-sidebar">
  <xsl:param name="rss"/>
  <xsl:param name="strip"/>
  <xsl:for-each-group select="$rss//item" group-by="xs:date(f:rfc822-to-date(pubDate))">
    <xsl:sort select="current-grouping-key()" order="descending"/>
    <h3><xsl:value-of select="format-date(current-grouping-key(),'[FNn], [D01] [MNn,*-3]')"/></h3>
  </xsl:for-each-group>
</xsl:function>
I use the xsl:for-each-group instruction to do this, and as my grouping key I use the date-processing function I wrote above (xs:date(f:rfc822-to-date(pubDate))). XSLT takes all the <item>s from the RSS feed and turns their <pubDate>s into real xs:dates. Then it groups the items together by date: it turns the list of RSS <item>s into a list of days, and each day is associated with a list of <item>s.
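The grouping step, sketched in Python with a dictionary keyed on the local date and then sorted newest-first (the counterpart of the descending xsl:sort on current-grouping-key()). `group_by_day` is an illustrative name, and the three items are made-up sample data, already converted to UTC-4.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

def group_by_day(items):
    """Group (local_datetime, text) pairs by calendar date, newest day first.

    Within a day, items keep the order they arrived in, as they do in
    current-group(); the inner xsl:sort is what reorders them later.
    """
    groups = defaultdict(list)
    for when, text in items:
        groups[when.date()].append((when, text))
    return sorted(groups.items(), key=lambda pair: pair[0], reverse=True)

# Hypothetical sample items, already adjusted to UTC-4.
EDT = timezone(timedelta(hours=-4))
items = [
    (datetime(2007, 8, 20, 9, 15, tzinfo=EDT), "post A"),
    (datetime(2007, 8, 21, 19, 24, 45, tzinfo=EDT), "post B"),
    (datetime(2007, 8, 21, 8, 0, tzinfo=EDT), "post C"),
]

for day, group in group_by_day(items):
    print(day.isoformat(), [text for _, text in group])
# 2007-08-21 ['post B', 'post C']
# 2007-08-20 ['post A']
```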
Next, inside an <h3> tag, I format and print the grouping key (which represents the current date) using the “picture string” [FNn], [D01] [MNn,*-3]. Figuring that out is left as an exercise for the reader (it's 11 p.m. and I've got to get on the train and go home). But that turns 2007-08-21 into “Tuesday, 21 Aug.”
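For anyone who wants the answer to the exercise: [FNn] is the full weekday name in title case, [D01] is the day of the month padded to two digits, and [MNn,*-3] is the month name in title case, cut to at most three characters. A rough Python strftime equivalent, assuming the default C locale so that the names come out in English; `day_heading` is an illustrative name.

```python
from datetime import date

def day_heading(d):
    """Approximate format-date($d, '[FNn], [D01] [MNn,*-3]').

    %A  full weekday name           ([FNn])
    %d  two-digit day of the month  ([D01])
    %b  abbreviated month name      ([MNn,*-3])
    """
    return d.strftime("%A, %d %b")

print(day_heading(date(2007, 8, 21)))  # Tuesday, 21 Aug
```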
Now, inside of the for-each-group, I add the code that actually displays the individual RSS items, which are sitting there grouped up into current-group(), waiting to be used.
<xsl:function name="f:rss-to-sidebar">
  <xsl:param name="rss"/>
  <xsl:param name="strip"/>
  <xsl:for-each-group select="$rss//item" group-by="xs:date(f:rfc822-to-date(pubDate))">
    <xsl:sort select="current-grouping-key()" order="descending"/>
    <h3><xsl:value-of select="format-date(current-grouping-key(),'[FNn], [D01] [MNn,*-3]')"/></h3>
    <xsl:for-each select="current-group()">
      <xsl:sort select="f:rfc822-to-dateTime(pubDate)" order="descending"/>
      <p><a href="{link}"><xsl:value-of select="format-dateTime(f:rfc822-to-dateTime(pubDate), '[h1]:[m01] [P]')"/></a> - <xsl:value-of select="replace(description, $strip, '')"/></p>
    </xsl:for-each>
  </xsl:for-each-group>
</xsl:function>
For every item in current-group(), in reverse chronological order (thanks to the xsl:sort), this spits out a paragraph with a link to the item on Twitter, the time it was published (another picture string here), and the actual text with my name stripped out.
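And the per-item paragraph, sketched in Python. `clock()` approximates the '[h1]:[m01] [P]' picture string (12-hour hour without padding, two-digit minutes, and an a.m./p.m. marker spelled to match the sample output above; processors may spell it differently). `item_paragraph()` and the trimmed sample text are illustrative. One difference worth knowing: XPath's replace() treats $strip as a regular expression, while Python's str.replace() is literal. For 'Paul Ford: ' the two behave the same.

```python
import html
from datetime import datetime, timedelta, timezone

def clock(dt):
    """Approximate format-dateTime($dt, '[h1]:[m01] [P]')."""
    hour12 = dt.hour % 12 or 12                   # [h1]: 1 to 12, no padding
    marker = "a.m." if dt.hour < 12 else "p.m."   # [P]
    return f"{hour12}:{dt.minute:02d} {marker}"   # [m01]: two digits

def item_paragraph(when, link, description, strip):
    """Build one <p> the way the inner xsl:for-each does (hypothetical helper)."""
    text = description.replace(strip, "")  # literal, unlike XPath replace()
    return (
        f'<p><a href="{html.escape(link)}">{clock(when)}</a> - '
        f"{html.escape(text, quote=False)}</p>"
    )

EDT = timezone(timedelta(hours=-4))
paragraph = item_paragraph(
    datetime(2007, 8, 21, 19, 24, 45, tzinfo=EDT),
    "http://twitter.com/paul_e_ford/statuses/218935552",
    "Paul Ford: He just shakes his head. Crazy.",
    "Paul Ford: ",
)
print(paragraph)
# <p><a href="http://twitter.com/paul_e_ford/statuses/218935552">7:24 p.m.</a> - He just shakes his head. Crazy.</p>
```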
The Code
The whole thing appears below. The content it produces appears in a column on the home page. The code is in the public domain if you want it.
I run it like this:
$ java -jar /Users/paul/bin/saxon8.jar rss2html.xsl rss2html.xsl
Saxon's command line expects a source XML document to operate on, followed by the stylesheet. In this case I'm going outside to Twitter for our XML, so I just pass the XSL file itself (which is valid XML) as the source document.
So what's it good for? It's good for me. It'd be pretty easy to extend it to eat up a bunch of feeds and generate a page, but you probably already have a solution for that. XSLT2 is good for a lot of other stuff—I didn't use recursive templates or keys or any of the things that make the language awesome (nor did I type my params or do some other best-practice stuff, so do not kill me). But we have time, so what the hell. Maybe next week I'll go into some more for-each-group tricks for drawing complex HTML tables.
File: rss2html.xsl:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
    xmlns:f="http://ftrain.com/"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    version="2.0"
    exclude-result-prefixes="f xsl xs">

  <xsl:output method="xml" indent="yes"/>

  <xsl:variable name="rss-uri" select="'http://twitter.com/statuses/user_timeline/6981492.rss'"/>
  <xsl:variable name="rss-to-strip" select="'Paul Ford: '"/>

  <xsl:variable name="months">
    <xsl:for-each select="tokenize('Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec', ' ')">
      <f:m pos="{format-number(position(),'00')}"><xsl:value-of select="."/></f:m>
    </xsl:for-each>
  </xsl:variable>

  <xsl:template match="/">
    <div>
      <xsl:sequence select="f:rss-to-sidebar(document($rss-uri), $rss-to-strip)"/>
    </div>
  </xsl:template>

  <xsl:function name="f:rss-to-sidebar">
    <xsl:param name="rss"/>
    <xsl:param name="strip"/>
    <xsl:for-each-group select="$rss//item" group-by="xs:date(f:rfc822-to-date(pubDate))">
      <xsl:sort select="current-grouping-key()" order="descending"/>
      <h3><xsl:value-of select="format-date(current-grouping-key(),'[FNn], [D01] [MNn,*-3]')"/></h3>
      <xsl:for-each select="current-group()">
        <xsl:sort select="f:rfc822-to-dateTime(pubDate)" order="descending"/>
        <p>
          <a href="{link}"><xsl:value-of select="format-dateTime(f:rfc822-to-dateTime(pubDate), '[h1]:[m01] [P]')"/></a> — <xsl:value-of select="replace(description, $strip, '')"/>
        </p>
      </xsl:for-each>
    </xsl:for-each-group>
  </xsl:function>

  <!-- Returns an xs:dateTime adjusted to the processor's implicit timezone. -->
  <xsl:function name="f:rfc822-to-dateTime">
    <xsl:param name="rfc822"/>
    <xsl:sequence select="adjust-dateTime-to-timezone(xs:dateTime(f:rfc822-to-xs($rfc822)))"/>
  </xsl:function>

  <!-- Returns the local calendar date as an xs:date. -->
  <xsl:function name="f:rfc822-to-date">
    <xsl:param name="rfc822"/>
    <xsl:sequence select="xs:date(substring(xs:string(f:rfc822-to-dateTime($rfc822)),1,10))"/>
  </xsl:function>

  <!-- 'Tue, 21 Aug 2007 23:24:45 +0000' becomes '2007-08-21T23:24:45+00:00'. -->
  <xsl:function name="f:rfc822-to-xs">
    <xsl:param name="rfc822"/>
    <xsl:variable name="date-list" select="tokenize($rfc822 ,' ')"/>
    <xsl:variable name="year" select="$date-list[4]"/>
    <xsl:variable name="month" select="$months//f:m[.=$date-list[3]]/@pos"/>
    <xsl:variable name="day" select="format-number(xs:integer($date-list[2]),'00')"/>
    <xsl:variable name="time" select="$date-list[5]"/>
    <xsl:variable name="zone-hour" select="substring($date-list[6],1,3)"/>
    <xsl:variable name="zone-minute" select="substring($date-list[6],4,2)"/>
    <xsl:sequence select="concat($year,'-',$month,'-',$day,'T',$time,$zone-hour,':',$zone-minute)"/>
  </xsl:function>

</xsl:stylesheet>
That's it.