I spend most of my work day writing twiki pages or going to meetings, so I've been coding on my own time, and lately I've been refactoring the internals of Raptor's XML support. I'll explain why below.
It's been a long process and is probably never ending. The reason I started writing Raptor in October 2000 was to have a conformant RDF/XML parser and to use the best XML parser available. Which one was best wasn't too clear then, as you had the choice of:
- libxml / libxml2: new and good
- expat 1.95.x: old, i.e. mature and well known, but not seeing much development; also good.
So I made it work with both and as I needed namespace support for RDF/XML, made them both look like they generated something like SAX2 namespace events. At that time, only libxml2 supported namespaces. This libxml/expat + namespace support + RDF/XML parser was all done in one 140K C file. Which was a problem, but the parser did work!
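To make "SAX2 namespace events" concrete, here is a minimal sketch in Python rather than Raptor's C. It is not Raptor's code: it only shows the shape of the events, where each element arrives as a (namespace URI, local name) pair instead of a raw prefixed name. The class name NamespaceEvents, the function namespace_events and the sample document are made up for illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Illustrative input only; any namespaced XML would do.
SAMPLE = (
    '<rdf:RDF xmlns:rdf="%s" xmlns:dc="%s">'
    "<rdf:Description><dc:title>Stuff</dc:title></rdf:Description>"
    "</rdf:RDF>" % (RDF_NS, DC_NS)
).encode("utf-8")


class NamespaceEvents(ContentHandler):
    """Hypothetical handler that records SAX2-style namespace events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startPrefixMapping(self, prefix, uri):
        # Fired when a namespace declaration comes into scope.
        self.events.append(("ns", prefix, uri))

    def startElementNS(self, name, qname, attrs):
        # name is a (namespace URI, local name) pair, not "prefix:local".
        uri, local = name
        self.events.append(("start", uri, local))

    def endElementNS(self, name, qname):
        uri, local = name
        self.events.append(("end", uri, local))


def namespace_events(xml_bytes):
    """Parse xml_bytes and return the recorded list of namespace events."""
    handler = NamespaceEvents()
    parser = xml.sax.make_parser()
    # Ask for namespace-aware callbacks (the *NS methods above).
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.events


if __name__ == "__main__":
    for event in namespace_events(SAMPLE):
        print(event)
```

The point of a layer like this is that the RDF/XML parser on top only ever sees resolved names, so it does not need to know which XML parser produced them.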
Raptor slowly grew more features to support the updating of RDF/XML and I became the editor of what would be the revised RDF W3C Recommendation. It added: URIs, URI resolving, URI retrieval, XML QNames, XML Namespaces, XML Base, Unicode, UTF-8 and an XML Writer for the rdf:parseType="Literal" handling. Plus a few new parsers: N-Triples (I co-created this), Turtle (I created this; there's a theme here!) and RSS Tag Soup for the 9 flavours of RSS (I have nothing to do with this :) ) plus Atom.
Plus a slew of serializers to match, the XML Writer being refactored to its own public API for this. It's not really an RDF parser library anymore, it's a web library with support for mapping between syntaxes and RDF triples.
Meanwhile, SAX2 and RDF/XML were still intertwined. Until this week in 2006. Finally I've pulled them apart, which makes a few neat things possible: the RSS tag soup parser has switched from using the libxml-only xmlReader API to the separate SAX2 API, so now you can do RSS and Atom with expat too.
This also improves the Atom support as it can handle the type='xhtml' and type='xml' markup, plus it now uses the well-tested xml:base, QNames and Namespaces parts from Raptor. I hope that it'll also be able to deal with other XML formats inside Atom, so I'm guessing RDF/XML in Atom will be possible. DOAP over Atom anyone?
However at this point I'm stopping as the part that has me stumped is how to best represent Atom in RDF triples. The Atom OWL work seems to be going slowly (Aside: also the web site acts very oddly to HTTP wget/curl requests). Mostly I'd like to have readers not have to care that it was Atom or RSS tag soup to begin with, so I'm thinking something like an Atom / RSS1.0 hybrid format.
Handwaving: here's something I hand-edited:
<item rdf:about="http://example.org/blog/2006/04/01/stuff">
<!-- the common bits -->
<title>Stuff</title>
<link>http://example.org/blog/2006/04/01/stuff</link>
<description><div xmlns="http://www.w3.org/1999/xhtml"><p>Content here</p></div></description>
<!-- The RSS 1.0 bits -->
<dc:date>2006-04-02T20:19:00-08:00</dc:date>
<content:encoded><![CDATA[<div xmlns="http://www.w3.org/1999/xhtml"><p>Content here</p></div>]]></content:encoded>
<!-- The atom bits -->
<atom:id>tag:example.org,2006:1234</atom:id>
<atom:link rdf:parseType="Resource">
<atom:link-href rdf:resource="http://example.org/blog/2006/04/01/stuff" />
<atom:link-rel>alternate</atom:link-rel>
</atom:link>
<atom:updated>2006-04-02T20:19:00-08:00</atom:updated>
<atom:content rdf:parseType="Resource">
<atom:content-type>xhtml</atom:content-type>
<atom:content-content rdf:parseType="Literal"><![CDATA[<div xmlns="http://www.w3.org/1999/xhtml"><p>Content here</p></div>]]></atom:content-content>
</atom:content>
</item>
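To see what triples a hybrid item like this might give a reader, here is a rough Python sketch that walks a cut-down version of it. It handles only three cases (plain literal properties, rdf:resource, and rdf:parseType="Resource" as a blank node), so it is a toy subset of RDF/XML and not how Raptor does it. The namespace declarations are my assumption, since the fragment above doesn't declare any, and the names item_triples and predicate_uri are made up for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Cut-down copy of the hand-edited item; the xmlns declarations are assumed.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:atom="http://www.w3.org/2005/Atom">
<item rdf:about="http://example.org/blog/2006/04/01/stuff">
  <!-- the common bits -->
  <title>Stuff</title>
  <dc:date>2006-04-02T20:19:00-08:00</dc:date>
  <atom:link rdf:parseType="Resource">
    <atom:link-href rdf:resource="http://example.org/blog/2006/04/01/stuff" />
    <atom:link-rel>alternate</atom:link-rel>
  </atom:link>
</item>
</rdf:RDF>"""


def predicate_uri(tag):
    """Turn ElementTree's "{namespace}local" tag into namespace + local."""
    if not tag.startswith("{"):
        return tag
    namespace, _, local = tag[1:].partition("}")
    return namespace + local


def item_triples(node, subject=None, blank_ids=None):
    """Return (subject, predicate, (kind, value)) triples for node's children."""
    if blank_ids is None:
        blank_ids = count(1)
    if subject is None:
        subject = node.get("{%s}about" % RDF)
    triples = []
    for child in node:
        predicate = predicate_uri(child.tag)
        resource = child.get("{%s}resource" % RDF)
        if resource is not None:
            # Property whose value is another resource.
            triples.append((subject, predicate, ("uri", resource)))
        elif child.get("{%s}parseType" % RDF) == "Resource":
            # Nested properties hang off a freshly numbered blank node.
            blank = "_:b%d" % next(blank_ids)
            triples.append((subject, predicate, ("bnode", blank)))
            triples.extend(item_triples(child, blank, blank_ids))
        else:
            # Everything else is treated as a plain literal.
            text = (child.text or "").strip()
            triples.append((subject, predicate, ("literal", text)))
    return triples


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    item = root.find("{http://purl.org/rss/1.0/}item")
    for triple in item_triples(item):
        print(triple)
```

One wrinkle this shows: the Atom namespace URI has no trailing / or #, so plain concatenation gives predicates like http://www.w3.org/2005/Atomlink, which is one more thing any Atom-in-RDF mapping has to decide how to handle.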
So why am I working on better Atom support in an RDF parser?
Because Atom 1.0 is the best way to encode data for blog entries. It's long past time to ditch the horror that is RSS, the worst ambiguously defined XML format since OPML.