Dave Beckett's blog

RDF Syntaxes 2.0

2010-01-24 20:38

I've been diligently ignoring the RDF 2.0 threads on the semantic-web interest list, especially those on syntax, since I've been there before (Modernising Semantic Web Markup). Firstly, I'd endorse what Jeremy Carroll says about the features.

I think I'm qualified as an expert on RDF graph serializations / syntax since:

and I implemented all of the above plus GRDDL, RDFa (via librdfa), Atom and RSS*es, RDF/JSON, ... in Raptor

People moan about RDF/XML and have for years. I even wrote down the flaws in great detail in Modernising Semantic Web Markup. In all that time nobody, myself included, has come up with a credible and complete alternative XML syntax that stuck. Let me summarize the ones I know:

  • TriX: had little takeup
  • RXR: ditto
  • GRIT: new, but flawed since it can only represent trees (no named bnodes)

The fundamental problem I think with using XML to write down graphs is:

People looking at XML expect they are looking at a hierarchical Tree.

So writing a Graph in an XML Tree is always going to fail the simplicity test. This expectation might come from using the XML DOM or from looking at HTML and XHTML, but it's pretty embedded in the mind.

Right now I'd dismiss any XML format as a candidate for a "simple" or "obvious" way to write down RDF graphs that new users will accept.

(Aside: There's also a technical argument that no XML format can ever represent all RDF graphs since RDF allows Unicode codepoints that are not allowed in XML).
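To make the aside concrete, here is a minimal sketch using Python's standard XML parser. It assumes an RDF literal whose value contains the control character U+0001, which XML 1.0 does not allow even as a numeric character reference. The element name `literal` and the helper `parses_as_xml` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: the element name "literal" is only for illustration.
# U+0001 is outside the XML 1.0 character range, so the parser should
# reject it even when written as a numeric character reference.
doc_with_plain_text = "<literal>ok</literal>"
doc_with_control_char = "<literal>&#x1;</literal>"


def parses_as_xml(text):
    """Return True if the XML parser accepts the document, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml(doc_with_plain_text))
print(parses_as_xml(doc_with_control_char))
```

An XML-based RDF syntax would therefore need some escaping convention of its own, on top of XML, to carry such a value.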

Now this isn't a problem just with XML; it's also true of non-XML formats that are serial hierarchical documents. That includes formats like JSON, which out of the box cannot even represent anything that is not a tree, since it has no ID/REF mechanism.
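A rough sketch of the JSON point, using only Python's standard `json` module. The two-node graph, the `_:alice` / `_:bob` identifiers and the helper `to_json_or_none` are invented for this example; the ID scheme is an ad hoc convention layered on top of JSON, not something JSON itself provides.

```python
import json

# A tiny graph with a cycle: alice knows bob, and bob knows alice.
alice = {"name": "Alice"}
bob = {"name": "Bob"}
alice["knows"] = bob
bob["knows"] = alice


def to_json_or_none(value):
    """Serialize to JSON text, or return None if the structure is not a tree."""
    try:
        return json.dumps(value)
    except ValueError:
        # json.dumps refuses circular references by default.
        return None


# Nesting the nodes directly fails because of the cycle.
nested_text = to_json_or_none(alice)

# Workaround: a flat table of nodes keyed by made-up IDs, with edges
# written as strings that the reader must know to treat as references.
flat = {
    "_:alice": {"name": "Alice", "knows": "_:bob"},
    "_:bob": {"name": "Bob", "knows": "_:alice"},
}
flat_text = json.dumps(flat, sort_keys=True)

print(nested_text)
print(flat_text)
```

The flat form works, but only because writer and reader agree out of band that `"_:bob"` is a reference and not a plain string, which is exactly the ID/REF mechanism JSON lacks.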

Of course, apart from having dealt with RDF/XML, I also invented Turtle (based on the N3 syntax, simplified). Although it's a non-XML syntax, it does seem to be in the sweet spot for users understanding it, without carrying the hierarchical document expectation. Yes, Turtle is close to JSON/Python in syntax design space, but this doesn't seem to have been a problem.

So I'm happy with how Turtle turned out, and it should be the focus of RDF syntax formats for users. It does need an update and I'll probably work on that whether or not a new syntax is part of some future working group - I have a pile of fixes to go in. Adding named graphs (TriG) might be the next step for this if it were a standard.

It may be there is a need for a better machine format, but please don't mix them. Also, machines can read Turtle RDF :)

Consider these stream-of-consciousness RDF syntax thoughts as the basis of my position paper for the W3C RDF Next Steps workshop.