A retrospective on the development of the RDF/XML Revised Syntax

David Beckett

Institute for Learning and Research Technology (ILRT),
University of Bristol, 8-10 Berkeley Square, Bristol, BS8 1HH, UK

This is a pre-print of a paper submitted to ISWC2003

Abstract:

This paper reviews the process that the RDF Core Working Group[2] undertook in revising the transfer syntax for RDF defined in the RDF Model and Syntax W3C Recommendation[1], and the problems that are now clear, especially when the syntax is compared with the revised RDF model and new abstract syntax. The syntax looks out of date, in particular in its use of XML QNames, which gives unconstrained syntax terms in the XML and causes problems with newer XML technology such as XSLT, DTDs, W3C XML Schema[3] and other XML-constraining languages.

In order to deliver a modern RDF syntax, this paper reviews the requirements for RDF in two aspects - as a canonical transfer syntax and one for end-users, targeted at HTML. It evaluates previous RDF syntax proposals against these requirements and analyses the pros and cons of XML and non-XML syntaxes. The conclusion is a summary of syntax approaches for future standardisation activity.

1. Introduction to RDF/XML

This paper reviews the process of revising the transfer syntax for RDF as defined in the W3C RDF Model and Syntax Recommendation[1] (M&S) of February 1999. This syntax was designed by the RDF working group to meet a variety of goals, including being embeddable in HTML (not XHTML) in order to describe web pages, having a frame-style syntax, and using XML QNames to shorten the long URIs that RDF uses for its terms. The XML Namespace specification was developed in parallel with RDF, and RDF was one of the first W3C specifications to use it. Figure 1 shows some RDF/XML that captures the sentence ``Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila''.

Figure 1: Example RDF/XML from the RDF Model and Syntax Specification (1999)
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:s="http://description.org/schema/">
  <rdf:Description about="http://www.w3.org/Home/Lassila">
    <s:Creator>Ora Lassila</s:Creator>
  </rdf:Description>
</rdf:RDF>

The outer rdf:RDF XML element encloses the scope of the RDF/XML. The inner rdf:Description element is the ``frame-style'' block of properties, all about the resource with URI http://www.w3.org/Home/Lassila. Here the element s:Creator represents the property with the value ``Ora Lassila''. This element encodes the URI reference formed from the namespace name (URI) for ``s'', which in this case is http://description.org/schema/, concatenated with the local name of the element (Creator), giving the URI http://description.org/schema/Creator.
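
The concatenation rule can be sketched in a few lines of Python. This is only an illustration that runs the standard library's xml.etree.ElementTree over the Figure 1 document; the helper name property_uri is an assumption of this sketch and is not part of any RDF specification or toolkit.

```python
import xml.etree.ElementTree as ET

# The Figure 1 document, copied verbatim.
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:s="http://description.org/schema/">
  <rdf:Description about="http://www.w3.org/Home/Lassila">
    <s:Creator>Ora Lassila</s:Creator>
  </rdf:Description>
</rdf:RDF>"""


def property_uri(element):
    """Concatenate the namespace name and local name of an element (hypothetical helper)."""
    # ElementTree reports a namespaced tag as "{namespace-name}local-name".
    namespace, _, local = element.tag[1:].partition("}")
    return namespace + local


root = ET.fromstring(RDF_XML)
description = root[0]      # the rdf:Description block
creator = description[0]   # the s:Creator property element

print(description.get("about"))  # http://www.w3.org/Home/Lassila
print(property_uri(creator))     # http://description.org/schema/Creator
print(creator.text)              # Ora Lassila
```

Note that the prefix ``s'' never appears in the result: only the namespace name and the local name contribute to the property URI.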

When a property has a URI value, an rdf:resource attribute is used on the empty property element with the URI as the attribute value. A property value can also have an XML language, given with an xml:lang attribute, and can have XML content when the parseType="Literal" attribute is used on the property element.

Embedding in HTML such that simple web clients could ignore the RDF required that there be no visible XML element content. The syntax allowed this through several abbreviations, including writing properties with literal content as XML attributes in what was called the Basic Abbreviated Syntax form.

There were several other abbreviations both to make the resulting RDF/XML more compact and to allow the omission of description blocks. Several common RDF vocabulary terms had special support such as the rdf:type property and the reification vocabulary. RDF containers (ordered, unordered or alternative lists of resources) had an abbreviation that provided easy generation of the container membership properties.

Above the statement level there were generators for distributed description in three ways. The aboutEach and aboutEachPrefix attributes allowed statements to be made about multiple resources in a container (the former) or about all resources with a URI of a certain prefix (the latter). The bagID attribute allowed descriptions of the collection of statements given in one of the frame-style descriptions using RDF reification.

The RDF/XML syntax was defined by an extended BNF in a formal grammar along with descriptive text in several sections of the document. The use of namespaced elements and attributes meant that using a DTD to define it was not possible and this was before modern XML schema language standardisation work was started so there was no W3C XML Schema, Relax or Relax NG etc. available.

2. Revising RDF - RDF Core

The RDF specification and syntax were picked up by several groups working with metadata on the web and the RDF/XML syntax was used to transfer their RDF content. These users included the Dublin Core[4] community, DAML+OIL[5], CC/PP, PRISM, RSS 1.0 and Adobe's XMP[6]. These groups along with other implementors gave feedback to the W3C via the RDF comments and interest group lists on issues for RDF and RDF Schema that needed answering. The W3C Semantic Web Activity began in February 2001 and started a new working group for RDF, the RDF Core working group[2], to deal with these.

RDF Core's charter was to:

The single specification (of RDF, excluding RDF schema) was found to have problems with mixing the model, syntax and semantics. A clear split was required and RDF Core dealt with this by creating an RDF Concepts and Abstract Syntax[7] document where the RDF graph was defined with no concrete syntax and an RDF Semantics document that used it to define the model theoretic semantics of RDF. The remaining introductory and explanatory material was covered by the RDF Primer and the RDF/XML syntax material by a separate RDF/XML Syntax Specification (Revised)[8] document. The development of the latter document is the primary subject of this paper.

The original RDF formal model in [1] gave three different descriptions of RDF - as 3-tuples (triples), as a graph of directed labelled arcs and as 2-ary (binary) predicates. The XML syntax hinted at a fourth frame/slots-style formulation. RDF Core adopted the expression of the RDF abstract syntax in terms of sets of triples and used that as the single method to connect the separate documents in revising and clarifying RDF. The model also was clarified in certain key places; M&S allowed nodes to appear in the graph without URIs, but this was never discussed in detail. The updated specifications named them ``blank nodes'' and allowed syntax-specific identifiers to be used in order to preserve their identity in serialisations.

3. Problems with RDF/XML in RDF Model and Syntax

3.1. Definition of the syntax

The specification of RDF/XML in Section 6 of [1] via an extended BNF grammar had several problems:

Thus a better way to define the syntax was required, with unambiguous clarity in the expression and including separation from the XML detail.

3.2. Reported RDF/XML problems

Comments on the RDF documents were recorded on the RDF Issue List[9] as well as taken from discussion on the RDF interest group and other lists. The main issues from these fora are as follows:

  1. One cannot tell an RDF node element/property element by simple inspection of the element in question without knowing the ``striping'' (after Brickley[10]).
  2. The frame-style approach does not clearly match the triples in the RDF graph.
  3. There are excessive choices in choosing how to write RDF/XML.
  4. Elements, attributes and attribute values are used for the same purposes, for example, encoding a URI.
  5. The way that XML QNames are used does not constrain the elements and attributes that can appear in RDF/XML.
  6. The unconstrained syntax cannot be described completely with XML schema languages such as DTDs and W3C XML Schema.
  7. RDF/XML does not use W3C XML Schema datatypes.
  8. The syntax is not easy to use with XML technologies such as XSLT, XQuery and other XML tools.
  9. It is impossible to embed in XHTML while retaining DTD validation.
  10. The syntax is incapable of encoding all legal RDF graphs.
  11. In particular, certain graphs with blank nodes cannot be serialised.
  12. It uses both namespaced and non-namespaced XML attributes for syntax terms.
  13. It is hard to emit human-readable RDF/XML from an RDF graph due to the range of choices (after Carroll[11]).
  14. Various aesthetic comments were levelled, such as the syntax being ``ugly''.

4. Revising RDF/XML

4.1. XML changes since 1.0

The XML specification has changed since 1.0 in 1998 with XML (Second Edition)[12] (2000) and there has been further work in building upon and formalising the XML core document. There are many XML technologies that have been developed; the main ones related to core XML that have widespread development are XML Base[13] and XML Infoset[14]. XML Base allows a document to specify a URI for resolving relative URIs against. XML Infoset, which itself uses XML Base, defines one way of expressing XML without depending on certain XML details. Both of these have been used to define other important specifications such as W3C XML Schema, XPath and XQuery. The XML and XML Namespaces specifications are also being modified to version 1.1 for enhanced internationalisation (I18N) support by using newer Unicode versions and updated whitespace rules. The Character Model for the World Wide Web[15] explains current best practice in how web specifications should provide interoperable text manipulation. The W3C Technical Architecture Group (TAG) work on key web technology such as URIs, resources and namespaces also touched on many issues that RDF and RDF/XML depended on. The existing RDF/XML definition needed to be updated in light of these considerations.

4.2. RDF/XML Syntax Data Model

The new RDF/XML definition was required to be as precise as possible and provide a complete and deterministic mapping from XML to the RDF graph such that it was clear that an isomorphic graph was always generated from the given XML. As the group decided issues on the mapping, these were codified as test cases written in a form suitable for checking by software (discussed below).

In order to abstract away from the XML detail, the revised RDF/XML document was specified from XML Infoset Information Items (Infoitems), which provide the required formality and support XML Base and Namespaces above the core XML. Infoitems provide a model of the XML still in terms of elements, attributes and character data. This is not totally appropriate for defining the mapping from RDF/XML to the RDF graph, which deals with URI-references rather than XML QNames. An intermediate syntax data model form was therefore designed that mapped one-to-one from the Infoitems to syntax data model Events (the name implies no time-based processing; it was used in order to distinguish them from Infoitems, RDF triple objects and RDF graph nodes).

The syntax data model also defined new Events for terms that would be used in the mapping to the RDF Concepts such as for URI references, blank nodes, RDF literals (with language) and Typed Literals (with datatype).

The transformation to syntax data model events turned the QNames in the Infoitems into RDF URI References in the Events, and attribute values that were used as URI strings into RDF URI References relative to the current base URI. It also dealt with some corner cases, such as allowing both the non-namespaced about and rdf:about (transforming the former to the latter), as well as turning xml:lang on XML literals into language-coded RDF Literals.
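
As a rough illustration of two of these steps, the following Python sketch resolves an attribute value against a base URI and rewrites a non-namespaced about attribute to rdf:about. The function names, the dictionary representation of attributes (with namespaced names written in the "{namespace}local" form) and the example.org base URI are all assumptions of this sketch; the actual mapping is defined over Infoset information items, not Python dictionaries.

```python
from urllib.parse import urljoin

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def resolve(base_uri, attribute_value):
    """Resolve a possibly relative URI string against the in-scope base URI."""
    return urljoin(base_uri, attribute_value)


def normalise_attributes(attributes):
    """Rewrite the non-namespaced 'about' attribute to rdf:about (hypothetical helper)."""
    normalised = {}
    for name, value in attributes.items():
        if name == "about":
            name = "{" + RDF_NS + "}about"
        normalised[name] = value
    return normalised


# Hypothetical base URI, standing in for the document URI or an xml:base value.
base = "http://example.org/dir/doc.rdf"

print(resolve(base, "other.rdf"))  # http://example.org/dir/other.rdf
print(resolve(base, "#me"))        # http://example.org/dir/doc.rdf#me
print(normalise_attributes({"about": "http://www.w3.org/Home/Lassila"}))
```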

In order to write down the mapping in test cases, the input form was RDF/XML; however, the output form had to match the RDF graph very precisely, in triples, and it was clear that RDF/XML did not clearly match the triples abstract syntax.

Therefore RDF Core defined a text-based, simple and canonical RDF test case language N-Triples in the RDF Test Cases[16] working draft. This format allowed RDF graphs to be written down clearly and used not just in RDF/XML/RDF graph mappings but in other graph-to-graph entailment tests in the RDF Semantics specification. N-Triples also enabled discussions of the abstract syntax separate from XML issues. This format was not intended as a new user syntax (discussed further in Section 8).
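
To give a feel for how simple the format is, here is a minimal Python sketch that writes one triple per line in an N-Triples-like form. It is an assumption-laden illustration, not a conformant serialiser: terms are modelled as hypothetical (kind, text) pairs, only a handful of characters are escaped, and non-ASCII characters, language tags and datatypes are not handled.

```python
def escape(text):
    """Escape the characters this sketch handles inside a quoted literal."""
    # The backslash must be replaced first so later escapes are not doubled.
    for raw, escaped in (("\\", "\\\\"), ('"', '\\"'),
                         ("\n", "\\n"), ("\r", "\\r"), ("\t", "\\t")):
        text = text.replace(raw, escaped)
    return text


def term(value):
    """Write one term; value is a hypothetical (kind, text) pair."""
    kind, text = value
    if kind == "uri":
        return "<" + text + ">"
    if kind == "blank":
        return "_:" + text
    if kind == "literal":
        return '"' + escape(text) + '"'
    raise ValueError("unknown term kind: " + kind)


def ntriple(subject, predicate, obj):
    """Write a single triple as one line terminated by ' .'."""
    return " ".join(term(t) for t in (subject, predicate, obj)) + " ."


# The triple encoded by Figure 1.
print(ntriple(("uri", "http://www.w3.org/Home/Lassila"),
              ("uri", "http://description.org/schema/Creator"),
              ("literal", "Ora Lassila")))
# <http://www.w3.org/Home/Lassila> <http://description.org/schema/Creator> "Ora Lassila" .
```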

The new working draft defined the mapping from RDF/XML to the abstract RDF Graph (using N-Triples to write down that abstraction) along with test cases in [16] to pin down the issue resolution precisely in testable form. A user-friendly explanation of the syntax was added to the working draft to explain all the abbreviated forms, beginning with an introduction of the property / node element ``striping'' so that this alternation was more explicitly pointed out to readers.
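
The alternation can be made visible with a short Python sketch that walks an RDF/XML document and labels each element by its depth below rdf:RDF. This is a simplification that only holds for unabbreviated RDF/XML without parseType attributes; the nested example document, its s:Name property and the function name stripes are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative document: a node element containing a property element,
# which in turn contains another node element with its own property.
RDF_XML = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:s="http://description.org/schema/">
  <rdf:Description rdf:about="http://www.w3.org/Home/Lassila">
    <s:Creator>
      <rdf:Description>
        <s:Name>Ora Lassila</s:Name>
      </rdf:Description>
    </s:Creator>
  </rdf:Description>
</rdf:RDF>"""


def stripes(element, is_node=True, depth=0):
    """Yield (depth, role, tag), alternating node and property roles by depth."""
    yield depth, "node" if is_node else "property", element.tag
    for child in element:
        yield from stripes(child, not is_node, depth + 1)


root = ET.fromstring(RDF_XML)
for top_level in root:  # children of rdf:RDF are node elements
    for depth, role, tag in stripes(top_level):
        print("  " * depth + role + ": " + tag)
```

The element names alone do not reveal the role: rdf:Description happens to be used for both node elements here, but any namespaced element could appear at either kind of level, which is why the role has to be inferred from position.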

4.3. RDF/XML (Revised) Syntax Additions and Removals

RDF Core considered some particularly problematic parts of the syntax and after due consideration and consultation with the community, removed some from the language. The items that were deleted were the distributed referents bagID, aboutEach and aboutEachPrefix. These were felt to operate at the wrong level as well as having complex interactions, and were sporadically implemented and little used. Removing them made the syntax data model more clearly based on mapping to triples rather than dealing with these somewhat higher level operations.

Updating and formalising the RDF model as well as resolving issues caused additional syntax to be added. The additions were:

The design of the last in terms of triples was copied from the DAML+OIL[5] syntax extension over the original RDF/XML and provides a lisp-style list of resources that could be more easily reasoned over than the RDF container style approach. This design from DAML+OIL was not able to describe a collection of datatyped literals which was another problem raised during the revising.
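
The lisp-style structure can be sketched as a chain of cells, each with a ``first'' and a ``rest'' link, ending in a ``nil'' terminator. The Python below is an illustration only: the rdf:first, rdf:rest and rdf:nil terms are those of the revised RDF specifications rather than anything stated in the text above, and the function name, the _:cN blank node labels and the example.org URIs are assumptions of this sketch.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def expand_collection(members):
    """Return (head, triples) for a list of member terms (hypothetical helper)."""
    if not members:
        # An empty collection is just the terminator, with no triples.
        return RDF + "nil", []
    # One blank node per list cell; the labels are arbitrary.
    cells = ["_:c%d" % index for index in range(len(members))]
    triples = []
    for index, member in enumerate(members):
        last = index + 1 == len(members)
        rest = RDF + "nil" if last else cells[index + 1]
        triples.append((cells[index], RDF + "first", member))
        triples.append((cells[index], RDF + "rest", rest))
    return cells[0], triples


head, triples = expand_collection(["http://example.org/a",
                                   "http://example.org/b"])
for triple in triples:
    print(triple)
```

Two members already require four triples and two blank nodes, which is why an abbreviation for this pattern was worth adding to the syntax.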

4.4. RDF/XML revised but not replaced

The charter restriction about not making a new syntax limited what the group could do in terms of changing or paring down the syntax beyond what was strictly required to fix problems or reply to issues. Out of reach were creating an XML syntax that closely mapped the abstract syntax and could represent all legal RDF graphs, deleting some of the many abbreviation forms, and updating to use newly available XML technologies such as defining the syntax in terms of an XML schema (which the unconstrained syntax prevents). This suggested requirements on a future syntax: to provide one that can represent all RDF graphs, and also a canonical RDF graph serialisation. Here canonical means no abbreviated or alternative forms for individual triples, not whole-graph canonicalisation.

XML best practice and style have also evolved since RDF/XML was created, when XML was one year old and XML Namespaces was one month old, so the design may not seem so modern.

5. Embedding RDF/XML in other XML formats

Subsequent to the RDF recommendation in 1999, there has been some consideration of embedding RDF in other XML formats. The Scalable Vector Graphics (SVG)[19] recommendation contains a metadata tag that can contain any metadata, with RDF given as the example.

The main RDF embedding issue that arises is for HTML, and in particular for XHTML. Embedding in either format without validation is straightforward but validation of HTML is generally seen as the current best practice. Adding new elements to XHTML is expected to be done by using XHTML Modularization and validating via DTDs or W3C XML Schema. Adding an unconstrained RDF/XML syntax this way is rather tricky for a particular subset of RDF/XML or impossible for arbitrary RDF/XML. RDF Core recommended not embedding but using a <link> tag in the <head> of the document to give the URI of what could potentially be another RDF/XML document.

Another approach that has been tried is embedding RDF/XML in HTML comments, as used by the Creative Commons (CC) metadata and Movable Type's TrackBack RDF approaches. It is not clear if this is portable since the content of HTML or XML comments is not generally seen as being in the document content[20] and it cannot be distinguished from other commented materials. However, the existing applications of these do not use HTML or even XML-level techniques to find the embedded RDF but rely on regular expression matches on the surrounding HTML.
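
A regular-expression extraction of this kind might look like the following Python sketch. The pattern, the function name and the sample page are assumptions made for illustration and are not taken from the Creative Commons or TrackBack tools. The sketch also shows how fragile the technique is: it only finds RDF whose root element is written with the literal prefix ``rdf'', something an XML parser would not depend on.

```python
import re

# Match an HTML comment whose whole content is an <rdf:RDF> ... </rdf:RDF> block.
RDF_IN_COMMENT = re.compile(r"<!--\s*(<rdf:RDF\b.*?</rdf:RDF>)\s*-->", re.DOTALL)


def find_embedded_rdf(html):
    """Return the RDF/XML blocks found inside HTML comments (hypothetical helper)."""
    return RDF_IN_COMMENT.findall(html)


# Hypothetical page with one RDF comment and one ordinary comment.
page = """<p>Some text</p>
<!--
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
</rdf:RDF>
-->
<!-- an ordinary comment -->"""

blocks = find_embedded_rdf(page)
print(len(blocks))  # 1
print(blocks[0])
```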


6. New Syntaxes for RDF

There are two general classes of syntax that have been found needed from the existing development of RDF/XML and discussion with other communities:

  1. A transfer syntax that clearly represents the RDF graph, preferably canonically (as defined in Section 4.4).
  2. A syntax that is aimed to be easy to author and read.

These have such different targets that they may not be met by a single syntax since the former tends to suggest minimal use of user-friendly forms and the latter would have ``syntactic sugar'' to enable both common and complex RDF triple structures to be written concisely. A single syntax may work poorly at both jobs and remain inappropriate for both - not much of an improvement over the current state. It is not clear that there is even one good end user syntax, rather than many for different communities.

The requirements for a future syntax come from the problem reports on the existing syntax, experience from issues[9] that emerged during the revision of RDF/XML, comments on the new syntax working drafts and also recorded issues on RDF Core's postponed issue list. These were mostly postponed because they were out of scope of the group's charter. The following sections contain the requirements grouped into approximate categories.

6.1. Critical requirements

These requirements come from the lessons learnt from the current syntax and feedback and must be satisfied. The reported problems enumerated in Section 3.2 are given where associated with a requirement. Any new RDF syntax:

6.2. XML design requirements

These came as advice from the XML community and W3C XML working groups on how to modernise the XML to current best practice and make it easier to work with using other XML technologies and tools. An RDF syntax expressed in XML should:

6.3. Syntax conveniences requirements

Any RDF syntax intended for hand-production by end users should provide:

6.4. Extended RDF model requirements

These are not immediate requirements but the lack of an easy way to do these as modifications to RDF/XML influenced RDF Core from making changes such as these to the RDF model. Syntax for an extended RDF model would provide support for:

6.5. Conflicting requirements

The parts of RDF/XML that made embedding in non-validated HTML possible are also those that make up the excessive number of alternate forms (for example, all property attributes of RDF/XML could be removed and the syntax would be able to represent all the same graphs as at present). This means that a design for embedding in this way would clash with a minimal design. However, in this case, a design for embedding in XHTML would require DTD or W3C XML Schema validation via XHTML Modularization, so the same approach as RDF/XML would not be possible. These problems are discussed in more detail in Section 8.

7. RDF syntax proposals

There have been several proposals for new syntaxes for RDF, aimed at canonical transfer syntaxes, end-user syntaxes or a combination thereof. These have included proposals to add or remove functionality to RDF/XML or HTML to make embedding RDF more convenient, entirely new XML syntaxes, using existing XML technologies to define a transfer encoding and also non-XML proposals aimed at making things easier to write. DTD based approaches using the existing RDF/XML have been possible but only when the terms in the application are limited in scope[21].

It is clear that RDF/XML already has too many options in the ways to encode RDF graphs (although some people have proposed more). So a true subset of RDF/XML could be used as a recommended form. This is the approach used by Adobe's XMP[6], which encodes a profile of RDF/XML inside several formats (TIFF, JPG, PNG, HTML, PDF and others) to describe the content. Seven items were removed or changed from RDF/XML: rdf:RDF was made required, and top-level containers, rdf:ID, rdf:bagID, rdf:aboutEach, rdf:aboutEachPrefix and rdf:parseType="Literal" were forbidden. This smaller profile has been called ``RDF/XML-7'' and has been successfully deployed inside several Adobe file formats. Berners-Lee considered another subset of RDF/XML[22] but without the node/property element striping. This led to a rather complex set of additions in order to declare the current subject of the triple.

XML has a linking technology, XLink[23], and a way to point to parts of XML documents (XPointer) that could be used to encode a graph similar to RDF. This was recognised early on in the design of these technologies: whereas RDF has links built in, XML has linking added outside the core. Daniel[24] described a mapping from XLinked-XML to RDF and, more recently, the W3C TAG has as part of its ongoing work been considering the kind of document that might live behind an XML namespace URI. This document potentially could link to several other resources such as style sheets, schemas and RDF descriptions. The current best proposal, RDDL[25] by Borden and Bray, is based on XLinks inside HTML. This provides a mapping to RDF/XML using XSLT; however, it is not a general approach for RDF, being targeted at the simpler problem of a catalogue of resources.

Berners-Lee's Notation 3 (N3)[26] (2000-) is ``an academic exercise in language designed for a human-readable and scribblable[sic] language''. The N3 language and its primary implementation CWM describe a research language that includes functionality outside the RDF model. The syntax defines a text format using a BNF-like grammar that uses a lot of punctuation to abbreviate the RDF. Each RDF triple can be given as a set of three terms explicitly or abbreviated in a variety of forms using a form that operates like XML QNames in RDF/XML. Declarations are allowed starting with @ such as @prefix to attach a namespace URI to a short prefix. This is similar to how Cascading Style Sheets (CSS) escapes from its text-based grammar to add higher level directives such as @import.

RDF Core designed N-Triples (described in RDF Test Cases[16]) as a true subset of N3, with no abbreviated forms allowed. This restriction and simplicity meant that existing N3 tools could read N-Triples documents and being a simple format to understand, it was quick to read, write and implement in dealing with test case descriptions. The advantages and disadvantages of non-XML formats are discussed below in Section 8.

A more recent proposal for a new XML format was Bray's RPV[27], ``designed to be entirely unambiguous and highly human-readable''. It took a strong resource-centred approach, describing a particular resource with the properties and values parts of the RDF triple very clearly written, using a small number of elements and attributes. It was restricted in the triples that could be written in the graph, for example providing no blank node or datatyped literal support, and it invented a new base URI mechanism, parallel to XML Base but applying to individual triple parts.

8. XML and non-XML syntaxes

As already introduced, N-Triples[16] and N3[26] are existing RDF syntaxes that have been deployed successfully as a test case language and a format that is very compact and powerful. Designing a new syntax and not using XML has costs as well as benefits in terms of perceived simplicity that need to be drawn out. XML is generally required by W3C policy for serialisations except where it is excruciatingly painful.

A text format will typically be MIME type text/something such as text/plain. If it is sent without an encoding, the receiving software is required to treat it as US-ASCII. This means text formats lose one of XML's big wins - built-in Unicode. The CSS language is one widely used text web format which had to solve this flaw, and in CSS2 it gained an @charset directive to allow specifying of the charset. A similar directive could be added to N-Triples or N3. However, N3 was changed recently from being a US-ASCII format to being UTF-8 encoded so that some native encoding of characters is possible, albeit with a restriction to what might be a non-preferred encoding.

Although a text-based format might superficially be seen as easy to read and write, it does mean writing new tools that deal with the lexical analysis, the grammar and, if used, Unicode decoding and encoding. These are aspects that are already implemented by many well-tested XML tools and APIs, which abstract from the detail and can be assumed to be available.

However, these formats do give (in the least abbreviated form, N-Triples) a very clear description of the RDF triples and can make the long URIs disappear from user view, when the XML QName-style abbreviations are used. (Both RDF Core and WebOnt use N-Triples with QName-style abbreviations in their documents). This gives the advantages of improved clarity and reduced verbosity that aid comprehension.

New syntaxes written in XML also have a cost, in terms of choosing which XML abstraction to build upon. The revised RDF/XML syntax used the XML Infoset[14], which is the (direct) basis of W3C XML Schema's PSVI and others. Earlier XML technology was designed on SGML, DTDs and DOM, and more recently there are new data models such as the ongoing design of the XQuery 1.0 and XPath 2.0 Data Model[28].

The SOAP Encoding (Section 3, [29]) allows the encoding of directed labelled graphs, although it is not yet clear if all RDF graphs could be transferred via this method (apart from embedding RDF/XML in a naive form). In particular it may be that there is no way to encode blank nodes or RDF datatypes; whether this is possible is still an ongoing research issue.

9. New syntax approaches

A new syntax should be closely based on the RDF graph via the terminology in RDF Concepts and Abstract Syntax[7] so that it is complete, and also take into account the requirements given earlier (Section 6). In particular the critical requirements (Section 6.1) will be met if it closely aligns with the abstract syntax.

A new XML syntax that looks like the abstract syntax will tend to seem like an XML-ized version of N-Triples, if it is minimal. This is sufficient but does not meet the additional XML requirements (Section 6.2) that suggest using some more modern XML design ideas, e.g. QNames. At present RDF/XML uses QNames only as the element and attribute names; however, newer XML work such as W3C XML Schema uses and allows them as attribute values to identify concepts that are identified by a (namespace name, local name) pair. RDF does not use such identifiers, so QNames could only be made to define or refer to URI-references, blank node identifiers or literals. This suggests continuing the RDF/XML approach of concatenating the (namespace name, local name) to give a URI. However, QNames used in this fashion cannot encode all URI-references so cannot be used as the sole way to encode identifiers for RDF graphs, and thus there must be a way to give any URI. This tends to suggest having both QName-style and longer URI-style approaches. However, allowing QNames in element content (or attribute values) causes problems such as invisibility from XML processors, XML Namespace scoping and with XML Canonicalization. Mixing QNames with URIs in similar fields can cause interoperability problems since the syntaxes of both are very similar - ex:prop is a syntactically legal QName and also a syntactically legal URI with URI scheme ex.
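
Both difficulties can be demonstrated with a small Python sketch. It is only an approximation: the NCNAME pattern is a simplified, ASCII-only stand-in for the real XML name productions, and the function names and the example.org URI are assumptions of this sketch.

```python
import re
from urllib.parse import urlsplit

# Simplified, ASCII-only approximation of an XML NCName (an assumption;
# the real production also admits many non-ASCII characters).
NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"
QNAME = re.compile(r"^(%s):(%s)$" % (NCNAME, NCNAME))
LOCAL_SUFFIX = re.compile(r"(%s)$" % NCNAME)


def looks_like_qname(text):
    """True if text has the prefix:local shape of a QName."""
    return QNAME.match(text) is not None


def split_uri(uri):
    """Split a URI into (namespace name, local name), or None if no
    non-empty suffix of the URI is usable as a local name."""
    match = LOCAL_SUFFIX.search(uri)
    if match is None:
        return None
    return uri[:match.start()], match.group(1)


# ex:prop is both a plausible QName and a URI with scheme "ex".
print(looks_like_qname("ex:prop"))   # True
print(urlsplit("ex:prop").scheme)    # ex

# This URI splits cleanly ...
print(split_uri("http://description.org/schema/Creator"))
# ... but one ending in digits cannot, since a local name may not start with a digit.
print(split_uri("http://example.org/123"))  # None
```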

XML entities are another alternative for abbreviating URIs into shorter forms but they are tied very closely to DTDs and cannot be validated with the current W3C XML Schema.

In terms of minimising the vocabulary used for an XML syntax, this means that the elements and attributes must be fixed, with the varying parts of the triples either in element or attribute content (CDATA, or defined by other W3C XML Schema datatype). Given the requirement to encode all RDF, this means that the distinction between URIs, blank nodes and literals needs to be made either by additional elements or attributes. The additional element for each part of the triple will tend to give a very verbose appearance as shown in Figure 2, although the <literal> element could be omitted here with the loss of regularity.

Figure 2: An RDF XML syntax using only elements
<triple>
  <node><uri>http://www.w3.org/Home/Lassila</uri></node>
  <node><qname>s:Creator</qname></node>
  <node><literal>Ora Lassila</literal></node>
</triple>

This syntax does not seem very ``modern'' although it is minimal in use of elements. In addition, it might be better to replace the node element with subject, predicate and object, in particular to enforce current RDF model requirements on where URIs, blank nodes and literals can be used.

The main alternative to an all-element approach is to use XML attributes to indicate the triple part such as that shown in Figure 3.

Figure 3: An RDF XML syntax using attributes to indicate triple part
<triple>
  <subject uri="http://www.w3.org/Home/Lassila" />
  <predicate ref="s:Creator" />
  <object>Ora Lassila</object>
</triple>
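
A reader for this outline syntax could be sketched in Python as below. This is not a proposed implementation: the figure declares no namespace prefixes, so the prefix table is supplied by hand, and the function names and the (kind, text) term representation are assumptions of this sketch. The hand-written QName expansion also illustrates the point made earlier that QNames in attribute values are invisible to the XML processor and must be handled by the application.

```python
import xml.etree.ElementTree as ET

# The Figure 3 fragment, copied verbatim.
FIGURE_3 = """<triple>
  <subject uri="http://www.w3.org/Home/Lassila" />
  <predicate ref="s:Creator" />
  <object>Ora Lassila</object>
</triple>"""

# Hypothetical prefix table; the XML parser knows nothing about it.
PREFIXES = {"s": "http://description.org/schema/"}


def expand(qname, prefixes):
    """Expand prefix:local by concatenating the namespace name and local name."""
    prefix, _, local = qname.partition(":")
    return prefixes[prefix] + local


def read_part(part, prefixes):
    """Turn one of subject/predicate/object into a (kind, text) pair."""
    if "uri" in part.attrib:
        return ("uri", part.get("uri"))
    if "ref" in part.attrib:
        return ("uri", expand(part.get("ref"), prefixes))
    return ("literal", part.text or "")


def read_triple(element, prefixes):
    """Read the three fixed child elements of a <triple> element."""
    return tuple(read_part(element.find(name), prefixes)
                 for name in ("subject", "predicate", "object"))


triple = read_triple(ET.fromstring(FIGURE_3), PREFIXES)
print(triple)
```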

This looks more like the kind of syntax seen in W3C XML Schema, although the attribute names might be slightly different. At this point the xsi:type attribute, commonly used to indicate that the content is a W3C XML Schema datatype, would fit in well. QNames, URIs and blank nodes would all be needed, which requires both defining and referring attributes for all of these.

The main syntax shortcuts that are very common and could be added are the rdf:type property and the collection and container forms. An additional type element could replace the predicate and object elements with a single <type ref="a:Class" /> although that would remove the clear triple view. The container and collections are patterns that respectively generate properties or more complex sets of nodes. These might benefit from support, particularly the latter which is very long to write out longhand, so an additional collection element with contained nodes could be added in a form something like that shown in Figure 4 (also using an xsi:type).

Figure 4: An RDF XML syntax with a collection of nodes
<collection>
  <node uri="http://example.org/resource" />
  <node ref="ex:anotherResource" />
  <node xsi:type="xsd:decimal">10</node>
</collection>

A new text-based syntax should probably be something very similar to the above outline XML design, with influence from N-Triples and N3 given that they have been found relatively easy to explain (at least in the most regular triple form). The latter has an excess of punctuation for either a minimal or user-friendly language so would have to be cut down dramatically, but the most commonly used ideas given above have analogues in N3 (QNames, prefixes, datatypes, collections). As already discussed, careful updates for internationalisation support such as declaring of charset and enabling the use of local characters in URIs and literals would have to be designed, enabling as much flexibility as in XML such as an optional @encoding.

If such a text syntax and an XML one were being designed, it would be a great benefit if they were of a similar level of complexity and, preferably, provided as far as possible equivalent mappings to the same model. This has been successfully achieved with the RELAX NG XML schema language and its text equivalent RELAX NG Compact.

The other main requirement, embedding in XHTML, would clearly be most suited to an XML syntax. In XHTML, given that the above outline uses element content, it would probably not work in the simplest of web clients that ignored the XML detail (however modern XML browsers will ignore <head> content). As previously mentioned, XHTML Modularization would be required for such embedded content, which also requires the application to perform schema validation of either sort, unusual for a web client. A recent strawman proposal from the XHTML working group, also being tracked by the W3C TAG, was to use the existing HTML <meta> tag and give it an overloaded definition suitable for directly transporting RDF content. In order to keep validation, this would be in a future version of XHTML and not of immediate use.

10. Conclusion

The revision of the RDF/XML syntax into a more precise specification, based strongly on XML technology, has allowed existing and new implementors to better update and write their applications. In the new form these can be machine checked for correctness directly against the test cases in the specification. This has allowed several new and verifiably complete and conformant applications of the RDF/XML to RDF graph mapping to be available before the revision process was fully completed.
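
The machine checking described above can be sketched as follows: each test case pairs an input document with the triples it should produce, and a parser passes if the graph it generates matches. This is a minimal sketch in Python, not the working group's actual test harness. It assumes the expected triples are given as N-Triples lines and that the graphs contain no blank nodes, so plain set equality suffices (with blank nodes a graph isomorphism check would be needed). The helper names, the example.org property URI and the sample data are hypothetical, and the line parser handles only URI references and plain literals without escapes.

```python
import re

# Matches: <subject> <predicate> (<object-uri> | "plain literal") .
TRIPLE_RE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+(<[^>]*>|"[^"]*")\s*\.\s*$')

def parse_expected(text):
    """Read simple N-Triples lines into a set of (subject, predicate, object)."""
    triples = set()
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE_RE.match(line)
        if match is None:
            raise ValueError("unsupported line: " + line)
        triples.add((match.group(1), match.group(2), match.group(3)))
    return triples

def check(parser_output, expected_text):
    """Return (passed, missing, unexpected) for one test case."""
    expected = parse_expected(expected_text)
    actual = set(parser_output)
    return actual == expected, expected - actual, actual - expected

# Hypothetical expected result for a single test case.
EXPECTED = """
# expected triples
<http://www.w3.org/Home/Lassila> <http://example.org/terms/creator> "Ora Lassila" .
"""

# Triples a parser under test might have produced (objects kept in written form).
produced = [
    ("http://www.w3.org/Home/Lassila",
     "http://example.org/terms/creator",
     '"Ora Lassila"'),
]

passed, missing, unexpected = check(produced, EXPECTED)
print("PASS" if passed else "FAIL", missing, unexpected)
```

Reporting the missing and unexpected triples separately, rather than a bare pass or fail, is what makes such a check useful to an implementor tracking down where a parser diverges from the specification.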

The RDF/XML syntax as revised is a more workable and durable format for transferring and writing RDF. It is now both more complete, handling more of the RDF model thanks to some critical additions, and cleaner, after removing what turned out to be the underused and most unclear parts.

There were several issues that existed before revising RDF/XML and there are some that still remain. This paper has discussed future work on new XML and textual syntaxes, and the approaches and compromises that addressing some of these issues would require. It has shown that it is not trivial to make a clearly better syntax, that one syntax will not suit all purposes, and that there are both benefits and costs in pursuing multiple ways to write the same thing, especially when they are written with different audiences in mind.

11. Acknowledgements

Thanks to Bijan Parsia and Jan Grant for their encouragement and support.

This paper reports on work done under the Semantic Web Advanced Development Europe (SWAD-Europe) project <http://www.w3.org/2001/sw/Europe/> funded by the EU IST-7 programme IST-2001-34732.

Bibliography

1
O. Lassila and R. Swick (eds.).
Resource Description Framework (RDF) Model & Syntax.
World Wide Web Consortium (W3C), February 1999.
W3C Recommendation, <http://www.w3.org/TR/REC-rdf-syntax>.
2
W3C.
RDF core working group.
World Wide Web Consortium (W3C).
<http://www.w3.org/2001/sw/RDFCore/>.
3
D.C. Fallside (ed. part 0).
XML Schema 1.0.
World Wide Web Consortium (W3C), May 2001.
W3C Recommendation, <http://www.w3.org/TR/xmlschema-0/>.
4
DCMI.
Dublin Core Metadata Element Set, Version 1.1: Reference Description, July 1999.
DCMI Recommendation, <http://dublincore.org/documents/1999/07/02/dces/>.
5
D. Connolly, F. van Harmelen, I. Horrocks, D. L. McGuinness, P.F. Patel-Schneider, and L.A. Stein.
DAML+OIL (March 2001) reference description.
Technical report, World Wide Web Consortium (W3C), December 2001.
W3C Note, <http://www.w3.org/TR/daml+oil-reference>.
6
Adobe.
Adobe extensible metadata platform (XMP), 2001.
<http://www.adobe.com/products/xmp/>.
7
G. Klyne and J. J. Carroll (eds.).
Resource Description Framework (RDF): Concepts and Abstract Syntax.
World Wide Web Consortium (W3C), January 2003.
W3C Working Draft, work in progress, <http://www.w3.org/TR/rdf-concepts/>.
8
W. Conen, R. Klapsing, and E. Köppen.
RDF M&S revisited: From reification to nesting, from containers to lists, from dialect to pure XML.
In Proceedings of the first Semantic Web Working Symposium. Stanford University, July/August 2001.
9
W3C.
RDF issues tracking list.
RDF Core Working Group, World Wide Web Consortium (W3C).
<http://www.w3.org/2000/03/rdf-tracking/>.
10
D. Brickley.
Understanding the Striped RDF/XML Syntax, October 2001.
<http://www.w3.org/2001/10/stripes/>.
11
J. J. Carroll.
Unparsing RDF/XML.
In Proceedings of the eleventh international conference on World Wide Web, pages 454-461. ACM Press, 2002.
12
T. Bray, J. Paoli, C.M. Sperberg-McQueen, and E. Maler (eds.).
Extensible Markup Language (XML) 1.0, Second Edition.
World Wide Web Consortium (W3C), October 2000.
W3C Recommendation, <http://www.w3.org/TR/REC-xml>.
13
J. Marsh (ed.).
XML Base.
World Wide Web Consortium (W3C), June 2001.
W3C Recommendation, <http://www.w3.org/TR/xmlbase/>.
14
J. Cowan and R. Tobin (eds.).
XML Information Set.
World Wide Web Consortium (W3C), October 2001.
W3C Recommendation, <http://www.w3.org/TR/xml-infoset/>.
15
M. Dürst, F. Yergeau, R. Ishida, M. Wolf, A. Freytag, and T. Texin (eds.).
Character Model for the World Wide Web 1.0.
World Wide Web Consortium (W3C), February 2002.
W3C Working Draft, work in progress, <http://www.w3.org/TR/REC-charmod/>.
16
J. Grant and D. Beckett (eds.).
RDF Test Cases.
World Wide Web Consortium (W3C), February 2003.
W3C Working Draft, work in progress, <http://www.w3.org/TR/rdf-testcases/>.
17
F. van Harmelen, J. Hendler, I. Horrocks, D. L. McGuinness, P. F. Patel-Schneider, and L. A. Stein (eds.).
OWL Web Ontology Language Reference.
World Wide Web Consortium (W3C), March 2003.
W3C Working Draft, work in progress, <http://www.w3.org/TR/2003/WD-owl-ref-20030331/>.
18
W3C.
Web ontology working group (WebONT).
World Wide Web Consortium (W3C).
<http://www.w3.org/2001/sw/WebOnt/>.
19
J. Ferraiolo (ed.).
Scalable Vector Graphics (SVG) 1.0.
World Wide Web Consortium (W3C), February 2000.
W3C Recommendation, <http://www.w3.org/TR/SVG/>.
20
K.G. Clark.
Creative comments: On the uses and abuses of markup.
XML.com, January 2003.
<http://www.xml.com/pub/a/2003/01/15/creative.html>.
21
D. Beckett, E. Miller, and D. Brickley.
Expressing Simple Dublin Core in RDF/XML.
Dublin Core Metadata Initiative, July 2002.
<http://dublincore.org/documents/dcmes-xml/>.
22
T. Berners-Lee.
A strawman unstriped syntax for RDF in XML.
Technical report, World Wide Web Consortium (W3C), 1999.
Design Note, <http://www.w3.org/DesignIssues/Syntax>.
23
S. DeRose, E. Maler, and D. Orchard.
XML XLinking Language (XLink).
World Wide Web Consortium (W3C), June 2001.
W3C Recommendation, <http://www.w3.org/TR/xlink/>.
24
R. Daniel Jr.
Harvesting RDF statements from XLinks.
Technical report, World Wide Web Consortium (W3C), September 2000.
W3C Note, <http://www.w3.org/TR/xlink2rdf/>.
25
J. Borden and T. Bray.
Resource directory description language (RDDL).
Technical report, The Open Healthcare Group and Antarctica Systems, November 2002.
<http://www.textuality.com/xml/rddl2.html>.
26
T. Berners-Lee.
Notation 3.
Technical report, World Wide Web Consortium (W3C), 1998.
Design Note, <http://www.w3.org/DesignIssues/Notation3>.
27
T. Bray.
The RPV (resource/property/value) syntax for RDF.
Technical report, Antarctica Systems, January 2003.
<http://www.textuality.com/xml/RPV.html>.
28
M. Fernández, A. Malhotra, J. Marsh, M. Nagy, and N. Walsh (eds.).
XQuery 1.0 and XPath 2.0 Data Model.
World Wide Web Consortium (W3C), November 2002.
W3C Working Draft, <http://www.w3.org/TR/2002/WD-query-datamodel-20021115/>.
29
M. Gudgin, M. Hadley, N. Mendelsohn, J-J. Moreau, and H.F. Nielsen (eds.).
SOAP Version 1.2 Part 2: Adjuncts.
World Wide Web Consortium (W3C), December 2002.
W3C Candidate Recommendation, <http://www.w3.org/TR/2002/CR-soap12-part2-20021219/>.

Footnotes

... values.[FT1]
Although this isn't friendly to all XML tools such as XML Canonicalization, XSLT


Dave Beckett