RDF metadata in XHTML gets even easier

Elias Torres did the hard part; join in with the fun part!

I've written here before about RDF/A (now known as RDFa), the spec for embedding RDF triples into XHTML using existing XHTML markup. I've felt for a while that it holds great promise for making RDF easier to use and easier to incorporate into typical web pages, thereby allowing the creation of a real semantic web of RDF data. I had vague plans to write an XSLT stylesheet that would extract the RDF triples from an XHTML file's RDFa markup, and for sample input I did put together a test document that incorporates a lot of sample RDFa from a March version of the RDFa Primer.

While I put off writing the stylesheet that would do this, Elias Torres wasn't as lazy as me, and he went ahead and created an RDFa Extractor. The REST interface makes it easy to extract RDF/XML triples from an existing document; check out the triples from my first test document that it pulled out.

Many people are interested in RDFa for its ability to add semantics to existing data: for example, to add markup around a string of digits that already exists in a web page to indicate that the string is a http://xmlns.com/foaf/0.1/phone number. (As the primer tells us, "An important goal of RDFa is to achieve this RDF embedding without repeating existing XHTML content when that content is the metadata.") Because of my work with the PRISM group, I was interested in adding data that's a little more meta, such as production workflow data, in which the actual metadata values are not part of the content. After a few questions on the rdf-in-xhtml mailing list, I found this pretty easy. To test this kind of metadata, I created rdfa2.html yesterday to see what Elias's program would do with it, and the results are great.

There are several things that I like about these results. First, I put an empty string as the subject of the metadata about the document itself, as shown below, and Elias's extractor created triples with the document's full URL as the subject of the triples. It also created separate triples for the two dc:subject properties that I assigned to the document.

<meta about="">
  <meta property="dc:title" content="Meta-metadata"/>
  <meta property="dc:date" content="2006-06-06"/>
  <meta property="dc:subject" content="metadata"/>
  <meta property="dc:subject" content="RDFa test document 2"/>
</meta>
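
To make the expected result concrete, here is a minimal Python sketch of how an extractor might turn this nested meta pattern into triples. It is not Elias's implementation and it covers only this one pattern, not RDFa in general. The function name extract_meta_triples and the base URL http://example.org/rdfa2.html are made up for illustration, and property names are left as prefixed names (dc:title) rather than expanded to full URIs.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical base URL, standing in for wherever the document actually lives.
BASE = "http://example.org/rdfa2.html"

SNIPPET = """
<head>
  <meta about="">
    <meta property="dc:title" content="Meta-metadata"/>
    <meta property="dc:date" content="2006-06-06"/>
    <meta property="dc:subject" content="metadata"/>
    <meta property="dc:subject" content="RDFa test document 2"/>
  </meta>
</head>
"""

def extract_meta_triples(xml_text, base):
    """Return (subject, property, literal) tuples from nested meta elements."""
    root = ET.fromstring(xml_text)
    triples = []
    for outer in root.iter("meta"):
        about = outer.get("about")
        if about is None:
            continue  # inner meta elements carry no about attribute; skip them here
        # An empty about value resolves to the base URL, i.e. the document itself.
        subject = urljoin(base, about)
        for inner in outer.findall("meta"):
            prop = inner.get("property")
            content = inner.get("content")
            if prop is not None and content is not None:
                # Repeated properties (two dc:subject values) yield separate triples.
                triples.append((subject, prop, content))
    return triples

triples = extract_meta_triples(SNIPPET, BASE)
for triple in triples:
    print(triple)
```

Each child meta element becomes one triple, which is why the two dc:subject values come out as two separate statements about the same subject.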

Publishers in the PRISM group were concerned about the ability to assign out-of-line metadata to specific sections of the document, such as a recipe or image within a larger document. To test this, the rdfa2.html document has the following in the content of its body element (the section element is a nice new XHTML 2 feature):

<section id="s1">
  <h2>Part one</h2>
  <p>This document has very little data, but plenty of metadata.</p>
  <p>It's my second RDFa test document. I created my 
   <a id="l1" href='rdfa1.html'>first one</a> several months ago.</p>
</section>
<section id="s2">
  <h2>Part two</h2>
  <p>This concludes our test.</p>
</section>

It also has the following inside the html/head element, where the meta elements shown earlier with metadata about the document itself are stored:

<meta about="#s1">
  <meta property="sn:goofinessFactor" content="3.2"/>
  <meta property="sn:direction" content="south"/>
  <meta property="sn:editor" content="lj"/>
</meta>

<meta about="#s2">
  <meta property="sn:goofinessFactor" content="4.3"/>
  <meta property="sn:direction" content="north"/>
  <meta property="sn:editor" content="tr"/>
</meta>

(When I test the assignment of arbitrary metadata, I like to pick some pretty arbitrary metadata.) Again, Elias's extractor extracted the triples I hoped to see. Right after those two sets of nested meta elements, I had a third that assigned metadata to the a element (the link) inside of section s1:

<meta about="#l1">
  <meta property="sn:cost" content="0"/>
  <meta property="sn:lastChecked" content="2006-06-06T09:04"/>
  <meta property="sn:type" content="cite"/>
</meta>
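
The out-of-line pattern depends on each about value pointing at an id somewhere in the body. As a rough Python sketch (my own illustration, not part of Elias's extractor), the following resolves each fragment against a hypothetical base URL and looks up the element it refers to. The function name describe_targets, the base URL, and the trimmed-down sample document are all assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urldefrag

# Hypothetical base URL, standing in for wherever the document actually lives.
BASE = "http://example.org/rdfa2.html"

# A trimmed-down version of the test document: head metadata plus body targets.
DOC = """
<html>
  <head>
    <meta about="#s1">
      <meta property="sn:editor" content="lj"/>
    </meta>
    <meta about="#l1">
      <meta property="sn:cost" content="0"/>
      <meta property="sn:lastChecked" content="2006-06-06T09:04"/>
      <meta property="sn:type" content="cite"/>
    </meta>
  </head>
  <body>
    <section id="s1">
      <p>I created my <a id="l1" href="rdfa1.html">first one</a> several months ago.</p>
    </section>
  </body>
</html>
"""

def describe_targets(xml_text, base):
    """Map each subject URI to (target element name or None, [(property, value)])."""
    root = ET.fromstring(xml_text)
    # Index every element that has an id so fragments can be looked up.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    result = {}
    for outer in root.iter("meta"):
        about = outer.get("about")
        if about is None:
            continue  # only the outer meta elements name a subject
        subject = urljoin(base, about)           # "#l1" becomes base + "#l1"
        fragment = urldefrag(subject).fragment   # "l1"
        target = by_id.get(fragment)
        # A list of pairs, so repeated properties would not overwrite each other.
        props = [(m.get("property"), m.get("content")) for m in outer.findall("meta")]
        result[subject] = (target.tag if target is not None else None, props)
    return result

info = describe_targets(DOC, BASE)
for subject, (tag, props) in info.items():
    print(subject, "->", tag, props)
```

A target of None would flag an about value that points at no id in the document, which is an easy mistake to make when the metadata lives in the head and the content lives in the body.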

The history of advanced linking architectures is mostly a series of arguments over the appropriate metadata to store with the address (direct or indirect) of the link destination, the one piece of information that a link can't do without. Different people have different ideas about what "typical" applications need, and a committee that comes up with a common set of additional metadata typically ends up with a mess. RDFa gives people the ability to add whatever metadata they like (with the precisely defined semantics that can come from property names in specific namespaces), which could enable some big advances in linking applications.

This assignment of metadata to an entire document, to sections within it, and to a specific link within it was just some quick dabbling. There are many other ways that RDFa could be valuable, and Elias's extractor makes them easy to test. So get out there and create new RDFa! Just take some existing web pages, or mock some up, and add semantics to them—movie schedules, directions, parts catalogs, home pages—and see what Elias's RDFa extractor gets out of them.

4 Comments

Christoph Gorn has made a nice use of RDFa:

He already has SIOC profiles with RDF metadata about his blog and its posts.

Duplicating it all as embedded RDF would not make sense, but RDFa can be nicely used to create rdfs:seeAlso links to the profiles with more RDF data about a post. That'd be especially useful for pages with multiple posts per page, such as monthly archives.


This is a very nice, elegant approach. The ONLY thing I don't like about it as much as using microformats is that the data doesn't stay with the text itself. How would, say, a feed reader handle this, and would it be able to extract a brief "sample" text chunk from the text and carry the metadata with it?


Jason,

That's actually the use case around which RDFa was designed, and what its designers more typically expect people to do. (I'm probably not the first to describe RDFa as "microformats done right.") Because of publishing use cases from the PRISM group, I was more interested in seeing how well it worked with more out-of-line metadata, and as it turned out, it works fine. That's why my examples focus on that.