
XHTML 2 for storing content?

People will use XHTML 2 for more than shipping pages to browsers.

While discussing RDF/A, Shelley Powers recently wrote:

I still believe that we don't need to embed RDF directly into our web pages because many web sites are dynamic now. As such, if one accesses the page as a human, you get data formatted for human consumption through a browser; if you access the page as a webbot, by attaching /rdf to the end of the document, the same data is formatted for mechanical consumption. No need to clutter up web pages, or make page creation or generation that much harder.

I think that XHTML 2 will be used for more than delivery of content to browsers, which makes the ability to add RDF metadata to it much more valuable. People will take XHTML 2 more seriously as a format for actually storing content than they ever took its predecessors, for several reasons:

  • Improvements such as nested section elements with context-dependent h headers instead of h1, h2, h3, and their brethren will let us make documents structurally richer and therefore easier to slice and dice. The "Structuring Advantages" slide of Steven Pemberton's XHTML2: Accessible, Usable, Device Independent and Semantic slideshow makes some excellent points about this. The whole slideshow is worth reading.

  • I've noticed a trend among web designers psyched about CSS: they take more and more presentation out of their (X)HTML and store it in CSS. The ability to apply different CSS stylesheets to the same XHTML and have it look nice seems to be a mark of professionalism for them now. (Old SGML geeks are tempted to say: "Woo-hoo! Professional web designers are finally taking separation of content from presentation seriously! We won!") Maybe the web designers' move away from messy HTML is just a fringe benefit of their moves toward the CSS Zen Garden. It's still a huge benefit.

  • Structurally, XHTML 1 wasn't enough for some content applications, and DocBook—even DocBook Lite—was often too much. XHTML 2 will hit a sweet spot for a lot of applications, especially those involved in the interchange of content across workflow steps (which may cross business boundaries—think of it as B2B content). Content exchanged across workflow steps needs metadata; that's often how you know which workflow steps have touched a document. Separate metadata means more documents to track. Embedded metadata is part of the appeal of XMP, which lets you embed (some) RDF into binary files such as PDF and JPEG files. Embedded metadata makes a lot of things easier, and RDF/A will do this for XHTML.
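To make the "slice and dice" point about nested sections concrete, here is a minimal Python sketch. The sample fragment, the outline function, and the choice to leave out namespaces are all my own assumptions for illustration, not anything taken from the XHTML 2 drafts. It shows how a heading's level can be read off the section nesting instead of the element name.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML 2-style fragment; namespaces are omitted for brevity.
DOC = """
<body>
  <section>
    <h>Introduction</h>
    <p>Opening text.</p>
    <section>
      <h>Background</h>
      <p>Nested text.</p>
    </section>
  </section>
  <section>
    <h>Conclusion</h>
  </section>
</body>
"""


def outline(element, depth=0):
    """Yield (depth, heading text); depth counts the enclosing section elements."""
    for child in element:
        if child.tag == "section":
            # Descend one level; headings inside are one step deeper.
            yield from outline(child, depth + 1)
        elif child.tag == "h":
            yield depth, (child.text or "").strip()


headings = list(outline(ET.fromstring(DOC.strip())))

# Print an indented table of contents.
for depth, text in headings:
    print("  " * (depth - 1) + text)
```

Because the level comes from the structure, moving a section to a different depth needs no renumbering of its headings.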

If I'm writing something shorter than an entire book, I'm sure I'll mostly use XHTML 2 once its schema gets more settled. The ability to put a list or pre element inside of a p element will be very handy for tech writing. If someone needs my content in some other format, it will be easy enough to transform. RDF/A looks to me like microformats done right, and it could even benefit the RDF community more than the XHTML community as it spreads RDF beyond the ivory towers where it's been most comfortable.
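As a rough illustration of the "easy enough to transform" claim, the following Python sketch rewrites context-dependent h elements into numbered h1 to h6 headings. The flatten_headings function, the sample input, and the decision to map section to div are hypothetical choices of mine, and namespaces are again left out.

```python
import xml.etree.ElementTree as ET


def flatten_headings(element, depth=0):
    """Rename each h to h1..h6 according to section nesting (modifies the tree in place)."""
    for child in element:
        if child.tag == "section":
            flatten_headings(child, depth + 1)
            child.tag = "div"  # assumption: div stands in for section in the target format
        elif child.tag == "h":
            # Clamp to the six heading levels available in XHTML 1.
            child.tag = "h%d" % min(max(depth, 1), 6)


SOURCE = (
    "<body><section><h>Tools</h>"
    "<section><h>Parsers</h><p>Text.</p></section>"
    "</section></body>"
)

root = ET.fromstring(SOURCE)
flatten_headings(root)
result = ET.tostring(root, encoding="unicode")
print(result)
```

A real conversion would also have to handle other differences, such as a list or pre element nested inside a p element, which this sketch leaves untouched.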

I'm looking forward to XHTML 2, and RDF/A is a key reason. Again, read Steven's slides.



100% Agree. -m