XHTML 2 for authoring?

Suddenly it all made sense.

On Monday I gave a talk at XML 2007 titled "XHTML 2 for Publishers: New opportunities for storing interoperable content and metadata." It used a lot of material from the article "Put XHTML 2 to work now" that I wrote for developerWorks, but with a greater focus on the potential value to publishers.

The session that preceded mine in the Publishing Track room was "Where are XML authoring tools today, where are they going, and what do we want?", in which Marc Jacobson of Really Strategies moderated a panel discussion on XML authoring by representatives of JustSystems XMetaL, Xopus, and Adobe. A major theme in that discussion was how Microsoft Word has set people's expectations for a writing environment, and a minor theme was how people learning about XML-based authoring can be intimidated by the huge number of elements they might have to learn for their documents.

I had an idea, and discussed it with several people during the question session after my talk: how about having people author editorial content in XHTML 2? While previous versions of HTML were useful for little more than shipping pages to browsers for display, the additional structure, semantics, and metadata that you can add to an XHTML 2 document make it a more reasonable option for authoring content that may end up being used in a variety of media. There are plenty of people who have created web pages and don't expect to use Word to do so, so they don't have the expectation of a B button to bold text, an I button to italicize, and change bars to identify revisions. They're already familiar with the basic elements of HTML, and the few new XHTML 2 ones they'd have to learn (for example, h for headers and the section element) are intuitive enough for them to pick up without much trouble.

If I were going to set up such an authoring environment, I'd customize the XHTML 2 schema to impose a few more constraints, none of which should appear illogical to people who've created web pages before and none of which would make the documents invalid XHTML 2. For example, instead of letting authors put h, h1, h2, and other header elements anywhere they wanted, I'd remove the h1 through h6 elements from the schema and only allow h elements as the first element of body and section elements. (Perhaps I'd even require h as the first element of a section element, depending on the content being authored.)
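
As a rough illustration of those heading constraints, here is a minimal Python sketch that checks a document after the fact instead of customizing the schema itself. The function name check_headings and its require_h_in_section flag are hypothetical, and the namespace URI is the one I'm assuming from the XHTML 2 Working Draft; a real setup would more likely put these rules in the schema.

```python
import xml.etree.ElementTree as ET

# Assumed XHTML 2 namespace URI (taken from the Working Draft).
XHTML2_NS = "http://www.w3.org/2002/06/xhtml2/"
NUMBERED = {f"h{i}" for i in range(1, 7)}  # h1 through h6


def local(tag):
    """Strip the {namespace} part from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def check_headings(xml_text, require_h_in_section=False):
    """Return a list of violations of the heading constraints (hypothetical helper)."""
    problems = []
    root = ET.fromstring(xml_text)
    for parent in root.iter():
        pname = local(parent.tag)
        children = list(parent)
        for index, child in enumerate(children):
            cname = local(child.tag)
            if cname in NUMBERED:
                problems.append(f"{cname} is not allowed; use h")
            elif cname == "h" and not (pname in ("body", "section") and index == 0):
                problems.append(
                    f"h must be the first element of body or section "
                    f"(found in {pname} at position {index + 1})"
                )
        # Optional stricter rule: every section has to open with an h.
        if require_h_in_section and pname == "section":
            if not children or local(children[0].tag) != "h":
                problems.append("section must start with h")
    return problems
```

Calling check_headings(document_text) returns an empty list when the document follows the rules, so it could run as a quick check before content is accepted.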

To add semantics, I might make XHTML 2's new role attribute required for certain elements and specify a list of allowable values that could be entered there, again depending on the content being authored. If an application that used this content needed XML that was not XHTML 2, these values could be used as hooks to transform the content to conform to another schema.
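
To show what using role values as transformation hooks might look like, here is a small Python sketch. The role values ("byline", "abstract") and the target element names ("author", "summary") are invented for illustration, and the namespace URI is assumed from the Working Draft; in practice an XSLT stylesheet would be the more typical tool for this.

```python
import xml.etree.ElementTree as ET

# Assumed XHTML 2 namespace URI (taken from the Working Draft).
XHTML2_NS = "http://www.w3.org/2002/06/xhtml2/"

# Hypothetical mapping from role values to element names in some target schema.
ROLE_TO_ELEMENT = {"byline": "author", "abstract": "summary"}


def convert(elem):
    """Copy elem into a namespace-free tree, renaming elements by their role value."""
    fallback = elem.tag.rsplit("}", 1)[-1]  # keep the local name when no role matches
    out = ET.Element(ROLE_TO_ELEMENT.get(elem.get("role"), fallback))
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        out.append(convert(child))
    return out  # attributes are deliberately dropped in this sketch
```

Elements without a recognized role keep their XHTML 2 name, so a shop could start with a short mapping and extend it as the target schema requires.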

Metadata would also depend on the needs of the shop setting up the authoring environment, and because mixed content isn't an issue for this, XForms or InfoPath forms would be a sensible way to gather this information and then insert it as RDFa in the appropriate places in the document.
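
As a sketch of the "insert it as RDFa" step, the Python below takes field values of the kind a form might collect and adds them to the document head as meta elements with RDFa property and content attributes. The choice of Dublin Core, the field names, and the add_rdfa_metadata function are all my assumptions for illustration, as is the namespace URI; the form tool itself is out of scope here.

```python
import xml.etree.ElementTree as ET

# Assumed XHTML 2 namespace URI (taken from the Working Draft).
XHTML2_NS = "http://www.w3.org/2002/06/xhtml2/"
DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core, chosen as an example vocabulary


def add_rdfa_metadata(root, fields):
    """Append one RDFa meta element per form field to the head (hypothetical helper)."""
    head = root.find(f"{{{XHTML2_NS}}}head")
    if head is None:
        raise ValueError("document has no head element")
    # ElementTree drops unused prefix declarations when parsing, so declare
    # the dc prefix explicitly for the property values below to resolve.
    root.set("xmlns:dc", DC_NS)
    for name, value in fields.items():
        meta = ET.SubElement(head, f"{{{XHTML2_NS}}}meta")
        meta.set("property", f"dc:{name}")
        meta.set("content", value)
    return root
```

A call such as add_rdfa_metadata(doc, {"creator": "Jane Doe"}) leaves the body untouched, which fits the point that mixed content isn't an issue for this kind of metadata.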

This wouldn't suit every situation, but there are cases where it could work well. It fits in well with the main thesis of my talk: unlike all versions of HTML being developed before (or concurrently with) XHTML 2, XHTML 2 can be useful for more than just shipping pages to browsers for display.


We have found XHTML2 to be perfect for legal documents. The defined elements in XHTML2 cover all structural requirements and RDFa allows the creator to add stuff from the legal domain.

On a national level we define a basic vocabulary for lawmakers. Each government authority can then add its own domain-specific information in the same document. In the end, the XHTML2 document carries a lot of information and can be parsed from different perspectives and be converted to HTML/PDF/Whatever.

I foresee that XHTML2 may be great for this type of work but I doubt that it will ever become the preferred way to create web pages.

The downside (currently) is the lack of tool support. We implemented our own in-browser editor to create legal documents (e.g. it has buttons for "legal paragraph" and other things from the domain). We have also extended MS Word.

What do you use when you want to provide an editor for people who shouldn't see the XML (i.e. WYSIWYM)?

I haven't assembled such a system for others, and for my own work I use Emacs+nxml (see http://www.snee.com/bobdc.blog/2007/04/using_xhtml_2_schemas.html ).

After nearly five years at LexisNexis, I found it very interesting to hear that you use XHTML 2 for legal documents. What governments are using this (or working toward doing so), and is the documentation available on the public web? Has it been a problem that XHTML 2 is still in Working Draft status?