Customizing nxml to find your schemas automatically

By namespace or document element.

The first time I loaded an RDF/XML document into Emacs with nxml mode, it automatically loaded the appropriate RELAX NG compact schema for me. I was especially impressed because RDF/XML has such a potentially tricky structure. (Perhaps too tricky, but that's another topic.) In its default configuration, nxml automatically loads the appropriate schemas for RDF/XML, XHTML 1, RELAX NG, DocBook, and XSLT. This last one has been my only real XSLT development tool other than actual XSLT processors for years.

For other document types, I'd go to the XML menu that nxml adds to Emacs, pick Set Schema, then File..., and then browse to the appropriate RELAX NG compact schema file. Because I edit a lot of XML files, this added up to a lot of time. I recently learned how to set up nxml to find most of the schemas I need automatically, based on a document's namespace or document element.

All you need to do is add new namespace/schema pairs to the right configuration file. The same configuration file lets me add more choices to the /XML/Set Schema/For Document Type cascade menu, so that if I create an empty new document with an extension of "xml" and nxml has no other clue about what schema to use, I can pick the document type from this menu instead of browsing around my hard disk looking for the schema.

When you assign a schema to a document with /Set Schema/File..., nxml asks if you want to "Save Schema Location to [directory of file being edited]". If you say yes, it adds an entry to that directory's schemas.xml file that points at the document and at the schema, so that the next time you load that document, nxml will know what schema to use. After using nxml for years, I only just learned that the directory with the elisp files of nxml code has a central schemas.xml file where you can add the elements and attributes that do everything I described above.
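For reference, here is a rough sketch of what a per-directory schemas.xml looks like after nxml saves a schema location for one file. The file name chapter1.xml and the schema path are made up for illustration, and the exact entry nxml writes may differ a little between versions:

```xml
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <!-- Hypothetical entry: "resource" names the document being edited,
       "uri" names the schema to load for it. Both values are invented. -->
  <uri resource="chapter1.xml" uri="/home/me/schemas/mydoc.rnc"/>
</locatingRules>
```

The central schemas.xml uses the same locatingRules wrapper, just with more general rules inside it.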

It's all described on Dave Pawson's nxml-mode Schema location page, but I'll summarize it here. The following element in this schemas.xml file adds an XHTML2 entry to the For Document Type cascade menu so that picking it loads an XHTML 2 schema:

<typeId id="XHTML2" uri="xhtml2/xhtml2.rnc"/>

The following tells nxml to load the same schema for a document whose document element is in the XHTML 2 namespace:

<namespace ns="" uri="xhtml2/xhtml2.rnc"/>

Another option is to automatically load a schema based on the document element, as opposed to the namespace. I wouldn't want to do this for an html element, because it might be an XHTML 1 or an XHTML 2 document. It's handy for DITA documents, though, which don't have specific namespaces. The following loads the appropriate schema for a DITA reference topic:

<documentElement localName="reference" uri="/usr/local/DITA-OT1.4.1/rnc/reference.rnc"/>

(To create RELAX NG compact schemas for DITA, use trang and then set the start pattern to equal reference.element in reference.rnc, task.element in task.rnc, and so forth.)
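Putting the pieces together, a minimal sketch of how these kinds of rules might sit in one schemas.xml file follows. The typeId and documentElement lines are the ones shown above. The namespace rule is a made-up example: the namespace URI http://example.com/ns/invoice, the Invoice type ID, and the invoice.rnc path are all hypothetical. It also shows that a rule can point at a typeId instead of repeating the schema path:

```xml
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">

  <!-- From the article: adds XHTML2 to the For Document Type menu. -->
  <typeId id="XHTML2" uri="xhtml2/xhtml2.rnc"/>

  <!-- Hypothetical: documents whose document element is in this invented
       namespace get the schema registered under the "Invoice" type ID. -->
  <namespace ns="http://example.com/ns/invoice" typeId="Invoice"/>
  <typeId id="Invoice" uri="invoice/invoice.rnc"/>

  <!-- From the article: match on the document element's local name. -->
  <documentElement localName="reference" uri="/usr/local/DITA-OT1.4.1/rnc/reference.rnc"/>

</locatingRules>
```

Routing a rule through a typeId keeps the schema path in one place, and the same ID then shows up as a choice on the menu.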

Sometimes I wonder if I'll ever do large-scale editing with anything but Emacs. Then, I find yet another way to make Emacs even more convenient to use, and I know that making such a switch would be an even bigger, more difficult jump.


I cooked up a Perl script (that I suppose I'm now committing to publish) that reads the same locating rules document so that I can say "xjparse foo.xml" and have it find the right schema automatically.

(The name xjparse is a historical accident, but it's what I'm used to typing.)

Why Perl?

Just kidding. Looking forward to seeing it...