
Praising DITA

For the wrong and right reasons.

There are many reasons to like the Darwin Information Typing Architecture, but much of the praise for it lately seems a bit misguided. For a lot of XML products and services companies, DITA is the new bottle in which to put their old wine. They talk about how DITA is great because it lets you:

  • write content once and then automate its use in multiple media (streamlining the publishing process, etc. etc.)

  • mix and match blocks of content to create new products on the fly

  • reduce dependency on proprietary tools

  • select subsets of content based on metadata in attributes

These are all great things, but XML technology had them before DITA came along. Take a look at the boldface bullet points in the SOA World article "Improving Customer's SOA Experience with DITA" (a product of the ever-opportunistic SYS-CON folk; let's give them extra points for working two trendy acronyms into the same article title, but take one off for bad punctuation): every one applies to pre-DITA XML, and even to SGML. (Perhaps "Translation efficiency and acceleration" wouldn't be as easy in SGML; people forget how much easier XML's Unicode base made a lot of things.)

Pre-DITA schema customization

While the idea of customizing DTDs isn't new with DITA, to me DITA's greatest contribution is the new possibilities it offers for DTD/schema customization.

Most good XML and SGML schemas were customizable before. At the XML 2005 conference I did a presentation titled Your schema and the industry-standard schema on how to evaluate the customizability of a standard schema as you consider adopting it. That paper goes into more detail on the syntax for the hooks typically used to allow customization, but to summarize, there are two basic techniques that can be mixed and matched. First, instead of defining an element's content like this,

<!ELEMENT article (title,(paragraph|picture)+)>

you can define it using a parameter entity and reference the parameter entity:

<!ENTITY % article-content "title,(paragraph|picture)+">
<!ELEMENT article (%article-content;)>

Your customized version can reference the DTD with these declarations and then redeclare the article-content parameter entity to have any content model you like.
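
As a rough sketch of what such a customization could look like (the file names article.dtd and custom-article.dtd and the summary element are made up for illustration), a small driver DTD can declare its own version of the parameter entity and then pull in the original DTD. In XML, the first declaration of an entity is the one that counts, so the override goes before the reference to the base DTD:

```dtd
<!-- custom-article.dtd: hypothetical customization layer -->

<!-- Declared before the base DTD is read, so this version of the entity wins. -->
<!ENTITY % article-content "title,summary?,(paragraph|picture)+">

<!-- The new element that the revised content model refers to. -->
<!ELEMENT summary (#PCDATA)>

<!-- Pull in the unmodified base DTD; "article.dtd" is an assumed file name. -->
<!ENTITY % base-article SYSTEM "article.dtd">
%base-article;
```

Documents would then point their DOCTYPE declaration at custom-article.dtd instead of article.dtd, and the base DTD itself stays untouched.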

If a DTD doesn't want to allow a complete replacement of its article content model and wants to instead just provide a hook to add new things, it can use the second technique for customization, which looks more like this:

<!ENTITY % article.content.cust "">
<!ELEMENT article (title,(paragraph|picture %article.content.cust;)+)>

This still requires that an article begin with a title and allows paragraph and picture elements in the mix of what comes after that, but it lets you redefine the customization parameter entity article.content.cust from its default value as an empty string to something like "| warning". Then, as long as your customization declares a warning element somewhere, that element can be part of the mix of what follows the article element's title.
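
A minimal sketch of that second technique, again assuming that the base DTD lives in a file called article.dtd and that a warning holds only text (both are assumptions for the example):

```dtd
<!-- Replace the empty default with one extra alternative for the mix. -->
<!ENTITY % article.content.cust "| warning">

<!-- Declare the element that the hook now allows. -->
<!ELEMENT warning (#PCDATA)>

<!-- Read the base DTD; its own (empty) declaration of the entity is ignored. -->
<!ENTITY % base-article SYSTEM "article.dtd">
%base-article;
```

With this in place, the article content model works out to a title followed by one or more paragraph, picture, or warning elements, in any order.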

As I described in the paper accompanying the XML 2005 presentation, well-behaved DTDs such as DocBook and TEI provide many such opportunities for customization; badly-behaved DTDs like NITF don't.

DITA's new approach to DTD customization

DITA defines a topic element, with a structure suitable for technical documentation, and a map element for assembling topics into a sequence, hierarchy, or whatever you need for your output media. DITA also declares three specialized versions of topic called concept, task, and reference, and a mechanism for creating your own specializations of topic, task, reference, concept, map, and other elements.

This specialization mechanism is DITA's real contribution to the XML world, because it offers new levels of customizability—not new levels above and beyond the old levels, but new levels in between "just use the existing content models" and "rewrite the content models" that were the choices before. This is great news for those who found the other two choices a little too far apart. (Before I go further, Norm Walsh has shown that all of these ideas can be implemented in DocBook, but by limiting DITA's domain to be a little narrower than that of DocBook, its developers seem to have created something that appears less complex and therefore easier to use for people in that domain: topic-oriented technical documentation.)

I won't go into the mechanics of DITA specialization here, but its key advantage is that a processor for a base version of an element can process a specialization that it didn't know existed. For example, if you have a recipe element based on DITA's task element, a DITA-compliant XSLT stylesheet designed to create HTML versions of task elements can do the same with recipe elements, even if the stylesheet was written before anyone had the idea to create this recipe specialization. It's similar to the object-oriented technique of treating objects of derived classes as objects of the base classes, although the DITA analogy with OO development can get pushed too far.
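
Without going into the full mechanics, a simplified sketch may make the fallback behavior more concrete. As I understand it, the DITA DTDs give every element a defaulted class attribute that lists its ancestry from most general to most specific. The recipe declaration below is hypothetical, and the real DITA declarations carry more attributes than shown here:

```dtd
<!-- Base DITA task: the class value records that a task is a kind of topic. -->
<!ATTLIST task   class CDATA "- topic/topic task/task ">

<!-- Hypothetical recipe specialization: it adds its own token after task's. -->
<!ATTLIST recipe class CDATA "- topic/topic task/task recipe/recipe ">
```

A stylesheet rule that selects on the class value, with an XPath expression along the lines of *[contains(@class, ' task/task ')], instead of matching the element name task will pick up recipe elements as well, because the recipe class value still contains the task/task token.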

This feature of DITA has its roots in a noble SGML concept known as architectural forms that never made much progress. The "conversion to HTML" part of my example above demonstrates why the DITA version of this concept got so much more traction than the original architectural forms idea: because there's a working implementation that can convert your DITA documents, regardless of their level of specialization, into HTML, PDF, Java help, troff, RTF, and more—the DITA Open Toolkit. People are using it to get useful work done now.

Watch out for products whose DITA support consists of bundling the DITA DTDs and the open-source DITA Open Toolkit with their product and then documenting this "support" by rehashing OASIS documents telling you what DITA is. Proper support of DITA means proper support of DITA specialization. For example, if you take your editor from a document using a DITA DTD to a document using a specialization of that DTD (for example, a recipe document), you want to see all the DITA features still available in the editor to edit that second document. Eliot Kimber recently wrote about some related issues in Automatic Handling of DITA Docs In XML Editors.

As much as I like this new approach to DTD customization (apparently, it works for W3C Schemas as well, but most support is a little behind there), it seems that a surprisingly small number of DITA users are actually customizing the base DTD. I guess DITA's appeal for them is the straightforward recipe (no pun intended) for a topic-oriented organization of technical material and the free toolkit for converting this content to all the other formats, which, being open source, is easy to integrate into larger applications. (Don't get too excited by the existence of these transformations: what I've seen of the output isn't very slick looking, and would need some work before a professional publisher would want their content delivered to their customers looking like that.)

Regardless of the reasons for its appeal, DITA is hot. Above I mentioned that many XML products and services companies are trumpeting their comfort level with DITA, and I do work for a company that offers XML services and has a lot of DITA-related promotional material on their website. I had no hand in writing this material, but I do like its realistic tone of "DITA can be a big help, but it's not a magic bullet, here are some important things to think about", and I've recommended to new co-workers who want to learn about DITA that they read those web pages. The free and commercial software and acceptance by the tech writing community of this relatively new XML standard have given it good traction and a bright future.


(Note: I usually close comments for an entry a few weeks after posting it to avoid comment spam.)

Excellent summary. I'll add that I do think DITA's relatively strict definition of topic, which in turn enables the use of DITA maps across a diverse range of content, is one of the reasons that DITA has been able to deliver scalable single-sourcing - something that has been traditionally promised by XML, but hard to achieve without a lot of planning and discipline (which DITA doesn't replace, but does provide a headstart with).

On the specialization front, anecdotally I've seen about half of DITA's adopters using specialization, which sounds like more than you've seen. A typical adoption curve is use DITA out-of-the-box for a year or two, then add some small-scale specializations, then begin building out additional specializations as you hit particular needs. In other words, you start simple, then evolve the architecture along with your understanding, rather than trying to do everything at once in the first adoption phase.

Thanks Michael! I like the idea of the headstart that doesn't replace planning and discipline. This could be a recurring theme when discussing a variety of aspects of DITA, ranging from the structure of maps to the transformations available in the DITA Open Toolkit.

Can you expand a bit on this comment about DITA relative to Docbook: "by limiting DITA's domain to be a little narrower than that of DocBook, its developers seem to have created something that appears less complex and therefore easier to use for people in that domain: topic-oriented technical documentation."

It was my impression that DocBook was actually more limited in the sense that is designed for a single technical narrative. It has a particular use case in mind, and is designed around that use case. DITA seems more like a toolkit approach for writing building blocks of documentation.

Is that right? If so, how do you see DITA's domain actually being narrower than DocBook?

That's what I meant by "topic-oriented technical documentation". There are a lot of different elements that can serve as the root element of a DocBook document (book, article, etc.), so in my opinion it can be used in a broader range of cases than DITA. For example, a DocBook document could be organized by topic (or some rough equivalent) with less shoe-horning than would be necessary if using DITA for a narrative document.

This reduced amount of flexibility in DITA makes it easier for people to get a handle on exactly what good it can do them--sometimes extra flexibility is more work for people as they figure out what profiles are available and which fits their needs best.

I think the distinction is appropriate (topic-oriented vs. narrative-oriented), but I'm not sure about which is broader these days, or which is easier to adapt towards the other.

Topic-oriented info is pretty much the norm for any professional tech pubs group, and with the rise of wikis and component-oriented CMSs it's rapidly becoming the norm for most new content on the Web or intranet. This doesn't mean that narrative content goes away, but I do think the broader use case probably is topics these days.

While it's hard to disagree with a man as likable as Mike Priestley, I believe the "DITA is for topics/DocBook is for narratives" meme is a false dichotomy. Nothing prevents you from writing effective, topic-oriented content with DocBook. As a technical communicator who uses DocBook every day, it's the only way I work. DocBook gives you everything you need to write granular, topic-oriented content. You can detail a task, reference information or a concept, recurse the heck out of it within your document (online help, web page, pdf, or whatever) reuse it in other documents, or stick that bit of content in a CMS and do all those things you'd expect that system to do.

Mr. DuCharme's old wine, new bottle argument is spot on. I've long asserted that DITA's "hotness" has more to do with the marketing dollars spent by IBM and the rest of the vendor community than with any intrinsic feature or function DITA provides. While there may be specific reasons for one person to prefer DITA over DocBook, my experience tells me that it is largely about preference. One might prefer DITA because it seems simpler than DocBook ("Martha, just look at all those tags!"); seems more suited to online work ("DocBook? We don' nee no estinkin' books!"); or because it seems more appropriate for writing topics ("Our help is organized by topics, except when we publish it as a PDF; then, it's a narrative."). All these are fine and appropriate reasons, but none of them have anything to do with one standard having more juice than the other.

Yes, yes. I know. Except for specialization:

DITAdroid: "That's right. You can specialize with DITA. That way, when your entire enterprise writes everything in DITA, you'll be able to use the Development team's use cases in the Marketing team's white papers."

You: "The whole enterprise writes in DITA? Even marketing?"

DITAdroid: "Yeah. Isn't it great!"

You: "Yeah, except, if you choose DITA because it is "easier," then why would you wade in the deep waters of specialization?"

DITAdroid: "Because, silly, once you get experience with DITA, it's SOOO much easier to specialize."

You: "Right, just like with DocBook. Once you gain experience with it, the full range of its possibilities for writing reusable, topic-oriented, online content, and even - gasp - narrative-oriented content destined for the landfill (we called them books) becomes easier to understand and implement."

DITAdroid: "No."

You: "Why?"

DITAdroid: "Because...Because the DITA guy told me that DocBook is for books."

You: "Right."

DITA is cool. DITA rocks, rolls, and still loves you in the morning. Like any new love, she's hotter than the old one. But, is she really all that different? Not so much, I think.

IBM had topic-oriented authoring guidelines long before we had DITA; yet the move to DITA still caused shakeups and rewriting, because it forced teams to confront the issue of topics and chunking to fit their content in the architecture. One of the roles of XML is to enforce your content model - if your content model includes topic-orientation, it makes sense to use an XML architecture that includes that constraint, rather than punting to editorial guidelines on a key information model issue.

You don't mention DITA maps, which is interesting - I see them as a key component to the architecture, as important as topics or specialization. And maps don't work if you don't have predictable, addressable topics.

On the specialization front, generally people use specialization when they have business rules or information model constraints beyond just topic orientation that they want to enforce. That's why concept, task, and reference exist as specializations, along with all the user-created specializations out there, and why DITA subcommittees are actively developing industry-specific specializations. I recommend you read a few DITA specialization case studies, or ask on the dita-users list, if you want more detailed examples.

I like DocBook for what it does, but it doesn't enforce topics, it doesn't do maps, and it doesn't do specialization. If these three things don't matter to you, then you probably shouldn't consider DITA. But clearly they do matter to a lot of DITA users who are making active use of all three capabilities.

Just to add to what Michael said, Norm has shown that you could implement map- and topic-like constructs in Docbook, and it's very customizable, but these particular constructs are easier in DITA because of the head start it gives you if your information fits well into these structures. (That's why I liked his "head start" image.)

I'm sure that IBM has a budgeted strategy regarding its use of and participation in DITA work, but the term "marketing dollars" is a bit much--it's not an IBM product for sale like WebSphere or DB2, but a standard that they support and want to see grow. I've written entire books in DocBook, and may write more, but in my role as an employee of a consulting firm that specializes in standards-based automated publishing, I see plenty of situations where DITA is more appropriate.

Mike, I appreciate your passion for DITA. I respect your deep knowledge of the standard and how it can support the work of technical communicators. I've been the beneficiary of your instruction and am happy to say that nearly everything (everything that's correct, anyway) I know about DITA, I first learned from you. Just so we're clear, I think DITA rocks. I don't believe I've ever once suggested that it fails to deliver the goods for online content (and, this latest iteration is well on its way to giving us more of what we need for print). If you say folks are falling over themselves to specialize, well more power to them - specialize away. And, while I can do much the same with DocBook, DITA maps are pretty neat, too.

My primary beef is that people who know better seem to consistently mischaracterize the capabilities of a mature, time-tested standard that continues to offer benefits to tens of thousands of users across the globe. These mischaracterizations end up misleading the uninitiated into believing that DocBook can't do what it can and isn't suited to do what it does. It may lead them to abandon an approach that gives them 99% of what they want and gives it to them as fast and as cheap as can be. I have a problem with that.

You know we can write topic oriented content in DocBook, and we can write narrative content in DITA. We can be as strict or as loosey goosey as our hearts desire. Whatever topic enforcement is purpose built into DITA (or, is not in DocBook) is only as strong as the willingness of the writer to obey those restrictions. There's no invisible hand guiding authors working in DITA compelling them to write tidy, compact and standalone topics. That's a skill to be learned, and - as you say - an approach that requires planning and discipline. DITA's "head start" might make it a bit easier, but it alone is not sufficient.

All I am saying is that it is misleading to assert that DocBook is unsuited for topic oriented content. That's demonstrably false. There may be plenty of reasons to choose DITA over DocBook (maps, specialization, hanging with the cool kids, whatever), but topic orientation is not an item where the two standards offer much of a difference. I've written books in DocBook, will write more, and as an owner of a consulting firm see situations where DocBook is more appropriate and situations where it is less appropriate. I don't have a horse in the race, and I owe my clients my objective opinion derived from a study of their requirements and the technical landscape. Other folks do have a vested interest in the growth and adoption of one standard, and when these folks consistently misrepresent and minimize the capabilities of a - like it or not - competing standard, that gets my attention.

As for IBM and marketing DITA, puh-leeze. Who employs Day, Priestley, Schell, Anderson, Hennum, Hunt, et al.? Who pays their travel and per diem at the dozen or so conferences they attend each year? Who sponsors these conferences, pays for developing the Toolkit and the Task Modeler, provides these folks time for writing articles in DeveloperWorks, etcetera? IBM. Every dollar they spend, compounded by the dollars spent by the dozens of smaller companies pushing their particular DITA "solution", is a marketing dollar for DITA. Why is it that so many folks of common sense and uncommon intelligence seem to squirm at the mere mention of this?

Tony, I gave you a concrete example of the difference between system-based topic orientation versus guideline-based. I am not slamming DocBook, I am reporting personal experience. That experience has been validated by dozens of other early DITA adopters, who have had to shake up content to fit it into DITA, and have benefitted from the result.

In terms of marketing dollars, my main job is not promoting DITA, nor is that the main job of the others you mentioned. We're fortunate to have supportive management, and in some cases conferences that are willing to help with the cost of travel to get us where we're needed. Does Norm Walsh's work on DocBook represent "marketing dollars" from Sun?

You're going to have to spread the credit for DITA's success much more widely. I'd start with the list of companies and individuals on the DITA TC page (which includes Sun, by the way).

You're so very right, Mike. Thanks for DITA. Thanks to the DITA TC membership for all your work. Thanks also to

Adobe Systems
BMC Software
Citrix Systems, Inc.
Comet Communication
Comtech Services, Inc.
Intel Corporation
Justsystems Corporation
Nokia Corporation
Oracle Corporation
Sun Microsystems
The Boeing Company
US Department of Defense

for their continuing investment in the work of these good people.

Thanks for DocBook, too. Thanks to Norm and Bob and the DocBook TC membership for all your work. Thanks also to

Reed Elsevier
Sun Microsystems

for their continuing investment in the work of these good people.

And, to all those people and organizations whose work benefits both communities, thanks and thanks again.

You've helped me feed my brain, feed my kids, and feed my wife's appetite for designer shoes!

I owe you, man. Big time.