ERH Tired of Acrobat PDFs. Me too.

PDF is a great format, and it's used way too much.

I keep a file of notes for ideas for potential postings in this weblog. Here are the notes for one idea:

tired of PDF (impedance mismatch between screen and page metaphor; printed pg numbers vs. Acrobat ones; marketing "fact sheets" in PDF--why not HTML?)

In a recent Mokka mit Schlag posting titled PDF killed the Programming Language, Elliotte Rusty Harold has beaten me to it (managing to work in a nice Buggles pun as well), but I'll add a few points.

I'm tired of people who create content on a computer screen and deliver it for viewers to see on a computer screen with that content optimized for the printout that only gets printed when it's time for their boss or art director to review it. After all, real brochure-ware should look like a nice brochure when the customer wants to print it, right?

Those customers don't want to nearly as often as the art director may insist. Customers read product fact sheets because they want to see the facts about the product, not because they want a hi-res view of some designer's design skills. Gratuitous use of PDF these days is in some ways worse than gratuitous use of Flash, because at least Flash is about creating things on a computer screen for viewing on a computer screen. PDFs are bad for viewing on a computer screen.

Good things about PDFs

Adobe's Why PDF? page lists some obvious advantages of PDF files. (And, bless them, this page is HTML, not a PDF.) They call it an "Open format", but "open" here is software corporate-speak for "documented", which is certainly a Good Thing, but not open in the sense of "to any outside influence", which is what "open standard" means to most people, even (finally) to Sun.

The same page also lists "Multiplatform", "Accessible", and "Searchable" as advantages. I can't argue with the first. While I can't argue with "Accessible" either, I'd be curious about the opinion of someone who studies these issues more closely. "Searchable" I'll take with a grain of salt—I know I can't write an application that could search through PDF files. (Advocates of so-called "binary XML" forget that a key advantage of XML as defined in its spec is that, as a text-based format, writing code to search and manipulate it is very, very easy.) Adobe also says that Acrobat files can "maintain information integrity"—I wouldn't have worded it the same way, but it can be an advantage to know that page breaks will happen in the exact same place with any viewer on any platform, if you really care about page breaks. Many of my company's clients really care about page breaks, because they're publishers who have products that they send off to printing houses. PDF minimizes so many worries about page layout, fonts, and other printer-related issues that for this purpose it's a wonderful thing. Still, while page fidelity is important in contracts and certain other documents, it shouldn't matter to a tutorial or product fact sheet.
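To make the parenthetical point about text-based XML concrete, here is a minimal Python sketch of how little code a search over XML needs, using only the standard library. The fact sheet markup, the element names, and the `find_elements_containing` helper are all invented for illustration; they don't come from any real product or schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical fact sheet, invented for this example.
FACT_SHEET = """<factsheet>
  <product>Widget 3000</product>
  <feature>Runs on any platform</feature>
  <feature>Searchable with a few lines of code</feature>
</factsheet>"""


def find_elements_containing(xml_text, phrase):
    """Return (tag, text) pairs for elements whose own text contains phrase."""
    root = ET.fromstring(xml_text)
    needle = phrase.lower()
    return [
        (elem.tag, elem.text.strip())
        for elem in root.iter()          # walks the root and every descendant
        if elem.text and needle in elem.text.lower()
    ]


print(find_elements_containing(FACT_SHEET, "searchable"))
```

The search is case-insensitive and only looks at each element's own leading text, which is enough for a flat document like this one; mixed content would need a little more care.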

In general, you can't edit PDFs, although Adobe won't list this as an advantage. This is probably because it falls under "information integrity", but more importantly, they have software that lets you edit them. Few people own this software (compared with the number of people who own PDF readers), so when you send a PDF to multiple people, you know they won't screw around with it. Unless you're deliberately asking for revisions to a document, sending a PDF is almost always better than sending a Word file. If John sends a Word (or HTML) file to Jane and Jane forwards it to Jim, how does Jim know that Jane didn't alter it en route?

I suppose that outside of the publishing industry the key advantage to PDF files these days is that they're not Microsoft Word files. In addition to not being able to edit them, people can't add macros to them that will screw up your computer. When someone at my daughter's school's parent association sends a Word file of a flyer for some event, I wish that it was a PDF, but they don't know any better. My wife was very happy when I showed her PDFCreator, an open-source Windows program that appears in the printer menu of any program that can print and sends your output to a PDF file instead of a printer. For more advanced use, OpenOffice and free and commercial XSL-FO implementations offer more PDF creation choices. I think we can assume that the programmers that Elliotte mentions in his weblog posting are more technically sophisticated than the people running a typical primary school parents council, so they have less excuse for using PDF.

Bad things about PDFs

I want to focus on disadvantages of the PDF format itself, and not the Acrobat Viewer program. Acrobat Viewer is free, but it's big and slow and bloated, and as Adobe's foot in my door it spends too much time reminding me about "critical" upgrades and trying to sell me things I don't need. However, you don't have to use Acrobat Viewer to view your PDFs; last June I discovered a nice alternative.

When you present printed pages on a glowing screen, there's a certain impedance mismatch, and the advantages of fidelity to printed pages (for example, knowing that when you say "bottom of page 12", it will mean the same to all viewers) are often outweighed by the disadvantages. I'm especially tired of looking at on-screen representations of the gap between the bottom of one page and the top of another; what good does this do for us?

[PDF page break]

How about when Acrobat tells you that you're looking at page 13 of a document, but the bottom of that page says "15"? Perhaps I could hunt through the twenty categories under "Edit Preferences" for some check box that lets me change this, but why do I need to? Come to think of it, when you say "bottom of page 12", maybe it doesn't mean the same thing to all viewers.

In a comment on Elliotte's posting, John Cowan points out that an article about a programming language, when written for (or aspiring to) a peer-reviewed journal article, is taken more seriously when it's in PDF format. That's fine, but as Elliotte wrote of the offending web site, "it seems all their tutorials, manuals, white papers, and almost everything else are in PDF". There's just too much PDF out there. The number of FAQs alone that are in PDF instead of HTML is just shameful.

I think that companies that are large-scale publishers as a by-product of their main business are getting ready to move past PDF. You've heard of the jet fighter planes whose total printed documentation weighs more than one of the planes? Step one for easing the access of repair engineers to the relevant documentation—the process of putting it "online", as they still call it—has been to create PDF versions of those books. A CD or DVD full of PDFs is more convenient than a wallful of books to someone crawling around the inside of a plane's engine compartment, but pictures of pages with all the negative issues described above are not the best way to present the relevant information to these repair engineers. Designing a better interface for this content delivery is work, but more of these companies are realizing that the work is worth it.

Reliance on the page metaphor may be a symptom of the historical moment as we spend a few decades completing the transition from hard copy books to more sensible online delivery for the appropriate content. (Note my little qualifier at the end of that sentence—I'm sure that for novels read on the beach, bound hardcopy books will remain the best delivery medium for years to come.) You could compare it to the state of movies before D.W. Griffith, when each film was little more than a single static shot of a filmed silent play.

Many publishers are now picking up hints that it's time to move on from the page metaphor. I recently attended some meetings in which publishers were discussing ways to control costs, and the first step is discussing ways to measure costs. Cost per page is a classic measure in the publishing world, but everyone in the meeting agreed that it's increasingly meaningless as more content is delivered online.

My former employer LexisNexis recently renamed the division of the company that creates books to Offline Products. Given that everyone used to consider online delivery to be an alternative to print delivery, it's interesting that someone would now define print delivery as essentially being "not online" delivery. Outside of diagrams accompanying patents, I don't know of any content that LexisNexis delivers as PDFs, so they have the right idea. A brief check around their web site (which is more of a marketing site than a product delivery one) didn't turn up any PDFs either, and those marketing types are usually the quickest to think that PDF is better than HTML. Although this is a company that was founded to deliver content online, they're still capable of being behind the curve on various technical issues; it's nice to see that they have their PDF/HTML priorities straight.

To summarize: PDF doesn't make your content look slicker or more professional. It may make the printed output look better, but make damn sure that most of your content's readers intend to print it before reading it. And, make an honest effort to create an HTML version that prints nicely. Smart people are making progress on this front all the time.


The PDF format actually can deal with matching the page numbers in the PDF to the page numbers in the content; I have seen PDFs that do so, with introductory pages 1-8 followed by body pages 1-107 or whatever, and googling suggests that even roman page numbers can be handled.

Of course, a PDF creator has to understand the content in order to generate such page numbers. A printer driver, for example, can't; it doesn't know which feature of the page being printed represents a page number. OpenOffice could but (as of 2.1) does not. Unfortunately, I can't file a bug on this because I can't persuade OO.o to validate me as a bug-tracker user.
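As a rough illustration of the mapping involved: as far as I understand it, the PDF specification lets a document attach page labels to ranges of physical pages, each range carrying a numbering style and a starting number. The Python sketch below only imitates that idea in plain code and does not read or write any PDF; `LABEL_RANGES`, `page_label`, and `to_roman` are hypothetical names, and the ranges assume eight roman-numbered introductory pages followed by a decimal-numbered body.

```python
def to_roman(n):
    """Lowercase roman numeral for a positive integer."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


# Assumed layout: (index of first physical page, style, first label number).
# Must be sorted by the first field, and the first range must start at 0.
LABEL_RANGES = [
    (0, "roman", 1),    # physical pages 0-7 are labeled i..viii
    (8, "decimal", 1),  # physical page 8 onward is labeled 1, 2, 3, ...
]


def page_label(page_index, ranges=LABEL_RANGES):
    """Return the printed label for a zero-based physical page index."""
    # Pick the last range that starts at or before this page.
    start, style, first = [r for r in ranges if r[0] <= page_index][-1]
    number = first + (page_index - start)
    return to_roman(number) if style == "roman" else str(number)


for index in (0, 7, 8, 12):
    print(index, page_label(index))
```

A creator that knows where the front matter ends can emit ranges like these; a printer driver, which sees only finished pages, has nothing to build them from.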

All I know is that Adobe Reader frequently "forgets" that I want to view documents fit to width and continuous pages. It's still set in preferences. What I think is happening is that the Reader lets the documents hijack the settings. Why it would be so important to let the document enforce these settings is beyond me.

The performance issue is huge. I've started using FoxIt Reader for this very reason. It loads PDFs blazing fast, and it's also free.