« Minor new email | Main | Me as 80s New York lead guitarist »

Download as spreadsheet

A one-line text file tells your web server to send a directory's HTML files as "spreadsheets".

I used to think that a website's "download as spreadsheet" button triggered some back end process that created a binary Excel spreadsheet on the server and sent that to your browser, much like many "download as PDF" links do. It turns out that it's much, much simpler than that.

[Excel icon logo]

The key is that the server doesn't really send the file as a spreadsheet. It sends it as HTML with a MIME media type of "application/vnd.ms-excel". In other words, the server adds an HTTP header field that tells your browser "here comes an Excel spreadsheet, so display this with whatever program you use to view those", and Excel can open up HTML files. (Tell OpenOffice Calc to open an HTML file and it opens it up in OpenOffice Writer, the word processing program.) This way, users think that they're downloading spreadsheets. Lots of HTML formatting is preserved and numbers get treated as numbers, so that if after downloading you add a function like =sum() in a cell that references other cells, it does the math properly. You'll want your HTML to have one or more tables in it—there's not much point in sending a Shakespeare play to Excel.

The first few times I did this, I wrote perl and python CGI scripts to send the HTTP header identifying the MIME type to the browser with the file, but if you're using an Apache web server, an .htaccess file gives you an easier way to do it. Among other things, an .htaccess file lets you say "for files in this directory with extension foo, the web server should deliver them with MIME type bar." I first learned about these files when someone told me that this weblog's Atom feed wasn't being delivered with the correct MIME type, and he recommended that I fix it with an .htaccess file. I created the following one-line file in my http://www.snee.com/bobdc.blog directory:

AddType application/atom+xml .atom

To deliver HTML tables as spreadsheets with no CGI coding, I created a http://www.snee.com/xml/wks directory and put the following .htaccess file in it:

AddType application/vnd.ms-excel .html

This tells the Apache server to deliver any files with an extension of html that are in this directory (I certainly wouldn't want this to apply to all HTML files!) with a MIME type of "application/vnd.ms-excel". You can see a test result by looking at http://www.snee.com/xml/wks/wks1.html. I threw in two copies of a table and some other HTML elements to try to confuse Excel, but it wasn't confused. To see what the file really looks like, see http://www.snee.com/xml/wks/wks1.xml, an exact copy whose delivery is unaffected by the .htaccess file.

So, if your website includes tables and you'd like to offer viewers the option to download them as spreadsheets, you can keep HTML copies in a directory with an .htaccess file like the one shown above and point your "download as spreadsheet" links there. Or, if you have a more complex system that generates pages on the fly, the content generation routines hopefully give you some way to set the Content-type in the HTTP header to "application/vnd.ms-excel" for selected HTML output.

If I was writing this from home instead of a hotel room, I'd try it on our Mac and Linux machines to see if the file opens up in OpenOffice Calc. Of course, if I was at home, I'd have other things to do besides playing with MIME media type tricks.

TrackBack

Listed below are links to weblogs that reference Download as spreadsheet:

» Download HTML as Excel Sheet from Stefan Tilkov's Random Stuff
Bob DuCharme explains a neat trick to provide an “Excel export”, based on sending HTML with an appropriate MIME type. I remember one of my colleagues using this trick in a project; much easier than any POI or other more complex solution, an... [Read More]

Comments

(Note: I usually close comments for an entry a few weeks after posting it to avoid comment spam.)

Firefox 1.5.0.3 (on Windows XP) says http://www.snee.com/xml/wks/wks1.html "is an HTML document" and offers to open it with an application called, er, Firefox... Maybe they're content sniffing?

Hmmm, I'm using Firefox 1.5.0.3 on XP too--XP professional? It also works from IE, which interestingly enough tells me "You are downloading the file: wks1.xls from www.snee.com" even though the file is named wks1.html.

I just made another duplicate of the file and called it wks1.xls in the same directory, and I added the following line to .htaccess for good measure:

AddType application/vnd.ms-excel .xls

Firefox and IE both send http://www.snee.com/xml/wks/wks1.xls to Excel without a problem when I try them. Let me know if it's any better with your copy of 1.5.0.3.

Of course, the real moral of the story is that anyone who does this for a serious production app instead of just playing around like me has a lot of testing to do.


Firefox 1.5.03 on XP Pro: I saw the same as Dan B for the .html file. For the .xls file it opened fine in OOo 1.1.2. It seems to me to be a bit naughty of Firefox to be looking inside the URL which is supposed to be opaque. curl --head confirms to me that the headers returned for the two files have no significant differences.

That explains why I didn't have this problem when doing it the CGI way: because the URL of the "resource" ended with .pl or .py.