Linking to (Almost) Any Block Element of (Almost) Any Web Page with addids.cgi

Bob DuCharme
January 13, 2004

Introduction
Creating a Link
Geekier Details
Existing Work
Disclaimer

Introduction

In the early days of the web, you could only link to a specific point within a web page if that point had an a element with a name attribute. Recent releases of the Mozilla, Internet Explorer, and Opera web browsers, however, let you link to any element that has an id attribute. (More on this in a weblog posting I did.) Hopefully, more and more web development tools will start adding id attributes to more block elements; I'm trying to get into the habit of doing it to everything I write.

Meanwhile, I've written a CGI script named addids.cgi ("add IDs") that creates a temporary copy of any web page you pass to it, with IDs added to block elements so that you can create links to any block element you like in that temporary copy. For a web page that doesn't change much (not, for example, the home page of a newspaper's web site), nearly all generated IDs will be the same every time a temporary copy is generated. This means that you can look at a copy created by addids.cgi, create a URL that links to a specific point within that copy, and send that link to someone else with reasonable confidence that it will show them the same point in the document.

A few random tests show that it works with some slick commercial sites (I linked to stories in the archives so that the examples would last longer): The BBC ("The varying hotel guests in each episode..."), Rolling Stone ("On 1971's Gets Next to You...") and a Vignette Storyserver-generated Time Magazine article ("Ethiopia: Tackling terror in East Africa." Scroll up for slickness.) For a layout so complex that the CGI messes it up (for example, Wired) there may be a "Print" version of the same story that's easier to link to ("Paper modeling reached the zenith..."). I found that it doesn't always work properly with IE 6.0 under Windows, but it seems to work fine with Firebird 0.7, Mozilla 1.5, and Opera 6.1 under Windows and with IE 5, IE 5.1, and Safari 1.0 under OS X.

Creating a Link

To create a URL that passes a web page to addids.cgi, simply add the web page's URL to the end of this:

   http://www.snee.com/cgi/addids.cgi?url=

(Make sure that it's a complete, legal URL, including a scheme (protocol designator) such as http://.) To automate this, you can paste a URL in the following text field and click Submit to see the result.

URL of web page to have IDs added:

Another way to do this is to right-click this JavaScript bookmarklet and add it to your bookmarks so that selecting that bookmark when viewing a page will create an addids version of that page with the URL for it showing in your browser's address/navigation toolbar. (I got the idea for this from a similar trick used for the PurpleSlurple™ utility described below.)
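If you'd rather build these URLs in a script than by hand, here is a minimal Python sketch of the same concatenation. The helper name addids_url is my own invention for illustration, and the check for a scheme and host is only a rough stand-in for "a complete, legal URL"; the page URL is appended as-is, as described above.

```python
from urllib.parse import urlsplit

# Prefix given in the article; the page URL is appended to it unchanged.
ADDIDS_PREFIX = "http://www.snee.com/cgi/addids.cgi?url="


def addids_url(page_url):
    """Return the addids.cgi URL for page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Insist on a complete URL: a scheme such as http plus a host name.
    if not parts.scheme or not parts.netloc:
        raise ValueError("need a complete URL, including a scheme such as http://")
    return ADDIDS_PREFIX + page_url


print(addids_url("http://www.princeton.edu/~batke/moby/moby_001.html"))
```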

To create a link to a specific point in the ID-enhanced copy of a web page, follow these steps:

  1. Find some text near the point you want to link to. For example, let's say I want to link to the passage in "Moby Dick" beginning "No, when I go to sea" in http://www.princeton.edu/~batke/moby/moby_001.html.

  2. Pass the web page to addids.cgi. For our example, the URL would be http://www.snee.com/cgi/addids.cgi?url=http://www.princeton.edu/~batke/moby/moby_001.html.

  3. Tell your browser to display the source of the copy created by addids.cgi. In Mozilla, pick Page Source from the View menu; in Internet Explorer and Opera, pick Source from the View menu.

  4. Search the source for the text at the point you want to link to. The window that displays the source will have Find as a choice on one of its menus.

  5. After finding the text, look for the closest preceding tag that has an id attribute. In the source for http://www.snee.com/cgi/addids.cgi?url=http://www.princeton.edu/~batke/moby/moby_001.html, you'll see this:

    the mummies of those creatures in their huge bake-houses the
    pyramids.
    <p id="i255">
    No, when I go to sea, I go as a simple
    sailor, right before the mast, plumb down into the forecastle,
    
  6. Add a pound sign ("#") and the ID value to the end of the URL that passes the web page to addids.cgi, and you've got a link to that point in the copy: http://www.snee.com/cgi/addids.cgi?url=http://www.princeton.edu/~batke/moby/moby_001.html#i255.
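Steps 4 through 6 can also be done by a short script once you have the source of the copy as a string. The following Python sketch is not part of addids.cgi; the function names and the regular expression are my own assumptions, and the regular expression is a rough match for id attributes rather than a real HTML parser. It also assumes the phrase you search for is not broken across lines in the source.

```python
import re

# Rough pattern for an id attribute inside an opening tag, e.g. <p id="i255">.
ID_ATTR = re.compile(r'<[a-zA-Z][^>]*?\sid\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def closest_preceding_id(source, phrase):
    """Return the id of the last tag with an id attribute before phrase, or None."""
    position = source.find(phrase)
    if position == -1:
        return None
    ids = ID_ATTR.findall(source[:position])
    return ids[-1] if ids else None


def link_to_passage(copy_url, source, phrase):
    """Append '#' and the closest preceding ID to the URL of the addids copy."""
    found = closest_preceding_id(source, phrase)
    if found is None:
        raise LookupError("phrase not found, or no id attribute precedes it")
    return copy_url + "#" + found


# The excerpt from step 5, used here in place of a fetched page.
sample = """the mummies of those creatures in their huge bake-houses the
pyramids.
<p id="i255">
No, when I go to sea, I go as a simple
sailor, right before the mast, plumb down into the forecastle,
"""

copy_url = ("http://www.snee.com/cgi/addids.cgi?url="
            "http://www.princeton.edu/~batke/moby/moby_001.html")
print(link_to_passage(copy_url, sample, "No, when I go to sea"))
```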

What if you reach step 4 above and the element you want to link to already had an id attribute, and addids.cgi added a second one? Then you didn't need addids.cgi in the first place, and could have just added the pound sign and ID value to the original URL. For example, this document already has IDs added to it by a stylesheet that I use for this purpose.

Geekier Details

I'm not a hardcore Python programmer, and would appreciate any suggestions from those who are for ways to improve the code. Still, I was impressed at how much I was able to do with a minimum of code: without comments, blank lines, and error checking, about 42 lines in all. That's Python for you.

The most important part of addids.cgi is the use of a checksum for a given line to create its ID value. If addids.cgi generates a checksum value that was already used in an earlier line in the file, the new one is incremented until it reaches a value that hasn't been used yet in that document, to ensure that no two id attributes have the same ID value.
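To make the idea concrete, here is a small Python sketch of checksum-based IDs with the increment-on-collision rule. It is not the code from addids.cgi: the checksum used here (the sum of a line's UTF-8 bytes, modulo 1000) is an assumption picked only for illustration, the function names are hypothetical, and the "i" prefix is inferred from the i255 value in the "Moby Dick" example.

```python
def checksum(line):
    """Stand-in checksum (an assumption): sum of the line's UTF-8 bytes, modulo 1000."""
    return sum(line.encode("utf-8")) % 1000


def assign_ids(lines):
    """Return one ID per line, unique within the document."""
    used = set()
    ids = []
    for line in lines:
        value = checksum(line)
        # If an earlier line already took this value, increment until one is free.
        while value in used:
            value += 1
        used.add(value)
        ids.append("i%d" % value)
    return ids


# The first and third lines are identical, so they share a checksum;
# the third line ends up with an incremented value instead.
for line_id in assign_ids(["Call me Ishmael.", "Some years ago", "Call me Ishmael."]):
    print(line_id)
```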

Calculating IDs this way means that edits to the file won't affect an element's ID value unless one of two things happens: (1) the line itself gets edited, so its checksum and hence its ID value change, or (2) a line before it gets edited, and the edited line's new checksum happens to equal the checksum of the line we're concerned with, so that the earlier line claims that value and the later line's ID gets incremented. The chances of the second case are small. PurpleSlurple (see below) bases the ID values used as link targets (which are actually a/@name values, not @id values) on line numbers, so any line insertion or deletion will throw off all the ID values that follow it.
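A quick way to see the difference is to insert a line into a small document and compare what happens to a later line's ID under each scheme. In this Python sketch, the checksum (sum of UTF-8 bytes modulo 1000) is an assumed stand-in rather than the one addids.cgi uses, and line_number_ids is a simplified stand-in for line-number-based addressing, not PurpleSlurple's actual code.

```python
def checksum_ids(lines):
    """Checksum-based IDs; a collision bumps the value until it is unused."""
    used, ids = set(), []
    for line in lines:
        value = sum(line.encode("utf-8")) % 1000  # assumed stand-in checksum
        while value in used:
            value += 1
        used.add(value)
        ids.append(value)
    return ids


def line_number_ids(lines):
    """Line-number-based IDs: each line's ID is its 1-based position."""
    return [index + 1 for index, _ in enumerate(lines)]


before = ["Loomings", "Call me Ishmael.", "No, when I go to sea"]
after = ["Loomings", "A newly inserted line", "Call me Ishmael.", "No, when I go to sea"]
target = "No, when I go to sea"

for scheme in (checksum_ids, line_number_ids):
    old_id = scheme(before)[before.index(target)]
    new_id = scheme(after)[after.index(target)]
    print(scheme.__name__, "unchanged" if old_id == new_id else "changed")
```

With these sample lines, the inserted line's checksum doesn't collide with any other line's, so the target line keeps its checksum-based ID while its line number shifts by one.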

Existing Work

In the days when a browser wouldn't go to an existing point in a web page unless there was an a element with a @name attribute there, Eugene Eric Kim came up with the idea of Purple Numbers, which are "HTML anchors attached to every paragraph in a document, which allow you to link to these paragraphs." A parenthesized purple number indicates the presence of each anchor. A PHP script by Matthew A. Schneider called PurpleSlurple creates and displays a copy of a web page with purple numbers that can be used for link addressing. For example, a PurpleSlurple-generated copy of the "Moby Dick" web page above looks like this, and this link goes right to the "No, when I go to sea" line.

The indication of link anchor values without requiring a "View Source" is an advantage, but at the cost of cluttering up a page with lots of parenthesized purple numbers. As Tim Bray often says, the use of View Source has always been a fundamental principle of the spread of useful web technology, so I don't think it's too much to ask.

Disclaimer

No warranty expressed or implied, use at your own risk, etc.