1 May 2013

A nineteenth-century linking application

An encore presentation.


From early 2003 to late 2005 I wrote a blog on oreillynet.com that I called Thinking About Linking. The last entry summarizes what I covered and my experiences with that blog, but today I wanted to republish my favorite entry from that blog on the tenth anniversary of its original publication. It's the same as the 2003 version except that I updated one link. On the right: a page from Shepard's 1902 "Shepard's Alabama Citations," which I bought on eBay. (My comment below about link typing would certainly need updating now, given my experience with RDF.)

Frank Shepard was a salesman for a Chicago legal publisher. Shortly after the American Civil War, he noticed that when one court case overruled, criticized, or otherwise cited another, lawyers often jotted a note about it in the margin of the reporter volume with the cited case's text. For example, upon learning that the judge in the case known as “La Bourgogne” (210 U.S. 95) made a negative reference to the “Moore v. American Transportation Company” (65 U.S. 1) case, a lawyer might turn to page 1 in volume 65 of the U.S. Supreme Court case reporter and write “210 U.S. 95, negative” in the margin next to the Moore case. This way, if the Moore case ever came up in court, the lawyer would have a better idea of its exact value.

Shepard had an idea: if he printed gummed labels for each case listing the cases that cited it, he could save the lawyers the trouble of writing in these references by hand. He built a business out of selling these inter-case links to the legal profession and named the company after himself: Shepard's. (Full disclosure: since Reed Elsevier acquired Shepard's in the mid-1990s, Shepard's Citations has been a product of my employer, LexisNexis. Other than some occasional XSLT advice to the folks in Colorado Springs, where Shepard's has been based since 1947, I don't do any work on that particular product.) In one sense, the stickers they produced in 1873 were already more sophisticated than web links, because if more than one case had cited the same case, the sticker for that case added a one-to-many link to it.

To help the lawyers quickly learn why one case had been cited by another, Shepard's started including one-letter codes to show that the citing case had overruled, criticized, modified, or applied some other treatment to the cited case. Now their links had link types: indications about the nature of the links to give a clue about why they might be worth traversing.

The stickers, or “Adhesive Annotations,” became very popular. While sitting on the Massachusetts Supreme Judicial Court, future United States Supreme Court Justice Oliver Wendell Holmes Jr. wrote “I regard Shepard's Massachusetts Annotations as the most thorough labor-saving device that has ever been brought to my attention. No one owning a set of reports can afford to be without one.”

Before the nineteenth century came to a close, the company began producing alternatives to the sticker collections: bound books that listed, for each case, the cases that cited it and codes describing the citing case's treatment. Today, we call this separation of the links from the linked resources “out-of-line links.”

The books became so popular that their inventor's last name became a verb. Any lawyer or law student knows that to Shepardize a case is to find out all relevant cases that cite it. Of course, automating the storage and lookup of these links is much easier with software, and it's all online now. When you view a case using LexisNexis, clicking the “Shepardize” link displays a list of citing cases with links to the full text of those cases. This saves a lot of running around a law library, which was how the links were followed for the first century of their existence. (LexisNexis's chief competitor, WestLaw, has a competing on-line product called KeyCite.)

The success of Frank Shepard's invention tells us several things about linking:

  • Link typing can add real value to a linking application. If a lawyer who's going to bring up a case in court Shepardizes it and sees only codes for positive treatment, there's little need to look up the citing cases. If other cases criticized the case to be cited, however, it's his job to find out why. (Too bad it's so difficult to find other examples of link typing adding obvious value!)

  • Out-of-line links can sometimes be more useful than in-line links. The web and other hypertext systems leading up to it have conditioned many to think of a link as something that connects the resource they're looking at to a single other resource somewhere else, but links can be more than that. Shepard's customers found that having all the citation links in a single set of books instead of as a set of stickers to be spread around hundreds of volumes can make the research go much more quickly, especially with the treatment codes added to the link identifiers to give clues about whether the links are worth traversing.

  • It's not about the technology, but about the information. Just as a well-written song can work well when performed by different bands, a good linking application can still have value when implemented using different technologies.
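As a rough illustration of the first two points, typed out-of-line links fit in a very small data structure. The Python below is only a sketch: the one-letter codes, the "followed" treatment, and the second citing case are invented for the example and are not Shepard's actual codes or data. The only link taken from the story above is La Bourgogne's negative reference to Moore.

```python
from collections import defaultdict

# One-letter treatment codes. These letters are invented for this sketch
# and are not Shepard's actual codes.
TREATMENTS = {"o": "overruled", "c": "criticized", "m": "modified", "f": "followed"}
NEGATIVE = {"o", "c"}

# Out-of-line link table: cited case -> [(citing case, treatment code), ...].
# Each entry is a one-to-many, typed link stored apart from the cases themselves.
citators = defaultdict(list)


def add_citation(cited, citing, code):
    """Record that `citing` cited `cited` with the given treatment code."""
    if code not in TREATMENTS:
        raise ValueError(f"unknown treatment code: {code}")
    citators[cited].append((citing, code))


def shepardize(cited):
    """Return every (citing case, treatment) pair recorded for a case."""
    return [(citing, TREATMENTS[code]) for citing, code in citators.get(cited, [])]


def worth_reading(cited):
    """Return only the citing cases whose treatment was negative."""
    return [citing for citing, code in citators.get(cited, []) if code in NEGATIVE]


# La Bourgogne's negative reference to Moore; the "c" code is an assumption.
add_citation("65 U.S. 1", "210 U.S. 95", "c")
# A made-up citing case with positive treatment.
add_citation("65 U.S. 1", "Example v. Example", "f")

print(shepardize("65 U.S. 1"))
print(worth_reading("65 U.S. 1"))
```

The link type is what lets `worth_reading` skip the positive citation, which is the same shortcut the treatment codes gave lawyers.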


Please add any comments to this Google+ post.

17 April 2013

Appreciating SPARQL property paths more

More and more useful.


I have played with SPARQL 1.1's new property paths features and described them in my book, and I've felt that I understood them for a while, but two recent occasions have helped me to appreciate them even more.

First, while preparing the talk I'm giving at the Semantic Technology & Business Conference on Enhancing Searches with Semantic Technology, my demo app needed to find a SKOS concept that has either a skos:prefLabel or a skos:hiddenLabel value of a particular string. At first I thought I'd need a UNION query, like this,

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?c
WHERE {
 ?c a skos:Concept .
 {?c skos:prefLabel "motrin"@en }
 UNION
 {?c skos:hiddenLabel "motrin"@en }
}

but then I realized that the alternative path operator could make it much terser: just two triple patterns in the query, with the second one's predicate expression essentially saying "a predicate of skos:prefLabel or of skos:hiddenLabel":

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?c
WHERE {
 ?c a skos:Concept .
 ?c skos:prefLabel|skos:hiddenLabel "motrin"@en . 
}
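For readers who want to see why the two queries find the same concepts, here is a minimal sketch in plain Python rather than SPARQL. It is not how a SPARQL engine works internally. The `ex:` concepts and the "ibuprofen" label are invented sample data, language tags are ignored, and Python sets drop any duplicate solutions that a real engine might return.

```python
# Triples as (subject, predicate, object) tuples. The ex: concepts and the
# "ibuprofen" label are made up for illustration; language tags are ignored.
TRIPLES = {
    ("ex:c1", "rdf:type", "skos:Concept"),
    ("ex:c1", "skos:prefLabel", "ibuprofen"),
    ("ex:c1", "skos:hiddenLabel", "motrin"),
    ("ex:c2", "rdf:type", "skos:Concept"),
    ("ex:c2", "skos:prefLabel", "motrin"),
    ("ex:c3", "rdf:type", "skos:Concept"),
    ("ex:c3", "skos:altLabel", "motrin"),
}


def subjects(predicate, obj):
    """Subjects of every triple matching (?s, predicate, obj)."""
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}


def union_version(label):
    """Mimics the UNION query: two separate patterns, results combined."""
    concepts = subjects("rdf:type", "skos:Concept")
    matches = subjects("skos:prefLabel", label) | subjects("skos:hiddenLabel", label)
    return concepts & matches


def alternative_path_version(label):
    """Mimics prefLabel|hiddenLabel: one pattern whose predicate may be either."""
    allowed = {"skos:prefLabel", "skos:hiddenLabel"}
    concepts = subjects("rdf:type", "skos:Concept")
    return {s for (s, p, o) in TRIPLES if s in concepts and p in allowed and o == label}


print(sorted(union_version("motrin")))
print(sorted(alternative_path_version("motrin")))
```

Both calls return `ex:c1` and `ex:c2`; `ex:c3` is left out because its only matching label is a `skos:altLabel`.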

The second occasion for appreciating property paths more was reading the recent Paul Groth blog posting 5 heuristics for writing better SPARQL queries, which recommended that we "use property paths to replace connected triple patterns where the object of one triple pattern is the subject of another."

I'd seen examples of the XPath-like property paths, like the foaf:knows/foaf:name one in the SPARQL 1.1 Query Recommendation, but I hadn't realized their value for replacing triple patterns where the object of one triple pattern is the subject of another that has a different predicate, and I've written a lot of those. For example, to find the four-step connection between d:a and d:e in the following,

@prefix d:  <http://learningsparql.com/ns/data#> .
@prefix dm: <http://learningsparql.com/ns/demo#> .

d:a dm:prop1 d:b . 
d:b dm:prop2 d:c . 
d:c dm:prop3 d:d . 
d:d dm:prop4 d:e . 

I would have written a SPARQL graph pattern that looked pretty much like the four triples that you see there, but with variables substituted for d:b, d:c, and d:d. Paul's blog entry made me realize that I could simply write this:

PREFIX dm: <http://learningsparql.com/ns/demo#>
SELECT ?s ?o
WHERE
{ ?s dm:prop1/dm:prop2/dm:prop3/dm:prop4 ?o }
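To make the semantics of that sequence path concrete, here is a small Python sketch that walks the four sample triples one predicate at a time. It only illustrates what the path matches. The function name `follow_sequence` is made up for this example, and a real SPARQL engine is free to evaluate the path quite differently.

```python
# The four triples from the Turtle data above, as plain tuples.
TRIPLES = [
    ("d:a", "dm:prop1", "d:b"),
    ("d:b", "dm:prop2", "d:c"),
    ("d:c", "dm:prop3", "d:d"),
    ("d:d", "dm:prop4", "d:e"),
]


def follow_sequence(triples, predicates):
    """Return (start, end) pairs connected by the predicates, in order."""
    # Seed with every subject of the first predicate, paired with itself.
    pairs = {(s, s) for (s, p, o) in triples if p == predicates[0]}
    for predicate in predicates:
        # Advance each partial path by one hop over the current predicate.
        pairs = {
            (start, o)
            for (start, current) in pairs
            for (s, p, o) in triples
            if p == predicate and s == current
        }
    return pairs


print(follow_sequence(TRIPLES, ["dm:prop1", "dm:prop2", "dm:prop3", "dm:prop4"]))
```

The intermediate nodes d:b, d:c, and d:d never show up in the result, which is why the path form needs no variables for them.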

What makes this interesting is that I had been thinking of property paths as something that could slow down queries, and Paul's experience was that the property path version was more efficient. Of course, I was generalizing too much—the property path * and + operators, while very handy, essentially say "and then keep looking for more," which can really increase the search space and execution time. I suppose I was also still hearing the ringing in my ears of the alarm sounded by the paper Counting Beyond a Yottabyte, or how SPARQL 1.1 Property Paths will Prevent Adoption of the Standard (pdf), but that too was focusing on a subset of property paths options unrelated to the path format that Paul was discussing. (After the release of that paper and before SPARQL 1.1's ascent to Recommendation status, the SPARQL Working Group did make adjustments to certain property path features to address the paper's concerns.)
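The "keep looking for more" behavior of + can be sketched in Python as a breadth-first search that stops only when no new node turns up. This is an assumption-laden illustration, not a description of any particular SPARQL engine. It reuses the four sample triples and treats all four predicates as alternatives, roughly what a path such as (dm:prop1|dm:prop2|dm:prop3|dm:prop4)+ would match starting from d:a.

```python
from collections import deque

# The same four sample triples, as plain tuples.
TRIPLES = [
    ("d:a", "dm:prop1", "d:b"),
    ("d:b", "dm:prop2", "d:c"),
    ("d:c", "dm:prop3", "d:d"),
    ("d:d", "dm:prop4", "d:e"),
]
ALL_PROPS = {"dm:prop1", "dm:prop2", "dm:prop3", "dm:prop4"}


def one_or_more(triples, start, predicates):
    """Nodes reachable from start in one or more hops over any of predicates."""
    reached = set()
    queue = deque([start])
    while queue:  # keep looking until no new node turns up
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p in predicates and o not in reached:
                reached.add(o)
                queue.append(o)
    return reached


print(sorted(one_or_more(TRIPLES, "d:a", ALL_PROPS)))
```

Unlike the fixed four-step path, this search has no built-in stopping point, so on a large, densely connected graph the set of reached nodes can grow very quickly.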

In my formerly extensive use of XSLT, I never got to the point where I couldn't picture being limited to XSLT 1.0, even though 2.0 became a Recommendation in 2007. (I know that Jeni Tennison got to that point about 2007, if not earlier.) Now that it's been almost four weeks since the SPARQL 1.1 specs became Recommendations, I already have a difficult time being limited to SPARQL 1.0, which is still the case with some endpoints; there's just so much great stuff in 1.1.



21 March 2013

In publishing? Listen to WFMU's "Radio Free Culture" podcast

A new radio show (and podcast) has some great observations about the future of content creation and distribution.


People who listen to Jersey City freeform radio station WFMU tend to be a bit fanatical about it. The Wikipedia page on the station quotes the New York Times referring to them as "a station whose name has become like a secret handshake among a certain tastemaking cognoscenti." It's not only because of the range of their musical eclecticism, which is an easy game for college and other non-profit radio stations to play; the depth of their commitment and their role in the music and art scenes of New York City and beyond have been impressive for over forty years.

They have a new show called Radio Free Culture which is also available as a podcast. Different FMU hosts from different shows take turns with this one, so it's apparently on at different, unpredictable times, but the MP3s of past shows are all sitting there waiting for you. Some of the hosts are better than others, but as with the rest of the station, the unpredictability is part of the fun.

The discussions are often about music, but not exclusively so, and in any case the music industry has already been through stages that the movie and "print" publishing industries are only now sliding into; distribution of files of content is distribution of files of content. Roles in creating and publicizing that content, the potential value of redistributors (for example, record companies or publishers), and especially issues about who pays whom for what, at which stage of creation or distribution, are topics that the podcast returns to regularly.

The most recent show, on February 25th, interviewed several people involved with the Future of Music Coalition's Artist Revenue Streams project, which gathered data on how musicians make money today, with statistics about how 25 categories of musicians make money from 48 potential income streams. Of course, the number and relative importance of the different categories and streams have evolved over time, and the strong interest that many organizations have shown in using the FMC's work suggests that there hasn't been much serious data gathering in this area before.

The December 31st show has a fascinating interview with MIT PhD candidate Benjamin Mako Hill about the implications for our culture of the fact that the song "Happy Birthday to You" is copyrighted. Did you know that if a group of people sitting around a table in a restaurant or a bunch of kids in a summer camp sing this song without getting permission first, they are technically violating U.S. copyright laws? Warner Music Group collects literally millions of dollars every year from higher-profile performances of the song. (The second half of this particular show is people calling in to talk about their worst birthday ever, and I didn't make it all the way through that.)

The December 24th show features talks from the FMU-sponsored Radiovision festival about "piracy" in its many meanings, with a particularly good talk by Anna Troberg, who once fought against content bootlegging but eventually became the leader of Sweden's quite successful Pirate political party after getting to understand their values better.

The October 29th discussion about live streaming public protests with independent journalist and video broadcaster Tim Pool, like several other more music-focused shows, returns to the issue of how new technology makes it easier to create and distribute content while larger infrastructures remain necessary to build an audience for that content. Traditional publishing companies were once the only providers of those infrastructures, which now involve social media networks as well. Of course, the roles, relationships, and relative need for publishers and social media networks are further fuel for discussion.

I've known people involved in diverse aspects of many kinds of publishing, and I think that WFMU's "Radio Free Culture" can teach all of us a lot about the range, direction, and magnitude of many of the current forces affecting how people create, distribute, pay for, and get paid for content now and in the future. (Further discussions of these topics as they relate to specific episodes are available at the show's Free Music Archive page.)

