Using SPARQL to find the right DBpedia URI

Even with the wrong name.

In Pulling SKOS prefLabel and altLabel values out of DBpedia, I described how Wikipedia and DBpedia store useful data about alternative names for resources described on Wikipedia, and I showed how you can use these to populate a SKOS dataset's alternative and preferred label properties. Today I want to show how to use these as part of an application that lets you retrieve data even when you don't necessarily have the right name for something—for example, retrieving a picture of Bob Marley using the misspelled version of his name "Bob Marly".

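The DBpedia page for Bob Marley, http://dbpedia.org/page/Bob_Marley, lists dbpedia:Bob_Marly among the resources that have a dbpedia-owl:wikiPageRedirects value pointing to it. (The dbpedia-owl: prefix on that page and the dbo: prefix in the query below both stand for http://dbpedia.org/ontology/.) This means that if you send your browser to http://en.wikipedia.org/wiki/Bob_Marly, you'll end up on http://en.wikipedia.org/wiki/Bob_Marley.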

It doesn't show that this redirect URI has the rdfs:label value "Bob Marly"@en associated with it, and this is the really handy part for retrieving data based on not-quite-right values. Because of this, the following SPARQL query will return the URI http://dbpedia.org/resource/Bob_Marley whether the quoted literal value is "Bob Marly" or "Bob Marley":

# First two PREFIX declarations unnecessary on SNORQL
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?s WHERE {
  {
    ?s rdfs:label "Bob Marly"@en ;
       a owl:Thing .
  }
  UNION
  {
    ?altName rdfs:label "Bob Marly"@en ;
             dbo:wikiPageRedirects ?s .
  }
}

The graph pattern before the UNION keyword checks whether there is an actual Wikipedia page for the quoted value, and the part after checks whether it's a redirect of something else. Effectively, it will be one or the other; there are only about a dozen labels in DBpedia that can be both.
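To make the two branches concrete, here is a toy Python model of the same logic. It is only a sketch: the handful of triples below are made-up stand-ins for DBpedia data, the function name resolve is hypothetical, and it assumes (as the query does) that redirect resources are not typed as owl:Thing.

```python
# Toy stand-in for DBpedia: (subject, predicate, object) triples.
# Illustrative data only, not a dump of real DBpedia content.
TRIPLES = {
    ("dbr:Bob_Marley", "rdfs:label", "Bob Marley"),
    ("dbr:Bob_Marley", "rdf:type", "owl:Thing"),
    ("dbr:Bob_Marly", "rdfs:label", "Bob Marly"),
    ("dbr:Bob_Marly", "dbo:wikiPageRedirects", "dbr:Bob_Marley"),
}


def resolve(label, triples=TRIPLES):
    """Return the resources the two UNION branches would bind to ?s."""
    labeled = {s for (s, p, o) in triples if p == "rdfs:label" and o == label}

    # Branch 1: the label belongs to a resource typed as owl:Thing.
    direct = {s for s in labeled if (s, "rdf:type", "owl:Thing") in triples}

    # Branch 2: the label belongs to a redirect; follow it to its target.
    redirected = {
        o for (s, p, o) in triples
        if p == "dbo:wikiPageRedirects" and s in labeled
    }
    return direct | redirected


print(resolve("Bob Marley"))  # {'dbr:Bob_Marley'}
print(resolve("Bob Marly"))   # {'dbr:Bob_Marley'}
```

One difference from the real query: a Python set silently removes duplicates, while the SPARQL query would return the same URI twice for a label that matches both branches unless you add DISTINCT.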

To use this in a simple application, I created a form that, after you enter a name on it, attempts to display a picture of what you entered. Because the redirect data includes common misspellings as well as nicknames, entering "Bob Marly" will get you a picture of Marley and the URL of the actual resource, as shown below the picture above. Other interesting nicknames and misspellings to try are Bob Dillan, Mary Casat, Prince Billy, Big Blue, and Proctor and Gamble. (Warning: DBpedia image data is incorrect for some very well-known people, like Abraham Lincoln and Barack Obama, even when the Wikipedia page has a picture, so you may see the symbol for a broken image link. I had hoped to have the picture above have a title of "Abe Lincon".)

Because the output creates a specialized web page, I used the technique I described in Build Wikipedia query forms with semantic technology (which can be used with any SPARQL endpoint, not just DBpedia): a CGI Python script stores a SPARQL query, replaces a string in that query with whatever was entered in the form, sends the query off to the endpoint, and then sends HTML based on the result back to the browser. You can see the source here.
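The following is a minimal sketch of that store, substitute, send, and render pattern, not the script's actual source. Several things in it are assumptions: the endpoint URL, the NAME_GOES_HERE placeholder string, the function names, and the use of foaf:depiction as the property holding the image URL.

```python
import html
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"  # assumed endpoint URL
PLACEHOLDER = "NAME_GOES_HERE"           # hypothetical marker string

QUERY_TEMPLATE = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?s ?pic WHERE {
  {
    ?s rdfs:label "NAME_GOES_HERE"@en ;
       a owl:Thing .
  }
  UNION
  {
    ?altName rdfs:label "NAME_GOES_HERE"@en ;
             dbo:wikiPageRedirects ?s .
  }
  OPTIONAL { ?s foaf:depiction ?pic }  # assumption: image URL property
}
"""


def build_query(name):
    """Insert the form value into the stored query as an escaped literal."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    escaped = escaped.replace("\n", "\\n").replace("\r", "\\r")
    return QUERY_TEMPLATE.replace(PLACEHOLDER, escaped)


def run_query(query, endpoint=ENDPOINT):
    """Send the query over HTTP GET and parse the SPARQL JSON results."""
    params = urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        endpoint + "?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def render_html(name, results):
    """Turn the first result row into an HTML fragment for the browser."""
    bindings = results["results"]["bindings"]
    if not bindings:
        return "<p>Nothing found for %s.</p>" % html.escape(name)
    first = bindings[0]
    uri = html.escape(first["s"]["value"], quote=True)
    parts = ['<p><a href="%s">%s</a></p>' % (uri, uri)]
    if "pic" in first:
        pic = html.escape(first["pic"]["value"], quote=True)
        alt = html.escape(name, quote=True)
        parts.insert(0, '<img src="%s" alt="%s"/>' % (pic, alt))
    return "\n".join(parts)
```

A CGI wrapper would read the name from the submitted form and print render_html(name, run_query(build_query(name))) after a Content-Type header. Escaping the value before substituting it matters: without it, a quotation mark typed into the form could break out of the literal and change the query.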

This ability to find the right information from a nickname or common misspelling could improve many applications. Once again, the most important part of the semantic web is the data (in this case, DBpedia's wikiPageRedirects values) and not the standards and technologies used to get at it. Still, the existence of so much useful SPARQL-accessible data should make the SPARQL query language look more and more appealing to people who might have doubted it before.