Learning more about SPARQL

Improving the Bart blackboard query.

Since I first wrote about sending SPARQL queries to DBpedia to retrieve Bart's blackboard messages from the start of Simpsons episodes, I've learned a lot more about SPARQL (reading the spec helped), and I want to walk through some of what I've learned by expanding on and refining my original query.

I had finished that entry by wondering how to list Bart's blackboard entries for all episodes instead of for just one season. Vaclav Synacek showed me one way, and I recently realized that there's a much simpler way, maybe too simple. (All queries shown here assume the namespace declarations on the SNORQL interface form for sending SPARQL queries to DBpedia.) Here is the simpler query:

SELECT ?blackboard WHERE {
  ?s dbpedia2:blackboard ?blackboard.
}

(See it executed here.) What makes this too simple is that it asks for the dbpedia2:blackboard value for anything in DBpedia, whether it's a Simpsons episode or not. I wanted to only ask about Simpsons episodes—not that it comes up for anything else, but I thought this would be a good exercise—so I looked on the DBpedia page for one episode and found a property called dbpedia2:portalProperty. For Simpsons episodes, it has a value of "The Simpsons"@en, with the final @en indicating that this string is in English, so I entered this query:

SELECT ?episode ?blackboard WHERE {
  ?episode dbpedia2:blackboard ?blackboard;
           dbpedia2:portalProperty "The Simpsons"@en.
}

(See it executed here.) This query and its answer set brought up two more questions for me:

  • Some answers are URLs, and some are actual strings of what Bart wrote. How can I tell DBpedia to only give me the latter?

  • What's a portalProperty, and what other values might show up there?
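On the first question, it helps to see how a result set tells the two kinds of answers apart. In the JSON form of SPARQL results, each bound value carries a type such as uri or literal. Here is a rough Python sketch that keeps only the literals on the client side. The sample data is made up (the example.org IRIs and the chalkboard text are placeholders, not real DBpedia output), and literal_values is a hypothetical helper name.

```python
import json

# Made-up sample in the SPARQL JSON results format. The IRIs and the
# chalkboard text are placeholders, not real DBpedia output.
SAMPLE = json.loads("""
{
  "head": {"vars": ["episode", "blackboard"]},
  "results": {"bindings": [
    {"episode": {"type": "uri", "value": "http://example.org/episode/1"},
     "blackboard": {"type": "literal", "xml:lang": "en",
                    "value": "Sample chalkboard text"}},
    {"episode": {"type": "uri", "value": "http://example.org/episode/2"},
     "blackboard": {"type": "uri", "value": "http://example.org/some/resource"}}
  ]}
}
""")


def literal_values(results, var):
    """Return the values bound to `var` that are literals rather than IRIs."""
    kept = []
    for row in results["results"]["bindings"]:
        term = row.get(var)
        # Some older endpoint output uses "typed-literal" alongside "literal".
        if term and term["type"] in ("literal", "typed-literal"):
            kept.append(term["value"])
    return kept
```

Calling literal_values(SAMPLE, "blackboard") keeps the one string and drops the IRI. This only tidies up results after they arrive; it does nothing to reduce what the endpoint sends back.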

From the spec, I learned how to filter the answer set so that only literal strings are returned, with no URLs (more technically, no IRIs). The isLiteral operator in a FILTER does this:

SELECT ?episode ?blackboard WHERE {
  ?episode dbpedia2:blackboard ?blackboard;
           dbpedia2:portalProperty "The Simpsons"@en.
  FILTER isLiteral(?blackboard)
}

(See it executed here.) Now to the portalProperty. As I described in How you can explore a new set of linked data, a query like the following lists all the values that come up for a particular property, although if there are too many, DBpedia may not return them all:

SELECT DISTINCT ?pprop WHERE {
  ?s dbpedia2:portalProperty ?pprop.
}

(See it executed here.) We want that DISTINCT keyword because otherwise we're asking about all the triples that have dbpedia2:portalProperty predicates, and we know that for the Simpsons alone that's over a hundred repetitions.
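The queries so far lean on the SNORQL form to supply the dbpedia2: prefix, but a script has to declare it explicitly. Here is a minimal Python sketch of packaging the DISTINCT query for an endpoint. It assumes that dbpedia2: maps to http://dbpedia.org/property/ and that the endpoint is at http://dbpedia.org/sparql; the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: the endpoint URL and the namespace IRI behind dbpedia2:.
ENDPOINT = "http://dbpedia.org/sparql"
PREFIXES = {"dbpedia2": "http://dbpedia.org/property/"}

DISTINCT_QUERY = """SELECT DISTINCT ?pprop WHERE {
  ?s dbpedia2:portalProperty ?pprop.
}"""


def with_prefixes(query, prefixes=PREFIXES):
    """Prepend one PREFIX declaration per entry so the query stands alone."""
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    return "\n".join(lines + [query])


def request_url(query, endpoint=ENDPOINT):
    """Build a GET URL that carries the query in the 'query' parameter."""
    params = urllib.parse.urlencode({"query": with_prefixes(query)})
    return endpoint + "?" + params


def run_select(query):
    """Send the query and return the result bindings (needs network access)."""
    req = urllib.request.Request(
        request_url(query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as response:
        return json.load(response)["results"]["bindings"]
```

Printing with_prefixes(DISTINCT_QUERY) shows the exact text that would be sent, which is handy for pasting into a query form when a script and the form disagree.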

The list of potential portalProperty values is interesting, but not all Simpsons episodes have this property assigned, so asking for dbpedia2:blackboard values for any subjects that have a dbpedia2:portalProperty value of "The Simpsons"@en won't give us a complete list of blackboard gags. Most episodes seem to have a dbpedia2:reference property pointing to the page on thesimpsons.com for that episode, so I considered querying for dbpedia2:blackboard values for any subjects that have a dbpedia2:reference value with "thesimpsons.com" in it, but then I realized that this wouldn't be much different from Vaclav's solution.
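For the record, the dbpedia2:reference idea might look something like the following. This is an untested sketch, not a query I have run against DBpedia. It assumes the dbpedia2: namespace IRI shown in the code, and it wraps ?ref in str() because I don't know whether the reference values are stored as IRIs or as literals. The Python wrapper and its function name are only there to get the regex escaping right.

```python
import re

# Assumed namespace IRI for the dbpedia2: prefix.
PREFIX = "PREFIX dbpedia2: <http://dbpedia.org/property/>"


def blackboard_query_for_site(site):
    """Build a query for blackboard literals on subjects whose
    dbpedia2:reference value mentions `site`."""
    # Escape regex metacharacters (the dot), then double each backslash so
    # it survives inside a SPARQL string literal. Quotes in `site` are not
    # handled here.
    pattern = re.escape(site).replace("\\", "\\\\")
    return f"""{PREFIX}
SELECT ?episode ?blackboard WHERE {{
  ?episode dbpedia2:blackboard ?blackboard ;
           dbpedia2:reference ?ref .
  FILTER isLiteral(?blackboard)
  FILTER regex(str(?ref), "{pattern}")
}}"""
```

Calling blackboard_query_for_site("thesimpsons.com") returns the full query text. An episode with several matching reference values would show up once per value, so adding DISTINCT may be worthwhile.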

The real point is that as I learn more about SPARQL (and DBpedia), I'm finding more ways to explore this huge set of interesting data and more ways to control the data that's returned to me. Checking what Bart wrote on the blackboard is fun, but I have some more interesting ideas in the works.

3 Comments

Inspiring post! I had a shot at it and discovered you need to use a couple of techniques to get it close. I wonder if inference rules embedded in the DB might also be required to deal with categories properly.

SELECT ?chalkboard_gag WHERE {
{
{?episode skos:subject _:category . _:category skos:broader }
UNION
{?episode skos:subject }
}
?episode dbpedia2:blackboard ?chalkboard_gag
FILTER isLiteral(?chalkboard_gag)
FILTER (?chalkboard_gag != "None"@en)
}

There are a couple of interesting modelling issues in here as well. Many values are not atomic, most are pre-formatted with quotes, and Lisa has a quote in one episode. Perhaps most interesting, there are child-safe and non-child-safe versions of one message, and rightly so: you might not want to encourage your audience to visit that domain.


Many thanks! I'm realizing more and more how useful it is to share SPARQL queries focused on particular topics, just to get a broader sense of the twists and turns we all encounter.

Here's a tiny one I discovered this week. Using ARC2, I plugged in a query to DBpedia that worked just fine in the query form at http://dbpedia.org/sparql/, but it failed in the PHP script. It turned out that, when defining prefixes, I had included a space between the prefix and the URI. That worked in the form but failed in the script. Just a wee gotcha in case anyone else encounters it.

Patrick