How you can explore a new set of linked data

Some great tips from Dean Allemang.

Although he doesn't describe it in linked data terms, a recent posting from Dean Allemang has some great suggestions for diving into a SPARQL-accessible data set you know nothing about to find out what's there. If there's cool stuff in the data set, this is a lot of fun. (Also check out the recent Talking with Talis podcast with Dean, where he describes many examples of semantic web technology helping large organizations solve very real problems.)

If someone gives you access to a MySQL database, commands like show databases, use [database name], show tables, and describe [table name] let you explore the data even if you have no idea of its schema at first. That's a big "if," though: there aren't many large relational databases with useful data available over the public Internet, waiting for you to issue SQL queries. There is, however, a growing amount of linked data with SPARQL front ends. Dean describes a few general-purpose SPARQL queries, plus a few more that build on their results, for exploring a set of data that you might know nothing about. He uses DBpedia in his examples, so we know that his approach works with a huge data set.
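For a rough SPARQL counterpart to show tables and describe [table name], something like the following may help. It is a sketch that assumes the data set types its resources with rdf:type, which not every data set does. A query such as SELECT DISTINCT ?class WHERE { ?s a ?class } plays the role of show tables. The query below then lists the predicates used by instances of one class. The class here (foaf:Person) is only a placeholder; substitute a class URI that the class listing actually returned.

```sparql
# Rough analogue of "describe [table name]": which predicates do
# instances of one class use? foaf:Person is a placeholder class;
# replace it with a class URI returned by a class-listing query.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT ?p
WHERE {
  ?s a foaf:Person ;
     ?p ?o .
}
```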

Before recommending that everyone else try this, I thought I should try it myself on a data set whose structure I knew nothing about. To find one, I went to Richard Cyganiak's The Linking Open Data dataset cloud page. (At the Linked Data Planet conference, pretty much everyone had a slide of this interactive diagram.) Some servers were down, and some offered RDF files to download that I could have queried locally. I ended up with the D2R Server for Project Gutenberg, where I entered queries into SNORQL, its web-based SPARQL front end.
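Because some of those servers were down, it can save time to start with a small query. This one just confirms that an endpoint answers and shows what a few of its triples look like. A minimal sketch follows; the LIMIT value is arbitrary.

```sparql
# Cheap first contact with an unfamiliar endpoint: fetch a handful of
# arbitrary triples instead of scanning the whole data set.
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
```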

As Dean suggested, I listed all the predicates:

SELECT DISTINCT ?p where { ?s ?p ?o }

I saw a lot of Dublin Core predicates, including dc:creator, dc:title, and dc:description. I entered this to list all the authors:

SELECT DISTINCT ?o where { ?s <http://purl.org/dc/elements/1.1/creator> ?o }

One of the values there was "db:people/Goethe_Johann_Wolfgang_von_1749-1832", so I did the following to list his works in Project Gutenberg:

SELECT ?title where {
  ?s <http://purl.org/dc/elements/1.1/creator> ?author;
     <http://purl.org/dc/elements/1.1/title> ?title.
  FILTER regex(str(?author), "Goethe_Johann_Wolfgang_von_1749-1832")
}

I wondered about Project Gutenberg's description of one title, "The Sorrows of Young Werther", so I entered this:

SELECT ?desc where {
  ?s <http://purl.org/dc/elements/1.1/title> "The Sorrows of Young Werther";
     <http://purl.org/dc/elements/1.1/description> ?desc.
}

The answer is: "Translation of: Die Leiden des jungen Werther." (The German version is also available—most of the Project Gutenberg Goethe texts are in German.)
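A natural next step is to ask for every property of that book, not just its description. This follows the spirit of Dean's queries that build on earlier results. The sketch below uses the title as the key, as the description query did. It therefore assumes the stored title matches that plain string exactly.

```sparql
# List every property/value pair for the resource whose dc:title is
# "The Sorrows of Young Werther".
SELECT ?p ?o
WHERE {
  ?book <http://purl.org/dc/elements/1.1/title> "The Sorrows of Young Werther" ;
        ?p ?o .
}
```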

I could go on, and I certainly will try this with more sites that offer a SNORQL front end to a SPARQL interface. Like I said, it's a lot of fun; check out Dean's suggested queries, Richard's suggested data sets, and try it yourself!
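When you try this on other sites, one more query worth having on hand counts how often each predicate is used. That shows where the bulk of the data is. It relies on SPARQL 1.1 aggregates (COUNT, GROUP BY), so treat it as a sketch for newer endpoints. A server that implements only SPARQL 1.0 will reject it.

```sparql
# SPARQL 1.1 only: predicate usage counts, most frequent first.
SELECT ?p (COUNT(*) AS ?uses)
WHERE { ?s ?p ?o }
GROUP BY ?p
ORDER BY DESC(?uses)
```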


(Note: I usually close comments for an entry a few weeks after posting it to avoid comment spam.)

I do not know what an information resource is. I have come to think of RDF as algebra "over" information resources. RDF writers should be barred from coining new URIs. I'll stop pontificating and read the rest of this interesting material now ;-)

URIs (and sets of them packaged as ontologies) are a lot like source code: everyone agrees that re-using existing ones is good, but instead of looking for some to re-use, people create their own and tell the world that it should re-use them. This is easier than tracking down existing well-designed URIs (or code) to re-use. That being said, what you need isn't always out there, so sometimes you have to make up new URIs (or code).


Here's another way:

1. Go to http://dbpedia.org:8890/isparql
2. Go to the "Advanced" tab (just so you can paste in the query that follows):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT *
FROM <http://dbpedia.org>
WHERE { ?s ?p ?o. ?o bif:contains "'Goethe_Johann_Wolfgang'" }

3. The results grid contains URIs, click on the URI for Wolfgang, and select the "Describe" option.
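The "Describe" option corresponds roughly to a SPARQL DESCRIBE query, which you could also type directly. The URI below is an assumed example, DBpedia's resource for Goethe; use whatever URI the results grid actually shows. The SPARQL spec leaves the content of a DESCRIBE result up to the server, so expect Virtuoso's choice of triples rather than a fixed set.

```sparql
# Ask the server for its description of one resource. The URI is an
# assumed example; the set of triples returned is implementation-defined.
DESCRIBE <http://dbpedia.org/resource/Johann_Wolfgang_von_Goethe>
```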

You can also do it the other way round starting with this query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT *
FROM <http://dbpedia.org>
WHERE { ?s ?p ?o. ?o bif:contains "'Goethe_Johann_Wolfgang'" }

To see the visualization of the SPARQL Query click on the triple icon in the "Advanced" UI.

As you explore the resulting graph, this visual query tool constructs SPARQL on the fly, and at each turn you can visualize the queries, etc.