Playing with SPARQL Graph Store HTTP Protocol

GETting, POSTing, PUTting, and DELETEing named graphs.

One of the new SPARQL 1.1 specifications is the SPARQL 1.1 Graph Store HTTP Protocol, which is currently still a W3C Working Draft. According to its abstract, it "describes the use of HTTP operations for the purpose of managing a collection of graphs in the REST architectural style." Recent releases of Sesame support it, so I used that to try out some of the operations described by this spec. I managed to do GET, PUT, POST, and DELETE operations with individual named graphs, so that was fun, in an RDF geek kind of way.

As this Working Draft often points out, you can also perform most if not all of these operations with a query sent to a SPARQL endpoint. Hardcore RESTafarians will prefer the new HTTP protocol way, though, because it uses basic HTTP operations with URIs that name resources (in this case, graphs of triples) and the operations to perform on them, instead of the more implementation-detail-oriented practice of embedding queries in URLs.

Adding and deleting triples at the named graph level of granularity (as opposed to the triple level) will also make more sense for data publishing workflows in which sets of data—probably with their own metadata about things like provenance—are added and deleted as a unit. For example, if you're a data publisher and I'm one of your providers, I would send you a set of data to replace the current set that you're offering from my organization, which you may have distinguished from your other data offerings in your triplestore by keeping the data from my company in its own named graph.

Then again, maybe too few people will agree, and most will find that UPDATE queries are good enough to achieve their goals. Ultimately, support for the Graph Store HTTP Protocol across the spectrum of semantic web tools will probably be tied to the extent of customer demand for it. At the very least I would expect all triplestores to support it shortly after it becomes a Recommendation, if not before.
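For comparison, here is a rough Python sketch that maps each HTTP operation on a named graph to the SPARQL request that the Working Draft describes as its approximate equivalent. The function name and the idea of passing the payload as a string of triples are illustrative assumptions, not part of the spec or of Sesame.

```python
def equivalent_sparql(method: str, graph_iri: str, payload: str = "") -> str:
    """Return the SPARQL request roughly equivalent to an HTTP operation on a named graph.

    payload is a string of triples in SPARQL-compatible syntax (a hypothetical
    input format chosen for this sketch); it is only used for PUT and POST.
    """
    graph = f"<{graph_iri}>"
    insert = f"INSERT DATA {{ GRAPH {graph} {{ {payload} }} }}"
    templates = {
        "GET": f"CONSTRUCT {{ ?s ?p ?o }} WHERE {{ GRAPH {graph} {{ ?s ?p ?o }} }}",
        # PUT replaces the graph: drop whatever is there, then insert the payload.
        "PUT": f"DROP SILENT GRAPH {graph} ; {insert}",
        # POST merges the payload into whatever is already there.
        "POST": insert,
        "DELETE": f"DROP GRAPH {graph}",
    }
    return templates[method.upper()]


print(equivalent_sparql("DELETE", "http://learningsparql.com/ns/data#g2"))
```

Seen this way, the choice between the two styles is mostly about where the graph name lives: in the URL of the request, or inside the text of an update.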

To test drive these operations, I used cURL from the command line. cURL is part of Linux and Mac OS, and a free version for Windows is available. Your favorite programming language should also offer ways to perform GET, PUT, POST, and DELETE operations, if not natively then with some add-in library.

Except where I note otherwise, everything below works, though not necessarily in the best way possible. I went back and forth between the W3C specification document and the Sesame documentation on the topic a lot (with plenty of searches about cURL command line syntax in between) and I had plenty of both hits and misses. I probably missed some better ways to do several of these and I'm open to any suggestions.

Also, I have no idea what role authorization could play in all of this—you don't want to let just anyone with HTTP access change and delete your data—but this seemed like a nice start at getting to know this new part of the SPARQL standard.

Setup

To start, I created a new repository (a Sesame term, not a W3C standard term) called updatetest. This will be important below, because the URLs to pass to Sesame must specify the name of the repository to act on.

Then, on Sesame's SPARQL Update screen, I entered the following to insert some starter data into the updatetest repository:

PREFIX d:  <http://learningsparql.com/ns/data#>
PREFIX dm: <http://learningsparql.com/ns/demo#>

INSERT DATA
{
  d:x dm:tag "one" . 
  d:x dm:tag "two" . 

  GRAPH d:g1
  { 
    d:x dm:tag "three" . 
    d:x dm:tag "four" . 
  }

  GRAPH d:g2
  { 
    d:x dm:tag "five" . 
    d:x dm:tag "six" . 
  }
}

It adds two triples to the repository's default graph, creates named graphs called d:g1 and d:g2, and puts two triples in each of those. (If you're new to the use of named graphs or SPARQL 1.1 Update, which is also still in Working Draft status, see my O'Reilly book Learning SPARQL.)

To check that the update above had the desired effect, and to see the results of the operations described below, I entered the following query on Sesame's Query screen. It lists all the triples currently in the dataset:

SELECT ?g ?s ?p ?o
WHERE
{
  { ?s ?p ?o }
  UNION
  { GRAPH ?g { ?s ?p ?o } }
}

After you execute this query you'll see a URL-escaped version of it embedded in the URL in your browser's address bar. (Don't call that RESTful, though, or the RESTafarians will come after you!) If you're going to try many of the examples below, you might want to bookmark the result of this query or keep it in its own browser tab so that you can reload it after trying each command line below to see the command's effect on the data in the updatetest repository.
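The escaping that shows up in the address bar can be reproduced in a few lines of Python. This sketch assumes that the repository accepts queries at the URL below through the query parameter defined by the SPARQL Protocol; the exact path is an assumption about a local Sesame setup, and the URL that the Sesame web interface puts in the address bar will look different.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the updatetest repository answers SPARQL queries at this URL.
REPOSITORY_URL = "http://localhost:8080/openrdf-sesame/repositories/updatetest"

QUERY = """SELECT ?g ?s ?p ?o
WHERE
{
  { ?s ?p ?o }
  UNION
  { GRAPH ?g { ?s ?p ?o } }
}"""


def query_url(repository_url: str, query: str) -> str:
    """Return a GET URL with the query percent-encoded in the 'query' parameter."""
    return repository_url + "?" + urlencode({"query": query})


url = query_url(REPOSITORY_URL, QUERY)
print(url)

# Decoding the parameter gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```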

GET

The GET examples should work when pasted as the URL into any browser, because a web browser that doesn't support GET isn't much of a browser. I did it with cURL anyway to be consistent with the rest of my examples. The following asks for everything in the updatetest dataset's default graph, and gets the "one" and "two" triples:

curl http://localhost:8080/openrdf-sesame/repositories/updatetest/rdf-graphs/service?default

(Several command lines that I've pasted here may reach off to the right where you can't see them because they're too long. I chose not to break them up with carriage returns to make them easier to copy and paste if you want to try them.)

Sesame returns the triples in the Turtle format, but in true RESTful fashion, you can ask for the result in one of the other formats that Sesame supports:

curl -H "Accept: application/rdf+xml" http://localhost:8080/openrdf-sesame/repositories/updatetest/rdf-graphs/service?default
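The same content negotiation can be sketched with Python's standard library. The helper names build_get and send are made up for this example, and send only works with a Sesame server running at the URL shown.

```python
import urllib.request

DEFAULT_GRAPH_URL = (
    "http://localhost:8080/openrdf-sesame/repositories/updatetest"
    "/rdf-graphs/service?default"
)


def build_get(url: str, accept: str = "application/rdf+xml") -> urllib.request.Request:
    """Build (but do not send) a GET request that asks for one RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": accept}, method="GET")


def send(request: urllib.request.Request) -> str:
    """Send the request and return the response body as text (needs a live server)."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


request = build_get(DEFAULT_GRAPH_URL)
print(request.get_method(), request.full_url)
print(request.header_items())
```

Calling send(build_get(DEFAULT_GRAPH_URL)) against a running server should return the same RDF/XML that the cURL command above prints.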

The next request asks for all the triples in named graph http://learningsparql.com/ns/data#g1. Note that graph name characters that might cause problems in URL parameters are escaped in the request:

curl http://localhost:8080/openrdf-sesame/repositories/updatetest/rdf-graphs/service?graph=http%3A%2F%2Flearningsparql.com%2Fns%2Fdata%23g1

PUT

An HTTP PUT is a request to put a resource at a particular URL. The idea is to create a new resource at that URL—even if something already exists there, in which case the existing resource gets replaced.

Our first PUT example puts the triples from the file test.ttl into the http://learningsparql.com/ns/data#g2 named graph, replacing any existing ones that may be there. (Note how the command line uses the cURL -X switch to indicate the operation to perform, the --data-binary switch with the @ character to send the contents of the file of triples unchanged, and the -H switch to send a custom header indicating the MIME type of the data being sent. The shorter -d switch also accepts @, but it strips newlines from the file, which can corrupt Turtle that contains comments.) If the http://learningsparql.com/ns/data#g2 graph didn't exist, the PUT operation would create it.

curl -X PUT --data-binary @test.ttl -H "Content-Type: application/x-turtle" http://localhost:8080/openrdf-sesame/repositories/updatetest/rdf-graphs/service?graph=http%3A%2F%2Flearningsparql.com%2Fns%2Fdata%23g2
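For anyone scripting this instead of using cURL, a minimal Python sketch of the same PUT might look like the following. The helper names are invented for the example, the file is read in binary mode so that its bytes are sent untouched, and put_file needs a running Sesame server to do anything.

```python
import urllib.request

G2_URL = (
    "http://localhost:8080/openrdf-sesame/repositories/updatetest"
    "/rdf-graphs/service"
    "?graph=http%3A%2F%2Flearningsparql.com%2Fns%2Fdata%23g2"
)


def build_put(url: str, turtle: bytes) -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying Turtle data as its body."""
    return urllib.request.Request(
        url,
        data=turtle,
        headers={"Content-Type": "application/x-turtle"},
        method="PUT",
    )


def put_file(url: str, path: str) -> int:
    """Send the file's contents with PUT and return the HTTP status (needs a live server)."""
    with open(path, "rb") as turtle_file:
        request = build_put(url, turtle_file.read())
    with urllib.request.urlopen(request) as response:
        return response.status


sample = b'<http://learningsparql.com/ns/data#x> <http://learningsparql.com/ns/demo#tag> "seven" .\n'
request = build_put(G2_URL, sample)
print(request.get_method(), len(request.data), "bytes")
```

With a server running, put_file(G2_URL, "test.ttl") would be the counterpart of the cURL command above.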

(For the remainder of these commands, I changed something in test.ttl each time so that I could see, when querying Sesame, that the latest version of the data really had been sent to the repository.) For the next command, I wanted to completely replace all of the updatetest repository's triples with the ones in test.ttl. Based on the other working examples and the correspondences between the Sesame documentation and the standard, I thought this would work, but it didn't:

curl -X PUT --data-binary @test.ttl -H "Content-Type: application/x-turtle" http://localhost:8080/openrdf-sesame/repositories/updatetest/rdf-graphs/service?default

This more Sesame-ish URL syntax did work to replace all of the updatetest repository's triples with the ones in test.ttl:

curl -X PUT --data-binary @test.ttl -H "Content-Type: application/x-turtle" http://localhost:8080/openrdf-sesame/repositories/updatetest/statements

(After running it, you may want to rerun the INSERT DATA update query above to more easily see the effect of the remaining operations.)

POST

While a PUT command replaces any existing triples at the named URL with the ones being sent, a POST command adds the new ones to the existing ones. The following adds the test.ttl triples to the http://learningsparql.com/ns/data#g2 named graph:

curl -X POST --data-binary @test.ttl -H "Content-Type: application/x-turtle" http://localhost:8080/openrdf-sesame/repositories/updatetest/rdf-graphs/service?graph=http%3A%2F%2Flearningsparql.com%2Fns%2Fdata%23g2
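The difference between PUT and POST is easy to see in a toy model. The class below is an in-memory stand-in written for illustration only; it is not how Sesame stores anything, and its names are made up. It treats each named graph as a set of triples.

```python
class ToyGraphStore:
    """A dictionary of named graphs, each one a set of (subject, predicate, object) tuples."""

    def __init__(self):
        self.graphs = {}

    def put(self, name, triples):
        """Replace the graph's contents, creating the graph if it does not exist."""
        self.graphs[name] = set(triples)

    def post(self, name, triples):
        """Merge the triples into the graph, creating the graph if it does not exist."""
        self.graphs.setdefault(name, set()).update(triples)

    def get(self, name):
        """Return a copy of the graph's triples (empty if the graph does not exist)."""
        return set(self.graphs.get(name, set()))


G2 = "http://learningsparql.com/ns/data#g2"
store = ToyGraphStore()

store.put(G2, {("d:x", "dm:tag", "five"), ("d:x", "dm:tag", "six")})
store.post(G2, {("d:x", "dm:tag", "seven")})
print(len(store.get(G2)))  # 3: POST added to the two existing triples

store.put(G2, {("d:x", "dm:tag", "eight")})
print(len(store.get(G2)))  # 1: PUT replaced everything that was there
```

Because a graph is a set, POSTing a triple that is already present changes nothing.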

I could not put together a command line that POSTed triples to the default graph, and didn't see any examples in the Sesame documentation.

DELETE

When applied to a named graph, this command's effect is pretty obvious. The following deletes the http://learningsparql.com/ns/data#g2 named graph and all of its triples:

curl -X DELETE http://localhost:8080/openrdf-sesame/repositories/updatetest/rdf-graphs/service?graph=http%3A%2F%2Flearningsparql.com%2Fns%2Fdata%23g2

This last command deletes the default graph's triples, leaving named graphs and their triples intact:

curl -X DELETE http://localhost:8080/openrdf-sesame/repositories/updatetest/rdf-graphs/service?default

Have you tried this new specification's operations with other tools? Does anyone see clear-cut cases where they'd rather use this than send the corresponding queries to a SPARQL endpoint, or vice versa? Let me know at this Google+ post.