I've been using the curl utility to retrieve data from SPARQL endpoints for years, but I still have trouble remembering some of the important syntax, so I jotted down a quick reference for myself and I thought I'd share it. I also added some background.
Quick reference
Submit a URL-encoded SPARQL query on the operating system command line to the endpoint http://edan.si.edu/saam/sparql:
curl "http://edan.si.edu/saam/sparql?query=SELECT%20*%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D%20LIMIT%208"
(Quoting the URL isn't always necessary, but won't hurt. Omitting it may hurt if some of the characters mean something special to your operating system's command line interpreter.)
Submit the same query stored in the file query1.rq:
curl --data-urlencode "query@query1.rq" http://edan.si.edu/saam/sparql
There is no need to escape the query in the file, because the --data-urlencode parameter tells curl to do so.
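To be concrete, query1.rq in these examples just holds the unescaped version of the query from the first example above:
SELECT * WHERE {?s ?p ?o} LIMIT 8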
The above queries return the data in whatever format the endpoint's system administrators chose as the default. You can pass a request header to specify that you want a particular format. The following requests comma-separated values:
curl -H "Accept: text/csv" --data-urlencode "query@query1.rq" http://edan.si.edu/saam/sparql
Other possible content types are application/sparql-results+json, application/sparql-results+xml, and text/tab-separated-values.
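For example, assuming the endpoint supports it, this asks for the SPARQL Query Results JSON format:
curl -H "Accept: application/sparql-results+json" --data-urlencode "query@query1.rq" http://edan.si.edu/saam/sparql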
The above examples all use a SELECT query. A CONSTRUCT query requests triples, so instead of CSV or one of the other tabular formats you want an RDF serialization such as Turtle:
curl -H "Accept: text/turtle" --data-urlencode "query@query2.rq" http://edan.si.edu/saam/sparql
Other possible content types for CONSTRUCT queries are application/rdf+xml, application/rdf+json, and, for N-Triples, text/plain. The bio2rdf github page has good long lists for both SELECT and CONSTRUCT content types, although not all endpoints will support all of the listed types. (It lists text/plain for N-Triples, but you're better off using application/n-triples.)
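For example, to ask for N-Triples instead of Turtle (again, assuming the endpoint supports that content type):
curl -H "Accept: application/n-triples" --data-urlencode "query@query2.rq" http://edan.si.edu/saam/sparql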
Background
curl lets you submit many kinds of HTTP requests to HTTP servers. It comes with most Linux distributions and with macOS, and if you don't have it on your Windows machine, you can download it.
If you enter curl with no parameters other than a URL, like this,
curl http://www.learningsparql.com
it does the same HTTP GET that a browser would do. This has the same effect as doing a browser View Source on that web page.
It gets more interesting when you're not pointing curl at a static web page like http://www.learningsparql.com but at a dynamic resource such as a SPARQL endpoint. A SPARQL endpoint is usually identified with a URL ending with /sparql. I tested everything shown above with these endpoint URLs:
- https://query.wikidata.org/bigdata/namespace/wdq/sparql, the SPARQL endpoint for Wikidata.
- http://localhost:3030/myDataset/sparql, the SPARQL endpoint for a local instance of Apache Jena Fuseki (see the example after this list). This is the triplestore that I described in the "Updating Data with SPARQL" chapter of my book Learning SPARQL because, for a server that accepts SPARQL UPDATE commands, it's so easy to get up and running. Before running the queries against this endpoint I created a dataset on this running instance with the clever name of myDataset and loaded some triples into it. As you can see, a Fuseki endpoint URL includes the dataset name.
- http://edan.si.edu/saam/sparql, the SPARQL endpoint for the Smithsonian Institution. I used this one in the examples here because it's the shortest of the three endpoint URLs that I used for testing.
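For example, querying that local Fuseki instance just means swapping its URL into the earlier command (assuming Fuseki is running locally with a dataset named myDataset):
curl --data-urlencode "query@query1.rq" http://localhost:3030/myDataset/sparql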
The simplest way to send a query to a SPARQL endpoint is to add query=[your URL-encoded query] to the end of the endpoint's URL, as with the very first example above. You can paste the resulting URL into the address bar of a web browser so that the browser will retrieve the query results from the endpoint, but curl lets you retrieve the results from a command line so that you can save the returned data and use it as part of an application.
URL encoding is the process of taking characters that might screw up the parsing of the URL and converting each one to a percent sign followed by the hexadecimal value of its character code; most often, that means converting each space to %20. For example, the escaped version of the query SELECT * WHERE {?s ?p ?o} LIMIT 8 that I used in the examples above is SELECT%20*%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D%20LIMIT%208. Most programming languages offer built-in functions to do this; I usually paste one of these queries into a form on a website like this one and then copy the result after having the form do the conversion.
When you add the escaped query to a SPARQL endpoint URL such as the Smithsonian one and enter the result as a parameter to curl at your command line, like this,
curl http://edan.si.edu/saam/sparql?query=SELECT%20*%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D%20LIMIT%208
it should retrieve a SPARQL Query Results JSON Format version of the data requested by that query, because that's the default format for that endpoint.
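Incidentally, if you want the query in the URL but don't want to escape it by hand, curl's -G option sends any --data-urlencode values as a GET query string appended to the URL instead of as a POST body, so something like this should have the same effect as the example just above:
curl -G --data-urlencode "query=SELECT * WHERE {?s ?p ?o} LIMIT 8" "http://edan.si.edu/saam/sparql"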
I actually don't escape queries and add them to a curl command line often. When I'm refining a query by iteratively editing and running it, re-encoding the URL each time can be a pain, so I usually store the query in a text file (query1.rq for the sample SELECT query above and query2.rq for the CONSTRUCT query) and tell curl to URL-encode the file's contents and send the result off to the SPARQL endpoint.
If I keep the file with the query in a text editor, I can refine it, save it, and run the same command over and over without worrying about escaping each revision of the query. (Because my editor is Emacs, I could actually send the query to the endpoint using Emacs SPARQLMode, but today's topic is curl.)
The curl website has plenty of documentation, but you can learn a lot with just this:
curl --help
Among the many, many options, some useful ones are -o to redirect output to a file and -L for "follow location hints" (that is, if the server has instructions to redirect a request for a given URL to something else, take the hint). Another is -I for "Show document info only": just get information about the requested "document" without actually retrieving a named resource, which is useful for debugging. The classic -v for "verbose" is also handy for debugging.
Take a look at the available options, experiment with some SPARQL endpoints, and soon you'll be using "curl" as a verb (for example, "I tried to curl it but I didn't have the right certs"; see the -E command line option for more on that) and you won't be talking about hairstyling, arm exercises, or sliding round stones across the ice.
(I just learned about Curling SPARQL HTTP Graph Store protocol by @jindrichmynarz, so if you've gotten this far, you'll like that too.)
Curling image by Greg Scheckter via Flickr, CC (some rights reserved).
Comments? Just tweet to @bobdc for now, because Google+ is shutting down. I will be moving my blog to a new, more phone-responsive platform shortly, and I'm researching options for hosted comments.