Easier querying of strings with RDF 1.1

In which a spoonful of syntactic sugar makes the string querying go down a bit easier.
If it looks and walks and talks like a string...

The RDF 1.1 specifications, published fifteen years and three days after RDF 1.0 became a Recommendation, did not add many new features to RDF, although they did make a few new syntaxes official, and there were no new documents about the SPARQL query language. The new Recommendations did clean up a few odds and ends, and one bit of cleanup officially removes an annoying impediment to straightforward querying of strings.

Near the beginning of chapter 5 of my book Learning SPARQL, I wrote:

Discussions are currently underway at the W3C about potentially doing away with the concept of the plain literal and just making xsd:string the default datatype, so that "this" and "this"^^xsd:string would mean the same thing.

When dealing with the difference between simple literals and those that were explicitly cast as xsd:string values, casting in one direction or the other with the str() and xsd:string() functions gave us a workaround, but once all the query engines catch up with RDF 1.1 we won't have to work around this anymore.

The 2011 document StringLiterals/LanguageTaggedStringDatatypeProposal describes the problem in more detail, but here's a short example. Imagine that you want to query for the author of one of the works listed in these triples:

@prefix dc:  <http://purl.org/dc/elements/1.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ls:  <http://learningsparql.com/id#> . 

ls:i1001 dc:creator "Jane Austen" ;
         dc:title "Persuasion" .
ls:i1002 dc:creator "Nathaniel Hawthorne" ;
         dc:title "The Scarlet Letter"^^xsd:string .

Let's say you want to know who wrote "The Scarlet Letter" and you enter this query:

PREFIX dc:  <http://purl.org/dc/elements/1.1/> 
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 

SELECT ?author WHERE { 
  ?work  dc:title "The Scarlet Letter" ; 
         dc:creator ?author .
}

With a SPARQL engine that is strictly compliant with RDF 1.0, this query wouldn't find anything, because the dc:title value of ls:i1002 is the typed literal "The Scarlet Letter"^^xsd:string and not the simple literal that the query was looking for. Similarly, a query asking for the author of "Persuasion"^^xsd:string wouldn't find anything either, because it would be looking for a string that has been explicitly typed as an xsd:string, and in the data the value is a simple literal.

This, in fact, is what happens with release 2.6.4 of Sesame, the version currently on my hard disk. Sesame is now up to 2.7.10, and, seeing the change coming, may have accounted for it by now. ARQ and the TopBraid platform stopped distinguishing between simple literals and typed string literals several years ago.
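For an engine that still makes the distinction, the casting workaround mentioned earlier might look something like the following sketch. It is only one way to do it: it binds the title to a variable (?title is a name I made up for this example) and compares the result of str() instead of matching the literal directly, so it should find Nathaniel Hawthorne whether the title in the data is a simple literal or an xsd:string.

```sparql
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?author WHERE {
  ?work dc:title ?title ;
        dc:creator ?author .
  # str() returns just the lexical form, so "The Scarlet Letter" and
  # "The Scarlet Letter"^^xsd:string both pass this test.
  FILTER (str(?title) = "The Scarlet Letter")
}
```

One side effect to keep in mind: because str() also drops language tags, this version would match a title such as "The Scarlet Letter"@en as well, which may or may not be what you want.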

Treating the simple literal and typed string versions of a string as the same thing is now officially what's supposed to happen. According to section 3.3 of the new RDF 1.1 Concepts and Abstract Syntax Recommendation, "Simple literals are syntactic sugar for abstract syntax literals with the datatype IRI http://www.w3.org/2001/XMLSchema#string". In other words, if it looks and walks and talks like a string, treat it like a string.
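If you're curious whether a particular engine has caught up, a query along these lines is one possible quick check. It is a sketch and not an official conformance test: it asks whether the two forms of "this" from the quote above are the same RDF term. An engine that follows RDF 1.1 should answer true, and one that strictly follows RDF 1.0 should answer false.

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# sameTerm() compares RDF terms without any value-based conversion,
# so it shows whether the engine treats the two literals as identical.
ASK {
  FILTER (sameTerm("this", "this"^^xsd:string))
}
```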

With this update, there's nothing to hold back other SPARQL engines from treating simple literals and typed string literals the same way. This is going to make the development of a lot of SPARQL queries a little bit simpler.
