SPARQL 1.1's new VALUES keyword

New ways to filter search results.

SPARQL 1.1's new BIND keyword lets you assign a value to a variable, and the even newer VALUES keyword lets you create tables of values, giving you new options when filtering query results. As the July 24th draft of the SPARQL 1.1 Query spec (where the keyword first appeared) tells us, VALUES "replaces and generalizes BINDINGS," a keyword introduced in earlier drafts of the SPARQL 1.1 spec. The ARQ 2.7.4 snapshot supports the VALUES keyword, so I played with it a bit.

The following query ignores any input you pass to it (make sure to pass some anyway if you're using command-line ARQ, which complains if you don't include a --data parameter) and demonstrates how you can create a table of values. This example populates the table with qnames and literal values, but you can use any IRIs or literals you want (the VALUES syntax doesn't allow blank nodes):

PREFIX dm: <http://learningsparql.com/ns/demo#>

SELECT * WHERE { }
VALUES (?color ?direction) {
  ( dm:red  "north" )
  ( dm:blue "west" )
}

Here's the result:

-----------------------
| color   | direction |
=======================
| dm:red  | "north"   |
| dm:blue | "west"    |
-----------------------

This result isn't particularly exciting, but it shows how simple it is to create a two-dimensional table in a SPARQL query. To see what VALUES can add to our queries, we'll use the following dataset:

@prefix e: <http://learningsparql.com/ns/expenses#> .
@prefix d: <http://learningsparql.com/ns/data#> .

d:m40392 e:description "breakfast" ;
         e:date "2011-10-14" ;
         e:amount 6.53 . 

d:m40393 e:description "lunch" ;
         e:date "2011-10-14" ;
         e:amount 11.13 . 

d:m40394 e:description "dinner" ;
         e:date "2011-10-14" ;
         e:amount 28.30 . 

d:m40395 e:description "breakfast" ;
         e:date "2011-10-15" ;
         e:amount 4.32 . 

d:m40396 e:description "lunch" ;
         e:date "2011-10-15" ;
         e:amount 9.45 . 

d:m40396 e:description "lunch" ;
         e:date "2011-10-15" ;
         e:amount 6.20 . 

d:m40397 e:description "dinner" ;
         e:date "2011-10-15" ;
         e:amount 31.45 . 

d:m40398 e:description "breakfast" ;
         e:date "2011-10-16" ;
         e:amount 6.65 . 

d:m40399 e:description "lunch" ;
         e:date "2011-10-16" ;
         e:amount 10.00 . 

d:m40400 e:description "dinner" ;
         e:date "2011-10-16" ;
         e:amount 25.05 . 

As a baseline, we'll start with a simple query that asks for the values of all the dataset's properties without using the VALUES keyword:

# filename: values1.rq

PREFIX e: <http://learningsparql.com/ns/expenses#> 

SELECT ?description ?date ?amount
WHERE
{
  ?meal e:description ?description ;
        e:date ?date ;
        e:amount ?amount . 
}

When run with the dataset above, this query lists all the description, date, and amount values:

---------------------------------------
| description | date         | amount |
=======================================
| "dinner"    | "2011-10-16" | 25.05  |
| "lunch"     | "2011-10-16" | 10.00  |
| "breakfast" | "2011-10-16" | 6.65   |
| "dinner"    | "2011-10-15" | 31.45  |
| "lunch"     | "2011-10-15" | 6.20   |
| "lunch"     | "2011-10-15" | 9.45   |
| "breakfast" | "2011-10-15" | 4.32   |
| "dinner"    | "2011-10-14" | 28.30  |
| "lunch"     | "2011-10-14" | 11.13  |
| "breakfast" | "2011-10-14" | 6.53   |
---------------------------------------

This next version of the query adds a VALUES clause saying that we're only interested in results that have "lunch" or "dinner" in the ?description value:

# filename: values2.rq

PREFIX e: <http://learningsparql.com/ns/expenses#> 

SELECT ?description ?date ?amount
WHERE
{
  ?meal e:description ?description ;
        e:date ?date ;
        e:amount ?amount . 
  VALUES ?description { "lunch" "dinner" }
}

(Note that, in this case, the VALUES data structure being created is one dimensional, not two; this is still a step up from the BIND keyword's ability to only assign a single value to a variable at a time.) With the same meal expense data, this new query's output is similar to the output of the preceding one without the "breakfast" result rows:

---------------------------------------
| description | date         | amount |
=======================================
| "lunch"     | "2011-10-16" | 10.00  |
| "lunch"     | "2011-10-15" | 6.20   |
| "lunch"     | "2011-10-15" | 9.45   |
| "lunch"     | "2011-10-14" | 11.13  |
| "dinner"    | "2011-10-16" | 25.05  |
| "dinner"    | "2011-10-15" | 31.45  |
| "dinner"    | "2011-10-14" | 28.30  |
---------------------------------------

This query's VALUES clause could go after the WHERE clause's closing curly brace instead of before it without affecting the results. (This won't always be the case with VALUES clauses in GROUP BY and federated queries.)
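To make that concrete, here is a sketch of values2.rq with the VALUES clause moved to the very end of the query, after the WHERE clause's closing curly brace. The filename values2b.rq is my own invention for illustration. With the meal expense data above it should return the same seven rows, although a SPARQL engine is free to list them in a different order.

```sparql
# filename: values2b.rq (hypothetical name, not one of the original files)

PREFIX e: <http://learningsparql.com/ns/expenses#>

SELECT ?description ?date ?amount
WHERE
{
  ?meal e:description ?description ;
        e:date ?date ;
        e:amount ?amount .
}
# The VALUES table now follows the whole WHERE clause
# instead of sitting inside its curly braces.
VALUES ?description { "lunch" "dinner" }
```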

This next query of the same data creates a two-dimensional table to use for filtering output results:

# filename: values3.rq

PREFIX e: <http://learningsparql.com/ns/expenses#> 

SELECT ?description ?date ?amount
WHERE
{
  ?meal e:description ?description ;
        e:date ?date ;
        e:amount ?amount . 

  VALUES (?date ?description) {
         ("2011-10-15" "lunch") 
         ("2011-10-16" "dinner")
  } 

}

After retrieving all the meal data, this query only passes along the results that have either a ?date value of "2011-10-15" and a ?description value of "lunch" or a ?date value of "2011-10-16" and a ?description value of "dinner":

---------------------------------------
| description | date         | amount |
=======================================
| "lunch"     | "2011-10-15" | 6.20   |
| "lunch"     | "2011-10-15" | 9.45   |
| "dinner"    | "2011-10-16" | 25.05  |
---------------------------------------

(It looks like someone had two lunches on October 15th.)
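For comparison, here is a sketch of how the same condition might look with a plain FILTER instead of VALUES. The filename values3b.rq is hypothetical. Each row of the VALUES table becomes a pair of tests joined by &&, and the rows are joined by ||, so the expression grows quickly as the table gets wider or longer. With this dataset it should pass along the same three rows.

```sparql
# filename: values3b.rq (hypothetical name, not one of the original files)

PREFIX e: <http://learningsparql.com/ns/expenses#>

SELECT ?description ?date ?amount
WHERE
{
  ?meal e:description ?description ;
        e:date ?date ;
        e:amount ?amount .

  # One parenthesized group per row of the values3.rq table.
  FILTER ( (?date = "2011-10-15" && ?description = "lunch") ||
           (?date = "2011-10-16" && ?description = "dinner") )
}
```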

When you use VALUES to create a data table, you don't have to assign a value to every position. The UNDEF keyword acts as a wildcard, accepting any value that may come up there. The following variation on the preceding query asks for any result rows with "lunch" as the ?description value, regardless of the ?date value, and also for any result rows with a ?date value of "2011-10-16", regardless of the ?description value:

# filename: values4.rq

PREFIX e: <http://learningsparql.com/ns/expenses#> 

SELECT ?description ?date ?amount
WHERE
{
  ?meal e:description ?description ;
        e:date ?date ;
        e:amount ?amount . 

  VALUES (?date ?description) {
         (UNDEF "lunch") 
         ("2011-10-16" UNDEF) 
  }

}

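The output of this query has more rows than the previous query's output. The "lunch" row for 2011-10-16 appears twice because it matches both rows of the VALUES table: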

---------------------------------------
| description | date         | amount |
=======================================
| "lunch"     | "2011-10-16" | 10.00  |
| "lunch"     | "2011-10-15" | 6.20   |
| "lunch"     | "2011-10-15" | 9.45   |
| "lunch"     | "2011-10-14" | 11.13  |
| "dinner"    | "2011-10-16" | 25.05  |
| "lunch"     | "2011-10-16" | 10.00  |
| "breakfast" | "2011-10-16" | 6.65   |
---------------------------------------
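If you don't want the 2011-10-16 lunch listed twice, one option is to add DISTINCT to the SELECT clause. This sketch (values4b.rq is a hypothetical filename) is otherwise identical to values4.rq, and with the data above it should return six rows instead of seven.

```sparql
# filename: values4b.rq (hypothetical name, not one of the original files)

PREFIX e: <http://learningsparql.com/ns/expenses#>

# DISTINCT collapses result rows whose selected values are all identical.
SELECT DISTINCT ?description ?date ?amount
WHERE
{
  ?meal e:description ?description ;
        e:date ?date ;
        e:amount ?amount .

  VALUES (?date ?description) {
         (UNDEF "lunch")
         ("2011-10-16" UNDEF)
  }
}
```

The two 2011-10-15 lunches both survive because their ?amount values differ.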

When you saw the descriptions of what each of these queries did, it may have occurred to you that all of these query conditions could have been specified without the VALUES keyword (for example, with a FILTER IN clause in the values2.rq query, although that would only work to replace a one-dimensional VALUES setting). That's true, but I was using a small amount of data to demonstrate different ways to use the new keyword. When you work with larger amounts of data and especially with more complex filtering conditions, VALUES offers an extra layer of result filtering that can give you more control over your final search results with very little extra code in your query.
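Here is a sketch of that FILTER IN alternative to values2.rq (values2c.rq is a hypothetical filename). With the data above it should return the same lunch and dinner rows:

```sparql
# filename: values2c.rq (hypothetical name, not one of the original files)

PREFIX e: <http://learningsparql.com/ns/expenses#>

SELECT ?description ?date ?amount
WHERE
{
  ?meal e:description ?description ;
        e:date ?date ;
        e:amount ?amount .

  # Keep only the rows whose ?description is one of the listed values.
  FILTER (?description IN ("lunch", "dinner"))
}
```

IN tests a single expression against a list, which is why it can only stand in for a one-dimensional VALUES table.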

(Thanks to Andy Seaborne for reviewing this before publication.)

