A rules language for RDF

Right under our noses.

Last May, in "Adding semantics to make data more valuable: the secret revealed", I showed how storing a little bit of semantics about the word "spouse" (the fact that it's a symmetric property: if A is the spouse of B, then B is the spouse of A) let me look up someone's home phone number in my address book even if my entry for him there lacks his home phone number. I like this story because unlike biotech and some of the other popular domains for Semantic Web technology, everyone has an address book and understands the basic properties of an entry: first name, last name, email address, and so forth. (Because so many people have lived through the annoyances of moving their contact information from one email client or phone to another, address books also provide nice use cases for data integration issues.)

Back then, I wrote:

With software that understands an OWL expression stating that spouse is a symmetric property and a rule I define to say that spouses have the same home phone number, I can retrieve Leroy's home phone number...

OWL is great for defining the symmetry, but I glossed over the part about defining the fact that spouses have the same phone number. How do you define such a rule? N3 has a rules language, but I haven't seen it used much as Turtle, the N3 subset that leaves out such things, becomes more popular. Instead of defining a Semantic Web rules language, the W3C has decided to have the Rule Interchange Format Working Group standardize an interchange format between the many rules languages out there. (The W3C Rules Interchange Format Basic Logic Dialect PowerPoint presentation by WG co-chair Chris Welty provides good historical background.)


I've used a proprietary RDF rules language before, and was wondering if a standard one would come along. Some colleagues at TopQuadrant have shown me that we all have a straightforward, standardized RDF rules language right under our noses: SPARQL. I've been appreciating SPARQL's CONSTRUCT form more lately, and CONSTRUCT is the key here: like a SELECT statement, a CONSTRUCT statement defines conditions about which pieces of which triples to retrieve, but unlike SELECT, a CONSTRUCT statement assembles these into new triples. If we view a CONSTRUCT statement as the definition of a rule and the resulting new triples as the result of the execution of the rule, then we have a rules language and plenty of implementations of it available.

For example, the following SPARQL "rule" says that if ?person1 has the spouse ?person2 and the home telephone number ?phoneNum, then ?person2 also has the home telephone number ?phoneNum:

PREFIX  : <http://www.snee.com/ns/demo#>
PREFIX v: <http://www.w3.org/2006/vcard/ns#>

CONSTRUCT { ?person2 v:homeTel ?phoneNum . }
WHERE {
  ?person1 :spouse   ?person2 ;
           v:homeTel ?phoneNum .
}

When run with the following data (for the purposes of this demo, assume that the {:leroy :spouse :loretta} triple was generated by an OWL reasoner that saw {:loretta :spouse :leroy} and knew that :spouse was symmetrical),

@prefix  : <http://www.snee.com/ns/demo#> .
@prefix v: <http://www.w3.org/2006/vcard/ns#> .
:loretta :spouse   :leroy ;
         v:homeTel "434-923-9321" .
:leroy   v:workTel "434-932-5329" ;
         :spouse   :loretta .

it generates the triple {:leroy v:homeTel "434-923-9321"}.

OK, so I can write a query that generates the triples I want to infer and call this query a "rule", but what do I do with it? What makes it a rule about a particular set of data?

Holger Knublauch, a co-worker of mine who designed and developed the OWL plugin for Protégé before coming to TopQuadrant, recently wrote an RDF vocabulary called SPIN ("SPARQL Inferencing Notation"), which, among other things, can express associations between these rules and classes. So, for example, if the blank node _:b1 pointed to the query above, the following triple would associate this query rule with the v:Address class:

  v:Address spin:rule _:b1 .
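
Written out in Turtle with an inline blank node, that association might look roughly like the following. This is only a sketch: it assumes the SPIN namespaces shown, and it assumes that the query is kept as plain text using the vocabulary's sp:text property rather than being broken down into triples. A rule meant to run on each instance of a class would normally also refer to that instance through SPIN's ?this variable, which the query above does not do.

@prefix spin: <http://spinrdf.org/spin#> .
@prefix sp:   <http://spinrdf.org/sp#> .
@prefix v:    <http://www.w3.org/2006/vcard/ns#> .

# Assumption: the rule is an sp:Construct node holding the query text.
v:Address spin:rule [
    a sp:Construct ;
    sp:text """
        PREFIX  : <http://www.snee.com/ns/demo#>
        PREFIX v: <http://www.w3.org/2006/vcard/ns#>
        CONSTRUCT { ?person2 v:homeTel ?phoneNum . }
        WHERE {
          ?person1 :spouse   ?person2 ;
                   v:homeTel ?phoneNum .
        }
    """
] .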

To make the storage of the SPARQL rule in a triplestore even cleaner, Holger has implemented a way to store SPARQL queries as triples, and he's written the code to roundtrip between this and the standard text version. (See the SPARQL Text to SPIN RDF Syntax Converter for an online converter, and see spinrdf.org for more about what else the SPIN vocabulary can do, especially his blog entries as he developed it. I'm now finishing up a tutorial for the use of SPIN features in TopQuadrant products, and except for one optional step of the tutorial, it all works with the free version of TopBraid Composer.)

When you take it a little further, symmetrical properties and many other parts of OWL can also be implemented with SPARQL queries, and there's a lot going on among those who are doing this to find a sweet spot between RDFS and OWL Full that meets typical business needs without using a lot of processing power or dollars.
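
As a rough sketch of the first point, and assuming the same demo namespace used earlier, the symmetry of :spouse could be written as a CONSTRUCT rule instead of an OWL axiom. This is only an illustration, not a description of how any particular OWL reasoner or SPIN library does it:

PREFIX : <http://www.snee.com/ns/demo#>

# If ?a has the spouse ?b, then ?b has the spouse ?a.
CONSTRUCT { ?b :spouse ?a . }
WHERE     { ?a :spouse ?b . }

Run against data holding only the {:loretta :spouse :leroy} triple, this generates {:leroy :spouse :loretta}, which the phone number rule can then pick up.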

6 Comments

Bob,

Yes, SPARQL CONSTRUCT is a rule language in its own right for controlled "forward chaining". The optical illusion that many have missed is this: Other Rules Languages have Head and Body on a Horizontal Plane, while SPARQL CONSTRUCT's plane is vertical :-)

SPIN is a neat formalization of the basic concept via a controlled vocabulary; certainly something we will use, as yet another mechanism for showcasing this aspect of SPARQL, esp. in the Virtuoso Sponger Middleware, which is already a constrained forward-chaining mechanism within the general non-RDF-to-RDF processing pipeline.

Kingsley


For a coincidental post about this same concept, query language as rules language, in another context, see http://www.furia.com/page.cgi?type=log&id=330 .

Kingsley, what does your horizontal/vertical comment mean? What other Rules Languages, and what do "horizontal" and "vertical" mean here?


Oh, and what makes this idea inherently a forward-chaining solution? Seems to me that structurally you can evaluate your query for all data ahead of time, or for individual nodes when asked. There are various reasons to want one or the other in particular uses, but I don't immediately see how the expression of the rule in a query-language is better or worse suited for one direction.


Check:

Axel Polleres. From SPARQL to rules (and back). In Proceedings of the 16th World Wide Web Conference (WWW2007), pages 787-796, Banff, Canada, May 2007. ACM Press. Extended technical report version available at http://www.polleres.net/TRs/GIA-TR-2006-11-28.pdf, slides available at http://www.polleres.net/publications/poll-2007www-slides.pdf.


But isn't it a problem that SPARQL rules, once immortalized, define a reality that can easily become outdated? The example you used here is a case in point.

Several of my younger married friends in fact do not share the same telephone number because they don't have a land line at home and instead use their own mobile numbers. What mechanisms are there or could there be to make sure that the semantic web keeps up with the changing times?


If you're concerned about potential issues around the mapping of reality to a rule set--a perfectly reasonable concern--then you want to avoid rule set specifications that rely on binary formats (e.g. compiled code) or proprietary languages from vendors who offer rule set apps using rule languages that they made up themselves. SPARQL scores very well on both counts, being a W3C standard.

Because the SPIN approach treats the SPARQL queries as more data to manage, the addition, removal, and modification of the rules is as straightforward as doing so with the data it's querying. There's no need to immortalize anything.

In fact, the semantic web model is often more adaptable than others because of the greater ease of schema modification than you'll find with relational databases or XML.

So, a SPARQL-based system can do a fine job of keeping up with the changing times.