Adding metadata value with Pellet

A nice new feature of Pellet 2.0.

The open-source program Pellet is described as an OWL reasoner, but I've used it mostly as a SPARQL engine that happens to understand OWL. So, for example, if I have RDF that says "Loretta's spouse is Leroy and spouse is a symmetric property," but the data makes no mention of Leroy's spouse, and I ask Pellet "who is Leroy's spouse," it can give me the answer.

Most SPARQL engines can't do this kind of OWL inferencing, and I thought it would be cool if Pellet could read a batch of RDF with some facts and some OWL properties, infer what it can, and then write out a copy of the RDF with all the implicit facts made explicit. This way, the less intelligent SPARQL engines could take advantage of the inferred data. It's one of those holy grails in publishing technology: a process that reads data and adds value to it (in this case, by adding new facts that weren't there before) and then writes out the data in a standard format so that other programs can use it. Pellet 2.0's new extract subcommand now makes this possible.

First, let's review how Pellet would run a SPARQL query against some sample data and infer a new fact to answer a question that a non-reasoning SPARQL engine could not answer. The following RDF/XML sample has a few facts about Leroy and Loretta and specifies that the spouse property is symmetric (that is, that if X is the spouse of Y, then Y is the spouse of X):

<!-- spousedemo.rdf -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns="http://www.snee.com/ns/abook#">

  <rdf:Description rdf:about="L1">
    <first>Leroy</first>
    <last>Lockhorn</last>
  </rdf:Description>

  <rdf:Description rdf:about="L2">
    <first>Loretta</first>
    <last>Lockhorn</last>
    <spouse rdf:resource="L1"/>
  </rdf:Description>

  <owl:ObjectProperty rdf:about="http://www.snee.com/ns/abook#spouse">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#SymmetricProperty"/>
  </owl:ObjectProperty>

</rdf:RDF>

Leroy has no spouse property, so if I tell a SPARQL engine such as ARQ to run the following query against the RDF above to ask who Leroy's spouse is, it won't have anything to tell me. Old or new versions of Pellet, though, will read this query and tell me that Leroy's spouse is Loretta Lockhorn, because that information becomes available once Pellet uses the extra OWL metadata to infer what it can.

PREFIX a: <http://www.snee.com/ns/abook#>
SELECT ?spouseFirst ?spouseLast WHERE {

       ?s a:first  "Leroy";
          a:last   "Lockhorn";
          a:spouse ?spouse.

       ?spouse a:first ?spouseFirst;
               a:last  ?spouseLast.
}
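To make the difference concrete, here is a rough Python sketch of what a non-reasoning engine effectively does with this query: it matches only triples that are literally present. This is not ARQ's or Pellet's actual code; the shortened identifiers, the TRIPLES set, and the objects and spouse_names helpers are all hypothetical names made up for illustration.

```python
# Hypothetical in-memory copy of the explicit facts in spousedemo.rdf,
# using shortened names instead of full URIs.
TRIPLES = {
    ("L1", "first", "Leroy"),
    ("L1", "last", "Lockhorn"),
    ("L2", "first", "Loretta"),
    ("L2", "last", "Lockhorn"),
    ("L2", "spouse", "L1"),
}


def objects(triples, subject, predicate):
    """Return every object of (subject, predicate, ?o) that is stated explicitly."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def spouse_names(triples, first, last):
    """Mimic the SELECT query by matching only triples that are literally present."""
    results = []
    people = {s for s, p, o in triples if p == "first" and o == first}
    for person in sorted(people):
        if last not in objects(triples, person, "last"):
            continue
        for spouse in sorted(objects(triples, person, "spouse")):
            for spouse_first in sorted(objects(triples, spouse, "first")):
                for spouse_last in sorted(objects(triples, spouse, "last")):
                    results.append((spouse_first, spouse_last))
    return results


leroy_answer = spouse_names(TRIPLES, "Leroy", "Lockhorn")
loretta_answer = spouse_names(TRIPLES, "Loretta", "Lockhorn")
print(leroy_answer)    # Leroy has no explicit spouse triple to match
print(loretta_answer)  # Loretta's spouse triple is stated directly
```

Asking about Loretta works because her spouse triple is in the data; asking about Leroy finds nothing to match, which is the gap that OWL inferencing fills.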

Pellet 2.0's extract subcommand reads RDF, does any inferencing it can from included OWL metadata, and then writes out RDF that includes the inferred data. The following command line shows how I used it. (Additional command line parameters let you control just how much inferred data Pellet adds when doing this.)

pellet extract --input-format RDF/XML spousedemo.rdf  > temp.rdf

This copies all the triples from spousedemo.rdf to temp.rdf and includes new data such as the j.0:spouse element in the following (the "j.0" prefix is assigned to the URL that was the default namespace in spousedemo.rdf):

  <rdf:Description rdf:about="http://www.snee.com/ns/ID#L1">
    <j.0:last>Lockhorn</j.0:last>
    <j.0:first>Leroy</j.0:first>
    <j.0:spouse rdf:resource="http://www.snee.com/ns/ID#L2"/>
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
  </rdf:Description>

If I ask ARQ to run the query shown earlier on temp.rdf, it can tell me the name of Leroy's spouse, because Pellet's extract subcommand has made temp.rdf a richer data file than spousedemo.rdf.
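The idea behind this kind of materialization can be sketched in a few lines of Python. This is only an illustration of the symmetric-property rule, not how Pellet is implemented, and a real reasoner handles far more of OWL than this one rule. The materialize_symmetric function and the variable names are hypothetical; the URIs are the ones that appear in the files above.

```python
SPOUSE = "http://www.snee.com/ns/abook#spouse"
L1 = "http://www.snee.com/ns/ID#L1"
L2 = "http://www.snee.com/ns/ID#L2"

# The one explicit spouse triple in the input, plus the set of properties
# declared to be owl:SymmetricProperty.
explicit = {(L2, SPOUSE, L1)}
symmetric_properties = {SPOUSE}


def materialize_symmetric(triples, symmetric_props):
    """Return the input triples plus the mirror image of every symmetric one."""
    inferred = {(o, p, s) for s, p, o in triples if p in symmetric_props}
    return set(triples) | inferred


enriched = materialize_symmetric(explicit, symmetric_properties)
added = enriched - explicit  # what the inference step contributed
for triple in sorted(added):
    print(" ".join(triple))
```

Because the output keeps every original triple and only adds new ones, any engine that could read the input can read the enriched copy as well.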

Declaring the spouse property to be symmetric is just a small bit of metadata added to the data shown in the file. OWL can add all kinds of metadata, and Pellet now makes it even easier to take advantage of that metadata.

For me, this small bit of metadata also proves something important about the value of semantic technology: while it would be silly to try to encode all the semantics of the word "spouse" in a machine-readable form, encoding just this small bit of the word's semantics—that it's a symmetric property—can add value to data and let you answer questions that you couldn't answer before.

[Image: Lockhorns' semantic cartoon]

1 Comment

Nice post, Bob! This sort of use of OWL and RDF is just the kind of insanely boring but incredibly useful thing that too often gets overlooked. ;>