Using the ontology editing tool SWOOP to edit taxonomies and thesauri

Hopefully, as a more powerful open source alternative to existing taxonomy packages.

In the online course in taxonomy development that I took recently, we reviewed several popular taxonomy development tools. I found them to be expensive or to have clunky, dated interfaces, and was disappointed that the formats most of these programs supported for storing saved work were either a proprietary binary format or what they just called "XML". (I'm open to correction on any of these points.) "OK," I wondered, "What XML?" Reviewing some samples of their exported XML, I found it pretty easy to understand the structure by looking at the element names and container patterns, but I never saw any mention of a DTD, and I thought it would be ideal if there were a standard format that they could share.

There is a standard format that they can share: SKOS, which provides an ontology (available as an OWL file here) that defines the kinds of relationships that taxonomists want to see in taxonomy or thesaurus development. This includes basic ones such as "narrower" and "broader" and more sophisticated variations on these such as "broaderPartitive" and "narrowerInstantive". (A little background on these variations, featuring examples from the ANSI Z39 standard for controlled vocabularies that I wrote about recently: in a hierarchy of terms, we can qualify the relationship between a term in a tree and its parent by saying that the child node is narrowerInstantive, as the Louvre is an instance of a museum, or narrowerPartitive, as a brain stem is a part of a brain, or narrowerGeneric, as the class of parrots is a subclass of the class of birds. In addition to defining the taxonomy term relationship properties "broader" and "narrower", SKOS defines instantive, partitive, and generic subproperties of "broader" and "narrower".)

If the SKOS standard lays out the potential relationships and provides a definition of these relationships in a standard syntax (OWL), and an open source GUI tool like SWOOP can read that and let you define the terms and relationships in a new thesaurus by pointing and clicking, then the most difficult part of providing a new alternative to the well-known taxonomy tools is already done, right? Well, not quite. There are two key things missing, but we'll see them both available for SWOOP use in time:

  • Thesaurus editors usually offer a series of canned reports about the terms and relationships within a given thesaurus—the kinds of reports that taxonomists want to see as they perform their work. A little Python code to read a SKOS-based thesaurus and then sort and summarize its contents would be simple to write.

  • As far as I could tell, intelligence about inverse relationships is not built into SWOOP. For example, let's say I read the SKOS ontology into SWOOP and create Individuals (or, in object-oriented terms, instances) of the terms "The Louvre" and "museum". Then, I use the appropriate SWOOP features to indicate that The Louvre has the relationship broaderInstantive to museum, because The Louvre is an instance of the class museum. I'd like to then go to SWOOP's panel for museum and see "The Louvre" listed as having a narrowerInstantive relationship to this term I'm reading about, but I won't.
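To give a sense of how small the report script from the first bullet could be, here is a minimal Python sketch. It assumes the thesaurus has already been reduced to (term, property, term) tuples. The sample data and the function names `term_report` and `property_counts` are made up for illustration, and a real script would load the triples from the saved RDF/OWL file with an RDF library instead of hard-coding them.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, property, object) tuples standing in
# for the statements stored in a SKOS-based thesaurus file.
TRIPLES = [
    ("The Louvre", "broaderInstantive", "museum"),
    ("brain stem", "broaderPartitive", "brain"),
    ("parrots", "broaderGeneric", "birds"),
    ("museum", "broader", "building"),
]


def term_report(triples):
    """Return report lines: each term (sorted case-insensitively),
    followed by its relationships, indented and sorted."""
    by_term = defaultdict(list)
    for subject, prop, obj in triples:
        by_term[subject].append((prop, obj))

    lines = []
    for term in sorted(by_term, key=str.lower):
        lines.append(term)
        for prop, obj in sorted(by_term[term]):
            lines.append(f"  {prop}: {obj}")
    return lines


def property_counts(triples):
    """Return how many times each relationship property is used."""
    counts = defaultdict(int)
    for _, prop, _ in triples:
        counts[prop] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print("\n".join(term_report(TRIPLES)))
    print(property_counts(TRIPLES))
```

The commercial tools' canned reports are certainly richer than this, but alphabetical term listings and usage counts are mostly a matter of grouping and sorting once the triples are in memory.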

My first idea was a three-step round trip: tell SWOOP to save the ontology file with these relationships and instances, use Pellet to turn the implicit relationships in the file (for example, that museum has a narrowerInstantive relationship to The Louvre) into explicit ones written right out in the same file, and then read that file with all the spelled-out relationships back into SWOOP. Apparently Pellet isn't quite there yet. A SPARQL query delivered via Pellet can pull out explicit and implicit triples, but not in a syntax that can be used for an RDF/OWL file. I saw on the Pellet mailing list that the next version would support SPARQL CONSTRUCT queries, which let you create a new set of RDF around the returned triples, so that will help.
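The idea of turning implicit inverse relationships into explicit ones can be illustrated without a reasoner. The sketch below is not what Pellet does. It is a hand-written lookup table of inverse pairs, assumed from the property names mentioned above, applied to plain Python tuples. The names `INVERSES`, `add_inverses`, and `relations_for` are hypothetical, and a real reasoner would work from the inverse declarations in the ontology itself and handle far more than inverses.

```python
# Assumed inverse pairs, based on the broader/narrower property names
# discussed in this post; a reasoner would read these from the ontology.
INVERSES = {
    "broader": "narrower",
    "broaderInstantive": "narrowerInstantive",
    "broaderPartitive": "narrowerPartitive",
    "broaderGeneric": "narrowerGeneric",
}
# Make the table symmetric so narrower* maps back to broader*.
INVERSES.update({v: k for k, v in list(INVERSES.items())})


def add_inverses(triples):
    """Return a set holding the stated triples plus the inverse
    of every triple whose property has a known inverse."""
    explicit = set(triples)
    for subject, prop, obj in triples:
        inverse = INVERSES.get(prop)
        if inverse is not None:
            explicit.add((obj, inverse, subject))
    return explicit


def relations_for(term, triples):
    """List the (property, object) pairs stated about one term."""
    return sorted((p, o) for s, p, o in triples if s == term)


if __name__ == "__main__":
    stated = [("The Louvre", "broaderInstantive", "museum")]
    full = add_inverses(stated)
    # The "museum" entry now carries the narrowerInstantive relationship.
    print(relations_for("museum", full))
```

With only the stated triple, `relations_for("museum", stated)` comes back empty, which mirrors what the SWOOP panel for museum shows; after `add_inverses`, the narrowerInstantive relationship to The Louvre is spelled out.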

Describing all this here, I can casually refer to the use of SWOOP to read an ontology file and then define individuals and their taxonomic relationships, but I'd like to spell out in more detail how I used SWOOP to do this. My family is about to head out for a summer beach vacation, so instead of postponing the completion of a great big posting on all this, I'm making this overview part 1, and I will describe the hands-on part in part 2 sometime next week.

Comments

Bob, Pellet will do this, but not with the default SWOOP plugin - the online version of Pellet (I think in the code fork at Google) will export an ontology with the additions - problem is SWOOP is no longer under real development (but please feel free to contribute open source and keep it running - there's a large user community who would love to see this happen) - the Pellet at owldl.org has a lot of stuff in it the original one didn't (and the incremental Pellet developed by Chris Halaschek-Weiner is an incredible improvement) so there's been a lot of code splitting and such since it left Maryland -- sorry about that...
-Jim H.

Actually, Bob, Pellet can perfectly well do what you want, only not in yr preferred way. As Evren said on the pellet-users list in response to yr question, you can get what you want by writing some Java, but the command-line interface doesn't support SPARQL CONSTRUCT queries presently. It will in the next release, due soon now.

Thanks Jim--I was running the most recent version of Pellet (downloaded from the website I linked to) from the command line. I will play some more.

Kendall--I'm too lazy to write the Java code. I'll just wait.