RDFS: The primary document

Shorter and more interesting than I remember.

About two years ago I wondered if RDF Schema had become merely a layer of OWL or if anyone used RDFS by itself without OWL. My theory was that because tools such as TopBraid Composer, Protégé, and SWOOP that let you design RDFS vocabularies also let you assign OWL properties to your classes, people used those because they were there, and we ended up with few pure RDFS vocabularies.


Lately, though, it seems that a lot of people who had been using the terms vocabulary/taxonomy/ontology interchangeably have started to understand better when OWL is too much. As they review the issues surrounding the choice between OWL 1 Lite, DL, and Full, around OWL 2 EL, QL, and RL, and the implications of open vs. closed world assumptions, more attitudes can be summarized as "sounds interesting, but pretty complicated; maybe later." This makes good sense for people whose main interest is defining a standardized vocabulary.

SKOS looks pretty good to more and more of them, but here I want to focus on RDFS. As I thought more about it recently, I realized that I had never read the RDF Schema Recommendation, so about five years late I sat down to do so. It's nice to remember, when you're wondering about the true meaning of some term or the relationship between some concepts, that a spec is available where you can just read the official explanation of what's what. (Of course, some specs are less enlightening than others when you're confused about what they describe.)

I found the RDFS Recommendation to be an interesting mix of simple things that are commonly used and complex things that are rarely used. When I printed it out, it was 27 pages, but the summaries and references start on page 18, and the appropriately titled Other Vocabulary section on pages 12 through 17 describes the rarely used features. Let's look at some interesting parts that lead up to that. From the Abstract:

This specification describes how to use RDF to describe RDF vocabularies.

Maybe that's obvious to some, but it's reassuring when confusion over vocabularies, taxonomies, and ontologies comes up. From the introduction:

The Resource Description Framework (RDF) is a general-purpose language for representing information in the Web.

As opposed to being a data model. (It's certainly not a syntax!)

Why do we need this schema language?

RDF properties may be thought of as attributes of resources and in this sense correspond to traditional attribute-value pairs. RDF properties also represent relationships between resources.

RDF however, provides no mechanisms for describing these properties, nor does it provide any mechanisms for describing the relationships between these properties and other resources. That is the role of the RDF vocabulary description language, RDF Schema. RDF Schema defines classes and properties that may be used to describe classes, properties and other resources.

The following is interesting for two reasons: first, because it describes a member of a class as an "instance," reminding me that "individual" is definitely an OWL term that has no particular role in RDFS. (A little later the document tells us that "the members of a class are known as instances [their emphasis] of the class".) It's also interesting as a nice summary of an issue that often confuses people with an object-oriented background.

The RDF vocabulary description language class and property system is similar to the type systems of object-oriented programming languages such as Java. RDF differs from many such systems in that instead of defining a class in terms of the properties its instances may have, the RDF vocabulary description language describes properties in terms of the classes of resource to which they apply. This is the role of the domain and range mechanisms described in this specification. For example, we could define the eg:author property to have a domain of eg:Document and a range of eg:Person, whereas a classical object oriented system might typically define a class eg:Book with an attribute called eg:author of type eg:Person. Using the RDF approach, it is easy for others to subsequently define additional properties with a domain of eg:Document or a range of eg:Person. This can be done without the need to re-define the original description of these classes. One benefit of the RDF property-centric approach is that it allows anyone to extend the description of existing resources, one of the architectural principles of the Web.
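To make the property-centric idea concrete, here is a minimal sketch in plain Python, with triples modeled as tuples of prefixed-name strings rather than with a real RDF library. The eg:author, eg:Document, and eg:Person names come from the spec's example; eg:editor and the helper properties_with_domain are hypothetical names added only for illustration.

```python
# Triples are (subject, predicate, object) tuples of prefixed-name strings.
# This is an illustrative model, not a real RDF library.

# The original vocabulary: the property is described in terms of classes.
original_vocabulary = {
    ("eg:Document", "rdf:type", "rdfs:Class"),
    ("eg:Person", "rdf:type", "rdfs:Class"),
    ("eg:author", "rdf:type", "rdf:Property"),
    ("eg:author", "rdfs:domain", "eg:Document"),
    ("eg:author", "rdfs:range", "eg:Person"),
}

# Someone else later describes another property (the hypothetical eg:editor)
# that applies to the same classes. Nothing in the original set is edited.
later_extension = {
    ("eg:editor", "rdf:type", "rdf:Property"),
    ("eg:editor", "rdfs:domain", "eg:Document"),
    ("eg:editor", "rdfs:range", "eg:Person"),
}


def properties_with_domain(triples, cls):
    """Return, in sorted order, the properties whose rdfs:domain is cls."""
    return sorted(s for (s, p, o) in triples if p == "rdfs:domain" and o == cls)


combined = original_vocabulary | later_extension

print(properties_with_domain(original_vocabulary, "eg:Document"))
# ['eg:author']
print(properties_with_domain(combined, "eg:Document"))
# ['eg:author', 'eg:editor']
```

In an object-oriented language, adding an editor attribute would typically mean reopening or subclassing the Document class. Here the extension is just more triples sitting next to the original ones.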

The role and relationship of the rdfs:domain and rdfs:range properties have confused me and many others. The spec's description of their use is rather technical (nothing wrong with that; it's a spec) but there's this nice passage after that:

...an RDF vocabulary might describe limitations on the types of values that are appropriate for some property, or on the classes to which it makes sense to ascribe such properties.

The RDF Vocabulary Description language provides a mechanism for describing this information, but does not say whether or how an application should use it...

For example, data checking tools might use this to help discover errors in some data set, an interactive editor might suggest appropriate values, and a reasoning application might use it to infer additional information from instance data.

RDF vocabularies can describe relationships between vocabulary items from multiple independently developed vocabularies. Since URI-References are used to identify classes and properties in the Web, it is possible to create new properties that have a domain or range whose value is a class defined in another namespace.

I think that makes some basic issues clearer.
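The difference between the reasoning use and the data-checking use of the same domain and range statements can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are plain tuples, infer_types applies the standard RDFS domain and range entailment rules once, and check_types is a hypothetical, deliberately strict checker of the kind the spec allows but does not define. The instance names eg:report42 and eg:alice are made up.

```python
# Schema statements reuse the spec's eg:author example.
schema = {
    ("eg:author", "rdfs:domain", "eg:Document"),
    ("eg:author", "rdfs:range", "eg:Person"),
}

# Instance data with no explicit rdf:type statements (hypothetical names).
data = {
    ("eg:report42", "eg:author", "eg:alice"),
}


def infer_types(schema, data):
    """Reasoner view: derive rdf:type triples from domain and range."""
    inferred = set()
    for (prop, kind, cls) in schema:
        for (s, p, o) in data:
            if p != prop:
                continue
            if kind == "rdfs:domain":
                inferred.add((s, "rdf:type", cls))  # the subject is a cls
            elif kind == "rdfs:range":
                inferred.add((o, "rdf:type", cls))  # the object is a cls
    return inferred


def check_types(schema, data):
    """Checker view: list (node, class) pairs that lack an explicit type."""
    problems = []
    for (s, p, o) in sorted(data):
        for (prop, kind, cls) in sorted(schema):
            if p != prop or kind not in ("rdfs:domain", "rdfs:range"):
                continue
            node = s if kind == "rdfs:domain" else o
            if (node, "rdf:type", cls) not in data:
                problems.append((node, cls))
    return problems


print(sorted(infer_types(schema, data)))
print(check_types(schema, data))
```

The reasoner concludes that eg:report42 is a Document and eg:alice is a Person; the strict checker treats the same missing type statements as things to flag. Both read the same two schema triples, which is the point of the spec's "does not say whether or how an application should use it."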

I have mixed feelings about the "Other vocabulary" section on features that, from what I've seen, never got much traction: container classes and properties, RDF collections, and reification. On the one hand, usage of these can appear so complex that I think it scared a lot of people away from RDF in the early days, obscuring the simplicity of the triple as the fundamental concept of RDF. On the other hand, as I read about these options now, they looked like they could be fun to play with, in a geeky sort of way. (I also realize that the whole concept of reification—the ability to refer to triples as resources themselves so that properties can be assigned to them—is an important bit of RDF foundational architecture for other good RDF-related ideas to build on.)
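For a feel of why these features look heavier than a plain triple, here is a rough Python sketch of two of them, again with triples as plain tuples. The rdf:Statement, rdf:subject, rdf:predicate, rdf:object, rdf:first, rdf:rest, and rdf:nil terms are the standard vocabulary; the functions reify and to_collection, the node names, and the eg:assertedBy property are hypothetical and exist only for this illustration.

```python
def reify(triple, statement_id):
    """Describe a triple as a resource using the reification vocabulary."""
    s, p, o = triple
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
    ]


def to_collection(items, prefix="_:list"):
    """Encode items as an RDF collection; return (head node, triples)."""
    if not items:
        return "rdf:nil", []  # the empty collection is rdf:nil itself
    triples = []
    for i, item in enumerate(items):
        node = f"{prefix}{i}"
        rest = f"{prefix}{i + 1}" if i + 1 < len(items) else "rdf:nil"
        triples.append((node, "rdf:first", item))
        triples.append((node, "rdf:rest", rest))
    return f"{prefix}0", triples


# One ordinary statement becomes four triples once it is reified...
statement = reify(("eg:report42", "eg:author", "eg:alice"), "eg:stmt1")
# ...after which properties can be attached to the statement itself.
statement.append(("eg:stmt1", "eg:assertedBy", "eg:bob"))

head, members = to_collection(["eg:alice", "eg:carol"])

for t in statement + members:
    print(t)
```

Note that the four reification triples describe the statement without asserting it: the original triple is not part of the output unless it is added separately.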

So, whether you're new to the whole idea of a standardized definition of a vocabulary or you've been using OWL and RDFS together for years, I heartily recommend that you read the first 11 or 18 pages of the RDFS spec and skim the rest, which includes some handy reference material.

1 Comment

Finally, someone read it! Thanks ;) I remember bits of those paragraphs coming together from contributors to the original RDFS WG, 98/9, and other things from the RDF Core makeover. There's lots I'd do differently now but that's life! There are some other bits that got mostly dropped from the doc at some point, eg. about the Warwick Framework,
"""RDF and the RDF Schema language were also based on metadata research in the Digital Library community. In particular, RDF adopts a modular approach to metadata that can be considered an implementation of the Warwick Framework [WF]. RDF represents an evolution of the Warwick Framework model in that the Warwick Framework allowed each metadata vocabulary to be represented in a different syntax. In RDF, all vocabularies are expressed within a single well defined model. This allows for a finer grained mixing of machine-processable vocabularies, and addresses the need [EXTWEB] to create metadata in which statements can draw upon multiple vocabularies that are managed in a decentralized fashion by independent communities of expertise. """

http://www.w3.org/TR/2000/CR-rdf-schema-20000327/