
What is a taxonomy?

A standard definition.

There are many terms that people can't agree on. The great thing about standards is that even when not everyone agrees with the definitions they include, those definitions provide a common baseline for everyone to work from.

After hearing many definitions of the word "taxonomy", I was pleased to discover the ANSI/NISO Z39.19 standard, Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, a specification that among other things defines the terminology for several classes of controlled vocabularies, including "taxonomy". (It even defines the term "term"!) It does a great job of putting the term "taxonomy" in the right context of related terms such as "controlled vocabulary" and "thesaurus", but not, unfortunately, the term "ontology". More on this below; first, I'll paste a few handy quotations.

"A taxonomy is a controlled vocabulary consisting of preferred terms, all of which are connected in a hierarchy or polyhierarchy".

From section 2.5, "Maintenance":

A controlled vocabulary can be as simple as a short list of terms or as complex as a thesaurus containing tens of thousands of terms with a complex hierarchical structure and many different types of relationships among the terms.

From section 4.1, "Definitions":

controlled vocabulary: A list of terms that have been enumerated explicitly.
taxonomy: A collection of controlled vocabulary terms organized into a hierarchical structure. Each term in a taxonomy is in one or more parent/child (broader/narrower) relationships to other terms in the taxonomy.
thesaurus: A controlled vocabulary arranged in a known order and structured so that the various relationships among terms are displayed clearly and identified by standardized relationship indicators. Relationship indicators should be employed reciprocally.

From section 5.4, "Structure":

There are four different types of controlled vocabularies, determined by their increasingly complex structure. These are:

  • List
  • Synonym ring
  • Taxonomy
  • Thesaurus

From section 5.4.1 "List":

A list (also sometimes called a pick list) is a limited set of terms arranged as a simple alphabetical list or in some other logically evident way. Lists are used to describe aspects of content objects or entities that have a limited number of possibilities.

From section 5.4.2 "Synonym Ring":

While a synonym ring is considered to be a type of controlled vocabulary, it plays a somewhat different role than the other types covered by this Standard. Synonym rings cannot be used during the indexing process. Rather, they are used only during retrieval. Use of synonym rings ensures that a concept that can be described by multiple synonymous or equivalent terms will be retrieved if any one of the terms is used in a search.
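To make the retrieval-only role of a synonym ring concrete, here is a minimal sketch in Python. The rings, the documents, and the function names (`expand_query`, `search`) are all made up for illustration; none of them come from the standard.

```python
# Hypothetical synonym rings; the terms are illustrative, not from Z39.19.
SYNONYM_RINGS = [
    {"dogs", "canines", "hounds"},
    {"cars", "automobiles", "motorcars"},
]

# Hypothetical index: document id -> terms the document was indexed under.
DOCUMENTS = {
    "doc1": {"canines", "training"},
    "doc2": {"automobiles"},
    "doc3": {"hounds"},
}


def expand_query(term):
    """Return every term in the same ring as `term`, or just the term itself."""
    for ring in SYNONYM_RINGS:
        if term in ring:
            return set(ring)
    return {term}


def search(term, documents):
    """Return ids of documents indexed under any term equivalent to `term`."""
    wanted = expand_query(term)
    return sorted(doc_id for doc_id, terms in documents.items() if wanted & terms)


print(search("dogs", DOCUMENTS))
```

Note that the rings play no part in how the documents were indexed. They are consulted only when a search term is expanded, which is the distinction the standard draws.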

From section 5.4.3, "Taxonomy":

A taxonomy is a controlled vocabulary consisting of preferred terms, all of which are connected in a hierarchy or polyhierarchy.

From section 5.4.4, "Thesaurus":

A thesaurus is a controlled vocabulary arranged in a known order and structured so that the various relationships among terms are displayed clearly and identified by standardized relationship indicators.

From section 8.3, "Hierarchical Relationships":

The use of hierarchical relationships is the primary feature that distinguishes a taxonomy or thesaurus from other, simple forms of controlled vocabularies such as lists and synonym rings.

From section 2.1, "Applying the Standard":

This Standard does not cover numerical classification schemes (except as they correlate to topics such as Dewey, for example), ontologies or semantic networks.

(It actually does include a section on semantic networks.) The "standardized relationship indicators" mentioned in section 5.4.4 are typically things like "Broader Term" to show the relationship between, for example, the terms "collies" and "dogs". This broader/narrower relationship is about the only relationship that a taxonomy tree represents; a thesaurus can show other relationships that one term can have to another—for example, it can be a related term, a preferred term, or a non-preferred term.
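The difference between the two structures can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Thesaurus` class and its methods are hypothetical, the terms beyond "collies" and "dogs" are invented, and BT, NT, RT, USE, and UF are the conventional abbreviations for broader term, narrower term, related term, use, and used for. Restricting the structure to BT/NT links gives you a taxonomy; allowing the other indicators gives you a thesaurus.

```python
from collections import defaultdict

# Each indicator paired with its reciprocal, so that relationships
# are always recorded in both directions.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


class Thesaurus:
    """Hypothetical in-memory thesaurus; not an API from any real library."""

    def __init__(self):
        # term -> indicator -> set of related terms
        self.relations = defaultdict(lambda: defaultdict(set))

    def relate(self, term, indicator, other):
        """Record a relationship and its reciprocal in one step."""
        self.relations[term][indicator].add(other)
        self.relations[other][RECIPROCAL[indicator]].add(term)

    def related(self, term, indicator):
        """Return the terms linked to `term` by `indicator`, sorted."""
        return sorted(self.relations[term][indicator])

    def ancestors(self, term):
        """Follow BT links upward; a polyhierarchy can have several parents."""
        found = set()
        stack = [term]
        while stack:
            for parent in self.relations[stack.pop()]["BT"]:
                if parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return sorted(found)


thesaurus = Thesaurus()
thesaurus.relate("collies", "BT", "dogs")
thesaurus.relate("dogs", "BT", "mammals")
thesaurus.relate("dogs", "BT", "pets")      # second parent: a polyhierarchy
thesaurus.relate("dogs", "RT", "kennels")   # thesaurus-only relationship
thesaurus.relate("canines", "USE", "dogs")  # non-preferred -> preferred term

print(thesaurus.ancestors("collies"))
print(thesaurus.related("dogs", "UF"))
```

Recording the reciprocal inside `relate` is one way to honor the standard's requirement that relationship indicators be employed reciprocally.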

Some research I've been doing lately, including an online course in "Taxonomies and Controlled Vocabularies", gives me the impression that most of what taxonomists do is develop thesauri, not taxonomies. I guess calling themselves "thesaurists" would sound a bit odd, and the term "thesaurus" conjures up images of the Roget book that our teachers pointed us to when we were teenagers and overused words in the papers we handed in.

We saw above that a thesaurus uses "standardized relationship indicators". I've described ontologies to people as being like taxonomies, except that you (or more likely, people in your field) get to make up new, specialized relationships beyond those standardized for thesauri. For example, in legal publishing, a higher court ruling could have the relationship property "cite" to a lower court ruling, with potential values such as "overturns" or "affirms". According to the OWL Use Cases and Requirements, which I wrote about last August, "The word ontology has been used to describe artifacts with different degrees of structure. These range from simple taxonomies (such as the Yahoo hierarchy), to metadata schemes (such as the Dublin Core), to logical theories". This describes a taxonomy as a simpler version of an ontology, so it makes sense to me to add "ontology" as a fifth step after the four levels of controlled vocabulary listed above.
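The legal publishing example can be sketched as subject/predicate/object statements in plain Python. This is one possible modeling, not the only one: I've treated "overturns" and "affirms" as specializations of a general "cites" relationship, and the ruling identifiers, the `SUBPROPERTY_OF` table, and the `query` function are all hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# A thesaurus-style relationship alongside domain-specific ones.
# All identifiers here are invented for illustration.
TRIPLES = [
    Triple("appeals-ruling-2", "overturns", "trial-ruling-1"),
    Triple("appeals-ruling-3", "affirms", "trial-ruling-4"),
    Triple("collies", "broader", "dogs"),
]

# Assumption: the specialized relationships refine a general "cites".
SUBPROPERTY_OF = {"overturns": "cites", "affirms": "cites"}


def matches(predicate, wanted):
    """True if `predicate` is `wanted` or a specialization of it."""
    while predicate is not None:
        if predicate == wanted:
            return True
        predicate = SUBPROPERTY_OF.get(predicate)
    return False


def query(wanted):
    """Return (subject, object) pairs linked by `wanted` or a specialization."""
    return [(t.subject, t.obj) for t in TRIPLES if matches(t.predicate, wanted)]


print(query("cites"))
print(query("overturns"))
```

Asking for "cites" returns both rulings, while asking for "overturns" returns only the first. That extra layer of meaning is what the fixed set of thesaurus indicators cannot express.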

If ontologies are a potentially more complex class of taxonomy, then knowledge of taxonomy development can help ontology development, and vice versa. I've also got some ideas about the use of ontology development tools to develop taxonomies and thesauri that I'll be writing about here shortly.



Perhaps along the lines of the ontology spectrum from "Ontologies come of age"?

nice bit of synchronicity ... I was just yesterday looking for a reasonable definition of taxonomy and look what dropped into my reader ...

Nice description. I will go and read both of the standards documents. It would be nice to see a discussion of various classification approaches, including tagging (folksonomy) and faceted classification, along with some applications.

There are also bits of Semantic Web technologies (and standards efforts) and related areas that may incrementally improve how we gather, classify, view, and consume information.