Big legal publishers and semantic web technology

Which one will see the good fit first?

A recent @TopQuadrant tweet about legal knowledge and RDF/XML led me to Dr. Adam Wyner's piece Legal Ontologies Spin a Semantic Web on law.com. After reading it, I wanted to leave a comment, but this required registering on law.com and telling them lots of details about the law firm I work for. I don't work for a law firm, so I'm just putting my comments here and expanding on them a bit.

Before discussing the value that ontologies can bring to the practice of law, Dr. Wyner writes:

Reading a case such as Manhattan Loft v. Mercury Liquors, there are elementary questions that can be answered by any legal professional, but not by a computer:

  • Where was the case decided?
  • Who were the participants and what roles did they play?
  • Was it a case of first instance or on appeal?
  • What was the basis of the appeal?
  • What were the legal issues at stake?
  • What were the facts?
  • What factors were relevant in making the decision?
  • What was the decision?
  • What legislation or case law was cited?

Legal information service providers such as LexisNexis index some of the information...

Actually, LexisNexis identifies and indexes most of the information in this list, as do Westlaw and the Wolters Kluwer legal publishers, because they store the majority of their content in XML. (As early adopters of structured markup, these companies sometimes still store content in XML's predecessor, SGML.) A case's venue, its participants and their roles, the facts of the case, and the judge's decision are typical pieces of information that a legal publisher identifies with XML markup and stores in a system that can use this information for specialized queries.
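To make that concrete, here's a minimal sketch in Python of how markup like this lets a program answer several of Dr. Wyner's questions. The publishers' real schemas are proprietary, so every element name, attribute name, and value below (including the case itself) is invented for illustration and shouldn't be read as anyone's actual markup.

```python
import xml.etree.ElementTree as ET

# Invented sample document. The element and attribute names are
# assumptions for this sketch, not any publisher's real schema.
CASE_XML = """
<case id="example-001">
  <title>Example Tenant v. Example Landlord</title>
  <court level="appellate">Example Court of Appeal</court>
  <party role="appellant">Example Tenant</party>
  <party role="respondent">Example Landlord</party>
  <disposition>reversed</disposition>
  <cites target="example-000" treatment="followed"/>
</case>
"""


def summarize_case(xml_text):
    """Pull out venue, participants, decision, and citations from one case."""
    root = ET.fromstring(xml_text)
    court = root.find("court")
    return {
        "venue": court.text,
        "on_appeal": court.get("level") == "appellate",
        # Map each party's role to the party's name.
        "participants": {p.get("role"): p.text for p in root.findall("party")},
        "decision": root.findtext("disposition"),
        "cited": [c.get("target") for c in root.findall("cites")],
    }


summary = summarize_case(CASE_XML)
print(summary)
```

Once the venue, roles, and disposition are marked up, the "elementary questions" become simple lookups. The harder ones, like which factors were relevant to the decision, are where richer modeling would come in.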

Ontologies can add a lot to this, and the schemas for this XML will give any semantic web-oriented system a great head start in getting more out of this data. This won't happen outside of the publishers' firewalls soon, though. The schemas for their legal content play such an important role in the extra value that the publishers add and charge for that none of them would share those schemas. (They worry far less about open source efforts to reproduce their work than about competitive advantages over each other.)

Two other resources that these publishers can build on are their existing taxonomies and their databases of citation relationships. Taxonomies such as West's Key Number system are divided by practice area (for example, asbestos construction issues vs. child custody) rather than by document role or purpose, and therefore make a nice complement to the XML schemas. Legal publishers have sold citation relationship data (for example, which case overruled another one) since the nineteenth century, and today this data is all in clean, well-organized databases.

Kingsley Idehen likes to discuss how relational databases added a level of abstraction over previous models, XML provided an additional layer of flexibility by enabling people to store and use structured data whose structure wasn't necessarily tables, and the RDF data model and associated technology add another layer of abstraction and therefore more possibilities. Behind their firewalls, it's a logical next step for the big legal publishers to build ontologies that define new kinds of relationships among the XML content, the relational citation information, and the taxonomy data that they currently store so that they can get more value out of this data.
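Here's a rough sketch of what that extra layer buys you. It uses plain Python tuples as a stand-in for RDF triples (a real system would presumably use a triplestore and SPARQL), and all of the case IDs, predicate names, and topic labels are made up for the example. The point is only that once XML-derived facts, citation rows, and taxonomy links share one graph, a single question can span all three.

```python
# Invented data standing in for three separate sources.
xml_facts = [                      # facts extracted from XML case markup
    ("case:A", "hasTopic", "topic:ChildCustody"),
    ("case:C", "hasTopic", "topic:FamilyLaw"),
]
citation_rows = [                  # rows from a relational citation database
    ("case:B", "overrules", "case:A"),
]
taxonomy = [                       # links from a practice-area taxonomy
    ("topic:ChildCustody", "broader", "topic:FamilyLaw"),
]

# Merging the sources is just a union once everything is a triple.
graph = set(xml_facts) | set(citation_rows) | set(taxonomy)


def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (gs, gp, go)
        for (gs, gp, go) in graph
        if (s is None or gs == s)
        and (p is None or gp == p)
        and (o is None or go == o)
    ]


def topics_under(graph, topic):
    """Return the topic plus every narrower topic, following 'broader' links."""
    found = {topic}
    frontier = [topic]
    while frontier:
        current = frontier.pop()
        for narrower, _, _ in match(graph, p="broader", o=current):
            if narrower not in found:
                found.add(narrower)
                frontier.append(narrower)
    return found


def overruled_cases_in(graph, topic):
    """Cases filed under the topic (or a narrower one) that were overruled."""
    topics = topics_under(graph, topic)
    in_topic = {s for s, _, o in match(graph, p="hasTopic") if o in topics}
    overruled = {o for _, _, o in match(graph, p="overrules")}
    return sorted(in_topic & overruled)


print(overruled_cases_in(graph, "topic:FamilyLaw"))
```

The query touches the taxonomy (to expand "family law" into its narrower topics), the XML-derived topic assignments, and the citation data, without any of the three sources needing to know about the others' structure.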

While there are cool things to do with this technology using content such as ancient literature, it's much easier to see a business model in a domain such as legal publishing where customers have a bigger budget to spend on information that can help them do their jobs. Making a case for the return on semantic web technology investment for legal publishing will be an interesting challenge, but not too difficult, because these technologies can build incrementally on so many existing information resources such as relational databases and the XML content infrastructure that Dr. Wyner forgot to mention. It will be interesting to see which of the big legal publishers moves ahead with this first, although they may choose not to publicize it.

For work outside of the big legal publishers, in a 2006 posting titled Law metadata on the web I wrote about how legal-rdf.org looked like a good start, but apparently there's been so little activity there that they let their domain registration lapse, and now it's just parked by a speculator. (That posting also mentions the OASIS LegalXML work, which never got as far as defining schemas for court decisions and kind of petered out in defining schemas for legislation, the other main document type for legal publishing.)

Can anyone tell me of other public standards for legal metadata in development that could provide input to semantic web projects?

1 Comment

Bob,

I was smiling as I read your post. The future is actually even closer than you may think.

Can't name any names as it would not be appropriate, but a case study on this page http://www.topquadrant.com/solutions/ent_vocab_mgmt.html is based on our work with one of the large legal information publishers mentioned in your blog. Representatives from the other large publisher you name spent quite a bit of time last week at our booth at the Semantic Technologies conference.

Not rich ontologies so far, just taxonomies, but it is happening as we speak (or write, for that matter). Publishing is going RDF.