Semantic web technology and humanities research

A Canadian historian uses semantic web technology to do interesting research and to lay the groundwork for others to do so.

I've attended and given a few Scholars' Lab talks at the nearby University of Virginia, and I'm kicking myself for missing a recent talk by Bruce Robertson of Mount Allison University, whose field is ancient Greek and Roman history. (A podcast of his Scholars' Lab talk is available here.) He's the main guy behind the Historical Event Markup and Linking Project (HEML), and apparently even the people who brought him to UVa to give his recent talk were surprised at how far he'd shifted from his XML orientation toward semantic web technologies.

A few quotes from his presentation:

The semantic web stack... allows a schema to be always growable in a federated way. You can add to my schema and I can't do anything about it, and that's a wonderful, wonderful thing.

I agree. While extensibility of a given XML DTD or schema must be designed into it from the start, RDFS and OWL schemas allow a lot more flexibility and therefore more possibilities to build on the work of others. On a related note, here's my favorite quote, which was a bit of a lightbulb moment for me:

If in the XML world the schema next door is just a stylesheet away, in the RDF world, the schema next door can be reasoned into, so you can include reasoning rules so that the same server is providing data in very many different flavors. I think this is an underexplored and exciting aspect of RDF, that if we have multiple schemas, as we do in the humanities, and we're not going to agree on one, we can just do all of them.
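To make "the schema next door can be reasoned into" a little more concrete, here's a minimal sketch in plain Python of one standard RDFS inference rule: if property P is declared an rdfs:subPropertyOf property Q, then every triple that uses P also implies a triple that uses Q. The prefixes schemaA: and schemaB:, the property names, and the sample event are all hypothetical, made up for illustration; they are not taken from HEML or from Bruce's talk, and a real application would use an RDF library and reasoner rather than tuples in a set.

```python
# A toy RDFS-style reasoner over (subject, predicate, object) tuples.
# All identifiers below (schemaA:, schemaB:, ex:) are hypothetical.

SUBPROPERTY_OF = "rdfs:subPropertyOf"


def infer_subproperties(triples):
    """Return the input triples plus everything implied by rdfs:subPropertyOf.

    Rule applied: (P subPropertyOf Q) and (s P o) together imply (s Q o).
    The loop repeats until no new triples appear, so chains of
    subproperties are followed as well.
    """
    inferred = set(triples)
    while True:
        # Collect every (sub, super) property pair currently known.
        pairs = {(s, o) for (s, p, o) in inferred if p == SUBPROPERTY_OF}
        new = {
            (s, sup, o)
            for (s, p, o) in inferred
            for (sub, sup) in pairs
            if p == sub
        }
        new -= inferred
        if not new:
            return inferred
        inferred |= new


# Data published using one (hypothetical) schema.
data = {
    ("ex:BattleOfActium", "schemaA:occurredIn", "31 BCE"),
}

# A mapping that anyone, not just the schema's author, could publish.
mapping = {
    ("schemaA:occurredIn", SUBPROPERTY_OF, "schemaB:date"),
}

result = infer_subproperties(data | mapping)

# The same fact is now available in both "flavors".
for triple in sorted(result):
    print(triple)
```

The point of the sketch is that the original data is never rewritten: the mapping is just more triples, so one store can answer queries phrased in either vocabulary, and a third schema can be added the same way.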

When I give an XSLT class I like to provide some introductory historical background before I show the first stylesheet. I always say that the main driver of XSLT's popularity was that people got tired of waiting for the shareable DTDs that they heard about when XML was first released. They just decided to send and accept whatever XML had the information they needed and then write stylesheets to rename and rearrange that XML to fit into their systems. I never thought of RDF-oriented schemas the same way, but now I realize that they're all that and more, because it's much easier to combine multiple RDFS/OWL schemas for a single application than it is to combine multiple XML schemas/DTDs. (As a side note, I'm currently reading Dean Allemang and Jim Hendler's book Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL and I'm learning a great deal. I'm familiar with most of the components of RDFS and OWL that they explain, but their advice on how to put those pieces together has taught me a lot and given me many ideas.)

The Semantic Web community is sometimes accused, even from within, of being an echo chamber of tool vendors and open source developers telling each other about their latest features. A corollary is that these people need to hear more from users about their needs, and Bruce's talk is just the kind of thing they need to hear. The talk that I link to above covers issues such as what went well for him as he built his application, what didn't, the mining of Wikipedia/DBpedia for historical research, issues he found with the representation of time and languages of content... it's great stuff. Too bad it's too late for him to get on the bill of the Semantic Technology conference; in a recent Semantic Web Gang discussion, Reuters ClearForest's Tom Tague discussed his hopes that more non-industry people would help make this conference less echoey than it had been in the past. To be honest, he actually said he was hoping to see more "business users"; perhaps, to get more perspectives from outside the circle of semweb geeks, we should think about how much academics outside of computer science can contribute to the discussion as well. Bruce Robertson is a great example.