
RDF and social networks

Better than XML!

Looking at Michael Pick's video "DataPortability - Connect, Control, Share, Remix" and at the dataportability.com home page, I saw that RDF was included in a brief list of standards involved, and something occurred to me about the value of RDF in attempts to share data across applications such as social networking sites: in particular, why it's better than XML for this.

[Image: data portability standards logos]

XML was invented for online publishing, but its popularity grew so quickly because of its value for sharing data between organizations that have different information infrastructures. When I give a class in XSLT, I begin by describing the visions people originally had of DTDs describing common information structures (I almost wrote "common vocabularies" there, but the difference is precisely the point of this posting) that would let different business partners in the same field share data more easily.

Many of these visions worked out very well, but many people are still waiting for the DTD or schema of their field's information. (I bring this up in XSLT classes because many people got tired of waiting for relevant DTD standards and decided to just accept others' XML in whatever format and convert it as needed upon arrival. This practice has been a big driver of XSLT's success.) DTD development is complex because in addition to identifying common vocabularies, you must spell out the relationships between these pieces of information. In other words, you must do some data modeling. There is a lot more heuristic gut reaction involved in this than in designing relational databases, where the straightforward procedure of normalization can guide many of your decisions about data relationships. DTDs also require you to decide which information should be wrapped in containers, which should be stored in attributes, which need unique IDs... there's a great payoff if you do this work, but it's a lot of work, and it's especially difficult for committees of people from different organizations to work together on this.

Defining a vocabulary instead of a DTD is the low-hanging fruit. (I'm deliberately using the term "vocabulary" instead of taxonomy or ontology to keep things simple, but the tools and techniques of those fields have much to contribute.) It reduces the work not by simplifying it but by narrowing the scope: you forget about the data structures. If you want to just define a list of terms and exchange collections of field name/value pairs whose names correspond to that list of terms (or even to several lists, as long as each is clearly identified), RDF makes it very simple, and it's even more portable than XML. With the right vocabularies, I can deliver my contact information, my Facebook and LinkedIn IDs, and any other data that deserves to be portable without worrying about hierarchies in this set of information, or which order the pieces should be in, or which should be in attributes and which should be in elements.
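To make the name/value idea concrete, here is a minimal Python sketch that uses no RDF library at all. It treats each statement as a (subject, property, value) triple and shows that two descriptions listing the same facts in a different order are the same data. The FOAF namespace and its name and mbox properties are real; the EX namespace, its linkedInId property, the subject URI, and the helper functions describe and values_of are all made up for illustration.

```python
# Property names come from vocabularies identified by a namespace URI.
FOAF = "http://xmlns.com/foaf/0.1/"   # real vocabulary
EX = "http://example.org/social#"     # hypothetical vocabulary, for illustration only

ME = "http://example.org/people/me"   # hypothetical identifier for the person described


def describe(subject, pairs):
    """Turn (property URI, value) pairs into an unordered set of triples."""
    return frozenset((subject, prop, value) for prop, value in pairs)


def values_of(triples, subject, prop):
    """Return every value stated for one property of one subject, sorted."""
    return sorted(v for s, p, v in triples if s == subject and p == prop)


# Terms from two different vocabularies mixed freely in one description.
profile_a = describe(ME, [
    (FOAF + "name", "Pat Example"),
    (FOAF + "mbox", "mailto:pat@example.org"),
    (EX + "linkedInId", "pat-example"),
])

# The same statements in a different order: there is no hierarchy or
# sequence to agree on, so the two descriptions compare as equal.
profile_b = describe(ME, [
    (EX + "linkedInId", "pat-example"),
    (FOAF + "mbox", "mailto:pat@example.org"),
    (FOAF + "name", "Pat Example"),
])

print(profile_a == profile_b)
print(values_of(profile_a, ME, FOAF + "name"))
```

A receiving application only needs to know the property URIs it cares about. It can pick those out of the set and ignore the rest, which is the part that takes so much committee work to get right with a DTD.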

There is a bit of irony here, in that what turned many people off from RDF was the ugliness of the XML used to represent data structures such as containers. RDF can help people get a lot of useful work done if they ignore these data structures (and striping). This can leave them with some simple, intuitive RDF/XML, and they also have the option of ignoring RDF/XML and using something like N3 instead.
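As a rough sketch of how plain a non-XML serialization can be, the following Python writes a couple of flat statements as N-Triples, a line-per-statement syntax that is a subset of N3. The to_ntriples function is a hypothetical helper, not part of any library, and its rule for telling URIs from literal strings (checking for a few URI prefixes) is a simplifying assumption; a real RDF toolkit tracks that distinction explicitly.

```python
def to_ntriples(triples):
    """Serialize (subject, property, value) triples, one statement per line."""
    lines = []
    for s, p, o in sorted(triples):
        # Assumption: values that look like URIs are resources,
        # and everything else is a plain string literal.
        if o.startswith(("http://", "https://", "mailto:")):
            obj = f"<{o}>"
        else:
            escaped = (o.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n"))
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Hypothetical sample data; the FOAF property URIs are real.
statements = {
    ("http://example.org/people/me",
     "http://xmlns.com/foaf/0.1/name", "Pat Example"),
    ("http://example.org/people/me",
     "http://xmlns.com/foaf/0.1/mbox", "mailto:pat@example.org"),
}

print(to_ntriples(statements))
```

Each output line stands on its own, with no containers, nesting, or attribute-versus-element decisions involved.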

It's nice to see that the data portability folks didn't get scared off.



Yep, all the way back to 2003, I've been pitching RDF's two strengths as aggregation and inference.

For data portability, there's a huge win to cheap aggregation. XML is my hammer of choice, but even I can't claim that mixing random XML vocabularies is cheap.