Since the beginning of RDFa's history, many of its advocates have stressed its value in adding machine-readable semantics to personal web pages. This example from the RDFa Primer is typical:
<p class="contactinfo" about="http://example.org/staff/jo">
  <span property="contact:fn">Jo Smith</span>.
  <span property="contact:title">Web hacker</span> at
  <a rel="contact:org" href="http://example.org">Example.org</a>.
  You can contact me
  <a rel="contact:email" href="mailto:firstname.lastname@example.org">via email</a>.
</p>
An important principle has been the ability to make a web page's data readable by both eyeballs and automated processes. This is great, but there are two related issues that I feel need a higher profile. First, RDFa has great potential for storing non-eyeball information in web pages. Second, examples like the one above go after microformats on their own turf, where they're dug in pretty well. Being a more generalized, scalable solution, RDFa can do a lot more than microformats, and with many of those other applications having more commercial potential, I see them as the best growth area for the format.
First, the non-eyeballs part. When I speak about RDFa to people with a publishing background, they like its ability to store metadata such as workflow information. Some had heard of RDF in its RDF/XML incarnation, and it was just too complex for them. RDFa isn't. I submitted an example of this kind of workflow metadata usage to the RDFa Use Cases document, where it can provide a placeholder for future work. People often say that it's difficult to measure RDF adoption rates because so much of it is behind firewalls. Electronic publishing workflow metadata is a pretty classic case of this: publishers want to track various bits of information about documents as they work on them but don't want to include that information in the publicly available versions. So again, I think it's a great potential growth area for RDFa.
I wrote recently about how microformats, the semantic web, and the linked data movement are making more data available as HTTP-accessible resources. The linked data strategy is often to build a front end to a data source that lets you issue SPARQL queries against it (a "SPARQL endpoint") and/or to maintain an updated copy of valuable information to query against, as with DBpedia. Microformats and the semantic web efforts (or at least the RDFa aspect of this) compete more directly with each other, each offering ways to embed semantics and machine-readable data into web pages, so it's worth examining what each does well and what clues this offers about their future.
The microformats effort has settled on formats to represent vCard contact information and outlines in HTML, and there are various efforts to re-use existing bits of HTML markup for other domains, but there's a much longer list of failed (or rather, "moribund") microformats efforts. Microformats' hCard conventions for contact information look like a success, and the XOXO outline effort addresses a problem that RDF was never very good at anyway: imposing structure on the relationships among collections of data.
The list of moribund microformats efforts shows that the approach is moving slowly, if at all, into many new domains. My theory is that it's so slow because for each new domain a new set of things needs to be worked out: how to identify each piece of information and where to put it in the available HTML slots. They have a few design patterns to guide this process, but I know of no generalized microformats way to say that a given resource has a given field name/value pairing in a way that would work for all resources and fields. RDFa's use of actual specifications (as opposed to warm and fuzzy exhortations like "pave the cow paths" and "a way of thinking about data") makes the RDFa representation of any straightforward facts pretty simple, as long as a vocabulary exists to describe the resources and attributes. If one doesn't, you can make one up, and it can build on existing naming schemes such as SKU or ISBN numbers.
These two naming schemes in particular can cover a vast amount of machine-readable data that's worth embedding into web pages. For example, if the book with ISBN 1930220111 is for sale for $19.77, then it's pretty clear what's going on here:
<span about="http://site:www.isbn.org/1930220111" property="cbc:PriceAmount">19.77</span>
(I'm assuming for now that an application reading such data would only be interested in its developer's local currency, which leaves plenty of useful applications to write.) If you and I each have a million triples of pricing information, but you used something other than the UBL urn:oasis:names:tc:ubl:CommonBasicComponents:1:0 namespace to indicate your PriceAmount predicates, a simple OWL rule can tell a program reading these prices that you and I meant the same thing by the two different predicates we used.
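To make the predicate-mapping idea a little more concrete, here is a minimal Python sketch. It is not an OWL reasoner; it only imitates the conclusion that a single equivalence statement between two properties would let a program draw. The shop:price predicate, the second price, and the normalize helper are all made up for illustration.

```python
# Illustrative only: "shop:price", the 18.50 figure, and normalize() are
# assumptions invented for this sketch, not part of UBL, OWL, or any toolkit.

BOOK = "http://site:www.isbn.org/1930220111"  # subject URI as written in the example above

# Triples as (subject, predicate, object) tuples.
my_triples = [(BOOK, "cbc:PriceAmount", "19.77")]
your_triples = [(BOOK, "shop:price", "18.50")]

# Stand-in for a declared equivalence between the two predicates:
# maps each alternative predicate to the one we treat as canonical.
EQUIVALENT = {"shop:price": "cbc:PriceAmount"}


def normalize(triples, equivalents):
    """Rewrite each predicate to its canonical equivalent, if one is declared."""
    return [(s, equivalents.get(p, p), o) for (s, p, o) in triples]


merged = normalize(my_triples, EQUIVALENT) + normalize(your_triples, EQUIVALENT)

# After normalization, one query finds prices from both sources.
prices = [o for (s, p, o) in merged if p == "cbc:PriceAmount"]
print(prices)
```

The point of the sketch is that the mapping lives in one small table rather than in the code that reads the prices, which is roughly the convenience a declared equivalence gives you at a much larger scale.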
Pricing is a good example. It's a huge area where people would be happy to give away data in the form of extra embedded metadata in their web pages, because it can drive new paying customers to the source of that data (for example, to sell more copies of the book with the ISBN 1930220111). Scheduling is another example of how giving away data such as flight times or movie times can drive paying customers to an organization with something for sale. Microformats have made some progress (the German Depeche Mode party list?), but I think that RDFa can make a lot more progress here.
Let microformats do what they do best: shoehorning bits of personal data into leftover HTML attributes that no one was using (such as the abbr attribute for dates) and adding <div class="foo"></div> and <span class="foo"></span> elements in places where they wish HTML offered a foo element. That's not going to scale to more enterprise-oriented data, because there are no clear answers to questions about the relationships between the various bits of markup. For example, what does <div class="title"></div> mean? The title of an audio track or a job title? I suppose it depends whether the div element in question has a <div class="haudio"></div> ancestor or a <div class="vcard"></div> ancestor. So what role does a div element play in setting the context of its descendants? Hell if I know; a search for "div" at microformats.org just brought up "No page title matches" and "No page text matches". The documentation for the class design pattern tells us that "if an appropriate semantic element is not available, use span or div", with no clue about what might be special about div. The documentation for the elemental and compound design patterns doesn't offer any more help.
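Here is a rough Python sketch of the kind of rule a consuming application would have to hard-code to settle the "title" question. The nearest-ancestor rule is my guess, not documented microformats parsing behavior, and the TITLE_MEANING table and TitleContext class are hypothetical names. It uses only the standard library's html.parser and assumes every start tag has a matching end tag.

```python
from html.parser import HTMLParser

# Assumed mapping from a root class name to what "title" means beneath it.
TITLE_MEANING = {"haudio": "audio track title", "vcard": "job title"}


class TitleContext(HTMLParser):
    """Guess what each class="title" element means by looking for the
    nearest open ancestor whose class names a known root."""

    def __init__(self):
        super().__init__()
        self.stack = []     # class lists of the currently open elements
        self.findings = []  # one guessed meaning per class="title" element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "title" in classes:
            meaning = "unknown"
            # Walk outward from the nearest ancestor.
            for ancestor in reversed(self.stack):
                known = [c for c in ancestor if c in TITLE_MEANING]
                if known:
                    meaning = TITLE_MEANING[known[0]]
                    break
            self.findings.append(meaning)
        self.stack.append(classes)

    def handle_endtag(self, tag):
        # Assumes well-nested markup with no unclosed void elements.
        if self.stack:
            self.stack.pop()


doc = """
<div class="haudio"><span class="title">Some Song</span></div>
<div class="vcard"><span class="title">Web hacker</span></div>
<div class="title">Orphan</div>
"""
parser = TitleContext()
parser.feed(doc)
findings = parser.findings
print(findings)
```

Every new format means another entry in that table and possibly another special case in the lookup, which is exactly the per-domain work that a subject/predicate/object model avoids.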
This is not a markup infrastructure that someone can take and run with to develop (or even augment) applications for arbitrary data domains. RDFa is way ahead of microformats in its ability to do this, so its best opportunities for traction are in domains with a lot of structured data that doesn't fit well into hCard format or the two or three other microformat success stories.
There are plenty of these. Those who would benefit most from giving away embedded machine-readable data are companies and other large organizations that now generate tables of HTML describing their products and services using PHP, Perl, Ruby on Rails, or other scripting languages. A few tweaks to those scripts can make a wide range of that data machine-readable RDFa in addition to being human-readable. Let's find the people who can get those tweaks made and convince them of the value of doing so.
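As a rough idea of how small such a tweak can be, here is a Python sketch of a row-generating function (Python standing in for the PHP, Perl, or Rails script you might actually have). The price_row function and the "Example Book" title are placeholders of my own, and the sketch assumes the cbc prefix is declared elsewhere in the page. The only change from a plain table generator is the pair of attributes on the price cell.

```python
from html import escape


def price_row(subject_uri, title, price):
    """Render one HTML table row whose price cell also carries RDFa attributes.

    Hypothetical helper: a plain generator would emit <td>19.77</td>;
    the tweak is adding about= and property= to that cell.
    """
    return (
        "<tr>"
        f"<td>{escape(title)}</td>"
        f'<td about="{escape(subject_uri)}" property="cbc:PriceAmount">'
        f"{price:.2f}</td>"
        "</tr>"
    )


# Subject URI as written in the pricing example above; the title is a placeholder.
row = price_row("http://site:www.isbn.org/1930220111", "Example Book", 19.77)
print(row)
```

A browser renders the row the same as before, while an RDFa-aware program can read the same cell as a statement about the book's price.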