Semantic Web project ideas number 1

Hello, lazy semweb world.

When I spoke at a conference recently, the speaker's gift was a copy of the book that keynote speaker Don Tapscott wrote with Anthony D. Williams: Wikinomics. The book is very biz-buzzwordy (from page 150: "consumer product companies can find ways to monetize customer-led ecosystems"; have these guys bookmarked the Web Economy Bullshit Generator?), and the authors feel compelled to coin their own buzzwords, from the book's title to terms like N-gen, B-web, ideagora, and prosumer. I'll admit that I'm a little jealous, though; I wish I could come up with some visionary lite tech book that people in suits would want to read on planes. As I write this, "Wikinomics" has an Amazon ranking of 164. I remember getting excited when XSLT Quickly broke the 7,000 mark.

[Wikinomics cover]

The book's many case studies about the New Collaboration (actually, I don't think the book used that phrase; maybe if I start using it a lot with a capital "N" and "C" I've got a title for my lite visionary book!) alerted me to some interesting projects such as Innocentive and NineSigma. Many of the book's topics sound like the kinds of things that people hope to see grow out of the semantic web, although the book doesn't mention the semantic web at all. I started thinking about what semantic web technologies could add to the projects that the book does mention, and I got some ideas.

I already have several ideas that I have no time to follow through on, so I thought I'd start throwing them out there for anyone who's interested. None are quite PhD thesis material, but some might be masters material. They could all be useful, and some could be popular if someone followed through on them. Here's another new coinage: "Lazy Semweb"! That is, a lazy web for semantic technologies. Danny Ayers recently threw out an offering, although he didn't use the term.

For many of the things I suggest, I imagine that some people (for several of my ideas, probably Kingsley Idehen) will point out existing work already addressing my idea. That's fine with me, because it provides further input for those seeking ideas. I'll tag all the entries with a metadata/semantic_web/project_ideas tag to make it easier to find the gathered collection of ideas in this weblog.

The most important thing for each project is that it should include a demonstration of how to get more value out of the data in question than would be possible without the semweb part. I'm not going to accuse Tapscott and Williams of falling short by not mentioning semantic web technology (although I'm sure that if they'd seen this video they would have run with it); I'm going to challenge semantic web advocates to prove to the Tapscotts and Williamses of the world that semweb technology adds value.

Google Desktop API + Semantic Web technology = ?

After all this introductory rambling, I'll start with something short and simple for the first idea. Instead of building on a Tapscott/Williams topic, I'll build on something I've mentioned recently here. I wondered about semantic web tools that built on the way people choose to work instead of making them use new tools. One tool that builds very nicely on the way I work is Google Desktop. It's free, for work-related issues I use it more than Google itself, and I don't know why I waited so long before trying it. Unfortunately, it's limited to use on Windows machines, but there are rumors of an Ubuntu version.

And it's got an API! If Google Desktop retrieves a handful of metadata about each file in which it found your search string, what more can we do with that metadata? I'd love to see a program that takes a user's query, reads some ontology rules, and then passes in a more sophisticated query to identify additional related resources that don't fall within the exact parameters of the user's original query. Or, someone could build a SPARQL endpoint around the API. Or, they could use the API to pull a bunch of this metadata into a triplestore, combine that with an ontology and other data... I think there are some real possibilities here.
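To make the query-expansion idea a little more concrete, here's a minimal Python sketch. It doesn't call Google Desktop at all: the file metadata records, the use of `dc:subject` and `skos:narrower` as plain property strings, and the `expand_term` and `search` functions are all hypothetical stand-ins. It only shows how a few ontology rules saying that RDF, OWL, and SPARQL are narrower topics than "semantic web" let a search for "semantic web" turn up files that never use that phrase.

```python
"""Hypothetical sketch: expand a desktop-search query with ontology rules.

Assumption: FILE_TRIPLES stands in for metadata that a desktop search API
might return for each file; it is NOT real Google Desktop API output.
"""
from collections import deque

# Hypothetical per-file metadata, stored as (file URI, property, value) triples.
FILE_TRIPLES = [
    ("file:///notes/rdf-intro.txt", "dc:subject", "RDF"),
    ("file:///notes/rdf-intro.txt", "dc:title", "Intro to RDF"),
    ("file:///talks/sparql-demo.ppt", "dc:subject", "SPARQL"),
    ("file:///drafts/owl-notes.doc", "dc:subject", "OWL"),
    ("file:///misc/recipes.txt", "dc:subject", "cooking"),
]

# Hypothetical ontology rules: SKOS-style links from broader to narrower terms.
ONTOLOGY_TRIPLES = [
    ("semantic web", "skos:narrower", "RDF"),
    ("semantic web", "skos:narrower", "OWL"),
    ("RDF", "skos:narrower", "SPARQL"),
]


def expand_term(term, ontology):
    """Return the term plus every term reachable from it via skos:narrower."""
    expanded = {term}
    queue = deque([term])
    while queue:  # breadth-first walk, so narrower-of-narrower terms are included
        current = queue.popleft()
        for s, p, o in ontology:
            if s == current and p == "skos:narrower" and o not in expanded:
                expanded.add(o)
                queue.append(o)
    return expanded


def search(term, files, ontology):
    """Return sorted URIs of files whose dc:subject matches any expanded term."""
    terms = expand_term(term, ontology)
    return sorted({s for s, p, o in files if p == "dc:subject" and o in terms})


if __name__ == "__main__":
    for uri in search("semantic web", FILE_TRIPLES, ONTOLOGY_TRIPLES):
        print(uri)
```

A search for "semantic web" here matches the RDF, OWL, and SPARQL files but not the recipes file. A real version could instead load the metadata pulled through the API into an actual triplestore and express the same expansion as a SPARQL query (for example, with a `skos:narrower*` property path), but the principle would be the same.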


KDE and Gnome desktop environments both have triplestores and faceted metadata recall/browsing/autocreation in experimental versions; check Nepomuk, Beagle, etc.


How about starting here:

Just add URLs to the Data Source URI field and hit "Query". For your Blog Data Space, the TimeLine Tab is a nice place to perform the data link traversal (URI dereferencing), which basically results in some interesting meshups :-)

Note: You can bookmark via the Permalinks.