August 21, 2008

Jonathan Zittrain's "The Future of the Internet: and How to Stop It"

Highly recommended.


The title of Jonathan Zittrain's book The Future of the Internet makes it sound like one of those upbeat future-technology books that you see people in suits reading on planes, but the subtitle "and How to Stop It" shows that it's not so upbeat. Zittrain, the Professor of Internet Governance and Regulation at Oxford University and the co-founder of Harvard Law School's Berkman Center for Internet & Society, describes how much of today's Internet use is headed in directions that contradict the principles that made the Internet great in the first place. The most important of these principles is what he calls generativity: flexibility in the creation of hardware, operating systems, applications, or websites that lets people make new contributions, often with unexpected results that others can build on further. While Linux, Apache web servers, Firefox, wikis, the IBM PC's open architecture, and many other platforms have provided this so far, the increasing use of "tethered appliances" to perform Internet-related tasks threatens this pattern. Products such as the iPhone, TiVo, and the Xbox are so tightly controlled by their makers that any innovations built on these platforms must come from within the companies that control them, much as any innovation in the U.S. telephone system had to come from the monopoly company that controlled it for so many decades. Sure, you can write a new application for the iPhone, but no one can load your app onto their iPhone until it goes to Apple, gets approved, and is distributed by them. If you want to add a new menu option to Firefox and recompile it, no such approval process is necessary for people to use it, and this kind of freedom is how the Internet grew to where it is today.

I don't want to rehash the whole book here, but I'll admit that I expected it to be fairly dry and read it mostly out of a sense of responsibility to be up on these issues. It's actually a fairly quick read; I read most of my copy sitting on a beach. Once Zittrain lays out his case, which includes a history of the Internet that was fascinating even to someone who's read quite a few histories of the Internet, he reviews several of the things that have gone wrong (for example, spam and malware). This sets the stage for his explanation of why tightly controlled Internet walled gardens are becoming more appealing to people, and he describes some of the decentralized, grass-roots practices that have dealt with such issues surprisingly effectively, such as robots.txt files and Wikipedia's practices for resolving disputes.

He does present a hopeful case for how the future can build on current work by technical people and legal scholars to prevent the looming corporate-controlled Internet. (One legal scholar he mentions is Pamela Samuelson, a member of the markup geek family if only by marriage.) I strongly recommend the book to geeks interested in relevant legal issues and to lawyers interested in Internet technology, because Zittrain lays out the explicit, implicit, and potential connections between these worlds so well.

He's made an online version of the book available under a Creative Commons license at his web site. All his talk of people building on each other's work gave me one nice idea: create a version of his footnotes in which court case citations are live links to publicly available versions of those cases. For example, the citation in footnote 3 of chapter 2 of the book could link to the judge's decision on 238 F.2d 266 at justia.com.
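
A minimal sketch of that idea in Python, assuming the footnotes arrive as plain strings and that citations follow the "volume reporter page" shape of 238 F.2d 266. The regular expression covers only a few federal reporters, and the URL template is a made-up placeholder, not Justia's actual URL scheme.

```python
import html
import re

# Hypothetical URL template; a real service's URL scheme would go here.
CASE_URL = "https://cases.example.org/{reporter}/{volume}/{page}"

# Matches citations shaped like "238 F.2d 266": volume, reporter, first page.
# Only a handful of reporters are listed; this is illustrative, not exhaustive.
CITATION = re.compile(r"\b(\d{1,3}) (F\.2d|F\.3d|F\. Supp\.|U\.S\.) (\d{1,4})\b")


def link_citations(footnote: str) -> str:
    """Return the footnote as HTML with each recognized citation wrapped in a link."""
    parts = []
    last = 0
    for m in CITATION.finditer(footnote):
        volume, reporter, page = m.groups()
        slug = reporter.replace(".", "").replace(" ", "")  # "F.2d" -> "F2d"
        url = CASE_URL.format(reporter=slug, volume=volume, page=page)
        # Escape the text between citations, then emit the citation as a link.
        parts.append(html.escape(footnote[last:m.start()]))
        parts.append('<a href="{}">{}</a>'.format(url, html.escape(m.group(0))))
        last = m.end()
    parts.append(html.escape(footnote[last:]))
    return "".join(parts)


print(link_citations("See 238 F.2d 266 for the decision."))
```

Anything the pattern does not recognize passes through as escaped text, so a partial reporter list only means fewer links, not broken footnotes.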

One more thing for the to-do pile.

August 15, 2008

How you can explore a new set of linked data

Some great tips from Dean Allemang.

Although he doesn't describe it in linked data terms, a recent posting from Dean Allemang has some great suggestions for how to dive into a set of SPARQL-accessible data you know nothing about in order to find out what's there. If there's cool stuff in the data set, this is a lot of fun. (Also check out the recent Talking with Talis with Dean, where he describes many examples of semantic web technology helping large organizations solve very real problems.)

If someone gives you access to an SQL database, commands like show databases, use [database name], show tables, and describe [table name] let you explore the data even if you have no idea of its schema at first. That's a big "if", though: there aren't many large relational databases with useful data available over the public Internet waiting for you to issue SQL queries. There is a growing amount of linked data with SPARQL front ends, and Dean describes a few general-purpose SPARQL queries, and a few more that build on their results, to explore a set of data that you might know nothing about. He uses DBpedia in his examples, so we know that his demonstration will work with a huge data set.

Before recommending that everyone else go and try this, I thought I should try it myself on a data set whose structure I knew nothing about, so I went to Richard Cyganiak's The Linking Open Data dataset cloud page (at the Linked Data Planet conference, pretty much everyone had a slide of this interactive diagram) to find one. Some servers were down, and some offered RDF files that I could have downloaded and queried, but I ended up with the D2R Server for the Gutenberg Project, where I entered SPARQL queries at its web-based SNORQL front end.

As Dean suggested, I listed all the predicates:

SELECT DISTINCT ?p WHERE {?s ?p ?o}

I saw a lot of Dublin Core predicates, including dc:creator, dc:title, and dc:description. I did this to list all the authors:

SELECT DISTINCT ?o WHERE { ?s <http://purl.org/dc/elements/1.1/creator> ?o }

One of the values there was "db:people/Goethe_Johann_Wolfgang_von_1749-1832", so I did the following to list his works in Project Gutenberg:

SELECT ?title WHERE {
  ?s <http://purl.org/dc/elements/1.1/creator>
     <http://www4.wiwiss.fu-berlin.de/gutendata/resource/people/Goethe_Johann_Wolfgang_von_1749-1832>;
     <http://purl.org/dc/elements/1.1/title> ?title.
}

I wondered about Project Gutenberg's description of one title, "The Sorrows of Young Werther", so I entered this:


SELECT ?desc WHERE {
  ?s <http://purl.org/dc/elements/1.1/title> "The Sorrows of Young Werther";
     <http://purl.org/dc/elements/1.1/description> ?desc.
}

The answer is: "Translation of: Die Leiden des jungen Werther." (The German version is also available—most of the Project Gutenberg Goethe texts are in German.)
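
To make the pattern behind those four queries easier to see, here is a small Python sketch that runs the same steps over an in-memory list of triples instead of a SPARQL endpoint. The three triples reuse values quoted above, but the subject name "etext:werther" is made up for illustration, and the match helper is a toy stand-in for triple-pattern matching, not a SPARQL engine.

```python
DC = "http://purl.org/dc/elements/1.1/"
GOETHE = ("http://www4.wiwiss.fu-berlin.de/gutendata/resource/people/"
          "Goethe_Johann_Wolfgang_von_1749-1832")

# Illustrative data only; "etext:werther" is a hypothetical subject identifier.
TRIPLES = [
    ("etext:werther", DC + "creator", GOETHE),
    ("etext:werther", DC + "title", "The Sorrows of Young Werther"),
    ("etext:werther", DC + "description",
     "Translation of: Die Leiden des jungen Werther."),
]


def match(triples, s=None, p=None, o=None):
    """Yield triples that fit the pattern; None plays the role of a variable."""
    for triple in triples:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


# Step 1: which predicates exist at all?
predicates = sorted({p for _, p, _ in match(TRIPLES)})

# Step 2: which values appear as dc:creator?
creators = sorted({o for _, _, o in match(TRIPLES, p=DC + "creator")})

# Step 3: titles by one creator (the shared subject joins the two patterns).
titles = [title
          for s, _, _ in match(TRIPLES, p=DC + "creator", o=GOETHE)
          for _, _, title in match(TRIPLES, s=s, p=DC + "title")]

# Step 4: the description attached to one title.
descriptions = [desc
                for s, _, _ in match(TRIPLES, p=DC + "title",
                                     o="The Sorrows of Young Werther")
                for _, _, desc in match(TRIPLES, s=s, p=DC + "description")]

print(predicates, creators, titles, descriptions, sep="\n")
```

Each step feeds on a value discovered in the previous one, which is what makes the approach work on data whose structure you have never seen.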

I could go on, and I certainly will try this with more sites that offer a SNORQL front end to a SPARQL interface. Like I said, it's a lot of fun; check out Dean's suggested queries, Richard's suggested data sets, and try it yourself!
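
If you would rather script this than paste queries into a browser form, the SPARQL Protocol lets a client send a query as the query parameter of an HTTP GET request. The sketch below only builds that URL; the endpoint address is a placeholder, and whether a given server sits behind its SNORQL page at a particular path is something to check per site.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint; substitute the SPARQL endpoint of the data set
# you are exploring.
ENDPOINT = "http://example.org/sparql"

LIST_PREDICATES = "SELECT DISTINCT ?p WHERE {?s ?p ?o}"


def query_url(endpoint: str, query: str) -> str:
    """Build a GET URL that carries the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = query_url(ENDPOINT, LIST_PREDICATES)
print(url)

# Fetching is left commented out so the sketch runs without network access:
# import urllib.request
# req = urllib.request.Request(
#     url, headers={"Accept": "application/sparql-results+xml"})
# print(urllib.request.urlopen(req).read().decode("utf-8"))
```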

August 12, 2008

SKOS and SWOOP: how

A step-by-step example.

Last week I discussed the possibility of using the SWOOP ontology editor and the W3C's SKOS standard to create taxonomies or thesauri, and I promised to go into a little more detail about how to do so.

(Again, I encourage those more familiar than I am with SKOS and these tools to correct me.) The file 2006-04-18.rdf defines the SKOS Core Vocabulary. It defines some of the more sophisticated relationships that I described last week, such as broaderPartitive and narrowerInstantive, as deprecated properties with the owl:versionInfo message "This term has been moved to the 'SKOS Extensions' vocabulary. See http://www.w3.org/2004/02/skos/extensions/". I downloaded the extensions ontology file and wrote a little XSLT stylesheet to combine the core file and the extensions file and to remove the deprecated properties. Otherwise, when viewing the combined ontology in SWOOP, you would see properties like broaderPartitive listed twice: the deprecated version and the new version.
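
I did the combining with XSLT, but the same merge-and-filter step can be sketched in Python. This is not my stylesheet: it assumes that a deprecated property is marked either by an owl:DeprecatedProperty element or by an rdf:type child pointing at owl:DeprecatedProperty, and the two inline samples are tiny stand-ins for the real files, not excerpts from them.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"

# Stand-in for the core vocabulary file (illustrative, not the real content).
CORE_SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Property rdf:about="http://www.w3.org/2004/02/skos/core#broader"/>
  <rdf:Property rdf:about="http://www.w3.org/2004/02/skos/core#broaderPartitive">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#DeprecatedProperty"/>
  </rdf:Property>
</rdf:RDF>"""

# Stand-in for the extensions file (illustrative, not the real content).
EXT_SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Property rdf:about="http://www.w3.org/2004/02/skos/extensions#broaderPartitive"/>
</rdf:RDF>"""


def is_deprecated(elem):
    """True if the element is marked deprecated (assumed marker, see above)."""
    if elem.tag == "{%s}DeprecatedProperty" % OWL:
        return True
    return any(t.get("{%s}resource" % RDF) == OWL + "DeprecatedProperty"
               for t in elem.findall("{%s}type" % RDF))


def combine(core_xml, ext_xml):
    """Merge two RDF/XML documents, dropping deprecated top-level entries."""
    core = ET.fromstring(core_xml)
    ext = ET.fromstring(ext_xml)
    merged = ET.Element(core.tag)
    for child in list(core) + list(ext):
        if not is_deprecated(child):
            merged.append(child)
    return merged


combined = combine(CORE_SAMPLE, EXT_SAMPLE)
kept = [e.get("{%s}about" % RDF) for e in combined]
print(kept)
```

With the samples above, only the core broader property and the extensions version of broaderPartitive survive, so a tool loading the result would list broaderPartitive once.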

To use this ontology with SWOOP to define a thesaurus, start up SWOOP and load the combined ontology created from 2006-04-18.rdf and extensions.rdf. Add a term (or, in OWL terms, add an Individual) such as "museum" by clicking SWOOP's "Add I" button, which has its "I" inside of a little pink diamond. In the New Entity dialog box that appears, the default value for "Instance-of" is owl:Thing, but you're working with SKOS, so pick the Concept class instead. Enter Museum as an ID, and then click "Add and Close". Do the same to add the term "TheLouvre". Remember not to include any space in this term's ID name; you can add "The Louvre" as a Label for the term in the same dialog box if you like.

After clicking "Add and Close" for TheLouvre, you'll see the "Concise Format" tab for TheLouvre, where you can add some metadata about it: the fact that it has the relationship BroaderInstantive to Museum.

To do this, first click "Add" next to "Object Assertions". In the "Select Property" dialog box that appears, look at that long list of properties to choose from. This is the main reason to use SWOOP and SKOS together: because the combination gives you the ability to create rich standardized metadata by simply picking names from lists like this. Click on BroaderInstantive and the "Select Prop[erty] & Proceed" button, then pick Museum from the list that appears. After you click the "Add and Close" button, you'll see it reflected on the Concise Format information about TheLouvre:

[Screenshot: the Concise Format tab for TheLouvre in SWOOP]

As I described last week, the Concise Format screen for Museum will have no mention of the term's relationship to TheLouvre, but an automated way to add that is apparently not far off.
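
What that automation might amount to can be sketched in a few lines of Python, under the assumption that narrowerInstantive is treated as the inverse of broaderInstantive. This is not how SWOOP or Pellet works internally; the plain tuples are only stand-ins for RDF statements.

```python
# Assumed inverse pairing, listed in both directions.
INVERSE_OF = {
    "broaderInstantive": "narrowerInstantive",
    "narrowerInstantive": "broaderInstantive",
}


def with_inverses(statements):
    """Return the statements plus the inverse of each one with a known inverse."""
    result = set(statements)
    for subject, prop, obj in statements:
        inverse = INVERSE_OF.get(prop)
        if inverse is not None:
            # Swap subject and object for the inverse statement.
            result.add((obj, inverse, subject))
    return result


asserted = {("TheLouvre", "broaderInstantive", "Museum")}
for statement in sorted(with_inverses(asserted)):
    print(statement)
```

After this step, the Museum term would carry its own narrowerInstantive statement about TheLouvre without anyone entering it by hand.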

To get some more ideas about the things that SWOOP can do with a SKOS file, download the Invasive Species Management Thesaurus from the California Information Node, load it into SWOOP, and look at the Concise Format tab for a few terms. They have a lot of metadata. The recent devX article Applying SKOS Concept Schemes also showed me that there are plenty of other aspects of SKOS for me to explore.

Looking this over made me even more sure of something I wrote last week: once Pellet supports SPARQL CONSTRUCT queries, the combination of the SKOS ontology, SWOOP, and Pellet is going to be very useful for people working with taxonomies and thesauri.
