<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>bobdc.blog</title>
    <link rel="alternate" type="text/html" href="http://www.snee.com/bobdc.blog/" />
    <link rel="self" type="application/atom+xml" href="http://www.snee.com/bobdc.blog/atom.xml" />
    <id>tag:www.snee.com,2008-10-31:/bobdc.blog/2</id>
    <updated>2012-01-25T14:07:06Z</updated>
    <subtitle>Bob DuCharme&apos;s weblog, mostly on technology for representing and linking information.</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 4.32-en</generator>

<entry>
    <title>A brief, opinionated history of XML</title>
    <link rel="alternate" type="text/html" href="http://www.snee.com/bobdc.blog/2012/01/a-brief-opinionated-history-of.html" />
    <id>tag:www.snee.com,2012:/bobdc.blog//2.679</id>

    <published>2012-01-25T14:01:51Z</published>
    <updated>2012-01-25T14:07:06Z</updated>

    <summary>From someone who had a front row seat....</summary>
    <author>
        <name>Bob DuCharme</name>
        <uri>http://www.snee.com/bobdc.blog</uri>
    </author>
    
        <category term="XML" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="xml" label="xml" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.snee.com/bobdc.blog/">
        From someone who had a front row seat.
        <![CDATA[    <div id="id103350">

<p id="id103352">There are a few histories of XML out there, but I still find myself explaining certain points to people surprisingly often, so I thought I'd write them down. If you don't want to read this whole thing, I'll put the moral of the story right at the top: </p>

<blockquote id="id103361" class="pullquote" style="width: 190px; font: bold 1.333em/1.125em &quot;Helvetica Neue&quot;, Helvetica, Arial, sans-serif; margin: 1.5em 0 1.5em 1.5em !important; padding: 0.6em 5px !important; background: none !important; border: 3px double #ddd; border-width: 3px 0; text-align: center; float: right; "><strong id="id103372">They didn't understand that it wasn't designed to meet their needs. It was designed to make electronic publishing in multiple media easier.</strong></blockquote>

<p id="id103375"><i id="id103377">XML was designed as a simplified subset of SGML to make electronic publishing in multiple media easier. People found it useful for other things. When some people working on those other things found that XML wasn't perfect for their needs, they complained and complained about how badly designed XML was. They didn't understand that it wasn't designed to meet their needs. It was designed to make electronic publishing in multiple media easier.</i></p>

<h2 id="id103388">Automated typesetting and page layout...</h2>

<p id="id103393">In the 1970s, computerized typesetting made automated page layout much easier, but three guys at IBM named Goldfarb, Mosher, and Lorie got tired of the proprietary nature of the typesetting codes used in these systems, so they came up with a nonproprietary, generic way to store content for automated publishing that made it easier to convert that content for publication on multiple systems. This became the ISO standard <a id="id103403" href="http://en.wikipedia.org/wiki/SGML">SGML</a>, and its standardized, nonproprietary nature made it popular among U.S. defense contractors, legal publishers, and other organizations that did large-scale automated publishing.</p>


<p id="id103414">When I first got involved, SGML was gaining popularity among publishers creating CD-ROMs and bound books from the same content, because they could create and edit an SGML version and then run scripts to publish that content in the various media. The structure of an SGML document type (for example, the available text elements and element relationships in a set of legal court cases, or the elements and element relationships that you could use in a set of aircraft repair manuals) was specified in something called a <a id="id103426" href="http://en.wikipedia.org/wiki/Document_Type_Definition">DTD</a>, which had its own syntax and was part of the SGML standard. The scripts to convert SGML documents were usually written using a language and engine called Omnimark, which was a proprietary product, but a Perl-based alternative was also available. </p>
<p id="id103438">When Tim Berners-Lee was wondering how exactly to specify that one of his new hypertext documents had a title here, a subtitle there, and a link in the middle of a paragraph that led to another document, SGML was a logical choice&#8212;it was a text-based, flexible, non-proprietary, standardized way to specify document structure with various tools available to help you work with those documents. That's why HTML tags are delimited with angle brackets: because SGML elements were (nearly always) delimited with angle brackets. Dan Connolly sketched out the <a id="id103467" href="http://lists.w3.org/Archives/Public/www-talk/1992MayJun/0020.html">first HTML DTD</a> in 1992. </p>

<p id="id103475">SGML's designers couldn't see into the future, so they deliberately made it very flexible. For example, you could use other delimiters for element tags besides angle brackets, but everyone used angle brackets. SGML parsing programs were still required to account for the possibility that a document used other delimiters, and the possibility that many other options had been reset, so these parsers were large and complex, and few were available to choose from. By the mid-90s, enough best practices had developed that Sun Microsystems' Jon Bosak had the idea for a simplified, slimmer version of SGML that assumed a lot of default settings and could be parsed by a smaller program&#8212;maybe even a program written in Sun's new Java language&#8212;and that could be transmitted over the web when necessary. The documents themselves would be easier to share over the web than typical SGML documents, following the example of HTML documents.</p>

<p id="id103514">Around this time SGML was considered a niche technology in the electronic publishing industry, and I worked at several jobs where I wrote and modified DTDs and Omnimark scripts to create and maintain document conversion systems. I also went to the relevant SGML conferences, where I got to know several of the people who eventually joined Jon to create the simplified version of SGML. (Many are still friends.) At first this group called their new spec WebSGML, but eventually they named it XML. </p>


<p id="sgmlcompat">You could still process XML with Omnimark and other SGML tools. Many people would <a id="id103533" href="#whydtds">fail to appreciate the value of this design decision</a>: as a valid subset of SGML, XML documents could be processed with existing SGML technology. This meant that on that day in 1998 when XML became an official W3C standard, we already had plenty of software out there, including programs like Adobe's special SGML edition of FrameMaker, that could process XML documents right away. This gave the new standard a running start, and XML might not have gotten anywhere without it, because those of us using the existing tools didn't have to wait around for new tools for the new standard and then work out how to incorporate them into our publishing workflows. We already had tools and workflows that could take advantage of the new standard.</p>


<p id="id103554">I've heard some people describe certain things that SGML specialists didn't like about XML, but these people don't understand that XML was invented by and for SGML specialists, and it made SGML people's lives much easier. For one thing, we weren't so dependent on Omnimark anymore; at least one of my former employers switched from SGML to XML just so they could ditch Omnimark. XML's companion standard <a id="id103563" href="http://en.wikipedia.org/wiki/XSLT">XSLT</a> let us convert XML to a variety of formats using robust, free, standardized software, and as the web became a bigger publishing medium we found ourselves writing XSLT stylesheets to convert the same XML documents to print, CD-ROM, and HTML. Electronic publishing had never been so easy.</p>

<h2 id="id103576">...and beyond...</h2>


<p id="id103580">Then along came the dot com boom. People got excited about how "seamless e-commerce" would change everything. People would save money as obsolete middlemen were removed from old-fashioned transactions, and people would make lots of money by taking part in this streamlining (selling pick axes during a gold rush) or by automating the buying and selling of products.</p>
<p id="id103590">Orders would be transmitted over this fabulous free network known as The Internet instead of over the expensive, proprietary <a id="id103595" href="http://en.wikipedia.org/wiki/Electronic_Data_Interchange">EDI</a> networks. But when my computer sent an order to yours, how exactly would this order be represented? XML provided a good syntax: it was plain text, easy to transmit and parse, and could group labeled pieces of information in fairly arbitrary structures while remaining an open, straightforward standard. (When I say "straightforward", I'm talking about the <a id="id103881" href="http://www.w3.org/TR/1998/REC-xml-19980210">original spec</a> here, not the collection of related specs that most people are referring to when they complain about the complexity of XML. More on this <a id="id103891" href="#schema">below</a>.) This let people send any combination of information back and forth, regardless of the potential lack of compatibility between the back end systems that the different parties were using. </p>
<p id="id103902">So, as an important technology of the dot com boom, XML became trendy, and it was a heady feeling to suddenly be an expert in a trendy technology. I'll never forget hearing it mentioned in a Microsoft ad on a prime time network TV show; sure, it was spoken by the character of a geek who normal people weren't supposed to understand, but still, this subset of a niche technology that my friends helped to invent was mentioned on prime time network TV. Three different XML conference series were running, and they were much better attended than the <a id="id103913" href="http://www.idealliance.org/events/xtech-2012">single one</a> that's left now. The best part was that there was enough money behind some of those conferences to fly most speakers in and put them up in hotels, which got me my first trips to London and Silicon Valley.</p>

<p id="id103926">XML wasn't really a perfect fit for ecommerce systems, though. The elements vs. attributes distinction, which publishing systems used to distinguish between content to publish and metadata about that content, didn't have a clear role when describing transactions that weren't content for publishing. XML had some odd data types (NMTOKEN? CDATA?) that only applied to attribute values, instead of traditional data types like integer, string, and boolean that could be applied to content as well as attributes.</p>
<p id="whydtds">And then there was that strange DTD syntax: if XML was so good at describing structure, why wasn't XML used to describe the structure of a set of documents? The answer is <a id="id103946" href="#sgmlcompat">above</a>, but it didn't get publicized very well, so many people complained about DTD syntax. Everyone agreed that an XML-based schema syntax that provided for traditional data types would be a Good Thing, so various groups came up with <a id="id103956" href="http://docstore.mik.ua/orelly/xml/schema/appa_03.htm#xmlschema-APP-A-SECT-3.2">proposals</a> and the W3C convened a Working Group to review these proposals and come up with a single standard.</p>
<p id="id103965">But, in the words of Cyndi Lauper, <a id="id103968" href="http://www.youtube.com/watch?v=3aK-UjR3Oj4">money changes everything</a>. XML itself was assembled by eleven specialists in a niche technology, SGML, who wanted to make standardized electronic publishing simpler, and they managed to stay under most radar systems and come out with something <a id="id103106" href="http://www.w3.org/TR/1998/REC-xml-19980210">simple and lean</a>. However, when the XML Schema Working Group convened, many big and small companies were smelling lots of money and wanted to influence the results. Of the 31 companies that sent representatives to this Working Group (31!), many had little or nothing to do with publishing, electronic or otherwise. There were database vendors such as Microsoft, Informix, Software AG, IBM and Oracle (to be fair, large software companies have always been up there with legal publishers and defense contractors as believers in automated publishing technology; note where SGML got its start). There were successful or aspiring B2B ecommerce vendors such as CommerceOne, Progress Software, and webMethods. Microsoft, Xerox, CommerceOne, IBM, Oracle, Progress Software, and Sun were each interested enough to send two representatives to the committee, so there were a lot of cooks working on this broth.</p>

<p id="schema">The result was a <a id="id103135" href="http://www.w3.org/TR/#tr_XML_Schema">three-part specification</a>: Part 0 was a primer, Part 1 specified how to define document structures, and Part 2 described basic data types and how to extend them. Part 2 is pretty good, and also provides the basis for RDF data typing. Part 1, in my opinion, ended up being an ugly, complicated mess in its attempt to serve so many powerful masters. </p>
<p id="id103154">Two members of the original eleven-member XML team, James Clark and Makoto Murata, developed an alternative to Part 1 that was both simpler and more powerful called <a id="id103157" href="http://relaxng.org/">RELAX NG schemas</a>. Clark had written the only open source SGML parser, and the first XSLT processor, and came up with the name "XML," among his many other achievements; he's also written some <a id="id103166" href="http://code.google.com/p/jing-trang/">great software</a> to implement RELAX NG and convert between schema formats. RELAX NG never became as popular as XML Schema, because it didn't have the big industry names behind it, and because it was optimized around the original XML use case: describing content for publication.</p>
<p id="id103179">Despite a complex syntax, incompatibilities among parsers, an often inscrutable spec, and less expressive power than RELAX NG, the W3C XML Schema specification has become popular because it's a W3C standard that addresses the original main problems of XML for ecommerce: it specifies document structures using XML, it lets you use traditional datatypes, and it has the added bonus for many developers of making it easier to round-trip XML elements to Java data structures. (After railing against the influence of this last part for years, I learned that it was primarily the work of Matthew Fuchs, an old friend I've known since he was finishing up his Ph.D. in computer science at NYU's <a id="id103193" href="http://cims.nyu.edu/">Courant Institute</a> when I was doing my master's there in the mid-nineties. He was the only other person there who even knew what SGML was.) So, XML Schema continues to be used by many large organizations to store data that doesn't fit neatly into relational tables. In fact, <a id="id103205" href="http://www.topquadrant.com">TopQuadrant</a> has been adding more and more features to the TopBraid platform to make it easier to incorporate such data into a system that uses semantic web standards.</p>

<h2 id="id103216">...and back.</h2>

<p id="id103220">Getting back to the topic of leaner, simpler alternatives for representing information of potentially arbitrary structure, the JavaScript-based <a id="id103225" href="http://json.org/">JSON</a> format started getting popular around 2006. The third paragraph of its <a id="id103970" href="http://en.wikipedia.org/wiki/JSON">Wikipedia page</a> flatly states that "it is used primarily to transmit data between a server and web application, serving as an alternative to XML."</p>

<p id="id103980">A Google search for <a id="id103983" href="https://www.google.com/search?q=%22json+replace+xml%22">"json replace xml"</a> gets over 5,000 hits. (That's with the quotes around the search terms, to make Google search for the exact phrase. Without the quotes, it gets almost five million hits.) I like JSON, and see how it can replace many of the uses of XML that have been around since the dot com boom days, but anyone who thinks it can completely replace XML doesn't understand what XML was designed for. Documents with inline markup (or, in XML geekspeak, "mixed content"&#8212;for example, the way the HTML <tt id="id104006">a</tt> element can be in the middle of a sentence within a <tt id="id104011">p</tt> element) would theoretically work fine in JSON, but in practice, it would be too easy to screw it up when editing it with a text editor by accidentally adding or removing a single curly brace. Tools to hide the syntax behind a more intuitive interface may address the issue, but dependence on such tools was something that the original XML designers wanted to avoid. And frankly, when I picture a complex prose document stored in JSON, I hear the ghost of Microsoft's <a id="id104022" href="http://en.wikipedia.org/wiki/Rich_Text_Format">RTF</a> dragging chains through the attic.</p>
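<p>To make the curly-brace problem concrete, here is a rough JavaScript sketch. It stores the JSON counterpart of a <tt>p</tt> element with an <tt>a</tt> element in the middle of its sentence, using a JsonML-style layout of element name, attributes, then children. That layout is an assumption chosen for illustration; JSON itself has no built-in notion of mixed content. The sketch pulls the paragraph's text back out, then shows that deleting a single closing brace is enough to make the whole thing unparseable.</p>
<pre>
```javascript
// A paragraph with an inline link (mixed content), stored JsonML-style:
// [elementName, attributes, child, child, ...]. This layout is an
// illustrative choice; JSON has no built-in notion of mixed content.
var paragraphJson =
  '["p", {}, "XML was designed for ", ' +
  '["a", {"href": "http://www.w3.org/TR/1998/REC-xml-19980210"}, "publishing"], ' +
  '" in multiple media."]';

// Concatenate the text content of a node, skipping each element's
// name and attributes (the first two members of its array).
function textOf(node) {
  if (typeof node === "string") {
    return node;
  }
  return node.slice(2).map(textOf).join("");
}

var text = textOf(JSON.parse(paragraphJson));

// Delete one closing curly brace, the kind of slip that is easy to make
// when hand-editing prose stored this way, and try parsing again.
var damaged = paragraphJson.replace('19980210"}', '19980210"');
var damagedParses = true;
try {
  JSON.parse(damaged);
} catch (e) {
  damagedParses = false;
}
```
</pre>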

<p id="id104031">Between JSON's growing role as an inter-computer data format and RELAX NG's foothold in schemas like DocBook and companies like LexisNexis, I see the XML infrastructure getting back to its original use cases, which makes good sense to me. Each year at the <a id="id104038" href="http://xmlsummerschool.com/">XML Summer School</a> in Oxford, it's been very interesting to see the new things people are doing with XML, especially as XQuery-based XML databases like <a id="id104047" href="http://www.marklogic.com/">MarkLogic</a> and <a id="id104054" href="http://exist.sourceforge.net/">eXist</a> grow in power. I've been chairing the semantic web track at the summer school for the past few years and have hardly been involved in XML at all, but it's always great to hear what my old friends are up to. Especially when there's great beer available.</p>


<center id="id104067">
<a id="id104069" href="http://www.snee.com/bob/sgmlfree/"><img id="id104074" height="150" src="http://www.snee.com/bob/img/sgmlcdsmall.jpg" border="0" alt="SGML CD cover"/></a>
   
<a id="id104088" href="http://www.snee.com/bob/xmlann/"><img id="id104092" height="150" src="http://www.snee.com/bob/img/xmlasbig.gif" border="0" alt="XML Annotated Spec cover"/></a>
   
<a id="id104105" href="http://www.snee.com/bob/xsltquickly/"><img id="id104109" height="150" src="http://www.snee.com/bob/img/XQcoverSmall.jpg" border="0" alt="XSLT Quickly cover"/></a>
</center>





<p id="id104132">Please add any comments to  <a id="id104135" href="https://plus.google.com/101006505484718936507/posts/HNF95EdnXEy">this Google+ post</a>.</p>



    </div>
]]>
    </content>
</entry>

<entry>
    <title>Having a Blue Ridge Christmas</title>
    <link rel="alternate" type="text/html" href="http://www.snee.com/bobdc.blog/2011/12/having-a-blue-ridge-christmas.html" />
    <id>tag:www.snee.com,2011:/bobdc.blog//2.678</id>

    <published>2011-12-16T14:56:01Z</published>
    <updated>2011-12-16T15:16:53Z</updated>

    <summary>They&apos;re playing my song!</summary>
    <author>
        <name>Bob DuCharme</name>
        <uri>http://www.snee.com/bobdc.blog</uri>
    </author>
    
        <category term="music" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="music" label="music" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.snee.com/bobdc.blog/">
        They&apos;re playing my song!
        <![CDATA[    <div id="id103351">


<p id="id103366">A few months ago I saw a <a id="id103369" href="http://cvillechristmascd.wordpress.com/about/">call for contributions</a> of recordings of original holiday songs for a CD to be called "A Charlottesville Songwriters Christmas" to benefit a <a id="id103378" href="http://www.kidpanalley.org/">local charity</a>. Around here there seems to be a law that when you name a business you have to name it either Jefferson (whatever), Piedmont (whatever), or Blue Ridge (whatever), so I decided to write a song whose name is a variation on "Blue Christmas" called "Blue Ridge Christmas." I thought about trying to put together a band to record it, but some friends who I've <a id="id103391" href="https://www.facebook.com/pages/Jazz-Collective-9/155717025518">played jazz</a> with are also in a <a id="id103399" href="http://soultransitband.com/">local soul band</a> with a really great singer (note his <a id="id103406" href="http://www.jerusalemchurchva.org/">day job</a>), so I offered it to them, and they made a great recording of it.</p>
<p id="id103414">For the holiday season, the Charlottesville Downtown Business Association made a <a id="id103418" href="http://youtu.be/uMvhYX63ds4">video</a> to encourage people to shop on the downtown mall and they chose this recording as the music. It was fun for me to see it, and it's nice to know that letting my friends  hear the song won't mean ripping it from a charity CD and putting it where people can download it. This doesn't quite compare with my <a id="id103430" href="http://www.mcylinder.com/">brother's</a> work for <a id="id103438" href="http://www.youtube.com/watch?v=Pa0oA5IxwJk">VW</a> or <a id="id103445" href="http://www.youtube.com/watch?v=h5n7bQdW0CQ">Wendy's</a>, but it's fun to know that it came out well and that lots of people can see the video&#8212;and that the song has had a bit of <a id="id103455" href="https://www.facebook.com/permalink.php?story_fbid=319707264724842&amp;id=115486258480278">airplay</a> on WNRN!</p>

<object id="id103464" width="560" height="315"><param id="id103470" name="movie" value="http://www.youtube.com/v/uMvhYX63ds4?version=3&amp;hl=en_US"/><param id="id103476" name="allowFullScreen" value="true"/><param id="id103483" name="allowscriptaccess" value="always"/><embed id="id103489" src="http://www.youtube.com/v/uMvhYX63ds4?version=3&amp;hl=en_US" type="application/x-shockwave-flash" width="560" height="315" allowscriptaccess="always" allowfullscreen="true"/></object>

<hr id="id103516"/>

<p id="id103518">Please add any comments to  <a id="id103521" href="https://plus.google.com/101006505484718936507/posts/co31Mpurbw7">this Google+ post</a>.</p>


    </div>
]]>
    </content>
</entry>

<entry>
    <title>Javascript from the command line</title>
    <link rel="alternate" type="text/html" href="http://www.snee.com/bobdc.blog/2011/11/javascript-from-the-command-li.html" />
    <id>tag:www.snee.com,2011:/bobdc.blog//2.676</id>

    <published>2011-11-21T13:46:39Z</published>
    <updated>2011-11-21T13:56:14Z</updated>

    <summary>In Linux and Windows. (Goodbye Cscript!)</summary>
    <author>
        <name>Bob DuCharme</name>
        <uri>http://www.snee.com/bobdc.blog</uri>
    </author>
    
        <category term="neat tricks" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="javascriptrhino" label="javascript rhino" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.snee.com/bobdc.blog/">
        In Linux and Windows. (Goodbye Cscript!)
        <![CDATA[<div>

<a id="id103332" href="http://www.mozilla.org/rhino/"><img id="id103337" src="http://www.mozilla.org/rhino/rhino50.jpg" width="200" border="0" align="right" hspace="30px" vspace="30px" alt="Mozilla Rhino"/></a>

<p id="id103358">A few years ago I wrote about <a id="id103361" href="http://www.snee.com/bobdc.blog/2008/04/windows-command-line-text-proc.html">Windows command line text processing with Javascript</a> using Microsoft's <a id="id103369" href="http://technet.microsoft.com/en-us/library/bb490887.aspx">Cscript</a> utility. I was surprised to find no Linux equivalent, and while I'd heard of <a id="id103378" href="http://www.mozilla.org/rhino/">Mozilla Rhino</a>, I had a vague idea that using it meant integrating it into other applications.</p>
<p id="id103388">After some hunting, I learned that Rhino includes a jar file that makes it easy to run a script from the command line. Once you have it, running a script named myscript.js is as simple as this:</p>
<pre id="id103395">
java -jar js.jar myscript.js
</pre>
<p id="id103400">If you're really interested in text processing, you can pipe and redirect the output. </p>
<p id="id103405">After I downloaded Rhino and got this to work, I searched my hard disk and found that js.jar was already there in several places: with OpenOffice, with Swoop, and with Eclipse (and therefore with TopBraid Composer), so I've had it right under my nose for years. <a id="id103412" href="http://www.mcylinder.com/">My brother</a> checked his Mac and found that js.jar came with an <a id="id103420" href="http://cmusphinx.sourceforge.net/sphinx4/">open source speech recognizer</a> that he had installed.</p>
<p id="id103428">One neat part was that some fairly complex JavaScript scripts that I had run with Cscript ran with js.jar after one minor change that actually improved the scripts. Cscript has no <tt id="id103434">print()</tt> function for basic text output; it has this <tt id="id103439">WScript.Echo()</tt> thing instead (WScript is the root object of Windows Script Host, and Cscript is that host's command-line version), so I had put the following function in my command-line JavaScript scripts:</p>
<pre id="id103447">
function print(OutString) {
  WScript.Echo(OutString);
};
</pre>
<p id="id103453">Because js.jar supports a native <tt id="id103456">print()</tt> function, the only change necessary to any of my scripts was to comment out the three lines above, and js.jar then happily ran my existing scripts. </p>
<p id="id103464">If you start up js.jar without providing a script name as an argument, you get a js command line. Enter <tt id="id103468">help()</tt> there to see some interesting commands that you can add to your scripts&#8212;for example, <tt id="id103475">readUrl()</tt>. (Note that these commands are case-sensitive.) </p>
<p id="id103481">I mostly tested this on a Windows machine, but it all worked fine on a machine running the latest Ubuntu.</p>
<p id="id103487">The reason I got interested in this recently was that I had just pulled a ton of menu definition JavaScript off a website, with the majority of it being JSON definitions of the website's menu structure. I wanted to store all these definitions in SKOS RDF. Once I added and redefined a few functions in the JavaScript code that I had downloaded, I ran it all and redirected the output to RDF files with little trouble. I'm definitely going to have some more fun with this.</p>
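<p>The downloaded menu code isn't reproduced here, so this is only a hypothetical sketch of the JSON-to-SKOS step. The <tt>menu</tt> object, its <tt>label</tt> and <tt>items</tt> property names, and the <tt>http://example.com/menu/</tt> namespace are all invented for illustration. It walks the menu tree, makes one <tt>skos:Concept</tt> per entry with <tt>skos:broader</tt> and <tt>skos:narrower</tt> links between parents and children, and prints the result as JSON-LD (one RDF serialization among several; Turtle would work too) so that the output of <tt>java -jar js.jar</tt> can be redirected to a file. It assumes the engine provides the standard <tt>JSON</tt> object.</p>
<pre>
```javascript
// Hypothetical menu definitions; the property names (label, items)
// are assumptions standing in for a real site's menu JSON.
var menu = {
  label: "Products",
  items: [
    { label: "Hardware", items: [{ label: "Laptops" }, { label: "Printers" }] },
    { label: "Software" }
  ]
};

// Hypothetical namespace for the generated concepts.
var BASE = "http://example.com/menu/";

// Build a concept URI from a menu label, replacing unsafe characters.
function conceptId(label) {
  return BASE + label.replace(/[^A-Za-z0-9]+/g, "-");
}

// Walk the menu tree depth-first, adding one skos:Concept per entry and
// linking each child to its parent with skos:broader/skos:narrower.
function addConcepts(item, parentId, graph) {
  var id = conceptId(item.label);
  var concept = { "@id": id, "@type": "skos:Concept", "skos:prefLabel": item.label };
  if (parentId) {
    concept["skos:broader"] = { "@id": parentId };
  }
  var children = item.items || [];
  if (children.length) {
    concept["skos:narrower"] = children.map(function (child) {
      return { "@id": conceptId(child.label) };
    });
  }
  graph.push(concept);
  children.forEach(function (child) {
    addConcepts(child, id, graph);
  });
  return graph;
}

var skosDoc = {
  "@context": { "skos": "http://www.w3.org/2004/02/skos/core#" },
  "@graph": addConcepts(menu, null, [])
};

// Under js.jar, print() writes to standard output, so the result can be
// redirected to a file; the guard skips this in hosts without print().
if (typeof print === "function") {
  print(JSON.stringify(skosDoc, null, 2));
}
```
</pre>
<p>SKOS defines <tt>skos:narrower</tt> as the inverse of <tt>skos:broader</tt>, so writing both is redundant, but it spares consumers that don't do inferencing from having to work out one from the other.</p>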
    <div id="id103498">



<hr id="id103528"/>

<p id="id103531">Please add any comments to  <a id="id103534" href="https://plus.google.com/101006505484718936507/posts/D5MyYkdzFft">this Google+ post</a>.</p>



    </div>

</div>]]>
    </content>
</entry>

<entry>
    <title>Publishing academic research data</title>
    <link rel="alternate" type="text/html" href="http://www.snee.com/bobdc.blog/2011/10/publishing-academic-research-d.html" />
    <id>tag:www.snee.com,2011:/bobdc.blog//2.670</id>

    <published>2011-10-17T17:54:58Z</published>
    <updated>2011-10-17T18:02:02Z</updated>

    <summary>My geeky perspective and some broader perspectives.</summary>
    <author>
        <name>Bob DuCharme</name>
        <uri>http://www.snee.com/bobdc.blog</uri>
    </author>
    
        <category term="linked data" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="publishing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.snee.com/bobdc.blog/">
        My geeky perspective and some broader perspectives.
        <![CDATA[    <div id="id103335">

<a id="id103338" href="http://opencitations.wordpress.com/2011/10/17/the-five-stars-of-online-journal-articles-3/"><img id="id103344" src="http://www.snee.com/bobdc.blog/img/5stars.png" border="0" align="right" hspace="30px" vspace="30px" width="200" alt="David Shotton's 5 stars of academic publishing"/></a>

<p id="id103365">Along with Jo Rabin's talk that I mentioned here <a id="id103368" href="http://www.snee.com/bobdc.blog/2011/10/displaying-sparql-results-on-a.html">earlier this month</a>, another inspirational talk in the recent <a id="id103377" href="http://xmlsummerschool.com/">XML Summer School</a> <a id="id103384" href="http://xmlsummerschool.com/curriculum-2011/trends-and-transients-2011/">Trends and Transients</a> track was "Applying XML and semantic technologies to liberate infectious disease data" by Oxford University zoology professor <a id="id103393" href="http://www.zoo.ox.ac.uk/staff/academics/shotton_dm.htm">David Shotton</a>. He described how, while assembling a paper on leptospira infection in urban slums, he used data and metadata from the project to create a semantically enhanced version of that paper, which he described in a separate paper, <a id="id103403" href="http://ora.ox.ac.uk/objects/uuid%3A3e39b4ec-8cdd-40d6-8648-a5d7b2946bb9">Semantically enhanced version of a research article from PLoS Neglected Tropical Diseases</a>. (Note the bottom of that page, where it lets you pull down bibliographic data in your choice of RDF serializations. Also, don't miss the <a id="id103412" href="http://imageweb.zoo.ox.ac.uk/pub/2008/plospaper/latest/">semantically enhanced paper</a> itself, and make sure to click around in it.)</p>

<p id="id103422">After his presentation one audience member asked how an academic department with limited resources and technical background could move in this same direction without attempting to reproduce the full infrastructure, and Professor Shotton suggested that they start by putting their research data on the web along with some metadata about it. This got me thinking about Tim Berners-Lee's <a id="id103431" href="http://www.w3.org/DesignIssues/LinkedData.html">Linked Data 5 Stars</a>, a series of incremental steps toward publishing open linked data in machine-readable standardized formats. I raised my hand and suggested to Shotton that, building on his answer to that question, an alternative version of the five stars for academic researchers could provide a valuable guideline for others interested in following in his footsteps. And he's done it! He just published <a id="id103445" href="http://opencitations.wordpress.com/2011/10/17/the-five-stars-of-online-journal-articles-3/">The Five Stars of Online Journal Articles</a> on his blog, which points to a longer version of the article that he's submitted to <a id="id103455" href="http://www.nature.com/">Nature</a>. My original idea was more of a revision of Berners-Lee's original five stars, but Shotton drew on his extensive academic publishing experience to bring in a lot of bigger-picture issues such as peer review and specific repositories that could host such data.</p>

<p id="id103467">I had been thinking about the potential of academic researchers publishing data using Linked Data principles before this year's XML Summer School; one reason I started the <a id="id103473" href="http://www.meetup.com/cvillesemweb/">Charlottesville Semantic Web Meetup</a> was to find people at the University of Virginia who were interested in pursuing this. I recently learned about someone else who's been thinking hard about issues around publication of research data: UCLA's <a id="id103484" href="http://polaris.gseis.ucla.edu/cborgman/Chriss_Site/Welcome.html">Christine Borgman</a>, whose paper <a id="id103492" href="http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1869155">The Conundrum of Sharing Research Data</a> appeared in the June issue of the Journal of the American Society for Information Science and Technology. (Click "One-Click Download" on that page to retrieve the paper itself.)</p>
<p id="id103504">As I realized when I read David Shotton's article, I've been focused on the technical issues, but there are many others to consider. Here are a few quotes from Borgman's abstract:</p>
<blockquote id="id103511">
This article explores the complexities of data, research practices, innovation, incentives, economics, intellectual property, and public policy associated with the data sharing conundrum.</blockquote>

<blockquote id="id103518">Rationales for sharing data vary along two dimensions: whether motivated by research concerns or by leveraging public investments, and whether intended to serve the interests of researchers who produce data or the interests of potential re-users of data.
</blockquote>

<blockquote id="id103531">Four rationales for sharing research data are identified and positioned on these dimensions. Researchers&#8217; incentives to share their data depend not only on these rationales, but on characteristics of their data and research practices, funding agency policies, and resources for data management. Much more is understood about why researchers do not share data than about when, why, and how researchers do share data, or about when, how, and why researchers or the public reuse data. The model and research agenda are illustrated with examples from the sciences, social sciences, and humanities.
</blockquote>

<p id="id103548">Here's one quote from the main body of the article:</p>
<blockquote id="id103553">
If the rewards of big data are to be reaped, then researchers who produce those data must share them, and do so in such a way that the data are interpretable and reusable by others. Underlying this simple statement are thick layers of complexity about the nature of data, research, innovation, and scholarship, incentives and rewards, economics and intellectual property, and public policy.</blockquote>

<p id="id103564">Her paper goes on to describe these layers. And I have to love any academic paper that refers to a "dirty little secret." I'll let you find that part yourself. While Borgman's paper doesn't get down to the level of data models and serializations for sharing data, if you're at all interested in how Linked Data may benefit the academic research world, her paper is really worth reading. </p>

<hr id="id103574"/>

<p id="id103577">Please add any comments to  <a id="id103580" href="https://plus.google.com/101006505484718936507/posts/JsyCFpxnTL2">this Google+ post</a>.</p>



    </div>
]]>
    </content>
</entry>

<entry>
    <title>Displaying SPARQL results on a mobile phone</title>
    <link rel="alternate" type="text/html" href="http://www.snee.com/bobdc.blog/2011/10/displaying-sparql-results-on-a.html" />
    <id>tag:www.snee.com,2011:/bobdc.blog//2.669</id>

    <published>2011-10-04T13:31:53Z</published>
    <updated>2011-10-05T13:50:09Z</updated>

    <summary>Nicely.</summary>
    <author>
        <name>Bob DuCharme</name>
        <uri>http://www.snee.com/bobdc.blog</uri>
    </author>
    
        <category term="SPARQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="XSLT" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="sparqlmobilexslt" label="SPARQL mobile XSLT" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.snee.com/bobdc.blog/">
        Nicely.
        <![CDATA[    <div id="id103330">

<blockquote id="id103334" class="pullquote" style="width: 190px; font: bold 1.333em/1.125em &quot;Helvetica Neue&quot;, Helvetica, Arial, sans-serif; margin: 1.5em 0 1.5em 1.5em !important; padding: 0.6em 5px !important; background: none !important; border: 3px double #ddd; border-width: 3px 0; text-align: center; float: right; "><strong id="id103344">The ability to create mobile-native web apps with SPARQL and simple XSLT stylesheets should open up a lot of possibilities.</strong></blockquote>

<p id="id103350"><a id="id103351" href="http://xmlsummerschool.com/faculty-2011/#rabin">Jo Rabin</a>'s "Mobile is not The Future (It&#8217;s Now)" presentation in the <a id="id103361" href="http://xmlsummerschool.com/curriculum-2011/trends-and-transients-2011/">Trends and Transients</a> portion of this year's <a id="id103368" href="http://xmlsummerschool.com/">XML Summer School</a> (and the reading he suggested, such as <a id="id103376" href="http://communities-dominate.blogs.com/brands/2011/09/22-percent-changed-their-mind-while-in-the-store-why-every-retailer-needs-a-mobile-strategy.html">this Tomi Ahonen blog post</a>) got me thinking much harder about mobile delivery. One of my first ideas was how easy the <a id="id103386" href="http://jquerymobile.com/">jQuery Mobile</a> JavaScript library could make it to deliver SPARQL query results, and in less than 30 minutes I wrote an <a id="id103395" href="http://snee.com/sparql/xslt/SPARQLMobileResults.xsl">XSLT stylesheet</a> that can take the <a id="id103403" href="http://www.w3.org/TR/2008/REC-rdf-sparql-XMLres-20080115/">SPARQL Query Results XML Format</a> version of any SPARQL query result and use this library to render the results nicely for mobile phones.</p>


<p id="id103414">A SPARQL query that SELECTs more than one variable returns a two-dimensional grid of information, but a more one-dimensional display works better on phones, so the initial display created by my stylesheet is a series of buttons that show the values of the first selected variable. Clicking one displays the values that go with it&#8212;the values that would have been the rest of its row in a two-dimensional display. Below, on both an LG Ally running Android and an iPhone, you can see the stylesheet's rendering of DBpedia's results from a query for the name, artist, release date, and URI of albums produced by Timbaland. Below that you can see the same thing on the Ally after I turned the phone sideways. (Click either image to see a larger version.) You can see the results of the query in your own browser, formatted for a mobile phone, <a id="id103461" href="http://snee.com/sparql/m/timbaland.html">here</a>; for context (and to see the actual query) see the <a id="id103469" href="http://dbpedia.org/snorql/?query=PREFIX+dbpedia-owl%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E+%0D%0APREFIX+foaf%3A+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0D%0ASELECT+%3FalbumName+%3FartistName+%3FreleaseDate++%3FalbumURL+WHERE+%0D%0A{+%3FalbumURL+dbpedia-owl%3Aproducer+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FTimbaland%3E+%3B++++++++%0D%0A++++++++++++dbpedia-owl%3Aartist+%3Fartist+%3B++++++++%0D%0A++++++++++++dbpedia-owl%3AreleaseDate+%3FreleaseDate+%3B++++++++%0D%0A++++++++++++foaf%3Aname+%3FalbumName+.++%0D%0A++%3Fartist+foaf%3Aname+%3FartistName.++++%0D%0A++FILTER+%28+lang%28%3FartistName%29+%3D+%27en%27+%29++++%0D%0A++FILTER+%28+lang%28%3FalbumName%29+%3D+%27en%27+%29%0D%0A}%0D%0AORDER+BY+%3FreleaseDate%0D%0A">DBpedia default display</a> of the results. </p>

<center id="id103490"><a id="id103491" href="http://www.snee.com/bobdc.blog/img/AndroidAndIPhone.jpg"><img id="id103496" src="http://www.snee.com/bobdc.blog/img/AndroidAndIPhone.jpg" border="0" alt="Android LG Ally and iPhone showing SPARQL results" width="300"/></a></center>


<center id="id103510"><a id="id103511" href="http://www.snee.com/bobdc.blog/img/AndroidHorz.jpg"><img id="id103515" src="http://www.snee.com/bobdc.blog/img/AndroidHorz.jpg" border="0" alt="Horizontal Android LG Ally" width="300"/></a></center>


<p id="id103528">For another demo query, I asked DBpedia for the names, revenue figures, foundation year, and descriptions of CRM vendors. Compare the  <a id="id103533" href="http://snee.com/sparql/m/crm.html">version formatted for mobiles</a> with <a id="id103541" href="http://dbpedia.org/snorql/?query=PREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0APREFIX+dct%3A+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2F%3E+%0D%0APREFIX+dbo%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E%0D%0ASELECT+%3Fname+%3Ffounded+%3Frevenue+%3Fdescription+WHERE+{%0D%0A++%3Fco+dct%3Asubject+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FCategory%3ACRM_software_companies%3E+%3B%0D%0A++++++rdfs%3Alabel+%3Fname+%3B%0D%0A++++++dbo%3Arevenue+%3Frevenue+%3B%0D%0A++++++rdfs%3Acomment+%3Fdescription+%3B%0D%0A++++++dbo%3AformationYear+%3Ffounded+.%0D%0A++FILTER+%28+lang%28%3Fname%29+%3D+%27en%27+%29%0D%0A++FILTER+%28+lang%28%3Fdescription%29+%3D+%27en%27+%29%0D%0A}%0D%0AORDER+BY+%3Fname%0D%0A%0D%0A%0D%0A">the default DBpedia display</a>.</p>

<p id="id103561">A few issues to keep in mind:</p>
<ul id="id103566">
<li id="id103569"><p id="id103570">The display includes variable names with each value to show what that value represents (for example, albumName and releaseDate in the pictures above), but you could customize the stylesheet to display the text any way you like, especially if you planned on using it with a specific dataset. For example, you could omit the variable names or have your query provide <tt id="id103579">rdfs:label</tt> versions of them to use instead.</p></li>
<li id="id103585"><p id="id103586">Long strings of text with no spaces to wrap, like the album URLs in the Timbaland query results, may not look great, but I included albumURL in that query to make sure that my stylesheet would render such values as working hypertext links.</p></li>
<li id="id103594"><p id="id103595">If your first variable represents a resource URI instead of a literal value, it won't be a hypertext link in the displayed page, because pressing the button with each result row's first value expands or contracts the display of the rest of the row's values. It makes more sense to have human-readable values and not URIs on the initial display's buttons anyway.</p></li>
<li id="id103605"><p id="id103606">If your query retrieves a lot of data, the stylesheet creates a big HTML file, and the button response may be slow on your phone, especially if the model is as old as my LG Ally. </p></li>
</ul>

<p id="id103615">I've read a little about jQuery, but I didn't need any of what I learned from that reading to create this stylesheet. If you're happy with the effects  of a particular jQuery library, using it may mean no more than creating some simple HTML (typically, some <tt id="id103622">ul</tt>, <tt id="id103626">table</tt>, and <tt id="id103630">div</tt> elements) with specific attributes set for them so that the right jQuery code affects the right elements. To design the pages created by my stylesheet, I just viewed the source and followed the model on the <a id="id103637" href="http://jquerymobile.com/demos/1.0b3/docs/content/content-collapsible-set.html">collapsible content</a> page of the jQuery Mobile site.</p>
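<p>The collapsible-content markup is simple enough to generate from any language. The following Python sketch is a hypothetical stand-in for the stylesheet's output, not a copy of it: it turns result rows into a <tt>data-role="collapsible-set"</tt> element with one <tt>data-role="collapsible"</tt> child per row, using the first variable's value as the heading (the button) and leaving it as plain text, as described in the list above. It assumes the rows have already been parsed into dicts mapping each variable name to a (value, is_uri) pair; the optional <tt>labels</tt> dict is an illustration of the rdfs:label idea, and the page wrapper and jQuery Mobile script and CSS includes are omitted.</p>

```python
from html import escape


def render_collapsible_set(variables, rows, labels=None):
    """Render SPARQL SELECT result rows as jQuery Mobile collapsible-set markup.

    variables: SELECT variable names, in query order.
    rows: list of dicts mapping a variable name to a (value, is_uri) pair;
          unbound variables are simply missing from the dict.
    labels: optional (hypothetical) dict of display labels, for example
            taken from rdfs:label, used instead of raw variable names.
    """
    labels = labels or {}
    first, rest = variables[0], variables[1:]
    lines = ['<div data-role="collapsible-set">']
    for row in rows:
        # The first variable's value is the heading, i.e. the button.
        # It stays plain text because tapping it toggles the rest of the row.
        heading = row.get(first, ("", False))[0]
        lines.append('  <div data-role="collapsible">')
        lines.append("    <h3>%s</h3>" % escape(heading))
        # The remaining values stay hidden until the heading is tapped.
        for var in rest:
            if var not in row:
                continue
            value, is_uri = row[var]
            shown = escape(value)
            if is_uri:
                # Resource URIs become working hypertext links.
                shown = '<a href="%s">%s</a>' % (escape(value), shown)
            name = escape(labels.get(var, var))
            lines.append("    <p>%s: %s</p>" % (name, shown))
        lines.append("  </div>")
    lines.append("</div>")
    return "\n".join(lines)


# Placeholder data, not real query results.
print(render_collapsible_set(
    ["albumName", "releaseDate", "albumURL"],
    [{"albumName": ("Album A", False),
      "releaseDate": ("2001-01-01", False),
      "albumURL": ("http://example.org/a", True)}]))
```

<p>Generating the markup as strings keeps the sketch dependency-free; escaping every value matters because literals from a public endpoint can contain ampersands and angle brackets.</p>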

<p id="id103647">The SPARQL Query Results XML Format is a model of elegant simplicity compared with RDF/XML. (Granted, it has a much simpler job to do.) Writing code to process it in any language is usually easy. If you're new to XSLT, then, with some bias, I can recommend <a id="id103654" href="http://www.snee.com/bob/xsltquickly/index.html">a book on XSLT</a> that has helped many people I know learn it quickly.</p>
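<p>To show how little code that takes outside of XSLT, here is a minimal Python sketch that reads the format with the standard library's ElementTree. It assumes a SELECT result (not an ASK result), and it deliberately ignores literal datatypes and language tags, reducing each bound value to a (value, is_uri) pair; those simplifications are choices made for this example.</p>

```python
import xml.etree.ElementTree as ET

# Namespace of the SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}


def parse_sparql_xml(xml_text):
    """Return (variables, rows) from SPARQL Query Results XML Format text.

    Each row maps a variable name to a (value, is_uri) pair; variables
    left unbound in a result are simply absent from that row.
    """
    root = ET.fromstring(xml_text)
    variables = [v.get("name") for v in root.findall("sr:head/sr:variable", NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            term = binding[0]  # exactly one uri, literal, or bnode child
            is_uri = term.tag == "{%s}uri" % NS["sr"]
            row[binding.get("name")] = (term.text or "", is_uri)
        rows.append(row)
    return variables, rows
```

<p>Pass it the raw response body from an endpoint asked for <tt>application/sparql-results+xml</tt>; the returned rows can then be rendered however you like.</p>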

<p id="id103664">The ability to create mobile-native web apps with SPARQL and simple XSLT stylesheets should open up a lot of possibilities, because semantic web and linked data application architectures ranging from simple batch files to TopBraid's <a id="id103670" href="http://topquadrant.com/products/SPARQLMotion.html">SPARQLMotion</a> let you hand off XML format SPARQL query results to an XSLT processor. (It should work with the SNORQL interface to Linked Data Cloud datasets such as DBpedia, where the input form lets you specify your own XSLT stylesheet to run, but <a id="id103685" href="http://sourceforge.net/mailarchive/forum.php?thread_name=4E889FF0.40908%40openlinksw.com&amp;forum_name=dbpedia-discussion">this feature is currently disabled on the DBpedia Virtuoso instance</a>. It will be great if they enable it or include a similar stylesheet among the installed choices; meanwhile, you can retrieve the XML results and run the XSLT on your own system.)</p>

<p><b>2011-10-05 update:</b> with Kingsley Idehen's help, I now know how to query DBpedia with my own (or any other) XSLT stylesheet. Remove the carriage returns from the following and replace REPLACE-WITH-ESCAPED-QUERY, the value of the &amp;query parameter, with your URL-escaped query:</p>
<pre>
http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org
&amp;query=REPLACE-WITH-ESCAPED-QUERY
&amp;format=application%2Fsparql-results%2Bxml&amp;save=display&amp;fname=
&amp;xslt-uri=http://snee.com/sparql/xslt/SPARQLMobileResults.xsl
</pre>
<p>For example, <a href="http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&amp;query=PREFIX%20dbo%3A%20%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E%0ASELECT%20%3Fname%20%3Faliases%20%3Fborn%20%3Fdied%20WHERE%20%7B%0A%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FThe_Beatles%3E%20dbo%3AbandMember%20%3FbeatleURL%20.%0A%3FbeatleURL%20%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2FalternativeNames%3E%20%3Faliases%20%3B%0Ardfs%3Alabel%20%3Fname%20%3B%0Adbo%3AbirthDate%20%3Fborn%20.%0AOPTIONAL%20%7B%3FbeatleURL%20dbo%3AdeathDate%20%3Fdied%20.%20%7D%0AFILTER%20(%20lang(%3Fname)%20%3D%20%22en%22%20)%0A%7D%0A&amp;format=application%2Fsparql-results%2Bxml&amp;save=display&amp;fname=&amp;xslt-uri=http://snee.com/sparql/xslt/SPARQLMobileResults.xsl">this query</a> asks DBpedia for the Beatles' names, aliases, birth dates, and death dates, and formats the results with the stylesheet described above.</p>
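<p>Escaping a query by hand is error-prone, so here is a small Python sketch that assembles the same kind of URL with the standard library's urllib.parse. The parameter names and values come from the template above; the function name is hypothetical, and whether the endpoint still accepts the xslt-uri parameter (and accepts it percent-encoded, as this code sends it) is an assumption.</p>

```python
from urllib.parse import quote, urlencode

DBPEDIA_SPARQL = "http://dbpedia.org/sparql"
MOBILE_XSLT = "http://snee.com/sparql/xslt/SPARQLMobileResults.xsl"


def dbpedia_xslt_url(query, xslt_uri=MOBILE_XSLT):
    """Build a DBpedia SPARQL URL whose XML results are run through xslt_uri."""
    params = [
        ("default-graph-uri", "http://dbpedia.org"),
        ("query", query),
        ("format", "application/sparql-results+xml"),
        ("save", "display"),
        ("fname", ""),
        ("xslt-uri", xslt_uri),
    ]
    # quote (not urlencode's default, quote_plus) encodes spaces as %20, as in
    # the template above; reserved characters, including "/", are escaped.
    return DBPEDIA_SPARQL + "?" + urlencode(params, quote_via=quote)
```

<p>Passing the query as plain text and letting the library escape it avoids the most common mistake, forgetting to encode characters such as <tt>?</tt>, <tt>#</tt>, and <tt>+</tt> inside the query.</p>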
<hr id="id103699"/>

<p id="id103701"><b id="id103703">Note on comments</b>: after I turned off comments on this blog for a few days because of comment spam, turning them back on seems to have had no effect. So, inspired by <a id="id103707" href="http://www.jenitennison.com/blog/">Jeni Tennison</a>, I'll ask you to add any comments to <a id="id103715" href="https://plus.google.com/101006505484718936507/posts/DDX3fjABLSf">this Google+ post</a>.</p>




    </div>
]]>
    </content>
</entry>

<entry>
    <title>RDFa can be so simple</title>
    <link rel="alternate" type="text/html" href="http://www.snee.com/bobdc.blog/2011/08/rdfa-can-be-so-simple.html" />
    <id>tag:www.snee.com,2011:/bobdc.blog//2.666</id>

    <published>2011-08-16T12:19:12Z</published>
    <updated>2011-08-16T12:21:09Z</updated>

    <summary>Despite claims to the contrary.</summary>
    <author>
        <name>Bob DuCharme</name>
        <uri>http://www.snee.com/bobdc.blog</uri>
    </author>
    
        <category term="RDF" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="rdfa" label="RDFa" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.snee.com/bobdc.blog/">
        Despite claims to the contrary.
        <![CDATA[
    <div id="id103347">

<blockquote id="id103350" class="pullquote" style="width: 190px; font: bold 1.333em/1.125em &quot;Helvetica Neue&quot;, Helvetica, Arial, sans-serif; margin: 1.5em 0 1.5em 1.5em !important; padding: 0.6em 5px !important; background: none !important; border: 3px double #ddd; border-width: 3px 0; text-align: center; float: right; "><strong id="id103361">You can write simple, parsable RDFa with very little syntax and trouble. Really.</strong></blockquote>

<p id="id103366">I got so tired of hearing people complain about how confusing RDFa is that while I was on hold during a recent phone call I threw together a <a id="id103369" href="http://rdfdata.org/dat/rdfademo.html">demo</a> of just how simple it can be. The document has the two basic kinds of triples: one with a literal for an object, with data typing thrown in for good measure, and one with a resource URI as its object. A View Source of that document will show this in its <tt id="id103380">head</tt> element (namespaces are declared earlier): </p>

<pre id="id103386">
    &lt;meta about="http://www.snee.com/bob/foaf.rdf#bob"
          property="foaf:givenName"
          content="Bob"
          datatype="xsd:string"/&gt;

    &lt;meta about="http://www.snee.com/bob/foaf.rdf#bob"
          rel="foaf:homePage"
          href="http://www.snee.com/bob"/&gt;
</pre>

<p id="id103396"><a id="id103397" href="http://www.w3.org/2007/08/pyRdfa/extract?uri=http%3A%2F%2Frdfdata.org%2Fdat%2Frdfademo.html">This link</a> will show you the triples as extracted by the W3C's RDFa Distiller and Parser service.</p>
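<p>Sticking to that small subset keeps even the reading side short. Here is a Python sketch, built on the standard library's html.parser, that recognizes only the two meta-element patterns shown above. It is not an RDFa processor: it does not expand prefixes such as foaf: into full URIs, and it ignores every other way that RDFa can express triples, which is the work a complete parser like the Distiller has to take on.</p>

```python
from html.parser import HTMLParser


class MetaTripleExtractor(HTMLParser):
    """Collect triples from meta elements written in the two patterns above.

    about + property + content gives a literal object (with an optional
    datatype); about + rel + href gives a resource object. Prefixed names
    such as foaf:givenName are returned unexpanded.
    """

    def __init__(self):
        super().__init__()
        self.triples = []  # (subject, predicate, object, datatype or None)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        subject = a.get("about")
        if subject is None:
            return
        if "property" in a and "content" in a:
            self.triples.append((subject, a["property"], a["content"], a.get("datatype")))
        elif "rel" in a and "href" in a:
            self.triples.append((subject, a["rel"], a["href"], None))


def extract_meta_triples(html_text):
    parser = MetaTripleExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.triples
```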
<p id="id103408">My little demo doesn't take into account all the swirling attempts to innovate, accommodate, and disassociate various ideas about embedding machine-readable markup that are currently out there (if you want to stay on top of this, read <a id="id103415" href="http://www.jenitennison.com/blog/">Jeni Tennison's blog</a>), but it highlights a principle that is probably older than FORTRAN: parsing data in a particular syntax can be a big job, because the parser must understand the full language, but writing data in a particular syntax can be simple, because you can pick the subset that you prefer to work with.</p>
<p id="id103427">RDFa gives you many more options for embedding triples&#8212;especially if you want to embed metadata about content that is already part of an HTML page, which seems to be a key original use case, or about the page itself&#8212;but you can write simple, parsable RDFa with very little syntax and trouble. Really.</p>

<hr id="id103441"/>

<p id="id103443">(<b id="id103446">Note on comments</b>: after I turned off comments on this blog for a few days because of comment spam, turning them back on seems to have had no effect. If you email me (bob at snee.com) about what I've written, I'll add your comment and any response here.)</p>

    </div>
]]>
    </content>
</entry>

<entry>
    <title>&quot;Learning SPARQL&quot; now available</title>
    <link rel="alternate" type="text/html" href="http://www.snee.com/bobdc.blog/2011/07/learning-sparql-now-available.html" />
    <id>tag:www.snee.com,2011:/bobdc.blog//2.665</id>

    <published>2011-07-27T12:12:53Z</published>
    <updated>2011-07-27T12:15:19Z</updated>

    <summary>In print and ebook formats.</summary>
    <author>
        <name>Bob DuCharme</name>
        <uri>http://www.snee.com/bobdc.blog</uri>
    </author>
    
        <category term="SPARQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="publishing" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="sparql" label="sparql" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.snee.com/bobdc.blog/">
        In print and ebook formats.
        <![CDATA[
    <div id="id103331">

<a id="id103334" href="http://www.learningsparql.com"><img id="id103338" src="http://www.learningsparql.com/img/cover.jpg" width="200" border="0" align="right" hspace="30px" vspace="30px" alt="Learning SPARQL cover"/></a>

<p id="id103359">I'm very happy to announce that the ebook and print editions of <a id="id103362" href="http://www.learningsparql.com">Learning SPARQL</a> are now <a id="id103369" href="http://oreilly.com/catalog/0636920020547/">available from O'Reilly</a>. Print editions are also available from <a id="id103377" href="http://www.amazon.com/exec/obidos/ISBN=1449306594/bobducharmeA/">amazon.com</a>, <a id="id103385" href="http://www.amazon.co.uk/Learning-SPARQL-Bob-DuCharme/dp/1449306594">amazon.co.uk</a>, maybe some more Amazons, and <a id="id103393" href="http://www.barnesandnoble.com/w/learning-sparql-bob-ducharme/1103138225">Barnes and Noble</a>. (<a id="id103400" href="http://www.borders.com/online/store/TitleDetail?sku=1449306594">Borders</a> says that it's on backorder, but I wouldn't hold your breath for that.) You can read more about how I came to write the book in an <a id="id103410" href="http://www.snee.com/bobdc.blog/2011/06/my-upcoming-oreilly-book-learn.html">earlier blog posting</a>.</p>
<p id="id103420">Right now it's the only complete book on the W3C standard query language for linked data and the semantic web, and as far as I know the only book at all that covers the full range of SPARQL 1.1 features such as the ability to update data. The book steps you through simple examples that can all be performed with free software, and all sample queries, data, and output are available on the book's website. In the words of <a id="id103429" href="http://datypic.com/">Priscilla Walmsley</a>, "It's excellent&#8212;very well organized and written, a completely painless read. I not only feel like I understand SPARQL now, but I have a much better idea why RDF is useful (I was a little skeptical before!)"</p>
<p id="id103444">I will continue to post news about the book and about SPARQL on the book's Twitter account at <a id="id103449" href="http://twitter.com/#!/learningsparql">@LearningSPARQL</a>. I'm not starting a separate blog for the book, so I will continue to blog about SPARQL <a id="id103457" href="http://www.snee.com/bobdc.blog/metadata/rdf/sparql/">here</a>.</p>

    </div>
]]>
    </content>
</entry>

<entry>
    <title>Linking linked data to U.S. law</title>
    <link rel="alternate" type="text/html" href="http://www.snee.com/bobdc.blog/2011/07/linking-linked-data-to-us-law.html" />
    <id>tag:www.snee.com,2011:/bobdc.blog//2.664</id>

    <published>2011-07-08T12:29:08Z</published>
    <updated>2011-07-20T21:57:36Z</updated>

    <summary>Automating conversion of citations into URLs.</summary>
    <author>
        <name>Bob DuCharme</name>
        <uri>http://www.snee.com/bobdc.blog</uri>
    </author>
    
        <category term="RDF" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="legal publishing" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="legalpublishing" label="legal publishing" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="linkeddata" label="linkeddata" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.snee.com/bobdc.blog/">
        Automating conversion of citations into URLs.
        <![CDATA[   <div id="id103332">

<p id="id103335">At a recent <a id="id103338" href="http://www.w3.org/2011/gld/wiki/F2F1">W3C Government Linked Data Working Group meeting</a>, I started thinking more about the role in linked data of laws that are published online. To summarize, you don't want to publish the laws themselves as triples, because they're a bad fit for the triples data model, but as online resources relevant to a lot of issues out there, they make an excellent set of resources to point to, although you may not always get the granularity you want. </p>

<blockquote id="id103354" class="pullquote" style="width: 190px; font: bold 1.333em/1.125em &quot;Helvetica Neue&quot;, Helvetica, Arial, sans-serif; margin: 1.5em 0 1.5em 1.5em !important; padding: 0.6em 5px !important; background: none !important; border: 3px double #ddd; border-width: 3px 0; text-align: center; float: right; "><strong id="id103364">Plenty of government data references laws and related materials.</strong></blockquote>

<p id="id103368">I'm discussing U.S. Federal law here, but similar principles should apply both in individual states and in other countries. The main sets of laws here are legislation, code, regulations, and court decisions. ("Code" refers to laws passed by the legislature, arranged by topic; for example, laws passed about taxes are gathered into the Internal Revenue Code.) If you really want to learn about the various forms of legal material and their relationships, I highly recommend the book <a id="id103379" href="http://www.amazon.com/Finding-Law-12th-American-Casebooks/dp/0314145796/bobducharmeA/">Finding the Law</a>, which I found indispensable when I worked at LexisNexis.  </p>

<p id="id103389">Most law consists of narrative sentences arranged as paragraphs, often with metadata assigned to certain blocks of it. It's such a good fit for XML that legal publishers were among the first users of XML's predecessor, SGML. (Their use of XML and SGML accounts for a large chunk of my career, and I know that some old XML friends like <a id="id103397" href="http://seanmcgrath.blogspot.com/">Sean McGrath</a> and Dale Waldt continue to make great contributions in this area.) So, while you wouldn't get much benefit from splitting these sentences and paragraphs into subjects, predicates, and objects and publishing them as triples, plenty of government data references laws and related materials, and it's more helpful if they can reference them with URLs that lead to the actual laws. To add these URLs with any kind of scalability, you need to find out the common format for citing a document (or, if possible, a point within a document) and an online source of those legal documents whose URLs can be built from that citation format with a regular expression or some other automated tool.  </p>

<p id="id103417">When creating links to any specific bits of U.S. law, the most valuable book is <a id="id103421" href="http://www.amazon.com/Bluebook-Uniform-System-Citation/dp/0615361161/bobducharmeA">The Bluebook: A Uniform System of Citation</a>. As the subtitle implies, the book describes the normalized way to refer to legal documents and their components. Once you know these, a regular expression can often turn them into a URL that leads a browser right to the part you want. For example, while people often refer to the Supreme Court case outlawing school segregation as "Brown v. Board of Education", its official citation is "347 U.S. 483", which means "the case beginning on page 483 of volume 347 of the official publication of U.S. Supreme Court decisions". </p>

<p id="id103440">While there are several sites hosting Supreme Court decisions out there, notably Cornell Law School's <a id="id103442" href="http://www.law.cornell.edu/supct/">Legal Information Institute</a>, the one whose URLs are easiest to construct from a proper Supreme Court citation is justia.com, where the URL for Brown v. Board of Education is <a id="id103452" href="http://supreme.justia.com/us/347/483/case.html">http://supreme.justia.com/us/347/483/case.html</a>. (See also my favorite case, Campbell aka Skyywalker et al v. Acuff Rose Music, Inc. at <a id="id103462" href="http://supreme.justia.com/us/510/569/case.html">http://supreme.justia.com/us/510/569/case.html</a>. Make sure to listen to the relevant work <a id="id103470" href="http://www.youtube.com/watch?v=65GQ70Rf_8Y">on YouTube</a> while you review it.) If you're really interested in linked data and U.S. Supreme Court cases, DBpedia has lots of great metadata for many important cases, as I wrote about in <a id="id103480" href="http://www.snee.com/bobdc.blog/2009/07/court-decision-metadata-and-db.html">Court decision metadata and DBpedia</a>.</p>
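<p>Because the Justia URL is just the volume and first page from the citation, a single regular expression can do the conversion. The Python sketch below is my own illustration (the pattern and function name are hypothetical): it only recognizes the "U.S." reporter, and it assumes Justia keeps the URL pattern shown in the two examples above.</p>

```python
import re

# Volume, the "U.S." reporter abbreviation, then the first page of the case,
# as in "347 U.S. 483". A pin cite such as "347 U.S. 483, 495" still
# yields the first page, 483.
US_REPORTS_CITE = re.compile(r"\b(\d+)\s+U\.\s?S\.\s+(\d+)\b")


def justia_urls(text):
    """Return a Justia URL for each U.S. Reports citation found in text."""
    return ["http://supreme.justia.com/us/%s/%s/case.html" % m.groups()
            for m in US_REPORTS_CITE.finditer(text)]


print(justia_urls("Brown v. Board of Education, 347 U.S. 483 (1954)"))
# ['http://supreme.justia.com/us/347/483/case.html']
```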
<p id="id103490">To create a URL for other U.S. court systems, you'll have to look up the proper way to cite them in a resource like the Bluebook and then look for versions of that court's cases online with URLs that reflect the citation in a manner that lets you automate the creation of the URL. This is a theme for linking to any kind of law on the web, and you can be sure that developers at the Legal Information Institute, LexisNexis, WestLaw, and other legal publishers have put plenty of time into developing regular expressions to make this happen so that they can turn plain text citations into hypertext links. (It would be great if the LII made their regular expressions public. LexisNexis and WestLaw never would, although they're more interested in keeping such proprietary work away from each other than from us.)  </p>

<p id="id103523">Legislation can be more complicated, but two excellent resources make it remarkably simple: the Library of Congress's <a id="id103528" href="http://thomas.loc.gov/home/thomas.php">THOMAS</a> system lets you create persistent URLs for legislation using the <a id="id103536" href="http://thomas.loc.gov/home/handles/help.html">handle system</a> (see also <a id="id103543" href="http://www.handle.net/factsheet.html">its inventor's web page on it</a>), which I hadn't heard of before the Government Linked Data meeting. The Law Librarian Blog has a <a id="id103552" href="http://lawprofessors.typepad.com/law_librarian_blog/2008/10/lc-thomas-imple.html">nice entry</a> showing examples of how to use it. <a id="id103561" href="http://legislink.org/">LegisLink</a> is another way to link to legislation, and looks simpler to me. A Legal Information Institute <a id="id103569" href="http://blog.law.cornell.edu/voxpop/tag/persistent-urls-for-legal-information/">blog entry</a> has a good explanation of this, and LegisLink provides an excellent <a id="id103578" href="http://legislink.org/us">form</a> to construct the URLs. These even let you construct links to a specific section of a piece of legislation. </p>

<p id="id103587">Granularity is an even bigger issue when linking to code and regulations, which are often broken down into numbered and lettered pieces of pieces of pieces. Ever since I worked at the grandly named <a id="id103593" href="http://ria.thomsonreuters.com/">Research Institute of America</a> (a publisher of hyperlinked U.S. tax law and related information), it's always irked me to see people refer to a pension plan as a 401K, because as subsection k of section 401 of the U.S. Tax Code (title 26 of the U.S. Code), it's more properly written 401(k), or, to use its full citation, 26 USC 401(k). The Government Printing Office lets you link directly to section 401, if not subsection k, with the URL <a id="id103882" href="http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=browse_usc&amp;docid=Cite:+26USC401">http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=browse_usc&amp;docid=Cite:+26USC401</a>, and the LII lets you link to it with <a id="id103893" href="http://www.law.cornell.edu/uscode/26/usc_sec_26_00000401----000-.html">http://www.law.cornell.edu/uscode/26/usc_sec_26_00000401----000-.html</a>. </p>
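<p>The same approach works for the U.S. Code, with one extra step: the subsection has to be dropped, because neither site links below the section level. In this Python sketch the GPO pattern follows the example URL directly. For the LII URL, the eight-digit zero padding of the section number is inferred from the single 26 USC 401 example, and using the title as written is a guess that may not hold for one-digit titles. Section numbers containing letters are not handled.</p>

```python
import re

# Title, "USC" (with or without periods), section number, and any
# subdivisions such as "(k)", as in "26 USC 401(k)" or "26 U.S.C. 401(k)".
USC_CITE = re.compile(r"\b(\d+)\s+U\.?S\.?C\.?\s+(\d+)((?:\([0-9A-Za-z]+\))*)")


def usc_links(cite):
    """Return section-level GPO and LII URLs for a U.S. Code citation, or None."""
    m = USC_CITE.search(cite)
    if m is None:
        return None
    title, section, subdivisions = m.groups()
    # Neither site links below the section, so subdivisions are only reported.
    gpo = ("http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi"
           "?dbname=browse_usc&docid=Cite:+%sUSC%s" % (title, section))
    # Section zero-padded to eight digits, as in the 26 USC 401 example;
    # the title is used as written (padding for one-digit titles is unknown).
    lii = ("http://www.law.cornell.edu/uscode/%s/usc_sec_%s_%08d----000-.html"
           % (title, title, int(section)))
    return {"section": "%s USC %s" % (title, section),
            "dropped": subdivisions, "gpo": gpo, "lii": lii}
```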

<p id="id103903">That's the US Code, which arranges the laws by topic. Regulations are arranged by topic in the CFR, or Code of Federal Regulations. For example, the legal definition of bourbon is in title 27 of the CFR (Alcohol, Tobacco Products and Firearms), Part 5 (Labeling and Advertising of Distilled Spirits), section 22 (The standards of identity), subsection b (Class 2; whisky), and within that, paragraph (1)(i). The full citation would be 27 CFR 5.22(b)(1)(i), but I know of no way to link to anything more specific than 27 CFR 5.22: <a id="id103914" href="http://edocket.access.gpo.gov/cfr_2010/aprqtr/27cfr5.22.htm">http://edocket.access.gpo.gov/cfr_2010/aprqtr/27cfr5.22.htm</a>. (Bookmark that on your phone's browser and then bet a Maker's Mark with the next barroom loudmouth that you hear insisting that bourbon must legally be made in Bourbon County, Kentucky. He's wrong. It can be made anywhere in the United States.)</p>
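<p>A CFR citation converts the same way. In this Python sketch the <tt>cfr_2010/aprqtr</tt> edition path is copied from the URL above and exposed as a parameter; that other annual editions follow the same path pattern is an assumption. Because the section is the finest unit you can link to, the function discards paragraph designations such as (b)(1)(i).</p>

```python
import re

# Title, "CFR" (with or without periods), part.section, then any paragraph
# designations, as in "27 CFR 5.22(b)(1)(i)".
CFR_CITE = re.compile(r"\b(\d+)\s+C\.?F\.?R\.?\s+(\d+)\.(\d+)((?:\([0-9A-Za-z]+\))*)")


def cfr_section_url(cite, edition="cfr_2010/aprqtr"):
    """Return a section-level GPO URL for a CFR citation, or None."""
    m = CFR_CITE.search(cite)
    if m is None:
        return None
    title, part, section, _paragraphs = m.groups()
    # The linkable unit is the whole section, so "(b)(1)(i)" is discarded.
    return "http://edocket.access.gpo.gov/%s/%scfr%s.%s.htm" % (
        edition, title, part, section)
```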

<p id="id103928">As you can see, there's some work involved in creating URLs for links to laws, but research for this blog entry led me to new resources like LegisLink that I hadn't heard of before, so I encourage you to let me know if there's anything important that I'm missing. </p>

<p id="id103936">It was also interesting to see that the LII is involved in <a id="id103940" href="http://topics.law.cornell.edu/wiki/lexcraft/urn_lex">efforts</a> to create an international standard for legal document URIs proposed by some Italian legal researchers. (This is particularly interesting when you consider that Italian legal researchers basically <a id="id103951" href="http://www.oreillynet.com/xml/blog/2003/05/when_did_linking_begin.html">invented the concept of linking</a> 900 years ago.) </p>

<hr/>

<p>A comment from Frank Bennett of Nagoya University's Faculty of Law:</p>

<p>These are indeed important developments. The systematic linking of
case law and statutory data promises to have a large and positive
impact on our access to legal resources. The only point I would take
issue with is the reliance on Bluebook citation forms as the Rosetta
stone for identifying resources. Parsing cites out of plain text is a
necessary kludge, given the general absence of meaningful structured
metadata from online legal resources (thank you Lexis, thank you
Westlaw), but it should be recognized as a kludge.</p>

<p>To get a lively set of service layers running on top of legal data,
the metadata contained in or relevant to a particular case, statutory
provision or regulatory provision needs to be readily accessible to
calling applications. While it is true that string parsing machinery
can be written to a good standard, assuming perfectly regular citation
forms and uniform document formats, neither of those constraints
applies in the wild. The Bluebook shares the field in North America
with the ALWD and the McGill Guide. To make matters worse, the
Bluebook specifies citation forms for some foreign legal resources
that vary significantly from the native citation forms of the target
jurisdictions. Document formats vary as well, so getting an accurate
string parse may require special-purpose serialization of the document
before applying a string parser to the text -- which may be hundreds
of pages in length. Although certainly better than nothing, string
parsing is a fragile strategy that would be very cumbersome to
standardize and does not scale well.</p>

<p>Matching rendered cites to URLs is an important prospect, but we won't
see significant progress at the application level until the
intervening step of producing true structured metadata -- and
embedding it in our online resources -- is covered.</p>
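<p>To make the fragility point concrete: a scraper written for one citation style quietly misses another. The sketch below is a deliberately simple regular expression, not any real citation parser. It finds "26 U.S.C. § 401(k)" and "26 USC 401(k)", but not "I.R.C. § 401(k)", another common way to cite the same Internal Revenue Code provision.</p>
<pre>
import re

# Deliberately simple, for illustration only: it recognizes two spellings
# of a U.S. Code citation and nothing else.
USC_CITE = re.compile(
    r"(\d+)\s+U\.?\s?S\.?\s?C\.?\s+(?:§+\s*)?(\d+)((?:\([0-9a-zA-Z]+\))*)"
)


def find_usc_cites(text):
    """Return (title, section, subdivisions) tuples for each cite found."""
    cites = []
    for title, section, rest in USC_CITE.findall(text):
        subdivisions = re.findall(r"\(([0-9a-zA-Z]+)\)", rest)
        cites.append((int(title), int(section), subdivisions))
    return cites


text = (
    "See 26 U.S.C. § 401(k) and 26 USC 401(k); compare "
    "I.R.C. § 401(k), which cites the same provision."
)
print(find_usc_cites(text))  # the I.R.C. form is silently missed
</pre>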

<hr/>

<p>A comment from Augusto Herrmann:</p>

<p>I just read your interesting article entitled "Linking linked data to U.S.
law". I'd like to point you to a quite successful government project that
uses URNs for Brazilian legislation. The portal where you can search for
legislation is at http://www.lexml.gov.br and information about the project
can be found at http://projeto.lexml.gov.br . There you can find the
document <a href='http://projeto.lexml.gov.br/documentacao/Parte-2-LexML-URN.pdf'>"Parte 2: LEXML
URN"</a>
which describes the rules to construct official URNs for legislation and
court decisions (it's in Portuguese, though). The project started circa 2004
and closely followed in the footsteps of the Italian Norme in Rete project. If
you aren't yet familiar with it, it's worth a look (see also akomantoso.org and
metalex.eu).</p>



<hr id="id103960"/>
<p id="id103963">(<b id="id103966">Note on comments:</b> after I turned off comments on this blog for a few days because of comment spam, turning them back on seems to have had no effect. If you send me an email about what I've written at snee.com (bob), I'll add it and any response here.)</p>

    </div>
]]>
    </content>
</entry>

<entry>
    <title>My upcoming O&apos;Reilly book: &quot;Learning SPARQL&quot;</title>
    <link rel="alternate" type="text/html" href="http://www.snee.com/bobdc.blog/2011/06/my-upcoming-oreilly-book-learn.html" />
    <id>tag:www.snee.com,2011:/bobdc.blog//2.662</id>

    <published>2011-06-01T14:07:13Z</published>
    <updated>2011-06-01T14:14:32Z</updated>

    <summary>Querying and Updating with SPARQL 1.1.</summary>
    <author>
        <name>Bob DuCharme</name>
        <uri>http://www.snee.com/bobdc.blog</uri>
    </author>
    
        <category term="SPARQL" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="publishing" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="sparql" label="sparql" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.snee.com/bobdc.blog/">
        Querying and Updating with SPARQL 1.1.
        <![CDATA[    <div id="id103332">


<a id="id103334" href="http://www.learningsparql.com"><img id="id103339" src="http://www.learningsparql.com/img/cover.jpg" width="200" border="0" align="right" hspace="30px" vspace="30px" alt="Learning SPARQL cover"/></a>

<p id="id103360">51 weeks ago at <a id="id103363" href="http://semtech2010.semanticuniverse.com/">last year's semtech</a> I couldn't believe that there was still no book about SPARQL available. I had accumulated notes for such a book, and by that point I'd learned enough about SPARQL as a TopQuadrant employee that I decided to start studying the specifications (especially those for SPARQL 1.1) more systematically and write the book myself. (This explains why I've been writing less on my blog in the last year and <a id="id103377" href="http://www.snee.com/bobdc.blog/metadata/rdf/sparql/">writing about SPARQL</a> more when I do.)</p>
<p id="id103386">I'm proud to announce that I'm publishing the book with O'Reilly. Print and electronic versions will be available in July at the latest, and we're already planning on releasing an expanded edition with additional new material and any necessary updates once SPARQL 1.1 becomes a Recommendation. Anyone who buys the ebook version of the first edition will get the expanded edition on SPARQL 1.1 at no extra cost.</p>

<p id="id103397">As you can tell from the book's cover on the right, the O'Reilly animal for this one is the anglerfish&#8212;the one with the light that hangs off the front of its head, for the pun on "sparkle". (I should really pick up the <a id="id103411" href="http://www.neatoshop.com/product/Deep-Sea-Anglerfish-LED-Light?tag=2302">nightlight version</a> of this lovely fish.) </p>

<p id="id103419">From what I've seen so far, the only coverage of SPARQL in any existing books is a chapter or two in more general books on the semantic web, and I haven't seen any coverage of SPARQL 1.1 in those books just yet. (The second edition of Dean Allemang and Jim Hendler's <a id="id103426" href="http://www.amazon.com/exec/obidos/ISBN=0123859654/bobducharmeA/">Semantic Web for the Working Ontologist</a>, which is available on Amazon today, covers some SPARQL 1.1 query features, but not SPARQL Update.) "Learning SPARQL" is the first complete book on SPARQL, and covers both 1.0 and 1.1&#8212;including <a id="id103440" href="http://www.w3.org/TR/sparql11-update/">SPARQL Update</a>&#8212;with working sample queries and data that you can try yourself with free software.</p>

<p id="id103452">I parked the domain name <a id="id103455" href="http://www.learningsparql.com">learningsparql.com</a> some time ago, and now there's a full web site about the book there. For up-to-date information about the book's availability and SPARQL news in general, subscribe to the twitter feed <a id="id103465" href="http://twitter.com/#!/LearningSPARQL">@LearningSPARQL</a>.</p>

    </div>
]]>
    </content>
</entry>

<entry>
    <title>Semantic web technology at NASA: lower costs and greater productivity</title>
    <link rel="alternate" type="text/html" href="http://www.snee.com/bobdc.blog/2011/05/semantic-web-at-nasa-lower-cos.html" />
    <id>tag:www.snee.com,2011:/bobdc.blog//2.661</id>

    <published>2011-05-27T21:54:52Z</published>
    <updated>2011-05-27T22:28:40Z</updated>

    <summary>An inspiring story.</summary>
    <author>
        <name>Bob DuCharme</name>
        <uri>http://www.snee.com/bobdc.blog</uri>
    </author>
    
        <category term="semantic web" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="nasa" label="NASA" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="semanticweb" label="semanticweb" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.snee.com/bobdc.blog/">
        An inspiring story.
        <![CDATA[
    <div id="id103348">



<p id="id103351">Ian Jacobs's recent <a id="id103354" href="http://www.w3.org/QA/2011/05/semantic_web_its_not_rocket_sc.html">interview with NASA's Jeanne Holm</a> on the W3C website is an excellent case study of semantic web technology. It's not a long article, so I recommend that you read the whole thing. Here are a few points that caught my eye:</p>
<img id="id103366" src="http://humbabe.arc.nasa.gov/MarsDustWorkshop/NASA_Logo.gif" border="0" align="right" hspace="30px" vspace="30px" alt="NASA logo" width="140"/>
<ul id="id103387">
<li id="id103390"><p id="id103391">She gives nice hard numbers about money spent and money saved, and describes a downward trend in costs.</p></li>
<li id="id103394"><p id="id103396">They used publication data to infer social networks and shared expertise and found other related ways to reduce the need for staff data entry.</p></li>
<li id="id103399"><p id="id103400">The use of service agreements encouraged people to share data more easily.</p></li>
<li id="id103405"><p id="id103407">This sharing led to demonstrable serendipitous reuse of data.</p></li>
<li id="id103411"><p id="id103413">They plan to network the vocabularies (she doesn't use this term literally&#8212;I know it from a <a id="id103418" href="http://www.topquadrant.com/solutions/ent_vocab_net.html">TopQuadrant context</a>&#8212;but she's clearly talking about the same thing).</p></li>
</ul>
<p id="id103429"> It was nice to see the credit that she gave to Kendall Clark. With my TopQuadrant hat on, I wish she'd mentioned some of the <a id="id103434" href="http://www.scribd.com/doc/25387652/NASA-Constellation-Program-Ontologies-Ralph-Hodgson-20080320">extensive work</a> that Ralph Hodgson has done there, but NASA is a big organization.</p>
<p id="id103444">After reading Danny Ayers' <a id="id103448" href="http://dannyayers.com/2011/05/27/Smell-the-coffee">Smell the coffee</a> blog post this morning, which wasn't very hopeful about recent progress in the semantic web, I <a id="id103457" href="http://twitter.com/#!/bobdc/status/74163734284742656">hoped that</a> Ian's interview with Jeanne would cheer him up.</p>

    </div>
]]>
    </content>
</entry>

<entry>
    <title>Using SPARQL to find the right DBpedia URI</title>
    <link rel="alternate" type="text/html" href="http://www.snee.com/bobdc.blog/2011/05/using-sparql-to-find-the-right.html" />
    <id>tag:www.snee.com,2011:/bobdc.blog//2.660</id>

    <published>2011-05-17T12:40:52Z</published>
    <updated>2011-05-17T12:45:03Z</updated>

    <summary>Even with the wrong name.</summary>
    <author>
        <name>Bob DuCharme</name>
        <uri>http://www.snee.com/bobdc.blog</uri>
    </author>
    
        <category term="SPARQL" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="sparqldbpedia" label="sparql dbpedia" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.snee.com/bobdc.blog/">
        Even with the wrong name.
        <![CDATA[    <div id="id103348">

<img id="id103352" src="http://www.snee.com/bobdc.blog/img/BobMarly.jpg" border="0" align="right" hspace="30px" vspace="30px" alt="Bob Marley picture retrieved with the name Bob Marly"/>

<p id="id103370">In <a id="id103372" href="http://www.snee.com/bobdc.blog/2011/02/pulling-skos-preflabel-and-alt.html">Pulling SKOS prefLabel and altLabel values out of DBpedia</a>, I described how Wikipedia and DBpedia store useful data about alternative names for resources described on Wikipedia, and I showed how you can use these to populate a SKOS dataset's alternative and preferred label properties. Today I want to show how to use these as part of an application that lets you retrieve data even when you don't necessarily have the right name for something&#8212;for example, retrieving a picture of Bob Marley using the misspelled version of  his name "Bob Marly". </p>
<p id="id103396">The <a id="id103398" href="http://dbpedia.org/page/Bob_Marley">DBpedia page for Bob Marley</a> shows that dbpedia:Bob_Marly is one of the resources whose dbpedia-owl:wikiPageRedirects value is http://dbpedia.org/resource/Bob_Marley. This means that if you send your browser to <a id="id103405" href="http://en.wikipedia.org/wiki/Bob_Marly">http://en.wikipedia.org/wiki/Bob_Marly</a>, you'll end up on <a id="id103413" href="http://en.wikipedia.org/wiki/Bob_Marley">http://en.wikipedia.org/wiki/Bob_Marley</a>. </p>
<p id="id103421">It doesn't show that this redirect URI has  the rdfs:label value "Bob Marly"@en associated with it, and this is the really handy part for retrieving data based on not-quite-right values. Because of this, the following SPARQL query will return the URI http://dbpedia.org/resource/Bob_Marley whether the quoted literal value is "Bob Marly" or "Bob Marley":</p>

<pre id="id103431">
# First two PREFIX declarations unnecessary on <a id="id103435" href="http://dbpedia.org/snorql/?query=PREFIX+foaf%3A+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0D%0APREFIX+dbo%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E%0D%0A%0D%0ASELECT+%3Fs+WHERE+{%0D%0A++{%0D%0A++++%3Fs+rdfs%3Alabel+%22Bob+Marly%22%40en+%3B%0D%0A+++++++a+owl%3AThing+.+++++++%0D%0A++}%0D%0A++UNION%0D%0A++{%0D%0A++++%3FaltName+rdfs%3Alabel+%22Bob+Marly%22%40en+%3B%0D%0A+++++++++++++dbo%3AwikiPageRedirects+%3Fs+.%0D%0A++}%0D%0A}">SNORQL</a>
PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX owl: &lt;http://www.w3.org/2002/07/owl#&gt;
PREFIX dbo: &lt;http://dbpedia.org/ontology/&gt;

SELECT ?s WHERE {
  {
    ?s rdfs:label "Bob Marly"@en ;
       a owl:Thing .       
  }
  UNION
  {
    ?altName rdfs:label "Bob Marly"@en ;
             dbo:wikiPageRedirects ?s .
  }
}
</pre>
<p id="id103462">The graph pattern before the UNION keyword checks whether there is an actual Wikipedia page for the quoted value, and the part after checks whether it's a redirect of something else. Effectively, it will be one or the other; there are only about a dozen labels in DBpedia that can be both.</p>
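<p>If the UNION logic is easier to follow in procedural form, here's a toy Python model of the two branches. The handful of triples is a hypothetical stand-in for the DBpedia data described above, with prefixed names written as plain strings and language tags dropped.</p>
<pre>
# Toy model: prefixed names are plain strings and language tags are
# dropped. These triples are hypothetical stand-ins for DBpedia's data.
TRIPLES = {
    ("dbpedia:Bob_Marley", "rdfs:label", "Bob Marley"),
    ("dbpedia:Bob_Marley", "rdf:type", "owl:Thing"),
    ("dbpedia:Bob_Marly", "rdfs:label", "Bob Marly"),
    ("dbpedia:Bob_Marly", "dbo:wikiPageRedirects", "dbpedia:Bob_Marley"),
}


def resolve(label):
    """Return the resources a label identifies, directly or via a redirect."""
    # First branch: ?s rdfs:label label ; a owl:Thing .
    direct = {
        s for (s, p, o) in TRIPLES
        if p == "rdfs:label" and o == label
        and (s, "rdf:type", "owl:Thing") in TRIPLES
    }
    # Second branch: ?altName rdfs:label label ; dbo:wikiPageRedirects ?s .
    redirected = {
        o for (alt, p, o) in TRIPLES
        if p == "dbo:wikiPageRedirects" and (alt, "rdfs:label", label) in TRIPLES
    }
    # Set union stands in for UNION here (SPARQL's UNION keeps duplicates).
    return direct | redirected


print(resolve("Bob Marly"))
print(resolve("Bob Marley"))
</pre>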

<p id="id103471">To use this in a simple application, I created a <a id="id103474" href="http://www.snee.com/sparqlforms/findWikipediaImage.html">form</a> that, when you enter a name, attempts to display a picture of what you entered. Because the redirect data includes common misspellings as well as nicknames, entering "Bob Marly" will get you a picture of Marley with the URL of the actual resource below it, as in the image above. Other interesting nicknames and misspellings to try are Bob Dillan, Mary Casat, Prince Billy, Big Blue, and Proctor and Gamble. (Warning: DBpedia image data is incorrect for some very well-known people, like Abraham Lincoln and Barack Obama, even when the Wikipedia page has a picture, so you may see the symbol for a broken image link. I had hoped to use "<a id="id103493" href="http://en.wikipedia.org/wiki/Abe_Lincon">Abe Lincon</a>" as the example in the image above.) </p>

<p id="id103501">Because the output creates a specialized web page, I used the technique I described in <a id="id103505" href="http://www.ibm.com/developerworks/xml/library/x-wikiquery/">Build Wikipedia query forms with semantic technology</a> (which can be used with any SPARQL endpoint, not just DBpedia): a CGI Python script stores a SPARQL query, replaces a string in that query with whatever was entered in the form, sends the query off to the endpoint, and then sends HTML based on the result back to the browser. You can see the source <a id="id103518" href="http://www.snee.com/sparqlforms/findWikipediaImage.txt">here</a>.</p>
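<p>Here's a minimal Python sketch of the query-template part of that technique. It isn't the script linked above. The placeholder token, the function names, and the <tt>format</tt> request parameter are assumptions, the CGI form handling is left out, and the query omits its PREFIX declarations on the assumption that the endpoint predeclares <tt>rdfs:</tt>, <tt>owl:</tt>, and <tt>dbo:</tt>, as DBpedia's Virtuoso endpoint appears to.</p>
<pre>
import json
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# The stored query. NAME_PLACEHOLDER is a hypothetical token that gets
# replaced by the form input; the prefixes are assumed to be predeclared.
QUERY_TEMPLATE = """SELECT ?s WHERE {
  { ?s rdfs:label "NAME_PLACEHOLDER"@en ;
       a owl:Thing . }
  UNION
  { ?altName rdfs:label "NAME_PLACEHOLDER"@en ;
             dbo:wikiPageRedirects ?s . }
}"""


def build_query(name):
    """Substitute the entered name into the query as a SPARQL string literal."""
    # Escape backslashes first, then double quotes, so the input can't
    # end the string literal early.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return QUERY_TEMPLATE.replace("NAME_PLACEHOLDER", escaped)


def request_url(name):
    """Build a GET URL that asks the endpoint for SPARQL JSON results."""
    params = {
        "query": build_query(name),
        "format": "application/sparql-results+json",  # assumed parameter
    }
    return ENDPOINT + "?" + urlencode(params)


def result_uris(json_text):
    """Pull the ?s values out of a SPARQL JSON results document."""
    data = json.loads(json_text)
    return [row["s"]["value"] for row in data["results"]["bindings"]]


print(request_url("Bob Marly"))
</pre>
<p>Passing the string returned by <tt>request_url("Bob Marly")</tt> to Python's <tt>urllib.request.urlopen()</tt> would send the query, and <tt>result_uris()</tt> could then read the response body. Escaping quotes and backslashes before the substitution matters, because otherwise a name containing a double quote would end the SPARQL string literal early.</p>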

<p id="id103527">It's safe to say that this ability to find the right information based on a nickname or common misspelling could add value to many applications. Once again, while the most important part of the semantic web is the data&#8212;in this case, DBpedia's <a id="id103537" href="http://dbpedia.org/ontology/wikiPageRedirects">wikiPageRedirects</a> values&#8212;and not the standards and technologies used to get at the data, the existence of so much useful SPARQL-accessible data should make the SPARQL query language look more and more appealing to people who might have doubted before. </p>
    </div>
]]>
    </content>
</entry>

<entry>
    <title>SKOS overview article on IBM developerWorks</title>
    <link rel="alternate" type="text/html" href="http://www.snee.com/bobdc.blog/2011/05/skos-overview-article-on-ibm-d.html" />
    <id>tag:www.snee.com,2011:/bobdc.blog//2.658</id>

    <published>2011-05-11T14:04:44Z</published>
    <updated>2011-05-11T14:05:40Z</updated>

    <summary>SKOS, vocabulary management, the semantic web, and more</summary>
    <author>
        <name>Bob DuCharme</name>
        <uri>http://www.snee.com/bobdc.blog</uri>
    </author>
    
        <category term="SKOS" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="skosdbpediasparql" label="SKOS dbpedia SPARQL" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.snee.com/bobdc.blog/">
        SKOS, vocabulary management, the semantic web, and more
        <![CDATA[    <div id="id103334">

<a id="id103337" href="http://www.ibm.com/developerworks/xml/library/x-skostaxonomy/index.html"><img id="id103341" src="http://www.ibm.com/developerworks/i/dwwordmark.gif" border="0" align="right" hspace="30px" vspace="30px" alt="developerWorks logo"/></a>

<p id="id103359">I've been interested in the SKOS standard for vocabulary management for several years (and have written about it <a id="id103364" href="http://www.snee.com/bobdc.blog/metadata/rdf/skos/">here</a> several times), but since we at TopQuadrant first began planning out the <a id="id103372" href="http://www.topquadrant.com/solutions/ent_vocab_net.html">Enterprise Vocabulary Net</a> product, I've learned a lot more about the theory and practice of using SKOS. I've recently written up an overview of SKOS and where it fits into vocabulary management and the semantic web, and IBM developerWorks has just published this as <a id="id103384" href="http://www.ibm.com/developerworks/xml/library/x-skostaxonomy/index.html">Improve your taxonomy management using the W3C SKOS standard</a>. I hope it proves useful to people who want to learn more about SKOS.</p>


    </div>
]]>
    </content>
</entry>

<entry>
    <title>Quick and dirty linked data content negotiation</title>
    <link rel="alternate" type="text/html" href="http://www.snee.com/bobdc.blog/2011/05/quick-and-dirty-linked-data-co.html" />
    <id>tag:www.snee.com,2011:/bobdc.blog//2.657</id>

    <published>2011-05-09T14:32:08Z</published>
    <updated>2011-05-09T14:36:39Z</updated>

    <summary>Not even that dirty.</summary>
    <author>
        <name>Bob DuCharme</name>
        <uri>http://www.snee.com/bobdc.blog</uri>
    </author>
    
        <category term="RDF" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="restlinked_datahttprdf" label="rest linked_data http rdf" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.snee.com/bobdc.blog/">
        Not even that dirty.
        <![CDATA[    <div id="id103362">



<p id="id103375">I've managed to fill a key gap in the world's supply of Linked Open Data by publishing triples that connect Mad Magazine film parody titles to the DBpedia URIs of the actual films. For example:</p>

<pre id="id103382">
&lt;http://dbpedia.org/resource/Judge_Dredd_%28film%29&gt;
      mad:FilmParody
              [ prism:CoverDate "1995-08-00" ;
                prism:issueIdentifier
                        "338" ;
                dc:title "Judge Dreck"
              ] .

&lt;http://dbpedia.org/resource/2001:_A_Space_Odyssey_%28film%29&gt;
      mad:FilmParody
              [ prism:CoverDate "1969-03-00" ;
                prism:issueIdentifier "125" ;
                dc:title "201 Minutes of a Space Idiocy"
              ] .
</pre>
<p id="id103398">(To prepare the data, I scraped a <a id="id103401" href="http://en.wikipedia.org/wiki/List_of_Mad's_movie_spoofs">Wikipedia list</a>, tested the URIs, then hand-corrected a few.) To really make this serious RESTful linked open data, I wanted to make it available as both RDF/XML and Turtle depending on the <tt id="id103412">Accept</tt> value in the header of the HTTP request. All this took was a few lines in the <tt id="id103417">.htaccess</tt> file (which I've been learning <a id="id103422" href="http://www.snee.com/bobdc.blog/2011/04/form-driven-sparql-queries-wit.html">more about lately</a>) in the directory storing the RDF/XML and Turtle versions of the data.</p>
<p id="id103430">For example, either of the following two commands retrieves the Turtle version: 
</p>
<pre id="id103437">
<a id="id103439" href="http://www.gnu.org/software/wget/">wget</a> --header="Accept: text/turtle" http://www.rdfdata.org/dat/MadFilmParodies/
<a id="id103449" href="http://curl.haxx.se/">curl</a> --header "Accept: text/turtle" -L http://www.rdfdata.org/dat/MadFilmParodies/
</pre>

<p id="id103459">Substituting <tt id="id103462">application/rdf+xml</tt> for <tt id="id103467">text/turtle</tt> in either command gets you the RDF/XML version, and omitting the <tt id="id103472">--header</tt> parameter altogether gets you an HTML version.</p>
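<p>The same requests can come from a program. This Python sketch only builds them, using the standard library's <tt>urllib.request.Request</tt>; the helper name is hypothetical. Passing one of these requests to <tt>urllib.request.urlopen()</tt> would send it and, like <tt>wget</tt> and <tt>curl -L</tt>, follow any redirect the server returns.</p>
<pre>
from urllib.request import Request

DATA_URL = "http://www.rdfdata.org/dat/MadFilmParodies/"


def conneg_request(media_type):
    """Build (but don't send) a GET request asking for one serialization."""
    return Request(DATA_URL, headers={"Accept": media_type})


turtle_request = conneg_request("text/turtle")
rdfxml_request = conneg_request("application/rdf+xml")
print(turtle_request.get_header("Accept"))
</pre>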
<p id="id103478">Here's the complete <tt id="id103481">.htaccess</tt> file:</p>
<pre id="id103486">
RewriteEngine on

RewriteCond %{HTTP_ACCEPT} ^.*text/turtle.*
RewriteRule ^index.html$ http://www.rdfdata.org/dat/MadFilmParodies/MadFilmParodies.ttl [L]
# no luck:
#RewriteRule ^index.html$ http://www.rdfdata.org/dat/MadFilmParodies/MadFilmParodies.ttl [R=303,L]

RewriteCond %{HTTP_ACCEPT} ^.*application/rdf\+xml.*
RewriteRule ^index.html$ http://www.rdfdata.org/dat/MadFilmParodies/MadFilmParodies.rdf [L]

RewriteRule ^index.html$ http://en.wikipedia.org/wiki/List_of_Mad's_movie_spoofs
</pre>


<p id="id103502">The Apache web server where I have this hosted is configured to look for an index.html file in a directory if the requested URL doesn't mention a specific filename, so each of the three rules here modifies that "request" to look for something else; the first two depend on what their <tt id="id103509">RewriteCond</tt> lines find in the <tt id="id103513">HTTP_ACCEPT</tt> value. If the first one finds "text/turtle", it sends the Turtle version of my data, and the <tt id="id103530">L</tt> flag tells the Apache mod_rewrite module that is processing these instructions not to look at any more of them.</p>
<p id="id103541">The next rule performs the corresponding <tt id="id103544">HTTP_ACCEPT</tt> check and file delivery for an RDF/XML request, and the default behavior if neither of those happens is to deliver an HTML version of the data. (I took the lazy way out and just redirected to the appropriate Wikipedia page instead of creating a new HTML file.) As you can see from the two commented-out lines, I <a id="id103554" href="http://www.qc4blog.com/?p=934">had the impression</a> that adding <tt id="id103561">R=303</tt> in the brackets with the <tt id="id103565">L</tt> would send an HTTP return code of <a id="id103570" href="http://en.wikipedia.org/wiki/HTTP_303">303</a> back to the requester, overriding the default code of <a id="id103577" href="http://en.wikipedia.org/wiki/HTTP_302">302</a>, but I never got that to work. If anyone has any suggestions about how to fix this, or whether 303 is even the most appropriate return code, please let me know.</p>
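<p>To reason about which representation a given <tt>Accept</tt> header gets, it can help to model the three rules outside Apache. This hypothetical Python function mimics them: each <tt>RewriteCond</tt> pattern is checked in order, the first match wins (as with <tt>L</tt>), and anything else falls through to the Wikipedia page. It's a model for thinking about the rules, not a description of how mod_rewrite works internally.</p>
<pre>
import re

TTL = "http://www.rdfdata.org/dat/MadFilmParodies/MadFilmParodies.ttl"
RDF = "http://www.rdfdata.org/dat/MadFilmParodies/MadFilmParodies.rdf"
HTML = "http://en.wikipedia.org/wiki/List_of_Mad's_movie_spoofs"

# (RewriteCond pattern, target) pairs, in the same order as .htaccess.
RULES = [
    (r"^.*text/turtle.*", TTL),
    (r"^.*application/rdf\+xml.*", RDF),
]


def choose_target(accept_header):
    """Return the URL the rules above would pick for an Accept header."""
    for pattern, target in RULES:
        if re.search(pattern, accept_header or ""):
            return target  # like [L]: stop at the first rule that fires
    return HTML  # the unconditional last rule


for accept in ["text/turtle", "application/rdf+xml", "*/*", None]:
    print(accept, choose_target(accept))
</pre>
<p>One consequence the model makes visible: the rules look for substrings and ignore quality values, so a client that sends <tt>application/rdf+xml, text/turtle;q=0.5</tt> to say that it prefers RDF/XML still gets Turtle, because the Turtle rule comes first.</p>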
<p id="id103585">From what I've read on how the syntax of these instructions works, I shouldn't have needed the full URLs for the Turtle and RDF/XML versions of the Mad Film Parody data, because they were in the same directory as the <tt id="id103591">.htaccess</tt> file, but that was the only way I could get this to work. </p>

<p id="id103598">Now that I know how to do this, I can do it again for other resources pretty quickly. It took me about five minutes to do it for the little http://www.snee.com/ns/madMag/MadFilmParody ontology that the data points to. I consider this solution quick and a bit dirty because it requires the maintenance of two copies of the data, but the XML guy in me knows that it would be wrong to perform parallel edits on the two copies, and that I should instead pick one as a master, edit it when necessary, and generate the other from it. If I had to do this on a larger scale, I learned from Brian Sletten at <a id="id103611" href="http://semtech2010.semanticuniverse.com/sessionPop.cfm?confid=42&amp;proposalid=3065">last year's semtech</a> that I should look into <a id="id103619" href="http://www.1060research.com/netkernel/">NetKernel</a>, but it was a good exercise to do it this way to learn what was really going on. </p>
<p id="id103629">I'm going to try to get into the habit of doing this for data and ontologies that I create, so I'd appreciate any suggestions about tweaking details before any suboptimal aspects of this become habits. </p>

<a id="id103633" href="http://www.dccomics.com/mad/about/?action=timeline"><img id="id103639" src="http://www.dccomics.com/mad/i/timeline/jul1990.jpg" border="0" style="display: block;margin-left: auto;margin-right: auto " hspace="30px" vspace="30px" alt="MAD cover"/></a>

    </div>
]]>
    </content>
</entry>

<entry>
    <title>Data providers</title>
    <link rel="alternate" type="text/html" href="http://www.snee.com/bobdc.blog/2011/05/data-providers.html" />
    <id>tag:www.snee.com,2011:/bobdc.blog//2.655</id>

    <published>2011-05-02T12:31:42Z</published>
    <updated>2011-05-02T12:32:39Z</updated>

    <summary>RDF or otherwise.</summary>
    <author>
        <name>Bob DuCharme</name>
        <uri>http://www.snee.com/bobdc.blog</uri>
    </author>
    
    <category term="rdf" label="rdf" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.snee.com/bobdc.blog/">
        RDF or otherwise.
        <![CDATA[    <div id="id103301">

<p id="id103303">While beta testing Talis's Kasabi, I got to wondering about the data publishing market: who out there is hosting raw data, potentially charging for it and passing money along to the data's providers? Poking around, I learned who the key names are. (Corrections welcome.) I accidentally stumbled across a few more when I followed a <a id="id103311" href="http://twitter.com/#!/xmlgrrl/status/62701417810509824">tweet</a> from @xmlgrrl (a.k.a. Eve Maler, a friend of mine in the XML world since it was the SGML world) and started looking at her husband Eli's blog. His posting <a id="id103321" href="http://www.eliasisrael.com/2011/04/05/ten-services-to-get-your-cloud-startup-off-the-ground-now/">Ten services to get your cloud startup off the ground now</a> mentioned a few more companies that provide raw data&#8212;one that even provides free RDF. I tagged a few with a <a id="id103330" href="http://www.delicious.com/bobdc/data">delicious.com</a> bookmark, but wanted to write out notes about a few here in order of how interesting they are to a semantic web geek.</p>


<p id="id103341">Some general notes:</p>
<ul id="id103346">
<li id="id103348"><p id="id103349">The more I studied, the more I found, but I didn't want to spend more than an afternoon on this.</p></li>
<li id="id103355"><p id="id103356">These sites all let you download data directly. I didn't include sites like <a id="id103360" href="http://www.data.gov/">Data.gov</a> that function more as directories that link to data sources on other sites. </p></li>
<li id="id103369"><p id="id103370">Most of these providers have boosted their numbers of available datasets by including small datasets with as few as 100 records, and by hosting copies of data from the well-known names in the <a id="id103376" href="http://richard.cyganiak.de/2007/10/lod/">Linked Data Cloud</a>. The advertised added value is typically the ease of programmatic access to that data. </p></li>
<li id="id103386"><p id="id103388">Despite the title of this blog entry (I was tempted to call it "Data resellers", but many make the data available for free) I focused on a more narrow case of data providers: the redistributors that gather data from specific, identified places and then make it available publicly with attribution, not actual data sources themselves such as government agencies, university projects, media making their metadata available, and various other circles on the Linked Data Cloud diagram.</p></li>

<li id="id103400"><p id="id103401">If I've quoted some companies' websites more than others, it's because they had "About" and "FAQ" pages that were easy to find and actually answered the questions I was wondering about.</p></li>
</ul>

<p id="id103409">The most interesting thing about  <a id="id103412" href="http://blog.kasabi.com/"><b id="id103417">Kasabi</b></a> in this field is their commitment to providing data according to Linked Data principles, giving you SPARQL endpoints for data sources and the ability to define new APIs around each data source. The current data selection is interesting, considering that Kasabi is still in beta. For now it all looks like data that is freely available elsewhere, but the advantages of retrieving it from them go beyond the ability to use the SPARQL query language. For example, with BestBuy's RDFa spread out across many different dynamically generated pages on bestbuy.com, querying this data from BestBuy's server has a lot of limitations. Kasabi seems to have the BestBuy data aggregated so that their customers have more flexibility in how they query it.</p>

<blockquote id="id103435" class="pullquote" style="width: 190px; font: bold 1.333em/1.125em &quot;Helvetica Neue&quot;, Helvetica, Arial, sans-serif; margin: 1.5em 0 1.5em 1.5em !important; padding: 0.6em 5px !important; background: none !important; border: 3px double #ddd; border-width: 3px 0; text-align: center; float: right; "><strong id="id103446">While disintermediation was a big buzzword of the dot com boom, intermediation is now getting bigger.</strong></blockquote>

<p id="id103451">I list <!-- founded 2007 --><b id="id103456"><a id="id103457" href="http://www.socrata.com/">Socrata</a></b> right after Kasabi because RDF is one of their export formats, along with XML, JSON, CSV, XLS, and more. In a business that depends on finding both data providers and data users, their home page makes the clearest case about why someone should work with them as a data provider: they're clearly targeting government agencies who need to fulfill data transparency mandates. (Other providers are certainly targeting this market; just not as clearly.) The <a id="id103471" href="http://www.socrata.com/company-info/">company info</a> page calls them "The Leader in Open Data Services for Government". Another paragraph on the homepage makes a nice case for why developers should be interested in their data, and upcoming webinar titles of "Launch your own Data.Gov" and "Open Data as a Service Delivery Platform" are also pretty catchy to someone interested in this market. </p>

<p id="id103485"><!-- founded 2007 --><b id="id103488"><a id="id103490" href="http://www.factual.com/">Factual</a></b> targets data users more than data providers on their current home page, telling developers "Access great data for your web and mobile apps". The only download format I could find was CSV, but with their emphasis on helping developers build apps, they focus more on data delivery through their RESTful <a id="id103502" href="http://wiki.developer.factual.com/w/page/29670788/Server-API">API</a>. According to their <a id="id103509" href="http://www.factual.com/FAQ">FAQ</a>, "Factual, Inc. is an open data platform for application developers that leverages large scale aggregation and community exchange... Factual's hosted data comes from our community of users, developers and partners, and from our powerful data mining tools... Factual offers several hundred thousand datasets across a variety of topics (with a deep focus in Local) aggregated from multiple sources, made easily accessible for developers to build web and mobile apps... Our APIs are free to everyone&#8212;if you want SLAs or have certain performance requirements, we would charge you a fee based on usage volume. Our downloads are free for smaller developers". A <a id="id103532" href="http://semantifi.wordpress.com/2010/02/11/data-is-the-future-of-web-latest-validation-from-prominent-investors/">press release</a> on Semantifi's web site shows that some big names and big money are behind Factual. </p>

<p id="id103543"><!-- founded 2008 --><b id="id103546"><a id="id103548" href="http://www.infochimps.com">Infochimps</a></b> seems to be one of the better-known (and more memorable) names in the field. From their <a id="id103556" href="http://www.infochimps.com/faq">FAQ</a>: "Infochimps is a place for people to find, share and sell formatted data. Both users and Infochimps employees scrape, parse and format data so that it's easily accessible to you. We take the chimp work out of working with data so you can literally start building cool stuff in minutes... There is no sign up fee to use Infochimps. Some of the data sets available on our site are free. Some require attribution, and others are available for purchase. 
The first 100,000 data API calls are free. We offer subscriptions if you would like to use more... The data sets available through our API are 1.) hosted for you and 2.) scraped on a regular basis. ... Most of our data comes in tsv, csv or yaml format". The part about users scraping, parsing, and formatting highlights another aspect of the business model of some of these companies: crowd-sourcing the labor whenever possible.</p>

<p id="id103593"><b id="id103594"><a id="id103595" href="http://www.aggdata.com">AggData</a></b> sells CSV files, typically of locations of all the stores in a particular chain. For example, a complete list of <a id="id103604" href="http://www.cinnabon.com/">Cinnabon</a> locations, with 454 records, costs $29. The <a id="id103612" href="http://www.aggdata.com/locations/cinnabon">description page</a> for each data set lists the fields and lets you download a sample. Prices that I saw ranged from $9 to $49. According to their <a id="id103621" href="http://www.aggdata.com/faq">FAQ</a>, you order a dataset, and when payment is confirmed they email you a URL for the data that is good for 5 downloads or 120 hours. Founded in 2006, AggData is the oldest of these companies and the most low-tech (no APIs here), but it's a lot easier to look at their lists of franchise locations and churches and imagine that data being useful to someone than it is with many of the other data providers. Infochimps lists AggData as a "featured data provider" and shows the same prices for the same datasets, so I'm not sure whether they're just routing you to the same batches of data or making it available through their own APIs. (I got an Infochimps ID, clicked through for an AggData dataset until it asked me for credit card information, and stopped there.)</p>

<!-- founded 2009 --><p id="id103644">According to their <a id="id103647" href="http://www.semantifi.com/SemantifiPortal.html">About</a> page, <b id="id103655"><a id="id103656" href="http://www.semantifi.com/semantifiHome.action?type=SI">Semantifi</a></b> "developed a meaning based search platform to search both structured and unstructured content and filed multiple patents". Along with the platform, they say that they have an "App Store like marketplace for a community of publishers to build data search apps" and that <!--a href='http://semantifi.wordpress.com/2010/02/11/data-is-the-future-of-web-latest-validation-from-prominent-investors/'-->"Both Socrata and Factual are quite similar in concept and both lack the technology to search datasets like Semantifi". As far as I could tell, Socrata and Factual have a lot more datasets than Semantifi; the first three Semantifi links that I clicked to look into specific data sets went to an <a id="id103680" href="http://wiki.semantifi.com/index.php/100_Best_Places_To_Live">empty wiki page</a>. (If I was clicking in the wrong place, that's not a great reflection on their site design. Also, with all of the people with hardcore financial markets experience on Semantifi's <a id="id103690" href="http://www.semantifi.com/Management.htm">management</a> page, why do they need Google ads on their home page?) Perhaps Semantifi is less like data providers Socrata and Factual than they think and more like <a id="id103700" href="http://open.mflask.com/">Open Data Directory</a>, which doesn't provide actual data but instead a search engine for data spread out across other sites that they index. </p>

<p id="id103710">I wanted to mention one other interesting source of fairly large-scale data to use in applications. When I learned how to add a volume for more disk space to an Amazon EC2 cloud image, I found that some of the volumes I could choose from contained public data sets: DBpedia and Freebase dumps, the <a id="id103727" href="http://www.cs.cmu.edu/~enron/">Enron email</a> dataset, US Census, Labor, and economic data, various biological data collections, and more. There is a <a id="id103738" href="http://aws.amazon.com/datasets?_encoding=UTF8&amp;jiveRedirect=1">list of such data</a> on Amazon's website, but it doesn't show all the choices; additional data sets include <a id="id103747" href="http://ods.openlinksw.com/wiki/main/Main/VirtAWSBBCMusicProgs">BBC Music and programs data</a>. If you were going to jump into the data reseller market with the various companies described above, an Amazon image with some of this data would be one logical place to start your company.</p>

<p id="id103759">My local friend <a id="id103762" href="http://twitter.com/#!/dep4b">Eric Pugh</a> recently pointed out to me the irony that, while disintermediation was a big buzzword of the dot com boom, intermediation is now getting bigger. These data resellers are a good example. If you're going to insert yourself as a middleman between a data provider and a data user, it's a compelling case for either side to use your service if you have a lot of customers on the other side, but before you get there, you need to make your own compelling case to each side. Some of the companies listed above are better at doing this than others, and it will be interesting to see which of them are in business in five years and why they lasted. </p>

<!-- 
Open Data Directory not really a reseller of data, but an index that lets you search across public data sets, complete with an API. See mflask.com for more. 

 -->


    </div>]]>
    </content>
</entry>

<entry>
    <title>Inserting data from a SPARQL endpoint into a relational database</title>
    <link rel="alternate" type="text/html" href="http://www.snee.com/bobdc.blog/2011/04/inserting-data-from-a-sparql-e.html" />
    <id>tag:www.snee.com,2011:/bobdc.blog//2.653</id>

    <published>2011-04-27T13:28:18Z</published>
    <updated>2011-04-27T13:30:58Z</updated>

    <summary>Via XML.</summary>
    <author>
        <name>Bob DuCharme</name>
        <uri>http://www.snee.com/bobdc.blog</uri>
    </author>
    
        <category term="SPARQL" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.snee.com/bobdc.blog/">
        Via XML.
        <![CDATA[    <div id="id103318">

<p id="id103320">Retrieval of triples from relational databases is a popular topic in the semantic web world, but I was recently wondering how much trouble it would be to go in the opposite direction: to retrieve data from a SPARQL endpoint and load it into a relational database. It wasn't much trouble at all. When you retrieve the results in the <a id="id103329" href="http://www.w3.org/TR/2008/REC-rdf-sparql-XMLres-20080115/">SPARQL query results XML format</a>, a straightforward XSLT stylesheet can convert them into the necessary SQL INSERT statements. I was able to automate the data retrieval, conversion to INSERT statements, and actual insertion into a MySQL database with a three-line batch file that used no Windows-specific tricks, so I'm sure it would work on Linux just as well.</p>

<p id="id103343">I used the following SPARQL query to retrieve the name, founding year, and equity, revenue, net income, and operating income figures of companies listed on the New York Stock Exchange according to DBpedia. I used <a id="id103350" href="http://jena.sourceforge.net/ARQ/">ARQ</a> to execute the query, so that after the inner query retrieved the raw data from the http://DBpedia.org/sparql SPARQL endpoint service, the outer query could use ARQ's SPARQL 1.1 support to format the data a bit, mostly by using the <tt id="id103363">str()</tt> function to strip language and datatype tags.</p>
<pre id="id103370">
PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX do: &lt;http://dbpedia.org/ontology/&gt;

SELECT (str(?name) as ?coName) 
       (substr(str(?formationYearTyped),1,4) as ?formationYear)
       (str(?equityTyped) as ?equity) 
       (str(?revenueTyped) as ?revenue) 
       (str(?netIncomeTyped) as ?netIncome) 
       (str(?operatingIncomeTyped) as ?operatingIncome) 
  WHERE {
  SERVICE &lt;http://DBpedia.org/sparql&gt;
  {
    SELECT * WHERE {
     ?company &lt;http://purl.org/dc/terms/subject&gt; 
     &lt;http://dbpedia.org/resource/Category:Companies_listed_on_the_New_York_Stock_Exchange&gt; .
     ?company rdfs:label ?name . 
    FILTER ( lang(?name) = "en" )
      OPTIONAL { ?company do:formationYear ?formationYearTyped . } 
      OPTIONAL { ?company do:equity ?equityTyped . }
      OPTIONAL { ?company do:revenue ?revenueTyped . } 
      OPTIONAL { ?company do:netIncome  ?netIncomeTyped . } 
      OPTIONAL { ?company do:operatingIncome ?operatingIncomeTyped . } 
    }
  }
}
</pre>

<p id="id103406">The following command line told ARQ to put the results of this query in an XML file called companyData.xml. (Because the query doesn't have the FROM keyword, ARQ needs an input dataset specified, so the command names dummy.ttl as this input even though the query above ignores this file and gets its data from DBpedia using the SERVICE keyword.)  </p>
<pre id="id103416">
arq --results XML --query getCompanyData.spq --data dummy.ttl &gt; companyData.xml
</pre>
<p id="id103422">Next, I ran the following command to apply an XSLT stylesheet to the ARQ output using libxslt's <a id="id103427" href="http://xmlsoft.org/XSLT/xsltproc2.html">xsltproc</a> XSLT processor. (You could use Saxon or Xalan just as easily.) This generated the SQL statements that would add the data to a MySQL database and stored them in the file insertCompanyData.sql:</p>
<pre id="id103438">
xsltproc SPARQLXMLtoSQL.xsl companyData.xml &gt; insertCompanyData.sql
</pre>
<p id="id103444">The XSLT stylesheet is not particularly brief, but there's no customized logic to process the output of the query above other than the use of the query's variable names and the quotes that it adds around the <tt id="id103450">coName</tt> values. (The potential need for quotes depends on whether you're inserting the value into the SQL database as a string.) The trickiest part was having the stylesheet output the string "NULL" when a value was missing; I used a named template, so it wasn't too tricky.</p>
<p id="id103460">If I had many different query results to convert to SQL INSERT statements, I'd write a more generalized version of this stylesheet (for example, setting the names of the database and table to receive the data in variables at the top), but if I only had two or three sets of SPARQL query results to deal with, I could adapt this one for each of those pretty quickly:</p>
<pre id="id103470">
&lt;xsl:stylesheet version="1.0"
                xmlns:s="http://www.w3.org/2005/sparql-results#"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;

  &lt;xsl:strip-space elements="*"/&gt;
  &lt;xsl:output method="text"/&gt;


  &lt;xsl:template match="s:sparql"&gt;
    <b id="id103482">USE testdb;</b>
    &lt;xsl:apply-templates/&gt;
  &lt;/xsl:template&gt;


  &lt;xsl:template match="text()"/&gt; &lt;!-- all values output with xsl:value-of --&gt;


  &lt;xsl:template match="s:result"&gt;

  &lt;!-- Typical line for this template rule to create 
       (with carriage return added here):
       <b id="id103498">INSERT INTO company</b> VALUES(
       "Protective Life",1907,NULL,3.06E9,2.71E8,4.16E8);
   --&gt;
    &lt;xsl:text&gt;INSERT INTO company VALUES("&lt;/xsl:text&gt;
    &lt;xsl:value-of select="s:binding[@name='coName']/s:literal"/&gt;
    &lt;xsl:text&gt;",&lt;/xsl:text&gt;

    &lt;xsl:call-template name="valueOrNULL"&gt;
      &lt;xsl:with-param name="value"
                      select="s:binding[@name='formationYear']/s:literal"/&gt;
    &lt;/xsl:call-template&gt;
    &lt;xsl:text&gt;,&lt;/xsl:text&gt;

    &lt;xsl:call-template name="valueOrNULL"&gt;
      &lt;xsl:with-param name="value"
                      select="s:binding[@name='equity']/s:literal"/&gt;
    &lt;/xsl:call-template&gt;
    &lt;xsl:text&gt;,&lt;/xsl:text&gt;

    &lt;xsl:call-template name="valueOrNULL"&gt;
      &lt;xsl:with-param name="value"
                      select="s:binding[@name='revenue']/s:literal"/&gt;
    &lt;/xsl:call-template&gt;
    &lt;xsl:text&gt;,&lt;/xsl:text&gt;

    &lt;xsl:call-template name="valueOrNULL"&gt;
      &lt;xsl:with-param name="value"
                      select="s:binding[@name='netIncome']/s:literal"/&gt;
    &lt;/xsl:call-template&gt;
    &lt;xsl:text&gt;,&lt;/xsl:text&gt;

    &lt;xsl:call-template name="valueOrNULL"&gt;
      &lt;xsl:with-param name="value"
                      select="s:binding[@name='operatingIncome']/s:literal"/&gt;
    &lt;/xsl:call-template&gt;

    &lt;xsl:text&gt;);&amp;#10;&lt;/xsl:text&gt;
  &lt;/xsl:template&gt;
 

  &lt;xsl:template name="valueOrNULL"&gt;
    &lt;xsl:param name="value"/&gt;
    &lt;xsl:choose&gt;
      &lt;xsl:when test=" $value != '' "&gt;
        &lt;xsl:value-of select="$value"/&gt;
      &lt;/xsl:when&gt;
      &lt;xsl:otherwise&gt;NULL&lt;/xsl:otherwise&gt;
    &lt;/xsl:choose&gt;
  &lt;/xsl:template&gt;


&lt;/xsl:stylesheet&gt;
</pre>
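<p>For comparison, here is a minimal sketch of the same conversion written in Python with the standard library's xml.etree.ElementTree module. This is a swapped-in technique, not the method used above. It assumes the same testdb database, company table, and query variable names; the function names (sparql_xml_to_sql, literal_value, value_or_null) and the script name sparql_to_sql.py are hypothetical. One deliberate difference: it doubles any double quote inside a coName value, which MySQL accepts inside a double-quoted string, while the stylesheet copies names through unchanged.</p>

```python
import sys
import xml.etree.ElementTree as ET

# Namespace used by the W3C SPARQL Query Results XML Format.
NS = {"s": "http://www.w3.org/2005/sparql-results#"}

# Variables written as unquoted values, in the column order the (assumed)
# company table uses; coName comes first and is quoted separately.
NUMERIC_VARS = ["formationYear", "equity", "revenue", "netIncome", "operatingIncome"]


def literal_value(result, var):
    """Return the literal bound to var in one s:result, or '' if it is unbound."""
    lit = result.find(f"s:binding[@name='{var}']/s:literal", NS)
    if lit is None:
        return ""
    return lit.text or ""


def value_or_null(value):
    """Counterpart of the stylesheet's valueOrNULL named template."""
    return value if value != "" else "NULL"


def sparql_xml_to_sql(xml_data, database="testdb", table="company"):
    """Turn SPARQL XML results (str or bytes) into a USE line plus one INSERT per result."""
    root = ET.fromstring(xml_data)
    statements = [f"USE {database};"]
    for result in root.iterfind("s:results/s:result", NS):
        # Quote the name as the stylesheet does, doubling any embedded quote.
        name = literal_value(result, "coName").replace('"', '""')
        values = [value_or_null(literal_value(result, var)) for var in NUMERIC_VARS]
        statements.append(f'INSERT INTO {table} VALUES("{name}",{",".join(values)});')
    return "\n".join(statements) + "\n"


if __name__ == "__main__" and sys.argv[1:]:
    # Usage: python sparql_to_sql.py companyData.xml, redirecting stdout to a .sql file.
    with open(sys.argv[1], "rb") as f:
        sys.stdout.write(sparql_xml_to_sql(f.read()))
```

<p>Redirecting its output to insertCompanyData.sql should give the same kind of file that the xsltproc step creates: a USE testdb; line followed by one INSERT statement per query result, with NULL wherever a variable was unbound.</p>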
<p id="id103530">To run the created INSERT statements with a MySQL database table, I just did this, substituting my own MySQL username and password:</p>
<pre id="id103536">
mysql -u myusername --password=mypass &lt; insertCompanyData.sql
</pre>
<p id="id103541">Of course, the created set of INSERT statements assumes that a database named testdb with a table named company already exists, and that the appropriate columns have been declared for that table.</p>
<p id="id103545">After combining the command line calls to arq, xsltproc, and mysql in a three-line batch file, it was fun to see it all happen unattended. For a more serious implementation, you'd want to look into the use of APIs to the various tools as a more efficient alternative to this kind of scripting, but it's nice to see how much can be done with a little scripting.  </p>
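<p>As one example of that API-based approach, here is a rough sketch in Python that skips the intermediate SQL file: it reads the SPARQL XML results and inserts each row with bound parameters, so unbound variables become NULL and quotes in company names need no special handling. To keep it runnable without a MySQL server, it uses Python's built-in sqlite3 module as a stand-in. The CREATE TABLE column names and types are assumptions, since the post doesn't show the table definition, and the function names are hypothetical. With MySQL you would use a MySQL driver for Python instead, whose placeholder syntax may differ.</p>

```python
import sqlite3
import xml.etree.ElementTree as ET

# Namespace used by the W3C SPARQL Query Results XML Format.
NS = {"s": "http://www.w3.org/2005/sparql-results#"}

# Query variables, in the column order of the hypothetical company table.
COLUMNS = ["coName", "formationYear", "equity", "revenue", "netIncome", "operatingIncome"]


def rows_from_sparql_xml(xml_bytes):
    """Yield one tuple per s:result; unbound variables become None (SQL NULL)."""
    root = ET.fromstring(xml_bytes)
    for result in root.iterfind("s:results/s:result", NS):
        row = []
        for var in COLUMNS:
            lit = result.find(f"s:binding[@name='{var}']/s:literal", NS)
            row.append(lit.text if lit is not None else None)
        yield tuple(row)


def load(conn, xml_bytes):
    """Create the assumed company table if needed and insert rows with bound parameters."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS company ("
        "coName TEXT, formationYear INTEGER, equity REAL, "
        "revenue REAL, netIncome REAL, operatingIncome REAL)"
    )
    conn.executemany(
        "INSERT INTO company VALUES (?, ?, ?, ?, ?, ?)",
        rows_from_sparql_xml(xml_bytes),
    )
    conn.commit()
```

<p>Passing it a connection and the bytes of the companyData.xml file that the arq command writes would load the data in one step. SQLite's column type affinity converts the numeric strings to INTEGER and REAL values on insertion.</p>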


    </div>]]>
    </content>
</entry>

</feed>

