28 January 2018

JavaScript SPARQL

With rdfstore-js.

I finally had a chance to play with rdfstore-js by Antonio Garrote, and it was all pretty straightforward. I already had node.js installed, so a simple npm install rdfstore installed his library. Then, I was ready to include the library in a JavaScript script that would read some RDF and query it with SPARQL. I just ran my script from the command line, but node.js fans know that they can take advantage of this library's features in much more interesting application architectures. (Before I go on, I wanted to mention that after I tweeted yesterday that this blog entry was coming, Andy Seaborne reminded me about Apache Jena's ability to load and run JavaScript functions. I tried the example from the feature's home page and it worked great right out of the box.)

My sample script starts with a function I wrote for general-purpose output of SPARQL SELECT queries, then creates an rdfstore object and saves a query that will be used twice later in the script. After loading some RDF data about my book Learning SPARQL from the OCLC's Worldcat online library catalog into the rdfstore, it runs the saved query against the loaded data to list ISBN numbers. The script then loads data about another book, runs the same query, and you can see the additional ISBN numbers in the new output.

// Utility function for outputting SELECT results
function outputSPARQLResults(results) {
    for (var row in results) {
        var printedLine = ''
        for (var column in results[row]) {
            printedLine = printedLine + results[row][column].value + ' '
        }
        console.log(printedLine)
    }
}

// Create an rdfstore
var rdfstore = require('rdfstore')

// Define a query to execute.
var listISBNs = 'PREFIX s: <http://schema.org/> \
PREFIX ls: <http://learningsparql.com/ns/data#> \
PREFIX wco: <http://www.worldcat.org/title/-/oclc/> \
PREFIX wci: <http://worldcat.org/isbn/> \
SELECT ?isbn \
FROM ls:g1 WHERE { ?book s:isbn ?isbn } '

rdfstore.create(function(err, store) {   // no error handling
    store.execute(
        // Load data about the book Learning SPARQL into named graph g1 in the rdfstore.
        'LOAD <http://worldcat.org/oclc/890467322.ttl> \
        INTO GRAPH <http://learningsparql.com/ns/data#g1>', function(err) {

            store.setPrefix('s', 'http://schema.org/')
            store.setPrefix('ls', 'http://learningsparql.com/ns/data#')
            store.setPrefix('wco', 'http://www.worldcat.org/title/-/oclc/')
            store.setPrefix('wci', 'http://worldcat.org/isbn/')
            store.execute(listISBNs, function(err, results) {
                console.log("=== ISBN value ===")
                outputSPARQLResults(results)
            })
        })

    store.execute(
        // Load data about the book "XML: The Annotated Specification" into the same graph
        'LOAD <http://worldcat.org/oclc/40768745.ttl> \
        INTO GRAPH <http://learningsparql.com/ns/data#g1>', function(err) {
            store.execute(listISBNs, function(err, results) {
                console.log("\n=== ISBN values after adding 2nd book's data ===")
                outputSPARQLResults(results)
            })
        })
})

The script produces this output:

=== ISBN value ===

=== ISBN values after adding 2nd book's data ===

I loaded the data into a named graph because the library documentation's sample query for loading remote data did. I briefly tried loading the data into the default graph, but had no luck; I'm all for the use of named graphs, anyway. I also tried deleting triples from and inserting them into the g1 named graph and then querying again to see the results, and I didn't have much luck there either (no error messages--I just didn't see the query results I expected after the deletion and insertion), but my minimal understanding of node.js asynchronous behavior was probably to blame. The library's GitHub page shows that it does support INSERT and DELETE queries.
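On the asynchronous behavior mentioned above: one way to make sure a query only runs after an update has finished is to wrap the callback-style call in a promise and await each step. The sketch below does not use rdfstore-js itself. Its makeFakeStore function is a hypothetical stand-in that only copies the execute(query, callback) calling convention from the script above, so it runs without any dependencies, and the query strings are placeholders. The promisify function is node.js's own util.promisify.

```javascript
const { promisify } = require('util')

// Hypothetical stand-in for a store: NOT rdfstore-js, just an object with
// the same execute(query, callback) calling convention as the script above.
// It records each query when its simulated asynchronous work completes.
function makeFakeStore() {
    const log = []
    return {
        log: log,
        execute: function (query, callback) {
            // Updates take longer than queries here, to mimic a slow LOAD.
            const delay = /^(LOAD|INSERT|DELETE)/.test(query) ? 20 : 1
            setTimeout(function () {
                log.push(query)
                callback(null, [])
            }, delay)
        }
    }
}

// Fire both calls without waiting: the query can finish before the update.
function runUnsequenced(store) {
    return new Promise(function (resolve) {
        let pending = 2
        const done = function () { if (--pending === 0) resolve(store.log) }
        store.execute('INSERT (placeholder update)', done)
        store.execute('SELECT (placeholder query)', done)
    })
}

// Await each step so the query only starts after the update's callback fires.
async function runSequenced(store) {
    const execute = promisify(store.execute.bind(store))
    await execute('INSERT (placeholder update)')
    await execute('SELECT (placeholder query)')
    return store.log
}

runSequenced(makeFakeStore()).then(function (log) { console.log(log) })
```

Whether this is what went wrong in my experiments is only a guess, but nesting or awaiting each step is the usual way to rule it out.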

I wouldn't use this library's triplestore for ongoing production maintenance of a set of triples, anyway; I see it as a great lightweight way to grab triples from one or more sources and then perform SPARQL queries on those triples to look for subsets and patterns that can contribute to an application, all in the world's most popular programming language.

The rdfstore-js GitHub page also shows that it offers many ways to query and manipulate the loaded data that, for a JavaScript programmer, would be more direct. If Antonio's ultimate goal was to bring RDF to JavaScript developers, I won't complain; I'm just glad that he brought a useful JavaScript library to RDF (and SPARQL) developers.

Please add any comments to this Google+ post.

31 December 2017

SPARQL and Amazon Web Services' Neptune database

Promising news for large-scale RDF development.

Amazon recently announced Neptune as an AWS service. As its home page describes it,

Amazon Neptune is a fast, scalable graph database service. Neptune efficiently stores and navigates highly connected data. Its query processing engine is optimized for leading graph query languages, Apache TinkerPop™ Gremlin and the W3C's RDF SPARQL. Neptune provides high performance through the open and standard APIs of these graph frameworks. And, Neptune is fully managed, so you no longer need to worry about database management tasks such as hardware provisioning, software patching, setup, configuration, or backups.

Apart from the practical aspects of the scalable yet convenient use of RDF and SPARQL that Neptune will enable, it's exciting to see such a high-profile acknowledgment of SPARQL as a serious development tool. Many organizations already knew this, but judging from the reaction to the Neptune announcement on Twitter, many more people are finally understanding this.

Rumors have been flying that the Blazegraph triplestore may play some role in Amazon's new graph store. As Stardog CEO Kendall Clark wrote on ycombinator recently, "Amazon acquired the domains, etc. Many former Blazegraph engineers are now Amazon Neptune engineers according to LinkedIn, etc. It was rumored widely in the graph db world fwiw." Yahoo Knowledge Graph science and data lead Nicolas Torzec responded to Kendall's comment with a link showing that Amazon now owns the Blazegraph trademark. (Blazegraph's website hasn't shown much activity in a while, with the latest post on their Press page being from May of last year.)

May of last year was also when I wrote Trying out Blazegraph about my positive experiences with this graph store, and after the recent announcement I tweeted that if Blazegraph was part of Neptune, it would be very cool if that included Blazegraph's inferencing. Pavel Klinov replied by pointing out a Neptune announcement video where they explicitly say that inferencing is not supported.

This hour-long "AWS re:Invent 2017: NEW LAUNCH! Deep dive on Amazon Neptune" video included some other interesting points. Because Neptune supports property graphs via TinkerPop as well as SPARQL, early in the video the speaker provides some background on property graphs versus RDF. He devotes a good portion of his presentation to talking through an SQL query for people who are unfamiliar with graph databases and then covering comparable SPARQL and TinkerPop Gremlin queries.

The plug from Thomson Reuters early in the video was nice to see, coming from a large, well-known organization that has been taking SPARQL seriously for a while. Later in the video, one slide's use of Thomson Reuters' PermID vocabulary with the GeoNames vocabulary in the same triple was especially nice to see, because while the extent of RDF's usage continues to be a pleasant surprise for me, I'm also surprised by how many people only use it for the simplicity of the triples data model--they're missing the data integration power of the ability to mix and match the wide variety of existing vocabularies (and hence data sources) with their own data.

The video's second speaker talks more about Neptune's enterprise features such as fast failover, encryption at rest and in transit, and backup and restore, which are all great things to see in a cloud-based triplestore. Neptune offers a lot of room; as this speaker mentions, "Storage volumes are not required to be statically allocated; they actually grow automatically up to a maximum size of 64 terabytes." The ability to restore a dataset to its state from a previous point in time also sounds very useful.

Once the speakers started taking questions, it looked to me like there were more questions about RDF and SPARQL than there were about TinkerPop and Gremlin. The former included the question about inferencing, which got a response (as Pavel had pointed out to me) of "we do not have in-database inference currently... we are very interested in use cases for inferencing." They also said that Neptune's underlying graph engine was custom-built by Amazon as a graph system, which left me more curious about the potential role of Blazegraph in the released version of Neptune. (Maybe "by Amazon" includes former Blazegraph engineers.)

Some more interesting facts from the question and answer session:

I'm looking forward to playing with SPARQL on AWS Neptune and will certainly be reporting back about my experiences here.

19 November 2017

SPARQL queries of Beatles recording sessions

Who played what when?

While listening to the song Dear Life on the new Beck album, I wondered who played the piano on the Beatles' Martha My Dear. A web search found the website Beatles Bible, where the Martha My Dear page showed that it was Paul.

This was not a big surprise, but one pleasant surprise was how that page listed absolutely everyone who played on the song and what they played. For example, a musician named Leon Calvert played both trumpet and flugelhorn. The site's Beatles' Songs page links to pages for every song, listing everyone who played on them, with very few exceptions--for example, for giant Phil Spector productions like The Long and Winding Road, it does list all the instruments, but not who played them. On the other hand, for the orchestra on A Day in the Life, it lists the individual names of all 12 violin players, all 4 violists, and the other 25 or so musicians who joined the Fab Four for that.

An especially nice surprise on this website was how syntactically consistent the listings were, leading me to think "with some curl commands, Python scripting, and some regular expressions, I could, dare I say it, convert all these listings to an RDF database of everyone who played on everything, then do some really cool SPARQL queries!"

So I did, and the RDF is available in the file BeatlesMusicians.ttl. The great part about having this is the ability to query across the songs to find out things such as how many different people played a given instrument on Beatles recordings or what songs a given person may have played on, regardless of instrument. In a pop music geek kind of way, it's been kind of exciting to think that I could ask and answer questions about the Beatles that may have never been answered before.

Here are two typical triples. All of these resources have corresponding rdfs:label values to make query output look nicer:

t:HereComesTheSun i:Moogsynthesiser  m:GeorgeHarrison .
t:EleanorRigby    i:cello          

Here are some of the queries I entered.

Who ever played piano for the Beatles, and on how many songs?

PREFIX  i:     <http://learningsparql.com/ns/instrument/> 
PREFIX  rdfs:  <http://www.w3.org/2000/01/rdf-schema#> 

SELECT ?pianistName (COUNT(?pianist) AS ?pianistCount)  WHERE {
  ?song i:piano ?pianist .
  ?pianist rdfs:label ?pianistName . 
}
GROUP BY ?pianistName
ORDER BY DESC(?pianistCount)

The result:

| pianistName        | pianistCount |
| "Paul McCartney"   | 60           |
| "George Martin"    | 22           |
| "John Lennon"      | 16           |
| "John 'Duff' Lowe" | 2            |
| "Chris Thomas"     | 1            |
| "Kenny Powell"     | 1            |
| "Mal Evans"        | 1            |
| "Ringo Starr"      | 1            |

Paul's number one spot is no surprise, and these results and other data do support any assertion that George Martin truly was the fifth Beatle. Seeing Chris Thomas there was a surprise to me; he went on to produce the Sex Pistols album, the first three Pretenders albums, the second through fifth Roxy Music albums, and many more classics. And, we have to wonder "what song had Ringo on piano?" As we'll see, that was easy enough to query.
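For JavaScript programmers, the grouping that this query performs can be pictured as a loop over (song, instrument, musician) triples. The sketch below only illustrates the GROUP BY, COUNT, and ORDER BY logic and says nothing about how a SPARQL engine actually evaluates it. The sampleTriples array is a tiny hand-picked sample based on credits mentioned in this entry, written as plain strings rather than the real URIs in BeatlesMusicians.ttl, and countByMusician is a helper name made up for the illustration.

```javascript
// A tiny hand-picked sample of [song, instrument, musician] triples,
// NOT the full dataset.
const sampleTriples = [
    ['MarthaMyDear', 'piano', 'PaulMcCartney'],
    ['DontPassMeBy', 'piano', 'RingoStarr'],
    ['MarthaMyDear', 'trumpet', 'LeonCalvert'],
    ['MarthaMyDear', 'flugelhorn', 'LeonCalvert'],
    ['HereComesTheSun', 'Moogsynthesiser', 'GeorgeHarrison']
]

// Rough equivalent of matching "?song <instrument> ?musician",
// grouping by musician, counting, and sorting by descending count.
function countByMusician(triples, instrument) {
    const counts = new Map()
    for (const [, playedInstrument, musician] of triples) {
        if (playedInstrument === instrument) {
            counts.set(musician, (counts.get(musician) || 0) + 1)
        }
    }
    // Turn the map into [musician, count] pairs, highest count first.
    return [...counts.entries()].sort((a, b) => b[1] - a[1])
}

console.log(countByMusician(sampleTriples, 'piano'))
```

With the real data, of course, the SPARQL query is shorter than the loop.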

This variation on the query above is slightly broader, because it looks for people who played any instruments with the string "piano" in their name:

PREFIX  rdfs:  <http://www.w3.org/2000/01/rdf-schema#> 

SELECT ?pianistName (COUNT(?pianist) AS ?pianistCount)  WHERE {
  ?instrument rdfs:label ?instrumentName . 
  FILTER (contains(?instrumentName, "piano"))
  ?song ?instrument ?pianist . 
  ?pianist rdfs:label ?pianistName . 
}
GROUP BY ?pianistName
ORDER BY DESC(?pianistCount)

The result:

| pianistName        | pianistCount |
| "Paul McCartney"   | 67           |
| "George Martin"    | 22           |
| "John Lennon"      | 20           |
| "Billy Preston"    | 6            |
| "Chris Thomas"     | 2            |
| "John 'Duff' Lowe" | 2            |
| "Kenny Powell"     | 1            |
| "Mal Evans"        | 1            |
| "Nicky Hopkins"    | 1            |
| "Ringo Starr"      | 1            |

This raises Paul and John's numbers and adds Nicky Hopkins (who also did important piano work for the Stones, the Kinks, and the Who) and Billy Preston, who in addition to the electric piano on Get Back, apparently played on five other songs. (The increase in numbers isn't all from electric pianos, but also from the pianet that John and Paul each played once or twice.)
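The difference between the exact match in the first piano query and the substring match in this one can be sketched in plain JavaScript. The three rows below are a small sample based on credits mentioned in this entry; the instrument label strings are my assumption about what the rdfs:label values look like, and playersOfInstrumentsContaining is a name made up for the illustration.

```javascript
// Sample [song, instrumentLabel, musician] rows; the labels are assumed,
// not copied from the data file.
const sampleCredits = [
    ['Martha My Dear', 'piano', 'Paul McCartney'],
    ['Get Back', 'electric piano', 'Billy Preston'],
    ['Eleanor Rigby', 'cello', 'Derek Simpson']
]

// Keep the musicians whose instrument label contains the given text,
// the same idea as a substring test on the instrument's label.
function playersOfInstrumentsContaining(rows, text) {
    return rows
        .filter(([, label]) => label.includes(text))
        .map(([, , musician]) => musician)
}

// Keep the musicians whose instrument label is exactly the given text.
function playersOfInstrument(rows, text) {
    return rows
        .filter(([, label]) => label === text)
        .map(([, , musician]) => musician)
}

console.log(playersOfInstrument(sampleCredits, 'piano'))
console.log(playersOfInstrumentsContaining(sampleCredits, 'piano'))
```

The substring version picks up Billy Preston's electric piano, which the exact match misses.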

What song had Ringo on piano?

PREFIX  rdfs:  <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX  i:     <http://learningsparql.com/ns/instrument/> 
PREFIX  m:     <http://learningsparql.com/ns/musician/> 

SELECT ?song WHERE {
  ?songURI i:piano m:RingoStarr .
  ?songURI rdfs:label ?song .
}

The result is a White Album song that Ringo apparently wrote himself:

| song               |
| "Don't Pass Me By" |

Who were all the cellists the Beatles ever used, and on what songs?

PREFIX  i:     <http://learningsparql.com/ns/instrument/> 
PREFIX  rdfs:  <http://www.w3.org/2000/01/rdf-schema#> 

SELECT ?name ?songTitle WHERE {
  ?song i:cello ?musician .
  ?song rdfs:label ?songTitle .
  ?musician rdfs:label ?name . 
}
ORDER BY ?name

The result:

| name                  | songTitle                   |
| "Alan Dalziel"        | "A Day In The Life"         |
| "Alan Dalziel"        | "She's Leaving Home"        |
| "Alex Nifosi"         | "A Day In The Life"         |
| "Allen Ford"          | "Within You Without You"    |
| "Bram Martin"         | "I Am The Walrus"           |
| "Dennis Vigay"        | "A Day In The Life"         |
| "Dennis Vigay"        | "She's Leaving Home"        |
| "Derek Simpson"       | "Eleanor Rigby"             |
| "Derek Simpson"       | "Strawberry Fields Forever" |
| "Eldon Fox"           | "Glass Onion"               |
| "Eldon Fox"           | "I Am The Walrus"           |
| "Eldon Fox"           | "Piggies"                   |
| "Francisco Gabarro"   | "A Day In The Life"         |
| "Francisco Gabarro"   | "Yesterday"                 |
| "Frederick Alexander" | "Martha My Dear"            |
| "Jack Holmes"         | "All You Need Is Love"      |
| "John Hall"           | "Strawberry Fields Forever" |
| "Lionel Ross"         | "All You Need Is Love"      |
| "Lionel Ross"         | "I Am The Walrus"           |
| "Norman Jones"        | "Eleanor Rigby"             |
| "Norman Jones"        | "Strawberry Fields Forever" |
| "Peter Beavan"        | "Within You Without You"    |
| "Peter Willison"      | "Blue Jay Way"              |
| "Reginald Kilbey"     | "Glass Onion"               |
| "Reginald Kilbey"     | "Martha My Dear"            |
| "Reginald Kilbey"     | "Piggies"                   |
| "Reginald Kilbey"     | "Within You Without You"    |
| "Terry Weil"          | "I Am The Walrus"           |
| "Uncredited"          | "Let It Be"                 |

I have no reason to recognize any of the names here, but when I sent the URL of the She's Leaving Home page to a friend who was a London session string player in the sixties, he said that the members of the double string quartet on that song were very top people and that some were friends of his.

Who played on how many songs, period?

PREFIX  rdfs:  <http://www.w3.org/2000/01/rdf-schema#> 

SELECT ?playerName (COUNT(?player) AS ?playerCount)  WHERE {
  ?song ?instrument ?player . 
  ?player rdfs:label ?playerName . 
}
GROUP BY ?playerName
# don't bother with people who only played on one song
HAVING (COUNT(?player) > 1)        
ORDER BY DESC(?playerCount)

The result:

| playerName           | playerCount |
| "Paul McCartney"     | 678         |
| "John Lennon"        | 576         |
| "George Harrison"    | 502         |
| "Ringo Starr"        | 412         |
| "Uncredited"         | 58          |
| "George Martin"      | 45          |
| "Unknown"            | 28          |
| "Mal Evans"          | 16          |
| "Pete Best"          | 14          |
| "Billy Preston"      | 11          |
| "Tony Sheridan"      | 8           |
| "Chris Thomas"       | 6           |
| "John Underwood"     | 5           |
| "Neil Aspinall"      | 5           |
| "Sidney Sax"         | 5           |
| "Yoko Ono"           | 5           |
| "David Mason"        | 4           |
| "Jeff Lynne"         | 4           |
| "Reginald Kilbey"    | 4           |
| "Eldon Fox"          | 3           |
| "Eric Bowie"         | 3           |
| "Erich Gruenberg"    | 3           |
| "Harry Klein"        | 3           |
| "Henry Datyner"      | 3           |
| "Leon Calvert"       | 3           |
| "Neil Sanders"       | 3           |
| "Pattie Harrison"    | 3           |
| "Rex Morris"         | 3           |
| "Stuart Sutcliffe"   | 3           |
| "Alan Civil"         | 2           |
| "Alan Dalziel"       | 2           |
| "Andy White"         | 2           |
| "Bill Povey"         | 2           |
| "Brian Jones"        | 2           |
| "Colin Hanton"       | 2           |
| "Dennis Vigay"       | 2           |
| "Dennis Walton"      | 2           |
| "Derek Simpson"      | 2           |
| "Derek Watkins"      | 2           |
| "Eric Clapton"       | 2           |
| "Francisco Gabarro"  | 2           |
| "Fred Lucas"         | 2           |
| "Freddy Clayton"     | 2           |
| "Geoff Emerick"      | 2           |
| "Gordon Pearce"      | 2           |
| "Irene King"         | 2           |
| "Jack Greene"        | 2           |
| "Jack Rothstein"     | 2           |
| "John 'Duff' Lowe"   | 2           |
| "Johnnie Scott"      | 2           |
| "Jurgen Hess"        | 2           |
| "Keith Cummings"     | 2           |
| "Kenneth Essex"      | 2           |
| "Leo Birnbaum"       | 2           |
| "Lionel Ross"        | 2           |
| "Mahapurush Misra"   | 2           |
| "Marianne Faithfull" | 2           |
| "Mick Jagger"        | 2           |
| "Mike Redway"        | 2           |
| "Norman Jones"       | 2           |
| "Norman Lederman"    | 2           |
| "Norman Smith"       | 2           |
| "Other musicians"    | 2           |
| "Pat Whitmore"       | 2           |
| "Ralph Elman"        | 2           |
| "Ronald Thomas"      | 2           |
| "Stephen Shingles"   | 2           |
| "Tony Gilbert"       | 2           |
| "Tony Tunstall"      | 2           |
| "Tristan Fry"        | 2           |
| "Victor Spinetti"    | 2           |

No big surprises in the top 10 but there definitely are after that. For example...

What 4 Beatles tracks did ELO founder Jeff Lynne play on?

PREFIX  rdfs:  <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX  m:     <http://learningsparql.com/ns/musician/> 

SELECT ?instrument ?songName WHERE {
  ?song ?instrumentURI m:JeffLynne .
  ?song rdfs:label ?songName .
  ?instrumentURI rdfs:label ?instrument . 
}

Apparently he sang and played overdubs with Paul, George, and Ringo on some John demos, after John died, as "new" Beatle material to go with the Anthology documentary and rereleases.

| instrument       | songName         |
| "backing vocals" | "Real Love"      |
| "guitar"         | "Real Love"      |
| "harmony vocals" | "Free As A Bird" |
| "guitar"         | "Free As A Bird" |

As you look through the big list of musicians above, you'll probably want to plug more names into that last query. For example, any Beatles or Eric Clapton fan knows that he played the guitar solo on While My Guitar Gently Weeps, but why does he get a "2" up there? It turns out that he and some other big names sang backing vocals on All You Need Is Love.
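One thing to keep in mind when reading these counts: judging from the Jeff Lynne rows above, COUNT(?player) counts one row per song-and-instrument combination, so his 4 comes from two instruments on each of two songs. The sketch below replays those four rows in plain JavaScript to show the difference between counting credits and counting distinct songs; countCredits and countDistinctSongs are names made up for this illustration. In SPARQL, the distinct-song version would presumably use COUNT(DISTINCT ?song), which I have not run against the data.

```javascript
// The four result rows shown above for Jeff Lynne.
const jeffLynneCredits = [
    { instrument: 'backing vocals', songName: 'Real Love' },
    { instrument: 'guitar', songName: 'Real Love' },
    { instrument: 'harmony vocals', songName: 'Free As A Bird' },
    { instrument: 'guitar', songName: 'Free As A Bird' }
]

// One count per row, which is what the big query's numbers reflect.
function countCredits(rows) {
    return rows.length
}

// One count per different song title, however many instruments were played.
function countDistinctSongs(rows) {
    return new Set(rows.map(function (row) { return row.songName })).size
}

console.log(countCredits(jeffLynneCredits), countDistinctSongs(jeffLynneCredits))
```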

Let me know what kinds of queries and results you come up with!
