31 December 2017

SPARQL and Amazon Web Service's Neptune database

Promising news for large-scale RDF development.

Amazon recently announced Neptune as an AWS service. As its home page describes it,

Amazon Neptune is a fast, scalable graph database service. Neptune efficiently stores and navigates highly connected data. Its query processing engine is optimized for leading graph query languages, Apache TinkerPop™ Gremlin and the W3C's RDF SPARQL. Neptune provides high performance through the open and standard APIs of these graph frameworks. And, Neptune is fully managed, so you no longer need to worry about database management tasks such as hardware provisioning, software patching, setup, configuration, or backups.

Apart from the practical aspects of the scalable yet convenient use of RDF and SPARQL that Neptune will enable, it's exciting to see such a high-profile acknowledgment of SPARQL as a serious development tool. Many organizations already knew this, but judging from the reaction to the Neptune announcement on Twitter, many more people are finally understanding this.


Rumors have been flying that the Blazegraph triplestore may play some role in Amazon's new graph store. As Stardog CEO Kendall Clark wrote on ycombinator recently, "Amazon acquired the domains, etc. Many former Blazegraph engineers are now Amazon Neptune engineers according to LinkedIn, etc. It was rumored widely in the graph db world fwiw." Yahoo Knowledge Graph science and data lead Nicolas Torzec responded to Kendall's comment with a link showing that Amazon now owns the Blazegraph trademark. (Blazegraph's website hasn't shown much activity in a while, with the latest post on their Press page being from May of last year.)

May of last year was also when I wrote Trying out Blazegraph about my positive experiences with this graph store, and after the recent announcement I tweeted that if Blazegraph was part of Neptune, it would be very cool if that included Blazegraph's inferencing. Pavel Klinov replied by pointing out a Neptune announcement video where they explicitly say that inferencing is not supported.

This hour-long "AWS re:Invent 2017: NEW LAUNCH! Deep dive on Amazon Neptune" video included some other interesting points. Because Neptune supports property graphs via TinkerPop as well as SPARQL, early in the video the speaker provides some background on property graphs versus RDF. He devotes a good portion of his presentation to talking through an SQL query for people who are unfamiliar with graph databases and then covering comparable SPARQL and TinkerPop Gremlin queries.

The plug from Thomson Reuters early in the video was nice to see, coming from a large well-known organization that has been taking SPARQL seriously for a while. Later in the video, one slide's use of Thomson Reuters' PermID vocabulary with the GeoNames vocabulary in the same triple was especially nice to see. While the extent of RDF's usage continues to be a pleasant surprise for me, I'm also surprised by how many people only use it for the simplicity of the triples data model--they're missing the data integration power of the ability to mix and match the wide variety of existing vocabularies (and hence data sources) with their own data.

The video's second speaker talks more about Neptune's enterprise features such as fast failover, encryption at rest and in transit, and backup and restore, which are all great things to see in a cloud-based triplestore. Neptune offers a lot of room; as this speaker mentions, "Storage volumes are not required to be statically allocated; they actually grow automatically up to a maximum size of 64 terabytes." The ability to restore a dataset to its state from a previous point in time also sounds very useful.

Once the speakers started taking questions, it looked to me like there were more questions about RDF and SPARQL than there were about TinkerPop and Gremlin. The former included the question about inferencing, which got a response (as Pavel had pointed out to me) of "we do not have in-database inference currently... we are very interested in use cases for inferencing." They also said that Neptune's underlying graph engine was custom-built by Amazon as a graph system, which left me more curious about the potential role of Blazegraph in the released version of Neptune. (Maybe "by Amazon" includes former Blazegraph engineers.)

Some more interesting facts from the question and answer session:

I'm looking forward to playing with SPARQL on AWS Neptune and will certainly be reporting back about my experiences here.

Please add any comments to this Google+ post.

19 November 2017

SPARQL queries of Beatles recording sessions

Who played what when?


While listening to the song Dear Life on the new Beck album, I wondered who played the piano on the Beatles' Martha My Dear. A web search found the website Beatles Bible, where the Martha My Dear page showed that it was Paul.

This was not a big surprise, but one pleasant surprise was how that page listed absolutely everyone who played on the song and what they played. For example, a musician named Leon Calvert played both trumpet and flugelhorn. The site's Beatles' Songs page links to pages for every song, listing everyone who played on them, with very few exceptions--for example, for giant Phil Spector productions like The Long and Winding Road, it does list all the instruments, but not who played them. On the other hand, for the orchestra on A Day in the Life, it lists the individual names of all 12 violin players, all 4 violists, and the other 25 or so musicians who joined the Fab Four for that.

An especially nice surprise on this website was how syntactically consistent the listings were, leading me to think "with some curl commands, Python scripting, and some regular expressions, I could, dare I say it, convert all these listings to an RDF database of everyone who played on everything, then do some really cool SPARQL queries!"

So I did, and the RDF is available in the file BeatlesMusicians.ttl. The great part about having this is the ability to query across the songs to find out things such as how many different people played a given instrument on Beatles recordings or what songs a given person may have played on, regardless of instrument. In a pop music geek kind of way, it's been kind of exciting to think that I could ask and answer questions about the Beatles that may have never been answered before.

Here are two typical triples. All of these resources have corresponding rdfs:label values to make query output look nicer:

t:HereComesTheSun i:Moogsynthesiser  m:GeorgeHarrison .
t:EleanorRigby    i:cello          
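The conversion scripts themselves aren't shown here, so the following is only a minimal Python sketch of what the regular expression step might look like. It assumes that each listing line has the form "Name: instrument, instrument" (a guess at the page format, not something confirmed above), and the function names are made up for illustration. Local names are built by dropping spaces and punctuation, which is consistent with names such as i:Moogsynthesiser and m:GeorgeHarrison in the data.

```python
import re

# Assumed (hypothetical) listing format: "Name: instrument, instrument, ..."
LISTING = re.compile(r"^(?P<name>[^:]+):\s*(?P<instruments>.+)$")


def local_name(label):
    """Keep only ASCII letters and digits, e.g. 'Moog synthesiser' -> 'Moogsynthesiser'."""
    return re.sub(r"[^A-Za-z0-9]", "", label)


def listing_to_triples(song_title, line):
    """Turn one listing line into Turtle-style triples using the t:, i:, and m: prefixes."""
    match = LISTING.match(line.strip())
    if match is None:
        return []  # the line doesn't fit the assumed format
    song = local_name(song_title)
    musician = local_name(match.group("name"))
    triples = []
    for instrument in match.group("instruments").split(","):
        instrument = instrument.strip()
        if instrument:
            triples.append(f"t:{song} i:{local_name(instrument)} m:{musician} .")
    return triples


for triple in listing_to_triples("Martha My Dear", "Leon Calvert: trumpet, flugelhorn"):
    print(triple)
```

A real script would also need to emit the rdfs:label triples for each song, instrument, and musician, and to cope with the pages that list instruments without naming the players.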

Here are some of the queries I entered.

Who ever played piano for the Beatles, and on how many songs?

PREFIX  i:     <http://learningsparql.com/ns/instrument/> 
PREFIX  rdfs:  <http://www.w3.org/2000/01/rdf-schema#> 

SELECT ?pianistName (COUNT(?pianist) AS ?pianistCount)  WHERE {
  ?song i:piano ?pianist .
  ?pianist rdfs:label ?pianistName .
}
GROUP BY ?pianistName
ORDER BY DESC(?pianistCount)

The result:

| pianistName        | pianistCount |
| "Paul McCartney"   | 60           |
| "George Martin"    | 22           |
| "John Lennon"      | 16           |
| "John 'Duff' Lowe" | 2            |
| "Chris Thomas"     | 1            |
| "Kenny Powell"     | 1            |
| "Mal Evans"        | 1            |
| "Ringo Starr"      | 1            |

Paul's number one spot is no surprise, and these results and other data do support any assertion that George Martin truly was the fifth Beatle. Seeing Chris Thomas there was a surprise to me; he went on to produce the Sex Pistols album, the first three Pretenders albums, the second through fifth Roxy Music albums, and many more classics. And, we have to wonder "what song had Ringo on piano?" As we'll see, that was easy enough to query.

This variation on the query above is slightly broader, because it looks for people who played any instruments with the string "piano" in their name:

PREFIX  rdfs:  <http://www.w3.org/2000/01/rdf-schema#> 

SELECT ?pianistName (COUNT(?pianist) AS ?pianistCount)  WHERE {
  ?instrument rdfs:label ?instrumentName .
  FILTER (CONTAINS(?instrumentName, "piano"))
  ?song ?instrument ?pianist .
  ?pianist rdfs:label ?pianistName .
}
GROUP BY ?pianistName
ORDER BY DESC(?pianistCount)

The result:

| pianistName        | pianistCount |
| "Paul McCartney"   | 67           |
| "George Martin"    | 22           |
| "John Lennon"      | 20           |
| "Billy Preston"    | 6            |
| "Chris Thomas"     | 2            |
| "John 'Duff' Lowe" | 2            |
| "Kenny Powell"     | 1            |
| "Mal Evans"        | 1            |
| "Nicky Hopkins"    | 1            |
| "Ringo Starr"      | 1            |

This raises Paul and John's numbers and adds Nicky Hopkins (who also did important piano work for the Stones, the Kinks, and the Who) and Billy Preston, who in addition to the electric piano on Get Back, apparently played on five other songs. (The increase in numbers isn't all from electric pianos, but also from the pianet that John and Paul each played once or twice.)

What song had Ringo on piano?

PREFIX  rdfs:  <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX  i:     <http://learningsparql.com/ns/instrument/> 
PREFIX  m:     <http://learningsparql.com/ns/musician/> 

SELECT ?song WHERE {
  ?songURI i:piano m:RingoStarr .
  ?songURI rdfs:label ?song .
}

The result is a White Album song that Ringo apparently wrote himself:

| song               |
| "Don't Pass Me By" |

Who were all the cellists the Beatles ever used, and on what songs?

PREFIX  i:     <http://learningsparql.com/ns/instrument/> 
PREFIX  rdfs:  <http://www.w3.org/2000/01/rdf-schema#> 

SELECT ?name ?songTitle WHERE {
  ?song i:cello ?musician .
  ?song rdfs:label ?songTitle .
  ?musician rdfs:label ?name .
}
ORDER BY ?name

The result:

| name                  | songTitle                   |
| "Alan Dalziel"        | "A Day In The Life"         |
| "Alan Dalziel"        | "She's Leaving Home"        |
| "Alex Nifosi"         | "A Day In The Life"         |
| "Allen Ford"          | "Within You Without You"    |
| "Bram Martin"         | "I Am The Walrus"           |
| "Dennis Vigay"        | "A Day In The Life"         |
| "Dennis Vigay"        | "She's Leaving Home"        |
| "Derek Simpson"       | "Eleanor Rigby"             |
| "Derek Simpson"       | "Strawberry Fields Forever" |
| "Eldon Fox"           | "Glass Onion"               |
| "Eldon Fox"           | "I Am The Walrus"           |
| "Eldon Fox"           | "Piggies"                   |
| "Francisco Gabarro"   | "A Day In The Life"         |
| "Francisco Gabarro"   | "Yesterday"                 |
| "Frederick Alexander" | "Martha My Dear"            |
| "Jack Holmes"         | "All You Need Is Love"      |
| "John Hall"           | "Strawberry Fields Forever" |
| "Lionel Ross"         | "All You Need Is Love"      |
| "Lionel Ross"         | "I Am The Walrus"           |
| "Norman Jones"        | "Eleanor Rigby"             |
| "Norman Jones"        | "Strawberry Fields Forever" |
| "Peter Beavan"        | "Within You Without You"    |
| "Peter Willison"      | "Blue Jay Way"              |
| "Reginald Kilbey"     | "Glass Onion"               |
| "Reginald Kilbey"     | "Martha My Dear"            |
| "Reginald Kilbey"     | "Piggies"                   |
| "Reginald Kilbey"     | "Within You Without You"    |
| "Terry Weil"          | "I Am The Walrus"           |
| "Uncredited"          | "Let It Be"                 |

I have no reason to recognize any of the names here, but when I sent the URL of the She's Leaving Home page to a friend who was a London session string player in the sixties, he said that the members of the double string quartet on that song were very top people and that some were friends of his.

Who played on how many songs, period?

PREFIX  rdfs:  <http://www.w3.org/2000/01/rdf-schema#> 

SELECT ?playerName (COUNT(?player) AS ?playerCount)  WHERE {
  ?song ?instrument ?player .
  ?player rdfs:label ?playerName .
}
GROUP BY ?playerName
# don't bother with people who only played on one song
HAVING (COUNT(?player) > 1)
ORDER BY DESC(?playerCount)

The result:

| playerName           | playerCount |
| "Paul McCartney"     | 678         |
| "John Lennon"        | 576         |
| "George Harrison"    | 502         |
| "Ringo Starr"        | 412         |
| "Uncredited"         | 58          |
| "George Martin"      | 45          |
| "Unknown"            | 28          |
| "Mal Evans"          | 16          |
| "Pete Best"          | 14          |
| "Billy Preston"      | 11          |
| "Tony Sheridan"      | 8           |
| "Chris Thomas"       | 6           |
| "John Underwood"     | 5           |
| "Neil Aspinall"      | 5           |
| "Sidney Sax"         | 5           |
| "Yoko Ono"           | 5           |
| "David Mason"        | 4           |
| "Jeff Lynne"         | 4           |
| "Reginald Kilbey"    | 4           |
| "Eldon Fox"          | 3           |
| "Eric Bowie"         | 3           |
| "Erich Gruenberg"    | 3           |
| "Harry Klein"        | 3           |
| "Henry Datyner"      | 3           |
| "Leon Calvert"       | 3           |
| "Neil Sanders"       | 3           |
| "Pattie Harrison"    | 3           |
| "Rex Morris"         | 3           |
| "Stuart Sutcliffe"   | 3           |
| "Alan Civil"         | 2           |
| "Alan Dalziel"       | 2           |
| "Andy White"         | 2           |
| "Bill Povey"         | 2           |
| "Brian Jones"        | 2           |
| "Colin Hanton"       | 2           |
| "Dennis Vigay"       | 2           |
| "Dennis Walton"      | 2           |
| "Derek Simpson"      | 2           |
| "Derek Watkins"      | 2           |
| "Eric Clapton"       | 2           |
| "Francisco Gabarro"  | 2           |
| "Fred Lucas"         | 2           |
| "Freddy Clayton"     | 2           |
| "Geoff Emerick"      | 2           |
| "Gordon Pearce"      | 2           |
| "Irene King"         | 2           |
| "Jack Greene"        | 2           |
| "Jack Rothstein"     | 2           |
| "John 'Duff' Lowe"   | 2           |
| "Johnnie Scott"      | 2           |
| "Jurgen Hess"        | 2           |
| "Keith Cummings"     | 2           |
| "Kenneth Essex"      | 2           |
| "Leo Birnbaum"       | 2           |
| "Lionel Ross"        | 2           |
| "Mahapurush Misra"   | 2           |
| "Marianne Faithfull" | 2           |
| "Mick Jagger"        | 2           |
| "Mike Redway"        | 2           |
| "Norman Jones"       | 2           |
| "Norman Lederman"    | 2           |
| "Norman Smith"       | 2           |
| "Other musicians"    | 2           |
| "Pat Whitmore"       | 2           |
| "Ralph Elman"        | 2           |
| "Ronald Thomas"      | 2           |
| "Stephen Shingles"   | 2           |
| "Tony Gilbert"       | 2           |
| "Tony Tunstall"      | 2           |
| "Tristan Fry"        | 2           |
| "Victor Spinetti"    | 2           |

No big surprises in the top 10 but there definitely are after that. For example...

What 4 Beatles tracks did ELO founder Jeff Lynne play on?

PREFIX  rdfs:  <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX  m:     <http://learningsparql.com/ns/musician/> 

SELECT ?instrument ?songName WHERE {
  ?song ?instrumentURI m:JeffLynne .
  ?song rdfs:label ?songName .
  ?instrumentURI rdfs:label ?instrument .
}

Apparently he sang and played overdubs with Paul, George, and Ringo on some John demos, after John died, as "new" Beatle material to go with the Anthology documentary and rereleases.

| instrument       | songName         |
| "backing vocals" | "Real Love"      |
| "guitar"         | "Real Love"      |
| "harmony vocals" | "Free As A Bird" |
| "guitar"         | "Free As A Bird" |

As you look through the big list of musicians above, you'll probably want to plug more names into that last query. For example, any Beatles or Eric Clapton fan knows that he played the guitar solo on While My Guitar Gently Weeps, but why does he get a "2" up there? It turns out that he and some other big names sang backing vocals on All You Need Is Love.
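Jeff Lynne's "4" and Eric Clapton's "2" both show that COUNT(?player) in the "who played on how many songs" query tallies one row per instrument credit, not one per song. Here is a minimal Python sketch of that distinction, using only the four Jeff Lynne rows from the table above; the variable names are just for illustration. In SPARQL, the comparable change would be to count COUNT(DISTINCT ?song) instead.

```python
from collections import Counter

# The four (instrument, song) rows that the Jeff Lynne query returned.
credits = [
    ("backing vocals", "Real Love"),
    ("guitar", "Real Love"),
    ("harmony vocals", "Free As A Bird"),
    ("guitar", "Free As A Bird"),
]

# What COUNT(?player) measures: one per matching triple, i.e. per credit.
credit_count = len(credits)

# What counting distinct songs would measure instead.
distinct_songs = {song for _, song in credits}

# Credits broken down by song.
credits_per_song = Counter(song for _, song in credits)

print(credit_count)
print(len(distinct_songs))
print(dict(credits_per_song))
```

The first figure is the one that appears in the big list; the second is the number of tracks he actually played on.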

Let me know what kinds of queries and results you come up with!


29 October 2017

An HTML form trick to add some convenience to life

With a little JavaScript as needed.

On the computers that I use the most, the browser home page is an HTML file with links to my favorite pages and a "single" form that lets me search the sites that I search the most. I can enter a search term in the field for any of the sites, press Enter, and then that site gets searched. The two tricks that I use to create these fields have been handy enough that I thought I'd share them in case they're useful to others.

I quote the word "single" above because it appears to be a single form but is actually multiple little forms in the HTML. Here is an example with four of my entries; enter something into any of the fields and press Enter to see what I mean:


The first two fields search the way most search forms do, by passing a search string as a parameter to some back end process. To add one of these fields to my form, I just had to look at the source of the actual website's search form to find out what variable it was passing to what URL and then reproduce that in a little form around that field in my home page file. For Wikipedia, I set the form's action attribute to "http://en.wikipedia.org/wiki/Special:Search" and the input element's name attribute to "search". This way, if I enter "foobar" in my version of their search field above, the form creates the URL https://en.wikipedia.org/wiki/Special:Search?search=foobar to perform the search, and it works. (The input element of the Wikipedia form also has its autofocus attribute set to "autofocus" so that when a browser displays the page, the cursor is in that field, and I can then just press Tab a few times to quickly get to the others.) For YouTube there's a different URL and the search parameter variable name is "search_query", so I set the name attribute on that second little form's input element to have that value.
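To make the URL construction concrete, here is a small Python sketch (not the HTML form itself) of what the browser does when a form like the Wikipedia one is submitted with GET: it appends the input's name and the entered value to the action URL as a query string. The helper name query_style_url is made up for illustration.

```python
from urllib.parse import urlencode


def query_style_url(action, param_name, term):
    """Build the URL a GET form would request: action?name=value, with the value encoded."""
    return f"{action}?{urlencode({param_name: term})}"


wikipedia = "https://en.wikipedia.org/wiki/Special:Search"
print(query_style_url(wikipedia, "search", "foobar"))
# Spaces in the search term are encoded as "+", just as a browser does for form data.
print(query_style_url(wikipedia, "search", "graph database"))
```

The first call prints the same URL described above; a form for another site would only differ in the action URL and the parameter name.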

The third and fourth input fields above search websites with a more RESTful interface, so instead of passing a value in a particular variable name to a CGI script, they just construct a URL with the search term at the end. From within the form, this is actually trickier than the CGI way to do it because some JavaScript must be embedded into the form's action attribute to concatenate the entered value onto the appropriate URL and then send the browser to the resulting URL. You can see how this is done with a View Source of this blog entry. (Note how verbose the JavaScript way to grab that form value is--I'd appreciate any suggestions for a simpler way.) You'll also see that to send the browser to the appropriate destination, the form sets the href property of the window.location object to the new URL.
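The JavaScript from the form isn't reproduced here, so as a rough illustration of the same idea, this Python sketch builds a RESTful-style URL by concatenating the search term onto a base URL. The base URL is a placeholder, the helper name is hypothetical, and the percent-encoding step is an assumption added to keep spaces and slashes in the term from breaking the path.

```python
from urllib.parse import quote


def path_style_url(base, term):
    """Append the search term to the base URL as a final, percent-encoded path segment."""
    # safe="" means "/" in the term is encoded too, so it stays within one segment.
    return base + quote(term, safe="")


base = "https://example.com/lookup/"  # placeholder, not one of the real sites
print(path_style_url(base, "foobar"))
print(path_style_url(base, "foo bar"))
```

In the form itself, the resulting string is what gets assigned to the href property of window.location.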

Just about all the search forms I've found fall into one of these two categories, so for my master search forms at home and at work I've also added fields to search Google Maps, JIRA, Amazon, and more. You can see three more examples at the end of my entry from last April, The Wikidata data model and your SPARQL queries.

It all makes a nice example of doing a little fun scripting, instead of real work, to save upwards of minutes a day!

xkcd cartoon
