Getting started with AllegroGraph

Via Python and via HTTP.

The home page of Franz Inc.'s AllegroGraph RDFStore calls it "a modern, high-performance, persistent RDF graph database" that "scale[s] to billions of triples while maintaining superior performance". Franz offers a free version that lets you store up to 50 million triples, so I installed and played with release 3.2 of the Windows version. When I tried it, the documentation and examples were not well coordinated with the configuration of the latest release, but Franz's email support was very responsive and helpful, even to a non-paying customer like me. I've also seen some evidence that they're bringing this documentation up to date.

For each triplestore I've played with, I tried to avoid coding and compiling. I didn't see any web interface or command line tool for loading RDF triples into AllegroGraph and then querying the data using SPARQL, so I started with its Python interface and then tried the HTTP interface. I first learned Python several years ago because of all of the RDF-related libraries out there, so I'm happy to write some scripts with it. It would be interesting to try AllegroGraph's LISP interface, but my last experience coding in LISP was some time ago, so there'd be some catch-up time.

The AllegroGraph server

AllegroGraph's setup routine configured it to automatically run as a service under Windows. After some early frustration with the Python client, I discovered that this copy of the server was not being started up according to assumptions made by the sample code in AllegroGraph's Python API for AllegroGraph tutorial. For one thing, a line in the tutorial's first Python script tells the server to open up the "ag" catalog—according to the tutorial, a repository is another term for an RDF triplestore, and a catalog is a container for a set of repositories—but the server didn't know about this catalog. I shut down the AllegroGraph service (in Windows, from Control Panel/Administrative Tools/Services, right-click "AllegroGraph Server" and pick "Stop") and then started it up from the Program Files\AllegroGraphFJE32 directory with this command, which specifies a directory included with the AllegroGraph distribution as the catalog location:

AllegroGraphServer --new-http-port 8080 --new-http-catalog doc/agraph-javadoc/com/franz/ag

This also tells the server to use port 8080, which is where the Python tutorial's sample scripts send their requests.

A little Python client

The AllegroGraphFJE32/doc/server-installation.html file included with the distribution recommends that Windows users use ActiveState's version of Python, which may explain some of my other early problems with the Python interface. I also found mistakes in the Python tutorial's sample code; instead of listing these problems, I've posted my script, which includes corrected versions of the first few examples, at http://www.snee.com/rdf/agdemo.py.txt.

The script creates a repository in the ag catalog, loads the same RDF files that I loaded into other triplestores I've tried, and sends the server the "SELECT DISTINCT ?p WHERE {?s ?p ?o}" query I usually use to start any SPARQL session. I commented this Python script where I could, so I won't describe it here. For now, AllegroGraph's documentation of their Python interface is skimpy, but better documentation is on the way. You can learn more about AllegroGraph's Python interface from this blog posting by someone in Austria named "Rho". Keep in mind that Rho's examples use release 3.1.1, and apparently improvements to the Python client were an important part of AllegroGraph's upgrade to release 3.2.

Trying the HTTP interface

AllegroGraph's currently available documentation of their HTTP interface provides no examples of complete URLs to send to the server, so it took me some time to work out the correct format, but once I did, it was pretty straightforward to use. (As with the Python interface documentation, I heard that better HTTP interface documentation is on the way.) One other caveat: when I tried this with a recent distribution version of release 3.2, some of these commands didn't work until after I'd picked "Download AllegroGraph 3.2 Free Java Edition Updates" from the AllegroGraph program group on the Windows Start menu.

AllegroGraph's HTTP interface documentation says that if you start the server with the --new-http-port option, as I did, then you should use the separate documentation for their new HTTP server. I used cURL to send URIs to the server's HTTP interface.

To list existing repositories, the following request retrieved a listing in SPARQL Query Results XML Format, with fields for the uri, id, title, readable, and writable status of each repository:

curl http://localhost:8080/catalogs/ag/repositories

This is an important command, because many others require you to supply a repository id.
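
For anyone who would rather script this than type curl commands, here is a minimal Python sketch of the same request using only the standard library. It assumes the server and catalog described above (localhost, port 8080, catalog "ag"); the function names are my own and are not part of any AllegroGraph client. The request is only sent if you call list_repositories() against a running server.

```python
from urllib.request import Request, urlopen

# Assumed values taken from the post: local server on port 8080, catalog "ag".
BASE_URL = "http://localhost:8080"
CATALOG = "ag"


def repositories_request(base_url=BASE_URL, catalog=CATALOG):
    """Build (but do not send) the GET request that lists a catalog's repositories."""
    url = f"{base_url}/catalogs/{catalog}/repositories"
    return Request(url, method="GET")


def list_repositories():
    """Send the request and return the raw response body as text.

    Requires a running server; not called at import time.
    """
    with urlopen(repositories_request()) as response:
        return response.read().decode("utf-8")


# Show what would be sent, without touching the network.
req = repositories_request()
print(req.get_method(), req.full_url)
```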

The next command successfully created a new repository with an id of test1 (all curl commands were actually issued as one line; I added carriage returns here for readability):

curl -X PUT -H "content-type: application/x-www-form-urlencoded; accept: */*" 
  http://localhost:8080/catalogs/ag/repositories/test1

The first time I tried it I saw no response, but the second time I was told "there is already a store named 'test1'", which was good news.
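
A rough Python equivalent of that PUT, again as a sketch under the same assumptions (localhost:8080, catalog "ag"). The helper name is hypothetical, and I've sent only a Content-Type header with an empty body; whether the server needs any header at all here is something I haven't verified.

```python
from urllib.request import Request, urlopen


def create_repository_request(repository,
                              base_url="http://localhost:8080",
                              catalog="ag"):
    """Build (but do not send) a PUT request for a new repository id."""
    url = f"{base_url}/catalogs/{catalog}/repositories/{repository}"
    return Request(
        url,
        data=b"",  # empty body: the repository id is carried by the URL
        method="PUT",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def create_repository(repository):
    """Send the PUT to a running server and return the HTTP status code."""
    with urlopen(create_repository_request(repository)) as response:
        return response.status


# Inspect the request without contacting a server.
put_req = create_repository_request("test1")
print(put_req.get_method(), put_req.full_url)
```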

The following command added triples from the indicated disk file to the test1 repository:

curl -X POST -T \bob\dev\xml\rdf\fakeaddrbookpt1.rdf -H "content-type: application/rdf+xml"
  http://localhost:8080/catalogs/ag/repositories/test1/statements

(April 9th correction: when I posted this entry yesterday, the preceding command and the remainder of this paragraph had the POST and PUT references backwards, so I just fixed them.) I found that without that "-X POST" in the command line, either the server or curl assumed that I was PUTting data. An HTTP PUT replaces any existing data in the repository, so if you want to add several files to the same repository, make sure to explicitly POST them there.
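
The POST-versus-PUT distinction is easy to get wrong, so here is a small sketch that makes the choice explicit. The statements_request function and its replace flag are illustrative names of my own; the URL layout and the application/rdf+xml content type come from the curl command above, and the one-line RDF/XML document is just placeholder data.

```python
from urllib.request import Request


def statements_request(repository, rdf_xml, replace=False,
                       base_url="http://localhost:8080", catalog="ag"):
    """Build a request that sends RDF/XML to a repository's statements URL.

    POST adds the triples to whatever is already stored; PUT replaces the
    repository's existing data, as described in the text.
    """
    url = f"{base_url}/catalogs/{catalog}/repositories/{repository}/statements"
    method = "PUT" if replace else "POST"
    return Request(url, data=rdf_xml, method=method,
                   headers={"Content-Type": "application/rdf+xml"})


# Placeholder payload: an empty RDF/XML document.
sample = b"<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'/>"

add_req = statements_request("test1", sample)                    # append
replace_req = statements_request("test1", sample, replace=True)  # overwrite
print(add_req.get_method(), add_req.full_url)
print(replace_req.get_method(), replace_req.full_url)
```

Passing either request to urllib.request.urlopen would send it to a running server; with real data you would read the bytes from the RDF file instead of using the placeholder.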

The next command sent an escaped SPARQL query to the server, which sent back a SPARQL query result format list of the predicates used in the data that I had loaded:

curl -H "Accept: application/sparql-results+xml" 
  http://localhost:8080/catalogs/ag/repositories/test1?query=SELECT%20DISTINCT%20%3Fp%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D
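
Escaping the query by hand is tedious. As a sketch, Python's urllib.parse.quote produces the same percent-encoding when told that no characters are safe; the variable names here are mine.

```python
from urllib.parse import quote

# The query from the post, in readable form.
query = "SELECT DISTINCT ?p WHERE {?s ?p ?o}"

# safe="" makes quote() escape every reserved character, including "/" and "?".
encoded = quote(query, safe="")

url = "http://localhost:8080/catalogs/ag/repositories/test1?query=" + encoded
print(url)
```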

Querying sets of named graphs

Using the HTTP interface, I also managed to reproduce my experiment with named graphs described at Querying a set of named RDF graphs without naming the graphs. (See that posting for background on what I was trying to accomplish, the sample data files, and the queries I used. And, if you're interested in named graphs, don't miss the discussion between Paul Gearon, Lee Feigenbaum, and Andy Seaborne in the comments section of that post.) Following the steps described there, I first loaded the mybluegraph.rdf file into the graph named http://www.snee.com/ng/mybluegraph.rdf (or, in AllegroGraph terms, into the context named http://www.snee.com/ng/mybluegraph.rdf):

curl -X POST -T \bob\dev\xml\rdf\sparql\namedgraphs\mybluegraph.rdf 
  -H "Content-Type: application/rdf+xml" 
  http://localhost:8080/catalogs/ag/repositories/test1/statements?context=%3Chttp%3A%2F%2Fwww.snee.com%2Fng%2Fmybluegraph.rdf%3E

Then I loaded myredgraph.rdf into the http://www.snee.com/ng/myredgraph.rdf graph with a similar command:

curl -X POST -T \bob\dev\xml\rdf\sparql\namedgraphs\myredgraph.rdf 
  -H "Content-Type: application/rdf+xml" 
  http://localhost:8080/catalogs/ag/repositories/test1/statements?context=%3Chttp%3A%2F%2Fwww.snee.com%2Fng%2Fmyredgraph.rdf%3E

I loaded mygreengraph.rdf without specifying a graph in which to load it:

curl -X POST -T \bob\dev\xml\rdf\sparql\namedgraphs\mygreengraph.rdf 
  -H "Content-Type: application/rdf+xml" 
  http://localhost:8080/catalogs/ag/repositories/test1/statements
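
The three load URLs above differ only in the optional context parameter, which is the graph URI wrapped in angle brackets and then percent-encoded. A small sketch of building them, with a hypothetical statements_url helper of my own:

```python
from urllib.parse import quote


def statements_url(repository, graph_uri=None,
                   base_url="http://localhost:8080", catalog="ag"):
    """Return the statements URL, adding ?context=<graph> when a graph is named."""
    url = f"{base_url}/catalogs/{catalog}/repositories/{repository}/statements"
    if graph_uri is not None:
        # The context value is the URI in angle brackets, fully percent-encoded.
        url += "?context=" + quote(f"<{graph_uri}>", safe="")
    return url


blue_url = statements_url("test1", "http://www.snee.com/ng/mybluegraph.rdf")
red_url = statements_url("test1", "http://www.snee.com/ng/myredgraph.rdf")
green_url = statements_url("test1")  # no graph named for this one
for u in (blue_url, red_url, green_url):
    print(u)
```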

A query for all dc:title values retrieved them from all three files,

curl -H "Accept: application/sparql-results+xml" http://localhost:8080/catalogs/ag/repositories/test1?query=PREFIX%20dc%3A%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Felements%2F1.1%2F%3E%20select%20%3Ftitle%20WHERE%20%7B%3Fs%20dc%3Atitle%20%3Ftitle%7D%0A

but a query for dc:title values from graphs that were subgraphs of http://www.snee.com/ng/mygraph.rdf only retrieved the redgraph and bluegraph ones, just as I'd hoped:

curl -H "Accept: application/sparql-results+xml" 
  http://localhost:8080/catalogs/ag/repositories/test1?query=PREFIX%20dc%3A%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Felements%2F1.1%2F%3E%20PREFIX%20rdfg%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F03%2Ftrix%2Frdfg-1%2F%3E%0Aselect%20%3Ftitle%20WHERE%20%7B%20%3Fg%20rdfg%3AsubGraphOf%20%3Chttp%3A%2F%2Fwww.snee.com%2Fng%2Fmygraph.rdf%3E%20GRAPH%20%3Fg%20%7B%3Fs%20dc%3Atitle%20%3Ftitle%7D%20%7D%0A
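
The escaped query string in that last command is hard to read. This short sketch decodes it with urllib.parse.unquote so that the SPARQL is visible; the string is copied from the command above and split across lines only for readability.

```python
from urllib.parse import unquote

# The query parameter from the last curl command, split into pieces.
encoded_query = (
    "PREFIX%20dc%3A%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Felements%2F1.1%2F%3E%20"
    "PREFIX%20rdfg%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F03%2Ftrix%2Frdfg-1%2F%3E%0A"
    "select%20%3Ftitle%20WHERE%20%7B%20%3Fg%20rdfg%3AsubGraphOf%20"
    "%3Chttp%3A%2F%2Fwww.snee.com%2Fng%2Fmygraph.rdf%3E%20"
    "GRAPH%20%3Fg%20%7B%3Fs%20dc%3Atitle%20%3Ftitle%7D%20%7D%0A"
)

# unquote() reverses the percent-encoding, including %0A (newline).
decoded = unquote(encoded_query)
print(decoded)
```

The decoded text shows the pattern at work: ?g is bound to each graph declared as a subgraph of http://www.snee.com/ng/mygraph.rdf, and the GRAPH ?g clause then restricts the dc:title match to those graphs.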

As one of the commercial triplestores, AllegroGraph looks very scalable, and as I mentioned, their support is very good. Franz has been holding some webinars about large-scale applications of their server lately, and an upcoming one on Solving Scale and Reasoning in Large RDF Datasets looks interesting; Franz distributes the Racer Description Logics reasoner in much of the world, so I assume that it will play a role in this reasoning application.

4 Comments

By Dan Brickley on April 9, 2009 4:42 AM

Thanks for the writeup! I never got any further than the impression I needed to write java code to talk to the db.

Have you figured out any of the social network analysis stuff? http://danbri.org/words/2008/06/02/327

http://www.franz.com/agraph/support/documentation/3.0/reference-guide.html#header3-65

eg. can i fill it full of mail headers and foaf and do clustering to find out which groups and lists are interconnected?

By Bob DuCharme on April 9, 2009 8:26 AM

Thanks Dan! I was going for breadth more than depth with these, trying to follow through on the same set of baseline tasks with each triplestore. Particularly with the commercial ones like AllegroGraph and OpenLink, there are weeks' worth of features to play with.

Bob

By Robert (rho) Barta on April 10, 2009 2:51 AM

Hi Bob.

If you want to keep your coding at a minimum, then maybe watch Perl RDF::AllegroGraph::Easy evolve on CPAN. I'll progress it as spare time allows.

Worth noting is also that AllegroGraph seems to be more than "just an RDF store" as it can host tuples (and not just triples). But I admit that I have not yet fathomed out this thing yet.

I can recommend these webinars, especially Jans Aasman talking. But they only last about an hour and cannot get very deep. Experimenting with the code remains a must. Good for me, as a consultant ;-)

BTW, it's rho, not Rho. And, yes, I'm sailing under no flags to avoid angry ladies emailing me about my schroedinger'sch cat experiments...

By Bill on June 6, 2009 4:44 PM

I've also started a Google Group for AllegroGraph users, called, cleverly enough, "AllegroGraph-users". You can sign up at

http://groups.google.com/group/allegrograph-users

bobdc.blog

Bob DuCharme's weblog, mostly on technology for representing and linking information.