Getting started with Open Anzo

Don't miss the exciting command line video demo!

Open Anzo is the third disk-based triplestore that I managed to set up, load with a few files of RDF data, and query with SPARQL. Its home page describes it as "an open source enterprise-featured RDF store and service oriented middleware platform that provides support for multiple users, distributed clients, offline work, real-time notification, named-graph modularization, versioning, access controls, and transactions with preconditions".

Before I describe my experience setting it up, loading sample data, and querying that data, take a look at Lee Feigenbaum's short video demonstrating the use of Open Anzo's command line interface:

He's using Linux in the video, but I managed to perform similar queries using Open Anzo under Windows XP. One documentation web page gave me the impression that the product requires DB2, Oracle, PostgreSQL, HSQLDB, or Apache DB on the back end, but you don't need any external database manager to try it out. It is nice to know that these database managers are options as your storage needs scale up; a readme file mentions that it also supports MySQL, and documentation for configuring Open Anzo to hook up to each of these database managers is easy to find on the web site.

After I downloaded release 3.1 of the Open Anzo full distribution and unzipped it, I set the ANZO_HOME environment variable to the name of the directory where I had unzipped it and then ran the startAnzo.bat script that started the server. (Once the server is started, sending a browser to http://localhost:8080/status shows whether you've got it up and running properly.) The server gives you an "osgi>" prompt in the command window where you started it up. Entering "help" at this server prompt shows you various things you can do there, but I didn't play with that much.
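If you'd rather script that status check than open a browser, something like the following Python sketch could do it. It only tests whether the URL answers with HTTP 200; treating 200 as "up and running" is my assumption, since the page itself may report more detail, and the function name is just something made up for this example.

```python
import urllib.request


def anzo_status_ok(url="http://localhost:8080/status", timeout=5.0):
    """Return True if the status page answers with HTTP 200, else False.

    Assumption: a 200 response is taken to mean the server is up. The page
    content is not inspected.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Connection refused, timeouts, and HTTP error responses raised by
        # urllib are all OSError subclasses, so they all count as "not up".
        return False


if __name__ == "__main__":
    print("server reachable:", anzo_status_ok())
```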

Once the server is running, you can interact with it using a command line client, as Lee demonstrated in his video. From a Windows operating system prompt, you do this by supplying parameters to the anzo.bat script. In addition to the ANZO_HOME variable, the window where you issue these commands also needs the ANZO_CLI_HOME variable set; I pointed it to the same directory.
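If you drive the command line client from a script instead of typing in a command window, the same two variables need to be in the environment that the client inherits. Here is a minimal Python sketch of that idea; the function name and the C:\openanzo path are hypothetical, so substitute the directory where you unzipped the distribution.

```python
import os


def anzo_environment(install_dir, base_env=None):
    """Return a copy of the environment with both Anzo variables set.

    Both variables point at the same directory, mirroring the setup
    described above. install_dir is whatever directory you unzipped into.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ANZO_HOME"] = install_dir
    env["ANZO_CLI_HOME"] = install_dir
    return env


if __name__ == "__main__":
    # Hypothetical install location, for illustration only.
    env = anzo_environment(r"C:\openanzo")
    print(env["ANZO_HOME"], env["ANZO_CLI_HOME"])
```

The resulting dictionary can be passed as the env argument of Python's subprocess.run when launching the anzo script.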

Entering "anzo help" lists the various anzo commands, and entering a command name after "help" like this tells you about that command:

anzo help query

Before you issue your first successful command, you also need to make sure that the server recognizes you as a legitimate user. I used peter as a username and 123 as a password, because I found these in the configuration\anzo.ldif file. Open Anzo offers options to point the client to a username and password pair stored in a configuration file, which is why Lee didn't need to include them on the command line in his video, but I just added them to each anzo command with the -w and -u switches. The following two commands each loaded a file of RDF data into the named graph identified by the -g option:

anzo import -w 123 -u peter -g \bob\dev\xml\rdf\fakeAddrBookPt1.rdf
anzo import -w 123 -u peter -g \bob\dev\xml\rdf\fakeAddrBookPt2.rdf

The following query then asked for a list of all the predicates used by triples in the graph:

anzo query -u peter -w 123 "SELECT DISTINCT ?p FROM <> WHERE {?s ?p ?o}"
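For anyone new to SPARQL, here is a rough picture of what that query asks for, sketched in plain Python rather than in Open Anzo itself. The triples and the ex: names are invented for illustration and are not the contents of my address book files.

```python
# Each triple is a (subject, predicate, object) tuple; all values are made up.
triples = [
    ("ex:person1", "ex:givenName", "Richard"),
    ("ex:person1", "ex:email", "richard@example.org"),
    ("ex:person2", "ex:givenName", "Cindy"),
    ("ex:person2", "ex:email", "cindy@example.org"),
]


def distinct_predicates(graph):
    """Mimic SELECT DISTINCT ?p WHERE {?s ?p ?o}: each predicate listed once.

    The pattern ?s ?p ?o matches every triple; DISTINCT drops the repeats.
    Sorting is only for stable output; SPARQL promises no order here.
    """
    return sorted({p for (_s, p, _o) in graph})


print(distinct_predicates(triples))  # ['ex:email', 'ex:givenName']
```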

The next query doesn't mention a specific graph, but it does include the -a switch, which tells Open Anzo to query against a merge of all the named graphs in the repository:

anzo query -u sysadmin -w 123 -a "SELECT DISTINCT ?p WHERE {?s ?p ?o}"
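To make "a merge of all the named graphs" concrete, here is a small plain-Python sketch of the idea, not of how Open Anzo implements it. The graph names and triples are hypothetical.

```python
# A toy repository: graph name -> set of (subject, predicate, object) triples.
# All names and values are invented for illustration.
named_graphs = {
    "ex:graph1": {
        ("ex:person1", "ex:givenName", "Richard"),
        ("ex:person1", "ex:email", "richard@example.org"),
    },
    "ex:graph2": {
        ("ex:person2", "ex:givenName", "Cindy"),
        ("ex:person2", "ex:phone", "555-0100"),
    },
}


def merge_graphs(graphs):
    """Union the triples of every named graph into one set."""
    merged = set()
    for graph_triples in graphs.values():
        merged |= graph_triples
    return merged


def distinct_predicates(graph_triples):
    """Each predicate once, sorted only so the output is stable."""
    return sorted({p for (_s, p, _o) in graph_triples})


# Querying one graph sees only that graph's predicates...
print(distinct_predicates(named_graphs["ex:graph1"]))
# ...while querying the merge sees the predicates from all of them.
print(distinct_predicates(merge_graphs(named_graphs)))
```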

Both queries worked just fine. As I mentioned in an update to last week's post, I also managed to query a set of graphs at once in Open Anzo based on metadata associated with the graph.

As I understand it, Open Anzo once included a RESTful SPARQL endpoint to provide an HTTP interface, and although some more recent builds didn't include this, it's being put back in. I couldn't get it to work in a few tests with curl, but I'm going to keep trying with future builds.

As with Virtuoso, Open Anzo has an impressive list of features beyond the simple ability to load and query triples that I've demonstrated here. I love the command line interface, and Lee's video quickly demonstrates a lot of cool things you can do with it. I look forward to playing more with Open Anzo.



Are you sure Open Anzo works with a file-based triple store? As far as the documentation reads, without an underlying RDBMS an in-memory store is used, which means your triples will be gone once the server is shut down.