Querying aggregated XBRL reports with SPARQL

Easier than I thought it would be.

My main goal for doing a SPARQL query against XBRL data was to be able to pull out the same bit of information from multiple companies' reports at once, and it turned out to be much less work than I thought it would be. Here is the result of my query for interest expense figures across several companies:

------------------------------------------------------------------------------
| companyName                    | periodStart  | periodEnd    | interestExp |
==============================================================================
| "GENERAL MILLS INC"            | "2005-05-30" | "2006-05-28" | "399600000" |
| "GENERAL MILLS INC"            | "2006-05-29" | "2007-05-27" | "426500000" |
| "GENERAL MILLS INC"            | "2007-05-28" | "2008-05-25" | "421700000" |
| "PAPA JOHNS INTERNATIONAL INC" | "2007-01-01" | "2007-07-01" | "3232000"   |
| "PAPA JOHNS INTERNATIONAL INC" | "2007-04-02" | "2007-07-01" | "1706000"   |
| "PAPA JOHNS INTERNATIONAL INC" | "2007-12-31" | "2008-06-29" | "3694000"   |
| "PAPA JOHNS INTERNATIONAL INC" | "2008-03-31" | "2008-06-29" | "1802000"   |
| "PEPSICO INC"                  | "2006-12-31" | "2007-06-16" | "96000000"  |
| "PEPSICO INC"                  | "2007-03-25" | "2007-06-16" | "54000000"  |
| "PEPSICO INC"                  | "2007-12-30" | "2008-06-14" | "132000000" |
| "PEPSICO INC"                  | "2008-03-23" | "2008-06-14" | "74000000"  |
------------------------------------------------------------------------------

A given company's XBRL SEC filing is typically an instance file full of facts plus additional files with taxonomies about the terms used and XLink linkbases about the relationships between the facts. The instance files, on their own, looked like the low-hanging fruit to me.

I kicked around some of my ideas for modeling XBRL in RDF with Dave Raggett, who's doing some very interesting, more ambitious work modeling the whole deal in RDF. (In related news, Kingsley Idehen said that OpenLink has an XBRL ontology almost ready, and TopQuadrant's Ralph Hodgson pointed out the BRONTO project at TIFbrewery.) I then wrote an instance2rdf.xsl XSLT stylesheet to convert an XBRL instance to RDF/XML. I ran it on the instance documents for several companies that I downloaded from the SEC website and manually created a file that I called colist.rdf to map company identifiers in the XBRL instances to company names. Then I ran the following query with ARQ 1.4 to ask about all Interest Expense figures in my collection of reports:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xi:  <http://www.xbrl.org/2003/instance>

SELECT DISTINCT ?companyName ?periodStart ?periodEnd ?interestExp

FROM <RRDonnelley.rdf>
FROM <pepsico.rdf>
FROM <nobleenergy.rdf>
FROM <generalmills.rdf>
FROM <papajohns.rdf>
FROM <dow.rdf>
FROM <ge.rdf>
FROM <cocacola.rdf>
FROM <colist.rdf>

WHERE {
  ?s rdf:type <http://xbrl.us/us-gaap/2008-03-31#InterestExpense>;
     rdf:value     ?interestExp;
     xi:identifier ?identifier;
     xi:startDate  ?periodStart;
     xi:endDate    ?periodEnd.

  ?identifier rdfs:label ?companyName.

}
ORDER BY ?companyName ?periodStart

Not all of the named companies have InterestExpense figures in that namespace; the query just asks for the figure from the companies that do.

I originally planned to merge all the RDF files into one before running the query, but I decided to let SPARQL do it, which is why there are nine FROM clauses above. In a more realistic scenario, the RDF versions of the companies' XBRL data would be loaded into a single triplestore and you would run the query against that.

As Dave suggested, I could add data typing to the RDF created from the XBRL instances. Before I add anything else to the RDF, though, I want to make sure that it enables a new kind of useful SPARQL query against the data that I couldn't do before the addition. I'm open to suggestions!
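One candidate is numeric comparison. The quoted values in the results table suggest that the figures are stored as plain literals, so comparing or sorting them by size requires an explicit cast in the query. With typed values in the RDF, the cast would not be needed. The following is only a sketch and has not been run against these files. It reuses the file names, prefixes, and property names from the query above, and the 100,000,000 threshold is an arbitrary value chosen for illustration:

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
PREFIX xi:   <http://www.xbrl.org/2003/instance>

SELECT DISTINCT ?companyName ?periodEnd ?interestExp

FROM <generalmills.rdf>
FROM <pepsico.rdf>
FROM <colist.rdf>

WHERE {
  ?s rdf:type <http://xbrl.us/us-gaap/2008-03-31#InterestExpense>;
     rdf:value     ?interestExp;
     xi:identifier ?identifier;
     xi:endDate    ?periodEnd.

  ?identifier rdfs:label ?companyName.

  # Cast the plain literal to a number so that the comparison is
  # numeric rather than a string comparison. The threshold is arbitrary.
  FILTER (xsd:decimal(?interestExp) > 100000000)
}
# Sort by the numeric value, largest first.
ORDER BY DESC(xsd:decimal(?interestExp))
```

If instance2rdf.xsl emitted the values with an xsd datatype, the FILTER and ORDER BY could presumably use ?interestExp directly.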

What does this prove? We know that RDF is great for aggregating data, especially resources that may have different data structures but certain data in common. XBRL gets more interesting when you start aggregating data from multiple companies, and I haven't seen much of that, although my research was limited to free software. Being able to compare specific financial figures from different companies will be great for people doing financial research, and this new combination of standards and free software makes it pretty easy.

3 Comments

Bob,

From http://demo.openlinksw.com/sparql, just execute a SPARQL query (i.e., select * from the XBRL instance URI).

Take any XBRL instance from: http://www.sec.gov/Archives/edgar/xbrl.html .

I would still encourage you to assist me in getting all the XBRL interested parties to work together via the XBRL Financial Report Ontology effort at:

http://groups.google.com/group/xbrl-ontology-specification-group

We should be able to collectively produce a Financial Reporting Ontology from XBRL.

Kingsley


Hi Kingsley,

Does http://demo.openlinksw.com/sparql offer a way to issue a SPARQL query against multiple XBRL reports at once? That's really what I was interested in.

I had another question for you, but decided that http://groups.google.com/group/xbrl-ontology-specification-group would be a more effective place to put it.

Bob


Bob,

SPARQL FROM NAMED is how you refer to multiple RDF information resources via their URIs. You can even scope your SPARQL query patterns to specific graphs if you want via GRAPH {query-pattern} .

Just try it :-)

Note: use the drop down to tell the service to SPONGE (i.e. get remote Graphs).

Kingsley
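
For reference, the FROM NAMED and GRAPH approach described in the comment above could be sketched as follows. This is an illustration only, not a query that was run against the demo service. It assumes the local file names and property names from the post, keeps colist.rdf in the default graph for the company labels, and introduces ?report as a hypothetical variable name for the graph that each figure came from:

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xi:   <http://www.xbrl.org/2003/instance>

SELECT DISTINCT ?report ?companyName ?periodStart ?periodEnd ?interestExp

# Default graph: the mapping from company identifiers to names.
FROM <colist.rdf>
# Named graphs: one per converted XBRL instance.
FROM NAMED <pepsico.rdf>
FROM NAMED <generalmills.rdf>

WHERE {
  # Match the fact pattern inside each named graph in turn;
  # ?report is bound to the URI of the graph that matched.
  GRAPH ?report {
    ?s rdf:type <http://xbrl.us/us-gaap/2008-03-31#InterestExpense>;
       rdf:value     ?interestExp;
       xi:identifier ?identifier;
       xi:startDate  ?periodStart;
       xi:endDate    ?periodEnd.
  }

  # Look up the company name in the default graph.
  ?identifier rdfs:label ?companyName.
}
ORDER BY ?companyName ?periodStart
```

Unlike the plain FROM clauses in the post, which merge all the files into one default graph, this form keeps each report separate, so the results can show which file a figure came from.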