Changing my mind about XBRL again

Call me a flip-flopper.

When I first heard about the eXtensible Business Reporting Language, it sounded great: an XML standard for business reports and their contents. Who could argue with lots of data with lots of value to many people, available in an open standard? I knew some of the people who worked on it, and I dug in and played a bit, but eventually lost interest. A comment that I left on a Tim Bray ongoing posting titled XBRL News last December showed the high point (or perhaps low point) of my cynicism:

XBRL (which has taken over 7 years to achieve the minor level of adoption that you describe) is second only to W3C Schemas in the number of people it's inspired to say "sure it's complex, but don't worry--there'll be tools to take care of that!" A key problem is that it's so customizable and flexible that it's difficult to put together something that can perform similar processing on multiple arbitrary XBRL documents. Compare DITA, which allows two different documents to appear structurally very different, but has open-source software to abstract away the differences to let us treat the documents as having an equivalent structure. (Of course DITA has a simpler, more straightforward domain, so there's no ocean to boil.)

When XBRL has an open source equivalent of the DITA Open Toolkit, I'll be ready to take another good look at it. Until then, as with so many standards, without free software that lets us do stuff with data that conforms to the standard, I don't see much incentive for conformance to the standard, except of course for regulations.

Another source of my frustration was that in May of 2004, when I was last playing with it, Fujitsu's set of XBRL Tools, the most popular free XBRL software at the time, was going through the upgrade from XBRL 2.0 to 2.1, and at that point their latest document validator couldn't handle documents created by their latest document creator. Although the Fujitsu tools are now available only to XBRL Consortium members (and joining the consortium means jumping through a hoop or two), more free XBRL software is now available.

More importantly, Tim pointed out last December (with a link to the SEC's XBRL Data Submitted in the XBRL Voluntary Program on EDGAR page) that fewer than 100 big public companies were reporting in XBRL, but this week I count 468 companies listed on that page. That's a substantial increase, and that's plenty of data to play with, enough to make the whole idea of XBRL more than just a theoretical nice-to-have.

Something else that drew me back to XBRL is that as I studied various kinds of ontology and taxonomy work, I noticed that XBRL people were doing a lot of very careful taxonomy work to support specific business goals. An American Council for Technology white paper titled "Transforming Financial Information – Use of XBRL in Federal Financial Management" (PDF) quoted Charles Hoffman (both the father and author of XBRL, according to the paper) as saying:

XBRL is in a field referred to collectively as Semantic Technology... Semantic Technologies is a multi-faceted field with progressive layers of technology and complexity. The World Wide Web Consortium developed a set of semantic standards established at the turn of the century (most significant of which are the Resource Description Framework (RDF) and the Web Ontology Language (OWL)). This field is rich with possibilities and stands as the next logical step in the natural progression of information technology to seek a higher value proposition.

I'll bite, but it doesn't seem that many others have, because I don't see much work on the potential connection between XBRL and the W3C-oriented semantic web standards. (I'd be happy to have things I missed pointed out to me.) Many web pages out there mention both XBRL and RDF, but mostly as examples in lists, with no explicit discussion of possible relationships. The XBRL Ontology Specification hasn't gotten any further than the 0.0 status it had in April of last year, with mailing list activity ending a month after it started. There was a Financial Services XBRL Seminar at the 2008 Semantic Web Technology Conference, but I haven't seen any evidence of cross-fertilization that came out of it.

So I'm going to pursue the potential connections between XBRL and RDF-related technology myself. As I read about all those information relationships that XBRL can model, on the one hand I'm thinking "Cool! (How about that XLink, after all!)" and on the other hand I'm thinking "This would be so difficult to model as triples!" I'm more interested in a bottom-up proof-of-concept than in a top-down ontology, though. For a start, instead of modeling all of XBRL's many potential data structures as triples, I plan to model a subset that can be queried with reasonably non-contorted SPARQL queries and to put together a demo using some of that EDGAR data. Playing with the existing free software and writing some XSLT to convert EDGAR filings to RDF will be priorities. I'll report on my progress (or lack thereof) as I move along.
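
To make the "subset as triples" idea a little more concrete, here is a rough sketch of what a bottom-up flattening might look like. It is an illustration under several assumptions, not a finished mapping: it uses Python's standard library rather than XSLT, the sample instance is a made-up and heavily simplified document (the `ex:` taxonomy namespace, the all-zero identifier, and the `http://example.com/vocab#` property names are all hypothetical), and a small pattern-matching function stands in for the kind of simple SPARQL query I have in mind. Only the `http://www.xbrl.org/2003/instance` namespace comes from XBRL itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified instance document. Real filings carry
# much more structure (units, segments, linkbases) that this sketch ignores.
SAMPLE = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:ex="http://example.com/taxonomy">
  <context id="FY2007">
    <entity>
      <identifier scheme="http://www.sec.gov/CIK">0000000000</identifier>
    </entity>
    <period><instant>2007-12-31</instant></period>
  </context>
  <ex:Revenues contextRef="FY2007" unitRef="USD" decimals="0">1000</ex:Revenues>
  <ex:NetIncome contextRef="FY2007" unitRef="USD" decimals="0">150</ex:NetIncome>
</xbrl>"""

XBRLI = "http://www.xbrl.org/2003/instance"  # XBRL 2.1 instance namespace
EX = "http://example.com/vocab#"             # made-up vocabulary for properties


def facts_to_triples(xml_text):
    """Flatten contexts and facts into (subject, predicate, object) tuples."""
    root = ET.fromstring(xml_text)
    triples = []

    # One blank node per context, carrying its entity identifier and instant.
    for ctx in root.findall(f"{{{XBRLI}}}context"):
        node = "_:" + ctx.get("id")
        ident = ctx.find(f"{{{XBRLI}}}entity/{{{XBRLI}}}identifier")
        instant = ctx.find(f"{{{XBRLI}}}period/{{{XBRLI}}}instant")
        if ident is not None:
            triples.append((node, EX + "entity", ident.text))
        if instant is not None:
            triples.append((node, EX + "instant", instant.text))

    # Treat any child outside the instance namespace with a contextRef as a fact.
    count = 0
    for el in root:
        if el.tag.startswith(f"{{{XBRLI}}}") or el.get("contextRef") is None:
            continue
        count += 1
        fact = f"_:fact{count}"
        concept = el.tag[1:].replace("}", "#", 1)  # "{ns}local" -> "ns#local"
        triples.append((fact, EX + "concept", concept))
        triples.append((fact, EX + "context", "_:" + el.get("contextRef")))
        triples.append((fact, EX + "value", el.text))
        if el.get("unitRef") is not None:
            triples.append((fact, EX + "unit", el.get("unitRef")))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def values_for(triples, concept):
    """Stand-in for a two-pattern SPARQL query: values of facts for a concept."""
    facts = [t[0] for t in match(triples, p=EX + "concept", o=concept)]
    return [t[2] for f in facts for t in match(triples, s=f, p=EX + "value")]


if __name__ == "__main__":
    triples = facts_to_triples(SAMPLE)
    print(values_for(triples, "http://example.com/taxonomy#Revenues"))
```

The point of the sketch is only that the fact/context core of an instance document flattens fairly naturally; the XLink-based relationships in the taxonomy linkbases are where modeling as triples would presumably get harder.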

9 Comments

Bob,

Please make time to do the following:

1. Visit http://ode.openlinksw.com
2. Follow the examples link
3. See XBRL instance data in Linked Data form
4. Download and install the OpenLink Data Explorer for Firefox
5. Visit any XBRL instance doc URL
6. Use the "View | Linked Data Sources" feature of ODE to flip from the XBRL view to Linked Data View


Kingsley


Thanks Kingsley, that works as described and is very cool.

Bob


I think the XBRL instances are a really terrific design: linking rather than direct markup.

But the modeling on top of XSD absolutely stinks as a system, big suckerooney: there are some nice tools to be sure, but the tools never stop someone from needing to know what is going on, and what is going on is XSD+++++. IIRC it uses equivalence classes a lot, which is one of those "we don't implement that" features for some data-binding/DBMS kinds of tools.

The difference is that the XBRL modeling is at least more straightforward. It is not the XSD sea-of-details approach.

So a mixed bag on friendliness, but definitely a 'hardcore' technology, not a casual one or for visibility to Mom-and-Pop.

Cheers
Rick


XBRL's complexity reflects the richness of the domain it models. Its sophisticated use of XLink certainly makes XBRL hard to process with XSLT. I prefer to think of XBRL as a transfer format, as it turns out to be rather easy to convert to RDF. This way you can generate different kinds of XBRL reports using queries over a scalable RDF triple store, such as Sesame. This also opens the theoretical possibility for XBRL filings to be submitted in one of the RDF syntaxes, e.g. Turtle. The current XML syntax makes use of XML Schema to assist with validation of XBRL filings, and it will be interesting to look at validation using Semantic Web technologies as an alternative.


Thanks Dave!

>it turns out to be rather easy to convert to RDF.

Just what I was looking for! Who has done this? Is there code available to use or see?

thanks,

Bob


This is something I am working on as a background activity. The code for converting XBRL to RDF turtle syntax is in C and linked against libxml2. I will ask my manager at JustSystems if it would be possible to release this as open source, but that will inevitably take some time as it requires sign-off at the top levels of the company. I will post some details on the relation between XBRL and RDF on my blog, but due to vacation and other higher priority work, this may take a while.


Hi Bob,

I have started to map XBRL XSDs and instance data from the EDGAR program to OWL and RDF. I use the generic mappings provided by ReDeFer XSD2OWL and XML2RDF tools (http://rhizomik.net/redefer).

The mappings are partial and quite preliminary. All is available from http://rhizomik.net/ontologies/bizontos

Best,

Roberto


Thanks Roberto, this looks interesting.

Bob