Pulling data out of computers in the mid-twentieth and early twenty-first centuries

Report generation in the 1950s and the future of RDF.

I've written before about McGee's 1981 article in IBM's Journal of Research and Development covering the history of database systems from 1955 to 1980, and I left off saying that I'd devote a separate entry to his history of report generation. The creation of reports may sound mundane, but throughout the history of computing, pulling out data that meets specific criteria has been the most important thing we do with computers. (Why put data in or calculate new data in the first place?) Since writing that, I've found a more primary source, written by McGee in 1959 when he was at GE's Hanford Atomic Products Operation, titled Generalization: Key to Successful Electronic Data Processing (unfortunately, this requires ACM membership or a fee to access).

The "generalization" he writes of is the creation of routines that can be re-used in multiple applications, such as a sorting routine. "Thus, by suitable generalization [his italics] it is possible to design a sorting routine that will sort any file, regardless of the data it contains." While many take the principle of increased abstraction to promote code re-use for granted today, Harold Abelson and Gerald Jay Sussman don't in their classic Structure and Interpretation of Computer Programs, devoting plenty of pages to why it's a Good Thing and the best way to go about it.

[diagram of records on magnetic tape]

As an example of these routines, McGee describes "the generalized [report] and file maintenance routines [that] have been available to Data Processing planning personnel a little less than three months at the time of this writing." Try to imagine the first-ever techniques for modular generation of parameterized reports being more recent than Apple's introduction of the video iPod is today. The automated generation of printed reports based on mechanically stored data had already been around for decades in the form of punched card manipulation machines, but with no software to speak of, redesigning those reports meant rewiring plugboards. In McGee's description of doing this with software, it's interesting to read such a primary source on some of the earliest uses of terms such as file, record, and field. (Some of the earliest that I know of, anyway—I'd love to see pointers to earlier use of these terms.)
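To show what a generalized report routine buys you, here's another small, entirely hypothetical sketch of my own: the fields to print and the selection criterion are parameters, so "redesigning" a report means supplying new parameters rather than writing new code—or rewiring a plugboard.

```python
# A modern-day analogue of a generalized report routine (my illustration,
# not McGee's code): the caller names the fields to show and the selection
# criterion; the routine itself knows nothing about any particular report.
def run_report(records, fields, criterion=lambda record: True):
    """Print the requested fields of every record that meets the criterion."""
    print("\t".join(fields))                      # column headings
    for record in records:
        if criterion(record):
            print("\t".join(str(record[field]) for field in fields))

inventory = [
    {"item": "graphite block", "plant": "Hanford", "on_hand": 1200},
    {"item": "fuel slug", "plant": "Hanford", "on_hand": 75},
    {"item": "fuel slug", "plant": "Oak Ridge", "on_hand": 430},
]

# A "new" report is just a new set of parameters:
run_report(inventory, fields=["item", "on_hand"],
           criterion=lambda r: r["plant"] == "Hanford")
```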

McGee's 1981 paper describes developments in report generation after his Hanford work, particularly in the early 1960s at IBM on their 1401 machine. This "Report Program Generator" program built on the Hanford work, and evolved into RPG II and RPG III, programs I remember seeing mentioned in job listings when I was younger. And now I know what "RPG" stands for!

In the Hanford days, when people considered the use of alphanumeric assembly language symbols to represent machine opcodes to be a leap forward in ease and usability, a one-day turnaround for the design, implementation, and generation of reports added a huge amount of value to computers and to the data on them. As data models and delivery platforms have evolved since then, the ability of less technical users to more easily get the data they want has driven the adoption of many new platforms and data models, and, extrapolating to the future, I think that a report generator will be the killer app that RDF is waiting for. (Either that or a simplification of RDF comparable to what XML did to SGML.)

While playing with Leigh Dodds' Twinkle SPARQL query tool, which I would describe as a simple IDE for the creation and running of queries and viewing of results, I was thinking about the possibility of a GUI tool that generates and runs queries against RDF data sets. (OK, RDF "graphs." I don't like this term because, while I know it refers to a class of data structures and not a class of pictures, it's the kind of technical alternative use of plain English terms that has helped to confine most RDF use to academia.) This tool would let a user unfamiliar with SPARQL or RDF syntax fill out dialog boxes to specify which data he or she wants, and it would then generate and run a SPARQL query without that user ever seeing the query. I think that a tool like this would help people appreciate the value of RDF's flexibility, and Leigh agrees. Users of such a tool will miss out on a few things, just as the legions of Crystal Report Writer users who aren't fluent in SQL are missing out on a few things, but like those people generating payroll and inventory reports from relational databases, people using such an RDF query tool will get more useful work done than they would without data stored according to this model.
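Here's a rough sketch of the query-generation layer I have in mind, using Python and rdflib; the class URI, property URIs, and data file are all made up for illustration, and a real tool would of course collect these choices from dialog boxes rather than a function call.

```python
# A sketch of how such a tool might assemble SPARQL behind the scenes. The
# dialog boxes would supply the class, the properties to show, and an
# optional filter; the user never sees the query that gets built and run.
from rdflib import Graph

def build_query(class_uri, properties, filter_clause=None):
    """Assemble a SPARQL SELECT query from the user's choices."""
    select_vars = " ".join(f"?{name}" for name in properties)
    patterns = [f"?item a <{class_uri}> ."]
    patterns += [f"?item <{uri}> ?{name} ." for name, uri in properties.items()]
    where = "\n  ".join(patterns)
    filter_part = f"\n  FILTER ({filter_clause})" if filter_clause else ""
    return f"SELECT {select_vars}\nWHERE {{\n  {where}{filter_part}\n}}"

# Hypothetical data set and vocabulary, just for the example.
graph = Graph().parse("staff.rdf")
query = build_query(
    "http://example.org/schema/Employee",
    {"name": "http://xmlns.com/foaf/0.1/name",
     "dept": "http://example.org/schema/dept"},
    filter_clause='?dept = "Payroll"',
)
for row in graph.query(query):   # run it and show the results as a report
    print(row["name"], row["dept"])
```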

I'm going to try to prototype such a tool. So, if you see a lower frequency of postings on my weblog for a little while, rest assured that I'm doing something more productive than reading ancient computer science literature.

2 Comments

Cool post; I love computer history, especially finding out the roots of concepts I take for granted.

I also like how you mentioned someone doing to RDF what XML did to SGML... that's been a long-overdue thing that I've expected to happen, but hasn't. You should do a drastic refactoring and paring down (or a reconceptualization) of RDF to make it more amenable to the mainstream and easier to work with. You're right about how technical language can also create barriers to adoption; when the RDF community throws around terms like ontology markup languages, you bet this scares off your standard enterprise developer.


Hey Bob,

We've done such a thing: JSpace is a visual query builder for RDF databases with an interesting UI wrapped around it. Basically a "polyarchical browser". There's an in-browser version cooked up by the folks at Southampton, mSpace, which we've reimplemented as a Java app and used for the problem of database integration for NASA. They're using it to do expertise location.

So, basically, you convert existing databases into RDF, federate them, and then run JSpace against the result -- what you end up with is a pretty useful tool which, behind the scenes, is doing exactly what yr article suggests: building queries based on a user's arbitrary navigation through an information space and presenting the results, i.e., reports, to the user in some novel ways. (Okay, I'm exaggerating a tiny bit: you also have to write a JSpace browser "model" which tells the tool how to relate the federated bits together, but it's not very difficult to write and it's just another RDF graph thingie.)

For NASA we threw in social network graphs (to locate an expert in a rolodex culture, yr next best move is to call someone who is closer to that expert in the social network) just for fun! :>

There are more features to add, including Atom channel generation for arbitrary queries, etc., but you might want to download and play with it some. It's GPL'd. ;>