DBMS Support of SGML Files


September 20, 1999: Approaching the third anniversary of the last time I updated this, I thought I'd point people to something more timely: Ronald Bourret's more recent XML and Databases.

The following summarizes information that I have collected about database systems that present themselves as reasonable solutions for storing SGML data. I'll be happy to update it; for new entries, please follow the model shown. Overly long comment text will be trimmed. The more trimming necessary, the longer they'll take to show up here.

As of 5/18/95, I'm trying to date edits and new entries (for example, the Inforium and XSoft entries).

I tried to point URLs at information about the products in question, and not at company home pages.

I'd like to thank Lori Westbrook, whose 12/95 summary of similar information was a great resource in my 3/96 update of this page.

copyright 1996 Bob DuCharme bob@snee.com

SGML source for this HTML file

Shameless plug: Fake Your Way Through Minis and Mainframes!

See also the list of database products on the Whirlwind Guide to SGML Tools.

Open Text Corp.
Electronic Book Technologies
AIS Berger-Levrault
Information Dimensions
Oracle Corporation
Xyvision, Inc.
TechnoTeacher, Inc.
Texcel Research, Inc.
Computer Resources International
Georg Heeg Objektorientierte Systeme
Integrated Publication and Information Systems Institute (IPSI)
InfoDesign Corporation
Collaborative Information Technology Research Institute
Thunderstone Software - EPI
Infrastructures for Information
INFORIUM, The Information Atrium Inc.
XSoft, A Division of Xerox
Passage Systems Inc.
LSC Scandinavia AB

Open Text Corp.
Open Text 5
Tim sent me the following quote and attribution:

The following is from the Seybold report, Vol 24 #6, Nov. 30, 1994

"If we include full-text retrieval as a component of editorial document management systems, names such as [several competitors] and Open Text surface as full-text engine suppliers. (Of these, only Open Text makes use of SGML structural information)."

Tim Bray
180 Columbia Street West
Waterloo, Ontario Canada N2L 3L3
Tel +1 519-888-7111
Fax +1 519-888-0677

Electronic Book Technologies
DynaBase database / publishing environment
Last updated 7/12/96

EBT had originally planned to make DynaBase an SGML DBMS, but in early 1996 they repositioned it as a web page repository/document management product with no ability to handle arbitrary DTDs. Some of the press release talk around EBT's recent acquisition by INSO has promised further improvements to DynaBase that take advantage of INSO technology.

One Richmond Square,
Providence Rhode Island 02906
Tel.: +1 401 421 9550
Fax: +1 401 421 9551

AIS Berger-Levrault
Balise conversion
SGML/Store database
Last updated 2/13/96. The following is from Lori Westbrook's 12/22/95 summary:

ADOC is based on the SGML/Store(TM) storage sub-system, which has been designed to store and manipulate SGML documents or collections of documents in a database format. SGML/Store (and so ADOC) accepts arbitrary DTDs, without requiring any specific database schema definition or settings step. A SGML/Store (and so ADOC) database can manage heterogeneous document collections which are instances of multiple DTDs.

ADOC 1.2 (Available)

- Documentation under CALS standards: SGML, CGM & CCITT G4

- DTD structuration: arbitrary DTD, definition of SGML elements granularity level

- Parsing SGML instance according to its DTD

- SGML transformation engine for scripting

- Author private working space. ADOC is a native concurrent editing system

- Integrated native SGML editor

- Integrated Graphic CGM editor

- Integrated Group 4 graphic editor

- Formatted manuals database

- OS CALS A & B composer with incremental formatting

- Viewer/differential viewer

- Printer/differential printer (Postscript)

- Electronic publications maker

ADOC 2.0 (2Q96)

New features

- customization of the client application

- Client/server architecture (management, import/export, printing, viewing)

- Document server based on a RDBMS

- Configuration server is based on a RDBMS

- SGML Entity management in SGML/Store (and so in ADOC)

- Generic declaration at a Class level (DTD, FOSI, filter, access rights, granularity level,...)

- Security refinement

- APIs

Francois Chahuneau
Christoph Lexluse
35, Rue du Pont
F-92200 Neuilly sur Seine

Stephane Bornerand
phone: [+33] 1 46 40 84 06

Isabelle Bornier
phone: [+33] 1 46 40 84 15

FAX: [+33] 1 46 40 84 10
email: ibor@ais.berger-levrault.fr

Information Dimensions
The following is from Information Dimensions' product overview web page as of 2/14/96.

BASISplus(c) is a client/server relational database system for text and mixed object documents that adheres to fundamental principles of open systems including interoperability, portability and scalability. Its multi-process server architecture allows BASISplus to perform well not only on single processor systems but on state-of-the-art symmetric multiprocessor and massively parallel systems. The database engine provides user authentication, document access control, concurrency control, deadlock protection, rollback and recovery. BASISplus also utilizes a powerful data dictionary (known as the DDB) for database definition and flexible data modeling.

BASIS SGMLserver(tm) retrieves, updates and manipulates SGML components as separate objects. BASIS SGMLserver sits on top of the powerful BASISplus database engine and recognizes SGML components and the hierarchical relationships that exist between them. It parses and validates every document that enters the database.

5080 Tuttle Crossing Blvd
Dublin OH 43017-3569
Tel.: 1 614 761 8083
Fax: 1 614 761 7290

Dave Warner


ADOC Patricia Francois
316 Route de Bayonne
F-31060 France

Oracle Corporation
Oracle Media Server (multimedia database)
Oracle Document (office product family)
Oracle Text Server (content-based retrieval)
Oracle ConText (meaning-based retrieval)
Oracle Book version 2 and SGML Gateway
also see: SQL3 (which features extensible SQL)
Diane Li or Scott Stephens
Oracle Corporation
500 Oracle Parkway, Redwood Shores
CA 94065

Xyvision, Inc.
Parlance Document Manager
Last updated 5/13/96. The following information was supplied to me by Judy Hall Cox, Xyvision's Director of Marketing Services:

Parlance Document Manager is a content management system that is compatible with any DTD and stores SGML fragments as individual modules, independent of any document. These modules can be used and reused in several documents at the same time. PDM also works with more sophisticated SGML constructs, such as using SGML-marked sections to contain information for more than one document version. The underlying database is Informix with an object layer built on top giving PDM advantages of both a relational and object-oriented database.
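To make the idea of fragments stored as modules "independent of any document" concrete, here is a minimal Python sketch. It is not based on PDM's actual design or API; the `ModuleStore` class, its method names, and the sample module ids are all hypothetical, and a plain in-memory dictionary stands in for the Informix database.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleStore:
    """Hypothetical store: each fragment lives once, documents hold only references."""
    modules: dict = field(default_factory=dict)    # module id -> SGML fragment text
    documents: dict = field(default_factory=dict)  # document name -> ordered module ids

    def put_module(self, module_id, sgml):
        # Creates the module or replaces its text in place.
        self.modules[module_id] = sgml

    def define_document(self, name, module_ids):
        self.documents[name] = list(module_ids)

    def assemble(self, name):
        # References are resolved at assembly time, so every document
        # picks up the current text of a shared module.
        return "\n".join(self.modules[m] for m in self.documents[name])

    def where_used(self, module_id):
        return sorted(d for d, ids in self.documents.items() if module_id in ids)


store = ModuleStore()
store.put_module("warn-01", "<warning>Disconnect power first.</warning>")
store.put_module("intro-a", "<para>Model A overview.</para>")
store.put_module("intro-b", "<para>Model B overview.</para>")
store.define_document("manual-a", ["intro-a", "warn-01"])
store.define_document("manual-b", ["intro-b", "warn-01"])

# Editing the shared module once changes both assembled documents.
store.put_module("warn-01", "<warning>Disconnect power and wait 30 seconds.</warning>")
print(store.where_used("warn-01"))
print(store.assemble("manual-b"))
```

The point of the sketch is only that a document is a list of references rather than a copy of the text, which is what lets one module appear in several documents at the same time.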

Judy Hall Cox
Director, Marketing Services
Xyvision, Inc.
617-245-4100 ext. 5312

TechnoTeacher, Inc.
Steve Newcomb sent me the following on 3/18/96:

The HyMinder HyTime Engine is:

- an SGML Engine based, in part, on James Clark's "SP" SGML parsing technology; and

- a HyTime engine which has been under continuous development since 1989.

Version 0.8.4 of the HyMinder system is now available as a C++ library that can be linked to programs compiled using the Gnu, Borland, Unixware, SPARCcompiler, and Microsoft C++ compilers, and which run under Sun-OS 4.1.x, Solaris 2.0, Linux, Unixware, Windows 3.1, Windows 95, and Windows NT. Included with the HyMinder system is a bare-bones ISAM-based DBMS that is best suited for small stand-alone applications and prototyping. TechnoTeacher, Inc. embeds the HyMinder system in industrial DBMSs on a contract basis, and at least one such embedded version will be announced by its vendor in the second quarter of 1996.

TechnoTeacher, Inc.
P.O. Box 23795
Rochester, New York 14692-3795 USA
e-mail: srn@techno.com
voice: +1 716 389 0961
fax: +1 716 389 0960
e-mail: HyMinder@Techno.com

Texcel Research, Inc.
Information Manager SGML-aware document management system
The following is from a 12/95 press release.

Texcel ships Version 1.0 of Texcel Information Manager to major customers worldwide. "We're pleased to be able to provide a solution to the large-scale information publishing needs of our customers," said founder Bruce Wolman. "Texcel Information Manager lets large teams of authors, reviewers and production managers work in parallel to quickly produce high quality, up-to-date information products such as textbooks, maintenance manuals and regulatory submissions, which can be published either in print, on CD-ROM or on the Web. Texcel Information Manager is the result of our extensive experience building custom SGML systems and will increase our ability to provide even more cost-effective strategic publishing solutions."

Texcel Information Manager includes a suite of applications to help people work together to produce accurate, up-to-date information, such as online electronic review, dynamic document assembly and workflow management.

Texcel Information Manager Version 1.0 supports the major UNIX platforms, including Hewlett-Packard's HP 9000 (HP-UX), IBM RS 6000 (AIX), and Sun SPARC (Solaris and SunOS) as well as an MS-Windows 3.1 client. Pricing starts at $25,000 for the server and a 4-seat license. Customization and support services are available.

Sebastian Holst
Texcel (UK) Ltd.
Stuart MacRae, +44 1753 833 111

Computer Resources International
Following supplied by Bret Dangelmaier of CRI on 9/30/96:

Life*CDM (Compound Document Manager) is an SGML repository and document management system designed for the ever-changing needs and demands of current and future publishing. The core architecture of Life*CDM was designed around a strategy for providing the publisher with optimal control of both the information structure and publishing environment, thereby enabling the *dynamic tuning* of the system in response to virtually any contingency while ensuring complete data integrity, workflow control and a strategic growth path.

Life*CDM's design emphasizes exceptionally easy customization, flexibility, performance tuning, control, and error reduction with inherent productivity features, including: Minimized human interaction; Maximized rule-based processing; Composition, formatting, revision marking; Multiple simultaneous media support; and Optimized creation and storage of customized manuals.

The main components of Life*CDM are:

- The Life*CDM Repository, which provides the foundation for an electronic publications management system. It stores all text and graphic information, including DTDs and layout definitions, in a shareable object hierarchy maintained in an Oracle database. The Life*CDM repository allows for import/export of SGML documents.

- The Life*CDM Revision Shell, which contains functions for initiating, reviewing and committing changes to the repository. It is also the main interface for repository configuration and administration. Interaction with this functionality typically occurs through APIs to integrated third party applications such as the ArborText Adept Editor, Grif SGML Editor, and the Auto-trol Technical Illustrator.

- The Life*CDM Composer, which provides composition and production of partial and complete manual revisions based on the SGML hierarchy and layout formats defined within the Life*CDM repository. It automatically generates page numbering, revision bars, and various types of front matter (like LEPs & TOCs). The Composer produces automated output for screen display, as well as media varieties ranging from paper to microfilm to digital delivery.

Contact point in the US (Seattle):
Michael Sandifer

Contact point in Denmark:
Jan Lauridsen
Vice President of Sales and Marketing JL@cri.dk

Technical Questions about Life*CDM:
(CRI subsidiary in Norway that develops product)

Svein Myhra

Georg Heeg Objektorientierte Systeme
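ObjectDocs, "System zur Erzeugung, Verwaltung und Verteilung technischer Dokumente" (system for creating, managing and distributing technical documents)
(My German is nicht so gut, but the following sentence from the German-only marketing literature that I have seems relevant.) "Tabelleninhalte einer Werksnorm, die in ihrer SGML-Repräsentation in ObjectDocs vorliegt, können von anderen Berechnungsprogrammen verarbeitet werden." (Roughly: the table contents of a company standard that is held in ObjectDocs in its SGML representation can be processed by other calculation programs.)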

Baroperstrasse 337
D-44227 Dortmund, Deutschland
Tel (x 49-2 31) 9 75 99-0
Fax (x 49-2 31) 9 75 99-20
e-mail info@heeg.de

Integrated Publication and Information Systems Institute (IPSI)
Darmstadt, Germany
The IPSI has created an SGML database application that is operational and has an interactive online journal application running on top of it. As of 2/95, they plan to make the SGML database application framework available free in the near future to anyone willing to install VODAK, an object-oriented DBMS developed at IPSI, after they finish some final tuning and debugging. VODAK is already available via anonymous ftp under ftp.darmstadt.gmd.de.


InfoDesign Corporation
The following is from some information that was e-mailed to me (U.S. contact information added 2/14/96, courtesy of Lori Westbrook's research)

WorkSMART is an integrated work environment that provides an enterprise with the tools to manage the production and maintenance of millions of pieces of complex information, that are used by thousands of people at hundreds of locations.

Derived from object-oriented work management technology that InfoDesign developed to support the U.S. DoD JCALS program, WorkSMART enables any organization to distribute work and corporate information assets throughout the enterprise in a transparent, controlled manner.

(among the many features listed:)

- a flexible data repository supporting data evolution

- the WorkSMART Data Base (WSDB) supports dynamic schema evolution (object models can be modified in real-time without the need to dump the database, build a new schema and then reload the database)

- multi-dimensional data storage (support for multiple data domains)

- supports multiple, concurrent revision trees for an object

- supports SGML data, providing an entity manager for 3rd party applications

- full audit trail on object revisions

- flexible object storage and retrieval capabilities

- rules-based storage schemes

- customer-definable storage classes utilize configurable access "methods"

InfoDesign Corporation
Waterpark Place
10 Bay Street, Suite 610
Toronto, Ontario, Canada, M5J 2R8
416 369-9125
416 369-0042 fax

Joe Synder
7700 Leesburg Pike Suite 204
Falls Church, Virginia 22043

Collaborative Information Technology Research Institute
the Structured Information Manager (SIM)
From some information e-mailed to me:

SIM is an SGML based database system, specifically designed to support multi gigabyte collections of data, with hundreds of on-line users. SIM uses the Z39.50 communication protocol, in order to support true client-server information retrieval. SIM also supports storage of binary data, such as images, and supports data in the MARC format. Interfaces include an MS-Windows client, a Command Line interface and a World Wide Web gateway.

Philip Anderson
723 Swanston Street
Melbourne Victoria 3125 Australia
Tel +613 282 2496
Fax. +613 282 2444
Email phil@kbs.citri.edu.au

Thunderstone Software - EPI
TEX.IS (others available as well, but this is the one with some SGML support)
(The following is from the summary on their web page. See also their home page listed below.)

Text Information Server

Thunderstone's TEX.IS program merges the horsepower of Metamorph & 3DB with a SQL Relational Database Server to provide a software package that can tackle the most demanding information management tasks. TEX.IS can be applied anytime there exists information in any quantity that has text as one or more of its major attributes. TEX.IS allows you to easily bridge the gully [sic] that exists today between traditional databases and document driven activities by allowing the import, export, management and concept-based retrieval of textual information.

- ANSI SQL driven: So you don't need to learn a new 'standard'.

- Binary Large Object support: A BLOB field in TEX.IS can be of any length and may contain anything you want.

- Unix, DOS/Windows and NT Servers: You choose the power you need and we'll provide the engine.

- Unix Remote Procedure Call (RPC): The RPC call set provides client access over any TCP/UDP network.

- Microsoft ACCESS for Windows compatible: With TEX.IS as a server you can roll your own client application in hours.

- Microsoft Open Database Connectivity (ODBC) driver: This driver allows you to talk to TEX.IS from within your Windows application program.

- Postscript file and record handling: You can even print your postscript document to a non-postscript printer!

- SGML document manipulation: TEX.IS can search, display, import, and export SGML documents.

- Embedded SQL for C and C++: We provide an embedded SQL preprocessor that allows you to use TEX.IS within your application in short order.

Just imagine what you could do with the real-time Metamorph API in conjunction with TEX.IS.

Thunderstone Software / EPI, Inc.
P. Barton Richards ( Bart )
11115 Edgewater Dr.
Cleveland, Ohio 44102
Phone: 216-631-8544
Fax : 216-281-0828

Infrastructures for Information
VISion Servers - SGML based document management
SAS - a linkable SGML parser for SGML application development
The following is from the Infrastructures for Information information packet as of 2/14/96:

The VISion© product family is an on-line client/server Document Manufacturing System that uses virtual information space technology or VIS©. Built with the Standard SGML Support System or S4 API's, it is designed to manage the efficient transformation of raw information into various useful information products. VISion products apply the discipline of the manufacturing environment to the business of document production, imposing clearly defined standards for both format and content, promoting reusability of components, and enforcing communication among all those involved in the process of building a document. VISion guides a document through every stage of evolution, from a primary draft to a published document. VISion gives its users electronic access to all the information components used to construct document information: text, graphics, comments, approvals, and annotations. VISion combines groupware mail distribution functions with the structure of a document to create a framework for collaborative development.

Infrastructures for Information Inc.
330 Dupont, Suite 302
Toronto, Ont. M5R 1V9
Tel.: +1 416.920.6489


INFORIUM, The Information Atrium Inc.
Ron Neumann e-mailed me the following on 4/11/95:

INFORIUM, The Information Atrium Inc. offers LivePAGE(tm) software which efficiently stores SGML documents, including HTML, in an SQL relational database. The resulting database can be viewed, searched, and updated as an online or CD-ROM electronic document. A LivePAGE document can include tables, graphics as well as multimedia objects.
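For readers wondering what storing an SGML document in an SQL relational database can look like, the following Python sketch shows one common approach: one row per element, with a pointer to its parent. It is an illustration only and says nothing about how LivePAGE itself lays out its tables. The `element` table, its columns, and the sample document are assumptions, and the standard-library XML parser stands in for a real SGML parser, so the sample has to be well-formed XML.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Well-formed sample so the standard-library XML parser can stand in for an SGML parser.
SAMPLE = (
    "<doc><title>Pump Manual</title>"
    "<sec><para>Check the seal.</para><para>Replace the filter.</para></sec></doc>"
)


def load(conn, text):
    """Store one row per element; 'parent' and 'pos' preserve the hierarchy and order."""
    conn.execute(
        "CREATE TABLE element (id INTEGER PRIMARY KEY, parent INTEGER, "
        "pos INTEGER, name TEXT, content TEXT)"
    )

    def walk(node, parent, pos):
        cur = conn.execute(
            "INSERT INTO element (parent, pos, name, content) VALUES (?, ?, ?, ?)",
            (parent, pos, node.tag, (node.text or "").strip()),
        )
        row_id = cur.lastrowid
        for i, child in enumerate(node):
            walk(child, row_id, i)

    walk(ET.fromstring(text), None, 0)


conn = sqlite3.connect(":memory:")
load(conn, SAMPLE)

# Structure-aware query: the text of every para whose parent is a sec.
rows = conn.execute(
    "SELECT c.content FROM element c JOIN element p ON c.parent = p.id "
    "WHERE c.name = 'para' AND p.name = 'sec' ORDER BY c.pos"
).fetchall()
paras = [r[0] for r in rows]
print(paras)
```

This simplified layout keeps only the text that directly follows each start tag, so mixed content (text after a child element) and attributes would need extra columns or tables.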

Ron Neumann
The Information Atrium Inc.
158 University Avenue West
Waterloo, Ont.
N2L 3E9
phone: (519) 885-2181
fax: (519) 746-7362

XSoft, A Division of Xerox
Last updated 7/12/96.

Laura Walker of XSoft sent me the following on 7/3/96:

Astoria is an object-oriented document component management system that enables users to easily find, use, share and manage SGML documents and their components, as well as unstructured documents. Unlike other document management systems built on relational or hybrid databases, Astoria is ready for production as soon as it's installed.

For managing SGML elements, employing an object-oriented database is the most natural approach. Because Astoria works directly with SGML elements using an object-oriented database, it can provide unprecedented control over SGML documents as well as unstructured information by allowing fine-grained access and version control.

Feature Highlights:

- NAVIGATION: The Astoria Navigator lets users explore the database and view the document hierarchy down to individual components.

- SGML EDITOR INTEGRATION: Astoria's open interfaces allow integration with any SGML editor. These "Bridges" give authors creation and management tools in their preferred environment, and let them work with either complete documents or a single paragraph. Regardless of an element's position in the document hierarchy, Astoria manufactures a fragment DTD on the fly that conforms to the document structure that the author checked out. Astoria Bridges are currently available for InContext and Arbortext's ADEPT*Editor. We plan to release Astoria Bridges for SoftQuad's Author/Editor, GriF, and Adobe's FrameMaker+SGML by the end of 1996.

- REVISION TRACKING: Because of its sophisticated integration with SGML editors, Astoria maintains revision information on individual elements, and past versions are always available. Non-SGML files benefit from the same revision tracking mechanism used for SGML elements.

- REUSE DOCUMENTS AND COMPONENTS: Any SGML element stored in Astoria can be referenced in many different documents. Elements can be reused at any level in the SGML hierarchy, from warning paragraphs to chapters to entire documents.

- SEARCH: Astoria provides a search tool that makes element reuse straightforward. Astoria's search engine lets users search on document content, SGML structure, SGML attributes, and version data such as date and author. From the list of search results, users can easily place a reference to the selected component in the target document using the familiar clip and paste technique.

- IMPORTING DTDs, DOCUMENTS, FILES: Adding documents and DTDs to the system does not require special mapping, tool-building, or modification to the documents or the DTDs. Astoria can accept arbitrary DTDs, and the process takes a matter of minutes rather than the days or weeks required by relational databases. Expertise in SGML is not needed. Importing can be done in a bulk-loading mode for large volumes of information.

- LINKS: Users can connect elements to other elements in hypertext fashion within and between documents using links. The links let workers create non-linear paths of relationship through the database.

- API: Astoria's full-featured C++ API is accessible through the Astoria Software Developer's Kit, and can be used to integrate Astoria with any other software product.

- ARCHITECTURE: Astoria uses a true client-server architecture with user authentication through TCP/IP network credentials. Servers can be distributed across the network, and each server can host multiple cabinets. Documents can also be distributed, sharing components across multiple cabinets.

- CONFIGURATIONS: Clients: Sun Solaris 2.5 and/or Windows NT 3.5.1. Server: Sun Solaris 2.4 or 2.5 or Windows NT 3.5.1. Network: Microsoft TCP/IP

- CAPACITY: Astoria 1.1 supports databases of up to 100,000 pages of SGML data, representing approximately 6 million objects (the actual number of pages depends on the complexity of the DTD). The total capacity of the database exceeds 6 Gigabytes.

10875 Rancho Bernardo Road, #200
San Diego, California 92127-2116 USA
Telephone: 619-676-7700
Fax: 619-676-7710
Email: pubmkt@xsoft.sd.xerox.com


Rosetta
Robin Wynne-Edwards, the Andyne marketing manager, e-mailed me the following on December 12, 1995:

Rosetta is a client/server, SGML aware, object-oriented database that creates the database schema on the fly from the DTD. It supports the storage of SGML instances at the element level. This allows client side searches based on the structure of the document as well as the content. The product also supports user defined metadata at the element level.
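A rough Python sketch of what deriving a schema from a DTD involves: read the element declarations and record which elements each one may contain. This is not Andyne's implementation. The sample DTD, the `ELEMENT_DECL` pattern, and the `schema_from_dtd` function are hypothetical, and the regular expression handles only flat content models, not nested groups, parameter entities, or inclusions and exclusions.

```python
import re

# A tiny hypothetical DTD with SGML-style omitted-tag minimization flags ("- -").
DTD = """
<!ELEMENT manual  - - (title, chapter+)>
<!ELEMENT chapter - - (title, para*)>
<!ELEMENT title   - - (#PCDATA)>
<!ELEMENT para    - - (#PCDATA)>
"""

# Element name, optional minimization flags, then a single unnested content model.
ELEMENT_DECL = re.compile(
    r"<!ELEMENT\s+(\w+)\s+(?:[-O]\s+[-O]\s+)?\(([^)]*)\)[*+?]?\s*>"
)


def schema_from_dtd(dtd):
    """Map each declared element to the child element names in its content model."""
    schema = {}
    for name, model in ELEMENT_DECL.findall(dtd):
        tokens = re.findall(r"#?\w+", model)
        # Keywords such as #PCDATA mark text content, not child elements.
        schema[name] = [t for t in tokens if not t.startswith("#")]
    return schema


schema = schema_from_dtd(DTD)
for element, children in schema.items():
    kind = "container of " + ", ".join(children) if children else "text leaf"
    print(f"{element}: {kind}")

# A structure-based lookup the derived schema makes possible.
title_parents = sorted(e for e, kids in schema.items() if "title" in kids)
print(title_parents)
```

With a mapping like this in hand, a system can create one storage class per element type without anyone writing a schema by hand, which is the property the description above emphasizes.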

In addition to the traditional authoring and publishing uses of SGML, Andyne will be integrating Rosetta with their decision support tools, Andyne GQL and Andyne PaBLO as well as other desktop applications. This will create the first Decision Support suite capable of accessing and analyzing information from relational databases, multidimensional databases and structured text repositories in a single document.

Robin Wynne-Edwards
Andyne Computing Limited
552 Princess St.
Kingston, Ontario, Canada
K7L 1C7
Phone: 613-548-4355
Fax: 613-548-7801


Passage Systems Inc.
Last updated 2/15/96.

The following is from some literature I received from David G. Dow at Passage Systems in 11/94 and from some information provided by Steven Hares to Lori Westbrook:

PassagePRO is designed to accomplish two major tasks. These are:

1) Provide the tools for generating SGML online documents in an automated fashion. PassagePRO has been designed so that content providers can continue to author in their favorite tool and then use PassagePRO to convert to SGML. With PassagePRO the user does not need to know anything about SGML, how to translate to SGML, or the entire online production process. PassagePRO has abstracted that knowledge away from the user and put it behind an easy to use GUI.

2) Provide a document management and storage tool. Our product uses an object-oriented database to store all files in a persistent manner. PassagePRO assigns revisions to files, manages their relationships between each other, and provides the checkin and checkout tools needed to take advantage of document management principles.

Steve Hares
Software Engineering Manager
Passage Systems, Inc.
Cupertino, CA.
Phone: 408-366-0324 x260
Fax: 408-366-0320

LSC Scandinavia AB
Module Master
Fredrik Boye supplied me with the following on 5/22/96:

What is Module Master?

Module Master is a fully AECMA 1000D, DEF STAN 00-60 compliant database (CommonSource DataBase, CSDB) for the storage, management and retrieval of modular information constructed in accordance with the above standards. The package is an automated storage mechanism designed to absorb data in a Standard Generalised Markup Language (SGML) format, thereby exploiting the inherent data neutrality offered by SGML. The prime advantage of the package is that it has the flexibility to integrate with any SGML authoring system in any work flow environment and to output to any SGML compliant viewer. The product has the added flexibility of allowing the user to customise the storage criteria and therefore has uses beyond strict compliance with the standards.

How does it work?

To permit the flexible and efficient retrieval of information from the CSDB, the author has traditionally been required to add a significant amount of status information to comply with the standards. This includes information such as the issue status, verification state etc. In fact AECMA 1000D and DEF STAN 00-60 both mandate over a dozen separate storage and retrieval criteria. Module Master extracts this information from the SGML data automatically and adds it to a transparent database structure, thereby facilitating rapid and efficient retrieval of the information packets at a later date.
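The automatic extraction step described above can be pictured with a short Python sketch. It is an illustration under stated assumptions, not Module Master's method: the element names (`code`, `issue`, `verified`), the sample module, and the `extract_status` function are invented for the example and are not taken from AECMA 1000D or DEF STAN 00-60.

```python
import re

# Hypothetical data module; the element names are invented for illustration
# and are not taken from AECMA 1000D or DEF STAN 00-60.
MODULE = """
<dmodule>
<status><code>A-12-34-56</code><issue>003</issue><verified>yes</verified></status>
<content><para>Torque the bolts.</para></content>
</dmodule>
"""

FIELDS = ("code", "issue", "verified")


def extract_status(sgml):
    """Pull each status field out of the markup so nobody has to retype it."""
    record = {}
    for name in FIELDS:
        match = re.search(rf"<{name}>(.*?)</{name}>", sgml, re.S)
        record[name] = match.group(1).strip() if match else None
    return record


# The extracted record goes into an index keyed by the module code.
index = {}
rec = extract_status(MODULE)
index[rec["code"]] = rec

# Retrieval then works on the index, not on the raw SGML.
hits = [r for r in index.values() if r["issue"] == "003" and r["verified"] == "yes"]
print(hits)
```

The design point is that the retrieval criteria are copied out of the markup once, at load time, so later queries never need to reparse the documents.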


- DEF STAN 00-60 compliant

- Automatic data extraction

- Fully SGML data compatible

- MS Windows interface

- Inherent data version control

- On-line help

What are the benefits?

In an AECMA 1000D or DEF STAN 00-60 environment the primary storage key is the Document Module Code (DMC). This has changed radically from AECMA 1000D to DEF STAN 00-60 and has been extended from 17 characters to 39 characters. Therefore the automatic extraction of data of this complexity is clearly highly advantageous. It saves considerable time and minimises the potential for the introduction of error. Module Master provides a full Windows Graphical User Interface (GUI) for entering the retrieval selection criteria. Once the retrieval criteria have been entered by the user on the specifically-designed screens, the appropriate packets of information are retrieved and displayed. The user can then select either to modify this information by re-entering the SGML authoring environment or to publish the documentation into an Interactive Electronic Technical Publication (IETP).

The system has been developed to take SGML data in its raw form, constructed against any Document Type Definition (DTD) in any SGML development package and store this information on any mass storage system, be it magnetic or optical.

Fredrik Boye
LSC Scandinavia AB
Vartavagen 22, P.O.BOX 10022
TEL +46 (0)8 660 02 80
FAX +46 (0)8 660 09 65
MOBILE +46 (0)70 550 27 43
RESIDENCE +46 (0)8 735 80 25
E-MAIL boye@lsc.se