Extensible Markup Language (XML) is emerging as the format of choice for a variety of types of data, especially documents. With its ability to tag different fields, XML makes searching simpler and more dynamic, turning enterprise documents from recycling fodder into data mining gold. Because XML content is liberated from presentation format - which independent style sheets specify - XML enables the extensive reuse of material. This allows enterprises to turn the same content into press releases, white papers, brochures, presentations and Web pages. For enterprises trying to meld incompatible systems, XML can serve as a common transport technology for moving data around in a system-neutral format. In addition, XML can handle all kinds of data, including text, images and sound - and is user-extensible to handle anything special.
Clearly, XML is coming into its own and seems destined to become the lingua franca of data online and off-line.
The problem until now has been how to manage the XML-tagged data. One promising solution is to use databases to store, retrieve and manipulate XML. The idea is to place the XML-tagged data in a framework where searching, analysis, updating and output can proceed in a more manageable, systematic and well-understood environment. Databases have the merit that users are familiar with them and their behaviour, so taming XML with a database context seems natural.
However, there are XML databases and there are XML databases. Purists would contend that only databases that store XML in its native format deserve the label "XML database". Others contend that if you can store and retrieve XML from it, and it's a database, then it's an XML database, regardless of how the data is stored. We'll sidestep these religious battles and consider both types. If the XML isn't stored internally as XML, we'll call that an "XML-enabled database". If the XML is actually stored as XML internally, we'll call it a "native XML database".
There are a number of reasons to use existing database types, and existing database products, to store XML even if it isn't in its native form. First, ordinary relational and object-oriented databases are well known, while native XML databases are new. Second, as a result of familiarity with relational and object-oriented databases, users understand their behaviour, especially with regard to performance. There is a reluctance to move to a native XML database whose characteristics - especially scalability - haven't been tested. Finally, relational and object-oriented databases are safe choices in the corporate mind. It's the old "nobody ever got fired for buying X" rationale. You don't necessarily want to bet the enterprise on a native XML database when you don't have to.
Luckily, you don't have to. There are XML-enabled databases that handle XML fine and that are based on tried-and-true relational or object-oriented models. These databases typically accept XML, parse it into chunks that fit the database schema and store it as usual. To retrieve XML, the chunks are pieced back together again.
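The parse, store and reassemble cycle can be sketched in a few lines. The example below is a minimal illustration only, not the method of any product named in this article: it assumes Python's standard sqlite3 and xml.etree.ElementTree modules, and the table names (docs, chunks), the helper functions and the sample document are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
SAMPLE = "<release><title>New product</title><body>Ships in May.</body></release>"


def make_store():
    """Create an in-memory relational store with an assumed two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, root_tag TEXT)")
    conn.execute(
        "CREATE TABLE chunks (doc_id INTEGER, position INTEGER, tag TEXT, content TEXT)"
    )
    return conn


def shred(conn, doc_id, xml_text):
    """Parse a flat XML document and store each child element as one row."""
    root = ET.fromstring(xml_text)
    conn.execute("INSERT INTO docs (doc_id, root_tag) VALUES (?, ?)", (doc_id, root.tag))
    for position, child in enumerate(root):
        conn.execute(
            "INSERT INTO chunks (doc_id, position, tag, content) VALUES (?, ?, ?, ?)",
            (doc_id, position, child.tag, child.text or ""),
        )


def reassemble(conn, doc_id):
    """Piece the stored rows back together into an XML string."""
    (root_tag,) = conn.execute(
        "SELECT root_tag FROM docs WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    root = ET.Element(root_tag)
    rows = conn.execute(
        "SELECT tag, content FROM chunks WHERE doc_id = ? ORDER BY position", (doc_id,)
    )
    for tag, content in rows:
        ET.SubElement(root, tag).text = content
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    conn = make_store()
    shred(conn, 1, SAMPLE)
    print(reassemble(conn, 1))
```

This sketch handles only one level of child elements. A real XML-enabled database also has to cope with nesting, attributes and mixed content, which is where the mapping becomes harder.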
Content@XML from Xyvision Enterprise Solutions is a content management system that stores XML documents in any of the popular relational databases.
"Content@XML enables collaborative work on the content itself, while allowing multichannel delivery of output," explains Jonathan Parsons, director of product marketing at Xyvision. Content@XML originated with clients who were working with Standard Generalised Markup Language, so the transition to XML was a natural one. The XML format is helpful since it's a World Wide Web Consortium standard and keeps the structural information independent of format presentation, lending itself to easier reuse.
"Clients feel that native XML databases aren't as well known or well supported as they'd prefer," says Parsons. "Using a relational database allows them to leverage existing database expertise."
One user of Content@XML is Element K Content LLC, a technical publisher in the US. "Using XML, we can create content regardless of presentation, reuse that content and customise the output as necessary," says Kress Riley, vice president of content development at Element K. The system receives material in XML and delivers it in whatever format is appropriate. "XML cuts a two-week process down to a few minutes," Riley reports.
Lotus Development's Domino database can also handle XML. Lotus' XML Toolkit even allows you to create and process content as native XML.
When using XML with a relational database, third-party middleware can be useful to handle the translation. One such product is XML-DBMS, a tool based on Java Database Connectivity (JDBC) that transfers data between XML documents and the database. "XML-DBMS allows you to rapidly use XML to populate a database that may already be part of existing applications," says Ronald Bourret, creator of XML-DBMS. Naturally, it works in reverse also, turning database output into XML. This can be useful both for publishing the resulting XML using style sheets and for transporting the data as XML. "XML-DBMS components plug the void in the middle between the database and the structure of an XML document," says Asante Bremang, a researcher at the University of Liverpool in England.
There are several criticisms of the use of relational and object-oriented databases to store XML. For example, one of XML's attractive features is its hierarchical organisation, which database tables crush. Relational databases must map XML to relational tables and therefore flatten XML structures into rows and columns each time data is needed. Uche Ogbuji, principal consultant at Fourthought Inc, also in the US, says XML is a mismatch with relational databases. "You can do tricky joins associating XML type to a database row to make them work, but they're hard to maintain," he says.
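To make the flattening concrete, here is a rough sketch of one common way to squeeze a hierarchy into a table: every element becomes a row that points at its parent row. It assumes Python's standard sqlite3 and xml.etree.ElementTree modules. The nodes table, the function names and the sample venue document are hypothetical and are not taken from any product discussed here.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document: services nested inside seating levels.
DOC = (
    "<venue>"
    "<level name='box'><service>catering</service></level>"
    "<level name='stand'><service>scores</service></level>"
    "</venue>"
)


def flatten(conn, xml_text):
    """Store every element as a row that points at its parent row."""
    conn.execute(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "tag TEXT, name TEXT, content TEXT)"
    )

    def walk(elem, parent_id):
        cur = conn.execute(
            "INSERT INTO nodes (parent_id, tag, name, content) VALUES (?, ?, ?, ?)",
            (parent_id, elem.tag, elem.get("name"), (elem.text or "").strip()),
        )
        for child in elem:
            walk(child, cur.lastrowid)

    walk(ET.fromstring(xml_text), None)


def services_by_level(conn):
    """Recover the parent/child link with a self-join on the nodes table."""
    return conn.execute(
        "SELECT parent.name, child.content FROM nodes AS child "
        "JOIN nodes AS parent ON child.parent_id = parent.id "
        "WHERE child.tag = 'service' ORDER BY child.id"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    flatten(conn, DOC)
    print(services_by_level(conn))
```

The nesting that was explicit in the XML now exists only as parent_id values, and each extra level of hierarchy a query must cross needs another self-join of this kind.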
In addition, translating XML to and from the database requires considerable processing, especially for large or complex documents. This performance factor may be most bothersome when dealing with one of XML's fortes: producing Web pages from format-independent content. The problem is that the resulting pages may not load fast enough. Usually a client requires that a certain relational database be used, regardless of its suitability to the task. In such cases, Ogbuji says he prefers placing a wrapper around the relational database to handle the XML translation. But there's a lot of overhead in such an approach.
XML purists would argue that the way around these difficulties is to store XML natively, which makes sense. This immediately eliminates the need for translation between XML and the database. A new breed of such native XML databases is now emerging.
The first and probably best-known commercial native XML database is Tamino from Software AG. Besides being able to store and access XML, Tamino has all the fixings, like Open Database Connectivity, Unicode compliance, HTTP communications and the ability to handle non-XML data. A report from Gartner notes that "Tamino is especially well suited for organisations to integrate information from many different platforms and formats and send it to business partners or customers."
"Tamino has both straight XML and specially indexed search capabilities," says Tamino developer Klaus Fittges. Tamino has an elegant query language that enables short but powerful queries that would twist SQL into pretzels. "SQL can't handle queries to an arbitrary depth," observes Mike Champion, senior research and development adviser for new technologies at Software AG.
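The flavour of such a query can be shown with a path expression that matches elements at any depth. The sketch below does not use Tamino's query language, which this article does not show. It assumes the limited path syntax in Python's standard xml.etree.ElementTree module, and the sample manual document and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <section> elements nest to different depths.
DOC = """
<manual>
  <section title="Setup">
    <section title="Hardware">
      <section title="Cabling"/>
    </section>
  </section>
  <section title="Usage"/>
</manual>
"""


def all_section_titles(xml_text):
    """Return the title of every <section>, however deeply nested."""
    root = ET.fromstring(xml_text)
    # ".//section" selects matching descendants at any depth, in document order.
    return [section.get("title") for section in root.findall(".//section")]


if __name__ == "__main__":
    print(all_section_titles(DOC))
```

The single expression ".//section" does not need to know in advance how many levels of nesting the document contains.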
Other native XML databases include dbXML, eXcelon and X-Hive/DB. The eXcelon Data Server from eXcelon includes an object-oriented XML database that stores, manages and distributes native XML. X-Hive/DB, from The Connection Factory, works with JDBC-compliant relational databases. The dbXML Group is still developing dbXML, but Kimbro Staken, the company's chief technology officer, comments that "the thousands of downloads of our pre-alpha Core Version illustrate the booming interest in native XML databases".
Wireless Dimensions Corp is one user of the Tamino native XML database. It is developing an application called Mobile-Venue Suite that will let sports fans use mobile phones and other wireless devices to access a variety of services when they're in or near sports venues. Users will be able to check scores and statistics, order merchandise and food for delivery to their seats, and test their expertise with trivia quizzes. Rollout is expected by the end of this year.
"XML's hierarchical features are helpful for several aspects of Mobile-Venue Suite," notes Leslie Townsend, vice president of marketing at Wireless Dimensions. "Services differ for box seats and other seats in a venue, for example. The trivia quizzes also have stepped levels of skill." Since the application must work in real time over wireless channels, performance and reliability are additional concerns.
The XML database choice was a natural, says Scott Cote, Wireless Dimensions' CTO. "Not everything maps to a relational table. You can force it, but you only have to denormalise it again for performance. Plus, we're dealing with text, not objects, so an object-oriented database didn't make sense either." XML, however, is an effortless fit for data such as sports statistics, where one set of facts (say, a list of scoring leaders) links naturally to another (a capsule of one player's statistics).
The ability to generate device-independent output from XML for use in Wireless Markup Language- or HTML-equipped devices is also crucial. "Software AG is a well-established company, so we can have confidence in new technology like Tamino," says Cote.
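As a rough illustration of device-independent output, the sketch below renders one XML record as both an HTML page and a WML card. It is not how Mobile-Venue Suite or Tamino works. Plain Python functions stand in for the style sheets a real system would use, and the score record, its element names and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical content record; the element names are illustrative only.
SCORE = "<score><home>Lions</home><away>Bears</away><result>3-1</result></score>"


def to_html(xml_text):
    """Render the record as a small HTML page for a desktop browser."""
    score = ET.fromstring(xml_text)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = f"{score.findtext('home')} v {score.findtext('away')}"
    ET.SubElement(body, "p").text = score.findtext("result")
    return ET.tostring(html, encoding="unicode")


def to_wml(xml_text):
    """Render the same record as a single WML card for a mobile phone."""
    score = ET.fromstring(xml_text)
    wml = ET.Element("wml")
    card = ET.SubElement(wml, "card", {"id": "score", "title": "Score"})
    ET.SubElement(card, "p").text = (
        f"{score.findtext('home')} {score.findtext('result')} {score.findtext('away')}"
    )
    return ET.tostring(wml, encoding="unicode")


if __name__ == "__main__":
    print(to_html(SCORE))
    print(to_wml(SCORE))
```

The content is written once. Only the rendering function changes from one device to the next.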
Fourthought is a consultancy mainly concerned with middleware and database integration, a great deal of which is Web-based and XML-oriented. Ogbuji has experience with a number of XML databases and says he finds that "eXcelon works very well for people familiar with object databases".
Oddly, one of the main criticisms of native XML databases is performance. Some foresee problems with searching for information that may lie near the end of a large document. With no other mechanism in place, a native XML database would have to slog through the whole document to complete the search. (Relational and object-oriented databases would probably dodge this difficulty by breaking the document into smaller pieces, each searchable more rapidly.) However, this isn't an insurmountable difficulty, provided you index each document when you store it.
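Indexing at storage time can be sketched as follows. This is a toy illustration, not a description of how Tamino or any other product builds its indexes. It assumes Python's standard library only, and the IndexedStore class, its methods and the sample report are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class IndexedStore:
    """Toy store that keeps XML text as-is and indexes its words when storing."""

    def __init__(self):
        self.documents = {}            # doc_id -> original XML text
        self.index = defaultdict(set)  # word -> {(doc_id, element tag), ...}

    def store(self, doc_id, xml_text):
        """Keep the document unchanged and record where each word occurs."""
        self.documents[doc_id] = xml_text
        for elem in ET.fromstring(xml_text).iter():
            for word in (elem.text or "").lower().split():
                self.index[word].add((doc_id, elem.tag))

    def search(self, word):
        """Look the word up in the index; no stored document is re-read."""
        return sorted(self.index.get(word.lower(), set()))


if __name__ == "__main__":
    store = IndexedStore()
    store.store(
        "d1",
        "<report><intro>Quarterly results</intro>"
        "<appendix>Late results table</appendix></report>",
    )
    print(store.search("table"))
    print(store.search("results"))
```

A word in the final element of a long document is found as quickly as one in the first, because the cost of scanning was paid once, when the document was stored.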
"Tamino's indexing capabilities make up for any downside to searching through large documents," Cote points out. "Performance is the main issue for us. Native XML storage eliminates unneeded translation operations."
Ogbuji agrees: "Tamino is very fast."
Software AG says it has plans for Tamino, both internally and externally. It will be integrated with other Software AG products to permit even simpler access. Also in the works are a style sheet translation engine and new security elements to govern access. In addition, Tamino - currently available for Windows NT, Windows 2000, Solaris and The Santa Cruz Operation Unix - will appear on more platforms, including Linux and some mainframes.
Many of the major database vendors are incorporating XML support into their products or providing tools for using XML with their databases. IBM has an XML Extender for DB2 that lets you store XML documents in DB2 databases, and new functions assist you in working with XML-structured documents. Microsoft's SQL Server 6.5 and 7.0 use extensions for working with XML. SQL Server will one day include an XML output option for passing information to other systems. "Oracle has the broadest range of enterprise features and a powerful XML indexing engine," Ogbuji says.
In addition, many say they expect that the database biggies will soon offer their own native XML databases in response to the demand for XML processing that Web-based e-commerce applications will require. This will let information technology departments that must buy from specific vendors have their cake and eat it too, getting native XML functionality from their approved vendors.
Demand for XML will expand; new uses will include Internet search engines that use XML tags, e-commerce systems that must produce output rapidly, electronic data interchange with XML tags, data reuse and content personalisation. The move to XML databases to handle such applications will proceed in turn.
* Discussion of how XML and databases relate: www.rpbourret.com/xml/XMLAndDatabases.htm
* Discussion of XML-enabled and native XML databases, and associated tools: www.rpbourret.com/xml/XMLDatabaseProds.htm
* Fourthought Inc consultants: http://Fourthought.com
* Microsoft SQL Server and XML: http://msdn.microsoft.com/workshop/xml/articles/xmlsql/sqlxml_prev.asp
* Oracle and XML: http://technet.oracle.com/tech/xml/info/htdocs/relational/index.htm#ID79
* White papers on consultants with analysis of XML data servers: www.xml-data-servers.com/
* Developers of dbXML: www.dbxmlgroup.com
* Developers of eXcelon: www.exceloncorp.com/
* IBM DB2 and XML: www-4.ibm.com/software/data/db2/extenders/xmlext/
* Lotus Domino and XML: www.lotus.com/developers/devbase.nsf/homedata/xml
* Software AG's Tamino: www.softwareag.com/tamino/default.htm
* Application using XML: www.wirelessdimensions.net/mvs.html
* Developers of X-Hive/DB: www.x-hive.com
* Source of XML information: www.xml.com/
* XML database industry initiative: www.xmldb.org
* XML conference information: www.xmldevcon2000.com/conference.html