Getting people of different backgrounds to communicate effectively can be a difficult task, but translate that chore to data in disparate computer systems and, writes Ian Yates, you get help from XML.
When it comes to human languages, there never was an obvious logical reason for the failure of Esperanto, a language specifically designed to be the universal communication medium. Basically, Esperanto failed because everyone liked their own language better, and there were enough bilinguals around to make international communication possible.
When it comes to Internet languages the XML markup language should have replaced every other means of data interchange and we should all have deployed XML-native databases by now. But we all like our existing databases too much. Or more realistically, we all have too much data stashed away in them, and they work fine, and we have learnt not to screw around with things that work. That’s a sure-fire way to make them stop working.
Instead we have bolted XML-aware front-ends onto our databases and spent wads of cash on mapping tools, making it relatively easy to become XML-compliant without upsetting the status quo too quickly. Of all the standards that have appeared over the years, the uptake of XML has been faster than that of any other, excepting perhaps HTML. XML is quite obviously an idea whose time has not only come but is long overdue.
Although both XML and HTML are markup languages, HTML concentrates on describing how the data should look when viewed by humans. XML defines the data elements, and while readable by humans, it is designed to allow computers to communicate with each other without the need to be running the same application, or even hosting the same database. XML defines the structured information those database elements contain, allowing intelligent mapping between disparate data stores.
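To make the distinction concrete, here is a minimal Python sketch. The element names and the order data are invented for illustration: the HTML fragment only says how the data should look, while the XML fragment describes what the data is, so a program can extract and act on the values.

```python
import xml.etree.ElementTree as ET

# HTML describes presentation: bold product name, italic quantity.
html = "<p><b>Widget</b> qty <i>3</i> at $9.95</p>"

# XML describes the data itself (tag names here are hypothetical).
xml_doc = """<order>
  <item sku="W-100">
    <name>Widget</name>
    <quantity>3</quantity>
    <unitPrice currency="USD">9.95</unitPrice>
  </item>
</order>"""

root = ET.fromstring(xml_doc)
item = root.find("item")
# The receiving program needs no knowledge of fonts or layout,
# only of the agreed element names.
total = round(int(item.find("quantity").text) *
              float(item.find("unitPrice").text), 2)
print(item.get("sku"), total)  # prints: W-100 29.85
```

Any system that agrees on the element names can compute with this document, regardless of which database or application produced it.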
Although XML was developed initially for a specific purpose, the ease and elegance of the language rapidly appealed to a much wider audience, and almost every application shipping today offers XML as an option, if not the default format, for data exchange. Word-processor documents, instead of containing only words, can now carry information describing how those words relate to each other and what information, if any, is hidden inside. The area where XML has found most immediate acceptance is data exchange between different businesses. Anyone familiar with the EDI (Electronic Data Interchange) specifications will understand immediately why XML is more appealing.
The benefits of using XML are almost self-evident. Because the language is an open standard it’s reasonably easy to code data exchange front- and back-ends on one platform with a high expectation of success when the files are sent to another system. The only other way to achieve this level of data interchange is to insist that everyone uses the same database and preferably the same application suite as well. Since that was never going to happen anytime soon, XML simply stands out as the answer to many a code jockey’s prayers.
XML documents can be read by humans using any text editor, unlike the raw contents of a database or data file. That simple fact allows developers to rapidly debug any errant data transfers without needing sophisticated tools, although such tools now exist to make life even easier. We are also beginning to see a proliferation of standardised XML vocabularies, known as XML Schemas, being produced for every imaginable industry sector. Need to send a purchase order to a furniture factory? No longer a problem if you both stick to the furniture manufacturer’s XML Schema. There’s a high probability that in the near future the majority of data transactions will happen with XML under the hood.
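As a hedged illustration of the purchase-order scenario, the Python sketch below parses a hypothetical order document. The tag names are invented; in practice both parties would validate against the agreed industry XSD with a real schema validator, for which the crude element check here is only a stand-in.

```python
import xml.etree.ElementTree as ET

# A hypothetical purchase order; the vocabulary is invented for
# illustration, not taken from any real industry schema.
purchase_order = """<purchaseOrder poNumber="PO-2041">
  <buyer>Acme Retail</buyer>
  <line>
    <product>Oak bookshelf</product>
    <quantity>12</quantity>
  </line>
</purchaseOrder>"""

po = ET.fromstring(purchase_order)

# Crude stand-in for schema validation: confirm the elements
# both parties agreed on are present.
required = ["buyer", "line"]
missing = [tag for tag in required if po.find(tag) is None]
print("valid" if not missing else f"missing: {missing}")  # prints: valid
```

Because the document is plain text, a developer can inspect it in any editor when a transfer goes wrong, exactly as the paragraph above describes.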
If humans had ever embraced Esperanto it is highly likely that national languages would by now have slipped into the background. With the runaway success of XML it seems logical that before too long native-XML databases will be the only survivors in the data centre. Certainly, there is strong appeal in being able to store XML data directly, without spending time parsing and processing it unless there is a need to act on the data as it is received. However, there are a significant number of relational databases out there, and every single one of them has released add-ons to allow XML data to co-habit with the other data already residing there.
That hasn’t stopped a raft of start-ups offering native-XML databases, with promises of better efficiency and speed of access when compared with their “legacy” competitors. Most of the native-XML offerings have emerged from the object-oriented database vendors that failed to gain much traction back in the late eighties and early nineties. This time around these vendors are hoping that they’ve hit the mother lode. Indeed, they certainly appear to be on a winner if you hammer their databases with XML data made up of structured text or blobby multimedia content. When the job is storing tables of inventory and accounting records, however, legacy databases reign supreme, and there hasn’t been a rush to move away from the familiarity and stability they offer for those applications.
Rather than a revolution, the back-end databases seem to be in evolution and it will take some time yet before XML sweeps away the incumbents in the data centre. However, it will be increasingly hard to spot the legacy database since they all appear to speak XML like a native, and for all intents and purposes developers will treat them as natives. Of course, native-XML database technology was developed in the first place to try and get around the so-called XML-relational impedance mismatch. But very few native object-oriented databases survive, while the familiar relational databases masquerade as object-capable with mapping and translating front-ends in place. It is possible that a similar fate awaits native-XML databases.
While the industry generally agrees that XML and relational data will become interchangeable, converting XML to and from relational data is often not as easy as the brochures suggest. In practice most automatic conversion utilities only work properly on the sample data that’s included with the utility. As they famously say in automobile adverts, ‘your mileage may vary’. That means many hours spent on each project accurately mapping the datasets and testing to make sure that nothing is corrupted along the way. Plenty of graphical tools are available to help with syntax and semantic checking, and as long as both parties stick to the XML specifications, the task only has to be completed once. It’s still a lot faster to debug a schema than to debug an application written specifically for the purpose of data exchange.
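The mapping work described above can be sketched with standard-library tools. The example below, with an invented customer feed and table layout, shreds each XML element into a relational row; this element-to-row mapping is the essence of what the commercial conversion utilities automate, and the part that needs careful testing on real data.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A hypothetical XML feed; element names and the table layout
# are assumptions for illustration only.
feed = """<customers>
  <customer id="c1"><name>Ada</name><city>London</city></customer>
  <customer id="c2"><name>Grace</name><city>Arlington</city></customer>
</customers>"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT, city TEXT)")

# The mapping step: one XML element becomes one relational row.
for c in ET.fromstring(feed).iter("customer"):
    conn.execute(
        "INSERT INTO customer VALUES (?, ?, ?)",
        (c.get("id"), c.findtext("name"), c.findtext("city")),
    )

rows = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
print(rows)  # prints: [('c1', 'Ada'), ('c2', 'Grace')]
```

Real feeds are rarely this flat, which is exactly why the mapping and verification effort the paragraph describes eats so many project hours.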
None of the major database vendors has yet integrated every possible XML option, so the marketing contest for the enterprise dollar remains fierce, while the minor players and middleware vendors are working long hours hoping to be the one with the silver bullet that works with any and every database. There have been claims and counter-claims about the proprietary nature of some vendors’ XML integration efforts, which is a worrying trend if proven true, since the whole premise of interoperability is that everyone sticks to the standards. However, in general it really does look as though the next generation of data storage will be XML-based, mirroring XML’s conquest of the data exchange world. It may not be the perfect solution in terms of speed and efficiency, but simple, standardised data transactions will prove to be an irresistible drawcard.