The alphabet soup of information technology standards can be hard to swallow. Why XML (Extensible Markup Language)? Don't we already have enough with HTML (Hypertext Markup Language), SGML (Standard Generalised Markup Language) and all the other markup languages?
Certainly XML is the flavour of the month. It grew out of attempts by the World Wide Web Consortium (W3C) to overcome the various limitations of HTML, while providing a solution that is simpler than SGML.
SGML became an ISO standard in 1986 and has been used by specialist publishers and database developers. However, it is too complex for most people, and at the time it was developed there was little need to exchange electronic documents widely.
The Web changed all that. Proposed in 1989 and built on HTML, it proved that a simple markup language could be very powerful when combined with the Internet. But HTML is an inelegant mix of markup features with a fixed set of elements, which has led Web browser developers to add their own non-standard extensions to supply missing features.
XML offers a more flexible and accessible alternative. Strictly speaking, XML is not a document format in itself but a notation for defining formats. A particular class of document is described by a Document Type Definition (DTD), which specifies the elements and syntax that documents of that class may use, and software is then written to interpret that syntax and carry out some function, such as displaying a Web page.
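To make the DTD idea concrete, here is a minimal Python sketch. The `memo` document class, its elements and the `read_memo` function are hypothetical, invented for illustration. The DTD sits in the document's DOCTYPE declaration. Note that Python's standard `xml.etree.ElementTree` parser only checks that a document is well-formed; it does not validate it against the DTD, so a validating parser would be needed to enforce those rules.

```python
import xml.etree.ElementTree as ET

# A hypothetical document class: the DTD (in the DOCTYPE internal subset)
# says a <memo> contains a <to>, a <from> and a <body>, in that order.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (to, from, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<memo>
  <to>Editors</to>
  <from>Publishing Board</from>
  <body>Please use the new format.</body>
</memo>"""


def read_memo(text: str) -> dict:
    """Parse a memo and return its fields as a dictionary.

    ElementTree rejects documents that are not well-formed (raising
    ET.ParseError), but it does not check them against the DTD.
    """
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


print(read_memo(DOCUMENT))
```

The software that "interprets the syntax" here is just the `read_memo` function; a display program would do the same parsing and then lay the fields out on screen.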
One example XML application is XHTML, which reformulates HTML in XML syntax with stricter rules. XHTML is designed as a bridge between the existing Web and new features. Its stricter definition will require some minor adjustments from Web authors, but it does allow extensions to be added easily for special applications.
Proposals for standard syntax for metadata, styles, multimedia and e-books are not new. What is new is that XML allows these standards to be expressed using the same basic notation, and applications for them can be built with the same functional building blocks.
Previously, a video editing program would have been built using different software from a book typesetting system. With XML, the same editor can handle a book or a movie. An XML-enabled Web browser has a built-in ability to interpret any XML document, with applets used to render the document's components for presentation in the required format.
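The idea of shared building blocks can be sketched with one generic routine. In this hypothetical Python example, the `book` and `movie` vocabularies and the `outline` function are invented for illustration: the same parser and the same tree walk handle both documents without knowing what either one means. It shows only the common parsing step, not how a browser would render the result.

```python
import xml.etree.ElementTree as ET

# Two documents in hypothetical vocabularies: a book and a video edit list.
BOOK = "<book><chapter title='One'><para>It began.</para></chapter></book>"
MOVIE = "<movie><scene id='1'><clip start='0' end='12'/></scene></movie>"


def outline(text: str) -> list[str]:
    """Return an indented outline of any well-formed XML document.

    The function knows nothing about books or movies: it simply walks
    the element tree and reports each tag with its attributes.
    """
    lines = []

    def walk(element: ET.Element, depth: int) -> None:
        attrs = " ".join(f"{k}={v}" for k, v in element.attrib.items())
        lines.append("  " * depth + f"{element.tag} {attrs}".strip())
        for child in element:
            walk(child, depth + 1)

    walk(ET.fromstring(text), 0)
    return lines


for doc in (BOOK, MOVIE):
    print("\n".join(outline(doc)))
```

Only the presentation step (here, printing an outline) would differ between a book viewer and a movie viewer; the parsing is shared.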
Portable Document Format (PDF) is commonly used for publishing electronic documents where proprietary word processing and HTML formats are inadequate. However, PDF's origin as a page description language means it has inherent limitations.
PDF comes from a publishing model where the document creator decides how the final document will appear and readers passively accept the content and format. The Web has encouraged a more interactive mode, where readers can adjust the layout of the document to suit themselves.
While PDF has been extended for the Web, it is used primarily for producing static, published documents that are designed to be self-contained, so they can be sent as one file. However, an electronic document read on screen needs its font size, style and layout adjusted dynamically to suit the display device and the person reading it.
Screens have a lower resolution than printed pages, so larger font sizes and simpler designs improve readability. A typical printed page layout does not fit on today's computer screens, so the reader has to scroll clumsily back and forth along each line. Readers with disabilities have additional needs, such as much larger fonts or Braille output devices.
In contrast, Web browsers designed for HTML don't have built-in page sizes. Text wraps to fit the screen, and readers can choose the font style and size, which also helps meet the needs of readers with disabilities.
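Reflowing text can be illustrated in a few lines. This Python sketch uses the standard `textwrap` module and treats a window's width as a number of character columns, which is a simplifying assumption (a browser measures text in the chosen font). The `reflow` function name is hypothetical. The same paragraph is re-broken for a narrow and a wide window, where a fixed page layout would force the reader to scroll.

```python
import textwrap

PARAGRAPH = (
    "Screens have a lower resolution than printed pages, so larger font "
    "sizes and simpler designs improve readability."
)


def reflow(text: str, columns: int) -> list[str]:
    """Break text into lines no wider than `columns` characters,
    roughly the way a browser wraps text to the width of its window."""
    return textwrap.wrap(text, width=columns)


for width in (30, 60):
    lines = reflow(PARAGRAPH, width)
    print(f"--- {width} columns: {len(lines)} lines")
    print("\n".join(lines))
```

The text itself never changes; only the line breaks are recomputed for each width.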
Web pages use hypertext links to break large documents into components. PDF documents can use the book metaphor of chapters, but all the chapters are stored in one large file, which makes reading most documents online a very slow process.
Sun Microsystems recently announced plans to release the source code of its StarOffice suite under the GNU General Public License (GPL), in a bid to define a set of XML-based file formats for word processing, spreadsheets and presentation tools.
Combined with the capabilities of XML-enabled Web browsers, this would allow low-cost software to produce portable files. A document created in a presentation tool could then be presented using a Web browser. There would be no need to convert the file from one proprietary format to another, or to download a special viewer program. The Web browser would display the document directly.
It also creates the possibility of more flexible document formats, such as integrating a printable text document and a slide show in one file, or displaying database records as a document.
XML is not certain to succeed. If fast and flexible editing and display software can be built for XML, it has a chance; however, the flexibility of the format may be its undoing.
Anyone can easily invent an XML DTD and propose it as a standard. In theory, any XML browser should be able to display documents written to it. In reality, document formats must be carefully designed and widely agreed if they are to be useful. Well-meaning groups, and companies trying to gain market share, could create a Babel of incompatible, overlapping and unimplementable XML standards.
Tom Worthington is Director of the ACS Publishing Board, an e-business consultant to the Federal Government and a Visiting Fellow at the Australian National University.
E-mail tom.worthington@tomw.net.au or visit http://www.tomw.net.au for more information.