Publishers, online or hard-copy, should get into XML as early as possible in the production process, as it will ease the task of rendering the publication in a range of formats and media.
So says Gurvinder Batra, technology chief of international publisher TechBooks.
As well as making many aspects of editing and checking easier, such as ensuring references or index entries are complete and accurate, XML is a good jumping-off point for rendering the same publication, without further manual rework, in different print and digital formats. This makes for smarter, quicker and cheaper distribution.
The same text, with the format and meaning of elements embedded in XML, could be published to an internet or intranet library, an e-book or the smaller RocketBook format, a print-on-demand service (usually translating into PDF) and a wireless website using the WML standard.
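As a rough illustration of that single-source idea, the sketch below renders one small XML document as both an HTML page and a WML card. The element names (`chapter`, `title`, `para`), the sample text and the two rendering functions are assumptions made up for this example, not TechBooks' markup or tooling, and the WML output is deliberately minimal.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Hypothetical source markup: the elements say what the text *is*, not how it looks.
SOURCE = """\
<chapter>
  <title>Getting into XML early</title>
  <para>Structure first, layout later.</para>
  <para>One source, many outputs.</para>
</chapter>
"""

def to_html(root):
    """Render the chapter as a bare-bones HTML page."""
    title = escape(root.findtext("title", default=""))
    paras = "".join(f"<p>{escape(p.text or '')}</p>" for p in root.iter("para"))
    return f"<html><body><h1>{title}</h1>{paras}</body></html>"

def to_wml(root):
    """Render the same chapter as a single, simplified WML card."""
    title = quoteattr(root.findtext("title", default=""))  # adds the quotes itself
    paras = "".join(f"<p>{escape(p.text or '')}</p>" for p in root.iter("para"))
    return f'<wml><card id="c1" title={title}>{paras}</card></wml>'

root = ET.fromstring(SOURCE)
print(to_html(root))
print(to_wml(root))
```

Both outputs come from the same parsed tree, so a correction made once in the XML source flows through to every format without touching the renderers.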
Typically today, he says, if XML is introduced at all, it is done at a late stage, when pages have already been put together in printer-oriented PostScript format and much of the editing and proofreading has been done.
"XML is the ideal data format for structured content," Batra says, having had ample experience of working on projects for leading publishers such as Harpers, using the language for producing new content and for translation of legacy material to broaden its usefulness.
The mantra of publishing today is "cheaper, faster without sacrificing accuracy", he told a seminar this week organised by XML-specialist web developer 3months.com.
In Batra's experience, comprehensive XML use delivers as much as "20% faster delivery of all deliverables [in the production process] and 25% less cost for the complete process".
Based on his experience with major publications, the workload involved in training people in a different way of working is surprisingly light, Batra says. "The biggest thing is to change the mindset" -- to get people used to working on a digital file rather than a paper proof-copy.
Enshrining the whole process in a digital framework allows efficient workflow to be incorporated, he says.
Since data items in XML have meaning, while in a format like Microsoft Word they are described only in terms of position, something like a reference at the end of an article can be tightly associated with the line that cites it. Thus no references will be missed and "orphans" with no real match will be picked up early and accurately.
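A minimal sketch of that kind of check, assuming a made-up markup in which citations appear as `<cite ref="..."/>` and reference-list entries as `<reference id="...">`; these names and the sample article are hypothetical and do not come from any particular publisher's DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical article: r3 is cited but not listed, r2 is listed but never cited.
ARTICLE = """\
<article>
  <body>
    <p>XML eases multi-format delivery <cite ref="r1"/> and QA <cite ref="r3"/>.</p>
  </body>
  <references>
    <reference id="r1">First source.</reference>
    <reference id="r2">Second source.</reference>
  </references>
</article>
"""

def check_references(xml_text):
    """Return (missing, orphans): cited-but-unlisted and listed-but-uncited ids."""
    root = ET.fromstring(xml_text)
    cited = {c.get("ref") for c in root.iter("cite")}
    listed = {r.get("id") for r in root.iter("reference")}
    missing = sorted(cited - listed)   # citations with no matching reference
    orphans = sorted(listed - cited)   # references that nothing cites
    return missing, orphans

missing, orphans = check_references(ARTICLE)
print("cited but not listed:", missing)
print("listed but never cited:", orphans)
```

The check works only because the markup identifies each item by role; in a position-based format the same comparison would need someone to read the proof by eye.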
There are already many requests from books and journals to authors and editors to send copy in XML, or in PDF, he says.
Adobe is fighting to get its PDF format seen as "the successor to XML", Batra says, but both will be in the ascendancy for a long time, and will inevitably have to work together.
Because XML is a metalanguage rather than a language in itself, each publisher will have to write its own document type definitions (DTDs) for its own standard layouts. Tools are available to convert between stylesheets for common word-processing products and the defined DTDs.
Some of the tools for manipulating and translating XML are still under development, Batra acknowledges, and not everything can be left to automatic processes. Quality assurance (QA) is critical, and off-the-shelf tools cannot always handle it in the way the publisher wants. "You may have to develop some QA tools internally."