XML With Apache

XML is an exciting technology now being realised in online publishing, information serving and content management on the Web. XML looks much like HTML: both are tag-based mark-up languages. It is meant not so much to replace HTML as to make HTML do what it should: simplify document generation and management, and separate the tasks of writing, Web development and graphic formatting of Web pages.

This is achievable because of how broad the XML standard is. Figure 1 shows a simple HTML document; Figure 2 shows how an XML version of this document could look. I say 'could' because XML does not fix the set of tags: what each tag translates to is defined separately by the author. This makes Web document generation more logical. It also makes content sharing simpler, because relevant information (such as in the example above) can be extracted and used on other pages. The translation of XML into HTML is performed with XSL Transformations (XSLT).

Browser-side support for XML/XSLT is currently limited. Technology on the server end, however, is much more exciting. It is on the server side that XML is really useful.

Apache XML Project

One of the leading XML/XSLT projects currently is XML Apache, run by the Apache Software Foundation of Web server fame. The Apache team, along with Lotus AlphaWorks, has produced an Apache Web server plug-in designed to serve, transform and manage XML documents. The project, like the Apache project itself, is run under an open-source model. There are three main elements to XML Apache: Xerces, Xalan and Cocoon.

Apache/XML Xerces

Xerces is the Apache XML parser. That is, this component reads XML files and breaks them down into elements so that they can be translated to HTML. This is one of the most fundamental aspects of the Apache XML project, for it does much of the work involved in XML generation and translation, as well as giving a functional example of a standards-based XML implementation. The Apache Software Foundation has also written C++ and Perl interfaces to the XML parser, thereby passing on the capabilities of the Xerces engine to other programmers.
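To make the idea of "breaking a file down into elements" concrete, here is a minimal sketch using the standard Java XML parsing interface (JAXP), which a Xerces-style parser can sit behind. The sample document, its element names and the class name ParseSketch are invented for illustration; this is not code from the Apache project.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ParseSketch {

    // Hypothetical sample document; the tag names are made up for this example.
    public static final String SAMPLE =
        "<article>"
      + "<title>XML With Apache</title>"
      + "<para>First paragraph.</para>"
      + "<para>Second paragraph.</para>"
      + "</article>";

    // Parse an XML string into a DOM tree using whichever JAXP parser is installed.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Count the elements with the given tag name anywhere in the document.
    public static int countElements(Document doc, String tag) {
        return doc.getElementsByTagName(tag).getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println("root element: " + doc.getDocumentElement().getTagName());

        // Walk every <para> element and print its text content.
        NodeList paras = doc.getElementsByTagName("para");
        for (int i = 0; i < paras.getLength(); i++) {
            System.out.println("para " + i + ": " + paras.item(i).getTextContent());
        }
    }
}
```

Once the document is held as a tree of elements like this, a later stage can pick out individual pieces (a title, a paragraph) and decide how each should be rendered.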

Apache/XML Xalan

Xalan is the XSL Transformations processor. This is the component of the XML Apache project responsible for translating XML documents into HTML Web documents. As such, it is with Xalan that things really start to happen. The Xalan processor was contributed to the project by Lotus AlphaWorks -- highlighting the commitment of Lotus and IBM to open source development.
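As a rough illustration of what an XSLT processor does, the sketch below applies a small stylesheet to a small XML document through the standard Java transformation interface (javax.xml.transform). Both the document and the stylesheet are hypothetical, as is the class name TransformSketch; a real site would keep its stylesheets in separate files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TransformSketch {

    // Hypothetical source document: content only, no presentation.
    public static final String XML =
        "<article><title>XML With Apache</title><para>Hello, XML.</para></article>";

    // Hypothetical stylesheet: says what each invented tag becomes in HTML.
    public static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"/article\">"
      + "<html><body>"
      + "<h1><xsl:value-of select=\"title\"/></h1>"
      + "<xsl:for-each select=\"para\">"
      + "<p><xsl:value-of select=\".\"/></p>"
      + "</xsl:for-each>"
      + "</body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Apply the stylesheet to the document and return the result as a string.
    public static String transform(String xml, String xsl) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(XML, XSL));
    }
}
```

Changing the look of every page built from such documents then means editing the stylesheet, not the documents themselves.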


Apache/XML Cocoon

Cocoon is where everything comes together with XML Apache. It is the XML publishing processor and is responsible for actually making things happen. With Cocoon, XML documents can be created dynamically, so that the resulting HTML pages are up-to-date. This means that Cocoon can generate pages that look exactly like your current Web site. The difference is that sharing of content amongst pages moves out of the database arena and into the Web document arena, and the definition of what a document looks like is handled outside the document itself.

One of the main limitations of XML Apache, however, is its Java-based design. Good run-time Java compilers are still in development: they do not yet provide the same speed and resource efficiency as PHP or Perl. As a result, running Cocoon requires quite a large amount of memory, as well as spare CPU cycles -- galore!

It also means that to run Cocoon, you must have a Java Servlet Server installed with Apache. Sun, IBM and Oracle all produce good Java Servlet engines, though for my money, I prefer the JServ Apache module. This module, in my opinion, offers all the functionality of the proprietary Servlet engines, but is open source and designed specifically for an Apache style Web server. JServ can be downloaded from the Java Apache Web site.

Like Cocoon, JServ also requires a lot of RAM as well as a good working knowledge of Java and Apache in order to get it working at its best in a production environment.

Another problem with the Java design of Cocoon is that all dynamic content must be produced using Java. Whereas in a standard environment I could use my language of choice if I wanted to create on-the-fly Web pages, with Cocoon I would have to write a Java Servlet and tie it into the Cocoon and JServ infrastructure. This compounds the resource problem created by using Java.

The Cocoon caching system goes some way to relieving the problem created by using Java. It stores a static copy of the dynamically generated content, so that the Java Servlets and Cocoon do not need to be run every time a page is downloaded. Regardless of this, I would not recommend running XML Apache with less than 128MB of RAM.
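The general idea behind such a cache can be sketched in a few lines. This is not Cocoon's actual implementation: the class PageCacheSketch and its methods are hypothetical, and it keeps pages in memory rather than in whatever store Cocoon uses. It only shows why the expensive generation step runs once per page instead of once per request.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class PageCacheSketch {

    // Generated pages, keyed by request path.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // The expensive step (for example, parse plus transform) being cached.
    private final Function<String, String> generator;

    // Counts how often the generator really ran, for demonstration only.
    private final AtomicInteger generations = new AtomicInteger();

    public PageCacheSketch(Function<String, String> generator) {
        this.generator = generator;
    }

    // Return the stored copy if there is one; otherwise generate and store it.
    public String get(String path) {
        return cache.computeIfAbsent(path, p -> {
            generations.incrementAndGet();
            return generator.apply(p);
        });
    }

    // Drop the stored copy so the next request regenerates the page.
    public void invalidate(String path) {
        cache.remove(path);
    }

    public int generations() {
        return generations.get();
    }

    public static void main(String[] args) {
        PageCacheSketch pages =
            new PageCacheSketch(path -> "<html><body>" + path + "</body></html>");
        pages.get("/index.xml");
        pages.get("/index.xml"); // served from the cache
        System.out.println("generator runs: " + pages.generations());
    }
}
```

A cache of this kind trades memory for CPU time, which is one reason a generous amount of RAM still matters even with caching switched on.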

Cocoon 2?

The Apache Software Foundation is well aware of the criticisms which I have aimed at XML Apache -- the documentation on the XML Apache Web site describes performance problems and limitations. One must remember, however, that Cocoon grew out of a proof of concept project and as such should not be considered the final word. Cocoon 2 is currently underway, but is still very far from completion. This second release addresses the resource problems which I have discussed, as well as implementing the XML standard as it now stands. If such a release can bring to XML the stability and functionality which XML itself hints at, XML Apache will mark the second generation of the Web.

XML Apache can be downloaded from the Apache Web site.
