IBM is working on ways to make XML documents and data easier to pull into its content management software, and to index and search that data once it is there.
The initiative, code-named Cinnamon, is currently under development within IBM's research arm, according to Jim Reimer, chief architect of content management at IBM.
"The technology here is against the backdrop of our DB2 Content Manager products. The technology relates to handling XML documents and doing tasks such as automatic ingesting of the documents," Reimer said.
Until now, XML handling in content management and in IBM's DB2 database has focused on receiving XML documents written against different DTDs and schemas and, in effect, mapping them into rows in a database, he said. That amounts to a parsing, extraction, and flattening step: documents from different sources are taken in, and values from them are populated into certain columns.
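The flattening approach Reimer describes can be sketched roughly as follows. This is an illustration only, not IBM's implementation: the column names, the element paths, and the sample document are all hypothetical, and the code uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: database column -> path of the element holding its value.
COLUMN_PATHS = {
    "title": "header/title",
    "author": "header/author",
    "published": "header/date",
}

def flatten(xml_text, column_paths=COLUMN_PATHS):
    """Parse one XML document and return a flat row (dict) of column values."""
    root = ET.fromstring(xml_text)
    row = {}
    for column, path in column_paths.items():
        # findtext returns None when no element matches the path.
        value = root.findtext(path)
        row[column] = value.strip() if value is not None else None
    return row

# Hypothetical sample document; the nested <body> is not mapped to any column.
SAMPLE = """<article>
  <header>
    <title>Quarterly report</title>
    <author>A. Writer</author>
  </header>
  <body><para>Text.</para></body>
</article>"""

row = flatten(SAMPLE)
print(row)
```

Only the values named in the mapping survive as columns. The nesting of the document, such as the body and its paragraphs here, is discarded, and a document written against a different DTD would need its own mapping.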
"In that context, when content systems are done, it's necessary to use much more complete or complex ways of expressing what's in the collection. One of the ways of gauging the completeness of a content management system is how rich a model you are able to manage for the way in which you are describing the content objects that are in the collection," he said. "Content systems frequently have much more extensive description methods, like hierarchy and structure, like folders or folders in folders."
In its latest release, Content Manager Version 8, IBM extended what can be represented in a data collection, broadening the primitives and data modeling services to cover what can be expressed in an XML document, including multi-valued attribute sets, arbitrary hierarchy, links, and relationships.
"The challenge if you have such documents is how to get them into CM and, secondly, how to deal with the landscape where you have evolving DTDs and schemas over time and different authors, writing in different DTDs and schemas, that are giving you content," Reimer explained.
The underlying technology for this problem of mapping, administration, and adaptation to evolving schemas comes from another IBM research project, dubbed Clio, which is part of the overall eXperanto effort.
Cinnamon, then, is IBM's effort to extend that technology base. First, it is meant to let users take complex XML documents, along with their associated DTDs and schemas, and oversee the mapping task that defines how to project them into the full data modeling services of CM. Second, from a runtime perspective, the goal is to handle the automatic ingesting of those documents, including all the parsing, extraction, and projection into the new data model, Reimer explained.
"It's a key step for being able to improve the productivity of ingesting such documents in that complex of an environment. It's very important also to be able to live with the evolution of those schemas," Reimer said.
Additionally, the Cinnamon effort is a step toward administrative controls in the product that eliminate the need for programming, so that content can be automatically ingested, projected, and modeled in the same system, he said.
Stephen O'Grady, an analyst at RedMonk, said that with Cinnamon IBM is potentially addressing a problem that companies will face as they collect more and more XML documents.
"There's no question that having documents in XML will be advantageous to companies for a host of reasons, such as indexing and personalization," O'Grady said. "It's going to be a problem because companies will have to really know what they are doing for indexing and retrieval."
O'Grady estimated that major companies will face these issues in approximately a year and a half.
Cinnamon is in what IBM calls the technology preview state and will come to market as one of the administrative tools included with a future version of DB2 Content Manager, due sometime next year, Reimer said.