IBM last week released the first open beta of a planned update to its flagship DB2 database that will let users store unstructured XML data natively, separately from conventional relational data.
The upgrade, code-named Viper, is scheduled to be released in the middle of next year, according to Bernie Spang, director of database marketing at IBM. Spang said that Viper will be able to store data such as multimedia files, Excel spreadsheets and Word documents in an XML repository, which will operate in parallel with IBM's relational data repository under the control of a single DB2 engine.
Typically, relational databases handle XML data either by storing the entire file as an object that isn't relationally indexed, or by "shredding" the file so the unstructured information fits into multiple relational data cells.
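To make the two conventional approaches concrete, here is a minimal Python sketch. It is a generic illustration, not how DB2 or any other product implements either approach, and the sample invoice document, its element names and the helper functions are hypothetical. The first function keeps the whole document as one opaque value, the way a large-object column would. The second "shreds" it into rows destined for separate relational tables.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are illustrative only.
INVOICE_XML = """<invoice id="1001">
  <customer>Acme Corp</customer>
  <item sku="A-1" qty="2"/>
  <item sku="B-7" qty="5"/>
</invoice>"""


def store_as_object(xml_text):
    """Approach 1: keep the entire document as a single opaque value.

    The database sees one cell; nothing inside it is relationally indexed.
    """
    return {"doc_id": 1, "body": xml_text}


def shred(xml_text):
    """Approach 2: split the document into rows for separate relational tables."""
    root = ET.fromstring(xml_text)
    # One row for a hypothetical INVOICES table: (invoice_id, customer)
    invoice_row = (root.get("id"), root.findtext("customer"))
    # One row per <item> for a hypothetical ITEMS table: (invoice_id, sku, qty)
    item_rows = [
        (root.get("id"), item.get("sku"), int(item.get("qty")))
        for item in root.findall("item")
    ]
    return invoice_row, item_rows


if __name__ == "__main__":
    print(store_as_object(INVOICE_XML)["doc_id"])
    print(shred(INVOICE_XML))
```

The trade-off the article describes falls out of the sketch: the opaque copy preserves the document exactly but cannot be searched by column, while the shredded rows can be indexed and joined but must be reassembled to recover the original document.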
"Offering native XML functionality is very essential to delivering improved performance for data access," said Noel Yuhanna, an analyst at Forrester Research. Oracle had "a head start on XML for many years," he added. "I expect this to become a game of catch-up and leapfrog among the big vendors."
Viper's XML storage capabilities could be of use to CheckFree in light of its interest in service-oriented architectures (SOA) and Web services, said Robert Catterall, director of engineering at the Norcross, Ga.-based provider of online bill-payment services.
CheckFree uses DB2 to run databases holding multiple terabytes of information. "We are not today storing XML documents as such in our databases," Catterall said. "But that has partly been because there wasn't an appealing way to do that in a single database."
Oracle officials said the company began offering XML storage options five years ago and in July enabled users of its 10g database to search XML files natively using the XQuery query language. IBM's native XML storage feature "doesn't add any value," said Mark Drake, an XML technology manager at Oracle.
Viper will also support XQuery for processing XML data, along with standard SQL. In addition, it will be the first DB2 release to support three different partitioning methods: range partitioning, multidimensional clustering and hashing. That support is aimed at helping IBM compete against Oracle, which has said that 10g offers six methods of partitioning data tables for faster access to information.
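Two of the partitioning methods named above can be illustrated with a short Python sketch. This is a generic example of the techniques, not DB2's or Oracle's implementation; the quarterly boundaries, the partition count and the function names are hypothetical, and multidimensional clustering is not shown. Range partitioning places a row according to which value interval its key falls in, while hash partitioning spreads keys across a fixed number of partitions.

```python
import bisect
import zlib
from datetime import date

# Hypothetical quarterly boundaries; each is the exclusive upper bound of a partition,
# so these three boundaries define four partitions numbered 0 through 3.
RANGE_BOUNDARIES = [date(2005, 4, 1), date(2005, 7, 1), date(2005, 10, 1)]


def range_partition(order_date):
    """Return the partition whose date range contains order_date."""
    # bisect_right sends a date equal to a boundary into the next partition.
    return bisect.bisect_right(RANGE_BOUNDARIES, order_date)


def hash_partition(customer_id, num_partitions=4):
    """Assign a key to a partition using a stable hash (CRC-32) modulo the count."""
    # zlib.crc32 gives the same result across runs, unlike Python's salted str hash.
    return zlib.crc32(customer_id.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    print(range_partition(date(2005, 5, 15)))
    print(hash_partition("C-42"))
```

Range partitioning suits queries that scan a contiguous slice of values, such as one quarter's orders, whereas hashing trades that locality for an even spread of rows across partitions.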
Despite estimates that most companies already hold more unstructured or XML-formatted data than structured relational data, and that this data is growing faster, Yuhanna said he doesn't think XML will replace relational data and SQL as the preferred format.