IBM's labs are at work on a number of improvements to IBM's DB2 data management platform, focusing on data warehousing and integration as well as on autonomic, or self-managing, computing systems, company officials said on Thursday.
Projects under way aim to make the database platform a better resource for integrating data from multiple sources, including information stored in databases from other vendors. IBM researchers are also developing additional self-tuning capabilities and working to provide better support for grid computing, in which distributed, heterogeneous systems are linked together to provide a virtual pool of computing resources for running applications.
IBM officials say their data management strategy differs from that of rivals Microsoft and Oracle in that IBM favors a federated approach, in which data is stored in and accessed from multiple locations, rather than consolidating all of a company's data in a single, monolithic platform.
One project, code-named Masala, an Indian term for a mixture of spices, focuses on information discovery among massive amounts of distributed data, including data not stored in a data warehouse, IBM officials said during a briefing at the company's Silicon Valley Laboratory in San Jose, Calif. For example, a customer service agent seeking information about a caller could have access to information such as e-mails and scanned letters.
IBM's DB2 Information Integrator product already lets businesses access information stored in distributed locations. Masala extends those capabilities to help businesses make better use of the information once it can be accessed centrally. It includes metadata management to track information about the data being integrated, and it provides faster access to distributed data, allowing businesses to make decisions based on near-real-time information, said Nelson Mattos, IBM distinguished engineer and director of information integration.
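The federated idea described above can be sketched in a few lines. This is a hypothetical illustration, not the DB2 Information Integrator API: each back end sits behind a common query interface, and a metadata catalog records where each logical table lives, so callers never need to know the physical location of the data.

```python
# Illustrative sketch of federated data access (hypothetical names, not
# IBM's actual API). A metadata catalog maps each logical table to the
# source that holds it, and queries are routed through that catalog.

class Source:
    def __init__(self, name, tables):
        self.name = name
        self.tables = tables          # {table_name: list of row dicts}

    def query(self, table):
        return self.tables.get(table, [])


class Federator:
    def __init__(self):
        self.catalog = {}             # metadata: logical table -> source

    def register(self, source):
        for table in source.tables:
            self.catalog[table] = source

    def query(self, table):
        # Route the request to whichever back end holds the table.
        return self.catalog[table].query(table)


fed = Federator()
fed.register(Source("db2_orders", {"orders": [{"id": 1, "total": 99}]}))
fed.register(Source("legacy_crm", {"customers": [{"id": 1, "name": "Acme"}]}))
rows = fed.query("customers")   # served from legacy_crm, transparently
```

The caller issues one query against the federator; which system answers it is an implementation detail hidden behind the catalog.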
Masala could be available to customers for beta testing by the end of the year and could show up in products in 2004, he said.
"We're extending the boundary that a warehouse brings by allowing it to reach out to data that is not necessarily stored in a warehouse," Mattos said.
As part of IBM's autonomic computing initiative, IBM is developing LEO, short for Learning Optimizer: software that learns about relationships between data sets to improve query performance. The software, for example, could prevent redundancies in data queries by learning about correlations in data. LEO is expected to appear in products in the next 12 to 18 months. It could be used in business intelligence tools for functions such as correlating customers' buying habits.
"(LEO is) a research project that attempts to learn from executing queries and improve the execution time on the next time that (query) or related queries execute, and it does so by looking at how many rows it gets back and feeds that back into statistics that are used by the optimizer to come up with future queries," said Guy Lohman, manager of advanced optimization in the research division of the IBM Almaden Research Center, in San Jose, Calif.
IBM's autonomic initiative is intended to relieve database administrators of laborious tasks that should be automated, according to IBM. "We're trying to get rid of the mundane, monotonous, time-consuming stuff and freeing them for what they're really good at," said Patricia Selinger, IBM fellow and vice president of data management architecture and technology.
IBM also intends for its data management platform to be a player in grid computing in the enterprise.
Big Blue will build on its data federation, replication, clustering, and high-performance technologies to enable grids, said Laura Haas, IBM distinguished engineer and development manager for DB2 Information Integrator. Additionally, IBM will produce its own versions of the Globus toolkit for building grids, Haas said.
"With DB2, we're providing the information infrastructure for the grid," Haas said. IBM wants to provide virtualized access to systems that would be treated as a single system in a grid, she said.
Through grids, a retail enterprise, for example, could pool computing resources during the holidays to handle the peak sales period, Mattos said.
IBM also is developing data replication technology that uses persistent queuing systems to transport data changes, said Bruce Lindsay, IBM fellow in the Almaden Research Center. Persistent queuing is a messaging approach that guarantees receipt of a message; IBM's implementation uses its MQSeries messaging technology.
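A minimal sketch of the pattern Lindsay describes, assuming a toy queue rather than MQSeries itself: change records are enqueued at the source, and a message is "persistent" in that it stays on the queue until the apply side acknowledges it, so a crash between delivery and apply cannot lose a change.

```python
# Illustrative sketch of queue-based replication (hypothetical, not the
# MQSeries API). Messages survive until explicitly acknowledged, and the
# apply side acknowledges only after the change has been applied.

from collections import deque

class PersistentQueue:
    def __init__(self):
        self.pending = deque()        # messages not yet acknowledged

    def put(self, msg):
        self.pending.append(msg)

    def peek(self):
        return self.pending[0] if self.pending else None

    def ack(self):
        # Only an explicit acknowledgment removes a message.
        self.pending.popleft()


def replicate(queue, replica):
    # Apply side: read, apply, then acknowledge -- in that order, so a
    # failure mid-apply leaves the message on the queue for retry.
    while queue.peek() is not None:
        op, key, value = queue.peek()
        if op == "upsert":
            replica[key] = value
        elif op == "delete":
            replica.pop(key, None)
        queue.ack()


q = PersistentQueue()
q.put(("upsert", "row1", "v1"))
q.put(("upsert", "row2", "v2"))
q.put(("delete", "row1", None))
replica = {}
replicate(q, replica)   # replica ends up holding only row2
```

Applying changes in queue order and acknowledging only after the apply is what gives the replica at-least-once delivery of every change.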
Lindsay added that he believes the concept of automatic schema integration will become popular in the next five to 10 years. Automatic schema integration figures out which data should be stored with other data, Lindsay said.
Selinger stressed that the variety of data sources is growing. Data from sources such as sensors, GPS units, and satellite imagery will mean less data is being touched by human hands.
"Ten years out, there will be new techniques for storing this data, for comparing it, for analyzing. That's a big challenge. We're not done yet," Selinger said.
(James Niccolai of the IDG News Service contributed to this story.)