Optimize what you know about databases

Some pundits -- and some technology vendors -- assert that database management systems have become commodities and that DBMS innovation is either dead or irrelevant. They're wrong. DBMSs haven't stopped evolving, and they aren't close to being all alike.

In particular, the major relational DBMS vendors are pursuing technical differentiation in three important areas:

Lower-cost analytical processing. DBMS vendors are enhancing their query optimizers to relieve much of the burden on database administrators, as well as to directly improve query performance. They're also improving processing speed through specialized indices and materialized views.

Real-time analytical processing. In some cases, these enhancements make it practical to run analytics straight off a production database, greatly facilitating real-time analysis. This is especially important in CRM, where it allows customer-specific pricing and confirmed product availability. Real-time analytics have also become cost-effective in some supply chain, logistics and portfolio/risk analysis applications.

Nonrelational data types. Specialized data types are crucial to many applications -- geographic data for marketing, mineral exploration and homeland security, or genomic data for drug research, for example. Text data is used for a broad range of search and document applications. And among the horde of new XML-based applications, a small but significant fraction depends on actual XML data storage.

Much of this differentiation is closely connected to the query optimizer, the brain of a relational DBMS. Any improvement in how a DBMS processes queries -- such as better parallelism, a new kind of index or a new kind of data type -- must be understood by the optimizer, or the DBMS can't take advantage of it.

Not all optimizers are created equal, and understanding what a particular one does and doesn't do gives a lot of insight into the capabilities of the overall DBMS. So to understand differences among DBMSs, it's useful to know a bit about how optimizers work.

For each query, the optimizer determines which indices and table columns should be read and joined, what kinds of joins should be used and in what order the joins should be performed. Modern query optimizers are all cost-based, i.e., the optimizer estimates the cost of each operation contained in each reasonable query path, adds them up and chooses the cheapest path. Costs are computed for I/O, in-memory processing and interprocessor communication, based on summary statistics about the distribution of specific values in the underlying data.
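To make the cost-based idea concrete, here is a minimal sketch in Python. It is not how any particular vendor's optimizer works: the table names, row counts and join selectivities are invented for illustration, and the cost model is deliberately crude (the sum of estimated intermediate result sizes, ignoring I/O, memory and interprocessor communication). It enumerates every left-deep join order, estimates a cost for each from the summary statistics, and picks the cheapest.

```python
from itertools import permutations

# Hypothetical summary statistics: estimated row counts per table.
ROW_COUNTS = {"customers": 1_000, "orders": 50_000, "line_items": 200_000}

# Hypothetical join selectivities: the fraction of row pairs that survive
# each join predicate. Table pairs not listed have no predicate (cross product).
SELECTIVITY = {
    frozenset({"customers", "orders"}): 1 / 1_000,
    frozenset({"orders", "line_items"}): 1 / 50_000,
}


def join_selectivity(joined, new_table):
    """Combined selectivity of all predicates linking new_table to the tables joined so far."""
    sel = 1.0
    for table in joined:
        sel *= SELECTIVITY.get(frozenset({table, new_table}), 1.0)
    return sel


def plan_cost(order):
    """Estimate the cost of a left-deep join order as the sum of intermediate result sizes."""
    joined = [order[0]]
    rows = ROW_COUNTS[order[0]]
    cost = 0.0
    for table in order[1:]:
        rows = rows * ROW_COUNTS[table] * join_selectivity(joined, table)
        cost += rows  # each intermediate result must be produced and passed on
        joined.append(table)
    return cost


def choose_plan(tables):
    """Enumerate every join order and return the one with the lowest estimated cost."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    for order in permutations(ROW_COUNTS):
        print(" -> ".join(order), f"estimated cost {plan_cost(order):,.0f}")
    print("chosen:", " -> ".join(choose_plan(list(ROW_COUNTS))))
```

Even in this toy version, the join orders that pair customers with line_items first come out far more expensive, because no predicate links those two tables and the estimate balloons into a cross product. The choice is also only as good as the statistics fed into it.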

Unfortunately, these estimates aren't perfect, so optimizers often fail to find the best query plan. Database administrators then need to laboriously hand-tune SQL code or optimizer parameters. In response, DBMS vendors are rolling out a slew of enhancements that help find and fix the worst of the suboptimal queries. And in another major aid to database administrators, optimizers are being exploited to recommend new indices and other database-tuning choices, traditionally the province of third-party tools.

The most direct benefits of better optimizers are faster queries and lower database administration costs. But equally crucial are the advanced access methods that optimizers enable. For starters, every major enhancement that lowers the cost of analytical processing depends on added intelligence in the optimizer. There are plenty of those; top-end DBMSs are replete with bitmaps, star-schema indices and more exotic aids to complex data-warehouse-style queries.

Most important over time may be materialized views, supported by IBM, Oracle and Microsoft alike. These are precomputed query results, stored like actual tables and typically updated on a near-real-time basis. In principle, materialized views can support efficient analytic queries from within online transaction processing (OLTP) databases without badly affecting OLTP performance or recopying or hiding the underlying transactional data. However, their widespread use depends upon optimizers that, at a minimum, can recognize views already created or, better yet, create new ones where appropriate.
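The idea of a precomputed, continuously maintained result can be sketched with Python's built-in sqlite3 module. This is an emulation under stated assumptions, not a vendor feature: SQLite has no native materialized views, so the sketch keeps a hypothetical summary table (sales_by_region) up to date with a trigger on a hypothetical orders table. It also leaves out the part that depends on the optimizer. Here the application chooses the summary table by hand, whereas a DBMS with materialized-view support would ideally redirect the query on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Transactional (OLTP-style) table; names and columns are hypothetical.
    CREATE TABLE orders (
        id     INTEGER PRIMARY KEY,
        region TEXT    NOT NULL,
        amount INTEGER NOT NULL
    );

    -- Stand-in for a materialized view: precomputed totals stored as a table.
    CREATE TABLE sales_by_region (
        region TEXT    PRIMARY KEY,
        total  INTEGER NOT NULL
    );

    -- Keep the precomputed totals current as each order arrives.
    CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
    BEGIN
        INSERT OR IGNORE INTO sales_by_region (region, total)
        VALUES (NEW.region, 0);
        UPDATE sales_by_region
        SET total = total + NEW.amount
        WHERE region = NEW.region;
    END;
    """
)

conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 100), ("west", 250), ("east", 50)],
)


def totals_from_base():
    """Analytic query answered by scanning and aggregating the transactional table."""
    return dict(conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region"))


def totals_from_view():
    """Same answer read directly from the precomputed summary table."""
    return dict(conn.execute("SELECT region, total FROM sales_by_region"))


if __name__ == "__main__":
    print(totals_from_base())
    print(totals_from_view())
```

Both functions return the same totals, but the second reads one small row per region instead of aggregating every order. That saving is what makes analytics against a live transactional database plausible.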

Support for nonrelational data types is also highly optimizer-dependent. Each new access method relies on its own indexing techniques, usually very different from those used for conventional relational data. Selecting and joining these data types efficiently therefore requires that the optimizer have a good cost model for a previously unfamiliar kind of index. IBM and Oracle provide capabilities to define such cost models, with Oracle's being the more flexible and comprehensive of the two.

Almost any sizable enterprise is likely to benefit from at least some advanced DBMS capabilities. Potential advantages include easier tuning, faster analytic processing or support for nonrelational data types. With luck, the particular features you can best use will be implemented in your organization's preferred brand of DBMS. But if they're not, you may want to explore selective use of alternative DBMS suppliers. Either way, it's worthwhile to spend some time tracking developments in database technology. And optimizers are a good place to start your exploration.

- Curt A. Monash is a consultant in Acton, Mass. You can reach him at curtmonash@monash.com.
