In the world of mainframe computing, the concepts of clustering and fail-over are decades old. The relatively recent migration to PC computing, however, has created a demand for comparable levels of enterprise-grade risk-mitigation for these low-cost, distributed computing platforms as well.
By clustering -- synchronizing multiple servers to replicate data and disperse workload -- companies can help guarantee uninterrupted service in the face of failures and bottlenecks, and protect data integrity from any single point of failure. Adding more computers as needed increases reliability at a cost that typically rings in well below that of a single, expensive SMP (symmetric multiprocessing) system.
The difficulty with clustering, however, lies in managing the clusters themselves, which are often difficult to set up and even tougher to optimize. Further, the cost and hassle of recoding applications or writing specialized scripts to take advantage of redundant availability have left the potential of clustering largely untapped.
For scenarios in which downtime, lost data, and transactional bottlenecks cannot be tolerated, computer clustering can help provide a stable foundation for improving the availability, reliability, and load balancing requirements of OLTP (online transaction processing).
Node to node
The three flavors of clusters most commonly used to improve application performance are scientific clusters, load-balancing clusters, and high-availability clusters. In the business of OLTP, the latter two take center stage.
High-availability clustering provides assurances against hardware and software failure by maintaining a mirror image of server activity on a secondary backup system, or node. When faced with a system failure on the primary node, the redundant node is promoted from understudy to lead in the role of application delivery.
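The promotion logic described above can be sketched in a few lines. This is a minimal, hypothetical model -- real cluster managers use heartbeat protocols, quorum, and fencing -- and the node names are invented for illustration:

```python
import time

class Node:
    """A cluster node that reports its health via a heartbeat timestamp."""
    def __init__(self, name):
        self.name = name
        self.last_heartbeat = time.monotonic()

    def beat(self):
        self.last_heartbeat = time.monotonic()

    def is_alive(self, timeout=3.0):
        return time.monotonic() - self.last_heartbeat < timeout

def active_node(primary, secondary, timeout=3.0):
    """Promote the secondary when the primary misses its heartbeat window."""
    return primary if primary.is_alive(timeout) else secondary

primary, secondary = Node("app-1"), Node("app-2")
assert active_node(primary, secondary).name == "app-1"

# Simulate a primary failure: its heartbeat goes stale.
primary.last_heartbeat -= 10
assert active_node(primary, secondary).name == "app-2"
```

The backup node mirrors the primary's state continuously, so when the heartbeat check fails, the understudy steps in with current data.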
Load-balancing clusters, on the other hand, dynamically distribute processing loads and network traffic among available systems in your server farm to help maintain workflow equilibrium, ensuring that no single system or network leg gets bogged down by client requests. The nodes in load-balancing clusters, which can be added or removed as usage dictates, are typically grouped according to a particular application, with the same collection of applications running within a single cluster to ensure availability against usage spikes.
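In its simplest form, the workload distribution above is a round-robin dispatcher that hands each incoming request to the next node in the ring. A minimal sketch, with hypothetical node names:

```python
from itertools import cycle

def make_round_robin(nodes):
    """Return a dispatcher that assigns each request to the next node in turn."""
    ring = cycle(nodes)
    return lambda request: (next(ring), request)

dispatch = make_round_robin(["web-1", "web-2", "web-3"])
assignments = [dispatch(f"req-{i}")[0] for i in range(6)]

# Each of the three nodes receives an equal share of the six requests.
assert assignments == ["web-1", "web-2", "web-3"] * 2
```

Production load balancers weigh factors beyond simple rotation -- current load, response time, session affinity -- but the principle of spreading requests so no single node bogs down is the same.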
Whereas high-availability clusters protect against failure by providing fail-over, load-balancing clusters ensure the optimal usage of network and hardware resources. In practice, most clustering implementations for OLTP benefit from a hybrid approach, one that leverages both cluster types to improve both application efficiency and assurances for system uptime.
With the increased demand for personalization and dynamic content delivery, the need for performance and availability beyond the middle tier has never been greater. Consequently, database clustering has become an important consideration in mission-critical e-business applications, but one in which design differences, namely shared-disk vs. shared-nothing clusters, spawn debate.
In the shared-disk approach, any node in the cluster can access any block of data, enabling incoming queries to be routed to any RDBMS instance. By contrast, the shared-nothing approach partitions data into static, logical segments, each only accessible from a single, "owning" node.
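The shared-nothing ownership rule can be illustrated with a simple hash-based partition map -- a sketch only, with invented node names; real databases use range or hash partitioning schemes defined by the DBA:

```python
import zlib

NODES = ["db-1", "db-2", "db-3"]

def owning_node(key, nodes=NODES):
    """Shared-nothing routing: a key's hash determines its single owning node."""
    return nodes[zlib.crc32(key.encode()) % len(nodes)]

# Every query for the same key must be routed to the same owning node...
assert owning_node("customer:42") == owning_node("customer:42")
assert owning_node("customer:42") in NODES
# ...whereas a shared-disk cluster could route it to any available node.
```

The static mapping is what makes shared-nothing routing deterministic: the partition map, not the load balancer, decides where a query may run.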
The shared-disk method has inherent advantages in scenarios that call for high availability. Requests can be balanced dynamically or round-robin across nodes, regardless of which data partition is requested. In a case of a node failure, requests are simply rerouted to the next available node with no disruption of data availability. Adding nodes to a cluster requires no reconfiguration to the architecture, application, or underlying data organization.
On the downside, such ubiquitous data access demands that only a single node have transactional access to a block of data at any given time. A distributed lock manager is therefore needed to provide global control over cache updates and disk writes to ensure data integrity, and this control comes at the cost of speed.
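A toy version of that lock manager shows the cost in miniature: before any node may write a block, it must win a global lock, and a second node's request is refused until the first releases. This is a simplified, single-process sketch -- real distributed lock managers coordinate across the network, which is where the speed penalty comes from:

```python
import threading

class LockManager:
    """A toy distributed lock manager: one writer per data block at a time."""
    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def acquire(self, block, node):
        """Grant the block to `node` only if no other node holds it."""
        with self._guard:
            if self._locks.get(block) in (None, node):
                self._locks[block] = node
                return True
            return False

    def release(self, block, node):
        with self._guard:
            if self._locks.get(block) == node:
                del self._locks[block]

dlm = LockManager()
assert dlm.acquire("block-7", "db-1")      # db-1 wins the write lock
assert not dlm.acquire("block-7", "db-2")  # db-2 must wait: integrity preserved
dlm.release("block-7", "db-1")
assert dlm.acquire("block-7", "db-2")      # now db-2 may write
```

Every one of those acquire/release round trips is overhead a shared-nothing cluster never pays, which is the crux of the performance trade-off.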
By contrast, shared-nothing clusters do not require locking and gain performance benefits through efficient use of caching. Because data is localized, the chances of a cache hit are increased at each partition node. This ability to partition data makes shared-nothing the preferred model for improving performance in large data warehouses.
Setting up partitioning has additional up-front costs and requires applications and transaction managers to be partition-aware for optimal routing. If partitions are not properly optimized, individual partition nodes can quickly become overloaded, and in the event of a node failure, the fail-over period is longer than for shared-disk clusters because partitions must be reassigned. Further, adding new nodes to shared-nothing clusters requires a redistribution of data and a new partition map, posing greater headaches than adding nodes to shared-disk clusters.
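The redistribution cost is easy to demonstrate: with a naive hash partition map, adding one node changes the owner of most keys, and all of that data must physically move. A sketch under assumed node and key names:

```python
import zlib

def partition_map(keys, nodes):
    """Assign each key to an owning node by hashing: the cluster's partition map."""
    return {k: nodes[zlib.crc32(k.encode()) % len(nodes)] for k in keys}

keys = [f"order:{i}" for i in range(1000)]
before = partition_map(keys, ["db-1", "db-2", "db-3"])
after = partition_map(keys, ["db-1", "db-2", "db-3", "db-4"])

# Adding db-4 changes the owner of most keys -- all of that data must be moved.
moved = sum(1 for k in keys if before[k] != after[k])
assert moved > 0
```

Techniques such as consistent hashing reduce how many keys move, but the underlying point stands: growing a shared-nothing cluster reshuffles data, while growing a shared-disk cluster does not.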
In cases with well-defined keys for modeling data partitions, a shared-nothing approach can make good sense even for OLTP. But for maximum flexibility, and in cases in which servers may be added and removed based on varying demand, you can't beat the shared-disk architecture.
Chart a course for clustering by analyzing your applications, data types, usage models, and transactional processing requirements. The best route to improving availability and fail-over for your OLTP solution may be just a processing-constellation away.
THE BOTTOM LINE
Application and database clustering
Executive Summary: Clustering Web and application servers can help ensure availability, scalability, and reliability of mission-critical applications. OLTP demands clustering efficiency, and the right database model can achieve that goal.
Test Center Perspective: Clustering should be viewed as a necessary evil for OLTP. Consider shared-disk clusters for most scenarios; shared-nothing clusters are best-suited to large data warehouses.