
I am an Oracle database consultant; my areas of interest are high availability, infrastructure consulting, and performance tuning. I post on topics related to these technology areas. Do post your comments as well.

Friday 18 July 2008

Oracle Clusterware I

Oracle 10g Real Application Clusters has redefined High Availability (HA) architecture by providing a complete, integrated clustering and volume management solution on all supported platforms. This post gives a technical overview of the 10g Clusterware and compares its usage with the third-party Clusterware products on the market.

CLUSTERED DATABASE

Oracle Real Application Clusters (RAC) allows multiple computers to run the Oracle RDBMS software simultaneously while accessing a single database. This is called a clustered database. In a non-RAC Oracle database, a single database is accessed by a single instance. The database is considered the collection of data files, control files, and redo log files located on a shared disk subsystem. The instance is considered the collection of Oracle-related memory and operating system processes that are running on the computer.
In Oracle RAC, two or more computers (each with an instance) concurrently access a single database. This allows an application or user to connect to either computer and have access to the same data.

Having described a cluster, there must be a mechanism in place that monitors and manages the multiple computers that are part of the cluster. The piece of software that provides this functionality is known as the Cluster Manager (CM), or sometimes simply the Clusterware. The CM is primarily responsible for maintaining information about the nodes in the system. Starting with Oracle 10g RAC, Oracle Corporation provides its own Clusterware, known as Oracle Clusterware, which not only manages the interconnected computers but is also integrated with the High Availability (HA) functionality of the application, i.e. the RDBMS software. Prior to Oracle 10g RAC, vendor-supplied Clusterware such as Veritas Storage Foundation for RAC, Sun Cluster, or HP Serviceguard had to be relied upon; these can still be used in 10g RAC for managing the cluster. This article examines the use of vendor-supplied Clusterware versus Oracle Clusterware for managing the cluster.

CLUSTERWARE AND SPLIT BRAIN

Cluster nodes have to communicate with each other to check whether all participating computers are alive and part of the cluster; for this they use the private interconnect, which is ideally a redundant network dedicated to inter-node communication. The Clusterware also needs to capture this information.
Now imagine a situation where all inter-node communication fails, but the nodes are still running without any knowledge of each other.
In that scenario, each node thinks the other is dead and that it should take control of the application. This condition is known as split brain; it is a cluster phenomenon and not RAC-specific. It would end with the nodes independently accessing the shared disk subsystem, because neither knows the other is doing the same thing, and this could lead to data corruption. It should be noted that a loss of communication between cluster nodes does not by itself constitute a split brain. A split brain means cluster membership is compromised in such a way that multiple nodes try to access the same exclusive resources, which results in data corruption. The goal is to minimize the chance of one system taking over an exclusive resource while another still has it active, yet still accommodate a system powering off.
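The survival rule a Clusterware applies after a split can be sketched in a few lines. This is a minimal illustration, not Oracle's or any vendor's actual algorithm; the function name and the tie-break rule (largest sub-cluster wins, ties broken by lowest node number) are assumptions chosen for the example.

```python
# Illustrative sketch: after an interconnect failure splits the cluster into
# sub-clusters, only one sub-cluster may survive. A common policy is
# "largest sub-cluster wins; on a tie, the side with the lowest node number".

def surviving_subcluster(subclusters):
    """Pick the sub-cluster that keeps running; all others are evicted.

    subclusters: list of sets of node ids, e.g. [{1, 3}, {2, 4}]
    """
    # Prefer the sub-cluster with the most members; on a tie, prefer the
    # one containing the lowest-numbered node.
    return max(subclusters, key=lambda s: (len(s), -min(s)))

# Nodes {1, 2} lose contact with {3}: the two-node side survives.
print(surviving_subcluster([{1, 2}, {3}]))   # {1, 2}
# Tie between single-node sub-clusters: lowest node number wins.
print(surviving_subcluster([{2}, {1}]))      # {1}
```

The point of the tie-break is that every node can compute the same answer independently, even though the nodes can no longer talk to each other.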

To prevent this kind of situation, the Clusterware operates in the following manner.
When a catastrophe in the form of a private interconnect failure leads to the formation of sub-clusters, only one of the sub-clusters is allowed to continue cluster operation and the rest are evicted. The Clusterware implements this using a voting (or quorum) disk: the participating nodes write to this disk continually, and any node that fails to write within a specified timeout interval is evicted from the cluster. The implementation of the quorum disk varies from vendor to vendor, but the basic concept remains the same. The quorum disk cannot store any data pertaining to the application, i.e. user data.
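The write-within-a-timeout rule above can be sketched as follows. This is an assumed, simplified model (the names and the 30-second value are illustrative only, not Oracle's actual misscount or disk format): each node's last heartbeat write is tracked, and any node whose write is older than the timeout is evicted.

```python
# Illustrative sketch of the voting-disk idea: each node records a heartbeat
# timestamp "on disk"; any node whose last write is older than the timeout
# is evicted from the cluster. Values and names are assumptions.

TIMEOUT = 30  # seconds; illustrative value only

def evict_stale_nodes(heartbeats, now, timeout=TIMEOUT):
    """heartbeats: {node_id: last_write_time}. Returns (survivors, evicted)."""
    survivors = {n for n, t in heartbeats.items() if now - t <= timeout}
    evicted = set(heartbeats) - survivors
    return survivors, evicted

# Node 3 last wrote 45 s ago and misses the 30 s deadline, so it is evicted.
hb = {1: 100, 2: 98, 3: 60}
print(evict_stale_nodes(hb, now=105))  # ({1, 2}, {3})
```

The real mechanism is symmetric: every node both writes its own heartbeat and reads the others', so a node that can still write but can no longer see its peers also knows a reconfiguration is needed.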

Consider a scenario where one node experiences problems on all interconnect links and can no longer receive heartbeats from the other node. With this information, that node would go through a reconfiguration and decide to evict the other node and take control of the cluster. However, if the other node was still receiving heartbeats from this node, it would not reconfigure or check whether it should take control of the cluster, and at this point it could continue accessing the shared disk and issuing disk I/O. But once a node has decided to take control of the cluster and access the data, it must make sure the other node is unable to access any of the shared data.

The Clusterware makes sure that as soon as the ‘unaware’ node tries to access the data on one of the shared disks, it is evicted to prevent any data corruption. The other node then remains in the cluster, fully operational. Simple reconfiguration of the node membership does not guarantee data protection: if a node hangs or is suspended and then comes back to life, it could cause data corruption before the cluster manager determines that the node was supposed to be dead. The Clusterware takes care of this situation by providing full data protection at the data-disk level; the mechanism used for this is called Disk Fencing or I/O Fencing. This is a key aspect of a Clusterware, as it ensures data consistency.
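The fencing idea described above can be sketched with registration keys, loosely modelled on the SCSI-3 persistent reservation concept that several fencing implementations use. All class and method names here are illustrative assumptions, not a real vendor API: each node registers a key with the shared disk, eviction removes the key, and any later I/O from the evicted node is rejected at the disk level.

```python
# Illustrative sketch of I/O fencing via registration keys: the shared disk
# only accepts writes from registered nodes, so a "zombie" node that wakes
# up after eviction has its I/O rejected before it can corrupt data.

class SharedDisk:
    def __init__(self):
        self.registered = set()   # keys of nodes currently allowed to do I/O
        self.blocks = {}          # block number -> data

    def register(self, node_key):
        self.registered.add(node_key)

    def fence(self, node_key):
        # Eviction: pre-empt the node's key; its future I/O will fail.
        self.registered.discard(node_key)

    def write(self, node_key, block, data):
        if node_key not in self.registered:
            raise PermissionError(f"node {node_key} is fenced off")
        self.blocks[block] = data

disk = SharedDisk()
disk.register("node1")
disk.register("node2")
disk.fence("node2")           # node2 is evicted from the cluster
disk.write("node1", 0, "ok")  # the surviving node writes normally
try:
    disk.write("node2", 0, "stale")  # the zombie node's I/O is rejected
except PermissionError as e:
    print(e)                  # node node2 is fenced off
```

The key design point is that the protection lives at the disk, not in the cluster software on the failed node: even if that node is hung and never hears that it was evicted, its writes cannot reach the data.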
