Network partitioning (aka split brain) is something you don’t want to happen, but what is the problem really?
Sometimes, users set up a Galera Cluster (or Percona XtraDB Cluster) containing an even number
of nodes, e.g, four or six nodes, and then place half of the nodes in one data center and the other half in another data center. If the network link goes down between the two data centers, then the Galera cluster is partitioned and half of the nodes cannot communicate with the other half. Galera’s quorum algorithm
will not be able to select the ‘primary component’, since both sets have the same number of nodes. You then end up with 2 sets of nodes that are ‘non-primary’, and effectively, none of the nodes would be available for database transactions.
Below is a screenshot of ClusterControl for Galera. I have simulated a network partition by simply killing two of the nodes at the same time.
|Network partitioning / split brain|
At this point, our 2 nodes are pretty useless; at least one node should be in state ‘synced’ before it can be accessed. ClusterControl will address this situation by first recovering the first two nodes in ‘non-Primary’ state and setting them to Primary. It will then resync the two crashed nodes when they are back.
In order to avoid this, you can install an arbitrator (garbd). garbd is a stateless daemon which acts as a lightweight group member. In a quorum situation where half of the nodes are not available, garbd will help avoid a split-brain.
In ClusterControl, garbd is a managed process (will be automatically restarted in case of failure) and can easily be installed from the deployment package:
Now that we have installed garbd in our 4-node test cluster, it will now consist of 1+4 nodes. Killing 2 nodes will not affect the whole cluster, as the remaining nodes will have the majority, see the picture below The minority will be excluded from the cluster, and when they are up and ready to rejoin the cluster, they will go through a recovery protocol before being accepted into the cluster.
|Majority win - garbd + two survivors > two failed nodes|