Galera Cluster – what is it?

MySQL comes in different shapes and sizes. Asynchronous replication, semisynchronous replication, NDB Cluster, Galera Cluster, InnoDB Cluster. We are not even mentioning MariaDB as a separate entity. One of the common questions we hear when delivering MySQL consulting is: what should I use? What are the differences between X and Y? This blog post is intended to be a starting point in your journey across the land of MySQL. We are going to sum up the main features of Galera Cluster and we will also compare it to “traditional” MySQL replication.

What is Galera?

Galera is a “virtually” synchronous cluster that is based on MySQL. Every node shares the same, full dataset. There is no cluster-wide locking nor two-phase commit. Locking is optimistic and conflicts are resolved during the certification phase. Cluster has an ability to provision nodes with data. Cluster supports both local network and WAN deployments.

Ok, that’s some buzzwords. What does it mean in layman’s terms? We are going to explain it a little bit. Only a little, because every paragraph in this blog can be a separate blog post. Probably it will become one at some point, but for now let’s talk about Galera and what it is.

Galera is based on MySQL/InnoDB

First, Galera is based on MySQL and it uses InnoDB as a storage engine. In fact, Galera is a set of libraries that enhance MySQL, making it able to be cluster-aware: be able to work with multiple nodes. This results in some important but quite often overlooked characteristics of Galera. First, yes, it does behave quite similar to the standalone MySQL with InnoDB storage engine but when you start digging into the performance, you will definitely notice quite significant changes: Galera relies heavily on the network performance as every transaction has to be distributed across all nodes in the cluster. Then the transaction has to go through the certification process. As a result, every commit has to wait for the network (to distribute) and nodes (to certify) before it can be returned to the client. This may, or may not be acceptable, depending on the requirements of the application and depending on the geographical distribution of the cluster. If we are talking about distributing nodes between high availability zones in the same datacenter, this probably will work just fine. If we are talking about spanning the cluster across different continents, the latency may become a blocker. We have seen Galera clusters spanning across the world, US – EMEA – APAC but the fact that you can do something doesn’t always mean you should do it.

Galera shares the dataset across the nodes

Well, this should be self-explanatory. Unfortunately, from our experience, it is not necessarily the case. Let’s reiterate. All Galera nodes contain the same data set. This implies that there is no sharding involved and, as such, there is no write scalability. Yes, you can write to all nodes in the cluster but, again, even if you can do it, it doesn’t mean that you should do it. Performance-wise there is pretty much no gain in terms of the write performance. No matter where you write the transaction, it will have to be distributed across all nodes and executed on all of them. Yes, you can have multiple threads that would execute all writesets in parallel but for the majority of the cases this will not make any difference. If you are in the business of scaling writes, sharding is the only way to go. You can try to use Vitess or figure out some sort of a solution on your own, but Galera will not really help you here. Sure, it can be used in a sharded environment as the platform to build a shard, to improve high availability and reduce complexity of the operations compared to the replication topology but that’s just it.

Galera uses optimistic locking

Let’s say that you decided to execute writes on all of the Galera nodes at the same time. This might be a choice made with intention to simplify the balancing of the write requests – instead of figuring some way to check the status of the nodes, pick the one and only writer (as long as it is reachable, that is) and send all traffic to it, we can simply define a round-robin across all nodes in the Galera cluster and just skip one if it is not reachable. What would happen if you attempted to execute a DML (Data Modification Language) query on two or more nodes that touch the same data? Let’s test it. We will create a schema in our Galera cluster, we’ll create a table as well and then we’ll run some queries.

root@vagrant:~# mysql -ppass
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 1326
Server version: 10.5.12-MariaDB-1:10.5.12+maria~focal-log mariadb.org binary distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> CREATE SCHEMA blog;
Query OK, 1 row affected (0.050 sec)

MariaDB [(none)]> CREATE TABLE tab (a INT NOT NULL, b VARCHAR(20));
MariaDB [blog]> USE blog;
Database changed
MariaDB [blog]> CREATE TABLE tab (a INT NOT NULL, b VARCHAR(20));
Query OK, 0 rows affected (0.180 sec)

MariaDB [blog]> INSERT INTO tab VALUES (1, 'b');
Query OK, 1 row affected (0.019 sec)

MariaDB [blog]> INSERT INTO tab VALUES (2, 'a');
Query OK, 1 row affected (0.004 sec)

Once this is done, let’s run UPDATE on the first row in the ‘tab’ table from two nodes. We will create transaction in an explicit way and see how it goes.

On node1:

MariaDB [blog]> BEGIN;
Query OK, 0 rows affected (0.002 sec)

MariaDB [blog]> UPDATE tab SET b='x' WHERE a=1;
Query OK, 1 row affected (0.002 sec)
Rows matched: 1  Changed: 1  Warnings: 0

On node2:

MariaDB [blog]> BEGIN;
Query OK, 0 rows affected (0.001 sec)

MariaDB [blog]> UPDATE tab SET b='z' WHERE a=1;
Query OK, 1 row affected (0.017 sec)
Rows matched: 1  Changed: 1  Warnings: 0

As you can see, we opened a transaction and then updated the same row on two Galera nodes. Those are two separate Galera nodes and, as such, there are no mechanisms to prevent updating the same row with different data. When we’ll hit COMMIT, though, the certification process starts and conflicting writes are going to be dealt with.

On node2:

MariaDB [blog]> COMMIT;
Query OK, 0 rows affected (0.019 sec)

On node1:

MariaDB [blog]> COMMIT;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
MariaDB [blog]> SELECT * FROM tab WHERE a=1;
+---+------+
| a | b    |
+---+------+
| 1 | z    |
+---+------+
1 row in set (0.009 sec)

As you can see, the transaction completed successfully on node2, where COMMIT was executed earlier. On node1 it resulted in “deadlock” error which is not deadlock per see – it was just a conflict in Galera that has been dealt with.

Finally, as you can see, the SELECT confirms that the UPDATE executed on node2 went through.

Galera cluster has an ability to provision nodes with data

Back in the day when Galera Cluster was introduced, provisioning of the nodes was a manual process. You had to run, ideally, xtrabackup to get the full backup of the primary node and then restore it on a replica. Configure GTID’s (or not, if GTID weren’t used) according to the position in the backup and then set up the replication manually. This was quite cumbersome. Then we got Galera, which came with two mechanisms to provision the nodes. IST – incremental state transfer for all those cases where one node went off for a short period of time and SST, state snapshot transfer, which contained a full dataset, for the cases where IST wouldn’t work properly. It was a breakthrough. Incremental state transfer was maybe less impactful – standard replication, after all, also come with an incremental state transfer of some kind: binlogs, as long as they exist on the source node, will be used to catch up. State snapshot transfer, on the other hand, was amazing. We, finally, could just set up a new node, configure it to join the cluster and that’s it. The rest was pure magic. Data would be transferred using one of the methods SST supported. Nowadays it is rsync, xtrabackup and even a clone plugin for MySQL 8.0. All happens automatically and, while it may take some time and, potentially, impact the performance of the cluster, it is still one of the best features Galera has.

Galera supports local and WAN deployments

As we mentioned, Galera cluster can be deployed within a single datacenter or it can be deployed across WAN. This is quite significant difference between Galera and MySQL Group Replication (InnoDB Cluster) which is strictly intended to be deployed within single datacenter (there is InnoDB ClusterSet that’s intended for DR but it uses asynchronous replication between DC). Galera even comes with some functionality that is intended to help dealing with multi-DC over-the-WAN deployments: segments. Galera cluster can be split into multiple segments and it will attempt to minimize the amount of data sent between different segments. Instead of all-to-all communication, as within a single segment, cross-segment communication is one-to-one, based on automatically elected relay nodes. State transfers are also prioritized within a single segment and only when there is no node that can act as IST or SST donor, data will be transferred from another segment.

In addition to segments, Galera comes also with significant tuning capabilities that can help to minimize the impact of the WAN and high latency caused by geographical distribution of the nodes. What is very important, as we mentioned a couple of times already, the fact that you can architect and build WAN-spanning Galera clusters, doesn’t mean that you should do it. Contrary, the asynchronous replication that connects Galera clusters is way more efficient and less impacting solution in majority of the cases.

This concludes our blog, we will definitely dig into Galera cluster further in the future blog posts.