Vitess – How to migrate into Vitess? Replication

In the previous couple of blogs we have been looking at the basic scaling options in Vitess. We added replicas, sharded our data, created custom shards, and more. In this blog we would like to start a new series in which we will learn more about how the application interacts with Vitess, how to identify performance issues, how sharding may impact the application and how Vindexes may be of use in different scenarios.

First, though, we would like to take a quick look at the options we have for migrating an existing database into a Vitess cluster running on Kubernetes. There are a couple of options and we will take a look at them in upcoming blog posts.

Initial setup

What we want to start with, though, is a description of the initial setup. It is not a complex one by any means: standard MySQL 8.0 replication with one source node and two replicas, using semi-synchronous replication. Our application (sysbench TPC-C) connects to the database through ProxySQL, which tracks the state of the topology and performs a read/write split.

The data set has been created using the following command:

root@vagrant:~/sysbench-tpcc# ./tpcc.lua --mysql-host=192.168.10.121 --mysql-user=sbtest --mysql-password=sbtest --mysql-db=sbtest --time=300 --threads=8 --report-interval=1 --tables=10 --scale=10 --db-driver=mysql prepare

We cloned sysbench-tpcc from https://github.com/Percona-Lab/sysbench-tpcc

Migrating the data into Vitess

There are a couple of ways in which you can migrate the data into Vitess. We will take a look at each of them, but for now the list of options consists of:

  1. Set up replication between the production source node and Vitess
  2. Import one of the nodes from production into Vitess

What is important to keep in mind is that Vitess, under the hood, is MySQL. As you may have seen in our previous blogs, each vttablet pod runs a MySQL instance in a Docker container. We can access that MySQL instance directly and perform whatever tasks we need. Let’s start by writing down the steps we have to perform to get replication running between the production database and the Vitess cluster.
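
For example, once a vttablet pod is up, a quick way to peek at its MySQL instance could be kubectl exec. This is just a hedged one-liner; it assumes the container is called mysqld and that the MySQL socket lives under /vt/socket, which matches what we will see later in this post:

root@k8smaster:~# kubectl exec -it <vttablet-pod> -c mysqld -- mysql -S /vt/socket/mysql.sock -uroot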

Migrating to Vitess using asynchronous replication

To get the replication going we have to perform several steps:

  1. Set up the Vitess environment. We will use one keyspace with one node only. Replicas can always be added later.
  2. Get the data out of the production system. We can use XtraBackup for that. We have to take a consistent backup of the source node, along with the GTID position at which the backup was taken.
  3. Provision the Vitess node using the backup we have just taken. The challenge might be to transfer the data into the container. We can use persistent volumes for this.
  4. Set up the replication from the production source node into the Vitess cluster.

First, we are going to create the Vitess cluster using the Vitess Operator. We have prepared a 201_initial_cluster.yaml file in which we made a couple of changes. First, our production cluster runs MySQL 8.0, therefore we made sure that Vitess will use 8.0 as well:

spec:
  images:
    vtctld: vitess/lite:mysql80
    vtgate: vitess/lite:mysql80
    vttablet: vitess/lite:mysql80
    vtbackup: vitess/lite:mysql80
    mysqld:
      mysql80Compatible: vitess/lite:mysql80

We also made some changes to the MySQL configuration:

            mysqld:
              configOverrides: |
                innodb_flush_log_at_trx_commit=2
                innodb_buffer_pool_size=512M
                replicate-rewrite-db="sbtest->vt_sbtest"

We have relaxed the durability, following the production settings. We have also increased the size of the InnoDB buffer pool (we are on a tight budget here; the VM sizing is definitely not production-grade). Finally, we set up a rewrite of the schema name from ‘sbtest’ to ‘vt_sbtest’. The plan is that, eventually, we are going to set up replication from our production cluster to Vitess. This will happen at the Docker container level and, under the hood, Vitess uses the ‘vt_’ prefix for all keyspace databases. The file can be found in our GitHub repository: https://github.com/krzysztof-ksiazek/vitesstests. OK, let’s proceed and spin up the Vitess environment:

root@k8smaster:~/vitesstests# kubectl apply -f 201_initial_cluster.yaml
vitesscluster.planetscale.com/example created
secret/example-cluster-config created

After a while we can see the pods up and running (along with the persistent volumes):

root@k8smaster:~/vitesstests# kubectl get pod,pv
NAME                                                    READY   STATUS    RESTARTS   AGE
pod/vitess-operator-f44545df8-l5kk9                     1/1     Running   0          26d
pod/vitesstpcc-etcd-8f68363b-1                          1/1     Running   0          4m10s
pod/vitesstpcc-etcd-8f68363b-2                          1/1     Running   0          4m10s
pod/vitesstpcc-etcd-8f68363b-3                          1/1     Running   0          4m10s
pod/vitesstpcc-vttablet-zone1-2344898534-27a85fd2       3/3     Running   1          4m10s
pod/vitesstpcc-zone1-vtctld-f38ee748-6565d998b7-5wlkq   1/1     Running   1          4m10s
pod/vitesstpcc-zone1-vtgate-67d1e711-55c8c79d4-lz555    1/1     Running   2          4m10s

NAME                      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                                   STORAGECLASS   REASON   AGE
persistentvolume/backup   200Gi      RWX            Recycle          Bound       default/backupvol                                                               26d
persistentvolume/pv1      200Gi      RWO            Recycle          Available                                                                                   26d
persistentvolume/pv10     200Gi      RWO            Recycle          Available                                                                                   26d
persistentvolume/pv11     200Gi      RWO            Recycle          Bound       default/vitesstpcc-etcd-8f68363b-2                                              26d
persistentvolume/pv12     200Gi      RWO            Recycle          Available                                                                                   26d
persistentvolume/pv13     200Gi      RWO            Recycle          Available                                                                                   26d
persistentvolume/pv14     200Gi      RWO            Recycle          Bound       default/vitesstpcc-vttablet-zone1-2344898534-27a85fd2                           26d
persistentvolume/pv15     200Gi      RWO            Recycle          Available                                                                                   26d
persistentvolume/pv16     200Gi      RWO            Recycle          Available                                                                                   26d
persistentvolume/pv17     200Gi      RWO            Recycle          Available                                                                                   26d
persistentvolume/pv18     200Gi      RWO            Recycle          Available                                                                                   26d
persistentvolume/pv19     200Gi      RWO            Recycle          Available                                                                                   26d
persistentvolume/pv2      200Gi      RWO            Recycle          Available                                                                                   26d
persistentvolume/pv20     200Gi      RWO            Recycle          Available                                                                                   26d
persistentvolume/pv3      200Gi      RWO            Recycle          Available                                                                                   26d
persistentvolume/pv4      200Gi      RWO            Recycle          Available                                                                                   26d
persistentvolume/pv5      200Gi      RWO            Recycle          Available                                                                                   26d
persistentvolume/pv6      200Gi      RWO            Recycle          Bound       default/vitesstpcc-etcd-8f68363b-3                                              26d
persistentvolume/pv7      200Gi      RWO            Recycle          Available                                                                                   26d
persistentvolume/pv8      200Gi      RWO            Recycle          Bound       default/vitesstpcc-etcd-8f68363b-1                                              26d
persistentvolume/pv9      200Gi      RWO            Recycle          Available                                                                                   26d

Now it is time to think about how to migrate the data. There are several ways in which we can approach this. We could use XtraBackup to copy the data, but the problem with this approach is that we would overwrite all of the data, including whatever Vitess has stored on the instance; all system schemas would be replaced. This is not a blocker, as we could use mysqldump to preserve the state of the MySQL database in Vitess before restoring the XtraBackup, but we decided on another approach. We can use mydumper/myloader (https://github.com/maxbube/mydumper) to take a logical backup of the production database, with a twist: instead of dumping everything, we dump only the data we care about and skip the system schemas, while still benefiting from a consistent state of the dumped data. Let’s proceed with this plan:

root@k8smaster:/storage/backup# mkdir dumper
root@k8smaster:/storage/backup# cd dumper/
root@k8smaster:/storage/backup/dumper# mydumper -u sbtest -p sbtest -h 192.168.10.122 -v 3 -B sbtest -G -E -R -t 2
** Message: 20:33:26.391: Server version reported as: 8.0.26
** Message: 20:33:26.391: Connected to a MySQL server
** Message: 20:33:26.595: Started dump at: 2021-10-13 20:33:26
** Message: 20:33:26.596: Written master status
** Message: 20:33:26.610: Thread 1 connected using MySQL connection ID 55635
** Message: 20:33:26.616: Thread 2 connected using MySQL connection ID 55636
** Message: 20:33:26.618: Thread 1 dumping db information for `sbtest`
** Message: 20:33:26.618: Thread 2 dumping schema create for `sbtest`
** Message: 20:33:26.689: Thread 2 dumping data for `sbtest`.`customer1`
** Message: 20:33:26.691: Thread 1 dumping data for `sbtest`.`customer10`
.
.
.
** Message: 20:36:53.909: Thread 1 shutting down
** Message: 20:36:53.996: Thread 2 shutting down
** Message: 20:36:53.997: Finished dump at: 2021-10-13 20:36:53

Once mydumper completes its run, we can move forward. Please note that we stored the data on the /storage/backup volume. This is a volume that we mount on all vttablets to store backup data, therefore it will be accessible on every vttablet that we spin up.
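
For reference, a shared ReadWriteMany volume like the ‘backup’ one visible in the kubectl get pv output above could be defined roughly as follows. This is only a sketch assuming an NFS backend; the server address and export path are hypothetical and the actual definition in your environment may differ:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: backup
spec:
  capacity:
    storage: 200Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    server: 192.168.10.100   # hypothetical NFS server
    path: /storage/backup    # export also mounted on the Kubernetes master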

As we are going to configure replication, we have to create a replication user on the production cluster. We will use this user to connect from the Docker container to the source node in production.

mysql> CREATE USER vitess_repl@'%' IDENTIFIED BY '53cr3tp@$$';
Query OK, 0 rows affected (0.24 sec)
mysql> GRANT REPLICATION SLAVE ON *.* TO vitess_repl@'%';
Query OK, 0 rows affected (0.08 sec)

As a next step we should find the IP address of the MySQL instance in Vitess. We have only one vttablet running, so we can easily check the pod name:

root@k8smaster:/storage/backup/dumper# kubectl get pod | grep vttablet
vitesstpcc-vttablet-zone1-2344898534-27a85fd2       3/3     Running     0          11m

Then we can use describe to find the IP of the pod:

root@k8smaster:/storage/backup/dumper# kubectl describe pod vitesstpcc-vttablet-zone1-2344898534-27a85fd2 | grep IP
IP:                   10.244.2.30
IPs:
  IP:           10.244.2.30
      --tablet_hostname=$(POD_IP)
      POD_IP:          (v1:status.podIP)
      POD_IP:          (v1:status.podIP)
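
Alternatively, a single command shows both the pod IP and the node the pod has been scheduled on:

root@k8smaster:/storage/backup/dumper# kubectl get pod vitesstpcc-vttablet-zone1-2344898534-27a85fd2 -o wide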

Next, we need to know on which Kubernetes node this pod has been scheduled; ‘kubectl describe nodes’ (or the -o wide output above) will tell us. We will need this information to log into the MySQL instance and prepare it to act as a replica. Once we log in to that node, we have to find the proper Docker container. There are four containers per vttablet pod in Vitess:

root@k8snode2:~# docker ps | grep vttablet
8f2bc00a9f1f   e80442e91b90                  "/bin/mysqld_exporte…"   5 minutes ago   Up 5 minutes             k8s_mysqld-exporter_vitesstpcc-vttablet-zone1-2344898534-27a85fd2_default_f24e71d9-8047-4cb3-8f8a-fb3af494529e_0
32cfdc48165b   37c0bee9615c                  "/vt/bin/mysqlctld -…"   5 minutes ago   Up 5 minutes             k8s_mysqld_vitesstpcc-vttablet-zone1-2344898534-27a85fd2_default_f24e71d9-8047-4cb3-8f8a-fb3af494529e_0
d30033640652   37c0bee9615c                  "/vt/bin/vttablet --…"   5 minutes ago   Up 5 minutes             k8s_vttablet_vitesstpcc-vttablet-zone1-2344898534-27a85fd2_default_f24e71d9-8047-4cb3-8f8a-fb3af494529e_0
3f1091b3c417   k8s.gcr.io/pause:3.2          "/pause"                 5 minutes ago   Up 5 minutes             k8s_POD_vitesstpcc-vttablet-zone1-2344898534-27a85fd2_default_f24e71d9-8047-4cb3-8f8a-fb3af494529e_0

The database runs in the container that has been started through /vt/bin/mysqlctld. We can log into it and create the user we will use to load the data into Vitess:

root@k8snode2:~# docker exec -it 32cfdc48165b /bin/bash

vitess@vitesstpcc-vttablet-zone1-2344898534-27a85fd2:/$ mysql -S /vt/socket/mysql.sock -uroot
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 147
Server version: 8.0.23 MySQL Community Server - GPL

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE USER migrate@'%' IDENTIFIED BY 'pass';
Query OK, 0 rows affected (0.13 sec)

mysql> GRANT ALL ON *.* TO migrate@'%';
Query OK, 0 rows affected (0.15 sec)

With the user created we are ready to load the data:

root@k8smaster:~# myloader -d /storage/backup/dumper/export-20211013-203326/ -h 10.244.2.30 -P 3306 -u migrate -p pass  -t 2 -v 2 -B vt_sbtest

It may take a while, but eventually the data import will complete. We are almost done.
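
Before moving on, a quick sanity check could be to compare row counts with production. The table name below follows the sysbench TPC-C schema and is only an example:

root@k8smaster:~# mysql -h 10.244.2.30 -P 3306 -u migrate -ppass vt_sbtest -e "SELECT COUNT(*) FROM warehouse1"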

Setting up asynchronous replication

The last step will be to set up replication from the source node to the Vitess cluster. Mydumper stores the replication coordinates in its metadata file:

root@k8smaster:~/vitesstests# cat /storage/backup/dumper/export-20211013-203326/metadata
Started dump at: 2021-10-13 20:33:26
SHOW MASTER STATUS:
        Log: binlog.000032
        Pos: 545510639
        GTID:0ef24fde-2b5b-11ec-8432-0800277360cf:1-231342,
cdde527a-2b5a-11ec-8401-0800277360cf:1-2920655

Finished dump at: 2021-10-13 20:36:53

Those GTID positions are what we need. They represent the transactions that had already been executed at the time of the backup, so we should pass them to the ‘gtid_purged’ variable:

root@k8snode2:~# docker exec -it 32cfdc48165b /bin/bash
vitess@vitesstpcc-vttablet-zone1-2344898534-27a85fd2:/$ mysql -S /vt/socket/mysql.sock -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 32276
Server version: 8.0.23 MySQL Community Server - GPL

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> SHOW VARIABLES LIKE 'gtid_purged';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| gtid_purged   |       |
+---------------+-------+
1 row in set (0.00 sec)

mysql> SET GLOBAL gtid_purged='0ef24fde-2b5b-11ec-8432-0800277360cf:1-231342,cdde527a-2b5a-11ec-8401-0800277360cf:1-2920655';
Query OK, 0 rows affected (0.04 sec)

Then, finally, we can set up the replication and start it:

mysql> CHANGE MASTER TO MASTER_HOST='192.168.10.122', MASTER_USER='vitess_repl', MASTER_PASSWORD='53cr3tp@$$', MASTER_AUTO_POSITION=1;
Query OK, 0 rows affected, 7 warnings (0.76 sec)
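
As a side note, the warnings above are mostly about the deprecated MASTER_* terminology (plus the usual plain-text password notices); on MySQL 8.0.23 and later the same configuration could also be written with the newer syntax:

mysql> CHANGE REPLICATION SOURCE TO SOURCE_HOST='192.168.10.122', SOURCE_USER='vitess_repl', SOURCE_PASSWORD='53cr3tp@$$', SOURCE_AUTO_POSITION=1;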

mysql> START REPLICA;
Query OK, 0 rows affected (0.02 sec)

mysql> SHOW REPLICA STATUS\G
*************************** 1. row ***************************
             Replica_IO_State: Queueing master event to the relay log
                  Source_Host: 192.168.10.122
                  Source_User: vitess_repl
                  Source_Port: 3306
                Connect_Retry: 60
              Source_Log_File: binlog.000032
          Read_Source_Log_Pos: 761236648
               Relay_Log_File: vt-2344898534-relay-bin.000004
                Relay_Log_Pos: 208235412
        Relay_Source_Log_File: binlog.000032
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Source_Log_Pos: 761233968
              Relay_Log_Space: 215733910
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Source_SSL_Allowed: No
           Source_SSL_CA_File:
           Source_SSL_CA_Path:
              Source_SSL_Cert:
            Source_SSL_Cipher:
               Source_SSL_Key:
        Seconds_Behind_Source: 167705
Source_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Source_Server_Id: 1002
                  Source_UUID: 0ef24fde-2b5b-11ec-8432-0800277360cf
             Source_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
    Replica_SQL_Running_State: waiting for handler commit
           Source_Retry_Count: 86400
                  Source_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Source_SSL_Crl:
           Source_SSL_Crlpath:
           Retrieved_Gtid_Set: 0ef24fde-2b5b-11ec-8432-0800277360cf:231343-262043
            Executed_Gtid_Set: 0ef24fde-2b5b-11ec-8432-0800277360cf:1-262042,
412c5fe6-2c25-11ec-bf4b-1238388dd764:1-100,
88db780d-2c24-11ec-ba50-1238388dd764:1-35,
cdde527a-2b5a-11ec-8401-0800277360cf:1-2920655
                Auto_Position: 1
         Replicate_Rewrite_DB: (sbtest,vt_sbtest)
                 Channel_Name:
           Source_TLS_Version:
       Source_public_key_path:
        Get_Source_public_key: 0
            Network_Namespace:
1 row in set (0.00 sec)

Now, the only thing left is to wait till the Vitess replica fully catches up to the production cluster. It will happen eventually:

mysql> \P grep Seconds_Behind_Source
PAGER set to 'grep Seconds_Behind_Source'
mysql> SHOW REPLICA STATUS\G
        Seconds_Behind_Source: 0
1 row in set (0.00 sec)
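
If you prefer to check the lag from the Kubernetes host instead of from inside the container, the same kubectl exec approach sketched earlier should work, again assuming the mysqld container name and the socket path:

root@k8smaster:~# kubectl exec vitesstpcc-vttablet-zone1-2344898534-27a85fd2 -c mysqld -- mysql -S /vt/socket/mysql.sock -uroot -e "SHOW REPLICA STATUS\G" | grep Seconds_Behind_Source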

Let’s finish this process by scaling out the Vitess cluster and adding two more replicas to match the production setup. This can easily be accomplished by changing the number of replicas in the YAML file. Before we do that, to make sure that the scale-out process won’t require replicating every transaction from the primary node, let’s take a backup first. It will be used by Vitess to provision the new replicas.

First, check the alias:

root@k8smaster:~/vitesstests# vtctlclient ListAllTablets
zone1-2344898534 sbtest - primary 10.244.2.30:15000 10.244.2.30:3306 [] 2021-10-13T12:55:13Z

Then, run the backup:

root@k8smaster:~/vitesstests# vtctlclient Backup -allow_primary=true zone1-2344898534
I1016 19:26:34.475760 1861872 main.go:67] I1016 19:26:34.748809 backup.go:181] I1016 19:26:34.555361 builtinbackupengine.go:141] Hook: , Compress: true
I1016 19:26:34.476776 1861872 main.go:67] I1016 19:26:34.750250 backup.go:181] I1016 19:26:34.556872 builtinbackupengine.go:151] getting current replication status
I1016 19:26:34.485464 1861872 main.go:67] I1016 19:26:34.758901 backup.go:181] I1016 19:26:34.565358 builtinbackupengine.go:192] using replication position: 0ef24fde-2b5b-11ec-8432-0800277360cf:1-1837612,412c5fe6-2c25-11ec-bf4b-1238388dd764:1-100,88db780d-2c24-11ec-ba50-1238388dd764:1-35,cdde527a-2b5a-11ec-8401-0800277360cf:1-2920655
I1016 19:26:43.592255 1861872 main.go:67] I1016 19:26:43.864576 backup.go:181] I1016 19:26:43.670162 builtinbackupengine.go:287] found 219 files to backup
.
.
.
I1016 19:28:21.399789 1861872 main.go:67] I1016 19:28:21.673220 backup.go:181] I1016 19:28:21.479700 builtinbackupengine.go:213] resetting mysqld read-only to false
I1016 19:28:21.400508 1861872 main.go:67] I1016 19:28:21.673960 backup.go:181] I1016 19:28:21.480595 builtinbackupengine.go:230] restarting mysql replication
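
To double-check that the backup actually made it to the backup storage, vtctlclient should be able to list it; the shard name ‘-’ matches what ListAllTablets printed above:

root@k8smaster:~/vitesstests# vtctlclient ListBackups sbtest/-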

Now we can scale out the cluster. We can apply the 202_added_replicas.yaml file from our repo to accomplish that:

root@k8smaster:~/vitesstests# kubectl apply -f 202_added_replicas.yaml
vitesscluster.planetscale.com/vitesstpcc configured
secret/example-cluster-config configured
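
For reference, the relevant change in 202_added_replicas.yaml essentially boils down to bumping the replica count of the tablet pool, roughly like this (a trimmed sketch with other fields omitted; see the repository for the full file):

        tabletPools:
          - cell: zone1
            type: replica
            replicas: 3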

We can immediately see two more pods spinning up.

root@k8smaster:~/vitesstests# kubectl get pods
NAME                                                READY   STATUS      RESTARTS   AGE
vitess-operator-f44545df8-l5kk9                     1/1     Running     0          30d
vitesstpcc-etcd-8f68363b-1                          1/1     Running     0          3d6h
vitesstpcc-etcd-8f68363b-2                          1/1     Running     0          3d6h
vitesstpcc-etcd-8f68363b-3                          1/1     Running     0          3d6h
vitesstpcc-vttablet-zone1-2179083526-85a44b42       2/3     Running     2          2m53s
vitesstpcc-vttablet-zone1-2344898534-27a85fd2       3/3     Running     0          2d22h
vitesstpcc-vttablet-zone1-2646235096-acd452ac       2/3     Running     2          2m53s
vitesstpcc-zone1-vtctld-f38ee748-7b75487874-qsnc2   1/1     Running     2          3d6h
vitesstpcc-zone1-vtgate-67d1e711-7b8556d549-sx9tf   1/1     Running     3          3d6h

After a short moment you should see all three vttablets running and serving:

root@k8smaster:~/vitesstests# mysql -e "SHOW vitess_tablets"
+-------+----------+-------+------------+---------+------------------+--------------+----------------------+
| Cell  | Keyspace | Shard | TabletType | State   | Alias            | Hostname     | PrimaryTermStartTime |
+-------+----------+-------+------------+---------+------------------+--------------+----------------------+
| zone1 | sbtest   | -     | PRIMARY    | SERVING | zone1-2344898534 | 10.244.2.30  | 2021-10-16T19:28:21Z |
| zone1 | sbtest   | -     | REPLICA    | SERVING | zone1-2646235096 | 10.244.1.103 |                      |
| zone1 | sbtest   | -     | REPLICA    | SERVING | zone1-2179083526 | 10.244.3.92  |                      |
+-------+----------+-------+------------+---------+------------------+--------------+----------------------+

This pretty much concludes the data migration process. We still have to switch the traffic over to the Vitess cluster; how to do that is a separate story and depends on the environment you have. Of course, you have to run tests to make sure everything works just fine. It should, in most cases: a single-shard setup is pretty much a standard source-replica topology, with the only difference being that we connect via VTGate instead of some other proxy or load balancer. The switchover process should also be pretty much the same as if you had a separate cluster replicating off your production setup. In our case it could start with adding the Vitess VTGate to ProxySQL and redirecting traffic to it (a minimal ProxySQL sketch follows after the cleanup commands below). How to perform a proper switchover is a topic for a separate blog, but the idea is to start slowly, with a small percentage of reads, steadily increase the workload and, eventually, redirect the writes as well.

After the switchover is done, we can stop the old, non-Vitess production setup and clean up the replication configuration on our Vitess primary vttablet:

mysql> STOP REPLICA;
Query OK, 0 rows affected (0.01 sec)
mysql> RESET REPLICA ALL;
Query OK, 0 rows affected (0.02 sec)
mysql> SHOW REPLICA STATUS\G
Empty set (0.00 sec)
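
As for the gradual traffic switch mentioned above, a minimal ProxySQL sketch could look like the following. The reader hostgroup number (20) and the VTGate address are placeholders for illustration; giving VTGate a small weight relative to the existing readers is what lets it start with only a small share of the reads:

Admin> INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight) VALUES (20, '<vtgate-host>', 3306, 10);
Admin> LOAD MYSQL SERVERS TO RUNTIME;
Admin> SAVE MYSQL SERVERS TO DISK;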

In our case, though, let’s delete the Vitess cluster; we will continue in another blog, showing an alternative way to migrate your data into Vitess.

root@k8smaster:~/vitesstests# kubectl delete -f 202_added_replicas.yaml
vitesscluster.planetscale.com "vitesstpcc" deleted
secret "example-cluster-config" deleted

root@k8smaster:~/vitesstests# kubectl get pods
NAME                              READY   STATUS      RESTARTS   AGE
vitess-operator-f44545df8-l5kk9   1/1     Running     0          30d

After a short while we should see only the operator pod (and maybe some recycler pods as well).