Vitess – Merging the shards

In the last blog we split one shard into several smaller ones to accommodate a significant increase in load on one of the accounts. In this blog we are going to pretend that this condition no longer exists and that we want to bring the cluster back to its original shape.

If you would like to follow our steps, you can clone this repository: vitesstests

Currently we have five shards, each with two pods, a primary and a replica:

root@k8smaster:~/vitesstests# vtctlclient listalltablets | grep newsbtest
zone1-0349226440 newsbtest 55-aa primary 10.244.2.3:15000 10.244.2.3:3306 [] 2021-09-25T17:20:38Z
zone1-0778238830 newsbtest b29f900000000000-b29f900000000001 primary 10.244.3.39:15000 10.244.3.39:3306 [] 2021-09-30T13:43:05Z
zone1-1504968304 newsbtest -55 primary 10.244.2.254:15000 10.244.2.254:3306 [] 2021-09-25T08:01:41Z
zone1-1572740937 newsbtest b29f900000000001- primary 10.244.2.5:15000 10.244.2.5:3306 [] 2021-09-30T13:35:45Z
zone1-1676955594 newsbtest -55 replica 10.244.1.49:15000 10.244.1.49:3306 [] <null>
zone1-2150384904 newsbtest b29f900000000000-b29f900000000001 replica 10.244.1.51:15000 10.244.1.51:3306 [] <null>
zone1-2214114162 newsbtest aa-b29f900000000000 primary 10.244.2.6:15000 10.244.2.6:3306 [] 2021-09-30T13:35:43Z
zone1-3429635106 newsbtest aa-b29f900000000000 replica 10.244.3.38:15000 10.244.3.38:3306 [] <null>
zone1-3839184014 newsbtest b29f900000000001- replica 10.244.3.37:15000 10.244.3.37:3306 [] <null>
zone1-4162850680 newsbtest 55-aa replica 10.244.1.48:15000 10.244.1.48:3306 [] <null>
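
The tablet list is one view of the topology; vtctlclient can also report the shard layout directly from the topology server. A quick, optional cross-check (output omitted here):

vtctlclient FindAllShardsInKeyspace newsbtest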

What we did was to create a shard that contains a single problematic ID: b29f900000000000-b29f900000000001. Now we want to merge this shard back into the others, returning to a three-shard setup: -55, 55-aa and aa-. We can approach this in two ways: we can use one of the old yaml files to create equally partitioned shards, or we can stick to the custom shards and just create another shard that will contain the range aa-. Either way, we will then repartition the data and, eventually, wipe out the three shards that we recently created.

Let’s stick to the custom sharding scheme to maintain flexibility. Technically, the outcome will be exactly the same as with equally partitioned shards, but we will have all the yaml ready to add more custom shards should we need them in the future.

What has to be done is to define a shard with a range that starts at aa:

        - databaseInitScriptSecret:
            name: example-cluster-config
            key: init_db.sql
          keyRange: {
            start: "aa",
            end: ""
          }
          tabletPools:
          - cell: zone1
            type: replica
            replicas: 2
            vttablet:
              extraFlags:
                db_charset: utf8mb4
                backup_storage_implementation: file
                backup_engine_implementation: xtrabackup
                xtrabackup_root_path: /usr/bin
                xtrabackup_user: root
                xtrabackup_stripes: '8'
                restore_from_backup: 'true'
                file_backup_storage_root: /mnt/backup
              resources:
                requests:
                  cpu: 500m
                  memory: 1Gi
            mysqld:
              resources:
                requests:
                  cpu: 500m
                  memory: 1Gi
            dataVolumeClaimTemplate:
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 100Gi
            extraVolumes:
            - name: backupvol
              persistentVolumeClaim:
                claimName: "backupvol"
                accessModes: ["ReadWriteMany"]
                resources:
                  requests:
                    storage: 100Gi
                volumeName: backup
            extraVolumeMounts:
            - name: backupvol
              mountPath: /mnt

This definition is located in the file 110_merge_newsbtest.yaml, which we can apply.
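
Applying the manifest is a plain kubectl operation. A minimal sketch of rolling it out and then checking for the new tablets (assuming the yaml sits in the cloned repository directory):

root@k8smaster:~/vitesstests# kubectl apply -f 110_merge_newsbtest.yaml
root@k8smaster:~/vitesstests# kubectl get pod | grep vttablet

Then, once the new pods are created, we can run the merge reshard process: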

root@k8smaster:~/vitesstests# vtctlclient Reshard -source_shards 'aa-b29f900000000000,b29f900000000000-b29f900000000001,b29f900000000001-' -target_shards 'aa-' Create newsbtest.reshard                         

As before, we should wait until it completes; in the meantime we can check the progress of the reshard process:

root@k8smaster:~/vitesstests# vtctlclient Reshard Progress newsbtest.reshard

Copy Progress (approx):

sbtest2: rows copied 61731/1620865 (3%), size copied 14172160/370606080 (3%)
sbtest1: rows copied 410084/1621541 (25%), size copied 146489344/365363200 (40%)


Following vreplication streams are running for workflow newsbtest.reshard:

id=1 on aa-/zone1-3262256522: Status: Copying. VStream Lag: 0s.
id=2 on aa-/zone1-3262256522: Status: Running. VStream Lag: 0s.
id=3 on aa-/zone1-3262256522: Status: Copying. VStream Lag: 0s.
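
Rather than re-running the Progress command by hand, you can poll it until the copy phase finishes. A rough sketch, keying off the 'Copy Completed' line that Progress prints once the copy is done:

until vtctlclient Reshard Progress newsbtest.reshard | grep -q 'Copy Completed' ; do sleep 60 ; done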

When the process completes, as usual, we switch the traffic:

root@k8smaster:~/vitesstests# vtctlclient Reshard Progress newsbtest.reshard

Copy Completed.

Following vreplication streams are running for workflow newsbtest.reshard:

id=1 on aa-/zone1-3262256522: Status: Running. VStream Lag: 0s.
id=2 on aa-/zone1-3262256522: Status: Running. VStream Lag: 0s.
id=3 on aa-/zone1-3262256522: Status: Running. VStream Lag: 0s.
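
Before the actual switch it can also be worth comparing the copied data against the source shards. VDiff does exactly that; we did not run it here, but the invocation would look roughly like this:

vtctlclient VDiff newsbtest.reshard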

root@k8smaster:~/vitesstests# vtctlclient Reshard SwitchTraffic newsbtest.reshard
.
.
.
SwitchTraffic was successful for workflow newsbtest.reshard
Start State: Reads Not Switched. Writes Not Switched
Current State: All Reads Switched. Writes Switched
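
If anything looked wrong at this point, the traffic could still be pointed back at the source shards before any tablets are removed; the Reshard workflow has a ReverseTraffic action for that (not needed in our case):

vtctlclient Reshard ReverseTraffic newsbtest.reshard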

Then we can drain the unused tablets:

root@k8smaster:~/vitesstests# for vt in $(vtctlclient ListAllTablets | grep b29f90000000000 | awk '{print $1}') ; do pod="$(kubectl get pod | grep ${vt} | awk '{print $1}')" ; echo ${pod} ; kubectl annotate pod ${pod} drain.planetscale.com/started="Cleanup of shard merge" ; done
example-vttablet-zone1-0778238830-282e9c13
pod/example-vttablet-zone1-0778238830-282e9c13 annotated
example-vttablet-zone1-1572740937-a6805ba1
pod/example-vttablet-zone1-1572740937-a6805ba1 annotated
example-vttablet-zone1-2150384904-7dbc6918
pod/example-vttablet-zone1-2150384904-7dbc6918 annotated
example-vttablet-zone1-2214114162-a3c19b79
pod/example-vttablet-zone1-2214114162-a3c19b79 annotated
example-vttablet-zone1-3429635106-554af44a
pod/example-vttablet-zone1-3429635106-554af44a annotated
example-vttablet-zone1-3839184014-2e383f9b
pod/example-vttablet-zone1-3839184014-2e383f9b annotated
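
If you want to confirm the annotation was applied before the operator starts terminating the pods, a quick check on one of them (pod name taken from the list above, purely illustrative):

kubectl describe pod example-vttablet-zone1-3839184014-2e383f9b | grep drain.planetscale.com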

Finally, we can apply 111_cleanup_after_merge.yaml and, if needed, bounce the vttablet containers to clean up the unneeded pods.
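
A minimal sketch of this cleanup step; deleting a pod is only needed if a drained tablet lingers after the manifest is applied (the pod name below is a placeholder):

kubectl apply -f 111_cleanup_after_merge.yaml
kubectl delete pod <drained-vttablet-pod>

In our case, this time, bouncing the containers was not needed: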

root@k8smaster:~/vitesstests# kubectl get pod
NAME                                                 READY   STATUS        RESTARTS   AGE
pod/example-etcd-faf13de3-1                          1/1     Running       0          19d
pod/example-etcd-faf13de3-2                          1/1     Running       0          19d
pod/example-etcd-faf13de3-3                          1/1     Running       0          19d
pod/example-vttablet-zone1-0349226440-35dab1bc       3/3     Running       227        10d
pod/example-vttablet-zone1-0778238830-282e9c13       3/3     Terminating   1          5d6h
pod/example-vttablet-zone1-1463074389-a4c6b61f       3/3     Running       1          2d22h
pod/example-vttablet-zone1-1504968304-96f9a1bf       3/3     Running       1          10d
pod/example-vttablet-zone1-1572740937-a6805ba1       3/3     Terminating   1          5d6h
pod/example-vttablet-zone1-1676955594-dc39347b       3/3     Running       1          10d
pod/example-vttablet-zone1-2150384904-7dbc6918       3/3     Terminating   1          5d6h
pod/example-vttablet-zone1-2179083526-f3060bc1       3/3     Running       1          18d
pod/example-vttablet-zone1-2214114162-a3c19b79       3/3     Terminating   1          5d6h
pod/example-vttablet-zone1-2344898534-e9abaf0e       3/3     Running       1          19d
pod/example-vttablet-zone1-2646235096-9ba85582       3/3     Running       1          19d
pod/example-vttablet-zone1-3262256522-ea0a10a7       3/3     Running       1          2d22h
pod/example-vttablet-zone1-3429635106-554af44a       3/3     Terminating   1          5d6h
pod/example-vttablet-zone1-3839184014-2e383f9b       3/3     Terminating   1          5d6h
pod/example-vttablet-zone1-4162850680-b78f527c       3/3     Running       227        10d
pod/example-zone1-vtctld-1d4dcad0-64668cccc8-swmj4   1/1     Running       1          19d
pod/example-zone1-vtgate-bc6cde92-8665cd4df-kwgcn    1/1     Running       1          19d
pod/vitess-operator-f44545df8-l5kk9                  1/1     Running       0          19d

Everything terminated nicely and gracefully. The last step we had to perform was to recycle the used PVs (remember, we are still working on the poor man’s Kubernetes cluster):

root@k8smaster:~/vitesstests# for pv in $(kubectl get pv | grep Failed | awk '{print $1}' | cut -d / -f 2) ; do echo ${pv} ; rm -rf /storage/${pv}/ ; done
pv1
pv14
pv15
pv16
pv3
pv5
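
Removing the directories only reclaims the disk space; the PersistentVolume objects themselves remain in the Failed state. To make them usable again, they generally have to be deleted and recreated from their manifests. A sketch, assuming the PV definitions live in a local file (pv.yaml is a placeholder for whatever your setup uses):

for pv in pv1 pv3 pv5 pv14 pv15 pv16 ; do kubectl delete pv ${pv} ; done
kubectl apply -f pv.yaml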

As you can see, the merge process, much like almost every other operation on the shards, is straightforward and easy to perform in a Vitess cluster deployed on Kubernetes using the operator. If ease of scalability is something you are after, then Vitess is definitely something you should consider.

This blog concludes this part of the Vitess series. We have gone through adding replicas, adding shards, applying custom shard ranges and, finally, scaling down by merging shards. We will continue looking at Vitess; next time we will try to understand how to deal with traffic and see what options are available to optimize access and queries.