In previous blog posts we have looked at how to migrate a TPCC-like schema created using SysBench. Now that we have it in our Vitess cluster, we would like to take it one step further and see how it can be sharded.
Analysing the schema and queries
The first step is to analyze the schema and the queries that are executed against it. We are looking for columns that appear in the majority of the queries and that could be used to split the data into shards.
What we can definitely say is that the warehouse ID is quite commonly used across the tables. With the exception of the “item” tables, all other tables have a reference to the warehouse ID as the first column in their primary keys. This sounds like something we can use. On top of that, analysis of the slow query log also shows that some kind of reference to the warehouse table is used in almost every query.
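To double-check that observation, one quick way is to ask MySQL which column leads the primary key of each table. The query below against information_schema is our own illustration (run it directly against one of the MySQL instances, or through vtgate if your version handles system-schema routing); it is not part of the SysBench tooling:

-- List the leading primary key column of every table in the current schema.
-- For the TPCC-like schema this should return the warehouse ID column
-- (w_id, d_w_id, c_w_id, s_w_id, ...) for everything except the item tables.
SELECT TABLE_NAME, COLUMN_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = DATABASE()
  AND CONSTRAINT_NAME = 'PRIMARY'
  AND ORDINAL_POSITION = 1
ORDER BY TABLE_NAME;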
The “item” tables do not reference any other table; there are no foreign keys or columns that appear to link to other tables:
mysql> SHOW CREATE TABLE item1\G
*************************** 1. row ***************************
Table: item1
Create Table: CREATE TABLE `item1` (
`i_id` int NOT NULL,
`i_im_id` int DEFAULT NULL,
`i_name` varchar(24) DEFAULT NULL,
`i_price` decimal(5,2) DEFAULT NULL,
`i_data` varchar(50) DEFAULT NULL,
PRIMARY KEY (`i_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.01 sec)
Unfortunately, each of them is referenced as a parent table by a foreign key in the corresponding “stock” table:
mysql> SHOW CREATE TABLE stock1\G
*************************** 1. row ***************************
Table: stock1
Create Table: CREATE TABLE `stock1` (
`s_i_id` int NOT NULL,
`s_w_id` smallint NOT NULL,
`s_quantity` smallint DEFAULT NULL,
`s_dist_01` char(24) DEFAULT NULL,
`s_dist_02` char(24) DEFAULT NULL,
`s_dist_03` char(24) DEFAULT NULL,
`s_dist_04` char(24) DEFAULT NULL,
`s_dist_05` char(24) DEFAULT NULL,
`s_dist_06` char(24) DEFAULT NULL,
`s_dist_07` char(24) DEFAULT NULL,
`s_dist_08` char(24) DEFAULT NULL,
`s_dist_09` char(24) DEFAULT NULL,
`s_dist_10` char(24) DEFAULT NULL,
`s_ytd` decimal(8,0) DEFAULT NULL,
`s_order_cnt` smallint DEFAULT NULL,
`s_remote_cnt` smallint DEFAULT NULL,
`s_data` varchar(50) DEFAULT NULL,
PRIMARY KEY (`s_w_id`,`s_i_id`),
KEY `fkey_stock_21` (`s_i_id`),
CONSTRAINT `fkey_stock_1_1` FOREIGN KEY (`s_w_id`) REFERENCES `warehouse1` (`w_id`),
CONSTRAINT `fkey_stock_2_1` FOREIGN KEY (`s_i_id`) REFERENCES `item1` (`i_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.04 sec)
The idea behind this foreign key is to make sure that an entry in the “item” table cannot be deleted as long as it is referenced from the “stock” table. This leads us to the following design options. The first, which makes the most sense, is to modify our application to enforce this constraint on the application side, not through foreign keys in the database. This would allow us to store the “item” tables in a separate keyspace. Alternatively, we could colocate the “item” tables with every shard, satisfying the foreign key constraint, but this would also require some changes in the application – any change to the “item” tables would have to be repeated on every shard.
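For illustration, enforcing the constraint in the application could be as simple as checking for child rows before issuing a delete. This is a hedged sketch, not code from our application; the literal ID is a placeholder:

-- Hypothetical app-side replacement for the fkey_stock_2_1 constraint:
-- count the stock rows that still reference the item...
SELECT COUNT(*) AS refs FROM stock1 WHERE s_i_id = 1234;
-- ...and let the application issue the delete only when refs = 0
DELETE FROM item1 WHERE i_id = 1234;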
In our case we’ll proceed with the first option. We are not going to change the application for this, because we have verified that the TPCC workload executes no deletes or inserts against the “item” tables, so we are fine on that front.
To sum it up, we are going to remove the foreign keys in the “stock” tables that reference the “item” tables and then move the data around. The “item” tables will be stored in an unsharded “items” keyspace, while all other tables will be sharded using the warehouse ID as the sharding key. To get there, we first have to create the new keyspace and migrate the “item” tables into it.
Adding keyspace for “items” tables
We add the definition of the “items” keyspace to the cluster specification:
  - name: items
    turndownPolicy: Immediate
    partitionings:
      - equal:
          parts: 1
          shardTemplate:
            databaseInitScriptSecret:
              name: example-cluster-config
              key: init_db.sql
            replication:
              enforceSemiSync: false
            tabletPools:
              - cell: zone1
                type: replica
                replicas: 3
                vttablet:
                  extraFlags:
                    db_charset: utf8mb4
                    backup_storage_implementation: file
                    backup_engine_implementation: xtrabackup
                    xtrabackup_root_path: /usr/bin
                    xtrabackup_user: root
                    xtrabackup_stripes: "8"
                    xtrabackup_stream_mode: "xbstream"
                    restore_from_backup: 'true'
                    file_backup_storage_root: /mnt/backup
                  resources:
                    requests:
                      cpu: 1
                      memory: 2Gi
                    limits:
                      cpu: 1
                      memory: 2Gi
                mysqld:
                  configOverrides: |
                    innodb_flush_log_at_trx_commit=2
                    innodb_buffer_pool_size=1024M
                  resources:
                    requests:
                      cpu: 1
                      memory: 2Gi
                    limits:
                      cpu: 1
                      memory: 2Gi
                dataVolumeClaimTemplate:
                  accessModes: ["ReadWriteOnce"]
                  resources:
                    requests:
                      storage: 100Gi
                extraVolumes:
                  - name: backupvol
                    persistentVolumeClaim:
                      claimName: "backupvol"
                      accessModes: ["ReadWriteMany"]
                      resources:
                        requests:
                          storage: 100Gi
                      volumeName: backup
                extraVolumeMounts:
                  - name: backupvol
                    mountPath: /mnt
It has been added to the 207_items_keyspace_added.yaml file in our GitHub repository.
Let’s apply this cluster definition file; it will add three vttablets that will host the three replicas of the “items” keyspace.
root@k8smaster:~/vitesstests# kubectl apply -f 207_items_keyspace_added.yaml
vitesscluster.planetscale.com/vitesstpcc configured
secret/example-cluster-config configured
After a short while you should see the new pods spun up:
root@k8smaster:~/vitesstests# kubectl get pods
NAME                                                READY   STATUS      RESTARTS   AGE
recycler-for-pv17                                   0/1     Completed   0          4d18h
recycler-for-pv5                                    0/1     Completed   0          4d18h
vitess-operator-7ccd86b994-ctj7c                    1/1     Running     1          23d
vitesstpcc-etcd-8f68363b-1                          1/1     Running     0          4d18h
vitesstpcc-etcd-8f68363b-2                          1/1     Running     0          4d18h
vitesstpcc-etcd-8f68363b-3                          1/1     Running     0          4d18h
vitesstpcc-vttablet-zone1-1817996704-58f13a5a       3/3     Running     0          4d18h
vitesstpcc-vttablet-zone1-2086446713-08d0f189       3/3     Running     1          95s
vitesstpcc-vttablet-zone1-2706617079-bc43076c       3/3     Running     1          95s
vitesstpcc-vttablet-zone1-3151213148-0fcde563       3/3     Running     1          95s
vitesstpcc-zone1-vtctld-f38ee748-5645c7d7cd-6cbkf   1/1     Running     0          4d18h
vitesstpcc-zone1-vtgate-67d1e711-94bfcb98f-64x7v    1/1     Running     0          4d18h
Deleting foreign key from “stock” tables
We now have to proceed with a schema change on the “stock” tables that will remove the foreign key referencing the “item” tables. There are several methods by which a schema change can be applied in Vitess, but the fact that foreign keys are used all over the schema limits our options quite significantly. Theoretically speaking, we can use pt-online-schema-change or direct alters. pt-online-schema-change is potentially problematic, which is why, given that our data set is not that large, we decided to go with the direct method:
root@k8smaster:~/vitesstests# for i in $(seq 1 10) ; do echo ${i} ; vtctlclient ApplySchema -allow_long_unavailability -ddl_strategy=direct -sql="ALTER TABLE stock${i} DROP FOREIGN KEY fkey_stock_2_${i}" newsbtest ; done
1
W1128 10:09:47.130558 2628832 main.go:67] W1128 10:09:47.056878 tablet_executor.go:134] Processing big schema change. This may cause visible MySQL downtime.
I1128 10:10:09.4905351 2628832 main.go:67] I1128 10:10:09.572576 tablet_executor.go:277] Received DDL request. strategy=direct
2
W1128 10:09:58.278597 2628912 main.go:67] W1128 10:09:58.230125 tablet_executor.go:134] Processing big schema change. This may cause visible MySQL downtime.
I1128 10:10:09.605618 2628912 main.go:67] I1128 10:10:09.560197 tablet_executor.go:277] Received DDL request. strategy=direct
3
W1128 10:10:09.846294 2628997 main.go:67] W1128 10:10:09.800549 tablet_executor.go:134] Processing big schema change. This may cause visible MySQL downtime.
I1128 10:10:17.432510 2628997 main.go:67] I1128 10:10:17.387662 tablet_executor.go:277] Received DDL request. strategy=direct
4
W1128 10:10:17.659991 2629073 main.go:67] W1128 10:10:17.615323 tablet_executor.go:134] Processing big schema change. This may cause visible MySQL downtime.
I1128 10:10:28.775981 2629073 main.go:67] I1128 10:10:28.730931 tablet_executor.go:277] Received DDL request. strategy=direct
5
W1128 10:10:29.245889 2629151 main.go:67] W1128 10:10:29.200017 tablet_executor.go:134] Processing big schema change. This may cause visible MySQL downtime.
I1128 10:10:40.790398 2629151 main.go:67] I1128 10:10:40.744633 tablet_executor.go:277] Received DDL request. strategy=direct
6
W1128 10:10:41.070807 2629235 main.go:67] W1128 10:10:41.024154 tablet_executor.go:134] Processing big schema change. This may cause visible MySQL downtime.
I1128 10:10:50.207777 2629235 main.go:67] I1128 10:10:50.159749 tablet_executor.go:277] Received DDL request. strategy=direct
7
W1128 10:10:50.549740 2629318 main.go:67] W1128 10:10:50.503366 tablet_executor.go:134] Processing big schema change. This may cause visible MySQL downtime.
I1128 10:11:03.677084 2629318 main.go:67] I1128 10:11:03.633118 tablet_executor.go:277] Received DDL request. strategy=direct
8
W1128 10:11:04.266861 2629437 main.go:67] W1128 10:11:04.218850 tablet_executor.go:134] Processing big schema change. This may cause visible MySQL downtime.
I1128 10:11:15.210185 2629437 main.go:67] I1128 10:11:15.164540 tablet_executor.go:277] Received DDL request. strategy=direct
9
W1128 10:11:15.885737 2629517 main.go:67] W1128 10:11:15.806143 tablet_executor.go:134] Processing big schema change. This may cause visible MySQL downtime.
I1128 10:11:25.754979 2629517 main.go:67] I1128 10:11:25.710539 tablet_executor.go:277] Received DDL request. strategy=direct
10
W1128 10:11:27.375892 2629612 main.go:67] W1128 10:11:27.321413 tablet_executor.go:134] Processing big schema change. This may cause visible MySQL downtime.
I1128 10:11:37.892553 2629612 main.go:67] I1128 10:11:37.836899 tablet_executor.go:277] Received DDL request. strategy=direct
All went well; we can verify that the foreign keys have been removed:
mysql> show create table stock1\G
*************************** 1. row ***************************
Table: stock1
Create Table: CREATE TABLE `stock1` (
`s_i_id` int NOT NULL,
`s_w_id` smallint NOT NULL,
`s_quantity` smallint DEFAULT NULL,
`s_dist_01` char(24) DEFAULT NULL,
`s_dist_02` char(24) DEFAULT NULL,
`s_dist_03` char(24) DEFAULT NULL,
`s_dist_04` char(24) DEFAULT NULL,
`s_dist_05` char(24) DEFAULT NULL,
`s_dist_06` char(24) DEFAULT NULL,
`s_dist_07` char(24) DEFAULT NULL,
`s_dist_08` char(24) DEFAULT NULL,
`s_dist_09` char(24) DEFAULT NULL,
`s_dist_10` char(24) DEFAULT NULL,
`s_ytd` decimal(8,0) DEFAULT NULL,
`s_order_cnt` smallint DEFAULT NULL,
`s_remote_cnt` smallint DEFAULT NULL,
`s_data` varchar(50) DEFAULT NULL,
PRIMARY KEY (`s_w_id`,`s_i_id`),
KEY `fkey_stock_21` (`s_i_id`),
CONSTRAINT `fkey_stock_1_1` FOREIGN KEY (`s_w_id`) REFERENCES `warehouse1` (`w_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.04 sec)
Migrating “item” tables to a new keyspace
Now is the time to move the item tables to the new keyspace that we have just created.
root@k8smaster:~/vitesstests# vtctlclient MoveTables -source newsbtest -tables 'item1,item2,item3,item4,item5,item6,item7,item8,item9,item10' Create items.item_table_migration
Waiting for workflow to start: 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ...
E1128 09:18:57.919180 2606427 main.go:67] E1128 09:18:57.873397 vtctl.go:2554] workflow did not start within 30s
MoveTables Error: rpc error: code = Unknown desc = workflow did not start within 30s
E1128 09:18:57.919295 2606427 main.go:76] remote error: rpc error: code = Unknown desc = workflow did not start within 30s
The error, for some reason, is quite common and does not necessarily mean the process is not working. We can verify the progress:
root@k8smaster:~/vitesstests/tpcc_vschema# vtctlclient MoveTables Progress items.item_table_migration

Copy Progress (approx):

item6: rows copied 0/99601 (0%), size copied 16384/11026432 (0%)
item7: rows copied 0/99510 (0%), size copied 16384/11026432 (0%)
item8: rows copied 0/99480 (0%), size copied 16384/11026432 (0%)
item9: rows copied 0/99601 (0%), size copied 16384/11026432 (0%)
item4: rows copied 0/99510 (0%), size copied 16384/11026432 (0%)
item5: rows copied 0/99601 (0%), size copied 16384/11026432 (0%)

Following vreplication streams are running for workflow items.item_table_migration:

id=1 on -/zone1-2706617079: Status: Copying. VStream Lag: -1s. Tx time: Sun Nov 28 09:19:44 2021.
As you can see, the status is “Copying”, which means the data is being transferred. Eventually you should see an indication that the process has completed:
root@k8smaster:~/vitesstests/tpcc_vschema# vtctlclient MoveTables Progress items.item_table_migration

Copy Completed.

Following vreplication streams are running for workflow items.item_table_migration:

id=1 on -/zone1-2706617079: Status: Running. VStream Lag: -1s. Tx time: Sun Nov 28 09:22:26 2021.
This is also a good time to verify whether there are any incompatibilities between our design and the application. We went through the queries executed by our application and one turned out to be problematic:
SELECT i_price, i_name, i_data
FROM item3
WHERE i_id = 38476
The reason is that we have moved the ‘item’ tables into a separate keyspace. We had to modify the application to explicitly reference the ‘items’ keyspace:
SELECT i_price, i_name, i_data
FROM items.item3
WHERE i_id = 38476
Now, we should move the traffic to the new keyspace and then finalize the workflow by running:
root@k8smaster:~/vitesstests# vtctlclient MoveTables SwitchTraffic items.item_table_migration
I1128 09:24:39.515275 2609310 main.go:67] I1128 09:24:39.470227 traffic_switcher.go:442] Built switching metadata: &{migrationType:0 wr:0xc000da0b40 workflow:item_table_migration frozen:false reverseWorkflow:item_table_migration_reverse id:1242262565986086939 sources:map[-:0xc00036e8a0] targets:map[-:0xc000dceb10] sourceKeyspace:newsbtest targetKeyspace:items tables:[item1 item10 item2 item3 item4 item5 item6 item7 item8 item9] sourceKSSchema:0xc000339e00 optCells: optTabletTypes:primary,replica,rdonly externalCluster: externalTopo:<nil>}
I1128 09:24:39.698218 2609310 main.go:67] I1128 09:24:39.653296 traffic_switcher.go:480] No previous journals were found. Proceeding normally.
I1128 09:24:39.698261 2609310 main.go:67] I1128 09:24:39.653347 traffic_switcher.go:490] Stopping streams
I1128 09:24:39.698270 2609310 main.go:67] I1128 09:24:39.653357 traffic_switcher.go:502] Stopping source writes
I1128 09:24:39.757303 2609310 main.go:67] I1128 09:24:39.706646 keyspace.go:40] RefreshTabletsByShard called on shard newsbtest/-
I1128 09:24:39.809043 2609310 main.go:67] I1128 09:24:39.756933 keyspace.go:69] Calling RefreshState on tablet zone1-1817996704
I1128 09:24:39.925263 2609310 main.go:67] I1128 09:24:39.880834 traffic_switcher.go:997] Stopped Source Writes. Position for source newsbtest:-: MySQL56/b807b760-4c6e-11ec-af80-aad4cf17eb94:1-78150
I1128 09:24:39.925291 2609310 main.go:67] I1128 09:24:39.880934 traffic_switcher.go:509] Waiting for streams to catchup
I1128 09:24:39.925308 2609310 main.go:67] I1128 09:24:39.881016 traffic_switcher.go:1022] Before Catchup: uid: 1, target primary zone1-2706617079, target position , shard primary_alias:{cell:"zone1" uid:2706617079} primary_term_start_time:{seconds:1638090752 nanoseconds:407880564} key_range:{} is_primary_serving:true
I1128 09:24:39.925333 2609310 main.go:67] I1128 09:24:39.881036 traffic_switcher.go:1026] Before Catchup: waiting for keyspace:shard: items:- to reach source position MySQL56/b807b760-4c6e-11ec-af80-aad4cf17eb94:1-78150, uid 1
I1128 09:24:40.930070 2609310 main.go:67] I1128 09:24:40.886171 traffic_switcher.go:1033] After catchup: position for keyspace:shard: items:- reached, uid 1
I1128 09:24:40.941513 2609310 main.go:67] I1128 09:24:40.895864 traffic_switcher.go:1047] After catchup, position for target primary zone1-2706617079, MySQL56/48cf191c-502b-11ec-b129-ba32cdbad2b7:1-512
I1128 09:24:40.942955 2609310 main.go:67] I1128 09:24:40.895942 traffic_switcher.go:516] Migrating streams
I1128 09:24:40.942975 2609310 main.go:67] I1128 09:24:40.895980 traffic_switcher.go:523] Creating reverse streams
I1128 09:24:40.981342 2609310 main.go:67] I1128 09:24:40.934357 traffic_switcher.go:1222] Creating journal: id:1242262565986086939 tables:"item1" tables:"item10" tables:"item2" tables:"item3" tables:"item4" tables:"item5" tables:"item6" tables:"item7" tables:"item8" tables:"item9" local_position:"MySQL56/b807b760-4c6e-11ec-af80-aad4cf17eb94:1-78150" shard_gtids:{keyspace:"items" shard:"-" gtid:"MySQL56/48cf191c-502b-11ec-b129-ba32cdbad2b7:1-512"} participants:{keyspace:"newsbtest" shard:"-"}
I1128 09:24:40.990266 2609310 main.go:67] I1128 09:24:40.945063 keyspace.go:40] RefreshTabletsByShard called on shard items/-
I1128 09:24:41.005677 2609310 main.go:67] I1128 09:24:40.960831 keyspace.go:69] Calling RefreshState on tablet zone1-3151213148
I1128 09:24:41.005714 2609310 main.go:67] I1128 09:24:40.960865 keyspace.go:69] Calling RefreshState on tablet zone1-2706617079
I1128 09:24:41.005724 2609310 main.go:67] I1128 09:24:40.960982 keyspace.go:69] Calling RefreshState on tablet zone1-2086446713
I1128 09:24:41.027511 2609310 main.go:67] I1128 09:24:40.983661 traffic_switcher.go:1266] Delete routing: items.item1
I1128 09:24:41.027538 2609310 main.go:67] I1128 09:24:40.983741 traffic_switcher.go:1269] Add routing: item1 newsbtest.item1
I1128 09:24:41.027550 2609310 main.go:67] I1128 09:24:40.983813 traffic_switcher.go:1266] Delete routing: items.item10
I1128 09:24:41.027735 2609310 main.go:67] I1128 09:24:40.983851 traffic_switcher.go:1269] Add routing: item10 newsbtest.item10
I1128 09:24:41.027752 2609310 main.go:67] I1128 09:24:40.983908 traffic_switcher.go:1266] Delete routing: items.item2
I1128 09:24:41.027763 2609310 main.go:67] I1128 09:24:40.983930 traffic_switcher.go:1269] Add routing: item2 newsbtest.item2
I1128 09:24:41.027769 2609310 main.go:67] I1128 09:24:40.983998 traffic_switcher.go:1266] Delete routing: items.item3
I1128 09:24:41.027797 2609310 main.go:67] I1128 09:24:40.984042 traffic_switcher.go:1269] Add routing: item3 newsbtest.item3
I1128 09:24:41.027811 2609310 main.go:67] I1128 09:24:40.984096 traffic_switcher.go:1266] Delete routing: items.item4
I1128 09:24:41.027824 2609310 main.go:67] I1128 09:24:40.984137 traffic_switcher.go:1269] Add routing: item4 newsbtest.item4
I1128 09:24:41.028518 2609310 main.go:67] I1128 09:24:40.984156 traffic_switcher.go:1266] Delete routing: items.item5
I1128 09:24:41.028553 2609310 main.go:67] I1128 09:24:40.984231 traffic_switcher.go:1269] Add routing: item5 newsbtest.item5
I1128 09:24:41.028566 2609310 main.go:67] I1128 09:24:40.984289 traffic_switcher.go:1266] Delete routing: items.item6
I1128 09:24:41.028579 2609310 main.go:67] I1128 09:24:40.984315 traffic_switcher.go:1269] Add routing: item6 newsbtest.item6
I1128 09:24:41.028590 2609310 main.go:67] I1128 09:24:40.984388 traffic_switcher.go:1266] Delete routing: items.item7
I1128 09:24:41.028604 2609310 main.go:67] I1128 09:24:40.984431 traffic_switcher.go:1269] Add routing: item7 newsbtest.item7
I1128 09:24:41.028651 2609310 main.go:67] I1128 09:24:40.984500 traffic_switcher.go:1266] Delete routing: items.item8
I1128 09:24:41.028701 2609310 main.go:67] I1128 09:24:40.984522 traffic_switcher.go:1269] Add routing: item8 newsbtest.item8
I1128 09:24:41.028722 2609310 main.go:67] I1128 09:24:40.984590 traffic_switcher.go:1266] Delete routing: items.item9
I1128 09:24:41.028735 2609310 main.go:67] I1128 09:24:40.984616 traffic_switcher.go:1269] Add routing: item9 newsbtest.item9
I1128 09:24:41.098073 2609310 main.go:67] I1128 09:24:41.053208 traffic_switcher.go:1524] Marking target streams frozen for workflow item_table_migration db_name vt_items
SwitchTraffic was successful for workflow items.item_table_migration
Start State: Reads Not Switched. Writes Not Switched
Current State: All Reads Switched. Writes Switched
The traffic has been switched; we can now complete the process:
root@k8smaster:~/vitesstests# vtctlclient MoveTables Complete items.item_table_migration
I1128 10:16:04.593464 2631589 main.go:67] I1128 10:16:04.547075 traffic_switcher.go:1472] Dropping table vt_newsbtest.item1
I1128 10:16:05.085988 2631589 main.go:67] I1128 10:16:05.041912 traffic_switcher.go:1483] Removed table vt_newsbtest.item1
I1128 10:16:05.086013 2631589 main.go:67] I1128 10:16:05.041948 traffic_switcher.go:1472] Dropping table vt_newsbtest.item10
I1128 10:16:05.251355 2631589 main.go:67] I1128 10:16:05.206632 traffic_switcher.go:1483] Removed table vt_newsbtest.item10
I1128 10:16:05.251391 2631589 main.go:67] I1128 10:16:05.206671 traffic_switcher.go:1472] Dropping table vt_newsbtest.item2
I1128 10:16:05.449796 2631589 main.go:67] I1128 10:16:05.389048 traffic_switcher.go:1483] Removed table vt_newsbtest.item2
I1128 10:16:05.451788 2631589 main.go:67] I1128 10:16:05.390358 traffic_switcher.go:1472] Dropping table vt_newsbtest.item3
I1128 10:16:05.563527 2631589 main.go:67] I1128 10:16:05.517962 traffic_switcher.go:1483] Removed table vt_newsbtest.item3
I1128 10:16:05.563553 2631589 main.go:67] I1128 10:16:05.517994 traffic_switcher.go:1472] Dropping table vt_newsbtest.item4
I1128 10:16:05.752895 2631589 main.go:67] I1128 10:16:05.693817 traffic_switcher.go:1483] Removed table vt_newsbtest.item4
I1128 10:16:05.752924 2631589 main.go:67] I1128 10:16:05.693850 traffic_switcher.go:1472] Dropping table vt_newsbtest.item5
I1128 10:16:05.996603 2631589 main.go:67] I1128 10:16:05.941038 traffic_switcher.go:1483] Removed table vt_newsbtest.item5
I1128 10:16:05.996630 2631589 main.go:67] I1128 10:16:05.941098 traffic_switcher.go:1472] Dropping table vt_newsbtest.item6
I1128 10:16:06.184966 2631589 main.go:67] I1128 10:16:06.139987 traffic_switcher.go:1483] Removed table vt_newsbtest.item6
I1128 10:16:06.185000 2631589 main.go:67] I1128 10:16:06.140017 traffic_switcher.go:1472] Dropping table vt_newsbtest.item7
I1128 10:16:06.366270 2631589 main.go:67] I1128 10:16:06.321478 traffic_switcher.go:1483] Removed table vt_newsbtest.item7
I1128 10:16:06.366883 2631589 main.go:67] I1128 10:16:06.321519 traffic_switcher.go:1472] Dropping table vt_newsbtest.item8
I1128 10:16:06.467806 2631589 main.go:67] I1128 10:16:06.416383 traffic_switcher.go:1483] Removed table vt_newsbtest.item8
I1128 10:16:06.468400 2631589 main.go:67] I1128 10:16:06.416416 traffic_switcher.go:1472] Dropping table vt_newsbtest.item9
I1128 10:16:06.607029 2631589 main.go:67] I1128 10:16:06.558632 traffic_switcher.go:1483] Removed table vt_newsbtest.item9
I1128 10:16:06.644029 2631589 main.go:67] I1128 10:16:06.599152 keyspace.go:40] RefreshTabletsByShard called on shard newsbtest/-
I1128 10:16:06.684161 2631589 main.go:67] I1128 10:16:06.639422 keyspace.go:69] Calling RefreshState on tablet zone1-1817996704
I1128 10:16:06.696743 2631589 main.go:67] I1128 10:16:06.650965 traffic_switcher.go:1546] Deleting reverse streams for workflow item_table_migration db_name vt_newsbtest
I1128 10:16:06.721286 2631589 main.go:67] I1128 10:16:06.677002 traffic_switcher.go:1537] Deleting target streams for workflow item_table_migration db_name vt_items
Complete was successful for workflow items.item_table_migration
Start State: All Reads Switched. Writes Switched
Current State: Workflow Not Found
All seems to be working just fine; we have concluded the migration of the “item” tables to their new keyspace.
Sharding the rest of the dataset
Creating vschema for sharded newsbtest keyspace
Now, with the “item” tables out of our way, we can proceed with sharding the rest of the tables. As we have mentioned, we’ll be using the warehouse ID as the sharding key.
root@k8smaster:~/vitesstests# vtctlclient ApplyVschema -vschema="$(cat tpcc_vschema/vschema_tpcc_vindex.json)" newsbtest
New VSchema object:
{
  "sharded": true,
  "vindexes": {
    "w_id": {
      "type": "reverse_bits"
    }
  },
  "tables": {
    "customer1": {
      "columnVindexes": [
        {
          "column": "c_w_id",
          "name": "w_id"
        }
      ]
    },
    ...
    "warehouse9": {
      "columnVindexes": [
        {
          "column": "w_id",
          "name": "w_id"
        }
      ]
    }
  }
}
If this is not what you expected, check the input data (as JSON parsing will skip unexpected fields).
As the next step, we have to spin up the vttablets for the sharded keyspace.
Creating new vttablets for sharded newsbtest keyspace
At this moment we should consider the different options. How many shards? How are we going to shard the data? Custom sharding or equal sharding? Of course, all answers depend on the exact data set, its size, the load and so on. In our case, given that we have 10 warehouses, let’s do equal sharding across 5 shards, each containing a primary vttablet and one replica. First, we have to spin up those vttablets, and we can do that by applying the 208_shard_newsbtest.yaml file from our GitHub repository.
root@k8smaster:~/vitesstests# kubectl apply -f 208_shard_newsbtest.yaml
vitesscluster.planetscale.com/vitesstpcc configured
secret/example-cluster-config configured
After a while you should see new vttablets created:
root@k8smaster:~/vitesstests# vtctlclient listalltablets
zone1-0271120913 newsbtest 66-99 primary 10.244.1.43:15000 10.244.1.43:3306 [] 2021-11-28T19:33:03Z
zone1-0600961333 newsbtest 33-66 primary 10.244.2.251:15000 10.244.2.251:3306 [] 2021-11-28T19:28:14Z
zone1-1542446643 newsbtest 99-cc primary 10.244.1.44:15000 10.244.1.44:3306 [] 2021-11-28T19:33:51Z
zone1-1678750924 newsbtest 99-cc replica 10.244.3.51:15000 10.244.3.51:3306 [] <null>
zone1-1680719348 newsbtest -33 primary 10.244.1.39:15000 10.244.1.39:3306 [] 2021-11-28T19:28:25Z
zone1-1807238346 newsbtest 33-66 replica 10.244.2.252:15000 10.244.2.252:3306 [] <null>
zone1-1817996704 newsbtest - primary 10.244.2.248:15000 10.244.2.248:3306 [] 2021-11-23T20:41:12Z
zone1-1983245532 newsbtest -33 replica 10.244.2.253:15000 10.244.2.253:3306 [] <null>
zone1-2086446713 items - replica 10.244.3.45:15000 10.244.3.45:3306 [] <null>
zone1-2610868670 newsbtest 66-99 replica 10.244.3.50:15000 10.244.3.50:3306 [] <null>
zone1-2706617079 items - primary 10.244.1.35:15000 10.244.1.35:3306 [] 2021-11-28T09:12:32Z
zone1-3151213148 items - replica 10.244.2.249:15000 10.244.2.249:3306 [] <null>
zone1-3658524143 newsbtest cc- primary 10.244.1.42:15000 10.244.1.42:3306 [] 2021-11-28T19:32:52Z
zone1-3776076386 newsbtest cc- replica 10.244.1.45:15000 10.244.1.45:3306 [] <null>
As you can see, the newsbtest keyspace now has five shards: -33, 33-66, 66-99, 99-cc and cc-.
The next step will be to start the reshard process.
Resharding newsbtest keyspace
To reshard the keyspace the only thing we have to do is to run the following vtctlclient command:
root@k8smaster:~/vitesstests# vtctlclient Reshard -source_shards '-' -target_shards '-33,33-66,66-99,99-cc,cc-' Create newsbtest.reshard
Reshard Error: rpc error: code = Unknown desc = copySchema: creating a table failed. Most likely some tables already exist on the destination and differ from the source. Please remove all to be copied tables from the destination manually and run this command again. Full error: rpc error: code = Unknown desc = TabletManager.ExecuteFetchAsDba on zone1-0271120913 error: rpc error: code = Unknown desc = Failed to open the referenced table 'district1' (errno 1824) (sqlstate HY000) during query: CREATE TABLE `vt_newsbtest`.`customer1` (
  `c_id` int NOT NULL,
  `c_d_id` tinyint NOT NULL,
  `c_w_id` smallint NOT NULL,
  `c_first` varchar(16) DEFAULT NULL,
  `c_middle` char(2) DEFAULT NULL,
  `c_last` varchar(16) DEFAULT NULL,
  `c_street_1` varchar(20) DEFAULT NULL,
  `c_street_2` varchar(20) DEFAULT NULL,
  `c_city` varchar(20) DEFAULT NULL,
  `c_state` char(2) DEFAULT NULL,
  `c_zip` char(9) DEFAULT NULL,
  `c_phone` char(16) DEFAULT NULL,
  `c_since` datetime DEFAULT NULL,
  `c_credit` char(2) DEFAULT NULL,
  `c_credit_lim` bigint DEFAULT NULL,
  `c_discount` decimal(4,2) DEFAULT NULL,
  `c_balance` decimal(12,2) DEFAULT NULL,
  `c_ytd_payment` decimal(12,2) DEFAULT NULL,
  `c_payment_cnt` smallint DEFAULT NULL,
  `c_delivery_cnt` smallint DEFAULT NULL,
  `c_data` text,
  PRIMARY KEY (`c_w_id`,`c_d_id`,`c_id`),
  KEY `idx_customer1` (`c_w_id`,`c_d_id`,`c_last`,`c_first`),
  CONSTRAINT `fkey_customer_1_1` FOREIGN KEY (`c_w_id`, `c_d_id`) REFERENCES `district1` (`d_w_id`, `d_id`)
...
Aaand it’s gone… Ok, it didn’t work out. As you may have guessed, the problem is caused by foreign keys. This is actually quite a good example of why database engineers do not like foreign keys and why, in a perfect world as DBEs imagine it, all constraints are implemented on the application side. Because of the foreign key constraints, we cannot create the tables in a random order, and there is no option (or at least we haven’t found one) to define the table creation order for the Reshard workflow. Luckily, there is a workaround we can use – temporarily disabling the foreign key checks for the duration of the resharding. We can do that, for example, by adding SET GLOBAL foreign_key_checks=0; to the database initialization script defined in the cluster definition file. We have something like that in the 209_shard_newsbtest_ignore_constrains.yaml file.
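For illustration, the relevant addition to the init script could look roughly like this – a hedged sketch of what gets appended to the init_db.sql referenced by the example-cluster-config secret; the exact contents of 209_shard_newsbtest_ignore_constrains.yaml may differ slightly:

-- Appended to init_db.sql so that every newly provisioned mysqld starts
-- with foreign key checks disabled for the duration of the resharding:
SET GLOBAL foreign_key_checks = 0;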
If the pods are not recreated after applying this file, you can always delete them using kubectl delete … and they will be recreated with the proper initialization script. We can verify that the variable is set properly by logging in to one of the MySQL containers and checking its state:
mysql> SHOW GLOBAL VARIABLES LIKE 'foreign_key_checks';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| foreign_key_checks | OFF   |
+--------------------+-------+
1 row in set (0.00 sec)
Once this is verified we can start the resharding flow:
root@k8smaster:~/vitesstests# vtctlclient Reshard -source_shards '-' -target_shards '-33,33-66,66-99,99-cc,cc-' Create newsbtest.reshard
Waiting for workflow to start: 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ...
E1128 21:04:56.756314 2910923 main.go:67] E1128 21:04:56.702667 vtctl.go:2554] workflow did not start within 30s
Reshard Error: rpc error: code = Unknown desc = workflow did not start within 30s
E1128 21:04:56.756481 2910923 main.go:76] remote error: rpc error: code = Unknown desc = workflow did not start within 30s
As usual, the command reported that the workflow did not start within 30 seconds, but in reality it has started:
root@k8smaster:~/vitesstests# vtctlclient Reshard Progress newsbtest.reshard

Copy Progress (approx):

stock4: rows copied 0/934252 (0%), size copied 81920/360710144 (0%)
new_orders8: rows copied 0/88182 (0%), size copied 81920/3686400 (2%)
history2: rows copied 0/302808 (0%), size copied 81920/21544960 (0%)
warehouse7: rows copied 0/8 (0%), size copied 81920/16384 (500%)
orders3: rows copied 0/304748 (0%), size copied 81920/14172160 (0%)
...
order_line10: rows copied 0/2734687 (0%), size copied 81920/199049216 (0%)
customer3: rows copied 0/270038 (0%), size copied 81920/189612032 (0%)

Following vreplication streams are running for workflow newsbtest.reshard:

id=1 on 99-cc/zone1-1542446643: Status: Copying. VStream Lag: 0s.
id=1 on 33-66/zone1-0600961333: Status: Copying. VStream Lag: 0s.
id=1 on 66-99/zone1-0271120913: Status: Copying. VStream Lag: 0s.
id=1 on -33/zone1-1680719348: Status: Copying. VStream Lag: 0s.
id=1 on cc-/zone1-3658524143: Status: Copying. VStream Lag: 0s.
Looks good, the resharding is in progress. After some time the copy phase completes:
root@k8smaster:~/vitesstests# vtctlclient Reshard Progress newsbtest.reshard

Copy Completed.

Following vreplication streams are running for workflow newsbtest.reshard:

id=1 on -33/zone1-1680719348: Status: Running. VStream Lag: -1s. Tx time: Sun Nov 28 22:52:25 2021.
id=1 on 33-66/zone1-0600961333: Status: Running. VStream Lag: -1s. Tx time: Sun Nov 28 22:52:25 2021.
id=1 on 99-cc/zone1-1542446643: Status: Running. VStream Lag: -1s. Tx time: Sun Nov 28 22:52:25 2021.
id=1 on cc-/zone1-3658524143: Status: Running. VStream Lag: -1s. Tx time: Sun Nov 28 22:52:24 2021.
id=1 on 66-99/zone1-0271120913: Status: Running. VStream Lag: -1s. Tx time: Sun Nov 28 22:52:25 2021.
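Before re-enabling foreign key checks and switching any traffic, it may also be worth spot-checking how the rows were spread across the new shards. One way to do that is to target a single shard from the MySQL client connected to vtgate; this is a hedged sketch and the shard-targeting syntax may differ between Vitess versions:

use newsbtest/-33;
-- which warehouses landed in this shard? (the reverse_bits vindex on the warehouse ID decides that)
SELECT w_id FROM warehouse1;
-- a rough sanity check on row counts for one of the bigger tables
SELECT COUNT(*) FROM stock1;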
Once the copy is done, we want to re-enable the foreign key checks. We can do that in multiple ways. We could log in to all the MySQL docker containers and execute SET GLOBAL foreign_key_checks=ON;.
We could also back up the existing vttablets for the sharded keyspace, edit the initialization script and re-create the pods with the new init script. We decided to go with executing the SET GLOBAL statement in our docker containers:
root@k8snode1:~# for id in $(docker ps | grep mysqlctld | awk '{print $1}') ; do echo ${id} ; docker exec -it ${id} /usr/bin/mysql -S /vt/socket/mysql.sock -uroot -e "SET GLOBAL foreign_key_checks=ON;" ; done
9d9cee2333fd
f87b9ed5c584
8c3bf2fdd177
9f952d242e43
6c537431bd87
root@k8snode1:~# for id in $(docker ps | grep mysqlctld | awk '{print $1}') ; do echo ${id} ; docker exec -it ${id} /usr/bin/mysql -S /vt/socket/mysql.sock -uroot -e "SHOW GLOBAL VARIABLES LIKE 'foreign_key_checks';" ; done
9d9cee2333fd
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| foreign_key_checks | ON |
+--------------------+-------+
f87b9ed5c584
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| foreign_key_checks | ON |
+--------------------+-------+
8c3bf2fdd177
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| foreign_key_checks | ON |
+--------------------+-------+
9f952d242e43
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| foreign_key_checks | ON |
+--------------------+-------+
6c537431bd87
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| foreign_key_checks | ON |
+--------------------+-------+
root@k8snode1:~#
This example is for one of the nodes in our Kubernetes cluster; you will want to do the same on every node that you have.
At this point we should also consider running some tests to verify that our application will actually work with Vitess. We found that SysBench TPC-C does not work out of the box. Some changes were required for Vitess to be able to parse the queries properly and execute them the way they should be executed. The simplest issues were related to the JOIN syntax – Vitess doesn’t seem to understand SELECT something FROM table1, table2 WHERE table1.id = 1 AND table2.id = 1. On the other hand, if we use FROM table1 JOIN table2 ON table1.id = table2.id, this works just fine. Below are two examples of queries that we had to rewrite (a possible JOIN-based rewrite of the second one is sketched right after them):
SELECT COUNT(DISTINCT (s_i_id))
FROM order_line4, stock4
WHERE ol_w_id = 5
AND ol_d_id = 4
AND ol_o_id < 3080
AND ol_o_id >= 3060
AND s_w_id= 5
AND s_i_id=ol_i_id
AND s_quantity < 11
SELECT c_discount, c_last, c_credit, w_tax
FROM customer7, warehouse7
WHERE w_id = 2
AND c_w_id = w_id
AND c_d_id = 4
AND c_id = 1368
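For reference, this is roughly how the second query looks after rewriting it to the explicit JOIN syntax that Vitess accepts – a sketch of the rewrite rather than a verbatim copy of our final application code:

SELECT c_discount, c_last, c_credit, w_tax
FROM customer7 JOIN warehouse7
ON c_w_id = w_id
WHERE w_id = 2
AND c_d_id = 4
AND c_id = 1368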
The third example is a query where the JOIN was not done explicitly on the shard key; the WHERE clause still contained information about the shard:
SELECT COUNT(DISTINCT (s_i_id))
FROM order_line4 AS ol JOIN stock4 AS s
ON ol.ol_i_id=s.s_i_id
WHERE ol.ol_w_id = 5
AND ol.ol_d_id = 2
AND ol.ol_o_id < 3074
AND ol.ol_o_id >= 3054
AND s.s_w_id= 5
AND s.s_quantity < 16
This query had already been rewritten to the JOIN form, but Vitess did not understand that we are talking about one particular shard only, and it complained about a cross-shard query with aggregation of data, something which is not yet supported by Vitess. Luckily, thanks to a suggestion from Matt Lord from the PlanetScale team, adding a second condition to the ON clause – something that shows Vitess’ query planner that the shard key is indeed used in this query – was enough to get it running:
SELECT COUNT(DISTINCT (s.s_i_id))
FROM order_line5 AS ol JOIN stock5 AS s
ON (ol.ol_i_id=s.s_i_id AND ol.ol_w_id=s.s_w_id)
WHERE ol.ol_w_id = 3
AND ol.ol_d_id = 1
AND ol.ol_o_id < 3088
AND ol.ol_o_id >= 3068
AND s.s_w_id= 3
AND s.s_quantity < 10
After verifying that the application will not break we are clear to switch the traffic to the sharded keyspace:
root@k8smaster:~/vitesstests# vtctlclient Reshard SwitchTraffic newsbtest.reshard
I1129 14:30:36.390049 3358189 main.go:67] I1129 14:30:36.345738 traffic_switcher.go:356] About to switchShardReads: [], [REPLICA RDONLY], 0
I1129 14:30:36.435565 3358189 main.go:67] I1129 14:30:36.391363 keyspace.go:40] RefreshTabletsByShard called on shard newsbtest/-33
I1129 14:30:36.446435 3358189 main.go:67] I1129 14:30:36.402247 keyspace.go:69] Calling RefreshState on tablet zone1-1983245532
I1129 14:30:36.446460 3358189 main.go:67] I1129 14:30:36.402263 keyspace.go:69] Calling RefreshState on tablet zone1-1680719348
I1129 14:30:36.487177 3358189 main.go:67] I1129 14:30:36.430249 keyspace.go:40] RefreshTabletsByShard called on shard newsbtest/33-66
I1129 14:30:36.533911 3358189 main.go:67] I1129 14:30:36.489755 keyspace.go:69] Calling RefreshState on tablet zone1-1807238346
I1129 14:30:36.533965 3358189 main.go:67] I1129 14:30:36.489826 keyspace.go:69] Calling RefreshState on tablet zone1-0600961333
I1129 14:30:36.591816 3358189 main.go:67] I1129 14:30:36.546742 keyspace.go:40] RefreshTabletsByShard called on shard newsbtest/66-99
I1129 14:30:36.626138 3358189 main.go:67] I1129 14:30:36.582170 keyspace.go:69] Calling RefreshState on tablet zone1-2610868670
I1129 14:30:36.626216 3358189 main.go:67] I1129 14:30:36.582188 keyspace.go:69] Calling RefreshState on tablet zone1-0271120913
I1129 14:30:36.701324 3358189 main.go:67] I1129 14:30:36.652506 keyspace.go:40] RefreshTabletsByShard called on shard newsbtest/99-cc
I1129 14:30:36.731112 3358189 main.go:67] I1129 14:30:36.687074 keyspace.go:69] Calling RefreshState on tablet zone1-1542446643
I1129 14:30:36.731139 3358189 main.go:67] I1129 14:30:36.687105 keyspace.go:69] Calling RefreshState on tablet zone1-1678750924
I1129 14:30:36.761661 3358189 main.go:67] I1129 14:30:36.716991 keyspace.go:40] RefreshTabletsByShard called on shard newsbtest/cc-
...
I1129 14:30:37.775382 3358189 main.go:67] I1129 14:30:37.719165 traffic_switcher.go:1524] Marking target streams frozen for workflow reshard db_name vt_newsbtest
I1129 14:30:37.775418 3358189 main.go:67] I1129 14:30:37.719174 traffic_switcher.go:1524] Marking target streams frozen for workflow reshard db_name vt_newsbtest
I1129 14:30:37.775427 3358189 main.go:67] I1129 14:30:37.719191 traffic_switcher.go:1524] Marking target streams frozen for workflow reshard db_name vt_newsbtest
I1129 14:30:37.775434 3358189 main.go:67] I1129 14:30:37.719203 traffic_switcher.go:1524] Marking target streams frozen for workflow reshard db_name vt_newsbtest
I1129 14:30:37.775441 3358189 main.go:67] I1129 14:30:37.719288 traffic_switcher.go:1524] Marking target streams frozen for workflow reshard db_name vt_newsbtest
SwitchTraffic was successful for workflow newsbtest.reshard
Start State: Reads Not Switched. Writes Not Switched
Current State: All Reads Switched. Writes Switched
Once this is done, we can complete the reshard process:
root@k8smaster:~/vitesstests# vtctlclient Reshard Complete newsbtest.reshard
I1129 14:31:14.086312 2254761 main.go:67] I0926 22:14:30.824644 traffic_switcher.go:1509] Deleting shard newsbtest.-
I1129 14:31:14.096583 2254761 main.go:67] I0926 22:14:30.835135 shard.go:197] Deleting all tablets in shard newsbtest/- cell zone1
I1129 14:31:14.096612 2254761 main.go:67] I0926 22:14:30.835167 shard.go:201] Deleting tablet zone1-3154448388
I1129 14:31:14.151426 2254761 main.go:67] I0926 22:14:30.889905 shard.go:201] Deleting tablet zone1-1817996704
I1129 14:31:14.183772 2254761 main.go:67] I0926 22:14:30.922262 traffic_switcher.go:1515] Deleted shard newsbtest.-
I1129 14:31:14.183994 2254761 main.go:67] I0926 22:14:30.922308 traffic_switcher.go:1546] Deleting reverse streams for workflow reshard db_name vt_newsbtest
I1129 14:31:14.306691 2254761 main.go:67] I1129 14:31:14.044941 traffic_switcher.go:1537] Deleting target streams for workflow reshard db_name vt_newsbtest
I1129 14:31:14.306726 2254761 main.go:67] I1129 14:31:14.044961 traffic_switcher.go:1537] Deleting target streams for workflow reshard db_name vt_newsbtest
I1129 14:31:14.306739 2254761 main.go:67] I1129 14:31:14.044985 traffic_switcher.go:1537] Deleting target streams for workflow reshard db_name vt_newsbtest
Complete was successful for workflow newsbtest.reshard
Start State: All Reads Switched. Writes Switched
Current State: Workflow Not Found
This concludes the migration. We have a running and working environment with data provisioned by SysBench TPC-C and sharded using Vitess. As you can see, sharding will almost never be transparent to the application. Most likely you will have to make some changes to the queries or even to the application logic. Still, in the majority of cases it should be doable to migrate to Vitess and use it to scale out.