MySQL – What are backup solutions for MySQL?

Most of you are probably aware of the simple fact that database backups are quite useful. If someone forgets a WHERE clause in an UPDATE query, backups may be the only way to recover your business. Let’s assume that we do not need to explain “why?” we should take backups and focus on “how?” instead. One thing should be made clear up front: replication is not a backup. The aforementioned UPDATE without the WHERE clause will happily replicate to all of the replicas, eventually even to the delayed ones. Please keep in mind that this is a very introductory post. Backups are complex beasts and they touch pretty much every aspect of database operations: security, encryption, performance, ensuring operational continuity. In future posts we will try to dig into those aspects as well. For now, though, let’s keep it simple and start with an overview of the backup options that exist for MySQL.

What is a backup?

A backup, in short, is a copy of all of the data from the database. It is like a snapshot of the state of the database at a given moment in time. Well, ideally. We’ll go into the details later.

Types of backup

There are two types of backup, depending on the format in which the data is stored: logical backup and physical backup.

Logical backup

A logical backup is a dataset stored in the form of text – typically SQL statements, but it can also be a CSV file. For MySQL, the most common logical backup tool is mysqldump. It is quite versatile and flexible and can be used in many different scenarios. Another example of a tool that can perform logical backups of MySQL databases is MyDumper/MyLoader, a pair of tools that perform the logical backup using parallel dumper threads. The main advantage of a logical backup is that it can be used for partial restores. In the majority of cases, a user can just take a look at the backup file and extract the required INSERT statements, or grab the data in some other way.
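Because a plain-SQL dump is just text, standard command line tools are often enough to pull out a subset of the data. A minimal sketch, using a tiny hand-written stand-in for a real dump file (the table names are made up for illustration):

```shell
# A tiny stand-in for a mysqldump output file (real dumps are generated,
# not hand-written; the table names here are made up):
cat > dump.sql <<'EOF'
INSERT INTO `orders` VALUES (1,'2024-01-01');
INSERT INTO `customers` VALUES (1,'Alice');
INSERT INTO `customers` VALUES (2,'Bob');
EOF

# Extract only the rows of one table into a separate file:
grep '^INSERT INTO `customers`' dump.sql > customers-only.sql
grep -c 'INSERT' customers-only.sql    # → 2
```

The resulting file can then be fed back into the server with the mysql client to restore just that table’s rows.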

On the other hand, logical backups come with the heaviest overhead – the fact that they use queries slows things down significantly. Queries have to go through the database, which means that every layer a query normally passes through gets involved: the SQL parser, the optimizer, the storage engine. All of it adds overhead. That overhead might be significant and, as a result, dealing with large data sets using logical backups is not necessarily feasible. Yes, there are logical backup tools that can speed up the process by parallelizing some of the operations, but no matter which tool you use, it will always be significantly slower than a physical backup.

Physical backup

What is a physical backup? In short, it is a process of copying the data as it is stored on the disk drive. Compared to logical backups, here we are, to some extent, bypassing the database and just focusing on the raw data stored on disk. There are different ways in which a physical backup can be taken, and they mostly depend on the infrastructure used by the database servers. Definitely the biggest advantage of physical backups is speed. The details depend on the exact method of executing the backup, but in the majority of cases backup performance is a function of infrastructure performance. In short, the backup will take as much time as is necessary to copy the data from the database server to the final destination. If you keep the backup on the same host, disk speed will be the most limiting factor. If you want to stream the backup to another node over the network, the network will most likely be the bottleneck. If you compress or encrypt the backup, CPU performance will also play a role. Still, no matter how you take the backup, you can expect it to complete way faster than a logical backup.

On top of that, some of the backup methods allow you to execute incremental backups – instead of copying everything over and over again, it might be possible to copy only the part of the data that has changed since the last backup. This, obviously, speeds up the backup process even further (although it slows down the restore – you have to restore the full backup first and then all of the incremental backups taken afterwards).
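As an illustration, Percona XtraBackup (which we cover in more detail below) supports this pattern; these commands require a running MySQL server, and the target directories are just examples:

```shell
# Full (base) backup first:
xtrabackup --backup --target-dir=/backups/full

# Later: copy only the pages that changed since the base backup:
xtrabackup --backup --target-dir=/backups/inc1 \
           --incremental-basedir=/backups/full

# Restore order matters: prepare the base backup without rolling back
# uncommitted transactions, then apply each incremental in sequence:
xtrabackup --prepare --apply-log-only --target-dir=/backups/full
xtrabackup --prepare --target-dir=/backups/full \
           --incremental-dir=/backups/inc1
```

Note how the restore has to walk the whole chain – this is exactly the trade-off mentioned above.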

There are also disadvantages. The main one is granularity. Again, it depends on the backup method, but the ability to quickly restore a very small subset of the data is almost non-existent. In some cases you may be able to restore a single table, but that’s the best you can hope for – while with logical backups you can restore even a single row.

Another significant aspect of physical backups that you have to keep in mind is data consistency. A backup has to be consistent to be usable – it should contain a consistent data set as of a given point in time. Only such a backup can be used to build new nodes or rebuild existing ones. It is also the only kind of backup that is usable for the application – apps typically care about data consistency. If you insert, for example, data about a new product into a couple of tables (product description, warehouse stock, pricing etc.) while the backup is running, the application would expect the data to exist in all of those tables, or in none of them, once you restore the backup. It will not be happy if there is data in the product and pricing tables but no info in the warehouse. How to achieve consistency depends on the backup method and the database configuration. In some cases the backup tool will take care of it; in others it will be up to the database to provide a consistent state.
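On the application side, that all-or-nothing expectation is exactly what transactions provide. A sketch using the mysql client, with made-up table and database names matching the example above:

```shell
# Wrapping the multi-table insert in a transaction means a consistent
# backup will contain either all three rows or none of them.
# "shop" and the table schemas are hypothetical, for illustration only.
mysql shop <<'SQL'
START TRANSACTION;
INSERT INTO product   (id, description)   VALUES (42, 'New widget');
INSERT INTO pricing   (product_id, price) VALUES (42, 9.99);
INSERT INTO warehouse (product_id, stock) VALUES (42, 100);
COMMIT;
SQL
```

A consistent backup respects those transaction boundaries; an inconsistent one may capture the moment between two of the inserts.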

Backup methods and tools

In this section we would like to go over some of the tools and techniques that can be used to back up MySQL databases. This will not be an in-depth guide to all of them, but it should give you some idea of the characteristics of each of the methods and what to keep in mind while working with them.

Logical backups

mysqldump

Let’s start with logical backups and the most commonly used tool: mysqldump. It is a slow way of running logical backups, but feature-wise it is great. It can dump the data in several formats, including INSERT (one statement per row), extended INSERT (multi-row inserts – faster, but it is harder to extract the data of a single row) and any format supported by SELECT … INTO OUTFILE, for example CSV (comma-separated values). You can take backups of separate databases or separate tables, or you can run the whole backup as a transaction, utilizing InnoDB MultiVersion Concurrency Control and transaction isolation levels to provide a consistent backup of the whole database. It is also possible to attach information about the exact point in time at which the backup is consistent – this is done by attaching the binary log file and position, or the Global Transaction ID, as metadata to the backup.
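For an InnoDB-only server, a typical invocation might look like this (it requires a running server and valid credentials; --source-data is the MySQL 8.0.26+ spelling of the older --master-data option):

```shell
# Consistent logical backup of all databases. --single-transaction relies
# on InnoDB MVCC instead of locking tables; --source-data=2 records the
# binary log coordinates as a comment in the dump (use --master-data=2 on
# older versions). Output is compressed on the fly.
mysqldump --single-transaction --source-data=2 \
          --routines --triggers --events \
          --all-databases | gzip > full_backup.sql.gz
```

Keep in mind that --single-transaction only guarantees consistency for transactional tables; MyISAM tables are not protected by it.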

mydumper/myloader

It is very similar to mysqldump, with one notable difference – it can dump and load the data in parallel. Multiple threads will, most of the time, be faster than a single thread, whether reading the data from or writing it back to the database.

Mydumper is designed to provide a consistent backup of the whole database. It first briefly locks all traffic on the server and then lets all of the worker threads start a transaction. Once this is done, the lock is released, but all worker threads maintain consistency through the same means as in the mysqldump case – MVCC and transaction isolation levels. This makes it possible for mydumper to create backups that can be used to provision new nodes in a replication setup.
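A minimal sketch of the dump-and-reload cycle; these commands need a running server, and the thread counts and output directory are arbitrary examples:

```shell
# Dump with four parallel worker threads into one directory per backup:
mydumper --threads 4 --compress --outputdir /backups/dump-2024-01-01

# Restore with parallel loader threads; --overwrite-tables drops and
# recreates tables that already exist on the target server:
myloader --threads 4 --overwrite-tables --directory /backups/dump-2024-01-01
```

Unlike mysqldump’s single file, the output is a directory with one file per table, which is what makes the parallel reload possible.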

Physical backups

XtraBackup/MariaBackup

Let’s start with those two tools. XtraBackup has been created and is maintained by Percona; MariaBackup is a fork of XtraBackup that has been modified to be compatible with MariaDB. XtraBackup is a hot backup solution for MySQL. How it works is quite simple: it copies the physical files from the database server while monitoring the InnoDB redo log and keeping track of any updates to the database. Once the file copy process finishes, a lock is taken briefly to finish the backup and make sure that it is consistent at a given point in time. Then, before the backup can be restored, the data captured from the InnoDB redo log is merged into the data collected from the data files, which makes the backup consistent.
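The basic flow, with example paths, looks like this (a running server is needed for the backup phase; the restore additionally requires a stopped server and an empty data directory):

```shell
# Phase 1: copy datafiles while tracking the InnoDB redo log
# (the server stays up and serving traffic):
xtrabackup --backup --target-dir=/backups/full

# Phase 2: replay the redo log so the copied files become consistent:
xtrabackup --prepare --target-dir=/backups/full

# Restore: with mysqld stopped and an empty datadir, copy the files back
# and fix ownership (datadir path is an example):
xtrabackup --copy-back --target-dir=/backups/full
chown -R mysql:mysql /var/lib/mysql
```

The separation into backup and prepare phases is what allows the server to keep running while the files are copied.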

When we are talking about physical backups, we should be concerned about their consistency. Does the tool take the backup in a way that ensures consistency? Does it rely on external configuration, on either the MySQL or the operating system side? In the case of XtraBackup, it is the tool itself that ensures the backup is consistent. You do not need to perform any manual steps on the database side; all of it is taken care of. XtraBackup is also a universal tool – it does not rely on any particular infrastructure or hardware. This makes it quite a popular backup tool.

Disk snapshots

Another quite popular method of taking backups is to use filesystem snapshots. Cloud deployments are common these days, and private clouds built with OpenStack can also frequently be seen. Databases are deployed on top of block storage backed by a Storage Area Network, which allows disk snapshots to be performed – and those snapshots can be used to back up a MySQL database. There are a couple of considerations if you happen to use snapshots for database backups. First, taking a snapshot is pretty much the equivalent of crashing the database server: a sudden disruption of operations, like pulling the plug from the outlet. This leads to unfinished writes, interrupted transactions and a buffer pool that has not been properly flushed to disk. What it means for us is that the database server where the snapshot has been taken has to be configured to recover from those problems.

MySQL can be set up in a way that the data will be consistent after crash recovery, but it may require a couple of additional configuration settings that are not enabled by default, or that may result in a performance drop. Another option would be to stop the MySQL server cleanly and only then take the snapshot of its data directory. As long as you can allow for that, it might be a good solution that provides clean, consistent backups.

LVM, even though it relies on software only, can also be used to take snapshots and, if you want to use it for MySQL backups, you should likewise consider the MySQL configuration and verify that the server can recover cleanly using the data from the snapshot.
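As an illustration, an LVM-based flow might look as follows. The volume names and mount points are assumptions, and note one important pitfall: FLUSH TABLES WITH READ LOCK is released as soon as the client session that took it disconnects, so the session has to be kept open while the snapshot is created:

```shell
# Session 1 (keep this client connected -- the global read lock is
# released the moment the session disconnects):
#   mysql> FLUSH TABLES WITH READ LOCK;

# Session 2: create the snapshot while writes are paused
# (volume group and LV names are examples):
lvcreate --snapshot --size 10G --name mysql-snap /dev/vg0/mysql-data

# Back in session 1, release the lock as soon as the snapshot exists:
#   mysql> UNLOCK TABLES;

# Copy the frozen datadir off the snapshot, then drop the snapshot:
mount -o ro /dev/vg0/mysql-snap /mnt/mysql-snap
rsync -a /mnt/mysql-snap/ /backups/mysql-snap/
umount /mnt/mysql-snap
lvremove -f /dev/vg0/mysql-snap
```

Because the lock is held only for the instant the snapshot is created, the write pause is short; the slow file copy happens afterwards against the frozen snapshot.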

Copying the data directory

Another example of how backups can be performed is to use tools that already exist in the operating system to copy the data from one location to another. We are talking here about tools like cp, scp or rsync. They might form the core of a set of backup scripts written by the database administrator. If you plan to use this approach, you have to keep in mind that the backup has to be consistent. It is not possible to copy the data just like that, without stopping the database traffic and flushing everything to disk. If you can afford to have a dedicated MySQL node that can be stopped for the duration of the backup, this option becomes viable. Alternatively, you can use rsync to do the first copy, then stop the database and synchronize the data by running rsync again. This approach has the advantage that rsync copies only the modified data, making the downtime shorter.
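The two-pass idea can be sketched with throwaway directories standing in for the real data directory. In actual use, SRC would be something like /var/lib/mysql and you would stop mysqld cleanly between the two passes:

```shell
# Throwaway directories stand in for the datadir and the backup target:
SRC=$(mktemp -d); DST=$(mktemp -d)
echo "version 1" > "$SRC/ibdata1"      # pretend InnoDB datafile

# Pass 1: bulk copy while the database is still running; the copy is
# NOT yet consistent at this point:
rsync -a "$SRC/" "$DST/"

echo "version 2" > "$SRC/ibdata1"      # data changed during pass 1
# (real backup: stop mysqld cleanly at this point)

# Pass 2: only the changed files are transferred, keeping downtime short:
rsync -a --delete "$SRC/" "$DST/"
cat "$DST/ibdata1"                     # → version 2
```

Pass 1 can take hours on a big dataset, but pass 2 only moves the delta, so the database is down for minutes instead.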

This was a very brief introduction to the different backup methods that are available for MySQL. We hope you found it useful. We will take a closer look at the backup process in future blog posts. Backups are a very important aspect of database management, and it’s paramount to take them and manage them correctly.