Backup and Restore Overview
This document describes how to perform backup and restore on the TiDB cluster on Kubernetes. To back up and restore your data, you can use the Dumpling, TiDB Lightning, and Backup & Restore (BR) tools.
Dumpling is a data export tool, which exports data stored in TiDB or MySQL as SQL or CSV data files. You can use Dumpling to make a logical full backup or export.
TiDB Lightning is a tool used for fast full data import into a TiDB cluster. TiDB Lightning supports data sources in the Dumpling output format or the CSV format. You can use TiDB Lightning to perform a logical full data restore or import.
BR is a command-line tool for distributed backup and restore of TiDB cluster data. Compared with Dumpling and Mydumper, BR is more suitable for huge data volumes. BR supports only TiDB v3.1 and later versions. For incremental backup that is insensitive to latency, refer to BR Overview. For real-time incremental backup, refer to TiCDC.
Usage scenarios
Back up data
If you have the following backup needs, you can use BR to make a backup of your TiDB cluster data:
- To back up a large volume of data (more than 1 TB) at a fast speed
- To get a direct backup of data as SST files (key-value pairs)
- To perform incremental backup that is insensitive to latency
Refer to the following documents for more information:
- Back up Data to S3-Compatible Storage Using BR
- Back up Data to GCS Using BR
- Back up Data to Azure Blob Storage Using BR
- Back up Data to PV Using BR
- Back up Data Using EBS Snapshots across Multiple Kubernetes
If you have the following backup needs, you can use Dumpling to make a backup of the TiDB cluster data:
- To export SQL or CSV files
- To limit the memory usage of a single SQL statement
- To export the historical data snapshot of TiDB
Refer to the following documents for more information:
- Back up Data to S3-Compatible Storage Using Dumpling
- Back up Data to GCS Using Dumpling
Restore data
To recover the SST files exported by BR to a TiDB cluster, use BR. Refer to the following documents for more information:
- Restore Data from S3-Compatible Storage Using BR
- Restore Data from GCS Using BR
- Restore Data from Azure Blob Storage Using BR
- Restore Data from PV Using BR
- Restore Data Using EBS Snapshots across Multiple Kubernetes
To restore data from SQL or CSV files exported by Dumpling or other compatible data sources to a TiDB cluster, use TiDB Lightning. Refer to the following documents for more information:
- Restore Data from S3-Compatible Storage Using TiDB Lightning
- Restore Data from GCS Using TiDB Lightning
Backup and restore process
To back up data of the TiDB cluster on Kubernetes, you need to create a Backup CR object to describe the backup, or create a BackupSchedule CR object to describe a scheduled backup.
To restore data to the TiDB cluster on Kubernetes, you need to create a Restore CR object to describe the restore.
After you create the CR object, TiDB Operator chooses the corresponding tool according to your configuration and performs the backup or restore.
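For reference, the following is a minimal sketch of a Backup CR that uses BR to back up a cluster to S3-compatible storage. All values (names, namespaces, bucket, and Secret) are illustrative placeholders; refer to the backup documents listed above for complete, authoritative examples.

```yaml
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: demo1-backup-s3          # placeholder CR name
  namespace: backup-test         # namespace in which the Backup CR is created
spec:
  backupType: full
  br:
    cluster: demo1               # name of the TidbCluster to back up
    clusterNamespace: tidb-cluster
  s3:
    provider: aws
    region: us-west-1
    bucket: my-backup-bucket     # placeholder bucket name
    prefix: my-cluster-backup
    secretName: s3-secret        # Secret storing the S3 access credentials
```

A BackupSchedule CR wraps a similar backup definition in its backupTemplate field together with a cron-style schedule field, and a Restore CR uses a similar structure to describe the storage to restore from.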
Delete the Backup CR
You can delete the Backup CR or BackupSchedule CR by running the following commands:
kubectl delete backup ${name} -n ${namespace}
kubectl delete backupschedule ${name} -n ${namespace}
If you use TiDB Operator v1.1.2 or an earlier version, or if you use TiDB Operator v1.1.3 or a later version and set the value of spec.cleanPolicy to Delete, TiDB Operator cleans the backup data when it deletes the CR.
If you back up cluster data using AWS EBS volume snapshots and set the value of spec.cleanPolicy to Delete, TiDB Operator deletes the CR, and cleans up the backup files and the volume snapshots on AWS.
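For illustration, the clean policy is configured through the spec.cleanPolicy field of the Backup CR. The snippet below shows only that field; the rest of the spec is omitted:

```yaml
spec:
  # Delete: clean the backup data (and, for EBS snapshot backups,
  # the volume snapshots) when the CR is deleted.
  # Set to Retain to keep the backup data after the CR is deleted.
  cleanPolicy: Delete
```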
In such cases, if you need to delete the namespace, it is recommended that you first delete all the Backup/BackupSchedule CRs and then delete the namespace.
If you delete the namespace before you delete the Backup/BackupSchedule CRs, TiDB Operator will keep creating jobs to clean the backup data. However, because the namespace is in the Terminating state, TiDB Operator fails to create such jobs, which causes the namespace to be stuck in this state.
To address this issue, delete the finalizers by running the following command:
kubectl patch -n ${namespace} backup ${name} --type merge -p '{"metadata":{"finalizers":[]}}'
Clean backup data
For TiDB Operator v1.2.3 and earlier versions, TiDB Operator cleans the backup data by deleting the backup files one by one.
For TiDB Operator v1.2.4 and later versions, TiDB Operator cleans the backup data by deleting the backup files in batches. The batch deletion method differs depending on the type of backend storage used for backups.
- For S3-compatible backend storage, TiDB Operator uses the concurrent batch deletion method, which deletes files in batches concurrently. TiDB Operator starts multiple goroutines, and each goroutine uses the batch delete API DeleteObjects to delete multiple files per request.
- For other types of backend storage, TiDB Operator uses the concurrent deletion method, which deletes files concurrently. TiDB Operator starts multiple goroutines, and each goroutine deletes one file at a time.
For TiDB Operator v1.2.4 and later versions, you can configure the following fields in the Backup CR to control the clean behavior:
- .spec.cleanOption.pageSize: Specifies the number of files to be deleted in each batch. The default value is 10000.
- .spec.cleanOption.disableBatchConcurrency: If the value of this field is true, TiDB Operator disables the concurrent batch deletion method and uses the concurrent deletion method instead. If your S3-compatible backend storage does not support the DeleteObjects API, the default concurrent batch deletion method fails, and you need to set this field to true to use the concurrent deletion method.
- .spec.cleanOption.batchConcurrency: Specifies the number of goroutines to start for the concurrent batch deletion method. The default value is 10.
- .spec.cleanOption.routineConcurrency: Specifies the number of goroutines to start for the concurrent deletion method. The default value is 100.
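As a sketch of how these fields fit together, the following fragment sets the cleaning behavior in a Backup CR. The values shown are the defaults described above, not recommendations:

```yaml
spec:
  cleanPolicy: Delete              # clean the backup data when the CR is deleted
  cleanOption:
    pageSize: 10000                # number of files deleted per batch
    disableBatchConcurrency: false # set to true if the storage does not support DeleteObjects
    batchConcurrency: 10           # goroutines for the concurrent batch deletion method
    routineConcurrency: 100        # goroutines for the concurrent deletion method
```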