Backup and Restore Overview

This document describes how to back up and restore the data of a TiDB cluster on Kubernetes. To back up and restore your data, you can use the Dumpling, TiDB Lightning, and Backup & Restore (BR) tools.

Dumpling is a data export tool, which exports data stored in TiDB or MySQL as SQL or CSV data files. You can use Dumpling to make a logical full backup or export.

TiDB Lightning is a tool for fast full data import into a TiDB cluster. It accepts Dumpling output or CSV files as its data source. You can use TiDB Lightning to make a logical full data restore or import.

BR is a command-line tool for distributed backup and restoration of the TiDB cluster data. Compared with Dumpling and Mydumper, BR is more suitable for huge data volumes. BR only supports TiDB v3.1 and later versions. For incremental backup that is not sensitive to latency, refer to BR Overview. For real-time incremental backup, refer to TiCDC.

Usage scenarios

Back up data

If you have the following backup needs, you can use BR to make a backup of your TiDB cluster data:

  • To back up a large volume of data (more than 1 TiB) at a fast speed
  • To get a direct backup of data as SST files (key-value pairs)
  • To perform incremental backup that is insensitive to latency

For more information, see the following documents:

Restore data

To restore the SST files that BR exported into a TiDB cluster, use BR. For more information, see the following documents:

Backup and restore process

To make a backup of the TiDB cluster on Kubernetes, you need to create a Backup custom resource (CR) object to describe the backup, or create a BackupSchedule CR object to describe a scheduled backup.

To restore data to the TiDB cluster on Kubernetes, you need to create a Restore CR object to describe the restore.

After you create the CR object, TiDB Operator chooses the corresponding tool based on your configuration and performs the backup or restore.
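As an illustration of what such a CR object can look like, the following is a minimal sketch of a full backup to S3-compatible storage. The field layout reflects the pingcap.com/v1alpha1 Backup CRD as commonly documented, but verify it against the CRD reference for your TiDB Operator version. The helper name backup_manifest and all concrete values (bucket, prefix, region, secret name) are hypothetical placeholders.

```shell
# Print a minimal Backup CR manifest to stdout.
# $1: Backup CR name, $2: namespace, $3: TiDB cluster name.
# The S3 values below are placeholders, not defaults; replace them with your own.
backup_manifest() {
  cat <<EOF
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: $1
  namespace: $2
spec:
  backupType: full
  br:
    cluster: $3
    clusterNamespace: $2
  s3:
    provider: aws
    region: us-west-1
    bucket: my-bucket
    prefix: my-folder
    secretName: s3-secret
EOF
}

# Usage (requires kubectl and a cluster with TiDB Operator installed):
#   backup_manifest demo-backup backup-ns basic | kubectl apply -f -
```

Printing the manifest instead of applying it directly lets you review or diff the CR before TiDB Operator acts on it.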

Delete the Backup CR

You can delete the Backup CR or BackupSchedule CR by running the following commands:

kubectl delete backup ${name} -n ${namespace}
kubectl delete backupschedule ${name} -n ${namespace}

If you set the value of spec.cleanPolicy to Delete, TiDB Operator cleans the backup data when it deletes the CR.

TiDB Operator automatically attempts to stop running log backup tasks when you delete the Custom Resource (CR). This automatic stop feature only applies to log backup tasks that are running normally and does not handle tasks in an error or failed state.

If spec.cleanPolicy is set to Delete and you need to delete the namespace, it is recommended that you first delete all the Backup and BackupSchedule CRs and then delete the namespace.

If you delete the namespace before you delete the Backup or BackupSchedule CR, TiDB Operator keeps trying to create jobs to clean the backup data. However, because the namespace is in the Terminating state, TiDB Operator fails to create such a job, which causes the namespace to be stuck in this state.

To address this issue, delete the finalizers from the Backup CR by running the following command:

kubectl patch -n ${namespace} backup ${name} --type merge -p '{"metadata":{"finalizers":[]}}'
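If several Backup CRs are stuck in the same namespace, the patch above can be applied to each of them in a loop. The following is a minimal sketch that relies only on standard kubectl behavior (get with -o name, and patch with a merge patch). The function name clear_backup_finalizers and the KUBECTL override variable are hypothetical and exist only so that the loop can be dry-run with a stub.

```shell
# Remove the finalizers from every Backup CR in the given namespace.
# $1: namespace.
# KUBECTL is a hypothetical override hook; it defaults to the real kubectl.
clear_backup_finalizers() {
  "${KUBECTL:-kubectl}" get backup -n "$1" -o name |
    while read -r resource; do
      # $resource has the form <resource type>/<name>, as printed by "-o name".
      "${KUBECTL:-kubectl}" patch -n "$1" "$resource" \
        --type merge -p '{"metadata":{"finalizers":[]}}'
    done
}

# Usage:
#   clear_backup_finalizers ${namespace}
```

Removing finalizers skips the cleanup that TiDB Operator would otherwise perform, so backup data in remote storage might remain and need manual removal.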

Clean backup data

TiDB Operator cleans the backup data by deleting the backup files in batches. The deletion method depends on the type of backend storage used for backups.

  • For the S3-compatible backend storage, TiDB Operator uses the concurrent batch deletion method, which deletes files in concurrent batches. TiDB Operator starts multiple goroutines, and each goroutine calls the batch delete API DeleteObjects to delete multiple files in one request.
  • For other types of backend storage, TiDB Operator uses the concurrent deletion method, which deletes files concurrently. TiDB Operator starts multiple goroutines, and each goroutine deletes one file at a time.

You can configure the following fields in the Backup CR to control the clean behavior:

  • .spec.cleanOption.pageSize: Specifies the number of files to be deleted in each batch. The default value is 10000.

  • .spec.cleanOption.disableBatchConcurrency: If the value of this field is true, TiDB Operator disables the concurrent batch deletion method and uses the concurrent deletion method.

    If your S3-compatible backend storage does not support the DeleteObjects API, the default concurrent batch deletion method fails. In that case, set this field to true to use the concurrent deletion method.

  • .spec.cleanOption.batchConcurrency: Specifies the number of goroutines to start for the concurrent batch deletion method. The default value is 10.

  • .spec.cleanOption.routineConcurrency: Specifies the number of goroutines to start for the concurrent deletion method. The default value is 100.
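To show how the fields above fit together, the following sketch prints the cleanup-related part of a Backup CR spec. The numeric values are the defaults listed above. The helper name clean_spec_fragment is hypothetical, and the fragment is meant to be merged into a complete Backup CR rather than applied on its own.

```shell
# Print the cleanup-related fragment of a Backup CR spec.
# $1: "true" to disable concurrent batch deletion, for example when the
#     S3-compatible storage does not support DeleteObjects. Defaults to "false".
clean_spec_fragment() {
  cat <<EOF
spec:
  cleanPolicy: Delete
  cleanOption:
    pageSize: 10000
    disableBatchConcurrency: ${1:-false}
    batchConcurrency: 10
    routineConcurrency: 100
EOF
}

# Usage:
#   clean_spec_fragment        # keep concurrent batch deletion
#   clean_spec_fragment true   # use one-file-at-a-time concurrent deletion
```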
