Back Up a TiDB Cluster across Multiple Kubernetes Using EBS Volume Snapshots
This document describes how to back up the data of a TiDB cluster deployed across multiple AWS Kubernetes clusters to AWS storage using EBS volume snapshots.
The backup method described in this document is implemented based on CustomResourceDefinition (CRD) in BR Federation and TiDB Operator. BR (Backup & Restore) is a command-line tool for distributed backup and recovery of the TiDB cluster data. For the underlying implementation, BR gets the backup data of the TiDB cluster, and then sends the data to the AWS storage.
Usage scenarios
If you have the following requirements when backing up TiDB cluster data, you can use TiDB Operator to back up the data using volume snapshots and metadata to Amazon S3:
- Minimize the impact of backup on the cluster, such as keeping the impact on QPS and transaction latency below 5%, and consuming no cluster CPU or memory.
- Back up and restore data in a short period of time, for example, completing a backup within 1 hour and a restore within 2 hours.
If you have any other requirements, refer to Backup and Restore Overview and select an appropriate backup method.
Prerequisites
According to AWS documentation, storage blocks on volumes that are created from snapshots must be initialized (pulled down from Amazon S3 and written to the volume) before you can access them. This preliminary action takes time and can cause a significant increase in the latency of an I/O operation the first time each block is accessed. Full volume performance is achieved only after all blocks have been downloaded and written to the volume.
Therefore, an EBS volume restored from snapshots might have high latency before it is initialized, which can impact the performance of the restored TiDB cluster. See details in Create a volume from a snapshot.
To initialize the restored volume more efficiently, it is recommended that you separate the WAL and raft log into a dedicated small volume apart from the TiKV data, as shown in the example below. By fully initializing this small WAL and raft log volume separately, you can improve the write performance of a restored TiDB cluster.
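For example, you can mount an additional small volume on the TiKV Pods and point the Raft Engine (which stores the raft log) to it in the TidbCluster CR. The following is a minimal sketch of a partial TidbCluster spec, based on the spec.tikv.storageVolumes mechanism of TiDB Operator; the volume name, storage class, size, and mount path are illustrative values that you need to adapt to your environment:
spec:
  tikv:
    config: |
      [raft-engine]
      dir = "/var/lib/raft-engine/raft"   # store the raft log on the dedicated volume
    storageVolumes:
      - name: raft-engine
        storageClassName: gp3             # illustrative storage class
        storageSize: "20Gi"               # small dedicated volume
        mountPath: /var/lib/raft-engine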
Limitations
- Snapshot backup is applicable to TiDB Operator v1.5.2 or later versions, and TiDB v6.5.8 or later versions.
- For TiKV configuration, do not set resolved-ts.enable to false, and do not set raftstore.report-min-resolved-ts-interval to "0s". Otherwise, it can lead to backup failure.
- For PD configuration, do not set pd-server.min-resolved-ts-persistence-interval to "0s". Otherwise, it can lead to backup failure. See the configuration example after this list.
- To use this backup method, the TiDB cluster must be deployed on AWS EC2 and use AWS EBS volumes.
- This backup method is currently not supported for TiFlash, TiCDC, DM, and TiDB Binlog nodes.
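The following partial TidbCluster spec illustrates the TiKV and PD configuration requirements above. The values shown are the defaults in supported TiDB versions, so you normally only need to make sure they are not overridden; the snippet is a sketch, not a setting you must add:
spec:
  tikv:
    config: |
      [resolved-ts]
      enable = true                             # must not be false
      [raftstore]
      report-min-resolved-ts-interval = "1s"    # must not be "0s"
  pd:
    config: |
      [pd-server]
      min-resolved-ts-persistence-interval = "1s"   # must not be "0s"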
Ad-hoc backup
You can back up snapshots based on AWS EBS volumes either fully or incrementally. The initial backup of each node is a full backup, while subsequent backups are incremental backups.
Snapshot backup is defined in a customized VolumeBackup custom resource (CR) object. The BR Federation completes the backup task according to the specifications in this object.
Step 1. Set up the environment for EBS volume snapshot backup in every data plane
You must execute the following steps in every data plane.
1. Download the backup-rbac.yaml file to the backup server.
2. If you have deployed the TiDB cluster in ${namespace}, create the RBAC-related resources required for the backup in this namespace by running the following command:
   kubectl apply -f backup-rbac.yaml -n ${namespace}
3. Grant permissions to access remote storage.
   To back up cluster data and save snapshot metadata to Amazon S3, you need to grant permissions to remote storage. Refer to AWS account authorization for the three available methods. If you use the AK/SK method, you also need to create the Secret shown below.
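For example, if you choose the AK/SK method, you can create the Kubernetes Secret that is referenced by secretName in the examples in Step 2. The key names access_key and secret_key follow the convention used by TiDB Operator for S3 credentials; create the Secret in the TiDB cluster namespace of each data plane and replace the values with your own credentials:
kubectl create secret generic s3-secret --from-literal=access_key=${access_key} --from-literal=secret_key=${secret_key} -n ${namespace}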
Step 2. Back up data to S3 storage
You must execute the following steps in the control plane.
Depending on the authorization method you chose in the previous step for granting remote storage access, you can back up data using EBS snapshots by one of the following methods:
- AK/SK
- IAM role with Pod
- IAM role with ServiceAccount
If you grant permissions by accessKey and secretKey, you can create the VolumeBackup CR as follows:
kubectl apply -f backup-fed.yaml
The backup-fed.yaml file has the following content:
---
apiVersion: federation.pingcap.com/v1alpha1
kind: VolumeBackup
metadata:
  name: ${backup-name}
spec:
  clusters:
    - k8sClusterName: ${k8s-name1}
      tcName: ${tc-name1}
      tcNamespace: ${tc-namespace1}
    - k8sClusterName: ${k8s-name2}
      tcName: ${tc-name2}
      tcNamespace: ${tc-namespace2}
    - ... # other clusters
  template:
    br:
      sendCredToTikv: true
    s3:
      provider: aws
      secretName: s3-secret
      region: ${region-name}
      bucket: ${bucket-name}
      prefix: ${backup-path}
    toolImage: ${br-image}
    cleanPolicy: Delete
    calcSizeLevel: {snapshot-size-calculation-level}
If you grant permissions by associating Pod with IAM, you can create the VolumeBackup CR as follows:
kubectl apply -f backup-fed.yaml
The backup-fed.yaml file has the following content:
---
apiVersion: federation.pingcap.com/v1alpha1
kind: VolumeBackup
metadata:
  name: ${backup-name}
  annotations:
    iam.amazonaws.com/role: arn:aws:iam::123456789012:role/role-name
spec:
  clusters:
    - k8sClusterName: ${k8s-name1}
      tcName: ${tc-name1}
      tcNamespace: ${tc-namespace1}
    - k8sClusterName: ${k8s-name2}
      tcName: ${tc-name2}
      tcNamespace: ${tc-namespace2}
    - ... # other clusters
  template:
    br:
      sendCredToTikv: false
    s3:
      provider: aws
      region: ${region-name}
      bucket: ${bucket-name}
      prefix: ${backup-path}
    toolImage: ${br-image}
    cleanPolicy: Delete
    calcSizeLevel: {snapshot-size-calculation-level}
If you grant permissions by associating ServiceAccount with IAM, you can create the VolumeBackup CR as follows:
kubectl apply -f backup-fed.yaml
The backup-fed.yaml file has the following content:
---
apiVersion: federation.pingcap.com/v1alpha1
kind: VolumeBackup
metadata:
  name: ${backup-name}
spec:
  clusters:
    - k8sClusterName: ${k8s-name1}
      tcName: ${tc-name1}
      tcNamespace: ${tc-namespace1}
    - k8sClusterName: ${k8s-name2}
      tcName: ${tc-name2}
      tcNamespace: ${tc-namespace2}
    - ... # other clusters
  template:
    br:
      sendCredToTikv: false
    s3:
      provider: aws
      region: ${region-name}
      bucket: ${bucket-name}
      prefix: ${backup-path}
    toolImage: ${br-image}
    serviceAccount: tidb-backup-manager
    cleanPolicy: Delete
    calcSizeLevel: {snapshot-size-calculation-level}
Step 3. View the backup status
After creating the VolumeBackup CR, the BR Federation automatically starts the backup process in each data plane.
To check the volume backup status, use the following command:
kubectl get vbk -n ${namespace} -o wide
Once the volume backup is complete, you can get the information of all the data planes in the status.backups field. This information can be used for volume restore.
To obtain the information, use the following command:
kubectl get vbk ${backup-name} -n ${namespace} -o yaml
The information is as follows:
status:
  backups:
    - backupName: fed-{backup-name}-{k8s-name1}
      backupPath: s3://{bucket-name}/{backup-path}-{k8s-name1}
      commitTs: "ts1"
      k8sClusterName: {k8s-name1}
      tcName: {tc-name1}
      tcNamespace: {tc-namespace1}
    - backupName: fed-{backup-name}-{k8s-name2}
      backupPath: s3://{bucket-name}/{backup-path}-{k8s-name2}
      commitTs: "ts2"
      k8sClusterName: {k8s-name2}
      tcName: {tc-name2}
      tcNamespace: {tc-namespace2}
    - ... # other backups
Delete the VolumeBackup CR
If you set spec.template.cleanPolicy to Delete, when you delete the VolumeBackup CR, the BR Federation will clean up the backup file and the volume snapshots on AWS.
To delete the VolumeBackup CR, run the following command:
kubectl delete vbk ${backup-name} -n ${namespace}
Scheduled volume backup
To ensure regular backups of the TiDB cluster and prevent an excessive number of backup items, you can set a backup policy and retention policy.
This can be done by creating a VolumeBackupSchedule CR object that describes the scheduled snapshot backup. Each backup time point triggers a volume backup. The underlying implementation is the ad-hoc volume backup.
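For example, assuming you want one volume backup per day at 2:00 AM and want to keep backups for 7 days, the scheduling fields of the CR might look like the following (the values are illustrative):
spec:
  maxReservedTime: "168h"   # keep backups for 7 days
  schedule: "0 2 * * *"     # standard cron expression: every day at 02:00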
Perform a scheduled volume backup
You must execute the following steps in the control plane.
Depending on the authorization method you chose in the previous step for granting remote storage access, perform a scheduled volume backup by one of the following methods:
- AK/SK
- IAM role with Pod
- IAM role with ServiceAccount
If you grant permissions by accessKey and secretKey, create the VolumeBackupSchedule CR and back up cluster data as described below:
kubectl apply -f volume-backup-scheduler.yaml
The content of volume-backup-scheduler.yaml is as follows:
---
apiVersion: federation.pingcap.com/v1alpha1
kind: VolumeBackupSchedule
metadata:
  name: {scheduler-name}
  namespace: {namespace-name}
spec:
  #maxBackups: {number}
  #pause: {bool}
  maxReservedTime: {duration}
  schedule: {cron-expression}
  backupTemplate:
    clusters:
      - k8sClusterName: {k8s-name1}
        tcName: {tc-name1}
        tcNamespace: {tc-namespace1}
      - k8sClusterName: {k8s-name2}
        tcName: {tc-name2}
        tcNamespace: {tc-namespace2}
      - ... # other clusters
    template:
      br:
        sendCredToTikv: true
      s3:
        provider: aws
        secretName: s3-secret
        region: {region-name}
        bucket: {bucket-name}
        prefix: {backup-path}
      toolImage: {br-image}
      cleanPolicy: Delete
      calcSizeLevel: {snapshot-size-calculation-level}
If you grant permissions by associating Pod with IAM, create the VolumeBackupSchedule CR and back up cluster data as described below:
kubectl apply -f volume-backup-scheduler.yaml
The content of volume-backup-scheduler.yaml is as follows:
---
apiVersion: federation.pingcap.com/v1alpha1
kind: VolumeBackupSchedule
metadata:
  name: {scheduler-name}
  namespace: {namespace-name}
  annotations:
    iam.amazonaws.com/role: arn:aws:iam::123456789012:role/role-name
spec:
  #maxBackups: {number}
  #pause: {bool}
  maxReservedTime: {duration}
  schedule: {cron-expression}
  backupTemplate:
    clusters:
      - k8sClusterName: {k8s-name1}
        tcName: {tc-name1}
        tcNamespace: {tc-namespace1}
      - k8sClusterName: {k8s-name2}
        tcName: {tc-name2}
        tcNamespace: {tc-namespace2}
      - ... # other clusters
    template:
      br:
        sendCredToTikv: false
      s3:
        provider: aws
        region: {region-name}
        bucket: {bucket-name}
        prefix: {backup-path}
      toolImage: {br-image}
      cleanPolicy: Delete
      calcSizeLevel: {snapshot-size-calculation-level}
If you grant permissions by associating ServiceAccount with IAM, create the VolumeBackupSchedule CR and back up cluster data as described below:
kubectl apply -f volume-backup-scheduler.yaml
The content of volume-backup-scheduler.yaml is as follows:
---
apiVersion: federation.pingcap.com/v1alpha1
kind: VolumeBackupSchedule
metadata:
  name: {scheduler-name}
  namespace: {namespace-name}
spec:
  #maxBackups: {number}
  #pause: {bool}
  maxReservedTime: {duration}
  schedule: {cron-expression}
  backupTemplate:
    clusters:
      - k8sClusterName: {k8s-name1}
        tcName: {tc-name1}
        tcNamespace: {tc-namespace1}
      - k8sClusterName: {k8s-name2}
        tcName: {tc-name2}
        tcNamespace: {tc-namespace2}
      - ... # other clusters
    template:
      br:
        sendCredToTikv: false
      s3:
        provider: aws
        region: {region-name}
        bucket: {bucket-name}
        prefix: {backup-path}
      serviceAccount: tidb-backup-manager
      toolImage: {br-image}
      cleanPolicy: Delete
      calcSizeLevel: {snapshot-size-calculation-level}
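After the scheduler is created, you can check it and the volume backups it triggers in the control plane. The following commands use the full resource names of the BR Federation CRDs, which is a safe choice if you are unsure about the registered short names:
kubectl get volumebackupschedules -n {namespace-name}
kubectl get volumebackups -n {namespace-name}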