FAQs on EBS Snapshot Backup and Restore across Multiple Kubernetes
This document addresses common questions and solutions related to EBS snapshot backup and restore across multiple Kubernetes environments.
New tags on snapshots and restored volumes
Symptom: Some tags are automatically added to the generated snapshots and to restored EBS volumes.
Explanation: The new tags are added for traceability. Snapshots inherit all tags from their individual source EBS volumes, while restored EBS volumes inherit tags from the source snapshots with the tag keys prefixed with `snapshot/`. In addition, new tags such as `<TiDBCluster-BR: true>` and `<snapshot/createdFromSnapshotId, {source-snapshot-id}>` are added to restored EBS volumes.
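The inheritance and prefixing work roughly as in the following illustration. This is a hypothetical example (the PVC tag key and the snapshot ID are made up), shown as YAML only for readability, not as a manifest of any CRD.

```yaml
# Hypothetical illustration of tag propagation during EBS snapshot backup and restore.
sourceVolumeTags:
  kubernetes.io/created-for/pvc/name: tikv-basic-tikv-0
snapshotTags:                                     # inherited as-is from the source EBS volume
  kubernetes.io/created-for/pvc/name: tikv-basic-tikv-0
restoredVolumeTags:                               # inherited from the snapshot, keys prefixed with "snapshot/"
  snapshot/kubernetes.io/created-for/pvc/name: tikv-basic-tikv-0
  TiDBCluster-BR: "true"                          # added for traceability
  snapshot/createdFromSnapshotId: snap-0123456789abcdef0
```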
Backup initialization failed
Symptom: You get an error that contains `GC safepoint 443455494791364608 exceed TS 0` while the backup is initializing.
Solution: This issue might occur if you have disabled the "resolved ts" feature in TiKV or PD. Check the configuration of TiKV and PD (the sketch after this list shows where these settings typically live):
- For TiKV, confirm whether you set `resolved-ts.enable = false` or `raftstore.report-min-resolved-ts-interval = "0s"`. If so, remove these configurations.
- For PD, confirm whether you set `pd-server.min-resolved-ts-persistence-interval = "0s"`. If so, remove this configuration.
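If TiKV and PD are managed by TiDB Operator, these items are usually set in the `TidbCluster` custom resource as TOML fragments under `spec.tikv.config` and `spec.pd.config`. The following is a minimal sketch, assuming a cluster named `basic`; it only marks the lines to look for and remove, and is not a complete cluster definition.

```yaml
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: basic                 # assumed cluster name
spec:
  tikv:
    config: |
      # Remove these lines (or revert them to the defaults) so that resolved ts stays enabled:
      [resolved-ts]
      enable = false
      [raftstore]
      report-min-resolved-ts-interval = "0s"
  pd:
    config: |
      # Remove this line (or revert it to the default) as well:
      [pd-server]
      min-resolved-ts-persistence-interval = "0s"
```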
Backup failed because it was executed twice
Issue: #5143
Symptom: You get an error that contains `backup meta file exists`, and the backup pod is scheduled twice.
Solution: This issue might occur if the first backup pod is evicted by Kubernetes due to node resource pressure. You can configure `PriorityClass` and `ResourceRequirements` to reduce the possibility of eviction. For more details, refer to the comments in the issue.
Save backup time by controlling the snapshot size calculation level
Symptom: A scheduled backup cannot be completed in the expected window because of the cost of snapshot size calculation.
Solution: By default, both the full size and the incremental size are calculated by calling the AWS service, which might take several minutes. You can set `spec.template.calcSizeLevel` to `full` to skip the incremental size calculation, set it to `incremental` to skip the full size calculation, or set it to `none` to skip both calculations.
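For example, the sketch below sets this field on a federated volume backup. The `federation.pingcap.com/v1alpha1` API group and the `VolumeBackup` kind are assumptions inferred from the field path above; verify them against the BR federation CRD reference for your version.

```yaml
apiVersion: federation.pingcap.com/v1alpha1   # assumed API group/version
kind: VolumeBackup                            # assumed kind
metadata:
  name: demo-volume-backup
spec:
  template:
    calcSizeLevel: none    # "full", "incremental", or "none"; by default both sizes are calculated
```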
How to configure the TTL for the backup init job
The backup init job handles backup preparations, including pausing GC and certain PD schedulers, and suspending Lightning. By default, a TTL of 10 minutes is associated with the init job in case it gets stuck. You can change the TTL by setting the `spec.template.volumeBackupInitJobMaxActiveSeconds` attribute in the volume backup spec.
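A minimal sketch of changing the TTL, under the same assumptions about the CR group and kind as above (the value is in seconds, so 1200 corresponds to 20 minutes):

```yaml
apiVersion: federation.pingcap.com/v1alpha1   # assumed API group/version
kind: VolumeBackup                            # assumed kind
metadata:
  name: demo-volume-backup
spec:
  template:
    volumeBackupInitJobMaxActiveSeconds: 1200   # extend the init job TTL from 10 to 20 minutes
```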
How to apply flow control to snapshot deletion
EBS snapshot backup GC is performed on one volume backup at a time. For larger clusters backed up with EBS snapshots, a single volume backup might still contain a significant number of snapshots, so flow control is necessary for snapshot deletion. You can control the expected deletion rate in a single data plane by setting the `spec.template.snapshotsDeleteRatio` parameter of the backup schedule CRD. The default value is 1.0, which ensures no more than one snapshot deletion per second.
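As an illustration, the sketch below sets the ratio on a backup schedule. The `VolumeBackupSchedule` kind and API group are assumptions inferred from the context above; the field path `spec.template.snapshotsDeleteRatio` is taken from this FAQ, so confirm both against the CRD reference of your BR federation version.

```yaml
apiVersion: federation.pingcap.com/v1alpha1   # assumed API group/version
kind: VolumeBackupSchedule                    # assumed kind
metadata:
  name: demo-volume-backup-schedule
spec:
  template:
    snapshotsDeleteRatio: 0.5   # assuming the ratio is deletions per second, roughly one deletion every 2 seconds
```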