FAQs on EBS Snapshot Backup and Restore across Multiple Kubernetes
This document addresses common questions and solutions related to EBS snapshot backup and restore across multiple Kubernetes environments.
New tags on snapshots and restored volumes
Symptom: Some tags are automatically added to the generated snapshots and to restored EBS volumes.
Explanation: The new tags are added for traceability. Snapshots inherit all tags from their individual source EBS volumes, while restored EBS volumes inherit tags from the source snapshots with the tag keys prefixed with `snapshot/`. In addition, new tags such as `<TiDBCluster-BR: true>` and `<snapshot/createdFromSnapshotId, {source-snapshot-id}>` are added to restored EBS volumes.
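The inheritance and prefixing work roughly as in the following illustration. This is a hypothetical example (the PVC tag key and the snapshot ID are made up), shown as YAML only for readability, not as a manifest of any CRD.

```yaml
# Hypothetical illustration of tag propagation during EBS snapshot backup and restore.
sourceVolumeTags:
  kubernetes.io/created-for/pvc/name: tikv-basic-tikv-0
snapshotTags:                                     # inherited as-is from the source EBS volume
  kubernetes.io/created-for/pvc/name: tikv-basic-tikv-0
restoredVolumeTags:                               # inherited from the snapshot, keys prefixed with "snapshot/"
  snapshot/kubernetes.io/created-for/pvc/name: tikv-basic-tikv-0
  TiDBCluster-BR: "true"                          # added for traceability
  snapshot/createdFromSnapshotId: snap-0123456789abcdef0
```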
Backup initialization failed
Symptom: You get an error that contains `GC safepoint 443455494791364608 exceed TS 0` while the backup is initializing.
Solution: This issue might occur if you have disabled the "resolved ts" feature in TiKV or PD. Check the configuration of TiKV and PD (the sketch after this list shows where these settings typically live):
- For TiKV, confirm whether you set `resolved-ts.enable = false` or `raftstore.report-min-resolved-ts-interval = "0s"`. If so, remove these configurations.
- For PD, confirm whether you set `pd-server.min-resolved-ts-persistence-interval = "0s"`. If so, remove this configuration.
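If TiKV and PD are managed by TiDB Operator, these items are usually set in the `TidbCluster` custom resource as TOML fragments under `spec.tikv.config` and `spec.pd.config`. The following is a minimal sketch, assuming a cluster named `basic`; it only marks the lines to look for and remove, and is not a complete cluster definition.

```yaml
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: basic                 # assumed cluster name
spec:
  tikv:
    config: |
      # Remove these lines (or revert them to the defaults) so that resolved ts stays enabled:
      [resolved-ts]
      enable = false
      [raftstore]
      report-min-resolved-ts-interval = "0s"
  pd:
    config: |
      # Remove this line (or revert it to the default) as well:
      [pd-server]
      min-resolved-ts-persistence-interval = "0s"
```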
Backup failed because it was executed twice
Issue: #5143
Symptom: You get an error that contains `backup meta file exists`, and the backup pod is scheduled twice.
Solution: This issue might occur if the first backup pod is evicted by Kubernetes due to node resource pressure. You can configure `PriorityClass` and `ResourceRequirements` to reduce the possibility of eviction. For more details, refer to the comments in the issue.
Save backup time by controlling the snapshot size calculation level
Symptom: A scheduled backup cannot be completed in the expected window because of the cost of snapshot size calculation.
Solution: By default, both the full size and the incremental size are calculated by calling the AWS service, which might take several minutes. You can set `spec.template.calcSizeLevel` to `full` to skip the incremental size calculation, set it to `incremental` to skip the full size calculation, or set it to `none` to skip both calculations.
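For example, the sketch below sets this field on a federated volume backup. The `federation.pingcap.com/v1alpha1` API group and the `VolumeBackup` kind are assumptions inferred from the field path above; verify them against the BR federation CRD reference for your version.

```yaml
apiVersion: federation.pingcap.com/v1alpha1   # assumed API group/version
kind: VolumeBackup                            # assumed kind
metadata:
  name: demo-volume-backup
spec:
  template:
    calcSizeLevel: none    # "full", "incremental", or "none"; by default both sizes are calculated
```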
How to configure the TTL for the backup init job
The backup init job handles backup preparations, including pausing GC and certain PD schedulers, and suspending Lightning. By default, a TTL of 10 minutes is associated with the init job in case it gets stuck. You can change the TTL by setting the `spec.template.volumeBackupInitJobMaxActiveSeconds` attribute in the volume backup spec.
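A minimal sketch of changing the TTL, under the same assumptions about the CR group and kind as above (the value is in seconds, so 1200 corresponds to 20 minutes):

```yaml
apiVersion: federation.pingcap.com/v1alpha1   # assumed API group/version
kind: VolumeBackup                            # assumed kind
metadata:
  name: demo-volume-backup
spec:
  template:
    volumeBackupInitJobMaxActiveSeconds: 1200   # extend the init job TTL from 10 to 20 minutes
```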
How to apply flow control to snapshot deletion
EBS snapshot backup GC is performed on one volume backup at a time. For larger clusters backed up with EBS snapshots, a single volume backup might still contain a significant number of snapshots, so flow control is necessary for snapshot deletion. You can control the expected deletion rate in a single data plane by setting the `spec.template.snapshotsDeleteRatio` parameter of the backup schedule CRD. The default value is 1.0, which ensures no more than one snapshot deletion per second.
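As an illustration, the sketch below sets the ratio on a backup schedule. The `VolumeBackupSchedule` kind and API group are assumptions inferred from the context above; the field path `spec.template.snapshotsDeleteRatio` is taken from this FAQ, so confirm both against the CRD reference of your BR federation version.

```yaml
apiVersion: federation.pingcap.com/v1alpha1   # assumed API group/version
kind: VolumeBackupSchedule                    # assumed kind
metadata:
  name: demo-volume-backup-schedule
spec:
  template:
    snapshotsDeleteRatio: 0.5   # assuming the ratio is deletions per second, roughly one deletion every 2 seconds
```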