Replace Nodes for a TiDB Cluster on Cloud Disks
This document describes a method for replacing and upgrading nodes without downtime for a TiDB cluster that uses cloud storage. You can move the cluster to nodes with a higher specification, or upgrade the nodes to a newer version of Kubernetes.
This document uses Amazon EKS as an example and describes how to create a new node group and migrate the TiDB cluster to the new node group using a rolling restart. You can use this method to replace the node groups used by TiKV or TiDB with node groups that have more compute resources, and to upgrade EKS.
For other cloud platforms, refer to Google Cloud GKE or Azure AKS and perform the corresponding operations on the node pools.
Prerequisites
- A TiDB cluster is deployed on the cloud. If not, refer to Deploy on Amazon EKS and deploy a cluster.
- The TiDB cluster uses cloud storage as its data disk.
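If you are not sure whether the cluster's data disks are backed by cloud storage, a quick check is to look at the StorageClass used by the PVCs of the cluster. This is a minimal sketch; ${namespace} is a placeholder for the namespace of your TiDB cluster:

    kubectl get pvc -n ${namespace}
    kubectl get sc

The PROVISIONER column of the StorageClass (for example, ebs.csi.aws.com on EKS) indicates whether the disks are cloud disks.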
Step 1: Create new node groups
1. Locate the cluster.yaml configuration file for the EKS cluster that the TiDB cluster is deployed in, and save a copy of the file as cluster-new.yaml.

2. In cluster-new.yaml, add the new node groups (for example, tidb-1b-new and tikv-1a-new):

    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    metadata:
      name: your-eks-cluster
      region: ap-northeast-1
    nodeGroups:
    ...
      - name: tidb-1b-new
        desiredCapacity: 1
        privateNetworking: true
        availabilityZones: ["ap-northeast-1b"]
        instanceType: c5.4xlarge
        labels:
          dedicated: tidb
        taints:
          dedicated: tidb:NoSchedule
      - name: tikv-1a-new
        desiredCapacity: 1
        privateNetworking: true
        availabilityZones: ["ap-northeast-1a"]
        instanceType: r5b.4xlarge
        labels:
          dedicated: tikv
        taints:
          dedicated: tikv:NoSchedule

    If you want to scale up a node, modify instanceType. If you want to upgrade the Kubernetes version, first upgrade the version of your cluster control plane. For details, see Updating a Cluster.

3. In cluster-new.yaml, delete the original node groups to be replaced.

    In this example, delete tidb-1b and tikv-1a. You need to delete node groups according to your needs.

4. In cluster.yaml, delete the node groups that are not to be replaced and keep the node groups that are to be replaced. The retained node groups will be deleted from the cluster in Step 4.

    In this example, keep tidb-1b and tikv-1a, and delete the other node groups. You need to keep or delete node groups according to your needs.

5. Create the new node groups:

    eksctl create nodegroup -f cluster-new.yaml

6. Confirm that the new nodes are added to the cluster:

    kubectl get no -l alpha.eksctl.io/nodegroup-name=${new_nodegroup1}
    kubectl get no -l alpha.eksctl.io/nodegroup-name=${new_nodegroup2}
    ...

    ${new_nodegroup} is the name of a new node group. In this example, the new node groups are tidb-1b-new and tikv-1a-new. You need to configure the node group name according to your needs.
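Optionally, you can also verify that the new nodes carry the expected labels and taints before moving on. The following is a sketch using the node group names in this example:

    kubectl get no -l dedicated=tidb,alpha.eksctl.io/nodegroup-name=tidb-1b-new
    kubectl describe no -l alpha.eksctl.io/nodegroup-name=tikv-1a-new | grep Taints

The Taints line of each TiKV node should show dedicated=tikv:NoSchedule.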
Step 2: Mark the original nodes as non-schedulable
You need to mark the original nodes as non-schedulable to ensure that no new Pods are scheduled to them. Run the kubectl cordon command:
kubectl cordon -l alpha.eksctl.io/nodegroup-name=${origin_nodegroup1}
kubectl cordon -l alpha.eksctl.io/nodegroup-name=${origin_nodegroup2}
...
${origin_nodegroup} is the name of an original node group. In this example, the original node groups are tidb-1b and tikv-1a. You need to configure the node group name according to your needs.
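To confirm that the cordon has taken effect, list the nodes of the original node groups again; cordoned nodes report Ready,SchedulingDisabled in the STATUS column. The node group names below follow this example:

    kubectl get no -l alpha.eksctl.io/nodegroup-name=tidb-1b
    kubectl get no -l alpha.eksctl.io/nodegroup-name=tikv-1a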
Step 3: Rolling restart the TiDB cluster
Refer to Restart a TiDB Cluster on Kubernetes and perform a rolling restart on the TiDB cluster.
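As a sketch of the annotation-based restart described in that document, you can add the tidb.pingcap.com/restartedAt annotation to the spec of each affected component to trigger a graceful rolling restart. Here, ${cluster_name}, ${namespace}, and the timestamp are placeholders, and kubectl patch is used instead of editing the TidbCluster manifest directly:

    kubectl patch tc ${cluster_name} -n ${namespace} --type merge -p '{"spec":{"tikv":{"annotations":{"tidb.pingcap.com/restartedAt":"2024-01-01T00:00:00"}}}}'
    kubectl patch tc ${cluster_name} -n ${namespace} --type merge -p '{"spec":{"tidb":{"annotations":{"tidb.pingcap.com/restartedAt":"2024-01-01T00:00:00"}}}}'

Repeat for other components (for example, pd) if their Pods also run on the nodes being replaced.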
Step 4: Delete the original node groups
Check whether there are TiDB, PD, or TiKV Pods left on nodes of the original node groups:
kubectl get po -n ${namespace} -owide
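The NODE column of the output shows the node that each Pod runs on. To narrow the check to a single node of an original node group, you can filter the output; ${node_name} is a placeholder:

    kubectl get po -n ${namespace} -o wide | grep ${node_name}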
If no TiDB, PD, or TiKV Pods are left on the nodes of the original node groups, you can delete the original node groups:
eksctl delete nodegroup -f cluster.yaml --approve
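After the deletion completes, you can confirm that only the new node groups remain; ${cluster_name} is a placeholder for the name of your EKS cluster:

    eksctl get nodegroup --cluster ${cluster_name}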