Deploy TiDB on AWS EKS
This document describes how to deploy a TiDB cluster on AWS Elastic Kubernetes Service (EKS).
To deploy TiDB Operator and the TiDB cluster in a self-managed Kubernetes environment, refer to Deploy TiDB Operator and Deploy TiDB on General Kubernetes.
Prerequisites
Before deploying a TiDB cluster on AWS EKS, make sure the following requirements are satisfied:
- Install Helm 3: used for deploying TiDB Operator.
- Complete all operations in Getting started with eksctl. This guide includes the following contents:
  - Install and configure awscli.
  - Install and configure eksctl, which is used for creating Kubernetes clusters.
  - Install kubectl.
To verify whether AWS CLI is configured correctly, run the aws configure list command. If the output shows the values for access_key and secret_key, AWS CLI is configured correctly. Otherwise, you need to re-configure AWS CLI.
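The check described above can be scripted. The following is a minimal sketch rather than an official tool: the function name aws_keys_configured is hypothetical, and it assumes that an unset value is printed as <not set> in the aws configure list output.

```shell
# Hypothetical helper: reads the output of `aws configure list` on stdin and
# succeeds only if both the access_key and secret_key rows carry a value.
# Assumption: an unset value is printed as "<not set>".
aws_keys_configured() {
  awk '
    $1 == "access_key" || $1 == "secret_key" {
      seen[$1] = 1
      if ($2 == "<not") missing = 1
    }
    END { exit !(seen["access_key"] && seen["secret_key"] && !missing) }
  '
}

# Usage (requires AWS CLI):
#   aws configure list | aws_keys_configured && echo "AWS CLI is configured"
```

The function only inspects text, so it can be tried against saved output before running it against a live AWS CLI profile.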
Recommended instance types and storage
- Instance types: for better performance, the following are recommended:
  - PD nodes: c7g.xlarge
  - TiDB nodes: c7g.4xlarge
  - TiKV or TiFlash nodes: m7g.4xlarge
- Storage: Because AWS supports the EBS gp3 volume type, it is recommended to use EBS gp3. For gp3 provisioning, the following is recommended:
  - TiKV: 400 MiB/s, 4000 IOPS
  - TiFlash: 625 MiB/s, 6000 IOPS
- AMI type: Amazon Linux 2
Create an EKS cluster and a node pool
According to the recommendations in the AWS Official Blog and the EKS Best Practice Document, because most TiDB cluster components use EBS volumes as storage, it is recommended to create a node pool in each availability zone (at least 3 in total) for each component when creating an EKS cluster.
Save the following configuration as the cluster.yaml file. Replace ${clusterName} with your desired cluster name. The cluster and node group names should match the regular expression [a-zA-Z][-a-zA-Z0-9]*, so avoid names that contain _.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${clusterName}
  region: ap-northeast-1
addons:
  - name: aws-ebs-csi-driver
nodeGroups:
  - name: admin
    desiredCapacity: 1
    privateNetworking: true
    labels:
      dedicated: admin
    iam:
      withAddonPolicies:
        ebs: true
  - name: tidb-1a
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1a"]
    instanceType: c5.2xlarge
    labels:
      dedicated: tidb
    taints:
      dedicated: tidb:NoSchedule
    iam:
      withAddonPolicies:
        ebs: true
  - name: tidb-1d
    desiredCapacity: 0
    privateNetworking: true
    availabilityZones: ["ap-northeast-1d"]
    instanceType: c5.2xlarge
    labels:
      dedicated: tidb
    taints:
      dedicated: tidb:NoSchedule
    iam:
      withAddonPolicies:
        ebs: true
  - name: tidb-1c
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1c"]
    instanceType: c5.2xlarge
    labels:
      dedicated: tidb
    taints:
      dedicated: tidb:NoSchedule
    iam:
      withAddonPolicies:
        ebs: true
  - name: pd-1a
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1a"]
    instanceType: c7g.xlarge
    labels:
      dedicated: pd
    taints:
      dedicated: pd:NoSchedule
    iam:
      withAddonPolicies:
        ebs: true
  - name: pd-1d
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1d"]
    instanceType: c7g.xlarge
    labels:
      dedicated: pd
    taints:
      dedicated: pd:NoSchedule
    iam:
      withAddonPolicies:
        ebs: true
  - name: pd-1c
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1c"]
    instanceType: c7g.xlarge
    labels:
      dedicated: pd
    taints:
      dedicated: pd:NoSchedule
    iam:
      withAddonPolicies:
        ebs: true
  - name: tikv-1a
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1a"]
    instanceType: r5b.2xlarge
    labels:
      dedicated: tikv
    taints:
      dedicated: tikv:NoSchedule
    iam:
      withAddonPolicies:
        ebs: true
  - name: tikv-1d
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1d"]
    instanceType: r5b.2xlarge
    labels:
      dedicated: tikv
    taints:
      dedicated: tikv:NoSchedule
    iam:
      withAddonPolicies:
        ebs: true
  - name: tikv-1c
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1c"]
    instanceType: r5b.2xlarge
    labels:
      dedicated: tikv
    taints:
      dedicated: tikv:NoSchedule
    iam:
      withAddonPolicies:
        ebs: true
By default, only two TiDB nodes are required, so you can set the desiredCapacity of the tidb-1d node group to 0. You can scale out this node group any time if necessary.
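Before creating the cluster, the naming rule for cluster and node group names can be checked locally. The following is a small sketch; the function name valid_eks_name and the sample names are hypothetical, and the pattern is the regular expression quoted earlier, anchored so that it must match the whole name.

```shell
# Hypothetical helper: succeeds if the name matches [a-zA-Z][-a-zA-Z0-9]*
# in full, as required for cluster and node group names.
valid_eks_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-zA-Z][-a-zA-Z0-9]*$'
}

# Example: report which candidate names are acceptable.
for name in tidb-1a tikv_1a 1pd; do
  if valid_eks_name "$name"; then
    echo "$name: ok"
  else
    echo "$name: invalid"
  fi
done
```

Names that contain an underscore or start with a digit are reported as invalid.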
Execute the following command to create the cluster:
eksctl create cluster -f cluster.yaml
After running the command, wait until the EKS cluster is created and the node groups are created and added to the cluster. This process might take 5 to 20 minutes. For more cluster configuration options, refer to the eksctl documentation.
Configure StorageClass
This section describes how to configure the storage class for different storage types. These storage types are:
- The default gp2 storage type after creating the EKS cluster.
- The gp3 storage type (recommended) or other EBS storage types.
- The local storage used for testing bare-metal performance.
Configure gp2
note:
Starting from EKS Kubernetes 1.23, you need to deploy the EBS CSI driver before using the default gp2 storage class. For details, refer to the notice for Amazon EKS Kubernetes 1.23.
After you create an EKS cluster, the default StorageClass is gp2. To improve I/O write performance, it is recommended to configure nodelalloc and noatime in the mountOptions field of the StorageClass resource.
kind: StorageClass
apiVersion: storage.k8s.io/v1
# ...
mountOptions:
- nodelalloc
- noatime
For more information on the mount options, see TiDB Environment and System Configuration Check.
Configure gp3 (recommended) or other EBS storage types
If you do not want to use the default gp2 storage type, you can create a StorageClass for other storage types. For example, you can use the gp3 (recommended) or io1 storage type.
The following example shows how to create and configure a StorageClass for the gp3 storage type:
Deploy the AWS EBS Container Storage Interface (CSI) driver on the EKS cluster. If you are using a storage type other than gp3, skip this step.
Set the ebs-csi-node toleration:
kubectl patch -n kube-system ds ebs-csi-node -p '{"spec":{"template":{"spec":{"tolerations":[{"operator":"Exists"}]}}}}'
Expected output:
daemonset.apps/ebs-csi-node patched
Create a StorageClass resource. In the resource definition, specify your desired storage type in the parameters.type field.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  fsType: ext4
  iops: "4000"
  throughput: "400"
mountOptions:
  - nodelalloc
  - noatime
In the TidbCluster YAML file, configure gp3 in the storageClassName field. For example:
spec:
  tikv:
    ...
    storageClassName: gp3
To improve I/O write performance, it is recommended to configure nodelalloc and noatime in the mountOptions field of the StorageClass resource.
kind: StorageClass
apiVersion: storage.k8s.io/v1
# ...
mountOptions:
  - nodelalloc
  - noatime
For more information on the mount options, see TiDB Environment and System Configuration Check.
For more information on the EBS storage types and configuration, refer to Amazon EBS volume types and Storage Classes.
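The gp3 values recommended earlier differ per component (TiKV: 400 MiB/s and 4000 IOPS; TiFlash: 625 MiB/s and 6000 IOPS), so it can be convenient to generate one StorageClass per component from a single template. The following is a sketch under assumptions: the function name render_gp3_sc and the class names gp3-tikv and gp3-tiflash are illustrative and do not come from TiDB Operator; the manifest fields are the ones shown in this section.

```shell
# Hypothetical helper: prints a gp3 StorageClass manifest.
# Arguments: <name> <iops> <throughput in MiB/s>
render_gp3_sc() {
  cat <<EOF
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: $1
provisioner: ebs.csi.aws.com
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  fsType: ext4
  iops: "$2"
  throughput: "$3"
mountOptions:
  - nodelalloc
  - noatime
EOF
}

# Usage (requires kubectl; class names are illustrative):
#   render_gp3_sc gp3-tikv 4000 400 | kubectl apply -f -
#   render_gp3_sc gp3-tiflash 6000 625 | kubectl apply -f -
```

If you use per-component class names like these, set the matching name in the storageClassName field of each component in the TidbCluster YAML file.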
Configure local storage
Local storage is used for testing bare-metal performance. For higher IOPS and lower latency, you can choose NVMe SSD volumes offered by some AWS instances for the TiKV node pool. However, for the production environment, use AWS EBS as your storage type.
For instance types that provide NVMe SSD volumes, check out Amazon EC2 Instance Types.
The following c5d.4xlarge example shows how to configure StorageClass for the local storage:
Create a node group with local storage for TiKV.
In the eksctl configuration file, modify the instance type of the TiKV node group to c5d.4xlarge:
  - name: tikv-1a
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1a"]
    instanceType: c5d.4xlarge
    labels:
      dedicated: tikv
    taints:
      dedicated: tikv:NoSchedule
    iam:
      withAddonPolicies:
        ebs: true
  ...
Create a node group with local storage:
eksctl create nodegroups -f cluster.yaml
If the TiKV node group already exists, to avoid name conflict, you can take either of the following actions:
- Delete the old group and create a new one.
- Change the group name.
Deploy local volume provisioner.
To conveniently discover and manage local storage volumes, install local-volume-provisioner.
Mount the local storage to the /mnt/ssd directory.
According to the mounting configuration, modify the local-volume-provisioner.yaml file.
Deploy and create a local-storage storage class using the modified local-volume-provisioner.yaml file.
kubectl apply -f <local-volume-provisioner.yaml>
Use the local storage.
After you complete the previous step, local-volume-provisioner can discover all the local NVMe SSD volumes in the cluster.
After local-volume-provisioner discovers the local volumes, when you deploy a TiDB cluster and the monitoring component, you need to add the tikv.storageClassName field to tidb-cluster.yaml and set the field value to local-storage.
Deploy TiDB Operator
To deploy TiDB Operator in the EKS cluster, refer to the Deploy TiDB Operator section in Getting Started.
Deploy a TiDB cluster and the monitoring component
This section describes how to deploy a TiDB cluster and its monitoring component in AWS EKS.
Create namespace
To create a namespace to deploy the TiDB cluster, run the following command:
kubectl create namespace tidb-cluster
Deploy
First, download the sample TidbCluster and TidbMonitor configuration files:
curl -O https://raw.githubusercontent.com/pingcap/tidb-operator/v1.5.4/examples/aws/tidb-cluster.yaml && \
curl -O https://raw.githubusercontent.com/pingcap/tidb-operator/v1.5.4/examples/aws/tidb-monitor.yaml && \
curl -O https://raw.githubusercontent.com/pingcap/tidb-operator/v1.5.4/examples/aws/tidb-dashboard.yaml
To further customize and configure the CR before applying it, refer to Configure a TiDB Cluster.
To deploy the TidbCluster and TidbMonitor CR in the EKS cluster, run the following command:
kubectl apply -f tidb-cluster.yaml -n tidb-cluster && \
kubectl apply -f tidb-monitor.yaml -n tidb-cluster
After the YAML files above are applied to the Kubernetes cluster, TiDB Operator creates the desired TiDB cluster and its monitoring component according to the YAML files.
View the cluster status
To view the status of the starting TiDB cluster, run the following command:
kubectl get pods -n tidb-cluster
When all the Pods are in the Running or Ready state, the TiDB cluster is successfully started. For example:
NAME                              READY   STATUS    RESTARTS   AGE
tidb-discovery-5cb8474d89-n8cxk   1/1     Running   0          47h
tidb-monitor-6fbcc68669-dsjlc     3/3     Running   0          47h
tidb-pd-0                         1/1     Running   0          47h
tidb-pd-1                         1/1     Running   0          46h
tidb-pd-2                         1/1     Running   0          46h
tidb-tidb-0                       2/2     Running   0          47h
tidb-tidb-1                       2/2     Running   0          46h
tidb-tikv-0                       1/1     Running   0          47h
tidb-tikv-1                       1/1     Running   0          47h
tidb-tikv-2                       1/1     Running   0          47h
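The readiness check above can also be done by a script instead of by eye. The following is a minimal sketch: the function name pods_all_ready is hypothetical, and it treats a Pod as ready only if its STATUS is Running and both numbers in its READY column are equal, which is a stricter reading than the sentence above.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin.
# Prints every Pod that is not Running or whose READY count is incomplete,
# and fails if any such Pod exists or if no Pods are listed at all.
pods_all_ready() {
  awk '
    {
      total++
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) { print $1; bad++ }
    }
    END { exit (total == 0 || bad > 0) }
  '
}

# Usage (requires kubectl):
#   kubectl get pods -n tidb-cluster --no-headers | pods_all_ready && echo "cluster is up"
```

Pods that belong to finished jobs show a Completed status and would be reported as not ready by this sketch, so filter them out first if your namespace contains any.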
Access the database
After you have deployed a TiDB cluster, you can access the TiDB database to test or develop your application.
Prepare a bastion host
The LoadBalancer created for your TiDB cluster is an intranet LoadBalancer. You can create a bastion host in the cluster VPC to access the database. To create a bastion host on AWS console, refer to AWS documentation.
Select the cluster's VPC and Subnet, and verify whether the cluster name is correct in the dropdown box. You can view the cluster's VPC and Subnet by running the following command:
eksctl get cluster -n ${clusterName}
Allow the bastion host to access the Internet. Select the correct key pair so that you can log in to the host via SSH.
Install the MySQL client and connect
After the bastion host is created, you can connect to the bastion host via SSH and access the TiDB cluster via the MySQL client.
Log in to the bastion host via SSH:
ssh [-i /path/to/your/private-key.pem] ec2-user@<bastion-public-dns-name>
Install the MySQL client on the bastion host:
sudo yum install mysql -y
Connect the client to the TiDB cluster:
mysql --comments -h ${tidb-nlb-dnsname} -P 4000 -u root
${tidb-nlb-dnsname} is the LoadBalancer domain name of the TiDB service. You can view the domain name in the EXTERNAL-IP field by executing kubectl get svc basic-tidb -n tidb-cluster.
For example:
$ mysql --comments -h abfc623004ccb4cc3b363f3f37475af1-9774d22c27310bc1.elb.us-west-2.amazonaws.com -P 4000 -u root
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 1189
Server version: 8.0.11-TiDB-v7.5.3 TiDB Server (Apache License 2.0) Community Edition, MySQL 8.0 compatible
Copyright (c) 2000, 2022, Oracle and/or its affiliates.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)]> show status;
+--------------------+--------------------------------------+
| Variable_name      | Value                                |
+--------------------+--------------------------------------+
| Ssl_cipher         |                                      |
| Ssl_cipher_list    |                                      |
| Ssl_verify_mode    | 0                                    |
| Ssl_version        |                                      |
| ddl_schema_version | 22                                   |
| server_id          | ed4ba88b-436a-424d-9087-977e897cf5ec |
+--------------------+--------------------------------------+
6 rows in set (0.00 sec)
Access the Grafana monitoring dashboard
Obtain the LoadBalancer domain name of Grafana:
kubectl -n tidb-cluster get svc basic-grafana
For example:
$ kubectl -n tidb-cluster get svc basic-grafana
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
basic-grafana LoadBalancer 10.100.199.42 a806cfe84c12a4831aa3313e792e3eed-1964630135.us-west-2.elb.amazonaws.com 3000:30761/TCP 121m
In the output above, the EXTERNAL-IP column is the LoadBalancer domain name.
You can access the ${grafana-lb}:3000 address using your web browser to view monitoring metrics. Replace ${grafana-lb} with the LoadBalancer domain name.
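To avoid copying the domain name by hand, the EXTERNAL-IP column can be extracted from the kubectl output. The following is a sketch: the function name svc_external_host is hypothetical, and it assumes the default tabular output with EXTERNAL-IP as the fourth column, as in the example above. It fails when the column holds a placeholder in angle brackets (such as a pending load balancer) instead of a host name.

```shell
# Hypothetical helper: reads `kubectl get svc <name>` output on stdin and
# prints the EXTERNAL-IP column (4th field) of the first data row.
# Fails if the value is missing or is a placeholder such as <pending>.
svc_external_host() {
  awk '
    NR == 2 && $4 != "" && $4 !~ /^</ { print $4; found = 1 }
    END { exit !found }
  '
}

# Usage (requires kubectl; the http scheme is an assumption):
#   grafana_lb=$(kubectl -n tidb-cluster get svc basic-grafana | svc_external_host)
#   echo "http://${grafana_lb}:3000"
```

The same function works for the TiDB service, for example with the output of kubectl get svc basic-tidb -n tidb-cluster.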
Access the TiDB Dashboard
See Access TiDB Dashboard for instructions about how to securely allow access to the TiDB Dashboard.
Upgrade
To upgrade the TiDB cluster, execute the following command:
kubectl patch tc basic -n tidb-cluster --type merge -p '{"spec":{"version":"${version}"}}'
The upgrade process does not finish immediately. You can watch the upgrade progress by executing kubectl get pods -n tidb-cluster --watch.
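Note that ${version} in the upgrade command is a placeholder: inside single quotes the shell does not expand it, so it has to be replaced by hand. The following sketch builds the patch from an argument instead. The function name version_patch is hypothetical, and v7.5.3 is only an example version taken from the sample output earlier in this document.

```shell
# Hypothetical helper: prints the merge patch that sets spec.version.
# Argument: <target TiDB version>
version_patch() {
  printf '{"spec":{"version":"%s"}}\n' "$1"
}

# Usage (requires kubectl; v7.5.3 is only an example version):
#   kubectl patch tc basic -n tidb-cluster --type merge -p "$(version_patch v7.5.3)"
```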
Scale out
Before scaling out the cluster, you need to scale out the corresponding node group so that the new instances have enough resources for operation.
This section describes how to scale out the EKS node group and TiDB components.
Scale out EKS node group
When scaling out TiKV, the node groups must be scaled out evenly among the different availability zones. The following example shows how to scale out the tikv-1a, tikv-1c, and tikv-1d groups of the ${clusterName} cluster to 2 nodes:
eksctl scale nodegroup --cluster ${clusterName} --name tikv-1a --nodes 2 --nodes-min 2 --nodes-max 2
eksctl scale nodegroup --cluster ${clusterName} --name tikv-1c --nodes 2 --nodes-min 2 --nodes-max 2
eksctl scale nodegroup --cluster ${clusterName} --name tikv-1d --nodes 2 --nodes-min 2 --nodes-max 2
For more information on managing node groups, refer to the eksctl documentation.
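Because the three commands differ only in the zone suffix, they can be generated in a loop, which helps keep the groups evenly sized. The following is a sketch: the function name scale_cmds and the cluster name my-cluster are hypothetical, and the zone suffixes are the ones used in the cluster.yaml of this document. The function only prints the commands so that they can be reviewed before being run.

```shell
# Hypothetical helper: prints one `eksctl scale nodegroup` command per
# availability zone so that a component's node groups stay evenly sized.
# Arguments: <cluster name> <component> <nodes per group>
scale_cmds() {
  for zone in 1a 1c 1d; do
    echo "eksctl scale nodegroup --cluster $1 --name $2-$zone --nodes $3 --nodes-min $3 --nodes-max $3"
  done
}

# Print the commands for review (my-cluster is a placeholder name):
scale_cmds my-cluster tikv 2
# To run them (requires eksctl): scale_cmds my-cluster tikv 2 | sh
```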
Scale out TiDB components
After scaling out the EKS node group, execute kubectl edit tc basic -n tidb-cluster, and modify each component's replicas to the desired number of replicas. The scaling-out process is then completed.
Deploy TiFlash/TiCDC
TiFlash is the columnar storage extension of TiKV.
TiCDC is a tool for replicating the incremental data of TiDB by pulling TiKV change logs.
The two components are not required in the deployment. This section shows a quick start example.
Add node groups
In the configuration file of eksctl (cluster.yaml), add the following items to create node groups for TiFlash and TiCDC respectively. desiredCapacity is the number of nodes you desire.
  - name: tiflash-1a
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1a"]
    labels:
      dedicated: tiflash
    taints:
      dedicated: tiflash:NoSchedule
    iam:
      withAddonPolicies:
        ebs: true
  - name: tiflash-1d
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1d"]
    labels:
      dedicated: tiflash
    taints:
      dedicated: tiflash:NoSchedule
    iam:
      withAddonPolicies:
        ebs: true
  - name: tiflash-1c
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1c"]
    labels:
      dedicated: tiflash
    taints:
      dedicated: tiflash:NoSchedule
    iam:
      withAddonPolicies:
        ebs: true
  - name: ticdc-1a
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1a"]
    labels:
      dedicated: ticdc
    taints:
      dedicated: ticdc:NoSchedule
    iam:
      withAddonPolicies:
        ebs: true
  - name: ticdc-1d
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1d"]
    labels:
      dedicated: ticdc
    taints:
      dedicated: ticdc:NoSchedule
    iam:
      withAddonPolicies:
        ebs: true
  - name: ticdc-1c
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1c"]
    labels:
      dedicated: ticdc
    taints:
      dedicated: ticdc:NoSchedule
    iam:
      withAddonPolicies:
        ebs: true
Depending on the EKS cluster status, use different commands:
- If the cluster is not created, execute eksctl create cluster -f cluster.yaml to create the cluster and node groups.
- If the cluster is already created, execute eksctl create nodegroup -f cluster.yaml to create the node groups. The existing node groups are ignored and will not be created again.
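The TiFlash and TiCDC node group entries differ only in the component name and the availability zone, so they can be generated rather than typed. The following is a sketch with the same fields as the entries above: the function name render_nodegroup is hypothetical, and the group name suffix is derived from the last part of the zone name (for example, 1a from ap-northeast-1a).

```shell
# Hypothetical helper: prints one nodeGroups entry with the fields used above.
# Arguments: <component> <availability zone>, for example: tiflash ap-northeast-1a
render_nodegroup() {
  suffix=${2##*-}   # "ap-northeast-1a" -> "1a"
  cat <<EOF
  - name: $1-$suffix
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["$2"]
    labels:
      dedicated: $1
    taints:
      dedicated: $1:NoSchedule
    iam:
      withAddonPolicies:
        ebs: true
EOF
}

# Print the six entries for TiFlash and TiCDC:
for component in tiflash ticdc; do
  for az in ap-northeast-1a ap-northeast-1d ap-northeast-1c; do
    render_nodegroup "$component" "$az"
  done
done
```

The printed entries are meant to be pasted under nodeGroups in cluster.yaml; review them and adjust desiredCapacity before creating the node groups.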
Configure and deploy
To deploy TiFlash, configure spec.tiflash in tidb-cluster.yaml:
spec:
  ...
  tiflash:
    baseImage: pingcap/tiflash
    maxFailoverCount: 0
    replicas: 1
    storageClaims:
    - resources:
        requests:
          storage: 100Gi
    tolerations:
    - effect: NoSchedule
      key: dedicated
      operator: Equal
      value: tiflash
For other parameters, refer to Configure a TiDB Cluster.
To deploy TiCDC, configure spec.ticdc in tidb-cluster.yaml:
spec:
  ...
  ticdc:
    baseImage: pingcap/ticdc
    replicas: 1
    tolerations:
    - effect: NoSchedule
      key: dedicated
      operator: Equal
      value: ticdc
Modify replicas according to your needs.
Finally, execute kubectl -n tidb-cluster apply -f tidb-cluster.yaml to update the TiDB cluster configuration.
For detailed CR configuration, refer to API references and Configure a TiDB Cluster.
Configure TiDB monitoring
For more information, see Deploy monitoring and alerts for a TiDB cluster.
Collect logs
System and application logs can be useful for troubleshooting issues and automating operations. By default, TiDB components output logs to the container's stdout and stderr, and log rotation is automatically performed based on the container runtime environment. When a Pod restarts, container logs will be lost. To prevent log loss, it is recommended to Collect logs of TiDB and its related components.