# Scale a TiDB Cluster Using TiUP
The capacity of a TiDB cluster can be increased or decreased without interrupting the online services.
This document describes how to scale the TiDB, TiKV, PD, TiCDC, or TiFlash cluster using TiUP. If you have not installed TiUP, refer to the steps in Step 2. Deploy TiUP on the control machine.
To view the current cluster name list, run `tiup cluster list`.
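
For instance, the output looks roughly like the following. The cluster name `tidb-test` and the paths are illustrative, and the exact columns can vary by TiUP version:

```shell
tiup cluster list
# Name       User  Version  Path                                            PrivateKey
# ----       ----  -------  ----                                            ----------
# tidb-test  tidb  v8.2.0   /root/.tiup/storage/cluster/clusters/tidb-test  /root/.tiup/storage/cluster/clusters/tidb-test/ssh/id_rsa
```
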
For example, if the original topology of the cluster is as follows:

| Host IP  | Service        |
| -------- | -------------- |
| 10.0.1.3 | TiDB + TiFlash |
| 10.0.1.4 | TiDB + PD      |
| 10.0.1.5 | TiKV + Monitor |
| 10.0.1.1 | TiKV           |
| 10.0.1.2 | TiKV           |

## Scale out a TiDB/PD/TiKV cluster

This section exemplifies how to add a TiDB node to the `10.0.1.5` host.

1. Configure the scale-out topology:

    Add the scale-out topology configuration in the `scale-out.yml` file:

    ```shell
    vi scale-out.yml
    ```

    ```yaml
    tidb_servers:
      - host: 10.0.1.5
        ssh_port: 22
        port: 4000
        status_port: 10080
        deploy_dir: /tidb-deploy/tidb-4000
        log_dir: /tidb-deploy/tidb-4000/log
    ```

    Here is a TiKV configuration file template:

    ```yaml
    tikv_servers:
      - host: 10.0.1.5
        ssh_port: 22
        port: 20160
        status_port: 20180
        deploy_dir: /tidb-deploy/tikv-20160
        data_dir: /tidb-data/tikv-20160
        log_dir: /tidb-deploy/tikv-20160/log
    ```

    Here is a PD configuration file template:

    ```yaml
    pd_servers:
      - host: 10.0.1.5
        ssh_port: 22
        name: pd-1
        client_port: 2379
        peer_port: 2380
        deploy_dir: /tidb-deploy/pd-2379
        data_dir: /tidb-data/pd-2379
        log_dir: /tidb-deploy/pd-2379/log
    ```

    To view the configuration of the current cluster, run `tiup cluster edit-config <cluster-name>`. The parameter configuration of `global` and `server_configs` is inherited by `scale-out.yml` and thus also takes effect in `scale-out.yml`.

2. Run the scale-out command:

    Before you run the `scale-out` command, use the `check` and `check --apply` commands to detect and automatically repair potential risks in the cluster:

    1. Check for potential risks:

        ```shell
        tiup cluster check <cluster-name> scale-out.yml --cluster --user root [-p] [-i /home/root/.ssh/gcp_rsa]
        ```

    2. Enable automatic repair:

        ```shell
        tiup cluster check <cluster-name> scale-out.yml --cluster --apply --user root [-p] [-i /home/root/.ssh/gcp_rsa]
        ```

    3. Run the `scale-out` command:

        ```shell
        tiup cluster scale-out <cluster-name> scale-out.yml [-p] [-i /home/root/.ssh/gcp_rsa]
        ```

    In the preceding commands:

    - `scale-out.yml` is the scale-out configuration file.
    - `--user root` indicates logging in to the target machine as the `root` user to complete the cluster scale-out. The `root` user is expected to have `ssh` and `sudo` privileges to the target machine. Alternatively, you can use other users with `ssh` and `sudo` privileges to complete the deployment.
    - `[-i]` and `[-p]` are optional. If you have configured login to the target machine without a password, these parameters are not required. If not, choose one of the two parameters. `[-i]` is the private key of the `root` user (or another user specified by `--user`) that has access to the target machine. `[-p]` is used to input the user password interactively.
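
    For example, a fully spelled-out invocation might look like the following sketch. The cluster name `tidb-test` is an illustrative placeholder, not a value taken from this document:

    ```shell
    # Scale out the cluster named "tidb-test" (hypothetical name) as the root user,
    # authenticating with a private key instead of an interactive password prompt.
    tiup cluster scale-out tidb-test scale-out.yml --user root -i /home/root/.ssh/gcp_rsa
    ```
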
    If you see `Scaled cluster <cluster-name> out successfully`, the scale-out operation succeeds.

3. Refresh the cluster configuration.

    1. Refresh the cluster configuration:

        ```shell
        tiup cluster reload <cluster-name> --skip-restart
        ```

    2. Refresh the Prometheus configuration and restart Prometheus:

        ```shell
        tiup cluster reload <cluster-name> -R prometheus
        ```

4. Check the cluster status:

    ```shell
    tiup cluster display <cluster-name>
    ```

    Access the monitoring platform at http://10.0.1.5:3000 using your browser to monitor the status of the cluster and the new node.
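
    If you only want to confirm the newly added TiDB instance, you can narrow the output by role. This assumes your TiUP version supports the `-R` (role) filter of `display`:

    ```shell
    # Show only the TiDB instances of the cluster.
    tiup cluster display <cluster-name> -R tidb
    ```
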
After the scale-out, the cluster topology is as follows:

| Host IP  | Service               |
| -------- | --------------------- |
| 10.0.1.3 | TiDB + TiFlash        |
| 10.0.1.4 | TiDB + PD             |
| 10.0.1.5 | TiDB + TiKV + Monitor |
| 10.0.1.1 | TiKV                  |
| 10.0.1.2 | TiKV                  |

## Scale out a TiFlash cluster

This section exemplifies how to add a TiFlash node to the `10.0.1.4` host.

1. Add the node information to the `scale-out.yml` file:

    Create the `scale-out.yml` file to add the TiFlash node information (a more explicit example appears after this procedure):

    ```yaml
    tiflash_servers:
      - host: 10.0.1.4
    ```

    Currently, you can only add IP addresses but not domain names.

2. Run the scale-out command:

    ```shell
    tiup cluster scale-out <cluster-name> scale-out.yml
    ```

3. View the cluster status:

    ```shell
    tiup cluster display <cluster-name>
    ```

    Access the monitoring platform at http://10.0.1.5:3000 using your browser, and view the status of the cluster and the new node.
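
If you prefer to state the deployment and data directories of the new TiFlash node explicitly rather than relying on the defaults, a sketch of a fuller `tiflash_servers` entry might look like this. The directory paths follow the naming pattern used elsewhere in this document but are assumptions; adjust them to your environment:

```shell
# Write a more explicit TiFlash entry to scale-out.yml (illustrative values).
cat > scale-out.yml <<'EOF'
tiflash_servers:
  - host: 10.0.1.4
    ssh_port: 22
    deploy_dir: /tidb-deploy/tiflash-9000
    data_dir: /tidb-data/tiflash-9000
    log_dir: /tidb-deploy/tiflash-9000/log
EOF
```
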
After the scale-out, the cluster topology is as follows:

| Host IP  | Service               |
| -------- | --------------------- |
| 10.0.1.3 | TiDB + TiFlash        |
| 10.0.1.4 | TiDB + PD + TiFlash   |
| 10.0.1.5 | TiDB + TiKV + Monitor |
| 10.0.1.1 | TiKV                  |
| 10.0.1.2 | TiKV                  |

## Scale out a TiCDC cluster

This section exemplifies how to add two TiCDC nodes to the `10.0.1.3` and `10.0.1.4` hosts.

1. Add the node information to the `scale-out.yml` file:

    Create the `scale-out.yml` file to add the TiCDC node information.

    ```yaml
    cdc_servers:
      - host: 10.0.1.3
        gc-ttl: 86400
        data_dir: /tidb-data/cdc-8300
      - host: 10.0.1.4
        gc-ttl: 86400
        data_dir: /tidb-data/cdc-8300
    ```

2. Run the scale-out command:

    ```shell
    tiup cluster scale-out <cluster-name> scale-out.yml
    ```

3. View the cluster status:

    ```shell
    tiup cluster display <cluster-name>
    ```

    Access the monitoring platform at http://10.0.1.5:3000 using your browser, and view the status of the cluster and the new nodes.
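
    To check just the two new TiCDC instances rather than the whole cluster, you can filter the display output by node ID, assuming your TiUP version supports the `-N` (node) filter:

    ```shell
    # Show only the two TiCDC instances that were just added.
    tiup cluster display <cluster-name> -N 10.0.1.3:8300,10.0.1.4:8300
    ```
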
After the scale-out, the cluster topology is as follows:

| Host IP  | Service                     |
| -------- | --------------------------- |
| 10.0.1.3 | TiDB + TiFlash + TiCDC      |
| 10.0.1.4 | TiDB + PD + TiFlash + TiCDC |
| 10.0.1.5 | TiDB + TiKV + Monitor       |
| 10.0.1.1 | TiKV                        |
| 10.0.1.2 | TiKV                        |

## Scale in a TiDB/PD/TiKV cluster

This section exemplifies how to remove a TiKV node from the `10.0.1.5` host.

1. View the node ID information:

    ```shell
    tiup cluster display <cluster-name>
    ```

    ```
    Starting /root/.tiup/components/cluster/v1.12.3/cluster display <cluster-name>
    TiDB Cluster: <cluster-name>
    TiDB Version: v8.2.0
    ID              Role          Host      Ports                            Status   Data Dir                Deploy Dir
    --              ----          ----      -----                            ------   --------                ----------
    10.0.1.3:8300   cdc           10.0.1.3  8300                             Up       data/cdc-8300           deploy/cdc-8300
    10.0.1.4:8300   cdc           10.0.1.4  8300                             Up       data/cdc-8300           deploy/cdc-8300
    10.0.1.4:2379   pd            10.0.1.4  2379/2380                        Healthy  data/pd-2379            deploy/pd-2379
    10.0.1.1:20160  tikv          10.0.1.1  20160/20180                      Up       data/tikv-20160         deploy/tikv-20160
    10.0.1.2:20160  tikv          10.0.1.2  20160/20180                      Up       data/tikv-20160         deploy/tikv-20160
    10.0.1.5:20160  tikv          10.0.1.5  20160/20180                      Up       data/tikv-20160         deploy/tikv-20160
    10.0.1.3:4000   tidb          10.0.1.3  4000/10080                       Up       -                       deploy/tidb-4000
    10.0.1.4:4000   tidb          10.0.1.4  4000/10080                       Up       -                       deploy/tidb-4000
    10.0.1.5:4000   tidb          10.0.1.5  4000/10080                       Up       -                       deploy/tidb-4000
    10.0.1.3:9000   tiflash       10.0.1.3  9000/8123/3930/20170/20292/8234  Up       data/tiflash-9000       deploy/tiflash-9000
    10.0.1.4:9000   tiflash       10.0.1.4  9000/8123/3930/20170/20292/8234  Up       data/tiflash-9000       deploy/tiflash-9000
    10.0.1.5:9090   prometheus    10.0.1.5  9090                             Up       data/prometheus-9090    deploy/prometheus-9090
    10.0.1.5:3000   grafana       10.0.1.5  3000                             Up       -                       deploy/grafana-3000
    10.0.1.5:9093   alertmanager  10.0.1.5  9093/9294                        Up       data/alertmanager-9093  deploy/alertmanager-9093
    ```

2. Run the scale-in command:

    ```shell
    tiup cluster scale-in <cluster-name> --node 10.0.1.5:20160
    ```

    The `--node` parameter is the ID of the node to be taken offline.

    If you see `Scaled cluster <cluster-name> in successfully`, the scale-in operation succeeds.

3. Refresh the cluster configuration.

    1. Refresh the cluster configuration:

        ```shell
        tiup cluster reload <cluster-name> --skip-restart
        ```

    2. Refresh the Prometheus configuration and restart Prometheus:

        ```shell
        tiup cluster reload <cluster-name> -R prometheus
        ```

4. Check the cluster status:

    The scale-in process takes some time. You can run the following command to check the scale-in status:

    ```shell
    tiup cluster display <cluster-name>
    ```

    If the node to be scaled in becomes `Tombstone`, the scale-in operation succeeds.

    Access the monitoring platform at http://10.0.1.5:3000 using your browser, and view the status of the cluster.
The current topology is as follows:

| Host IP  | Service                          |
| -------- | -------------------------------- |
| 10.0.1.3 | TiDB + TiFlash + TiCDC           |
| 10.0.1.4 | TiDB + PD + TiFlash + TiCDC      |
| 10.0.1.5 | TiDB + Monitor (TiKV is deleted) |
| 10.0.1.1 | TiKV                             |
| 10.0.1.2 | TiKV                             |

## Scale in a TiFlash cluster

This section exemplifies how to remove a TiFlash node from the `10.0.1.4` host.

### 1. Adjust the number of replicas of the tables according to the number of remaining TiFlash nodes

1. Query whether any table has more TiFlash replicas than the number of TiFlash nodes after the scale-in. `tobe_left_nodes` means the number of TiFlash nodes after the scale-in. If the query result is empty, you can start scaling in TiFlash. If the query result is not empty, you need to modify the number of TiFlash replicas of the related table(s).

    ```sql
    SELECT * FROM information_schema.tiflash_replica WHERE REPLICA_COUNT > 'tobe_left_nodes';
    ```

2. Execute the following statement for all tables with more TiFlash replicas than the number of TiFlash nodes after the scale-in. `new_replica_num` must be less than or equal to `tobe_left_nodes`:

    ```sql
    ALTER TABLE <db-name>.<table-name> SET TIFLASH REPLICA 'new_replica_num';
    ```

3. Perform step 1 again and make sure that no table has more TiFlash replicas than the number of TiFlash nodes after the scale-in.

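As a concrete, hedged illustration: suppose the cluster currently has two TiFlash nodes and one of them will be removed, so `tobe_left_nodes` is 1. The connection details and the table name `test.sbtest1` below are hypothetical; any MySQL-compatible client works:

```shell
# Find tables whose TiFlash replica count exceeds the single node that will remain (illustrative connection details).
mysql -h 10.0.1.4 -P 4000 -u root -e \
  "SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT FROM information_schema.tiflash_replica WHERE REPLICA_COUNT > 1;"

# For each table returned, lower its TiFlash replica count to at most 1 (test.sbtest1 is a hypothetical table).
mysql -h 10.0.1.4 -P 4000 -u root -e \
  "ALTER TABLE test.sbtest1 SET TIFLASH REPLICA 1;"
```
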
### 2. Perform the scale-in operation

Perform the scale-in operation with one of the following solutions.

#### Solution 1. Use TiUP to remove a TiFlash node

1. Confirm the name of the node to be taken down:

    ```shell
    tiup cluster display <cluster-name>
    ```

2. Remove the TiFlash node (assume that the node name is `10.0.1.4:9000` from Step 1):

    ```shell
    tiup cluster scale-in <cluster-name> --node 10.0.1.4:9000
    ```

#### Solution 2. Manually remove a TiFlash node

In special cases (such as when a node needs to be forcibly taken down), or if the TiUP scale-in operation fails, you can manually remove a TiFlash node with the following steps.

1. Use the store command of pd-ctl to view the store ID corresponding to this TiFlash node.

    Enter the store command in pd-ctl (the binary file is under `resources/bin` in the tidb-ansible directory).

    If you use TiUP deployment, replace `pd-ctl` with `tiup ctl:v<CLUSTER_VERSION> pd`:

    ```shell
    tiup ctl:v<CLUSTER_VERSION> pd -u http://<pd_ip>:<pd_port> store
    ```

    A sketch of how to pick the TiFlash store ID out of this output appears after this procedure.

2. Remove the TiFlash node in pd-ctl:

    Enter `store delete <store_id>` in pd-ctl (`<store_id>` is the store ID of the TiFlash node found in the previous step).

    If you use TiUP deployment, replace `pd-ctl` with `tiup ctl:v<CLUSTER_VERSION> pd`:

    ```shell
    tiup ctl:v<CLUSTER_VERSION> pd -u http://<pd_ip>:<pd_port> store delete <store_id>
    ```

3. Wait for the store of the TiFlash node to disappear or for the `state_name` to become `Tombstone` before you stop the TiFlash process.

4. Manually delete the TiFlash data files (the location can be found in the `data_dir` directory under the TiFlash configuration of the cluster topology file).

5. Delete the information about the TiFlash node that goes down from the cluster topology using the following command:

    ```shell
    tiup cluster scale-in <cluster-name> --node <tiflash_node_ip>:<tiflash_node_port> --force
    ```

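As referenced in step 1 of Solution 2, here is a hedged sketch of how you might pick the TiFlash store ID out of the `store` output. It assumes `jq` is installed and that the JSON keeps its usual `{"stores": [{"store": {...}}]}` shape, with TiFlash stores labeled `engine=tiflash`:

```shell
# List the store ID and address of every store labeled engine=tiflash (assumed output shape).
tiup ctl:v<CLUSTER_VERSION> pd -u http://<pd_ip>:<pd_port> store \
  | jq '.stores[] | select(any(.store.labels[]?; .key == "engine" and .value == "tiflash")) | {id: .store.id, address: .store.address}'
```
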
The steps to manually clean up the replication rules in PD are below:

1. View all data replication rules related to TiFlash in the current PD instance:

    ```shell
    curl http://<pd_ip>:<pd_port>/pd/api/v1/config/rules/group/tiflash
    ```

    ```json
    [
      {
        "group_id": "tiflash",
        "id": "table-45-r",
        "override": true,
        "start_key": "7480000000000000FF2D5F720000000000FA",
        "end_key": "7480000000000000FF2E00000000000000F8",
        "role": "learner",
        "count": 1,
        "label_constraints": [
          {
            "key": "engine",
            "op": "in",
            "values": [
              "tiflash"
            ]
          }
        ]
      }
    ]
    ```

2. Remove all data replication rules related to TiFlash. Take the rule whose `id` is `table-45-r` as an example. Delete it by the following command:

    ```shell
    curl -v -X DELETE http://<pd_ip>:<pd_port>/pd/api/v1/config/rule/tiflash/table-45-r
    ```

View the cluster status:

```shell
tiup cluster display <cluster-name>
```

Access the monitoring platform at http://10.0.1.5:3000 using your browser, and view the status of the cluster.
After the scale-in, the cluster topology is as follows:

| Host IP  | Service                                |
| -------- | -------------------------------------- |
| 10.0.1.3 | TiDB + TiFlash + TiCDC                 |
| 10.0.1.4 | TiDB + PD + TiCDC (TiFlash is deleted) |
| 10.0.1.5 | TiDB + Monitor                         |
| 10.0.1.1 | TiKV                                   |
| 10.0.1.2 | TiKV                                   |

## Scale in a TiCDC cluster

This section exemplifies how to remove the TiCDC node from the `10.0.1.4` host.

1. Take the node offline:

    ```shell
    tiup cluster scale-in <cluster-name> --node 10.0.1.4:8300
    ```

2. View the cluster status:

    ```shell
    tiup cluster display <cluster-name>
    ```

    Access the monitoring platform at http://10.0.1.5:3000 using your browser, and view the status of the cluster.
The current topology is as follows:

| Host IP  | Service                      |
| -------- | ---------------------------- |
| 10.0.1.3 | TiDB + TiFlash + TiCDC       |
| 10.0.1.4 | TiDB + PD (TiCDC is deleted) |
| 10.0.1.5 | TiDB + Monitor               |
| 10.0.1.1 | TiKV                         |
| 10.0.1.2 | TiKV                         |
