Scale the TiDB Cluster Using TiDB Ansible

The capacity of a TiDB cluster can be increased or decreased without affecting the online services.

Assume that the topology is as follows:

Name     Host IP        Services
node1    172.16.10.1    PD1
node2    172.16.10.2    PD2
node3    172.16.10.3    PD3, Monitor
node4    172.16.10.4    TiDB1
node5    172.16.10.5    TiDB2
node6    172.16.10.6    TiKV1
node7    172.16.10.7    TiKV2
node8    172.16.10.8    TiKV3
node9    172.16.10.9    TiKV4

Increase the capacity of a TiDB/TiKV node

For example, if you want to add two TiDB nodes (node101, node102) with the IP addresses 172.16.10.101 and 172.16.10.102, take the following steps:

  1. Edit the inventory.ini file and the hosts.ini file, and append the node information.

    • Edit the inventory.ini file:

      [tidb_servers]
      172.16.10.4
      172.16.10.5
      172.16.10.101
      172.16.10.102
      
      [pd_servers]
      172.16.10.1
      172.16.10.2
      172.16.10.3
      
      [tikv_servers]
      172.16.10.6
      172.16.10.7
      172.16.10.8
      172.16.10.9
      
      [monitored_servers]
      172.16.10.1
      172.16.10.2
      172.16.10.3
      172.16.10.4
      172.16.10.5
      172.16.10.6
      172.16.10.7
      172.16.10.8
      172.16.10.9
      172.16.10.101
      172.16.10.102
      
      [monitoring_servers]
      172.16.10.3
      
      [grafana_servers]
      172.16.10.3
      

      Now the topology is as follows:

      Name     Host IP        Services
      node1    172.16.10.1    PD1
      node2    172.16.10.2    PD2
      node3    172.16.10.3    PD3, Monitor
      node4    172.16.10.4    TiDB1
      node5    172.16.10.5    TiDB2
      node101  172.16.10.101  TiDB3
      node102  172.16.10.102  TiDB4
      node6    172.16.10.6    TiKV1
      node7    172.16.10.7    TiKV2
      node8    172.16.10.8    TiKV3
      node9    172.16.10.9    TiKV4

    • Edit the hosts.ini file:

      [servers]
      172.16.10.1
      172.16.10.2
      172.16.10.3
      172.16.10.4
      172.16.10.5
      172.16.10.6
      172.16.10.7
      172.16.10.8
      172.16.10.9
      172.16.10.101
      172.16.10.102
      
      [all:vars]
      username = tidb
      ntp_server = pool.ntp.org
      
  2. Initialize the newly added node.

    1. On the central control machine, configure SSH mutual trust and sudo rules for the deployment target machines:

      ansible-playbook -i hosts.ini create_users.yml -l 172.16.10.101,172.16.10.102 -u root -k
      
    2. Install the NTP service on the deployment target machine:

      ansible-playbook -i hosts.ini deploy_ntp.yml -u tidb -b
      
    3. Initialize the node on the deployment target machine:

      ansible-playbook bootstrap.yml -l 172.16.10.101,172.16.10.102
      
  3. Deploy the newly added node:

    ansible-playbook deploy.yml -l 172.16.10.101,172.16.10.102
    
  4. Start the newly added node:

    ansible-playbook start.yml -l 172.16.10.101,172.16.10.102
    
  5. Update the Prometheus configuration and restart Prometheus:

    ansible-playbook rolling_update_monitor.yml --tags=prometheus
    
  6. Monitor the status of the entire cluster and the newly added node by opening a browser to access the monitoring platform: http://172.16.10.3:3000.

You can use the same procedure to add a TiKV node. To add a PD node, however, you need to manually update some configuration files.
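
The scale-out steps above can be chained in a small wrapper. The following is a minimal sketch, not something shipped with TiDB Ansible: the `run` and `scale_out` function names and the `DRY_RUN` variable are hypothetical, while the playbook invocations are copied from steps 2 to 5. It assumes that inventory.ini and hosts.ini already list the new hosts. With `DRY_RUN` left at its default of 1, it only prints the commands in order.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with TiDB Ansible.

# run: print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# scale_out: run steps 2 to 5 for a comma-separated list of new hosts.
# Assumes inventory.ini and hosts.ini were already edited (step 1).
# The && chain stops at the first command that fails.
scale_out() {
  hosts="$1"
  run ansible-playbook -i hosts.ini create_users.yml -l "$hosts" -u root -k &&
  run ansible-playbook -i hosts.ini deploy_ntp.yml -u tidb -b &&
  run ansible-playbook bootstrap.yml -l "$hosts" &&
  run ansible-playbook deploy.yml -l "$hosts" &&
  run ansible-playbook start.yml -l "$hosts" &&
  run ansible-playbook rolling_update_monitor.yml --tags=prometheus
}
```

For example, `scale_out 172.16.10.101,172.16.10.102` prints the six commands for review. Setting `DRY_RUN=0` executes them instead.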

Increase the capacity of a PD node

For example, if you want to add a PD node (node103) with the IP address 172.16.10.103, take the following steps:

  1. Edit the inventory.ini file and append the node information to the end of the [pd_servers] group:

    [tidb_servers]
    172.16.10.4
    172.16.10.5
    
    [pd_servers]
    172.16.10.1
    172.16.10.2
    172.16.10.3
    172.16.10.103
    
    [tikv_servers]
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9
    
    [monitored_servers]
    172.16.10.4
    172.16.10.5
    172.16.10.1
    172.16.10.2
    172.16.10.3
    172.16.10.103
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9
    
    [monitoring_servers]
    172.16.10.3
    
    [grafana_servers]
    172.16.10.3
    

    Now the topology is as follows:

    Name     Host IP        Services
    node1    172.16.10.1    PD1
    node2    172.16.10.2    PD2
    node3    172.16.10.3    PD3, Monitor
    node103  172.16.10.103  PD4
    node4    172.16.10.4    TiDB1
    node5    172.16.10.5    TiDB2
    node6    172.16.10.6    TiKV1
    node7    172.16.10.7    TiKV2
    node8    172.16.10.8    TiKV3
    node9    172.16.10.9    TiKV4

  2. Initialize the newly added node:

    ansible-playbook bootstrap.yml -l 172.16.10.103
    
  3. Deploy the newly added node:

    ansible-playbook deploy.yml -l 172.16.10.103
    
  4. Log in to the newly added PD node and edit the startup script:

    {deploy_dir}/scripts/run_pd.sh
    
    1. Remove the --initial-cluster="xxxx" \ configuration.

    2. Add --join="http://172.16.10.1:2379" \. The IP address (172.16.10.1) can be that of any existing PD node in the cluster.

    3. Start the PD service in the newly added PD node:

      {deploy_dir}/scripts/start_pd.sh
      
    4. Use pd-ctl to check whether the new node is added successfully:

      ./pd-ctl -u "http://172.16.10.1:2379" -d member
      
  5. Start the monitoring service:

    ansible-playbook start.yml -l 172.16.10.103
    
  6. Update the cluster configuration:

    ansible-playbook deploy.yml
    
  7. Restart Prometheus to enable monitoring of the newly added PD node:

    ansible-playbook stop.yml --tags=prometheus
    ansible-playbook start.yml --tags=prometheus
    
  8. Monitor the status of the entire cluster and the newly added node by opening a browser to access the monitoring platform: http://172.16.10.3:3000.
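
The two edits to run_pd.sh in step 4 can be scripted with sed. This is a sketch under assumptions: the `pd_join_script` function name is hypothetical, and it assumes that the `--initial-cluster="..."` flag appears once in the script with its value in double quotes. Check the script generated on your node before relying on it. The helper writes the edited script to standard output instead of modifying the file in place.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite a PD startup script so that the node joins an
# existing cluster. Replaces the --initial-cluster="..." flag with
# --join="<url>" and keeps the rest of the line (indentation, trailing "\").
# $1: path to run_pd.sh
# $2: client URL of any existing PD node, for example http://172.16.10.1:2379
pd_join_script() {
  script="$1"
  join_url="$2"
  sed -E "s|--initial-cluster=\"[^\"]*\"|--join=\"${join_url}\"|" "$script"
}
```

A possible usage is `pd_join_script run_pd.sh http://172.16.10.1:2379 > run_pd.sh.new`, followed by a review of the new file before it replaces the original.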

Decrease the capacity of a TiDB node

For example, if you want to remove a TiDB node (node5) with the IP address 172.16.10.5, take the following steps:

  1. Stop all services on node5:

    ansible-playbook stop.yml -l 172.16.10.5
    
  2. Edit the inventory.ini file and remove the node information:

    [tidb_servers]
    172.16.10.4
    #172.16.10.5  # the removed node
    
    [pd_servers]
    172.16.10.1
    172.16.10.2
    172.16.10.3
    
    [tikv_servers]
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9
    
    [monitored_servers]
    172.16.10.4
    #172.16.10.5  # the removed node
    172.16.10.1
    172.16.10.2
    172.16.10.3
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9
    
    [monitoring_servers]
    172.16.10.3
    
    [grafana_servers]
    172.16.10.3
    

    Now the topology is as follows:

    Name     Host IP        Services
    node1    172.16.10.1    PD1
    node2    172.16.10.2    PD2
    node3    172.16.10.3    PD3, Monitor
    node4    172.16.10.4    TiDB1
    node5    172.16.10.5    TiDB2 removed
    node6    172.16.10.6    TiKV1
    node7    172.16.10.7    TiKV2
    node8    172.16.10.8    TiKV3
    node9    172.16.10.9    TiKV4

  3. Update the Prometheus configuration and restart Prometheus:

    ansible-playbook rolling_update_monitor.yml --tags=prometheus
    
  4. Monitor the status of the entire cluster by opening a browser to access the monitoring platform: http://172.16.10.3:3000.
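
The inventory edit in step 2 comments out the removed node in every group where it appears. A minimal sketch of that edit follows. The `comment_out_host` function name is hypothetical, and it assumes that each host sits on its own line with no trailing variables. It prints the edited inventory to standard output so that the result can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out every line of an inventory file that
# consists of exactly the given IP address.
# $1: inventory file
# $2: IP address of the removed node
comment_out_host() {
  file="$1"
  ip="$2"
  # Escape the dots so that they match literally in the regular expression.
  pattern=$(printf '%s' "$ip" | sed 's/\./\\./g')
  # Anchoring at both ends keeps 172.16.10.5 from matching 172.16.10.50.
  sed -E "s|^${pattern}[[:space:]]*\$|#${ip}  # the removed node|" "$file"
}
```

For example, `comment_out_host inventory.ini 172.16.10.5 > inventory.ini.new` marks the node in both [tidb_servers] and [monitored_servers].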

Decrease the capacity of a TiKV node

For example, if you want to remove a TiKV node (node9) with the IP address 172.16.10.9, take the following steps:

  1. Remove the node from the cluster using pd-ctl:

    1. View the store ID of node9:

      ./pd-ctl -u "http://172.16.10.1:2379" -d store
      
    2. Remove node9 from the cluster, assuming that the store ID is 10:

      ./pd-ctl -u "http://172.16.10.1:2379" -d store delete 10
      
  2. Use pd-ctl to check whether the node is successfully removed. Removal takes some time; the node is successfully removed when its state becomes Tombstone:

    ./pd-ctl -u "http://172.16.10.1:2379" -d store 10
    
  3. After the node is successfully removed, stop the services on node9:

    ansible-playbook stop.yml -l 172.16.10.9
    
  4. Edit the inventory.ini file and remove the node information:

    [tidb_servers]
    172.16.10.4
    172.16.10.5
    
    [pd_servers]
    172.16.10.1
    172.16.10.2
    172.16.10.3
    
    [tikv_servers]
    172.16.10.6
    172.16.10.7
    172.16.10.8
    #172.16.10.9  # the removed node
    
    [monitored_servers]
    172.16.10.4
    172.16.10.5
    172.16.10.1
    172.16.10.2
    172.16.10.3
    172.16.10.6
    172.16.10.7
    172.16.10.8
    #172.16.10.9  # the removed node
    
    [monitoring_servers]
    172.16.10.3
    
    [grafana_servers]
    172.16.10.3
    

    Now the topology is as follows:

    Name     Host IP        Services
    node1    172.16.10.1    PD1
    node2    172.16.10.2    PD2
    node3    172.16.10.3    PD3, Monitor
    node4    172.16.10.4    TiDB1
    node5    172.16.10.5    TiDB2
    node6    172.16.10.6    TiKV1
    node7    172.16.10.7    TiKV2
    node8    172.16.10.8    TiKV3
    node9    172.16.10.9    TiKV4 removed

  5. Update the Prometheus configuration and restart Prometheus:

    ansible-playbook rolling_update_monitor.yml --tags=prometheus
    
  6. Monitor the status of the entire cluster by opening a browser to access the monitoring platform: http://172.16.10.3:3000.
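
Because removing a TiKV store takes time, the check in step 2 is usually repeated until the store is gone. The sketch below polls for that. It rests on assumptions: the `store_state` and `wait_for_tombstone` names and the `PD_CTL` and `PD_URL` variables are hypothetical, and the parsing assumes that the JSON printed by the store command contains a `"state_name"` field, which you should confirm against the output of your pd-ctl version.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of pd-ctl.

# store_state: print the first "state_name" value found in the JSON on
# standard input (assumed field name; verify against your pd-ctl output).
store_state() {
  sed -n 's/.*"state_name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# wait_for_tombstone: poll the store until it reports Tombstone.
# $1: store ID, $2: seconds between polls (default 10),
# $3: maximum number of polls (default 60).
# Returns 0 once the store is Tombstone, 1 if the polls run out.
wait_for_tombstone() {
  store_id="$1"
  interval="${2:-10}"
  tries="${3:-60}"
  while [ "$tries" -gt 0 ]; do
    state=$("${PD_CTL:-./pd-ctl}" -u "${PD_URL:-http://172.16.10.1:2379}" -d store "$store_id" | store_state)
    echo "store ${store_id}: ${state:-unknown}"
    [ "$state" = "Tombstone" ] && return 0
    tries=$((tries - 1))
    sleep "$interval"
  done
  return 1
}
```

With the store ID from the example, `wait_for_tombstone 10 && ansible-playbook stop.yml -l 172.16.10.9` would stop the services on node9 only after the store has been removed.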

Decrease the capacity of a PD node

For example, if you want to remove a PD node (node2) with the IP address 172.16.10.2, take the following steps:

  1. Remove the node from the cluster using pd-ctl:

    1. View the name of node2:

      ./pd-ctl -u "http://172.16.10.1:2379" -d member
      
    2. Remove node2 from the cluster, assuming that the name is pd2:

      ./pd-ctl -u "http://172.16.10.1:2379" -d member delete name pd2
      
  2. Use Grafana or pd-ctl to check whether the node is successfully removed:

    ./pd-ctl -u "http://172.16.10.1:2379" -d member
    
  3. After the node is successfully removed, stop the services on node2:

    ansible-playbook stop.yml -l 172.16.10.2
    
  4. Edit the inventory.ini file and remove the node information:

    [tidb_servers]
    172.16.10.4
    172.16.10.5
    
    [pd_servers]
    172.16.10.1
    #172.16.10.2  # the removed node
    172.16.10.3
    
    [tikv_servers]
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9
    
    [monitored_servers]
    172.16.10.4
    172.16.10.5
    172.16.10.1
    #172.16.10.2  # the removed node
    172.16.10.3
    172.16.10.6
    172.16.10.7
    172.16.10.8
    172.16.10.9
    
    [monitoring_servers]
    172.16.10.3
    
    [grafana_servers]
    172.16.10.3
    

    Now the topology is as follows:

    Name     Host IP        Services
    node1    172.16.10.1    PD1
    node2    172.16.10.2    PD2 removed
    node3    172.16.10.3    PD3, Monitor
    node4    172.16.10.4    TiDB1
    node5    172.16.10.5    TiDB2
    node6    172.16.10.6    TiKV1
    node7    172.16.10.7    TiKV2
    node8    172.16.10.8    TiKV3
    node9    172.16.10.9    TiKV4

  5. Update the cluster configuration:

    ansible-playbook deploy.yml
    
  6. Restart Prometheus to stop monitoring the removed PD node:

    ansible-playbook stop.yml --tags=prometheus
    ansible-playbook start.yml --tags=prometheus
    
  7. To monitor the status of the entire cluster, open a browser to access the monitoring platform: http://172.16.10.3:3000.
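
Steps 2 and 3 depend on each other: the services on the node should be stopped only after the member no longer appears in the member list. The following sketch makes that ordering explicit. The `pd_member_listed` and `stop_command_if_removed` names and the `PD_CTL` and `PD_URL` variables are hypothetical, and the check assumes that the member command prints JSON in which each member appears as `"name": "<name>"`. The helper prints the stop command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of pd-ctl or TiDB Ansible.

# pd_member_listed: succeed if the JSON on standard input contains a PD
# member entry of the form "name": "<name>" (an assumed output format).
pd_member_listed() {
  grep -q "\"name\"[[:space:]]*:[[:space:]]*\"$1\""
}

# stop_command_if_removed: print the stop command for the host only when the
# member no longer appears in the member list.
# $1: PD member name, $2: host IP.
# Returns 1 if the member is still listed, 2 if pd-ctl itself fails.
stop_command_if_removed() {
  name="$1"
  host="$2"
  members=$("${PD_CTL:-./pd-ctl}" -u "${PD_URL:-http://172.16.10.1:2379}" -d member) || {
    echo "pd-ctl failed; cannot confirm that ${name} was removed" >&2
    return 2
  }
  if printf '%s\n' "$members" | pd_member_listed "$name"; then
    echo "${name} is still a cluster member; not stopping ${host}" >&2
    return 1
  fi
  echo "ansible-playbook stop.yml -l ${host}"
}
```

For the example above, `stop_command_if_removed pd2 172.16.10.2` prints the command from step 3 once pd2 has left the member list.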
