Scale the TiDB Cluster Using TiDB Ansible
The capacity of a TiDB cluster can be increased or decreased without affecting the online services.
For production environments, it is recommended that you scale TiDB using TiUP. Since v4.0, PingCAP no longer provides support for scaling TiDB using TiDB Ansible (deprecated). If you still need to use TiDB Ansible for scaling, do so at your own risk.
When decreasing the capacity, do not perform the following procedures if your cluster has a mixed deployment of other services on the nodes to be removed. The following examples assume that the removed nodes have no other services deployed on them.
Assume that the topology is as follows:
| Name  | Host IP     | Services     |
| ----- | ----------- | ------------ |
| node1 | 172.16.10.1 | PD1          |
| node2 | 172.16.10.2 | PD2          |
| node3 | 172.16.10.3 | PD3, Monitor |
| node4 | 172.16.10.4 | TiDB1        |
| node5 | 172.16.10.5 | TiDB2        |
| node6 | 172.16.10.6 | TiKV1        |
| node7 | 172.16.10.7 | TiKV2        |
| node8 | 172.16.10.8 | TiKV3        |
| node9 | 172.16.10.9 | TiKV4        |
Increase the capacity of a TiDB/TiKV node
For example, if you want to add two TiDB nodes (node101, node102) with the IP addresses `172.16.10.101` and `172.16.10.102`, take the following steps:
Edit the `inventory.ini` file and the `hosts.ini` file, and append the node information.

Edit the `inventory.ini` file:

```ini
[tidb_servers]
172.16.10.4
172.16.10.5
172.16.10.101
172.16.10.102

[pd_servers]
172.16.10.1
172.16.10.2
172.16.10.3

[tikv_servers]
172.16.10.6
172.16.10.7
172.16.10.8
172.16.10.9

[monitored_servers]
172.16.10.1
172.16.10.2
172.16.10.3
172.16.10.4
172.16.10.5
172.16.10.6
172.16.10.7
172.16.10.8
172.16.10.9
172.16.10.101
172.16.10.102

[monitoring_servers]
172.16.10.3

[grafana_servers]
172.16.10.3
```

Now the topology is as follows:

| Name    | Host IP       | Services     |
| ------- | ------------- | ------------ |
| node1   | 172.16.10.1   | PD1          |
| node2   | 172.16.10.2   | PD2          |
| node3   | 172.16.10.3   | PD3, Monitor |
| node4   | 172.16.10.4   | TiDB1        |
| node5   | 172.16.10.5   | TiDB2        |
| node101 | 172.16.10.101 | TiDB3        |
| node102 | 172.16.10.102 | TiDB4        |
| node6   | 172.16.10.6   | TiKV1        |
| node7   | 172.16.10.7   | TiKV2        |
| node8   | 172.16.10.8   | TiKV3        |
| node9   | 172.16.10.9   | TiKV4        |

Edit the `hosts.ini` file:

```ini
[servers]
172.16.10.1
172.16.10.2
172.16.10.3
172.16.10.4
172.16.10.5
172.16.10.6
172.16.10.7
172.16.10.8
172.16.10.9
172.16.10.101
172.16.10.102

[all:vars]
username = tidb
ntp_server = pool.ntp.org
```
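Before running any playbook, you can optionally verify that the edited inventory files parse and contain the new hosts. This is a small sketch using the standard `ansible` command, run from the `tidb-ansible` directory:

```bash
# List the hosts that Ansible resolves from the edited inventory files;
# 172.16.10.101 and 172.16.10.102 should appear in the output.
ansible -i inventory.ini all --list-hosts
ansible -i hosts.ini all --list-hosts
```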
Initialize the newly added node.
Configure the SSH mutual trust and sudo rules of the target machine on the control machine:
ansible-playbook -i hosts.ini create_users.yml -l 172.16.10.101,172.16.10.102 -u root -k
Install the NTP service on the target machine:
ansible-playbook -i hosts.ini deploy_ntp.yml -u tidb -b
Initialize the node on the target machine:
ansible-playbook bootstrap.yml -l 172.16.10.101,172.16.10.102
Note: If an alias is configured in the `inventory.ini` file, for example, `node101 ansible_host=172.16.10.101`, use `-l` to specify the alias when executing `ansible-playbook`. For example, `ansible-playbook bootstrap.yml -l node101,node102`. This also applies to the following steps.

Deploy the newly added node:
ansible-playbook deploy.yml -l 172.16.10.101,172.16.10.102
Start the newly added node:
ansible-playbook start.yml -l 172.16.10.101,172.16.10.102
Update the Prometheus configuration and restart the cluster:
ansible-playbook rolling_update_monitor.yml --tags=prometheus
Monitor the status of the entire cluster and the newly added node by opening a browser to access the monitoring platform: `http://172.16.10.3:3000`.
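To confirm that the new TiDB instances are serving SQL requests, you can also connect to them directly. The following sketch assumes the default TiDB port (4000) and a passwordless `root` user; adjust the credentials to match your cluster:

```bash
# Query each new TiDB instance; a returned version string indicates the node is up.
mysql -h 172.16.10.101 -P 4000 -u root -e "SELECT tidb_version();"
mysql -h 172.16.10.102 -P 4000 -u root -e "SELECT tidb_version();"
```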
You can use the same procedure to add a TiKV node. But to add a PD node, some configuration files need to be manually updated.
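If you add a TiKV node this way, one quick check is to confirm that the new store has registered with PD. The following sketch reuses the `pd-ctl` invocation that appears later in this document:

```bash
# List all TiKV stores known to PD; the new node's address should appear
# with an "Up" state once it has joined the cluster.
./pd-ctl -u "http://172.16.10.1:2379" -d store
```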
Increase the capacity of a PD node
For example, if you want to add a PD node (node103) with the IP address `172.16.10.103`, take the following steps:
Edit the `inventory.ini` file and append the node information to the end of the `[pd_servers]` group:

```ini
[tidb_servers]
172.16.10.4
172.16.10.5

[pd_servers]
172.16.10.1
172.16.10.2
172.16.10.3
172.16.10.103

[tikv_servers]
172.16.10.6
172.16.10.7
172.16.10.8
172.16.10.9

[monitored_servers]
172.16.10.4
172.16.10.5
172.16.10.1
172.16.10.2
172.16.10.3
172.16.10.103
172.16.10.6
172.16.10.7
172.16.10.8
172.16.10.9

[monitoring_servers]
172.16.10.3

[grafana_servers]
172.16.10.3
```
Now the topology is as follows:
| Name    | Host IP       | Services     |
| ------- | ------------- | ------------ |
| node1   | 172.16.10.1   | PD1          |
| node2   | 172.16.10.2   | PD2          |
| node3   | 172.16.10.3   | PD3, Monitor |
| node103 | 172.16.10.103 | PD4          |
| node4   | 172.16.10.4   | TiDB1        |
| node5   | 172.16.10.5   | TiDB2        |
| node6   | 172.16.10.6   | TiKV1        |
| node7   | 172.16.10.7   | TiKV2        |
| node8   | 172.16.10.8   | TiKV3        |
| node9   | 172.16.10.9   | TiKV4        |

Initialize the newly added node:
ansible-playbook bootstrap.yml -l 172.16.10.103
Deploy the newly added node:
ansible-playbook deploy.yml -l 172.16.10.103
Log in to the newly added PD node and edit the starting script `{deploy_dir}/scripts/run_pd.sh`:

Remove the `--initial-cluster="xxxx" \` configuration.

Note: You cannot just add the `#` character at the beginning of that line to comment it out; otherwise, the following configuration cannot take effect.

Add `--join="http://172.16.10.1:2379" \`. The IP address (`172.16.10.1`) can be any of the existing PD IP addresses in the cluster. A rough sketch of the edited script is shown below, after the start command.

Start the PD service on the newly added PD node:
{deploy_dir}/scripts/start_pd.sh
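For reference, the relevant part of `run_pd.sh` might look roughly like the following sketch after the edit. The flag values and paths are illustrative placeholders, not the exact contents generated by TiDB Ansible; keep all the other flags that your deployment already has.

```bash
#!/bin/bash
# Illustrative fragment only. The original --initial-cluster="xxxx" \ line
# has been deleted (not commented out), and --join has been added.
exec bin/pd-server \
    --name="pd_node103" \
    --client-urls="http://172.16.10.103:2379" \
    --data-dir="/home/tidb/deploy/data.pd" \
    --join="http://172.16.10.1:2379" \
    --log-file="/home/tidb/deploy/log/pd.log"
```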
Note: Before starting, you need to ensure that the `health` status of the newly added PD node is "true" (check it using PD Control). Otherwise, the PD service might fail to start, and an error message `["join meet error"] [error="etcdserver: unhealthy cluster"]` is returned in the log.

Use `pd-ctl` to check whether the new node is added successfully:

./pd-ctl -u "http://172.16.10.1:2379"
Note: `pd-ctl` is a command used to check the number of PD nodes.
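For a non-interactive check, you can also pass the command directly with `-d`, as done elsewhere in this document. The `jq` filter is an optional assumption (it requires `jq` to be installed) that simply counts the members returned by PD:

```bash
# List the current PD members; the newly added node should appear in the output.
./pd-ctl -u "http://172.16.10.1:2379" -d member

# Optional: count the members (requires jq).
./pd-ctl -u "http://172.16.10.1:2379" -d member | jq '.members | length'
```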
Start the monitoring service:
ansible-playbook start.yml -l 172.16.10.103
Note: If you use an alias (inventory_name), use the `-l` option to specify the alias.

Update the cluster configuration:
ansible-playbook deploy.yml
Restart Prometheus, and enable the monitoring of the newly added PD node:
ansible-playbook stop.yml --tags=prometheus
ansible-playbook start.yml --tags=prometheus
Monitor the status of the entire cluster and the newly added node by opening a browser to access the monitoring platform: `http://172.16.10.3:3000`.
The PD Client in TiKV caches the list of PD nodes. Currently, the list is updated only if the PD leader is switched or the TiKV server is restarted to load the latest configuration. To avoid TiKV caching an outdated list, there should be at least two existing PD members in the PD cluster after increasing or decreasing the capacity of a PD node. If this condition is not met, transfer the PD leader manually to update the list of PD nodes.
Decrease the capacity of a TiDB node
For example, if you want to remove a TiDB node (node5) with the IP address `172.16.10.5`, take the following steps:
Stop all services on node5:
ansible-playbook stop.yml -l 172.16.10.5
Edit the `inventory.ini` file and remove the node information:

```ini
[tidb_servers]
172.16.10.4
#172.16.10.5  # the removed node

[pd_servers]
172.16.10.1
172.16.10.2
172.16.10.3

[tikv_servers]
172.16.10.6
172.16.10.7
172.16.10.8
172.16.10.9

[monitored_servers]
172.16.10.4
#172.16.10.5  # the removed node
172.16.10.1
172.16.10.2
172.16.10.3
172.16.10.6
172.16.10.7
172.16.10.8
172.16.10.9

[monitoring_servers]
172.16.10.3

[grafana_servers]
172.16.10.3
```
Now the topology is as follows:
| Name  | Host IP     | Services      |
| ----- | ----------- | ------------- |
| node1 | 172.16.10.1 | PD1           |
| node2 | 172.16.10.2 | PD2           |
| node3 | 172.16.10.3 | PD3, Monitor  |
| node4 | 172.16.10.4 | TiDB1         |
| node5 | 172.16.10.5 | TiDB2 removed |
| node6 | 172.16.10.6 | TiKV1         |
| node7 | 172.16.10.7 | TiKV2         |
| node8 | 172.16.10.8 | TiKV3         |
| node9 | 172.16.10.9 | TiKV4         |

Update the Prometheus configuration and restart the cluster:
ansible-playbook rolling_update_monitor.yml --tags=prometheus
Monitor the status of the entire cluster by opening a browser to access the monitoring platform: `http://172.16.10.3:3000`.
Decrease the capacity of a TiKV node
For example, if you want to remove a TiKV node (node9) with the IP address `172.16.10.9`, take the following steps:
Remove the node from the cluster using `pd-ctl`:

View the store ID of node9:
./pd-ctl -u "http://172.16.10.1:2379" -d store
Remove node9 from the cluster, assuming that the store ID is 10:
./pd-ctl -u "http://172.16.10.1:2379" -d store delete 10
Use `pd-ctl` to check whether the node is successfully removed:

./pd-ctl -u "http://172.16.10.1:2379" -d store 10
Note: It takes some time to remove the node. If the status of the node you remove becomes Tombstone, the node has been successfully removed.
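If you want to watch the state change from the command line, a simple sketch is to re-query the store and filter for its state field (the store ID 10 follows the example above; `state_name` is the field PD reports for a store's state):

```bash
# Re-run this until the reported state becomes "Tombstone".
./pd-ctl -u "http://172.16.10.1:2379" -d store 10 | grep state_name
```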
After the node is successfully removed, stop the services on node9:
ansible-playbook stop.yml -l 172.16.10.9
Edit the `inventory.ini` file and remove the node information:

```ini
[tidb_servers]
172.16.10.4
172.16.10.5

[pd_servers]
172.16.10.1
172.16.10.2
172.16.10.3

[tikv_servers]
172.16.10.6
172.16.10.7
172.16.10.8
#172.16.10.9  # the removed node

[monitored_servers]
172.16.10.4
172.16.10.5
172.16.10.1
172.16.10.2
172.16.10.3
172.16.10.6
172.16.10.7
172.16.10.8
#172.16.10.9  # the removed node

[monitoring_servers]
172.16.10.3

[grafana_servers]
172.16.10.3
```
Now the topology is as follows:
| Name  | Host IP     | Services      |
| ----- | ----------- | ------------- |
| node1 | 172.16.10.1 | PD1           |
| node2 | 172.16.10.2 | PD2           |
| node3 | 172.16.10.3 | PD3, Monitor  |
| node4 | 172.16.10.4 | TiDB1         |
| node5 | 172.16.10.5 | TiDB2         |
| node6 | 172.16.10.6 | TiKV1         |
| node7 | 172.16.10.7 | TiKV2         |
| node8 | 172.16.10.8 | TiKV3         |
| node9 | 172.16.10.9 | TiKV4 removed |

Update the Prometheus configuration and restart the cluster:
ansible-playbook rolling_update_monitor.yml --tags=prometheus
Monitor the status of the entire cluster by opening a browser to access the monitoring platform: `http://172.16.10.3:3000`.
Decrease the capacity of a PD node
For example, if you want to remove a PD node (node2) with the IP address `172.16.10.2`, take the following steps:
Remove the node from the cluster using `pd-ctl`:

View the name of node2:
./pd-ctl -u "http://172.16.10.1:2379" -d member
Remove node2 from the cluster, assuming that the name is pd2:
./pd-ctl -u "http://172.16.10.1:2379" -d member delete name pd2
Use Grafana or `pd-ctl` to check whether the node is successfully removed:

./pd-ctl -u "http://172.16.10.1:2379" -d member
After the node is successfully removed, stop the services on node2:
ansible-playbook stop.yml -l 172.16.10.2
Note: In this example, you only need to stop the PD service on node2. If there are any other services deployed with the IP address `172.16.10.2`, use the `-t` option to specify the service (such as `-t tidb`).

Edit the `inventory.ini` file and remove the node information:

```ini
[tidb_servers]
172.16.10.4
172.16.10.5

[pd_servers]
172.16.10.1
#172.16.10.2  # the removed node
172.16.10.3

[tikv_servers]
172.16.10.6
172.16.10.7
172.16.10.8
172.16.10.9

[monitored_servers]
172.16.10.4
172.16.10.5
172.16.10.1
#172.16.10.2  # the removed node
172.16.10.3
172.16.10.6
172.16.10.7
172.16.10.8
172.16.10.9

[monitoring_servers]
172.16.10.3

[grafana_servers]
172.16.10.3
```
Now the topology is as follows:
| Name  | Host IP     | Services     |
| ----- | ----------- | ------------ |
| node1 | 172.16.10.1 | PD1          |
| node2 | 172.16.10.2 | PD2 removed  |
| node3 | 172.16.10.3 | PD3, Monitor |
| node4 | 172.16.10.4 | TiDB1        |
| node5 | 172.16.10.5 | TiDB2        |
| node6 | 172.16.10.6 | TiKV1        |
| node7 | 172.16.10.7 | TiKV2        |
| node8 | 172.16.10.8 | TiKV3        |
| node9 | 172.16.10.9 | TiKV4        |

Update the cluster configuration:
ansible-playbook deploy.yml
Restart Prometheus, and disable the monitoring of the removed PD node:
ansible-playbook stop.yml --tags=prometheus
ansible-playbook start.yml --tags=prometheus
To monitor the status of the entire cluster, open a browser to access the monitoring platform: `http://172.16.10.3:3000`.
The PD Client in TiKV caches the list of PD nodes. Currently, the list is updated only if the PD leader is switched or the TiKV server is restarted to load the latest configuration. To avoid TiKV caching an outdated list, there should be at least two existing PD members in the PD cluster after increasing or decreasing the capacity of a PD node. If this condition is not met, transfer the PD leader manually to update the list of PD nodes.
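If you do need to transfer the PD leader manually, `pd-ctl` provides the `member leader` subcommands. A sketch (the member name `pd1` is illustrative; use a name returned by the `member` command):

```bash
# Show the current PD leader.
./pd-ctl -u "http://172.16.10.1:2379" -d member leader show

# Transfer leadership to another PD member, which prompts TiKV's PD client
# to refresh its list of PD nodes.
./pd-ctl -u "http://172.16.10.1:2379" -d member leader transfer pd1
```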