Maintain a TiFlash Cluster
This document describes how to perform common operations when you maintain a TiFlash cluster, including checking the TiFlash version, taking TiFlash nodes down, and troubleshooting TiFlash. This document also introduces critical logs and a system table of TiFlash.
Check the TiFlash version
There are two ways to check the TiFlash version:
- If the binary file name of TiFlash is `tiflash`, you can check the version by executing the `./tiflash version` command.

    However, to execute the above command, you need to add the directory path that contains the `libtiflash_proxy.so` dynamic library to the `LD_LIBRARY_PATH` environment variable, because TiFlash relies on this dynamic library at runtime.

    For example, when `tiflash` and `libtiflash_proxy.so` are in the same directory, you can first switch to this directory, and then use the following command to check the TiFlash version:

    ```shell
    LD_LIBRARY_PATH=./ ./tiflash version
    ```

- Check the TiFlash version by referring to the TiFlash log. For the log path, see the `[logger]` section in the `tiflash.toml` file. For example:

    ```
    <information>: TiFlash version: TiFlash 0.2.0 master-375035282451103999f3863c691e2fc2
    ```
Take a TiFlash node down
Taking a TiFlash node down differs from Scaling in a TiFlash node in that the former doesn't remove the node from TiDB Ansible; instead, it just safely shuts down the TiFlash process.
Follow the steps below to take a TiFlash node down:
Note: If, after you take the TiFlash node down, the number of remaining nodes in the TiFlash cluster is greater than or equal to the maximum number of replicas of all data tables, you can go directly to step 3.
1. If the number of TiFlash replicas of a table is greater than or equal to the number of remaining TiFlash nodes in the cluster, execute the following statement on such tables in the TiDB client:

    ```sql
    alter table <db-name>.<table-name> set tiflash replica 0;
    ```

2. To ensure that the TiFlash replicas of these tables are removed, see Check the Replication Progress. If you cannot view the replication progress of the related tables, it means that the replicas are removed.

3. Input the `store` command into pd-ctl (the binary file is in `resources/bin` of the tidb-ansible directory) to view the `store id` of the TiFlash node (see the sketch after these steps).

4. Input `store delete <store_id>` into pd-ctl, where `<store_id>` refers to the `store id` found in step 3.

5. When the corresponding `store` of the node disappears, or when its `state_name` is changed to `Tombstone`, stop the TiFlash process.
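For reference, the pd-ctl interaction in steps 3 to 5 might look like the following sketch. The pd-ctl path, the PD address, and the store id `45` are placeholders:

```shell
# Sketch under assumptions: pd-ctl is in resources/bin of the tidb-ansible
# directory, and <pd-ip>:<pd-port> is the PD address.
PDCTL="resources/bin/pd-ctl"
PD_ADDR="http://<pd-ip>:<pd-port>"

# Step 3: list all stores; the TiFlash store carries the label
# {"key": "engine", "value": "tiflash"} and shows its store id
echo "store" | $PDCTL -u $PD_ADDR

# Step 4: schedule the TiFlash store for removal (45 is an example store id)
echo "store delete 45" | $PDCTL -u $PD_ADDR

# Step 5: re-check until the store disappears or its state_name becomes
# Tombstone, then stop the TiFlash process
echo "store 45" | $PDCTL -u $PD_ADDR
```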
If you do not cancel the replication of all tables to TiFlash before all TiFlash nodes stop running, you need to manually delete the replication rules in PD; otherwise, you cannot successfully take the TiFlash node down.
To manually delete the replication rules in PD, take the following steps:
1. Query all the data replication rules related to TiFlash in the current PD instance:

    ```shell
    curl http://<pd_ip>:<pd_port>/pd/api/v1/config/rules/group/tiflash
    ```

    ```
    [
      {
        "group_id": "tiflash",
        "id": "table-45-r",
        "override": true,
        "start_key": "7480000000000000FF2D5F720000000000FA",
        "end_key": "7480000000000000FF2E00000000000000F8",
        "role": "learner",
        "count": 1,
        "label_constraints": [
          {
            "key": "engine",
            "op": "in",
            "values": [
              "tiflash"
            ]
          }
        ]
      }
    ]
    ```

2. Delete all the data replication rules related to TiFlash. The following example command deletes the rule whose `id` is `table-45-r` (a scripted variant for deleting every rule follows these steps):

    ```shell
    curl -v -X DELETE http://<pd_ip>:<pd_port>/pd/api/v1/config/rule/tiflash/table-45-r
    ```
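If many tables were replicated, deleting the rules one by one is tedious. The following is a minimal sketch that removes every rule in the `tiflash` group, assuming `jq` is installed:

```shell
# Assumption: jq is available; <pd_ip>:<pd_port> is the PD address
PD="http://<pd_ip>:<pd_port>"

# Fetch every rule id in the "tiflash" group and delete each rule
for id in $(curl -s "$PD/pd/api/v1/config/rules/group/tiflash" | jq -r '.[].id'); do
    curl -X DELETE "$PD/pd/api/v1/config/rule/tiflash/$id"
done
```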
TiFlash troubleshooting
This section describes some commonly encountered issues when using TiFlash, their causes, and their solutions.
TiFlash replica is always unavailable
This is because TiFlash is in an abnormal state caused by configuration errors or environment issues. Take the following steps to identify the faulty component:
1. Check whether PD has enabled the `Placement Rules` feature (to enable the feature, see step 2 of Add TiFlash component to an existing TiDB cluster):

    ```shell
    echo 'config show replication' | /path/to/pd-ctl -u http://<pd-ip>:<pd-port>
    ```

    The expected result is `"enable-placement-rules": "true"`.

2. Check whether the TiFlash process is working correctly by viewing `UpTime` on the TiFlash-Summary monitoring panel.

3. Check whether the TiFlash proxy status is normal through pd-ctl:

    ```shell
    echo "store" | /path/to/pd-ctl -u http://<pd-ip>:<pd-port>
    ```

    The `store.labels` of a TiFlash proxy includes information such as `{"key": "engine", "value": "tiflash"}`. You can check this information to confirm a TiFlash proxy.

4. Check whether `pd buddy` can correctly print the logs (the log path is the value of `log` in the `[flash.flash_cluster]` configuration item; the default log path is under the `tmp` directory configured in the TiFlash configuration file).

5. Check whether the value of `max-replicas` in PD is less than or equal to the number of TiKV nodes in the cluster. If not, PD cannot replicate data to TiFlash:

    ```shell
    echo 'config show replication' | /path/to/pd-ctl -u http://<pd-ip>:<pd-port>
    ```

    Reconfirm the value of `max-replicas`.

6. Check whether the remaining disk space of the machine where the `store` of the TiFlash node is located is sufficient. By default, when the remaining disk space is less than 20% of the `store` capacity (controlled by the `low-space-ratio` parameter), PD cannot schedule data to this TiFlash node.
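To run the PD-side checks from steps 1, 3, and 5 above in one pass, you can chain the same pd-ctl commands; this is only a convenience sketch using the placeholders from this document:

```shell
# Placeholders: /path/to/pd-ctl and http://<pd-ip>:<pd-port> as used above
PDCTL="/path/to/pd-ctl"
PD_ADDR="http://<pd-ip>:<pd-port>"

# Steps 1 and 5: expect "enable-placement-rules": "true", and max-replicas
# less than or equal to the number of TiKV nodes
echo 'config show replication' | $PDCTL -u $PD_ADDR

# Step 3: stores labeled {"key": "engine", "value": "tiflash"} are TiFlash proxies
echo 'store' | $PDCTL -u $PD_ADDR
```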
TiFlash query time is unstable, and the error log prints many `Lock Exception` messages
This is because large amounts of data are written to the cluster, which causes TiFlash queries to encounter locks and require query retries.

You can set the query timestamp to one second earlier in TiDB. For example, if the current time is '2020-04-08 20:15:01', you can execute `set @@tidb_snapshot='2020-04-08 20:15:00';` before you execute the query. This makes fewer TiFlash queries encounter a lock and mitigates the risk of unstable query time.
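A typical session looks like the following sketch; `my_table` is a placeholder name, and setting `tidb_snapshot` to an empty string restores real-time reads:

```sql
-- Pin reads to one second before the current time (example timestamp from above)
set @@tidb_snapshot='2020-04-08 20:15:00';

-- Run the query against the snapshot; my_table is a placeholder name
select count(*) from my_table;

-- Clear the snapshot to return to real-time reads
set @@tidb_snapshot='';
```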
Some queries return the `Region Unavailable` error
If the load pressure on TiFlash is too heavy, causing TiFlash data replication to fall behind, some queries might return the `Region Unavailable` error.
In this case, you can balance the load pressure by adding more TiFlash nodes.
Data file corruption
Take the following steps to handle the data file corruption:
- Refer to Take a TiFlash node down to take the corresponding TiFlash node down.
- Delete the related data of the TiFlash node.
- Redeploy the TiFlash node in the cluster.
TiFlash critical logs
| Log Information | Log Description |
|---|---|
| `[ 23 ] <Information> KVStore: Start to persist [region 47, applied: term 6 index 10]` | Data starts to be replicated (the number in the square brackets at the start of the log refers to the thread ID) |
| `[ 30 ] <Debug> CoprocessorHandler: grpc::Status DB::CoprocessorHandler::execute(): Handling DAG request` | TiFlash starts to handle a Coprocessor request |
| `[ 30 ] <Debug> CoprocessorHandler: grpc::Status DB::CoprocessorHandler::execute(): Handling DAG request done` | TiFlash finishes handling a Coprocessor request |
You can find the beginning or the end of a Coprocessor request from these logs, and then locate the related log entries of that request through the thread ID printed at the start of each line.
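For example, to collect all log lines produced by thread 30 (the thread handling the Coprocessor request above), you can filter on the thread ID prefix; the log file path is an assumption:

```shell
# -F treats the pattern as a fixed string so the brackets are not interpreted;
# the log path is an assumed example
grep -F '[ 30 ]' /data/tiflash/log/server.log
```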
TiFlash system table
The column names and their descriptions of the `information_schema.tiflash_replica` system table are as follows:
| Column Name | Description |
|---|---|
| TABLE_SCHEMA | Database name |
| TABLE_NAME | Table name |
| TABLE_ID | Table ID |
| REPLICA_COUNT | Number of TiFlash replicas |
| AVAILABLE | Whether the replicas are available (0/1) |
| PROGRESS | Replication progress, in the range [0.0, 1.0] |
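For example, you can query this table directly to see the replication status of every table that has TiFlash replicas:

```sql
-- List all tables with TiFlash replicas, with availability and progress
select table_schema, table_name, replica_count, available, progress
from information_schema.tiflash_replica;
```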