- Docs Home
- About TiDB
- Quick Start
- Deploy
- Software and Hardware Requirements
- Environment Configuration Checklist
- Topology Patterns
- Install and Start
- Verify Cluster Status
- Benchmarks Methods
- Migrate
- Maintain
- Monitor and Alert
- Troubleshoot
- TiDB Troubleshooting Map
- Identify Slow Queries
- Analyze Slow Queries
- SQL Diagnostics
- Identify Expensive Queries
- Statement Summary Tables
- Troubleshoot Hotspot Issues
- Troubleshoot Increased Read and Write Latency
- Troubleshoot Cluster Setup
- Troubleshoot High Disk I/O Usage
- Troubleshoot Lock Conflicts
- Troubleshoot TiFlash
- Troubleshoot Write Conflicts in Optimistic Transactions
- Performance Tuning
- System Tuning
- Software Tuning
- SQL Tuning
- Overview
- Understanding the Query Execution Plan
- SQL Optimization Process
- Overview
- Logic Optimization
- Physical Optimization
- Prepare Execution Plan Cache
- Control Execution Plans
- Tutorials
- TiDB Tools
- Overview
- Use Cases
- Download
- TiUP
- TiDB Operator
- Backup & Restore (BR)
- TiDB Binlog
- TiDB Lightning
- TiDB Data Migration
- TiCDC
- Dumpling
- sync-diff-inspector
- Loader
- Mydumper
- Syncer
- TiSpark
- Reference
- Cluster Architecture
- Key Monitoring Metrics
- Secure
- Privileges
- SQL
- SQL Language Structure and Syntax
- SQL Statements
ADD COLUMN
ADD INDEX
ADMIN
ADMIN CANCEL DDL
ADMIN CHECKSUM TABLE
ADMIN CHECK [TABLE|INDEX]
ADMIN SHOW DDL [JOBS|QUERIES]
ALTER DATABASE
ALTER INSTANCE
ALTER TABLE
ALTER USER
ANALYZE TABLE
BACKUP
BEGIN
CHANGE COLUMN
COMMIT
CHANGE DRAINER
CHANGE PUMP
CREATE [GLOBAL|SESSION] BINDING
CREATE DATABASE
CREATE INDEX
CREATE ROLE
CREATE SEQUENCE
CREATE TABLE LIKE
CREATE TABLE
CREATE USER
CREATE VIEW
DEALLOCATE
DELETE
DESC
DESCRIBE
DO
DROP [GLOBAL|SESSION] BINDING
DROP COLUMN
DROP DATABASE
DROP INDEX
DROP ROLE
DROP SEQUENCE
DROP STATS
DROP TABLE
DROP USER
DROP VIEW
EXECUTE
EXPLAIN ANALYZE
EXPLAIN
FLASHBACK TABLE
FLUSH PRIVILEGES
FLUSH STATUS
FLUSH TABLES
GRANT <privileges>
GRANT <role>
INSERT
KILL [TIDB]
LOAD DATA
LOAD STATS
MODIFY COLUMN
PREPARE
RECOVER TABLE
RENAME INDEX
RENAME TABLE
REPLACE
RESTORE
REVOKE <privileges>
REVOKE <role>
ROLLBACK
SELECT
SET DEFAULT ROLE
SET [NAMES|CHARACTER SET]
SET PASSWORD
SET ROLE
SET TRANSACTION
SET [GLOBAL|SESSION] <variable>
SHOW ANALYZE STATUS
SHOW [BACKUPS|RESTORES]
SHOW [GLOBAL|SESSION] BINDINGS
SHOW BUILTINS
SHOW CHARACTER SET
SHOW COLLATION
SHOW [FULL] COLUMNS FROM
SHOW CONFIG
SHOW CREATE SEQUENCE
SHOW CREATE TABLE
SHOW CREATE USER
SHOW DATABASES
SHOW DRAINER STATUS
SHOW ENGINES
SHOW ERRORS
SHOW [FULL] FIELDS FROM
SHOW GRANTS
SHOW INDEX [FROM|IN]
SHOW INDEXES [FROM|IN]
SHOW KEYS [FROM|IN]
SHOW MASTER STATUS
SHOW PLUGINS
SHOW PRIVILEGES
SHOW [FULL] PROCESSSLIST
SHOW PROFILES
SHOW PUMP STATUS
SHOW SCHEMAS
SHOW STATS_HEALTHY
SHOW STATS_HISTOGRAMS
SHOW STATS_META
SHOW STATUS
SHOW TABLE NEXT_ROW_ID
SHOW TABLE REGIONS
SHOW TABLE STATUS
SHOW [FULL] TABLES
SHOW [GLOBAL|SESSION] VARIABLES
SHOW WARNINGS
SHUTDOWN
SPLIT REGION
START TRANSACTION
TRACE
TRUNCATE
UPDATE
USE
- Data Types
- Functions and Operators
- Overview
- Type Conversion in Expression Evaluation
- Operators
- Control Flow Functions
- String Functions
- Numeric Functions and Operators
- Date and Time Functions
- Bit Functions and Operators
- Cast Functions and Operators
- Encryption and Compression Functions
- Information Functions
- JSON Functions
- Aggregate (GROUP BY) Functions
- Window Functions
- Miscellaneous Functions
- Precision Math
- List of Expressions for Pushdown
- Constraints
- Generated Columns
- SQL Mode
- Transactions
- Garbage Collection (GC)
- Views
- Partitioning
- Character Set and Collation
- System Tables
mysql
- INFORMATION_SCHEMA
- Overview
ANALYZE_STATUS
CHARACTER_SETS
CLUSTER_CONFIG
CLUSTER_HARDWARE
CLUSTER_INFO
CLUSTER_LOAD
CLUSTER_LOG
CLUSTER_SYSTEMINFO
COLLATIONS
COLLATION_CHARACTER_SET_APPLICABILITY
COLUMNS
DDL_JOBS
ENGINES
INSPECTION_RESULT
INSPECTION_RULES
INSPECTION_SUMMARY
KEY_COLUMN_USAGE
METRICS_SUMMARY
METRICS_TABLES
PARTITIONS
PROCESSLIST
SCHEMATA
SEQUENCES
SESSION_VARIABLES
SLOW_QUERY
STATISTICS
TABLES
TABLE_CONSTRAINTS
TABLE_STORAGE_STATS
TIDB_HOT_REGIONS
TIDB_INDEXES
TIDB_SERVERS_INFO
TIFLASH_REPLICA
TIKV_REGION_PEERS
TIKV_REGION_STATUS
TIKV_STORE_STATUS
USER_PRIVILEGES
VIEWS
METRICS_SCHEMA
- UI
- TiDB Dashboard
- Overview
- Maintain
- Access
- Overview Page
- Cluster Info Page
- Key Visualizer Page
- Metrics Relation Graph
- SQL Statements Analysis
- Slow Queries Page
- Cluster Diagnostics
- Search Logs Page
- Profile Instances Page
- Session Management and Configuration
- FAQ
- CLI
- Command Line Flags
- Configuration File Parameters
- System Variables
- Storage Engines
- Telemetry
- Errors Codes
- Table Filter
- Schedule Replicas by Topology Labels
- FAQs
- Release Notes
- All Releases
- v4.0
- v3.1
- v3.0
- v2.1
- v2.0
- v1.0
- Glossary
You are viewing the documentation of an older version of the TiDB database (TiDB v4.0).
Use TiFlash
After TiFlash is deployed, data replication does not automatically begin. You need to manually specify the tables to be replicated.
You can either use TiDB to read TiFlash replicas for medium-scale analytical processing, or use TiSpark to read TiFlash replicas for large-scale analytical processing, which is based on your own needs. See the following sections for details:
Create TiFlash replicas for tables
After TiFlash is connected to the TiKV cluster, data replication by default does not begin. You can send a DDL statement to TiDB through a MySQL client to create a TiFlash replica for a specific table:
ALTER TABLE table_name SET TIFLASH REPLICA count;
The parameter of the above command is described as follows:
count
indicates the number of replicas. When the value is0
, the replica is deleted.
If you execute multiple DDL statements on the same table, only the last statement is ensured to take effect. In the following example, two DDL statements are executed on the table tpch50
, but only the second statement (to delete the replica) takes effect.
Create two replicas for the table:
ALTER TABLE `tpch50`.`lineitem` SET TIFLASH REPLICA 2;
Delete the replica:
ALTER TABLE `tpch50`.`lineitem` SET TIFLASH REPLICA 0;
Notes:
If the table
t
is replicated to TiFlash through the above DDL statements, the table created using the following statement will also be automatically replicated to TiFlash:CREATE TABLE table_name like t;
For versions earlier than v4.0.6, if you create the TiFlash replica before using TiDB Lightning to import the data, the data import will fail. You must import data to the table before creating the TiFlash replica for the table.
If TiDB and TiDB Lightning are both v4.0.6 or later, no matter a table has TiFlash replica(s) or not, you can import data to that table using TiDB Lightning. Note that this might slow the TiDB Lightning procedure, which depends on the NIC bandwidth on the TiDB Lightning host, the CPU and disk load of the TiFlash node, and the number of TiFlash replicas.
It is recommended that you do not replicate more than 1,000 tables because this lowers the PD scheduling performance. This limit will be removed in later versions.
Check replication progress
You can check the status of the TiFlash replicas of a specific table using the following statement. The table is specified using the WHERE
clause. If you remove the WHERE
clause, you will check the replica status of all tables.
SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = '<db_name>' and TABLE_NAME = '<table_name>';
In the result of above statement:
AVAILABLE
indicates whether the TiFlash replicas of this table are available or not.1
means available and0
means unavailable. Once the replicas become available, this status does not change. If you use DDL statements to modify the number of replicas, the replication status will be recalculated.PROGRESS
means the progress of the replication. The value is between0.0
and1.0
.1
means at least one replica is replicated.
Use TiDB to read TiFlash replicas
TiDB provides three ways to read TiFlash replicas. If you have added a TiFlash replica without any engine configuration, the CBO (cost-based optimization) mode is used by default.
Smart selection
For tables with TiFlash replicas, the TiDB optimizer automatically determines whether to use TiFlash replicas based on the cost estimation. You can use the desc
or explain analyze
statement to check whether or not a TiFlash replica is selected. For example:
desc select count(*) from test.t;
+--------------------------+---------+--------------+---------------+--------------------------------+
| id | estRows | task | access object | operator info |
+--------------------------+---------+--------------+---------------+--------------------------------+
| StreamAgg_9 | 1.00 | root | | funcs:count(1)->Column#4 |
| └─TableReader_17 | 1.00 | root | | data:TableFullScan_16 |
| └─TableFullScan_16 | 1.00 | cop[tiflash] | table:t | keep order:false, stats:pseudo |
+--------------------------+---------+--------------+---------------+--------------------------------+
3 rows in set (0.00 sec)
explain analyze select count(*) from test.t;
+--------------------------+---------+---------+--------------+---------------+----------------------------------------------------------------------+--------------------------------+-----------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+--------------------------+---------+---------+--------------+---------------+----------------------------------------------------------------------+--------------------------------+-----------+------+
| StreamAgg_9 | 1.00 | 1 | root | | time:83.8372ms, loops:2 | funcs:count(1)->Column#4 | 372 Bytes | N/A |
| └─TableReader_17 | 1.00 | 1 | root | | time:83.7776ms, loops:2, rpc num: 1, rpc time:83.5701ms, proc keys:0 | data:TableFullScan_16 | 152 Bytes | N/A |
| └─TableFullScan_16 | 1.00 | 1 | cop[tiflash] | table:t | time:43ms, loops:1 | keep order:false, stats:pseudo | N/A | N/A |
+--------------------------+---------+---------+--------------+---------------+----------------------------------------------------------------------+--------------------------------+-----------+------+
cop[tiflash]
means that the task will be sent to TiFlash for processing. If you have not selected a TiFlash replica, you can try to update the statistics using the analyze table
statement, and then check the result using the explain analyze
statement.
Note that if a table has only a single TiFlash replica and the related node cannot provide service, queries in the CBO mode will repeatedly retry. In this situation, you need to specify the engine or use the manual hint to read data from the TiKV replica.
Engine isolation
Engine isolation is to specify that all queries use a replica of the specified engine by configuring the corresponding variable. The optional engines are "tikv", "tidb" (indicates the internal memory table area of TiDB, which stores some TiDB system tables and cannot be actively used by users), and "tiflash", with the following two configuration levels:
TiDB instance-level, namely, INSTANCE level. Add the following configuration item in the TiDB configuration file:
[isolation-read] engines = ["tikv", "tidb", "tiflash"]
The INSTANCE-level default configuration is
["tikv", "tidb", "tiflash"]
.SESSION level. Use the following statement to configure:
set @@session.tidb_isolation_read_engines = "engine list separated by commas";
or
set SESSION tidb_isolation_read_engines = "engine list separated by commas";
The default configuration of the SESSION level inherits from the configuration of the TiDB INSTANCE level.
The final engine configuration is the session-level configuration, that is, the session-level configuration overrides the instance-level configuration. For example, if you have configured "tikv" in the INSTANCE level and "tiflash" in the SESSION level, then the TiFlash replicas are read. If the final engine configuration is "tikv" and "tiflash", then the TiKV and TiFlash replicas are both read, and the optimizer automatically selects a better engine to execute.
Because TiDB Dashboard and other components need to read some system tables stored in the TiDB memory table area, it is recommended to always add the "tidb" engine to the instance-level engine configuration.
If the queried table does not have a replica of the specified engine (for example, the engine is configured as "tiflash" but the table does not have a TiFlash replica), the query returns an error.
Manual hint
Manual hint can force TiDB to use specified replicas for specific table(s) on the premise of satisfying engine isolation. Here is an example of using the manual hint:
select /*+ read_from_storage(tiflash[table_name]) */ ... from table_name;
If you set an alias to a table in a query statement, you must use the alias in the statement that includes a hint for the hint to take effect. For example:
select /*+ read_from_storage(tiflash[alias_a,alias_b]) */ ... from table_name_1 as alias_a, table_name_2 as alias_b where alias_a.column_1 = alias_b.column_2;
In the above statements, tiflash[]
prompts the optimizer to read the TiFlash replicas. You can also use tikv[]
to prompt the optimizer to read the TiKV replicas as needed. For hint syntax details, refer to READ_FROM_STORAGE.
If the table specified by a hint does not have a replica of the specified engine, the hint is ignored and a warning is reported. In addition, a hint only takes effect on the premise of engine isolation. If the engine specified in a hint is not in the engine isolation list, the hint is also ignored and a warning is reported.
The MySQL client of 5.7.7 or earlier versions clears optimizer hints by default. To use the hint syntax in these early versions, start the client with the --comments
option, for example, mysql -h 127.0.0.1 -P 4000 -uroot --comments
.
The relationship of smart selection, engine isolation, and manual hint
In the above three ways of reading TiFlash replicas, engine isolation specifies the overall range of available replicas of engines; within this range, manual hint provides statement-level and table-level engine selection that is more fine-grained; finally, CBO makes the decision and selects a replica of an engine based on cost estimation within the specified engine list.
Before v4.0.3, the behavior of reading from TiFlash replica in a non-read-only SQL statement (for example, INSERT INTO ... SELECT
, SELECT ... FOR UPDATE
, UPDATE ...
, DELETE ...
) is undefined. In v4.0.3 and later versions, internally TiDB ignores the TiFlash replica for a non-read-only SQL statement to guarantee the data correctness. That is, for smart selection, TiDB automatically chooses the non-TiFlash replica; for engine isolation that specifies TiFlash replica only, TiDB reports an error; and for manual hint, TiDB ignores the hint.
Use TiSpark to read TiFlash replicas
Currently, you can use TiSpark to read TiFlash replicas in a method similar to the engine isolation in TiDB. This method is to configure the spark.tispark.isolation_read_engines
parameter. The parameter value defaults to tikv,tiflash
, which means that TiDB reads data from TiFlash or from TiKV according to CBO's selection. If you set the parameter value to tiflash
, it means that TiDB forcibly reads data from TiFlash.
Notes
When this parameter is set to
tiflash
, only the TiFlash replicas of all tables involved in the query are read and these tables must have TiFlash replicas; for tables that do not have TiFlash replicas, an error is reported. When this parameter is set totikv
, only the TiKV replica is read.
You can configure this parameter in one of the following ways:
Add the following item in the
spark-defaults.conf
file:spark.tispark.isolation_read_engines tiflash
Add
--conf spark.tispark.isolation_read_engines=tiflash
in the initialization command when initializing Spark shell or Thrift server.Set
spark.conf.set("spark.tispark.isolation_read_engines", "tiflash")
in Spark shell in a real-time manner.Set
set spark.tispark.isolation_read_engines=tiflash
in Thrift server after the server is connected via beeline.
Supported push-down calculations
Before v4.0.2, TiDB does not support the new framework for collations, so in those previous versions, if you enable the new framework for collations, none of the expressions can be pushed down. This restriction is removed in v4.0.2 and later versions.
TiFlash supports predicate, aggregate push-down calculations, and table joins. Push-down calculations can help TiDB perform distributed acceleration. Currently, Full Outer Join
and DISTINCT COUNT
are not the supported calculation types, which will be optimized in later versions.
You can enable the push-down of join
using the following session variable (Full Outer Join
is currently not supported):
set @@session.tidb_opt_broadcast_join=1
Currently, TiFlash supports pushing down a limited number of expressions, including:
+, -, /, *, >=, <=, =, !=, <, >, ifnull, isnull, bitor, in, bitand, or, and, like, not, case when, month, substr, timestampdiff, date_format, from_unixtime, json_length, if, bitneg, bitxor, round without fraction, cast(int as decimal), min, max, sum, count, avg, approx_count_distinct
TiFlash does not support push-down calculations in the following situations:
- Expressions that contain the
Time
type cannot be pushed down. - If an aggregate function or a
WHERE
clause contains expressions that are not included in the list above, the aggregate or related predicate filtering cannot be pushed down.
If a query encounters unsupported push-down calculations, TiDB needs to complete the remaining calculations, which might greatly affect the TiFlash acceleration effect.