- About TiDB
- Quick Start
- Software and Hardware Requirements
- Environment Configuration Checklist
- Topology Patterns
- Install and Start
- Verify Cluster Status
- Benchmarks Methods
- Backup and Restore
- Read Historical Data
- Configure Time Zone
- Daily Checklist
- Maintain TiFlash
- Maintain TiDB Using TiUP
- Maintain TiDB Using Ansible
- Modify Configuration Online
- Monitor and Alert
- TiDB Troubleshooting Map
- Identify Slow Queries
- Analyze Slow Queries
- SQL Diagnostics
- Identify Expensive Queries
- Statement Summary Tables
- Troubleshoot Hotspot Issues
- Troubleshoot Increased Read and Write Latency
- Troubleshoot Cluster Setup
- Troubleshoot High Disk I/O Usage
- Troubleshoot Lock Conflicts
- Troubleshoot TiFlash
- Troubleshoot Write Conflicts in Optimistic Transactions
- Performance Tuning
- System Tuning
- Software Tuning
- SQL Tuning
- Understanding the Query Execution Plan
- SQL Optimization Process
- Logic Optimization
- Physical Optimization
- Prepare Execution Plan Cache
- Control Execution Plans
- Multiple Data Centers in One City Deployment
- Three Data Centers in Two Cities Deployment
- Best Practices
- Use Placement Rules
- Use Load Base Split
- Use Store Limit
- TiDB Ecosystem Tools
- Use Cases
- Backup & Restore (BR)
- TiDB Binlog
- TiDB Lightning
- Cluster Architecture
- Key Monitoring Metrics
- SQL Language Structure and Syntax
- SQL Statements
ADMIN CANCEL DDL
ADMIN CHECKSUM TABLE
ADMIN CHECK [TABLE|INDEX]
ADMIN SHOW DDL [JOBS|QUERIES]
CREATE [GLOBAL|SESSION] BINDING
CREATE TABLE LIKE
DROP [GLOBAL|SESSION] BINDING
SET DEFAULT ROLE
SET [NAMES|CHARACTER SET]
SET [GLOBAL|SESSION] <variable>
SHOW ANALYZE STATUS
SHOW [GLOBAL|SESSION] BINDINGS
SHOW CHARACTER SET
SHOW [FULL] COLUMNS FROM
SHOW CREATE SEQUENCE
SHOW CREATE TABLE
SHOW CREATE USER
SHOW DRAINER STATUS
SHOW [FULL] FIELDS FROM
SHOW INDEX [FROM|IN]
SHOW INDEXES [FROM|IN]
SHOW KEYS [FROM|IN]
SHOW MASTER STATUS
SHOW [FULL] PROCESSSLIST
SHOW PUMP STATUS
SHOW TABLE NEXT_ROW_ID
SHOW TABLE REGIONS
SHOW TABLE STATUS
SHOW [FULL] TABLES
SHOW [GLOBAL|SESSION] VARIABLES
- Data Types
- Functions and Operators
- Type Conversion in Expression Evaluation
- Control Flow Functions
- String Functions
- Numeric Functions and Operators
- Date and Time Functions
- Bit Functions and Operators
- Cast Functions and Operators
- Encryption and Compression Functions
- Information Functions
- JSON Functions
- Aggregate (GROUP BY) Functions
- Window Functions
- Miscellaneous Functions
- Precision Math
- List of Expressions for Pushdown
- Generated Columns
- SQL Mode
- Garbage Collection (GC)
- Character Set and Collation
- System Tables
- TiDB Dashboard
- Overview Page
- Cluster Info Page
- Key Visualizer Page
- Metrics Relation Graph
- SQL Statements Analysis
- Slow Queries Page
- Cluster Diagnostics
- Search Logs Page
- Profile Instances Page
- TiDB Dashboard
- Command Line Flags
- Configuration File Parameters
- System Variables
- Storage Engines
- Errors Codes
- Table Filter
- Schedule Replicas by Topology Labels
- Release Notes
- All Releases
Reading data from storage engines is one of the most time-consuming steps during the SQL execution. Currently, TiDB supports reading data from different storage engines and different indexes. Query execution performance depends largely on whether you select a suitable index or not.
This document introduces how to select an index to access a table, and some related ways to control index selection.
Before introducing index selection, it is important to understand the ways TiDB accesses tables, what triggers each way, what differences each way makes, and what the pros and cons are.
|Operator||Trigger Conditions||Applicable Scenarios||Explanations|
|PointGet / BatchPointGet||When accessing tables in one or more single point ranges.||Any scenario||If triggered, it is usually considered as the fastest operator, since it calls the kvget interface directly to perform the calculations rather than calls the coprocessor interface.|
|TableReader||None||Any scenario||It is generally considered as the least efficient operator that scans table data directly from the TiKV layer. It can be selected only if there is a range query on the |
|TableReader||A table has a replica on the TiFlash node.||There are fewer columns to read, but many rows to evaluate.||Tiflash is column-based storage. If you need to calculate a small number of columns and a large number of rows, it is recommended to choose this operator.|
|IndexReader||A table has one or more indexes, and the columns needed for the calculation are included in the indexes.||When there is a smaller range query on the indexes, or when there is an order requirement for indexed columns.||When multiple indexes exist, a reasonable index is selected based on the cost estimation.|
|IndexLookupReader||A table has one or more indexes, and the columns needed for calculation are not completely included in the indexes.||Same as IndexReader.||Since the index does not completely cover calculated columns, TiDB needs to retrieve rows from a table after reading indexes. There is an extra cost compared to the IndexReader operator.|
The TableReader operator is based on the
_tidb_rowidcolumn index, and TiFlash uses a column storage index, so the selection of index is the selection of an operator for accessing tables.
TiDB provides a heuristic rule named skyline-pruning based on the cost estimation of each operator for accessing tables. It can reduce the probability of wrong index selection caused by wrong estimation.
Skyline-pruning is a heuristic filtering rule for indexes. To judge an index, the following three dimensions are needed:
Whether it needs to retrieve rows from a table when you select the index to access the table (that is, the plan generated by the index is IndexReader operator or IndexLookupReader operator). Indexes that do not retrieve rows from a table are better on this dimension than indexes that do.
Select whether the index satisfies a certain order. Because index reading can guarantee the order of certain column sets, indexes that satisfy the query order are superior to indexes that do not satisfy on this dimension.
How many access conditions are covered by the indexed columns. An “access condition” is a where condition that can be converted to a column range. And the more access conditions an indexed column set covers, the better it is in this dimension.
For these three dimensions, if an index named idx_a is not worse than the index named idx_b in all three dimensions and one of the dimensions is better than idx_b, then idx_a is preferred.
After using the skyline-pruning rule to rule out inappropriate indexes, the selection of indexes is based entirely on the cost estimation. The cost estimation of accessing tables requires the following considerations:
- The average length of each row of the indexed data in the storage engine.
- The number of rows in the query range generated by the index.
- The cost for retrieving rows from a table.
- The number of ranges generated by index during the query execution.
According to these factors and the cost model, the optimizer selects an index with the lowest cost to access the table.
The estimated number of rows is not accurate?
This is usually due to stale or inaccurate statistics. You can re-execute the
analyze tablestatement or modify the parameters of the
Statistics are accurate, and reading from TiFlash is faster, but why does the optimizer choose to read from TiKV?
At present, the cost model of distinguishing TiFlash from TiKV is still rough. You can decrease the value of
tidb_opt_seek_factorparameter, then the optimizer prefers to choose TiFlash.
The statistics are accurate. Index A needs to retrieve rows from tables, but it actually executes faster than Index B that does not retrieve rows from tables. Why does the optimizer choose Index B?
In this case, the cost estimation may be too large for retrieving rows from tables. You can decrease the value of
tidb_opt_network_factorparameter to reduce the cost of retrieving rows from tables.
The index selection can be controlled by a single query through Optimizer Hints.
IGNORE_INDEXcan force the optimizer to use / not use certain indexes.
READ_FROM_STORAGEcan force the optimizer to choose the TiKV / TiFlash storage engine for certain tables to execute queries.