A Practical Guide for SQL Tuning
This guide is designed for TiDB SQL tuning beginners. It focuses on the following key principles:
- Low entry barrier: introducing tuning concepts and methods step by step.
- Practice-oriented: providing specific steps and examples for each optimization tip.
- Quick start: prioritizing the most common and effective optimization methods.
- Gentle learning curve: emphasizing practical techniques rather than complex theory.
- Scenario-based: using real-world business cases to demonstrate optimization effects.
Introduction to SQL tuning
SQL tuning is crucial for optimizing database performance. It involves systematically improving the efficiency of SQL queries through the following typical steps:
Identify high-impact SQL statements:
- Review the SQL execution history to find statements that consume significant system resources or contribute heavily to the application workload.
- Use monitoring tools and performance metrics to identify resource-intensive queries.
Analyze execution plans:
- Examine the execution plans generated by the query optimizer for the identified statements.
- Verify whether these plans are reasonably efficient and use appropriate indexes and join methods.
Implement optimizations:
Implement optimizations to inefficient SQL statements. Optimizations might include rewriting SQL statements, adding or modifying indexes, updating statistics, or adjusting database parameters.
Repeat these steps until:
- The system performance meets target requirements.
- No further improvements can be made to the remaining statements.
SQL tuning is an ongoing process. As your data volume grows and query patterns change, you should:
- Regularly monitor query performance.
- Re-evaluate your optimization strategies.
- Adapt your approach to address new performance challenges.
By consistently applying these practices, you can maintain optimal database performance over time.
Goals of SQL tuning
The primary goals of SQL tuning are the following:
- Reduce response time for end users.
- Minimize resource consumption during query execution.
To achieve these goals, you can use the following strategies.
Optimize query execution
SQL tuning focuses on finding more efficient ways to process the same workload without changing query functionality. You can optimize query execution as follows:
Improving execution plans:
- Analyze and modify query structures for more efficient processing.
- Use appropriate indexes to reduce data access and processing time.
- Enable TiFlash for analytical queries on large datasets and leverage the Massively Parallel Processing (MPP) engine for complex aggregations and joins.
Enhancing data access methods:
- Use covering indexes to satisfy queries directly from the index, avoiding full table scans.
- Implement partitioning strategies to limit data scans to relevant partitions.
Examples:
- Create indexes on frequently queried columns to significantly reduce resource usage, particularly for queries that access small portions of the table.
- Use index-only scans for queries that return a limited number of sorted results to avoid full table scans and sorting operations.
Balance workload distribution
In a distributed architecture like TiDB, balancing workloads across TiKV nodes is essential for optimal performance. To identify and resolve read and write hotspots, see Troubleshoot hotspot issues.
By implementing these strategies, you can ensure that your TiDB cluster efficiently utilizes all available resources and avoids bottlenecks caused by uneven workload distribution or serialization on individual TiKV nodes.
Identify high-load SQL
The most efficient way to identify resource-intensive SQL statements is by using TiDB Dashboard. You can also use other tools, such as views and logs, to identify high-load SQL statements.
Monitor SQL statements using TiDB Dashboard
SQL Statements page
In TiDB Dashboard, navigate to the SQL Statements page to identify the following:
- The SQL statement with the highest total latency, which is the statement that takes the longest time to execute across multiple executions.
- The number of times each SQL statement has been executed, which helps identify statements with the highest execution frequency.
- Execution details of each SQL statement by clicking it to view `EXPLAIN ANALYZE` results.
TiDB normalizes SQL statements into templates by replacing literals and bind variables with `?`. This normalization and sorting process helps you quickly identify the most resource-intensive queries that might require optimization.
Slow Queries page
In TiDB Dashboard, navigate to the Slow Queries page to find the following:
- The slowest SQL queries.
- The SQL query that reads the most data from TiKV.
- The `EXPLAIN ANALYZE` output by clicking a query for detailed execution analysis.
The Slow Queries page does not display SQL execution frequency. A query appears on this page if its execution time exceeds the `tidb_slow_log_threshold` configuration item for a single instance.
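You can check the current threshold (300 milliseconds by default) and lower it if you want to capture more queries. The following is a minimal sketch; note that this variable applies per TiDB instance:
SHOW VARIABLES LIKE 'tidb_slow_log_threshold';
-- Capture queries slower than 200 ms on this instance:
SET tidb_slow_log_threshold = 200;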
Use other tools to identify Top SQL
In addition to TiDB Dashboard, you can use other tools to identify resource-intensive SQL queries. Each tool offers unique insights and can be valuable for different analysis scenarios. Using a combination of these tools helps ensure comprehensive SQL performance monitoring and optimization.
- TiDB Dashboard Top SQL Page
- Logs: slow query log and expensive queries in TiDB log
- Views: `cluster_statements_summary` view and `cluster_processlist` view
Gather data on identified SQL statements
For the top SQL statements identified, you can use `PLAN REPLAYER` to capture and save SQL execution information from a TiDB cluster. This tool helps recreate the execution environment for further analysis. To export SQL execution information, use the following syntax:
PLAN REPLAYER DUMP EXPLAIN [ANALYZE] [WITH STATS AS OF TIMESTAMP expression] sql-statement;
Use `EXPLAIN ANALYZE` whenever possible, as it provides both the execution plan and actual performance metrics for more accurate insights into query performance.
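For example, the following statement captures a hypothetical slow query together with its runtime statistics (the table and predicate are illustrative):
PLAN REPLAYER DUMP EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 11111;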
SQL tuning guide
This guide provides practical advice for beginners on optimizing SQL queries in TiDB. By following these best practices, you can improve query performance and streamline SQL tuning. This guide covers the following topics:
- Understand query processing
- Understand execution plans
- Index strategy in TiDB
- Composite index strategy guidelines
- The cost of indexing
- SQL tuning with a covering index
- SQL tuning with a composite index involving sorting
- [SQL tuning with composite indexes for efficient filtering and sorting](#sql-tuning-with-composite-indexes-for-efficient-filtering-and-sorting)
- When to use TiFlash
Understand query processing
This section introduces the query processing workflow, optimizer fundamentals, and statistics management.
Query processing workflow
When a client sends a SQL statement to TiDB, the statement passes through the protocol layer of the TiDB server. This layer manages the connection between the TiDB server and the client, receives SQL statements, and returns data to the client.
In the preceding figure, to the right of the protocol layer is the optimizer of the TiDB server, which processes SQL statements as follows:
- The SQL statement arrives at the SQL optimizer through the protocol layer and is parsed into an abstract syntax tree (AST).
- TiDB identifies whether it is a Point Get statement, which involves a simple one-table lookup through a primary or unique key, such as `SELECT * FROM t WHERE pk_col = 1` or `SELECT * FROM t WHERE uk_col IN (1,2,3)`. For `Point Get` statements, TiDB skips subsequent optimization steps and proceeds directly to execution in the SQL executor.
- If the query is not a `Point Get`, the AST undergoes logical transformation, where TiDB rewrites the SQL logically based on specific rules.
- After logical transformation, TiDB processes the AST through cost-based optimization.
- During cost-based optimization, the optimizer uses statistics to select appropriate operators and generates a physical execution plan.
- The generated physical execution plan is sent to the SQL executor of the TiDB node for execution.
- Unlike traditional single-node databases, TiDB pushes down operators or coprocessors to TiKV and/or TiFlash nodes containing the data. This approach processes parts of the execution plan where the data is stored, efficiently utilizing the distributed architecture, using resources in parallel, and reducing network data transfer. The TiDB node executor then assembles the final result and returns it to the client.
Optimizer fundamentals
TiDB uses a cost-based optimizer (CBO) to determine the most efficient execution plan for a SQL statement. The optimizer evaluates different execution strategies and selects the one with the lowest estimated cost. The cost depends on factors such as:
- The SQL statement
- Schema design
- Statistics, including:
- Table statistics
- Index statistics
- Column statistics
Based on these inputs, the cost model generates an execution plan that details how TiDB will execute the SQL statement, including:
- Access method
- Join method
- Join order
The effectiveness of the optimizer depends on the quality of the information it receives. To achieve optimal performance, ensure that statistics are up to date and indexes are well-designed.
Statistics management
Statistics are essential for the TiDB optimizer. TiDB uses statistics as the input of the optimizer to estimate the number of rows processed in each step of a SQL execution plan.
Statistics are divided into two levels:
- Table-level statistics: include the total number of rows in the table and the number of rows modified since the last statistics collection.
- Index/column-level statistics: include detailed information such as histograms, Count-Min Sketch, Top-N (values or indexes with the highest occurrences), distribution and quantity of different values, and the number of NULL values.
To check the accuracy and health of your statistics, you can use the following SQL statements:
- `SHOW STATS_META`: shows metadata about table statistics.
SHOW STATS_META WHERE table_name='T2'\G
*************************** 1. row ***************************
       Db_name: test
    Table_name: T2
Partition_name:
   Update_time: 2023-05-11 02:16:50
  Modify_count: 10000
     Row_count: 20000
1 row in set (0.03 sec)
- `SHOW STATS_HEALTHY`: shows the health status of table statistics.
SHOW STATS_HEALTHY WHERE table_name='T2'\G
*************************** 1. row ***************************
       Db_name: test
    Table_name: T2
Partition_name:
       Healthy: 50
1 row in set (0.00 sec)
TiDB provides two methods for collecting statistics: automatic and manual collection. In most cases, automatic collection is sufficient. TiDB triggers automatic collection when certain conditions are met. Some common triggering conditions include:
- `tidb_auto_analyze_ratio`: the healthiness trigger.
- `tidb_auto_analyze_start_time` and `tidb_auto_analyze_end_time`: the time window for automatic statistics collection.
SHOW VARIABLES LIKE 'tidb\_auto\_analyze%';
+-----------------------------------------+-------------+
| Variable_name | Value |
+-----------------------------------------+-------------+
| tidb_auto_analyze_ratio | 0.5 |
| tidb_auto_analyze_start_time | 00:00 +0000 |
| tidb_auto_analyze_end_time | 23:59 +0000 |
+-----------------------------------------+-------------+
In some cases, automatic collection might not meet your needs. By default, it occurs between `00:00` and `23:59`, meaning the analyze job can run at any time. To minimize performance impact on your online business, you can set specific start and end times for statistics collection.
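For example, the following statements restrict automatic collection to an illustrative low-traffic window; adjust the times and UTC offset to your own workload:
SET GLOBAL tidb_auto_analyze_start_time = '01:00 +0000';
SET GLOBAL tidb_auto_analyze_end_time = '05:00 +0000';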
You can manually collect statistics using the `ANALYZE TABLE table_name` statement. This lets you adjust settings such as the sample rate, the number of Top-N values, or collect statistics for specific columns only.
Note that after manual collection, subsequent automatic collection jobs inherit the new settings. This means that any customizations made during manual collection will apply to future automatic analyses.
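The following is a minimal sketch of a manual collection, assuming a table `t2` with columns `a` and `b`; it samples 10% of the rows and collects 8 Top-N values for only those columns:
ANALYZE TABLE t2 COLUMNS a, b WITH 0.1 SAMPLERATE, 8 TOPN;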
Locking table statistics is useful in the following scenarios:
- The statistics on the table already represent the data well.
- The table is very large, and statistics collection is time-consuming.
- You want to maintain statistics only during specific time windows.
To lock statistics for a table, you can use the `LOCK STATS table_name` statement.
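For example, assuming a table `t2` whose statistics you want to freeze:
LOCK STATS t2;
-- Later, to let statistics collection resume:
UNLOCK STATS t2;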
For more information, see Statistics.
Understand execution plans
An execution plan details the steps that TiDB will follow to execute a SQL query. This section explains how TiDB builds an execution plan and how to generate, display, and interpret execution plans.
How TiDB builds an execution plan
A SQL statement undergoes three main optimization stages in the TiDB optimizer:
1. Pre-processing
During pre-processing, TiDB determines whether the SQL statement can be executed using `Point_Get` or `Batch_Point_Get`. These operations use a primary or unique key to read directly from TiKV through an exact key lookup. If a plan qualifies for `Point_Get` or `Batch_Point_Get`, the optimizer skips the logical transformation and cost-based optimization steps because direct key lookup is the most efficient way to access the row.
The following is an example of a `Point_Get` query:
EXPLAIN SELECT id, name FROM emp WHERE id = 901;
+-------------+---------+------+---------------+---------------+
| id | estRows | task | access object | operator info |
+-------------+---------+------+---------------+---------------+
| Point_Get_1 | 1.00 | root | table:emp | handle:901 |
+-------------+---------+------+---------------+---------------+
2. Logical transformation
During logical transformation, TiDB optimizes SQL statements based on the `SELECT` list, `WHERE` predicates, and other conditions. It generates a logical execution plan to annotate and rewrite the query. This logical plan is used in the next stage, cost-based optimization. The transformation applies rule-based optimizations such as column pruning, partition pruning, and join reordering. Because this process is rule-based and automatic, manual adjustments are usually unnecessary.
For more information, see SQL Logical Optimization.
3. Cost-based optimization
The TiDB optimizer uses statistics to estimate the number of rows processed in each step of a SQL statement and assigns a cost to each step. During cost-based optimization, the optimizer evaluates all possible plan choices, including index accesses and join methods, and calculates the total cost for each plan. The optimizer then selects the execution plan with the minimal total cost.
The following figure illustrates various data access paths and row set operations considered during cost-based optimization. For data retrieval paths, the optimizer determines the most efficient method between index scans and full table scans, and decides whether to retrieve data from row-based TiKV storage or columnar TiFlash storage.
The optimizer also evaluates operations that manipulate row sets, such as aggregation, join, and sorting. For example, the aggregation operator might use either `HashAgg` or `StreamAgg`, while the join method can select from `HashJoin`, `MergeJoin`, or `IndexJoin`.
Additionally, the physical optimization phase includes pushing down expressions and operators to the physical storage engines. The physical plan is distributed to different components based on the underlying storage engines as follows:
- Root tasks are executed on the TiDB server.
- Cop (Coprocessor) tasks are executed on TiKV.
- MPP tasks are executed on TiFlash.
This distribution enables cross-component collaboration for efficient query processing.
For more information, see SQL Physical Optimization.
Generate and display execution plans
Besides accessing execution plan information through TiDB Dashboard, you can use the `EXPLAIN` statement to display the execution plan for a SQL query. The `EXPLAIN` output includes the following columns:
- `id`: the operator name and a unique identifier of the step.
- `estRows`: the estimated number of rows from the particular step.
- `task`: indicates the layer where the operator is executed. For example, `root` indicates execution on the TiDB server, `cop[tikv]` indicates execution on TiKV, and `mpp[tiflash]` indicates execution on TiFlash.
- `access object`: the object where row sources are located.
- `operator info`: additional details about the operator in the step.
EXPLAIN SELECT COUNT(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59';
+--------------------------+-------------+--------------+-------------------+----------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+--------------------------+-------------+--------------+-------------------+----------------------------------------------------------------------------------------------------+
| StreamAgg_20 | 1.00 | root | | funcs:count(Column#13)->Column#11 |
| └─TableReader_21 | 1.00 | root | | data:StreamAgg_9 |
| └─StreamAgg_9 | 1.00 | cop[tikv] | | funcs:count(1)->Column#13 |
| └─Selection_19 | 250.00 | cop[tikv] | | ge(trips.start_date, 2017-07-01 00:00:00.000000), le(trips.start_date, 2017-07-01 23:59:59.000000) |
| └─TableFullScan_18 | 10000.00 | cop[tikv] | table:trips | keep order:false, stats:pseudo |
+--------------------------+-------------+--------------+-------------------+----------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)
Different from `EXPLAIN`, `EXPLAIN ANALYZE` executes the corresponding SQL statement, records its runtime information, and returns the runtime information together with the execution plan. This runtime information is crucial for debugging query execution. For more information, see `EXPLAIN ANALYZE`.
The `EXPLAIN ANALYZE` output includes:
- `actRows`: the number of rows output by the operator.
- `execution info`: detailed execution information of the operator. `time` represents the total wall time, including the total execution time of all sub-operators. If the operator is called many times by the parent operator, then the time refers to the accumulated time.
- `memory`: the memory used by the operator.
- `disk`: the disk space used by the operator.
The following is an example. Some attributes and table columns are omitted to improve formatting.
EXPLAIN ANALYZE
SELECT SUM(pm.m_count) / COUNT(*)
FROM (
SELECT COUNT(m.name) m_count
FROM universe.moons m
RIGHT JOIN (
SELECT p.id, p.name
FROM universe.planet_categories c
JOIN universe.planets p
ON c.id = p.category_id
AND c.name = 'Jovian'
) pc ON m.planet_id = pc.id
GROUP BY pc.name
) pm;
+-----------------------------------------+.+---------+-----------+---------------------------+----------------------------------------------------------------+.+-----------+---------+
| id |.| actRows | task | access object | execution info |.| memory | disk |
+-----------------------------------------+.+---------+-----------+---------------------------+----------------------------------------------------------------+.+-----------+---------+
| Projection_14 |.| 1 | root | | time:1.39ms, loops:2, RU:1.561975, Concurrency:OFF |.| 9.64 KB | N/A |
| └─StreamAgg_16 |.| 1 | root | | time:1.39ms, loops:2 |.| 1.46 KB | N/A |
| └─Projection_40 |.| 4 | root | | time:1.38ms, loops:4, Concurrency:OFF |.| 8.24 KB | N/A |
| └─HashAgg_17 |.| 4 | root | | time:1.36ms, loops:4, partial_worker:{...}, final_worker:{...} |.| 82.1 KB | N/A |
| └─HashJoin_19 |.| 25 | root | | time:1.29ms, loops:2, build_hash_table:{...}, probe:{...} |.| 2.25 KB | 0 Bytes |
| ├─HashJoin_35(Build) |.| 4 | root | | time:1.08ms, loops:2, build_hash_table:{...}, probe:{...} |.| 25.7 KB | 0 Bytes |
| │ ├─IndexReader_39(Build) |.| 1 | root | | time:888.5µs, loops:2, cop_task: {...} |.| 286 Bytes | N/A |
| │ │ └─IndexRangeScan_38 |.| 1 | cop[tikv] | table:c, index:name(name) | tikv_task:{time:0s, loops:1}, scan_detail: {...} |.| N/A | N/A |
| │ └─TableReader_37(Probe) |.| 10 | root | | time:543.7µs, loops:2, cop_task: {...} |.| 577 Bytes | N/A |
| │ └─TableFullScan_36 |.| 10 | cop[tikv] | table:p | tikv_task:{time:0s, loops:1}, scan_detail: {...} |.| N/A | N/A |
| └─TableReader_22(Probe) |.| 28 | root | | time:671.7µs, loops:2, cop_task: {...} |.| 876 Bytes | N/A |
| └─TableFullScan_21 |.| 28 | cop[tikv] | table:m | tikv_task:{time:0s, loops:1}, scan_detail: {...} |.| N/A | N/A |
+-----------------------------------------+.+---------+-----------+---------------------------+----------------------------------------------------------------+.+-----------+---------+
Read execution plans: first child first
To diagnose slow SQL queries, you need to understand how to read execution plans. The key principle is "first child first – recursive descent". Each operator in the plan generates a set of rows, and the execution order determines how these rows flow through the plan tree.
The "first child first" rule means that an operator must retrieve rows from all its child operators before producing output. For example, a join operator requires rows from both its child operators to perform the join. The "recursive descent" rule means you analyze the plan from top to bottom, although actual data flows from bottom to top, as each operator depends on its children's output.
Consider these two important concepts when reading execution plans:
Parent-child interaction: a parent operator calls its child operators sequentially but might cycle through them multiple times. For example, in an index lookup or nested loop join, the parent fetches a batch of rows from the first child, then retrieves zero or more rows from the second child. This process repeats until the first child's result set is fully processed.
Blocking vs. non-blocking operators: operators can be either blocking or non-blocking:
- Blocking operators, such as `TopN` and `HashAgg`, must create their entire result set before passing data to their parent.
- Non-blocking operators, such as `IndexLookup` and `IndexJoin`, produce and pass rows incrementally as needed.
When reading an execution plan, start from the top and work downward. In the following example, the leaf node of the plan tree is `TableFullScan_18`, which performs a full table scan. The rows from this scan are consumed by the `Selection_19` operator, which filters the data based on `ge(trips.start_date, 2017-07-01 00:00:00.000000), le(trips.start_date, 2017-07-01 23:59:59.000000)`. Then, the aggregation operator `StreamAgg_9` computes `COUNT(*)` within TiKV.
These three operators (`TableFullScan_18`, `Selection_19`, and `StreamAgg_9`) are pushed down to TiKV (marked as `cop[tikv]`), enabling early filtering and aggregation in TiKV to reduce data transfer between TiKV and TiDB. Finally, `TableReader_21` reads the data from `StreamAgg_9`, and `StreamAgg_20` performs the final aggregation `count(*)`.
EXPLAIN SELECT COUNT(*) FROM trips WHERE start_date BETWEEN '2017-07-01 00:00:00' AND '2017-07-01 23:59:59';
+--------------------------+-------------+--------------+-------------------+----------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+--------------------------+-------------+--------------+-------------------+----------------------------------------------------------------------------------------------------+
| StreamAgg_20 | 1.00 | root | | funcs:count(Column#13)->Column#11 |
| └─TableReader_21 | 1.00 | root | | data:StreamAgg_9 |
| └─StreamAgg_9 | 1.00 | cop[tikv] | | funcs:count(1)->Column#13 |
| └─Selection_19 | 250.00 | cop[tikv] | | ge(trips.start_date, 2017-07-01 00:00:00.000000), le(trips.start_date, 2017-07-01 23:59:59.000000) |
| └─TableFullScan_18 | 10000.00 | cop[tikv] | table:trips | keep order:false, stats:pseudo |
+--------------------------+-------------+--------------+-------------------+----------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)
In the following example, start by examining `IndexRangeScan_47`, the first leaf node of the plan tree. The optimizer selects only the `name` and `id` columns from the `stars` table, which can be retrieved from the `name(name)` index. As a result, the root reader for `stars` is `IndexReader_48`, not `TableReader`.
The join between `stars` and `planets` is a hash join (`HashJoin_44`). The `planets` table is accessed using a full table scan (`TableFullScan_45`). After the join, `TopN_26` and `TopN_19` apply the `ORDER BY` and `LIMIT` clauses, respectively. The final operator, `Projection_16`, selects the final column `t5.name`.
EXPLAIN
SELECT t5.name
FROM (
SELECT p.name, p.gravity, p.distance_from_sun
FROM universe.planets p
JOIN universe.stars s
ON s.id = p.sun_id
AND s.name = 'Sun'
ORDER BY p.distance_from_sun ASC
LIMIT 5
) t5
ORDER BY t5.gravity DESC
LIMIT 3;
+-----------------------------------+----------+-----------+---------------------------+
| id | estRows | task | access object |
+-----------------------------------+----------+-----------+---------------------------+
| Projection_16 | 3.00 | root | |
| └─TopN_19 | 3.00 | root | |
| └─TopN_26 | 5.00 | root | |
| └─HashJoin_44 | 5.00 | root | |
| ├─IndexReader_48(Build) | 1.00 | root | |
| │ └─IndexRangeScan_47 | 1.00 | cop[tikv] | table:s, index:name(name) |
| └─TableReader_46(Probe) | 10.00 | root | |
| └─TableFullScan_45 | 10.00 | cop[tikv] | table:p |
+-----------------------------------+----------+-----------+---------------------------+
The following figure illustrates the plan tree for the second execution plan:
You read the execution plan top to bottom, first child first, while rows are produced from the leaf operators upward, corresponding to a postorder traversal (Left, Right, Root) of the plan tree.
You can read this plan in the following order:
- Start at the top with `Projection_16`.
- Move to its child, `TopN_19`.
- Continue to `TopN_26`.
- Proceed to `HashJoin_44`.
- For `HashJoin_44`, process its left (Build) child first: `IndexReader_48`, then `IndexRangeScan_47`.
- For `HashJoin_44`, process its right (Probe) child: `TableReader_46`, then `TableFullScan_45`.
This traversal ensures that each operator's inputs are processed before the operator itself, enabling efficient query execution.
Identify bottlenecks in execution plans
When analyzing execution plans, compare `actRows` (actual rows) with `estRows` (estimated rows) to evaluate the accuracy of the optimizer's estimations. A significant difference between these values might indicate outdated or inaccurate statistics, which can lead to suboptimal query plans.
To identify bottlenecks in a slow query, perform the following steps:
- Review the `execution info` section from top to bottom, focusing on operators with high execution time.
- For the first child operator that consumes significant time:
  - Compare `actRows` with `estRows` to assess estimation accuracy.
  - Analyze detailed metrics in `execution info`, such as high execution time or other metrics.
  - Check `memory` and `disk` usage for potential resource constraints.
- Correlate these factors to determine the root cause of the performance issue. For example, if a `TableFullScan` operation has a high `actRows` count and long execution time in `execution info`, consider creating an index. If a `HashJoin` operation shows high memory usage and execution time, consider optimizing the join order or using an alternative join method.
In the following execution plan, the query runs for 5 minutes and 51 seconds before being canceled. The key issues include:
- Severe underestimation: The first leaf node `IndexReader_76` reads data from the `index_orders_on_adjustment_id(adjustment_id)` index. The actual number of rows (`actRows`) is 256,811,189, which is significantly higher than the estimated 1 row (`estRows`).
- Memory overflow: This underestimation causes the hash join operator `HashJoin_69` to build a hash table with far more data than expected, consuming excessive memory (21.6 GB) and disk space (7.65 GB).
- Query termination: The `actRows` value of `0` for `HashJoin_69` and the operators above it indicates either no matching rows or query termination due to resource constraints. In this case, the hash join consumes too much memory, triggering memory control mechanisms to terminate the query.
- Incorrect join order: The root cause of this inefficient plan is the severe underestimation of `estRows` for `IndexRangeScan_75`, leading the optimizer to choose an incorrect join order.
To address these issues, ensure that table statistics are up to date, particularly for the `orders` table and the `index_orders_on_adjustment_id` index.
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------...----------------------+
| id | estRows | estCost | actRows | task | access object | execution info ...| memory | disk |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------...----------------------+
| TopN_19 | 1.01 | 461374372.63 | 0 | root | | time:5m51.1s, l...| 0 Bytes | 0 Bytes |
| └─IndexJoin_32 | 1.01 | 460915067.45 | 0 | root | | time:5m51.1s, l...| 0 Bytes | N/A |
| ├─HashJoin_69(Build) | 1.01 | 460913065.41 | 0 | root | | time:5m51.1s, l...| 21.6 GB | 7.65 GB |
| │ ├─IndexReader_76(Build) | 1.00 | 18.80 | 256805045 | root | | time:1m4.1s, lo...| 12.4 MB | N/A |
| │ │ └─IndexRangeScan_75 | 1.00 | 186.74 | 256811189 | cop[tikv] | table:orders, index:index_orders_on_adjustment_id(adjustment_id) | tikv_task:{proc...| N/A | N/A |
| │ └─Projection_74(Probe) | 30652.93 | 460299612.60 | 1024 | root | | time:1.08s, loo...| 413.4 KB | N/A |
| │ └─IndexLookUp_73 | 30652.93 | 460287375.95 | 6144 | root | partition:all | time:1.08s, loo...| 107.8 MB | N/A |
| │ ├─IndexRangeScan_70(Build) | 234759.64 | 53362737.50 | 390699 | cop[tikv] | table:rates, index:index_rates_on_label_id(label_id) | time:29.6ms, lo...| N/A | N/A |
| │ └─Selection_72(Probe) | 30652.93 | 110373973.91 | 187070 | cop[tikv] | | time:36.8s, loo...| N/A | N/A |
| │ └─TableRowIDScan_71 | 234759.64 | 86944962.10 | 390699 | cop[tikv] | table:rates | tikv_task:{proc...| N/A | N/A |
| └─TableReader_28(Probe) | 0.00 | 43.64 | 0 | root | | ...| N/A | N/A |
| └─Selection_27 | 0.00 | 653.96 | 0 | cop[tikv] | | ...| N/A | N/A |
| └─TableRangeScan_26 | 1.01 | 454.36 | 0 | cop[tikv] | table:labels | ...| N/A | N/A |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------...----------------------+
The following execution plan shows the expected results after fixing the incorrect estimation on the `orders` table. The query now takes 1.96 seconds to run, which is a significant improvement from the previous 5 minutes and 51 seconds:
- Accurate estimation: The `estRows` values now closely match the `actRows` values, indicating that the statistics are updated and more accurate.
- Efficient join order: The query now starts with a `TableReader` on the `labels` table, followed by an `IndexJoin` with the `rates` table, and another `IndexJoin` with the `orders` table. This join order works more efficiently with the actual data distribution.
- No memory overflow: Unlike the previous plan, this execution shows no signs of excessive memory or disk usage, indicating that the query runs within expected resource limits.
This optimized plan demonstrates the importance of accurate statistics and proper join order in query performance. The execution time reduction (from 351 seconds to 1.96 seconds) shows the impact of addressing estimation errors.
+---------------------------------------+----------+---------+-----------+----------------------------------------------------------------------------------------+---------------...+----------+------+
| id | estRows | actRows | task | access object | execution info...| memory | disk |
+---------------------------------------+----------+---------+-----------+----------------------------------------------------------------------------------------+---------------...+----------+------+
| Limit_24 | 1000.00 | 1000 | root | | time:1.96s, lo...| N/A | N/A |
| └─IndexJoin_88 | 1000.00 | 1000 | root | | time:1.96s, lo...| 1.32 MB | N/A |
| ├─IndexJoin_99(Build) | 1000.00 | 2458 | root | | time:1.96s, lo...| 77.7 MB | N/A |
| │ ├─TableReader_109(Build) | 6505.62 | 158728 | root | | time:1.26s, lo...| 297.0 MB | N/A |
| │ │ └─Selection_108 | 6505.62 | 171583 | cop[tikv] | | tikv_task:{pro...| N/A | N/A |
| │ │ └─TableRangeScan_107 | 80396.43 | 179616 | cop[tikv] | table:labels | tikv_task:{pro...| N/A | N/A |
| │ └─Projection_98(Probe) | 1000.00 | 2458 | root | | time:2.13s, lo...| 59.2 KB | N/A |
| │ └─IndexLookUp_97 | 1000.00 | 2458 | root | partition:all | time:2.13s, lo...| 1.20 MB | N/A |
| │ ├─Selection_95(Build) | 6517.14 | 6481 | cop[tikv] | | time:798.6ms, ...| N/A | N/A |
| │ │ └─IndexRangeScan_93 | 6517.14 | 6481 | cop[tikv] | table:rates, index:index_rates_on_label_id(label_id) | tikv_task:{pro...| N/A | N/A |
| │ └─Selection_96(Probe) | 1000.00 | 2458 | cop[tikv] | | time:444.4ms, ...| N/A | N/A |
| │ └─TableRowIDScan_94 | 6517.14 | 6481 | cop[tikv] | table:rates | tikv_task:{pro...| N/A | N/A |
| └─TableReader_84(Probe) | 984.56 | 1998 | root | | time:207.6ms, ...| N/A | N/A |
| └─Selection_83 | 984.56 | 1998 | cop[tikv] | | tikv_task:{pro...| N/A | N/A |
| └─TableRangeScan_82 | 1000.00 | 2048 | cop[tikv] | table:orders | tikv_task:{pro...| N/A | N/A |
+---------------------------------------+----------+---------+-----------+----------------------------------------------------------------------------------------+---------------...+----------+------+
For more information, see TiDB Query Execution Plan Overview and `EXPLAIN` Walkthrough.
Index strategy in TiDB
TiDB is a distributed SQL database that separates the SQL layer (TiDB Server) from the storage layer (TiKV). Unlike traditional databases, TiDB does not use a buffer pool to cache data on compute nodes. As a result, SQL query performance and overall cluster performance depend on the number of key-value (KV) RPC requests that need to be processed. Common KV RPC requests include `Point_Get`, `Batch_Point_Get`, and Coprocessor requests.
To optimize performance in TiDB, it is essential to use indexes effectively, as they can significantly reduce the number of KV RPC requests. Fewer KV RPC requests improve query performance and system efficiency. The following lists some key strategies that help optimize indexing:
- Avoid full table scans.
- Avoid sorting operations.
- Use covering indexes or exclude unnecessary columns to reduce row lookups.
This section explains general indexing strategies and indexing costs. It also provides three practical examples of effective indexing in TiDB, with a focus on composite and covering indexes for SQL tuning.
Composite index strategy guidelines
To create an efficient composite index, order your columns strategically. The column order directly affects how efficiently the index filters and sorts data.
Follow these recommended column order guidelines for a composite index:
Start with index prefix columns for direct access:
- Columns with equivalent conditions
- Columns with `IS NULL` conditions
Add columns for sorting next:
- Let the index handle sorting operations
- Enable sort and limit pushdown to TiKV
- Preserve the sorted order
Include additional filtering columns to reduce row lookups:
- Time range conditions on datetime columns
- Other non-equivalent conditions, such as `!=`, `<>`, and `IS NOT NULL`
Add columns from the `SELECT` list or used in aggregation to fully utilize a covering index, as shown in the sketch after these guidelines.
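The following sketch applies these guidelines to a hypothetical `payments` table: `status` carries an equality filter (the index prefix), `created_at` serves the sort, and `amount` makes the index covering for the query:
SELECT amount FROM payments WHERE status = 'paid' ORDER BY created_at LIMIT 10;
CREATE INDEX idx_status_created_amount ON payments(status, created_at, amount);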
The cost of indexing
While indexes can improve query performance, they also introduce costs you should consider:
Performance impact on writes:
- A non-clustered index reduces the chance of single-phase commit optimization.
- Each additional index slows down write operations (such as `INSERT`, `UPDATE`, and `DELETE`).
- When data is modified, all affected indexes must be updated.
- The more indexes a table has, the greater the write performance impact.
Resource consumption:
- Indexes require additional disk space.
- More memory is needed to cache frequently accessed indexes.
- Backup and recovery operations take longer.
Write hotspot risk:
- Secondary indexes can create write hotspots. For example, a monotonically increasing datetime index will cause hotspots on table writes.
- Hotspots can lead to significant performance degradation.
The following lists some best practices:
- Create indexes only when they provide clear performance benefits.
- Regularly review index usage statistics using `TIDB_INDEX_USAGE` (see the example query after this list).
- Consider the write/read ratio of your workload when designing indexes.
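For example, the following query is a sketch that lists usage statistics for indexes in an illustrative `test` schema; the `TIDB_INDEX_USAGE` view is available starting from TiDB v8.0.0, and its columns can vary by version:
SELECT * FROM information_schema.tidb_index_usage WHERE table_schema = 'test';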
SQL tuning with a covering index
A covering index includes all columns referenced in the `WHERE` and `SELECT` clauses. Using a covering index can eliminate the need for additional index lookups, significantly improving query performance.
The following query requires an index lookup of 2,597,411 rows and takes 46.4 seconds to execute. TiDB dispatches 67 cop tasks for the index range scan on `logs_idx` (`IndexRangeScan_11`) and 301 cop tasks for table access (`TableRowIDScan_12`).
SELECT
SUM(`logs`.`amount`)
FROM
`logs`
WHERE
`logs`.`user_id` = 1111
AND `logs`.`snapshot_id` IS NULL
AND `logs`.`status` IN ('complete', 'failure')
AND `logs`.`source_type` != 'online'
AND (
`logs`.`source_type` IN ('user', 'payment')
    OR `logs`.`source_type` IN ('bank_account')
AND `logs`.`target_type` IN ('bank_account')
);
The original execution plan is as follows:
+-------------------------------+------------+---------+-----------+--------------------------------------------------------------------------+------------------------------------------------------------+
| id                            | estRows    | estCost    | actRows | task      | access object                                                            | execution info                                             |
+-------------------------------+------------+---------+-----------+--------------------------------------------------------------------------+------------------------------------------------------------+
| HashAgg_18 | 1.00 | 2571625.22 | 1 | root | | time:46.4s, loops:2, partial_worker:{wall_time:46.37, ...|
| └─IndexLookUp_19 | 1.00 | 2570096.68 | 301 | root | | time:46.4s, loops:2, index_task: {total_time: 45.8s, ...|
| ├─IndexRangeScan_11(Build) | 1309.50 | 317033.98 | 2597411 | cop[tikv] | table:logs, index:logs_idx(snapshot_id, user_id, status) | time:228ms, loops:2547, cop_task: {num: 67, max: 2.17s, ...|
| └─HashAgg_7(Probe) | 1.00 | 588434.48 | 301 | cop[tikv] | | time:3m46.7s, loops:260, cop_task: {num: 301, ...|
| └─Selection_13 | 1271.37 | 561549.27 | 2566562 | cop[tikv] | | tikv_task:{proc max:10s, min:0s, avg: 915.3ms, ...|
| └─TableRowIDScan_12 | 1309.50 | 430861.31 | 2597411 | cop[tikv] | table:logs | tikv_task:{proc max:10s, min:0s, avg: 908.7ms, ...|
+-------------------------------+------------+---------+-----------+--------------------------------------------------------------------------+------------------------------------------------------------+
To improve query performance, you can create a covering index that includes the `source_type`, `target_type`, and `amount` columns. This optimization eliminates the need for additional table lookups, reducing execution time to 90 milliseconds, and TiDB only needs to send one cop task to TiKV for data scanning.
After creating the index, execute the `ANALYZE TABLE` statement to collect statistics. In TiDB, index creation does not automatically update statistics, so analyzing the table ensures that the optimizer selects the new index.
CREATE INDEX logs_covered ON logs(snapshot_id, user_id, status, source_type, target_type, amount);
ANALYZE TABLE logs INDEX logs_covered;
The new execution plan is as follows:
+-------------------------------+------------+---------+-----------+---------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+
| id | estRows | actRows | task | access object | execution info |
+-------------------------------+------------+---------+-----------+---------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+
| HashAgg_13 | 1.00 | 1 | root | | time:90ms, loops:2, RU:158.885311, ...|
| └─IndexReader_14 | 1.00 | 1 | root | | time:89.8ms, loops:2, cop_task: {num: 1, ...|
| └─HashAgg_6 | 1.00 | 1 | cop[tikv] | | tikv_task:{time:88ms, loops:52}, ...|
| └─Selection_12 | 5245632.33 | 52863 | cop[tikv] | | tikv_task:{time:80ms, loops:52} ...|
| └─IndexRangeScan_11 | 5245632.33 | 52863 | cop[tikv] | table:logs, index:logs_covered(snapshot_id, user_id, status, source_type, target_type, amount) | tikv_task:{time:60ms, loops:52} ...|
+-------------------------------+------------+---------+-----------+---------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+
SQL tuning with a composite index involving sorting
You can optimize queries with `ORDER BY` clauses by creating composite indexes that include both filtering and sorting columns. This approach helps TiDB access data efficiently while maintaining the desired order.
For example, consider the following query that retrieves data from the `test` table based on specific conditions:
EXPLAIN ANALYZE SELECT
test.*
FROM
test
WHERE
test.snapshot_id = 459840
AND test.id > 998464
ORDER BY
test.id ASC
LIMIT
  1000;
The execution plan shows a duration of 170 ms. TiDB uses the `test_idx` index to perform an `IndexRangeScan_20` with the filter `snapshot_id = 459840`. It then retrieves all columns from the table, returning 5,715 rows to TiDB after `IndexLookUp_23`. TiDB sorts these rows and returns 1,000 rows.
The `id` column is the primary key, which means it is implicitly included in the `test_idx` index. However, `IndexRangeScan_20` does not guarantee order because `test_idx` includes two additional columns (`user_id` and `status`) after the index prefix column `snapshot_id`. As a result, the ordering of `id` is not preserved.
The original plan is as follows:
+------------------------------+---------+---------+-----------+----------------------------------------------------------+-----------------------------------------------+--------------------------------------------+
| id                           | estRows | actRows | task      | access object                                            | execution info                                | operator info                              |
+------------------------------+---------+---------+-----------+----------------------------------------------------------+-----------------------------------------------+--------------------------------------------+
| TopN_10                      | 19.98   | 1000    | root      |                                                          | time:170.6ms, loops:2                      ...| test.id, offset:0, count:1000              |
| └─IndexLookUp_23             | 19.98   | 5715    | root      |                                                          | time:166.6ms, loops:7                      ...|                                            |
|   ├─Selection_22(Build)      | 19.98   | 5715    | cop[tikv] |                                                          | time:18.6ms, loops:9, cop_task: {num: 3,   ...| gt(test.id, 998464)                        |
|   │ └─IndexRangeScan_20      | 433.47  | 7715    | cop[tikv] | table:test, index:test_idx(snapshot_id, user_id, status) | tikv_task:{proc max:4ms, min:4ms, avg: 4ms ...| range:[459840,459840], keep order:false    |
|   └─TableRowIDScan_21(Probe) | 19.98   | 5715    | cop[tikv] | table:test                                               | time:301.6ms, loops:10, cop_task: {num: 3, ...| keep order:false                           |
+------------------------------+---------+---------+-----------+----------------------------------------------------------+-----------------------------------------------+--------------------------------------------+
To optimize the query, you can create a new index on `(snapshot_id)`. This ensures that `id` is sorted within each `snapshot_id` group. With this index, execution time drops to 96 ms. The `keep order` property becomes `true` for `IndexRangeScan_33`, and `TopN` is replaced with `Limit`. As a result, `IndexLookUp_35` returns only 1,000 rows to TiDB, eliminating the need for additional sorting operations.
The following statements create the optimized index and collect its statistics:
CREATE INDEX test_new ON test(snapshot_id);
ANALYZE TABLE test INDEX test_new;
The new plan is as follows:
+----------------------------------+---------+---------+-----------+----------------------------------------------+----------------------------------------------+----------------------------------------------------+
| id | estRows | actRows | task | access object | execution info | operator info |
+----------------------------------+---------+---------+-----------+----------------------------------------------+----------------------------------------------+----------------------------------------------------+
| Limit_14 | 17.59 | 1000 | root | | time:96.1ms, loops:2, RU:92.300155 | offset:0, count:1000 |
| └─IndexLookUp_35 | 17.59 | 1000 | root | | time:96.1ms, loops:1, index_task: ...| |
| ├─IndexRangeScan_33(Build) | 17.59 | 5715 | cop[tikv] | table:test, index:test_new(snapshot_id) | time:7.25ms, loops:8, cop_task: {num: 3, ...| range:(459840 998464,459840 +inf], keep order:true |
| └─TableRowIDScan_34(Probe) | 17.59 | 5715 | cop[tikv] | table:test | time:232.9ms, loops:9, cop_task: {num: 3, ...| keep order:false |
+----------------------------------+---------+---------+-----------+----------------------------------------------+----------------------------------------------+----------------------------------------------------+
SQL tuning with composite indexes for efficient filtering and sorting
The following query takes 11 minutes and 9 seconds to execute, which is excessively long for a query that returns only 101 rows. The slow performance is caused by several factors:
- Inefficient index usage: The optimizer selects the index on `created_at`, resulting in a scan of 25,147,450 rows.
- Large intermediate result set: After applying the date range filter, 12,082,311 rows still require processing.
- Late filtering: The most selective predicates (`mode`, `user_id`, and `label_id`) are applied after accessing the table, resulting in 16,604 rows.
- Sorting overhead: The final sort operation on 16,604 rows adds additional processing time.
The following is the query statement:
SELECT `orders`.*
FROM `orders`
WHERE
`orders`.`mode` = 'production'
AND `orders`.`user_id` = 11111
AND orders.label_id IS NOT NULL
AND orders.created_at >= '2024-04-07 18:07:52'
AND orders.created_at <= '2024-05-11 18:07:52'
AND orders.id >= 1000000000
AND orders.id < 1500000000
ORDER BY orders.id DESC
LIMIT 101;
The following are the existing indexes on the `orders` table:
PRIMARY KEY (`id`),
UNIQUE KEY `index_orders_on_adjustment_id` (`adjustment_id`),
KEY `index_orders_on_user_id` (`user_id`),
KEY `index_orders_on_label_id` (`label_id`),
KEY `index_orders_on_created_at` (`created_at`)
The original execution plan is as follows:
+--------------------------------+-----------+---------+-----------+--------------------------------------------------------------------------------+-----------------------------------------------------+----------------------------------------------------------------------------------------+----------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+--------------------------------+-----------+---------+-----------+--------------------------------------------------------------------------------+-----------------------------------------------------+----------------------------------------------------------------------------------------+----------+------+
| TopN_10 | 101.00 | 101 | root | | time:11m9.8s, loops:2 | orders.id:desc, offset:0, count:101 | 271 KB | N/A |
| └─IndexLookUp_39 | 173.83 | 16604 | root | | time:11m9.8s, loops:19, index_task: {total_time:...}| | 20.4 MB | N/A |
| ├─Selection_37(Build) | 8296.70 | 12082311| cop[tikv] | | time:26.4ms, loops:11834, cop_task: {num: 294, m...}| ge(orders.id, 1000000000), lt(orders.id, 1500000000) | N/A | N/A |
| │ └─IndexRangeScan_35 | 6934161.90| 25147450| cop[tikv] | table:orders, index:index_orders_on_created_at(created_at) | tikv_task:{proc max:2.15s, min:0s, avg: 58.9ms, ...}| range:[2024-04-07 18:07:52,2024-05-11 18:07:52), keep order:false | N/A | N/A |
| └─Selection_38(Probe) | 173.83 | 16604 | cop[tikv] | | time:54m46.2s, loops:651, cop_task: {num: 1076, ...}| eq(orders.mode, "production"), eq(orders.user_id, 11111), not(isnull(orders.label_id)) | N/A | N/A |
| └─TableRowIDScan_36 | 8296.70 | 12082311| cop[tikv] | table:orders | tikv_task:{proc max:44.8s, min:0s, avg: 3.33s, p...}| keep order:false | N/A | N/A |
+--------------------------------+-----------+---------+-----------+--------------------------------------------------------------------------------+-----------------------------------------------------+----------------------------------------------------------------------------------------+----------+------+
After creating a composite index `idx_composite` on `orders(user_id, mode, id, created_at, label_id)`, the query performance improves significantly. The execution time drops from 11 minutes and 9 seconds to just 5.3 milliseconds, making the query over 126,000 times faster. This massive improvement results from:
- Efficient index usage: the new index enables an index range scan on `user_id`, `mode`, and `id`, which are the most selective predicates. This reduces the number of scanned rows from millions to just 224.
- Index-only sort: the `keep order:true` in the execution plan indicates that sorting is performed using the index structure, eliminating the need for a separate sort operation.
- Early filtering: the most selective predicates are applied first, reducing the result set to 224 rows before further filtering.
- Limit push-down: the `LIMIT` clause is pushed down to the index scan, allowing early termination of the scan once 101 rows are found.
This case demonstrates the significant impact of a well-designed index on query performance. By aligning the index structure with the query's predicates, sort order, and required columns, the query achieves a performance improvement of over five orders of magnitude.
CREATE INDEX idx_composite ON orders(user_id, mode, id, created_at, label_id);
ANALYZE TABLE orders INDEX idx_composite;
The new execution plan is as follows:
+--------------------------------+-----------+---------+-----------+--------------------------------------------------------------------------------+-----------------------------------------------------+----------------------------------------------------------------------------------------------------------------------+----------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+--------------------------------+-----------+---------+-----------+--------------------------------------------------------------------------------+-----------------------------------------------------+----------------------------------------------------------------------------------------------------------------------+----------+------+
| IndexLookUp_32 | 101.00 | 101 | root | | time:5.3ms, loops:2, RU:3.435006, index_task: {t...}| limit embedded(offset:0, count:101) | 128.5 KB | N/A |
| ├─Limit_31(Build) | 101.00 | 101 | cop[tikv] | | time:1.35ms, loops:1, cop_task: {num: 1, max: 1....}| offset:0, count:101 | N/A | N/A |
| │ └─Selection_30 | 535.77 | 224 | cop[tikv] | | tikv_task:{time:0s, loops:3} | ge(orders.created_at, 2024-04-07 18:07:52), le(orders.created_at, 2024-05-11 18:07:52), not(isnull(orders.label_id)) | N/A | N/A |
| │ └─IndexRangeScan_28 | 503893.42 | 224 | cop[tikv] | table:orders, index:idx_composite(user_id, mode, id, created_at, label_id) | tikv_task:{time:0s, loops:3} | range:[11111 "production" 1000000000,11111 "production" 1500000000), keep order:true, desc | N/A | N/A |
| └─TableRowIDScan_29(Probe) | 101.00 | 101 | cop[tikv] | table:orders | time:2.9ms, loops:2, cop_task: {num: 3, max: 2.7...}| keep order:false | N/A | N/A |
+--------------------------------+-----------+---------+-----------+--------------------------------------------------------------------------------+-----------------------------------------------------+----------------------------------------------------------------------------------------------------------------------+----------+------+
When to use TiFlash
This section explains when to use TiFlash in TiDB. TiFlash is optimized for analytical queries that involve complex calculations, aggregations, and large dataset scans. Its columnar storage format and MPP mode significantly improve performance in these scenarios.
Use TiFlash in the following scenarios:
- Large-scale data analysis: TiFlash delivers faster performance for OLAP workloads that require extensive data scans. Its columnar storage and MPP execution mode optimize query efficiency compared to TiKV.
- Complex scans, aggregations, and joins: TiFlash processes queries with heavy aggregations and joins more efficiently by reading only the necessary columns.
- Mixed workloads: in hybrid environments where both transactional (OLTP) and analytical (OLAP) workloads run simultaneously, TiFlash handles analytical queries without affecting TiKV's performance for transactional queries.
- SaaS applications with arbitrary filtering requirements: queries often involve filtering across many columns. Indexing all columns is impractical, especially when queries include a tenant ID as part of the primary key. TiFlash sorts and clusters data by primary key, making it well suited for these workloads. With the late materialization feature, TiFlash enables efficient table range scans, improving query performance without the overhead of maintaining multiple indexes.
Using TiFlash strategically enhances query performance and optimizes resource usage in TiDB for data-intensive analytical queries. The following sections provide examples of TiFlash use cases.
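Before TiDB can route a query to TiFlash, the table needs a TiFlash replica. The following sketch creates one replica for the `order_line` table used in the next example and checks its readiness:
ALTER TABLE order_line SET TIFLASH REPLICA 1;
SELECT TABLE_NAME, AVAILABLE, PROGRESS FROM information_schema.tiflash_replica WHERE TABLE_NAME = 'order_line';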
Analytical query
This section compares the execution performance of TPC-H Query 14 on TiKV and TiFlash storage engines.
TPC-H Query 14 involves joining the `order_line` and `item` tables. The query takes 21.1 seconds on TiKV but only 1.41 seconds using TiFlash MPP mode, making it 15 times faster.
- TiKV plan: TiDB fetches 3,864,397 rows from the `lineitem` table and 10 million rows from the `part` table. The hash join operation (`HashJoin_21`), along with the subsequent projection (`Projection_38`) and aggregation (`HashAgg_9`) operations, are performed in TiDB.
- TiFlash plan: The optimizer detects TiFlash replicas for both the `order_line` and `item` tables. Based on cost estimation, TiDB automatically selects the MPP mode, executing the entire query within the TiFlash columnar storage engine. This includes table scans, hash joins, column projections, and aggregations, significantly improving performance compared to the TiKV plan.
The following is the query:
select 100.00 * sum(case when i_data like 'PR%' then ol_amount else 0 end) / (1+sum(ol_amount)) as promo_revenue
from order_line, item
where ol_i_id = i_id and ol_delivery_d >= '2007-01-02 00:00:00.000000' and ol_delivery_d < '2030-01-02 00:00:00.000000';
The execution plan on TiKV is as follows:
+-------------------------------+--------------+-----------+-----------+----------------+----------------------------------------------+
| ID | ESTROWS | ACTROWS | TASK | ACCESS OBJECT | EXECUTION INFO |
+-------------------------------+--------------+-----------+-----------+----------------+----------------------------------------------+
| Projection_8 | 1.00 | 1 | root | | time:21.1s, loops:2, RU:1023225.707561, ... |
| └─HashAgg_9 | 1.00 | 1 | root | | time:21.1s, loops:2, partial_worker:{ ... |
| └─Projection_38 | 3839984.46 | 3864397 | root | | time:21.1s, loops:3776, Concurrency:5 |
| └─HashJoin_21 | 3839984.46 | 3864397 | root | | time:21.1s, loops:3776, build_hash_table:... |
| ├─TableReader_24(Build) | 3826762.62 | 3864397 | root | | time:18.4s, loops:3764, cop_task: ... |
| │ └─Selection_23 | 3826762.62 | 3864397 | cop[tikv] | | tikv_task:{proc max:717ms, min:265ms, ... |
| │ └─TableFullScan_22 | 300005811.00 | 300005811 | cop[tikv] | table:lineitem | tikv_task:{proc max:685ms, min:252ms, ... |
| └─TableReader_26(Probe) | 10000000.00 | 10000000 | root | | time:1.29s, loops:9780, cop_task: ... |
| └─TableFullScan_25 | 10000000.00 | 10000000 | cop[tikv] | table:part | tikv_task:{proc max:922ms, min:468ms, ... |
+-------------------------------+--------------+-----------+-----------+----------------+----------------------------------------------+
The execution plan on TiFlash is as follows:
+--------------------------------------------+-------------+----------+--------------+----------------+--------------------------------------+
| ID | ESTROWS | ACTROWS | TASK | ACCESS OBJECT | EXECUTION INFO |
+--------------------------------------------+-------------+----------+--------------+----------------+--------------------------------------+
| Projection_8 | 1.00 | 1 | root | | time:1.41s, loops:2, RU:45879.127909 |
| └─HashAgg_52 | 1.00 | 1 | root | | time:1.41s, loops:2, ... |
| └─TableReader_54 | 1.00 | 1 | root | | time:1.41s, loops:2, ... |
| └─ExchangeSender_53 | 1.00 | 1 | mpp[tiflash] | | tiflash_task:{time:1.41s, ... |
| └─HashAgg_13 | 1.00 | 1 | mpp[tiflash] | | tiflash_task:{time:1.41s, ... |
| └─Projection_74 | 3813443.11 | 3864397 | mpp[tiflash] | | tiflash_task:{time:1.4s, ... |
| └─Projection_51 | 3813443.11 | 3864397 | mpp[tiflash] | | tiflash_task:{time:1.39s, ... |
| └─HashJoin_50 | 3813443.11 | 3864397 | mpp[tiflash] | | tiflash_task:{time:1.39s, ... |
| ├─ExchangeReceiver_31(Build) | 3800312.67 | 3864397 | mpp[tiflash] | | tiflash_task:{time:1.05s, ... |
| │ └─ExchangeSender_30 | 3800312.67 | 3864397 | mpp[tiflash] | | tiflash_task:{time:1.2s, ... |
| │ └─TableFullScan_28 | 3800312.67 | 3864397 | mpp[tiflash] | table:lineitem | tiflash_task:{time:1.15s, ... |
| └─ExchangeReceiver_34(Probe) | 10000000.00 | 10000000 | mpp[tiflash] | | tiflash_task:{time:1.24s, ... |
| └─ExchangeSender_33 | 10000000.00 | 10000000 | mpp[tiflash] | | tiflash_task:{time:1.4s, ... |
| └─TableFullScan_32 | 10000000.00 | 10000000 | mpp[tiflash] | table:part | tiflash_task:{time:59.2ms, ... |
+--------------------------------------------+-------------+----------+--------------+----------------+--------------------------------------+
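If the optimizer does not choose the MPP plan on its own (for example, when statistics are stale shortly after the replicas become available), you can steer engine selection explicitly. Both mechanisms below are standard TiDB features; the table names follow the query above:

```sql
-- Option 1: pin specific tables to TiFlash for one statement with an optimizer hint.
SELECT /*+ READ_FROM_STORAGE(TIFLASH[order_line, item]) */
    100.00 * sum(case when i_data like 'PR%' then ol_amount else 0 end)
        / (1 + sum(ol_amount)) as promo_revenue
FROM order_line, item
WHERE ol_i_id = i_id
  AND ol_delivery_d >= '2007-01-02 00:00:00.000000'
  AND ol_delivery_d < '2030-01-02 00:00:00.000000';

-- Option 2: restrict the whole session to TiFlash
-- (keep 'tidb' in the list so that internal system queries still work).
SET SESSION tidb_isolation_read_engines = 'tidb,tiflash';
```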
SaaS arbitrary filtering workloads
In SaaS applications, tables often use composite primary keys that include tenant identification. The following example demonstrates how TiFlash can significantly improve query performance in these scenarios.
Case study: multi-tenant data access
Consider a table with a composite primary key `(tenantId, objectTypeId, objectId)`. A typical query pattern for this table involves the following (a hypothetical schema sketch follows the list):
- Retrieving the first N records for a specific tenant and object type while applying random filters across hundreds or thousands of columns. This makes creating indexes for all possible filter combinations impractical. The query might also include a sort operation after filtering.
- Calculating the total count of records that match the filter criteria.
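The following DDL is a hypothetical sketch of such a table. The column names match the query and execution plans below, but a real schema would contain hundreds of additional filterable columns:

```sql
-- Hypothetical SaaS schema: the clustered primary key leads with the tenant columns,
-- so filters on (tenantId, objectTypeId) become range scans on both storage engines.
CREATE TABLE usertable (
    tenantId     BIGINT       NOT NULL,
    objectTypeId BIGINT       NOT NULL,
    objectId     BIGINT       NOT NULL,
    field1       VARCHAR(255),
    field2       VARCHAR(255),
    field3       VARCHAR(255),
    field4       VARCHAR(255),
    -- ...hundreds of additional filterable columns...
    PRIMARY KEY (tenantId, objectTypeId, objectId) CLUSTERED
);

ALTER TABLE usertable SET TIFLASH REPLICA 1;
```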
Performance comparison
When running the same query on different storage engines, you can observe significant performance differences:
- TiKV plan: the query takes 2 minutes 38.6 seconds on TiKV. `TableRangeScan` sends 5,121 cop tasks because the data is distributed across 5,121 Regions.
- TiFlash plan: the same query takes only 3.44 seconds on the TiFlash MPP engine, nearly 46 times faster. TiFlash stores data sorted by primary key, so queries that filter on the primary key's prefix use `TableRangeScan` instead of a full table scan. TiFlash requires only 2 MPP tasks compared to TiKV's 5,121 tasks.
The following is the query statement:
WITH `results` AS (
SELECT field1, field2, field3, field4
FROM usertable
WHERE tenantId = 1234 AND objectTypeId = 6789
),
`limited_results` AS (
SELECT field1, field2, field3, field4
FROM `results` LIMIT 100
)
SELECT field1, field2, field3, field4
FROM
(
SELECT 100 `__total__`, field1, field2, field3, field4
FROM `limited_results`
UNION ALL
SELECT count(*) `__total__`, field1, field2, field3, field4
FROM `results`
) `result_and_count`;
The execution plan on TiKV is as follows:
+--------------------------------+-----------+---------+-----------+-----------------------+-----------------------------------------------------+
| id | estRows | actRows | task | access object | execution info |
+--------------------------------+-----------+---------+-----------+-----------------------+-----------------------------------------------------+
| Union_18 | 101.00 | 101 | root | | time:2m38.6s, loops:3, RU:8662189.451027 |
| ├─Limit_20 | 100.00 | 100 | root | | time:23ms, loops:2 |
| │ └─TableReader_25 | 100.00 | 100 | root | | time:23ms, loops:1, cop_task: {num: 1, max: 22.8...}|
| │ └─Limit_24 | 100.00 | 100 | cop[tikv] | | tikv_task:{time:21ms, loops:3}, scan_detail: {...} |
| │ └─TableRangeScan_22 | 100.00 | 100 | cop[tikv] | table:usertable | tikv_task:{time:21ms, loops:3} |
| └─Projection_26 | 1.00 | 1 | root | | time:2m38.6s, loops:2, Concurrency:OFF |
| └─HashAgg_34 | 1.00 | 1 | root | | time:2m38.6s, loops:2, partial_worker:{...}, fin.. .|
| └─TableReader_35 | 1.00 | 5121 | root | | time:2m38.6s, loops:7, cop_task: {num: 5121, max:...|
| └─HashAgg_27 | 1.00 | 5121 | cop[tikv] | | tikv_task:{proc max:0s, min:0s, avg: 462.8ms, p...} |
| └─TableRangeScan_32 | 10000000 | 10000000| cop[tikv] | table:usertable | tikv_task:{proc max:0s, min:0s, avg: 460.5ms, p...} |
+--------------------------------+-----------+---------+-----------+-----------------------+-----------------------------------------------------+
The execution plan on TiFlash is as follows:
+--------------------------------+-----------+---------+--------------+--------------------+-----------------------------------------------------+
| id | estRows | actRows | task | access object | execution info |
+--------------------------------+-----------+---------+--------------+--------------------+-----------------------------------------------------+
| Union_18 | 101.00 | 101 | root | | time:3.44s, loops:3, RU:0.000000 |
| ├─Limit_22 | 100.00 | 100 | root | | time:146.7ms, loops:2 |
| │ └─TableReader_30 | 100.00 | 100 | root | | time:146.7ms, loops:1, cop_task: {num: 1, max: 0...}|
| │ └─ExchangeSender_29 | 100.00 | 0 | mpp[tiflash] | | |
| │ └─Limit_28 | 100.00 | 0 | mpp[tiflash] | | |
| │ └─TableRangeScan_27 | 100.00 | 0 | mpp[tiflash] | table:usertable | |
| └─Projection_31 | 1.00 | 1 | root | | time:3.42s, loops:2, Concurrency:OFF |
| └─HashAgg_49 | 1.00 | 1 | root | | time:3.42s, loops:2, partial_worker:{...}, fin... |
| └─TableReader_51 | 1.00 | 2 | root | | time:3.42s, loops:2, cop_task: {num: 4, max: 0...} |
| └─ExchangeSender_50 | 1.00 | 2 | mpp[tiflash] | | tiflash_task:{proc max:3.4s, min:3.15s, avg: 3...} |
| └─HashAgg_36 | 1.00 | 2 | mpp[tiflash] | | tiflash_task:{proc max:3.4s, min:3.15s, avg: 3...} |
| └─TableRangeScan_48 | 10000000 | 10000000| mpp[tiflash] | table:usertable | tiflash_task:{proc max:3.4s, min:3.15s, avg: 3...} |
+--------------------------------+-----------+---------+--------------+--------------------+-----------------------------------------------------+
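The TiFlash plan also benefits from late materialization, mentioned earlier for SaaS workloads: TiFlash scans the primary key range and the filter columns first, and reads the remaining columns only for rows that pass the filters. In recent TiDB versions this behavior is controlled by a session variable, which you can verify or toggle as follows:

```sql
-- Late materialization is enabled by default in recent TiDB versions.
SHOW VARIABLES LIKE 'tidb_opt_enable_late_materialization';
SET SESSION tidb_opt_enable_late_materialization = ON;
```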
Query routing between TiKV and TiFlash
After enabling TiFlash replicas for tables with large amounts of multi-tenant data, the optimizer routes queries to either TiKV or TiFlash based on the row count:
- Small tenants: TiKV is more suitable for tenants with small data sizes, as it provides high concurrency for small queries that use table range scans.
- Large tenants: for tenants with large datasets (such as the 10 million rows in this case), TiFlash is more efficient for the following reasons:
    - TiFlash handles dynamic filtering conditions without requiring specific indexes.
    - TiDB can push down `COUNT`, `SORT`, and `LIMIT` operations to TiFlash.
    - TiFlash scans only the required columns using columnar storage.
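To see which engine the optimizer will pick for a given tenant before running the statement, a plain `EXPLAIN` is enough: the task column shows where each operator executes. The tenant values below reuse the ones from this example:

```sql
EXPLAIN
SELECT count(*)
FROM usertable
WHERE tenantId = 1234 AND objectTypeId = 6789;
-- In the output, task = cop[tikv] means the scan runs on TiKV;
-- task = mpp[tiflash] means it runs on the TiFlash MPP engine.
```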