To better handle SQL performance issues, MySQL provides statement summary tables in `performance_schema` to monitor SQL statements with statistics. Among these tables, `events_statements_summary_by_digest` is very useful for locating SQL problems, with its abundant fields such as latency, execution times, rows scanned, and full table scans.

Starting from v3.1.0-beta, TiDB supports the `events_statements_summary_by_digest` table. Starting from v3.1.0-beta.1, TiDB also supports the `events_statements_summary_by_digest_history` table. This document describes the two tables and how to use them to troubleshoot SQL performance issues.
`events_statements_summary_by_digest` is a system table in `performance_schema`. It groups SQL statements by SQL digest and plan digest, and provides statistics for each SQL category.

The "SQL digest" here means the same as in slow logs: a unique identifier calculated from the normalized SQL statement. The normalization process ignores constants and whitespace, and is case insensitive. Therefore, statements with consistent syntax have the same digest. For example:
```sql
SELECT * FROM employee WHERE id IN (1, 2, 3) AND salary BETWEEN 1000 AND 2000;
select * from EMPLOYEE where ID in (4, 5) and SALARY between 3000 and 4000;
```

After normalization, they both belong to the following category:

```sql
select * from employee where id in (...) and salary between ? and ?;
```
The "plan digest" here refers to the unique identifier calculated from the normalized execution plan. The normalization process ignores constants. SQL statements with the same digest might still be grouped into different categories, because the same statements can have different execution plans. SQL statements in the same category have the same execution plan.
`events_statements_summary_by_digest` stores the aggregated results of SQL monitoring metrics. In general, each monitoring metric includes a maximum value and an average value. For example, the execution latency metric corresponds to two fields: `AVG_LATENCY` (average latency) and `MAX_LATENCY` (maximum latency).
To keep the monitoring metrics up to date, data in the `events_statements_summary_by_digest` table is periodically cleared, so that only recent aggregated results are retained and displayed. The periodic clearing is controlled by the `tidb_stmt_summary_refresh_interval` system variable. If you query the table right after a clearing, little data might be displayed.
Some concepts in TiDB differ from those in MySQL. For this reason, the schema of the `events_statements_summary_by_digest` table in TiDB differs considerably from that in MySQL.
The following is a sample output of querying `events_statements_summary_by_digest`:

```
   SUMMARY_BEGIN_TIME: 2020-01-02 11:00:00
     SUMMARY_END_TIME: 2020-01-02 11:30:00
            STMT_TYPE: select
          SCHEMA_NAME: test
               DIGEST: 0611cc2fe792f8c146cc97d39b31d9562014cf15f8d41f23a4938ca341f54182
          DIGEST_TEXT: select * from employee where id = ?
          TABLE_NAMES: test.employee
          INDEX_NAMES: NULL
          SAMPLE_USER: root
           EXEC_COUNT: 3
          SUM_LATENCY: 1035161
          MAX_LATENCY: 399594
          MIN_LATENCY: 301353
          AVG_LATENCY: 345053
    AVG_PARSE_LATENCY: 57000
    MAX_PARSE_LATENCY: 57000
  AVG_COMPILE_LATENCY: 175458
  MAX_COMPILE_LATENCY: 175458
  ...........
              AVG_MEM: 103
              MAX_MEM: 103
    AVG_AFFECTED_ROWS: 0
           FIRST_SEEN: 2020-01-02 11:12:54
            LAST_SEEN: 2020-01-02 11:25:24
    QUERY_SAMPLE_TEXT: select * from employee where id=3100
     PREV_SAMPLE_TEXT:
          PLAN_DIGEST: f415b8d52640b535b9b12a9c148a8630d2c6d59e419aad29397842e32e8e5de3
                 PLAN: Point_Get_1  root  1  table:employee, handle:3100
```
In TiDB, the time unit of fields in statement summary tables is nanosecond (ns), whereas in MySQL the time unit is picosecond (ps).
Basic fields:

- `STMT_TYPE`: SQL statement type.
- `SCHEMA_NAME`: The current schema in which SQL statements of this category are executed.
- `DIGEST`: The digest of SQL statements of this category.
- `DIGEST_TEXT`: The normalized SQL statement.
- `QUERY_SAMPLE_TEXT`: The original SQL statement of the SQL category. Only one original statement is kept.
- `TABLE_NAMES`: All tables involved in the SQL statements. If there is more than one table, the names are separated by commas.
- `INDEX_NAMES`: All indexes used by the SQL statements. If there is more than one index, the names are separated by commas.
- `SAMPLE_USER`: A user who executes SQL statements of this category. Only one user is kept.
- `PLAN_DIGEST`: The digest of the execution plan.
- `PLAN`: The original execution plan. If there are multiple statements, the plan of only one statement is kept.
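As an illustration of the basic fields, the following sketch (assuming the feature is enabled and the table is populated) lists the SQL categories that run most frequently on the current TiDB server:

```sql
-- Top 10 SQL categories by execution count on this TiDB server
SELECT stmt_type, schema_name, digest_text, exec_count
    FROM performance_schema.events_statements_summary_by_digest
    ORDER BY exec_count DESC
    LIMIT 10;
```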
Fields related to execution time:
- `SUMMARY_BEGIN_TIME`: The beginning time of the current summary period.
- `SUMMARY_END_TIME`: The ending time of the current summary period.
- `FIRST_SEEN`: The time when SQL statements of this category are seen for the first time.
- `LAST_SEEN`: The time when SQL statements of this category are seen for the last time.
Fields related to TiDB server:
- `EXEC_COUNT`: Total execution times of SQL statements of this category.
- `SUM_LATENCY`: The total execution latency of SQL statements of this category.
- `MAX_LATENCY`: The maximum execution latency of SQL statements of this category.
- `MIN_LATENCY`: The minimum execution latency of SQL statements of this category.
- `AVG_LATENCY`: The average execution latency of SQL statements of this category.
- `AVG_PARSE_LATENCY`: The average latency of the parser.
- `MAX_PARSE_LATENCY`: The maximum latency of the parser.
- `AVG_COMPILE_LATENCY`: The average latency of the compiler.
- `MAX_COMPILE_LATENCY`: The maximum latency of the compiler.
- `AVG_MEM`: The average memory (bytes) used.
- `MAX_MEM`: The maximum memory (bytes) used.
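The latency fields above can be combined to see where time goes inside the TiDB server. For example, the following sketch (assuming the table is populated) estimates how much of the average latency is spent on parsing and compiling, per SQL category:

```sql
-- Share of parse + compile time in the average latency, per SQL category
SELECT digest_text,
       avg_latency,
       (avg_parse_latency + avg_compile_latency) / avg_latency AS front_end_ratio
    FROM performance_schema.events_statements_summary_by_digest
    WHERE exec_count > 0
    ORDER BY avg_latency DESC
    LIMIT 10;
```

A small `front_end_ratio` suggests that most of the time is spent executing the statement rather than preparing it.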
Fields related to TiKV Coprocessor task:
- `COP_TASK_NUM`: The number of Coprocessor requests that a SQL statement sends.
- `AVG_COP_PROCESS_TIME`: The average execution time of Coprocessor tasks.
- `MAX_COP_PROCESS_TIME`: The maximum execution time of Coprocessor tasks.
- `MAX_COP_PROCESS_ADDRESS`: The address of the Coprocessor task with the maximum execution time.
- `AVG_COP_WAIT_TIME`: The average waiting time of Coprocessor tasks.
- `MAX_COP_WAIT_TIME`: The maximum waiting time of Coprocessor tasks.
- `MAX_COP_WAIT_ADDRESS`: The address of the Coprocessor task with the maximum waiting time.
- `AVG_PROCESS_TIME`: The average processing time of SQL statements in TiKV.
- `MAX_PROCESS_TIME`: The maximum processing time of SQL statements in TiKV.
- `AVG_WAIT_TIME`: The average waiting time of SQL statements in TiKV.
- `MAX_WAIT_TIME`: The maximum waiting time of SQL statements in TiKV.
- `AVG_BACKOFF_TIME`: The average waiting time before retry when a SQL statement encounters an error that requires a retry.
- `MAX_BACKOFF_TIME`: The maximum waiting time before retry when a SQL statement encounters an error that requires a retry.
- `AVG_TOTAL_KEYS`: The average number of keys that Coprocessor has scanned.
- `MAX_TOTAL_KEYS`: The maximum number of keys that Coprocessor has scanned.
- `AVG_PROCESSED_KEYS`: The average number of keys that Coprocessor has processed. Compared with `avg_total_keys`, `avg_processed_keys` does not include old MVCC versions. A great difference between `avg_total_keys` and `avg_processed_keys` indicates that many old versions exist.
- `MAX_PROCESSED_KEYS`: The maximum number of keys that Coprocessor has processed.

Fields related to transactions:

- `AVG_PREWRITE_TIME`: The average time of the prewrite phase.
- `MAX_PREWRITE_TIME`: The longest time of the prewrite phase.
- `AVG_COMMIT_TIME`: The average time of the commit phase.
- `MAX_COMMIT_TIME`: The longest time of the commit phase.
- `AVG_GET_COMMIT_TS_TIME`: The average time of getting `commit_ts`.
- `MAX_GET_COMMIT_TS_TIME`: The longest time of getting `commit_ts`.
- `AVG_COMMIT_BACKOFF_TIME`: The average waiting time before retry when a SQL statement encounters an error that requires a retry during the commit phase.
- `MAX_COMMIT_BACKOFF_TIME`: The maximum waiting time before retry when a SQL statement encounters an error that requires a retry during the commit phase.
- `AVG_RESOLVE_LOCK_TIME`: The average time for resolving lock conflicts that occur between transactions.
- `MAX_RESOLVE_LOCK_TIME`: The longest time for resolving lock conflicts that occur between transactions.
- `AVG_LOCAL_LATCH_WAIT_TIME`: The average waiting time of the local transaction.
- `MAX_LOCAL_LATCH_WAIT_TIME`: The maximum waiting time of the local transaction.
- `AVG_WRITE_KEYS`: The average count of written keys.
- `MAX_WRITE_KEYS`: The maximum count of written keys.
- `AVG_WRITE_SIZE`: The average amount of written data (in bytes).
- `MAX_WRITE_SIZE`: The maximum amount of written data (in bytes).
- `AVG_PREWRITE_REGIONS`: The average number of Regions involved in the prewrite phase.
- `MAX_PREWRITE_REGIONS`: The maximum number of Regions involved in the prewrite phase.
- `AVG_TXN_RETRY`: The average number of transaction retries.
- `MAX_TXN_RETRY`: The maximum number of transaction retries.
- `SUM_BACKOFF_TIMES`: The total number of retries when SQL statements of this category encounter errors that require a retry.
- `BACKOFF_TYPES`: All error types that require retries, and the number of retries for each type. The format of the field is `type:number`. If there is more than one error type, the entries are separated by commas.
Other fields:

- `AVG_AFFECTED_ROWS`: The average number of rows affected.
- `PREV_SAMPLE_TEXT`: When the current SQL statement is `COMMIT`, `PREV_SAMPLE_TEXT` is the statement executed before the `COMMIT`. In this case, SQL statements are grouped by both the digest and `prev_sample_text`, which means that `COMMIT` statements with different `prev_sample_text` values are grouped into different rows. When the current SQL statement is not `COMMIT`, the `PREV_SAMPLE_TEXT` field is an empty string.
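Because `COMMIT` statements are grouped by `prev_sample_text`, slow commits can be attributed to the statement executed just before them. A hypothetical query along these lines:

```sql
-- Find slow COMMITs together with the statement that preceded them
SELECT digest_text, prev_sample_text, exec_count, avg_latency
    FROM performance_schema.events_statements_summary_by_digest
    WHERE digest_text LIKE 'commit%'
    ORDER BY avg_latency DESC;
```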
The schema of the `events_statements_summary_by_digest_history` table is identical to that of `events_statements_summary_by_digest`. `events_statements_summary_by_digest_history` stores the historical data, which is used to troubleshoot anomalies or to compare monitoring metrics of different time periods. The `SUMMARY_BEGIN_TIME` and `SUMMARY_END_TIME` fields refer to the beginning time and the ending time of a historical period, respectively.
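For example, to follow how a single SQL category behaves over time, you can filter the history table by its digest (here using the sample digest shown earlier) and order by period:

```sql
-- Latency trend of one SQL category across historical periods
SELECT summary_begin_time, summary_end_time, exec_count, avg_latency
    FROM performance_schema.events_statements_summary_by_digest_history
    WHERE digest = '0611cc2fe792f8c146cc97d39b31d9562014cf15f8d41f23a4938ca341f54182'
    ORDER BY summary_begin_time;
```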
This section shows how to use the statement summary feature to troubleshoot SQL performance issues, using two sample questions.

In this example, the client reports slow performance of point queries on the `employee` table. You can perform a fuzzy search on SQL texts:

```sql
SELECT avg_latency, exec_count, query_sample_text
    FROM performance_schema.events_statements_summary_by_digest
    WHERE digest_text LIKE 'select * from employee%';
```

As shown in the result below, `avg_latency` values of 1 ms and 0.3 ms are in the normal range. Therefore, it can be concluded that the server side is not the cause, and you can continue troubleshooting the client or the network.

```
+-------------+------------+------------------------------------------+
| avg_latency | exec_count | query_sample_text                        |
+-------------+------------+------------------------------------------+
|     1042040 |          2 | select * from employee where name='eric' |
|      345053 |          3 | select * from employee where id=3100     |
+-------------+------------+------------------------------------------+
2 rows in set (0.00 sec)
```
If the QPS (queries per second) decreases significantly from 10:00 AM to 10:30 AM, you can find the three categories of SQL statements with the longest total time consumption from the history table:

```sql
SELECT sum_latency, avg_latency, exec_count, query_sample_text
    FROM performance_schema.events_statements_summary_by_digest_history
    WHERE summary_begin_time='2020-01-02 10:00:00'
    ORDER BY sum_latency DESC LIMIT 3;
```

The result shows that the following three SQL categories consume the longest time in total and require focused optimization:

```
+-------------+-------------+------------+-----------------------------------------------------------------------+
| sum_latency | avg_latency | exec_count | query_sample_text                                                     |
+-------------+-------------+------------+-----------------------------------------------------------------------+
|     7855660 |     1122237 |          7 | select avg(salary) from employee where company_id=2013                |
|     7241960 |     1448392 |          5 | select * from employee join company on employee.company_id=company.id |
|     2084081 |     1042040 |          2 | select * from employee where name='eric'                              |
+-------------+-------------+------------+-----------------------------------------------------------------------+
3 rows in set (0.00 sec)
```
The statement summary feature is disabled by default. You can enable it by setting a system variable:

```sql
set global tidb_enable_stmt_summary = true;
```

The statistics in the system table are cleared if the statement summary feature is disabled, and are re-calculated the next time the feature is enabled. Tests have shown that enabling this feature has little impact on performance.
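You can verify the current state of the feature by reading the variable back, for example:

```sql
SHOW GLOBAL VARIABLES LIKE 'tidb_enable_stmt_summary';
```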
Two other system variables control the statement summary:

- `tidb_stmt_summary_refresh_interval`: The interval at which the `events_statements_summary_by_digest` table is refreshed. The time unit is second (s). The default value is `1800`.
- `tidb_stmt_summary_history_size`: The number of historical periods stored for each SQL category in the `events_statements_summary_by_digest_history` table. The default value is `24`.
The following is a configuration example of the statement summary:

```sql
set global tidb_stmt_summary_refresh_interval = 1800;
set global tidb_stmt_summary_history_size = 24;
```

When the above configuration takes effect, the `events_statements_summary_by_digest` table is cleared every 30 minutes, and the `events_statements_summary_by_digest_history` table stores data generated over the recent 12 hours.
The above two system variables have both global and session scope, and they work a little differently from other system variables:

- Setting the global variable applies to the whole cluster immediately.
- Setting the session variable applies to the current TiDB server immediately. This is useful when you debug on a single TiDB server instance.
- The session variable has a higher read priority. The global variable is read only if no session variable is set.
- Setting the session variable to a blank string makes the server re-read the global variable.
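The scope behavior above can be exercised directly. For example, to enable the summary only on the TiDB server you are debugging, and then fall back to the global setting afterwards:

```sql
-- Apply only to the current TiDB server
set session tidb_enable_stmt_summary = true;
-- Later, revert to reading the global variable
set session tidb_enable_stmt_summary = '';
```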
The statement summary tables are memory tables. To prevent potential memory issues, you need to limit both the number of SQL statement categories to be stored and the maximum display length of SQL texts. You can configure these limits using the following parameters in the TiDB configuration file:

- `max-stmt-count` limits the number of SQL statement categories that can be stored. The default value is `200`. If the limit is exceeded, the SQL statement categories that are least recently used are cleared.
- `max-sql-length` specifies the longest display length of `QUERY_SAMPLE_TEXT`. The default value is `4096`.

Both `max-stmt-count` and `max-sql-length` affect memory usage, so it is recommended that you adjust them based on your actual needs. Setting them to overly large values is not recommended.
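As a sketch, the two limits above might be set in the TiDB configuration file like this (the `[stmt-summary]` section name is an assumption here; verify it against the config template shipped with your TiDB version):

```toml
# Assumed section name; check your TiDB version's config template
[stmt-summary]
# Maximum number of SQL statement categories kept in memory
max-stmt-count = 200
# Longest display length of QUERY_SAMPLE_TEXT
max-sql-length = 4096
```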
The statement summary tables have the following limitations:
- Querying statement summary tables only returns the statement summary of the current TiDB server, not that of the entire cluster.
- The statement summary will be lost when the TiDB server restarts. This is because statement summary tables are memory tables, and the data is cached in memory instead of being persisted on storage.