Statement Summary Tables

To better handle SQL performance issues, MySQL has provided statement summary tables in performance_schema to monitor SQL with statistics. Among these tables, events_statements_summary_by_digest is very useful in locating SQL problems with its abundant fields such as latency, execution times, rows scanned, and full table scans.

Therefore, starting from v4.0.0-rc.1, TiDB provides system tables in information_schema (not performance_schema) that are similar to events_statements_summary_by_digest in terms of features.

statements_summary
statements_summary_history
cluster_statements_summary
cluster_statements_summary_history
statements_summary_evicted

Note

The preceding tables are not available on TiDB Cloud Starter and TiDB Cloud Essential instances.

This document details these tables and introduces how to use them to troubleshoot SQL performance issues.

`statements_summary`

statements_summary is a system table in information_schema. statements_summary groups the SQL statements by the resource group, the SQL digest and the plan digest, and provides statistics for each SQL category.

The "SQL digest" here means the same as used in slow logs, which is a unique identifier calculated through normalized SQL statements. The normalization process ignores constant, blank characters, and is case insensitive. Therefore, statements with consistent syntaxes have the same digest. For example:

SELECT * FROM employee WHERE id IN (1, 2, 3) AND salary BETWEEN 1000 AND 2000;
select * from EMPLOYEE where ID in (4, 5) and SALARY between 3000 and 4000;

After normalization, they are both of the following category:

select * from employee where id in (...) and salary between ? and ?;

The "plan digest" here refers to the unique identifier calculated through normalized execution plan. The normalization process ignores constants. The same SQL statements might be grouped into different categories because the same statements might have different execution plans. SQL statements of the same category have the same execution plan.

statements_summary stores the aggregated results of SQL monitoring metrics. In general, each of the monitoring metrics includes the maximum value and average value. For example, the execution latency metric corresponds to two fields: AVG_LATENCY (average latency) and MAX_LATENCY (maximum latency).

To make sure that the monitoring metrics are up to date, data in the statements_summary table is periodically cleared, and only recent aggregated results are retained and displayed. The periodical data clearing is controlled by the tidb_stmt_summary_refresh_interval system variable. If you happen to make a query right after the clearing, the data displayed might be very little.

The following is a sample output of querying statements_summary:

   SUMMARY_BEGIN_TIME: 2020-01-02 11:00:00
     SUMMARY_END_TIME: 2020-01-02 11:30:00
            STMT_TYPE: Select
          SCHEMA_NAME: test
               DIGEST: 0611cc2fe792f8c146cc97d39b31d9562014cf15f8d41f23a4938ca341f54182
          DIGEST_TEXT: select * from employee where id = ?
          TABLE_NAMES: test.employee
          INDEX_NAMES: NULL
          SAMPLE_USER: root
           EXEC_COUNT: 3
          SUM_LATENCY: 1035161
          MAX_LATENCY: 399594
          MIN_LATENCY: 301353
          AVG_LATENCY: 345053
    AVG_PARSE_LATENCY: 57000
    MAX_PARSE_LATENCY: 57000
  AVG_COMPILE_LATENCY: 175458
  MAX_COMPILE_LATENCY: 175458
  ...........
              AVG_MEM: 103
              MAX_MEM: 103
              AVG_DISK: 65535
              MAX_DISK: 65535
    AVG_AFFECTED_ROWS: 0
           FIRST_SEEN: 2020-01-02 11:12:54
            LAST_SEEN: 2020-01-02 11:25:24
    QUERY_SAMPLE_TEXT: select * from employee where id=3100
     PREV_SAMPLE_TEXT:
          PLAN_DIGEST: f415b8d52640b535b9b12a9c148a8630d2c6d59e419aad29397842e32e8e5de3
                 PLAN:  Point_Get_1     root    1       table:employee, handle:3100

Note

In TiDB, the time unit of fields in statement summary tables is nanosecond (ns), whereas in MySQL the time unit is picosecond (ps).
Starting from v7.5.1 and v7.6.0, for clusters with resource control enabled, statements_summary will be aggregated by resource group, for example, the same statements executed in different resource groups will be collected as different records.

`statements_summary_history`

The table schema of statements_summary_history is identical to that of statements_summary. statements_summary_history saves the history data of a time range. By checking history data, you can troubleshoot anomalies and compare monitoring metrics of different time ranges.

The fields SUMMARY_BEGIN_TIME and SUMMARY_END_TIME represent the start time and the end time of the historical time range.

`statements_summary_evicted`

The tidb_stmt_summary_max_stmt_count system variable limits the number of SQL digests that the statements_summary and statements_summary_history tables can store in memory totally. Once this limit is exceeded, TiDB evicts the least recently used SQL digests from both statements_summary and statements_summary_history tables.

Note

When tidb_stmt_summary_enable_persistent is enabled, data in the statements_summary_history table is persisted to the disk. In this case, tidb_stmt_summary_max_stmt_count only limits the number of SQL digests that the statements_summary table can store in memory, and TiDB evicts the least recently used SQL digests only from the statements_summary table when tidb_stmt_summary_max_stmt_count is exceeded.

The statements_summary_evicted table records the period during which the eviction occurs and the number of SQL digests evicted during that period. This table helps you evaluate whether tidb_stmt_summary_max_stmt_count is properly configured for your workload. If this table contains records, it indicates that the number of SQL digests exceeded tidb_stmt_summary_max_stmt_count at some time point.

On the SQL statements page of TiDB Dashboard, the information about evicted statements is displayed in the Others row.

The `cluster` tables for statement summary

The statements_summary, statements_summary_history, and statements_summary_evicted tables only show the statement summary of a single TiDB server. To query the data of the entire cluster, you need to query the cluster_statements_summary, cluster_statements_summary_history, or cluster_statements_summary_evicted tables.

cluster_statements_summary displays the statements_summary data of each TiDB server. cluster_statements_summary_history displays the statements_summary_history data of each TiDB server. cluster_statements_summary_evicted displays the statements_summary_evicted data of each TiDB server. These tables use the INSTANCE field to represent the address of the TiDB server. The other fields are the same as those in statements_summary, statements_summary_history, and statements_summary_evicted.

Parameter configuration

The following system variables are used to control the statement summary:

tidb_enable_stmt_summary: Determines whether to enable the statement summary feature. 1 represents enable, and 0 means disable. The feature is enabled by default. The statistics in the system table are cleared if this feature is disabled. The statistics are re-calculated next time this feature is enabled. Tests have shown that enabling this feature has little impact on performance.
tidb_stmt_summary_refresh_interval: The interval at which the statements_summary table is refreshed. The time unit is second (s). The default value is 1800.
tidb_stmt_summary_history_size: The size of each SQL statement category stored in the statements_summary_history table, which is also the maximum number of records in the statements_summary_evicted table. The default value is 24.
tidb_stmt_summary_max_stmt_count: Limits the number of SQL digests that the statements_summary and statements_summary_history tables can store in memory totally. The default value is 3000.
Once this limit is exceeded, TiDB evicts the least recently used SQL digests from both statements_summary and statements_summary_history tables. These evicted digests are then counted in the statements_summary_evicted table.
Note
- When a SQL digest is evicted, its related summary data of all time ranges is removed from both the statements_summary and statements_summary_history tables. As a result, even if the number of SQL digests within a specific time range does not exceed the limit, the number of SQL digests in the statements_summary_history table might be less than the actual number of SQL digests. If this situation occurs and affects performance, you are recommended to increase the value of tidb_stmt_summary_max_stmt_count.
- For TiDB Self-Managed, when tidb_stmt_summary_enable_persistent is enabled, data in the statements_summary_history table is persisted to the disk. In this case, tidb_stmt_summary_max_stmt_count only limits the number of SQL digests that the statements_summary table can store in memory, and TiDB evicts the least recently used SQL digests only from the statements_summary table when tidb_stmt_summary_max_stmt_count is exceeded.
tidb_stmt_summary_max_sql_length: Specifies the longest display length of DIGEST_TEXT and QUERY_SAMPLE_TEXT. The default value is 4096.
tidb_stmt_summary_internal_query: Determines whether to count the TiDB SQL statements. 1 means to count, and 0 means not to count. The default value is 0.

An example of the statement summary configuration is shown as follows:

set global tidb_stmt_summary_max_stmt_count = 3000;
set global tidb_enable_stmt_summary = true;
set global tidb_stmt_summary_refresh_interval = 1800;
set global tidb_stmt_summary_history_size = 24;

After the preceding configuration takes effect, the statements_summary table is cleared every 30 minutes and the statements_summary_history table stores up to 3000 types of SQL statements. For each type, the statements_summary_history table stores data for the recent 24 periods. The statements_summary_evicted table records the recent 24 periods during which SQL statements are evicted from the statement summary. The statements_summary_evicted table is updated every 30 minutes.

Note

If a SQL type appears every minute, the statements_summary_history stores data for the most recent 12 hours. If a SQL type only appears from 00:00 to 00:30 every day, the statements_summary_history stores data for the most recent 24 periods, with each period being 1 day. Therefore, the statements_summary_history stores the most recent 24 days of data for this SQL type.
The tidb_stmt_summary_history_size, tidb_stmt_summary_max_stmt_count, and tidb_stmt_summary_max_sql_length configuration items affect memory usage. It is recommended that you adjust these configurations based on your needs, the SQL size, SQL count, and machine configuration. It is not recommended to set them too large values. You can calculate the memory usage using tidb_stmt_summary_history_size * tidb_stmt_summary_max_stmt_count * tidb_stmt_summary_max_sql_length * 3.

Set a proper size for statement summary

After the system has run for a period of time (depending on the system load), you can check the statement_summary table to see whether SQL eviction has occurred. For example:

select @@global.tidb_stmt_summary_max_stmt_count;
select count(*) from information_schema.statements_summary;

+-------------------------------------------+
| @@global.tidb_stmt_summary_max_stmt_count |
+-------------------------------------------+
| 3000                                      |
+-------------------------------------------+
1 row in set (0.001 sec)

+----------+
| count(*) |
+----------+
|     3001 |
+----------+
1 row in set (0.001 sec)

You can see that the statements_summary table is full of records. Then check the evicted data from the statements_summary_evicted table:

select * from information_schema.statements_summary_evicted;

+---------------------+---------------------+---------------+
| BEGIN_TIME          | END_TIME            | EVICTED_COUNT |
+---------------------+---------------------+---------------+
| 2020-01-02 16:30:00 | 2020-01-02 17:00:00 |            59 |
+---------------------+---------------------+---------------+
| 2020-01-02 16:00:00 | 2020-01-02 16:30:00 |            45 |
+---------------------+---------------------+---------------+
2 row in set (0.001 sec)

From the preceding result, you can see that a maximum of 59 SQL categories are evicted. In this case, it is recommended that you increase the size of the statement_summary table by at least 59 records, which means increasing the size to at least 3059 records.

Limitation

By default, statements summary tables are saved in memory. When a TiDB server restarts, all data will be lost.

To address this issue, TiDB v6.6.0 experimentally introduces the statement summary persistence feature, which is disabled by default. After this feature is enabled, the history data is no longer saved in memory, but directly written to disks. In this way, the history data is still available if a TiDB server restarts.

Persist statements summary

Warning

Statements summary persistence is an experimental feature. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an issue on GitHub.

As described in the Limitation section, statements summary tables are saved in memory by default. Once a TiDB server restarts, all the statements summary will be lost. Starting from v6.6.0, TiDB experimentally provides the configuration item tidb_stmt_summary_enable_persistent to allow users to enable or disable statements summary persistence.

To enable statements summary persistence, you can add the following configuration items to the TiDB configuration file:

[instance]
tidb_stmt_summary_enable_persistent = true
# The following entries use the default values, which can be modified as needed.
# tidb_stmt_summary_filename = "tidb-statements.log"
# tidb_stmt_summary_file_max_days = 3
# tidb_stmt_summary_file_max_size = 64 # MiB
# tidb_stmt_summary_file_max_backups = 0

After statements summary persistence is enabled, the memory keeps only the current real-time data and no history data. Once the real-time data is refreshed as history data, the history data is written to the disk at an interval of tidb_stmt_summary_refresh_interval described in the Parameter configuration section. Queries on the statements_summary_history or cluster_statements_summary_history table will return results combining both in-memory and on-disk data.

Note

When statements summary persistence is enabled, the tidb_stmt_summary_history_size configuration described in the Parameter configuration section will no longer take effect because the memory does not keep the history data. Instead, the following three configurations will be used to control the retention period and size of history data for persistence: tidb_stmt_summary_file_max_days, tidb_stmt_summary_file_max_size, and tidb_stmt_summary_file_max_backups.
The smaller the value of tidb_stmt_summary_refresh_interval, the more immediate data is written to the disk. However, this also means more redundant data is written to the disk.

Troubleshooting examples

This section provides two examples to show how to use the statement summary feature to troubleshoot SQL performance issues.

Could high SQL latency be caused by the server end?

In this example, the client shows slow performance with point queries on the employee table. You can perform a fuzzy search on SQL texts:

SELECT avg_latency, exec_count, query_sample_text
    FROM information_schema.statements_summary
    WHERE digest_text LIKE 'select * from employee%';

1ms and 0.3ms are considered within the normal range of avg_latency. Therefore, it can be concluded that the server end is not the cause. You can troubleshoot with the client or the network.

+-------------+------------+------------------------------------------+
| avg_latency | exec_count | query_sample_text                        |
+-------------+------------+------------------------------------------+
|     1042040 |          2 | select * from employee where name='eric' |
|      345053 |          3 | select * from employee where id=3100     |
+-------------+------------+------------------------------------------+
2 rows in set (0.00 sec)

Which categories of SQL statements consume the longest total time?

If the QPS decrease significantly from 10:00 to 10:30, you can find out the three categories of SQL statements with the longest time consumption from the history table:

SELECT sum_latency, avg_latency, exec_count, query_sample_text
    FROM information_schema.statements_summary_history
    WHERE summary_begin_time='2020-01-02 10:00:00'
    ORDER BY sum_latency DESC LIMIT 3;

The result shows that the following three categories of SQL statements consume the longest time in total, which need to be optimized with high priority.

+-------------+-------------+------------+-----------------------------------------------------------------------+
| sum_latency | avg_latency | exec_count | query_sample_text                                                     |
+-------------+-------------+------------+-----------------------------------------------------------------------+
|     7855660 |     1122237 |          7 | select avg(salary) from employee where company_id=2013                |
|     7241960 |     1448392 |          5 | select * from employee join company on employee.company_id=company.id |
|     2084081 |     1042040 |          2 | select * from employee where name='eric'                              |
+-------------+-------------+------------+-----------------------------------------------------------------------+
3 rows in set (0.00 sec)

Fields description

`statements_summary` fields description

The following are descriptions of fields in the statements_summary table.

Basic fields:

STMT_TYPE: SQL statement type.
SCHEMA_NAME: The current schema in which SQL statements of this category are executed.
DIGEST: The digest of SQL statements of this category.
DIGEST_TEXT: The normalized SQL statement.
QUERY_SAMPLE_TEXT: The original SQL statements of the SQL category. Only one original statement is taken.
TABLE_NAMES: All tables involved in SQL statements. If there is more than one table, each is separated by a comma.
INDEX_NAMES: All SQL indexes used in SQL statements. If there is more than one index, each is separated by a comma.
SAMPLE_USER: The users who execute SQL statements of this category. Only one user is taken.
PLAN_DIGEST: The digest of the execution plan.
PLAN: The original execution plan. If there are multiple statements, the plan of only one statement is taken.
BINARY_PLAN: The original execution plan encoded in binary format. If there are multiple statements, the plan of only one statement is taken. Execute the SELECT tidb_decode_binary_plan('xxx...') statement to parse the specific execution plan.
PLAN_CACHE_HITS: The total number of times that SQL statements of this category hit the plan cache.
PLAN_IN_CACHE: Indicates whether the previous execution of SQL statements of this category hit the plan cache.
PLAN_CACHE_UNQUALIFIED: The number of times that the SQL statements of this category fail to hit the plan cache.
PLAN_CACHE_UNQUALIFIED_LAST_REASON: The reason why the SQL statements of this category fail to hit the plan cache last time.

Fields related to execution time:

SUMMARY_BEGIN_TIME: The beginning time of the current summary period.
SUMMARY_END_TIME: The ending time of the current summary period.
FIRST_SEEN: The time when SQL statements of this category are seen for the first time.
LAST_SEEN: The time when SQL statements of this category are seen for the last time.

Fields related to TiDB server:

EXEC_COUNT: Total execution times of SQL statements of this category.
SUM_ERRORS: The sum of errors occurred during execution.
SUM_WARNINGS: The sum of warnings occurred during execution.
SUM_LATENCY: The total execution latency of SQL statements of this category.
MAX_LATENCY: The maximum execution latency of SQL statements of this category.
MIN_LATENCY: The minimum execution latency of SQL statements of this category.
AVG_LATENCY: The average execution latency of SQL statements of this category.
AVG_PARSE_LATENCY: The average latency of the parser.
MAX_PARSE_LATENCY: The maximum latency of the parser.
AVG_COMPILE_LATENCY: The average latency of the compiler.
MAX_COMPILE_LATENCY: The maximum latency of the compiler.
AVG_MEM: The average memory (byte) used.
MAX_MEM: The maximum memory (byte) used.
AVG_DISK: The average disk space (byte) used.
MAX_DISK: The maximum disk space (byte) used.
AVG_TIDB_CPU_TIME: The average TiDB server CPU time that SQL statements of this category consume. It shows meaningful values only when the Top SQL feature is enabled. Otherwise the value is always 0.

Fields related to TiKV Coprocessor task:

SUM_COP_TASK_NUM: The total number of Coprocessor requests sent.
MAX_COP_PROCESS_TIME: The maximum execution time of Coprocessor tasks.
MAX_COP_PROCESS_ADDRESS: The address of the Coprocessor task with the maximum execution time.
MAX_COP_WAIT_TIME: The maximum waiting time of Coprocessor tasks.
MAX_COP_WAIT_ADDRESS: The address of the Coprocessor task with the maximum waiting time.
AVG_PROCESS_TIME: The average processing time of SQL statements in TiKV.
MAX_PROCESS_TIME: The maximum processing time of SQL statements in TiKV.
AVG_WAIT_TIME: The average waiting time of SQL statements in TiKV.
MAX_WAIT_TIME: The maximum waiting time of SQL statements in TiKV.
AVG_BACKOFF_TIME: The average waiting time before retry when a SQL statement encounters an error that requires a retry.
MAX_BACKOFF_TIME: The maximum waiting time before retry when a SQL statement encounters an error that requires a retry.
AVG_TOTAL_KEYS: The average number of keys that Coprocessor has scanned.
MAX_TOTAL_KEYS: The maximum number of keys that Coprocessor has scanned.
AVG_PROCESSED_KEYS: The average number of keys that Coprocessor has processed. Compared with avg_total_keys, avg_processed_keys does not include the old versions of MVCC. A great difference between avg_total_keys and avg_processed_keys indicates that many old versions exist.
MAX_PROCESSED_KEYS: The maximum number of keys that Coprocessor has processed.
AVG_TIKV_CPU_TIME: The average TiKV server CPU time that SQL statements of this category consume.

Transaction-related fields:

AVG_PREWRITE_TIME: The average time of the prewrite phase.
MAX_PREWRITE_TIME: The longest time of the prewrite phase.
AVG_COMMIT_TIME: The average time of the commit phase.
MAX_COMMIT_TIME: The longest time of the commit phase.
AVG_GET_COMMIT_TS_TIME: The average time of getting commit_ts.
MAX_GET_COMMIT_TS_TIME: The longest time of getting commit_ts.
AVG_COMMIT_BACKOFF_TIME: The average waiting time before retry when a SQL statement encounters an error that requires a retry during the commit phase.
MAX_COMMIT_BACKOFF_TIME: The maximum waiting time before retry when a SQL statement encounters an error that requires a retry during the commit phase.
AVG_RESOLVE_LOCK_TIME: The average time for resolving lock conflicts occurred between transactions.
MAX_RESOLVE_LOCK_TIME: The longest time for resolving lock conflicts occurred between transactions.
AVG_LOCAL_LATCH_WAIT_TIME: The average waiting time of the local transaction.
MAX_LOCAL_LATCH_WAIT_TIME: The maximum waiting time of the local transaction.
AVG_WRITE_KEYS: The average count of written keys.
MAX_WRITE_KEYS: The maximum count of written keys.
AVG_WRITE_SIZE: The average amount of written data (in byte).
MAX_WRITE_SIZE: The maximum amount of written data (in byte).
AVG_PREWRITE_REGIONS: The average number of Regions involved in the prewrite phase.
MAX_PREWRITE_REGIONS: The maximum number of Regions during the prewrite phase.
AVG_TXN_RETRY: The average number of transaction retries.
MAX_TXN_RETRY: The maximum number of transaction retries.
SUM_BACKOFF_TIMES: The sum of retries when SQL statements of this category encounter errors that require a retry.
BACKOFF_TYPES: All types of errors that require retries and the number of retries for each type. The format of the field is type:number. If there is more than one error type, each is separated by a comma, like txnLock:2,pdRPC:1.
AVG_AFFECTED_ROWS: The average number of rows affected.
PREV_SAMPLE_TEXT: When the current SQL statement is COMMIT, PREV_SAMPLE_TEXT is the previous statement to COMMIT. In this case, SQL statements are grouped by the digest and prev_sample_text. This means that COMMIT statements with different prev_sample_text are grouped to different rows. When the current SQL statement is not COMMIT, the PREV_SAMPLE_TEXT field is an empty string.

Fields related to Resource Control:

AVG_REQUEST_UNIT_WRITE: the average number of write RUs consumed by SQL statements.
MAX_REQUEST_UNIT_WRITE: the maximum number of write RUs consumed by SQL statements.
AVG_REQUEST_UNIT_READ: the average number of read RUs consumed by SQL statements.
MAX_REQUEST_UNIT_READ: the maximum number of read RUs consumed by SQL statements.
AVG_QUEUED_RC_TIME: the average waiting time for available RU when executing SQL statements.
MAX_QUEUED_RC_TIME: the maximum waiting time for available RU when executing SQL statements.
RESOURCE_GROUP: the resource group bound to SQL statements.

Fields related to network traffic:

SUM_UNPACKED_BYTES_SENT_TIKV_TOTAL: total bytes sent from SQL statements to TiKV.
SUM_UNPACKED_BYTES_RECEIVED_TIKV_TOTAL: total bytes received by SQL statements from TiKV.
SUM_UNPACKED_BYTES_SENT_TIKV_CROSS_ZONE: bytes sent from SQL statements to TiKV across availability zones.
SUM_UNPACKED_BYTES_RECEIVED_TIKV_CROSS_ZONE: bytes received by SQL statements from TiKV across availability zones.
SUM_UNPACKED_BYTES_SENT_TIFLASH_TOTAL: total bytes sent from SQL statements to TiFlash, including bytes sent between TiFlash nodes.
SUM_UNPACKED_BYTES_RECEIVED_TIFLASH_TOTAL: total bytes received by SQL statements from TiFlash, including bytes received between TiFlash nodes.
SUM_UNPACKED_BYTES_SENT_TIFLASH_CROSS_ZONE: bytes sent from SQL statements to TiFlash across availability zones, including bytes sent between TiFlash nodes across availability zones.
SUM_UNPACKED_BYTES_RECEIVED_TIFLASH_CROSS_ZONE: bytes received by SQL statements from TiFlash across availability zones, including bytes received between TiFlash nodes across availability zones.

Fields related to storage engines:

STORAGE_KV: introduced in v9.0.0, indicates whether the previous execution of SQL statements of this category read data from TiKV.
STORAGE_MPP: introduced in v9.0.0, indicates whether the previous execution of SQL statements of this category read data from TiFlash.

`statements_summary_evicted` fields description

BEGIN_TIME: Records the starting time.
END_TIME: Records the ending time.
EVICTED_COUNT: The number of SQL categories that are evicted during the record period.

Statement Summary Tables

statements_summary

statements_summary_history

statements_summary_evicted

The cluster tables for statement summary

Parameter configuration

Set a proper size for statement summary

Limitation

Persist statements summary

Troubleshooting examples

Could high SQL latency be caused by the server end?

Which categories of SQL statements consume the longest total time?

Fields description

statements_summary fields description

statements_summary_evicted fields description

Was this page helpful?

`statements_summary`

`statements_summary_history`

`statements_summary_evicted`

The `cluster` tables for statement summary

`statements_summary` fields description

`statements_summary_evicted` fields description