- 关于 TiDB
- 快速上手
- 部署集群
- 数据迁移
- 概述
- 从 MySQL 迁移至 TiDB
- 从 CSV 文件迁移至 TiDB
- 从 SQL 文件迁移到 TiDB
- 运维操作
- 升级 TiDB 版本
- 扩缩容
- 备份与恢复
- 读取历史数据
- 修改时区
- 日常巡检
- TiFlash 常用运维操作
- TiUP 常用运维操作
- 在线修改集群配置
- 监控与告警
- 故障诊断
- 性能调优
- 系统调优
- 软件调优
- SQL 性能调优
- SQL 性能调优概览
- 理解 TiDB 执行计划
- SQL 优化流程
- 控制执行计划
- 教程
- TiDB 生态工具
- 参考指南
- 架构
- 监控指标
- 安全加固
- 权限
- SQL
- SQL 语言结构和语法
- SQL 语句
ADD COLUMN
ADD INDEX
ADMIN
ADMIN CANCEL DDL
ADMIN CHECKSUM TABLE
ADMIN CHECK [TABLE|INDEX]
ADMIN SHOW DDL [JOBS|QUERIES]
ALTER DATABASE
ALTER INDEX
ALTER INSTANCE
ALTER TABLE
ALTER USER
ANALYZE TABLE
BACKUP
BEGIN
CHANGE COLUMN
CHANGE DRAINER
CHANGE PUMP
COMMIT
CREATE [GLOBAL|SESSION] BINDING
CREATE DATABASE
CREATE INDEX
CREATE ROLE
CREATE SEQUENCE
CREATE TABLE LIKE
CREATE TABLE
CREATE USER
CREATE VIEW
DEALLOCATE
DELETE
DESC
DESCRIBE
DO
DROP [GLOBAL|SESSION] BINDING
DROP COLUMN
DROP DATABASE
DROP INDEX
DROP ROLE
DROP SEQUENCE
DROP STATS
DROP TABLE
DROP USER
DROP VIEW
EXECUTE
EXPLAIN ANALYZE
EXPLAIN
FLASHBACK TABLE
FLUSH PRIVILEGES
FLUSH STATUS
FLUSH TABLES
GRANT <privileges>
GRANT <role>
INSERT
KILL [TIDB]
LOAD DATA
LOAD STATS
MODIFY COLUMN
PREPARE
RECOVER TABLE
RENAME INDEX
RENAME TABLE
REPLACE
RESTORE
REVOKE <privileges>
REVOKE <role>
ROLLBACK
SELECT
SET DEFAULT ROLE
SET [NAMES|CHARACTER SET]
SET PASSWORD
SET ROLE
SET TRANSACTION
SET [GLOBAL|SESSION] <variable>
SHOW [BACKUPS|RESTORES]
SHOW ANALYZE STATUS
SHOW [GLOBAL|SESSION] BINDINGS
SHOW BUILTINS
SHOW CHARACTER SET
SHOW COLLATION
SHOW [FULL] COLUMNS FROM
SHOW CONFIG
SHOW CREATE SEQUENCE
SHOW CREATE TABLE
SHOW CREATE USER
SHOW DATABASES
SHOW DRAINER STATUS
SHOW ENGINES
SHOW ERRORS
SHOW [FULL] FIELDS FROM
SHOW GRANTS
SHOW INDEX [FROM|IN]
SHOW INDEXES [FROM|IN]
SHOW KEYS [FROM|IN]
SHOW MASTER STATUS
SHOW PLUGINS
SHOW PRIVILEGES
SHOW [FULL] PROCESSSLIST
SHOW PROFILES
SHOW PUMP STATUS
SHOW SCHEMAS
SHOW STATS_HEALTHY
SHOW STATS_HISTOGRAMS
SHOW STATS_META
SHOW STATUS
SHOW TABLE NEXT_ROW_ID
SHOW TABLE REGIONS
SHOW TABLE STATUS
SHOW [FULL] TABLES
SHOW [GLOBAL|SESSION] VARIABLES
SHOW WARNINGS
SHUTDOWN
SPLIT REGION
START TRANSACTION
TRACE
TRUNCATE
UPDATE
USE
- 数据类型
- 函数与操作符
- 聚簇索引
- 约束
- 生成列
- SQL 模式
- 事务
- 垃圾回收 (GC)
- 视图
- 分区表
- 字符集和排序规则
- 系统表
mysql
- INFORMATION_SCHEMA
- Overview
ANALYZE_STATUS
CHARACTER_SETS
CLUSTER_CONFIG
CLUSTER_HARDWARE
CLUSTER_INFO
CLUSTER_LOAD
CLUSTER_LOG
CLUSTER_SYSTEMINFO
COLLATIONS
COLLATION_CHARACTER_SET_APPLICABILITY
COLUMNS
DDL_JOBS
ENGINES
INSPECTION_RESULT
INSPECTION_RULES
INSPECTION_SUMMARY
KEY_COLUMN_USAGE
METRICS_SUMMARY
METRICS_TABLES
PARTITIONS
PROCESSLIST
SCHEMATA
SEQUENCES
SESSION_VARIABLES
SLOW_QUERY
STATISTICS
TABLES
TABLE_CONSTRAINTS
TABLE_STORAGE_STATS
TIDB_HOT_REGIONS
TIDB_INDEXES
TIDB_SERVERS_INFO
TIFLASH_REPLICA
TIKV_REGION_PEERS
TIKV_REGION_STATUS
TIKV_STORE_STATUS
USER_PRIVILEGES
VIEWS
METRICS_SCHEMA
- UI
- CLI
- 命令行参数
- 配置文件参数
- 系统变量
- 存储引擎
- TiKV
- TiFlash
- TiUP
- 遥测
- 错误码
- 通过拓扑 label 进行副本调度
- 常见问题解答 (FAQ)
- 术语表
- 版本发布历史
用 EXPLAIN 查看使用子查询的执行计划
TiDB 会执行多种子查询相关的优化,以提升子查询的执行性能。本文档介绍一些常见子查询的优化方式,以及如何解读 EXPLAIN
语句返回的执行计划信息。
本文档所使用的示例表数据如下:
CREATE TABLE t1 (id BIGINT NOT NULL PRIMARY KEY auto_increment, pad1 BLOB, pad2 BLOB, pad3 BLOB, int_col INT NOT NULL DEFAULT 0);
CREATE TABLE t2 (id BIGINT NOT NULL PRIMARY KEY auto_increment, t1_id BIGINT NOT NULL, pad1 BLOB, pad2 BLOB, pad3 BLOB, INDEX(t1_id));
CREATE TABLE t3 (
id INT NOT NULL PRIMARY KEY auto_increment,
t1_id INT NOT NULL,
UNIQUE (t1_id)
);
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM dual;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
UPDATE t1 SET int_col = 1 WHERE pad1 = (SELECT pad1 FROM t1 ORDER BY RAND() LIMIT 1);
INSERT INTO t3 SELECT NULL, id FROM t1 WHERE id < 1000;
SELECT SLEEP(1);
ANALYZE TABLE t1, t2, t3;
Inner join(无 UNIQUE
约束的子查询)
以下示例中,IN
子查询会从表 t2
中搜索一列 ID。为保证语义正确性,TiDB 需要保证 t1_id
列的值具有唯一性。使用 EXPLAIN
可查看到该查询的执行计划去掉重复项并执行 Inner Join
内连接操作:
EXPLAIN SELECT * FROM t1 WHERE id IN (SELECT t1_id FROM t2);
+--------------------------------+----------+-----------+------------------------------+---------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+--------------------------------+----------+-----------+------------------------------+---------------------------------------------------------------------------------+
| IndexMergeJoin_19 | 45.00 | root | | inner join, inner:TableReader_14, outer key:test.t2.t1_id, inner key:test.t1.id |
| ├─HashAgg_38(Build) | 45.00 | root | | group by:test.t2.t1_id, funcs:firstrow(test.t2.t1_id)->test.t2.t1_id |
| │ └─IndexReader_39 | 45.00 | root | | index:HashAgg_31 |
| │ └─HashAgg_31 | 45.00 | cop[tikv] | | group by:test.t2.t1_id, |
| │ └─IndexFullScan_37 | 90000.00 | cop[tikv] | table:t2, index:t1_id(t1_id) | keep order:false |
| └─TableReader_14(Probe) | 1.00 | root | | data:TableRangeScan_13 |
| └─TableRangeScan_13 | 1.00 | cop[tikv] | table:t1 | range: decided by [test.t2.t1_id], keep order:true |
+--------------------------------+----------+-----------+------------------------------+---------------------------------------------------------------------------------+
7 rows in set (0.00 sec)
由上述查询结果可知,TiDB 首先执行 Index Join
索引连接(即 Merge Join
合并连接的变体)操作,开始读取 t2.t1_id
列的索引。先是 └─HashAgg_31
算子的部分任务在 TiKV 中对 t1_id
值进行去重,然后├─HashAgg_38(Build)
算子的部分任务在 TiDB 中对 t1_id
值再次进行去重。去重操作由聚合函数 firstrow(test.t2.t1_id)
执行,之后会将操作结果与 t1
表的主键相连接。
Inner join(有 UNIQUE
约束的子查询)
在上述示例中,为了确保 t1_id
值在与表 t1
连接前具有唯一性,需要执行聚合运算。在以下示例中,由于 UNIQUE
约束已能确保 t3.t1_id
列值的唯一:
EXPLAIN SELECT * FROM t1 WHERE id IN (SELECT t1_id FROM t3);
+-----------------------------+---------+-----------+------------------------------+---------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+---------+-----------+------------------------------+---------------------------------------------------------------------------------+
| IndexMergeJoin_20 | 999.00 | root | | inner join, inner:TableReader_15, outer key:test.t3.t1_id, inner key:test.t1.id |
| ├─IndexReader_39(Build) | 999.00 | root | | index:IndexFullScan_38 |
| │ └─IndexFullScan_38 | 999.00 | cop[tikv] | table:t3, index:t1_id(t1_id) | keep order:false |
| └─TableReader_15(Probe) | 1.00 | root | | data:TableRangeScan_14 |
| └─TableRangeScan_14 | 1.00 | cop[tikv] | table:t1 | range: decided by [test.t3.t1_id], keep order:true |
+-----------------------------+---------+-----------+------------------------------+---------------------------------------------------------------------------------+
5 rows in set (0.00 sec)
从语义上看,因为约束保证了 t3.t1_id
列值的唯一性,TiDB 可以直接执行 INNER JOIN
查询。
Semi Join(关联查询)
在前两个示例中,通过 HashAgg
聚合操作或通过 UNIQUE
约束保证子查询数据的唯一性之后,TiDB 才能够执行 Inner Join
操作。这两种连接均使用了 Index Join
(Merge Join
的变体)。
下面的例子中,TiDB 优化器则选择了一种不同的执行计划:
EXPLAIN SELECT * FROM t1 WHERE id IN (SELECT t1_id FROM t2 WHERE t1_id != t1.int_col);
+-----------------------------+-----------+-----------+------------------------------+--------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+-----------+-----------+------------------------------+--------------------------------------------------------------------------------------------------------+
| MergeJoin_9 | 45446.40 | root | | semi join, left key:test.t1.id, right key:test.t2.t1_id, other cond:ne(test.t2.t1_id, test.t1.int_col) |
| ├─IndexReader_24(Build) | 180000.00 | root | | index:IndexFullScan_23 |
| │ └─IndexFullScan_23 | 180000.00 | cop[tikv] | table:t2, index:t1_id(t1_id) | keep order:true |
| └─TableReader_22(Probe) | 56808.00 | root | | data:Selection_21 |
| └─Selection_21 | 56808.00 | cop[tikv] | | ne(test.t1.id, test.t1.int_col) |
| └─TableFullScan_20 | 71010.00 | cop[tikv] | table:t1 | keep order:true |
+-----------------------------+-----------+-----------+------------------------------+--------------------------------------------------------------------------------------------------------+
6 rows in set (0.00 sec)
由上述查询结果可知,TiDB 执行了 Semi Join
。不同于 Inner Join
,Semi Join
仅允许右键 (t2.t1_id
) 上的第一个值,也就是该操作将去除 Join
算子任务中的重复数据。Join
算法也包含 Merge Join
,会按照排序顺序同时从左侧和右侧读取数据,这是一种高效的 Zipper Merge
。
可以将原语句视为关联子查询,因为它引入了子查询外的 t1.int_col
列。然而,EXPLAIN
语句的返回结果显示的是关联子查询去关联后的执行计划。条件 t1_id != t1.int_col
会被重写为 t1.id != t1.int_col
。TiDB 可以从表 t1
中读取数据并且在 └─Selection_21
中执行此操作,因此这种去关联和重写操作会极大提高执行效率。
Anti Semi Join (NOT IN
子查询)
在以下示例中,除非子查询中存在 t3.t1_id
,否则该查询将(从语义上)返回表 t3
中的所有行:
EXPLAIN SELECT * FROM t3 WHERE t1_id NOT IN (SELECT id FROM t1 WHERE int_col < 100);
+-----------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------------+
| IndexMergeJoin_20 | 1598.40 | root | | anti semi join, inner:TableReader_15, outer key:test.t3.t1_id, inner key:test.t1.id |
| ├─TableReader_28(Build) | 1998.00 | root | | data:TableFullScan_27 |
| │ └─TableFullScan_27 | 1998.00 | cop[tikv] | table:t3 | keep order:false |
| └─TableReader_15(Probe) | 1.00 | root | | data:Selection_14 |
| └─Selection_14 | 1.00 | cop[tikv] | | lt(test.t1.int_col, 100) |
| └─TableRangeScan_13 | 1.00 | cop[tikv] | table:t1 | range: decided by [test.t3.t1_id], keep order:true |
+-----------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------------+
6 rows in set (0.00 sec)
上述查询首先读取了表 t3
,然后根据主键开始探测 (probe) 表 t1
。连接类型是 anti semi join,即反半连接:之所以使用 anti,是因为上述示例有不存在匹配值(即 NOT IN
)的情况;使用 Semi Join
则是因为仅需要匹配第一行后就可以停止查询。