解释使用索引的语句
TiDB 支持多种操作符,这些操作符利用索引来加快查询执行速度:
本文中的示例基于以下样例数据:
CREATE TABLE t1 (
id INT NOT NULL PRIMARY KEY auto_increment,
intkey INT NOT NULL,
pad1 VARBINARY(1024),
INDEX (intkey)
);
INSERT INTO t1 SELECT NULL, FLOOR(RAND()*1024), RANDOM_BYTES(1024) FROM dual;
INSERT INTO t1 SELECT NULL, FLOOR(RAND()*1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, FLOOR(RAND()*1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
INSERT INTO t1 SELECT NULL, FLOOR(RAND()*1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000;
IndexLookup
当从二级索引中检索数据时,TiDB 使用 IndexLookup
操作符。在这种情况下,以下查询都将对 intkey
索引使用 IndexLookup
操作符:
EXPLAIN SELECT * FROM t1 WHERE intkey = 123;
EXPLAIN SELECT * FROM t1 WHERE intkey < 10;
EXPLAIN SELECT * FROM t1 WHERE intkey BETWEEN 300 AND 310;
EXPLAIN SELECT * FROM t1 WHERE intkey IN (123,29,98);
EXPLAIN SELECT * FROM t1 WHERE intkey >= 99 AND intkey <= 103;
+-------------------------------+---------+-----------+--------------------------------+-----------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------------+---------+-----------+--------------------------------+-----------------------------------+
| IndexLookUp_10 | 1.00 | root | | |
| ├─IndexRangeScan_8(Build) | 1.00 | cop[tikv] | table:t1, index:intkey(intkey) | range:[123,123], keep order:false |
| └─TableRowIDScan_9(Probe) | 1.00 | cop[tikv] | table:t1 | keep order:false |
+-------------------------------+---------+-----------+--------------------------------+-----------------------------------+
3 rows in set (0.00 sec)
+-------------------------------+---------+-----------+--------------------------------+-----------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------------+---------+-----------+--------------------------------+-----------------------------------+
| IndexLookUp_10 | 3.60 | root | | |
| ├─IndexRangeScan_8(Build) | 3.60 | cop[tikv] | table:t1, index:intkey(intkey) | range:[-inf,10), keep order:false |
| └─TableRowIDScan_9(Probe) | 3.60 | cop[tikv] | table:t1 | keep order:false |
+-------------------------------+---------+-----------+--------------------------------+-----------------------------------+
3 rows in set (0.00 sec)
+-------------------------------+---------+-----------+--------------------------------+-----------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------------+---------+-----------+--------------------------------+-----------------------------------+
| IndexLookUp_10 | 5.67 | root | | |
| ├─IndexRangeScan_8(Build) | 5.67 | cop[tikv] | table:t1, index:intkey(intkey) | range:[300,310], keep order:false |
| └─TableRowIDScan_9(Probe) | 5.67 | cop[tikv] | table:t1 | keep order:false |
+-------------------------------+---------+-----------+--------------------------------+-----------------------------------+
3 rows in set (0.00 sec)
+-------------------------------+---------+-----------+--------------------------------+-----------------------------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------------+---------+-----------+--------------------------------+-----------------------------------------------------+
| IndexLookUp_10 | 4.00 | root | | |
| ├─IndexRangeScan_8(Build) | 4.00 | cop[tikv] | table:t1, index:intkey(intkey) | range:[29,29], [98,98], [123,123], keep order:false |
| └─TableRowIDScan_9(Probe) | 4.00 | cop[tikv] | table:t1 | keep order:false |
+-------------------------------+---------+-----------+--------------------------------+-----------------------------------------------------+
3 rows in set (0.00 sec)
+-------------------------------+---------+-----------+--------------------------------+----------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------------+---------+-----------+--------------------------------+----------------------------------+
| IndexLookUp_10 | 6.00 | root | | |
| ├─IndexRangeScan_8(Build) | 6.00 | cop[tikv] | table:t1, index:intkey(intkey) | range:[99,103], keep order:false |
| └─TableRowIDScan_9(Probe) | 6.00 | cop[tikv] | table:t1 | keep order:false |
+-------------------------------+---------+-----------+--------------------------------+----------------------------------+
3 rows in set (0.00 sec)
IndexLookup
操作符有两个子节点:
├─IndexRangeScan_8(Build)
操作符对intkey
索引执行范围扫描,并检索内部的RowID
(对于此表,即主键)。└─TableRowIDScan_9(Probe)
操作符随后从表数据中检索完整行。
由于 IndexLookup
任务需要两个步骤,SQL 优化器可能会根据 statistics 在匹配大量行的场景下选择 TableFullScan
操作符。在以下示例中,intkey > 100
条件匹配大量行,因此选择了 TableFullScan
:
EXPLAIN SELECT * FROM t1 WHERE intkey > 100;
+-------------------------+---------+-----------+---------------+-------------------------+
| id | estRows | task | access object | operator info |
+-------------------------+---------+-----------+---------------+-------------------------+
| TableReader_7 | 898.50 | root | | data:Selection_6 |
| └─Selection_6 | 898.50 | cop[tikv] | | gt(test.t1.intkey, 100) |
| └─TableFullScan_5 | 1010.00 | cop[tikv] | table:t1 | keep order:false |
+-------------------------+---------+-----------+---------------+-------------------------+
3 rows in set (0.00 sec)
IndexLookup
操作符也可以用来高效优化对索引列的 LIMIT
:
EXPLAIN SELECT * FROM t1 ORDER BY intkey DESC LIMIT 10;
+--------------------------------+---------+-----------+--------------------------------+------------------------------------+
| id | estRows | task | access object | operator info |
+--------------------------------+---------+-----------+--------------------------------+------------------------------------+
| IndexLookUp_21 | 10.00 | root | | limit embedded(offset:0, count:10) |
| ├─Limit_20(Build) | 10.00 | cop[tikv] | | offset:0, count:10 |
| │ └─IndexFullScan_18 | 10.00 | cop[tikv] | table:t1, index:intkey(intkey) | keep order:true, desc |
| └─TableRowIDScan_19(Probe) | 10.00 | cop[tikv] | table:t1 | keep order:false, stats:pseudo |
+--------------------------------+---------+-----------+--------------------------------+------------------------------------+
在上述示例中,从索引 intkey
读取最后 10 行。这些 RowID
值随后从表数据中检索。
IndexReader
TiDB 支持 covering index optimization(覆盖索引优化)。如果所有行都能从索引中检索出来,TiDB 会跳过通常在 IndexLookup
中需要的第二步。考虑以下两个示例:
EXPLAIN SELECT * FROM t1 WHERE intkey = 123;
EXPLAIN SELECT id FROM t1 WHERE intkey = 123;
+-------------------------------+---------+-----------+--------------------------------+-----------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------------+---------+-----------+--------------------------------+-----------------------------------+
| IndexLookUp_10 | 1.00 | root | | |
| ├─IndexRangeScan_8(Build) | 1.00 | cop[tikv] | table:t1, index:intkey(intkey) | range:[123,123], keep order:false |
| └─TableRowIDScan_9(Probe) | 1.00 | cop[tikv] | table:t1 | keep order:false |
+-------------------------------+---------+-----------+--------------------------------+-----------------------------------+
3 rows in set (0.00 sec)
+--------------------------+---------+-----------+--------------------------------+-----------------------------------+
| id | estRows | task | access object | operator info |
+--------------------------+---------+-----------+--------------------------------+-----------------------------------+
| Projection_4 | 1.00 | root | | test.t1.id |
| └─IndexReader_6 | 1.00 | root | | index:IndexRangeScan_5 |
| └─IndexRangeScan_5 | 1.00 | cop[tikv] | table:t1, index:intkey(intkey) | range:[123,123], keep order:false |
+--------------------------+---------+-----------+--------------------------------+-----------------------------------+
3 rows in set (0.00 sec)
因为 id
也是内部的 RowID
,它存储在 intkey
索引中。使用 └─IndexRangeScan_5
后,可以直接返回 RowID
的值。
Point_Get 和 Batch_Point_Get
当直接从主键或唯一键检索数据时,TiDB 使用 Point_Get
或 Batch_Point_Get
操作符。这些操作符比 IndexLookup
更高效。例如:
EXPLAIN SELECT * FROM t1 WHERE id = 1234;
EXPLAIN SELECT * FROM t1 WHERE id IN (1234,123);
ALTER TABLE t1 ADD unique_key INT;
UPDATE t1 SET unique_key = id;
ALTER TABLE t1 ADD UNIQUE KEY (unique_key);
EXPLAIN SELECT * FROM t1 WHERE unique_key = 1234;
EXPLAIN SELECT * FROM t1 WHERE unique_key IN (1234, 123);
+-------------+---------+------+---------------+---------------+
| id | estRows | task | access object | operator info |
+-------------+---------+------+---------------+---------------+
| Point_Get_1 | 1.00 | root | table:t1 | handle:1234 |
+-------------+---------+------+---------------+---------------+
1 row in set (0.00 sec)
+-------------------+---------+------+---------------+-------------------------------------------------+
| id | estRows | task | access object | operator info |
+-------------------+---------+------+---------------+-------------------------------------------------+
| Batch_Point_Get_1 | 2.00 | root | table:t1 | handle:[1234 123], keep order:false, desc:false |
+-------------------+---------+------+---------------+-------------------------------------------------+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.27 sec)
Query OK, 1010 rows affected (0.06 sec)
Rows matched: 1010 Changed: 1010 Warnings: 0
Query OK, 0 rows affected (0.37 sec)
+-------------+---------+------+----------------------------------------+---------------+
| id | estRows | task | access object | operator info |
+-------------+---------+------+----------------------------------------+---------------+
| Point_Get_1 | 1.00 | root | table:t1, index:unique_key(unique_key) | |
+-------------+---------+------+----------------------------------------+---------------+
1 row in set (0.00 sec)
+-------------------+---------+------+----------------------------------------+------------------------------+
| id | estRows | task | access object | operator info |
+-------------------+---------+------+----------------------------------------+------------------------------+
| Batch_Point_Get_1 | 2.00 | root | table:t1, index:unique_key(unique_key) | keep order:false, desc:false |
+-------------------+---------+------+----------------------------------------+------------------------------+
1 row in set (0.00 sec)
IndexFullScan
由于索引是有序的,IndexFullScan
操作符可以用来优化常见的查询,例如索引值的 MIN
或 MAX
:
EXPLAIN SELECT MIN(intkey) FROM t1;
EXPLAIN SELECT MAX(intkey) FROM t1;
+------------------------------+---------+-----------+--------------------------------+-------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------+---------+-----------+--------------------------------+-------------------------------------+
| StreamAgg_12 | 1.00 | root | | funcs:min(test.t1.intkey)->Column#4 |
| └─Limit_16 | 1.00 | root | | offset:0, count:1 |
| └─IndexReader_29 | 1.00 | root | | index:Limit_28 |
| └─Limit_28 | 1.00 | cop[tikv] | | offset:0, count:1 |
| └─IndexFullScan_27 | 1.00 | cop[tikv] | table:t1, index:intkey(intkey) | keep order:true |
+------------------------------+---------+-----------+--------------------------------+-------------------------------------+
5 rows in set (0.00 sec)
+------------------------------+---------+-----------+--------------------------------+-------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------+---------+-----------+--------------------------------+-------------------------------------+
| StreamAgg_12 | 1.00 | root | | funcs:max(test.t1.intkey)->Column#4 |
| └─Limit_16 | 1.00 | root | | offset:0, count:1 |
| └─IndexReader_29 | 1.00 | root | | index:Limit_28 |
| └─Limit_28 | 1.00 | cop[tikv] | | offset:0, count:1 |
| └─IndexFullScan_27 | 1.00 | cop[tikv] | table:t1, index:intkey(intkey) | keep order:true, desc |
+------------------------------+---------+-----------+--------------------------------+-------------------------------------+
5 rows in set (0.00 sec)
在上述语句中,对每个 TiKV Region 执行 IndexFullScan
任务。尽管名字叫做 FullScan
,但实际上只需要读取第一行(└─Limit_28
)。每个 TiKV Region 会将其 MIN
或 MAX
值返回给 TiDB,之后执行流式聚合以筛选出单行。带有 MAX
或 MIN
聚合函数的流式聚合还确保在表为空时返回 NULL
。
相比之下,在未建立索引的值上执行 MIN
函数会导致 TableFullScan
。此查询需要扫描 TiKV 中的所有行,但会执行 TopN
计算以确保每个 TiKV Region 只返回一行到 TiDB。虽然 TopN
避免了过多行在 TiKV 和 TiDB 之间传输,但此语句的效率远远低于上述利用索引的 MIN
示例。
EXPLAIN SELECT MIN(pad1) FROM t1;
+--------------------------------+---------+-----------+---------------+-----------------------------------+
| id | estRows | task | access object | operator info |
+--------------------------------+---------+-----------+---------------+-----------------------------------+
| StreamAgg_13 | 1.00 | root | | funcs:min(test.t1.pad1)->Column#4 |
| └─TopN_14 | 1.00 | root | | test.t1.pad1, offset:0, count:1 |
| └─TableReader_23 | 1.00 | root | | data:TopN_22 |
| └─TopN_22 | 1.00 | cop[tikv] | | test.t1.pad1, offset:0, count:1 |
| └─Selection_21 | 1008.99 | cop[tikv] | | not(isnull(test.t1.pad1)) |
| └─TableFullScan_20 | 1010.00 | cop[tikv] | table:t1 | keep order:false |
+------------------------------+---------+-----------+---------------+-----------------------------------+
6 rows in set (0.00 sec)
以下语句会使用 IndexFullScan
操作符扫描索引中的每一行:
EXPLAIN SELECT SUM(intkey) FROM t1;
EXPLAIN SELECT AVG(intkey) FROM t1;
+----------------------------+---------+-----------+--------------------------------+-------------------------------------+
| id | estRows | task | access object | operator info |
+----------------------------+---------+-----------+--------------------------------+-------------------------------------+
| StreamAgg_20 | 1.00 | root | | funcs:sum(Column#6)->Column#4 |
| └─IndexReader_21 | 1.00 | root | | index:StreamAgg_8 |
| └─StreamAgg_8 | 1.00 | cop[tikv] | | funcs:sum(test.t1.intkey)->Column#6 |
| └─IndexFullScan_19 | 1010.00 | cop[tikv] | table:t1, index:intkey(intkey) | keep order:false |
+----------------------------+---------+-----------+--------------------------------+-------------------------------------+
4 rows in set (0.00 sec)
+----------------------------+---------+-----------+--------------------------------+----------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+----------------------------+---------+-----------+--------------------------------+----------------------------------------------------------------------------+
| StreamAgg_20 | 1.00 | root | | funcs:avg(Column#7, Column#8)->Column#4 |
| └─IndexReader_21 | 1.00 | root | | index:StreamAgg_8 |
| └─StreamAgg_8 | 1.00 | cop[tikv] | | funcs:count(test.t1.intkey)->Column#7, funcs:sum(test.t1.intkey)->Column#8 |
| └─IndexFullScan_19 | 1010.00 | cop[tikv] | table:t1, index:intkey(intkey) | keep order:false |
+----------------------------+---------+-----------+--------------------------------+----------------------------------------------------------------------------+
4 rows in set (0.00 sec)
在上述示例中,IndexFullScan
比 TableFullScan
更高效,因为 (intkey + RowID)
索引中的值宽度小于整行的宽度。
以下语句不支持使用 IndexFullScan
操作符,因为需要从表中获取额外的列:
EXPLAIN SELECT AVG(intkey), ANY_VALUE(pad1) FROM t1;
+------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------+
| Projection_4 | 1.00 | root | | Column#4, any_value(test.t1.pad1)->Column#5 |
| └─StreamAgg_16 | 1.00 | root | | funcs:avg(Column#10, Column#11)->Column#4, funcs:firstrow(Column#12)->test.t1.pad1 |
| └─TableReader_17 | 1.00 | root | | data:StreamAgg_8 |
| └─StreamAgg_8 | 1.00 | cop[tikv] | | funcs:count(test.t1.intkey)->Column#10, funcs:sum(test.t1.intkey)->Column#11, funcs:firstrow(test.t1.pad1)->Column#12 |
| └─TableFullScan_15 | 1010.00 | cop[tikv] | table:t1 | keep order:false |
+------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------+