Partition Pruning
Partition pruning 是一种适用于分区表的性能优化技术。它会分析查询语句中的过滤条件,在不包含任何所需数据的分区上进行(裁剪),从而排除掉这些分区。通过排除不必要的分区,TiDB 能够减少需要访问的数据量,从而可能显著提升查询执行时间。
以下是一个示例:
CREATE TABLE t1 (
id INT NOT NULL PRIMARY KEY,
pad VARCHAR(100)
)
PARTITION BY RANGE COLUMNS(id) (
PARTITION p0 VALUES LESS THAN (100),
PARTITION p1 VALUES LESS THAN (200),
PARTITION p2 VALUES LESS THAN (MAXVALUE)
);
INSERT INTO t1 VALUES (1, 'test1'),(101, 'test2'), (201, 'test3');
EXPLAIN SELECT * FROM t1 WHERE id BETWEEN 80 AND 120;
+----------------------------+---------+-----------+------------------------+------------------------------------------------+
| id | estRows | task | access object | operator info |
+----------------------------+---------+-----------+------------------------+------------------------------------------------+
| PartitionUnion_8 | 80.00 | root | | |
| ├─TableReader_10 | 40.00 | root | | data:TableRangeScan_9 |
| │ └─TableRangeScan_9 | 40.00 | cop[tikv] | table:t1, partition:p0 | range:[80,120], keep order:false, stats:pseudo |
| └─TableReader_12 | 40.00 | root | | data:TableRangeScan_11 |
| └─TableRangeScan_11 | 40.00 | cop[tikv] | table:t1, partition:p1 | range:[80,120], keep order:false, stats:pseudo |
+----------------------------+---------+-----------+------------------------+------------------------------------------------+
5 rows in set (0.00 sec)
使用场景:分区裁剪
分区裁剪的使用场景对于 Range 分区表和 Hash 分区表有所不同。
在 Hash 分区表中使用分区裁剪
本节描述 Hash 分区表中分区裁剪的适用和不适用场景。
适用场景
在 Hash 分区表中,分区裁剪仅适用于等值比较的查询条件。
create table t (x int) partition by hash(x) partitions 4;
explain select * from t where x = 1;
+-------------------------+----------+-----------+-----------------------+--------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------+----------+-----------+-----------------------+--------------------------------+
| TableReader_8 | 10.00 | root | | data:Selection_7 |
| └─Selection_7 | 10.00 | cop[tikv] | | eq(test.t.x, 1) |
| └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo |
+-------------------------+----------+-----------+-----------------------+--------------------------------+
从上述 SQL 语句可以看出,根据条件 x = 1
,所有结果都落在一个分区中。经过哈希分区后,可以确认值 1
在 p1
分区内。因此,只需要扫描 p1
分区,无需访问不会匹配的 p2
、p3
和 p4
分区。从执行计划可以看到,只有一个 TableFullScan
操作,并且在 access object
中指定了 p1
分区,因此可以确认分区裁剪已生效。
不适用场景
本节描述两个 Hash 分区表中分区裁剪不适用的场景。
场景一
如果不能确认查询结果只落在一个分区(例如 in
、between
、>
、<
、>=
、<=
),则不能使用分区裁剪优化。例如:
create table t (x int) partition by hash(x) partitions 4;
explain select * from t where x > 2;
+------------------------------+----------+-----------+-----------------------+--------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------+----------+-----------+-----------------------+--------------------------------+
| Union_10 | 13333.33 | root | | |
| ├─TableReader_13 | 3333.33 | root | | data:Selection_12 |
| │ └─Selection_12 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) |
| │ └─TableFullScan_11 | 10000.00 | cop[tikv] | table:t, partition:p0 | keep order:false, stats:pseudo |
| ├─TableReader_16 | 3333.33 | root | | data:Selection_15 |
| │ └─Selection_15 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) |
| │ └─TableFullScan_14 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo |
| ├─TableReader_19 | 3333.33 | root | | data:Selection_18 |
| │ └─Selection_18 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) |
| │ └─TableFullScan_17 | 10000.00 | cop[tikv] | table:t, partition:p2 | keep order:false, stats:pseudo |
| └─TableReader_22 | 3333.33 | root | | data:Selection_21 |
| └─Selection_21 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) |
| └─TableFullScan_20 | 10000.00 | cop[tikv] | table:t, partition:p3 | keep order:false, stats:pseudo |
+------------------------------+----------+-----------+-----------------------+--------------------------------+
在此情况下,分区裁剪不适用,因为无法确认对应的 Hash 分区满足 x > 2
条件。
场景二
由于分区裁剪的规则优化是在查询计划生成阶段进行的,因此不适用于过滤条件只能在执行阶段获得的场景。例如:
create table t (x int) partition by hash(x) partitions 4;
explain select * from t2 where x = (select * from t1 where t2.x = t1.x and t2.x < 2);
+--------------------------------------+----------+-----------+------------------------+----------------------------------------------+
| id | estRows | task | access object | operator info |
+--------------------------------------+----------+-----------+------------------------+----------------------------------------------+
| Projection_13 | 9990.00 | root | | test.t2.x |
| └─Apply_15 | 9990.00 | root | | inner join, equal:[eq(test.t2.x, test.t1.x)] |
| ├─TableReader_18(Build) | 9990.00 | root | | data:Selection_17 |
| │ └─Selection_17 | 9990.00 | cop[tikv] | | not(isnull(test.t2.x)) |
| │ └─TableFullScan_16 | 10000.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo |
| └─Selection_19(Probe) | 0.80 | root | | not(isnull(test.t1.x)) |
| └─MaxOneRow_20 | 1.00 | root | | |
| └─Union_21 | 2.00 | root | | |
| ├─TableReader_24 | 2.00 | root | | data:Selection_23 |
| │ └─Selection_23 | 2.00 | cop[tikv] | | eq(test.t2.x, test.t1.x), lt(test.t2.x, 2) |
| │ └─TableFullScan_22 | 2500.00 | cop[tikv] | table:t1, partition:p0 | keep order:false, stats:pseudo |
| └─TableReader_27 | 2.00 | root | | data:Selection_26 |
| └─Selection_26 | 2.00 | cop[tikv] | | eq(test.t2.x, test.t1.x), lt(test.t2.x, 2) |
| └─TableFullScan_25 | 2500.00 | cop[tikv] | table:t1, partition:p1 | keep order:false, stats:pseudo |
+--------------------------------------+----------+-----------+------------------------+----------------------------------------------+
每次读取 t2
的一行数据时,都需要对 t1
的分区表进行查询。理论上,t1.x > val
的过滤条件在此时已满足,但实际上,分区裁剪只在查询计划的生成阶段生效,而非在执行阶段。
在 Range 分区表中使用分区裁剪
本节描述 Range 分区表中分区裁剪的适用和不适用场景。
适用场景
Range 分区表中,分区裁剪适用于以下三种场景。
场景一
分区裁剪适用于 Range 分区表中的等值比较查询条件。例如:
create table t (x int) partition by range (x) (
partition p0 values less than (5),
partition p1 values less than (10),
partition p2 values less than (15)
);
explain select * from t where x = 3;
+-------------------------+----------+-----------+-----------------------+--------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------+----------+-----------+-----------------------+--------------------------------+
| TableReader_8 | 10.00 | root | | data:Selection_7 |
| └─Selection_7 | 10.00 | cop[tikv] | | eq(test.t.x, 3) |
| └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t, partition:p0 | keep order:false, stats:pseudo |
+-------------------------+----------+-----------+-----------------------+--------------------------------+
分区裁剪也适用于使用 in
查询条件的等值比较。例如:
create table t (x int) partition by range (x) (
partition p0 values less than (5),
partition p1 values less than (10),
partition p2 values less than (15)
);
explain select * from t where x in(1,13);
+-----------------------------+----------+-----------+-----------------------+--------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+----------+-----------+-----------------------+--------------------------------+
| Union_8 | 40.00 | root | | |
| ├─TableReader_11 | 20.00 | root | | data:Selection_10 |
| │ └─Selection_10 | 20.00 | cop[tikv] | | in(test.t.x, 1, 13) |
| │ └─TableFullScan_9 | 10000.00 | cop[tikv] | table:t, partition:p0 | keep order:false, stats:pseudo |
| └─TableReader_14 | 20.00 | root | | data:Selection_13 |
| └─Selection_13 | 20.00 | cop[tikv] | | in(test.t.x, 1, 13) |
| └─TableFullScan_12 | 10000.00 | cop[tikv] | table:t, partition:p2 | keep order:false, stats:pseudo |
+-----------------------------+----------+-----------+-----------------------+--------------------------------+
在上述 SQL 中,从 x in(1,13)
条件可以看出,所有结果只落在少数几个分区内。经过分析,发现 x=1
的所有记录都在 p0
分区,x=13
的所有记录都在 p2
分区,因此只需访问 p0
和 p2
分区。
场景二
分区裁剪适用于区间比较的查询条件,例如 between
、>
、<
、=
、>=
、<=
。例如:
create table t (x int) partition by range (x) (
partition p0 values less than (5),
partition p1 values less than (10),
partition p2 values less than (15)
);
explain select * from t where x between 7 and 14;
+-----------------------------+----------+-----------+-----------------------+-----------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+----------+-----------+-----------------------+-----------------------------------+
| Union_8 | 500.00 | root | | |
| ├─TableReader_11 | 250.00 | root | | data:Selection_10 |
| │ └─Selection_10 | 250.00 | cop[tikv] | | ge(test.t.x, 7), le(test.t.x, 14) |
| │ └─TableFullScan_9 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo |
| └─TableReader_14 | 250.00 | root | | data:Selection_13 |
| └─Selection_13 | 250.00 | cop[tikv] | | ge(test.t.x, 7), le(test.t.x, 14) |
| └─TableFullScan_12 | 10000.00 | cop[tikv] | table:t, partition:p2 | keep order:false, stats:pseudo |
+-----------------------------+----------+-----------+-----------------------+--------------------------------+
场景三
分区裁剪适用于分区表达式为 fn(col)
的简单形式,查询条件为 >、<、=、>=、<=
,且 fn
函数为单调函数。
如果 fn
为单调函数,则对于任何 x
和 y
,如果 x > y
,则 fn(x) > fn(y)
。在这种情况下,fn
可以被视为严格单调的。对于任何 x
和 y
,如果 x > y
,则 fn(x) >= fn(y)
。此时,fn
也可以称为“单调函数”。理论上,所有单调函数(严格或非严格)都支持分区裁剪。目前,TiDB 仅支持以下单调函数:
UNIX_TIMESTAMP()
TO_DAYS()
EXTRACT(<time unit> FROM <DATETIME/DATE/TIME column>)
。对于DATE
和DATETIME
列,YEAR
和YEAR_MONTH
时间单位被视为单调函数。对于TIME
列,HOUR
、HOUR_MINUTE
、HOUR_SECOND
和HOUR_MICROSECOND
被视为单调函数。注意,WEEK
不支持作为EXTRACT
的时间单位用于分区裁剪。
例如,当分区表达式为 fn(col)
,且 fn
为单调函数 to_days
时,分区裁剪会生效:
create table t (id datetime) partition by range (to_days(id)) (
partition p0 values less than (to_days('2020-04-01')),
partition p1 values less than (to_days('2020-05-01')));
explain select * from t where id > '2020-04-18';
+-------------------------+----------+-----------+-----------------------+-------------------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------+----------+-----------+-----------------------+-------------------------------------------+
| TableReader_8 | 3333.33 | root | | data:Selection_7 |
| └─Selection_7 | 3333.33 | cop[tikv] | | gt(test.t.id, 2020-04-18 00:00:00.000000) |
| └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo |
+-------------------------+----------+-----------+-----------------------+-------------------------------------------+
不适用场景:Range 分区表
由于分区裁剪的规则优化是在查询计划生成阶段进行的,因此不适用于过滤条件只能在执行阶段获得的场景。例如:
create table t1 (x int) partition by range (x) (
partition p0 values less than (5),
partition p1 values less than (10));
create table t2 (x int);
explain select * from t2 where x < (select * from t1 where t2.x < t1.x and t2.x < 2);
+--------------------------------------+----------+-----------+------------------------+-----------------------------------------------------------+
| id | estRows | task | access object | operator info |
+--------------------------------------+----------+-----------+------------------------+-----------------------------------------------------------+
| Projection_13 | 9990.00 | root | | test.t2.x |
| └─Apply_15 | 9990.00 | root | | CARTESIAN inner join, other cond:lt(test.t2.x, test.t1.x) |
| ├─TableReader_18(Build) | 9990.00 | root | | data:Selection_17 |
| │ └─Selection_17 | 9990.00 | cop[tikv] | | not(isnull(test.t2.x)) |
| │ └─TableFullScan_16 | 10000.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo |
| └─Selection_19(Probe) | 0.80 | root | | not(isnull(test.t1.x)) |
| └─MaxOneRow_20 | 1.00 | root | | |
| └─Union_21 | 2.00 | root | | |
| ├─TableReader_24 | 2.00 | root | | data:Selection_23 |
| │ └─Selection_23 | 2.00 | cop[tikv] | | lt(test.t2.x, 2), lt(test.t2.x, test.t1.x) |
| │ └─TableFullScan_22 | 2.50 | cop[tikv] | table:t1, partition:p0 | keep order:false, stats:pseudo |
| └─TableReader_27 | 2.00 | root | | data:Selection_26 |
| └─Selection_26 | 2.00 | cop[tikv] | | lt(test.t2.x, 2), lt(test.t1.x, test.t2.x) |
| └─TableFullScan_25 | 2.50 | cop[tikv] | table:t1, partition:p1 | keep order:false, stats:pseudo |
+--------------------------------------+----------+-----------+------------------------+-----------------------------------------------------------+
14 rows in set (0.00 sec)
每次读取 t2
的一行数据时,都需要对 t1
的分区表进行查询。理论上,t1.x > val
的过滤条件在此时已满足,但实际上,分区裁剪只在查询计划的生成阶段生效,而非在执行阶段。