📣

TiDB Cloud Serverless 现已更名为
Starter
!此页面由 AI 自动翻译,英文原文请见
此处。

Partition Pruning

Partition pruning 是一种适用于分区表的性能优化技术。它会分析查询语句中的过滤条件,在不包含任何所需数据的分区上进行(裁剪),从而排除掉这些分区。通过排除不必要的分区,TiDB 能够减少需要访问的数据量,从而可能显著提升查询执行时间。

以下是一个示例:

CREATE TABLE t1 ( id INT NOT NULL PRIMARY KEY, pad VARCHAR(100) ) PARTITION BY RANGE COLUMNS(id) ( PARTITION p0 VALUES LESS THAN (100), PARTITION p1 VALUES LESS THAN (200), PARTITION p2 VALUES LESS THAN (MAXVALUE) ); INSERT INTO t1 VALUES (1, 'test1'),(101, 'test2'), (201, 'test3'); EXPLAIN SELECT * FROM t1 WHERE id BETWEEN 80 AND 120;
+----------------------------+---------+-----------+------------------------+------------------------------------------------+ | id | estRows | task | access object | operator info | +----------------------------+---------+-----------+------------------------+------------------------------------------------+ | PartitionUnion_8 | 80.00 | root | | | | ├─TableReader_10 | 40.00 | root | | data:TableRangeScan_9 | | │ └─TableRangeScan_9 | 40.00 | cop[tikv] | table:t1, partition:p0 | range:[80,120], keep order:false, stats:pseudo | | └─TableReader_12 | 40.00 | root | | data:TableRangeScan_11 | | └─TableRangeScan_11 | 40.00 | cop[tikv] | table:t1, partition:p1 | range:[80,120], keep order:false, stats:pseudo | +----------------------------+---------+-----------+------------------------+------------------------------------------------+ 5 rows in set (0.00 sec)

使用场景:分区裁剪

分区裁剪的使用场景对于 Range 分区表和 Hash 分区表有所不同。

在 Hash 分区表中使用分区裁剪

本节描述 Hash 分区表中分区裁剪的适用和不适用场景。

适用场景

在 Hash 分区表中,分区裁剪仅适用于等值比较的查询条件。

create table t (x int) partition by hash(x) partitions 4; explain select * from t where x = 1;
+-------------------------+----------+-----------+-----------------------+--------------------------------+ | id | estRows | task | access object | operator info | +-------------------------+----------+-----------+-----------------------+--------------------------------+ | TableReader_8 | 10.00 | root | | data:Selection_7 | | └─Selection_7 | 10.00 | cop[tikv] | | eq(test.t.x, 1) | | └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo | +-------------------------+----------+-----------+-----------------------+--------------------------------+

从上述 SQL 语句可以看出,根据条件 x = 1,所有结果都落在一个分区中。经过哈希分区后,可以确认值 1p1 分区内。因此,只需要扫描 p1 分区,无需访问不会匹配的 p2p3p4 分区。从执行计划可以看到,只有一个 TableFullScan 操作,并且在 access object 中指定了 p1 分区,因此可以确认分区裁剪已生效。

不适用场景

本节描述两个 Hash 分区表中分区裁剪不适用的场景。

场景一

如果不能确认查询结果只落在一个分区(例如 inbetween><>=<=),则不能使用分区裁剪优化。例如:

create table t (x int) partition by hash(x) partitions 4; explain select * from t where x > 2;
+------------------------------+----------+-----------+-----------------------+--------------------------------+ | id | estRows | task | access object | operator info | +------------------------------+----------+-----------+-----------------------+--------------------------------+ | Union_10 | 13333.33 | root | | | | ├─TableReader_13 | 3333.33 | root | | data:Selection_12 | | │ └─Selection_12 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) | | │ └─TableFullScan_11 | 10000.00 | cop[tikv] | table:t, partition:p0 | keep order:false, stats:pseudo | | ├─TableReader_16 | 3333.33 | root | | data:Selection_15 | | │ └─Selection_15 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) | | │ └─TableFullScan_14 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo | | ├─TableReader_19 | 3333.33 | root | | data:Selection_18 | | │ └─Selection_18 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) | | │ └─TableFullScan_17 | 10000.00 | cop[tikv] | table:t, partition:p2 | keep order:false, stats:pseudo | | └─TableReader_22 | 3333.33 | root | | data:Selection_21 | | └─Selection_21 | 3333.33 | cop[tikv] | | gt(test.t.x, 2) | | └─TableFullScan_20 | 10000.00 | cop[tikv] | table:t, partition:p3 | keep order:false, stats:pseudo | +------------------------------+----------+-----------+-----------------------+--------------------------------+

在此情况下,分区裁剪不适用,因为无法确认对应的 Hash 分区满足 x > 2 条件。

场景二

由于分区裁剪的规则优化是在查询计划生成阶段进行的,因此不适用于过滤条件只能在执行阶段获得的场景。例如:

create table t (x int) partition by hash(x) partitions 4; explain select * from t2 where x = (select * from t1 where t2.x = t1.x and t2.x < 2);
+--------------------------------------+----------+-----------+------------------------+----------------------------------------------+ | id | estRows | task | access object | operator info | +--------------------------------------+----------+-----------+------------------------+----------------------------------------------+ | Projection_13 | 9990.00 | root | | test.t2.x | | └─Apply_15 | 9990.00 | root | | inner join, equal:[eq(test.t2.x, test.t1.x)] | | ├─TableReader_18(Build) | 9990.00 | root | | data:Selection_17 | | │ └─Selection_17 | 9990.00 | cop[tikv] | | not(isnull(test.t2.x)) | | │ └─TableFullScan_16 | 10000.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo | | └─Selection_19(Probe) | 0.80 | root | | not(isnull(test.t1.x)) | | └─MaxOneRow_20 | 1.00 | root | | | | └─Union_21 | 2.00 | root | | | | ├─TableReader_24 | 2.00 | root | | data:Selection_23 | | │ └─Selection_23 | 2.00 | cop[tikv] | | eq(test.t2.x, test.t1.x), lt(test.t2.x, 2) | | │ └─TableFullScan_22 | 2500.00 | cop[tikv] | table:t1, partition:p0 | keep order:false, stats:pseudo | | └─TableReader_27 | 2.00 | root | | data:Selection_26 | | └─Selection_26 | 2.00 | cop[tikv] | | eq(test.t2.x, test.t1.x), lt(test.t2.x, 2) | | └─TableFullScan_25 | 2500.00 | cop[tikv] | table:t1, partition:p1 | keep order:false, stats:pseudo | +--------------------------------------+----------+-----------+------------------------+----------------------------------------------+

每次读取 t2 的一行数据时,都需要对 t1 的分区表进行查询。理论上,t1.x > val 的过滤条件在此时已满足,但实际上,分区裁剪只在查询计划的生成阶段生效,而非在执行阶段。

在 Range 分区表中使用分区裁剪

本节描述 Range 分区表中分区裁剪的适用和不适用场景。

适用场景

Range 分区表中,分区裁剪适用于以下三种场景。

场景一

分区裁剪适用于 Range 分区表中的等值比较查询条件。例如:

create table t (x int) partition by range (x) ( partition p0 values less than (5), partition p1 values less than (10), partition p2 values less than (15) ); explain select * from t where x = 3;
+-------------------------+----------+-----------+-----------------------+--------------------------------+ | id | estRows | task | access object | operator info | +-------------------------+----------+-----------+-----------------------+--------------------------------+ | TableReader_8 | 10.00 | root | | data:Selection_7 | | └─Selection_7 | 10.00 | cop[tikv] | | eq(test.t.x, 3) | | └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t, partition:p0 | keep order:false, stats:pseudo | +-------------------------+----------+-----------+-----------------------+--------------------------------+

分区裁剪也适用于使用 in 查询条件的等值比较。例如:

create table t (x int) partition by range (x) ( partition p0 values less than (5), partition p1 values less than (10), partition p2 values less than (15) ); explain select * from t where x in(1,13);
+-----------------------------+----------+-----------+-----------------------+--------------------------------+ | id | estRows | task | access object | operator info | +-----------------------------+----------+-----------+-----------------------+--------------------------------+ | Union_8 | 40.00 | root | | | | ├─TableReader_11 | 20.00 | root | | data:Selection_10 | | │ └─Selection_10 | 20.00 | cop[tikv] | | in(test.t.x, 1, 13) | | │ └─TableFullScan_9 | 10000.00 | cop[tikv] | table:t, partition:p0 | keep order:false, stats:pseudo | | └─TableReader_14 | 20.00 | root | | data:Selection_13 | | └─Selection_13 | 20.00 | cop[tikv] | | in(test.t.x, 1, 13) | | └─TableFullScan_12 | 10000.00 | cop[tikv] | table:t, partition:p2 | keep order:false, stats:pseudo | +-----------------------------+----------+-----------+-----------------------+--------------------------------+

在上述 SQL 中,从 x in(1,13) 条件可以看出,所有结果只落在少数几个分区内。经过分析,发现 x=1 的所有记录都在 p0 分区,x=13 的所有记录都在 p2 分区,因此只需访问 p0p2 分区。

场景二

分区裁剪适用于区间比较的查询条件,例如 between><=>=<=。例如:

create table t (x int) partition by range (x) ( partition p0 values less than (5), partition p1 values less than (10), partition p2 values less than (15) ); explain select * from t where x between 7 and 14;
+-----------------------------+----------+-----------+-----------------------+-----------------------------------+ | id | estRows | task | access object | operator info | +-----------------------------+----------+-----------+-----------------------+-----------------------------------+ | Union_8 | 500.00 | root | | | | ├─TableReader_11 | 250.00 | root | | data:Selection_10 | | │ └─Selection_10 | 250.00 | cop[tikv] | | ge(test.t.x, 7), le(test.t.x, 14) | | │ └─TableFullScan_9 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo | | └─TableReader_14 | 250.00 | root | | data:Selection_13 | | └─Selection_13 | 250.00 | cop[tikv] | | ge(test.t.x, 7), le(test.t.x, 14) | | └─TableFullScan_12 | 10000.00 | cop[tikv] | table:t, partition:p2 | keep order:false, stats:pseudo | +-----------------------------+----------+-----------+-----------------------+--------------------------------+
场景三

分区裁剪适用于分区表达式为 fn(col) 的简单形式,查询条件为 >、<、=、>=、<=,且 fn 函数为单调函数。

如果 fn 为单调函数,则对于任何 xy,如果 x > y,则 fn(x) > fn(y)。在这种情况下,fn 可以被视为严格单调的。对于任何 xy,如果 x > y,则 fn(x) >= fn(y)。此时,fn 也可以称为“单调函数”。理论上,所有单调函数(严格或非严格)都支持分区裁剪。目前,TiDB 仅支持以下单调函数:

例如,当分区表达式为 fn(col),且 fn 为单调函数 to_days 时,分区裁剪会生效:

create table t (id datetime) partition by range (to_days(id)) ( partition p0 values less than (to_days('2020-04-01')), partition p1 values less than (to_days('2020-05-01'))); explain select * from t where id > '2020-04-18';
+-------------------------+----------+-----------+-----------------------+-------------------------------------------+ | id | estRows | task | access object | operator info | +-------------------------+----------+-----------+-----------------------+-------------------------------------------+ | TableReader_8 | 3333.33 | root | | data:Selection_7 | | └─Selection_7 | 3333.33 | cop[tikv] | | gt(test.t.id, 2020-04-18 00:00:00.000000) | | └─TableFullScan_6 | 10000.00 | cop[tikv] | table:t, partition:p1 | keep order:false, stats:pseudo | +-------------------------+----------+-----------+-----------------------+-------------------------------------------+

不适用场景:Range 分区表

由于分区裁剪的规则优化是在查询计划生成阶段进行的,因此不适用于过滤条件只能在执行阶段获得的场景。例如:

create table t1 (x int) partition by range (x) ( partition p0 values less than (5), partition p1 values less than (10)); create table t2 (x int); explain select * from t2 where x < (select * from t1 where t2.x < t1.x and t2.x < 2);
+--------------------------------------+----------+-----------+------------------------+-----------------------------------------------------------+ | id | estRows | task | access object | operator info | +--------------------------------------+----------+-----------+------------------------+-----------------------------------------------------------+ | Projection_13 | 9990.00 | root | | test.t2.x | | └─Apply_15 | 9990.00 | root | | CARTESIAN inner join, other cond:lt(test.t2.x, test.t1.x) | | ├─TableReader_18(Build) | 9990.00 | root | | data:Selection_17 | | │ └─Selection_17 | 9990.00 | cop[tikv] | | not(isnull(test.t2.x)) | | │ └─TableFullScan_16 | 10000.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo | | └─Selection_19(Probe) | 0.80 | root | | not(isnull(test.t1.x)) | | └─MaxOneRow_20 | 1.00 | root | | | | └─Union_21 | 2.00 | root | | | | ├─TableReader_24 | 2.00 | root | | data:Selection_23 | | │ └─Selection_23 | 2.00 | cop[tikv] | | lt(test.t2.x, 2), lt(test.t2.x, test.t1.x) | | │ └─TableFullScan_22 | 2.50 | cop[tikv] | table:t1, partition:p0 | keep order:false, stats:pseudo | | └─TableReader_27 | 2.00 | root | | data:Selection_26 | | └─Selection_26 | 2.00 | cop[tikv] | | lt(test.t2.x, 2), lt(test.t1.x, test.t2.x) | | └─TableFullScan_25 | 2.50 | cop[tikv] | table:t1, partition:p1 | keep order:false, stats:pseudo | +--------------------------------------+----------+-----------+------------------------+-----------------------------------------------------------+ 14 rows in set (0.00 sec)

每次读取 t2 的一行数据时,都需要对 t1 的分区表进行查询。理论上,t1.x > val 的过滤条件在此时已满足,但实际上,分区裁剪只在查询计划的生成阶段生效,而非在执行阶段。

文档内容是否有帮助?