从 Window Functions 派生 TopN 或 Limit
Window Functions 是一种常见的 SQL 函数类型。当你使用 window function 进行行编号,例如 ROW_NUMBER()
或 RANK()
时,通常会在 window function 计算完成后对结果进行过滤。例如:
SELECT * FROM (SELECT ROW_NUMBER() OVER (ORDER BY a) AS rownumber FROM t) dt WHERE rownumber <= 3
在典型的 SQL 执行过程中,TiDB 首先对表 t
中的所有数据进行排序,然后为每一行计算 ROW_NUMBER()
结果,最后再进行 rownumber <= 3
的过滤。
从 v7.0.0 版本开始,TiDB 支持将从 window functions 派生的 TopN 或 Limit 操作符进行优化。借助此规则,TiDB 可以将原始 SQL 重写为等价的形式,例如:
WITH t_topN AS (SELECT a FROM t1 ORDER BY a LIMIT 3) SELECT * FROM (SELECT ROW_NUMBER() OVER (ORDER BY a) AS rownumber FROM t_topN) dt WHERE rownumber <= 3
重写后,TiDB 可以从 window function 和后续的过滤条件中推导出 TopN 操作符。与原始 SQL 中的 Sort 操作(ORDER BY
)相比,TopN 操作符具有更高的执行效率。此外,TiKV 和 TiFlash 也支持将 TopN 操作下推,从而进一步提升重写 SQL 的性能。
从 window functions 派生 TopN 或 Limit 默认是关闭的。你可以通过将会话变量 tidb_opt_derive_topn 设置为 ON
来启用此功能。
启用后,你可以通过以下操作之一禁用该功能:
- 将会话变量 tidb_opt_derive_topn 设置为
OFF
。 - 按照 The blocklist of optimization rules and expression pushdown 中的步骤操作。
限制条件
- 目前仅支持对
ROW_NUMBER()
window function 进行 SQL 重写。 - TiDB 仅在过滤条件为
<
或<=
时,才能将 SQL 重写为包含推导的 TopN。
使用示例
以下示例演示如何使用该优化规则。
不含 PARTITION BY 的 window functions
示例 1:不含 ORDER BY 的 window functions
CREATE TABLE t(id int, value int);
SET tidb_opt_derive_topn=on;
EXPLAIN SELECT * FROM (SELECT ROW_NUMBER() OVER () AS rownumber FROM t) dt WHERE rownumber <= 3;
结果如下:
+----------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+----------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------+
| Projection_9 | 2.40 | root | | Column#5 |
| └─Selection_10 | 2.40 | root | | le(Column#5, 3) |
| └─Window_11 | 3.00 | root | | row_number()->Column#5 over(rows between current row and current row) |
| └─Limit_15 | 3.00 | root | | offset:0, count:3 |
| └─TableReader_26 | 3.00 | root | | data:Limit_25 |
| └─Limit_25 | 3.00 | cop[tikv] | | offset:0, count:3 |
| └─TableFullScan_24 | 3.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+----------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------+
在此查询中,优化器从 window function 派生出 Limit 操作符,并将其下推到 TiKV。
示例 2:含 ORDER BY 的 window functions
CREATE TABLE t(id int, value int);
SET tidb_opt_derive_topn=on;
EXPLAIN SELECT * FROM (SELECT ROW_NUMBER() OVER (ORDER BY value) AS rownumber FROM t) dt WHERE rownumber <= 3;
结果如下:
+----------------------------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+----------------------------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------+
| Projection_10 | 2.40 | root | | Column#5 |
| └─Selection_11 | 2.40 | root | | le(Column#5, 3) |
| └─Window_12 | 3.00 | root | | row_number()->Column#5 over(order by test.t.value rows between current row and current row) |
| └─TopN_13 | 3.00 | root | | test.t.value, offset:0, count:3 |
| └─TableReader_25 | 3.00 | root | | data:TopN_24 |
| └─TopN_24 | 3.00 | cop[tikv] | | test.t.value, offset:0, count:3 |
| └─TableFullScan_23 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+----------------------------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------+
在此查询中,优化器从 window function 派生出 TopN 操作符,并将其下推到 TiKV。
含 PARTITION BY 的 window functions
示例 3:不含 ORDER BY 的 window functions
CREATE TABLE t(id1 int, id2 int, value1 int, value2 int, primary key(id1,id2) clustered);
SET tidb_opt_derive_topn=on;
EXPLAIN SELECT * FROM (SELECT ROW_NUMBER() OVER (PARTITION BY id1) AS rownumber FROM t) dt WHERE rownumber <= 3;
结果如下:
+------------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------------------------------+
| Projection_10 | 2.40 | root | | Column#6 |
| └─Selection_11 | 2.40 | root | | le(Column#6, 3) |
| └─Shuffle_26 | 3.00 | root | | execution info: concurrency:2, data sources:[TableReader_24] |
| └─Window_12 | 3.00 | root | | row_number()->Column#6 over(partition by test.t.id1 rows between current row and current row) |
| └─Sort_25 | 3.00 | root | | test.t.id1 |
| └─TableReader_24 | 3.00 | root | | data:Limit_23 |
| └─Limit_23 | 3.00 | cop[tikv] | | partition by test.t.id1, offset:0, count:3 |
| └─TableFullScan_22 | 3.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+------------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------------------------------+
在此查询中,优化器从 window function 派生出 Limit 操作符,并将其下推到 TiKV。注意,此 Limit 实际上是分区 Limit,即对每个具有相同 id1
值的分组应用 Limit。
示例 4:含 ORDER BY 的 window functions
CREATE TABLE t(id1 int, id2 int, value1 int, value2 int, primary key(id1,id2) clustered);
SET tidb_opt_derive_topn=on;
EXPLAIN SELECT * FROM (SELECT ROW_NUMBER() OVER (PARTITION BY id1 ORDER BY value1) AS rownumber FROM t) dt WHERE rownumber <= 3;
结果如下:
+------------------------------------+----------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------------+----------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------+
| Projection_10 | 2.40 | root | | Column#6 |
| └─Selection_11 | 2.40 | root | | le(Column#6, 3) |
| └─Shuffle_23 | 3.00 | root | | execution info: concurrency:3, data sources:[TableReader_21] |
| └─Window_12 | 3.00 | root | | row_number()->Column#6 over(partition by test.t.id1 order by test.t.value1 rows between current row and current row) |
| └─Sort_22 | 3.00 | root | | test.t.id1, test.t.value1 |
| └─TableReader_21 | 3.00 | root | | data:TopN_19 |
| └─TopN_19 | 3.00 | cop[tikv] | | partition by test.t.id1 order by test.t.value1, offset:0, count:3 |
| └─TableFullScan_18 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+------------------------------------+----------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------+
在此查询中,优化器从 window function 派生出 TopN 操作符,并将其下推到 TiKV。注意,此 TopN 实际上是分区 TopN,即对每个具有相同 id1
值的分组应用。
示例 5:PARTITION BY 列不是主键的前缀
CREATE TABLE t(id1 int, id2 int, value1 int, value2 int, primary key(id1,id2) clustered);
SET tidb_opt_derive_topn=on;
EXPLAIN SELECT * FROM (SELECT ROW_NUMBER() OVER (PARTITION BY value1) AS rownumber FROM t) dt WHERE rownumber <= 3;
结果如下:
+----------------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+----------------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------------------+
| Projection_9 | 8000.00 | root | | Column#6 |
| └─Selection_10 | 8000.00 | root | | le(Column#6, 3) |
| └─Shuffle_15 | 10000.00 | root | | execution info: concurrency:5, data sources:[TableReader_13] |
| └─Window_11 | 10000.00 | root | | row_number()->Column#6 over(partition by test.t.value1 rows between current row and current row) |
| └─Sort_14 | 10000.00 | root | | test.t.value1 |
| └─TableReader_13 | 10000.00 | root | | data:TableFullScan_12 |
| └─TableFullScan_12 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+----------------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------------------+
在此查询中,由于 PARTITION BY
列不是主键的前缀,SQL 不会被重写。
示例 6:PARTITION BY 列是主键的前缀但不是聚簇索引
CREATE TABLE t(id1 int, id2 int, value1 int, value2 int, primary key(id1,id2) nonclustered);
SET tidb_opt_derive_topn=on;
EXPLAIN SELECT * FROM (SELECT ROW_NUMBER() OVER (PARTITION BY id1) AS rownumber FROM t use index()) dt WHERE rownumber <= 3;
结果如下:
+----------------------------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+----------------------------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------+
| Projection_9 | 8000.00 | root | | Column#7 |
| └─Selection_10 | 8000.00 | root | | le(Column#7, 3) |
| └─Shuffle_15 | 10000.00 | root | | execution info: concurrency:5, data sources:[TableReader_13] |
| └─Window_11 | 10000.00 | root | | row_number()->Column#7 over(partition by test.t.id1 rows between current row and current row) |
| └─Sort_14 | 10000.00 | root | | test.t.id1 |
| └─TableReader_13 | 10000.00 | root | | data:TableFullScan_12 |
| └─TableFullScan_12 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+----------------------------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------+
在此查询中,虽然 PARTITION BY
列是主键的前缀,但由于主键不是聚簇索引,SQL 不会被重写。