TiFlash 支持的下推计算
本文介绍 TiFlash 支持的下推计算。
下推运算符
TiFlash 支持下推以下运算符:
- TableScan:从表中读取数据。
- Selection:过滤数据。
- HashAgg:基于 Hash 聚合 算法进行数据聚合。
- StreamAgg:基于 流式聚合 算法进行数据聚合。StreamAgg 仅支持无
GROUP BY条件的聚合。 - TopN:执行 TopN 计算。
- Limit:执行 limit 计算。
- Project:执行投影计算。
- HashJoin:使用 Hash Join 算法进行 join 计算,但需满足以下条件:
- 该运算符仅能在 MPP 模式 下下推。
- 支持的 join 类型包括 Inner Join、Left Join、Semi Join、Anti Semi Join、Left Semi Join 和 Anti Left Semi Join。
- 上述 join 类型均支持等值 join(Equi Join)和非等值 join(Non-Equi Join,笛卡尔 Join 或 Null-aware Semi Join)。在计算笛卡尔 Join 或 Null-aware Semi Join 时,采用 Broadcast 算法,而非 Shuffle Hash Join 算法。
- 窗口函数:目前,TiFlash 支持
ROW_NUMBER()、RANK()、DENSE_RANK()、LEAD()、LAG()、FIRST_VALUE()和LAST_VALUE()。
在 TiDB 中,运算符以树状结构组织。要将某个运算符下推到 TiFlash,需同时满足以下前提条件:
- 其所有子运算符均可下推到 TiFlash。
- 若运算符包含表达式(大多数运算符都包含表达式),则该运算符的所有表达式均可下推到 TiFlash。
下推表达式
TiFlash 支持下推以下表达式:
限制
包含 Bit、Set 和 Geometry 类型的表达式无法下推到 TiFlash。
DATE_ADD()、DATE_SUB()、ADDDATE()和SUBDATE()函数仅支持以下 interval 类型。如果使用了其他 interval 类型,TiFlash 会报错。- DAY
- WEEK
- MONTH
- YEAR
- HOUR
- MINUTE
- SECOND
如果查询中存在不支持下推的计算,TiDB 需要完成剩余的计算,这可能会极大影响 TiFlash 的加速效果。目前不支持的运算符和表达式在未来版本中可能会被支持。
如 MAX() 等函数,作为聚合函数时支持下推,但作为窗口函数时不支持下推。
示例
本节提供了一些将运算符和表达式下推到 TiFlash 的示例。
示例 1:将运算符下推到 TiFlash
CREATE TABLE t(id INT PRIMARY KEY, a INT);
ALTER TABLE t SET TIFLASH REPLICA 1;
EXPLAIN SELECT * FROM t LIMIT 3;
+------------------------------+---------+--------------+---------------+--------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------+---------+--------------+---------------+--------------------------------+
| Limit_9 | 3.00 | root | | offset:0, count:3 |
| └─TableReader_17 | 3.00 | root | | data:ExchangeSender_16 |
| └─ExchangeSender_16 | 3.00 | mpp[tiflash] | | ExchangeType: PassThrough |
| └─Limit_15 | 3.00 | mpp[tiflash] | | offset:0, count:3 |
| └─TableFullScan_14 | 3.00 | mpp[tiflash] | table:t | keep order:false, stats:pseudo |
+------------------------------+---------+--------------+---------------+--------------------------------+
5 rows in set (0.18 sec)
在上述示例中,Limit 运算符被下推到 TiFlash 进行数据过滤,有助于减少需要传输的数据量,降低网络开销。可以通过 Limit_15 运算符所在行的 task 列值为 mpp[tiflash] 进行识别。
示例 2:将表达式下推到 TiFlash
CREATE TABLE t(id INT PRIMARY KEY, a INT);
ALTER TABLE t SET TIFLASH REPLICA 1;
INSERT INTO t(id,a) VALUES (1,2),(2,4),(11,2),(12,4),(13,4),(14,7);
EXPLAIN SELECT MAX(id + a) FROM t GROUP BY a;
+------------------------------------+---------+--------------+---------------+---------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------------+---------+--------------+---------------+---------------------------------------------------------------------------+
| TableReader_45 | 4.80 | root | | data:ExchangeSender_44 |
| └─ExchangeSender_44 | 4.80 | mpp[tiflash] | | ExchangeType: PassThrough |
| └─Projection_39 | 4.80 | mpp[tiflash] | | Column#3 |
| └─HashAgg_37 | 4.80 | mpp[tiflash] | | group by:Column#9, funcs:max(Column#8)->Column#3 |
| └─Projection_46 | 6.00 | mpp[tiflash] | | plus(test.t.id, test.t.a)->Column#8, test.t.a |
| └─ExchangeReceiver_23 | 6.00 | mpp[tiflash] | | |
| └─ExchangeSender_22 | 6.00 | mpp[tiflash] | | ExchangeType: HashPartition, Hash Cols: [name: test.t.a, collate: binary] |
| └─TableFullScan_21 | 6.00 | mpp[tiflash] | table:t | keep order:false, stats:pseudo |
+------------------------------------+---------+--------------+---------------+---------------------------------------------------------------------------+
8 rows in set (0.18 sec)
在上述示例中,表达式 id + a 被下推到 TiFlash 进行提前计算,有助于减少需要传输的数据量,从而降低网络传输开销并提升整体计算性能。可以通过 operator 列为 plus(test.t.id, test.t.a) 的行,其 task 列值为 mpp[tiflash] 进行识别。
示例 3:下推限制
CREATE TABLE t(id INT PRIMARY KEY, a INT);
ALTER TABLE t SET TIFLASH REPLICA 1;
INSERT INTO t(id,a) VALUES (1,2),(2,4),(11,2),(12,4),(13,4),(14,7);
EXPLAIN SELECT id FROM t WHERE TIME(now()+ a) < '12:00:00';
+-----------------------------+---------+--------------+---------------+--------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+---------+--------------+---------------+--------------------------------------------------------------------------------------------------+
| Projection_4 | 4.80 | root | | test.t.id |
| └─Selection_6 | 4.80 | root | | lt(cast(time(cast(plus(20230110083056, test.t.a), var_string(20))), var_string(10)), "12:00:00") |
| └─TableReader_11 | 6.00 | root | | data:ExchangeSender_10 |
| └─ExchangeSender_10 | 6.00 | mpp[tiflash] | | ExchangeType: PassThrough |
| └─TableFullScan_9 | 6.00 | mpp[tiflash] | table:t | keep order:false, stats:pseudo |
+-----------------------------+---------+--------------+---------------+--------------------------------------------------------------------------------------------------+
5 rows in set, 3 warnings (0.20 sec)
上述示例仅在 TiFlash 上执行了 TableFullScan,其他函数的计算和过滤均在 root 层完成,未下推到 TiFlash。
你可以通过以下命令识别无法下推到 TiFlash 的运算符和表达式:
SHOW WARNINGS;
+---------+------+------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1105 | Scalar function 'time'(signature: Time, return type: time) is not supported to push down to storage layer now. |
| Warning | 1105 | Scalar function 'cast'(signature: CastDurationAsString, return type: var_string(10)) is not supported to push down to tiflash now. |
| Warning | 1105 | Scalar function 'cast'(signature: CastDurationAsString, return type: var_string(10)) is not supported to push down to tiflash now. |
+---------+------+------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.18 sec)
上述示例中的表达式无法完全下推到 TiFlash,是因为 Time 和 Cast 函数无法下推到 TiFlash。
示例 4:窗口函数
CREATE TABLE t(id INT PRIMARY KEY, c1 VARCHAR(100));
ALTER TABLE t SET TIFLASH REPLICA 1;
INSERT INTO t VALUES(1,"foo"),(2,"bar"),(3,"bar foo"),(10,"foo"),(20,"bar"),(30,"bar foo");
EXPLAIN SELECT id, ROW_NUMBER() OVER (PARTITION BY id > 10) FROM t;
+----------------------------------+----------+--------------+---------------+---------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+----------------------------------+----------+--------------+---------------+---------------------------------------------------------------------------------------------------------------+
| TableReader_30 | 10000.00 | root | | MppVersion: 1, data:ExchangeSender_29 |
| └─ExchangeSender_29 | 10000.00 | mpp[tiflash] | | ExchangeType: PassThrough |
| └─Projection_7 | 10000.00 | mpp[tiflash] | | test.t.id, Column#5, stream_count: 4 |
| └─Window_28 | 10000.00 | mpp[tiflash] | | row_number()->Column#5 over(partition by Column#4 rows between current row and current row), stream_count: 4 |
| └─Sort_14 | 10000.00 | mpp[tiflash] | | Column#4, stream_count: 4 |
| └─ExchangeReceiver_13 | 10000.00 | mpp[tiflash] | | stream_count: 4 |
| └─ExchangeSender_12 | 10000.00 | mpp[tiflash] | | ExchangeType: HashPartition, Compression: FAST, Hash Cols: [name: Column#4, collate: binary], stream_count: 4 |
| └─Projection_10 | 10000.00 | mpp[tiflash] | | test.t.id, gt(test.t.id, 10)->Column#4 |
| └─TableFullScan_11 | 10000.00 | mpp[tiflash] | table:t | keep order:false, stats:pseudo |
+----------------------------------+----------+--------------+---------------+---------------------------------------------------------------------------------------------------------------+
9 rows in set (0.0073 sec)
在该输出中,可以看到 Window 操作的 task 列值为 mpp[tiflash],说明 ROW_NUMBER() OVER (PARTITION BY id > 10) 操作可以下推到 TiFlash。
CREATE TABLE t(id INT PRIMARY KEY, c1 VARCHAR(100));
ALTER TABLE t SET TIFLASH REPLICA 1;
INSERT INTO t VALUES(1,"foo"),(2,"bar"),(3,"bar foo"),(10,"foo"),(20,"bar"),(30,"bar foo");
EXPLAIN SELECT id, MAX(id) OVER (PARTITION BY id > 10) FROM t;
+-----------------------------+----------+-----------+---------------+------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+----------+-----------+---------------+------------------------------------------------------------+
| Projection_6 | 10000.00 | root | | test.t1.id, Column#5 |
| └─Shuffle_14 | 10000.00 | root | | execution info: concurrency:5, data sources:[Projection_8] |
| └─Window_7 | 10000.00 | root | | max(test.t1.id)->Column#5 over(partition by Column#4) |
| └─Sort_13 | 10000.00 | root | | Column#4 |
| └─Projection_8 | 10000.00 | root | | test.t1.id, gt(test.t1.id, 10)->Column#4 |
| └─TableReader_10 | 10000.00 | root | | data:TableFullScan_9 |
| └─TableFullScan_9 | 10000.00 | cop[tikv] | table:t1 | keep order:false, stats:pseudo |
+-----------------------------+----------+-----------+---------------+------------------------------------------------------------+
7 rows in set (0.0010 sec)
在该输出中,可以看到 Window 操作的 task 列值为 root,说明 MAX(id) OVER (PARTITION BY id > 10) 操作无法下推到 TiFlash。这是因为 MAX() 仅作为聚合函数时支持下推,作为窗口函数时不支持下推。