📣
TiDB Cloud Premium 开放公测中。为企业级工作负载提供无限扩展、即时弹性伸缩和高级安全保障。此页面由 AI 自动翻译,英文原文请见此处。

解释使用 Join 的语句



在 TiDB 中,SQL 优化器需要决定哪些表应以何种顺序进行连接,以及为特定 SQL 语句选择最有效的连接算法。本文中的示例基于以下样本数据:

CREATE TABLE t1 (id BIGINT NOT NULL PRIMARY KEY auto_increment, pad1 BLOB, pad2 BLOB, pad3 BLOB, int_col INT NOT NULL DEFAULT 0); CREATE TABLE t2 (id BIGINT NOT NULL PRIMARY KEY auto_increment, t1_id BIGINT NOT NULL, pad1 BLOB, pad2 BLOB, pad3 BLOB, INDEX(t1_id)); INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM dual; INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; INSERT INTO t1 SELECT NULL, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024), 0 FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; INSERT INTO t2 SELECT NULL, a.id, RANDOM_BYTES(1024), RANDOM_BYTES(1024), RANDOM_BYTES(1024) FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000; UPDATE t1 SET int_col = 1 WHERE pad1 = (SELECT pad1 FROM t1 ORDER BY RAND() LIMIT 1); SELECT SLEEP(1); ANALYZE TABLE t1, t2;

Index Join

如果需要连接的估算行数较少(通常少于 10000 行),优先考虑使用 index join 方法。这种连接方式类似于 MySQL 中的主方法。在以下示例中,操作符 ├─TableReader_29(Build) 首先读取表 t1。对于每一行匹配的记录,TiDB 将探测表 t2

EXPLAIN SELECT /*+ INL_JOIN(t1, t2) */ * FROM t1 INNER JOIN t2 ON t1.id = t2.t1_id;
+---------------------------------+----------+-----------+------------------------------+---------------------------------------------------------------------------------------------------------------------------+ | id | estRows | task | access object | operator info | +---------------------------------+----------+-----------+------------------------------+---------------------------------------------------------------------------------------------------------------------------+ | IndexJoin_11 | 90000.00 | root | | inner join, inner:IndexLookUp_10, outer key:test.t1.id, inner key:test.t2.t1_id, equal cond:eq(test.t1.id, test.t2.t1_id) | | ├─TableReader_29(Build) | 71010.00 | root | | data:TableFullScan_28 | | │ └─TableFullScan_28 | 71010.00 | cop[tikv] | table:t1 | keep order:false | | └─IndexLookUp_10(Probe) | 90000.00 | root | | | | ├─IndexRangeScan_8(Build) | 90000.00 | cop[tikv] | table:t2, index:t1_id(t1_id) | range: decided by [eq(test.t2.t1_id, test.t1.id)], keep order:false | | └─TableRowIDScan_9(Probe) | 90000.00 | cop[tikv] | table:t2 | keep order:false | +---------------------------------+----------+-----------+------------------------------+---------------------------------------------------------------------------------------------------------------------------+

index join 在内存使用方面较为高效,但在需要大量 probe 操作时,可能比其他连接方法执行得更慢。考虑以下查询:

SELECT * FROM t1 INNER JOIN t2 ON t1.id=t2.t1_id WHERE t1.pad1 = 'value' and t2.pad1='value';

在内连接操作中,TiDB 实现了连接重排序,可能先访问 t1t2。假设 TiDB 选择 t1 作为第一个表进行 build 步骤,然后在探测 t2 之前,能够在 t1.pad1 = 'value' 的条件下过滤。对 t2.pad1='value' 条件的过滤会在每次探测 t2 时应用,这可能比其他连接方法效率低。

index join 在 build 端较小且 probe 端已建立索引且较大时效果最佳。考虑以下查询,index join 的性能甚至比 hash join 更差,且未被 SQL 优化器选中:

-- DROP previously added index ALTER TABLE t2 DROP INDEX t1_id; EXPLAIN ANALYZE SELECT /*+ INL_JOIN(t1, t2) */ * FROM t1 INNER JOIN t2 ON t1.id = t2.t1_id WHERE t1.int_col = 1; EXPLAIN ANALYZE SELECT /*+ HASH_JOIN(t1, t2) */ * FROM t1 INNER JOIN t2 ON t1.id = t2.t1_id WHERE t1.int_col = 1; EXPLAIN ANALYZE SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.t1_id WHERE t1.int_col = 1;
+-----------------------------+----------+---------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+---------+------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +-----------------------------+----------+---------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+---------+------+ | IndexJoin_14 | 90000.00 | 0 | root | | time:330.2ms, loops:1, inner:{total:72.2ms, concurrency:5, task:12, construct:58.6ms, fetch:13.5ms, build:2.12µs}, probe:26.1ms | inner join, inner:TableReader_10, outer key:test.t2.t1_id, inner key:test.t1.id, equal cond:eq(test.t2.t1_id, test.t1.id) | 88.5 MB | N/A | | ├─TableReader_20(Build) | 90000.00 | 90000 | root | | time:307.2ms, loops:96, cop_task: {num: 24, max: 130.6ms, min: 170.9µs, avg: 33.5ms, p95: 105ms, max_proc_keys: 10687, p95_proc_keys: 9184, tot_proc: 472ms, rpc_num: 24, rpc_time: 802.4ms, copr_cache_hit_ratio: 0.62, distsql_concurrency: 15} | data:TableFullScan_19 | 58.6 MB | N/A | | │ └─TableFullScan_19 | 90000.00 | 90000 | cop[tikv] | table:t2 | tikv_task:{proc max:34ms, min:0s, avg: 15.3ms, p80:24ms, p95:30ms, iters:181, tasks:24}, scan_detail: {total_process_keys: 69744, total_process_keys_size: 217533936, total_keys: 69753, get_snapshot_time: 701.6µs, rocksdb: {delete_skipped_count: 97368, key_skipped_count: 236847, block: {cache_hit_count: 3509}}} | keep order:false | N/A | N/A | | └─TableReader_10(Probe) | 12617.92 | 0 | root | | time:11.9ms, loops:12, cop_task: {num: 42, max: 848.8µs, min: 199µs, avg: 451.8µs, p95: 846.2µs, max_proc_keys: 7, p95_proc_keys: 5, rpc_num: 42, rpc_time: 18.3ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15} | data:Selection_9 | N/A | N/A | | └─Selection_9 | 12617.92 | 0 | cop[tikv] | | tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, p95:0s, iters:42, tasks:42}, scan_detail: {total_process_keys: 56, total_process_keys_size: 174608, total_keys: 77, get_snapshot_time: 727.7µs, rocksdb: {block: {cache_hit_count: 154}}} | eq(test.t1.int_col, 1) | N/A | N/A | | └─TableRangeScan_8 | 90000.00 | 56 | cop[tikv] | table:t1 | tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, p95:0s, iters:42, tasks:42} | range: decided by [test.t2.t1_id], keep order:false | N/A | N/A | +-----------------------------+----------+---------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+---------+------+ +------------------------------+----------+---------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+---------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +------------------------------+----------+---------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+---------+ | HashJoin_20 | 90000.00 | 0 | root | | time:313.6ms, loops:1, build_hash_table:{total:24.6ms, fetch:21.2ms, build:3.32ms}, probe:{concurrency:5, total:1.57s, max:313.5ms, probe:18.9ms, fetch:1.55s} | inner join, equal:[eq(test.t1.id, test.t2.t1_id)] | 32.0 MB | 0 Bytes | | ├─TableReader_23(Build) | 9955.54 | 10000 | root | | time:23.6ms, loops:12, cop_task: {num: 11, max: 504.6µs, min: 203.7µs, avg: 377.4µs, p95: 504.6µs, rpc_num: 11, rpc_time: 3.92ms, copr_cache_hit_ratio: 1.00, distsql_concurrency: 15} | data:Selection_22 | 14.9 MB | N/A | | │ └─Selection_22 | 9955.54 | 10000 | cop[tikv] | | tikv_task:{proc max:104ms, min:3ms, avg: 24.4ms, p80:33ms, p95:104ms, iters:113, tasks:11}, scan_detail: {get_snapshot_time: 241.4µs, rocksdb: {block: {}}} | eq(test.t1.int_col, 1) | N/A | N/A | | │ └─TableFullScan_21 | 71010.00 | 71010 | cop[tikv] | table:t1 | tikv_task:{proc max:101ms, min:3ms, avg: 23.8ms, p80:33ms, p95:101ms, iters:113, tasks:11} | keep order:false | N/A | N/A | | └─TableReader_25(Probe) | 90000.00 | 90000 | root | | time:293.7ms, loops:91, cop_task: {num: 24, max: 105.7ms, min: 210.9µs, avg: 31.4ms, p95: 103.8ms, max_proc_keys: 10687, p95_proc_keys: 9184, tot_proc: 407ms, rpc_num: 24, rpc_time: 752.2ms, copr_cache_hit_ratio: 0.62, distsql_concurrency: 15} | data:TableFullScan_24 | 58.6 MB | N/A | | └─TableFullScan_24 | 90000.00 | 90000 | cop[tikv] | table:t2 | tikv_task:{proc max:31ms, min:0s, avg: 13ms, p80:19ms, p95:26ms, iters:181, tasks:24}, scan_detail: {total_process_keys: 69744, total_process_keys_size: 217533936, total_keys: 69753, get_snapshot_time: 637.2µs, rocksdb: {delete_skipped_count: 97368, key_skipped_count: 236847, block: {cache_hit_count: 3509}}} | keep order:false | N/A | N/A | +------------------------------+----------+---------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+---------+ +------------------------------+----------+---------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+---------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +------------------------------+----------+---------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+---------+ | HashJoin_21 | 90000.00 | 0 | root | | time:331.7ms, loops:1, build_hash_table:{total:32.7ms, fetch:26ms, build:6.73ms}, probe:{concurrency:5, total:1.66s, max:331.3ms, probe:16ms, fetch:1.64s} | inner join, equal:[eq(test.t1.id, test.t2.t1_id)] | 32.3 MB | 0 Bytes | | ├─TableReader_26(Build) | 9955.54 | 10000 | root | | time:30.4ms, loops:13, cop_task: {num: 11, max: 1.87ms, min: 844.7µs, avg: 1.29ms, p95: 1.87ms, rpc_num: 11, rpc_time: 13.5ms, copr_cache_hit_ratio: 1.00, distsql_concurrency: 15} | data:Selection_25 | 12.2 MB | N/A | | │ └─Selection_25 | 9955.54 | 10000 | cop[tikv] | | tikv_task:{proc max:104ms, min:3ms, avg: 24.4ms, p80:33ms, p95:104ms, iters:113, tasks:11}, scan_detail: {get_snapshot_time: 521µs, rocksdb: {block: {}}} | eq(test.t1.int_col, 1) | N/A | N/A | | │ └─TableFullScan_24 | 71010.00 | 71010 | cop[tikv] | table:t1 | tikv_task:{proc max:101ms, min:3ms, avg: 23.8ms, p80:33ms, p95:101ms, iters:113, tasks:11} | keep order:false | N/A | N/A | | └─TableReader_23(Probe) | 90000.00 | 90000 | root | | time:308.6ms, loops:91, cop_task: {num: 24, max: 123.3ms, min: 518.9µs, avg: 32.4ms, p95: 113.4ms, max_proc_keys: 10687, p95_proc_keys: 9184, tot_proc: 499ms, rpc_num: 24, rpc_time: 776ms, copr_cache_hit_ratio: 0.62, distsql_concurrency: 15} | data:TableFullScan_22 | 58.6 MB | N/A | | └─TableFullScan_22 | 90000.00 | 90000 | cop[tikv] | table:t2 | tikv_task:{proc max:44ms, min:0s, avg: 16.8ms, p80:27ms, p95:40ms, iters:181, tasks:24}, scan_detail: {total_process_keys: 69744, total_process_keys_size: 217533936, total_keys: 69753, get_snapshot_time: 955.4µs, rocksdb: {delete_skipped_count: 97368, key_skipped_count: 236847, block: {cache_hit_count: 3509}}} | keep order:false | N/A | N/A | +------------------------------+----------+---------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+---------+

在上述示例中,index join 操作缺少 t1.int_col 的索引。一旦添加该索引,性能由 0.3 秒提升到 0.06 秒,具体结果如下:

-- 重新添加索引 ALTER TABLE t2 ADD INDEX (t1_id); EXPLAIN ANALYZE SELECT /*+ INL_JOIN(t1, t2) */ * FROM t1 INNER JOIN t2 ON t1.id = t2.t1_id WHERE t1.int_col = 1; EXPLAIN ANALYZE SELECT /*+ HASH_JOIN(t1, t2) */ * FROM t1 INNER JOIN t2 ON t1.id = t2.t1_id WHERE t1.int_col = 1; EXPLAIN ANALYZE SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.t1_id WHERE t1.int_col = 1;
+----------------------------------+----------+---------+-----------+------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+-----------+------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +----------------------------------+----------+---------+-----------+------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+-----------+------+ | IndexJoin_12 | 90000.00 | 0 | root | | time:65.6ms, loops:1, inner:{total:129.7ms, concurrency:5, task:7, construct:7.13ms, fetch:122.5ms, build:16.4µs}, probe:2.54ms | inner join, inner:IndexLookUp_11, outer key:test.t1.id, inner key:test.t2.t1_id, equal cond:eq(test.t1.id, test.t2.t1_id) | 28.7 MB | N/A | | ├─TableReader_33(Build) | 9955.54 | 10000 | root | | time:15.4ms, loops:16, cop_task: {num: 11, max: 1.52ms, min: 211.5µs, avg: 416.8µs, p95: 1.52ms, rpc_num: 11, rpc_time: 4.36ms, copr_cache_hit_ratio: 1.00, distsql_concurrency: 15} | data:Selection_32 | 13.9 MB | N/A | | │ └─Selection_32 | 9955.54 | 10000 | cop[tikv] | | tikv_task:{proc max:104ms, min:3ms, avg: 24.4ms, p80:33ms, p95:104ms, iters:113, tasks:11}, scan_detail: {get_snapshot_time: 185µs, rocksdb: {block: {}}} | eq(test.t1.int_col, 1) | N/A | N/A | | │ └─TableFullScan_31 | 71010.00 | 71010 | cop[tikv] | table:t1 | tikv_task:{proc max:101ms, min:3ms, avg: 23.8ms, p80:33ms, p95:101ms, iters:113, tasks:11} | keep order:false | N/A | N/A | | └─IndexLookUp_11(Probe) | 90000.00 | 0 | root | | time:115.6ms, loops:7 | | 555 Bytes | N/A | | ├─IndexRangeScan_9(Build) | 90000.00 | 0 | cop[tikv] | table:t2, index:t1_id(t1_id) | time:114.3ms, loops:7, cop_task: {num: 7, max: 42ms, min: 1.3ms, avg: 16.2ms, p95: 42ms, tot_proc: 71ms, rpc_num: 7, rpc_time: 113.2ms, copr_cache_hit_ratio: 0.29, distsql_concurrency: 15}, tikv_task:{proc max:37ms, min:0s, avg: 11.3ms, p80:20ms, p95:37ms, iters:7, tasks:7}, scan_detail: {total_keys: 9296, get_snapshot_time: 141.9µs, rocksdb: {block: {cache_hit_count: 18592}}} | range: decided by [eq(test.t2.t1_id, test.t1.id)], keep order:false | N/A | N/A | | └─TableRowIDScan_10(Probe) | 90000.00 | 0 | cop[tikv] | table:t2 | | keep order:false | N/A | N/A | +----------------------------------+----------+---------+-----------+------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+-----------+------+ +------------------------------+----------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+---------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +------------------------------+----------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+---------+ | HashJoin_32 | 90000.00 | 0 | root | | time:320.2ms, loops:1, build_hash_table:{total:19.3ms, fetch:16.8ms, build:2.52ms}, probe:{concurrency:5, total:1.6s, max:320.1ms, probe:16.1ms, fetch:1.58s} | inner join, equal:[eq(test.t1.id, test.t2.t1_id)] | 32.0 MB | 0 Bytes | | ├─TableReader_35(Build) | 9955.54 | 10000 | root | | time:18.6ms, loops:12, cop_task: {num: 11, max: 713.8µs, min: 197.3µs, avg: 368.5µs, p95: 713.8µs, rpc_num: 11, rpc_time: 3.83ms, copr_cache_hit_ratio: 1.00, distsql_concurrency: 15} | data:Selection_34 | 14.9 MB | N/A | | │ └─Selection_34 | 9955.54 | 10000 | cop[tikv] | | tikv_task:{proc max:104ms, min:3ms, avg: 24.4ms, p80:33ms, p95:104ms, iters:113, tasks:11}, scan_detail: {get_snapshot_time: 178.9µs, rocksdb: {block: {}}} | eq(test.t1.int_col, 1) | N/A | N/A | | │ └─TableFullScan_33 | 71010.00 | 71010 | cop[tikv] | table:t1 | tikv_task:{proc max:101ms, min:3ms, avg: 23.8ms, p80:33ms, p95:101ms, iters:113, tasks:11} | keep order:false | N/A | N/A | | └─TableReader_37(Probe) | 90000.00 | 90000 | root | | time:304.4ms, loops:91, cop_task: {num: 24, max: 114ms, min: 251.1µs, avg: 33.1ms, p95: 110.4ms, max_proc_keys: 10687, p95_proc_keys: 9184, tot_proc: 492ms, rpc_num: 24, rpc_time: 793ms, copr_cache_hit_ratio: 0.62, distsql_concurrency: 15} | data:TableFullScan_36 | 58.6 MB | N/A | | └─TableFullScan_36 | 90000.00 | 90000 | cop[tikv] | table:t2 | tikv_task:{proc max:38ms, min:3ms, avg: 14.1ms, p80:23ms, p95:35ms, iters:181, tasks:24}, scan_detail: {total_process_keys: 69744, total_process_keys_size: 217533936, total_keys: 139497, get_snapshot_time: 577.2µs, rocksdb: {delete_skipped_count: 44208, key_skipped_count: 253431, block: {cache_hit_count: 3527}}} | keep order:false | N/A | N/A | +------------------------------+----------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+---------+ +------------------------------+----------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+---------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +------------------------------+----------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+---------+ | HashJoin_33 | 90000.00 | 0 | root | | time:306.3ms, loops:1, build_hash_table:{total:20.5ms, fetch:17.1ms, build:3.45ms}, probe:{concurrency:5, total:1.53s, max:305.9ms, probe:17.1ms, fetch:1.51s} | inner join, equal:[eq(test.t1.id, test.t2.t1_id)] | 32.0 MB | 0 Bytes | | ├─TableReader_42(Build) | 9955.54 | 10000 | root | | time:19.6ms, loops:12, cop_task: {num: 11, max: 1.07ms, min: 246.1µs, avg: 600µs, p95: 1.07ms, rpc_num: 11, rpc_time: 6.17ms, copr_cache_hit_ratio: 1.00, distsql_concurrency: 15} | data:Selection_41 | 19.7 MB | N/A | | │ └─Selection_41 | 9955.54 | 10000 | cop[tikv] | | tikv_task:{proc max:104ms, min:3ms, avg: 24.4ms, p80:33ms, p95:104ms, iters:113, tasks:11}, scan_detail: {get_snapshot_time: 282.9µs, rocksdb: {block: {}}} | eq(test.t1.int_col, 1) | N/A | N/A | | │ └─TableFullScan_40 | 71010.00 | 71010 | cop[tikv] | table:t1 | tikv_task:{proc max:101ms, min:3ms, avg: 23.8ms, p80:33ms, p95:101ms, iters:113, tasks:11} | keep order:false | N/A | N/A | | └─TableReader_44(Probe) | 90000.00 | 90000 | root | | time:289.2ms, loops:91, cop_task: {num: 24, max: 108.2ms, min: 252.8µs, avg: 31.3ms, p95: 106.1ms, max_proc_keys: 10687, p95_proc_keys: 9184, tot_proc: 445ms, rpc_num: 24, rpc_time: 750.4ms, copr_cache_hit_ratio: 0.62, distsql_concurrency: 15} | data:TableFullScan_43 | 58.6 MB | N/A | | └─TableFullScan_43 | 90000.00 | 90000 | cop[tikv] | table:t2 | tikv_task:{proc max:31ms, min:3ms, avg: 13.3ms, p80:24ms, p95:30ms, iters:181, tasks:24}, scan_detail: {total_process_keys: 69744, total_process_keys_size: 217533936, total_keys: 139497, get_snapshot_time: 730.2µs, rocksdb: {delete_skipped_count: 44208, key_skipped_count: 253431, block: {cache_hit_count: 3527}}} | keep order:false | N/A | N/A | +------------------------------+----------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+---------+---------+

Index Join 相关算法

如果使用 Hint INL_JOIN 进行 Index Join 操作,TiDB 会在连接外表之前创建一个中间结果的 Hash Table。TiDB 同样也支持使用 Hint INL_HASH_JOIN 在外表上建 Hash Table。以上所述的 Index Join 相关算法都由 SQL 优化器自动选择。

配置

Index Join 算法的性能受以下系统变量影响:

Hash Join

在 Hash Join 操作中,TiDB 首先读取 Build 端的数据并将其缓存在 Hash Table 中,然后再读取 Probe 端的数据,使用 Probe 端的数据来探查 Hash Table 以获得所需行。与 Index Join 算法相比,Hash Join 要消耗更多内存,但如果需要连接的行数很多,运行速度会比 Index Join 快。TiDB 中的 Hash Join 算子是多线程的,并且可以并发执行。

下面是一个 Hash Join 示例:

EXPLAIN SELECT /*+ HASH_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id = t2.id;
+-----------------------------+-----------+-----------+---------------+------------------------------------------------+ | id | estRows | task | access object | operator info | +-----------------------------+-----------+-----------+---------------+------------------------------------------------+ | HashJoin_27 | 142020.00 | root | | inner join, equal:[eq(test.t1.id, test.t2.id)] | | ├─TableReader_29(Build) | 142020.00 | root | | data:TableFullScan_28 | | │ └─TableFullScan_28 | 142020.00 | cop[tikv] | table:t1 | keep order:false | | └─TableReader_31(Probe) | 180000.00 | root | | data:TableFullScan_30 | | └─TableFullScan_30 | 180000.00 | cop[tikv] | table:t2 | keep order:false | +-----------------------------+-----------+-----------+---------------+------------------------------------------------+ 5 rows in set (0.00 sec)

TiDB 会按照以下顺序执行 HashJoin_27 算子:

  1. 将 Build 端数据缓存在内存中。
  2. 根据缓存数据在 Build 端构造一个 Hash Table。
  3. 读取 Probe 端的数据。
  4. 使用 Probe 端的数据来探查 Hash Table。
  5. 将符合条件的结果返回给用户。

EXPLAIN 返回结果中的 operator info 一列记录了 HashJoin_27 的其他信息,包括该查询是 Inner Join 还是 Outer Join 以及 Join 的条件是什么等。在上面给出的示例中,该查询为 Inner Join,Join 条件是 equal:[eq(test.t1.id, test.t2.id)],与查询语句中的 WHERE t1.id = t2.id 部分对应。下面例子中其他几个 Join 算子的 operator info 和此处类似。

运行数据

如果在执行操作时,内存使用超过了 tidb_mem_quota_query 规定的值(默认为 1 GB),且 tidb_enable_tmp_storage_on_oom 的值为 ON (默认为 ON),那么 TiDB 会尝试使用临时存储,在磁盘上创建 Hash Join 的 Build 端。EXPLAIN ANALYZE 返回结果中的 execution info 一栏记录了有关内存使用情况等运行数据。下面的例子展示了 tidb_mem_quota_query 的值分别设为 1 GB(默认)及 500 MB 时,EXPLAIN ANALYZE 的返回结果(当内存配额设为 500 MB 时,磁盘用作临时存储区):

EXPLAIN ANALYZE SELECT /*+ HASH_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id = t2.id; SET tidb_mem_quota_query=500 * 1024 * 1024; EXPLAIN ANALYZE SELECT /*+ HASH_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id = t2.id;
+-----------------------------+-----------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+-----------------------+---------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +-----------------------------+-----------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+-----------------------+---------+ | HashJoin_27 | 142020.00 | 71010 | root | | time:647.508572ms, loops:72, build_hash_table:{total:579.254415ms, fetch:566.91012ms, build:12.344295ms}, probe:{concurrency:5, total:3.23315006s, max:647.520113ms, probe:330.884716ms, fetch:2.902265344s} | inner join, equal:[eq(test.t1.id, test.t2.id)] | 209.61642456054688 MB | 0 Bytes | | ├─TableReader_29(Build) | 142020.00 | 71010 | root | | time:567.088247ms, loops:72, cop_task: {num: 2, max: 569.809411ms, min: 369.67451ms, avg: 469.74196ms, p95: 569.809411ms, max_proc_keys: 39245, p95_proc_keys: 39245, tot_proc: 400ms, rpc_num: 2, rpc_time: 939.447231ms, copr_cache_hit_ratio: 0.00} | data:TableFullScan_28 | 210.2100534439087 MB | N/A | | │ └─TableFullScan_28 | 142020.00 | 71010 | cop[tikv] | table:t1 | proc max:64ms, min:48ms, p80:64ms, p95:64ms, iters:79, tasks:2 | keep order:false | N/A | N/A | | └─TableReader_31(Probe) | 180000.00 | 90000 | root | | time:337.233636ms, loops:91, cop_task: {num: 3, max: 569.790741ms, min: 332.758911ms, avg: 421.543165ms, p95: 569.790741ms, max_proc_keys: 31719, p95_proc_keys: 31719, tot_proc: 500ms, rpc_num: 3, rpc_time: 1.264570696s, copr_cache_hit_ratio: 0.00} | data:TableFullScan_30 | 267.1126985549927 MB | N/A | | └─TableFullScan_30 | 180000.00 | 90000 | cop[tikv] | table:t2 | proc max:84ms, min:72ms, p80:84ms, p95:84ms, iters:102, tasks:3 | keep order:false | N/A | N/A | +-----------------------------+-----------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+-----------------------+---------+ 5 rows in set (0.65 sec) Query OK, 0 rows affected (0.00 sec) +-----------------------------+-----------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+-----------------------+----------------------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +-----------------------------+-----------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+-----------------------+----------------------+ | HashJoin_27 | 142020.00 | 71010 | root | | time:963.983353ms, loops:72, build_hash_table:{total:775.961447ms, fetch:503.789677ms, build:272.17177ms}, probe:{concurrency:5, total:4.805454793s, max:963.973133ms, probe:922.156835ms, fetch:3.883297958s} | inner join, equal:[eq(test.t1.id, test.t2.id)] | 93.53974533081055 MB | 210.7459259033203 MB | | ├─TableReader_29(Build) | 142020.00 | 71010 | root | | time:504.062018ms, loops:72, cop_task: {num: 2, max: 509.276857ms, min: 402.66386ms, avg: 455.970358ms, p95: 509.276857ms, max_proc_keys: 39245, p95_proc_keys: 39245, tot_proc: 384ms, rpc_num: 2, rpc_time: 911.893237ms, copr_cache_hit_ratio: 0.00} | data:TableFullScan_28 | 210.20934200286865 MB | N/A | | │ └─TableFullScan_28 | 142020.00 | 71010 | cop[tikv] | table:t1 | proc max:88ms, min:72ms, p80:88ms, p95:88ms, iters:79, tasks:2 | keep order:false | N/A | N/A | | └─TableReader_31(Probe) | 180000.00 | 90000 | root | | time:363.058382ms, loops:91, cop_task: {num: 3, max: 412.659191ms, min: 358.489688ms, avg: 391.463008ms, p95: 412.659191ms, max_proc_keys: 31719, p95_proc_keys: 31719, tot_proc: 484ms, rpc_num: 3, rpc_time: 1.174326746s, copr_cache_hit_ratio: 0.00} | data:TableFullScan_30 | 267.11340618133545 MB | N/A | | └─TableFullScan_30 | 180000.00 | 90000 | cop[tikv] | table:t2 | proc max:92ms, min:64ms, p80:92ms, p95:92ms, iters:102, tasks:3 | keep order:false | N/A | N/A | +-----------------------------+-----------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+-----------------------+----------------------+ 5 rows in set (0.98 sec)

配置

Hash Join 算法的性能受以下系统变量影响:

  • tidb_mem_quota_query(默认值:1GB)- 如果某条查询的内存消耗超出了配额,TiDB 会尝试将 Hash Join 的 Build 端移到磁盘上以节省内存。
  • tidb_hash_join_concurrency(默认值:5)- 可以并发执行的 Hash Join 任务数量。

相关优化

TiDB 提供了 Runtime Filter 功能,针对 Hash Join 进行性能优化,大幅提升 Hash Join 的执行速度。具体优化使用方式见 Runtime Filter

Merge Join

Merge Join 是一种特殊的 Join 算法。当两个关联表要 Join 的字段需要按排好的顺序读取时,适用 Merge Join 算法。由于 Build 端和 Probe 端的数据都会读取,这种算法的 Join 操作是流式的,类似“拉链式合并”的高效版。Merge Join 占用的内存要远低于 Hash Join,但 Merge Join 不能并发执行。

下面是一个使用 Merge Join 的例子:

EXPLAIN SELECT /*+ MERGE_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id = t2.id;
+-----------------------------+-----------+-----------+---------------+-------------------------------------------------------+ | id | estRows | task | access object | operator info | +-----------------------------+-----------+-----------+---------------+-------------------------------------------------------+ | MergeJoin_7 | 142020.00 | root | | inner join, left key:test.t1.id, right key:test.t2.id | | ├─TableReader_12(Build) | 180000.00 | root | | data:TableFullScan_11 | | │ └─TableFullScan_11 | 180000.00 | cop[tikv] | table:t2 | keep order:true | | └─TableReader_10(Probe) | 142020.00 | root | | data:TableFullScan_9 | | └─TableFullScan_9 | 142020.00 | cop[tikv] | table:t1 | keep order:true | +-----------------------------+-----------+-----------+---------------+-------------------------------------------------------+ 5 rows in set (0.00 sec)

TiDB 会按照以下顺序执行 Merge Join 算子:

  1. 从 Build 端把一个 Join Group 的数据全部读取到内存中。
  2. 读取 Probe 端的数据。
  3. 将 Probe 端的每行数据与 Build 端的一个完整 Join Group 比较,依次查看是否匹配(除了满足等值条件以外,还有其他非等值条件,这里的“匹配”主要是指查看是否满足非等值条件)。Join Group 指的是所有 Join Key 上值相同的数据。

文档内容是否有帮助?