跳到主要内容

SQL优化

一条SQL经过优化器的解析生成执行计划,然后执行器根据表的存储引擎定义去调用接口读取数据,最后返回满足条件的查询结果。执行计划就是一条SQL在数据库中的执行过程或者访问路径的描述,执行计划就好比日常生活中你要从一个目的地去另一个目的地选择的交通工具,选择不同的交通工具,到达目的地的时间是不同的。

不同的执行计划,虽然最终的返回查询结果是相同的,但是由于执行过程的不同,性能有很大的差异。通常情况下,影响执行计划的常见因素有统计信息、是否使用临时表、是否排序、是否回表、分页查询的偏移量、优化器参数等,本文会介绍以下几个方面:

1)SQL的执行过程

2)StoneDB 执行计划的名词解释

3)StoneDB 常见的执行计划

4)常见的优化思想

5)表连接

SQL 的执行过程

1)客户端通过网络向服务端发送SQL语句,连接器负责与客户端建立连接,检查用户名密码等是否正确,从权限表里检查用户的权限

2)连接建立完成后,到查询缓存检查之前是否执行过的语句及结果,如果没有,继续下一步

3)分析器对查询语句进行词法分析(表、列是否存在)和语法分析(SQL语法是否正确)

4)优化器根据统计信息生成执行计划,并且选择代价最小的执行计划

5)执行器先判断对表是否有相关权限,然后根据表的引擎定义去使用这个引擎提供的接口取需要的行,如果需要的行所在的数据页在内存中,则直接返回给执行器,否则需要从磁盘读入内存,再返回给执行器

StoneDB 执行计划的名词解释

id:代表执行操作顺序,分别有三种不同的执行结果

1)id 值相同时,执行顺序由上而下

2)id 值不同时,id 值越大,越优先执行

3)id 值既有相同,又有不同时,id 值越大,越优先执行,id 值相同的,执行顺序由上而下

select_type:查询类型,主要用于区别普通查询、关联查询、子查询等

1)SIMPLE:单表查询

2)PRIMARY:主查询,含有子查询时,最外层的查询被标记为 PRIMARY

3)SUBQUERY:子查询,通常出现在 select 子句或 where 子句后面的子查询

4)DEPENDENT SUBQUERY:子查询的执行依赖于于外层查询,外层查询有多少行结果集,子查询就会被执行多少次

5)DERIVED:派生查询,通常出现在 from 子句后面的子查询

6)UNION:union 后面的查询语句

7)UNION RESULT:union 的结果集

table:当前步骤需要访问的表

partitions:匹配的分区

type:连接类型

1)eq_ref:主键或唯一索引扫描,只返回一条记录;或者在多表连接中,连接字段是主键或唯一索引,返回一行或多行记录,且每行记录是唯一的

2)ref:非唯一索引扫描,常用于非唯一索引等值查询,返回一行或多行记录

3)range:索引范围扫描,常用于<<=>>=between and 等操作

4)index_merge:索引合并,既可以是一张表的两个索引字段使用 or(Using union),又可以是一张表的两个索引字段使用 and(Using intersect)

5)index_subquery:外层查询关联子查询,子查询的连接字段有索引

6)all:全表扫描

possible_keys:可能选择的索引

key:实际选择的索引

key_len:使用的索引长度,用于判断组合索引是否被完全使用

注:不同字符集,各数据类型的 key_len 的长度计算公式是不同的

ref:指明当前表所参照的字段,如果是常量,则为 const,如果是连接字段,则是驱动表的连接字段

rows:执行过程中需要扫描的行数,该值越大,意味着需要扫描的行数越多,相应的耗时越长,是个预估值,由于 StoneDB 是列式存储,且数据是被高度压缩的,预估值和实际值会相差很大

filtered:表示存储引擎层返回的数据在Server层过滤后,剩下多少满足查询的记录数量的比例,5.7之前使用 explain extended 时会出现这个列,5.7 之后默认就有这个字段

extra:执行情况的说明和描述

1)Using where with pushed condition:表示在 Server 层对存储引擎层返回的数据做的过滤(和是否使用索引无关)

2)Using filesort:表示需要排序

3)Using temporary:表示需要创建一个临时表来存储结果集,例如:使用 union、distinct、group by、子查询、排序没有走索引等

4)Using union:表示根据至少两个索引得到的结果集,通常是至少两个索引字段使用 or

5)Using join buffer (Block Nested Loop):使用 Block Nested-Loop Join 算法,说明被驱动表的连接字段没有索引

StoneDB 常见的执行计划

索引扫描的执行计划

执行计划中,索引扫描的 type 有 eq_ref、ref、range、index_merge、index_subquery。

> explain select * from t_atomstore where id=1;
+----+-------------+-------------+------------+--------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+--------+---------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | t_atomstore | NULL | eq_ref | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | NULL |
+----+-------------+-------------+------------+--------+---------------+---------+---------+-------+------+----------+-------+

> explain select first_name from t_atomstore where first_name='zhou';
+----+-------------+-------------+------------+------+---------------+---------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+------+---------------+---------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | t_atomstore | NULL | ref | idx_firstname | idx_firstname | 32 | const | 10 | 100.00 | NULL |
+----+-------------+-------------+------------+------+---------------+---------------+---------+-------+------+----------+-------+
注:执行计划中,即使只查询索引列,不发生回表,Extra 也不会显示 "Using index",而是显示 "NULL"

> explain select * from t_atomstore where first_name in('zhou','liu');
+----+-------------+-------------+------------+-------+---------------+---------------+---------+------+------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+-------+---------------+---------------+---------+------+------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | SIMPLE | t_atomstore | NULL | range | idx_firstname | idx_firstname | 32 | NULL | 20 | 100.00 | Using where with pushed condition (`test`.`t_atomstore`.`first_name` in ('周','刘'))(t0) Pckrows: 2, susp. 2 (0 empty 0 full). Conditions: 1 |
+----+-------------+-------------+------------+-------+---------------+---------------+---------+------+------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------+

全表扫描的执行计划

执行计划中,全表扫描的 type 有 ALL,由于 StoneDB 是列式存储,且数据是被高度压缩的,大部分的查询是全表扫描。

> explain select first_name,count(*) from t_atomstore group by first_name;
+----+-------------+-------------+------------+------+---------------+------+---------+------+--------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+------+---------------+------+---------+------+--------+----------+---------------------------------+
| 1 | SIMPLE | t_atomstore | NULL | ALL | NULL | NULL | NULL | NULL | 100000 | 100.00 | Using temporary; Using filesort |
+----+-------------+-------------+------------+------+---------------+------+---------+------+--------+----------+---------------------------------+
1 row in set, 1 warning (0.00 sec)

执行计划中,即使字段 first_name 有索引,既消除了排序,又无需创建临时表,但优化器依然没有选择索引全扫描,而是选择了全表扫描。
注:这里出现一个 "warning",实际上是在 StoneDB 做了语句改写,即在分组字段后面加了 order by null,也就是说对返回的分组字段不需要做排序。而在 InnoDB 中,如果分组字段是非索引字段,默认需要对返回的分组字段做排序。

聚合运算的执行计划

由于 StoneDB 是列式存储,且数据是被高度压缩的,数据包节点记录了数据包的元数据,对于统计、聚合类查询,知识网格根据这些元数据能很快的得到结果。

> explain select first_name,sum(score) from t_test1 group by first_name;
+----+-------------+---------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------+
| 1 | SIMPLE | t_test1 | NULL | ALL | NULL | NULL | NULL | NULL | 1000000 | 100.00 | Using temporary; Using filesort |
+----+-------------+---------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------+

关联查询的执行计划

> explain select t1.id,t1.first_name,t2.first_name from t_test1 t1,t_test2 t2 where t1.id=t2.id and t1.first_name='周';
+----+-------------+-------+------------+--------+---------------+---------+---------+----------+---------+----------+------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------+---------+----------+------------------------------------------------------------------------------------------------------------------------------+
| 1 | SIMPLE | t1 | NULL | ALL | PRIMARY | NULL | NULL | NULL | 1000000 | 10.00 | Using where with pushed condition (`xx`.`t1`.`first_name` = '周')(t0) Pckrows: 16, susp. 16 (0 empty 0 full). Conditions: 1 |
| 1 | SIMPLE | t2 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | xx.t1.id | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------+---------+----------+------------------------------------------------------------------------------------------------------------------------------+
注:执行计划中,如果被驱动表的关联字段有索引,两表关联会使用 Index Nested-Loop Join 算法。

mysql> explain select t1.id,t1.first_name,t2.first_name from t_test1 t1,t_test2 t2 where t1.copy_id=t2.copy_id and t1.first_name='周';
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+------------------------------------------------------------------------------------------------------------------------------------+
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 1000000 | 10.00 | Using where with pushed condition (`xx`.`t1`.`first_name` = '周')(t0) Pckrows: 16, susp. 16 (0 empty 0 full). Conditions: 1 |
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 1000000 | 10.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+------------------------------------------------------------------------------------------------------------------------------------+
注:如果被驱动表的关联字段没有索引,两表关联会使用 Block Nested-Loop Join 算法,这种表关联的性能就差很多。

子查询的执行计划

> explain select t1.first_name from t_test1 t1 where t1.id in (select t2.id from t_test2 t2 where t2.first_name='周');
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 1000000 | 100.00 | Using where with pushed condition <in_optimizer>(`xx`.`t1`.`id`,`xx`.`t1`.`id` in ( <materialize> (/* select#2 */ select `xx`.`t2`.`id` from `xx`.`t_test2` `t2` where (`xx`.`t2`.`first_name` = '周') ), <primary_index_lookup>(`xx`.`t1`.`id` in <temporary table> on <auto_key> where ((`xx`.`t1`.`id` = `materialized-subquery`.`id`)))))(t0) Pckrows: 16, susp. 16 (0 empty 0 full). Conditions: 1 |
| 2 | SUBQUERY | t2 | NULL | ALL | PRIMARY | NULL | NULL | NULL | 1000000 | 10.00 | Using where with pushed condition (`xx`.`t2`.`first_name` = '周')(t0) Pckrows: 16, susp. 16 (0 empty 0 full). Conditions: 1 |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

> explain select t1.first_name from t_test1 t1 where exists (select 1 from t_test2 t2 where t1.id=t2.id and t2.first_name='周');
+----+--------------------+-------+------------+--------+---------------+---------+---------+----------------+---------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+--------+---------------+---------+---------+----------------+---------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 1000000 | 100.00 | Using where with pushed condition exists(/* select#2 */ select 1 from `xx`.`t_test2` `t2` where ((`xx`.`t1`.`id` = `xx`.`t2`.`id`) and (`xx`.`t2`.`first_name` = '周')))(t0) Pckrows: 16, susp. 16 (0 empty 0 full). Conditions: 1 |
| 2 | DEPENDENT SUBQUERY | t2 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | xx.t1.id | 1 | 10.00 | Using where with pushed condition (`xx`.`t2`.`first_name` = '周')(t0) Pckrows: 16, susp. 16 (0 empty 0 full). Conditions: 1 |
+----+--------------------+-------+------------+--------+---------------+---------+---------+----------------+---------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

常见的优化思想

group by 优化思想

在 MySQL 中,group by 需要先排序后分组,如果分组的字段是个索引字段,能规避临时表的创建和消除排序。而在 StoneDB 中,由于数据包节点记录了数据包的元数据,对于统计、聚合类查询,知识网格根据这些元数据能很快的得到结果,因此 StoneDB 无需在分组字段上创建索引。

in/exists 优化思想

小表驱动大表,即小的结果集驱动大的结果集。

select * from A where id in (select id from B);

优化器会先执行子查询,将子查询的结果集当作外层查询的参数,等价于:

for select id from B
for select * from A where A.id = B.id

当B表的结果集小于 A 表的结果集时,使用 in 优于 exists,如下所示:

> select count(*) from t_test;
+----------+
| count(*) |
+----------+
| 1000 |
+----------+
1 row in set (0.00 sec)

> select count(*) from t_test2;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.00 sec)

> select count(*) from t_test2 where id in (select id from t_test);
+----------+
| count(*) |
+----------+
| 1000 |
+----------+
1 row in set (0.00 sec)

> select count(*) from t_test2 where exists (select 1 from t_test where t_test.id=t_test2.id);
+----------+
| count(*) |
+----------+
| 1000 |
+----------+
1 row in set (4.19 sec)

注:使用 exists 写法是大表驱动小表,执行时间是 4.19s。

select * from A where exists (select 1 from B where B.id = A.id);

优化器会将外层查询值代入子查询,子查询依赖外层查询,等价于:

for select * from A
for select * from B where B.id = A.id

当 A 表的结果集小于 B 表的结果集时,使用 exists 优于 in,如下所示:

> select count(*) from t_test where id in (select id from t_test2);
+----------+
| count(*) |
+----------+
| 1000 |
+----------+
1 row in set (0.55 sec)

> select count(*) from t_test where exists (select 1 from t_test2 where t_test2.id=t_test.id);
+----------+
| count(*) |
+----------+
| 1000 |
+----------+
1 row in set (0.03 sec)

注:使用 in 写法是大表驱动小表,执行时间是 0.55s。

in 与 exists 相互转换

in 与 exists 相互转换的前提条件:如果子查询的关联字段是非空的,那么 in 和 exists 就可以互相转换,且 in 转换为 exists 的目的是可以让子查询的关联字段使用索引,如下所示:

mysql> explain select * from t_test1 t1 where t1.id in (select t2.id from t_test2 t2 where t2.first_name='周');
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 1000000 | 100.00 | Using where with pushed condition <in_optimizer>(`xx`.`t1`.`id`,`xx`.`t1`.`id` in ( <materialize> (/* select#2 */ select `xx`.`t2`.`id` from `xx`.`t_test2` `t2` where (`xx`.`t2`.`first_name` = '周') ), <primary_index_lookup>(`xx`.`t1`.`id` in <temporary table> on <auto_key> where ((`xx`.`t1`.`id` = `materialized-subquery`.`id`)))))(t0) Pckrows: 16, susp. 16 (0 empty 0 full). Conditions: 1 |
| 2 | SUBQUERY | t2 | NULL | ALL | PRIMARY | NULL | NULL | NULL | 1000000 | 10.00 | Using where with pushed condition (`xx`.`t2`.`first_name` = '周')(t0) Pckrows: 16, susp. 16 (0 empty 0 full). Conditions: 1 |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

mysql> explain select * from t_test1 t1 where exists (select 1 from t_test2 t2 where t1.id=t2.id and t2.first_name='周');
+----+--------------------+-------+------------+--------+---------------+---------+---------+----------------+---------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+--------+---------------+---------+---------+----------------+---------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 1000000 | 100.00 | Using where with pushed condition exists(/* select#2 */ select 1 from `xx`.`t_test2` `t2` where ((`xx`.`t1`.`id` = `xx`.`t2`.`id`) and (`xx`.`t2`.`first_name` = '周')))(t0) Pckrows: 16, susp. 16 (0 empty 0 full). Conditions: 1 |
| 2 | DEPENDENT SUBQUERY | t2 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | xx.t1.id | 1 | 10.00 | Using where with pushed condition (`xx`.`t2`.`first_name` = '周')(t0) Pckrows: 16, susp. 16 (0 empty 0 full). Conditions: 1 |
+----+--------------------+-------+------------+--------+---------------+---------+---------+----------------+---------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set, 2 warnings (0.00 sec)

分页优化思想

将分页查询改为子查询,并且只查询主键,好处是子查询不用回表,性能是很高的。然后子查询作为结果集和主表关联,如下所示:

> select * from t_test1 order by id asc limit 949420,10;
+--------+------------+-----------+-----+-------+---------+
| id | first_name | last_name | sex | score | copy_id |
+--------+------------+-----------+-----+-------+---------+
| 949421 | hou | tianli | 1 | 72 | 949421 |
| 949422 | li | jingqi | 1 | 45 | 949422 |
| 949423 | gao | jingqi | 1 | 84 | 949423 |
| 949424 | chen | liyi | 1 | 80 | 949424 |
| 949425 | ruan | liyi | 0 | 53 | 949425 |
| 949426 | lin | liyi | 0 | 92 | 949426 |
| 949427 | sun | yi | 0 | 92 | 949427 |
| 949428 | li | chengyi | 0 | 71 | 949428 |
| 949429 | yang | chengyi | 0 | 65 | 949429 |
| 949430 | chen | chengyi | 1 | 81 | 949430 |
+--------+------------+-----------+-----+-------+---------+
10 rows in set (0.41 sec)

> select a.* from t_test1 a,(select id from t_test1 order by id asc limit 949420, 10) b where a.id=b.id;
+--------+------------+-----------+-----+-------+---------+
| id | first_name | last_name | sex | score | copy_id |
+--------+------------+-----------+-----+-------+---------+
| 949421 | hou | tianli | 1 | 72 | 949421 |
| 949422 | li | jingqi | 1 | 45 | 949422 |
| 949423 | gao | jingqi | 1 | 84 | 949423 |
| 949424 | chen | liyi | 1 | 80 | 949424 |
| 949425 | ruan | liyi | 0 | 53 | 949425 |
| 949426 | lin | liyi | 0 | 92 | 949426 |
| 949427 | sun | yi | 0 | 92 | 949427 |
| 949428 | li | chengyi | 0 | 71 | 949428 |
| 949429 | yang | chengyi | 0 | 65 | 949429 |
| 949430 | chen | chengyi | 1 | 81 | 949430 |
+--------+------------+-----------+-----+-------+---------+
10 rows in set (0.13 sec)

表连接

Nested Loop Join

具体执行步骤:

1)首先优化器会按照一定的规则来决定表 T1 和表 T2 中谁是驱动表,谁是被驱动表,驱动表用于外层循环,被驱动表用于内层循环(这里假设驱动表是 T1,被驱动表是 T2);

2)然后以目标SQL中指定的谓词条件访问驱动表 T1,得到的结果集记录为1;

3)遍历驱动结果集1,并同时遍历被驱动表 T2,即先取出驱动结果集1中的第1条记录,接着遍历被驱动表 T2,并按照连接条件去判断 T2 中是否存在匹配的记录,直到遍历完驱动结果集1中的所有记录为止。

如果被驱动表的连接字段有索引,第3步是通过索引得到满足条件的行的,如果驱动表100行,被驱动表1000行,那么被驱动表需要执行100次,每次扫描1行,整个过程需要扫描行数是200。

如果被驱动表的连接字段没有索引,第3步是通过全表扫描得到满足条件的行的,如果驱动表100行,被驱动表1000行,那么被驱动表需要执行100次,每次扫描1000行,整个过程需要扫描行数是100100。

Nested-Loop Join 适用场景:表关联时,返回少量数据行。

Hash Join

假设有两张表,A 表和 B 表,分别有10万条数据和100万条数据,A 表中 ID 从1到10万,B 表中 ID 从1到100万,两张表通过 ID 字段做关联,即SQL语句示例如下:

SELECT * FROM A,B WHERE A.ID=B.ID

1)优化器会选择 A 表为驱动表,B 表为被驱动表,在内存中建立一张 hash table;

2)全表扫描 A 表,使用 Hash 函数计算出表关联字段的 hash value,并存放在 hash table;

3)全表扫描 B 表,使用 Hash 函数计算出表关联字段的 hash value;

4)将 B 表的 hash value 与内存中的 hash table 进行匹配,如果匹配成功,则返回数据,否则这行就被丢弃。

Hash Join 适用场景:表关联时,返回大量数据行。

Sort Merge Join

具体执行步骤:

1)首先以目标SQL中指定的谓词条件访问表 T1,得到的结果集按照表 T1 中的连接列排序,排好序的结果集记为结果集1;

2)然后以目标SQL中指定的谓词条件访问表 T2,得到的结果集按照表 T2 中的连接列排序,排好序的结果集记为结果集2;

3)最后遍历结果集1,即先取出结果集1中的第1条记录和结果集2中按照连接条件判断是否存在匹配记录,直到遍历完结果集中所有的记录。