优化09-表连接

发布于:2025-06-06 ⋅ 阅读:(20) ⋅ 点赞:(0)

一、表连接介绍

表连接类型

表连接是关系型数据库关键特性,在关系型数据库中,表连接分为三类:循环嵌套连接(Nested Loops Join)、哈希连接(Hash Join)、合并排序连接(Merge Sort Join)。假设存在表A和表B,都存在ID列,通过id列连接。

Nested Loops Join:遍历A中所有id,依次拿id和表B中的id对比。

Hash Join:对A中的id做HASH运算,放入多个HASH Bucket中,根据B表的id列的hash值和HASH Bucket匹配。

Merge Sort Join:对A表和B表的id列进行排序,按照排序结果进行连接。

表连接分析

表的访问次数,由执行计划的starts列来表示

  • NL连接驱动表被访问0次或1次,被驱动表被访问0次或N次,N的值取决于驱动表的返回行数。
  • HASH连接,驱动表和被驱动表都是被访问0次或1次,大部分场景是驱动表和被驱动表都访问1次。
  • MS连接,驱动表和被驱动表都是被访问0次或1次,大部分场景是驱动表和被驱动表都访问1次。

驱动顺序

  • NL连接的性能与驱动顺序有关,一般小表做驱动表,性能更好。
  • HASH连接的性能和驱动顺序有关,一般小表做驱动表。
  • MS连接的性能和驱动顺序无关。

是否排序

  • NL连接不排序、不多余消耗内存
  • HASH连接不排序,但是hash area多消耗内存
  • MS连接排序,消耗sort area内存

使用限制

  • NL连接支持各种写法,无限制。
  • HASH连接支持等值连接,不支持>、<、like、<>等。
  • MS连接不支持>、<、like、<>等

适用场景

  • NL连接一般适用于OLTP系统
  • HASH连接、MS连接适用于OLAP系统

二、表连接测试

创建测试数据

--创建表T1和T2
CREATE TABLE t1 (
id NUMBER NOT NULL,
n NUMBER,
contents VARCHAR2(4000));

CREATE TABLE t2 (
id NUMBER NOT NULL,
t1_id NUMBER NOT NULL,
n NUMBER,
contents VARCHAR2(4000));

--插入测试数据
execute dbms_random.seed(0);
INSERT INTO t1
SELECT  rownum,  rownum, dbms_random.string('a', 50)
FROM dual
CONNECT BY level <= 100
ORDER BY dbms_random.random;
INSERT INTO t2 SELECT rownum, rownum, rownum, dbms_random.string('b', 50) FROM dual CONNECT BY level <= 100000
ORDER BY dbms_random.random;
COMMIT;

--查看数据量 
SQL> select count(*) from t1;

  COUNT(*)
----------
       100
SQL> select count(*) from t2;

  COUNT(*)
----------
    100000

NL连接优化实验

--  两个表无索引执行计划
alter session set statistics_level=all;
select * from t1,t2 where t1.id=t2.t1_id and t1.n=19;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
 
----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.02 |    1016 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |      1 |      1 |00:00:00.02 |    1016 |  1000K|  1000K|  407K (0)|
|*  2 |   TABLE ACCESS FULL| T1   |      1 |      1 |      1 |00:00:00.01 |       7 |       |       |          |
|   3 |   TABLE ACCESS FULL| T2   |      1 |    103K|    100K|00:00:00.01 |    1006 |       |       |          |
----------------------------------------------------------------------------------------------------------------

--驱动表创建索引
 create index t1_n on t1(n);
 select * from t1,t2 where t1.id=t2.t1_id and t1.n=19;
 select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
-------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |      |      1 |        |      1 |00:00:00.05 |    1008 |
|*  1 |  HASH JOIN                           |      |      1 |      1 |      1 |00:00:00.05 |    1008 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1   |      1 |      1 |      1 |00:00:00.01 |       2 |
|*  3 |    INDEX RANGE SCAN                  | T1_N |      1 |      1 |      1 |00:00:00.01 |       1 |
|   4 |   TABLE ACCESS FULL                  | T2   |      1 |    103K|    100K|00:00:00.02 |    1006 |
-------------------------------------------------------------------------------------------------------
--被驱动表创建索引
CREATE INDEX t2_t1_id ON t2(t1_id);
select * from t1,t2 where t1.id=t2.t1_id and t1.n=19;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
-----------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name    |Starts| E-Rows|A-Rows|   A-Time   |Buffers|Reads |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |         |     1|       |     1|00:00:00.01 |      7|     4|
|   1 |  NESTED LOOPS                         |         |     1|      1|     1|00:00:00.01 |      7|     4|
|   2 |   NESTED LOOPS                        |         |     1|      1|     1|00:00:00.01 |      6|     4|
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1      |     1|      1|     1|00:00:00.01 |      3|     0|
|*  4 |     INDEX RANGE SCAN                  | T1_N    |     1|      1|     1|00:00:00.01 |      2|     0|
|*  5 |    INDEX RANGE SCAN                   | T2_T1_ID|     1|      1|     1|00:00:00.01 |      3|     4|
|   6 |   TABLE ACCESS BY INDEX ROWID         | T2      |     1|      1|     1|00:00:00.01 |      1|     0|
-----------------------------------------------------------------------------------------------------------
--大表驱动小表
SELECT /*+ leading(t2) use_nl(t1) */ *
  2  FROM t1, t2
  3  WHERE t1.id = t2.t1_id
  4  AND t1.n = 19;
-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:00.11 |    1013 |
|   1 |  NESTED LOOPS                |      |      1 |      1 |      1 |00:00:00.11 |    1013 |
|   2 |   NESTED LOOPS               |      |      1 |    103K|    100K|00:00:00.07 |    1011 |
|   3 |    TABLE ACCESS FULL         | T2   |      1 |    103K|    100K|00:00:00.01 |    1006 |
|*  4 |    INDEX RANGE SCAN          | T1_N |    100K|      1 |    100K|00:00:00.04 |       5 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| T1   |    100K|      1 |      1 |00:00:00.02 |       2 |
-----------------------------------------------------------------------------------------------
--尽管两个表都建立了索引,但是如果大表驱动小表,反而消耗更大。
--总结:
--驱动表和被驱动表考虑创建索引
--确保小表驱动大表

HASH连接优化实验

--删除索引
 drop index t1_n;
 drop index t2_t1_id;
 select * from t1,t2 where t1.id=t2.t1_id and t1.n=19;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.02 |    1016 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |      1 |      1 |00:00:00.02 |    1016 |  1000K|  1000K|  407K (0)|
|*  2 |   TABLE ACCESS FULL| T1   |      1 |      1 |      1 |00:00:00.01 |       7 |       |       |          |
|   3 |   TABLE ACCESS FULL| T2   |      1 |    103K|    100K|00:00:00.01 |    1006 |       |       |          |
----------------------------------------------------------------------------------------------------------------

--创建索引
create index idx_t1_n on t1(n);

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |      1 |        |      1 |00:00:00.06 |    1008 |       |       |          |
|*  1 |  HASH JOIN                           |          |      1 |      1 |      1 |00:00:00.06 |    1008 |  1000K|  1000K|  402K (0)|
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1       |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|*  3 |    INDEX RANGE SCAN                  | IDX_T1_N |      1 |      1 |      1 |00:00:00.01 |       1 |       |       |          |
|   4 |   TABLE ACCESS FULL                  | T2       |      1 |    100K|    100K|00:00:00.02 |    1006 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------

--总结
--驱动表和被驱动表考虑创建索引
--确保小表驱动大表
--由于HASH操作需要额外内存区域(hash area),如果需要做hash运算的数据太多,则会用的临时表空间,涉及磁盘IO会大大降低性能,索引尽量保证hash运算在内存中完成。

MS连接优化实验

--Merge Sort Join两表限制条件皆无索引

SQL> SELECT /*+ leading(t2) use_merge(t1)*/ *
  2  FROM t1, t2
  3  WHERE t1.id = t2.t1_id
  4   and t1.n=19;
  ---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |          |      1 |        |      1 |00:00:00.06 |    1007 |       |       |          |
|   1 |  MERGE JOIN                           |          |      1 |      1 |      1 |00:00:00.06 |    1007 |       |       |          |
|   2 |   SORT JOIN                           |          |      1 |    100K|     20 |00:00:00.06 |    1005 |  9762K|  1209K| 8677K (0)|
|   3 |    TABLE ACCESS FULL                  | T2       |      1 |    100K|    100K|00:00:00.01 |    1005 |       |       |          |
|*  4 |   SORT JOIN                           |          |     20 |      1 |      1 |00:00:00.01 |       2 |  2048 |  2048 | 2048  (0)|
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1       |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|*  6 |     INDEX RANGE SCAN                  | IDX_T1_N |      1 |      1 |      1 |00:00:00.01 |       1 |       |       |          |
------------------------------------------------------------------------------------------------------------------

--限制条件有索引
create index idx_t1_n on t1(n);
create index idx_t2_n on t2(n);
SQL> SELECT /*+ leading(t2) use_merge(t1)*/ *
  2  FROM t1, t2
  3  WHERE t1.id = t2.t1_id
  4  and t1.n=19
  5  and t2.n=12;

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |          |      1 |        |      0 |00:00:00.01 |       5 |       |       |          |
|   1 |  MERGE JOIN                           |          |      1 |      1 |      0 |00:00:00.01 |       5 |       |       |          |
|   2 |   SORT JOIN                           |          |      1 |      1 |      1 |00:00:00.01 |       3 |  2048 |  2048 | 2048  (0)|
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T2       |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
|*  4 |     INDEX RANGE SCAN                  | IDX_T2_N |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|*  5 |   SORT JOIN                           |          |      1 |      1 |      0 |00:00:00.01 |       2 |  2048 |  2048 | 2048  (0)|
|   6 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1       |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|*  7 |     INDEX RANGE SCAN                  | IDX_T1_N |      1 |      1 |      1 |00:00:00.01 |       1 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------

--连接条件创建索引
set autotrace traceonly
SQL> SELECT /*+ leading(t1) use_merge(t2)*/ *
  2  FROM t1, t2
  3  WHERE t1.id = t2.t1_id;
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1012  consistent gets
          0  physical reads
          0  redo size
      13999  bytes sent via SQL*Net to client
        673  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
        100  rows processed

CREATE INDEX idx_t1_id ON t1(id);
CREATE INDEX idx_t2_t1_id ON t2(t1_id);
set autotrace traceonly
SQL> SELECT /*+ leading(t1) use_merge(t2)*/ *
  2  FROM t1, t2
  3  WHERE t1.id = t2.t1_id;

100 rows selected.

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1021  consistent gets
          0  physical reads
          0  redo size
      13999  bytes sent via SQL*Net to client
        673  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        100  rows processed

网站公告

今日签到

点亮在社区的每一天
去签到