2024-05-08 postgres - volcano model - execution - notes

Published: 2024-05-09

Context: 2024-05-08 postgres - debugging and analysis - notes (CSDN blog)

Volcano model:

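  1. Execution follows the query tree: calls go from the top down, and each operator pulls rows from the operator below it, so data moves upward. The shape of the tree itself describes how data flows.
  2. Each call handles a single tuple, much like the iterator pattern.
  3. At the bottom of the tree are the table scan operators, which fetch one row per call.
  4. Upper operators repeatedly call GetNext on the operator below and apply their own processing to each tuple it returns.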

Query execution plan:

d1=# EXPLAIN ANALYZE VERBOSE    
d1-# SELECT * FROM  t1 LEFT JOIN t2 ON t2.a = t1.a WHERE t2.b < 5;

                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=232.74..364.14 rows=8509 width=16) (actual time=0.032..0.035 rows=1 loops=1)
   Output: t1.a, t1.b, t2.a, t2.b
   Merge Cond: (t2.a = t1.a)
   ->  Sort  (cost=74.23..76.11 rows=753 width=8) (actual time=0.020..0.020 rows=2 loops=1)
         Output: t2.a, t2.b
         Sort Key: t2.a
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on public.t2  (cost=0.00..38.25 rows=753 width=8) (actual time=0.010..0.012 rows=2 loops=1)
               Output: t2.a, t2.b
               Filter: (t2.b < 5)
   ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (actual time=0.008..0.008 rows=2 loops=1)
         Output: t1.a, t1.b
         Sort Key: t1.a
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on public.t1  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.002..0.003 rows=2 loops=1)
               Output: t1.a, t1.b
 Planning Time: 0.407 ms
 Execution Time: 0.080 ms
(18 rows)

Function call stack (gdb backtrace, captured while running the plain SELECT; see frame #19):

#0  heapgettup_pagemode (scan=0x1443958, dir=ForwardScanDirection, nkeys=0, key=0x0) at heapam.c:917
#1  0x00000000004db32a in heap_getnextslot (sscan=0x1443958, direction=ForwardScanDirection, slot=0x1432a78) at heapam.c:1398
#2  0x0000000000730ec5 in table_scan_getnextslot (sscan=0x1443958, direction=ForwardScanDirection, slot=0x1432a78) at ../../../src/include/access/tableam.h:1044
#3  0x0000000000730f97 in SeqNext (node=0x14328d8) at nodeSeqscan.c:80
#4  0x00000000006f860d in ExecScanFetch (node=0x14328d8, accessMtd=0x730efe <SeqNext>, recheckMtd=0x730fa8 <SeqRecheck>) at execScan.c:133
#5  0x00000000006f86b3 in ExecScan (node=0x14328d8, accessMtd=0x730efe <SeqNext>, recheckMtd=0x730fa8 <SeqRecheck>) at execScan.c:199
#6  0x0000000000730ff3 in ExecSeqScan (pstate=0x14328d8) at nodeSeqscan.c:112
#7  0x0000000000732343 in ExecProcNode (node=0x14328d8) at ../../../src/include/executor/executor.h:257
#8  0x000000000073248a in ExecSort (pstate=0x14326c8) at nodeSort.c:108
#9  0x00000000006f4ca9 in ExecProcNodeFirst (node=0x14326c8) at execProcnode.c:463
#10 0x0000000000726e97 in ExecProcNode (node=0x14326c8) at ../../../src/include/executor/executor.h:257
#11 0x0000000000727af0 in ExecMergeJoin (pstate=0x14322b8) at nodeMergejoin.c:656
#12 0x00000000006f4ca9 in ExecProcNodeFirst (node=0x14322b8) at execProcnode.c:463
#13 0x00000000006ea204 in ExecProcNode (node=0x14322b8) at ../../../src/include/executor/executor.h:257
#14 0x00000000006ec6bb in ExecutePlan (estate=0x1432078, planstate=0x14322b8, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, 
    direction=ForwardScanDirection, dest=0x1423f98, execute_once=true) at execMain.c:1551
#15 0x00000000006ea76a in standard_ExecutorRun (queryDesc=0x136dfc8, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:361
#16 0x00000000006ea602 in ExecutorRun (queryDesc=0x136dfc8, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:305
#17 0x000000000090c03e in PortalRunSelect (portal=0x13ad5d8, forward=true, count=0, dest=0x1423f98) at pquery.c:921
#18 0x000000000090bd2d in PortalRun (portal=0x13ad5d8, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1423f98, altdest=0x1423f98, qc=0x7ffff3ea58b0)
    at pquery.c:765
#19 0x0000000000905d39 in exec_simple_query (query_string=0x134a598 "SELECT * FROM  t1 LEFT JOIN t2 ON t2.a = t1.a WHERE t2.b < 5;") at postgres.c:1214
#20 0x000000000090a0ef in PostgresMain (argc=1, argv=0x7ffff3ea5b40, dbname=0x13775d8 "d1", username=0x1345a48 "kevin") at postgres.c:4496
#21 0x0000000000857a54 in BackendRun (port=0x136f010) at postmaster.c:4530
#22 0x00000000008573c1 in BackendStartup (port=0x136f010) at postmaster.c:4252
#23 0x0000000000853b10 in ServerLoop () at postmaster.c:1745
#24 0x00000000008533c9 in PostmasterMain (argc=1, argv=0x1343a00) at postmaster.c:1417
#25 0x0000000000760270 in main (argc=1, argv=0x1343a00) at main.c:209

Analysis:

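  1. The call stack shows clearly that, in the plan tree, each upper operator calls the operator below it: ExecutePlan → ExecMergeJoin → ExecSort → ExecSeqScan, with every hop going through ExecProcNode. Calls travel down the tree, and tuples are pulled back up.
  2. The bottom operator is the Seq Scan (ExecSeqScan). Through ExecScan → ExecScanFetch → SeqNext it calls into the table access method (table_scan_getnextslot → heap_getnextslot → heapgettup_pagemode), fetching one tuple per call.
  3. PostgreSQL's executor is well abstracted: every operator is a node behind the same ExecProcNode interface, so once the overall framework is fixed, each operator only has to implement its own physical execution logic.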
