Creating Indexes: GaussDB (PostgreSQL) Development Notes

Published: 2025-05-22

1. Viewing Database and Table Information

(1) Check the database version

  • SELECT version();

PostgreSQL 9.2.4 (GaussDB 9.1.0 build 53f3b265) compiled at 2025-03-21 14:49:19 commit 9292 last mr 24793

(2) View table information

  • **\d command**: in the gsql command-line tool, run \d table_name to view a table's basic information, including its columns, data types, and indexes.

  • **\d+ command**: run \d+ table_name for more detail, such as per-column specifics and the table's storage options.

  • **Query the information_schema.columns view**: run SELECT * FROM information_schema.columns WHERE table_name = 'table_name' to list the columns of a table, including column name, data type, and nullability.

  • **Query the pg_tables system view**: run SELECT * FROM pg_tables WHERE tablename = 'table_name' to see basic table information such as the table name, schema, and tablespace.

select count(1) from table_for_test_index;

-- total row count: 4579369

2. Index Management

(1) View indexes

  • **\d command**: in gsql, run \d table_name to view a table's index information.

  • **Query the pg_indexes system view**: run SELECT * FROM pg_indexes WHERE tablename = 'table_name' to view the indexes on a table, including the index name, table name, and index definition.
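As a convenience, pg_indexes can be combined with the size functions inherited from PostgreSQL to see how much disk space each index uses. A minimal sketch, assuming pg_relation_size and pg_size_pretty are available in your GaussDB version:

```sql
-- List each index on the table with its definition and on-disk size.
-- Qualify with schemaname so the regclass cast works outside search_path.
SELECT indexname,
       pg_size_pretty(pg_relation_size((schemaname || '.' || indexname)::regclass)) AS index_size,
       indexdef
FROM   pg_indexes
WHERE  tablename = 'table_for_test_index';
```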

(2) Create indexes

  • Create a regular index: CREATE INDEX index_name ON table_name (column_name);

  • Create a unique index: CREATE UNIQUE INDEX index_name ON table_name (column_name);

3. Query Optimization

(1) Execution plan analysis

  • **Use EXPLAIN ANALYZE**: view a query's execution plan to analyze its performance. Key metrics include each operator's actual execution time (A-time), actual rows returned (A-rows), estimated rows (E-rows), and peak memory use (Peak Memory).
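A minimal sketch of using this in practice, checking one predicate against the test table:

```sql
-- Compare the planner's estimates against reality for a single filter.
EXPLAIN ANALYZE
SELECT count(*)
FROM   table_for_test_index
WHERE  confirm_first_time >= '2025-02-28';
-- In the output, a large gap between E-rows and A-rows usually means
-- stale statistics: run ANALYZE table_for_test_index; and re-check.
```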

(2) Optimization suggestions

  • **Add indexes**: create indexes on the columns used in query predicates to speed up filtering and sorting.

  • **Optimize the SQL**: write predicates in a form the database can match against existing indexes.
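One common pitfall worth sketching here: a composite index is only used efficiently when the query filters on its leading column. Illustrated with the scenario-4 index defined later in this article:

```sql
-- Given: CREATE INDEX idx_confirm_first_time_team_id
--        ON table_for_test_index (confirm_first_time, team_id);

-- Can use the index: the predicate covers the leading column.
SELECT * FROM table_for_test_index
WHERE  confirm_first_time >= '2025-02-28' AND team_id = 1122;

-- Cannot use the index efficiently: the leading column is skipped,
-- so this typically falls back to a full scan.
SELECT * FROM table_for_test_index
WHERE  team_id = 1122;
```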

  • **Maintain indexes regularly**: periodically run VACUUM and ANALYZE on the table to keep indexes compact and statistics accurate.

4. Database Maintenance

(1) VACUUM

  • **Purpose**: reclaims the space left behind by DELETE and UPDATE operations and refreshes visibility information, reducing wasted disk space and improving query performance.

  • **Usage**: VACUUM [VERBOSE] [table_name];

  • **Example**: VACUUM VERBOSE table_for_test_index;

(2) ANALYZE

  • **Purpose**: collects table statistics so the query planner can generate better execution plans.

  • **Usage**: ANALYZE [VERBOSE] [table_name];

  • **Example**: ANALYZE VERBOSE table_for_test_index;

(3) Combined command

  • **VACUUM ANALYZE**: runs VACUUM to reclaim space, then ANALYZE to refresh statistics.

  • **Example**: VACUUM ANALYZE activity;

5. Case Study

create table table_for_test_index
(
    data_id              bigint,
    confirm_first_time   timestamp,
    inspect_status       text,
    item_name            varchar(4000),
    item_content         varchar(4000),
    item_result          text,
    province_code        varchar(400),
    province_name        varchar(400),
    area_code            varchar(400),
    area_name            varchar(400),
    sub_dealer_id        integer,
    sub_dealer_name      varchar(400),
    team_id              integer,
    team_name            varchar(400)
);

-- Planned indexes
-- Scenario 2, single-column index:
CREATE INDEX idx_table_for_test_index_confirm_first_time ON table_for_test_index (confirm_first_time);
-- Scenario 4, composite index:
CREATE INDEX idx_confirm_first_time_team_id ON table_for_test_index (confirm_first_time, team_id);

-- Target SQL

explain analyse
 select order_no,
        item_name,
        item_content,
        item_result,
        province_name,
        area_name,
        dealer_name,
        dealer_team_name,
        supo_dealer_manager_name,
        to_char(fst_pass_time, 'YYYY-MM-DD') fst_pass_time,
        to_char(confirm_first_time, 'YYYY-MM-DD') as            confirm_first_time
 from table_for_test_index
 where 1 = 1
   AND confirm_first_time >= '2025-02-28T00:00'
   and '2025-05-22T23:59:59.999999999' >= confirm_first_time
   and item_name in ('固定', '焊接', '线缆', '阴影')
   and item_result in ('不合格')
   and area_code in
       ('aa','cc','vv','ss')
   and province_code in ('a')
   and sub_dealer_id in (11)
   and team_id in (1122)
 order by data_id
 LIMIT 10

(1) Scenario 1: no index

 id |                         operation                          |        A-time        | A-rows | E-rows |  Peak Memory   | E-memory | A-width | E-width |  E-costs  
----+------------------------------------------------------------+----------------------+--------+--------+----------------+----------+---------+---------+-----------
  1 | ->  Limit                                                  | 1448.363             |      1 |     10 | 8KB            |          |         |     307 | 136675.50 
  2 |    ->  Streaming (type: GATHER)                            | 1448.360             |      1 |     30 | 160KB          |          |         |     307 | 136679.86 
  3 |       ->  Limit                                            | [1396.774, 1425.354] |      1 |     30 | [8KB, 8KB]     | 1MB      |         |     307 | 136671.86 
  4 |          ->  Sort                                          | [1396.769, 1425.350] |      1 |    663 | [40KB, 40KB]   | 16MB     | [0,360] |     307 | 136672.38 
  5 |             ->  Subquery Scan on table_for_test_index      | [1396.596, 1425.256] |      1 |    662 | [24KB, 32KB]   | 1MB      |         |     307 | 136667.05 
  6 |                ->  Hash Semi Join (7, 8)                   | [1396.594, 1425.183] |      1 |    662 | [24KB, 32KB]   | 1MB      |         |     307 | 136665.94 
  7 |                   ->  Seq Scan on table_for_test_index     | [1396.586, 1424.857] |      1 |    662 | [80KB, 80KB]   | 1MB      |         |     314 | 136662.74 
  8 |                   ->  Hash                                 | [0.076, 0.076]       |     88 |    264 | [296KB, 296KB] | 16MB     | [23,23] |      32 | 1.10      
  9 |                      ->  Values Scan on "*VALUES*"         | [0.031, 0.031]       |     88 |    264 | [16KB, 16KB]   | 1MB      |         |      32 | 1.10      

            Predicate Information (identified by plan id)            
---------------------------------------------------------------------
  6 --Hash Semi Join (7, 8)
        Hash Cond: ((table_for_test_index.area_code)::text = "*VALUES*".column1)
  7 --Seq Scan on table_for_test_index
        Filter: ((confirm_first_time >= '2025-02-28 00:00:00'::timestamp without time zone) AND ('2025-05-23 00:00:00'::timestamp without time zone >= confirm_first_time) AND (item_result = '不合格'::text) AND ((province_code)::text = '45'::text) AND (sub_dealer_id = 1077) AND (team_id = 1122) AND ((item_name)::text = ANY ('{压块固定问题,焊接防腐,线缆虚接,阴影遮挡}'::text[])))
        Rows Removed by Filter: 4579368

           Memory Information (identified by plan id)            
-----------------------------------------------------------------
Coordinator Query Peak Memory:
        Query Peak Memory: 0MB
Datanode:
        Max Query Peak Memory: 0MB
        Min Query Peak Memory: 0MB
  4 --Sort
        Sort Method: quicksort  Memory: 25kB ~ 25kB
  8 --Hash
        Max Buckets: 32768  Max Batches: 1  Max Memory Usage: 4kB
        Min Buckets: 0  Min Batches: 0  Min Memory Usage: 0kB

                                 User Define Profiling                                  
----------------------------------------------------------------------------------------
Plan Node id: 2  Track name: coordinator get datanode connection
  (actual time=[9.030, 9.030], calls=[1, 1])
Plan Node id: 2  Track name: coordinator begin transaction
  (actual time=[0.002, 0.002], calls=[1, 1])
Plan Node id: 2  Track name: coordinator send command
  (actual time=[0.091, 0.091], calls=[3, 3])
Plan Node id: 2  Track name: coordinator get the first tuple
  (actual time=[37.865, 37.865], calls=[1, 1])
Plan Node id: 7  Track name: heap scan read buffer
  (sample time=[72.655, 77.479], estimated time=[726.567, 774.834], calls=[87041, 87065])

                            ====== Query Summary =====                            
----------------------------------------------------------------------------------
Datanode executor start time [dn_6005_6006, dn_6001_6002]: [0.358 ms, 0.399 ms]
Datanode executor run time [dn_6005_6006, dn_6003_6004]: [1398.015 ms, 1435.453 ms]
Datanode executor end time [dn_6003_6004, dn_6001_6002]: [0.031 ms, 0.037 ms]
System available mem: 4546560KB
Query Max mem: 4546560KB
Query estimated mem: 32768KB
Enqueue time: 34.512 ms
Coordinator executor start time: 0.203 ms
Coordinator executor run time: 1450.309 ms
Coordinator executor end time: 0.041 ms
Parser runtime: 0.000 ms
Planner runtime: 2.931 ms
Query Id: 73464969046324330
Unique SQL Id: 2238842644
Unique SQL Hash: sql_07acf26f0590c5ec805e5875efec3870
Total runtime: 1488.180 ms

  • **Execution time**: total runtime is 1488.180 ms.

  • **Main operators**:

    • **Seq Scan**: with no index, the database must scan the whole table sequentially to find matching rows, so a huge number of rows are read and discarded (Rows Removed by Filter: 4579368); this accounts for most of the query time.

    • **Hash Semi Join**: joins two data sets, one from the sequential scan and one from a Values Scan; the join condition is on area_code.

    • **Sort**: orders the result set by data_id using quicksort, with 25kB of memory.

    • **Limit**: caps the result at 10 rows, reducing the amount of data returned.

  • **Diagnosis**: the sequential scan dominates the cost; a large volume of data is scanned and then filtered out, wasting most of the execution time.

(2) Scenario 2: add an index on confirm_first_time

 id |                             operation                              |       A-time       | A-rows  | E-rows  |  Peak Memory   | E-memory | A-width | E-width |  E-costs  
----+--------------------------------------------------------------------+--------------------+---------+---------+----------------+----------+---------+---------+-----------
  1 | ->  Limit                                                          | 636.978            |       1 |      10 | 8KB            |          |         |     307 | 102387.55 
  2 |    ->  Streaming (type: GATHER)                                    | 636.974            |       1 |      30 | 160KB          |          |         |     307 | 102391.90 
  3 |       ->  Limit                                                    | [614.182, 626.947] |       1 |      30 | [8KB, 8KB]     | 1MB      |         |     307 | 102383.90 
  4 |          ->  Sort                                                  | [614.180, 626.944] |       1 |     663 | [40KB, 40KB]   | 16MB     | [0,360] |     307 | 102384.43 
  5 |             ->  Subquery Scan on table_for_test_index              | [614.131, 626.898] |       1 |     662 | [24KB, 32KB]   | 1MB      |         |     307 | 102379.10 
  6 |                ->  Hash Semi Join (7, 9)                           | [614.128, 626.895] |       1 |     662 | [24KB, 32KB]   | 1MB      |         |     307 | 102377.99 
  7 |                   ->  Bitmap Heap Scan on table_for_test_index     | [613.883, 626.598] |       1 |     662 | [72KB, 72KB]   | 1MB      |         |     314 | 102374.79 
  8 |                      ->  Bitmap Index Scan                         | [69.465, 75.358]   | 1082697 | 1081447 | [4MB, 4MB]     | 1MB      |         |       0 | 3609.07   
  9 |                   ->  Hash                                         | [0.071, 0.083]     |     264 |     264 | [296KB, 296KB] | 16MB     | [23,23] |      32 | 1.10      
 10 |                      ->  Values Scan on "*VALUES*"                 | [0.027, 0.045]     |     264 |     264 | [16KB, 16KB]   | 1MB      |         |      32 | 1.10      

                                Predicate Information (identified by plan id)                                
-------------------------------------------------------------------------------------------------------------
  6 --Hash Semi Join (7, 9)
        Hash Cond: ((table_for_test_index.area_code)::text = "*VALUES*".column1)
  7 --Bitmap Heap Scan on table_for_test_index
        Recheck Cond: ((confirm_first_time >= '2025-02-28 00:00:00'::timestamp without time zone) AND ('2025-05-23 00:00:00'::timestamp without time zone >= confirm_first_time))
        Filter: ((item_result = '不合格'::text) AND ((province_code)::text = '45'::text) AND (sub_dealer_id = 1077) AND (team_id = 1122) AND ((item_name)::text = ANY ('{压块,焊接,线缆,阴影}'::text[])))
        Rows Removed by Filter: 1082696
  8 --Bitmap Index Scan
        Index Cond: ((confirm_first_time >= '2025-02-28 00:00:00'::timestamp without time zone) AND ('2025-05-23 00:00:00'::timestamp without time zone >= confirm_first_time))

           Memory Information (identified by plan id)            
-----------------------------------------------------------------
Coordinator Query Peak Memory:
        Query Peak Memory: 2MB
Datanode:
        Max Query Peak Memory: 5MB
        Min Query Peak Memory: 5MB
  4 --Sort
        Sort Method: quicksort  Memory: 25kB ~ 25kB
  9 --Hash
        Max Buckets: 32768  Max Batches: 1  Max Memory Usage: 4kB
        Min Buckets: 32768  Min Batches: 1  Min Memory Usage: 4kB

                                 User Define Profiling                                  
----------------------------------------------------------------------------------------
Plan Node id: 2  Track name: coordinator get datanode connection
  (actual time=[5.730, 5.730], calls=[1, 1])
Plan Node id: 2  Track name: coordinator begin transaction
  (actual time=[0.003, 0.003], calls=[1, 1])
Plan Node id: 2  Track name: coordinator send command
  (actual time=[0.141, 0.141], calls=[3, 3])
Plan Node id: 2  Track name: coordinator get the first tuple
  (actual time=[13.444, 13.444], calls=[1, 1])
Plan Node id: 8  Track name: bitmap heap scan read buffer
  (sample time=[35.496, 36.298], estimated time=[355.004, 363.007], calls=[40175, 40206])

                           ====== Query Summary =====                           
--------------------------------------------------------------------------------
Datanode executor start time [dn_6005_6006, dn_6003_6004]: [0.413 ms, 0.486 ms]
Datanode executor run time [dn_6005_6006, dn_6003_6004]: [614.189 ms, 626.956 ms]
Datanode executor end time [dn_6005_6006, dn_6001_6002]: [0.054 ms, 0.156 ms]
System available mem: 4546560KB
Query Max mem: 4546560KB
Query estimated mem: 32768KB
Enqueue time: 5.010 ms
Coordinator executor start time: 0.317 ms
Coordinator executor run time: 637.256 ms
Coordinator executor end time: 0.043 ms
Parser runtime: 0.000 ms
Planner runtime: 4.348 ms
Query Id: 73464969046932357
Unique SQL Id: 2238842644
Unique SQL Hash: sql_07acf26f0590c5ec805e5875efec3870
Total runtime: 647.191 ms

  • **Execution time**: total runtime is 647.191 ms.

  • **Main operators**:

    • **Bitmap Index Scan**: uses the confirm_first_time index to locate candidate rows quickly. It returned 1082697 rows in [69.465, 75.358] ms, greatly reducing the amount of data to process compared with a sequential scan.

    • **Bitmap Heap Scan**: fetches the full rows identified by the bitmap index scan and applies the remaining filters (item_result, province_code, sub_dealer_id, team_id, item_name).

    • **Sort**: orders the result set by data_id using quicksort, with 25kB of memory.

    • **Limit**: caps the result at 10 rows, reducing the amount of data returned.

  • **Effect**: compared with the unindexed run, query time drops by more than half, showing that the index delivers a significant performance gain.

(3) Scenario 3: indexes on confirm_first_time and data_id

 id |                             operation                              |       A-time       | A-rows  | E-rows  |  Peak Memory   | E-memory | A-width | E-width |  E-costs  
----+--------------------------------------------------------------------+--------------------+---------+---------+----------------+----------+---------+---------+-----------
  1 | ->  Limit                                                          | 628.006            |       1 |      10 | 8KB            |          |         |     307 | 102387.55 
  2 |    ->  Streaming (type: GATHER)                                    | 628.002            |       1 |      30 | 160KB          |          |         |     307 | 102391.90 
  3 |       ->  Limit                                                    | [610.628, 617.545] |       1 |      30 | [8KB, 8KB]     | 1MB      |         |     307 | 102383.90 
  4 |          ->  Sort                                                  | [610.625, 617.543] |       1 |     663 | [40KB, 40KB]   | 16MB     | [0,360] |     307 | 102384.43 
  5 |             ->  Subquery Scan on table_for_test_index              | [610.567, 617.491] |       1 |     662 | [24KB, 32KB]   | 1MB      |         |     307 | 102379.10 
  6 |                ->  Hash Semi Join (7, 9)                           | [610.491, 617.487] |       1 |     662 | [24KB, 32KB]   | 1MB      |         |     307 | 102377.99 
  7 |                   ->  Bitmap Heap Scan on table_for_test_index     | [610.248, 617.215] |       1 |     662 | [72KB, 72KB]   | 1MB      |         |     314 | 102374.79 
  8 |                      ->  Bitmap Index Scan                         | [68.714, 82.117]   | 1082697 | 1081447 | [4MB, 4MB]     | 1MB      |         |       0 | 3609.07   
  9 |                   ->  Hash                                         | [0.071, 0.074]     |     264 |     264 | [296KB, 296KB] | 16MB     | [23,23] |      32 | 1.10      
 10 |                      ->  Values Scan on "*VALUES*"                 | [0.024, 0.028]     |     264 |     264 | [16KB, 16KB]   | 1MB      |         |      32 | 1.10      

                                Predicate Information (identified by plan id)                                
-------------------------------------------------------------------------------------------------------------
  6 --Hash Semi Join (7, 9)
        Hash Cond: ((table_for_test_index.area_code)::text = "*VALUES*".column1)
  7 --Bitmap Heap Scan on table_for_test_index
        Recheck Cond: ((confirm_first_time >= '2025-02-28 00:00:00'::timestamp without time zone) AND ('2025-05-23 00:00:00'::timestamp without time zone >= confirm_first_time))
        Filter: ((item_result = '不合格'::text) AND ((province_code)::text = '45'::text) AND (sub_dealer_id = 1077) AND (team_id = 1122) AND ((item_name)::text = ANY ('{压块,焊接,线缆,阴影}'::text[])))
        Rows Removed by Filter: 1082696
  8 --Bitmap Index Scan
        Index Cond: ((confirm_first_time >= '2025-02-28 00:00:00'::timestamp without time zone) AND ('2025-05-23 00:00:00'::timestamp without time zone >= confirm_first_time))

           Memory Information (identified by plan id)            
-----------------------------------------------------------------
Coordinator Query Peak Memory:
        Query Peak Memory: 0MB
Datanode:
        Max Query Peak Memory: 5MB
        Min Query Peak Memory: 5MB
  4 --Sort
        Sort Method: quicksort  Memory: 25kB ~ 25kB
  9 --Hash
        Max Buckets: 32768  Max Batches: 1  Max Memory Usage: 4kB
        Min Buckets: 32768  Min Batches: 1  Min Memory Usage: 4kB

                                 User Define Profiling                                  
----------------------------------------------------------------------------------------
Plan Node id: 2  Track name: coordinator get datanode connection
  (actual time=[5.962, 5.962], calls=[1, 1])
Plan Node id: 2  Track name: coordinator begin transaction
  (actual time=[0.002, 0.002], calls=[1, 1])
Plan Node id: 2  Track name: coordinator send command
  (actual time=[0.095, 0.095], calls=[3, 3])
Plan Node id: 2  Track name: coordinator get the first tuple
  (actual time=[7.891, 7.891], calls=[1, 1])
Plan Node id: 8  Track name: bitmap heap scan read buffer
  (sample time=[34.368, 35.879], estimated time=[343.731, 358.835], calls=[40175, 40206])

                           ====== Query Summary =====                           
--------------------------------------------------------------------------------
Datanode executor start time [dn_6005_6006, dn_6003_6004]: [0.504 ms, 0.547 ms]
Datanode executor run time [dn_6001_6002, dn_6003_6004]: [610.664 ms, 617.553 ms]
Datanode executor end time [dn_6005_6006, dn_6001_6002]: [0.051 ms, 0.067 ms]
System available mem: 4546560KB
Query Max mem: 4546560KB
Query estimated mem: 32768KB
Enqueue time: 4.886 ms
Coordinator executor start time: 0.191 ms
Coordinator executor run time: 628.284 ms
Coordinator executor end time: 0.045 ms
Parser runtime: 0.000 ms
Planner runtime: 3.145 ms
Query Id: 73464969047523501
Unique SQL Id: 2238842644
Unique SQL Hash: sql_07acf26f0590c5ec805e5875efec3870
Total runtime: 636.740 ms

  • **Execution time**: total runtime is 636.740 ms.

  • **Main operators**:

    • **Bitmap Index Scan**: uses the confirm_first_time index to locate candidate rows quickly, taking [68.714, 82.117] ms.

    • **Bitmap Heap Scan**: fetches the full rows identified by the bitmap index scan and applies the remaining filters.

    • **Sort**: orders the result set by data_id using quicksort, with 25kB of memory. Because data_id is indexed, the sort may benefit from that index, slightly reducing sort time.

    • **Limit**: caps the result at 10 rows, reducing the amount of data returned.

  • **Effect**: compared with Scenario 2, total execution time drops slightly, suggesting the data_id index gives the sort a modest boost and further improves query efficiency.

(4) Scenario 4: composite index on confirm_first_time and team_id

 id |                             operation                              |      A-time      | A-rows | E-rows |  Peak Memory   | E-memory | A-width | E-width | E-costs 
----+--------------------------------------------------------------------+------------------+--------+--------+----------------+----------+---------+---------+---------
  1 | ->  Limit                                                          | 28.507           |      1 |     10 | 8KB            |          |         |     307 | 5987.59 
  2 |    ->  Streaming (type: GATHER)                                    | 28.504           |      1 |     30 | 96KB           |          |         |     307 | 5991.94 
  3 |       ->  Limit                                                    | [22.979, 24.620] |      1 |     30 | [8KB, 8KB]     | 1MB      |         |     307 | 5983.94 
  4 |          ->  Sort                                                  | [22.977, 24.616] |      1 |    663 | [40KB, 40KB]   | 16MB     | [0,360] |     307 | 5984.47 
  5 |             ->  Subquery Scan on table_for_test_index              | [22.968, 24.602] |      1 |    662 | [24KB, 32KB]   | 1MB      |         |     307 | 5979.14 
  6 |                ->  Hash Semi Join (7, 9)                           | [22.966, 24.584] |      1 |    662 | [24KB, 32KB]   | 1MB      |         |     307 | 5978.02 
  7 |                   ->  Bitmap Heap Scan on table_for_test_index     | [22.859, 24.464] |      1 |    662 | [72KB, 72KB]   | 1MB      |         |     314 | 5974.83 
  8 |                      ->  Bitmap Index Scan                         | [22.762, 24.362] |     67 |   1145 | [32KB, 32KB]   | 1MB      |         |       0 | 4510.28 
  9 |                   ->  Hash                                         | [0.064, 0.075]   |    264 |    264 | [296KB, 296KB] | 16MB     | [23,23] |      32 | 1.10    
 10 |                      ->  Values Scan on "*VALUES*"                 | [0.023, 0.033]   |    264 |    264 | [16KB, 16KB]   | 1MB      |         |      32 | 1.10    

                                Predicate Information (identified by plan id)                                
-------------------------------------------------------------------------------------------------------------
  6 --Hash Semi Join (7, 9)
        Hash Cond: ((table_for_test_index.area_code)::text = "*VALUES*".column1)
  7 --Bitmap Heap Scan on table_for_test_index
        Recheck Cond: ((confirm_first_time >= '2025-02-28 00:00:00'::timestamp without time zone) AND ('2025-05-23 00:00:00'::timestamp without time zone >= confirm_first_time) AND (team_id = 1122))
        Filter: ((item_result = '不合格'::text) AND ((province_code)::text = '45'::text) AND (sub_dealer_id = 1077) AND ((item_name)::text = ANY ('{压块,焊接,线缆,阴影}'::text[])))
        Rows Removed by Filter: 66
  8 --Bitmap Index Scan
        Index Cond: ((confirm_first_time >= '2025-02-28 00:00:00'::timestamp without time zone) AND ('2025-05-23 00:00:00'::timestamp without time zone >= confirm_first_time) AND (team_id = 1122))

           Memory Information (identified by plan id)            
-----------------------------------------------------------------
Coordinator Query Peak Memory:
        Query Peak Memory: 0MB
Datanode:
        Max Query Peak Memory: 0MB
        Min Query Peak Memory: 0MB
  4 --Sort
        Sort Method: quicksort  Memory: 25kB ~ 25kB
  9 --Hash
        Max Buckets: 32768  Max Batches: 1  Max Memory Usage: 4kB
        Min Buckets: 32768  Min Batches: 1  Min Memory Usage: 4kB

                            User Define Profiling                            
----------------------------------------------------------------------------
Plan Node id: 2  Track name: coordinator get datanode connection
  (actual time=[0.012, 0.012], calls=[1, 1])
Plan Node id: 2  Track name: coordinator begin transaction
  (actual time=[0.003, 0.003], calls=[1, 1])
Plan Node id: 2  Track name: coordinator send command
  (actual time=[0.097, 0.097], calls=[3, 3])
Plan Node id: 2  Track name: coordinator get the first tuple
  (actual time=[2.296, 2.296], calls=[1, 1])
Plan Node id: 8  Track name: bitmap heap scan read buffer
  (sample time=[0.000, 0.001], estimated time=[0.000, 0.011], calls=[21, 22])

                          ====== Query Summary =====                          
------------------------------------------------------------------------------
Datanode executor start time [dn_6001_6002, dn_6005_6006]: [0.107 ms, 0.128 ms]
Datanode executor run time [dn_6005_6006, dn_6001_6002]: [22.986 ms, 24.641 ms]
Datanode executor end time [dn_6001_6002, dn_6003_6004]: [0.041 ms, 0.045 ms]
System available mem: 4546560KB
Query Max mem: 4546560KB
Query estimated mem: 4362KB
Enqueue time: 0.055 ms
Coordinator executor start time: 0.229 ms
Coordinator executor run time: 28.775 ms
Coordinator executor end time: 0.045 ms
Parser runtime: 0.000 ms
Planner runtime: 3.295 ms
Query Id: 73464969050955362
Unique SQL Id: 2238842644
Unique SQL Hash: sql_07acf26f0590c5ec805e5875efec3870
Total runtime: 32.631 ms

Creating the composite index:

CREATE INDEX idx_confirm_first_time_team_id ON table_for_test_index (confirm_first_time, team_id);

  • **Execution time**: total runtime drops to 32.631 ms, a dramatic improvement over the unindexed 1488.180 ms and also better than the single-index scenarios.

  • **Main operators**:

    • **Bitmap Index Scan**: uses the composite index on (confirm_first_time, team_id) to locate matching rows quickly, taking [22.762, 24.362] ms.

    • **Bitmap Heap Scan**: fetches the full rows identified by the bitmap index scan and applies the remaining filters.

    • **Sort**: orders the result set by data_id using quicksort, with 25kB of memory.

    • **Limit**: caps the result at 10 rows, reducing the amount of data returned.

  • **Effect**: the composite index sharply reduces the amount of data scanned, cutting query time dramatically; choosing the right index has a major impact on performance.

(5) Comparing the four scenarios

In GaussDB (PostgreSQL), query performance varies with whether indexes exist and which kind is used. A summary of the four scenarios:

Scenario 1: no index
  • Characteristics: the query requires a full table scan, checking every row and discarding those that fail the predicates.

  • Performance: poor on large tables, with a long execution time: 1488.180 ms in this case.

Scenario 2: single-column index on confirm_first_time
  • Characteristics: the index quickly narrows the scan to the confirm_first_time range, reducing the data to read.

  • Performance: a significant improvement: 647.191 ms in this case.

Scenario 3: separate single-column indexes on confirm_first_time and data_id
  • Characteristics: the confirm_first_time index filters the time range; the data_id index helps the sort.

  • Performance: slightly better than Scenario 2: 636.740 ms in this case.

Scenario 4: composite index on (confirm_first_time, team_id)
  • Characteristics: the composite index applies both columns' predicates during the index scan, minimizing the data read, without the cost of maintaining two separate single-column indexes.

  • Performance: by far the best: 32.631 ms in this case.
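For reference, the index DDL behind the scenarios can be collected in one place. A sketch, noting that the article never shows the exact statement for the scenario-3 data_id index, so that index name is hypothetical:

```sql
-- Scenario 2: single-column index on the time filter.
CREATE INDEX idx_table_for_test_index_confirm_first_time
    ON table_for_test_index (confirm_first_time);

-- Scenario 3: adds a separate index on the sort column
-- (index name is illustrative; the exact DDL is not shown above).
CREATE INDEX idx_table_for_test_index_data_id
    ON table_for_test_index (data_id);

-- Scenario 4: composite index covering both the time filter and team_id.
CREATE INDEX idx_confirm_first_time_team_id
    ON table_for_test_index (confirm_first_time, team_id);
```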

6. Problems You May Hit When Adding Indexes, and Solutions

(1) Insufficient privileges

  • **Symptom**: index creation fails with an error saying the user lacks permission to modify the table structure.

  • **Fix**: make sure the current user has the ALTER privilege on the table. It can be granted with:

    • Grant privilege: GRANT ALTER ON TABLE table_name TO username;

(2) Table-structure conflicts

  • **Symptom**: on a distributed table (here, a round-robin distributed table), creating a primary key or unique index fails:

[2025-05-20 15:20:45] [0A000] ERROR: Round-Robin table does not support PrimaryKey/UniqueIndex

  • **Fix**: in GaussDB (PostgreSQL), round-robin distributed tables do not support primary keys or unique indexes. Create a regular index instead to improve query performance:

    • Create a regular index: CREATE INDEX idx_column ON table_name (column_name);

(3) Index creation failures

  • **Symptom**: index creation fails with an error such as a duplicate index name or an unsupported column type.

  • **Fix**:

    • Check the index name: make sure it is not already in use.

    • Check the column type: make sure the column type supports indexing; for unsupported types, consider an expression (function-based) index or changing the column type.
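A minimal sketch of the expression-index workaround mentioned above, using standard PostgreSQL syntax (support for a given expression in GaussDB is an assumption):

```sql
-- Index an expression rather than a raw column, useful when queries
-- always filter on a transformed value.
CREATE INDEX idx_area_code_lower
    ON table_for_test_index (lower(area_code));

-- A query can only use this index if it applies the same expression:
SELECT * FROM table_for_test_index
WHERE  lower(area_code) = 'aa';
```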

(4) Performance impact

  • **Symptom**: building an index can hurt the table's write performance; on large tables the build can take a long time and consume significant disk space.

  • **Fix**:

    • Pick the right time: build indexes during off-peak hours to minimize the impact on the business.

    • Build concurrently: use the CONCURRENTLY keyword to reduce write locking during the build. For example:

      • Concurrent build: CREATE INDEX CONCURRENTLY idx_column ON table_name (column_name);

(5) Index maintenance cost

  • **Symptom**: indexes add overhead to INSERT, UPDATE, and DELETE operations, because every data change must also update the indexes.

  • **Fix**:

    • Weigh the number of indexes: create only the indexes your queries need; avoid over-indexing.

    • Maintain indexes regularly: for example, rebuild a degraded index to restore performance:

      • Rebuild an index: REINDEX INDEX index_name;

7. Summary

  1. **Run VACUUM and ANALYZE regularly** to keep performance and query planning in good shape: VACUUM reclaims dead space, and ANALYZE refreshes statistics so the planner can generate better execution plans.

  2. **Create indexes deliberately**: index the columns your queries filter and sort on. Indexing predicate columns (such as confirm_first_time) and sort columns (such as data_id) can dramatically improve query performance.

  3. **Analyze execution plans**: use EXPLAIN ANALYZE to find bottlenecks and target your optimizations. Watch each operator's execution time, row counts, and memory use.

  4. **Handle index-creation issues**: watch for privilege problems, table-structure conflicts, and failed builds, and apply the corresponding fixes. Maintain indexes periodically and balance their number against write overhead.

This covers the key steps and techniques for development and optimization in GaussDB (PostgreSQL), along with the problems you may hit when adding indexes and how to solve them; hopefully it is useful in day-to-day work.

