GBase 8t数据库SQL优化一例

客户的数据库从Oracle迁移到GBase 8t,反应有个SQL语句变得很慢很慢。而在原Oracle环境下,执行只需要10秒钟。
SQL语句如下:

select count(*) 
from TA_DOC doc left join TA_FORM_DATA t 
    on t.docid=doc.id
    and t.elementname = '纸质文件印数' 
where isprint!=1 
    and doc.state!=2 
    and (select count(*) 
         from TA_TODO t 
         where currtype=4 
         and currdeptid=10000086  
         and curruserid is not null 
         and currusername is not null 
         and state=2 
         and mainsend!=4 
         and docid=doc.id)>0

对该语句做sqexplain,发现用时达到80分钟,ta_doc扫了23881次(子查询select count(*) from ta_todo t),每次扫23881行,结果如下:

Estimated Cost: 1561030
Estimated # of Rows Returned: 1
      1) informix.doc: INDEX PATH
            Filters: ((informix.doc.state != 2.0000000000000000 AND informix.doc.isprint != 1.0000000000000000 ) AND  > 0 ) 
        (1) Index Name: informix. 202_417
            Index Keys: id   (Serial, fragments: ALL)
      2) informix.t: INDEX PATH
            Filters: informix.t.elementname = '纸质文件印数' 
        (1) Index Name: informix. 209_457
            Index Keys: docid fromid elementid   (Serial, fragments: ALL)
            Lower Index Filter: informix.t.docid = informix.doc.id 
    ON-Filters:(informix.t.docid = informix.doc.id AND informix.t.elementname = '纸质文件印数' ) 
    NESTED LOOP JOIN(LEFT OUTER JOIN)
    Subquery:
    ---------
    Estimated Cost: 805
    Estimated # of Rows Returned: 1
      1) informix.t: INDEX PATH
            Filters: (((((informix.t.docid = informix.doc.id AND informix.t.currtype = '4' ) AND informix.t.mainsend != 4.0000000000000000 ) AND informix.t.currusername IS NOT NULL ) AND informix.t.curruserid IS NOT NULL ) AND informix.t.currdeptid = 10000086 ) 
        (1) Index Name: informix.fk_todo_state
            Index Keys: state   (Serial, fragments: ALL)
            Lower Index Filter: informix.t.state = '2'
Query statistics:
-----------------
  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                doc
  t2                t
  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     1275       6448      23881      81:32.02   1559152 
  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t2     1692       6810      12560      00:00.04   0       
  type     rows_prod  est_rows  time       est_cost
  -------------------------------------------------
  nljoin   1275       6449      81:32.05   1561031 
  type     rows_prod  est_rows  rows_cons  time
  -------------------------------------------------
  group    1          1         1275       81:32.05

分析:
程序员设计该业务逻辑时,并没有考虑到数据的增长情况,在数据量不断增大的情况下,ta_todo会执行的次数越来越多,越来越影响性能。从sqexplain看,虽然单一步花费的时间很快,但是执行的次数多了(23881次),总体时间就上来了。
于是我们对该查询进行优化,子查询改为临时表操作,改为关联查询,即需要一次ta_doc与ta_form_data join操作,再与ta_todo查询结果的临时表进行关联查询,语句改为如下:

select count(*) 
from 
    (select doc.id as id 
     from TA_DOC doc left join TA_FORM_DATA t 
         on doc.id = t.docid 
         and t.elementname = '纸质文件印数' 
     where isprint!='1' 
         and doc.state!='2'
    ) t1,
    (select docid,count(*) as numofid
     from TA_TODO
     where currtype='4' 
         and currdeptid='10000086'  
         and curruserid is not null 
         and currusername is not null 
         and state='2' 
         and mainsend!='4'
     group by docid
    ) t2
where t1.id = t2.docid

使用sqexplain跟踪,发现有极大的改善:

Estimated Cost: 4796
Estimated # of Rows Returned: 1
  1) (Temp Table For Collection Subquery): SEQUENTIAL SCAN
  2) (Temp Table For Collection Subquery): SEQUENTIAL SCAN
DYNAMIC HASH JOIN 
    Dynamic Hash Filters: (Temp Table For Collection Subquery).id = (Temp Table For Collection Subquery).docid 
Query statistics:
-----------------
  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                doc
  t2                t
  t3                (Temp Table For Collection Subquery)
  t4                ta_todo
  t5                (Temp Table For Collection Subquery)
  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     23768      19344     23881      00:00.04   7357    
  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t2     4728       6810      67219      00:00.25   0       
  type     rows_prod  est_rows  time       est_cost
  -------------------------------------------------
  nljoin   23768      19345     00:00.18   12826   
  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t3     23768      19345     23768      00:00.00   780     
  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t4     1517       1955      2686       00:00.00   806     
  type     rows_prod  est_rows  rows_cons  time       est_cost
  ------------------------------------------------------------
  group    1290       1892      1517       00:00.00   2720    
  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t5     1290       1892      1290       00:00.00   78      
  type     rows_prod  est_rows  rows_bld  rows_prb  novrflo  time       est_cost
  ------------------------------------------------------------------------------
  hjoin    1275       366008    1290      23768     0        00:00.00   4796    
  type     rows_prod  est_rows  rows_cons  time
  -------------------------------------------------
  group    1          1         1275       00:00.00

结果:
将语句耗费的时间从4891秒,提升到0.3秒,效率提升1.6万倍;将同样的SQL优化到原Oracle环境下,也从10几秒,得升到0.3秒,效率也有36倍的提升。
总结:
Oracle的优化器对SQL的优化效率比较高,但也是有优化空间;GBase 8t/Informix的优化器相对保守(严格按照语句的要求进行优化),在SQL语句不怎么高效时,效率会比较慢。

标签: none

添加新评论

Free Web Hosting