GBase 8t数据库SQL优化一例
客户的数据库从Oracle迁移到GBase 8t,反应有个SQL语句变得很慢很慢。而在原Oracle环境下,执行只需要10秒钟。
SQL语句如下:
select count(*)
from TA_DOC doc left join TA_FORM_DATA t
on t.docid=doc.id
and t.elementname = '纸质文件印数'
where isprint!=1
and doc.state!=2
and (select count(*)
from TA_TODO t
where currtype=4
and currdeptid=10000086
and curruserid is not null
and currusername is not null
and state=2
and mainsend!=4
and docid=doc.id)>0
对该语句做sqexplain,发现用时达到80分钟,ta_doc扫了23881次(子查询select count(*) from ta_todo t),每次扫23881行,结果如下:
Estimated Cost: 1561030
Estimated # of Rows Returned: 1
1) informix.doc: INDEX PATH
Filters: ((informix.doc.state != 2.0000000000000000 AND informix.doc.isprint != 1.0000000000000000 ) AND > 0 )
(1) Index Name: informix. 202_417
Index Keys: id (Serial, fragments: ALL)
2) informix.t: INDEX PATH
Filters: informix.t.elementname = '纸质文件印数'
(1) Index Name: informix. 209_457
Index Keys: docid fromid elementid (Serial, fragments: ALL)
Lower Index Filter: informix.t.docid = informix.doc.id
ON-Filters:(informix.t.docid = informix.doc.id AND informix.t.elementname = '纸质文件印数' )
NESTED LOOP JOIN(LEFT OUTER JOIN)
Subquery:
---------
Estimated Cost: 805
Estimated # of Rows Returned: 1
1) informix.t: INDEX PATH
Filters: (((((informix.t.docid = informix.doc.id AND informix.t.currtype = '4' ) AND informix.t.mainsend != 4.0000000000000000 ) AND informix.t.currusername IS NOT NULL ) AND informix.t.curruserid IS NOT NULL ) AND informix.t.currdeptid = 10000086 )
(1) Index Name: informix.fk_todo_state
Index Keys: state (Serial, fragments: ALL)
Lower Index Filter: informix.t.state = '2'
Query statistics:
-----------------
Table map :
----------------------------
Internal name Table name
----------------------------
t1 doc
t2 t
type table rows_prod est_rows rows_scan time est_cost
-------------------------------------------------------------------
scan t1 1275 6448 23881 81:32.02 1559152
type table rows_prod est_rows rows_scan time est_cost
-------------------------------------------------------------------
scan t2 1692 6810 12560 00:00.04 0
type rows_prod est_rows time est_cost
-------------------------------------------------
nljoin 1275 6449 81:32.05 1561031
type rows_prod est_rows rows_cons time
-------------------------------------------------
group 1 1 1275 81:32.05
分析:
程序员设计该业务逻辑时,并没有考虑到数据的增长情况,在数据量不断增大的情况下,ta_todo会执行的次数越来越多,越来越影响性能。从sqexplain看,虽然单一步花费的时间很快,但是执行的次数多了(23881次),总体时间就上来了。
于是我们对该查询进行优化,子查询改为临时表操作,改为关联查询,即需要一次ta_doc与ta_form_data join操作,再与ta_todo查询结果的临时表进行关联查询,语句改为如下:
select count(*)
from
(select doc.id as id
from TA_DOC doc left join TA_FORM_DATA t
on doc.id = t.docid
and t.elementname = '纸质文件印数'
where isprint!='1'
and doc.state!='2'
) t1,
(select docid,count(*) as numofid
from TA_TODO
where currtype='4'
and currdeptid='10000086'
and curruserid is not null
and currusername is not null
and state='2'
and mainsend!='4'
group by docid
) t2
where t1.id = t2.docid
使用sqexplain跟踪,发现有极大的改善:
Estimated Cost: 4796
Estimated # of Rows Returned: 1
1) (Temp Table For Collection Subquery): SEQUENTIAL SCAN
2) (Temp Table For Collection Subquery): SEQUENTIAL SCAN
DYNAMIC HASH JOIN
Dynamic Hash Filters: (Temp Table For Collection Subquery).id = (Temp Table For Collection Subquery).docid
Query statistics:
-----------------
Table map :
----------------------------
Internal name Table name
----------------------------
t1 doc
t2 t
t3 (Temp Table For Collection Subquery)
t4 ta_todo
t5 (Temp Table For Collection Subquery)
type table rows_prod est_rows rows_scan time est_cost
-------------------------------------------------------------------
scan t1 23768 19344 23881 00:00.04 7357
type table rows_prod est_rows rows_scan time est_cost
-------------------------------------------------------------------
scan t2 4728 6810 67219 00:00.25 0
type rows_prod est_rows time est_cost
-------------------------------------------------
nljoin 23768 19345 00:00.18 12826
type table rows_prod est_rows rows_scan time est_cost
-------------------------------------------------------------------
scan t3 23768 19345 23768 00:00.00 780
type table rows_prod est_rows rows_scan time est_cost
-------------------------------------------------------------------
scan t4 1517 1955 2686 00:00.00 806
type rows_prod est_rows rows_cons time est_cost
------------------------------------------------------------
group 1290 1892 1517 00:00.00 2720
type table rows_prod est_rows rows_scan time est_cost
-------------------------------------------------------------------
scan t5 1290 1892 1290 00:00.00 78
type rows_prod est_rows rows_bld rows_prb novrflo time est_cost
------------------------------------------------------------------------------
hjoin 1275 366008 1290 23768 0 00:00.00 4796
type rows_prod est_rows rows_cons time
-------------------------------------------------
group 1 1 1275 00:00.00
结果:
将语句耗费的时间从4891秒,提升到0.3秒,效率提升1.6万倍;将同样的SQL优化到原Oracle环境下,也从10几秒,得升到0.3秒,效率也有36倍的提升。
总结:
Oracle的优化器对SQL的优化效率比较高,但也是有优化空间;GBase 8t/Informix的优化器相对保守(严格按照语句的要求进行优化),在SQL语句不怎么高效时,效率会比较慢。