Search

Top 60 Oracle Blogs

Recent comments

Purging SQL Statements and Execution Plans from AWR

A previous blog discussed the problem posed by old execution plans that were previously captured by AWR being used by dbms_xplan.display_awr() to describe current executions plans for the same execution plan but displaying the old costs.
Tanel Poder commented that there is a TIMESTAMP column on both V$SQL_PLAN and DBA_HIST_SQL_PLAN so you know when the plans were generated, but this is not used to purge plans from AWR.  It prompted me to look into purging of statements and plans in more detail.
There has been a long-standing problem of old plans not being purged at all from AWR.  There are various notes on the Oracle support site that refer to this behaviour.   However, it appears to have been addressed by at least Oracle 19c. 
There is no snapshot ID (SNAP_ID) on either DBA_HIST_SQLTEXT or DBA_HIST_SQL_PLAN but this column does exist on the underlying tables WRH$_SQLTEXT and WRH$_SQL_PLAN, where it is used to restrict the purge process to rows created before the retention period.  SQL statements and plans collected prior to the AWR retention limit, for which there are no current SQL statistics, are purged.  However, SQL statements are not purged if SQL baselines are present for that snapshot.

Even so, an old plan can still be retained in the AWR, if the same SQL statement is still submitted and the same plan is still generated by the optimiser, and the statement is captured by the AWR snapshot either because it is in the top statements, or the SQL is 'colored'.  dbms_xplan.display_awr() will show the plan with the original costs.  It will not be purged because there are SQL statistics more recent than the AWR retention limit.

Oracle Notes

There are various references on the Oracle support website to the lack of purging resulting in significant growth in the size of the AWR tables, including bugs in old releases.

dbms_workload_repository.purge_sql_details()

SQL statements and plans are automatically purged along with the AWR snapshots when they reach the retention limit.  There is also a procedure for manual purge.  Session trace including bind variable values of the purge procedure reveals the behaviour of the purge process.  It purges WRH$_SQLTEXT, WRH$_SQL_PLAN and WRH$_SQL_BIND_METADATA for SNAP_IDs in the range 0 to one less than the oldest snapshot not beyond the retention limit.  When purging manually, it is also possible to set a maximum number of rows to be purged from each table.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">select min(snap_id) 
from wrm$_snapshot
where dbid = :dbid
and (cast(SYSTIMESTAMP as TIMESTAMP) - end_interval_time) <= :retention

SQL Statements are not purged if there are either SQL statistics for it in AWR or a SQL Baseline on it.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">delete from (
select *
from wrh$_sqltext tab
where tab.dbid = :dbid
and :beg_snap <= tab.snap_id and tab.snap_id < :end_snap
and not exists (
select 1
from wri$_sqltext_refcount refc
where refc.dbid = tab.dbid
and refc.sql_id = tab.sql_id
and refc.con_dbid = tab.con_dbid and refc.ref_count > 0)
and not exists (
select 1
from wrh$_sqlstat s
where s.dbid = :dbid and s.con_dbid = tab.con_dbid
and s.sql_id = tab.sql_id))
where rownum <= :bind1

Execution plans are not purged if there are SQL statistics for it in AWR.  Statistics older than the retention limit will have been purged by this stage.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">delete from (
select *
from wrh$_sql_plan tab
where tab.dbid = :dbid
and :beg_snap <= tab.snap_id and tab.snap_id < :end_snap
and not exists (
select 1
from wrh$_sqlstat s
where s.dbid = :dbid and s.con_dbid = tab.con_dbid
and s.sql_id = tab.sql_id))
where rownum <= :bind1

dbms_workload_repository.drop_snapshot_range() behaves similarly but does not need to test for SQL statistics because it is deleting them in that range.

Comparison of Optimizer Cost of Captured Plans with Optimizer Cost of SQL Statistics

It is possible to compare the optimizer cost on SQL statistics with the optimizer cost of the captured plan and see where they have diverged.  In this example, I am looking for an absolute difference in cost of 10 and also of at least 10%.
#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">set lines 140 pages 99
col costpctdiff heading 'Cost|%Diff' format 99999
col costdiff heading 'Cost|Diff' format 99999999
col plan_hash_value heading 'SQL Plan|Hash Value'
col child_number heading 'Child|No.' format 9999
col inst_id heading 'Inst|ID' format 999
col hcost heading 'AWR|Cost' format 9999999
col ccost heading 'Cursor|Cost' format 9999999
col htimestamp heading 'AWR|Timestamp'
col ctimestamp heading 'Cursor|Timestamp'
col end_interval_time format a26
col snap_id heading 'Snap|ID' format 99999
col awr_cost format 99999999
col optimizer_Cost heading 'Opt.|Cost' format 99999999
col optimizer_env_hash_value heading 'Opt. Env.|Hash Value'
col num_stats heading 'Num|Stats' format 9999
alter session set nls_date_format = 'hh24:mi:ss dd.mm.yy';
break on plan_hash_value skip 1 on sql_id on dbid
ttitle 'compare AWR/recent plan costs'

with h as ( /*captured plan outside retention limit*/
select p.dbid, p.sql_id, p.plan_hash_Value, max(cost) cost
, max(p.timestamp) timestamp
from dba_hist_sql_plan p
, dba_hist_wr_control c
where p.dbid = c.dbid
and p.cost>0
and (p.object_owner != 'SYS' OR p.object_owner IS NULL) --omit SYS owned objects
and p.timestamp < sysdate-c.retention
group by p.dbid, p.sql_id, p.plan_hash_value
), s as ( /*SQL statistics*/
select t.dbid, t.sql_id, t.plan_hash_value, t.optimizer_env_hash_value
, t.optimizer_cost
, MIN(t.snap_id) snap_id
, MIN(s.end_interval_time) end_interval_time
, COUNT(*) num_stats
from dba_hist_snapshot s
, dba_hist_sqlstat t
where s.dbid = t.dbid
and s.snap_id = t.snap_id
and t.optimizer_cost > 0
GROUP BY t.dbid, t.sql_id, t.plan_hash_value, t.optimizer_env_hash_value
, t.optimizer_cost
), x as (
Select NVL(h.dbid,s.dbid) dbid
, NVL(h.sql_id,s.sql_id) sql_id
, NVL(h.plan_hash_value,s.plan_hash_value) plan_hash_value
, h.cost hcost, h.timestamp htimestamp
, s.snap_id, s.end_interval_time
, s.optimizer_env_hash_value, s.optimizer_cost
, s.num_stats
, s.optimizer_cost-h.cost costdiff
, 100*s.optimizer_cost/NULLIF(h.cost,0) costpctdiff
From h join s
on h.plan_hash_value = s.plan_hash_value
and h.sql_id = s.sql_id
and h.dbid = s.dbid
), y as (
SELECT x.*
, MAX(ABS(costpctdiff)) OVER (PARTITION BY dbid, sql_id, plan_hash_value) maxcostpctdiff
, MAX(ABS(costdiff)) OVER (PARTITION BY dbid, sql_id, plan_hash_value) maxcostabsdiff
FROM x
)
SELECT dbid, sql_id, plan_hash_value, hcost, htimestamp
, snap_id, end_interval_time, optimizer_env_hash_value, optimizer_cost, num_stats, costdiff, costpctdiff
FROM y
WHERE maxcostpctdiff>=10
And maxcostabsdiff>=10
order by plan_hash_value,sql_id,end_interval_time
/
break on report
ttitle off

We can see that according to some statement statistics, for the same execution plan, costs vary over time.  They can go up or down.  This can be caused by different object statistics or different bind variables.  It is also perfectly possible to have different cursors produce the same execution plan but with a different cost.

#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 80%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">                                                       compare AWR/recent plan costs

SQL Plan AWR AWR Snap Opt. Env. Opt. Num Cost Cost
DBID SQL_ID Hash Value Cost Timestamp ID END_INTERVAL_TIME Hash Value Cost Stats Diff %Diff
---------- ------------- ---------- -------- ----------------- ------ -------------------------- ---------- --------- ----- --------- ------

3719366004 1852wpfdvbbdb 428098007 5759 06:04:57 07.01.19 66189 18-JUL-19 01.00.29.158 PM 2275008064 15760 22 10001 274
5759 06:04:57 07.01.19 66191 18-JUL-19 02.00.30.019 PM 2275008064 6434 40 675 112
5759 06:04:57 07.01.19 66196 18-JUL-19 04.30.41.024 PM 2275008064 6435 24 676 112
5759 06:04:57 07.01.19 66197 18-JUL-19 05.00.26.793 PM 2275008064 6448 10 689 112
5759 06:04:57 07.01.19 66217 19-JUL-19 03.00.09.245 AM 2275008064 6444 2 685 112
3719366004 dnxpk0a7tmk2c 5761 14:25:15 07.01.19 66247 19-JUL-19 06.00.59.483 PM 2275008064 4258 2 -1503 74

3719366004 882x55gh06a0k 4034560521 321903 09:02:37 01.04.19 66208 18-JUL-19 10.30.45.355 PM 2275008064 471900 6 149997 147
321903 09:02:37 01.04.19 66285 20-JUL-19 01.00.01.061 PM 2275008064 471848 2 149945 147
321903 09:02:37 01.04.19 66326 21-JUL-19 09.30.18.514 AM 2275008064 473557 2 151654 147

3719366004 fgyja8zfdf4s6 4057647380 188 22:02:32 11.02.19 66214 19-JUL-19 01.30.10.479 AM 50414560 496 4 308 264

9ud99dpmkdtq6 4059558786 3 03:56:20 01.11.18 66191 18-JUL-19 02.00.30.019 PM 50414560 170 226 167 5667

3d6ph39wbq0zq 4078589457 62946 06:23:42 01.05.18 66270 20-JUL-19 05.30.15.218 AM 3570113222 196673 2 133727 312

9ud99dpmkdtq6 4082397478 3 20:36:53 03.10.18 66189 18-JUL-19 01.00.29.158 PM 50414560 2296 174 2293 76533

6rkn5ty3mpdg8 4086831979 1096 11:54:01 22.01.19 66201 18-JUL-19 07.00.07.631 PM 50414560 549 16 -547 50
1096 11:54:01 22.01.19 66212 19-JUL-19 12.30.00.506 AM 50414560 6055 16 4959 552

Summary

Old execution plans will not be purged from AWR if there are recent SQL statistics for the same SQL ID and execution plan hash value.  You will see the old plan costs if you extract the plan with
dbms_xplan.display_awr().

Is that a problem?

  • This does not have an impact on the behaviour of the database.  The data in AWR tells you what happened in the past.
  • I think it is most important to know how far you can trust display_awr().  The plan can be used as a guide to the path of execution, but not to the amount of work done.  It will show you the correct operations in the execution plan, but costs, bind variables, partition IDs and anything that does not contribute to the plan hash value should be approached with caution.  This is why I often profile DB time for a SQL statement by SQL_PLAN_LINE_ID so I can see how much time was spent in each operation.

Oracle Doc ID 1478615.1 implies that it is permitted to manually purge specific plans.  A subsequent AWR snapshot might then capture a new version of the same execution plan.  However, there are potential problems with this approach.

You may have an example where the old estimates for a plan don't make sense for a recent execution, but if you replace the plan with an updated one to get recent estimates that may not make sense for some activity in the past.
You can have different child cursors with different costs for the same execution plan.  Therefore, it is effectively arbitrary as to which plan would be captured into AWR, so there is no guarantee that you will get a more representative plan.  It is not even certain that costs for one cursor make sense for another.
I am not sure that there is any material benefit that is worth the effort of implementing something to additionally purge old execution plans.  I would welcome comments and opinions on this subject.