Search

Top 60 Oracle Blogs

Recent comments

Optimizer

[Oracle] DB Optimizer Part XI - Query transformation fixes false cardinality estimation with multiple OR predicates (in newer Oracle releases)

Introduction

This is a short blog post about another query transformation (check my first blog post [Oracle] DB Optimizer Part VIII - Looking under the hood of query transformation (done by CBO) with simple real life example first, if you never heard of the term "query transformations"), that fixes a cardinality estimation issue with multiple (OR) predicates that cover the whole data range. I have to admit that this is a special case, but it can happen as observed by my client

Limiting the Degree of Parallelism via Resource Manager and a gotcha

This might be something very obvious for the reader but I had an interesting revelation recently when implementing parallel_degree_limit_p1 in a resource consumer group. My aim was to prevent users mapped to a resource consumer group from executing any query in parallel. The environment is fictional, but let’s assume that it is possible that maintenance operations for example leave indexes and tables decorated with a parallel x attribute. Another common case is the restriction of PQ resource to users to prevent them from using all the machine’s resources.

This can happen when you perform an index rebuild for example in parallel to speed the operation up. However the DOP will stay the same with the index after the maintenance operation, and you have to explicitly set it back:

Creating Optimizer Trace Files

Many Oracle DBA’s are probably familiar with what Optimizer trace files are and likely know how to create them. When I say “Optimizer trace” more than likely you think of event 10053, right? SQL code like this probably is familiar then:

alter session set tracefile_identifier='MY_10053';
alter session set events '10053 trace name context forever';
select /* hard parse comment */ * from emp where ename = 'SCOTT';
alter session set events '10053 trace name context off';

In 11g, a new diagnostic events infrastructure was implemented and there are various levels of debug output that you can control for sql compilation. ORADEBUG shows us the hierarchy.

Implicit Datatype Conversion + Histograms = Bad Execution Plan?

Earlier today I exchanged some tweets with @martinberx about some optimizer questions and after posting more information on the ORACLE-L list, I was able to reproduce what he was observing.

The issue:

DB: 11.2.0.2.0 – 64bit /> I have a small query with a little error, which causes big troubles. /> The relevant part of the query is /> WHERE …. /> AND inst_prod_type=003 /> AND setid=’COM01′

but INST_PROD_TYPE is VARCHAR2.

this leads to filter[ (TO_NUMBER("INST_PROD_TYPE")=3 AND "SETID"='COM01') ]

based on this TO_NUMBER ( I guess!) the optimiser takes a fix selectivity of 1%.

Can someone tell me if this 1% is right? Jonathan Lewis “CBO Fundamentals” on page 133 is only talking about character expressions.

Unfortunately there are only 2 distinct values of INST_PROD_TYPE so this artificial [low] selectivity leads to my problem: /> An INDEX SKIP SCAN on PS0RF_INST_PROD is choosen. (columns of PS0RF_INST_PROD: INST_PROD_TYPE, SETID, INST_PROD_ID )

After fixing the statement to /> AND inst_prod_type=’003′ /> another index is used and the statement performs as expected.

Now I have no problem, but want to find the optimizers decisions in my 10053 traces.

The Important Bits of Information

From Martin’s email we need to pay close attention to:

  • Predicate of “inst_prod_type=003″ where INST_PROD_TYPE is VARCHAR2 (noting no single quotes around 003)
  • Implicite datatype conversion in predicate section of explain plan – TO_NUMBER(“INST_PROD_TYPE”)=3
  • only 2 distinct values of INST_PROD_TYPE

From this information I’ll construct the following test case:

create table foo (c1 varchar2(8));
insert into foo select '003' from dual connect by level <= 1000000;
insert into foo select '100' from dual connect by level <= 1000000;
commit;
exec dbms_stats.gather_table_stats(user,'foo');

And using the display_raw function we’ll look at the column stats.

col low_val     for a8
col high_val    for a8
col data_type   for a9
col column_name for a11

select
   a.column_name,
   display_raw(a.low_value,b.data_type) as low_val,
   display_raw(a.high_value,b.data_type) as high_val,
   b.data_type,
   a.density,
   a.histogram,
   a.num_buckets
from
   user_tab_col_statistics a, user_tab_cols b
where
   a.table_name='FOO' and
   a.table_name=b.table_name and
   a.column_name=b.column_name
/

COLUMN_NAME LOW_VAL  HIGH_VAL DATA_TYPE    DENSITY HISTOGRAM       NUM_BUCKETS
----------- -------- -------- --------- ---------- --------------- -----------
C1          003      100      VARCHAR2          .5 NONE                      1

Take note of the lack of a histogram.

Now let’s see what the CBO estimates for a simple query with and without quotes (explicit cast and implicit cast).

SQL> explain plan for select count(*) from foo where c1=003;

Explained.

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 1342139204

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 |   875   (3)| 00:00:11 |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| FOO  |  1000K|  3906K|   875   (3)| 00:00:11 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TO_NUMBER("C1")=003)

14 rows selected.

SQL> explain plan for select count(*) from foo where c1='003';

Explained.

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 1342139204

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 |   868   (2)| 00:00:11 |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| FOO  |  1000K|  3906K|   868   (2)| 00:00:11 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("C1"='003')

14 rows selected.

In this case the estimated number of rows is spot on – 1 million rows. Now lets regather stats and because of our queries using C1 predicates, it will become a candidate for a histogram. We can see this from sys.col_usage$.

select  oo.name owner,
        o.name table_name,
        c.name column_name,
        u.equality_preds,
        u.equijoin_preds,
        u.nonequijoin_preds,
        u.range_preds,
        u.like_preds,
        u.null_preds,
        u.timestamp
from    sys.col_usage$ u,
        sys.obj$ o,
        sys.user$ oo,
        sys.col$ c
where   o.obj#   = u.obj#
and     oo.user# = o.owner#
and     c.obj#   = u.obj#
and     c.col#   = u.intcol#
and     oo.name  = 'GRAHN'
and     o.name   = 'FOO'
/

OWNER TABLE_NAME COLUMN_NAME EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
----- ---------- ----------- -------------- -------------- ----------------- ----------- ---------- ---------- -------------------
GRAHN FOO        C1                       1              0                 0           0          0          0 2011-06-08 22:29:59

Regather stats and re-check the column stats:

SQL> exec dbms_stats.gather_table_stats(user,'foo');

PL/SQL procedure successfully completed.

SQL> select
  2     a.column_name,
  3     display_raw(a.low_value,b.data_type) as low_val,
  4     display_raw(a.high_value,b.data_type) as high_val,
  5     b.data_type,
  6     a.density,
  7     a.histogram,
  8     a.num_buckets
  9  from
 10     user_tab_col_statistics a, user_tab_cols b
 11  where
 12     a.table_name='FOO' and
 13     a.table_name=b.table_name and
 14     a.column_name=b.column_name
 15  /

COLUMN_NAME LOW_VAL  HIGH_VAL DATA_TYPE    DENSITY HISTOGRAM       NUM_BUCKETS
----------- -------- -------- --------- ---------- --------------- -----------
C1          003      100      VARCHAR2  2.5192E-07 FREQUENCY                 2

Note the presence of a frequency histogram. Now let’s re-explain:

SQL> explain plan for select count(*) from foo where c1=003;

Explained.

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 1342139204

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 |   875   (3)| 00:00:11 |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| FOO  |     1 |     4 |   875   (3)| 00:00:11 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TO_NUMBER("C1")=003)

SQL> explain plan for select count(*) from foo where c1='003';

Explained.

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 1342139204

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 |   868   (2)| 00:00:11 |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| FOO  |  1025K|  4006K|   868   (2)| 00:00:11 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("C1"='003')

And whammy! Note that the implicit cast [ filter(TO_NUMBER("C1")=003) ] now has an estimate of 1 row (when we know there is 1 million). /> So what is going on here? Let’s dig into the optimizer trace for some insight:

SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for FOO[FOO]
  Column (#1):
    NewDensity:0.243587, OldDensity:0.000000 BktCnt:5458, PopBktCnt:5458, PopValCnt:2, NDV:2
  Column (#1): C1(
    AvgLen: 4 NDV: 2 Nulls: 0 Density: 0.243587
    Histogram: Freq  #Bkts: 2  UncompBkts: 5458  EndPtVals: 2
  Using prorated density: 0.000000 of col #1 as selectvity of out-of-range/non-existent value pred
  Table: FOO  Alias: FOO
    Card: Original: 2000000.000000  Rounded: 1  Computed: 0.50  Non Adjusted: 0.50
  Access Path: TableScan
    Cost:  875.41  Resp: 875.41  Degree: 0
      Cost_io: 853.00  Cost_cpu: 622375564
      Resp_io: 853.00  Resp_cpu: 622375564
  Best:: AccessPath: TableScan
         Cost: 875.41  Degree: 1  Resp: 875.41  Card: 0.50  Bytes: 0

As you can see from the line

Using prorated density: 0.000000 of col #1 as selectvity of out-of-range/non-existent value pred

The presence of the histogram and the implicit conversion of TO_NUMBER(“C1″)=003 causes the CBO to use a density of 0 because it thinks it’s a non-existent value. The reason for this is that TO_NUMBER(“C1″)=003 is the same as TO_NUMBER(“C1″)=3 and for the histogram the CBO uses TO_CHAR(C1)=’3′ and 3 is not present in the histogram only ’003′ and ’100′.

Dumb Luck?

So, what if the predicate contained a number that was not left padded with zeros, say 100, the other value we put in the table?

SQL> explain plan for select count(*) from foo where c1=100;

Explained.

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 1342139204

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 |   875   (3)| 00:00:11 |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| FOO  |  1009K|  3944K|   875   (3)| 00:00:11 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TO_NUMBER("C1")=100)

While not exact, the CBO estimate is quite close to the 1 million rows with C1=’100′.

Summary

It’s quite clear that Martin’s issue came down to the following:

  • implicit casting
  • presences of histogram
  • zero left padded number/string

The combination of these created a scenario where the CBO thinks the value is out-of-range and uses a prorated density of 0 resulting in a cardinality of 1 when there are many more rows than 1.

The moral of the story here is always cast your predicates correctly. This includes explicit cast of date types as well – never rely on the nls settings.

All tests performed on 11.2.0.2.

Transitive Closure - Outer Joins

The Cost Based Optimizer (CBO) supports since at least Oracle 9i the automatic generation of additional predicates based on transitive closure.

In principle this means:

If a = b and b = c then the CBO can infer a = c

As so often with these optimizations the purpose of these automatically generated additional predicates is to allow the optimizer finding potentially more efficient access paths, like an index usage or earlier filtering reducing the amount of data to process.

PX COORDINATOR FORCED SERIAL operation

This is just a short heads-up for those that come across an execution plan showing the PX COORDINATOR FORCED SERIAL operation. I don't have official confirmation but according to my tests a plan with such an operation effectively means: Cost for parallel execution but execute serially (You might remember that I've recently mentioned in another post that there is the possibility to have a plan executed in parallel but costed serially, weird isn't it). Why such an operation exists is not clear to me - obviously it would make much more sense to cost straight away for serial execution in such a case. Probably there is a good reason, otherwise such an operation didn't exist but I think at least the costing is questionable in current versions.

Reading Parallel Execution Plans With Bloom Pruning And Composite Partitioning

You’ve probably heard sayings like “sometimes things aren’t always what they seem” and “people lie”. Well, sometimes execution plans lie. It’s not really by intent, but it is sometimes difficult (or impossible) to represent everything in a query execution tree in nice tabular format like dbms_xplan gives. One of the optimizations that was introduced back in 10gR2 was the use of bloom filters. Bloom filters can be used in two ways: 1) for filtering or 2) for partition pruning (bloom pruning) starting with 11g. Frequently the data models used in data warehousing are dimensional models (star or snowflake) and most Oracle warehouses use simple range (or interval) partitioning on the fact table date key column as that is the filter that yields the largest I/O reduction from partition pruning (most queries in a time series star schema include a time window, right!). As a result, it is imperative that the join between the date dimension and the fact table results in partition pruning. Let’s consider a basic two table join between a date dimension and a fact table. For these examples I’m using STORE_SALES and DATE_DIM which are TPC-DS tables (I frequently use TPC-DS for experiments as it uses a [...]

automatic DOP

nocompressでIn-Memory Parallel Query の中でCOMPRESS->NOCOMPRESSにしたらパラレル度が自動的に4->6になった。

NOCOMPRESSだとパラレル度は上がるのか?

NOCOMPRESSのlineitemをSelectする:

SQL> alter system flush buffer_cache;

システムが変更されました。

SQL> select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty
2 , sum(l_extendedprice) as sum_base_price
3 , sum(l_extendedprice * (1 - l_discount)) as sum_disc_price
4 , sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge
5 , avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price
6 , avg(l_discount) as avg_disc
7 , count(*) as count_order
8 from lineitem
9 where l_shipdate <= date '1992-12-01' - interval '68' day (3)
10 group by l_returnflag, l_linestatus
11 order by l_returnflag, l_linestatus
12 /

経過: 00:00:02.03

実行計画
----------------------------------------------------------
Plan hash value: 2461824725

------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 135 | 6956 (1)| 00:01:24 | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (ORDER) | :TQ10001 | 5 | 135 | 6956 (1)| 00:01:24 | Q1,01 | P->S | QC (ORDER) |
| 3 | SORT GROUP BY | | 5 | 135 | 6956 (1)| 00:01:24 | Q1,01 | PCWP | |
| 4 | PX RECEIVE | | 5 | 135 | 6956 (1)| 00:01:24 | Q1,01 | PCWP | |
| 5 | PX SEND RANGE | :TQ10000 | 5 | 135 | 6956 (1)| 00:01:24 | Q1,00 | P->P | RANGE |
| 6 | HASH GROUP BY | | 5 | 135 | 6956 (1)| 00:01:24 | Q1,00 | PCWP | |
| 7 | PX BLOCK ITERATOR | | 634K| 16M| 6941 (1)| 00:01:24 | Q1,00 | PCWC | |
|* 8 | TABLE ACCESS FULL| LINEITEM | 634K| 16M| 6941 (1)| 00:01:24 | Q1,00 | PCWP | |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - filter("L_SHIPDATE"<=TO_DATE(' 1992-09-24 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
- automatic DOP: Computed Degree of Parallelism is 3


統計
----------------------------------------------------------
22 recursive calls
4 db block gets
106301 consistent gets
102004 physical reads
0 redo size
1456 bytes sent via SQL*Net to client
519 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
2 rows processed

DOP=3が自動的に設定され、2.03秒かかった。
automatic DOP: Computed Degree of Parallelism is 3

次にCOMPRESSされたlineitemをSelectしてみる:


SQL> select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty
2 , sum(l_extendedprice) as sum_base_price
3 , sum(l_extendedprice * (1 - l_discount)) as sum_disc_price
4 , sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge
5 , avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price
6 , avg(l_discount) as avg_disc
7 , count(*) as count_order
8 from lineitem_comp
9 where l_shipdate <= date '1992-12-01' - interval '68' day (3)
10 group by l_returnflag, l_linestatus
11 order by l_returnflag, l_linestatus
12 /

経過: 00:00:01.51

実行計画
----------------------------------------------------------
Plan hash value: 3881738590

-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 135 | 6787 (2)| 00:01:22 | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (ORDER) | :TQ10001 | 5 | 135 | 6787 (2)| 00:01:22 | Q1,01 | P->S | QC (ORDER) |
| 3 | SORT GROUP BY | | 5 | 135 | 6787 (2)| 00:01:22 | Q1,01 | PCWP | |
| 4 | PX RECEIVE | | 5 | 135 | 6787 (2)| 00:01:22 | Q1,01 | PCWP | |
| 5 | PX SEND RANGE | :TQ10000 | 5 | 135 | 6787 (2)| 00:01:22 | Q1,00 | P->P | RANGE |
| 6 | HASH GROUP BY | | 5 | 135 | 6787 (2)| 00:01:22 | Q1,00 | PCWP | |
| 7 | PX BLOCK ITERATOR | | 634K| 16M| 6765 (1)| 00:01:22 | Q1,00 | PCWC | |
|* 8 | TABLE ACCESS FULL| LINEITEM_COMP | 634K| 16M| 6765 (1)| 00:01:22 | Q1,00 | PCWP | |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - filter("L_SHIPDATE"<=TO_DATE(' 1992-09-24 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
- automatic DOP: Computed Degree of Parallelism is 2


統計
----------------------------------------------------------
16 recursive calls
4 db block gets
67041 consistent gets
66958 physical reads
0 redo size
1456 bytes sent via SQL*Net to client
519 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
2 rows processed

DOP=2が自動的に設定され、1.51秒で終わった。

まとめてみる

NOCOMPRESS   automatic DOP=3
COMPRESS    automatic DOP=2

NOCOMPRESSの方がDOPが大きく設定される。 --->アンコンプレスの処理負荷を考慮しているのだろうか?

実行時間は
NOCOMPRESS  2.01秒
COMPRESS   1.51秒

COMPRESSの方が速い。それはphysical read量に比例する
 102004  physical reads
  66958  physical reads

だから、Parallel QueryはCOMPRESSが有利
でも、In-Memory Parallel Queryだと

上記のテスト後にもう一度二つのSQLを実行してみる:


SQL> select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty
2 , sum(l_extendedprice) as sum_base_price
3 , sum(l_extendedprice * (1 - l_discount)) as sum_disc_price
4 , sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge
5 , avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price
6 , avg(l_discount) as avg_disc
7 , count(*) as count_order
8 from lineitem
9 where l_shipdate <= date '1992-12-01' - interval '68' day (3)
10 group by l_returnflag, l_linestatus
11 order by l_returnflag, l_linestatus
12 /

経過: 00:00:00.33

実行計画
----------------------------------------------------------
Plan hash value: 2461824725

------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 135 | 6956 (1)| 00:01:24 | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (ORDER) | :TQ10001 | 5 | 135 | 6956 (1)| 00:01:24 | Q1,01 | P->S | QC (ORDER) |
| 3 | SORT GROUP BY | | 5 | 135 | 6956 (1)| 00:01:24 | Q1,01 | PCWP | |
| 4 | PX RECEIVE | | 5 | 135 | 6956 (1)| 00:01:24 | Q1,01 | PCWP | |
| 5 | PX SEND RANGE | :TQ10000 | 5 | 135 | 6956 (1)| 00:01:24 | Q1,00 | P->P | RANGE |
| 6 | HASH GROUP BY | | 5 | 135 | 6956 (1)| 00:01:24 | Q1,00 | PCWP | |
| 7 | PX BLOCK ITERATOR | | 634K| 16M| 6941 (1)| 00:01:24 | Q1,00 | PCWC | |
|* 8 | TABLE ACCESS FULL| LINEITEM | 634K| 16M| 6941 (1)| 00:01:24 | Q1,00 | PCWP | |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - filter("L_SHIPDATE"<=TO_DATE(' 1992-09-24 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
- automatic DOP: Computed Degree of Parallelism is 3


統計
----------------------------------------------------------
34 recursive calls
4 db block gets
106301 consistent gets
0 physical reads
0 redo size
1456 bytes sent via SQL*Net to client
519 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
2 rows processed

SQL> select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty
2 , sum(l_extendedprice) as sum_base_price
3 , sum(l_extendedprice * (1 - l_discount)) as sum_disc_price
4 , sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge
5 , avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price
6 , avg(l_discount) as avg_disc
7 , count(*) as count_order
8 from lineitem_comp
9 where l_shipdate <= date '1992-12-01' - interval '68' day (3)
10 group by l_returnflag, l_linestatus
11 order by l_returnflag, l_linestatus
12 /

経過: 00:00:00.62

実行計画
----------------------------------------------------------
Plan hash value: 3881738590

-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 135 | 6787 (2)| 00:01:22 | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (ORDER) | :TQ10001 | 5 | 135 | 6787 (2)| 00:01:22 | Q1,01 | P->S | QC (ORDER) |
| 3 | SORT GROUP BY | | 5 | 135 | 6787 (2)| 00:01:22 | Q1,01 | PCWP | |
| 4 | PX RECEIVE | | 5 | 135 | 6787 (2)| 00:01:22 | Q1,01 | PCWP | |
| 5 | PX SEND RANGE | :TQ10000 | 5 | 135 | 6787 (2)| 00:01:22 | Q1,00 | P->P | RANGE |
| 6 | HASH GROUP BY | | 5 | 135 | 6787 (2)| 00:01:22 | Q1,00 | PCWP | |
| 7 | PX BLOCK ITERATOR | | 634K| 16M| 6765 (1)| 00:01:22 | Q1,00 | PCWC | |
|* 8 | TABLE ACCESS FULL| LINEITEM_COMP | 634K| 16M| 6765 (1)| 00:01:22 | Q1,00 | PCWP | |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - filter("L_SHIPDATE"<=TO_DATE(' 1992-09-24 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
- automatic DOP: Computed Degree of Parallelism is 2


統計
----------------------------------------------------------
16 recursive calls
4 db block gets
67041 consistent gets
0 physical reads
0 redo size
1456 bytes sent via SQL*Net to client
519 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
2 rows processed

一回目のSQLでBuffer_Cache上にデータがあるのでphysical reads=0となる。

NOCOMPRESS   automatic DOP=3
COMPRESS    automatic DOP=2

実行時間は
NOCOMPRESS  0.33秒
COMPRESS   0.62秒

consistent gets量が多いにもかかわらずNOCOMPRESSの方が2倍速い。
 106301   consistent gets
 67041   consistent gets

でも、その差はたった0.3秒。しかし、その僅かな差を積み上げると:

という結果になった

result_cacheで更に50%アップ

TPC-Hベンチマークの続き

In-Memory-nonParallelでの最高結果は同時8セッションで:

今度はResult_Cacheを使ってみると:
同じ8セッションで最高値15000qphを記録した。

result_cacheが効く場合と効かない場合

TPC-HベンチマークのHeavy SQL Top5からひとつ選んで「Where条件」を変えるテストを行ってみた。

buffer_cache flush後


SQL> select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
2 from customer, orders, lineitem
3 where o_orderkey in
4 ( select l_orderkey
5 from lineitem
6 group by l_orderkey
7 having sum(l_quantity) > 312)
8 and c_custkey = o_custkey
9 and o_orderkey = l_orderkey
10 group by c_name, c_custkey
11 , o_orderkey
12 , o_orderdate
13 , o_totalprice
14 order by o_totalprice desc, o_orderdate;
...
...
15行が選択されました。

経過: 00:00:08.52

統計
----------------------------------------------------------
1 recursive calls
0 db block gets
145349 consistent gets
83465 physical reads
0 redo size
1725 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
15 rows processed

8.52秒かかった(physical readsが発生しているから)

having sum(l_quantity) > 312)を313に変えて実行してみる


SQL> select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
2 from customer, orders, lineitem
3 where o_orderkey in
4 ( select l_orderkey
5 from lineitem
6 group by l_orderkey
7 having sum(l_quantity) > 313)
8 and c_custkey = o_custkey
9 and o_orderkey = l_orderkey
10 group by c_name, c_custkey
11 , o_orderkey
12 , o_orderdate
13 , o_totalprice
14 order by o_totalprice desc, o_orderdate;
...
...
12行が選択されました。

経過: 00:00:00.01

統計
----------------------------------------------------------
0 recursive calls
0 db block gets
0 consistent gets
0 physical reads
0 redo size
1563 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
12 rows processed

択された行数も違うのにresult_cacheが効いて0.01秒で終わった。

もう一度312に戻すと


SQL> select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
2 from customer, orders, lineitem
3 where o_orderkey in
4 ( select l_orderkey
5 from lineitem
6 group by l_orderkey
7 having sum(l_quantity) > 312)
8 and c_custkey = o_custkey
9 and o_orderkey = l_orderkey
10 group by c_name, c_custkey
11 , o_orderkey
12 , o_orderdate
13 , o_totalprice
14 order by o_totalprice desc, o_orderdate;
...
...
15行が選択されました。

経過: 00:00:00.02

統計
----------------------------------------------------------
1 recursive calls
0 db block gets
0 consistent gets
0 physical reads
0 redo size
1725 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
15 rows processed

recursive callが発行され0.02秒となったが、result_cacheは効いた。

今度は314にしてみる


SQL> select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
2 from customer, orders, lineitem
3 where o_orderkey in
4 ( select l_orderkey
5 from lineitem
6 group by l_orderkey
7 having sum(l_quantity) > 314)
8 and c_custkey = o_custkey
9 and o_orderkey = l_orderkey
10 group by c_name, c_custkey
11 , o_orderkey
12 , o_orderdate
13 , o_totalprice
14 order by o_totalprice desc, o_orderdate;
...
...

10行が選択されました。

経過: 00:00:04.17

統計
----------------------------------------------------------
1 recursive calls
0 db block gets
145349 consistent gets
0 physical reads
0 redo size
1463 bytes sent via SQL*Net to client
520 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
10 rows processed

result_cacheは効かなかった。しかし、buffer_cache scan(physical reads=0)で4.17秒で終わった。

結果がまったく同じでなくともresult_cacheは働く。
その理屈は、僕には説明できない。

因みに、Exadata1のTPC-Hベンチマークでも:
result_cache_mode = FORCE を設定している。

"PX Deq Credit: send blkd" アイドルイベント?

PX Deq Credit: send blkdイベントの検証

パラレル度=1で
test1. insert into lineitem_work (select /*+ parallel(L 1) */ * from lineitem L)
パレレル度=40で
test2. insert into lineitem_work (select /*+ parallel(L 40) */ * from lineitem L)

cpu sec elapsed(sec)
test1 8.92 25.53
test2 0.00 60.19

実行時間は2.4倍パラレルのほうが遅くなる。

test2での問題待機イベントは:


Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
PX Deq: Execution Msg 16 0.01 0.21
Disk file operations I/O 1 0.03 0.03
direct path read 16 0.00 0.04
PX Deq Credit: send blkd 1777 2.01 59.63
PX qref latch 4 0.00 0.00
PX Deq Credit: need buffer 15 0.00 0.09
asynch descriptor resize 11 0.00 0.00

前回のテストでも出てきたPX Deq Credit: send blkdだ。

selectがparallelでもInsertがnon parallelだとこうなる。
そこで、InsertもParallelにしてみる、
test3. ALTER SESSION FORCE PARALLEL DML PARALLEL 40;

cpu sec elapsed(sec)
test1 8.92 25.53
test2 0.00 60.19
test3 0.12 10.05

待機イベントPX Deq Credit: send blkdもなくなった:


Event waited on Times Max. Wait Total Waited