OakieTags

CBO

Concatenated Bitmap Indexes Part II (Everybody’s Got To Learn Sometime)

A basic little post to conclude this discussion. The issues regarding whether to go for single column indexes vs. concatenated indexes are similar for Bitmap indexes as they are for B-Tree indexes.   It’s generally more efficient to access a concatenated index as it’s only the one index with less processing and less throwaway rowids/rows [...]

Query transformation – Part 1

Query transformation is a set of techniques used by the optimizer to rewrite a query so that it can be optimized better. A few more optimization paths open up to the optimizer after query transformation. Some query transformations must be costed before they are chosen, and some need not be costed. For example, if a table can be eliminated completely from the join, then that transformation is applied, and the need to cost it is minimal.

Test case

We will use the following test case to illustrate the concepts behind query transformation. Some of the optimizations that we see here work only from version 11gR1 onwards, so these test cases might not work in versions 10g and below.


create table backup.t1 (n1 number not null primary key, n2 number not null, n3 varchar2(256) );
create table backup.t2 (n1 number not null primary key , n2 number not null, n3 varchar2(256) );
alter table backup.t1 add constraint t1_fk foreign key  (n2) references backup.t2(n1);
-- populate the parent table t2 first, so that the rows in t1 satisfy t1_fk
insert into backup.t2 select n1, n1, lpad ( n1, 250,'x') from (select level n1 from dual connect by level <=100);
insert into backup.t1 select n1, n1, lpad ( n1, 250,'x') from (select level n1 from dual connect by level <=100);

Join elimination (JE)

JE is a technique in which one or more tables can be eliminated from the execution plan without altering functional behavior. In Listing 1-1, the query selects columns from table t1 only, but there is a join predicate between t1 and t2. No columns are selected from t2, so the join to t2 serves only to verify the existence of the foreign key values. The enabled foreign key constraint between these two tables already establishes that existence check, so there is no need for an explicit existence check in the query as well. The optimizer can safely eliminate the join to table t2.

select /*+ gather_plan_statistics */ t1.* from t1, t2 where t1.n2 = t2.n1;
select *  from table(dbms_xplan.display_cursor('','','allstats last'));
------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   1 |  TABLE ACCESS FULL| T1   |      1 |     82 |    100 |00:00:00.01 |      13 |
-----------------------------------------------------------------------------------

Listing 1-1: JE example case 1

As you can see from Listing 1-1, table T2 is removed from the execution plan. Since there is a valid foreign key constraint, the optimizer eliminated the join to table t2.
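To see why the join can be dropped without changing the result, here is a minimal sketch in Python. It uses the standard sqlite3 module as a stand-in for Oracle, with simplified copies of t1 and t2 (the n3 column is left out). It only checks the logical equivalence of the two queries; SQLite does not perform Oracle's join elimination, and nothing here says anything about the optimizer's behaviour.

```python
import sqlite3

# SQLite stands in for Oracle here; only the logical equivalence is shown,
# not the optimizer's join elimination itself.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.executescript("""
    create table t2 (n1 integer not null primary key, n2 integer not null);
    create table t1 (n1 integer not null primary key,
                     n2 integer not null references t2 (n1));
""")
rows = [(i, i) for i in range(1, 101)]
conn.executemany("insert into t2 values (?, ?)", rows)  # parent rows first
conn.executemany("insert into t1 values (?, ?)", rows)

with_join = conn.execute(
    "select t1.n1, t1.n2 from t1, t2 where t1.n2 = t2.n1 order by t1.n1"
).fetchall()
without_join = conn.execute(
    "select t1.n1, t1.n2 from t1 order by t1.n1"
).fetchall()

# Every t1.n2 is NOT NULL and has exactly one parent row in t2, so the join
# neither drops nor duplicates rows.
print(with_join == without_join)  # True
```

The equivalence depends on both constraints: without NOT NULL on t1.n2, rows with a null n2 would be dropped by the join, and without the foreign key, orphan rows would be dropped.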

Let’s also discuss another Join Elimination test case. In Listing 1-2, the predicate is “t1.n2 not in (select t2.n1 from t2)”. The enabled foreign key constraint dictates that this predicate will always be false, so no rows will be returned. The optimizer promptly identified this condition and added a filter predicate, “NULL IS NOT NULL”, in step 1. Step 1 is executed before step 2, and step 2 is never executed: the value of its Starts column is zero in the execution plan.

SQL_ID  d09kmzum9wgta, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ t1.* from t1 where t1.n2 not in
(select t2.n1 from t2 )

Plan hash value: 3332582666
---------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      0 |00:00:00.01 |
|*  1 |  FILTER            |      |      1 |        |      0 |00:00:00.01 |
|   2 |   TABLE ACCESS FULL| T1   |      0 |    100 |      0 |00:00:00.01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NULL IS NOT NULL)

Listing 1-2: JE example case 2

Listing 1-3 provides another variation of JE.

select /*+ gather_plan_statistics */ t1.* from t1 where t1.n2 in
(select t2.n1 from t2 )

Plan hash value: 3617692013
------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |    100 |00:00:00.01 |      14 |
|   1 |  TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |      14 |
------------------------------------------------------------------------------------
Listing 1-3: JE example case 3
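The reasoning behind Listings 1-2 and 1-3 can be checked with a small sketch. As an assumption for illustration, it uses Python's sqlite3 module in place of Oracle and simplified two-column versions of t1 and t2. It shows only that, with the foreign key enforced, the NOT IN query cannot return rows and the IN query returns every row; it does not reproduce the NULL IS NOT NULL filter that the Oracle optimizer generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.executescript("""
    create table t2 (n1 integer not null primary key, n2 integer not null);
    create table t1 (n1 integer not null primary key,
                     n2 integer not null references t2 (n1));
""")
rows = [(i, i) for i in range(1, 101)]
conn.executemany("insert into t2 values (?, ?)", rows)
conn.executemany("insert into t1 values (?, ?)", rows)


def count(sql):
    """Run a count(*) query and return the single number it produces."""
    return conn.execute(sql).fetchone()[0]


total = count("select count(*) from t1")
not_in_rows = count(
    "select count(*) from t1 where n2 not in (select n1 from t2)")
in_rows = count(
    "select count(*) from t1 where n2 in (select n1 from t2)")

# The constraint rejects any row that could make the NOT IN predicate true.
try:
    conn.execute("insert into t1 values (101, 999)")  # 999 has no parent row
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(not_in_rows, in_rows == total, orphan_rejected)  # 0 True True
```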

The following output shows the relevant lines from the 10053 trace file.
JE: Considering Join Elimination on query block SEL$5DA710D3 (#1)
*************************
Join Elimination (JE)
*************************
JE: cfro: T1 objn:74684 col#:2 dfro:T2 dcol#:2
JE: cfro: T1 objn:74684 col#:2 dfro:T2 dcol#:2
Query block (0E0D43D0) before join elimination:
SQL:******* UNPARSED QUERY IS *******
SELECT "T1"."N1" "N1","T1"."N2" "N2","T1"."N3" "N3" FROM "CBO3"."T2" "T2","CBO3"."T1" "T1" WHERE "T1"."N2"="T2"."N1"
JE: eliminate table: T2
Registered qb: SEL$14EF7918 0xe0d43d0 (JOIN REMOVED FROM QUERY BLOCK SEL$5DA710D3; SEL$5DA710D3; "T2"@"SEL$2")

Filter Predicate(s) Generation from constraints

Various filter predicates are generated and added to the execution plan using enabled and validated constraints (check, not null constraints).

In Listing 1-4, columns n1 and n2 have enabled, valid NOT NULL constraints that preclude null values in those columns. The query in Listing 1-4 has the predicate “n1 is null or n2 is null”, which can never be true. The optimizer uses this fact to improve the execution plan: the filter predicate (NULL IS NOT NULL), which is always FALSE, is added in step 1. As a result, step 2 is never executed (the value of its Starts column is 0 in the execution plan) and table T1 is never accessed.

select /*+ gather_plan_statistics */ * from t1 where n1 is null or n2 is null;
select *  from table(dbms_xplan.display_cursor('','','allstats last'));
---------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   |
---------------------------------------------------------------------------
|*  1 |  FILTER            |      |      1 |        |      0 |00:00:00.01 |
|   2 |   TABLE ACCESS FULL| T1   |      0 |    100 |      0 |00:00:00.01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NULL IS NOT NULL)

Listing 1-4: Filter predicate generation from NOT NULL constraint.

Let’s add a check constraint to this column to explain this further. In Listing 1-5, a check constraint is added which specifies that “n1 < 200”. The query predicate n1 > 200 and the generated predicate n1 < 200 nullify each other, leading to an always FALSE condition. The optimizer identified this condition and added a filter predicate in step 1: NULL IS NOT NULL.


alter table t1 add constraint t1_n1_lt_150 check (n1 < 200);
select /*+ gather_plan_statistics */ * from t1 where n1 > 200;
select *  from table(dbms_xplan.display_cursor('','','allstats last'));
---------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   |
---------------------------------------------------------------------------
|*  1 |  FILTER            |      |      1 |        |      0 |00:00:00.01 |
|*  2 |   TABLE ACCESS FULL| T1   |      0 |     20 |      0 |00:00:00.01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NULL IS NOT NULL)
   2 - filter("N1">200)

Listing 1-5: Filter predicate generation from a check constraint

The following lines from the trace file generated by event 10053 show that a predicate n1<200 is added. This auto-generated predicate and the existing predicate cancel each other out, leading to an always FALSE condition.

kkogcp: try to generate transitive predicate from check constraints for SEL$5DA710D3 (#0)
constraint: "T1"."N1"<200
predicates with check contraints: "T1"."N2"="T2"."N1" AND "T1"."N1"<200
after transitive predicate generation: "T1"."N2"="T2"."N1" AND "T1"."N1"<200
finally: "T1"."N2"="T2"."N1"
apadrv-start: call(in-use=1056, alloc=16344), compile(in-use=44792, alloc=46272)
kkoqbc-start
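The way a predicate generated from a check constraint can cancel out a query predicate is easy to picture as an interval intersection. The following is only a toy model in Python, not how the optimizer is implemented: the OpenRange class and its methods are hypothetical names introduced for illustration, and it handles nothing beyond simple range predicates on one column.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenRange:
    """All values v with low < v < high (both bounds exclusive)."""
    low: float = -math.inf
    high: float = math.inf

    def intersect(self, other):
        # Values allowed by both ranges at the same time.
        return OpenRange(max(self.low, other.low), min(self.high, other.high))

    def is_empty(self):
        # No value can be strictly above low and strictly below high.
        return self.low >= self.high


check_constraint = OpenRange(high=200)       # check (n1 < 200)
contradicting_predicate = OpenRange(low=200)  # where n1 > 200
compatible_predicate = OpenRange(low=100)     # where n1 > 100

always_false = check_constraint.intersect(contradicting_predicate).is_empty()
still_possible = check_constraint.intersect(compatible_predicate).is_empty()

print(always_false)    # True: the table need not be visited at all
print(still_possible)  # False: rows with 100 < n1 < 200 may exist
```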

SJC: Set to Join Conversion

In some cases, the optimizer can convert a set operator to a join operator. Interestingly, this feature is not enabled by default (up to 11gR1). In Listing 1-6, we enable it with the parameter _convert_set_to_join, and a MINUS set operation is converted to a join operation.

alter session set "_convert_set_to_join"=true;
select /*+ gather_plan_statistics  */ n2 from t1 minus select n1 from t2
Plan hash value: 3050591313
------------------------------------------------------------------------------------...
| Id  | Operation           | Name         | Starts | E-Rows | A-Rows |   A-Time   |...
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |              |      1 |        |      0 |00:00:00.01 |...
|   1 |  HASH UNIQUE        |              |      1 |     99 |      0 |00:00:00.01 |
|   2 |   NESTED LOOPS ANTI |              |      1 |     99 |      0 |00:00:00.01 |
|   3 |    TABLE ACCESS FULL| T1           |      1 |    100 |    100 |00:00:00.01 |
|*  4 |    INDEX UNIQUE SCAN| SYS_C0010995 |    100 |      1 |    100 |00:00:00.01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("N2"="N1")

Listing 1-6: SJC
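The equivalence that makes this conversion legal can be sketched outside Oracle. The example below is an illustration under stated assumptions: it uses Python's sqlite3 module, SQLite's EXCEPT in place of Oracle's MINUS, and small made-up tables with duplicates and non-matching values. It compares the set operation with a distinct anti-join, which mirrors the HASH UNIQUE over NESTED LOOPS ANTI shape of the plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 (n2 integer not null);
    create table t2 (n1 integer not null primary key);
""")
# Hypothetical data: duplicates in t1, and some values missing from t2.
conn.executemany("insert into t1 values (?)",
                 [(v,) for v in (1, 1, 2, 3, 3, 5, 7)])
conn.executemany("insert into t2 values (?)",
                 [(v,) for v in (1, 2, 4, 5)])

# Set form (EXCEPT is SQLite's spelling of MINUS).
set_result = conn.execute(
    "select n2 from t1 except select n1 from t2 order by 1"
).fetchall()

# Join form: anti-join plus duplicate elimination.
join_result = conn.execute("""
    select distinct t1.n2
    from t1
    where not exists (select 1 from t2 where t2.n1 = t1.n2)
    order by 1
""").fetchall()

print(set_result, join_result)  # [(3,), (7,)] [(3,), (7,)]
```

Both columns are declared NOT NULL here. With nullable columns the two forms differ, because a set operator treats two nulls as equal while a join predicate does not.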

There is also a new hint set_to_join with this new feature.


/*+
  ...
      OPT_PARAM('_convert_set_to_join' 'true')
  ...
      SET_TO_JOIN(@"SET$1")
  ...
  */

SU: Subquery Unnesting

Subqueries can be unnested into a join. Listing 1-7 shows a subquery that is unnested into a view and then joined to other row sources. In this listing, a correlated subquery is moved into a view VW_SQ_1, unnested, and then joined using the Nested Loops join technique. Many different variations of Subquery Unnesting are possible, but the crux of the matter is that subqueries can be unnested, joined and then costed.

 select /*+ gather_plan_statistics  */ n1 from t1 where n1 >
	(select max(n2) from t2 where t2.n1 = t1.n1)
Plan hash value: 2311753844
-----------------------------------------------------------------------------------...
| Id  | Operation                       | Name         | Starts | E-Rows | A-Rows |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |              |      1 |        |      0 |
|   1 |  NESTED LOOPS                   |              |      1 |      1 |      0 |
|   2 |   VIEW                          | VW_SQ_1      |      1 |      5 |    100 |
|*  3 |    FILTER                       |              |      1 |        |    100 |
|   4 |     HASH GROUP BY               |              |      1 |      5 |    100 |
|   5 |      TABLE ACCESS BY INDEX ROWID| T2           |      1 |    100 |    100 |
|*  6 |       INDEX RANGE SCAN          | SYS_C0010995 |      1 |    100 |    100 |
|*  7 |   INDEX UNIQUE SCAN             | SYS_C0010992 |    100 |      1 |      0 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(MAX("N2")<200)
   6 - access("T2"."N1"<200)
   7 - access("ITEM_1"="T1"."N1")
       filter("T1"."N1">"MAX(N2)")

SU is one reason why many performance issues appear after a database upgrade to 10g and above. The cost of the unnested subquery may go up or down, which can lead to an unfortunate choice of a not-so-optimal execution plan.
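The rewrite behind this kind of unnesting can be written out by hand. The sketch below rests on a few assumptions: it uses Python's sqlite3 module instead of Oracle, small hypothetical tables, and made-up aliases vw_sq, item_1 and max_n2. It compares the correlated form with a join to a grouped inline view and shows only that the two forms return the same rows; which form is cheaper is exactly what the CBO has to cost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 (n1 integer not null primary key);
    create table t2 (n1 integer not null, n2 integer not null);
""")
# Hypothetical data: some n1 values have several t2 rows, n1 = 5 has none.
conn.executemany("insert into t1 values (?)", [(v,) for v in range(1, 6)])
conn.executemany("insert into t2 values (?, ?)",
                 [(1, 0), (1, -3), (2, 5), (3, 2), (3, 1), (4, 4)])

# Original form: the subquery is evaluated per row of t1.
correlated = conn.execute("""
    select n1 from t1
    where n1 > (select max(n2) from t2 where t2.n1 = t1.n1)
    order by n1
""").fetchall()

# Unnested form: aggregate once per t2.n1 in an inline view, then join.
unnested = conn.execute("""
    select t1.n1
    from (select max(n2) as max_n2, n1 as item_1
          from t2 group by n1) vw_sq,
         t1
    where vw_sq.item_1 = t1.n1
      and t1.n1 > vw_sq.max_n2
    order by t1.n1
""").fetchall()

print(correlated, unnested)  # [(1,), (3,)] [(1,), (3,)]
```

The row with n1 = 5 drops out of both forms: the correlated subquery returns null for it, and the inline view has no row to join to.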

Use of the ORDERED hint can really play havoc with the SU feature too. For example, in Listing 1-8, the ORDERED hint is meant to make the join between t1 and t2 happen first, followed by the other joins. You would expect the leading table in the join to be T1, but the leading row source is VW_SQ_1.

select /*+ gather_plan_statistics ORDERED  */ t1.n1, t2.n1 from t1 , t2
where t1.n1 = t2.n1 and t1.n1 > (select max(n2) from t2 where t2.n1 =t1.n1)
Plan hash value: 3904485247
------------------------------------------------------------------------------------...
| Id  | Operation                        | Name         | Starts | E-Rows | A-Rows |
------------------------------------------------------------------------------------...
|   0 | SELECT STATEMENT                 |              |      1 |        |      0 |...
|   1 |  NESTED LOOPS                    |              |      1 |      1 |      0 |
|   2 |   NESTED LOOPS                   |              |      1 |      1 |      0 |
|   3 |    VIEW                          | VW_SQ_1      |      1 |      5 |    100 |
|*  4 |     FILTER                       |              |      1 |        |    100 |
|   5 |      HASH GROUP BY               |              |      1 |      5 |    100 |
|   6 |       TABLE ACCESS BY INDEX ROWID| T2           |      1 |    100 |    100 |
|*  7 |        INDEX RANGE SCAN          | SYS_C0010995 |      1 |    100 |    100 |
|*  8 |    INDEX UNIQUE SCAN             | SYS_C0010992 |    100 |      1 |      0 |
|*  9 |   INDEX UNIQUE SCAN              | SYS_C0010995 |      0 |      1 |      0 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(MAX("N2")<200)
   7 - access("T2"."N1"<200)
   8 - access("ITEM_1"="T1"."N1")
       filter("T1"."N1">"MAX(N2)")
   9 - access("T1"."N1"="T2"."N1")
       filter("T2"."N1"<200)

Is the CBO not honoring our hint? It is. The ORDERED hint is applied after the SU transformation, however, and so the unnested view ends up as the leading row source. A variation of the transformed query from the 10053 trace file is printed below. With the ORDERED hint, of course, the CBO must choose the unnested view as the leading row source, because it comes first in the transformed FROM clause. Use the LEADING hint instead of the ORDERED hint if necessary.

SELECT /*+ ORDERED */ "T1"."N1" "N1","T2"."N1" "N1" FROM
(SELECT MAX("T2"."N2") "MAX(N2)","T2"."N1" "ITEM_1" FROM "CBO3"."T2" "T2" GROUP BY "T2"."N1") "VW_SQ_2",
"CBO3"."T1" "T1","CBO3"."T2" "T2" WHERE "T1"."N1"="T2"."N1" AND "T1"."N1">"VW_SQ_2"."MAX(N2)" AND "VW_SQ_2"."ITEM_1"="T1"."N1"

Summary

There are too many techniques to cover in one blog entry. We will discuss these features further in upcoming blog entries. This can be read in a more conventional format: query_transformation_part_1_orainternals

10053 viewer

I’ve been trying to find a way to post an executable for several weeks because I’ve been sent a simple viewer for 10053 trace files written by Hans-Peter Sloot of Atos Origin and Robert van der Ende. They wrote this viewer because trace files from event 10053 can be enormous, and scrolling back and fore [...]

NL join (ordered)

Some time ago there was a thread on the SQL.ru forum where user has asked the never-ending question “why CBO is doing this?”. The problem was a simple count(*) of parent-child tables join with no FK constraint was executed in very strange way: via NESTED LOOPS using child as a driving table. Here is a [...]

CONNECT BY oddity

This week I’ve seen an issue with a CONNECT BY query: for some reason Oracle 10.2.0.4 decided to build a weird plan (the query is weird too, but that’s not my point here ). An explanation of why that happened looks interesting, so here it is. Set up: drop table t2 cascade constraints purge; drop [...]

Frequency histograms - edge cases

With the 10.2.0.4 patch set, Oracle introduced a significant change to how non-existing values are treated when there is a frequency histogram on a column / expression. See Jonathan Lewis' blog post, which is probably the most concise description of the issue. In a nutshell the change is about the following (quoted from Jonathan's post): "If the value you supply does not appear in the histogram, but is inside the low/high range of the histogram then the cardinality will be half the cardinality of the least frequently occurring value that is in the histogram".

I'm still a bit puzzled why Oracle introduced such a significant change to the optimizer with a patch set, but one of the most obvious reasons might be that the change allows frequency histograms to be generated using a rather low sample size, because there is no longer the same threat as before when the frequency histogram misses one of the existing values (which would then get an estimate of 1 if they didn't appear in the histogram).

In fact, when using the default DBMS_STATS.AUTO_SAMPLE_SIZE, Oracle does exactly this: by default it uses quite a low sample size to perform the additional runs required for each histogram - probably an attempt to minimize the additional work that needs to be done for histogram generation.

It is, however, quite interesting to see how exactly this behaviour, together with the new treatment of non-existing values, can turn into a threat, as a recent thread on OTN demonstrated.

Consider the following scenario: You have a column with a highly skewed data distribution; there is a single, very popular value, and a few other, very unpopular values.

Now you have a query type that filters on this column and frequently searches for non-existing values. In order to speed up the query, an index has been created on the column, and in order to make the optimizer aware of the fact that the data distribution is highly skewed, a histogram is generated, so that the optimizer should favor the index only for the unpopular or non-existing values.

The following test case (run on 11.1.0.7) emulates this scenario:

create table t1 (
id number(*, 0)
, t_status number(*, 0)
, vc varchar2(100)
);

-- 1 million rows
-- One very popular value
-- Two very unpopular values
-- in column T_STATUS
insert /*+ append */ into t1 (id, t_status, vc)
with generator as (
select /*+ materialize */
level as id
from
dual
connect by
level <= 10000
)
select /*+ use_nl(v1, v2) */
rownum as id
, case
when rownum <= 10
then 1
when rownum <= 20
then 2
else 0
end as t_status
, rpad('x', 100) as vc
from
generator v1
, generator v2
where
rownum <= 1000000;

commit;

create index t1_idx on t1 (t_status);

exec dbms_stats.gather_table_stats(null, 'T1', method_opt => 'for all columns size 1 for columns t_status size 254')

The first interesting point is the generated column statistics:

SQL>
SQL> -- Note that the basic column statistics
SQL> -- are generated with a much higher sample size
SQL> -- The histogram however was created with a sample size
SQL> -- of 5,000 rows only
SQL> -- Therefore we get three distinct values but only a single bucket
SQL> -- But not always => Potential instability issue
SQL> select
2 column_name
3 , num_distinct
4 , sample_size
5 from
6 user_tab_col_statistics
7 where
8 table_name = 'T1';

COLUMN_NAME     NUM_DISTINCT SAMPLE_SIZE
--------------- ------------ -----------
ID                   1000000     1000000
T_STATUS                   3        5499
VC                         1     1000000

SQL>
SQL> -- This results in a single bucket
SQL> -- But not always => Potential instability issue
SQL> select
2 column_name
3 , endpoint_number
4 , endpoint_value
5 from
6 user_tab_histograms
7 where
8 table_name = 'T1'
9 and column_name = 'T_STATUS';

COLUMN_NAME     ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------- --------------
T_STATUS                   5499              0

Notice the inconsistency: The basic column statistics (number of distinct values, low value, high value) obviously have been generated with a much higher sample size (in fact a compute in this case) than the histogram on T_STATUS. The histogram therefore misses the unpopular values and consists only of a single value - the very popular one.

Now watch closely what happens to the cardinality estimates of the non-existing values:

SQL>
SQL> -- The popular value
SQL> explain plan for
2 select
3 *
4 from
5 t1
6 where
7 t_status = 0
8 ;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   999K|   102M|  4319   (2)| 00:00:52 |
|*  1 |  TABLE ACCESS FULL| T1   |   999K|   102M|  4319   (2)| 00:00:52 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("T_STATUS"=0)

13 rows selected.

SQL>
SQL> -- A non-existing value
SQL> explain plan for
2 select
3 *
4 from
5 t1
6 where
7 t_status = 1000
8 ;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   500K|    51M|  4316   (2)| 00:00:52 |
|*  1 |  TABLE ACCESS FULL| T1   |   500K|    51M|  4316   (2)| 00:00:52 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("T_STATUS"=1000)

13 rows selected.

SQL>
SQL> -- Two non-existing values
SQL> explain plan for
2 select
3 *
4 from
5 t1
6 where
7 t_status in (1000, 2000)
8 ;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1000K|   102M|  4325   (3)| 00:00:52 |
|*  1 |  TABLE ACCESS FULL| T1   |  1000K|   102M|  4325   (3)| 00:00:52 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("T_STATUS"=1000 OR "T_STATUS"=2000)

13 rows selected.

SQL>

Ouch, ouch: Whereas the estimate for the popular value is correct, the estimates for the non-existing values are way off - in fact, using an IN clause with two non-existing values gets an estimate of all rows contained in the table.

The explanation: Since the least popular value is the single bucket covering virtually all rows, half of it is still 50% of the total cardinality - so two non-existing values in an IN clause end up with a selectivity of 1.
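The arithmetic behind these estimates can be replayed in a few lines. This is a sketch of the rule quoted from Jonathan Lewis' post, not Oracle's actual code: the function names are made up for illustration, and it assumes that the selectivities of the IN-list members are simply added and capped at 1, which is what the 1000K estimate above suggests.

```python
def missing_value_selectivity(bucket_counts):
    """Selectivity for a value that is absent from a frequency histogram:
    half the selectivity of the least frequent value in the histogram."""
    total = sum(bucket_counts.values())
    return 0.5 * min(bucket_counts.values()) / total


def estimated_rows(num_rows, selectivities):
    # Assumption: IN-list selectivities add up and are capped at 1.
    return round(num_rows * min(1.0, sum(selectivities)))


num_rows = 1_000_000
# The histogram from the test case: a single bucket, value 0, 5,499 sampled rows.
single_bucket = {0: 5499}

sel = missing_value_selectivity(single_bucket)
one_value = estimated_rows(num_rows, [sel])
two_values = estimated_rows(num_rows, [sel, sel])

print(sel)         # 0.5
print(one_value)   # 500000  (the plan shows 500K)
print(two_values)  # 1000000 (the plan shows 1000K)
```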

Furthermore the usual decay to values outside the low/high column values doesn't apply here either - no matter what non-existing values get used, the overestimation stays the same.

These overestimates obviously have a significant impact - here the suitable index doesn't get used, and more complex plans might even turn into a complete disaster.

The default behaviour of generating histograms with a much lower sample size than the basic column statistics also introduces a kind of instability - try running the example several times. Sometimes one of the unpopular values might be caught by the default histogram generation, sometimes not. The effect is dramatic: If one of the unpopular values gets caught, the estimates will be reasonable again, since the least popular value then does have a very low cardinality - if not, you get exactly the opposite result, as just demonstrated.
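How often the sample misses all of the unpopular rows can be estimated with a short calculation. The sketch below assumes a simple random sample of exactly 5,499 rows drawn without replacement from the 1,000,000 rows; how Oracle actually draws its sample is not described here, so treat the number as a rough indication only. The function name is made up for illustration.

```python
def prob_sample_misses_all(total_rows, rare_rows, sample_rows):
    """Probability that a simple random sample (without replacement)
    contains none of the rare rows: C(N-K, n) / C(N, n), written as a
    product so that no huge binomial coefficients are needed."""
    p = 1.0
    for i in range(rare_rows):
        p *= (total_rows - sample_rows - i) / (total_rows - i)
    return p


# 1,000,000 rows, 20 of them with an unpopular T_STATUS, 5,499 rows sampled.
p_miss = prob_sample_misses_all(1_000_000, 20, 5499)
print(f"{p_miss:.2f}")  # 0.90
```

Under that assumption, roughly nine out of ten runs would produce the single-bucket histogram, and the remaining runs would catch at least one unpopular value and give reasonable estimates.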

If the old behaviour gets re-activated, the results are as expected with the same set of statistics:

SQL>
SQL> -- Switch off new behaviour
SQL> alter session set "_fix_control" = '5483301:off';

Session altered.

SQL>
SQL> -- The popular value
SQL> explain plan for
2 select
3 *
4 from
5 t1
6 where
7 t_status = 0
8 ;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   999K|   102M|  4319   (2)| 00:00:52 |
|*  1 |  TABLE ACCESS FULL| T1   |   999K|   102M|  4319   (2)| 00:00:52 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("T_STATUS"=0)

13 rows selected.

SQL>
SQL> -- A non-existing value
SQL> explain plan for
2 select
3 *
4 from
5 t1
6 where
7 t_status = 1000
8 ;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 546753835

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |   107 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |     1 |   107 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IDX |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T_STATUS"=1000)

14 rows selected.

SQL>
SQL> -- Two non-existing values
SQL> explain plan for
2 select
3 *
4 from
5 t1
6 where
7 t_status in (1000, 2000)
8 ;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3743026710

---------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |     1 |   107 |     5   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR             |        |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1     |     1 |   107 |     5   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_IDX |     1 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("T_STATUS"=1000 OR "T_STATUS"=2000)

15 rows selected.

Switching back to the former treatment, in which non-existing values lead to an estimate of 1, will fix this issue. However, since this potentially affects every execution plan, thorough regression testing would be required if it is used as a global setting.

Increasing the sample size is another option, if it ensures that unpopular values get caught by the histogram generation, so that the least popular value of the histogram is one with a low cardinality. Note however that this will increase the amount of work necessary to gather the statistics for the histograms.

SQL>
SQL> -- Gather statistics with 100% sample size
SQL> exec dbms_stats.gather_table_stats(null, 'T1', estimate_percent => null, method_opt => 'for all columns size 1 for columns t_status size 254')

PL/SQL procedure successfully completed.

SQL>
SQL> -- Now the statistics are more representative
SQL> select
2 column_name
3 , num_distinct
4 , sample_size
5 from
6 user_tab_col_statistics
7 where
8 table_name = 'T1';

COLUMN_NAME     NUM_DISTINCT SAMPLE_SIZE
--------------- ------------ -----------
ID                   1000000     1000000
T_STATUS                   3     1000000
VC                         1     1000000

SQL>
SQL> -- The histogram now covers also the unpopular values
SQL> select
2 column_name
3 , endpoint_number
4 , endpoint_value
5 from
6 user_tab_histograms
7 where
8 table_name = 'T1'
9 and column_name = 'T_STATUS';

COLUMN_NAME     ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------- --------------
T_STATUS                 999980              0
T_STATUS                 999990              1
T_STATUS                1000000              2

SQL>
SQL> -- Switch back on new behaviour
SQL> alter session set "_fix_control" = '5483301:on';

Session altered.

SQL>
SQL> -- A non-existing value
SQL> explain plan for
2 select
3 *
4 from
5 t1
6 where
7 t_status = 1000
8 ;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 546753835

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |   107 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |     1 |   107 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IDX |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T_STATUS"=1000)

14 rows selected.

SQL>
SQL> -- Two non-existing values
SQL> explain plan for
2 select
3 *
4 from
5 t1
6 where
7 t_status in (1000, 2000)
8 ;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3743026710

---------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |     1 |   107 |     5   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR             |        |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1     |     1 |   107 |     5   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_IDX |     1 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("T_STATUS"=1000 OR "T_STATUS"=2000)

15 rows selected.

As suggested by Jonathan Lewis in the OTN thread, another elegant solution to the problem of searching for non-existing values would be to add a virtual column that filters out the popular values. This approach has several advantages if applicable: the index is very small, which minimizes the maintenance effort (and might also address index-efficiency related issues in case of frequent updates to the column), and it solves the histogram issue, since the histogram will only cover the unpopular values:

SQL>
SQL> -- Add a virtual column (the same could be achieved using a function-based index instead for pre-11g versions)
SQL> alter table t1 add (t_status_unpop as (case when t_status != 0 then t_status end));

Table altered.

SQL>
SQL> exec dbms_stats.gather_table_stats(null, 'T1', method_opt => 'for columns t_status_unpop size 254')

PL/SQL procedure successfully completed.

SQL>
SQL> -- Since the virtual column only covers
SQL> -- the unpopular values, the histogram
SQL> -- will be very precise even with a low sample size
SQL> select
2 column_name
3 , num_distinct
4 , sample_size
5 from
6 user_tab_col_statistics
7 where
8 table_name = 'T1';

COLUMN_NAME     NUM_DISTINCT SAMPLE_SIZE
--------------- ------------ -----------
ID                   1000000     1000000
T_STATUS                   3     1000000
VC                         1     1000000
T_STATUS_UNPOP             2          20

SQL>
SQL> -- The histogram only covers the unpopular values
SQL> select
2 column_name
3 , endpoint_number
4 , endpoint_value
5 from
6 user_tab_histograms
7 where
8 table_name = 'T1'
9 and column_name = 'T_STATUS_UNPOP';

COLUMN_NAME     ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------- --------------
T_STATUS_UNPOP               10              1
T_STATUS_UNPOP               20              2

SQL>
SQL> -- Two non-existing values
SQL> -- So we don't run into the same problem here
SQL> -- The estimate is reasonable for non-existing values
SQL> explain plan for
2 select
3 *
4 from
5 t1
6 where
7 t_status_unpop in (1000, 2000)
8 ;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |   108 |  4470   (6)| 00:00:54 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |   108 |  4470   (6)| 00:00:54 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("T_STATUS_UNPOP"=1000 OR "T_STATUS_UNPOP"=2000)

13 rows selected.

I haven't created an index on the virtual column, but the estimate is correct and a suitable index would get used if it existed.
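
For completeness, here is a minimal sketch of how such an index could be added and the plan re-checked. The index names t1_unpop_idx and t1_unpop_fbi are hypothetical and not part of the original test case, and no output is shown because the statements were not run against it:

```sql
-- Hypothetical index name; assumes the T1 table and the
-- T_STATUS_UNPOP virtual column created above.
create index t1_unpop_idx on t1 (t_status_unpop);

-- Re-check the plan for the two non-existing values
explain plan for
select
       *
from
       t1
where
       t_status_unpop in (1000, 2000)
;

select * from table(dbms_xplan.display);

-- Pre-11g alternative (assumption: use this INSTEAD of the virtual
-- column and the index above, not in addition to them): a
-- function-based index on the same expression. Queries then have
-- to repeat the expression in the predicate.
-- create index t1_unpop_fbi on t1 (case when t_status != 0 then t_status end);
```

Because the expression evaluates to NULL for every row with t_status = 0, and a single-column B-tree index does not store entirely NULL keys, the index should contain entries for the unpopular rows only.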

Enhanced Subquery Optimizations in Oracle

While googling I found a recent article from Oracle for the VLDB journal: Enhanced Subquery Optimization in Oracle. It primarily discusses subqueries: how Oracle deals with them during optimization and at run time. Overview of the paper's topics: subquery coalescing (here I’ve done some testing of the feature, partially available in 11gR2); (parallel) group-by pushdown – [...]

Myth: Bitmap Indexes With High Distinct Columns (Supermassive Black Hole)

As discussed in my previous post, it’s a silly myth to suggest a bitmap index should only be used for so-called “low cardinality” columns else the resultant index would be “huge”. I thought it might be worth a quick refresh to see a simple little demonstration why such claims are a nonsense.  There is in fact no limit to [...]

RMOUG 2010: My presentations

I am very disappointed that I had to cancel my trip to RMOUG Training Days. I am sick and was not able to catch my flight.

But I can still share my presentations here. I had two presentations planned for the training days, and they can be accessed below:

Advanced RAC troubleshooting
Riyaj_Advanced_rac_troubleshooting_RMOUG_2010_doc
Riyaj_Advanced_rac_troubleshooting_RMOUG_2010_ppt

Why optimizer hates my sql
Riyaj_Why_optimizer_hates_my_sql_2010

RMOUG training days audience: Please accept my sincere apologies.

How Does An Execution Plan Suddenly Change When The Statistics (And Everything Else) Remains The Same ? (In Limbo)

I’ve slipped this post in as there have been a number of discussions recently on how execution plans have changed while nothing else appears to have changed in the database. How can an execution plan suddenly change when no one has made any changes to the database? By no changes, it means that there [...]