Search

OakieTags

Who's online

There are currently 0 users and 27 guests online.

Recent comments

Affiliations

CBO Surprise

Well, it surprised me!

I’ve said for a very long time that in principle, ignoring bugs and restrictions, the optimizer will always choose the lowest cost option during its search for an execution path. It turns out that this isn’t true. In a comment attached to a note I had written about a possible bug relating to function-based indexes I was told that there are cases where the optimizer follows a rule that allows it to ignore the lowest cost path if it is derived from a range-based predicate involving unpeekable bind variables.

The trouble is, any statement made about “bind variables” may also apply in any circumstances where the optimizer sees: “unknown value”. Here’s a simplified example that I find a little worrying (running on 11.1):

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	rownum			id,
	mod(rownum-1,20)	scattered,
	trunc(rownum / 500)	clustered,
	lpad(mod(rownum,2),10)	ind_pad,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 40000
;

create index t1_equi  on t1(scattered, ind_pad, clustered);
create index t1_range on t1(clustered, scattered);

-- call dbms_stats to compute table stats with no histograms

set autotrace traceonly explain

select
	small_vc
from
	t1
where
	scattered = 10
and	clustered between (select 40 from dual)
	          and     (select 41 from dual)
;

select
	/*+ index(t1 t1_range) */
	small_vc
from
	t1
where
	scattered = 10
and	clustered between (select 40 from dual)
	          and     (select 41 from dual)
;

The SQL with its “select constant from dual” may look a little artificial – but it represents a strategy that is used quite commonly; and it’s just one of several code patterns that can kick the optimizer into using the arithmetic for “unknown value at compile time” (another would the case where you use sys_context(), possibly in fine-grained access control – a.k.a. row-level security or virtual private database).

I’ve got two possible indexes on the table that could be used to assist this query – and one of them is significantly larger than the other because it has an extra column (ind_pad) in the middle which does not appear in the where clause. which index is the optimizer going to use ?

-- Default path
-----------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     5 |    85 |    17 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |     5 |    85 |    13 |
|*  2 |   INDEX RANGE SCAN          | T1_EQUI |     9 |       |     9 |
|   3 |    FAST DUAL                |         |     1 |       |     2 |
|   4 |    FAST DUAL                |         |     1 |       |     2 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("SCATTERED"=10 AND "CLUSTERED">= (SELECT 40 FROM
              "SYS"."DUAL" "DUAL") AND "CLUSTERED"<= (SELECT 41 FROM "SYS"."DUAL"
              "DUAL"))
       filter("CLUSTERED">= (SELECT 40 FROM "SYS"."DUAL" "DUAL") AND
              "CLUSTERED"<= (SELECT 41 FROM "SYS"."DUAL" "DUAL"))

-- Hinted to user t1_range index
------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     5 |    85 |    10 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     5 |    85 |     6 |
|*  2 |   INDEX RANGE SCAN          | T1_RANGE |     9 |       |     2 |
|   3 |    FAST DUAL                |          |     1 |       |     2 |
|   4 |    FAST DUAL                |          |     1 |       |     2 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("CLUSTERED">= (SELECT 40 FROM "SYS"."DUAL" "DUAL") AND
              "SCATTERED"=10 AND "CLUSTERED"<= (SELECT 41 FROM "SYS"."DUAL" "DUAL"))
       filter("SCATTERED"=10)

The optimizer has selected the path with the higher cost – and it’s hard-coded to do this in response to the nature of the predicates and the available choice of indexes. We have two indexes, with predicate usage as follows:

t1_equi (scattered, ind_pad, clustered) with predicates that are: (equality, omitted, range)
t1_range (clustered, scattered) with predicates that are: (range, equality)

The important point is that we have a predicate on column clustered which is range-based with values that are unknown at optimisation time (even though the human eye can see that the values are going to be 40 and 41). Because of this uncertainty the optimizer is coded to bypass the index that starts with this suspect column and use the (higher cost) index where there is a known starting predicate. In effect the optimizer seems to be saying: “in the worst case scenario t1_range might end up doing an index full scan but t1_equi will only have to do a partial range scan”.

I’ve still got a lot of examples I want to work through to see how far ranging this rule is and if it ever gets ignored (what happens, for example, to joins with range-based predicates), but I hope that this preliminary note acts a useful clue when you next see Oracle ignoring the lowest cost path.