Search

OakieTags

Who's online

There are currently 0 users and 34 guests online.

Recent comments

Affiliations

FBI Bug

Here’s an odd little bug (I think) in the optimizer that showed itself in a thread on the OTN database forum some time ago. The optimizer can choose an index which is NOT the cheapest index for satisfying a query against a single table. Here’s the demo – which I’ve run against 11.1.0.6 using an 8KB block size, ASSM and system allocated extent sizes:

create table t1(
	DATE1              DATE,
	STRING1            VARCHAR2(40),
	NUM1               NUMBER,
	NUM2               NUMBER,
	NUM3               NUMBER,
	NUM4               NUMBER,
	NUM5               NUMBER,
	STRING2            VARCHAR2(3),
	NUM6               NUMBER,
	STRING3            VARCHAR2(240),
	STRING4            VARCHAR2(240),
	STRING5            VARCHAR2(240),
	STRING6            VARCHAR2(240),
	STRING7            VARCHAR2(240),
	STRING8            VARCHAR2(240),
	STRING9            VARCHAR2(10)
);

insert into t1(
	date1, num1, num3, num6, string9
)
select
	sysdate + dbms_random.value(-180, +180),
	trunc(dbms_random.value(0,10)),
	trunc(dbms_random.value(0,10)),
	trunc(dbms_random.value(0,10)),
	dbms_random.string('U',6)
from dual
connect by
	rownum <= 40000
;

commit;

CREATE INDEX t1_i1 ON t1 (TRUNC(DATE1), NUM3);
CREATE INDEX t1_i2 ON t1 (NUM3, NUM1, TRUNC(DATE1));

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		estimate_percent => 100,
		block_sample 	 => true,
		method_opt 	 => 'for all columns size 1',
		cascade		 => true
	);
end;
/

explain plan for
SELECT
	NVL(SUM(num6),0) num6_SUM
FROM t1
WHERE
	1 = 1
and	num3 = :b1
AND	TRUNC(date1) BETWEEN sysdate-:b2 AND sysdate-:b3
and	string9 = :b4
;

select * from table(dbms_xplan.display)
;

explain plan for
SELECT
	/*+ index(t1 t1_i1) */
	NVL(SUM(num6),0) num6_SUM
FROM t1
WHERE
	1 = 1
AND	num3 = :b1
AND	TRUNC(date1) BETWEEN sysdate-:b2 AND sysdate-:b3
and	string9 = :b4
;

select * from table(dbms_xplan.display)
;

The mixture of upper and lower case combined with the presence of redundant columns is because I copied most of the code from the original thread and kept the table and index structures – I don’t usually write messy code.

You’ll notice that I have a query that could use either of the indexes to get to the table, but index t1_i2 starts with a column that holds only 10 distinct values and has an extra column “in the way” so the nature of the query means Oracle will have to scan at least 10% of this index to satisfy the query. Here are the two execution plans.

First the unhinted version of the query – which uses the t1_i2 index. Note the cost, of which the index range scan cost (line 4) is a significant fraction:

-----------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |     1 |    21 |    33 |
|   1 |  SORT AGGREGATE               |       |     1 |    21 |       |
|*  2 |   FILTER                      |       |       |       |       |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T1    |     1 |    21 |    33 |
|*  4 |     INDEX RANGE SCAN          | T1_I2 |    18 |       |    15 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(SYSDATE@!-TO_NUMBER(:B2)<=SYSDATE@!-TO_NUMBER(:B3))
   3 - filter("STRING9"=:B4)
   4 - access("NUM3"=TO_NUMBER(:B1) AND
              TRUNC(INTERNAL_FUNCTION("DATE1"))>=SYSDATE@!-TO_NUMBER(:B2) AND
              TRUNC(INTERNAL_FUNCTION("DATE1"))<=SYSDATE@!-TO_NUMBER(:B3))
       filter(TRUNC(INTERNAL_FUNCTION("DATE1"))>=SYSDATE@!-TO_NUMBER(:B2)
              AND TRUNC(INTERNAL_FUNCTION("DATE1"))<=SYSDATE@!-TO_NUMBER(:B3))

Now the version hinted to use t1_i1. Again note the cost – and that the difference is due only to the lower cost of the index range scan.

-----------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |     1 |    21 |    20 |
|   1 |  SORT AGGREGATE               |       |     1 |    21 |       |
|*  2 |   FILTER                      |       |       |       |       |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T1    |     1 |    21 |    20 |
|*  4 |     INDEX RANGE SCAN          | T1_I1 |    18 |       |     2 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(SYSDATE@!-TO_NUMBER(:B2)<=SYSDATE@!-TO_NUMBER(:B3))
   3 - filter("STRING9"=:B4)
   4 - access(TRUNC(INTERNAL_FUNCTION("DATE1"))>=SYSDATE@!-TO_NUMBER(:B2)
              AND "NUM3"=TO_NUMBER(:B1) AND TRUNC(INTERNAL_FUNCTION("DATE1"))<=
              SYSDATE@!-TO_NUMBER(:B3))
       filter("NUM3"=TO_NUMBER(:B1))

Left to its own devices, the optimizer chose the more expensive index – even though in both cases the calculations show that the number of visits to the table to collect the result set is the same.

I haven’t really worked out exactly what causes the optimizer to do the wrong thing but it seems, from a couple of other tests, that if you have two indexes that could be used to satisfy a query with the same table access cost but one of the indexes starts with a range scan on a “virtual column” then the other index will be chosen, even if it happens to be the more expensive index to use.