Here’s an odd little bug (I think) in the optimizer that showed itself in a thread on the OTN database forum some time ago. The optimizer can choose an index which is NOT the cheapest index for satisfying a query against a single table. Here’s the demo – which I’ve run against 11.1.0.6 using an 8KB block size, ASSM and system allocated extent sizes:
create table t1(
DATE1 DATE,
STRING1 VARCHAR2(40),
NUM1 NUMBER,
NUM2 NUMBER,
NUM3 NUMBER,
NUM4 NUMBER,
NUM5 NUMBER,
STRING2 VARCHAR2(3),
NUM6 NUMBER,
STRING3 VARCHAR2(240),
STRING4 VARCHAR2(240),
STRING5 VARCHAR2(240),
STRING6 VARCHAR2(240),
STRING7 VARCHAR2(240),
STRING8 VARCHAR2(240),
STRING9 VARCHAR2(10)
);
insert into t1(
date1, num1, num3, num6, string9
)
select
sysdate + dbms_random.value(-180, +180),
trunc(dbms_random.value(0,10)),
trunc(dbms_random.value(0,10)),
trunc(dbms_random.value(0,10)),
dbms_random.string('U',6)
from dual
connect by
rownum <= 40000
;
commit;
CREATE INDEX t1_i1 ON t1 (TRUNC(DATE1), NUM3);
CREATE INDEX t1_i2 ON t1 (NUM3, NUM1, TRUNC(DATE1));
begin
dbms_stats.gather_table_stats(
ownname => user,
tabname =>'T1',
estimate_percent => 100,
block_sample => true,
method_opt => 'for all columns size 1',
cascade => true
);
end;
/
explain plan for
SELECT
NVL(SUM(num6),0) num6_SUM
FROM t1
WHERE
1 = 1
and num3 = :b1
AND TRUNC(date1) BETWEEN sysdate-:b2 AND sysdate-:b3
and string9 = :b4
;
select * from table(dbms_xplan.display)
;
explain plan for
SELECT
/*+ index(t1 t1_i1) */
NVL(SUM(num6),0) num6_SUM
FROM t1
WHERE
1 = 1
AND num3 = :b1
AND TRUNC(date1) BETWEEN sysdate-:b2 AND sysdate-:b3
and string9 = :b4
;
select * from table(dbms_xplan.display)
;
The mixture of upper and lower case combined with the presence of redundant columns is because I copied most of the code from the original thread and kept the table and index structures – I don’t usually write messy code.
You’ll notice that I have a query that could use either of the indexes to get to the table, but index t1_i2 starts with a column that holds only 10 distinct values and has an extra column “in the way” so the nature of the query means Oracle will have to scan at least 10% of this index to satisfy the query. Here are the two execution plans.
First the unhinted version of the query – which uses the t1_i2 index. Note the cost, of which the index range scan cost (line 4) is a significant fraction:
-----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 21 | 33 |
| 1 | SORT AGGREGATE | | 1 | 21 | |
|* 2 | FILTER | | | | |
|* 3 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 21 | 33 |
|* 4 | INDEX RANGE SCAN | T1_I2 | 18 | | 15 |
-----------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(SYSDATE@!-TO_NUMBER(:B2)<=SYSDATE@!-TO_NUMBER(:B3))
3 - filter("STRING9"=:B4)
4 - access("NUM3"=TO_NUMBER(:B1) AND
TRUNC(INTERNAL_FUNCTION("DATE1"))>=SYSDATE@!-TO_NUMBER(:B2) AND
TRUNC(INTERNAL_FUNCTION("DATE1"))<=SYSDATE@!-TO_NUMBER(:B3))
filter(TRUNC(INTERNAL_FUNCTION("DATE1"))>=SYSDATE@!-TO_NUMBER(:B2)
AND TRUNC(INTERNAL_FUNCTION("DATE1"))<=SYSDATE@!-TO_NUMBER(:B3))
Now the version hinted to use t1_i1. Again note the cost – and that the difference is due only to the lower cost of the index range scan.
-----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 21 | 20 |
| 1 | SORT AGGREGATE | | 1 | 21 | |
|* 2 | FILTER | | | | |
|* 3 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 21 | 20 |
|* 4 | INDEX RANGE SCAN | T1_I1 | 18 | | 2 |
-----------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(SYSDATE@!-TO_NUMBER(:B2)<=SYSDATE@!-TO_NUMBER(:B3))
3 - filter("STRING9"=:B4)
4 - access(TRUNC(INTERNAL_FUNCTION("DATE1"))>=SYSDATE@!-TO_NUMBER(:B2)
AND "NUM3"=TO_NUMBER(:B1) AND TRUNC(INTERNAL_FUNCTION("DATE1"))<=
SYSDATE@!-TO_NUMBER(:B3))
filter("NUM3"=TO_NUMBER(:B1))
Left to its own devices, the optimizer chose the more expensive index – even though in both cases the calculations show that the number of visits to the table to collect the result set is the same.
I haven’t really worked out exactly what causes the optimizer to do the wrong thing but it seems, from a couple of other tests, that if you have two indexes that could be used to satisfy a query with the same table access cost but one of the indexes starts with a range scan on a “virtual column” then the other index will be chosen, even if it happens to be the more expensive index to use.
Recent comments
21 weeks 1 day ago
30 weeks 6 days ago
32 weeks 4 days ago
35 weeks 6 days ago
38 weeks 21 hours ago
47 weeks 4 days ago
49 weeks 1 day ago
50 weeks 1 day ago
50 weeks 2 days ago
1 year 6 days ago