Here’s a question that is NOT a trick question, it’s demonstrating an example of optimizer behaviour that might come as a surprise.
I have an index (addr_id0050, effective_date), the first column is numeric, the second is a date. Here’s a query with an execution plan that uses that index:
define m_date='30-Jan-2013'
select
small_vc
from t1
where
addr_id0050 between 24 and 26
and effective_date = to_date('&m_date', 'dd-mon-yyyy')
;
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 18 | 396 | 23 |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 18 | 396 | 23 |
|* 2 | INDEX SKIP SCAN | T1_I0050 | 18 | | 5 |
------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ADDR_ID050">=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2013-01-30'
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID050"<=26)
filter("EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
So here’s the question – given that my where clause includes a predicate on the first column of the index that would allow an index range scan to take place, wouldn’t you expect Oracle to do a range scan, and how does a skip scan work in this case ?
To push the point a little further I have another column, also numeric, in the same table which appears in a similar index (addr_id2500, effective_date). Here’s the equivalent query with its execution plan.
select
small_vc
from t1
where
addr_id2500 between 24 and 26
and effective_date = to_date('&m_date', 'dd-mon-yyyy')
;
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 23 | 5 |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 23 | 5 |
|* 2 | INDEX RANGE SCAN | T1_I2500 | 1 | | 4 |
------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ADDR_ID2500">=24 AND "EFFECTIVE_DATE"=TO_DATE('
2013-01-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID2500"<=26)
filter("EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))
In this case, with the same starting predicate (but referencing the alternative column), the optimizer produces the index range that we might expect.
The index skip scan isn’t just for cases where the first column of an index is missing from the list of predicates. The basic principle is that the optimizer has the option to do a small number of tiny range scans on an index when the alternative is to do a tablescan or a very large index range scan.
As you might guess from my column names, addr_id0050 holds 50 distinct values, so the range 24 to 26 actually accounts for 6% of the total volume of the index. On the other hand, there are over 800 distinct values for effective_date. There is even an index on (effective_date) but that’s going to pick up lots of rows that are widely scattered throughout the table then throw away 94% of them so Oracle has decided it’s too expensive to use.
So the optimizer has worked out that it can probe for (24, 30th Jan), (25, 30th Jan), and (26, 30th Jan) as the most efficient access path. Of course, it doesn’t know that it’s probing for exactly those values but statistically it assumes that there are only a few possible values that will show up if it probes the index on the first column – in effect using an “inlist iterator” on the first column without knowing in advance what it’s going to find in the list.
When the optimizer sees the query using addr_id2500 (which holds 2,500 distinct values) the arithmetic faviours the option we are familiar with. A simple range scan based on values between 24 and 26 is going to range through roughly l/800th of the index – which is a tiny number of leaf blocks (just one or two, in my case) so the optimizer decides to do that and check every index entry on the way to see if the effective_date holds a suitable value.
If you want to repeat the experiment, I was using 11.2.0.3 with 1MB uniform extents and freelist management. The SQL to create the table and indexes is as follows:
create table t1
as
with generator as (
select --+ materialize
rownum id
from all_objects
where rownum <= 3000
)
select
mod(rownum,2500) addr_id2500,
mod(rownum,50) addr_id0050,
trunc(sysdate) + trunc(mod(rownum,2501)/3) effective_date,
lpad(rownum,10,'0') small_vc,
rpad('x',100) padding
from
generator v1,
generator v2
where
rownum <= 250000
;
create index t1_i1 on t1(effective_date);
create index t1_i2500 on t1(addr_id2500, effective_date);
create index t1_i0050 on t1(addr_id0050, effective_date);
-- collect stats, no histograms, 11g auto sample size
I had disabled CPU costing to get repeatable results – depending on parameter settings whether you have system stats enabled or not then you may need to tweak the code to change the relative numbers of distinct values in the numeric columns before you see the switch between range scan and skip scan.
Recent comments
17 weeks 3 days ago
27 weeks 1 day ago
28 weeks 6 days ago
32 weeks 1 day ago
34 weeks 3 days ago
43 weeks 6 days ago
45 weeks 3 days ago
46 weeks 3 days ago
46 weeks 4 days ago
49 weeks 2 days ago