Search

OakieTags

Who's online

There are currently 0 users and 31 guests online.

Recent comments

Affiliations

Skip Scan 2

Here’s a question that is NOT a trick question, it’s demonstrating an example of optimizer behaviour that might come as a surprise.
I have an index (addr_id0050, effective_date), the first column is numeric, the second is a date. Here’s a query with an execution plan that uses that index:

define m_date='30-Jan-2013'

select
	small_vc
from	t1
where
	addr_id0050 between 24 and 26
and	effective_date = to_date('&m_date', 'dd-mon-yyyy')
;

------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |    18 |   396 |    23 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1       |    18 |   396 |    23 |
|*  2 |   INDEX SKIP SCAN           | T1_I0050 |    18 |       |     5 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ADDR_ID050">=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2013-01-30'
              00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID050"<=26)
       filter("EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

So here’s the question – given that my where clause includes a predicate on the first column of the index that would allow an index range scan to take place, wouldn’t you expect Oracle to do a range scan, and how does a skip scan work in this case ?

To push the point a little further I have another column, also numeric, in the same table which appears in a similar index (addr_id2500, effective_date). Here’s the equivalent query with its execution plan.

select
	small_vc
from	t1
where
	addr_id2500 between 24 and 26
and	effective_date = to_date('&m_date', 'dd-mon-yyyy')
;

------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |    23 |     5 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     1 |    23 |     5 |
|*  2 |   INDEX RANGE SCAN          | T1_I2500 |     1 |       |     4 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ADDR_ID2500">=24 AND "EFFECTIVE_DATE"=TO_DATE('
              2013-01-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID2500"<=26)
       filter("EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))

In this case, with the same starting predicate (but referencing the alternative column), the optimizer produces the index range that we might expect.

The index skip scan isn’t just for cases where the first column of an index is missing from the list of predicates. The basic principle is that the optimizer has the option to do a small number of tiny range scans on an index when the alternative is to do a tablescan or a very large index range scan.

As you might guess from my column names, addr_id0050 holds 50 distinct values, so the range 24 to 26 actually accounts for 6% of the total volume of the index. On the other hand, there are over 800 distinct values for effective_date. There is even an index on (effective_date) but that’s going to pick up lots of rows that are widely scattered throughout the table then throw away 94% of them so Oracle has decided it’s too expensive to use.

So the optimizer has worked out that it can probe for (24, 30th Jan), (25, 30th Jan), and (26, 30th Jan) as the most efficient access path. Of course, it doesn’t know that it’s probing for exactly those values but statistically it assumes that there are only a few possible values that will show up if it probes the index on the first column – in effect using an “inlist iterator” on the first column without knowing in advance what it’s going to find in the list.

When the optimizer sees the query using addr_id2500 (which holds 2,500 distinct values) the arithmetic faviours the option we are familiar with. A simple range scan based on values between 24 and 26 is going to range through roughly l/800th of the index – which is a tiny number of leaf blocks (just one or two, in my case) so the optimizer decides to do that and check every index entry on the way to see if the effective_date holds a suitable value.

If you want to repeat the experiment, I was using 11.2.0.3 with 1MB uniform extents and freelist management. The SQL to create the table and indexes is as follows:

create table t1
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 3000
)
select
	mod(rownum,2500)				addr_id2500,
	mod(rownum,50)					addr_id0050,
	trunc(sysdate) + trunc(mod(rownum,2501)/3)	effective_date,
	lpad(rownum,10,'0')				small_vc,
	rpad('x',100)					padding
from
	generator	v1,
	generator	v2
where
	rownum <= 250000
;

create index t1_i1 on t1(effective_date);
create index t1_i2500 on t1(addr_id2500, effective_date);
create index t1_i0050 on t1(addr_id0050, effective_date);

-- collect stats, no histograms, 11g auto sample size

I had disabled CPU costing to get repeatable results – depending on parameter settings whether you have system stats enabled or not then you may need to tweak the code to change the relative numbers of distinct values in the numeric columns before you see the switch between range scan and skip scan.