Search

OakieTags

Who's online

There are currently 0 users and 26 guests online.

Recent comments

Affiliations

Index naming

Here’s a model of a little problem I came across recently. It’s something I wrote about many years ago, and I thought I’d seen a note on Metalink explaining that the issue had been addressed; but the problem is still there, even in 11.2.0.3.

We start with a little data set (and it’s my standard setup of 8KB blocks, LMTs, 1MB uniform extents, and no ASSM):

create table t1
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 10000
)
select
	trunc((rownum-1)/2)	product_id,
	mod(rownum-1,2)		flag1,
	mod(rownum-1,3)		flag2,
	rownum			n1,
	lpad(rownum,30)		v1
from
	generator	v1,
	generator	v2
where
	rownum <= 1000000
;


-- collect stats, compute, no histograms

alter table t1 add constraint t1_pk primary key (product_id,flag1, flag2) 
using index (
	create unique index t1_pk on t1(product_id, flag1, flag2)
);

create index t1_i1 on t1(product_id, flag2, flag1, n1,v1);

As so often happens with the anomalies I uncover on client sites, there’s something a bit odd about this model. The second index starts with all the columns of the primary key (although the column order is slightly different). There is a rationale for this: the second index makes a big difference to a particular overnight batch job because it eliminates a large number of random accesses to a very big table. The intention is to drop the primary index eventually and use this larger index to cover the primary key.

Here’s an example of a very simple SQL statement which is run very frequently during the day – but don’t ask why this statement happens, there is a good rationale but it would take a lengthy explanation.

select
	rowid
from
	t1
where
	product_id = 2500
;

Which index do you think Oracle will use to execute this query ? If it’s not obvious from the index definitions I’ll give you an extra clue by reporting a few of the index statistics as well.

INDEX_NAME             BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR 
-------------------- -------- ----------- ----------------- 
T1_PK                       2       2,807             7,088 
T1_I1                       2       7,996             7,154 

The answer is that Oracle uses the larger index. Here are the execution plans I get from hinting the query to use each index in turn:

============
Hinted to PK
============

----------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT |       |     2 |    34 |     3 |
|*  1 |  INDEX RANGE SCAN| T1_PK |     2 |    34 |     3 |
----------------------------------------------------------

======================
Hinted to larger index
======================

----------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT |       |     2 |    34 |     3 |
|*  1 |  INDEX RANGE SCAN| T1_I1 |     2 |    34 |     3 |
----------------------------------------------------------

The two indexes produce the same cost (and you can check this in more detail in the 10053 trace) – so why does Oracle use the “wrong” index ? Because when you sort the index names alphabetically, T1_I1 comes before T1_PK. If you doubt me at this point, by the way, just run the test and then repeat it using the name T1_XX instead of T1_I1.

You might think the problem has something to do with bad statistics, of course – but that’s why I sepcified compute in the stats collection. To hammer the point home I’ve also run tests hacking the statistics on the T1_I1 index, doubling virtually every statistic (leaf blocks, clustering_factor, etc.) except the number of distinct keys – which I halved. There is a statistical solution – but I’d have to push the statistics on this index a very long way before the arithmetic for this query made the larger index more expensive than the smaller one.

I thought I’d seen a note on MOS (Metalink) saying that this tie-break situation had been addressed with an extra test based on the number of distinct keys, (I’m not sure it’s been implemented, but clearly these two indexes have the same number of distinct key anyway). Perhaps a check on the absolute number of leaf blocks in the index might also be in order if the key count is a tie. In the meantime, of course, I can add a SQL Baseline to the query.

The choice of index doesn’t make any difference to the performance of any individual execution of the query, of course, but it does mean that I don’t have both the indexes taking up space in the cache during the day when the larger index shouldn’t really be used, so statistically the wrong choice has an impact on every execution of this query and every other query on the system.