Here’s a model of a little problem I came across recently. It’s something I wrote about many years ago, and I thought I’d seen a note on Metalink explaining that the issue had been addressed; but the problem is still there, even in 11.2.0.3.
We start with a little data set (and it’s my standard setup of 8KB blocks, LMTs, 1MB uniform extents, and no ASSM):
create table t1 as with generator as ( select --+ materialize rownum id from dual connect by level <= 10000 ) select trunc((rownum-1)/2) product_id, mod(rownum-1,2) flag1, mod(rownum-1,3) flag2, rownum n1, lpad(rownum,30) v1 from generator v1, generator v2 where rownum <= 1000000 ; -- collect stats, compute, no histograms alter table t1 add constraint t1_pk primary key (product_id,flag1, flag2) using index ( create unique index t1_pk on t1(product_id, flag1, flag2) ); create index t1_i1 on t1(product_id, flag2, flag1, n1,v1);
As so often happens with the anomalies I uncover on client sites, there’s something a bit odd about this model. The second index starts with all the columns of the primary key (although the column order is slightly different). There is a rationale for this: the second index makes a big difference to a particular overnight batch job because it eliminates a large number of random accesses to a very big table. The intention is to drop the primary index eventually and use this larger index to cover the primary key.
Here’s an example of a very simple SQL statement which is run very frequently during the day – but don’t ask why this statement happens, there is a good rationale but it would take a lengthy explanation.
select rowid from t1 where product_id = 2500 ;
Which index do you think Oracle will use to execute this query ? If it’s not obvious from the index definitions I’ll give you an extra clue by reporting a few of the index statistics as well.
INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR -------------------- -------- ----------- ----------------- T1_PK 2 2,807 7,088 T1_I1 2 7,996 7,154
The answer is that Oracle uses the larger index. Here are the execution plans I get from hinting the query to use each index in turn:
============ Hinted to PK ============ ---------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | ---------------------------------------------------------- | 0 | SELECT STATEMENT | | 2 | 34 | 3 | |* 1 | INDEX RANGE SCAN| T1_PK | 2 | 34 | 3 | ---------------------------------------------------------- ====================== Hinted to larger index ====================== ---------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | ---------------------------------------------------------- | 0 | SELECT STATEMENT | | 2 | 34 | 3 | |* 1 | INDEX RANGE SCAN| T1_I1 | 2 | 34 | 3 | ----------------------------------------------------------
The two indexes produce the same cost (and you can check this in more detail in the 10053 trace) – so why does Oracle use the “wrong” index ? Because when you sort the index names alphabetically, T1_I1 comes before T1_PK. If you doubt me at this point, by the way, just run the test and then repeat it using the name T1_XX instead of T1_I1.
You might think the problem has something to do with bad statistics, of course – but that’s why I sepcified compute in the stats collection. To hammer the point home I’ve also run tests hacking the statistics on the T1_I1 index, doubling virtually every statistic (leaf blocks, clustering_factor, etc.) except the number of distinct keys – which I halved. There is a statistical solution – but I’d have to push the statistics on this index a very long way before the arithmetic for this query made the larger index more expensive than the smaller one.
I thought I’d seen a note on MOS (Metalink) saying that this tie-break situation had been addressed with an extra test based on the number of distinct keys, (I’m not sure it’s been implemented, but clearly these two indexes have the same number of distinct key anyway). Perhaps a check on the absolute number of leaf blocks in the index might also be in order if the key count is a tie. In the meantime, of course, I can add a SQL Baseline to the query.
The choice of index doesn’t make any difference to the performance of any individual execution of the query, of course, but it does mean that I don’t have both the indexes taking up space in the cache during the day when the larger index shouldn’t really be used, so statistically the wrong choice has an impact on every execution of this query and every other query on the system.
Recent comments
17 weeks 3 days ago
27 weeks 2 days ago
29 weeks 4 hours ago
32 weeks 1 day ago
34 weeks 3 days ago
44 weeks 5 hours ago
45 weeks 4 days ago
46 weeks 4 days ago
46 weeks 5 days ago
49 weeks 3 days ago