Search

OakieTags

Who's online

There are currently 0 users and 46 guests online.

Recent comments

Affiliations

Unique Fail

As in – how come a unique (or primary key) index is predicted to return more than one row using a unique scan, for example (running on 10.2.0.3 – but the same type of thing happens on newer versions):

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	rownum			n1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e4
;

create table t2
as
select	*
from	t1
where
	mod(id,10) = 0
;

-- do something about stats

alter table t1 add constraint t1_pk primary key (id) using index (
	create unique index t1_pk on t1(id)
)
;

select
	t2.*
from
	t2, t1
where
	t2.id between 100 and 200
and	t1.id = t2.n1
and	t1.n1 is not null
;

----------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost  |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |    24 |  3048 |    16 |
|   1 |  NESTED LOOPS                |       |    24 |  3048 |    16 |
|*  2 |   TABLE ACCESS FULL          | T2    |    12 |  1428 |     4 |
|*  3 |   TABLE ACCESS BY INDEX ROWID| T1    |     2 |    16 |     1 |
|*  4 |    INDEX UNIQUE SCAN         | T1_PK |     1 |       |       |
----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T2"."ID">=100 AND "T2"."ID"<=200)
   3 - filter("T1"."N1" IS NOT NULL)
   4 - access("T1"."ID"="T2"."N1")

Notice line 3 – the table access to t1 driven by line 4 (the unique scan of unique index t1_pk) which is predicted to return 2 rows. Clearly something has gone wrong with the optimizer since this is not a possible outcome at runtime.

There are probably a number of paths through the optimizer code that could result in unexpected results like this, and many of them are probably “emergent properties” rather than simple behaviour (for example, the combination of a subquery factoring clause that has been moved inline where it has then accepted a pushed join predicate might lead to an odd combination of calculations that doesn’t make sense). Before getting too excited, though, if you’re seeing an anomaly like this it’s worth taking a quick look at the statistics, you may simply have an unlucky variation in the timing and sample sizes used that happens to have produced a quirky bit of arithmetic.

Note in the example above the place where I commented: “do something about stats”. Normally that line would read “gather stats at this point”; in this case I did something a little bizarre to make sure that the critical table, index, and columns statistics were not self-consistent. In effect I managed to create some stats that said: “every distinct value for the id column will identify two rows – even though it’s got a unique constraint”. Naturally the optimizer arithmetic “believed” the statistics.

I don’t think the specific scenario I engineered in my example can actually happen in newer versions of Oracle – so I’m not going to tell you exactly what I did to force it – but do remember that odd effects can appear simply because the stats collection didn’t quite work the way you wanted.

In 11g, of course, you can test this hypothesis fairly simply by collecting pending stats, and then enabling them in your own session to see what effect they have, without running the risk of damaging public execution plans. It’s also possible that if you use the auto_sample_size in 11g with the approximate NDV feature enabled then this problem will be less likely to appear.