As in – how come a unique (or primary key) index is predicted to return more than one row using a unique scan, for example (running on 10.2.0.3 – but the same type of thing happens on newer versions):
create table t1
as
with generator as (
select --+ materialize
rownum id
from dual
connect by
level <= 1e4
)
select
rownum id,
rownum n1,
lpad(rownum,10,'0') small_vc,
rpad('x',100) padding
from
generator v1,
generator v2
where
rownum <= 1e4
;
create table t2
as
select *
from t1
where
mod(id,10) = 0
;
-- do something about stats
alter table t1 add constraint t1_pk primary key (id) using index (
create unique index t1_pk on t1(id)
)
;
select
t2.*
from
t2, t1
where
t2.id between 100 and 200
and t1.id = t2.n1
and t1.n1 is not null
;
----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 24 | 3048 | 16 |
| 1 | NESTED LOOPS | | 24 | 3048 | 16 |
|* 2 | TABLE ACCESS FULL | T2 | 12 | 1428 | 4 |
|* 3 | TABLE ACCESS BY INDEX ROWID| T1 | 2 | 16 | 1 |
|* 4 | INDEX UNIQUE SCAN | T1_PK | 1 | | |
----------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("T2"."ID">=100 AND "T2"."ID"<=200)
3 - filter("T1"."N1" IS NOT NULL)
4 - access("T1"."ID"="T2"."N1")
Notice line 3 – the table access to t1 driven by line 4 (the unique scan of unique index t1_pk) which is predicted to return 2 rows. Clearly something has gone wrong with the optimizer since this is not a possible outcome at runtime.
There are probably a number of paths through the optimizer code that could result in unexpected results like this, and many of them are probably “emergent properties” rather than simple behaviour (for example, the combination of a subquery factoring clause that has been moved inline where it has then accepted a pushed join predicate might lead to an odd combination of calculations that doesn’t make sense). Before getting too excited, though, if you’re seeing an anomaly like this it’s worth taking a quick look at the statistics, you may simply have an unlucky variation in the timing and sample sizes used that happens to have produced a quirky bit of arithmetic.
Note in the example above the place where I commented: “do something about stats”. Normally that line would read “gather stats at this point”; in this case I did something a little bizarre to make sure that the critical table, index, and columns statistics were not self-consistent. In effect I managed to create some stats that said: “every distinct value for the id column will identify two rows – even though it’s got a unique constraint”. Naturally the optimizer arithmetic “believed” the statistics.
I don’t think the specific scenario I engineered in my example can actually happen in newer versions of Oracle – so I’m not going to tell you exactly what I did to force it – but do remember that odd effects can appear simply because the stats collection didn’t quite work the way you wanted.
In 11g, of course, you can test this hypothesis fairly simply by collecting pending stats, and then enabling them in your own session to see what effect they have, without running the risk of damaging public execution plans. It’s also possible that if you use the auto_sample_size in 11g with the approximate NDV feature enabled then this problem will be less likely to appear.
Recent comments
21 weeks 1 day ago
30 weeks 6 days ago
32 weeks 4 days ago
35 weeks 6 days ago
38 weeks 1 day ago
47 weeks 4 days ago
49 weeks 1 day ago
50 weeks 1 day ago
50 weeks 2 days ago
1 year 6 days ago