Here’s a model of a problem I solved quite recently at a client site. The client’s query was much more complex and the volume of data much larger, but this tiny, two table, example is sufficient to demonstrate the key principle. (Originally I thought I’d have to use three tables to model the problem, which is why you may find my choice of table names a little odd). I ran this example on 11.2.0.2 – which was the client version:
create table t2
as
select
rownum id2,
trunc(dbms_random.value(0,1000)) n2a,
trunc(dbms_random.value(0,1000)) n2b,
lpad(rownum,6,'0') vc2,
lpad('x',100,'x') padding
from
all_objects
where
rownum <= 5000;
alter table t2 add constraint t2_pk primary key(id2, n2a);
create table t3
as
select
rownum id3,
trunc(dbms_random.value(0,1000)) n3a,
trunc(dbms_random.value(0,1000)) n3b,
lpad(rownum,6,'0') vc3,
lpad('x',100,'x') padding
from
all_objects
where
rownum <= 5000;
alter table t3 add constraint t3_pk primary key(n3a, n3b, id3);
-- now collect stats on the table and execute this query (with autotrace enabled)
select
*
from
t2
where
not exists (
select /*+ unnest */
null
from t3
where n3a = n2a
and n3b = n2b
and (id3 = id2 or id3 = id2 + 1000)
)
;
select * from table(dbms_xplan.display);
You’ll note that I’ve included the /*+ unnest */ hint in the subquery because I want Oracle to run this as a hash anti-join; and it would appear to be legal (and simple) to do this given the various not null contraints and primary key information. In fact the plan uses a filter subquery:
------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5000 | 585K| 5015 |
|* 1 | FILTER | | | | |
| 2 | TABLE ACCESS FULL| T2 | 5000 | 585K| 15 |
|* 3 | INDEX RANGE SCAN | T3_PK | 1 | 12 | 2 |
------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( NOT EXISTS (SELECT /*+ UNNEST */ 0 FROM "T3" "T3" WHERE
"N3B"=:B1 AND "N3A"=:B2 AND ("ID3"=:B3 OR "ID3"=:B4+1000)))
3 - access("N3A"=:B1 AND "N3B"=:B2)
filter("ID3"=:B1 OR "ID3"=:B2+1000)
I suspect that the optimizer code bypasses the anti-join because of the (carefully bracketed) disjunct (OR) predicate. On the client site this resulted in the subquery being executed 9 million times, reducing an intermediate data set from 9M rows to 2M rows at a cost of 27 million buffer visits and about 60 CPU seconds. Fortunately I was able to dredge up a little bit of propositional calculus and quote the following equivalence:
not( A or B ) <=> (not A and not B)
Equally fortunately I didn’t have to worry about three-valued logic (all relevant columns were declared not null), so I was able to rewrite the query in the form:
select * from t2 where not exists ( select null from t3 where n3a = n2a and n3b = n2b and id3 = id2 ) and not exists ( select null from t3 where n3a = n2a and n3b = n2b and id3 = id2 + 1000 ) ;
With this code Oracle did two unnests and converted to hash anti-joinsin both cases (at least, that’s what happened on the client site – my small sample switched to nested loop anti-joins):
-------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50 | 7200 | 15 |
| 1 | NESTED LOOPS ANTI | | 50 | 7200 | 15 |
| 2 | NESTED LOOPS ANTI | | 4999 | 644K| 15 |
| 3 | TABLE ACCESS FULL| T2 | 5000 | 585K| 15 |
|* 4 | INDEX UNIQUE SCAN| T3_PK | 1 | 12 | |
|* 5 | INDEX UNIQUE SCAN | T3_PK | 5000 | 60000 | |
-------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("N3A"="N2A" AND "N3B"="N2B" AND "ID3"="ID2"+1000)
5 - access("N3A"="N2A" AND "N3B"="N2B" AND "ID3"="ID2")
The change to hash anti-joins was a huge benefit (the nested loop anti-join would have improved things for the client to a degree, but there’s not really an enormouse difference in some cases between a filter subquery and an equivalent nested loop anti/semi-join). In this case the query run time dropped from 95 seconds to 27 seconds – all of it CPU time.
Update: a check of the 10053 trace file for 10.2.0.3 shows the following:
SU: Checking validity of unnesting subquery SEL$2 (#2) SU: SU bypassed: Invalid correlated predicates. SU: Validity checks failed.
On the other hand, it is possible to get unnesting and a hash anti-join with the predicate: id3 between id2 and id2 + 1000. (But see comment 6 and its reply)
Recent comments
21 weeks 2 days ago
31 weeks 9 hours ago
32 weeks 5 days ago
35 weeks 6 days ago
38 weeks 1 day ago
47 weeks 5 days ago
49 weeks 2 days ago
50 weeks 2 days ago
50 weeks 3 days ago
1 year 1 week ago