Listening to a presentation by Paul Matuszyk on extended statistics yesterday, I learned something that I should have spotted ages ago. Here’s a little demo script to introduce the point:
create table t1
as
with generator as (
select --+ materialize
rownum id
from dual
connect by
level <= 1000)
select
rownum id,
mod(rownum,100) n1,
mod(rownum,100) n2,
mod(rownum,100) n3,
lpad(rownum,10,'0') small_vc,
rpad('x',100) padding
from
generator v1,
generator v2
where
rownum <= 1000000;
create index t1_i1 on t1(n1, n2, n3);
-- collect stats, no histograms.
Notice that in any one row the values of n1, n2, and n3 are all the same, and given the way I’ve used the mod() function to generate the values there are only 100 possible combinations across the entire million rows. This fact will be echoed in the number of distinct keys in the (non-unique) index t1_i1.
Now take a look at the following queries:
select * from t1 where n1 = 50 and n2 = 50 and n3 = 50 ; select n1, n2, n3, max(small_vc) from t1 group by n1, n2, n3 ; select max(t1.small_vc), max(t2.small_vc) from t1, t1 t2 where t2.n1 = t1.n1 and t2.n2 = t1.n2 and t2.n3 = t1.n3 ;
Here are the execution plans for the queries – two plans per query, the first of each pair comes from 10.2.0.5, the second comes from 11.1.0.7. Spot the differences:
Simple select ---------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | ---------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 125 | 2761 | |* 1 | TABLE ACCESS FULL| T1 | 1 | 125 | 2761 | ---------------------------------------------------------- ---------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | ---------------------------------------------------------- | 0 | SELECT STATEMENT | | 10000 | 1220K| 2761 | |* 1 | TABLE ACCESS FULL| T1 | 10000 | 1220K| 2761 | ---------------------------------------------------------- Simple aggregate ------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost | ------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 500K| 9765K| | 6012 | | 1 | HASH GROUP BY | | 500K| 9765K| 30M| 6012 | | 2 | TABLE ACCESS FULL| T1 | 1000K| 19M| | 2761 | ------------------------------------------------------------------- ----------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | ----------------------------------------------------------- | 0 | SELECT STATEMENT | | 100 | 2000 | 5033 | | 1 | HASH GROUP BY | | 100 | 2000 | 5033 | | 2 | TABLE ACCESS FULL| T1 | 1000K| 19M| 2761 | ----------------------------------------------------------- Simple join -------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost | -------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 40 | | 6495 | | 1 | SORT AGGREGATE | | 1 | 40 | | | |* 2 | HASH JOIN | | 1000K| 38M| 30M| 6495 | | 3 | TABLE ACCESS FULL| T1 | 1000K| 19M| | 2761 | | 4 | TABLE ACCESS FULL| T1 | 1000K| 19M| | 2761 | -------------------------------------------------------------------- -------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost | -------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 40 | | 1006K| | 1 | SORT AGGREGATE | | 1 | 40 | | | |* 2 | HASH JOIN | | 10G| 372G| 30M| 1006K| | 3 | TABLE ACCESS FULL| T1 | 1000K| 19M| | 2761 | | 4 | TABLE ACCESS FULL| T1 | 1000K| 19M| | 2761 | --------------------------------------------------------------------
In 11g, the optimizer can use the number of distinct keys for an index in much the same way that it can take advantage of extended statistics (specifically in the “column group” variant) to improve its estimates of cardinality for queries involving predicates on correlated columns. In this case the 10g code has seen three columns with 100 distinct values in each, from which is has inferred that there could be 1,000,000 combinations (100 * 100 * 100) in the table; but 11g has based its calculations on the known number of distinct keys in the index – which is just 100 – leading to dramatically different results.
The optimizer has used an “index sanity” check when calculating join cardinalities since (at least) Oracle 8i – but the check has been restricted to unique indexes. From 11g it is effectively in use for any index.
The code has been back-ported to 10.2.0.4 (and 10.2.0.5) but requires you to set a value for the hidden _fix_control parameter (which you can do at the session or system level) to enable the feature. There are three levels for the setting (corresponding to my three queries) – single table cardinality, aggregation, and join cardinality. To enable the feature you need to execute a statement like:
alter session set "_fix_control"=''5765456:N';
The value for N can be 1, 3, or 7 (It’s a bitmap, but the effects are cumulative – the values are 1, 1 + 2, 1 + 2 + 4).
I’ve often demonstrated the effect of the index sanity check in my seminars and tutorials, with a follow-up comment to the effect that I don’t understand why the optimized only applies the mechanism to unique indexes. From 11g, it doesn’t; the thing is, as I also point out in the seminar, this makes it harder to work out if it’s safe to drop an index. Even when an index doesn’t appear in an execution plan Oracle may be using its number of distinct keys to evaluate and choose an execution plan. So, to guard against changes in plans when you drop a multi-column index, you might want to think about creating extended stats on a set of columns that matches the index definition before dropping the index.
Recent comments
17 weeks 3 days ago
27 weeks 2 days ago
28 weeks 6 days ago
32 weeks 1 day ago
34 weeks 3 days ago
44 weeks 5 min ago
45 weeks 3 days ago
46 weeks 3 days ago
46 weeks 4 days ago
49 weeks 3 days ago