Search

Top 60 Oracle Blogs

Recent comments

Cardinality Change

Here’s an entertaining little change across versions of Oracle, brought to my attention by Tony Hasler during UKOUG Tech 14. It’s a join cardinality estimate, so here are a couple of tables to demonstrate the issue – the only columns needed are the alpha_06 columns, but I reused some code from other demonstrations to create my test case, so there are lots of irrelevant columns in the create table script:


create table t1 nologging as
with generator as (
        select rownum id
        from dual
        connect by rownum <= 1000
)
select
        rownum                                          id,
        mod(rownum-1,200)                               mod_200,
        trunc(dbms_random.value(0,300))                 rand_300,
        mod(rownum-1,10000)                             mod_10000,
        trunc(sysdate) +
                trunc(dbms_random.value(0,1000))        date_1000,
        dbms_random.string('l',6)                       alpha_06,
        dbms_random.string('l',20)                      alpha_20
from
        generator,
        generator
where
        rownum <= 1e6
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

create table t2 nologging as select * from t1;
execute dbms_stats.gather_table_stats(user,'t2',method_opt=>'for all columns size 1')

I’m going to join t1 to t2 with a predicate based on the alpha_06 columns – using a LIKE predicate. Before I do so I’ll point out that there are are 1,000,000 rows in the table, and (checking the column stats) 985,920 distinct values for alpha_06. Here’s my query, with the execution plan I got from 11.1.0.7:


select
        count(*)
from
        t1, t2
where
        t2.alpha_06 like t1.alpha_06
;

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    14 |  1122M  (6)|999:59:59 |
|   1 |  SORT AGGREGATE     |      |     1 |    14 |            |          |
|   2 |   NESTED LOOPS      |      |    50G|   651G|  1122M  (6)|999:59:59 |
|   3 |    TABLE ACCESS FULL| T1   |  1000K|  6835K|  1123   (6)| 00:00:06 |
|*  4 |    TABLE ACCESS FULL| T2   | 50000 |   341K|  1122   (6)| 00:00:06 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T2"."ALPHA_06" LIKE "T1"."ALPHA_06")

The 50,000 cardinality estimate for t2 looks like the standard 5% guess for “column >= {unknown value}”, following which the join cardinality of 50G is the same 5% guess applied to the Cartesian join between t1 and t2 (1M * 1M * 0.05). It’s not a good estimate in my case because the right answer happens to be close to 1M rows, specifically 1,003,176. So let’s upgrade to 11.2.0.4 and see what we get instead:

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    14 |  1050M  (6)|999:59:59 |
|   1 |  SORT AGGREGATE     |      |     1 |    14 |            |          |
|   2 |   NESTED LOOPS      |      |  2014K|    26M|  1050M  (6)|999:59:59 |
|   3 |    TABLE ACCESS FULL| T1   |  1000K|  6835K|  1051   (6)| 00:00:06 |
|*  4 |    TABLE ACCESS FULL| T2   |     2 |    14 |  1050   (6)| 00:00:06 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T2"."ALPHA_06" LIKE "T1"."ALPHA_06")

The estimate has dropped from 50 Billion rows down to 2 Million – a factor of about 25,000: possibly an indicator that the algorithm has changed, and that a few people might find execution plans changing as they upgrade to a newer version of Oracle. The change occurred at 11.2.0.2 as revealed by fix control 9303766 which has the description: “use 1/NDV+1/NROWS for col1 LIKE col2 selectivities”.

Just as a quick check on the arithmetic: there are 1 million rows in table t2, with (as noted above) 985,920 distinct values in the column, so the selectivity should be: 1/1000000 + 1/985920 = 2.014281 * e-6. Multiply the selectivity by 1e6 and you get 2, the cardinality estimate for t2; multiply the selectivity by 1M*1M (the Cartesian join) and you get 2,014,281, the cardinality estimate of the join. QED.

There are workarounds, of course. One would be to reverse out the fix control, either as an initialisation parameter or in a session logon trigger, another might be to modify the SQL – I think the following would be equivalent:


select
        *
from    t1, t2
where
        t2.alpha_06 like substr(t1.alpha_06,1,length(t1.alpha_06))||'%'
and     t1.alpha_06 is not null
and     t2.alpha_06 is not null

This changes the critical predicate from the form “col1 like col2″ to “col1 like {unknown value from function}” i.e. back to a case where the optimizer uses the 5% guess, and the cardinality estimates go back the original values.