Search

OakieTags

Who's online

There are currently 0 users and 28 guests online.

Recent comments

Affiliations

10gR2

IS NULL Conditions and B-tree Indexes

At page 383 of my book I wrote the following sentence (BTW, the same information is also provided by Table 9-3 at page 381):

With B-tree indexes, IS NULL conditions can be applied only through composite B-tree indexes when several SQL conditions are applied and at least one of them is not based on IS NULL or an inequality.

The text continues by showing the following examples (notice that in both cases the IS NULL predicate is applied through an access predicate):

Quiz Night

I've recently come across an interesting variation of a "famous" ASSM bug. Probably some of you will remember that ASSM bug that was caused by row migrations in larger block sizes (16K/32K).

If you don't remember or don't know what I'm talking about, you can have a look here where Greg Rahn provides a summary of the issue or check My Oracle Support bug description 6918210.

Greg also links to a script originally created by Jonathan Lewis that allows to reproduce the issue at will.

So far the issue was only reproduced on block sizes greater 8K - the variation I've encountered however allows to reproduce the issue on 8K and 4K, possibly also on 2K, but I haven't tested 2K yet.

Below is my version of script. If you compare it to Jonathan's version you'll notice that it is very similar, if not to say almost the same except for additional optional instrumentation, that you can simply un-comment if you've installed my Advanced Oracle Troubleshooting script package that is based on Tanel Poder's awesome "tpt_public" tool set.

The SESSPACK tool can be found in Tanel's tool set (tools/sesspack_0.05_release) and the SNAP_KCBSW package has been developed by Jonathan a long time ago - it can be found here. Note that it only works for versions below 11g - this instrumentation has been "optimized away" in 11g, unfortunately.

In order to reduce the runtime, I've simply limited the number of rows in the table to 50,000 rows.

set echo on timing on

drop table t1;

purge table t1;

CREATE TABLE t1
(n1 NUMBER,
n2 NUMBER)
TABLESPACE &tblspace;

INSERT --+ append
INTO t1
SELECT TRUNC(dbms_random.VALUE(10000000,100000000)) n1,
TO_NUMBER(NULL) AS n2
FROM dual
CONNECT BY LEVEL <= 50000
/

BEGIN dbms_stats.gather_table_stats(
ownname => null,
tabname => 'T1');
END;
/

SELECT num_rows,blocks FROM user_tables WHERE table_name = 'T1';

/* Uncomment for instrumentation
@trci assm_bug

@trc_f

@trc_p

exec sesspack.snap_me

execute snap_kcbsw.start_snap

@46on 8
*/

alter session set events '10046 trace name context forever, level 8';

UPDATE t1 SET n2 = n1;

commit;

/* Uncomment for instrumentation
@trci assm_bug_off

@46off
*/

alter session set events '10046 trace name context off';

/* Uncomment for instrumentation
set serveroutput on size 1000000 format wrapped
set linesize 120
set trimspool on

execute snap_kcbsw.end_snap

exec sesspack.snap_me
*/

BEGIN dbms_stats.gather_table_stats(
ownname => null,
tabname => 'T1');
END;
/

SELECT num_rows,blocks FROM user_tables WHERE table_name = 'T1';

/* Uncomment for instrumentation
@trc_orasrp &trc_p &trc_f

@trc_tvdxtat &trc_p &trc_f
*/

Here is the task: You are allowed to modify the script at exactly one single location - the modification can take a maximum of four keywords, which means you can add/modify/remove at most four keywords.

With the correct modification you will be able to reproduce the bug even in 8K and lower block sizes.

So, what to modify and why?

If you want to actually run the script yourself you need to use database versions prior 11.2 because the bug is obviously fixed there - this includes 10.2.0.5, which interestingly doesn't have the bug fixed.

I've used a 8K/4K ASSM tablespace with UNIFORM 1M extents for my tests, but I don't think that the extent management matters in that case. My test database uses 8K as default block size.

You'll notice the bug when checking the runtime and the trace file. If you encounter the bug, the runtime for the update will be several seconds (more than 10 seconds seen on my test system in some cases) and the number of current mode gets for the update will be in the millions.

If you've enabled the additional instrumentation it will tell you that the reasons for the buffer gets where "ktspfsrch" and "ktspscan_bmb" for most of the gets. You can also take stack traces (e.g. using Tanel's OStackProf tool) if you use more than 50,000 rows to have a longer runtime of the update statement which will show you similar function names on the stack.

If you don't hit the bug, the update usually takes max. 1-2 seconds, and the current mode gets should be far less than one million when sticking to the 50,000 rows.

P.S.: There is more than one correct answer - and it is possible to hit the bug for 8K block sizes with a single keyword modification (full points!).

Update 24th Jan: P.P.S: No takers yet... So here's an additional hint: The issue is caused by row migration...

Update 26th Jan: OK, time to post a quick answer here. As pointed out by Narendra below, simply setting PCTFREE to 0 already was sufficient to reproduce the issue with smaller block sizes. However, there is much more to tell about and therefore this deserves a separate post that I'll publish the next couple of days.

For the time being here are the correct answers that I'm aware of at present:

- PCTFREE 0
- COMPRESS
- COMPRESS FOR ALL OPERATIONS

But as I already said, there is much more, in particular when partitioning comes into the picture - and I hope to cover all these details in the upcoming post.

Hash Aggregation

Oracle introduced in Oracle 10g the hash aggregation as a new feature. It can be used for both GROUP BY and UNIQUE operations (HASH GROUP BY and HASH UNIQUE respectively) and is by default the preferred aggregation method if there is no particular reason that lets the cost based optimizer prefer the sort based aggregation (SORT GROUP BY and SORT UNIQUE), for example if the GROUP BY is followed by an ORDER BY on the same expression (using the SORT GROUP BY in such cases is not always beneficial by the way, see Guy Harrison's blog for an example).

Ever since its introduction from time to time I've heard complaints about performance degradations of aggregation operations that are based on the new hash aggregation algorithm compared to the previously used sort based aggregations.

Now there may be many potential reasons for such performance degradations (and possibly many of them might not have anything to do with the hash aggregation) but here is a surprising revelation that might explain why some of them were indeed caused by the switch to the new algorithm: The hash aggregation operation does not work very well together with the automatic PGA management (WORKAREA_SIZE_POLICY = AUTO, default since 9i). The fundamental defect is that it is not able to dynamically resize to a larger workarea size when using automatic PGA management and therefore remains more or less at its initial expected size based on the estimates at optimization time.

This effectively means that the efficiency of the hash aggregation operation when using automatic PGA management is heavily dependant on the cardinality estimates at optimization time - in case of estimates in the right ballpark, the memory used at execution time will correspond to the actual requirements at runtime, but in case of bad estimates, the operation potentially uses far less memory than available and unnecessarily spills to disk.

Let's start with a simple script to demonstrate the issue:

set echo on timing on

show parameter pga
show parameter processes

-- alter session set workarea_size_policy = manual sort_area_size = 40000000;

drop table t1 purge;

drop table t2 purge;

create table t1
as
select mod(rownum, 27) as user_id, rownum as object_id from dual connect by level <= 30000;

create table t2
as
select distinct user_id from t1;

exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1')

exec dbms_stats.gather_table_stats(null, 't2', method_opt => 'for all columns size 1')

column low_val new_value low_value
column high_val new_value high_value

select
max(user_id) + 1 as low_val
, max(user_id) + max(user_id) - min(user_id) + 1 as high_val
from
t2;

-- Invalidate any cursors using T1
comment on table t1 is '';

alter session set statistics_level = all;

-- alter session set tracefile_identifier = 'smm_trace_correct_cardinality';

-- alter session set "_smm_trace" = 65535;

select /*+ full(@inner t1)
full(@inner t2)
full(@inner t3)
leading(@inner t2 t3 t1)
use_nl(@inner t3)
use_hash(@inner t1)
no_swap_join_inputs(@inner t1)
use_hash_aggregation(@inner)
*/
max(cnt)
from
(
select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
from t1, t2, t2 t3
where t1.user_id = t2.user_id
group by t1.object_id, t3.user_id
)
;

-- alter session set "_smm_trace" = 0;

set pagesize 0 linesize 200 trimspool on tab off

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

set pagesize 14

-- Fudge the statistics so that the join cardinality between t1 and t2 will be 0 (rounded up to 1)
declare
srec dbms_stats.statrec;
novals dbms_stats.numarray;
distcnt number;
avgclen number;
nullcnt number;
density number;
srec2 dbms_stats.statrec;
begin
dbms_stats.get_column_stats(null, 't1', 'user_id', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
srec2.epc := 2;
novals := dbms_stats.numarray(
&low_value,
&high_value
);
srec2.bkvals := null;
dbms_stats.prepare_column_values(srec2,novals);
dbms_stats.set_column_stats(
ownname=>null,
tabname=>'t1',
colname=>'user_id',
distcnt=>distcnt,
nullcnt=>nullcnt,
srec=>srec2,
avgclen=>avgclen,
density=>density
);
end;
/

-- Invalidate any cursors using T1
comment on table t1 is '';

alter session set statistics_level = all;

-- alter session set tracefile_identifier = 'smm_trace_wrong_cardinality';

-- alter session set "_smm_trace" = 65535;

select /*+ full(@inner t1)
full(@inner t2)
full(@inner t3)
leading(@inner t2 t3 t1)
use_nl(@inner t3)
use_hash(@inner t1)
no_swap_join_inputs(@inner t1)
use_hash_aggregation(@inner)
*/
max(cnt)
from
(
select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
from t1, t2, t2 t3
where t1.user_id = t2.user_id
group by t1.object_id, t3.user_id
)
;

-- alter session set "_smm_trace" = 0;

set pagesize 0 linesize 200 trimspool on tab off

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

I first start with showing the current PGA_AGGREGATE_TARGET (P_A_T) and PROCESSES setting.

I create then two tables as sample data that will be joined and already collect information about the minimum and maximum value of the join column.

Any cursors that are dependent on table T1 will then be invalidated to make sure that a reoptimization of the next query will take place (of course running the complete script invalidates such cursors anyway due to the drop and re-create of the tables).

I then run a simple join followed by a group by operation. I've added a cartesian join by the way, but its only purpose is to ensure that the generated row source is sufficiently large to give the group by something to do.

I have used hints to ensure that I always get the same execution plan even if I later on manipulate the statistics to see the effect of incorrect cardinality estimates.

After getting the actual runtime execution plan along with execution statistics I modify the column statistics of one of the tables' join column in such a way that the optimizer thinks that there is no overlap between the join column values and therefore computes a very low join cardinality.

After making sure again that a reoptimization will take place I run the same statement again with the same data volume and the same hints in place.

And this is what I get for all major versions that are currently out there (10.2.0.4, 10.2.0.5, 11.1.0.7, 11.2.0.1 and 11.2.0.2):

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> show parameter pga

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 200M
SQL> show parameter processes

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 0
db_writer_processes integer 1
gcs_server_processes integer 0
job_queue_processes integer 10
log_archive_max_processes integer 2
processes integer 150
SQL>
SQL> -- alter session set workarea_size_policy = manual sort_area_size = 40000000;
SQL>
SQL> drop table t1 purge;
drop table t1 purge
*
ERROR at line 1:
ORA-00942: table or view does not exist

Elapsed: 00:00:00.01
SQL>
SQL> drop table t2 purge;
drop table t2 purge
*
ERROR at line 1:
ORA-00942: table or view does not exist

Elapsed: 00:00:00.00
SQL>
SQL> create table t1
2 as
3 select mod(rownum, 27) as user_id, rownum as object_id from dual connect by level <= 30000;

Table created.

Elapsed: 00:00:00.07
SQL>
SQL> create table t2
2 as
3 select distinct user_id from t1;

Table created.

Elapsed: 00:00:00.06
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.09
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't2', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.04
SQL>
SQL> column low_val new_value low_value
SQL> column high_val new_value high_value
SQL>
SQL> select
2 max(user_id) + 1 as low_val
3 , max(user_id) + max(user_id) - min(user_id) + 1 as high_val
4 from
5 t2;

LOW_VAL HIGH_VAL
---------- ----------
27 53

Elapsed: 00:00:00.00
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.04
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> -- alter session set tracefile_identifier = 'smm_trace_correct_cardinality';
SQL>
SQL> -- alter session set "_smm_trace" = 65535;
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_hash(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 use_hash_aggregation(@inner)
9 */
10 max(cnt)
11 from
12 (
13 select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 group by t1.object_id, t3.user_id
17 )
18 ;

MAX(CNT)
----------
1

Elapsed: 00:00:00.71
SQL>
SQL> -- alter session set "_smm_trace" = 0;
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID 2s1sdfhvhajv3, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2) full(@inner t3) leading(@inner
t2 t3 t1) use_nl(@inner t3) use_hash(@inner t1) no_swap_join_inputs(@inner
t1) use_hash_aggregation(@inner) */ max(cnt) from ( select /*+ qb_name(inner) */ count(*)
as cnt, t1.object_id, t3.user_id from t1, t2, t2 t3 where t1.user_id = t2.user_id group by t1.object_id,
t3.user_id )

Plan hash value: 3134842094

--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.72 | 139 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.72 | 139 | | | |
| 2 | VIEW | | 1 | 570K| 810K|00:00:00.43 | 139 | | | |
| 3 | HASH GROUP BY | | 1 | 570K| 810K|00:00:00.43 | 139 | 35M| 5521K| 38M (0)|
|* 4 | HASH JOIN | | 1 | 807K| 810K|00:00:00.02 | 139 | 1348K| 1348K| 1093K (0)|
| 5 | NESTED LOOPS | | 1 | 729 | 729 |00:00:00.01 | 84 | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 27 | 27 |00:00:00.01 | 3 | | | |
| 7 | TABLE ACCESS FULL| T2 | 27 | 27 | 729 |00:00:00.01 | 81 | | | |
| 8 | TABLE ACCESS FULL | T1 | 1 | 29907 | 30000 |00:00:00.01 | 55 | | | |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T1"."USER_ID"="T2"."USER_ID")

29 rows selected.

Elapsed: 00:00:00.12
SQL>
SQL> set pagesize 14
SQL>
SQL> -- Fudge the statistics so that the join cardinality between t1 and t2 will be 0 (rounded up to 1)
SQL> declare
2 srec dbms_stats.statrec;
3 novals dbms_stats.numarray;
4 distcnt number;
5 avgclen number;
6 nullcnt number;
7 density number;
8 srec2 dbms_stats.statrec;
9 begin
10 dbms_stats.get_column_stats(null, 't1', 'user_id', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
11 srec2.epc := 2;
12 novals := dbms_stats.numarray(
13 &low_value,
14 &high_value
15 );
16 srec2.bkvals := null;
17 dbms_stats.prepare_column_values(srec2,novals);
18 dbms_stats.set_column_stats(
19 ownname=>null,
20 tabname=>'t1',
21 colname=>'user_id',
22 distcnt=>distcnt,
23 nullcnt=>nullcnt,
24 srec=>srec2,
25 avgclen=>avgclen,
26 density=>density
27 );
28 end;
29 /
old 13: &low_value,
new 13: 27,
old 14: &high_value
new 14: 53

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.03
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> -- alter session set tracefile_identifier = 'smm_trace_wrong_cardinality';
SQL>
SQL> -- alter session set "_smm_trace" = 65535;
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_hash(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 use_hash_aggregation(@inner)
9 */
10 max(cnt)
11 from
12 (
13 select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 group by t1.object_id, t3.user_id
17 )
18 ;

MAX(CNT)
----------
1

Elapsed: 00:00:01.37
SQL>
SQL> -- alter session set "_smm_trace" = 0;
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID 2s1sdfhvhajv3, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2) full(@inner t3) leading(@inner t2 t3 t1)
use_nl(@inner t3) use_hash(@inner t1) no_swap_join_inputs(@inner t1) use_hash_aggregation(@inner)
*/ max(cnt) from ( select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id from t1, t2, t2 t3 where t1.user_id =
t2.user_id group by t1.object_id, t3.user_id )

Plan hash value: 3134842094

------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.37 | 139 | 2715 | 2715 | | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.37 | 139 | 2715 | 2715 | | | | |
| 2 | VIEW | | 1 | 1 | 810K|00:00:00.62 | 139 | 2715 | 2715 | | | | |
| 3 | HASH GROUP BY | | 1 | 1 | 810K|00:00:00.62 | 139 | 2715 | 2715 | 35M| 5521K| 7010K (1)| 23552 |
|* 4 | HASH JOIN | | 1 | 1 | 810K|00:00:00.01 | 139 | 0 | 0 | 1348K| 1348K| 1167K (0)| |
| 5 | NESTED LOOPS | | 1 | 729 | 729 |00:00:00.01 | 84 | 0 | 0 | | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 27 | 27 |00:00:00.01 | 3 | 0 | 0 | | | | |
| 7 | TABLE ACCESS FULL| T2 | 27 | 27 | 729 |00:00:00.01 | 81 | 0 | 0 | | | | |
| 8 | TABLE ACCESS FULL | T1 | 1 | 29907 | 30000 |00:00:00.01 | 55 | 0 | 0 | | | | |
------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T1"."USER_ID"="T2"."USER_ID")

28 rows selected.

Elapsed: 00:00:00.03
SQL>

As it can be seen with an estimate in the right ballpark the HASH GROUP BY operation completes in memory (very close to the 20% PGA_AGGREGATE_TARGET maximum size of a single workarea from 10.2 on with automatic PGA management for PGA_AGGREGATE_TARGET < 500M, for more information see Joze Senegacnik's paper on the internals of automatic PGA management).

However repeating exactly the same operation with the fudged statistics it spills to disk and uses only 7M, although it could have used up to 40M (given that there is no concurrent workload).

The same does not happen when repeating this experiment with other operations that use a workarea - the most obvious one being a SORT GROUP BY, as can be seen from this output:

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> show parameter pga

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 200M
SQL> show parameter processes

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 0
db_writer_processes integer 1
gcs_server_processes integer 0
job_queue_processes integer 10
log_archive_max_processes integer 2
processes integer 150
SQL>
SQL> drop table t1 purge;

Table dropped.

Elapsed: 00:00:00.04
SQL>
SQL> drop table t2 purge;

Table dropped.

Elapsed: 00:00:00.04
SQL>
SQL> create table t1
2 as
3 select mod(rownum, 15) as user_id, rownum as object_id from dual connect by level <= 30000;

Table created.

Elapsed: 00:00:00.06
SQL>
SQL> create table t2
2 as
3 select distinct user_id from t1;

Table created.

Elapsed: 00:00:00.03
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.09
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't2', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.04
SQL>
SQL> column low_val new_value low_value
SQL> column high_val new_value high_value
SQL>
SQL> select
2 max(user_id) + 1 as low_val
3 , max(user_id) + max(user_id) - min(user_id) + 1 as high_val
4 from
5 t2;

LOW_VAL HIGH_VAL
---------- ----------
15 29

Elapsed: 00:00:00.00
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> -- alter session set tracefile_identifier = 'smm_trace_correct_cardinality';
SQL>
SQL> -- alter session set "_smm_trace" = 65535;
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_hash(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 no_use_hash_aggregation(@inner)
9 */
10 max(cnt)
11 from
12 (
13 select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 group by t1.object_id, t3.user_id
17 )
18 ;

MAX(CNT)
----------
1

Elapsed: 00:00:00.54
SQL>
SQL> -- alter session set "_smm_trace" = 0;
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID 94tqnspjuzk8x, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2) full(@inner t3) leading(@inner
t2 t3 t1) use_nl(@inner t3) use_hash(@inner t1) no_swap_join_inputs(@inner
t1) no_use_hash_aggregation(@inner) */ max(cnt) from ( select /*+ qb_name(inner) */
count(*) as cnt, t1.object_id, t3.user_id from t1, t2, t2 t3 where t1.user_id = t2.user_id group by
t1.object_id, t3.user_id )

Plan hash value: 2068941295

--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.53 | 103 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.53 | 103 | | | |
| 2 | VIEW | | 1 | 315K| 450K|00:00:00.37 | 103 | | | |
| 3 | SORT GROUP BY | | 1 | 315K| 450K|00:00:00.37 | 103 | 28M| 1913K| 25M (0)|
|* 4 | HASH JOIN | | 1 | 446K| 450K|00:00:00.01 | 103 | 1348K| 1348K| 1097K (0)|
| 5 | NESTED LOOPS | | 1 | 225 | 225 |00:00:00.01 | 48 | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 15 | 15 |00:00:00.01 | 3 | | | |
| 7 | TABLE ACCESS FULL| T2 | 15 | 15 | 225 |00:00:00.01 | 45 | | | |
| 8 | TABLE ACCESS FULL | T1 | 1 | 29787 | 30000 |00:00:00.01 | 55 | | | |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T1"."USER_ID"="T2"."USER_ID")

29 rows selected.

Elapsed: 00:00:00.03
SQL>
SQL> set pagesize 14
SQL>
SQL> -- Fudge the statistics so that the join cardinality between t1 and t2 will be 0 (rounded up to 1)
SQL> declare
2 srec dbms_stats.statrec;
3 novals dbms_stats.numarray;
4 distcnt number;
5 avgclen number;
6 nullcnt number;
7 density number;
8 srec2 dbms_stats.statrec;
9 begin
10 dbms_stats.get_column_stats(null, 't1', 'user_id', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
11 srec2.epc := 2;
12 novals := dbms_stats.numarray(
13 &low_value,
14 &high_value
15 );
16 srec2.bkvals := null;
17 dbms_stats.prepare_column_values(srec2,novals);
18 dbms_stats.set_column_stats(
19 ownname=>null,
20 tabname=>'t1',
21 colname=>'user_id',
22 distcnt=>distcnt,
23 nullcnt=>nullcnt,
24 srec=>srec2,
25 avgclen=>avgclen,
26 density=>density
27 );
28 end;
29 /
old 13: &low_value,
new 13: 15,
old 14: &high_value
new 14: 29

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> -- alter session set tracefile_identifier = 'smm_trace_wrong_cardinality';
SQL>
SQL> -- alter session set "_smm_trace" = 65535;
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_hash(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 no_use_hash_aggregation(@inner)
9 */
10 max(cnt)
11 from
12 (
13 select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 group by t1.object_id, t3.user_id
17 )
18 ;

MAX(CNT)
----------
1

Elapsed: 00:00:00.56
SQL>
SQL> -- alter session set "_smm_trace" = 0;
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID 94tqnspjuzk8x, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2) full(@inner t3) leading(@inner
t2 t3 t1) use_nl(@inner t3) use_hash(@inner t1) no_swap_join_inputs(@inner
t1) no_use_hash_aggregation(@inner) */ max(cnt) from ( select /*+ qb_name(inner) */
count(*) as cnt, t1.object_id, t3.user_id from t1, t2, t2 t3 where t1.user_id = t2.user_id group by
t1.object_id, t3.user_id )

Plan hash value: 2068941295

--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.55 | 103 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.55 | 103 | | | |
| 2 | VIEW | | 1 | 1 | 450K|00:00:00.38 | 103 | | | |
| 3 | SORT GROUP BY | | 1 | 1 | 450K|00:00:00.38 | 103 | 28M| 1913K| 25M (0)|
|* 4 | HASH JOIN | | 1 | 1 | 450K|00:00:00.01 | 103 | 1348K| 1348K| 1053K (0)|
| 5 | NESTED LOOPS | | 1 | 225 | 225 |00:00:00.01 | 48 | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 15 | 15 |00:00:00.01 | 3 | | | |
| 7 | TABLE ACCESS FULL| T2 | 15 | 15 | 225 |00:00:00.01 | 45 | | | |
| 8 | TABLE ACCESS FULL | T1 | 1 | 29787 | 30000 |00:00:00.01 | 55 | | | |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T1"."USER_ID"="T2"."USER_ID")

29 rows selected.

Elapsed: 00:00:00.03
SQL>

All I've done is to reduce the data set, because the SORT GROUP BY in this case required more memory for the same amount of data than the HASH GROUP BY, in order to prevent the operation to spill to disk with a 200M PGA_AGGREGATE_TARGET setting and change the USE_HASH_AGGREGATION hint to NO_USE_HASH_AGGREGATION.

As you can see, the operation completes both times in memory and uses the same amount of memory no matter what the estimates look like.

I've tested the serial execution of the most common workarea based operations like HASH JOIN, SORT ORDER BY, WINDOW SORT, SORT UNIQUE and all of them were able to dynamically resize the workarea in cases where the initial estimated size was too small.

If you carefully check then you'll notice that I haven't mentioned the HASH UNIQUE operation yet, and later on you'll see why.

A cunning feature of the automatic PGA management comes to help, however, which is a kind of "feedback loop" for the workareas based on statistics maintained by the automatic PGA memory management, and indeed when the HASH GROUP BY cursor based on the incorrect cardinality estimate gets shared (not invalidated) and re-executed, the next execution will look like this:

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set tracefile_identifier = 'smm_trace_wrong_cardinality_repeated_execution';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set "_smm_trace" = 65535;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_hash(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 use_hash_aggregation(@inner)
9 */
10 max(cnt)
11 from
12 (
13 select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 group by t1.object_id, t3.user_id
17 )
18 ;

MAX(CNT)
----------
1

Elapsed: 00:00:00.76
SQL>
SQL> alter session set "_smm_trace" = 0;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID 2s1sdfhvhajv3, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2) full(@inner t3) leading(@inner t2 t3 t1)
use_nl(@inner t3) use_hash(@inner t1) no_swap_join_inputs(@inner t1)
use_hash_aggregation(@inner) */ max(cnt) from ( select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id,
t3.user_id from t1, t2, t2 t3 where t1.user_id = t2.user_id group by t1.object_id, t3.user_id )

Plan hash value: 3134842094

------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | Used-Tmp|
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.75 | 139 | | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.75 | 139 | | | | |
| 2 | VIEW | | 1 | 1 | 810K|00:00:00.44 | 139 | | | | |
| 3 | HASH GROUP BY | | 1 | 1 | 810K|00:00:00.44 | 139 | 35M| 5521K| 38M (0)| |
|* 4 | HASH JOIN | | 1 | 1 | 810K|00:00:00.01 | 139 | 1348K| 1348K| 1412K (0)| |
| 5 | NESTED LOOPS | | 1 | 729 | 729 |00:00:00.01 | 84 | | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 27 | 27 |00:00:00.01 | 3 | | | | |
| 7 | TABLE ACCESS FULL| T2 | 27 | 27 | 729 |00:00:00.01 | 81 | | | | |
| 8 | TABLE ACCESS FULL | T1 | 1 | 29836 | 30000 |00:00:00.01 | 55 | | | | |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T1"."USER_ID"="T2"."USER_ID")

28 rows selected.

Elapsed: 00:00:00.03
SQL>

So Oracle this time based the workarea memory requirements on the feedback from the previous execution and therefore allocated sufficient memory to complete the operation without spilling to disk.

Notice however that the feedback loop unfortunately does not work as desired when the workarea execution fails due to insufficient TEMP space. Ideally the feedback loop should allow the subsequent executions to grab more memory specifically in such cases, but at present it doesn't - the failure obviously prevents an update of the statistics and therefore subsequent executions continue to fail since they still use the same amount of memory.

You can test this by simply assigned a very small TEMP tablespace to the user executing the query so that the second execution fails due to insufficient TEMP space. If you repeat the execution in this case, it will fail again and keeps doing so.

What happens if I invalidate the cursor and repeat the execution?

SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.01
SQL> set echo on timing on
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set tracefile_identifier = 'smm_trace_wrong_cardinality_repeated_exec_invalid';

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter session set "_smm_trace" = 65535;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_hash(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 use_hash_aggregation(@inner)
9 */
10 max(cnt)
11 from
12 (
13 select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 group by t1.object_id, t3.user_id
17 )
18 ;
1

Elapsed: 00:00:01.29
SQL>
SQL> alter session set "_smm_trace" = 0;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID 2s1sdfhvhajv3, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2) full(@inner t3) leading(@inner t2 t3 t1)
use_nl(@inner t3) use_hash(@inner t1) no_swap_join_inputs(@inner t1) use_hash_aggregation(@inner)
*/ max(cnt) from ( select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id from t1, t2, t2 t3 where t1.user_id =
t2.user_id group by t1.object_id, t3.user_id )

Plan hash value: 3134842094

------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.28 | 139 | 2715 | 2715 | | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.28 | 139 | 2715 | 2715 | | | | |
| 2 | VIEW | | 1 | 1 | 810K|00:00:00.61 | 139 | 2715 | 2715 | | | | |
| 3 | HASH GROUP BY | | 1 | 1 | 810K|00:00:00.61 | 139 | 2715 | 2715 | 35M| 5521K| 7067K (1)| 23552 |
|* 4 | HASH JOIN | | 1 | 1 | 810K|00:00:00.01 | 139 | 0 | 0 | 1348K| 1348K| 1129K (0)| |
| 5 | NESTED LOOPS | | 1 | 729 | 729 |00:00:00.01 | 84 | 0 | 0 | | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 27 | 27 |00:00:00.01 | 3 | 0 | 0 | | | | |
| 7 | TABLE ACCESS FULL| T2 | 27 | 27 | 729 |00:00:00.01 | 81 | 0 | 0 | | | | |
| 8 | TABLE ACCESS FULL | T1 | 1 | 29836 | 30000 |00:00:00.01 | 55 | 0 | 0 | | | | |
------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T1"."USER_ID"="T2"."USER_ID")

28 rows selected.

Elapsed: 00:00:00.03
SQL>

Back to square one - with the invalidation the statistics are gone, too, so the bad cardinality estimate again lead to the suboptimal execution of the HASH GROUP BY.

So you might think, why bother? Only the first execution of a cursor without workarea execution statistics will be affected by the problem, subsequent execution of the same cursor will benefit from the statistics from the previous executions.

The problem however is, that this is fine for applications that share cursors. Unfortunately applications that peform heavy duty aggregations like data warehouses typically do not share cursors, since they do not care about the optimization overhead and deliberately use literals to provide as much information to the optimizer as possible.

Also these heavy duty aggregations usually use Parallel Execution features, and as you'll see from the output of the same test case, if I run the HASH GROUP BY in parallel by simply setting table T1 to parallel degree 2, a similar problem occurs - so Parallel Execution is affected as well.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> show parameter pga

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 200M
SQL> show parameter processes

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 0
db_writer_processes integer 1
gcs_server_processes integer 0
job_queue_processes integer 10
log_archive_max_processes integer 2
processes integer 150
SQL>
SQL> -- alter session set workarea_size_policy = manual sort_area_size = 40000000;
SQL>
SQL> drop table t1 purge;

Table dropped.

Elapsed: 00:00:00.01
SQL>
SQL> drop table t2 purge;

Table dropped.

Elapsed: 00:00:00.04
SQL>
SQL> create table t1
2 as
3 select mod(rownum, 27) as user_id, rownum as object_id from dual connect by level <= 30000;

Table created.

Elapsed: 00:00:00.06
SQL>
SQL> alter table t1 parallel 2;

Table altered.

Elapsed: 00:00:00.03
SQL>
SQL> create table t2
2 as
3 select distinct user_id from t1;

Table created.

Elapsed: 00:00:00.03
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.04
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't2', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.04
SQL>
SQL> column low_val new_value low_value
SQL> column high_val new_value high_value
SQL>
SQL> select
2 max(user_id) + 1 as low_val
3 , max(user_id) + max(user_id) - min(user_id) + 1 as high_val
4 from
5 t2;

LOW_VAL HIGH_VAL
---------- ----------
27 53

Elapsed: 00:00:00.00
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set tracefile_identifier = 'smm_trace_correct_cardinality';

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter session set "_smm_trace" = 65535;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_hash(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 use_hash_aggregation(@inner)
9 */
10 max(cnt)
11 from
12 (
13 select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 group by t1.object_id, t3.user_id
17 )
18 ;

MAX(CNT)
----------
1

Elapsed: 00:00:00.42
SQL>
SQL> alter session set "_smm_trace" = 0;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS'));
SQL_ID 2s1sdfhvhajv3, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2) full(@inner t3) leading(@inner t2 t3 t1)
use_nl(@inner t3) use_hash(@inner t1) no_swap_join_inputs(@inner t1)
use_hash_aggregation(@inner) */ max(cnt) from ( select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id,
t3.user_id from t1, t2, t2 t3 where t1.user_id = t2.user_id group by t1.object_id, t3.user_id )

Plan hash value: 464898991

--------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | O/1/M |
--------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.43 | 87 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.43 | 87 | | | |
| 2 | PX COORDINATOR | | 1 | | 2 |00:00:00.43 | 87 | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10002 | 0 | 1 | 0 |00:00:00.01 | 0 | | | |
| 4 | SORT AGGREGATE | | 2 | 1 | 2 |00:00:00.81 | 0 | | | |
| 5 | VIEW | | 2 | 575K| 810K|00:00:00.48 | 0 | | | |
| 6 | HASH GROUP BY | | 2 | 575K| 810K|00:00:00.48 | 0 | 35M| 5521K| 2/0/0|
| 7 | PX RECEIVE | | 2 | 814K| 810K|00:00:00.01 | 0 | | | |
| 8 | PX SEND HASH | :TQ10001 | 0 | 814K| 0 |00:00:00.01 | 0 | | | |
|* 9 | HASH JOIN | | 2 | 814K| 810K|00:00:00.01 | 89 | 1348K| 1348K| 2/0/0|
| 10 | BUFFER SORT | | 2 | | 1458 |00:00:00.01 | 0 | 29696 | 29696 | 2/0/0|
| 11 | PX RECEIVE | | 2 | 729 | 1458 |00:00:00.01 | 0 | | | |
| 12 | PX SEND BROADCAST | :TQ10000 | 0 | 729 | 0 |00:00:00.01 | 0 | | | |
| 13 | NESTED LOOPS | | 1 | 729 | 729 |00:00:00.01 | 84 | | | |
| 14 | TABLE ACCESS FULL| T2 | 1 | 27 | 27 |00:00:00.01 | 3 | | | |
| 15 | TABLE ACCESS FULL| T2 | 27 | 27 | 729 |00:00:00.01 | 81 | | | |
| 16 | PX BLOCK ITERATOR | | 2 | 30164 | 30000 |00:00:00.01 | 89 | | | |
|* 17 | TABLE ACCESS FULL | T1 | 18 | 30164 | 30000 |00:00:00.01 | 89 | | | |
--------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

9 - access("T1"."USER_ID"="T2"."USER_ID")
17 - access(:Z>=:Z AND :Z<=:Z)

38 rows selected.

Elapsed: 00:00:00.03
SQL>
SQL> set pagesize 14
SQL>
SQL> -- Fudge the statistics so that the join cardinality between t1 and t2 will be 0 (rounded up to 1)
SQL> declare
2 srec dbms_stats.statrec;
3 novals dbms_stats.numarray;
4 distcnt number;
5 avgclen number;
6 nullcnt number;
7 density number;
8 srec2 dbms_stats.statrec;
9 begin
10 dbms_stats.get_column_stats(null, 't1', 'user_id', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
11 srec2.epc := 2;
12 novals := dbms_stats.numarray(
13 &low_value,
14 &high_value
15 );
16 srec2.bkvals := null;
17 dbms_stats.prepare_column_values(srec2,novals);
18 dbms_stats.set_column_stats(
19 ownname=>null,
20 tabname=>'t1',
21 colname=>'user_id',
22 distcnt=>distcnt,
23 nullcnt=>nullcnt,
24 srec=>srec2,
25 avgclen=>avgclen,
26 density=>density
27 );
28 end;
29 /
old 13: &low_value,
new 13: 27,
old 14: &high_value
new 14: 53

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set tracefile_identifier = 'smm_trace_wrong_cardinality';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set "_smm_trace" = 65535;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_hash(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 use_hash_aggregation(@inner)
9 */
10 max(cnt)
11 from
12 (
13 select /*+ qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 group by t1.object_id, t3.user_id
17 )
18 ;

MAX(CNT)
----------
1

Elapsed: 00:00:01.17
SQL>
SQL> alter session set "_smm_trace" = 0;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS'));
SQL_ID 2s1sdfhvhajv3, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2) full(@inner t3) leading(@inner t2 t3 t1) use_nl(@inner t3)
use_hash(@inner t1) no_swap_join_inputs(@inner t1) use_hash_aggregation(@inner) */ max(cnt) from ( select /*+
qb_name(inner) */ count(*) as cnt, t1.object_id, t3.user_id from t1, t2, t2 t3 where t1.user_id = t2.user_id group by t1.object_id, t3.user_id )

Plan hash value: 464898991

------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | O/1/M | Max-Tmp |
------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.16 | 87 | 0 | 0 | | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.16 | 87 | 0 | 0 | | | | |
| 2 | PX COORDINATOR | | 1 | | 2 |00:00:01.16 | 87 | 0 | 0 | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10002 | 0 | 1 | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 4 | SORT AGGREGATE | | 2 | 1 | 2 |00:00:02.25 | 0 | 2460 | 2460 | | | | |
| 5 | VIEW | | 2 | 1 | 810K|00:00:01.53 | 0 | 2460 | 2460 | | | | |
| 6 | HASH GROUP BY | | 2 | 1 | 810K|00:00:01.53 | 0 | 2460 | 2460 | 35M| 5521K| | 11264 |
| 7 | PX RECEIVE | | 2 | 1 | 810K|00:00:00.01 | 0 | 0 | 0 | | | | |
| 8 | PX SEND HASH | :TQ10001 | 0 | 1 | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
|* 9 | HASH JOIN | | 2 | 1 | 810K|00:00:00.01 | 89 | 0 | 0 | 1348K| 1348K| 2/0/0| |
| 10 | BUFFER SORT | | 2 | | 1458 |00:00:00.01 | 0 | 0 | 0 | 29696 | 29696 | 2/0/0| |
| 11 | PX RECEIVE | | 2 | 729 | 1458 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 12 | PX SEND BROADCAST | :TQ10000 | 0 | 729 | 0 |00:00:00.01 | 0 | 0 | 0 | | | | |
| 13 | NESTED LOOPS | | 1 | 729 | 729 |00:00:00.01 | 84 | 0 | 0 | | | | |
| 14 | TABLE ACCESS FULL| T2 | 1 | 27 | 27 |00:00:00.01 | 3 | 0 | 0 | | | | |
| 15 | TABLE ACCESS FULL| T2 | 27 | 27 | 729 |00:00:00.01 | 81 | 0 | 0 | | | | |
| 16 | PX BLOCK ITERATOR | | 2 | 30164 | 30000 |00:00:00.01 | 89 | 0 | 0 | | | | |
|* 17 | TABLE ACCESS FULL | T1 | 18 | 30164 | 30000 |00:00:00.01 | 89 | 0 | 0 | | | | |
------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

9 - access("T1"."USER_ID"="T2"."USER_ID")
17 - access(:Z>=:Z AND :Z<=:Z)

37 rows selected.

Elapsed: 00:00:00.04
SQL>

So, if your database uses automatic PGA management and

- uses the hash aggregation HASH GROUP BY (or HASH UNIQUE, more on that in a moment)
- and does not share cursors

every execution that is based on bad cardinality estimates potentially has a problem with the hash aggregation because it might not make efficient use of the available memory.

The same applies to applications that share cursors, however in that case only the first execution after re-optimization / invalidation is affected.

So you might want to carefully check the runtime execution statistics of your critial hash aggregations.

Mind you, things could be worse, and that is where the HASH UNIQUE operation comes into the picture.

When I realised the issue with the HASH GROUP BY operation I was quite certain that the HASH UNIQUE operation will be affected in a similar way, since internally Oracle seems to use the same mechanism for both operations (In the SMM trace files both are called HASH GROUP BY).

To my surprise I noticed that in 10g versions below 10.2.0.5 and 11g versions below 11.2.0.1 (which means in this case 10.2.0.4 and 11.1.0.7 respectively, I didn't test other versions) the HASH UNIQUE operation suffers from an even more dramatic problem: The cardinality estimate is not considered and the initial workarea size is always based on minimum assumptions.

In passing, according to the SMM trace files that can be activated using the undocumented "_smm_trace" parameter it looks like that many of the sort-based workareas like SORT ORDER BY or WINDOW SORT seem to suffer from the same defect, since they however are able to dynamically resize and make use of the workarea statistics feedback they effectively work as expected, they just start with a too low workarea size estimate every time they are executed for the first time after re-optimization / invalidation.

The combination of the two issues - inability to dynamically resize and ignoring the optimizer estimates - leads to a dire result: Every first execution of a HASH UNIQUE operation in those versions will only use the minimum amount of memory. The following test case shows the problem:

set echo on timing on

show parameter pga
show parameter processes

-- alter session set workarea_size_policy = manual sort_area_size = 40000000;

drop table t1 purge;

drop table t2 purge;

create table t1
as
select mod(rownum, 27) as user_id, rownum as object_id from dual connect by level <= 30000;

create table t2
as
select distinct user_id from t1;

exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1')

exec dbms_stats.gather_table_stats(null, 't2', method_opt => 'for all columns size 1')

column low_val new_value low_value
column high_val new_value high_value

select
max(user_id) + 1 as low_val
, max(user_id) + max(user_id) - min(user_id) + 1 as high_val
from
t2;

-- Invalidate any cursors using T1
comment on table t1 is '';

alter session set statistics_level = all;

alter session set tracefile_identifier = 'smm_trace_correct_cardinality';

alter session set "_smm_trace" = 65535;

select /*+ full(@inner t1)
full(@inner t2)
full(@inner t3)
leading(@inner t2 t3 t1)
use_nl(@inner t3)
use_nl(@inner t1)
use_hash_aggregation(@inner)
*/
max(user_id)
from
(
select /*+ qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id
from t1, t2, t2 t3
where t1.user_id = t2.user_id
)
;

alter session set "_smm_trace" = 0;

set pagesize 0 linesize 200 trimspool on tab off

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

set pagesize 14

-- Fudge the statistics so that the join cardinality between t1 and t2 will be 0 (rounded up to 1)
declare
srec dbms_stats.statrec;
novals dbms_stats.numarray;
distcnt number;
avgclen number;
nullcnt number;
density number;
srec2 dbms_stats.statrec;
begin
dbms_stats.get_column_stats(null, 't1', 'user_id', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
srec2.epc := 2;
novals := dbms_stats.numarray(
&low_value,
&high_value
);
srec2.bkvals := null;
dbms_stats.prepare_column_values(srec2,novals);
dbms_stats.set_column_stats(
ownname=>null,
tabname=>'t1',
colname=>'user_id',
distcnt=>distcnt,
nullcnt=>nullcnt,
srec=>srec2,
avgclen=>avgclen,
density=>density
);
end;
/

-- Invalidate any cursors using T1
comment on table t1 is '';

alter session set statistics_level = all;

alter session set tracefile_identifier = 'smm_trace_wrong_cardinality';

alter session set "_smm_trace" = 65535;

select /*+ full(@inner t1)
full(@inner t2)
full(@inner t3)
leading(@inner t2 t3 t1)
use_nl(@inner t3)
use_nl(@inner t1)
use_hash_aggregation(@inner)
*/
max(user_id)
from
(
select /*+ qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id
from t1, t2, t2 t3
where t1.user_id = t2.user_id
)
;

alter session set "_smm_trace" = 0;

set pagesize 0 linesize 200 trimspool on tab off

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

Obviously it is very similar to the HASH GROUP BY test case, this time however I've used a DISTINCT clause and replaced the HASH JOIN with a NESTED LOOP which makes the generated trace files easier to read, since there is exactly one workarea involved in this execution.

And this is what I get from 11.1.0.7, this time using AMM, and therefore the PGA_AGGREGAT_TARGET has been set to 0 (You'll get the same result from 10.2.0.4 with a corresponding P_A_T setting):

Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> show parameter pga

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 0
SQL> show parameter processes

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 0
db_writer_processes integer 1
gcs_server_processes integer 0
global_txn_processes integer 1
job_queue_processes integer 1000
log_archive_max_processes integer 4
processes integer 150
SQL>
SQL> -- alter session set workarea_size_policy = manual sort_area_size = 40000000;
SQL>
SQL> drop table t1 purge;

Table dropped.

Elapsed: 00:00:00.01
SQL>
SQL> drop table t2 purge;

Table dropped.

Elapsed: 00:00:00.00
SQL>
SQL> create table t1
2 as
3 select mod(rownum, 27) as user_id, rownum as object_id from dual connect by level <= 30000;

Table created.

Elapsed: 00:00:00.04
SQL>
SQL> create table t2
2 as
3 select distinct user_id from t1;

Table created.

Elapsed: 00:00:00.04
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.06
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't2', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>
SQL> column low_val new_value low_value
SQL> column high_val new_value high_value
SQL>
SQL> select
2 max(user_id) + 1 as low_val
3 , max(user_id) + max(user_id) - min(user_id) + 1 as high_val
4 from
5 t2;

LOW_VAL HIGH_VAL
---------- ----------
27 53

Elapsed: 00:00:00.00
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set tracefile_identifier = 'smm_trace_correct_cardinality';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set "_smm_trace" = 65535;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_nl(@inner t1)
7 use_hash_aggregation(@inner)
8 */
9 max(user_id)
10 from
11 (
12 select /*+ qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id
13 from t1, t2, t2 t3
14 where t1.user_id = t2.user_id
15 )
16 ;

MAX(USER_ID)
------------
26

Elapsed: 00:00:01.79
SQL>
SQL> alter session set "_smm_trace" = 0;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID 7cnqz02uwcs1a, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2)
full(@inner t3) leading(@inner t2 t3 t1)
use_nl(@inner t3) use_nl(@inner t1)
use_hash_aggregation(@inner) */ max(user_id) from ( select /*+
qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id from t1,
t2, t2 t3 where t1.user_id = t2.user_id )

Plan hash value: 1541846686

------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.78 | 40179 | 1470 | 1470 | | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.78 | 40179 | 1470 | 1470 | | | | |
| 2 | VIEW | | 1 | 572K| 810K|00:00:00.97 | 40179 | 1470 | 1470 | | | | |
| 3 | HASH UNIQUE | | 1 | 572K| 810K|00:00:00.97 | 40179 | 1470 | 1470 | 25M| 3296K| 6029K (1)| 13312 |
| 4 | NESTED LOOPS | | 1 | 810K| 810K|00:00:00.01 | 40179 | 0 | 0 | | | | |
| 5 | NESTED LOOPS | | 1 | 729 | 729 |00:00:00.01 | 84 | 0 | 0 | | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 27 | 27 |00:00:00.01 | 3 | 0 | 0 | | | | |
| 7 | TABLE ACCESS FULL| T2 | 27 | 27 | 729 |00:00:00.01 | 81 | 0 | 0 | | | | |
|* 8 | TABLE ACCESS FULL | T1 | 729 | 1111 | 810K|00:00:00.03 | 40095 | 0 | 0 | | | | |
------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - filter("T1"."USER_ID"="T2"."USER_ID")

30 rows selected.

Elapsed: 00:00:00.03
SQL>
SQL> set pagesize 14
SQL>
SQL> -- Fudge the statistics so that the join cardinality between t1 and t2 will be 0 (rounded up to 1)
SQL> declare
2 srec dbms_stats.statrec;
3 novals dbms_stats.numarray;
4 distcnt number;
5 avgclen number;
6 nullcnt number;
7 density number;
8 srec2 dbms_stats.statrec;
9 begin
10 dbms_stats.get_column_stats(null, 't1', 'user_id', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
11 srec2.epc := 2;
12 novals := dbms_stats.numarray(
13 &low_value,
14 &high_value
15 );
16 srec2.bkvals := null;
17 dbms_stats.prepare_column_values(srec2,novals);
18 dbms_stats.set_column_stats(
19 ownname=>null,
20 tabname=>'t1',
21 colname=>'user_id',
22 distcnt=>distcnt,
23 nullcnt=>nullcnt,
24 srec=>srec2,
25 avgclen=>avgclen,
26 density=>density
27 );
28 end;
29 /
old 13: &low_value,
new 13: 27,
old 14: &high_value
new 14: 53

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set tracefile_identifier = 'smm_trace_wrong_cardinality';

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter session set "_smm_trace" = 65535;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_nl(@inner t1)
7 use_hash_aggregation(@inner)
8 */
9 max(user_id)
10 from
11 (
12 select /*+ qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id
13 from t1, t2, t2 t3
14 where t1.user_id = t2.user_id
15 )
16 ;

MAX(USER_ID)
------------
26

Elapsed: 00:00:01.26
SQL>
SQL> alter session set "_smm_trace" = 0;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID 7cnqz02uwcs1a, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2)
full(@inner t3) leading(@inner t2 t3 t1)
use_nl(@inner t3) use_nl(@inner t1)
use_hash_aggregation(@inner) */ max(user_id) from ( select /*+
qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id from t1,
t2, t2 t3 where t1.user_id = t2.user_id )

Plan hash value: 1541846686

------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:01.26 | 40179 | 1470 | 1470 | | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.26 | 40179 | 1470 | 1470 | | | | |
| 2 | VIEW | | 1 | 1 | 810K|00:00:00.98 | 40179 | 1470 | 1470 | | | | |
| 3 | HASH UNIQUE | | 1 | 1 | 810K|00:00:00.98 | 40179 | 1470 | 1470 | 25M| 3296K| 6093K (1)| 13312 |
| 4 | NESTED LOOPS | | 1 | 1 | 810K|00:00:00.01 | 40179 | 0 | 0 | | | | |
| 5 | NESTED LOOPS | | 1 | 729 | 729 |00:00:00.01 | 84 | 0 | 0 | | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 27 | 27 |00:00:00.01 | 3 | 0 | 0 | | | | |
| 7 | TABLE ACCESS FULL| T2 | 27 | 27 | 729 |00:00:00.01 | 81 | 0 | 0 | | | | |
|* 8 | TABLE ACCESS FULL | T1 | 729 | 1 | 810K|00:00:00.01 | 40095 | 0 | 0 | | | | |
------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - filter("T1"."USER_ID"="T2"."USER_ID")

30 rows selected.

Elapsed: 00:00:00.03
SQL>

As you can see no matter what the estimates looks like, on this system the first execution of a HASH UNIQUE will not get more than 6M - subsequent executions of the same cursor benefit from the statistics from the previous run as seen before.

Again, switching to a SORT UNIQUE and reducing the data set accordingly, the problem can not be reproduced:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> show parameter pga

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 0
SQL> show parameter processes

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 0
db_writer_processes integer 1
gcs_server_processes integer 0
global_txn_processes integer 1
job_queue_processes integer 1000
log_archive_max_processes integer 4
processes integer 150
SQL>
SQL> drop table t1 purge;

Table dropped.

Elapsed: 00:00:00.15
SQL>
SQL> drop table t2 purge;

Table dropped.

Elapsed: 00:00:00.07
SQL>
SQL> create table t1
2 as
3 select mod(rownum, 15) as user_id, rownum as object_id from dual connect by level <= 30000;

Table created.

Elapsed: 00:00:00.06
SQL>
SQL> create table t2
2 as
3 select distinct user_id from t1;

Table created.

Elapsed: 00:00:00.04
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.04
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 't2', method_opt => 'for all columns size 1')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL>
SQL> column low_val new_value low_value
SQL> column high_val new_value high_value
SQL>
SQL> select
2 max(user_id) + 1 as low_val
3 , max(user_id) + max(user_id) - min(user_id) + 1 as high_val
4 from
5 t2;

LOW_VAL HIGH_VAL
---------- ----------
15 29

Elapsed: 00:00:00.00
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set tracefile_identifier = 'smm_trace_correct_cardinality';

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set "_smm_trace" = 65535;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_nl(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 no_use_hash_aggregation(@inner)
9 */
10 max(user_id)
11 from
12 (
13 select /*+ qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 )
17 ;

MAX(USER_ID)
------------
14

Elapsed: 00:00:00.73
SQL>
SQL> alter session set "_smm_trace" = 0;

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID ckb2sbz3y2z14, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2)
full(@inner t3) leading(@inner t2 t3 t1)
use_nl(@inner t3) use_nl(@inner t1)
no_swap_join_inputs(@inner t1)
no_use_hash_aggregation(@inner) */ max(user_id) from ( select
/*+ qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id from
t1, t2, t2 t3 where t1.user_id = t2.user_id )

Plan hash value: 3828303002

--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.71 | 12423 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.71 | 12423 | | | |
| 2 | VIEW | | 1 | 318K| 450K|00:00:00.62 | 12423 | | | |
| 3 | SORT UNIQUE | | 1 | 318K| 450K|00:00:00.62 | 12423 | 22M| 1744K| 20M (0)|
| 4 | NESTED LOOPS | | 1 | 450K| 450K|00:00:00.01 | 12423 | | | |
| 5 | NESTED LOOPS | | 1 | 225 | 225 |00:00:00.01 | 48 | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 15 | 15 |00:00:00.01 | 3 | | | |
| 7 | TABLE ACCESS FULL| T2 | 15 | 15 | 225 |00:00:00.01 | 45 | | | |
|* 8 | TABLE ACCESS FULL | T1 | 225 | 2000 | 450K|00:00:00.01 | 12375 | | | |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - filter("T1"."USER_ID"="T2"."USER_ID")

31 rows selected.

Elapsed: 00:00:00.06
SQL>
SQL> set pagesize 14
SQL>
SQL> -- Fudge the statistics so that the join cardinality between t1 and t2 will be 0 (rounded up to 1)
SQL> declare
2 srec dbms_stats.statrec;
3 novals dbms_stats.numarray;
4 distcnt number;
5 avgclen number;
6 nullcnt number;
7 density number;
8 srec2 dbms_stats.statrec;
9 begin
10 dbms_stats.get_column_stats(null, 't1', 'user_id', distcnt => distcnt, avgclen => avgclen, nullcnt => nullcnt, density => density, srec => srec);
11 srec2.epc := 2;
12 novals := dbms_stats.numarray(
13 &low_value,
14 &high_value
15 );
16 srec2.bkvals := null;
17 dbms_stats.prepare_column_values(srec2,novals);
18 dbms_stats.set_column_stats(
19 ownname=>null,
20 tabname=>'t1',
21 colname=>'user_id',
22 distcnt=>distcnt,
23 nullcnt=>nullcnt,
24 srec=>srec2,
25 avgclen=>avgclen,
26 density=>density
27 );
28 end;
29 /
old 13: &low_value,
new 13: 15,
old 14: &high_value
new 14: 29

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.06
SQL>
SQL> -- Invalidate any cursors using T1
SQL> comment on table t1 is '';

Comment created.

Elapsed: 00:00:00.01
SQL>
SQL> alter session set statistics_level = all;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> alter session set tracefile_identifier = 'smm_trace_wrong_cardinality';

Session altered.

Elapsed: 00:00:00.01
SQL>
SQL> alter session set "_smm_trace" = 65535;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> select /*+ full(@inner t1)
2 full(@inner t2)
3 full(@inner t3)
4 leading(@inner t2 t3 t1)
5 use_nl(@inner t3)
6 use_nl(@inner t1)
7 no_swap_join_inputs(@inner t1)
8 no_use_hash_aggregation(@inner)
9 */
10 max(user_id)
11 from
12 (
13 select /*+ qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id
14 from t1, t2, t2 t3
15 where t1.user_id = t2.user_id
16 )
17 ;

MAX(USER_ID)
------------
14

Elapsed: 00:00:00.74
SQL>
SQL> alter session set "_smm_trace" = 0;

Session altered.

Elapsed: 00:00:00.00
SQL>
SQL> set pagesize 0 linesize 200 trimspool on tab off
SQL>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
SQL_ID ckb2sbz3y2z14, child number 0
-------------------------------------
select /*+ full(@inner t1) full(@inner t2)
full(@inner t3) leading(@inner t2 t3 t1)
use_nl(@inner t3) use_nl(@inner t1)
no_swap_join_inputs(@inner t1)
no_use_hash_aggregation(@inner) */ max(user_id) from ( select
/*+ qb_name(inner) no_merge */ distinct t1.object_id, t3.user_id from
t1, t2, t2 t3 where t1.user_id = t2.user_id )

Plan hash value: 3828303002

--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.73 | 12423 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.73 | 12423 | | | |
| 2 | VIEW | | 1 | 1 | 450K|00:00:00.61 | 12423 | | | |
| 3 | SORT UNIQUE | | 1 | 1 | 450K|00:00:00.61 | 12423 | 22M| 1744K| 20M (0)|
| 4 | NESTED LOOPS | | 1 | 1 | 450K|00:00:00.01 | 12423 | | | |
| 5 | NESTED LOOPS | | 1 | 225 | 225 |00:00:00.01 | 48 | | | |
| 6 | TABLE ACCESS FULL| T2 | 1 | 15 | 15 |00:00:00.01 | 3 | | | |
| 7 | TABLE ACCESS FULL| T2 | 15 | 15 | 225 |00:00:00.01 | 45 | | | |
|* 8 | TABLE ACCESS FULL | T1 | 225 | 1 | 450K|00:00:00.01 | 12375 | | | |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - filter("T1"."USER_ID"="T2"."USER_ID")

31 rows selected.

Elapsed: 00:00:00.01
SQL>
SQL> set doc off
SQL> doc
SQL>

In 10.2.0.5 and 11.2.0.1/11.2.0.2 this issue is apparently fixed and the cardinality estimates are used for an initial workarea size calculation - however this doesn't mean that the dynamic resize problem is fixed - it simply means that the HASH UNIQUE in those versions behaves exactly as the HASH GROUP BY and the memory usage for the first execution depends on the cardinality estimates.

Summary

If your database uses automatic PGA management then for the hash aggregation HASH GROUP BY / HASH UNIQUE operations every initial execution that is based on bad cardinality estimates potentially has a problem because it might not make efficient use of the available memory.

The same applies to applications that share cursors, however in that case only the initial execution after re-optimization / invalidation is affected, subsequent executions benefit from the workarea statistics feedback mechanism.

Furthermore in 10g versions below 10.2.0.5 and 11g versions below 11.2.0.1 the initial execution of a HASH UNIQUE operation ignores the cardinality estimates and will always be based on minimum assumptions.

So you might want to carefully check the runtime execution statistics of your critical hash aggregations.

Possible Workarounds

Different strategies are available as workarounds, depending on the situation:

- Upgrading to versions where the HASH UNIQUE operation at least considers the optimizer estimates might be beneficial

- Obviously good cardinality estimates are crucial, in that case and when using the correct Oracle versions you should be fine. Using the undocumented OPT_ESTIMATE hint (or the deprecated undocumented CARDINALITY hint) might help in cases where other options like manually crafted statistics are not able to help the optimizer to come up with reasonable cardinality estimates.

- Applications that are able to share cursors might not be too much affected due to the ability to use the workarea executions for subsequent executions of the same cursor

- The described problems disappear when switching to manual workarea size policy and allowing for sufficient memory of workarea. Interestingly the less obvious SORT_AREA_SIZE is used for the manual control of the HASH GROUP BY operation, and not the HASH_AREA_SIZE. Of course using manual PGA memory management system-wide is highly unrecommended, so this might only be a workaround in certain cases of large batch jobs where manual workarea sizes are used anyway. Also note, when switching to manual workareas be aware of a nasty bug that was introduced in the 10.2.0.3 patchset. For more information see MOS note "6053134.8: ALTER SESSION to set SORT_AREA_SIZE no honoured" and for example mine and Jonathen Lewis' post about the issue. According to the MOS note the bug has been fixed in 10.2.0.4.3, 10.2.0.5 and 11.1.0.7 / 11.2

- Obviously the hash aggregation can be avoided on statement level using the NO_USE_HASH_AGGREGATION hint or on session / system level using the _GBY_HASH_AGGREGATION_ENABLED parameter. Since the sort based aggregation can be less efficient (but note that it doesn't have to be, it depends on the individual grouping and data pattern) again this doesn't necessarily solve the problem since aggregate workareas might spill to disk which wouldn't be necessary when hash aggregation was used and worked as expected.

Final Note

Of course there are a lot of things that I haven't touched yet or only briefly, like Parallel Execution of the hash aggregations, workareas used for index sorts, and all the other details when the workarea spills to disk.

If you want to get an idea what kind of nasty things can happen in that case you might want to read Jonathan Lewis' post on analytic functions.

I also haven't tried yet to fiddle around with the undocumented _smm* related parameters if setting them to non-default values allows to work around the issue.

And of course then there are bugs like this one: Bug 6817844 - Multi pass sort with auto memory management even with plenty of PGA [ID 6817844.8], which is also mentioned in Jonathan's post.

As a final note, the odd TEMP tablespace I/O pattern issue that Jonathan describes in his post (Bug 9041800) is marked as fixed in the 11.2.0.2 patch set, but I haven't tested this yet.

Exadata V3 – Oops – EXADATA X2-8

Oracle had the new version of the Exadata Machine on display at Oracle Open World this week. It’s called the Exadata X2-8. That’s a catchy name. It sounds very Iron Manish! In fact they had these fellows on display next to the demo machines.


The X2-8 uses two 4U Sun Fire x4800 servers which each have 8 x eight-core intel CPUs (X7560) and 1 Terabyte of memory along with 14 Exadata Storage Servers. Here’s a link to the spec sheet for the X2-8 and for the Oracle Sun X4800. Below are a couple of pictures. The first one shows one of the database servers (X4800) with one of the CPU modules out.

The storage cells have not changed much from the original V2. They still have 2 CPUs and 12 drives and 384G of flash cache. Although I’ve been told they have newer (faster) 6 core Intel CPUs. (I did get a look at an unpublished spec sheet on an Oracle employee’s iPhone and it said they were using the 5670 CPUs) Oh, and they will allow you to run Solaris on the database servers. Of course they will have to finish a new version of Solaris 11 before that can happen. It’s worth noting that X2-8 can be ordered but that they don’t have a firm delivery date yet.

So this configuration is definitely for the high end market and addresses a couple of issues. The increased memory will allow us to have a more robust consolidation platform. It will also allow bigger OLTP type systems to run better (i.e. the additional memory means we can support many more concurrent users and have much larger buffer caches). Note that Exadata’s offloading generally reduces the memory requirements, but nevertheless, very large systems, particularly ones with lot’s of fast OLTP type transactions and lot’s of users will be better satisfied by this type of configuration. Also note that there is no little brother version of the X2-8. It comes in a full rack only. Which makes sense because there are only two database machines. I don’t believe the price has been set yet, but the word on the street is that the hardware will be about 50% more than the full rack version with the small 2 CPU blades (now renamed X2-2 by the way).

I did a post a couple of weeks ago (Thoughts on Exadata V3) about what I thought we might get to see in the way of Exadata changes. We got most of the things I was expecting but not all of them. Among the things we got are bigger/beefier servers with more memory and available slots for HBAs to provide some additional connection capabilities (although I’m not sure if Oracle is going to want people to open the machines up and put additional controllers in). I did see mention in the x4800 spec sheet of an HBA so they may actually have one in there already (I need to check that out). They also announced that they will be offering a version of Solaris that can run on the database servers which I was expecting, although they are still using Intel chips. The thing I was expecting that didn’t happen was a change of mind set about flexibility of configuration. They seem pretty set on maintaining a fixed configuration that will be the same for every customer. That is probably not such a bad idea. Certainly it’s easier to support and faster to deploy. But you know how customers are, they want their hamburgers with extra mustard and no pickles. So we’ll see how that works out over time. But for now, it’s a fixed menu. To quote Henry Ford, “You can get it in any color you want as long as it’s black”.

So that’s all I can think of at this point. Please let me know if you have any questions and I’ll see what I can find out.

Do Storage Indexes Work with Bind Variables?

I saw a post today where the subject of Exadata Storage Indexes were being discussed. One of the things that caught my eye was a discussion of whether Storage Indexes worked with Bind Variables. One of the posters observed that since smart scan was aimed at data warehouse type queries, bind variables were pretty much irrelevant. Which is true. Still it’s an interesting question. So I thought I’d give it a quick test.

As usual I used a couple of scripts:

fsx.sql – queries v$sql and shows whether a statement has been offloaded or not (slightly modified to remove 2 columns)
mystats.sql – just queries v$mystat

We’ll look at a test with a number column first.

SYS@LABRAT1> -- Do SI's work with bind variables? - Yes
SYS@LABRAT1>
SYS@LABRAT1> -- first here's basic info on my test table (SKEW3) 
SYS@LABRAT1>
SYS@LABRAT1> desc kso.skew3
 Name                                                                                   Null?    Type
 -------------------------------------------------------------------------------------- -------- ----------------------------------------------------------
 PK_COL                                                                                          NUMBER
 COL1                                                                                            NUMBER
 COL2                                                                                            VARCHAR2(30)
 COL3                                                                                            DATE
 COL4                                                                                            VARCHAR2(1)
 
SYS@LABRAT1> select count(*) from kso.skew3;
 
  COUNT(*)
----------
 384000048
 
1 row selected.
 
Elapsed: 00:00:26.53
SYS@LABRAT1>  -- 27 seconds to do a full scan with no where clause (there are no indexes)
SYS@LABRAT1> @mystats
Enter value for name: storage
 
NAME                                                                             VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index                                        0
 
SYS@LABRAT1> -- no Storage Index usage by this session yet
SYS@LABRAT1> -- let's try a query using a variable
SYS@LABRAT1> set echo on
SYS@LABRAT1> @test_bv_si
SYS@LABRAT1> 
SYS@LABRAT1> variable X NUMBER
SYS@LABRAT1> 
SYS@LABRAT1> begin
  2  
  3  :X := -1;
  4  
  5  end;
  6  
  7  /
 
PL/SQL procedure successfully completed.
 
SYS@LABRAT1> 
SYS@LABRAT1> select count(*) from kso.skew3 where col1 = :x;
 
  COUNT(*)
----------
         0
 
Elapsed: 00:00:00.08
 
SYS@LABRAT1> set echo off
SYS@LABRAT1> @mystats
Enter value for name: storage
 
NAME                                                                             VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index                              16025346048
 
SYS@LABRAT1> -- so it used the storage index
SYS@LABRAT1> @fsx
Enter value for sql_text: select count(*) from kso.skew3 where col1 = :x 
Enter value for sql_id: 
Enter value for inst_id: 
 
 INST SQL_ID         CHILD  PLAN_HASH      EXECS     AVG_ETIME AVG_PX OFFLOADABLE IO_SAVED_% SQL_TEXT
----- ------------- ------ ---------- ---------- ------------- ------ ----------- ---------- --------------------------------------------------
    1 1nsxv1zpawmsa      0 2684249835          2           .08      0 Yes             100.00 select count(*) from kso.skew3 where col1 = :x
 
1 row selected.
 
SYS@LABRAT1> @dplan
Enter value for sql_id: 1nsxv1zpawmsa
Enter value for child_no: 
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  1nsxv1zpawmsa, child number 0
-------------------------------------
select count(*) from kso.skew3 where col1 = :x
 
Plan hash value: 2684249835
 
------------------------------------------------------------------------------------
| Id  | Operation                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |       |       |       |   533K(100)|          |
|   1 |  SORT AGGREGATE            |       |     1 |     5 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| SKEW3 |   385 |  1925 |   533K  (1)| 01:46:43 |
------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - storage("COL1"=:X)
       filter("COL1"=:X)
 
 
20 rows selected.

So the Storage Index was clearly used on this statement using a SQL*Plus number variable. Here’s some 10046 trace data to show that smart scan wait event was used – note also the “enq: KO – fast object checkpoint” wait event which is done before the direct path reads (replaced by the “cell smart table scan” event in Exadata land).

 
...
 
PARSING IN CURSOR #2 len=46 dep=0 uid=0 oct=3 lid=0 tim=1284254192882293 hv=3937292042 ad='76742aa20' sqlid='1nsxv1zpawmsa'
select count(*) from kso.skew3 where col1 = :x
END OF STMT
PARSE #2:c=1000,e=299,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=1284254192882292
WAIT #2: nam='ges message buffer allocation' ela= 5 pool=0 request=1 allocated=0 obj#=73486 tim=1284254192882398
WAIT #2: nam='ges message buffer allocation' ela= 1 pool=0 request=1 allocated=0 obj#=73486 tim=1284254192882442
WAIT #2: nam='library cache lock' ela= 228 handle address=31804186896 lock address=31727714984 100*mode+namespace=315619966779394 obj#=73486 tim=1284254192882696
WAIT #2: nam='ges message buffer allocation' ela= 2 pool=0 request=1 allocated=0 obj#=73486 tim=1284254192882741
WAIT #2: nam='ges message buffer allocation' ela= 1 pool=0 request=1 allocated=0 obj#=73486 tim=1284254192882764
WAIT #2: nam='library cache pin' ela= 176 handle address=31804186896 pin address=31727714728 100*mode+namespace=315619966779394 obj#=73486 tim=1284254192882963
EXEC #2:c=1000,e=1611,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=2684249835,tim=1284254192883951
WAIT #2: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=73486 tim=1284254192883982
WAIT #2: nam='ges message buffer allocation' ela= 2 pool=0 request=1 allocated=0 obj#=73486 tim=1284254192884289
WAIT #2: nam='ges message buffer allocation' ela= 1 pool=0 request=1 allocated=0 obj#=73486 tim=1284254192884322
WAIT #2: nam='enq: KO - fast object checkpoint' ela= 100 name|mode=1263468550 2=65584 0=1 obj#=73486 tim=1284254192884443
WAIT #2: nam='reliable message' ela= 1287 channel context=31898270672 channel handle=31492015160 broadcast message=31556682800 obj#=73486 tim=1284254192885850
WAIT #2: nam='ges message buffer allocation' ela= 1 pool=0 request=1 allocated=0 obj#=73486 tim=1284254192885899
WAIT #2: nam='enq: KO - fast object checkpoint' ela= 142 name|mode=1263468550 2=65584 0=1 obj#=73486 tim=1284254192886063
WAIT #2: nam='ges message buffer allocation' ela= 1 pool=0 request=1 allocated=0 obj#=73486 tim=1284254192886118
WAIT #2: nam='enq: KO - fast object checkpoint' ela= 117 name|mode=1263468545 2=65584 0=2 obj#=73486 tim=1284254192886271
WAIT #2: nam='cell smart table scan' ela= 240 cellhash#=379339958 p2=0 p3=0 obj#=73486 tim=1284254192887559
WAIT #2: nam='cell smart table scan' ela= 222 cellhash#=2133459483 p2=0 p3=0 obj#=73486 tim=1284254192888038
WAIT #2: nam='cell smart table scan' ela= 212 cellhash#=3176594409 p2=0 p3=0 obj#=73486 tim=1284254192888531
WAIT #2: nam='cell smart table scan' ela= 1038 cellhash#=379339958 p2=0 p3=0 obj#=73486 tim=1284254192894795
WAIT #2: nam='cell smart table scan' ela= 1061 cellhash#=2133459483 p2=0 p3=0 obj#=73486 tim=1284254192895927
WAIT #2: nam='cell smart table scan' ela= 962 cellhash#=3176594409 p2=0 p3=0 obj#=73486 tim=1284254192896956
WAIT #2: nam='cell smart table scan' ela= 1121 cellhash#=379339958 p2=0 p3=0 obj#=73486 tim=12842541928982088
 
...

So it looks like a definite yes for offloading with bind variables and using Storage Indexes with bind variables. At least with numeric variables. Now let’s check out a varchar2 column and while we’re at it let’s check wild carding using % and the LIKE operator.

SYS@LABRAT1> select col2, count(*) from kso.skew3 group by col2;
 
COL2                               COUNT(*)
------------------------------ ------------
                                         12
2342                                     36
asddsadasd                        384000000
 
3 rows selected.
 
Elapsed: 00:00:41.90
SYS@LABRAT1> select count(*) from kso.skew3 where col2 like '2342';
 
    COUNT(*)
------------
          36
 
1 row selected.
 
Elapsed: 00:00:00.10
SYS@LABRAT1> -- you should guess from the elapsed time this one used the Storage Index (it did)
SYS@LABRAT1>
SYS@LABRAT1> select count(*) from kso.skew3 where col2 like '234%';
 
    COUNT(*)
------------
          36
 
1 row selected.
 
Elapsed: 00:00:14.98
SYS@LABRAT1> -- and you should guess from the elapsed time that this one didn't (it didn't)
SYS@LABRAT1> -- so wildcards are not good for Storage Indexes
SYS@LABRAT1> -- let's try varchar2 variables now
SYS@LABRAT1>
SYS@LABRAT1> @test_bv_si
SYS@LABRAT1> 
SYS@LABRAT1> variable W varchar2(10)
SYS@LABRAT1> variable X varchar2(10)
SYS@LABRAT1> variable Y varchar2(10)
SYS@LABRAT1> variable Z varchar2(10)
SYS@LABRAT1> 
SYS@LABRAT1> begin
  2  
  3  :X := '1111';
  4  :Y := '2342';
  5  :Z := '234%';
  6  
  7  end;
  8  
  9  /
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:00.00
SYS@LABRAT1> 
SYS@LABRAT1> select /* 1111 */ count(*) from kso.skew3 where col1 = :x;
 
  COUNT(*)
----------
         0
 
Elapsed: 00:00:15.25
SYS@LABRAT1> @mystats
Enter value for name: storage
 
NAME                                               VALUE
--------------------------------------------- ----------
cell physical IO bytes saved by storage index          0
 
Elapsed: 00:00:00.00
SYS@LABRAT1> -- no joy - looks like this one should use Storage Index, why not???
SYS@LABRAT1>
SYS@LABRAT1> select /* 2342 */ count(*) from kso.skew3 where col2 = :y;
 
  COUNT(*)
----------
        36
 
Elapsed: 00:00:00.10
SYS@LABRAT1> @mystats
Enter value for name: storage
 
NAME                                               VALUE
--------------------------------------------- ----------
cell physical IO bytes saved by storage index 1.6000E+10
 
Elapsed: 00:00:00.00
SYS@LABRAT1> --this one worked
SYS@LABRAT1>
SYS@LABRAT1> select /* 2342 */ count(*) from kso.skew3 where col2 like :y;
 
  COUNT(*)
----------
        36
 
Elapsed: 00:00:15.11
SYS@LABRAT1> @mystats
Enter value for name: storage
 
NAME                                               VALUE
--------------------------------------------- ----------
cell physical IO bytes saved by storage index 1.6000E+10
 
Elapsed: 00:00:00.00
SYS@LABRAT1> -- this one doesn't work due to the LIKE
SYS@LABRAT1>
SYS@LABRAT1> select /* 234% */ count(*) from kso.skew3 where col2 like :z;
 
  COUNT(*)
----------
        36
 
Elapsed: 00:00:15.19
SYS@LABRAT1> @mystats
Enter value for name: storage
 
NAME                                               VALUE
--------------------------------------------- ----------
cell physical IO bytes saved by storage index 1.6000E+10
 
Elapsed: 00:00:00.00
SYS@LABRAT1> -- this one also doesn't work due to the LIKE
SYS@LABRAT1> -- let's try rerunning the same statement but changing the values 
SYS@LABRAT1>
SYS@LABRAT1> exec :Y := '2342';
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:00.00
SYS@LABRAT1> select /* various */ count(*) from kso.skew3 where col2 = :y;
 
  COUNT(*)
----------
        36
 
Elapsed: 00:00:00.08
SYS@LABRAT1> @mystats
Enter value for name: storage
 
NAME                                               VALUE
--------------------------------------------- ----------
cell physical IO bytes saved by storage index 3.2000E+10
 
Elapsed: 00:00:00.01
SYS@LABRAT1> -- used the Storage Index as expected
SYS@LABRAT1>
SYS@LABRAT1> exec :Y := '1111';
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:00.00
SYS@LABRAT1> select /* various */ count(*) from kso.skew3 where col2 = :y;
 
  COUNT(*)
----------
         0
 
Elapsed: 00:00:00.07
SYS@LABRAT1> @mystats
Enter value for name: storage
 
NAME                                               VALUE
--------------------------------------------- ----------
cell physical IO bytes saved by storage index 4.8026E+10
 
Elapsed: 00:00:00.00
SYS@LABRAT1> -- This is odd - it used the Storage Index again, even though our original test with '1111' didn't
SYS@LABRAT1> -- is this similar to bind variable peeking in that the statement has a locked in approach?
SYS@LABRAT1> -- I'm not sure.
SYS@LABRAT1>
SYS@LABRAT1> exec :Y := 'asddsadasd';
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:00.00
SYS@LABRAT1> select /* various */ count(*) from kso.skew3 where col2 = :y;
 
  COUNT(*)
----------
 384000000
 
Elapsed: 00:00:32.01
SYS@LABRAT1> @mystats
Enter value for name: storage
 
NAME                                               VALUE
--------------------------------------------- ----------
cell physical IO bytes saved by storage index 4.8026E+10
 
Elapsed: 00:00:00.00
SYS@LABRAT1> -- doesn't appeat that this one got any benefit, 
SYS@LABRAT1> -- but there may not be any blocks that don't contain that value 
SYS@LABRAT1> -- here the stats on the statements in this example
SYS@LABRAT1>
SYS@LABRAT1> set echo off
SYS@LABRAT1>@fsx
Enter value for sql_text: %skew3%
Enter value for sql_id: 
Enter value for inst_id: 
 
 INST SQL_ID         CHILD  PLAN_HASH      EXECS     AVG_ETIME AVG_PX OFFLOADABLE IO_SAVED_% SQL_TEXT
----- ------------- ------ ---------- ---------- ------------- ------ ----------- ---------- ----------------------------------------------------------------------
    1 14a1chcq10j8q      0 2684249835          1         15.23      0 Yes              99.99 select /* 1111 */ count(*) from kso.skew3 where col1 = :x
    1 367zpt07qh2d6      0 2684249835          1         15.17      0 Yes              99.99 select /* 234% */ count(*) from kso.skew3 where col2 like :z
    1 f834t319m48vw      0 2684249835          3         10.72      0 Yes              86.45 select /* various */ count(*) from kso.skew3 where col2 = :y
    1 ftrtpg2xcdp0t      0 2684249835          1         15.11      0 Yes              99.99 select /* 2342 */ count(*) from kso.skew3 where col2 like :y
    1 gcnvsm28bnu4p      0 2684249835          1           .09      0 Yes             100.00 select /* 2342 */ count(*) from kso.skew3 where col2 = :y
 
5 rows selected.
 
Elapsed: 00:00:00.05

So these results indicate the following:

  1. Storage Indexes Can be Used with Bind Variables on Varchar2 variables
  2. Storage Indexes Don’t Appear to be Used with any Wild Carding (%)
  3. Storage Indexes Aren’t Used with the Like Operator when Bind Variables are used

Now let’s take a quick look at how Storage Indexes work with date fields.

 
SYS@LABRAT1> flush_pool
 
System altered.
 
SYS@LABRAT1> select min(col3),max(col3) from kso.skew3;             
 
MIN(COL3) MAX(COL3)
--------- ---------
20-OCT-05 01-JAN-09
 
SYS@LABRAT1> @mystats
Enter value for name: storage
 
NAME                                                                             VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index                                        0
 
SYS@LABRAT1> set timing on
SYS@LABRAT1> select count(*) from kso.skew3 where col3 = '20-OCT-05';
 
  COUNT(*)
----------
         4
 
Elapsed: 00:00:15.13
SYS@LABRAT1> @mystats
Enter value for name: storage
 
NAME                                                                             VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index                                        0
 
Elapsed: 00:00:00.01
SYS@LABRAT1> -- so no Storage Index usage???
SYS@LABRAT1> 
SYS@LABRAT1> select count(*) from kso.skew3 where col3 < '19-OCT-05';
 
  COUNT(*)
----------
         0
 
Elapsed: 00:00:15.07
SYS@LABRAT1> -- still no Storage Index usage
SYS@LABRAT1> 
SYS@LABRAT1> @mystats
Enter value for name: storage
 
NAME                                                                             VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index                                        0
 
Elapsed: 00:00:00.00
SYS@LABRAT1> select count(*) from kso.skew3 where col3 > '01-jan-10';
 
  COUNT(*)
----------
         0
 
Elapsed: 00:00:15.09
SYS@LABRAT1> @mystats
Enter value for name: storage
 
NAME                                                                             VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index                                        0
 
Elapsed: 00:00:00.00
SYS@LABRAT1> -- still nothing
SYS@LABRAT1> 
SYS@LABRAT1> select count(*) from kso.skew3 where col3 is null;
 
  COUNT(*)
----------
        12
 
Elapsed: 00:00:00.08
SYS@LABRAT1> @mystats
Enter value for name: storage
 
NAME                                                                             VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index                              16012763136
 
Elapsed: 00:00:00.00
SYS@LABRAT1> -- so this time we used the Storage Index 
SYS@LABRAT1> -- why is it different?
SYS@LABRAT1> 
SYS@LABRAT1> @fsx
Enter value for sql_text: %skew3%
Enter value for sql_id: 
Enter value for inst_id: 
 
 INST SQL_ID         CHILD  PLAN_HASH      EXECS     AVG_ETIME AVG_PX OFFLOADABLE IO_SAVED_% SQL_TEXT
----- ------------- ------ ---------- ---------- ------------- ------ ----------- ---------- ----------------------------------------------------------------------
    1 2sycry6jd7cus      0 2684249835          1         15.06      0 Yes              99.99 select count(*) from kso.skew3 where col3 < '19-OCT-05'
    1 6n5y91cxw4yzu      0 2684249835          1           .08      0 Yes             100.00 select count(*) from kso.skew3 where col3 is null
    1 asfmw4ccsv2u9      0 2684249835          1         28.45      0 Yes              67.15 select min(col3),max(col3) from kso.skew3
    1 fuhmg9hqdbd84      0 2684249835          1         15.12      0 Yes              99.99 select count(*) from kso.skew3 where col3 = '20-OCT-05'
    1 gkbzsmx4w57ym      0 2684249835          1         15.09      0 Yes              99.99 select count(*) from kso.skew3 where col3 > '01-jan-10'
 
5 rows selected.
 
Elapsed: 00:00:00.06
SYS@LABRAT1> @dplan
Enter value for sql_id: fuhmg9hqdbd84
Enter value for child_no: 
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  fuhmg9hqdbd84, child number 0
-------------------------------------
select count(*) from kso.skew3 where col3 = '20-OCT-05'
 
Plan hash value: 2684249835
 
------------------------------------------------------------------------------------
| Id  | Operation                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |       |       |       |   537K(100)|          |
|   1 |  SORT AGGREGATE            |       |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| SKEW3 |   383 |  3064 |   537K  (2)| 01:47:25 |
------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - storage("COL3"='20-OCT-05')
       filter("COL3"='20-OCT-05')
 
 
20 rows selected.
 
Elapsed: 00:00:00.09
SYS@LABRAT1> @dplan
Enter value for sql_id: 6n5y91cxw4yzu
Enter value for child_no: 
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  6n5y91cxw4yzu, child number 0
-------------------------------------
select count(*) from kso.skew3 where col3 is null
 
Plan hash value: 2684249835
 
------------------------------------------------------------------------------------
| Id  | Operation                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |       |       |       |   533K(100)|          |
|   1 |  SORT AGGREGATE            |       |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| SKEW3 |    12 |    96 |   533K  (1)| 01:46:44 |
------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - storage("COL3" IS NULL)
       filter("COL3" IS NULL)
 
 
20 rows selected.
 
Elapsed: 00:00:00.02
SYS@LABRAT1> -- I wonder if the date format is disabling the Storage Index
SYS@LABRAT1> 
SYS@LABRAT1> 
SYS@LABRAT1> select count(*) from kso.skew3 where col3 = '20-OCT-2005';
 
  COUNT(*)
----------
         4
 
1 row selected.
 
Elapsed: 00:00:00.08
SYS@LABRAT1> -- ha, that did it!
SYS@LABRAT1> 
SYS@LABRAT1> @fsx
Enter value for sql_text: %skew3%
Enter value for sql_id: 
Enter value for inst_id: 
 
 
 INST SQL_ID         CHILD  PLAN_HASH      EXECS     AVG_ETIME AVG_PX OFFLOADABLE IO_SAVED_% SQL_TEXT
----- ------------- ------ ---------- ---------- ------------- ------ ----------- ---------- ----------------------------------------------------------------------
    1 2s58n6d3mzkmn      0 2684249835          1           .07      0 Yes             100.00 select count(*) from kso.skew3 where col3 = '20-OCT-2005'
    1 2sycry6jd7cus      0 2684249835          1         15.06      0 Yes              99.99 select count(*) from kso.skew3 where col3 < '19-OCT-05'
    1 6n5y91cxw4yzu      0 2684249835          1           .08      0 Yes             100.00 select count(*) from kso.skew3 where col3 is null
    1 asfmw4ccsv2u9      0 2684249835          1         28.45      0 Yes              67.15 select min(col3),max(col3) from kso.skew3
    1 fuhmg9hqdbd84      0 2684249835          1         15.12      0 Yes              99.99 select count(*) from kso.skew3 where col3 = '20-OCT-05'
    1 gkbzsmx4w57ym      0 2684249835          1         15.09      0 Yes              99.99 select count(*) from kso.skew3 where col3 > '01-jan-10'
 
6 rows selected.
 
Elapsed: 00:00:00.06
SYS@LABRAT1> @dplan
Enter value for sql_id: 2s58n6d3mzkmn
Enter value for child_no: 
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  2s58n6d3mzkmn, child number 0
-------------------------------------
select count(*) from kso.skew3 where col3 = '20-OCT-2005'
 
Plan hash value: 2684249835
 
------------------------------------------------------------------------------------
| Id  | Operation                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |       |       |       |   533K(100)|          |
|   1 |  SORT AGGREGATE            |       |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| SKEW3 |   383 |  3064 |   533K  (1)| 01:46:48 |
------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - storage("COL3"=TO_DATE(' 2005-10-20 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
       filter("COL3"=TO_DATE(' 2005-10-20 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
 
 
22 rows selected.
 
Elapsed: 00:00:00.03

So like regular B-Tree indexes, implicit conversion, functions applied to columns, etc … can disable Storage Indexes. Not too surprising. It’s also interesting is that the Storage Indexes on dates are a little persnickety. Looks like literals work fine (at least in SQL*Plus) as long as the full 4 digit year is specified. You can see the format that Oracle converts it to is ‘syyyy-mm-dd hh24:mi:ss’. I was unable to get Storage Indexes to work with date columns using SQL*Plus varchar2 variables though. Any one got any ideas?

Parallel Full Table Scans Do Not Always Perform Direct Reads

Even though in general parallel full table scans performs direct reads, some exceptions exist. The aim of this post is to show such an exception.
For test purposes I build in my own schema a copy of the SH.SALES table (the one distributed by Oracle with the demo schemas…). On that table I build an [...]

Partition-Wise Join of List-Partitioned Tables

When two tables are equi-partitioned on their join keys, the query optimizer is able to take advantage of partition-wise joins. To make sure that the tables are equi-partitioned, as of Oracle Database 11g reference partitioning can be used. In fact, per definition, with reference partitioning all “related” tables have exactly the same partitioning schema. If [...]

Compression Restrictions

When using table or index compression certain restrictions apply. Since I find it always hard to gather that information from the manuals and also some of the restrictions are not documented properly or not at all, I'll attempt to summarize the restrictions that I'm aware of here:

1. Compression attribute on subpartitions
Note that you can't define the COMPRESSION on subpartition level - the subpartitions inherit this attribute from the parent partition.

It is however perfectly valid and legal to exchange a subpartition with a compressed table and that way introduce individual compression on subpartition level. It is unclear why this is only possible using this indirect method - it simply looks like a implementation restriction. Since compressed data is most suitable for data warehouse environments and these ought to use EXCHANGE [SUB]PARTITION technology anyway this restriction is probably not that crucial.

2. Number of columns
Basic heap table compression and the new OLTP (in 11.1 called "for all operations") compression are both limited to 255 columns. If you attempt to compress a table with more than 255 columns then no compression will be performed although you don't get an error message. Interestingly the COMPRESSION column in the dictionary still shows "ENABLED" which is certainly misleading.

3. DDL Restrictions - modification of table structure
As soon as a segment contains compressed data or has been marked for compression (does not apply in all cases, see below for more information), the following restrictions regarding the modification of the table structure apply:

a. Basic heap table compression will not prevent the addition of more than 255 columns via ADD COLUMN - but if you update the segment the affected blocks will be silently decompressed during the update operation. In 9.2 no schema evolution is possible with compressed segments, see the next paragraph.

b. In Oracle 9.2 a lot of modifications to the table are no longer possible with compression enabled (and this applies to partitioned tables as well if at least one partition is compressed):

- You cannot add or drop any columns, and you even can't set any column unused. The error messages are to say at least confusing, since it tells you something about virtual and object columns, but in fact it is about the enabled compression. These are serious limitations in terms of schema evolution. I'll address below what can be done to overcome this issue.

Oracle9i Enterprise Edition Release 9.2.0.8.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.8.0 - Production

SQL> create table t_part
2 compress
3 partition by list (user_id) (
4 partition p_default values (default)
5 )
6 as
7 select * from all_users;

Table created.

SQL> alter table t_part set unused (created);
alter table t_part set unused (created)
*
ERROR at line 1:
ORA-12996: cannot drop system-generated virtual column

SQL> alter table t_part drop unused columns;

Table altered.

SQL> alter table t_part drop column username;
alter table t_part drop column username
*
ERROR at line 1:
ORA-12996: cannot drop system-generated virtual column

SQL> alter table t_part add test_col varchar2(10);
alter table t_part add test_col varchar2(10)
*
ERROR at line 1:
ORA-22856: cannot add columns to object tables

SQL> alter table t_part add test_col2 varchar2(10) default 'BLA';
alter table t_part add test_col2 varchar2(10) default 'BLA'
*
ERROR at line 1:
ORA-22856: cannot add columns to object tables

SQL>

c. In Oracle 10.2 some restrictions have been lifted but there are still a couple of limitations:

- You can now add columns and you can set columns unused but you can't drop columns nor can you add columns with default values

This allows schema evolution but has interesting side effects if you attempt to use for example a Create Table As Select (CTAS) statement to dynamically create a copy of a table and attempt to perform an EXCHANGE PARTITION operation with that newly created table: If there are unused columns in the source table then these won't be copied over to the table created via CTAS and therefore the EXCHANGE PARTITION operation will fail with column mismatch error messages. Since you can't drop the unused columns as long as there is compression enabled on the table (more on that later since it is a bit more complex with partitioned tables), you effectively can't use that approach. In order to perform EXCHANGE PARTITION with compressed tables and evolved schemas that include unused columns you have basically three choices:

- Maintain the "EXCHANGE" table along with the target table which means apply the same DDL in the same order to both tables in order to keep them synchronized

- Extract the hidden columns from the dictionary and create a exchange table with the correct hidden/unused columns which can turn into a complex operation if you need to cover all possible column data types and features including LOBs etc.

- Perform some kind of rebuild operation on the target table to get rid of the unused columns. Some ideas regarding this point are going to follow further below

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> create table t_part
2 compress
3 partition by list (user_id) (
4 partition p_default values (default)
5 )
6 as
7 select * from all_users;

Table created.

SQL> alter table t_part set unused (created);

Table altered.

SQL> alter table t_part drop unused columns;
alter table t_part drop unused columns
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

SQL> alter table t_part drop column username;
alter table t_part drop column username
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

SQL> alter table t_part add test_col varchar2(10);

Table altered.

SQL> alter table t_part add test_col2 varchar2(10) default 'BLA';
alter table t_part add test_col2 varchar2(10) default 'BLA'
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

SQL>

d. Oracle 11 introduces Advanced Compression which is COMPRESS FOR ALL OPERATIONS in 11.1 or OLTP compression in 11.2 (don't confuse this with HCC compression levels)

Basic compression in 11.1 has the same limitations as 10.2, but with advanced compression (called COMPRESS FOR ALL OPERATIONS in 11.1) all mentioned schema evolution restrictions have been lifted as it can be seen below.

Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> create table t_part
2 --compress
3 compress for all operations
4 partition by list (user_id) (
5 partition p_default values (default)
6 )
7 as
8 select * from all_users;

Table created.

SQL> alter table t_part set unused (created);

Table altered.

SQL> alter table t_part drop unused columns;

Table altered.

SQL> alter table t_part drop column username;

Table altered.

SQL> alter table t_part add test_col varchar2(10);

Table altered.

SQL> alter table t_part add test_col2 varchar2(10) default 'BLA';

Table altered.

SQL>

This is more of theoretical nature since you usually won't mix basic with advanced compression, bit things become more interesting when mixing basic and advanced compression in different partitions of the same table:

Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> create table t_part
2 compress
3 --compress for all operations
4 partition by list (user_id) (
5 partition p_default values (default)
6 )
7 as
8 select * from all_users;

Table created.

SQL> alter table t_part set unused (created);

Table altered.

SQL> alter table t_part drop unused columns;
alter table t_part drop unused columns
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

SQL> alter table t_part drop column username;
alter table t_part drop column username
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

SQL> alter table t_part add test_col varchar2(10);

Table altered.

SQL> alter table t_part add test_col2 varchar2(10) default 'BLA';
alter table t_part add test_col2 varchar2(10) default 'BLA'
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

SQL> alter table t_part split partition p_default values (123)
2 into (partition p_123 compress for all operations, partition p_default);

Table altered.

SQL> select
2 partition_name
3 , compression
4 , compress_for
5 from
6 user_tab_partitions
7 where
8 table_name = 'T_PART';

PARTITION_NAME COMPRESS COMPRESS_FOR
------------------------------ -------- ------------------
P_DEFAULT ENABLED DIRECT LOAD ONLY
P_123 ENABLED FOR ALL OPERATIONS

SQL> alter table t_part add test_col2 varchar2(10) default 'BLA';

Table altered.

SQL> alter table t_part drop unused columns;
alter table t_part drop unused columns
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

SQL> alter table t_part drop column username;
alter table t_part drop column username
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

SQL>

Notice the inconsistent behaviour: The first DDL command after having added a partition with advanced compression succeeds, but the following DDLs fail with an error message.

If the order of these DDLs is changed, then it becomes obvious that always the first DDL is successful no matter which of those failing otherwise comes first.

Furthermore in 11.1.0.7 if you have a mixture of Advanced Compression (COMPRESS FOR ALL OPERATIONS) and NOCOMPRESS partitions still above restrictions apply - only if you have set all partitions to COMPRESS FOR ALL OPERATIONS then the restrictions are lifted which looks more like a bug than a feature.

I'll go into more details below which settings influence this behaviour.

e. Oracle 11.2 introduces Hybrid Columnar Compression (HCC) which is only enabled in conjunction with ExaData

In 11.2 basic compression still has the same restrictions as already mentioned above, and if at least one partition of a partitioned table has advanced compression enabled then some of restrictions are lifted - even with a mixture of basic and advanced compression or a mixture of no compression and advanced compression, however adding a column with a default value is still only supported if all partitions are compressed with Advanced Compression. I'm not sure if this is really a feature since at least mixing Advanced Compression with no compression adding a column with default value should be possible.

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> create table t_part
2 compress
3 --compress for all operations
4 partition by list (user_id) (
5 partition p_default values (default)
6 )
7 as
8 select * from all_users;

Table created.

SQL> alter table t_part set unused (created);

Table altered.

SQL> alter table t_part drop unused columns;
alter table t_part drop unused columns
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

SQL> alter table t_part drop column username;
alter table t_part drop column username
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

SQL> alter table t_part add test_col varchar2(10);

Table altered.

SQL> alter table t_part add test_col2 varchar2(10) default 'BLA';
alter table t_part add test_col2 varchar2(10) default 'BLA'
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

SQL> alter table t_part split partition p_default values (123)
2 into (partition p_123 compress for all operations, partition p_default);

Table altered.

SQL> select
2 partition_name
3 , compression
4 , compress_for
5 from
6 user_tab_partitions
7 where
8 table_name = 'T_PART';

PARTITION_NAME COMPRESS COMPRESS_FOR
------------------------------ -------- ------------
P_123 ENABLED OLTP
P_DEFAULT ENABLED BASIC

SQL> alter table t_part add test_col2 varchar2(10) default 'BLA';
alter table t_part add test_col2 varchar2(10) default 'BLA'
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

SQL> alter table t_part drop unused columns;

Table altered.

SQL> alter table t_part drop column username;

Table altered.

SQL> alter table t_part move partition p_default compress for all operations;

Table altered.

SQL> select
2 partition_name
3 , compression
4 , compress_for
5 from
6 user_tab_partitions
7 where
8 table_name = 'T_PART';

PARTITION_NAME COMPRESS COMPRESS_FOR
------------------------------ -------- ------------
P_123 ENABLED OLTP
P_DEFAULT ENABLED OLTP

SQL> alter table t_part add test_col2 varchar2(10) default 'BLA';

Table altered.

SQL>

When using HCC the most important difference is that it is not restricted to 255 columns - it successfully compresses segments with any number of columns (up to the hard limit of 1,000 columns).

HCC can only be used with direct-path operations, any conventional DML will decompress the affected blocks and use OLTP compression instead - which means that in case the segment has more than 255 columns the compression will effectively be disabled since any non-HCC compression supports only up to 255 columns.

HCC has only a few restrictions regarding schema evolution - the most noticeable one is that DROP COLUMN is always turned into a SET COLUMN UNUSED, so with HCC enabled you can't drop columns since DROP UNUSED COLUMNS is not supported and raises an error. This means that dropping / adding columns works towards the hard limit of 1,000 columns with HCC enabled.

Again this has the same side effect as described above - a simple CTAS operation to create a table to be exchanged with won't work any longer with unused columns.

SQL> create table t_part
2 compress for query high
3 --compress for all operations
4 partition by list (user_id) (
5 partition p_default values (default)
6 )
7 as
8 select * from all_users;

Table created.

SQL> alter table t_part set unused (created);

Table altered.

SQL> alter table t_part drop unused columns;
alter table t_part drop unused columns
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

SQL> alter table t_part drop column username;

Table altered.

SQL> alter table t_part add test_col varchar2(10);

Table altered.

SQL> alter table t_part add test_col2 varchar2(10) default 'BLA';

Table altered.

SQL> alter table t_part split partition p_default values (123)
2 into (partition p_123 compress for all operations, partition p_default);

Table altered.

SQL> select
2 partition_name
3 , compression
4 , compress_for
5 from
6 user_tab_partitions
7 where
8 table_name = 'T_PART';

PARTITION_NAME COMPRESS COMPRESS_FOR
------------------------------ -------- ------------
P_123 ENABLED OLTP
P_DEFAULT ENABLED QUERY HIGH

SQL> alter table t_part add test_col3 varchar2(10) default 'BLA';

Table altered.

SQL> alter table t_part drop unused columns;
alter table t_part drop unused columns
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

SQL> alter table t_part drop column username;
alter table t_part drop column username
*
ERROR at line 1:
ORA-00904: "USERNAME": invalid identifier

SQL> alter table t_part move partition p_default compress for all operations;

Table altered.

SQL> select
2 partition_name
3 , compression
4 , compress_for
5 from
6 user_tab_partitions
7 where
8 table_name = 'T_PART';

PARTITION_NAME COMPRESS COMPRESS_FOR
------------------------------ -------- ------------
P_123 ENABLED OLTP
P_DEFAULT ENABLED OLTP

SQL> alter table t_part add test_col4 varchar2(10) default 'BLA';

Table altered.

SQL>

And here is what happens if you're already hitting the 1,000 columns limit and attempt to use DROP [UNUSED] COLUMNS with HCC enabled:

SQL> alter table many_x_cols drop (col256);

Table altered.

SQL> alter table many_x_cols drop unused columns;
alter table many_x_cols drop unused columns
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

SQL> alter table many_x_cols add (col256 varchar2(10));
alter table many_x_cols add (col256 varchar2(10))
*
ERROR at line 1:
ORA-01792: maximum number of columns in a table or view is 1000

SQL> alter table many_x_cols move pctfree 0 nocompress;

Table altered.

SQL> alter table many_x_cols drop unused columns;

Table altered.

SQL> alter table many_x_cols add (col256 varchar2(10));

Table altered.

SQL>

As you can see dropping columns is only supported with HCC disabled which allows in this particular case to add another column after dropping the hidden/unused one.

- Index restrictions

* You cannot compress a table or a partition of table (or exchange a compressed partition into it) if there are usable bitmap indexes defined on the table due to the different Hakan factor that is reset when using by the compression - which means that with compressed segments more rows will fit into a single block leading to a different Hakan factor used for the bitmap indexes. The bitmap indexes need to be set unused (all partitions in case of a partitioned table/index) or dropped before enabling the compression and rebuild / recreated after compressing the heap segment. This is also officially documented. It is however interesting to note that this restriction is only enforced when switching from uncompressed to compressed segments / partitions. Changing the compression for example from basic compression to HCC compression is possible without hitting this restriction but due to the different compression levels in my opinion this potentially leads again to significantly different Hakan factors - so actively resetting the Hakan Factor using "ALTER TABLE ... MINIMIZE RECORDS_PER_BLOCK" and rebuilding the bitmap indexes might be in order when moving between different compression levels.

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> alter table t_part move partition p_default nocompress;

Table altered.

SQL> alter table t_part move partition p_123 nocompress;

Table altered.

SQL> alter table t_part modify default attributes nocompress;

Table altered.

SQL> alter table t_part nocompress;

Table altered.

SQL> create bitmap index t_part_idx_bitmap on t_part (test_col) local;

Index created.

SQL> alter table t_part move partition p_default compress basic;
alter table t_part move partition p_default compress basic
*
ERROR at line 1:
ORA-14646: Specified alter table operation involving compression cannot be
performed in the presence of usable bitmap indexes

SQL> alter table t_part minimize records_per_block;
alter table t_part minimize records_per_block
*
ERROR at line 1:
ORA-28602: statement not permitted on tables containing bitmap indexes

SQL> alter index t_part_idx_bitmap modify partition p_default unusable;

Index altered.

SQL> alter table t_part move partition p_default compress basic;
alter table t_part move partition p_default compress basic
*
ERROR at line 1:
ORA-14646: Specified alter table operation involving compression cannot be
performed in the presence of usable bitmap indexes

SQL> alter table t_part minimize records_per_block;
alter table t_part minimize records_per_block
*
ERROR at line 1:
ORA-28602: statement not permitted on tables containing bitmap indexes

SQL> alter index t_part_idx_bitmap unusable;

Index altered.

SQL> alter table t_part move partition p_default compress basic;

Table altered.

SQL> alter table t_part minimize records_per_block;

Table altered.

SQL> alter index t_part_idx_bitmap rebuild partition p_default;

Index altered.

SQL> alter index t_part_idx_bitmap rebuild partition p_123;

Index altered.

SQL> alter table t_part move partition p_default nocompress update indexes;

Table altered.

As you can see switching on the compression is only allowed if all partitions of the bitmap index(es) are unusable - the opposite is not true: You can uncompress afterwards without any restrictions.

* This is related to B*Tree index compression and again more of theoretical nature but might become relevant when trying to introduce B*Tree index compression partition-wise via EXCHANGE PARTITION: You can't compress a single partition of a partitioned B*Tree index (or exchange a partition with a compressed index into the table) if the whole index has been not initially created with compression enabled. If you drop the index and recreate the whole index with compression enabled then you can turn the index compression individually on and off per index partition.

Very likely this is related to the restriction that the B*Tree index compression prefix length needs to be the same across all index partitions - this can only be defined on global level and I assume this is the reason why the index needs to be created compressed first. Since you usually want to have all partitions of an partitioned index to be compressed the same way this is probably a rather irrelevant restriction (except for the mentioned case regarding EXCHANGE PARTITION)

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> create index t_part_idx on t_part (user_id, test_col) nocompress local;

Index created.

SQL> alter index t_part_idx rebuild partition p_default compress;
alter index t_part_idx rebuild partition p_default compress
*
ERROR at line 1:
ORA-28659: COMPRESS must be specified at object level first

SQL> select
2 partition_name
3 , compression
4 from
5 user_ind_partitions
6 where
7 index_name = 'T_PART_IDX';

PARTITION_NAME COMPRESS
------------------------------ --------
P_123 DISABLED
P_DEFAULT DISABLED

SQL> drop index t_part_idx;

Index dropped.

SQL> create index t_part_idx on t_part (user_id, test_col) compress local;

Index created.

SQL> alter index t_part_idx rebuild partition p_default nocompress;

Index altered.

SQL> select
2 partition_name
3 , compression
4 from
5 user_ind_partitions
6 where
7 index_name = 'T_PART_IDX';

PARTITION_NAME COMPRESS
------------------------------ --------
P_123 ENABLED
P_DEFAULT DISABLED

SQL> drop index t_part_idx;

Index dropped.

SQL> create index t_part_idx on t_part (user_id, test_col) nocompress global
2 partition by range(user_id) (
3 partition p_default values less than (maxvalue)
4 );

Index created.

SQL> alter index t_part_idx rebuild partition p_default compress;
alter index t_part_idx rebuild partition p_default compress
*
ERROR at line 1:
ORA-28659: COMPRESS must be specified at object level first

SQL> drop index t_part_idx;

Index dropped.

SQL> create index t_part_idx on t_part (user_id, test_col) compress global
2 partition by range(user_id) (
3 partition p_default values less than (maxvalue)
4 );

Index created.

SQL> alter index t_part_idx rebuild partition p_default nocompress;

Index altered.

SQL> select
2 partition_name
3 , compression
4 from
5 user_ind_partitions
6 where
7 index_name = 'T_PART_IDX';

PARTITION_NAME COMPRESS
------------------------------ --------
P_DEFAULT DISABLED

SQL>

- Overcoming any of the schema evolution restrictions by temporarily uncompressing / disabling enabled compression

As already mentioned above there are different options how to overcome schema evolution restrictions. One obvious option, although very likely not applicable in many cases simply due to space and load constraints, is to temporarily uncompress the segments and to disable the compression. It is interesting to note that it is not as simple as it seems to undo compression with partitioned tables as I will demonstrate - I think I've even read somewhere that it is not possible at all to revert a table into a state where the mentioned restrictions no longer apply. This is definitely not true.

First of all it is important to note that in addition to the obvious decompression of each partition that contains compressed data the "COMPRESS" attribute that determines if a suitable operation will generate compressed data or not needs to be reset to NOCOMPRESS (but see below for "oddities"). So there is a significant difference between "ALTER ... [NO]COMPRESS" and "ALTER ... MOVE [NO]COMPRESS". The latter reorganizes the segment / partition and compresses/uncompresses the data while doing so, the former just "marks" the segment / partition as "enabled/disabled for compression" but doesn't touch the data and leaves it in whatever state it is at present. As as side note, the "COMPRESSION" column in the dictionary can not be used to distinguish between these two - you can have the column show "DISABLED" with data still compressed, and you can have the column show "ENABLED" with no compressed data and you can end up with a mixture of both - not every block of a segment is necessarily in a compressed state.

Furthermore the "COMPRESSION" attribute can be defined on multiple levels with partitioned tables:

- On the global TABLE level: ALTER TABLE ... [NO]COMPRESS

- On the partition level as default attributes: ALTER TABLE ... MODIFY DEFAULT ATTRIBUTES [NO]COMPRESS

- On the partition level to mark for [NO]COMPRESSION: ALTER TABLE ... MODIFY PARTITION [NO]COMPRESS

- On the partition level with MOVE: ALTER TABLE ... MOVE PARTITION [NO]COMPRESS ... [UPDATE [GLOBAL] INDEXES]

Now in order to be able to overcome the schema evolution restrictions all levels except the partition default attribute level need to be addressed: Any compressed data needs to be uncompressed, but also any partition that has been marked for compression (but doesn't necessarily contain compressed data) needs to be modified as well - finally the global table level also needs to be reset.

The setting on global table level seems to be the crucial step. It looks like Oracle checks during this operation (ALTER TABLE ... NOCOMPRESS) if all subordinate (sub-)partitions have a flag set that marks them as uncompressed. Only if that is the case some internal flag on global table level can also be reset so that the restrictions no longer apply.

There are however some oddities that need to be considered when doing so:

- You need to "move" any partition (ALTER TABLE ... MOVE PARTITION NOCOMPRESS) that has marked for compression if any other partition actually held compressed data - simply "unmarking" it using "ALTER TABLE ... MODIFY PARTITION NOCOMPRESS" is not sufficient even it does not hold any compressed data (for example, has only be marked with "ALTER TABLE ... MODIFY PARTITION COMPRESS").

- If you have BITMAP INDEXES defined on the table with compressed partitions you need to either set the whole indexes unusable or drop them before the COMPRESSION can be effectively disabled on global table level using ALTER TABLE ... NOCOMPRESS

See the following test case for all steps to unwind this:

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> create table t_part
2 nocompress
3 partition by list (user_id) (
4 partition p_default values (default)
5 )
6 as
7 select * from all_users;

Table created.

SQL> alter table t_part split partition p_default values (123)
2 into (partition p_123, partition p_default);

Table altered.

SQL> -- Compress one partition
SQL> alter table t_part move partition p_default compress;

Table altered.

SQL> -- And uncompress it again
SQL> alter table t_part move partition p_default nocompress;

Table altered.

SQL> -- Doesn't work!
SQL> alter table t_part add test_col2 varchar2(10) default 'BLA';
alter table t_part add test_col2 varchar2(10) default 'BLA'
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

SQL> -- Compress one partition
SQL> alter table t_part move partition p_default compress;

Table altered.

SQL> -- And uncompress it again
SQL> alter table t_part move partition p_default nocompress;

Table altered.

SQL> -- The global table level was missing!
SQL> alter table t_part nocompress;

Table altered.

SQL> -- Now it works
SQL> alter table t_part add test_col2 varchar2(10) default 'BLA';

Table altered.

SQL> -- Start again
SQL> alter table t_part drop column test_col2;

Table altered.

SQL> -- Compress one partition
SQL> alter table t_part move partition p_default compress;

Table altered.

SQL> -- Mark in addition another partition for compression (note: no data gets compressed!)
SQL> alter table t_part modify partition p_123 compress;

Table altered.

SQL> -- And uncompress the compressed partition again
SQL> alter table t_part move partition p_default nocompress;

Table altered.

SQL> -- Unmark the other partition marked for compression (note: no data was compressed!)
SQL> alter table t_part modify partition p_123 nocompress;

Table altered.

SQL> -- The global table level
SQL> alter table t_part nocompress;

Table altered.

SQL> -- But: Doesn't work!
SQL> alter table t_part add test_col2 varchar2(10) default 'BLA';
alter table t_part add test_col2 varchar2(10) default 'BLA'
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

SQL> -- Oddity 1: Although it doesn't contain compressed data
SQL> -- this partition needs to be reorganized as well
SQL> alter table t_part move partition p_123 nocompress;

Table altered.

SQL> -- Don't forget the global table level, otherwise below doesn't work
SQL> alter table t_part nocompress;

Table altered.

SQL> -- Now it works
SQL> alter table t_part add test_col2 varchar2(10) default 'BLA';

Table altered.

SQL> -- Start again
SQL> alter table t_part drop column test_col2;

Table altered.

SQL> -- Compress one partition
SQL> alter table t_part move partition p_default compress;

Table altered.

SQL> -- Add a bitmap index
SQL> create bitmap index t_part_idx_bitmap on t_part (username) local;

Index created.

SQL> -- And uncompress the compressed partition again
SQL> alter table t_part move partition p_default nocompress update indexes;

Table altered.

SQL> -- The global table level
SQL> alter table t_part nocompress;

Table altered.

SQL> -- But: Doesn't work!
SQL> alter table t_part add test_col2 varchar2(10) default 'BLA';
alter table t_part add test_col2 varchar2(10) default 'BLA'
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

SQL> -- Oddity 2: Bitmap indexes on compressed tables need to be set unusable or dropped
SQL> -- Note it is not sufficient to set single partitions unusable
SQL> alter index t_part_idx_bitmap unusable;

Index altered.

SQL> -- But still: Doesn't work because the global level has not been touched again
SQL> alter table t_part add test_col2 varchar2(10) default 'BLA';
alter table t_part add test_col2 varchar2(10) default 'BLA'
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

SQL> -- The global table level
SQL> alter table t_part nocompress;

Table altered.

SQL> -- Now it works
SQL> alter table t_part add test_col2 varchar2(10) default 'BLA';

Table altered.

SQL>

DBMS_STATS - Gather table statistics with many columns

Here is another good reason why you probably don't want to use tables with too many columns.

The first good reason is that Oracle stores rows of conventional heap tables with more than 255 columns (where at least one column value after the 255th is non-null) in multiple row pieces even when the entire row may fit into a single block (and up to 11.2 doesn't support basic and OLTP heap table compression on tables with more than 255 columns). This leads to something that is sometimes called "intra-row chaining" which means that Oracle needs to follow the row piece pointer to access columns after the 255th one leading to multiple logical I/Os per row, up to four for a row approaching the hard limit of 1,000 columns.

Of course such a multiple-piece row easily can result in actual row chaining where the row pieces are scattered across different blocks now potentially leading to additional random single block I/O, but I digress...

This simple example allows to see this (and the other, yet to describe) effect in action - it creates by default a table with 1,000 columns with 10,000 rows where each row resides in a separate block (and therefore only intra-row chaining should occur). I used a 8K block size tablespace with Manual Segment Space Management (MSSM) in order to avoid the ASSM overhead in this case:

----------------------------------------------------------------------------------------------------------
-- MANY_X_COLS.SQL
-- Usage: @MANY_X_COLS [NUM_COLS] [PCTFREE] [PCTUSED] [TABLESPACE] [COL1] [COL2] [COL3] [COL4] [NUM_ROWS]
-- Defaults: [1000 ] [99 ] [1 ] [TEST_8K ] [001 ] [256 ] [512 ] [768 ] [10000 ]
--
-- Create a table MANY_X_COLS with a configurable number of columns
-- and populate four columns of it using skewed data (normal distribution)
----------------------------------------------------------------------------------------------------------

set termout off verify off
REM Save current settings
store set .settings replace

REM Command line handling

REM Preparation for default value handling
column 1 new_value 1
column 2 new_value 2
column 3 new_value 3
column 4 new_value 4
column 5 new_value 5
column 6 new_value 6
column 7 new_value 7
column 8 new_value 8
column 9 new_value 9

select
'' as "1"
, '' as "2"
, '' as "3"
, '' as "4"
, '' as "5"
, '' as "6"
, '' as "7"
, '' as "8"
, '' as "9"
from
dual
where
rownum = 0;

REM Default values
select
nvl('&&1', '1000') as "1"
, nvl('&&2', '99') as "2"
, nvl('&&3', '1') as "3"
, nvl('&&4', 'TEST_8K') as "4"
, nvl('&&5', '001') as "5"
, nvl('&&6', '256') as "6"
, nvl('&&7', '512') as "7"
, nvl('&&8', '768') as "8"
, nvl('&&9', '10000') as "9"
from
dual;

define n_iter = &1
define n_pctfree = &2
define n_pctused = &3
define s_tblspace = &4
define s_col1 = &5
define s_col2 = &6
define s_col3 = &7
define s_col4 = &8
define n_num_rows = &9

set termout on echo on verify on

declare
s_sql1 varchar2(32767);
s_sql varchar2(32767);

e_table_or_view_not_exists exception;
pragma exception_init(e_table_or_view_not_exists, -942);
begin
for i in 1..&n_iter loop
s_sql1 := s_sql1 || 'col' || to_char(i, 'FM9000') || ' varchar2(10),';
end loop;

s_sql1 := rtrim(s_sql1, ',');

s_sql := 'drop table many_x_cols purge';

begin
execute immediate s_sql;
exception
when e_table_or_view_not_exists then
null;
end;

s_sql := 'create table many_x_cols (' || s_sql1 || ') pctfree &n_pctfree pctused &n_pctused tablespace &s_tblspace';

execute immediate s_sql;

dbms_random.seed(0);

s_sql := 'insert /*+ append */ into many_x_cols (
col&s_col1
, col&s_col2
, col&s_col3
, col&s_col4
)
select
trunc(sys.dbms_random.normal * &n_num_rows.) as col&s_col1
, trunc(sys.dbms_random.normal * &n_num_rows.) as col&s_col2
, trunc(sys.dbms_random.normal * &n_num_rows.) as col&s_col3
, trunc(sys.dbms_random.normal * &n_num_rows.) as col&s_col4
from
dual
connect by
level <= &n_num_rows
';

execute immediate s_sql;

commit;
end;
/

set termout off
REM Restore and cleanup
undefine 1 2 3 4 5 6 7 8 9
column 1 clear
column 2 clear
column 3 clear
column 4 clear
column 5 clear
column 6 clear
column 7 clear
column 8 clear
column 9 clear
@.settings
set termout on

If you have created the table with the defaults and now run some simple queries on it, for example:

SET AUTOTRACE TRACEONLY STATISTICS

SELECT COUNT(*) FROM MANY_X_COLS;

you should see approximately 10,000 consistent gets after some initial runs to avoid the overhead of hard parsing. Your output should look similar to the following:

SQL> SELECT COUNT(*) FROM MANY_X_COLS;

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
10004 consistent gets
10000 physical reads
0 redo size
422 bytes sent via SQL*Net to client
415 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

However if you start to access columns after the 255th one (and you've used the defaults or suitable values to populate at least one these columns), then you'll notice that the number of consistent gets increases whereas the number of physical reads stays more or less the same:

SET AUTOTRACE TRACEONLY STATISTICS

SELECT COUNT(COL256) FROM MANY_X_COLS;

SQL> SELECT COUNT(COL256) FROM MANY_X_COLS;

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
30004 consistent gets
10000 physical reads
0 redo size
427 bytes sent via SQL*Net to client
415 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

It is interesting to note that in 11.1.0.7 and 11.2.0.1 it happened sometimes that the consistent gets jumped directly from 10,000 to 30,000 and increased then to 40,000 and 50,000 respectively when accessing later columns whereas in 10.2.0.4 the expected result of 20,000, 30,000 and 40,000 was seen.

It is also interesting to note that the statistic "table fetch continued row" is only increasing once per row which could be another interesting variation of the theme "How does Oracle record this statistic".

OK, back to the main topic for today: DBMS_STATS and such tables with many columns.

The other good reason is that DBMS_STATS might switch to a multi-pass mode if a table exceeds a certain number of columns. This means even if you request only basic column statistics (METHOD_OPT=>"FOR ... COLUMNS SIZE 1...") without any additional histograms that require additional passes anyway DBMS_STATS.GATHER_TABLE_STATS will perform multiple passes (full segment scans usually) on the same segment. This is even true if basic column statistics are only requested for a limited set of columns (for example "FOR COLUMNS SIZE 1 COL1, COL2, COL3") since Oracle gathers some column statistics always no matter what options are passed to DBMS_STATS (very likely in order to calculate for example the average row length for the table statistics).

The threshold when DBMS_STATS switches to multiple passes depends the options passed to DBMS_STATS and differs between Oracle versions 11g (11.1 and 11.2) and versions prior to 11g - and I'm not talking about the multiple passes that are used in versions prior to 11g to determine the optimal auto sample size (DBMS_STATS.AUTO_SAMPLE_SIZE, see below for more information).

In Oracle 10.2.0.4 DBMS_STATS is capable of handling a minimum of approximately 160 columns per pass if basic columns statistics are requested and a maximum of approximately 440 columns if no column statistics are requested, which means that it requires between three and seven passes to gather statistics for a 1,000 columns table. Below is what such a typical query looks like - and you'll find up to seven of them in a SQL trace file of a DBMS_STATS call with a different column list for a 1,000 columns table.

Notice that in 10.2.0.4 DBMS_STATS gathers five different values per column for basic column statistics:

- Number of non-null values
- Number of distinct values
- Column size
- Minimum value
- Maximum value

If no column statistics are requested then still the "Number of non-null values" and "Column size" per column are gathered which explains why DBMS_STATS is capable of handling more columns per pass since simply less information per column is requested.

select /*+ no_parallel(t) no_parallel_index(t) dbms_stats
cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
*/ count(*),count("COL257"),count(distinct "COL257"),
sum(sys_op_opnsize("COL257")),substrb(dump(min("COL257"),16,0,32),1,120),
substrb(dump(max("COL257"),16,0,32),1,120),count("COL258"),count(distinct
"COL258"),sum(sys_op_opnsize("COL258")),substrb(dump(min("COL258"),16,0,32),
1,120),substrb(dump(max("COL258"),16,0,32),1,120),count("COL259"),
count(distinct "COL259"),sum(sys_op_opnsize("COL259")),
substrb(dump(min("COL259"),16,0,32),1,120),substrb(dump(max("COL259"),16,0,
32),1,120),...
.
.
.
from
"CBO_TEST"."MANY_X_COLS" t

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.20 0.19 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 1.21 3.07 9713 40010 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 1.41 3.26 9713 40010 0 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT GROUP BY (cr=40010 pr=9713 pw=0 time=3070537 us)
10000 TABLE ACCESS FULL MANY_X_COLS (cr=40010 pr=9713 pw=0 time=1820245 us)

Tracing the DBMS_STATS.GATHER_TABLE_STATS call furthermore reveals that Oracle performs at least one additional query when using DBMS_STATS.AUTO_SAMPLE_SIZE (the default) which is probably used for determining the optimal sample size because Oracle uses an iterative approach prior to 11g to determine the auto sample size. The sample size is increased in steps until the number of distinct values per column passes some statistical checks. The interesting thing however seems to be that these additional queries are repeated per pass:

select /*+ no_parallel(t) no_parallel_index(t) dbms_stats
cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
*/ count(*)
from
"CBO_TEST"."MANY_X_COLS" sample block ( 1.0000000000,1) t

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 7 0.01 0.00 0 0 0 0
Execute 7 0.00 0.00 0 0 0 0
Fetch 7 0.12 3.67 1971 764 0 7
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 21 0.13 3.67 1971 764 0 7

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=119 pr=332 pw=0 time=530678 us)
108 TABLE ACCESS SAMPLE MANY_X_COLS (cr=119 pr=332 pw=0 time=646431 us)

********************************************************************************

select /*+ no_parallel(t) no_parallel_index(t) dbms_stats
cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
*/ count(*)
from
"CBO_TEST"."MANY_X_COLS" t

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 7 0.00 0.00 0 0 0 0
Execute 7 0.00 0.00 0 0 0 0
Fetch 7 3.41 15.85 65858 70070 0 7
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 21 3.41 15.86 65858 70070 0 7

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=10010 pr=9445 pw=0 time=2228265 us)
10000 TABLE ACCESS FULL MANY_X_COLS (cr=10010 pr=9445 pw=0 time=816529 us)

This means that in total between six (no column statistics) and at least 14 queries (basic column statistics for all columns) are used to gather the statistics for a 1,000 columns table in 10.2.0.4 with DBMS_STATS.AUTO_SAMPLE_SIZE.

Using a fixed sample size (or compute) it takes between three and seven queries to gather the statistics for a 1,000 columns table in 10.2.0.4. When computing the basic statistics for all columns this means a minimum of 220,000 consistent gets to gather the statistics on a 10,000 blocks table with 1,000 columns in comparison to approximately 20,000 consistent gets to perform the same on a 255 columns table - not taking into account the CPU overhead of performing the aggregates (count distinct, min, max, etc.) on all those columns.
Note that these consistent gets exclude the work required to maintain the dictionary and statistics history information.

The additional queries executed when using DBMS_STATS.AUTO_SAMPLE_SIZE in 10.2.0.4 increase the number of consistent gets to 300,000 for the 1,000 columns / 10,000 blocks table (in this case the automatism ends up in a compute) and 40,000 consistent gets for the 255 columns / 10,000 blocks table.

Of course the intention of the AUTO_SAMPLE_SIZE is to find an optimal sample size and not to actually increase the amount of work compared to a compute gather statistics, but in this particular case this was true.

This significant difference in consistent gets between the 1,000 columns and 255 columns table both having 10,000 blocks has three reasons:

1. It requires more consistent gets per pass due to the multiple row pieces (up to four times more consistent gets)
2. DBMS_STATS performs many more passes on the 1,000 columns table (when gathering basic column statistics for all columns at least seven on the 1,000 columns table compared to at least two on the 255 columns table)
3. Obviously DBMS_STATS doesn't make a systematic attempt to minimize the number of consistent gets per pass by trying to access only leading columns per pass (so to require only 10,000 / 20,000 / 30,000 / 40,000 gets per pass). It looks more like the columns accessed per pass are randomly chosen often causing the maximum number of consistent gets per pass

The same applies to version 11.1 and 11.2 as long as fixed sample sizes are used. However when using DBMS_STATS.AUTO_SAMPLE_SIZE the whole gather statistics logic has been revised, in particular due to the introduction of the improvements regarding number of distinct column values approximation and incremental partition statistics, which is described in great detail by Amit Poddar here.

Due to these improvements to DBMS_STATS when using DBMS_STATS.AUTO_SAMPLE_SIZE Oracle 11.1.0.7 (and 11.2.0.1) are capable of handling up to 1,000 columns in a single pass if no columns statistics are requested and a minimum of approximately 260 columns per pass if basic column statistics are requested. This means it takes four passes to gather basic column statistics on all columns for a 1,000 columns table. Furthermore due to the revised logic the additional queries used in 10.2.0.4 for the DBMS_STATS.AUTO_SAMPLE_SIZE are no longer used.

Tracing the DBMS_STATS.GATHER_TABLE_STATS call in 11.1.0.7 and 11.2.0.1 also reveals that there is a special operation activated that is not represented in SQL. Accompanied by a comment outlining what the special NDV operation is supposed to do DBMS_STATS seems to activate a "APPROXIMATE NDV AGGREGATE" operation that is very likely used for the aforementioned improvement regarding the NDV algorithm.

Such a query looks then like the following:

select /*+ full(t) no_parallel(t) no_parallel_index(t) dbms_stats
cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
no_substrb_pad */to_char(count("COL174")),to_char(substrb(dump(min("COL174")
,16,0,32),1,120)),to_char(substrb(dump(max("COL174"),16,0,32),1,120)),
to_char(count("COL175")),to_char(substrb(dump(min("COL175"),16,0,32),1,120))
,to_char(substrb(dump(max("COL175"),16,0,32),1,120)),...
.
.
.
from
"CBO_TEST"."MANY_X_COLS" t /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,
NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,
NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,
NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,
.
.
.
NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL*/

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.11 0.10 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 1.01 2.24 9845 30010 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 1.12 2.35 9845 30010 0 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 93 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=30010 pr=9845 pw=0 time=0 us)
8755 APPROXIMATE NDV AGGREGATE (cr=30010 pr=9845 pw=0 time=2336556 us cost=2 size=121114 card=82)
10000 TABLE ACCESS FULL MANY_X_COLS (cr=30010 pr=9845 pw=0 time=300223 us cost=2 size=121114 card=82)

The following differences to 10g become obvious:

1. The SQL part has been reduced to

- Number of non-null values
- Minimum value
- Maximum value

for basic column statistics. The number of distinct values is obviously performed as part of the special operation "APPROXIMATE NDV AGGREGATE" as it can be seen in the "Row Source Operation" (for further details refer to Amit Poddar's document). Note that in case of no column statistics this is reduced to a single expression per column which is the "Number of non-null values".

2. The special operation "APPROXIMATE NDV AGGREGATE" appears in the execution plan

3. The trailing comment section of the statement describes what the "APPROXIMATE NDV AGGREGATE" is supposed to perform, for more details refer to the aforementioned document by Amit Poddar

4. All expressions are converted to strings using TO_CHAR

5. A new hint NO_SUBSTRB_PAD has been introduced (example taken from 11.2.0.1)

Probably the fact that the number of expressions in the SQL part has been reduced allows Oracle 11g to process more columns per pass.

This improved variant requires approximately 150,000 consistent gets (four passes) for basic column statistics for all columns of a 10,000 blocks / 1,000 columns table with DBMS_STATS.AUTO_SAMPLE_SIZE in comparison to 10,000 consistent gets for table with 255 columns (single pass).

So even with the improvements of 11g approximate NDV algorithm it will still require multiple passes when requesting basic column statistics (which generally speaking is recommended) for tables with more than 260 columns. If you think about the consequence of that in case of a table that not only has many columns but also many rows (that are then at least intra-row chained) the potential overhead becomes quite obvious.

So in summary the best way to deal with such tables is to avoid them in first place - in Oracle 10g attempt to keep the number of columns below 160 respectively 260 in 11g when using DBMS_STATS.AUTO_SAMPLE_SIZE to ensure a single pass operation covering all columns.

If you really need to use that many columns it might be worth the effort to write a custom gather statistics procedure that explicitly states the columns to gather statistics for in the METHOD_OPT parameter in order to minimize the number of passes required - of course keeping in mind the potential side effects on the cost based optimizer when column statistics are missing and these become relevant for optimization. In 11g with DBMS_STATS.AUTO_SAMPLE_SIZE this potentially allows to reduce the number of passes down to a single one.

And as a side note - using the deprecated ANALYZE ... STATISTICS command requires exactly a single pass to gather basic column statistics for all 1,000 columns of the 10,000 rows table - it even records only 10,000 consistent gets (and no "table fetch continued row") which is quite interesting since it is unclear how the different row pieces are accessed then. It might however simply be an instrumentation issue. Of course in most cases it doesn't make sense to revert to ANALYZE ... STATISTICS due to its shortcomings (just to name a few: No parallelism, no statistics history, no global statistics on partitioned tables and no approximate NDV improvements in 11g) and differences to DBMS_STATS that might lead to plan changes - see also the other notes linked to in that note by Jonathan Lewis.

Optimizer Mode Mismatch Does Not Prevent Sharing of Child Cursor!?!?

The aim of this post is to describe a strange (buggy) situation that I observed recently. But before doing that, I shortly summarize what a parent cursor and a child cursor are as well as when they can be shared. By the way, I borrowed this description from the pages 20/21 of my book. Hence, [...]