Search

OakieTags

Who's online

There are currently 0 users and 30 guests online.

Recent comments

Affiliations

Virtual Stats

Or – to be more accurate – real statistics on a virtual column.

This is one of the “10 top tips” that I came up with for my session with Maria Colgan at OOW13. A method of giving more information that might improve execution plans when you can change the code. I’ll start with a small data set including a virtual column (running 11.1.0.7), and a couple of problem queries:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
)
select
	rownum		n1,
	rownum		n2
from
	generator	v2
;

alter table t1
add (
	m2	generated always as (mod(n2,3)) virtual
)
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'
	);

end;
/

explain plan for
select * from t1 where mod(n1,3) = 0
;

explain plan for
select * from t1 where mod(n1,3) != 0
;

In many cases the wrong cardinality will make a big difference to the optimizer’s choice of execution plan, and anything we can do “legally” to help the optimizer estimate a better cardinality may give us a better plan. I’m only intending to make a point about cardinality estimates with this example, so I haven’t bothered to create a subtle data set, or any indexes – so when I look at the execution plans I’ll only be looking at the “rows” column of the output.

Given the mod(n1,3) predicates and the 10,000 rows in the table we can estimate that the two cardinalities ought to be 3,333 and 6,667 respectively – but what do the plans show us:

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100 |  1000 |     5  (20)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   100 |  1000 |     5  (20)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(MOD("N1",3)=0)

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   500 |  5000 |     5  (20)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   500 |  5000 |     5  (20)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(MOD("N1",3)<>0)

The estimate are 100 and 500 respectively – examples of the optimizer’s basic guessing strategy of 1% for “function of column = constant” and 5% for “function of column != constant”. (There are a couple of variations on this theme, but 1% and 5% commonly appear.)

You’ll notice, of course, that n1 and n2 are defined in the same way, so what do the estimates look like when I change the predicates to references n2. There’s an enormous difference:

explain plan for
select * from t1 where mod(n2,3)= 0
;

select * from table(dbms_xplan.display);

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  3333 | 33330 |     5  (20)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |  3333 | 33330 |     5  (20)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("T1"."M2"=0)



explain plan for
select * from t1 where mod(n2,3) != 0
;

select * from table(dbms_xplan.display);
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  6667 | 66670 |     5  (20)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |  6667 | 66670 |     5  (20)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("T1"."M2"<>0)

The optimizer has got the estimates right! And the reason becomes clear when you look at the predicate section of the execution plan. The optimizer has transformed the queries by recognising that the supplied predicates match the definition of the virtual column, so it has rewritten the queries to use the virtual columns, which means getting rid of the references to the mod() function, and it has accurate statistics about the virtual column.

Think carefully about the significance of what we’ve done. Without changing the code we’ve made it possible for the optimizer to do a better job because it’s no longer guessing.

As I said on the day – this tip (like all tips) is just the tip of the iceberg – take it, think about it, and apply it sensible and carefully.

Update 4th Oct 2013

It didn’t take long for an iceberg to appear: I was running a training session for a client in Prague yesterday and presented an example of using virtual columns to change execution plans without changing the code and one member of the audience (I forget whether he was a developer of a DBA) promptly came up with this very important question: “What if your code includes statements like insert into t1 values (1,1);”

Of course, your code should always list the column names in the insert statement, but some code doesn’t. In 11g the code will fail with Oracle error “ORA-00947: not enough values”. There is a workaround: rather than creating a virtual column create extended stats to represent the expression. The internal, virtual column representing the expression will then be invisible and the ORA-00947 will not occur.

There is a slight downside to this strategy, though – you’re only allowed 20 sets of extended stats on a single table. That’s quite a lot, but for some people it may not be enough; the limit on virtual columns is dictated by the standard 1,000 column limit for a table.

In 12c there is a nicer alternative – simply declare the virtual column to be invisible. Conveniently Oracle won’t see it for the insert – but you can still use it explicitly by name, and if you use the defining expression in a WHERE clause the optimizer will still recognise that the expression exists as a virtual column with real statistics.

Footnote: a follow-on thought I had about visibility was that some screen-painters automatically generate a single update statement intedned to update every column that has been selected by the screen: so if you display a virtual column on-screen (rather than displaying the result of the expression) then the update statement will fail with Oracle error “ORA-54017: UPDATE operation disallowed on virtual columns.”