Search

OakieTags

Who's online

There are currently 0 users and 20 guests online.

Recent comments

Affiliations

FBI oddities

Function-based indexes are wonderful things – but they don’t always work exactly as expected. Here’s an example of one such anomaly.

Imagine you have some type of “orders” table where most orders are in a “finished with” state, and you have a requirement to access the small number of orders in the “new” state. Here’s a sample data set to emulate this type of data requirement (created in 11.1.0.6, 1MB uniforma extents, freelist management and 8KB blocks).


create table t1 (
	state		varchar2(10),
	n1		number,
	v1		varchar2(10),
	padding	varchar2(100)
);

insert into t1
select
	decode(mod(rownum,100),0,'OPEN','CLOSED'),
	rownum,
	lpad(rownum,10,0),
	rpad('x',100,'x')
from
	all_objects
where
	rownum <= 5000
;

I’ve generated this data set so that every 100th row is marked as ‘OPEN’ and all the rest are marked as ‘CLOSED’ – in a real system the percentage of ‘OPEN’ orders would probably be much smaller so we could easily decide to have an index on state to give us an efficient access path to the open orders. But such an index would be very large, because it would also hold entries for the huge number of closed orders; we’d also have to create a histogram on the column (possibly by writing a simple script) so that Oracle could recognise the skewed data distribution.

If we wanted to be clever, though, and if we were able to edit the SQL that addressed this table, we could minimise the size of the index and avoid the need for a histogram by creating a function-based index that held values just for the rows where the state was ‘OPEN’. For example, I could create an index which held the order number only for those rows where the state was open; and there are several ways I could do this, for example:


create index t1_f1 on t1(decode(state,'CLOSED', to_number(null), n1 ));
create index t1_f2 on t1(to_number(decode(state,'CLOSED', null, n1 )));
create index t1_f3 on t1(case when state = 'CLOSED' then to_number(null) else n1 end);
create index t1_f4 on t1(to_number(case when state = 'CLOSED' then null else n1 end));
create index t1_f5 on t1(decode(state,'CLOSED', null, n1 ));
create index t1_f6 on t1(decode(state,'CLOSED', cast(null as number), n1 ));

If you’re wondering why I’ve included a “to_number()” in the first index, remember that NULL is implicitly assumed to be a NULL of type character by Oracle – so I’ve got to do something to tell Oracle that this NULL is supposed to be a numeric NULL. Index t1_f5 is the same as t1_f1, but without the to_number(), and index t1_f6 is the same again but using the more modern cast() to supply the conversion.

You’ll note that I haven’t yet shown any attempt to collect statistics. If we create the indexes AFTER we’ve collected stats on the table we’ll have to collect some extra table stats once the indexes exist because each function-based index will have added a new (hidden) column to the table and, although the “create index” commands will have created statistics for the indexes (from 10g onwards), we will not yet have stats on these hidden columns. So I’m going to wait until after creating the indexes to generate the stats:


begin
	dbms_stats.gather_table_stats(
		ownname		=> user,
		tabname		=>'T1',
		estimate_percent	=> 100,
		method_opt		=> 'for all columns size 1'
	);
end;
/

The question is now this – given the definitions of the indexes above, which of the following six queries – each one designed to be an exact match for one of the index definitions – will use “its” index. (Note that I have hinted the queries to ensure that if the optimizer is allowed to use an index it will use an index – and I’ve included the name of the relevant index as a comment at the end of each hint):


select
	/*+ index(t1) t1_f1 */
	v1
from
	t1
where
	decode(state,'CLOSED', to_number(null), n1 ) = 100
;

select
	/*+ index(t1) t1_f2 */
	v1
from
	t1
where
	to_number(decode(state,'CLOSED', null, n1 )) = 100
;

select
	/*+ index(t1) t1_f3 */
	v1
from
	t1
where
	case when state = 'CLOSED' then to_number(null) else n1 end = 100
;

select
	/*+ index(t1) t1_f4 */
	v1
from
	t1
where
	to_number(case when state = 'CLOSED' then null else n1 end) = 100
;

select
	/*+ index(t1) t1_f5 */
	v1
from
	t1
where
	decode(state,'CLOSED', null, n1 ) = 100
;

select
	/*+ index(t1) t1_f6 */
	v1
from
	t1
where
	decode(state,'CLOSED', cast(null as number), n1 ) = 100
;

The answer depends on the version of Oracle. Under Oracle 11.1.0.6 I got the following results. First, the attempt to create t1_f5 resulted in the following Oracle error (and that’s an important clue to what has happened in another part of the test):

create index t1_f5 on t1(decode(state,'CLOSED', null, n1 ))
                         *
ERROR at line 1:
ORA-01408: such column list already indexed

The index usage was as follows:

    t1_f1		not used	(decode)
    t1_f2		not used	(decode)
    t1_f3		used		(case)
    t1_f4		used		(case)
    t1_f5		non-existent - but used t1_f1
    t1_f6		used		(cast)
    

If you want it in a sound-bite: newer technologies do better than older technologies. But why do the results look the way they do ? You can find the answer in the index definitions that have been stored in the database:


column index_name format a10		heading "Index"
column column_position format 999	heading "Posn"
column column_expression format a72	heading "Expression"

select
	index_name, column_position, column_expression
from
	user_ind_expressions
where
	table_name = 'T1'
;

Index      Posn Expression
---------- ---- ------------------------------------------------------------------------
T1_F1         1 DECODE("STATE",'CLOSED',NULL,"N1")
T1_F2         1 TO_NUMBER(DECODE("STATE",'CLOSED',NULL,TO_CHAR("N1")))
T1_F3         1 CASE "STATE" WHEN 'CLOSED' THEN NULL ELSE "N1" END
T1_F4         1 TO_NUMBER(TO_CHAR(CASE "STATE" WHEN 'CLOSED' THEN NULL ELSE "N1" END ))
T1_F6         1 DECODE("STATE",'CLOSED',CAST(NULL AS number),"N1")

Compare the stored definition with the orginal definitions. Notice how the decodes and NULLs don’t work happily together.

In t1_f1 the explicit to_number() that I included has disappeared – that’s why I was unable to create index t1_f5 – its definition was identical to the modified t1_f1 definition. Then, of course, my predicate no longer matches the exact index definition.

In the t1_f2 definition, because NULL is implicitly character Oracle has added an explicit to_char() to the n1 column I supplied so that its type agrees with the NULL, thus allowing the final to_number() to work. So, again, my predicate no longer matches the index definition.

In t1_f3 and t1_f4 I didn’t include any explicit conversions, and Oracle didn’t add any implicit conversions – but if you look closely it has transformed the version of the case statement I supplied into the simpler form – and everything happened to work (there was an earlier version of Oracle where Oracle would do this transformation for the predicate at run time but not for the index at index creation time – with the result that the “specially created” index wouldn’t work.

Index t1_f5 was not created because my explicit definition matched Oracle’s implicit conversion of t1_f1 – and then my explicit rendition of the matching predicate allowed the optimizer to use index t1_f1.

Finally, with the cast() operator the decode() wasn’t “clever enough” to eliminate my explicit conversion, so the predicate matched the index definition and the index was used.

So the message is this – be careful how you define your function-based indexes, and check what Oracle has stored as the index definition before you commit too much effort to rewriting code to use your new index.

Footnote: Inevitably there are more questions you could ask to fill in further details here. For example, if you created a “genuine” virtual column in 11g using one of my “unusable” decode() expressions, and then indexed the virtual column, would Oracle use the index ? If I had included some cast() operators in my case expressions and corresponding predicates would Oracle still have been able to use the indexes or would I have found the index definitions and predicates were transformed differently and ceased to match ? Is the behaviour shown consistent across all popular versions of Oracle ? (the answer to that last one is No)

These questions (and others) are left as exercises for the interested reader to carry out in the privacy and quiet of their own workplaces.