Search

Top 60 Oracle Blogs

Recent comments

Function Based Indexes

Indexing LOBs

Many years ago, possibly when most sites were still using Oracle 8i, a possible solution to a particular customer problem was to create a function-based index on a CLOB column using the dbms_lob.getlength() function call. I can’t find the notes explaining why this was necessary (I usually have some sort of clue – such as the client name – in the script, but in this case all I had was a comment that “the manuals say you can’t do this, but it works provided you wrap the dbms_lob call inside a deterministic function”).

32K Columns

Oracle 12c has increased the maximum length of character-based columns to 32K bytes – don’t get too excited, they’re stored out of lines (so similar in cost to LOBs) and need some modification to the parameter file and data dictionary (starting the database in upgrade mode) before you can use them.

Richard Foote has a pair of articles on indexing such columns:

12c Indexing Extended Data Types Part II (15 Steps)

Finally, at long long last, I have a spare 30 minutes in my life to complete this blog entry !! As discussed previously, Oracle has extended the maximum length of varchar2, nvarchar and raw columns to 32K, but this comes with some challenges when it comes to indexing such columns due to restrictions on the […]

FBI decode

It probably won’t surprise many people to hear me say that the decode() function can be a bit of a nuisance; and I’ll bet that quite a lot of people have had trouble occasionally trying to get function-based indexes that use this function to behave properly. So (to put it all together and support the general directives that case is probably a better choice than decode() and that the cast() operator is an important thing to learn) here’s an example of how function-based indexes don’t always allow you to work around bad design/code. (Note: this is a model of a problem I picked up at a client site, stripped to a minimum – you have to pretend that I’m not allowed to fix the problem by changing code).

First we create some data and indexes, and gather all relevant stats:

12c Indexing Extended Data Types Part I (A Big Hurt)

The maximum size for VARCHAR2, NVARCHAR and RAW columns has been extended to 32767 bytes with the Oracle 12c Database. However, indexing such columns with standard indexes comes with some challenges. These extended data types are not enabled by default within the database but can easily be done so by following these steps: Restart the […]

FBI Delete

A recent post on Oracle-l complained about an oddity when deleting through a function-based index.

I have a function based index but the CBO is not using it. The DML that I expect to have a plan with index range scan is doing a FTS. Its a simple DML that deletes 1000 rows at a time in a loop and is based on the column on which the FBI is created.

NVL2()

There are many little bits and pieces lurking in the Oracle code set that would be very useful if only you had had time to notice them. Here’s one that seems to be virtually unknown, yet does a wonderful job of eliminating calls to decode().

The nvl2() function takes three parameters, returning the third if the first is null and returning the second if the first is not null. This is  convenient for all sorts of example where you might otherwise use an expression involving  case or decode(), but most particularly it’s a nice little option if you want to create a function-based index that indexes only those rows where a column is null.

Here’s a code fragment to demonstrate the effect:

FBI Bug

Here’s a funny little optimizer bug – though one that seems to have been fixed by at least 10.2.0.3. It showed up earlier on today in a thread on the OTN database forum. We’ll start (in 9.2.0.8) with a little table and two indexes – one normal, the other descending.

Virtual bug

I’ve said in the past that one of the best new features, in my view, in 11g was the appearance of proper virtual columns; and I’ve also been very keen on the new “approximate NDV” that makes it viable to collect stats with the “auto_sample_size”.

Who’d have guessed that if you put them both together, then ran a parallel stats collection it would break :(

The bug number Karen quotes (10013177.8) doesn’t (appear to) mention extended stats – but since virtual columns, function-based indexes, and extended stats share a number of implementation details I’d guess that they might be affected as well.

FBI oddities

Function-based indexes are wonderful things – but they don’t always work exactly as expected. Here’s an example of one such anomaly.

Imagine you have some type of “orders” table where most orders are in a “finished with” state, and you have a requirement to access the small number of orders in the “new” state. Here’s a sample data set to emulate this type of data requirement (created in 11.1.0.6, 1MB uniforma extents, freelist management and 8KB blocks).


create table t1 (
	state		varchar2(10),
	n1		number,
	v1		varchar2(10),
	padding	varchar2(100)
);

insert into t1
select
	decode(mod(rownum,100),0,'OPEN','CLOSED'),
	rownum,
	lpad(rownum,10,0),
	rpad('x',100,'x')
from
	all_objects
where
	rownum <= 5000
;

I’ve generated this data set so that every 100th row is marked as ‘OPEN’ and all the rest are marked as ‘CLOSED’ – in a real system the percentage of ‘OPEN’ orders would probably be much smaller so we could easily decide to have an index on state to give us an efficient access path to the open orders. But such an index would be very large, because it would also hold entries for the huge number of closed orders; we’d also have to create a histogram on the column (possibly by writing a simple script) so that Oracle could recognise the skewed data distribution.

If we wanted to be clever, though, and if we were able to edit the SQL that addressed this table, we could minimise the size of the index and avoid the need for a histogram by creating a function-based index that held values just for the rows where the state was ‘OPEN’. For example, I could create an index which held the order number only for those rows where the state was open; and there are several ways I could do this, for example:


create index t1_f1 on t1(decode(state,'CLOSED', to_number(null), n1 ));
create index t1_f2 on t1(to_number(decode(state,'CLOSED', null, n1 )));
create index t1_f3 on t1(case when state = 'CLOSED' then to_number(null) else n1 end);
create index t1_f4 on t1(to_number(case when state = 'CLOSED' then null else n1 end));
create index t1_f5 on t1(decode(state,'CLOSED', null, n1 ));
create index t1_f6 on t1(decode(state,'CLOSED', cast(null as number), n1 ));

If you’re wondering why I’ve included a “to_number()” in the first index, remember that NULL is implicitly assumed to be a NULL of type character by Oracle – so I’ve got to do something to tell Oracle that this NULL is supposed to be a numeric NULL. Index t1_f5 is the same as t1_f1, but without the to_number(), and index t1_f6 is the same again but using the more modern cast() to supply the conversion.

You’ll note that I haven’t yet shown any attempt to collect statistics. If we create the indexes AFTER we’ve collected stats on the table we’ll have to collect some extra table stats once the indexes exist because each function-based index will have added a new (hidden) column to the table and, although the “create index” commands will have created statistics for the indexes (from 10g onwards), we will not yet have stats on these hidden columns. So I’m going to wait until after creating the indexes to generate the stats:


begin
	dbms_stats.gather_table_stats(
		ownname		=> user,
		tabname		=>'T1',
		estimate_percent	=> 100,
		method_opt		=> 'for all columns size 1'
	);
end;
/

The question is now this – given the definitions of the indexes above, which of the following six queries – each one designed to be an exact match for one of the index definitions – will use “its” index. (Note that I have hinted the queries to ensure that if the optimizer is allowed to use an index it will use an index – and I’ve included the name of the relevant index as a comment at the end of each hint):


select
	/*+ index(t1) t1_f1 */
	v1
from
	t1
where
	decode(state,'CLOSED', to_number(null), n1 ) = 100
;

select
	/*+ index(t1) t1_f2 */
	v1
from
	t1
where
	to_number(decode(state,'CLOSED', null, n1 )) = 100
;

select
	/*+ index(t1) t1_f3 */
	v1
from
	t1
where
	case when state = 'CLOSED' then to_number(null) else n1 end = 100
;

select
	/*+ index(t1) t1_f4 */
	v1
from
	t1
where
	to_number(case when state = 'CLOSED' then null else n1 end) = 100
;

select
	/*+ index(t1) t1_f5 */
	v1
from
	t1
where
	decode(state,'CLOSED', null, n1 ) = 100
;

select
	/*+ index(t1) t1_f6 */
	v1
from
	t1
where
	decode(state,'CLOSED', cast(null as number), n1 ) = 100
;

The answer depends on the version of Oracle. Under Oracle 11.1.0.6 I got the following results. First, the attempt to create t1_f5 resulted in the following Oracle error (and that’s an important clue to what has happened in another part of the test):

create index t1_f5 on t1(decode(state,'CLOSED', null, n1 ))
                         *
ERROR at line 1:
ORA-01408: such column list already indexed

The index usage was as follows:

    t1_f1		not used	(decode)
    t1_f2		not used	(decode)
    t1_f3		used		(case)
    t1_f4		used		(case)
    t1_f5		non-existent - but used t1_f1
    t1_f6		used		(cast)
    

If you want it in a sound-bite: newer technologies do better than older technologies. But why do the results look the way they do ? You can find the answer in the index definitions that have been stored in the database:


column index_name format a10		heading "Index"
column column_position format 999	heading "Posn"
column column_expression format a72	heading "Expression"

select
	index_name, column_position, column_expression
from
	user_ind_expressions
where
	table_name = 'T1'
;

Index      Posn Expression
---------- ---- ------------------------------------------------------------------------
T1_F1         1 DECODE("STATE",'CLOSED',NULL,"N1")
T1_F2         1 TO_NUMBER(DECODE("STATE",'CLOSED',NULL,TO_CHAR("N1")))
T1_F3         1 CASE "STATE" WHEN 'CLOSED' THEN NULL ELSE "N1" END
T1_F4         1 TO_NUMBER(TO_CHAR(CASE "STATE" WHEN 'CLOSED' THEN NULL ELSE "N1" END ))
T1_F6         1 DECODE("STATE",'CLOSED',CAST(NULL AS number),"N1")

Compare the stored definition with the orginal definitions. Notice how the decodes and NULLs don’t work happily together.

In t1_f1 the explicit to_number() that I included has disappeared – that’s why I was unable to create index t1_f5 – its definition was identical to the modified t1_f1 definition. Then, of course, my predicate no longer matches the exact index definition.

In the t1_f2 definition, because NULL is implicitly character Oracle has added an explicit to_char() to the n1 column I supplied so that its type agrees with the NULL, thus allowing the final to_number() to work. So, again, my predicate no longer matches the index definition.

In t1_f3 and t1_f4 I didn’t include any explicit conversions, and Oracle didn’t add any implicit conversions – but if you look closely it has transformed the version of the case statement I supplied into the simpler form – and everything happened to work (there was an earlier version of Oracle where Oracle would do this transformation for the predicate at run time but not for the index at index creation time – with the result that the “specially created” index wouldn’t work.

Index t1_f5 was not created because my explicit definition matched Oracle’s implicit conversion of t1_f1 – and then my explicit rendition of the matching predicate allowed the optimizer to use index t1_f1.

Finally, with the cast() operator the decode() wasn’t “clever enough” to eliminate my explicit conversion, so the predicate matched the index definition and the index was used.

So the message is this – be careful how you define your function-based indexes, and check what Oracle has stored as the index definition before you commit too much effort to rewriting code to use your new index.

Footnote: Inevitably there are more questions you could ask to fill in further details here. For example, if you created a “genuine” virtual column in 11g using one of my “unusable” decode() expressions, and then indexed the virtual column, would Oracle use the index ? If I had included some cast() operators in my case expressions and corresponding predicates would Oracle still have been able to use the indexes or would I have found the index definitions and predicates were transformed differently and ceased to match ? Is the behaviour shown consistent across all popular versions of Oracle ? (the answer to that last one is No)

These questions (and others) are left as exercises for the interested reader to carry out in the privacy and quiet of their own workplaces.