Search

Top 60 Oracle Blogs

Recent comments

Datatype conversions – strange internal function

Perhaps the most famous (or infamous) performance tuning problem you’ll find spanning decades of blog posts when it comes to Oracle and SQL is the “mystery” of why Oracle is not using an index when it is totally obvious to us as the developer that it should be.  The demo code is always along the following lines


SQL> select *
  2  from   t
  3  where  rownum <= 10;

PK         DATA
---------- ---------------------------------------------------------------
1          I_FILE#_BLOCK#
2          I_OBJ3
3          I_TS1
4          I_CON1
5          IND$
6          CDEF$
7          C_TS#
8          I_CCOL2
9          I_PROXY_DATA$
10         I_CDEF4

10 rows selected.

SQL>
SQL> select index_name
  2  from   user_indexes
  3  where  table_name = 'T';

INDEX_NAME
------------------------------
IX

SQL>
SQL> select column_name
  2  from   user_ind_columns
  3  where  index_name = 'IX';

COLUMN_NAME
------------------------------
PK

SQL>
SQL> set autotrace traceonly explain
SQL> select *
  2  from t
  3  where pk = 12;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    38 |   152   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    38 |   152   (2)| 00:00:01 |
--------------------------------------------------------------------------


SQL> select /*+ index(t ix) */ *
  2  from t
  3  where pk = 12;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    38 |   152   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    38 |   152   (2)| 00:00:01 |
--------------------------------------------------------------------------

It just looks like Oracle is broken doesn’t it? There is my primary key column, it is indexed (and rest assured I have not pulled any stunts like making the index unusable or invisible) and yet the database steadfastly refuses to use it, even when I force it with a hint.

Savvy readers will spot something in the output the gives the true cause. Notice that the output for the SELECT on the table shows that the PK column data is left justified. A DESCRIBE command gives the game away.


SQL> desc t
 Name                          Null?    Type
 ----------------------------- -------- --------------
 PK                                     VARCHAR2(40)
 DATA                                   VARCHAR2(128)

The PK column is a VARCHAR2 column even though it is holding numeric data. If I include the entire output from the execution plan, you can immediately see why an index was not used, because the database silently added a TO_NUMBER function around the PK column in order to align the datatypes.


SQL> select /*+ index(t ix) */ *
  2  from t
  3  where pk = 12;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    38 |   152   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    38 |   152   (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TO_NUMBER("PK")=12)

As I mentioned, this is a well known issue and many of us can now spot it immediately when it occurs.

But a customer approached me recently with a performance tuning issue that did not appear to be related to this (but as we’ll soon see, my assumption was incorrect). This time I’ll make it obvious that the PK is indeed a string by prefixing every value with “str”, and then I’ll re-run the same demo, this time with a bind variable


SQL> create table t
  2  as select 'str'||to_char(rownum) pk, object_name data
  3  from dba_objects;

Table created.

SQL>
SQL> create index ix on t ( pk );

Index created.

SQL>
SQL> select *
  2  from   t
  3  where  rownum <= 10;

PK         DATA
---------- ----------------------------------------------------------------------------------
str1       I_FILE#_BLOCK#
str2       I_OBJ3
str3       I_TS1
str4       I_CON1
str5       IND$
str6       CDEF$
str7       C_TS#
str8       I_CCOL2
str9       I_PROXY_DATA$
str10      I_CDEF4

10 rows selected.

SQL>
SQL> select index_name
  2  from   user_indexes
  3  where  table_name = 'T';

INDEX_NAME
------------------------------
IX

SQL>
SQL> select column_name
  2  from   user_ind_columns
  3  where  index_name = 'IX';

COLUMN_NAME
------------------------------
PK

SQL>
SQL> set autotrace traceonly explain
SQL> exec :input := 'str123'

PL/SQL procedure successfully completed.

SQL> select *
  2  from t
  3  where pk = :input;

Execution Plan
----------------------------------------------------------
Plan hash value: 3947747388

--------------------------------------------------------------------------------------------
| Id  | Operation                           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |      |     1 |    41 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T    |     1 |    41 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IX   |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

So far everything looks fine – the index is being used as expected.

But is it? What happens when I execute the query?


SQL> select *
  2  from t
  3  where pk = :input;

PK         DATA
---------- ----------------------------------------------------------------
str123     OPQTYPE$

SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
------------------------
SQL_ID  9nmz653d6u2wm, child number 0
-------------------------------------
select * from t where pk = :input

Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   163 (100)|          |
|*  1 |  TABLE ACCESS FULL| T    |   880 | 36080 |   163   (2)| 00:00:01 |
--------------------------------------------------------------------------

I have gone back to scanning the entire table, and yet it is obvious here that everything here is strings and no number datatypes to confuse the database.

Before I reveal the answer, because as the blog post title suggests, this is also a datatype conversion issue, I’ll digress slightly as a “cautionary tale” about tuning SQL. When the customer first approached me, they were requesting “assistance with stored outlines and Exadata smart scan”. They were encountered the full table scan as shown above and were trying apply a stored outline in order to force an index. Because that was not working, they were then asking me for information about how the optimized costed full scans under Exadata and whether that was impacting the decision and so forth.

So what did I do? I took a narrow view and spent a good couple of hours or so exploring potential issues with stored outlines and smart scan, and of course, there was absolutely nothing wrong with stored outlines and smart scan!

Let me reveal more of the demo code to show the true cause.


SQL> variable input nvarchar2(10)
SQL> exec :input := 'str123'

SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------
SQL_ID  9nmz653d6u2wm, child number 0
-------------------------------------
select * from t where pk = :input

Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   163 (100)|          |
|*  1 |  TABLE ACCESS FULL| T    |   880 | 36080 |   163   (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(SYS_OP_C2C("PK")=:INPUT)

The bind variable I was using was not a VARCHAR2 but NVARCHAR2. That subtle difference means we have a datatype mismatch, just like the common NUMBER versus VARCHAR2 mismatch demo. Thus the database has to perform an conversion to bring them into alignment. In this case, the internal function SYS_OP_C2C is used to change the VARCHAR2 values in the PK column to an NVARCHAR2.

For this customer, they had not explicitly been using NVARCHAR2 so this was a revelation to them. It turned out that some of their driver software automatically bound variables as NVARCHAR2 when the database characterset was UTF8.

So there are two lessons to take away here:

  1. Even if all of the inputs to your SQL appear to be strings, that is still not a guarantee that you won’t get caught by the datatype conversion trap. Always check the predicates section of your SQL plan to ensure no silent conversions are taking place.
  2. When someone requests your help, don’t fall into the trap of focussing your attention on what they say is the problem. Always take that time to step back, and take a look at the root issue and work from there. Sometimes you might end up walking down some same paths that have already been taken, but at least your vision will not be clouded by assumptions (and/or errors) made by others.

For this customer, we temporarily created an index on the expression SYS_OP_C2C(their_column) which solved the performance issue, until such stage as they could roll out a new version of their code which correctly bound their variables to the right datatypes.