Search

Top 60 Oracle Blogs

Recent comments

Misdirection

A recent post on the ODC database forum prompted me to write a short note about a trap that catches everyone from time to time. The trap is following the obvious; and it’s a trap because it’s only previous experience that lets you decide what’s obvious and the similarity between what you’re looking and your previous experience may be purely coincidental.

The question on OTN (paraphrased) was as follows:

When I run the first query below Oracle doesn’t use the index on column AF and is slow, but when I run the second query the Oracle uses the index and it’s fast. So when the input starts with ‘\\’ the indexes are not used. What’s going on ?


SELECT * FROM T WHERE AF = '\\domain\test\1123.pdf';
SELECT * FROM T WHERE AF = 'a\\domain\test\1123.pdf';

Looking at the two queries my first thought was that it’s obvious what’s (probably) happening, and my second thought was the more interesting question: “why does this person think that the ‘\\’ is significant ?”

The cause of the difference in behaviour is probably related to the way that Oracle stores statistics (specifically histograms) about character columns, and the way in which the cardinality calculations can go wrong.  If two character match over the first few characters the numeric representation of those strings that Oracle uses in a histogram is identical, and if they are long enough even the “actual value” stored would be identical. It looks as if this person is storing URLs, and it’s quite likely that there are a lot of long URLs that start with the same (long) string of characters – it’s a very old problem – and it’s an example of a column where you probably want to be absolutely sure that you don’t gather a histogram.

But why did the OP decide that the ‘\\’ was the significant bit ? I don’t know, of course, but  how about this:

  • No contrary tests: Perhaps every single time the query misbehaved the value started with ‘\\’ and it never went wrong for any other starting values. And maybe the OP tested several different domain names – it would be much easier to see the ‘\\’ as the common denominator rather than “repetitive leading character string” if you tested with values that spanned different domains.

combined with

  • An easily available “justification”: In many programming languages (including SQL) ‘\’ is an escape character – if you don’t really know much about how the optimizer works you might believe that that could be enough to confuse the optimizer.

It can be very difficult when you spot an obvious pattern to pause long enough to consider whether you’ve identified the whole pattern, or whether you’re looking at a special case that’s going to take you in the wrong direction.