In the previous post I've demonstrated an unexpected Nested Loop Join caused by an extreme data distribution. Although unexpected at first sight, the performance of the execution plan selected by the optimizer is decent - provided the estimates are in the right ballpark.Here is another case of an unexpected execution plan, this time about Merge Joins.
In order to appreciate why the execution plan encountered is unexpected, first a quick summary about how Merge Joins work:A Merge Join is essentially a Nested Loop operation from one sorted row source into another sorted row source.
Recently I came across some interesting edge cases regarding the costing of joins. They all have in common that they result in (at first sight) unexpected execution plans, but only some of them are actual threats to performance.
The first one is about outer joins with an extreme data distribution. Consider the following data setup:
create table t1
rownum as id
, rpad('x', 100) as filler
, case when rownum > 1e6 then rownum end as null_fk
level <= 1e6
exec dbms_stats.gather_table_stats(null, 't1')
create table t2
rownum as id
, rpad('x', 100) as filler
level <= 1e6
When performing aggregate GROUP BY operations an additional filter on the aggregates can be applied using the HAVING clause.Usually aggregates are one of the last steps executed before the final result set is returned to the client.However there are various reasons, why a GROUP BY operation might be somewhere in the middle of the execution plan operation, for example it might be part of a view that cannot be merged (or was hinted not to be merged using the NO_MERGE hint), or in the more recent releases (11g+) the optimizer decided to use the GROUP BY PLACEMENT transformation that deliberately can move the GROUP BY operation to a different execution step of the plan.In such cases, when the GROUP BY operation will be input to some other operation, it becomes essential for the overall efficiency of the execution plan preferred by the optimizer that the cardinality estimates are in the right ballpark, as it will influe
Here’s a question that is NOT a trick question, it’s demonstrating an example of optimizer behaviour that might come as a surprise.
I have an index (addr_id0050, effective_date), the first column is numeric, the second is a date. Here’s a query with an execution plan that uses that index:
A recent question on OTN asked how you could model a case where Oracle had the choice between a “perfect” index for a range scan and an index that could be used for an index skip scan and choose the latter path even though it was clearly (to the human eye) the less sensible choice. There have been a number of wierd and wonderful anomalies with the index skip scan and bad choice over the years, and this particular case is just one of many oddities I have seen in the past – so I didn’t think it would be hard to model one (in fact, I thought I already had at least two examples somewhere in my library – but I couldn’t find them).
One of the most useful information that the Oracle kernel attaches to plans in the library cache are measures of various resource consumption figures, such as elapsed time, consistent and current gets, disk reads, etcetera. These can be made available for each plan line (aka "row source operation").
These figures are always cumulative, that is, include both the resource consumed by the line itself and all of its progeny. It is very often extremely useful to exclude the progeny from the measure, to get what we could name the "self" figure (following, of course, the terminology introduced by Cary Millsap and Jeff Holt in their famous book Optimizing Oracle Performance).
The Enkitec Extreme Exadata Expo (E4) event is over, but I still have plenty to say about the technology. The event was a great success, with plenty of interesting speakers and presentations. As I said in a previous note, I was particularly keen to hear Frits Hoogland’s comments on Exadata and OLTP, Richard Foote on Indexes, and Maria Colgan’s comments on how Oracle is making changes to the optimizer to understand Exadata a little better.
All three presentations were interesting – but Maria’s was possiby the most important (and entertaining). In particular she told us about two patches for 126.96.36.199, one current and one that is yet to be released (unfortunately I forgot to take note of the patch numbers).
When I arrived in Edinburgh for the UKOUG Scotland conference a little while ago Thomas Presslie, one of the organisers and chairman of the committee, asked me if I’d sign up on the “unconference” timetable to give a ten-minute talk on something. So I decided to use Hybrid Columnar Compression to make a general point about choosing and testing features. For those of you who missed this excellent conference, here’s a brief note of what I said.
In a couple of days, on Wednesday, 1st of August, I'll be presenting another free webinar hosted at AllThingsOracle.com.
Although it is called "Oracle Cost-Based Optimizer Advanced Session", don't be mislead by the title.
It is not a truly "advanced" session, but rather I'll try to delve into various topics that I could only mention briefly or had to omit completely during the first webinar on the Cost-Based Optimizer.
In principle it's going to be a selection of the most recurring issues that I come across during my consultancy work:
- I'm going to spend some time on statistics and histograms in particular and what I believe are the most important aspects to understand regarding them
A comment on a recent post of mine pointed me to a question on the OTN SQL and PL/SQL Forum where someone had presented a well-written test case of an odd pattern of behaviour in ANSI SQL. I made a couple of brief comments on the thread, but thought it worth highlighting here as well. The scripts to create the required tables (plus a few extras) are all available on OTN. If you create only the four tables needed and all their indexes you will need about 1.3GB of space.