Search

Top 60 Oracle Blogs

Recent comments

statistics

Join Cardinality – 5

So far in this series I’ve written about the way that the optimizer estimates cardinality for an equijoin where one end of the join has a frequency histogram and the other end has a histogram of type:

Upgrades – again

I’ve got a data set which I’ve recreated in 11.2.0.4 and 12.2.0.1.

I’ve generated stats on the data set, and the stats are identical.

I don’t have any indexes or extended stats, or SQL Plan directives or SQL Plan Profiles, or SQL Plan Baselines, or SQL Patches to worry about.

I’m joining two tables, and the join column on one table has a frequency histogram while the join column on the other table has a height-balanced histogram.  The histograms were created with estimate_percent => 100%. (which explains why I’ve got a height-balanced histogram in 12c rather than a hybrid histogram.)

Here are the two execution plans, 11.2.0.4 first, pulled from memory by dbms_xplan.display_cursor():

Join Cardinality – 4

In previous installments of this series I’ve been describing how Oracle estimates the join cardinality for single column joins with equality where the columns have histograms defined. So far I’ve  covered two options for the types of histogram involved: frequency to frequency, and frequency to top-frequency. Today it’s time to examine frequency to hybrid.

My first thought about this combination was that it was likely to be very similar to frequency to top-frequency because a hybrid histogram has a list of values with “repeat counts” (which is rather like a simple frequency histogram), and a set of buckets with variable sizes that could allow us to work out an “average selectivity” of the rest of the data.

Upgrade threat

Here’s one I’ve just discovered while trying to build a reproducible test case – that didn’t reproduce because an internal algorithm has changed.

If you upgrade from 12c to 18c and have a number of hybrid histograms in place you may find that some execution plans change because of a change in the algorithm for producing hybrid histograms (and that’s not just if you happen to get the patch that fixes the top-frequency/hybrid bug relating to high values).

Here’s a little test to demonstrate how I wasted a couple of hours trying to solve the wrong problem – first a simple data set:

Column Groups

Sometimes a good thing becomes at bad thing when you hit some sort of special case – today’s post is an example of this that came up on the Oracle-L listserver a couple of years ago with a question about what the optimizer was doing. I’ll set the scene by creating some data to reproduce the problem:

Hybrid Fake

Oracle 12c introduced the “Hybrid” histogram – a nice addition to the available options and one that (ignoring the bug for which a patch has been created) supplies the optimizer with better information about the data than the equivalent height-balanced histogram. There is still a problem, though, in the trade-off between accuracy and speed: just as it does with height-balanced histograms when using auto_sample_size Oracle samples (typically) about 5,500 rows to create a hybrid histogram, and the SQL it uses to generate the necessary summary is essentially an aggregation of the sample, so either you have a small sample with the risk of lower accuracy or a large sample with an increase in workload.

Join Cardinality – 3

In the previous posting I listed the order of precision of histograms as:

Join Cardinality – 2

In the previous note I posted about Join Cardinality I described a method for calculating the figure that the optimizer would give for the special case where you had a query that:

Join Cardinality

Following up my “Hacking for Skew” article from a couple of days ago, Chinar Aliyev has written an article about a method for persuading the optimizer to calculate the correct cardinality estimate without using any undocumented, or otherwise dubious, mechanisms.

Hacking for Skew

In my presentation to the UKOUG SIG yesterday “Struggling with Statistics – part 2” I described a problem that I wrote about a few months ago: when you join a fact table with a massively skewed distribution on one of the surrogate key columns to a dimension holding the unique list of keys and descriptions a query against a description “loses” the skew. Here’s an demo of the problem that’s a little simpler than the one in the previous article.