I’ve written notes about the different joins in the past – but such things are always worth revisiting, so here’s an accumulated bundle of comments about hash joins.
Here’s a funny little problem I came across some time ago when setting up some materialized views. I have two tables, orders and order_lines, and I’ve set up materialized view logs for them that allow a join materialized view (called orders_join) to be fast refreshable. Watch what happens if I refresh this view just before gathering stats on the order_lines table.
I have a little script that start with “set echo on”, then calls two packaged procedures, one to refresh the join view, the other to collect stats on the order_lines table; here’s the output from that script:
Here’s a little detail that I hadn’t noticed before (and it goes back to at least 8i). This is running on 18.104.22.168, and table t1 is just all_objects where rownum <= 20000:
I’ve posted this note as a quick way of passing on an example prompted by a twitter conversation with Timur and Maria about Bloom filters:
— Jonathan Lewis (@JLOracle) August 5, 2013
The Bloom filter (capital B because it’s named after a person) is not supposed to appear in Oracle plans unless the query is executing in parallel but here’s an example which seems to use a serial Bloom filter. Running in 22.214.171.124 and 126.96.36.199 (the results shown are the latter – the numbers are slightly different between versions):
Continuing from the previous post, here is one more case when adjusting optimizer_index_cost_adj may hurt you.
drop table t1 cascade constraints purge; create table t1 (id, x, pad, constraint t1_pk primary key(id, x)) as select trunc(rownum/10) , mod(rownum, 10) , s1.text from all_source s1, all_source s2 where rownum <= 1e6; exec dbms_stats.gather_table_stats(user, 't1', method_opt=>'for all columns size 1', cascade=>true, no_invalidate=>false) alter session set optimizer_index_cost_adj = 100; alter session set optimizer_index_caching = 0; explain plan for select * from t1 where x = :1; @xp
Here’s the plan:
The clustering_factor is one of the most important numbers (if not the most important number) affecting the optimizer’s choice of execution plan – it’s the thing that has the most significant effect on the optimizer’s decision on whether to choose a table scan or an index, and on which index to choose.
I’ve mentioned “linear decay” in several posts when explaining a problem that someone has seen with an execution path – but I’ve recently realised that I don’t have a post describing what it is and how it works – although it’s in Cost Based Oracle – Fundamentals, of course, if you want some detail – so here’s a brief introduction (based on simple stats with no histograms).
A couple of days ago I wrote an article about Oracle picking the “wrong index” after an index rebuild, and I mentioned that the sample data I had generated looked a little odd because it came from a script I had been using to investigate a completely different problem. This note describes that other problem, which appeared on the Oracle-L mailing list last month.
Here’s a couple of extracts from a trace file after I’ve set optimizer_dynamic_sampling to level 3. I’ve run two, very similar, SQL statements that both require dynamic sampling according to the rules for the parameter – but take a look at the different ways that sampling has happened, and ask yourself what’s going on:
Statement 1 produced this sampling code:
In part 1 of this mini-series we looked at the effects of costing a tablescan serially and then parallel when the maxthr and slavethr statistics had not been set.
In part 2 we looked at the effect of setting just the maxthr - and this can happen if you don’t happen to do any parallel execution while the stats collection is going on.
In part 3 we’re going to look at the two variations the optimizer displays when both statistics have been set. So here are the starting system stats: