Search

OakieTags

Who's online

There are currently 0 users and 40 guests online.

Recent comments

Affiliations

Execution plans

Internal Views

When Oracle transforms a query it will sometimes report an internal view name in the execution plan – and I have seen people asking for help occasionally because they’ve tried to find the view name in dba_views and it hasn’t existed. So, just for reference, here’s a short list of the view names that might appear during query transformation but won’t exist in your system.

With each name I’ve put a brief comment of why they might appear:

index$_join$_#  Index hash join
vw_nso_#		Unnesting "IN" subquery
vw_nsq_#		Haven't seen one recently, but I think it's a variation on "IN" subqueries
vw_sq_#			Unnesting - possibly specific to correlated subqueries
vw_gbc_#		group by placement
vw_wif_#		Analytic functions (especially those generated by "remove aggregate subquery")
vm_nwvw_#		Note "VM" rather than "VW" - possibly related to complex view merging with distinct
vw_foj_#		Full Outer Join
vw_jf_????		Join Factorization
--
-- From Timur Akhmadeev (comment #2)
--
VW_CDG_%08X
VW_COL_%d          # Naming convention (somtimes) for columns used in generated views
VW_DAT_%08X        # distinct aggregate transform?
VW_DIG_%08X
VW_DIS_%08X        # distinct (similar to group by placement)?
VW_DTP_%08X
VW_FTG_%08X
VW_LT_%08X
VW_RIG_%08X
VW_RT_%08X
VW_%s_%d
VW_SJE_%08X
VW_%s_%s
VW_ST_%08X
VW_STJ_%08X
VW_TEX_%08X
--
-- From Tane Poder (comment #3)
--
VW_TE_#         Table expansion transformation
VW_DAG_#        Distinct aggregatation transform

Note: I’ve used ‘#’ to represent the possible appearance of a number, the items supplied by Timur Akhmedeev use the C “printf” conventions.

Originally I called for suggestions of the ones I’d failed to report – but the list is now much closer to completion. If you’d like to contribute any further names names, or fill in some of the missing causes I’ll add them to the list.

For reference notes for a couple of these cases:

ANSI Outer

Here’s an example of ANSI SQL that does something in a fashion that arguably looks a little tidier than the strategy you have to adopt in Oracle. As so often when I compare Oracle syntax and ANSI syntax it’s an exanple that relates to an outer join. We start with two tables – as usual I have locally managed tablespaces, 8KB blocks, 1MB uniform extents and freelist management. I’ve also disabled system statistics (CPU costing):


create table t1
as
select
	rownum - 1			id,
	mod(rownum - 1,20)		n1,
	lpad(rownum - 1,10,'0')		v1,
	rpad('x',100)			padding
from
	all_objects
where
	rownum <= 4000
;

create table t2
as
select
	rownum - 1			id,
	mod(rownum - 1,20)		n1,
	lpad(rownum - 1,10,'0')		v1,
	rpad('x',100)			padding
from
	all_objects
where
	rownum <= 4000
;

create index t1_i1 on t1(id);
create index t2_i1 on t2(id);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		estimate_percent => 100,
		method_opt 	 => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		estimate_percent => 100,
		method_opt 	 => 'for all columns size 1'
	);

end;
/

If you're familiar with ANSI SQL you won't need more than a couple of moments to interpret the following query - but I have to admit that I had to think about it carefully before I figured out what it was trying to achieve:


select
	/*+ gather_plan_statistics */
	t1.id,
	t1.n1,
	t1.v1,
	t2.n1
from
	t1
left join
	t2
on
	t2.id = t1.n1
and	t1.n1 in (7, 11, 13)
where
	t1.id = 15
;

        ID         N1 V1                 N1
---------- ---------- ---------- ----------
        15         15 0000000015

--------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------
|   1 |  NESTED LOOPS OUTER            |       |      1 |      1 |      1 |00:00:00.01 |       4 |
|   2 |   TABLE ACCESS BY INDEX ROWID  | T1    |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX RANGE SCAN            | T1_I1 |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   VIEW                         |       |      1 |      1 |      0 |00:00:00.01 |       0 |
|*  5 |    FILTER                      |       |      1 |        |      0 |00:00:00.01 |       0 |
|   6 |     TABLE ACCESS BY INDEX ROWID| T2    |      0 |      1 |      0 |00:00:00.01 |       0 |
|*  7 |      INDEX RANGE SCAN          | T2_I1 |      0 |      1 |      0 |00:00:00.01 |       0 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."ID"=15)
   5 - filter(("T1"."N1"=7 OR "T1"."N1"=11 OR "T1"."N1"=13))
   7 - access("T2"."ID"="T1"."N1")
       filter(("T2"."ID"=7 OR "T2"."ID"=11 OR "T2"."ID"=13))

Look carefully at the query. The ON clause includes a reference to the t1 table that is NOT a join condition. This condition means that Oracle will try to find a matching row in t2 only if the n1 value in t1 is in 7,11, or 13.

Since the join is left outer, any t1 row where id = 15 will be returned, but the n1 column from t2 will be reported only if the t1.n1 value is in 7,11, or 13 and there is a t2 row with a matching id value.

How would you express the same requirment in standard Oracle syntax ? Here's one possibility:

select
	/*+ gather_plan_statistics */
	t1.id,
	t1.n1,
	t1.v1,
	t2.n1
from
	t1, t2
where
	t1.id = 15
and	t2.id(+) = case
		when t1.n1 not in (7, 11, 13)
		 	then null
			else t1.n1
	end
;

        ID         N1 V1                 N1
---------- ---------- ---------- ----------
        15         15 0000000015

------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------
|   1 |  NESTED LOOPS OUTER          |       |      1 |      1 |      1 |00:00:00.01 |       4 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T2    |      1 |      1 |      0 |00:00:00.01 |       0 |
|*  5 |    INDEX RANGE SCAN          | T2_I1 |      1 |      1 |      0 |00:00:00.01 |       0 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."ID"=15)
   5 - access("T2"."ID"=CASE  WHEN (("T1"."N1"<>7) AND ("T1"."N1"<>11) AND
              ("T1"."N1"<>13)) THEN NULL ELSE "T1"."N1" END )

You will have noticed that I used the /*+ gather_plan_statistics */ hint in both queries, and the plans I've shown are the ones that I pulled from memory with their last execution statistics included.

The plans are clearly just a little different from each other - but are they equally efficient ?

Both plans start the same way - for each relevant row in t1 they call line 4 - and in both cases there is only one relevant row in t1, hence one call to line 4.

In the first plan we call the subroutine to create a rowsource (the VIEW operator), and this calls the FILTER operation. The filter operation is an example of a "conditional" filter - i.e. if the test in line 5 is true then line 6 is called - and in this case the test is false so line 6 is never called (Starts = 0) and Oracle doesn't try to access table t2. So we fetch one row from t1, then call two subroutines that between them do a "case" test but access no data blocks.

In the second plan line 4 calls the table access operation, which calls the index range scan operation in line 5 - but line 5 starts with a call to the case statement that returns NULL - so even though we call the index range scan operation, we don't access any data blocks, which means we don't pass any rowids to the table access in line 4, which means that that operation doesn't access any data blocks. The net workload is that lines 4 and 5 in the second plan also represent nothing more than two subroutine calls and a "case" test.

The two plans are virtually identical in resource usage - so your choice of which to use probably comes down to aesthetic appeal and comfort factor. Thanks to my previous habits I think the Oracle variant is "obviously" much easier to understand - but the client probably felt the opposite because they spent most of their time writing ANSI.

Is there another way to write the query - yes, there is, but for some reason it's not supported. If you look at the 10053 trace file for the ANSI example you'll find that Oracle has rewritten it with a LATERAL subquery before optimising it. So, presumably, we can appply the same transformation manually:


select
	/*+ gather_plan_statistics */
	t1.id,
	t1.n1,
	t1.v1,
	t2.n1
from
	t1,
	lateral (
		(
		select
			t2.n1
		from
			t2
		where
			t1.n1 in (7, 11, 13)
		and	t2.id = t1.n1
		)
	)(+) t2
where
	t1.id = 15
;

        lateral (
                *
ERROR at line 9:
ORA-00933: SQL command not properly ended

On second thoughts perhaps we can't - but it was a nice idea.

Oracle uses the table() operator with collections (or pipelined functions) as the equivalent of the ANSI LATERAL() operator, but doesn't let you use it with queries (you get Oracle error "ORA-22905: cannot access rows from a non-nested table item" if you try).

The concept is simple - the table() or lateral() operator allows you to introduce a subquery in the FROM clause that references columns from a table that appears earlier in the same FROM clause.

I don't know why Oracle doesn't support the lateral() operator in end-user code - but actually, for experimental purposes only of course, if you want to play with it you can always set event 22829:


alter session set events '22829 trace name context forever';

-- execute lateral query, and get this plan

--------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------
|   1 |  NESTED LOOPS OUTER            |       |      1 |      1 |      1 |00:00:00.01 |       4 |
|   2 |   TABLE ACCESS BY INDEX ROWID  | T1    |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  3 |    INDEX RANGE SCAN            | T1_I1 |      1 |      1 |      1 |00:00:00.01 |       3 |
|   4 |   VIEW                         |       |      1 |      1 |      0 |00:00:00.01 |       0 |
|*  5 |    FILTER                      |       |      1 |        |      0 |00:00:00.01 |       0 |
|   6 |     TABLE ACCESS BY INDEX ROWID| T2    |      0 |      1 |      0 |00:00:00.01 |       0 |
|*  7 |      INDEX RANGE SCAN          | T2_I1 |      0 |      1 |      0 |00:00:00.01 |       0 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."ID"=15)
   5 - filter(("T1"."N1"=7 OR "T1"."N1"=11 OR "T1"."N1"=13))
   7 - access("T2"."ID"="T1"."N1")
       filter(("T2"."ID"=7 OR "T2"."ID"=11 OR "T2"."ID"=13))

The plan is identical to the plan for the ANSI after transformation. I'll leave it to you to decide whether the code is easier to understand than the ANSI or ordinary Oracle versions - but I'd like to see it made legal, even if I didn't find many cases where I needed it.

Subquery Selectivity – 2

Here’s an item I thought I’d published a few years ago as a follow-up to an article on a 10g bug-fix for subquery selectivity. I was reminded of my oversight when a question came up on OTN that looked like an example of the bug introduced by the bug-fix – and I found that I couldn’t simply supply a link to explain the problem.

We start with some simple SQL to create a test data set:

create table t1
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 3000
)
select
	mod(rownum,729)		id1,
	rownum			id2,
	mod(rownum,11)		n1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1000000
;

alter table t1 add constraint t1_pk primary key(id1, id2);

create table t2
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 3000
)
select
	mod(rownum,737)		id1,
	trunc((rownum-1)/737)	id2,
	mod(rownum,11)		n1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1000000
;

alter table t2 add constraint t2_pk primary key(id1, id2);

-- gather statistics, compute, no histograms

Having created these data sets, we can run the following query against them. It’s a very simple query with aggregate subquery. I’ve included a no_unnest hint because many newer versions of Oracle would do a cost-based transformation on something this simple and unnest the subquery. I want to keep the code and numbers simple while demonstrating a principle. (In the earlier article I used a two-table join with two existence subqueries to make a point about how a change in arithmetic could cause a change in execution plan; in this article I’m just going to show you the change in arithmetic.)


select
	*
from
	t1
where
	1 = (
		select
			/*+ no_unnest */
			max(n1)
		from
			t2
		where
			t2.id1 = t1.id1
	)
;

Here are the execution plans from 9.2.0.8 and 11.1.0.6 respectively (the 10.2 plan matches the 11g plan). For consistency I am not using CPU costing (system statistics) in either case:


Execution Plan from 9.2.0.8
-----------------------------------------------------------------------------
| Id  | Operation                     |  Name       | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             | 10000 |  1201K|    13M|
|*  1 |  FILTER                       |             |       |       |       |
|   2 |   TABLE ACCESS FULL           | T1          | 10000 |  1201K|  2712 |
|   3 |   SORT AGGREGATE              |             |     1 |     7 |       |
|   4 |    TABLE ACCESS BY INDEX ROWID| T2          |  1357 |  9499 |  1363 |
|*  5 |     INDEX RANGE SCAN          | T2_PK       |  1357 |       |     6 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( (SELECT /*+ NO_UNNEST */ MAX("T2"."N1") FROM "T2" "T2" WHERE
              "T2"."ID1"=:B1)=1)
   5 - access("T2"."ID1"=:B1)

Execution Plan from 11.1.0.6 (matches plan from 10.2.0.3)
-----------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |    14 |  1722 |   996K|
|*  1 |  FILTER                       |       |       |       |       |
|   2 |   TABLE ACCESS FULL           | T1    |  1000K|   117M|  2712 |
|   3 |   SORT AGGREGATE              |       |     1 |     7 |       |
|   4 |    TABLE ACCESS BY INDEX ROWID| T2    |  1357 |  9499 |  1363 |
|*  5 |     INDEX RANGE SCAN          | T2_PK |  1357 |       |     6 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( (SELECT /*+ NO_UNNEST */ MAX("N1") FROM "T2" "T2" WHERE
              "T2"."ID1"=:B1)=1)
   5 - access("T2"."ID1"=:B1)

You will notice, of course, that the estimated cardinality of the result (line 0) is different, and the estimated cardinality of the scan on t1 (line 2) is different. There was an important code change introduced in 10g relating to the handling of subqueries that explains this effect.

In the previous article I pointed out that an existence subquery introduces a selectivity of 5% – and that the upgrade to 10g changed the point in the optimizer’s arithmetic where this 5% was applied (hence where the plan showed a change in cardinality). In the example above my code is of the form ” = (subquery) “, which means Oracle applies a “magic” 1% rather than 5%. (For a range-based predicate – i.e. <, > etc – it would have been the 5% that we got for existence.)

So in the 9i plan, line 2 shows a cardinality of 10,000 because that’s 1% of the original 1,000,000 rows but, as I pointed out in the earlier article, this means the optimizer will be working out the rest of the plan based on the estimated effect of a filter subquery that has not yet run. (Do read the previous article if you don’t follow that last comment.)

From 10g onwards (and in our 11.1 plan) the cardinality for line 2 is 1,000,000 because that’s how many rows will appear in the rowsource as the table is scanned. But then you have to ask why Oracle didn’t print a cardinality on line 1 (the FILTER line) because it’s at that line that we should see a cardinality of 10,000 as the filter selectivity of 1% is applied.

But there is another problem. The final cardinality of the 11.1 plan is NOT reported as 10,000; it’s reported as 14. As Oracle has fixed one defect in the optimizer it has introduced another (whether by accident, or by design, I don’t know). The estimate of 14 rows is effectively double-counting. The optimizer has applied the the 1% selectivity for the subquery, and then applied the selectivty of the predicate “t1.id1 = {unknown value}”. (If the predicate had been “> (subquery)” or some other range-based operator, or existence, then the arithmetic would have applied a 5% selectivity on top of that “t1.id1 = {unknown value}” selectivity.

In this case, of course, it probably doesn’t matter that the final cardinality is wrong – but in more complex cases this error may be introduced somewhere in the middle of the plan, making the optimizer do something very silly from that point onwards.

There is a surprising “workaround” to this issue – which may not be relevant in all cases and may not even work in some. If you force Oracle into running the subquery early (with the push_subq hint) the double-counting doesn’t occur:

select
	/*+ push_subq */	-- 9i hint
	*
from
	t1
where
	1 = (
		select
			/*+ no_unnest push_subq */	-- 10g+ hint (if you don't use query block naming)
			max(n1)
		from
			t2
		where
			t2.id1 = t1.id1
	)
;

-----------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       | 10000 |  1201K|  4075 |
|*  1 |  TABLE ACCESS FULL            | T1    | 10000 |  1201K|  2712 |
|   2 |   SORT AGGREGATE              |       |     1 |     7 |       |
|   3 |    TABLE ACCESS BY INDEX ROWID| T2    |  1357 |  9499 |  1363 |
|*  4 |     INDEX RANGE SCAN          | T2_PK |  1357 |       |     6 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( (SELECT /*+ PUSH_SUBQ NO_UNNEST */ MAX("N1") FROM "T2"
              "T2" WHERE "T2"."ID1"=:B1)=1)
   4 - access("T2"."ID1"=:B1)

It strikes me as a little bizarre that this hint has any effect at all in this case since there is no “earlier point” at which the subquery can run. But it has pushed the optimizer through a different code path, which has produced a plan with a different shape and avoided the double-counting.

Footnote: I’ve just included in the final query text a reminder that the push_subq hint changed in the upgrade from 9i to 10g. Originally it went at the top of the query and caused every subquery to be pushed; from 10g you have to be selective about the subqueries you want pushed. The hint at the top will be “ignored” and you apply the hint to each subquery that you want pushed by writing it in the subquery (as I have here) or by naming each subquery with the qb_name hint and then putting the push_subq(@{your subquery name}) hint(s) at the top of the query, naming the subqueries to be pushed.

Fake Baselines

SQL Baslines in 11g are the new Stored Outlines – and one of the nicest features of SQL Baselines is that you are allowed to fake them; or rather, it’s legal to generate an execution plan for one query and transfer its execution plan to another query using the packaged procedure dbms_spm.load_plans_from_cursor_cache(). This posting is a demonstration of the technique.

We start with a sample data set and a query that is going to do “the wrong thing”. As usual I’ve got a locally managed tablespace with 8KB blocks and 1MB uniform extents, freelist management, and I’m running with CPU Costing disabled (and running 11.1.0.6 in this case):


set serveroutput off

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	rownum			id,
	mod(rownum,1000)	n1000,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 10000
;

create index t1_n1 on t1(n1000);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'
	);
end;
/
select
	/*+ target_query */
	id
from
	t1
where
	n1000 = 500
;

The final query is the one that I want to fake a baseline for. With my current set up it does an index range scan to pick up 10 rows, but I’m going to make it do a tablescan instead. I’m going to need to pull the exact text of the query from memory in a moment, so I’ll find its sql_id and child_number by searching for the “pseudo-hint” that I’ve included in the text, and I’ll report the execution plan to show that I’ve picked up the right thing (I’m assuming that there’s only one piece of SQL that’s going to include the text “target_query”, of course):

column	sql_id			new_value	m_sql_id_1
column 	plan_hash_value		new_value	m_plan_hash_value_1
column	child_number		new_value	m_child_number_1

select
	sql_id, plan_hash_value, child_number
from
	v$sql
where
	sql_text like '%target_query%'
and	sql_text not like '%v$sql%'
and	rownum = 1
;

select * from table(dbms_xplan.display_cursor('&m_sql_id_1',&m_child_number_1));

SQL_ID  306m98cpu9yz7, child number 0
-------------------------------------
select  /*+ target_query */  id from  t1 where  n1000 = 500

Plan hash value: 1420382924

---------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |       |       |    11 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    10 |    80 |    11 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |    10 |       |     1 |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1000"=500)

The next step is to create a new query (and in a production system that might simply mean running a heavily hinted version of the target query) that uses the execution plan I want to see; and I’ll use the same search technique to find it and report its plan:

select
	/*+ full(t1) alternate_query */
	id
from
	t1
where
	n1000 = 500
;

column	sql_id			new_value	m_sql_id_2
column 	plan_hash_value		new_value	m_plan_hash_value_2
column	child_number		new_value	m_child_number_2

select
	sql_id, plan_hash_value, child_number
from
	v$sql
where
	sql_text like '%alternate_query%'
and	sql_text not like '%v$sql%'
and	rownum = 1
;

select * from table(dbms_xplan.display_cursor('&m_sql_id_2',&m_child_number_2));

SQL_ID  bvpb73bb6c6uy, child number 0
-------------------------------------
select  /*+ full(t1) alternate_query */  id from  t1 where  n1000 = 500

Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |    28 |
|*  1 |  TABLE ACCESS FULL| T1   |    10 |    80 |    28 |
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1000"=500)

After been running and reporting these queries and their plans, I’ve captured the SQL_Id, child_number, and plan_hash_value for each query; and this is more than enough information to make it possible to create an SQL Baseline for one query using the execution plan for the other query.

declare
	m_clob	clob;
begin
	select
		sql_fulltext
	into
		m_clob
	from
		v$sql
	where
		sql_id = '&m_sql_id_1'
	and	child_number = &m_child_number_1
	;

	dbms_output.put_line(m_clob);

	dbms_output.put_line(
		dbms_spm.load_plans_from_cursor_cache(
			sql_id 			=> '&m_sql_id_2',
			plan_hash_value		=> &m_plan_hash_value_2,
			sql_text		=> m_clob,
			fixed			=> 'NO',
			enabled			=> 'YES'
		)
	);

end;
/

I used the SQL_ID and child_number from the first query to get the full SQL text of the query into an in-memory CLOB, and then use the SQL_id and plan_hash_value from the second query to associate the second plan with the first query – storing the result as a SQL Baseline that is enabled and ready for use.

You’ll have to take my word that I haven’t faked the following text, but this is what I get when I re-run the original query (flushing the shared pool first to make sure that I don’t accidentally end up picking up the original child cursor):

alter system flush shared_pool;

select
	/*+ target_query */
	id
from
	t1
where
	n1000 = 500
;

select * from table(dbms_xplan.display_cursor('&m_sql_id_1',null));

SQL_ID  306m98cpu9yz7, child number 1
-------------------------------------
select  /*+ target_query */  id from  t1 where  n1000 = 500

Plan hash value: 3617692013

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |    28 |
|*  1 |  TABLE ACCESS FULL| T1   |    10 |    80 |    28 |
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1000"=500)

Note
-----
   - cpu costing is off (consider enabling it)
   - SQL plan baseline SYS_SQL_PLAN_3c0ea7f3dbd90e8e used for this statement

It’s so much easier than the fiddling one had to do for stored outlines which was quite easy in 8i, but got a little nasty in 9i because of the extra (undocumented) details that looked as if they might have been necessary when the third table appeared in the outln schema. However, in 10g, the dbms_outln package was enhanced to allow you to create outlines from the library cache – see this note from Randolf Geist for more details – but remember that stored outlines will be deprecated in Oracle 12.

Footnote: the dbms_spm package is created by script $ORACLE_HOME/rdbms/admin/dbmsspm.sql, and there is a comment near the top saying:


--           Grant the execute privilege on SPM package to public.           --
--           But ADMINISTER SQL MANAGEMENT OBJECT privilege is checked       --
--           before a user is allowed to execute.                            --

Cost – again

Browsing through some postings on Tony Hasler’s blog a little while ago I found this response to a note he had posted on some anomalies (i.e. bugs) in the costing of the “(min/max)” index scans:

My current understanding is it is not valid to try to compare costs across different queries (even if you just alter it by adding a hint). In general a better plan will have a lower cost but you cannot rely on this metric. The metric is really for the CBO to choose between alternative plans for this specific query, not to compare plans generated for different queries.

Now I know that this is a statement that pretty much paraphrases something that Tom Kyte wrote on AskTom several years ago – but it’s wrong. As I’ve pointed out in the past, “Cost is Time”. The cost of a query represents the optimizer’s estimate of how long it will take that query to run – so it is perfectly valid to compare the cost of two queries to see which one the optimizer thinks will be faster but, thanks to limitations and defects in the optimizer it may not be entirely sensible to do so.

The point I want to address in this post though is the comment that “it’s valid to compare the cost of different plans, but not to compare the cost of two different queries”. Consider the following queries:

select
	t1.v1
from
	t1
where	t1.id in (
		select
			t2.id
		from	t2
		where	t2.n1 = 15
	)
;

select
	t1.v1
from
	t1
where	exists (
		select
			t2.id
		from	t2
		where	t2.n1 = 15
		and	t2.id = t1.id
	)
;

select
	t1.v1
from
	(
		select	distinct t2.id
		from	t2
		where	t2.n1 = 15
	) t2,
	t1
where
	t1.id = t2.id
;

select
	v1.v1
from	(
	select
		distinct
			t1.rowid,
			t2.id,
			t1.v1
	from
		t2, t1
	where
		t2.n1 = 15
	and	t1.id = t2.id
	) v1
;

Tables t1 and t2 have the following definitions, so all four queries are logically equivalent:

Name                    Null?    Type
----------------------- -------- ----------------
ID                      NOT NULL NUMBER
N1                      NOT NULL NUMBER
V1                               VARCHAR2(6)
PADDING                          VARCHAR2(100)

According to the claim, it is not valid to compare the costs that the optimizer gives you for these four different queries – but they are the same query. In principle the optimizer might transform the IN to an EXISTS, it might simply unnest, it might unnest and merge – so when the optimizer is “comparing different costs for the same query”, it will also be “comparing costs of different queries”.

Cost IS time – but only in theory. The “trick” to sorting out optimization problems lies in recognising where the optimizer model is not right for your data, or the optimizer arithmetic is too simplistic or has a bug.

SQL Plan Baselines

Here’s another of my little catalogues of articles – this one on SQL Plan Baselines.

Be a little careful as you read through these notes – there are various changes in internal mechanisms, storage, etc. as you go through different versions of Oracle, so check which version the author is writing about.

Maria Colgan of Oracle Corp.

Christian Anognini:

Kerry Osborne

Tim Hall

Jason Arneil

Existence

Here’s a posting on OTN that demonstrates a piece of SQL that uses inline scalar subqueries which are all “existence” tests to produce (presumably) a set of flags describing the state of a particular item of data.

I’ve linked to it because I contributed a comment about the implications of the cost figures that appeared in the execution plan for two of the “exists” subqueries. Essentially “existence” is optimized as a “first_rows(1)” operation – which results in two lines of the plan showing two different costs for table scans of the same table.

Update 30th Dec:
If you follow the OTN note you’ll see that the original poster was confused by my comments about the relative costs of the two tablescans, so I’ve whipped up a quick script to clarify the point. It uses my typical “reproducible” setup of 1MB uniform extents, 8KB block size, freelist management and disabling system statistics. Here’s the starting data set:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	rownum			id,
	mod(rownum,1000)	n_1000,
	mod(rownum,5000)	n_5000,
	mod(rownum,10000)	n_10000,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 100000
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		estimate_percent => 100,
		method_opt 	 => 'for all columns size 1'

	);
end;
/

Notice that any given value of n_1000 appears every 1,000 rows, any given value of n_5000 appears every 5,000 rows, and any given value for n_10000 appears every 10,000 rows. Think about the effect this has on Oracle’s prediction of how much work it has to do when asked to find a particular row under first_rows(1) optimisation (which is the optimisation strategy triggered by the “rownum = 1″ predicates below):


set autotrace traceonly explain

select count(*) from t1;
select id from t1 where	n_1000  =  500 and rownum = 1;
select id from t1 where	n_5000  = 2500 and rownum = 1;
select id from t1 where	n_10000 = 5000 and rownum = 1;

set autotrace off

The more rows you have to scan (on average) to find a given value, the more costly the tablescan becomes. (The initial select count(*) is there to demonstrate Oracle’s estimate of the cost of scanning the whole table).

---------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost  |
---------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |   283 |
|   1 |  SORT AGGREGATE    |      |     1 |       |
|   2 |   TABLE ACCESS FULL| T1   |   100K|   283 |
---------------------------------------------------

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     9 |     5 |
|*  1 |  COUNT STOPKEY     |      |       |       |       |
|*  2 |   TABLE ACCESS FULL| T1   |     2 |    18 |     5 |
-----------------------------------------------------------
   1 - filter(ROWNUM=1)
   2 - filter("N_1000"=500)

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     9 |    16 |
|*  1 |  COUNT STOPKEY     |      |       |       |       |
|*  2 |   TABLE ACCESS FULL| T1   |     2 |    18 |    16 |
-----------------------------------------------------------
   1 - filter(ROWNUM=1)
   2 - filter("N_5000"=2500)

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     9 |    30 |
|*  1 |  COUNT STOPKEY     |      |       |       |       |
|*  2 |   TABLE ACCESS FULL| T1   |     2 |    18 |    30 |
-----------------------------------------------------------
   1 - filter(ROWNUM=1)
   2 - filter("N_10000"=5000)

And now, check the costs of the tablescans in the existence subqueries for the following:


select
	(select 1 from dual where exists (select null from t1 where n_1000  =  500)) n_1000,
	(select 1 from dual where exists (select null from t1 where n_5000  = 2500)) n_5000,
	(select 1 from dual where exists (select null from t1 where n_10000 = 5000)) n_10000
from
	dual
;

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |       |     2 |
|*  1 |  FILTER            |      |       |       |       |
|   2 |   FAST DUAL        |      |     1 |       |     2 |
|*  3 |   TABLE ACCESS FULL| T1   |     2 |     8 |     5 |
|*  4 |  FILTER            |      |       |       |       |
|   5 |   FAST DUAL        |      |     1 |       |     2 |
|*  6 |   TABLE ACCESS FULL| T1   |     2 |     8 |    16 |
|*  7 |  FILTER            |      |       |       |       |
|   8 |   FAST DUAL        |      |     1 |       |     2 |
|*  9 |   TABLE ACCESS FULL| T1   |     2 |     8 |    30 |
|  10 |  FAST DUAL         |      |     1 |       |     2 |
-----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ */ 0 FROM "T1" "T1" WHERE "N_1000"=500))
   3 - filter("N_1000"=500)
   4 - filter( EXISTS (SELECT /*+ */ 0 FROM "T1" "T1" WHERE "N_5000"=2500))
   6 - filter("N_5000"=2500)
   7 - filter( EXISTS (SELECT /*+ */ 0 FROM "T1" "T1" WHERE "N_10000"=5000))
   9 - filter("N_10000"=5000)

Index join – 4

In a recent note I wrote about index joins I made a passing comment about limitations in the optimizer’s available strategies that might make you choose to write your code to emulate an index join through explicit SQL references.

Here are two SQL similar SQL statements (with execution plans) that demonstrate the initial problem – the first is just a restatement of the basic example I supplied in the first article:

create table indjoin
as
select
	rownum	id,
	rownum	val1,
	rownum	val2,
	rpad('x',500) padding
from all_objects where rownum <= 3000
;

-- collect stats, compute, no histograms

create unique index ij_v1 on indjoin(id, val1);
create unique index ij_v2 on indjoin(id, val2);

select
	val1, val2
from
	indjoin		ij
where
	val1 between 100 and 200
and	val2 between 50 and 150
;

---------------------------------------------------------------------------
| Id  | Operation              | Name             | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                  |     3 |    24 |    24 |
|*  1 |  VIEW                  | index$_join$_001 |     3 |    24 |    24 |
|*  2 |   HASH JOIN            |                  |       |       |       |
|*  3 |    INDEX FAST FULL SCAN| IJ_V1            |     3 |    24 |    11 |
|*  4 |    INDEX FAST FULL SCAN| IJ_V2            |     3 |    24 |    11 |
---------------------------------------------------------------------------

select
	val1, val2, rowid
from
	indjoin		ij
where
	val1 between 100 and 200
and	val2 between 50 and 150
;

-----------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     3 |    60 |    17 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| INDJOIN |     3 |    60 |    17 |
|*  2 |   INDEX FULL SCAN           | IJ_V1   |   102 |       |     9 |
-----------------------------------------------------------------------

When we include the rowid in the query the optimizer stops using the index join – and it won’t even use the mechanism if we hint it. Apparently, for the purposes of analysing the query, Oracle doesn’t recognise the rowid as a column in the table and this automatically precludes the possibility of using the index join as the access method. So we have to use the manual rewrites I introduced in an earlier article.

You might wonder why this matters – but consider a case where a “perfect” index doesn’t exist for the following query:

select
	padding
from
	indjoin		ij
where
	val1 between 100 and 200
and	val2 between 50 and 150
;

The only access path available to the optimizer at this point is a fulll tablescan – but what if the two indexes are very small compared to the table; wouldn’t it be a good idea to use an index hash join between the two indexes to get a list of rowids and visit the table only for those rows. Unfortunately isn’t a path the optimizer can derive – so we might try something like:


select
	t.padding
from
	(
	select
		/*+
			index_join(ij ij_v1 ij_v2)
			no_merge
		*/
		rowid
	from
		indjoin		ij
	where
		val1 between 100 and 200
	and	val2 between 50 and 150
	)	v1,
	indjoin	t
where
	t.rowid = v1.rowid
;

But, as we’ve just seen, you can’t do an index join if you select the rowid, so this code won’t follow the strategy we want. (In fact, when I tried it, there was something distinctly bug-like about the plan – but I won’t go into that now). But we can do the following:


select
	t.padding
from
	(
	select
		rowid
	from
		indjoin		ij
	where
		val1 between 100 and 200
	)	v1,
	(
	select
		rowid
	from
		indjoin		ij
	where
		val2 between 50 and 150
	)	v2,
	indjoin	t
where
	v2.rowid = v1.rowid
and	t.rowid = v2.rowid
;

-----------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     3 |  1632 |    10 |
|   1 |  NESTED LOOPS               |         |     3 |  1632 |    10 |
|*  2 |   HASH JOIN                 |         |     3 |    96 |     7 |
|*  3 |    INDEX FAST FULL SCAN     | IJ_V1   |   102 |  1632 |     3 |
|*  4 |    INDEX FAST FULL SCAN     | IJ_V2   |   102 |  1632 |     3 |
|   5 |   TABLE ACCESS BY USER ROWID| INDJOIN |     1 |   512 |     1 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V2".ROWID="V1".ROWID)
   3 - filter("VAL1">=100 AND "VAL1"<=200)
   4 - filter("VAL2">=50 AND "VAL2"<=150)

It’s amazing what you can make the optimizer do (even without hinting) if you think about the mechanics underneath the basic operations.

[Further reading on Index Joins]

Join Surprise

Imagine I have a simple SQL statement with a “where clause” that looks like this:


	t2.id1(+) = t1.id1
and	t2.id2(+) = t1.id2

Would you expect it to run several times faster (25 minutes instead of a few hours) when the only change you made was to swap the order of the join predicates to:


	t2.id2(+) = t1.id2
and	t2.id1(+) = t1.id1

You may recall that a couple of years ago I wrote about some bugs in the optimizer, and pointed you to a blog article by Alberto Dell’Era that demonstrated an anomaly in cardinality calculations that made this type of thing possible. But here’s an example which has nothing to do with cardinality errors. We start with a suitable dataset – running on 11.1.0.6.


create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	trunc(dbms_random.value(1,1000))	id1,
	trunc(dbms_random.value(1,1000))	id2,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',1000)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 10000
;

create table t2
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 7
)
select
	t1.id1,
	t1.id2,
	v1.id,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',70)		padding
from
	t1		t1,
	generator	v1
;

-- collect stats, compute, no histograms

This data set models a problem – stripped to the bare essentials – that I came across at a client site some time ago. We have a “parent/child” relationship between the tables (although I haven’t declared the referential integrity), with roughly seven child rows per parent. The parent rows are quite long, the child rows are quite short. Some parents may not have children (although in this data set they do).

We now run a “report” that generates data for a number-crunching tool that extracts all the data from the tables – using an outer join so that parent rows don’t get lost. For various reasons the tool wanted the data sorted in a certain order – so there’s also an order by clause in the query. I’m going to show you the original query – first unhinted, and then hinted to use a merge join:


select
	t1.padding,
	t2.padding
from
	t1, t2
where
	t2.id1(+) = t1.id1
and	t2.id2(+) = t1.id2
order by
	t1.id2,
	t1.id1
;

---------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      | 10000 |    10M|       |  3720   (1)| 00:00:45 |
|   1 |  SORT ORDER BY         |      | 10000 |    10M|    22M|  3720   (1)| 00:00:45 |
|*  2 |   HASH JOIN RIGHT OUTER|      | 10000 |    10M|  6224K|  1436   (1)| 00:00:18 |
|   3 |    TABLE ACCESS FULL   | T2   | 70000 |  5400K|       |   260   (1)| 00:00:04 |
|   4 |    TABLE ACCESS FULL   | T1   | 10000 |  9853K|       |   390   (1)| 00:00:05 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T2"."ID1"(+)="T1"."ID1" AND "T2"."ID2"(+)="T1"."ID2")

select
	/*+ leading(t1 t2) use_merge(t2) */
	t1.padding,
	t2.padding
from
	t1, t2
where
	t2.id1(+) = t1.id1
and	t2.id2(+) = t1.id2
order by
	t1.id2,
	t1.id1
;

-------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 10000 |    10M|       |  6343   (1)| 00:01:17 |
|   1 |  SORT ORDER BY       |      | 10000 |    10M|    22M|  6343   (1)| 00:01:17 |
|   2 |   MERGE JOIN OUTER   |      | 10000 |    10M|       |  4059   (1)| 00:00:49 |
|   3 |    SORT JOIN         |      | 10000 |  9853K|    19M|  2509   (1)| 00:00:31 |
|   4 |     TABLE ACCESS FULL| T1   | 10000 |  9853K|       |   390   (1)| 00:00:05 |
|*  5 |    SORT JOIN         |      | 70000 |  5400K|    12M|  1549   (1)| 00:00:19 |
|   6 |     TABLE ACCESS FULL| T2   | 70000 |  5400K|       |   260   (1)| 00:00:04 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T2"."ID1"(+)="T1"."ID1" AND "T2"."ID2"(+)="T1"."ID2")
       filter("T2"."ID2"(+)="T1"."ID2" AND "T2"."ID1"(+)="T1"."ID1")

But there’s something a little odd about how the optimizer has chosen to do the merge join. Although our join condition references the join columns in the order (id1, id2) our final sort order is on (id2, id1) – and the optimizer hasn’t taken advantage of the fact that it could do the “sort join” operations in the order (id2, id1) and avoid the final “sort order by” at line 1.

So let’s rewrite the query to make the order of the join predicates match the order of the order by clause, and see what happens to the plan:


select
	/*+ leading(t1 t2) use_merge(t2) */
	t1.padding,
	t2.padding
from
	t1, t2
where
	t2.id2(+) = t1.id2
and	t2.id1(+) = t1.id1
order by
	t1.id2,
	t1.id1
;

------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |    10M|       |  4059   (1)| 00:00:49 |
|   1 |  MERGE JOIN OUTER   |      | 10000 |    10M|       |  4059   (1)| 00:00:49 |
|   2 |   SORT JOIN         |      | 10000 |  9853K|    19M|  2509   (1)| 00:00:31 |
|   3 |    TABLE ACCESS FULL| T1   | 10000 |  9853K|       |   390   (1)| 00:00:05 |
|*  4 |   SORT JOIN         |      | 70000 |  5400K|    12M|  1549   (1)| 00:00:19 |
|   5 |    TABLE ACCESS FULL| T2   | 70000 |  5400K|       |   260   (1)| 00:00:04 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."ID2"(+)="T1"."ID2" AND "T2"."ID1"(+)="T1"."ID1")
       filter("T2"."ID1"(+)="T1"."ID1" AND "T2"."ID2"(+)="T1"."ID2")

The plan no longer has the final “sort order by” operation – and the cost of the plan is much lower as a consequence.. You’ll also notice that the predicate sections (always check the predicate section) are a little different – the order of evaluation has been reversed.

In my test case the cost of the merge join still hasn’t fallen below the cost of the hash join – but in the case of the client changing the order of predicates – without adding any hints – made the cost of the merge join much cheaper than the cost of the hash join. Fortunately this was a case where the cost was a realistic indication of run time and avoiding a sort operation of some 35GB of join result was a very good move.

So watch out – with multi-column joins, the order of the join predicates can make a big difference to the way Oracle operates a merge join.

Quiz Night

I have four simple (non-partitioned, non-clustered, not views, not object type – really I’m not trying to be cunning or devious here) heap tables, and write a query that joins them:

select
	/*+
		leading(t1 t2 t3 t4)
		use_hash(t2) use_hash(t3) use_hash(t4)
	*/
	count(t1.small_vc),
	count(t2.small_vc),
	count(t3.small_vc),
	count(t4.small_vc)
from
	t1,
	t2,
	t3,
	t4
where
	t2.id2 = t1.id1
and	t3.id3 = t2.id2
and	t4.id4 = t3.id3
;

Given that I’ve specified the join order and every join method, and given that there are no indexes on any of the tables, how many different execution plans are there for this query.

Update: When I say I’m not trying to be cunning or devious, it means I’m not trying to be cunning or devious – it doesn’t mean that I’m being really, really cunning and devious. Honestly, I’m just trying to make a simple but important point that isn’t terribly well known.

Update 2: And when I said simple tables I should also have said they weren’t declared parallel, weren’t going to be hinted parallel, weren’t external or remote/non-oracle, etc. etc. etc.

The Answer:

The answer to the question I was trying to ask is eight – as stated then demonstrated by Pavol Babel and explained very succinctly by Greg Rahn.

The answer to the question I actually asked is four as stated by Vyacheslav Rasskazov. In my attempts to describe the situation as clearly and simply as possible I forgot about a special case that I will comment on at the end of this note.

Going back to the “expected” answer. The key point is this:

    You have NOT defined a hash join completely until you have specified which rowsource should be used as the build table and which as the probe table – so every time you supplyy the use_hash() hint for a table, you should also supply the swap_join_inputs() hint or the no_swap_join_inputs() hint.

So my original query is suffering from incomplete hinting. There are three hash joins, so there should be three hints about swapping inputs or not. For example:

/*+
	leading(t1 t2 t3 t4)
	use_hash(t2) no_swap_join_inputs(t2)
	use_hash(t3) no_swap_join_inputs(t3)
	use_hash(t4) no_swap_join_inputs(t4)
*/

Since there are two possibilites for the swap/no_swap option, there are 2 x 2 x 2 = 8 possibilities in total for the execution plan – even though only one join order is examined. (If you check the 10053 trace file for this query you will find all the computation for these execution plans under one line that reads: Join order [1], there will not be a Join order[2])

Pavol did list all the different patterns of hints with their execution plans – but I’m going to do it again, stripped to the minimum output:

use_hash(t2) no_swap_join_inputs(t2)
use_hash(t3) no_swap_join_inputs(t3)
use_hash(t4) no_swap_join_inputs(t4)

--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|*  3 |    HASH JOIN          |      |    70 |  1260 |     8 |
|*  4 |     HASH JOIN         |      |    70 |   840 |     5 |
|   5 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
|   6 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
|   7 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
|   8 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
--------------------------------------------------------------

use_hash(t2)    swap_join_inputs(t2)
use_hash(t3) no_swap_join_inputs(t3)
use_hash(t4) no_swap_join_inputs(t4)

--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|*  3 |    HASH JOIN          |      |    70 |  1260 |     8 |
|*  4 |     HASH JOIN         |      |    70 |   840 |     5 |
|   5 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
|   6 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
|   7 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
|   8 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
--------------------------------------------------------------

use_hash(t2) no_swap_join_inputs(t2)
use_hash(t3)    swap_join_inputs(t3)
use_hash(t4) no_swap_join_inputs(t4)

--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|*  3 |    HASH JOIN          |      |    70 |  1260 |     8 |
|   4 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
|*  5 |     HASH JOIN         |      |    70 |   840 |     5 |
|   6 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
|   7 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
|   8 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
--------------------------------------------------------------

use_hash(t2)    swap_join_inputs(t2)
use_hash(t3)    swap_join_inputs(t3)
use_hash(t4) no_swap_join_inputs(t4)

--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|*  3 |    HASH JOIN          |      |    70 |  1260 |     8 |
|   4 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
|*  5 |     HASH JOIN         |      |    70 |   840 |     5 |
|   6 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
|   7 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
|   8 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
--------------------------------------------------------------

use_hash(t2) no_swap_join_inputs(t2)
use_hash(t3) no_swap_join_inputs(t3)
use_hash(t4)    swap_join_inputs(t4)

--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|   3 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
|*  4 |    HASH JOIN          |      |    70 |  1260 |     8 |
|*  5 |     HASH JOIN         |      |    70 |   840 |     5 |
|   6 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
|   7 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
|   8 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
--------------------------------------------------------------

use_hash(t2)    swap_join_inputs(t2)
use_hash(t3) no_swap_join_inputs(t3)
use_hash(t4)    swap_join_inputs(t4)

--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|   3 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
|*  4 |    HASH JOIN          |      |    70 |  1260 |     8 |
|*  5 |     HASH JOIN         |      |    70 |   840 |     5 |
|   6 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
|   7 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
|   8 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
--------------------------------------------------------------

use_hash(t2) no_swap_join_inputs(t2)
use_hash(t3)    swap_join_inputs(t3)
use_hash(t4)    swap_join_inputs(t4)

--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|   3 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
|*  4 |    HASH JOIN          |      |    70 |  1260 |     8 |
|   5 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
|*  6 |     HASH JOIN         |      |    70 |   840 |     5 |
|   7 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
|   8 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
--------------------------------------------------------------

use_hash(t2)    swap_join_inputs(t2)
use_hash(t3)    swap_join_inputs(t3)
use_hash(t4)    swap_join_inputs(t4)

--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|   3 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
|*  4 |    HASH JOIN          |      |    70 |  1260 |     8 |
|   5 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
|*  6 |     HASH JOIN         |      |    70 |   840 |     5 |
|   7 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
|   8 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
--------------------------------------------------------------

Note the extreme change in shape and apparent order of tables in the plan. Despite this the join order really is t1 -> t2 -> t3 -> t4 in every case. I’ll give a quick description of the first and last plans to explain this.

First plan (no_swap all the way):

    We build a hash table from t1 and probe it with t2 to create a result set
    As this result set is generated we build a new hash table from it
    As the result set completes we discard the hash table from t1
    We probe the result set with t3 to create a second result set
    As the second result set is generated we build a new hash table from it
    As the second result set completes we discard the hash table from the first result set
    We probe the second result set with t4 to create a third result set
    As the third result set is generated we pass the results up to the aggregation step to count the output.

It is very obvious from this description that the tables are being joined in the order we dictated.
Last plan (swap all the way)

    We build a hash table from t4
    We build a hash table from t3
    We build a hash table from t2
    We pick a row from t1 and probe the t2 hash,
    if the row “survives” we probe the t3 hash
    if the row “survives” we probe the t4 hash
    if the row “survives” we pass it up to the aggregation step to be counted.

With this description it becomes clear that, once again, the tables are being joined in the order we dictated.

Notice: the number of in-memory hash tables we build in the first case is two and, no matter how many tables are involved in this pattern, the number of in-memory hash tables will always be two. The actual size of the two hash tables is a little unpredictable, though, and, a very crude guideline, you might expect the size to grow as more tables are joined into the result set. In the last case the number of in-memory hash tables we build is “N-1″ (where N is the number of tables joined). We can predict the approximate size of each hash table because it is based on the data we expect to extract from the corresponding “real” table. If you have enough memory to hold all the hash tables in memory at once you will find that this join pattern is likely to be the fastest option you can produce.

Footnote: although a hash join is not fully specified unless you have also supplied an associated “swap/no swap” hint, the no_swap_join_inputs() hint didn’t become available until 10g !

Footnote 2: I’ve had this note on my draft list for nearly a year now – after answering this question on OTN. Unfortunately I think it’s a bit too late to update the thread now.

Special Case

And now we come to the point made by Vyacheslav Rasskazov. I have changed the original code to hint the eight different paths that could come from a single join order with three hash joins. But if I were limited to changing the data (technically the statistics) I would only be able to persuade the optimizer to pick four of those paths for this join order. The four paths that the optimizer till not consider are the ones which I have hinted with /*+ swap_join_inputs(t2) – i.e. the 2nd, 4th, 6th and 8th above.

As far as I can tell the built-in decision process is this: when considering a hash join, the optimizer will examine the cost of swapping the inputs if the expected volume (i.e. predicted cardinality * row length) of the second input is smaller than the expected volume of the first input. However, in the special case of the first join of the first join order, this does not happen. This is fairly easy to demonstrate by setting up a two-table example where selected content is a small number of large rows from one table and a large number of small rows from the other table – the first join order is dictated by number of rows, swapping is dictated by volume of data. I suspect, but cannot confirm, that this is an accident (i.e. bug) since I can think of no rational explanation of why this has to happen.

Since my original text only allowed for one join order (the leading() hint) the optimizer will not examine the option for swapping the (t1, t2) step of that join order – which is why Vyacheslav’s answer is the answer to the question I actually asked.

My apologies – this wasn’t intended to be a cunning trap – I simply forgot that odd little glitch when I was writing up the question.

In passing, I have launched (yet another) poll in response to a question from a reader. In this case the poll is whether I should carry on blogging, or stop blogging for a year to write a book. To my mind it makes more sense to blog because it gets more information out in a more timely fashion – but I am curious to see what other people think.