After publishing my note about emulating star transformations when there was something blocking the optimizer’s ability to do them automatically I ended up having an interesting email conversation with Mark Farnham about variations on the theme, concluding with a completely different (and possibly more intuitive) SQL statement to achieve the same aim.
You will recall that I started off with the following statement:
select ord.* from products prd, customers cst, orders ord where prd.grp = 50 and cst.grp = 50 and ord.id_prd = prd.id and ord.id_cst = cst.id ;
Although this specific example is so simple that the optimizer could turn it into a star transformation automatically, I used it to demonstrate a “minimum work” version of the statement as follows (with the execution plan from 11.1.0.6):
select
ord.*
from
(
select
/*+
leading(prd ord)
use_nl(ord)
no_merge
*/
ord.rowid prid
from
products prd,
orders ord
where
prd.grp = 50
and ord.id_prd = prd.id
) prid,
(
select
/*+
leading(cst ord)
use_nl(ord)
no_merge
*/
ord.rowid crid
from
customers cst,
orders ord
where
cst.grp = 50
and ord.id_cst = cst.id
) crid,
orders ord
where
prid.prid = crid.crid
and ord.rowid = prid.prid
;
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 11 | 1650 | 570 |
| 1 | NESTED LOOPS | | 11 | 1650 | 570 |
|* 2 | HASH JOIN | | 11 | 264 | 559 |
| 3 | VIEW | | 3361 | 40332 | 277 |
| 4 | NESTED LOOPS | | 3361 | 87386 | 277 |
|* 5 | TABLE ACCESS FULL | CUSTOMERS | 98 | 882 | 81 |
|* 6 | INDEX RANGE SCAN | ORD_CST_FK | 34 | 578 | 2 |
| 7 | VIEW | | 3390 | 40680 | 281 |
| 8 | NESTED LOOPS | | 3390 | 88140 | 281 |
|* 9 | TABLE ACCESS FULL | PRODUCTS | 100 | 900 | 81 |
|* 10 | INDEX RANGE SCAN | ORD_PRD_FK | 34 | 578 | 2 |
| 11 | TABLE ACCESS BY USER ROWID| ORDERS | 1 | 126 | 1 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("PRID"."PRID"="CRID"."CRID")
5 - filter("CST"."GRP"=50)
6 - access("ORD"."ID_CST"="CST"."ID")
9 - filter("PRD"."GRP"=50)
10 - access("ORD"."ID_PRD"="PRD"."ID")
Go back to the original SQL statement, though. The select list contains columns from just the orders table, and there’s a fairly well-known suggestion (or possibly guideline) that “tables that are not in the select list should not appear in the from clause, they should appear in subqueries in the where clause”. It’s not difficult to come up with examples where this strategy is a bad idea – but it’s often worth thinking through the consequences of trying to apply it. (Of course, you often find that after you’ve rewritten your SQL to match the strategy the optimizer transforms it back into the join that you had been avoiding.)
Since the products and customers tables don’t appear in the select list, can I find a way of rewriting my statement with subqueries ? The answer is yes. Here’s the SQL, with the execution plan I got.
select
ord.*
from
orders ord
where
ord.rowid in (
select
/*+
no_use_hash_aggregation
*/
prid.prid
from
(
select /*+ no_merge */
ord.rowid prid
from
products prd,
orders ord
where
prd.grp = 50
and ord.id_prd = prd.id
) prid,
(
select /*+ no_merge */
ord.rowid crid
from
customers cst,
orders ord
where
cst.grp = 50
and ord.id_cst = cst.id
) crid
where
prid.prid = crid.crid
)
;
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 138 | 591 (1)| 00:00:08 |
| 1 | NESTED LOOPS | | 1 | 138 | 591 (1)| 00:00:08 |
| 2 | VIEW | VW_NSO_1 | 11 | 132 | 589 (1)| 00:00:08 |
| 3 | SORT UNIQUE | | 1 | 264 | | |
|* 4 | HASH JOIN | | 11 | 264 | 589 (1)| 00:00:08 |
| 5 | VIEW | | 3314 | 39768 | 294 (1)| 00:00:04 |
| 6 | NESTED LOOPS | | 3314 | 86164 | 294 (1)| 00:00:04 |
|* 7 | TABLE ACCESS FULL | PRODUCTS | 100 | 900 | 94 (2)| 00:00:02 |
|* 8 | INDEX RANGE SCAN | ORD_PRD_FK | 33 | 561 | 2 (0)| 00:00:01 |
| 9 | VIEW | | 3314 | 39768 | 294 (1)| 00:00:04 |
| 10 | NESTED LOOPS | | 3314 | 86164 | 294 (1)| 00:00:04 |
|* 11 | TABLE ACCESS FULL | CUSTOMERS | 100 | 900 | 94 (2)| 00:00:02 |
|* 12 | INDEX RANGE SCAN | ORD_CST_FK | 33 | 561 | 2 (0)| 00:00:01 |
| 13 | TABLE ACCESS BY USER ROWID| ORDERS | 1 | 126 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("PRID"."PRID"="CRID"."CRID")
7 - filter("PRD"."GRP"=50)
8 - access("ORD"."ID_PRD"="PRD"."ID")
11 - filter("CST"."GRP"=50)
12 - access("ORD"."ID_CST"="CST"."ID")
You’ll notice that this plan is remarkably similar to the plan I got from the joins with inline views – although the new plan has one extra “sort unique” operation at line 3. The optimizer has transformed my query by unnesting the subqueries and turning them into inline views – adding in a “distinct” operation because that’s what happens when you turn an “IN subquery” into a join when there isn’t a suitable uniqueness constraint on the join column.
There are two reasons for adopting this subquery approach. There’s the (highly subjective) argument that the code is a little easier to understand in this form, especially if you then want to join the orders table onwards to other tables. There’s also an objective argument relating to the “sort unique”. Note that I included the hint “no_use_hash_aggregation” to stop Oracle from using a “hash unique” operation at this point. By forcing Oracle to sort for uniqueness I know that the rowids will appear in (at least) an “extent-based” order rather than a completely random order. It’s possible that walking the fact table in physical order will be a little more efficient than accessing it in a completely randomised order. The former might find multiple rows in a block or benefit from some form of O/S or SAN read-ahead; the latter is more likely to turn one row into one random physical read.
One final thought – I said that you might want to take this type of approach for queries where the optimizer can’t do a star transformation automatically. There’s one very important case where this is always true – when you’re running Standard Edition, which doesn’t support bitmap indexes. So if you’re running SE and want to do star transformations – you now know how.
Footnote: I haven’t spent much time testing the limits of variations of this code although I did spend 30 minutes extending the example to use three dimension tables. If you think that this strategy might solve a production problem, think about using the sample code to create and test a better model of your production requirement before rewriting the production code – and if you go to production, check very carefully that the resulting code does produce the same results.
Recent comments
16 weeks 6 days ago
26 weeks 4 days ago
28 weeks 2 days ago
31 weeks 3 days ago
33 weeks 5 days ago
43 weeks 2 days ago
44 weeks 6 days ago
45 weeks 6 days ago
46 weeks 6 hours ago
48 weeks 5 days ago