Search

Top 60 Oracle Blogs

Recent comments

Index Joins

Index Hash

I’m afraid this is one of my bad puns again – an example of the optimizer  making a real hash of the index hash join. I’m going to create a table with several indexes (some of them rather similar to each other) and execute a query that should do an index join between the obvious two indexes. To show how obvious the join should be I’m going to start with a couple of queries that show the cost of simple index fast full scans.

Here’s the data generating code:

Partitioned Bitmaps

The following question appeared in a comment to an earlier posting on multi-column bitmap indexes and the inability of Oracle to create a bitmap index join when (to the human eye) the strategy was an obvious choice.

    I have a query which is using 2 indexes both are bitmap indexes (sizes are 37 and 24 Mbs) and table size is 17gb. While i ran the following query which can very well get the index itself, it takes around 6-8 minutes and using pga around 3 gb.

could you please explain me why ?

SQL_ID  5z0a50a2yzdky, child number 0
-------------------------------------
select count(1) from (select distinct SRC_SYS,PROD_KEY from dds.REV_F)

Plan hash value: 867747470

--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                      |       |       |   221K(100)|          |       |       |
|   1 |  SORT AGGREGATE                   |                      |     1 |       |            |          |       |       |
|   2 |   VIEW                            |                      | 24533 |       |   221K  (6)| 00:44:22 |       |       |
|   3 |    HASH UNIQUE                    |                      | 24533 |   479K|   221K  (6)| 00:44:22 |       |       |
|   4 |     VIEW                          | index$_join$_002     |    63M|  1209M|   212K  (2)| 00:42:28 |       |       |
|*  5 |      HASH JOIN                    |                      |       |       |            |          |       |       |
|   6 |       PARTITION LIST ALL          |                      |    63M|  1209M|  3591   (1)| 00:00:44 |     1 |   145 |
|   7 |        BITMAP CONVERSION TO ROWIDS|                      |    63M|  1209M|  3591   (1)| 00:00:44 |       |       |
|   8 |         BITMAP INDEX FULL SCAN    | REV_F_IDX1           |       |       |            |          |     1 |   145 |
|   9 |       PARTITION LIST ALL          |                      |    63M|  1209M| 13724   (1)| 00:02:45 |     1 |   145 |
|  10 |        BITMAP CONVERSION TO ROWIDS|                      |    63M|  1209M| 13724   (1)| 00:02:45 |       |       |
|  11 |         BITMAP INDEX FULL SCAN    | REV_F_IDX5           |       |       |            |          |     1 |   145 |
--------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access(ROWID=ROWID)

28 rows selected.

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2    610.89    1464.86     707459      17090          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4    610.90    1464.87     707459      17090          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=17090 pr=707459 pw=446115 time=1464867976 us)
  26066   VIEW  (cr=17090 pr=707459 pw=446115 time=1464795748 us)
  26066    HASH UNIQUE (cr=17090 pr=707459 pw=446115 time=1464769678 us)
63422824     VIEW  index$_join$_002 (cr=17090 pr=707459 pw=446115 time=1084846545 us)
63422824      HASH JOIN  (cr=17090 pr=707459 pw=446115 time=958000889 us)
63422824       PARTITION LIST ALL PARTITION: 1 145 (cr=3561 pr=0 pw=0 time=63423134 us)
63422824        BITMAP CONVERSION TO ROWIDS (cr=3561 pr=0 pw=0 time=9554 us)
   7112         BITMAP INDEX FULL SCAN REV_F_IDX1 PARTITION: 1 145 (cr=3561 pr=0 pw=0 time=155525 us)(object id 366074)
63422824       PARTITION LIST ALL PARTITION: 1 145 (cr=13529 pr=8864 pw=0 time=190268771 us)
63422824        BITMAP CONVERSION TO ROWIDS (cr=13529 pr=8864 pw=0 time=63553723 us)
 432700         BITMAP INDEX FULL SCAN REV_F_IDX5 PARTITION: 1 145 (cr=13529 pr=8864 pw=0 time=3157351 us)(object id 366658)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  direct path write temp                      29741        1.62        107.05
  db file sequential read                      8864        0.20          2.35
  direct path read temp                       46573        0.79        211.02
  SQL*Net message from client                     2       29.22         29.22

In this case Oracle is clearly doing the bitmap join, but it’s managing to do something very inefficient. The problem lies in the partitioning or, to be more precise, Oracle’s failure to take advantage of partitioning. The OP complains of using 3GB of memory and several minutes of elapsed time. The plan shows us that the we have 145 partitions (PARTITION LIST ALL PARTITION: 1 145), and we have been told that the table is about 17GB is size, so the “average” partition is about 120MB – so why isn’t Oracle using a partition-wise approach and processing the data one partition at a time ? The answer is simple – it can’t be done (at present).

An index join works by doing a hash join between rowids – and since we are using bitmap indexes we have to convert bitmaps to rowids as part of the plan. In this query we then want to count the number of distinct combintations of (SRC_SYS,PROD_KEY) – and the same combination may appear in different partitions, so Oracle has had to generate a plan that handles the entire data set in a single join rather than trying to handle each partition separately (notice how the “partition list all” operator appears twice, once for each index).

The “Row Source Operation” tells us we only had to scan a few thousand block, but we have to build a hash table of 64 million entries:

63422824        BITMAP CONVERSION TO ROWIDS (cr=3561 pr=0 pw=0 time=9554 us)

At 10 bytes per rowid (for a partitioned table), plus the length of the input column, plus linked list overheads (8 bytes per pointer) you can see the 3 GB beginning to appear out of the volume of work being done. And then the Oracle dumped the whole thing to disc (perhaps in anticipation of the impending “hash unique”, perhaps because it still needed to do a one-pass operation – it would have been interesting to see the full row source execution statistics from a call to dbms_xplan.display_cursor())

What we need to see is a plan more like the following:

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                      |       |       |   221K(100)|          |       |       |
|   1 |  SORT AGGREGATE                    |                      |     1 |       |            |          |       |       |
|   2 |   VIEW                             |                      | 24533 |       |   221K  (6)| 00:44:22 |       |       |
|   3 |    HASH UNIQUE                     |                      | 24533 |   479K|   221K  (6)| 00:44:22 |       |       |
|   4 |     PARTITION LIST ALL             |                      |    63M|  1209M|  3591   (1)| 00:00:44 |     1 |   145 |
|*  5 |      HASH UNIQUE                   |                      |       |       |            |          |       |       |
|   6 |       VIEW                         | index$_join$_002     |    63M|  1209M|   212K  (2)| 00:42:28 |       |       |
|*  7 |        HASH JOIN                   |                      |       |       |            |          |       |       |
|   8 |         BITMAP CONVERSION TO ROWIDS|                      |    63M|  1209M|  3591   (1)| 00:00:44 |       |       |
|   9 |          BITMAP INDEX FULL SCAN    | REV_F_IDX1           |       |       |            |          |     1 |   145 |
|  10 |         BITMAP CONVERSION TO ROWIDS|                      |    63M|  1209M| 13724   (1)| 00:02:45 |       |       |
|  11 |          BITMAP INDEX FULL SCAN    | REV_F_IDX5           |       |       |            |          |     1 |   145 |
-------------------------------------------- ------------------------------------------------------------------------------

Line 4 shows us that we do something for each partition in turn. The sub-plan to line 4 tells us that we are collecting the unique combinations of (SRC_SYS,PROD_KEY) for a given partition. Line 3 tells us that we are collating the results from the different partitions and generating the set of values that is unique across all partitions.

The problem is this: can we engineer a strange piece of SQL that makes plan appear – because the optimizer isn’t going to do it automatically (yet).

Obviously it would be pretty easy to write some sort of solution using pl/sql and pipelined functions - perhaps a function that takes a table_name loops through each partition of the table in turn returning the distinct combinations for that partition, as this would allow you to “select count(distinct(…)) from table_function(…);” to get your result.

You might be able to avoid pl/sql by creating a piece of SQL joining the table’s metadata to the table by partition identifier – except you would probably need to use a lateral view, which Oracle doesn’t support, and make the partition extended syntax part of the lateral dependency .. which Oracle definitely doesn’t support.

So is there an alternative, purely SQL, strategy ?

I’m thinking about it – I have a cunning plan, but I haven’t had time to test it yet.

to be continued …

Update (7th July 2011):

I’ve finally managed to make a bit of time to write up my notes about this – and in the interim Randolf Geist has said everything that needs to be said, and the OP has pointed out that a full tablescan works faster anyway. However, here goes …

My cunning plans involved finding ways of forcing Oracle into doing a single partition hash join for each partition in turn by playing clever  tricks with dbms_mview.pmarker or the dba_tab_partitions view, or even the tbl$or$idx$part$num() function. But I realised I was in trouble as soon as I wrote the first  little test that was supposed to do an index hash join on a single explicitly referenced partition. Here’s my table description, index definitions, query and execution plan:

SQL> desc t1
 Name                    Null?    Type
 ----------------------- -------- ----------------
 NVCUSTATUS                       VARCHAR2(20)        -- list partitined on this column, multiple values in some partitions
 FREQ_COLUMN                      NUMBER(4)
 HBAL_COLUMN                      NUMBER(8)
 RAND_COLUMN                      NUMBER(8)

create bitmap index t1_freq on t1(freq_column) local;
create bitmap index t1_hbal on t1(hbal_column) local;

select
	count(1)
from	(
	select
		/*+index_join(t1) */
		distinct freq_column, hbal_column
	from t1 partition (p6)
	)
;

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |   894K|  6112K|  2966   (2)| 00:00:36 |       |       |
|   1 |  VIEW                          | index$_join$_001 |   894K|  6112K|  2966   (2)| 00:00:36 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION LIST SINGLE       |                  |   894K|  6112K|   208   (0)| 00:00:03 |   KEY |   KEY |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |   894K|  6112K|   208   (0)| 00:00:03 |       |       |
|   5 |      BITMAP INDEX FULL SCAN    | T1_FREQ          |       |       |            |          |     6 |     6 |
|   6 |    PARTITION LIST SINGLE       |                  |   894K|  6112K|   299   (0)| 00:00:04 |   KEY |   KEY |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |   894K|  6112K|   299   (0)| 00:00:04 |       |       |
|   8 |      BITMAP INDEX FULL SCAN    | T1_HBAL          |       |       |            |          |     6 |     6 |
-------------------------------------------------------------------------------------------------------------------

Even though we address a single partition explicitly, and even though the optimizer recognises it as partition 6 in lines 5 and 8, the plan isn’t doing a “partition-wise” join between the two indexes – the hash join calls two independent partition operations. In effect, Oracle is joining two different tables, and there isn’t a join condition on the partitioning column.

If we change the index definitions, though, to append the partitioning column (and given the small number of distinct values in each partition this didn’t affect the index size by much), and then rewrite the query to include the join, we get a better result – but only if we do a manual rewrite of the query (and if the partitioning column is declared not null).


alter table t1 modify nvcustatus not null;

drop index t1_freq;
drop index t1_hbal;

create bitmap index t1_freq on t1(freq_column, nvcustatus) local;
create bitmap index t1_hbal on t1(hbal_column, nvcustatus) local;

select
	count(*)
from	(
	select
		/*+
			qb_name(main)
			no_eliminate_join(@SEL$96BB32CC t1@hbal)
			no_eliminate_join(@SEL$96BB32CC t1@freq)
		*/
		distinct ftb.freq_column, htb.hbal_column
	from
		(
		select
			/*+ qb_name(freq) index(t1 t1_freq) */
			nvcustatus, freq_column, rowid frid
		from
			t1
		)	ftb,
		(
		select
			/*+ qb_name(hbal) index(t1 t1_hbal) */
			nvcustatus, hbal_column, rowid hrid
		from
			t1
		)	htb
	where
		ftb.frid = htb.hrid
	and	ftb.nvcustatus= htb.nvcustatus
	)
;

-------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
-------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |         |      1 |        |      1 |00:00:05.44 |     719 |   4380 |   3660 |       |       |          |         |
|   1 |  SORT AGGREGATE                  |         |      1 |      1 |      1 |00:00:05.44 |     719 |   4380 |   3660 |       |       |          |         |
|   2 |   VIEW                           |         |      1 |  35118 |  23223 |00:00:05.50 |     719 |   4380 |   3660 |       |       |          |         |
|   3 |    HASH UNIQUE                   |         |      1 |  35118 |  23223 |00:00:05.46 |     719 |   4380 |   3660 |  1268K|  1268K| 2647K (0)|         |
|   4 |     PARTITION LIST ALL           |         |      1 |    330K|   1000K|00:00:18.44 |     719 |   4380 |   3660 |       |       |          |         |
|*  5 |      HASH JOIN                   |         |      8 |    330K|   1000K|00:00:15.34 |     719 |   4380 |   3660 |   283M|    16M|   27M (1)|   31744 |
|   6 |       BITMAP CONVERSION TO ROWIDS|         |      8 |   1000K|   1000K|00:00:00.99 |     296 |    289 |      0 |       |       |          |         |
|   7 |        BITMAP INDEX FULL SCAN    | T1_FREQ |      8 |        |   1831 |00:00:00.30 |     296 |    289 |      0 |       |       |          |         |
|   8 |       BITMAP CONVERSION TO ROWIDS|         |      7 |   1000K|   1000K|00:00:06.48 |     423 |    416 |      0 |       |       |          |         |
|   9 |        BITMAP INDEX FULL SCAN    | T1_HBAL |      7 |        |   7353 |00:00:00.41 |     423 |    416 |      0 |       |       |          |         |
-------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("NVCUSTATUS"="NVCUSTATUS" AND ROWID=ROWID)

As you can see from the plan, I ran one of my tests with rowsource execution statistics enabled, so that I could count starts.

My query basically breaks the implicit join of two indexes into an explcit join so that I can include the join on the partitioning column. You’ll notice that I now iterate through each partition – line 4 – and do a hash join for each pair of partitions. This is the target I was hoping for. The theory was that by doing (up to) eight small hash joins, each join will operate in memory, rather than flooding one large build table to disc. Of course, the thing I’m building the hash table with is now three values (rowid, freq_value, nvcustatus) rather than the two values from the implicit join (rowid, freq_value), so if I’m unlucky this version of the query could take even longer than the original because it may have to do more I/O.

If you’re wondering why I only started lines 8 and 9 once, this was because one of may partitions was empty, so Oracle had to scan it once as the build table to find that it was empty, then didn’t need to scan it as the probe table.

Footnote: Although I got the plan I wanted – it really didn’t make much difference, and even with the million rows I was using it was fast to solve the problem with a tablescan. The relative benefit of the different plans would be dicatated by the length of rows, the lengths of the relevant columns, the number of partitions, and the available memory for hashing.

Index join – 4

In a recent note I wrote about index joins I made a passing comment about limitations in the optimizer’s available strategies that might make you choose to write your code to emulate an index join through explicit SQL references.

Here are two SQL similar SQL statements (with execution plans) that demonstrate the initial problem – the first is just a restatement of the basic example I supplied in the first article:

create table indjoin
as
select
	rownum	id,
	rownum	val1,
	rownum	val2,
	rpad('x',500) padding
from all_objects where rownum <= 3000
;

-- collect stats, compute, no histograms

create unique index ij_v1 on indjoin(id, val1);
create unique index ij_v2 on indjoin(id, val2);

select
	val1, val2
from
	indjoin		ij
where
	val1 between 100 and 200
and	val2 between 50 and 150
;

---------------------------------------------------------------------------
| Id  | Operation              | Name             | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                  |     3 |    24 |    24 |
|*  1 |  VIEW                  | index$_join$_001 |     3 |    24 |    24 |
|*  2 |   HASH JOIN            |                  |       |       |       |
|*  3 |    INDEX FAST FULL SCAN| IJ_V1            |     3 |    24 |    11 |
|*  4 |    INDEX FAST FULL SCAN| IJ_V2            |     3 |    24 |    11 |
---------------------------------------------------------------------------

select
	val1, val2, rowid
from
	indjoin		ij
where
	val1 between 100 and 200
and	val2 between 50 and 150
;

-----------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     3 |    60 |    17 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| INDJOIN |     3 |    60 |    17 |
|*  2 |   INDEX FULL SCAN           | IJ_V1   |   102 |       |     9 |
-----------------------------------------------------------------------

When we include the rowid in the query the optimizer stops using the index join – and it won’t even use the mechanism if we hint it. Apparently, for the purposes of analysing the query, Oracle doesn’t recognise the rowid as a column in the table and this automatically precludes the possibility of using the index join as the access method. So we have to use the manual rewrites I introduced in an earlier article.

You might wonder why this matters – but consider a case where a “perfect” index doesn’t exist for the following query:

select
	padding
from
	indjoin		ij
where
	val1 between 100 and 200
and	val2 between 50 and 150
;

The only access path available to the optimizer at this point is a fulll tablescan – but what if the two indexes are very small compared to the table; wouldn’t it be a good idea to use an index hash join between the two indexes to get a list of rowids and visit the table only for those rows. Unfortunately isn’t a path the optimizer can derive – so we might try something like:


select
	t.padding
from
	(
	select
		/*+
			index_join(ij ij_v1 ij_v2)
			no_merge
		*/
		rowid
	from
		indjoin		ij
	where
		val1 between 100 and 200
	and	val2 between 50 and 150
	)	v1,
	indjoin	t
where
	t.rowid = v1.rowid
;

But, as we’ve just seen, you can’t do an index join if you select the rowid, so this code won’t follow the strategy we want. (In fact, when I tried it, there was something distinctly bug-like about the plan – but I won’t go into that now). But we can do the following:


select
	t.padding
from
	(
	select
		rowid
	from
		indjoin		ij
	where
		val1 between 100 and 200
	)	v1,
	(
	select
		rowid
	from
		indjoin		ij
	where
		val2 between 50 and 150
	)	v2,
	indjoin	t
where
	v2.rowid = v1.rowid
and	t.rowid = v2.rowid
;

-----------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     3 |  1632 |    10 |
|   1 |  NESTED LOOPS               |         |     3 |  1632 |    10 |
|*  2 |   HASH JOIN                 |         |     3 |    96 |     7 |
|*  3 |    INDEX FAST FULL SCAN     | IJ_V1   |   102 |  1632 |     3 |
|*  4 |    INDEX FAST FULL SCAN     | IJ_V2   |   102 |  1632 |     3 |
|   5 |   TABLE ACCESS BY USER ROWID| INDJOIN |     1 |   512 |     1 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V2".ROWID="V1".ROWID)
   3 - filter("VAL1">=100 AND "VAL1"<=200)
   4 - filter("VAL2">=50 AND "VAL2"<=150)

It’s amazing what you can make the optimizer do (even without hinting) if you think about the mechanics underneath the basic operations.

[Further reading on Index Joins]

Index Join – 3

I’ve recently been writing about the index join mechanism and ways of emulating it. Those notes were originally inspired by an example of an index join that appeared on OTN a little while ago.

It was a plan that combined “bitmap/btree conversion” with the basic index join strategy so, with hindsight, it was an “obvious” and brilliant execution plan for a certain type of query. The query in the original posting was a simple select (with no predicates) against a huge table in a data warehouse – presumably extracting a small number of columns from a much wider row.

SELECT DISTINCT
	ECP_ITEM_MASTER_DIM.ORGANIZATION_ID,
	ECP_ITEM_MASTER_DIM.INV_MFG_PRODUCTION_LINE,
	ECP_ITEM_MASTER_DIM.INV_PRODUCT_FAMILY,
	ECP_ITEM_MASTER_DIM.INV_SEGMENT_3,
	ECP_ITEM_MASTER_DIM.INV_SEGMENT_4,
	ECP_ITEM_MASTER_DIM.INV_SEGMENT_5
FROM
	ecp.ECP_ITEM_MASTER_DIM
;

(I really hate reading SQL where the whole table name has been repeated as the alias all the way through the SQL – it makes the code so hard to read, especially when it’s all in upper case. It’s important to use aliases, of course, but 3 or 4 letters is a sensible length.)

Here’s the execution plan:


---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                              |  1799K|    42M|       | 51279   (1)| 00:10:16 |
|   1 |  HASH UNIQUE                       |                              |  1799K|    42M|   151M| 51279   (1)| 00:10:16 |
|   2 |   VIEW                             | index$_join$_001             |  1799K|    42M|       | 38227   (1)| 00:07:39 |
|*  3 |    HASH JOIN                       |                              |       |       |       |            |          |
|*  4 |     HASH JOIN                      |                              |       |       |       |            |          |
|*  5 |      HASH JOIN                     |                              |       |       |       |            |          |
|*  6 |       HASH JOIN                    |                              |       |       |       |            |          |
|*  7 |        HASH JOIN                   |                              |       |       |       |            |          |
|   8 |         BITMAP CONVERSION TO ROWIDS|                              |  1799K|    42M|       |   485   (1)| 00:00:06 |
|   9 |          BITMAP INDEX FULL SCAN    | ECP_ITEM_MASTER_DIM_IMPL_BMX |       |       |       |            |          |
|  10 |         BITMAP CONVERSION TO ROWIDS|                              |  1799K|    42M|       |   230   (0)| 00:00:03 |
|  11 |          BITMAP INDEX FULL SCAN    | ECP_ITEM_MASTER_DIM_IPF_BMX  |       |       |       |            |          |
|  12 |        BITMAP CONVERSION TO ROWIDS |                              |  1799K|    42M|       |   229   (0)| 00:00:03 |
|  13 |         BITMAP INDEX FULL SCAN     | ECP_ITEM_MASTER_DIM_IS3_BMX  |       |       |       |            |          |
|  14 |       BITMAP CONVERSION TO ROWIDS  |                              |  1799K|    42M|       |   228   (0)| 00:00:03 |
|  15 |        BITMAP INDEX FULL SCAN      | ECP_ITEM_MASTER_DIM_IS4_BMX  |       |       |       |            |          |
|  16 |      BITMAP CONVERSION TO ROWIDS   |                              |  1799K|    42M|       |   201   (0)| 00:00:03 |
|  17 |       BITMAP INDEX FULL SCAN       | ECP_ITEM_MASTER_DIM_IS5_BMX  |       |       |       |            |          |
|  18 |     BITMAP CONVERSION TO ROWIDS    |                              |  1799K|    42M|       |   207   (0)| 00:00:03 |
|  19 |      BITMAP INDEX FULL SCAN        | ECP_ITEM_MASTER_DIM_OI_BMX   |       |       |       |            |          |
---------------------------------------------------------------------------------------------------------------------------

Isn’t it brilliant! The optimizer has seen that all the required columns can be found in indexes (six of them) – but they happen to be bitmap indexes so the optimizer has done a “bitmap conversion to rowid” on all six indexes one after the other with five consecutive hash joins – carrying the column values with each conversion and hash join.

Unfortunately the owner of this plan wasn’t happy with the resulting plan because a full tablescan turned out to be faster – nevertheless, it’s a very clever concept as the size of the table was measured in Gigabytes while the indexes were only a few megabytes each, allowing for a significant saving in I/O time.

I was a little curious, though, about the final join strategy. It’s annoying that Oracle didn’t report any costs on the hash join lines themselves because that could be very revealing. It’s remarkable that the value in the Bytes column for the final view (which is six columns of data) is the same as the bytes column for each index conversion (and remember that the projection from each conversion is just one data column with an accompanying rowid) – there’s clearly something wrong with the arithmetic.

This may explain why the optimizer has decided to run the 6-way join using only two running hash joins (rather than first setting up five hash tables in memory than passing the last table through them). If you think about this, when Oracle gets to the last hash join (lines 3, 4 and 18) it has to build a hash table from the result of the previous four joins and (in this case) that’s going to need a similar amount of memory as five in-memory hash tables. With that thought in mind I was puzzled that Oracle hadn’t just built five in-memory hash tables then walked through each in turn with the sixth.

Still – it’s not my (or my client’s) problem; maybe one day I’ll need to look more closely at a similar case.

[Further reading on Index Joins]

Index Join – 2

In an earlier article introducing the index join I raised a question that came up at the first ES2N virtual conference:

    “If you hint an index hash join, is there any way of telling Oracle the order in which it should use the indexes?”

Consider the following example:

create table indjoin
nologging
as
select
	rownum	id,
	rownum	val1,
	rownum	val2,
	rownum	val3,
	rpad('x',500) padding
from
	all_objects
where
	rownum <= 5000
;

/*

alter table indjoin
add constraint ij_pk primary key (id)

*/

create unique index ij_v1 on indjoin(id, val1);
create unique index ij_v2 on indjoin(id, val2);
create unique index ij_v3 on indjoin(id, val3);

-- gather statistics: without histograms

select
	/*+ index_join(ij) */
	count(*)
from
	indjoin	ij
where
	val1 between 100 and 200
and	val2 between 50 and 150
and	val3 between 250 and 550
;

The query plan for this query is (thanks to the hint) a three-way index hash join:

-----------------------------------------------------------------------------
| Id  | Operation                | Name             | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                  |     1 |    12 |    74 |
|   1 |  SORT AGGREGATE          |                  |     1 |    12 |       |
|*  2 |   VIEW                   | index$_join$_001 |     1 |    12 |    74 |
|*  3 |    HASH JOIN             |                  |       |       |       |
|*  4 |     HASH JOIN            |                  |       |       |       |
|*  5 |      INDEX FAST FULL SCAN| IJ_V1            |     1 |    12 |    18 |
|*  6 |      INDEX FAST FULL SCAN| IJ_V2            |     1 |    12 |    18 |
|*  7 |     INDEX FAST FULL SCAN | IJ_V3            |     1 |    12 |    18 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("VAL1">=100 AND "VAL1"<=200 AND "VAL2">=50 AND
              "VAL2"<=150 AND "VAL3">=250 AND "VAL3"<=550)
   3 - access(ROWID=ROWID)
   4 - access(ROWID=ROWID)
   5 - filter("VAL1"<=200 AND "VAL1">=100)
   6 - filter("VAL2"<=150 AND "VAL2">=50)
   7 - filter("VAL3"<=550 AND "VAL3">=250)

But what if you know the data better than Oracle, and know that the join order for the three indexes should be different – there are no extra direct hints you can add to the code to tell Oracle the best order for the hash join. (You might, of course, be able to make use of the cardinality() hint – or plan around with the undocumented, hence unsupported, opt_estimate() or column_stats() or index_stats() hints, but I wouldn’t be keen to use such an indirect approach.)

But you CAN rewrite the query to get the same mechanism working under your control. The code looks more complex – but we often have to make a trade between clarity (simplicity) and speed in critical cases, so you may find some examples where the complexity is acceptable:

select
	count(*)
from
	(
	select
		/*+ no_merge */
		rowid
	from
		indjoin		ij
	where
		val1 between 100 and 200
	)	v1,
	(
	select
		/*+ no_merge */
		rowid
	from
		indjoin		ij
	where
		val2 between 50 and 150
	)	v2,
	(
	select
		/*+ no_merge */
		rowid
	from
		indjoin		ij
	where
		val3 between 250 and 550
	)	v3
where
	v2.rowid = v1.rowid
and	v3.rowid = v1.rowid
;

It’s another example of referencing a table twice (or three times) in the query because multiple references allow you to define a better execution path than a single reference. The execution we get from this plan (running under 10.2.0.3) is as follows:


------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost  |
------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     1 |    36 |    14 |
|   1 |  SORT AGGREGATE          |       |     1 |    36 |       |
|*  2 |   HASH JOIN              |       |   102 |  3672 |    14 |
|*  3 |    HASH JOIN             |       |   102 |  2448 |     9 |
|   4 |     VIEW                 |       |   102 |  1224 |     4 |
|*  5 |      INDEX FAST FULL SCAN| IJ_V1 |   102 |  1632 |     4 |
|   6 |     VIEW                 |       |   102 |  1224 |     4 |
|*  7 |      INDEX FAST FULL SCAN| IJ_V2 |   102 |  1632 |     4 |
|   8 |    VIEW                  |       |   302 |  3624 |     4 |
|*  9 |     INDEX FAST FULL SCAN | IJ_V3 |   302 |  4832 |     4 |
------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V3".ROWID="V1".ROWID)
   3 - access("V2".ROWID="V1".ROWID)
   5 - filter("VAL1">=100 AND "VAL1"<=200)
   7 - filter("VAL2">=50 AND "VAL2"<=150)
   9 - filter("VAL3">=250 AND "VAL3"<=550)

By creating three explicit query blocks (which I’ve ring-fenced with no_merge hints), one for each index, I’ve made Oracle extract the same three sets of data that it was using in the index hash join. I’ve then left Oracle to join the three result sets – which it has done with hash joins. Interestingly this seems to have done a little less work than the original index join – the final complex filter action doesn’t appear in the manual rewrite.

Since I’ve now got a query that seems to be “just” a three table join, I can dictate the join order, guarantee the hash joins, and dictate which rowsources should be used as build rowsources, and which as probe. For example, let’s apply the following hints:

select
	/*+
		leading (v1 v3 v2)
		use_hash(v3) no_swap_join_inputs(v3)
		use_hash(v2) swap_join_inputs(v2)
	*/
	count(*)
from
        ....

The resulting plan is as follows:

------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost  |
------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     1 |    36 |    14 |
|   1 |  SORT AGGREGATE          |       |     1 |    36 |       |
|*  2 |   HASH JOIN              |       |   102 |  3672 |    14 |
|   3 |    VIEW                  |       |   102 |  1224 |     4 |
|*  4 |     INDEX FAST FULL SCAN | IJ_V2 |   102 |  1632 |     4 |
|*  5 |    HASH JOIN             |       |   102 |  2448 |     9 |
|   6 |     VIEW                 |       |   102 |  1224 |     4 |
|*  7 |      INDEX FAST FULL SCAN| IJ_V1 |   102 |  1632 |     4 |
|   8 |     VIEW                 |       |   302 |  3624 |     4 |
|*  9 |      INDEX FAST FULL SCAN| IJ_V3 |   302 |  4832 |     4 |
------------------------------------------------------------------

The join order (you can check the trace file to confirm this) is: ij_v1, ij_v3, ij_v2 – but because of the swap_join_inputs(v2) hint the ij_v2 index appears first in the plan.
We build a hash table with ij_v2, then build a hash table with ij_v1 with we probe (join) ij_v3.
We then use the result of joining ij_v1/ij_v3 to probe (join) ij_v2 – which means v2 really is the last object in the join order.

It may look complex – but all we’ve done is describe an index-join in detail, and that has allowed us to specify which indexes are joined when. I’ve already pointed out that the manual version appears to be slightly more efficien than the original. It’s also more powerful, and addresses a defect in the current implementation of the index join. But that’s a topic for another blog.

[Further reading on Index Joins]

Index Join

One of the less well known access paths available to the optimizer is the “index join” also known as the “index hash join” path. It’s an access path that can be used when the optimizer decides that it doesn’t need to visit a table to supply the select list because there are indexes on the table that, between them, hold all the required columns. A simple example might look something like the following:


create table indjoin
as
select
	rownum	id,
	rownum	val1,
	rownum	val2,
	rpad('x',500) padding
from
	all_objects
where
	rownum <= 3000
;

create unique index ij_v1 on indjoin(id, val1);
create unique index ij_v2 on indjoin(id, val2);

-- collect stats on the table and indexes

select
	ij.id
from
	indjoin		ij
where
	ij.val1 between 100 and 200
and	ij.val2 between 50 and 150
;

Note that the columns in the where clause appear in (some) indexes, and the column(s) in the select list exist in (at least) some indexes. Under these circumstances the optimizer can produce the following plan (the test script was one I wrote for 8i – but this plan comes from an 11.1 instance):


---------------------------------------------------------------------------
| Id  | Operation              | Name             | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                  |     3 |    36 |    24 |
|   1 |  VIEW                  | index$_join$_001 |     3 |    36 |    24 |
|*  2 |   HASH JOIN            |                  |       |       |       |
|*  3 |    INDEX FAST FULL SCAN| IJ_V1            |     3 |    36 |    11 |
|*  4 |    INDEX FAST FULL SCAN| IJ_V2            |     3 |    36 |    11 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(ROWID=ROWID)
   3 - filter("VAL1"<=200 AND "VAL1">=100)
   4 - filter("VAL2"<=150 AND "VAL2">=50)

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "ID"[NUMBER,22]
   2 - (#keys=1) "ID"[NUMBER,22]
   3 - ROWID[ROWID,10], "ID"[NUMBER,22]
   4 - ROWID[ROWID,10]

We do a fast full scan of the two indexes extracting the rowid and id from index ij_v1 and just the rowid from index ij_v2. We can then get the result we want by doing a hash join between these two result sets on the rowid values because any time the two rowsources have a rowid in common, it’s a rowid for a row where val1 is between 100 and 200, and val2 is between 50 and 150 and the first rowsource is carrying the id - which is the thing we need to report.

There are a couple of little observations that we can make about this example.

    First, although I’ve only used two indexes in this example Oracle is not limited to just two indexes. The number of indexes that could be used is effectively unlimited.
    Second, the index_join path is strictly limited to cases where the optimizer can see that every column in the query can be found in indexes on the table.
    Third, although my example uses index fast full scans that’s not a necessary feature of the plan. Just like any other hash join, Oracle could use an index range (or full) scan to get some of the data.
    Finally, there are clearly a couple of bugs in the code.

Bugs:

If you check the rows/bytes columns in the plan you’ll see that the predicted number of rows selected is the same for both indexes (lines 3 and 4) – but we extract the rowid and the id from the first index (projection detail for line 3), so the total data volume expected from line 3 is slightly larger than the total data volume from line 4 where we extract only the rowid; theoretically, therefore, the optimizer has used the tables (indexes) in the wrong order – the one supplying the smaller volume of data should have been used as the first (build) rowsource.

More significantly, though, a quick check of the code that generates the data tells you that each index will supply 101 rows to the hash join – and you can even show that for other query execution plans the optimizer will calculate this cardinality (nearly) correctly. In the case of the index join the optimizer seems to have lost the correct individual cardinalities and has decided to use the size of the final result set as the cardinality of the two driving index scans.

There’s more, of course – one of the strangest things about the index join is that if your select list includes the table’s rowid, the optimizer doesn’t consider that to be a column in the index. So even though the predicate section of the plan shows the rowids being projected in the hash join, Oracle won’t use an index join for a query returning the rowid !

Footnote: The reason I’ve written this brief introduction to the index join is because an interesting question came up at the first E2SN virtual conference.

“If you hint an index hash join, is there any way of telling Oracle the order in which it should use the indexes?”

The answer is no – but there are ways of creating code that will do what you want, and that will be the topic of my next blog.

[Further reading on Index Joins]