For as long as I have been working with Oracle technology {which is now getting towards 2 decades and isn’t that pause for thought} there has been a constant search for Performance Silver Bullets – some trick or change or special init.ora parameter {alter system set go_faster_flag=’Y'} you can set to give you a guaranteed boost in performance. For all that time there has been only one.
There are a few performance Bronze Bullets…maybe Copper Bullets. The problem is, though, that the Oracle database is a complex piece of software and what is good for one situation is terrible for another. Often this is not even a case of “good 90% of the time, indifferent 9% of the time and tragic 1% of the time”. Usually it is more like 50%:30%:20%.
I’ve just been unfair to Oracle software actually, a lot of the problem is not with the complexity of Oracle, it is with the complexity of what you are doing with Oracle. There are the two extremes of OnLine Transaction Processing (lots of short running, concurrent, simple transactions you want to run very quickly by many users) and Data Warehouse where you want to process a vast amount of data by only a small number of users. You may well want to set certain initialisation parameters to favour quick response time (OLTP) or fastest processing time to completion (DW). Favouring one usually means a negative impact on the other. Many systems have both requirements in one… In between that there are the dozens and dozens of special cases and extremes that I have seen and I am just one guy. People get their database applications to do some weird stuff.
Partitioning is a bronze bullet. For many systems, partitioning the biggest tables makes them easier to manage, allows some queries to run faster and aids parallel activity. But sometimes (more often than you might think) Partitioning can drop rather than increase query or DML performance. In earlier versions of Oracle setting optimizer_index_caching and optimizer_index_cost_adj was often beneficial and in Oracle 9/8/7 setting db_file_multiblock_read_count “higher” was good for DWs….Go back to Oracle 7 and doing stuff to increase the buffer cache hit ratio towards 98% was generally good {and I will not respond to any comments citing Connors magnificent “choose your BCHR and I’ll achieve it” script}.
You know what? There was an old trick in Oracle 7 you could maybe still look at as a bronze bullet. Put your online redo logs and key index tablespaces on the fastest storage you have and split your indexes/tables/partitions across the faster/slower storage as is fit. Is all your storage the same speed? Go buy some SSD and now it isn’t….
Then there are bronze bullets that you can use that very often improve performance but the impact can be catastrophic {Let’s call them wooden bullets
}. Like running your database in noarchivelog mode. That can speed up a lot of things, but if you find yourself in the situation of needing to do a recovery and you last cold backup is not recent enough – catastrophe. A less serious but more common version of this is doing things nologging. “oh, we can just re-do that after a recovery”. Have you done a test recovery that involved that “oh, we can just do it” step? And will you remember it when you have a real recovery situation and the pressure is on? Once you have one of these steps, you often end up with many of them. Will you remember them all?
How many of you have looked at ALTER SYSTEM SET COMMIT_WRITE=’BATCH,NOWAIT’? It could speed up response times and general performance on your busy OLTP system. And go lose you data on crash recovery. Don’t even think about using this one unless you have read up on the feature, tested it, tested it again and then sat and worried about could possibly go wrong for a good while.
That last point is maybe at the core of all these Performance Bronze Bullets. Each of these things may or may not work but you have to understand why and you have to understand what the payback is. What could now take longer or what functionality have I now lost? {hint, it is often recovery or scalability}.
So, what was that one Silver Bullet I tantalizingly left hanging out for all you people to wait for? You are not going to like this…
Look at what your application is doing and look at the very best that your hardware can do. Do you want 10,000 IOPS a second and your storage consists of less than 56 spindles? Forget it, your hardware cannot do it. No matter what you tune or tweak or fiddle with. The one and only Performance Silver Bullet is to look at your system and your hardware configuration and work out what is being asked and what can possibly be delivered. Now you can look at:
It’s hard work but it is possibly the only Silver Bullet out there. Time to roll up our sleeves and get cracking…
{Many Thanks to Kevin Closson for providing all the pictures – except the Silver Bullet, which he only went and identified in his comment!}

<..IOT1 – the basics
<….IOT2 – Examples and proofs
IOT4 – Boosting Buffer Cache Efficiency..>
IOT5 – Primary Key issues….>
In the previous two posts I covered the basics of Index Organized Tables (IOTs) and then created some test tables to demonstrate the benefit of IOTs that is most often covered – reducing the IO needed to get a single record by one IO, say from 5 to 4. {Whether this is a buffer get from memory or a disc IO depends on if the block is cached, of course}.
In this post I am going to show how IOTs can far more significantly reduce the IO when several related rows are required.
Below is one of my test tables, the one that is a normal heap table and has a primary key, CHHE_PK on PARE_ID and CRE_DATE:
mdw11> desc child_heap Name Null? Type ----------------------------------------- -------- -------------- PARE_ID NOT NULL NUMBER(10) CRE_DATE NOT NULL DATE VC_1 NOT NULL VARCHAR2(100) DATE_1 DATE NUM_1 NUMBER(2) NUM_2 NUMBER(2) -- mdw11> select count(*),count(distinct(pare_id)) from child_heap COUNT(*) COUNT(DISTINCT(PARE_ID)) ---------- ------------------------ 1000000 9999
As you can see, the table has 1 million records and 9,999 values for PARE_ID, there are approx 100 records per parent. The data was created to match a common situation – that of a bit of data coming in for each parent every day. See post 2 for details.
The result of this is that the data for any given parent is scattered through the table. As the data comes in for a given day, the data for the first parent is added to the end of the table, followed by all the data for all the other parents who have data that day. The next day this is repeated, so the child records for a given parent are interspersed with the child records for many other parents.
The below diagram demonstrate what will now happen if you issue a statement like
select *
from CHILD_HEAP
where PARE_ID=12
Oracle quickly works down the index to the leaf block containing the first key that matches the range. This takes, in my example, 4 block reads. Oracle now works through the index entries and, via the rowid, identifies the exact block to visit in the table for each key. For each key it has to visit a new block – because the data is scattered through the table. This is what the clustering_factor in the index statistics is measuring, how often contiguous rows in the index are for the same block. In our case, almost never.
In my diagram I do not number those table reads but in my simplistic diagram it would be 10 further reads.
If Oracle reaches the end of the leaf block before it reaches the end of the range of key values, oracle follows the pointer in the leaf block (not shown) to the next leaf block (whcih is another block read) and continues working through the keys until the range scan is completed.
In my simplified diagram I only have 6 entries per leaf block. In reality, and in my example tables, this is more like a few hundred. 247 in the case of CHHE_PK.
Now let’s consider my Index Organized Table, CHILD_IOT. It has exactly the same columns as CHILD_HEAP and the data was created in the same way. However, because it is an IOT, as the data came in it was inserted into the primary key index and is thus in an ordered state.
The below diagram demonstrate what will now happen if you issue a statement like
select *
from CHILD_IOT
where PARE_ID=12
Oracle works down the index to the leaf block where the range scan begins and now simply works along the leaf blocks. There is no need to go and visit the table as there is no table.
In my IOT diagram the leaf entries are longer and there are fewer in each leaf block, ie 5. So my scan has to visit 3 leaf blocks rather than 2. In reality the difference is more pronounced, in my example table there are actually 56 rows per leaf block, compared to the 247 in the index on the heap table. As such, my scan on the IOT will cover more leaf blocks but this is insignificant compared to the reduction in block visits caused by not having to go hunt down records scattered over the table. Even in the unlikely event of my IOT being deeper by 1 level (an extra layer of branch blocks) due to the reduces entries per leaf block, I would still be winning for range scans.
That is all nice theory and pictures. As ever, we need to back this up with some real tests. Firstly, I am using SQL*Plus and I need to set my arraysize large enough so that I do not introduce extra consistent gets through selecting small sets of rows between client and server. You will need to do the same to see similar results to me.
{I keep meaning to do a dedicated post on arraysize but H.Tonguç YIlmaz has a nice post already on it.}
set arraysize 200
set autotrace on
Now I will select all the records for PARE_ID=10, including a column not in the Primary Key, so that the table needs to be visited. I did this twice to remove the parsing overhead:
select pare_id,cre_date,vc_1
from child_heap
where pare_id =10
order by cre_date
PARE_ID CRE_DATE VC_1
---------- --------- -----------------------------------------------------------------------
10 17-APR-11 LDOBKMLCYCSQYBDFIUISJWQAHNYSQOSUQJKIGCSEJHDPOFFLHHXYSMDSQNUB
10 18-APR-11 LBGDNOYQFQMTMJQRAUWSRNBTHQSKBEUVLZSFWEGULOPDXQSVXOIC
10 18-APR-11 LBGDNOYQFQMTMJQRAUWSRNBTHQSKBEUVLZSFWEGULOPDXQSVXOICOSFTSYNO
10 19-APR-11 IBVTIGYBXJLMZQKRPJZEPXLMQLNOYNWLQOYVVGARNSITZWULVBYLEJKZNII
10 19-APR-11 IBVTIGYBXJLMZQKRPJZEPXLMQLNOYNWLQOYVVGARNSITZWULVBYLEJ
10 19-APR-11 IBVTIGYBXJLMZQKRPJZEPXLMQLNOYNWLQOYVVGARNSITZWULVBYLEJ
10 20-APR-11 USIGVSPPIUUXEIRBMPFNBTTMDUJTVITHKQWZAKZOMJEDZCUPQAEFQQEYM
10 20-APR-11 USIGVSPPIUUXEIRBMPFNBTTMDUJTVITHKQWZAKZOMJEDZCUPQAEF
...
10 19-JUL-11 BNOYCIDTFJHPPOYPSVAVKJSYUNVPGPHLJXUOIKYKASKHYGZNVHVFFGPVAKN
10 25-JUL-11 HDFGAQWTYZBSVYVXTFFRDIAKRYWFUPFCNDCETHUWHSQUITHHVUEJTJ
82 rows selected.
Execution Plan
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 6900 | 103 (0)| 00:00:02 |
| 1 | TABLE ACCESS BY INDEX ROWID| CHILD_HEAP | 100 | 6900 | 103 (0)| 00:00:02 |
|* 2 | INDEX RANGE SCAN | CHHE_PK | 100 | | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
86 consistent gets
0 physical reads
82 rows collected and 86 consistent gets. That will be 4 consistent gets to process the index blocks and 82 for the table blocks.
Now let’s repeat that on the IOT:
select pare_id,cre_date,vc_1
from child_IOT
where pare_id =10
order by cre_date
mdw11> /
any key>
PARE_ID CRE_DATE VC_1
---------- --------- ------------------------------------------------------------
10 17-APR-11 QJHQXTQAYEUICPNDQTYMMFZPWJSIDLBKOXYTHLEHKTVWUPKQMWUUX
10 18-APR-11 BUTUEWDCDQVPLTPPRFGBBEDOZYRPERPRROVUQPTSRZLHKVBSBUEAMZYAS
10 18-APR-11 BUTUEWDCDQVPLTPPRFGBBEDOZYRPERPRROVUQPTSRZLHKVBSBUEAMZY
10 19-APR-11 DEGNPALVLMIDYCYIQIIQJJVZFTNIMEULMAGDEWVTOAKBNHOPUQJE
10 19-APR-11 DEGNPALVLMIDYCYIQIIQJJVZFTNIMEULMAGDEWVTOAKBNHOPUQJ
...
10 24-JUL-11 TJGLOEITTVXQTQPHSKGVERSGJDREYSKKCDUFMQXQVXMHMMDWPLJNSNK
10 24-JUL-11 TJGLOEITTVXQTQPHSKGVERSGJDREYSKKCDUFMQXQVXMHMMDWPLJNSNKCN
10 25-JUL-11 BCLLVPYMWAAQOVLILXARQZXEGAQAARPURIFKFKHROUSFORRYYXQZUAJHDBL
108 rows selected.
Execution Plan
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 6900 | 4 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| CHIO_PK | 100 | 6900 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6 consistent gets
0 physical reads
We actually gathered more data, 108 rows compared to 82, all for 6 consistent gets compared to 86 consistent gets. That is a reduction to less than 10% of the original effort.
Now for a more extreme test. I am going to select a single row summary of data for 10 parents, flushing the cache between each run to show the impact when you have to do real IO to support those consistent gets. This is on a fairly old {4 years} laptop with a rather tired hard disc
alter system flush buffer_cache
System altered.
Elapsed: 00:00:00.18
--
--
select count(*),sum (num_1)
from child_heap
where pare_id between 50 and 60
COUNT(*) SUM(NUM_1)
---------- ----------
1155 12031
Elapsed: 00:00:06.39
Execution Plan
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 1203 (0)| 00:00:18 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
| 2 | TABLE ACCESS BY INDEX ROWID| CHILD_HEAP | 1200 | 8400 | 1203 (0)| 00:00:18 |
|* 3 | INDEX RANGE SCAN | CHHE_PK | 1200 | | 7 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1157 consistent gets
1112 physical reads
--
--
alter system flush buffer_cache
System altered.
Elapsed: 00:00:00.18
--
--
select count(*),sum (num_1)
from child_iot
where pare_id between 50 and 60
COUNT(*) SUM(NUM_1)
---------- ----------
1111 11528
Elapsed: 00:00:00.29
Execution Plan
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 24 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
|* 2 | INDEX RANGE SCAN| CHIO_PK | 1200 | 8400 | 24 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
25 consistent gets
25 physical reads
The Heap took 6.39 seconds and required 1112 physical reads to support 1157 consistent gets.
The IOT took 0.29 seconds and required 25 physical reads to support 25 consistent gets.
I think we can all see that IOTs have the potential to greatly reduce physical and logical IO. Perhaps we should all be using IOTs more.
Final point. The Heap version took less physical reads than consistent gets as some blocks read into the block buffer cache held data required later in the query.
The impact of IOTs on the buffer cache will be the topic of my next post on IOTs. I think { hope:-) } that many of you will be very interested and impressed by what you could gain…
I’m following up on a conference almost half a year later — try to bet that! Actually, this blog post was written more than 3 months ago and was sitting in my drafts waiting the moment I understand why I really wrote it. 3 months later… I still don’t know but I thought I should [...]
<.. IOT1 – Basics
IOT3 – Great reductions in IO for IOTs..>
IOT4 – Boosting Buffer Cache Efficiency….>
IOT5 – Primary Key issues……>
In my first post on IOTs I ran through the basics of what they are. Here I am going to create some test tables and show you a few things.
I am going to create a simple PARENT table with 9,999 records and then two CHILD tables. CHILD_HEAP, a normal table, and CHILD_IOT, an Index Organized Table. They have the same columns and will hold very similar data.
All of this is on Oracle 11.1 but is exactly the same on 10.2. 8K block size, tablespaces are auto segment space managed.
Here are the creation statements:
--first create the parent table, keyed by ID.
-- The other columns are not significant, they just represent "information"
create table mdw.parent
(id number(10) not null
,name varchar2(100) not null
,date_1 date not null
,num_1 number(2)
,num_2 number(2)
,constraint pare_pk primary key(id)
using index tablespace index_01
)
tablespace data_01
/
--
--Now put my 9999 parents into the table.
insert into parent
select rownum
,dbms_random.string('U',mod(rownum,10)+50)
,sysdate-(mod(rownum,500)+1000)
,mod(rownum,99)+1
,trunc(dbms_random.value(0,100))
from dual connect by level < 10000
/
--
-- create the table to hold the children as a heap table
create table child_heap
(pare_id number(10) not null
,cre_date date not null
,vc_1 varchar2(100) not null
,date_1 date
,num_1 number(2)
,num_2 number(2)
,constraint chhe_pk primary key(pare_id,cre_date)
using index tablespace index_01
)
tablespace data_01
/
--
-- create the table to hold the children as an IOT table
create table child_iot
(pare_id number(10) not null
,cre_date date not null
,vc_1 varchar2(100) not null
,date_1 date
,num_1 number(2)
,num_2 number(2)
,constraint chio_pk primary key(pare_id,cre_date)
-- using index tablespace index_01 -- CANNOT STATE for IOT. State in table definition
)
ORGANIZATION INDEX -- This is it. This makes the table an IOT
tablespace data_01
/There are only two differences between the statements creating the CHILD_HEAP and the CHILD_IOT tables.
The main one is the inclusion of the line ORGANIZATION INDEX and is what instructs Oracle to create the table as an IOT. Note that it does not state the index and you cannot state the index. The IOT is created based on the Primary Key.
The other change is that you now cannot state the tablespace for the Primary Key index. I’ve not played with this at all but I don’t think you can state anything with the “using index” as the table storage clauses are used for the Primary Key index. I personally find this a little illogical as it is the index segment that is created, but I guess others would find it more natural that you still state this at the table level.
When I create IOTs on a real system, I put the IOT in a table tablespace {I still maintain table and index tablespaces, for reasons I won’t go into here}. I put it there as it holds the actual data. If I lose that Primary Key index I am losing real data, not duplicated data.
I then populated the two CHILD tables with data. The method of creating this test data is very important.
I am simulating a very common situation, where data is coming in for a set of Parents (think customers, accounts, scientific instruments, financial entities) and the data is coming in as a record or set of records each day. ie not where the parent and all of it’s child records are created at one time, like an order and it’s order lines. I am simulating where the child data is created a few records at a time, not all in one go.
The code is simple. it loops for one hundred days and for each day it creates 10,000 records for random parents. On each day any given parent will have none, one or several records. On average, each parent will end up with 100 records, but some will have more and some less. The key thing is that the data for any given parent is created a record at a time, with lots of records created for other parents before the next record for that given parent.
The two tables will have the same pattern of data but not identical data. {I could have seeded the random number generator to make the two data sets the same but this will do}. Below is the statement for one table, you just change the table name to populate each table. {BTW I like using the from dual connect by level <=x method of getting the number of rows desired – it is fast and is neat, once you have seen it once}.
declare
v_num number :=10000; -- number of people
v_str varchar2(60);
begin
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
for i in 1..100 loop --days to do
v_str:=dbms_random.string('U',60);
insert into CHILD_HEAP
(pare_id,cre_date,vc_1,date_1,num_1,num_2)
select
trunc(dbms_random.value(1,v_num))
,sysdate-(100-i) + (rownum/(60*60*24) )
,substr(v_str,1,51+mod(rownum,10))
,sysdate-(100-i) + ((mod(rownum,30)+1)/3)
,mod(rownum,20)+1
,mod(rownum,99)+1
from dual connect by level <=v_num;
end loop;
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
end;
/I then gathered objects stats on the tables.
Let’s check the size of the tables:
select segment_name, segment_type,tablespace_name,blocks from dba_segments where owner=USER and segment_name like 'CHILD%'; SEGMENT_NAME SEGMENT_TYPE TABLESPACE_NAME BLOCKS --------------- --------------- --------------- ---------- CHILD_HEAP TABLE DATA_01 12288 1 row selected.
ONE row? Where is the other table, where is CHILD_IOT? It does not exists.
Remember from my first post that I made the comment I would have prefered it if Index Organized Tables had been called something like ‘Table Containing Indexes’? The table data has been placed in the Primary Key index and the table segment does not even exist. If you start using IOTs this will catch you out periodically – it does me anyway and I’ve been using them on and off for years
.
Let’s look at the size of the primary key indexes:
select segment_name, segment_type,tablespace_name,blocks from dba_segments where owner=USER and segment_name like 'CH%PK' and segment_name not like '%ORD%' SEGMENT_NAME SEGMENT_TYPE TABLESPACE_NAME BLOCKS --------------- --------------- --------------- ---------- CHHE_PK INDEX INDEX_01 4224 CHIO_PK INDEX DATA_01 19456 2 rows selected.
Note that the Primary Key index for CHILD_HEAP, CHHE_PK, is there and is 4,224 blocks in size, and the CHILD_IOT Primary Key, CHIO_PK, is a lot larger at 19,456 blocks. In fact, not only is the CHIO_PK index larger than the CHILD_HEAP table, it is larger than the combined size of the CHILD_HEAP table and CHHE_PK index combines. So much for me saying last post that IOTs can save disk space? I’ll come back to that in a later post…
Here are some other stats from one of my scripts:
mdw11> @tab_sci_own owner for Table: mdw Name for Table: child_heap OWNER TABLE_NAME NUM_ROWS BLOCKS AVG_L GLS ULS LST_ANL PRT SAMP_SIZE -------- -------------- ------------- ----------- ----- --- --- ------------ --- ---------- MDW CHILD_HEAP 1000,000 12,137 83 YES NO 250711 22:01 NO 1000000 INDEX_NAME TYP PRT UNQ BL L_BLKS DIST_KEYS CLUSTF LB_KEY DB_KEY LST_ANL --------------- --- --- --- -- ---------- ----------- ------------ ---------- ---------- ------------ CHHE_PK NOR NO UNI 2 4,034 1000,000 995,857 1 1 250711 22:02 INDEX_NAME TABLE_NAME PSN COL_NAME ---------------------------- ---------------- --- ------------------------------------------------ CHHE_PK CHILD_HEAP 1 PARE_ID CHHE_PK CHILD_HEAP 2 CRE_DATE -- -- owner for Table: mdw Name for Table: child_iot OWNER TABLE_NAME NUM_ROWS BLOCKS AVG_L GLS ULS LST_ANL PRT SAMP_SIZE -------- -------------- ------------- ----------- ----- --- --- ------------ --- ---------- MDW CHILD_IOT 1000,000 83 YES NO 250711 22:03 NO 1000000 INDEX_NAME TYP PRT UNQ BL L_BLKS DIST_KEYS CLUSTF LB_KEY DB_KEY LST_ANL --------------- --- --- --- -- ---------- ----------- ------------ ---------- ---------- ------------ CHIO_PK IOT NO UNI 2 17,855 910,881 0 1 1 250711 22:03 INDEX_NAME TABLE_NAME PSN COL_NAME ---------------------------- ---------------- --- ------------------------------------------------ CHIO_PK CHILD_IOT 1 PARE_ID CHIO_PK CHILD_IOT 2 CRE_DATE
Note the lack of BLOCKS for the CHILD_IOT table and the CLUSTERING_FACTOR of 0 for the CHIO_PK.
The clustering factor is the number of times Oracle, when scanning the whole index in order, would have to swap to a different Table block to look up the table record for each index entry. If it is close to the number of blocks in the table, then the clustering factor is low and the order of records in the table matches the order of entries in the index. This would make index range scans that need to visit the table reasonably efficient.
If the clustering factor is close to the number of records in the table then it means there is no correlation between index order and table row order and such index ranges scans that have to visit the table would be inefficient. Again, this is significant and will be the major topic of the next post.
The depth of the index does not change, being 3 in each case (BL or blevel 2)
So, can we see evidence of the theoretical efficiency of looking up single records via the IOT that I mentioned in the fist post? Here we go {oh, usual disclaimer, I run the code twice and show the second run, to remove the parsing overhead}:
-- First the Heap table
select * from child_HEAP where PARE_ID=1234
AND cre_date=to_date('24-JUN-11 20:13:21','DD-MON-YY HH24:MI:SS')
PARE_ID CRE_DATE VC_1
---------- --------- ------------------------------------------------------
DATE_1 NUM_1 NUM_2
--------- ---------- ----------
1234 24-JUN-11 LUTFHOCIJNYREYICQNORREAJOVBRIHFVLXNIGIVZDMFJCTGYFWC
25-JUN-11 11 16
1 row selected.
Execution Plan
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 83 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| CHILD_HEAP | 1 | 83 | 3 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | CHHE_PK | 1 | | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
--and now the IOT table
select * from child_IOT where PARE_ID=1234
AND cre_date=to_date('24-JUN-11 21:23:41','DD-MON-YY HH24:MI:SS')
PARE_ID CRE_DATE VC_1
---------- --------- -------------------------------------------------------
DATE_1 NUM_1 NUM_2
--------- ---------- ----------
1234 24-JUN-11
CSIGBHSXWNDDTCFRCNWYPRNLEQWPCRYTXQQZHACDEXHOBEYXLNYBHRUHJ
27-JUN-11 7 52
1 row selected.
Execution Plan
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 83 | 2 (0)| 00:00:01 |
|* 1 | INDEX UNIQUE SCAN| CHIO_PK | 1 | 83 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets{I had to look up the exact values of CRE_DATE of a couple of records to do the above queries}
To look up a single row with the heap table you can see that the explain plan was to carry out a unique scan on the primary key and then look up the row via the rowid and took 4 consistent gets. 3 to walk down the index and get the rowid, one to look up the row block.
For the IOT table the explain plan reveals that there was simply an index unique scan of the Primary Key, nothing more. All data for the row was there in the index entry rather than the rowid. Thus only 3 consistent gets were required.
For single row lookups on the Primary Key, IOTS are more efficient than traditional Heap tables with a Primary Key index. {Please, no one point out that if all the columns you need are in the index you also do not need to go to the table, that is a different topic}.
Quite a few people have shown this efficiency before but the next step is far, far more interesting and shows a much more significant impact of IOTs. That is the topic of the next post
.
For now, I am going to finish off with what happens with range scans as I suggested they could slow down with an IOT.
Below, I select count(*) for just one of the parent values.
select count(*) from child_heap where pare_id = 2
COUNT(*)
----------
98
Execution Plan
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | INDEX RANGE SCAN| CHHE_PK | 100 | 400 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
--
--
select count(*) from child_iot where pare_id = 2
COUNT(*)
----------
93
Execution Plan
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 4 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | INDEX RANGE SCAN| CHIO_PK | 100 | 400 | 4 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent getsBoth statements carry out a range scan on the Primary Key of the table. For the normal HEAP table this takes 3 consistent gets, which is no suprise as we have an 8k block size and only 100 rows for a given parent, they happen to fit into one block of the index. So Oracle works down the depth of the index and looks at one block.
For the IOT the scan works down the index but has to scan three blocks. Even though there are fewer entries, 93 compared to 98, they span three blocks and thus the total number of consistent gets is 5.
Admittedly I was a little lucky in my example above. Sometimes the entries for one parent will scan 2 blocks for the heap table’s Primary Key and occasionally the entries for the IOT will fit into 2 blocks. But if you look at the number of leaf blocks in the earlier stats (4,034 for the normal and 17,855 for the IOT, both for 10,000 entries) usually the 100 or so entries for single parent in the normal index will all fall into one block and the entries for the IOT will fall into between 2 and 3 blocks.
A select count(*) will full scan the smallest segment that can satisfy the query. Let’s try it:
mdw11> select count(*) from child_heap
COUNT(*)
----------
1000000
Execution Plan
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 989 (1)| 00:00:15 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| CHHE_PK | 1000K| 989 (1)| 00:00:15 |
-------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
2 db block gets
4109 consistent gets
4088 physical reads
mdw11> select count(*) from child_iot
COUNT(*)
----------
1000000
Execution Plan
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4359 (1)| 00:01:05 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| CHIO_PK | 1000K| 4359 (1)| 00:01:05 |
-------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
19298 consistent gets
19246 physical readsThe number of consistent gets (and physical reads) are close to the number of leaf blocks in the two segments, though higher. This is because Oracle is scanning the whole index, leaf blocks and branch blocks. The scan is far more expensive for the IOT, simply as the index is so much larger. I’ve not shown timings but on my little laptop, the count(*) takes about 3 seconds on CHILD_HEAP and about 5 seconds on the CHILD_IOT.
That is enough for one post.
Working with 100 talented database engineers is fun and there are lots going on — lots of exciting (and not so much) projects ongoing, huge amount of problems solved, mistakes made (and learned from), many unique (as well as routine) customer needs satisfied, many new (and old) methods applied, many good (and less so) tools [...]
As I’m done with the book and back from a quick vacation (to Prague, which is an awesome place – well, at least during the summer) I promised (in Twitter) that now I’d start regularly writing blog articles again. In a separate tweet I asked what to write about. Among other requests (which I’ll write about later), one of the requests was to write something about enabling and reading SQL trace files…
I am probably not going to write (much) about SQL trace for a single reason – Cary Millsap has already written a paper so good about this topic, that there’s no point for me to try to repeat it (and my paper wouldn’t probably be as clear as Cary’s).
So, if you want to get the most out of SQL Trace, read Cary’s Mastering Performance with Extended SQL Trace paper:
The above link directs you to Method-R’s article index, as there’s a lot of other useful stuff to read there.
Wow, now I’m done with my first request – to write something about SQL Trace :-)
Earlier I did a presentation at VirtaThon which is the same topic that I presented at Hotsos 2011.. Mining the AWR and Capacity Planning are very dear to my heart and up until now I’m using every research I did on that presentation to work on an “Exadata Provisioning Tool” which I’m planning to present at the next Hotsos 2012… well, the only thing that’s different this time is.. my attendees are virtual geeks all over the world
IOT2 – Examples and proofs..>
IOT3 – Greatly reducing IO with IOTs….>
IOT4 – Boosting Buffer Cache Efficiency……>
I think Index Organized Tables(IOTs) are a much under-used and yet very useful feature of Oracle. Over the next few postings I’m going to cover some aspect of Index Organised Tables, both good and not-so-good. I am going to cover some benefits of IOTs that I think many people are unaware of. In this first post I am just going to run through the basics of IOTs.
The idea behind an IOT is simple. You hold all the data for the table in the ordered structure of an index. Why would you want to do that? Let us consider a very common requirement, accessing a row in a “large” table via a known, unique key.
Traditionally you have a heap table holding the data you want to access and a standard index to support access to that table. See the first diagram below. The 4-layer triangle represents the index, with a root block, two levels of branch blocks and then the leaf blocks at the “bottom”. The blue rectangle represents the table with the squares being individual rows. Of course, in a large table there would be thousands or millions of “squares”, this is just a simple diagram to show the idea.
When you issue a SQL statement to select the row via the indexed column(s) then oracle will read the root block (1), find the relevent block in the first level of branch blocks (2), then the relevant block in the second level of branch blocks (3) and finally (as far as the index is concerned) the relevant Leaf Block for the unique key. The leaf block holds the indexed column(s) and also the rowid. The rowid is the fastest way to look up a record, it states the file, block and row offset for the row. This allows oracle to go straight to the block and get the row. That is read number (5).
The number of branch blocks {and thus the number of blocks that need to be read to find a row} will vary depending on how much data is indexed, the number and size of the columns in the index, how efficiently the space has been used in the blocks and one or two other factors. In my experience most indexes for tables with thousands or millions of rows have one, two or three levels of branch blocks.
The second diagram shows a representation of the Index Organized Table. The table has in effect disappeared as a distinct object and the information has been moved into the leaf blocks of the index {part of me feels Index Organized Tables should really be called Table Organized Indexes or Table Containing Indexes as that would better indicate what is physically done}:

So with the IOT oracle reads the root block (1), the two branch level blocks (2 and 3) and finally the leaf block (4). The leaf block does not hold the rowid but rather the rest of the columns for the table {this can be changed, a more advanced feature allows you to store some or all the extra columns in an overflow segment}. Thus to access the same data, Oracle has to read only 4 blocks, not 5. Using an IOT saves one block read per unique lookup.
This saving of block reads is probably the main feature that IOTs are known for, but there are others which I will cover in later posts. Two things I will mention now is that, firstly, the use of IOTs is potentially saving disc space. An index is in effect duplication of data held in the table. When you create an index no new information is created but space is used up holding some of the table information in a structure suitable for fast lookup. Secondly, the index and table have to be maintained whenever a change is made to the columns that are indexed. IOTs reduce this maintenance overhead as there is only one thing to maintain.
Now for some drawbacks.
So, a brief summary is that Index Organised Tables effectively move the table data into the Primary Key index, reduce the number of block lookups needed to select one row, can save some disc space. But you can only organize the table via the Primary Key and it can make full or partial table scans and lookups via other indexes slower.
There are several more important benefits to IOTs {in my opinion} which I will come to over the next week or two.
It’s been a long time since my last blog and ever since I joined Enkitec I’ve been busy immersing myself in Exadata stuff. So most of the time I’m just posting my brain dumps on my wiki although I know there’s a lot of blog worthy scenarios and projects that I have worked on just like last week when we did an Exadata Half Rack X2-2 Split Configuration from the factory image without the use of Oracle’s ACS. We did all of the pre-config and config tasks like the onecommand, patched the database to 11.2.0.2 BP8, patched the cells to 11.2.2.3.2, and did all the post config tasks… all of these config are tailored according to the client’s needs and not the default template install/config done by Oracle’s ACS.
Recent comments
21 weeks 2 days ago
31 weeks 13 hours ago
32 weeks 5 days ago
35 weeks 6 days ago
38 weeks 1 day ago
47 weeks 5 days ago
49 weeks 2 days ago
50 weeks 2 days ago
50 weeks 3 days ago
1 year 1 week ago