Index Organized Tables

What Have I Let Myself in For! – UKOUG this year

One of my favourite Oracle happenings of the year is fast approaching, the UK Oracle User Group technical conference {see/click on the link on the right margin}. I’ve blogged before (like here, last year) about why I think it is so good.

I try to present at the conference each year and I go whether or not I am presenting.

However, this year I think I might have got myself into trouble. I put forward 3 talks, expecting one or possibly two to get through. One on Index Organized Tables, one on IT disasters and one as an introduction to database design – I’ve moaned about it being a dying art so I figured I should get off my backside and do something positive about it. Each talk is in a different stream.

IOT P6(a) Update

In my last post, IOT part 6, inserts and updates slowed down, I made the point that IOT insert performance on a relatively small Oracle system was very slow, much slower than on a larger system I had used for professional testing. A major contributing factor was that the insert was working on the whole of the IOT as data was created. The block buffer cache was not large enough to hold the whole working set (in this case the whole IOT) once it grew beyond a certain size. Once it no longer fitted in memory, Oracle had to push blocks out of the cache and then read them back in next time they were needed, resulting in escalating physical IO.

Why Are My Indexes Still Valid Quiz ? (Move On)

OK, this quiz is a nice easy one, the lads at work got this without too much trouble. Normally, when you MOVE (re-org) a table, all the associated indexes become Unusable. As below: ... So the indexes are now all unusable ... However, I previously created another table called BOWIE that [...]

IOT Part 5 – Primary Key Drawback – and Workaround

<..IOT1 – the basics
<….IOT2 – Examples and proofs
<……IOT3 – Significantly reducing IO
<……..IOT4 – Boosting Buffer Cache efficiency

One of the drawbacks of IOTs is that they have to be organised by the primary key of the table. If your table does not have a primary key, it cannot be Index Organized.

I would argue that any table that holds persistent data (ie it is not transient data about to be loaded into the database proper or a temporary working set) should have a Primary Key. If I am working on a system and come across a table without a Primary Key I immediately challenge it. {There are occasional, valid reasons for a persistent table to lack a PK, but I confess I am struggling right now to come up with one – but I digress}. I’m a big fan of database-enforced referential integrity.

The problem is, if you are making a table into an Index Organized Table so that the records are clustered to match how you process the data, it could well be that the primary key is not related to how you want to order the data. Let me give you an example. {Oh, and for brevity, I’ll put the SQL statements to create the examples at the end of this post}.

mdw11> desc ACCOUNT
 Name                                                  Null?    Type
 ----------------------------------------------------- -------- ----------------------
 ACCO_TYPE                                             NOT NULL NUMBER(2)  ---PK
 ACCO_ID                                               NOT NULL NUMBER(10) ---PK
 NAME                                                  NOT NULL VARCHAR2(100)
 DATE_1                                                NOT NULL DATE
 NUM_1                                                          NUMBER(2)
 NUM_2                                                          NUMBER(2)

mdw11> desc TRANSACTION_HEAP
 Name                                                  Null?    Type
 ----------------------------------------------------- -------- ----------------------
 TRAN_TYPE                                             NOT NULL NUMBER(2)  ---PK
 TRAN_ID                                               NOT NULL NUMBER(10) ---PK
 ACCO_TYPE                                             NOT NULL NUMBER(2)
 ACCO_ID                                               NOT NULL NUMBER(10)
 CRE_DATE                                              NOT NULL DATE
 VC_1                                                  NOT NULL VARCHAR2(100)
 DATE_1                                                         DATE
 NUM_1                                                          NUMBER(2)
 NUM_2                                                          NUMBER(2)

This is a classic parent-child relationship, each account has a set of transactions. I’ve expanded on my prior example by:

  • changing the parent to be called ACCOUNT and giving it a two-part Primary Key, ACCO_TYPE and ACCO_ID.
  • changing the child to be called TRANSACTION and giving it a Primary Key of TRAN_TYPE and TRAN_ID.
  • in a real system I would also create a foreign key from TRANSACTION.ACCO_TYPE,ACCO_ID to the ACCOUNT table primary key {sketched below}.
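
A minimal sketch of that foreign key, shown here against the heap version of the child table {the constraint name is just illustrative}:

alter table transaction_heap add constraint trhe_acco_fk
  foreign key (acco_type, acco_id)
  references account (acco_type, acco_id)
/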

Note that the Primary Key on the TRANSACTION table is NOT based on the account columns. In theory the primary key of the transaction table could be the account columns plus the CRE_DATE – if the CRE_DATE held a datetime AND two records could never be created in the same second. If we used a timestamp then you might be able to argue that no two records would be created in the same fraction of a second – except that transactions often get given a fixed time. Midnight springs to mind (consider when you would add the accrued interest on a savings account). So a new surrogate Primary Key is introduced, a transaction type and ID. TRAN_TYPE and TRAN_ID are the primary key of the TRANSACTION table.

I’d say that I see such two-part primary keys more often than single-column primary keys these days. Possibly because so many databases receive information from other systems, or even from other applications on the same database.

As before, I create 10,000 parent records (ACCOUNT) and 10,000 random child records (TRANSACTION_HEAP) each day for 100 days. 

Also as before, I want to select information grouped by account. I want all the transactions for an account, not all transactions on a day or for a range of transaction IDs. Hopefully this is a scenario most of you will recognise. 

Selecting a sum of one of the non-indexed columns and a count of records for a given account takes quite a bit of effort on the part of the HEAP table:

select sum(num_1), count(*) from transaction_heap th where acco_type=10 and acco_id=123

SUM(NUM_1)   COUNT(*)
---------- ----------
      1201        116
Elapsed: 00:00:02.68

Execution Plan
---------------------------------------------------------------------------------------
| Id  | Operation          | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                  |     1 |    10 |  3466   (1)| 00:00:52 |
|   1 |  SORT AGGREGATE    |                  |     1 |    10 |            |          |
|*  2 |   TABLE ACCESS FULL| TRANSACTION_HEAP |   100 |  1000 |  3466   (1)| 00:00:52 |
---------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      13929  consistent gets
      13921  physical reads

Of course, it has to do a full table scan as my Primary Key is on two columns that have nothing to do with the query. I can repeat this statement as often as I like; it takes the same number of physical reads and consistent gets each time, as the scanned blocks are not being held in the buffer cache.

I add an index on the ACCO_TYPE, ACCO_ID and CRE_DATE columns and re-run the query:
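
{For reference, the index creation statement as it appears in the script at the end of this post:

create index trhe_ACCO_crda_idx
on transaction_heap(ACCO_type,ACCO_id,cre_date)
tablespace index_01
/
}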

select sum(num_1),count(*) from transaction_heap th where acco_type=10 and acco_id=123

SUM(NUM_1)   COUNT(*)
---------- ----------
      1201        116
Elapsed: 00:00:00.01

Execution Plan
---------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                    |     1 |    10 |   103   (0)| 00:00:02 |
|   1 |  SORT AGGREGATE              |                    |     1 |    10 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| TRANSACTION_HEAP   |   100 |  1000 |   103   (0)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN          | TRHE_ACCO_CRDA_IDX |   100 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        120  consistent gets
          0  physical reads

I ran it twice to get rid of the parse overhead, but the first time it did a load of physical reads to support those 120 consistent gets.

I could recreate the TRANSACTION_HEAP table as an IOT of course – but it will be organized by the TRAN_TYPE and TRAN_ID columns. That is useless to me. Even if I add a secondary index on the ACCO_TYPE, ACCO_ID and CRE_DATE columns it will at best be no better than the above HEAP table and, because the secondary index will hold rowid guesses and will sometimes have to use the primary key information to walk down the index, it will be worse. {I am not sure I have explained that bit yet about row guesses. Post 6?}

So, if you want the information organized in an order that is not helped by the Primary Key of the table, an IOT is useless to you. You cannot achieve that physical record grouping by the IOT method.

I am going to do something else though. I’m going to sort of change the rules to work around the issue.

As far as the physical implementation is concerned, a Primary Key is in effect just a unique index plus two rules: all the columns in the Primary Key must be mandatory (NOT NULL) and there can be only one PK on a table. I can have as many unique indexes as I like, so long as each key combination contains no duplicate values. And I can alter my Primary Key – it is not set in stone.
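
To illustrate that last point, here is a minimal sketch of dropping and redefining a Primary Key {this works on a heap table such as TRANSACTION_HEAP; for an IOT the primary key is fixed when the table is created, since the table is built within that index}:

alter table transaction_heap drop constraint trhe_pk;

alter table transaction_heap add constraint trhe_pk
  primary key (acco_type, acco_id, cre_date, tran_type, tran_id)
  using index tablespace index_01;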

Before I go any further I am going to stress that I am about to abuse the concept of the Primary Key. I’d need a separate blog post to fully justify what a Primary Key is, but part of the concept is that no column should be derivable from the other columns in the PK and it should be the minimum number of columns required to make the key unique.

We want to group the data by the account columns and the creation date. So let’s define a Primary Key that is ACCO_TYPE, ACCO_ID, CRE_DATE and whatever else we need to guarantee the key is unique. In our case that would be TRAN_TYPE and TRAN_ID – the current Primary Key! If I knew I would always want all records for the account, I could drop the CRE_DATE out of my fake Primary Key, but I know that the creation date is very often important. You may want activity for the last month, last quarter, a stated date or even an exact datetime. For all those cases, including the CRE_DATE column is highly beneficial.

So, I create TRANSACTION_IOT below and populate it with data.

desc transaction_iot
 Name                                                        Null?    Type
 ----------------------------------------------------------- -------- --------------
 TRAN_TYPE                                                   NOT NULL NUMBER(2)
 TRAN_ID                                                     NOT NULL NUMBER(10)
 ACCO_TYPE                                                   NOT NULL NUMBER(2)
 ACCO_ID                                                     NOT NULL NUMBER(10)
 CRE_DATE                                                    NOT NULL DATE
 VC_1                                                        NOT NULL VARCHAR2(100)
 DATE_1                                                               DATE
 NUM_1                                                                NUMBER(2)
 NUM_2                                                                NUMBER(2)

--
--

OWNER    TABLE_NAME          NUM_ROWS      BLOCKS AVG_L GLS ULS LST_ANL      PRT  SAMP_SIZE
-------- -------------- ------------- ----------- ----- --- --- ------------ --- ----------
MDW      TRANSACTION_IO      1000,000                94 YES NO  160811 23:05 NO     1000000
         T
INDEX_NAME      TYP PRT UNQ BL     L_BLKS   DIST_KEYS       CLUSTF     LB_KEY     DB_KEY LST_ANL
--------------- --- --- --- -- ---------- ----------- ------------ ---------- ---------- ------------
TRIO_PK         IOT NO  UNI  2     21,433    1058,381            0          1          1 160811 23:05
TRIO_TRAN_UQ    NOR NO  UNI  2      4,386    1000,000      999,405          1          1 160811 23:05

INDEX_NAME                   TABLE_NAME       PSN COL_NAME
---------------------------- ---------------- --- ------------------------------------------------
TRIO_PK                      TRANSACTION_IOT  1   ACCO_TYPE
TRIO_PK                      TRANSACTION_IOT  2   ACCO_ID
TRIO_PK                      TRANSACTION_IOT  3   CRE_DATE
TRIO_PK                      TRANSACTION_IOT  4   TRAN_TYPE
TRIO_PK                      TRANSACTION_IOT  5   TRAN_ID
TRIO_TRAN_UQ                 TRANSACTION_IOT  1   TRAN_TYPE
TRIO_TRAN_UQ                 TRANSACTION_IOT  2   TRAN_ID

Now let’s select our data from that IOT.

select sum(num_1),count(*) from transaction_IOT th where acco_type=10 and acco_id=123

SUM(NUM_1)   COUNT(*)
---------- ----------
      1030         97
Elapsed: 00:00:00.00

Execution Plan
-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |    10 |     5   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |         |     1 |    10 |            |          |
|*  2 |   INDEX RANGE SCAN| TRIO_PK |   100 |  1000 |     5   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads

5 consistent gets. It has walked down the IOT and scanned 3 leaf blocks to collect that data. Our IOT, based on an abused Primary Key, does the job of supporting range scans efficiently, with the benefits to the Block Buffer Cache I referred to in IOT4.

That “Primary Key” I created is NOT a real Primary Key. It is not the minimum number of columns needed to uniquely identify a row. My Primary Key is on ACCO_TYPE, ACCO_ID, CRE_DATE, TRAN_TYPE and TRAN_ID – the account, the datetime of the transaction and the transaction. What if I were to alter the datetime by a second? I could create a record with the same account and the same transaction type and ID as an existing record, but a second into the future. That is just wrong. After all, the whole point of the TRAN_TYPE and TRAN_ID is to uniquely identify a record. If I created the new record I just described, there would be two records for the one TRAN_TYPE/TRAN_ID.

I protect against this ability to create incorrect records by also creating a UNIQUE KEY on the table, against the columns TRAN_TYPE and TRAN_ID. This is the unique index TRIO_TRAN_UQ displayed in the information above. A Primary Key is usually the referenced parent of any referential integrity, ie foreign keys, between this table and any children. However, a Unique Key can also be the target of Referential Integrity. I cannot create a record with the same TRAN_TYPE/TRAN_ID as one that already exists, due to this unique constraint {the demonstration below is against the partitioned version of the table, TRANSACTION_IOT_P, which has the equivalent constraint TIP_TRAN_UQ}:

insert into transaction_iot_p
values
(2,163 -- existing transaction type and id
,10,11111
,sysdate,'ASCAFWEWEHGWSHERJH',SYSDATE,7,7)
/

insert into transaction_iot_p
*
ERROR at line 1:
ORA-00001: unique constraint (MDW.TIP_TRAN_UQ) violated

Elapsed: 00:00:00.34

So, I have my IOT to support my query code and I have my Unique Constraint to police my original Primary Key and to act as the target for any Foreign Key requirements I might need. This is not a perfect solution – the design will look a little strange to anyone who examines this database, and the Unique Key is supported by a secondary index on an IOT, which can have some issues. But it does work.
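
To illustrate that Foreign Key point, a child table could reference the unique key rather than the primary key – a hypothetical sketch only {TRANSACTION_NOTE and its constraint name are invented for this example}:

create table transaction_note
(tran_type number(2)     not null
,tran_id   number(10)    not null
,note_text varchar2(200) not null
,constraint trno_tran_fk foreign key (tran_type,tran_id)
   references transaction_iot_p (tran_type,tran_id)
)
/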

My “primary key” is no longer a true Primary Key. It is just a tool for allowing me to organise the data physically in a way that will support my application. That is what I meant about changing the rules.

I am willing to abuse a Primary Key in this way because of the performance benefits. It is a solution for a system where most of the query access is against a set of records which would be scatter-gunned across the table if you did not use some sort of physical grouping. If you are reading this and thinking “oh, I am not sure about you doing that to a Primary Key, Martin” then you are probably OK to consider this solution. If you can’t see a problem with it then you are either very used to turning off referential integrity and understand the consequences – or you simply do not understand what RI does for your database. If you are in the latter camp, do not even consider doing this. If you are one of those people who work on data warehouses and for whom it is just part of the DW process to turn off RI, as that is simply what you do for data warehouses – DON’T do this!

OK, I’m nearly at the end of this topic but I want to touch on partitioning. You can range partition an Index Organized Table from 9i, I think; it is certainly supported in Oracle 10 upwards. Partitioning is important in this technique because a unique index must contain the partition key if the index is to be locally partitioned – otherwise the index must be global, ie the one index object references all the partitions across the table.

Below is my table creation statement for the IOT organized by the account, creation date and transaction. The table is range partitioned by CRE_DATE, into months.

create table transaction_IOT_P
(tran_type number(2)     not null
,tran_id   number(10)    not null
,acco_type number(2)     not null
,acco_id   number(10)    not null
,cre_date  date          not null
,vc_1      varchar2(100) not null
,date_1    date
,num_1     number(2)
,num_2     number(2)
,constraint tip_pk primary key(ACCO_TYPE,ACCO_ID,CRE_DATE,TRAN_TYPE,TRAN_ID)
--  using index tablespace index_01
,constraint tip_tran_uq unique (TRAN_TYPE,TRAN_ID)
  using index tablespace index_01
)
organization index
tablespace data_01
partition by range  (cre_date)
(partition rm20110601 values less than (to_date('01-06-2011','DD-MM-YYYY'))
  tablespace data_01
,partition rm20110701 values less than (to_date('01-07-2011','DD-MM-YYYY'))
  tablespace data_01
,partition rm20110801 values less than (to_date('01-08-2011','DD-MM-YYYY'))
  tablespace data_01
,PARTITION RMTOP  VALUES LESS THAN (MAXVALUE)
  tablespace USERS
)
/

You can see the definition of my fake Primary Key and the fact that it does not have a tablespace defined for it – the ‘organization index’ clause lower down makes the table an IOT, and the segment goes into the “table” tablespace.
I then state my Unique Index to police the integrity of my table – TIP_TRAN_UQ.
I then state the partition clause, ‘partition by range (cre_date)’, followed by my initial partition definitions. It’s as simple as that to partition an IOT.

What gets created? A set of four segments for the IOT, which are primary key index segments of course, not table segments:

@seg_dets
Enter value for seg_name: tip_pk
Enter value for owner: mdw

OWNER    SEG_NAME        SEG TS_NAME     BYTES_K    BLOCKS exts   INI_K   NXT_K
-------- --------------- --- -------- ---------- --------- ---- ------- -------
MDW      TIP_PK RM201106 IP  DATA_01      45,056     5,632   59      64    1024
         01
MDW      TIP_PK RM201107 IP  DATA_01      60,416     7,552   74      64    1024
         01
MDW      TIP_PK RM201108 IP  DATA_01      61,440     7,680   75      64    1024
         01
MDW      TIP_PK RMTOP    IP  USERS        34,816     4,352   49      64    1024

Note that the SEG (type) is “IP” – my script decodes the type into a short mnemonic and IP is Index Partition. You can see the tablespaces those segments are in and the size of the segments. What about that unique index I created?

@seg_dets
Enter value for seg_name: tip_tran_uq
Enter value for owner: mdw

OWNER    SEG_NAME        SEG TS_NAME     BYTES_K    BLOCKS exts   INI_K   NXT_K
-------- --------------- --- -------- ---------- --------- ---- ------- -------
MDW      TIP_TRAN_UQ     IND INDEX_01     35,840     4,480   50      64    1024

It is a single segment, a normal index. I cannot have it as a locally partitioned index, as it is a unique index and lacks the partitioning key in its definition.
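
To be clear about what is being ruled out, below is a sketch of the alternative that is not allowed – imagine it in place of the TIP_TRAN_UQ definition used above. Oracle will not accept a unique, locally partitioned index whose key does not include the partitioning column (CRE_DATE here) {the index name is invented for illustration}:

-- NOT allowed: a unique LOCAL index whose key omits the partitioning column
create unique index tip_tran_uq_local
on transaction_iot_p (tran_type, tran_id)
local
tablespace index_01
/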

This could be a problem. The usual reason you partition a table is because it is too large to comfortably be held as a single segment {and also for the benefit of partition exclusion, but you don’t usually need that on small tables!}. This means that the global index to support that primary key is going to be large. Now, I made a “mistake” when I created my partitioned IOT – I did not create a partition for this month, so some data has gone into the MAXVALUE partition (see the size of the segment above, roughly 34MB over 49 extents). If I split that last partition to create a new partition for this month and a new MAXVALUE partition, I will invalidate the global index and will have to rebuild it {a sketch of that maintenance is below}. Very large indexes can take a long time and a heck of a lot of temporary space to gather and sort the data. That could be an ongoing maintenance nightmare.
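
A sketch of the sort of maintenance involved {the new partition name and boundary date are illustrative}:

alter table transaction_iot_p
  split partition rmtop at (to_date('01-09-2011','DD-MM-YYYY'))
  into (partition rm20110901 tablespace data_01
       ,partition rmtop      tablespace users)
/
-- splitting the partition marks the global unique index UNUSABLE, so it must be rebuilt
alter index tip_tran_uq rebuild
/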

In a recent implementation I did using IOTs, I did not create a global unique index to replace the original primary key. I created a non-unique, locally partitioned index to support some queries using those columns, and the table had no children so no Foreign Keys were needed. But there was something else I needed to do, as I had removed the referential integrity rules for that table. Remember I said I am a fan of database-enforced referential integrity? Now, I “know” the application will not create data that would break the removed Primary Key rule, and I “know” I documented what I had done. And I also know that in 12 months’ time there will almost certainly be data with duplicate values for that Primary Key if it is not enforced somehow, because that always happens. So I need to implement a little script to regularly check for duplicate TRAN_TYPE/TRAN_ID combinations being created, something along the lines of the sketch below. If you remove RI from a relational database, you should replace it in some way. Otherwise, you will pretty soon have a non-relational database.
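
A minimal sketch of the sort of check I mean, using the column names from this post:

select tran_type, tran_id, count(*)
from   transaction_iot_p
group by tran_type, tran_id
having count(*) > 1
/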

That’s it for this topic. The below is my example script for creating most of the above, in case anyone wants it or wants to verify what I have said.

-- test_iot2.sql
-- create test tables to show how you can work around the PK issue and
-- partition an IOT - and the possible impact on my PK workaround.
spool test_iot2.lst
--
set feed on timi on pause off
--
drop table account purge;
drop table transaction_heap purge;
drop table transaction_iot purge;
drop table transaction_iot_p purge;
--
-- create 10,000 parent records
create table mdw.account
(ACCO_type  number(2)     not null
,ACCO_id       number(10)    not null
,name     varchar2(100) not null
,date_1   date          not null
,num_1    number(2)
,num_2    number(2)
,constraint ACCO_pk primary key(ACCO_type,ACCO_id)
 using index tablespace index_01
)
tablespace data_01
/
insert into account
select 10
,rownum
,dbms_random.string('U',mod(rownum,10)+50)
,sysdate-(mod(rownum,500)+1000)
,mod(rownum,99)+1
,trunc(dbms_random.value(0,100))
from dual connect by level <= 5000
/
insert into account
select 15
,rownum
,dbms_random.string('U',mod(rownum,10)+50)
,sysdate-(mod(rownum,500)+1000)
,mod(rownum,99)+1
,trunc(dbms_random.value(0,100))
from dual connect by level <= 5000
/
--
-- create the table to hold the children as a heap table
create table transaction_heap
(tran_type number(2)     not null
,tran_id   number(10)    not null
,ACCO_type number(2)     not null
,ACCO_id   number(10)    not null
,cre_date  date          not null
,vc_1      varchar2(100) not null
,date_1    date
,num_1     number(2)
,num_2     number(2)
,constraint trhe_pk primary key(tran_type,tran_id)
 using index tablespace index_01
)
tablespace data_01
/
--
create index trhe_ACCO_crda_idx
on transaction_heap(ACCO_type,ACCO_id,cre_date)
tablespace index_01
/
-- populate the Heap table
-- 100 days, 10000 people
declare
v_num number :=10000; -- number of people
v_str varchar2(60);
begin
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
for i in 1..100 loop --days to do
  v_str:=dbms_random.string('U',60);
  insert into transaction_heap
    (tran_type,tran_id,ACCO_type,ACCO_id,cre_date,vc_1,date_1,num_1,num_2)
  select mod(rownum,3)+1
   ,((i-1)*v_num)+rownum
   , 5+(trunc(dbms_random.value(1,3))*5)
   ,trunc(dbms_random.value(1,v_num/2))
   ,sysdate-(100-i) + (rownum/(60*60*24) )
   ,substr(v_str,1,51+mod(rownum,10))
   ,sysdate-(100-i) + ((mod(rownum,30)+1)/3)
   ,mod(rownum,20)+1
   ,mod(rownum,99)+1
  from dual connect by level <=v_num;
end loop;
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
end;
/
--
--
--
create table transaction_IOT
(tran_type number(2)     not null
,tran_id   number(10)    not null
,ACCO_type number(2)     not null
,ACCO_id   number(10)    not null
,cre_date  date          not null
,vc_1      varchar2(100) not null
,date_1    date
,num_1     number(2)
,num_2     number(2)
,constraint trio_pk primary key(ACCO_type,ACCO_id,cre_date,tran_type,tran_id)
--  using index tablespace index_01
,constraint trio_tran_uq unique (tran_type,tran_id)
  using index tablespace index_01
)
organization index
tablespace data_01
/
--
-- populate the IOT table
-- 100 days, 10000 people
declare
v_num number :=10000; -- number of people
v_str varchar2(60);
begin
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
for i in 1..100 loop --days to do
  v_str:=dbms_random.string('U',60);
  insert into transaction_IOT
    (tran_type,tran_id,ACCO_type,ACCO_id,cre_date,vc_1,date_1,num_1,num_2)
  select mod(rownum,3)+1
   ,((i-1)*v_num)+rownum
   , 5+(trunc(dbms_random.value(1,3))*5)
   ,trunc(dbms_random.value(1,v_num/2))
   ,sysdate-(100-i) + (rownum/(60*60*24) )
   ,substr(v_str,1,51+mod(rownum,10))
   ,sysdate-(100-i) + ((mod(rownum,30)+1)/3)
   ,mod(rownum,20)+1
   ,mod(rownum,99)+1
  from dual connect by level <=v_num;
end loop;
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
end;
/
create table transaction_IOT_P
(tran_type number(2)     not null
,tran_id   number(10)    not null
,ACCO_type number(2)     not null
,ACCO_id   number(10)    not null
,cre_date  date          not null
,vc_1      varchar2(100) not null
,date_1    date
,num_1     number(2)
,num_2     number(2)
,constraint tip_pk primary key(ACCO_type,ACCO_id,cre_date,tran_type,tran_id)
--  using index tablespace index_01
,constraint tip_tran_uq unique (tran_type,tran_id)
  using index tablespace index_01
)
organization index
tablespace data_01
partition by range  (cre_date)
(partition rm20110601 values less than (to_date('01-06-2011','DD-MM-YYYY'))
  tablespace data_01
,partition rm20110701 values less than (to_date('01-07-2011','DD-MM-YYYY'))
  tablespace data_01
,partition rm20110801 values less than (to_date('01-08-2011','DD-MM-YYYY'))
  tablespace data_01
,PARTITION RMTOP  VALUES LESS THAN (MAXVALUE)
  tablespace USERS
)
/
-- populate the IOT_P table
-- 100 days, 10000 people
declare
v_num number :=10000; -- number of people
v_str varchar2(60);
begin
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
for i in 1..100 loop --days to do
  v_str:=dbms_random.string('U',60);
  insert into transaction_IOT_P
    (tran_type,tran_id,ACCO_type,ACCO_id,cre_date,vc_1,date_1,num_1,num_2)
  select mod(rownum,3)+1
   ,((i-1)*v_num)+rownum
   , 5+(trunc(dbms_random.value(1,3))*5)
   ,trunc(dbms_random.value(1,v_num/2))
   ,sysdate-(100-i) + (rownum/(60*60*24) )
   ,substr(v_str,1,51+mod(rownum,10))
   ,sysdate-(100-i) + ((mod(rownum,30)+1)/3)
   ,mod(rownum,20)+1
   ,mod(rownum,99)+1
  from dual connect by level <=v_num;
end loop;
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
end;
/
commit;
--
exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'ACCOUNT')
exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'TRANSACTION_HEAP')
exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'TRANSACTION_IOT')
exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'TRANSACTION_IOT_P')
--
select * from transaction_iot_p
where rownum < 10
/
insert into transaction_iot_p
values
(2,163 -- existing transaction type and id
,1,11111
,sysdate,'ASCAFWEWEHGWSHERJH',SYSDATE,7,7)
/
insert into transaction_iot_p
values
(3,163 -- new transaction type and id
,1,11111 -- but the whole of the rest of the record is the same.
,sysdate,'ASCAFWEWEHGWSHERJH',SYSDATE,7,7)
/
--
BEGIN
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
END;
/
--
spool off

IOT part 3 – Significantly Reducing IO

<..IOT1 – the basics
<….IOT2 – Examples and proofs
IOT4 – Boosting Buffer Cache Efficiency..>
IOT5 – Primary Key issues….>

In the previous two posts I covered the basics of Index Organized Tables (IOTs) and then created some test tables to demonstrate the benefit of IOTs that is most often covered – reducing the IO needed to get a single record by one IO, say from 5 to 4. {Whether this is a buffer get from memory or a disc IO depends on if the block is cached, of course}.

In this post I am going to show how IOTs can far more significantly reduce the IO when several related rows are required.

Below is one of my test tables, the one that is a normal heap table and has a primary key, CHHE_PK on PARE_ID and CRE_DATE:

mdw11> desc child_heap
 Name                                      Null?    Type
 ----------------------------------------- -------- --------------
 PARE_ID                                   NOT NULL NUMBER(10)
 CRE_DATE                                  NOT NULL DATE
 VC_1                                      NOT NULL VARCHAR2(100)
 DATE_1                                             DATE
 NUM_1                                              NUMBER(2)
 NUM_2                                              NUMBER(2)

--
mdw11> select count(*),count(distinct(pare_id)) from child_heap

  COUNT(*) COUNT(DISTINCT(PARE_ID))
---------- ------------------------
   1000000                     9999

As you can see, the table has 1 million records and 9,999 values for PARE_ID, there are approx 100 records per parent. The data was created to match a common situation – that of a bit of data coming in for each parent every day. See post 2 for details.

The result of this is that the data for any given parent is scattered through the table. As the data comes in for a given day, the data for the first parent is added to the end of the table, followed by all the data for all the other parents who have data that day. The next day this is repeated, so the child records for a given parent are interspersed with the child records for many other parents.

The below diagram demonstrates what will now happen if you issue a statement like
select *
from CHILD_HEAP
where PARE_ID=12

Oracle quickly works down the index to the leaf block containing the first key that matches the range. This takes, in my example, 4 block reads. Oracle now works through the index entries and, via the rowid, identifies the exact block to visit in the table for each key. For each key it has to visit a new block – because the data is scattered through the table. This is what the clustering_factor in the index statistics is measuring: how often contiguous rows in the index are for the same table block. In our case, almost never.
In my simplistic diagram I do not number those table reads, but it would be 10 further block reads.
If Oracle reaches the end of the leaf block before it reaches the end of the range of key values, Oracle follows the pointer in the leaf block (not shown) to the next leaf block (which is another block read) and continues working through the keys until the range scan is completed.

In my simplified diagram I only have 6 entries per leaf block. In reality, and in my example tables, this is more like a few hundred. 247 in the case of CHHE_PK.

Now let’s consider my Index Organized Table, CHILD_IOT. It has exactly the same columns as CHILD_HEAP and the data was created in the same way. However, because it is an IOT, as the data came in it was inserted into the primary key index and is thus in an ordered state.

The below diagram demonstrates what will now happen if you issue a statement like
select *
from CHILD_IOT
where PARE_ID=12

Oracle works down the index to the leaf block where the range scan begins and now simply works along the leaf blocks. There is no need to go and visit the table as there is no table.

In my IOT diagram the leaf entries are longer and there are fewer in each leaf block, ie 5. So my scan has to visit 3 leaf blocks rather than 2. In reality the difference is more pronounced: in my example table there are actually 56 rows per leaf block, compared to the 247 in the index on the heap table. As such, my scan on the IOT will cover more leaf blocks, but this is insignificant compared to the reduction in block visits caused by not having to go and hunt down records scattered over the table. Even in the unlikely event of my IOT being deeper by 1 level (an extra layer of branch blocks) due to the reduced number of entries per leaf block, I would still be winning for range scans.

That is all nice theory and pictures. As ever, we need to back this up with some real tests. Firstly, I am using SQL*Plus and I need to set my arraysize large enough so that I do not introduce extra consistent gets through fetching small sets of rows at a time between client and server. You will need to do the same to see similar results to me.
{I keep meaning to do a dedicated post on arraysize but H.Tonguç YIlmaz has a nice post already on it.}

set arraysize 200
set autotrace on

Now I will select all the records for PARE_ID=10, including a column not in the Primary Key, so that the table needs to be visited. I did this twice to remove the parsing overhead:

select pare_id,cre_date,vc_1
from child_heap
where pare_id =10
order by cre_date

   PARE_ID CRE_DATE  VC_1
---------- --------- -----------------------------------------------------------------------
        10 17-APR-11 LDOBKMLCYCSQYBDFIUISJWQAHNYSQOSUQJKIGCSEJHDPOFFLHHXYSMDSQNUB
        10 18-APR-11 LBGDNOYQFQMTMJQRAUWSRNBTHQSKBEUVLZSFWEGULOPDXQSVXOIC
        10 18-APR-11 LBGDNOYQFQMTMJQRAUWSRNBTHQSKBEUVLZSFWEGULOPDXQSVXOICOSFTSYNO
        10 19-APR-11 IBVTIGYBXJLMZQKRPJZEPXLMQLNOYNWLQOYVVGARNSITZWULVBYLEJKZNII
        10 19-APR-11 IBVTIGYBXJLMZQKRPJZEPXLMQLNOYNWLQOYVVGARNSITZWULVBYLEJ
        10 19-APR-11 IBVTIGYBXJLMZQKRPJZEPXLMQLNOYNWLQOYVVGARNSITZWULVBYLEJ
        10 20-APR-11 USIGVSPPIUUXEIRBMPFNBTTMDUJTVITHKQWZAKZOMJEDZCUPQAEFQQEYM
        10 20-APR-11 USIGVSPPIUUXEIRBMPFNBTTMDUJTVITHKQWZAKZOMJEDZCUPQAEF
...
        10 19-JUL-11 BNOYCIDTFJHPPOYPSVAVKJSYUNVPGPHLJXUOIKYKASKHYGZNVHVFFGPVAKN
        10 25-JUL-11 HDFGAQWTYZBSVYVXTFFRDIAKRYWFUPFCNDCETHUWHSQUITHHVUEJTJ

82 rows selected.

Execution Plan
------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |   100 |  6900 |   103   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| CHILD_HEAP |   100 |  6900 |   103   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | CHHE_PK    |   100 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         86  consistent gets
          0  physical reads

82 rows collected and 86 consistent gets. That will be 4 consistent gets to process the index blocks and 82 for the table blocks.

Now let’s repeat that on the IOT:

select pare_id,cre_date,vc_1
from child_IOT
where pare_id =10
order by cre_date
mdw11> /
any key>

   PARE_ID CRE_DATE  VC_1
---------- --------- ------------------------------------------------------------
        10 17-APR-11 QJHQXTQAYEUICPNDQTYMMFZPWJSIDLBKOXYTHLEHKTVWUPKQMWUUX
        10 18-APR-11 BUTUEWDCDQVPLTPPRFGBBEDOZYRPERPRROVUQPTSRZLHKVBSBUEAMZYAS
        10 18-APR-11 BUTUEWDCDQVPLTPPRFGBBEDOZYRPERPRROVUQPTSRZLHKVBSBUEAMZY
        10 19-APR-11 DEGNPALVLMIDYCYIQIIQJJVZFTNIMEULMAGDEWVTOAKBNHOPUQJE
        10 19-APR-11 DEGNPALVLMIDYCYIQIIQJJVZFTNIMEULMAGDEWVTOAKBNHOPUQJ
...
        10 24-JUL-11 TJGLOEITTVXQTQPHSKGVERSGJDREYSKKCDUFMQXQVXMHMMDWPLJNSNK
        10 24-JUL-11 TJGLOEITTVXQTQPHSKGVERSGJDREYSKKCDUFMQXQVXMHMMDWPLJNSNKCN
        10 25-JUL-11 BCLLVPYMWAAQOVLILXARQZXEGAQAARPURIFKFKHROUSFORRYYXQZUAJHDBL

108 rows selected.

Execution Plan
----------------------------------------------------------------------------
| Id  | Operation        | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT |         |   100 |  6900 |     4   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| CHIO_PK |   100 |  6900 |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads

We actually gathered more data, 108 rows compared to 82, all for 6 consistent gets compared to 86 consistent gets. That is a reduction to less than 10% of the original effort.

Now for a more extreme test. I am going to select a single-row summary of data for 10 parents, flushing the cache between each run to show the impact when you have to do real IO to support those consistent gets. This is on a fairly old {4 year old} laptop with a rather tired hard disc.

alter system flush buffer_cache

System altered.

Elapsed: 00:00:00.18

--
--

select count(*),sum (num_1)
from child_heap
where pare_id between 50 and 60

  COUNT(*) SUM(NUM_1)
---------- ----------
      1155      12031

Elapsed: 00:00:06.39

Execution Plan
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |     1 |     7 |  1203   (0)| 00:00:18 |
|   1 |  SORT AGGREGATE              |            |     1 |     7 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| CHILD_HEAP |  1200 |  8400 |  1203   (0)| 00:00:18 |
|*  3 |    INDEX RANGE SCAN          | CHHE_PK    |  1200 |       |     7   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1157  consistent gets
       1112  physical reads

--
--

alter system flush buffer_cache

System altered.

Elapsed: 00:00:00.18

--
--

select count(*),sum (num_1)
from child_iot
where pare_id between 50 and 60

  COUNT(*) SUM(NUM_1)
---------- ----------
      1111      11528

Elapsed: 00:00:00.29

Execution Plan
-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |     7 |    24   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |         |     1 |     7 |            |          |
|*  2 |   INDEX RANGE SCAN| CHIO_PK |  1200 |  8400 |    24   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         25  consistent gets
         25  physical reads

The Heap took 6.39 seconds and required 1112 physical reads to support 1157 consistent gets.
The IOT took 0.29 seconds and required 25 physical reads to support 25 consistent gets.

I think we can all see that IOTs have the potential to greatly reduce physical and logical IO. Perhaps we should all be using IOTs more.

Final point. The Heap version took fewer physical reads than consistent gets as some blocks read into the block buffer cache held data required later in the query.

The impact of IOTs on the buffer cache will be the topic of my next post on IOTs. I think { hope:-) } that many of you will be very interested and impressed by what you could gain…

IOT 2 – First examples and proofs

<.. IOT1 – Basics
IOT3 – Great reductions in IO for IOTs..>
IOT4 – Boosting Buffer Cache Efficiency….>
IOT5 – Primary Key issues……>

In my first post on IOTs I ran through the basics of what they are. Here I am going to create some test tables and show you a few things.

I am going to create a simple PARENT table with 9,999 records and then two CHILD tables. CHILD_HEAP, a normal table, and CHILD_IOT, an Index Organized Table. They have the same columns and will hold very similar data.

All of this is on Oracle 11.1 but is exactly the same on 10.2. 8K block size, tablespaces are auto segment space managed.

Here are the creation statements:

--first create the parent table, keyed by ID.
-- The other columns are not significant, they just represent "information"
create table mdw.parent
(id       number(10)    not null
,name     varchar2(100) not null
,date_1   date          not null
,num_1    number(2)
,num_2    number(2)
,constraint pare_pk primary key(id)
 using index tablespace index_01
)
tablespace data_01
/
--
--Now put my 9999 parents into the table.
insert into parent
select rownum
,dbms_random.string('U',mod(rownum,10)+50)
,sysdate-(mod(rownum,500)+1000)
,mod(rownum,99)+1
,trunc(dbms_random.value(0,100))
from dual connect by level < 10000
/
--
-- create the table to hold the children as a heap table
create table child_heap
(pare_id   number(10)    not null
,cre_date  date          not null
,vc_1      varchar2(100) not null
,date_1    date
,num_1     number(2)
,num_2     number(2)
,constraint chhe_pk primary key(pare_id,cre_date)
 using index tablespace index_01
)
tablespace data_01
/
--
-- create the table to hold the children as an IOT table
create table child_iot
(pare_id   number(10)    not null
,cre_date  date          not null
,vc_1      varchar2(100) not null
,date_1    date
,num_1     number(2)
,num_2     number(2)
,constraint chio_pk primary key(pare_id,cre_date)
-- using index tablespace index_01 -- CANNOT STATE for IOT. State in table definition
)
ORGANIZATION INDEX -- This is it. This makes the table an IOT
tablespace data_01
/

There are only two differences between the statements creating the CHILD_HEAP and the CHILD_IOT tables.

The main one is the inclusion of the line ORGANIZATION INDEX, which is what instructs Oracle to create the table as an IOT. Note that it does not state the index and you cannot state the index; the IOT is created based on the Primary Key.
The other change is that you can no longer state the tablespace for the Primary Key index. I’ve not played with this much, but I don’t think you can state anything with the “using index” clause, as the table storage clauses are used for the Primary Key index. I personally find this a little illogical, as it is the index segment that is created, but I guess others would find it more natural to still state this at the table level.

When I create IOTs on a real system, I put the IOT in a table tablespace {I still maintain table and index tablespaces, for reasons I won’t go into here}. I put it there as it holds the actual data. If I lose that Primary Key index I am losing real data, not duplicated data.

I then populated the two CHILD tables with data. The method of creating this test data is very important.

I am simulating a very common situation, where data is coming in for a set of Parents (think customers, accounts, scientific instruments, financial entities) and the data is coming in as a record or set of records each day. ie not where the parent and all of its child records are created at one time, like an order and its order lines. I am simulating where the child data is created a few records at a time, not all in one go.

The code is simple. It loops for one hundred days and for each day it creates 10,000 records for random parents. On each day any given parent will have none, one or several records. On average, each parent will end up with 100 records, but some will have more and some less. The key thing is that the data for any given parent is created a record at a time, with lots of records created for other parents before the next record for that given parent.

The two tables will have the same pattern of data but not identical data. {I could have seeded the random number generator to make the two data sets the same but this will do}. Below is the statement for one table, you just change the table name to populate each table. {BTW I like using the from dual connect by level <=x method of getting the number of rows desired – it is fast and is neat, once you have seen it once}.

declare
v_num number :=10000; -- number of people
v_str varchar2(60);
begin
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
for i in 1..100 loop --days to do
  v_str:=dbms_random.string('U',60);
  insert into CHILD_HEAP
    (pare_id,cre_date,vc_1,date_1,num_1,num_2)
  select
    trunc(dbms_random.value(1,v_num))
   ,sysdate-(100-i) + (rownum/(60*60*24) )
   ,substr(v_str,1,51+mod(rownum,10))
   ,sysdate-(100-i) + ((mod(rownum,30)+1)/3)
   ,mod(rownum,20)+1
   ,mod(rownum,99)+1
  from dual connect by level <=v_num;
end loop;
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
end;
/

I then gathered objects stats on the tables.
Let’s check the size of the tables:

select segment_name, segment_type,tablespace_name,blocks
from dba_segments where owner=USER and segment_name like 'CHILD%';

SEGMENT_NAME    SEGMENT_TYPE    TABLESPACE_NAME     BLOCKS
--------------- --------------- --------------- ----------
CHILD_HEAP      TABLE           DATA_01              12288

1 row selected.

ONE row? Where is the other table, where is CHILD_IOT? It does not exist.

Remember from my first post that I made the comment I would have preferred it if Index Organized Tables had been called something like ‘Table Containing Indexes’? The table data has been placed in the Primary Key index and the table segment does not even exist. If you start using IOTs this will catch you out periodically – it does me anyway, and I’ve been using them on and off for years :-) .
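
Another way to see where the data has gone is to look at the indexes in the data dictionary – a quick sketch of the sort of check I mean {for the IOT, the primary key index is reported with an IOT-specific INDEX_TYPE rather than NORMAL}:

select index_name, index_type, table_name
from   user_indexes
where  table_name in ('CHILD_HEAP','CHILD_IOT')
/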

Let’s look at the size of the primary key indexes:

select segment_name, segment_type,tablespace_name,blocks
from dba_segments where owner=USER and segment_name like 'CH%PK'
and segment_name not like '%ORD%'

SEGMENT_NAME    SEGMENT_TYPE    TABLESPACE_NAME     BLOCKS
--------------- --------------- --------------- ----------
CHHE_PK         INDEX           INDEX_01              4224
CHIO_PK         INDEX           DATA_01              19456

2 rows selected.

Note that the Primary Key index for CHILD_HEAP, CHHE_PK, is there and is 4,224 blocks in size, and the CHILD_IOT Primary Key, CHIO_PK, is a lot larger at 19,456 blocks. In fact, not only is the CHIO_PK index larger than the CHILD_HEAP table, it is larger than the CHILD_HEAP table and CHHE_PK index combined. So much for me saying last post that IOTs can save disk space? I’ll come back to that in a later post…

Here are some other stats from one of my scripts:

mdw11> @tab_sci_own
owner for Table: mdw
Name for Table: child_heap

OWNER    TABLE_NAME          NUM_ROWS      BLOCKS AVG_L GLS ULS LST_ANL      PRT  SAMP_SIZE
-------- -------------- ------------- ----------- ----- --- --- ------------ --- ----------
MDW      CHILD_HEAP          1000,000      12,137    83 YES NO  250711 22:01 NO     1000000

INDEX_NAME      TYP PRT UNQ BL     L_BLKS   DIST_KEYS       CLUSTF     LB_KEY     DB_KEY LST_ANL
--------------- --- --- --- -- ---------- ----------- ------------ ---------- ---------- ------------
CHHE_PK         NOR NO  UNI  2      4,034    1000,000      995,857          1          1 250711 22:02

INDEX_NAME                   TABLE_NAME       PSN COL_NAME
---------------------------- ---------------- --- ------------------------------------------------
CHHE_PK                      CHILD_HEAP       1   PARE_ID
CHHE_PK                      CHILD_HEAP       2   CRE_DATE

--
--
owner for Table: mdw
Name for Table: child_iot

OWNER    TABLE_NAME          NUM_ROWS      BLOCKS AVG_L GLS ULS LST_ANL      PRT  SAMP_SIZE
-------- -------------- ------------- ----------- ----- --- --- ------------ --- ----------
MDW      CHILD_IOT           1000,000                83 YES NO  250711 22:03 NO     1000000

INDEX_NAME      TYP PRT UNQ BL     L_BLKS   DIST_KEYS       CLUSTF     LB_KEY     DB_KEY LST_ANL
--------------- --- --- --- -- ---------- ----------- ------------ ---------- ---------- ------------
CHIO_PK         IOT NO  UNI  2     17,855     910,881            0          1          1 250711 22:03

INDEX_NAME                   TABLE_NAME       PSN COL_NAME
---------------------------- ---------------- --- ------------------------------------------------
CHIO_PK                      CHILD_IOT        1   PARE_ID
CHIO_PK                      CHILD_IOT        2   CRE_DATE

Note the lack of BLOCKS for the CHILD_IOT table and the CLUSTERING_FACTOR of 0 for the CHIO_PK.

The clustering factor is the number of times Oracle, when scanning the whole index in order, would have to swap to a different Table block to look up the table record for each index entry. If it is close to the number of blocks in the table, then the clustering factor is low and the order of records in the table matches the order of entries in the index. This would make index range scans that need to visit the table reasonably efficient.

If the clustering factor is close to the number of records in the table then it means there is no correlation between index order and table row order, and such index range scans that have to visit the table would be inefficient. Again, this is significant and will be the major topic of the next post.
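
If you want to check these figures for yourself, they are all in the data dictionary – a plain dictionary query rather than my tab_sci_own script:

select i.index_name, i.blevel, i.leaf_blocks, i.clustering_factor
      ,t.blocks, t.num_rows
from   user_indexes i, user_tables t
where  t.table_name = i.table_name
and    i.index_name in ('CHHE_PK','CHIO_PK')
/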

The depth of the index does not change, being 3 in each case (BL, or blevel, of 2).

So, can we see evidence of the theoretical efficiency of looking up single records via the IOT that I mentioned in the first post? Here we go {oh, usual disclaimer, I run the code twice and show the second run, to remove the parsing overhead}:

-- First the Heap table
select * from child_HEAP where PARE_ID=1234
AND cre_date=to_date('24-JUN-11 20:13:21','DD-MON-YY HH24:MI:SS')

   PARE_ID CRE_DATE  VC_1
---------- --------- ------------------------------------------------------
DATE_1         NUM_1      NUM_2
--------- ---------- ----------
      1234 24-JUN-11  LUTFHOCIJNYREYICQNORREAJOVBRIHFVLXNIGIVZDMFJCTGYFWC
25-JUN-11         11         16
1 row selected.

Execution Plan
------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |     1 |    83 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| CHILD_HEAP |     1 |    83 |     3   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | CHHE_PK    |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets

--and now the IOT table

select * from child_IOT where PARE_ID=1234
AND cre_date=to_date('24-JUN-11 21:23:41','DD-MON-YY HH24:MI:SS')

   PARE_ID CRE_DATE  VC_1
---------- --------- -------------------------------------------------------
DATE_1         NUM_1      NUM_2
--------- ---------- ----------
      1234 24-JUN-11
CSIGBHSXWNDDTCFRCNWYPRNLEQWPCRYTXQQZHACDEXHOBEYXLNYBHRUHJ
27-JUN-11          7         52
1 row selected.

Execution Plan
-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |    83 |     2   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| CHIO_PK |     1 |    83 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets

{I had to look up the exact values of CRE_DATE of a couple of records to do the above queries}

To look up a single row with the heap table you can see that the explain plan was to carry out a unique scan on the primary key and then look up the row via the rowid and took 4 consistent gets. 3 to walk down the index and get the rowid, one to look up the row block.

For the IOT table the explain plan reveals that there was simply an index unique scan of the Primary Key, nothing more. All data for the row was there in the index entry rather than the rowid. Thus only 3 consistent gets were required.

For single row lookups on the Primary Key, IOTs are more efficient than traditional Heap tables with a Primary Key index. {Please, no one point out that if all the columns you need are in the index you also do not need to go to the table, that is a different topic}.

Quite a few people have shown this efficiency before but the next step is far, far more interesting and shows a much more significant impact of IOTs. That is the topic of the next post :-) .

For now, I am going to finish off with what happens with range scans as I suggested they could slow down with an IOT.
Below, I select count(*) for just one of the parent values.

select count(*) from child_heap where pare_id = 2

  COUNT(*)
----------
        98

Execution Plan
-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |     4 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |         |     1 |     4 |            |          |
|*  2 |   INDEX RANGE SCAN| CHHE_PK |   100 |   400 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets

--
--

select count(*) from child_iot where pare_id = 2

  COUNT(*)
----------
        93

Execution Plan
-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |     4 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |         |     1 |     4 |            |          |
|*  2 |   INDEX RANGE SCAN| CHIO_PK |   100 |   400 |     4   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets

Both statements carry out a range scan on the Primary Key of the table. For the normal HEAP table this takes 3 consistent gets, which is no surprise as we have an 8k block size and only 100 or so rows for a given parent; they happen to fit into one block of the index. So Oracle works down the depth of the index and looks at one leaf block.

For the IOT the scan also works down the index, but each entry holds the whole row and so is much larger. Even though there are fewer entries for this parent, 93 compared to 98, they no longer fit in a single leaf block and the total number of consistent gets rises to 4.

Admittedly I was a little lucky in my example above. Sometimes the entries for one parent will span 2 blocks of the heap table’s Primary Key index, and occasionally the entries for the IOT will fit into 2 blocks. But if you look at the number of leaf blocks in the earlier stats (4,034 for the normal index and 17,855 for the IOT, both holding the same 1,000,000 entries), usually the 100 or so entries for a single parent will all fall into one block of the normal index, whereas the entries for the IOT will fall into between 2 and 3 blocks.
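
If you want to check those leaf block figures on your own copy of the test tables, they are visible in the data dictionary once statistics have been gathered. A quick sketch, using the index names from these examples:

select index_name, leaf_blocks, num_rows
from   user_indexes
where  index_name in ('CHHE_PK','CHIO_PK');

LEAF_BLOCKS and NUM_ROWS reflect the last statistics gather, so make sure the stats are reasonably fresh before relying on them.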

A select count(*) will full scan the smallest segment that can satisfy the query. Let’s try it:

mdw11> select count(*) from child_heap

  COUNT(*)
----------
   1000000

Execution Plan
-------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |     1 |   989   (1)| 00:00:15 |
|   1 |  SORT AGGREGATE       |         |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| CHHE_PK |  1000K|   989   (1)| 00:00:15 |
-------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          1  recursive calls
          2  db block gets
       4109  consistent gets
       4088  physical reads

mdw11> select count(*) from child_iot

  COUNT(*)
----------
   1000000

Execution Plan
-------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |     1 |  4359   (1)| 00:01:05 |
|   1 |  SORT AGGREGATE       |         |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| CHIO_PK |  1000K|  4359   (1)| 00:01:05 |
-------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      19298  consistent gets
      19246  physical reads

The number of consistent gets (and physical reads) is close to the number of leaf blocks in each of the two segments, though higher. This is because Oracle is scanning the whole index, branch blocks as well as leaf blocks. The scan is far more expensive for the IOT simply because the index is so much larger. I’ve not shown timings but on my little laptop the count(*) takes about 3 seconds on CHILD_HEAP and about 5 seconds on CHILD_IOT.
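
If you want to see the split between leaf blocks and branch blocks for yourself, one way is to validate the index structure and query INDEX_STATS (do this only on a test system, as ANALYZE ... VALIDATE STRUCTURE locks the index against DML while it runs). A sketch, using the IOT’s Primary Key from these examples:

analyze index chio_pk validate structure;

select name, height, br_blks, lf_blks
from   index_stats;

INDEX_STATS holds a single row, for the index you last validated in your session, so run the two statements together.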

That is enough for one post.


Index Organized Tables – the Basics.

IOT2 – Examples and proofs..>
IOT3 – Greatly reducing IO with IOTs….>
IOT4 – Boosting Buffer Cache Efficiency……>

I think Index Organized Tables (IOTs) are a much under-used and yet very useful feature of Oracle. Over the next few postings I’m going to cover some aspects of Index Organized Tables, both good and not-so-good. I am going to cover some benefits of IOTs that I think many people are unaware of. In this first post I am just going to run through the basics of IOTs.

The idea behind an IOT is simple. You hold all the data for the table in the ordered structure of an index. Why would you want to do that? Let us consider a very common requirement, accessing a row in a “large” table via a known, unique key.
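
Purely as a taster of the syntax before we get to the diagrams, making a table an IOT is just a matter of adding ORGANIZATION INDEX to the CREATE TABLE statement. The table below is a made-up example for illustration only, not one of the test tables used in these posts:

create table demo_iot
(ord_id    number(10)     not null
,ord_date  date           not null
,status    varchar2(10)   not null
,notes     varchar2(200)
,constraint demo_iot_pk primary key (ord_id)
)
organization index;

Note the primary key constraint; as the drawbacks list further down makes clear, without a primary key the ORGANIZATION INDEX clause is not allowed.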

Traditionally you have a heap table holding the data you want to access and a standard index to support access to that table. See the first diagram below. The 4-layer triangle represents the index, with a root block, two levels of branch blocks and then the leaf blocks at the “bottom”. The blue rectangle represents the table with the squares being individual rows. Of course, in a large table there would be thousands or millions of “squares”; this is just a simple diagram to show the idea.

When you issue a SQL statement to select the row via the indexed column(s), Oracle will read the root block (1), find the relevant block in the first level of branch blocks (2), then the relevant block in the second level of branch blocks (3) and finally (as far as the index is concerned) the relevant leaf block for the unique key (4). The leaf block holds the indexed column(s) and also the rowid. The rowid is the fastest way to look up a record: it states the file, block and row offset for the row. This allows Oracle to go straight to the block and get the row. That is read number (5).
The number of branch blocks {and thus the number of blocks that need to be read to find a row} will vary depending on how much data is indexed, the number and size of the columns in the index, how efficiently the space has been used in the blocks and one or two other factors. In my experience most indexes for tables with thousands or millions of rows have one, two or three levels of branch blocks.
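
You can check how many levels a particular index has from the BLEVEL column of USER_INDEXES. BLEVEL counts the root and intermediate branch levels but not the leaf level, so a BLEVEL of 2 means Oracle reads the root, one branch block and then a leaf block. A quick sketch, using the made-up DEMO_IOT table from the example above (substitute whichever table you are interested in):

select index_name, blevel
from   user_indexes
where  table_name = 'DEMO_IOT';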

The second diagram shows a representation of the Index Organized Table. The table has in effect disappeared as a distinct object and the information has been moved into the leaf blocks of the index {part of me feels Index Organized Tables should really be called Table Organized Indexes or Table Containing Indexes as that would better indicate what is physically done}:

So with the IOT Oracle reads the root block (1), the two branch level blocks (2 and 3) and finally the leaf block (4). The leaf block does not hold the rowid but rather the rest of the columns for the table {this can be changed: a more advanced feature allows you to store some or all of the extra columns in an overflow segment}. Thus to access the same data, Oracle has to read only 4 blocks, not 5. Using an IOT saves one block read per unique lookup.
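
For the curious, the overflow feature mentioned in the braces above looks something like the sketch below, again using the made-up DEMO_IOT columns for illustration. Columns up to and including the one named in the INCLUDING clause stay in the leaf blocks, anything after it goes to the overflow segment, and PCTTHRESHOLD caps how much of a leaf block any one row may occupy before its trailing columns also spill over:

create table demo_iot_ov
(ord_id    number(10)     not null
,ord_date  date           not null
,status    varchar2(10)   not null
,notes     varchar2(200)
,constraint demo_iot_ov_pk primary key (ord_id)
)
organization index
pctthreshold 10        -- trailing columns pushing a row over 10% of a block spill over
including status       -- ord_id, ord_date and status stay in the leaf blocks
overflow;              -- notes is stored in the overflow segment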

This saving of block reads is probably the main feature that IOTs are known for, but there are others which I will cover in later posts. Two things I will mention now: firstly, using an IOT potentially saves disc space. An index is in effect a duplication of data held in the table: when you create an index no new information is created, but space is used up holding some of the table’s information in a structure suitable for fast lookup. Secondly, with a heap table both the index and the table have to be maintained whenever a change is made to the indexed columns; an IOT reduces this maintenance overhead as there is only one structure to maintain.

Now for some drawbacks.

  • The IOT has to be indexed on the primary key. There is no option to create an IOT based on other indexes. As such you have to either be accessing the table via the primary key to get the benefit – or you have to be a little cunning.
  • The index is going to be larger than it was and very often larger than the original table. This can slow down range scans or full scans of the index and a “full table scan” will now be a full index scan on this large object, so that can also negatively impact performance. However, if a range scan would then have resulted in access to the table to get extra columns, the IOT gives a similar benefit in reducing IO to that for single row lookups.
  • I just want to highlight that you now have no rowid for the rows.
  • Secondary indexes are supported {see the sketch after this list} but will potentially be less efficient due to this lack of rowid.

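As flagged in that last bullet, secondary indexes are created on an IOT in the normal way; a minimal sketch, re-using the made-up DEMO_IOT table from earlier in this post:

create index demo_iot_status_idx on demo_iot (status);

Because IOT rows have no physical rowid, a secondary index stores a logical rowid instead: the primary key plus a “guess” of the block the row was in when the index entry was made. If the row has since moved within the IOT the guess goes stale and Oracle falls back to a primary key lookup into the IOT, which is why access via secondary indexes can be less efficient.
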
So, a brief summary is that Index Organized Tables effectively move the table data into the Primary Key index, reduce the number of block lookups needed to select one row and can save some disc space. But you can only organize the table via the Primary Key, and it can make full or partial table scans and lookups via other indexes slower.

There are several more important benefits to IOTs {in my opinion} which I will come to over the next week or two.

Oracle11g Bitmap-Join IOTs (Us and Them)

With each new database release, nice little improvements and enhanced options continually get added. Since 11g R1, two index-related features can finally be used in combination with each other. To demonstrate, I’m first going to create and populate a so-called “large” Data Warehouse table. I’ll next create a standard bitmap index [...]