<..IOT1 – the basics
<….IOT2 – Examples and proofs
<……IOT3 – Significantly reducing IO
<……..IOT4 – Boosting Buffer Cache efficiency
<……….IOT5 – Primary Key Drawback
<…………IOT6(A) – Bulk Insert slowed down
IOT 6 part A covered the sort of insert work used to create an IOT or push a large number of rows into it via an “insert into…select from” type format. ie Bulk Load.
In this, part B, I am going to test inserting a small number of rows (10,000) into a table already holding a million rows. Something more like an hourly load or the inserts that would occur over a working day – a more OLTP-like test
I’m going to do the tests against four things. A “non-insert” test to ensure my data creation is not taking a lot of resource {the Null test}, then against my three versions of test table – a normal heap table, my IOT and my Partitioned IOT. I run each test on each segment type three times, the minimum number needed to help identify variation that could invalidate the test.
You will probably be surprised by the results….
All three tables have the same column structure, the heap has indexes on the primary key and a unique constraint, the IOT and partitioned IOT are organized on a modified version of the Primary key and have the second index for the unique key. Below is the creation statement for the partitioned IOT (the IOT is the same without the partition clause). You can get full creation details from IOT part 2:
create table transaction_IOT_P
(tran_type number(2) not null
,tran_id number(10) not null
,ACCO_type number(2) not null
,ACCO_id number(10) not null
,cre_date date not null
,vc_1 varchar2(1000) not null
,vc_2 varchar2(1000)
,date_1 date
,num_1 number(2)
,num_2 number(2)
,constraint tip_pk primary key(ACCO_type,ACCO_id,cre_date,tran_type,tran_id)
-- using index tablespace index_01
,constraint tip_tran_uq unique (tran_type,tran_id)
using index tablespace index_01
)
organization index
tablespace data_01
partition by range (cre_date)
(partition rm20110901 values less than (to_date('01-09-2011','DD-MM-YYYY'))
tablespace data_01
,partition rm20111001 values less than (to_date('01-10-2011','DD-MM-YYYY'))
tablespace data_01
,partition rm20111101 values less than (to_date('01-11-2011','DD-MM-YYYY'))
tablespace data_01
,PARTITION RMTOP VALUES LESS THAN (MAXVALUE)
tablespace USERS
)
The three tests are:
The code is written ‘badly’, ie a single record is prepared and inserted into the table at a time, within a loop of 10000 iterations. This is to avoid the bulk-processing that may benefit the heap table and also to better match the reality of an OLTP system. I do however only commit after all records are created for any given test, which is maybe a mistake but that’s what I did.
Now, before I go any further, does anyone want to guess by how much inserting into the heap table will be faster than inserting into the IOT (partitioned or otherwise)? Also, whether the partitioned IOT will be faster or slower than the IOT and what, if any, impact there will be by the ordering of data?
To allow you to think about that and not see the results by accident, I’ll stick in one of the insert harnesses here so you can see how I am doing it. Skip over if you just want the results
--
execute s_snap.my_snap(' finished non-insert test1')
--
-- Transaction_heap random data test
declare
v_num number :=10000; -- number of accounts
v_offset number :=-4;
v_str1 varchar2(60);
v_str2 varchar2(60);
v_tran_type number(2);
v_tran_id number(10);
v_acco_type number(2);
v_acco_id number(10);
v_cre_date date;
v_vc_1 varchar2(1000);
v_vc_2 varchar2(1000);
v_date_1 date :=sysdate;
v_num_1 number(2) :=10;
v_num_2 number(2) :=15;
begin
v_str1:=dbms_random.string('U',60);
v_str2:=dbms_random.string('U',60);
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
for i in 1..10000 loop
v_tran_type :=mod(1,3)+1;
v_tran_id := 1000000+i;
v_acco_type := 5+(trunc(dbms_random.value(1,3))*5); -- 10 or 15
v_acco_id :=trunc(dbms_random.value(1,v_num/2)); -- 1 to 5000
v_cre_date :=trunc(sysdate-v_offset)+(i/(60*60*24));
v_vc_1 :=substr(v_str1,1,51+mod(i,10));
v_vc_2 :=substr(v_str2,1,46+mod(i,15));
insert into transaction_heap
(tran_type,tran_id,ACCO_type,ACCO_id,cre_date,vc_1,vc_2,date_1,num_1,num_2)
values
(v_tran_type,v_tran_id,v_ACCO_type,v_ACCO_id,v_cre_date
,V_vc_1,v_vc_2,v_date_1,v_num_1,v_num_2);
end loop;
commit;
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
end;
/
--
execute s_snap.my_snap(' finished th insert test1')
--
Here are the results of three runs for all tests:
Elapsed time in seconds:
random insert ordered-by-account perfect IOT order match
Segment type Test 1 2 3 Test 1 2 3 Test 1 2 3
------------ ------ ---- ---- ------ ---- ---- ----- ---- ----
No insert 2.1 0.1 0.1 0.1 0.1 0.1 0.1 0.0 0.1
AVG AVG AVG
---- ---- ----
Heap 52.3 20.5 11.4 28.1 33.4 13.8 10.7 19.3 9.1 13.0 9.8 10.6
IOT 137.1 83.7 26.4 82.4 110.8 63.6 39.0 71.2 6.4 6.7 4.0 5.8
Partn'd IOT 33.2 9.5 8.7 17.1 17.1 10.4 14.3 13.9 8.1 5.3 4.5 6.0
---- ---- ----
Allow me to summarise that.
First, the preparation of the data and cycling around the loop, the no-insert tests, are negligible. {The first no_insert test is taking 2 seconds due to the parsing and collecting dictionary information overhead of the first thing done}. We can ignore the overhead of the test harness.
Much more significantly:
Hands up everyone expecting this? NB My hand would be down if I had not done similar tests before.
Why do we see these results?
I’ll just mention that the reason all the tests get quicker with each iteration is that the computer is doing the same thing over and over again and gets used to it, the relevant circuits in the chip are “oiling in”. Of course not, how silly – but it seems to be how non-IT people think of it
Basically more and more of each segment is getting cached and that reduces physical IO and speed things up. You see this a lot in testing and you need to be constantly vigilant about it. You might think I should really run the tests until they are steady-state, but that is a very false situation. On real system you don’t tend to get the same thing run over and over and over again on a system with nothing else running.
Much more interestingly, why is the partitioned IOT winning? Well, for the Heap table data is being poured into the leading edge of the table, which takes a little processing and some blocks to be created. However, there are also two indexes being maintained, the Primary Key and that Unique key. For each insert oracle has to find the right spot in each index (root block ->branch block->leaf block->check for row) and insert the new record, occasionally splitting a full leaf block.
For the IOT and the Partitioned IOT there is no table to maintain. In effect there are just two indexes to maintain. For the normal IOT this benefit is outweighed by the cost of working with a less space efficient Primary Key index. As there is the whole of the row in each index entry there are fewer index entries possible per leaf block and so more block need to be read in from disc and more splits are needed. Thus the inserts into the IOT are slower.
But for the partitioned IOT not only is there no table to maintain but the table (really the Primary Key) is partitioned – on date. We are creating data for a single day at a time. All the relevant blocks will be in the one partition (and it’s the latest and smallest partition) and quickly all cached in memory and processed in memory. It’s the efficient use of the cache again. For both the IOT and the Heap, the primary key segment being processed is larger and the inserts are spread across the whole index.
Ordering data to match the index order (ordered_by_account tests) helps for all three segment types – heap, IOT and partitioned IOT. We have two indexes and the data is being ordered for both in effect, as one is on the columns we are ordering the data creation for and the other, the unique key, is generated by an ascending sequence. The gain is about 25, 30% and is due to working through the indexes in order as opposed to skipping about. If the indexes were significantly larger than the buffer cache or on a highly active system where the turnover in the cache was very high (indeicated by a low buffer cache hit ratio
) then the benefit would be higher.
The perfect ordering test (perfect_iot_order_match) is a special case. Not only is the data being processed in order of the indexes but only one part of the index is being updated – for a single account. In this situation the maintaining of the index is so efficient that the time saved not processing the heap table means both the IOT and IOT_P win out over the heap. So in this special case, inserts into IOTs are faster than inserts into a heap table.
And finally a quick summary:
BTW I am conscious of the fact that the above may surprise many and you might wonder if my data or tests are flawed. I have stripped out most of the code from this post as it was just becoming massive and confusing, but contact me and I will happily supply all the example scripts and data.
Recent comments
17 weeks 1 day ago
26 weeks 6 days ago
28 weeks 4 days ago
31 weeks 6 days ago
34 weeks 1 day ago
43 weeks 4 days ago
45 weeks 1 day ago
46 weeks 1 day ago
46 weeks 2 days ago
49 weeks 18 hours ago