pbach's blog entries

Applying Exadata Bundle Patch 6 (BP6)

Well, November came around quickly. When I started blogging I promised myself that I would at least try to share my Exadata findings once a month, so it is time to share my experience and findings from applying Bundle Patch 6. If you are new to Exadata patching, you should hopefully find some important points below. The patching took place on a quarter rack (2 compute nodes). The high-level approach for applying BP6 is:
1.       Apply patch 10114705 - Database Bundle Patch 6 for Exadata first

Exadata, to index or not to index, that is the question

We will take a look at that question in this blog post by testing several different approaches and comparing the time and the statistics for each scenario.

The tests were performed on a quarter rack Exadata database machine (2 db nodes with 16 cores each and 3 storage servers). The database is set up for a data warehouse implementation and had been patched with Bundle Patch 5 at the time of testing.

The tests were executed on a table with 403M rows distributed over 14 range partitions: 7 non-compressed and 7 compressed with the HCC query option. Each test spans two partitions covering 57.5M rows. Please note the dimension tables contain 4,000 or fewer rows. The data is production, event-based data, meaning rows are generated when certain events occur.
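For readers new to HCC, mixing uncompressed and compressed partitions in one table only requires a compression clause on the individual partitions. A minimal sketch, assuming illustrative table and column names rather than the actual warehouse DDL:

-- Hypothetical sketch: a range-partitioned fact table where some partitions
-- are left uncompressed and others use the HCC query high option.
CREATE TABLE event_part_demo (
  t_id    NUMBER,   -- date key, e.g. 20100901
  s_id    NUMBER,
  p_id    NUMBER,
  cc_id   NUMBER,
  c_id    NUMBER,
  c_level NUMBER
)
PARTITION BY RANGE (t_id)
(
  PARTITION test_20100901 VALUES LESS THAN (20100902),
  PARTITION test_20100908 VALUES LESS THAN (20100909) COMPRESS FOR QUERY HIGH
);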

TABLE_NAME  PARTITION_NAME COMPRESS COMPRESS_FOR LAST_ANALYZED  SAMPLE_SIZE   NUM_ROWS
EVENT_PART  TEST_20100901  DISABLED              27-SEP-10         28793000   28793000
EVENT_PART  TEST_20100902  DISABLED              27-SEP-10         28793000   28793000
EVENT_PART  TEST_20100903  DISABLED              27-SEP-10         28793000   28793000
EVENT_PART  TEST_20100904  DISABLED              27-SEP-10         28793000   28793000
EVENT_PART  TEST_20100905  DISABLED              27-SEP-10         28793000   28793000
EVENT_PART  TEST_20100906  DISABLED              27-SEP-10         28793000   28793000
EVENT_PART  TEST_20100907  DISABLED              27-SEP-10         28793000   28793000
EVENT_PART  TEST_20100908  ENABLED  QUERY HIGH   27-SEP-10         28793000   28793000
EVENT_PART  TEST_20100909  ENABLED  QUERY HIGH   27-SEP-10         28793000   28793000
EVENT_PART  TEST_20100910  ENABLED  QUERY HIGH   27-SEP-10         28793000   28793000
EVENT_PART  TEST_20100911  ENABLED  QUERY HIGH   27-SEP-10         28793000   28793000
EVENT_PART  TEST_20100912  ENABLED  QUERY HIGH   27-SEP-10         28793000   28793000
EVENT_PART  TEST_20100913  ENABLED  QUERY HIGH   27-SEP-10         28793000   28793000
EVENT_PART  TEST_20100914  ENABLED  QUERY HIGH   27-SEP-10         28793000   28793000
 

Each test case/SQL has been executed five times, once under each of the following scenarios:

(1) Without any bitmap or regular indexes on non-compressed partitions
    Storage FTS on event_part and dimensions
(2) Without any bitmap or regular indexes on HCC partitions
    Storage FTS on event_part and dimensions
(3) With primary key constraints on dimension tables only
    Storage FTS on event_part with primary key lookup on dimensions
(4) With bitmap and primary key indexes on non-compressed partitions
    Bitmap index lookup on event_part and primary key lookup on dimensions
(5) With bitmap and primary key indexes on HCC partitions
    Bitmap index lookup on event_part and primary key lookup on dimensions

The test cases used are from a warehouse environment, and I have modified the column and table names. For the bitmap test cases I had to hint the queries to ensure the bitmap indexes were actually used.
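For reference, the index setup behind scenarios 3 to 5 would look roughly like the sketch below; the constraint and index names are made up for illustration, and the exact hints used to force the bitmap plans are not reproduced here.

-- Hypothetical sketch of the indexes behind scenarios 3-5.
-- Scenario 3 and up: primary key constraints on the dimension tables, e.g.:
ALTER TABLE s_test ADD CONSTRAINT s_test_pk PRIMARY KEY (s_id);

-- Scenarios 4 and 5: local bitmap indexes on the fact table's dimension keys, e.g.:
CREATE BITMAP INDEX event_part_s_id_bix ON event_part (s_id) LOCAL;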

The output from the test cases is over 20K lines, so I have summarized the elapsed time and a few statistics in the tables below to provide a better overview.

select /*+ MONITOR  basic */
            st.s_name,  pt.p_name,  cct.pc_name,  ep.c_level, count(ep.c_id)
  from event_part ep,  t_test tt, s_test st, p_test pt, cc_test cct
   where tt.t_id between 20100901 and 20100902
        and ep.t_id = tt.t_id
        and ep.t_id between 20100901 and 20100902
        and ep.s_id = st.s_id
        and ep.p_id = pt.p_id
        and ep.cc_id = cct.c_id
    group by st.s_name, pt.p_name, cct.pc_name, ep.c_level
    order by st.s_name, pt.p_name, cct.pc_name, ep.c_level;
 

Table figure for basic breakdown – Test case 1

Stats / Tests | 1 | 2 | 3 | 4 | 5
Elapsed time (sec) | 8.87 | 12.00 | 9.22 | 185.05 | 149.55
cell physical IO bytes saved by storage index | 0 | 0 | 0 | 0 | 0
cell physical IO bytes eligible for predicate offload | 5,209,325,568 | 0 | 5,209,325,568 | 0 | 0
cell physical IO interconnect bytes returned by smart scan | 2,562,203,600 | 0 | 2,562,201,584 | 0 | 0
cell flash cache read hits | 9 | 26 | 9 | 9,290 | 2,063
CC Total Rows for Decompression | - | 57,586,000 | - | - | 57,586,000

This is a basic query, which is used for high-level summaries, and it serves as a good baseline to compare against for the other test cases. There is no selective where clause predicate in this test case, so we will not benefit from any storage indexes here. The first three tests run without any indexes on the fact table and perform much better than tests 4 and 5, and we should of course not expect the CBO to follow the index path for a query like this anyway. For tests 1 and 3 it is evident that the performance gained comes from the storage server offloading and the smart scans. The CC stats above tell us that in test 2 the db node performs the decompression, so this test has to burn extra CPU cycles compared to tests 1 and 3. There is more to be said about test 2, but I'll try to cover that in the conclusion.
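The cell statistics quoted in these tables come from the session statistics; the exact capture script is not reproduced here, but a query along these lines returns the same counters:

-- Capture the Exadata cell statistics for the current session.
SELECT sn.name, ms.value
  FROM v$mystat ms
  JOIN v$statname sn ON sn.statistic# = ms.statistic#
 WHERE sn.name IN ('cell physical IO bytes saved by storage index',
                   'cell physical IO bytes eligible for predicate offload',
                   'cell physical IO interconnect bytes returned by smart scan',
                   'cell flash cache read hits',
                   'CC Total Rows for Decompression');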

 select /*+ MONITOR lc breakdown */
           st.s_name, pt.p_name,  cct.pc_name,  ep.c_level,  count(ep.c_id)
 from event_part ep, t_test tt, s_test st, p_test pt, cc_test cct
 where tt.t_id between 20100903 and 20100904
      and ep.t_id = tt.t_id
      and ep.t_id between 20100903 and 20100904
      and ep.s_id = st.s_id and ep.p_id = pt.p_id
      and ep.cc_id = cct.c_id and ep.c_id = 7
 group by st.s_name, pt.p_name, cct.pc_name, ep.c_level
 order by st.s_name, pt.p_name, cct.pc_name, ep.c_level;
 

Table figure for LC breakdown - Test case 2

Stats / Tests | 1 | 2 | 3 | 4 | 5
Elapsed time (sec) | 2.05 | 19.17 | 1.84 | 30.33 | 36.58
cell physical IO bytes saved by storage index | 4,186,636,288 | 0 | 4,186,636,288 | 0 | 0
cell physical IO bytes eligible for predicate offload | 5,209,292,800 | 0 | 5,209,292,800 | 0 | 0
cell physical IO interconnect bytes returned by smart scan | 317,496,848 | 0 | 317,497,280 | 0 | 0
cell flash cache read hits | 18 | 59 | 36 | 1,043 | 219
CC Total Rows for Decompression | 0 | 57,782,554 | 0 | 0 | 7,842,364

Similar findings to the 1st test case; however, in this test case we are performing the breakdown for a specific ID, and therefore the performance of tests 1 and 3 improved further thanks to the IO saved by the storage index. For this test case I ran tests 1 and 3 against the same partitions a second time, and it is worth noticing that the second time around the savings from the storage index improved; so the storage indexes are maintained and improved further as we select data from the tables and partitions.

select /*+ MONITOR lp breakdown */
           st.s_name,  pt.p_name, cct.pc_name, ep.c_level,  count(ep.c_id)
 from event_part ep,  t_test tt,  s_test st,  p_test pt,  cc_test cct
 where tt.t_id between 20100905 and 20100906
       and ep.t_id = tt.t_id
      and ep.t_id between 20100905 and 20100906
      and ep.s_id = st.s_id  and ep.p_id = pt.p_id
      and ep.cc_id = cct.c_id and ep.p_id = 4611686019802841877
 group by st.s_name, pt.p_name, cct.pc_name, ep.c_level
 order by st.s_name, pt.p_name, cct.pc_name, ep.c_level;
 

Table figure for lp breakdown - Test case 3

Stats / Tests | 1 | 2 | 3 | 4 | 5
Elapsed time (sec) | 2.99 | 6.01 | 2.72 | 49.22 | 39.29
cell physical IO bytes saved by storage index | 2,623,143,936 | 0 | 2,623,799,296 | 0 | 0
cell physical IO bytes eligible for predicate offload | 5,209,325,568 | 0 | 5,209,325,568 | 0 | 0
cell physical IO interconnect bytes returned by smart scan | 674,439,456 | 0 | 674,436,288 | 0 | 0
cell flash cache read hits | 64 | 44 | 10 | 2,113 | 635
CC Total Rows for Decompression | 0 | 57,979,108 | 0 | 0 | 15,582,048

Similar findings to the 2nd test case; this test is just performed on a different ID, which has a higher distinct count than the ID we tested in test case 2. As a result of that, and of how the data is sorted during insert, we see less IO saved by the storage index.

 select /*+ MONITOR spcl breakdown */
           st.s_name, pt.p_name,  cct.pc_name,  ep.c_level, count(ep.c_id)
  from event_part ep, t_test tt,   s_test st, p_test pt,  cc_test cct
 where tt.t_id between 20100906 and 20100907
      and ep.t_id = tt.t_id
      and ep.t_id between 20100906 and 20100907
      and ep.s_id = st.s_id and ep.p_id = pt.p_id
      and ep.cc_id = cct.c_id and ep.s_id = 1
      and ep.cc_id =7 and ep.p_id = 4611686019802841877
  group by st.s_name, pt.p_name, cct.pc_name, ep.c_level
  order by st.s_name, pt.p_name, cct.pc_name, ep.c_level;
 

Table figure for spcl breakdown – Test case 4

Stats / Tests | 1 | 2 | 3 | 4 | 5
Elapsed time (sec) | 1.67 | 13.69 | 1.14 | 12.77 | 7.90
cell physical IO bytes saved by storage index | 4,531,191,808 | 0 | 4,532,174,848 | 0 | 0
cell physical IO bytes eligible for predicate offload | 5,209,325,568 | 0 | 5,209,325,568 | 0 | 0
cell physical IO interconnect bytes returned by smart scan | 237,932,736 | 0 | 237,933,312 | 0 | 0
cell flash cache read hits | 73 | 52 | 10 | 594 | 183
CC Total Rows for Decompression | 0 | 57,782,554 | 0 | 0 | 5,614,752

This test case is performed with a where clause on multiple IDs. Again, tests 1 and 3 take advantage of the Exadata features and perform well. Tests 4 and 5 are still not close to tests 1 or 3, but have definitely become a bit more competitive. Comparing the two HCC tests (2 and 5), test 5 seems to do better, as it only has to burn CPU cycles for about 10% of the result set of test 2. A valid question to ask here would be why we are not seeing any benefit from either storage offloading or storage indexes in test 2, but again I'll defer that discussion to the conclusion.

select /*+ MONITOR  ttl */
           st.s_name, cct.pc_name, ep.c_level, count(ep.c_id) row_count,
           round(min((ep.ceg - ep.cbg) / 60),2) min_g_min,
           round(avg((ep.ceg - ep.cbg) / 60.0),2)  avg_g_min,
          round(max((ep.ceg - ep.cbg) / 60),2) max_g_min
  from event_part ep,  t_test tt,   s_test st,  cc_test cct
 where  tt.t_id between 20100901 and 20100902
      and ep.t_id = tt.t_id
      and ep.t_id between 20100901 and 20100902
      and ep.s_id = st.s_id
      and ep.cc_id = cct.c_id
  group by st.s_name, cct.pc_name, ep.c_level
  order by st.s_name, cct.pc_name, ep.c_level;
 

Table figure for ttl breakdown - Test case 5

Stats / Tests | 1 | 2 | 3 | 4 | 5
Elapsed time (sec) | 12.82 | 15.50 | 11.67 | 254.26 | 304.92
cell physical IO bytes saved by storage index | 0 | 0 | 0 | 0 | 0
cell physical IO bytes eligible for predicate offload | 5,209,325,568 | 0 | 5,209,325,568 | 0 | 0
cell physical IO interconnect bytes returned by smart scan | 2,328,566,432 | 0 | 2,328,567,440 | 0 | 0
cell flash cache read hits | 9 | 15 | 9 | 132,467 | 2,341
CC Total Rows for Decompression | 0 | 57,586,000 | 0 | 0 | 61,643,318

Very similar findings to the 1st test case; the only difference is that this query also examines timing, reporting the minimum, average and maximum minutes computed from two of the event columns for each group.

select /*+ MONITOR ttlsc */
           st.s_name, cct.pc_name, ep.c_level,  count(ep.c_id) row_count,
           round(min((ep.ceg - ep.cbg) / 60),2) min_g_min,
           round(avg((ep.ceg - ep.cbg) / 60.0),2)  avg_g_min,
           round(max((ep.ceg - ep.cbg) / 60),2) max_g_min
 from event_part ep, t_test tt, s_test st, cc_test cct
where tt.t_id between 20100903 and 20100904
      and ep.t_id = tt.t_id
      and ep.t_id between 20100903 and 20100904
      and ep.s_id = st.s_id
      and ep.cc_id = cct.c_id
      and ep.s_id = 2
      and ep.cc_id =6
    group by st.s_name, cct.pc_name, ep.c_level
    order by st.s_name, cct.pc_name, ep.c_level;
 

Table figure for ttlsc breakdown - Test case 6

Stats / Tests | 1 | 2 | 3 | 4 | 5
Elapsed time (sec) | 1.16 | 4.57 | 1.01 | 12.71 | 3.87
cell physical IO bytes saved by storage index | 4,697,096,192 | 0 | 4,698,832,896 | 0 | 0
cell physical IO bytes eligible for predicate offload | 5,209,292,800 | 0 | 5,209,292,800 | 0 | 0
cell physical IO interconnect bytes returned by smart scan | 55,906,960 | 0 | 55,906,384 | 0 | 0
cell flash cache read hits | 9 | 31 | 10 | 3,891 | 107
CC Total Rows for Decompression | 0 | 57,749,795 | 0 | 0 | 1,998,299

Very similar findings to the 4th test case.

 Conclusion

Most warehouse-like queries I have run in our Exadata environment do well without indexes on the fact tables, so it is no surprise to me to hear that more and more people are dropping most of their indexes and taking advantage of the Exadata features. If you would like to keep the primary key indexes on your dimension tables, to avoid the hassle of resolving duplicate key issues, that seems to be a valid option as well.

In my environment I have yet to find a case where a bitmap index search could compete with the no-index approach; and even if we found such a case, it would still have to show significant improvements before I would choose that path. Consider the benefit of not having to maintain the bitmap indexes after each load. There are also several restrictions with bitmap indexes that it would be nice not to have to worry about.

Now, I mentioned that I would get back to the test 2 results, which were based on Storage FTS on partitions compressed with the HCC query option.   In the past I have performed queries on HCC tables and have seen IO savings from the Storage indexes.  

Initially I suspected the test 2 results observed above to be a bug, or alternatively to be related to the fact that my HCC-compressed partitions are only 29MB apiece versus 2.4GB uncompressed. Oracle support/development has confirmed it to be related to the data size, as we can see from the stat "cell physical IO bytes eligible for predicate offload", which does not get bumped up by the query. The reason is that after partition pruning the table is too small for predicate pushdown to kick in, and since predicate pushdown doesn't kick in, the storage indexes won't kick in either.
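A quick way to check the partition sizes referred to above (an illustrative query, not part of the original tests):

-- Compare the compressed and uncompressed partition sizes of the fact table.
SELECT partition_name, ROUND(bytes / 1024 / 1024) AS size_mb
  FROM user_segments
 WHERE segment_name = 'EVENT_PART'
 ORDER BY partition_name;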

Please be aware that I don't know the storage index internals, but I look forward to learning more.

When are Exadata's storage indexes used?

Exadata's storage indexes provide data pruning at the storage layer. The storage indexes eliminate disk IO by storing a summary of the data distribution on the disks. MOS Note ID 1094934.1 provides the following summary:

Each disk in the Exadata storage cell is divided into equal sized pieces called storage regions (default 1MB). There is an index entry for every storage region (1MB of data stored on disk). Each entry contains the minimum and maximum value for columns seen in ‘where’ clause predicates. Information for up to 8 columns can be stored. The index is then used to eliminate disk IO by identifying which storage regions don’t match the ‘where’ clause of a query.
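To put that in perspective: with the default 1MB region size, the roughly 9.9GB PBA_ORDER table used below is covered by roughly 9,900 region entries, each holding min/max values for up to 8 where-clause columns.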

The question I'll try to help answer in this blog post is when Oracle can take advantage of the storage indexes, by sharing some of the test results and findings I came across when testing them. The data used for these tests is a simple version of dba_tab_columns plus an id column. The tests were performed on a quarter rack Exadata database machine (2 db nodes with 16 cores each and 3 storage servers).

Create sorted and unsorted Test data for storage index testing
 SQL> create table pba_order AS SELECT rownum pba_id, column_name, owner, table_name
            from (SELECT column_name, owner, table_name
                          from dba_tab_columns where rownum <= 10000),
                      (select rownum from dual CONNECT BY LEVEL <=20000)
          order by pba_id;
Table created.

SQL> select count(*) from pba_order;
          COUNT(*)  --  200,000,000
 
SQL> select count(distinct(pba_id)) from pba_order;
           COUNT(DISTINCT(PBA_ID))   --      200,000,000
 
SQL> create table pba_rnd as  select * from pba_order order by DBMS_RANDOM.RANDOM;
Table created.
 
Create a few null value entries for id column
SQL> update pba_rnd set PBA_ID = null where pba_id between 100 and 120;
21 rows updated.
 
SQL> update PBA_ORDER set PBA_ID = null where pba_id between 100 and 120;
21 rows updated.
SQL> commit;
 
Collect CBO statistics for tables
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS('JA2','PBA_RND');
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS('JA2','PBA_ORDER');
PL/SQL procedure successfully completed.

 SQL> select TABLE_NAME, NUM_ROWS, SAMPLE_SIZE, LAST_ANALYZED from user_tables where table_name like '%PBA%'; 

TABLE_NAME    NUM_ROWS   SAMPLE_SIZE  LAST_ANALYZED
PBA_RND      200000000     200000000  13-SEP-10
PBA_ORDER    200000000     200000000  13-SEP-10


Check the storage session statistics before testing

Name Value
cell physical IO bytes saved by storage index 0
 
Storage indexes work on the columns in your where clause, so let's check out a few different predicates. Please notice that I have not created any indexes on either of the tables.
SQL> select count(*) from PBA_RND where pba_id is null;     -- IS NULL on random sorted data
  COUNT(*)  --    21
 Elapsed: 00:00:00.04
 
Name Value
cell physical IO bytes saved by storage index  10303389184
 
SQL>  select count(*) from PBA_ORDER where pba_id is null;    -- IS NULL on sorted data
    COUNT(*)   --     21
Elapsed: 00:00:00.04 
 
Name Value
cell physical IO bytes saved by storage index  20866007040
So the storage index entry for a column can contain a NULL value, unlike regular b-tree indexes, which is also what Kerry Osborne's blog post on Exadata offloading confirms: http://kerryosborne.oracle-guy.com/2010/06/exadata-offload-the-secret-sauce/#more-2531

Check the storage session statistics before testing

Name Value
cell physical IO bytes saved by storage index  72634597376


SQL> select avg(pba_id) from pba_order where pba_id between 1000000 and 1000999;   -- Range scan on sorted data.

    AVG(PBA_ID)  --    1000499.5
1 row selected.
Elapsed: 00:00:00.05 
 
Name Value
cell physical IO bytes saved by storage index  82996625408
 
 SQL> select avg(pba_id) from pba_rnd where pba_id between 1000000 and 1000999;  -- Range scan on random sorted data.
AVG(PBA_ID)  --    1000499.5
 Elapsed: 00:00:05.68 
 
Name Value
cell physical IO bytes saved by storage index  82996625408

I tried a few more test scenarios, forcing direct reads and using parallel query, but it did not make any difference for this SQL. For the between test, I could only get the storage index to kick in for the ordered data.
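For completeness, the variations mentioned above were along these lines (illustrative statements, not the exact ones used):

-- Illustrative variation: parallel query against the randomly ordered table.
-- (Direct path reads were also forced; the exact mechanism is not shown here.)
SELECT /*+ PARALLEL(pba_rnd 16) */ AVG(pba_id)
  FROM pba_rnd
 WHERE pba_id BETWEEN 1000000 AND 1000999;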

Check the storage session statistics before testing

Name Value
cell physical IO bytes saved by storage index  93359243264


SQL> select avg(pba_id) from pba_order where pba_id <100;      -- LESS than on sorted data

AVG(PBA_ID)  --    50
Elapsed: 00:00:00.04
 
Name Value
cell physical IO bytes saved by storage index  103721861120
 
 SQL> select avg(pba_id) from pba_rnd where pba_id <100;       -- LESS than on unsorted data
    AVG(PBA_ID)  --        50
 Elapsed: 00:00:00.13
 
Name Value
cell physical IO bytes saved by storage index 113980342272
 
 SQL> select avg(pba_id) from pba_order where pba_id > 199999000;   -- greater than on sorted data
    AVG(PBA_ID)  --     199999501
Elapsed: 00:00:00.06
 
Name Value
cell physical IO bytes saved by storage index 144962977792
 
 SQL> select avg(pba_id) from pba_rnd where pba_id > 199999000;  -- greater than on unsorted data.
AVG(PBA_ID)  --    199999501
 Elapsed: 00:00:00.62
 
Name Value
cell physical IO bytes saved by storage index 154317127680

So we can confirm that Oracle can use storage indexing for both less-than and greater-than predicates.

Check the storage session statistics before testing

Name Value
cell physical IO bytes saved by storage index 174033272832
 SQL> select pba_id from pba_order where pba_id = 100000000;   -- equal sign on sorted data
    PBA_ID  --   100000000
 Elapsed: 00:00:00.05 
 
Name Value
cell physical IO bytes saved by storage index 184395268096
 
SQL> select pba_id from pba_rnd where pba_id = 100000000;    -- equal sign on unsorted data
    PBA_ID  -- 100000000
 Elapsed: 00:00:05.66
 
Name Value
cell physical IO bytes saved by storage index 184395268096

Again I tried a few more test scenarios, forcing direct reads and using parallel query, but it did not make any difference in this case. For the equality predicate test, I could only get the storage index to kick in for the ordered data.

Before finishing up, let’s check if storage indexes are kicking in on tables compressed with EHCC.   
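The DDL for PBA_HCC is not shown here; a minimal sketch, assuming a simple CTAS with the query high option, would be:

-- Hypothetical sketch of how a table like PBA_HCC could be created.
CREATE TABLE pba_hcc COMPRESS FOR QUERY HIGH
  AS SELECT * FROM pba_order;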

Segment Name   Size MB
PBA_ORDER        9,937
PBA_HCC            856

 

Table Name Last Analyzed Par Flash_C Cell_fl Compressed Compressed_for
PBA_HCC 14-SEP-10 No Default Default Enabled Query High Default
PBA_ORDER 14-SEP-10 No Default Default Disabled Default

Check the storage session statistics before testing 

Name Value
cell physical IO bytes saved by storage index 0

SQL>  select avg(pba_id) from pba_order where pba_id <100;

     AVG(PBA_ID) -- 50
 Elapsed: 00:00:00.13 
 
Name Value
cell physical IO bytes saved by storage index 10362617856
 

SQL> select avg(pba_id) from pba_hcc  where pba_id <100;
      AVG(PBA_ID)  --  50
Elapsed: 00:00:00.04  
 
Name Value
cell physical IO bytes saved by storage index 11097702400
 

If you would like to read more on EHCC, I have a blog post with a few test cases and conclusions at http://www.oaktable.net/blog/what-can-ehcc-do-you

Conclusion

In Exadata, the storage index can evaluate predicates like <, <=, =, !=, >=, >, IS NULL and IS NOT NULL. As you saw above, I was not able to get the storage indexes to kick in for the between and equality predicates on the randomly sorted data. They may work in other circumstances or test cases; currently it is very hard to evaluate, as there is no public tracing tool available.

In general we saw marginally better IO savings on the test cases with sorted data, and storage indexes also worked on EHCC-compressed tables. I have also performed the same tests on a global temporary table, and the storage indexes worked there as well.
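The global temporary table test is not reproduced here; a minimal sketch of that kind of setup (names and statements are illustrative) would be:

-- Hypothetical sketch of the global temporary table variant.
CREATE GLOBAL TEMPORARY TABLE pba_gtt ON COMMIT PRESERVE ROWS
  AS SELECT * FROM pba_order WHERE 1 = 0;

INSERT INTO pba_gtt SELECT * FROM pba_order;
COMMIT;

-- The same predicate tests can then be run against pba_gtt, e.g.:
SELECT AVG(pba_id) FROM pba_gtt WHERE pba_id < 100;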

Kerry Osborne has a good blog post confirming that storage indexes also work with bind variables: http://kerryosborne.oracle-guy.com/2010/09/do-storage-indexes-work-with-bind-variables/

The performance gained from the storage indexes really rocks my world and it made me review our design and index strategy for our warehouse.

What can EHCC do for you?

By now you have probably heard about Exadata Hybrid Columnar Compression (EHCC), but what benefit can EHCC give you in terms of storage and performance savings?
 
As always, it depends on your data. Below I'll share some of the test results I came across when testing EHCC. The data used for these tests is a short version of a fact table. The tests were performed on a quarter rack Exadata database machine (2 db nodes with 16 cores each and 3 storage servers).
 
As you may already have noticed, English is my second language, so please excuse me for spelling, grammar and whatever errors you may find in this post ;-)
 
- Test block compression against EHCC-compressed tables
First let’s look at the time it took to create the test data and the compression rate.
 
-- None compressed table
SQL> create table he100 PARALLEL 64 as select /*+ PARALLEL 64 */ * from he100_load_test;
Table created.
 
Elapsed: 00:00:33.97  
Did not seem to be CPU bound - Saw CPU utilization around 10 - 25%.
 
-- Block compressed table
SQL> create table he100_block compress for all operations PARALLEL 64 as select /*+ PARALLEL 64 */ *
                      from he100_load_test;
 
Elapsed: 00:00:28.51
Noticed CPU utilization to be around 40 - 70%
 
-- Table creation with EHCC query option