Search

OakieTags

Who's online

There are currently 0 users and 27 guests online.

Recent comments

Affiliations

Exadata storage indexes and DML

Last week I’ve gotten a question on how storage indexes (SI) behave when the table for which the SI is holding data is changed. Based on logical reasoning, it can be two things: the SI is invalidated because the data it’s holding is changed, or the SI is updated to reflect the change. Think about this for yourself, and pick a choice. I would love to hear if you did choose the correct one.

First let’s do a step back and lay some groundwork first. The tests done in this blogpost are done on an actual Exadata (V2 hardware), with Oracle version 11.2.0.4.6 (meaning bundle patch 6). The Exadata “cellos” (Cell O/S) version is 11.2.3.3.1.140529.1 on both the compute nodes and the storage nodes.

A storage index is a memory structure used by the cell daemon, which is the storage server process on the storage layer of Exadata. By default (although I’ve never seen it different yet) a SI contains minimum and maximum values for up to eight columns of the table it is describing. The memory structure is transient. It describes a region of one megabyte of a table. A storage index is populated during the fetching of data for a smart scan, based on the filter predicates of the query causing the smart scan. In essence the whole SI management is done automatically by the cell daemon (“essence” means you can play around with some undocumented settings on the database level, and there are some undocumented settings and events you can set on the cell server level).

Okay, back to the the original question. Let’s test this. First we need a table large enough to be smartscanned. I’ve got a table called ‘BIGTAB_NOHCC’ (as you probably guessed, there is a table ‘BIGTAB’ in the schema I am using, which is hybrid columnar compressed). This table consists of 2671736 blocks of 8KB, which means it got a total size of 21G. This is big enough for my database instance (big enough is relative to the buffer cache size) to get a smart scan.

For this test, I use the session statistics. In v$sesstat there is a statistic called ‘cell physical IO bytes saved by storage index’, which tells us how many bytes we saved from being scanned because of the use of storage indexes. I also show some output of a (single) storage server, although a normal Exadata rack typically will have 3, 7 or 14 storage servers in use.

First of all, in order to get an idea and be pretty sure my query will populate storage indexes, I stop and start the cell daemon on the storage servers. I use ‘service celld restart’ for that.

Next, in order to get storage index information for the table I use, I need some metadata. The metadata I need are:
a) data object id

select data_object_id from dba_objects where owner = 'MARTIN' and object_name = 'BIGTAB_NOHCC';

b) tablespace number

select ts# from v$tablespace t, dba_segments s, dba_objects o where t.name=s.tablespace_name and s.segment_name=o.object_name and o.data_object_id = 18716;

c) database unique id from the x$ksqdn

select ksqdngunid from x$ksqdn;

(thanks to the progeeking website for this information)

Next, I run the query on my BIGTAB_NOHCC table. Please mind it’s fundamental to have a filter predicate, and that the execution plan shows we are doing a full table scan. Only with a filter predicate the storage server has a reason to build a storage index:

MARTIN:dbm011> @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index					     0

MARTIN:dbm011> select count(*) from bigtab_nohcc where id=906259;

   COUNT(*)
-----------
	 16

MARTIN:dbm011> @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index					     0

Now let’s see if the storage layer built a storage index:

CellCLI> alter cell events = "immediate cellsrv.cellsrv_storidx('dumpridx','all',18716,5,1126366144);
Dump sequence #1 has been written to /opt/oracle/cell/log/diag/asm/cell/enkcel01/trace/svtrc_26565_58.trc
Cell enkcel01 successfully altered

This is why it’s handy to have the aforementioned data on the table: you can dump the specific storage indexes for a single table. The three numbers after ‘all’ are data_object_id, tablespace number and the ksqdngunid from the x$ksqdn table.

Let’s look in the file which the cell event produced:

# less /opt/oracle/cell/log/diag/asm/cell/enkcel01/trace/svtrc_26565_58.trc
Trace file /opt/oracle/cell/log/diag/asm/cell/enkcel01/trace/svtrc_26565_58.trc
ORACLE_HOME = /opt/oracle/cell
System name:    Linux
Node name:      enkcel01.enkitec.com
Release:        2.6.39-400.128.17.el5uek
Version:        #1 SMP Tue May 27 13:20:24 PDT 2014
Machine:        x86_64
CELL SW Version:        OSS_11.2.3.3.1_LINUX.X64_140529.1

*** 2014-07-12 13:16:17.030

*** 2014-07-12 13:16:17.030
UserThread: LWPID: 27455 userId: 58 kernelId: 58 pthreadID: 0x7f2863fe7940
2014-07-12 13:21:36.499123 :0005C9DE: $$$ Dumping storage idx summary for griddisk DATA_CD_08_enkcel01:
2014-07-12 13:21:36.499217 :0005C9E0: Dump sequence #1:

*** 2014-07-12 13:21:36.499
2014-07-12 13:21:36.499212 :0005C9DF: 
***************************
2014-07-12 13:21:36.499249 :0005C9E1: Dumping RIDX summary for objd 18716, tsn 5, dbid 1126366144

2014-07-12 13:21:36.499249*: RIDX(0x7f27b4ddab64) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {18716 5 1126366144}
2014-07-12 13:21:36.499249*: RIDX: strt 0 end 2048 offset 6215958528 size 1048576 rgnIdx 5928 RgnOffset 0 scn: 0x099a.5cc294b8 hist: 2
2014-07-12 13:21:36.499249*: RIDX validation history: 
2014-07-12 13:21:36.499249*: 0:FullRead 1:Undef 2:Undef 3:Undef 4:Undef 5:Undef 6:Undef 7:Undef 8:Undef 9:Undef
2014-07-12 13:21:36.499249*: Col id [1] numFilt 4 flg 2 (HASNONNULLVALUES): 
2014-07-12 13:21:36.499249*: lo: c3 c 34 11 0 0 0 0
2014-07-12 13:21:36.499249*: hi: c3 5a 5 18 0 0 0 0

Here you see the typical heading of an Oracle trace file, next the announcement of the dump (“Dumping storage idx summary”,”Dumping RIDX summary”). The real storage index information starts with “RIDX(0x7f27b4ddab64)”. Starting with that and until the “lo” and “hi” values, you are looking at an actual storage index which holds data for a single column. You can see which column by looking at the “Col id” in the square brackets: 1. It’s interesting to note that there is a scn (system change number) included. The storage index shows if there are NULL values in the column (in this case it says HASNONNULLVALUES, so we don’t have any NULL values in the 1MB chunk in the column this storage index describes), and, of course, the low and high values in the Oracle internal data format.

So, despite any indication on the database layer, the query built storage indexes! That should mean that executing the same query again will result in actually using the storage indexes which were just build:

MARTIN:dbm011> select count(*) from bigtab_nohcc where id=906259;

   COUNT(*)
-----------
	 16

MARTIN:dbm011> @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index				    5746368512

Yes! This using the storage index we caused the be built up previously!

With this being established, we can try to get an answer to our question: how does the storage index react to DML on the table it is describing?

We build the storage index, and used it. Now let’s update the ‘id’ field for which the storage index was build, and redo our query test:

MARTIN:dbm011> update bigtab_nohcc set id = id + 1;

16000000 rows updated.

MARTIN:dbm011> commit;

Commit complete.

Okay, now let’s redo the select again, and take stats before and after on storage index usage!

MARTIN:dbm011> @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index				    5746368512

MARTIN:dbm011> select count(*) from bigtab_nohcc where id=906260;

   COUNT(*)
-----------
	 16

MARTIN:dbm011> @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index				    5746368512

What we see here with the storage index session statistic, is the update statement didn’t use the storage index (which is obvious, we updated the data, which happened on the database layer, so we didn’t use a smartscan) because this number is the same as the last time we looked at it before the update statement.
When we executed the select query on the table with a filter on the id column again, there is no storage index usage, because the storage index session statistic didn’t increase.

Actually, this query built new storage indexes. When this query is executed again, we can use these:

MARTIN:dbm011> @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index				    5746368512

MARTIN:dbm011> select count(*) from bigtab_nohcc where id=906260;

   COUNT(*)
-----------
	 16

MARTIN:dbm011> @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index				   14534426624

(actually, it sometimes can take more than one execution for the storage indexes to be created again, my guess would be some heuristics are used to try to come up with the best candidates for storage indexes)

Let’s try another case: update a column in the table for which storage indexes are created on another column.
First make sure storage indexes are build:

MARTIN@dbm011 > @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index					     0

MARTIN@dbm011 > select count(*) from bigtab_nohcc where id=906260;

  COUNT(*)
----------
	16

MARTIN@dbm011 > @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index				    8790155264

For completeness sake, I verified the storage indexes for this table by using the dump shown previously. A simple grep on the dump shows this query created storage indexes for only column 1, the id column:

grep -e RIDX_VALID -e 'Col id' /opt/oracle/cell/log/diag/asm/cell/enkcel01/trace/svtrc_26565_4.trc
014-07-13 03:34:46.814235*: RIDX(0x7f27b4ddab64) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {18716 5 1126366144}
2014-07-13 03:34:46.814235*: Col id [1] numFilt 5 flg 2 (HASNONNULLVALUES): 
2014-07-13 03:34:46.814235*: RIDX(0x7f27b4ddac40) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {18716 5 1126366144}
2014-07-13 03:34:46.814235*: Col id [1] numFilt 4 flg 2 (HASNONNULLVALUES): 
2014-07-13 03:34:46.814235*: RIDX(0x7f27b4ddad1c) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {18716 5 1126366144}
2014-07-13 03:34:46.814235*: Col id [1] numFilt 5 flg 2 (HASNONNULLVALUES): 
2014-07-13 03:34:46.814587*: RIDX(0x7f27b4ddadf8) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {18716 5 1126366144}
2014-07-13 03:34:46.814587*: Col id [1] numFilt 5 flg 2 (HASNONNULLVALUES): 
2014-07-13 03:34:46.814587*: RIDX(0x7f27b4ddaed4) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {18716 5 1126366144}
2014-07-13 03:34:46.814587*: Col id [1] numFilt 5 flg 2 (HASNONNULLVALUES): 
2014-07-13 03:34:46.814587*: RIDX(0x7f27b4ddb08c) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {18716 5 1126366144}
2014-07-13 03:34:46.814587*: Col id [1] numFilt 5 flg 2 (HASNONNULLVALUES): 
...etc.

Now let’s update another field:

MARTIN@dbm011 > update bigtab_nohcc set spcol = spcol + 1;

16000000 rows updated.

MARTIN@dbm011 > commit;

Commit complete.

And query the storage index use, and do our query with filter predicate again:

MARTIN@dbm011 > @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index				    8790155264

MARTIN@dbm011 > select count(*) from bigtab_nohcc where id=906260;

  COUNT(*)
----------
	16

MARTIN@dbm011 > @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index				    8790155264

So…this strongly suggests the update invalidated our storage index, even when the column was not in the storage index.
Just for completeness sake, let’s issue the select statement again to see if the storage index was build up again:

MARTIN@dbm011 > select count(*) from bigtab_nohcc where id=906260;

  COUNT(*)
----------
	16

MARTIN@dbm011 > @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index				    9808936960

Yes, there it is again, the previous statement didn’t benefit from the storage indexes, but build them, and now we take advantage of it again.

Conclusion.
I ran these tests on a quarter rack without any other usage. The method I used for measuring how storage indexes behave was to execute on the database layer, and see the result on the storage layer and in database layer session statistics. It’s important to realise that despite this being quite strong evidence, there isn’t definite proof on my conclusion. Also, the behaviour described can change in future storage server versions.

My conclusion is that the storage server uses a system change number to validate if the storage indexes are valid. If not, storage indexes are not used. If blocks contained in a storage index progresses it’s system change number (in other words: are updated), the storage index is invalidated right away (during my tests, when the update starts, no or few storage indexes remain).

So, the ones that picked storage indexes being invalidated were right. As far as I could detect, there is no updating of storage indexes.

Tagged: exadata, internals, oracle, performance, storage index