Who's online

There are currently 0 users and 39 guests online.

Recent comments

Oakies Blog Aggregator

Delphix: Gartner’s Cool Vendors in Storage for 2013!


Delphix has been selected as one of Gartner’s Cool Vendors in Storage for 2013!

Gartner uses the Cool Vendor awards to highlight innovative and disruptive new companies that are rapidly changing how enterprises do business.

View Report


J Lewis Delphix recently engaged with Oracle guru Jonathan Lewis who spent a week putting Delphix through its paces and responding to community-driven requests / questions. A recording of the first webinar is available, informally answering the top community questions.

Watch Webinar


451 Research Many organizations have adopted agile development processes to accelerate their application projects. However, the database underlying these applications is often a major obstacle to success with agile development.  Register for this live webinar on May 8 to listen to the 451 Research and Delphix discuss the role of databases in agile development

.Watch Webinar


IBM Delphix completed a performance driven test with IBM, focusing on how to architect a high performance environment for virtual environments. The results may surprise you, as Delphix powered virtual databases performed over 500% faster in our joint tests.

Learn More


The number of ways Delphix customers continue to leverage Delphix is impressive. Delphix recently worked with customers who were using Delphix to dramatically reduce their load on production databases and their SAN. These discussions lead our professional services team to document Delphix’s impact on production from the data we’ve jointly collected with our customers.

Learn More


Delphix is now a SAP Endorsed Business Solution. Delphix’s data virtualization technologytransforms the economics of data and application management, allowing accelerated deployments of SAP® solutions with reduced total cost of ownership (TCO) and effort.

Learn More



Iron Man 3…

Iron Man blew me away when I first saw it. Iron Man 2 was good, but not as good as the first. Iron Man 3 was better than Iron Man 2, but not as good as Iron Man or The Avengers movie, in my opinion. Don’t take this as a major criticism, because all of these films are very cool. It’s a relative thing…

The tech didn’t take so much of the center stage during this movie. Don’t get me wrong, there is a lot, but it seemed much more about Tony than the suite. That’s not a bad thing, but the suit is the cool bit for me. There were a few suit-related scenes I would have developed a bit more. Also, it needed some AC/DC during a kick-ass action scene. If they were there I didn’t notice them. :(

The end was a bit weak, but stay until right at the end credits to see the extra scene and a little message before you leave…

I’m being extra picky here. It’s definitely cool and worth going to see!




Iron Man 3… was first posted on April 29, 2013 at 11:50 pm.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

4k sector size and Grid Infrastructure 11.2 installation gotcha

Some days are just too good to be true :) I ran into an interesting problem trying to install Grid Infrastructure for a two node cluster. The storage was presented via iSCSI which turned out to be a blessing and inspiration for this blog post. So far I haven’t found out yet how to create “shareable” LUNs in KVM the same way I did successfully with Xen. I wouldn’t recommend general purpose iSCSI for anything besides lab setups though. If you want network based storage, go and use 10GBit/s Ethernet and either use FCoE or (direct) NFS.

Here is my setup. Storage is presented in 3 targets using tgtd on the host:

  1. Target 1 contains 3×2 GB LUNs for OCR and voting disks in normal redundancy.
  2. Target 2 contains 3×10 GB LUNs for +DATA
  3. Target 2 contains 3×10 GB LUNs for +RECO

iSCSI initiators are Oracle Linux 6.4 on KVM with the host running OpenSuSE 12.3 providing the iSCSI targets. Yes, I know I’m probably the only Oracle DBA running SuSE, but to my defence I have a similar system with Oracle Linux 6.4 throughout and both work.

So besides the weird host OS there is nothing special. Since I’m lazy sometimes and don’t particularly like udev I decided to use ASMLib for device name persistence on the iSCSI LUNs. This turned out to be crucial, otherwise I’d never had written this post.

So much for the introduction

And here’s the problem. While installing Grid Infrastructure OUI allowed me to fill out all wizard interfaces and proceeded to install the binaries on all hosts. If you have installed RAC before that’s not the interesting part of the installation. It gets far more interesting when you run! Normally simply completes if you paid attention to the pre-requisites. I _have_ paid attention to them yet still the script failed on node 1! I don’t have the exact output on screen any more but the script bailed out trying to create the voting files.

Whenever something goes wrong with the installation of Grid Infrastructure you can turn to $GRID_HOME/cfgtoollogs/crsconfig/rootcrs_$(hostname).log. The relevant section in the file contained this:

2013-04-23 14:11:12: Creating voting files
2013-04-23 14:11:12: Creating voting files in ASM diskgroup OCR
2013-04-23 14:11:12: Executing crsctl replace votedisk '+OCR'
2013-04-23 14:11:12: Executing /u01/app/11.2.0/grid/bin/crsctl replace votedisk '+OCR'
2013-04-23 14:11:12: Executing cmd: /u01/app/11.2.0/grid/bin/crsctl replace votedisk '+OCR'
2013-04-23 14:11:12: Command output:
>  Failed to create voting files on disk group OCR.
>  Change to configuration failed, but was successfully rolled back.
>  CRS-4000: Command Replace failed, or completed with errors.
>End Command output
2013-04-23 14:11:12: Voting file add failed

Ooops. Not good-why would that fail? The ASM instance was up, the OCR has already been created. Most Clusterware commands leave a trace in $GRID_HOME/log/$(hostname)/client. I checked the last file in there but it didn’t help much:

2013-04-23 15:07:49.079: [  CRSCTL][3108575008]crsctl_format_diskgroup: diskgroup OCR creation with status 1. Please check the alert log file for ASM
2013-04-23 15:07:49.082: [ CSSCLNT][3108575008]clsssConfigLockTrace: === clsssConfigUnlock ===
2013-04-23 15:07:49.082: [ CSSCLNT][3108575008]clsssConfigLockTrace: lock(0x1cb70d0), version(1), size(1088)
2013-04-23 15:07:49.082: [ CSSCLNT][3108575008]clsssConfigLockTrace: offsets(24), activever(186647296)
2013-04-23 15:07:49.082: [ CSSCLNT][3108575008]clsssConfigLockTrace: id(14), instantiation(12), incarn(1)
2013-04-23 15:07:49.082: [ CSSCLNT][3108575008]clsssConfigLockTrace: mapoff(28), configoff(548), mapsize(512)
2013-04-23 15:07:49.082: [ CSSCLNT][3108575008]clsssConfigLockTrace: state(0), holders(0), waiters(0)
2013-04-23 15:07:49.082: [ CSSCLNT][3108575008]clsssConfigLockTrace: committimestamp(0), commitstate(0)
2013-04-23 15:07:49.082: [ CSSCLNT][3108575008]clsssConfigLockTrace: === Map (of first 7 entries) ===
2013-04-23 15:07:49.082: [ CSSCLNT][3108575008]clsssConfigLockTrace: 000 000 000 000 000 000 000

Except for line 1-that is a very obvious pointer. Maybe there is something in the ASM instance’s alert.log?

2013-04-23 15:07:48.733000 +01:00
NOTE: [ (TNS V1-V3) 13092] opening OCR file
NOTE: updated gpnp profile ASM diskstring:
NOTE: Creating voting files in diskgroup OCR
NOTE: Voting File refresh pending for group 1/0x20cf9032 (OCR)
NOTE: Attempting voting file creation in diskgroup OCR
ERROR: Could not create voting files. It spans across 161 AUs (max supported is 64 AUs)
ERROR: Voting file allocation failed for group OCR
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_13100.trc:
ORA-15303: Voting files could not be created in diskgroup OCR due to small Allocation Unit size
2013-04-23 15:07:51.278000 +01:00
NOTE: Attempting voting file refresh on diskgroup OCR

That was it! But what’s so special about the disk group? Something started to dawn on me…I have recently spent quite some time on 4k sector size disks and their implications. And my hard disks have recently been replaced … here’s what I got from ASM:

SQL> select name,sector_size from v$asm_diskgroup;

NAME                           SECTOR_SIZE
------------------------------ -----------
OCR                                   4096

So sure enough, my disk group uses 4k sectors, even though at no point I asked it to do so. In fact up to today I struggled to create a disk group with a 4k sector size for lack of supporting hardware! So this is the first time I see those, my hard disks in the lab server must be pretty new then. There are many ways to check for the block size of your LUN, this time I chose fdisk (this only works for LUNs that use the MBR format, if the LUN has been initialised with a GPT you need to install parted instead)

[root@rac11gr2node1 client]# fdisk -lu /dev/sda

Disk /dev/sda: 2147 MB, 2147483648 bytes
67 heads, 62 sectors/track, 1009 cylinders, total 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1          531712     4191385     1829837   83  Linux
[root@rac11gr2node1 client]#

Notice this line:

Sector size (logical/physical): 512 bytes / 4096 bytes

And there exactly lies the problem. ASMLib in my version cannot deal with a 4k sector size properly it seems. The problem is known, just search for ORA-15303 in My Oracle Support. It seems particular to LUNs/disks with different logical and physical block sizes though-refer to the MOS note for more information. For your reference, here are my versions:

[oracle@rac11gr2node1 ~]$ rpm -qa| grep oracleasm
[oracle@rac11gr2node1 ~]$

The strange thing is that the ASM instance with the single disk group has been created successfully, as shown by the log (note the disk list in line 1):

2013-04-23 14:10:54: Executing as oracle: /u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList 'ORCL:OCR1,ORCL:OCR2,ORCL:O
CR3' -redundancy NORMAL -configureLocalASM -au_size 1
2013-04-23 14:10:54: Running as user oracle: /u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList 'ORCL:OCR1,ORCL:OCR2,ORCL:OCR3' -redundancy NORMAL -configureLocalASM -au_size 1
2013-04-23 14:10:54:   Invoking "/u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList 'ORCL:OCR1,ORCL:OCR2,ORCL:OCR3' -redundancy NORMAL -configureLocalASM -au_size 1" as user "oracle"
2013-04-23 14:10:54: Executing /bin/su oracle -c "/u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList 'ORCL:OCR1,ORCL:OCR2,ORCL:OCR3' -redundancy NORMAL -configureLocalASM -au_size 1"
2013-04-23 14:10:54: Executing cmd: /bin/su oracle -c "/u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList 'ORCL:OCR1,ORCL:OCR2,ORCL:OCR3' -redundancy NORMAL -configureLocalASM -au_size 1"
2013-04-23 14:11:09: Command output:
>  ASM created and started successfully.
>  Disk Group OCR created successfully.
>End Command output

After spending a little while thinking what to do I decided to bite the bullet and start from scratch. The deinstall scripts again worked really well in my case.

Instead of ASMLib I used udev for the second attempt. The trick with udev is to find something to map. The scsi_id command for example is a great help in determining disk attributes, but there are others too. In my case, all I wanted to achieve is to change permissions of the iSCSI disks to oracle:dba and 0660. In my lab environment I didn't use multiple paths to the iSCSI target, and I didn't care about symlinks. You get it, this isn't 100% realistic ... Note that In KVM disks are named vd*, like /dev/vda for the first one etc. if you are using virtio, the para-virtualised drivers. The iSCSI disks ended up being called /dev/sd* which makes it easy to define an asm_diskstring.
[root@rac11gr2node1 sys]# /sbin/scsi_id --whitelisted --replace-whitespace --page=0x80 --device=/dev/sdg --export

So using the vendor string and the model attribute from SYSFS will most likely work. Please note that this is a rather simplistic model and wouldn't necessarily work with Fibre Channel attached disks or multipathing. The resulting udev rule in /etc/udev/rules/99-asm.rules was:

KERNEL=="sd[a-z]*", BUS=="scsi", SYSFS{vendor}=="IET", SYSFS{model}=="VIRTUAL-DISK", OWNER="oracle", GROUP="dba" MODE="0660"

I also deconfigured ASMLib on all nodes just to be sure:

[root@rac11gr2node1 u01]# /etc/init.d/oracleasm configure
Configuring the Oracle ASM library driver.

This will configure the on-boot properties of the Oracle ASM library
driver.  The following questions will determine whether the driver is
loaded on boot and what permissions it will have.  The current values
will be shown in brackets ('[]').  Hitting  without typing an
answer will keep that current value.  Ctrl-C will abort.

Default user to own the driver interface [oracle]:
Default group to own the driver interface [dba]:
Start Oracle ASM library driver on boot (y/n) [y]: n
Scan for Oracle ASM disks on boot (y/n) [y]: n
Writing Oracle ASM library driver configuration: done
Dropping Oracle ASMLib disks:                              [  OK  ]
Shutting down the Oracle ASMLib driver:                    [  OK  ]
[root@rac11gr2node1 u01]#

Running udevadm trigger loaded the new rules and indeed, the /dev/sd*1 devices now were owned by oracle:dba.

Then I ran through OUI again and wanted to assign the previously used disks again, but they didn't appear as candidates :( OK so the deconfig didn't zero out the disk headers. Not a problem, oracleasm has a "deletedisk" command that can be used for this exact purpose. You just need to be really sure it's the correct disk you are zeroing out, otherwise someone else will surely complain (you have been warned!).

The rest was simple. The ASM instance has again been created successfully, note the difference in disk names this time (my asm_diskstring was set to /dev/sd*1):

2013-04-23 16:29:29: Executing as oracle: /u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList '/dev/sda1,/dev/sdb1,/dev/sdc1' -redundancy NORMAL -diskString '/dev/sd*1' -configureLocalASM -au_size 1
2013-04-23 16:29:29: Running as user oracle: /u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList '/dev/sda1,/dev/sdb1,/dev/sdc1' -redundancy NORMAL -diskString '/dev/sd*1' -configureLocalASM -au_size 1
2013-04-23 16:29:29:   Invoking "/u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList '/dev/sda1,/dev/sdb1,/dev/sdc1' -redundancy NORMAL -diskString '/dev/sd*1' -configureLocalASM -au_size 1" as user "oracle"
2013-04-23 16:29:29: Executing /bin/su oracle -c "/u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList '/dev/sda1,/dev/sdb1,/dev/sdc1' -redundancy NORMAL -diskString '/dev/sd*1' -configureLocalASM -au_size 1"
2013-04-23 16:29:29: Executing cmd: /bin/su oracle -c "/u01/app/11.2.0/grid/bin/asmca -silent -diskGroupName OCR -diskList '/dev/sda1,/dev/sdb1,/dev/sdc1' -redundancy NORMAL -diskString '/dev/sd*1' -configureLocalASM -au_size 1"

And this time the voting disks were created as well:

CRS-2676: Start of 'ora.diskmon' on 'rac11gr2node2' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac11gr2node2' succeeded

ASM created and started successfully.

Disk Group OCR created successfully.

clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4256: Updating the profile
Successful addition of voting disk 9d9a9577a66a4f6fbf267b125dc7f4a3.
Successful addition of voting disk 26e5ff2544084ff0bfb9e753b090ec22.
Successful addition of voting disk 38a820cd4b424f4ebf514370168fa499.
Successfully replaced voting disk group with +OCR.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   9d9a9577a66a4f6fbf267b125dc7f4a3 (/dev/sda1) [OCR]
 2. ONLINE   26e5ff2544084ff0bfb9e753b090ec22 (/dev/sdb1) [OCR]
 3. ONLINE   38a820cd4b424f4ebf514370168fa499 (/dev/sdc1) [OCR]
Located 3 voting disk(s).

Very nice-job done. You should refer back to the relevant MOS note to read more about the problem but for now be advised that 4k sector sizes can cause surprises.

MV Refresh

Materialized views open up all sorts of possibilities for making reporting more efficient – but at the same time they can introduce some “interesting” side effects when you start seeing refreshes taking place. (Possibly one of the most dramatic surprises appeared in the upgrade that switched many refreshes into “atomic” mode, changing a “truncate / append” cycle into a massively expensive “delete / insert” cycle).

If you want to have some ideas of the type of work that is involved in the materialized view “fast refresh”, you could look at a recent pair of articles by Alberto Dell’Era on (very specifically) outer join materialized views (which a link back to a much older article on inner join materialized view refresh):



Its not about the outage

My Oracle Support had a fairly lengthy outage today right in the middle of the Australian business day.

But I’m not going to blog about that.  One thing I’ve learnt from many client sites is that people will understand and forgive things like outages, or errors, or crashes, or just plain wrong software, as long its evident that you are passionately working for the benefit of the user, that you were not lazy or flippant or learning from mistakes…

But one thing, perhaps the biggest thing, that customers will NOT tolerate, is when you don’t listen to what they’re trying to tell you

And that’s where MOS is suffering – not from outages, not from errors, but from not listening….

I logged this SR:

Problem: Making index partition unusable does not free underlying segment

Test Case:

SQL> sho parameter deferred

------------------------------------ ----------- ----------------------------------
deferred_segment_creation boolean TRUE

SQL> drop table T purge;

Table dropped.

SQL> create table T ( x timestamp, y int)
  4  (
  6  )
  7  /

Table created.

SQL> insert into T values ('01-APR-13',1);
SQL> insert into T values ('02-APR-13',1);
SQL> insert into T values ('03-APR-13',1);
SQL> insert into T values ('04-APR-13',1);

SQL> create index IX on T ( x ) local;

Index created.

SQL> col partition_name new_value d
SQL> select segment_name, partition_name, bytes from user_segments;

---------------------------------------- ------------------------------ ----------
T SYS_P42997 8388608
T SYS_P42998 8388608
T SYS_P42999 8388608
T ARCH_P0 8388608
T SYS_P42996 8388608
IX ARCH_P0 65536
IX SYS_P43000 65536
IX SYS_P43001 65536
IX SYS_P43002 65536
IX SYS_P43003 65536

10 rows selected.

SQL> alter index IX modify partition &&d unusable;

Index altered.

SQL> select segment_name, partition_name, bytes from user_segments;

---------------------------------------- ------------------------------ ----------
T SYS_P42997 8388608
T SYS_P42998 8388608
T SYS_P42999 8388608
T ARCH_P0 8388608
T SYS_P42996 8388608
IX ARCH_P0 65536
IX SYS_P43000 65536
IX SYS_P43001 65536
IX SYS_P43002 65536
IX SYS_P43003 65536

10 rows selected.

Now before people criticise that I didn’t show version and platform, remember that in logging an SR, these things are provided as the SR is logged.  In this case, the platform is AIX and the version is  I also posted a case into the SR showing the under on Linux, the segment is indeed correctly dropped when the index partition is set to unusable.

Its a trivial test case, and my inquiry was simple – is it platform or version or both that is the issue.

But this post is about listening…

First response to the SR:

Your problem is this: "Unusable Index Segment Still Exists in DBA_SEGMENTS for Alter Table Move"

Well…I dont think so. Probably because the test case makes no mention of alter-table-move. 

I point this out via an SR update.

Second response to the SR:

You can drop the partition to reclaim free space

For starters, the ambiguity is risky advice.  Do they mean drop the table partition or the index partition ?  Might my next SR be "how can I recover the data I just dropped ?"

Secondly, its not addressing the original request.

So I’m more than happy to accept that an MOS outage happened….these things do in the IT world.  We try to avoid them, but they happen.

But its sad when the NON-technical components of IT, that of simple good listening skills also suffer regular "outages".

OUGN Norway Conference

The OUGN conference (held on a cruise ship) was an awesome experience and so professionally run by the committee.  I recommend it to anyone.

The slides from my talks can be found here

Nightmares about exams…

It’s been over 20 years since I finished my finals for my first degree, but this time of year never fails to affect my already terrible sleeping patterns. I’ve started waking up in the morning panicking about an exam, which I’ve forgotten to revise for. It takes a few minutes for me to realize there is no exam, so I’ve got nothing to worry about… :)

Like I said, this happens every year, but I think the fact I’m currently working for University has reinforced the panic in me. I sense a few more weeks of exam panic induced insomnia, before I settle back into my normal insomnia.



Nightmares about exams… was first posted on April 29, 2013 at 10:25 am.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

fast refresh of outer-join-only materialized views – algorithm, part 2

In this post, we are going to complete part 1 illustrating the (considerably more complex) general case of a fast refresh from a master inner table without a unique constraint on the joined column(s).

To recap, now the outer slice can be composed of more than one row, for example:

ooo inn1
ooo inn2

and hence, both the DEL and INS step must consider (and read) the whole outer slice even if only a subset of the inner rows have been modified. This requires both more resources and a considerably more complex algorithm. Let's illustrate it (the mandatory test case is here).

The DEL macro step

This sub step (named DEL.del by me) is performed first:

delete from test_mv where rowid in (
select rid
  from (
select test_mv.rowid rid,
       row_number() over (partition by test_outer_rowid order by rid$ nulls last) r,
       count(*)     over (partition by test_outer_rowid ) t_cnt,
       count(rid$)  over (partition by test_outer_rowid ) in_mvlog_cnt
  from test_mv, (select distinct rid$ from mlog$_test_inner) mvlog
 where /* read touched outer slices start */
       test_mv.test_outer_rowid in
          select test_outer_rowid
            from test_mv
           where test_inner_rowid in (select rid$ from mlog$_test_inner)
       /* read touched outer slices end */
   and test_mv.test_inner_rowid = mvlog.rid$(+)
 /* victim selection start */
 where t_cnt > 1
   and ( (in_mvlog_cnt = t_cnt and r > 1)
         (in_mvlog_cnt < t_cnt and r <= in_mvlog_cnt)
 /* victim selection end */

followed by the DEL.upd one:

update test_mv
   set jinner = null, xinner = null, pkinner = null, test_inner_rowid = null
 where test_inner_rowid in (select rid$ from mlog$_test_inner)

This two steps combined do change all the rows of the MV marked in the log (and only them, other rows are not modified at all); the first step deletes some of them, leaving all the others to the second one, that sets to null their columns coming from the inner table.

DEL.upd is straighforward. Let's illustrate the DEL.del algorithm:

a) the section "read touched outer slices" fetches all the MV outer slices that have at least one of their rows marked in the log;
b) the slices are outer joined with the "mvlog" in-line view, so that rid$ will be nonnull for all rows marked in the log;
c) the analytic functions, for each outer slice separately, compute the number of rows (column t_cnt), the number of rows marked (column in_mvlog_cnt), and then attach a label (column r) that orders the row (order is not important at all besides non-marked rows being ordered last)
d) the where-predicate "victim selection" dictates which rows to delete.

The victim selection predicate has three sub-components, each implementing a different case (again, considering each slice separately):

"t_cnt > 1": do not delete anything if the slice contains only one row (since it is for sure marked and hence will be nulled by DEL.upd)

             rid$ t_cnt in_mvlog_cnt r  action
ooo inn1 not-null     1            1 1  updated by DEL.upd   

"in_mvlog_cnt = t_cnt and r > 1": all rows are marked, delete all but one (that will be nulled by DEL.upd)

             rid$ t_cnt in_mvlog_cnt r  action
ooo inn1 not-null     3            3 1  updated by DEL.upd
ooo inn2 not-null     3            3 2  deleted by DEL.del
ooo inn3 not-null     3            3 3  deleted by DEL.del

"in_mvlog_cnt < t_cnt and r <= in_mvlog_cnt": only some rows are marked; delete all marked rows, keep all the others.

             rid$ t_cnt in_mvlog_cnt r  action
ooo inn1 not-null     3            2 1  deleted by DEL.del
ooo inn2 not-null     3            2 2  deleted by DEL.del
ooo inn3     null     3            2 3  nothing

The INS macro step

The first sub-step is INS.ins:

insert into test_mv
select  o.jouter,  o.xouter,  o.pkouter, o.rowid,
       jv.jinner, jv.xinner, jv.pkinner, jv.rid
  from ( select test_inner.rowid rid,
           from test_inner
          where rowid in (select rid$ from mlog_test_inner)
       ) jv, test_outer o
 where jv.jinner = o.jouter

this sub-step simply find matches in the outer table for the marked inner table rows (note that it is an inner join, not an outer join), and inserts them in the MV.

Then, INS.del:

delete from test_mv sna$ where rowid in (
select rid
 from (
select test_mv.rowid rid,
       row_number()            over (partition by test_outer_rowid order by test_inner_rowid nulls first) r,
       count(*)                over (partition by test_outer_rowid ) t_cnt,
       count(test_inner_rowid) over (partition by test_outer_rowid ) nonnull_cnt
  from test_mv
 where /* read touched outer slices start */
       test_mv.test_outer_rowid in
          select o.rowid
            from ( select test_inner.rowid rid$,
                     from test_inner
                    where rowid in (select rid$ from mlog$_test_inner)
                 ) jv, test_outer o
           where jv.jinner = o.jouter
      /* read touched outer slices end */
 /* victim selection start */
 where t_cnt > 1
   and ( (nonnull_cnt = 0 and r > 1)
         (nonnull_cnt > 0 and r <= t_cnt - nonnull_cnt)
 /* victim selection end */

this substep has a SQL structure very similar to DEL.upd, hence I will simply outline the algorith: first, the statement identifies (in the "read touched outer slices" section) all the outer slices that had at least one rows inserted by INS.ins, by replaying its join; then, for each slice, it deletes any row, if it exists, that has column "test_inner_rowid" set to null (check the "victim selection predicate").

Side note: I cannot understand how nonnull_cnt could be = 0 - possibly that is for robustness only or because it can handle variants of the DEL step I haven't observed.

speeding up

These are the indexes that the CBO might enjoy using to optimize the steps of the propagation from the inner table:
- DEL.del: test_mv(test_inner_rowid, test_outer_rowid)
- DEL.upd: test_mv(test_inner_rowid)
- INS.ins: test_outer(jouter)
- INS.del: test_outer(jouter) and test_mv(test_outer_rowid , test_inner_rowid)

And hence, to optimize all steps:
- test_outer(jouter)
- test_mv(test_inner_rowid, test_outer_rowid)
- test_mv(test_outer_rowid , test_inner_rowid)

And of course we need the usual index on test_inner(jinner) to optimize the propagation from the outer table (not shown in this post), unless we positively know that the outer table is never modified.

Note that the two indexes test_mv(test_inner_rowid, test_outer_rowid) and test_mv(test_outer_rowid , test_inner_rowid) allow to skip visiting the MV altogether (except for deleting rows, obviously) and hence might reduce the number of consistent gets dramatically (the indexes are both "covering" indexes for the SQL statements we observed in the DEL.del and INS.del) .

For example, in my test case (check ojoin_mv_test_case_indexed.sql), the plan for the DEL.del step is:

| 0|DELETE STATEMENT                |                        |
| 1| DELETE                         |TEST_MV                 |
| 2|  NESTED LOOPS                  |                        |
| 3|   VIEW                         |VW_NSO_1                |
| 4|    SORT UNIQUE                 |                        |
| 5|     VIEW                       |                        |
| 6|      WINDOW SORT               |                        |
| 7|       HASH JOIN OUTER          |                        |
| 8|        HASH JOIN SEMI          |                        |
|10|         VIEW                   |VW_NSO_2                |
|11|          NESTED LOOPS          |                        |
|12|           TABLE ACCESS FULL    |MLOG$_TEST_INNER        |
|14|        VIEW                    |                        |
|15|         SORT UNIQUE            |                        |
|16|          TABLE ACCESS FULL     |MLOG$_TEST_INNER        |
|17|   MAT_VIEW ACCESS BY USER ROWID|TEST_MV                 |
5 - filter[ (T_CNT>1 AND ((IN_MVLOG_CNT=T_CNT AND R>1)

Note the absence of any access to the MV to identify the rows to be deleted (row source operation 5 and its progeny; note the filter operation, which is the final "victim selection predicate"); the MV is only accessed to physically delete the rows.

Ditto for the INS.del step:

| 0|DELETE STATEMENT                     |                        |
| 1| DELETE                              |TEST_MV                 |
| 2|  NESTED LOOPS                       |                        |
| 3|   VIEW                              |VW_NSO_1                |
| 4|    SORT UNIQUE                      |                        |
| 5|     VIEW                            |                        |
| 6|      WINDOW SORT                    |                        |
| 7|       HASH JOIN SEMI                |                        |
| 8|        INDEX FULL SCAN              |TEST_MV_TEST_INNER_ROWID|
| 9|        VIEW                         |VW_NSO_2                |
|10|         NESTED LOOPS                |                        |
|11|          NESTED LOOPS               |                        |
|12|           TABLE ACCESS FULL         |MLOG$_TEST_INNER        |
|13|           TABLE ACCESS BY USER ROWID|TEST_INNER              |
|14|          INDEX RANGE SCAN           |TEST_OUTER_JOUTER_IDX   |
|15|   MAT_VIEW ACCESS BY USER ROWID     |TEST_MV                 |
5 - filter[ (T_CNT>1 AND ((NONNULL_CNT=0 AND R>1)

You might anyway create just the two "standard" single-column indexes on test_mv(test_inner_rowid) and test_mv(test_outer_rowid) and be happy with the resulting performance, even if you now will access the MV to get the "other" rowid - it all depends, of course, on your data (how many rows you have in each slice, and how many slices are touched by the marked rows) and how you modify the master tables.

Hey Mum, I’m Famous!!!

I got a mail this week from Richard Harrison:

“Hi Martin
See you made it in to oracle magazine this month.That’s the pinnacle of any oracle professionals career – all downhill from here on in :-)”

I was not aware of my sudden raise to fame, but Richard is right – I’m in this month’s “peer to peer” section, which just gives some details about recent Oracle Ace’s I think. I’d forgotten that I had done a form they sent me before Christmas, answering a set of questions. It is interesting to see what they picked out of all my answers to include.

I think most of us would feel it is nice to see something about ourselves in print (so long as it is not derogatory or critical, of course!), though when I come to think of it, I don’t really know why it is nice – other than the rather self-serving feeling of having our egos polished. And as my friends I drink with would (and probably will) comment, my ego certainly does not need much polishing :-). I’ve of course made it worse by blogging about how famous I now am. Polish, polish, polish.

Don’t worry, my wife stepped in to put me back in my place. “You could tell your mum when you next ring her – not that she’ll be impressed at all!”. Thanks wife. She’s right. My mum will just say “that’s nice” in a tone that in no way convinces me she means it, and will then proceed to talk at me about her new cats, what’s on TV and all the terrible things going on in the world, according to the “Daily Mail” (An utterly horrible and vacuous daily tabloid paper her in the UK).

So thank you for the heads-up Richard. I’m looking forward to the rapid decline of my career as you predict… :-)

Announcing EMC WORLD 2013 Flash Related Sessions

Interested In EMC Flash Products Division Technology?
This is just a quick blog entry to announce sessions at EMC WORLD offered by speakers from EMC’s Flash Products Division.  The sessions I’m speaking at is the one about accelerating SQL Server and Oracle with EMC XtremSW Cache.



Filed under: oracle