11.2.0.2 Bundled Patch 3 for Linux x86-64bit Take 2

Yesterday I wrote about the application of Bundle Patch 3 to my 2 node RAC cluster. Unfortunately I ran into problems when applying the patches to the GRID_HOME. I promised a fix for the situation, and here it is.

First of all I was unsure whether I could apply the missing patches manually, but then decided against it. The opatch output for interim patches lists each patch together with a unique patch ID, as shown here:

Interim patches (3) :

Patch  10626132     : applied on Wed Feb 02 16:08:43 GMT 2011
Unique Patch ID:  13350217
 Created on 31 Dec 2010, 00:18:12 hrs PST8PDT
 Bugs fixed:
 10626132

The formatting unfortunately is lost when pasting this here.

I was not sure if that patch/unique patch combination would appear if I patched manually, so I decided not to be brave and instead rolled the bundle patch back altogether before applying it again.

Patch Rollback

This was actually very simple: I opted to roll back all the applied patches from the GRID_HOME. The documentation states that you simply append the “-rollback” flag to the opatch command. I tried it on the node where the application of 2 patches had failed:

[root@node1 stage]# opatch auto /u01/app/oracle/product/stage/10387939 -oh /u01/app/oragrid/product/11.2.0.2 -rollback
Executing /usr/bin/perl /u01/app/oragrid/product/11.2.0.2/OPatch/crs/patch112.pl -patchdir /u01/app/oracle/product/stage -patchn 10387939 -oh /u01/app/oragrid/product/11.2.0.2 -rollback -paramfile /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
opatch auto log file location is /u01/app/oragrid/product/11.2.0.2/OPatch/crs/../../cfgtoollogs/opatchauto2011-02-03_09-04-13.log
Detected Oracle Clusterware install
Using configuration parameter file: /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
OPatch  is bundled with OCM, Enter the absolute OCM response file path:
/u01/app/oracle/product/stage/ocm.rsp
Successfully unlock /u01/app/oragrid/product/11.2.0.2
patch 10387939  rollback successful for home /u01/app/oragrid/product/11.2.0.2
The patch  10157622 does not exist in /u01/app/oragrid/product/11.2.0.2
The patch  10626132 does not exist in /u01/app/oragrid/product/11.2.0.2
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-4123: Oracle High Availability Services has been started.

So that was simple enough. Again, you won’t see anything happening in your opatch session and you might think that the command has somehow stalled. I usually start a “screen” session on my terminal and open a second window to tail the opatchauto log file in $GRID_HOME/cfgtoollogs/.
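
For example, using the log file name printed by opatch auto at the start of this run:

[root@node1 ~]# tail -f /u01/app/oragrid/product/11.2.0.2/cfgtoollogs/opatchauto2011-02-03_09-04-13.log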

Re-Applying the Patch

The next step is to re-apply the patch. The initial failure was due to a lack of disk space, as was evident from the log file.

2011-02-02 15:57:45: The apply patch output is Invoking OPatch 11.2.0.1.4

 Oracle Interim Patch Installer version 11.2.0.1.4
 Copyright (c) 2010, Oracle Corporation.  All rights reserved.

 UTIL session

 Oracle Home       : /u01/app/oragrid/product/11.2.0.2
 Central Inventory : /u01/app/oracle/product/oraInventory
 from           : /etc/oraInst.loc
 OPatch version    : 11.2.0.1.4
 OUI version       : 11.2.0.2.0
 OUI location      : /u01/app/oragrid/product/11.2.0.2/oui
 Log file location : /u01/app/oragrid/product/11.2.0.2/cfgtoollogs/opatch/opatch2011-02-02_15-57-35PM.log

 Patch history file: /u01/app/oragrid/product/11.2.0.2/cfgtoollogs/opatch/opatch_history.txt

 Invoking utility "napply"
 Checking conflict among patches...
 Checking if Oracle Home has components required by patches...
 Checking conflicts against Oracle Home...
 OPatch continues with these patches:   10157622

 Do you want to proceed? [y|n]
 Y (auto-answered by -silent)
 User Responded with: Y

 Running prerequisite checks...
 Prerequisite check "CheckSystemSpace" failed.
 The details are:
 Required amount of space(2086171834) is not available.
 UtilSession failed: Prerequisite check "CheckSystemSpace" failed.

 OPatch failed with error code 73

2011-02-02 15:57:45: patch /u01/app/oracle/product/stage/10387939/10157622  apply  failed  for home  /u01/app/oragrid/product/11.2.0.2
2011-02-02 15:57:45: Performing Post patch actions

So this time around I ensured that I had enough free space (2.5G recommended minimum) available in my GRID_HOME.
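
A quick sanity check before kicking off opatch auto again might look like this (a sketch; the path is the GRID_HOME used throughout this post and the threshold is roughly 2.5G expressed in KB):

$ df -h /u01/app/oragrid/product/11.2.0.2
$ [ $(df -Pk /u01/app/oragrid/product/11.2.0.2 | awk 'NR==2 {print $4}') -ge 2621440 ] || echo "not enough free space for the bundle patch"

With sufficient free space available, re-applying the patch is simply the inverse of the rollback: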

[root@node1 stage]# opatch auto /u01/app/oracle/product/stage/10387939 -oh /u01/app/oragrid/product/11.2.0.2
Executing /usr/bin/perl /u01/app/oragrid/product/11.2.0.2/OPatch/crs/patch112.pl -patchdir /u01/app/oracle/product/stage -patchn 10387939 -oh /u01/app/oragrid/product/11.2.0.2 -paramfile /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
opatch auto log file location is /u01/app/oragrid/product/11.2.0.2/OPatch/crs/../../cfgtoollogs/opatchauto2011-02-03_09-27-39.log
Detected Oracle Clusterware install
Using configuration parameter file: /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
OPatch  is bundled with OCM, Enter the absolute OCM response file path:
/u01/app/oracle/product/stage/ocm.rsp
Successfully unlock /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10387939  apply successful for home  /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10157622  apply successful for home  /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10626132  apply successful for home  /u01/app/oragrid/product/11.2.0.2
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-4123: Oracle High Availability Services has been started.
[root@node1 stage]#

And this time it worked: all 3 patches were applied. However, my free space diminished quite drastically, from 2.5G down to around 780M, and that was after purging lots of logs from $GRID_HOME/log/`hostname -s`/. Nevertheless, this concludes the patch application.

Summary

In summary I am quite impressed with this patch. It looks as if it had been designed to be deployable by OEM: it’s silent, doesn’t require input (except for the ocm.rsp file) and is a rolling patch. However, you have to check the installed patches against the list of installable targets, either by hand or through a script, to ensure that all patches have actually been applied. You also have to ensure you have enough free space on your $GRID_HOME mount point.
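
A minimal sketch of such a check, assuming the bundle is staged under /u01/app/oracle/product/stage/10387939 as in this post and running as the Grid Infrastructure owner (not an official procedure):

GRID_HOME=/u01/app/oragrid/product/11.2.0.2
STAGE=/u01/app/oracle/product/stage/10387939
# every numeric directory in the staged bundle is a patch that should show up in the inventory
for p in $(ls -1 "$STAGE" | grep -E '^[0-9]+$'); do
  if "$GRID_HOME"/OPatch/opatch lsinventory -oh "$GRID_HOME" | grep '^Patch ' | grep -q "$p"; then
    echo "patch $p is applied in $GRID_HOME"
  else
    echo "patch $p is MISSING from $GRID_HOME"
  fi
done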

As an enhancement request I’d like to see some feedback from the opatch session that it has started doing things: initially I hit CTRL-C thinking the command had stalled, while it was actually busy in the background shutting down my CRS stack. The “workaround” is to tail the log file with the “-f” option.

11.2.0.2 Bundled Patch 3 for Linux x86-64bit

It has become a habit recently to issue explicit warnings when presenting information that, if it doesn’t work as described, might cause you harm. So to follow suit, here’s mine:

Warning! This is a Patch Set Update installation war story: it worked for me, but that by no means implies that the same will work for you! So always apply the PSU or bundle patch on a non-critical test system first. And always take a backup of your ORACLE_HOME and inventory before applying any patches. You have been warned; I’d rather be safe than sorry.

A good one I think …  First of all thanks to Coskan for the pointer to the Oracle blog entry: http://blogs.oracle.com/UPGRADE/2011/01/11202_bundled_patch_3_for_linu.html which actually prompted me to apply the patch in the first place.

My environment is a 2 node 11.2.0.2 cluster on OEL 5.5 64bit. Judging by the readme file I need the latest OPatch, 11.2.0.1.4 (patch p6880880), and the bundle patch itself (patch 10387939). Unfortunately Oracle leaves it as an exercise to the reader to make sense of the patch readme and note 1274453.1, which is applicable if ACFS is involved on the cluster. Although I am using ACFS, all my ACFS volumes were disabled before patching (i.e. they are not shared ORACLE_HOMEs). If you have ACFS in use then follow note 1274453.1 and shut all of the volumes down before proceeding. Note 1274453.1 is divided into a number of sections, so it is important to find the right case first. I wanted to try the opatch auto feature this time, but patch the GRID_HOME independently of the RDBMS_HOME. I knew from other users that opatch auto had gained a bad reputation in the past, but I was curious whether Oracle had fixed it. Regardless of your configuration, you have to stop the dbconsole instances on all nodes if there are any.

If applicable, the next step is to stop all database instances and their corresponding ACFS volumes; the ACFS file systems also have to be dismounted. If you are unsure, you can use /sbin/acfsutil registry to list the ACFS volumes; those registered with Clusterware show up in the output of crsctl status resource -t.
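
On my system that boiled down to something along these lines (a sketch only; the database name and the ACFS volume device are made-up examples, not values from this cluster):

[oracle@node1 ~]$ srvctl stop database -d TEST                    # databases running out of the home to be patched
[root@node1 ~]# /sbin/acfsutil registry                           # list the registered ACFS file systems
[root@node1 ~]# srvctl stop filesystem -d /dev/asm/acfsvol-361    # stops and unmounts the registered file system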

Important! If you have not already done so, update OPatch on all nodes, for both the GI and the RDBMS home. It’s a lot of manual work; I wish there was a cluster-wide OPatch location, it would make my life so much easier. Just patching OPatch on 4 nodes is a pain, and I don’t want to imagine an 8 or more node cluster right now.
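
A small loop can take some of the pain out of it; a sketch, assuming passwordless ssh for the home owner and the usual MOS file name for the OPatch download (the RDBMS homes would be updated the same way as their owner):

# refresh OPatch in the Grid home on every node; file and path names are examples
for h in node1 node2; do
  scp p6880880_112000_Linux-x86-64.zip ${h}:/tmp
  ssh ${h} 'cd /u01/app/oragrid/product/11.2.0.2 && mv OPatch OPatch.$(date +%F) && unzip -q /tmp/p6880880_112000_Linux-x86-64.zip'
done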

Now proceed with the patch installation. I’ll only let it patch the GRID_HOME at this stage. The patch should be rolling, which is good. Note that you’ll be running as root, which can cause its own problems in larger organisations. My patch is staged in /u01/app/oracle/product/stage, and the patch readme suggests I should use the opatch executable in the $GRID_HOME.

[root@node1 ~]# cd /u01/app/oracle/product/stage
[root@node1 stage]# cd 10387939
[root@node1 10387939]# export PATH=/u01/app/oragrid/product/11.2.0.2/OPatch:$PATH

Oracle Configuration Manager response file

Unlike previous patch sets, which prompted the user every time opatch was run, the OCM configuration is no longer part of the interactive patch installation. Instead, you have to create an OCM configuration “response” file before you apply the patch. The file is created in the current directory. As the GRID_OWNER, execute $ORACLE_HOME/OPatch/ocm/bin/emocmrsp as shown here:

[root@node1 stage]# su - oragrid
[oragrid@node1 ~]$ . oraenv
ORACLE_SID = [oragrid] ? grid
The Oracle base for ORACLE_HOME=/u01/app/oragrid/product/11.2.0.2 is /u01/app/oragrid/product/admin
[oragrid@node1 ~]$ /u01/app/oragrid/product/11.2.0.2/OPatch/ocm/bin/emocmrsp
OCM Installation Response Generator 10.3.1.2.0 - Production
Copyright (c) 2005, 2009, Oracle and/or its affiliates.  All rights reserved.

Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name:

You have not provided an email address for notification of security issues.
Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]:  y
The OCM configuration response file (ocm.rsp) was successfully created.
[oragrid@node1 ~]$ pwd
/home/oragrid

I don’t want automatic email notifications in this case, so I left the email address blank and chose to remain uninformed of security issues. I then moved the file to /u01/app/oracle/product/stage/ocm.rsp as root for the next steps. It is good practice to save the detailed Oracle inventory information for later. Again as the grid owner I ran this command:

[root@node1 stage]# su - oragrid
[oragrid@node1 ~]$ . oraenv
ORACLE_SID = [oragrid] ? grid
The Oracle base for ORACLE_HOME=/u01/app/oragrid/product/11.2.0.2 is /u01/app/oragrid/product/admin
[oragrid@node1 ~]$ /u01/app/oragrid/product/11.2.0.2/OPatch/opatch lsinv -detail > /tmp/gridhomedetail

What can (will) be patched?

The bundle.xml file lists the components that can be patched with a bundle patch. In my case it listed the three patches of the bundle together with their target types (unfortunately the XML listing did not survive the formatting here).

So in other words, the bundle consists of 3 patches: 10387939, 10157622 and 10626132. For each of these a target type is shown. SIHA is short for Single Instance High Availability, which has been renamed “Oracle Restart”. I assume SIDB is single instance database, but I don’t know for sure.
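
Since the XML did not survive the formatting here, it is easiest to inspect bundle.xml in the staged patch yourself; something like this will do (xmllint merely pretty-prints the file and is optional):

$ cd /u01/app/oracle/product/stage/10387939
$ xmllint --format bundle.xml 2>/dev/null || cat bundle.xml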

Patching the GRID_HOME

I’m now ready to apply the patch to the GRID home. The command to do so is simple; the example below worked for me:

[root@node1 stage]# opatch auto /u01/app/oracle/product/stage/10387939 -oh /u01/app/oragrid/product/11.2.0.2
Executing /usr/bin/perl /u01/app/oragrid/product/11.2.0.2/OPatch/crs/patch112.pl -patchdir /u01/app/oracle/product/stage -patchn 10387939 -oh /u01/app/oragrid/product/11.2.0.2 -paramfile /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
opatch auto log file location is /u01/app/oragrid/product/11.2.0.2/OPatch/crs/../../cfgtoollogs/opatchauto2011-02-02_15-50-04.log
Detected Oracle Clusterware install
Using configuration parameter file: /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
OPatch  is bundled with OCM, Enter the absolute OCM response file path:

Before entering the path I opened another terminal to look at the log file, which is in $ORACLE_HOME/cfgtoollogs/ and named opatchauto plus a timestamp. After entering the full path to my ocm.rsp file nothing happened in my opatch session; Oracle could have told us that it was about to apply the patch, because that is exactly what it is doing. Go back to your other terminal and watch the log messages fly by! The opatch auto command automatically does everything the DBA had to do previously, including unlocking the CRS stack and calling opatch napply for the relevant bits and pieces. This is a nice step forward indeed. I can vividly remember having to apply portions of a PSU to the GRID_HOME and others to the RDBMS home, 6 or 7 steps in total before a patch was applied. That was hard work.

Only after CRS has been shut down (quite a while after entering the path to the ocm.rsp file!) will you see this line:

Successfully unlock /u01/app/oragrid/product/11.2.0.2

Even further down the line, after the patches have been applied, you will see these messages:

patch /u01/app/oracle/product/stage/10387939/10387939  apply successful for home  /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10157622  apply  failed  for home  /u01/app/oragrid/product/11.2.0.2
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-4123: Oracle High Availability Services has been started.

And with that I got my prompt back. One patch failed to apply; the log file indicated a lack of space (2G needs to be free). Tomorrow I’ll post an update and remove/reapply the patch manually.

I checked the other node and found that the patch had indeed been applied on the local node only. It hasn’t propagated across, which is good, as the readme wasn’t really clear whether the patch was rolling. From what I’ve seen I’d call it a local patch, similar to the “opatch napply -local” we did manually before the opatch auto option was introduced. And even better, it worked. Querying opatch lsinventory I got this result:

Lsinventory Output file location : /u01/app/oragrid/product/11.2.0.2/cfgtoollogs/opatch/lsinv/lsinventory2011-02-02_16-01-27PM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (1):

Oracle Grid Infrastructure                                           11.2.0.2.0
There are 1 products installed in this Oracle Home.

Interim patches (1) :

Patch  10387939     : applied on Wed Feb 02 15:56:50 GMT 2011
Unique Patch ID:  13350217
 Created on 30 Dec 2010, 22:55:01 hrs PST8PDT
 Bugs fixed:
 10158965, 9940990, 10190642, 10031806, 10228635, 10018789, 9744252
 10010252, 9956713, 10204358, 9715581, 9770451, 10094635, 10121589
 10170431, 9824198, 10071193, 10145612, 10035737, 9845644, 10086980
 10052141, 10039731, 10035521, 10219576, 10184634, 10207092, 10138589
 10209232, 8752691, 9965655, 9819413, 9500046, 10106828, 10220118, 9881076
 9869287, 10040531, 10122077, 10218814, 10261389, 10033603, 9788588
 9735237, 10126219, 10043801, 10073205, 10205715, 9709292, 10105926
 10079168, 10098253, 10005127, 10013431, 10228151, 10092153, 10142909
 10238786, 10260808, 10033071, 9791810, 10052956, 9309735, 10026972
 10080579, 10073683, 10004943, 10019218, 9539440, 10022980, 10061490
 10006008, 6523037, 9724970, 10142776, 10208386, 10113803, 10261680
 9671271, 10084145, 10051966, 10355493, 10227133, 10229719, 10046912
 10228393, 10353054, 10142788, 10221016, 9414040, 10127360, 10310299
 10094201, 9591812, 10129643, 10332589, 10026193, 10195991, 10260870
 10248523, 9951423, 10261072, 10299224, 10230571, 10222719, 10233732
 10113633, 10102506, 10094949, 10077191, 10329146, 8685446, 10048701
 10314582, 10149223, 10245259, 10151017, 9924349, 10245086, 11074393

Rac system comprising of multiple nodes
 Local node = node1
 Remote node = node2

--------------------------------------------------------------------------------

OPatch succeeded.

Patching the RDBMS home

Now with that I’ll try applying it to the RDBMS home on the same node:

[root@node1 stage]# opatch auto /u01/app/oracle/product/stage/10387939 -oh /u01/app/oracle/product/11.2.0.2
Executing /usr/bin/perl /u01/app/oragrid/product/11.2.0.2/OPatch/crs/patch112.pl -patchdir /u01/app/oracle/product/stage -patchn 10387939 -oh /u01/app/oracle/product/11.2.0.2 -paramfile /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
opatch auto log file location is /u01/app/oragrid/product/11.2.0.2/OPatch/crs/../../cfgtoollogs/opatchauto2011-02-02_16-05-29.log
Detected Oracle Clusterware install
Using configuration parameter file: /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
OPatch  is bundled with OCM, Enter the absolute OCM response file path:
/u01/app/oracle/product/stage/ocm.rsp
patch /u01/app/oracle/product/stage/10387939/10387939  apply successful for home  /u01/app/oracle/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10157622/custom/server/10157622  apply successful for home  /u01/app/oracle/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10626132  apply successful for home  /u01/app/oracle/product/11.2.0.2
[root@node1 stage]#

Cool! That was indeed easy. Did it work?

$ORACLE_HOME/OPatch/opatch lsinv
[...]
--------------------------------------------------------------------------------
Installed Top-level Products (1):

Oracle Database 11g                                                  11.2.0.2.0
There are 1 products installed in this Oracle Home.

Interim patches (3) :

Patch  10626132     : applied on Wed Feb 02 16:08:43 GMT 2011
Unique Patch ID:  13350217
 Created on 31 Dec 2010, 00:18:12 hrs PST8PDT
 Bugs fixed:
 10626132

Patch  10157622     : applied on Wed Feb 02 16:08:27 GMT 2011
Unique Patch ID:  13350217
 Created on 19 Nov 2010, 01:41:19 hrs PST8PDT
 Bugs fixed:
 9979706, 9959110, 10016083, 10015460, 10014392, 9918485, 10157622
 10089120, 10057296, 9971646, 10053985, 10040647, 9978765, 9864003
 10069541, 10110969, 10107380, 9915329, 10044622, 10029119, 9812970
 10083009, 9812956, 10048027, 10036193, 10008467, 10040109, 10015210
 10083789, 10033106, 10073372, 9876201, 10042143, 9963327, 9679401
 10062301, 10018215, 10075643, 10007185, 10071992, 10057680, 10038791
 10124517, 10048487, 10078086, 9926027, 10052721, 9944948, 10028235
 10146768, 10011084, 10027079, 10028343, 10045436, 9907089, 10073075
 10175855, 10178670, 10072474, 10036834, 9975837, 10028637, 10029900, 9949676

Patch  10387939     : applied on Wed Feb 02 16:07:18 GMT 2011
Unique Patch ID:  13350217
 Created on 30 Dec 2010, 22:55:01 hrs PST8PDT
 Bugs fixed:
 10158965, 9940990, 10190642, 10031806, 10228635, 10018789, 9744252
 10010252, 9956713, 10204358, 9715581, 9770451, 10094635, 10121589
 10170431, 9824198, 10071193, 10145612, 10035737, 9845644, 10086980
 10052141, 10039731, 10035521, 10219576, 10184634, 10207092, 10138589
 10209232, 8752691, 9965655, 9819413, 9500046, 10106828, 10220118, 9881076
 9869287, 10040531, 10122077, 10218814, 10261389, 10033603, 9788588
 9735237, 10126219, 10043801, 10073205, 10205715, 9709292, 10105926
 10079168, 10098253, 10005127, 10013431, 10228151, 10092153, 10142909
 10238786, 10260808, 10033071, 9791810, 10052956, 9309735, 10026972
 10080579, 10073683, 10004943, 10019218, 9539440, 10022980, 10061490
 10006008, 6523037, 9724970, 10142776, 10208386, 10113803, 10261680
 9671271, 10084145, 10051966, 10355493, 10227133, 10229719, 10046912
 10228393, 10353054, 10142788, 10221016, 9414040, 10127360, 10310299
 10094201, 9591812, 10129643, 10332589, 10026193, 10195991, 10260870
 10248523, 9951423, 10261072, 10299224, 10230571, 10222719, 10233732
 10113633, 10102506, 10094949, 10077191, 10329146, 8685446, 10048701
 10314582, 10149223, 10245259, 10151017, 9924349, 10245086, 11074393

Rac system comprising of multiple nodes
 Local node = node1
 Remote node = node2

--------------------------------------------------------------------------------

OPatch succeeded.

It did indeed. As a nice side effect I didn’t even have to worry about the srvctl stop/start home commands; they were run automatically for me. From the log:

2011-02-02 16:08:45: /u01/app/oracle/product/11.2.0.2/bin/srvctl start home -o /u01/app/oracle/product/11.2.0.2 -s /u01/app/oracle/product/11.2.0.2/srvm/admin/stophome.txt -n node1 output is
2011-02-02 16:08:45: Started resources from datbase home /u01/app/oracle/product/11.2.0.2
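
For reference, when patching a database home manually (without opatch auto) the equivalent steps around opatch napply would look something like this; a sketch based on the log output above, not a quote from the readme:

[oracle@node1 ~]$ srvctl stop home -o /u01/app/oracle/product/11.2.0.2 -s /u01/app/oracle/product/11.2.0.2/srvm/admin/stophome.txt -n node1
# ... apply the patches with opatch napply ...
[oracle@node1 ~]$ srvctl start home -o /u01/app/oracle/product/11.2.0.2 -s /u01/app/oracle/product/11.2.0.2/srvm/admin/stophome.txt -n node1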

Now I’ll simply repeat this on the remaining nodes and I should be done. To test the stability of the process I didn’t limit the opatch command to a specific home, but let it patch them all.

Patching all homes in one go

This takes a little longer but otherwise is just the same. I have added the output here for the sake of completeness:

[root@node2 stage]# which opatch
/u01/app/oragrid/product/11.2.0.2/OPatch/opatch
[root@node2 stage]# opatch auto /u01/app/oracle/product/stage/10387939
Executing /usr/bin/perl /u01/app/oragrid/product/11.2.0.2/OPatch/crs/patch112.pl -patchdir /u01/app/oracle/product/stage -patchn 10387939 -paramfile /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
opatch auto log file location is /u01/app/oragrid/product/11.2.0.2/OPatch/crs/../../cfgtoollogs/opatchauto2011-02-02_16-36-09.log
Detected Oracle Clusterware install
Using configuration parameter file: /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
OPatch  is bundled with OCM, Enter the absolute OCM response file path:
/u01/app/oracle/product/stage/ocm.rsp
patch /u01/app/oracle/product/stage/10387939/10387939  apply successful for home  /u01/app/oracle/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10157622/custom/server/10157622  apply successful for home  /u01/app/oracle/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10626132  apply successful for home  /u01/app/oracle/product/11.2.0.2
Successfully unlock /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10387939  apply successful for home  /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10157622  apply successful for home  /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10626132  apply successful for home  /u01/app/oragrid/product/11.2.0.2
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-4123: Oracle High Availability Services has been started.
[root@node2 stage]#

Well, that’s it for the technical part. When I compared the number of patches applied in the $GRID_HOME on node 2 (which simply used “opatch auto”) I found 3 interim patches, compared to only 1 on node 1 where I patched just the GRID_HOME. I’ll have to raise this with Oracle…

[oragrid@node2 ~]$ /u01/app/oragrid/product/11.2.0.2/OPatch/opatch lsinv
Invoking OPatch 11.2.0.1.4

Oracle Interim Patch Installer version 11.2.0.1.4
Copyright (c) 2010, Oracle Corporation.  All rights reserved.

Oracle Home       : /u01/app/oragrid/product/11.2.0.2
Central Inventory : /u01/app/oracle/product/oraInventory
 from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.4
OUI version       : 11.2.0.2.0
OUI location      : /u01/app/oragrid/product/11.2.0.2/oui
Log file location : /u01/app/oragrid/product/11.2.0.2/cfgtoollogs/opatch/opatch2011-02-02_17-13-40PM.log

Patch history file: /u01/app/oragrid/product/11.2.0.2/cfgtoollogs/opatch/opatch_history.txt

Lsinventory Output file location : /u01/app/oragrid/product/11.2.0.2/cfgtoollogs/opatch/lsinv/lsinventory2011-02-02_17-13-40PM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (1):

Oracle Grid Infrastructure                                           11.2.0.2.0
There are 1 products installed in this Oracle Home.

Interim patches (3) :

Patch  10626132     : applied on Wed Feb 02 16:48:59 GMT 2011
Unique Patch ID:  13350217
 Created on 31 Dec 2010, 00:18:12 hrs PST8PDT
 Bugs fixed:
 10626132

Patch  10157622     : applied on Wed Feb 02 16:48:34 GMT 2011
Unique Patch ID:  13350217
 Created on 19 Nov 2010, 01:41:33 hrs PST8PDT
 Bugs fixed:
 9979706, 9959110, 10016083, 10015460, 10014392, 9918485, 10157622
 10089120, 10057296, 9971646, 10053985, 10040647, 9978765, 9864003
 10069541, 10110969, 10107380, 9915329, 10044622, 10029119, 9812970
 10083009, 9812956, 10048027, 10036193, 10008467, 10040109, 10015210
 10083789, 10033106, 10073372, 9876201, 10042143, 9963327, 9679401
 10062301, 10018215, 10075643, 10007185, 10071992, 10057680, 10038791
 10124517, 10048487, 10078086, 9926027, 10052721, 9944948, 10028235
 10146768, 10011084, 10027079, 10028343, 10045436, 9907089, 10073075
 10175855, 10072474, 10036834, 9975837, 10028637, 10029900, 9949676
 9974223, 10260251

Patch  10387939     : applied on Wed Feb 02 16:46:29 GMT 2011
Unique Patch ID:  13350217
 Created on 30 Dec 2010, 22:55:01 hrs PST8PDT
 Bugs fixed:
 10158965, 9940990, 10190642, 10031806, 10228635, 10018789, 9744252
 10010252, 9956713, 10204358, 9715581, 9770451, 10094635, 10121589
 10170431, 9824198, 10071193, 10145612, 10035737, 9845644, 10086980
 10052141, 10039731, 10035521, 10219576, 10184634, 10207092, 10138589
 10209232, 8752691, 9965655, 9819413, 9500046, 10106828, 10220118, 9881076
 9869287, 10040531, 10122077, 10218814, 10261389, 10033603, 9788588
 9735237, 10126219, 10043801, 10073205, 10205715, 9709292, 10105926
 10079168, 10098253, 10005127, 10013431, 10228151, 10092153, 10142909
 10238786, 10260808, 10033071, 9791810, 10052956, 9309735, 10026972
 10080579, 10073683, 10004943, 10019218, 9539440, 10022980, 10061490
 10006008, 6523037, 9724970, 10142776, 10208386, 10113803, 10261680
 9671271, 10084145, 10051966, 10355493, 10227133, 10229719, 10046912
 10228393, 10353054, 10142788, 10221016, 9414040, 10127360, 10310299
 10094201, 9591812, 10129643, 10332589, 10026193, 10195991, 10260870
 10248523, 9951423, 10261072, 10299224, 10230571, 10222719, 10233732
 10113633, 10102506, 10094949, 10077191, 10329146, 8685446, 10048701
 10314582, 10149223, 10245259, 10151017, 9924349, 10245086, 11074393

Rac system comprising of multiple nodes
 Local node = node2
 Remote node = node1

--------------------------------------------------------------------------------

OPatch succeeded.

Viewing Runtime Load Balancing Events

Yesterday I ran a benchmark on a 2 node RAC cluster (ProLiant BL685c G6, each with four six-core AMD Opteron 8431 processors and 32G RAM). It runs Oracle Grid Infrastructure 11.2.0.2 as well as an Oracle 11.2.0.2 database on Oracle Enterprise Linux 5.5 64bit with device-mapper-multipath.

I was testing how the system would react under load, but also wanted to see whether Runtime Load Balancing was working. The easiest way to check this is to view the AQ events that are generated for a service when AQ HA notifications are enabled. They can either be dequeued from the database, as described in chapter 11 of Pro Oracle Database 11g RAC on Linux, or queried from the database directly. The latter is the quicker method and this article will focus on it.

Before you can make use of Runtime Load Balancing you need to set at least 2 properties in your service:

  • Connection Load Balancing Goal (either SHORT or LONG)
  • Runtime Load Balancing Goal (SERVICE_TIME or THROUGHPUT)

.Net applications require AQ HA notifications to be set to true because they cannot directly make use of Fast Application Notification (FAN) events. My JDBC application is fully capable of using FAN events; however, as you will see later, I am using the AQ notifications anyway to view the events.

Connected as the owner of the Oracle binaries, I created a new service to make use of both instances:

$ srvctl add service -d TEST -s TESTSRV -r TEST1,TEST2 -P BASIC  \
> -l PRIMARY -y MANUAL -q true -x false -j short -B SERVICE_TIME \
> -e SESSION -m BASIC -z 0 -w 0

The service TESTSRV for database TEST has TEST1 and TEST2 as preferred instances, and the service should be started (manually) when the database is in the primary role. AQ notifications are enabled, and I chose a connection load balancing goal of SHORT (usually fine for web applications with connection pooling) and a runtime load balancing goal of SERVICE_TIME (also appropriate for the many short transactions typical of a web environment). The remaining parameters define Transparent Application Failover. Please refer to the output of “srvctl add service -h” for more information about the command line parameters.

The result of this endeavour can be viewed with srvctl config service:

$ srvctl config service -d TEST -s TESTSRV
Service name: TESTSRV
Service is enabled
Server pool: TEST_TESTSRV
Cardinality: 2
Disconnect: false
Service role: PRIMARY
Management policy: MANUAL
DTP transaction: false
AQ HA notifications: true
Failover type: SESSION
Failover method: BASIC
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: SHORT
Runtime Load Balancing Goal: SERVICE_TIME
TAF policy specification: BASIC
Edition:
Preferred instances: TEST1,TEST2
Available instances:

So to begin with I created the order entry schema (SOE) in preparation for a Swingbench run. (I know that Swingbench’s Order Entry is probably not the best benchmark out there, but my client knows and likes it.) Once about 10G of data had been generated I started a Swingbench run with 300 users and a reasonably low think time (minimum transaction time 20ms, maximum 60ms). The connect string was //scan1.example.com:1800/TESTSRV

A query against gv$session showed an even balance of sessions, which was good:

select count(inst_id), inst_id
 from gv$session
where username = 'SOE'
group by inst_id;

However, whatever I did, I couldn’t get the Runtime Load Balancing figures in sys.sys$service_metrics_tab to change. They always looked like this (column user_data):

{instance=TEST1 percent=50 flag=UNKNOWN aff=TRUE}{instance=TEST2 percent=50 flag=UNKNOWN aff=TRUE} }

That sort of made sense, as none of the nodes broke into a sweat: the system was more than 50% idle with a load average of about 12. So that wouldn’t cut it. Instead of experimenting with the Swingbench parameters, I decided to fall back on the silly CPU burner: a while loop that generates random numbers. I wasn’t interested in I/O at this stage, and created this minimal script:

$ cat dothis.sql
declare
 n number;
begin
 while (true) loop
 n:= dbms_random.random();
 end loop;
end;
/

A simple for loop can be used to start the load test:

$ for i in $(seq 30); do
> sqlplus soe/soe@scan1.example.com:1800/TESTSRV @dothis &
done

This created an even load on both nodes. I then started another 20 sessions on node1 against TEST1 to trigger the change in behaviour. And sure enough, the top few lines of “top” revealed the difference. The output for node 1 was as follows:


top - 10:59:30 up 1 day, 21:16,  6 users,  load average: 42.44, 20.23, 10.07
Tasks: 593 total,  48 running, 545 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.9%us,  0.1%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32960688k total, 11978912k used, 20981776k free,   441260k buffers
Swap: 16777208k total,        0k used, 16777208k free,  8078336k cached

Whereas node 2 was relatively idle:

top - 10:59:22 up 5 days, 17:45,  4 users,  load average: 15.80, 10.53, 5.74
Tasks: 631 total,  16 running, 605 sleeping,  10 stopped,   0 zombie
Cpu(s): 58.8%us,  0.6%sy,  0.0%ni, 40.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32960688k total, 11770080k used, 21190608k free,   376672k buffers
Swap: 16777208k total,        0k used, 16777208k free,  7599496k cached

Would that imbalance finally make a difference? It did, as the user_data column (truncated here for better readability) reveals:

SQL> select user_data
2  from sys.sys$service_metrics_tab
3  order by enq_time desc;

{instance=TEST1 percent=4 flag=GOOD aff=TRUE}{instance=TEST2 percent=96 flag=GOOD aff=TRUE} } timestamp=2011-01-20 11:01:16')
{instance=TEST1 percent=6 flag=GOOD aff=TRUE}{instance=TEST2 percent=94 flag=GOOD aff=TRUE} } timestamp=2011-01-20 11:00:46')
{instance=TEST1 percent=10 flag=GOOD aff=TRUE}{instance=TEST2 percent=90 flag=GOOD aff=TRUE} } timestamp=2011-01-20 11:00:16')
{instance=TEST1 percent=18 flag=GOOD aff=TRUE}{instance=TEST2 percent=82 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:59:46')
{instance=TEST1 percent=28 flag=GOOD aff=TRUE}{instance=TEST2 percent=72 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:59:16')
{instance=TEST1 percent=35 flag=GOOD aff=TRUE}{instance=TEST2 percent=65 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:58:46')
{instance=TEST1 percent=40 flag=GOOD aff=TRUE}{instance=TEST2 percent=60 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:58:16')
{instance=TEST1 percent=43 flag=GOOD aff=TRUE}{instance=TEST2 percent=57 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:57:46')
{instance=TEST1 percent=44 flag=GOOD aff=TRUE}{instance=TEST2 percent=56 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:57:16')
{instance=TEST1 percent=48 flag=GOOD aff=TRUE}{instance=TEST2 percent=52 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:56:46')
{instance=TEST1 percent=49 flag=GOOD aff=TRUE}{instance=TEST2 percent=51 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:56:16')
{instance=TEST1 percent=50 flag=GOOD aff=TRUE}{instance=TEST2 percent=50 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:55:46')
{instance=TEST1 percent=50 flag=GOOD aff=TRUE}{instance=TEST2 percent=50 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:55:16')
{instance=TEST1 percent=50 flag=UNKNOWN aff=TRUE}{instance=TEST2 percent=50 flag=UNKNOWN aff=TRUE} } timestamp=2011-01-20 10:54:46')
{instance=TEST1 percent=50 flag=UNKNOWN aff=TRUE}{instance=TEST2 percent=50 flag=UNKNOWN aff=TRUE} } timestamp=2011-01-20 10:54:16')

Where it was initially even at 50:50 it soon became imbalanced, and TEST2 would be preferred a few minutes into the test. So everything was working as expected; I just hadn’t managed to put enough load on the system initially.

Solaris Eye for the Linux Guy… Part II (oprofile, Dtrace, and Oracle Event Trace)

Proper tool for the job

My grandfather used to say to me: “Use the proper tool for the job.” This is important to keep in mind when faced with performance issues. When I am faced with performance problems in Oracle, I typically start at a high level with AWR reports or Enterprise Manager to get an understanding of the workload. To drill down further, the next step is to use Oracle “10046 event” tracing. Cary Millsap created a methodology around event tracing called “Method-R”, which shows how to focus in on the source of a performance problem by analysing the components that contribute to response time. These are all fine places to start analysing performance problems from the “user” or “application” point of view. But what happens if the OS is in peril?
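
For completeness, here is one common way to switch on a 10046-style extended SQL trace for a single session, using DBMS_MONITOR; the SID and serial# are placeholders you would look up in v$session:

$ sqlplus / as sysdba <<'EOF'
-- enable waits and binds (roughly equivalent to a level 12 10046 trace) for one session
exec dbms_monitor.session_trace_enable(session_id => 123, serial_num => 45, waits => TRUE, binds => TRUE)
-- ... let the workload run, then switch tracing off again
exec dbms_monitor.session_trace_disable(session_id => 123, serial_num => 45)
EOF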

Adding storage dynamically to ASM on Linux

Note: This discussion is potentially relevant only to OEL 5.x and RHEL 5.x; I haven’t been able to verify that it works the same way on other Linux distributions, though I would assume so. Before starting with the article, here are some facts:

  • OEL/RHEL 5.5 64bit
  • Oracle 11.2.0.2
  • native multipathing: device-mapper-multipath

The question I have asked myself many times is: how can I dynamically add a LUN to ASM without having to stop any component of the stack? Having mocked “reboot-me” operating systems like Windows, I soon fell quiet when it came to discussing the addition of a LUN to ASM on Linux. Today I learned how to do this, by piecing together information I got from Angus Thomas, a great Red Hat system administrator I had the pleasure to work with in 2009 and 2010. And since I have a short-lived memory I decided to write it down.

I’ll describe the process from top to bottom, from the addition of the LUN to the server all the way up to the addition of the ASM disk to the disk group.

Adding the storage to the cluster nodes

The first step is obviously to get the LUN assigned to the server(s). This is the easy part, and outside the control of the Linux/Oracle admin: the storage team will provision a new LUN to the hosts in question. At this stage Linux has no idea about the new storage; to make it available, the system administrator has to rescan the SCSI bus. A proven and tested way to do this in RHEL 5 is to issue this command:

[root@node1 ~]# for i in `ls -1 /sys/class/scsi_host`; do
> echo "- - -" > /sys/class/scsi_host/${i}/scan
> done

The new, unpartitioned LUN will appear in /proc/partitions. If it doesn’t, there is probably something wrong on the SAN side: check /var/log/messages and talk to your storage administrator. If it’s not a misconfiguration you may have no option but to reboot the node.

Configure Multipathing

So far so good; the next step is to add the device to the multipath configuration. First of all, you need to find out the WWID of the new device. In my case that’s simple: the last new line in /proc/partitions is usually a giveaway. If you are unsure, ask someone with console access to the array to check the WWID. It’s important to get this right at this stage :)
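
If you prefer to verify from the host rather than the array, scsi_id on RHEL/OEL 5 prints the WWID of a block device; the device name below is only an example:

[root@node1 ~]# scsi_id -g -u -s /block/sdq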

To add the new disk to the multipath.conf file, all you need to do is add a new multipath subsection, as in the following example:


multipaths {
  ..
  multipath {
    wwid                 360000970000294900664533030344239
    alias                ACFS0001
    path_grouping_policy failover
  }
  ..
}

By the way, I have covered the multipath configuration in more detail in a previous blog post here. Don’t forget to replicate the changes to the other cluster nodes!

Now you reload multipathd using /etc/init.d/multipathd reload on each node and, voila, you should see the device in /dev/mapper/; my ACFS disk appeared as /dev/mapper/ACFS0001.
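
To verify that the new map and both of its paths are present, multipath -ll can be run against the alias (a quick check; output omitted here):

[root@node1 ~]# /etc/init.d/multipathd reload
[root@node1 ~]# multipath -ll ACFS0001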

Now the tricky bit is to partition it, if you need to; this is no longer mandatory with 11.1 and newer, although some software, like EMC’s Replication Manager, still requires it. I succeeded by checking the device in /dev/disk/by-id and then running fdisk against it, as in this example:

...
# fdisk /dev/disk/by-id/scsi-360000970000294900664533030344239
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

The number of cylinders for this disk is set to 23251.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
 (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): u
Changing display/entry units to sectors

Command (m for help): n
Command action
 e   extended
 p   primary partition (1-4)
p
Partition number (1-4): 1
First sector (32-47619839, default 32): 128
Last sector or +size or +sizeM or +sizeK (129-47619839, default 47619839):
Using default value 47619839

Command (m for help): p

Disk /dev/disk/by-id/scsi-360000970000294900664533030344239: 24.3 GB, 24381358080 bytes
64 heads, 32 sectors/track, 23251 cylinders, total 47619840 sectors
Units = sectors of 1 * 512 = 512 bytes

 Device Boot                                                         Start  End         Blocks     Id  System
/dev/disk/by-id/scsi-360000970000294900664533030344239p1             128    47619839    23809855+  83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Once you are in fdisk the commands are identical to single-pathed storage. Type “n” to create a new partition, “p” for a primary partition, and specify the start and end sectors as needed. Type “p” to print the partition table, and if you are happy with it use “w” to write it. You might wonder why I added an offset and changed the display unit (“u”): this is due to the EMC storage this site uses. The EMC® Host Connectivity Guide for Linux (P/N 300-003-865 REV A23) suggests a 64k offset. Don’t simply repeat this in your environment; check with the storage team first.

Before adding the partitions to ACFS0001 and ACFS0002 I had 107 partitions:


[root@node1 ~]# wc -l /proc/partitions
107 /proc/partitions

The new partitions are recognised after the 2 fdisk commands completed:


[root@node1 ~]# wc -l /proc/partitions
 107 /proc/partitions

But when you check /dev/mapper now you still don’t see the partition; the naming convention is to append pN to the device name, i.e. /dev/mapper/ACFS0001p1 for the first partition and so on.

kpartx to the rescue! This superb utility can read the partition table of a device and create the corresponding device-mapper entries. Initially my setup was as follows:


[root@node1 ~]# ls -l /dev/mapper/ACFS*
brw-rw---- 1 root disk 253, 31 Jan 18 10:05 /dev/mapper/ACFS0001
brw-rw---- 1 root disk 253, 32 Jan 18 10:05 /dev/mapper/ACFS0002

Usually I would have rebooted the node at this stage, as I didn’t know how to make the system pick up the new partition table. But with kpartx (“yum install kpartx” to install it) this is no longer needed. Consider the example below:

[root@node1 ~]# kpartx -l /dev/mapper/ACFS0001
ACFS0001p1 : 0 47619711 /dev/mapper/ACFS0001 129
[root@node1 ~]# kpartx -a /dev/mapper/ACFS0001
[root@node1 ~]# kpartx -l /dev/mapper/ACFS0002
ACFS0002p1 : 0 47619711 /dev/mapper/ACFS0002 129
[root@node1 ~]# kpartx -a /dev/mapper/ACFS0002

[root@node1 ~]# ls -l /dev/mapper/ACFS000*
brw-rw---- 1 root disk 253, 31 Jan 18 10:05 /dev/mapper/ACFS0001
brw-rw---- 1 root disk 253, 36 Jan 18 10:13 /dev/mapper/ACFS0001p1
brw-rw---- 1 root disk 253, 32 Jan 18 10:05 /dev/mapper/ACFS0002
brw-rw---- 1 root disk 253, 37 Jan 18 10:13 /dev/mapper/ACFS0002p1

“kpartx -l” lists the partitions of a device, and “kpartx -a” adds the corresponding partition mappings, as the example shows. No more need to reboot! However, as has been pointed out in the comments section (see below), kpartx doesn’t add both paths, so you should run the partprobe command to add the missing ones:


[root@node1 ~]# partprobe
[root@node1 ~]# wc -l /proc/partitions
109 /proc/partitions

See how there are 109 partitions listed now instead of just 107 from before: the 2 missing paths have been added (one for each device).

Add disks to ASM

With this done, you can add the disks to ASM. I personally like the intermediate step of creating ASMLib disks first.
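
A sketch of the ASMLib step (disk labels chosen to match the disk group below; run createdisk on one node only, then rescan on the others):

[root@node1 ~]# /usr/sbin/oracleasm createdisk ACFS0001 /dev/mapper/ACFS0001p1
[root@node1 ~]# /usr/sbin/oracleasm createdisk ACFS0002 /dev/mapper/ACFS0002p1
[root@node2 ~]# /usr/sbin/oracleasm scandisks      # pick up the new labels on the other node(s)
[root@node2 ~]# /usr/sbin/oracleasm listdisks

With the disks labelled and visible on all nodes, connect to ASM as sysasm and add them to the disk group using the alter diskgroup command: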

SQL> alter diskgroup ACFSDG add disk 'ORCL:ACFS0002', 'ORCL:ACFS0001';

Now just wait for the rebalance operation to complete.
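
Progress can be followed in v$asm_operation, for example (assuming the grid owner’s environment points at the ASM instance):

$ sqlplus / as sysasm <<'EOF'
-- an ongoing rebalance shows up here; no rows means the operation has finished
select group_number, operation, state, est_minutes from v$asm_operation;
EOF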

Solaris Eye for the Linux guy… or how I learned to stop worrying about Linux and Love Solaris (Part 1)

This entry goes out to my Oracle techie friends who have been in the Linux camp for some time now and are suddenly finding themselves needing to know more about Solaris… hmmmm… I wonder if this has anything to do with Solaris now being an available option with Exadata?  Or maybe the recent announcement that the SPARC T3 multiplier for T3-x servers is now 0.25. Judging by my inbox recently, I suspect the renewed interest in Solaris will continue.

Adding user equivalence for RAC the easy way

This is the first time I am setting up a new 11.2.0.2 cluster with the automatic SSH setup. Until now, I ensured user equivalence by copying the ssh RSA and DSA keys manually to all cluster nodes. For two nodes that’s not too bad, but recently someone on a mailing list I subscribe to asked about a 28 (!) node cluster. That’s when the whole process gets a bit too labour intensive.

So setting up user equivalence using a script may be the solution. You can also use OUI to do the same, but I like to run “cluvfy stage -post hwos” to check that everything is OK before even thinking about executing ./runInstaller.
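
For reference, that check can be run straight from the unzipped installation media with runcluvfy.sh; a sketch with my node names (the directory in the prompt is simply where I unzipped the software):

[grid@acfsprdnode1 grid]$ ./runcluvfy.sh stage -post hwos -n acfsprdnode1,acfsprdnode2 -verbose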

Here’s the output of a session; my 2 cluster nodes are acfsprdnode1 and acfsprdnode2 (yes, they are for 11.2 ACFS replication and encryption testing). I am using the grid user as the owner of Grid Infrastructure, and oracle to own the RDBMS binaries. Start by navigating to the location where you unzipped the Grid Infrastructure patch file, then change into the directory “sshsetup” and run the command:


[grid@acfsprdnode1 sshsetup]$ ./sshUserSetup.sh
Please specify a valid and existing cluster configuration file.
Either user name or host information is missing
Usage ./sshUserSetup.sh -user <user name> [ -hosts "<space separated hostlist>" | -hostfile <absolute path of cluster configuration file> ] [ -advanced ]  [ -verify] [ -exverify ] [ -logfile <desired absolute path of logfile> ] [-confirm] [-shared] [-help] [-usePassphrase] [-noPromptPassphrase]

Next execute the command, I opted for option noPromptPassphrase, as I don’t use them for the key.

[grid@acfsprdnode1 sshsetup]$ ./sshUserSetup.sh -user grid -hosts "acfsprdnode1 acfsprdnode2" -noPromptPassphrase
The output of this script is also logged into /tmp/sshUserSetup_2010-12-22-15-39-18.log
Hosts are acfsprdnode1 acfsprdnode2
user is grid
Platform:- Linux
Checking if the remote hosts are reachable
PING acfsprdnode1.localdomain (192.168.99.100) 56(84) bytes of data.
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=1 ttl=64 time=0.017 ms
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=2 ttl=64 time=0.019 ms
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=3 ttl=64 time=0.017 ms
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=4 ttl=64 time=0.017 ms
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=5 ttl=64 time=0.018 ms

--- acfsprdnode1.localdomain ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.017/0.017/0.019/0.004 ms
PING acfsprdnode2.localdomain (192.168.99.101) 56(84) bytes of data.
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=1 ttl=64 time=0.331 ms
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=2 ttl=64 time=0.109 ms
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=3 ttl=64 time=0.324 ms
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=4 ttl=64 time=0.256 ms
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=5 ttl=64 time=0.257 ms

--- acfsprdnode2.localdomain ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4000ms
rtt min/avg/max/mdev = 0.109/0.255/0.331/0.081 ms
Remote host reachability check succeeded.
The following hosts are reachable: acfsprdnode1 acfsprdnode2.
The following hosts are not reachable: .
All hosts are reachable. Proceeding further...
firsthost acfsprdnode1
numhosts 2
The script will setup SSH connectivity from the host acfsprdnode1 to all
the remote hosts. After the script is executed, the user can use SSH to run
commands on the remote hosts or copy files between this host acfsprdnode1
and the remote hosts without being prompted for passwords or confirmations.

NOTE 1:
As part of the setup procedure, this script will use ssh and scp to copy
files between the local host and the remote hosts. Since the script does not
store passwords, you may be prompted for the passwords during the execution of
the script whenever ssh or scp is invoked.

NOTE 2:
AS PER SSH REQUIREMENTS, THIS SCRIPT WILL SECURE THE USER HOME DIRECTORY
AND THE .ssh DIRECTORY BY REVOKING GROUP AND WORLD WRITE PRIVILEDGES TO THESE
directories.

Do you want to continue and let the script make the above mentioned changes (yes/no)?
yes

The user chose yes
User chose to skip passphrase related questions.
Creating .ssh directory on local host, if not present already
Creating authorized_keys file on local host
Changing permissions on authorized_keys to 644 on local host
Creating known_hosts file on local host
Changing permissions on known_hosts to 644 on local host
Creating config file on local host
If a config file exists already at /home/grid/.ssh/config, it would be backed up to /home/grid/.ssh/config.backup.
Removing old private/public keys on local host
Running SSH keygen on local host with empty passphrase
Generating public/private rsa key pair.
Your identification has been saved in /home/grid/.ssh/id_rsa.
Your public key has been saved in /home/grid/.ssh/id_rsa.pub.
The key fingerprint is:
de:e3:66:fa:16:e8:6e:36:fd:c5:e3:77:75:07:9a:b0 grid@acfsprdnode1
Creating .ssh directory and setting permissions on remote host acfsprdnode1
THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR grid. THIS IS AN SSH REQUIREMENT.
The script would create ~grid/.ssh/config file on remote host acfsprdnode1. If a config file exists already at ~grid/.ssh/config, it would be backed up to ~grid/.ssh/config.backup.
The user may be prompted for a password here since the script would be running SSH on host acfsprdnode1.
Warning: Permanently added 'acfsprdnode1,192.168.99.100' (RSA) to the list of known hosts.
grid@acfsprdnode1's password:
Done with creating .ssh directory and setting permissions on remote host acfsprdnode1.
Creating .ssh directory and setting permissions on remote host acfsprdnode2
THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR grid. THIS IS AN SSH REQUIREMENT.
The script would create ~grid/.ssh/config file on remote host acfsprdnode2. If a config file exists already at ~grid/.ssh/config, it would be backed up to ~grid/.ssh/config.backup.
The user may be prompted for a password here since the script would be running SSH on host acfsprdnode2.
Warning: Permanently added 'acfsprdnode2,192.168.99.101' (RSA) to the list of known hosts.
grid@acfsprdnode2's password:
Done with creating .ssh directory and setting permissions on remote host acfsprdnode2.
Copying local host public key to the remote host acfsprdnode1
The user may be prompted for a password or passphrase here since the script would be using SCP for host acfsprdnode1.
grid@acfsprdnode1's password:
Done copying local host public key to the remote host acfsprdnode1
Copying local host public key to the remote host acfsprdnode2
The user may be prompted for a password or passphrase here since the script would be using SCP for host acfsprdnode2.
grid@acfsprdnode2's password:
Done copying local host public key to the remote host acfsprdnode2
cat: /home/grid/.ssh/known_hosts.tmp: No such file or directory
cat: /home/grid/.ssh/authorized_keys.tmp: No such file or directory
SSH setup is complete.

------------------------------------------------------------------------
Verifying SSH setup
===================
The script will now run the date command on the remote nodes using ssh
to verify if ssh is setup correctly. IF THE SETUP IS CORRECTLY SETUP,
THERE SHOULD BE NO OUTPUT OTHER THAN THE DATE AND SSH SHOULD NOT ASK FOR
PASSWORDS. If you see any output other than date or are prompted for the
password, ssh is not setup correctly and you will need to resolve the
issue and set up ssh again.
The possible causes for failure could be:
1. The server settings in /etc/ssh/sshd_config file do not allow ssh
for user grid.
2. The server may have disabled public key based authentication.
3. The client public key on the server may be outdated.
4. ~grid or ~grid/.ssh on the remote host may not be owned by grid.
5. User may not have passed -shared option for shared remote users or
may be passing the -shared option for non-shared remote users.
6. If there is output in addition to the date, but no password is asked,
it may be a security alert shown as part of company policy. Append the
additional text to the /sysman/prov/resources/ignoreMessages.txt file.
------------------------------------------------------------------------
--acfsprdnode1:--
Running /usr/bin/ssh -x -l grid acfsprdnode1 date to verify SSH connectivity has been setup from local host to acfsprdnode1.
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR.
Wed Dec 22 15:40:10 GMT 2010
------------------------------------------------------------------------
--acfsprdnode2:--
Running /usr/bin/ssh -x -l grid acfsprdnode2 date to verify SSH connectivity has been setup from local host to acfsprdnode2.
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR.
Wed Dec 22 15:40:10 GMT 2010
------------------------------------------------------------------------
SSH verification complete.
[grid@acfsprdnode1 sshsetup]$ ssh acfsprdnode1 hostname
acfsprdnode1
[grid@acfsprdnode1 sshsetup]$ ssh acfsprdnode2 hostname
acfsprodnode2
[grid@acfsprdnode1 sshsetup]$

Nice! That’s a lot of work taken away from me, and I can start running cluvfy now to fix problems before OUI warns me about shortcomings on my system.

You should note that, per the above output, the script only distributes the local ssh keys to the remote hosts. When you get to OUI’s cluster node addition screen (6 of 16 in the advanced installation) you still need to click the “SSH Connectivity” button and then “Setup” after providing the username and password to establish cluster-wide user equivalence.

Wheeew, I am now a RedHat Certified Engineer!

A couple of weeks ago, RedHat announced the general availability of RHEL6… and with this release comes a change to their certification offering. RHCT is replaced by RHCSA (Red Hat Certified System Administrator), and if you would like to be an RHCE on RHEL6, regardless of your certification on RHEL5, you first have to pass the RHCSA exam… only then are you allowed to take the RHCE exam for RHEL6. More details here: RHCSA, RHCE

UltraEdit for Mac, Production Release…

Just to let you know, UltraEdit for Mac has now been released to production.

I’ve been using the beta version for a while and it’s really cool. If you love UltraEdit on Windows, then you will love UltraEdit on Mac. It’s been released as version 2.0, which from what I can see has pretty much all the features of the Windows version 16.x. The Linux version (1.0) is still missing a lot, but it is supposed to get a version 2.0 release in the new year.

I’ve upgraded my Windows license to a multi-platform license with unlimited updates, so I can run UltraEdit on Windows, Linux and Mac and never pay for an upgrade again. It’s not the cheapest option but for a bit of kit like this I’m very willing to pay up.

So now I have UltraEdit on everything and SnagIt on Windows and Mac. If I could switch from Shutter to SnagIt on Linux I would be ecstatic. :)

Cheers

Tim…

Configuration device mapper multipath on OEL5 update 5

I have always wondered how to configure the device mapper multipath package on a Linux system. I knew how to do it in principle, but had never been involved in a configuration from the ground up. Today I got the chance to work on this. The system is used for lab testing and is not a production box (otherwise I probably wouldn’t have been allowed on). It is actually part of a 2 node cluster.

So the first step is to find out which partitions are visible to the system. The Linux kernel presents this information in the /proc/partitions table, as in the following example:


[root@node1 ~]# cat /proc/partitions
major minor  #blocks  name

 104     0   71652960 cciss/c0d0
 104     1     152586 cciss/c0d0p1
 104     2   71497282 cciss/c0d0p2
 8     0       2880 sda
 8    16  190479360 sdb
 8    32   23809920 sdc
 8    48   23809920 sdd
 8    64   23809920 sde
 8    80   23809920 sdf
 8    96   23809920 sdg
 8   112    1048320 sdh
 8   128    1048320 sdi
 8   144    1048320 sdj
 8   160       2880 sdk
 8   176  190479360 sdl
 8   192   23809920 sdm
 8   208   23809920 sdn
 8   224   23809920 sdo
 8   240   23809920 sdp
 65     0   23809920 sdq
 65    16    1048320 sdr
 65    32    1048320 sds
 65    48    1048320 sdt
 253     0    5111808 dm-0
 253     1   25591808 dm-1
 253     2   10223616 dm-2
 253     3    1015808 dm-3
 253     4   16777216 dm-4
[root@node1 ~]#

With a keen eye you can see that sdk is the same size as sda, which probably means we have two paths to each of the devices sda through sdj. We’ll confirm this later. The more HBAs and paths you have, the more entries you are going to see here. This is where the multipathing software comes into play: it abstracts from the physical paths and presents a single logical device, and it offers additional goodies such as path failover and limited load balancing.
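A quick way to spot candidates is to count how often each block size appears in /proc/partitions; sizes that show up exactly twice are likely the same LUN seen over two paths (a sketch only, sizes can of course collide by coincidence):

[root@node1 ~]# awk '$4 ~ /^sd[a-z]+$/ {print $3}' /proc/partitions | sort | uniq -c | sort -rn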

Before proceeding I checked the status of the multipath daemon:

[root@node1 ~]# service multipathd status
multipathd is stopped
[root@node1 ~]# chkconfig --list multipathd
multipathd      0:off   1:off   2:off   3:off   4:off   5:off   6:off
[root@node1 ~]# chkconfig multipathd on

As you can see it was not running, nor would it come up after a reboot, so the service had to be enabled at boot time using the chkconfig command. This automatically creates links in /etc/rc.d/rcX.d to start and stop the service. As an additional benefit the command respects the dependencies the authors of the init script have defined and creates the {K,S}xxmultipathd links accordingly.
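You can verify the effect straight away; something along these lines should now show the service enabled for the default runlevels (a sketch):

[root@node1 ~]# chkconfig --list multipathd              # expect 2:on 3:on 4:on 5:on now
[root@node1 ~]# ls /etc/rc.d/rc3.d/ | grep -i multipathd # the S-link chkconfig created for runlevel 3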

I next loaded the necessary kernel modules, dm-multipath and dm-round-robin:

[root@node1 ~]# modprobe dm-multipath
[root@node1 ~]# modprobe dm-round-robin
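A quick look at lsmod confirms they are loaded; note that the hyphens turn into underscores in the module table (a sketch):

[root@node1 ~]# lsmod | egrep 'dm_multipath|dm_round_robin'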

With the multipath software in place, I need to get the WWIDs of all attached devices. At some point the WWIDs start repeating, because every LUN shows up once per path; devices sharing a WWID are the same LUN seen via a different path, and that is where you stop counting new devices. Let’s have a look at the output of this first. You have to change directory to /sys, as the device argument to scsi_id -s is interpreted relative to it.

[node1 sys]# for i in `cat /proc/partitions | awk '{print $4}' |grep sd`; do echo "### $i: `scsi_id -g -u -s /block/$i`"; done
### sda: 360000970000294900664533030303238
### sdb: 360000970000294900664533030344133
### sdc: 360000970000294900664533030344142
### sdd: 360000970000294900664533030344143
### sde: 360000970000294900664533030344144
### sdf: 360000970000294900664533030344239
### sdg: 360000970000294900664533030344241
### sdh: 360000970000294900664533030344244
### sdi: 360000970000294900664533030344245
### sdj: 360000970000294900664533030344246
### sdk: 360000970000294900664533030303238
### sdl: 360000970000294900664533030344133
### sdm: 360000970000294900664533030344142
### sdn: 360000970000294900664533030344143
### sdo: 360000970000294900664533030344144
### sdp: 360000970000294900664533030344239
### sdq: 360000970000294900664533030344241
### sdr: 360000970000294900664533030344244
### sds: 360000970000294900664533030344245
### sdt: 360000970000294900664533030344246
[node1 sys]#

Here you see again that sda and sdk share the same WWID. I like to assign alias names to the multipath devices, which makes it much easier to see what they are used for. I now have to get the disk sizes and map each device to its intended use.

Getting disk sizes:

[node1 sys]# fdisk -l 2>/dev/null | grep ^Disk
Disk /dev/cciss/c0d0: 73.3 GB, 73372631040 bytes        local
Disk /dev/sda: 2 MB, 2949120 bytes                ignore
Disk /dev/sdb: 195.0 GB, 195050864640 bytes
Disk /dev/sdc: 24.3 GB, 24381358080 bytes
Disk /dev/sdd: 24.3 GB, 24381358080 bytes
Disk /dev/sde: 24.3 GB, 24381358080 bytes
Disk /dev/sdf: 24.3 GB, 24381358080 bytes
Disk /dev/sdg: 24.3 GB, 24381358080 bytes
Disk /dev/sdh: 1073 MB, 1073479680 bytes
Disk /dev/sdi: 1073 MB, 1073479680 bytes
Disk /dev/sdj: 1073 MB, 1073479680 bytes
Disk /dev/sdk: 2 MB, 2949120 bytes                ignore
Disk /dev/sdl: 195.0 GB, 195050864640 bytes
Disk /dev/sdm: 24.3 GB, 24381358080 bytes
Disk /dev/sdn: 24.3 GB, 24381358080 bytes
Disk /dev/sdo: 24.3 GB, 24381358080 bytes
Disk /dev/sdp: 24.3 GB, 24381358080 bytes
Disk /dev/sdq: 24.3 GB, 24381358080 bytes
Disk /dev/sdr: 1073 MB, 1073479680 bytes
Disk /dev/sds: 1073 MB, 1073479680 bytes
Disk /dev/sdt: 1073 MB, 1073479680 bytes

The cleaned-up, consolidated view of the storage:

### sdb: 360000970000294900664533030344133    195G
### sdc: 360000970000294900664533030344142    24.3G
### sdd: 360000970000294900664533030344143    24.3G   
### sde: 360000970000294900664533030344144    24.3G   
### sdf: 360000970000294900664533030344239    24.3G   
### sdg: 360000970000294900664533030344241    24.3G   
### sdh: 360000970000294900664533030344244    1G
### sdi: 360000970000294900664533030344245    1G
### sdj: 360000970000294900664533030344246    1G

### sdl: 360000970000294900664533030344133    repeat - second path
### sdm: 360000970000294900664533030344142
### sdn: 360000970000294900664533030344143
### sdo: 360000970000294900664533030344144
### sdp: 360000970000294900664533030344239
### sdq: 360000970000294900664533030344241
### sdr: 360000970000294900664533030344244
### sds: 360000970000294900664533030344245
### sdt: 360000970000294900664533030344246
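If you prefer not to build this view by eye, a small loop can print device, WWID and size in one go (a sketch; blockdev --getsize64 reports the size in bytes):

[root@node1 ~]# cd /sys
[root@node1 sys]# for d in $(awk '$4 ~ /^sd[a-z]+$/ {print $4}' /proc/partitions); do
>   echo "$d $(scsi_id -g -u -s /block/$d) $(blockdev --getsize64 /dev/$d) bytes"
> done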

Finally here's the mapping I will use:

  • sdb    DATA001
  • sdc    REDO001
  • sdd     FRA001
  • sde    FRA002
  • sdf    ACFS001
  • sdg    ACFS002
  • sdh, sdi, sdj    VOTINGOCR{1,2,3}

The mapping between WWID and alias happens in the /etc/multipath.conf file. The defaults section has been taken from MOS note 555603.1. The devnode_blacklist section has to be set up according to your storage configuration; in my case I ignore IDE devices and the internal RAID adapter.

[root@node1 ~]# cat /etc/multipath.conf
defaults {
 udev_dir                /dev
 polling_interval        10
 selector                "round-robin 0"
 path_grouping_policy    multibus
 getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
 prio_callout            /bin/true
 path_checker            readsector0
 rr_min_io               100
 rr_weight               priorities
 failback                immediate
 no_path_retry           fail
 user_friendly_names     no
}

devnode_blacklist {
 devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
 devnode "^hd[a-z]"
 devnode "^cciss!c[0-9]d[0-9]*"
}

multipaths {
 multipath {
 wwid 360000970000294900664533030344133
 alias data001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344142
 alias redo001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344143
 alias fra001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344144
 alias fra002
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344239
 alias acfs001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344241
 alias acfs002
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344244
 alias votingocr001
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344245
 alias votingocr002
 path_grouping_policy failover
 }
 multipath {
 wwid 360000970000294900664533030344246
 alias votingocr003
 path_grouping_policy failover
 }
}

The mapping is really simple: for each device you use, create a "multipath" section and enter the WWID, an alias and a path grouping policy. Done! See if that worked by starting the multipath daemon:

[root@node1 ~]# service multipathd start

As always, /var/log/messages is a good place to check:

Nov 16 16:34:58 loninengblc204 kernel: device-mapper: table: 253:5: multipath: error getting device
Nov 16 16:34:58 loninengblc204 kernel: device-mapper: ioctl: error adding target to table
Nov 16 16:34:58 loninengblc204 multipathd: 360000970000294900664533030303238: load table [0 5760 multipath 0 0 1 1 round-robin 0 2 1 8:0 1000 8:160 1000]
Nov 16 16:34:58 loninengblc204 multipathd: data001: load table [0 380958720 multipath 0 0 2 1 round-robin 0 1 1 8:16 1000 round-robin 0 1 1 8:176 1000]
Nov 16 16:34:58 loninengblc204 multipathd: redo001: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:32 1000 round-robin 0 1 1 8:192 1000]
Nov 16 16:34:58 loninengblc204 multipathd: fra001: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:48 1000 round-robin 0 1 1 8:208 1000]
Nov 16 16:34:58 loninengblc204 multipathd: fra002: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:64 1000 round-robin 0 1 1 8:224 1000]
Nov 16 16:34:58 loninengblc204 multipathd: acfs001: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:80 1000 round-robin 0 1 1 8:240 1000]
Nov 16 16:34:58 loninengblc204 multipathd: acfs002: load table [0 47619840 multipath 0 0 2 1 round-robin 0 1 1 8:96 1000 round-robin 0 1 1 65:0 1000]
Nov 16 16:34:58 loninengblc204 multipathd: votingocr001: load table [0 2096640 multipath 0 0 2 1 round-robin 0 1 1 8:112 1000 round-robin 0 1 1 65:16 1000]
Nov 16 16:34:58 loninengblc204 multipathd: votingocr002: load table [0 2096640 multipath 0 0 2 1 round-robin 0 1 1 8:128 1000 round-robin 0 1 1 65:32 1000]
Nov 16 16:34:58 loninengblc204 multipathd: votingocr003: load table [0 2096640 multipath 0 0 2 1 round-robin 0 1 1 8:144 1000 round-robin 0 1 1 65:48 1000]
Nov 16 16:34:58 loninengblc204 multipathd: 360000970000294900664533030303238: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: data001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: redo001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: fra001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: fra002: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: acfs001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: acfs002: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: votingocr001: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: votingocr002: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: votingocr003: event checker started
Nov 16 16:34:58 loninengblc204 multipathd: path checkers start up

Great - are all paths working?

[root@node1 ~]# multipath -ll | head
fra002 (360000970000294900664533030344144) dm-9 EMC,SYMMETRIX
[size=23G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:4 sde 8:64  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:4 sdo 8:224 [active][ready]
fra001 (360000970000294900664533030344143) dm-8 EMC,SYMMETRIX
[size=23G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:3 sdd 8:48  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:3 sdn 8:208 [active][ready]
acfs002 (360000970000294900664533030344241) dm-11 EMC,SYMMETRIX
[size=23G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:6 sdg 8:96  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:6 sdq 65:0  [active][ready]
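If something does not look right and you need to adjust multipath.conf, multipath has a dry-run mode that is handy for previewing the maps before reloading them (a sketch):

[root@node1 ~]# multipath -v2 -d                 # dry run, nothing is created
[root@node1 ~]# multipath -F && multipath -v2    # flush unused maps and re-create them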

Congratulations - distribute the working multipath.conf to all cluster nodes and start multipathd.
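A minimal sketch of that last step, assuming node2 is the name of the remaining cluster node:

[root@node1 ~]# scp /etc/multipath.conf node2:/etc/multipath.conf
[root@node1 ~]# ssh node2 "chkconfig multipathd on && service multipathd start"
[root@node1 ~]# ssh node2 "multipath -ll | head"   # quick sanity check on the aliases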

The beauty of this over a solution such as PowerPath is that the device names, defined once in multipath.conf, are consistent across the cluster. With PowerPath I have come across a situation where /dev/rdsk/emcpower1a on node1 was /dev/rdsk/emcpower4a on node2 and yet another device on the remaining nodes. Not really user friendly, though not a big issue with ASM either: it reads the relevant information from the disk headers anyway. It was more of a problem pre-11.2, when you had to use block devices to store the OCR and voting files.
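And a quick way to convince yourself the names really are identical everywhere (again a sketch, with node2 standing in for the other cluster node):

[root@node1 ~]# for n in node1 node2; do echo "== $n =="; ssh $n 'ls /dev/mapper | grep -v control'; done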