Getting up and running with Universal Connection Pool

Oracle’s next generation connection pooling solution, Universal Connection Pool, can be a bit tricky to set up. This is especially true when a JNDI data source is to be used; most examples don’t cover that scenario. A lot of information is out there on the net, but no one seems to have given the full picture. During the research for chapter 11 of “Pro Oracle Database 11g RAC on Linux” I learned this the hard way. Since the book was published, a few minor changes have been made to the software I used at the time, and those merit an update. Please note that this article’s emphasis is on getting the example running; it is by no means secure enough for a production release! You need to harden the setup considerably for production, but it serves well for demonstration purposes (only).

THE SETUP

I have used a four node 11.2.0.2 RAC system as the source for my data. A 2 node cluster database with service “TAFTEST” runs on nodes 1 and 2. It is administrator-managed, and the service has both nodes set aside as “preferred” nodes. The database nodes run Oracle Enterprise Linux 5.5 64bit with RAC 11.2.0.2. For the sake of simplicity, I used my Windows laptop to host the Tomcat instance, which is now updated to version 6.0.30. I am using Apache Ant to build the application; the current stable Ant build is 1.8.2. My JDK is also upgraded to the latest and greatest, version 1.6.0_23. I am using the 32bit 11.2.0.2 client package to supply me with ons.jar, ojdbc6.jar and ucp.jar.

ORACLE CLIENT

Part of this exercise is to demonstrate FCF and FAN events, which means we need an Oracle client for the remote ONS configuration (please refer to chapter 11 of the RAC book for a detailed description of local vs remote ONS configurations). I downloaded the 11.2.0.2 32bit client for Windows from support.oracle.com and installed it to c:\oracle\product\11.2.0\client_1, choosing the administrator option in Oracle Universal Installer.

TOMCAT

Start by downloading Tomcat for your platform; I have successfully tested this setup with Tomcat on Linux and Windows. I deployed apache-tomcat-6.0.30 to c:\ for this test. Once it’s unzipped, copy the necessary JAR files from the Oracle client installation into %TOMCAT_HOME%\lib. These are ojdbc6.jar, ons.jar and ucp.jar. Next, you should set a few environment variables. To keep things simple, I edited %tomcat_home%\bin\startup.bat and added these:

  • set JAVA_HOME=c:\program files\Java\jdk1.6.0_23
  • set JAVA_OPTS=-Doracle.ons.oraclehome=c:\oracle\product\11.2.0\client_1

I’m also interested in the final content of %JAVA_OPTS%, so I modified catalina.bat as well and added this line into the section below line 164:

echo Using JAVA_OPTS: "%JAVA_OPTS%"

Finally, we need to add a user with access to the Tomcat manager application by editing %tomcat_home%\conf\tomcat-users.xml.
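
A minimal entry, using the role names introduced with Tomcat 6.0.30 and placeholder credentials you should replace, could look like this:

<role rolename="manager-gui"/>
<role rolename="manager-script"/>
<user username="tomcat" password="s3cret" roles="manager-gui,manager-script"/>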


This is one of the major differences: starting with Tomcat 6.0.30, the previous “manager” role has been split into the ones shown above to protect against cross-site request forgery attacks. It took me a while to discover the reason for all the HTTP 403 errors I got when I tried to deploy my application… You’d obviously use a strong password here!

This concludes the TOMCAT setup.

ANT

Ant is a build tool used for compiling and deploying the sample application I am going to adapt. Simply download the zip file (version 1.8.2 was current when I wrote this) and deploy it somewhere convenient. Again, a few environment variables are helpful, which I usually put into a file called sp.bat:

@echo off
set ant_home=c:\apache-ant-1.8.2
set path=%ant_home%\bin;%path%
set java_home=c:\program files\Java\jdk1.6.0_23
set path=%java_home%\bin;%path%

This makes building the application a lot easier. Just change into the application directory and enter sp to get set up.

BUILDING AN APPLICATION

My earliest contact with Tomcat was a long time ago with version 3 and since then I remember the well written documentation. The “docs” application, usually accessible as http://localhost:8080/docs has a section about the first application. It’s highly recommended to read it: http://localhost:8080/docs/appdev/index.html.

To get started, I copied the “sample” directory from %tomcat_home%\webapps\docs\appdev to c:\temp and started adapting it. First of all my sp.bat script is copied in there. With a command prompt I changed into sample and edited the build.xml file. The first major section to go over starts in line 129. I changed it to read like this:
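
The exact values depend on your environment; a sketch with placeholder values (the manager credentials must match the entry in tomcat-users.xml) might look like this:

<property name="app.name"      value="ucp"/>
<property name="app.version"   value="0.1"/>
<property name="app.path"      value="/${app.name}"/>
<property name="build.home"    value="${basedir}/build"/>
<property name="catalina.home" value="c:/apache-tomcat-6.0.30"/>
<property name="dist.home"     value="${basedir}/dist"/>
<property name="manager.url"      value="http://localhost:8080/manager"/>
<property name="manager.username" value="tomcat"/>
<property name="manager.password" value="s3cret"/>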


The properties to change/add are app.name, app.version, catalina.home, manager.username and manager.password. The manager.* properties will come in very handy later, as they allow us to deploy the compiled/changed application automatically.

With all this done, try to compile the application as it is:

C:\TEMP\sample>ant compile
Buildfile: C:\TEMP\sample\build.xml
Trying to override old definition of datatype resources

prepare:

compile:
 [javac] C:\TEMP\sample\build.xml:301: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds

BUILD SUCCESSFUL
Total time: 0 seconds

C:\TEMP\sample>

That’s it: the application has been compiled. You can now start Tomcat and try to deploy the app. Open a cmd window, change to %tomcat_home%\bin and execute startup.bat. This will start Tomcat; the new window shows any problems it might encounter when deploying applications. This window is very useful, for example, when it comes to troubleshooting incorrect XML configuration files.

Now the next step is to use the ant target “install” to test the installation of the web archive. I changed the install target slightly so that it depends on the “dist” target being completed before deployment to the Tomcat server. My modified working install target is defined like this in build.xml:
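
A sketch of such a target, using the Catalina deploy task (assuming the Catalina Ant tasks are already defined in the build file, as they are in the sample) and the manager.* and dist properties shown earlier; all values are placeholders:

<target name="install" depends="dist"
        description="Install application to servlet container">
  <deploy url="${manager.url}"
          username="${manager.username}"
          password="${manager.password}"
          path="${app.path}"
          war="file://${dist.home}/${app.name}-${app.version}.war"/>
</target>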


Try invoking it as shown in the example below:

C:\TEMP\sample>ant install
Buildfile: C:\TEMP\sample\build.xml
Trying to override old definition of datatype resources

prepare:

compile:
 [javac] C:\TEMP\sample\build.xml:301: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds

javadoc:
 [javadoc] Generating Javadoc
 [javadoc] Javadoc execution
 [javadoc] Loading source files for package mypackage...
 [javadoc] Constructing Javadoc information...
 [javadoc] Standard Doclet version 1.6.0_23
 [javadoc] Building tree for all the packages and classes...
 [javadoc] Building index for all the packages and classes...
 [javadoc] Building index for all classes...

dist:

install:
 [deploy] OK - Deployed application at context path /ucp

BUILD SUCCESSFUL
Total time: 1 second

C:\TEMP\sample>

This operation succeeded. You should also see a line in your tomcat window:

INFO: Deploying web application archive ucp.war

When pointing your browser to http://localhost:8080/ucp/ you should be greeted by the familiar tomcat sample application.

ADDING THE DATASOURCE

So far this hasn’t been groundbreaking at all. It’s only now that things get more interesting: the JNDI data source needs to be defined and used in our code. Instead of messing around with resource and res-ref configuration in the global %tomcat_home%\conf directory, it is advisable to add the context to the application.

Back in directory c:\temp\sample create a new directory web\META-INF. Inside META-INF you create a file “context.xml” which takes the JNDI data source definition:
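
A minimal Resource definition might look like the sketch below. The connection details, pool sizes and ONS node list are placeholders for my environment; the JNDI name must match what the servlet looks up later, and the factory and type classes come from ucp.jar.

<Context>
  <Resource name="jdbc/UCPPool" auth="Container"
            factory="oracle.ucp.jdbc.PoolDataSourceImpl"
            type="oracle.ucp.jdbc.PoolDataSource"
            connectionFactoryClassName="oracle.jdbc.pool.OracleDataSource"
            url="jdbc:oracle:thin:@//cluster-scan.example.com:1521/TAFTEST"
            user="scott" password="tiger"
            connectionPoolName="UCPPool"
            initialPoolSize="5" minPoolSize="5" maxPoolSize="20"
            validateConnectionOnBorrow="true"
            fastConnectionFailoverEnabled="true"
            onsConfiguration="nodes=racnode1:6200,racnode2:6200"/>
</Context>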

It really is that simple to implement; if only it were equally simple to find out how to do it… The next step is to modify the Hello.java servlet to reference the JNDI data source. The code for the servlet is shown below; it’s basically the existing servlet code amended with the JNDI and JDBC relevant parts. It actually does very little: after looking up the JNDI name it grabs a session from the pool and checks which instance it is currently connected to. It then releases all resources and exits.


/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package mypackage;

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.ResultSet;
import oracle.ucp.jdbc.PoolDataSourceFactory;
import oracle.ucp.jdbc.PoolDataSource;
import javax.naming.*;

import java.io.IOException;
import java.io.PrintWriter;
import java.util.Enumeration;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public final class Hello extends HttpServlet {

    public void doGet(HttpServletRequest request,
                      HttpServletResponse response)
        throws IOException, ServletException {

        response.setContentType("text/html");
        PrintWriter writer = response.getWriter();

        writer.println("<html>");
        writer.println("<head>");
        writer.println("<title>Sample Application Servlet Page</title>");
        writer.println("</head>");
        writer.println("<body bgcolor=white>");

        writer.println("<h1>Sample Application Servlet</h1>");
        writer.println("This is the output of a servlet that is part of");
        writer.println("the Hello, World application.");

        // this is the UCP specific part!
        writer.println("<h2>UCP</h2>");
        try {
            // look up the pool data source defined in META-INF/context.xml
            Context ctx = new InitialContext();
            Context envContext = (Context) ctx.lookup("java:/comp/env");
            javax.sql.DataSource ds =
                (javax.sql.DataSource) envContext.lookup("jdbc/UCPPool");
            writer.println("Got the datasource");

            // borrow a connection from the pool and show which instance served it
            Connection conn = ds.getConnection();
            writer.println("<h3>Connected to an Oracle instance</h3>");
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery(
                "select 'Hello World from '||sys_context('userenv','instance_name') from dual");
            while (rs.next()) {
                writer.println(rs.getString(1));
            }

            // release all resources
            rs.close();
            stmt.close();
            conn.close();
            conn = null;
        } catch (Exception e) {
            writer.println("<p>" + e + "</p>");
        }

        writer.println("</body>");
        writer.println("</html>");
    }
}

Done! Now let’s compile the code and deploy it to Tomcat before testing. The most common problem I get with the code is a JNDI error stating that “jdbc/UCPPool” is not defined. This can happen in two cases:

  • You have a typo in the resource definition, in which case the context really doesn’t exist (it’s case sensitive)
  • You have non-compatible line breaks in the context.xml file. In this case I’d try having all the contents of the file in 1 line (use “J” in vi to join lines together)

VALIDATION

You should now see a number of sessions as user “scott” against the database. Query gv$session for username “SCOTT” and you should see the result of your hard work.
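
For example, a quick check of how the pooled sessions are spread across the instances (assuming the pool connects as SCOTT) could be:

select inst_id, count(*)
  from gv$session
 where username = 'SCOTT'
 group by inst_id;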

11.2.0.2 Bundled Patch 3 for Linux x86-64bit Take 2

Yesterday I wrote about the application of Bundle Patch 3 to my 2 node RAC cluster. Unfortunately I ran into problems when applying the patches to the GRID_HOME. I promised a fix for the situation, and here it is.

First of all I was unsure whether I could apply the missing patches manually, but then decided against it. The opatch output for interim patches lists each patch together with a unique patch ID, as shown here:

Interim patches (3) :

Patch  10626132     : applied on Wed Feb 02 16:08:43 GMT 2011
Unique Patch ID:  13350217
 Created on 31 Dec 2010, 00:18:12 hrs PST8PDT
 Bugs fixed:
 10626132

The formatting unfortunately is lost when pasting this here.

I was not sure if that patch/unique patch ID combination would appear if I patched manually, so I decided not to be brave and rolled the bundle patch back altogether before applying it again.

Patch Rollback

This was actually very simple: I opted to roll back all the applied patches from the GRID_HOME. The documentation states that you simply append the “-rollback” flag to the opatch command. I tried it on the node where the application of 2 patches failed:

[root@node1 stage]# opatch auto /u01/app/oracle/product/stage/10387939 -oh /u01/app/oragrid/product/11.2.0.2 -rollback
Executing /usr/bin/perl /u01/app/oragrid/product/11.2.0.2/OPatch/crs/patch112.pl -patchdir /u01/app/oracle/product/stage -patchn 10387939 -oh /u01/app/oragrid/product/11.2.0.2 -rollback -paramfile /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
opatch auto log file location is /u01/app/oragrid/product/11.2.0.2/OPatch/crs/../../cfgtoollogs/opatchauto2011-02-03_09-04-13.log
Detected Oracle Clusterware install
Using configuration parameter file: /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
OPatch  is bundled with OCM, Enter the absolute OCM response file path:
/u01/app/oracle/product/stage/ocm.rsp
Successfully unlock /u01/app/oragrid/product/11.2.0.2
patch 10387939  rollback successful for home /u01/app/oragrid/product/11.2.0.2
The patch  10157622 does not exist in /u01/app/oragrid/product/11.2.0.2
The patch  10626132 does not exist in /u01/app/oragrid/product/11.2.0.2
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-4123: Oracle High Availability Services has been started.

So that was simple enough. Again, you won’t see anything in your opatch session and you might think that the command has somehow stalled. I usually start a “screen” session on my terminal and open a new window to tail the opatchauto log file in $GRID_HOME/cfgtoollogs/.

Re-Applying the Patch

The next step is to re-apply the patch. The initial failure was due to a lack of disk space, as is evident from the log file.

2011-02-02 15:57:45: The apply patch output is Invoking OPatch 11.2.0.1.4

 Oracle Interim Patch Installer version 11.2.0.1.4
 Copyright (c) 2010, Oracle Corporation.  All rights reserved.

 UTIL session

 Oracle Home       : /u01/app/oragrid/product/11.2.0.2
 Central Inventory : /u01/app/oracle/product/oraInventory
 from           : /etc/oraInst.loc
 OPatch version    : 11.2.0.1.4
 OUI version       : 11.2.0.2.0
 OUI location      : /u01/app/oragrid/product/11.2.0.2/oui
 Log file location : /u01/app/oragrid/product/11.2.0.2/cfgtoollogs/opatch/opatch2011-02-02_15-57-35PM.log

 Patch history file: /u01/app/oragrid/product/11.2.0.2/cfgtoollogs/opatch/opatch_history.txt

 Invoking utility "napply"
 Checking conflict among patches...
 Checking if Oracle Home has components required by patches...
 Checking conflicts against Oracle Home...
 OPatch continues with these patches:   10157622

 Do you want to proceed? [y|n]
 Y (auto-answered by -silent)
 User Responded with: Y

 Running prerequisite checks...
 Prerequisite check "CheckSystemSpace" failed.
 The details are:
 Required amount of space(2086171834) is not available.
 UtilSession failed: Prerequisite check "CheckSystemSpace" failed.

 OPatch failed with error code 73

2011-02-02 15:57:45: patch /u01/app/oracle/product/stage/10387939/10157622  apply  failed  for home  /u01/app/oragrid/product/11.2.0.2
2011-02-02 15:57:45: Performing Post patch actions

So this time around I ensured that I had enough free space (2.5G recommended minimum) available in my GRID_HOME. The procedure is the inverse of the rollback:

[root@node1 stage]# opatch auto /u01/app/oracle/product/stage/10387939 -oh /u01/app/oragrid/product/11.2.0.2
Executing /usr/bin/perl /u01/app/oragrid/product/11.2.0.2/OPatch/crs/patch112.pl -patchdir /u01/app/oracle/product/stage -patchn 10387939 -oh /u01/app/oragrid/product/11.2.0.2 -paramfile /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
opatch auto log file location is /u01/app/oragrid/product/11.2.0.2/OPatch/crs/../../cfgtoollogs/opatchauto2011-02-03_09-27-39.log
Detected Oracle Clusterware install
Using configuration parameter file: /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
OPatch  is bundled with OCM, Enter the absolute OCM response file path:
/u01/app/oracle/product/stage/ocm.rsp
Successfully unlock /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10387939  apply successful for home  /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10157622  apply successful for home  /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10626132  apply successful for home  /u01/app/oragrid/product/11.2.0.2
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-4123: Oracle High Availability Services has been started.
[root@node1 stage]#

And this time it worked: all 3 patches were applied. However, my free space diminished quite drastically from 2.5G to around 780M, and that was after purging lots of logs from $GRID_HOME/log/`hostname -s`/. Nevertheless, this concludes the patch application.

Summary

In summary I am quite impressed with this patch. It looks as if it had been designed to be deployable by OEM: it’s silent, doesn’t require input (except for the ocm.rsp file) and is a rolling patch. However, the user has to check the installed patches against the list of installable targets, either manually or through a script, to ensure that all patches have been applied. You also have to ensure you have enough free space on your $GRID_HOME mount point.

As an enhancement request I’d like to see feedback from the opatch session that it has started doing things. Initially I hit CTRL-C thinking the command had stalled while it was actually busy in the background, shutting down my CRS stack. The “workaround” is to tail the logfile with the “-f” option.

11.2.0.2 Bundled Patch 3 for Linux x86-64bit

It has become a habit recently to issue explicit warnings when presenting information that might cause you harm if it doesn’t work as described. So, to follow suit, here’s mine:

Warning! This is a Patch Set Update installation war story: it worked for me. That by no means implies that the same will work for you! So always apply the PSU or bundle patch on a non-critical test system first. And always take a backup of your ORACLE_HOME and inventory before applying any patches; you have been warned. I’d rather be safe than sorry.

A good one, I think… First of all, thanks to Coskan for the pointer to the Oracle blog entry http://blogs.oracle.com/UPGRADE/2011/01/11202_bundled_patch_3_for_linu.html which actually prompted me to apply the patch in the first place.

My environment is a 2 node 11.2.0.2 cluster on OEL 5.5 64bit. Judging by the readme file I need the latest OPatch, 11.2.0.1.4 (patch p6880880), and the bundle patch itself (patch 10387939). Unfortunately Oracle leaves it as an exercise to the reader to make sense of the patch readme and note 1274453.1, which is applicable if ACFS is involved on the cluster. Although I am using ACFS, all my ACFS volumes were disabled before patching (i.e. they are not shared ORACLE_HOMEs). If you have ACFS in use then follow note 1274453.1 and shut all of the volumes down before proceeding. Note 1274453.1 is divided into a number of sections, and it’s important to find the right case first. I wanted to try the opatch auto feature this time, but patch the GRID_HOME independently of the RDBMS_HOME. I knew from other users that opatch auto had gained a bad reputation in the past, but I was curious whether or not Oracle had fixed it. Regardless of your configuration, you have to stop dbconsole instances on all nodes if there are any.

If applicable, the next step is to stop all database instances and their corresponding ACFS volumes. The ACFS mounts also have to be dismounted. If you are unsure you can use /sbin/acfsutil registry to list the ACFS volumes. Those registered with Clusterware show up in the output of crsctl status resource -t.
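
For reference, a rough sketch of those steps (the database name and volume device are placeholders; acfsutil registry lists the volumes, and srvctl stop filesystem dismounts a Clusterware-registered ACFS file system):

[oracle@node1 ~]$ srvctl stop database -d MYDB
[root@node1 ~]# /sbin/acfsutil registry
[root@node1 ~]# srvctl stop filesystem -d /dev/asm/myvol-123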

Important! If you have not already done so, update OPatch on all nodes for the GI and RDBMS homes. It’s a lot of manual work; I wish there was a cluster wide OPatch location, as it would make my life so much easier. Just patching OPatch on 4 nodes is a pain, and I don’t want to imagine an 8 or more node cluster right now.

Now proceed with the patch installation. I’ll only let it patch the GRID_HOME at this stage. The patch should be rolling which is good. Note that you’ll be root, which can cause its own problems in larger organisations. My patch is staged in /u01/app/oracle/product/stage, and the patch readme suggests I should use the opatch executable in the $GRID_HOME.

[root@node1 ~]# cd /u01/app/oracle/product/stage
[root@node1 stage]# cd 10387939
[root@node1 10387939]# export PATH=/u01/app/oragrid/product/11.2.0.2/OPatch:$PATH

Oracle Configuration Manager response file

Unlike previous patchsets, which prompted the user each time opatch was run, this time the OCM configuration is not part of the patch installation. Instead, you have to create an OCM configuration “response” file before you apply the patch. The file is created in the current directory. As the GRID_OWNER, execute $ORACLE_HOME/OPatch/ocm/bin/emocmrsp as shown here:

[root@node1 stage]# su - oragrid
[oragrid@node1 ~]$ . oraenv
ORACLE_SID = [oragrid] ? grid
The Oracle base for ORACLE_HOME=/u01/app/oragrid/product/11.2.0.2 is /u01/app/oragrid/product/admin
[oragrid@node1 ~]$ /u01/app/oragrid/product/11.2.0.2/OPatch/ocm/bin/emocmrsp
OCM Installation Response Generator 10.3.1.2.0 - Production
Copyright (c) 2005, 2009, Oracle and/or its affiliates.  All rights reserved.

Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name:

You have not provided an email address for notification of security issues.
Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]:  y
The OCM configuration response file (ocm.rsp) was successfully created.
[oragrid@node1 ~]$ pwd
/home/oragrid

I don’t want automatic email notifications in this case so I left the email address blank. I also prefer to be uninformed of future patch sets. I then moved the file to /u01/app/oracle/product/stage/ocm.rsp as root for the next steps. It is good practice to save the detailed Oracle inventory information for later. Again as the grid owner I ran this command:

[root@node1 stage]# su - oragrid
[oragrid@node1 ~]$ . oraenv
ORACLE_SID = [oragrid] ? grid
The Oracle base for ORACLE_HOME=/u01/app/oragrid/product/11.2.0.2 is /u01/app/oragrid/product/admin
[oragrid@node1 ~]$ /u01/app/oragrid/product/11.2.0.2/OPatch/opatch lsinv -detail > /tmp/gridhomedetail

What can (will) be patched?

The bundle.xml file in the patch staging area lists the components that can be patched with a bundle patch, together with the target type each patch applies to.



So in other words, the bundle consists of 3 patches (10387939, 10157622 and 10626132), and bundle.xml shows a target type for each of them. SIHA is short for Single Instance High Availability, which has been renamed “Oracle Restart”. I assume SIDB is single instance DB, but don’t know for sure.

Patching the GRID_HOME

I’m now ready to apply the patch to the GRID home. The command to do so is simple; the example below worked for me:

[root@node1 stage]# opatch auto /u01/app/oracle/product/stage/10387939 -oh /u01/app/oragrid/product/11.2.0.2
Executing /usr/bin/perl /u01/app/oragrid/product/11.2.0.2/OPatch/crs/patch112.pl -patchdir /u01/app/oracle/product/stage -patchn 10387939 -oh /u01/app/oragrid/product/11.2.0.2 -paramfile /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
opatch auto log file location is /u01/app/oragrid/product/11.2.0.2/OPatch/crs/../../cfgtoollogs/opatchauto2011-02-02_15-50-04.log
Detected Oracle Clusterware install
Using configuration parameter file: /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
OPatch  is bundled with OCM, Enter the absolute OCM response file path:

Before entering the path I opened another terminal to look at the log file, which is in $ORACLE_HOME/cfgtoollogs/ and named opatchauto plus a timestamp. After entering the full path to my ocm.rsp file nothing happened in my opatch session; Oracle could have told us that it was about to apply the patch, because that is exactly what it is doing. Go back to your other terminal and watch the log messages fly by! The opatch auto command automatically does everything that the DBA had to do previously, including unlocking the CRS stack and calling opatch napply for the relevant bits and pieces. This is indeed a nice step forward. I can vividly remember having to apply portions of a PSU to the GRID_HOME and others to the RDBMS home, 6 or 7 steps in total before a patch was applied. That was indeed hard work.

Only after CRS has been shut down (quite a while after entering the path to the ocm.rsp file!) will you be shown this line:

Successfully unlock /u01/app/oragrid/product/11.2.0.2

Further down the output, after the patches have been applied, you will see these lines:

patch /u01/app/oracle/product/stage/10387939/10387939  apply successful for home  /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10157622  apply  failed  for home  /u01/app/oragrid/product/11.2.0.2
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-4123: Oracle High Availability Services has been started.

And with that I got my prompt back. One patch failed to apply; the log file indicated a lack of space (2G are required to be free). Tomorrow I’ll post an update and remove/reapply the patch manually.

I checked the other node and found that the patch had indeed been applied on the local node only. It hasn’t propagated across, which is good, as the readme wasn’t really clear about whether the patch was rolling. From what I’ve seen I’d call it a local patch, similar to the “opatch napply -local” we did manually before the opatch auto option was introduced. And even better, it worked. Querying opatch lsinventory I got this result:

Lsinventory Output file location : /u01/app/oragrid/product/11.2.0.2/cfgtoollogs/opatch/lsinv/lsinventory2011-02-02_16-01-27PM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (1):

Oracle Grid Infrastructure                                           11.2.0.2.0
There are 1 products installed in this Oracle Home.

Interim patches (1) :

Patch  10387939     : applied on Wed Feb 02 15:56:50 GMT 2011
Unique Patch ID:  13350217
 Created on 30 Dec 2010, 22:55:01 hrs PST8PDT
 Bugs fixed:
 10158965, 9940990, 10190642, 10031806, 10228635, 10018789, 9744252
 10010252, 9956713, 10204358, 9715581, 9770451, 10094635, 10121589
 10170431, 9824198, 10071193, 10145612, 10035737, 9845644, 10086980
 10052141, 10039731, 10035521, 10219576, 10184634, 10207092, 10138589
 10209232, 8752691, 9965655, 9819413, 9500046, 10106828, 10220118, 9881076
 9869287, 10040531, 10122077, 10218814, 10261389, 10033603, 9788588
 9735237, 10126219, 10043801, 10073205, 10205715, 9709292, 10105926
 10079168, 10098253, 10005127, 10013431, 10228151, 10092153, 10142909
 10238786, 10260808, 10033071, 9791810, 10052956, 9309735, 10026972
 10080579, 10073683, 10004943, 10019218, 9539440, 10022980, 10061490
 10006008, 6523037, 9724970, 10142776, 10208386, 10113803, 10261680
 9671271, 10084145, 10051966, 10355493, 10227133, 10229719, 10046912
 10228393, 10353054, 10142788, 10221016, 9414040, 10127360, 10310299
 10094201, 9591812, 10129643, 10332589, 10026193, 10195991, 10260870
 10248523, 9951423, 10261072, 10299224, 10230571, 10222719, 10233732
 10113633, 10102506, 10094949, 10077191, 10329146, 8685446, 10048701
 10314582, 10149223, 10245259, 10151017, 9924349, 10245086, 11074393

Rac system comprising of multiple nodes
 Local node = node1
 Remote node = node2

--------------------------------------------------------------------------------

OPatch succeeded.

Patching the RDBMS home

Now with that I’ll try applying it to the RDBMS home on the same node:

[root@node1 stage]# opatch auto /u01/app/oracle/product/stage/10387939 -oh /u01/app/oracle/product/11.2.0.2
Executing /usr/bin/perl /u01/app/oragrid/product/11.2.0.2/OPatch/crs/patch112.pl -patchdir /u01/app/oracle/product/stage -patchn 10387939 -oh /u01/app/oracle/product/11.2.0.2 -paramfile /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
opatch auto log file location is /u01/app/oragrid/product/11.2.0.2/OPatch/crs/../../cfgtoollogs/opatchauto2011-02-02_16-05-29.log
Detected Oracle Clusterware install
Using configuration parameter file: /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
OPatch  is bundled with OCM, Enter the absolute OCM response file path:
/u01/app/oracle/product/stage/ocm.rsp
patch /u01/app/oracle/product/stage/10387939/10387939  apply successful for home  /u01/app/oracle/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10157622/custom/server/10157622  apply successful for home  /u01/app/oracle/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10626132  apply successful for home  /u01/app/oracle/product/11.2.0.2
[root@node1 stage]#

Cool! That was indeed easy. Did it work?

$ORACLE_HOME/OPatch/opatch lsinv
[...]
--------------------------------------------------------------------------------
Installed Top-level Products (1):

Oracle Database 11g                                                  11.2.0.2.0
There are 1 products installed in this Oracle Home.

Interim patches (3) :

Patch  10626132     : applied on Wed Feb 02 16:08:43 GMT 2011
Unique Patch ID:  13350217
 Created on 31 Dec 2010, 00:18:12 hrs PST8PDT
 Bugs fixed:
 10626132

Patch  10157622     : applied on Wed Feb 02 16:08:27 GMT 2011
Unique Patch ID:  13350217
 Created on 19 Nov 2010, 01:41:19 hrs PST8PDT
 Bugs fixed:
 9979706, 9959110, 10016083, 10015460, 10014392, 9918485, 10157622
 10089120, 10057296, 9971646, 10053985, 10040647, 9978765, 9864003
 10069541, 10110969, 10107380, 9915329, 10044622, 10029119, 9812970
 10083009, 9812956, 10048027, 10036193, 10008467, 10040109, 10015210
 10083789, 10033106, 10073372, 9876201, 10042143, 9963327, 9679401
 10062301, 10018215, 10075643, 10007185, 10071992, 10057680, 10038791
 10124517, 10048487, 10078086, 9926027, 10052721, 9944948, 10028235
 10146768, 10011084, 10027079, 10028343, 10045436, 9907089, 10073075
 10175855, 10178670, 10072474, 10036834, 9975837, 10028637, 10029900, 9949676

Patch  10387939     : applied on Wed Feb 02 16:07:18 GMT 2011
Unique Patch ID:  13350217
 Created on 30 Dec 2010, 22:55:01 hrs PST8PDT
 Bugs fixed:
 10158965, 9940990, 10190642, 10031806, 10228635, 10018789, 9744252
 10010252, 9956713, 10204358, 9715581, 9770451, 10094635, 10121589
 10170431, 9824198, 10071193, 10145612, 10035737, 9845644, 10086980
 10052141, 10039731, 10035521, 10219576, 10184634, 10207092, 10138589
 10209232, 8752691, 9965655, 9819413, 9500046, 10106828, 10220118, 9881076
 9869287, 10040531, 10122077, 10218814, 10261389, 10033603, 9788588
 9735237, 10126219, 10043801, 10073205, 10205715, 9709292, 10105926
 10079168, 10098253, 10005127, 10013431, 10228151, 10092153, 10142909
 10238786, 10260808, 10033071, 9791810, 10052956, 9309735, 10026972
 10080579, 10073683, 10004943, 10019218, 9539440, 10022980, 10061490
 10006008, 6523037, 9724970, 10142776, 10208386, 10113803, 10261680
 9671271, 10084145, 10051966, 10355493, 10227133, 10229719, 10046912
 10228393, 10353054, 10142788, 10221016, 9414040, 10127360, 10310299
 10094201, 9591812, 10129643, 10332589, 10026193, 10195991, 10260870
 10248523, 9951423, 10261072, 10299224, 10230571, 10222719, 10233732
 10113633, 10102506, 10094949, 10077191, 10329146, 8685446, 10048701
 10314582, 10149223, 10245259, 10151017, 9924349, 10245086, 11074393

Rac system comprising of multiple nodes
 Local node = node1
 Remote node = node2

--------------------------------------------------------------------------------

OPatch succeeded.

It did indeed. As a nice side effect I didn’t even have to worry about the srvctl start/stop home commands; they were run automatically for me. From the log:

2011-02-02 16:08:45: /u01/app/oracle/product/11.2.0.2/bin/srvctl start home -o /u01/app/oracle/product/11.2.0.2 -s

/u01/app/oracle/product/11.2.0.2/srvm/admin/stophome.txt -n node1 output is
2011-02-02 16:08:45: Started resources from datbase home /u01/app/oracle/product/11.2.0.2

Now I’ll simply repeat this on the remaining nodes and should be done. To test the stability of the process I didn’t limit the opatch command to a specific home, but had it patch them all.

Patching all homes in one go

This takes a little longer but otherwise is just the same. I have added the output here for the sake of completeness:

[root@node2 stage]# which opatch
/u01/app/oragrid/product/11.2.0.2/OPatch/opatch
[root@node2 stage]# opatch auto /u01/app/oracle/product/stage/10387939
Executing /usr/bin/perl /u01/app/oragrid/product/11.2.0.2/OPatch/crs/patch112.pl -patchdir /u01/app/oracle/product/stage -patchn 10387939 -paramfile /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
opatch auto log file location is /u01/app/oragrid/product/11.2.0.2/OPatch/crs/../../cfgtoollogs/opatchauto2011-02-02_16-36-09.log
Detected Oracle Clusterware install
Using configuration parameter file: /u01/app/oragrid/product/11.2.0.2/crs/install/crsconfig_params
OPatch  is bundled with OCM, Enter the absolute OCM response file path:
/u01/app/oracle/product/stage/ocm.rsp
patch /u01/app/oracle/product/stage/10387939/10387939  apply successful for home  /u01/app/oracle/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10157622/custom/server/10157622  apply successful for home  /u01/app/oracle/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10626132  apply successful for home  /u01/app/oracle/product/11.2.0.2
Successfully unlock /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10387939  apply successful for home  /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10157622  apply successful for home  /u01/app/oragrid/product/11.2.0.2
patch /u01/app/oracle/product/stage/10387939/10626132  apply successful for home  /u01/app/oragrid/product/11.2.0.2
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-4123: Oracle High Availability Services has been started.
[root@node2 stage]#

Well, that’s it for the technical part. When I compared the number of patches applied in the $GRID_HOME on node 2 (which simply used “opatch auto”) I found 3 interim patches applied, compared to only 1 on node 1 where I patched the GRID_HOME on its own. I’ll have to raise this with Oracle…

[oragrid@node2 ~]$ /u01/app/oragrid/product/11.2.0.2/OPatch/opatch lsinv
Invoking OPatch 11.2.0.1.4

Oracle Interim Patch Installer version 11.2.0.1.4
Copyright (c) 2010, Oracle Corporation.  All rights reserved.

Oracle Home       : /u01/app/oragrid/product/11.2.0.2
Central Inventory : /u01/app/oracle/product/oraInventory
 from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.4
OUI version       : 11.2.0.2.0
OUI location      : /u01/app/oragrid/product/11.2.0.2/oui
Log file location : /u01/app/oragrid/product/11.2.0.2/cfgtoollogs/opatch/opatch2011-02-02_17-13-40PM.log

Patch history file: /u01/app/oragrid/product/11.2.0.2/cfgtoollogs/opatch/opatch_history.txt

Lsinventory Output file location : /u01/app/oragrid/product/11.2.0.2/cfgtoollogs/opatch/lsinv/lsinventory2011-02-02_17-13-40PM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (1):

Oracle Grid Infrastructure                                           11.2.0.2.0
There are 1 products installed in this Oracle Home.

Interim patches (3) :

Patch  10626132     : applied on Wed Feb 02 16:48:59 GMT 2011
Unique Patch ID:  13350217
 Created on 31 Dec 2010, 00:18:12 hrs PST8PDT
 Bugs fixed:
 10626132

Patch  10157622     : applied on Wed Feb 02 16:48:34 GMT 2011
Unique Patch ID:  13350217
 Created on 19 Nov 2010, 01:41:33 hrs PST8PDT
 Bugs fixed:
 9979706, 9959110, 10016083, 10015460, 10014392, 9918485, 10157622
 10089120, 10057296, 9971646, 10053985, 10040647, 9978765, 9864003
 10069541, 10110969, 10107380, 9915329, 10044622, 10029119, 9812970
 10083009, 9812956, 10048027, 10036193, 10008467, 10040109, 10015210
 10083789, 10033106, 10073372, 9876201, 10042143, 9963327, 9679401
 10062301, 10018215, 10075643, 10007185, 10071992, 10057680, 10038791
 10124517, 10048487, 10078086, 9926027, 10052721, 9944948, 10028235
 10146768, 10011084, 10027079, 10028343, 10045436, 9907089, 10073075
 10175855, 10072474, 10036834, 9975837, 10028637, 10029900, 9949676
 9974223, 10260251

Patch  10387939     : applied on Wed Feb 02 16:46:29 GMT 2011
Unique Patch ID:  13350217
 Created on 30 Dec 2010, 22:55:01 hrs PST8PDT
 Bugs fixed:
 10158965, 9940990, 10190642, 10031806, 10228635, 10018789, 9744252
 10010252, 9956713, 10204358, 9715581, 9770451, 10094635, 10121589
 10170431, 9824198, 10071193, 10145612, 10035737, 9845644, 10086980
 10052141, 10039731, 10035521, 10219576, 10184634, 10207092, 10138589
 10209232, 8752691, 9965655, 9819413, 9500046, 10106828, 10220118, 9881076
 9869287, 10040531, 10122077, 10218814, 10261389, 10033603, 9788588
 9735237, 10126219, 10043801, 10073205, 10205715, 9709292, 10105926
 10079168, 10098253, 10005127, 10013431, 10228151, 10092153, 10142909
 10238786, 10260808, 10033071, 9791810, 10052956, 9309735, 10026972
 10080579, 10073683, 10004943, 10019218, 9539440, 10022980, 10061490
 10006008, 6523037, 9724970, 10142776, 10208386, 10113803, 10261680
 9671271, 10084145, 10051966, 10355493, 10227133, 10229719, 10046912
 10228393, 10353054, 10142788, 10221016, 9414040, 10127360, 10310299
 10094201, 9591812, 10129643, 10332589, 10026193, 10195991, 10260870
 10248523, 9951423, 10261072, 10299224, 10230571, 10222719, 10233732
 10113633, 10102506, 10094949, 10077191, 10329146, 8685446, 10048701
 10314582, 10149223, 10245259, 10151017, 9924349, 10245086, 11074393

Rac system comprising of multiple nodes
 Local node = node2
 Remote node = node1

--------------------------------------------------------------------------------

OPatch succeeded.

Viewing Runtime Load Balancing Events

Yesterday I ran a benchmark on a 2 node RAC cluster (ProLiant BL685c G6 servers, each with four six-core AMD Opteron 8431 CPUs and 32G RAM). It’s running Oracle Grid Infrastructure 11.2.0.2 as well as an Oracle 11.2.0.2 database on Oracle Enterprise Linux 5.5 64bit with device-mapper-multipath.

I was testing how the system would react under load, but I also wanted to see if Runtime Load Balancing was working. The easiest way to check this is to view the AQ events that are generated for a service when AQ HA notifications are set to true. They can either be dequeued from the database, as described in chapter 11 of Pro Oracle Database 11g RAC on Linux, or alternatively queried from the database. The latter is the quicker method and this article will focus on it.

Before you can make use of Runtime Load Balancing you need to set at least 2 properties in your service:

  • Connection Load Balancing Goal (either SHORT or LONG)
  • Runtime Load Balancing Goal (SERVICE_TIME or THROUGHPUT)

.Net applications require AQ HA notifications to be set to true because they can’t directly make use of Fast Application Notification (FAN) events, as mentioned in the introduction. My JDBC application is fully capable of using FAN events; however, as you will see later, I am using the AQ notifications anyway to view the events.

Connected as the owner of the Oracle binaries, I created a new service to make use of both instances:

$ srvctl add service -d TEST -s TESTSRV -r TEST1,TEST2 -P BASIC  \
> -l PRIMARY -y MANUAL -q true -x false -j short -B SERVICE_TIME \
> -e SESSION -m BASIC -z 0 -w 0

The service TESTSRV for database TEST has TEST1 and TEST2 as preferred instances, and the service should be started (manually) when the database is in the primary role. AQ notifications are enabled, and I chose a connection load balancing goal of “short” (usually ok with web applications and connection pooling) and a runtime load balancing goal of service time (also appropriate for the many short transactions typical of a web environment). The remaining parameters define Transparent Application Failover. Please refer to the output of “srvctl add service -h” for more information about the command line parameters.

The result of this endeavour can be viewed with srvctl config service:

$ srvctl config service -d TEST -s TESTSRV
Service name: TESTSRV
Service is enabled
Server pool: TEST_TESTSRV
Cardinality: 2
Disconnect: false
Service role: PRIMARY
Management policy: MANUAL
DTP transaction: false
AQ HA notifications: true
Failover type: SESSION
Failover method: BASIC
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: SHORT
Runtime Load Balancing Goal: SERVICE_TIME
TAF policy specification: BASIC
Edition:
Preferred instances: TEST1,TEST2
Available instances:

So to begin with I created the order entry schema (SOE) in preparation for a Swingbench run. (I know that Swingbench’s Order Entry is probably not the best benchmark out there, but my client knows and likes it.) Once about 10G of data had been generated I started a Swingbench run with 300 users and a reasonably low think time (minimum transaction time 20ms and maximum of 60ms). The connect string was //scan1.example.com:1800/TESTSRV

A query against gv$session showed an even balance of sessions, which was good:

select count(inst_id), inst_id
 from gv$session
where username = 'SOE'
group by inst_id

However, whatever I did, I couldn’t get the Runtime Load Balancing data in sys.sys$service_metrics_tab to change. The entries always looked like this (column user_data):

{instance=TEST1 percent=50 flag=UNKNOWN aff=TRUE}{instance=TEST2 percent=50 flag=UNKNOWN aff=TRUE} }

That sort of made sense, as none of the nodes broke into a sweat: the system was > 50% idle with a load average of about 12. So that wouldn’t cut it. Instead of trying to experiment with the Swingbench parameters, I decided to revert to the silly CPU burner: a while loop which generates random numbers. I wasn’t interested in I/O at this stage, and created this minimal script:

$ cat dothis.sql
declare
  n number;
begin
  while (true) loop
    n := dbms_random.random();
  end loop;
end;
/

A simple for loop can be used to start the load test:

$ for i in $(seq 30); do
> sqlplus soe/soe@scan1.example.com:1800/TESTSRV @dothis &
> done

This created an even load on both nodes. I then started another 20 sessions on node 1 against TEST1 to trigger the change in behaviour. And sure enough, the top few lines of “top” revealed the difference. The output for node 1 was as follows:


top - 10:59:30 up 1 day, 21:16,  6 users,  load average: 42.44, 20.23, 10.07
Tasks: 593 total,  48 running, 545 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.9%us,  0.1%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32960688k total, 11978912k used, 20981776k free,   441260k buffers
Swap: 16777208k total,        0k used, 16777208k free,  8078336k cached

Whereas node 2 was relatively idle:

top - 10:59:22 up 5 days, 17:45,  4 users,  load average: 15.80, 10.53, 5.74
Tasks: 631 total,  16 running, 605 sleeping,  10 stopped,   0 zombie
Cpu(s): 58.8%us,  0.6%sy,  0.0%ni, 40.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32960688k total, 11770080k used, 21190608k free,   376672k buffers
Swap: 16777208k total,        0k used, 16777208k free,  7599496k cached

Would that imbalance finally make a difference? It did, as the user_data column (truncated here for better readability) reveals:

SQL> select user_data
2  from sys.sys$service_metrics_tab
3  order by enq_time desc;

{instance=TEST1 percent=4 flag=GOOD aff=TRUE}{instance=TEST2 percent=96 flag=GOOD aff=TRUE} } timestamp=2011-01-20 11:01:16')
{instance=TEST1 percent=6 flag=GOOD aff=TRUE}{instance=TEST2 percent=94 flag=GOOD aff=TRUE} } timestamp=2011-01-20 11:00:46')
{instance=TEST1 percent=10 flag=GOOD aff=TRUE}{instance=TEST2 percent=90 flag=GOOD aff=TRUE} } timestamp=2011-01-20 11:00:16')
{instance=TEST1 percent=18 flag=GOOD aff=TRUE}{instance=TEST2 percent=82 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:59:46')
{instance=TEST1 percent=28 flag=GOOD aff=TRUE}{instance=TEST2 percent=72 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:59:16')
{instance=TEST1 percent=35 flag=GOOD aff=TRUE}{instance=TEST2 percent=65 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:58:46')
{instance=TEST1 percent=40 flag=GOOD aff=TRUE}{instance=TEST2 percent=60 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:58:16')
{instance=TEST1 percent=43 flag=GOOD aff=TRUE}{instance=TEST2 percent=57 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:57:46')
{instance=TEST1 percent=44 flag=GOOD aff=TRUE}{instance=TEST2 percent=56 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:57:16')
{instance=TEST1 percent=48 flag=GOOD aff=TRUE}{instance=TEST2 percent=52 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:56:46')
{instance=TEST1 percent=49 flag=GOOD aff=TRUE}{instance=TEST2 percent=51 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:56:16')
{instance=TEST1 percent=50 flag=GOOD aff=TRUE}{instance=TEST2 percent=50 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:55:46')
{instance=TEST1 percent=50 flag=GOOD aff=TRUE}{instance=TEST2 percent=50 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:55:16')
{instance=TEST1 percent=50 flag=UNKNOWN aff=TRUE}{instance=TEST2 percent=50 flag=UNKNOWN aff=TRUE} } timestamp=2011-01-20 10:54:46')
{instance=TEST1 percent=50 flag=UNKNOWN aff=TRUE}{instance=TEST2 percent=50 flag=UNKNOWN aff=TRUE} } timestamp=2011-01-20 10:54:16')

Where it was initially even at 50:50, it soon became imbalanced, and TEST2 would be preferred after a few minutes into the test. So everything was working as expected; I just didn’t manage to put enough load on the system initially.

Solaris Eye for the Linux Guy… Part II (oprofile, Dtrace, and Oracle Event Trace)

Proper tool for the job

My grandfather used to say to me: “Use the proper tool for the job”.  This is important to keep in mind when faced with performance issues.  When I am faced with performance problems in Oracle, I typically start at a high level with AWR reports or Enterprise Manager to get a high level understanding of the workload.   To drill down further, the next step is to use Oracle “10046 event” tracing.  Cary Millsap created a methodology around event tracing called “Method-R” which shows how to focus in on the source of a performance problem by analyzing the components that contribute to response time.   These are all fine places to start to analyze performance problems from the “user” or “application” point of view.  But what happens if the OS is in peril?

Adding storage dynamically to ASM on Linux

Note: This discussion is potentially relevant only to OEL 5.x and RHEL 5.x; I haven’t been able to verify that it works the same way on other Linux distributions, though I would assume so. Before starting with the article, here are some facts:

  • OEL/RHEL 5.5 64bit
  • Oracle 11.2.0.2
  • native multipathing: device-mapper-multipath

The question I have asked myself many times is: how can I dynamically add a LUN to ASM without having to stop any component of the stack? Having mocked “reboot-me” OSs like Windows, I soon fell quiet when it came to discussing the addition of a LUN to ASM on Linux. Today I learned how to do this, by piecing together information I got from Angus Thomas, a great Red Hat system administrator I had the pleasure of working with in 2009 and 2010. And since I have a short lived memory I decided to write it down.

I’ll describe the process from top to bottom, from the addition of the LUN to the server all the way up to the addition of the ASM disk to the disk group.

Adding the storage to the cluster nodes

The first step is obviously to get the LUN assigned to the server(s). This is the easy part, and outside the control of the Linux/Oracle admin. The storage team will provision a new LUN to the hosts in question. At this stage, Linux has no idea about the new storage: to make it available, the system administrator has to rescan the SCSI bus. A proven and tested way in RHEL 5 is to issue this command:

[root@node1 ~]# for i in `ls -1 /sys/class/scsi_host`; do
> echo "- - -" > /sys/class/scsi_host/${i}/scan
> done

The new, unpartitioned LUN will appear in /proc/partitions. If it doesn’t, then there’s probably something wrong on the SAN side: check /var/log/messages and talk to your storage administrator. If it’s not a misconfiguration then you may have no option but to reboot the node.

Configure Multipathing

So far so good; the next step is to add it to the multipathing configuration. First of all, you need to find out the WWID of the new device. In my case that’s simple: the last new line in /proc/partitions is usually a giveaway. If you are unsure, ask whoever has console access to the array to check the WWID. It’s important to get this right at this stage :)
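
For illustration, on RHEL 5 you can also read the WWID of a candidate device directly with scsi_id (sdX is a placeholder for the new device name):

[root@node1 ~]# scsi_id -g -u -s /block/sdX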

To add the new disk to the multipath.conf file, all you need to do is to add a new section, as in the following example:


multipaths {
..
  multipath {
    wwid                 360000970000294900664533030344239
    alias                ACFS0001
    path_grouping_policy failover
  }
..
}

By the way, I have written a more detailed post about configuring multipathing here. Don’t forget to replicate the changes to the other cluster nodes!

Now you reload multipathd using /etc/init.d/multipathd reload on each node, and voila, you should see the device in /dev/mapper/ – my ACFS disk appeared as /dev/mapper/ACFS0001.

Now the tricky bit is to partition it, if you need to; this is no longer mandatory with 11.1 and newer, although some software, like EMC’s Replication Manager, still requires it. I succeeded by locating the device in /dev/disk/by-id and then using fdisk against it, as in this example:

...
# fdisk /dev/disk/by-id/scsi-360000970000294900664533030344239
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

The number of cylinders for this disk is set to 23251.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
 (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): u
Changing display/entry units to sectors

Command (m for help): n
Command action
 e   extended
 p   primary partition (1-4)
p
Partition number (1-4): 1
First sector (32-47619839, default 32): 128
Last sector or +size or +sizeM or +sizeK (129-47619839, default 47619839):
Using default value 47619839

Command (m for help): p

Disk /dev/disk/by-id/scsi-360000970000294900664533030344239: 24.3 GB, 24381358080 bytes
64 heads, 32 sectors/track, 23251 cylinders, total 47619840 sectors
Units = sectors of 1 * 512 = 512 bytes

 Device Boot                                                         Start  End         Blocks     Id  System
/dev/disk/by-id/scsi-360000970000294900664533030344239p1             128    47619839    23809855+  83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Once you are in fdisk, the commands are identical to single-pathed storage. Type “n” to create a new partition, “p” for a primary partition, and specify the start and end cylinders as needed. Type “p” to print the partition table, and if you are happy with it use “w” to write it. You might wonder why I added an offset and changed the unit (“u”); this is due to the EMC storage this site uses. The EMC® Host Connectivity Guide for Linux (P/N 300-003-865 REV A23) suggests a 64k offset. Don’t simply repeat this in your environment; check with the storage team first.

Before adding the partitions to ACFS0001 and ACFS0002 I had 107 partitions:


[root@node1 ~]# wc -l /proc/partitions
107 /proc/partitions

The new partitions are recognised after the 2 fdisk commands completed:


[root@node1 ~]# wc -l /proc/partitions
 107 /proc/partitions

But when you check /dev/mapper now, you still don’t see the partition; the naming convention is to append pN to the device name, i.e. /dev/mapper/ACFS0001p1 for the first partition and so on.

kpartx to the rescue! This superb utility reads the partition table of a device and creates device maps for its partitions. Initially my setup was as follows:


[root@node1 ~]# ls -l /dev/mapper/ACFS*
brw-rw---- 1 root disk 253, 31 Jan 18 10:05 /dev/mapper/ACFS0001
brw-rw---- 1 root disk 253, 32 Jan 18 10:05 /dev/mapper/ACFS0002

Usually I would have rebooted the node at this stage, as I didn’t know how else to get the partition table picked up. But with kpartx (“yum install kpartx” to install) this is no longer needed. Consider the example below:

[root@node1 ~]# kpartx -l /dev/mapper/ACFS0001
ACFS0001p1 : 0 47619711 /dev/mapper/ACFS0001 129
[root@node1 ~]# kpartx -a /dev/mapper/ACFS0001
[root@node1 ~]# kpartx -l /dev/mapper/ACFS0002
ACFS0002p1 : 0 47619711 /dev/mapper/ACFS0002 129
[root@node1 ~]# kpartx -a /dev/mapper/ACFS0002

[root@node1 ~]# ls -l /dev/mapper/ACFS000*
brw-rw---- 1 root disk 253, 31 Jan 18 10:05 /dev/mapper/ACFS0001
brw-rw---- 1 root disk 253, 36 Jan 18 10:13 /dev/mapper/ACFS0001p1
brw-rw---- 1 root disk 253, 32 Jan 18 10:05 /dev/mapper/ACFS0002
brw-rw---- 1 root disk 253, 37 Jan 18 10:13 /dev/mapper/ACFS0002p1

“kpartx -l” prints the partition table, and “kpartx -a” adds the partition mappings, as the example shows. No more need to reboot! However, as has been pointed out in the comments section (see below), kpartx doesn’t add both paths, so you should run the partprobe command to add the missing paths:


[root@node1 ~]# partprobe
[root@node1 ~]# wc -l /proc/partitions
109 /proc/partitions

See how there are 109 partitions listed now instead of the 107 from before: the two missing paths have been added (one for each device).
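If you want to convince yourself that the multipath layer really exposes the new partitions, something along these lines can be used (a sketch only; the device names are the ones from the example above):

[root@node1 ~]# dmsetup ls | grep ACFS
[root@node1 ~]# multipath -ll ACFS0001

The first command should list both the ACFS000n devices and their ACFS000np1 partition mappings; the second shows the paths behind the ACFS0001 map.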

Add disks to ASM

With this done, you can add the disks to ASM. I personally like the intermediate step of creating an ASMLib disk first.
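That ASMLib step isn't shown in the original output; as a rough sketch (the disk labels and partition device names are assumptions taken from this example, run as root on one node) it could look like this:

[root@node1 ~]# /usr/sbin/oracleasm createdisk ACFS0001 /dev/mapper/ACFS0001p1
[root@node1 ~]# /usr/sbin/oracleasm createdisk ACFS0002 /dev/mapper/ACFS0002p1

followed by "/usr/sbin/oracleasm scandisks" on the remaining cluster nodes. With the ASMLib disks in place, connect to ASM as SYSASM and add them using the alter diskgroup command: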

SQL> alter diskgroup ACFSDG add disk 'ORCL:ACFS0002', 'ORCL:ACFS0001';

Now just wait for the rebalance operation to complete.
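Progress of the rebalance can be checked from the ASM instance, for example with a minimal query like this (adjust the column list to taste):

SQL> select group_number, operation, state, power, est_minutes from v$asm_operation;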

Solaris Eye for the Linux guy… or how I learned to stop worrying about Linux and Love Solaris (Part 1)

This entry goes out to my Oracle techie friends who have been in the Linux camp for some time now and are suddenly finding themselves needing to know more about Solaris… hmmmm… I wonder if this has anything to do with Solaris now being an available option with Exadata?  Or maybe the recent announcement that the SPARC T3 multiplier for T3-x servers is now 0.25.  Judging by my inbox recently, I expect this renewed interest in Solaris to continue.

Adding user equivalence for RAC the easy way

This is the first time I have set up a new 11.2.0.2 cluster with the automatic SSH setup. Until now, I ensured user equivalence by manually copying the SSH RSA and DSA keys to all cluster nodes. For two nodes that's not too bad, but recently someone on a mailing list I subscribe to asked a question about a 28 (!) node cluster, and at that scale the whole process gets a bit too labour intensive.

Setting up user equivalence using a script may be the solution. You can also use OUI to do the same, but I like to run "cluvfy stage -post hwos" to check that everything is OK before even thinking about executing ./runInstaller.
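For reference, that check can be run straight from the unzipped Grid Infrastructure media with something like the following (the node names are the ones used in this post; the staging directory is assumed to be the unzipped grid directory):

[grid@acfsprdnode1 grid]$ ./runcluvfy.sh stage -post hwos -n acfsprdnode1,acfsprdnode2 -verbose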

Here’s the output of a session, my 2 cluster nodes are acfsprodnode1 and acfsprodnode2 (yes, they are for 11.2 ACFS replication and encryption testing). I am using the grid user as the owner of Grid Infrastructure, and oracle to own the RDBMS binaries. Start by navigating to the location where you unzipped the Grid Infrastructure patch file. Then change into the "sshsetup" directory and run the command:


[grid@acfsprdnode1 sshsetup]$ ./sshUserSetup.sh
Please specify a valid and existing cluster configuration file.
Either user name or host information is missing
Usage ./sshUserSetup.sh -user  [ -hosts "" | -hostfile  ] [ -advanced ]  [ -verify] [ -exverify ] [ -logfile  ] [-confirm] [-shared] [-help] [-usePassphrase] [-noPromptPassphrase]

Next, execute the command. I opted for the -noPromptPassphrase option, as I don't use a passphrase for the key.

[grid@acfsprdnode1 sshsetup]$ ./sshUserSetup.sh -user grid -hosts "acfsprdnode1 acfsprdnode2" -noPromptPassphrase
The output of this script is also logged into /tmp/sshUserSetup_2010-12-22-15-39-18.log
Hosts are acfsprdnode1 acfsprdnode2
user is grid
Platform:- Linux
Checking if the remote hosts are reachable
PING acfsprdnode1.localdomain (192.168.99.100) 56(84) bytes of data.
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=1 ttl=64 time=0.017 ms
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=2 ttl=64 time=0.019 ms
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=3 ttl=64 time=0.017 ms
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=4 ttl=64 time=0.017 ms
64 bytes from acfsprdnode1.localdomain (192.168.99.100): icmp_seq=5 ttl=64 time=0.018 ms

--- acfsprdnode1.localdomain ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.017/0.017/0.019/0.004 ms
PING acfsprdnode2.localdomain (192.168.99.101) 56(84) bytes of data.
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=1 ttl=64 time=0.331 ms
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=2 ttl=64 time=0.109 ms
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=3 ttl=64 time=0.324 ms
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=4 ttl=64 time=0.256 ms
64 bytes from acfsprdnode2.localdomain (192.168.99.101): icmp_seq=5 ttl=64 time=0.257 ms

--- acfsprdnode2.localdomain ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4000ms
rtt min/avg/max/mdev = 0.109/0.255/0.331/0.081 ms
Remote host reachability check succeeded.
The following hosts are reachable: acfsprdnode1 acfsprdnode2.
The following hosts are not reachable: .
All hosts are reachable. Proceeding further...
firsthost acfsprdnode1
numhosts 2
The script will setup SSH connectivity from the host acfsprdnode1 to all
the remote hosts. After the script is executed, the user can use SSH to run
commands on the remote hosts or copy files between this host acfsprdnode1
and the remote hosts without being prompted for passwords or confirmations.

NOTE 1:
As part of the setup procedure, this script will use ssh and scp to copy
files between the local host and the remote hosts. Since the script does not
store passwords, you may be prompted for the passwords during the execution of
the script whenever ssh or scp is invoked.

NOTE 2:
AS PER SSH REQUIREMENTS, THIS SCRIPT WILL SECURE THE USER HOME DIRECTORY
AND THE .ssh DIRECTORY BY REVOKING GROUP AND WORLD WRITE PRIVILEDGES TO THESE
directories.

Do you want to continue and let the script make the above mentioned changes (yes/no)?
yes

The user chose yes
User chose to skip passphrase related questions.
Creating .ssh directory on local host, if not present already
Creating authorized_keys file on local host
Changing permissions on authorized_keys to 644 on local host
Creating known_hosts file on local host
Changing permissions on known_hosts to 644 on local host
Creating config file on local host
If a config file exists already at /home/grid/.ssh/config, it would be backed up to /home/grid/.ssh/config.backup.
Removing old private/public keys on local host
Running SSH keygen on local host with empty passphrase
Generating public/private rsa key pair.
Your identification has been saved in /home/grid/.ssh/id_rsa.
Your public key has been saved in /home/grid/.ssh/id_rsa.pub.
The key fingerprint is:
de:e3:66:fa:16:e8:6e:36:fd:c5:e3:77:75:07:9a:b0 grid@acfsprdnode1
Creating .ssh directory and setting permissions on remote host acfsprdnode1
THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR grid. THIS IS AN SSH REQUIREMENT.
The script would create ~grid/.ssh/config file on remote host acfsprdnode1. If a config file exists already at ~grid/.ssh/config, it would be backed up to ~grid/.ssh/config.backup.
The user may be prompted for a password here since the script would be running SSH on host acfsprdnode1.
Warning: Permanently added 'acfsprdnode1,192.168.99.100' (RSA) to the list of known hosts.
grid@acfsprdnode1's password:
Done with creating .ssh directory and setting permissions on remote host acfsprdnode1.
Creating .ssh directory and setting permissions on remote host acfsprdnode2
THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR grid. THIS IS AN SSH REQUIREMENT.
The script would create ~grid/.ssh/config file on remote host acfsprdnode2. If a config file exists already at ~grid/.ssh/config, it would be backed up to ~grid/.ssh/config.backup.
The user may be prompted for a password here since the script would be running SSH on host acfsprdnode2.
Warning: Permanently added 'acfsprdnode2,192.168.99.101' (RSA) to the list of known hosts.
grid@acfsprdnode2's password:
Done with creating .ssh directory and setting permissions on remote host acfsprdnode2.
Copying local host public key to the remote host acfsprdnode1
The user may be prompted for a password or passphrase here since the script would be using SCP for host acfsprdnode1.
grid@acfsprdnode1's password:
Done copying local host public key to the remote host acfsprdnode1
Copying local host public key to the remote host acfsprdnode2
The user may be prompted for a password or passphrase here since the script would be using SCP for host acfsprdnode2.
grid@acfsprdnode2's password:
Done copying local host public key to the remote host acfsprdnode2
cat: /home/grid/.ssh/known_hosts.tmp: No such file or directory
cat: /home/grid/.ssh/authorized_keys.tmp: No such file or directory
SSH setup is complete.

------------------------------------------------------------------------
Verifying SSH setup
===================
The script will now run the date command on the remote nodes using ssh
to verify if ssh is setup correctly. IF THE SETUP IS CORRECTLY SETUP,
THERE SHOULD BE NO OUTPUT OTHER THAN THE DATE AND SSH SHOULD NOT ASK FOR
PASSWORDS. If you see any output other than date or are prompted for the
password, ssh is not setup correctly and you will need to resolve the
issue and set up ssh again.
The possible causes for failure could be:
1. The server settings in /etc/ssh/sshd_config file do not allow ssh
for user grid.
2. The server may have disabled public key based authentication.
3. The client public key on the server may be outdated.
4. ~grid or ~grid/.ssh on the remote host may not be owned by grid.
5. User may not have passed -shared option for shared remote users or
may be passing the -shared option for non-shared remote users.
6. If there is output in addition to the date, but no password is asked,
it may be a security alert shown as part of company policy. Append the
additional text to the /sysman/prov/resources/ignoreMessages.txt file.
------------------------------------------------------------------------
--acfsprdnode1:--
Running /usr/bin/ssh -x -l grid acfsprdnode1 date to verify SSH connectivity has been setup from local host to acfsprdnode1.
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR.
Wed Dec 22 15:40:10 GMT 2010
------------------------------------------------------------------------
--acfsprdnode2:--
Running /usr/bin/ssh -x -l grid acfsprdnode2 date to verify SSH connectivity has been setup from local host to acfsprdnode2.
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR.
Wed Dec 22 15:40:10 GMT 2010
------------------------------------------------------------------------
SSH verification complete.
[grid@acfsprdnode1 sshsetup]$ ssh acfsprdnode1 hostname
acfsprdnode1
[grid@acfsprdnode1 sshsetup]$ ssh acfsprdnode2 hostname
acfsprodnode2
[grid@acfsprdnode1 sshsetup]$

Nice! That’s a lot of work taken away from me, and I can start running cluvfy now to fix problems before OUI warns me about shortcomings on my system.

Note that, as the above output shows, the script only distributes the local SSH keys to the remote hosts. In OUI's cluster node addition screen (6 of 16 in the advanced installation) you still need to click the "SSH Connectivity" button and then "Setup" after providing the username and password, in order to establish cluster-wide user equivalence.

Wheeew, I am now a RedHat Certified Engineer!

A couple of weeks ago, RedHat announced the general availability of RHEL6… also effective with this release is a change to their certification offering. RHCT is now replaced by RHCSA (Red Hat Certified System Administrator), and if you would like to become an RHCE on RHEL6, regardless of your certification on RHEL5, you still have to pass the RHCSA exam first; only then are you allowed to take the RHCE exam for RHEL6. More details here: RHCSA, RHCE

UltraEdit for Mac, Production Release…

Just to let you know, UltraEdit for Mac has now been released to production.

I’ve been using the beta version for a while and it’s really cool. If you love UltraEdit on Windows, then you will love UltraEdit on Mac. It’s been released as version 2.0, which from what I can see has pretty much all the features of the Windows version 16.x. The Linux version (1.0) is still missing a lot, but a version 2.0 release is expected in the new year.

I’ve upgraded my Windows license to a multi-platform license with unlimited updates, so I can run UltraEdit on Windows, Linux and Mac and never pay for an upgrade again. It’s not the cheapest option but for a bit of kit like this I’m very willing to pay up.

So now I have UltraEdit on everything and SnagIt on Windows and Mac. If I could switch from Shutter to SnagIt on Linux I would be ecstatic. :)

Cheers

Tim…