I researched an interesting new feature available with Oracle 11g R2, the so called RAC FAN API when writing the workload management chapter for the RAC book. The RAC FAN API is documented in Oracle® Database JDBC Developer’s Guide, 11g Release 2 (11.2) available online, but when it came to the initial documentation following the 11.2.0.1 release on Linux it was pretty useless. The good news is that it improved!
The RAC FAN Java API
The aim of this API is to allow a Java application to listen to FAN events by creating a subscription to the RAC nodes’ ONS processes. The application then registers a FANListener, based on the subscription, which can pick up instances of the following events:
ServiceDownEvent
NodeDownEvent
LoadAdvisoryEvent
All of these are in the oracle.simplefan namespace, the javadoc reference of which you can find in the official documenation.
When it was initially released in 11.2.0.1 I tried to get the FAN subscription to work without any luck, the registration with the ONS didn’t work. Schematically, this is how it should work:
The application (shown on the top) requires an Oracle client for the ONS configuration and libraries, and the setup is very similar to how you’d set up FCF. With the ons.config file pointing to the RAC nodes’ ONS processes you should be able to read FAN events as they occur on the source system. The ons.config in $CLIENT_HOME/opmn/conf/ons.config has to contain the local and remote ONS port, as well as all the RAC nodes with their remote ONS port. When subscribing to the events, you specify the service name you’d like to receive events for.
My 4 node RAC cluster is comprised of node rac11gr2drnode{1,2,3}, with local ONS listing on port 6200 for external, and port 6100 for internal requests. This can be checked using the onsctl debug command on any cluster node.. The client’s ons.config contains these lines:
Unfortunately I couldn’t subscribe to the remote ONS-I know that my setup is valid because of my UCP/FCF testing I did previously (check my earlier blog postings about how to get started with UCP and Tomcat 6). Or maybe it doesn’t work on Windows, who knows? Trying to subscribe to the remote ONS my application bails out with the following error:
Exception in thread “main” oracle.ons.SubscriptionException: Subscription request timed out after 30000 millseconds. Possible causes: OPMN may not be running, you may have an OPMN running in an alternate ORACLE_HOME using duplicate port values, or OPMN may be misconfigured.
at oracle.ons.SubscriptionNotification.waitForReply(SubscriptionNotification.java:83)
at oracle.ons.ONS.addSubscriber(ONS.java:956)
at oracle.ons.Subscriber.realStartup(Subscriber.java:103)
at oracle.ons.Subscriber.(Subscriber.java:79)
at oracle.simplefan.impl.FanSubscription.subscribe(FanSubscription.java:228)
at oracle.simplefan.impl.FanSubscription.createFanSubscription(FanSubscription.java:46)
at oracle.simplefan.impl.FanManager.subscribe(FanManager.java:120)
at de.mbh.TestClass.(TestClass.java:21)
at de.mbh.TestClass.main(TestClass.java:53)
I tried my favourite search engine on the Exception but couldn’t find any useful information. This is not a big deal for me, I can start a local ONS on the client. This is shown here:
C:\oracle\product\11.2.0\client_1\opmn\bin>onsctl ping
ons is not running ...
C:\oracle\product\11.2.0\client_1\opmn\bin>onsctl start
onsctl start: ons started
C:\oracle\product\11.2.0\client_1\opmn\bin>onsctl debug
HTTP/1.1 200 OK
Content-Length: 3627
Content-Type: text/html
Response:
== client:6200 5872 11/05/26 11:47:16 ==
Home: c:\oracle\product\11.2.0\client_1
======== ONS ========
IP ADDRESS PORT TIME SEQUENCE FLAGS
--------------------------------------- ----- -------- -------- --------
10.xxx.xx.98 6200 4dde2fb0 00000006 00000008
Listener:
TYPE BIND ADDRESS PORT SOCKET
-------- --------------------------------------- ----- ------
Local 127.0.0.1 6100 344
Remote any 6200 352
Servers: (3)
INSTANCE NAME TIME SEQUENCE FLAGS DEFER
---------------------------------------- -------- -------- -------- ----------
dbInstance_rac11gr2drnode1_6200 4d53d6a4 00029496 00000002 0
10.xxx.xx.155 6200
dbInstance_rac11gr2drnode2_6200 4d3eb9e2 0006cbeb 00000002 0
10.xxx.xx157 6200
dbInstance_rac11gr2drnode3_6200 4d5937d6 0001990f 00000002 0
10.xxx.xx.158 6200
Connection Topology: (4)
IP PORT VERS TIME
--------------------------------------- ----- ----- --------
10.xxx.xx.158 6200 4 4dde2fb2
** 10.xxx.xx.155 6200
** 10.xxx.xx.157 6200
** 10.xxx.xx.98 6200
10.xxx.xx.157 6200 4 4dde2fb2
** 10.xxx.xx.155 6200
** 10.xxx.xx.158 6200
** 10.xxx.xx.98 6200
10.xxx.xx.155 6200 4 4dde2fb2
** 10.xxx.xx.98 6200
** 10.xxx.xx.157 6200
** 10.xxx.xx.158 6200
10.xxx.xx.98 6200 4 4dde2fb2=
** 10.xxx.xx.155 6200
** 10.xxx.xx.158 6200
** 10.xxx.xx.157 6200
Server connections:
ID CONNECTION ADDRESS PORT FLAGS SENDQ REF WSAQ
-------- --------------------------------------- ----- ------ ----- --- ----
0 10.xxx.xx.155 6200 010405 00000 001 ---
1 10.xxx.xx.156 6200 002405 00000 001
2 10.xxx.xx.157 6200 010405 00000 001 --
3 10.xxx.xx.158 6200 010405 00000 001 ---
Client connections:
ID CONNECTION ADDRESS PORT FLAGS SENDQ REF SUB W
-------- --------------------------------------- ----- ------ ----- --- --- -
4 internal 0 01008a 00000 001 002
request 127.0.0.1 6100 03201a 00000 001 000
Worker Ticket: 28/28, Last: 11/05/26 11:47:15
THREAD FLAGS
-------- --------
120 00000012
124 00000012
128 00000012
Resources:
Notifications:
Received: Total 12 (Internal 6), in Receive Q: 0
Processed: Total 12, in Process Q: 0
Pool Counts:
Message: 1, Link: 1, Ack: 1, Match: 1
C:\oracle\product\11.2.0\client_1\opmn\bin>
With a local ONS started on my client, I can actually subscribe to the ONS and make use of the events. The easiest way is to simply decode the load balancing events, as I did in my code, shown below (modified version of the code in the Oracle documentation to make it work):
package de.mbh;
import oracle.simplefan.FanSubscription;
import oracle.simplefan.FanEventListener;
import oracle.simplefan.FanManager;
import oracle.simplefan.LoadAdvisoryEvent;
import oracle.simplefan.NodeDownEvent;
import oracle.simplefan.ServiceDownEvent;
import java.util.Properties;
public class TestClass {
TestClass() {
System.out.println("Hello");
Properties p = new Properties();
p.put("serviceName", "OEMSRV");
System.setProperty("oracle.ons.oraclehome", "c:\\oracle\\product\\11.2.0\\client_1");
System.out.println(System.getProperty("oracle.ons.oraclehome"));
FanSubscription sub = FanManager.getInstance().subscribe(p);
System.out.println("I'm subscribed!");
sub.addListener(new FanEventListener() {
public void handleEvent(ServiceDownEvent arg0) {
System.out.println("Service Down registered!");
}
public void handleEvent(NodeDownEvent arg0) {
System.out.println("Node Down Event Registered");
}
public void handleEvent(LoadAdvisoryEvent arg0) {
System.out.println("Just got a Load Advisory event");
System.out.println("originating database: " + arg0.getDatabaseUniqueName());
System.out.println("originating instance: " + arg0.getInstanceName());
System.out.println("Service Quality : " + arg0.getServiceQuality());
System.out.println("Percent : " + arg0.getPercent());
System.out.println("Service Name : " + arg0.getServiceName());
System.out.println("Service Quality : " + arg0.getServiceQuality());
System.out.println("Observed at : " + arg0.getTimestamp() + "\n\n");
} } );
}
/**
* @param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
TestClass tc = new TestClass();
int i = 0;
while ( i < 100000) {
try {
Thread.sleep(100);
i++;
} catch (Exception e) {
System.out.println(e);
}
}
System.out.println("execution ended");
}
}
When compiling or executing the code, you need the simplefan.jar and ons.jar files in your classpath, and chances are that you need ojdbc6.jar as well.
Starting the application reveals Load Balancing events being read:
Hello
c:\oracle\product\11.2.0\client_1
I'm subscribed!
Just got a Load Advisory event
originating database: LNGDS1RD
originating instance: dbInstance_rac11gr2drnode3_6200
Service Quality : -1
Percent : 51
Service Name : OEMSRV
Service Quality : -1
Observed at : Thu May 26 11:56:03 BST 2011
Just got a Load Advisory event
originating database: LNGDS1RD
originating instance: dbInstance_rac11gr2drnode3_6200
Service Quality : -1
Percent : 49
Service Name : OEMSRV
Service Quality : -1
Observed at : Thu May 26 11:56:03 BST 2011
This proves that you can write your own connection cache, which allows you to react to *down events and rebalance your session differently.
The Fedora 15 final release dropped a couple of days ago and I slapped it on my main desktop PC straight away. As usual, the first impression is all about the visuals. GNOME 3 looks great. I’m sure lots of people think KDE looks great too, but I tend to just stick with the default window manager, so it’s GNOME for me.
If you’ve read any of the press you will know that the menu bar and task bar have been removed. This is kinda weird at first. How does it affect me?
Task Bar: I was surprised how reliant I was on the task bar for switching between open apps. I would say about 50% of the time I was using the task bar and 50% of the time I was using alt+tab. Now I’m using alt+tab 100% of the time. I think this change has worked really well for me. I feel like I’m navigating quicker.
Menu Bar: I think this one will annoy a few people. In essence, the menu bar is still there, but one layer down. If you click on “Activies” in the top left (or hit the Windows key) you switch to the activities screen. There is a favorites doc on the left and if you click “Applications”, you get a menu (more like a filter) on the right of the screen. That’s all nice, but the thing I really like is if you click the Windows key and type in a few letters it returns all apps and items that contain those letters in the name. Similar to the Windows menu, but prettier. So if you insist on using the menu like an old-style menu, I think this change will annoy you as it requires an extra click and some animations. If you use the favorites doc and the search feature I think it’s quite cool.
The fancy visuals worked straight out of the box for my main desktop machine, but one of my other machines (with a better graphics card) couldn’t handle GNOME 3 and ran using fallback mode. Fallback mode is pretty much like previous GNOME releases with a menu bar and task bar. I’m sure some people will prefer fallback mode, but I think the new stuff is certainly worth a try.
If you really can’t handle the new interface you can manually switch to fallback mode. Start up the System Info dialog (Activities > Applications > System Settings > System Info), click “Graphics” , flick the “Forced Fallback Mode” switch and relog.
As for the OS itself, I’ve had no dramas so far, but it is early days. Time will tell…
By the way, I did the usual Oracle on Fedora thing.
First of all, the RAC Attack deep dive at Collaborate went great – thanks to everyone who participated! The room was full (20 participants) and I got evaluations from about half of them. Here’s a summary of the eval results:
100% class met expectations, would recommend to others
66% easy to follow, could use skills in working environment
100% already familiar with oracle, 90% use oracle daily
0 negative reviews of instructor (phew!)
1 negative review of curriculum: said practice exercises weren’t relevant but training manual was still above average.
0 negative “comments”
There were several positive comments such as this: “I would recommend this class to others. This setup is perfect to pick up new skills and expose what ifs w/out worrying about pressing the wrong button.”
I am playing around with the Grid Infrastructure 11.2.0.2 PSU 2 and found an interesting note on My Oracle Support regarding the Patch Set Update. This reminds me that it’s always a good idea to search for a patch number on Metalink before applying a PSU. It also seems to be a good idea to wait for a few days before trying a PSU (or maybe CPU) on your DEV environment for the first time (and don’t even think about applying a PSU on production without thorough testing!)
OK, back to the story: there is a known issue with the patchset which has to do with the change in the Mutex behaviour which the PSU was intended to fix. To quote MOS note “Oracle Database Patch Set Update 11.2.0.2.2 Known Issues (Doc ID 1291879.1)”, Patch 12431716 Is a Recommended Patch for 11.2.0.2.2. In fact, Oracle strongly recommends you to apply the patch to fix Bug 12431716 – Unexpected change in mutex wait behavior in 11.2.0.2.2 PSU (higher CPU possible).
In a nutshell, not applying the patch can cause your system to suffer from excessive CPU usage and more than expected mutex contention. More information can be found in the description of Bug 12431716 Mutex waits may cause higher CPU usage in 11.2.0.2.2 PSU / GI PSU which is worth reading.
Besides this, the PSU was applied without any problems to my four node cluster, I just wish there was a way to roll out a new version of opatch to all cluster node’s $GRID_HOME and $ORACLE_HOME in one command. The overall process for the PSU is the same as already described in my previous post about Bundle Patch 3:
Get the latest version of OPatch
Deploy OPatch to $GRID_HOME and $ORACLE_HOME (ensure permissions are set correctly for the OPatch in $GRID_HOME!)
Unzip the PSU (Bug 11724916 – 11.2.0.2.2 Patch Set Update (PSU) (Doc ID 11724916.8)), for example to /tmp/PSU
Change directory to where you unzipped (/tmp/PSU) and become root
Ensure that $GRID_HOME/OPatch is part of the path
Read the readme
Create an OCM response file and save it to say, /tmp/ocm.rsp
Start the patch as root: opatch auto and supply the full path to the OCM response file (/tmp/ocm.rsp)
Apply the beforementioned one-off patch
Then wait, and after a little while you spend trailing the logfile in $GRID_HOME/cfgtoollogs/ and having a coffee the process eventually finishes. Repeat on each node and you’re done. I’m really happy there aren’t these long readme files anymore with 8 steps to be performed, partially as root, partially as CRS owner/RDBMS owner. It reduces tge tune ut takes to apply a PSU significantly.
Addmittedly I haven’t checked for a little while, but an email by my co-author Steve Show prompted me to go to the Amazon website and look it up.
And yes, it’s reality! Our book is now finally available as a kindle version, how great is that?!?
There isn’t really a lot more to say about this subject. I’ll wonder how many techies are intersted in the kindle version after the PDF has been out for quite a while. If you read this and decide to get the kindle version, could you please let me know how you liked it? Personally I think the book is well suited for the Amazon reader as it’s mostly text which suits the device well.
I have recently upgraded my lab’s reference machine to Oracle Linux 6 and have experimented today with its network failover capabilities. I seemed to remember that network bonding on xen didn’t work, so was curious to test it on new hardware. As always, I am running this on my openSuSE 11.2 lab server, which features these components:
xen-3.4.1_19718_04-2.1.x86_64
Kernel 2.6.31.12-0.2-xen
libvirt-0.7.2-1.1.4.x86_64
Now for the fun part-I cloned my OL6REF domU, and in about 10 minutes had a new system to experiment with. The necessary new NIC was added quickly before registering the domU with XenStore. All you need to do in this case is to add another interface, as in this example (00:16:1e:1b:1d:1f already existed):
...
...
After registering the domU using a call to “virsh define bondingTest.xml” the system starts as usual, except that it has a second NIC, which at this stage is unconfigured. Remember that the Oracle Linux 5 and 6 network configuration is in /etc/sysconfig/network and /etc/sysconfig/network-scripts/.
The first step is to rename the server-change /etc/sysconfig/network to match your new server name.That’s easy :)
Now to the bonding driver. RHEL6 and OL 6 have deprecated /etc/modprobe.conf in favour of /etc/modprobe.d and its configuration files. It’s still necessary to tell the kernel that it should use the bonding driver for my new device, bond0 so I created a new file /etc/modprobe.d/bonding.conf with just one line in it:
alias bond0 bonding
That’s it, don’t put any further information about module parameters in the file, this is deprecated. The documentation clearly states “Important: put all bonding module parameters in ifcfg-bondN files”.
Now I had to create the configuration files for eth0, eth1 and bond0. They are created as follows:
Now for the bonding paramters-there are a few of interest. First, I wanted to set the mode to active-passive, which is Oracle recommended (with the rationale: it is simple). Additionally, you have to set either the arp_interval/arp_target parameters or a value to miimon to allow for speedy link failure detection. My BONDING_OPTS for bond0 is therefore as follows:
The test is going to be simple: first I’ll bring up the interface bond0 by issuing a “system network restart” command on the xen console, followed by a “xm network-detach” command.The output of the network restart command is here:
[root@rhel6ref network-scripts]# service network restart
Shutting down loopback interface: [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface bond0: [ OK ]
[root@rhel6ref network-scripts]# ifconfig
bond0 Link encap:Ethernet HWaddr 00:16:1E:1B:1D:1F
inet addr:192.168.99.126 Bcast:192.168.99.255 Mask:255.255.255.0
inet6 addr: fe80::216:1eff:fe1b:1d1f/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:297 errors:0 dropped:0 overruns:0 frame:0
TX packets:32 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:9002 (8.7 KiB) TX bytes:1824 (1.7 KiB)
eth0 Link encap:Ethernet HWaddr 00:16:1E:1B:1D:1F
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:214 errors:0 dropped:0 overruns:0 frame:0
TX packets:22 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:6335 (6.1 KiB) TX bytes:1272 (1.2 KiB)
Interrupt:18
eth1 Link encap:Ethernet HWaddr 00:16:1E:1B:1D:1F
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:83 errors:0 dropped:0 overruns:0 frame:0
TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2667 (2.6 KiB) TX bytes:552 (552.0 b)
Interrupt:17
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
The kernel traces these operations in /var/log/messages:
May 1 07:55:49 rhel6ref kernel: bonding: bond0: Setting MII monitoring interval to 1000.
May 1 07:55:49 rhel6ref kernel: bonding: bond0: setting mode to active-backup (1).
May 1 07:55:49 rhel6ref kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
May 1 07:55:49 rhel6ref kernel: bonding: bond0: Adding slave eth0.
May 1 07:55:49 rhel6ref kernel: bonding: bond0: Warning: failed to get speed and duplex from eth0, assumed to be 100Mb/sec and Full.
May 1 07:55:49 rhel6ref kernel: bonding: bond0: making interface eth0 the new active one.
May 1 07:55:49 rhel6ref kernel: bonding: bond0: first active interface up!
May 1 07:55:49 rhel6ref kernel: bonding: bond0: enslaving eth0 as an active interface with an up link.
May 1 07:55:49 rhel6ref kernel: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
May 1 07:55:49 rhel6ref kernel: bonding: bond0: Adding slave eth1.
May 1 07:55:49 rhel6ref kernel: bonding: bond0: Warning: failed to get speed and duplex from eth1, assumed to be 100Mb/sec and Full.
May 1 07:55:49 rhel6ref kernel: bonding: bond0: enslaving eth1 as a backup interface with an up link.
This shows an active device of eth0, with eth1 as the passive device. Note that the MAC addresses of all devices are identical (which is expected behaviour). Now let’s see what happens to the channel failover when I take a NIC offline. First of all I have to check xenstore which NICs are present:
I would like to take the active link away, which is at index 0. Let’s try:
# xm network-detach bondingTest 0
The domU shows the link failover:
May 1 08:00:46 rhel6ref kernel: bonding: bond0: Warning: the permanent HWaddr of eth0 - 00:16:1e:1b:1d:1f - is still in use by bond0.
Set the HWaddr of eth0 to a different address to avoid conflicts.
May 1 08:00:46 rhel6ref kernel: bonding: bond0: releasing active interface eth0
May 1 08:00:46 rhel6ref kernel: bonding: bond0: making interface eth1 the new active one.
May 1 08:00:46 rhel6ref kernel: net eth0: xennet_release_rx_bufs: fix me for copying receiver.
Oops, there seems to be a problem with the xennet driver, but never mind. The important information is in the lines above: the active eth0 device has been released, and eth1 jumped in. Next I think I will have to run a workload against the interface to see if that makes a difference.
And the reverse …
I couldn’t possibly leave the system in the “broken” state, so I decided to add the NIC back. That’s yet another online operation I can do:
I can also see that the kernel added the new interface back in.
May 2 05:05:31 rhel6ref kernel: bonding: bond0: Adding slave eth0.
May 2 05:05:31 rhel6ref kernel: bonding: bond0: Warning: failed to get speed and duplex from eth0, assumed to be 100Mb/sec and Full.
May 2 05:05:31 rhel6ref kernel: bonding: bond0: enslaving eth0 as a backup interface with an up link.
I was just mailed a bug update and it included this text (spelling mistakes theirs, not mine).
Note customer is on Linux but could not find an available
11.2 Linux database to test on. Reprocided problem on Solaris
confirming that there is some generic problem here.
Really?
And here’s me thinking that firing up a VM with any version of Linux & Oracle was quick and easy. Perhaps their VMs are running on Amazon, hence the lack of available systems.
I’ve recently put a couple of new articles about old subjects on the website. In both cases, the articles were initiated by forum questions, but the explanations became too painful in the format of a forum post so they graduated into articles…
UDEV SCSI Rules Configuration In Oracle Linux 5 : For those of you that like to follow my Virtual RAC guides, but don’t like using ASMLib, you can use this article and replace ASMLib with UDEV.
Database Triggers Overview : This is really a primer on database triggers. I’ve focussed mostly on simple DML triggers, since this is what the vast majority of trigger-related questions I’m asked relate to. Consider it the “minimum” you should know before you write a database trigger.
Julian Dyke has started an interesting thread on the Oak Table mailing list after the latest UKOUG RAC and HA SIG. Unfortunately I couldn’t attend that event, I wish I had, and I knew it would be great.
Anyway, the question revolved around an ASM disk group created with normal redundancy spanning two storage arrays. This should in theory protect against the failure of an array, although at a high price. All ASM disks exported from an array would be 1 failure group. Remember that disks in a failure group all fail if the supporting infrastructure (network, HBA, controller etc) fails. So what would happen with such a setup, if you followed these steps:
Shutdown the array for failure group 2
Stop the database
Shutdown the second array – failure group 1
Do some more maintenance…
Startup failgroup B SAN
Start the database
Startup failgroup A SAN
ASM can tolerate the failure of one failgroup (capacity permitting), so the failure of failgroup 2 should not bring the disk group down, which would result in immediate loss of service. But what happens if it comes up after the data in the other failure group has been modified? Will there be data corruption?
Replaying
To simulate two storage arrays my distinguished filer01 and filer02 OpenFiler appliances have been used, each exporting 2 approx. 4G “LUNS” to my database host. At this time I only had access to my 11.1.0.7 2 node RAC system, if time permits I’ll repeat this with 10.2.0.5 and 11.2.0.2. The RAC cluster in the SIG presentation was 10.2. I am skipping the bit about the LUN creation and presentation to the hosts, and assume the following setup:
192.168.99.50 is my first openfiler instance, .192.168.99.51 the second. As you can see each export DISKA and DISKB. Mapped to the hosts, this is the target mapping (use iscsiadm –mode session –print 3 to find out):
filer02DiskB: /dev/sda
filer01DiskA: /dev/sdb
filer02DiskA: /dev/sdc
filer01DiskB: /dev/sdd
I am using ASMLib (as always on the lab) to label these disks:
DATA1 and DATA2 will not play a role in this article, I’m interested in the other disks. Assuming that the scandisks command completed on all nodes, I can add the disks to the new diskgroup:
Let’s create the diskgroup. The important part is to create failure groups per storage array. By the way this is not different from extended distance RAC!
SQL> create diskgroup fgtest normal redundancy
2 failgroup filer01 disk 'ORCL:FILER01DISKA', 'ORCL:FILER01DISKB'
3 failgroup filer02 disk 'ORCL:FILER02DISKA', 'ORCL:FILER02DISKB'
4 attribute 'compatible.asm'='11.1';
Diskgroup created.
With that done let’s have a look at the asm disk information:
SQL> select MOUNT_STATUS,HEADER_STATUS,STATE,REDUNDANCY,FAILGROUP,PATH from v$asm_disk where group_number=2;
MOUNT_S HEADER_STATU STATE REDUNDA FAILGROUP PATH
------- ------------ -------- ------- ------------------------------ --------------------
CACHED MEMBER NORMAL UNKNOWN FILER01 ORCL:FILER01DISKA
CACHED MEMBER NORMAL UNKNOWN FILER01 ORCL:FILER01DISKB
CACHED MEMBER NORMAL UNKNOWN FILER02 ORCL:FILER02DISKA
CACHED MEMBER NORMAL UNKNOWN FILER02 ORCL:FILER02DISKB
I have set the disk repair time to 24 hours and raised compatible parameters for RDBMS and ASM to 11.1, resulting in these attributes:
SQL> select * from v$asm_attribute
NAME VALUE GROUP_NUMBER ATTRIBUTE_INDEX ATTRIBUTE_INCARNATION READ_ON SYSTEM_
------------------------------ -------------------- ------------ --------------- --------------------- ------- -------
disk_repair_time 3.6h 2 0 1 N Y
au_size 1048576 2 8 1 Y Y
compatible.asm 11.1.0.0.0 2 20 1 N Y
compatible.rdbms 11.1.0.0.0 2 21 1 N Y
Unlike 11.2 where disk groups are managed as Clusterware resource, 11.1 requires you to manually start them or append the new disk group to ASM_DISKGORUPS. You should query gv$asm_diskgroup.state to ensure the new diskgroup is mounted on all cluster nodes.
I need some data! A small demo database can be restored to the new failure group to provide some experimental playground. This is quite easily done by using an RMAN duplicate with the correct {db|log}_file_name_convert parameter set.
Mirror
The diskgroup is created with normal redundancy, which means that ASM will create a mirror for every primary extent, taking failure groups into consideration. I wanted to ensure that the data is actually mirrored on the new disk group, which has group number 2.I need to get this information from V$ASM_FILE and V$ASM_ALIAS:
SQL> select * from v$asm_file where group_number = 2
GROUP_NUMBER FILE_NUMBER COMPOUND_INDEX INCARNATION BLOCK_SIZE BLOCKS BYTES SPACE TYPE REDUND STRIPE CREATION_ MODIFICAT R
------------ ----------- -------------- ----------- ---------- ---------- ---------- ---------- -------------------- ------ ------ --------- --------- -
2 256 33554688 747669775 16384 1129 18497536 78643200 CONTROLFILE HIGH FINE 05-APR-11 05-APR-11 U
2 257 33554689 747669829 8192 69769 571547648 1148190720 DATAFILE MIRROR COARSE 05-APR-11 05-APR-11 U
2 258 33554690 747669829 8192 60161 492838912 990904320 DATAFILE MIRROR COARSE 05-APR-11 05-APR-11 U
2 259 33554691 747669829 8192 44801 367009792 739246080 DATAFILE MIRROR COARSE 05-APR-11 05-APR-11 U
2 260 33554692 747669831 8192 25601 209723392 424673280 DATAFILE MIRROR COARSE 05-APR-11 05-APR-11 U
2 261 33554693 747669831 8192 641 5251072 12582912 DATAFILE MIRROR COARSE 05-APR-11 05-APR-11 U
2 262 33554694 747670409 512 102401 52429312 120586240 ONLINELOG MIRROR FINE 05-APR-11 05-APR-11 U
2 263 33554695 747670409 512 102401 52429312 120586240 ONLINELOG MIRROR FINE 05-APR-11 05-APR-11 U
2 264 33554696 747670417 512 102401 52429312 120586240 ONLINELOG MIRROR FINE 05-APR-11 05-APR-11 U
2 265 33554697 747670417 512 102401 52429312 120586240 ONLINELOG MIRROR FINE 05-APR-11 05-APR-11 U
2 266 33554698 747670419 8192 2561 20979712 44040192 TEMPFILE MIRROR COARSE 05-APR-11 05-APR-11 U
11 rows selected.
SQL> select * from v$asm_alias where group_NUMBER=2
NAME GROUP_NUMBER FILE_NUMBER FILE_INCARNATION ALIAS_INDEX ALIAS_INCARNATION PARENT_INDEX REFERENCE_INDEX A S
------------------------------ ------------ ----------- ---------------- ----------- ----------------- ------------ --------------- - -
RAC11G 2 4294967295 4294967295 0 3 33554432 33554485 Y Y
CONTROLFILE 2 4294967295 4294967295 53 3 33554485 33554538 Y Y
current.256.747669775 2 256 747669775 106 3 33554538 50331647 N Y
DATAFILE 2 4294967295 4294967295 54 1 33554485 33554591 Y Y
SYSAUX.257.747669829 2 257 747669829 159 1 33554591 50331647 N Y
SYSTEM.258.747669829 2 258 747669829 160 1 33554591 50331647 N Y
UNDOTBS1.259.747669829 2 259 747669829 161 1 33554591 50331647 N Y
UNDOTBS2.260.747669831 2 260 747669831 162 1 33554591 50331647 N Y
USERS.261.747669831 2 261 747669831 163 1 33554591 50331647 N Y
ONLINELOG 2 4294967295 4294967295 55 1 33554485 33554644 Y Y
group_1.262.747670409 2 262 747670409 212 1 33554644 50331647 N Y
group_2.263.747670409 2 263 747670409 213 1 33554644 50331647 N Y
group_3.264.747670417 2 264 747670417 214 1 33554644 50331647 N Y
group_4.265.747670417 2 265 747670417 215 1 33554644 50331647 N Y
TEMPFILE 2 4294967295 4294967295 56 1 33554485 33554697 Y Y
TEMP.266.747670419 2 266 747670419 265 1 33554697 50331647 N Y
My USERS tablespace which I am interested in most has file number 261-I chose it for this example as it’s only 5M in size. Taking my 1 MB allocation unit into account, it means I don’t have to trawl through thousands of line of output when getting the extent map.
Credit where credit is due-the next queries are partly based on the excellent work by Luca Canali from CERN, who has looked at ASM internals for a while. Make sure you have a look at the excellent reference available here: https://twiki.cern.ch/twiki/bin/view/PDBService/ASM_Internals. So to answer the question if the extents making up my users tablespace we need to have a look at the X$KFFXP, i.e. file extent pointers view:
As you can see, I have a number of extents, all evenly spread over my disks. I can verify that this information is correct by querying the X$KFDAT view as well which contains similar information, but more related to the disk to AU mapping
OK so I am confident that my data is actually mirrored-otherwise the following test would not make any sense. I have double checked that the disks in failgroup FILER01 actually belong to my OpenFiler “filer01″, and the same for filer02. Going back to the original scenario:
Shut down Filer02
This will take down all the disks of failure group B. Two minutes after taking the filer down I checked if it was indeed shut down:
Yes, no doubt about it-it’s down. What would the effect be? Surely I/O errors, but I wanted to enforce a check. Connected to +ASM2 I issued the “select * from v$asm_disk” command. This caused quite significant logging in the instance’s alert.log:
The interesting lines are “all mirror sides found readable, no repair required”. So taking down the failgroup didn’t cause an outage. The other ASM instance complained as well a little later:
2011-04-06 17:16:58.393000 +01:00
NOTE: initiating PST update: grp = 2, dsk = 2, mode = 0x15
NOTE: initiating PST update: grp = 2, dsk = 3, mode = 0x15
kfdp_updateDsk(): 24
kfdp_updateDskBg(): 24
WARNING: IO Failed. subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so dg:2, diskname:ORCL:FILER02DISKA disk:0x2.0x97a6d9f7 au:1
iop:0x2b9ea4855e70 bufp:0x2b9ea4850a00 offset(bytes):1052672 iosz:4096 operation:2(Write) synchronous:1
result: 4 osderr:0x3 osderr1:0x2e pid:870
NOTE: group FGTEST: updated PST location: disk 0000 (PST copy 0)
2011-04-06 17:17:03.508000 +01:00
NOTE: ASMB process exiting due to lack of ASM file activity for 5 seconds
NOTE: PST update grp = 2 completed successfully
NOTE: initiating PST update: grp = 2, dsk = 2, mode = 0x1
NOTE: initiating PST update: grp = 2, dsk = 3, mode = 0x1
kfdp_updateDsk(): 25
kfdp_updateDskBg(): 25
2011-04-06 17:17:07.454000 +01:00
NOTE: group FGTEST: updated PST location: disk 0000 (PST copy 0)
NOTE: PST update grp = 2 completed successfully
NOTE: cache closing disk 2 of grp 2: FILER02DISKA
NOTE: cache closing disk 3 of grp 2: FILER02DISKB
SUCCESS: extent 0 of file 267 group 2 repaired by offlining the disk
NOTE: repairing group 2 file 267 extent 0
SUCCESS: extent 0 of file 267 group 2 repaired - all mirror sides found readable, no repair required
2011-04-06 17:19:04.526000 +01:00
GMON SlaveB: Deferred DG Ops completed.
2011-04-06 17:22:07.487000 +01:00
GMON SlaveB: Deferred DG Ops completed.
No interruption of service though, which is good-the GV$ASM_CLIENT view reported all database instances still connected.
SQL> select name,state,header_status,path from v$asm_disk;
NAME STATE HEADER_STATU PATH FAILGROUP
------------------------------ -------- ------------ -------------------------------------------------- ------------------------------
NORMAL UNKNOWN ORCL:FILER02DISKA
NORMAL UNKNOWN ORCL:FILER02DISKB
DATA1 NORMAL MEMBER ORCL:DATA1 DATA1
DATA2 NORMAL MEMBER ORCL:DATA2 DATA2
FILER01DISKA NORMAL MEMBER ORCL:FILER01DISKA FILER01
FILER01DISKB NORMAL MEMBER ORCL:FILER01DISKB FILER01
FILER02DISKA NORMAL UNKNOWN FILER02
FILER02DISKB NORMAL UNKNOWN FILER02
8 rows selected.
As I expected the disks for failgroup filer02 are gone, and so is the information about the failure group. My disk repair time should be high enough to protect me from having to rebuild the whole disk group. Now I’m really curious if my database can become corrupted-I’ll increase the SCN.
[oracle@rac11gr1node1 ~]$ . setsid.sh rac11g
[oracle@rac11gr1node1 ~]$ sq
SQL*Plus: Release 11.1.0.7.0 - Production on Wed Apr 6 17:24:18 2011
Copyright (c) 1982, 2008, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
SQL> select current_scn from v$database;
CURRENT_SCN
-----------
5999304
SQL> begin
2 for i in 1..5 loop
3 execute immediate 'alter system switch logfile';
4 end loop;
5 end;
6 /
PL/SQL procedure successfully completed.
SQL> select current_scn from v$database;
CURRENT_SCN
-----------
5999378
SQL>
Back to the test case.
Stop the Database
[oracle@rac11gr1node1 ~]$ srvctl stop database -d rac11g
[oracle@rac11gr1node1 ~]$ srvctl status database -d rac11g
Instance rac11g2 is not running on node rac11gr1node1
Instance rac11g1 is not running on node rac11gr1node2
Done-this part was simple. Next they stopped their first filer. To prevent bad things from happening I’ll shut down ASM on all nodes. I hope that doesn’t invalidate the test but I can’t see how ASM would not get a problem if the other failgroup went down as well.
Shut down Filer01 and start Filer02
Also quite simple. Shutting down this filer will allow me to follow the story. After filer01 was down I started filer02. I’m curious as to how ASM will react. I have deliberately NOT put disk group FGTEST into the ASM_DISKSTRING, I want to start it manually to get a better understanding of what happens.
After having started ASM on both nodes, I queried V$ASM_DISK and tried to mount the disk group:
SQL> select disk_number,name,state,header_status,path,failgroup from v$asm_disk;
DISK_NUMBER NAME STATE HEADER_STATU PATH FAILGROUP
----------- ------------------------------ -------- ------------ -------------------------------------------------- ------------------------------
0 NORMAL MEMBER ORCL:FILER02DISKA
1 NORMAL MEMBER ORCL:FILER02DISKB
2 NORMAL UNKNOWN ORCL:FILER01DISKA
3 NORMAL UNKNOWN ORCL:FILER01DISKB
0 DATA1 NORMAL MEMBER ORCL:DATA1 DATA1
1 DATA2 NORMAL MEMBER ORCL:DATA2 DATA2
6 rows selected.
Ooops, now they are both gone….
SQL> alter diskgroup fgtest mount;
alter diskgroup fgtest mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing
ORA-15042: ASM disk "0" is missing
ORA-15080: synchronous I/O operation to a disk failed
ORA-15080: synchronous I/O operation to a disk failed
OK, I have a problem here. Both ASM instances report I/O errors with the FGTEST diskgroup, and I can’t mount it. That means I can’t mount the database either-in a way it proves I won’t have corruption. But neither will I have a database, what is worse?
Can I get around this problem?
I think I’ll have to start filer01 and see if that makes a difference. Hopefully I can recover my system with the information in failgroup filer01. Soon after filer01 came online I tried the query against v$asmdisk again and tried to mount it.
SQL> select disk_number,name,state,header_status,path,failgroup from v$asm_disk;
DISK_NUMBER NAME STATE HEADER_STATU PATH FAILGROUP
----------- ------------------------------ -------- ------------ -------------------------------------------------- ------------------------------
0 NORMAL MEMBER ORCL:FILER02DISKA
1 NORMAL MEMBER ORCL:FILER02DISKB
2 NORMAL MEMBER ORCL:FILER01DISKA
3 NORMAL MEMBER ORCL:FILER01DISKB
0 DATA1 NORMAL MEMBER ORCL:DATA1 DATA1
1 DATA2 NORMAL MEMBER ORCL:DATA2 DATA2
6 rows selected.
That worked!
Wed Apr 06 17:45:32 2011
SQL> alter diskgroup fgtest mount
NOTE: cache registered group FGTEST number=2 incarn=0x72c150d7
NOTE: cache began mount (first) of group FGTEST number=2 incarn=0x72c150d7
NOTE: Assigning number (2,0) to disk (ORCL:FILER01DISKA)
NOTE: Assigning number (2,1) to disk (ORCL:FILER01DISKB)
NOTE: Assigning number (2,2) to disk (ORCL:FILER02DISKA)
NOTE: Assigning number (2,3) to disk (ORCL:FILER02DISKB)
Wed Apr 06 17:45:33 2011
NOTE: start heartbeating (grp 2)
kfdp_query(): 12
kfdp_queryBg(): 12
NOTE: cache opening disk 0 of grp 2: FILER01DISKA label:FILER01DISKA
NOTE: F1X0 found on disk 0 fcn 0.0
NOTE: cache opening disk 1 of grp 2: FILER01DISKB label:FILER01DISKB
NOTE: cache opening disk 2 of grp 2: FILER02DISKA label:FILER02DISKA
NOTE: F1X0 found on disk 2 fcn 0.0
NOTE: cache opening disk 3 of grp 2: FILER02DISKB label:FILER02DISKB
NOTE: cache mounting (first) group 2/0x72C150D7 (FGTEST)
Wed Apr 06 17:45:33 2011
* allocate domain 2, invalid = TRUE
kjbdomatt send to node 0
Wed Apr 06 17:45:33 2011
NOTE: attached to recovery domain 2
NOTE: cache recovered group 2 to fcn 0.7252
Wed Apr 06 17:45:33 2011
NOTE: LGWR attempting to mount thread 1 for diskgroup 2
NOTE: LGWR mounted thread 1 for disk group 2
NOTE: opening chunk 1 at fcn 0.7252 ABA
NOTE: seq=3 blk=337
NOTE: cache mounting group 2/0x72C150D7 (FGTEST) succeeded
NOTE: cache ending mount (success) of group FGTEST number=2 incarn=0x72c150d7
Wed Apr 06 17:45:33 2011
kfdp_query(): 13
kfdp_queryBg(): 13
NOTE: Instance updated compatible.asm to 11.1.0.0.0 for grp 2
SUCCESS: diskgroup FGTEST was mounted
SUCCESS: alter diskgroup fgtest mount
The V$ASM_DISK view is nicely updated and everything seems to be green:
SQL> select disk_number,name,state,header_status,path,failgroup from v$asm_disk;
DISK_NUMBER NAME STATE HEADER_STATU PATH FAILGROUP
----------- ------------------------------ -------- ------------ -------------------------------------------------- ------------------------------
0 DATA1 NORMAL MEMBER ORCL:DATA1 DATA1
1 DATA2 NORMAL MEMBER ORCL:DATA2 DATA2
0 FILER01DISKA NORMAL MEMBER ORCL:FILER01DISKA FILER01
1 FILER01DISKB NORMAL MEMBER ORCL:FILER01DISKB FILER01
2 FILER02DISKA NORMAL MEMBER ORCL:FILER02DISKA FILER02
3 FILER02DISKB NORMAL MEMBER ORCL:FILER02DISKB FILER02
6 rows selected.
Brilliant-will it have an effect on the database?
Starting the Database
Even though things looked ok, they weren’t! I didn’t expect this to happen:
[oracle@rac11gr1node1 ~]$ srvctl start database -d rac11g
PRKP-1001 : Error starting instance rac11g2 on node rac11gr1node1
rac11gr1node1:ora.rac11g.rac11g2.inst:
rac11gr1node1:ora.rac11g.rac11g2.inst:SQL*Plus: Release 11.1.0.7.0 - Production on Wed Apr 6 17:48:58 2011
rac11gr1node1:ora.rac11g.rac11g2.inst:
rac11gr1node1:ora.rac11g.rac11g2.inst:Copyright (c) 1982, 2008, Oracle. All rights reserved.
rac11gr1node1:ora.rac11g.rac11g2.inst:
rac11gr1node1:ora.rac11g.rac11g2.inst:Enter user-name: Connected to an idle instance.
rac11gr1node1:ora.rac11g.rac11g2.inst:
rac11gr1node1:ora.rac11g.rac11g2.inst:SQL> ORACLE instance started.
rac11gr1node1:ora.rac11g.rac11g2.inst:
rac11gr1node1:ora.rac11g.rac11g2.inst:Total System Global Area 1720328192 bytes
rac11gr1node1:ora.rac11g.rac11g2.inst:Fixed Size 2160392 bytes
rac11gr1node1:ora.rac11g.rac11g2.inst:Variable Size 1291847928 bytes
rac11gr1node1:ora.rac11g.rac11g2.inst:Database Buffers 419430400 bytes
rac11gr1node1:ora.rac11g.rac11g2.inst:Redo Buffers 6889472 bytes
rac11gr1node1:ora.rac11g.rac11g2.inst:ORA-00600: internal error code, arguments: [kccpb_sanity_check_2], [9572],
rac11gr1node1:ora.rac11g.rac11g2.inst:[9533], [0x000000000], [], [], [], [], [], [], [], []
rac11gr1node1:ora.rac11g.rac11g2.inst:
rac11gr1node1:ora.rac11g.rac11g2.inst:
rac11gr1node1:ora.rac11g.rac11g2.inst:SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
rac11gr1node1:ora.rac11g.rac11g2.inst:With the Partitioning, Real Application Clusters, OLAP, Data Mining
rac11gr1node1:ora.rac11g.rac11g2.inst:and Real Application Testing options
rac11gr1node1:ora.rac11g.rac11g2.inst:
CRS-0215: Could not start resource 'ora.rac11g.rac11g2.inst'.
PRKP-1001 : Error starting instance rac11g1 on node rac11gr1node2
CRS-0215: Could not start resource 'ora.rac11g.rac11g1.inst'.
Oops. A quick search on Metalink revealed note Ora-00600: [Kccpb_sanity_check_2], [3621501],[3621462] On Startup (Doc ID 435436.1). The explanation for the ORA-600 is that “the seq# of the last read block is higher than the seq# of the control file header block.” Oracle Support explains it with a lost write, but here the situation is quite different. Interesting! I have to leave that for another blog post.
Followers of the blog will know I’m a big fan of UltraEdit. I have a multi-platform unlimited upgrades license, so I run it on Linux, Mac and occasionally on a Windows VM.
I noticed today that version 2.1.0.3 was released for Mac and Linux about a month ago. Not sure how I missed that on the update notices. The changes for Mac are not that big because it was already at version 2.x, but the Linux version had been hanging around the 1.x for some time now and was missing a lot of functionality compared to the Mac version. This latest release is a pretty big catch-up for the Linux version and it now contains pretty much all of the functionality I use on a regular basis.
Both the Mac and Linux versions are still lagging behind the Windows version in terms of total functionality, but who cares about Windows…
Recent comments
16 weeks 5 days ago
26 weeks 3 days ago
28 weeks 1 day ago
31 weeks 2 days ago
33 weeks 4 days ago
43 weeks 1 day ago
44 weeks 5 days ago
45 weeks 5 days ago
45 weeks 6 days ago
48 weeks 4 days ago