Pro Oracle Database 11g RAC on Linux is available for Kindle

Admittedly I haven’t checked for a little while, but an email from my co-author Steve Shaw prompted me to go to the Amazon website and look it up.

And yes, it’s true! Our book is now finally available as a Kindle version. How great is that?!

There isn’t really a lot more to say on the subject. I do wonder how many techies are interested in the Kindle version after the PDF has been out for quite a while. If you read this and decide to get the Kindle version, could you please let me know how you liked it? Personally I think the book is well suited to the Amazon reader, as it’s mostly text, which suits the device well.

Happy reading!

Testing NIC bonding on RHEL6

I have recently upgraded my lab’s reference machine to Oracle Linux 6 and have experimented today with its network failover capabilities. I seemed to remember that network bonding on xen didn’t work, so was curious to test it on new hardware. As always, I am running this on my openSuSE 11.2 lab server, which features these components:

  • xen-3.4.1_19718_04-2.1.x86_64
  • Kernel 2.6.31.12-0.2-xen
  • libvirt-0.7.2-1.1.4.x86_64

Now for the fun part: I cloned my OL6REF domU and in about 10 minutes had a new system to experiment with. The necessary second NIC was added quickly before registering the domU with XenStore. All you need to do in this case is add another interface to the domU’s XML definition, as in this example (00:16:1e:1b:1d:1f already existed):

...
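A second bridged interface element in the libvirt XML looks roughly like the sketch below; the MAC address is the one the new vif later reports in xenstore, and br1 with the vif-bridge script are the same settings used for the first NIC (this is a sketch, not the original snippet):

<interface type='bridge'>
  <mac address='00:16:1e:10:11:1f'/>
  <source bridge='br1'/>
  <script path='/etc/xen/scripts/vif-bridge'/>
</interface>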

After registering the domU using a call to “virsh define bondingTest.xml”, the system starts as usual, except that it has a second NIC, which at this stage is unconfigured. Remember that the Oracle Linux 5 and 6 network configuration is in /etc/sysconfig/network and /etc/sysconfig/network-scripts/.

The first step is to rename the server: change the hostname in /etc/sysconfig/network to match your new server name. That’s easy :)

Now to the bonding driver. RHEL6 and OL6 have deprecated /etc/modprobe.conf in favour of /etc/modprobe.d and its configuration files. It’s still necessary to tell the kernel to use the bonding driver for my new device, bond0, so I created a new file /etc/modprobe.d/bonding.conf with just one line in it:

alias bond0 bonding

That’s it; don’t put any module parameters into this file, as that approach is deprecated. The documentation clearly states: “Important: put all bonding module parameters in ifcfg-bondN files”.

Now I had to create the configuration files for eth0, eth1 and bond0. They are created as follows:

File: ifcfg-eth0

DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no

File: ifcfg-eth1

DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no

File: ifcfg-bond0

DEVICE=bond0
IPADDR=192.168.99.126
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
BONDING_OPTS=""

Now for the bonding parameters: there are a few of interest. First, I wanted to set the mode to active-backup (i.e. active/passive), which is what Oracle recommends, the rationale being that it is simple. Additionally, you have to set either the arp_interval/arp_ip_target parameters or a miimon value to allow for speedy link failure detection. My BONDING_OPTS for bond0 is therefore as follows:

BONDING_OPTS="miimon=1000 mode=active-backup"

Have a look at the documentation for more detail about the options.
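Once the bond is up, the bonding driver’s status file is the quickest way to verify the mode, the MII status and which slave is currently active, for example:

# shows the bonding mode, MII status and the currently active slave
cat /proc/net/bonding/bond0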

The Test

The test is going to be simple: first I’ll bring up the bond0 interface by issuing a “service network restart” command on the domU’s console, followed by an “xm network-detach” command in the dom0. The output of the network restart command is here:

[root@rhel6ref network-scripts]# service network restart
Shutting down loopback interface:  [  OK  ]
Bringing up loopback interface:  [  OK  ]
Bringing up interface bond0:  [  OK  ]
[root@rhel6ref network-scripts]# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:16:1E:1B:1D:1F
          inet addr:192.168.99.126  Bcast:192.168.99.255  Mask:255.255.255.0
          inet6 addr: fe80::216:1eff:fe1b:1d1f/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:297 errors:0 dropped:0 overruns:0 frame:0
          TX packets:32 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:9002 (8.7 KiB)  TX bytes:1824 (1.7 KiB)

eth0      Link encap:Ethernet  HWaddr 00:16:1E:1B:1D:1F
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:214 errors:0 dropped:0 overruns:0 frame:0
          TX packets:22 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:6335 (6.1 KiB)  TX bytes:1272 (1.2 KiB)
          Interrupt:18

eth1      Link encap:Ethernet  HWaddr 00:16:1E:1B:1D:1F
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:83 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2667 (2.6 KiB)  TX bytes:552 (552.0 b)
          Interrupt:17

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

The kernel traces these operations in /var/log/messages:

May  1 07:55:49 rhel6ref kernel: bonding: bond0: Setting MII monitoring interval to 1000.
May  1 07:55:49 rhel6ref kernel: bonding: bond0: setting mode to active-backup (1).
May  1 07:55:49 rhel6ref kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
May  1 07:55:49 rhel6ref kernel: bonding: bond0: Adding slave eth0.
May  1 07:55:49 rhel6ref kernel: bonding: bond0: Warning: failed to get speed and duplex from eth0, assumed to be 100Mb/sec and Full.
May  1 07:55:49 rhel6ref kernel: bonding: bond0: making interface eth0 the new active one.
May  1 07:55:49 rhel6ref kernel: bonding: bond0: first active interface up!
May  1 07:55:49 rhel6ref kernel: bonding: bond0: enslaving eth0 as an active interface with an up link.
May  1 07:55:49 rhel6ref kernel: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
May  1 07:55:49 rhel6ref kernel: bonding: bond0: Adding slave eth1.
May  1 07:55:49 rhel6ref kernel: bonding: bond0: Warning: failed to get speed and duplex from eth1, assumed to be 100Mb/sec and Full.
May  1 07:55:49 rhel6ref kernel: bonding: bond0: enslaving eth1 as a backup interface with an up link.

This shows eth0 as the active device, with eth1 as the passive device. Note that the MAC addresses of all devices are identical (which is expected behaviour). Now let’s see how the failover behaves when I take a NIC offline. First of all I have to check in xenstore which NICs are present:

# xm network-list bondingTest
Idx BE     MAC Addr.     handle state evt-ch tx-/rx-ring-ref BE-path
0   0  00:16:1e:1b:1d:1f    0     4      14    13   /768     /local/domain/0/backend/vif/208/0
1   0  00:16:1e:10:11:1f    1     4      15    1280 /1281    /local/domain/0/backend/vif/208/1

I would like to take the active link away, which is at index 0. Let’s try:

# xm network-detach bondingTest 0

The domU shows the link failover:

May  1 08:00:46 rhel6ref kernel: bonding: bond0: Warning: the permanent HWaddr of eth0 - 00:16:1e:1b:1d:1f - is still in use by bond0.
Set the HWaddr of eth0 to a different address to avoid conflicts.
May  1 08:00:46 rhel6ref kernel: bonding: bond0: releasing active interface eth0
May  1 08:00:46 rhel6ref kernel: bonding: bond0: making interface eth1 the new active one.
May  1 08:00:46 rhel6ref kernel: net eth0: xennet_release_rx_bufs: fix me for copying receiver.

Oops, there seems to be a problem with the xennet driver, but never mind. The important information is in the lines above: the active eth0 device has been released, and eth1 jumped in. Next I think I will have to run a workload against the interface to see if that makes a difference.
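A simple way to check for service continuity during such a failover is to run a continuous ping against the bond’s address from another machine while the NIC is being detached; a minimal sketch:

# run from another host on the same network while detaching the NIC;
# watch for lost or delayed replies around the failover
ping -i 0.2 192.168.99.126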

And the reverse …

I couldn’t possibly leave the system in the “broken” state, so I decided to add the NIC back. That’s yet another online operation I can do:

# xm network-attach bondingTest type='bridge' mac='00:16:1e:1b:1d:1f' bridge=br1 script=/etc/xen/scripts/vif-bridge

Voilà, job done. Checking the output of ifconfig, I can see the interface is back:

# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:16:1E:1B:1D:1F
          inet addr:192.168.99.126  Bcast:192.168.99.255  Mask:255.255.255.0
          inet6 addr: fe80::216:1eff:fe1b:1d1f/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:39110 errors:0 dropped:0 overruns:0 frame:0
          TX packets:183 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1171005 (1.1 MiB)  TX bytes:32496 (31.7 KiB)

eth0      Link encap:Ethernet  HWaddr 00:16:1E:1B:1D:1F
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:7 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:412 (412.0 b)  TX bytes:0 (0.0 b)
          Interrupt:18

eth1      Link encap:Ethernet  HWaddr 00:16:1E:1B:1D:1F
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:39106 errors:0 dropped:0 overruns:0 frame:0
          TX packets:186 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1170749 (1.1 MiB)  TX bytes:33318 (32.5 KiB)
          Interrupt:17

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:46 errors:0 dropped:0 overruns:0 frame:0
          TX packets:46 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2484 (2.4 KiB)  TX bytes:2484 (2.4 KiB)

I can also see that the kernel added the new interface back in.

May  2 05:05:31 rhel6ref kernel: bonding: bond0: Adding slave eth0.
May  2 05:05:31 rhel6ref kernel: bonding: bond0: Warning: failed to get speed and duplex from eth0, assumed to be 100Mb/sec and Full.
May  2 05:05:31 rhel6ref kernel: bonding: bond0: enslaving eth0 as a backup interface with an up link.

All’s well that ends well.

No Linux servers for Oracle Support…

I was just mailed a bug update and it included this text (spelling mistakes theirs, not mine).

Note customer is on Linux but could not find an available
11.2 Linux database to test on.  Reprocided problem on Solaris
confirming that there is some generic problem here.

Really?

And here’s me thinking that firing up a VM with any version of Linux & Oracle was quick and easy. Perhaps their VMs are running on Amazon, hence the lack of available systems. :)

Cheers

Tim…




Two New Articles: UDEV and Database Triggers…

I’ve recently put a couple of new articles about old subjects on the website. In both cases, the articles were initiated by forum questions, but the explanations became too painful in the format of a forum post so they graduated into articles…

  • UDEV SCSI Rules Configuration In Oracle Linux 5 : For those of you that like to follow my Virtual RAC guides, but don’t like using ASMLib, you can use this article and replace ASMLib with UDEV.
  • Database Triggers Overview : This is really a primer on database triggers. I’ve focussed mostly on simple DML triggers, since this is what the vast majority of trigger-related questions I’m asked relate to. Consider it the “minimum” you should know before you write a database trigger.

Cheers

Tim…




ASM normal redundancy and failure groups spanning SANs

Julian Dyke has started an interesting thread on the Oak Table mailing list after the latest UKOUG RAC and HA SIG. Unfortunately I couldn’t attend that event; I wish I had, as I knew it would be great.

Anyway, the question revolved around an ASM disk group created with normal redundancy spanning two storage arrays. This should in theory protect against the failure of an array, although at a high price. All ASM disks exported from one array would form one failure group. Remember that the disks in a failure group are expected to fail together if their supporting infrastructure (network, HBA, controller etc.) fails. So what would happen with such a setup if you followed these steps:

  • Shut down the array for failure group 2
  • Stop the database
  • Shut down the second array (failure group 1)
  • Do some more maintenance…
  • Start up the failure group 2 SAN
  • Start the database
  • Start up the failure group 1 SAN

ASM can tolerate the failure of one failgroup (capacity permitting), so the failure of failure group 2 should not bring the disk group down, which would otherwise result in an immediate loss of service. But what happens when that failure group comes back up after the data in the other failure group has been modified? Will there be data corruption?

Replaying

To simulate two storage arrays, my distinguished filer01 and filer02 OpenFiler appliances have been used, each exporting two LUNs of approximately 4 GB to my database hosts. At the time I only had access to my 11.1.0.7 two-node RAC system; if time permits I’ll repeat this with 10.2.0.5 and 11.2.0.2 (the RAC cluster in the SIG presentation was 10.2). I am skipping the bit about LUN creation and presentation to the hosts, and assume the following setup:

[root@rac11gr1node1 ~]# iscsiadm -m session
tcp: [1] 192.168.99.51:3260,1 iqn.2006-01.com.openfiler:filer02DiskB
tcp: [2] 192.168.99.50:3260,1 iqn.2006-01.com.openfiler:filer01DiskA
tcp: [3] 192.168.99.51:3260,1 iqn.2006-01.com.openfiler:filer02DiskA
tcp: [4] 192.168.99.50:3260,1 iqn.2006-01.com.openfiler:filer01DiskB

192.168.99.50 is my first OpenFiler instance, 192.168.99.51 the second. As you can see, each exports a DiskA and a DiskB. Mapped to the host, this is the resulting target-to-device mapping (use iscsiadm -m session -P 3 to find out):

  • filer02DiskB: /dev/sda
  • filer01DiskA: /dev/sdb
  • filer02DiskA: /dev/sdc
  • filer01DiskB: /dev/sdd
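One way to pull that mapping out of the verbose session output is to filter it, for example (the egrep pattern simply narrows the -P 3 output down to target names and attached disks):

# list each iSCSI target together with the SCSI disk attached to its session
iscsiadm -m session -P 3 | egrep "Target:|Attached scsi disk"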

I am using ASMLib (as always in the lab) to label these disks:

[root@rac11gr1node1 ~]# oracleasm listdisks
DATA1
DATA2
FILER01DISKA
FILER01DISKB
FILER02DISKA
FILER02DISKB

DATA1 and DATA2 will not play a role in this article; I’m interested in the other disks. Assuming that the oracleasm scandisks command has completed on all nodes, I can add the disks to a new disk group:

SQL> select path from v$asm_disk

PATH
--------------------------------------------------------------------------------
ORCL:FILER01DISKA
ORCL:FILER01DISKB
ORCL:FILER02DISKA
ORCL:FILER02DISKB
ORCL:DATA1
ORCL:DATA2

Let’s create the disk group. The important part is to create one failure group per storage array. By the way, this is no different from an extended distance RAC setup!

SQL> create diskgroup fgtest normal redundancy
 2  failgroup filer01 disk 'ORCL:FILER01DISKA', 'ORCL:FILER01DISKB'
 3  failgroup filer02 disk 'ORCL:FILER02DISKA', 'ORCL:FILER02DISKB'
 4  attribute 'compatible.asm'='11.1';

Diskgroup created.

With that done, let’s have a look at the ASM disk information:

SQL> select MOUNT_STATUS,HEADER_STATUS,STATE,REDUNDANCY,FAILGROUP,PATH from v$asm_disk where group_number=2;

MOUNT_S HEADER_STATU STATE    REDUNDA FAILGROUP                      PATH
------- ------------ -------- ------- ------------------------------ --------------------
CACHED  MEMBER       NORMAL   UNKNOWN FILER01                        ORCL:FILER01DISKA
CACHED  MEMBER       NORMAL   UNKNOWN FILER01                        ORCL:FILER01DISKB
CACHED  MEMBER       NORMAL   UNKNOWN FILER02                        ORCL:FILER02DISKA
CACHED  MEMBER       NORMAL   UNKNOWN FILER02                        ORCL:FILER02DISKB

I raised the compatible attributes for RDBMS and ASM to 11.1 and set the disk repair time to 24 hours (the attribute listing below was taken before the disk_repair_time change and still shows the 3.6 hour default):

SQL> select * from v$asm_attribute

NAME                           VALUE                GROUP_NUMBER ATTRIBUTE_INDEX ATTRIBUTE_INCARNATION READ_ON SYSTEM_
------------------------------ -------------------- ------------ --------------- --------------------- ------- -------
disk_repair_time               3.6h                            2               0                     1 N       Y
au_size                        1048576                         2               8                     1 Y       Y
compatible.asm                 11.1.0.0.0                      2              20                     1 N       Y
compatible.rdbms               11.1.0.0.0                      2              21                     1 N       Y

Unlike 11.2, where disk groups are managed as Clusterware resources, 11.1 requires you to either mount them manually or append the new disk group to the ASM_DISKGROUPS parameter. You should query gv$asm_diskgroup.state to ensure the new disk group is mounted on all cluster nodes.
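A minimal sketch of doing both from one node follows; the disk group name 'DATA' is only a placeholder for whatever is already listed in ASM_DISKGROUPS and must be kept in the list:

# run as the ASM software owner, with the local +ASMn environment set
sqlplus / as sysasm <<'EOF'
-- mount the new disk group on this instance
alter diskgroup fgtest mount;
-- add it to the auto-mount list; 'DATA' stands for the existing entries
alter system set asm_diskgroups='DATA','FGTEST' sid='*';
-- confirm the mount status across all nodes
select inst_id, name, state from gv$asm_diskgroup;
EOF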

I need some data! A small demo database can be restored into the new disk group to provide an experimental playground. This is quite easily done using an RMAN duplicate with the correct {db|log}_file_name_convert parameters set.
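For reference, a sketch of the relevant auxiliary instance parameters; +DATA is an assumed name for the source disk group, and rac11g is the database name used here:

# auxiliary instance init.ora (a sketch; +DATA is an assumed source disk group)
db_file_name_convert='+DATA/rac11g','+FGTEST/rac11g'
log_file_name_convert='+DATA/rac11g','+FGTEST/rac11g'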

Mirror

The disk group is created with normal redundancy, which means that ASM will create a mirror copy for every primary extent, taking failure groups into consideration. I wanted to ensure that the data is actually mirrored on the new disk group, which has group number 2. I need to get this information from V$ASM_FILE and V$ASM_ALIAS:

SQL> select * from v$asm_file where group_number = 2

GROUP_NUMBER FILE_NUMBER COMPOUND_INDEX INCARNATION BLOCK_SIZE     BLOCKS      BYTES      SPACE TYPE                 REDUND STRIPE CREATION_ MODIFICAT R
------------ ----------- -------------- ----------- ---------- ---------- ---------- ---------- -------------------- ------ ------ --------- --------- -
 2         256       33554688   747669775      16384       1129   18497536   78643200 CONTROLFILE          HIGH   FINE   05-APR-11 05-APR-11 U
 2         257       33554689   747669829       8192      69769  571547648 1148190720 DATAFILE             MIRROR COARSE 05-APR-11 05-APR-11 U
 2         258       33554690   747669829       8192      60161  492838912  990904320 DATAFILE             MIRROR COARSE 05-APR-11 05-APR-11 U
 2         259       33554691   747669829       8192      44801  367009792  739246080 DATAFILE             MIRROR COARSE 05-APR-11 05-APR-11 U
 2         260       33554692   747669831       8192      25601  209723392  424673280 DATAFILE             MIRROR COARSE 05-APR-11 05-APR-11 U
 2         261       33554693   747669831       8192        641    5251072   12582912 DATAFILE             MIRROR COARSE 05-APR-11 05-APR-11 U
 2         262       33554694   747670409        512     102401   52429312  120586240 ONLINELOG            MIRROR FINE   05-APR-11 05-APR-11 U
 2         263       33554695   747670409        512     102401   52429312  120586240 ONLINELOG            MIRROR FINE   05-APR-11 05-APR-11 U
 2         264       33554696   747670417        512     102401   52429312  120586240 ONLINELOG            MIRROR FINE   05-APR-11 05-APR-11 U
 2         265       33554697   747670417        512     102401   52429312  120586240 ONLINELOG            MIRROR FINE   05-APR-11 05-APR-11 U
 2         266       33554698   747670419       8192       2561   20979712   44040192 TEMPFILE             MIRROR COARSE 05-APR-11 05-APR-11 U

11 rows selected.

SQL> select * from v$asm_alias where group_NUMBER=2

NAME                           GROUP_NUMBER FILE_NUMBER FILE_INCARNATION ALIAS_INDEX ALIAS_INCARNATION PARENT_INDEX REFERENCE_INDEX A S
------------------------------ ------------ ----------- ---------------- ----------- ----------------- ------------ --------------- - -
RAC11G                                    2  4294967295       4294967295           0                 3     33554432        33554485 Y Y
CONTROLFILE                               2  4294967295       4294967295          53                 3     33554485        33554538 Y Y
current.256.747669775                     2         256        747669775         106                 3     33554538        50331647 N Y
DATAFILE                                  2  4294967295       4294967295          54                 1     33554485        33554591 Y Y
SYSAUX.257.747669829                      2         257        747669829         159                 1     33554591        50331647 N Y
SYSTEM.258.747669829                      2         258        747669829         160                 1     33554591        50331647 N Y
UNDOTBS1.259.747669829                    2         259        747669829         161                 1     33554591        50331647 N Y
UNDOTBS2.260.747669831                    2         260        747669831         162                 1     33554591        50331647 N Y
USERS.261.747669831                       2         261        747669831         163                 1     33554591        50331647 N Y
ONLINELOG                                 2  4294967295       4294967295          55                 1     33554485        33554644 Y Y
group_1.262.747670409                     2         262        747670409         212                 1     33554644        50331647 N Y
group_2.263.747670409                     2         263        747670409         213                 1     33554644        50331647 N Y
group_3.264.747670417                     2         264        747670417         214                 1     33554644        50331647 N Y
group_4.265.747670417                     2         265        747670417         215                 1     33554644        50331647 N Y
TEMPFILE                                  2  4294967295       4294967295          56                 1     33554485        33554697 Y Y
TEMP.266.747670419                        2         266        747670419         265                 1     33554697        50331647 N Y

The USERS tablespace, which I am most interested in, has file number 261. I chose it for this example as it’s only 5 MB in size; taking my 1 MB allocation unit into account, that means I don’t have to trawl through thousands of lines of output when getting the extent map.

Credit where credit is due: the next queries are partly based on the excellent work by Luca Canali from CERN, who has looked at ASM internals for a while. Make sure you have a look at the reference available here: https://twiki.cern.ch/twiki/bin/view/PDBService/ASM_Internals. So, to answer the question of where the extents making up my USERS tablespace live, we need to have a look at X$KFFXP, i.e. the file extent pointer view:

SQL> select GROUP_KFFXP,DISK_KFFXP,AU_KFFXP from x$kffxp where number_kffxp=261 and group_kffxp=2 order by disk_kffxp;

GROUP_KFFXP DISK_KFFXP   AU_KFFXP
----------- ---------- ----------
 2          0        864
 2          0        865
 2          0        866
 2          1        832
 2          1        831
 2          1        833
 2          2        864
 2          2        866
 2          2        865
 2          3        832
 2          3        833
 2          3        831

12 rows selected.

As you can see, I have a number of extents, all evenly spread over my disks. I can verify that this information is correct by also querying the X$KFDAT view, which contains similar information but is more concerned with the disk-to-AU mapping:

SQL> select GROUP_KFDAT,NUMBER_KFDAT,AUNUM_KFDAT from x$kfdat where fnum_kfdat = 261 and group_kfdat=2

GROUP_KFDAT NUMBER_KFDAT AUNUM_KFDAT
----------- ------------ -----------
 2            0         864
 2            0         865
 2            0         866
 2            1         831
 2            1         832
 2            1         833
 2            2         864
 2            2         865
 2            2         866
 2            3         831
 2            3         832
 2            3         833

12 rows selected.

OK, so I am confident that my data is actually mirrored; otherwise the following test would not make any sense. I have double-checked that the disks in failgroup FILER01 actually belong to my OpenFiler “filer01”, and the same for filer02. Going back to the original scenario:

Shut down Filer02

This takes down all the disks of failure group FILER02. Two minutes after taking the filer down I checked that it was indeed shut down:

martin@dom0:~> sudo xm list | grep filer
filer01                                    183   512     1     -b----   1159.6
filer02                                          512     1              1179.6
filer03                                    185   512     1     -b----   1044.4

Yes, no doubt about it: it’s down. What would the effect be? Surely I/O errors, but I wanted to force a check. Connected to +ASM2, I issued a “select * from v$asm_disk” command, which caused quite significant logging in the instance’s alert.log:

NOTE: ASMB process exiting due to lack of ASM file activity for 5 seconds
Wed Apr 06 17:17:39 2011
WARNING: IO Failed. subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so dg:0, diskname:ORCL:FILER02DISKA disk:0x0.0x97954459 au:0
 iop:0x2b2997b61330 bufp:0x2b29977b3e00 offset(bytes):0 iosz:4096 operation:1(Read) synchronous:0
 result: 4 osderr:0x3 osderr1:0x2e pid:6690
WARNING: IO Failed. subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so dg:0, diskname:ORCL:FILER02DISKB disk:0x1.0x9795445a au:0
 iop:0x2b2997b61220 bufp:0x2b29977b0200 offset(bytes):0 iosz:4096 operation:1(Read) synchronous:0
 result: 4 osderr:0x3 osderr1:0x2e pid:6690
Wed Apr 06 17:17:58 2011
WARNING: IO Failed. subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so dg:0, diskname:ORCL:FILER02DISKB disk:0x1.0x9795445a au:0
 iop:0x2b2997b61440 bufp:0x2b29977a6400 offset(bytes):0 iosz:4096 operation:1(Read) synchronous:0
 result: 4 osderr:0x3 osderr1:0x2e pid:6690
WARNING: IO Failed. subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so dg:0, diskname:ORCL:FILER02DISKA disk:0x0.0x97954459 au:0
 iop:0x2b2997b61550 bufp:0x2b29977b1600 offset(bytes):0 iosz:4096 operation:1(Read) synchronous:0
 result: 4 osderr:0x3 osderr1:0x2e pid:6690
Wed Apr 06 17:18:03 2011
WARNING: Disk (FILER02DISKA) will be dropped in: (86400) secs on ASM inst: (2)
WARNING: Disk (FILER02DISKB) will be dropped in: (86400) secs on ASM inst: (2)
GMON SlaveB: Deferred DG Ops completed.
Wed Apr 06 17:19:26 2011
WARNING: IO Failed. subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so dg:0, diskname:ORCL:FILER02DISKB disk:0x1.0x9795445a au:0
 iop:0x2b2997b61550 bufp:0x2b29977b1600 offset(bytes):0 iosz:4096 operation:1(Read) synchronous:0
 result: 4 osderr:0x3 osderr1:0x2e pid:6690
WARNING: IO Failed. subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so dg:0, diskname:ORCL:FILER02DISKA disk:0x0.0x97954459 au:0
 iop:0x2b2997b61440 bufp:0x2b29977b0200 offset(bytes):0 iosz:4096 operation:1(Read) synchronous:0
 result: 4 osderr:0x3 osderr1:0x2e pid:6690
Wed Apr 06 17:20:10 2011
WARNING: IO Failed. subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so dg:0, diskname:ORCL:FILER02DISKA disk:0x0.0x97954459 au:0
 iop:0x2b2997b61000 bufp:0x2b29977a6400 offset(bytes):0 iosz:4096 operation:1(Read) synchronous:0
 result: 4 osderr:0x3 osderr1:0x2e pid:6690
WARNING: IO Failed. subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so dg:0, diskname:ORCL:FILER02DISKB disk:0x1.0x9795445a au:0
 iop:0x2b2997b61550 bufp:0x2b29977b1600 offset(bytes):0 iosz:4096 operation:1(Read) synchronous:0
 result: 4 osderr:0x3 osderr1:0x2e pid:6690
Wed Apr 06 17:21:07 2011
WARNING: Disk (FILER02DISKA) will be dropped in: (86217) secs on ASM inst: (2)
WARNING: Disk (FILER02DISKB) will be dropped in: (86217) secs on ASM inst: (2)
GMON SlaveB: Deferred DG Ops completed.
Wed Apr 06 17:27:15 2011
WARNING: Disk (FILER02DISKA) will be dropped in: (85849) secs on ASM inst: (2)
WARNING: Disk (FILER02DISKB) will be dropped in: (85849) secs on ASM inst: (2)
GMON SlaveB: Deferred DG Ops completed.

The interesting message is “all mirror sides found readable, no repair required”, which shows up in the second instance’s log below: taking down the failgroup didn’t cause an outage. The other ASM instance complained a little later as well:

2011-04-06 17:16:58.393000 +01:00
NOTE: initiating PST update: grp = 2, dsk = 2, mode = 0x15
NOTE: initiating PST update: grp = 2, dsk = 3, mode = 0x15
kfdp_updateDsk(): 24
kfdp_updateDskBg(): 24
WARNING: IO Failed. subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so dg:2, diskname:ORCL:FILER02DISKA disk:0x2.0x97a6d9f7 au:1
 iop:0x2b9ea4855e70 bufp:0x2b9ea4850a00 offset(bytes):1052672 iosz:4096 operation:2(Write) synchronous:1
 result: 4 osderr:0x3 osderr1:0x2e pid:870
NOTE: group FGTEST: updated PST location: disk 0000 (PST copy 0)
2011-04-06 17:17:03.508000 +01:00
NOTE: ASMB process exiting due to lack of ASM file activity for 5 seconds
NOTE: PST update grp = 2 completed successfully
NOTE: initiating PST update: grp = 2, dsk = 2, mode = 0x1
NOTE: initiating PST update: grp = 2, dsk = 3, mode = 0x1
kfdp_updateDsk(): 25
kfdp_updateDskBg(): 25
2011-04-06 17:17:07.454000 +01:00
NOTE: group FGTEST: updated PST location: disk 0000 (PST copy 0)
NOTE: PST update grp = 2 completed successfully
NOTE: cache closing disk 2 of grp 2: FILER02DISKA
NOTE: cache closing disk 3 of grp 2: FILER02DISKB
SUCCESS: extent 0 of file 267 group 2 repaired by offlining the disk
NOTE: repairing group 2 file 267 extent 0
SUCCESS: extent 0 of file 267 group 2 repaired - all mirror sides found readable, no repair required
2011-04-06 17:19:04.526000 +01:00
GMON SlaveB: Deferred DG Ops completed.
2011-04-06 17:22:07.487000 +01:00
GMON SlaveB: Deferred DG Ops completed.

No interruption of service though, which is good: the GV$ASM_CLIENT view reported all database instances still connected.

SQL> select * from gv$asm_client;

 INST_ID GROUP_NUMBER INSTANCE_NAME                                                    DB_NAME  STATUS       SOFTWARE_VERSIO COMPATIBLE_VERS
---------- ------------ ---------------------------------------------------------------- -------- ------------ --------------- ---------------
 2            2 rac11g2                                                          rac11g   CONNECTED    11.1.0.7.0      11.1.0.0.0
 1            2 rac11g1                                                          rac11g   CONNECTED    11.1.0.7.0      11.1.0.0.0

The result in the V$ASM_DISK view was as follows:

SQL> select name,state,header_status,path,failgroup from v$asm_disk;

NAME                           STATE    HEADER_STATU PATH                                               FAILGROUP
------------------------------ -------- ------------ -------------------------------------------------- ------------------------------
 NORMAL   UNKNOWN      ORCL:FILER02DISKA
 NORMAL   UNKNOWN      ORCL:FILER02DISKB
DATA1                          NORMAL   MEMBER       ORCL:DATA1                                         DATA1
DATA2                          NORMAL   MEMBER       ORCL:DATA2                                         DATA2
FILER01DISKA                   NORMAL   MEMBER       ORCL:FILER01DISKA                                  FILER01
FILER01DISKB                   NORMAL   MEMBER       ORCL:FILER01DISKB                                  FILER01
FILER02DISKA                   NORMAL   UNKNOWN                                                         FILER02
FILER02DISKB                   NORMAL   UNKNOWN                                                         FILER02

8 rows selected.

As expected, the disks for failgroup FILER02 are gone, and so is the information about the failure group. My disk repair time should be high enough to protect me from having to rebuild the whole disk group. Now I’m really curious whether my database can become corrupted, so I’ll increase the SCN.

[oracle@rac11gr1node1 ~]$ . setsid.sh rac11g
[oracle@rac11gr1node1 ~]$ sq

SQL*Plus: Release 11.1.0.7.0 - Production on Wed Apr 6 17:24:18 2011

Copyright (c) 1982, 2008, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> select current_scn from v$database;

CURRENT_SCN
-----------
 5999304

SQL> begin
 2   for i in 1..5 loop
 3    execute immediate 'alter system switch logfile';
 4   end loop;
 5  end;
 6  /

PL/SQL procedure successfully completed.

SQL> select current_scn from v$database;

CURRENT_SCN
-----------
 5999378

SQL>

Back to the test case.

Stop the Database

[oracle@rac11gr1node1 ~]$ srvctl stop database -d rac11g
[oracle@rac11gr1node1 ~]$ srvctl status database -d rac11g
Instance rac11g2 is not running on node rac11gr1node1
Instance rac11g1 is not running on node rac11gr1node2

Done; this part was simple. Next, in the original scenario, the first storage array was stopped. To prevent bad things from happening I’ll shut down ASM on all nodes first. I hope that doesn’t invalidate the test, but I can’t see how ASM would not run into a problem if the other failgroup went down as well.

Shut down Filer01 and start Filer02

Also quite simple. Shutting down this filer allows me to follow the story. After filer01 was down I started filer02, curious as to how ASM would react. I have deliberately NOT put disk group FGTEST into ASM_DISKGROUPS; I want to mount it manually to get a better understanding of what happens.

After having started ASM on both nodes, I queried V$ASM_DISK and tried to mount the disk group:

SQL> select disk_number,name,state,header_status,path,failgroup from v$asm_disk;

DISK_NUMBER NAME                           STATE    HEADER_STATU PATH                                               FAILGROUP
----------- ------------------------------ -------- ------------ -------------------------------------------------- ------------------------------
 0                                NORMAL   MEMBER       ORCL:FILER02DISKA
 1                                NORMAL   MEMBER       ORCL:FILER02DISKB
 2                                NORMAL   UNKNOWN      ORCL:FILER01DISKA
 3                                NORMAL   UNKNOWN      ORCL:FILER01DISKB
 0 DATA1                          NORMAL   MEMBER       ORCL:DATA1                                         DATA1
 1 DATA2                          NORMAL   MEMBER       ORCL:DATA2                                         DATA2

6 rows selected.

Oops, now they are both gone…

SQL> alter diskgroup fgtest mount;
alter diskgroup fgtest mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing
ORA-15042: ASM disk "0" is missing
ORA-15080: synchronous I/O operation to a disk failed
ORA-15080: synchronous I/O operation to a disk failed

OK, I have a problem here. Both ASM instances report I/O errors with the FGTEST disk group, and I can’t mount it. That means I can’t mount the database either; in a way that proves I won’t get corruption, but neither will I have a database. Which is worse?

Can I get around this problem?

I think I’ll have to start filer01 and see if that makes a difference. Hopefully I can recover my system with the information in failgroup FILER01. Soon after filer01 came online I ran the query against v$asm_disk again and tried to mount the disk group.

SQL> select disk_number,name,state,header_status,path,failgroup from v$asm_disk;

DISK_NUMBER NAME                           STATE    HEADER_STATU PATH                                               FAILGROUP
----------- ------------------------------ -------- ------------ -------------------------------------------------- ------------------------------
 0                                NORMAL   MEMBER       ORCL:FILER02DISKA
 1                                NORMAL   MEMBER       ORCL:FILER02DISKB
 2                                NORMAL   MEMBER       ORCL:FILER01DISKA
 3                                NORMAL   MEMBER       ORCL:FILER01DISKB
 0 DATA1                          NORMAL   MEMBER       ORCL:DATA1                                         DATA1
 1 DATA2                          NORMAL   MEMBER       ORCL:DATA2                                         DATA2

6 rows selected.

That worked!

Wed Apr 06 17:45:32 2011
SQL> alter diskgroup fgtest mount
NOTE: cache registered group FGTEST number=2 incarn=0x72c150d7
NOTE: cache began mount (first) of group FGTEST number=2 incarn=0x72c150d7
NOTE: Assigning number (2,0) to disk (ORCL:FILER01DISKA)
NOTE: Assigning number (2,1) to disk (ORCL:FILER01DISKB)
NOTE: Assigning number (2,2) to disk (ORCL:FILER02DISKA)
NOTE: Assigning number (2,3) to disk (ORCL:FILER02DISKB)
Wed Apr 06 17:45:33 2011
NOTE: start heartbeating (grp 2)
kfdp_query(): 12
kfdp_queryBg(): 12
NOTE: cache opening disk 0 of grp 2: FILER01DISKA label:FILER01DISKA
NOTE: F1X0 found on disk 0 fcn 0.0
NOTE: cache opening disk 1 of grp 2: FILER01DISKB label:FILER01DISKB
NOTE: cache opening disk 2 of grp 2: FILER02DISKA label:FILER02DISKA
NOTE: F1X0 found on disk 2 fcn 0.0
NOTE: cache opening disk 3 of grp 2: FILER02DISKB label:FILER02DISKB
NOTE: cache mounting (first) group 2/0x72C150D7 (FGTEST)
Wed Apr 06 17:45:33 2011
* allocate domain 2, invalid = TRUE
kjbdomatt send to node 0
Wed Apr 06 17:45:33 2011
NOTE: attached to recovery domain 2
NOTE: cache recovered group 2 to fcn 0.7252
Wed Apr 06 17:45:33 2011
NOTE: LGWR attempting to mount thread 1 for diskgroup 2
NOTE: LGWR mounted thread 1 for disk group 2
NOTE: opening chunk 1 at fcn 0.7252 ABA
NOTE: seq=3 blk=337
NOTE: cache mounting group 2/0x72C150D7 (FGTEST) succeeded
NOTE: cache ending mount (success) of group FGTEST number=2 incarn=0x72c150d7
Wed Apr 06 17:45:33 2011
kfdp_query(): 13
kfdp_queryBg(): 13
NOTE: Instance updated compatible.asm to 11.1.0.0.0 for grp 2
SUCCESS: diskgroup FGTEST was mounted
SUCCESS: alter diskgroup fgtest mount

The V$ASM_DISK view is nicely updated and everything seems to be green:

SQL> select disk_number,name,state,header_status,path,failgroup from v$asm_disk;

DISK_NUMBER NAME                           STATE    HEADER_STATU PATH                                               FAILGROUP
----------- ------------------------------ -------- ------------ -------------------------------------------------- ------------------------------
 0 DATA1                          NORMAL   MEMBER       ORCL:DATA1                                         DATA1
 1 DATA2                          NORMAL   MEMBER       ORCL:DATA2                                         DATA2
 0 FILER01DISKA                   NORMAL   MEMBER       ORCL:FILER01DISKA                                  FILER01
 1 FILER01DISKB                   NORMAL   MEMBER       ORCL:FILER01DISKB                                  FILER01
 2 FILER02DISKA                   NORMAL   MEMBER       ORCL:FILER02DISKA                                  FILER02
 3 FILER02DISKB                   NORMAL   MEMBER       ORCL:FILER02DISKB                                  FILER02

6 rows selected.

Brilliant. But will it have an effect on the database?

Starting the Database

Even though things looked ok, they weren’t! I didn’t expect this to happen:

[oracle@rac11gr1node1 ~]$ srvctl start database -d rac11g
PRKP-1001 : Error starting instance rac11g2 on node rac11gr1node1
rac11gr1node1:ora.rac11g.rac11g2.inst:
rac11gr1node1:ora.rac11g.rac11g2.inst:SQL*Plus: Release 11.1.0.7.0 - Production on Wed Apr 6 17:48:58 2011
rac11gr1node1:ora.rac11g.rac11g2.inst:
rac11gr1node1:ora.rac11g.rac11g2.inst:Copyright (c) 1982, 2008, Oracle.  All rights reserved.
rac11gr1node1:ora.rac11g.rac11g2.inst:
rac11gr1node1:ora.rac11g.rac11g2.inst:Enter user-name: Connected to an idle instance.
rac11gr1node1:ora.rac11g.rac11g2.inst:
rac11gr1node1:ora.rac11g.rac11g2.inst:SQL> ORACLE instance started.
rac11gr1node1:ora.rac11g.rac11g2.inst:
rac11gr1node1:ora.rac11g.rac11g2.inst:Total System Global Area 1720328192 bytes
rac11gr1node1:ora.rac11g.rac11g2.inst:Fixed Size                    2160392 bytes
rac11gr1node1:ora.rac11g.rac11g2.inst:Variable Size              1291847928 bytes
rac11gr1node1:ora.rac11g.rac11g2.inst:Database Buffers    419430400 bytes
rac11gr1node1:ora.rac11g.rac11g2.inst:Redo Buffers                  6889472 bytes
rac11gr1node1:ora.rac11g.rac11g2.inst:ORA-00600: internal error code, arguments: [kccpb_sanity_check_2], [9572],
rac11gr1node1:ora.rac11g.rac11g2.inst:[9533], [0x000000000], [], [], [], [], [], [], [], []
rac11gr1node1:ora.rac11g.rac11g2.inst:
rac11gr1node1:ora.rac11g.rac11g2.inst:
rac11gr1node1:ora.rac11g.rac11g2.inst:SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
rac11gr1node1:ora.rac11g.rac11g2.inst:With the Partitioning, Real Application Clusters, OLAP, Data Mining
rac11gr1node1:ora.rac11g.rac11g2.inst:and Real Application Testing options
rac11gr1node1:ora.rac11g.rac11g2.inst:
CRS-0215: Could not start resource 'ora.rac11g.rac11g2.inst'.
PRKP-1001 : Error starting instance rac11g1 on node rac11gr1node2
CRS-0215: Could not start resource 'ora.rac11g.rac11g1.inst'.

Oops. A quick search on Metalink revealed the note “Ora-00600: [Kccpb_sanity_check_2], [3621501], [3621462] On Startup” (Doc ID 435436.1). The explanation for the ORA-600 is that the seq# of the last read block is higher than the seq# of the control file header block. Oracle Support attributes it to a lost write, but here the situation is quite different. Interesting! I’ll have to leave that for another blog post.

UltraEdit for Mac/Linux v2.1.0.3…

Followers of the blog will know I’m a big fan of UltraEdit. I have a multi-platform unlimited upgrades license, so I run it on Linux, Mac and occasionally on a Windows VM.

I noticed today that version 2.1.0.3 was released for Mac and Linux about a month ago. Not sure how I missed that on the update notices. :) The changes for Mac are not that big because it was already at version 2.x, but the Linux version had been hanging around 1.x for some time and was missing a lot of functionality compared to the Mac version. This latest release is a pretty big catch-up for the Linux version, and it now contains pretty much all of the functionality I use on a regular basis.

Both the Mac and Linux versions are still lagging behind the Windows version in terms of total functionality, but who cares about Windows… :)

Cheers

Tim…




Installing Oracle Linux 6 as a domU in Xen

Despite all recent progress in other virtualisation technologies I am staying faithful to Xen. The reason is simple: it works for me. Paravirtualised domUs are the fastest way to run virtual machines on hardware that doesn’t support virtualisation in the processor.

I read about cgroups yesterday, a feature that appeared in kernel 2.6.38 and was apparently back-ported into Red Hat Enterprise Linux 6. Unfortunately I can’t get hold of a copy, so I decided to use Oracle’s clone instead. I wanted to install the new domU on my existing lab environment, which is a 24 GB RAM Core i7 920 system with 1.5 TB of storage. The only limitation I can see is the low number of cores; I wish I could rent an Opteron 6100 series system instead (for the same price).

Creating the domU

The first setback was the failure of virt-manager, OpenSuSE’s preferred tool for creating xen virtual machines. I wrote about virt-manager and OpenSuSE 11.2 some time ago and went back to that post for instructions. However, the logic coded into the application doesn’t seem to handle OL6; it repeatedly failed to start the PV kernel. I assume the directory structure on the ISO has changed, or there is some other configuration issue here, maybe even a PEBKAC.

That was a bit of a problem, because it meant I had to do the legwork all on my own. Thinking about it, virt-manager is not really to blame; OL6 hadn’t been released yet when the tool came out. So be it, at least I’ll learn something new today. To start with, I needed to create a new “disk” to contain my root volume group. The way the dom0 is set up doesn’t allow me to use LVM logical volumes (all the space is already allocated), so my domUs are all stored in /var/lib/xen/images/domUName. I started off by creating the top level directory for my new OL6 reference domU:

# mkdir /var/lib/xen/images/ol6

Inside the directory I created the sparse file for my first “disk”:

# cd /var/lib/xen/images/ol6
# dd if=/dev/zero of=disk0 bs=1024k seek=8192 count=0

This creates a sparse file (much like a temp file in Oracle) for use as my first virtual disk. The next step was to extract the kernel and initial ramdisk from the ISO image and store them somewhere convenient. My default location for xen kernels is /m/xenkernels/domUName. The new kernel is a pvops kernel (but still not dom0 capable!), so copying it from the loopback-mounted ISO image’s isolinux/vmlinuz location was enough. There is also only one initrd to copy. Get both into the xen kernel location as shown here:

# mkdir /m/xenkernels/ol6
# cp /mnt/ol6/isolinux/{initrd.img,vmlinuz} /m/xenkernels/ol6
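This assumes the DVD ISO is loopback-mounted at /mnt/ol6 beforehand, along these lines (the ISO file name is only an example):

# loopback-mount the installation ISO read-only (ISO file name is an example)
mkdir -p /mnt/ol6
mount -o loop,ro /path/to/OracleLinux-R6-U0-x86_64-dvd.iso /mnt/ol6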

Next, I copied the contents of the ISO image to /srv/www/htdocs/ol6.
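A plain recursive copy from the mounted ISO into the web server’s staging directory is all that’s needed, for example:

# copy the mounted ISO contents into the apache document root used for staging
cp -a /mnt/ol6/. /srv/www/htdocs/ol6/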

We now need a configuration file to start the domU initially. The config file below worked for me; I have deliberately not used a libvirt-compatible XML file, to keep the example simple. We’ll convert and register it with xenstore later…

cat /tmp/rhel6ref
name="rhel6ref"
memory=1024
maxmem=4096
vcpus=2
on_poweroff="destroy"
on_reboot="restart"
on_crash="destroy"
localtime=0
builder="linux"
bootargs=""
extra=" "
disk=[ 'file:/var/lib/xen/images/rhel6ref/disk0,xvda,w', ]
vif=[ 'mac=00:16:1e:1b:1d:ef,bridge=br1', ]
vfb=['type=vnc,vncdisplay=11']
kernel = "/m/xenkernels/ol6/vmlinuz"
ramdisk = "/m/xenkernels/ol6/initrd.img"

As I am forwarding the VNC port I needed a fixed one. In my PuTTY session I forwarded local port 5911 to my dom0’s port 5911. Now start the domU using the “xm create /tmp/rhel6ref” command. Within a few seconds you should be able to point your vncviewer at localhost:11 and connect to the domU. That’s all! From here the installation is really just a matter of “next”, “next”, “next”. Some things have changed though; have a look at these selected screenshots.
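The OpenSSH equivalent of that PuTTY forward is a single option (the host name dom0 stands for your dom0):

# forward local port 5911 to port 5911 on the dom0, where the domU's VNC server listens
ssh -L 5911:localhost:5911 user@dom0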

Walking through the installation

First of all you need to configure the network for the domU to talk to your staging server. I always use a manual configuration and configure IPv4 only. The URL setup is interesting: I used http://dom0/ol6 as the repository URL and got further. When in doubt, check the access and error logs in /var/log/apache2.

After the welcome screen I was greeted with a message stating that my xen-vbd-xxx device had to be reinitialised. Huh? But OK, I did that and progressed. I then entered the hostname and got to the partitioning screen. I chose to “use all space” and ticked the box next to “review and modify partitioning layout”. Remember that ext4 is now the default for all file systems, but OpenSuSE’s pygrub can’t read it. The important step is to ensure that you have a separate /boot partition outside any LVM devices, and that it’s formatted with ext2. The ext3 file system might also work, but I decided to stick with ext2, which I knew pygrub could deal with. I also tend to rename my volume group to rootvg instead of the vg_hostname the installer suggests.

The VNC interface became a little difficult to use when I was asked to select a timezone, so I deferred that until later. I then ran into a bit of a problem at the package selection screen. Suddenly the installer, which had happily read all data from my apache setup, claimed it couldn’t read the repomd.xml file. I thought that was strange, but manually pointed it to the Server/repodata/repomd.xml file and clicked on the install button. Unfortunately the installer now couldn’t read the first package. The reason was quickly identified in the access log:

192.168.99.124 - - [01/Apr/2011:14:01:42 +0200] "GET /ol6/Server/Packages/alsa-utils-1.0.21-3.el6.x86_64.rpm HTTP/1.1" 403 1036 "-" "Oracle Linux Server (anaconda)/6.0"
192.168.99.124 - - [01/Apr/2011:14:03:10 +0200] "GET /ol6/Server/Packages/alsa-utils-1.0.21-3.el6.x86_64.rpm HTTP/1.1" 403 1036 "-" "Oracle Linux Server (anaconda)/6.0"

HTTP 403 errors (i.e. FORBIDDEN). They were responsible for the problem with the repomd.xml file as well:

192.168.99.124 - - [01/Apr/2011:14:00:53 +0200] "GET /ol6/repodata/repomd.xml HTTP/1.1" 403 1036 "-" "Oracle Linux Server (anaconda)/6.0"

The reason for these could be found in the error log:

[Fri Apr 01 14:00:53 2011] [error] [client 192.168.99.124] Symbolic link not allowed or link target not accessible: /srv/www/htdocs/ol6/repodata

Oha. Where do these come from? Fair enough, the directory structure has changed:

[root@dom0 /srv/www/htdocs/ol6] # ls -l repodata
lrwxrwxrwx 1 root root 15 Apr  1 14:09 repodata -> Server/repodata

Now then, because this is my lab and I’m solely responsible for its security, I changed the Options in my apache server root to FollowSymLinks. Do not do this in real life! Create a separate directory or alias instead, and don’t compromise your server root. Enough said…
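A safer alternative is a dedicated alias for the staging directory only; a sketch for Apache 2.2, assuming openSUSE's /etc/apache2/conf.d drop-in directory:

# /etc/apache2/conf.d/ol6.conf (a sketch)
Alias /ol6 /srv/www/htdocs/ol6
<Directory /srv/www/htdocs/ol6>
    Options FollowSymLinks
    Order allow,deny
    Allow from all
</Directory>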

A reload of the apache2 daemon fixed the problem, but I had to start the installation from scratch. This time, however, it ran through without problems.

Cleaning Up

When the installer prompts you for a reboot, don’t click on the “OK” button just yet. The configuration file needs to be changed to use the bootloader pygrub. Change the configuration file to something similar to this:

# cat /tmp/rhel6ref
name="rhel6ref"
memory=1024
maxmem=4096
vcpus=2
on_poweroff="destroy"
on_reboot="restart"
on_crash="destroy"
localtime=0
builder="linux"
bootargs=""
extra=" "
disk=[ 'file:/var/lib/xen/images/rhel6ref/disk0,xvda,w', ]
vif=[ 'mac=00:16:1e:1b:1d:ef,bridge=br1', ]
vfb=['type=vnc,vncdisplay=11']
bootloader = "pygrub"

The only change is the replacement of the kernel and ramdisk lines with the bootloader line. You may have to “xm destroy” the VM for the change to take effect; a reboot doesn’t seem to trigger a re-parse of the configuration file.

With that done, restart the VM and enjoy the final stages of the configuration. If your domU doesn’t start now, you probably forgot to format /boot with ext2 and it is ext4 instead. In that case you’ll have to do some research on Google as to whether or not you can save your installation.

The result is a new Oracle Linux reference machine!

# ssh root@192.168.99.124
The authenticity of host '192.168.99.124 (192.168.99.124)' can't be established.
RSA key fingerprint is 3d:90:d5:ef:33:e1:15:f8:eb:4a:38:15:cd:b9:f1:7e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.99.124' (RSA) to the list of known hosts.
root@192.168.99.124's password:
[root@rhel6ref ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.0 (Santiago)
[root@rhel6ref ~]# uname -a
Linux rhel6ref.localdomain 2.6.32-100.28.5.el6.x86_64 #1 SMP Wed Feb 2 18:40:23 EST
2011 x86_64 x86_64 x86_64 GNU/Linux

If you want, you can register the domU in xenstore: while the domU is up and running, dump the XML file using “virsh dumpxml domUname > /tmp/ol6.xml”. My configuration file looked like this:


<domain type='xen'>
  <name>ol6ref</name>
  <uuid>530c97e6-3c23-ec9f-d65b-021a79d61585</uuid>
  <memory>4194304</memory>
  <currentMemory>1048576</currentMemory>
  <vcpu>2</vcpu>
  <bootloader>/usr/bin/pygrub</bootloader>
  <os>
    <type>linux</type>
  </os>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/lib64/xen/bin/qemu-dm</emulator>
    <!-- disk, interface, console and graphics elements as dumped by virsh -->
    ...
  </devices>
</domain>
Shut the domU down and register it using a call to “virsh define /tmp/ol6.xml”. Subsequent editing can be done using “virsh edit ol6”.

6th Planboard DBA Symposium – Registration now open

I am pleased to announce that we, the program committee members, just finalized the program for the 6th Planboard DBA Symposium, to be held on May 17, 2011 in Amstelveen. There are 10 presentations scheduled in 2 parallel tracks by Frits Hoogland, Johan van Veen, Iloon Ellen-Wolff, Paul der Kinderen, Wouter Wethmar, Alex Nuijten, Gert […]

Copying the password file for RAC databases

This post is inspired by a recent thread on the oracle-l mailing list. In the post “11g RAC orapw file issue - RAC nodes not updated” the fact that the password file is local to each instance was brought up. All users granted the SYSOPER or SYSDBA privilege are stored in the password file, and changing the password for the SYS user on one instance doesn’t mean the change is reflected on the other RAC instances. Furthermore, your Data Guard configuration will break as well, since the SYS account is used to log in to the standby database.
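A quick way to see what each instance currently holds in its password file is to compare GV$PWFILE_USERS across the cluster; a minimal sketch:

# run on any node with the database environment set
sqlplus -s / as sysdba <<'EOF'
select inst_id, username, sysdba, sysoper
  from gv$pwfile_users
 order by inst_id, username;
EOF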

On a related note, a change of the SYS password for the ASM instance in the GRID_HOME will propagate to all cluster nodes automatically, a fact I first saw mentioned on the Dutch Prutser’s weblog by Harald van Breederode.

Now, to get over the annoyance of having to manually copy a new password file to all cluster nodes, I have written a small shell script which I use on all my Linux clusters. It takes the ORACLE_SID of the local instance as input, works out the corresponding ORACLE_HOME and copies the password file to all nodes of the cluster, as listed in the output of olsnodes. The script can deal with separation of duties, i.e. systems where the GRID_HOME is owned by a different user than the RDBMS ORACLE_HOME. The script is by no means perfect and could be extended to deal with a more general setup. My assumption is that instance numbers map 1:1 to cluster nodes; for example, instance PROD1 will be hosted on the first cluster node, prodnode1.

The script is shown below, it’s been written and tested on Linux:

#!/bin/bash

# A small and simple script to copy a password file
# to all nodes of a cluster
# This works for me, it doesn't necessarily work for you,
# and the script is provided "as is"-I will not take
# responsibility for its operation and it comes with no
# warranty of any sort
#
# Martin Bach 2011
#
# You are free to use the script as you feel fit, but please
# retain the reference to the author.
#
# Usage: requires the local ORACLE_SID as a parameter.
# requires the ORACLE_SID or DBNAME to be in oratab

ORACLE_SID=$1
[[ $ORACLE_SID == "" ]] && {
 echo usage `basename $0` ORACLE_SID
 exit 1
}

#### TUNEABLES

# change to /var/opt/oracle/oratab for Solaris
ORATAB=/etc/oratab
GRID_HOME=/u01/crs/11.2.0.2

#### this section doesn't normally have to be changed

DBNAME=${ORACLE_SID%*[0-9]}
ORACLE_HOME=`grep $DBNAME $ORATAB | awk -F":" '{print $2}'`
[[ $ORACLE_HOME == "" ]] && {
 echo cannot find ORACLE_HOME for database $DBNAME in $ORATAB
 exit 2
}

cd $ORACLE_HOME/dbs
cp -v orapw$ORACLE_SID /tmp
INST=1

echo starting copy of passwordfile
for NODE in `$GRID_HOME/bin/olsnodes`; do
 echo copying orapw$ORACLE_SID to $NODE as orapw${DBNAME}${INST}
 scp orapw$ORACLE_SID $NODE:${ORACLE_HOME}/dbs/orapw${DBNAME}${INST}
 INST=$(( $INST + 1))
done

It’s fairly straightforward: we first take the ORACLE_SID and use it to look up the ORACLE_HOME for the database. The GRID_HOME has to be hard coded to keep the script compatible with pre-11.2 databases, where you could have a CRS_HOME different from the ASM_HOME. For Oracle releases before 11.2, set the GRID_HOME variable to your Clusterware home.

The DBNAME is the $ORACLE_SID without the trailing instance number, which I need in order to work out the SIDs on the other cluster nodes. Before the password file is copied from the local node to all cluster nodes, a copy is taken to /tmp, just in case.
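The ${ORACLE_SID%*[0-9]} expansion removes the shortest suffix matching *[0-9], i.e. only the final digit, which fits the single-digit instance numbering assumed above:

$ ORACLE_SID=PROD1
$ echo ${ORACLE_SID%*[0-9]}    # strips just the trailing instance number
PROD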

The main logic is in the loop provided by the output of olsnodes, and the local password file is copied across all cluster nodes.

Feel free to use it at your own risk, and modify/distribute it as needed. This works well for me, especially on the eight-node cluster.

Troubleshooting Grid Infrastructure startup

Today turned into an interesting story when one of my blades decided to reboot after an EXT3 journal error. The hard facts first:

  • Oracle Linux 5.5 with kernel 2.6.18-194.11.4.0.1.el5
  • Oracle 11.2.0.2 RAC
  • Bonded NICs for private and public networks
  • BL685-G6 with 128G RAM

I first noticed the node had problems when I tried to list all the databases configured on the cluster. I got the dreaded “cannot communicate with the CRSD”:

[oracle@node1.example.com] $ srvctl config database
PRCR-1119 : Failed to look up CRS resources of database type
PRCR-1115 : Failed to find entities of type resource that match filters (TYPE ==ora.database.type) and contain attributes DB_UNIQUE_NAME,ORACLE_HOME,VERSION
Cannot communicate with crsd

Not too great, especially since everything worked when I left yesterday. What could have gone wrong? An obvious reason would be a reboot, and fair enough, there has been one:

[grid@node1.example.com] $ uptime
09:09:22 up  2:40,  1 user,  load average: 1.47, 1.46, 1.42

The next step was to check if the local CRS stack was up, or better, to check what was down. Sometimes it’s only crsd which has a problem. In my case everything was down:

[grid@node1.example.com] $ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
[grid@node1.example.com] $ crsctl check cluster -all
**************************************************************
node1:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
**************************************************************
CRS-4404: The following nodes did not reply within the allotted time:
node2,node3, node4, node5, node6, node7, node8

The CRS-4404 was slightly misleading: it made me assume all cluster nodes were down after a cluster-wide reboot, and sometimes a single node reboot does trigger worse things. However, logging on to node 2 I saw that all nodes except the first were fine.

CRSD really needs CSSD to be up and running, and CSSD requires the OCR to be there. I wanted to know if the OCR was impacted in any way:

[grid@node1.example.com] $ ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage
ORA-29701: unable to connect to Cluster Synchronization Service

It seemed the OCR location was unavailable. I know that on this cluster the OCR is stored in ASM. Common reasons for the PROC-26 error are listed below (a quick way to confirm where the OCR lives follows the list):

  • Unix admin upgrades the kernel but forgets to upgrade the ASMLib kernel module (common grief with ASMLib!)
  • Storage is not visible on the host, i.e. SAN connectivity broken/taken away (happens quite frequently with storage/sys admin unaware of ASM)
  • Permissions not set correctly on the block devices (not an issue when using ASMLib)
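
Before chasing the storage itself, it is worth confirming where the OCR is supposed to live. On Linux the registry location is recorded in a small text file outside the cluster stack, so it can be read even while CRS is down (the diskgroup name below is illustrative):

cat /etc/oracle/ocr.loc
# ocrconfig_loc=+OCRVOTE
# local_only=FALSE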

I checked ASMLib and it reported a working status:

[oracle@node1.example.com] $ /etc/init.d/oracleasm status
Checking if ASM is loaded: yes
Checking if /dev/oracleasm is mounted: yes

That was promising: /dev/oracleasm/ was populated and the matching kernel modules were loaded. /etc/init.d/oracleasm listdisks listed all my disks as well, so inaccessible physical storage (PROC-26) seemed rather unlikely now.

I could rule out permission problems since ASMLib was working fine, and I also ruled out the kernel upgrade/missing module problem by comparing the ASMLib RPM with the kernel version: they matched. So maybe it was storage related after all?
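
The comparison itself is quick; the oracleasm kernel module package has to match the running kernel release exactly (the output shown is illustrative):

uname -r
# 2.6.18-194.11.4.0.1.el5

rpm -qa | grep -i oracleasm
# oracleasm-2.6.18-194.11.4.0.1.el5-2.0.5-1.el5
# oracleasm-support-2.1.3-1.el5
# oracleasmlib-2.0.4-1.el5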

Why did the node go down?

Good question, and usually one to ask the Unix administration team. Luckily I have a good contact right inside that team and could get the following excerpt from /var/log/messages around the time of the crash (6:31 this morning):

Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks in system zones - Block = 8192116, count = 1
Mar 17 06:26:06 node1 kernel: Aborting journal on device dm-2.
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_free_blocks_sb: Journal has aborted
Mar 17 06:26:06 node1 last message repeated 55 times
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks in system zones - Block = 8192216, count = 1
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_free_blocks_sb: Journal has aborted
Mar 17 06:26:06 node1 last message repeated 56 times
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks in system zones - Block = 8192166, count = 1
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_free_blocks_sb: Journal has aborted
Mar 17 06:26:06 node1 last message repeated 55 times
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks in system zones - Block = 8192122, count = 1
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_free_blocks_sb: Journal has aborted
Mar 17 06:26:06 node1 last message repeated 55 times
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks in system zones - Block = 8192140, count = 1
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_free_blocks_sb: Journal has aborted
Mar 17 06:26:06 node1 last message repeated 56 times
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2): ext3_free_blocks: Freeing blocks in system zones - Block = 8192174, count = 1
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_free_blocks_sb: Journal has aborted
Mar 17 06:26:06 node1 last message repeated 10 times
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_reserve_inode_write: Journal has aborted
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_truncate: Journal has aborted
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_reserve_inode_write: Journal has aborted
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_orphan_del: Journal has aborted
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_reserve_inode_write: Journal has aborted
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2) in ext3_delete_inode: Journal has aborted
Mar 17 06:26:06 node1 kernel: __journal_remove_journal_head: freeing b_committed_data
Mar 17 06:26:06 node1 kernel: ext3_abort called.
Mar 17 06:26:06 node1 kernel: EXT3-fs error (device dm-2): ext3_journal_start_sb: Detected aborted journal
Mar 17 06:26:06 node1 kernel: Remounting filesystem read-only
Mar 17 06:26:06 node1 kernel: __journal_remove_journal_head: freeing b_committed_data
Mar 17 06:26:06 node1 snmpd[25651]: Connection from UDP: [127.0.0.1]:19030
Mar 17 06:26:06 node1 snmpd[25651]: Received SNMP packet(s) from UDP: [127.0.0.1]:19030
Mar 17 06:26:06 node1 snmpd[25651]: Connection from UDP: [127.0.0.1]:19030
Mar 17 06:26:06 node1 snmpd[25651]: Connection from UDP: [127.0.0.1]:41076
Mar 17 06:26:06 node1 snmpd[25651]: Received SNMP packet(s) from UDP: [127.0.0.1]:41076
Mar 17 06:26:09 node1 kernel: SysRq : Resetting
Mar 17 06:31:15 node1 syslogd 1.4.1: restart.

So it looks like a file system error triggered the reboot; I’m glad the box came back up OK on its own. The Clusterware alert log, $GRID_HOME/log/<hostname>/alert<hostname>.log, didn’t show anything specific to storage. Normally you would see it start counting a node down if it had lost contact with the voting disks (in this case the OCR and voting disks share the same diskgroup).
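
Spelled out for this cluster, and assuming the GRID_HOME used in the script above, that log lives here (the hostname is the short node name):

tail -100 /u01/crs/11.2.0.2/log/$(hostname -s)/alert$(hostname -s).log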

And why does Clusterware not start?

After some more investigation it seemed there was no underlying problem with the storage, so I tried to start the cluster manually while trailing the ocssd.log file for possible clues.
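
In 11.2 the CSSD log lives under the Grid home, so the tail command looks like this (again assuming the GRID_HOME used earlier):

tail -f /u01/crs/11.2.0.2/log/$(hostname -s)/cssd/ocssd.log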

[root@node1 ~]# crsctl start cluster
CRS-2672: Attempting to start 'ora.cssd' on 'node1'
CRS-2674: Start of 'ora.cssd' on 'node1' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'node1'
CRS-2681: Clean of 'ora.cssd' on 'node1' succeeded
CRS-5804: Communication error with agent process
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node1'
CRS-2676: Start of 'ora.cssdmonitor' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node1'

… the command eventually failed. The ocssd.log file showed this:

...
2011-03-17 09:47:49.073: [GIPCHALO][1081923904] gipchaLowerProcessNode: no valid interfaces found to node for 10996354 ms, node 0x2aaab008a260 { host 'node4', haName 'CSS_lngdsu1-c1', srcLuid b04d4b7b-a7491097, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [61 : 61], createTime 10936224, flags 0x4 }
2011-03-17 09:47:49.084: [GIPCHALO][1081923904] gipchaLowerProcessNode: no valid interfaces found to node for 10996364 ms, node 0x2aaab008a630 { host 'node6', haName 'CSS_lngdsu1-c1', srcLuid b04d4b7b-2f6ece1c, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [61 : 61], createTime 10936224, flags 0x4 }
2011-03-17 09:47:49.113: [    CSSD][1113332032]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2011-03-17 09:47:49.158: [    CSSD][1090197824]clssnmvDHBValidateNCopy: node 2, node2, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 30846440, LATS 10996434, lastSeqNo 30846437, uniqueness 1300108895, timestamp 1300355268/3605443434
2011-03-17 09:47:49.158: [    CSSD][1090197824]clssnmvDHBValidateNCopy: node 3, node3, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31355257, LATS 10996434, lastSeqNo 31355254, uniqueness 1300344405, timestamp 1300355268/10388584
2011-03-17 09:47:49.158: [    CSSD][1090197824]clssnmvDHBValidateNCopy: node 4, node4, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31372473, LATS 10996434, lastSeqNo 31372470, uniqueness 1297097908, timestamp 1300355268/3605182454
2011-03-17 09:47:49.158: [    CSSD][1090197824]clssnmvDHBValidateNCopy: node 5, node5, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31384686, LATS 10996434, lastSeqNo 31384683, uniqueness 1297098093, timestamp 1300355268/3604696294
2011-03-17 09:47:49.158: [    CSSD][1090197824]clssnmvDHBValidateNCopy: node 6, node6, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31388819, LATS 10996434, lastSeqNo 31388816, uniqueness 1297098327, timestamp 1300355268/3604712934
2011-03-17 09:47:49.158: [    CSSD][1090197824]clssnmvDHBValidateNCopy: node 7, node7, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 29612975, LATS 10996434, lastSeqNo 29612972, uniqueness 1297685443, timestamp 1300355268/3603054884
2011-03-17 09:47:49.158: [    CSSD][1090197824]clssnmvDHBValidateNCopy: node 8, node8, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31203293, LATS 10996434, lastSeqNo 31203290, uniqueness 1297156000, timestamp 1300355268/3604855704
2011-03-17 09:47:49.161: [    CSSD][1085155648]clssnmvDHBValidateNCopy: node 3, node33, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31355258, LATS 10996434, lastSeqNo 31355255, uniqueness 1300344405, timestamp 1300355268/10388624
2011-03-17 09:47:49.161: [    CSSD][1085155648]clssnmvDHBValidateNCopy: node 4, node4, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31372474, LATS 10996434, lastSeqNo 31372471, uniqueness 1297097908, timestamp 1300355268/3605182494
2011-03-17 09:47:49.161: [    CSSD][1085155648]clssnmvDHBValidateNCopy: node 5, node5, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31384687, LATS 10996434, lastSeqNo 31384684, uniqueness 1297098093, timestamp 1300355268/3604696304
2011-03-17 09:47:49.161: [    CSSD][1085155648]clssnmvDHBValidateNCopy: node 6, node6, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31388821, LATS 10996434, lastSeqNo 31388818, uniqueness 1297098327, timestamp 1300355268/3604713224
2011-03-17 09:47:49.161: [    CSSD][1085155648]clssnmvDHBValidateNCopy: node 7, node7, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 29612977, LATS 10996434, lastSeqNo 29612974, uniqueness 1297685443, timestamp 1300355268/3603055224
2011-03-17 09:47:49.197: [    CSSD][1094928704]clssnmvDHBValidateNCopy: node 2, node2, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 30846441, LATS 10996474, lastSeqNo 30846438, uniqueness 1300108895, timestamp 1300355269/3605443654
2011-03-17 09:47:49.197: [    CSSD][1094928704]clssnmvDHBValidateNCopy: node 3, node3, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31355259, LATS 10996474, lastSeqNo 31355256, uniqueness 1300344405, timestamp 1300355268/10389264
2011-03-17 09:47:49.197: [    CSSD][1094928704]clssnmvDHBValidateNCopy: node 8, node8, has a disk HB, but no network HB, DHB has rcfg 176226183, wrtcnt, 31203294, LATS 10996474, lastSeqNo 31203291, uniqueness 1297156000, timestamp 1300355269/3604855914
2011-03-17 09:47:49.619: [    CSSD][1116485952]clssnmSendingThread: sending join msg to all nodes
...

The interesting bit is the “no network HB”, i.e. something must be wrong with the network configuration. I quickly checked the output of ifconfig and found an entry missing for my private interconnect. The interconnect adapter is recorded in the GPnP profile if you are unsure which one it is; in my case it pointed at bond1.251.
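
If you want to check which adapter the GPnP profile names for the cluster interconnect, there are two ways to display it (11.2 paths, GRID_HOME as above; gpnptool simply prints the profile XML):

/u01/crs/11.2.0.2/bin/gpnptool get 2>/dev/null

# or read the local copy straight from disk
cat /u01/crs/11.2.0.2/gpnp/$(hostname -s)/profiles/peer/profile.xml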

Now that’s a starting point! I tried to bring up bond1.251, but that failed:

[root@node1 network-scripts]# ifup bond1.251
ERROR: trying to add VLAN #251 to IF -:bond1:-  error: Invalid argument
ERROR: could not add vlan 251 as bond1.251 on dev bond1

The “invalid argument” didn’t mean much to me, so I ran the ifup script under “bash -x” to get more information about which call actually failed:

[root@node1 network-scripts]# which ifup
/sbin/ifup
[root@node1 network-scripts]# view /sbin/ifup
# turned out it's a shell script! Let's run with debug output enabled
[root@node1 network-scripts]# bash -x /sbin/ifup bond1.251
+ unset WINDOW
...
+ MATCH='^(eth|hsi|bond)[0-9]+\.[0-9]{1,4}$'
+ [[ bond1.251 =~ ^(eth|hsi|bond)[0-9]+\.[0-9]{1,4}$ ]]
++ echo bond1.251
++ LC_ALL=C
++ sed 's/^[a-z0-9]*\.0*//'
+ VID=251
+ PHYSDEV=bond1
+ [[ bond1.251 =~ ^vlan[0-9]{1,4}? ]]
+ '[' -n 251 ']'
+ '[' '!' -d /proc/net/vlan ']'
+ test -z ''
+ VLAN_NAME_TYPE=DEV_PLUS_VID_NO_PAD
+ /sbin/vconfig set_name_type DEV_PLUS_VID_NO_PAD
+ is_available bond1
+ LC_ALL=
+ LANG=
+ ip -o link
+ grep -q bond1
+ '[' 0 = 1 ']'
+ return 0
+ check_device_down bond1
+ echo bond1
+ grep -q :
+ LC_ALL=C
+ ip -o link
+ grep -q 'bond1[:@].*,UP'
+ return 1
+ '[' '!' -f /proc/net/vlan/bond1.251 ']'
+ /sbin/vconfig add bond1 251
ERROR: trying to add VLAN #251 to IF -:bond1:-  error: Invalid argument
+ /usr/bin/logger -p daemon.info -t ifup 'ERROR: could not add vlan 251 as bond1.251 on dev bond1'
+ echo 'ERROR: could not add vlan 251 as bond1.251 on dev bond1'
ERROR: could not add vlan 251 as bond1.251 on dev bond1
+ exit 1

Hmm, so it seemed that the underlying interface bond1 was missing, which was true: the output of ifconfig didn’t show it as configured, and trying to start it manually using ifup bond1 failed as well. It turned out that the ifcfg-bond1 file was missing and had to be recreated from the documentation; on Red Hat based systems the network configuration files live in /etc/sysconfig/network-scripts/ifcfg-<interfaceName>. With the recreated file in place, I was back in the running:

[root@node1 network-scripts]# ll *bond1*
-rw-r--r-- 1 root root 129 Mar 17 10:07 ifcfg-bond1
-rw-r--r-- 1 root root 168 May 19  2010 ifcfg-bond1.251
[root@node1 network-scripts]# ifup bond1
[root@node1 network-scripts]# ifup bond1.251
Added VLAN with VID == 251 to IF -:bond1:-
[root@node1 network-scripts]#
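
For reference, a minimal ifcfg-bond1 for a bond that only carries tagged VLANs looks roughly like the sketch below; the bonding options are illustrative and should come from your own documentation rather than be copied verbatim:

# /etc/sysconfig/network-scripts/ifcfg-bond1 (illustrative values)
DEVICE=bond1
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
# no IPADDR here - the address lives on the tagged interface bond1.251
BONDING_OPTS="miimon=100 mode=active-backup"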

Now I could try to start the lower stack again with “crsctl start cluster”:

CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node1'
CRS-2676: Start of 'ora.cssdmonitor' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node1'
CRS-2676: Start of 'ora.cssd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'node1'
CRS-2672: Attempting to start 'ora.ctssd' on 'node1'
CRS-2676: Start of 'ora.ctssd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'node1'
CRS-2676: Start of 'ora.evmd' on 'node1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'node1' succeeded
CRS-2679: Attempting to clean 'ora.asm' on 'node1'
CRS-2681: Clean of 'ora.asm' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'node1'
CRS-2676: Start of 'ora.asm' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'node1'
CRS-2676: Start of 'ora.crsd' on 'node1' succeeded
[root@node1 network-scripts]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Brilliant, problem solved. This is actually the first time an incorrect network configuration has prevented a cluster I look after from starting. The best indication in this case was in the gipcd log file, but it didn’t occur to me to look at it because the errors initially pointed so clearly at storage.
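
For next time, this is the log that would have pointed at the network problem straight away (11.2 layout, GRID_HOME as above):

less /u01/crs/11.2.0.2/log/$(hostname -s)/gipcd/gipcd.log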