Search

OakieTags

Who's online

There are currently 0 users and 33 guests online.

Recent comments

Affiliations

linux

Which Linux do you pick for Oracle Installations?

There was an interesting thread on the OakTable mailing list the other day regarding the choice of Linux distros for Oracle installations. It was started by one member (the name has been withheld to protect the innocent :) ) who said,

“I cannot imagine (but want to understand) why anyone would pick RHEL5.6 for Oracle as opposed to the vastly superior OEL with the UEK.”

I must admit I’ve kinda forgotten that any distro apart from Oracle Linux (OL) exists as far as production installations of Oracle software are concerned.

Some of the reasons cited for people not to pick OL include:

  • The customer has a long relationship with Red Hat and doesn’t want to jump ship.
  • RHEL is the market leading enterprise Linux distro, so why switch to Oracle?
  • The customer doesn’t want to be too dependent on Oracle.
  • The customer has lots of non-Oracle servers running RHEL and doesn’t want a mix of RHEL and OL as it would complicate administration.
  • The customer uses some software that is certified against RHEL, but not OL.
  • The customer prefers Red Hat support over Oracle support. Wait. Red Hat and support in the same sentence. Give me a minute to stop laughing…
  • The customer is using VMware for Virtualization and the Unbreakable Enterprise Kernel (UEK) is not supported on VMware.

I guess every company and individual will have differing justifications for their choice of distro.

So why would you pick OL and Unbreakable Enterprise Kernel (UEK) for Oracle installations?

  • You can run it for free if you don’t want OS support. Using OL without support doesn’t affect the support status of the products (DB, App Servers etc.) running on top of it.
  • It’s what Oracle use to write the Linux version of the products.
  • It’s what Exadata uses.
  • Oracle products are now certified against the OL + UEK before they are certified against the RHEL kernel.
  • UEK is typically a much more up to date version of the kernel than that shipped by RHEL and includes all the patches vital for optimum Oracle performance.
  • Single vendor, so no finger pointing over support issues (from Google+ comment).
  • It is the only enterprise Linux distro that supports kernel patching without reboots thanks to Oracle’s newly aquired Ksplice.

For more information you might want to read this whitepaper or watch this webcast.

If you are looking at things from a purely technical perspective, I guess you are going to pick OL and UEK. Of course, many of us don’t work in a world where technology is picked purely on its merits. :)

Cheers

Tim…

Update: Check out this post by Jay Weinshenker for a different angle on this issue.




Using Connection Manager to protect a database

I have known about Oracle’s connection manager for quite a while but never managed to use it in anger. In short there was no need to do so. Now however I have been asked to help in finding a solution to an interesting problem.

In summary, my customer is running DR and UAT in the same data centre. Now for technical reasons they rely on RMAN to refresh UAT, no array mirror splits on the SAN (which would be way faster!) possible. The requirement is to prevent an RMAN session with target=UAT and auxiliary=DR from overwriting the DR databases, all of which are RAC databases on 11.2.The architecture included 2 separate networks for the DR hosts and the UAT hosts. DR was on 192.168.99.0/24 whereas UAT was on 192.168.100.0/24. A gateway host with two NICs connects the two. Initially there was a firewall on the gateway host to prevent traffic from UAT to DR, but because of the way Oracle connections work this proved impossible (the firewall was a simple set of IPTABLES rules). After initial discussions I decided to look at connection manager more closely as that is hailed as a solution to Oracle connectivity and firewalls.

Thinking more closely about the problem I realised that the firewall + static routes approach might not be the best one. So I decided to perform a test which didn’t involve IPTABLES or custom routes, and rely on CMAN only to protect the database. A UAT refresh isn’t something that’s going to happen very frequently, which allows me to shut down CMAN, and thus prevent any communication between the two networks. This is easier than maintaining a firewall-remember the easiest way to do something is not to do it at all.

Overview of Connection Manager

For those of you who aren’t familiar with CMAN, here’s a short summary (based on the official Oracle documentation).

Configuration of Oracle Connection Manager (CMAN) allows the clients to connect through a firewall [I haven’t verified this yet, ed]. CMAN is an executable that allows clients to connect despite a firewall being in place between the client and server. CMAN is similar to the Listener in that it reads a configuration file [called CMAN.ora, ed], which contains an address that Oracle Connection Manager listens for incoming connections. CMAN starts similar to the Listener and will enter a LISTEN state.

This solution [to the firewall issue with TCP redirects, ed] will make the REDIRECT happen inside the firewall and the client will not see it; CMAN comports as a proxy service between the client and the real database listener.

Interestingly, Connection Manager is fully integrated into the FAN/FCF framework and equally suitable for UCP connection pools.

Technically speaking Oracle database instances require the initialisation parameters local_listener and remote_listener to be set. In RAC databases, this is usually the case out of the box, however, in addition to the SCAN, the remote_listener must include the CMAN listener as well-an example is provided in this document.

Installing Connection Manager

Connection Manager is now part of the Oracle client, and you can install it by choosing the “custom” option. From the list of selectable options, pick “Oracle Net Listener” and “Oracle Connection Manager”.

From there on it’s exactly the same as any other client installation.

Testing

A quick test with 2 separate networks reveals that the concept actually works. The following hosts are used:

  • cman                     192.168.99.224
  • cmandb                192.168.100.225
  • client                     192.168.99.31

The networks in use are:

  • Public network:  192.168.99.0/24
  • Private network: 192.168.100.0/24

As you can see CMANDB is on a different network than the other hosts.

Connection manager has been installed on host “cman”, with IP 192.168.99.224. The listener process has been configured to listen on port 1821. The corresponding cman.ora file has been configured as follows in $CLIENT_HOME/network/admin:

cman1 =
(configuration=
(address=(protocol=tcp)(host=192.168.99.224)(port=1821))
(rule_list=
(rule=(src=*)(dst=127.0.0.1)(srv=cmon)(act=accept))
(rule=(src=192.168.99.0/24)(dst=192.168.100.225)(srv=*)(act=accept))
)
)

The file has been left in this minimalistic state deliberately. The only connection possible is to the database host. The cmon service must be allowed or otherwise the startup of the connection manager processes will fail.

The gateway host has 2 network interfaces, one for each network:

[root@cman ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:16:3E:F2:34:56
inet addr:192.168.99.224  Bcast:192.168.99.255  Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
RX packets:1209 errors:0 dropped:0 overruns:0 frame:0
TX packets:854 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:105895 (103.4 KiB)  TX bytes:147462 (144.0 KiB)
Interrupt:17

eth1      Link encap:Ethernet  HWaddr 00:16:3E:52:4A:56
inet addr:192.168.100.224  Bcast:192.168.100.255  Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
RX packets:334 errors:0 dropped:0 overruns:0 frame:0
TX packets:151 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:36425 (35.5 KiB)  TX bytes:22141 (21.6 KiB)
Interrupt:16

Its routing table is defined as follows:

[root@cman ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.100.0   0.0.0.0         255.255.255.0   U     0      0        0 eth1
192.168.101.0   0.0.0.0         255.255.255.0   U     0      0        0 eth2
192.168.99.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth2
[root@cman ~]

Configuration on host “CMANDB”

Note that host 192.168.100.225 is on the private network! The database CMANDB has its local and remote listener configured using the below entries in tnsnames.ora:

[oracle@cmandb admin]$ cat tnsnames.ora

[oracle@cmandb admin]$ cat tnsnames.ora
LOCAL_CMANDB =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(PORT = 1521)(HOST = 192.168.100.225))
)
)

REMOTE_CMANDB =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(PORT = 1821)(HOST = 192.168.99.224))
(ADDRESS = (PROTOCOL = TCP)(PORT = 1521)(HOST = 192.168.100.225))
)
)

CMAN_LSNR =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(PORT = 1521)(HOST = 192.168.99.224))
)
)
  • local listener is set to local_cmandb
  • remote listener is set to remote_cmandb
  • CMAN_LSNR is used for a test to verify that a connection to the CMAN listener is possible from this host

The host had only one network interface, connecting to the CMAN host:

[root@cmandb ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:16:3E:12:14:51
inet addr:192.168.100.225  Bcast:192.168.100.255  Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
RX packets:627422 errors:0 dropped:0 overruns:0 frame:0
TX packets:456584 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2241758430 (2.0 GiB)  TX bytes:32751153 (31.2 MiB)

The only change to the system was the addition of a default gateway. Unfortunately the CMAN process cannot listen on more than one IP address:

# route add default gw 192.168.100.224 eth0

The following routing table was in use during the testing:

[root@cmandb ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.100.0   0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
0.0.0.0         192.168.100.224 0.0.0.0         UG    0      0        0 eth0

After the database listener parameters have been changed to LOCAL_CMANDB and REMOTE_CMANDB, the alert log on the CMAN host recorded a number of service registrations. This is important as it allows Connection Manager to hand the connection request off to the database:

[oracle@cman trace]$ grep cmandb *
08-JUL-2011 10:19:55 * service_register * cmandb * 0
08-JUL-2011 10:25:28 * service_update * cmandb * 0
08-JUL-2011 10:35:28 * service_update * cmandb * 0
[...]
08-JUL-2011 11:15:10 * service_update * cmandb * 0
08-JUL-2011 11:15:28 * service_update * cmandb * 0
[oracle@cman trace]$

Additionally, the CMAN processes now know about the database service CMANDB:

CMCTL:cman1> show services
Services Summary...
Proxy service "cmgw" has 1 instance(s).
Instance "cman", status READY, has 2 handler(s) for this service...
Handler(s):
"cmgw001" established:0 refused:0 current:0 max:256 state:ready

(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=23589))
"cmgw000" established:0 refused:0 current:0 max:256 state:ready

(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=31911))
Service "cmandb" has 1 instance(s).
Instance "cmandb", status READY, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:0 refused:0 state:ready
REMOTE SERVER
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(PORT=1521)(HOST=192.168.100.225)))
Service "cmandbXDB" has 1 instance(s).
Instance "cmandb", status READY, has 1 handler(s) for this service...
Handler(s):
"D000" established:0 refused:0 current:0 max:1022 state:ready
DISPATCHER 
(ADDRESS=(PROTOCOL=tcp)(HOST=cmandb.localdomain)(PORT=50347))
Service "cmon" has 1 instance(s).
Instance "cman", status READY, has 1 handler(s) for this service...
Handler(s):
"cmon" established:1 refused:0 current:1 max:4 state:ready

(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=59541))
The command completed successfully.
CMCTL:cman1>

Connectivity Test

With the setup completed it was time to perform a test from a third host on the (public) network. Its IP address is 192.168.99.31. The below TNSnames entries were created:

[oracle@client admin]$ cat tnsnames.ora
CMAN =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PORT=1821)(HOST=192.168.99.224)(PROTOCOL=TCP))
)
(CONNECT_DATA =
(SERVICE_NAME = cmandb)
(SERVER = DEDICATED)
)
)

DIRECT =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PORT=1521)(HOST=192.168.100.225)(PROTOCOL=TCP))
)
(CONNECT_DATA =
(SERVICE_NAME = cmandb)
(SERVER = DEDICATED)
)
)

The CMAN entry uses the Connection Manager gateway host to connect to database CMANDB, whereas the DIRECT entry tries to bypass the latter. A tnsping should show whether or not this is possible.

[oracle@client admin]$ tnsping direct
TNS Ping Utility for Linux: Version 11.2.0.1.0 - Production on 08-JUL-2011 11:19:47

Copyright (c) 1997, 2009, Oracle.  All rights reserved.

Used parameter files:

Used TNSNAMES adapter to resolve the alias

Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PORT=1521)(HOST=192.168.100.225)(PROTOCOL=TCP))) (CONNECT_DATA = (SERVICE_NAME = cmandb) (SERVER = DEDICATED)))

TNS-12543: TNS:destination host unreachable

[oracle@client admin]$ tnsping cman

TNS Ping Utility for Linux: Version 11.2.0.1.0 - Production on 08-JUL-2011 10:53:27

Copyright (c) 1997, 2009, Oracle.  All rights reserved.

Used parameter files:

Used TNSNAMES adapter to resolve the alias

Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PORT=1821)(HOST=192.168.99.224)(PROTOCOL=TCP))) (CONNECT_DATA = (SERVICE_NAME = cmandb) (SERVER = DEDICATED)))

OK (0 msec)

[oracle@client admin]$

Now that proves that a direct connection is impossible, and also that the connection manager’s listener is working. A tnsping doesn’t imply that a connection is possible though, this requires an end to end test with SQL*Plus:

[oracle@client admin]$ sqlplus system/xxx@cman

SQL*Plus: Release 11.2.0.1.0 Production on Fri Jul 8 10:55:39 2011

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select instance_name,host_name from v$instance;

INSTANCE_NAME
----------------
HOST_NAME
----------------------------------------------------------------
cmandb
cmandb.localdomain

The successful connection is also recorded in the CMAN log file:

Fri Jul 08 10:55:39 2011

08-JUL-2011 10:55:39 * (CONNECT_DATA=(SERVICE_NAME=cmandb)(SERVER=DEDICATED)(CID=(PROGRAM=sqlplus@client)(HOST=client)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.99.31)(PORT=16794)) * establish * cmandb * 0

(LOG_RECORD=(TIMESTAMP=08-JUL-2011 10:55:39)(EVENT=Ready)(CONN NO=0))

VirtualBox 4.0.10 Released…

VirtualBox 4.0.10 released. It’s a maintenance released with a number of bug fixes. Happy upgrading! :)

Cheers

Tim…




UltraEdit 2.2 for Mac/Linux released…

UltraEdit 2.2 has been released for Mac and Linux. There is Fedora 15 download now, which is good. It’s nice to see the continued progress of these versions.

If you are a Windows user, version 17.10 is out now for you guys.

Cheers

Tim…




Open VPN for the Road Warrior

As a consultant it is important to have a test lab, something which is your own, where you can play with new versions and concepts to your heart’s delight without disturbing anyone else. Or worse, causing problems for the customer. For this reason I like to have an Internet facing machine which I can connect to from anywhere. In case the corporate network doesn’t let you out, consider getting mobile broadband on a PAYG basis-it works a dream!

I have blogged about my system a number of times, with special emphasis on RHEL 5.x and 6.x. Unlike many other Oracle scientists I do not use Virtual Box or VMWare for virtualisation, but rather Xen. When I started looking at para-virtualisation I looked at Oracle VM but at the time it was lacking features I wanted such as iSCSI provided storage. OpenSuSE is a great distribution which offers a dom0 kernel out of the box, and this is what I went for. My lab can support a four node 11.2.0.2.2 cluster plus two Grid Control domUs, which is more than enough for me to work with. And although it’s busy, it doesn’t make working with the machine impossible.

For a long time I was very happy with my machine, and SSH access was all I needed. But when moving to Vista/Windows 7 a problem became apparent: I could no longer use port-forwarding to access my samba server on my backup domU. Microsoft added some other software to listen on the required port so I started looking at OpenVPN as a solution. This article assumes that you are familiar with OpenVPN-if you are not then you might want to have a look at the documentation. The howto is a a great starting point:

http://openvpn.net/index.php/open-source/documentation/howto.html

Actually, the OpenVPN tutorial is just fantastic, and it is all you need to set your clients and server up. What it failed to explain is how to access your machines behind the gateway. My dom0 has a public IP address to which OpenVPN binds. This public IP is assigned to my first network bridge, br0. I have three more virtual bridges in the system, not related to a physical interface. VMWare calls such a network a “host only” network. I am using br1 as the public interface for my domUs, br2 is used as a private interconnect. The final one, br3 is either  the storage network for my iSCSI nodes or-for my 11.2.0.2.2 systems it serves as an alternative interface to the HAIP resource.

So all in all a reasonable setup I think. The trouble is that I didn’t know how to access my database servers in the br1 subnet. This is another one of those tips that look so obvious, but took literally hours to work out. I am not concerned with the private networks br2 and br3, after all those are not meant to be used anyway. The simplified view on my network is shown in this figure:

The workstation is my laptop-obviously.The figure is not 100% correct-the device openVPN connections go to is called tun0, but it’s related to my br0 device.

First of all, I needed to allow communication from device tun0, to which openVPN listens to, to be forwarded to br1. The following iptables rules do exactly that:

# iptables -I FORWARD -i tun0 -o br1 ACCEPT
# iptables -I FORWARD -i br1 -o tun0 -j ACCEPT

Translated into English these 2 lines instruct the firewall to ACCEPT traffic into the FORWARD chain from device tun0 to br1. In other words route the packet to the internal network. The second line allows traffic from the internal network to traverse the tunnel device tun0.

With this in place, you can instruct openVPN to push routes to the client. In my case this is done in the configuration file used to start the openVPN daemon in server mode. My config file, server.conf, contains this line:

push “route 192.168.99.0 255.255.255.0″

When a client connects to the OpenVPN server, this route is automatically added to the local routing table with the correct gateway. Note that on Windows with UAC enabled you have to start OpenVPN as administrator or otherwise it won’t be able to initialise the TAP device properly.

Are we there yet? I’m afraid not.

This is only half the story as I found out the hard way. After what seems like endless troubleshooting routing tables (which were correct all the time!) I found that I had to set a default gateway on the domUs, pointing to the dom0’s address of br1. This is as simple as updating /etc/sysconfig/network-scripts/ifcfg-ethx, and adding this line to it:

GATEWAY=x.x.x.x

Either use your command line skills to add the default route or – like I did – bring the interface down and up again. Again, where gateway is the IP address of the br1 device on the dom0.

Hope this saves you a few hours of troubleshooting.

Fedora 16 may use Btrfs by default…

Looks like Fedora 16 might use Btrfs as the default filesystem.

I hope the “Oracle doesn’t understand Open Source” brigade remember where this project started. :)

Cheers

Tim…




Fedora 15: Further Observations…

I’ve mentioned Fedora 15 a couple of times recently:

A couple more things I’ve noticed along the way:

  • Lots of the early press complained about the “reduced productivity” of the GNOME3 desktop. Having used it for a while now I can safely say it doesn’t impact me at all. When I’m using my MacBook Pro I have to move the mouse down to the bottom of the screen to activate the doc. It’s like a reflex reaction. I’ve found I do much the same thing with GNOME3. If I’m using the mouse to switch to Activities (rather than the Windows key), I make a circular anti-clockwise motion with the mouse so it touches the top-left corner of the screen. That switches to Activities without a mouse click, so on the down move of the circle I’m over the doc. It’s really pretty stress free. A single mouse move with no extra clicks. Sorted.
  • I did an upgrade from Fedora 14 on one machine. Fedora 15 includes LibreOffice in place of OpenOffice. As a result OpenOffice was not “upgraded”, and the existing OpenOffice installation stopped working. It wasn’t a problem, I just installed LibreOffice, just an observation.

Cheers

Tim…




Tuning Log and Trace levels for Clusterware 11.2

With the introduction of Clusterware 11.2 a great number of command line tools have either been deprecated ($ORA_CRS_HOME/bin/crs_* and others) or merged into other tools. This is especially true for crsctl, which is now the tool to access and manipulate low level resources in Clusterware.

This also implies that some of the notes on Metalink are no longer applicable to Clusterware 11.2, such as the one detailing how to get more detailed information in the logs. Not that the log information wasn’t already rather comprehensive if you asked me…

And here comes a warning: don’t change the log levels unless you have a valid reason, or under the instructions of support. Higher log levels than the defaults tend to generate too much data, filling up the GRID_HOME and potentially killing the node.

Log File Location

The location for logs in Clusterware hasn’t changed much since the unified log structure was introduced in 10.2 and documented in “CRS and 10g/11.1 Real Application Clusters (Doc ID 259301.1)”. It has been extended though, and quite dramatically so in 11.2, which is documented as well in one of the better notes from support: “11gR2 Clusterware and Grid Home – What You Need to Know (Doc ID 1053147.1)”

The techniques for getting debug and trace information as described for example in “Diagnosability for CRS / EVM / RACG (Doc ID 357808.1)” doesn’t really apply any more as the syntax changed.

Getting Log Levels in 11.2

If you are interested at which log level a specific Clusterware resource operates, you can use the crsctl get log resource resourceName call, as in this example:

# crsctl get log res ora.asm
Get Resource ora.asm Log Level: 1

Don’t forget to apply the “-init” flag when you want to query resources which are part of the lower stack:

[root@lonengbl312 ~]# crsctl get log res ora.cssd
CRS-4655: Resource ‘ora.cssd’ could not be found.
CRS-4000: Command Get failed, or completed with errors.
[root@lonengbl312 ~]# crsctl get log res ora.cssd –init
Get Resource ora.cssd Log Level: 1

Interestingly, most Clusterware daemons start leaving a detailed message in their log file as to which module uses what logging level. Take CSSD for example:

2011-06-02 09:06:14.847: [    CSSD][1531968800]clsu_load_ENV_levels: Module = CSSD, LogLevel = 2, TraceLevel = 0
2011-06-02 09:06:14.847: [    CSSD][1531968800]clsu_load_ENV_levels: Module = GIPCNM, LogLevel = 2, TraceLevel = 0
2011-06-02 09:06:14.847: [    CSSD][1531968800]clsu_load_ENV_levels: Module = GIPCGM, LogLevel = 2, TraceLevel = 0
2011-06-02 09:06:14.847: [    CSSD][1531968800]clsu_load_ENV_levels: Module = GIPCCM, LogLevel = 2, TraceLevel = 0
2011-06-02 09:06:14.847: [    CSSD][1531968800]clsu_load_ENV_levels: Module = CLSF, LogLevel = 0, TraceLevel = 0
2011-06-02 09:06:14.847: [    CSSD][1531968800]clsu_load_ENV_levels: Module = SKGFD, LogLevel = 0, TraceLevel = 0
2011-06-02 09:06:14.847: [    CSSD][1531968800]clsu_load_ENV_levels: Module = GPNP, LogLevel = 1, TraceLevel = 0
2011-06-02 09:06:14.847: [    CSSD][1531968800]clsu_load_ENV_levels: Module = OLR, LogLevel = 0, TraceLevel = 0

This doesn’t match the output of the previous command. Now how do you check this on the command line? crsctl to the rescue again-it now has a “log” option:

# crsctl get log -h
Usage:
  crsctl get {log|trace} {mdns|gpnp|css|crf|crs|ctss|evm|gipc} ",..."
 where
   mdns          multicast Domain Name Server
   gpnp          Grid Plug-n-Play Service
   css           Cluster Synchronization Services
   crf           Cluster Health Monitor
   crs           Cluster Ready Services
   ctss          Cluster Time Synchronization Service
   evm           EventManager
   gipc          Grid Interprocess Communications
   , ...    Module names ("all" for all names)

  crsctl get log res 
 where
        Resource name

That’s good information, but the crucial bit about the and other parameters is missing. The next question I asked myself was “how do I find out which sub-modules make are part of say, crsd. Apart from looking at the trace file”. That information used to be documented in the OTN reference, but it’s now something you can evaluate from Clusterware as well:

[root@lonengbl312 ~]# crsctl lsmodules -h
Usage:
  crsctl lsmodules {mdns|gpnp|css|crf|crs|ctss|evm|gipc}
 where
   mdns  multicast Domain Name Server
   gpnp  Grid Plug-n-Play Service
   css   Cluster Synchronization Services
   crf   Cluster Health Monitor
   crs   Cluster Ready Services
   ctss  Cluster Time Synchronization Service
   evm   EventManager
   gipc  Grid Interprocess Communications

Back to the question as to which modules make up CSSD, I got this answer from Clusterware, matching the log file output:

# crsctl lsmodules css
List CSSD Debug Module: CLSF
List CSSD Debug Module: CSSD
List CSSD Debug Module: GIPCCM
List CSSD Debug Module: GIPCGM
List CSSD Debug Module: GIPCNM
List CSSD Debug Module: GPNP
List CSSD Debug Module: OLR
List CSSD Debug Module: SKGFD

To get detailed information about the log level of CSSD:module, I can now finally use the crsctl get {log|trace} daemon:module command. My CSSD logfile stated the CSSD:OLR had log level 0 and trace level 0, confirmed by Clusterware:

# crsctl get log css “OLR”
Get CSSD Module: OLR  Log Level: 0

Note that the name of the module has to be in upper case. If you are lazy like me you could use the “all” keyword instead of the module name to list all the information in one command:

# crsctl get log css all
Get CSSD Module: CLSF  Log Level: 0
Get CSSD Module: CSSD  Log Level: 2
Get CSSD Module: GIPCCM  Log Level: 2
Get CSSD Module: GIPCGM  Log Level: 2
Get CSSD Module: GIPCNM  Log Level: 2
Get CSSD Module: GPNP  Log Level: 1
Get CSSD Module: OLR  Log Level: 0
Get CSSD Module: SKGFD  Log Level: 0

Setting Log Levels

Let’s have a look how to increase log levels for the critical components. I am picking CSSD here, simply because I want to know if Oracle records the chicken-and-egg problem when having voting files in ASM in the ocssd.log file. (CSSD needs to voting files to start, and ASM needs CSSD to run, but if the voting files are in ASM, then we have gone full circle). With all the information provided in the section above, that’s actually quite simple to do.

# crsctl set log -h
Usage:
  crsctl set {log|trace} {mdns|gpnp|css|crf|crs|ctss|evm|gipc} "=,..."
 where
   mdns          multicast Domain Name Server
   gpnp          Grid Plug-n-Play Service
   css           Cluster Synchronization Services
   crf           Cluster Health Monitor
   crs           Cluster Ready Services
   ctss          Cluster Time Synchronization Service
   evm           EventManager
   gipc          Grid Interprocess Communications
   , ...    Module names ("all" for all names)
   , ...     Module log/trace levels

  crsctl set log res =
 where
      Resource name
                     Agent log levels

The object I would like to research is the voting file discovery. I know that the ASM disks have an entry in the header indicating whether or not a disk contains a voting file, in the kfdhdb.vfstart and kfdhdb.vfend fields as shown here:

# kfed read /dev/oracleasm/disks/OCR1  | grep kfdhdb.vf
kfdhdb.vfstart:                     256 ; 0x0ec: 0×00000100
kfdhdb.vfend:                       288 ; 0x0f0: 0×00000120

With the default log information, I can see the voting disk discovery happening as shown in this cut down version of the ocssd.log:

2011-06-02 09:06:19.941: [    CSSD][1078901056]clssnmvDDiscThread: using discovery string  for initial discovery
2011-06-02 09:06:19.941: [   SKGFD][1078901056]Discovery with str::
2011-06-02 09:06:19.941: [   SKGFD][1078901056]UFS discovery with ::
2011-06-02 09:06:19.941: [   SKGFD][1078901056]OSS discovery with ::
2011-06-02 09:06:19.941: [   SKGFD][1078901056]Discovery with asmlib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: str ::
2011-06-02 09:06:19.941: [   SKGFD][1078901056]Fetching asmlib disk :ORCL:OCR1:

2011-06-02 09:06:19.942: [   SKGFD][1078901056]Handle 0x58d2a60 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:OCR1:
2011-06-02 09:06:19.942: [   SKGFD][1078901056]Handle 0x58d3290 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:OCR2:
2011-06-02 09:06:19.942: [   SKGFD][1078901056]Handle 0x58d3ac0 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:OCR3:

2011-06-02 09:06:19.953: [    CSSD][1078901056]clssnmvDiskVerify: discovered a potential voting file
2011-06-02 09:06:19.953: [   SKGFD][1078901056]Handle 0x5c22670 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk  :ORCL:OCR1:
2011-06-02 09:06:19.954: [    CSSD][1078901056]clssnmvDiskVerify: Successful discovery for disk ORCL:OCR1, UID 4700d9a1-93094ff5-bf8b96e1-7efdb883, Pending CIN 0:1301401766:0, Committed CIN 0:1301401766:0
2011-06-02 09:06:19.954: [   SKGFD][1078901056]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x5c22670 for disk :ORCL:OCR1:

2011-06-02 09:06:19.956: [    CSSD][1078901056]clssnmvDiskVerify: Successful discovery of 3 disks
2011-06-02 09:06:19.956: [    CSSD][1078901056]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2011-06-02 09:06:19.957: [    CSSD][1078901056]clssnmCompleteVFDiscovery: Completing voting file discovery

Now I’m setting the new log level for SKGFD, as this seems to be responsible for the disk discovery.

# crsctl set log css “SKGFD:5″
Set CSSD Module: SKGFD  Log Level: 5

Does it make a difference? Restarting Clusterware should give more clues. Looking at the logfile I could tell there was no difference. I tried using the trace level now for the same module:

# crsctl get trace css SKGFD
Get CSSD Module: SKGFD  Trace Level: 5

Interestingly that wasn’t picked up!

$ grep “Module = SKGFD” ocssd.log | tail -n1
2011-06-02 11:31:06.707: [    CSSD][427846944]clsu_load_ENV_levels: Module = SKGFD, LogLevel = 2, TraceLevel = 0

So I decided to leave it at this point as I was running out of time for this test. I’d be interested if anyone has successfully raised the log level for this component. On the other hand, I wonder if it is ever necessary to do so. I can confirm however that raising the log level for the CSSD module created a lot more output, actually so much that it seems to equal a system wide SQL trace-so don’t change log levels unless support directs you to.


A more user friendly multipath.conf

During some recent work I did involving a stretched 11.2.0.2 RAC for a SAP implementation at a customer site I researched TimeFinder/Clone backups. As part of this exercise I have been able to experiment with RHEL (OEL) 5.6 and the new device mapper multipath package on the mount host. I have been very pleasantly surprise about this new feature which I’d like to share.

Background of this article

Device Mapper Multipath is the “native” Linux multipathing software, as opposed to vendor-supplied multipathing such as EMC’s Power Path or Hitachi’s HDLM.

My customer’s setup is rather unique for a SAP environment as it uses Oracle Enterprise Linux and not Solaris/SPARC or AIX on the Power platform with an active/passive solution. Well if that doesn’t make it sound unique, the fact that there is a plan to run Oracle 11.2.0.2 RAC potentially across sites using ASM and ACFS certainly makes this deployment stand out from the rest.

The reason I mention this is simple-it requires a lot of engineering effort to certify components for this combination. For example: it was not trivial to get vendor support for Solutions Enabler the storage engineering uses for connectivity with the VMAX arrays, and so on. After all, Oracle Enterprise is a fairly new platform and Red Hat certainly has an advantage when it comes to vendor certification.

What hasn’t been achieved was a certification of the EMC Power Path software for use with SAP on Linux, for reasons unknown to me. The result was simple: the setup will use the device-mapper multipath package that comes with the Linux distribution.

Configuring multipath

Now with this said I started looking at MOS to get relevant support notes about Linux’s native multipath and found some. The summary of this research is available on this blog, I have written about it in these articles:

What I didn’t know up to date was the fact that the multipath.conf file allows you to define the ownership and mode of a device. As an ASM user this is very important to me. Experience taught me that incorrect permissions are one of the main reasons for ASM failing to start. Remember that root owns block devices by default.

Consider this example:

multipaths {
...
multipath {
wwid        360a98000486e58526c34515944703277
alias       ocr01
mode        660
uid         501
gid         502
}
...
}

The above entry in the multipaths section translates into English as follows:

  • If you encounter a device with the WWID 36a…277
  • Give it an alias name of “OCR01” in /dev/mapper
  • Set the mode to 660 (i.e. rw-rw—)
  • Assign device ownership to the user with UID 501 (maps to “grid” on my system)
  • Assign the group of the device to 502 (maps to “asmdba” on my system)

The path settings are defined globally and don’t need to be mentioned explicitly for each device unless you prefer to override them. I like to use an alias although it isn’t really necessary since ASM relies on a 4k header in a block device to store its identity. If you don’t chose to alias a device I recommend you use the user friendly name instead, mainly for aesthetic reasons.

Why is this really cool? For me it implies two things:

  • I can now be independent of ASMLib which provided device name stability and set the permissions on the block devices correctly for ASM
  • I don’t need to create udev rules to set the permissions (or /etc/rc.local or whichever other way you chose before to set permissions)

Nice! One less headache, as I have to say that I didn’t really like udev…

Fedora 15: First big problem…

Yesterday I hit a pretty major problem with Fedora 15. I did a reboot and the login screen came up fine, but when I tried to log in I got a message saying,

failed to load session ‘gnome’

No options or alternatives. Just back to the login screen. ??

I started the machine up in “Full multiuser mode” by hitting the “a” key during boot and adding “3″ on to the boot parameters. Once at the login prompt I could now log in as root. Since it looked like it might be a GNOME problem I uninstalled and reinstalled GNOME.

yum -y groupremove "GNOME Desktop Environment"
yum -y groupinstall "GNOME Desktop Environment"

No change!

My next thought was to install KDE, so at least I would have a desktop. I did this using,

yum -y groupinstall kde

I made KDE the default window manager by editing the “/etc/sysconfig/desktop” file to contain.

DISPLAYMANAGER=KDE

The machine now rebooted and I got KDM as the display manager. This allowed me to start KDE, but surprisingly, also allowed me to start GNOME as my window manager.

Now I figured it was probably an issue with GDM, not GNOME itself, so I reinstalled GDM.

yum -y remove gdm
yum -y install gdm
yum -y install gdm-plugin-fingerprint

Bingo. I was now able to switch back to GDM as my display manager by editing the “/etc/sysconfig/desktop” file to contain.

DISPLAYMANAGER=GNOME

I have no idea what happened to cause this problem in the first place. Googling for a solution wasn’t much help because most posts are really old and the new ones just said reinstall.

If anyone else has misfortune to run into this issue, you now know how I got out of it.

Incidentally, my brief time on KDE did not fill me with a desire to switch. I think I prefer GNOME. I am however a little nervous about the stability of Fedora 15 after this incident. Maybe I did something dumb to cause it, but if I did, I have no idea what it was. I’m just running a browser and VirtualBox VMs for the most part.

Cheers

Tim…