This post is long overdue, and it’s as much a reminder to myself as it is a post for you to find when scratching your head.
Here’s the situation: Oracle Restart on an Oracle Enterprise Linux 5.5 system with ASMLib in use. I have installed 11.2.0.2 plus the latest PSU to bring database and Grid Infrastructure to 11.2.0.2.5-so far so good. I have used the non-GUI approach and everything seemed ok until I tried to create a database for some benchmarking. Here’s the surprising output:
During some recent work I did involving a stretched 11.2.0.2 RAC for a SAP implementation at a customer site I researched TimeFinder/Clone backups. As part of this exercise I have been able to experiment with RHEL (OEL) 5.6 and the new device mapper multipath package on the mount host. I have been very pleasantly surprise about this new feature which I’d like to share.
Background of this article
Device Mapper Multipath is the “native” Linux multipathing software, as opposed to vendor-supplied multipathing such as EMC’s Power Path or Hitachi’s HDLM.
My customer’s setup is rather unique for a SAP environment as it uses Oracle Enterprise Linux and not Solaris/SPARC or AIX on the Power platform with an active/passive solution. Well if that doesn’t make it sound unique, the fact that there is a plan to run Oracle 11.2.0.2 RAC potentially across sites using ASM and ACFS certainly makes this deployment stand out from the rest.
The reason I mention this is simple-it requires a lot of engineering effort to certify components for this combination. For example: it was not trivial to get vendor support for Solutions Enabler the storage engineering uses for connectivity with the VMAX arrays, and so on. After all, Oracle Enterprise is a fairly new platform and Red Hat certainly has an advantage when it comes to vendor certification.
What hasn’t been achieved was a certification of the EMC Power Path software for use with SAP on Linux, for reasons unknown to me. The result was simple: the setup will use the device-mapper multipath package that comes with the Linux distribution.
Configuring multipath
Now with this said I started looking at MOS to get relevant support notes about Linux’s native multipath and found some. The summary of this research is available on this blog, I have written about it in these articles:
What I didn’t know up to date was the fact that the multipath.conf file allows you to define the ownership and mode of a device. As an ASM user this is very important to me. Experience taught me that incorrect permissions are one of the main reasons for ASM failing to start. Remember that root owns block devices by default.
Consider this example:
multipaths {
...
multipath {
wwid 360a98000486e58526c34515944703277
alias ocr01
mode 660
uid 501
gid 502
}
...
}
The above entry in the multipaths section translates into English as follows:
The path settings are defined globally and don’t need to be mentioned explicitly for each device unless you prefer to override them. I like to use an alias although it isn’t really necessary since ASM relies on a 4k header in a block device to store its identity. If you don’t chose to alias a device I recommend you use the user friendly name instead, mainly for aesthetic reasons.
Why is this really cool? For me it implies two things:
Nice! One less headache, as I have to say that I didn’t really like udev…
Julian Dyke has started an interesting thread on the Oak Table mailing list after the latest UKOUG RAC and HA SIG. Unfortunately I couldn’t attend that event, I wish I had, and I knew it would be great.
Anyway, the question revolved around an ASM disk group created with normal redundancy spanning two storage arrays. This should in theory protect against the failure of an array, although at a high price. All ASM disks exported from an array would be 1 failure group. Remember that disks in a failure group all fail if the supporting infrastructure (network, HBA, controller etc) fails. So what would happen with such a setup, if you followed these steps:
ASM can tolerate the failure of one failgroup (capacity permitting), so the failure of failgroup 2 should not bring the disk group down, which would result in immediate loss of service. But what happens if it comes up after the data in the other failure group has been modified? Will there be data corruption?
Replaying
To simulate two storage arrays my distinguished filer01 and filer02 OpenFiler appliances have been used, each exporting 2 approx. 4G “LUNS” to my database host. At this time I only had access to my 11.1.0.7 2 node RAC system, if time permits I’ll repeat this with 10.2.0.5 and 11.2.0.2. The RAC cluster in the SIG presentation was 10.2. I am skipping the bit about the LUN creation and presentation to the hosts, and assume the following setup:
[root@rac11gr1node1 ~]# iscsiadm -m session tcp: [1] 192.168.99.51:3260,1 iqn.2006-01.com.openfiler:filer02DiskB tcp: [2] 192.168.99.50:3260,1 iqn.2006-01.com.openfiler:filer01DiskA tcp: [3] 192.168.99.51:3260,1 iqn.2006-01.com.openfiler:filer02DiskA tcp: [4] 192.168.99.50:3260,1 iqn.2006-01.com.openfiler:filer01DiskB
192.168.99.50 is my first openfiler instance, .192.168.99.51 the second. As you can see each export DISKA and DISKB. Mapped to the hosts, this is the target mapping (use iscsiadm –mode session –print 3 to find out):
I am using ASMLib (as always on the lab) to label these disks:
[root@rac11gr1node1 ~]# oracleasm listdisks DATA1 DATA2 FILER01DISKA FILER01DISKB FILER02DISKA FILER02DISKB
DATA1 and DATA2 will not play a role in this article, I’m interested in the other disks. Assuming that the scandisks command completed on all nodes, I can add the disks to the new diskgroup:
SQL> select path from v$asm_disk PATH -------------------------------------------------------------------------------- ORCL:FILER01DISKA ORCL:FILER01DISKB ORCL:FILER02DISKA ORCL:FILER02DISKB ORCL:DATA1 ORCL:DATA2
Let’s create the diskgroup. The important part is to create failure groups per storage array. By the way this is not different from extended distance RAC!
SQL> create diskgroup fgtest normal redundancy 2 failgroup filer01 disk 'ORCL:FILER01DISKA', 'ORCL:FILER01DISKB' 3 failgroup filer02 disk 'ORCL:FILER02DISKA', 'ORCL:FILER02DISKB' 4 attribute 'compatible.asm'='11.1'; Diskgroup created.
With that done let’s have a look at the asm disk information:
SQL> select MOUNT_STATUS,HEADER_STATUS,STATE,REDUNDANCY,FAILGROUP,PATH from v$asm_disk where group_number=2; MOUNT_S HEADER_STATU STATE REDUNDA FAILGROUP PATH ------- ------------ -------- ------- ------------------------------ -------------------- CACHED MEMBER NORMAL UNKNOWN FILER01 ORCL:FILER01DISKA CACHED MEMBER NORMAL UNKNOWN FILER01 ORCL:FILER01DISKB CACHED MEMBER NORMAL UNKNOWN FILER02 ORCL:FILER02DISKA CACHED MEMBER NORMAL UNKNOWN FILER02 ORCL:FILER02DISKB
I have set the disk repair time to 24 hours and raised compatible parameters for RDBMS and ASM to 11.1, resulting in these attributes:
SQL> select * from v$asm_attribute NAME VALUE GROUP_NUMBER ATTRIBUTE_INDEX ATTRIBUTE_INCARNATION READ_ON SYSTEM_ ------------------------------ -------------------- ------------ --------------- --------------------- ------- ------- disk_repair_time 3.6h 2 0 1 N Y au_size 1048576 2 8 1 Y Y compatible.asm 11.1.0.0.0 2 20 1 N Y compatible.rdbms 11.1.0.0.0 2 21 1 N Y
Unlike 11.2 where disk groups are managed as Clusterware resource, 11.1 requires you to manually start them or append the new disk group to ASM_DISKGORUPS. You should query gv$asm_diskgroup.state to ensure the new diskgroup is mounted on all cluster nodes.
I need some data! A small demo database can be restored to the new failure group to provide some experimental playground. This is quite easily done by using an RMAN duplicate with the correct {db|log}_file_name_convert parameter set.
Mirror
The diskgroup is created with normal redundancy, which means that ASM will create a mirror for every primary extent, taking failure groups into consideration. I wanted to ensure that the data is actually mirrored on the new disk group, which has group number 2.I need to get this information from V$ASM_FILE and V$ASM_ALIAS:
SQL> select * from v$asm_file where group_number = 2 GROUP_NUMBER FILE_NUMBER COMPOUND_INDEX INCARNATION BLOCK_SIZE BLOCKS BYTES SPACE TYPE REDUND STRIPE CREATION_ MODIFICAT R ------------ ----------- -------------- ----------- ---------- ---------- ---------- ---------- -------------------- ------ ------ --------- --------- - 2 256 33554688 747669775 16384 1129 18497536 78643200 CONTROLFILE HIGH FINE 05-APR-11 05-APR-11 U 2 257 33554689 747669829 8192 69769 571547648 1148190720 DATAFILE MIRROR COARSE 05-APR-11 05-APR-11 U 2 258 33554690 747669829 8192 60161 492838912 990904320 DATAFILE MIRROR COARSE 05-APR-11 05-APR-11 U 2 259 33554691 747669829 8192 44801 367009792 739246080 DATAFILE MIRROR COARSE 05-APR-11 05-APR-11 U 2 260 33554692 747669831 8192 25601 209723392 424673280 DATAFILE MIRROR COARSE 05-APR-11 05-APR-11 U 2 261 33554693 747669831 8192 641 5251072 12582912 DATAFILE MIRROR COARSE 05-APR-11 05-APR-11 U 2 262 33554694 747670409 512 102401 52429312 120586240 ONLINELOG MIRROR FINE 05-APR-11 05-APR-11 U 2 263 33554695 747670409 512 102401 52429312 120586240 ONLINELOG MIRROR FINE 05-APR-11 05-APR-11 U 2 264 33554696 747670417 512 102401 52429312 120586240 ONLINELOG MIRROR FINE 05-APR-11 05-APR-11 U 2 265 33554697 747670417 512 102401 52429312 120586240 ONLINELOG MIRROR FINE 05-APR-11 05-APR-11 U 2 266 33554698 747670419 8192 2561 20979712 44040192 TEMPFILE MIRROR COARSE 05-APR-11 05-APR-11 U 11 rows selected. SQL> select * from v$asm_alias where group_NUMBER=2 NAME GROUP_NUMBER FILE_NUMBER FILE_INCARNATION ALIAS_INDEX ALIAS_INCARNATION PARENT_INDEX REFERENCE_INDEX A S ------------------------------ ------------ ----------- ---------------- ----------- ----------------- ------------ --------------- - - RAC11G 2 4294967295 4294967295 0 3 33554432 33554485 Y Y CONTROLFILE 2 4294967295 4294967295 53 3 33554485 33554538 Y Y current.256.747669775 2 256 747669775 106 3 33554538 50331647 N Y DATAFILE 2 4294967295 4294967295 54 1 33554485 33554591 Y Y SYSAUX.257.747669829 2 257 747669829 159 1 33554591 50331647 N Y SYSTEM.258.747669829 2 258 747669829 160 1 33554591 50331647 N Y UNDOTBS1.259.747669829 2 259 747669829 161 1 33554591 50331647 N Y UNDOTBS2.260.747669831 2 260 747669831 162 1 33554591 50331647 N Y USERS.261.747669831 2 261 747669831 163 1 33554591 50331647 N Y ONLINELOG 2 4294967295 4294967295 55 1 33554485 33554644 Y Y group_1.262.747670409 2 262 747670409 212 1 33554644 50331647 N Y group_2.263.747670409 2 263 747670409 213 1 33554644 50331647 N Y group_3.264.747670417 2 264 747670417 214 1 33554644 50331647 N Y group_4.265.747670417 2 265 747670417 215 1 33554644 50331647 N Y TEMPFILE 2 4294967295 4294967295 56 1 33554485 33554697 Y Y TEMP.266.747670419 2 266 747670419 265 1 33554697 50331647 N Y
My USERS tablespace which I am interested in most has file number 261-I chose it for this example as it’s only 5M in size. Taking my 1 MB allocation unit into account, it means I don’t have to trawl through thousands of line of output when getting the extent map.
Credit where credit is due-the next queries are partly based on the excellent work by Luca Canali from CERN, who has looked at ASM internals for a while. Make sure you have a look at the excellent reference available here: https://twiki.cern.ch/twiki/bin/view/PDBService/ASM_Internals. So to answer the question if the extents making up my users tablespace we need to have a look at the X$KFFXP, i.e. file extent pointers view:
SQL> select GROUP_KFFXP,DISK_KFFXP,AU_KFFXP from x$kffxp where number_kffxp=261 and group_kffxp=2 order by disk_kffxp; GROUP_KFFXP DISK_KFFXP AU_KFFXP ----------- ---------- ---------- 2 0 864 2 0 865 2 0 866 2 1 832 2 1 831 2 1 833 2 2 864 2 2 866 2 2 865 2 3 832 2 3 833 2 3 831 12 rows selected.
As you can see, I have a number of extents, all evenly spread over my disks. I can verify that this information is correct by querying the X$KFDAT view as well which contains similar information, but more related to the disk to AU mapping
SQL> select GROUP_KFDAT,NUMBER_KFDAT,AUNUM_KFDAT from x$kfdat where fnum_kfdat = 261 and group_kfdat=2 GROUP_KFDAT NUMBER_KFDAT AUNUM_KFDAT ----------- ------------ ----------- 2 0 864 2 0 865 2 0 866 2 1 831 2 1 832 2 1 833 2 2 864 2 2 865 2 2 866 2 3 831 2 3 832 2 3 833 12 rows selected.
OK so I am confident that my data is actually mirrored-otherwise the following test would not make any sense. I have double checked that the disks in failgroup FILER01 actually belong to my OpenFiler “filer01″, and the same for filer02. Going back to the original scenario:
Shut down Filer02
This will take down all the disks of failure group B. Two minutes after taking the filer down I checked if it was indeed shut down:
martin@dom0:~> sudo xm list | grep filer filer01 183 512 1 -b---- 1159.6 filer02 512 1 1179.6 filer03 185 512 1 -b---- 1044.4
Yes, no doubt about it-it’s down. What would the effect be? Surely I/O errors, but I wanted to enforce a check. Connected to +ASM2 I issued the “select * from v$asm_disk” command. This caused quite significant logging in the instance’s alert.log:
NOTE: ASMB process exiting due to lack of ASM file activity for 5 seconds Wed Apr 06 17:17:39 2011 WARNING: IO Failed. subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so dg:0, diskname:ORCL:FILER02DISKA disk:0x0.0x97954459 au:0 iop:0x2b2997b61330 bufp:0x2b29977b3e00 offset(bytes):0 iosz:4096 operation:1(Read) synchronous:0 result: 4 osderr:0x3 osderr1:0x2e pid:6690 WARNING: IO Failed. subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so dg:0, diskname:ORCL:FILER02DISKB disk:0x1.0x9795445a au:0 iop:0x2b2997b61220 bufp:0x2b29977b0200 offset(bytes):0 iosz:4096 operation:1(Read) synchronous:0 result: 4 osderr:0x3 osderr1:0x2e pid:6690 Wed Apr 06 17:17:58 2011 WARNING: IO Failed. subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so dg:0, diskname:ORCL:FILER02DISKB disk:0x1.0x9795445a au:0 iop:0x2b2997b61440 bufp:0x2b29977a6400 offset(bytes):0 iosz:4096 operation:1(Read) synchronous:0 result: 4 osderr:0x3 osderr1:0x2e pid:6690 WARNING: IO Failed. subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so dg:0, diskname:ORCL:FILER02DISKA disk:0x0.0x97954459 au:0 iop:0x2b2997b61550 bufp:0x2b29977b1600 offset(bytes):0 iosz:4096 operation:1(Read) synchronous:0 result: 4 osderr:0x3 osderr1:0x2e pid:6690 Wed Apr 06 17:18:03 2011 WARNING: Disk (FILER02DISKA) will be dropped in: (86400) secs on ASM inst: (2) WARNING: Disk (FILER02DISKB) will be dropped in: (86400) secs on ASM inst: (2) GMON SlaveB: Deferred DG Ops completed. Wed Apr 06 17:19:26 2011 WARNING: IO Failed. subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so dg:0, diskname:ORCL:FILER02DISKB disk:0x1.0x9795445a au:0 iop:0x2b2997b61550 bufp:0x2b29977b1600 offset(bytes):0 iosz:4096 operation:1(Read) synchronous:0 result: 4 osderr:0x3 osderr1:0x2e pid:6690 WARNING: IO Failed. subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so dg:0, diskname:ORCL:FILER02DISKA disk:0x0.0x97954459 au:0 iop:0x2b2997b61440 bufp:0x2b29977b0200 offset(bytes):0 iosz:4096 operation:1(Read) synchronous:0 result: 4 osderr:0x3 osderr1:0x2e pid:6690 Wed Apr 06 17:20:10 2011 WARNING: IO Failed. subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so dg:0, diskname:ORCL:FILER02DISKA disk:0x0.0x97954459 au:0 iop:0x2b2997b61000 bufp:0x2b29977a6400 offset(bytes):0 iosz:4096 operation:1(Read) synchronous:0 result: 4 osderr:0x3 osderr1:0x2e pid:6690 WARNING: IO Failed. subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so dg:0, diskname:ORCL:FILER02DISKB disk:0x1.0x9795445a au:0 iop:0x2b2997b61550 bufp:0x2b29977b1600 offset(bytes):0 iosz:4096 operation:1(Read) synchronous:0 result: 4 osderr:0x3 osderr1:0x2e pid:6690 Wed Apr 06 17:21:07 2011 WARNING: Disk (FILER02DISKA) will be dropped in: (86217) secs on ASM inst: (2) WARNING: Disk (FILER02DISKB) will be dropped in: (86217) secs on ASM inst: (2) GMON SlaveB: Deferred DG Ops completed. Wed Apr 06 17:27:15 2011 WARNING: Disk (FILER02DISKA) will be dropped in: (85849) secs on ASM inst: (2) WARNING: Disk (FILER02DISKB) will be dropped in: (85849) secs on ASM inst: (2) GMON SlaveB: Deferred DG Ops completed.
The interesting lines are “all mirror sides found readable, no repair required”. So taking down the failgroup didn’t cause an outage. The other ASM instance complained as well a little later:
2011-04-06 17:16:58.393000 +01:00 NOTE: initiating PST update: grp = 2, dsk = 2, mode = 0x15 NOTE: initiating PST update: grp = 2, dsk = 3, mode = 0x15 kfdp_updateDsk(): 24 kfdp_updateDskBg(): 24 WARNING: IO Failed. subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so dg:2, diskname:ORCL:FILER02DISKA disk:0x2.0x97a6d9f7 au:1 iop:0x2b9ea4855e70 bufp:0x2b9ea4850a00 offset(bytes):1052672 iosz:4096 operation:2(Write) synchronous:1 result: 4 osderr:0x3 osderr1:0x2e pid:870 NOTE: group FGTEST: updated PST location: disk 0000 (PST copy 0) 2011-04-06 17:17:03.508000 +01:00 NOTE: ASMB process exiting due to lack of ASM file activity for 5 seconds NOTE: PST update grp = 2 completed successfully NOTE: initiating PST update: grp = 2, dsk = 2, mode = 0x1 NOTE: initiating PST update: grp = 2, dsk = 3, mode = 0x1 kfdp_updateDsk(): 25 kfdp_updateDskBg(): 25 2011-04-06 17:17:07.454000 +01:00 NOTE: group FGTEST: updated PST location: disk 0000 (PST copy 0) NOTE: PST update grp = 2 completed successfully NOTE: cache closing disk 2 of grp 2: FILER02DISKA NOTE: cache closing disk 3 of grp 2: FILER02DISKB SUCCESS: extent 0 of file 267 group 2 repaired by offlining the disk NOTE: repairing group 2 file 267 extent 0 SUCCESS: extent 0 of file 267 group 2 repaired - all mirror sides found readable, no repair required 2011-04-06 17:19:04.526000 +01:00 GMON SlaveB: Deferred DG Ops completed. 2011-04-06 17:22:07.487000 +01:00 GMON SlaveB: Deferred DG Ops completed.
No interruption of service though, which is good-the GV$ASM_CLIENT view reported all database instances still connected.
SQL> select * from gv$asm_client; INST_ID GROUP_NUMBER INSTANCE_NAME DB_NAME STATUS SOFTWARE_VERSIO COMPATIBLE_VERS ---------- ------------ ---------------------------------------------------------------- -------- ------------ --------------- --------------- 2 2 rac11g2 rac11g CONNECTED 11.1.0.7.0 11.1.0.0.0 1 2 rac11g1 rac11g CONNECTED 11.1.0.7.0 11.1.0.0.0
The result in the V$ASM_DISK view was as follows:
SQL> select name,state,header_status,path from v$asm_disk; NAME STATE HEADER_STATU PATH FAILGROUP ------------------------------ -------- ------------ -------------------------------------------------- ------------------------------ NORMAL UNKNOWN ORCL:FILER02DISKA NORMAL UNKNOWN ORCL:FILER02DISKB DATA1 NORMAL MEMBER ORCL:DATA1 DATA1 DATA2 NORMAL MEMBER ORCL:DATA2 DATA2 FILER01DISKA NORMAL MEMBER ORCL:FILER01DISKA FILER01 FILER01DISKB NORMAL MEMBER ORCL:FILER01DISKB FILER01 FILER02DISKA NORMAL UNKNOWN FILER02 FILER02DISKB NORMAL UNKNOWN FILER02 8 rows selected.
As I expected the disks for failgroup filer02 are gone, and so is the information about the failure group. My disk repair time should be high enough to protect me from having to rebuild the whole disk group. Now I’m really curious if my database can become corrupted-I’ll increase the SCN.
[oracle@rac11gr1node1 ~]$ . setsid.sh rac11g [oracle@rac11gr1node1 ~]$ sq SQL*Plus: Release 11.1.0.7.0 - Production on Wed Apr 6 17:24:18 2011 Copyright (c) 1982, 2008, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production With the Partitioning, Real Application Clusters, OLAP, Data Mining and Real Application Testing options SQL> select current_scn from v$database; CURRENT_SCN ----------- 5999304 SQL> begin 2 for i in 1..5 loop 3 execute immediate 'alter system switch logfile'; 4 end loop; 5 end; 6 / PL/SQL procedure successfully completed. SQL> select current_scn from v$database; CURRENT_SCN ----------- 5999378 SQL>
Back to the test case.
Stop the Database
[oracle@rac11gr1node1 ~]$ srvctl stop database -d rac11g [oracle@rac11gr1node1 ~]$ srvctl status database -d rac11g Instance rac11g2 is not running on node rac11gr1node1 Instance rac11g1 is not running on node rac11gr1node2
Done-this part was simple. Next they stopped their first filer. To prevent bad things from happening I’ll shut down ASM on all nodes. I hope that doesn’t invalidate the test but I can’t see how ASM would not get a problem if the other failgroup went down as well.
Shut down Filer01 and start Filer02
Also quite simple. Shutting down this filer will allow me to follow the story. After filer01 was down I started filer02. I’m curious as to how ASM will react. I have deliberately NOT put disk group FGTEST into the ASM_DISKSTRING, I want to start it manually to get a better understanding of what happens.
After having started ASM on both nodes, I queried V$ASM_DISK and tried to mount the disk group:
SQL> select disk_number,name,state,header_status,path,failgroup from v$asm_disk; DISK_NUMBER NAME STATE HEADER_STATU PATH FAILGROUP ----------- ------------------------------ -------- ------------ -------------------------------------------------- ------------------------------ 0 NORMAL MEMBER ORCL:FILER02DISKA 1 NORMAL MEMBER ORCL:FILER02DISKB 2 NORMAL UNKNOWN ORCL:FILER01DISKA 3 NORMAL UNKNOWN ORCL:FILER01DISKB 0 DATA1 NORMAL MEMBER ORCL:DATA1 DATA1 1 DATA2 NORMAL MEMBER ORCL:DATA2 DATA2 6 rows selected.
Ooops, now they are both gone….
SQL> alter diskgroup fgtest mount; alter diskgroup fgtest mount * ERROR at line 1: ORA-15032: not all alterations performed ORA-15040: diskgroup is incomplete ORA-15042: ASM disk "1" is missing ORA-15042: ASM disk "0" is missing ORA-15080: synchronous I/O operation to a disk failed ORA-15080: synchronous I/O operation to a disk failed
OK, I have a problem here. Both ASM instances report I/O errors with the FGTEST diskgroup, and I can’t mount it. That means I can’t mount the database either-in a way it proves I won’t have corruption. But neither will I have a database, what is worse?
Can I get around this problem?
I think I’ll have to start filer01 and see if that makes a difference. Hopefully I can recover my system with the information in failgroup filer01. Soon after filer01 came online I tried the query against v$asmdisk again and tried to mount it.
SQL> select disk_number,name,state,header_status,path,failgroup from v$asm_disk; DISK_NUMBER NAME STATE HEADER_STATU PATH FAILGROUP ----------- ------------------------------ -------- ------------ -------------------------------------------------- ------------------------------ 0 NORMAL MEMBER ORCL:FILER02DISKA 1 NORMAL MEMBER ORCL:FILER02DISKB 2 NORMAL MEMBER ORCL:FILER01DISKA 3 NORMAL MEMBER ORCL:FILER01DISKB 0 DATA1 NORMAL MEMBER ORCL:DATA1 DATA1 1 DATA2 NORMAL MEMBER ORCL:DATA2 DATA2 6 rows selected.
That worked!
Wed Apr 06 17:45:32 2011 SQL> alter diskgroup fgtest mount NOTE: cache registered group FGTEST number=2 incarn=0x72c150d7 NOTE: cache began mount (first) of group FGTEST number=2 incarn=0x72c150d7 NOTE: Assigning number (2,0) to disk (ORCL:FILER01DISKA) NOTE: Assigning number (2,1) to disk (ORCL:FILER01DISKB) NOTE: Assigning number (2,2) to disk (ORCL:FILER02DISKA) NOTE: Assigning number (2,3) to disk (ORCL:FILER02DISKB) Wed Apr 06 17:45:33 2011 NOTE: start heartbeating (grp 2) kfdp_query(): 12 kfdp_queryBg(): 12 NOTE: cache opening disk 0 of grp 2: FILER01DISKA label:FILER01DISKA NOTE: F1X0 found on disk 0 fcn 0.0 NOTE: cache opening disk 1 of grp 2: FILER01DISKB label:FILER01DISKB NOTE: cache opening disk 2 of grp 2: FILER02DISKA label:FILER02DISKA NOTE: F1X0 found on disk 2 fcn 0.0 NOTE: cache opening disk 3 of grp 2: FILER02DISKB label:FILER02DISKB NOTE: cache mounting (first) group 2/0x72C150D7 (FGTEST) Wed Apr 06 17:45:33 2011 * allocate domain 2, invalid = TRUE kjbdomatt send to node 0 Wed Apr 06 17:45:33 2011 NOTE: attached to recovery domain 2 NOTE: cache recovered group 2 to fcn 0.7252 Wed Apr 06 17:45:33 2011 NOTE: LGWR attempting to mount thread 1 for diskgroup 2 NOTE: LGWR mounted thread 1 for disk group 2 NOTE: opening chunk 1 at fcn 0.7252 ABA NOTE: seq=3 blk=337 NOTE: cache mounting group 2/0x72C150D7 (FGTEST) succeeded NOTE: cache ending mount (success) of group FGTEST number=2 incarn=0x72c150d7 Wed Apr 06 17:45:33 2011 kfdp_query(): 13 kfdp_queryBg(): 13 NOTE: Instance updated compatible.asm to 11.1.0.0.0 for grp 2 SUCCESS: diskgroup FGTEST was mounted SUCCESS: alter diskgroup fgtest mount
The V$ASM_DISK view is nicely updated and everything seems to be green:
SQL> select disk_number,name,state,header_status,path,failgroup from v$asm_disk; DISK_NUMBER NAME STATE HEADER_STATU PATH FAILGROUP ----------- ------------------------------ -------- ------------ -------------------------------------------------- ------------------------------ 0 DATA1 NORMAL MEMBER ORCL:DATA1 DATA1 1 DATA2 NORMAL MEMBER ORCL:DATA2 DATA2 0 FILER01DISKA NORMAL MEMBER ORCL:FILER01DISKA FILER01 1 FILER01DISKB NORMAL MEMBER ORCL:FILER01DISKB FILER01 2 FILER02DISKA NORMAL MEMBER ORCL:FILER02DISKA FILER02 3 FILER02DISKB NORMAL MEMBER ORCL:FILER02DISKB FILER02 6 rows selected.
Brilliant-will it have an effect on the database?
Starting the Database
Even though things looked ok, they weren’t! I didn’t expect this to happen:
[oracle@rac11gr1node1 ~]$ srvctl start database -d rac11g PRKP-1001 : Error starting instance rac11g2 on node rac11gr1node1 rac11gr1node1:ora.rac11g.rac11g2.inst: rac11gr1node1:ora.rac11g.rac11g2.inst:SQL*Plus: Release 11.1.0.7.0 - Production on Wed Apr 6 17:48:58 2011 rac11gr1node1:ora.rac11g.rac11g2.inst: rac11gr1node1:ora.rac11g.rac11g2.inst:Copyright (c) 1982, 2008, Oracle. All rights reserved. rac11gr1node1:ora.rac11g.rac11g2.inst: rac11gr1node1:ora.rac11g.rac11g2.inst:Enter user-name: Connected to an idle instance. rac11gr1node1:ora.rac11g.rac11g2.inst: rac11gr1node1:ora.rac11g.rac11g2.inst:SQL> ORACLE instance started. rac11gr1node1:ora.rac11g.rac11g2.inst: rac11gr1node1:ora.rac11g.rac11g2.inst:Total System Global Area 1720328192 bytes rac11gr1node1:ora.rac11g.rac11g2.inst:Fixed Size 2160392 bytes rac11gr1node1:ora.rac11g.rac11g2.inst:Variable Size 1291847928 bytes rac11gr1node1:ora.rac11g.rac11g2.inst:Database Buffers 419430400 bytes rac11gr1node1:ora.rac11g.rac11g2.inst:Redo Buffers 6889472 bytes rac11gr1node1:ora.rac11g.rac11g2.inst:ORA-00600: internal error code, arguments: [kccpb_sanity_check_2], [9572], rac11gr1node1:ora.rac11g.rac11g2.inst:[9533], [0x000000000], [], [], [], [], [], [], [], [] rac11gr1node1:ora.rac11g.rac11g2.inst: rac11gr1node1:ora.rac11g.rac11g2.inst: rac11gr1node1:ora.rac11g.rac11g2.inst:SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production rac11gr1node1:ora.rac11g.rac11g2.inst:With the Partitioning, Real Application Clusters, OLAP, Data Mining rac11gr1node1:ora.rac11g.rac11g2.inst:and Real Application Testing options rac11gr1node1:ora.rac11g.rac11g2.inst: CRS-0215: Could not start resource 'ora.rac11g.rac11g2.inst'. PRKP-1001 : Error starting instance rac11g1 on node rac11gr1node2 CRS-0215: Could not start resource 'ora.rac11g.rac11g1.inst'.
Oops. A quick search on Metalink revealed note Ora-00600: [Kccpb_sanity_check_2], [3621501],[3621462] On Startup (Doc ID 435436.1). The explanation for the ORA-600 is that “the seq# of the last read block is higher than the seq# of the control file header block.” Oracle Support explains it with a lost write, but here the situation is quite different. Interesting! I have to leave that for another blog post.
This is a follow-up post from my previous tale of how not to move networks for RAC. After having successfully restarted the cluster as described here in a previous post I went on to install a Grid Control 11.1 system. This was to be on Solaris 10 SPARC-why SPARC? Not my platform of choice when it comes to Oracle software, but my customer has a huge SPARC estate and wants to make most of it.
After the OMS has been built (hopefully I’ll find time to document this as it can be quite tricky on SPARC) I wanted to secure the agents on my cluster against it. That worked ok for the first node:
Five minutes later the agent appeared in my list of agents in the Grid Control console. With this success backing me I went to do the same on the next cluster node.
Here things were different-here’s the sequence of commands I used:
$ emctl stop agent Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0 Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved $
I didn’t pay too much attention to the fact that there has been no acknowledgement of the completion of the stop command. I noticed something wasn’t quite right when I tried to get the agent’s status:
$ emctl status agent Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0 Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved. --------------------------------------------------------------- emctl stop agent Error connecting to https://node2.example.com:3872/emd/main
Now that should have reported that the agent was down. Strange. I tried a few more commands, such as the following one to start the agent.
[agent]oracle@node2.example.com $ emctl start agent Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0 Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved. Agent is already running
Which wasn’t the case-there was no agent process whatsoever in the process table. I also checked the emd.properties file. Note that the emd.properties file is in $AGENT_HOME/hostname/sysman/config/ now instead of $AGENT_HOME/sysman/config as it was in 10g.
Everything looked correct, and even a comparison with the first node didn’t reveal any discrepancy. So I scratched my head a little more until I found a MOS note on the subject stating that the agent cannot listen to multiple addresses. The note is for 10g only and has the rather clumsy title “Grid Control Agent Startup: “emctl start agent” Command Returns “Agent is already running” Although the Agent is Stopped (Doc ID 1079424.1)
Although stating it’s for 10g and multiple NICs it got me thinking. And indeed, the /etc/hosts file has not been updated, leaving the old cluster address in /etc/hosts while the new one was in DNS.
# grep node2 /etc/hosts
10.x.x4.42 node2.example.com node2 172.x.x.1x8 node2-priv.example.com node2-priv
# host node2.example.com node2.example.com has address 10.x5.x8.3
[root@node2 ~]# grep ^hosts /etc/nsswitch.conf hosts: files dns
This also explained why the agent started on the first node-it had an updated /etc/hosts file. Why the other nodes didn’t have their hosts file updated will forever remain a mystery.
Things then changed dramatically after the hosts file has been updated:
$ emctl status agent Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0 Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved. --------------------------------------------------------------- Agent is Not Running
Note how emctl acknowledges that the agent it down now. I successfully secured and started the agent:
$ emctl secure agent
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0 Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved. Agent is already stopped... Done. Securing agent... Started. Enter Agent Registration Password : Securing agent... Successful. $ emctl status agent
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0 Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved. --------------------------------------------------------------- Agent is Not Running
$ emctl start agent Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0 Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved. Starting agent .............. started.
One smaller problem remained:
$ emctl status agent
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0 Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved. --------------------------------------------------------------- Agent Version : 11.1.0.1.0 OMS Version : 11.1.0.1.0 Protocol Version : 11.1.0.0.0 Agent Home : /u01/app/oracle/product/agent11g/node8.example.com Agent binaries : /u01/app/oracle/product/agent11g Agent Process ID : 14045 Parent Process ID : 14014 Agent URL : https://node8.example.com:3872/emd/main Repository URL : https://oms.example.com:1159/em/upload Started at : 2011-02-14 09:59:03 Started by user : oracle Last Reload : 2011-02-14 10:00:13 Last successful upload : 2011-02-14 10:00:19 Total Megabytes of XML files uploaded so far : 11.56 Number of XML files pending upload : 188 Size of XML files pending upload(MB) : 65.89 Available disk space on upload filesystem : 60.11% Collection Status : Disabled by Upload Manager Last successful heartbeat to OMS : 2011-02-14 10:00:17 --------------------------------------------------------------- Agent is Running and Ready
The message in red highlights the “Disabled by Upload Manager”. That’s because a lot of stuff hasn’t been transferred yet. Let’s force an upload-I know the communication between agent and OMS is working, so that should resolve the issue.
$ emctl upload
$ emctl status agent
Oracle Enterprise Manager 11g Release 1 Grid Control 11.1.0.1.0 Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved. --------------------------------------------------------------- Agent Version : 11.1.0.1.0 OMS Version : 11.1.0.1.0 Protocol Version : 11.1.0.0.0 Agent Home : /u01/app/oracle/product/agent11g/node8.example.com Agent binaries : /u01/app/oracle/product/agent11g Agent Process ID : 14045 Parent Process ID : 14014 Agent URL : https://node8.example.com:3872/emd/main Repository URL : https://oms.example.com:1159/em/upload Started at : 2011-02-14 09:59:03 Started by user : oracle Last Reload : 2011-02-14 10:02:12 Last successful upload : 2011-02-14 10:02:53 Total Megabytes of XML files uploaded so far : 91.12 Number of XML files pending upload : 22 Size of XML files pending upload(MB) : 1.50 Available disk space on upload filesystem : 60.30% Last successful heartbeat to OMS : 2011-02-14 10:02:19 --------------------------------------------------------------- Agent is Running and Ready
That’s about it-a few minutes later the agent was visible on the console. Now that only had to be repeated for all remaining 6 nodes…
NB: For the reasons shown in this article I don’t endorse duplicating host information in /etc/hosts and DNS-a resilient DNS infrastructure should always be used to store this kind of information.
Today is patch day – on my current site we are having quite a few problems with our management server (OMS from now on). It occasionally simply “hangs” and doesn’t respond when connecting to the SSH port. A few minutes later it’s back to normal-but this behaviour is not reproducible.
So in an effort to please Oracle support who couldn’t find a reason for this I decided to apply Patch Set Update 2 to the OMS to get it to 11.1.0.1.2. I would also like to filter the corresponding agent patch for our 11.1 agents through as well. The PSU has been released 2 weeks ago so it’s reasonably fresh.The patches for OMS and agent are generic, which is nice as it implies they are available for every platform. Our OMS had had problems before, and one-off patches have been applied. So the first step as always with PSUs is to check if there are conflicts. The readme file has the required instructions. I unzipped p10270073_111010_Generic.zip in /tmp/ and then executed the prerequisite checker in /tmp as shown in this example:
[oms]oracle@oms $ $ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir ./10270073 Invoking OPatch 11.1.0.8.0 Oracle Interim Patch Installer version 11.1.0.8.0 Copyright (c) 2009, Oracle Corporation. All rights reserved. PREREQ session Oracle Home : /u01/app/oracle/product/middleware/oms11g Central Inventory : /u01/app/oracle/product/oraInventory from : /var/opt/oracle/oraInst.loc OPatch version : 11.1.0.8.0 OUI version : 11.1.0.8.0 OUI location : /u01/app/oracle/product/middleware/oms11g/oui Log file location : /u01/app/oracle/product/middleware/oms11g/cfgtoollogs/opatch/opatch2011-02-01_08-48-14AM.log Patch history file: /u01/app/oracle/product/middleware/oms11g/cfgtoollogs/opatch/opatch_history.txt OPatch detects the Middleware Home as "/u01/app/oracle/product/middleware" Invoking prereq "checkconflictagainstohwithdetail" ZOP-40: The patch(es) has conflicts/supersets with other patches installed in the Oracle Home (or) among themselves. Prereq "checkConflictAgainstOHWithDetail" failed. Summary of Conflict Analysis: Patches that can be applied now without any conflicts are : 10270073 Following patches are not required, as they are subset of the patches in Oracle Home or subset of the patches in the given list : 9563902, 9537948, 9489355 Following patches will be rolled back from Oracle Home on application of the patches in the given list : 9563902, 9537948, 9489355 Conflicts/Supersets for each patch are: Patch : 10270073 Bug Superset of 9563902 Super set bugs are: 9491872, 9476313, 9544428 Bug Superset of 9537948 Super set bugs are: 9537948 Bug Superset of 9489355 Super set bugs are: 9489355 OPatch succeeded. [oms]oracle@oms $OK, so a few one-offs will be rolled back. Let’s get started. First of all we have to stop the OMS as shown here:
[oms]oracle@oms $ emctl stop omsOracle Enterprise Manager 11g Release 1 Grid Control Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved. Stopping WebTier... WebTier Successfully Stopped Stopping Oracle Management Server... Oracle Management Server Successfully Stopped Oracle Management Server is DownThe next step is to apply the patch. Here’s the sample session:
[oms]oracle@oms $ cd 10270073 [oms]oracle@oms $ $ORACLE_HOME/OPatch/opatch apply Invoking OPatch 11.1.0.8.0 Oracle Interim Patch Installer version 11.1.0.8.0 Copyright (c) 2009, Oracle Corporation. All rights reserved. Oracle Home : /u01/app/oracle/product/middleware/oms11g Central Inventory : /u01/app/oracle/product/oraInventory from : /var/opt/oracle/oraInst.loc OPatch version : 11.1.0.8.0 OUI version : 11.1.0.8.0 OUI location : /u01/app/oracle/product/middleware/oms11g/oui Log file location : /u01/app/oracle/product/middleware/oms11g/cfgtoollogs/opatch/opatch2011-02-01_09-00-00AM.log Patch history file: /u01/app/oracle/product/middleware/oms11g/cfgtoollogs/opatch/opatch_history.txt OPatch detects the Middleware Home as "/u01/app/oracle/product/middleware" ApplySession applying interim patch '10270073' to OH '/u01/app/oracle/product/middleware/oms11g' Interim patch 10270073 is a superset of the patch(es) [ 9563902 9537948 9489355 ] in the Oracle Home OPatch will rollback the subset patches and apply the given patch. Execution of 'sh /tmp/10270073/custom/scripts/init -apply 10270073 ': Return Code = 0 Running prerequisite checks... OPatch detected non-cluster Oracle Home from the inventory and will patch the local system only. Backing up files and inventory (not for auto-rollback) for the Oracle Home Backing up files affected by the patch '10270073' for restore. This might take a while... Backing up files affected by the patch '9563902' for restore. This might take a while... Backing up files affected by the patch '9537948' for restore. This might take a while... Backing up files affected by the patch '9489355' for restore. This might take a while... ApplySession rolling back interim patch '9563902' from OH '/u01/app/oracle/product/middleware/oms11g' Patching component oracle.sysman.oms.core, 11.1.0.1.0... Copying file to "/u01/app/oracle/product/middleware/oms11g/sysman/omsca/scripts/wls/create_domain.py" RollbackSession removing interim patch '9563902' from inventory ApplySession rolling back interim patch '9537948' from OH '/u01/app/oracle/product/middleware/oms11g' Patching component oracle.sysman.oms.core, 11.1.0.1.0... Updating jar file "/u01/app/oracle/product/middleware/oms11g/sysman/jlib/emInstall.jar" with "/u01/app/oracle/product/middleware/oms11g/.patch_storage/9537948_Apr_12_2010_03_37_52/files//sysman/jlib/emInstall.jar/oracle/sysman/configassistant/addon/AddOnConfigAssistantDriver.class" Updating jar file "/u01/app/oracle/product/middleware/oms11g/sysman/jlib/emInstall.jar" with "/u01/app/oracle/product/middleware/oms11g/.patch_storage/9537948_Apr_12_2010_03_37_52/files//sysman/jlib/emInstall.jar/oracle/sysman/configassistant/addon/AddOnConfigAssistantDriver$1.class" Updating jar file "/u01/app/oracle/product/middleware/oms11g/sysman/jlib/emInstall.jar" with "/u01/app/oracle/product/middleware/oms11g/.patch_storage/9537948_Apr_12_2010_03_37_52/files//sysman/jlib/emInstall.jar/oracle/sysman/configassistant/addon/AddOnConfigAssistantDriver$AddOnFileFilter.class" RollbackSession removing interim patch '9537948' from inventory ApplySession rolling back interim patch '9489355' from OH '/u01/app/oracle/product/middleware/oms11g' Patching component oracle.sysman.oms.core, 11.1.0.1.0... Copying file to "/u01/app/oracle/product/middleware/oms11g/bin/HAConfigCmds.pm" RollbackSession removing interim patch '9489355' from inventory OPatch back to application of the patch '10270073' after auto-rollback. Backing up files affected by the patch '10270073' for rollback. This might take a while... Patching component oracle.sysman.oms.core, 11.1.0.1.0... Updating jar file "/u01/app/oracle/product/middleware/oms11g/sysman/jlib/emCORE.jar" with "/sysman/jlib/emCORE.jar/oracle/sysman/eml/ecm/policy/PolicyViolationsController.class" [...] Copying file to "/u01/app/oracle/product/middleware/oms11g/bin/SecureOMSCmds.pm" Copying file to "/u01/app/oracle/product/middleware/oms11g/sysman/emdrep/scripts/SecureAgent_oms.pl" ApplySession adding interim patch '10270073' to inventory Verifying the update... Inventory check OK: Patch ID 10270073 is registered in Oracle Home inventory with proper meta-data. Files check OK: Files from Patch ID 10270073 are present in Oracle Home. Execution of 'sh /tmp/10270073/custom/scripts/post -apply 10270073 ': Return Code = 0 -------------------------------------------------------------------------------- The following warnings have occurred during OPatch execution: 1) OUI-67620:Interim patch 10270073 is a superset of the patch(es) [ 9563902 9537948 9489355 ] in the Oracle Home -------------------------------------------------------------------------------- OPatch Session completed with warnings. OPatch completed with warnings.This process took 1 hour on my SPARC zone-not too impressed actually. But I’m blaming it on the overloaded box instead of the patch process.The outcome of the patching looks ok to me-I already knew I applied PSU 2 as a superset of 3 other patches. The next step is to apply the post install script. This is done by calling a JDBC application as shown here:
[oms]oracle@oms $ $ORACLE_HOME/bin/rcuJDBCEngine sys/secretpassword@repositoryHost:1821:REPOSDB JDBC_SCRIPT post_install_script.sql Extracting Statement from File Name: 'post_install_script.sql' Line Number: 1 Extracted SQL Statement: [alter session set current_schema=SYSMAN] Statement Type: 'DDL Statement' Executing SQL statement: alter session set current_schema=SYSMAN Extracting Statement from File Name: 'post_install_script.sql' Line Number: 1 Extracting Statement from File Name: 'post_install_script.sql' Line Number: 1 Extracted SQL Statement: [SET serveroutput on size 1000000] Skipping Unsupported Statement Statement Type: 'Oracle RCU NotSupported SQLPlus Statement' Extracting Statement from File Name: 'post_install_script.sql' Line Number: 2 Extracted SQL Statement: [BEGIN EXECUTE IMMEDIATE 'drop table bundle_component_files'; EXCEPTION WHEN OTHERS THEN IF sqlcode = -942 THEN NULL; ELSE RAISE; END IF; END; ] .... END; END IF; END IF; RAISE; END; ] Statement Type: 'BEGIN/END Anonymous Block' Completed SQL script execution normally. 1 scripts were processed [oms]oracle@oms $Now finally it’s time to start the OMS and cross our fingers to see if it worked:
[oms]oracle@oms $ emctl start oms Oracle Enterprise Manager 11g Release 1 Grid Control Copyright (c) 1996, 2010 Oracle Corporation. All rights reserved. Starting WebTier... WebTier Successfully Started Starting Oracle Management Server... Oracle Management Server Successfully Started Oracle Management Server is Up [oms]oracle@oms $I’d call this success-I managed to log on to the OMS and performed some basic testing which implied the patch was successfully applied.
Note: This discussion is potentially relevant only to OEL 5.x and RHEL 5.x- I haven’t been able to verify that it works the same way on other Linux distributions. I would assume so though. Before starting with the article, here are some facts:
The question I have asked myself many times is: how can I dynamically add a LUN to ASM without having to stop any component of the stack? Mocking “reboot-me” OS’s like Windows I soon was quiet when it came to discussing the addition of a LUN to ASM on Linux. Today I learned how to do this, by piecing together information I got from Angus Thomas, a great Red Hat system administrator I had the pleassure to work with in 2009 and 2010. And since I have a short lived memory I decided to write it down.
I’ll describe the process from the top to bottom, from the addition of the LUN to the server all the way up to the addition of the ASM disk to the disk group.
Adding the storage to the cluster nodes
The first step is to obviosuly get the LUN assigned to the server(s). This is the easy part, and outside of the control of the Linux/Oracle admin. The storage team will provision a new LUN to the hosts in question. At this stage, Linux has no idea about the new storage: to make it available, the system administrator has to rescan the SCSI bus. A proven and tested way in RHEL 5 is to issue this command:
[root@node1 ~]# for i in `ls -1 /sys/class/scsi_host`; do
> echo "- - -" > /sys/class/scsi_host/${i}/scan
> done
The new, unpartitioned LUN will appear in /proc/partitions. If it doesn’t then there’s probably something wrong on the SAN side-check /var/log/messages and talk to your storage administrator. If it’s not a misconfiguration then you may not have an option but to reboot the node.
Configure Multipathing
So far so good, the next step is to add it to the multipathing. First of all, you need to find out what the new WWID of the device is. In my case that’s simple: the last new line in /proc/partitions is usually a giveaway. If you are unsure, ask the man who can check the WWID a console to the array. It’s important to get this right at this stage :)
To add the new disk to the multipath.conf file, all you need to do is to add a new section, as in the following example:
multipaths {
..
multipath {
wwid 360000970000294900664533030344239
alias ACFS0001
path_grouping_policy failover
}
..
}
By the way, I have written a more detailed post about configuring multipathing in a previous blog post here. Don’t forget to replicate the changes to the other cluster nodes!
Now you reload multipathd using /etc/init.d/multipathd reload on each node, and voila, you should see the device in /dev/mapper/ – my ACFS disk appeared as /dev/mapper/ACFS0001.
Now the tricky bit is to partition it (if you need to-it’s no longer mandatory with 11.1 and newer. Some software like EMC’s Replication Manager requires you to though). I succeeded in doing so by checking the device in /dev/disk/by-id and then using fdisk against it as in this example:
... # fdisk /dev/disk/by-id/scsi-360000970000294900664533030344239 Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel Building a new DOS disklabel. Changes will remain in memory only, until you decide to write them. After that, of course, the previous content won't be recoverable. The number of cylinders for this disk is set to 23251. There is nothing wrong with that, but this is larger than 1024, and could in certain setups cause problems with: 1) software that runs at boot time (e.g., old versions of LILO) 2) booting and partitioning software from other OSs (e.g., DOS FDISK, OS/2 FDISK) Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite) Command (m for help): u Changing display/entry units to sectors Command (m for help): n Command action e extended p primary partition (1-4) p Partition number (1-4): 1 First sector (32-47619839, default 32): 128 Last sector or +size or +sizeM or +sizeK (129-47619839, default 47619839): Using default value 47619839 Command (m for help): p Disk /dev/disk/by-id/scsi-360000970000294900664533030344239: 24.3 GB, 24381358080 bytes 64 heads, 32 sectors/track, 23251 cylinders, total 47619840 sectors Units = sectors of 1 * 512 = 512 bytes Device Boot Start End Blocks Id System /dev/disk/by-id/scsi-360000970000294900664533030344239p1 128 47619839 23809855+ 83 Linux Command (m for help): w The partition table has been altered! Calling ioctl() to re-read partition table. Syncing disks.
Once you are in fdisk, the commands are identical to single-pathed storage. Type “n” to create a new partition, “p” for a primary and specify the start and end cylinders as needed. Type “p” to print the partition table, and if you are happy with it use “w” to write it. You might wonder why I added an offset and changed the unit (“u”)-this is due to the EMC storage this site uses. The EMC® Host Connectivity Guide for Linux (P/N 300-003-865 REV A23) suggests a 64k offset. Don’t simply repeat this in your environment-check with the storage team first.
Before adding the partitions to ACFS0001 and ACFS0002 I had 107 partitions:
[root@node1 ~]# wc -l /proc/partitions 107 /proc/partitions
The new partitions are recognised after the 2 fdisk commands completed:
[root@node1 ~]# wc -l /proc/partitions 107 /proc/partitions
But when you check /dev/mapper now you still don’t see the partition-the naming convention is to append pn to the device name, i.e. /dev/mapper/ACFS0001p1 for the first partition and so on.
kpartx to the rescue! This superb utility can read the partition table of a device and modify it. Initially my setup was as follows:
[root@node1 ~]# ls -l /dev/mapper/ACFS* brw-rw---- 1 root disk 253, 31 Jan 18 10:05 /dev/mapper/ACFS0001 brw-rw---- 1 root disk 253, 32 Jan 18 10:05 /dev/mapper/ACFS0002
Usually I would have rebooted the node at this stage as I didn’t know about how to update the partition table. But with kpartx (“yum install kpartx” to install) this is no longer needed. Consider the below example:
[root@node1 ~]# kpartx -l /dev/mapper/ACFS0001 ACFS0001p1 : 0 47619711 /dev/mapper/ACFS0001 129 [root@node1 ~]# kpartx -a /dev/mapper/ACFS0001 [root@node1 ~]# kpartx -l /dev/mapper/ACFS0002 ACFS0002p1 : 0 47619711 /dev/mapper/ACFS0002 129 [root@node1 ~]# kpartx -a /dev/mapper/ACFS0002 [root@node1 ~]# ls -l /dev/mapper/ACFS000* brw-rw---- 1 root disk 253, 31 Jan 18 10:05 /dev/mapper/ACFS0001 brw-rw---- 1 root disk 253, 36 Jan 18 10:13 /dev/mapper/ACFS0001p1 brw-rw---- 1 root disk 253, 32 Jan 18 10:05 /dev/mapper/ACFS0002 brw-rw---- 1 root disk 253, 37 Jan 18 10:13 /dev/mapper/ACFS0002p1
“kpartx -l” prints the partition table, and “kpartx -a” adds it as the example shows. No more need to reboot! However, as it’s been pointed out in the comments section (see below), kpartx doesn’t use/add both paths, so you should run the partprobe command to add the missing paths:
[root@node1 ~]# partprobe [root@node1 ~]# wc -l /proc/partitions 109 /proc/partitions
See how there are 109 partitions listed now instead of just 107 from before-the 2 missing paths have been added (one for each device).
Add disks to ASM
With this done, you can add the disk to ASM – I personally like the intermediate step to create and ASMLib disk. Connect to ASM as sysasm and add the disk using the alter diskgroup command:
SQL> alter diskgroup ACFSDG add disk 'ORCL:ACFS0002', 'ORCL:ACFS0001';
Now just wait for the rebalance operation to complete.
Recent comments
17 weeks 4 days ago
27 weeks 2 days ago
29 weeks 7 hours ago
32 weeks 1 day ago
34 weeks 3 days ago
44 weeks 8 hours ago
45 weeks 4 days ago
46 weeks 4 days ago
46 weeks 5 days ago
49 weeks 3 days ago