Build your own 11.2.0.2 stretched RAC part III

On to the next part in the series. This time I am showing how I prepared the iSCSI openFiler “appliances” on my host. This is quite straightforward, if one knows how it works :)

Setting up the openFiler appliance on the dom0

OpenFiler 2.3 has a special download option suitable for paravirtualised Xen hosts. Proceed by downloading the file from your favourite mirror; the file I am using is “openfiler-2.3-x86_64.tar.gz”. You might have to pick another one if you don’t want a 64-bit system.

All my domUs go to /var/lib/xen/images/vm-name, and so do the openFiler ones. I am not using LVM to present storage to the domUs, as my system came without free space I could have turned into a physical volume. Here are the steps to create an openFiler appliance; remember to repeat them three times, once for each storage provider.

Begin with the first openFiler appliance. Whenever you see numbers in curly braces {}, the operation has to be repeated for each of the numbers listed.

# cd /var/lib/xen/images/
# mkdir filer0{1,2,3}
# cd filer0{1,2}

Next create the virtual disks for the appliance. I use 4G for the root file system, plus one 5G and two 10G disks. The 5G disk will later become part of the OCR and voting files disk group, whereas the other two are going to be the local ASM disks. These steps are for filer01 and filer02, the iSCSI target providers.

# dd if=/dev/zero of=disk01 bs=1 count=0 seek=4G
0+0 records in
0+0 records out
0 bytes (0 B) copied, 1.3296e-05 s, 0.0 kB/s  

# dd if=/dev/zero of=disk02 bs=1 count=0 seek=5G
# dd if=/dev/zero of=disk03 bs=1 count=0 seek=10G
# dd if=/dev/zero of=disk04 bs=1 count=0 seek=10G
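
filer03, the NFS appliance, only needs two 4G disk images. Following the same pattern, they might be created like this:

# cd /var/lib/xen/images/filer03
# dd if=/dev/zero of=disk01 bs=1 count=0 seek=4G
# dd if=/dev/zero of=disk02 bs=1 count=0 seek=4G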

For the NFS filer03, you only need the two 4G disks, disk01 and disk02, created above. For all filers, you also have to create a file system on the “root” volume:

# mkfs.ext3 disk01
mke2fs 1.41.9 (22-Aug-2009)
disk01 is not a block special device.
Proceed anyway? (y,n) y
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
262144 inodes, 1048576 blocks
52428 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=1073741824
32 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 21 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
openSUSE-112-64-minimal:/var/lib/xen/images/filer01 #

Label the root volume and mount it as a loop device. Once mounted, copy the contents of the downloaded openFiler tarball into it as shown in this example:

# e2label disk01 root
# mkdir tmpmnt/
# mount -o loop disk01 tmpmnt/
# cd tmpmnt
# tar --gzip -xvf /m/downloads/openfiler-2.3-x86_64.tar.gz

With this done, we need to extract the kernel and the initial RAM disk for later use in the xen configuration file, and then unmount the loop device. I have not experimented with pygrub for the openFiler appliances; someone with more knowledge may correct me here. This in any case works for this demonstration:

# mkdir /m/xenkernels/openfiler
# cp -a /var/lib/xen/images/filer01/tmpmnt/boot /m/xenkernels/openfiler
# cd ..
# umount tmpmnt

Here are the files now stored inside the kernel directory on the dom0:

# ls -l /m/xenkernels/openfiler/
total 9276
-rw-r--r-- 1 root root  770924 May 30  2008 System.map-2.6.21.7-3.20.smp.gcc3.4.x86_64.xen.domU
-rw-r--r-- 1 root root   32220 Jun 28  2008 config-2.6.21.7-3.20.smp.gcc3.4.x86_64.xen.domU
drwxr-xr-x 2 root root    4096 Jul  1  2008 grub
-rw-r--r-- 1 root root 1112062 Jul  1  2008 initrd-2.6.21.7-3.20.smp.gcc3.4.x86_64.xen.domU.img
-rw-r--r-- 1 root root 5986208 May 14 18:01 vmlinux
-rw-r--r-- 1 root root 1558259 Jun 28  2008 vmlinuz-2.6.21.7-3.20.smp.gcc3.4.x86_64.xen.domU

With this information at hand, we can construct a xen configuration file such as the following (the MAC addresses shown are placeholders; replace them with your own static, unique values):

# cat filer01.xml

<domain type='xen'>
  <name>filer01</name>
  <uuid>f5419d70-c124-19a9-6d64-935165c2d7d8</uuid>
  <memory>524288</memory>
  <currentMemory>524288</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type>linux</type>
    <kernel>/m/xenkernels/openfiler/vmlinuz-2.6.21.7-3.20.smp.gcc3.4.x86_64.xen.domU</kernel>
    <initrd>/m/xenkernels/openfiler/initrd-2.6.21.7-3.20.smp.gcc3.4.x86_64.xen.domU.img</initrd>
    <cmdline>root=/dev/xvda1 ro</cmdline>
  </os>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <disk type='file' device='disk'>
      <driver name='file'/>
      <source file='/var/lib/xen/images/filer01/disk01'/>
      <target dev='xvda' bus='xen'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='file'/>
      <source file='/var/lib/xen/images/filer01/disk02'/>
      <target dev='xvdb' bus='xen'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='file'/>
      <source file='/var/lib/xen/images/filer01/disk03'/>
      <target dev='xvdc' bus='xen'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='file'/>
      <source file='/var/lib/xen/images/filer01/disk04'/>
      <target dev='xvdd' bus='xen'/>
    </disk>
    <interface type='bridge'>
      <!-- public network: placeholder MAC, must be static and unique -->
      <mac address='00:16:3e:xx:xx:01'/>
      <source bridge='br1'/>
      <script path='vif-bridge'/>
    </interface>
    <interface type='bridge'>
      <!-- storage network: placeholder MAC, must be static and unique -->
      <mac address='00:16:3e:xx:xx:02'/>
      <source bridge='br3'/>
      <script path='vif-bridge'/>
    </interface>
    <console type='pty'>
      <target port='0'/>
    </console>
  </devices>
</domain>

In plain English, this verbose XML file describes the VM as a paravirtualised Linux system with four hard disks and two network interfaces. The MAC addresses must be static, otherwise you’ll end up with network problems each time you boot, and they must be unique across all currently running domUs! For filer02, change the UUID, the name, the paths to the disks (“source file”) and the MAC addresses. The same applies to filer03, but since it only uses two disks, xvda and xvdb, remove the disk elements for disk03 and disk04.
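
If you need fresh identifiers for the other filers, the dom0 can generate them; here is a quick helper (an assumption on my part, any method of producing a unique UUID and a MAC in the Xen 00:16:3e prefix will do):

# uuidgen
# printf '00:16:3e:%02x:%02x:%02x\n' $((RANDOM % 256)) $((RANDOM % 256)) $((RANDOM % 256))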

Define the VM via libvirt and start it, staying attached to the console:

# virsh define filer0{1,2,3}.xml
# xm start filer01 -c

Repeat this for filer02.xml and filer03.xml in separate terminal sessions.
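
From another terminal on the dom0 you can confirm that all three appliances are running:

# xm list

filer01, filer02 and filer03 should all appear in the output.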

Eventually, you are going to be presented with the welcome screen:

 Welcome to Openfiler NAS/SAN Appliance, version 2.3

You do not appear to have networking. Please login to start networking.

Configuring the OpenFiler domU

Log in as root (which doesn’t have a password; you should change this now!) and correct the missing network information. We have two virtual NICs: eth0 for the public network and eth1 for the storage network. As root, navigate to /etc/sysconfig/network-scripts/ and edit ifcfg-eth{0,1}. In our example, we need two static interfaces. For eth0, for example, the existing file has the following contents:

[root@localhost network-scripts]# vi ifcfg-eth0
# Device file installed by rBuilder
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
TYPE=Ethernet

Change this to:

[root@localhost network-scripts]# cat ifcfg-eth0
# Device file installed by rBuilder
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
NETWORK=192.168.99.0
NETMASK=255.255.255.0
IPADDR=192.168.99.50

Similarly, change ifcfg-eth1 for address 192.168.101.50 and restart the network:

[root@localhost network-scripts]# service network restart

After this, ifconfig should report the correct interfaces and you are ready to access the web console.

The network for filer02 uses 192.168.99.51 for eth0 and 192.168.101.51 for eth1. Similarly, filer03 uses 192.168.99.52 for eth0 and 192.168.101.52 for eth1.
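
For reference, the finished ifcfg-eth1 on filer01 should then look roughly like this (a sketch following the same pattern as eth0 above, not copied verbatim from the appliance):

# Device file installed by rBuilder
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
NETWORK=192.168.101.0
NETMASK=255.255.255.0
IPADDR=192.168.101.50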

Since all domUs are on internal, host-only networks, you have to set up some port forwarding rules to reach them from your PC. The easiest way to do this is in your $HOME/.ssh/config file. For my server, I set up the following options:

martin@linux-itgi:~> cat .ssh/config
Host *eq8
HostName eq8
User martin
Compression yes
# note the white space
LocalForward 4460 192.168.99.50:446
LocalForward 4470 192.168.99.51:446
LocalForward 4480 192.168.99.52:446
LocalForward 5902 192.168.99.56:5902

# other hosts
Host *
PasswordAuthentication yes
 FallBackToRsh no
martin@linux-itgi:~>

I am forwarding the local ports 4460, 4470 and 4480 on my PC to the openFiler appliances. This way, I can enter https://localhost:44{6,7,8}0 to access the web frontends of the openFiler appliances. This is needed, as you can’t really administer them otherwise. When using Firefox, you’ll get a warning about the certificates. I have added security exceptions because I know the web server is not conducting a man-in-the-middle attack on me; you should always be careful adding unknown certificates to your browser in other cases.
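
With this in place, a single ssh session to the host brings up all the forwards:

martin@linux-itgi:~> ssh eq8

The browser then reaches filer01 at https://localhost:4460, and filer02 and filer03 on the other two forwarded ports.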

Administering OpenFiler

NOTE: The following steps are for filer01 and filer02 only!

Once logged in as user “openfiler” (the default password is “password”), you might want to secure that password. Click on Accounts -> Admin Password and make the changes you like.

Next I recommend you verify the system setup. Click on System and review the settings. You should see the network configured correctly, and you can change the hostname to filer0{1,2}.localdomain. Save your changes. Networking settings should be correct; if not, you can update them here.

Next we need to partition our block devices. Previously unknown to me, openFiler uses the GPT format to partition disks. Click on Volumes -> Block devices to see all the block devices. Since you are running a domU, you can’t see the root device /dev/xvda. For each device (xvd{b,c,d}) create one partition spanning the whole of the “disk”. You can do so by clicking on the device name. Scroll down to the “Create partition in /dev/xvdx” section and fill in the data, then click “Create” to create the partition. Note that you can’t see the partitions in fdisk should you log in to the appliance as root, since fdisk doesn’t understand GPT labels.
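
If you do want to double-check from the appliance’s shell, parted can read GPT labels where the classic fdisk cannot (assuming parted is available on the appliance):

[root@filer01 ~]# parted /dev/xvdb print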

Once the partitions are created, it’s time to create volumes to be exported as iSCSI targets. Still in “Volumes”, click on “Volume Groups”. I chose to create the following volume groups:

  • ASM_VG with member PVs xvdc1 and xvdd1
  • OCRVOTE_VG with member PV xvdb1

Once the volume groups are created, you should proceed by creating logical volumes within these. Click on “Add Volume” to access this screen. You have a drop-down menu to select your volume group. For OCRVOTE_VG I opted to create the following logical volumes (you have to set the type to iSCSI rather than XFS):

  • ocrvote01_lv, about 2.5G in size, type iSCSI
  • ocrvote02_lv, about 2.5G in size, type iSCSI

For volume group ASM_VG, I created these logical volumes:

  • asmdata01_lv, about 10G in size, type iSCSI
  • asmdata02_lv, about 10G in size, type iSCSI

We are almost there! The storage has been carved out of the pool of available storage, and what remains to be done is the definition of the iSCSI targets and ACLs. You can define very fine-grained access to iSCSI targets, and even to iSCSI discovery! This example tries to keep it simple and doesn’t use any CHAP authentication for iSCSI targets or discovery. In the real world you’d very much want to implement these security features though.

Preparing the iSCSI part

We are done for now on the Volumes tab. First, we need to enable the iSCSI target server. In “Services”, ensure that the “iSCSI target server” is enabled. If not, click on the link next to it. Before we can export any LUNs, we need to define who is eligible to mount them. In openFiler, this is configured via ACLs. Go to the “System” tab and scroll down to the “Network access configuration” section. Fill in the details of our cluster nodes here as shown below. These are the settings for edcnode1:

  • Name: edcnode1
  • Network/Host: 192.168.101.56
  • Netmask: 255.255.255.255 (IMPORTANT: it has to be 255.255.255.255, NOT 255.255.255.0)
  • Type: share

The settings for edcnode2 are identical, except for the IP address, which is 192.168.101.58. Remember, we are configuring the “STORAGE” network here! Click on “Update” to make the changes permanent. You are now ready to create the iSCSI targets: one for the OCR/voting disks and the others for the ASM LUNs.

Back on the Volumes tab, click on “iSCSI targets”. You will be notified that no targets have been defined yet. You will have to define the following targets for filer01:

  • iqn.2006-01.com.openfiler:ocrvoteFiler01
  • iqn.2006-01.com.openfiler:asm01Filer01
  • iqn.2006-01.com.openfiler:asm02Filer01

Leave the default settings; they will do for our example. Simply add the name to the “Target IQN” field and then click on “Add”. The targets don’t have any LUNs mapped yet, something we address in the next step.

Switch to target iqn.2006-01.com.openfiler:ocrvoteFiler01 and then use the tab “LUN mapping” to map a LUN. In the list of available LUNs add ocrvote01_lv and ocrvote02_lv to the target. Click on “network ACL” and allow access to the LUN from edcnode1 and edcnode2. For the first ASM target, map asmdata01_lv and set the permissions, then repeat for the last target with asmdata02_lv.

Create the following targets for filer02:

  • iqn.2006-01.com.openfiler:ocrvoteFiler02
  • iqn.2006-01.com.openfiler:asm01Filer02
  • iqn.2006-01.com.openfiler:asm02Filer02

The mappings and settings for the ASM targets are identical to filer01, but for the OCRVOTE target only export the first logical volume, i.e. ocrvote01_lv.
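
Once the cluster nodes exist (they are built in the next part), a quick sanity check is an iSCSI discovery from one of them against the filers’ storage addresses, for example:

# iscsiadm -m discovery -t sendtargets -p 192.168.101.50:3260
# iscsiadm -m discovery -t sendtargets -p 192.168.101.51:3260

Each command should list the targets defined on the respective filer.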

NFS export

The third filer, filer03, is a little different in that it only exports an NFS share to the cluster. It only has one data disk, disk02. In a nutshell, create the filer as described above up to the point where it’s accessible via its web interface. The high level steps for it are:

  1. Partition /dev/xvdb into 1 partition spanning the whole disk
  2. Create a volume group ocrvotenfs_vg from /dev/xvdb1
  3. Create a logical volume nfsvol_lv, approx 1G in size with ext3 as its file system
  4. Enable the NFS v3 server (Services tab)

From there on the procedure is slightly different. Click on “Shares” to access the network shares available from the filer. You should see your volume group with the logical volume nfsvol_lv. Click on the link “nfsvol_lv” and enter “ocrvote” as the subfolder name. A new folder icon named ocrvote will appear. Click on it, and in the pop-up dialog click on “Make share”. Set the following in the lengthy configuration dialog that opens:

  • Public guest access
  • Host access for edcnode1 and edcnode2 for NFS RW (select the radio button)
  • Click on edit to access special options for edcnode1 and edcnode2. Ensure that the anonymous UID and GID match those of the grid software owner. The UID/GID mapping has to be “all_squash”, and the IO mode has to be “sync”. You can ignore the write delay and origin port for this example (the sketch after this list shows roughly what this amounts to)
  • Leave all other protocols deselected
  • Click update to make the changes permanent
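
Under the covers these settings end up as an NFS export roughly like the following sketch; the mount path and the grid owner’s UID/GID of 1100 are assumptions for illustration only, and OpenFiler maintains the exports file itself:

/mnt/ocrvotenfs_vg/nfsvol_lv/ocrvote 192.168.101.56(rw,sync,all_squash,anonuid=1100,anongid=1100) 192.168.101.58(rw,sync,all_squash,anonuid=1100,anongid=1100)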

That was it! The storage layer is now perfectly set up for the cluster nodes, which I’ll discuss in a follow-on post.

Build your own 11.2.0.2 stretched RAC part II

In the introduction I promised to present my lab environment in the first part of the series. So here we go…

OpenSuSE

Similar to the Fedora project, SuSE (now Novell) came up with a community distribution some time ago which can be freely downloaded from the Internet. These community editions give users a glimpse of the new and upcoming enterprise distributions, such as RHEL or SLES.

I have chosen the OpenSuSE 11.2 distribution for the host operating system. It has been updated to xen 3.4.1, kernel 2.6.31.12 and libvirt 0.7.2. These packages provide a stable execution environment for the virtual machines we are going to build. Alternative xen-based solutions were ruled out: during initial testing I found that Oracle VM 2.1.x virtual machines could not mount iSCSI targets without kernel-panicking and crashing, Citrix’s XenServer is too commercial while its community edition lacks needed features, and Virtual Iron had already been purchased by Oracle.

All kernel 2.6.18-based distributions such as Red Hat 5.x and its clones were discarded for their age and lack of features. After all, 2.6.18 was introduced three years ago, and although features were back-ported to it, its xen support is way behind what I needed. The final argument in favour of OpenSuSE was the fact that SuSE provides a xen-capable 2.6.31 kernel out of the box. Although it is perfectly possible to build one’s own xen kernel, this is an advanced topic and not covered here. OpenSuSE also makes configuring the network bridges very straightforward thanks to good integration into yast, the distribution’s setup and configuration tool.

The host system uses the following components:

  • Single Intel Core i7 processor
  • 24GB RAM
  • 1.5 TB hard disk space in RAID 1

The whole configuration can be rented from hosting providers, something I have chosen to do. The host has run a four node 11.2 cluster plus two additional virtual machines for Enterprise Manager Grid Control 11.1 without problems. In my experience the huge amount of memory is the greatest benefit of the above configuration; allocating 4 GB of RAM to each VM helped a lot.

Terminology

You should be roughly familiar with the concepts behind Xen virtualisation; the following list explains the most important terminology.

  • Hypervisor: The enabling technology for running virtual machines. The hypervisor used in this document is the xen hypervisor.
  • dom0: The dom(ain) 0 is the name for the host. The dom0 has full access to all the system’s peripherals.
  • domU: In Xen parlance, the domU is a virtual machine. Xen differentiates between paravirtualised and fully virtualised machines. Paravirtualisation, broadly speaking, offers superior performance but requires a modified operating system. I am going to use paravirtualised domUs.
  • Bridge: A (virtual) network device used for IP communication between virtual machines and the host.

Prerequisites

Start off by installing the openSuSE 11.2 distribution, choosing either the GNOME or KDE desktop. Long years of exposure to Red Hat based systems made me choose the GNOME desktop. Once the installation has completed, start the yast administration tool and click on the “install hypervisor and tools” button. This will install the xen-aware kernel and add the necessary entry to the GRUB boot loader. Once completed, reboot the server and boot the xen kernel. You don’t need to configure any network bridges at this stage, even though yast prompts you to do so.

Networking on the dom0

RAC requires at least two NICs per cluster node, plus connectivity to the shared storage, which in production is typically fibre channel. In our example I am going to use iSCSI targets for storage instead, provided by the OpenFiler community edition. It is good practice to separate storage communication from any other traffic, the same as with the cluster interconnect; therefore a third bridge will be used. Production setups would of course look different, but as iSCSI serves the purpose quite well I decided to implement it. Also, a production cluster would feature redundancy everywhere, including NICs and HBAs. Remember that redundancy can prevent outages!

The communication between the cluster nodes will be channelled over virtual switches, so-called bridges. It used to be quite difficult to set up a network bridge for Xen, but openSuSE’s yast configuration tool makes this quite simple. My host has the following bridges configured:

  • br0: This is the only bridge with a physical interface bridged, normally eth0 or bond0. It won’t be used for the cluster and exists purely to let my ssh traffic in. If it is not yet configured, see the instructions below.
  • br1: I use br1 as a host-only network for the public cluster communication. It does not have a bridged physical interface.
  • br2: This is in use for the private cluster interconnect. This bridge doesn’t have a physical NIC configured either.
  • br3: Finally, this bridge will be used to allow iSCSI communication between the filers and the cluster nodes. It does not have a physical NIC configured either.

I said a number of times that configuring a bridge was quite tedious, and for some other distributions it still is. It requires quite a bit of knowledge of the bridge-utils package and the naming conventions for virtual and physical network interfaces in XEN. To configure a bridge in OpenSuSE, start yast, and click on the “Network Settings” icon to start the network configuration.

The configuration tool will load the current network configuration. Bridge br0 should be configured to bridge the public interface name, usually eth0. All other bridges should not bridge physical devices, effectively making them host-only. If you haven’t configured a network bridge when you installed the xen hypervisor and tools, it’s time to do so now. Identify your external networking device in the list of devices shown on the “Overview” page. Take note of all settings such as IP address, netmask, gateway, MTU, routes, etc. You can get this information by selecting your external NIC and clicking on the “Edit” button.

You should see a Network Bridge entry in the list of interfaces, which probably uses DHCP. Select it and click on “Edit”. Enter all the details you just copied from your actual physical NIC and ensure that under tab “Bridged Devices” that interface is listed. Click on “Next”. Confirm the warning that a device is already configured with these settings. This will effectively deconfigure the physical device and replace it with the bridge.

Adding the host-only bridges is easier. Select the “Add” option next, and on the following screen ensure you have selected “Bridge” as the device type. The configuration name will be set correctly; don’t change it unless you know what you are doing. In the following Network Card Setup screen, assign a static IP address, a subnet and optionally a hostname. I left the hostname blank for all but the public bridge br0.

Finish the configuration assistant. Before restarting the network ensure you have an alternative means of getting to your machine, for example using a console. If the network is badly configured, you might be locked out.
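
Once the network has been restarted you can verify the result from a shell on the dom0; brctl is part of the bridge-utils package:

# brctl show

br0 should list the physical NIC as a bridged device, while br1, br2 and br3 should have no interfaces enslaved.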

The network setup

The following IP addresses are used for the example cluster:

IP Address Range           Used For
192.168.99.50-52           Web interfaces of the openFiler iSCSI “SAN”
192.168.99.53-55           The Single Client Access Name (SCAN) for the cluster
192.168.99.56-59           Node public and virtual IP addresses
192.168.100.56 and .58     Private cluster interconnect
192.168.101.56 and .58     Storage subnet for the cluster nodes
192.168.101.50-52          iSCSI interfaces of the openFiler “SAN”

Some of these addresses need to go into DNS. Edit your DNS server’s zone files and add the following to the zone’s forward lookup file:

; extended distance cluster
filer01                 IN A    192.168.99.50
filer02                 IN A    192.168.99.51
filer03                 IN A    192.168.99.52

edc-scan                IN A    192.168.99.53
edc-scan                IN A    192.168.99.54
edc-scan                IN A    192.168.99.55

edcnode1                IN A    192.168.99.56
edcnode1-vip            IN A    192.168.99.57

edcnode2                IN A    192.168.99.58
edcnode2-vip            IN A    192.168.99.59

The reverse lookup looks as follows:

; extended distance cluster
50              IN PTR          filer01.localdomain.
51              IN PTR          filer02.localdomain.
52              IN PTR          filer03.localdomain.

53              IN PTR          edc-scan.localdomain.
54              IN PTR          edc-scan.localdomain.
55              IN PTR          edc-scan.localdomain.

56              IN PTR          edcnode1.localdomain.
57              IN PTR          edcnode1-vip.localdomain.

58              IN PTR          edcnode2.localdomain.
59              IN PTR          edcnode2-vip.localdomain.

The public network maps to bridge br1 on network 192.168.99.0/24, the private network is supported through br2 on 192.168.100.0/24, and the storage traffic goes through br3, using the 192.168.101.0/24 subnet.

Reload the DNS service now to make these changes active.
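
Assuming a standard BIND name server, that can be as simple as:

# rndc reload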

You should use the “host” utility to check if the SCAN resolves in DNS to be sure it all works.
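
For example, a lookup of the SCAN should return all three addresses from the zone file (192.168.99.53 to .55):

# host edc-scan.localdomain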

That’s it! You have successfully set up the dom0 for working with the virtual machines. Continue with the next part of the series, which introduces openFiler and how to install it as a domU with minimal effort.