Oakies Blog Aggregator

Ksplice in action

On July 21, 2011 Oracle announced that it had acquired Ksplice. With Ksplice, users can update the Linux kernel while it is running, that is, without a reboot or any other disruption. As of September 15, 2011, Ksplice is available at no additional charge to new and existing Oracle Premier Support customers on the Unbreakable Linux Network […]

Installing Grid Infrastructure 11.2.0.3 on Oracle Linux 6.1 with kernel UEK

Yesterday was the big day: the day Oracle released 11.2.0.3 for Linux x86 and x86-64. Time to download and experiment! The following assumes you have configured RAC 11g Release 2 before; it’s not a step-by-step guide on how to do this. I expect those to spring up like mushrooms in the next few days, especially since the weekend allows people to do the same as I did!

The Operating System

I have prepared a xen domU for 11.2.0.3, using the latest Oracle Linux 6.1 build I could find. In summary, I am using the following settings:

  • Oracle Linux 6.1 64-bit
  • Oracle Linux Server-uek (2.6.32-100.34.1.el6uek.x86_64)
  • Initially installed to use the “database server” package group
  • 3 NICs – 2 for the HAIP resource and the private interconnect with IP addresses in the ranges of 192.168.100.0/24 and 192.168.101.0/24. The public NIC is on 192.168.99.0/24
    • Node 1 uses 192.168.(99|100|101).129 for eth0, eth1 and eth2. The VIP uses 192.168.99.130
    • Node 2 uses 192.168.(99|100|101).131 for eth0, eth1 and eth2. The VIP uses 192.168.99.132
    • The SCAN is on 192.168.99.(133|134|135)
    • All name resolution is done via my dom0 bind9 server
  • I am using an 8GB virtual disk for the operating system, and a 20G LUN for the Oracle Grid and RDBMS homes. The 20G are subdivided into 2 logical volumes of 10G each, mounted on /u01/app/oracle and /u01/crs/11.2.0.3. Note that you now seem to need 7.5G for GRID_HOME
  • All software is owned by the oracle user
  • Shared storage is provided by the xen blktap driver
    • 3 x 1G LUNs for +OCR containing OCR and voting disks
    • 1 x 10G for +DATA
    • 1 x 10G for +RECO

Configuring Oracle Linux 6.1

Installation of the operating environment is beyond the scope of this article, and it hasn’t really changed much since 5.x. All I did was install the database server package group. For fans of xen-based para-virtualisation I wrote a separate article on the subject; although initially written for 6.0, it applies equally to 6.1. Here’s the xen native domU description (you can easily convert it to xenstore format using libvirt):

# cat node1.cfg
name="rac11_2_0_3_ol61_node1"
memory=4096
maxmem=8192
vcpus=4
on_poweroff="destroy"
on_reboot="restart"
on_crash="destroy"
localtime=0
builder="linux"
bootargs=""
extra=" "
disk=[
'file:/var/lib/xen/images/rac11_2_0_3_ol61_node1/disk0,xvda,w',
'file:/var/lib/xen/images/rac11_2_0_3_ol61_node1/oracle,xvdb,w',
'file:/var/lib/xen/images/rac11_2_0_3_ol61_shared/ocr1,xvdc,w!',
'file:/var/lib/xen/images/rac11_2_0_3_ol61_shared/ocr2,xvdd,w!',
'file:/var/lib/xen/images/rac11_2_0_3_ol61_shared/ocr3,xvde,w!',
'file:/var/lib/xen/images/rac11_2_0_3_ol61_shared/data1,xvdf,w!',
'file:/var/lib/xen/images/rac11_2_0_3_ol61_shared/fra1,xvdg,w!'
]
vif=[
'mac=00:16:1e:2b:1d:ef,bridge=br1',
'mac=00:16:1e:2b:1a:e1,bridge=br2',
'mac=00:16:1e:2a:1d:1f,bridge=br3',
]
bootloader = "pygrub"

Use the “xm create node1.cfg” command to start the domU. After the OS was ready I installed the following additional software to satisfy the installation requirements:

  • compat-libcap1
  • compat-libstdc++-33
  • libstdc++-devel
  • gcc-c++
  • ksh
  • libaio-devel

This is easiest done via yum and the public YUM server Oracle provides. It also has instructions on how to set your repository up.

# yum install compat-libcap1 compat-libstdc++-33 libstdc++-devel gcc-c++ ksh libaio-devel

On the first node only I wanted a VNC-like interface for a graphical installation. The older vnc-server package I loved from 5.x isn’t available anymore; the package you need is now called tigervnc-server. It also requires a new viewer to be downloaded from SourceForge. On the first node you might want to install these packages, unless you are brave enough to use a silent installation:

  • xorg-x11-utils
  • xorg-x11-server-utils
  • twm
  • tigervnc-server
  • xterm

Ensure that SELinux and the iptables firewall are turned off. SELinux is still configured in /etc/sysconfig/selinux, where the setting must be at least permissive. You can use “chkconfig iptables off” to disable the firewall service at boot. Check that there are no filter rules using “iptables -L”.
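The configured SELinux mode can also be checked from a script. The following is a minimal sketch, assuming the usual SELINUX=<mode> line in /etc/sysconfig/selinux; the function names are made up for illustration, and the file path is a parameter so the check can be tried against a copy. It only shows what applies at the next boot; `getenforce` reports the mode currently in effect.

```shell
#!/bin/bash
# Sketch: report the SELinux mode configured for the next boot.
# Assumption: the file uses the usual SELINUX=<mode> syntax. The path is a
# parameter (default /etc/sysconfig/selinux) so a copy can be checked too.
selinux_boot_mode() {
    local cfg=${1:-/etc/sysconfig/selinux}
    # Use the last uncommented SELINUX= line; SELINUXTYPE= does not match.
    awk -F= '/^[ \t]*SELINUX=/ { mode = $2 } END { print mode }' "$cfg" |
        tr -d '[:space:]"'
}

# Hypothetical helper: succeeds when the mode is permissive or disabled.
selinux_ok_for_install() {
    case "$(selinux_boot_mode "$1")" in
        permissive|disabled) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `selinux_ok_for_install || echo "set SELINUX=permissive first"` prints a reminder only when the file still says enforcing.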

I created the oracle account using the usual steps; this hasn’t changed since 11.2.0.2.

A few changes to /etc/sysctl.conf were needed; you can copy and paste the example below and append it to your existing settings. Make sure to raise the limits where you have more resources!

kernel.shmall = 4294967296
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 6815744
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_max = 1048576
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500
net.ipv4.conf.eth1.rp_filter = 0
net.ipv4.conf.eth2.rp_filter = 0
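Once the settings have been appended and loaded (for example with `sysctl -p`), a quick sanity check is to compare them with what the running kernel reports. This is a sketch under stated assumptions: the function name sysctl_diff is hypothetical, it maps each key to its file under /proc/sys by turning dots into slashes (so interface names containing dots, such as VLAN devices, are not handled), and it reports every difference, including values you deliberately set higher.

```shell
#!/bin/bash
# Sketch: compare "key = value" lines, as in the sysctl example above, with
# what the running kernel reports under /proc/sys. Prints one line per
# setting that differs or cannot be found; prints nothing if all match.
# Assumptions: dots in a key only separate path components, and the base
# directory can be overridden for testing.
sysctl_diff() {
    local wanted=$1 base=${2:-/proc/sys}
    local key value path current
    while IFS='=' read -r key value; do
        key=$(printf '%s' "$key" | tr -d '[:space:]')
        case "$key" in ''|\#*) continue ;; esac     # skip blanks and comments
        # Squeeze spaces/tabs so "250 32000 100 128" matches the
        # tab-separated form the kernel prints for kernel.sem.
        value=$(printf '%s\n' "$value" | tr -s ' \t' ' ' | sed 's/^ //; s/ $//')
        path="$base/$(printf '%s' "$key" | tr . /)"
        if [ ! -r "$path" ]; then
            echo "$key: not found"
            continue
        fi
        current=$(tr -s ' \t' ' ' < "$path" | sed 's/^ //; s/ $//')
        if [ "$current" != "$value" ]; then
            echo "$key: running=$current wanted=$value"
        fi
    done < "$wanted"
}
```

Saved as a separate file containing only the settings above, it can be run as `sysctl_diff oracle-sysctl.txt` (the file name is just an example).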

Also ensure that you set rp_filter for your private interconnect interfaces to 0 (or 2); my devices are eth1 and eth2. This requirement concerning reverse path filtering is new with 11.2.0.3.
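One detail worth checking is the "all" setting. The kernel's ip-sysctl documentation states that the value used for an interface is the maximum of conf/all/rp_filter and conf/<interface>/rp_filter, so setting eth1 to 0 does not help while "all" is still 1. A minimal sketch, assuming your kernel follows that documented rule; check_rp_filter is a hypothetical name, and the base directory is a parameter so the logic can be tested without a real /proc.

```shell
#!/bin/bash
# Sketch: check reverse path filtering on the private interconnect NICs.
# The effective value is taken as max(conf/all/rp_filter,
# conf/<iface>/rp_filter). 1 (strict) is flagged; 0 (off) and 2 (loose) pass.
check_rp_filter() {
    local base=$1; shift          # normally /proc/sys/net/ipv4/conf
    local all iface val eff rc=0
    all=$(cat "$base/all/rp_filter")
    for iface in "$@"; do
        val=$(cat "$base/$iface/rp_filter")
        eff=$val
        if [ "$all" -gt "$val" ]; then eff=$all; fi
        if [ "$eff" -eq 1 ]; then
            echo "$iface: effective rp_filter=1 (strict), expected 0 or 2"
            rc=1
        fi
    done
    return $rc
}
# Example: check_rp_filter /proc/sys/net/ipv4/conf eth1 eth2
```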

ASM “disks” must be owned by the GRID owner. The easiest way to change the permissions of the ASM disks is to create a new set of udev rules, such as the following:

# cat 61-asm.rules
KERNEL=="xvd[cdefg]1", OWNER="oracle", GROUP="asmdba", MODE="0660"

After a quick “start_udev” as root, the rules were applied.

Note that, as per my domU config file, I know the device names are persistent, so it was easy to come up with this solution. In real life you would use the dm-multipath package, which now allows setting the owner, group and permissions in /etc/multipath.conf for every ASM LUN.
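Given the kfod behaviour described next, it helps to be able to verify the device permissions quickly. A minimal sketch using GNU `stat -c`; check_asm_perms is a hypothetical helper, and since `%a` prints the octal mode without a leading zero, the rule's MODE="0660" is checked as 660.

```shell
#!/bin/bash
# Sketch: verify that ASM candidate devices carry the owner, group and mode
# the udev rule above is meant to set. Prints one line per mismatch and
# returns non-zero if any device differs or cannot be read.
check_asm_perms() {
    local owner=$1 group=$2 mode=$3; shift 3
    local dev actual rc=0
    for dev in "$@"; do
        # GNU stat: %U = owner name, %G = group name, %a = octal mode
        actual=$(stat -c '%U:%G:%a' "$dev") || { rc=1; continue; }
        if [ "$actual" != "$owner:$group:$mode" ]; then
            echo "$dev: $actual (expected $owner:$group:$mode)"
            rc=1
        fi
    done
    return $rc
}
# Example: check_asm_perms oracle asmdba 660 /dev/xvd[cdefg]1
```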

There was an interesting problem initially in that kfod seemed to trigger a change of permissions back to root:disk whenever it ran. Changing the ownership back to oracle only lasted until the next execution of kfod. The only fix I could come up with involved the udev rules.

Good news for those who suffered from the multicast problem introduced in 11.2.0.2: cluvfy now knows about it and checks for it during the post-hwos stage (I had already installed cvuqdisk):

[oracle@rac11203node1 grid]$ ./runcluvfy.sh stage -post hwos -n rac11203node1

Performing post-checks for hardware and operating system setup

Checking node reachability...
Node reachability check passed from node "rac11203node1"

Checking user equivalence...
User equivalence check passed for user "oracle"

Checking node connectivity...

Checking hosts config file...

Verification of the hosts config file successful

Node connectivity passed for subnet "192.168.99.0" with node(s) rac11203node1
TCP connectivity check passed for subnet "192.168.99.0"

Node connectivity passed for subnet "192.168.100.0" with node(s) rac11203node1
TCP connectivity check passed for subnet "192.168.100.0"

Node connectivity passed for subnet "192.168.101.0" with node(s) rac11203node1
TCP connectivity check passed for subnet "192.168.101.0"

Interfaces found on subnet "192.168.99.0" that are likely candidates for VIP are:
rac11203node1 eth0:192.168.99.129

Interfaces found on subnet "192.168.100.0" that are likely candidates for a private interconnect are:
rac11203node1 eth1:192.168.100.129

Interfaces found on subnet "192.168.101.0" that are likely candidates for a private interconnect are:
rac11203node1 eth2:192.168.101.129

Node connectivity check passed

Checking multicast communication...

Checking subnet "192.168.99.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.99.0" for multicast communication with multicast group "230.0.1.0" passed.

Checking subnet "192.168.100.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.100.0" for multicast communication with multicast group "230.0.1.0" passed.

Checking subnet "192.168.101.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.101.0" for multicast communication with multicast group "230.0.1.0" passed.

Check of multicast communication passed.
Check for multiple users with UID value 0 passed
Time zone consistency check passed

Checking shared storage accessibility...

Disk                                  Sharing Nodes (1 in count)
------------------------------------  ------------------------
/dev/xvda                             rac11203node1
/dev/xvdb                             rac11203node1
/dev/xvdc                             rac11203node1
/dev/xvdd                             rac11203node1
/dev/xvde                             rac11203node1
/dev/xvdf                             rac11203node1
/dev/xvdg                             rac11203node1

Shared storage check was successful on nodes "rac11203node1"

Post-check for hardware and operating system setup was successful.

As always, I tried to fix as many problems before invoking runInstaller as possible. The “-fixup” option to runcluvfy is again very useful. I strongly recommend running the fixup script prior to executing the OUI binary.

The old trick of removing /etc/ntp.conf still makes the NTP check pass, in which case you get the Cluster Time Synchronization Service (ctssd) for time synchronisation. You should not do this in production: consistent times in the cluster are paramount!

During my first attempts I encountered an issue with the free space check later in the installation. OUI wants 7.5G for GRID_HOME, even though the installation “only” took around 3G in the end. I exported TMP and TEMP to point to my 10G mount point to avoid this warning:

$ export TEMP=/u01/crs/temp
$ export TMP=/u01/crs/temp
$ ./runInstaller
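Rather than waiting for OUI's space check to complain, the free space can be checked up front. A sketch, assuming POSIX-format `df` output; has_free_gib is a hypothetical helper, and 7.5 is simply the figure OUI asked for in this installation.

```shell
#!/bin/bash
# Sketch: succeed only if the filesystem holding <dir> has at least <GiB>
# available. df -P gives one line per filesystem, -k reports 1K blocks.
has_free_gib() {
    local dir=$1 need=$2 avail_kb
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    # awk does the comparison so fractional sizes such as 7.5 work
    awk -v have="$avail_kb" -v need="$need" \
        'BEGIN { if (have + 0 >= need * 1048576) exit 0; exit 1 }'
}
# Example: has_free_gib /u01/crs/11.2.0.3 7.5 || echo "not enough room for GRID_HOME"
```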

The installation procedure for Grid Infrastructure 11.2.0.3 is almost exactly the same as for 11.2.0.2, except for the option to change the AU size for the initial disk group you create.


Once you have completed the wizard, it’s time to hit the “install” button. The magic again happens in the root.sh file, or rootupgrade.sh if you are upgrading. I included the root.sh output so you have something to compare against:

Performing root user operation for Oracle 11g

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME=  /u01/crs/11.2.0.3

Enter the full pathname of the local bin directory: [/usr/local/bin]: Creating y directory...
Copying dbhome to y ...
Copying oraenv to y ...
Copying coraenv to y ...

Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/crs/11.2.0.3/crs/install/crsconfig_params
Creating trace directory
User ignored Prerequisites during installation
OLR initialization - successful
root wallet
root wallet cert
root cert export
peer wallet
profile reader wallet
pa wallet
peer wallet keys
pa wallet keys
peer cert request
pa cert request
peer cert
pa cert
peer root cert TP
profile reader root cert TP
pa root cert TP
peer pa cert TP
pa peer cert TP
profile reader pa cert TP
profile reader peer cert TP
peer user cert
pa user cert
Adding Clusterware entries to upstart
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac11203node1'
CRS-2676: Start of 'ora.mdnsd' on 'rac11203node1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac11203node1'
CRS-2676: Start of 'ora.gpnpd' on 'rac11203node1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac11203node1'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac11203node1'
CRS-2676: Start of 'ora.gipcd' on 'rac11203node1' succeeded
CRS-2676: Start of 'ora.cssdmonitor' on 'rac11203node1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac11203node1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac11203node1'
CRS-2676: Start of 'ora.diskmon' on 'rac11203node1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac11203node1' succeeded

ASM created and started successfully.

Disk Group OCR created successfully.

clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4256: Updating the profile
Successful addition of voting disk 1621f2201ab94f32bf613b17f62982b0.
Successful addition of voting disk 337a3f0b8a2d4f7ebff85594e4a8d3cd.
Successful addition of voting disk 3ae328cce2b94f3bbfe37b0948362993.
Successfully replaced voting disk group with +OCR.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   1621f2201ab94f32bf613b17f62982b0 (/dev/xvdc1) [OCR]
2. ONLINE   337a3f0b8a2d4f7ebff85594e4a8d3cd (/dev/xvdd1) [OCR]
3. ONLINE   3ae328cce2b94f3bbfe37b0948362993 (/dev/xvde1) [OCR]
Located 3 voting disk(s).
CRS-2672: Attempting to start 'ora.asm' on 'rac11203node1'
CRS-2676: Start of 'ora.asm' on 'rac11203node1' succeeded
CRS-2672: Attempting to start 'ora.OCR.dg' on 'rac11203node1'
CRS-2676: Start of 'ora.OCR.dg' on 'rac11203node1' succeeded
CRS-2672: Attempting to start 'ora.registry.acfs' on 'rac11203node1'
CRS-2676: Start of 'ora.registry.acfs' on 'rac11203node1' succeeded
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

That’s it! After returning to the OUI screen you run the remaining assistants and are finally rewarded with the success message.

Better still, I could now log in to SQL*Plus and was rewarded with the new version:

$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Sat Sep 24 22:29:45 2011

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL>

Summary

You might remark that only one node has ever been referenced in the output. That is correct: my lab box has limited resources, and I’d like to test the addNode.sh script for each new release, so please be patient! I’m planning an article about upgrading to 11.2.0.3 soon, as well as one about adding a node. One thing I noticed was the abnormally high CPU usage of the CSSD processes ocssd.bin, cssdagent and cssdmonitor, something I find alarming at the moment.

top - 22:53:19 up  1:57,  5 users,  load average: 5.41, 4.03, 3.77
Tasks: 192 total,   1 running, 191 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.3%us,  0.2%sy,  0.0%ni, 99.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4102536k total,  3500784k used,   601752k free,    59792k buffers
Swap:  1048568k total,     4336k used,  1044232k free,  2273908k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27646 oracle    RT   0 1607m 119m  53m S 152.0  3.0  48:57.35 /u01/crs/11.2.0.3/bin/ocssd.bin
27634 root      RT   0  954m  93m  55m S 146.0  2.3  31:45.50 /u01/crs/11.2.0.3/bin/cssdagent
27613 root      RT   0  888m  91m  55m S 96.6  2.3  5124095h /u01/crs/11.2.0.3/bin/cssdmonitor
28110 oracle    -2   0  485m  14m  12m S  1.3  0.4   0:34.65 asm_vktm_+ASM1
28126 oracle    -2   0  499m  28m  15m S  0.3  0.7   0:04.52 asm_lms0_+ASM1
28411 root      RT   0  500m 144m  59m S  0.3  3.6  5124095h /u01/crs/11.2.0.3/bin/ologgerd -M -d /u01/crs/11.2.0.3/crf/db/rac11203node1
32394 oracle    20   0 15020 1300  932 R  0.3  0.0  5124095h top
1 root      20   0 19336 1476 1212 S  0.0  0.0   0:00.41 /sbin/init

...

11.2.0.2 certainly didn’t use that much CPU across 4 cores…
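To keep an eye on whether the CSSD load persists, the figures for just those three processes can be summed. A small sketch reading `ps -eo pcpu,comm` style input; cssd_cpu is a hypothetical name. Note that ps computes %CPU as CPU time divided by the process's lifetime, while top reports usage over its refresh interval, so the two will not match exactly.

```shell
#!/bin/bash
# Sketch: sum %CPU for the CSSD processes from "ps -eo pcpu,comm" style
# input on stdin, e.g.:  ps -eo pcpu,comm | cssd_cpu
# The header line ("%CPU COMMAND") never matches and is ignored.
cssd_cpu() {
    awk '$2 == "ocssd.bin" || $2 == "cssdagent" || $2 == "cssdmonitor" { sum += $1 }
         END { printf "%.1f\n", sum }'
}
```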

Update: I have just repeated the same installation on VirtualBox 4.1.2 with less potent hardware, and funnily enough the CPU problem has disappeared. How is that possible? I need to understand more, and maybe update the XEN host to something more recent.

Direct I/O for Solaris benchmarking

ZFS doesn’t have direct I/O.
Solaris dd doesn’t have an iflag=direct option.

Thus, for I/O benchmarking you have to unmount and remount the file system between tests with UFS, or export and re-import the pools with ZFS.

But there is a trick: reading from the raw devices under /dev/rdsk bypasses the cache.

Here is a simple piece of code that will benchmark the disks. The code was put together by George Wilson and Jeff Bonwick (I believe).

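#!/bin/ksh
# List the disks known to format; entry lines look like "0. c0t0d0 <...>",
# so the second field is the disk name.
disks=`format < /dev/null | grep c.t.d | nawk '{print $2}'`

# Read 1024 x 64 KB = 67,108,864 bytes from the raw device (bypassing the
# cache) and print MB/sec, using the "real" elapsed time reported by ptime.
getspeed1()
{
       ptime dd if=/dev/rdsk/${1}s0 of=/dev/null bs=64k count=1024 2>&1 |
           nawk '$1 == "real" { printf("%.0f\n", 67.108864 / $2) }'
}

# Run the test three times and report the median result.
getspeed()
{
       for iter in 1 2 3
       do
               getspeed1 $1
       done | sort -n | tail -2 | head -1
}

for disk in $disks
do
       echo $disk `getspeed $disk` MB/sec
done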
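The constant in getspeed1 is just the byte count of the read expressed in decimal megabytes: 65,536 bytes × 1,024 blocks = 67,108,864 bytes = 67.108864 MB. If you change bs or count, the constant has to change too; a small helper (mb_per_sec is a hypothetical name) makes the arithmetic explicit.

```shell
#!/bin/bash
# Sketch: decimal MB/s for a dd run of <count> blocks of <bs> bytes that
# took <seconds>. With bs=64k (65536 bytes) and count=1024 this reproduces
# the 67.108864 constant hard-coded in getspeed1.
mb_per_sec() {
    awk -v bs="$1" -v n="$2" -v t="$3" 'BEGIN { printf "%.0f\n", bs * n / 1e6 / t }'
}
```

For example, `mb_per_sec 65536 1024 0.5` prints 134, the same value getspeed1 would print for a read taking half a second.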

 

11.2.0.3 Patch Set For Oracle Database Server

Just a quick post that the 11.2.0.3 patch set for Oracle Database Server has been released for x86 and x86-64 platforms (other ports will soon follow). The patchset number is 10404530 and is available for download from My Oracle Support.

Also be sure to check out the 11.2.0.3 New Features Guide.

Quick Reference

.. and a temporary one, to boot:

If you’ve seen anything of Oracle’s latest offering, the Database Appliance, you might want to read Alex Gorbachev’s summary.

 

Oracle 11.2.0.3-can’t be long now!

Update: well, it’s out, actually. See the comments below. However, the certification matrix hasn’t been updated, so it’s anyone’s guess whether Oracle/Red Hat 6 are certified at this point in time.

Tanel Poder announced it a few days ago, and 11.2.0.3 must be ready for release very soon. It has even been spotted on the “latest patchset” page on OTN, only to be removed quickly. After another tweet from Laurent Schneider, it was time to investigate what’s new. The easiest way is to point your browser to tahiti.oracle.com and type “11.2.0.3” into the search box. You are going to find a wealth of new information!

As a RAC person at heart, I am naturally interested in RAC features first. The new features I spotted in the Grid Infrastructure installation guide for Linux are listed here:

http://download.oracle.com/docs/cd/E11882_01/install.112/e22489/whatsnew.htm#CEGJBBBB

Additional information for this article was taken from the “New Features” guide:

http://download.oracle.com/docs/cd/E11882_01/server.112/e22487/chapter1_11203.htm#NEWFTCH1-c

So the question is: what’s in it for us?

ASM Cluster File System

As I expected, there is now support for ACFS and ADVM on Oracle’s own kernel. This has been overdue for a while. I remember how surprised I was when I installed RAC on Oracle Linux 5.5 with the UEK kernel, only to see that infamous “…not supported” output when the installer probed the kernel version. Supported kernels are UEK5-2.6.32-100.34.1 and subsequent updates to the 2.6.32-100 kernels, for OL5 and OL6.

A big surprise is support for ACFS on SLES; I thought that was pretty much dead in the water after all that messing around from Novell. ACFS has always worked on SLES 10 up to SP3, but it never did on SLES 11. The requirement is SLES 11 SP1, and it has to be 64-bit.

There are quite a few additional changes to ACFS. For example, it’s now possible to use ACFS replication and tagging on Windows.

Upgrade

If one of the nodes in the cluster being upgraded with the rootupgrade.sh script fails during the execution of said script, the operation can be completed with the new “force” flag.

Random Bits

I found the following note on MOS regarding the time zone file: Actions For DST Updates When Upgrading To Or Applying The 11.2.0.3 Patchset [ID 1358166.1]. I suggest you have a look at that note, as it mentions a new pre-upgrade script you need to download, and corrective actions for 11.2.0.1 and 11.2.0.2. I’m sure it’s going to be mentioned in the 11.2.0.3 patch readme as well.

There are also changes expected with the dreaded mutex problem in busy systems, MOS note WAITEVENT: “library cache: mutex X” [ID 727400.1] lists 11.2.0.3 as the release where many of the problems related to this are fixed. Time will tell if they are…

Further enhancements focus on Warehouse Builder and XML. SQL Apply and LogMiner have also been enhanced, which is good news for users of Streams and logical standby databases.

Summary

It is much too early to say anything else about the patchset. Quite a few important documents don’t have a “new features” section yet. That includes the Real Application Clusters Administration and Deployment Guide, which I’ll cover as soon as it’s out. From a quick glance at the still unreleased patchset, it seems to be less of a radical change than 11.2.0.2 was, which is a good thing. Unfortunately the certification matrix hasn’t been updated yet; I am very keen to see support for Oracle/Red Hat Linux 6.x.

Friday Philosophy – Human Tuning Issues

Oracle Tuning is all about technical stuff. It’s perhaps the most detail-focused and technical aspect of Oracle Administration there is. Explain Plans, Statistics, the CBO, database design, Physical implementation, the impact of initialisation variables, subquery factoring, sql profiles, pipeline functions,… To really get to grips with things you need to do some work with 10046 and 10053 traces, block dumps, looking at latching and queueing…

But I realised a good few years ago that there is another, very important aspect and one that is very often overlooked. People and their perception. The longer I am on an individual site, the more significant the People side of my role is likely to become.

Here is a little story for you. You’ll probably recognise it, it’s one that has been told (in many guises) before, by several people – it’s almost an IT Urban Myth.

When I was but a youth, not long out of college, I got a job with Oracle UK (who had a nice, blue logo back then) as a developer on a complex and large hospital system. We used Pyramid hardware if I remember correctly. When the servers were put in place, only half the memory boards and half the CPU boards were initiated. We went live with the system like that. Six months later, the users had seen the system was running quite a bit slower than before and started complaining. An engineer came in and initiated those other CPU boards and Memory boards. Things went faster and all the users were happy. OK, they did not throw a party but they stopped complaining. Some even smiled.

I told you that you would recognise the story. Of course, I’m now going to go on about the dishonest vendor and what was paid for this outrageous “tuning work”. But I’m not. This hobbling of the new system was done on purpose and it was done at the request of “us”, the application developers. Not the hardware supplier. It was done because some smart chap knew that as more people used the system and more parts of it were rolled out, things would slow down and people would complain. So some hardware was held in reserve so that the whole system could have a performance boost once workload had ramped up and people would be happy. Of course, the system was now only as fast as if it had been using all the hardware from day one, but the key difference was that rather than having unhappy users as things “were slower than 6 months ago”, everything was performing faster than it had done just a week or two ago, and users were happy due to the recent improvement in response time. Same end point from a performance perspective, a much happier end point for the users.

Another aspect of this Human side of Tuning is unstable performance. People get really unhappy about varying response times. You get this sometimes with Parallel Query when you allow Oracle to reduce the number of parallel threads used depending on the workload on the server {there are other causes of the phenomenon, such as clashes with when stats are gathered or just random variation in data volumes}. So sometimes a report comes back in 30 minutes, sometimes it comes back in 2 hours. If you go from many parallel threads to single-threaded execution it might be 4 hours. That really upsets people. In this situation you probably need to look at whether you can fix a degree of parallelism that gives a response time that is good enough for business reasons and can always be achieved. OK, you might be able to get that report out quicker 2 days out of 5, but you won’t have a user who is happy on 3 days and ecstatic with joy on the 2 days the report is early. You will have a user who is really annoyed on 3 days and grumbling about “what about yesterday!” on the other 2 days.

Of course this applies to screens as well. If humans are going to be using what I am tuning and would be aware of changes in performance (i.e. the total run time is above about 0.2 seconds), I try to aim for stable and good performance, not “outright fastest but might vary” performance. Because we are all basically grumpy creatures. We accept what we think cannot be changed, but if we see something could be better, we want it!

People are happiest with consistency. So long as performance is good enough to satisfy the business requirements, generally speaking you just want to strive to maintain that level of performance. {There is one strong counter-argument in that ALL work on the system takes resource, so reducing a very common query or update by 75% frees up general resource to aid the whole system}.

One other aspect of Human Tuning I’ll mention is one that UI developers tend to be very attuned to. Users want to see something happening, like a little icon or a message saying “processing” followed soon by another saying “verifying” or something like that. It does not matter what the messages are {though spinning hour glasses are no longer acceptable}, they just like to see that stuff is happening. So, if a screen can’t be made to come back in less than a small number of seconds, stick up a message or two as it progresses. Better still, give them some information up front whilst the system scrapes the rest together. It won’t be faster, it might even be slower overall, but if the users are happier, that is fine. Of course, the Oracle CBO implements this sort of idea when you specify “first_rows_n” as the optimizer goal as opposed to “all_rows”. You want to get some data onto an interactive screen as soon as possible, for the users to look at, rather than aim for the fastest overall response time.

After all, the defining criterion of IT system success is that the users “are happy”, i.e. accept the system.

This has an interesting impact on my technical work as a tuning “expert”. I might not tune up a troublesome report or SQL statement as much as I possibly can. I had a recent example of this where I had to make some batch work run faster. I identified 3 or 4 things I could try and using 2 of them I got it to comfortably run in the window it had to run in {I’m being slightly inaccurate, it was now not the slowest step and upper management focused elsewhere}. There was a third step I was pretty sure would also help. It would have taken a little more testing and implementing and it was not needed right now. I documented it and let the client know about it, that there was more that could be got. But hold it in reserve because you have other things to do and, heck, it’s fast enough. {I should make it clear that the system as a whole was not stressed at all, so we did not need to reduce system load to aid all other things running}. In six months the step in the batch might not be fast enough or, more significantly, might once more be the slowest step and the target for a random management demand for improvement – in which case take the time to test and implement item 3. (For those curious people, it was to replace a single merge statement with an insert and an update, both of which could use different indexes).

I said it earlier. Often you do not want absolute performance. You want good-enough, stable performance. That makes people happy.

SIOUG 2011 Conference

Sandwiched in the middle of a busy month or so for me - Cary Millsap's seminar the other week (more on that later) and Openworld from the end of next week - was my long-awaited trip to the Slovenian Oracle User Group Conference in Portoroz. I've been promising Joze Senegacnik I would go to Slovenia for several years now, but with it being so close to Openworld, it means quite a few days away from client work. I finally made it and I'm delighted I did. It's a shame that I couldn't make it for the weekend before and maybe get to fly in Joze's plane, but the conference venue, atmosphere and attendees made up for it.

I arrived fairly late on Monday evening in a car from Trieste airport, laid on by the user group and very much appreciated. The minor hiccup of being delivered to the wrong hotel was soon sorted by the concierge, who ran me over to the correct one, where the food might have finished but the wine certainly hadn't! I met and had fun with some Bulgarian visitors, including the soon-to-be-married Svetoslav Gyurov (@sgyurov), whom I'd recently met on Twitter, and the Finns, who always seem to end up everywhere ;-)

The next day I'd planned to give the "Statistics on Partitioned Objects" presentation over two 45-minute sessions, and I had also agreed to help out by standing in for a late cancellation in the slot following those two, which meant I'd have to talk for a couple of hours in the afternoon. On the basis that people would probably be bored to tears of me by then, I picked "How I Learned To Love Pictures", which is always fun.

The morning was spent finalising the demos, but when I tried getting the 'How I Learned To Love Pictures' demos to run, they were far too temperamental, and I decided that it might be safer to make a late change to a Real Time SQL Monitoring presentation that I've given a few times and that is much easier to do because it only requires a few Active reports. Best of all, it was still about performance and pictures, so I didn't feel I would be short-changing people, as long as I made it clear what I was going to do!

I think the presentations went pretty well. Statistics is a pretty dry subject to listen to someone talk about for an hour and a half straight after lunch, but the fact that the majority seemed to come back after the coffee break was encouraging! Invitations to speak from both the Bulgarian and Serbian User Groups were probably another good sign.

One slight disappointment was that, because I switched to SQL Monitoring at the last moment and didn't tell Joze, I covered a small part of the material for his next presentation on "How to get the best from the Cost-Based Optimiser" but I don't think it affected his presentation too much as he was covering a wide variety of subjects and I thought it was a great reminder of some of the new features to consider, including SQL Plan Management, which is what I'll be talking about at Openworld.

By now I was really tired so had a whale of a time with a small cold beer and a couple of smokes on the balcony of my room, watching an amazing sunset. Which put me right in the mood for dinner and some partying but, whilst the company may have been excellent again, their partying skills were shabby as we handed over something like 12 unused free drinks vouchers to another lucky attendee and then everyone retired to their rooms to finish off their presentations over bottles of water! Sad, really sad.

It was such a flying visit that the next morning I only really had time to catch up on some work before Joze drove Debra and me to Trieste airport for our Ryanair flight home (which wasn't too bad really).

I had a great time, was treated extremely well and look forward to going back. Portoroz is a lovely place and the weather was beautiful, which made such a difference from the likes of London or Birmingham! I think I may also have agreed to speak at the upcoming Bulgarian conference ;-)

Flash Cache

Have you ever heard the suggestion that if you see time lost on the event "write complete waits" you need to get some faster discs?
So what’s the next move when you’ve got 96GB of flash cache plugged into your server (check the parameters below) and you see time lost on the event "write complete waits: flash cache"?

db_flash_cache_file           /flash/oracle/flash.dat
db_flash_cache_size           96636764160
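The 96GB quoted above is the decimal reading of db_flash_cache_size; the same number of bytes is exactly 90 GiB. A minimal Python check of that arithmetic (nothing Oracle-specific, just the unit conversion):

```python
# Value of db_flash_cache_size copied from the parameter listing above (bytes).
DB_FLASH_CACHE_SIZE = 96_636_764_160

gib = DB_FLASH_CACHE_SIZE / 2**30  # binary gigabytes
gb = DB_FLASH_CACHE_SIZE / 10**9   # decimal gigabytes

print(f"{gib:.1f} GiB, {gb:.1f} GB")  # prints "90.0 GiB, 96.6 GB"
```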

Here’s an extract from a standard 11.2.0.2 AWR report:

                           Total
Event                      Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
write complete waits: flas    32                     3.1              21.9  75.0


                           Waits
                           64ms
Event                      to 2s <32ms <64ms <1/8s <1/4s <1/2s   <1s   <2s  >=2s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
write complete waits: flas    11   3.1   3.1   3.1   3.1   6.3   6.3  12.5  62.5


                           Waits
                            4s
Event                      to 2m   <2s   <4s   <8s  <16s  <32s  < 1m  < 2m  >=2m
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
write complete waits: flas    20  37.5  18.8   9.4  12.5   9.4   9.4   3.1
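AWR reports these histograms as percentages, so with only 32 waits each percentage maps back to a handful of waits. The Python sketch below does that conversion for the middle histogram; `to_counts` is a hypothetical helper, and the counts are approximate because the percentages are rounded to one decimal place. It assumes, as the figures suggest, that the first column of the second and third histograms holds everything below its threshold: the 37.5 under <2s in the third histogram is exactly the sum of the sub-2s columns of the second.

```python
# Percentages copied from the middle "write complete waits: flash cache"
# histogram above; the event recorded 32 waits in total.
TOTAL_WAITS = 32

# Assumption: the first column ("<32ms") holds every wait below 32ms.
BUCKETS = ["<32ms", "<64ms", "<1/8s", "<1/4s", "<1/2s", "<1s", "<2s", ">=2s"]
PCTS = [3.1, 3.1, 3.1, 3.1, 6.3, 6.3, 12.5, 62.5]


def to_counts(total, percentages):
    """Hypothetical helper: turn rounded AWR percentages back into
    approximate wait counts."""
    return [round(total * p / 100) for p in percentages]


counts = to_counts(TOTAL_WAITS, PCTS)
for name, n in zip(BUCKETS, counts):
    print(f"{name:>6} {n:3d}")
print("total:", sum(counts))
# Waits from 32ms up to (but not including) 2s: the <64ms .. <2s buckets.
print("32ms to 2s:", sum(counts[1:7]))
```

On this reading the waits from 32ms up to 2s add up to 11, the figure printed in the "64ms to 2s" column, which may be one of the places where the arithmetic looks odd at the edges.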

It’s interesting to see the figures for single and multiblock reads from flash cache. The hardware is pretty good at single block reads, but there’s a strange pattern to the multiblock read times. The first set of figures is from the Top N section of the AWR; the second set is from the event histogram sections (the 11.2 versions are more informative than the 11.1 and 10.2 ones, even though the arithmetic seems a little odd at the edges). Given the number of reads from flash cache in the hour, the tiny number of write waits isn’t something I’m going to worry about just yet: my plan is to get rid of a couple of million flash cache reads first. (Most of the read by other session waits are waiting on the flash cache read as well, so I’ll be aiming at two birds with one stone.)

                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
db flash cache single block ph    3,675,650       5,398      1   35.2 User I/O
DB CPU                                            4,446          29.0
read by other session             1,092,573       1,407      1    9.2 User I/O
direct path read                      6,841       1,371    200    8.9 User I/O
db file sequential read             457,099       1,046      2    6.8 User I/O
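The "Avg wait (ms)" column is total wait time divided by the number of waits, rounded to whole milliseconds. A minimal Python sketch that recomputes it with two decimal places from the figures above; `avg_wait_ms` is a hypothetical helper, and the event names are kept exactly as truncated in the report.

```python
# (event as printed in the AWR Top N extract, waits, total wait time in seconds)
TOP_EVENTS = [
    ("db flash cache single block ph", 3_675_650, 5_398),
    ("read by other session", 1_092_573, 1_407),
    ("direct path read", 6_841, 1_371),
    ("db file sequential read", 457_099, 1_046),
]


def avg_wait_ms(waits, seconds):
    """Hypothetical helper: average wait in milliseconds."""
    return seconds * 1000 / waits


for event, waits, secs in TOP_EVENTS:
    print(f"{event:<32} {avg_wait_ms(waits, secs):8.2f} ms")
```

That gives roughly 1.47ms per single block flash cache read against about 200ms per direct path read.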




                                                    % of Waits
                                 -----------------------------------------------
                           Total
Event                      Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
db flash cache multiblock  22.1K   7.6  12.3   7.7   6.0   8.8  24.7  32.8
db flash cache single bloc 3683K  66.6  22.6   3.6   2.4   3.9    .8    .0



                           Waits
                           64ms
Event                      to 2s <32ms <64ms <1/8s <1/4s <1/2s   <1s   <2s  >=2s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
db flash cache multiblock   7255  67.2  21.5  10.4    .9    .0
db flash cache single bloc  1587 100.0    .0    .0    .0    .0    .0


Answers to the following question on a postcard, please: why do we get a “double hump” in the distribution of multiblock reads?
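The two multiblock histograms can be combined into one set of buckets: the 32.8% in the first histogram's <=1s column equals the 21.5 + 10.4 + 0.9 + 0.0 spread across the 32ms to 1s columns of the second (whose <32ms column, 67.2, is the running total of everything below 32ms). The Python sketch below lists those combined buckets and marks any bucket higher than both neighbours; `local_peaks` is a hypothetical helper. Each bucket is twice as wide as the one before, so this is a log-scale view: it shows where the two humps sit, not why they are there.

```python
# Percent of "db flash cache multiblock" waits per bucket, taken from the
# two AWR histograms above: the first histogram's "<=1s" bucket (32.8%) is
# replaced by its finer 32ms-1s breakdown from the second histogram.
BUCKETS = ["<1ms", "<2ms", "<4ms", "<8ms", "<16ms", "<32ms",
           "<64ms", "<1/8s", "<1/4s", "<1/2s"]
PCTS = [7.6, 12.3, 7.7, 6.0, 8.8, 24.7, 21.5, 10.4, 0.9, 0.0]


def local_peaks(values):
    """Hypothetical helper: indexes whose value is strictly higher than both
    neighbours (a missing neighbour at either end counts as 0)."""
    padded = [0.0, *values, 0.0]
    return [i - 1 for i in range(1, len(padded) - 1)
            if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]]


peaks = local_peaks(PCTS)
print([BUCKETS[i] for i in peaks])  # prints ['<2ms', '<32ms']
```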

Oracle Database Appliance–Bringing Exadata To The Masses. And, No More Patching!

I just googled ‘Oracle Database Appliance’ +Exadata and got offered 446,000 goodies to click on. There are only two problems with that:

1. Exadata is not an appliance.

2. Oracle Database Appliance has no Exadata software in it.

Get Out Of Jail Free Card
In this Computerworld article, Mark Hurd is quoted as saying the Oracle Database Appliance brings “the benefits of Exadata to entry-level systems.” So, I googled ‘this brings the benefits of Exadata to entry level systems’ and was offered 36,300 nuggets of wisdom to read.

I have only one thing to say about this big news. There is a huge difference between a pre-configured system and an appliance.

I’ve never had to apply a patch to my toaster. The Oracle Database Appliance is not an appliance; it is a pre-configured Real Application Clusters system.

SMB (Small/Medium Business) + Real Application Clusters? Who is handing out the get out of jail free cards? Who briefed Oracle’s Executives on what this thing actually is before they started talking about it?
