One of the tasks I am performing quite regularly is to deploy Oracle software in form of an RPM. In a previous post I described how this proces could work, based on a post by Frits Hoogland.
Employing the same method, I ran into problems with Oracle 11.2.0.x clients. A few facts to start with:
The problem described here is most likely applicable to other Oracle clients as well although I haven’t verified that.
The problem
While I was performing a three day seminar recently in Switzerland I came across this new option in cluvfy.
Normally you’d run cluvfy in preparation of the installation of Grid Infrastructure or a set of RAC binaries to ensure everything is ready for the next step in the RAC install process. Beginning with 11.2.0.3, there is another option that’s been sneaked in without too much advertisement: the healthcheck.
Part of the “comp” checks, it takes the following options:
cluvfy comp healthcheck [-collect {cluster|database}] [-db db_unique_name] [-bestpractice|-mandatory] [-deviations] [-html] [-save [-savedir directory_path]
The most extensive report is run without any options, as shown in the appendix (the output is too long to display at this stage of the post) You have the following options:
One of the promises from Oracle for OEM 12c was improved support for Oracle RAC One Node. I have spent quite a bit of time researching RON, and wrote a little article in 2 parts about it which you can find here:
One of my complaints with it was the limited support in OEM 11.1. At the time I was on a major consolidation project, which would have used OEM for management of the database.
OEM 11.1
This post is about a potential pitfall when migrating from 11.2.0.x to the next point release. I stumbled over problem this one on a two node cluster.
The operating system is Oracle Linux 5.5 running 11.2.0.2.3 and I wanted to go to 11.2.0.3.0. As you know, Grid Infrastructure upgrades are out-of-place, in other words require a separate Oracle home. This is also one of the reasons I wouldn’t want less than 20G on a non-lab like environment for the Grid Infrastructure mount points …
Now when you are upgrading from 11.2.0.x to 11.2.0.3 you need to apply a one-off patch, but the correct one! Search for patch number 12539000 (11203:ASM UPGRADE FAILED ON FIRST NODE WITH ORA-03113) and apply the one that matches your version-and pay attention to these PSUs! There is the obvious required opatch update to be performed before again as well.
I have run into an interesting problem with my Red Hat 5.5 installation. Naively I assumed that ext4 has been around for a long time it would be stable. For a test I performed for a friend, I created my database files on a file system formatted with ext4 and mounted it the same way I would have mounted an ext3 file system:
$ mount | grep ext4 /dev/mapper/mpath43p1 on /u02/oradata type ext4 (rw)
Now when I tried to create a data file within a tablespace of a certain size, I got block corruption which I found very interesting. My first thought was: you must have a corruption of the file system. So I shut down all processes accessing /u02/oradata and gave the file system a thorough checking.
As part of a server move from one data centre to another I enjoyed working in the depths of Clusterware. This one has been a rather simple case though: the public IP addresses were the only part of the package to change: simple. One caveat though was the recreation of the OCR disk group I am using for the OCR and 3 copies of the voting file. I decided to reply on the backups I took before the server move.
Once the kit has been rewired in the new data centre, it was time to get active. The /etc/multipath.conf file had to be touched to add the new LUNs for my +OCR disk group. I have described the processes in a number of articles, for example here:
http://martincarstenbach.wordpress.com/2011/01/14/adding-storage-dynamic...
A few facts before we start:
As I have hinted at during my last post about installing Oracle 11.2.0.3 on Oracle Linux 6.1 with Kernel UEK, I have planned another article about adding a node to a cluster.
I deliberately started the installation of my RAC system with only one node to allow my moderately spec’d hardware to deal with a second cluster node. In previous versions of Oracle there was a problem with node additions: the $GRID_HOME/oui/bin/addNode.sh script did pre-requisite checks that used to fail when you had used ASMLib. Unfortuntely, due to my setup I couldn’t test if that was solved (I didn’t use ASMLib).
Cluvfy
Installing Grid Infrastructure 11.2.0.3 on Oracle Linux 6.1
Yesterday was the big day, or the day Oracle release 11.2.0.3 for Linux x86 and x86-64. Time to download and experiment! The following assumes you have already configured RAC 11g Release 2 before, it’s not a step by step guide how to do this. I expect those to shoot out of the grass like mushrooms in the next few days, especially since the weekend allows people to do the same I did!
The Operating System
I have prepared a xen domU for 11.2.0.3, using the latest Oracle Linux 6.1 build I could find. In summary, I am using the following settings:
Configuring Oracle Linux 6.1
Installation of the operating environment is beyond the scope of this article, and it hasn’t really changed much since 5.x. All I did was to install the database server package group. I wrote this article for fans of xen-based para-virtualisation. Although initially for 6.0, it applies equally for 6.1. Here’s the xen native domU description (you can easily convert that to xenstore format using libvirt):
# cat node1.cfg name="rac11_2_0_3_ol61_node1" memory=4096 maxmem=8192 vcpus=4 on_poweroff="destroy" on_reboot="restart" on_crash="destroy" localtime=0 builder="linux" bootargs="" extra=" " disk=[ 'file:/var/lib/xen/images/rac11_2_0_3_ol61_node1/disk0,xvda,w', 'file:/var/lib/xen/images/rac11_2_0_3_ol61_node1/oracle,xvdb,w', 'file:/var/lib/xen/images/rac11_2_0_3_ol61_shared/ocr1,xvdc,w!', 'file:/var/lib/xen/images/rac11_2_0_3_ol61_shared/ocr2,xvdd,w!', 'file:/var/lib/xen/images/rac11_2_0_3_ol61_shared/ocr3,xvde,w!', 'file:/var/lib/xen/images/rac11_2_0_3_ol61_shared/data1,xvdf,w!', 'file:/var/lib/xen/images/rac11_2_0_3_ol61_shared/fra1,xvdg,w!' ] vif=[ 'mac=00:16:1e:2b:1d:ef,bridge=br1', 'mac=00:16:1e:2b:1a:e1,bridge=br2', 'mac=00:16:1e:2a:1d:1f,bridge=br3', ] bootloader = "pygrub"
Use the “xm create node1.cfg” command to start the domU. After the OS was ready I installed the following additional software to satisfy the installation requirements:
This is easiest done via yum and the public YUM server Oracle provides. It also has instructions on how to set your repository up.
# yum install compat-libcap1 compat-libstdc++-33 libstdc++-devel gcc-c++ ksh libaio-devel
On the first node only I wanted a VNC-like interface for a graphical installation. The older package vnc-server I loved from 5.x isn’t available anymore, the package you need is now called tigervnc-server. It also requires a new viewer to be downloaded from sourceforge. On the first node you might want to install these, unless you are brave enough to use a silent installation:
Ensure that SELinux and the IPTables packages are turned off. SELinux is still configured in /etc/sysconfig/selinux, where the setting has to be permissive at least. You can use “chkconfig iptables off” to disable the firewall service at boot. Check that there are no filter rules using “iptables -L”.
I created the oracle account using these usual steps-this hasn’t change since 11.2.0.2.
A few changes to /etc/sysctl.were needed; you can copy and paste the below example and append it to your existing settings. Ensure to up the limits where you have more resources!
kernel.shmall = 4294967296 kernel.shmmni = 4096 kernel.sem = 250 32000 100 128 fs.file-max = 6815744 net.core.rmem_default = 262144 net.core.wmem_default = 262144 net.core.rmem_max = 4194304 net.core.wmem_max = 1048576 fs.aio-max-nr = 1048576 net.ipv4.ip_local_port_range = 9000 65500 net.ipv4.conf.eth1.rp_filter = 0 net.ipv4.conf.eth2.rp_filter = 0
Also ensure that you change the rp_filter for your private interconnect to 0 (or 2)-my devices are eth1 and eth2. This is a new requirement for reverse path filtering introduced with 11.2.0.3.
ASM “disks” must be owned by the GRID owner. The easiest way to change the permissions of the ASM disks is to create a new set of udev rules, such as the following:
# cat 61-asm.rules KERNEL=="xvd[cdefg]1", OWNER="oracle", GROUP="asmdba" MODE="0660"
After a quick “start_udev” as root these were applied.
Note that as per my domU config file I actually know the device names are persistent, so it was easy to come up with this solution. In real life you would use the dm-multipath package which allows setting the owner,group and permission now in /etc/multipath.conf for every ASM LUN.
There was an interesting problem initially in that kfod seemed to trigger a change of permissions back to root:disk whenever it ran. Changing the ownership back to oracle only lasted until the next execution of kfod. The only fix I could come up with involved the udev rules.
Good news for those who suffered from the multicast problem introduced in 11.2.0.2-cluvfy now knows about it and checks during the post hwos stage (I had already installed cvuqdisk):
[oracle@rac11203node1 grid]$ ./runcluvfy.sh stage -post hwos -n rac11203node1 Performing post-checks for hardware and operating system setup Checking node reachability... Node reachability check passed from node "rac11203node1" Checking user equivalence... User equivalence check passed for user "oracle" Checking node connectivity... Checking hosts config file... Verification of the hosts config file successful Node connectivity passed for subnet "192.168.99.0" with node(s) rac11203node1 TCP connectivity check passed for subnet "192.168.99.0" Node connectivity passed for subnet "192.168.100.0" with node(s) rac11203node1 TCP connectivity check passed for subnet "192.168.100.0" Node connectivity passed for subnet "192.168.101.0" with node(s) rac11203node1 TCP connectivity check passed for subnet "192.168.101.0" Interfaces found on subnet "192.168.99.0" that are likely candidates for VIP are: rac11203node1 eth0:192.168.99.129 Interfaces found on subnet "192.168.100.0" that are likely candidates for a private interconnect are: rac11203node1 eth1:192.168.100.129 Interfaces found on subnet "192.168.101.0" that are likely candidates for a private interconnect are: rac11203node1 eth2:192.168.101.129 Node connectivity check passed Checking multicast communication... Checking subnet "192.168.99.0" for multicast communication with multicast group "230.0.1.0"... Check of subnet "192.168.99.0" for multicast communication with multicast group "230.0.1.0" passed. Checking subnet "192.168.100.0" for multicast communication with multicast group "230.0.1.0"... Check of subnet "192.168.100.0" for multicast communication with multicast group "230.0.1.0" passed. Checking subnet "192.168.101.0" for multicast communication with multicast group "230.0.1.0"... Check of subnet "192.168.101.0" for multicast communication with multicast group "230.0.1.0" passed. Check of multicast communication passed. Check for multiple users with UID value 0 passed Time zone consistency check passed Checking shared storage accessibility... Disk Sharing Nodes (1 in count) ------------------------------------ ------------------------ /dev/xvda rac11203node1 /dev/xvdb rac11203node1 /dev/xvdc rac11203node1 /dev/xvdd rac11203node1 /dev/xvde rac11203node1 /dev/xvdf rac11203node1 /dev/xvdg rac11203node1 Shared storage check was successful on nodes "rac11203node1" Post-check for hardware and operating system setup was successful.
As always, I tried to fix as many problems before invoking runInstaller as possible. The “-fixup” option to runcluvfy is again very useful. I strongly recommend running the fixup script prior to executing the OUI binary.
The old trick to remove /etc/ntp.conf causes the NTP check to complete ok, in which case you are getting the ctsd service for time synchronisation. You should not do this in production-consistent times in the cluster are paramount!
I encountered an issue with the check for free space later in the installation during my first attemps. OUI wants 7.5G for GRID_HOME, even though the installation “only” took around 3 in the end. I exported TMP and TEMP to point to my 10G mount point to avoid this warning:
$ export TEMP=/u01/crs/temp $ export TMP=/u01/crs/temp $ ./runInstaller
The installation procedure for Grid Infrastructure 11.2.0.3 is almost exactly the same as for 11.2.0.2, except for the option to change the AU size for the initial disk group you create:
Once you have completed the wizard, it’s time to hit the “install” button. The magic again happens in the root.sh file, or rootupgrade.sh if you are upgrading. I included the root.sh output so you have something to compare against:
Performing root user operation for Oracle 11g The following environment variables are set as: ORACLE_OWNER= oracle ORACLE_HOME= /u01/crs/11.2.0.3 Enter the full pathname of the local bin directory: [/usr/local/bin]: Creating y directory... Copying dbhome to y ... Copying oraenv to y ... Copying coraenv to y ... Creating /etc/oratab file... Entries will be added to the /etc/oratab file as needed by Database Configuration Assistant when a database is created Finished running generic part of root script. Now product-specific root actions will be performed. Using configuration parameter file: /u01/crs/11.2.0.3/crs/install/crsconfig_params Creating trace directory User ignored Prerequisites during installation OLR initialization - successful root wallet root wallet cert root cert export peer wallet profile reader wallet pa wallet peer wallet keys pa wallet keys peer cert request pa cert request peer cert pa cert peer root cert TP profile reader root cert TP pa root cert TP peer pa cert TP pa peer cert TP profile reader pa cert TP profile reader peer cert TP peer user cert pa user cert Adding Clusterware entries to upstart CRS-2672: Attempting to start 'ora.mdnsd' on 'rac11203node1' CRS-2676: Start of 'ora.mdnsd' on 'rac11203node1' succeeded CRS-2672: Attempting to start 'ora.gpnpd' on 'rac11203node1' CRS-2676: Start of 'ora.gpnpd' on 'rac11203node1' succeeded CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac11203node1' CRS-2672: Attempting to start 'ora.gipcd' on 'rac11203node1' CRS-2676: Start of 'ora.gipcd' on 'rac11203node1' succeeded CRS-2676: Start of 'ora.cssdmonitor' on 'rac11203node1' succeeded CRS-2672: Attempting to start 'ora.cssd' on 'rac11203node1' CRS-2672: Attempting to start 'ora.diskmon' on 'rac11203node1' CRS-2676: Start of 'ora.diskmon' on 'rac11203node1' succeeded CRS-2676: Start of 'ora.cssd' on 'rac11203node1' succeeded ASM created and started successfully. Disk Group OCR created successfully. clscfg: -install mode specified Successfully accumulated necessary OCR keys. Creating OCR keys for user 'root', privgrp 'root'.. Operation successful. CRS-4256: Updating the profile Successful addition of voting disk 1621f2201ab94f32bf613b17f62982b0. Successful addition of voting disk 337a3f0b8a2d4f7ebff85594e4a8d3cd. Successful addition of voting disk 3ae328cce2b94f3bbfe37b0948362993. Successfully replaced voting disk group with +OCR. CRS-4256: Updating the profile CRS-4266: Voting file(s) successfully replaced ## STATE File Universal Id File Name Disk group -- ----- ----------------- --------- --------- 1. ONLINE 1621f2201ab94f32bf613b17f62982b0 (/dev/xvdc1) [OCR] 2. ONLINE 337a3f0b8a2d4f7ebff85594e4a8d3cd (/dev/xvdd1) [OCR] 3. ONLINE 3ae328cce2b94f3bbfe37b0948362993 (/dev/xvde1) [OCR] Located 3 voting disk(s). CRS-2672: Attempting to start 'ora.asm' on 'rac11203node1' CRS-2676: Start of 'ora.asm' on 'rac11203node1' succeeded CRS-2672: Attempting to start 'ora.OCR.dg' on 'rac11203node1' CRS-2676: Start of 'ora.OCR.dg' on 'rac11203node1' succeeded CRS-2672: Attempting to start 'ora.registry.acfs' on 'rac11203node1' CRS-2676: Start of 'ora.registry.acfs' on 'rac11203node1' succeeded Configure Oracle Grid Infrastructure for a Cluster ... succeeded
That’s it! After returning to the OUI screen you run the remaing assistants and finally are rewarded with the success message:
Better still, I could now log in to SQL*Plus and was rewarded with the new version:
$ sqlplus / as sysasm SQL*Plus: Release 11.2.0.3.0 Production on Sat Sep 24 22:29:45 2011 Copyright (c) 1982, 2011, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Real Application Clusters and Automatic Storage Management options SQL> select * from v$version; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production PL/SQL Release 11.2.0.3.0 - Production CORE 11.2.0.3.0 Production TNS for Linux: Version 11.2.0.3.0 - Production NLSRTL Version 11.2.0.3.0 - Production SQL>
Summary
You might remark that in the output there has only ever been one node referenced. That is correct-my lab box has limited resources and I’d like to test the addNode.sh script for each new release so please be patient! I’m planning an article about upgrading to 11.2.0.3 soon, as well as the addition of a node. One thing I noticed was the abnormally high CPU usage for the CSSD processes: ocssd.bin, cssdagent and cssdmonitor-something I find alarming at the moment.
top - 22:53:19 up 1:57, 5 users, load average: 5.41, 4.03, 3.77 Tasks: 192 total, 1 running, 191 sleeping, 0 stopped, 0 zombie Cpu(s): 0.3%us, 0.2%sy, 0.0%ni, 99.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 4102536k total, 3500784k used, 601752k free, 59792k buffers Swap: 1048568k total, 4336k used, 1044232k free, 2273908k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 27646 oracle RT 0 1607m 119m 53m S 152.0 3.0 48:57.35 /u01/crs/11.2.0.3/bin/ocssd.bin 27634 root RT 0 954m 93m 55m S 146.0 2.3 31:45.50 /u01/crs/11.2.0.3/bin/cssdagent 27613 root RT 0 888m 91m 55m S 96.6 2.3 5124095h /u01/crs/11.2.0.3/bin/cssdmonitor 28110 oracle -2 0 485m 14m 12m S 1.3 0.4 0:34.65 asm_vktm_+ASM1 28126 oracle -2 0 499m 28m 15m S 0.3 0.7 0:04.52 asm_lms0_+ASM1 28411 root RT 0 500m 144m 59m S 0.3 3.6 5124095h /u01/crs/11.2.0.3/bin/ologgerd -M -d /u01/crs/11.2.0.3/crf/db/rac11203node1 32394 oracle 20 0 15020 1300 932 R 0.3 0.0 5124095h top 1 root 20 0 19336 1476 1212 S 0.0 0.0 0:00.41 /sbin/init ...
11.2.0.2 certainly didn’t use that much CPU across 4 cores…
Update: I have just repeated the same installation on VirtualBox 4.1.2 with less potent hardware, and funny enough the CPU problem has disappeared. How is that possible? I need to understand more, and maybe update the XEN host to something more recent.
Update: well it’s out, actually. See the comments below. However the certification matrix hasn’t been updated so it’s anyone’s guess if Oracle/Red Hat 6 are certified at this point in time.
Tanel Poder has already announced it a few days ago, but 11.2.0.3 must be ready for release very soon. It has even been spotted in the “lastet patchset” page on OTN, only to be removed quickly. After another tweet came out from Laurent Schneider, it was time to investigate what’s new. The easiest way is to point your browser to tahiti.oracle.com and type “11.2.0.3” into the search box. You are going to find a wealth of new information!
As a RAC person by heart I am naturally interested in RAC features first. The new features I spotted in the Grid Infrastructure installation guide for Linux are listed here:
http://download.oracle.com/docs/cd/E11882_01/install.112/e22489/whatsnew.htm#CEGJBBBB
Additional information for this article was taken from the “New Features” guide:
http://download.oracle.com/docs/cd/E11882_01/server.112/e22487/chapter1_11203.htm#NEWFTCH1-c
So the question is-what’s in for us?
ASM Cluster File System
As I expected, there was support for ACFS and ADVM for Oracle’s own kernel. This has been overdue for a while. I remember how surprised I was when I installed RAC on Oracle Linux 5.5 with the UEK kernel only to see that infamous “…not supported” output when the installer probed the kernel version. Supported kernels are Linux kernels UEK5-2.6.32-100.34.1 and subsequent updates to 2.6.32-100 kernels for Oracle Linux kernels OL5 and OL6.
A big surprise is support for ACFS for SLES-I though that was pretty much dead in the water after all that messing around from Novell. ACFS has always worked on SLES 10 up to SP3, but it did never for SLES 11. The requirement is SLES 11 SP1, and it has to be 64bit.
There are quite a few additional changes to ACFS. For example, it’s now possible to use ACFS replication and tagging on Windows.
Upgrade
If one of the nodes in the cluster being upgraded with the rootupgrade.sh script fails during the execution of said script, the operation can be completed with the new “force” flag.
Random Bits
I found the following note on MOS regarding the time zone file: Actions For DST Updates When Upgrading To Or Applying The 11.2.0.3 Patchset [ID 1358166.1] I suggest you have a look at that note, as it mentions a new pre-upgrade script you need to download and corrective actions for 11.2.0.1 and 11.2.0.2. I’m sure it’s going to be mentioned in the 11.2.0.3 patch readme as well.
There are also changes expected with the dreaded mutex problem in busy systems, MOS note WAITEVENT: “library cache: mutex X” [ID 727400.1] lists 11.2.0.3 as the release where many of the problems related to this are fixed. Time will tell if they are…
Further enhancements are focused on Warehouse Builder, and XML. SQL apply and the log miner have also been enhanced which is good news for users of Streams and logical standby databases
Summary
It is much to early to say anything else about the patchset. Quite a few important documents don’t have a “new features” section yet. That includes the Real Application Clusters Administration and Deployment guide as well which I’ll cover as soon as it’s out. From a quick glance at the still unreleased patchset it seems it’s less of a radical change than 11.2.0.2 was which is a good thing. Unfortunately the certification matrix hasn’t been updated yet, I am very keen to see support for Oracle/Red Hat Linux 6.x.
Like I said in a previous post I have started working in the Exadata performance field, which is really exciting, especially after you get it to work really fast!
Also, finding out what your session is spending time on is important if you are just getting started.
I found the following indispensable tools for Exadata performance analysis:
There are probably a lot more than those, but these are the bare essentials. Let’s have a look at them in a bit more detail.
Snapper
I don’t think I have to lose a single word about snapper-it’s an established tool for performance diagnostics tought by Tanel in his advanced troubleshooting course. I use snapper primarily for analysis of smart scans. Below is an example for a session using serial execution to scan a table (best viewed with Firefox):
SQL> select count(1) from order_items;
SQL> @snapper all 5 1 243 Sampling SID 243 with interval 5 seconds, taking 1 snapshots... setting stats to all due option = all -- Session Snapper v3.52 by Tanel Poder @ E2SN ( http://tech.e2sn.com ) ------------------------------------------------------------------------------------------------------------------------------------- SID, USERNAME , TYPE, STATISTIC , HDELTA, HDELTA/SEC, %TIME, GRAPH ------------------------------------------------------------------------------------------------------------------------------------- 243, SOE , STAT, session logical reads , 95.02k, 19k, 243, SOE , STAT, user I/O wait time , 331, 66.2, 243, SOE , STAT, non-idle wait time , 332, 66.4, 243, SOE , STAT, non-idle wait count , 1.51k, 302.2, 243, SOE , STAT, physical read total IO requests , 776, 155.2, 243, SOE , STAT, physical read total multi block requests , 746, 149.2, 243, SOE , STAT, physical read requests optimized , 30, 6, 243, SOE , STAT, physical read total bytes , 778.67M, 155.73M, 243, SOE , STAT, cell physical IO interconnect bytes , 778.67M, 155.73M, 243, SOE , STAT, consistent gets , 95.06k, 19.01k, 243, SOE , STAT, consistent gets from cache , 7, 1.4, 243, SOE , STAT, consistent gets from cache (fastpath) , 7, 1.4, 243, SOE , STAT, consistent gets direct , 95.05k, 19.01k, 243, SOE , STAT, physical reads , 95.05k, 19.01k, 243, SOE , STAT, physical reads direct , 95.05k, 19.01k, 243, SOE , STAT, physical read IO requests , 776, 155.2, 243, SOE , STAT, physical read bytes , 778.67M, 155.73M, 243, SOE , STAT, calls to kcmgcs , 7, 1.4, 243, SOE , STAT, file io wait time , 394, 78.8, 243, SOE , STAT, Number of read IOs issued , 776, 155.2, 243, SOE , STAT, no work - consistent read gets , 95.15k, 19.03k, 243, SOE , STAT, cell flash cache read hits , 30, 6, 243, SOE , TIME, DB CPU , 1.96s, 391.94ms, 39.2%, |@@@@ | 243, SOE , TIME, sql execute elapsed time , 6s, 1.2s, 120.0%, |@@@@@@@@@@| 243, SOE , TIME, DB time , 6s, 1.2s, 120.0%, |@@@@@@@@@@| 243, SOE , WAIT, direct path read , 3.3s, 660.48ms, 66.0%, |@@@@@@@ | 243, SOE , WAIT, kfk: async disk IO , 5ms, 1ms, .1%, | | -- End of Stats snap 1, end=2011-08-23 16:20:03, seconds=5 ----------------------------------------------------------------------- Active% | SQL_ID | EVENT | WAIT_CLASS ----------------------------------------------------------------------- 74% | 1p7y7pvzmxx3y | direct path read | User I/O 26% | 1p7y7pvzmxx3y | ON CPU | ON CPU -- End of ASH snap 1, end=2011-08-23 16:20:03, seconds=5, samples_taken=47
Watch out for the cell smart entries to figure out of a smart scan is happening. In the above example, there wasn’t one.
The same query again, but this time making use of session offloading:
SQL> select /*+full(t)*/ count(1) from order_items t;
SQL> @snapper all 5 1 243 Sampling SID 243 with interval 5 seconds, taking 1 snapshots... setting stats to all due option = all -- Session Snapper v3.52 by Tanel Poder @ E2SN ( http://tech.e2sn.com ) ------------------------------------------------------------------------------------------------------------------------------------- SID, USERNAME , TYPE, STATISTIC , HDELTA, HDELTA/SEC, %TIME, GRAPH ------------------------------------------------------------------------------------------------------------------------------------- 243, SOE , STAT, session logical reads , 427.69k, 85.54k, 243, SOE , STAT, user I/O wait time , 165, 33, 243, SOE , STAT, non-idle wait time , 167, 33.4, 243, SOE , STAT, non-idle wait count , 3.52k, 703, 243, SOE , STAT, enqueue waits , 12, 2.4, 243, SOE , STAT, enqueue requests , 10, 2, 243, SOE , STAT, enqueue conversions , 15, 3, 243, SOE , STAT, enqueue releases , 10, 2, 243, SOE , STAT, global enqueue gets sync , 25, 5, 243, SOE , STAT, global enqueue releases , 10, 2, 243, SOE , STAT, physical read total IO requests , 4.56k, 911.6, 243, SOE , STAT, physical read total multi block requests , 4.31k, 862.2, 243, SOE , STAT, physical read total bytes , 3.5G, 700.6M, 243, SOE , STAT, cell physical IO interconnect bytes , 1.52G, 303.95M, 243, SOE , STAT, ges messages sent , 12, 2.4, 243, SOE , STAT, consistent gets , 427.54k, 85.51k, 243, SOE , STAT, consistent gets from cache , 75, 15, 243, SOE , STAT, consistent gets from cache (fastpath) , 75, 15, 243, SOE , STAT, consistent gets direct , 427.47k, 85.49k, 243, SOE , STAT, physical reads , 427.47k, 85.49k, 243, SOE , STAT, physical reads direct , 427.47k, 85.49k, 243, SOE , STAT, physical read IO requests , 4.56k, 911, 243, SOE , STAT, physical read bytes , 3.5G, 700.36M, 243, SOE , STAT, calls to kcmgcs , 75, 15, 243, SOE , STAT, file io wait time , 17.45k, 3.49k, 243, SOE , STAT, cell physical IO bytes eligible for predicate offload , 3.49G, 699M, 243, SOE , STAT, cell smart IO session cache lookups , 5, 1, 243, SOE , STAT, cell smart IO session cache hits , 5, 1, 243, SOE , STAT, cell physical IO interconnect bytes returned by smart scan, 1.52G, 303.25M, 243, SOE , STAT, cell session smart scan efficiency , -3, -.6, 243, SOE , STAT, table scans (long tables) , 5, 1, 243, SOE , STAT, table scans (direct read) , 5, 1, 243, SOE , STAT, table scan rows gotten , 129.59M, 25.92M, 243, SOE , STAT, table scan blocks gotten , 422.53k, 84.51k, 243, SOE , STAT, cell scans , 5, 1, 243, SOE , STAT, cell blocks processed by cache layer , 501.97k, 100.39k, 243, SOE , STAT, cell blocks processed by txn layer , 501.97k, 100.39k, 243, SOE , STAT, cell blocks processed by data layer , 426.9k, 85.38k, 243, SOE , STAT, cell blocks helped by minscn optimization , 501.91k, 100.38k, 243, SOE , STAT, cell simulated session smart scan efficiency , 3.5G, 699.17M, 243, SOE , STAT, cell IO uncompressed bytes , 3.5G, 699.17M, 243, SOE , TIME, DB CPU , 3.98s, 796.68ms, 79.7%, |@@@@@@@@ | 243, SOE , TIME, sql execute elapsed time , 6.31s, 1.26s, 126.2%, |@@@@@@@@@@| 243, SOE , TIME, DB time , 6.31s, 1.26s, 126.2%, |@@@@@@@@@@| 243, SOE , WAIT, enq: KO - fast object checkpoint , 2.01ms, 402us, .0%, | | 243, SOE , WAIT, cell smart table scan , 1.65s, 329.33ms, 32.9%, |@@@@ | 243, SOE , WAIT, events in waitclass Other , 5.69ms, 1.14ms, .1%, | | -- End of Stats snap 1, end=2011-08-23 16:21:16, seconds=5 ----------------------------------------------------------------------- Active% | SQL_ID | EVENT | WAIT_CLASS ----------------------------------------------------------------------- 66% | adhm3mbjfzysd | ON CPU | ON CPU 34% | adhm3mbjfzysd | cell smart table scan | User I/O -- End of ASH snap 1, end=2011-08-23 16:21:16, seconds=5, samples_taken=44
The enq: KO fast object checkpoint and the “cell smart table scan” are giveaways for a smart scan.
DBMS_XPLAN
Another proven, useful tool in the arsenal of the performance analyst. For Exadata you might want to use the display_cursor function. It can take a sql_id, a child number and a format parameter. I like the format ‘ALL’ best, as it gives me most information about a statement.
Note that even though you might get the word “storage” in the execution plan, it doesn’t mean you actually see a smart scan in the end! Always check the session wait events and session counters for the word “smart” to make sure one is occurring.
SQL> select * from table(dbms_xplan.display_cursor('3xjhbw9m5u9qu',format=>'ALL'))
2 /
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
SQL_ID 3xjhbw9m5u9qu, child number 0
-------------------------------------
select count(*) from order_items where product_id > 10
Plan hash value: 2209137760
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 249K(100)| |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | INDEX STORAGE FAST FULL SCAN| ITEM_PRODUCT_IX | 338M| 1293M| 249K (2)| 00:50:00 |
-------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
2 - SEL$1 / ORDER_ITEMS@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
2 - storage("PRODUCT_ID">10)
filter("PRODUCT_ID">10)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) COUNT(*)[22]
31 rows selected.
This leads me to the next tool, SQL Monitoring, and it rocks.
Real Time SQL Monitoring
This is yet another awesome tool in your repository if you have the license. It allows you to check what’s happening during the execution of a SQL statement, which is more than useful in larger data warehouse style queries.
The easiest and most accessible way is to use OEM Grid Control or database console to view the report. I personally like to run the report in a putty session on a large screen. To make it easier to run the report based on a SQL ID I created the below script “report.sql”:
set trim on long 2000000 longchunksize 2000000 set trimspool on pagesize 0 echo off timing off linesize 1000 select DBMS_SQLTUNE.REPORT_SQL_MONITOR( sql_id => '&1', report_level=>'ALL') from dual;
With this at hand you can query v$sesssion for a SQL_ID and feed it into the code snippet.
Unfortunately I can’t upload text files to wordpress for security reasons, and posting the output here isn’t possible since the report is rather wide. I decided to run the report with HTML output and type “EM” instead an post the print screens. The command I used is sjhown below:
SQL> select dbms_sqltune.report_sql_monitor(sql_id=>’&1′, type=>’EM’, report_level=>’+histogram’) from dual;
The output of the script would be far too wide so I moved it into the attachement of the EM report type here for an insert /*+ append */ into table select ….
You can use this to even look at statistics gathering-in the below two examples I executed dbms_stats-this report has been created for a stats gathering session with a DOP of 16
What I like best about the monitoring report is that it points out where you are in the execution plan, and it provides an estimate for how much work has already been done. More intelligent people have written lots about using these tools, off the top of my head I’d check Greg Rahn’s as well as Doug Burn’s weblogs for more information.
Oracle 10046 trace
Cary Millsap has written the standard about extended trace files and I strongly recommend reading his book and papers on the subject.
Whilst you can run queries in isolation and easily identify them during testing, the matter gets considerably more difficult as soon as you have to trace a session from a connection pool or other third party application. Most of them are not instrumented making tracing by
Of course you could use the MRTools to enable tracing on a wider scale and then trawl through the trace files with their help, but it might be difficult to get permission to do so on a busy production system.
The lesson learned for me was how to enable tracing in parallel query: this MOS note “Master Note; How to Get a 10046 trace for a Parallel Query [ID 1102801.1]” has the answer. It becomes more difficult if your PQ spans instances, but there are ways around this. You also should consider using DBMS_MONITOR to enable session trace if you can’t use “alter session set events” for some reason. Be sure to read all of the note, especially the bit at the bottom which explains the new 11g interface using “alter session set sql_trace” syntax.
I personally liked setting the tracefile identifier to something more easily identifiable. The ADRCI has a nice option to show trace files-if your ADR is consolidated in a cluster file system you could simply run “adcri > show tracefile -t” to list them all and pick the ones with your custom tracefile identifier.
What about command line utilities?
In the past I liked to use strace to see what a process was doing while “on cpu”. Unfortunately this isn’t really an option with Exadata, as most of the calls are purely network related. After all, RDS is a network protocol. Consider this output from strace during a 10 second query (taken on a database node):
# strace -cp 29031 Process 29031 attached - interrupt to quit Process 29031 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 88.85 0.033064 2 14635 5 poll 5.47 0.002035 0 14754 7377 setsockopt 2.68 0.000999 999 1 munmap 2.43 0.000906 0 22075 7321 recvmsg 0.56 0.000209 0 7399 sendmsg 0.00 0.000000 0 8 6 read 0.00 0.000000 0 2 write 0.00 0.000000 0 3 mmap 0.00 0.000000 0 60 rt_sigprocmask 0.00 0.000000 0 15 5 rt_sigreturn 0.00 0.000000 0 30 setitimer 0.00 0.000000 0 1 semctl 0.00 0.000000 0 262 getrusage 0.00 0.000000 0 112 times 0.00 0.000000 0 1 io_setup 0.00 0.000000 0 1 io_destroy 0.00 0.000000 0 23 semtimedop ------ ----------- ----------- --------- --------- ---------------- 100.00 0.037213 59382 14714 total
Not too enlightening-all I/O happens in the cells.
Summary
These are the tools I found very useful in getting started with Exadata performance analysis, and they are very good for the job at hand. Note that snapper can be used for getting information about parallel query and SQL Monitor is a blessing when it comes to looking at huge parallel queries involving lots of tables.
Recent comments
4 weeks 6 days ago
7 weeks 4 days ago
7 weeks 6 days ago
25 weeks 1 day ago
33 weeks 1 day ago
1 year 6 weeks ago
1 year 8 weeks ago
1 year 12 weeks ago
1 year 13 weeks ago
1 year 13 weeks ago