Oakies Blog Aggregator

Runstats utility

A variation on Tom Kyte's invaluable RUNSTATS utility that compares the resource consumption of two alternative units of work. It is designed to work in constrained developer environments and builds on the original with enhancements such as "pause and resume" functionality, time model statistics and the option to report on specific statistics. ***Update*** Now available in two formats: 1) as a PL/SQL package and 2) as a free-standing SQL*Plus script (i.e. no installation/database objects needed). January 2007 (updated October 2011)

Troubleshooting Oracle agent 12.1.0.1.0

As you may have read on this blog, I recently moved from Oracle Enterprise Manager 11.1 Grid Control to the full control of the cloud: 12.1 has taken its place in the lab.

I also managed to install agents via self download (my OEM is x86 to reduce the footprint) on a two-node 11.2.0.3 cluster: rac11203node1 and rac11203node2. After a catastrophic crash of both nodes followed by a reboot, none of the agents wanted to report back to the OMS.

The difference

Oracle 12.1 has a new agent structure: in previous releases the agent base directory was used to create the AGENT_HOME, and this has now changed. In 11.1 I could specify the agent base to be /u01/app/oracle/product, and OUI would deploy everything in a subdirectory it creates, called agent11g (or agent10g for 10.2.x).

Now I set the agent base to the same value and installed my agents in parallel, but found that there is no agent12c directory under the base. Instead I found these:

[oracle@rac11203node1 product]$ ls -l
total 48
drwxr-xr-x. 73 oracle oinstall  4096 Oct 27 22:40 11.2.0.3
-rw-rw-r--.  1 oracle oinstall    91 Sep 23 08:52 agentimage.properties
drwxr-xr-x.  6 oracle oinstall  4096 Oct 28 14:57 agent_inst
drwxr-xr-x.  3 oracle oinstall  4096 Oct 15 21:35 core
drwx------.  2 oracle oinstall 16384 Oct 14 21:02 lost+found
drwxr-xr-x.  8 oracle oinstall  4096 Oct 15 21:50 plugins
-rwxr-xr-x.  1 oracle oinstall   223 Oct 15 21:25 plugins.txt
-rw-r--r--.  1 oracle oinstall   298 Oct 15 21:42 plugins.txt.status
drwxr-xr-x.  5 oracle oinstall  4096 Oct 15 21:43 sbin

So it’s all a bit different. The core/ directory contains the agent binaries. The agent_inst directory contains the sysman directory. This is where all the configuration and state information is stored. In that respect the sysman directory is the same as in 11.1.

Now back to my problem: both agents that previously worked fine were reported “unavailable”. The agent information is no longer under setup-agents-management agents.

For 12.1 you need to navigate to setup-agents from the drop-down menu in the upper right corner. This takes you to the overview page. OK, so I could see the agents weren’t communicating with the OMS.

On the machine I could see this:

[oracle@rac11203node1 log]$ emctl status agent
Oracle Enterprise Manager 12c Cloud Control 12.1.0.1.0
Copyright (c) 1996, 2011 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version      : 12.1.0.1.0
OMS Version        : (unknown)
Protocol Version   : 12.1.0.1.0
Agent Home         : /u01/app/oracle/product/agent_inst
Agent Binaries     : /u01/app/oracle/product/core/12.1.0.1.0
Agent Process ID   : 13270
Parent Process ID  : 13215
Agent URL          : https://rac11203node1.localdomain:3872/emd/main/
Repository URL     : https://oem12oms.localdomain:4901/empbs/upload
Started at         : 2011-10-26 18:30:17
Started by user    : oracle
Last Reload        : (none)
Last successful upload                       : (none)
Last attempted upload                        : (none)
Total Megabytes of XML files uploaded so far : 0
Number of XML files pending upload           : 1,858
Size of XML files pending upload(MB)         : 8.05
Available disk space on upload filesystem    : 49.16%
Collection Status                            : Collections enabled
Last attempted heartbeat to OMS              : 2011-10-27 15:42:47
Last successful heartbeat to OMS             : (none)

---------------------------------------------------------------
Agent is Running and Ready

The settings are correct, I have verified that with another, uploading and otherwise fine agent. I have also secured the agent, and $AGENT_BASE/agent_inst/sysman/log/secure.log as well as the emctl secure agent commands reported normal, successful operation.
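The tell-tale lines in the status output are the ones reading “(unknown)” and “(none)”. As a minimal sketch (not part of the agent tooling), the output can be parsed and checked with a few lines of Python. The function names are made up for illustration, and the sample text is a subset of the output shown above.

```python
# Hypothetical helper: parse the "Key : value" lines printed by
# "emctl status agent" and flag an agent that has never reached the OMS.

SAMPLE = """\
Agent Version      : 12.1.0.1.0
OMS Version        : (unknown)
Repository URL     : https://oem12oms.localdomain:4901/empbs/upload
Number of XML files pending upload           : 1,858
Last successful heartbeat to OMS             : (none)
"""


def parse_emctl_status(text):
    """Return a dict of the 'Key : value' lines in the status output."""
    fields = {}
    for line in text.splitlines():
        # Split on the first " : " only; URLs and timestamps contain
        # colons too, but not with a space on both sides.
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def never_reached_oms(fields):
    """True if the agent reports no successful heartbeat at all."""
    return fields.get("Last successful heartbeat to OMS") == "(none)"


if __name__ == "__main__":
    status = parse_emctl_status(SAMPLE)
    print(status["OMS Version"], never_reached_oms(status))
```

Feeding it the output of the healthy agent on the OMS host should give False for the same check, since that agent lists a timestamp for its last successful heartbeat.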

Still the stubborn thing doesn’t want to talk to the OMS – in the agent overview page both agents are listed as “unavailable”, but not blocked. When I force an upload, I get this:

[oracle@rac11203node1 log]$ emctl upload
Oracle Enterprise Manager 12c Cloud Control 12.1.0.1.0
Copyright (c) 1996, 2011 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD upload error:full upload has failed: uploadXMLFiles skipped :: OMS version not checked yet. If this issue persists check trace files for ping to OMS related errors. (OMS_DOWN)

However, it’s not down: I can reach it from another agent (which happens to be on the same box as the OMS):

[oracle@oem12oms 12.1.0.1.0]$ $ORACLE_HOME/bin/emctl status agent
Oracle Enterprise Manager 12c Cloud Control 12.1.0.1.0
Copyright (c) 1996, 2011 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version      : 12.1.0.1.0
OMS Version        : 12.1.0.1.0
Protocol Version   : 12.1.0.1.0
Agent Home         : /u01/gc12.1/agent/agent_inst
Agent Binaries     : /u01/gc12.1/agent/core/12.1.0.1.0
Agent Process ID   : 2964
Parent Process ID  : 2910
Agent URL          : https://oem12oms.localdomain:3872/emd/main/
Repository URL     : https://oem12oms.localdomain:4901/empbs/upload
Started at         : 2011-10-15 21:00:37
Started by user    : oracle
Last Reload        : (none)
Last successful upload                       : 2011-10-27 15:46:38
Last attempted upload                        : 2011-10-27 15:46:38
Total Megabytes of XML files uploaded so far : 0
Number of XML files pending upload           : 0
Size of XML files pending upload(MB)         : 0
Available disk space on upload filesystem    : 49.16%
Collection Status                            : Collections enabled
Last attempted heartbeat to OMS              : 2011-10-27 15:48:34
Last successful heartbeat to OMS             : 2011-10-27 15:48:34

---------------------------------------------------------------
Agent is Running and Ready

And no, the firewall is turned off and I can connect to the upload from any machine in the network:

[oracle@rac11203node1 log]$ wget --no-check-certificate https://oem12oms.localdomain:4901/empbs/upload
--2011-10-27 15:55:46-- https://oem12oms.localdomain:4901/empbs/upload
Resolving oem12oms.localdomain... 192.168.99.28
Connecting to oem12oms.localdomain|192.168.99.28|:4901... connected.
WARNING: cannot verify oem12oms.localdomain’s certificate, issued by “/O=EnterpriseManager on oem12oms.localdomain/OU=EnterpriseManager on oem12oms.localdomain/L=EnterpriseManager on oem12oms.localdomain/ST=CA/C=US/CN=oem12oms.localdomain”:
Self-signed certificate encountered.
HTTP request sent, awaiting response... 200 OK
Length: 314 [text/html]
Saving to: “upload.1”

100%[======================================>] 314 --.-K/s in 0s

2011-10-27 15:55:46 (5.19 MB/s) - “upload.1” saved [314/314]

The agent complains about this in gcagent.log:

2011-10-27 15:56:08,947 [37:3F09CD9C] WARN – improper ping interval (EM_PING_NOTIF_RESPONSE: BACKOFF::180000)
2011-10-27 15:56:18,471 [167:E3E93C4C] WARN – improper ping interval (EM_PING_NOTIF_RESPONSE: BACKOFF::180000)
2011-10-27 15:56:18,472 [167:E3E93C4C] WARN – Ping protocol error
o.s.gcagent.ping.PingProtocolException [OMS sent an invalid response: "BACKOFF::180000"]

At least someone in Oracle has some humour when it comes to this.

The Solution

Now I dug around a lot more and finally managed to get to the conclusion. It was actually a twofold problem. The first agent was simply blocked. After finding a way to unblock it, it worked happily.

The second agent was a bit more trouble. I unblocked it as well from the agent page in OEM, which failed. As it turned out the agent was shut down. And it didn’t start either:

[oracle@rac11203node2 12.1.0.1.0]$ emctl start agent
Oracle Enterprise Manager 12c Cloud Control 12.1.0.1.0
Copyright (c) 1996, 2011 Oracle Corporation.  All rights reserved.
Starting agent ............. failed.
Target Metadata Loader failed at Startup
Consult the log files in: /u01/app/oracle/product/agent_inst/sysman/log

I checked the logs and found this interesting bit of information:

2011-10-24 21:35:21,387 [1:3305B9] INFO - Plugin oracle.sysman.oh is now active
2011-10-24 21:35:21,393 [1:3305B9] INFO - Plugin oracle.sysman.db is now active
2011-10-24 21:35:21,396 [1:3305B9] WARN - Agent failed to Startup for Target Metadata Loader in step 2
oracle.sysman.gcagent.metadata.MetadataLoadingException: The targets.xml file is empty
at oracle.sysman.gcagent.metadata.MetadataManager$Loader.validateMetadataFile(MetadataManager.java:799)
at oracle.sysman.gcagent.metadata.MetadataManager$RegistryLoader.processMDFile(MetadataManager.java:1733)
at oracle.sysman.gcagent.metadata.MetadataManager$RegistryLoader.readRegistry(MetadataManager.java:1695)
at oracle.sysman.gcagent.metadata.MetadataManager$RegistryLoader.load(MetadataManager.java:1641)
at oracle.sysman.gcagent.metadata.MetadataManager.load(MetadataManager.java:282)
at oracle.sysman.gcagent.metadata.MetadataManager.runStartupStep(MetadataManager.java:450)
at oracle.sysman.gcagent.metadata.MetadataManager.tmNotifier(MetadataManager.java:337)
at oracle.sysman.gcagent.tmmain.lifecycle.TMComponentSvc.invokeNotifier(TMComponentSvc.java:876)
at oracle.sysman.gcagent.tmmain.lifecycle.TMComponentSvc.invokeInitializationStep(TMComponentSvc.java:959)
at oracle.sysman.gcagent.tmmain.lifecycle.TMComponentSvc.doInitializationStep(TMComponentSvc.java:800)
at oracle.sysman.gcagent.tmmain.lifecycle.TMComponentSvc.notifierDriver(TMComponentSvc.java:740)
at oracle.sysman.gcagent.tmmain.TMMain.startup(TMMain.java:215)
at oracle.sysman.gcagent.tmmain.TMMain.agentMain(TMMain.java:458)
at oracle.sysman.gcagent.tmmain.TMMain.main(TMMain.java:447)
2011-10-24 21:35:21,397 [1:3305B9] INFO - Agent exiting with exit code 55
2011-10-24 21:35:21,398 [31:F9C26A76:Shutdown] INFO - *jetty*: Shutdown hook executing
2011-10-24 21:35:21,399 [31:F9C26A76] INFO - *jetty*: Graceful shutdown SslSelectChannelConnector@0.0.0.0:3872
2011-10-24 21:35:21,399 [31:F9C26A76] INFO - *jetty*: Graceful shutdown ContextHandler@14d964af@14d964af/emd/lifecycle/main,null

I have yet to find the reason for the empty targets.xml file, but sure enough it existed with a length of 0 bytes.
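A zero-byte targets.xml is easy to check for before trying to start the agent. The following is a minimal sketch; the path is an assumption pieced together from the agent home and the “emd” directory in the shell prompt, so adjust it to your own installation.

```python
import os

# Assumed location (not confirmed by the log output): agent home plus
# sysman/emd, the directory the shell prompt in this post points at.
TARGETS_XML = "/u01/app/oracle/product/agent_inst/sysman/emd/targets.xml"


def is_empty_file(path):
    """Return True if the file exists and has a length of 0 bytes."""
    return os.path.isfile(path) and os.path.getsize(path) == 0


if __name__ == "__main__":
    if is_empty_file(TARGETS_XML):
        print("targets.xml is empty, the agent will refuse to start")
    else:
        print("targets.xml is missing or has content")
```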

Simple enough I thought, all I need to do is run agentca to repopulate the file. Unfortunately I couldn’t find it.

[oracle@rac11203node2 emd]$ find /u01/app/oracle/product/ -name "agentca*"
[oracle@rac11203node2 emd]$

This was a bit of a let down. Then I decided to create a new targets.xml file and try a resynchronisation of the agent. This is a well hidden menu item so I decided to show it here:

The only element that went into targets.xml was “”. This was sufficient to start the agent, which is a requirement for the resynchronisation to succeed. I was quite amazed that this succeeded, but it did.

This was very encouraging, and both agents are now working properly.

Friday Philosophy – When Things Feel Wrong

I got pinged by someone else missing the Friday Philosophy today {BTW, Good news, the technical blogs start again on Monday}, so…

Take a look at the below. It is a rather pleasant spot of countryside on Sao Miguel in the Azores, where the area in the foreground has been converted into a bit of a garden to take advantage of the natural beauty.

Nice, isn’t it? Sorry the sun is not shining, but there you go. This waterfall is just across the road from a set of water mills and waterfalls at Achada, which is one of the tourist spots that features often in brochures for Sao Miguel. But look at the scene again. Does anything strike you as odd about that waterfall? I could not put my finger on what it was, I just knew it looked odd. (Graham, if you don’t spot this immediately you owe me a pint).

There was a path heading up the valley to one side of the waterfall, one of a network meandering through the gardens, and I went up it. After a short while there was a smaller path heading up the hill more directly. It looked maintained but too steep to be a “wandering around enjoying the scenery” path. So I went up that. At the top of this path was a structure, a concrete “block house”. It hummed and it gurgled. There was another path heading back the way I came, towards the waterfall. I followed along it and I found the top of the waterfall…

Yes, the waterfall was a fake. It was coming out of this huge concrete trough fed by a large pipe which went back to the humming, gurgling concrete block house. Returning back down to the bottom of the waterfall I could put my finger on what was odd about that waterfall. There is a valley to the left. OK, that is not so odd, the water could be coming from high land to the right of the valley and draining into the valley at this point. Except there is another valley to the right of the waterfall as well. Both had small streams running through them. This waterfall could only be natural if there was a perfectly formed, shallow middle valley heading up to the hills between the other two valleys and the only point where the water could escape was at the confluence of the lower two valleys. There was also a lot more water coming down this waterfall than was coming down the two valleys.

What has this got to do with Oracle and databases? Well, have you ever been in the position where you look at the output from a system and it just does not “feel right”? I sometimes refer to something I call DBA Intuition. There is also Developer Intuition and there is certainly Tester Intuition. All are where you are not sure why but it just looks or feels wrong (or, you just get a feeling for what a problem is or what the solution might be, all I class as DBA intuition, but I digress).

As an example, you are tasked to look at one of those terrible BI-type reports that consist of two pages of SQL and they want to know why it takes so long to run. Before you dive into the code, you look at the result of the report and you just think “That seems like an awfully large number of people responding to that advertising campaign” or “I can’t believe 10 percent of our income comes from selling baby diapers”.

Usually when I have dug into the actual report or part of the system that feels wrong I find out one of two things: that I had no idea that part of the business really worked that way, or that the report is utter garbage. Somewhere in that report there is a missing table or a logic flaw {nested AND and OR statements are a good place to look}. This of course has the advantage that there is no need for me to tune the report until someone can tell me WHAT the report is supposed to be identifying.

DBA Intuition is, I think, basically a gut feeling you develop in response to experience. I suppose I have more “tuning intuition” these days: I look at how fast some SQL is coming back and the volume of data and I think “seems reasonable actually” or “something is very inefficient here”. I’ve noticed that good system testers have this intuitive knack of just asking the new system to do things in a way or order that does not match the original intention.

So, I encourage you to trust your intuition. If some part of the system feels wrong, go and root around in the system undergrowth; climb up to the top of the data waterfall {OK, I’ll stop with the bad IT metaphors} and see what you find.

Incidentally, after I found the pump house we walked the other way up the valley, following the pipe and the pleasant gardens. It took maybe 20 minutes but we found the "real source" of the fake waterfall, which was a very nice, natural waterfall sitting in the very bottom of a pleasant valley – just where a waterfall should be. It just took a little more effort to get to it. I'm sure there is some moral story in there but I'm damned if I can work it out :-)

A look into the Exadata infrastructure

An Oracle Exadata database machine consists of several parts: Intel based servers, InfiniBand switches, a Cisco ethernet switch, a KVM switch and the hardware surrounding it like cables, a 19″ rack and power distribution units. The Intel based servers are what “Exadata administrators” are administering the most. The intention of this article is to let the reader gain a little more insight into Exadata specific administration on those.

Two server layers: computing and storage
The two layers have quite different properties: the Exadata computing layer is Linux with Oracle grid infrastructure and the Oracle database software installed, very much as you would do yourself (if you install it in a strict OFA way), and the storage layer is Linux too, but with specific Exadata storage software.

Computing layer
The computing layer filesystems use LVM on newer (X2) versions of Exadata, and plain partitions without LVM on the V2 version of Exadata. There are 3 filesystems on an X2 computing node:

[root@xxxxdb01 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1
30G 14G 15G 49% /
/dev/sda1 124M 16M 102M 14% /boot
/dev/mapper/VGExaDb-LVDbOra1
99G 29G 65G 32% /u01
tmpfs 81G 196M 81G 1% /dev/shm

These are two filesystems inside logical volumes, and the boot filesystem on a plain partition. The /boot partition consists of the grub configuration, kernel and initrd (initial ramdisk). The root logical volume (LVDbSys1) contains the linux operating system files, and the /u01 logical volume (LVDbOra1) contains the Oracle grid/cluster and database software.

A nice thing to know about the computing layer storage is that there is space left in the single volume group which is used for the logical volumes:

[root@emc1db01 ~]# vgdisplay
--- Volume group ---
VG Name VGExaDb
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 4
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 3
Open LV 3
Max PV 0
Cur PV 1
Act PV 1
VG Size 557.62 GB
PE Size 4.00 MB
Total PE 142751
Alloc PE / Size 39424 / 154.00 GB
Free PE / Size 103327 / 403.62 GB <<-- 400GB space left!
VG UUID AZbMbf-papI-6FE2-hYlR-bzE1-G4pp-Bou81e
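The sizes vgdisplay prints follow directly from the extent counts: each physical extent (PE) is 4.00 MB, so multiplying the number of extents by the PE size and dividing by 1024 gives the figure in GB. A small sketch of that arithmetic, using the numbers from the output above (the function name is just for illustration):

```python
def extents_to_gb(extents, pe_size_mb=4.0):
    """Convert a number of physical extents to GB (1 GB = 1024 MB here)."""
    return extents * pe_size_mb / 1024


if __name__ == "__main__":
    # Extent counts taken from the vgdisplay output above.
    for label, extents in (("Total", 142751), ("Alloc", 39424), ("Free", 103327)):
        print(label, round(extents_to_gb(extents), 2), "GB")
```

The free extents come out at a little over 400 GB, which matches the “Free PE / Size” line.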

The disk configuration on the computing layer is built upon a RAID 5 set of 3 disks with a hot spare. The RAID is hardware based (the disk controller is responsible for the RAID maintenance). This is visible through the MegaCli64 command:

[root@emc1db01 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aALL

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name : DBSYS
RAID Level : Primary-5, Secondary-0, RAID Level Qualifier-3
Size : 557.75 GB
State : Optimal
Stripe Size : 1.0 MB
Number Of Drives : 3
Span Depth : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy : Read/Write
Disk Cache Policy : Disabled
Encryption Type : None

Number of Dedicated Hot Spares: 1
0 : EnclId - 252 SlotId - 3

Exit Code: 0x00

Let me emphasize that this disk/RAID set is only responsible for serving the operating system and accompanying files, and the grid/clusterware and database executables. All true database processing is done on the storage layer nodes.

Storage layer
The storage layer filesystems are not using LVM, nor regular partitions (/dev/sd* for disks visible as SCSI devices for linux):

[root@xxxxcel01 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md5 9.9G 3.4G 6.0G 37% /
tmpfs 12G 0 12G 0% /dev/shm
/dev/md7 2.0G 623M 1.3G 33% /opt/oracle
/dev/md4 116M 37M 74M 34% /boot
/dev/md11 2.3G 222M 2.0G 10% /var/log/oracle

The devices used are linux software RAID devices (‘md’ means multiple devices, a linux device that is built from independent underlying devices). The reason for using linux software RAID for the operating system and Oracle storage software is to provide redundancy, because the underlying disks on the storage servers do not provide that. Redundancy/error protection for the database data on the storage servers is done using ASM normal redundancy, which is, put bluntly, RAID 10 handled on the ASM layer.

A logical next question is: okay, what do these linux software RAID/MD devices look like? Well, the simplest way is to use /proc/mdstat:

[root@emc1cel01 ~]# cat /proc/mdstat
Personalities : [raid1]
md4 : active raid1 sdb1[1] sda1[0]
120384 blocks [2/2] [UU]

md5 : active raid1 sdb5[1] sda5[0]
10482304 blocks [2/2] [UU]

md6 : active raid1 sdb6[1] sda6[0]
10482304 blocks [2/2] [UU]

md7 : active raid1 sdb7[1] sda7[0]
2096384 blocks [2/2] [UU]

md8 : active raid1 sdb8[1] sda8[0]
2096384 blocks [2/2] [UU]

md1 : active raid1 sdb10[1] sda10[0]
714752 blocks [2/2] [UU]

md11 : active raid1 sdb11[1] sda11[0]
2433728 blocks [2/2] [UU]

md2 : active raid1 sdb9[1] sda9[0]
2096384 blocks [2/2] [UU]

unused devices: <none>

We see the MD device name (md and a device number), the state (‘active’), the RAID type (‘raid1’) and the actual devices the md device is built from (sdb1[1] sda1[0] for example). On the next line we see the number of blocks in the md device, the total number of disks and the number of active disks [2/2], and the status of the disks [UU]. “U” means up. A failed device (probably as a result of a failed disk, but it could be the result of the clearing of a partition) is visible as “_”.
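The layout described above is regular enough to check with a script. What follows is a rough sketch in Python, not an Oracle or cellos tool. The function names are invented, it only handles the simple “mdN : active raid1 …” form shown here, and the degraded md5 entry in the sample is made up to show what a “_” looks like.

```python
import re

# Sample in the same layout as /proc/mdstat above; the md5 entry is a
# made-up degraded array for illustration.
SAMPLE = """\
Personalities : [raid1]
md4 : active raid1 sdb1[1] sda1[0]
      120384 blocks [2/2] [UU]

md5 : active raid1 sda5[0]
      10482304 blocks [2/1] [U_]

unused devices: <none>
"""


def parse_mdstat(text):
    """Return {md name: {state, level, members, total, active, flags}}."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+) : (\w+) (\w+) (.+)$", line)
        if header:
            current = header.group(1)
            arrays[current] = {
                "state": header.group(2),
                "level": header.group(3),
                "members": header.group(4).split(),
            }
            continue
        # e.g. "120384 blocks [2/2] [UU]": total/active disks, then flags
        status = re.search(r"\[(\d+)/(\d+)\] \[([U_]+)\]", line)
        if status and current:
            arrays[current].update(
                total=int(status.group(1)),
                active=int(status.group(2)),
                flags=status.group(3),
            )
    return arrays


def degraded(arrays):
    """Names of arrays that show at least one failed member ("_")."""
    return sorted(n for n, a in arrays.items() if "_" in a.get("flags", ""))


if __name__ == "__main__":
    print(degraded(parse_mdstat(SAMPLE)))
```

On a healthy cell, where every array shows [UU], the list of degraded arrays would be empty.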

Okay, this is all strongly linux related, where’s the Exadata stuff?

Cellos
‘Cell’ is a name used for the storage server in the Exadata context. Both the computing nodes and the storage nodes have a directory ‘/opt/oracle.cellos’. This directory contains Oracle software which is embedded in Linux. The function of cellos is to maintain and check firmware versions and hardware configurations and hardware activation/deactivation.

Validation functions are done by ‘cellos’, or ‘scripts inside the /opt/oracle.cellos directory’, during the startup phase of linux. One of the functions cellos can perform is flashing firmware. Strongly related to cellos are the linux kernel and the OFED stack.

Imageinfo
The state of the cellos image is listed using the imageinfo utility (which exists both in /usr/local/bin and in /opt/oracle.cellos), which executes the /opt/oracle.cellos/patch/bin/imageinfo.sh script:

On a database/computing node:

[root@xxxxdb01 bin]# imageinfo -all

Kernel version: 2.6.18-194.3.1.0.3.el5 #1 SMP Tue Aug 31 22:41:13 EDT 2010 x86_64
Image version: 11.2.2.2.0.101206.2
Image created: 2010-12-09 10:06:22 -0800
Image activated: 2011-01-28 14:19:59 -0800
Image image type: production
Image status: success
Internal version:
Image label: OSS_11.2.0.3.0_LINUX.X64_101206.2
Node type: COMPUTE
System partition on device: /dev/mapper/VGExaDb-LVDbSys1

On a cell/storage node:

[root@xxxxcel01 bin]# imageinfo -all

Kernel version: 2.6.18-194.3.1.0.3.el5 #1 SMP Tue Aug 31 22:41:13 EDT 2010 x86_64
Cell version: OSS_11.2.0.3.0_LINUX.X64_101206.2
Cell rpm version: cell-11.2.2.2.0_LINUX.X64_101206.2-1

Active image version: 11.2.2.2.0.101206.2
Active image created: 2010-12-09 09:37:46 -0800
Active image activated: 2011-01-28 14:33:44 -0800
Active image type: production
Active image status: success
Active internal version:
Active image label: OSS_11.2.0.3.0_LINUX.X64_101206.2
Active node type: STORAGE
Active system partition on device: /dev/md5
Active software partition on device: /dev/md7

In partition rollback: Impossible

Cell boot usb partition: /dev/sdac1
Cell boot usb version: 11.2.2.2.0.101206.2

Inactive image version: undefined
Rollback to the inactive partitions: Impossible

We see imageinfo knows if it’s run on a computing or storage node (‘Active node type’).

CheckHWnFWProfile
Another executable in /opt/oracle.cellos is ‘CheckHWnFWProfile’. The name of this executable is quite self-explanatory: it checks if the configuration of the server it’s run on is valid, which means it is listed in one of the profiles (Exadata has different incarnations with different hardware, and Oracle calls each hardware and firmware combination a ‘profile’).

If it’s run without any option, it should return [SUCCESS], which means the hardware and firmware are correct for that version of cellos:

[root@xxxxdb01 ~]# /opt/oracle.cellos/CheckHWnFWProfile
[SUCCESS] The hardware and firmware profile matches one of the supported profiles

If you want more information about the server, you can list the configuration with the ‘-d’ option:

[root@xxxxdb01 ~]# /opt/oracle.cellos/CheckHWnFWProfile -d
[SystemManufacturer]
SUN MICROSYSTEMS
[SystemModel]
SUN FIRE X4170 M2 SERVER
[BIOSVendor]
American Megatrends Inc.
[BIOSVersion]
08040203
[BIOSDate]
09/14/2010
[ProcessorInformation]
Cores:24
Intel(R) Xeon(R) X5670 2.93GHz
step: 2
[ILOMVersion]
3.0.9.27.a r58740
[PhysicalMemoryGB]
96
[DiskControllerPCIeSlotNumber]
PCI-E Slot 0
[DiskControllerModel]
LSI MegaRAID SAS 9261-8i
[DiskControllerFirmwareRevision]
12.9.0-0049
[DiskControllerSilicon]
B4
[DiskControllerBatteryBackupUnitVersion]
iBBU08
[InfinibandHCAPCIeSlotNumber]
PCI-E Slot 2
[FansAndFanCount]
FM0 FM1 FM2 FM3 4
[PowersuppliesAndCount]
PS0:OK PS1:OK 2
[InfinibandHCAPCIeSlotWidth]
5Gbps,x8
[InfinibandHCADeviceID]
26428
[InfinibandHCAChipRevision]
B0
[InfinibandHCAImageType]
ConnectX
[InfinibandHCAFirmwareVersion]
2.7.8100
[SASBackplaneFirmwareVersion]
NotAvailable
[DiskControllerPCIeSlotWidth]
x8
[10GEtherModelPCISlot]
82599EB PCI-E Slot 1
82599EB PCI-E Slot 1
[InfinibandHCAHardwareRev]
50
[LightsOutFirmwareVersion]
Firmware Revision : 3.0
[Enclosure:Slot:DiskType:DiskMake:DiskModel:Diskfw]
ALL:ALL SAS SEAGATE ST930003SSUN300G 0B70

If ‘CheckHWnFWProfile’ resulted in an error, you can list your actual configuration with ‘-d’, and use the ‘-s’ option to list what the profile or supported configuration is.
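Comparing two such listings by eye is tedious, so here is a small sketch that reads the bracketed format into a dictionary and reports the sections that differ. It assumes the ‘-s’ output uses the same “[Section]” plus value lines layout as the ‘-d’ output above, which is not shown in this article. The function names and the “expected” sample are made up for illustration.

```python
def parse_profile(text):
    """Map each [Section] header to the list of value lines below it."""
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def diff_profiles(actual, expected):
    """Return {section: (actual values, expected values)} where they differ."""
    names = sorted(set(actual) | set(expected))
    return {
        name: (actual.get(name), expected.get(name))
        for name in names
        if actual.get(name) != expected.get(name)
    }


if __name__ == "__main__":
    actual = parse_profile("[BIOSVersion]\n08040203\n[PhysicalMemoryGB]\n96\n")
    # Hypothetical "supported" values, only to demonstrate the comparison.
    expected = parse_profile("[BIOSVersion]\n08040204\n[PhysicalMemoryGB]\n96\n")
    print(diff_profiles(actual, expected))
```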

Tagged: database machine, exadata, imageinfo, infrastructure, oracle

Simple SQL with and without Inline Views 2

October 26, 2011 (Back to the Previous Post in the Series) In the previous article of this series we examined a couple of moderately simple data retrieval requirements, and used inline views and/or analytic functions to solve those data retrieval requirements.  I recently saw another opportunity to help a person on the ERP mailing list with [...]

OTN APAC Tour: Beijing, China (update)

Following on from my previous post, once the OTN APAC event was over I managed to see some of the sights of Beijing and the surrounding area, including the Forbidden City, Tiananmen Square, the Great Wall and Ming Tombs. I’ve uploaded some photos here, but it seems my photography skills are getting worse, not better. I’m ashamed to say these are the better photos. Let’s not discuss the ones I’ve not bothered to upload. :)

I have a few suggestions for any travellers to China:

  • Do not consider driving yourself. The traffic is crazy! I’ve never seen anything like it before. I experienced several “we’re all going to die” moments in every trip I took. Probably the worst was when two lorries moved together with our minibus between them. If our driver had not sped up in time, I’m quite convinced the ACE program would have lost several members. As you may have already read, some of the ACEs were in a crash, but came out of it unharmed. I was only able to wear a seat belt on one journey because all but one of the vehicles either had no belts, or belts but nowhere to plug them in. This takes a lot of getting used to.
  • If you have any sort of respiratory issues, it’s probably best to avoid Beijing completely. The pollution is very bad. It quickly makes you feel like you’ve smoked 40 cigarettes for breakfast. Even at the Great Wall (50+km from the city) it is still really bad.
  • Make sure you have printed copies of the Chinese names for any locations you want to visit. Having an English address is pretty much useless. Even with the Chinese addresses, it was often very difficult, especially if the print was small.
  • Don’t rely on any cloud services for your journey. You need local copies of everything and any apps you want to work with should have full offline functionality. Why? The Great Firewall of China is very effective at blocking a variety of services. Most Google services were trashed. Gmail was intermittently offline and I was never able to access attachments. Google Docs essentially doesn’t exist. Google Reader would work for about 5 minutes at a time, then fail bigtime. Most annoyingly, I was unable to use Google to search for anything. :( I was able to search with Bing, but I could only see results in Chinese. Even when I switched to English, I still got results in Chinese. Like I said, be very well prepared because searching for information is not easy. I’m sure there are ways to cope if you are clued up, but I was a little naive and went with very little preparation as far as logistics were concerned.
  • Take money with you. I was able to get cash advances from my credit cards, but the ATM failed to get cash from any of my bank accounts.

I’m now stationed at my friend’s house about 40 minutes from Auckland. There are cows in the garden, including 6 new calves which are pretty funny, and two pre-school children who are pretty funny too. Although I’m very much a towny at heart, I must admit the change in pace and air quality is very welcome. Last week I was talking about Oracle and this week I’ve been chopping down some trees, clearing some scrub and pretending (very badly) to be a farmer. I’ve nearly mastered saying, “Get off my land!”, or as the Kiwis say it, “Gt ff my lnd aye!” I’m trying to introduce vowels to the natives, but with the exception of the word “aye” that is added to the end of every sentence, vowels seem to be lost forever. :)

In a few days I will transplant to Auckland and start the NZOUG conference.

Cheers

Tim…




Forms is not dead!

The recent interest in Oracle Forms is quite interesting. Besides ODTUG (Oracle Development Tools User Group), two other groups have recently conducted surveys to see how people are using Forms: one in Germany (http://talk2gerd.blogspot.com/2011/08/result-of-2011-oracle-forms-poll-part-1.html) and one in the US (http://blog.whitehorses.nl/2011/10/01/results-oracle-forms-survey/). Now, Oracle has released Oracle Forms 11g R2! First announced in Grant Ronald’s blog (http://blogs.oracle.com/grantronald/entry/oracle_forms_and_reports_new), you may now find complete information and download the product from OTN’s Forms page (http://www.oracle.com/technetwork/developer-tools/forms/overview/index.html). I’ve just submitted a paper on the new version to ODTUG KScope 12 in San Antonio; I hope the paper is accepted!

While I work with many languages and tools, Forms is a very capable development tool and I’m looking forward to many more years of using it.

NoCOUG’s 100th Conference Nov 9th!

Download the 100th issue of the NoCOUG Journal at http://bit.ly/rC2gRA.

Can you believe it? 100 conferences! The Northern California Oracle Users Group (NoCOUG) is celebrating its 25th anniversary in 2011 with its 100th conference. The upcoming conference at the Computer History Museum on November 9 is NoCOUG’s 100th quarterly conference and the upcoming issue of the NoCOUG Journal will be the 100th issue.
NoCOUG Conference 100 will feature some of the world’s top Oracle experts and speakers including Steven Feuerstein , Craig Shallahamer, Alex Gorbachev, and Kyle Hailey . It will be held at the Computer History Museum in Mountain View—a fitting location for such an occasion. The museum features marvelous computing artifacts such as a Hollerith Tabulating Machine and an actual operational Babbage Difference Engine—one of two that have been constructed in the past decade.
The 100th issue of the NoCOUG Journal will feature an interview with Michael Stonebraker (the high priest of relational databases), the research paper by Dr. Edgar Codd that started the relational revolution in 1970 (A Relational Model of Data for Large Shared Data Banks), and the results of the Second International NoCOUG SQL Challenge.
Because of the significance of the occasion and the venue, NoCOUG is expecting several hundred people and the conference will quickly reach maximum capacity, so RSVP now here: http://www.nocoug.org/rsvp.html

RSVP by October 31 at http://www.nocoug.org/next.html to receive a free pass to the exhibition at the Computer History Museum.

What the heck is the SQL Execution ID – SQL_EXEC_ID?

Ok, I think it’s time to write another blog entry. I’ve been traveling and dealing with jetlag from a 10-hour time difference, then traveled some more, spoke at conferences, drank beer, had fun, then traveled some more, trained customers, hacked some Exadatas and now I’m back home.

Anyway, do you know what the SQL_EXEC_ID in V$SESSION and ASH views is?

Oh yeah, it’s the “SQL Execution ID” just like the documentation says … all clear. Um … is it? I’d like to know more about it – what does it actually stand for?! Is it session level, instance level or a RAC-global counter? And why does it start from 16 million, not 1?

 

SQL> SELECT sql_exec_id FROM v$session WHERE sid = USERENV('SID');

SQL_EXEC_ID
-----------
   16777216

 

This number 16777216 looks strangely familiar – indeed, it’s 2^24.

When I run the same query again (incrementing the SQL_EXEC_ID counter for the same SQL), I see the counter going up by 1:

 

SQL> SELECT sql_exec_id FROM v$session WHERE sid = USERENV('SID');

SQL_EXEC_ID
-----------
   16777217

SQL> SELECT sql_exec_id FROM v$session WHERE sid = USERENV('SID');

SQL_EXEC_ID
-----------
   16777218

SQL> SELECT sql_exec_id FROM v$session WHERE sid = USERENV('SID');

SQL_EXEC_ID
-----------
   16777219

SQL> SELECT sql_exec_id FROM v$session WHERE sid = USERENV('SID');

SQL_EXEC_ID
-----------
   16777220

 

Further executions of the same query keep incrementing this counter, one by one – even if I run this same SQL from another session in the same instance. So, this SQL_EXEC_ID is not a session-scope value for each SQL_ID, it’s at least instance-wide. It looks like the counting starts from 2^24 (the bit representing 2^24 is set) and ignoring that bit for now, the counting works normally, one by one, starting from zero.

 

Note that changing even a single character in the SQL text (see the extra space at the end before the semi-colon) causes the SQL_ID to change and a different SQL_EXEC_ID counter to be reported (which starts from “zero” again). A separate SQL_EXEC_ID counter is maintained in the shared pool for each SQL_ID:

SQL> SELECT sql_exec_id FROM v$session WHERE sid = USERENV('SID') ;

SQL_EXEC_ID
-----------
   16777216

 

So, obviously, when I have just restarted my instance and still see 16777216 as the starting SQL_EXEC_ID for any SQL I execute, it must mean that the full SQL_EXEC_ID value contains something other than just the execution number of this SQL_ID. Whenever I see such familiar values (like powers of 2), I like to look at the values in hex format to see whether some higher order bits are used for some special purpose. Let’s run a new SQL statement:

 

SQL> SELECT sql_exec_id, TO_CHAR(sql_exec_id,'XXXXXXXX') hex FROM v$session WHERE sid = USERENV('SID') ;

SQL_EXEC_ID HEX
----------- ---------
   16777216   1000000

SQL> SELECT sql_exec_id, TO_CHAR(sql_exec_id,'XXXXXXXX') hex FROM v$session WHERE sid = USERENV('SID') ;

SQL_EXEC_ID HEX
----------- ---------
   16777217   1000001

SQL> SELECT sql_exec_id, TO_CHAR(sql_exec_id,'XXXXXXXX') hex FROM v$session WHERE sid = USERENV('SID') ;

SQL_EXEC_ID HEX
----------- ---------
   16777218   1000002

 

Indeed, it looks like the 25th bit (2^24) is always pre-set to 1, while the least significant 24 bits represent how many times this SQL ID has been executed in an instance. I have tested this with a loop: the 24 least significant bits do get used fully for representing the SQL ID’s execution count in the instance, and once the value reaches 0xFFFFFF (or 0x1FFFFFF with that pre-set 25th bit), it wraps to 0x1000000, with the 25th bit still remaining set! So the SQL_EXEC_ID can reliably track only 2^24 - 1 SQL executions in an instance before the counter wraps to the beginning. This is why you should include the SQL_EXEC_START column (date datatype with 1 sec precision) in your performance monitoring queries as well, to distinguish between SQL executions with a colliding SQL_EXEC_ID. As long as you’re executing your SQL statement less than 16.7 million times per second per instance, this should be fine :-)
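The wrap-around described above can be mimicked with a few lines of arithmetic. This is only a sketch of the observed behavior, not Oracle's actual implementation; the names COUNTER_BITS, COUNTER_MASK and next_exec_id are made up for illustration.

```python
# Hypothetical model of the observed SQL_EXEC_ID behavior, not Oracle internals.
COUNTER_BITS = 24                        # low 24 bits hold the execution counter
COUNTER_MASK = (1 << COUNTER_BITS) - 1   # 0xFFFFFF


def next_exec_id(exec_id: int) -> int:
    """Return the ID that would follow exec_id, wrapping only the low 24 bits."""
    high_bits = exec_id & ~COUNTER_MASK       # the pre-set high bit(s) stay untouched
    counter = (exec_id + 1) & COUNTER_MASK    # the counter wraps to 0 after 0xFFFFFF
    return high_bits | counter


if __name__ == "__main__":
    # Two ordinary increments, then the wrap at the top of the 24-bit range.
    for exec_id in (0x1000000, 0x1000001, 0x1FFFFFF):
        print(f"{exec_id:X} -> {next_exec_id(exec_id):X}")
```

After the wrap, two different executions of the same SQL_ID can report the same SQL_EXEC_ID, which is exactly the collision that SQL_EXEC_START helps to tell apart.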

 

Anyway, so what’s the magic 25th bit then? Well, in RAC it would be very hard to somehow coordinate the incrementing of a single counter globally (that’s why you want to keep your sequences cached in RAC), so I figure that there are different counters for the same SQL ID in different RAC instances. Let’s check – I will log in to another RAC node (node 2) and run this:

 

SQL> SELECT sql_exec_id, TO_CHAR(sql_exec_id,'XXXXXXXX') hex FROM v$session WHERE sid = USERENV('SID') ;

SQL_EXEC_ID HEX
----------- ---------
   33554433   2000001

SQL> SELECT sql_exec_id, TO_CHAR(sql_exec_id,'XXXXXXXX') hex FROM v$session WHERE sid = USERENV('SID') ;

SQL_EXEC_ID HEX
----------- ---------
   33554434   2000002

SQL> SELECT sql_exec_id, TO_CHAR(sql_exec_id,'XXXXXXXX') hex FROM v$session WHERE sid = USERENV('SID') ;

SQL_EXEC_ID HEX
----------- ---------
   33554435   2000003

 

Whoa – the SQL Execution ID in the 2nd instance starts from 33 Million! And when you convert the value to hex, you’ll see that now the 26th bit is set – showing that this SQL was executed in instance #2!

So, it very much looks like the 24 least significant bits are used for the SQL execution ID counter, while the more significant bits show which instance_id ran that SQL. Assuming that 32 bits are used for the whole SQL_EXEC_ID value, then up to 8 higher order bits could be used for storing the instance_id, supporting up to 256-node RAC clusters. This is very useful when analyzing past ASH data, as you can aggregate data (count the min/max exec ID difference to get the execution counts in a time range) either in each separate instance or globally, by stripping out the instance_id part from the value.
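Assuming the layout guessed above (instance_id in the bits above a 24-bit counter), splitting a value takes one shift and one mask. The sketch below is an inference from the observations in this post, not documented behavior, and the function names split_exec_id and executions_in_range are hypothetical.

```python
# Assumed layout (inferred, not documented): [instance_id][24-bit execution counter]
COUNTER_BITS = 24
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def split_exec_id(exec_id: int) -> tuple[int, int]:
    """Split a SQL_EXEC_ID into (instance_id, per-instance execution counter)."""
    return exec_id >> COUNTER_BITS, exec_id & COUNTER_MASK


def executions_in_range(min_exec_id: int, max_exec_id: int) -> int:
    """Estimate executions between two sampled IDs of one SQL_ID, both endpoints included.

    Assumes both IDs come from the same instance and the counter did not wrap
    between the two samples.
    """
    min_instance, min_counter = split_exec_id(min_exec_id)
    max_instance, max_counter = split_exec_id(max_exec_id)
    if min_instance != max_instance:
        raise ValueError("IDs come from different instances; compare per instance")
    return max_counter - min_counter + 1


if __name__ == "__main__":
    print(split_exec_id(33554435))                   # value seen on node 2 above
    print(executions_in_range(33554433, 33554435))   # the three runs on node 2
```

Keeping the instance check explicit is a deliberate choice here: a plain min/max over values from several instances would mix unrelated counters.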

I haven’t tested the instance_id part with 256-node RAC clusters (as Santa Claus is cutting back due to the poor economy), but at least on an 8-node full rack Exadata all 8 instance_ids were reported properly. Note that for serial queries, the SQL_EXEC_ID shows you the instance_id of the instance where the session is logged on, but for inter-instance parallel query, you will see the instance_id of the query coordinator for all PX slaves, regardless of which instances they run in. Here’s a little script from an 8-node Exadata cluster to show it. I’ll leave it up to you to fully figure out what it’s doing, how and why, but basically what it shows is that the SQL_EXEC_ID consists of the query coordinator’s instance_id value and the execution number for a SQL_ID in the instance where the query coordinator session was logged in:

 

SQL> SELECT qc_instance_id, MIN(TO_CHAR(sql_exec_id,'XXXXXXXX'))
  2  , MAX(TO_CHAR(sql_exec_id,'XXXXXXXX'))
  3* FROM gv$active_session_history GROUP BY qc_instance_id order by 1
SQL> /

QC_INSTANCE_ID MIN(TO_CH MAX(TO_CH
-------------- --------- ---------
             1   1000000   100540F
             2   2000000   20009BF
             3   3000000   300541E
             4   4000000   40000DD
             5   5000000   50C5035
             6   6000000   600018C
             7   7000000   700023D
             8   8000000   8000755
                 1000000   803DF3B

9 rows selected.
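One way to read the listing above is to decode each hex string and compare the embedded instance number with QC_INSTANCE_ID. This is a sketch under the layout inferred in this post; the helper name decode_hex is hypothetical, and the sample values are copied from the MAX column of the listing.

```python
# Assumed layout (inferred in this post): instance_id above a 24-bit counter.
COUNTER_BITS = 24
COUNTER_MASK = (1 << COUNTER_BITS) - 1

# QC_INSTANCE_ID -> MAX(TO_CHAR(sql_exec_id,'XXXXXXXX')), copied from the listing.
MAX_EXEC_ID_HEX = {
    1: "100540F", 2: "20009BF", 3: "300541E", 4: "40000DD",
    5: "50C5035", 6: "600018C", 7: "700023D", 8: "8000755",
}


def decode_hex(hex_text: str) -> tuple[int, int]:
    """Decode a TO_CHAR(...,'XXXXXXXX') string into (instance_id, counter)."""
    value = int(hex_text.strip(), 16)   # TO_CHAR output may carry leading blanks
    return value >> COUNTER_BITS, value & COUNTER_MASK


if __name__ == "__main__":
    for qc_instance_id, hex_text in MAX_EXEC_ID_HEX.items():
        instance_id, counter = decode_hex(hex_text)
        # For parallel queries the embedded instance should be the coordinator's.
        print(qc_instance_id, instance_id, counter, instance_id == qc_instance_id)
```

The last row of the listing has no QC_INSTANCE_ID, so it is left out of the dictionary; its minimum and maximum carry different instance numbers (1 and 8).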

 

That’s all for today – more cool stuff is coming, I promise :-)

 

And oh, next week I’ll start another run of my Advanced Oracle Troubleshooting seminar, so check it out! ;-)

Xplan utility

A utility to add parent ID and execution order information to plans reported by DBMS_XPLAN. XPlan includes DISPLAY, DISPLAY_CURSOR and DISPLAY_AWR functionality for use in exactly the same way as the DBMS_XPLAN equivalents. Supports versions from 10g onwards. ***Update*** Now available in two formats: 1) as a PL/SQL package and 2) as a collection of three free-standing SQL*Plus scripts (i.e. no installation/database objects needed). January 2009 (updated October 2011)