Top 60 Oracle Blogs

Recent comments

July 2011

Wait Event and Wait Class Metrics vs v$system_event

This post is a followup to the first metric post on statistics:

Unfortunately the wait event and wait class interface for metrics is not as friendly as it is for statistics.
For wait event views we have (at system level)

  • V$SYSTEM_EVENT – wait events cumulative since startup
  • V$EVENTMETRIC – wait event deltas last 60 seconds
  • DBA_HIST_SYSTEM_EVENT – wait events by snapshot (hour) for last week, cumulative since startup

For wait class we have

  • V$SYSTEM_WAIT_CLASS – cumulative since start up
  • V$WAITCLASSMETRIC – last 60 seconds deltas
  • V$WAITCLASSMETRIC_HISTORY – 60 seconds deltas for last hour

Remember DBA_HIST_WAITCLASSMETRIC_HISTORY is used for alerts and or baselines not everyday stats.

I use wait events for two things:

  1. load/bottlenecks on the system
  2. I/O Latencies

The data in V$ACTIVE_SESSION_HISTORY (ASH) is probably better for seeing load because load is made up of both wait time and cpu time. Cpu information is not in the event/waitclass views but is in ASH along with the waits.

The second part, the  latencies, specifically I/O latencies,  are only available in the wait event and waitclass views (and the filestat views on a per file basis)


Latencies with WAIT CLASS

The waitclass views are mainly good for rollup and rolling up latencies is probably of little use. One possible use  is determining the average read I/O  for all the various kinds of read I/O and read sizes:

select 10*time_waited/nullif(wait_count,0) avg_io_ms -- convert centi-seconds to milliseconds
from   v$waitclassmetric  m
       where wait_class_id= 1740759767 --  User I/O

One issue with V$WAITCLASSMETRIC is that the WAIT_CLASS name is not in the view, so we either have to use the WAIT_CLASS_ID (the hash of the name)  as above or join to V$SYSTEM_WAIT_CLASS as below

        10*m.time_waited/nullif(m.wait_count,0) avgms -- convert centisecs to ms
from   v$waitclassmetric  m,
       v$system_wait_class n
where m.wait_class_id=n.wait_class_id
  and n.wait_class='User I/O'

Another issue is that the documentation for 11gR2 says that the TIME_WAITED is microseconds but in my tests it’s actually centisecs

Name                                        Type
-----------------------------------------  ----------------------------
WAIT_CLASS_ID                                NUMBER
WAIT_CLASS#                                  NUMBER
WAIT_CLASS                                   VARCHAR2(64)
TOTAL_WAITS                                  NUMBER
TIME_WAITED                                  NUMBER  - centi-seconds

The view V$SYSTEM_WAIT_CLASS is not of much use since it’s just cumulative values from database   startup and the deltas are already calculated in V$WAITCLASSMETRIC, on the other hand V$SYSTEM_WAIT_CLASS gives a list of wait classes and decodes the wait_class_id.

 select wait_class_id , wait_class from V$SYSTEM_WAIT_CLASS ;

------------- ----------------------------------------------------------------
   1893977003 Other
   4217450380 Application
   3290255840 Configuration
   4166625743 Administrative
   3875070507 Concurrency
   3386400367 Commit
   2723168908 Idle
   2000153315 Network
   1740759767 User I/O
   4108307767 System I/O

Latencies with Wait Events

For specific I/O latencies there are two choices – v$eventmetric and v$system_event. With v$system_event it requires  running multiple queries and taking the deltas but the deltas are are already calculated in v$eventmetric. Here is an example of getting I/O latencies from v$eventmetric

Latencies in the past minute

col name for a25
select m.intsize_csec, ,
       round(m.time_waited,3) time_waited,
       round(10*m.time_waited/nullif(m.wait_count,0),3) avgms
from v$eventmetric m,
     v$event_name n
where m.event_id=n.event_id
  and in (
                  'db file sequential read',
                  'db file scattered read',
                  'direct path read',
                  'direct path read temp',
                  'direct path write',
                  'direct path write temp',
                  'log file sync',
                  'log file parallel write'
------------ ------------------------- ----------- ---------- ----------
        6017 log file parallel write         2.538          4      6.345
        6017 log file sync                   2.329          1     23.287
        6017 db file sequential read             0          0
        6017 db file scattered read              0          0
        6017 direct path read                    0          0
        6017 direct path read temp               0          0
        6017 direct path write                   0          0
        6017 direct path write temp              0          0

Latencies averaged over each hour

       round((time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0),3) avg_ms
from (
       to_char(s.BEGIN_INTERVAL_TIME,'DD-MON-YY HH24:MI')  btime,
       total_waits count_end,
       time_waited_micro/1000 time_ms_end,
       Lag (e.time_waited_micro/1000)
              OVER( PARTITION BY e.event_name ORDER BY s.snap_id) time_ms_beg,
       Lag (e.total_waits)
              OVER( PARTITION BY e.event_name ORDER BY s.snap_id) count_beg
   and e.event_name like '%&1%'
order by begin_interval_time
order by btime
BTIME               AVG_MS
--------------- ----------
20-JUL-11 06:00      5.854
20-JUL-11 07:00      4.116
20-JUL-11 08:00     21.158
20-JUL-11 09:02      5.591
20-JUL-11 10:00      4.116
20-JUL-11 11:00      6.248
20-JUL-11 12:00     23.634
20-JUL-11 13:00     22.529
20-JUL-11 14:00      21.62
20-JUL-11 15:00     18.038
20-JUL-11 16:00     23.127


One issue with looking at I/O latencies is determining the I/O sizes.  It would be awesome if there was a view with I/O counts, sizes and latencies in one place. ASH does have this information  but ASH data is weighted to the longer latencies and sizes and not the average. The average sizes have to be gotten from system statistics. The I/O sizes for ‘db file sequential read’ are single block reads, but the other read events can vary in size. To get a general idea of I/O sizes one could just average across all I/O using the system statistics

Average I/O Size (across all I/O waits)

          sum(decode(metric_name,'Physical Reads Per Sec',value,0))*max(intsize_csec)/100  blocks_read,
          nullif(sum(decode(metric_name,'Physical Read IO Requests Per Sec',value,0)),0)*max(intsize_csec)/100  reads,
            sum(decode(metric_name,'Physical Reads Per Sec',value,0))/
          nullif(sum(decode(metric_name,'Physical Read IO Requests Per Sec',value,0)),0) avg_blocks_read
from v$sysmetric
where group_id = 2  -- 60 second deltas only (not the 15 second deltas);

----------- ---------- ---------------
       4798       4798               1

Load and Bottlenecks
The good thing about wait classes is that they simplify dealing with 1000s of wait events and group them into just a few wait classes. We can get a quick view of load on the system with

select n.wait_class, round(m.time_waited/m.INTSIZE_CSEC,3) AAS
from   v$waitclassmetric  m,
       v$system_wait_class n
where m.wait_class_id=n.wait_class_id
and n.wait_class != 'Idle'
WAIT_CLASS             AAS
--------------- ----------
Other                    0
Application              0
Configuration            0
Administrative           0
Concurrency              0
Commit                   0
Network                  0
User I/O              .149
System I/O            .002

but the big drawback with wait event and/or wait class views is that they lack information on CPU load. CPU load can be found in the system statistics but it’s just easier to do it all in one query using v$active_session_history. Here is a query using ASH to calculate AAS load on the database over the last 60 seconds:

            round(count(*)/secs.var,3)     AAS,
            decode(session_state,'ON CPU','CPU',wait_class)  wait_class
       from v$active_session_history ash,
            (select 60 var from dual)  secs
            SAMPLE_TIME > sysdate - (secs.var/(24*60*60)) and
       group by decode(session_state,'ON CPU','CPU',wait_class) , secs.var
---------- ---------------
      .016 Concurrency
      .001 Network
         0 Other
      .083 Configuration
      .001 Administrative
      .034 CPU
         0 System I/O
      .001 Commit
      .054 Application
         0 User I/O

Though with v$sysmetric it’s pretty easy to join to v$waitclassmetric

  select n.wait_class, round(m.time_waited/m.INTSIZE_CSEC,3) AAS
   from   v$waitclassmetric  m,
          v$system_wait_class n
   where m.wait_class_id=n.wait_class_id
   and n.wait_class != 'Idle'
   select  'CPU', round(value/100,3) AAS from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2 ;
WAIT_CLASS                                                              AAS
---------------------------------------------------------------- ----------
Administrative                                                            0
Application                                                            .009
CPU                                                                   1.696
Commit                                                                    0
Concurrency                                                            .001
Configuration                                                             0
Network                                                                .002
Other                                                                     0
System I/O                                                                0
User I/O                                                                  0

and adding v$sysmetric into the query allows me to do something I’ve always wanted which is to include the OS CPU in AAS

 select n.wait_class, round(m.time_waited/m.INTSIZE_CSEC,3) AAS
   from   v$waitclassmetric  m,
          v$system_wait_class n
   where m.wait_class_id=n.wait_class_id
   and n.wait_class != 'Idle'
   select  'CPU', round(value/100,3) AAS from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2
   select 'CPU_OS', round((prcnt.busy*parameter.cpu_count)/100,3) - aas.cpu
  ( select value busy from v$sysmetric where metric_name='Host CPU Utilization (%)' and group_id=2 ) prcnt,
  ( select value cpu_count from v$parameter where name='cpu_count' )  parameter,
  ( select  'CPU', round(value/100,3) cpu from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2) aas
WAIT_CLASS                                                              AAS
---------------------------------------------------------------- ----------
Administrative                                                            0
Application                                                               0
CPU                                                                    .009
CPU_OS                                                                 .024
Commit                                                                    0
Concurrency                                                               0
Configuration                                                             0
Network                                                                .002
Other                                                                     0
System I/O                                                                0
User I/O                                                                  0
select 10*time_waited/nullif(wait_count,0) avg_io_ms -- convert centi-seconds to milliseconds
from   v$waitclassmetric  m
       where wait_class_id= 1740759767 --  User I/O

One huge loss over using ASH is the loss of the information users waiting for CPU but not running on CPU.

I Can See Clearly Now. Exadata Is Better Than EMC Storage! I Have Seen The Slides! Part II. SuperCluster Storage?

This blog entry is too long.

From Oracle Storage Strategy Update To TPC-C And Back (I Hope)
My recent blog entry entitled I Can See Clearly Now. Exadata Is Better Than EMC Storage! I Have Seen The Slides! Part I was pretty heavily read (over 7,000 views). I was concerned that blogging about something that happened two weeks ago might not be all that interesting. But, since my analysis (opinions) about the June 30, 2011 Oracle Storage Strategy webcast seems to resonate I thought I’d put out this installment.

What Do Transaction Processing Council Benchmarks Have To Do With The Oracle Storage Strategy Update?
I’ve been eagerly anticipating which of IBM or HP would be first to audit a TPC-C with the Xeon E7 (formerly Westmere EX) processor. These vendors have value-add systems componentry that properly extend the vanilla Xeon E7 + QPI capabilities to include scalable 8-socket and very large memory support.

IBM’s x3850 with MAX5 supports 96 32GB low-voltage DIMMS for a total of 3TB RAM with just 4 Sockets. IBM proved the strength of the x3850 several months ago with a 4-socket Nehalem EX (Xeon 7500) result of a little over 2.3 million TpmC. So, part of me was not all that surprised to find that they were able to stay with the recipe and publish a result of just over 3 million TpmC with the Xeon E7 processor and MAX5 chipset (July 11, 2011).  But that has nothing to do with the Oracle Storage Strategy webcast and, in fact, since it was a DB2 number with Linux it has very little to do with Oracle. So why am I blogging this?

While the 3 million TpmC result represents roughly 30% improvement over the Nehalem EX-based result for IBM, I’m saddened the entry was not an 8-socket result. Why? Well, I’ll put it this way. If IBM and HP can’t seem to make 8-socket Xeon boxes able to scale contentious workloads (like TPC-C) then it’s quite likely nobody can. It looks like 8-socket Xeon scalability is still out of reach for us. That is just too bad. But that has nothing to do with the Oracle Storage Strategy webcast. So why am I blogging this? I’m getting to it, trust me.

While perusing the main TPC-C all-results page I noticed three interesting things and one of them actually has to do with the Oracle Storage Strategy webcast!

The three things that caught my eye were:

  1. There are non-clustered Xeon results in the top ten! Sure, the prior IBM x3850 result was in the top ten but when it was published I didn’t catch on to that fact. It wasn’t too long ago that non-clustered x86 boxes were so far down the list as to not matter.
  2. In the ranks of the top-ten results there are two submissions that are less than $1.00/TpmC. I think that is quite significant when you compare to historical costs. Top ten TPC-C results with Xeon at < $1.00/TpmC—wow.
  3. None of the products mentioned in the Oracle Storage Strategy webcast appear in the top ten TPC-C nor TPC-H for that matter. The last Oracle TPC-H result was a 3TB scale M9000 result with Sun Storage 6000 (Sun Storage 6000 is LSI Engenio hardware and the Engenio brand is now owned by Netapp for what it’s worth).

So, obviously, point 3 in the list is what brings me back to the Oracle Storage Strategy Update June 29, 2011 (slides). If one publishes an industry benchmark that performs 3x over the closest competitor—as Oracle did with the SuperCluster 30 million TpmC result—wouldn’t the system (including storage) used to do so be considered a premiere system offering? One would think so—especially when the workload is an I/O intensive workload! But no, generally speaking the configurations used in TPC benchmarks are not to be confused with systems intended for production.

Concept Car or Production Car
The difference between TPC configurations and production configurations is a lot like the difference between a concept car and a car offered by the same manufacturer that is actually sitting on a lot with a price sticker on it. The concept car and the production car have a lot in common—but the differences are usually pretty obvious as well. We shouldn’t have a problem with this. I still think TPC benchmarks are good for certain purposes. An example of one such purpose is to see just how small the line is getting between the “concept car” and the “production car.”

SuperCluster Storage or Oracle Storage Strategy Line-up?
No, the “SuperCluster Storage” that was used for the 30 million TpmC result is not in the Storage Strategy line-up. So then what was the 30 million TpmC “concept car” storage? Take a peek at this link or let me summarize. The SuperCluster storage consisted of the following main ingredients:

  1. 97 Sun X4270M2 servers with one Intel Xeon removed. The 4270 servers ran Solaris and COMSTAR. As such, the servers play the role of “array heads” in order to perform protocol exchange between SAS and Fibre Channel. Why? Because the storage networking was Fibre Channel (108 8GFC Fibre Channel HBAs connecting the 27 Real Application Clusters nodes (4 HBAs each) to the COMSTAR heads and SAS from the COMSTAR heads to the storage.
  2. 138 Sun Storage F5100 Flash Array devices. That bit was $22,000,000. Remember the analogy about the concept car.

So a high-level schematic of the flow of data was F5100 SAS->COMSTAR head (SAS to FC)-> FC switches-> Sun T3-4 Servers. Don’t be alarmed by that many “hops” because they don’t really matter. Indeed, the 30 million TpmC SuperCluster delivered an average New Order response time of 0.35s, which is 69% faster than the IBM p780 result of 1.14 seconds.  That’s a point Oracle marketing pushes vigorously. Oracle marketing doesn’t, however, seem to push the fact that while HP was still Oracle’s premiere hardware partner they teamed with HP to deliver what was, at the time, a world record TPC-C using the recently-shunned Itanium processor. Moreover, they most certainly don’t push the fact that the circa-2007 Itanium TPC-C with Oracle10g delivered New Order average service times of 0.24s—which was 32% faster service times than the SuperCluster! Fine details matter.

Concept Car to Oracle Storage Strategy Update
No, there is no evolution from concept to reality where the COMSTAR+F5100 approach is concerned. In fact, Oracle spelled out quite clearly how the storage recipe for these SuperClusters will be “Sun ZFS Storage 7420” which means either FC, iSCSI or NFS—but no Exadata since there is no port of Exadata iDB to SPARC. I think the ZFS Storage Appliance is a reasonable product but I wouldn’t want to stick my arm in the unified storage meat-grinder with the likes of EMC VNX and Netapp.

So, no, the storage used for the SuperCluster TPC-C shows no promise at this time of evolving from concept to production. However, Oracle customers should be glad because yet another addition to the storage strategy would be all too confusing in my opinion.

Final Words About That IBM x3850 Xeon E7 TPC-C Result
The Oracle SuperCluster result of 30 million TpmC (.353s average New Order service time) didn’t beat out the service times of the ancient Itanium 2 based SuperDome New Order transactions, but at least it also failed to beat the IBM x3850 average service times!

The IBM x3850 pumped out over 3 million TpmC with average New Order service times of .272s and all that for $.59/TpmC. How? Well, the storage wasn’t a concept. The lion’s share of the I/O was serviced by 136 SFF SAS SSDs! That’s about 1/50th the cost for storage for 1/10th the transaction throughput when compared to the SuperCluster. And faster transaction service times too.

Intel Xeon is my concept car of choice—and you can run about any software you so choose on it so that makes it even better. And regardless of what software I chose to run I would rather it not be stored in “concept storage.”

This blog entry was too long.

Filed under: oracle

The Oracle Exadata IO Resource Manager Limit Clause

The Exadata IO Resource Manager (IORM) is a fundamental piece of the exadata database machine since the beginning. The function of the IORM is to guarantee a database and/or category (set via the database resource manager) gets the share of IO as defined in the resource plan at each storage server. The “share” here means the minimal amount of IO. This is widely known and documented in the Exadata documentation. Having a minimal amount of IO means you can guarantee a service level agreement (SLA).

But what if you want to define the maximal amount of IO a database may get? The case of a maximal amount of IO a database can get is that you can use a database on Exadata, and do not (or very limited) change the performance once more databases get onto the same database machine. So instead of having the minimal amount of IO a database must get guaranteed (which is what is documented in the exadata documentation), having a limit on the maximal amount a database can get?

NFS throughput testing trick (solaris)

When testing the read and write speed to an NFS mounted file system it’s often unclear if the bottleneck is the speed of the network connection or the speed of the underlying storage. For the underlying storage it would be nice just take it out of the equation and concentrate on the network speed. For network speed testing for NFS mount here is a cool trick to take the storage subsystem out of the equation by using a ramdisk. On the NFS server create a ramdisk

ramdiskadm -a ramdisk1 1000m
newfs /dev/rramdisk/ramdisk1
mkdir /ramdisk
mount /dev/ramdisk/ramdisk1 /ramdisk
share -F nfs -o rw /ramdisk
chmod 777 /ramdisk

Then on the NFS client, for example LINUX, mount the ramdisk

mkdir /ramdisk
mount -t nfs  -o  'rw,bg,hard,nointr,rsize=1048576,wsize=1048576,tcp,nfsvers=3,timeo=600' /home/odbadmin/ramdisk

Now, on the target I can read and write to the ramdisk and avoid the disk speed issues:

time  dd if=/ramdisk/toto of=/dev/null bs=1024k count=800
838860800 bytes (839 MB) copied, 7.74495 seconds, 108 MB/s

For read tests, be aware that in many cases, reads will come from caching on the client side after the first run.

Understanding and Interpreting Deadlocks

This is my deadlocks presentation, first given at Hotsos 2006 in Dallas, TX.

VirtualBox 4.1 Released…

Hot on the heels of the VirtualBox 4.0.12 maintenance release, shipped a few days ago, comes VirtualBox 4.1. It contains loads of new features, explained here and in the changelog.

The upgrade went smoothly on my MacBook Pro, but on my Fedora 15 servers I had to uninstall the old version manually before installing the new version. None of my settings were lost so everything was easy enough.

It certainly seems applying VirtualBox upgrades is becoming a fulltime job. Of course, the quick release cycle is a lot better than getting no updates, like VMware Server. :)



Bitmap Indexes & Minimize Records_Per_Block (Little Wonder)

As mentioned in my first post regarding the use of Bitmap Indexes to service Not Equal predicates, an index entry in a Bitmap Index consists of the indexed value and a pair of rowids that determine the range of potential rows spanned by the corresponding bitmap sequence. However, Oracle has no way of determining how many [...]

Listener won’t register database


I must admit, I’m a sqlnet hack.  I never really sat down to figure it out. If I have problems getting a database to register with a currently listener then I generally just add a separate listener. Well, today I was stuck on a system where I couldn’t add a second listener. Everything seems fine.  There was a default listener running on 1521 with  a database instance already registered. The first and second database were in the same ORACLE_HOME but I just couldn’t get the listener  to pick up the second databases.  I bounced and reloaded the listener and registered the second database

  • lsnrctl start
  • lsnrctl stop
  • lsnrctl start
  • alter system register

No dice. Then I tried

alter system set local_listener=’(ADDRESS = (PROTOCOL=TCP)(HOST =myhost)(PORT=1521))’

and bang, it registered.

Of course, I guess I should track down why it wouldn’t register in the first place but getting it working for now was enough.




Google+ Hangouts and Support…

You open a ticket and wait… When you do get a reply it tells you to send information you’ve already posted, or suggests you try some workarounds you’ve already listed in the ticket as having not worked for you. You get frustrated and write a blog post ranting about how terrible the support service is etc. I guess this could be a story about just about any internet support service I’ve had to use over the years.

Do you remember in the old days, before the internet was popular, when you phoned support lines? Do you remember how quickly some of these annoying issues were resolved by simply saying, “I’ve already sent that!”, to a real person at the end of the line? OK. I’ve conveniently forgotten to mention being put on hold for hours, but this is my blog and I’m allowed to have a totally biased opinion about things… :)

Maybe elements of the good old days are coming back thanks to social media. Check out this article where Michael Dell proposes using Google+ Hangouts as a way of connecting to Dell service and sales.

Imagine the joy of being able to rant directly at a real person again. :)