This post is a follow-up to the first metric post, which covered statistics.
Unfortunately the wait event and wait class interface for metrics is not as friendly as it is for statistics.
For wait events we have, at the system level, V$EVENTMETRIC, V$SYSTEM_EVENT and DBA_HIST_SYSTEM_EVENT.
For wait classes we have V$WAITCLASSMETRIC, V$WAITCLASSMETRIC_HISTORY, V$SYSTEM_WAIT_CLASS and DBA_HIST_WAITCLASSMETRIC_HISTORY.
Remember that DBA_HIST_WAITCLASSMETRIC_HISTORY is used for alerts and/or baselines, not everyday stats.
I use wait events for two things: measuring load and measuring latencies.
The data in V$ACTIVE_SESSION_HISTORY (ASH) is probably better for seeing load, because load is made up of both wait time and CPU time. CPU information is not in the event/wait class views, but it is in ASH, along with the waits.
The second part, the latencies (specifically the I/O latencies), is only available in the wait event and wait class views (and, on a per-file basis, in the filestat views).
Latencies with Wait Classes
The wait class views are mainly good for rollups, and rolling up latencies is probably of little use. One possible use is determining the average read I/O time across all the various kinds of read I/O and read sizes:
select 10*time_waited/nullif(wait_count,0) avg_io_ms -- convert centi-seconds to milliseconds
from   v$waitclassmetric m
where  wait_class_id = 1740759767  -- User I/O
/

 AVG_IO_MS
----------
     8.916
One issue with V$WAITCLASSMETRIC is that the WAIT_CLASS name is not in the view, so we either have to use the WAIT_CLASS_ID (the hash of the name), as above, or join to V$SYSTEM_WAIT_CLASS, as below:
select 10*m.time_waited/nullif(m.wait_count,0) avgms -- convert centisecs to ms
from   v$waitclassmetric m,
       v$system_wait_class n
where  m.wait_class_id = n.wait_class_id
and    n.wait_class = 'User I/O'
/

     AVGMS
----------
     8.916
Another issue is that the documentation for 11gR2 says that TIME_WAITED is in microseconds, but in my tests it’s actually centiseconds:
desc V$SYSTEM_WAIT_CLASS
 Name                                      Type
 ----------------------------------------- ----------------------------
 WAIT_CLASS_ID                             NUMBER
 WAIT_CLASS#                               NUMBER
 WAIT_CLASS                                VARCHAR2(64)
 TOTAL_WAITS                               NUMBER
 TIME_WAITED                               NUMBER  -- centi-seconds
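As a rough sanity check of the units (my own sketch, not from the original post), divide the Idle class’s cumulative TIME_WAITED by the instance uptime: the result approximates the average number of idle sessions, which only comes out plausible if TIME_WAITED is in centiseconds (a microsecond value would be about 10,000x larger):

-- uptime is converted to centiseconds, so APPROX_IDLE_SESSIONS lands near
-- the real (background + foreground) session count only if TIME_WAITED is
-- also in centiseconds
select n.wait_class,
       n.time_waited,
       round(n.time_waited / ((sysdate - i.startup_time) * 86400 * 100), 1) approx_idle_sessions
from   v$system_wait_class n,
       v$instance i
where  n.wait_class = 'Idle';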
The view V$SYSTEM_WAIT_CLASS is not of much use on its own, since it holds just cumulative values since database startup and the deltas are already calculated in V$WAITCLASSMETRIC. On the other hand, V$SYSTEM_WAIT_CLASS gives the list of wait classes and decodes the wait_class_id:
select wait_class_id, wait_class
from   V$SYSTEM_WAIT_CLASS;

WAIT_CLASS_ID WAIT_CLASS
------------- ----------------------------------------------------------------
   1893977003 Other
   4217450380 Application
   3290255840 Configuration
   4166625743 Administrative
   3875070507 Concurrency
   3386400367 Commit
   2723168908 Idle
   2000153315 Network
   1740759767 User I/O
   4108307767 System I/O
Latencies with Wait Events
For specific I/O latencies there are two choices: v$eventmetric and v$system_event. With v$system_event we would have to run multiple queries and take the deltas ourselves, but the deltas are already calculated in v$eventmetric. Here is an example of getting I/O latencies from v$eventmetric.
Latencies in the past minute
col name for a25
select m.intsize_csec,
       n.name,
       round(m.time_waited,3) time_waited,
       m.wait_count,
       round(10*m.time_waited/nullif(m.wait_count,0),3) avgms
from   v$eventmetric m,
       v$event_name n
where  m.event_id = n.event_id
and    n.name in ('db file sequential read',
                  'db file scattered read',
                  'direct path read',
                  'direct path read temp',
                  'direct path write',
                  'direct path write temp',
                  'log file sync',
                  'log file parallel write')
/

INTSIZE_CSEC NAME                      TIME_WAITED WAIT_COUNT      AVGMS
------------ ------------------------- ----------- ---------- ----------
        6017 log file parallel write         2.538          4      6.345
        6017 log file sync                   2.329          1     23.287
        6017 db file sequential read             0          0
        6017 db file scattered read              0          0
        6017 direct path read                    0          0
        6017 direct path read temp               0          0
        6017 direct path write                   0          0
        6017 direct path write temp              0          0
Latencies averaged over each hour
select btime,
       round((time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0),3) avg_ms
from (
     select to_char(s.BEGIN_INTERVAL_TIME,'DD-MON-YY HH24:MI') btime,
            total_waits count_end,
            time_waited_micro/1000 time_ms_end,
            Lag (e.time_waited_micro/1000) OVER (PARTITION BY e.event_name ORDER BY s.snap_id) time_ms_beg,
            Lag (e.total_waits)            OVER (PARTITION BY e.event_name ORDER BY s.snap_id) count_beg
     from   DBA_HIST_SYSTEM_EVENT e,
            DBA_HIST_SNAPSHOT s
     where  s.snap_id = e.snap_id
     and    e.event_name like '%&1%'
     order by begin_interval_time
)
order by btime
/

BTIME               AVG_MS
--------------- ----------
20-JUL-11 06:00      5.854
20-JUL-11 07:00      4.116
20-JUL-11 08:00     21.158
20-JUL-11 09:02      5.591
20-JUL-11 10:00      4.116
20-JUL-11 11:00      6.248
20-JUL-11 12:00     23.634
20-JUL-11 13:00     22.529
20-JUL-11 14:00      21.62
20-JUL-11 15:00     18.038
20-JUL-11 16:00     23.127
One issue with looking at I/O latencies is determining the I/O sizes. It would be awesome if there were a view with I/O counts, sizes and latencies in one place. ASH does have this information, but ASH data is weighted toward the longer latencies and larger sizes, not the average. The average sizes have to be taken from the system statistics. The I/Os for ‘db file sequential read’ are single-block reads, but the other read events can vary in size. To get a general idea of I/O sizes, one can just average across all I/O using the system statistics.
Average I/O Size (across all I/O waits)
select sum(decode(metric_name,'Physical Reads Per Sec',value,0))*max(intsize_csec)/100 blocks_read,
       nullif(sum(decode(metric_name,'Physical Read IO Requests Per Sec',value,0)),0)*max(intsize_csec)/100 reads,
       sum(decode(metric_name,'Physical Reads Per Sec',value,0)) /
       nullif(sum(decode(metric_name,'Physical Read IO Requests Per Sec',value,0)),0) avg_blocks_read
from   v$sysmetric
where  group_id = 2  -- 60 second deltas only (not the 15 second deltas)
;

BLOCKS_READ      READS AVG_BLOCKS_READ
----------- ---------- ---------------
       4798       4798               1
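Since no single view exposes counts, sizes and latencies together, one workaround (my own sketch, not from the original post) is to pull the single-block read latency from v$eventmetric and the average read size from v$sysmetric in one query; both reflect the last 60-second interval:

select (select round(10*m.time_waited/nullif(m.wait_count,0),3)       -- centiseconds to ms
        from   v$eventmetric m,
               v$event_name  n
        where  m.event_id = n.event_id
        and    n.name = 'db file sequential read') single_block_read_ms,
       (select round(sum(decode(metric_name,'Physical Reads Per Sec',value,0)) /
                     nullif(sum(decode(metric_name,'Physical Read IO Requests Per Sec',value,0)),0),1)
        from   v$sysmetric
        where  group_id = 2) avg_blocks_per_read                      -- 60 second deltas
from   dual;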
Load and Bottlenecks
The good thing about wait classes is that they simplify dealing with thousands of wait events by grouping them into just a few classes. We can get a quick view of the load on the system with:
select n.wait_class,
       round(m.time_waited/m.INTSIZE_CSEC,3) AAS
from   v$waitclassmetric m,
       v$system_wait_class n
where  m.wait_class_id = n.wait_class_id
and    n.wait_class != 'Idle';

WAIT_CLASS             AAS
--------------- ----------
Other                    0
Application              0
Configuration            0
Administrative           0
Concurrency              0
Commit                   0
Network                  0
User I/O              .149
System I/O            .002
but the big drawback of the wait event and wait class views is that they lack information on CPU load. CPU load can be found in the system statistics, but it’s just easier to do it all in one query against v$active_session_history. Here is a query using ASH to calculate the AAS load on the database over the last 60 seconds:
select round(count(*)/secs.var,3) AAS,
       decode(session_state,'ON CPU','CPU',wait_class) wait_class
from   v$active_session_history ash,
       (select 60 var from dual) secs
where  SAMPLE_TIME > sysdate - (secs.var/(24*60*60))
and    SESSION_TYPE = 'FOREGROUND'
group by decode(session_state,'ON CPU','CPU',wait_class), secs.var
/

       AAS WAIT_CLASS
---------- ---------------
      .016 Concurrency
      .001 Network
         0 Other
      .083 Configuration
      .001 Administrative
      .034 CPU
         0 System I/O
      .001 Commit
      .054 Application
         0 User I/O
Though, since CPU usage is available in v$sysmetric, it’s pretty easy to combine it with v$waitclassmetric:
select n.wait_class,
       round(m.time_waited/m.INTSIZE_CSEC,3) AAS
from   v$waitclassmetric m,
       v$system_wait_class n
where  m.wait_class_id = n.wait_class_id
and    n.wait_class != 'Idle'
union
select 'CPU',
       round(value/100,3) AAS
from   v$sysmetric
where  metric_name = 'CPU Usage Per Sec'
and    group_id = 2;

WAIT_CLASS                                                              AAS
---------------------------------------------------------------- ----------
Administrative                                                            0
Application                                                            .009
CPU                                                                   1.696
Commit                                                                    0
Concurrency                                                            .001
Configuration                                                             0
Network                                                                .002
Other                                                                     0
System I/O                                                                0
User I/O                                                                  0
and adding v$sysmetric into the query allows me to do something I’ve always wanted: include the OS CPU in AAS.
select n.wait_class,
       round(m.time_waited/m.INTSIZE_CSEC,3) AAS
from   v$waitclassmetric m,
       v$system_wait_class n
where  m.wait_class_id = n.wait_class_id
and    n.wait_class != 'Idle'
union
select 'CPU',
       round(value/100,3) AAS
from   v$sysmetric
where  metric_name = 'CPU Usage Per Sec'
and    group_id = 2
union
select 'CPU_OS',
       round((prcnt.busy*parameter.cpu_count)/100,3) - aas.cpu
from   (select value busy
        from   v$sysmetric
        where  metric_name = 'Host CPU Utilization (%)'
        and    group_id = 2) prcnt,
       (select value cpu_count
        from   v$parameter
        where  name = 'cpu_count') parameter,
       (select round(value/100,3) cpu
        from   v$sysmetric
        where  metric_name = 'CPU Usage Per Sec'
        and    group_id = 2) aas;

WAIT_CLASS                                                              AAS
---------------------------------------------------------------- ----------
Administrative                                                            0
Application                                                               0
CPU                                                                    .009
CPU_OS                                                                 .024
Commit                                                                    0
Concurrency                                                               0
Configuration                                                             0
Network                                                                .002
Other                                                                     0
System I/O                                                                0
User I/O                                                                  0
One huge loss compared to using ASH is the information about users waiting for CPU but not yet running on CPU.
This blog entry is too long.
From Oracle Storage Strategy Update To TPC-C And Back (I Hope)
My recent blog entry entitled “I Can See Clearly Now. Exadata Is Better Than EMC Storage! I Have Seen The Slides! Part I” was pretty heavily read (over 7,000 views). I was concerned that blogging about something that happened two weeks ago might not be all that interesting, but since my analysis (opinions) of the June 30, 2011 Oracle Storage Strategy webcast seems to resonate, I thought I’d put out this installment.
What Do Transaction Processing Performance Council Benchmarks Have To Do With The Oracle Storage Strategy Update?
I’ve been eagerly anticipating which of IBM or HP would be first to audit a TPC-C result with the Xeon E7 (formerly Westmere-EX) processor. These vendors have value-add system componentry that properly extends the vanilla Xeon E7 + QPI capabilities to include scalable 8-socket and very large memory support.
IBM’s x3850 with MAX5 supports 96 32GB low-voltage DIMMs for a total of 3TB RAM with just 4 sockets. IBM proved the strength of the x3850 several months ago with a 4-socket Nehalem-EX (Xeon 7500) result of a little over 2.3 million TpmC. So part of me was not all that surprised to find that they were able to stay with the recipe and publish a result of just over 3 million TpmC with the Xeon E7 processor and MAX5 chipset (July 11, 2011). But that has nothing to do with the Oracle Storage Strategy webcast and, in fact, since it was a DB2 number on Linux it has very little to do with Oracle. So why am I blogging this?
While the 3 million TpmC result represents a roughly 30% improvement over IBM’s Nehalem-EX-based result, I’m saddened the entry was not an 8-socket result. Why? Well, I’ll put it this way: if IBM and HP can’t seem to make 8-socket Xeon boxes scale on contentious workloads (like TPC-C), then it’s quite likely nobody can. It looks like 8-socket Xeon scalability is still out of reach for us. That is just too bad. But that has nothing to do with the Oracle Storage Strategy webcast. So why am I blogging this? I’m getting to it, trust me.
While perusing the main TPC-C all-results page I noticed three interesting things and one of them actually has to do with the Oracle Storage Strategy webcast!
The three things that caught my eye were:
So, obviously, point 3 in the list is what brings me back to the Oracle Storage Strategy Update of June 29, 2011 (slides). If one publishes an industry benchmark that performs 3x better than the closest competitor—as Oracle did with the SuperCluster 30 million TpmC result—wouldn’t the system (including storage) used to do so be considered a premier system offering? One would think so, especially when the workload is I/O intensive! But no, generally speaking, the configurations used in TPC benchmarks are not to be confused with systems intended for production.
Concept Car or Production Car
The difference between TPC configurations and production configurations is a lot like the difference between a concept car and a car offered by the same manufacturer that is actually sitting on a lot with a price sticker on it. The concept car and the production car have a lot in common—but the differences are usually pretty obvious as well. We shouldn’t have a problem with this. I still think TPC benchmarks are good for certain purposes. An example of one such purpose is to see just how small the line is getting between the “concept car” and the “production car.”
SuperCluster Storage or Oracle Storage Strategy Line-up?
No, the “SuperCluster Storage” that was used for the 30 million TpmC result is not in the Storage Strategy line-up. So then what was the 30 million TpmC “concept car” storage? Take a peek at this link or let me summarize. The SuperCluster storage consisted of the following main ingredients: Sun F5100 flash arrays attached via SAS, COMSTAR storage heads presenting that SAS flash as Fibre Channel targets, and FC switches connecting the storage heads to the database servers.
So a high-level schematic of the flow of data was: F5100 (SAS) -> COMSTAR head (SAS to FC) -> FC switches -> Sun T3-4 servers. Don’t be alarmed by that many “hops” because they don’t really matter. Indeed, the 30 million TpmC SuperCluster delivered an average New Order response time of 0.35s, which is 69% faster than the IBM p780 result of 1.14 seconds. That’s a point Oracle marketing pushes vigorously. Oracle marketing doesn’t, however, seem to push the fact that while HP was still Oracle’s premier hardware partner, Oracle teamed with HP to deliver what was, at the time, a world-record TPC-C using the recently-shunned Itanium processor. Moreover, they most certainly don’t push the fact that the circa-2007 Itanium TPC-C with Oracle10g delivered New Order average service times of 0.24s, which is 32% faster than the SuperCluster! Fine details matter.
Concept Car to Oracle Storage Strategy Update
No, there is no evolution from concept to reality where the COMSTAR+F5100 approach is concerned. In fact, Oracle spelled out quite clearly that the storage recipe for these SuperClusters will be the “Sun ZFS Storage 7420”, which means FC, iSCSI or NFS, but no Exadata, since there is no port of Exadata iDB to SPARC. I think the ZFS Storage Appliance is a reasonable product, but I wouldn’t want to stick my arm in the unified-storage meat-grinder with the likes of EMC VNX and NetApp.
So, no, the storage used for the SuperCluster TPC-C shows no promise at this time of evolving from concept to production. However, Oracle customers should be glad because yet another addition to the storage strategy would be all too confusing in my opinion.
Final Words About That IBM x3850 Xeon E7 TPC-C Result
The Oracle SuperCluster result of 30 million TpmC (0.353s average New Order service time) didn’t beat the service times of the ancient Itanium 2 based Superdome New Order transactions, but at least it also failed to beat the IBM x3850 average service times!
The IBM x3850 pumped out over 3 million TpmC with average New Order service times of 0.272s, and all that for $.59/TpmC. How? Well, the storage wasn’t a concept: the lion’s share of the I/O was serviced by 136 SFF SAS SSDs. That’s about 1/50th the storage cost for 1/10th the transaction throughput when compared to the SuperCluster. And faster transaction service times too.
Intel Xeon is my concept car of choice, and you can run just about any software you choose on it, which makes it even better. And regardless of what software I chose to run, I would rather it not be stored in “concept storage.”
This blog entry was too long.
The Exadata IO Resource Manager (IORM) has been a fundamental piece of the Exadata Database Machine since the beginning. The function of IORM is to guarantee that a database and/or category (set via the Database Resource Manager) gets the share of IO defined in the resource plan at each storage server. The “share” here means the minimal amount of IO. This is widely known and documented in the Exadata documentation. Having a guaranteed minimal amount of IO means you can guarantee a service level agreement (SLA).
But what if you want to define the maximal amount of IO a database may get? The use case for a maximum is that you can put a database on Exadata and have its performance change little, or not at all, once more databases land on the same database machine. So instead of guaranteeing the minimal amount of IO a database must get (which is what the Exadata documentation describes), how about limiting the maximal amount a database can get?
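To sketch what such a cap could look like (my illustration, not from this post: it assumes the inter-database IORM plan accepts a limit attribute next to the usual level/allocation directives, and db1 is a placeholder database name), the plan would be set with CellCLI on each storage cell:

CellCLI> ALTER IORMPLAN                                           -
         dbplan=((name=db1,   level=1, allocation=60, limit=40),  -
                 (name=other, level=1, allocation=40))

Here the limit attribute would cap db1 at 40% of the cell’s disk IO even when the disks are otherwise idle, whereas allocation only governs how IO is shared under contention.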
When testing the read and write speed of an NFS-mounted file system, it’s often unclear whether the bottleneck is the speed of the network connection or the speed of the underlying storage. It would be nice to take the underlying storage out of the equation and concentrate on the network speed alone. Here is a cool trick that does exactly that, by backing the NFS export with a ramdisk. On the NFS server (Solaris in this example), create a ramdisk:
ramdiskadm -a ramdisk1 1000m
newfs /dev/rramdisk/ramdisk1
mkdir /ramdisk
mount /dev/ramdisk/ramdisk1 /ramdisk
share -F nfs -o rw /ramdisk
chmod 777 /ramdisk
Then on the NFS client, for example Linux, mount the ramdisk:
mkdir /ramdisk
mount -t nfs -o 'rw,bg,hard,nointr,rsize=1048576,wsize=1048576,tcp,nfsvers=3,timeo=600' 192.168.1.1:/ramdisk /ramdisk
Now, on the client I can read and write to the ramdisk and avoid any disk speed issues.
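For example, write a test file first (my example command; it creates the toto file that the read test below uses):

time dd if=/dev/zero of=/ramdisk/toto bs=1024k count=800

Then read it back: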
time dd if=/ramdisk/toto of=/dev/null bs=1024k count=800

838860800 bytes (839 MB) copied, 7.74495 seconds, 108 MB/s
For read tests, be aware that in many cases reads will be served from the cache on the client side after the first run.
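On a Linux client, one way around this (my addition, not from the original post) is to drop the page cache between runs, or simply unmount and remount the NFS file system:

# flush dirty pages, then drop the Linux page cache (run as root)
sync
echo 3 > /proc/sys/vm/drop_caches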
This is my deadlocks presentation, first given at Hotsos 2006 in Dallas, TX.
The upgrade went smoothly on my MacBook Pro, but on my Fedora 15 servers I had to uninstall the old version manually before installing the new version. None of my settings were lost so everything was easy enough.
It certainly seems applying VirtualBox upgrades is becoming a full-time job. Of course, the quick release cycle is a lot better than getting no updates, like VMware Server.
As mentioned in my first post regarding the use of Bitmap Indexes to service Not Equal predicates, an index entry in a Bitmap Index consists of the indexed value and a pair of rowids that determine the range of potential rows spanned by the corresponding bitmap sequence. However, Oracle has no way of determining how many [...]
I must admit, I’m a sqlnet hack; I never really sat down to figure it out. If I have problems getting a database to register with an existing listener, I generally just add a separate listener. Well, today I was stuck on a system where I couldn’t add a second listener. Everything seemed fine: there was a default listener running on port 1521 with a database instance already registered, and the first and second databases were in the same ORACLE_HOME, but I just couldn’t get the listener to pick up the second database. I bounced and reloaded the listener and tried to register the second database.
No dice. Then I tried
alter system set local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=myhost)(PORT=1521))';
and bang, it registered.
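For completeness: after setting local_listener, registration can also be forced immediately instead of waiting for PMON’s periodic service update (a usage note I’m adding, not part of the original story):

-- force the instance to register its services with the listener right away
alter system register;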
Of course, I guess I should track down why it wouldn’t register in the first place but getting it working for now was enough.
Earlier I did a presentation at VirtaThon on the same topic that I presented at Hotsos 2011. Mining the AWR and capacity planning are very dear to my heart, and up until now I’ve been using all the research I did for that presentation in building an “Exadata Provisioning Tool”, which I’m planning to present at Hotsos 2012. Well, the only thing that’s different this time is that my attendees were virtual geeks from all over the world.
You open a ticket and wait… When you do get a reply, it tells you to send information you’ve already posted, or suggests workarounds you’ve already listed in the ticket as not having worked for you. You get frustrated and write a blog post ranting about how terrible the support service is, etc. I guess this could be a story about just about any internet support service I’ve had to use over the years.
Do you remember the old days, before the internet was popular, when you phoned support lines? Do you remember how quickly some of these annoying issues were resolved by simply saying, “I’ve already sent that!”, to a real person at the other end of the line? OK, I’ve conveniently forgotten to mention being put on hold for hours, but this is my blog and I’m allowed to have a totally biased opinion about things…
Imagine the joy of being able to rant directly at a real person again.