I just learned something new yesterday when demoing large page use on Linux during my AOT seminar.
I had 512 x 2MB hugepages configured in Linux (1024 MB). So I set USE_LARGE_PAGES = TRUE (it is actually the default anyway in 11.2.0.2+). This allows the use of large pages but doesn’t force it; the ONLY option would force the use of hugepages, otherwise the instance wouldn’t start up. Anyway, the previous behavior with hugepages was that if Oracle was not able to allocate the entire SGA from the hugepages area, it would silently allocate the entire SGA from small pages. It was all or nothing. But to my surprise, when I set my SGA_MAX_SIZE bigger than the amount of allocated hugepages in my testing, the instance started up and the hugepages got allocated too!
It’s just that the remaining part was allocated as small pages, as mentioned in the alert log entry below (and in the latest documentation too – see the link above):
Thu Oct 24 20:58:47 2013
ALTER SYSTEM SET sga_max_size='1200M' SCOPE=SPFILE;
Thu Oct 24 20:58:54 2013
Shutting down instance (abort)
License high water mark = 19
USER (ospid: 18166): terminating the instance
Instance terminated by USER, pid = 18166
Thu Oct 24 20:58:55 2013
Instance shutdown complete
Thu Oct 24 20:59:52 2013
Adjusting the default value of parameter parallel_max_servers
from 160 to 135 due to the value of parameter processes (150)
Starting ORACLE instance (normal)
****************** Large Pages Information *****************
Total Shared Global Region in Large Pages = 1024 MB (85%)

Large Pages used by this instance: 512 (1024 MB)
Large Pages unused system wide = 0 (0 KB) (alloc incr 16 MB)
Large Pages configured system wide = 512 (1024 MB)
Large Page size = 2048 KB

RECOMMENDATION:
  Total Shared Global Region size is 1202 MB. For optimal performance,
  prior to the next instance restart increase the number of
  unused Large Pages by atleast 89 2048 KB Large Pages (178 MB)
  system wide to get 100% of the Shared Global Region
  allocated with Large pages
***********************************************************
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
The ipcs -m command confirmed this (multiple separate shared memory segments had been created).
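The arithmetic in the alert log’s RECOMMENDATION can be reproduced with a few lines of Python (my own sanity-check sketch, not anything Oracle ships):

```python
import math

HUGEPAGE_MB = 2  # hugepage size on this system: 2048 KB

def hugepages_needed(sga_mb):
    """Number of 2 MB hugepages needed to back the whole SGA."""
    return math.ceil(sga_mb / HUGEPAGE_MB)

# From the alert log above: SGA of 1202 MB, 512 hugepages (1024 MB) configured.
configured = 512
shortfall = hugepages_needed(1202) - configured  # the 89 extra pages (178 MB)
```

This matches the log: 1202 MB needs 601 pages, 512 are configured, so 89 more pages (178 MB) would be needed for a fully hugepage-backed SGA.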
Note that despite what the documentation says, there’s a 4th option for the USE_LARGE_PAGES parameter, called AUTO (in 11.2.0.3+ I think), which can ask the OS to increase the number of hugepages when the instance starts up. But I would always try to pre-allocate the right number of hugepages from the start (ideally right after the OS reboot, via sysctl.conf), to reduce potential kernel CPU usage spikes caused by the search and defragmentation effort of “building” large consecutive pages.
Martin Bach has written about the AUTO option here.
We got on the plane from Copenhagen to Oslo and met up with the OUGN folks for some food in the hotel. We spent a long time talking about non-Oracle stuff, like science and religion. It was fun.
The morning started with a long breakfast, which included me nearly throwing orange juice over Mike Dietrich and him succeeding in throwing tea over me. We both blamed the table, but in reality it was our secret desire for a food fight that caused it.
I went to Sten‘s session called “From Requirements to Tool Choice”, which as the name suggests, discusses which tools are a good fit for which types of development. He has some interesting statistics in the presentation, which are a good talking point. You might want to take a look at OraToolWatch, which he maintains.
My sessions were on Analytic Functions and WebLogic. I am incapable of keeping to any sort of schedule. Mike Dietrich came to warn me about leaving for the plane. I thought he was asking me to finish my presentation 15 minutes early, which for a 45 minute session is kinda difficult, so I brushed him off and carried on, only to find out at the end that I had overshot by 15 minutes. I’m now cringing as I write this because I must have looked like such a diva. Just so you know, it wasn’t me being a diva. It was me being a dumb-ass. After the session I spoke to Mike and it seems I had told him the wrong flight times, which was why he was especially concerned, thinking I might miss my plane. People should just shoot me with a tranquillizer dart when my time is up. Sorry to all those that missed out on their coffee break.
Once again, I went straight from my last presentation to the airport to catch my plane. This time it was for the flight home… I sat chatting to Lonneke and Sten for a while before I got my first plane. When I got to Amsterdam I had a 2+ hour stop-over. After about an hour, I sat down with a coffee and I heard, “Can passenger Hall travelling to Birmingham please board immediately at gate D6!” I ditched my coffee and ran like an idiot from D49 to D6. I started apologising to the staff, saying I must have got my times mixed up etc. They checked and it was another passenger called “Hall”, travelling to Birmingham on a different flight. I walked back to gate D49 feeling rather frazzled.
Had a great time last week presenting to the Ohio Oracle User Group (OOUG). I’m generally used to presenting a single presentation, but for this day at OOUG I was the guest speaker and presented 4 back-to-back presentations.
OOUG members picked me up at the airport, whisked me to my hotel, then picked me up in the morning to take me to the conference venue, then arranged for my transport back to the airport. All in all, it was super smoothly run.
Many thanks to Mary Brown from Nationwide who is the Vice President of OOUG for setting this up and for Nathan and Paddy also from Nationwide for helping out.
Here are the 4 presentations.
Many thanks to Doug Burns for his help on the lock presentation with his impressive set of blog posts on Oracle transaction locks:
I updated 5 blogs with no hassles.
It’s worth having a read of this, which mentions the automatic updates that could happen without your intervention… Sounds kind of scary to me.
Happy upgrading, possibly for the last time…
Next week is Strata + Hadoop World, which is bound to be exciting for those who deal with big data on a daily basis. I’ll be spending my time talking about Cloudera Impala at various places, so I’m posting my schedule for those interested in catching up on fast SQL on Hadoop. Hope to see you there!
Office Hour with Greg Rahn @ the Cloudera Booth
10/29/2013 11:00am – 11:30am EDT (30 minutes)
Room: 3rd Floor, Mercury Ballroom, Booth #403
Office Hour with Greg Rahn
10/29/2013 2:35pm – 3:05pm EDT (30 minutes)
Room: Table B
Practical Performance Analysis and Tuning for Cloudera Impala
10/30/2013 2:35pm – 3:15pm EDT (40 minutes)
Room: Murray Hill Suite (capacity: 300)
Exadata is about doing IO. I think if there’s one thing people know about Exadata, that’s it. Exadata potentially brings (part of) the processing closer to the storage media, which will be rotating disks for most (Exadata) users, and optionally can be flash.
But with Exadata, you either do normal (alias regular) IO, which will probably be single block IO, or multiblock IO, which hopefully gets offloaded. The single block reads are hopefully coming from the flash cache, which can only be seen by looking in v$sysstat/v$sesstat at the statistic “cell flash cache read hits”, not by looking at the IO related views directly. To understand the composition of the response time of a smartscan, there is even less instrumentation in the database (for background, look at this blogpost, where it is shown that the smartscan wait does not detail any of the steps done in a smartscan). In other words: if you experience performance differences on Exadata, and the waits point towards IO, there’s not much analysis which can be done to dig deeper.
Luckily, the Exadata storage server provides a very helpful dump which details IO latencies of what the cell considers celldisks (which are both flash and rotating disks). The dump provides:
- IO size by number of reads and writes
- IO size versus latency for reads and writes
- IO size versus pending IO count for reads and writes
- IO size versus pending IO sizes for reads and writes
This is how this dump is executed (in the cellcli of course):
alter cell events="immediate cellsrv.cellsrv_dump('iolstats',0)";
As with the other dumps, the cellcli provides the name of the trace file where the requested dump has been written. If we look inside this trace file, this is what an IO latencies dump looks like:
IO length (bytes): Num read IOs: Num write IOs:
[     512 -    1023)     212184     104402
[    1024 -    2047)          0     138812
[    2048 -    4095)          0     166282
[    4096 -    8191)         35     134095
[    8192 -   16383)     498831     466674
[   16384 -   32767)       2006      73433
[   32768 -   65535)         91      15072
[   65536 -  131071)        303       4769
[  131072 -  262143)        297       6376
[  262144 -  524287)       1160        230
[  524288 - 1048575)       2278         36
[ 1048576 - 2097151)        459         21

Average IO-latency distribution stats for CDisk CD_02_enkcel01
Number of Reads iosize-latency distribution
IO len(B)\IO lat(us) || [       32 | [       64 | [      128 | [      256 | [      512 | [     1024 | [     2048 | [     4096 | [     8192 | [    16384 | [    32768 | [    65536 | [   131072 | [   262144 | [   524288 |
                     ||        63) |       127) |       255) |       511) |      1023) |      2047) |      4095) |      8191) |     16383) |     32767) |     65535) |    131071) |    262143) |    524287) |   1048575) |
---------------------||------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|
[      512,    1023) ||      31075 |      14592 |      69575 |      55370 |       7744 |        385 |        725 |       6489 |       7044 |      11663 |       4030 |       1770 |       1310 |        408 |          4 |
[     4096,    8191) ||          0 |          6 |          5 |          6 |          0 |          0 |          0 |          0 |          7 |          8 |          3 |          0 |          0 |          0 |          0 |
[     8192,   16383) ||         66 |        101 |       3189 |       6347 |        717 |       1826 |      23168 |     124246 |     191169 |      79157 |      37032 |      18508 |      12778 |        526 |          1 |
[    16384,   32767) ||         22 |         46 |         22 |       1403 |         90 |         46 |         57 |         65 |         77 |        124 |         39 |          5 |          7 |          3 |          0 |
...
What struck me as odd is that the name of the celldisk (CD_02_enkcel01 here) is printed below the first table (IO lengths) that describes this celldisk(!)
In my previous post we saw a command to reset statistics (a cell events command). There is a command to reset the statistics for this specific dump (‘iolstats’) too (to be executed in the cellcli of course):
alter cell events = "immediate cellsrv.cellsrv_resetstats(iolstats)";
Next, I executed a smartscan, and dumped the IO latency statistics again:
IO length (bytes): Num read IOs: Num write IOs:
[    4096 -    8191)          0         24
[  524288 - 1048575)          8          0
[ 1048576 - 2097151)        208          0

Average IO-latency distribution stats for CDisk CD_02_enkcel01
Number of Reads iosize-latency distribution
IO len(B)\IO lat(us) || [     4096 | [     8192 | [    16384 | [    32768 | [    65536 | [   131072 | [   262144 |
                     ||      8191) |     16383) |     32767) |     65535) |    131071) |    262143) |    524287) |
---------------------||------------|------------|------------|------------|------------|------------|------------|
[   524288,  1048575) ||          0 |          0 |          3 |          1 |          0 |          2 |          2 |
[  1048576,  2097151) ||          1 |          3 |         15 |         22 |         89 |         59 |         19 |
As can be seen, the statistics have been reset (on the local cell/storage server!). This makes diagnosing the physical IO subsystem of Exadata possible!
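Since the dump is plain text, it lends itself to scripted post-processing. Here is a small sketch (my own helper, the name and line layout are assumptions based on the dump shown above) that extracts the first histogram, the IO length table, from the trace lines:

```python
import re

def parse_io_length_histogram(trace_lines):
    """Extract (low, high, reads, writes) rows from the 'IO length'
    histogram at the top of an iolstats dump.

    Only matches lines of the form '[ low - high) reads writes';
    the latency table uses a comma between the bounds, so it is skipped.
    """
    pat = re.compile(r'\[\s*(\d+)\s*-\s*(\d+)\)\s+(\d+)\s+(\d+)')
    return [tuple(int(g) for g in m.groups())
            for m in (pat.search(line) for line in trace_lines) if m]

# A few lines from the post-smartscan dump above:
sample = [
    "IO length (bytes): Num read IOs: Num write IOs:",
    "[    4096 -    8191)          0         24",
    "[  524288 - 1048575)          8          0",
    "[ 1048576 - 2097151)        208          0",
]
rows = parse_io_length_histogram(sample)
total_reads = sum(r[2] for r in rows)
```

Summing the read column this way quickly shows how the smartscan skews the IO size distribution towards large (1 MB) reads.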
I spent the first two sessions of the DOUG event watching Mike Dietrich presenting on 12c upgrades, pluggable databases and new features. I’ve seen some of his stuff already during the LAOTN Tour (Southern Leg), but his presentations have changed a little since I last saw them. To quote Noons,
“Mike’s talk is superb. No bull, just down to facts…”
I think that sums it up nicely.
After that I did my two sessions for this event. One on virtualisation and one on WebLogic. It was quite a strange day for me as I did two talks that had no demos and both were on areas that are not my main skill set. A mixture of scary and fun rolled into one. I literally finished my talk, put my laptop in my bag and got in a taxi for the airport, so I didn’t have any time to mingle after the sessions. I hope they went down OK. With a bit of luck I will get to do another conference in Denmark and spend more time talking to the attendees outside the sessions.
Note. The presentation rooms had bowls of sweets. Connor McDonald will know exactly what happened.
When you are administering one or more Exadatas, you probably have multiple databases running on different database or “compute” nodes. In order to understand what kind of IO you are doing, you can look at the statistics of your database, and see in the data dictionary what that instance (or instances, in the case of RAC) has been doing. When using Exadata there is a near 100% chance you are using either normal redundancy or high redundancy. Most people know about the “write amplification” of ASM redundancy, but the write statistics in the Oracle data dictionary do not reflect the additional writes needed to satisfy normal (#IO times 2) or high (#IO times 3) redundancy. This means there might be a difference between the IOs you measure, or think your database is doing, and what is actually done at the storage level.
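As a back-of-the-envelope sketch (my own illustration, not an Oracle API), the redundancy arithmetic for writes looks like this:

```python
def cell_write_ios(db_write_ios, redundancy='normal'):
    """ASM write amplification: every database write goes to 2 mirror
    copies with normal redundancy, or 3 with high redundancy, so the
    storage layer sees a multiple of the writes the data dictionary
    reports. Reads are served from a single mirror and are not amplified.
    """
    copies = {'normal': 2, 'high': 3}
    return db_write_ios * copies[redundancy]
```

So 1000 writes reported by the database mean 2000 writes at the cells with normal redundancy, and 3000 with high redundancy.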
But what if you want to know what is happening at the storage level, so at the level of the cell, or actually “cellsrv”, which is the process that makes IO flow to your databases? One option is to run “iostat -x”, but that gives a list that is quite hard to read (too many disk devices), and it doesn’t show you what the reason for the IO was: a redo write? a controlfile read? archivelog? This would be especially useful if you want to understand what is happening when your IO behaves differently than you expect, and you’ve ruled out IORM.
Well, it is possible to get an IO overview (cumulative since startup)! Every storage server keeps a table of IO reasons. This table can be dumped into a trace file on the cell. To generate a dump with an overview of what kinds of IO are done, use “cellcli” locally on a cell and enter the following command:
alter cell events="immediate cellsrv.cellsrv_dump('ioreasons',0)";
This doesn’t generate anything useful as output on the command line, except for the name of the thread-logfile where we can find the contents of the dump we requested:
Dump sequence #18 has been written to /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell/enkcel01/trace/svtrc_15737_14.trc
Cell enkcel01 successfully altered
This is an example from a cell in the Enkitec lab, which I used for this example:
Cache::dumpReasons           I/O Reason Table
2013-10-23 08:11:06.869047*: Dump sequence #18:
Cache::dumpReasons Reason                        Reads    Writes
Cache::dumpReasons ------------------------------------
Cache::dumpReasons UNKNOWN                      436784    162942
Cache::dumpReasons RedoLog Write                     0     80329
Cache::dumpReasons RedoLog Read                    873         0
Cache::dumpReasons ControlFile Read             399993         0
Cache::dumpReasons ControlFile Write                 0    473234
Cache::dumpReasons ASM DiskHeader IO              4326         4
Cache::dumpReasons BufferCache Read              27184         0
Cache::dumpReasons DataHeader Read                2627         0
Cache::dumpReasons DataHeader Write                  0      1280
Cache::dumpReasons Datafile SeqRead                 45         0
Cache::dumpReasons Datafile SeqWrite                 0       373
Cache::dumpReasons HighPriority Checkpoint Write     0      6146
Cache::dumpReasons DBWR Aged Write                   0       560
Cache::dumpReasons ReuseBlock Write                  0       150
Cache::dumpReasons Selftune Checkpoint Write         0    116800
Cache::dumpReasons RequestLit Write                  0        25
Cache::dumpReasons Archivelog IO                     0       255
Cache::dumpReasons TrackingFile IO                2586      2698
Cache::dumpReasons ASM Relocate IO                   0       200
Cache::dumpReasons ASM Replacement IO                0        91
Cache::dumpReasons ASM CacheCleanup IO               0      4514
Cache::dumpReasons ASM UserFile Relocate             0      2461
Cache::dumpReasons ASM Redo IO                       0     10610
Cache::dumpReasons ASM Cache IO                   1953         0
Cache::dumpReasons ASM PST IO                        0        44
Cache::dumpReasons ASM Heartbeat IO                 26    162984
Cache::dumpReasons ASM BlockFormat IO                0      3704
Cache::dumpReasons ASM StaleFile IO                  0       675
Cache::dumpReasons OSD Header IO                     0       315
Cache::dumpReasons Smart scan                    11840         0
Please mind the numbers here are IO counts; they say nothing about the sizes of the IOs. Also please mind these are the numbers of a single cell; you probably have 3, 7 or 14 cells.
In my opinion this IO summary can be of much value during IO performance investigations, but also during proofs of concept.
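Because the dump format is regular, it is easy to post-process for such investigations. A small sketch (my own helper; the function name is made up, the line layout is taken from the dump above) that turns the reason table into a dictionary and finds the busiest reason:

```python
def parse_io_reasons(trace_lines):
    """Turn 'Cache::dumpReasons <reason> <reads> <writes>' lines into
    a {reason: (reads, writes)} dictionary. Header and separator lines
    are skipped because they do not end in two integers.
    """
    rows = {}
    for line in trace_lines:
        parts = line.split()
        if (len(parts) >= 4 and parts[0] == 'Cache::dumpReasons'
                and parts[-1].isdigit() and parts[-2].isdigit()):
            rows[' '.join(parts[1:-2])] = (int(parts[-2]), int(parts[-1]))
    return rows

# A few lines from the dump above:
sample = [
    "Cache::dumpReasons Reason                        Reads    Writes",
    "Cache::dumpReasons RedoLog Write                     0     80329",
    "Cache::dumpReasons ControlFile Read             399993         0",
    "Cache::dumpReasons Smart scan                    11840         0",
]
reasons = parse_io_reasons(sample)
busiest = max(reasons, key=lambda r: sum(reasons[r]))
```

Sorting the dictionary by reads+writes gives an instant overview of what the cell has mostly been doing since the last reset.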
If the cell has been running for a while, these numbers may grow very big. In order to make an easy baseline, the IO reason numbers can be reset, so you can start off your test or proof-of-concept run and measure what actually has happened on the cell layer! In order to reset the IO reason table, enter the following command in the cellcli:
alter cell events = "immediate cellsrv.cellsrv_resetstats(ioreasons)";
This will reset the IO reasons table in the cell.
PS1: Thanks to Nikolay Kovachev for pointing out the ‘ioreasons’ resetstats parameter. Indeed ‘all’ is way too blunt.
PS2: The IO numbers seem to be the number of IO requests the cell has gotten from its clients (ASM and the database) for data, not for metadata. During a smartscan, metadata flows between the database and the cell server before data is actually served.
I didn’t sleep too well the night before the Stockholm event, so I woke up feeling extremely groggy. I think it was just the combination of excitement and adrenalin you get before starting a tour. I met Lonneke and Sten for breakfast, then headed on to the conference venue.
I watched Lonneke presenting on SOA for the first two sessions of the day. This is completely not my area of expertise, but I learnt a lot in these sessions. I now understand a lot of the buzzwords and a lot of the common pitfalls for the first time ever. I’ll never be a SOA guy, but it’s nice to know a little more, so that I can understand when people are leading me astray. You don’t have to know how to swim to recognise when someone is drowning.
After those two sessions, I presented three sessions in a row, including my first ever WebLogic presentation. Eeeccckkk! I made it very clear I was not an expert! The approach was, this is what I wished I had known in my first hour of learning WebLogic! I think it went well. I got some helpful feedback from Lonneke, which I’ve added to the presentation.
After my last presentation we went straight from the conference to the airport. There was a problem with the boarding scanners, so we had to be processed manually, which meant we were about an hour late in departing. That meant we were too late to have dinner with the Danish OUG folks, which was a pity. So it was straight from the airport to bed.
Exadata gets its performance by letting the storage (the Exadata storage server) participate in query processing, which means part of the processing is done as close as possible to where the data is stored. The participation of the storage servers in query processing means that a storage grid can process a smart scan request massively parallel (depending on the number of storage servers participating).
However, this also means additional CPU is used on the storage layer. Because there is no real limit on how many queries can use smartscans (and/or hybrid columnar compression; in other words: processing) on the available storage servers, a storage server can get overloaded, which could hurt performance. To overcome this problem, Oracle introduced the ‘passthrough’ functionality in the storage server. In the Exadata book, it is explained that this functionality came with storage server version 11.2.2.3.0 and Oracle database version 11.2.0.2 with Exadata bundle patch 7. It also explains that the ‘passthrough’ functionality means that the storage server deliberately starts sending non-storage-processed data during the smartscan. So when this happens, you still do a smartscan (!), but your foreground process or parallel query slave gets much more data and needs to do more processing. The database-side statistic that tells you this is happening is “cell physical IO bytes sent directly to DB node to balance CPU usage”, which is at the database level in v$sysstat and at the session level in v$sesstat.
But how does this work on the storage server?
On the storage server, the passthrough mode properties are governed by a few “underbar” or “undocumented” parameters in the storage server. In order to get the (current) values of the Exadata storage server, the following command can be used on the “cellcli”:
alter cell events="immediate cellsrv.cellsrv_dump('cellparams',0)";
The cell will echo the thread-logfile in which the output of this dump is put:
Dump sequence #1 has been written to /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell/enkcel01/trace/svtrc_15737_87.trc
Cell enkcel01 successfully altered
Now load this tracefile (readonly) in your favourite text manipulation tool (I use ‘less’ because less is more).
The “underbar” parameters which are of interest are the following parameters:
_cell_mpp_cpu_freq = 2
_cell_mpp_threshold = 90
_cell_mpp_max_pushback = 50
The “MPP” function is responsible for the passthrough functionality. I can’t find anywhere what “MPP” means, my guess is “Measurement (of) Performance (for) Passthrough”. These parameters govern how it works.
_cell_mpp_cpu_freq seems to be the frequency at which the MPP code measures the host CPU: “2” means a measurement every 200ms.
_cell_mpp_threshold seems to be the CPU usage threshold after which the passthrough functionality kicks in.
_cell_mpp_max_pushback seems to be the maximum percentage of blocks (unsure what the exact granularity is) which are sent to the database in passthrough mode.
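Putting the three parameters together, the dump output further on suggests the pushback rate moves in 5% steps per 200ms sample, capped at _cell_mpp_max_pushback. The following toy model is entirely my speculation about the mechanism, reverse-engineered from the dump output; it is not Oracle code:

```python
def pushback_rate_series(cpu_samples, threshold=90, max_pushback=50, step=5):
    """Speculative model of the MPP pushback controller.

    For every 200ms CPU sample: raise the pushback rate by `step` while
    CPU utilization exceeds `threshold` (capped at `max_pushback`),
    and lower it by `step` otherwise (floored at 0).
    """
    rate, rates = 0, []
    for cpu in cpu_samples:
        if cpu > threshold:
            rate = min(rate + step, max_pushback)
        else:
            rate = max(rate - step, 0)
        rates.append(rate)
    return rates
```

Feeding it a run of samples above 90% reproduces the staircase up to 50% seen in the trace further on, and the rate decays back to 0 once the CPU pressure is gone.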
In order to get a good understanding of what MPP does, there is a MPP specific dump which could be very beneficial to diagnose MPP related matters. This dump is available on the storage server, which means in the cellcli:
alter cell events="immediate cellsrv.cellsrv_dump('mpp_stats',0)";
The cell will once again echo the thread-logfile in which the output of this dump is put:
Dump sequence #8 has been written to /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell/enkcel01/trace/svtrc_15737_22.trc
Cell enkcel01 successfully altered
Now peek in the tracefile!
Trace file /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell/enkcel01/trace/svtrc_15737_22.trc
ORACLE_HOME = /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109
System name:    Linux
Node name:      enkcel01.enkitec.com
Release:        2.6.32-400.11.1.el5uek
Version:        #1 SMP Thu Nov 22 03:29:09 PST 2012
Machine:        x86_64
CELL SW Version: OSS_11.2.3.2.1_LINUX.X64_130109

*** 2013-10-21 10:54:05.994
UserThread: LWPID: 16114 userId: 22 kernelId: 22 pthreadID: 0x4d1e5940
2013-10-21 14:36:04.910675*: [MPP] Number of blocks executed in passthru mode because of high CPU utilization: 0 out of 4232 total blocks.  Percent = 0.000000%
2013-10-21 14:36:04.910675*: Dump sequence #3:
[MPP] Current cell cpu utilization: 7
[MPP] Mon Oct 21 14:36:04 2013 [Cell CPU History] 7 [Pushback Rate] 0
[MPP] Mon Oct 21 14:36:04 2013 [Cell CPU History] 8 [Pushback Rate] 0
[MPP] Mon Oct 21 14:36:04 2013 [Cell CPU History] 7 [Pushback Rate] 0
...
[MPP] Mon Oct 21 14:05:57 2013 [Cell CPU History] 1 [Pushback Rate] 0
[MPP] Mon Oct 21 14:05:56 2013 [Cell CPU History] 1 [Pushback Rate] 0
[MPP] Mon Oct 21 14:05:56 2013 [Cell CPU History] 1 [Pushback Rate] 0
So, what do we see here? We see a cell tracefile which is in the well-known Oracle trace format. This means it starts off with a header which is specific to the cell server.
Then we see a timestamp with three asterisks in front of it. The time in that timestamp (10:54:05.994) is roughly 3 hours and 30 minutes earlier than the timestamp of the next messages, which is 14:36:04.910. The line with the three asterisks carries the creation timestamp of the tracefile, which is when the thread we are using was created. The next line, which lists the LWPID, userId, etc., was also created just after that time.
The line with [MPP] is created because of the mpp_stats dump. The timestamp has a single asterisk after it, which means the time is an approximation. The line carries important information: during the approximately 30 minutes covered by this dump, 4232 blocks were processed by this cell, and 0 blocks were “executed” in “passthru” mode.
Next, the measurements which were taken every 200ms in the past are printed, to give an exact overview of the measured CPU busyness and the rate at which “pushback”, alias “passthru”, was applied.
To see what this means, let’s generate CPU busyness on the cell, and see if we can get the storage server to invoke “passthru”. There is a simple trick to make a fake process take 100% of its CPU thread with common Linux shell tools: ‘yes > /dev/null &’. The storage server I use has 16 CPU threads, so I start 16 of these processes, to effectively monopolise every CPU thread.
Next, I started a (sufficiently large; 7GB) scan on a table via sqlplus, and then dumped ‘mpp_stats’ using the method described in this blog.
2013-10-21 16:30:27.977624*: [MPP] Number of blocks executed in passthru mode because of high CPU utilization: 2728 out of 13287 total blocks.  Percent = 20.531347%
2013-10-21 16:30:27.977624*: Dump sequence #10:
[MPP] Current cell cpu utilization: 8
[MPP] Mon Oct 21 16:30:27 2013 [Cell CPU History] 8 [Pushback Rate] 0
...
[MPP] Mon Oct 21 16:30:13 2013 [Cell CPU History] 96 [Pushback Rate] 5
[MPP] Mon Oct 21 16:30:13 2013 [Cell CPU History] 96 [Pushback Rate] 10
[MPP] Mon Oct 21 16:30:13 2013 [Cell CPU History] 95 [Pushback Rate] 15
[MPP] Mon Oct 21 16:30:12 2013 [Cell CPU History] 95 [Pushback Rate] 20
[MPP] Mon Oct 21 16:30:12 2013 [Cell CPU History] 95 [Pushback Rate] 25
[MPP] Mon Oct 21 16:30:12 2013 [Cell CPU History] 96 [Pushback Rate] 30
[MPP] Mon Oct 21 16:30:12 2013 [Cell CPU History] 96 [Pushback Rate] 35
[MPP] Mon Oct 21 16:30:12 2013 [Cell CPU History] 96 [Pushback Rate] 40
[MPP] Mon Oct 21 16:30:11 2013 [Cell CPU History] 95 [Pushback Rate] 45
[MPP] Mon Oct 21 16:30:11 2013 [Cell CPU History] 98 [Pushback Rate] 50
[MPP] Mon Oct 21 16:30:11 2013 [Cell CPU History] 97 [Pushback Rate] 50
[MPP] Mon Oct 21 16:30:11 2013 [Cell CPU History] 98 [Pushback Rate] 50
[MPP] Mon Oct 21 16:30:11 2013 [Cell CPU History] 100 [Pushback Rate] 50
[MPP] Mon Oct 21 16:30:10 2013 [Cell CPU History] 99 [Pushback Rate] 50
[MPP] Mon Oct 21 16:30:10 2013 [Cell CPU History] 97 [Pushback Rate] 50
[MPP] Mon Oct 21 16:30:10 2013 [Cell CPU History] 100 [Pushback Rate] 50
[MPP] Mon Oct 21 16:30:10 2013 [Cell CPU History] 98 [Pushback Rate] 45
[MPP] Mon Oct 21 16:30:10 2013 [Cell CPU History] 97 [Pushback Rate] 40
[MPP] Mon Oct 21 16:30:09 2013 [Cell CPU History] 98 [Pushback Rate] 35
[MPP] Mon Oct 21 16:30:09 2013 [Cell CPU History] 100 [Pushback Rate] 30
[MPP] Mon Oct 21 16:30:09 2013 [Cell CPU History] 100 [Pushback Rate] 25
[MPP] Mon Oct 21 16:30:09 2013 [Cell CPU History] 100 [Pushback Rate] 20
[MPP] Mon Oct 21 16:30:09 2013 [Cell CPU History] 99 [Pushback Rate] 15
[MPP] Mon Oct 21 16:30:08 2013 [Cell CPU History] 100 [Pushback Rate] 10
[MPP] Mon Oct 21 16:30:08 2013 [Cell CPU History] 99 [Pushback Rate] 5
[MPP] Mon Oct 21 16:30:08 2013 [Cell CPU History] 99 [Pushback Rate] 0
...
This shows it all! The header shows that during the last 30 minutes this storage server sent 2728 blocks, out of a total of 13287 blocks, in passthrough mode. Further, in the lines with the historical measurements, the “pushback rate” can be seen climbing up to 50%, because the CPU usage was above 90%.
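The percentage in the dump header is simply the ratio of passthru blocks to total blocks, which is easy to verify:

```python
# Figures from the dump header above.
blocks_passthru, blocks_total = 2728, 13287
pct = 100 * blocks_passthru / blocks_total  # matches "Percent = 20.531347%"
```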
Please mind the techniques I’ve described here are done on one storage server, while a normal Exadata setup has 3 (8th/quarter rack), 7 (half rack) or 14 (full rack) storage servers.