Search

OakieTags

Who's online

There are currently 0 users and 34 guests online.

Recent comments

Affiliations

The Oracle Wait Interface Is Useless (sometimes) – Part One: The Problem Definition

So here we go, this is part one of this experiment in blogging and co-writing. Tanel has actually written some good stuff already for this, but I wanted to try and formalise things under a common title and make it easier to follow between our sites.

I thought it would be logical to start this process by producing a more concrete problem definition, so that’s the focus of this part. It’s unlikely that we will come up with a complete method in this initial work, but hopefully the wheels will at least turn a little by the end of it!

So first of all, why would I dare to say that the Oracle Wait Interface is useless? Well, partly because I quite like titles that are a little bit catchy, and partly because it is indeed sometimes useless. The emphasis is on the word sometimes, though, because the Oracle Wait Interface is still the single most useful feature in any database product. Wow – that’s quite a claim, isn’t it? This isn’t the place to fully explain why that is, and many others have written great works on this subject already. Check out Cary Millsap’s works, notably his book, Optimizing Oracle Performance, which focuses in great detail on this subject. For the sake of this article, however, here’s why it is so useful: It tells you where the time goes. Think about it: If something is running too slowly, knowing where the time is used up is the single piece of information required to focus on the right subject for tuning.

So what’s wrong with the Oracle wait interface? Just one thing, actually – it is designed to  provide visibility of relatively slow waits. The reason for this is simply that there is a slight overhead in timing every single wait. If that overhead becomes a noticeable proportion of the actual wait itself then the measurement becomes inaccurate (and makes the problem worse). On UNIX-like platforms (yes, that includes Linux), the actual timing interface is implemented using gettimeofday(2) system calls, one before the event and one after the event. This call gives microsecond granularity of timing, at least in theory (on my Opteron 280 test machine, gettimeofday() calls take 1.5 microseconds). So, using this kind of mechanism for events that take a relatively long time makes perfect sense – disk I/O, for example, that will take at least three orders of magnitude longer to complete than the timing calls themselves. Conversely, they make no sense at all for calls that take even as little as 50 microseconds, as the 3 microsecond penalty for measuring the wait is 6% of the actual event time itself at that point. There you go, that’s the beginning of the justification that the wait interface is useless,  in a nutshell.

But hang on, isn’t 50 microseconds really, really fast? Well no, actually, it isn’t. Taking Intel’s Nehalem processor (with Quickpath) as an example, a memory latency is around the 50 ns range – three orders of magnitude faster than the 50 microsecond cut-off that I just arbitrarily invented. Memory access is also the slowest thing that a CPU can do (without factoring in peripheral cards) – in this case the CPU has to wait for about 150 cycles while that memory access takes place. So it’s very possible to have a function call that does fairly complex work and is still an order of magnitude or two faster than the gettimeofday() system call.

Time for an example. Actually, this is a variation on the example that made me start thinking about this – I had been perfectly happy with the Oracle Wait Interface until this point for 99% of cases!

OK, so a user just called you, complaining that his query is simply not ever completing. Following the usual course of action, we might take a few samples from v$session_wait (or v$session from 10g onwards) to look at the current wait state for the process, just in case it’s something obvious:

SID EVENT                          PROCESS                  STATE

---- ------------------------------ ------------------------ -------------------
 8 SQL*Net message to client      15032                    WAITED SHORT TIME
Well, that isn’t too revealing. Let’s now turn on extended sql tracing to get a list of all wait events as they transition:
SQL> oradebug setospid 15033
Oracle pid: 20, Unix process pid: 15033, image: oracle@elise03.sa.int (TNS V1-V3)
SQL> oradebug event 10046 trace name context forever, level 12
Statement processed.

After a few seconds, let’s see if there is anything in the trace file:

*** 2009-11-09 10:54:36.934
*** SESSION ID:(8.10393) 2009-11-09 10:54:36.934
*** CLIENT ID:() 2009-11-09 10:54:36.934
*** SERVICE NAME:(SYS$USERS) 2009-11-09 10:54:36.934
*** MODULE NAME:(sqlplus@elise03.sa.int (TNS V1-V3)) 2009-11-09 10:54:36.934
*** ACTION NAME:() 2009-11-09 10:54:36.934

Received ORADEBUG command (#1) 'event 10046 trace name context forever, level 12' from process 'Unix process pid: 15082, image: '

*** 2009-11-09 10:54:36.935
Finished processing ORADEBUG command (#1) 'event 10046 trace name context forever, level 12'

No, this is not truncated output – there is nothing in this file at all, apart from the actual invocation of the tracing. OK, so what’s next? How about looking at ‘top’ to see if the process if busy:

top - 10:53:15 up 2 days, 19:50,  4 users,  load average: 1.04, 0.55, 0.57
Tasks: 148 total,   2 running, 146 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.8%us,  0.4%sy,  0.0%ni, 73.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3995468k total,  3877052k used,   118416k free,   115432k buffers
Swap:  6029304k total,   170388k used,  5858916k free,  3289460k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15033 oracle    25   0 2901m 1.1g 1.1g R 99.9 28.5   3:01.79 oracle
3284 oracle    15   0 2903m  54m  46m S  1.7  1.4  60:19.33 oracle
15071 oracle    16   0 2902m  24m  21m S  1.3  0.6   0:00.04 oracle
15069 oracle    15   0 12740 1112  816 R  0.7  0.0   0:00.07 top

Crikey, our process is more or less consuming 100% of a CPU! So, we are not waiting for anything, but the user still has no results, and the process is very busy doing something. Let’s now try the next logical step – let’s truss/strace the process:

[oracle@elise03 trace]$ strace -tp 15033
Process 15033 attached - interrupt to quit
11:28:06 gettimeofday({1257766086, 104118}, NULL) = 0
11:28:06 getrusage(RUSAGE_SELF, {ru_utime={2270, 615813}, ru_stime={0, 634903}, ...}) = 0
11:28:06 gettimeofday({1257766086, 104402}, NULL) = 0
11:28:06 getrusage(RUSAGE_SELF, {ru_utime={2270, 615813}, ru_stime={0, 634903}, ...}) = 0
11:28:08 gettimeofday({1257766088, 105930}, NULL) = 0
11:28:08 getrusage(RUSAGE_SELF, {ru_utime={2272, 609510}, ru_stime={0, 634903}, ...}) = 0
11:28:08 gettimeofday({1257766088, 106186}, NULL) = 0
11:28:08 getrusage(RUSAGE_SELF, {ru_utime={2272, 609510}, ru_stime={0, 634903}, ...}) = 0
11:28:10 gettimeofday({1257766090, 110887}, NULL) = 0
11:28:10 getrusage(RUSAGE_SELF, {ru_utime={2274, 611206}, ru_stime={0, 634903}, ...}) = 0
11:28:10 gettimeofday({1257766090, 111143}, NULL) = 0
11:28:10 getrusage(RUSAGE_SELF, {ru_utime={2274, 611206}, ru_stime={0, 634903}, ...}) = 0
Not much to see here. Those system calls are emitted every two seconds, and are almost certainly just the result of Oracle updating the statistics for the database time model (v$sess_time_model), but we’re really not doing much at all in terms of system calls. Hang on a minute, the database time model – surely that will help us here?
SQL> select stat_name,value from v$sess_time_model where sid=8;
STAT_NAME                                          VALUE
--------------------------------------------- ----------
DB time                                       2802976117
DB CPU                                        2801767067
background elapsed time                                0
background cpu time                                    0
sequence load elapsed time                             0
parse time elapsed                                 74853
hard parse elapsed time                            66626
sql execute elapsed time                      2802910770
connection management call elapsed time            21308
failed parse elapsed time                              0
failed parse (out of shared memory) elapsed t          0
hard parse (sharing criteria) elapsed time         62605
hard parse (bind mismatch) elapsed time                0
PL/SQL execution elapsed time                       5377
inbound PL/SQL rpc elapsed time                        0
PL/SQL compilation elapsed time                    18303
Java execution elapsed time                            0
repeated bind elapsed time                             0
RMAN cpu time (backup/restore)                         0

19 rows selected.

SQL>

At last, we have some kind of a symptom, and confirmation that the user is indeed actually trying to do useful work rather than just spinning on the CPU. The symptom is that there is an increasing amount of time being allocated as ‘sql execute elapsed time’.  In my opinion, this is where the time model statistics fail to deliver when the opportunity is presented to it. There are just 19 statistics reported here on 11gR2, and the only help we are given from its output is that we are using a lot of DB time, a lot of DB CPU and a lot of sql execute time. So we can surmise that we are doing a very CPU intensive query, and that’s about it – no finer granularity provided and this would seem to be the logical place for such instrumentation…

OK, so we are now just confirmed – we have a nasty query that is just using CPU and never waiting for anything. Let’s not start guessing at this stage what the problem is, let’s try and find out the real answer. At this stage, we might jump to v$sql_plan_statistics_all to find out what is going on. These statistics are only updated when statistics_level is set to ALL, and even then do not update until the statement is complete. In our little example here, the query is already running – we can’t set statistics_level=all, and we can’t wait until the query completes – it might never do so! Of course, it’s probably OK to get the user to kill the query and restart with statistics_level=all, or even the ctrl-c would allow the stats to update. However, let’s assume neither of those things are possible, as this is only an example case.
So what techniques can we use to find the problem? One of them might be the new SQL Execution Monitoring in 11g (which looks very nice), if you have the required license – but that is a very specific case where the problem is a SQL execution problem. What if it is not a SQL execution problem? We need a more general method for finding the answer. That’s the subject of part two of this blog, over to Tanel for Part Two!