So here we go, this is part one of this experiment in blogging and co-writing. Tanel has actually written some good stuff already for this, but I wanted to try and formalise things under a common title and make it easier to follow between our sites.
I thought it would be logical to start this process by producing a more concrete problem definition, so that’s the focus of this part. It’s unlikely that we will come up with a complete method in this initial work, but hopefully the wheels will at least turn a little by the end of it!
So first of all, why would I dare to say that the Oracle Wait Interface is useless? Well, partly because I quite like titles that are a little bit catchy, and partly because it is indeed sometimes useless. The emphasis is on the word sometimes, though, because the Oracle Wait Interface is still the single most useful feature in any database product. Wow – that’s quite a claim, isn’t it? This isn’t the place to fully explain why that is, and many others have written great works on this subject already. Check out Cary Millsap’s works, notably his book, Optimizing Oracle Performance, which focuses in great detail on this subject. For the sake of this article, however, here’s why it is so useful: It tells you where the time goes. Think about it: If something is running too slowly, knowing where the time is used up is the single piece of information required to focus on the right subject for tuning.
So what’s wrong with the Oracle wait interface? Just one thing, actually – it is designed to provide visibility of relatively slow waits. The reason for this is simply that there is a slight overhead in timing every single wait. If that overhead becomes a noticeable proportion of the actual wait itself then the measurement becomes inaccurate (and makes the problem worse). On UNIX-like platforms (yes, that includes Linux), the actual timing interface is implemented using gettimeofday(2) system calls, one before the event and one after the event. This call gives microsecond granularity of timing, at least in theory (on my Opteron 280 test machine, gettimeofday() calls take 1.5 microseconds). So, using this kind of mechanism for events that take a relatively long time makes perfect sense – disk I/O, for example, that will take at least three orders of magnitude longer to complete than the timing calls themselves. Conversely, they make no sense at all for calls that take even as little as 50 microseconds, as the 3 microsecond penalty for measuring the wait is 6% of the actual event time itself at that point. There you go, that’s the beginning of the justification that the wait interface is useless, in a nutshell.
But hang on, isn’t 50 microseconds really, really fast? Well no, actually, it isn’t. Taking Intel’s Nehalem processor (with Quickpath) as an example, a memory latency is around the 50 ns range – three orders of magnitude faster than the 50 microsecond cut-off that I just arbitrarily invented. Memory access is also the slowest thing that a CPU can do (without factoring in peripheral cards) – in this case the CPU has to wait for about 150 cycles while that memory access takes place. So it’s very possible to have a function call that does fairly complex work and is still an order of magnitude or two faster than the gettimeofday() system call.
Time for an example. Actually, this is a variation on the example that made me start thinking about this – I had been perfectly happy with the Oracle Wait Interface until this point for 99% of cases!
OK, so a user just called you, complaining that his query is simply not ever completing. Following the usual course of action, we might take a few samples from v$session_wait (or v$session from 10g onwards) to look at the current wait state for the process, just in case it’s something obvious:
SID EVENT PROCESS STATE ---- ------------------------------ ------------------------ ------------------- 8 SQL*Net message to client 15032 WAITED SHORT TIME
SQL> oradebug setospid 15033 Oracle pid: 20, Unix process pid: 15033, image: oracle@elise03.sa.int (TNS V1-V3) SQL> oradebug event 10046 trace name context forever, level 12 Statement processed.
After a few seconds, let’s see if there is anything in the trace file:
*** 2009-11-09 10:54:36.934 *** SESSION ID:(8.10393) 2009-11-09 10:54:36.934 *** CLIENT ID:() 2009-11-09 10:54:36.934 *** SERVICE NAME:(SYS$USERS) 2009-11-09 10:54:36.934 *** MODULE NAME:(sqlplus@elise03.sa.int (TNS V1-V3)) 2009-11-09 10:54:36.934 *** ACTION NAME:() 2009-11-09 10:54:36.934 Received ORADEBUG command (#1) 'event 10046 trace name context forever, level 12' from process 'Unix process pid: 15082, image:' *** 2009-11-09 10:54:36.935 Finished processing ORADEBUG command (#1) 'event 10046 trace name context forever, level 12'
No, this is not truncated output – there is nothing in this file at all, apart from the actual invocation of the tracing. OK, so what’s next? How about looking at ‘top’ to see if the process if busy:
top - 10:53:15 up 2 days, 19:50, 4 users, load average: 1.04, 0.55, 0.57 Tasks: 148 total, 2 running, 146 sleeping, 0 stopped, 0 zombie Cpu(s): 25.8%us, 0.4%sy, 0.0%ni, 73.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 3995468k total, 3877052k used, 118416k free, 115432k buffers Swap: 6029304k total, 170388k used, 5858916k free, 3289460k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 15033 oracle 25 0 2901m 1.1g 1.1g R 99.9 28.5 3:01.79 oracle 3284 oracle 15 0 2903m 54m 46m S 1.7 1.4 60:19.33 oracle 15071 oracle 16 0 2902m 24m 21m S 1.3 0.6 0:00.04 oracle 15069 oracle 15 0 12740 1112 816 R 0.7 0.0 0:00.07 top
Crikey, our process is more or less consuming 100% of a CPU! So, we are not waiting for anything, but the user still has no results, and the process is very busy doing something. Let’s now try the next logical step – let’s truss/strace the process:
[oracle@elise03 trace]$ strace -tp 15033
Process 15033 attached - interrupt to quit
11:28:06 gettimeofday({1257766086, 104118}, NULL) = 0
11:28:06 getrusage(RUSAGE_SELF, {ru_utime={2270, 615813}, ru_stime={0, 634903}, ...}) = 0
11:28:06 gettimeofday({1257766086, 104402}, NULL) = 0
11:28:06 getrusage(RUSAGE_SELF, {ru_utime={2270, 615813}, ru_stime={0, 634903}, ...}) = 0
11:28:08 gettimeofday({1257766088, 105930}, NULL) = 0
11:28:08 getrusage(RUSAGE_SELF, {ru_utime={2272, 609510}, ru_stime={0, 634903}, ...}) = 0
11:28:08 gettimeofday({1257766088, 106186}, NULL) = 0
11:28:08 getrusage(RUSAGE_SELF, {ru_utime={2272, 609510}, ru_stime={0, 634903}, ...}) = 0
11:28:10 gettimeofday({1257766090, 110887}, NULL) = 0
11:28:10 getrusage(RUSAGE_SELF, {ru_utime={2274, 611206}, ru_stime={0, 634903}, ...}) = 0
11:28:10 gettimeofday({1257766090, 111143}, NULL) = 0
11:28:10 getrusage(RUSAGE_SELF, {ru_utime={2274, 611206}, ru_stime={0, 634903}, ...}) = 0SQL> select stat_name,value from v$sess_time_model where sid=8; STAT_NAME VALUE --------------------------------------------- ---------- DB time 2802976117 DB CPU 2801767067 background elapsed time 0 background cpu time 0 sequence load elapsed time 0 parse time elapsed 74853 hard parse elapsed time 66626 sql execute elapsed time 2802910770 connection management call elapsed time 21308 failed parse elapsed time 0 failed parse (out of shared memory) elapsed t 0 hard parse (sharing criteria) elapsed time 62605 hard parse (bind mismatch) elapsed time 0 PL/SQL execution elapsed time 5377 inbound PL/SQL rpc elapsed time 0 PL/SQL compilation elapsed time 18303 Java execution elapsed time 0 repeated bind elapsed time 0 RMAN cpu time (backup/restore) 0 19 rows selected. SQL>
At last, we have some kind of a symptom, and confirmation that the user is indeed actually trying to do useful work rather than just spinning on the CPU. The symptom is that there is an increasing amount of time being allocated as ‘sql execute elapsed time’. In my opinion, this is where the time model statistics fail to deliver when the opportunity is presented to it. There are just 19 statistics reported here on 11gR2, and the only help we are given from its output is that we are using a lot of DB time, a lot of DB CPU and a lot of sql execute time. So we can surmise that we are doing a very CPU intensive query, and that’s about it – no finer granularity provided and this would seem to be the logical place for such instrumentation…
OK, so we are now just confirmed – we have a nasty query that is just using CPU and never waiting for anything. Let’s not start guessing at this stage what the problem is, let’s try and find out the real answer. At this stage, we might jump to v$sql_plan_statistics_all to find out what is going on. These statistics are only updated when statistics_level is set to ALL, and even then do not update until the statement is complete. In our little example here, the query is already running – we can’t set statistics_level=all, and we can’t wait until the query completes – it might never do so! Of course, it’s probably OK to get the user to kill the query and restart with statistics_level=all, or even the ctrl-c would allow the stats to update. However, let’s assume neither of those things are possible, as this is only an example case.
So what techniques can we use to find the problem? One of them might be the new SQL Execution Monitoring in 11g (which looks very nice), if you have the required license – but that is a very specific case where the problem is a SQL execution problem. What if it is not a SQL execution problem? We need a more general method for finding the answer. That’s the subject of part two of this blog, over to Tanel for Part Two!
Full name
James Morle
My company
http://www.scaleabilities.co.uk
My blog
http://jamesmorle.wordpress.com
Recent comments
16 weeks 5 days ago
26 weeks 4 days ago
28 weeks 2 days ago
31 weeks 3 days ago
33 weeks 5 days ago
43 weeks 2 days ago
44 weeks 5 days ago
45 weeks 5 days ago
46 weeks 44 min ago
48 weeks 5 days ago