Ok, I think it’s time to write another blog entry. I’ve been traveling and dealing with jetlag from 10-hour time difference, then traveling some more, spoken at conferences, drank beer, had fun, then traveled some more, trained customers, hacked some Exadatas and now I’m back home.
Anyway, do you know what is the SQL_EXEC_ID in V$SESSION and ASH views?
Oh yeah, it’s the “SQL Execution ID” just like the documentation says … all clear. Um … is it? I’d like to know more about it – what does it actually stand for?! Is it session level, instance level or a RAC-global counter? And why does it start from 16 million, not 1?
There are many reasons why a parallel execution might not run with the expected degree of parallelism (DOP), beginning with running out of parallel slaves (PARALLEL_MAX_SERVERS or PROCESSES reached), PARALLEL_ADAPTIVE_MULTI_USER, downgrades at execution time via the Resource Manager, or the more recent features like PARALLEL_DEGREE_LIMIT or the Auto DOP introduced in Oracle 11.2.
Time for another of those little surprises that catch you out after the upgrade.
Take a look at this “Top N” from a standard AWR report, from an instance running 184.108.40.206
Top 5 Timed Foreground Events ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Avg wait % DB Event Waits Time(s) (ms) time Wait Class ------------------------------ ------------ ----------- ------ ------ ---------- direct path read 3,464,056 6,593 2 33.5 User I/O DB CPU 3,503 17.8 db flash cache single block ph 2,293,604 3,008 1 15.3 User I/O db file sequential read 200,779 2,294 11 11.6 User I/O enq: TC - contention 82 1,571 19158 8.0 Other
Have you ever heard the suggestion that if you see time lost on event write complete waits you need to get some faster discs.
So what’s the next move when you’ve got 96GB of flash cache plugged into your server (check the parameters below) and see time lost on event write complete waits: flash cache ?
db_flash_cache_file /flash/oracle/flash.dat db_flash_cache_size 96636764160
Here’s an extract from a standard 220.127.116.11 AWR report:
Here’s one I keep forgetting – and spending 15 minutes trying to think of the answer before getting to the “deja vu” point again. I’ve finally decided that I’ve got to write the answer down because that will save me about 14 minutes the next time I forget.
Q. In a Statspack or AWR report there is a section titles “Segments by Row Lock Waits”. Why could an index be subject to a Row Lock Wait ?
A. Try inserting into a table from two different sessions (without committing) two rows with the same primary key. The second insert will wait on event enq: TX – row lock contention, and show up in v$lock with a lock request for a TX lock in mode 4. When you issue a commit or rollback on the first session, and the second statement errors or completes (depending on whether you commit or rollback the first session) it will increase the value for row lock waits in v$segstat (and v$segment_statistics) for the index by 1.
There are variations on the theme, of course, but the key feature is uniqueness with one session waiting for another session to commit or rollback on a conflicting value. This includes cases of foreign key constraint checking such as inserting a child for a parent that has been deleted but not committed (and there’s an interesting anomaly with that scenario which – in 10g, at least – reports more row lock waits on the parent PK than you might expect.)
Here’s a deadlock graph the appeared on Oracle-L and OTN a couple of days ago.
Deadlock graph: ---------Blocker(s)-------- ---------Waiter(s)--------- Resource Name process session holds waits process session holds waits TX-001a0002-0002a0fe 196 197 X 166 1835 S TM-0000c800-00000000 166 1835 SX 196 197 SX SSX
It’s a little unusual because instead of the common TX mode 6 (eXclusive) crossover we have one TX and one TM lock, the TX wait is for mode 4 (S) and the TM wait is for a conversion from 3 (SX) to 5 (SSX).
The modes and types give us some clues about what’s going on: TX/4 is typically about indexes involved in referential integrity (though there are a couple of more exotic reasons such as wait for ITLs, Freelists or tablespace status change); conversion of a TM lock from mode 3 to mode 5 is only possible (as far as I know) in the context of missing foreign key indexes when you delete a parent row.
Here’s a simple data set to help demonstrate the type of thing that could have caused this deadlock:
drop table child; drop table parent; create table parent ( id number(4), name varchar2(10), constraint par_pk primary key (id) ) ; create table child( id_p number(4), id number(4), name varchar2(10), constraint chi_pk primary key (id, id_p), constraint chi_fk_par foreign key(id_p) references parent on delete cascade ) ; insert into parent values (1,'Smith'); insert into parent values (2,'Jones'); insert into child values(1, 1, 'Simon'); insert into child values(2, 1, 'Janet'); commit;
Note that I have define the primary key on the child the “wrong way round”, so that the foreign key doesn’t have a supporting index. Note also that the foreign key constraint is defined as ‘on delete cascade’ – this isn’t a necessity, but it means I won’t have to delete child rows explicitly in my demo.
Now we take the following steps:
Session 1: delete from parent where id = 1;
This will delete the child row – temporarily taking a mode 4 (S) lock on the child table – then delete the parent row. Both tables will end up locked in mode 3.
Session 2: insert into child values (1,2,'Sally');
This will lock the parent table in mode 2, lock the child table in mode 3, then wait with a TX mode 4 for session 1 to commit or rollback. If session 1 commits it will raise Oracle error: “ORA-02291: integrity constraint (TEST_USER.CHI_FK_PAR) violated – parent key not found”; if session 1 rolls back the insert will succeed.
Session 1: delete from parent where id = 2;
This will attempt to lock the child table in mode 4, find that there it already has the child locked in mode three (which is incompatible with mode 4) and therefore attempt to convert to mode 5 (SSX). This will make it queue, waiting for session 2 to commit.
Three seconds later session 2 (the first to start waiting) will timeout and report a deadlock with the follow deadlock graph:
Deadlock graph: ---------Blocker(s)-------- ---------Waiter(s)--------- Resource Name process session holds waits process session holds waits TM-00015818-00000000 14 371 SX 17 368 SX SSX TX-0009000e-000054ae 17 368 X 14 371 S session 371: DID 0001-000E-00000018 session 368: DID 0001-0011-00000005 session 368: DID 0001-0011-00000005 session 371: DID 0001-000E-00000018 Rows waited on: Session 368: no row Session 371: no row Session 368: pid=17 serial=66 audsid=2251285 user: 52/TEST_USER O/S info: user: HP-LAPTOPV1\jonathan, term: HP-LAPTOPV1, ospid: 2300:3528, machine: WORKGROUP_JL\HP-LAPTOPV1 program: sqlplus.exe application name: SQL*Plus, hash value=3669949024 Current SQL Statement: delete from parent where id = 2 End of information on OTHER waiting sessions. Current SQL statement for this session: insert into child values(1,2,'new')
You’ll notice that there are no rows waited for – session 1 isn’t waiting for a row it’s waiting for a table and session 2 isn’t waiting for a table row it’s waiting for an index entry.
Footnote: There are several variations on the theme of one session inserting child rows when the other session has deleted (or inserted) the parent. The uncommitted parent change is an easy cause of the TX/4; the delete with unindexed foreign key is a necessary cause of the SX -> SSX.
So, what he hell is that V8 Bundled Exec call which shows up in various Oracle 11g monitoring reports?!
It’s yet another piece of instrumentation which can be useful for diagnosing non-trivial performance problems. Oracle ASH has allowed us to measure what is the top wait event or top SQLID for a long time, but now it’s also possible to take a step back and see what type of operation the database session is servicing.
I am talking about Oracle Program Interface (OPI) calls. Basically for each OCI call in the client side (like , OCIStmtExecute, OCIStmtFetch, etc) there’s a corresponding server side OPI function (like opiexe(), opifch2() etc).
Many Oracle DBA’s are probably familiar with what Optimizer trace files are and likely know how to create them. When I say “Optimizer trace” more than likely you think of event 10053, right? SQL code like this probably is familiar then:
alter session set tracefile_identifier='MY_10053'; alter session set events '10053 trace name context forever'; select /* hard parse comment */ * from emp where ename = 'SCOTT'; alter session set events '10053 trace name context off';
In 11g, a new diagnostic events infrastructure was implemented and there are various levels of debug output that you can control for sql compilation. ORADEBUG shows us the hierarchy.
I have uploaded the latest hacking session video to blip.tv. I have edited it a little, I cut out the part where I spilled an entire Red Bull onto my desk, with some onto my laptop (some keys are still sticky:)
Also, I do upload all these sessins into iTunes – so you can subscribe to my podcast! That way you can download the videos into your computer, phone or iPad. I have deliberately used 1024×768 resolution so it would look awesome on iPad screen! (so hopefully your commute time gets a bit more fun now ;-)
How do you trouble-shoot a problem ? It’s not an easy question to answer when posed in this generic fashion; but perhaps it’s possible to help people trouble-shoot by doing some examples in front of them. (This is why I’ve got so many statspack/AWR examples – just reading a collection of different problems helps you to get into the right mental habit.)
So here’s a problem someone sent me yesterday. Since it only took a few seconds to read, and included a complete build for a test case, with results, and since it clearly displayed an Oracle bug, I took a look at it. (I’ve trimmed the test a little bit, there were a few more queries leading up to the error):
create table person (id number(2), name varchar2(10)) ; insert into person values (1, 'Alpha') ; insert into person values (2, 'Bravo') ; insert into person values (3, 'Charlie') ; insert into person values (4, 'Charles') ; insert into person values (5, 'Delta') ; create or replace view vtest as select id, 'C' as letter from person where name like 'C%' ; select p.id, p.name, v.id, v.letter from person p left join vtest v on v.id = p.id order by p.id ;
The problem was that 10.2.0.4 and 18.104.22.168 gave different results – and the 22.214.171.124 result was clearly wrong. So the question was: “is there something broken with outer joins on views, or possibly ANSI outer joins?” (The ansswer to the last question is always “probably” as far as I’m concerned, but I wouldn’t turn that into a “yes” without checking first.) Here are the two results:
10.2.0.4: ======== ID NAME ID L ---------- ---------- ---------- - 1 Alpha 2 Bravo 3 Charlie 3 C 4 Charles 4 C 5 Delta 126.96.36.199 ======== ID NAME ID L ---------- ---------- ---------- - 1 Alpha C 2 Bravo C 3 Charlie 3 C 4 Charles 4 C 5 Delta C
Clearly the extra ‘C’s in the letter column are wrong.
So what to do next ? Knowing that Oracle transforms ANSI SQL before evaluating an execution plan I decided to run the 10053 trace. Sometimes you get lucky and see the “unparsed SQL” in this trace file, a representation (though not necessarily 100% exact) image of the statement for which Oracle will generate a plan. I was lucky, this was the unparsed SQL (cosmetically enhanced):
SELECT P.ID ID, P.NAME NAME, PERSON.ID ID, CASE WHEN PERSON.ROWID IS NOT NULL THEN 'C' ELSE NULL END LETTER FROM TEST_USER.PERSON P, TEST_USER.PERSON PERSON WHERE PERSON.ID (+) = P.ID AND PERSON.NAME(+) LIKE 'C%' ORDER BY P.ID ;
So I ran this query, and found that the same error appeared – so it wasn’t about ANSI or views. So possibly it’s something about the CASE statement and/or the ROWID in the CASE statement, which I tested by adding three extra columns to the query:
person.name, person.rowid, CASE WHEN PERSON.name IS NOT NULL THEN 'C' ELSE NULL END LETTER
With these extra columns I got the following results from the query:
ID NAME ID NAME ROWID L L ---------- ---------- ---------- ---------- ------------------ - - 1 Alpha C 2 Bravo C 3 Charlie 3 Charlie AAAT7gAAEAAAAIjAAC C C 4 Charles 4 Charles AAAT7gAAEAAAAIjAAD C C 5 Delta C
So the CASE did the right thing with the person.name column, but the wrong thing with the person.rowid column.
Time to get onto MOS (Metalink).
I searched the bug database with the key words: case rowid null
This gave me 2,887 hits, so I added the expression (with the double quotes in place) “outer join”
This gave me 110 hits, so from the “product category” I pick “Oracle Database Products”
This gave me 80 hits, and the first one on the list was:
Bug 10269193: WRONG RESULTS WITH OUTER JOIN AND CASE EXPRESSION OPTIMIZATION CONTAINING ROWID
The text matched my problem, so job done – except it’s reported as not fixed until 12.1
This isn’t a nice bug, of course, because the particular problem can be generated automatically in the transformation of ANSI outer joins to Oracle outer joins, so you can’t just change the code.
In passing, it’s taken me 31 minutes to write this note – that’s 10 minutes longer than it took to pin down the bug, but I have to say I got lucky on two counts: first, that the “unparsed SQL” was available, second that my choice of key words for MOS got me to the bug so quickly (which is where I usually find I waste most time).