The problem of slow queries on v$lock just came up again on the OTN database forum, so I thought I’d better push out a post that’s been hanging around on my blog for the last few months. This is actually mentioned in MOS in note 1328789.1: “Query Against v$lock Run from OEM Performs Slowly” which points out that it is basically a problem of bad statistics and all you have to do is collect the stats.
Here’s a deadlock graph that might cause a little confusion:
Here’s an oddity that I ran into a little while ago while trying to prepare a sample trace file showing a particular locking pattern; it was something that I’d done before, but trace files can change with different versions of Oracle so I decided to use a copy of 220.127.116.11 that happened to be handy at the time to check if anything had changed since the previous (11gR1) release. I never managed to finish the test; here are the steps I got through:
By some strange coincidence, the “London Bus” effect perhaps, there have been three posts on the OTN database forum in the last couple of days relating to deadlocks; and they have prompted me to indulge in a little rant about the myth of Oracle and deadlock detection; it’s the one that goes:
“Oracle detects and resolves deadlocks automatically.”
Oracle may detect deadlocks automatically, but it doesn’t resolve them, it simply reports them (by raising error ORA-00060 and rolling back one statement) then leaves the deadlocked sessions stuck until the session that received the report resolves the problem or an external agent resolves the problem.
Consider the following example (which, I have to admit, I wrote without access to a live instance):
Here’s one I keep forgetting – and spending 15 minutes trying to think of the answer before getting to the “deja vu” point again. I’ve finally decided that I’ve got to write the answer down because that will save me about 14 minutes the next time I forget.
Q. In a Statspack or AWR report there is a section titles “Segments by Row Lock Waits”. Why could an index be subject to a Row Lock Wait ?
A. Try inserting into a table from two different sessions (without committing) two rows with the same primary key. The second insert will wait on event enq: TX – row lock contention, and show up in v$lock with a lock request for a TX lock in mode 4. When you issue a commit or rollback on the first session, and the second statement errors or completes (depending on whether you commit or rollback the first session) it will increase the value for row lock waits in v$segstat (and v$segment_statistics) for the index by 1.
There are variations on the theme, of course, but the key feature is uniqueness with one session waiting for another session to commit or rollback on a conflicting value. This includes cases of foreign key constraint checking such as inserting a child for a parent that has been deleted but not committed (and there’s an interesting anomaly with that scenario which – in 10g, at least – reports more row lock waits on the parent PK than you might expect.)
Here’s a deadlock graph the appeared on Oracle-L and OTN a couple of days ago.
Deadlock graph: ---------Blocker(s)-------- ---------Waiter(s)--------- Resource Name process session holds waits process session holds waits TX-001a0002-0002a0fe 196 197 X 166 1835 S TM-0000c800-00000000 166 1835 SX 196 197 SX SSX
It’s a little unusual because instead of the common TX mode 6 (eXclusive) crossover we have one TX and one TM lock, the TX wait is for mode 4 (S) and the TM wait is for a conversion from 3 (SX) to 5 (SSX).
The modes and types give us some clues about what’s going on: TX/4 is typically about indexes involved in referential integrity (though there are a couple of more exotic reasons such as wait for ITLs, Freelists or tablespace status change); conversion of a TM lock from mode 3 to mode 5 is only possible (as far as I know) in the context of missing foreign key indexes when you delete a parent row.
Here’s a simple data set to help demonstrate the type of thing that could have caused this deadlock:
drop table child; drop table parent; create table parent ( id number(4), name varchar2(10), constraint par_pk primary key (id) ) ; create table child( id_p number(4), id number(4), name varchar2(10), constraint chi_pk primary key (id, id_p), constraint chi_fk_par foreign key(id_p) references parent on delete cascade ) ; insert into parent values (1,'Smith'); insert into parent values (2,'Jones'); insert into child values(1, 1, 'Simon'); insert into child values(2, 1, 'Janet'); commit;
Note that I have define the primary key on the child the “wrong way round”, so that the foreign key doesn’t have a supporting index. Note also that the foreign key constraint is defined as ‘on delete cascade’ – this isn’t a necessity, but it means I won’t have to delete child rows explicitly in my demo.
Now we take the following steps:
Session 1: delete from parent where id = 1;
This will delete the child row – temporarily taking a mode 4 (S) lock on the child table – then delete the parent row. Both tables will end up locked in mode 3.
Session 2: insert into child values (1,2,'Sally');
This will lock the parent table in mode 2, lock the child table in mode 3, then wait with a TX mode 4 for session 1 to commit or rollback. If session 1 commits it will raise Oracle error: “ORA-02291: integrity constraint (TEST_USER.CHI_FK_PAR) violated – parent key not found”; if session 1 rolls back the insert will succeed.
Session 1: delete from parent where id = 2;
This will attempt to lock the child table in mode 4, find that there it already has the child locked in mode three (which is incompatible with mode 4) and therefore attempt to convert to mode 5 (SSX). This will make it queue, waiting for session 2 to commit.
Three seconds later session 2 (the first to start waiting) will timeout and report a deadlock with the follow deadlock graph:
Deadlock graph: ---------Blocker(s)-------- ---------Waiter(s)--------- Resource Name process session holds waits process session holds waits TM-00015818-00000000 14 371 SX 17 368 SX SSX TX-0009000e-000054ae 17 368 X 14 371 S session 371: DID 0001-000E-00000018 session 368: DID 0001-0011-00000005 session 368: DID 0001-0011-00000005 session 371: DID 0001-000E-00000018 Rows waited on: Session 368: no row Session 371: no row Session 368: pid=17 serial=66 audsid=2251285 user: 52/TEST_USER O/S info: user: HP-LAPTOPV1\jonathan, term: HP-LAPTOPV1, ospid: 2300:3528, machine: WORKGROUP_JL\HP-LAPTOPV1 program: sqlplus.exe application name: SQL*Plus, hash value=3669949024 Current SQL Statement: delete from parent where id = 2 End of information on OTHER waiting sessions. Current SQL statement for this session: insert into child values(1,2,'new')
You’ll notice that there are no rows waited for – session 1 isn’t waiting for a row it’s waiting for a table and session 2 isn’t waiting for a table row it’s waiting for an index entry.
Footnote: There are several variations on the theme of one session inserting child rows when the other session has deleted (or inserted) the parent. The uncommitted parent change is an easy cause of the TX/4; the delete with unindexed foreign key is a necessary cause of the SX -> SSX.
Here’s an example of a slightly less common data deadlock graph (dumped from 11gR2, in this case):
[Transaction Deadlock] The following deadlock is not an ORACLE error. It is a deadlock due to user error in the design of an application or from issuing incorrect ad-hoc SQL. The following information may aid in determining the deadlock: Deadlock graph: ---------Blocker(s)-------- ---------Waiter(s)--------- Resource Name process session holds waits process session holds waits TX-00010006-00002ade 16 34 X 12 50 S TX-00050029-000037ab 12 50 X 16 34 S session 34: DID 0001-0010-00000021 session 50: DID 0001-000C-00000024 session 50: DID 0001-000C-00000024 session 34: DID 0001-0010-00000021 Rows waited on: Session 50: no row Session 34: no row Information on the OTHER waiting sessions: Session 50: pid=12 serial=71 audsid=1560855 user: 52/TEST_USER O/S info: user: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX program: sqlplus.exe application name: SQL*Plus, hash value=3669949024 Current SQL Statement: update t1 set n3 = 99 where id = 100 End of information on OTHER waiting sessions. Current SQL statement for this session: update t1 set n3 = 99 where id = 200
The anomaly is that the waiters are both waiting on S (share) mode locks for a TX enqueue.
It’s fairly well known that Share (and Share sub exclusive, SSX) lock waits for TM locks are almost a guarantee of a missing “foreign key index”; and it’s also fairly well known that Share lock waits for TX locks can be due to bitmap collisions, issues with ITL (interested transaction list) waits, various RI (referential integrity) collisions including simultaneous inserts of the same primary key.
A cause for TX/4 waits that is less well known or overlooked (because a featrure is less-well used) is simple data collisions on IOTs (index organized tables). In the example above t1 is an IOT with a primary key of id. Session 34 and 50 have both tried to update the rows with ids 100 and 200 – in the opposite order. If this were a normal heap table the deadlock graph would be showing waits for eXclusive TX locks, because it’s an IOT (and therefore similar in some respects to a primary key wait) we see waits for Share TX locks.