Who's online

There are currently 0 users and 49 guests online.

Recent comments




Here’s a nice example on the OTN database forum of Dom Brooks looking at the evidence.

  • The query is slow – what does the trace say.
  • There’s “row source execution” line that says we get 71,288 rows before doing a hash un ique drops it to 3,429 rows.
  • There’s a statement (upper case, bind variables as :Bn) in the trace file that has been executed 71,288 times
  • A very large fraction of the trace file time is in the secondary statement
  • There’s a user-defined function call in the original select list, before a ‘select distinct’.

Conclusion: the code should probably do a “distinct” in an inline view before calling the function, reducing the number of calls to the function from 71,288 to 3,429.

Footnote: There may be other efficiency steps to consider – I’m always a little suspicious of a query that uses “distinct”: possibly it’s hiding an error in logic, possibly it should be rewritten with an existence subquery somewhere, but sometimes it really is the best strategy. There’s are some unusual statistics names coming from autotrace in the OP’s system – I wonder if he’s installed one of Tanel Poder’s special library hacks.

More CR

Following on from yesterday’s post on consistent reads, I thought I’d make the point that the way you work can make an enormous difference to the amount of work you do. Here’s a silly little demo (in

drop table t1 purge;
create table t1 (id number, n1 number);
insert into t1 values (1,0);
insert into t1 values (2,0);

execute dbms_stats.gather_table_stats(user,'t1')
execute snap_my_stats.start_snap

	for i in 1..1000 loop
		update t1 set n1 = i where id = 1;
	end loop;

execute snap_my_stats.end_snap

set doc off

Output - (some rows deleted)
Name                                                                     Value
----                                                                     -----
opened cursors cumulative                                                   11
user calls                                                                   6
recursive calls                                                          1,068
recursive cpu usage                                                          7
session logical reads                                                    4,041
CPU used when call started                                                   7
CPU used by this session                                                     7
DB time                                                                      6
db block gets                                                            1,030
db block gets from cache                                                 1,030
consistent gets                                                          3,011
consistent gets from cache                                               3,011
consistent gets - examination                                                4
db block changes                                                         2,015
change write time                                                            4
free buffer requested                                                    1,014
switch current to new buffer                                             1,000
calls to kcmgas                                                          1,014
calls to get snapshot scn: kcmgss                                        3,009
redo entries                                                               960
redo size                                                              295,160
undo change vector size                                                111,584
no work - consistent read gets                                           1,004
table scans (short tables)                                               1,001
table scan rows gotten                                                   2,002
table scan blocks gotten                                                 1,001
buffer is pinned count                                                   1,000
execute count                                                            1,009


I’ve created two rows in a table, then updated one of them 1,000 times – using a table scan to do the update. I haven’t yet committed my transaction. At this point I’m going to use a second session to run the same update loop on the second row in the table:

	for i in 1..1000 loop
		update t1 set n1 = i where id = 2;
	end loop;

Name                                                                     Value
----                                                                     -----
opened cursors cumulative                                                    8
user calls                                                                   6
recursive calls                                                          1,009
recursive cpu usage                                                        170
session logical reads                                                  965,999
CPU used when call started                                                 172
CPU used by this session                                                   172
DB time                                                                    173
db block gets                                                            1,030
db block gets from cache                                                 1,030
consistent gets                                                        964,969
consistent gets from cache                                             964,969
consistent gets - examination                                          961,965
db block changes                                                         3,016
consistent changes                                                   1,001,000
free buffer requested                                                    1,015
CR blocks created                                                        1,001
calls to kcmgas                                                          1,015
calls to get snapshot scn: kcmgss                                        3,008
redo entries                                                             1,936
redo size                                                              358,652
undo change vector size                                                111,608
data blocks consistent reads - undo records applied                  1,001,000
cleanouts and rollbacks - consistent read gets                           1,001
immediate (CR) block cleanout applications                               1,001
active txn count during cleanout                                         2,000
cleanout - number of ktugct calls                                        1,001
IMU CR rollbacks                                                        41,041
table scans (short tables)                                               1,001
table scan rows gotten                                                   2,002
table scan blocks gotten                                                 1,001
execute count                                                            1,006

Many of the statistics are (virtually) identical (e.g. “execute count”, “db block gets”, “free buffer requested”); some show an increase by 1,000 (often from zero) – largely because we have to worry 1,000 times about cleaning out the current block and creating a read-consistent version so that we can check if it can be updated.

But the most noticeable changes are in the “CPU time” and “consistent gets” because of the 1,000 times we have to apply 1,000 undo records to the block as we create the read-consistent version of the block. The CPU time has gone from 7 (hundredths of a second) to 172 because of (roughly) 1,000,000 “consistent gets – examination”. As I mentioned yesterday, this matches closely to “data blocks consistent reads – undo records applied” so we know why they are happening. Watch out in your batch jobs – if you have a lot of concurrent update activity going on a significant fraction of the workload may be the constant re-creation of CR clones.

However, there is another interesting detail to watch out for – what happens if I change the update execution path from a tablescan to an indexed access path:

create table t1 (id number, n1 number);
insert into t1 values (1,0);
insert into t1 values (2,0);

execute dbms_stats.gather_table_stats(user,'t1')
create index t1_i1 on t1(id);                            --  Make indexed access path available.

Then with an index hint in my update code, I get the following effects (having done the same update loop on row 1 in the first session, of course):

	for i in 1..1000 loop
		update /*+ index(t1) */ t1 set n1 = i where id = 2;     -- indexed access path hinted
	end loop;

Name                                                                     Value
----                                                                     -----
opened cursors cumulative                                                    7
user calls                                                                   6
recursive calls                                                          1,006
recursive cpu usage                                                         11
session logical reads                                                    2,036
CPU used when call started                                                  11
CPU used by this session                                                    11
DB time                                                                     11
db block gets                                                            1,030
db block gets from cache                                                 1,030
consistent gets                                                          1,006
consistent gets from cache                                               1,006
consistent gets - examination                                                6
db block changes                                                         2,015
free buffer requested                                                       14
shared hash latch upgrades - no wait                                     1,000
calls to kcmgas                                                             14
calls to get snapshot scn: kcmgss                                        1,004
redo entries                                                               960
redo size                                                              295,144
undo change vector size                                                111,608
index crx upgrade (positioned)                                           1,000
index scans kdiixs1                                                      1,000
execute count                                                            1,005

The difference is astonishing – where did all the ‘create CR copy’ activity go ?

I’ve pointed out before now that choosing a different execution plan for an update can have a big impact on performance – this is just another example demonstrating the point.

Consistent Reads

Here’s a quick demo to make a point about consistent reads (prompted by a question on the Oracle-L mailing list):

SQL> drop table t1;

Table dropped.

SQL> create table t1 (n1 number);

Table created.

SQL> insert into t1  values(0);

1 row created.

SQL> begin
  2  for i in 1..1000 loop
  3  update t1 set n1 = i;
  4  end loop;
  5  end;
  6  /

PL/SQL procedure successfully completed.

Note that I haven’t issued a commit in this session, and all I’ve got is a single row in the table (and because it’s my usual demo setup of locally managed tablespaces with uniform extents of 1MB using freelist management I know that that one row is in the first available block of the table).

How much work is a second session going to do to scan that table ?

SQL> alter system flush buffer_cache;
SQL> execute snap_my_stats.start_snap
SQL> select * from t1;
SQL> set serveroutput on size 1000000 format wrapped
SQL> execute snap_my_stats.end_snap
Session stats - 18-Apr 11:33:01
Interval:-  2 seconds
Name                                                                     Value
----                                                                     -----
session logical reads                                                      967
consistent gets                                                            967
consistent gets from cache                                                 967
consistent gets - examination                                              964
consistent changes                                                       1,001
CR blocks created                                                            1
data blocks consistent reads - undo records applied                      1,001
IMU CR rollbacks                                                            41

The snap_my_stats package is similar in concept to Tom Kyte’s “runstats” or Tanel Poder’s “snapper” program to capture changes in values in the dynamic performance views over short time periods. In this case I’ve deleted all but a few of the larger changes, and a couple of small changes.

The figure that stands out (probably) is the “session logical reads” – we’ve done 967 logical I/Os to scan a tables of just one block. The reason for this is that we’ve created a read-consistent copy of that one block (“CR blocks created” = 1), and it has taken a lot of work to create that copy. We’ve had to apply 1,001 undo records (“data blocks consistent reads – undo records applied” = 1001).

Most of those undo records come from individual accesses (which are of the cheaper “consistent gets – examination” type that only need a single get on the “cache buffers chains” latch) to undo blocks, following the “UBA (undo block address)” pointer in the relevant ITL entry of the table block, but since this is a 10g database the last few undo records come out of the “In-memory Undo” of the other session. Basically the cloning operation is something like this:

  1. Get current block
  2. Notice uncommitted transaction
  3. Clone current block – then ignore current block
  4. Get Undo block indicated by uncommitted ITL and apply undo change vector
  5. Repeat step four – 1,000 times
  6. Block is now clear of all uncommitted transactions
  7. There are no committed transactions with a commit SCN later than the start of query
  8. Display contents of block

It is an interesting point that as the first session created undo records it would pin and fill undo blocks – so would only do a few current gets (one for each block) on the undo blocks it was using. As another process reverses out the changes in a CR clone it has to get and release each undo block every time it wants a single undo record … applying undo records introduces far more latch and buffer activity that the original generation of the undo.


It’s worth knowing that there are three statistics relating to applying undo records:

transaction tables consistent reads - undo records applied        Estimating "old" commit SCNs during delayed block cleanout
data blocks consistent reads - undo records applied               Creating CR clones
rollback changes - undo records applied                           The result of a real "rollback;"

See this posting on for a rough description of transaction table consistent reads; and this elderly posting highlighting a benefit of knowing about rollback changes.

Footnote 2

The second step in the list of actions is: “Notice uncommitted transaction”. It’s probably worth pointing out that another part of the ITL entry holds the transaction id (“xid”) which implicitly identifies the undo segment and transaction table slot in that segment that has been used to hold  the transaction state. The current contents of that slot allow Oracle to determine whether or not (and when, if necessary) the transaction was committed.

New scripts, tools and broken links

I have some broken links in my old blog entries right now, so if you’re looking for something, then download the whole zip file from here:

I have uploaded a .zip file (for Windows) and a .tar.gz file (for Unix/Mac). The scripts are all the same with differences in the CR/LF bytes in the files and the init.sql and i.sql which have some OS specific commands in them.

I also uploaded the latest PerfSheet there where I fixed an annoying bug which complained about some missing reference files when opening the file.

I plan to fix the broken links some time between now and my retirement.



Here’s an example of a slightly less common data deadlock graph (dumped from 11gR2, in this case):

[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-00010006-00002ade        16      34     X             12      50           S
TX-00050029-000037ab        12      50     X             16      34           S
session 34: DID 0001-0010-00000021	session 50: DID 0001-000C-00000024
session 50: DID 0001-000C-00000024	session 34: DID 0001-0010-00000021
Rows waited on:
Session 50: no row
Session 34: no row
Information on the OTHER waiting sessions:
Session 50:
  pid=12 serial=71 audsid=1560855 user: 52/TEST_USER
            program: sqlplus.exe
  application name: SQL*Plus, hash value=3669949024
  Current SQL Statement:
  update t1 set n3 = 99 where id = 100
End of information on OTHER waiting sessions.
Current SQL statement for this session:
update t1 set n3 = 99 where id = 200

The anomaly is that the waiters are both waiting on S (share) mode locks for a TX enqueue.

It’s fairly well known that Share (and Share sub exclusive, SSX) lock waits for TM locks are almost a guarantee of a missing “foreign key index”; and it’s also fairly well known that Share lock waits for TX locks can be due to bitmap collisions, issues with ITL (interested transaction list) waits, various RI (referential integrity) collisions including simultaneous inserts of the same primary key.

A cause for TX/4 waits that is less well known or overlooked (because a featrure is less-well used) is simple data collisions on IOTs (index organized tables). In the example above t1 is an IOT with a primary key of id. Session 34 and 50 have both tried to update the rows with ids 100 and 200 – in the opposite order. If this were a normal heap table the deadlock graph would be showing waits for eXclusive TX locks, because it’s an IOT (and therefore similar in some respects to a primary key wait) we see waits for Share TX locks.

Reminder and Public Appearances 2011

First, a reminder – my Advanced Oracle Troubleshooting v2.0 online seminar starts next week already. Last chance to sign up, I can accept registrations until Sunday :-)

I won’t do another AOT seminar before Oct (or Nov) this year. More details and sign-up here:

I have rescheduled my Advanced SQL Tuning and Partitioning & Parallel Execution for Performance seminars too. I will do them in September/October. Unfortunately I’m too busy right now to do them before the summer.

Public Appearances:

  • I will be speaking at the UKOUG Exadata Special Event in London on 18th April
  • I have submitted a few papers for Oracle OpenWorld in San Francisco as well (end of Sep/beginning of Oct), all about Exadata. Let’s see how it goes, but I’ll be there anyway, which means that I’ll probably show up at the Oracle Closed World event too!

And that’s all the travel I will do this year…

Virtual Conferences:

I’ll soon announce the 2nd EsSN virtual conference too ;-)

Free online stuff:

Perhaps in a month or so I will do another hacking session (I’ll plan 2 hours this time, 1 hour isn’t nearly enough for going deep). The topic will probably be about low-level details of SQL plan execution internals… stay tuned!

Rows per block

A recent question on the OTN database forum:

Can any one please point to me a document or a way to calculate the average number of rows per block in oralce

One answer would be to collect stats and then approximate as block / avg_row_len – although you have to worry about things like row overheads, the row directory, and block overheads before you can be sure you’ve got it right. On top of this, the average might not be too helpful anyway. So here’s another (not necessarily fast) option that gives you more information about the blocks that have any rows in them (I picked the source$ table from a 10g system because source$ is often good for some extreme behaviour).

break on report

compute sum of tot_blocks on report
compute sum of tot_rows   on report

column avg_rows format 999.99

	min(rows_per_block)			min_rows,
	max(rows_per_block)			max_rows,
	sum(block_ct)				tot_blocks,
	sum(row_total)				tot_rows,
	round(sum(row_total)/sum(block_ct),2)	avg_rows
		ntile(20) over (order by rows_per_block) twentieth,
		count(*)			block_ct,
		rows_per_block * count(*)	row_total
			fno, bno, count(*) rows_per_block
				dbms_rowid.rowid_relative_fno(rowid)	as fno,
				dbms_rOwId.rowid_block_number(rowid)	as bno
		group by
			fno, bno
	group by
	order by
group by
order by

I’ve used the ntile() function to split the results into 20 lines, obviously you might want to change this according to the expected variation in rowcounts for your target table. In my case the results looked like this:

---------- ---------- ---------- ---------- ---------- --------
         1          1         11       2706       3470     1.28
         2         12         22         31        492    15.87
         3         23         34         30        868    28.93
         4         35         45         20        813    40.65
         5         46         57         13        664    51.08
         6         59         70         18       1144    63.56
         7         71         81         23       1751    76.13
         8         82         91         47       4095    87.13
         9         92        101         79       7737    97.94
        10        102        111        140      14976   106.97
        11        112        121        281      32799   116.72
        12        122        131        326      41184   126.33
        13        132        141        384      52370   136.38
        14        142        151        325      47479   146.09
        15        152        161        225      35125   156.11
        16        162        171        110      18260   166.00
        17        172        181         58      10207   175.98
        18        182        191         18       3352   186.22
        19        193        205         22       4377   198.95
        20        206        222         16       3375   210.94
                                 ---------- ----------
sum                                    4872     284538

Of course, the moment you see a result like this it prompts you to ask more questions.

Is the “bell curve” effect that you can see centred around the 13th ntile indicative of a normal distribution of row lengths – if so why is the first ntile such an extreme outlier – is that indicative of a number of peculiarly long rows, did time of arrival have a special effect, is it the result of a particular pattern of delete activity … and so on.

Averages are generally very bad indicators if you’re worried about the behaviour of an Oracle system.

Latch contention troubleshooting case study and Flashback Database performance issues with LOBs

Steve Bamber has written up a case study of library cache latch contention troubleshooting of an Apex application with LatchProf. I’m happy that others also see the value and have had success with my new LatchProf based latch contention troubleshooting approach which takes into account both sides of the contention story (latch waiters and latch holders/blockers) as opposed to the guesswork used previously (hey if it’s shared pool latch contention – is must be about bad SQL not using bind variables …. NOT always…)

Anyway, I’m happy. If you have success stories with LatchProf, please let me know!

As a second topic of interest, Laimutis Nedzinskas has written some good notes about the effect and overhead of Flashback Database option when you are using and modifying (nocache) LOBs. We’ve exchanged some mails on this topic and yeah, my clients have sure seen some problems with this combination as well. You basically want to keep your LOBs cached when using FB database…

Oracle Troubleshooting TV Show: Season 1, Episode 01 ;-)

Ok, it’s official – the first and only Oracle Troubleshooting TV show is live now!

The first show is almost 2 hours about the ORA-4031 errors and shared pool hacking. It’s a recording of the US/EMEA timezone online hacking session I did some days ago.

There are a couple of things to note:

  1. The text still isn’t as sharp as in the original recording, but it’s much better than in my previous upload attempts and is decently readable. I’ll try some more variations with my next shows so I hope the text quality will get better! Or maybe I should just switch to GUI tools or powerpoint slides? ;-)
  2. You probably should view this video in full screen (otherwise the text will be tiny and unreadable)
  3. There’s advertising in the beginning (and maybe end) of this show! I’ll see how much money I’ll make out of this – maybe these shows start contributing towards the awesome beer selection I’ll have in my fridge some day (right now I have none). Viewing a 30-sec advert is small price to pay for 2 hours of kick-ass shared pool hacking content !!!
  4. You can download the scripts and tools used in the demos from
  5. Make sure you check out my online Oracle troubleshooting seminars too (this April and May already)

View the embedded video below or go to my official Oracle Troubleshooting TV show channel:


ASSM wreck

Yesterday I introduced a little framework I use to avoid the traps inherent in writing PL/SQL loops when modelling a session that does lots of simple calls to the database. I decided to publish the framework because I had recently come across an example where a series of SQL statements gives a very different result from a single PL/SQL block.

The model starts with a simple data set – which in this case is created in a tablespace using ASSM (automatic segment space management), an 8KB block size and 1MB uniform extents (in a locally management tablespace).

create table t1
tablespace test_8k_assm
	trunc((rownum-1)/100)	n1,
	lpad('x',40)		v1,
	rpad('x',100)		padding
connect by
	rownum <= 20000

create index t1_i1 on t1(n1, v1)
tablespace test_8k_assm

validate index t1_i1;
execute print_table('select * from index_stats');

You can see that the n1 column is defined to have 200 rows for each of 100 different values, and that each set of two hundreds rows is stored (at least initially) in a very small cluster of blocks.

With the data set in place I am now going to pick a set of two hundred rows at random, delete it, re-insert it, and commit; and I’m going to repeat that process 1,000 times.

	rand	number(3);
	for i in 1..1000 loop

		rand := trunc(dbms_random.value(0,200));

		delete from t1
		where n1 = rand

		insert into t1
		connect by
			rownum <= 100


	end loop;

validate index t1_i1;
execute print_table('select * from index_stats');

You might think that this piece of code is a little strange – but it is a model of some processing that I’ve recently seen on a client site, and it has crossed my mind that it might appear in a number of systems hidden underneath the covers of dbms_job. So what does it do to the index ?

Given the delay that usually appears between the time an index entry is marked as deleted and the time that the space can be reused, and given the way I’ve engineered my date so that the space needed for the 200 rows for each key value is little more than a block (an important feature of this case), I wouldn’t be too surprised if the index had stabilised at nearly twice its original size. But that’s not what happened to my example running under ASSM. Here are the “before” and “after” results from my test:

                       Before         After
LF_ROWS                20,000        70,327
LF_BLKS                   156           811
LF_ROWS_LEN         1,109,800     3,877,785
BR_ROWS                   155           810
BR_BLKS                     3            10
BR_ROWS_LEN             8,903        45,732
DEL_LF_ROWS                 0        50,327
DEL_LF_ROWS_LEN             0     2,767,985
DISTINCT_KEYS             200           190
MOST_REPEATED_KEY         100         1,685
BTREE_SPACE         1,272,096     6,568,320
USED_SPACE          1,118,703     3,923,517
PCT_USED                   88            60
ROWS_PER_KEY              100           370
BLKS_GETS_PER_ACCESS       54           189

It’s a small disaster – our index has grown in size by a factor of about five, and we have more deleted rows than “real” rows. (Note, by the way, that the awfulness of the index is NOT really indicated by the PCT_USED figure – one which is often suggested as an indicator of the state of an index).

Unfortunately this is the type of problem that doesn’t surprise me when using ASSM; it’s supposed to help with highly concurrent OLTP activity (typified by a large number of very small transactions) but runs into problems updating free space bitmaps whenever you get into “batch-like” activity.

However, there is a special consideration in play here – I’ve run the entire operation as a single pl/sql loop. Would the same problem appear if I ran each delete/insert cycle as a completely independent SQL script using the “start_1000.sql” script from my previous note ?

To test the effect of running 1,000 separate tasks, rather than executing a single pl/sql loop, I wrote the following code into the start_1.sql script that I described in the article before running start_1000.sql:

	rand	number(3);

	rand := trunc(dbms_random.value(0,200));

	delete from t1
	where n1 = rand

	insert into t1
	connect by
		rownum <= 100



The impact was dramatically different. (Still very wasteful, but quite a lot closer to the scale of the results that you might expect from freelist management).

                       Before         After
                    ---------     ---------
LF_ROWS                20,000        39,571
LF_BLKS                   156           479
LF_ROWS_LEN         1,109,800     2,196,047
BR_ROWS                   155           478
BR_BLKS                     3             6
BR_ROWS_LEN             8,903        26,654
DEL_LF_ROWS                 0        19,571
DEL_LF_ROWS_LEN             0     1,086,247
DISTINCT_KEYS             200           199
MOST_REPEATED_KEY         100           422
BTREE_SPACE         1,272,096     3,880,192
USED_SPACE          1,118,703     2,222,701
PCT_USED                   88            58
ROWS_PER_KEY              100           199
BLKS_GETS_PER_ACCESS       54           102

I haven’t yet investigated why the pl/sql loop should have produced such a damaging effect – although I suspect that it might be a side effect of the pinning of bitmap blocks (amongst others, of course) that takes place within a single database call. It’s possible that the repeated database calls from SQL*Plus keep “rediscovering” bitmap blocks that show free space while the pinning effects stop the pl/sql from “going back” to bitmap blocks that have recently acquired free space.

Interestingly the impact of using ASSM was dramatically reduced if one object used freelists and the other used ASSM – and with my specific example the combination of a freelist table with an ASSM index even did better than the expected 50% usage from the “traditional” option of using freelists for both the table and index.

Note – the purpose of this note is NOT to suggest that you should avoid using ASSM in general; but if you can identify code in your system that is doing something similar to the model then it’s worth checking the related indexes (see my index efficiency note) to see if any of them are displaying the same problem as this test case. If they are you may want to do one of two things: think about a schedule for coalescing or even rebuilding problem indexes on a regular basis, or see if you can move the table, index, or both, into a tablespace using freelist management.