Oakies Blog Aggregator

Full scans, direct path reads and ORA-8103 error hacking session video here (plus iTunes podcast address!)

I have uploaded the latest hacking session video to blip.tv. I have edited it a little: I cut out the part where I spilled an entire Red Bull onto my desk, with some onto my laptop (some keys are still sticky :)

Also, I do upload all these sessions into iTunes – so you can subscribe to my podcast! That way you can download the videos onto your computer, phone or iPad. I have deliberately used 1024×768 resolution so it would look awesome on an iPad screen! (so hopefully your commute time gets a bit more fun now ;-)

Enjoy!

iTunes video-podcast:

Rise of the Planet of the Apes…

Rise of the Planet of the Apes is AWESOME! Love it, love it, love it!

Don’t go expecting an all out action film, because it isn’t one. The trailers would have you believe it’s a constant stream of apes going … ape. :) For the most part it’s relatively even paced, but in a good way. Thought provoking and visually fantastic. The movement of the apes is great to watch, especially when they do go nuts. Us humans have a habit of anthropomorphizing animals (don’t get me started on dog and cat owners) so seeing it done on screen like this kinda taps into something primal in me. :)

Easily the best of the Planet of the Apes films, even including the original, which was pretty cool when I saw it as a kid.

Cheers

Tim…




Pickler Fetch – What is it?

How many of you have seen the step COLLECTION ITERATOR PICKLER FETCH appearing in Explain Plans and wondered “what is a Pickler Fetch”?

{Note – make sure you check out the comments on this post if you are interested in Pickler Fetches. Timur Akhmadeev and Dom Brooks correct and extend my original efforts}.

----------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |            |       |       |    31 (100)|          |
|*  1 |  FILTER                               |            |       |       |            |          |
|   2 |   NESTED LOOPS                        |            |       |       |            |          |
|   3 |    NESTED LOOPS                       |            |     1 |    71 |    31   (4)| 00:00:01 |
|   4 |     SORT UNIQUE                       |            |     4 |     8 |    26   (0)| 00:00:01 |
|   5 |      COLLECTION ITERATOR PICKLER FETCH|            |     4 |     8 |    26   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN                  | CHHE_PK    |     1 |       |     2   (0)| 00:00:01 |
|   7 |    TABLE ACCESS BY INDEX ROWID        | CHILD_HEAP |     1 |    69 |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

It is where Oracle works through a SQL array. I’ve usually seen this within PL/SQL, where an array variable containing a set of accounts or customers of interest is passed into a procedure or package and used within a cursor. But you might also see it as the step processing the output of a pipelined function (one that returns a set of “rows”) being cast into a table. See this example by Tom Kyte. Tom also suggests that it is called a “Pickler” fetch as the data is pickled – packed and formatted. I never knew that; I just thought someone in Oracle development was having a giggle and it was “Pickled” as it was preserved from the PL/SQL side of the SQL engine. It seems that I was a little off-target with that.

{My thanks to Timur (see comments) who corrected me when I said it was a PL/SQL array. It is not, it is a SQL object}.
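If you want to see the step in isolation, a tiny self-contained sketch is enough – note the type name num_tab_t here is mine for illustration, not from any of the examples in this post:

```sql
-- Hypothetical minimal example: a SQL collection type cast to a row
-- source via table() is what produces the COLLECTION ITERATOR PICKLER
-- FETCH step in the plan.
create or replace type num_tab_t as table of number
/
select column_value
from   table(num_tab_t(10, 20, 30));
```

Explain the select and you should see COLLECTION ITERATOR PICKLER FETCH as the row source for the table() expression.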

The above plan was generated for the below SQL statement:

cursor c_collect is
select  PARE_ID
       ,CRE_DATE
       ,VC_1
from child_heap
where pare_id in (select  id  from table(p_parents) p)
and   cre_date >=trunc(p_date) and cre_date < trunc(p_date+1);

Anyway, I just wanted to make a few comments about Pickler Fetches:

  • The CBO is not very good at looking “inside” such arrays to determine the cardinality of that step. This can cause performance issues.
  • Because using them involves a few steps and potentially involves grants, it is possible for them to be “temporarily removed” during testing and so their impact is not seen.
  • They can leak memory. I think.

I’m now going to cover each of those points in turn. If you just wanted to know what a pickler fetch is and I’ve answered that for you, I suggest you go back to whatever you were doing before you started reading this :-)

By “not very good at looking inside pickler fetches” I mean that the CBO either makes a poor estimate based on the first array it sees (V11 up) or utterly fails to identify how many records are inside the SQL array (V10 down), depending on the Oracle version. From an Oracle 10.2.0.4 system I’ve got the two following examples:

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name              | Rows| Bytes | Cost (%CPU)| Time     | Pstart
--------------------------------------------------------------------------------------------------------------------
|   9 |      NESTED LOOPS                       |                   |   1 |   139 |   114   (1)| 00:00:01 |      
|* 10 |       HASH JOIN SEMI                    |                   |   1 |    50 |    11  (10)| 00:00:01 |      
|* 11 |        TABLE ACCESS BY INDEX ROWID      | ABCDE_ABCDEFGHIJ  |   3 |   144 |     4   (0)| 00:00:01 |      
|* 12 |         INDEX RANGE SCAN                | XX_PK             |   3 |       |     3   (0)| 00:00:01 |      
|  13 |        COLLECTION ITERATOR PICKLER FETCH|                   |     |       |            |          |      
|  14 |       PARTITION RANGE ITERATOR          |                   |   6 |   534 |   103   (0)| 00:00:01 |   KEY
|* 15 |        INDEX RANGE SCAN                 | AAXX_PK           |   6 |   534 |   103   (0)| 00:00:01 |   KEY

Note the cost and expected rows for the Pickler Fetch step. Or rather, the lack of them.

This would manifest itself in the following way in OEM screens:

Note the cardinality of the Pickler Fetch step is 0 {sorry, I cut off the column headings}. This resulted in this step having a cost of 0 and all the subsequent steps having an expected cardinality of one and very low costs – between 1 and 3 {again, not shown, sorry}.

The end result of this was that the CBO struggled to accurately cost any given plan and came up with several, usually quite terrible, plans that it swapped between as other table stats varied. The CBO was picking between very complex plans with total costs of 100 or 200 or so! Any difference was “significant”.

Please note, OPTIMIZER_DYNAMIC_SAMPLING was set to 4 on this system; I tried hints and session settings to raise it to higher levels and they did not prompt the CBO to look into the array, on 10.2.0.4 at least.

In 11.1 things seem to be better, as is shown in the explain plan at the top of this post. The step has a cost. I have to confess, I have not tested this very much on 11 {and if anyone has, feel free to correct me/add enlightenment via comments or pointers to other sources}, but it seems to set the cardinality to the number of elements the Pickler Fetch finds in the first iteration. Unless it uses the same sort of trick Oracle 11 now uses for bind variables (detecting when the value supplied is out of range and generating a new plan), this is going to lead to the old and much ‘loved’ issue of the plan being fixed by the first execution, irrespective of how suitable that plan is.

How do you fix this issue? Well, I resort to the cardinality hint. Usually the number of records being passed into the array is not too variable and any half-decent value is better than nothing in Oracle 10 and before. As for 11, I like stating the value rather than risking a variable ‘first seen at parsing time’ setting. It is a judgement call. The below is from 11.1 but I’ve used it extensively in 10.2, where the impact is much more significant:

select  PARE_ID
       ,CRE_DATE
       ,VC_1
from child_heap
where pare_id in (select  id  from table(p_parents) )
and   cre_date >=trunc(p_date) and cre_date < trunc(p_date+1)
--
----------------------------------------------------------------------------
| Id  | Operation                             | Name       | Rows  | Bytes |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |            |       |       |
|*  1 |  FILTER                               |            |       |       |
|   2 |   NESTED LOOPS                        |            |       |       |
|   3 |    NESTED LOOPS                       |            |     1 |    71 |
|   4 |     SORT UNIQUE                       |            |     4 |     8 |
|   5 |      COLLECTION ITERATOR PICKLER FETCH|            |     4 |     8 |
|*  6 |     INDEX RANGE SCAN                  | CHHE_PK    |     1 |       |
|   7 |    TABLE ACCESS BY INDEX ROWID        | CHILD_HEAP |     1 |    69 |
----------------------------------------------------------------------------
--
-- and now with hint
--
select  PARE_ID
       ,CRE_DATE
       ,VC_1
from child_heap
where pare_id in (select /*+ CARDINALITY(p,11) */ id  from table(p_parents) p)
and   cre_date >=trunc(p_date) and cre_date < trunc(p_date+1)
--
-----------------------------------------------------------------------------
| Id  | Operation                             | Name       | Rows  | Bytes |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |            |       |       |
|*  1 |  FILTER                               |            |       |       |
|   2 |   NESTED LOOPS                        |            |       |       |
|   3 |    NESTED LOOPS                       |            |     1 |    71 |
|   4 |     SORT UNIQUE                       |            |    11 |    22 |
|   5 |      COLLECTION ITERATOR PICKLER FETCH|            |    11 |    22 |
|*  6 |     INDEX RANGE SCAN                  | CHHE_PK    |     1 |       |
|   7 |    TABLE ACCESS BY INDEX ROWID        | CHILD_HEAP |     1 |    69 |
-----------------------------------------------------------------------------

Note the change of ROWS to 11 in step 5. In V10 this is a change from blank to 11 and, in real situations, do not be at all surprised if the plan changes dramatically – away from nested loop access and more to hash joins. {I should note, the cardinality hint is not documented in Oracle 10 or 11 and any use you make of it in live code is your responsibility. Sorry about that}.

What about my second point, about testing them? Well, as an example of testing Pickler processing of SQL arrays, which are defined SQL types, this is what I had to do to run my basic test:

-- pare_types_demo.sql
--
-- create and use some record and table types.
--
create or replace type pare_rec as object
(id number(10)
,name varchar2(100))
/
create or replace type pare_rec_t
as table of pare_rec
/
--
set serveroutput on size unlimited
--
create or replace procedure get_children
(p_parents in pare_rec_t
,p_date    in date)
is
cursor c_collect is
select  PARE_ID
       ,CRE_DATE
       ,VC_1
from child_heap
where pare_id in (select id from table(p_parents))
and   cre_date >=trunc(p_date) and cre_date < trunc(p_date+1);
begin
  for v_rec in c_collect
  loop
    dbms_output.put_line('parent '||to_char(v_rec.pare_id)||' on '||to_char(v_rec.cre_date)
    ||' value '||v_rec.vc_1);
  end loop;
end;
/
--
declare
l_date        date           :=sysdate-100;
l_parents     pare_rec_t;
begin
  l_parents :=pare_rec_T(PARE_REC(1,'ERIC')
                        ,PARE_REC(2,'BARRY')
                        ,PARE_REC(3,'KERMIT')
                        ,PARE_REC(4,'GONZO')
                        );
  GET_CHILDREN (L_PARENTS,L_DATE);
END;
/

I had to create a table type, which is the SQL array, and this was based on an object type which I had to create first {you can have table types based on standard SQL types but very often they are based on a “row” object}. After creating the stored procedure, I had to define and populate the array with a set of records, which I then passed in to my procedure call. {If you want to repeat this yourself, check out my postings on IOTs to get the table creation statement for table CHILD_HEAP}.
Now, I created those types so I had access to them. If those types do not belong to you, you have to be granted execute on the types to reference them. Not select – execute. Some sites have a pretty strict attitude to granting execute on anything, and types seem to get forgotten when the execute privileges against packages and procedures are set up. In a recent situation I had, I was forced to do some testing work on Live and it had taken people with big sticks to get me select access on data. Execute privileges were totally refused. Calmly explaining why it was needed and how it was acceptable fell on not so much deaf as bricked-up ears.
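For completeness, the grants look something like this – the schema and user names are made up for illustration:

```sql
-- EXECUTE, not SELECT, is what lets another schema reference the types
-- in table() casts. app_owner and report_user are hypothetical names.
grant execute on app_owner.pare_rec   to report_user;
grant execute on app_owner.pare_rec_t to report_user;
```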

So, for testing, the reference to an array passed in is often replaced by a little sub-select. After all, quite often what is being passed in for a pickler fetch is actually a set of records {but a subset of the rows} from a table that has been collected by a previous processing step.
As an example of such a change:

where  ev.SYS_ID                       = pag.SYS_ID
and    ev.EXT_ID                       = pag.EXT_ID
and    P_EXT_ID                        = pag.EXT_ID
and    P_SYS_ID                        = pag.SYS_ID
and    ev.CREATED_DT                  >= pag.START_DT
and (ev.ID, ev.TX) in (select ID,TX from table(P_ARRAY_IN))
--
--BECOMES
--
where  ev.SYS_ID                       = pag.SYS_ID
and    ev.EXT_ID                       = pag.EXT_ID
and    P_EXT_ID                        = pag.EXT_ID
and    P_SYS_ID                        = pag.SYS_ID
and    ev.CREATED_DT                  >= pag.START_DT
--and (ev.ID, ev.TX) in (select ID,TX from table(P_ARRAY_IN)) --NB swap back in before release!
and (ev.ID, ev.TX) in (select ID,TX from CUSTOMERS WHERE SURNAME='WIDLAKE' and FFNAME='MARTIN')

All that has changed is that we now have a little sub-select rather than the casting of the SQL array into a table. And, heck, as the developer might say, those were the records that would have been passed in; the code still works as expected and the same data comes back. No identifying which object types you need, no getting the execute permissions, no populating it yourself in the test harness – we can just swap the array back in later.

{I apologise to good developers, who even now are throwing imaginary darts at me. I know Dawn, you would certainly not do this. But I’ve seen it a couple of times. Developers have enough on their plate to go worrying about esoteric aspects of the CBO}

But the thing is, Oracle can look at that select and evaluate its cost and get an expected cardinality. The pickler fetch version has the issues I’ve just covered. I had to deal with a situation just like this last year; it does happen. In dev it was fine, in pre-live testing it was not.

What about memory leaks? Well, I had lots of issues with SQL arrays and memory leaks with Oracle 10.1 and 10.2 at one client site, and there is a documented bug in Oracle 8 with pickler fetch and memory leaks, but I have to confess, a quick metalink search did not find any hits for Oracle 10 and 11. So maybe you should not trust me on that one. In the situation I saw, the arrays were massive – several MB at times – so if you are using SQL arrays to pass in a modest list of e.g. accounts or customers, it is not going to be an issue anyway.
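If you do suspect such a leak, one simple way to keep an eye on it is to compare the session PGA statistics before and after exercising the array-heavy code. A sketch (the statistic names are as they appear in v$statname):

```sql
-- Run this before and after the calls that populate and use the
-- SQL arrays, and compare the values; a steadily climbing
-- 'session pga memory' that never comes back down is the warning sign.
select sn.name, st.value
from   v$mystat   st
       join v$statname sn on sn.statistic# = st.statistic#
where  sn.name in ('session pga memory', 'session pga memory max');
```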

You know, this was just going to be a quick post on something I’ve been meaning to mention for months, not a small essay :-) .

Real-Time SQL Monitoring - Statement Not Appearing

Like Greg Rahn, I've looked at many SQL Monitoring reports over the past year or two. Possibly not as many as Greg, but it's become my default method of communicating* SQL performance issues to colleagues to the point that some might be finding it irritating by now, whilst others are hooked from the start. (Personally, I can't understand those who aren't hooked from the start!)

One of those who have been hooked came to me with a problem last week. He simply couldn't see his report in the OEM SQL Monitoring screen and after asking him if it was really running right now (it was) and attempting a re-run with a /*+ MONITOR */ hint, I was almost stumped. Then I suggested we fall back on using DBMS_XPLAN.DISPLAY_CURSOR to get the plan and when I saw the results, I suddenly understood what the problem was. This was a massive plan! It wasn't a particularly complex query but it was referencing a Data Dictionary view (I can't remember which one now) which expanded out into what looked like hundreds of lines. Which was the problem.

There's a hidden parameter - _sqlmon_max_planlines, with a default value of 300 - which sets the number of plan lines beyond which a statement will not be monitored. This statement exceeded that limit.
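If you have SYS access you can check the current setting for yourself - hidden parameters are undocumented and unsupported, so treat this as a diagnostic sketch only:

```sql
-- x$ tables are only visible to SYS (or via views you create and grant).
select i.ksppinm  as name
      ,v.ksppstvl as value
from   x$ksppi  i
      ,x$ksppcv v
where  i.indx    = v.indx
and    i.ksppinm = '_sqlmon_max_planlines';
```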

A small post, but hopefully useful should you ever wonder why a statement isn't appearing. Some day soon there'll be another post about why you can't find statements that you know executed fairly recently, which is a much more common problem in my day-to-day work. (As a preview, that parameter is _sqlmon_max_plan.) Then another one about the mysterious disappearing tabs!

* Communication is where SQL Mon excels, with the ability to send ACTIVE reports to allow others to dig around in the detail.
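Generating one of those ACTIVE reports for mailing around can be done with DBMS_SQLTUNE on 11.2 - the sql_id value below is just a placeholder:

```sql
-- Spool the output to an .html file and send it on; the recipient only
-- needs a browser (the report pulls its JS/Flash libraries from the web).
select dbms_sqltune.report_sql_monitor(
         sql_id => '0s199s6vd3vhh',  -- placeholder sql_id
         type   => 'ACTIVE')
from   dual;
```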

Advertising appearing?

I’m curious – is anyone visiting my blog seeing some form of advertising popping up?

I ask as there is a section on “links clicked” in the stats page and, rather than the usual traffic of people clicking on the oaktable logo or people in my blogroll, the most common link is for ecopressed-dot-com. I’ve never heard of them. When I go in to my blog I don’t see anything, but then it knows it is “me” so maybe it would not.

I’m not too bothered about it, after all WordPress are hosting my blog for pretty much nothing {I pay them some outrageous sum of a few US dollars a year so I can alter my CSS file and thus make the layout wider}. I’m just curious.

I wonder if this is a result of increased traffic to my site? I’m still fairly small-fry compared to lots of other sites but, as I’ve been putting out more stuff of late, I think I’m going up the web rankings. Oddly enough, those pictures of bullets I put in last Friday’s philosophy post have been very popular. I can’t help but feel that most people looking for information on bullets are going to find a blog about IT somewhat disappointing :-)

Indexing A Column With Just One Distinct Value (All The Madmen)

When one thinks about a column that might benefit from being indexed, one generally considers columns with lots of different values so that the selectivity of the column is such that relatively few rows get selected, making the index appealing to the cost based optimizer.   There are of course many exceptions to this generalisation [...]

Logical I/O Evolution - Part 3: 11g

Preface (with apologies to Kevin Closson)

This blog post is too long

Introduction

In the previous part of this series I've already demonstrated that the logical I/O optimization of the Table Prefetching feature depends on the order of the row sources - and 11g takes this approach a big step further.

It is very interesting that 11g does not require any particular feature like Table Prefetching or Nested Loop Join Batching (another new feature introduced in 11g) to take advantage of the Logical I/O optimization - it seems to be available even with the most basic form of a Nested Loop join.

Note that this optimization has already been mentioned several times, but there has been some confusion so far about whether it is related to another new feature introduced with 11g - the so called "fastpath" consistent gets.

Buffer Pinning Optimization

So, let's repeat the already known test case from the previous parts in 11g. Another nice feature of 11g is that we now have full control over the Nested Loop plan shapes / features used by Oracle - we can choose from the "classic" Nested Loop Join, Table Prefetching and Nested Loop Join Batching.

This is controlled via the [NO_]NLJ_BATCHING and [NO_]NLJ_PREFETCH hints which you will also find in the "outline" hint list generated for Plan Stability from 11g on.

Interestingly, if I wanted to have the "classic" Nested Loop shape, I couldn't achieve that by combining the NO_NLJ_BATCHING and NO_NLJ_PREFETCH hints - one seemed to disable the other - so I had to resort to the "_nlj_batching_enabled" parameter to disable Nested Loop Join Batching.

So this is what the query hints need to look like if we want to have the classic Nested Loop Join shape in 11g:

select
  max(b_filler), max(a_filler)
from (
  select /*+ leading(a) use_nl(a b) opt_param('_nlj_batching_enabled', 0) no_nlj_prefetch(b) */
    a.id as a_id, a.filler as a_filler, b.id as b_id, b.filler as b_filler
  from
    t2 a
  , t1 b
  where
    a.id = b.id
);

If you want to test with different plan shapes you can simply modify the hint section as required, for example you can get the Table Prefetching shape by changing above hint from NO_NLJ_PREFETCH to NLJ_PREFETCH etc.

Let's start with the data set where T1 and T2 are not in the same order, and stick to the classic plan shape:

11.2.0.1 Classic Nested Loop - Random order

Inner row source Unique Index - T1 different order than T2

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |        |      1 |        |   202K(100)|      1 | 00:00:03.67|    310K |
|   1 |  SORT AGGREGATE                |        |      1 |      1 |            |      1 | 00:00:03.67|    310K |
|   2 |   NESTED LOOPS                 |        |      1 |   100K |    202K (1)|   100K | 00:00:03.47|    310K |
|   3 |    TABLE ACCESS FULL           | T2     |      1 |   100K |    2720 (1)|   100K | 00:00:00.21|   10010 |
|   4 |    TABLE ACCESS BY INDEX ROWID | T1     |   100K |      1 |       2 (0)|   100K | 00:00:02.54|    300K |
|*  5 |     INDEX UNIQUE SCAN          | T1_IDX |   100K |      1 |       1 (0)|   100K | 00:00:00.76|    200K |
----------------------------------------------------------------------------------------------------------------

Inner row source Non-Unique Index - T1 different order than T2

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |        |      1 |        |   202K(100)|      1 | 00:00:04.40|    311K |
|   1 |  SORT AGGREGATE                |        |      1 |      1 |            |      1 | 00:00:04.40|    311K |
|   2 |   NESTED LOOPS                 |        |      1 |   100K |    202K (1)|   100K | 00:00:04.20|    311K |
|   3 |    TABLE ACCESS FULL           | T1     |      1 |   100K |    2720 (1)|   100K | 00:00:00.20|   10010 |
|   4 |    TABLE ACCESS BY INDEX ROWID | T2     |   100K |      1 |       2 (0)|   100K | 00:00:03.28|    301K |
|*  5 |     INDEX RANGE SCAN           | T2_IDX |   100K |      1 |       1 (0)|   100K | 00:00:01.08|    201K |
----------------------------------------------------------------------------------------------------------------

So far no difference to previous results, the non-unique index variant is still slower than the unique one, and we do not see any special buffer pinning optimization apart from the one we've already seen in the baseline test.

The relevant session statistics:

Statistics Name                                 Unique  Non-Unique  Difference
---------------------------------------------  -------  ----------  ----------
STAT..table scan blocks gotten                  10,000      10,000           0
STAT..table scan rows gotten                   100,000     100,000           0
STAT..table fetch by rowid                     100,000     100,002           2
STAT..consistent gets                          310,012     311,108       1,096
STAT..consistent gets from cache               310,012     311,108       1,096
STAT..session logical reads                    310,012     311,108       1,096
STAT..buffer is not pinned count               200,002     100,012     -99,990
STAT..buffer is pinned count                    99,999     199,993      99,994
STAT..index fetch by key                       100,000           2     -99,998
STAT..rows fetched via callback                100,000           2     -99,998
STAT..index scans kdiixs1                            0     100,000     100,000
STAT..consistent gets - examination            300,001     100,007    -199,994
STAT..consistent gets from cache (fastpath)     10,011     211,101     201,090
STAT..no work - consistent read gets            10,001     211,091     201,090
LATCH.cache buffers chains                     320,024     522,216     202,192

Nothing spectacular here either, but there are at least some interesting points to mention:

- We can see that Oracle took advantage of the so called "fastpath" consistent gets for the "normal" consistent gets - they still took two latch acquisitions per get though. The "fastpath" seems to be a code optimization when buffers get pinned that probably requires fewer CPU cycles. I don't know if the code change addresses any further contention/concurrency issues apart from being "faster" (faster is always better, isn't it :-)

- The "buffer is pinned count" statistics are not consistent with what we've seen from 10g:

* The "unique index" variant already misses 90,000 pins, but does not produce more consistent gets, so in total we do not arrive at the anticipated 500,000 buffer visits any more - either something seems to be missing from the instrumentation or Oracle does something fundamentally different

* The "non-unique index" variant however records 10,000 excess pinned buffers, so we end up with 510,000 buffer visits recorded in total

Let's repeat the same with the T1 and T2 data ordered in the same way - but not ordered by ID (so simply uncomment the second call to DBMS_RANDOM.SEED(0)):

11.2.0.1 Classic Nested Loop - Same random order

Inner row source Unique Index - T1 same random order as T2

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |        |      1 |        |   202K(100)|      1 | 00:00:03.55|    310K |
|   1 |  SORT AGGREGATE                |        |      1 |      1 |            |      1 | 00:00:03.55|    310K |
|   2 |   NESTED LOOPS                 |        |      1 |   100K |    202K (1)|   100K | 00:00:03.35|    310K |
|   3 |    TABLE ACCESS FULL           | T2     |      1 |   100K |    2720 (1)|   100K | 00:00:00.22|   10010 |
|   4 |    TABLE ACCESS BY INDEX ROWID | T1     |   100K |      1 |       2 (0)|   100K | 00:00:02.41|    300K |
|*  5 |     INDEX UNIQUE SCAN          | T1_IDX |   100K |      1 |       1 (0)|   100K | 00:00:00.77|    200K |
----------------------------------------------------------------------------------------------------------------

Inner row source Non-Unique Index - T1 same random order as T2

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |        |      1 |        |   202K(100)|      1 | 00:00:04.23|    221K |
|   1 |  SORT AGGREGATE                |        |      1 |      1 |            |      1 | 00:00:04.23|    221K |
|   2 |   NESTED LOOPS                 |        |      1 |   100K |    202K (1)|   100K | 00:00:04.02|    221K |
|   3 |    TABLE ACCESS FULL           | T1     |      1 |   100K |    2720 (1)|   100K | 00:00:00.21|   10010 |
|   4 |    TABLE ACCESS BY INDEX ROWID | T2     |   100K |      1 |       2 (0)|   100K | 00:00:03.10|    211K |
|*  5 |     INDEX RANGE SCAN           | T2_IDX |   100K |      1 |       1 (0)|   100K | 00:00:01.09|    201K |
----------------------------------------------------------------------------------------------------------------

So, that's interesting: We can already see here the same optimization for the non-unique index kicking in as we saw in 10g with Table Prefetching, although the classic plan shape gets used.

The statistics correspond to the result - but there is a slight difference to the 10.2 Table Prefetching case: The "buffer is pinned count" is at least "self-consistent" for the "non-unique index" variant, so there is no "excess" pinning recorded as with the Table Prefetching.

Statistics Name                                 Unique  Non-Unique  Difference
---------------------------------------------  -------  ----------  ----------
STAT..table scan blocks gotten                  10,000      10,000           0
STAT..table scan rows gotten                   100,000     100,000           0
STAT..table fetch by rowid                     100,000     100,002           2
LATCH.cache buffers chains                     320,030     342,242      22,212
STAT..consistent gets                          310,012     221,124     -88,888
STAT..consistent gets from cache               310,012     221,124     -88,888
STAT..session logical reads                    310,012     221,124     -88,888
STAT..Cached Commit SCN referenced             110,000      20,007     -89,993
STAT..index fetch by key                       100,000           2     -99,998
STAT..rows fetched via callback                100,000           2     -99,998
STAT..index scans kdiixs1                            0     100,000     100,000
STAT..consistent gets from cache (fastpath)     10,011     121,117     111,106
STAT..no work - consistent read gets            10,001     121,107     111,106
STAT..buffer is not pinned count               200,002      10,028    -189,974
STAT..buffer is pinned count                    99,999     289,977     189,978
STAT..consistent gets - examination            300,001     100,007    -199,994

Redundant Filter Optimization

As I've just demonstrated, the inner table lookup for the "unique index" variant does not use the buffer pinning optimization. It's an interesting little detail that in 11.1.0.7 and 11.2.0.1 putting a filter on the inner table lookup changes the result for the "unique index" variant, so running a query like this, using a redundant filter that doesn't change the overall result:

select
  max(b_filler), max(a_filler)
from (
  select /*+ leading(a) use_nl(a b) opt_param('_nlj_batching_enabled', 0) no_nlj_prefetch(b) */
    a.id as a_id, a.filler as a_filler, b.id as b_id, b.filler as b_filler
  from
    t2 a
  , t1 b
  where
    a.id = b.id
  and substr(b.filler, 1, 1) = 'x'
);

will result in such an output:

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |        |      1 |        |   202K(100)|      1 | 00:00:03.25|    220K |
|   1 |  SORT AGGREGATE                |        |      1 |      1 |            |      1 | 00:00:03.25|    220K |
|   2 |   NESTED LOOPS                 |        |      1 |   1000 |    202K (1)|   100K | 00:00:03.05|    220K |
|   3 |    TABLE ACCESS FULL           | T2     |      1 |   100K |    2720 (1)|   100K | 00:00:00.21|   10010 |
|*  4 |    TABLE ACCESS BY INDEX ROWID | T1     |   100K |      1 |       2 (0)|   100K | 00:00:02.12|    210K |
|*  5 |     INDEX UNIQUE SCAN          | T1_IDX |   100K |      1 |       1 (0)|   100K | 00:00:00.77|    200K |
----------------------------------------------------------------------------------------------------------------

Session Statistics:

Statistics Name                                 Unique  Non-Unique  Difference
---------------------------------------------  -------  ----------  ----------
STAT..buffer is pinned count                   289,998     289,977         -21
STAT..buffer is not pinned count                10,003      10,028          25
LATCH.JS queue state obj latch                       0          36          36
LATCH.row cache objects                             67         110          43
STAT..CPU used when call started                    59         119          60
STAT..DB time                                       59         119          60
STAT..CPU used by this session                      56         119          63
LATCH.enqueues                                       2          78          76
LATCH.enqueue hash chains                            3          80          77
LATCH.simulator hash latch                       9,111       9,304         193
STAT..consistent gets                          220,012     221,124       1,112
STAT..consistent gets from cache               220,012     221,124       1,112
STAT..session logical reads                    220,012     221,124       1,112
STAT..consistent gets - examination            200,001     100,007     -99,994
STAT..index fetch by key                       100,000           2     -99,998
STAT..index scans kdiixs1                            0     100,000     100,000
STAT..consistent gets from cache (fastpath)     20,011     121,117     101,106
STAT..no work - consistent read gets            20,001     121,107     101,106
LATCH.cache buffers chains                     240,024     342,248     102,224

The interesting part here is that the "unique index" variant now uses the same buffer pinning optimization as the "non-unique index" one - but resorts to "normal" consistent gets (using the "fastpath" version in this case) for the random table access.

I don't know if this is a feature or a side-effect of a bug, because it ceases to work in 11.2.0.2 - there the "unique index" variant cannot be convinced to make use of the "buffer pinning" optimization; it always performs the "shortcut" logical I/O on the inner table lookup, even with a filter specified.

We'll see later on that this has some interesting consequences with concurrent executions.

Ordered Data Sets

OK, now finally the big one: Let's repeat the test case with data sorted by ID, so by using the ORDER BY ID instead of ORDER BY DBMS_RANDOM.VALUE when populating the tables:

11.2.0.1 Classic Nested Loop - data ordered by ID

Inner row source Unique Index - T1 and T2 ordered by ID

---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 202K(100)| 1 |00:00:03.42 | 122K|
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:03.42 | 122K|
| 2 | NESTED LOOPS | | 1 | 100K| 202K (1)| 100K|00:00:03.21 | 122K|
| 3 | TABLE ACCESS FULL | T2 | 1 | 100K| 2720 (1)| 100K|00:00:00.21 | 10010 |
| 4 | TABLE ACCESS BY INDEX ROWID| T1 | 100K| 1 | 2 (0)| 100K|00:00:02.27 | 112K|
|* 5 | INDEX UNIQUE SCAN | T1_IDX | 100K| 1 | 1 (0)| 100K|00:00:00.55 | 12314 |
---------------------------------------------------------------------------------------------------------------

Inner row source Non-Unique Index - T1 and T2 ordered by ID

---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 202K(100)| 1 |00:00:03.64 | 33143 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:03.64 | 33143 |
| 2 | NESTED LOOPS | | 1 | 100K| 202K (1)| 100K|00:00:03.45 | 33143 |
| 3 | TABLE ACCESS FULL | T1 | 1 | 100K| 2720 (1)| 100K|00:00:00.20 | 10010 |
| 4 | TABLE ACCESS BY INDEX ROWID| T2 | 100K| 1 | 2 (0)| 100K|00:00:02.55 | 23133 |
|* 5 | INDEX RANGE SCAN | T2_IDX | 100K| 1 | 1 (0)| 100K|00:00:00.67 | 13126 |
---------------------------------------------------------------------------------------------------------------

The result is staggering: The "non-unique" index variant apparently manages to visit 500,000 buffers with just 33K logical I/Os. It is also almost as fast as the "unique index" variant that obviously does not keep the buffers pinned for the inner table random lookup - let's check the session statistics:

Statistics Name Unique Non-Unique Difference
----------------------------------------------------- -------- ----------- -----------
STAT..table scan blocks gotten 10,000 10,000 0
STAT..table scan rows gotten 100,000 100,000 0
STAT..table fetch by rowid 100,000 100,002 2
STAT..consistent gets from cache (fastpath) 11,059 26,587 15,528
STAT..no work - consistent read gets 11,049 26,577 15,528
LATCH.cache buffers chains 133,384 60,829 -72,555
STAT..consistent gets 122,324 33,149 -89,175
STAT..consistent gets from cache 122,324 33,149 -89,175
STAT..session logical reads 122,324 33,149 -89,175
STAT..index fetch by key 100,000 2 -99,998
STAT..rows fetched via callback 100,000 2 -99,998
STAT..index scans kdiixs1 0 100,000 100,000
STAT..consistent gets - examination 111,265 5,470 -105,795
STAT..buffer is not pinned count 200,014 10,028 -189,986
STAT..buffer is pinned count 5,107 195,440 190,333

We can tell now different things from these statistics:

- The "non-unique index" variant requires just 60,000 latch acquisitions - which corresponds to the reduced number of logical I/Os

- The session statistics only "explain" 195,000 buffer visits via already pinned buffers and 33,000 buffer visits recorded as logical I/Os, so approx. 270,000 buffer visits are missing from the statistics. Compared to the results of the "unordered" test case we actually see a "reduction" in visits to already pinned buffers (199,993 vs. 195,440), which makes these figures questionable

- The "unique index" variant still does the "short-cut" logical I/O on the inner table random lookup and hence requires actually more logical I/O and latch acquisitions in this case than the "non-unique index" variant

As we've seen above, in 11.1.0.7 and 11.2.0.1 Oracle switches to "normal" consistent gets for the "unique index" variant if a filter is put on the inner table random lookup - and indeed, when repeating this experiment with the ordered data set, we see these results:

---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 202K(100)| 1 |00:00:02.97 | 32315 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:02.97 | 32315 |
| 2 | NESTED LOOPS | | 1 | 1000 | 202K (1)| 100K|00:00:02.76 | 32315 |
| 3 | TABLE ACCESS FULL | T2 | 1 | 100K| 2720 (1)| 100K|00:00:00.22 | 10010 |
|* 4 | TABLE ACCESS BY INDEX ROWID| T1 | 100K| 1 | 2 (0)| 100K|00:00:01.81 | 22305 |
|* 5 | INDEX UNIQUE SCAN | T1_IDX | 100K| 1 | 1 (0)| 100K|00:00:00.56 | 12302 |
---------------------------------------------------------------------------------------------------------------

So by switching to the "normal" consistent gets the buffer pinning optimization gets used for the inner table lookup also for the "unique index" variant (only reproducible in 11.1.0.7 and 11.2.0.1). The session statistics:

Statistics Name Unique Non-Unique Difference
----------------------------------------------------- -------- ----------- -----------
STAT..table scan blocks gotten 10,000 10,000 0
STAT..table scan rows gotten 100,000 100,000 0
STAT..table fetch by rowid 100,000 100,002 2
STAT..buffer is not pinned count 10,018 10,028 10
STAT..buffer is pinned count 195,103 195,440 337
STAT..consistent gets 32,315 33,149 834
STAT..consistent gets from cache 32,315 33,149 834
STAT..session logical reads 32,315 33,149 834
STAT..consistent gets from cache (fastpath) 21,050 26,587 5,537
STAT..no work - consistent read gets 21,040 26,577 5,537
STAT..consistent gets - examination 11,265 5,470 -5,795
LATCH.cache buffers chains 53,366 60,835 7,469
STAT..index fetch by key 100,000 2 -99,998
STAT..index scans kdiixs1 0 100,000 100,000

So the "unique index" variant now shows similar results, and also a similar "gap" in the buffer visits explained by the statistics.

A slightly funny point is that by adding a "useless" filter we actually seem to arrive at a faster execution time due to the optimization kicking in - something that looks quite counter-intuitive and only seems to work in particular versions.

"Fastpath" consistent gets

To see if this optimization depends on the new "fastpath" consistent gets, let's turn this new feature off by setting "_fastpin_enable" to 0 and restarting the instance:

alter system set "_fastpin_enable" = 0 scope = spfile;

I'm showing here the results for the "inner table filter" variation - those for the original case without the additional filter likewise correspond to the ones obtained with "fast pinning" enabled:

11.2.0.1 Classic Nested Loop - data ordered by ID, fast pins disabled, inner table filter

Inner row source Unique Index - T1 and T2 ordered by ID

---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 202K(100)| 1 |00:00:02.90 | 32315 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:02.90 | 32315 |
| 2 | NESTED LOOPS | | 1 | 1000 | 202K (1)| 100K|00:00:02.70 | 32315 |
| 3 | TABLE ACCESS FULL | T2 | 1 | 100K| 2720 (1)| 100K|00:00:00.21 | 10010 |
|* 4 | TABLE ACCESS BY INDEX ROWID| T1 | 100K| 1 | 2 (0)| 100K|00:00:01.79 | 22305 |
|* 5 | INDEX UNIQUE SCAN | T1_IDX | 100K| 1 | 1 (0)| 100K|00:00:00.53 | 12302 |
---------------------------------------------------------------------------------------------------------------

Inner row source Non-Unique Index - T1 and T2 ordered by ID

---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 202K(100)| 1 |00:00:03.86 | 33143 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:03.86 | 33143 |
| 2 | NESTED LOOPS | | 1 | 1000 | 202K (1)| 100K|00:00:03.67 | 33143 |
| 3 | TABLE ACCESS FULL | T1 | 1 | 100K| 2720 (1)| 100K|00:00:00.21 | 10010 |
|* 4 | TABLE ACCESS BY INDEX ROWID| T2 | 100K| 1 | 2 (0)| 100K|00:00:02.74 | 23133 |
|* 5 | INDEX RANGE SCAN | T2_IDX | 100K| 1 | 1 (0)| 100K|00:00:00.71 | 13126 |
---------------------------------------------------------------------------------------------------------------

So the same optimization kicked in, and we can tell from the session statistics that the "fastpath" consistent gets were indeed not used:

Statistics Name Unique Non-Unique Difference
----------------------------------------------------- -------- ----------- -----------
STAT..table scan blocks gotten 10,000 10,000 0
STAT..table scan rows gotten 100,000 100,000 0
STAT..table fetch by rowid 100,000 100,002 2
STAT..buffer is not pinned count 10,018 10,028 10
STAT..buffer is pinned count 195,103 195,440 337
STAT..consistent gets 32,315 33,149 834
STAT..consistent gets from cache 32,315 33,149 834
STAT..session logical reads 32,315 33,149 834
STAT..no work - consistent read gets 21,040 26,577 5,537
STAT..consistent gets - examination 11,265 5,470 -5,795
LATCH.cache buffers chains 53,372 60,829 7,457
STAT..index fetch by key 100,000 2 -99,998
STAT..index scans kdiixs1 0 100,000 100,000

The only significant difference is the absence of the "consistent gets from cache (fastpath)" statistics.

Nested Loop Join Batching

Finally, let's check whether the new Nested Loop Join Batching optimization has any additional effect on this test case. Changing the hints like this does the job:

.
.
.
select /*+ leading(a) use_nl(a b) opt_param('_nlj_batching_enabled', 1) no_nlj_prefetch(b) */
.
.
.

11.2.0.1 Nested Loop Batching - data ordered by ID, inner table filter

Inner row source Unique Index - T1 and T2 ordered by ID

---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 202K(100)| 1 |00:00:02.89 | 32306 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:02.89 | 32306 |
| 2 | NESTED LOOPS | | 1 | | | 100K|00:00:02.70 | 32306 |
| 3 | NESTED LOOPS | | 1 | 1000 | 202K (1)| 100K|00:00:01.43 | 22306 |
| 4 | TABLE ACCESS FULL | T2 | 1 | 100K| 2720 (1)| 100K|00:00:00.20 | 10010 |
|* 5 | INDEX UNIQUE SCAN | T1_IDX | 100K| 1 | 1 (0)| 100K|00:00:00.53 | 12296 |
|* 6 | TABLE ACCESS BY INDEX ROWID| T1 | 100K| 1 | 2 (0)| 100K|00:00:00.57 | 10000 |
---------------------------------------------------------------------------------------------------------------

Inner row source Non-Unique Index - T1 and T2 ordered by ID

---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 202K(100)| 1 |00:00:03.05 | 33128 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:03.05 | 33128 |
| 2 | NESTED LOOPS | | 1 | | | 100K|00:00:02.85 | 33128 |
| 3 | NESTED LOOPS | | 1 | 1000 | 202K (1)| 100K|00:00:01.57 | 23128 |
| 4 | TABLE ACCESS FULL | T1 | 1 | 100K| 2720 (1)| 100K|00:00:00.20 | 10010 |
|* 5 | INDEX RANGE SCAN | T2_IDX | 100K| 1 | 1 (0)| 100K|00:00:00.67 | 13118 |
|* 6 | TABLE ACCESS BY INDEX ROWID| T2 | 100K| 1 | 2 (0)| 100K|00:00:00.57 | 10000 |
---------------------------------------------------------------------------------------------------------------

Apart from some minor differences in the number of logical I/Os it doesn't change the outcome. The same applies to the session statistics:

Statistics Name Unique Non-Unique Difference
----------------------------------------------------- -------- ----------- -----------
STAT..table scan blocks gotten 10,000 10,000 0
STAT..table scan rows gotten 100,000 100,000 0
STAT..table fetch by rowid 100,000 100,002 2
STAT..buffer is not pinned count 10,006 10,010 4
STAT..buffer is pinned count 195,115 195,458 343
STAT..consistent gets 32,306 33,134 828
STAT..consistent gets from cache 32,306 33,134 828
STAT..session logical reads 32,306 33,134 828
STAT..consistent gets from cache (fastpath) 21,041 26,572 5,531
STAT..no work - consistent read gets 21,031 26,562 5,531
STAT..consistent gets - examination 11,265 5,470 -5,795
LATCH.cache buffers chains 53,348 60,816 7,468
STAT..index fetch by key 100,000 2 -99,998
STAT..index scans kdiixs1 0 100,000 100,000

What is interesting to see, however, is that it seems to perform faster - in particular the "non-unique index" variant is now really pretty close to the "unique index" variant. So although Nested Loop Join Batching doesn't show any significant changes in the statistics and latch acquisitions, it seems to save CPU cycles and performs better even without any physical I/O involved.

As a side note, if you want to check the effects of the "Nested Loop Join Batching" on physical I/O you need to be aware of an odd behaviour I've experienced during my tests: If any kind of row source statistics sampling was enabled - by using STATISTICS_LEVEL = ALL, the GATHER_PLAN_STATISTICS hint or even enabling (extended) SQL trace - the optimized, batched form of physical I/O could not be reproduced. You could tell this from the session statistics that start with "Batched IO%" - these all stayed at 0. Only when disabling all these things were the effects visible and the corresponding statistics non-zero. I don't know why this is the case, but it is an important detail when testing this feature. I'll probably publish a separate post on the physical I/O optimizations of the Vector/Batched I/O at some time in the future.
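A minimal sketch of a query to check these statistics for the current session (assuming sufficient privileges on the V$ views; the exact statistic names may vary slightly between versions):

```sql
-- Show the "Batched IO" session statistics for the current session.
-- If row source sampling / SQL trace was enabled, these typically stay at 0.
select sn.name, ms.value
from   v$mystat ms, v$statname sn
where  ms.statistic# = sn.statistic#
and    sn.name like 'Batched IO%'
and    ms.value > 0
order by sn.name;
```

Run it immediately after the test query in the same session, since V$MYSTAT accumulates over the session's lifetime.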

Scalability

When running the "data ordered by ID" version concurrently it can be seen that the "non-unique index" variant now scales almost as well as the "unique index" variant - so these two variants are now quite close not only in single-user mode; they both scale very well, too.

There is another interesting effect that can only be observed when running the test case with the unordered data set concurrently: In recent code releases (10.2.0.5, 11.2.0.1 and 11.2.0.2) the "shortcut" consistent gets on the inner table lookup that are used with the "unique index" variant get "downgraded" to "normal" consistent gets if there is concurrent access to the block. This can be observed in the session statistics and latch acquisitions:

Statistics Name Value
----------------------------------------------------- -----------
STAT..shared hash latch upgrades - no wait 99,995
STAT..RowCR attempts 100,000
STAT..RowCR hits 100,000
STAT..consistent gets from cache (fastpath) 10,011
STAT..no work - consistent read gets 10,000
STAT..consistent gets - examination 200,013
STAT..consistent gets 310,024
STAT..consistent gets from cache 310,024
LATCH.cache buffers chains 1,680,173

Note in particular how the "consistent gets - examination" statistic has decreased from 300,000 to 200,000. So with four concurrent executions this "unique index" variant suddenly requires approx. 420,000 latch acquisitions per execution in contrast to the usual 320,000. Since 11.2.0.2 does not support the "filter" trick to make use of the buffer pinning optimization for the inner table lookup with the ordered data set and the "unique index" variant, it suffers twice: Not only does it require single latch acquisitions for the inner table lookup, but due to the "downgrade" it performs two latch acquisitions per iteration, requiring a whopping 200,000 excess latch acquisitions per concurrent execution with the ordered-by-ID data set.

It's also interesting to note that the "RowCR" optimization is recorded in the session statistics. I couldn't find much information about it - it seems to have been in the code since 10.2 (with partial support already in 9.2 RAC), but until 10.2.0.5 it was only enabled in RAC mode and not in single-instance mode (see MOS note "Bug 4951888 - Row CR is not enabled for non RAC systems"). I could reproduce it only in 10.2.0.5, 11.2.0.1 and 11.2.0.2. According to the description it was specifically introduced to use row-level consistent gets instead of rolling back complete block versions for read consistency in RAC environments, where generating the previous version of a block might require undo blocks from remote instances. Why this optimization shows up in the above single-instance, read-only scenario where no rollback of the block version is required is not clear to me. It is however measurable that the "fallback" seems to slow down execution.

Whether this is a side-effect or a deliberate design choice that performs better in RAC environments or certain consistent read scenarios I can't tell yet. However, when switching off this optimization via alter system set "_row_cr" = false, this "downgrade" with concurrent execution no longer happens, and 11.2.0.2 performs better in my test cases - although it doesn't bring back the "filter" trick, so 11.2.0.2 is the only release where the "non-unique index" variant scales better with the ordered data set than the "unique index" variant.

A final word on scalability in general: I think it is important to point out that the test harness provided so far only checks for concurrent read access. Since it is interesting to see whether the "buffer pinning" optimization observed has any negative side effects on mixed read/write access to the buffers, I've published an updated script set that includes new versions of the concurrent execution master and slave scripts. These allow running a SELECT FOR UPDATE on both tables involved in the first session, with all other sessions in read-only mode, in order to test the effects of a mixed read/write concurrency scenario.

The result of this quite simple test shows that the buffer pinning optimization not only scales very well for read-only concurrency but also scales very well for the tested mixed read/write scenario. The provided test case might be a specific and simplistic one (there are some specialities with SELECT FOR UPDATE), and there might be other concurrency scenarios where the buffer pinning might not scale that well (for example, potential "free buffer waits" due to many blocks being pinned), but at least with this test case the result is quite impressive.

As a side note, the mixed read-write test is very interesting on its own in several ways, for example:

- It adds additional pressure on the buffer cache due to the clone copies created. A query similar to the one provided by Jonathan Lewis here can be quite revealing. You'll find out that you need a much larger cache to still have a fully cached test case (with an 8KB block size, at least 512MB to keep two 80MB segments fully cached!)

- It requires additional buffer cache for the undo blocks

- It will generate a much higher contention on the "cache buffers chains" latches due to the additional buffer cache activity (creating clone copies, rollbacks for consistent reads, current mode gets etc.)

- It requires applying undo to the blocks to arrive at a read-consistent version

- The buffers will have to be accessed in exclusive mode for write access
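The extra cache pressure from the clone copies mentioned above can be inspected directly in the buffer cache. A minimal sketch of such a query against V$BH (the T1/T2 names come from the test case; requires access to the DBA views, and is in the same spirit as, not a copy of, the Jonathan Lewis query referenced above):

```sql
-- Count buffer copies per object and status in the buffer cache:
-- 'xcur' = current version, 'cr' = consistent-read clone copies.
-- Several 'cr' copies per block show the cloning overhead of the
-- mixed read/write test.
select o.object_name, bh.status, count(*) as buffers
from   v$bh bh, dba_objects o
where  bh.objd = o.data_object_id
and    o.object_name in ('T1', 'T2')
group by o.object_name, bh.status
order by o.object_name, bh.status;
```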

The updated script set also contains an Excel sheet with results from my test runs on different hardware and Oracle versions as well as a sample query to analyse the buffer cache.

Summary

Oracle 11g extends the logical I/O optimizations that could already be seen in Oracle 10g when using the Table Prefetching Nested Loop shape - and it is available without any further optimizations like Table Prefetching or Nested Loop Join Batching. Nor does it depend on the new "fastpath" consistent gets introduced with 11g.

The efficiency of the optimization largely depends on the order of the data, so predicting it is not that easy - a bit like predicting the efficiency of the Subquery/Filter caching feature, which also depends on data patterns.

However, this knowledge might offer additional options for taking advantage of this optimization. Of course, introducing additional sort operations might easily outweigh the benefits achieved, but there might be cases where a sort is not that costly and can improve scalability/concurrency in extreme cases.

Closing remarks

This blog post got way too long.

Crowdsourcing Active SQL Monitor Reports

As my loyal readers will know, I have been a big (maybe BIG) fan of the SQL Monitor Report since its introduction in 11g. It would not surprise me if I have looked at over 1000 SQL Monitor Reports in the past 4+ years — so I’m pretty familiar with these bad boys. Since I find them so valuable (and many customers are now upgrading to 11g), I’ve decided to do a deep dive into the SQL Monitor Report at both Oracle OpenWorld 2011 in October and the UKOUG in December. I think I have some pretty interesting and educational examples, but for anyone willing to share Active SQL Monitor Reports from their system, I thought I would extend the possibility to have it publicly discussed at either one of these sessions (or even a future blog post). Sound cool? I think so, though I may be slightly biased.

The Rules & Requirements

Here are some rules, requirements, restrictions, etc.:

  1. The SQL Monitor Report requires Oracle Database 11g and the Oracle Database Tuning Pack.
  2. By sending me your SQL Monitor Report you implicitly grant permission to me to use it however I want (in my sessions, on my blog, on my refrigerator, etc.).
  3. If you want to scrub it (remove the SQL Text, rename tables, etc.), feel free, but if you make the report unusable, it will end up in the bit bucket.
  4. I will only consider SQL Monitor Reports that are of type EM or ACTIVE, not TEXT or HTML or XML.
  5. I prefer statements that use Parallel Execution, but will accept serial statements nonetheless.
  6. Active SQL Monitor Reports can be either saved from the EM/DB Console SQL Monitoring page, or via SQL*Plus (see code below).
  7. Once you save your Active SQL Monitor Report, validate it is functional from your browser (don’t send me broken stuff).

In order to participate in this once-in-a-lifetime offer, just email the Active SQL Monitor Report file as an attachment to sqlmon@structureddata.org. If you are going to be attending my session at either OOW11 or UKOUG11, let me know, so that if I choose your report I can notify you and you can bring your friends, significant other, boss, etc. Thanks in advance!

--
-- script to create an Active SQL Monitor Report given a SQL ID
-- 11.2 and newer (EM/ACTIVE types are not in 11.1)
--
set pagesize 0 echo off timing off linesize 1000 trimspool on trim on long 2000000 longchunksize 2000000 feedback off
spool sqlmon_4vbqtp97hwqk8.html

select dbms_sqltune.report_sql_monitor(report_level=>'ALL', type=>'EM', sql_id=>'4vbqtp97hwqk8') monitor_report from dual;

spool off

blevel=1

Here’s one of those quick answers I give sometimes on forums or newsgroups. I forget when I wrote this, and when, and what the specific question was – but it was something to do with rebuilding an index on a small table where data was constantly being deleted and inserted.

Another problem with high insert/delete rates appears with very small indexes.

If you have a table that is small but constantly recycles its space you may also find you have an index where the number of leaf blocks puts you close to the borderline between having blevel = 1 and blevel = 2. If the size crosses that border occasionally and the statistics are updated to reflect the change – which is quite likely for a table subject to lots of updates and deletes if you have automatic stats collection enabled – then execution plans could change, resulting in dramatic changes in performance.

The workaround is fairly obvious - don't let Oracle collect stats automatically on that table; instead create a stats-collection strategy that eliminates the change in blevel. For example, keep the stats locked except when you run your own code to deal with them, making sure that you overwrite the index blevel with 1 even if it has just crossed the boundary to 2.
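As a sketch of such a strategy, assuming a table SMALL_TAB with index SMALL_TAB_IDX (names invented for illustration; DBMS_STATS calls as documented for 10g/11g):

```sql
-- One-off: lock the stats so the automatic job leaves them alone.
begin
  dbms_stats.lock_table_stats(ownname => user, tabname => 'SMALL_TAB');
end;
/

-- In your own scheduled stats job: gather despite the lock (force => true),
-- then overwrite the index blevel with 1 in case it has just crossed to 2.
begin
  dbms_stats.gather_table_stats(ownname => user, tabname => 'SMALL_TAB',
                                cascade => true, force => true);
  dbms_stats.set_index_stats(ownname  => user, indname => 'SMALL_TAB_IDX',
                             indlevel => 1,    force   => true);
end;
/
```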

Footnote: the reason why a change from 1 to 2 is dramatic is that Oracle ignores the blevel in the optimizer arithmetic when it is set to 1; so the change from 1 to 2 actually has the impact of a change from zero to 2. The cost of a nested loop access is “cost of single access multiplied by number of times you do it” - so the sudden appearance of a 2 in the formula adds “2 * number of times you visit the table” to the cost if your small table is the second table in a nested loop - and suddenly a nested loop becomes much more expensive without any real change in the data size.
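To put rough numbers on this (figures invented, purely illustrative):

```
inner table probed 10,000 times in a nested loop

blevel = 1 (ignored):  cost per probe ~ 1 (leaf) + 1 (table)              = 2
                       join cost      ~ 10,000 * 2                        = 20,000
blevel = 2:            cost per probe ~ 2 (blevel) + 1 (leaf) + 1 (table) = 4
                       join cost      ~ 10,000 * 4                        = 40,000
```

So the cost of the inner access roughly doubles the moment the statistic crosses the boundary, which can be enough to push the optimizer away from the nested loop plan entirely.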

Footnote 2: it should be obvious that you don’t need to rebuild the index once you know what the problem is; but since we’re talking about a small index whose blevel is usually 1, it probably won’t take more than a fraction of a second to rebuild and there’s a fair chance you can find a safe moment to do it. In terms of complexity this solution is just as simple as the stats solution - so you might as well consider it. The only thing you need to be careful about is that you don’t happen to rebuild the index at a time when the blevel is likely to be 2.

Footnote 3: For an example of the type of code that will adjust the blevel of an index see this URL. (Note, the example talks about copying stats from one place to another – but the principle is the same.)

Reaction to riots around the UK…

I spent quite a while watching the coverage of the riots on TV last night and this morning. I also made a few angry posts on Twitter. Not surprisingly, I am still pretty angry about everything this morning. What follows is a rather extreme rant about the situation. Look away if easily offended…

What were the recent incidents *not* about?

  • This is not a mass uprising. London has 12 million people and only a few hundred scum are doing this.
  • This is not a race-specific issue. I’ve watched the footage and the crowds are made up of people from all races. Roughly speaking, the mix reflects the racial mix of the area that is being attacked. Don’t use this incident to further your racist (positive or negative) agenda!
  • This is not political. We are not under the control of some unjust dictator and fighting for our political freedom. Likening this to the recent events in the middle east is offensive. Since when has looting a PC store or a phone store been political?
  • This is not about “the kids”. The age range of the people is mixed.

What should be done in my opinion?

  • They should announce a curfew. Being on the streets after that time means you accept the consequences of anything that happens to you. Good people will be off the streets, so you know that anyone you see is up to no good.
  • Police and army reserves should be mobilized. They should be issued with rubber bullets and given a “shoot first, ask questions later” policy.
  • If someone gets injured or killed as a result of the police/army action, tough! No claims. No law suits. No apologies. Tough! If you don’t want to get hurt, don’t be there.
  • We have loads of surveillance cameras on our streets. Faces from the footage should be placed on a website where the public can anonymously shop them to the police. I would encourage anyone who recognizes any people from the footage, or has information from any social network sites, to pass the information to the police and get these people off our streets.
  • Everyone identified should be prosecuted and get a custodial sentence. If it means cramming them into cells and “infringing on their human rights”, so be it. They gave up their rights when they decided to do this.
  • Stop putting “community leaders” on TV who try to justify this crap. I’m sorry, but as a disenfranchised man I do not feel the need to rob a new pair of trainers or a mobile phone. Some of the footage showed a group of kids talking and one said, “Let’s get some watches!” Nuff said!

This country is going to hell in a hand basket. We need to stop pussy footing about and get some authority back.

Cheers

Tim…

PS. If this stops me seeing Rise of the Planet of the Apes, I’m going to kick off… :)