Oakies Blog Aggregator

Understanding the different modes of System Statistics aka. CPU Costing and the effects of multiple blocksizes - part 1

Forward to part 2

This is the first part of a series of posts covering one of the fundamentals of the cost based optimizer in 9i and later. Understanding how the different system statistics modes work is crucial to making the most of the cost based optimizer, therefore I'll attempt to provide some detailed explanations and examples of the formulas and arithmetic used. Finally I'll show (again) that using multiple block sizes for "tuning" purposes is generally a bad idea, along with detailed examples of why I think this is so.

One of the deficiencies of the traditional I/O based costing was that it simply counted the number of I/O requests, making no differentiation between single-block I/O and multi-block I/O.

System statistics were introduced in Oracle 9i to allow the cost based optimizer to take into account that single-block I/Os and multi-block I/Os should be treated differently in terms of costing and to include a CPU component in the cost calculation.

The system statistics tell the cost based optimizer (CBO), among other things, the time it takes to perform a single-block read request and a multi-block read request. Given this information the optimizer ought to be able to arrive at estimates that better fit the particular environment the database is running in, and additionally use an appropriate costing for multi-block read requests, which usually take longer than single-block read requests. And given the time it takes to perform the read requests, the calculated cost can be turned into a time estimate.

The cost calculated with system statistics is still expressed in the same units as with traditional I/O based costing, which is in units of single-block read requests.

Although the mode using system statistics is also known as "CPU costing", despite the name the system statistics have their most significant impact on the I/O cost calculated for full table scans, due to the separate MREADTIM measure used for multi-block read requests.

Starting with Oracle 10g you actually have the choice of three different modes of system statistics, also known as CPU costing:

1. Default NOWORKLOAD system statistics
2. Gathered NOWORKLOAD system statistics
3. Gathered WORKLOAD system statistics

The important point to understand here is that starting with Oracle 10g system statistics are enabled by default (using the default NOWORKLOAD system statistics), and you can only disable them by either downgrading your optimizer (using the OPTIMIZER_FEATURES_ENABLE parameter) or by using undocumented parameters or hints (the "_optimizer_cost_model" parameter, or the CPU_COSTING and NOCPU_COSTING hints).
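As a quick illustration, here is a minimal sketch of these options ("_optimizer_cost_model" is undocumented, so treat its exact behaviour as something to verify in your own environment):

-- Downgrade the whole optimizer to a feature set that pre-dates system statistics
alter session set optimizer_features_enable = '8.1.7';

-- Or switch just the cost model back to traditional I/O costing (undocumented)
alter session set "_optimizer_cost_model" = io;

-- Or control it per statement via the (undocumented) hints
select /*+ nocpu_costing */ count(*) from t1;
select /*+ cpu_costing */ count(*) from t1;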

This initial part of the series will focus on the default NOWORKLOAD system statistics introduced with Oracle 10g.

Default NOWORKLOAD system statistics

The default NOWORKLOAD system statistics measure only the CPU speed (CPUSPEEDNW); the two remaining values used for NOWORKLOAD system statistics, IOSEEKTIM (seek time) and IOTFRSPEED (transfer speed), use default values (10 milliseconds seek time and 4096 bytes per millisecond transfer speed).

Using these default values for the I/O part, the SREADTIM (single-block I/O read time) and MREADTIM (multi-block I/O read time) values are synthesized for cost calculation by applying the following formulas:

SREADTIM = IOSEEKTIM + db_block_size / IOTFRSPEED

MREADTIM = IOSEEKTIM + mbrc * db_block_size / IOTFRSPEED

where "db_block_size" represents your database standard block size in bytes and "mbrc" is either the value of "db_file_multiblock_read_count" if it has been set explicitly, or a default of 8 if left unset. From 10.2 on this is controlled internally by the undocumented parameter "_db_file_optimizer_read_count". This means that in 10.2 and later the "mbrc" used by the optimizer to calculate the cost can be different from the "mbrc" actually used at runtime when performing multi-block read requests. If you leave the "db_file_multiblock_read_count" unset in 10.2 and later then Oracle uses a default of 8 for cost calculation but uses the largest possible I/O request size depending on the platform, which is usually 1MB (e.g. 128 blocks when using a block size of 8KB). In 10.2 and later this is controlled internally by the undocumented parameter "_db_file_exec_read_count".

Assuming a default block size of 8KB (8192 bytes) and "db_file_multiblock_read_count" left unset, this results in the following calculation:

SREADTIM = 10 + 8192 / 4096 = 10 + 2 = 12ms

MREADTIM = 10 + 8 * 8192 / 4096 = 10 + 16 = 26ms
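The same arithmetic can be cross-checked for the other "db_file_multiblock_read_count" settings used in the test case below; a small sketch, assuming the default IOSEEKTIM = 10, IOTFRSPEED = 4096 and an 8KB block size:

-- Synthesized SREADTIM/MREADTIM for different mbrc settings
select
  mbrc
, 10 + 8192 / 4096 as sreadtim
, 10 + mbrc * 8192 / 4096 as mreadtim
, trunc((10 + mbrc * 8192 / 4096) / (10 + 8192 / 4096), 2) as ratio
from
  (
    select column_value as mbrc
    from table(sys.odcinumberlist(8, 16, 32, 64, 128))
  );

This returns MREADTIM values of 26, 42, 74, 138 and 266 respectively, the same values that show up in the comments of the test case below.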

These values will then be used to calculate the I/O cost of single block and multi-block read requests according to the execution plan (number of single-block reads + number of multi-block reads * MREADTIM / SREADTIM), which means that the I/O cost with system statistics aka. CPU costing is expressed in units of single block reads.

You can derive from the above formulas that with system statistics the cost of a full table scan operation is going to be higher by approximately the factor MREADTIM / SREADTIM compared to the traditional I/O based costing used by default before 10g; therefore system statistics usually tend to favor index access a bit more.
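Applied to the test case below (10,000 blocks, MBRC = 8, therefore SREADTIM = 12 and MREADTIM = 26), the full table scan arithmetic works out as follows; the time estimate in the plan is simply the cost multiplied by SREADTIM:

FTS cost = 10,000 / 8 * 26 / 12 = 1,250 * 2.1667 = 2,708.3, which rounds up to the 2,709 shown in the CPU costing plan below

Time = 2,709 * 12ms = 32.5 seconds, shown as 00:00:33 in the plan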

Note that the above factor MREADTIM / SREADTIM is not entirely accurate, since traditional I/O costing introduces an efficiency reduction factor for higher MBRC settings, presumably to reflect that the larger the number of blocks per I/O request, the higher the probability that the full request size can't be used, due to blocks already being in the buffer cache or extent boundaries being hit.

So with an MBRC setting of 8 the adjusted MBRC used for calculation is actually 6.59. Using e.g. a very high setting of 128 for the MBRC will actually use 40.82 for calculation. So the higher the setting, the more the MBRC used for calculation is scaled down.
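In other words, under traditional I/O costing the cost of a full table scan is simply the number of blocks divided by the adjusted MBRC, which is exactly where the costs in the test case below come from:

FTS cost (MBRC = 8) = 10,000 / 6.59 = 1,517.5, shown as 1,518

FTS cost (MBRC = 128) = 10,000 / 40.82 = 244.9, shown as 245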

The following test case demonstrates the difference between traditional I/O costing, CPU costing and the factor MREADTIM / SREADTIM when using different "db_file_multiblock_read_count" settings. The test case was run against 10.2.0.4 Win32.

Note that the test case removes your current system statistics, so be cautious if you currently have non-default system statistics in your database.

Furthermore the test case assumes an 8KB database default block size and a locally managed tablespace with 1MB uniform extent size using manual segment space management (no ASSM).

drop table t1;

-- Create a table consisting of 10,000 blocks / 1 row per block
-- in a 8KB tablespace with manual segment space management (no ASSM)
create table t1
pctfree 99
pctused 1
-- tablespace test_2k
-- tablespace test_4k
tablespace test_8k
-- tablespace test_16k
as
with generator as (
select --+ materialize
rownum id
from all_objects
where rownum <= 3000
)
select
/*+ ordered use_nl(v2) */
rownum id,
trunc(100 * dbms_random.normal) val,
rpad('x',100) padding
from
generator v1,
generator v2
where
rownum <= 10000
;

begin
dbms_stats.gather_table_stats(
user,
't1',
cascade => true,
estimate_percent => null,
method_opt => 'for all columns size 1'
);
end;
/

-- Use default NOWORKLOAD system statistics
-- for test but ignore CPU cost component
-- by using an artificially high CPU speed
begin
dbms_stats.delete_system_stats;
dbms_stats.set_system_stats('CPUSPEEDNW',1000000);
end;
/

-- In order to verify the formula against the
-- optimizer calculations
-- don't increase the table scan cost by one
-- which is done by default from 9i on
alter session set "_table_scan_cost_plus_one" = false;

alter session set db_file_multiblock_read_count = 8;

-- Assumption due to formula is that CPU costing
-- increases FTS cost by MREADTIM/SREADTIM, but
-- traditional I/O based costing introduces a
-- efficiency penalty the higher the MBRC is
-- therefore the factor is not MREADTIM/SREADTIM
-- but MREADTIM/SREADTIM/(MBRC/adjusted MBRC)
--
-- NOWORKLOAD synthesized SREADTIM = 12, MREADTIM = 26
-- MREADTIM/SREADTIM = 26/12 = 2.16
-- Factor CPU Costing / traditional I/O costing
-- 2,709/1,518 = 1.78
-- MBRC = 8, adjusted MBRC = 10,000 / 1,518 = 6.59
-- 8/6.59 = 1.21
-- 2.16 / 1.21 = 1.78

select /*+ nocpu_costing */ max(val)
from t1;

-----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 1518 |
| 1 | SORT AGGREGATE | | 1 | 4 | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 1518 |
-----------------------------------------------------------

select /*+ cpu_costing */ max(val)
from t1;

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 2709 (0)| 00:00:33 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 2709 (0)| 00:00:33 |
---------------------------------------------------------------------------

alter session set db_file_multiblock_read_count = 16;

-- Assumption due to formula is that CPU costing
-- increases FTS cost by MREADTIM/SREADTIM, but
-- traditional I/O based costing introduces a
-- efficiency penalty the higher the MBRC is
-- therefore the factor is not MREADTIM/SREADTIM
-- but MREADTIM/SREADTIM/(MBRC/adjusted MBRC)
--
-- NOWORKLOAD synthesized SREADTIM = 12, MREADTIM = 42
-- MREADTIM/SREADTIM = 42/12 = 3.5
-- Factor CPU Costing / traditional I/O costing
-- 2,188/962 = 2.27
-- MBRC = 16, adjusted MBRC = 10,000 / 962 = 10.39
-- 16/10.39 = 1.54
-- 3.5 / 1.54 = 2.27

select /*+ nocpu_costing */ max(val)
from t1;

-----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 962 |
| 1 | SORT AGGREGATE | | 1 | 4 | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 962 |
-----------------------------------------------------------

select /*+ cpu_costing */ max(val)
from t1;

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 2188 (0)| 00:00:27 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 2188 (0)| 00:00:27 |
---------------------------------------------------------------------------

alter session set db_file_multiblock_read_count = 32;

-- Assumption due to formula is that CPU costing
-- increases FTS cost by MREADTIM/SREADTIM, but
-- traditional I/O based costing introduces a
-- efficiency penalty the higher the MBRC is
-- therefore the factor is not MREADTIM/SREADTIM
-- but MREADTIM/SREADTIM/(MBRC/adjusted MBRC)
--
-- NOWORKLOAD synthesized SREADTIM = 12, MREADTIM = 74
-- MREADTIM/SREADTIM = 74/12 = 6.16
-- Factor CPU Costing / traditional I/O costing
-- 1,928/610 = 3.16
-- MBRC = 32, adjusted MBRC = 10,000 / 610 = 16.39
-- 32/16.39 = 1.95
-- 6.16 / 1.95 = 3.16

select /*+ nocpu_costing */ max(val)
from t1;

-----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 610 |
| 1 | SORT AGGREGATE | | 1 | 4 | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 610 |
-----------------------------------------------------------

select /*+ cpu_costing */ max(val)
from t1;

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 1928 (0)| 00:00:24 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 1928 (0)| 00:00:24 |
---------------------------------------------------------------------------

alter session set db_file_multiblock_read_count = 64;

-- Assumption due to formula is that CPU costing
-- increases FTS cost by MREADTIM/SREADTIM, but
-- traditional I/O based costing introduces a
-- efficiency penalty the higher the MBRC is
-- therefore the factor is not MREADTIM/SREADTIM
-- but MREADTIM/SREADTIM/(MBRC/adjusted MBRC)
--
-- NOWORKLOAD synthesized SREADTIM = 12, MREADTIM = 138
-- MREADTIM/SREADTIM = 138/12 = 11.5
-- Factor CPU Costing / traditional I/O costing
-- 1,798/387 = 4.64
-- MBRC = 64, adjusted MBRC = 10,000 / 387 = 25.84
-- 64/25.84 = 2.48
-- 11.5 / 2.48 = 4.64

select /*+ nocpu_costing */ max(val)
from t1;

-----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 387 |
| 1 | SORT AGGREGATE | | 1 | 4 | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 387 |
-----------------------------------------------------------

select /*+ cpu_costing */ max(val)
from t1;

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 1798 (0)| 00:00:22 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 1798 (0)| 00:00:22 |
---------------------------------------------------------------------------

alter session set db_file_multiblock_read_count = 128;

-- Assumption due to formula is that CPU costing
-- increases FTS cost by MREADTIM/SREADTIM, but
-- traditional I/O based costing introduces a
-- efficiency penalty the higher the MBRC is
-- therefore the factor is not MREADTIM/SREADTIM
-- but MREADTIM/SREADTIM/(MBRC/adjusted MBRC)
--
-- NOWORKLOAD synthesized SREADTIM = 12, MREADTIM = 266
-- MREADTIM/SREADTIM = 266/12 = 22.16
-- Factor CPU Costing / traditional I/O costing
-- 1,732/245 = 7.07
-- MBRC = 128, adjusted MBRC = 10,000 / 245 = 40.82
-- 128/40.82 = 3.13
-- 22.16 / 3.13 = 7.07

select /*+ nocpu_costing */ max(val)
from t1;

-----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 245 |
| 1 | SORT AGGREGATE | | 1 | 4 | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 245 |
-----------------------------------------------------------

select /*+ cpu_costing */ max(val)
from t1;

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 1732 (0)| 00:00:21 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 40000 | 1732 (0)| 00:00:21 |
---------------------------------------------------------------------------

So as you can see, the I/O costs for a full table scan differ significantly when using default NOWORKLOAD system statistics. You can also see that the synthesized SREADTIM and MREADTIM values are quite different for different "db_file_multiblock_read_count" settings. Furthermore the ratio between CPU costing and traditional I/O based costing is not the factor MREADTIM / SREADTIM suggested by the formula, but is reduced by the adjustment applied to the MBRC under traditional I/O costing.

The next part of the series will cover the remaining system statistics modes.

Unloading data using external tables in 10g

External tables can write as well as read in 10g. May 2005

Helsinki code layers in the DBMS

Ok, let's continue with the second part of "The Helsinki Declaration". That would be the part where I zoom in on the DBMS and show you how best to do this database centric thing. We have seen that the DBMS is the most stable component in everybody's technology landscape. We have also concluded that the DBMS has been designed to handle WoD application BL-code and DL-code. And current DBMS's are

Advanced Oracle Troubleshooting by Tanel Poder in Singapore

When I first saw that Tanel would conduct his seminar in Singapore, I told myself that I would even spend my own money just to be at that training! I've already read performance books like Optimizing Oracle Performance, Oracle 8i Internal Services, Forecasting Oracle Performance… And after that I still want more, and I still have questions that need to be answered. Well, if you're on a tight budget you just opt to download some more docs/books to do multiple reads coupled with research/test cases and also reading through others' blogs…
But thanks to my boss for the funding, I was there!

Oracle ACE

I've recently been invited by Oracle to accept the Oracle ACE award.

So I'm now an Oracle ACE. You can check my Oracle ACE profile here.

Thanks to Oracle ACE H.Tonguç Yılmaz and special thanks to Oracle ACE Dion Cho, who nominated me for the Oracle ACE award.

Some statistics (since I'm a CBO guy :-):

- I'm truly honored to be Oracle ACE no. 210 in the world
- There are at present 57 Oracle ACEs in the "Database Management & Performance" category (53 in "Database App Development" and 10 in "Business Intelligence")
- There are 7 ACEs from Germany at present

Maxine Johnson

I want to introduce you to Maxine Johnson, assistant manager of men's sportswear at Nordstrom Galleria Dallas. The reason I think Maxine is important is because she taught my son and me about customer service. I met her several months ago. I still have her card, and I'm still grateful to her. Here's what happened.

A few months ago, my wife and I were in north Dallas with some time to spare, and I convinced her to go with me to pick out one or two pairs of dress slacks. I felt like I was wearing the same pants over and over again when I traveled, and I could use an extra pair or two. We usually go to Nordstrom for that, and so we did again. After some time, I had two pairs of trousers that we both liked, and so we had them measured for hemming and picked them up a few days later.

A week or two passed, and then I packed a pair of my new pants for a trip to Zürich. I put them on in the hotel the first morning I was supposed to speak at an event. On my few-block walk from the hotel to the train station, I caught my reflection in a store window, and—hmmp—my pants were just not... really... quite... long enough. Every step, the whole cuff would come way up above the tops of my shoes. I stopped and tugged them down, and then they seemed alright, but then as soon as I started walking again, they'd ride back up and look too short.

They weren't bad enough that anyone said anything, but I was a little self-conscious about it. I kept tugging at them all day.

When I hung them back up in my closet at home, I noticed that when I folded them over the hanger, they didn't reach as far as the other pants that I really liked. Sure enough, when I lined up the waists, these new pants were about an inch shorter than my favorite ones that I had bought at Nordstrom probably four years ago.

Now, pants at Nordstrom cost a little more than maybe at a lot of other places, but they're worth to me what I pay for them because they're nice, and they last a long time. But these new ones made me feel bad, because they were just a little bit off. I could already foresee a future of two new pairs of slacks hanging in my closet for years, never really making the starting rotation because they're just a little bit off, but never making the garage sale pile, either, because they had cost too much.

My wife agreed. They were shorter than the others. They were shorter than they should be. I needed to get them fixed.

Now, this is the part I always hate. Having made the decision, the next step is that step where you take the thing back and try to get the problem fixed. I hate that part. My wife doesn't mind it so much, but these were my pants, and so I was the one that had to go back and put them on so someone could fix them. I really dreaded it though, because I knew that the only way they could fix those pants was to take off the cuff.

It's late in the evening by the time my wife helps me build up a little head of steam, and we both decide (well, she decides, but she's right) that tonight is the perfect night for me to go on a 20-mile drive across town to Nordstrom to get my pants fixed. As a matter of fact, it'd be good if my older boy went with me. That makes it a little more fun, because he's good company for me.

It's late enough by now that before I could leave, I had to phone ahead, just to make sure the store was still open. A nice lady answered the phone. I said my name and told the nice lady that I was having some trouble with some slacks I had bought a few weeks ago, and how late did they stay open? She told me to come right on over.

So my boy and I got into the car, and I drove right on over.

A half hour later, I walked into the store, thankful that the doors were still open, carrying two pairs of slacks on a hanger, with my son walking beside me. A smiling nice lady approached me as I entered the men's department. "Mr. Millsap?" Yes, I am. It surprises me anytime someone remembers my name from that one phase of the conversation where I say real fast, "My name is Cary Millsap, and blah blah blah blah blah," and tell my whole story. The person on the phone hadn't asked me again what my name was. She had caught it in the blur at the beginning of my story.

She proceeded to explain to me what was going to happen. I was going to try on the slacks in the dressing room. The tailor would be there waiting for me. She and the tailor would look them over. If there was enough fabric to make them longer, then they'd do that tonight. If there weren't, then she was going to find two new pairs of slacks for me, and the tailor would have them ready for me tomorrow. If for any reason, those didn't work, then she'd keep preparing new trousers for me until I was satisfied.

Mmm, ok. I was probably grinning a little bit by now, because this was pretty fantastic news. I wasn't going to have to get my pants de-cuffed. I was still a little nervous, though, that when I came out of the dressing room, everyone was going to look at me like, "So what's the problem? I don't see any problem. Those are long enough."

When I came out, Maxine Johnson crossed her arms, put her hand to her chin, shook her head a little, and immediately said something to the effect of, "Oh my, no. That won't do at all." So she brought me two new pairs, which I tried on, and which the tailor measured for me. She gave me a reclaim ticket for the next day. As usual, I had missed her name when she introduced herself as I first entered the men's department. (As you probably already figured out, I have a bad habit of not paying enough attention to that part of the conversation that I think of as "the blur.") I did have the good sense to ask for her business card, which is why I know her name is Maxine Johnson.

My boy and I talked the whole ride home that what we had seen that night had been some real, first-class retail customer care right there, and that we all knew where we'd be buying my next pairs of pants. When I had gotten into the car an hour or so before, I had been very apprehensive about what might happen. I had been especially nervous about how I'd perform during the proving-what's-wrong part of the project. But Maxine Johnson put me completely at ease during my experience. She didn't just do the right thing, she did it in such a manner that I felt glad the whole problem had happened. Here's the thing:

Maxine Johnson made me feel like it was not just ok that I brought the pants back for repair, she made me feel like she was delighted by the opportunity to show me what Nordstrom could do for me under pressure.

I hope that the way Maxine Johnson made me feel is the way that my employees and I make our customers feel. I hope it's the way my children make their customers feel someday when they go to work.

Thank you, Maxine Johnson. Thank you.

People ask the wrong question

People who know me, know that I am enthusiastic about Apex. But I am certainly not an Apex expert. By far not. The DBMS is where my knowledge is. But because they know of my enthusiasm, I often get the question whether Apex is mature enough for building a critical or large-scale WoD application. I then (sigh and) reply by saying: "You are asking the wrong question." Pay attention please. In the

The Helsinki approaches to WoD application development

[continuing from my previous post] In a very similar way as I did here for MVC, the Helsinki UI/BL/DL code classes can be mapped across the client, middle and data tiers too: What I do differently here compared to the earlier display of MVC mapping across the tiers, is that whenever the M is distributed across two tiers, I divide the M into BL and DL. The guideline of how to split up the M, is now

Read Consistency, "ORA-01555 snapshot too old" errors and the SCN_ASCENDING hint

For its read consistency model Oracle uses a true multi-versioning approach which allows readers not to block writers and writers not to block readers. Obviously this great feature allowing highly concurrent processing doesn't come for free, since the information needed to build multiple versions of the same data has to be stored somewhere.

Oracle uses the so-called undo information not only to roll back ongoing transactions but also to re-construct old versions of blocks if required. Very much simplified: when reading data, Oracle knows the point in time (which corresponds to an internal counter called the SCN, System Change Number) that the data needs to be consistent with. In the default READ COMMITTED isolation mode this point in time is defined when a statement starts to execute; you could also say that at the moment a statement starts to run its result is pre-ordained. When Oracle processes a block it checks if the block is "old" enough. If it discovers that the block content is too new (it has been changed by other sessions, but the current access is not supposed to see this updated content according to the point in time assigned to the statement execution), it will create a copy of the block and use the information available in the corresponding undo segment to re-construct an older version of the block. Note that this process can be iterative: if after re-constructing the older version of the block it's still not sufficiently old, more undo information will be used to go further back in time.

Since the undo information of committed transactions is marked as re-usable, Oracle is free to overwrite the corresponding undo data under certain circumstances (e.g. no more free space left in the UNDO tablespace). If an older version of a block now needs to be created but the undo information required to do so has been overwritten, the infamous "ORA-01555 snapshot too old" error is raised, since the required read-consistent view of the data cannot be generated any longer.

In order to avoid this error, from 10g on you only need a sufficiently large UNDO tablespace in automatic undo management mode, so that the undo information required to create old versions of the blocks doesn't get overwritten prematurely. In 9i you need to set the UNDO_RETENTION parameter according to the longest expected runtime of your queries and of course have sufficient space in the UNDO tablespace to allow Oracle to adhere to this setting.
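A minimal sketch of the corresponding checks and settings (the retention value is an arbitrary example and should be derived from your actual query runtimes):

-- Check the current undo configuration
show parameter undo

-- 9i: set the retention target (in seconds) to cover the longest running query
alter system set undo_retention = 3600;

-- Observe actual undo usage and the longest query seen per interval
select begin_time, end_time, undoblks, maxquerylen
from v$undostat;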

So until now Oracle was either able to provide a consistent view of the data according to its read-consistency model, or you would get an error message if the required undo data wasn't available any longer.

Enter the SCN_ASCENDING hint: As already mentioned by Martin Berger and Chandra Pabba Oracle officially documented the SCN_ASCENDING hint for Oracle 11.1.0.7 in Metalink Note 6688108.8 (Enhancement: Allow ORA-1555 to be ignored during table scan).

I've run some tests using this hint on 9.2.0.8, 10.2.0.4 and 11.1.0.7.

In order to increase the probability of running into the dreaded ORA-01555 error you should perform the following preparation steps (note that this applies to all examples provided here):

-- create a small undo tablespace
create undo tablespace undo_small datafile '' size 2M;

-- activate small UNDO tablespace
alter system set undo_tablespace = 'UNDO_SMALL' scope = memory;

-- small cache so that old copies of the blocks won't survive in the buffer cache
-- and delayed block cleanout probability increases
alter system set db_cache_size = 2M scope = memory;

Note that all examples here use DBMS_JOB to simulate the simultaneous modification and reading of data, therefore you need to have the JOB_QUEUE_PROCESSES parameter set accordingly, otherwise the job won't get executed.
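For example (the value is an arbitrary choice, anything greater than 0 will do):

alter system set job_queue_processes = 10;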

I've started with a variation of Tom Kyte's example of how to deliberately force an ORA-01555 error, which looks like this:

drop table t purge;

create table t
as
select
a.*, 1 as my_id
from
all_objects a
order by
dbms_random.random;

alter table t add constraint t_pk primary key (object_id);

exec dbms_stats.gather_table_stats(null, 'T', cascade=>true)

set serveroutput on timing on

alter session set nls_language = 'AMERICAN';

spool tomkyte_ora1555_error_demo_modified2.log

declare
cursor c
is
select
*
from
(
select /*+ first_rows */
my_id
from
t
order by
object_id
)
where rownum <= 3000;

l_my_id t.my_id%type;
l_rowcnt number := 0;
l_job_id binary_integer;

function submit_job(what in varchar2)
return binary_integer
is
pragma autonomous_transaction;
job_id binary_integer;
begin
dbms_job.submit(job_id, what);
commit;
return job_id;
end;
begin
select
distinct
my_id
into
l_my_id
from
t;

dbms_output.put_line('The MY_ID as of now: ' || l_my_id);

-- result of this query
-- is as of opening the cursor
-- so it needs to return the same MY_ID
-- for all rows as above query demonstrates
open c;

-- now start to wreck the undo
l_job_id := submit_job('
begin
for x in (
select
rowid as rid
from
t
where
rownum <= 10000
) loop
update
t
set
object_name = reverse(object_name),
my_id = 1 - my_id
where
rowid = x.rid;
commit;
end loop;
end;
');

-- start fetching from result set
loop
fetch c into l_my_id;
exit when c%notfound;
l_rowcnt := l_rowcnt + 1;
dbms_output.put_line('Row: ' || l_rowcnt || ' ID: ' || l_my_id);
dbms_lock.sleep(0.01);
end loop;
close c;
exception
when others then
dbms_output.put_line('Rows fetched: ' || l_rowcnt);
raise;
end;
/

spool off

What this code snippet basically does is the following:

1. It creates a table copy of the ALL_OBJECTS view ordered randomly, and adds a primary key index on the OBJECT_ID

2. It issues a query that uses the FIRST_ROWS hint to force an index access to the table because of the available primary key index and the corresponding ORDER BY. It's one of the built-in heuristic rules of the (deprecated) FIRST_ROWS cost based optimizer mode that an ORDER BY is going to use an index if possible, to avoid a sort operation. Using this inefficient approach ensures that each block of the table will be accessed multiple times due to the random row access driven by the ordered index.

3. It then spawns a job simulating a separate session that starts to overwrite, row by row, the data the query is supposed to read. Specifically the MY_ID column, which has been generated with a value of 1, will be set to 0. Because each single-row update is committed, the small undo tablespace will eventually fill up and old undo data can be, and needs to be, overwritten due to insufficient space.

4. While the update loop is running, the data from the query gets slowly fetched. Since each block is visited many times by the index access, it's almost guaranteed that the undo information required to re-construct the old version of a block has been overwritten (due to the artificially small undo tablespace) and therefore the ORA-01555 error will occur.

And sure enough, when running this in 11.1.0.7 with the prerequisites met, the output will look similar to the following. Note that the first line shows what we expect the cursor query to return: only one distinct value, namely 1.

The MY_ID as of now: 1
Row: 1 ID: 1
Row: 2 ID: 1
Row: 3 ID: 1
Row: 4 ID: 1
Row: 5 ID: 1
Row: 6 ID: 1
Row: 7 ID: 1
Row: 8 ID: 1
Row: 9 ID: 1
Row: 10 ID: 1
.
.
.
Row: 1768 ID: 1
Row: 1769 ID: 1
Row: 1770 ID: 1
Row: 1771 ID: 1
Row: 1772 ID: 1
Row: 1773 ID: 1
Rows fetched: 1773
declare
*
ERROR at line 1:
ORA-01555: snapshot too old: rollback segment number 15 with name "_SYSSMU15$"
too small
ORA-06512: at line 83

Elapsed: 00:00:52.21

So you can see that the expected error occurred.

Now I've modified the example to use the SCN_ASCENDING hint for the query that fails:

.
.
.
declare
cursor c
is
select /*+ scn_ascending */
*
from
(
select /*+ first_rows */
my_id
from
t
order by
object_id
)
where rownum <= 3000;
.
.
.

Re-running the test case shows that you still get the same error, and obviously the hint doesn't help to avoid the error in this case.

Now if you read the Metalink note subject again, you might notice that it says: "Allow ORA-1555 to be ignored during table scan". Presumably, since our example doesn't use a full table scan but a table access by ROWID, the hint doesn't work as expected.

Let's modify our test case a little bit to use a full table scan instead of the index access path:

drop table t purge;

create table t
as
select
a.*, 1 as my_id
from
all_objects a
order by
dbms_random.random;

alter table t add constraint t_pk primary key (object_id);

exec dbms_stats.gather_table_stats(null, 'T', cascade=>true)

set serveroutput on timing on

alter session set nls_language = 'AMERICAN';

spool tomkyte_ora1555_error_demo_modified3_no_index_usage.log

declare
cursor c
is
select
*
from
(
select /*+ all_rows */
my_id
from
t
--order by
-- object_id

)
where rownum <= 5000;

l_my_id t.my_id%type;
l_rowcnt number := 0;
l_job_id binary_integer;

function submit_job(what in varchar2)
return binary_integer
is
pragma autonomous_transaction;
job_id binary_integer;
begin
dbms_job.submit(job_id, what);
commit;
return job_id;
end;
begin
select
distinct
my_id
into
l_my_id
from
t;

dbms_output.put_line('The MY_ID as of now: ' || l_my_id);

-- result of this query
-- is as of opening the cursor
-- so it needs to return the same MY_ID
-- for all rows as above query demonstrates
open c;

-- now start to wreck the undo
l_job_id := submit_job('
begin
for x in (
select
rowid as rid
from
t
where
rownum <= 10000
) loop
update
t
set
object_name = reverse(object_name),
my_id = 1 - my_id
where
rowid = x.rid;
commit;
end loop;
end;
');

-- start fetching from result set
loop
fetch c into l_my_id;
exit when c%notfound;
l_rowcnt := l_rowcnt + 1;
dbms_output.put_line('Row: ' || l_rowcnt || ' ID: ' || l_my_id);
dbms_lock.sleep(0.01);
end loop;
close c;
exception
when others then
dbms_output.put_line('Rows fetched: ' || l_rowcnt);
raise;
end;
/

spool off

So now we're simply reading the table row by row without the index usage, which results in a full table scan operation. Let's check the result without the SCN_ASCENDING hint:

The MY_ID as of now: 1
Row: 1 ID: 1
Row: 2 ID: 1
Row: 3 ID: 1
Row: 4 ID: 1
Row: 5 ID: 1
Row: 6 ID: 1
Row: 7 ID: 1
Row: 8 ID: 1
Row: 9 ID: 1
Row: 10 ID: 1
.
.
.
Row: 4662 ID: 1
Row: 4663 ID: 1
Row: 4664 ID: 1
Row: 4665 ID: 1
Row: 4666 ID: 1
Row: 4667 ID: 1
Row: 4668 ID: 1
Row: 4669 ID: 1
Rows fetched: 4669
declare
*
ERROR at line 1:
ORA-01555: snapshot too old: rollback segment number 12 with name "_SYSSMU12$"
too small
ORA-06512: at line 83

Elapsed: 00:01:03.41

OK, great. It takes a bit longer, and maybe you need to increase the ROWNUM limits accordingly to encounter the error, but it's still reproducible.

Let's try again with the hint:

.
.
.
declare
cursor c
is
select /*+ scn_ascending */
*
from
(
select /*+ all_rows */
my_id
from
t
--order by
-- object_id
)
where rownum <= 5000;
.
.
.

And here's the (scary) result:

The MY_ID as of now: 1
Row: 1 ID: 1
Row: 2 ID: 1
Row: 3 ID: 1
Row: 4 ID: 1
Row: 5 ID: 1
Row: 6 ID: 1
Row: 7 ID: 1
Row: 8 ID: 1
Row: 9 ID: 1
Row: 10 ID: 1
.
.
.
Row: 530 ID: 1
Row: 531 ID: 0
Row: 532 ID: 1
Row: 533 ID: 1
Row: 534 ID: 0
Row: 535 ID: 1
Row: 536 ID: 1
Row: 537 ID: 1
Row: 538 ID: 1
Row: 539 ID: 1
Row: 540 ID: 1
Row: 541 ID: 1
Row: 542 ID: 1
Row: 543 ID: 1
Row: 544 ID: 0
Row: 545 ID: 1
Row: 546 ID: 1
.
.
.
Row: 4973 ID: 1
Row: 4974 ID: 0
Row: 4975 ID: 1
Row: 4976 ID: 1
Row: 4977 ID: 1
Row: 4978 ID: 1
Row: 4979 ID: 1
Row: 4980 ID: 1
Row: 4981 ID: 1
Row: 4982 ID: 1
Row: 4983 ID: 1
Row: 4984 ID: 1
Row: 4985 ID: 1
Row: 4986 ID: 1
Row: 4987 ID: 1
Row: 4988 ID: 1
Row: 4989 ID: 1
Row: 4990 ID: 1
Row: 4991 ID: 1
Row: 4992 ID: 1
Row: 4993 ID: 1
Row: 4994 ID: 1
Row: 4995 ID: 1
Row: 4996 ID: 1
Row: 4997 ID: 1
Row: 4998 ID: 1
Row: 4999 ID: 1
Row: 5000 ID: 1

PL/SQL procedure successfully completed.

Elapsed: 00:00:54.26

It can clearly be seen that the 0s returned by the query shouldn't be there according to the first line of the output, so this scary feature seems to have worked in this case.

Interestingly you get the same result and behaviour when running the test case against 10.2.0.4, so although the hint is not documented for that version it seems to work there, too.

I couldn't reproduce this on 9.2.0.8, so obviously it wasn't backported there.

Here's another, slightly more complex but even more impressive test case, which basically does the same but introduces some further elements. Note that it might require the following grants, made as user SYS, to the user executing the test case. The grant on DBMS_LOCK is actually also required for Tom Kyte's demonstration code above:

grant execute on sys.dbms_pipe to cbo_test;

grant execute on sys.dbms_lock to cbo_test;

Here is the code:

drop table scn_ascending_demo purge;

create table scn_ascending_demo
as
select
1 as col1
, rpad('x', 100, 'x') as filler
from dual
connect by level <= 2000;

create index scn_ascending_demo_idx1 on scn_ascending_demo(filler);

drop table scn_ascending_demo_wreck_undo purge;

create table scn_ascending_demo_wreck_undo
as
select * from all_objects
where rownum <= 10000;

create or replace function slow_fetch (the_cursor sys_refcursor)
return sys.ku$_objnumset
-- use this in 9i
-- return mdsys.sdo_numtab authid current_user
pipelined
is
n_num number;
begin
loop
fetch the_cursor into n_num;
if the_cursor%notfound then
close the_cursor;
exit;
end if;
pipe row(n_num);
dbms_lock.sleep(0.01);
end loop;
return;
end slow_fetch;
/

set serveroutput on timing on

alter session set nls_language = 'AMERICAN';

spool scn_ascending_demo.log

declare
job_id binary_integer;
msg_buffer varchar2(2000);
pipe_id integer;
pipe_name constant varchar2(20) := 'scn_ascending_demo';
pipe_status integer;
c sys_refcursor;
l sys_refcursor;
n_result number;
n_row number;
function local_submit_job(what in varchar2)
return binary_integer
is
pragma autonomous_transaction;
job_id binary_integer;
begin
dbms_job.submit(job_id, what);
commit;
return job_id;
end;
begin
pipe_id := dbms_pipe.create_pipe(pipe_name);
open c for
select
col1
from
scn_ascending_demo;

job_id := local_submit_job('
declare
n_status integer;
begin
update
scn_ascending_demo
set
col1 = 1 - col1
, filler = rpad(''y'', 100, ''y'')
;
commit;
dbms_pipe.pack_message(''DONE'');
n_status := dbms_pipe.send_message(''' || pipe_name || ''');
exception
when others then
dbms_pipe.pack_message(''ERROR: '' || sqlerrm);
n_status := dbms_pipe.send_message(''' || pipe_name || ''');
end;
');

pipe_status := dbms_pipe.receive_message(pipe_name);
dbms_pipe.unpack_message(msg_buffer);
if msg_buffer != 'DONE' then
raise_application_error(-20001, 'Error in updating scn_ascending_demo: ' || msg_buffer);
end if;

job_id := local_submit_job('
declare
n_status integer;
snapshot_too_old exception;
pragma exception_init(snapshot_too_old, -1555);
no_space_left exception;
pragma exception_init(no_space_left, -30036);
begin
loop
begin
update
scn_ascending_demo_wreck_undo
set
owner = dbms_random.string(''a'', 30)
, object_name = dbms_random.string(''a'', 30)
, subobject_name = dbms_random.string(''a'', 30)
, object_type = dbms_random.string(''a'', 18)
;
commit;
exception
when snapshot_too_old or no_space_left then
commit;
end;
n_status := dbms_pipe.receive_message(''' || pipe_name || ''', 0);
exit when n_status != 1;
dbms_lock.sleep(0.5);
end loop;
commit;
dbms_pipe.pack_message(''DONE'');
n_status := dbms_pipe.send_message(''' || pipe_name || ''');
end;
');

begin
open l for
select
rownum as r_no
, value(d) as result
from
table(slow_fetch(c)) d;
loop
fetch l into n_row, n_result;
exit when l%notfound;
dbms_output.put_line('Row ' || n_row || ':' || n_result);
end loop;
close l;
/*
n_row := 0;
loop
fetch c into n_result;
exit when c%notfound;
n_row := n_row + 1;
dbms_output.put_line('Row ' || n_row || ':' || n_result);
dbms_lock.sleep(0.01);
end loop;
close c;
*/
exception
when others then
dbms_output.put_line('Error: ' || sqlerrm);
end;

dbms_pipe.pack_message('DONE');
pipe_status := dbms_pipe.send_message(pipe_name);
dbms_lock.sleep(5);
pipe_status := dbms_pipe.receive_message(pipe_name, 5);
pipe_id := dbms_pipe.remove_pipe(pipe_name);
declare
job_does_not_exist exception;
pragma exception_init(job_does_not_exist, -23421);
begin
dbms_job.remove(job_id);
commit;
exception
when job_does_not_exist then
dbms_output.put_line('Job: ' || job_id || ' does not exist any longer.');
end;
exception
when others then
pipe_id := dbms_pipe.remove_pipe(pipe_name);
declare
job_does_not_exist exception;
pragma exception_init(job_does_not_exist, -23421);
begin
dbms_job.remove(job_id);
commit;
exception
when job_does_not_exist then
dbms_output.put_line('Job: ' || job_id || ' does not exist any longer.');
end;
raise;
end;
/

spool off

This code does the following:

1. Creates two tables, one that will be modified and read, and another one whose sole purpose is to ensure that the undo will be overwritten

2. Uses a pipelined table function to fetch data slowly from a passed ref cursor. Note that this is purely optional, to demonstrate a pipelined table function; as you can see, the commented-out part simply fetches from the initial cursor directly to achieve the same result.

3. Uses DBMS_PIPE to perform very rudimentary synchronisation between the spawned jobs and the main session.

4. The basic principle is similar to, but somewhat different from, the previous test case:
- We open a cursor. At that moment the result is pre-ordained.
- Then we spawn a separate job that modifies the complete table that the query is based on.
- Once this is successfully done we spawn another job that attempts to fill up and overwrite our small undo tablespace.
- While this job is running we start to fetch from the initially opened cursor.
- As soon as the fetch is complete, either due to errors or successfully completed, we tell the job to stop the update operation and finally clean up if the job for whatever reason is still running (which should not happen).

Here is the result from 11.1.0.7 (and 10.2.0.4 which behaves the same) without the SCN_ASCENDING hint:

Row 1:1
Row 2:1
Row 3:1
Row 4:1
Row 5:1
Row 6:1
Row 7:1
Row 8:1
Row 9:1
Row 10:1
.
.
.
Row 584:1
Row 585:1
Row 586:1
Row 587:1
Row 588:1
Row 589:1
Row 590:1
Row 591:1
Row 592:1
Row 593:1
Row 594:1
Error: ORA-01555: snapshot too old: rollback segment number 11 with name
"_SYSSMU11_1238392578$" too small
Job: 170 does not exist any longer.

PL/SQL procedure successfully completed.

Elapsed: 00:00:17.06

So we get the expected error. Now let's try with the SCN_ASCENDING hint:

.
.
.
open c for
select /*+ scn_ascending */
col1
from
scn_ascending_demo;
.
.
.

Here's the (even more obvious) result:

Row 1:1
Row 2:1
Row 3:1
Row 4:1
Row 5:1
Row 6:1
Row 7:1
Row 8:1
Row 9:1
Row 10:1
.
.
.
Row 456:1
Row 457:1
Row 458:1
Row 459:1
Row 460:1
Row 461:1
Row 462:1
Row 463:0
Row 464:0
Row 465:0
Row 466:0
Row 467:0
Row 468:0
Row 469:0
Row 470:0
.
.
.
Row 1993:0
Row 1994:0
Row 1995:0
Row 1996:0
Row 1997:0
Row 1998:0
Row 1999:0
Row 2000:0
Job: 181 does not exist any longer.

PL/SQL procedure successfully completed.

Elapsed: 00:10:38.18

So again the hint has worked and we can see the inconsistent reads that should have been ORA-01555 errors.

What if we change this test case slightly so that an index access is used?

.
.
.
open c for
select /*+ scn_ascending first_rows */
col1
from
scn_ascending_demo
order by
filler;

.
.
.

In this case again I couldn't prevent the ORA-01555 error, so this seems to corroborate the theory that only full table scans are able to use the SCN_ASCENDING request successfully.

So in summary I have to say that this feature seems to be quite questionable, maybe even buggy, and even when it works it looks quite scary given the otherwise very robust multi-versioning capabilities of Oracle, which represent one of the cornerstones of its fundamental architecture.

I haven't checked yet whether the hint also modifies the behaviour of DML statements, but since these already employ "write" consistency, as Tom Kyte calls it, it's quite unlikely that the SCN_ASCENDING hint is applicable. Write consistency means that a DML statement (or SELECT FOR UPDATE) that while processing discovers that the data accessed has been modified in the meantime by others is going to "restart": any changes already applied are rolled back, and the statement starts from scratch based on the latest data. Note that this restart can happen multiple times, and yes, the amount of undo and redo generated increases if it happens, although Oracle seems to roll back the changes only once and from then on switch to a SELECT FOR UPDATE mode first. This is a bit similar to the SCN_ASCENDING behaviour, but there is a crucial difference: the DML statement is able to restart its work, whereas a query might already have fetched numerous rows that the client has processed. So while the DML statement is still consistent because it starts all over again, the query results are potentially inconsistent, since no restart is possible once the client has processed part of the result set.

As a side note: The "restart" effect of the "write" consistency can actually lead to triggers being fired multiple times for the same row and is one of the reasons why you should never perform non-transactional operations (typically sending an email) from within a trigger. The non-transactional operation cannot be rolled back and therefore will be potentially repeated, e.g. sending out emails to the same recipient multiple times. One possible solution to this problem is to encapsulate the non-transactional operation into something that is transactional, e.g. a job submitted via DBMS_JOB, because DBMS_JOB is transactional and the job creation will be rolled back as part of the DML restart.
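Here is a minimal sketch of that pattern; SEND_MAIL is a hypothetical procedure standing in for the non-transactional operation:

create or replace trigger t_notify_trg
after update on t
for each row
declare
  job_id binary_integer;
begin
  -- Don't call the non-transactional operation directly here: a DML
  -- restart could fire this trigger more than once for the same row.
  -- Submitting a job instead is transactional: if the statement
  -- restarts, the job submission is rolled back along with it.
  -- send_mail is hypothetical; replace it with your actual procedure.
  dbms_job.submit(job_id, 'begin send_mail; end;');
end;
/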

The "restart" behaviour of the SELECT FOR UPDATE statement is somewhat documented in the Advanced Application Developer's Guide.

Oracle Discoverer - help people write ugly code :)

There's been a discussion going on among some of my friends about all this horrible-looking (and often badly performing) auto-generated SQL coming out of Discoverer and other tools. Here are some of the comments made during the discussion, and some of my memories of how I got started with Oracle with the help of my good friend Mogens Egan...

=======================

Me:

"Oracle Discoverer - helping developers write ugly code for more than a decade."

=======================

NN:

"no no no!

the real beauty of discoverer (and similar tools) is not that it lets developers write ugly code, but that it lets people who don't know what code is (business users) write code and share it with other users who also don't know what code is. Its entire purpose in life is to let people who don't know what they are doing, do it. developers do what they do with some understanding and can, sometimes, be educated. accountants and hr people can't."

=======================

Me:

"This brings me back. From 1987 to 1990 I was in a bank, sharing an office with Mogens Egan (the father of Morten Egan) and basically creating a datawarehouse (although we didn't know it) for internal users in the bank.

Our strategy was this:

1. Every night (or once a week or whatever) we would transfer data from the bank's mainframe system via an SNA gateway to our VAX. The data came from IMS databases and was delivered as flat ASCII files (one physical record = one logical record), which often resulted in very, very long records, of course, since IMS is hierarchical. We would then load it into tables and let the users access it.

2. I would hold one- or two-day courses where I'd teach the attendees (who had probably only used a PC for a very short time) how to log onto the VAX using Smarterm, how to use VMS basic commands (including the editor), how to use SQL and SQL*Plus, how to create default forms in Forms 2.3 - and some other stuff.

3. Mogens Egan's idea was that it was better to turn users/experts (SMEs in today's jargon) into "programmers" than vice versa. And then it should be our job to fix run-away jobs (read: SQL that performed badly or messed things up for others).

A rather anarchistic approach, you could say. But man, it worked. In three years we had 1000 users, some of whom turned out to be natural super users, who started creating systems that helped their co-workers.

Since they were not officially named super users they couldn't demand to be given time to develop something they thought could be useful - they were by natural selection only allowed to spend time on something their co-workers thought useful.

Mogens and I are still in contact with many of those users. The machine is now an Alpha cluster, the data it manages runs a rather large bank's trading stuff, and all that - but its name is still Samson. And the super user we created back then is still called Supermule, which is the Danish name for Super Goof. With the introduction of English-speaking consultants in the last 10 years that has proved a minor mistake - they all ask "What's a super mule?"

So yes, we had many incidents of run-away jobs where the poor user had issued a SQL statement without the proper where-clause, etc. But then we would discover it, kill it, help the user - and all of the victims of this bad SQL knew it could be their turn one day, so they didn't get mad or upset.

That playground which we created back then generated a lot of Oracle-lovers who are still around in various higher positions, and perhaps it would have been even easier for them back then if we had had Discoverer.

So I think you're absolutely right: Discoverer will help computer-illiterates write really bad code even faster. But at least it gets them to use Oracle, and it creates wonderful problems that finance our fantastic lifestyles.

Mogens

PS: In the World as a whole, I think Discoverer had a presence (penetration) of about 2% of customers. In Denmark it was 20% due to my ex-wife Laila (Nathalie's mother), then product sales rep for Discoverer, who insisted that every single customer should have this product, like it or not. And notice how well Miracle is doing here. Perhaps there's a relationship."

===================================