Following on from yesterday’s post on consistent reads, I thought I’d make the point that the way you work can make an enormous difference to the amount of work you do. Here’s a silly little demo (in 10.2.0.3):
drop table t1 purge;

create table t1 (id number, n1 number);
insert into t1 values (1,0);
insert into t1 values (2,0);
commit;

execute dbms_stats.gather_table_stats(user,'t1')

execute snap_my_stats.start_snap

begin
	for i in 1..1000 loop
		update t1 set n1 = i where id = 1;
	end loop;
end;
/

execute snap_my_stats.end_snap

set doc off
doc

Output - 10.2.0.3 (some rows deleted)

Name                                                        Value
----                                                        -----
opened cursors cumulative                                      11
user calls                                                      6
recursive calls                                             1,068
recursive cpu usage                                             7
session logical reads                                       4,041
CPU used when call started                                      7
CPU used by this session                                        7
DB time                                                         6
db block gets                                               1,030
db block gets from cache                                    1,030
consistent gets                                             3,011
consistent gets from cache                                  3,011
consistent gets - examination                                   4
db block changes                                            2,015
change write time                                               4
free buffer requested                                       1,014
switch current to new buffer                                1,000
calls to kcmgas                                             1,014
calls to get snapshot scn: kcmgss                           3,009
redo entries                                                  960
redo size                                                 295,160
undo change vector size                                   111,584
no work - consistent read gets                              1,004
table scans (short tables)                                  1,001
table scan rows gotten                                      2,002
table scan blocks gotten                                    1,001
buffer is pinned count                                      1,000
execute count                                               1,009

#
I’ve created two rows in a table, then updated one of them 1,000 times – using a table scan to do the update. I haven’t yet committed my transaction. At this point I’m going to use a second session to run the same update loop on the second row in the table:
begin
	for i in 1..1000 loop
		update t1 set n1 = i where id = 2;
	end loop;
end;
/

Name                                                        Value
----                                                        -----
opened cursors cumulative                                       8
user calls                                                      6
recursive calls                                             1,009
recursive cpu usage                                           170
session logical reads                                     965,999
CPU used when call started                                    172
CPU used by this session                                      172
DB time                                                       173
db block gets                                               1,030
db block gets from cache                                    1,030
consistent gets                                           964,969
consistent gets from cache                                964,969
consistent gets - examination                             961,965
db block changes                                            3,016
consistent changes                                      1,001,000
free buffer requested                                       1,015
CR blocks created                                           1,001
calls to kcmgas                                             1,015
calls to get snapshot scn: kcmgss                           3,008
redo entries                                                1,936
redo size                                                 358,652
undo change vector size                                   111,608
data blocks consistent reads - undo records applied     1,001,000
cleanouts and rollbacks - consistent read gets              1,001
immediate (CR) block cleanout applications                  1,001
active txn count during cleanout                            2,000
cleanout - number of ktugct calls                           1,001
IMU CR rollbacks                                           41,041
table scans (short tables)                                  1,001
table scan rows gotten                                      2,002
table scan blocks gotten                                    1,001
execute count                                               1,006
Many of the statistics are (virtually) identical (e.g. “execute count”, “db block gets”, “free buffer requested”); some show an increase of 1,000 (often from zero) – largely because we have to worry 1,000 times about cleaning out the current block and creating a read-consistent version so that we can check whether it can be updated.
But the most noticeable changes are in the CPU time and “consistent gets”, because of the 1,000 times we have to apply 1,000 undo records to the block as we create the read-consistent version of the block. The CPU time has gone from 7 (hundredths of a second) to 172 because of (roughly) 1,000,000 “consistent gets – examination”. As I mentioned yesterday, this matches closely to “data blocks consistent reads – undo records applied”, so we know why they are happening. Watch out in your batch jobs – if you have a lot of concurrent update activity going on, a significant fraction of the workload may be the constant re-creation of CR clones.
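As a sanity check on the arithmetic, here’s a toy cost model (my own sketch, not Oracle internals) showing why the work is multiplicative – each of the second session’s 1,000 scans has to roll back (roughly) all 1,000 of the first session’s uncommitted changes:

```python
# Toy cost model (illustration only, not Oracle internals): session 2 scans
# the table 1,000 times while session 1 is sitting on 1,000 uncommitted
# changes to the block; every scan has to build a CR clone by applying
# (roughly) every outstanding undo record.
def cr_clone_work(scans, uncommitted_changes):
    """Approximate total undo records applied across all CR clone creations."""
    return scans * uncommitted_changes

total = cr_clone_work(1000, 1000)
print(total)  # 1000000 -- the order of magnitude seen in
              # "data blocks consistent reads - undo records applied"
```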
However, there is another interesting detail to watch out for – what happens if I change the update execution path from a tablescan to an indexed access path:
create table t1 (id number, n1 number);
insert into t1 values (1,0);
insert into t1 values (2,0);
commit;

execute dbms_stats.gather_table_stats(user,'t1')

create index t1_i1 on t1(id);	-- Make indexed access path available.
Then with an index hint in my update code, I get the following effects (having done the same update loop on row 1 in the first session, of course):
begin
	for i in 1..1000 loop
		update /*+ index(t1) */ t1 set n1 = i where id = 2;	-- indexed access path hinted
	end loop;
end;
/

Name                                                        Value
----                                                        -----
opened cursors cumulative                                       7
user calls                                                      6
recursive calls                                             1,006
recursive cpu usage                                            11
session logical reads                                       2,036
CPU used when call started                                     11
CPU used by this session                                       11
DB time                                                        11
db block gets                                               1,030
db block gets from cache                                    1,030
consistent gets                                             1,006
consistent gets from cache                                  1,006
consistent gets - examination                                   6
db block changes                                            2,015
free buffer requested                                          14
shared hash latch upgrades - no wait                        1,000
calls to kcmgas                                                14
calls to get snapshot scn: kcmgss                           1,004
redo entries                                                  960
redo size                                                 295,144
undo change vector size                                   111,608
index crx upgrade (positioned)                              1,000
index scans kdiixs1                                         1,000
execute count                                               1,005
The difference is astonishing – where did all the ‘create CR copy’ activity go ?
I’ve pointed out before now that choosing a different execution plan for an update can have a big impact on performance – this is just another example demonstrating the point.
Here’s a quick demo to make a point about consistent reads (prompted by a question on the Oracle-L mailing list):
SQL> drop table t1;

Table dropped.

SQL> create table t1 (n1 number);

Table created.

SQL> insert into t1 values(0);

1 row created.

SQL> begin
  2  	for i in 1..1000 loop
  3  		update t1 set n1 = i;
  4  	end loop;
  5  end;
  6  /

PL/SQL procedure successfully completed.
Note that I haven’t issued a commit in this session, and all I’ve got is a single row in the table (and because it’s my usual demo setup of locally managed tablespaces with uniform extents of 1MB using freelist management I know that that one row is in the first available block of the table).
How much work is a second session going to do to scan that table ?
SQL> alter system flush buffer_cache;

SQL> execute snap_my_stats.start_snap

SQL> select * from t1;

SQL> set serveroutput on size 1000000 format wrapped
SQL> execute snap_my_stats.end_snap

---------------------------------
Session stats - 18-Apr 11:33:01
Interval:-  2 seconds
---------------------------------
Name                                                        Value
----                                                        -----
session logical reads                                         967
consistent gets                                               967
consistent gets from cache                                    967
consistent gets - examination                                 964
consistent changes                                          1,001
CR blocks created                                               1
data blocks consistent reads - undo records applied         1,001
IMU CR rollbacks                                               41
The snap_my_stats package is similar in concept to Tom Kyte’s “runstats” or Tanel Poder’s “snapper” program to capture changes in values in the dynamic performance views over short time periods. In this case I’ve deleted all but a few of the larger changes, and a couple of small changes.
The figure that stands out (probably) is the “session logical reads” – we’ve done 967 logical I/Os to scan a table of just one block. The reason for this is that we’ve created a read-consistent copy of that one block (“CR blocks created” = 1), and it has taken a lot of work to create that copy. We’ve had to apply 1,001 undo records (“data blocks consistent reads – undo records applied” = 1,001).
Most of those undo records come from individual accesses to undo blocks (accesses of the cheaper “consistent gets – examination” type that need only a single get on the “cache buffers chains” latch), following the “UBA (undo block address)” pointer in the relevant ITL entry of the table block. But since this is a 10g database, the last few undo records come out of the “In-memory Undo” of the other session. Basically the cloning operation is something like this:
It is an interesting point that as the first session created undo records it would pin and fill undo blocks – so it would only do a few current gets (one for each block) on the undo blocks it was using. As another process reverses out the changes in a CR clone, however, it has to get and release each undo block every time it wants a single undo record … applying undo records introduces far more latch and buffer activity than the original generation of the undo.
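To make the per-record get/release point concrete, here’s a hypothetical sketch of the cloning loop in Python – every name and structure is invented for illustration (the real code is internal to Oracle), but it captures the shape of the work: copy the current block, then walk the undo chain one record at a time, paying a get on the undo block for each record, until the copy is old enough for the query:

```python
# Hypothetical sketch of CR clone creation (all names invented). The key
# point: the reader pays one get/release of an undo block per undo record
# applied, whereas the writer pinned and filled each undo block just once.
def create_cr_clone(block, undo_chain, query_scn):
    clone = dict(block)                  # copy the current block
    gets = 0
    for rec in undo_chain:               # follow the UBA pointers backwards
        if clone["scn"] <= query_scn:    # old enough for the query: stop
            break
        gets += 1                        # get (and release) the undo block
        clone[rec["column"]] = rec["old_value"]
        clone["scn"] = rec["prior_scn"]
    return clone, gets

# Block last updated 1,000 times (n1 = 1000); query needs it as of scn 1000.
block = {"n1": 1000, "scn": 2000}
undo = [{"column": "n1", "old_value": 999 - i, "prior_scn": 1999 - i}
        for i in range(1000)]            # most recent change first
clone, gets = create_cr_clone(block, undo, query_scn=1000)
print(clone["n1"], gets)                 # 0 1000 -- 1,000 undo records applied
```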
It’s worth knowing that there are three statistics relating to applying undo records:
transaction tables consistent reads - undo records applied      Estimating "old" commit SCNs during delayed block cleanout
data blocks consistent reads - undo records applied             Creating CR clones
rollback changes - undo records applied                         The result of a real "rollback;"
The second step in the list of actions is: “Notice uncommitted transaction”. It’s probably worth pointing out that another part of the ITL entry holds the transaction id (“xid”) which implicitly identifies the undo segment and transaction table slot in that segment that has been used to hold the transaction state. The current contents of that slot allow Oracle to determine whether or not (and when, if necessary) the transaction was committed.
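For illustration, a block dump shows the ITL’s xid as three hex fields – undo segment number, transaction table slot, and sequence (wrap) number – in a form like “0x000a.01c.00000e1a”. A little decoder (offered as a sketch, not any supported API) shows what each field carries, and hence which undo segment header holds the transaction table slot to check:

```python
# Decode an ITL-style transaction id of the form "0x000a.01c.00000e1a"
# (undo segment number . transaction table slot . sequence/wrap number,
# all in hex). These are the same three values exposed in decimal as
# XIDUSN, XIDSLOT and XIDSQN in v$transaction.
def decode_xid(xid):
    usn, slot, sqn = (int(part, 16) for part in xid.replace("0x", "").split("."))
    return {"undo_segment": usn, "slot": slot, "sequence": sqn}

print(decode_xid("0x000a.01c.00000e1a"))
# {'undo_segment': 10, 'slot': 28, 'sequence': 3610}
```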
April 14, 2011 I find interesting details from time to time about Oracle Database. I am in the process of organizing material for a presentation, and I encountered something interesting when testing Oracle Database 126.96.36.199 with patch 3 applied. I traced the execution of an application and then processed the 660MB trace file using a trace file [...]
This one is long overdue. There is a partitioning example of binary XML on this website based on Range, Hash and List partitioning, but from an XML point of view it is an incorrect way to do it, because the partition column is a regular one and not a virtual column as described in the Oracle XMLDB Developers Guide for 11.2. The examples given in that post will partition the table on the ID column, but do not partition the XML based on a value/element IN the XML document. The following will.
So here, a small example, of how it can be done based on a virtual column. Be aware that you should support these virtual columns with at least an index or XMLIndex structure for performance reasons.
CREATE TABLE binary_part_xml OF XMLType
XMLTYPE STORE AS SECUREFILE BINARY XML
VIRTUAL COLUMNS
(
  LISTING_TYPE AS (XMLCast(XMLQuery('/LISTING/@TYPE'
                   PASSING OBJECT_VALUE RETURNING CONTENT)
                   AS VARCHAR2(100)))
)
PARTITION BY LIST (LISTING_TYPE)
(
  PARTITION health    VALUES ('Health'),
  PARTITION law_firms VALUES ('Law Firm')
);
First, a reminder – my Advanced Oracle Troubleshooting v2.0 online seminar starts next week already. Last chance to sign up, I can accept registrations until Sunday :-)
I won’t do another AOT seminar before Oct (or Nov) this year. More details and sign-up here:
I have rescheduled my Advanced SQL Tuning and Partitioning & Parallel Execution for Performance seminars too. I will do them in September/October. Unfortunately I’m too busy right now to do them before the summer.
And that’s all the travel I will do this year…
I’ll soon announce the 2nd EsSN virtual conference too ;-)
Free online stuff:
Perhaps in a month or so I will do another hacking session (I’ll plan 2 hours this time, 1 hour isn’t nearly enough for going deep). The topic will probably be about low-level details of SQL plan execution internals… stay tuned!
Steve Bamber has written up a case study of library cache latch contention troubleshooting of an Apex application with LatchProf. I’m happy that others also see the value and have had success with my new LatchProf-based latch contention troubleshooting approach, which takes into account both sides of the contention story (latch waiters and latch holders/blockers) as opposed to the guesswork used previously (hey, if it’s shared pool latch contention – it must be about bad SQL not using bind variables …. NOT always…)
Anyway, I’m happy. If you have success stories with LatchProf, please let me know!
As a second topic of interest, Laimutis Nedzinskas has written some good notes about the effect and overhead of Flashback Database option when you are using and modifying (nocache) LOBs. We’ve exchanged some mails on this topic and yeah, my clients have sure seen some problems with this combination as well. You basically want to keep your LOBs cached when using FB database…
Ok, it’s official – the first and only Oracle Troubleshooting TV show is live now!
The first show is almost 2 hours about the ORA-4031 errors and shared pool hacking. It’s a recording of the US/EMEA timezone online hacking session I did some days ago.
There are a couple of things to note:
View the embedded video below or go to my official Oracle Troubleshooting TV show channel:
I know I have mentioned it before, but I am a big fan of the OEM performance screens that are derived from the ASH/AWR information. One of the things I really like about it is the immediate information it gives you, in one glance, that things are “not normal”. Once you notice that things are not normal you can then, within a few seconds, get a feel for what is still probably OK and where you have something that has changed.
As an example of the immediate information, I recently came back to my desk and glanced at my OEM performance screen. It was showing the below:
“Data load has just run,” I said to my comrade-in-arms. “Which one?” he asked. “The Delta – it ran the quick plan. But it started a bit late, 12:15. Oh, and it looks like the transaction view code has swapped back to the full table scan plan, and the summary code is not playing up at the moment.”
“you’re turning into Neo you are – can you see a lady in a red dress???” he asked.
That was of course a reference to the “Matrix” films, where at times you see the virtual world displayed on a screen as a stream of characters running down it – but once you get used to it you can apparently “see” what is going on.
The screen shot above is not actually a very good example of what the performance screens can show you. One of my minor complaints about the performance screens is that the graph scales to the greater of the largest recent peak and the number of sessions that matches the number of CPUs (real or fake) available to you. So if you have more CPU available than you need, you can’t see much detail in the graph. And if you have had a nasty peak of activity, again, all detail is squeezed out. In my case, the box is sized to cope with the load expected in 12 months’ time and the system is new, so activity is scuttling along the bottom of the graph.
However, “poor” though the example is, it told me what was going on across my system at a glance, something about the major tasks we are running, that one problem is currently occurring and that several of the other issues I need to keep an eye out for are not occurring.
That is why I love these screens – I recognise “my” activity patterns from the graph, and I now recognise the SQL IDs for my key statements. If I see a pattern in the graph I don’t recognise, I need to check things out immediately. Three or four times over the last 2 weeks I have spotted an issue, started investigating and found out the cause before the Operations desk has even noticed anything wrong.
Oh, and what is SQL type 189? It is a merge statement. Our implementation of OEM is a little old and does not correctly interpret that SQL command type. It might be a little old, but it is still very useful.
People talk about the Oracle SQL Developer 3 being out, which is cool, but I have something even cooler for you today ;-)
I finally figured out how to convert my screen-recordings to uploadable videos, so that the text wouldn’t get unreadable and blurry.
So, here’s the first video, about a tool called MOATS, which we have built together with fellow OakTable Network member and a PL/SQL wizard Adrian Billington (of oracle-developer.net).
Here’s the video, it’s under 3 minutes long. Play the video in full screen for best results (and if it’s too slow loading, change it to lower resolution from HD mode):
Check it out, and if you like MOATS, you can download it from Adrian’s website (current version 1.05) – and make sure you read the README.txt file in the zip!
Also thanks to Randolf Geist for finding and fixing some bugs in our alpha code… Note that MOATS is still kind of beta right now…
P.S. I will post my ORA-4031 and shared pool hacking video real soon now, too! :-)
P.P.S. Have you already figured out how it works?! ;-)
Update: Now you can suggest new features and improvement requests here:
Here’s a note I’ve been meaning to research and write up for more than 18 months – ever since Dion Cho pinged a note I’d written about the effects of partitioning because of a comment it made about the “2% small table threshold”.
It has long been an item of common knowledge that Oracle has a “small table threshold” that allows for special treatment of data segments that are smaller than two percent of the size of the buffer cache, viz:
If a table is shorter than the threshold then it is read to the midpoint of the cache (just like any other block read) but – whether by accident or design – the touch count (x$bh.tch) is not set and the table will fall off the LRU end of the buffer cache fairly promptly as other objects are read into the buffer. Such a tablescan would be recorded under the statistic “table scans (short tables)”.
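As a rough illustration of that traditional description – and this is a toy model, nothing like the real cache code – blocks loaded at the midpoint with a touch count of zero sit in the cold half of the list and are the first candidates for replacement, so a short table’s blocks disappear quickly once other reads come through:

```python
from collections import deque

# Toy midpoint-insertion LRU (a gross simplification of the real buffer
# cache): buffers read from disk are inserted at the head of the cold half
# with tch = 0, and replacement comes from the LRU end. With no touch-count
# increment, a short-table scan's blocks simply drift off the end.
class ToyColdHalf:
    def __init__(self, size):
        self.cold = deque()        # left = midpoint, right = LRU end
        self.size = size
    def read(self, block):
        if len(self.cold) >= self.size:
            self.cold.pop()        # evict from the LRU end
        self.cold.appendleft(block)  # insert at the midpoint, tch = 0

cache = ToyColdHalf(100)
for b in range(30):                # "short table" scan: 30 blocks
    cache.read(("short", b))
for b in range(100):               # subsequent reads push them to the tail
    cache.read(("other", b))
still_cached = [b for b in cache.cold if b[0] == "short"]
print(len(still_cached))           # 0 -- the short table's blocks fell off
```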
Then, in July 2009, Dion Cho decided to check this description before repeating it, and set about testing it on Oracle 10gR2 – producing some surprising results and adding another item to my to-do list. Since then I have wanted to check his conclusions, check whether the original description had ever been true and when (or if) it had changed.
As a simple starting point, of course, it was easy to check the description of the relevant (hidden) parameter to see when it changed:
188.8.131.52      _small_table_threshold    threshold level of table size for forget-bit enabled during scan
184.108.40.206    _small_table_threshold    threshold level of table size for direct reads
220.127.116.11    _small_table_threshold    lower threshold level of table size for direct reads
This suggests that the behaviour might have changed some time in 9i (18.104.22.168 happened to be the earliest 9i listing of x$ksppi I had on file) – so I clearly had at least three major versions to check.
The behaviour of the cache isn’t an easy thing to test, though, because there are a number of special cases to consider – in particular the results could be affected by the positioning of the “mid-point” marker (x$kcbwds.cold_hd) that separates the “cold” buffers from the “hot” buffers. By default the hot portion of the default cache is 50% of the total cache (set by hidden parameter _db_percent_hot_default), but on instance startup or after a “flush buffer cache” there are no used buffers, so the behaviour can show some anomalies.
So here’s the basic strategy:
Here’s some sample code:
create table t_15400
pctfree 99
pctused  1
as
with generator as (
	select	--+ materialize
		rownum	id
	from	dual
	connect by rownum <= 10000
)
select
	rownum			id,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 15400
;

create index t_15400_id on t_15400(id);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 => 'T_15400',
		estimate_percent => 100,
		method_opt	 => 'for all columns size 1'
	);
end;
/

select
	object_name, object_id, data_object_id
from
	user_objects
where
	object_name in (
		'T_300', 'T_770', 'T_1540',
		'T_3750', 'T_7700',
		'T_15400', 'T_15400_ID'
	)
order by
	object_id
;

select
	/*+ index(t) */
	max(small_vc)
from
	t_15400 t
where
	id > 0
;
The extract shows the creation of just the last and largest table I created and collected statistics for – and it was the only one with an index. I chose the number of blocks (I’ve rigged one row per block) because I had set up a db_cache_size of 128MB on my 10.2.0.3 Oracle instance and this had given me 15,460 buffers.
As you can see from the query against user_objects my test case included tables with 7,700 rows (50%), 3,750 rows (25%), 1,540 rows (10%), 770 rows (5%) and 300 rows (2%). (The number in brackets are the approximate sizes of the tables – all slightly undersized – relative to the number of buffers in the default cache).
Here’s the query that I then ran against x$bh (connected as sys from another session) to see what was in the cache (the range of values needs to be adjusted to cover the range of object_id reported from user_objects):
select
	obj, tch, count(*)
from
	x$bh
where
	obj between 77710 and 77720
group by
	obj, tch
order by
	count(*)
;
After executing the first index range scan of t_15400 to fill the cache three times:
       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75855          0          1
     75854          0          1
     75853          0          1
     75851          0          1
     75850          0          1
     75849          0          1
     75852          0          1
     75855          2          9	-- Index blocks, touch count incremented
     75855          1         18	-- Index blocks, touch count incremented
     75854          1      11521	-- Table blocks, touch count incremented
Then after three tablescans, at 4 second intervals, of the 7,700 block table:
       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1	-- segment header of 7700 table, touch count incremented each time
     75855          0          1
     75854          0          1
     75852          0          1
     75849          0          1
     75850          0          1
     75851          0          1
     75855          2          9
     75855          1         10
     75853          0       3991	-- lots of blocks from 7700 table, no touch count increment
     75854          1       7538
Then repeating the tablescan of the 3,750 block table three times:
       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1
     75855          0          1
     75854          0          1
     75851          0          1
     75852          3          1	-- segment header block, touch count incremented each time
     75849          0          1
     75850          0          1
     75855          2          9
     75855          1         10
     75853          0        240
     75852          0       3750	-- table completely cached - touch count not incremented
     75854          1       7538
Then repeating the tablescan of the 1,540 block table three times:
       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1
     75855          0          1
     75854          0          1
     75851          3          1	-- segment header block, touch count incremented each time
     75849          0          1
     75850          0          1
     75852          3          1
     75855          2          9
     75855          1         10
     75853          0        149
     75851          2       1540	-- Table fully cached, touch count incremented but only to 2
     75852          0       2430
     75854          1       7538
Then executing the tablescan of the 770 block table three times:
       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1
     75855          0          1
     75850          3          1	-- segment header block, touch count incremented each time
     75849          0          1
     75851          3          1
     75852          3          1
     75854          0          1
     75855          2          9
     75855          1         10
     75851          0         69
     75853          0        149
     75850          2        770	-- Table fully cached, touch count incremented but only to 2
     75851          2       1471
     75852          0       1642
     75854          1       7538
Finally executing the tablescan of the 300 block table three times:
       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1
     75855          0          1
     75854          0          1
     75850          3          1
     75852          3          1
     75851          3          1
     75855          2          9
     75855          1         10
     75851          0         69
     75850          0        131
     75853          0        149
     75849          3        301	-- Table, and segment header, cached and touch count incremented 3 times
     75850          2        639
     75852          0       1342
     75851          2       1471
     75854          1       7538
This set of results on its own isn’t conclusive, of course, but the indications for 10.2.0.3 are:
I can’t state with any certainty where the used and recycled buffers might be, but since blocks from the 3750 tablescan removed the blocks from the 7700 tablescan, it’s possible that “large” tablescans do somehow go “to the bottom quarter” of the LRU.
There is also some benefit in checking the statistics “table scans (short tables)” and “table scans (long tables)” as the tests run. For the 2% (300 block) table I recorded 3 short tablescans; for the tables in the 2% to 10% range (770 and 1540 blocks) I recorded one long and two short scans (which is consistent with the touch count increment of 2 – the first scan was expected to be long, but the 2nd and 3rd were deemed to be short based on some internal algorithm about the tables being fully cached); finally, for the tables above 10% I always got 3 long tablescans.
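Pulling those observations together, here’s a guess at the classification as a sketch – this is inferred from the test results above, not the real algorithm, and the 2% and 10% cut-offs and the “fully cached” test are all assumptions:

```python
# A guess at the short/long tablescan classification suggested by the test
# results (NOT the real algorithm): a scan of a table under ~2% of the cache
# counts as short; a repeat scan of a table up to ~10% of the cache can also
# count as short once the table is known to be fully cached.
def classify_scan(table_blocks, cache_blocks, fully_cached):
    small_table_threshold = cache_blocks * 2 // 100       # ~2% of the cache
    if table_blocks <= small_table_threshold:
        return "short"
    if fully_cached and table_blocks <= cache_blocks // 10:  # ~10% observed
        return "short"
    return "long"

cache = 15460                                 # buffers in the test cache
print(classify_scan(300,  cache, False))      # short (under 2%)
print(classify_scan(1540, cache, False))      # long  (first scan)
print(classify_scan(1540, cache, True))       # short (repeat scan, cached)
print(classify_scan(3750, cache, True))       # long  (above 10%)
```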
But as it says in the original note on small partitions – there are plenty of questions still to answer:
I’ve quoted the 2% as a fraction of the db_cache_size – but we have automatic SGA management in 10g, automatic memory management in 11g, and up to eight different cache sizing parameters in every version from 9i onwards. What figure is used as the basis for the 2%? Is it 2% of the blocks or 2% of the bytes? And if you have multiple block sizes, does each cache perhaps allow 2% of its own size?
And then, in 11g we have to worry about automatic direct path serial tablescans – and it would be easy to think that the “_small_table_threshold” may have been describing that feature since (at least) 22.214.171.124 if its description hadn’t changed slightly for 11.2 !
So much to do, so little time — but at least you know that there’s something that needs careful investigation if you’re planning to do lots of tablescans.
Footnote: Having written some tests, it’s easy to change versions. Running on 126.96.36.199 and 188.8.131.52, with similar sized caches, I could see that the “traditional” description of the “small_table_threshold” was true – a short tablescan was anything less than 2% of the buffer cache; long tablescans were (in effect) done using just a window of “db_file_multiblock_read_count” buffers, and in both cases the touch count was never set (except for the segment header block).