Oakies Blog Aggregator

Histogram Hassle

I came across a simple performance problem recently that ended up highlighting a problem with the 12c hybrid histogram algorithm. It was a problem that I had mentioned in passing a few years ago, but only in the context of Top-N histograms and without paying attention to the consequences. In fact I should have noticed the same threat in a recent article by Maria Colgan that mentioned the problems introduced in 12c by the option “for all columns size repeat”.

So here’s the context (note – all numbers used in this example are approximations to make the arithmetic obvious).  The client had a query with a predicate like the follwing:

    t4.columnA = :b1
and t6.columnB = :b2

The optimizer was choosing to drive the query through an indexed access path into t6, which returned ca. 1,000,000 rows before joining (two tables later) to t4 at which point all but a couple of rows remained – typical execution time was in the order of tens of minutes. A /*+ leading(t4) */ hint to start on t4 with an index that returned two rows reduced the response time to the classic “sub-second”.

The problem had arisen because the optimizer had estimated a cardinality of 2 rows for the index on t6 and the reason for this was that, on average, that was the correct number. There were 2,000,000 rows in the table with 1,000,000 distinct values. It was just very unlucky that one of the values appeared 1,000,000 times and that was the value the users always wanted to query – and there was no histogram on the column to tell the optimizer that there was a massive skew in the data distibribution.

Problem solved – all I had to do was set a table preference for this table to add a histogram to this column and gather stats. Since there were so many distinct values and so much “non-popular” data in the table the optimizer should end up with a hybrid histogram that would highlight this value. I left instructions for the required test and waited for the email telling me that my suggestion was brilliant and the results were fantastic… I got an email telling me it hadn’t worked.

Here’s a model of the situation – I’ve created a table with 2 million rows and a column where every other row contains the same value but otherwise contains the rownum. Because the client code was using a varchar2() column I’ve done the same here, converting the numbers to character strings left-padded with zeros. There are a few rows (about 20) where the column value is higher than the very popular value.


rem
rem     Script:         histogram_problem_12c.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2018
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2
rem

create table t1
segment creation immediate
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 2e4
)
select
        rownum  as id,
        case
                when mod(rownum,2) = 0
                        then '999960'
                        else lpad(rownum,6,'0')
        end     as bad_col
from
        generator       v1,
        generator       v2
where
        rownum <= 2e6
;

Having created the data I’m going to create a histogram on the bad_col – specifying 254 columns – then query user_tab_histograms for the resulting histogram (from which I’ll delete a huge chunk of boring rows in the middle):


begin

        dbms_stats.gather_table_stats(
                ownname         => 'TEST_USER',
                tabname         => 'T1',
                method_opt      => 'for columns bad_col size 254'
        );

end;
/

select
        column_name, histogram, sample_size
from
        user_tab_columns
where
        table_name = 'T1'
;

column end_av format a12

select
        endpoint_number         end_pt,
        to_char(endpoint_value,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx') end_val,
        endpoint_actual_value   end_av,
        endpoint_repeat_count   end_rpt
from
        user_tab_histograms
where
        table_name = 'T1'
and     column_name = 'BAD_COL'
order by
        endpoint_number
;


COLUMN_NAME          HISTOGRAM             Sample
-------------------- --------------- ------------
BAD_COL              HYBRID                 5,513
ID                   NONE               2,000,000

    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
         1  303030303031001f0fe211e0800000 000001                1
        12  3030383938311550648a5e3d200000 008981                1
        23  303135323034f8f5cbccd2b4a00000 015205                1
        33  3032333035311c91ae91eb54000000 023051                1
        44  303239373236f60586ef3a0ae00000 029727                1
...
      2685  3938343731391ba0f38234fde00000 984719                1
      2695  39393235303309023378c0a1400000 992503                1
      2704  3939373537370c2db4ae83e2000000 997577                1
      5513  393939393938f86f9b35437a800000 999999                1

254 rows selected.

So we have a hybrid histogram, we’ve sampled 5,513 rows to build the histogram, we have 254 buckets in the histogram report, and the final row in the histogram is end point 5513 (matching the sample size). The first row of the histogram shows us the (real) low value in the column and the last row of the histogram reports the (real) high value. But there’s something very odd about the histogram – we know that ‘999960’ is the one popular value, occurring 50% of the time in the data, but it doesn’t appear in the histogram at all.

Looking more closely we see that every bucket covers a range of about 11 (sometimes 9 or 10) rows from the sample, and the highest value in each bucket appears just once; but the last bucket covers 2,809 rows from the sample with the highest value in the bucket appearing just once. We expect a hybrid histogram to have buckets which (at least initially) are all roughly the same size – i.e. “sample size”/”number of buckets” – with some buckets being larger by something like the amount that appears in their repeat count, so it doesn’t seem right that we have an enormous bucket with a repeat count of just 1. Something is broken.

The problem is that the sample didn’t find the low and high values for the column – although the initial full tablescan did, of course – so Oracle has “injected” the low and high values into the histogram fiddling with the contents of the first and last buckets. At the bottom end of the histogram this hasn’t really caused any problems (in our case), but at the top end it has taken the big bucket for our very popular ‘999960’ and apparently simply replaced the value with the high value of ‘999999’ and a repeat count of 1.

As an indication of the truth of this claim, here are the last few rows of the histogram if I repeat the experiment but, before gathering the histogram, delete the rows where bad_col is greater than ‘999960’. (Oracle’s sample is random, of course, and has changed slightly for this run.)

    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
...
      2641  3938373731371650183cf7a0a00000 987717                1
      2652  3939353032310e65c1acf984a00000 995021                1
      2661  393938393433125319cc9f5ba00000 998943                1
      5426  393939393630078c23b063cf600000 999960             2764

Similarly, if I inserted a few hundred rows with a higher value than my popular value (in this case I thought 500 rows would be a fairly safe bet as the sample was about one in 360 rows) I got a histogram which started with a bucket about the popular bucket, so the problem of that bucket being hacked to the high value was less significant:


    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
...
      2718  393736313130fe68d8cfd6e4000000 976111                1
      2729  393836373630ebfe9c2b7b94c00000 986761                1
      2740  39393330323515efa3c99771600000 993025                1
      5495  393939393630078c23b063cf600000 999960             2747
      5497  393939393938f86f9b35437a800000 999999                1

Bottom line, then: if you have an important popular value in a column and there aren’t very many rows with a higher value, you may find that Oracle loses sight of the popular value as it fudges the column’s high value into the final bucket.

Workaround

I did consider writing a bit of PL/SQL for the client to fake a realistic frequency histogram, but decided that that wouldn’t be particularly friendly to future DBAs who might have to cope with changes. Luckily the site doesn’t gather stats using the automatic scheduler job and only rarely updates stats anyway, so I suggested we create a histogram on the column using an estimate_percent of 100. This took about 8 minutes to run – for reasons that I will go into in a moment – after which I suggested we lock stats on the table and document the fact that when stats are collected on this table it’s got to be a two-pass job – the normal gather with its auto_sample_size to start with, then a 100% sample for this column to gather the histogram:


begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt       => 'for columns bad_col size 254',
                estimate_percent => 100,
                cascade          => false
        );
end;
/

    END_PT END_VAL                         END_AV          END_RPT
---------- ------------------------------- ------------ ----------
...
       125  39363839393911e01d15b75c600000 968999                0
       126  393834373530e98510b6f19a000000 984751                0
       253  393939393630078c23b063cf600000 999960                0
       254  393939393938f86f9b35437a800000 999999                0

129 rows selected.

This took a lot longer, of course, and produced an old-style height-balanced histogram. Part of the time came from the increased volume of data that had to be processed, part of it came from a suprise (which also appeared, in a different guise, in the code that created the original hybrid histogram).

I had specifically chosen the method_opt to gather for nothing but the single column. In fact whether I forced the “legact” (height-balanced) code or the modern (hybrid) code, I got a full tablescan that did some processing of EVERY column in the table and then threw most of the results away. Here are fragements of the SQL – old version first:


select /*+  
            no_parallel(t) no_parallel_index(t) dbms_stats
            cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring 
            xmlindex_sel_idx_tbl no_substrb_pad  
       */
       count(*), 
       count("ID"), sum(sys_op_opnsize("ID")),      
       count("BAD_COL"), sum(sys_op_opnsize("BAD_COL"))    
       ...
from
       "TEST_USER"."T1" t


select /*+
           full(t)    no_parallel(t) no_parallel_index(t) dbms_stats
           cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
           xmlindex_sel_idx_tbl no_substrb_pad
       */
       to_char(count("ID")),
       to_char(count("BAD_COL")),
       substrb(dump(min("BAD_COL"),16,0,64),1,240),
       substrb(dump(max("BAD_COL"),16,0,64),1,240),
       ...
       count(rowidtochar(rowid)) 
from
       "TEST_USER"."T1" t  /* ACL,TOPN,NIL,NIL,RWID,U,U254U*/

The new code only used the substrb() functions on the bad_col, but all other columns in the table were subject to the to_char(count()).
The old code applied the count() and sys_op_opnsize() to every column in the table.

This initial scan was a bit expensive – and disappointing – for the client since their table had 290 columns (which means intra-block chaining as a minimum) and had been updated so much that 45% of the rows in the table had to be “continued fetches”. I can’t think why every column had to be processed like this, but if they hadn’t been that would have saved a lot of CPU and I/O since the client’s critical column was very near the start of the table.

Finally

This problem with the popular value going missing is a known issue, for which there is a bug number, but there is further work going on in the same area which means this particular detail is being rolled into another bug fix. More news when it becomes available.

 

 

Spectre and Meltdown on Oracle Public Cloud UEK – LIO

In the last post I published the strange results I had when testing physical I/O with the latest Spectre and Meltdown patches. There is the logical I/O with SLOB cached reads.

Logical reads

I’ve run some SLOB cache reads with the latest patches, as well as with only KPTI disabled, and with KPTI, IBRS and IBPB disabled.
I am on the Oracle Public Cloud DBaaS with 4 OCPU

DB Time(s) : 1.0 DB CPU(s) : 1.0 Logical read (blocks) : 670,001.2
DB Time(s) : 1.0 DB CPU(s) : 1.0 Logical read (blocks) : 671,145.4
DB Time(s) : 1.0 DB CPU(s) : 1.0 Logical read (blocks) : 672,464.0
DB Time(s) : 1.0 DB CPU(s) : 1.0 Logical read (blocks) : 685,706.7 nopti
DB Time(s) : 1.0 DB CPU(s) : 1.0 Logical read (blocks) : 689,291.3 nopti
DB Time(s) : 1.0 DB CPU(s) : 1.0 Logical read (blocks) : 689,386.4 nopti
DB Time(s) : 1.0 DB CPU(s) : 1.0 Logical read (blocks) : 699,301.3 nopti noibrs noibpb
DB Time(s) : 1.0 DB CPU(s) : 1.0 Logical read (blocks) : 704,773.3 nopti noibrs noibpb
DB Time(s) : 1.0 DB CPU(s) : 1.0 Logical read (blocks) : 704,908.2 nopti noibrs noibpb

This is what I expected: when disabling the mitigation for Meltdown (PTI), and for some of the Spectre (IBRS and IBPB), I have a slightly better performance – about 5%. This is with only one SLOB session.

However, with 2 sessions I have something completely different:

DB Time(s) : 2.0 DB CPU(s) : 2.0 Logical read (blocks) : 1,235,637.8 nopti noibrs noibpb
DB Time(s) : 2.0 DB CPU(s) : 2.0 Logical read (blocks) : 1,237,689.6 nopti
DB Time(s) : 2.0 DB CPU(s) : 2.0 Logical read (blocks) : 1,243,464.3 nopti noibrs noibpb
DB Time(s) : 2.0 DB CPU(s) : 2.0 Logical read (blocks) : 1,247,257.4 nopti
DB Time(s) : 2.0 DB CPU(s) : 2.0 Logical read (blocks) : 1,247,257.4 nopti noibrs noibpb
DB Time(s) : 2.0 DB CPU(s) : 2.0 Logical read (blocks) : 1,251,485.1
DB Time(s) : 2.0 DB CPU(s) : 2.0 Logical read (blocks) : 1,253,477.0
DB Time(s) : 2.0 DB CPU(s) : 2.0 Logical read (blocks) : 1,271,986.7

This is not a saturation situation here. My VM shape is 4 OCPUs, which is supposed to be the equivalent of 4 hyperthreaded cores.

And this figure is even worse with 4 sessions (all cores used) and more:

DB Time(s) : 4.0 DB CPU(s) : 4.0 Logical read (blocks) : 2,268,272.3 nopti noibrs noibpb
DB Time(s) : 4.0 DB CPU(s) : 4.0 Logical read (blocks): 2,415,044.8


DB Time(s) : 6.0 DB CPU(s) : 6.0 Logical read (blocks) : 3,353,985.7 nopti noibrs noibpb
DB Time(s) : 6.0 DB CPU(s) : 6.0 Logical read (blocks): 3,540,736.5


DB Time(s) : 8.0 DB CPU(s) : 7.9 Logical read (blocks) : 4,365,752.3 nopti noibrs noibpb
DB Time(s) : 8.0 DB CPU(s) : 7.9 Logical read (blocks): 4,519,340.7

The graph from those is here:
CaptureOPCLIO001

If I compare with the Oracle PaaS I tested last year (https://blog.dbi-services.com/oracle-public-cloud-liops-with-4-ocpu-in-paas/) which was on Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz you can also see a nice improvement here on Intel(R) Xeon(R) CPU E5-2699C v4 @ 2.20GHz.

This test was on 4.1.12-112.14.10.el7uek.x86_64 and Oracle Linux has now released a new update: 4.1.12-112.14.11.el7uek

 

Cet article Spectre and Meltdown on Oracle Public Cloud UEK – LIO est apparu en premier sur Blog dbi services.

Spectre/Meltdown on Oracle Public Cloud UEK – PIO

The Spectre and Meltdown is now in the latest Oracle UEK kernel, after updating it with ‘yum update':

[opc@PTI ~]$ rpm -q --changelog kernel-uek
| awk '/CVE-2017-5715|CVE-2017-5753|CVE-2017-5754/{print $NF}' | sort | uniq -c
43 {CVE-2017-5715}
16 {CVE-2017-5753}
71 {CVE-2017-5754}

As I did on the previous post on AWS, I’ve run quick tests on the Oracle Public Cloud.

Physical reads

I’ve run some SLOB I/O reads with the patches, as well sit KPTI disabled, and with KPTI, IBRS and IBPB disabled.

And I was quite surprised by the result:


DB Time(s) : 1.0 DB CPU(s) : 0.4 Read IO requests : 23,335.6 nopti
DB Time(s) : 1.0 DB CPU(s) : 0.4 Read IO requests : 23,420.3 nopti
DB Time(s) : 1.0 DB CPU(s) : 0.4 Read IO requests : 24,857.6
DB Time(s) : 1.0 DB CPU(s) : 0.4 Read IO requests : 25,332.1


DB Time(s) : 2.0 DB CPU(s) : 0.7 Read IO requests : 39,857.7 nopti
DB Time(s) : 2.0 DB CPU(s) : 0.7 Read IO requests : 40,088.4 nopti
DB Time(s) : 2.0 DB CPU(s) : 0.7 Read IO requests : 40,627.0
DB Time(s) : 2.0 DB CPU(s) : 0.7 Read IO requests : 40,707.5


DB Time(s) : 4.0 DB CPU(s) : 0.9 Read IO requests : 47,491.4 nopti
DB Time(s) : 4.0 DB CPU(s) : 0.9 Read IO requests : 47,491.4 nopti
DB Time(s) : 4.0 DB CPU(s) : 0.9 Read IO requests : 49,438.2
DB Time(s) : 4.0 DB CPU(s) : 0.9 Read IO requests : 49,764.5


DB Time(s) : 8.0 DB CPU(s) : 1.2 Read IO requests : 54,227.9 nopti
DB Time(s) : 8.0 DB CPU(s) : 1.2 Read IO requests : 54,582.9 nopti
DB Time(s) : 8.0 DB CPU(s) : 1.3 Read IO requests : 57,288.6
DB Time(s) : 8.0 DB CPU(s) : 1.4 Read IO requests : 57,057.2

Yes. I all tests that I’ve done, the IOPS is higher with KPTI enabled vs. when booting the kernel with the nopti option. Here is a graph with those numbers:

CaptureOPCPIO001

I did those tests on the Oracle Cloud because I know that we have very fast I/O here, in hundreds of microseconds, probably all cached in the storage:

Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total Wait Avg % DB Wait
Event Waits Time (sec) Wait time Class
------------------------------ ----------- ---------- --------- ------ --------
db file parallel read 196,921 288.8 1.47ms 48.0 User I/O
db file sequential read 581,073 216.3 372.31us 36.0 User I/O
DB CPU 210.5 35.0
 
% of Total Waits
----------------------------------------------- Waits
Total 1ms
Event Waits <8us <16us <32us <64us <128u <256u =512 Event to 32m <512 <1ms <2ms <4ms <8ms <16ms =32m
------------------------- ------ ----- ----- ----- ----- ----- ----- ----- ----- ------------------------- ------ ----- ----- ----- ----- ----- ----- ----- -----
db file parallel read 196.9K .0 1.0 99.0 db file parallel read 194.9K 1.0 15.4 74.7 8.5 .3 .1 .0 .0
db file sequential read 581.2K 17.3 69.5 13.3 db file sequential read 77.2K 86.7 10.7 2.3 .2 .1 .0 .0 .0
 

So what?

I expected to have higher IOPS when disabling the page table isolation, because of the overhead of context switches. And it is the opposite here. Maybe this is because I have a very small SGA (because my goal is to have only physical reads). Note also that, as far as I know, only my guest OS has been patched for Meltdown and Spectre. We will see if the numbers are different after the next Oracle Cloud maintenance.

 

Cet article Spectre/Meltdown on Oracle Public Cloud UEK – PIO est apparu en premier sur Blog dbi services.

Clone a table

Sometimes doing a CREATE TABLE AS SELECT is all we need to copy the data from an existing table.  But what if we want more than that ?  What if we really want to clone that table to match the original as closely as possible.  We had a question along these lines on AskTOM today.  A standard CTAS copies the NOT NULL attributes and the data types, but not really much else.  We know that Data Pump will take care of it, but that is more complex than a simple CTAS.

So here is a simple routine to wrap the Data Pump calls so that the CTAS can be achieved with just as simple a command.  A database link pointing back to the same database is all we need.

Note:  The true innovation in this blog post came from Laurent’s excellent idea here.  I am just adding a small wrapper to make the process a little more palatable.  So all credit to Laurent here please.


SQL> create table emp as select * from scott.emp;

Table created.

SQL> create sequence seq start with 8000;

Sequence created.

SQL> alter table emp modify empno default seq.nextval;

Table altered.

SQL> alter table emp add primary key ( empno );

Table altered.

SQL> alter table emp add unique ( ename );

Table altered.

SQL> alter table emp compress;

Table altered.

SQL> alter table emp enable row movement;

Table altered.

And here is the routine to clone it, whilst keeping all of those additional bits of metadata.


SQL> create or replace
  2  procedure clone_tab(p_source varchar2,p_target varchar2) is
  3    n number;
  4    g varchar2(30);
  5    j varchar2(30);
  6  begin
  7    select global_name into g from global_name;
  8    begin
  9      execute immediate 'alter session close database link tmp$1';
 10    exception
 11      when others then null;
 12    end;
 13
 14    begin
 15      execute immediate 'drop database link tmp$1';
 16    exception
 17      when others then null;
 18    end;
 19
 20    execute immediate 'create database link tmp$1 using '''||g||'''';
 21
 22    if p_target like '%.%' or p_source like '%.%' then
 23      raise_application_error(-20000,'No schema prefix allowed');
 24    end if;
 25
 26    n := dbms_datapump.open('IMPORT','TABLE','TMP$1');
 27    dbms_datapump.metadata_filter(n,'NAME_LIST',''''||upper(p_source)||'''');
 28    dbms_datapump.metadata_remap(n,'REMAP_TABLE',upper(p_source),upper(p_target));
 29    dbms_datapump.start_job(n);
 30    dbms_datapump.wait_for_job(n, j);
 31  end;
 32  /

Procedure created.

SQL>
SQL> set serverout on
SQL> exec clone_tab('emp','emp2');

PL/SQL procedure successfully completed.

SQL>
SQL> select dbms_metadata.get_ddl('TABLE','EMP2',user) from dual
  2
SQL> select dbms_metadata.get_ddl('TABLE','EMP2',user) from dual;

DBMS_METADATA.GET_DDL('TABLE','EMP2',USER)
---------------------------------------------------------------------------------------

  CREATE TABLE "MCDONAC"."EMP2"
   (    "EMPNO" NUMBER(4,0) DEFAULT "MCDONAC"."SEQ"."NEXTVAL" NOT NULL ENABLE,
        "ENAME" VARCHAR2(10),
        "JOB" VARCHAR2(9),
        "MGR" NUMBER(4,0),
        "HIREDATE" DATE,
        "SAL" NUMBER(7,2),
        "COMM" NUMBER(7,2),
        "DEPTNO" NUMBER(2,0),
         UNIQUE ("ENAME")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"  ENABLE,
         PRIMARY KEY ("EMPNO")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"  ENABLE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 COMPRESS BASIC LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"  ENABLE ROW MOVEMENT

SQL> select count(*) from emp;

  COUNT(*)
----------
        14

SQL> select count(*) from emp2;

  COUNT(*)
----------
        14

Secret Hacking Session: Oracle Background Process Communication, Exotic Wait Events and Some Tracing too

Update: I unexpectedly ended up falling ill and decided to reschedule this hacking session to January 24, 10am PST. No need to re-register if you already have done so. Sorry for the inconvenience. I will upload the video to Youtube after the event.

Since I’m running my Advanced Oracle Troubleshooting Training in the end of this month, I’ll do one of my “secret” hacking sessions too for promotion and noise-making reasons next week! ;-)

Secret Hacking Session with Tanel Poder: Oracle Background Process Communication, Exotic Wait Events and Some Tracing too

In this session we will look into some internals of Oracle background process communication and also some special types of wait events that most people aren’t aware of. We will use some exotic tracing for internals research and fun and some of this stuff is actually useful in real life too! I’m not going to reveal everything upfront, as this is a secret internals hacking session after all ;-)

We will use various techniques to research what the “reliable message” wait event is about and how reliable background process communication is orchestrated in Oracle.

This is a hacking session, not formal structured training, so I’ll just do free form demos and talk (probably no slides, just hacking stuff on the command line). I will later upload the video to my Youtube channel too – https://youtube.com/TanelPoder

Oh and it’s free!

Date & Time: Wed 24 Jan 10am PST

Location: GotoWebinar

See you soon!

(I said there would probably be no slides, but maybe I’ll still show one or two ;-)

 

https://i2.wp.com/blog.tanelpoder.com/wp-content/uploads/2018/01/oracle-... 300w, https://i2.wp.com/blog.tanelpoder.com/wp-content/uploads/2018/01/oracle-... 768w, https://i2.wp.com/blog.tanelpoder.com/wp-content/uploads/2018/01/oracle-... 50w, https://i2.wp.com/blog.tanelpoder.com/wp-content/uploads/2018/01/oracle-... 1600w" sizes="(max-width: 800px) 100vw, 800px" data-recalc-dims="1" />

 

 

NB! I am running one more Advanced Oracle Troubleshooting training in 2018! You can attend the live online training and can download personal video recordings too. The Part 1 starts on 29th January 2018 - sign up here!

ASSM tangle

Here’s a follow-on from Tuesday’s (serious) note about a bug in 12.1.0.2 that introduces random slowdown on large-scale inserts. This threat in this note, while truthful and potentially a nuisance, is much less likely to become visible because it depends on you doing something that you probably shouldn’t be doing.

There have always been problems with ASSM and large-scale deletes – when should Oracle mark a block as having free space on deletion: if your session does it immediately then other sessions will start trying to use the free space that isn’t really there until you commit; if your session doesn’t do it immediately when can it happen, since you won’t want it done on commit – but that means the segment could “lose” a lot of free space if something doesn’t come along in a timely fashion and tidy up.

But here’s a quirky problem that takes things one step further. What happens if you try to delete a load of data and fail and your session rolls back? If we start with yesterday’s script (running on 11.2.0.4 or 12.2.0.1) we can create a table with 1M rows in it and the following space usage:


Unformatted		      : 	   0 /		      0
Freespace 1 (  0 -  25% free) : 	   0 /		      0
Freespace 2 ( 25 -  50% free) : 	   1 /		  8,192
Freespace 3 ( 50 -  75% free) : 	   0 /		      0
Freespace 4 ( 75 - 100% free) : 	  67 /		548,864
Full			      :       41,666 /	    341,327,872

You will recall that each “Full” block actually had the basic 10% free space, plus a couple of hundred extra bytes which Oracle had to “forget about” because the incoming rows were always 290 bytes long. Let’s take this table and delete the first 100,000 rows, then emulate a session error and roll back, and then check the space usage:


delete from t1 where rownum <= 100000;
rollback;

-- generate space usage report

Unformatted		      : 	   0 /		      0
Freespace 1 (  0 -  25% free) :        4,167 /	     34,136,064
Freespace 2 ( 25 -  50% free) : 	   1 /		  8,192
Freespace 3 ( 50 -  75% free) : 	   0 /		      0
Freespace 4 ( 75 - 100% free) : 	  67 /		548,864
Full			      :       37,499 /	    307,191,808

We have 4,167 blocks which were full, and we know they are effectively full for the purposes of our data, but they’re now declared as having some free space. When Oracle rolled back the delete it wasn’t running code that would attempt to discover that the block was going to go over the limit, it simply calculated the byte change from re-inserting the row, added it to the total free space (tosp) and produced a number that hadn’t reached the limit set by pctfree – so flagged the block accordingly. (Remember my comment in the earlier article that Oracle doesn’t generate undo for the state changes on the Level 1 bitmap blocks – this is, at least in part – a consequence of that strategy).

Guideline

If you’re going to do large-scale deletes in an ASSM environment, make sure they don’t fail or subsequent inserts may take a long time.

 

The Future of the DBA, #C18LV, Video 1

I’m starting to move towards doing more videos and hope to improve my video skills, (and maybe add a dance sequence, ya know, like the hip kids…)  Check out this post and please, do add comments, ask questions or just tell me what you think?

Have an awesome Wednesday and no, don’t comment on my consistent need to make a strange face at the beginning of a video… </p />
</p></div>

    	  	<div class=

ASSM Argh 2

After yesterday’s post one of the obvious follow-up questions was whether the problem I demonstrated was a side effect of my use of PL/SQL arrays and loops to load data. What would happen with a pure “insert select” statement.  It’s easy enough to check:


rem
rem     Script:         assm_argh2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2018
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem

drop table t2 purge;
drop table t1 purge;

create table t2
segment creation immediate
tablespace test_8k_assm
as
select * from all_objects where rownum <= 50000 -- >comment to avoid WordPress anomaly
;

create table t1
segment creation immediate
tablespace test_8k_assm
as
select * from all_objects where rownum = 0
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

insert /*+ append */ into t1
select t2.*
from
        (
         select /*+ cardinality(40) */ rownum id
         from dual connect by level <= 40 -- > comment to avoid WordPress anomaly
        ) d,
        t2
;

commit;

declare
        m_unformatted_blocks    number;
        m_unformatted_bytes     number;
        m_fs1_blocks            number;
        m_fs1_bytes             number;
        m_fs2_blocks            number;
        m_fs2_bytes             number;

        m_fs3_blocks            number;
        m_fs3_bytes             number;
        m_fs4_blocks            number;
        m_fs4_bytes             number;
        m_full_blocks           number;
        m_full_bytes            number;

begin
        dbms_space.SPACE_USAGE(
                segment_owner           => 'TEST_USER',
                segment_name            => 'T1',
                segment_type            => 'TABLE',
                unformatted_blocks      => m_unformatted_blocks,
                unformatted_bytes       => m_unformatted_bytes,
                fs1_blocks              => m_fs1_blocks ,
                fs1_bytes               => m_fs1_bytes,
                fs2_blocks              => m_fs2_blocks,
                fs2_bytes               => m_fs2_bytes,
                fs3_blocks              => m_fs3_blocks,
                fs3_bytes               => m_fs3_bytes,
                fs4_blocks              => m_fs4_blocks,
                fs4_bytes               => m_fs4_bytes,
                full_blocks             => m_full_blocks,
                full_bytes              => m_full_bytes
        );

        dbms_output.put_line('Unformatted                   : ' || to_char(m_unformatted_blocks,'999,999,990') || ' / ' || to_char(m_unformatted_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 1 (  0 -  25% free) : ' || to_char(m_fs1_blocks,'999,999,990') || ' / ' || to_char(m_fs1_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 2 ( 25 -  50% free) : ' || to_char(m_fs2_blocks,'999,999,990') || ' / ' || to_char(m_fs2_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 3 ( 50 -  75% free) : ' || to_char(m_fs3_blocks,'999,999,990') || ' / ' || to_char(m_fs3_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 4 ( 75 - 100% free) : ' || to_char(m_fs4_blocks,'999,999,990') || ' / ' || to_char(m_fs4_bytes,'999,999,999,990'));
        dbms_output.put_line('Full                          : ' || to_char(m_full_blocks,'999,999,990') || ' / ' || to_char(m_full_bytes,'999,999,999,990'));
end;
/

I’ve copied the first 50,000 rows from all_objects as a way of generating date, then cloned it 40 times into the main table to give me a total of 2,000,000 rows.

A comment on yesterday’s blog reported that the behaviour I described has been fixed in the October bundle patch for 12.1.0.2, but I haven’t patched my copy yet. So here are the results (with a little cosmetic editing) from running the insert and reporting on space usage from 11.2.0.4, 12.1.0.2, and 12.2.0.1 in order:


11.2.0.4
========
2000000 rows created.

Unformatted                   :          764 /        6,258,688
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :          133 /        1,089,536
Full                          :       28,579 /      234,119,168

12.1.0.2
========
2000000 rows created.

Unformatted                   :          256 /        2,097,152
Freespace 1 (  0 -  25% free) :       32,810 /      268,779,520
Freespace 2 ( 25 -  50% free) :            0 /                0
Freespace 3 ( 50 -  75% free) :            1 /            8,192
Freespace 4 ( 75 - 100% free) :           47 /          385,024
Full                          :          443 /        3,629,056

12.2.0.1
========
2000000 rows created.

Unformatted		      : 	 764 /	      6,258,688
Freespace 1 (  0 -  25% free) : 	   0 /		      0
Freespace 2 ( 25 -  50% free) : 	   1 /		  8,192
Freespace 3 ( 50 -  75% free) : 	   0 /		      0
Freespace 4 ( 75 - 100% free) : 	 226 /	      1,851,392
Full			      :       39,706 /	    325,271,552

The total number of blocks involved changes from version to version, of course, thanks to the huge differences in the contents of all_objects, but the headline message is clear – 12.1.0.2 is broken for this basic requirement. On the plus side, though, this is what you get from 12.1.0.2 if you change that insert to include the /*+ append */ hint:


2000000 rows created.

Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            0 /                0
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :            0 /                0
Full                          :       33,380 /      273,448,960

Unsurprisingly, 11.2.0.4 and 12.2.0.1 also behave and report 100% Full. This is a slightly special case, of course since there was no previous data in the table, but even when I started the big insert after inserting and committing a few rows all three versions behaved.

Spectre and Meltdown, Oracle Database, AWS, SLOB

Last year, I measured the CPU performance for an Oracle Database on several types of AWS instances. Just by curiosity, I’ve run the same test (SLOB cached reads) now that Amazon has applied all Spectre and Meltdown mitigation patches.

I must admit that I wanted to test this on the Oracle Cloud first. I’ve updated a IaaS instance to the latest kernel but the Oracle Unbreakable Enterprise Kernel does not include the Meltdown fix yet, and booting on the Red Hat Compatible Kernel quickly goes to a kernel panic not finding the root LVM.

This is not a benchmark you can rely on to estimate the CPU usage overhead on your application. This test is not doing system calls (so the KPTI fix should be at its minimal impact). If your application is bound on system calls (network roundtrips, physical reads) the consequences can be worse. But in that case, you have a design problem which was just masked by hardware, optimized, but insecure, by a processor running the code before testing.

Figures from last year: M4.xlarge: 4vCPU, 16GB RAM

M4 is hyper-threaded so with 2 Oracle processor licenses we can use 4 vCPU.
Here I was on Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz, 2 cores with 2 threads each.
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 1.0 13.1 0.00 5.46
DB CPU(s): 1.0 13.1 0.00 5.46
Logical read (blocks): 874,326.7 11,420,189.2

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 2.0 27.3 0.00 9.24
DB CPU(s): 2.0 27.2 0.00 9.22
Logical read (blocks): 1,540,116.9 21,047,307.6

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 4.0 54.6 0.00 14.46
DB CPU(s): 4.0 54.3 0.00 14.39
Logical read (blocks): 1,779,361.3 24,326,538.0

Jan. 2018 with Spectre and Meltdown mitigation:

Same CPU now with the latest RedHat kernel.

[ec2-user@ip-172-31-15-31 ~]$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.11.6.el7.x86_64 root=UUID=3e11801e-5277-4d87-be4c-0a9a61fbc3da ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 crashkernel=auto LANG=en_US.UTF-8

Here is the LIOPS result for the same runs.
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 1.0 13.7 0.00 4.69
DB CPU(s): 1.0 13.7 0.00 4.69
Logical read (blocks): 808,954.0 11,048,988.1

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 2.0 27.3 0.00 8.00
DB CPU(s): 2.0 27.1 0.00 7.96
Logical read (blocks): 1,343,662.0 18,351,369.1

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 4.0 42.9 0.00 13.49
DB CPU(s): 4.0 42.5 0.00 13.37
Logical read (blocks): 1,684,204.6 18,106,823.6

Jan. 2018, with Spectre and Meltdown patches, but disabled IBRS, IBPB, KPTI

The RedHat kernel has options to disable Indirect Branch Restricted Speculation, Indirect Branch Prediction Barriers and Kernel Page Table Isolation

[ec2-user@ip-172-31-15-31 ~]$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.11.6.el7.x86_64 root=UUID=3e11801e-5277-4d87-be4c-0a9a61fbc3da ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 crashkernel=auto LANG=en_US.UTF-8 nopti noibrs noibpb

Here are the same runs after rebooting with nopti noibrs noibpb kernel options:
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 1.0 30.1 0.00 4.86
DB CPU(s): 1.0 29.8 0.00 4.80
Logical read (blocks): 861,138.5 25,937,061.0

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 2.0 27.3 0.00 8.00
DB CPU(s): 2.0 27.0 0.00 7.92
Logical read (blocks): 1,493,336.8 20,395,790.6

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 4.0 42.9 0.00 13.49
DB CPU(s): 4.0 42.4 0.00 13.34
Logical read (blocks): 1,760,218.4 18,911,346.0
Read IO requests: 33.5 360.2

Then with only KPTI disabled, but all Spectre mitigation enabled

Here only the page table isolation is is disabled.

[ec2-user@ip-172-31-15-31 ~]$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.11.6.el7.x86_64 root=UUID=3e11801e-5277-4d87-be4c-0a9a61fbc3da ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 crashkernel=auto LANG=en_US.UTF-8 nopti

Here are the same runs witn only nopti kernel option:
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 1.0 30.1 0.00 3.91
DB CPU(s): 1.0 29.8 0.00 3.87
Logical read (blocks): 873,451.2 26,303,984.2

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 2.0 23.1 0.00 7.60
DB CPU(s): 2.0 22.9 0.00 7.54
Logical read (blocks): 1,502,151.4 17,360,883.8

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 4.0 42.9 0.00 12.64
DB CPU(s): 4.0 42.4 0.00 12.50
Logical read (blocks): 1,764,293.0 18,954,682.3

Large pages

The previous tests were using small pages. I did a quick test with KPTI enabled and SGA using large pages:
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 1.0 30.1 0.00 4.85
DB CPU(s): 1.0 30.1 0.00 4.85
Logical read (blocks): 854,682.1 27,672,906.8

Here is the same but with KPTI disabled:
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~~~~ --------------- --------------- --------- ---------
DB Time(s): 1.0 30.1 0.00 4.85
DB CPU(s): 1.0 30.1 0.00 4.85
Logical read (blocks): 920,129.9 27,672,906.8

So what?

This is just a test on a synthetic workload. Nothing similar to a production database situation. However, those cached SLOB runs are doing what an optimized database application should do most of the time: read blocks from the buffer cache. At least this test is much better than the graphs without explanations, or the SELECT 1, that I have seen these days on social media.

Some interesting food for thought in those numbers, by the way.

Now vs. last year: between 5% and 12% degradation, which is what people have reported those days in general. That looks high but usually when we do database performance troubleshooting we are there to address queries with x10 to x100 CPU usage doing unnecessary stuff because of bad design or suboptimal execution plan.

If disable KPTI: degradation is less than 1%, so that’s an easy way to get the same performance if you are sure that you control all software running. At least temporarily before some database tuning is done.

If disable KPTI, IBRS and IBPB: not better than when disabling only KPTI. I’ve no explanation about that… Makes me wonder if those predictive branching are always a good idea.

In all case, if you are not allocating SGA with large pages, then you should. The KPTI degradation is lowered with large pages, which makes sense as the page table is smaller. And if you are not yet using large pages, the benefit will probably balance the KPTI degradation.

This is not a benchmark and your application may see a higher degradation if doing a lot of system calls. If you upgrade from an old kernel, you may even see an improvement thanks to other features compensating the mitigation ones.

 

Cet article Spectre and Meltdown, Oracle Database, AWS, SLOB est apparu en premier sur Blog dbi services.

ASSM argh!

Here’s a problem with ASSM that used to exist in older versions of Oracle had disappeared by 11.2.0.4 and then re-appeared in 12.1.0.2 – disappearing again by 12.2.0.1. It showed up on MoS a few days ago under the heading: “Insert is running long with more waits on db file sequential read”.

The obvious response to this heading is to question the number of indexes on the table – because big tables with lots of indexes tend to give you lots of random I/O as Oracle maintains the indexes – but this table had no indexes. The owner of the problem supplied several of bits of information in the initial post, with further material in response to follow-up questions, including the tkprof summary of the 10046/level 12 trace of the insert and two extracts from the trace file to show us some of the “db file sequential read” waits – the first extract made me wonder if there might be some issue involving 16KB blocks but the second one dispelled that illusion.

There are several buggy things that can appear with ASSM and large-scale DML operations, and sometimes the problems can appear long after the original had done the dirty deed, so I thought I’d create a simple model based on the information supplied to date – and discovered what the problem (probably) was. Here’s how it starts – I’ve created a tablespace using ASSM, and in this tablespace I’ve created a table which has 48 columns with a row length of 290 bytes (roughly matching the OP’s table), and I’ve hacked out a simple PL/SQL block that loops around inserting arrays of 100 rows at a time into the table for a total of 1M rows before committing.


rem
rem     Script:         assm_cleanout.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2018
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2        Lots of blocks left "not full"
rem             11.2.0.4
rem

rem
rem     using OMF, so no file-name needed
rem     Ran this bit as SYS, and altered test user to have unlimited quota
rem

/*
create
        tablespace test_8k_assm
        datafile size 1G
        extent management local
        autoallocate
        segment space management auto
;
*/

rem
rem     Create the table, pre-allocate some space.
rem     This means we should get consistent 8M extents and not initial little ones
rem

create table t1 (
        v001 varchar2(5), v002 varchar2(5), v003 varchar2(5), v004 varchar2(5), v005 varchar2(5),
        v006 varchar2(5), v007 varchar2(5), v008 varchar2(5), v009 varchar2(5), v010 varchar2(5),
        v011 varchar2(5), v012 varchar2(5), v013 varchar2(5), v014 varchar2(5), v015 varchar2(5),
        v016 varchar2(5), v017 varchar2(5), v018 varchar2(5), v019 varchar2(5), v020 varchar2(5),
        v021 varchar2(5), v022 varchar2(5), v023 varchar2(5), v024 varchar2(5), v025 varchar2(5),
        v026 varchar2(5), v027 varchar2(5), v028 varchar2(5), v029 varchar2(5), v030 varchar2(5),
        v031 varchar2(5), v032 varchar2(5), v033 varchar2(5), v034 varchar2(5), v035 varchar2(5),
        v036 varchar2(5), v037 varchar2(5), v038 varchar2(5), v039 varchar2(5), v040 varchar2(5),
        v041 varchar2(5), v042 varchar2(5), v043 varchar2(5), v044 varchar2(5), v045 varchar2(5),
        v046 varchar2(5), v047 varchar2(5), v048 varchar2(5)
)
segment creation immediate
tablespace test_8k_assm
storage(initial 8M)
;

alter table t1 allocate extent (size 8M);
alter table t1 allocate extent (size 8M);

rem
rem     Simple anonymous pl/sql block
rem     Large insert, handled with array inserts
rem     Can modify loop count and array size very easily
rem

declare
        type tab_array is table of t1%rowtype;
        junk_array tab_array;
begin

        select
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx',
                'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx', 'xxxxx'
        bulk collect into
                junk_array
        from
                all_objects
        where
                rownum  <= 100 -- > comment to avoid WordPress format issue
        ;

        for i in 1..10000 loop
                forall j in 1..junk_array.count
                        insert into t1 values junk_array(j) ;
        end loop;

end;

commit;

The number of rows per block after this insert is 24, with 1038 bytes free space left (808 due to the pctfree = 10, then the bit that was too small to take a 25th row before breaching the pctfree barrier). This means we should report 1M/24 = 41,666 full blocks and one block with some free space. So we query the table using the dbms_space package:


declare
        m_unformatted_blocks    number;
        m_unformatted_bytes     number;
        m_fs1_blocks            number;
        m_fs1_bytes             number;
        m_fs2_blocks            number;
        m_fs2_bytes             number;

        m_fs3_blocks            number;
        m_fs3_bytes             number;
        m_fs4_blocks            number;
        m_fs4_bytes             number;
        m_full_blocks           number;
        m_full_bytes            number;

begin
        dbms_space.SPACE_USAGE(
                segment_owner           => 'TEST_USER',
                segment_name            => 'T1',
                segment_type            => 'TABLE',
                unformatted_blocks      => m_unformatted_blocks,
                unformatted_bytes       => m_unformatted_bytes,
                fs1_blocks              => m_fs1_blocks ,
                fs1_bytes               => m_fs1_bytes,
                fs2_blocks              => m_fs2_blocks,
                fs2_bytes               => m_fs2_bytes,
                fs3_blocks              => m_fs3_blocks,
                fs3_bytes               => m_fs3_bytes,
                fs4_blocks              => m_fs4_blocks,
                fs4_bytes               => m_fs4_bytes,
                full_blocks             => m_full_blocks,
                full_bytes              => m_full_bytes
        );

        dbms_output.new_line;
        dbms_output.put_line('Unformatted                   : ' || to_char(m_unformatted_blocks,'999,999,990') || ' / ' || to_char(m_unformatted_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 1 (  0 -  25% free) : ' || to_char(m_fs1_blocks,'999,999,990') || ' / ' || to_char(m_fs1_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 2 ( 25 -  50% free) : ' || to_char(m_fs2_blocks,'999,999,990') || ' / ' || to_char(m_fs2_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 3 ( 50 -  75% free) : ' || to_char(m_fs3_blocks,'999,999,990') || ' / ' || to_char(m_fs3_bytes,'999,999,999,990'));
        dbms_output.put_line('Freespace 4 ( 75 - 100% free) : ' || to_char(m_fs4_blocks,'999,999,990') || ' / ' || to_char(m_fs4_bytes,'999,999,999,990'));
        dbms_output.put_line('Full                          : ' || to_char(m_full_blocks,'999,999,990') || ' / ' || to_char(m_full_bytes,'999,999,999,990'));

end;
/

The results aren’t what we expect:


Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :       35,001 /      286,728,192
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :           67 /          548,864
Full                          :        6,665 /       54,599,680

We have one block marked as 25 – 50% free (that’s the one block with 16 rows in it, which means about 40% space currently free) but our 41,666 full blocks are actually reported as 6,665 full blocks and 35,001 blocks with some space available. That’s going to hurt eventually if some process wants to insert more rows and finds that it has to fail its way through 35,001 blocks before finding a block which has enough free space.

So what happens when I repeat the PL/SQL block (and commit)? Here are the results from calls to dbms_space after the next two cycles:


Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :       70,002 /      573,456,384
Freespace 2 ( 25 -  50% free) :            2 /           16,384
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :          220 /        1,802,240
Full                          :       13,330 /      109,199,360

Unformatted                   :          256 /        2,097,152
Freespace 1 (  0 -  25% free) :      105,003 /      860,184,576
Freespace 2 ( 25 -  50% free) :            3 /           24,576
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :          117 /          958,464
Full                          :       19,995 /      163,799,040

Every time we execute the PL/SQL block we leave a trail of 35,001 more blocks which are flagged as “not quite full”.

Looking at the session stats while running the insert loop I can tell that Oracle isn’t checking to see whether or not it should be using those blocks. (A quick way of proving this is to flush the buffer cache before each execution of the PL/SQL and note that Oracle doesn’t read back the 105,000 blocks before inserting any data). So somehow, sometime, someone might get a nasty surprise – and here’s one way that it might happen:

Since I know I my data fits 24 rows per block I’m going to modify my PL/SQL block to select one row into the array then loop round the insert 25 times – so I know I’m inserting a little bit more than one block’s worth of data. Starting from the state with 105,003 blocks marked as “Freespace 1” this is what I saw – first, the free space report after inserting 25 rows:


Unformatted                   :          240 /        1,966,080
Freespace 1 (  0 -  25% free) :        1,074 /        8,798,208
Freespace 2 ( 25 -  50% free) :            0 /                0
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :          133 /        1,089,536
Full                          :      123,927 /    1,015,209,984

Then a few wait events and session statistics for the insert:


---------------------------------------------------------
SID:    39:TEST_USER - jonathan
Session Events - 09-Jan 16:57:18
Interval:-      6 seconds
---------------------------------------------------------
Event                                             Waits   Time_outs        Csec    Avg Csec    Max Csec
-----                                             -----   ---------        ----    --------    --------
db file sequential read                          15,308           0         128        .008           3
db file scattered read                           20,086           0         271        .014           4

---------------------------------
Session stats - 09-Jan 16:57:18
Interval:-  6 seconds
---------------------------------
Name                                                                     Value
----                                                                     -----
session logical reads                                                  269,537
physical read total IO requests                                         35,401
db block gets                                                          229,522
consistent gets                                                         40,015
physical reads                                                         124,687
physical reads cache                                                   124,687
db block changes                                                       208,489
physical reads cache prefetch                                           89,293
redo entries                                                           207,892
redo size                                                           16,262,724
undo change vector size                                                  1,720
deferred (CURRENT) block cleanout applications                         103,932
table scan blocks gotten                                                20,797
HSC Heap Segment Block Changes                                              25

The session has read and updated almost all of the level 1 bitmap blocks. I don’t know exactly what triggered this re-read, but seems to be related to the number of rows inserted (or, perhaps, the amount of space used rather than the row count) as an insert crosses the pctfree boundary and fails over to the next block. I’ve only done a couple of little tests to try and get a better idea of why an insert sometimes sweeps through the bitmap blocks – so I know that inserting 2 or 3 rows at a time will also trigger the cleanout – but there are probably several little details involved that need to be identified.

You might note a couple of details in the stats:

  • Because I had flushed the buffer cache before the insert Oracle did its “cache warmup” tablescanning trick – if this had not happened I would probably have done a single block read for every single bitmap block I touched.
  • There are 103,932 block cleanout applications – but 208,000 db block changes and redo entries. Roughly half the latter are for data block cleanouts (OP code 4.1) and half are the state changes on the level 1 bitmap blocks (OP code 13.22). You’ll notice that neither change produces any undo.
  • I’ve also included the HSC Heap Segment Block Changes statistics to show you that not all changes to Heap Segment Blocks show up where you might expect them.

And finally:

If you re-run the tests on 11.2.0.4 and 12.2.0.1 you get the following results after the intial script run – the problem doesn’t exist:


11.2.0.4
========
Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :           67 /          548,864
Full                          :       41,666 /      341,327,872

12.2.0.1
========
Unformatted                   :            0 /                0
Freespace 1 (  0 -  25% free) :            0 /                0
Freespace 2 ( 25 -  50% free) :            1 /            8,192
Freespace 3 ( 50 -  75% free) :            0 /                0
Freespace 4 ( 75 - 100% free) :           67 /          548,864
Full                          :       41,666 /      341,327,872

So how does this help the OP.

  • First, there may be a huge mess still waiting to be cleaned in the table – but at 34M blocks I’m not too keen to suggest running the dbms_space routine to find out what it looks like – but maybe that’s necessary.
  • Secondly – an upgrade to 12.2 will probably avoid the problem in future.
  • Thirdly – if the number of rows per block is very close to uniform, write a little code to do a loop that inserts (say) 2 * expected number of rows per block as single row inserts and rolls back; the inserts will probably trigger a massive though perhaps not complete cleanout, so rinse and repeat until the cleanout is complete. Try to find a time when you don’t mind the extra load to get this job done.
  • Finally – on the big job that does the bulk insert – repeat the dummy insert/rollback at the end of the job to clean up the mess made by the job.

Addenda

Prompted by comment #2 below, I should add that if the problem has been fixed in 12.2 then possibly there’s a bug report and patch for it already. If there isn’t then the OP could raise an SR (referencing this blog note), and request a bug fix or back-port from 12.2.

And with 24 hours of publication, comment #4 (from Yury Pudovchenko) tells us that the bug is fixed by the Oct 2017 Bundle Patch.