
Case Study

A recent thread on the ODC database forum highlighted a case where the optimizer was estimating 83,000 rows for a particular index full scan when the SQL Monitor output for the operation showed that it was returning 11,000,000 rows.

Apart from the minor detail that the OP didn’t specifically ask a question, the information supplied was pretty good. The OP had given us a list of bind variables, with values, and the SQL statement, followed by the text output of the Monitor’ed SQL and, to get the predicate section of the plan, the output from a call to dbms_xplan. This was followed by the DDL for the critical index and a list of the stats for all the columns in the index.

Here’s the critical line of the plan (from the SQL Monitor report) followed by its predicate section (from the dbms_xplan output, but cosmetically enhanced) and some details of the columns used in the predicate:

SQL Plan Monitoring Details (Plan Hash Value=3210215320)
=================================================================================================================================================================================================================================
| Id    |            Operation            |         Name            |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read  | Read  | Write | Write | Mem  | Temp | Activity |       Activity Detail       | Progress | 
|       |                                 |                         | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs  | Bytes | Reqs  | Bytes |      |      |   (%)    |         (# samples)         |          |
=================================================================================================================================================================================================================================
|    11 |             INDEX FULL SCAN     | PK_HOUSEHOLD_GDC        |   83917 | 22799 |        86 |     +1 |     1 |      11M |     9 | 73728 |       |       |      |      |    24.21 | Cpu (77)                    |          |
=================================================================================================================================================================================================================================

  11 - filter(
        (    TO_DATE(:SYS_B_00||TO_CHAR("MONTH")||:SYS_B_01||TO_CHAR("YEAR"),:SYS_B_02)>=ADD_MONTHS(TRUNC(TO_DATE(:SYS_B_03,:SYS_B_04),:SYS_B_05),(-:SYS_B_06)) 
         AND TO_DATE(:SYS_B_00||TO_CHAR("MONTH")||:SYS_B_01||TO_CHAR("YEAR"),:SYS_B_02)<=TRUNC(TO_DATE(:SYS_B_07,:SYS_B_08),:SYS_B_09)-:SYS_B_10)
        )

COLUMN_NAME                    DATA_TYPE       NUM_DISTINCT  DENSITY  NUM_NULLS LAST_ANALYZED       HISTOGRAM
------------------------------ --------------- ------------ -------- ---------- ------------------- ---------------
YEAR                           NUMBER                     5        0          0 2018-12-02 13:19:10 FREQUENCY
MONTH                          NUMBER                    12        0          0 2018-12-02 13:19:10 FREQUENCY

I’ve included the full Monitor output at the end of the posting, or you could visit the ODC page if you want to see it, but if we look at just this line we can see that the index full scan starts running in the first second of the query (‘Start Active’), runs once (‘Execs’) and, as the OP said, retrieved 11M rows in that one scan compared to an estimated 83,917.

When we examine the predicate section we can understand why the optimizer could make such a large error – the SQL requires Oracle to combine two columns from the table with various bits of bind variables to construct a date which is then compared, using range-based comparisons, with a couple of constant dates derived from several input bind variables.

This is an example of Oracle using a fixed estimate of 5% for the selectivity of “unknown range-based comparison” – but with two comparisons the selectivity becomes 5% of 5% = 0.25% (i.e. 1/400).

If we look at the column definitions and stats we see that we seem to have 5 possible years and 12 possible months (which could mean a range as small as 3 years and 2 months) – so a selectivity of 1/400 would be in the right ballpark if we were querying for a date range of roughly 4.5 days. Working the figures the other way around – if 83,917 is 1/400 of the data then there are about 33.5M rows in the table and we are querying for something more like 1/3 of the table.

Observations

I find it curious that the optimizer used an “index full scan” to fetch a huge amount of data from the index when there is no requirement for sorting (there is a subsequent “hash unique”, rather than “sort unique nosort”). I would have expected an “index fast full scan” so I am curious to know if some optimizer parameters have been fiddled with to get the optimizer to bypass the fast full scan. Possibly a change in parameter settings would result in a very different plan.

The names of the bind variables are of the form “SYS_B_nn” – which means that the original query has been subject to the effects of forced cursor sharing. Since we are apparently expecting to identify and manipulate millions of rows this looks like the type of query where you don’t want to use cursor sharing. If the session can set “cursor_sharing=exact” before running the query, or inject the hint /*+ cursor_sharing_exact */ into the query, then perhaps we’d get a better estimate of rows (and a better plan). If hinting or setting session parameters is possible then setting optimizer_dynamic_sampling to level 3, or possibly 4, might be sufficient.
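For example (a sketch only – it assumes the session settings or the SQL text can be changed, which may not be possible in the OP’s environment):

alter session set cursor_sharing = exact;            -- stop forced literal replacement for this session
alter session set optimizer_dynamic_sampling = 4;    -- sample at optimisation time for complex predicates

-- or, per statement, via hints:
-- select /*+ cursor_sharing_exact dynamic_sampling(4) */ ...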

The messy expression combining month and year is a crippling handicap to the optimizer – so fixing the query to make the literals visible isn’t actually going to help. This is Oracle 12c, though – so we could add a virtual date column (declared as invisible to avoid the threat of inserts that don’t specify column lists) and gather stats on it. The combination of virtual column and literal values might give the optimizer the information it really needs. Here’s a little script to demonstrate:


rem
rem     Script:         virtual_study.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2018
rem     Purpose:
rem
rem     Last tested
rem             12.1.0.2

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        sysdate - (5 * 365) + rownum / 550      d1,
        to_number(
                to_char(
                        (sysdate - (5 * 365) + rownum / 550),
                        'MM'
                )
        )                                       month,
        to_number(
                to_char(
                        (sysdate - (5 * 365) + rownum / 550),
                        'YYYY'
                )
        )                                       year,
        lpad(rownum,10,'0')                     v1
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid WordPress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1 for columns month size 12 for columns year size 6'
        );
end;
/

I’ve created a table with a million rows with data going back roughly 5 years from current date, which means I need roughly 550 rows per day. I’ve then created histograms on the month and year columns to match the original posting. Now I’ll set up the bind variables and values specified by the OP and run a simple query to show the date information that the bind variables give, and the 1/400 selectivity of the OP’s predicate:


var SYS_B_00 varchar2(32);
var SYS_B_01 varchar2(32);
var SYS_B_02 varchar2(32);
var SYS_B_03 varchar2(32);
var SYS_B_04 varchar2(32);
var SYS_B_05 varchar2(32);
var SYS_B_06 number;
var SYS_B_07 varchar2(32);
var SYS_B_08 varchar2(32);
var SYS_B_09 varchar2(32);
var SYS_B_10 number;

exec :SYS_B_00:='01/';
exec :SYS_B_01:='/';
exec :SYS_B_02:='dd/MM/yyyy';
exec :SYS_B_03:='10/04/2018';
exec :SYS_B_04:='MM/dd/yyyy';
exec :SYS_B_05:='q';
exec :SYS_B_06:=12;
exec :SYS_B_07:='10/04/2018';
exec :SYS_B_08:='MM/dd/yyyy';
exec :SYS_B_09:='q';
exec :SYS_B_10:=1;

select
        to_date(:sys_b_00||to_char(month)||:sys_b_01||to_char(year),:sys_b_02)  d1, 
        add_months(trunc(to_date(:sys_b_03,:sys_b_04),:sys_b_05),(-:sys_b_06))  c1,
        to_date(:sys_b_00||to_char(month)||:sys_b_01||to_char(year),:sys_b_02)  d2,
        trunc(to_date(:sys_b_07,:sys_b_08),:sys_b_09)-:sys_b_10                 c2
from
        t1
where
        rownum = 1
;

set serveroutput off
alter session set statistics_level = all;

select  count(*)
from    t1
where
        (    to_date(:sys_b_00||to_char(month)||:sys_b_01||to_char(year),:sys_b_02) >= add_months(trunc(to_date(:sys_b_03,:sys_b_04),:sys_b_05),(-:sys_b_06)) 
         and to_date(:sys_b_00||to_char(month)||:sys_b_01||to_char(year),:sys_b_02) <= trunc(to_date(:sys_b_07,:sys_b_08),:sys_b_09)-:sys_b_10 )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

===========================================

D1        C1        D2        C2
--------- --------- --------- ---------
01-DEC-13 01-OCT-17 01-DEC-13 30-SEP-18


  COUNT(*)
----------
    200750

--------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:07.39 |    4980 |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:07.39 |    4980 |
|*  2 |   FILTER            |      |      1 |        |    200K|00:00:06.42 |    4980 |
|*  3 |    TABLE ACCESS FULL| T1   |      1 |   2500 |    200K|00:00:04.59 |    4980 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(TRUNC(TO_DATE(:SYS_B_07,:SYS_B_08),:SYS_B_09)-:SYS_B_10 .ge. ADD_MON
              THS(TRUNC(TO_DATE(:SYS_B_03,:SYS_B_04),:SYS_B_05),(-:SYS_B_06)))
   3 - filter((TO_DATE(:SYS_B_00||TO_CHAR("MONTH")||:SYS_B_01||TO_CHAR("YEAR")
              ,:SYS_B_02) .ge. ADD_MONTHS(TRUNC(TO_DATE(:SYS_B_03,:SYS_B_04),:SYS_B_05),(-:SYS_B
              _06)) AND TO_DATE(:SYS_B_00||TO_CHAR("MONTH")||:SYS_B_01||TO_CHAR("YEAR"),:SYS
              _B_02) .le. TRUNC(TO_DATE(:SYS_B_07,:SYS_B_08),:SYS_B_09)-:SYS_B_10))


Note: in this and all subsequent execution plans I’ve had to use .le. to represent “less than or equal to” and .ge. to represent “greater than or equal to”.

This shows us that the first row in my table has a date component of 1st Dec 2013, while the date range required by the OP was one year’s worth of data between 1st Oct 2017 and 30th Sept 2018. The optimizer’s estimate of 2,500 rows out of 1M is the 1/400 we expect.

Let’s test the effect of running the query using literals (i.e. in the OP’s environment, stopping the “cursor_sharing = force” effect):


select
        count(*)
from    t1
where
        (    to_date('01/'||to_char(month)||'/'||to_char(year),'dd/MM/yyyy') >= add_months(trunc(to_date('10/04/2018','dd/MM/yyyy'),'q'),(-12)) 
         and to_date('01/'||to_char(month)||'/'||to_char(year),'dd/MM/yyyy') <= trunc(to_date('10/04/2018','dd/MM/yyyy'),'q')-1 )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'))

========================================================

 COUNT(*)
----------
    200750


--------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |   892 (100)|      1 |00:00:05.17 |    4980 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |            |      1 |00:00:05.17 |    4980 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |   2500 |   892  (30)|    200K|00:00:04.30 |    4980 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((TO_DATE('01/'||TO_CHAR("MONTH")||'/'||TO_CHAR("YEAR"),'dd/MM/yyyy') .ge. TO_DAT
              E(' 2017-04-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              TO_DATE('01/'||TO_CHAR("MONTH")||'/'||TO_CHAR("YEAR"),'dd/MM/yyyy') .le. TO_DATE(' 2018-03-31
              00:00:00', 'syyyy-mm-dd hh24:mi:ss')))


We can see that the literals have echoed through the plan to the predicate section, but the optimizer hasn’t changed its estimate. Let’s create the virtual column, gather stats on it, and try again:


alter table t1 add v_date invisible generated always as (
        to_date('01/'||to_char(month)||'/'||to_char(year),'dd/MM/yyyy')
) virtual
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for columns v_date size 1')

select  /* virtual column */
        count(*)
from    t1
where
        (    to_date('01/'||to_char(month)||'/'||to_char(year),'dd/MM/yyyy') >= add_months(trunc(to_date('10/04/2018','dd/MM/yyyy'),'q'),(-12)) 
         and to_date('01/'||to_char(month)||'/'||to_char(year),'dd/MM/yyyy') <= trunc(to_date('10/04/2018','dd/MM/yyyy'),'q')-1 )
;

 select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

=======================================================================

 COUNT(*)
----------
    200750

--------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |   950 (100)|      1 |00:00:06.27 |    4980 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |            |      1 |00:00:06.27 |    4980 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |    236K|   950  (34)|    200K|00:00:04.78 |    4980 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((TO_DATE('01/'||TO_CHAR("MONTH")||'/'||TO_CHAR("YEAR"),'dd/MM/yyyy') .ge. TO_DAT
              E(' 2017-04-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              TO_DATE('01/'||TO_CHAR("MONTH")||'/'||TO_CHAR("YEAR"),'dd/MM/yyyy') .le. TO_DATE(' 2018-03-31
              00:00:00', 'syyyy-mm-dd hh24:mi:ss')))



The optimizer sees that the expression involving month and year matches the virtual column definition, evaluates the two date expressions to produce simple constants, and gives us a cardinality estimate in the right ballpark.
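If you want to check that the virtual column has picked up statistics, a quick query against user_tab_cols (a sketch, not part of the original test) would be:

select  column_name, data_type, num_distinct, num_nulls, histogram
from    user_tab_cols
where   table_name = 'T1'
and     column_name = 'V_DATE'
;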

Conclusion

Cursor sharing and “big” queries don’t mix. If you have queries that have to manipulate large volumes of data then the overhead of optimising each one separately is likely to be insignificant, and the threat of cardinality errors introduced by bind variables being re-used could be significant.

If you have to make use of an existing (bad) table definition, and can’t manage to write predicates that allow the optimizer to use existing column statistics, remember that you might be able to create a virtual (and invisible) column that captures the necessary definition, thereby allowing you to give Oracle some statistics about the necessary predicate.

Footnote

In case you didn’t want to scan through the ODC page, here’s the full SQL Monitor output for the original query:


Global Stats
==============================================================================================
| Elapsed |   Cpu   |    IO    | Cluster  |  Other   | Buffer | Read | Read  | Write | Write |
| Time(s) | Time(s) | Waits(s) | Waits(s) | Waits(s) |  Gets  | Reqs | Bytes | Reqs  | Bytes |
==============================================================================================
|     320 |      76 |      140 |       39 |       66 |     8M | 257K |   2GB |  1528 | 306MB |
==============================================================================================
 
 
SQL Plan Monitoring Details (Plan Hash Value=3210215320)
=================================================================================================================================================================================================================================
| Id    |            Operation            |         Name            |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read  | Read  | Write | Write | Mem  | Temp | Activity |       Activity Detail       | Progress | 
|       |                                 |                         | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs  | Bytes | Reqs  | Bytes |      |      |   (%)    |         (# samples)         |          |
=================================================================================================================================================================================================================================
|  -> 0 | SELECT STATEMENT                |                         |         |       |       180 |   +142 |     1 |        0 |       |       |       |       |      |      |          |                             |          |
|  -> 1 |   SORT UNIQUE                   |                         |    1093 | 52574 |       180 |   +142 |     1 |        0 |       |       |   534 | 107MB |   2M | 113M |     0.94 | Cpu (3)                     |          |
|  -> 2 |    NESTED LOOPS                 |                         |    1093 | 52573 |       180 |   +142 |     1 |       3M |       |       |       |       |      |      |     0.31 | Cpu (1)                     |          |
|  -> 3 |     NESTED LOOPS                |                         |    1118 | 52573 |       180 |   +142 |     1 |       3M |       |       |       |       |      |      |     0.31 | Cpu (1)                     |          |
|  -> 4 |      HASH JOIN RIGHT SEMI       |                         |    1118 | 52238 |       189 |   +133 |     1 |       3M |       |       |       |       | 153M |      |     1.57 | Cpu (5)                     |          |
|     5 |       VIEW                      |                         |    157K | 31145 |         9 |   +134 |     1 |       2M |       |       |       |       |      |      |          |                             |          |
|     6 |        WINDOW SORT              |                         |    157K | 31145 |        57 |    +86 |     1 |       4M |  3777 | 199MB |   994 | 199MB |      |      |     3.14 | Cpu (5)                     |     100% |
|       |                                 |                         |         |       |           |        |       |          |       |       |       |       |      |      |          | direct path read temp (5)   |          |
|     7 |         HASH JOIN               |                         |    157K | 29653 |        50 |    +85 |     1 |       4M |       |       |       |       |      |      |     1.26 | Cpu (4)                     |          |
|     8 |          VIEW                   |                         |   81771 | 23273 |         1 |    +86 |     1 |       1M |       |       |       |       |      |      |          |                             |          |
|     9 |           HASH UNIQUE           |                         |   81771 | 23273 |        75 |    +12 |     1 |       1M |       |       |       |       |      |      |     1.89 | Cpu (6)                     |          |
|    10 |            FILTER               |                         |         |       |        78 |     +9 |     1 |      11M |       |       |       |       |      |      |     0.31 | Cpu (1)                     |          |
|    11 |             INDEX FULL SCAN     | PK_HOUSEHOLD_GDC        |   83917 | 22799 |        86 |     +1 |     1 |      11M |     9 | 73728 |       |       |      |      |    24.21 | Cpu (77)                    |          |
|    12 |          INDEX FULL SCAN        | PK_ADV_HOUSEHOLD_ACCT   |      8M |  6332 |        49 |    +86 |     1 |       8M |       |       |       |       |      |      |    12.58 | gc cr block 2-way (37)      |          |
|       |                                 |                         |         |       |           |        |       |          |       |       |       |       |      |      |          | gc current block 2-way (3)  |          |
| -> 13 |       INDEX FULL SCAN           | PK_ADV_HOUSEHOLD_ACCT   |      8M |  6332 |       180 |   +142 |     1 |       7M |       |       |       |       |      |      |     0.63 | Cpu (2)                     |          |
| -> 14 |      INDEX RANGE SCAN           | IDX4_LPL_BETA_CUST_RLTN |       1 |     1 |       181 |   +141 |    3M |       3M | 75759 | 592MB |       |       |      |      |    23.27 | gc current grant 2-way (1)  |          |
|       |                                 |                         |         |       |           |        |       |          |       |       |       |       |      |      |          | Cpu (21)                    |          |
|       |                                 |                         |         |       |           |        |       |          |       |       |       |       |      |      |          | db file parallel read (52)  |          |
| -> 15 |     TABLE ACCESS BY INDEX ROWID | IMPL_LPL_BETA_CUST_RLTN |       1 |     1 |       180 |   +142 |    3M |       3M |  177K |   1GB |       |       |      |      |    29.56 | Cpu (12)                    |          |
|       |                                 |                         |         |       |           |        |       |          |       |       |       |       |      |      |          | db file parallel read (81)  |          |
|       |                                 |                         |         |       |           |        |       |          |       |       |       |       |      |      |          | db file sequential read (1) |          |
=================================================================================================================================================================================================================================

Sangam 18: Presentations, Scripts and More

Many, many thanks to those who came to my three sessions at Sangam 18, the largest meeting of Oracle technologists in South Asia.

As I promised, you can download the presentations and scripts for all my sessions here. As always, your feedback will be highly appreciated.

Plans and Trees

Prompted by a question on the ODC database forum – and also because I failed to get to the “Bonus slides” of my presentation on basic execution plans at both the DOAG and UKOUG conferences – here’s a small set of slides demonstrating how to convert a text execution plan into a tree that you can read using the mechanism described in Oracle’s white paper by the phrase: “start from the bottom left and work across and then up”.

The file is a Microsoft Powerpoint file (early version).
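As a rough illustration of that reading order (a made-up plan, not one of the slides):

-----------------------------------------------
| Id  | Operation                     | Name  |
-----------------------------------------------
|   0 | SELECT STATEMENT              |       |
|   1 |  NESTED LOOPS                 |       |
|   2 |   TABLE ACCESS BY INDEX ROWID | T1    |
|   3 |    INDEX RANGE SCAN           | T1_I1 |
|   4 |   INDEX UNIQUE SCAN           | T2_PK |
-----------------------------------------------

Starting from the bottom left: operation 3 runs first and supplies rowids to its parent (operation 2); for each row that operation 2 produces we move across to its sibling (operation 4); and the results then feed up through the nested loop at operation 1 to the select statement.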

 

Installing Ansible on Oracle Linux 7 for test and development use

There are a few alternative ways of installing Ansible on Linux, and the install guide for Ansible 2.7 (the current version at the time of writing) does a great job in explaining them all in detail.  There is a potentially easier way to get to a current Ansible version if you are using Oracle Linux 7, but it comes with a very important limitation. Let’s get that out of the way first.

You need to be aware that the RPM-based installation of Ansible as described in this blog post requires you to enable Oracle’s EPEL repository. As per https://yum.oracle.com/oracle-linux-7.html, the EPEL repository is listed under “Packages for Test and Development” (bottom of the page) and these come with the following warning: “Note: The contents in the following repositories are for development purposes only. Oracle suggests these not be used in production.”

This is really important!

If you are ok with the limitation I just quoted from Oracle’s YUM server, please read on. If not, head back to the official Ansible documentation and use a different method instead. I only use Ansible in my own lab and therefore don’t mind.

Updating the repository configuration file

Back to the topic of Ansible … Before I get around to installing Ansible on my machines I update my yum repository configuration file. Things are changing quickly, and I found /etc/yum.repos.d/public-yum-ol7.repo to be outdated at times. I always refresh it from yum.oracle.com just to be sure I’m not missing out on the new stuff.

# cd /etc/yum.repos.d
# mv -iv public-yum-ol7.repo public-yum-ol7.repo.$(date +%y%m%d)
# wget http://yum.oracle.com/public-yum-ol7.repo

With the new file in place, use your preferred method to enable the ol7_developer_EPEL repository. I simply edit public-yum-ol7.repo; there are other ways, such as yum-config-manager, of getting there (see the sketch below). But be advised: I just noticed that, as of December 17 at least, the UEK Release 5 repository might be enabled by default, leading to a potentially unwanted kernel upgrade.
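If you’d rather not edit the file by hand, something along these lines should work (a sketch – yum-config-manager is provided by the yum-utils package):

# yum install -y yum-utils
# yum-config-manager --enable ol7_developer_EPEL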

Install Ansible

With the developer EPEL repository enabled, you have access to a great many Ansible versions. At the time of writing, these were available:

# yum --showduplicates list ansible
Loaded plugins: ulninfo
Available Packages
ansible.noarch                      2.3.1.0-1.el7                       ol7_developer_EPEL
ansible.noarch                      2.4.2.0-1.el7                       ol7_developer_EPEL
ansible.noarch                      2.5.0-2.el7                         ol7_developer_EPEL
ansible.noarch                      2.5.1-1.el7                         ol7_developer_EPEL
ansible.noarch                      2.5.2-1.el7                         ol7_developer_EPEL
ansible.noarch                      2.6.1-1.el7                         ol7_developer_EPEL
ansible.noarch                      2.6.2-1.el7                         ol7_developer_EPEL
ansible.noarch                      2.6.4-1.el7                         ol7_developer_EPEL
ansible.noarch                      2.6.5-1.el7                         ol7_developer_EPEL
ansible.noarch                      2.7.0-1.el7                         ol7_developer_EPEL

# yum info ansible
Loaded plugins: ulninfo
Available Packages
Name        : ansible
Arch        : noarch
Version     : 2.7.0
Release     : 1.el7
Size        : 11 M
Repo        : ol7_developer_EPEL/x86_64
Summary     : SSH-based configuration management, deployment, and task execution system
URL         : http://ansible.com
Licence     : GPLv3+
Description : Ansible is a radically simple model-driven configuration management,
            : multi-node deployment, and remote task execution system. Ansible works
            : over SSH and does not require any software or daemons to be installed
            : on remote nodes. Extension modules can be written in any language and
            : are transferred to managed machines automatically.

Happy testing!

See you in OBUG Tech Days Belgium

Antwerp, February 7, 2019 — February 8, 2019

I’ll demo join methods in slow motion, but look at the full Agenda: https://www.techdaysbelgium.be/?page_id=507

And it’s not only about sessions: all speakers are well known in the community for their will to discuss and share knowledge, opinions… and beers.

Registration opened

Tickets! – Techdays Belgium

OBUG Tech Days Belgium 2019 – Antwerp – 7/8-FEB-2019

Agenda: https://www.techdaysbelgium.be/?page_id=507

Dates: February 7 and 8, 2019

Location: http://cinemacartoons.be in Antwerp, Belgium

More information soon.

For people from the Netherlands: this is easily reachable by car or by train! This is a chance to attend a conference and meet up with a lot of well-known speakers in the Oracle database area without too much travelling.

Account locking in an Active Data Guard environment

During the Data Guard round table of the excellent UKOUG Tech18 conference I became aware of a topic that I’d like to share with the Oracle community:

What is the locking behavior for user accounts in an environment where users may connect to the primary as well as to the standby database?

User gets locked on the primary

SQL> alter profile default limit failed_login_attempts 2;

Profile altered.

SQL> create user adam identified by adam;

User created.
SQL> grant create session to adam;

Grant succeeded.

SQL> connect adam/wrong@prima
ERROR:
ORA-01017: invalid username/password; logon denied

Warning: You are no longer connected to ORACLE.
SQL> connect adam/wrong@prima
ERROR:
ORA-01017: invalid username/password; logon denied

SQL> connect adam/wrong@prima
ERROR:
ORA-28000: The account is locked.

I changed the default profile so that the account lock happens faster. The change to the default profile reaches the standby via redo apply, and the same goes for account locks that happened on the primary, like the one above.

Standby inherits the locked accounts from primary

SQL> connect adam/wrong@physt
ERROR:
ORA-28000: The account is locked.

This inherited lock cannot be unlocked on the standby:

SQL> connect sys/oracle@physt as sysdba
Connected.
SQL> alter user adam account unlock;
alter user adam account unlock
*
ERROR at line 1:
ORA-28015: account unlock attempted on read-only database but a conflicting
account lockout on the primary exists

The account can only be unlocked on the primary, and that implicitly unlocks it on the standby too:

SQL> connect sys/oracle@prima as sysdba
Connected.
SQL> alter user adam account unlock;

User altered.
SQL> connect adam/adam@prima
Connected.
SQL> connect adam/adam@physt
Connected.

Account gets locked on the standby only

SQL> connect adam/wrong@physt
ERROR:
ORA-01017: invalid username/password; logon denied


Warning: You are no longer connected to ORACLE.
SQL> connect adam/wrong@physt
ERROR:
ORA-01017: invalid username/password; logon denied


SQL> connect adam/wrong@physt
ERROR:
ORA-28000: The account is locked.

This lock at the standby is kept there in memory only and doesn’t impact the primary:

SQL> connect adam/adam@prima
Connected.

It can be unlocked on the standby:

SQL> connect sys/oracle@physt as sysdba
Connected.
SQL> alter user adam account unlock;

User altered.

SQL> connect adam/adam@physt
Connected.

Standby locks are kept in memory

After a restart, the lock is gone:

SQL> connect adam/wrong@physt
ERROR:
ORA-01017: invalid username/password; logon denied


Warning: You are no longer connected to ORACLE.
SQL> connect adam/wrong@physt
ERROR:
ORA-01017: invalid username/password; logon denied


SQL> connect adam/wrong@physt
ERROR:
ORA-28000: The account is locked.


SQL> connect sys/oracle@physt as sysdba
Connected.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area  629143384 bytes
Fixed Size		    8660824 bytes
Variable Size		  180355072 bytes
Database Buffers	  436207616 bytes
Redo Buffers		    3919872 bytes
Database mounted.
Database opened.
SQL> connect adam/adam@physt
Connected.

Role change involving a restart of the standby

That will also lose any locks that were previously made on the standby only:

SQL> connect adam/wrong@physt
ERROR:
ORA-01017: invalid username/password; logon denied


Warning: You are no longer connected to ORACLE.
SQL> connect adam/wrong@physt
ERROR:
ORA-01017: invalid username/password; logon denied


SQL> connect adam/wrong@physt
ERROR:
ORA-28000: The account is locked.


SQL> exit
[oracle@uhesse ~]$ dgmgrl sys/oracle@prima
DGMGRL for Linux: Release 18.0.0.0.0 - Production on Fri Dec 7 08:28:59 2018
Version 18.3.0.0.0

Copyright (c) 1982, 2018, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected to "prima"
Connected as SYSDBA.
DGMGRL> switchover to physt;
Performing switchover NOW, please wait...
Operation requires a connection to database "physt"
Connecting ...
Connected to "physt"
Connected as SYSDBA.
New primary database "physt" is opening...
Operation requires start up of instance "prima" on database "prima"
Starting instance "prima"...
Connected to an idle instance.
ORACLE instance started.
Database mounted.
Database opened.
Connected to "prima"
Switchover succeeded, new primary is "physt"
DGMGRL> show configuration;

Configuration - myconf

  Protection Mode: MaxPerformance
  Members:
  physt - Primary database
    prima - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 18 seconds ago)

DGMGRL> exit
[oracle@uhesse ~]$ sqlplus /nolog

SQL*Plus: Release 18.0.0.0.0 - Production on Fri Dec 7 08:31:16 2018
Version 18.3.0.0.0

Copyright (c) 1982, 2018, Oracle.  All rights reserved.

SQL> connect adam/adam@prima
Connected.
SQL> connect adam/adam@physt
Connected.

I did the tests on 18c but I don’t think this is a new feature. I just didn’t have that topic on the radar before. Which is one reason why we go to conferences, right?

Misdirection

A recent post on the ODC database forum prompted me to write a short note about a trap that catches everyone from time to time. The trap is following the obvious; and it’s a trap because it’s only previous experience that lets you decide what’s obvious, and the similarity between what you’re looking at and your previous experience may be purely coincidental.

The question on OTN (paraphrased) was as follows:

When I run the first query below Oracle doesn’t use the index on column AF and is slow, but when I run the second query Oracle uses the index and it’s fast. So when the input starts with ‘\\’ the index is not used. What’s going on?


SELECT * FROM T WHERE AF = '\\domain\test\1123.pdf';
SELECT * FROM T WHERE AF = 'a\\domain\test\1123.pdf';

Looking at the two queries my first thought was that it’s obvious what’s (probably) happening, and my second thought was the more interesting question: “why does this person think that the ‘\\’ is significant ?”

The cause of the difference in behaviour is probably related to the way that Oracle stores statistics (specifically histograms) about character columns, and the way in which the cardinality calculations can go wrong. If two character strings match over the first few characters, the numeric representation of those strings that Oracle uses in a histogram is identical, and if they are long enough even the “actual value” stored would be identical. It looks as if this person is storing URLs, and it’s quite likely that there are a lot of long URLs that start with the same (long) string of characters – it’s a very old problem – and it’s an example of a column where you probably want to be absolutely sure that you don’t gather a histogram.
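As a sketch (the table and column names come from the OP’s paraphrased query; the choice of “size auto” for the remaining columns is just an assumption), a gather call that explicitly avoids a histogram on AF might look like this:

begin
        dbms_stats.gather_table_stats(
                ownname     => user,
                tabname     => 'T',
                method_opt  => 'for all columns size auto for columns af size 1'
        );
end;
/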

But why did the OP decide that the ‘\\’ was the significant bit ? I don’t know, of course, but  how about this:

  • No contrary tests: Perhaps every single time the query misbehaved the value started with ‘\\’ and it never went wrong for any other starting values. And maybe the OP tested several different domain names – it would be much easier to see the ‘\\’ as the common denominator rather than “repetitive leading character string” if you tested with values that spanned different domains.

combined with

  • An easily available “justification”: In many programming languages (including SQL) ‘\’ is an escape character – if you don’t really know much about how the optimizer works you might believe that that could be enough to confuse the optimizer.

It can be very difficult when you spot an obvious pattern to pause long enough to consider whether you’ve identified the whole pattern, or whether you’re looking at a special case that’s going to take you in the wrong direction.

 

Row Migration

There’s a little detail of row migration that’s been bugging me for a long time – and I’ve finally found a comment on MoS explaining why it happens. Before saying anything else, though, I’m going to give you a little script (that I’ve run on 12.2.0.1 with an 8KB block size in a tablespace using manual (freelist) space management [corrected from ASSM] and system allocated extents) to demonstrate the anomaly.


rem
rem     Script:         migration_itl.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2018
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem     Notes
rem     Under ASSM we can get 733 rows in the block,
rem     using freelist management it goes up to 734
rem

create table t1 (v1 varchar2(4000))
segment creation immediate
tablespace test_8k
pctfree 0
;

insert into t1
select  null from dual connect by level <= 734 -- > comment to avoid wordpress format issue
;

commit;

spool migration_itl.lst

column rel_file_no new_value m_file
column block_no    new_value m_block

select 
        dbms_rowid.rowid_relative_fno(rowid)    rel_file_no, 
        dbms_rowid.rowid_block_number(rowid)    block_no,
        count(*)                                rows_starting_in_block
from 
        t1
group by 
        dbms_rowid.rowid_relative_fno(rowid), 
        dbms_rowid.rowid_block_number(rowid) 
order by 
        dbms_rowid.rowid_relative_fno(rowid), 
        dbms_rowid.rowid_block_number(rowid)
;

update t1 set v1 = rpad('x',10);
commit;

alter system flush buffer_cache;

alter system dump datafile &m_file block &m_block;

column tracefile new_value m_tracefile

select
        tracefile 
from 
        v$process where addr = (
                select paddr from v$session where sid = (
                        select sid from v$mystat where rownum = 1
                )
        )
;

-- host grep nrid &m_tracefile

spool off

The script creates a single column table with pctfree set to zero, then populates it with 734 rows where every row has a null for its single column. The query using the calls to the dbms_rowid package will show you that all 734 rows are in the same block. In fact the block will be full (leaving a handful of bytes of free space) because even though each row will require only 5 bytes (2 bytes row directory entry, 3 bytes row overhead, no bytes for data) Oracle’s arithmetic will allow for the 11 bytes that is the minimum needed for a row that has migrated – the extra 6 bytes being the pointer to where the migrated row now lives. So 734 rows * 11 bytes = 8,074 bytes, leaving 4 bytes of free space once the 114 bytes of block and transaction layer overhead have been allowed for.

After populating and reporting the table the script then updates every row to grow it by a few bytes, and since there’s no free space every row will migrate to a new location. By dumping the block (flushing the buffer cache first) I can check where each row has migrated to. (If you’re running a UNIX flavour and have access to the trace directory then the commented grep command will give you what you want to see.) Here’s a small extract from the dump on a recent run:

nrid:  0x05c00082.0
nrid:  0x05c00082.1
nrid:  0x05c00082.2
nrid:  0x05c00082.3
...
nrid:  0x05c00082.a4
nrid:  0x05c00082.a5
nrid:  0x05c00082.a6
nrid:  0x05c00083.0
nrid:  0x05c00083.1
nrid:  0x05c00083.2
nrid:  0x05c00083.3
...
nrid:  0x05c00085.a4
nrid:  0x05c00085.a5
nrid:  0x05c00085.a6
nrid:  0x05c00086.0
nrid:  0x05c00086.1
nrid:  0x05c00086.2
...
nrid:  0x05c00086.3e
nrid:  0x05c00086.3f
nrid:  0x05c00086.40
nrid:  0x05c00086.41

My 734 rows have migrated to fill the next four blocks of the table, (23,130) to (23,133), and to take up some of the space in the one after that (23,134). The first four blocks have used up row directory entries 0x00 to 0xa6 (0 to 166), and the last block has used up row directory entries 0x00 to 0x41 (0 to 65) – giving us the expected total: 167 * 4 + 66 = 734 rows. Let’s dump one of the full blocks – and extract the interesting bits:

alter system dump datafile 23 block 130;
Block header dump:  0x05c00082
 Object id on Block? Y
 seg/obj: 0x1ba1e  csc:  0x0000000001e0aff3  itc: 169  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0006.00f.000042c9  0x0240242d.08f3.14  --U-  167  fsc 0x0000.01e0affb
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
0x04   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
0x05   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
0x06   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
...
0xa6   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
0xa7   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
0xa8   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
0xa9   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000

nrow=167
frre=-1
fsbo=0x160
fseo=0x2ec
avsp=0x18c
tosp=0x18c

tab 0, row 0, @0xfe4
tl: 20 fb: ----FL-- lb: 0x1  cc: 1
hrid: 0x05c00081.0
col  0: [10]  78 20 20 20 20 20 20 20 20 20
tab 0, row 1, @0xfd0
tl: 20 fb: ----FL-- lb: 0x1  cc: 1
hrid: 0x05c00081.1

This block has 169 (0xa9) ITL entries – that’s one for each row migrated into the block (nrow = 167) plus a couple to spare. The block still has some free space (avsp = tosp = 0x18c: available space = total space = 396 bytes), but it can’t be used for any more incoming migration because Oracle is unable to create any more ITL entries – at 169 it has reached the ITL limit for 8KB blocks.

So finally we come to the question that’s been bugging me for years – why does Oracle want an extra ITL slot for every row that has migrated into a block? The answer appeared in this sentence from MoS Doc ID 2420831.1: “Errors Noted in 12.2 and Above During DML on Compressed Tables”:

“It is a requirement during processing of parallel transactions that each data block row that does not have a header have a block ITL available.”

Rows that have migrated into a block do not have a row header – check the flag byte (fb) for the two rows I’ve listed, it’s “----FL--”: there is no ‘H’ for header. We have the First and Last row pieces of the row in this block and that’s it. So my original “why” question now becomes “What’s the significance of parallel DML?”

Imagine the general case where we have multiple processes updating rows at random from multiple blocks, and many different processes forced rows to migrate at the same time into the same block. The next parallel DML statement would dispatch multiple parallel execution slaves, which would all be locking rows in their own separate block ranges – but multiple slaves could find that they wanted to lock rows which had all migrated into the same block – so every slave MUST be able to get an ITL entry in that block at the same time; for example, if we have 8 rows that had migrated into a specific block from 8 different places, and 8 parallel execution slaves each followed a pointer from the region they were scanning to update a row that had migrated into this particular block then all 8 slaves would need an ITL entry in the block (and if there were a ninth slave scanning this region of the table we’d need a 9th ITL entry). If we didn’t have enough ITL entries in the block for every single migrated row to be locked by a different process at the same time then (in principle, at least) parallel execution slaves could deadlock each other because they were visiting blocks in a different order to lock the migrated rows. For example:

  1. PQ00 visits and locks a row that migrated to block (23,131)
  2. PQ01 visits and locks a row that migrated to block (23,132)
  3. PQ00 visits and tries to lock a row that migrated to block (23,132) — but if there were no “extra” ITL slots available, it would wait
  4. PQ01 visits and tries to lock a row that migrated to block (23,131) — but there were no “extra” ITL slots available so it would wait, and we’d be in a deadlock.

Oracle’s solution to this threat: when migrating a row into a block, add a new ITL entry if necessary so that the number of ITL slots stays a couple ahead of the number of inward migrated rows (the “+2” is a working hypothesis, it might really be “+ initrans of the table”).

Footnote 1

The note was about problems with compression for OLTP, but the underlying message was about 4 Oracle errors of type ORA-00600 and ORA-00700, which report the discovery and potential threat of blocks where the number of ITL entries isn’t large enough compared to the number of inward migrated rows. Specifically:

  • ORA-00600 [PITL1]
  • ORA-00600 [kdt_bseg_srch_cbk PITL1]
  • ORA-00700: soft internal error, arguments: [PITL6]
  • ORA-00700: soft internal error, arguments: [kdt_bseg_srch_cbk PITL5]

 

Footnote 2

While drafting the SQL script above, I decided to check how many other scripts I had already written about migrated rows and ITL slots: there were 12 of the former and 10 of the latter, and reading through the notes I found that one of the scripts (itl_chain.sql), dated December 2002, included the following note:

According to a comment that came from Oracle support via Steve Adams, the reason for the extra ITLs is to avoid a risk of parallel DML causing an internal deadlock.

So it looks like I knew what the ITLs were for about 16 years ago, but managed to forget sometime since then.

 

 

Automatic sequences not being dropped

One of the nice new things in 12c was the concept of identity columns. In terms of the functionality they provide (an automatic number default) it is really no different from anything we’ve had for years in the database via sequences, but native support for the declarative syntax makes migration from other database platforms a lot easier.

Under the covers, identity columns are implemented as sequences. This makes a lot of sense – why invent a new piece of functionality when you can exploit something that already has been tried and tested exhaustively for 20 years? So when you create a table with an identity column, you’ll see the appearance of a system named sequence to support it.

Another nice thing about identity columns is that if you drop the table, the underlying sequence that supports the column is also automatically dropped….. (cue ominous music…) or is it? I had this demo come my way via email asking that exact question:


SQL> create table t1 ( x int generated always as identity );

Table created.

SQL> create table t2 ( x int generated by default as identity );

Table created.

SQL> select table_name, sequence_name
  2  from   user_tab_identity_cols;

TABLE_NAME           SEQUENCE_NAME
-------------------- --------------------
T1                   ISEQ$$_139912
T2                   ISEQ$$_139914

6 rows selected.

SQL> drop table t1;

Table dropped.

SQL> drop table t2;

Table dropped.

SQL> select table_name, sequence_name
  2  from   user_tab_identity_cols;

no rows selected

SQL> select sequence_name from user_sequences;

SEQUENCE_NAME
--------------------
ISEQ$$_139912
ISEQ$$_139914

Is that a bug? Why are the sequences still there?

There is no need to panic – there is a very simple explanation for this. From 10g onwards, by default, when you drop a table we do not actually drop it (i.e. release the data and the space it consumed). We simply rename it, and it becomes a member of the recycle bin. This is a nifty insurance policy against losing data inadvertently. The table “disappears” from standard dictionary views but the sequences remain, which is what causes the confusion. We don’t destroy the sequences because recreating them if the table is brought back from the recycle bin would require knowing what the current high watermark for the sequence value would need to be.
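A quick way to see this (a sketch, reusing the objects from the demo above) is to look in the recycle bin – and to remember that the tables, and hence their identity columns, could be brought back at any time:

select  original_name, object_name, type
from    user_recyclebin;

-- flashback table t1 to before drop;   -- would still need ISEQ$$_139912 to exist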

Purging the recycle bin brings everything back into alignment.


SQL> purge recyclebin;

Recyclebin purged.

SQL> select sequence_name from user_sequences;

no rows selected