
Online Statistics Collection during Bulk Loads on Partitioned Tables

Introduction

One of the enhancements to statistics collection and management in Oracle 12c was the ability of the database to automatically collect statistics during either a create-table-as-select operation or the initial insert into a freshly created or freshly truncated table, provided that insert is done in direct-path mode (i.e. using the APPEND hint).
When that happens, there is an additional operation in the execution plan: OPTIMIZER STATISTICS GATHERING.

----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
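
As a minimal sketch of the kind of statement that produces this operation (the table names below are illustrative, not taken from the original post):

-- Hypothetical example of online statistics gathering during a direct-path insert.
-- The OPTIMIZER STATISTICS GATHERING operation appears for CTAS, or for a
-- direct-path insert into an empty (freshly created or truncated) segment.
CREATE TABLE sales_copy AS SELECT * FROM sales WHERE 1=0;

INSERT /*+ APPEND */ INTO sales_copy SELECT * FROM sales;

-- With SERVEROUTPUT OFF, this reports the plan of the INSERT, including the
-- OPTIMIZER STATISTICS GATHERING line.
SELECT * FROM TABLE(dbms_xplan.display_cursor(format=>'BASIC'));

-- The segment cannot be queried until the direct-path insert is committed.
COMMIT;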

Analysing Database Time with Active Session History for Statements with Online Optimizer Statistics Gathering Operations

(Updated 30.1.2020) I have been looking into the performance of online statistics collection. This feature was introduced in 12c.  When statistics are collected online there is an extra OPTIMIZER STATISTICS GATHERING operation in the execution plan. However, I have noticed that the presence or absence of this operation does not change the hash value of the plan.  I am grateful to RobK for his comment pointing out that the full plan hash value does differ.
This has consequences for profiling DB time by execution plan line and then describing that line from a captured plan.
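
For example (a sketch, not taken from the original post), the two values can be compared in V$SQL, which from 12c also exposes FULL_PLAN_HASH_VALUE:

-- Compare the plan hash value and the full plan hash value for a statement;
-- &sql_id is a placeholder for the SQL_ID of interest.
SELECT sql_id, child_number, plan_hash_value, full_plan_hash_value
FROM   v$sql
WHERE  sql_id = '&sql_id';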

Online Statistics Gathering Disabled by Column Specific METHOD_OPT Table Statistics Preference

I have come across a quirk whereby the presence of a table statistics preference that specifies a METHOD_OPT specific to some columns disables online statistics gathering.  As far as I can tell, this behaviour is not documented.  I have reproduced it in Oracle versions 12.1.0.2 and 19.3.

Demonstration 

I will create two identical tables, but on the first table I will specify a table statistics preference to collect a histogram on column C.

set serveroutput on verify on autotrace off
CREATE TABLE t1(a number, b varchar2(1000), c number);
CREATE TABLE t2(a number, b varchar2(1000), c number);
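
The preference itself would be set with something like the following (a sketch of the kind of column-specific METHOD_OPT preference described; the data loaded is illustrative, not the original script):

-- Column-specific METHOD_OPT preference on T1 only: request a histogram on column C.
EXEC dbms_stats.set_table_prefs(user, 'T1', 'METHOD_OPT', 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 254 C');

-- Direct-path insert into both tables; if the quirk applies, only the plan for
-- T2 is expected to include the OPTIMIZER STATISTICS GATHERING operation.
INSERT /*+ APPEND */ INTO t1 SELECT rownum, TO_CHAR(rownum), MOD(rownum,10) FROM dual CONNECT BY level <= 1000;
COMMIT;
INSERT /*+ APPEND */ INTO t2 SELECT rownum, TO_CHAR(rownum), MOD(rownum,10) FROM dual CONNECT BY level <= 1000;
COMMIT;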

12.2.0.1 And Later Support (Limited) Extended Stats On Virtual columns / Column Groups of Expressions

I have a demo as part of my optimizer-related workshops that shows the restriction of DBMS_STATS not supporting extended statistics on virtual columns / column groups of expressions, for example the combination of both expressions and column groups, like ((TRUNC(COL1)), (TRUNC(COL2))).

Surprisingly, when following a certain sequence of operations, this starts working (to some degree) from 12.2.0.1 onwards.
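
The call involved looks something like this (a sketch; the table and column names are illustrative, and where the restriction still applies the call is expected to fail rather than create the extension):

-- Attempt to create an extension that combines two expressions as a column group.
SELECT dbms_stats.create_extended_stats(user, 'T', '((TRUNC(COL1)),(TRUNC(COL2)))') AS extension_name
FROM   dual;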

Single Value Column Frequency Histogram Oracle 12c and later

By now it is hopefully well known that Oracle introduced in version 11g a new algorithm to gather statistics on a table that no longer requires sorting to determine the critical Number Of Distinct Values (NDV) figure - instead it uses a clever "approximate NDV" algorithm that always reads 100% of the table data and therefore in principle generates very accurate statistics. This new algorithm is used only when the ESTIMATE_PERCENT parameter of the DBMS_STATS.GATHER*STATS calls is left at its default or explicitly passed as "DBMS_STATS.AUTO_SAMPLE_SIZE". It is also required if other new features, like "Incremental Statistics", are to be used.

In 12c Oracle improved this algorithm allowing the generation of Frequency and the new Top Frequency histogram types in a single pass. The new Hybrid histogram type still requires a separate pass.
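
A quick way to see which histogram types a single gathering pass produced is to gather with the default AUTO_SAMPLE_SIZE and then check the dictionary (a sketch; table T and the histogram request are illustrative):

-- Gather with the approximate NDV algorithm (ESTIMATE_PERCENT left at its default)
-- and request histograms on all columns.
EXEC dbms_stats.gather_table_stats(user, 'T', method_opt => 'FOR ALL COLUMNS SIZE 254');

-- FREQUENCY and TOP-FREQUENCY histograms can be produced in the same pass;
-- HYBRID histograms require an additional pass over the data.
SELECT column_name, num_distinct, histogram
FROM   user_tab_col_statistics
WHERE  table_name = 'T';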

Oracle 19c Real-Time and High-Frequency Automatic Statistics Collection

I gave this presentation at the UKOUG Techfest 19 conference.  This video was produced as a part of the preparation for that session.  The slide deck is also available on my website.


It takes a look at the pros and cons of these new 19c features.  They are only available on Engineered Systems.  Both features aim to address the challenge of querying data that has changed significantly since the statistics maintenance window last ran.

Practical Application Performance Tuning: Applying Theory in Practice

I gave this presentation at the UKOUG Techfest 19 conference.  It is closely based on a previous presentation about PeopleSoft nVision performance tuning, and uses the experience of a PeopleSoft project as a case study, so I am also posting here and on my PeopleSoft blog.
This video was produced as a part of the preparation for this session.  The slide deck is also available on my website.

Strange Estimates.

A question came up on the Oracle-L list server a couple of days ago expressing some surprise at the following execution plan:

19c High-Frequency statistics gathering and Real-Time Statistics

These are the two exciting new optimizer statistics features that arrived in the latest release of the 12cR2 family: 19c. Less exciting is that we are not allowed to use them on any platform other than Exadata:

https://apex.oracle.com/database-features/

But let’s cross our fingers and hope that they will become more widely available in the future, because they solve real-life problems such as out-of-range queries. Here is a little example involving both of them. A table starts empty and grows during the day. Relying only on the statistics gathered during the maintenance window will give bad estimations, and dynamic sampling may not sample the right blocks.
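
For reference, a sketch of the DBMS_STATS global preferences that control the high-frequency statistics task (only expected to take effect on platforms where the feature is available):

-- Enable the high-frequency automatic statistics task, run it every 5 minutes,
-- and cap each run at 10 minutes.
EXEC dbms_stats.set_global_prefs('AUTO_TASK_STATUS', 'ON');
EXEC dbms_stats.set_global_prefs('AUTO_TASK_INTERVAL', '300');
EXEC dbms_stats.set_global_prefs('AUTO_TASK_MAX_RUN_TIME', '600');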

No more stale statistics in 19c

There is an odd contradiction that we all encounter for most databases, especially if they are predominantly used during the business day. Here is how that contradiction comes to be – it is in the way that we obtain and use optimizer statistics on those databases. The contradiction runs like this:

  • To minimize service disruption, we gather statistics at a quiet time, for example, in the middle of the night
  • We then use those statistics during the business day whilst user activity is at its highest.
  • Highest user activity will typically mean the highest frequency of data changes.
  • Hence the statistics are at their peak accuracy when no-one is using them to optimize queries, and they are at their least accurate when everyone is using them to optimize queries!

We can demonstrate this easily with the following script run in 18c.
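
As a minimal sketch of that kind of demonstration (illustrative table, columns and data - not the original 18c script):

-- A table whose date column keeps growing after statistics have been gathered.
CREATE TABLE demo_sales (sale_date DATE, amount NUMBER);
INSERT INTO demo_sales SELECT SYSDATE - 1, rownum FROM dual CONNECT BY level <= 10000;
COMMIT;

EXEC dbms_stats.gather_table_stats(user, 'DEMO_SALES');

-- New rows arrive after the "maintenance window": the high value recorded for
-- SALE_DATE is now stale, so predicates on today's data are out of range.
INSERT INTO demo_sales SELECT SYSDATE, rownum FROM dual CONNECT BY level <= 10000;
COMMIT;

-- The estimate for today's data will typically be far lower than the 10,000 rows just inserted.
EXPLAIN PLAN FOR SELECT COUNT(*) FROM demo_sales WHERE sale_date > TRUNC(SYSDATE);
SELECT * FROM TABLE(dbms_xplan.display);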