One of the enhancements to statistics collection and management in Oracle 12c was the ability of the database will automatically collect statistics during either a create-table-as-select operation or during the initial insert into a freshly created or freshly truncated table, provide that insert is done in direct-path mode (i.e. using the APPEND hint).
When that occurs, there is an additional operation in the execution plan; OPTIMIZER STATISTICS GATHERING.
#eeeeee; border: 0px solid #000000; font-family: "courier new"; font-size: 98%; line-height: 1.0; overflow: auto; padding-left: 2px; padding-right: 2px; width: 95%;">----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | | | 495K(100)| |
| 1 | LOAD AS SELECT | | | | | |
| 2 | OPTIMIZER STATISTICS GATHERING | | 70M| 11G| 495K (2)| 00:00:20 |
| 3 | TABLE ACCESS FULL | XXXXXXXXXXXXXXX | 70M| 11G| 495K (2)| 00:00:20 |
----------------------------------------------------------------------------------------------------
The motivation for this blog was encountering a bulk insert into a partitioned table where the statistics gathering operation consumed a very significant amount of time. Partitioning gives you more things to consider.
I created a simple test that compares the time taken by online statistics collection on partitioned and non-partitioned tables, with the explicit collection of statistics using DBMS_STATS. I have four tables with the same structure.
#eeeeee; border: 0px solid #000000; font-family: "courier new"; font-size: 98%; line-height: 1.0; overflow: auto; padding-left: 2px; padding-right: 2px; width: 95%;">CREATE TABLE T1 (a number, b varchar2(1000), c number) NOLOGGING;
CREATE TABLE T2 (a number, b varchar2(1000), c number)
PARTITION BY RANGE (a) INTERVAL(100) (PARTITION t_part VALUES less than (101)) NOLOGGING;
CREATE TABLE T3 (a number, b varchar2(1000), c number)
PARTITION BY RANGE (a) INTERVAL(100) (PARTITION t_part VALUES less than (101)) NOLOGGING;
CREATE TABLE T4 (a number, b varchar2(1000), c number)
PARTITION BY RANGE (a) INTERVAL(100) (PARTITION t_part VALUES less than (101)) NOLOGGING;
I loaded 100 million rows into each in direct-path mode. The partitioned tables end up with 100 partitions, each with 1 million rows. I have also suppressed redo logging during the direct-path insert by creating the tables with the NOLOGGING attribute.
#eeeeee; border: 0px solid #000000; font-family: "courier new"; font-size: 98%; line-height: 1.0; overflow: auto; padding-left: 2px; padding-right: 2px; width: 95%;">EXEC dbms_stats.set_table_prefs(user,'T3','INCREMENTAL','FALSE');
EXEC dbms_stats.set_table_prefs(user,'T4','INCREMENTAL','TRUE');
The following set of tests will be run for different combinations of:
I enabled SQL trace, from which I was able to obtain the elapsed time of the various statements, and I can determine the amount of time spent on online statistics gathering from timings on the OPTIMIZER STATISTICS GATHERING operation in the execution plan in the trace.
#eeeeee; border: 0px solid #000000; font-family: "courier new"; font-size: 98%; line-height: 1.0; overflow: auto; padding-left: 2px; padding-right: 2px; width: 95%;">TRUNCATE TABLE T2;
TRUNCATE TABLE T3;
EXEC dbms_stats.delete_table_stats(user,'T2');
EXEC dbms_stats.delete_table_stats(user,'T3');
EXEC dbms_stats.delete_table_stats(user,'T4');
INSERT /*+APPEND &inshint*/ into T2 i SELECT * /*+&selhint*/ from t1 s;
INSERT /*+APPEND &inshint NO_GATHER_OPTIMIZER_STATISTICS*/ into T3 i SELECT /*+&selhint*/ * from t1 s;
INSERT /*+APPEND &inshint NO_GATHER_OPTIMIZER_STATISTICS*/ into T4 i SELECT /*+&selhint*/ * from t1 s;
commit;
EXEC dbms_stats.gather_table_stats(user,'T3');
EXEC dbms_stats.gather_table_stats(user,'T4');
It was while building this test that I discovered a couple of quirks:
#eeeeee; border: 0px solid #000000; font-family: "courier new"; font-size: 98%; line-height: 1.0; overflow: auto; padding-left: 2px; padding-right: 2px; width: 95%;">SELECT table_name, num_rows from user_tables where table_name LIKE 'T_' order by 1;
TABLE_NAME NUM_ROWS LAST_ANALYZED
---------- ---------- -----------------
T1 10000000 14:07:36 16/01/20
T2 10000000 14:07:36 16/01/20
T3
T4
I also have column statistics on T1 and T2, but no histograms.
#eeeeee; border: 0px solid #000000; font-family: "courier new"; font-size: 98%; line-height: 1.0; overflow: auto; padding-left: 2px; padding-right: 2px; width: 95%;">break on table_name skip 1
SELECT table_name, column_name, num_distinct, global_stats, histogram, num_buckets, last_analyzed
FROM user_tab_columns where table_name like 'T_' order by 1,2;
TABLE_NAME COLUMN_NAME NUM_DISTINCT GLO HISTOGRAM NUM_BUCKETS LAST_ANALYZED
---------- ------------ ------------ --- --------------- ----------- -----------------
T1 A 10000 YES NONE 1 14:06:58 16/01/20
B 10000 YES NONE 1 14:06:58 16/01/20
C 100 YES NONE 1 14:06:58 16/01/20
T2 A 10000 YES NONE 1 14:07:11 16/01/20
B 10000 YES NONE 1 14:07:11 16/01/20
C 100 YES NONE 1 14:07:11 16/01/20
T3 A NO NONE
B NO NONE
C NO NONE
T4 A NO NONE
B NO NONE
C NO NONE
However, I do not have any partition statistics (I have only shown the first and last partition of each table in this report).
#eeeeee; border: 0px solid #000000; font-family: "courier new"; font-size: 98%; line-height: 1.0; overflow: auto; padding-left: 2px; padding-right: 2px; width: 95%;">break on table_name skip 1
SELECT table_name, partition_position, partition_name, num_rows
FROM user_tab_partitions WHERE table_name like 'T_' ORDER BY 1,2 nulls first;
TABLE_NAME PARTITION_POSITION PARTITION_NAME NUM_ROWS LAST_ANALYZED
---------- ------------------ -------------------- ---------- -----------------
T2 1 T_PART
…
100 SYS_P20008
T3 1 T_PART
…
100 SYS_P20107
T4 1 T_PART
…
100 SYS_P20206
Online optimizer statistics gathering only collects statistics at table level but not partition or sub-partition level. Histograms are not collected.
From Oracle 18c, there are two undocumented parameters that modify this behaviour. Both default to false. Interestingly, both are enabled in the Oracle Autonomous Data Warehouse.
Statistics will be collected on partitions that do not have them when the automatic statistics collection job runs in the next database maintenance window. The question is whether to manage without them until then?
"The optimizer will use global or table level statistics if one or more of your queries touches two or more partitions. The optimizer will use partition level statistics if your queries do partition elimination, such that only one partition is necessary to answer each query. If your queries touch two or more partitions the optimizer will use a combination of global and partition level statistics."
– Oracle The Data Warehouse Insider Blog: Managing Optimizer Statistics in an Oracle Database 11g - Maria Colgan
It will depend upon the nature of the SQL in the application. If the optimizer does some partition elimination, and the data is not uniformly distributed across the partitions, then partition statistics are likely to be beneficial. If there is no partition elimination, then you might question whether partitioning (or at least the current partitioning strategy) is appropriate!
Let's look at how long it takes to insert data into, and then subsequently collect statistics on the tables in my example. This test was run on Oracle 19c on one compute node of a virtualised Exadata X4 machine with 16 CPUs. This table shows elapsed time and the total DB time include all parallel server processes for each operation.
Table Name
|
Oper-ation | Comment | Option | Serial Insert & Statistics | Parallel Insert & Statistics | Parallel SQL & Statistics | Parallel DML, Insert, Select & Statistics | Parallel DML, SQL & Statistics | Parallel Tables | Parallel Tables & DML | Parallel Tables, DML & Method Opt |
Table
|
NO |
NO |
NO |
NO |
NO |
PARALLEL | PARALLEL | PARALLEL | |||
Insert Hint | blank | PARA |
blank | PARA |
blank | blank | blank | blank | |||
Select Hint | blank | PARA |
PARA |
PARA |
PARA |
blank | blank | blank | |||
Parallel DML | DISABLE | DISABLE | DISABLE | ENABLE | ENABLE | DISABLE | ENABLE | ENABLE | |||
Stats Degree | none | DEFAULT | DEFAULT | DEFAULT | DEFAULT | none | none | none | |||
Method Opt | none | none | none | none | none | none | none | ... FOR COLUMNS SIZE 1 A | |||
T2 |
Insert
|
Online Stats Gathering
|
Elapsed Time (s)
|
172.46
|
160.86
|
121.61
|
108.29
|
60.31
|
194.47
|
23.57
|
20.57
|
Optimizer Statistics Gathering
|
82.71
|
55.59
|
55.90
|
-
|
-
|
-
|
-
|
-
|
|||
T3 |
Insert
|
NO_GATHER
|
125.40
|
156.36
|
124.18
|
20.62
|
29.01
|
199.20
|
20.97
|
21.15
|
|
Explicit Stats
|
122.80
|
146.25
|
63.23
|
15.99
|
24.88
|
24.58
|
24.99
|
24.62
|
|||
T4 |
Insert
|
NO_GATHER
|
123.18
|
158.15
|
147.04
|
20.44
|
29.91
|
204.61
|
20.65
|
20.60
|
|
Incremental Explicit Stats
|
80.51
|
104.85
|
46.05
|
23.42
|
23.14
|
23.21
|
22.60
|
23.03
|
|||
T2 |
Insert
|
Online Stats Gathering
|
DB Time (s)
|
174
|
163 | 169 | 359 | 337 | 248 | 366 | 308 |
T3 |
Insert
|
NO_GATHER
|
128
|
193 | 160 | 290 | 211 | 236 | 312 | 326 | |
Explicit Stats
|
122
|
146 | 63 | 265 | 305 | 262 | 335 | ||||
T4 |
Insert
|
NO_GATHER
|
126
|
194 | 167 | 295 | 205 | 233 | 304 | 295 | |
Incremental Explicit Stats
|
80
|
105 | 2 | 281 | 266 | 300 | 179 | 226 |
#eeeeee; border: 0px solid #000000; font-family: "courier new"; font-size: 98%; line-height: 1.0; overflow: auto; padding-left: 2px; padding-right: 2px; width: 95%;">ALTER SESSION ENABLE PARALLEL DML;
#eeeeee; border: 0px solid #000000; font-family: "courier new"; font-size: 98%; line-height: 1.0; overflow: auto; padding-left: 2px; padding-right: 2px; width: 95%;">insert /*+APPEND ENABLE_PARALLEL_DML*/ into T2 SELECT * from t1;
#eeeeee; border: 0px solid #000000; font-family: "courier new"; font-size: 98%; line-height: 1.0; overflow: auto; padding-left: 2px; padding-right: 2px; width: 95%;">EXEC dbms_stats.set_table_prefs(user,'T3','DEGREE','DBMS_STATS.DEFAULT_DEGREE');
#eeeeee; border: 0px solid #000000; font-family: "courier new"; font-size: 98%; line-height: 1.0; overflow: auto; padding-left: 2px; padding-right: 2px; width: 95%;">ALTER TABLE T3 PARALLEL;
Overall, the best performance was obtained when the tables were altered to use parallelism, and parallel DML was enabled; then the query, insert and statistics collection are performed in parallel.
However, the improved performance of parallelism comes at a cost. It can be a brutal way of bringing more resource to bear on an activity. A parallel operation can be expected to use more database time across all the parallel server processes than the same operation processed serially. My best results were obtained by activating all of the CPUs on the server without regard for any other activity. Too many concurrent parallel operations have the potential to overload a system. Remember also, that while the parallel attribute remains on the table any subsequent query will also run in parallel.
Suppressing online statistics collection saves total database time whether working in parallel or not. The saving in elapsed time is reduced when the insert and query are running in parallel. The time taken to explicitly collect statistics will exceed that saving because it is doing additional work to collect partition statistics not done during online statistics collection.
Using incremental statistics for partitioned tables will also reduce the total amount of work and database time required to gather statistics, but may not significantly change the elapsed time to collect statistics.
If you need table statistics but can manage without partition statistics until the next maintenance window, then online statistics collection is very effective. However, I think the general case will be to require partition statistics, so you will probably need to explicitly collect statistics instead. If you want histograms, then you will also need to explicitly collect statistics.
Recent comments
3 years 2 days ago
3 years 12 weeks ago
3 years 16 weeks ago
3 years 17 weeks ago
3 years 22 weeks ago
3 years 43 weeks ago
4 years 11 weeks ago
4 years 41 weeks ago
5 years 25 weeks ago
5 years 25 weeks ago