Oakies Blog Aggregator

Controlling How %UpdateStats Collects Optimizer Statistics

I have written a number of entries on this blog about updating database statistics on tables during Application Engine processes.

I proposed a PL/SQL package (wrapper.sql) called from the DDL model to intercept the call from the %UpdateStats macro in Application Engine. By default it

  • Gathers statistics on regular tables.
  • Refreshed only stale statistics on partitioned tables.
  • Does not gather statistics on Global Temporary Records.

I have now published an enhanced version: wrapper848meta.sql.

  • A table PS_GFC_STATS_OVRD holds meta-data to override the default behaviour of the script for certain records. The meta-data can also specify the size of the sample, and the options to control the collection of histograms.
  • If a private instance of a Temporary Records is a Global Temporary Tables, the wrapper may still collect statistics (normally this would be suppressed because of the risk of one session using the statistics collected by another session, but this will not happen for these tables).

Now, it is possible to specify the few tables where statistics must still be explicitly gathered, or whether to do this only if the current statistics on the table are stale. The DBA is probably the person best placed to decide whether and how to collect statistics on which tables, and these decisions can be implemented with the meta-data, but without code change.

Thus, it is possible to

  • Reduce the number of calls to dbms_stats,
  • to reduce the overhead of the remaining calls
  • and at least preserve, if not improve performance of batch processes without making any code changes.

Classic MetaLink vs. My Oracle Support

If you are in any way involved with supporting Oracle products, then you know that the death knell for Classic MetaLink has sounded. MetaLink will be unplugged at the end of July 2009.

The new support site, My Oracle Support, seems to be causing some pain for quite a few people in the Oracle user community.
Some of the complaints regard limited platform support due to the Adobe Flash 9 requirements, navigation and response times.
On the other hand there are some cool new features such as Power View, Configuration Manager and the new Advanced Search options.
How do you feel about it?
Here's a chance to let your opinion be know as a poll has been created where you can vote on it.

At this time 637 voters have voiced their opinion about MetaLink and My Oracle Support.

Current Poll results can be found in this Excel File: MetaLink vs My Oracle Support Results

Oracle Terminated Connection Timeout

I have recently come across situations on two different PeopleSoft sites where ad-hoc queries continue to run on the Oracle database server long after the Application Server process, which is the Oracle session client, has terminated. Often, queries perform poorly because they are poorly coded, but that is another story. To help guard against this situation Oracle has mechanism called Terminated Connection Timeout (also known as Dead Connection Detection (DCD) when it was introduced in Net8).

Oracle Support Note 615782.1 explains the mechanism. "DCD is initiated on the server when a connection is established. At this time SQL*Net reads the SQL*Net parameter files and sets a timer to generate an alarm. The timer interval is set by providing a non-zero value in minutes for the SQLNET.EXPIRE_TIME parameter in the sqlnet.ora file on the database server side. When the timer expires, SQL*Net on the server sends a 'probe' packet, essentially an empty SQL*Net packet, to the client. If the client end of the connection is still active, the probe is discarded, and the timer mechanism is reset. If the client has terminated abnormally, the server will receive an error from the send call issued for the probe, and SQL*Net on the server will signal the operating system to release the connection's resources."

Thus, if a PeopleSoft operator initiates an ad-hoc query that runs on the PSQRYSRV server for longer than the ICQuery service time-out (default 1200 seconds), then Tuxedo will terminate and restart the busy server process. However, the query will continue run on the database server until the current fetch operation returns. In the case of a query performing a large sort or hash operation, it might be a long time before the first row is returned. All the while, the query is continuing to consume resources on the database.

PeopleTools 8.44, also introduced the ability to kill a query via the Query Monitor that had reached a maximum run time. This is one of the functions of the PSMONITORSRV server process (see Oracle Support Note 624339.1. The maximum run time is specified in a permission list (see Security Administration PeopleBook Permission List Query Profile) and then the ability to kill queries that have timed out can be enabled or disabled system-wide (see PeopleSoft Query PeopleBook Query Administration. It will kill the application server process that submitted the query, but for the same reasons, the query may continue to run on the database.

Thus, setting Terminated Connection Timeout is not merely a good idea for a PeopleSoft system running on Oracle, it is effectively mandatory. Otherwise. some PeopleSoft functionality simply won't work as intended.

What is an appropriate value for SQLNET.EXPIRE_TIME?

The value for this parameter is the time between successive SQL*Net probes sent by the Oracle shadow server process to the client. Setting it is a balance between the maximum time that a query can be left to consume resources after a client process terminates, against the additional overhead of every client process sending a probe every few minutes.

The SQL*Net documents often talk about additional network traffic generated by DCD. This was a consideration in the past on client-server applications that ran across a wide area network. However, it is rarely a consideration in relatively modern systems such as PeopleTools 8, the database connections are made by Application Server and Process Scheduler, which are usually physically close to the database server.

The time-out can be set independently of any of the other time-outs for the Application Server and Web Server. Documents on Metalink often suggest 5 or 10 minutes, and I don't think that is unreasonable.

My thanks to Colin Kilpatrick who prompted me to look at this again.

Oracle Terminated Connection Timeout

I have recently come across situations on two different PeopleSoft sites where ad-hoc queries continue to run on the Oracle database server long after the Application Server process, which is the Oracle session client, has terminated. Often, queries perform poorly because they are poorly coded, but that is another story. To help guard against this situation Oracle has mechanism called Terminated Connection Timeout (also known as Dead Connection Detection (DCD) when it was introduced in Net8).

Oracle Support Note 615782.1 explains the mechanism. "DCD is initiated on the server when a connection is established. At this time SQL*Net reads the SQL*Net parameter files and sets a timer to generate an alarm. The timer interval is set by providing a non-zero value in minutes for the SQLNET.EXPIRE_TIME parameter in the sqlnet.ora file on the database server side. When the timer expires, SQL*Net on the server sends a 'probe' packet, essentially an empty SQL*Net packet, to the client. If the client end of the connection is still active, the probe is discarded, and the timer mechanism is reset. If the client has terminated abnormally, the server will receive an error from the send call issued for the probe, and SQL*Net on the server will signal the operating system to release the connection's resources."

Thus, if a PeopleSoft operator initiates an ad-hoc query that runs on the PSQRYSRV server for longer than the ICQuery service time-out (default 1200 seconds), then Tuxedo will terminate and restart the busy server process. However, the query will continue run on the database server until the current fetch operation returns. In the case of a query performing a large sort or hash operation, it might be a long time before the first row is returned. All the while, the query is continuing to consume resources on the database.

PeopleTools 8.44, also introduced the ability to kill a query via the Query Monitor that had reached a maximum run time. This is one of the functions of the PSMONITORSRV server process (see Oracle Support Note 624339.1. The maximum run time is specified in a permission list (see Security Administration PeopleBook Permission List Query Profile) and then the ability to kill queries that have timed out can be enabled or disabled system-wide (see PeopleSoft Query PeopleBook Query Administration. It will kill the application server process that submitted the query, but for the same reasons, the query may continue to run on the database.

Thus, setting Terminated Connection Timeout is not merely a good idea for a PeopleSoft system running on Oracle, it is effectively mandatory. Otherwise. some PeopleSoft functionality simply won't work as intended.

What is an appropriate value for SQLNET.EXPIRE_TIME?

The value for this parameter is the time between successive SQL*Net probes sent by the Oracle shadow server process to the client. Setting it is a balance between the maximum time that a query can be left to consume resources after a client process terminates, against the additional overhead of every client process sending a probe every few minutes.

The SQL*Net documents often talk about additional network traffic generated by DCD. This was a consideration in the past on client-server applications that ran across a wide area network. However, it is rarely a consideration in relatively modern systems such as PeopleTools 8, the database connections are made by Application Server and Process Scheduler, which are usually physically close to the database server.

The time-out can be set independently of any of the other time-outs for the Application Server and Web Server. Documents on Metalink often suggest 5 or 10 minutes, and I don't think that is unreasonable.

My thanks to Colin Kilpatrick who prompted me to look at this again.

Merge counter

A package to capture and report separate INSERT and UPDATE rowcounts from a MERGE statement. February 2004

my week in venn diagram form ...

(from thisisindexed.com)My take on this one has nothing to do with Bernie and everything to do with recent events on the big project. If I could add a small red 'x' in the middle of 'Doom' with an arrow and a 'You are Here' message, it would be perfect ...

SIOUG 2009 conference

I'll be presenting at the SIOUG 2009 conference that will take place in Portoroz, from 20th to 23rd September 2009.

The title of the presentation will be "CBO fundamentals: Understanding the different modes of System Statistics" and will be based on the series already available here on my blog.

So if you're interested in the details of the cost calculations performed by the recent releases of the Cost Based Optimizer and would like to here about this in a different style, this presentation might be something for you.

If you happen to be at the conference I'm looking forward to meeting you there.

Thanks to OakTable fellow Joze Senegacnik for inviting me to the conference.

Fundamentals of Software Performance Quick Reference Card

I just posted "Fundamentals of Software Performance Quick Reference Card" at the Method R company website:

This two-page quick reference card written by Cary Millsap sums up computer software performance the Method R way. The first page lists definitions of the terms you need to know: efficiency, knee, load, response time, and so on. The second page lists ten principles that are vital to your ability to think clearly about software performance. This document contains meaningful insight in a format that's compact enough to hang on your wall.

It's free, and there's no sign-up required. I hope you will enjoy it.

Oracle 10g Statistics History Retention in PeopleSoft

I have been working on a system where many Application Engine programs are running throughout the day, and are frequently collecting Optimizer statistics with the %UpdateStats macro on many working storage tables. Concurrent calls to dbms_stats are typical.

There are two new behaviours in Oracle 10g RDBMS that can in extreme cases, in combination with a system that calls dbms_stats very frequently, create a significant performance overhead.

From Oracle 10g, histograms may, by default, be collected automatically. That means that rows are concurrently deleted from and inserted into histgrm$ and hist_head$, leading to contention and consistent read.

  • Also from Oracle 10g, every time you collect statistics on a table the old statistics are retained in the SYS.WRI$_OPTSTAT%HISTORY tables. If histograms have previously been collected, these are also copied. DBMS_STATS has the additional overhead of writing this history. I found in excess of 10,000 versions of previous statistics for some tables, because the batch processes have updated statistics on working storage tables that many times.
  • dbms_stats also appears to be responsible for purging history older than the retention limit. The default retention period is 31 days. I have seen concurrent calls to dbms_stats blocked on row level locks on the statistics history tables. For me, this occurred 31 days after the system went live on a significantly increased volume.
  • #eeeeee; border: 0px solid rgb(0, 0, 0); overflow: auto; padding-left: 4px; padding-right: 4px; width: 100%;">
    SELECT dbms_stats.get_stats_history_retention FROM dual;
    GET_STATS_HISTORY_RETENTION 
    --------------------------- 
                             31

Statistics history was designed to work in conjunction with schema wide statistics jobs that only refreshed stale statistics. There is an option on gather_schema_stats to collect only statistics on tables where the current statistics are stale. However, there is no such option on gather_table_stats. If you have decided to call this procedure for a particular table, then it is assumed you know you need to refresh the statistics. However, by calling dbms_stats from a batch program you can end up calling it much more frequently than is really necessary.

Recommendations

  • Disable statistics history by using dbms_stats.alter_stats_history_retention to set the retention period to zero. Unfortunately this can only be set at database level. The statistics history is there in case you want to revert to a previous version of the statistics should a new set of statistics produce a problem, but it is only used rarely, and I think this is a necessary sacrifice.
  • #eeeeee; border: 0px solid rgb(0, 0, 0); overflow: auto; padding-left: 4px; padding-right: 4px; width: 100%;">
    EXECUTE dbms_stats.alter_stats_history_retention(retention=>0);
  • Use Oracle Optimizer Dynamic Sampling. However, I suggest increasing the level from the default of 2 to 4 to increase the situations in which it is used.
  • Introduce the new version of the PL/SQL wrapper package for dbms_stats so that you can specify the records for which statistics will be explicitly collected, and whether histograms are to be collect. Thus you can reduce the number of calls to dbms_stats.

If you have allowed the statistics history to grow before you disable history retention, then you might like to read John Hallas' posting on Purging statistics from the SYSAUX tablespace.

Oracle 10g Statistics History Retention in PeopleSoft

I have been working on a system where many Application Engine programs are running throughout the day, and are frequently collecting Optimizer statistics with the %UpdateStats macro on many working storage tables. Concurrent calls to dbms_stats are typical.

There are two new behaviours in Oracle 10g RDBMS that can in extreme cases, in combination with a system that calls dbms_stats very frequently, create a significant performance overhead.

From Oracle 10g, histograms may, by default, be collected automatically. That means that rows are concurrently deleted from and inserted into histgrm$ and hist_head$, leading to contention and consistent read.

  • Also from Oracle 10g, every time you collect statistics on a table the old statistics are retained in the SYS.WRI$_OPTSTAT%HISTORY tables. If histograms have previously been collected, these are also copied. DBMS_STATS has the additional overhead of writing this history. I found in excess of 10,000 versions of previous statistics for some tables, because the batch processes have updated statistics on working storage tables that many times.
  • dbms_stats also appears to be responsible for purging history older than the retention limit. The default retention period is 31 days. I have seen concurrent calls to dbms_stats blocked on row level locks on the statistics history tables. For me, this occurred 31 days after the system went live on a significantly increased volume.
  • #eeeeee; border: 0px solid rgb(0, 0, 0); overflow: auto; padding-left: 4px; padding-right: 4px; width: 100%;">
    SELECT dbms_stats.get_stats_history_retention FROM dual;
    GET_STATS_HISTORY_RETENTION 
    --------------------------- 
                             31

Statistics history was designed to work in conjunction with schema wide statistics jobs that only refreshed stale statistics. There is an option on gather_schema_stats to collect only statistics on tables where the current statistics are stale. However, there is no such option on gather_table_stats. If you have decided to call this procedure for a particular table, then it is assumed you know you need to refresh the statistics. However, by calling dbms_stats from a batch program you can end up calling it much more frequently than is really necessary.

Recommendations

  • Disable statistics history by using dbms_stats.alter_stats_history_retention to set the retention period to zero. Unfortunately this can only be set at database level. The statistics history is there in case you want to revert to a previous version of the statistics should a new set of statistics produce a problem, but it is only used rarely, and I think this is a necessary sacrifice.
  • #eeeeee; border: 0px solid rgb(0, 0, 0); overflow: auto; padding-left: 4px; padding-right: 4px; width: 100%;">
    EXECUTE dbms_stats.alter_stats_history_retention(retention=>0);
  • Use Oracle Optimizer Dynamic Sampling. However, I suggest increasing the level from the default of 2 to 4 to increase the situations in which it is used.
  • Introduce the new version of the PL/SQL wrapper package for dbms_stats so that you can specify the records for which statistics will be explicitly collected, and whether histograms are to be collect. Thus you can reduce the number of calls to dbms_stats.

If you have allowed the statistics history to grow before you disable history retention, then you might like to read John Hallas' posting on Purging statistics from the SYSAUX tablespace.