Who's online

There are currently 0 users and 21 guests online.

Recent comments


January 2011

My Symposium Schedule

The Hotsos Symposium 2011 schedule has been published. On the one hand, it's one of my easiest to plan because there are only two streams which takes away the ridiculous choice of something like Openworld, on the other it can be extremely difficult to pick because the quality is uniformly high and the presentations all tend to be about performance. There are always some clashes I simply can't solve. Anyway, here's my best guess at the moment.

08:30 AM     Welcome and Keynote by Kerry Osborne
09:45 AM     Kerry Osborne: Tuning Exadata
11:00 AM     Cary Millsap: Thinking Clearly about Performance
01:00 PM     Toon Koppelaars: Thinking Clearly in SQL or Gerwin Hendriksen: GAPP Improvements
02:15 PM     Henry Poras: Determining Resource Utilization and Saturation Limits in a Multi-User, Mixed Workload Environment
03:30 PM     Andrey Nikolaev: Contemporary Latch Internals
04:45 PM     Shan Nawaz: Oracle's Forgotten Children or Stephan Haisley: Tentative: Streams and Golden Gate

08:30 AM     Margaret Norman: Moving to 11g Statistics
09:45 AM     Doug Burns: Statistics on Partitioned Objects
11:00 AM     Riyaj Shamsudeen: Advanced RAC Troubleshooting, Part I
01:00 PM     Riyaj Shamsudeen: Advanced RAC Troubleshooting, Part II
02:15 PM     Miladin Modrakovic: Oracle Diagnostic Events in 11g or Dan Fink: Variance Analysis, Profiling, and Trending using ASH and AWR
03:30 PM     Neil Gunther: Brooks, Cooks, and Response Time Scalability
04:45 PM     Alex Gorbachev: Database I/O Performance or Tim Gorman: Forensic Analysis using AWR and ASH

08:30 AM     Margaret Norman: How to Set Up Stable Parallel Execution or Karl Arao: Mining the AWR Repository
09:45 AM     Toon Koppelaars: "Triggers Considered Harmful," Considered Harmful or Gary Propeck: The CBO's Look at Clusters, IOTs, Partitions, and Other Objects
11:00 AM     Maria Colgan: Implement Best Practices for Extreme Performance with Oracle Data Warehousing
01:00 PM     Tanel Põder: Troubleshooting the Most Complex Oracle Performance Problem I've Ever Seen
02:15 PM     Tom Kyte: Just in Time

I think you'll agree that it looks terrific and it isn't false modesty to say that I do pinch myself sometimes that I make it on to the agenda for a conference like this.

Speaking engagements 2011

Despite my attempts to not travel (much) anymore, I’m still going to speak at two events in first half of 2011.

The first event is at the Suncoast Oracle User Group meeting in Tampa, FL on Thursday, 27th January. I’ll speak about Advanced Oracle Troubleshooting in 60 minutes.

I will deliver the second presentation at Hotsos Symposium 2011 near Dallas, TX on Wednesday 9th March. I will talk about Troubleshooting the Most Complex Oracle Performance Problem I’ve ever seen. If you think that this performance problem was complex, then be prepared to see an even worse one there!


Quiz Night

Here’s an execution plan pulled from v$sql_plan in with the “gather_plan_statistics” option enabled:

| Id  | Operation                    | Name   | Starts | A-Rows |   A-Time   | Buffers |
|   1 |  NESTED LOOPS                |        |      1 |      0 |00:00:00.01 |     608 |
|   2 |   TABLE ACCESS BY INDEX ROWID| PARENT |      1 |    200 |00:00:00.01 |       6 |
|   3 |    INDEX RANGE SCAN          | PAR_I1 |      1 |    200 |00:00:00.01 |       2 |
|   4 |   TABLE ACCESS BY INDEX ROWID| CHILD  |    200 |      0 |00:00:00.01 |     602 |
|   5 |    INDEX UNIQUE SCAN         | CHI_I1 |    200 |    200 |00:00:00.01 |     402 |

The code defining the two tables is the same:

create table &m_table_name
	rownum					id,
	trunc(dbms_random.value(1,&m_rowcount))	n1,
	lpad(rownum,10,'0')			small_vc,
	rpad('x',100)				padding
	rownum <= &m_rowcount

The indexes par_i1 and chi_i1 are on the id column and, as you can see for chi_i1, have been created as unique indexes. Unlike the table, though, the code defining them isn’t necessarily identical.

Please state, with justification, what you think the blevel is for index chi_i1 ?

(Note – to avoid any risk of confusion, remember that height = blevel + 1; and just in case anyone thinks it makes any difference, the tables and indexes were all created in the same tablespace which is my usual setup of 1MB uniform extents, locally managed, 8K block size and freelist management.)

The quiz isn’t intended to be cunning, devious and tricksy, by the way. It’s all about observing and understanding a fairly ordinary situation. On the other hand I’m sure there are interesting variations with strange side effects that could be introduced if you really want to get clever.

Answer 23rd Jan:
Rather than writing my answer into a new post I’ve added it below – but to give new visitors a chance to work out the answer before seeing mine I’ve left a big gap before the answer.

Reading the comments I think all the interesting points have been covered. We have comments covering special treatment of root blocks in nested loop joins, the “buffer is pinned count” statistic, possible changes in 11g, the hypothesis that I had set a large pctfree for the child index,and the link between “A-rows” in one line of the plan and “Starts” in another line.

For reference, my query looked like this:

	parent	par,
	child	chi
where between 301 and 500
and = par.n1
and	chi.n1 = 0

The mechanics of the plan are:
Line 3 – index range scan of par_i1 (which finds ultimately 200 matching rowids).
Line 2 – for each rowid from line 3 check the matching table row (which ultimately returns 200 rows)

Then two hundred calls to:
Line 5 – index unique scan of chi_i1, which always finds it single row – hence returning 200 rowids in total
Line 4- check the matching row in the table – which always fails – hence returning zero rows in total

The critical counts that allow you to answer my question are the 402 and 602 buffer gets in lines 5 and 4. Ignoring, temporarily, the odd 2 these numbers are clearly 200 times something – which is rather nice given that we know that we are doing something 200 times. The simplest solution, of course is that the (602 – 402 =) 200 gets due to line 4 represents the visits to the 200 table blocks, leaving us with 400 (plus an odd bit) to account for.

Now consider the possibilities:

    Blevel = 0 – we would visit the only index block (which would be the root, but look like a leaf) then visit the table.
    Blevel = 1 – we would visit the root, the leaf which is the next layer down, then the table
    Blevel = 2 – we would visit the root, the branch level, the leaf, then the table.

So, at first sight we might decide the only way to get 400 buffer visits from the index is to have Blevel = 1, visit the root and the leaf. But that’s not the way it works (apart from a couple of versions which had a bug).

When running a nested loop join, Oracle “pins” the root block of the inner (second table) index, which means it attaches a structure to the “users’ list” in the buffer header that links the session’s state object directly to the buffer header. So, in my nested loop join, Oracle gets the root block once and keeps it pinnned, then gets branch and leaf blocks 200 times. The blevel on the child index is 2. (The blevel of the parent index is only 1 – I had set the pctfree on the child index to 90 to make it much larger than it needed to be.)

This “root block pin” isn’t the only pinning optimisation in the query, though. In detail, the steps are as follows – and this is something you would be table to see in the trace file if you enabled event 10200 (one of the ‘consistent reads’ traces).

    1) Get the root block of parent index and pin it
    2) Get the first relevant leaf block of the parent index and pin it
    3) Get the first relevant table block of the parent table and pin it.
    4) Get the root block of the child index and pin it
    5) Get the relevant branch block of the child index (no pin)
    6) Get the relevant leaf block of the child index (no pin)
    7) Get the table block of the child table (no pin) — row does not match final predicate and is rejected
    8) Revisit the parent index leaf block through the pin (add one to “buffer is pinned count”)
    9) Revisit the parent table block through the pin (add one to “buffer is pinned count”)
    10) Revisit the child index root block through the pin (add one to “buffer is pinned count”)
    11) Get a new child branch
    12) Get a new child leaf
    13) Get a new table block
    14) Repeat for a total of 200 cycles from step 8

There a couple of deviations from this cycle, of course. The rows I needed from parent were spread across 4 consecutive blocks in the table so, roughly every 50 rows from parent, line (9) above would become “release current pinned parent block, get new parent block and pin it”. The other little oddity that I can’t explain is that Oracle does “get” the child root block on the second visit to the index as well as the first visit – and then pins it from that moment onwards. So the counts are:

    Gets on the parent index 2 (root and leaf)
    Gets on the parent table 4 (one for each table block visited)
    Gets on the child index 402 (two on the root block, 200 on branch blocks, 200 on leaf blocks)
    Gets on the child table (200 for each row/block accessed)

Inevitably, things change – Oracle keeps getting smarter about things like ‘consistent gets – examination’, ‘buffer is pinned count’ and, in 11g, “fastpath” access to buffered blocks. This is a clue to the difference in gets that Charles Hooper recorded in – and explains why I chose to use with a unique index for my example. If you want to investigate other variations all it takes is snapshots of v$mystat, calls to “alter system flush buffer_cache”, and event 10200; the treedump can also be very helpful for identifying block addresses.

[Some notes which say a little more about gets and pins]

Query is Returning ORA-06502: Character String Buffer Too Small, Any Help for the OP?

January 21, 2011 I found an interesting SQL statement on the OTN forums today.  When executing the SQL statement Oracle Database returns the following error: ORA-00604: error occurred at recursive SQL level 1 ORA-06502: PL/SQL: numeric or value error: character string buffer too small ORA-06512: at line 12 ORA-00920: invalid relational operator 00604. 00000 - "error [...]

Describing Performance Improvements (Beware of Ratios)

Recently, I received into my Spam folder an ad claiming that a product could “...improve performance 1000%.” Claims in that format have bugged me for a long time, at least as far back as the 1990s, when some of the most popular Oracle “tips & techniques” books of the era used that format a lot to state claims.

Beware of claims worded like that.

Whenever I see “...improve performance 1000%,” I have to do extra work to decode what the author has encoded in his tidy numerical package with a percent-sign bow. The two performance improvement formulas that make sense to me are these:

Thank You for the Overwhelming Response to the 100 Series

A few weeks ago, while delivering a session at #000099; font-family: Arial; font-size: 11pt; font-style: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">New York Oracle Users Group, I had the idea of starting a blog series with a title 1#000099; font-family: Arial; font-size: 11pt; font-style: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">00 Things You Probably Didn’t Know About Oracle Database. From the myriads of emails I have received so far, I know it has been quite successful in helping many readers. I was pleasantly surprised to hear from even some seasoned DBAs how it helped them. I would like to take this moment to talk about a few things.

In the initial installment, I put a self declared goal of producing an installment a week. I quickly, after the first installment itself, realized it was lofty, way lofty goal. I have a day job as a Database Architect where I research options, prepare project plans, estimate budgets, get approvals for the projects, jump in DBA operations, develop strategies of all terms, get them approved, chase everyone from the Sys Admin to the mailman to make sure the train chugs along. I also write articles, give training seminars, present technical sessions, review manuscripts of books (and occasionally write one), critique published books sent to me by publishers, do some mentoring, and yada yada yada. And, you see, I am just an average person with a family that needs me as well. This little “hobby”, however satisfying, faces a lot of competition.

What is the alternative? I could write shorter pieces or simply drop the code examples, figures, etc. which take up most of the time. On second thought, most of you actually like the articles because of these very elements - code examples and figures; and I don’t want to drop them.

Therefore, as much as it hurts me, I have to renege on the commitment of one installment a week. I would rather produce some quality content than just a bunch.

Second, I don’t have a plan, a set schedule or a “roadmap” for this series. I choose the next content based on what I get in the feedback. If you would like to see a specific topic covered, please drop me an #000099; font-family: Arial; font-size: 11pt; font-style: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">email, #000099; font-family: Arial; font-size: 11pt; font-style: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">tweet or put in the comment section here. Please be as specific as possible.

Third, some of the comments has been about additional questions which were probably not covered very well in the blog entry. I consider them to be success, rather that a failure of the writing. You see, a “complete” writing is an oxymoron; it does not exist. If article, blog or book does not generate ten other questions, then it fails. The objective of my series is never to be 100% comprehensive; instead it aims to build a foundation and generate additional interest. Some of the questions are planned to be answered in the subsequent installments; rest are left in the wishlist for the future articles.

Finally, an author is nothing without readers. From the bottom of my heart, I thank you, dear reader, for reading this. Life makes demands, which is perhaps more true for technologists like yourself; your decision to devote a slice of your life to this series is a conscious investment you made. I am honored and grateful for that.

Arup Nanda
Danbury CT, USA
January 20th, 2011
Twitter: arupnanda

Books and presentations

As you probably know that my first co-authored book Expert Oracle Practices was released in 2009. I have co-authored one more book Pro Oracle SQL with my esteemed colleagues. This books covers many aspects of better SQL development. Have fun reading :-)

I also will be presenting in few conferences in the upcoming months: RMOUG Training days 2011 , Hotsos symposium ’11, and IOUG Collob . Hopefully, I will see you in one of the conference (or all of the conferences)

Hotsos 2011

I will be talking about advanced RAC troubleshooting in Hotsos symposium ’11. Hotsos Symposium, conducted every March in Dallas, TX, is an intensive seminar series probing the deep waters of Performance related to Oracle Database.

There are many great speakers in this conference. It gives me a great pleasure to meet many folks that I have known for years, exchange ideas, and learn from industry leaders.

Hope to see you there!

Viewing Runtime Load Balancing Events

Yesterday I have run a benchmark on a 2 node RAC cluster (ProLiant BL685c G6 with 4 Six-Core AMD Opteron 8431) and 32G RAM each. It’s running Oracle Grid Infrastructure as well as an Oracle database on Oracle Enterprise Linux 5.5 64bit and device-mapper-multipath.

I was testing how the system would react under load but also wanted to see if the Runtime Load Balancing was working. The easiest way to check this is to view the AQ events that are generated for a service if AQ HA notifications is set to true. They can either be dequeued from the database as described in chapter 11 of Pro Oracle Database 11g RAC on Linux or alternatively queried from the database. The latter is the quicker method and this article will focus on it.

Before you can make use of Runtime Load Balancing you need to set at least 2 properties in your service:

  • Connection Load Balancing Goal (either SHORT or LONG)
  • Runtime Load Balancing Goal (SERVICE_TIME or THROUGHPUT)

.Net applications require AQ HA notifications to be set to true as these can’t directly make use of Fast Application Notification (FAN) events as said in the introduction. My JDBC application is fully capable of using the FAN events, however as you will see later I am using the AQ notifications anyway to view the events.

Connected as the owner of the Oracle binaries, I created a new service to make use of both instances:

$ srvctl add service -d TEST -s TESTSRV -r TEST1,TEST2 -P BASIC  \
> -l PRIMARY -y MANUAL -q true -x false -j short -B SERVICE_TIME \
> -e SESSION -m BASIC -z 0 -w 0

The service TESTSRV for database TEST has TEST1 and TEST2 as preferred instances, and the service should be started (manually) when the database is in the primary role. AQ Notifications are enabled, and I chose the connection load balancing goal to be “short” (usually ok with web applications and connection pooling) and a runtime load balancing goal of service time (should also be appropriate for many short transactions typical for a web environment). The remaining paramters define Transparent Application Failover. Please refer to the output of “srvctl add service -h” for more information about the command line parameters.

The result of this endavour can be viewed with srvctl config service:

$ srvctl config service -d TEST -s TESTSRV
Service name: TESTSRV
Service is enabled
Server pool: TEST_TESTSRV
Cardinality: 2
Disconnect: false
Service role: PRIMARY
Management policy: MANUAL
DTP transaction: false
AQ HA notifications: true
Failover type: SESSION
Failover method: BASIC
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: SHORT
Runtime Load Balancing Goal: SERVICE_TIME
TAF policy specification: BASIC
Preferred instances: TEST1,TEST2
Available instances:

So to begin with I created the order entry schema (SOE) in preparation of a swingbench run. (I know that Swingbench’s Order Entry is probably not the best benchmark out there but my client knows and likes it). Once about 10G of data were generated I started a swingbench run with 300 users, and reasonably low think time (min transaction time 20ms and max of 60ms). The connect string was //

A query against gv$session showed an even balance of sessions, which was good:

select count(inst_id), inst_id
 from gv$session
where username = 'SOE'
group by inst_id

However, whatever I did I couldn’t get the Runtime Load Balancing in sys.sys$service_metrics_tab to chanage. They always looked like this (column user_data):

{instance=TEST1 percent=50 flag=UNKNOWN aff=TRUE}{instance=TEST2 percent=50 flag=UNKNOWN aff=TRUE} }

That sort of made sense as none of the nodes broke into a sweat-the system was > 50% idle with a load average of about 12. So that wouldn’t cut it. Instead of trying to experiment with the Swingbench parameters, I decided to revert back to the silly CPU burner: a while loop which generates random numbers. I wasn’t interested in I/O at this stage, and created this minimal script:

$ cat dothis.sql
 n number;
 while (true) loop
 n:= dbms_random.random();
 end loop;

A simple for loop can be used to start the load test:

$ for i in $(seq 30); do
> sqlplus soe/ @dothis &

This created an even load on both nodes. I then started another 20 sessions on node1 against TEST1 to trigger the change in behaviour. And fair enough, the top few lines of “top” revealed the difference. The output for node 1 was as follows:

top - 10:59:30 up 1 day, 21:16,  6 users,  load average: 42.44, 20.23, 10.07
Tasks: 593 total,  48 running, 545 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.9%us,  0.1%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32960688k total, 11978912k used, 20981776k free,   441260k buffers
Swap: 16777208k total,        0k used, 16777208k free,  8078336k cached

Whereas node 2 was relatively idle.

top - 10:59:22 up 5 days, 17:45,  4 users,  load average: 15.80, 10.53, 5.74
Tasks: 631 total,  16 running, 605 sleeping,  10 stopped,   0 zombie
Cpu(s): 58.8%us,  0.6%sy,  0.0%ni, 40.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32960688k total, 11770080k used, 21190608k free,   376672k buffers
Swap: 16777208k total,        0k used, 16777208k free,  7599496k cached

Would that imbalance finally make a difference? It did, as the user_data column (truncated here for better readability) reveals:

SQL> select user_data
2  from sys.sys$service_metrics_tab
3  order by enq_time desc;

{instance=TEST1 percent=4 flag=GOOD aff=TRUE}{instance=TEST2 percent=96 flag=GOOD aff=TRUE} } timestamp=2011-01-20 11:01:16')
{instance=TEST1 percent=6 flag=GOOD aff=TRUE}{instance=TEST2 percent=94 flag=GOOD aff=TRUE} } timestamp=2011-01-20 11:00:46')
{instance=TEST1 percent=10 flag=GOOD aff=TRUE}{instance=TEST2 percent=90 flag=GOOD aff=TRUE} } timestamp=2011-01-20 11:00:16')
{instance=TEST1 percent=18 flag=GOOD aff=TRUE}{instance=TEST2 percent=82 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:59:46')
{instance=TEST1 percent=28 flag=GOOD aff=TRUE}{instance=TEST2 percent=72 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:59:16')
{instance=TEST1 percent=35 flag=GOOD aff=TRUE}{instance=TEST2 percent=65 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:58:46')
{instance=TEST1 percent=40 flag=GOOD aff=TRUE}{instance=TEST2 percent=60 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:58:16')
{instance=TEST1 percent=43 flag=GOOD aff=TRUE}{instance=TEST2 percent=57 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:57:46')
{instance=TEST1 percent=44 flag=GOOD aff=TRUE}{instance=TEST2 percent=56 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:57:16')
{instance=TEST1 percent=48 flag=GOOD aff=TRUE}{instance=TEST2 percent=52 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:56:46')
{instance=TEST1 percent=49 flag=GOOD aff=TRUE}{instance=TEST2 percent=51 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:56:16')
{instance=TEST1 percent=50 flag=GOOD aff=TRUE}{instance=TEST2 percent=50 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:55:46')
{instance=TEST1 percent=50 flag=GOOD aff=TRUE}{instance=TEST2 percent=50 flag=GOOD aff=TRUE} } timestamp=2011-01-20 10:55:16')
{instance=TEST1 percent=50 flag=UNKNOWN aff=TRUE}{instance=TEST2 percent=50 flag=UNKNOWN aff=TRUE} } timestamp=2011-01-20 10:54:46')
{instance=TEST1 percent=50 flag=UNKNOWN aff=TRUE}{instance=TEST2 percent=50 flag=UNKNOWN aff=TRUE} } timestamp=2011-01-20 10:54:16')

Where it was initially even at 50-50 it soon became imbalanced, and TEST2 would be preferred after a few minutes in the test.  So everything was working as expected, I just didn’t manage to put enough load on the system intially.