Quick note about Jonathan Lewis's trip to Dallas: Jonathan Lewis will be presenting a two-day seminar on two topics, “Beating the Oracle Optimizer” (June 28) and “Troubleshooting and tuning” (June 29).
The event will be held June 28-29, 2012 at SMU-in-Legacy in Plano, TX.
This is a must-attend event for experienced DBAs and developers. If you are planning to upgrade your database or application in the near future, or if you are in the middle of an upgrade, you should attend both seminars. This seminar series provides enormous value in resolving complex production performance issues.
Click Here for details.
This is a quick note about reverse path filtering and the impact of that feature on RAC. I encountered an interesting problem recently with a client, and it is worth blogging about, in the strong hope that it might help one of you in the future.
Environment is 184.108.40.206 GI, Linux 5.6. In a 3-node cluster, Grid Infrastructure (GI) comes up cleanly on just one node, but never comes up on the other nodes. If we shut down GI on the first node, we can start GI on the second node with no issues. Meaning, GI can be up on just one node at any time.
System admins indicated that there were no major changes, only a few bug fixes. Seemingly, the problem started after those bug fixes. But there were a few other changes to the environment (init.ora parameter change, etc.), so the problem was not immediately attributable to just the OS changes.
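Since this note concerns reverse path filtering, a quick way to see how it is configured on a Linux node is to read the per-interface `rp_filter` settings under `/proc/sys/net/ipv4/conf`. The following is a minimal Python sketch, not the diagnostic procedure used at the client site. The function name `read_rp_filter` and the mode labels are my own, chosen for illustration; the meaning of the values (0 = no source validation, 1 = strict, 2 = loose) follows the Linux kernel's ip-sysctl documentation.

```python
from pathlib import Path

# Labels are illustrative; the numeric values are the kernel's rp_filter settings.
RP_FILTER_MODES = {0: "off", 1: "strict", 2: "loose"}


def read_rp_filter(conf_dir="/proc/sys/net/ipv4/conf"):
    """Return {interface_name: mode_label} for every entry that has an rp_filter file."""
    settings = {}
    for entry in sorted(Path(conf_dir).iterdir()):
        value_file = entry / "rp_filter"
        if value_file.is_file():
            value = int(value_file.read_text().strip())
            settings[entry.name] = RP_FILTER_MODES.get(value, f"unknown ({value})")
    return settings
```

Calling `read_rp_filter()` on each cluster node and comparing the entries for the private interconnect interfaces shows at a glance whether a patch changed the setting. Keep in mind that the kernel applies the higher of the `all` entry and the interface's own entry when validating a packet's source.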
Let’s first discuss how RAC traffic works before continuing. The environment for this discussion is a 2-node cluster with an 8K database block size, using the UDP protocol for cache fusion. (BTW, the UDP and RDS protocols are supported on UNIX platforms, whereas Windows uses the TCP protocol.)
UDP protocol, fragmentation, and assembly
UDP is a higher-level protocol implemented over the IP protocol (UDP/IP). Cache Fusion uses the UDP protocol to send packets over the wire (Exadata uses the RDS protocol, though).
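To make the fragmentation point concrete, here is a rough Python sketch of the arithmetic: how many IP fragments one UDP datagram needs at a given MTU. It assumes IPv4 with a 20-byte header (no options) and an 8-byte UDP header, and it treats the payload as exactly one 8K block. A real cache fusion packet carries additional header bytes on top of the block, so take the numbers as illustrative only. The function name `ip_fragments` is hypothetical.

```python
import math

IP_HEADER_BYTES = 20   # IPv4 header without options (assumption)
UDP_HEADER_BYTES = 8   # fixed UDP header size


def ip_fragments(udp_payload_bytes, mtu):
    """Number of IP fragments needed to carry one UDP datagram at this MTU."""
    datagram = udp_payload_bytes + UDP_HEADER_BYTES
    # Every fragment except the last must carry a multiple of 8 bytes of data.
    per_fragment = ((mtu - IP_HEADER_BYTES) // 8) * 8
    return math.ceil(datagram / per_fragment)


print(ip_fragments(8192, 1500))  # 6 fragments with a standard 1500-byte MTU
print(ip_fragments(8192, 9000))  # 1 fragment with a 9000-byte (jumbo frame) MTU
```

With a 1500-byte MTU, each fragment carries at most 1480 bytes of the 8200-byte datagram, so the receiving node must reassemble six fragments before UDP can deliver the block.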
We know that database blocks are transferred between the nodes through the interconnect, aka cache fusion traffic. A common misconception is that the packet transfer size for a block transfer is always the database block size (of course, messages are smaller). That’s not entirely true. There is an optimization in the cache fusion code to reduce the packet size (and so reduce the bits transferred over the private network). Don’t confuse this note with jumbo frames and MTU size; this note is independent of the MTU setting.
If you are attending Collaborate 2012, you might be interested in my content-rich sessions below:
Session Number: 326
Session Title: SCAN, VIP, HAIP, and other RAC acronyms
Session Date/Time/Room: Tue, Apr 24, 2012 (10:45 AM – 11:45 AM) : Surf C
Session Number: 327
Session Title: Internals and Performance Boot Camp: Truss, pstack, pmap, and more
Session Date/Time/Room: Wed, Apr 25, 2012 (03:00 PM – 04:00 PM) : Palm A
Hope to see you there!
Update: I am uploading presentation files. The presentations are much more recent than the document.
Last week (March 2012), I was conducting Advanced RAC Training online. During the class, I was recreating ‘gc buffer busy’ waits to explain the concepts and methods to troubleshoot the issue.
Let’s define these events first. The ‘gc buffer busy’ event means that a session is trying to access a buffer, but there is already an open request for a global cache lock on that block, so the session must wait for that GC lock request to complete before proceeding. This wait is instrumented as the ‘gc buffer busy’ event.
From 11g onwards, this wait event is split into ‘gc buffer busy acquire’ and ‘gc buffer busy release’. An attendee asked me to show the difference between these two wait events. Fortunately, we had a problem with LGWR writes and we were able to inspect the waits with much clarity during the class.
Temporary tablespaces are shared objects, and they are associated with a user or with the whole database (using the default temporary tablespace). So, in RAC, temporary tablespaces are shared between the instances. Many temporary tablespaces can be created in a database, but all of them are shared between the instances. Hence, temporary tablespaces must be allocated in shared storage or ASM. In this blog entry, we will explore space allocation in temporary tablespaces in RAC.
In contrast, UNDO tablespaces are owned by an instance, and all transactions from that instance are allocated exclusively in that UNDO tablespace. Remember that other instances can read blocks from a remote undo tablespace, so undo tablespaces must also be allocated in shared storage or ASM.
Space allocation in TEMP tablespace
There was a question about the wait event ‘rdbms ipc message’ on the Oracle-l list. The short answer is that the ‘rdbms ipc message’ event means that a process is waiting for an IPC message to arrive. Usually, this wait event can be ignored, but there are a few rare scenarios in which it can’t be completely ignored.
What does the ‘rdbms ipc message’ wait mean?
It is typical of Oracle Database background processes to wait for more work. For example, LGWR will wait for more work until another (foreground or background) process requests LGWR to do a log flush. On UNIX platforms, the wait mechanism is implemented as a sleep on a specific semaphore associated with that process. This wait time is accounted to the database wait event ‘rdbms ipc message’.
Also note that semaphore-based waits are used in other wait scenarios too, not just ‘rdbms ipc message’ waits.
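The sleep-until-posted pattern described above can be mimicked with a toy Python sketch. This is not Oracle code: the class `BackgroundWorker`, its timeout value, and its counters are all hypothetical, and a thread-level `threading.Semaphore` stands in for the process-level semaphore a real background process sleeps on. The point is only that time spent asleep waiting for a post is idle time, which is why the event can usually be ignored.

```python
import threading
import time


class BackgroundWorker:
    """Toy stand-in for a background process that sleeps until it is posted."""

    def __init__(self, timeout_s=0.5):
        self._sem = threading.Semaphore(0)  # starts with no pending posts
        self._timeout_s = timeout_s         # hypothetical sleep timeout
        self.idle_wait_s = 0.0              # analogue of 'rdbms ipc message' time
        self.posts_handled = 0
        self.timeouts = 0

    def post(self):
        """Called by another thread to signal that there is work to do."""
        self._sem.release()

    def wait_for_work(self):
        """Sleep on the semaphore; return True if posted, False on timeout."""
        start = time.monotonic()
        posted = self._sem.acquire(timeout=self._timeout_s)
        self.idle_wait_s += time.monotonic() - start
        if posted:
            self.posts_handled += 1
        else:
            self.timeouts += 1
        return posted
```

A worker that is posted immediately accumulates almost no idle time, while one that nobody posts accumulates the full timeout on every loop. A large idle-wait total therefore says the process had little to do, not that it was slow.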
Time to Trace
I just uploaded my presentation materials for ‘Truss, pstack etc’ for the HOTSOS 2012 symposium, a performance-intensive conference happening right here in my home town, Dallas, TX.
I can’t believe it has been ten years since the start of this annual conference! This is the tenth annual symposium, and I have presented at it in almost every year except a few early ones. The quality of the presentations and of the audience is very high, and many attendees return year after year; it almost feels like an annual pilgrimage to the “sanctum of performance”. If you are interested in learning the techniques and methods to debug and resolve performance issues the correct way, you should definitely consider attending this symposium. To top it off, Jonathan Lewis is conducting the Training Day this year.
I will be leaving for Denver in a few days to give the following presentations at RMOUG 2012. Stop by and say hello if you intend to attend RMOUG Training Days.
My sessions in RMOUG 2012 are
Hope to see you there.