Followers of the blog will know I dig virtualization. I first ran Oracle in virtualized environments over a decade ago.
In my current company there is a strong virtualization presence in the Windows space. Pretty much all Windows servers, including those running MS SQL Server, are actually VMs running on a VMware farm. The UNIX/Linux side is a little different. Most stuff is still done on physical boxes, and what little virtualization there is uses CentOS and KVM as a free, open-source solution.
There are a lot of architectural changes going on at the moment and I’ve been pushing *very hard* for a switch to the virtual infrastructure (VI) for all our middle tier servers and a few of our databases. It is looking very likely (but not guaranteed) that this will happen.
I’ve recently realized that I didn’t post anywhere the second version of my presentation — Under The Hood of Oracle Clusterware 2.0: Grid Infrastructure, codenamed UTHOC2. I think it would be very useful, as I still see lots of questions being asked and UTHOC1 covers Oracle RAC 10g and 11gR1 only. 11g Release 2 brought many changes to the clusterware, and the slides needed a good refresh.
Update 7-May-2013: Almost 100 people filled in the survey and here are the results:
It’s very easy to get a lot of information from an AWR (or Statspack) report – provided you remember what all the numbers represent. From time to time I find that someone asks me a question about some statistic and my mind goes completely blank about the exact interpretation; but fortunately it’s always possible to cross check because so many of the statistics are cross-linked. Here’s an example of a brief mental block I ran into a few days ago – I thought I knew the answer, but realised that I wasn’t 100% sure that my memory was correct:
In this Load Profile (for an AWR report of 60.25 minutes), what does that Transactions figure actually represent?
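If memory serves (and this is exactly the kind of thing the cross-linked statistics let you verify against the Instance Activity Stats section of the same report), the figure is transactions per second, derived as (user commits + user rollbacks) divided by the elapsed seconds of the snapshot interval. As a rough live sanity check — a sketch only, since it measures the rate since instance startup rather than an AWR interval delta — you could run:

```sql
-- Approximate per-second transaction rate since instance startup,
-- using the same underlying statistics AWR reports as deltas.
SELECT ROUND(SUM(s.value) /
             ((SYSDATE - i.startup_time) * 86400), 2) AS txn_per_sec
FROM   v$sysstat s, v$instance i
WHERE  s.name IN ('user commits', 'user rollbacks')
GROUP  BY i.startup_time;
```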
The promised hacking session about Exadata performance troubleshooting will happen on Thursday 21st February 9am-11am PST (2 hours). I will show a number of examples and demos where the Exadata Snapper shows what was going on in storage cells when running Smart Scans or just doing IO.
Register here (free stuff!):
I will post the scripts on the day of this session.
I was recently involved with an upgrade project to go from 126.96.36.199 to 188.8.131.52 on an Exadata V2. We hit some snags during the upgrade, specifically related to OEM 12c Cloud Control. We performed an out-of-place upgrade and OEM 184.108.40.206.0 had some difficulty in dealing with this.
12c Cloud Control is supposed to run a daily check which looks for new targets on each server. When it finds something new, it places this in a queue to wait for admin approval. With a single click you can promote the newly discovered target into an OEM managed object.
Here’s an example of a quick’n’dirty way of profiling stack traces on your command line. This is an example from Solaris (but the script should work on Linux and other Unixes too, with minor modifications).
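The general idea can be sketched as follows: sample a process's stacks repeatedly with pstack, then aggregate the frames with awk/sort/uniq to get a count-ordered profile. This is a sketch under assumptions — the pid 12345 is hypothetical, and the awk field position assumes Solaris-style pstack output (address, then function+offset):

```shell
# Collapse pstack-style lines into a count-ordered frame profile.
# Strips the +offset so samples in the same function aggregate together.
profile_stacks() {
    awk 'NF > 1 { sub(/\+.*/, "", $2); print $2 }' | sort | uniq -c | sort -rn
}

# Against a live process (hypothetical pid), uncomment on a real system:
# for i in 1 2 3 4 5; do pstack 12345; done | profile_stacks

# Demo with canned pstack-style lines:
printf ' 0000000100abc kcbgtcr+0x1\n 0000000100abc kcbgtcr+0x2\n 0000000100def qertbFetch+0x3\n' | profile_stacks
# kcbgtcr appears with count 2, qertbFetch with count 1
```

The more samples you take, the closer the counts approximate where the process actually spends its time.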
I created a problem case below, based on a case I once troubleshot at a client site. Note that they had set optimizer_mode = FIRST_ROWS in their database, and the optimizer came up with a very inefficient execution plan for the select from the DBA_LOCK_INTERNAL view below:
SQL> ALTER SESSION SET optimizer_mode = FIRST_ROWS;

Session altered.

SQL> SET TIMING ON
SQL> SELECT * FROM dba_lock_internal;
...
... the output data stripped ...
...

927 rows selected.

Elapsed: 00:23:27.14
It took over 23 minutes to return 927 rows from DBA_LOCK_INTERNAL!
I then ran Snapper to see where the time was spent:
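As a side experiment, one quick way to confirm that the FIRST_ROWS setting itself is to blame — a sketch, not part of the original troubleshooting session — is to override the session-level optimizer mode for just this one statement with a hint and compare the elapsed time:

```sql
-- ALL_ROWS overrides the session-level FIRST_ROWS mode for this
-- statement only; a dramatic speed-up points the finger at the mode.
SELECT /*+ ALL_ROWS */ * FROM dba_lock_internal;
```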
I’m working out my 2013 high-level plans and something dawned on me. I’m not sure I know what the term knowledge-sharing means any more.
That’s SQL Tuning Sets, optimizer_features_enable, and SQL Plan Management.
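For context on how these pieces fit together: the usual plan-stability recipe is to capture pre-upgrade plans into a SQL Tuning Set, then load them into SQL Plan Management baselines after the upgrade. A minimal sketch — the tuning set name my_10g_sts and its owner are hypothetical:

```sql
-- Load plans captured in a SQL Tuning Set into SPM plan baselines.
-- "my_10g_sts" is a hypothetical STS created before the upgrade.
DECLARE
  n PLS_INTEGER;
BEGIN
  n := DBMS_SPM.LOAD_PLANS_FROM_SQLSET(
         sqlset_name  => 'my_10g_sts',
         sqlset_owner => 'SYS');
  DBMS_OUTPUT.PUT_LINE('Plans loaded: ' || n);
END;
/
```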
There’s a recent post on OTN describing an issue when using SQL Tuning Sets to enforce plan stability when upgrading from 10.2.0.3 to 220.127.116.11 – it doesn’t always work. Here’s a very simple model to demonstrate the type of thing that can happen (the tables are cloned from a completely different demo, so don’t ask why I picked the data they hold):
I was at a client site recently where one of the end-users seemed to have discovered a cunning strategy for optimising a critical SQL statement. His problem was that his query screen times out after 2 minutes, so any query he runs has to complete in less than two minutes or he doesn’t see the results. Unfortunately he had a particular query which took nearly 32 minutes from cold to complete – partly because it’s a seven-table join using ANSI OUTER joins, against tables ranging through the 10s of millions of rows and gigabytes of data – the (necessary) tablescan of the table that had to be first in the join order took 70 seconds alone.