How do you trouble-shoot a problem ? It’s not an easy question to answer when posed in this generic fashion; but perhaps it’s possible to help people trouble-shoot by doing some examples in front of them. (This is why I’ve got so many statspack/AWR examples – just reading a collection of different problems helps you to get into the right mental habit.)
So here’s a problem someone sent me yesterday. Since it only took a few seconds to read, and included a complete build for a test case, with results, and since it clearly displayed an Oracle bug, I took a look at it. (I’ve trimmed the test a little bit, there were a few more queries leading up to the error):
create table person (id number(2), name varchar2(10)) ; insert into person values (1, 'Alpha') ; insert into person values (2, 'Bravo') ; insert into person values (3, 'Charlie') ; insert into person values (4, 'Charles') ; insert into person values (5, 'Delta') ; create or replace view vtest as select id, 'C' as letter from person where name like 'C%' ; select p.id, p.name, v.id, v.letter from person p left join vtest v on v.id = p.id order by p.id ;
The problem was that 10.2.0.4 and 126.96.36.199 gave different results – and the 188.8.131.52 result was clearly wrong. So the question was: “is there something broken with outer joins on views, or possibly ANSI outer joins?” (The ansswer to the last question is always “probably” as far as I’m concerned, but I wouldn’t turn that into a “yes” without checking first.) Here are the two results:
10.2.0.4: ======== ID NAME ID L ---------- ---------- ---------- - 1 Alpha 2 Bravo 3 Charlie 3 C 4 Charles 4 C 5 Delta 184.108.40.206 ======== ID NAME ID L ---------- ---------- ---------- - 1 Alpha C 2 Bravo C 3 Charlie 3 C 4 Charles 4 C 5 Delta C
Clearly the extra ‘C’s in the letter column are wrong.
So what to do next ? Knowing that Oracle transforms ANSI SQL before evaluating an execution plan I decided to run the 10053 trace. Sometimes you get lucky and see the “unparsed SQL” in this trace file, a representation (though not necessarily 100% exact) image of the statement for which Oracle will generate a plan. I was lucky, this was the unparsed SQL (cosmetically enhanced):
SELECT P.ID ID, P.NAME NAME, PERSON.ID ID, CASE WHEN PERSON.ROWID IS NOT NULL THEN 'C' ELSE NULL END LETTER FROM TEST_USER.PERSON P, TEST_USER.PERSON PERSON WHERE PERSON.ID (+) = P.ID AND PERSON.NAME(+) LIKE 'C%' ORDER BY P.ID ;
So I ran this query, and found that the same error appeared – so it wasn’t about ANSI or views. So possibly it’s something about the CASE statement and/or the ROWID in the CASE statement, which I tested by adding three extra columns to the query:
person.name, person.rowid, CASE WHEN PERSON.name IS NOT NULL THEN 'C' ELSE NULL END LETTER
With these extra columns I got the following results from the query:
ID NAME ID NAME ROWID L L ---------- ---------- ---------- ---------- ------------------ - - 1 Alpha C 2 Bravo C 3 Charlie 3 Charlie AAAT7gAAEAAAAIjAAC C C 4 Charles 4 Charles AAAT7gAAEAAAAIjAAD C C 5 Delta C
So the CASE did the right thing with the person.name column, but the wrong thing with the person.rowid column.
Time to get onto MOS (Metalink).
I searched the bug database with the key words: case rowid null
This gave me 2,887 hits, so I added the expression (with the double quotes in place) “outer join”
This gave me 110 hits, so from the “product category” I pick “Oracle Database Products”
This gave me 80 hits, and the first one on the list was:
Bug 10269193: WRONG RESULTS WITH OUTER JOIN AND CASE EXPRESSION OPTIMIZATION CONTAINING ROWID
The text matched my problem, so job done – except it’s reported as not fixed until 12.1
This isn’t a nice bug, of course, because the particular problem can be generated automatically in the transformation of ANSI outer joins to Oracle outer joins, so you can’t just change the code.
In passing, it’s taken me 31 minutes to write this note – that’s 10 minutes longer than it took to pin down the bug, but I have to say I got lucky on two counts: first, that the “unparsed SQL” was available, second that my choice of key words for MOS got me to the bug so quickly (which is where I usually find I waste most time).
As I’m done with the book and back from a quick vacation (to Prague, which is an awesome place – well, at least during the summer) I promised (in Twitter) that now I’d start regularly writing blog articles again. In a separate tweet I asked what to write about. Among other requests (which I’ll write about later), one of the requests was to write something about enabling and reading SQL trace files…
I am probably not going to write (much) about SQL trace for a single reason – Cary Millsap has already written a paper so good about this topic, that there’s no point for me to try to repeat it (and my paper wouldn’t probably be as clear as Cary’s).
So, if you want to get the most out of SQL Trace, read Cary’s Mastering Performance with Extended SQL Trace paper:
The above link directs you to Method-R’s article index, as there’s a lot of other useful stuff to read there.
Wow, now I’m done with my first request – to write something about SQL Trace :-)
Earlier I did a presentation at VirtaThon which is the same topic that I presented at Hotsos 2011.. Mining the AWR and Capacity Planning are very dear to my heart and up until now I’m using every research I did on that presentation to work on an “Exadata Provisioning Tool” which I’m planning to present at the next Hotsos 2012… well, the only thing that’s different this time is.. my attendees are virtual geeks all over the world ;)
The crew at My Oracle Support (MOS) [@myoraclesupport] have an excellent starting point for troubleshooting Oracle Exadata. I’d recommend to add this one to your MOS bookmarks.
The excellent Madrid from the otn forums has a nice post here on a particular listener error caused by not following the install instructions precisely. I thought that I’d dig in a little further to illustrate the interplay between the O/S and our wonderful C program that is oracle.exe The message Hector got was [...]
One of my blog readers recently dropped me an email noting that he had noticed some cache buffers chains latch contention recently and successfully troubleshooted it with LatchProf. I asked if he’d like to blog about it and here’s the article:
Cache buffer chains latch contention typically shows up when some execution plans go bad, switching to nested loops or filter loops and revisiting the same table (or index) blocks very frequently…
I’ve said in the past that one of the best new features, in my view, in 11g was the appearance of proper virtual columns; and I’ve also been very keen on the new “approximate NDV” that makes it viable to collect stats with the “auto_sample_size”.
Who’d have guessed that if you put them both together, then ran a parallel stats collection it would break
The bug number Karen quotes (10013177.8) doesn’t (appear to) mention extended stats – but since virtual columns, function-based indexes, and extended stats share a number of implementation details I’d guess that they might be affected as well.
If you’ve troubleshooted (or tuned) RAC then you probably already know Riyaj Shamsudeen and his Orainternals blog & website (links below).
Anyway, since I started delivering my Advanced Oracle Troubleshooting classes some years ago, many people asked whether I would do a similar class for RAC. I had deliberately left out the RAC-specific stuff from my troubleshooting material, because it’s a very wide and complex topic and I feel like before trying to master RAC troubleshooting, you should master troubleshooting of regular single instance databases anyway. I realized that I didn’t have the time to build (and maintain) yet another set of trainig material, especially on so complex topic as RAC performance & troubleshooting.
So, having seen Riyaj’s impressive work and his presentations at various conferences, I asked whether he would be interested in building a RAC troubleshooting class, going from fundamentals to advanced topics – and he said yes. By now we are that far that I’m happy to announce the first Advanced RAC online seminars by Riyaj Shamsudeen (split across two weeks of online sessions, 4-hours per day, in end of august and september).
We initially called the seminar “Advanced RAC Troubleshooting” but then realized, that there are some closely related non-troubleshooting topics to be covered, like fundamental concepts, internals and also how to configure RAC for performance (so that you wouldn’t have to troubleshoot performance later :-)
We’ll use the same infrastructure and seminar philosophy as I do in my own online seminars, it’s just that this is Riyaj’s material and he will deliver it too.
You can read more about the seminar content, dates and sign up at the seminars page:
Riyaj’s website (articles, slides etc):
Let the RAC hacking begin! ;-)
In May I received the IOUG Select Journal Editor’s Choice Award for my Systematic Oracle Latch Contention Troubleshooting article where I introduced my LatchProfX tool for advanced drilldown into complex latch contention problems (thanks IOUG and John Kanagaraj!).
As the relevant IOUG webpage hasn’t been updated yet, I thought to delay this announcement until the update was done – but I just found an official enough announcement (press release) by accident from Reuters site:
The article itself is here:
Thanks to IOUG crew, John Kanagaraj and everyone else who has read, used my stuff and given feedback! :-)