I was just watching John Rauser’s keynote “What is a Career in Big Data?” from last weeks Strata Conference New York and I have to say it’s an amazing talk. I would highly recommended it to anyone who does any type of data analysis, including any type of performance analysis.
I found many of the “critical skill” points John made to have a strong correlation to performance analysis work as well. Some quotations that really stand out to me:
“[writing]…it’s the first major difference between mediocrity and greatness.” [10:39]
“If it isn’t written down, it never happened…if your writing is so opaque that people can not understand your work, then you may as well never have never done it.” [10:50]
If you run a query that is supposed to return one row from a large table, and there’s a suitable index in place you would probably expect the optimizer to identify and use the index. If you change the query to return all the data (without sorting) from the table you would probably expect the optimizer to choose a full tablescan.
This leads to a very simple idea that is often overlooked:
Sometimes it takes just one extra row to switch a plan from an indexed access to a full tablescan.
There has to be a point in our thought experiment where the optimizer changes from the “one row” indexed access to the “all the rows” tablescan.
If you’re lucky and the optimizer’s model is perfect there won’t be any significant difference in performance, of course. But we aren’t often that lucky, which is why people end up asking the question: “How come the plan suddenly went bad, nothing changed … except for a little bit of extra data?” All is takes is one row (that the optimizer knows about) to change from one plan to another – and sometimes the optimizer works out the wrong moment for making the change.
In my professional opinion, if your performance work is to have any credibility then you need to agree with and abide by this philosophy. I think it’s really what separates the real experts from the amateurs.
“Measurement is the first step that leads to control and eventually to improvement. If you can’t measure something, you can’t understand it. If you can’t understand it, you can’t control it. If you can’t control it, you can’t improve it.” – Dr. H. James Harrington
Discuss.name="googleone_share_1" style="position:relative;z-index:5;float: right; margin-left: 10px;">
Paraphrasing Yogi Berra:
If you’re wondering why it’s worth remembering this odd comment – it addresses the (commonly asked) question:
The answer is: yes.
When a session is creating redo change vectors it doesn’t know whether it is going to commit or rollback. But a session has to be able to store an arbitrarily large list of change vectors somewhere, and that list has to appear in the redo log (ideally “instantly”) if the session commits – so Oracle avoids delays on commit by putting the change vectors into the redo log as they are created***.
If you view the question from the opposite extreme, the recovery mechanism has to be able to deal with uncommitted data anyway because there are, after all, several scenarios where data that definitely was committed cannot be recovered; for example, recovery until end of log file 9998 because log file 9999 was destroyed and simply doesn’t exist – how can the code handle transactions that were not committed until part way through file 9999 if it only knows how to handle committed transactions ?)
*** Not strictly true from 10g onwards where Oracle introduced a delaying effect aimed at reducing competition for the redo allocation and redo copy latches for “small” transactions.
If you see a comment like “X is a bad idea” this does not mean “some mechanism that is vaguely ‘not X’ is a good idea”.
If, for example, I say:
that is absolutely not the same as saying
If this were a purely mathematical world we could invoke symbolic logic and point out:
(A => B) <=> (¬B => ¬A)
which means my statement is equivalent to:
Of course, being Oracle, you may find that someone, somewhere, has exactly such a histogram that appears to work brilliantly for them – but that will be because the optimizer has messed up the arithmetic so much that they are getting a great execution plan for completely the wrong reason … so they need to watch out for the next upgrade or patch release in case the optimizer gets enhanced.
Here’s a useful description I heard recently from philosopher Daniel Dennett:
The canons of good spin:
It seems to describe a lot of the stuff that our industry publishes on the internet.
The English language is full of irregular verbs, for example: I am hypothesising about possible explanations You are guessing He’s talking rubbish [Back to Philosophy 10] Filed under: humour, Philosophy
The most significant question to ask when thinking about adding a new index: “Will the index eliminate significantly more work than it introduces (at the moments when it really matters) ?” A few examples of “moments that matter”: Bulk housekeeping Highly concurrent OLTP activity Frequent high-precision reporting Acceptance testing for side effects [Back to Philosophy [...]