This is the fourth post on a serie of postings on how to get measurements out of the cell server, which is the storage layer of the Oracle Exadata database machine. Up until now, I have looked at the measurement of the kind of IOs Exadata receives, the latencies of the IOs as as done by the cell server, and the mechanism Exadata uses to overcome overloaded CPUs on the cell layer.
Exadata is about doing IO. I think if there’s one thing people know about Exadata, that’s it. Exadata brings (part of the) processing potentially closer to the storage media, which will be rotating disks for most (Exadata) users, and optionally can be flash.
When you are administering an Exadata or more Exadata’s, you probably have multiple databases running on different database or “computing” nodes. In order to understand what kind of IO you are doing, you can look inside the statistics of your database, and look in the data dictionary what that instance or instances (in case of RAC) have been doing. When using Exadata there is a near 100% chance you are using either normal redundancy or high redundancy, of which most people know the impact of the “write amplification” of both normal and high redundancy of ASM (the write statistics in the Oracle data dictionary do not reflect the additional writes needed to satisfy normal (#IO times 2) or high (#IO times 3) redundancy). This means there might be difference in IOs between what you measure or think for your database is doing, and actually is done at the storage level.
Exadata gets its performance by letting the storage (the exadata storage server) participate in query processing, which means part of the processing is done as close as possible to where the data is stored. The participation of the storage server in query processing means that a storage grid can massively parallel (depending on the amount of storage servers participating) process a smart scan request.
In an earlier post I’ve described how a distributed query can operate at a remote site if it’s a simple select but has to operate at the local site if it’s a CTAS (create as select) or insert as select. There’s (at least) one special case where this turns out to be untrue … provided you write the query in the correct fashion. I discovered this only as a result of doing a few experiments in response to a question on the OTN database forum.
Here’s a little demonstration, cut-n-pasted with a little cosmetic editing from an 11gR1 SQL*Plus session:
I have a fairly strong preference for choosing simple solutions over complex solutions, and using Oracle-supplied packaged over writing custom code – provided the difference in cost THere’(whether that’s in human effort, run-time resources, or licence fees) is acceptable. Sometimes, though, the gap between simplicity and cost is so extreme that a hand-crafted solution is clearly the better choice. Here’s an idea prompted by a recent visit to a site that makes use of materialized views and also happens to be licensed for the partitioning option.
While checking out potential scalability threats recently on a client system, I was directed to a time-critical task that was currently executing the same PL/SQL procedure 16 times (with different parameters) between 6:00 and 7:00 am; as the system went through its next phase of expansion the number of executions of this procedure was likely to grow. An interesting detail, though, was that nothing else was going on while the task was running so the machine (which had 6 cores) was running at 16% CPU.
An obvious strategy for handling the required growth target was to make sure that four (possibly 5) copies of the procedure were allowed to run concurrently. Fortunately the different executions were completely independent of each other and didn’t interfere with each other’s data, so the solution simply required a mechanism to control the parallelism. Conveniently 11gR2 gave us one.
A quick note, Expert Oracle RAC book co-written by me is available now: Expert Oracle RAC 12c. I have written about 6 chapters covering the RAC internals that you may want to learn :) I even managed to discuss the network internals in deep, after all, network is one of the most important component of a RAC cluster.
Here is a quick table comparison of different types of metrics views
The first line of the table is the classic wait event and statistic views. The following lines are the metric views. The metric views were introduced in Oracle 10g.
Metric views compute deltas and rates which hugely simplifying the ability to answer simple questions like “what is the I/O rate on my databases right now?” This question, before 10g, was surprisingly tedious to answer. To answer the question one would have to query v$sysstat for example: