(the details are investigated and specific to Oracle’s database implementation on Linux x86_64)
Exadata IO: This event is not used with Exadata storage, ‘cell single block physical read’ is used instead.
Despite p3 listing the number of blocks, I have never seen a db file sequential read event that read more than one block. Of course this could change in a newer release.
At the Accenture Enkitec Group we have a couple of Exadata racks for Proofs of Concept (PoC), performance validation, research and experimenting. This means the databases on the racks appear and vanish more often than they would (or should) on an average customer Exadata rack (to be honest, most people use a few fixed, existing databases rather than creating and removing a database for every test).
Nevertheless we got into a situation where the /etc/oratab file was not in sync with the databases registered in the cluster registry. This situation can happen for a number of reasons. For example, if you clone a database (RMAN duplicate), you end up with a cloned database (I sincerely hope), but this database needs to be manually registered in the cluster registry. The same applies when creating a standby database (for which one of the most used methods is the clone procedure with a couple of changes).
For some time now, I have been using gdb to trace the inner workings of the Oracle database. The reason for using gdb instead of systemtap or Oracle’s dtrace is that these lack user-level tracing on Linux. I use gdb on Linux because most of my work happens on Linux.
In order to see the same information on Oracle's system calls with gdb as with strace, there’s the Oracle debug info repository. This requires a bit of explanation. When strace is used on a process doing IO that Oracle executes asynchronously, the IO calls as seen with strace look something like this:
When you are administering one or more Exadata systems, you probably have multiple databases running on different database or “computing” nodes. In order to understand what kind of IO you are doing, you can look at the statistics of your database, and look in the data dictionary to see what the instance or instances (in case of RAC) have been doing. When using Exadata there is a near 100% chance you are using either normal redundancy or high redundancy. Most people know the impact of the “write amplification” of both normal and high redundancy of ASM: the write statistics in the Oracle data dictionary do not reflect the additional writes needed to satisfy normal (#IO times 2) or high (#IO times 3) redundancy. This means there might be a difference in IOs between what you measure or think your database is doing, and what is actually done at the storage level.
Exadata gets its performance by letting the storage (the Exadata storage server) participate in query processing, which means part of the processing is done as close as possible to where the data is stored. The participation of the storage server in query processing means that a storage grid can process a smart scan request massively in parallel (depending on the number of storage servers participating).
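The write amplification described above can be put into numbers with a small sketch. The helper name and the example figure of 1000 writes are made up for illustration; the multipliers (2 for normal, 3 for high redundancy) are the ones stated in the text.

```python
# Multipliers as stated in the text: with ASM normal redundancy every
# write is done twice, with high redundancy three times.
REDUNDANCY_FACTOR = {"normal": 2, "high": 3}


def storage_level_writes(database_writes: int, redundancy: str) -> int:
    """Estimate the writes done at the storage level for the writes
    reported by the database statistics.

    Hypothetical helper for illustration; not an Oracle API.
    """
    return database_writes * REDUNDANCY_FACTOR[redundancy]


if __name__ == "__main__":
    # 1000 is an arbitrary example value, not a measurement.
    for level in ("normal", "high"):
        print(level, storage_level_writes(1000, level))
```

This only covers the redundancy multiplier; it says nothing about other IO the storage layer might do.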
This post is about database writer (dbwr, mostly seen as dbw0 nowadays) IO.
The test environment in which I made the measurements in this post: Linux X64 OL6u3, Oracle 184.108.40.206 (no BP), Clusterware 220.127.116.11, ASM, all database files in ASM. The test environment is a (VMware Fusion) VM, with 2 CPUs.
It might be a good idea to read my previous blog about logwriter IO.
The number of database writers depends on the number of CPUs visible to the instance (when not explicitly set with the DB_WRITER_PROCESSES parameter), and seems mostly to be CEIL(CPU_COUNT/8). There might be other things which could influence the number (NUMA comes to mind). In my case, I’ve got 2 CPUs visible, which means I got one database writer (dbw0).
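As a quick illustration of the CEIL(CPU_COUNT/8) rule of thumb mentioned above, here is a minimal sketch. The function name is hypothetical, and the formula is only the observed "seems mostly" behaviour, not a documented guarantee; things like NUMA or an explicit DB_WRITER_PROCESSES setting are not modelled.

```python
import math


def estimated_db_writers(cpu_count: int) -> int:
    """Estimate the default number of database writers as CEIL(CPU_COUNT/8).

    Hypothetical helper; it mirrors the observation in the text and
    ignores other influences (NUMA, explicit DB_WRITER_PROCESSES).
    """
    return math.ceil(cpu_count / 8)


if __name__ == "__main__":
    # 2 CPUs is the test VM from the text; the other values are examples.
    for cpus in (2, 8, 9, 16, 24):
        print(cpus, "CPUs ->", estimated_db_writers(cpus), "database writer(s)")
```

For the 2 CPU test VM this gives one database writer, which matches the single dbw0 process described above.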
This post is about log writer (lgwr) IO.
It’s good to point out the environment on which I do my testing:
Linux X64 OL6u3, Oracle 18.104.22.168 (no BP), Clusterware 22.214.171.124, ASM, all database files in ASM.
In order to look at what the logwriter is doing, a 10046 trace of the lgwr at level 8 gives an overview.
One way of doing so is using oradebug. Be very careful about using oradebug on production environments: it can cause the instance to crash.
This is how I did it:
SYS@v11203 AS SYSDBA> oradebug setospid 2491
Oracle pid: 11, Unix process pid: 2491, image: firstname.lastname@example.org (LGWR)
SYS@v11203 AS SYSDBA> oradebug unlimit
Statement processed.
SYS@v11203 AS SYSDBA> oradebug event 10046 trace name context forever, level 8
Statement processed.
Of course 2491 is the Linux process id of the log writer, as is visible with “image”.
This is a small note describing how Oracle implemented the situation which is covered by the db file parallel read wait event. This event happens if Oracle knows it must read multiple blocks which are not adjacent (thus from different random files and locations), and cannot continue processing with the result of a single block. In other words: it cannot process anything after reading just a single block (otherwise Oracle would read a single block, which is visible as the wait ‘db file sequential read’).
This is how it shows up if you enable sql trace:
This is part 2 of a number of blogposts about huge Oracle database IO’s.
If you landed on this blogpost and did not read part 1, please read part 1 here.
In part 1 I showed how database IOs of a full table scan could be bigger than 1MB by increasing the db_file_multiblock_read_count parameter to a number beyond 1MB expressed in Oracle blocks. These bigger IOs only happen with direct path reads, not with buffered multiblock reads.
But how much bigger can these IOs be? In part 1 I showed Oracle IOs of 1020 blocks. Is that the limit? To investigate this, I created a much bigger table (table T2 in part 1 had a maximum extent size of 1024 blocks, which meant that 1020 blocks is the biggest IO possible from this table).
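To get a feeling for what these block counts mean in bytes, here is a small sketch. The 8 KB block size is an assumption made for the example (the block size is not stated here), and both function names are hypothetical.

```python
# ASSUMPTION: 8 KB database block size; adjust to your db_block_size.
ASSUMED_BLOCK_SIZE = 8192
ONE_MB = 1024 * 1024


def io_size_bytes(blocks: int, block_size: int = ASSUMED_BLOCK_SIZE) -> int:
    """Size in bytes of a single IO of the given number of Oracle blocks."""
    return blocks * block_size


def blocks_per_mb(block_size: int = ASSUMED_BLOCK_SIZE) -> int:
    """Number of blocks that make up exactly 1 MB."""
    return ONE_MB // block_size


if __name__ == "__main__":
    print("blocks in 1 MB:", blocks_per_mb())
    size = io_size_bytes(1020)  # the 1020 block IO from part 1
    print("1020 blocks:", size, "bytes =", size / ONE_MB, "MB")
```

Under that assumption, db_file_multiblock_read_count has to be set above 128 to ask for IOs beyond 1 MB, and a 1020 block IO is just under 8 MB.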
For the sake of this investigation I created a much bigger table to get larger extents: