Exadata gets its performance by letting the storage (the Exadata storage server) participate in query processing, which means part of the processing is done as close as possible to where the data is stored. Because the storage servers participate in query processing, a storage grid can process a smart scan request massively in parallel (the degree of parallelism depending on the number of storage servers participating).
This post is about database writer (dbwr, nowadays mostly seen as dbw0) IO.
The test environment in which I made the measurements in this post: Linux X64 OL6u3, Oracle 22.214.171.124 (no BP), Clusterware 126.96.36.199, ASM, all database files in ASM. The test environment is a VM (VMware Fusion) with 2 CPUs.
It might be a good idea to read my previous blog post about logwriter IO.
The number of database writers depends on the number of CPUs visible to the instance (when it is not explicitly set with the DB_WRITER_PROCESSES parameter), and seems mostly to be CEIL(CPU_COUNT/8). Other things might influence the number (NUMA comes to mind). In my case, I’ve got 2 CPUs visible, which means I got one database writer (dbw0).
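The rule of thumb above can be written down as a small function. This is a minimal sketch assuming the default really is CEIL(CPU_COUNT/8) whenever DB_WRITER_PROCESSES is not set; NUMA and any other factors that might change the number are not modelled, and the function name is hypothetical.

```python
import math


def default_db_writer_processes(cpu_count, db_writer_processes=None):
    """Estimate the number of database writer processes (dbw0, dbw1, ...).

    Assumption: without an explicit DB_WRITER_PROCESSES setting the
    default follows the observed CEIL(CPU_COUNT/8) rule; NUMA and other
    possible influences are ignored.
    """
    if db_writer_processes is not None:
        # An explicitly set parameter overrides the derived default.
        return db_writer_processes
    if cpu_count < 1:
        raise ValueError("cpu_count must be >= 1")
    # CEIL(CPU_COUNT/8): 1-8 CPUs -> 1 writer, 9-16 -> 2, and so on.
    return math.ceil(cpu_count / 8)
```

With the 2 CPUs of this test VM, `default_db_writer_processes(2)` gives 1, matching the single dbw0 process.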
This post is about log writer (lgwr) IO.
It’s good to point out the environment on which I do my testing:
Linux X64 OL6u3, Oracle 188.8.131.52 (no BP), Clusterware 184.108.40.206, ASM, all database files in ASM.
In order to look at what the log writer is doing, a 10046 trace of the lgwr at level 8 gives an overview.
One way of doing so is using oradebug. Be very careful about using oradebug on production environments: it can cause the instance to crash.
This is how I did it:
SYS@v11203 AS SYSDBA> oradebug setospid 2491
Oracle pid: 11, Unix process pid: 2491, image: email@example.com (LGWR)
SYS@v11203 AS SYSDBA> oradebug unlimit
Statement processed.
SYS@v11203 AS SYSDBA> oradebug event 10046 trace name context forever, level 8
Statement processed.
Of course, 2491 is the Linux process id of the log writer, as is visible from the “image” field (LGWR).
This is a small note describing how Oracle implemented the situation covered by the db file parallel read wait event. This event happens if Oracle knows it must read multiple blocks which are not adjacent (and thus can be in different files and at random locations), and cannot continue processing with the result of a single block. In other words: it cannot process anything after reading just one block (if it could, Oracle would read a single block, visible as the wait ‘db file sequential read’).
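The distinction above can be sketched as a classifier over the block addresses a process needs before it can continue. This is a simplified model under stated assumptions, not Oracle's code: blocks are hypothetical `(file_no, block_no)` tuples, and a contiguous run of blocks in one file is a multiblock read, which this note does not cover.

```python
def wait_event_for(blocks):
    """Guess which wait event reading `blocks` would show (simplified model).

    blocks: list of (file_no, block_no) tuples the process needs before
    it can continue processing.
    - one block                   -> 'db file sequential read'
    - several non-adjacent blocks -> 'db file parallel read'
    - a contiguous run in a file  -> a multiblock read (not covered here)
    """
    if not blocks:
        raise ValueError("no blocks requested")
    if len(blocks) == 1:
        return "db file sequential read"
    ordered = sorted(blocks)
    # Adjacent means: same file and the next block number.
    contiguous = all(
        f2 == f1 and b2 == b1 + 1
        for (f1, b1), (f2, b2) in zip(ordered, ordered[1:])
    )
    if contiguous:
        return "multiblock read (not covered here)"
    return "db file parallel read"
```

Non-adjacent blocks in the same file still count as a parallel read in this model, since the text only requires that the blocks are not adjacent.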
This is how it shows up if you enable sql trace:
This is part 2 of a number of blogposts about huge Oracle database IO’s.
If you landed on this blogpost and did not read part 1, please read part 1 first.
In part 1 I showed how database IOs of a full table scan could be bigger than 1MB by increasing the db_file_multiblock_read_count parameter to a number beyond 1MB expressed in Oracle blocks. These bigger IOs only happen with direct path reads, not with buffered multiblock reads.
But how much bigger can these IOs be? In part 1 I showed Oracle IOs of 1020 blocks. Is that the limit? Table T2 in part 1 had a maximum extent size of 1024 blocks, which meant that 1020 blocks is the biggest IO possible from that table.
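Why the extent size limits the IO size can be sketched as follows. This is a simplified model, assuming (as the text implies) that a direct path IO is capped by db_file_multiblock_read_count and never crosses an extent boundary, so the biggest possible IO is the biggest extent. The function name is hypothetical, `extent_blocks` holds the number of blocks read from each extent, and the model does not explain why the observed maximum was 1020 rather than 1024 blocks.

```python
def direct_read_io_sizes(extent_blocks, mbrc):
    """Split a full scan into IO requests, sizes in Oracle blocks.

    Assumptions: each IO is at most `mbrc` (db_file_multiblock_read_count)
    blocks and stays within one extent.
    """
    if mbrc < 1:
        raise ValueError("mbrc must be >= 1")
    sizes = []
    for remaining in extent_blocks:
        while remaining > 0:
            io = min(mbrc, remaining)  # capped by mbrc and by the extent end
            sizes.append(io)
            remaining -= io
    return sizes
```

With a very large mbrc, the IO size in this model equals the extent size, which is why larger extents are needed to see larger IOs.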
For the sake of this investigation I created a much bigger table to get larger extents:
It’s been a while since I presented the first incarnation of my ‘about multiblock reads’ presentation. When I presented it at the UKOUG TEBS conference in Birmingham in 2011, Christian Antognini chaired my session. Afterwards, Christian showed me it’s possible to set the parameter ‘db_file_multiblock_read_count’ higher than 1MB/db_block_size (which is 128 if your block size is 8kB), and that you can benefit from it if your hardware is sufficient. In fact, Christian showed me AWR reports (they could also have been statspack reports, I’m not sure) which showed the benefit.
My understanding of the parameter db_file_multiblock_read_count at the time was:
The maximum value is the operating system’s maximum I/O size expressed as Oracle blocks ((max I/O size)/DB_BLOCK_SIZE). If you set this parameter to a value greater than the maximum, Oracle uses the maximum.
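The documented rule quoted above boils down to simple arithmetic. This sketch models that documented understanding (which is exactly what the rest of the post questions), assuming a 1MB maximum I/O size as the 1MB/db_block_size comparison in the text implies; the function names are hypothetical.

```python
ONE_MB = 1024 * 1024


def blocks_per_mb(db_block_size):
    """1MB expressed in Oracle blocks, e.g. 128 for an 8kB block size."""
    return ONE_MB // db_block_size


def documented_effective_mbrc(requested, max_io_bytes, db_block_size):
    """Effective db_file_multiblock_read_count per the documentation quote:
    a value above (max I/O size)/DB_BLOCK_SIZE is reduced to that maximum."""
    return min(requested, max_io_bytes // db_block_size)
```

Under this documented rule, setting the parameter to 1024 with 8kB blocks and a 1MB maximum I/O size would be capped at 128, which is the behaviour Christian's AWR reports called into question.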
This is yet another blogpost on Oracle’s direct path read feature, which was introduced for non-parallel query processes in Oracle version 11.
For full table scans, a direct path read is done (according to my tests and current knowledge) when:
- The segment is bigger than 5 * _small_table_threshold.
- Less than 50% of the blocks of the table are already in the buffercache.
- Less than 25% of the blocks in the buffercache are dirty.
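The three conditions above can be combined into a single check. This is a sketch of the author's test findings, not a documented Oracle algorithm; the parameter names are hypothetical, and it interprets the third condition as the fraction of this segment's cached blocks that are dirty (an assumption, since the text does not say which blocks the 25% is measured against).

```python
def use_direct_path_read(segment_blocks, small_table_threshold,
                         cached_blocks, dirty_cached_blocks):
    """Return True if a full table scan would use a direct path read.

    segment_blocks        -- blocks in the segment
    small_table_threshold -- value of _small_table_threshold, in blocks
    cached_blocks         -- segment blocks already in the buffer cache
    dirty_cached_blocks   -- of those cached blocks, how many are dirty
    """
    # Integer arithmetic avoids floating point edge cases.
    big_enough = segment_blocks > 5 * small_table_threshold
    mostly_uncached = cached_blocks * 2 < segment_blocks       # < 50% cached
    # Assumption: dirty fraction is measured over the segment's cached blocks;
    # with nothing cached there is nothing dirty.
    few_dirty = cached_blocks == 0 or dirty_cached_blocks * 4 < cached_blocks  # < 25% dirty
    return big_enough and mostly_uncached and few_dirty
```

Note that the size test is strict: a segment of exactly 5 * _small_table_threshold blocks does not qualify in this sketch.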
When an Oracle process starts executing a query and needs to do a full segment scan, it must decide whether to use ‘blockmode’ or ‘offloaded mode’. Blockmode is the normal way of working on non-Exadata Oracle databases: blocks are read from disk and processed by the Oracle foreground process, either “cached” (read from disk and put in the database buffercache) or “direct” (read from disk and put in the process’ PGA). In offloaded mode, part of the execution is done by the cell server.
The code layer where the Oracle database process initiates the offloading is ‘kcfis’; an educated guess is that this stands for Kernel Cache File Intelligent Storage. Does a “normal”, i.e. non-Exadata, database ever use the ‘kcfis’ layer? My first guess would be ‘no’, but we all know guessing takes you nowhere (right?). Let’s see if a “normal” database uses the ‘kcfis’ functions on a Linux x64 (OL 6.3) system with Oracle 220.127.116.11 64 bit using ASM.
With Exadata version 18.104.22.168.0 came the Unbreakable Linux Kernel for Exadata; prior to this version, Exadata ran the stock EL5 Red Hat kernel (2.6.18). With the Unbreakable kernel came the opportunity to run the perf utility. This utility can show which functions are active inside an executable when there’s a symbol table, and the oracle database executable has a symbol table! One reason to do this is to get a more granular overview of what the Oracle database is doing than the wait interface provides, especially of what the database is doing during the time the wait interface shows as ‘on cpu’.