This is part 2 of a series of blogposts about huge Oracle database IOs.
If you landed on this blogpost and did not read part 1, please read part 1 here.
In part 1 I showed how the database IOs of a full table scan can be bigger than 1MB by increasing the db_file_multiblock_read_count parameter beyond the number of Oracle blocks that fit in 1MB. These bigger IOs only happen with direct path reads, not with buffered multiblock reads.
But how much bigger can these IOs get? In part 1 I showed Oracle IOs of 1020 blocks. Is that the limit? Table T2 in part 1 had a maximum extent size of 1024 blocks, which meant that 1020 blocks was the biggest IO possible from that table.
For the sake of this investigation I therefore created a much bigger table, in order to get larger extents:
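A minimal sketch of how such a table could be created (the table name t3, the use of dba_objects as a row source, and the number of doublings are my assumptions, not the original DDL):

```
sqlplus / as sysdba <<'EOF'
-- create a table and keep doubling it; with autoallocate extent management
-- the segment eventually gets 64MB extents (8192 blocks at an 8kB blocksize)
create table t3 as select * from dba_objects;

begin
  for i in 1 .. 7 loop
    insert /*+ append */ into t3 select * from t3;
    commit;
  end loop;
end;
/

select count(*) extents, max(blocks) largest_extent_blocks
from   dba_extents
where  segment_name = 'T3';
EOF
```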
It’s been a while since I presented the first incarnation of my ‘about multiblock reads’ presentation. When I did this at the UKOUG TEBS conference in Birmingham in 2011, Christian Antognini chaired my presentation. Afterwards Christian showed me that it’s possible to set the parameter ‘db_file_multiblock_read_count’ higher than 1MB/db_block_size (which is 128 if your blocksize is 8kB), and that you can benefit from it if your hardware is up to it. In fact, Christian showed me AWR reports (they could also have been statspack reports, I am not sure) which showed the benefit.
My understanding of the parameter db_file_multiblock_read_count at the time was:
The maximum value is the operating system’s maximum I/O size expressed as Oracle blocks ((max I/O size)/DB_BLOCK_SIZE). If you set this parameter to a value greater than the maximum, Oracle uses the maximum.
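In other words, with an 8kB blocksize the classic ceiling would be 1MB/8kB = 128 blocks. A quick sketch of what happens when you set the parameter well beyond that (the value 2000 is just an arbitrary large number I picked for illustration):

```
sqlplus / as sysdba <<'EOF'
-- set the parameter far beyond 128 and check what value the session reports
alter session set db_file_multiblock_read_count = 2000;
select value from v$parameter where name = 'db_file_multiblock_read_count';
EOF
```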
This is yet another blogpost on Oracle’s direct path read feature, which was introduced for non-parallel query processes in Oracle version 11.
For full table scans, a direct path read is done (according to my tests and current knowledge) when:
- The segment is bigger than 5 * _small_table_threshold.
- Less than 50% of the blocks of the table are already in the buffercache.
- Less than 25% of the blocks in the buffercache are dirty.
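The value of _small_table_threshold (in blocks) can be looked up with a query like this (a sketch: it uses undocumented x$ tables, so it requires sysdba, and the layout may differ between versions):

```
sqlplus / as sysdba <<'EOF'
-- show the current value of the undocumented parameter _small_table_threshold
select a.ksppinm parameter, b.ksppstvl value
from   x$ksppi a, x$ksppsv b
where  a.indx = b.indx
and    a.ksppinm = '_small_table_threshold';
EOF
```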
When an Oracle process starts executing a query and needs to do a full segment scan, it needs to decide whether to use ‘blockmode’ or ‘offloaded mode’. Blockmode is the normal way of working on non-Exadata Oracle databases: blocks are read from disk and processed by the Oracle foreground process, either “cached” (read from disk and put in the database buffercache) or “direct” (read from disk and put in the process’ PGA). In offloaded mode, part of the execution is done by the cell server.
The code layer in which the Oracle database process initiates the offloading is ‘kcfis’; an educated guess is that this stands for Kernel Cache File Intelligent Storage. Does a “normal”, non-Exadata database ever use the ‘kcfis’ layer? My first guess would be ‘no’, but we all know guessing takes you nowhere (right?). Let’s see if a “normal” database uses the ‘kcfis’ functions on a Linux x64 (OL 6.3) system with Oracle 22.214.171.124 64 bit using ASM.
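One way to test this is to attach a debugger to a foreground process doing a full scan and break on the kcfis functions (a sketch; the pid 1234 is a placeholder, and this assumes gdb is allowed to attach to the process):

```
# break on every function whose name starts with kcfis; if a breakpoint
# fires during the scan, the kcfis layer is being used
gdb -p 1234
(gdb) rbreak ^kcfis
(gdb) continue
```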
With Exadata version 126.96.36.199.0 came the Unbreakable Linux Kernel for Exadata; prior to this version, the stock Red Hat EL5 kernel (2.6.18) was used. With the unbreakable kernel came the opportunity to run the perf utility. This utility makes it possible to see which functions are being executed inside an executable, provided there is a symbol table. And the oracle database executable has a symbol table! One reason to do this is to get a more granular overview of what the Oracle database is doing than the wait interface provides, especially for the time that is visible in the wait interface as ‘on cpu’.
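For example (a sketch; the pid 1234 is a placeholder for an Oracle foreground process, and perf must be available on the host):

```
# sample the process for 30 seconds, then report in which
# functions of the oracle executable the cpu time is spent
perf record -g -p 1234 -- sleep 30
perf report
```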
In my previous post I touched on the topic of a “new” codepath (codepath meaning: the way the operating system is utilised) for full segment scans. A full segment scan means scanning an entire segment (I use “segment” because it could also be a partition of a table or index), which is commonly visible in an explain plan as the rowsources “TABLE ACCESS FULL”, “FAST FULL INDEX SCAN” and “BITMAP FULL SCAN” (but is not restricted to these).
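Such a rowsource is easy to provoke (a sketch; the table name t3 is an assumption, and it is assumed the table has no indexes):

```
sqlplus / as sysdba <<'EOF'
-- on a table without indexes this results in the rowsource TABLE ACCESS FULL
explain plan for select count(*) from t3;
select * from table(dbms_xplan.display);
EOF
```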
Look at my presentation ‘About multiblock reads’ to see how and when direct path reads kick in, and what the difference between buffered and direct path reads is. Most notably, Oracle has released very little information about asynchronous direct path reads.
Recently I was discussing some IO-related waits with some friends. The wait under discussion was ‘kfk: async disk IO’. This wait was always visible in Oracle version 188.8.131.52 and seems to be gone in version 184.108.40.206 and above. Here is the result of some investigation into that.
First: the wait is not gone in version 220.127.116.11 and above, which is very simple to prove (this is on database version 18.104.22.168):
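A sketch of such a proof, simply checking the wait event dictionary:

```
sqlplus / as sysdba <<'EOF'
-- if the wait event still exists in this version, it shows up here
select name, wait_class from v$event_name where name = 'kfk: async disk IO';
EOF
```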
I guess the first question which comes to mind when reading this title is ‘why?’. For a database, but I guess for any IO dependent application, we want IOs to be faster, not throttle them, that is: make them slower. Well, the ‘why’ is: if you want to investigate IOs, you sometimes want them to slow down, so they are easier to see. Also, (not so) recent improvements in the Oracle database made great progress in using the available bandwidth by doing IO in parallel, which can strip away much of the ability to see IOs in Oracle’s popular SQL trace.
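One way to throttle IO on Linux is the blkio cgroup controller (a minimal sketch, assuming cgroup v1 mounted at /sys/fs/cgroup/blkio, a disk /dev/sdb with major:minor numbers 8:16, and an Oracle foreground process with pid 1234; all of these are placeholders):

```
# create a cgroup, cap reads on /dev/sdb at 1MB per second,
# and move the Oracle process into the cgroup
mkdir /sys/fs/cgroup/blkio/throttled
echo "8:16 1048576" > /sys/fs/cgroup/blkio/throttled/blkio.throttle.read_bps_device
echo 1234 > /sys/fs/cgroup/blkio/throttled/tasks
```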
Recently I patched an 22.214.171.124 grid infrastructure to a higher version. After the patching I started the grid infrastructure on that host, and ASM was unable to start. Looking in the alert.log file of the ASM instance, it turned out that upon starting ASM, even before the contents of the pfile/spfile were displayed, the ASM instance crashed with an ORA-00600 error:
When you use Oracle ASM (Automatic Storage Management) for your database, the permissions of the block devices used by ASM need to be changed at the operating system level. To be more precise, the owner and group need to be set to ‘oracle’ and ‘dba’ in my case (see the Oracle documentation).
I used to do this in a very lazy way, using a simple ‘/bin/chown oracle.dba /dev/sdb’ in /etc/rc.local. This worked for me with RHEL/OL version 5. This has changed with RHEL/OL 6, because the System V startup system has been replaced by ‘upstart’. Also, in OL6 the disk devices revert to their original ownership if you set it to oracle.dba by hand.
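On RHEL/OL 6 the way to make the ownership persistent is a udev rule (a sketch; the rules file name is just a convention, and the match on the bare device name sdb is a simplification, a real setup should match on a stable device identifier):

```
# write a rule that sets the ownership every time the device appears
cat > /etc/udev/rules.d/99-oracle-asmdevices.rules <<'END'
KERNEL=="sdb", OWNER="oracle", GROUP="dba", MODE="0660"
END
# apply the rule without rebooting
udevadm control --reload-rules
udevadm trigger
```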