This is the second part of a series of blogpost on Oracle database PGA usage. See the first part here. The first part described SGA and PGA usage, their distinction (SGA being static, PGA being variable), the problem (no limitation for PGA allocations outside of sort, hash and bitmap memory), a resolution for Oracle 12 (PGA_AGGREGATE_LIMIT), and some specifics about that (it doesn’t look like a very hard limit).
But this leaves out Oracle version 11.2. In reality, the vast majority of the database that I deal with at the time of writing is at version 11.2, and my guess is that this is not just the databases I deal with, but a general tendency. This could change in the coming time with the desupport of Oracle 11.2, however I suspect the installed base of Oracle version 12 to increase gradually and smoothly instead of in a big bang.
This post is about memory management on the operating system level of an Oracle database. The first question that might pop in your head is: isn’t this a solved problem? The answer is: yes, if you use Oracle’s AMM (Automatic Memory Management) feature, which let’s you set a limit for the Oracle datababase’s two main memory area’s: SGA and PGA. But in my opinion any serious, real life, usage of an Oracle database on Linux will be (severely) constrained in performance because of the lack of huge pages with AMM, and I personally witnessed very strange behaviour and process deaths with the AMM feature and high demand for memory.
I posted a fair amount of stuff on how Oracle is generating IOs, and especially large IOs, meaning more than one Oracle block, so > 8KB. This is typically what is happening when the Oracle database is executing a row source which does a full segment scan. Let’s start off with a quiz: what you think Oracle is the maximum IO size the Oracle engine is capable of requesting of the Operating System (so the IO size as can be seen at the SCI (system call interface) layer? If you made up your answer, remember it, and read on!
The real intention of this blogpost is to describe what is going on in the Oracle database kernel, but also what is being done in the Linux kernel. Being a performance specialised Oracle DBA means you have to understand what the operating system does. I often see that it’s of the utmost importance to understand how an IO ends up as a request at the NAS or SAN head, so you understand what a storage admin is talking about.
(the details are investigated and specific to Oracle’s database implementation on Linux x86_64)
Exadata IO: This event is not used with Exadata storage, ‘cell single block physical read’ is used instead.
Despite p3 listing the number of blocks, I haven’t seen a db file sequential read event that read more than one block ever. Of course this could change in a newer release.
At the Accenture Enkitec Group we have a couple of Exadata racks for Proof of Concepts (PoC), Performance validation, research and experimenting. This means the databases on the racks appear and vanish more than (should be) on an average customer Exadata rack (to be honest most people use a fixed few existing databases rather than creating and removing a database for every test).
Nevertheless we gotten in a situation where the /etc/oratab file was not in sync with the databases registered in the cluster registry. This situation can happen for a number reasons. For example, if you clone a database (RMAN duplicate), you end up with a cloned database (I sincerely hope), but this database needs to be manually registered in the cluster registry. This is the same with creating a standby database (for which one of the most used methods is to use the clone procedure with a couple of changes).
For some time now, I am using gdb to trace the inner working of the Oracle database. The reason for using gdb instead of systemtap or Oracle’s dtrace is the lack of user-level tracing with Linux. I am using this on Linux because most of my work is happening on Linux.
In order to see the same information with gdb on the system calls of Oracle as strace, there’s the Oracle debug info repository. This requires a bit of explanation. When strace is used on a process doing IO that Oracle executes asynchronous, the IO calls as seen with strace look something like this:
When you are administering an Exadata or more Exadata’s, you probably have multiple databases running on different database or “computing” nodes. In order to understand what kind of IO you are doing, you can look inside the statistics of your database, and look in the data dictionary what that instance or instances (in case of RAC) have been doing. When using Exadata there is a near 100% chance you are using either normal redundancy or high redundancy, of which most people know the impact of the “write amplification” of both normal and high redundancy of ASM (the write statistics in the Oracle data dictionary do not reflect the additional writes needed to satisfy normal (#IO times 2) or high (#IO times 3) redundancy). This means there might be difference in IOs between what you measure or think for your database is doing, and actually is done at the storage level.
Exadata gets its performance by letting the storage (the exadata storage server) participate in query processing, which means part of the processing is done as close as possible to where the data is stored. The participation of the storage server in query processing means that a storage grid can massively parallel (depending on the amount of storage servers participating) process a smart scan request.
This post is about database writer (dbwr, mostly seen as dbw0 nowadays) IO.
The testenvironment in which I made the measurements in this post: Linux X64 OL6u3, Oracle 184.108.40.206 (no BP), Clusterware 220.127.116.11, ASM, all database files in ASM. The test environment is a (VMWare Fusion) VM, with 2 CPU’s.
It might be a good idea to read my previous blog about logwriter IO.
The number of database writers is depended on the number of CPU’s visible to the instance (when not explicitly set with the DB_WRITER_PROCESSES parameter), and seems mostly to be CEIL(CPU_COUNT/8). There might be other things which could influence the number (NUMA comes to mind). In my case, I’ve got 2 CPU’s visible, which means I got one database writer (dbw0).
This post is about log writer (lgwr) IO.
It’s good to point out the environment on which I do my testing:
Linux X64 OL6u3, Oracle 18.104.22.168 (no BP), Clusterware 22.214.171.124, ASM, all database files in ASM.
In order to look at what the logwriter is doing, a 10046 trace of the lgwr at level 8 gives an overview.
A way of doing so is using oradebug. Be very careful about using oradebug on production environments, it can/may cause the instance to crash.
This is how I did it:
SYS@v11203 AS SYSDBA> oradebug setospid 2491 Oracle pid: 11, Unix process pid: 2491, image: firstname.lastname@example.org (LGWR) SYS@v11203 AS SYSDBA> oradebug unlimit Statement processed. SYS@v11203 AS SYSDBA> oradebug event 10046 trace name context forever, level 8 Statement processed.
Of course 2491 is the Linux process id of the log writer, as is visible with “image”.