Oakies Blog Aggregator

Oracle Security Training Manuals For Sale

I had a reason to go to our company storage today and whilst moving other things around to find what I needed I discovered two A4 boxes with printed manuals for some of our recent training classes....[Read More]

Posted by Pete On 05/04/17 At 02:22 PM

12.2 Some Cool Partitioning New Features (Big Wheels)

I previously discussed just how easy it is to convert online a non-partitioned table to be partitioned with Oracle Database 12.2. Thought I might run through a number of really cool new partitioning features and capabilities that were also introduced in 12.2. To start, I’m just going to create a basic range-partitioning table and populate […]

A performance deep dive into column encryption

Actually, this is a follow-up post to my performance deep dive into tablespace encryption. After having investigated how tablespace encryption works, this blogpost looks at the other encryption option, column encryption. A conclusion that can be shared upfront is that although they basically perform the same function, the implementation and performance consequences are quite different.

Column encryption gives you the ability to choose which individual columns to encrypt, that's kind of obvious. However, having to choose which columns to encrypt is what I see as the biggest downside of this encryption option. In most cases, especially with boxed applications, it is quite hard to figure out exactly which columns you want to encrypt in order to protect your sensitive data. Which columns exactly contain your primary sensitive data, and which columns contain secondary sensitive data (data derived from sensitive data)? Do you, when you have to apply encryption, know what EXACTLY is defined as sensitive data, and what isn't? I bet there isn't a clear technical description.

A logical reaction then would be 'couldn't I then just encrypt all columns'? Well, that is what tablespace encryption is for, isn't it? To summarise: I do think the correct use of column encryption is hard to implement in reality and thus very limited in usefulness; in most cases tablespace encryption should be used.

Okay…for this test I created a table with two columns, of which one is encrypted:

SQL> create table column_encryption (id number, a varchar2(10) encrypt);
SQL> insert into column_encryption values (1, 'AAAAAAAAAA');
SQL> commit;

The same table, but without encryption:

SQL> create table no_column_encryption (id number, a varchar2(10) );
SQL> insert into no_column_encryption values (1, 'AAAAAAAAAA');
SQL> commit;

And the same table with a lot of rows:

SQL> create table column_encryption_large (id number, a varchar2(10) encrypt);
SQL> begin
       for counter in 1..32000000 loop
         insert into column_encryption_large values ( counter, dbms_random.string('l',10) );
       end loop;
     end;
/

Let's follow the path of the previous TDE post, and profile the execution of a SQL statement on the big table to see the impact of column encryption. The first test is a 'select count(*) from column_encryption_large' in one session, and 'perf record -g -p PID' in another. If you need more explanation on how to run it, please look at the previous blogpost.
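For convenience, here is a sketch of that setup; the query against v$process/v$session is one common way to find the server process PID, and the PID itself is of course just a placeholder:

# session 1 (SQL*Plus): find the PID of the dedicated server process, then run the test query
# SQL> select spid from v$process where addr = (select paddr from v$session where sid = sys_context('userenv','sid'));
# SQL> select count(*) from column_encryption_large;
# session 2 (as root), while the query runs:
perf record -g -p <PID>
# stop the recording (ctrl-C) once the query finishes

This is the output of 'perf report --sort comm --max-stack 2':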

# perf report --sort comm --max-stack 2
# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 1K of event 'cycles'
# Event count (approx.): 1418165467
#
# Children      Self  Command
# ........  ........  ...............
#
   100.00%   100.00%  oracle_6919_aob
            |--29.21%-- kdstf00000010000100kmP
            |--12.58%-- kdbulk
            |--3.32%-- gup_pte_range
            |--2.58%-- kdst_fetch0
            |--2.54%-- kcbgtcr
            |--2.25%-- __blk_bios_map_sg
            |--2.21%-- kcbhvbo
            |--2.18%-- unlock_page
            |--1.98%-- ktrgcm
            |--1.93%-- do_direct_IO
            |--1.86%-- kcbldrget
            |--1.52%-- kcoapl

This shows IO related functions, both Oracle and operating system level; kdstf is kernel data scan table full for example, while gup_pte_range, do_direct_IO, unlock_page and __blk_bios_map_sg are Linux kernel functions. Most notably, there are no encryption related functions, which is a big difference from tablespace encryption!
This is actually very logical if you understand the differences between column encryption and tablespace encryption. First let's look at a block dump of a data block from a segment in an encrypted tablespace:

Block dump from cache:
Dump of buffer cache at level 4 for pdb=0 tsn=5 rdba=907
Block dump from disk:
Encrypted block <5, 907> content will not be dumped. Dumping header only.
buffer tsn: 5 rdba: 0x0000038b (1024/907)
scn: 0x0.4e9af4 seq: 0x01 flg: 0x16 tail: 0x9af40601
frmt: 0x02 chkval: 0xf23a type: 0x06=trans data

Yes…you read that right: the block is encrypted, so it will not be dumped. Luckily, you can set the undocumented parameter “_sga_clear_dump” to true to make Oracle dump the block:

SQL> alter session set "_sga_clear_dump"=true;
SQL> alter system dump datafile 5 block 907;

This will make Oracle dump the block. The dump will show the decrypted version of the tablespace level encrypted block:

Block header dump:  0x0000038b
 Object id on Block? Y
 seg/obj: 0x17bc3  csc: 0x00.4e9aed  itc: 2  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x388 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0007.01d.000001d0  0x00000987.0390.27  --U-    1  fsc 0x0000.004e9af4
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x0000038b
data_block_dump,data header at 0x7f140f335374
===============
tsiz: 0x1f98
hsiz: 0x14
pbl: 0x7f140f335374
     76543210
flag=--------
ntab=1
nrow=1
frre=-1
fsbo=0x14
fseo=0x1f8a
avsp=0x1f76
tosp=0x1f76
0xe:pti[0]      nrow=1  offs=0
0x12:pri[0]     offs=0x1f8a
block_row_dump:
tab 0, row 0, @0x1f8a
tl: 14 fb: --H-FL-- lb: 0x1  cc: 1
col  0: [10]  41 41 41 41 41 41 41 41 41 41
end_of_block_dump

For the count(*), there is no need to read the data; the only thing needed is to read the row directory to fetch the number of rows (the nrow field in the dump above). However, to do that, the block must be decrypted.
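As an aside, if you want to reproduce such a dump yourself, here is a sketch of how the file and block number of a row can be obtained, using the column_encryption table created earlier (dbms_rowid is a standard package; rowid_relative_fno returns the relative file number, which in a small database typically matches the file number used in the dump command):

SQL> select dbms_rowid.rowid_relative_fno(rowid) file#, dbms_rowid.rowid_block_number(rowid) block# from column_encryption;
SQL> -- plug the resulting values into: alter system dump datafile <file#> block <block#>;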

Now look at a block dump of a column encrypted data block:

Block header dump:  0x0000032b
 Object id on Block? Y
 seg/obj: 0x1821d  csc: 0x00.676d7e  itc: 2  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x328 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.007.000078a9  0x00000117.2246.07  --U-    1  fsc 0x0000.00676d7f
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
bdba: 0x0000032b
data_block_dump,data header at 0x7f140f333264
===============
tsiz: 0x1f98
hsiz: 0x14
pbl: 0x7f140f333264
     76543210
flag=--------
ntab=1
nrow=1
frre=-1
fsbo=0x14
fseo=0x1f5d
avsp=0x1f49
tosp=0x1f49
0xe:pti[0]      nrow=1  offs=0
0x12:pri[0]     offs=0x1f5d
block_row_dump:
tab 0, row 0, @0x1f5d
tl: 59 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [52]
 fd e0 87 66 55 f7 e6 43 de be 31 f6 71 4f 7f 4e f1 75 fb 88 98 9d 13 ed 8e
 cb 69 02 bc 29 51 bd 21 ea 22 04 6b 70 e9 ec 01 9d d6 e4 5a 84 01 1d 90 b0
 e9 01
end_of_block_dump

The block and the row directory can be read normally without any need for decryption. The only thing encrypted is the column (“a”). That perfectly explains the absence of any functions that indicate decryption, because there isn’t any decryption taking place!

Now let's rewrite the SQL to touch the data, and thus involve decryption: 'select avg(length(a)) from column_encryption_large'. This way the rows need to be decrypted and read. This is what the output of a perf recording looks like:

# perf report --sort comm --max-stack 2
# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 65K of event 'cycles'
# Event count (approx.): 229042607170
#
# Children      Self  Command
# ........  ........  ...............
#
   100.00%   100.00%  oracle_6919_aob
            |--24.73%-- ztchsh1h
            |--14.91%-- ztchsh1n
            |--6.10%-- y8_ExpandRijndaelKey
            |--5.90%-- ownGetReg
            |--5.50%-- __intel_ssse3_rep_memcpy
            |--4.99%-- ztchsh1f
            |--4.28%-- ztcxi
            |--2.60%-- ipp_is_GenuineIntel
            |--1.88%-- _intel_fast_memcpy
            |--1.74%-- _intel_fast_memcpy.P
            |--1.52%-- kspgip
            |--1.16%-- kgce_init

The functions starting with 'ztc' are probably related to security ("zecurity"), and also probably related to decryption. The function name "y8_ExpandRijndaelKey" is clearly related to encryption. When you look up the function address of "ownGetReg", it's close to the "y8_ExpandRijndaelKey" function. The last group of functions are memcpy related functions, which seems consistent with decrypting: moving data around.

On the performance side, it's clear that the majority of the time is spent in the functions ztchsh1h and ztchsh1n. In order to understand more about these functions, let's expand the stack:

# perf report --sort comm
# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 65K of event 'cycles'
# Event count (approx.): 229035032972
#
# Children      Self  Command
# ........  ........  ...............
#
   100.00%   100.00%  oracle_6919_aob
            |
            |--25.01%-- ztchsh1h
            |          |
            |          |--99.63%-- ztchsh1n
            |          |          |
            |          |          |--50.85%-- ztchsh1f
            |          |          |          ztchf
            |          |          |          ztcxf
            |          |          |          ztcx
            |          |          |          kztsmohmwl
            |          |          |          kztsmhmwl
            |          |          |          kzekmetc
            |          |          |          kzecsqen
            |          |          |          kzecctex
            |          |          |          evaopn2
            |          |          |          evaopn2
            |          |          |          qesaAggNonDistSS
            |          |          |          kdstf00001010000000km
            |          |          |          kdsttgr
            |          |          |          qertbFetch
            |          |          |          qergsFetch
            |          |          |          opifch2
            |          |          |          kpoal8
------------------------------------------------------
            |--14.90%-- ztchsh1n
            |          |
            |          |--85.25%-- ztchsh1f
            |          |          ztchf
            |          |          ztcxf
            |          |          ztcx
            |          |          kztsmohmwl
            |          |          kztsmhmwl
            |          |          kzekmetc
            |          |          kzecsqen
            |          |          kzecctex
            |          |          evaopn2
            |          |          evaopn2
            |          |          qesaAggNonDistSS
            |          |          kdstf00001010000000km
            |          |          kdsttgr
            |          |          qertbFetch
            |          |          qergsFetch
            |          |          opifch2
            |          |          kpoal8

I fetched the stack of the two functions in which the most time was spent. The most important thing to see is that the decryption now takes place as part of processing the fetched data (qesaAggNonDistSS probably has something to do with aggregating data, evaopn2 probably is a function to evaluate operands) rather than as part of performing the (logical) IO; mind the absence of the kcbgtcr function.

The reason for doing the decryption during operand evaluation rather than during the IO is that the data is stored encrypted in the block, and thus also in the buffer cache. So at IO time, there is no need to decrypt anything. This also has another rather important consequence: decryption needs to take place for every processed row that has an encrypted column. There does not seem to be any caching of the decrypted value for column encryption, which is logical from a security point of view, but has a severe performance consequence.

When doing a pin tools debugtrace on the above SQL for the processing of a single row (the table 'column_encryption', rather than 'column_encryption_large'), applying the sed filters, and then grepping for a selective set of functions, this is what the processing looks like:

 | | | | > qergsFetch(0x294512030, 0x7f871c9fa2f0, ...)
 | | | | | > qeaeAvg(0x7f8717ce9968, 0xe, ...)
 | | | | | < qeaeAvg+0x000000000063 returns: 0
 | | | | | > qertbFetch(0x294512178, 0x7f871ca08a68, ...)
 | | | | | | | | | | > kcbgtcr(0x7ffe2f9b3ae0, 0, ...)
 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | > pread64@plt(0x100, 0x1f428c000, ...)
 | | | | | | | | | | < kcbgtcr+0x000000003221 returns: 0x1f428c014
 | | | | | | | | | | | | | | > kcbgtcr(0x7ffe2f9b35d0, 0, ...)
 | | | | | | | | | | | | | | < kcbgtcr+0x0000000009a1 returns: 0x1f428c014
 | | | | | | > kdsttgr(0x7f871c9f9918, 0, ...)
 | | | | | | | > kdstf00001010000000km(0x7f871c9f9918, 0, ...)
 | | | | | | | | > kdst_fetch(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | > kdst_fetch0(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | | | > kcbgtcr(0x7f871c9f9930, 0, ...)
 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | > pread64@plt(0x100, 0x2b1115000, ...)
 | | | | | | | | | | | < kcbgtcr+0x000000003221 returns: 0x1e4aa6014
 | | | | | | | | | < kdst_fetch0+0x0000000004d0 returns: 0x1e4aa6076
 | | | | | | | | < kdst_fetch+0x000000000048 returns: 0x1e4aa6076
 | | | | | | | | > qesaAggNonDistSS(0x7ffe2f9b45d0, 0x7fff, ...)
 | | | | | | | | | > evaopn2(0x294511ef0, 0x294512030, ...)
 | | | | | | | | | | > evaopn2(0x294511e68, 0x294512030, ...)
 | | | | | | | | | | | | | | | | | | | > ztchsh1n(0x7ffe2f9b1ef8, 0x11c4e8d0, ...)
 | | | | | | | | | | | | | | | | | | | > ztchsh1f(0x7ffe2f9b1ef8, 0x7ffe2f9b3100, ...)
 --> 168 times in total of ztchsh1n or ztchsh1f
 | | | | | | | | | | < evaopn2+0x0000000002dc returns: 0x7f871c9fa2c0
 | | | | | | | | | | > evalen(0x294511ef0, 0x7f871c9fa2c0, ...)
 | | | | | | | | | | < evalen+0x000000000147 returns: 0x2
 | | | | | | | | | < evaopn2+0x0000000002dc returns: 0x7f871c9fa2d0
 | | | | | | | | | > qeaeAvg(0x7f8717ce9968, 0xb, ...)
 | | | | | | | | | < qeaeAvg+0x000000000063 returns: 0x7f8717ce99c9
 | | | | | | | | < qesaAggNonDistSS+0x000000000193 returns: 0x7fff
 | | | | | | | | > kdst_fetch(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | > kdst_fetch0(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | | | > kcbgtcr(0x7f871c9f9930, 0, ...)
 | | | | | | | | | | | < kcbgtcr+0x0000000009a1 returns: 0x1dec30014
 | | | | | | | | | < kdst_fetch0+0x0000000004d0 returns: 0x1dec30072
 | | | | | | | | < kdst_fetch+0x000000000048 returns: 0x1dec30072
 | | | | | | | | > kdst_fetch(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | > kdst_fetch0(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | | | > kcbgtcr(0x7f871c9f9930, 0, ...)
 | | | | | | | | | | | < kcbgtcr+0x0000000009a1 returns: 0x1deca4014
 | | | | | | | | | < kdst_fetch0+0x0000000004d0 returns: 0x1deca4072
 | | | | | | | | < kdst_fetch+0x000000000048 returns: 0x1deca4072
 | | | | | | | | > kdst_fetch(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | > kdst_fetch0(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | | | > kcbgtcr(0x7f871c9f9930, 0, ...)
 | | | | | | | | | | | < kcbgtcr+0x0000000009a1 returns: 0x1e4be0014
 | | | | | | | | | < kdst_fetch0+0x0000000004d0 returns: 0x1e4be0072
 | | | | | | | | < kdst_fetch+0x000000000048 returns: 0x1e4be0072
 | | | | | | | | > kdst_fetch(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | > kdst_fetch0(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | | | > kcbgtcr(0x7f871c9f9930, 0, ...)
 | | | | | | | | | | | < kcbgtcr+0x0000000009a1 returns: 0x1dedb2014
 | | | | | | | | | < kdst_fetch0+0x0000000004d0 returns: 0x1dedb2072
 | | | | | | | | < kdst_fetch+0x000000000048 returns: 0x1dedb2072
 | | | | | | | | > kdst_fetch(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | > kdst_fetch0(0x1, 0x7f871c9f9918, ...)
 | | | | | | | | | < kdst_fetch0+0x0000000011c9 returns: 0
 | | | | | | | | < kdst_fetch+0x000000000048 returns: 0
 | | | | | | | < kdstf00001010000000km+0x00000000035d returns: 0x7fff
 | | | | | | < kdsttgr+0x00000000085f returns: 0x7fff
 | | | | | < qertbFetch+0x000000001301 returns: 0x7fff
 | | | | | > qeaeAvg(0x7f8717ce9968, 0x294511f78, ...)
 | | | | < qeaeAvg+0x000000000063 returns: 0x2
 | | | | | | > evaopn2(0x294511f78, 0, ...)
 | | | | | | < evaopn2+0x0000000002dc returns: 0x7f871c9fa2e0
 | | | | < qergsFetch+0x000000000f25 returns: 0

This is the explain plan of the 'select avg(length(a)) from column_encryption' SQL:

----------------------------------------------------------------------------------------
| Id  | Operation	   | Name	       | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |		       |       |       |     3 (100)|	       |
|   1 |  SORT AGGREGATE    |		       |     1 |    53 |	    |	       |
|   2 |   TABLE ACCESS FULL| COLUMN_ENCRYPTION |     1 |    53 |     3	 (0)| 00:00:01 |
----------------------------------------------------------------------------------------

If you look back to the grepped debugtrace, and the execution plan:
Line 1: the sort aggregate rowsource (qergsFetch).
Line 4: the table access full (qertbFetch).
Line 5: here a logical read (kcbgtcr) is issued, and because the block didn’t exist in the cache, was physically read (line 6: pread64). This is the segment header, the “real” scan of data blocks has not been started yet.
Line 11: this is the ultra fast full table scan (kdstf00001010000000km). My guess is this function is a full table scan function with certain decisions hard coded, instead of choices made at runtime, so the in-CPU execution prediction runs into fewer branch mispredictions.
Line 12: this is the part of the full table scan for fetching (loading) the data (kdst_fetch). What is special here is that a multiblock read is done, the kcbgtcr function triggers a single physical read for multiple logical blocks, which are later fetched per block (kdst_fetch and kcbgtcr functions starting from line 32, 38, etc).
Line 19: this function executes row based functions and aggregates the results per block/fetch (qesaAggNonDistSS).
Line 20: as part of fetching the row and executing functions, the row value is evaluated first (evaopn2).
Line 21/22: here the column is decrypted (made visible by the ztchsh1n/ztchsh1f functions, not necessarily the decryption functions themselves).
Line 26/29: here probably the length (evalen) and average (qeaeAvg) row functions are executed.
Line 32: the next block is processed, but no rows are found, and thus no need to execute rowsource (qe*) functions afterwards.

So, what do we know at this point regarding column encryption?
– Columns that are encrypted are stored encrypted in the block in the buffer cache.
– Which means they have to be decrypted every time the column values are read, which is different from tablespace encryption, for which a block is encrypted, and is decrypted whenever a block is read into the buffer cache.
– Functions related to column encryption specifically (different functions than seen with tablespace encryption) take roughly 40% of the time in my case.

Can the time spent on column decryption be optimised?
There are multiple ways you can change the way Oracle applies column encryption. There are four encryption types: 3DES168, AES128, AES192 and AES256. The default is AES192.
Here are query timings of doing a select avg(length(a)) from TABLE on my “large” table with 32 million rows:

3DES168 4:53
AES256 1:09
AES192 1:06
AES128 1:03

A way to optimise column encryption is to skip the extra integrity verification by specifying 'nomac' in the encryption definition of the column. This also saves space (by default, extra space is used per column value for the MAC, the message authentication code that the verification uses). These are the timings with the 'nomac' addition to disable encryption verification:

3DES168 3:59
AES256 0:26
AES192 0:23
AES128 0:22

This shows a significant reduction of time. However, if no encryption at all is applied to the column, the query timing is 0:03.
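For reference, here is a sketch of what a column definition with an explicit algorithm and the 'NOMAC' option might look like (syntax as I read it in the Oracle documentation; the table name is made up, so verify against your own version before relying on it):

SQL> create table column_encryption_nomac (id number, a varchar2(10) encrypt using 'AES256' 'NOMAC');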

Internals background information
The functions ztchsh1n/ztchsh1h are related to verification (again: if you read the tablespace encryption blogpost, the most time consuming functions there were verification functions too). Once 'nomac' is specified with the encryption definition of the column, the ztchsh* functions vanish, and the top time consuming functions are y8_ExpandRijndaelKey and ownGetReg, which clearly are directly related to decryption. The performance gain of 'nomac' is smaller with 3DES168 encryption.

Conclusion
I think tablespace encryption is the encryption method of choice for a normal implementation. In most cases it will be too much work to exactly figure out which columns to encrypt. If you still consider column encryption, you also should be aware that the column value is stored encrypted in the block and (as a consequence) in the cache. Every use of the encrypted column involves encryption or decryption, for which the overhead is significant, even with ‘nomac’ specified to disable (additional) verification.

Tagged: 3DES, AES, block dump, encryption, internals, oracle, security, trace

How to Perform a Security Audit of an Oracle Database Training in Athens, Greece

I will be teaching my two days class How to Perform a Security Audit of an Oracle Database in Athens, Greece on May 16th and 17th 2017 organised by Oracle University. This is a great class that helps you understand....[Read More]

Posted by Pete On 04/04/17 At 02:45 PM

12cR2 DBCA, Automatic Memory Management, and -databaseType

This post explains the following error encountered when creating a 12.2 database with DBCA:
[DBT-11211] The Automatic Memory Management option is not allowed when the total physical memory is greater than 4GB.
or when creating the database directly with the installer:
[INS-35178]The Automatic Memory Management option is not allowed when the total physical memory is greater than 4GB.
If you used Automatic Memory Management (AMM) you will have to think differently and size the SGA and PGA separately.

ASMM

Automatic Shared Memory Management, or ASMM, is what you get when you set SGA_TARGET and do not set MEMORY_TARGET. Basically, you define the size of the SGA you want to allocate at startup and that will be available for the instance, most of it being buffer cache and shared pool. I'll not go into the detail of SGA_TARGET and SGA_MAX_SIZE because on the most common platforms, all of it is allocated at instance startup. Then, in addition to this shared area used by all instance processes, each process can allocate private memory, and you control this with PGA_AGGREGATE_TARGET.
The total size of SGA and PGA for all instances in a system must reside in physical memory for the simple reason that they are mostly used to avoid I/O (a large buffer cache avoids physical reads and optimizes physical writes, a large PGA avoids reads and writes to tempfiles).
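For illustration, a minimal sketch of switching an instance from AMM to this separate sizing (the values are placeholders, and the instance must be restarted for the spfile changes to take effect):

SQL> alter system set memory_target=0 scope=spfile;
SQL> alter system reset memory_max_target;
SQL> alter system set sga_target=8000M scope=spfile;
SQL> alter system set pga_aggregate_target=2000M scope=spfile;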

AMM

Because you don't always know how much to allocate to each (SGA and PGA), Oracle came up with a feature where you define the whole MEMORY_TARGET, and part of this is dynamically allocated to SGA or PGA. This is called Automatic Memory Management (AMM). It's a good idea on paper: it is automatic, which means that you don't have to think about it, and it is dynamic, which means that you don't waste physical memory because of bad sizing.

But it is actually a bad idea when it comes to implementation, at least on the most common platforms.
SGA and PGA are different beasts that should not be put in the same cage:

  • SGA is big, static, shared, allocated once at startup
  • PGA is small chunks constantly allocated and deallocated, private to processes

First, it is not so easy because you have to size the /dev/shm correctly or you will get the following at startup:
ORA-00845: MEMORY_TARGET not supported on this system
In addition to that, because the memory is prepared to contain the whole possible SGA, you see misleading numbers in 'show sga'.
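As a side note, sizing /dev/shm itself is straightforward; a sketch, with a placeholder size that must be at least as large as MEMORY_TARGET (make it permanent in /etc/fstab as well):

# as root: resize the tmpfs backing MEMORY_TARGET, then verify
mount -o remount,size=12g /dev/shm
df -h /dev/shm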

Second, there are a lot of bugs, resizing overhead, etc.

And finally, you cannot use large pages when you are in AMM, and on modern systems (lots of RAM, lots of processes) having all processes map the SGA with small 4k pages is a big overhead.

So, as long as you have more than a few GB on a system, you should avoid AMM and set SGA_TARGET and PGA_AGGREGATE_TARGET independently. Forget MEMORY_TARGET. Forget /dev/shm. Forget also the following documentation at http://docs.oracle.com/database/122/ADMIN/managing-memory.htm#ADMIN00207 which mentions that Oracle recommends that you enable the method known as automatic memory management.
Actually, AMM is not recommended for systems with more than a few GB of physical memory, and most systems have more than a few GB of physical memory. If you try to use AMM on a system with more than 4GB you get a warning in 12cR1 and an error in 12cR2:
[screenshot: the DBCA error message]
I got this when trying to create a database with AMM on a system with more than 4GB of physical memory.

[screenshot: the error message]
This does not depend on the size of MEMORY_TARGET you choose, or the size of /dev/shm, but only on the size of available physical memory:
[oracle@VM104 ~]$ df -h /dev/shm
Filesystem Size Used Avail Use% Mounted on
tmpfs 3.9G 0 3.9G 0% /dev/shm
 
[oracle@VM104 ~]$ free -h
total used free shared buff/cache available
Mem: 7.8G 755M 5.0G 776M 2.1G 6.2G
Swap: 411M 0B 411M

No choice: it is a hard stop

If you are not convinced, then please have a look at MOS Doc ID 2244817.1 which explains this decision:

  • It is not something new: DBCA used to give similar warning message but in 12.2.0.1 it is an error message
  • Reason behind: Because database creation fails some times and in some cases database wont be functional after some times

So, do you want to create a database which may not be functional after some times?

So, what size for SGA and PGA?

Then, if you were thinking that AMM was cool, your next question now is: what size to allocate to SGA and PGA?

Don’t panic.

You are in this situation because you have several GB of RAM. Current servers have a lot of memory. You don't have to size it to the nearest 100MB. Start with some values, run with it. Look at the performance and the memory advisors. Are you doing too much physical I/O on tables where you expect data to be in cache? Then increase the SGA, and maybe set a minimum for the buffer cache. Do you see a lot of hard parses because your application runs a lot of statements and procedures? Then increase the SGA and maybe set a minimum for the shared pool. Do you run a lot of analytic queries that full scan tables and have to hash and sort huge amounts of data? Then decrease the SGA and increase the PGA_AGGREGATE_TARGET.

Where to start?

If you don’t know where to start, look at the DBCA database types:

#-----------------------------------------------------------------------------
# Name : databaseType
# Datatype : String
# Description : used for memory distribution when memoryPercentage specified
# Valid values : MULTIPURPOSE|DATA_WAREHOUSING|OLTP
# Default value : MULTIPURPOSE
# Mandatory : NO
#-----------------------------------------------------------------------------

Those types define the ratio between SGA and PGA. Then why not start with what is recommended by Oracle?

I’ve created the 3 types of instances with the following:
dbca -silent -totalMemory 10000 -databaseType MULTIPURPOSE -generateScripts -scriptDest /tmp/MULT ...
dbca -silent -totalMemory 10000 -databaseType DATA_WAREHOUSING -generateScripts -scriptDest /tmp/DWHG ...
dbca -silent -totalMemory 10000 -databaseType OLTP -generateScripts -scriptDest /tmp/OLTP ...

And here are the settings generated by DBCA
$ grep target /tmp/*/init.ora
DWHG/init.ora:sga_target=6000m
DWHG/init.ora:pga_aggregate_target=4000m
MULT/init.ora:sga_target=7500m
MULT/init.ora:pga_aggregate_target=2500m
OLTP/init.ora:sga_target=8000m
OLTP/init.ora:pga_aggregate_target=2000m

Here is the summary:

                  SGA   PGA
OLTP              80%   20%
Multi-Purpose     75%   25%
Data Warehousing  60%   40%

(The percentages are relative to each other here. Don't use 100% of physical memory for the Oracle instances, because the system needs some memory as well.)

This gives an idea of where to start. Servers have a lot of memory but you don't have to use all of it. If you have a doubt, leave some memory free to be available for the filesystem cache. Usually, we recommend using direct I/O (filesystemio_options=setall) to avoid the filesystem overhead. But when you start and want to lower the risk of under-sizing the SGA or PGA, you may prefer to keep that second level of cache (filesystemio_options=async) which uses all the physical memory available. This may improve the reads from tempfiles in case your PGA is too small. This is just an idea, not a recommendation.

So what?

If you have a server with more than a few GB, then set SGA and PGA separately. Start with the ratios above, and then monitor performance and the advisors. Physical servers today have at least 32GB. Even with a small VM with 1GB for my labs, I prefer to set them separately, because in that case I want to be sure to have a minimum size for the buffer cache and shared pool. You may have a lot of small VMs with 3GB and think about setting MEMORY_TARGET. But using large pages is a recommendation here, because the hypervisor will have a lot of memory to map, so ASMM is still the recommendation.

Once you know the size of all SGA, look at Hugepagesize in /proc/meminfo, set the number of hugepages in /etc/sysctl.conf, run sysctl -p and your instances will use available large pages for the SGA.
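A minimal sketch of that, assuming an 8GB SGA and the usual 2MB huge page size on x86_64 (the numbers are placeholders; add a small margin on top of the exact page count):

grep Hugepagesize /proc/meminfo
# 8192 MB SGA / 2 MB pages = 4096 pages, plus a margin:
echo "vm.nr_hugepages=4100" >> /etc/sysctl.conf
sysctl -p
grep HugePages /proc/meminfo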

 

The article 12cR2 DBCA, Automatic Memory Management, and -databaseType appeared first on Blog dbi services.

Block Names

There are a number of tiny details that I can never remember when I'm sketching out models to test ideas, and one of those is the PL/SQL block name. Virtually every piece of PL/SQL I write ends up with variables which have one of two prefixes in their names "M_" or "G_" (for memory or global, respectively) but I probably ought to be more formal than that, so here's an example of labelling blocks – specifically, labelling anonymous blocks from SQL*Plus using a trivial and silly bit of code:


rem
rem     Script:         plsql_block_names.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2017
rem

create table t1
nologging
as
select * from all_objects
;


begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

create index t1_i1 on t1(object_id) nologging;

--
--        Anonymous PL/SQL block with label, and a couple of 
--        uses of the label as the qualifier for variables
--

<<my_block>>
declare
        data_object_id  t1.data_object_id%type;
        object_id       t1.object_id%type;
begin
        select  data_object_id
        into    my_block.data_object_id
        from    t1
        where
                data_object_id is not null
        and     rownum = 1
        ;
        select  object_id
        into    my_block.object_id
        from    t1
        where
                data_object_id = my_block.data_object_id
        and     rownum = 1
        ;

        dbms_output.put_line('Object: ' || object_id || ' - ' || my_block.object_id);
        dbms_output.put_line('Data Object: ' || data_object_id || ' - ' || my_block.data_object_id);

end;
/

Object: 16 - 16
Data Object: 6 - 6

The important point, of course, is that with a qualified variable name you eliminate the risk of a problem that appears occasionally on the public fora where someone has lost track of duplicated variable names, or used (as I have above) a variable name that matches a column name, and doesn’t notice that a little further down the code Oracle has captured the “wrong” interpretation of the name.

You’ll note that the block name has to be surrounded by doubled angle brackets – and now that I’ve written about it I might actually remember that for a couple of weeks ;)

 

Footnote:

This isn’t the only use for labels, by the way, it’s just one that probably ought to be used more frequently in production code.

The other thing I can never remember is how to escape quote marks – so I invariably end up using the old “double the quotes” method when I want to quote quotes.
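For the record, here is a quick sketch of both options – the old "double the quotes" method and the alternative quoting mechanism (q'…') that saves you from counting quotes:

select 'It''s a quoted ''string''' from dual;
select q'[It's a quoted 'string']' from dual;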

When automatic reoptimization plan is less efficient

11gR2 started to have the optimizer react at execution time when a misestimate is encountered. The next executions are then re-optimized with more accurate estimations, derived from the execution statistics. This was called cardinality feedback. Unfortunately, in rare cases we had a fast execution plan with bad estimations, and better estimations lead to a worse execution plan. This is rare, but even when 9999 queries are faster, the one that takes too long will give a bad perception of this optimizer feature.
This feature has been improved in 12cR1 with new names: auto-reoptimization and statistics feedback. I'm showing an example here in 12.1.0.2 without adaptive statistics (the 12.2 backport) and I've also disabled adaptive plans because they show the wrong numbers (similar to what I described in this post). I'll show that at one point, the re-optimization can go back to the initial plan if it was the best in execution time.

V$SQL

Basically, here is what happened: the first execution was fast, but with actual numbers of rows far from the estimated ones. Auto-reoptimization kicks in for the next execution and gets a new plan, but with a longer execution time. The third execution is another re-optimization, leading to the same bad plan. Finally, starting with the 4th execution, the time is back to reasonable and we see that the first plan is used again:

SQL> select sql_id,child_number,plan_hash_value,is_reoptimizable,is_resolved_adaptive_plan,parse_calls,executions,elapsed_time/1e6
from v$sql where sql_id='b4rhzfw7d6vdp';
 
SQL_ID CHILD_NUMBER PLAN_HASH_VALUE I I PARSE_CALLS EXECUTIONS ELAPSED_TIME/1E6
------------- ------------ --------------- - - ----------- ---------- ----------------
b4rhzfw7d6vdp 0 1894156093 Y 1 1 .346571
b4rhzfw7d6vdp 1 955499861 Y 1 1 5.173733
b4rhzfw7d6vdp 2 955499861 Y 1 1 4.772258
b4rhzfw7d6vdp 3 1894156093 N 7 7 .5008

The scope of statistics feedback is not to get an optimal execution from the first execution. That requires accurate statistics, static or dynamic, and SQL Plan Directives are an attempt to get that. The goal of statistics feedback is to try to get a better plan rather than re-use one that is based on misestimates. But sometimes the better is the enemy of the good, and we have an example here in child cursors 1 and 2. The good thing is that finally we are back to an acceptable execution time, with a final plan that can be re-used without re-optimization.

What surprised me here is that the final plan has the same hash value as the initial one. Is it a coincidence that different estimations give the same plan? Or did the optimizer finally give up trying to find a better one?

V$SQL_REOPTIMIZATION_HINTS

In 12c the statistics feedback is exposed in V$SQL_REOPTIMIZATION_HINTS.

SQL> select sql_id,child_number,hint_text,client_id,reparse from v$sql_reoptimization_hints where sql_id='b4rhzfw7d6vdp';
 
SQL_ID CHILD_NUMBER HINT_TEXT CLIENT_ID REPARSE
------------- ------------ ---------------------------------------------------------------------------------------------------- ---------- ----------
b4rhzfw7d6vdp 0 OPT_ESTIMATE (@"SEL$1" INDEX_FILTER "DM_FOLDER_R1"@"SEL$1" "D_1F0049A880000016" ROWS=1517.000000 ) 1 1
b4rhzfw7d6vdp 0 OPT_ESTIMATE (@"SEL$1" INDEX_SCAN "DM_FOLDER_R1"@"SEL$1" "D_1F0049A880000016" MIN=1517.000000 ) 1 1
b4rhzfw7d6vdp 0 OPT_ESTIMATE (@"SEL$1" TABLE "DM_FOLDER_R1"@"SEL$1" ROWS=1517.000000 ) 1 1
b4rhzfw7d6vdp 0 OPT_ESTIMATE (@"SEL$1" INDEX_FILTER "DM_SYSOBJECT_R2"@"SEL$1" "D_1F0049A880000010" MIN=3.000000 ) 1 0
b4rhzfw7d6vdp 0 OPT_ESTIMATE (@"SEL$1" INDEX_SCAN "DM_SYSOBJECT_R2"@"SEL$1" "D_1F0049A880000010" MIN=3.000000 ) 1 0
b4rhzfw7d6vdp 0 OPT_ESTIMATE (@"SEL$1" TABLE "DM_SYSOBJECT_R2"@"SEL$1" MIN=3.000000 ) 1 0
b4rhzfw7d6vdp 1 OPT_ESTIMATE (@"SEL$1" INDEX_FILTER "DM_FOLDER_R1"@"SEL$1" "D_1F0049A880000016" ROWS=1517.000000 ) 1 0
b4rhzfw7d6vdp 1 OPT_ESTIMATE (@"SEL$1" INDEX_SCAN "DM_FOLDER_R1"@"SEL$1" "D_1F0049A880000016" MIN=1517.000000 ) 1 0
b4rhzfw7d6vdp 1 OPT_ESTIMATE (@"SEL$1" TABLE "DM_FOLDER_R1"@"SEL$1" ROWS=1517.000000 ) 1 0
b4rhzfw7d6vdp 1 OPT_ESTIMATE (@"SEL$1" INDEX_FILTER "DM_SYSOBJECT_R2"@"SEL$1" "D_1F0049A880000010" MIN=3.000000 ) 1 0
b4rhzfw7d6vdp 1 OPT_ESTIMATE (@"SEL$1" INDEX_SCAN "DM_SYSOBJECT_R2"@"SEL$1" "D_1F0049A880000010" MIN=3.000000 ) 1 0
b4rhzfw7d6vdp 1 OPT_ESTIMATE (@"SEL$1" TABLE "DM_SYSOBJECT_R2"@"SEL$1" MIN=3.000000 ) 1 0
b4rhzfw7d6vdp 1 OPT_ESTIMATE (@"SEL$582FA660" QUERY_BLOCK ROWS=1491.000000 ) 1 1

Child cursor 0 was re-optimized into cursor 1 with different numbers of rows for "DM_FOLDER_R1" and "DM_SYSOBJECT_R2".
Child cursor 1 has the same values, but an additional row count correction for a query block.

But we don’t see anything about cursor 2. It was re-optimizable, and was actually re-optimized into cursor 3 but no statistics corrections are displayed here.

Trace

As it is a reproducible case, I’ve run the same while tracing 10046, 10053 and 10507 (level 512) to get all information about SQL execution, Optimiser compilation, and statistics feedback. For each child cursor, I’ll show the execution plan with estimated and actual number of rows (E-Rows and A-Rows) and then some interesting lines from the trace, mainly those returned by:
grep -E "KKSMEC|^atom_hint|^@"
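A sketch of how those events might be set in the session running the query; the 10046 and 10053 levels are my assumptions, only the 10507 level 512 is stated above:

SQL> alter session set events '10046 trace name context forever, level 8';
SQL> alter session set events '10053 trace name context forever, level 1';
SQL> alter session set events '10507 trace name context forever, level 512';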

Child cursor 0 – plan 1894156093 – 0.34 seconds

Plan hash value: 1894156093
----------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 171 (100)| 1 |00:00:00.04 | 17679 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.04 | 17679 |
| 2 | NESTED LOOPS | | 1 | 1 | 116 (0)| 1491 |00:00:00.04 | 17679 |
| 3 | NESTED LOOPS | | 1 | 1 | 115 (0)| 1491 |00:00:00.04 | 17456 |
| 4 | NESTED LOOPS | | 1 | 49 | 17 (0)| 5648 |00:00:00.01 | 537 |
|* 5 | INDEX RANGE SCAN | D_1F0049A880000016 | 1 | 3 | 3 (0)| 1517 |00:00:00.01 | 13 |
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED | DM_SYSOBJECT_R | 1517 | 16 | 10 (0)| 5648 |00:00:00.01 | 524 |
|* 7 | INDEX RANGE SCAN | D_1F0049A880000010 | 1517 | 71 | 2 (0)| 5648 |00:00:00.01 | 249 |
|* 8 | TABLE ACCESS BY INDEX ROWID | DM_SYSOBJECT_S | 5648 | 1 | 2 (0)| 1491 |00:00:00.03 | 16919 |
|* 9 | INDEX UNIQUE SCAN | D_1F0049A880000108 | 5648 | 1 | 1 (0)| 1491 |00:00:00.03 | 15428 |
| 10 | NESTED LOOPS SEMI | | 5648 | 2 | 25 (0)| 1491 |00:00:00.02 | 14828 |
| 11 | NESTED LOOPS | | 5648 | 7 | 18 (0)| 2981 |00:00:00.02 | 12869 |
| 12 | TABLE ACCESS BY INDEX ROWID BATCHED| DM_SYSOBJECT_R | 5648 | 71 | 4 (0)| 2981 |00:00:00.01 | 7747 |
|* 13 | INDEX RANGE SCAN | D_1F0049A880000010 | 5648 | 16 | 3 (0)| 2981 |00:00:00.01 | 6145 |
|* 14 | TABLE ACCESS BY INDEX ROWID | DM_SYSOBJECT_S | 2981 | 1 | 2 (0)| 2981 |00:00:00.01 | 5122 |
|* 15 | INDEX UNIQUE SCAN | D_1F0049A880000108 | 2981 | 1 | 1 (0)| 2981 |00:00:00.01 | 2140 |
|* 16 | INDEX UNIQUE SCAN | D_1F0049A880000145 | 2981 | 52759 | 1 (0)| 1491 |00:00:00.01 | 1959 |
|* 17 | INDEX UNIQUE SCAN | D_1F0049A880000142 | 1491 | 1 | 1 (0)| 1491 |00:00:00.01 | 223 |
----------------------------------------------------------------------------------------------------------------------------------------

Because of low cardinality estimation of DM_SYSOBJECT_R predicate (E-Rows=3) the optimizer goes to NESTED LOOP. This plan has good execution time here because all blocks are in buffer cache. Reading 17679 blocks from buffer cache takes less than one second. It would have been much longer if those were physical I/O.

This is a case where the optimizer detects misestimate at execution time. Here is what is recorded in the trace:

Reparsing due to card est...
@=0x63a56820 type=3 nodeid=5 monitor=Y halias="DM_FOLDER_R1" loc="SEL$1" oname="SEL$F5BB74E1" act=1517 min=0 est=3 next=(nil)
Reparsing due to card est...
@=0x638fe2b0 type=5 nodeid=4 monitor=Y halias="" loc="SEL$F5BB74E1" onames="SEL$07BDC5B4"@"SEL$5" "SEL$2"@"SEL$5" act=5648 min=0 est=49 next=0x638fe250
Reparsing due to card est...
@=0x638fe4c0 type=5 nodeid=3 monitor=Y halias="" loc="SEL$F5BB74E1" onames="SEL$07BDC5B4"@"SEL$5" "SEL$2"@"SEL$5" "SEL$3"@"SEL$1" act=1491 min=0 est=1 next=0x638fe460
Reparsing due to card est...
@=0x638fe688 type=5 nodeid=2 monitor=Y halias="" loc="SEL$F5BB74E1" onames="SEL$07BDC5B4"@"SEL$5" "SEL$2"@"SEL$5" "SEL$3"@"SEL$1" "R_OBJECT_ID"@"SEL$1" act=1491 min=0 est=1 next=0x638fe5f8
kkocfbCheckCardEst [sql_id=b4rhzfw7d6vdp] reparse=y ecs=n efb=n ost=n fbs=n

Those are the misestimates which triggers re-optimization.

And here are all statistics feedback.

*********** Begin Dump Context (kkocfbCheckCardEst) [sql_id=b4rhzfw7d6vdp cpcnt=0] ***********
@=0x638fe688 type=5 nodeid=2 monitor=Y halias="" loc="SEL$F5BB74E1" onames="DM_FOLDER_R1"@"SEL$1" "DM_SYSOBJECT_R2"@"SEL$1" "TE_"@"SEL$2" "LJ_"@"SEL$2" act=1491 min=0 est=1 next=0x638fe5f8
@=0x638fe5f8 type=3 nodeid=17 monitor=Y halias="LJ_" loc="SEL$2" oname="D_1F0049A880000142" act=0 min=1 est=1 next=0x638fe4c0
@=0x638fe4c0 type=5 nodeid=3 monitor=Y halias="" loc="SEL$F5BB74E1" onames="DM_FOLDER_R1"@"SEL$1" "DM_SYSOBJECT_R2"@"SEL$1" "TE_"@"SEL$2" act=1491 min=0 est=1 next=0x638fe460
@=0x638fe460 type=1 nodeid=8 monitor=Y halias="TE_" loc="SEL$2" act=0 min=1 est=1 next=0x638fe3d0
@=0x638fe3d0 type=3 nodeid=9 monitor=Y halias="TE_" loc="SEL$2" oname="D_1F0049A880000108" act=0 min=1 est=1 next=0x638fe2b0
@=0x638fe2b0 type=5 nodeid=4 monitor=Y halias="" loc="SEL$F5BB74E1" onames="DM_FOLDER_R1"@"SEL$1" "DM_SYSOBJECT_R2"@"SEL$1" act=5648 min=0 est=49 next=0x638fe250
@=0x638fe250 type=1 nodeid=6 monitor=Y halias="DM_SYSOBJECT_R2" loc="SEL$1" act=3 min=1 est=16 next=0x638fe1c0
@=0x638fe1c0 type=3 nodeid=7 monitor=Y halias="DM_SYSOBJECT_R2" loc="SEL$1" oname="D_1F0049A880000010" act=3 min=1 est=71 next=0x63a56820
@=0x63a56820 type=3 nodeid=5 monitor=Y halias="DM_FOLDER_R1" loc="SEL$1" oname="D_1F0049A880000016" act=1517 min=0 est=3 next=(nil)
*********** End Dump Context ***********

We also see some information about execution performance:

kkoarCopyCtx: [sql_id=b4rhzfw7d6vdp] origin=CFB old=0x63a565d0 new=0x7fe74e2153f0 copyCnt=1 copyClient=y
**************************************************************
kkocfbCopyBestEst: Best Stats
Exec count: 1
CR gets: 17679
CU gets: 0
Disk Reads: 0
Disk Writes: 0
IO Read Requests: 0
IO Write Requests: 0
Bytes Read: 0
Bytes Written: 0
Bytes Exchanged with Storage: 0
Bytes Exchanged with Disk: 0
Bytes Simulated Read: 0
Bytes Simulated Returned: 0
Elapsed Time: 51 (ms)
CPU Time: 51 (ms)
User I/O Time: 15 (us)
*********** Begin Dump Context (kkocfbCopyBestEst) **********
*********** End Dump Context ***********

They are labeled as ‘Best Stats’ because we had only one execution at that time.

Finally, the hints are dumped:

******** Begin CFB Hints [sql_id=b4rhzfw7d6vdp] xsc=0x7fe74e215748 ********
Dumping Hints
=============
atom_hint=(@=0x7fe74e21ebf0 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" TABLE "DM_SYSOBJECT_R2"@"SEL$1" MIN=3.000000 ) )
atom_hint=(@=0x7fe74e21e758 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" INDEX_FILTER "DM_SYSOBJECT_R2"@"SEL$1" "D_1F0049A880000010" MIN=3.000000 ) )
atom_hint=(@=0x7fe74e21e3f0 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" INDEX_SCAN "DM_SYSOBJECT_R2"@"SEL$1" "D_1F0049A880000010" MIN=3.000000 ) )
atom_hint=(@=0x7fe74e21dfd0 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" INDEX_FILTER "DM_FOLDER_R1"@"SEL$1" "D_1F0049A880000016" ROWS=1517.000000 ) )
atom_hint=(@=0x7fe74e21dc68 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" INDEX_SCAN "DM_FOLDER_R1"@"SEL$1" "D_1F0049A880000016" MIN=1517.000000 ) )
atom_hint=(@=0x7fe74e21d8c8 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" TABLE "DM_FOLDER_R1"@"SEL$1" ROWS=1517.000000 ) )
********** End CFB Hints **********

Those are exactly what we see in V$SQL_REOPTIMIZATION_HINTS

This is all we see for this first execution. The next execution starts with:

KKSMEC: Invalidating old cursor 0 with hash val = 1894156093
KKSMEC: Produced New cursor 1 with hash val = 955499861

As a consequence of child cursor 0 marked as reoptimizable, the next execution invalidates it and creates a new child cursor 1.

Child cursor 1 – new plan 955499861 – 5.17 seconds

Here is the new plan we see after that second execution:

Plan hash value: 955499861
------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 30996 (100)| 1 |00:00:04.58 | 102K| 101K| | | |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:04.58 | 102K| 101K| | | |
| 2 | VIEW | VM_NWVW_2 | 1 | 12039 | 30996 (1)| 1491 |00:00:04.58 | 102K| 101K| | | |
| 3 | HASH UNIQUE | | 1 | 12039 | 30996 (1)| 1491 |00:00:04.58 | 102K| 101K| 941K| 941K| 2597K (0)|
|* 4 | HASH JOIN RIGHT SEMI | | 1 | 12039 | 30490 (1)| 4132 |00:00:04.57 | 102K| 101K| 12M| 3867K| 14M (0)|
| 5 | TABLE ACCESS FULL | DM_DOCUMENT_S | 1 | 213K| 210 (1)| 213K|00:00:00.01 | 741 | 0 | | | |
|* 6 | HASH JOIN | | 1 | 36463 | 29665 (1)| 5622 |00:00:04.51 | 101K| 101K| 1405K| 1183K| 2026K (0)|
|* 7 | HASH JOIN | | 1 | 36463 | 18397 (1)| 5622 |00:00:02.23 | 65103 | 65050 | 940K| 940K| 1339K (0)|
|* 8 | HASH JOIN | | 1 | 2222 | 14489 (1)| 1499 |00:00:01.58 | 51413 | 51369 | 992K| 992K| 1377K (0)|
|* 9 | HASH JOIN | | 1 | 2222 | 14120 (1)| 1499 |00:00:01.46 | 50088 | 50057 | 3494K| 1598K| 4145K (0)|
|* 10 | TABLE ACCESS FULL | DM_SYSOBJECT_S | 1 | 39235 | 10003 (1)| 39235 |00:00:00.83 | 36385 | 36376 | | | |
|* 11 | HASH JOIN | | 1 | 24899 | 3920 (1)| 5648 |00:00:00.62 | 13703 | 13681 | 1199K| 1199K| 1344K (0)|
|* 12 | INDEX RANGE SCAN | D_1F0049A880000016 | 1 | 1517 | 12 (0)| 1517 |00:00:00.01 | 13 | 0 | | | |
|* 13 | TABLE ACCESS FULL| DM_SYSOBJECT_R | 1 | 646K| 3906 (1)| 646K|00:00:00.50 | 13690 | 13681 | | | |
| 14 | TABLE ACCESS FULL | DM_FOLDER_S | 1 | 431K| 367 (1)| 431K|00:00:00.04 | 1325 | 1312 | | | |
|* 15 | TABLE ACCESS FULL | DM_SYSOBJECT_R | 1 | 646K| 3906 (1)| 646K|00:00:00.51 | 13690 | 13681 | | | |
|* 16 | TABLE ACCESS FULL | DM_SYSOBJECT_S | 1 | 646K| 10000 (1)| 646K|00:00:02.14 | 36385 | 36376 | | | |
------------------------------------------------------------------------------------------------------------------------------------------------------------
Note
-----
- statistics feedback used for this statement

The note makes it clear that the estimations come from the previous run (statistics feedback) and we see that for most operations E-Rows = A-Rows. With those, a new plan has been chosen, with complex view merging: VM_NWVW_2. You can find clues about those internal view names on Jonathan Lewis' blog. Here, probably because the estimated number of rows is high, the subquery has been unnested. It is an 'EXISTS' subquery, which is transformed to a semi join and merged so that a distinct can be applied at the end.

So, we have a different plan, which is supposed to be better because it has been costed with more accurate cardinalities. The goal of this post is not to detail the reason why the execution time is longer with a 'better' plan. If you look at the 'Reads' column you can see that the first execution read all blocks from the buffer cache but the second one had to do physical I/O for all of them. With nothing in the buffer cache, reading 101K blocks with multiblock reads may be faster than reading 17679 blocks with single-block reads, so the optimizer decision was not bad. I'll have to assess whether it is realistic to have most of the blocks in the buffer cache in real production life, as the behavior in UAT is different. Some people will stop here, say that cardinality feedback is bad, disable it, or even set optimizer_index_cost_adj to get the nested loops, but things are more complex than that.

The important thing is that the optimizer doesn’t stop there and compares the new execution statistics with the previous one.

**************************************************************
kkocfbCompareExecStats : Current
Exec count: 1
CR gets: 102226
CU gets: 3
Disk Reads: 101426
Disk Writes: 0
IO Read Requests: 1633
IO Write Requests: 0
Bytes Read: 830881792
Bytes Written: 0
Bytes Exchanged with Storage: 830881792
Bytes Exchanged with Disk: 830881792
Bytes Simulated Read: 0
Bytes Simulated Returned: 0
Elapsed Time: 4586 (ms)
CPU Time: 1305 (ms)
User I/O Time: 3040 (ms)
**************************************************************
kkocfbCompareExecStats : Best
Exec count: 1
CR gets: 17679
CU gets: 0
Disk Reads: 0
Disk Writes: 0
IO Read Requests: 0
IO Write Requests: 0
Bytes Read: 0
Bytes Written: 0
Bytes Exchanged with Storage: 0
Bytes Exchanged with Disk: 0
Bytes Simulated Read: 0
Bytes Simulated Returned: 0
Elapsed Time: 51 (ms)
CPU Time: 51 (ms)
User I/O Time: 15 (us)
kkocfbCompareExecStats: improvement BG: 0.172935 CPU: 0.039555

The first execution, with 'bad' statistics, is still the best one, and this new execution has an improvement factor of 0.17, which means it is about 5 times slower.

Then in the trace we see again that re-optimisation (reparsing) is considered:

Reparsing due to card est...
@=0x6a368338 type=5 nodeid=11 monitor=Y halias="" loc="SEL$582FA660" onames="SEL$608EC1F7"@"SEL$582FA660" "SEL$04458B50"@"SEL$582FA660" act=5648 min=0 est=24899 next=0x6a3682d8
Reparsing due to card est...
@=0x6a3687b0 type=5 nodeid=7 monitor=Y halias="" loc="SEL$582FA660" onames="SEL$608EC1F7"@"SEL$582FA660" "SEL$04458B50"@"SEL$582FA660" "SEL$FB0FE72C"@"SEL$33802F1B" "SEL$5"@"SEL$33802F1B" "SEL$07BDC5B4"@"SEL$636B5685" act=5622 min=0 est=36463 next=0x6a368750
Reparsing due to card est...
@=0x6a368990 type=5 nodeid=6 monitor=Y halias="" loc="SEL$582FA660" onames="SEL$608EC1F7"@"SEL$582FA660" "SEL$04458B50"@"SEL$582FA660" "SEL$FB0FE72C"@"SEL$33802F1B" "SEL$5"@"SEL$33802F1B" "SEL$07BDC5B4"@"SEL$636B5685" "SEL$FB0FE72C"@"SEL$4" act=5622 min=0 est=36463 next=0x6a368930
Reparsing due to card est...
@=0x6a368b90 type=5 nodeid=4 monitor=Y halias="" loc="SEL$582FA660" onames="SEL$608EC1F7"@"SEL$582FA660" "SEL$04458B50"@"SEL$582FA660" "SEL$FB0FE72C"@"SEL$33802F1B" "SEL$5"@"SEL$33802F1B" "SEL$07BDC5B4"@"SEL$636B5685" "SEL$FB0FE72C"@"SEL$4" "SEL$F5BB74E1"
@"SEL$4" act=4132 min=0 est=12039 next=0x6a368b30
Reparsing due to card est...
@=0x6a368d60 type=4 nodeid=3 monitor=Y halias="" loc="SEL$582FA660" act=1491 min=0 est=12039 next=0x6a368b90

An additional OPT_ESTIMATE hint is generated for the complex view merging query block:

atom_hint=(@=0x7fe74e21eb90 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" INDEX_FILTER "DM_FOLDER_R1"@"SEL$1" "D_1F0049A880000016" ROWS=1517.000000 ) )
atom_hint=(@=0x7fe74e21e7b0 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" INDEX_SCAN "DM_FOLDER_R1"@"SEL$1" "D_1F0049A880000016" MIN=1517.000000 ) )
atom_hint=(@=0x7fe74e21e470 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" TABLE "DM_FOLDER_R1"@"SEL$1" ROWS=1517.000000 ) )
atom_hint=(@=0x7fe74e21e050 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" INDEX_FILTER "DM_SYSOBJECT_R2"@"SEL$1" "D_1F0049A880000010" MIN=3.000000 ) )
atom_hint=(@=0x7fe74e21dce8 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" INDEX_SCAN "DM_SYSOBJECT_R2"@"SEL$1" "D_1F0049A880000010" MIN=3.000000 ) )
atom_hint=(@=0x7fe74e21da38 err=0 resol=0 used=0 token=1018 org=6 lvl=2 txt=OPT_ESTIMATE (@"SEL$582FA660" QUERY_BLOCK ROWS=1491.000000 ) )
atom_hint=(@=0x7fe74e21d600 err=0 resol=0 used=0 token=1018 org=6 lvl=3 txt=OPT_ESTIMATE (@"SEL$1" TABLE "DM_SYSOBJECT_R2"@"SEL$1" MIN=3.000000 ) )

With this new cardinality estimation, the next execution will try to get a better execution plan, but it doesn't change the optimizer choice and the new child cursor gets the same execution plan:
KKSMEC: Invalidating old cursor 1 with hash val = 955499861
KKSMEC: Produced New cursor 2 with hash val = 955499861

Child cursor 2 – plan 955499861 again – 4.77 seconds

This is the third execution:

Plan hash value: 955499861
------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 30996 (100)| 1 |00:00:04.19 | 102K| 101K| | | |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:04.19 | 102K| 101K| | | |
| 2 | VIEW | VM_NWVW_2 | 1 | 1491 | 30996 (1)| 1491 |00:00:04.19 | 102K| 101K| | | |
| 3 | HASH UNIQUE | | 1 | 1491 | 30996 (1)| 1491 |00:00:04.19 | 102K| 101K| 941K| 941K| 1355K (0)|
|* 4 | HASH JOIN RIGHT SEMI | | 1 | 12039 | 30490 (1)| 4132 |00:00:04.19 | 102K| 101K| 12M| 3867K| 14M (0)|
| 5 | TABLE ACCESS FULL | DM_DOCUMENT_S | 1 | 213K| 210 (1)| 213K|00:00:00.01 | 740 | 0 | | | |
|* 6 | HASH JOIN | | 1 | 36463 | 29665 (1)| 5622 |00:00:04.12 | 101K| 101K| 1405K| 1183K| 2021K (0)|
|* 7 | HASH JOIN | | 1 | 36463 | 18397 (1)| 5622 |00:00:03.39 | 65102 | 65050 | 940K| 940K| 1359K (0)|
|* 8 | HASH JOIN | | 1 | 2222 | 14489 (1)| 1499 |00:00:02.94 | 51412 | 51369 | 992K| 992K| 1331K (0)|
|* 9 | HASH JOIN | | 1 | 2222 | 14120 (1)| 1499 |00:00:01.04 | 50088 | 50057 | 3494K| 1598K| 4145K (0)|
|* 10 | TABLE ACCESS FULL | DM_SYSOBJECT_S | 1 | 39235 | 10003 (1)| 39235 |00:00:00.47 | 36385 | 36376 | | | |
|* 11 | HASH JOIN | | 1 | 24899 | 3920 (1)| 5648 |00:00:00.55 | 13703 | 13681 | 1199K| 1199K| 1344K (0)|
|* 12 | INDEX RANGE SCAN | D_1F0049A880000016 | 1 | 1517 | 12 (0)| 1517 |00:00:00.01 | 13 | 0 | | | |
|* 13 | TABLE ACCESS FULL| DM_SYSOBJECT_R | 1 | 646K| 3906 (1)| 646K|00:00:00.43 | 13690 | 13681 | | | |
| 14 | TABLE ACCESS FULL | DM_FOLDER_S | 1 | 431K| 367 (1)| 431K|00:00:01.82 | 1324 | 1312 | | | |
|* 15 | TABLE ACCESS FULL | DM_SYSOBJECT_R | 1 | 646K| 3906 (1)| 646K|00:00:00.33 | 13690 | 13681 | | | |
|* 16 | TABLE ACCESS FULL | DM_SYSOBJECT_S | 1 | 646K| 10000 (1)| 646K|00:00:00.60 | 36385 | 36376 | | | |
------------------------------------------------------------------------------------------------------------------------------------------------------------
Note
-----
- statistics feedback used for this statement

Same plan and same execution time here. Tables are large and SGA is small here.

*********** Begin Dump Context: best estimates ***********
 
**************************************************************
kkocfbCompareExecStats : Current
Exec count: 1
CR gets: 102224
CU gets: 3
Disk Reads: 101426
Disk Writes: 0
IO Read Requests: 1633
IO Write Requests: 0
Bytes Read: 830881792
Bytes Written: 0
Bytes Exchanged with Storage: 830881792
Bytes Exchanged with Disk: 830881792
Bytes Simulated Read: 0
Bytes Simulated Returned: 0
Elapsed Time: 4206 (ms)
CPU Time: 1279 (ms)
User I/O Time: 3084 (ms)
**************************************************************
kkocfbCompareExecStats : Best
Exec count: 1
CR gets: 17679
CU gets: 0
Disk Reads: 0
Disk Writes: 0
IO Read Requests: 0
IO Write Requests: 0
Bytes Read: 0
Bytes Written: 0
Bytes Exchanged with Storage: 0
Bytes Exchanged with Disk: 0
Bytes Simulated Read: 0
Bytes Simulated Returned: 0
Elapsed Time: 51 (ms)
CPU Time: 51 (ms)
User I/O Time: 15 (us)
kkocfbCompareExecStats: improvement BG: 0.172939 CPU: 0.040363

So where are we here? We had an execution which was based on bad estimations. Then two tries with good estimations, but because of different buffer cache behavior they ended up 5 times slower. Nothing else to try.

The good thing is that the optimizer admits it cannot do better and falls back to the best execution time, now considered as the best estimate:

kkocfbCheckCardEst: reparse using best estimates
...
kkocfbCopyCardCtx: No best stats found

We see no OPT_ESTIMATE hints here, which is why there was nothing in V$SQL_REOPTIMIZATION_HINTS for cursor 2, but this cursor is still marked as re-optimizable and the next execution invalidates it:

KKSMEC: Invalidating old cursor 2 with hash val = 955499861
KKSMEC: Produced New cursor 3 with hash val = 1894156093

We see that we are back to the original plan, which is expected because the static statistics have not changed, and there is no statistics feedback this time.

Child cursor 3 – back to plan 1894156093 – 0.5 seconds

This is the plan that is used for all subsequent executions now.

Plan hash value: 1894156093
----------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 171 (100)| 1 |00:00:00.04 | 17677 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.04 | 17677 |
| 2 | NESTED LOOPS | | 1 | 1 | 116 (0)| 1491 |00:00:00.04 | 17677 |
| 3 | NESTED LOOPS | | 1 | 1 | 115 (0)| 1491 |00:00:00.04 | 17454 |
| 4 | NESTED LOOPS | | 1 | 49 | 17 (0)| 5648 |00:00:00.01 | 536 |
|* 5 | INDEX RANGE SCAN | D_1F0049A880000016 | 1 | 3 | 3 (0)| 1517 |00:00:00.01 | 13 |
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED | DM_SYSOBJECT_R | 1517 | 16 | 10 (0)| 5648 |00:00:00.01 | 523 |
|* 7 | INDEX RANGE SCAN | D_1F0049A880000010 | 1517 | 71 | 2 (0)| 5648 |00:00:00.01 | 249 |
|* 8 | TABLE ACCESS BY INDEX ROWID | DM_SYSOBJECT_S | 5648 | 1 | 2 (0)| 1491 |00:00:00.03 | 16918 |
|* 9 | INDEX UNIQUE SCAN | D_1F0049A880000108 | 5648 | 1 | 1 (0)| 1491 |00:00:00.03 | 15427 |
| 10 | NESTED LOOPS SEMI | | 5648 | 2 | 25 (0)| 1491 |00:00:00.02 | 14827 |
| 11 | NESTED LOOPS | | 5648 | 7 | 18 (0)| 2981 |00:00:00.02 | 12868 |
| 12 | TABLE ACCESS BY INDEX ROWID BATCHED| DM_SYSOBJECT_R | 5648 | 71 | 4 (0)| 2981 |00:00:00.01 | 7747 |
|* 13 | INDEX RANGE SCAN | D_1F0049A880000010 | 5648 | 16 | 3 (0)| 2981 |00:00:00.01 | 6145 |
|* 14 | TABLE ACCESS BY INDEX ROWID | DM_SYSOBJECT_S | 2981 | 1 | 2 (0)| 2981 |00:00:00.01 | 5121 |
|* 15 | INDEX UNIQUE SCAN | D_1F0049A880000108 | 2981 | 1 | 1 (0)| 2981 |00:00:00.01 | 2140 |
|* 16 | INDEX UNIQUE SCAN | D_1F0049A880000145 | 2981 | 52759 | 1 (0)| 1491 |00:00:00.01 | 1959 |
|* 17 | INDEX UNIQUE SCAN | D_1F0049A880000142 | 1491 | 1 | 1 (0)| 1491 |00:00:00.01 | 223 |
----------------------------------------------------------------------------------------------------------------------------------------
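
As an aside: the post does not show the exact command used to display these plans, but output with run-time statistics (Starts, A-Rows, A-Time, Buffers) like the above is typically produced with DBMS_XPLAN. A minimal sketch, assuming the statement was executed with STATISTICS_LEVEL=ALL or the GATHER_PLAN_STATISTICS hint, and with the SQL_ID and child number as placeholders:

SQL> select * from table(dbms_xplan.display_cursor('&sql_id', 3, 'ALLSTATS LAST'));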

After a few tries to get a better plan, the optimizer finally switched back to the first one because it was the best in terms of response time (I don’t know exactly which execution statistics are used for this decision, elapsed time is just my guess here).

The interesting point here is to understand that you can see a reoptimized cursor without statistics feedback:

  • No rows for the previous cursor in V$SQL_REOPTIMIZATION_HINTS
  • No ‘statistics feedback’ note in the new cursor plan
  • Difference between E-Rows and A-Rows in the new plan
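
If you want to check this on your own system, a quick sketch (not taken from the original test, with &sql_id as a placeholder) is to query V$SQL and V$SQL_REOPTIMIZATION_HINTS directly:

SQL> select child_number, plan_hash_value, is_reoptimizable from v$sql where sql_id='&sql_id';
SQL> select child_number, hint_text from v$sql_reoptimization_hints where sql_id='&sql_id';

A child cursor marked for re-optimization typically shows IS_REOPTIMIZABLE='Y', and in the case described here the second query returns no rows for child cursor 2.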

So what?

SQL optimization is a complex task and there is nothing like an execution being ‘fast’ or ‘slow’, an execution plan being ‘good’ or ‘bad’, or an optimizer decision being ‘right’ or ‘wrong’. What is fast after several similar executions can be slow on a busy system because fewer blocks remain in the cache. What is slow at a time when the storage is busy may be fast at another time of the day. What is fast with one single user may raise more concurrency contention on a busy system. Cardinality feedback is a reactive attempt to improve an execution plan. On average, things go better with it, but it is not abnormal that a few cases go wrong for a few executions. You can’t blame the optimizer for that, and hasty conclusions or optimizer parameter tweaking are not sustainable solutions. And don’t forget that if your data model is well designed, then the critical queries should have one clear optimal access path which does not depend on a small difference in the estimated number of rows.

The only thing I can always conclude when I see cardinality feedback going wrong is that there is something to fix in the design of the data model, the statistics gathering and/or the query design. When statistics feedback gives a worse execution plan, it is the consequence of a combination of:

  • mis-estimation of cardinalities: bad, insufficient, or stale statistics (a quick staleness check is sketched after this list)
  • mis-estimation of response time: bad system statistics, untypical memory sizing, unrepresentative execution context
  • no clear optimal access path: sub-optimal indexing, lack of partitioning,…
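
For the first cause in that list, a quick way to spot stale object statistics (a sketch, not from the original post; &owner is a placeholder) is:

SQL> select owner, table_name, last_analyzed, stale_stats from dba_tab_statistics where owner='&owner' and stale_stats='YES';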

It is a good thing to have the auto-reoptimization coming back to the initial plan when nothing better has been observed. I would love to see more control over it. For example, a hint that sets a threshold of execution time under which the optimizer should not try to find a better plan. I filed this idea in https://community.oracle.com/ideas/17514 and you can vote for it.

Update 2-APR-2017

I was not clear in this post, but this is the first time I observed this behavior (multiple reoptimizations and then back to the original plan), so I’m not sure about the reasons and the different conditions required. This was on 12.1.0.2 with the JAN17 PSU and the two Adaptive Statistics backports from 12cR2, adaptive plans set to false and no bind variables.

 

The post When automatic reoptimization plan is less efficient appeared first on Blog dbi services.

A performance deep dive into tablespace encryption

This is a run through of a performance investigation into Oracle tablespace encryption. These are the versions this test was performed on:

$ cat /etc/oracle-release
Oracle Linux Server release 6.8
$ /u01/app/oracle/product/12.1.0.2/dbhome_1/OPatch/opatch lspatches
24315824;Database PSU 12.1.0.2.161018, Oracle JavaVM Component (OCT2016)
24006101;Database Patch Set Update : 12.1.0.2.161018 (24006101)

In this test I created an encrypted tablespace:

SQL> create tablespace is_encrypted datafile size 10m autoextend on next 10m encryption default storage(encrypt);

(this assumes you have setup a master encryption key already)
And I created an encrypted simple heap table with one row:

SQL> create table t_is_encrypted ( f1 varchar2(100) ) tablespace is_encrypted;
SQL> insert into t_is_encrypted values ('AAAAAAAAAA');

Also, in order to do some performance tests, I created a simple heap table with a lot of rows:

SQL> create table t_encrypted_large ( id number, random_string varchar2( 100 ) ) tablespace is_encrypted;
SQL> begin
        for counter in 1..32000000 loop
                insert into t_encrypted_large values ( counter, dbms_random.string('l',50) );
        end loop;
end;
/

Let me first go back to basics a little and explain how tablespace encryption works: tablespace encryption sets the ‘E’ flag in the cache header of blocks of a segment which contain data (block type# 6, ‘trans data’), which means the block will be UNencrypted in the buffer cache, but will be encrypted once the block is persisted to disk. If a block is read from disk into the cache, it means it must be decrypted first. Once a block is in the cache and decrypted, it can be read and used without the overhead of encryption, because it is not encrypted anymore.
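
Not part of the original test, but if you want to verify which tablespaces actually carry the encryption attribute, a sketch like the following (joining V$TABLESPACE and V$ENCRYPTED_TABLESPACES on TS#) should do:

SQL> select t.name, e.encryptionalg, e.encryptedts from v$tablespace t, v$encrypted_tablespaces e where t.ts# = e.ts#;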

A special case arises when direct path reads are used by Oracle. A direct path read means the blocks are read from disk into the process’ PGA, so there is no involvement of the buffer cache. This means that with tablespace encryption direct path reads ALWAYS must decrypt the data blocks, and thus will suffer the overhead of decryption. This inherent overhead applies to any direct path action (which means both direct path reads and direct path writes).
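
To confirm which read path a scan took in your own session (and therefore whether the decryption overhead was paid on every execution or only when loading the cache), the session statistics can be checked; this is a sketch, not taken from the original post:

SQL> select n.name, s.value from v$mystat s, v$statname n where s.statistic# = n.statistic# and n.name in ('physical reads cache', 'physical reads direct');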

Let’s first do a simple count(*) on table t_encrypted_large, and collect a wait profile. I am doing this with Tanel Poder’s snapper. First session, which collects the profile:

SQL> var snapper refcursor
SQL> @snapper all,begin 1 1 793

Now with the begin snapshot ready, head over to the other session, which will scan the t_encrypted_large table:

SQL> select count(*) from t_encrypted_large;

  COUNT(*)
----------
  32000000

Now end the snapper snapshot and look at where the time was spent:

SQL> @snapper all,end 1 1 793
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    793, SYS       , TIME, parse time elapsed                                        ,           140,     5.32us,      .0%, [          ],          ,           ,
    793, SYS       , TIME, PL/SQL execution elapsed time                             ,           113,     4.29us,      .0%, [          ],          ,           ,
    793, SYS       , TIME, DB CPU                                                    ,       4912502,   186.66ms,    18.7%, [@@        ],          ,           ,
    793, SYS       , TIME, sql execute elapsed time                                  ,       4915777,   186.78ms,    18.7%, [##        ],          ,           ,
    793, SYS       , TIME, DB time                                                   ,       4916441,   186.81ms,    18.7%, [##        ],          ,           ,       2.66 % unaccounted time
    793, SYS       , WAIT, Disk file operations I/O                                  ,            69,     2.62us,      .0%, [          ],         2,        .08,     34.5us average wait
    793, SYS       , WAIT, db file sequential read                                   ,           484,    18.39us,      .0%, [          ],         3,        .11,   161.33us average wait
    793, SYS       , WAIT, db file scattered read                                    ,           220,     8.36us,      .0%, [          ],         1,        .04,      220us average wait
    793, SYS       , WAIT, direct path read                                          ,          1641,    62.35us,      .0%, [          ],         4,        .15,   410.25us average wait
    793, SYS       , WAIT, SQL*Net message to client                                 ,             2,      .08us,      .0%, [          ],         3,        .11,      .67us average wait
    793, SYS       , WAIT, SQL*Net message from client                               ,      20701153,   786.56ms,    78.7%, [WWWWWWWW  ],         3,        .11,       6.9s average wait

The time spent executing SQL is 186.78ms, and during that time, 186.66ms was spent on CPU (DB CPU). The waits confirm this: the time in IO related waits is measured in microseconds. In case you wondered if this output is sane, this system is using local NVMe disks, and you see the full table scan flipping from buffered multiblock IO (db file scattered read) to direct path multiblock IO (direct path read). The conclusion at this point is that practically all time was spent running on CPU. Of course this might be different if IO has higher latency (random IO latency of rotating media is between 5-12ms, reading from a SAN cache normally has a latency of around 1ms).

It is not possible to get a breakdown of where (ON) CPU time is spent inside the database. There are session counters, but these do not depict time. However, on the operating system level there is a useful tool: perf. In my experience, perf output is only really useful when the hardware PMU events are available from the operating system. This can be tested by specifying the event to use for perf. If you have the PMU events available, perf stat -e cycles sleep 1 will return something like:

$ perf stat -e cycles sleep 1

 Performance counter stats for 'sleep 1':

           736,481      cycles

       1.001210217 seconds time elapsed

If you don’t have PMU events available (because you are running in a virtual machine, for example; however, some virtualisation products can give access to the PMU events), it will look like:

$ perf stat -e cycles sleep 1

 Performance counter stats for 'sleep 1':

         cycles

       1.001025828 seconds time elapsed

Back to the original purpose of this blogpost: the performance of tablespace encryption. Let’s get a breakdown of the functions the time is spent on doing the ‘select count(*) from t_encrypted_large’. This is done by first executing perf record in one session with the process ID of the server process specified, which then executes the ‘select count(*) from t_encrypted_large’. Once the SQL is ready, the perf session can be stopped (ctrl-c), and the contents of the gathered perf database can be reported. This is how the recording is done:

$ perf record -g -p 57531

Now the ‘select count(*) from t_encrypted_large’ is executed. Once the result comes back in the sqlplus session, go to the perf session and ctrl-c it:

^C[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.928 MB perf.data (5114 samples) ]

Now that the perf data is collected, display a summary of the data:

$ perf report --sort comm --max-stack 2
# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 5K of event 'cycles'
# Event count (approx.): 15445982213
#
# Children      Self  Command
# ........  ........  ...............
#
   100.00%   100.00%  oracle_57531_ao
            |--27.44%-- kdr4chk
            |--15.91%-- kdb4chk1
            |--10.14%-- cipher_loop_p3
            |--8.56%-- msort_with_tmp
            |--5.84%-- kdrrea2
            |--3.98%-- memcpy
            |--3.12%-- kd4_ent_cmp
            |--2.13%-- kdrrrh2
            |--1.99%-- kdstf00000010000100kmP
            |--1.53%-- __intel_ssse3_rep_memcpy
            |--1.42%-- __intel_new_memset
            |--1.40%-- blks_loop_p3
            |--1.14%-- y8_ExpandRijndaelKey
            |--1.06%-- kdbulk
            |--0.88%-- ownGetReg
            |--0.62%-- kghalf
             --12.86%-- [...]

What this shows is that 100% of the samples are from the process ‘oracle_57531_ao’, 27.44% of the samples were spent in a function called ‘kdr4chk’, 15.91% in a function called ‘kdb4chk1’, and so on.

Just by looking at the function names, the top ones do not strike me as functions that do decryption. Function names like ‘cipher_loop_p3’ and ‘y8_ExpandRijndaelKey’ look more like functions doing some of the heavy lifting for decryption. In fact, the function names the most time is spent in (kdr4chk and kdb4chk1) seem to indicate some kind of checking.

A first thought is to look at the database parameters for block checking, and turn them off, and redo the test.
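
For reference, a minimal sketch of how both parameters can be switched off for such a test (both are dynamic; DB_BLOCK_CHECKSUM also accepts TYPICAL and FULL):

SQL> alter system set db_block_checking = off scope=both;
SQL> alter system set db_block_checksum = false scope=both;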

SQL> show parameter db_block_check

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
db_block_checking		     string	 OFF
db_block_checksum		     string	 FALSE

In fact, I had already turned db block checking and db block checksum off. So it’s not the checking governed by these parameters; what can it be then?

The best way I could come up with is getting a trace of the exact functions the process is executing. However, the number of C functions an executable goes through is typically gigantic. For that reason, when investigating the C functions it’s best to use a very small table, which is ‘t_is_encrypted’, containing only one row, in order to minimise the processing.

However, there is no tooling by default on Linux that gives you the ability to trace the exact C functions a process is executing; perf, which was just used, SAMPLES execution rather than tracing it. However, the Intel Pin tools give you the option to create a C function execution overview. Please use the link to see how you can download and link the Pin tools, and then create a debugtrace of the execution of ‘select * from t_is_encrypted’ for yourself. If you are interested in the findings based on the debugtrace, read on!

First of all, understanding the functions requires some experience. I don’t think there is any workaround for gaining that knowledge: in order to learn how to interpret them you just have to study them and spend time on it to work it out.

The output of debugtrace on ‘select * from t_is_encrypted’ (which fetches the one row in the table from an encrypted tablespace) looks like this:

         3 @@@ return underflow
Return 0x00007f225b355818 /lib64/libpthread.so.0:__read_nocancel+0x00000000000f returns: 0x151
        14 @@@ return underflow
Return 0x000000000cf1036d /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:sntpread+0x00000000003d returns: 0
        31 @@@ return underflow
Return 0x000000000cf102c0 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:ntpfprd+0x0000000000c0 returns: 0
       119 @@@ return underflow
Return 0x000000000cef19dd /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:nsbasic_brc+0x00000000032d returns: 0
       122 @@@ return underflow
Return 0x000000000cef164a /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:nsbrecv+0x00000000005a returns: 0
       152 Call 0x000000000cf00486 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:nioqrc+0x000000000276 -> 0x0000000005b2bdb0 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:_intel_fast_memcpy(0x7ffe4ed10685, 0x7f225a965da0, ...)
       156 Tailcall 0x0000000005b2bdbe /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:_intel_fast_memcpy+0x00000000000e -> 0x0000000005b2bda0 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:_intel_fast_memcpy.P(0x7ffe4ed10685, 0x7f225a965da0, ...)
       158 | Call 0x0000000005b2bda1 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:_intel_fast_memcpy.P+0x000000000001 -> 0x0000000005b32990 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:__intel_ssse3_rep_memcpy(0x7ffe4ed10685, 0x7f225a965da0, ...)
       172 | Return 0x0000000005b34d77 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:__intel_ssse3_rep_memcpy+0x0000000023e7 returns: 0x7ffe4ed10685
       174 Return 0x0000000005b2bda7 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:_intel_fast_memcpy.P+0x000000000007 returns: 0x7ffe4ed10685
       197 @@@ return underflow

That is not very readable. I created some simple sed filters to make the output more readable:

cat debugtrace.out | sed 's/^\ *\([0-9]*\)\(\ .*\)Call.*-> 0x.*:\(.*\)/\2> \3/' | sed 's/^\ *\([0-9]*\)\(\ .*\)Return.*:\(.*returns: .*\)/\2< \3/' | sed 's/^\ *\([0-9]*\)\(\ .*\)Tailcall.*-> 0x.*:\(.*\)/\2<> \3/' | grep -v underflow

If you use that on the same file, it gets better, with the aim of showing the function call flow:

Return 0x00007f225b355818 /lib64/libpthread.so.0:__read_nocancel+0x00000000000f returns: 0x151
Return 0x000000000cf1036d /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:sntpread+0x00000000003d returns: 0
Return 0x000000000cf102c0 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:ntpfprd+0x0000000000c0 returns: 0
Return 0x000000000cef19dd /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:nsbasic_brc+0x00000000032d returns: 0
Return 0x000000000cef164a /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:nsbrecv+0x00000000005a returns: 0
 > _intel_fast_memcpy(0x7ffe4ed10685, 0x7f225a965da0, ...)
 <> _intel_fast_memcpy.P(0x7ffe4ed10685, 0x7f225a965da0, ...)
 | > __intel_ssse3_rep_memcpy(0x7ffe4ed10685, 0x7f225a965da0, ...)
 | < __intel_ssse3_rep_memcpy+0x0000000023e7 returns: 0x7ffe4ed10685
 < _intel_fast_memcpy.P+0x000000000007 returns: 0x7ffe4ed10685
Return 0x000000000cf004ed /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:nioqrc+0x0000000002dd returns: 0
 > kslwtectx(0x7ffe4ed103c0, 0x7f225a965da1, ...)
 | > sltrgftime64(0x7ffe4ed103c0, 0x7f225a965da1, ...)
 | | > sltrgftime64_cgt(0x7ffe4ed103c0, 0x7f225a965da1, ...)
 | | | > clock_gettime@plt(0x1, 0x7ffe4ed10230, ...)
 | | | | > clock_gettime(0x1, 0x7ffe4ed10230, ...)
 | | | | < clock_gettime+0x000000000054 returns: 0
 | | | < clock_gettime+0x00000000006a returns: 0
 | | < sltrgftime64_cgt+0x000000000058 returns: 0x68724a87136
 | < sltrgftime64+0x00000000005c returns: 0x68724a87136
 | > kslwt_end_snapshot(0x2c674c080, 0x2c674c080, ...)
 | < kslwt_end_snapshot+0x000000000235 returns: 0x2c674b128
 | > kslwt_update_stats_int(0x2c674c080, 0x2c674c080, ...)
 | | > kews_update_wait_time(0x6, 0x53696ef, ...)
 | | < kews_update_wait_time+0x00000000002b returns: 0x1
 | < kslwt_update_stats_int+0x000000000314 returns: 0x1b0
 | > kskthewt(0x68724a87136, 0x184, ...)
 | < kskthewt+0x0000000000d7 returns: 0x23
 < kslwtectx+0x0000000003c0 returns: 0xe923

The rows that start with ‘Return’ are functions that were already entered when debugtrace started, and thus debugtrace could not determine the level, and reports an underflow in the output when the process returns from them.
The rows that start with ‘>’ mean the function after it is entered. The first two arguments are shown.
The rows that start with ‘<‘ mean the function after it is returned from. The return code is shown.
The rows that start with ‘<>’ mean a ‘tail call’ was executed, which is a function that is executed as part of the return of another function.

To further emphasise this example, the above trace starts with the server process waiting on input. On Linux, this is done using a read function (the __read_nocancel function comes from the libpthread operating system library), which is a place a server process is expected to wait and is actually in an Oracle wait event (SQL*Net message from client). It becomes apparent the process was in a wait event because the code for ending a wait event (kslwtectx) is executed after the __read_nocancel function. A word of warning: in this output, kslwtectx seems to be the beginning (left most) function. In reality, this is not true, there are many more functions, but the process has not returned to them so they are not known at that point of the trace. For the function a waiting database server process is sleeping in (__read_nocancel), this is sntpread>ntpfprd>nsbasic_brc>nsbrecv>nioqrc>opikndf2>opitsk>opiino>opiodr>opidrv>sou2o>opimai_real>ssthrdmain>main.

Okay, now let’s look into the relevant output of the ‘select * from t_is_encrypted’. If you wade through the trace, it’s best to first go to the line with ‘< kcbz_table_scan_read'. The kcbz_table_scan_read function performs a buffered multiblock read. '<' means the return, and you need to go up from there to find the first important bit. Here’s the important bit:

 | | | | | | | | | | | > ksfd_clear_pgaobjsc(0, 0x2b, ...)
 | | | | | | | | | | | < ksfd_clear_pgaobjsc+0x000000000023 returns: 0
 | | | | | | | | | | | > _intel_fast_memcpy(0x1a5ad6000, 0x2b1115000, ...)
 | | | | | | | | | | | <> _intel_fast_memcpy.P(0x1a5ad6000, 0x2b1115000, ...)
 | | | | | | | | | | | | > __intel_ssse3_rep_memcpy(0x1a5ad6000, 0x2b1115000, ...)
 | | | | | | | | | | | | < __intel_ssse3_rep_memcpy+0x00000000243e returns: 0x1a5ad6000
 | | | | | | | | | | | < _intel_fast_memcpy.P+0x000000000007 returns: 0x1a5ad6000
 | | | | | | | | | | | > _intel_fast_memcpy(0x182808000, 0x2b1117000, ...)
 | | | | | | | | | | | <> _intel_fast_memcpy.P(0x182808000, 0x2b1117000, ...)
 | | | | | | | | | | | | > __intel_ssse3_rep_memcpy(0x182808000, 0x2b1117000, ...)
 | | | | | | | | | | | | < __intel_ssse3_rep_memcpy+0x00000000243e returns: 0x182808000
 | | | | | | | | | | | < _intel_fast_memcpy.P+0x000000000007 returns: 0x182808000
 | | | | | | | | | | | > _intel_fast_memcpy(0x178ed0000, 0x2b1119000, ...)
 | | | | | | | | | | | <> _intel_fast_memcpy.P(0x178ed0000, 0x2b1119000, ...)
 | | | | | | | | | | | | > __intel_ssse3_rep_memcpy(0x178ed0000, 0x2b1119000, ...)
 | | | | | | | | | | | | < __intel_ssse3_rep_memcpy+0x00000000243e returns: 0x178ed0000
 | | | | | | | | | | | < _intel_fast_memcpy.P+0x000000000007 returns: 0x178ed0000
 | | | | | | | | | | | > _intel_fast_memcpy(0x12927c000, 0x2b111b000, ...)
 | | | | | | | | | | | <> _intel_fast_memcpy.P(0x12927c000, 0x2b111b000, ...)
 | | | | | | | | | | | | > __intel_ssse3_rep_memcpy(0x12927c000, 0x2b111b000, ...)
 | | | | | | | | | | | | < __intel_ssse3_rep_memcpy+0x00000000243e returns: 0x12927c000
 | | | | | | | | | | | < _intel_fast_memcpy.P+0x000000000007 returns: 0x12927c000
 | | | | | | | | | | | > _intel_fast_memcpy(0x15411c000, 0x2b111d000, ...)
 | | | | | | | | | | | <> _intel_fast_memcpy.P(0x15411c000, 0x2b111d000, ...)
 | | | | | | | | | | | | > __intel_ssse3_rep_memcpy(0x15411c000, 0x2b111d000, ...)
 | | | | | | | | | | | | < __intel_ssse3_rep_memcpy+0x00000000243e returns: 0x15411c000
 | | | | | | | | | | | < _intel_fast_memcpy.P+0x000000000007 returns: 0x15411c000
 | | | | | | | | | | | > kcbz_free_table_scan_read_buffer(0x2b1115000, 0xa000, ...)
 | | | | | | | | | | | | > kslgetl(0x60023a80, 0x1, ...)
 | | | | | | | | | | | | < kslgetl+0x000000000144 returns: 0x1
 | | | | | | | | | | | | > kslfre(0x60023a80, 0x8, ...)
 | | | | | | | | | | | | < kslfre+0x000000000125 returns: 0
 | | | | | | | | | | | < kcbz_free_table_scan_read_buffer+0x0000000000f3 returns: 0
 | | | | | | | | | | < kcbz_table_scan_read+0x000000000664 returns: 0x5

Going up from the kcbz_table_scan_read return, we see a call to kcbz_free_table_scan_read_buffer, which gets and frees a latch (‘cache table scan latch’). Above that we essentially see the table scan read buffer contents being copied into the buffer cache per block using an optimised version of memcpy. Luckily, debugtrace shows the first two arguments of a call; for memcpy this is a pointer to the destination as first argument, and a pointer to the source as second. If you look closely at the memcpy arguments, you see that the destination is random (“scattered” through the buffer cache, which is where ‘db file scattered read’ gets its name from), and the source starts at 0x2b1115000 and increases by exactly 0x2000. 0x2000 equals 8192, which is the size of the blocks. The reason for starting here is that the blocks are further processed in the buffer cache buffers in the order in which they are copied in kcbz_table_scan_read.

The first buffer copied is 0x1a5ad6000. The next sequence of functions:

 | | | | | | | | | | > kcbzvb(0x1a5ad6000, 0x5, ...)
 | | | | | | | | | | | > kcbhvbo(0x1a5ad6000, 0x2000, ...)
 | | | | | | | | | | | < kcbhvbo+0x0000000000cc returns: 0
 | | | | | | | | | | < kcbzvb+0x000000000084 returns: 0x1
 | | | | | | | | | | > kcbztek_trace_blk(0xd5e15b8, 0, ...)
 | | | | | | | | | | < kcbztek_trace_blk+0x00000000001d returns: 0x1a5fce498
 | | | | | | | | | | > _setjmp@plt(0x7ffe4ed04ed0, 0, ...)
 | | | | | | | | | | <> __sigsetjmp(0x7ffe4ed04ed0, 0, ...)
 | | | | | | | | | | <> __sigjmp_save(0x7ffe4ed04ed0, 0, ...)
 | | | | | | | | | | < __sigjmp_save+0x000000000010 returns: 0
 | | | | | | | | | | > kcbz_encdec_tbsblk(0x5, 0x38b, ...)

Next is a function ‘kcbzvb’, which probably is kernel cache buffers space management (Z) verify buffer. Then kcbztek_trace_blk. My current thinking is that this function reads the block to see if the encryption flag is set in the cache header and whether it has not been decrypted already. When a block is not encrypted, or is encrypted and already decrypted, kcbz_encdec_tbsblk is not called; in this case, the block is encrypted and not decrypted yet, so we see the function kcbz_encdec_tbsblk being called. A lot of function calls are done inside kcbz_encdec_tbsblk which strongly indicate something to do with keys and encryption, like kcbtse_get_tbskey, kcbtse_get_tbskey_info and a lot of functions starting with kgce (kgce_init, kgce_set_mode, kgce_set_key), most notably the function kgce_dec, which probably is kernel generic encryption – decryption. Here is how that function returns and what is executed next (still inside the function kcbz_encdec_tbsblk):

 | | | | | | | | | | | < kgce_dec+0x0000000000a6 returns: 0
 | | | | | | | | | | | > _intel_fast_memcpy(0x1a5ad6014, 0x7f225a96c0f0, ...)
 | | | | | | | | | | | <> _intel_fast_memcpy.P(0x1a5ad6014, 0x7f225a96c0f0, ...)
 | | | | | | | | | | | | > __intel_ssse3_rep_memcpy(0x1a5ad6014, 0x7f225a96c0f0, ...)
 | | | | | | | | | | | | < __intel_ssse3_rep_memcpy+0x0000000025e8 returns: 0x1a5ad6014
 | | | | | | | | | | | < _intel_fast_memcpy.P+0x000000000007 returns: 0x1a5ad6014
 | | | | | | | | | | | > kcbhxoro(0x1a5ad6000, 0x2000, ...)
 | | | | | | | | | | | | > sxorchk(0x1a5ad6000, 0x2000, ...)
 | | | | | | | | | | | | < sxorchk+0x0000000000e6 returns: 0x46dd
 | | | | | | | | | | | < kcbhxoro+0x000000000013 returns: 0x46dd

The kgce_dec function returns. It is my assumption the decryption of the block is done. Next there is another memcpy function. The arguments of memcpy are *dest, *src. The first address is the same address we saw earlier, but at 0x1a5ad6014 instead of 0x1a5ad6000, so 0x14 (20) bytes past the start of the block. The source is 0x7f225a96c0f0. In most cases these high numbers are PGA memory addresses. So what I think happens here is: the block was copied into the buffer, then kcbztek_trace_blk found the encryption flag and the block encrypted, so the kcbz_encdec_tbsblk function has been called to decrypt the block, which is done in PGA, and then the decrypted version of the block is copied into the buffer, overwriting the encrypted version.

The next function kcbhxoro probably is a function to check the block. This must mean the checking is done on the unencrypted block contents. Please mind we are still in kcbz_encdec_tbsblk!

Continuing the trace we see:

 | | | | | | | | | | | > kcbchk_ctx(0x1a5ad6000, 0, ...)
 | | | | | | | | | | | | > ktbdbchk(0x1a5ad6000, 0, ...)
 | | | | | | | | | | | | | > kdgchk(0x1a5ad6000, 0, ...)
 | | | | | | | | | | | | | | > kd4chk(0x1a5ad6000, 0x7fffffc7, ...)
 | | | | | | | | | | | | | | | > kdb4chk1(0x1a5ad6014, 0, ...)
 | | | | | | | | | | | | | | | | > kd4_alloc_ok(0xc9cee30, 0xc9cee80, ...)
 | | | | | | | | | | | | | | | | | > kcbtmal(0, 0x4, ...)
 | | | | | | | | | | | | | | | | | | > kghalf(0x7f225f080c80, 0x7f225f085980, ...)
 | | | | | | | | | | | | | | | | | | | > _intel_fast_memset(0x7f225ef1ffb0, 0, ...)
 | | | | | | | | | | | | | | | | | | | <> _intel_fast_memset.J(0x7f225ef1ffb0, 0, ...)
 | | | | | | | | | | | | | | | | | | | | > __intel_memset(0x7f225ef1ffb0, 0, ...)
 | | | | | | | | | | | | | | | | | | | | < __intel_memset+0x00000000031e returns: 0x7f225ef1ffb0
 | | | | | | | | | | | | | | | | | | | < _intel_fast_memset.J+0x000000000007 returns: 0x7f225ef1ffb0
 | | | | | | | | | | | | | | | | | | < kghalf+0x00000000061f returns: 0x7f225ef1ffb0

These are the apparent check functions! We also see they take the buffer address as first argument. This is where perf measured the majority of the time being spent (kdr4chk, not in this snippet, and kdb4chk1)! The last noteworthy things in the trace for handling this block are:

 | | | | | | | | | | | < kcbchk_ctx+0x0000000000ff returns: 0
 | | | | | | | | | | | > kgce_fini(0x7f225f080c80, 0x7ffe4ed03b98, ...)
 | | | | | | | | | | | | > skgce_end_ipp_f(0x7f225f080c80, 0x7ffe4ed03b98, ...)
 | | | | | | | | | | | | < skgce_end_ipp_f+0x000000000014 returns: 0
 | | | | | | | | | | | < kgce_fini+0x000000000037 returns: 0
 | | | | | | | | | | < kcbz_encdec_tbsblk+0x000000001f36 returns: 0

Here we see the return from the kcbchk_ctx call, which was identified above as the apparent check function, then another call to a function in the encryption family (kgce_fini), which clearly indicates it’s stopping the encryption/decryption functions (“fini”), and then the return from the kcbz_encdec_tbsblk function. There are some functions executed after kcbz_encdec_tbsblk, like kcbz_check_objd_typ. A little further on you will see the whole sequence starting again for the next block with the function kcbzvb.

This makes me wonder. We saw the block being read from disk into the table_scan_read_buffer, from the read buffer into the cache, then being decrypted, then being checked. I turned db_block_checking and db_block_checksum off, so why would there be a need for checking? But let’s not get carried away, I made some assumptions. Am I right in my assumptions?

If you look at the logical flow it’s:
– table scan read buffer -> buffer cache buffer (kcbz_table_scan_read>_intel_fast_memcpy)
– identify if block needs decrypting (kcbz_encdec_tbsblk>kcbztek_trace_blk)
– decrypt buffer cache buffer (kcbz_encdec_tbsblk>kgce_dec)
– check buffer cache buffer (kcbz_encdec_tbsblk>kcbchk_ctx)

If I can identify the buffer, then stop the execution at kcbztek_trace_blk to look at the buffer and check whether it is encrypted, then stop at kcbchk_ctx and look at the same buffer again and see if it is decrypted, then my assumptions that the kgce_dec function is doing the decryption and that the checking is based on the unencrypted block contents are correct.

In order to do this, we can use gdb and have the function arguments of memcpy printed, break at the function kcbztek_trace_blk, print the buffer, then continue and break at kcbchk_ctx and print the buffer again. This is how I created my gdb macro:

break _intel_fast_memcpy
  commands
    silent
    printf "_intel_fast_memcpy dest: %lx, src: %lx, size: %d\n", $rdi, $rsi, $rdx
    c
  end
break kcbztek_trace_blk
  commands
    silent
    print $rip
  end
break kcbchk_ctx
  commands
    silent
    print $rip
  end

Now create a database session, flush the buffer cache (alter system flush buffer_cache) to make sure the small table t_is_encrypted is read from disk again, and thus decrypted, and attach to it with gdb and source the macro:

$ gdb -p 57531 -x tde_verification.gdb
...
(gdb) set pagi off
(gdb) c
Continuing.

I turned pagination off (set pagi off) because there is a lot of memcpy going on, otherwise gdb will pause execution if the screen is filled, and used the continue (c) command, so we can use the sqlplus session again. Now execute ‘select * from t_is_encrypted’, this is what it looks like in the debugger:

_intel_fast_memcpy dest: 7f85093a7f0a, src: 7ffd589f6690, size: 7
_intel_fast_memcpy dest: 7f85093a7f11, src: 7ffd589f6624, size: 4
_intel_fast_memcpy dest: 7f85093a7f15, src: 7ffd589f6628, size: 4
_intel_fast_memcpy dest: 7f85093a7f19, src: 7ffd589f662c, size: 4
_intel_fast_memcpy dest: 7f85093a7f1d, src: 7ffd589f6630, size: 4
_intel_fast_memcpy dest: 7f85093bac08, src: 7ffd589f3238, size: 336
_intel_fast_memcpy dest: 7ffd589f0e30, src: 2b7f89c20, size: 456
_intel_fast_memcpy dest: 7ffd589f3238, src: 7f85093bac08, size: 336
$1 = (void (*)()) 0xca1f7f0 <kcbztek_trace_blk>

This is actually reading the segment header, so we continue (c).

(gdb) c
Continuing.
_intel_fast_memcpy dest: 7f8508bf86a8, src: 7f8508bf8060, size: 712
_intel_fast_memcpy dest: 7f8508bf86c0, src: 7f8508bf8078, size: 264
_intel_fast_memcpy dest: 7f85093bac08, src: 7ffd589eb308, size: 336
_intel_fast_memcpy dest: 7ffd589e8f00, src: 2b7f89c20, size: 456
_intel_fast_memcpy dest: 7ffd589eb308, src: 7f85093bac08, size: 336
_intel_fast_memcpy dest: 1a5576000, src: 2b1115000, size: 8192
_intel_fast_memcpy dest: ee6b8000, src: 2b1117000, size: 8192
_intel_fast_memcpy dest: 12eb54000, src: 2b1119000, size: 8192
_intel_fast_memcpy dest: 200c56000, src: 2b111b000, size: 8192
_intel_fast_memcpy dest: 1a570e000, src: 2b111d000, size: 8192
$2 = (void (*)()) 0xca1f7f0 <kcbztek_trace_blk>

Here we see kcbztek_trace_blk again, and we see memcpy executed 5 times with a size of 8192. These are the blocks copied from the table scan read buffer! The buffer we are going to look at is 0x1a5576000. In order to examine the block, we are going to print 8192 bytes starting from 0x1a5576000:

(gdb) x/8192bx 0x1a5576000
0x1a5576000:	0x06	0xa2	0x00	0x00	0x8b	0x03	0x00	0x00
0x1a5576008:	0xf4	0x9a	0x4e	0x00	0x00	0x00	0x01	0x16
0x1a5576010:	0x3a	0xf2	0x00	0x00	0x30	0x7f	0xf4	0x80
0x1a5576018:	0x5f	0xf1	0xd5	0xe9	0xd0	0x5a	0xeb	0x59
0x1a5576020:	0x5c	0x7c	0x90	0x87	0x21	0x46	0x44	0xf3
0x1a5576028:	0x66	0xa2	0x6a	0x3d	0xd4	0x97	0x6d	0x1e
0x1a5576030:	0xab	0xf8	0x06	0xd4	0x17	0x5c	0xc9	0x1a
0x1a5576038:	0xe6	0x86	0xf4	0x5a	0x20	0xbe	0x8d	0x22
0x1a5576040:	0x0c	0x2a	0x29	0x2b	0xa1	0x2a	0x3b	0x86
0x1a5576048:	0x4b	0x31	0x13	0xa7	0x4c	0xcd	0x4c	0xa4
0x1a5576050:	0xf1	0x1a	0xa3	0x59	0xf0	0x70	0x31	0xd5
...snip...
0x1a5577fb8:	0xf8	0x24	0x90	0x74	0xa3	0x23	0x65	0xf6
0x1a5577fc0:	0xee	0x76	0x6c	0xf5	0x8b	0xaf	0x23	0x7e
0x1a5577fc8:	0xc4	0x5b	0xbd	0x5f	0x26	0xb8	0x5e	0x3b
0x1a5577fd0:	0x65	0x96	0x48	0x88	0xb3	0x37	0x10	0x15
0x1a5577fd8:	0x3f	0xd8	0x62	0x54	0x4f	0xa3	0xf3	0x60
0x1a5577fe0:	0xfc	0x2e	0x9d	0xf4	0xc6	0x49	0xbc	0x4f
0x1a5577fe8:	0x85	0x3b	0x8a	0xd1	0xa3	0xc8	0x61	0x72
0x1a5577ff0:	0xfd	0xb8	0xe1	0x2e	0x65	0x92	0x0d	0x70
0x1a5577ff8:	0x50	0xd8	0x34	0x06	0x01	0x06	0xf4	0x9a

I can’t tell from looking at the hexadecimal numbers if this block is encrypted or not. However, this block is supposed to have a single row containing ‘AAAAAAAAAA’, which means there is supposed to be a lot of empty space. I can’t find a repeating value, and the entire block contains values. This might be the result of the block being encrypted. Now, if my assumption is correct, when we continue and execution breaks on kcbchk_ctx, the same buffer should look entirely different, because it will then have been decrypted by kgce_dec:

(gdb) c
Continuing.
_intel_fast_memcpy dest: 7ffd589eb3f0, src: 2a5e2ced8, size: 240
_intel_fast_memcpy dest: 7ffd589ea860, src: e7ef774, size: 15
_intel_fast_memcpy dest: 7ffd589eaa80, src: 7ffd589eb3c8, size: 16
_intel_fast_memcpy dest: 1a5576014, src: 7f8508ccc908, size: 8168
$3 = (void (*)()) 0xc9ceed0 <kcbchk_ctx>
(gdb) x/8192bx 0x1a5576000
0x1a5576000:	0x06	0xa2	0x00	0x00	0x8b	0x03	0x00	0x00
0x1a5576008:	0xf4	0x9a	0x4e	0x00	0x00	0x00	0x01	0x06
0x1a5576010:	0xdd	0x46	0x00	0x00	0x01	0x00	0x00	0x00
0x1a5576018:	0xc3	0x7b	0x01	0x00	0xed	0x9a	0x4e	0x00
0x1a5576020:	0x00	0x00	0x00	0x00	0x02	0x00	0x32	0x00
0x1a5576028:	0x88	0x03	0x00	0x00	0x07	0x00	0x1d	0x00
0x1a5576030:	0xd0	0x01	0x00	0x00	0x87	0x09	0x00	0x00
0x1a5576038:	0x90	0x03	0x27	0x00	0x01	0x20	0x00	0x00
0x1a5576040:	0xf4	0x9a	0x4e	0x00	0x00	0x00	0x00	0x00
0x1a5576048:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x1a5576050:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x1a5576058:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x1a5576060:	0x00	0x00	0x00	0x00	0x00	0x01	0x01	0x00
0x1a5576068:	0xff	0xff	0x14	0x00	0x8a	0x1f	0x76	0x1f
0x1a5576070:	0x76	0x1f	0x00	0x00	0x01	0x00	0x8a	0x1f
0x1a5576078:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x1a5576080:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x1a5576088:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
...snip...
0x1a5577fa8:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x1a5577fb0:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x1a5577fb8:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x1a5577fc0:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x1a5577fc8:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x1a5577fd0:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x1a5577fd8:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x1a5577fe0:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x1a5577fe8:	0x00	0x00	0x00	0x00	0x00	0x00	0x2c	0x01
0x1a5577ff0:	0x01	0x0a	0x41	0x41	0x41	0x41	0x41	0x41
0x1a5577ff8:	0x41	0x41	0x41	0x41	0x01	0x06	0xf4	0x9a

After continuing, we see a couple of memcpy’s, after which the debugger breaks execution when kcbchk_ctx is called. We execute the examine command again, and now the block looks way different. There are way more zeroed locations, the place of the different headers is visible because of the zeros in between, and starting from 0x1a5576078 it’s all zero until 0x1a5577fe8. We see a repeating value 0x41. Is that my ‘A’?

(gdb) p (char) 0x41
$2 = 65 'A'

Yes! These are the 10 A’s!

So, this means that before we enter the kcbchk_ctx function inside kcbz_encdec_tbsblk the block is actually already decrypted, and kcbchk_ctx performs something that takes a lot of the time, which seems to be checking, based on the function naming. Because the block is already decrypted when we enter kcbchk_ctx, it seems to me that the checking step is technically optional.

Further background analysis
I stated that with tablespace encryption the block is decrypted when read into the cache, that non-data blocks are not encrypted and that once the block is decrypted in the cache, it can be read without any encryption overhead because it is not encrypted (the block being decrypted in the cache is actually documented by Oracle). This actually can be proven with a debugger macro:

break pread64
  commands
    silent
    print $rip
    c
  end
break kgce_dec
  commands
    silent
    print $rip
    c
  end
break kcbchk_ctx
  commands
    silent
    print $rip
    c
  end
break kcbz_encdec_tbsblk
  commands
    silent
    print $rip
    c
  end
break kcbztek_trace_blk
  commands
    silent
    print $rip
    c
  end
break kdr4chk
  commands
    silent
    print $rip
    c
  end
break kcbgtcr
  commands
    silent
    print $rip
    c
  end

If you flush the buffer cache, and then attach to a foreground session with gdb and source the above macro, you get the following output:

$55 = (void (*)()) 0xc9d9a20 <kcbgtcr>
$56 = (void (*)()) 0x7fd2c72b8100 <pread64>
$57 = (void (*)()) 0xca1f7f0 <kcbztek_trace_blk>
$58 = (void (*)()) 0xc9d9a20 <kcbgtcr>
$59 = (void (*)()) 0xc9d9a20 <kcbgtcr>
$60 = (void (*)()) 0x7fd2c72b8100 <pread64>
$61 = (void (*)()) 0xca1f7f0 <kcbztek_trace_blk>
$62 = (void (*)()) 0x71d5320 <kcbz_encdec_tbsblk>
$63 = (void (*)()) 0x3923750 <kgce_dec>
$64 = (void (*)()) 0xc9ceed0 <kcbchk_ctx>
$65 = (void (*)()) 0xcdd26a0 <kdr4chk>
$66 = (void (*)()) 0xca1f7f0 <kcbztek_trace_blk>
$67 = (void (*)()) 0x71d5320 <kcbz_encdec_tbsblk>
$68 = (void (*)()) 0x3923750 <kgce_dec>
$69 = (void (*)()) 0xc9ceed0 <kcbchk_ctx>
$70 = (void (*)()) 0xca1f7f0 <kcbztek_trace_blk>
$71 = (void (*)()) 0x71d5320 <kcbz_encdec_tbsblk>
$72 = (void (*)()) 0x3923750 <kgce_dec>
$73 = (void (*)()) 0xc9ceed0 <kcbchk_ctx>
$74 = (void (*)()) 0xca1f7f0 <kcbztek_trace_blk>
$75 = (void (*)()) 0x71d5320 <kcbz_encdec_tbsblk>
$76 = (void (*)()) 0x3923750 <kgce_dec>
$77 = (void (*)()) 0xc9ceed0 <kcbchk_ctx>
$78 = (void (*)()) 0xca1f7f0 <kcbztek_trace_blk>
$79 = (void (*)()) 0x71d5320 <kcbz_encdec_tbsblk>
$80 = (void (*)()) 0x3923750 <kgce_dec>
$81 = (void (*)()) 0xc9ceed0 <kcbchk_ctx>
$82 = (void (*)()) 0xc9d9a20 <kcbgtcr>
$83 = (void (*)()) 0xc9d9a20 <kcbgtcr>
$84 = (void (*)()) 0xc9d9a20 <kcbgtcr>
$85 = (void (*)()) 0xc9d9a20 <kcbgtcr>

Rows 1-4: First we see kcbgtcr (kernel cache buffers get consistent read); as I have made clear in the past, all (normal) reads in the Oracle database start off as a logical read; but since we just flushed the cache, the segment header needs to be read from disk, which explains the pread64. After the single block read, the function kcbztek_trace_blk is executed, but because this is a non-data block, no encryption is applied to this block, so there is no decryption to be done.
Rows 5-11: Then the multiblock read is initiated, which is executed via a pread64. After the multiblock read, the blocks are put in the buffer cache. For the first block, we see the kcbztek_trace_blk function being executed, and because this block is encrypted, the function kcbz_encdec_tbsblk is invoked, and then the actual decryption in the function kgce_dec. After performing decryption, the apparent checking function kcbchk_ctx is called, and inside that function kdr4chk is executed (which checks an individual row, we got one row).
Rows 12-15,16-19,20-23,24-27: The other blocks that are read do not contain any data, however since they are data blocks, you see kcbztek_trace_blk which finds the encryption flag, after which kcbz_encdec_tbsblk is called, which calls kgce_dec, and then kcbchk_ctx. kdr4chk is not called, because there are no rows.

However, what is interesting is what will be called when we execute the same full table scan again over the 5 blocks now that they are in the buffer cache:

$86 = (void (*)()) 0xc9d9a20 <kcbgtcr>
$87 = (void (*)()) 0xc9d9a20 <kcbgtcr>
$88 = (void (*)()) 0xc9d9a20 <kcbgtcr>
$89 = (void (*)()) 0xc9d9a20 <kcbgtcr>
$90 = (void (*)()) 0xc9d9a20 <kcbgtcr>
$91 = (void (*)()) 0xc9d9a20 <kcbgtcr>
$92 = (void (*)()) 0xc9d9a20 <kcbgtcr>

This is exactly how it is documented to work! The foreground process now can just use the consistent read mechanism to read the blocks in the cache; there is no encryption codepath called. Please mind that when you change blocks in the cache, the change does not incur any additional overhead either, because it works 100% the same as for a non-encrypted block. However, what does get a performance penalty is writing the blocks to disk (!). The database writer needs to encrypt the blocks before it can store them on disk! This means that DML intensive databases that switch to tablespace encryption might need a bigger buffer cache, because more time and resources are needed to write the blocks to disk, which might lead to more dirty buffers in the cache, and thus fewer buffers available for other usage.
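
Not from the original post, but a crude way to keep an eye on whether dirty buffers pile up because the database writer (which now has to encrypt on write) cannot keep up, is to count buffer states in V$BH, for example:

SQL> select status, dirty, count(*) from v$bh group by status, dirty order by status, dirty;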

Also mind that encryption functions of tablespace encryption get called as part of the IO codepath:

#0  0x00000000071d5320 in kcbz_encdec_tbsblk ()
#1  0x00000000012e8e7c in kcbzibmlt ()
#2  0x00000000012dd9d3 in kcbzib ()
#3  0x000000000c9dbda9 in kcbgtcr ()
#4  0x000000000c9c0ac5 in ktrget2 ()
#5  0x000000000ccbc786 in kdstgr ()
#6  0x000000000cd0a0e9 in qertbFetch ()
#7  0x000000000cb6e42b in opifch2 ()
#8  0x00000000021e49f3 in kpoal8 ()
#9  0x000000000cb75a5d in opiodr ()
#10 0x000000000cda421b in ttcpip ()
#11 0x0000000001baba3c in opitsk ()
#12 0x0000000001bb0401 in opiino ()
#13 0x000000000cb75a5d in opiodr ()
#14 0x0000000001ba7b8b in opidrv ()
#15 0x00000000026b31b1 in sou2o ()
#16 0x0000000000bbda7a in opimai_real ()
#17 0x00000000026bd8fc in ssthrdmain ()
#18 0x0000000000bbd9bc in main ()

1. kcbz_encdec_tbsblk – this is the master encryption routine.
2. kcbzibmlt – this is the kernel cache space management insert block multiblock routine.
3. kcbzib – this is the kernel cache space management insert block routine.
4. kcbgtcr – this is the consistent read call.
5. ktrget2 – this is the kernel transaction layer buffer get routine.
6. kdstgr – this is the kernel data scan table get rows routine (full table scan).
7. qertbFetch – this is the SQL execution rowsource tablescan fetch routine.
In other words: the decryption is in the IO codepath, or: is executed as part of the IO routines.

Update April 4, 2017:
Further investigation showed the checking (entering the kcbchk_ctx function) CAN be eliminated when the undocumented parameter “_disable_block_checking” is set to FALSE.
There is a bug logged for this issue: 12943305 – decryption performance can be improved, and it has a fix. This is reported to be fixed in 12.2, and patches are available for the different 12.1.0.2 PSUs.

Fixing Development Mistakes from the Command Line with Delphix TimeFlow

How many times have you had a developer come to you and say, “I just did a bad thing in the database.  Can you recover from what I just did?”

With Delphix virtualization, we make this pretty easy to address from the user interface with a simple slider to recover from a PIT before the catastrophic mistake, but today, we’ll discuss how to do this from the command line.

1. Log into the Delphix engine as an admin user.

ssh delphix_admin@
delphix > timeflow
delphix timeflow > ls

2. Depending on the platform that you’re using (in our example, we’ll use Oracle), you’ll see the list of the databases available and can choose the one that you want to refresh to a point before the catastrophic incident from the clueless developer.

delphix database> select [VDB name]

3. We can do a simple rollback if we just want to go back to the last snapshot, or we can use the ls command to see more options:

delphix database "[VDB Name]"> rollback
delphix database "[VDB Name]" rollback *> ls
Properties
    type: OracleRollbackParameters
    credential: (unset)
    timeflowPointParameters:
        type: TimeflowPointSemantic
        container: (required)
        location: LATEST_POINT
    username: (unset)

4. So we’ve decided to do a PIT recovery after the mistake and use the following command and then commit the changes:

delphix database "[VDB Name]" rollback *> set timeflowPointParameters.location=82439
delphix database "[VDB Name" rollback *> commit

That’s all there is to it.

So what if the developer is incompetent and screws up repeatedly?

Follow steps 1-4 above and then to purge the problem from the environment, run the following command from the delphix engine:

delphix database developer [developer employee ID] remove *> commit

An eject button we’ve installed in the delphix engine will remove the developer from the premises and Delphix will even submit all necessary paperwork to Human Resources to complete his termination processing.

If you’d like to automate the process, you can create a handy script that simply asks for the following parameters by calling it from any shell, PowerShell for Windows or even Jenkins as part of DevOps!

#0000ff;">Happy April 1st!

 

 

 

 



Tags:  


Del.icio.us



Facebook

TweetThis

Digg

StumbleUpon




Copyright © DBA Kevlar [Fixing Development Mistakes from the Command Line with Delphix TimeFlow], All Right Reserved. 2017.

The post Fixing Development Mistakes from the Command Line with Delphix TimeFlow appeared first on DBA Kevlar.

Using virt-install/KVM and kickstart to create lots of identical VMs quickly

In preparation for an upcoming talk about Oracle 12.2 I am planning on getting my feet wet with the Oracle 12.2 Sharding option. It seems to be a very interesting subject to get into, but like all new subjects, it can be intimidating at first due to its complexity. So why not use the instructions someone has already written down?

Oracle has a nice series of articles about Sharding. I found the Sharding landing page in the Maximum Availability Architecture section on OTN. For the purpose of this blog post I’m intending to follow the “Oracle Sharded Database Deployment with Active Data Guard using CREATE SHARD method for On-Premises – Cookbook”, available from the aforementioned landing page.

About those hardware requirements

Well that’s what I initially had planned. What I hadn’t expected was the number of VMs to be created. On page 7 in the aforementioned PDF, the author explains the pre-requisites. Copying verbatim from the document:

You must acquire brand new 7 VMs which do not have any pre-existing Oracle database or listeners running on them.

Well, that’s 7 VMs. Not an awful lot of fun if you have to install all of them manually. Thankfully I don’t have to! In my lab environments I run KVM plus libvirt to make my life easier. There are many useful aspects of this combination, and one of them is named virt-install. The tool allows you to create a VM plus storage, networking etc entirely on the command line. But that would still require the administrator to navigate the anaconda screens, something I’d like to avoid. For quite some time administrators have deployed systems using an automation option. On Red Hat Linux and clones that mechanism is called “Kickstart”. Kickstart – documented in the Red Hat Installation Guide chapter 23 – is an immensely powerful way of deploying systems in a consistent, and always correct (or wrong :) way. In this article I’d like to share how you could run virt-install to install VMs with minimum effort.

Say hello to virt-install

virt-install is beautifully documented making it easy to create VMs. The syntax however takes a little while to understand. I have previously written about virt-install but those posts are now showing their age, and a new example seems in order.

Before presenting the command to create the VM let’s think about the machine’s capabilities first.

  • VM to host a single instance Oracle database and some auxiliary software
  • Should probably have at least two vcpus
  • Eight GB of DRAM should be the minimum
  • I like to separate the root file system from the application data, e.g. I want two volume groups: rootvg and oraclevg

Let’s translate that to virt-install; this works on virt-install 1.4.0 which is the current version on my Oracle Linux 7.3 host.

# virt-install --name shard0 --memory 8192 --memorybacking hugepages=yes \
--vcpus 2 --metadata description=shard0_vm --location /path/to/OL7.3.iso \
--initrd-inject=/tmp/shard0.ks --os-variant=rhel7.3 \
--extra-args="ks=file:/shard0.ks console=tty0 console=ttyS0,115200n8"  \
--disk size=12,pool=capacitypool --disk size=50,pool=latencypool \
--network network=public --nographics

This worked for me: the virt-install command (sourcing the kickstart file) creates and configures a fresh VM with all the required packages for Oracle 12c. I couldn’t reference the 12.2 preinstall RPM because it’s not part of the Oracle Linux 7.3 ISO. That’s ok for me, I can wait for the 7.4 ISO when this might be part of the standard distribution. As a workaround you could install the 12.2 preinstall RPM once the new VM is installed, the RPM can be found in the OL7 “latest” yum channel.

virt-install what?

The virt-install command might require an explanation or two. All of these parameters are explained in the man-page, the ones that stick out a bit are covered here.

  • name – This is the name the VM gets. You can manage it using virsh, the libvirt command line interface, or its GUI cousin, virt-manager
  • memorybacking – Using this command I instruct the VM to use large pages in the hope that it simplifies the hypervisor’s job of memory management. Obviously this requires large pages to be set up and enough of them available
  • location – As explained in the docs you can make an ISO available to virt-install that serves as the installation source. Doing so requires you to run virt-install as root
  • initrd-inject and extra-args – Using these together in combination with the location flag allows me to stick a kickstart file through to the VM. In the kickstart file I configured partitioning, networking, firewall/selinux settings as well as packages. I can even have a %post section in which I can run arbitrary shell commands. The other parameters are taken from the Red Hat documentation and allow me to install the VM in console mode. Requires the “text” setting in the kickstart file, and the nographics flag
  • disk – Using the disk directive I request 2 qcow2 virtual disk images to be created. I place the O/S on the capacity pool (backed by hard disk – a lot of capacity but dog slow) and the Oracle related data on a virtual disk backed by the SSDs in the machine.
  • network – In a similar way I am instructing the VM to connect to the network I defined beforehand, named public.
  • nographics – If you want to make sure you don’t need X11 at all to install the VM, you can use nographics. You have to provide a “text” directive in your kickstart file and most likely set the console to something sensible, see above for an example

I have said it before and I say it again: libvirt is a fantastic tool, and so easy to use. If you spend a little bit of time preparing your environment, it gets even simpler. Here are a couple of examples:

Before creating VMs on my host I created a couple of storage pools, namely capacity and latency. Those pools are where I instruct the storage to be taken from. This way I can simply instruct libvirt to create a volume of x GB in the pool and assign it to the VM. The order matters: the 12 GB disk will become /dev/vda (KVM uses virtio drivers), the second one /dev/vdb. The partitioning scheme is defined in the kickstart file.

Likewise, I can create a network that all my VMs can connect to. On the back-end, libvirt created iptables rules, and a network bridge:

# virsh net-info public
Name:        public
UUID:        some long identifier
Active:      yes
Persistent:  yes
Autostart:   yes
Bridge:      virbr1

VMs that are part of this network – including the host (x.x.x.1) – can communicate with each other.

I hope I could get you a little interested in libvirt. Have a look at the documentation, it is very good and explains these topics in more detail.

If you haven’t used virt-install previously this might look uninspiring. However using the commands above I don’t need to set up the installation source via NFS, SMB or HTTP. Neither do I need to provide the kickstart file via a web server, keeping setup requirements very low.

What about the kickstart file? Where do I get one?

I had the same question when I first started with the tool. The documentation is quite good, but a working example would be nice. Turns out you have one: it’s created by default in /root/anaconda-ks.cfg and, just like an Oracle response file, contains a recording of the options you entered in the GUI. Have a look at it, and adjust it to your needs.