Search

OakieTags

Who's online

There are currently 0 users and 26 guests online.

Recent comments

Affiliations

Oakies Blog Aggregator

Solaris Eye for the Linux Guy… Part III (hugepages = ISM)

This post has been a long time coming but recently, I have started working on some SPARC SuperCluster POC’s with customers and I am getting re-acquainted with my old friend Solaris and SPARC.

If you are a Linux performance guy you have likely heard of HugePages.   Huge pages are used to increase the performance of large memory machines but requiring fewer TLB‘s .  I am not going to go into the details TLB’s, but every modern chip supports multiple memory page sizes.

So how do you get huge pages with Solaris?

Do nothing – it is the DEFAULT with Oracle running on Solaris.

The “use_ism” parameter used to be used to control this, but it has been moved to the “_underbar” category these days since there is really no reason whatsoever to change it.   I remember doing tests back in the Solaris 8 days with/without ISM pages to show the performance differences and truly it was and still is a good thing.

How are ISM/Huge pages used with Oracle running on Solaris?

At first, ISM pages are only used for the SGA so OLTP style environments benefited the most from ISM.   With Oracle 10 on Solaris, it was also allowed for the PGA.  So, if you were doing have PGA activity like a HASH join or sort, you would benefit from larger page sizes as well.

With Solaris, it is easy to see if the page sizes of any running process by simply running the “pmap(2)” command. 

root@ssc401:~# pmap -xs 23189
 23189: ora_pmon_HC1
 Address Kbytes RSS Anon Locked Pgsz Mode Mapped File
 0000000100000000 64 64 - - 8K r-x-- oracle
 0000000100010000 48 48 - - - r-x-- oracle
 000000010001C000 64 64 - - 8K r-x-- oracle
 ...
 000000010D420000 256 256 64 - 64K rwx-- oracle
 000000010D460000 64 64 - - - rwx-- oracle
 ...
 ... 
 000000010D540000 2304 2304 2304 - 64K rwx-- [ heap ]
 0000000380000000 262144 262144 - 262144 256M rwxsR [ ism shmid=0xf00007e ]
 0000000390000000 65536 65536 - 65536 4M rwxsR [ ism shmid=0xf00007e ]
 0000000400000000 31457280 31457280 - 31457280 2G rwxsR [ ism shmid=0x600007f ]
 0000000B80000000 1572864 1572864 - 1572864 256M rwxsR [ ism shmid=0x600007f ]
 0000000BE0000000 196608 196608 - 196608 4M rwxsR [ ism shmid=0x600007f ]
 0000000C00000000 24 24 - 24 8K rwxsR [ ism shmid=0x7000000 ]
 FFFFFFFF5A800000 16 16 - - 8K r-x-- libodm11.so
 FFFFFFFF5A902000 8 8 8 - 8K rwx-- libodm11.so
 FFFFFFFF60500000 64 64 - - 64K r-x-- libclsra11.so
 FFFFFFFF60510000 24 24 - - - r-x-- libclsra11.so
 ...
 FFFFFFFF7D1FC000 8 8 - - 8K r-x-- libsched.so.1
 FFFFFFFF7D1FE000 8 8 - - 8K r-x-- libdl.so.1
 FFFFFFFF7D300000 8 8 8 - 8K rw--- [ anon ]
 FFFFFFFF7D400000 8 8 8 - 8K rw--- [ anon ]
 FFFFFFFF7D500000 8 8 8 - 8K rw--- [ anon ]
 FFFFFFFF7FFE0000 128 128 128 - 64K rw--- [ stack ]

Notice that the “text”, “heap”, “anon”, “stack”, and shared memory can all use different page sizes.  In this case, the SGA is backed by 2G, 256M, 4M, 8k ISM pages.

So what about Dynamic ISM?  Should I use ISM or DISM?

So, Dynamic ISM was introduced to resize the SGA.  DISM is really just ISM memory that can be paged.  This can be useful when you have HUGE memory machines and want to resize the SGA without taking down the instance.  But why is this needed?

  • Single-Instance availability on HUGE machines that can dynamically add/replace MEMORY.  Machines like the E10k/E25k/M9000/M10… etc all allow you to add components on the fly without restarted Solaris.  Let’s say have have a failing memory board.  You could “Shrink” the SGA so that it fits into the surviving space and while you service the faulty board.  Also, say you start with a 1/2 populated machine.  You can add memory without and grow the SGA without stopping the instance.
  • Consolidation or Cloud like services.  In this environment you can resize running instances on the fly in order to free up memory for new instances.

Personally, I don’t see a lot of used for DISM with the SuperCluster.   If you have RAC you don’t need DISM for availability reasons and with cloud/consolidation I think multiple instances within a single server is not the best practice going forward.   At one point you needed to use DISM for NUMA features, but that is not case with 11.2.0.3.

Filed under: Exadata, Linux, Oracle, Solaris

Byzantium…

I’ve just got back from watching Byzantium at the cinema.

I had never heard of this film until about 20 minutes before the film started. In this age of watered down and twinkly vampires it’s good to see them portrayed with a gritty and raw edge. There are fairly obvious similarities between this story and Interview with the Vampire, though Byzantium is not so epic. It was a rather slow paced film, but I don’t mean that in a bad way. It wasn’t trying to be horror or action, but a story about the characters, told at a steady pace.

Saoirse Ronan is intense and beautiful. I knew I recognized her from somewhere, but it was not until I checked on IMDB that I saw she was the kid from Hanna. I think she was perfect casting to play a 16 year old that has been “alive” for 200 years. I think this was a good role for Gemma Arterton too. I like her in the less-Hollywood stuff.  was so odd, in a good way, in this film. He looked so thin and awkward, compared to his role as Banshee in the X-Men: First Class.

Overall I would have to say I liked it. If you are a fan of the vampire genre and are sick of the current “My Little Pony” vampires on film and TV, you might want to give it a try.

Cheers

Tim…


Byzantium… was first posted on June 2, 2013 at 12:18 am.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

Detective? Crime Writer? DBA? Which are you?

The DBA role can sometimes feel like a mix of detective, crime writer and DBA all thrown together. What do I mean by that? When you hit some problems you have to play detective, trying to find clues to what is going on. Once you’ve started to gather clues, you have to play crime writer and put them together to form a coherent story that explains what is going on. Once you have the outline of your crime story you can start looking at the facts again and see if they fit with your story. If they do, your story may just be correct. If they don’t, you probably need to check the accuracy of the facts and do some rewriting of the story until the two things fit together. Once things seem to fit, you can then get busy trying to arrest the villain, or fix the problem. :) This process is often necessary as part of root cause analysis.

I’ll use a recent case to highlight what I mean. I’m going to simplify things a little or the post will be too long. :)

Detective

We recently had some intermittent, but very severe performance problems on a system, resulting in the application effectively hanging for some time. As usual, it’s the DBA job to prove it’s not the fault of the database. :) A bit of detective work came up with the following facts:

  • Moderate CPU usage on the application and DB servers.
  • Gradual reduction in free memory on both servers.
  • Over time the systems started to swap.
  • Gradual increase in number of application processes connecting to the database.
  • Lots of I/O waits on the database server.

Crime Writer

So let’s switch to crime writer and think of some stories that could explain this.

Story 1

High load on the application causes lots of database connections, eventually swamping the servers.

Does it fit the facts? The reduction in free memory, eventually leading to swap could be caused by an increase in numbers of processes on the servers. Lots of processes all doing I/O on the server will probably increase the amount of I/O waits as they fight with each other for the disks. That sounds promising.

How do we confirm it? Use the AWR reports to compare the load profiles (amongst other things) of the system compared to other times in the day, days of the week, same time and day last week etc.

Conclusion: The load on the system was not substantially different from other times when the system was running fine. The story doesn’t fit together.

Story 2

Slow I/O is reducing response time. As new requests come in, the backlog of work is getting bigger…

Does it fit the facts? In checking out the AWR reports looking for confirmation of the previous story we noticed the average times of certain physical I/O operations were about 30ms. Normally the average is <1ms. Checking the event histogram section of the AWR report, we could see a very wide spread of timings for physical I/O operations, including some in excess of 1 second. After a few minutes we started to see Warning “aiowait timed out x times” in alert.log [ID 222989.1] messages in the alert log, suggesting some of those I/O events were taking longer than 10 minutes. :(

Now we have a new fact to add to the previous list:

  • Some of the individual I/O waits are extremely long!

We know the application layer of this application is not too clever. If a request comes and there is an idle application process available, it will be reused. If no idle application process is available, a new one is spawned to process the request. The only limit on this is the max number of processes for that user at the OS level.

If the disk I/O is slow, application requests will take a long time to complete. If application requests are taking a long time, the likelihood of new application requests finding an idle application process to reuse is reduced, so the incidence of new processes being spawned increases. As new processes are spawned, we see a gradual increase in memory usage, leading eventually to swap. At the same time, we see an increase in database connections, which require processes, which use memory… You can see where this is going…

Conclusion: This story does seem to fit the facts. Essentially, memory, swap and the number of processes are the symptom, not the cause of the problem. The root cause could be the I/O performance issue.

Catching the Criminal

Now I said this was an intermittent problem. The next time it occurred I was prepared and watched events unfold using Luca Canali’s nifty Latency Heat Map Visualization, which prompted this quick post. The result of this was we had pretty solid evidence to suggest the problems we were experiencing were nothing to do with the application or database, but were as a result of poor I/O performance. The baton was passed to the system administrators, network and storage team to try and bottom out the problem. That process is still ongoing, but smoking gun seems to be the storage network layer.

Conclusion

Over time you build up a level of experience that allows you to spot patterns and quickly get to the root of problems. To observers it can seem almost mystical, which is why us grunt DBAs look at those database performance gurus with awe, but it all comes down to root cause analysis. The more you do this stuff, the better you get at it. Stop doing it for a while and you get rusty.

Detective? Crime Writer? DBA? Which are you? You should be all three. :)

Cheers

Tim…


Detective? Crime Writer? DBA? Which are you? was first posted on June 1, 2013 at 10:12 am.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

Java 7 EE Launch : June 12th & 13th

The Oracle ACE program recently invited members to a teleconference session about the upcoming launch of Java 7 EE. Arun Gupta took us through a preview of the information that will be available to everyone as part of the launch.

If you are interested in getting the low down on this new release, you can register for the launch events here.

Cheers

Tim…


Java 7 EE Launch : June 12th & 13th was first posted on May 31, 2013 at 8:08 pm.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

Oracle Security Class and software for Oracle security

I have just agreed a public class dates of my very popular " How to perform a security audit of an Oracle database " with Oracle University to be held on September 24th and 25th in Rome, Italy. The registration....[Read More]

Posted by Pete On 30/05/13 At 05:54 PM

BGOUG Spring 2013 : Photos

I was a little bit rubbish and forgot to get photos of the recent BGOUG event. Fortunately, the conference photos have arrived. You can check them out here.

There are a number of photos of an old fat guy that looks a little like me. I don’t remember meeting him though. Weird… I think he needs the toilet in this shot.

By the way, don’t say stuff like, “Bind variables are a waste of time!”, when you are near to Tom Kyte. He will slap you quicker than Debra Lilley after a negative comment about Fusion Apps. :)

Cheers

Tim…


BGOUG Spring 2013 : Photos was first posted on May 29, 2013 at 7:55 pm.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

Forcing Smart Scans on Exadata – is the _serial_direct_read parameter safe to use in production?

One of the most common Exadata performance problems I see is that the direct path reads (and thus also Smart Scans) don’t sometimes kick in when running full scans in serial sessions. This is because in Oracle 11g onwards, the serial full segment scan IO path decision is done dynamically, at runtime, for every SQL execution – and for every segment (partition) separately. Whether you get a direct path read & smart scan, depends on the current buffer cache size, how big segment you’re about to scan and how much of that segment is actually cached at the moment. Note that the automatic IO path decision for index fast full scans is slightly different from table scans.

This dynamic decision unfortunately can cause unexpected surprises and variance in your report/batch job runtimes. Additionally, it looks like the SELECT part of your UPDATE/DELETE statements (the select part finds the rows to update/delete) does not ever automatically get direct path read/smart scan chosen – by design! So, when your SELECT statement may use smart scan and be really fast, the same select operation in an INSERT SELECT (or UPDATE/DELETE) context will not end up using smart scans by default. There’s even a bug explaining that – closed as “not a bug” (Exadata Smartscan Is Not Being Used On Insert As Select[Article ID 1348116.1]).

To work around these problems and force a direct path read/smart scan, you can either:

  1. Run your query in parallel as parallel full segment scans will use direct path reads, unless your parallel_degree_policy = AUTO, then you may still get buffered reads thanks to the dynamic in-memory parallel execution decision of Oracle 11.2
  2. Run your query in serial, but force the serial direct path reads by setting _serial_direct_read = TRUE (or ALWAYS in 11.2.0.2+)

Here are the valid options for this parameter in 11.2.0.2+

SQL> @pvalid _serial_direct_read

Display valid values for multioption parameters matching "_serial_direct_read"...

  PAR# PARAMETER                                                 ORD VALUE                          DEFAULT
------ -------------------------------------------------- ---------- ------------------------------ -------
  1993 _serial_direct_read                                         1 ALWAYS
       _serial_direct_read                                         2 AUTO
       _serial_direct_read                                         3 NEVER
       _serial_direct_read                                         4 TRUE
       _serial_direct_read                                         5 FALSE

And this leads to the question – as _serial_direct_read is an undocumented, hidden parameter – is it safe to use it in production?

In my mind, there are 3 kinds of parameters:

  1. Documented parameters – they should be safe and should work. If they don’t, it’s a bug and should get fixed by Oracle
  2. Undocumented parameters which nobody uses and knows much about
  3. Undocumented parameters which are documented in My Oracle Support (and are widely used in practice)

You shouldn’t use #2 parameters in production without a written blessing by Oracle Support – and I’d like to know some justification, why the recommended parameter ought to help.

The _serial_direct_read, however belongs to category #3 – it’s an undocumented parameter, but widely documented by public use and more importantly (formally), documented in My Oracle Support. If you search MOS for _serial_direct_read, you’ll find plenty of notes recommending the _serial_direct_read as a workaround – but the best of them is MOS note Best Practices for OLTP on the Sun Oracle Database Machine[Article ID 1269706.1], which says this:

Direct reads bypass the buffer cache and go directly into the process PGA. Cell offload operations occur for direct reads only. Parallel query processes always use direct reads and therefore offload any eligible operations (ex: scans). For normal foreground processes running serially, the RDBMS decides whether direct or buffered reads are the best option based on both the state of the buffer cache and the size of the table. In the event that the RDBMS does not make the correct choice to use direct reads/offload processing, you can set _serial_direct_read=TRUE (available at session level). Keep in mind that it is possible that setting this parameter will makes things worse if the app is better off with buffered reads, so make sure you know that you want direct reads/offload processing before setting it.

Comments:

  1. The abovementioned MOS note is actually incorrect stating that “Parallel query processes always use direct reads“, the in-memory parallel execution changes this, as explained above.
  2. The note recommends setting the _serial_direct_read = TRUE, but I’ve used ALWAYS for clarity. The TRUE before 11.2.0.2+ really meant “AUTO”, the dynamic decision and FALSE meant NEVER. But starting from Oracle 11.2.0.2 onwards, the TRUE = ALWAYS, FALSE = NEVER and AUTO means dynamic decision (like TRUE used to mean on 11.2.0.1 and before)
  3. This parameter forces direct reads only for full segment scans (full table scan and fast full index scan variations), your “random” index lookups and range scans etc will still use reads via buffer cache regardless of this parameter – which is great for OLTP and mixed workload systems
  4. The _serial_direct_read parameter controls the direct path read decision both for table and index segment scans

I’m not a fan of setting such parameters at system level, but in past I have created login triggers in mixed-workload environments, where the OLTP sessions will never do a direct path read (or I leave the decision automatic) and the reporting and batch sessions in the same database will have forced direct path reads for their full segment scans.

This article served two purposes, talking about direct path reads and also evaluating whether that cool-undocumented-parameter should be used in production at all. First, you must be able to argument, why would this parameter help at all – and second, is it at least documented as a valid workaround in some MOS article. If unsure, raise an SR and get the Support’s official blessing before taking the risk.


French Philosophy

From Mohamed Houri, here’s a French translation of all my “Philosophy” notes to date.

How to hint

Here’s a live example demonstrating a point I’ve often made – you have to be very detailed in your hinting or Oracle will find a way to obey your hints and do the wrong thing.  A recent posting on the OTN database forum gave use the following query and execution plan:

SELECT
	ERO.DVC_EVT_ID,
	E.DVC_EVT_DTTM
FROM D1_DVC_EVT E,
     D1_DVC_EVT_REL_OBJ ERO
WHERE
	ERO.MAINT_OBJ_CD = 'D1-DEVICE'
AND	ERO.PK_VALUE1 = :H1
AND	ERO.DVC_EVT_ID = E.DVC_EVT_ID
AND	E.DVC_EVT_TYPE_CD IN (
		'END-GSMLOWLEVEL-EXCP-SEV-1',
		'STR-GSMLOWLEVEL-EXCP-SEV-1'
	)
ORDER BY
	E.DVC_EVT_DTTM DESC

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name       | Starts | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers |Reads  |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |            |      1 |  3196 (100)|       |       |    134 |00:00:13.85 |    3195 |  2136 |
|   1 |  SORT ORDER BY                       |            |      1 |  3196   (1)|       |       |    134 |00:00:13.85 |    3195 |  2136 |
|   2 |   NESTED LOOPS                       |            |      1 |            |       |       |    134 |00:00:13.85 |    3195 |  2136 |
|   3 |    NESTED LOOPS                      |            |      1 |  3195   (1)|       |       |   1059 |00:00:07.77 |    2138 |  1197 |
|*  4 |     INDEX RANGE SCAN                 | TEST1      |      1 |    30   (0)|       |       |   1059 |00:00:00.07 |      11 |    11 |
|   5 |     PARTITION RANGE ITERATOR         |            |   1059 |     1   (0)|   KEY |   KEY |   1059 |00:00:07.69 |    2127 |  1186 |
|*  6 |      INDEX UNIQUE SCAN               | D1T400P0   |   1059 |     1   (0)|   KEY |   KEY |   1059 |00:00:07.67 |    2127 |  1186 |
|*  7 |    TABLE ACCESS BY GLOBAL INDEX ROWID| D1_DVC_EVT |   1059 |     2   (0)| ROWID | ROWID |    134 |00:00:06.08 |    1057 |   939 |
-----------------------------------------------------------------------------------------------------------------------------------------

You’ll notice that something close to half the time spent came from the table access in line 7 (This is 11g, and we have a plan which shows the “double nested loop” of an index access followed by a table access – for each rowid returned in line 3 (totalling 7.77 seconds) we access the table through the nested loop driven by line 2 which totals 13.85 seconds).

After a little chat, the suggestion arose to introduce an index that avoided the table access – it’s doing a fairly large amount of random I/O, and we might be able to run the query roughly twice as fast if we didn’t visit it. So the DBA set up a suitable test index (called test2) on the D1_DVC_EVT table, and found that the optimizer didn’t use it (perhaps because the index was larger then the alternative, perhaps because the clustering_factor was much bigger) – so he added a hint to the code: /*+ index (e test2) */ which made Oracle use the index to produce the following plan:

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Starts | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |      1 | 98415 (100)|       |       |    134 |00:04:11.82 |     100K|  96848 |
|   1 |  SORT ORDER BY              |       |      1 | 98415   (1)|       |       |    134 |00:04:11.82 |     100K|  96848 |
|*  2 |   HASH JOIN                 |       |      1 | 98414   (1)|       |       |    134 |00:04:11.82 |     100K|  96848 |
|*  3 |    INDEX RANGE SCAN         | TEST1 |      1 |    30   (0)|       |       |   1059 |00:00:00.01 |      11 |      0 |
|   4 |    PARTITION RANGE ALL      |       |      1 | 98249   (1)|1048575|     1 |   7566K|00:03:34.58 |     100K|  96848 |
|   5 |     PARTITION RANGE SUBQUERY|       |    287 | 98249   (1)|KEY(SQ)|KEY(SQ)|   7566K|00:03:10.87 |     100K|  96848 |
|*  6 |      INDEX FULL SCAN        | TEST2 |   2296 | 98249   (1)|1048575|     1 |   7566K|00:02:45.47 |   97412 |  96848 |
----------------------------------------------------------------------------------------------------------------------------

Unfortunately, although Oracle obeyed the hint – it had to, since it was legal and in-context – it didn’t take the path the DBA expected.

When you hint, you have to make it impossible for Oracle find any path you don’t want, and that can take a lot of hints. In this case the DBA simply wanted to use the same nested loop path that he’d originally seen, but using the new index instead. To get the path safely he needed at least 4 hints: one to specify the join order, one to specify the join method, and one for each table to specify the access method. In this case:

/*+
        leading(ero e)
        use_nl(ero e)
        index(ero test1)
        index(e test2)
*/

Once you’ve hinted some SQL and got it working the safe thing to do, in 11g, is to check the outline section of the actual execution plan to see if you’ve missed any important hints and then, if you can’t change the production code, attach the SQL Baseline from your hinted code to the SQL text from the original. (See – for example: http://jonathanlewis.wordpress.com/2011/01/12/fake-baselines/ )

It’s hard to create a full set of hints by hand – and I often see hinted SQL in production systems where the plan that appears happens to be the right one but it’s not the only plan that could be derived from the hints. So my 11g mantra for hinting is this: if you can hint it, baseline it”.

 

CUR Gets

Here’s an interesting anomaly from the OTN database forum – PL/SQL track. It’s a single row delete from a table that does a huge number of db block gets and (in the first example shown) physical reads. Here’s the supplied output with autotrace enabled:

SQL> delete from messages2 where id = 11004240718;

1 row deleted.

Elapsed: 00:00:03.79

Execution Plan
----------------------------------------------------------
Plan hash value: 140582024

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |              |     1 |    17 |     2   (0)| 00:00:01 |
|   1 |  DELETE            | MESSAGES2    |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| SYS_C0016189 |     1 |    17 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"=11004240718)

Statistics
----------------------------------------------------------
         61  recursive calls
       7381  db block gets
         92  consistent gets
       5802  physical reads
     303368  redo size
        839  bytes sent via SQL*Net to client
        801  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
          1  rows processed

After a little chit-chat, the obvious guess turns out to be relevant – thought the OP doesn’t quite tell us what we need to know – there is a constraint on another table which, when enabled, results in a large amount of work, and when disabled reduces the workload to the expected level. The excess work is (probably) Oracle checking a referential integrity (foreign key) constraint to ensure that there are no child rows blocking the delete (or, possibly, operating an ON DELETE CASCADE operation – although the OP tells us that there is no such constraint).

Here’s a quick and dirty, approximate, demo on 11.2.0.3 with an 8KB block size:

create table parent (
	id		number(8),
	description	varchar2(10),
	constraint par_pk primary key (id)
)
;

create table child(
	id_p	number(8)
		constraint chi_fk_par
		references parent,
--		on delete cascade
	id	number(8),
	description	varchar2(10),
	constraint chi_pk primary key (id_p, id)
)
;

insert into parent values (10000001,'one');
insert into child values(10000001, 1,'one')

insert into parent values (10000002,'two');
begin
	for i in reverse 1..20000 loop
		insert into child values(10000002,i,'two');
	end loop;
end;
/

insert into parent values (10000003,'three');
insert into child values(10000003, 1,'three')

commit;

begin
	dbms_stats.gather_table_stats(user,'child',cascade=>true);
	dbms_stats.gather_table_stats(user,'parent');
end;
/

delete from child;
commit;

select	index_name, leaf_blocks
from	user_indexes
where	table_name = 'CHILD'
;

The output from the query on the child index reported 103 leaf blocks for the index.
So here’s the code to prime the test:

variable b1 number
exec :b1 := 10000001
delete from parent where id = :b1;

set autotrace on

exec :b1 := 10000002
delete from parent where id = :b1;

set autotrace off

Note how I set up my data so that I could use a bind variable from SQL*Plus to do a quick and cheap delete to get the optimisation sorted out on the first run. This doesn’t really help in the version of the run that I’m going to show because the autotrace with the implicit explain re-optimises anyway, but here’s the plan and stats:

Execution Plan
----------------------------------------------------------
Plan hash value: 3366423708

-------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost  |
-------------------------------------------------------------
|   0 | DELETE STATEMENT   |        |     1 |     6 |     1 |
|   1 |  DELETE            | PARENT |       |       |       |
|*  2 |   INDEX UNIQUE SCAN| PAR_PK |     1 |     6 |       |
-------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"=TO_NUMBER(:B1))

Statistics
----------------------------------------------------------
          0  recursive calls
        107  db block gets
          1  consistent gets
          0  physical reads
       7416  redo size
        918  bytes sent via SQL*Net to client
       1038  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

Note the db block gets statistics – 107, compared to the 103 leaf blocks in the index – that’s a pretty good indication by itself that the workload that the OP is seeing is related to the prior massive delete from the child table with a large scan (possibly including a high volume of delayed block cleanout in his case) of the now empty section of the foreign key index.

If one quick test isn’t enough to convince you that the explanation is probably relevant then you can run the test a few times with different numbers of rows in the child table, and with the referential integrity constraint changed to on delete cascade to see what happens if you don’t do the child delete but leave it up to Oracle to handle the cascade. You might also want to check the complete set of session stats, and generate a trace file – as I did, but there’s already to much Oracle output in the posting already, so I haven’t included the results.