Oakies Blog Aggregator

[Oracle] Trace back to responsible SQL or PL/SQL code for a particular (PGA) memory request by intercepting process with DTrace

Introduction

This is just a short blog post about a simple DTrace script (dtrace_kghal_pga_code) that I recently wrote and published during a PGA memory leak troubleshooting assignment. A client of mine noticed a major PGA memory increase after upgrading to Oracle 12c. The PL/SQL code did not change - just the database release. He had already troubleshot the issue with the help of Tanel Poder's blog post "Oracle Memory Troubleshooting, Part 4: Drilling down into PGA memory usage with V$PROCESS_MEMORY_DETAIL" and identified the corresponding heap and allocation reason. However, there is one problem with just looking at V$PROCESS_MEMORY_DETAIL or heap dumps - you cannot easily trace back to the PL/SQL or SQL code that caused the allocations.

 

 

DTrace approach with script "dtrace_kghal_pga_code"

The basic idea behind the DTrace script is to check the memory allocation reason at probe entry level (for the KGHAL memory allocator functions) and stop the Oracle process at that point if the check is true. After the Oracle process has been stopped by DTrace you can request an error stack with oradebug and then let the process continue. As the process continues you get an errorstack trace which includes the code responsible for this particular (PGA) memory allocation request.

 

The following demo is run with Oracle 12.1.0.2 on Solaris 11.2 x86. I also used Frits Hoogland's PL/SQL code from here and put it into a PL/SQL function called mem_alloc to allocate a lot of PGA memory.
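Frits' original code is not reproduced here, but based on the trace output further down (a bulk collect from table t2 at line 14 of the function) a minimal sketch of such a memory-hungry function might look like this - an illustration only, not the exact code used:

create or replace function mem_alloc return number
as
  -- a nested table of t2 rows; filling it drives the "koh-kghu call" / "pmuccst: adt/record" allocations
  type t2_tab is table of t2%rowtype;
  c_tmp t2_tab;
begin
  -- pull the whole of t2 into PGA memory
  select * bulk collect into c_tmp from t2;
  return c_tmp.count;
end;
/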

 

SQL> select mem_alloc() from dual;

 

After a few seconds Tanel Poder's script smem_detail.sql can be used to determine the detailed PGA memory usage.

 

SQL> @http://blog.tanelpoder.com/files/scripts/smem_detail.sql 25

       SID CATEGORY        NAME                       HEAP_NAME            BYTES ALLOCATION_COUNT

---------- --------------- -------------------------- --------------- ---------- ----------------

        25 PL/SQL          pmuccst: adt/record        koh-kghu call    896860424            54901

        25 PL/SQL          pl/sql vc2                 koh-kghu call    112961768             6915

        25 PL/SQL          pmucalm coll               koh-kghu call     31622336             1936

        25 Other           permanent memory           pga heap            263952               32

 

Now you know that most of the memory is allocated from private heap "koh-kghu call" for reason "pmuccst: adt/record", but you still don't know which PL/SQL code line is responsible for requesting this memory. In this demo case it is pretty simple, as Frits' code allocates that amount of memory in only one place, but this is just for demonstration purposes. In reality the PL/SQL code is usually much larger and more complex - so there is no easy way to determine the corresponding code line.
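The DTrace script is run against the OS process ID of the dedicated server process (1502 in this demo). Assuming a dedicated server connection, the session can be mapped to its OS PID with a query along the following lines:

select p.spid
from   v$process p, v$session s
where  s.paddr = p.addr
and    s.sid   = 25;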

 

Let's use the DTrace script "dtrace_kghal_pga_code" to intercept the process when it requests memory from heap "koh-kghu call" due to "pmuccst: adt/record".

 

shell> ./dtrace_kghal_pga_code 1502 kghalf "pmuccst: adt/record"

dtrace: description 'pid$target::kghalf:entry

    ' matched 1 probe

dtrace: allowing destructive actions

CPU     ID                    FUNCTION:NAME

   0  78530                     kghalf:entry

Process is stopped due to memory allocation reason "pmuccst: adt/record" from heap "koh-kghu call ".

Please run the following commands in separate SQL*Plus to dump an errorstack:

SQL> ORADEBUG SETOSPID 1502

SQL> ORADEBUG DUMP ERRORSTACK 3    <<<< ORADEBUG will hang until process is continued by prun - works as designed

----------------------------

Please run the following command in separate shell after executing ORADEBUG:

shell> /usr/bin/prun 1502



I think the DTrace output is self-explanatory and you just have to follow the instructions.

 

Received ORADEBUG command (#1) 'DUMP ERRORSTACK 3' from process '1519'

*** 2016-01-30 09:04:30.651

dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)

----- Error Stack Dump -----

----- Current SQL Statement for this session (sql_id=8y13z00p0sgc6) -----

SELECT * FROM T2

----- PL/SQL Stack -----

----- PL/SQL Call Stack -----

  object      line  object

  handle    number  name

199ddab20        14  function TEST.MEM_ALLOC

----- Call Stack Trace -----

calling              call     entry                     

location             type     point                 

-------------------- -------- --------------------

*** 2016-01-30 09:04:30.946

ksedst()+307         call     skdstdst()

__sighndlr()+6       call     ssprtmin()          

call_user_handler()+718  call     __sighndlr()                                          

#ff0000;">sigacthandler#ff0000;">()+219  call     call_user_handler() 

#ff0000; font-size: 8pt; font-family: courier new,courier;">kghalf()+2           signal   sigacthandler()     

#ff0000;">kohalmc()+214        call     kghualloc()         

kohalc()+145         call     kohalmc()           

pmuocon2_con_recur()+145  call     kohalc()                                                

pmuocon()+127        call     pmuocon2_con_recur()                   

pfrrun_no_tool()+298 call     pfrinstr_EXECC()                                    

pfrrun()+1216        call     pfrrun_no_tool()    

plsql_run()+648      call     pfrrun()            

peidxr_run()+317     call     plsql_run()         

ssthrdmain()+558     call     opimai_real()       

main()+164           call     ssthrdmain()        

_start()+123         call     main()           



Checking the call stack trace in the generated trace file reveals that the errorstack trace is created at the right time.

The function kghalf() is the last executed function right before the signal handler (triggered by oradebug) kicks in: "sigacthandler()+219<-kghalf()+2". You can also see the PL/SQL call stack and the currently running SQL statement (if a "plain" SQL statement is running at this time - otherwise you see the "entry" PL/SQL call). PL/SQL code line 14 in function TEST.MEM_ALLOC corresponds exactly to "select * bulk collect into c_tmp from t2;".

 

 

Summary

I hope you find this DTrace script useful when troubleshooting unclear PGA memory allocations.

Just let me know if you have any issues with it. If you have further questions, please feel free to ask, or get in contact directly if you need assistance with troubleshooting Oracle database (performance) issues. You can also follow me or drop me a note on Twitter, of course.

 

 


Conferences and Missed Opportunities

Spring conference season is upon us. Planning for COLLABORATE16 in April has gotten me to thinking about conferences and missed opportunities. 

Do you remember the first time you didn't attend an industry conference? That's a strange question, isn't it? I don't remember my first time not attending an event, but I do remember one event in particular as a forever-missed opportunity. The event was the International Community for Auditory Display's (ICAD's) 1998 conference in Glasgow, Scotland. 

Now a trip to Glasgow might have been out of my reach at the time. But the 2000 event was held in Georgia, and I missed that one too. I didn't just miss it. I chose to miss it. Because I was cheap. Because I did not comprehend the benefits. Because no one had ever clued me in to why I should spend on myself like that. I missed the event, because I missed seeing its value.

I've missed other events over the years that I wish I had made the effort to attend. Interbike last year is an example. The Interbike International Bicycle Exposition is the largest bicycle industry event in North America, and the opportunity to attend as an industry insider was before me. But I missed the opportunity—not because of money this time, but because I let short-term work pressures get in the way of my long-term interests and personal growth.

So COLLABORATE16 is upon us. Now is the time to ante up if you're interested in attending. Or if not that event, maybe it's some other conference this year that is a better fit. Whatever the case, whatever event is the best fit, consider investing in yourself and your career by attending and meeting people and becoming involved in a larger community.

Because it's kind of a bummer to look back at missed opportunities and wonder what other course life might be taking right now had you chosen to attend an event rather than to miss it.

Because investing in your career pays in the long run.

Because sometimes we need to indulge our dreams. 

 

Conferences and Missed Opportunities

My job sends me to many industry events, but I've missed attending some events
that would have fed my soul. Don't miss those! Regrets are no fun. Feed
your career, and feed some dreams too.



Read the full post at www.gennick.com/database.

Friday Philosophy – If Only I Was As Good a Programmer As I Thought I Was Aged 22

I saw a tweet that made me smile a few days ago:

programmer quote

Like many of you, I started out my working life in IT as a computer programmer. Like some of you, I still am a computer programmer from time to time. I don’t think I’ve had a year of my working life when I did not do either some application development programming or some database infrastructure programming. I am constantly writing small (or not so small) SQL or PL/SQL programs to do what I do in Oracle.

I started programming in school. I did an “O” level in computer studies (the exams we sat in the UK aged 16, up until 1988!), and I was pretty good at the programming compared to my classmates. My first “real” program played Noughts and Crosses (tic-tac-toe to our American cousins and maybe others) and version 2 was unbeatable. Which at the time I thought was pretty cool, but Wikipedia now tells me is pretty easy :-). I also remember someone in the year above me unrolling some huge printout of the role-playing game he was writing (you know, the old textual “you have walked into a room where there is a lion, a bar of soap and a chandelier, what do you want to do?” sort of thing) and telling me I would never be able to do it. I just looked at the code and thought: Why have you hard-coded every decision and used all those GOTOs? Some sort of loop and a data block to look up questions, answers and consequences would be much smaller and easier to extend? I don’t think he liked me voicing that opinion…

I did not do any programming of any consequence as part of my college course but after that I started work as a computer programmer (sorry “analyst programmer”) in the National Health Service. Again, I seemed better at it than most of those around me, fixing bugs that others had given up on and coding the tricky stuff no one else wanted to touch. And after a year or so, I was convinced I was a programming god!

I wasn’t of course. Part of it was my juvenile, naive ego and the other part was that, fundamentally, many of those around me were bad programmers. Anybody decent either did not join in the first place or got a better job elsewhere that paid more than the NHS did. I eventually did that myself and joined Oracle. Where I realised that (a) SQL confused the hell out of me and (b) when I started using PL/SQL there were plenty of people around me who were better at traditional programming than I.

I think it took me about a year to feel I was damned good at both of them. Guess what? I was wrong. I was simply competent. But after a year or two more I did two things that, for me, finally did make me into a good programmer:

  • I went contracting so I worked in a lot of places, saw a lot more examples of good and bad code and I met a lot more programmers.
  • I think I hit mental puberty and woke up to the fact that I needed to listen and learn more.

Since then, I think my own opinion of my coding skills has generally dropped year on year, even though I would like to think I continue to get better at actually constructing computer programs and suites of programs.

So yes, I wish I was as good a programmer now as I thought I was aged 22. And after 25 years at it (actually, pretty much 35 years at it on and off!) just like Rich Rogers (or is it John D Cook? I can’t quite decide if it is a quotation or not) I think I am finally getting moderately good at writing programs. If I continue to follow this trend, on my 65th birthday I will be convinced I can’t program for toffee and yet will finally be a Good Programmer.

I wonder if  anyone would still employ me to do it by then?

Table Scans

It’s amazing how easy it is to interpret a number incorrectly until the point comes where you have to look at it closely – and then you realise that there was a lot more to the number than your initial casual assumption, and you would have realised it all along if you’d ever needed to think about it before.

Here’s a little case in point. I have a simple (i.e. non-partitioned) heap table t1 which is basically a clone of the view dba_segments, and I’ve just connected to Oracle through an SQL*Plus session and run a couple of SQL statements. The following is a continuous log of my activity:


SQL> select table_name, partitioned, blocks from user_tables;

TABLE_NAME           PAR     BLOCKS
-------------------- --- ----------
T1                   NO         958

1 row selected.

SQL> select name, value from v$mystat ms, v$statname sn where ms.statistic# = sn.statistic# and name like '%table scan%';

NAME                                  VALUE
-------------------------------- ----------
table scans (short tables)                9
table scans (long tables)                 0
table scans (rowid ranges)                0
table scans (cache partitions)            0
table scans (direct read)                 0
table scan rows gotten                 4188
table scan blocks gotten                 14

7 rows selected.

SQL> select count(extents) from t1;

COUNT(EXTENTS)
--------------
          5143

1 row selected.

SQL> select name, value from v$mystat ms, v$statname sn where ms.statistic# = sn.statistic# and name like '%table scan%';

  • My cache size is quite small, so t1 doesn’t count as a “short” table.
  • I’ve collected stats on the table (and there are no indexes) so the optimizer doesn’t need to do any dynamic sampling to generate an execution plan.
  • This is 11g, so there are no SQL Plan Directives in place to force dynamic sampling.

So here’s the question: how many “table scans (long tables)” will Oracle record against my session when I re-run that query against v$mystat ?

Warning – this IS a trick question.

Update number 1

I said it was a trick question and, as you will have seen if you’ve had time to read the comments, the answer is going to depend on various configuration options. Here’s what I got in my test – and I’ve reported not just the session stats, but the segment statistics (v$segment_statistics):


NAME                                  VALUE
-------------------------------- ----------
table scans (short tables)                9
table scans (long tables)                26
table scans (rowid ranges)               26
table scans (cache partitions)            0
table scans (direct read)                 0
table scan rows gotten                 9331
table scan blocks gotten                954

7 rows selected.

SQL> select owner, object_name, object_type, value  from v$segment_statistics where owner = 'TEST_USER' and statistic_name = 'segment scans' and value != 0;

OWNER           OBJECT_NAME          OBJECT_TYPE             VALUE
--------------- -------------------- ------------------ ----------
TEST_USER       T1                   TABLE                      26

1 row selected.

I recorded 26 long table scans – and that 26 was echoed in the segment statistics. (Note: I don’t normally use v$segment_statistics, which is a join between a couple of in-memory structures and three real tables; I usually query v$segstat). It’s the segment statistics that made me pause in a recent problem review; up to that moment I had been using the “Segments by Table Scans” section of the AWR Report as a useful (but badly named – since it also counts (most) index fast full scans) indicator of a potential threat, then I suddenly realised that there was yet another important detail I had to check before I could determine what the numbers were telling me.

So the next question is WHY has Oracle reported 26 tablescans ?

Update number 2

Martin Preiss (see comments) has been working hard to investigate this, and managed to produce a couple more variations in statistics for “the same” problem. In my case I had declared the table as “parallel 2”. For a table of degree N Oracle is very likely to break the table into 13*N chunks – there’s a historic reason for 13 – and that’s why I see 26 “rowid range” scans which, unfortunately, are also recorded as “long” table scans even though each is individually well short of the short table limit. This is really annoying when you start to look at the “Segments by Table Scan” report because you now don’t know how many times a table really was scanned unless you know something about the run-time degree of parallelism involved.
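You can check the declared degree and do the arithmetic with something along these lines (a sketch; user_tables reports the declared degree):

select table_name, degree
from   user_tables
where  table_name = 'T1';

-- declared degree 2 * 13 granules per slave (the default) = 26 "rowid range" scans,
-- each of which is also counted as a "long" table scan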

If you check the hidden parameters you will find several parameters relating to this chunking, in particular:


_px_min_granules_per_slave        minimum number of rowid range granules to generate per slave (default  13)
_px_max_granules_per_slave        maximum number of rowid range granules to generate per slave (default 100)

Technically, therefore, Oracle MIGHT get to 100 granules per parallel slave, and a tablescan at degree 2 could be recorded as 200 tablescans!

Martin asked why my 26 tablescans didn’t show up as “table scans (direct read)” – but that’s 11g for you, it allows a serial tablescan to use direct path reads, and it allows parallel tablescans to read into the cache, and the combination of cache size and table size meant that my example just happened to read into the cache.

And that introduces ANOTHER interpretation problem – what annoying things might I discover if I declare the table with the CACHE option (as Ivica suggested in his first comment below)?
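(For anyone who wants to repeat the experiment, the combination can be declared with something like the following – a sketch, since the exact commands aren’t shown here:

alter table t1 cache;
alter table t1 parallel 2;

select /*+ full(t1) */ count(*) from t1;
)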

Final Update (probably)

Ivica has been busy checking the CACHE operation and has shown that there are a number of cases to consider (more, in fact, than I was planning to mention – and the variation in the combination of NOPARALLEL and CACHE should give you some pause for thought). The only point I wanted to make was the effect of enabling PARALLEL and CACHE; I don’t think that this is a combination that is very likely to appear in a production system, but boundary conditions (and accidents) do occur. With my little sample – even after I grew the table to be MUCH larger – the tablescan DIDN’T get reported: here are some session stats and segment stats (taken using my snapshot code) of a single tablescan running parallel 2 when the table was cached:


Name                                                                     Value
----                                                                     -----
table scans (rowid ranges)                                                  26
table scans (cache partitions)                                              26
table scan rows gotten                                                 164,672
table scan blocks gotten                                                29,611

  Statistic                                    Value
  ---------                             ------------

T1
  logical reads                               30,272
  physical reads                              29,614
  physical read requests                         261

I know I’ve done a tablescan in 26 pieces (parallel 2) – and scanned 29,000+ blocks doing it; but according to the segment stats AND session stats I haven’t done a tablescan. Fortunately, of course, I can see the parallel tablescan in the session stats, and in this isolated case I can see from the “cache partitions” statistics that that tablescan was on a cached table. But if I’ve been depending on the segment stats to tell me about which tablescans happen most frequently and do most work I’ve found another reason why I can’t trust the stats and have to do more work cross-checking different parts of the AWR for self-consistent results.

I don’t expect to add any more comments about this mixture of tablescans and parallelism, with the varying effects on the session and segment statistics – but there’s always the possibility that one day I’ll start to worry about how the KEEP and RECYCLE (db_keep_cache_size and db_recycle_cache_size) could confuse things further.

Footnote:

It’s probably worth pointing out that the segment statistics have never recorded SHORT tablescans, they’ve only ever captured details of LONG tablescans. There is a slight inconsistency here, though, since they capture all the “index fast full scans (full)” reported in the session stats whether they are short or long – not that the session stats record the difference; unfortunately, another flaw creeps in: parallel index fast full scans “disappear” from the stats, although they show up as “index fast full scans (rowid ranges)” and “index fast full scans (direct)” with the same multiplier of 13 that we see for parallel table scans.

Common GATHER_PLAN_STATISTICS confusion

Most people already know about the very cool GATHER_PLAN_STATISTICS hint.  If not, you can see an example here

But here’s a common cause of confusion when using it in SQL*Plus:


SQL> select /*+ gather_plan_statistics*/ count(p) from t where x > sysdate - 30;

  COUNT(P)
----------
         0

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'))

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------
--------------------
SQL_ID  9babjv8yq8ru3, child number 0

BEGIN DBMS_OUTPUT.GET_LINES(:LINES, :NUMLINES); END;

NOTE: cannot fetch plan for SQL_ID: 9babjv8yq8ru3, CHILD_NUMBER: 0
      Please verify value of SQL_ID and CHILD_NUMBER;
      It could also be that the plan is no longer in cursor cache (check v$sql_plan)



So what has happened ? Why didn’t I get my nicely formatted output ?

Well, if you have “SERVEROUTPUT” set to “on” in SQL*Plus, then when your SQL statement has completed, SQL*Plus makes an additional call to the database to pick up any data in the DBMS_OUTPUT buffer.

Hence when you ask for statistics on the last call made to the database, it is not the one you were expecting.

Moral of the story: turn off SERVEROUTPUT before using GATHER_PLAN_STATISTICS in SQL*Plus.
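So a typical working sequence looks something like this:

SQL> set serveroutput off
SQL> select /*+ gather_plan_statistics */ count(p) from t where x > sysdate - 30;
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));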

Enterprise Manager Processes, EM12c vs. EM13c

To assist users as they plan out their upgrades and new Enterprise Manager environments, I wanted to go over a few subtle, but important changes from EM12c, 12.1.0.5 to the newest release, the much anticipated EM13c, 13.1.0.0.

[Screenshot: EM13c splash screen]

EM Processes

One of the things you’ll notice when starting EM12c from the command line is WHAT is started.

$ ./emctl start oms
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
Starting Oracle Management Server...
Starting WebTier...
WebTier Successfully Started
Oracle Management Server Successfully Started
Oracle Management Server is Up

The Oracle Management Repository (aka OMR, the database) has been started separately, along with the listener for connections to the OMR, but upon issuing the startup command for the OMS, the following steps are performed in a default installation:

  • OMS (Oracle Management Service)
  • WebLogic (WebTier)
  • The Node Manager and a few other necessary components behind the scenes.

You’ll also note that as of 12.1.0.4 with the latest patch and 12.1.0.5, the agent on the OMR host is started automatically:

$ ./emctl start agent
Oracle Enterprise Manager Cloud Control 12c Release 5
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
Agent is already running

Now, with Enterprise Manager 13c, there are a few more processes and checks that are done as part of the start up:

$ ./emctl start oms
Oracle Enterprise Manager Cloud Control 13c Release 1
Copyright (c) 1996, 2015 Oracle Corporation. All rights reserved.
Starting Oracle Management Server...
WebTier Successfully Started
Oracle Management Server Successfully Started
Oracle Management Server is Up
JVMD Engine is Up
Starting BI Publisher Server ...
BI Publisher Server Successfully Started
BI Publisher Server is Up

These two new processes, JVMD and BI Publisher, are well known to EM administrators, but previously you didn’t see them incorporated into the startup steps.

You can see the steps for all the processes started as part of the “emctl start oms” in the $OMS_HOME/gc_inst/em/EMGC_OMS1/sysman/log/emctl.log

2016-01-21 17:26:42,077 [main] INFO commands.BaseCommand logAndPrint.653 - Oracle Management Server is Up
2016-01-21 17:26:42,078 [main] INFO commands.BaseCommand printMessage.413 - statusOMS finished with result: 0
2016-01-21 17:26:42,094 [main] INFO ctrl_extn.EmctlCtrlExtnLoader logp.251 - Extensions found: 1
2016-01-21 17:26:42,095 [main] INFO ctrl_extn.EmctlCtrlExtnLoader logp.251 - Executing callback for extensible_sample
2016-01-21 17:26:42,095 [main] INFO ctrl_extn.EmctlCtrlExtnLoader logp.251 - jar is /u01/app/oracle/13c/plugins/oracle.sysman.emas.oms.plugin_13.1.1.0.0/archives/jvmd/em-engines-emctl.jar; class is oracle.sysman.emctl.jvmd.JVMDEmctlStatusImpl
2016-01-21 17:26:42,200 [main] INFO ctrl_extn.EmctlCtrlExtnLoader logp.251 - rsp is 0 message is JVMD Engine is Up
2016-01-21 17:26:42,200 [main] INFO commands.BaseCommand printMessage.426 - extensible_sample rsp is 0 message is JVMD Engine is Up
2016-01-21 17:26:42,201 [main] INFO commands.BaseCommand logAndPrint.653 - JVMD Engine is Up
2016-01-21 17:26:42,242 [main] INFO commands.BaseCommand logAndPrint.653 - BI Publisher Server is Up

JVMD (JVM Diagnostics) is now part of the EM infrastructure.  Considering how important Java heap knowledge is to tuning your EM environment, it makes complete sense that this is now included in the processing on an EM WebLogic host.

There are a number of new diagnostic reports and health dashboards to help ensure your EM environment maintains healthy performance.

[Screenshot: BI Publisher in EM13c]

Logs

Along with the important emctl.log, there are some new logs in the sysman logs directory that weren’t there before:

  • emoms_pbs* – Trace and log files for starting worker threads and other background processes.
  • pafLogs – Not sure, but I think this is a sub-directory for plugin logs.  Still researching this one.**
  • jvmdlogs – Directory for JVMD logs.
  • syncopss.log – Security log for synchronization with the wallet.

**Thank you to Andrew Bulloch: “pafLogs – these are the log files and outputs from the Provisioning and Automation Framework (PAF).  That’s typically DPs (Deployment Procedures) and other automation tasks that the OMS and, more specifically, the jobs subsystem/tasks subsystem use internally.”

Resource and Space Usage

The EM Administrator may want to know how much user memory is used by the added components upon installation (not counting heavy activity, plugins, etc.).

Memory usage for a 12.1.0.5 installation (OMS and OMR with WebTier) can be tracked with the following when everything runs as a single dedicated user:

ps aux | awk '{arr[$1]+=$4}; END {for (i in arr) {print i,arr[i]}}' | sort -k2

86Mb

For 13.1.0.0, the usage was about 50% higher (for the basic load, with no additional targets or collections occurring):

133Mb

So not much of a difference to start these processes and have them available in the background for reporting and java diagnostic support.

Space required for the OMS_HOME (sans the logs, GC_INST and the AGENT_HOME) is different as well:

12.1.0.5:  12Gb

13.1.0.0:  14Gb

The agent has a lot more work to do in the EM13c release, which is why I’ve separated out the agent home size requirements by release:

12.1.0.5: 1Gb

13.1.0.0: 3Gb

So there you have it.  A little more background info about EM13c that should assist you in planning for your upcoming upgrade or new environment!

 







Copyright © DBA Kevlar [Enterprise Manager Processes, EM12c vs. EM13c], All Right Reserved. 2016.

Bitmap Efficiency

An interesting observation came up on the Oracle-L list server a few days ago that demonstrated how clever the Oracle software is at minimising run-time work, and how easy it is to think you know what an execution plan means when you haven’t actually thought through the details – and the details might make a difference to performance.

The original question was about a very large table with several bitmap indexes, and an anomaly that appeared as a query changed its execution plan.  Here are the critical sections from the plans (extracted from memory with rowsource execution statistics enabled):

--------------------------------------------------------------------------------------------------------
|  Id |Operation                        | Name       | Starts | E-Rows | A-Rows |     A-Time | Buffers |
--------------------------------------------------------------------------------------------------------
|   6 |    TABLE ACCESS BY INDEX ROWID  |       FACT |      1 |      1 |     24 |00:00:00.01 |      31 |
|   7 |     BITMAP CONVERSION TO ROWIDS |            |      1 |        |     24 |00:00:00.01 |       7 |
|   8 |      BITMAP AND                 |            |      1 |        |      1 |00:00:00.01 |       7 |
|*  9 |       BITMAP INDEX SINGLE VALUE |     FACT_0 |      1 |        |      1 |00:00:00.01 |       3 |
|* 10 |       BITMAP INDEX SINGLE VALUE |  FACT_DIM1 |      1 |        |      4 |00:00:00.01 |       4 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
     9 - access("FACT"."C0"=243001)
    10 - access("FACT"."C1"="DIMENSION1"."ID")


-------------------------------------------------------------------------------------------------------
|  Id | Operation                      | Name       | Starts | E-Rows | A-Rows |     A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
|   7 |    BITMAP CONVERSION TO ROWIDS |            |      5 |        |      8 |00:00:00.01 |     119 |
|   8 |     BITMAP AND                 |            |      5 |        |      1 |00:00:00.01 |     119 |
|*  9 |      BITMAP INDEX SINGLE VALUE |  FACT_DIM1 |      5 |        |     20 |00:00:00.01 |      28 |
|* 10 |      BITMAP INDEX SINGLE VALUE |  FACT_DIM2 |      5 |        |    140 |00:00:00.01 |      78 |
|* 11 |      BITMAP INDEX SINGLE VALUE |     FACT_0 |      5 |        |      5 |00:00:00.01 |      13 |
|  12 |   TABLE ACCESS BY INDEX ROWID  |       FACT |      8 |      1 |      8 |00:00:00.01 |       8 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
     9 - access("FACT"."C1"="DIMENSION1"."ID")
    10 - access("FACT"."C2"="DIMENSION2"."ID")
    11 - access("FACT"."C0"=243001)

The first plan shows the steps leading to a single access (Starts = 1) to the FACT table after combining two bitmap indexes; the second shows the second child of a nested loop join where Oracle has combined three bitmap indexes to access the FACT table – operation 7 (and its descendants) execute 5 times in this case. I’ve included the related parts of the predicate section so that you can see that the predicates at operations 9 and 10 of the first plan are the same as the predicates at operations 9 and 11 of the second plan.

So here’s the question – if one access to fact_dim1 requires 4 buffer visits, why does it take 28 buffer visits to do the same thing 5 times (and it is with the same value every time); conversely, if one access to fact_0 requires 3 buffer visits, why do 5 visits to do the same thing take only 13 buffer visits? (Note: the arithmetic is made a little more obscure by the way in which index branch blocks may be pinned during nested loop joins.)

Then there’s a further question – not visible in the plan – the A-Rows in the “BITMAP INDEX SINGLE VALUE” operation is the number of bitmap sections in the rowsource, and we can see that the key values for index fact_dim2 have a significant number of bitmap chunks for a single key (5 executions returned 140 bitmap chunks). This scale, though, is true of all three indexes – in fact a follow-up email pointed out that a typical key value in EVERY ONE of the three indexes consisted of about 100 bitmap chunks, so why can’t we see those hundreds in the execution plan ?

So this is where we’re at: we have an execution plan where we haven’t visited all the bitmap chunks for a bitmap key, and the order in which the bitmap indexes are used in the plan seems to have some effect on the choice of leaf-blocks you visit when accessing the chunks. So (a) could a change in the order of indexes make a significant difference to the number of bitmap chunks you visit and the resulting performance, and (b) is there a way to control the order in which you visit the indexes. That’s where the note starts to get a bit technical – if you don’t want to read any more the answers are: (a) yes but probably not significantly and (b) yes.

Demo

To investigate what goes on inside a “BITMAP AND” I created a table with two bitmap indexes and used a very large setting for pctfree for the indexes so that they had to be stored with a large number of bitmap chunks per key. Here’s the code that I used, with some results from an instance of 12.1.0.2:


create table people
nologging
as
with generator as (
        select  --+ materialize 
                rownum id 
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        mod(rownum-1, 1e2)      id_town_home,
        trunc((rownum-1)/1e4)   id_town_work,
        rpad('x',10,'x')        small_vc,
        rpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'PEOPLE',
                method_opt       => 'for all columns size 1'
        );
end;
/

create bitmap index pe_home on people(id_town_home) nologging pctfree 95;
create bitmap index pe_work on people(id_town_work) nologging pctfree 95;

select
        index_name, distinct_keys, num_rows, leaf_blocks, avg_leaf_blocks_per_key
from
        user_indexes
where
        table_name = 'PEOPLE'
order by
        index_name
;


INDEX_NAME           DISTINCT_KEYS   NUM_ROWS LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY
-------------------- ------------- ---------- ----------- -----------------------
PE_HOME                        100      30399       15200                     152
PE_WORK                        100       1800         907                       9

As you can see I’ve generated two columns (id_town_home, id_town_work) with 100 distinct values and 10,000 rows each, but with very different data distributions – the rows for any given value for id_town_home are uniformly spread across the entire table, every hundredth row; while the rows for any given value of id_town_work are very tightly clustered as a group of 10,000 consecutive rows. As a consequence the index entry (bitmap string) for a typical key value for id_town_home is enormous and has to be broken into 304 chunks spread across 152 leaf blocks (2 index entries per leaf block), while the index entry for a typical key value for id_town_work is much shorter, but still requires 18 chunks spread across 9 leaf blocks.

So what will I see if I run the following query, and force it to use a BITMAP AND of the two indexes, in the two different orders:

select
        /*+ index_combine(pe) */
        max(small_vc)
from
        people pe
where
        id_town_home = 50
and     id_town_work = 50
;

Based on a very simple interpretation of the typical execution plan and using the index stats shown above we might expect to see roughly A-Rows = 18 with 9 buffer gets (plus a few more for segment headers and branch blocks) on the id_town_work index and A-Rows = 304 with 152 buffer gets on the id_town_home index to allow Oracle to generate and compare the two bit strings – but here are the two plans with their execution stats, generated in 12.1.0.2, and each run after flushing the buffer cache:

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |         |      1 |        |      1 |00:00:00.01 |     118 |    117 |
|   1 |  SORT AGGREGATE                      |         |      1 |      1 |      1 |00:00:00.01 |     118 |    117 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| PEOPLE  |      1 |    100 |    100 |00:00:00.01 |     118 |    117 |
|   3 |    BITMAP CONVERSION TO ROWIDS       |         |      1 |        |    100 |00:00:00.01 |      18 |     17 |
|   4 |     BITMAP AND                       |         |      1 |        |      1 |00:00:00.01 |      18 |     17 |
|*  5 |      BITMAP INDEX SINGLE VALUE       | PE_WORK |      1 |        |     18 |00:00:00.01 |      14 |     13 |
|*  6 |      BITMAP INDEX SINGLE VALUE       | PE_HOME |      1 |        |      4 |00:00:00.01 |       4 |      4 |
-------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |         |      1 |        |      1 |00:00:00.01 |     122 |    120 |
|   1 |  SORT AGGREGATE                      |         |      1 |      1 |      1 |00:00:00.01 |     122 |    120 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| PEOPLE  |      1 |    100 |    100 |00:00:00.01 |     122 |    120 |
|   3 |    BITMAP CONVERSION TO ROWIDS       |         |      1 |        |    100 |00:00:00.01 |      22 |     20 |
|   4 |     BITMAP AND                       |         |      1 |        |      1 |00:00:00.01 |      22 |     20 |
|*  5 |      BITMAP INDEX SINGLE VALUE       | PE_HOME |      1 |        |      5 |00:00:00.01 |       8 |      7 |
|*  6 |      BITMAP INDEX SINGLE VALUE       | PE_WORK |      1 |        |     18 |00:00:00.01 |      14 |     13 |
-------------------------------------------------------------------------------------------------------------------

We have NOT touched anything like the entire bit-string for the id_town_home index – a bit-string that spans 152 leaf blocks! Clearly Oracle is doing something clever to minimise the work, and it’s so clever that switching the order of these two extremely different indexes in the plan has made virtually no difference to the work done. Obviously I can’t tell you exactly what the code is doing, but I think I can produce a reasonable guess about what’s going on.

The pe_work index has the smaller number of leaf blocks per key, which makes it the better starting choice for the AND in this case, so the optimizer’s default starting action was to pick the first couple of chunks of that index key value; and Oracle immediately sees that the first rowid that it could possibly need in its result set is roughly in the middle of the table – remember that the “key” columns of a bitmap index are (real_key, first_rowid_of_chunk, last_rowid_of_chunk, compressed_bitstring).

Since it now knows the lowest possible rowid that it could need, Oracle can probe the pe_home index by (id_town_home=50, {target_rowid}) – which will let it go to a bitmap index chunk that’s roughly in the middle of the full range of 152. Then Oracle can expand the bitstrings from the chunks it has, reading new chunks as needed from each of the indexes until the 18 chunks / 9 leaf blocks from the pe_work index have been used up (and that range would have aligned with just two or three chunks from the pe_home index), at which point Oracle can see there are no more rows in the table that could match both predicates and it doesn’t need to read the next 75 chunks of the pe_home index.

Conversely, when I forced Oracle to use the (inappropriate) pe_home index first, it read the first couple of chunks, then read the first couple of chunks of the pe_work index, at which point it discovered that it didn’t need any of the pe_home index prior to (roughly) chunk 75, so it jumped straight to the right chunk to align with pe_work and carried on from there. That’s why the forced, less efficient, plan that visited pe_home first visited just a couple more leaf blocks than the plan the optimizer selected for itself.

Bottom line on performance (tl;dr) – Oracle is sufficiently smart about checking the start and end ranges on bitmap indexes (rather than arbitrarily expanding the entire bitmap for each key) that even for very large bitmap index entries it will probably only access a couple of “redundant” leaf blocks per index even if it picks the worst possible order for using the indexes. You’re far more likely to notice Oracle picking the wrong indexes (because you know the data better) than you are to spot it using the right indexes in the wrong order – and given that bitmap indexes tend to be relatively small and well buffered (compared to the tables), and given the relatively large number of rows we pick by random I/O from fact tables, a little extra work in the bitmap indexes is unlikely to make a significant difference to the performance of most queries.

Closing fact: in the unlikely circumstances that you do spot the special case where it will make a difference (and it will probably be a difference in CPU usage) then you can dictate the order of the indexes with the undocumented bitmap_tree() hint.  I may get round to writing up the variations one day but, for this simple case, the index_combine() hint that I used to force the BITMAP AND turned into the following bitmap_tree() hint in the outline:

bitmap_tree(@sel$1 pe@sel$1 and((people.id_town_work) (people.id_town_home)))

bitmap_tree( @query_block     table_name@query_block     and( ({first index definition}) ({second index definition}) ) )

Obviously not suitable to throw into production code casually – check with Oracle support if you think it’s really necessary – but if you wanted to reverse the order of index usage in this case you could just swap the order of the index definitions. If you thought there was a third index that should be used you could include its definition (note that it’s table_name.column_name – the index definition – in the brackets).
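As a sketch, forcing the reverse order of index usage for the demo query above would therefore look something like this:

select
        /*+ bitmap_tree(@sel$1 pe@sel$1 and((people.id_town_home) (people.id_town_work))) */
        max(small_vc)
from
        people pe
where
        id_town_home = 50
and     id_town_work = 50
;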

My reference: bitmap_control_02.sql

(Not So) Internal Dialogue

When I wrote about rehearsals in my public speaking tips series, I mentioned talking through small sections of the presentations as a means for rehearsals. I do this a lot! I live on my own, so this is not an internal dialogue. I say this stuff out loud.

This morning I was talking through some ideas as I left the house and cleared the ice off the car. I continued during the journey to work, including when I got out of the car to get a coffee from the Costa Express at the garage. Even as I was unlocking the office door.

If you happen to see me doing this in the street, I’m not talking to an imaginary friend. It’s an imaginary audience. If you’re gonna do crazy, you might as well do it at scale… :)

Cheers

Tim…


(Not So) Internal Dialogue was first posted on January 28, 2016 at 11:52 am.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

FIRST_ROWS vs FIRST_ROWS_n

You might be thinking that FIRST_ROWS is pretty much the same as FIRST_ROWS_1, since they are both after the first row. However, there are some important distinctions to make here. The traditional FIRST_ROWS parameter is not a genuine cost-based option, but more a hybrid between rule-based and cost-based optimization. This can be observed via the following example:



SQL> create table T
  2  as select rownum x, rpad(rownum,100,'x') padding
  3  from dual
  4  connect by level <= 2000
  5  order by dbms_random.value;

Table created.

SQL>
SQL> create index IX on T ( x ) ;

Index created.

SQL> set autotrace traceonly explain
SQL>
SQL> select * from T where x < 100;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    99 | 10395 |    12   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |    99 | 10395 |    12   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("X"<100)

SQL> alter session set optimizer_mode = first_rows;

Session altered.

SQL>
SQL> set autotrace traceonly explain
SQL> select * from T where x < 100;

Execution Plan
----------------------------------------------------------
Plan hash value: 3947747388

--------------------------------------------------------------------------------------------
| Id  | Operation                           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |      |    99 | 10395 |    99   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T    |    99 | 10395 |    99   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IX   |    99 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("X"<100)

Notice that the optimizer has chosen a path with a HIGHER cost, so in the case of first_rows, it is not solely the cost that plays a part as a determining factor. There are some heuristics in place to bias the optimizer toward index access.

You could quite naturally assume that FIRST_ROWS_1 is the same as FIRST_ROWS, but this is NOT the case. The optimizer uses the cost calculation and no other heuristics. You can also see that the ‘n’ in FIRST_ROWS_n is an important driver in determining the cardinalities for query result sets.



SQL> alter session set optimizer_mode = first_rows_1;

Session altered.

SQL> select * from T where x < 100;

Execution Plan
----------------------------------------------------------
Plan hash value: 3947747388

--------------------------------------------------------------------------------------------
| Id  | Operation                           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |      |     2 |   210 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T    |     2 |   210 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IX   |       |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("X"<100)

SQL> alter session set optimizer_mode = first_rows_10;

Session altered.

SQL> select * from T where x < 100;

Execution Plan
----------------------------------------------------------
Plan hash value: 3947747388

--------------------------------------------------------------------------------------------
| Id  | Operation                           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |      |    11 |  1155 |    13   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T    |    11 |  1155 |    13   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IX   |       |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("X"<100)

SQL>
SQL> alter session set optimizer_mode = first_rows_100;

Session altered.

SQL> select * from T where x < 100;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    99 | 10395 |    12   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |    99 | 10395 |    12   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("X"<100)

SQL>
SQL>


So we can start to see the methodology at work here. The optimizer is "taking our word" that, independent of what the table statistics imply, it will only need the first 'n' rows from the relevant row source. It is a true cost adjustment rather than a change to the optimization scheme (as the original FIRST_ROWS is).