I’ll be interested to see how the performance improvements to SFTP work out. I’ve seen some issues with this during transfers of large files before. The built in NFS and VNC servers sound interesting too. I can think of one situation where the NFS server would come in really handy.
Fun, fun, fun…
Cost Based Oracle – Fundamentals (November 2005)
But the most interesting function for our purposes is sys_op_countchg(). Judging from its name, this function is probably counting changes, and the first input parameter is the block ID portion (object_id, relative file number, and block number) of the table’s rowid, so the function is clearly matching our notional description of how the clustering_factor is calculated. But what is that 1 we see as the second parameter?
When I first understood how the clustering_factor was defined, I soon realized that its biggest flaw was that Oracle wasn’t remembering recent history as it walked the index; it only remembered the previous table block so that it could check whether the latest row was in the same table block as last time or in a new table block. So when I saw this function, my first guess (or hope) was that the second parameter was a method of telling Oracle to remember a list of previous block visits as it walked the index.
And finally, Oracle Corp. had implemented an official interface to the second parameter of sys_op_countchg() – provided you install the right patch – through a new table (or schema, or database) preference type available to the dbms_stats.set_table_prefs() procedure.
I’ve been meaning to write this post for two or three months, ever since Sean Molloy sent me an email about short blog note from Martin Decker describing Bug 13262857 Enh: provide some control over DBMS_STATS index clustering factor computation. Unfortunately I’ve not yet had time to investigate the patch, but I don’t think I need to any more because Richard Foote has written it up in his latest blog post.
Read Richard’s post – it’s important.
Richard’s post has, unsurprisingly, produced a buzz of excitement in his reader – and started up the discussion of how best to use this capability; so here’s another quote from the book (p.111 – available in the download of chapter 5):
So using Oracle’s own function for calculating the clustering_factor, but substituting the freelists value for the table, may be a valid method for correcting some errors in the clustering_factor for indexes on strongly sequenced data. (The same strategy applies if you use multiple freelist groups—but multiply freelists by freelist groups to set the second parameter.)
Can a similar strategy be used to find a modified clustering_factor in other circumstances? I think the answer is a cautious “yes” for tables that are in ASSM tablespaces. Remember that Oracle currently allocates and formats 16 new blocks at a time when using automatic segment space management (even when the extent sizes are very large, apparently). This means that new data will be roughly scattered across groups of 16 blocks, rather than being tightly packed.
Calling Oracle’s sys_op_countchg() function with a parameter of 16 could be enough to produce a reasonable clustering_factor where Oracle currently produces a meaningless one. The value 16 should, however, be used as an upper bound. If your real degree of concurrency is typically less than 16, then your actual degree of concurrency would probably be more appropriate.
Whatever you do when experimenting with this function—don’t simply apply it across the board to all indexes, or even all indexes on a particular table. There will probably be just a handful of critical indexes where it is a good way of telling Oracle a little more of the truth about your system—in other cases you will simply be confusing the issue.
Note particularly the comments about how the best value depends on the data in the indexed columns, the table configuration, and the degree of concurrency - you don’t necessarily want to use the same value for every index on a given table. That’s a shame, since Oracle has defined the interface as a TABLE preference, so if you set it then you get the same for every index. Despite this, if you’re prepared to put in a little control work, it does mean that you can use an official Oracle mechanism to play the game I was suggesting in the book – for each “special” index, set the preference, collect the stats, then clear the preference.
A new major release (version 3.0) of my XPLAN_ASH tool is available for download.
In addition to many changes to the way the information is presented and many other smaller changes to functionality there is one major new feature: XPLAN_ASH now also supports S-ASH, the free ASH implementation.
If you run XPLAN_ASH in a S-ASH repository owner schema, it will automatically detect that and adjust accordingly.
XPLAN_ASH was tested against the latest stable version of S-ASH (2.3). There are some minor changes required to that S-ASH release in order to function properly with XPLAN_ASH. Most of them will be included in the next S-ASH release as they really are only minor and don't influence the general S-ASH functionality at all.
If you're interested in using XPLAN_ASH with an existing S-ASH installation get in touch with me so I can provide the necessary scripts that apply the necessary changes.
Rather than writing another lengthy blog post about the changes and new features introduced I thought I start a multi-part video tutorial where I explain the purpose of the tool and how to use it based on the new version - some parts of the tutorial will focus on specific functionality of the tool and are therefore probably also quite useful as some kind of general tutorial on that Oracle feature and SQL execution troubleshooting guide in general.
The tutorial will consist of six parts initially, the first two are already available - the next ones to follow over time.
Part 1: Introduction, Overview
Part 2: Usage, Parameters, Invocation
Part 3: Rowsource Statistics: TBD
Part 4: Active Session History: TBD
Part 5: Systematic Parallel Execution Skew Analysis & Troubleshooting: Coming Soon
Part 6: Experimental Stuff, Script Configuration And Internals: TBD
Feel free to post questions/requests for clarification that are not covered in the tutorials in the comments section - if there are topics of general interest I might publish a seventh part addressing those questions.
In future I might use that video style more often since it's a nicer way of conveying certain kind of information.
Believe me, this article is worth reading I’m currently not allowed to discuss Oracle 12c Database goodies but I am allowed to discuss things perhaps initially intended for 12c that are currently available and already back-ported to 11g. This includes a wonderful improvement in the manageability of how the Clustering Factor (CF) of an index can now […]
I recently filled in this years ACE Director self assessment survey. Among other things, as part of that process I check how many articles, blog posts and forum answers I’ve posted over the year.
Whilst getting these numbers I noticed about 50% of articles I put live this year weren’t promoted to the front page as new articles because they were “back-fill”, written on general topics that haven’t really changed over the years. Of the last 6 articles I’ve written, only 1 has made it to the front page as a “new” article. Some of these back-fill articles were prompted by answering readers questions and some by situations that have come up at work. They all go live on the website, but I’d feel silly posting them as a new article.
Probably the biggest factor in the ratio of back-fill to new material is the delays in the release of 12c database. Each release prompts a rash of new articles and it’s been so long since 11gR2…
My next Embarcadero sponsored webinar will be on May 14 and is entitled Using Optimizer Hints for Oracle Performance Tuning.
Hints are excellent database performance tuning tools that direct the Oracle optimizer to utilize specific operations in SQL execution plans. We often use hints because the Oracle optimizer doesn’t always come up with the execution plan we want on its own. When used correctly, hints can help stabilize an execution plan to use the same operations over and over allowing the SQL to perform the way we desire.
In this webinar, I'll take a look at using hints specifically for testing. Hints are great, and often overlooked, testing tools.
Register for Optimizer Hints for Oracle Performance Tuning webinar to learn:
Thanks again to Embarcadero (@DBPowerStudio) for their sponsorship of these webinars.
I hope to see you there!
The webinars on “Smarter Statistics in 11g” are on tomorrow (Friday) at 2:00 pm and 6:00 pm. There’s a waiting list for the 6:00 pm event, so if you’ve signed up but can’t make it please delete your registration. (The event will be repeated on 10th June). If you want to vote a better time for me to do short webinars there’s a poll at the end of the article.
I’m about to make a serious move into online webinars, and as a warm-up exercise I’ll be doing a couple of one-hour free events on Friday 17th May.
I’ll be talking through a Powerpoint presentation called “Smarter Statistics in 11g” twice, once at 2:00 pm – 3:00 pm BST, and again at 6:00 pm BST (12:00 pm - 1:00 pm CDT) . Broadly speaking the first one is for the benefit people from the UK and eastwards, and the second is for the benefit of people from the US and westwards. This is just a trial run, of course, and if it works well I will be doing more of the same, perhaps three times per day to spread across more time zones.
John Goodhue (my O1 sponsor for the USA) is arranging all the mechanical details, and I’ll post links for registration when they become available – we’ll be using GoToWebinar as the supply mechanism, and we’ll be limiting access to 100 people (so if you do register and can’t attend, please remove yourself from the list; if you don’t manage to register for either event, you’ll get another chance later as I plan to repeat each event a few times.)
I’ll also be doing a full day paid event on 23rd May which will be my “Indexing Strategies” tutorial. This first full day event will be timed to suit the American audience – although anyone can register, of course – but we plan to have further events suited to other time zones. The URL for registration is now available – with an option to purchase a 30-day window to the recording of my “Oracle Mechanisms” presentation in Minneapolis.
When an Oracle process starts executing a query and needs to do a full segment scan, it needs to make a decision if it’s going to use ‘blockmode’, which is the normal way of working on non-Exadata Oracle databases, where blocks are read from disk and processed by the Oracle foreground process, either “cached” (read from disk and put in the database buffercache) or “direct” (read from disk and put in the process’ PGA), or ‘offloaded mode’, where part of the execution is done by the cell server.
The code layer where the Oracle database process initiates the offloading is ‘kcfis’; an educated guess is Kernel Cache File Intelligent Storage. Does a “normal” alias non-Exadata database ever use the ‘kcfis’ layer? My first guess would be ‘no’, but we all know guessing takes you nowhere (right?). Let’s see if a “normal” database uses the ‘kcfis’ functions on a Linux x64 (OL 6.3) system with Oracle 18.104.22.168 64 bit using ASM.
The only way to profile kernel functions that I am aware of is using ‘gdb’ and breaking on functions in the Oracle executable:
(the process id shown below ought to be the process id of an oracle database process id you are going to execute in. Do not, I repeat: not do this with other processes, especially the ones that do important tasks!)
# gdb -p 42 GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6) Copyright (C) 2010 Free Software Foundation, Inc. ... (gdb) rbreak ^kcfis.* Breakpoint 1 at 0x2204094
kcfis_get_sched_delay; Breakpoint 2 at 0x220410a kcfis_capability_tab_get; Breakpoint 3 at 0x2204150 kcfis_can_session_migrate; Breakpoint 4 at 0x2204190 kcfis_fob_storage_attr; Breakpoint 5 at 0x22041d0 kcfis_init_resource_limit; Breakpoint 6 at 0x22041f2 kcfis_tablespace_is_on_sage; ... (gdb) c Continuing.
Okay, we got the debugger set, now let’s execute a simple query (doing a full scan) to see if kcfis is touched on a “normal” server or not!
TS@v11203 > select count(*) from t2;
I am on OS-X (Apple) using iTerm, and I see the tab of the gdb session turning red: gdb generated some output on the screen!
(gdb) c Continuing. Breakpoint 6, 0x00000000022041f2 in kcfis_tablespace_is_on_sage () (gdb)
So, we hit a kcfis function! Let me put upfront that I do not have sourcecode of any kind, so my statements about what a function does are actually guesses. Anyway: from the name of the function (kcfis_tablespace_is_on_sage) it looks like a function in the kcfis layer which determines if a tablespace is on an Exadata storage server (exadata’s internal name is/was ‘sage’). Nice. so we hit the layer for determination if the tablespace is on Exadata.
Let’s continue the profiling with gdb:
(gdb) c Continuing. Breakpoint 2, 0x000000000220410a in kcfis_capability_tab_get () (gdb) c Continuing. Breakpoint 6, 0x00000000022041f2 in kcfis_tablespace_is_on_sage () (gdb) c Continuing.
We see another call to the function ‘kcfis_tablespace_is_on_sage’, and a call to ‘kcfis_capability_tab_get’. The last function probably tries to probe the table (but could be tablespace) to get the capabilities. This could be the function which checks the requirements for hybrid columnar compression, I am not sure about that.
At what point during the processing of the full segment scan does the kcfis_tablespace_is_on_sage occur? One way of investigating this, is profiling some functions we know a (direct path) full scan does, and see where the kcfis_tablespace_is_on_sage kicks in. When the buffer cache is flushed prior to executing a full scan, and the SQL is made unique, so it has to be parsed, the following sequence of events happens:
- A ‘SQL*Net message to client’ wait
- A ‘db file sequential read’ wait (for reading the segment header)
- Potentially a ‘asynch descriptor resize’ wait
- The full scan is done asynchronously (potentially revealing some ‘direct path read’ waits)
So if we profile on start and end of a wait (kslwtbctx and kslwtectx), a single block read (pread64), AIO (io_submit and io_getevents_0_4) and of course kcfis_tablespace_is_on_sage, we should be able to see that:
Breakpoint 3, 0x0000000008fa1334 in kslwtectx () Breakpoint 2, 0x0000000008f9a652 in kslwtbctx () Breakpoint 3, 0x0000000008fa1334 in kslwtectx () Breakpoint 2, 0x0000000008f9a652 in kslwtbctx () Breakpoint 1, pread64 () at ../sysdeps/unix/syscall-template.S:82 82 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS) Breakpoint 3, 0x0000000008fa1334 in kslwtectx () Breakpoint 4, 0x00000000022041f2 in kcfis_tablespace_is_on_sage () Breakpoint 4, 0x00000000022041f2 in kcfis_tablespace_is_on_sage () Breakpoint 2, 0x0000000008f9a652 in kslwtbctx () Breakpoint 3, 0x0000000008fa1334 in kslwtectx () Breakpoint 5, io_submit (ctx=0x7fb42f475000, nr=1, iocbs=0x7fffb4c5e100) at io_submit.c:23 23 io_syscall3(int, io_submit, io_submit, io_context_t, ctx, long, nr, struct iocb **, iocbs) Breakpoint 5, io_submit (ctx=0x7fb42f475000, nr=1, iocbs=0x7fffb4c5e100) at io_submit.c:23 23 io_syscall3(int, io_submit, io_submit, io_context_t, ctx, long, nr, struct iocb **, iocbs) Breakpoint 6, io_getevents_0_4 (ctx=0x7fb42f475000, min_nr=2, nr=128, events=0x7fffb4c66768, timeout=0x7fffb4c67770) at io_getevents.c:46 46 if (ring==NULL || ring->magic != AIO_RING_MAGIC)
So what do we see here?
Row 1 : Here the wait ending for ‘SQL*Net message from client’.
Row 2-3 : This is the ‘SQL*Net message to client’ wait.
Row 5-6-7 : This is begin wait, pread64 for reading the segment header and the end wait.
Row 8&9 : The session probes for Exadata.
Row 10-11 : Start and end of a wait, ‘asynch descriptor resize’ (verified by trace file).
Row 12- : Here the full scan takes off.
So…a query starts (I’ve made it unique by adding a dummy hint, so it’s parsed again) the SQL*Net roundtrip occurs, the segment header is read, then the process looks if it’s on Exadata, which it isn’t here, and starts an asynchronous full scan.
So let’s repeat this check on a database which is on Exadata! In order to do so, we need to be aware several calls are not done on exadata: pread64, io_submit and io_getevents_0_4, because we do not do local IO, but issue them over infiniband. So I breaked on kslwtbctx, kslwtectx, kcfis_tablespace_is_on_sage. This is how that looks like:
(gdb) c Continuing. Breakpoint 2, 0x000000000905cf62 in kslwtectx () Breakpoint 3, 0x0000000002230366 in kcfis_tablespace_is_on_sage ()
Mmmh, this is different. The ‘SQL*Net message from client’ wait ends, and before any other wait occurs, the existence of exadata is checked. This is different from the non-exadata case. Let’s take a look at the backtrace of the break on kcfis_tablespace_is_on_sage:
(gdb) bt #0 0x0000000002230366 in kcfis_tablespace_is_on_sage () #1 0x0000000001402eb0 in qesSageEnabled () #2 0x0000000009234d20 in kkdlgstd () #3 0x0000000001a6111d in kkmfcblo () #4 0x000000000922f76d in kkmpfcbk () #5 0x000000000942e538 in qcsprfro () #6 0x000000000942de29 in qcsprfro_tree () #7 0x000000000942de6e in qcsprfro_tree () #8 0x0000000002dd80c5 in qcspafq () #9 0x0000000002dd51d9 in qcspqbDescendents () #10 0x0000000002dd91e4 in qcspqb () #11 0x0000000001a6b2be in kkmdrv () #12 0x0000000002584c76 in opiSem () #13 0x000000000258ac8b in opiDeferredSem () #14 0x000000000257dc32 in opitca () #15 0x0000000001ec3d7d in kksFullTypeCheck () #16 0x00000000092a7256 in rpiswu2 () #17 0x0000000001eca977 in kksLoadChild () #18 0x0000000009298448 in kxsGetRuntimeLock () #19 0x000000000925aa34 in kksfbc () #20 0x000000000925556e in kkspsc0 () #21 0x0000000009254e6a in kksParseCursor () #22 0x000000000933cb25 in opiosq0 () #23 0x0000000001b82a46 in kpooprx () #24 0x0000000001b80d2c in kpoal8 () #25 0x00000000091fb8b8 in opiodr () #26 0x000000000939e696 in ttcpip () #27 0x000000000180f011 in opitsk () #28 0x0000000001813c0a in opiino () #29 0x00000000091fb8b8 in opiodr () #30 0x000000000180af4c in opidrv () #31 0x0000000001e0a77b in sou2o () #32 0x0000000000a0cc05 in opimai_real () #33 0x0000000001e106ec in ssthrdmain () #34 0x0000000000a0cb71 in main ()
What is interesting to see, is line 23, backtrace layer number 21: kksParseCursor. So actually during parsing the detection of storage servers happens already, not when it actually starts a full segment scan needs to make a decision to do a smartscan or not.