Who's online

There are currently 0 users and 51 guests online.

Recent comments


Oakies Blog Aggregator

Shrink Tablespace

A recent question on the OTN database forum raised the topic of returning free space in a tablespace to the operating system by rebuilding objects to fill the gaps near the start of files and leave the empty space at the ends of files so that the files could be resized downwards.

This isn’t a process that you’re likely to need frequently, but I have written a couple of notes about it, including a sample query to produce a map of the free and used space in a tablespace. While reading the thread, though, it crossed my mind that recent versions of Oracle introduced a feature that can reduce the amount of work needed to get the job done, so I thought I’d demonstrate the point here.

When you move a table its indexes become unusable and have to be rebuilt; but when an index becomes unusable, the more recent versions of Oracle will drop the segment. Here’s a key point – if the index becomes unusable because the table has been moved the segment is dropped only AFTER the move has completed. Pause a minute for thought and you realise that the smart thing to do before you move a table is to make its indexes unusable so that they release their space BEFORE you move the table. (This strategy is only relevant if you’ve mixed tables and indexes in the same tablespace and if you’re planning to do all your rebuilds into the same tablespace rather than moving everything into a new tablespace.)

Here are some outputs demonstrating the effect in a database. I have created (and loaded) two tables in a tablespace of 1MB uniform extents, 8KB block size; then I’ve created indexes on the two tables. Running my ts_hwm.sql script I get the following results for that tablespace:

------- ----------- ----------- ---------- --------------- ------------------
      5         128         255 TEST_USER  T1              TABLE
                256         383 TEST_USER  T2              TABLE
                384         511 TEST_USER  T1_I1           INDEX
                512         639 TEST_USER  T2_I1           INDEX
                640      65,535 free       free

Notice that it was a nice new tablespace, so I can see the two tables followed by the two indexes at the start of the tablespaces. If I now move table t1 and re-run the script this is what happens:

alter table t1 move;

------- ----------- ----------- ---------- --------------- ------------------
      5         128         255 free       free
                256         383 TEST_USER  T2              TABLE
                384         511 free       free
                512         639 TEST_USER  T2_I1           INDEX
                640         767 TEST_USER  T1              TABLE
                768      65,535 free       free

Table t1 is now situated past the previous tablespace highwater mark and I have two gaps in the tablespace where t1 and the index t1_i1 used to be.

Repeat the experiment from scratch (drop the tables, purge, etc. to empty the tablespace) but this time mark the index unusable before moving the table and this is what happens:

------- ----------- ----------- ---------- --------------- ------------------
      5         128         255 free       free
                256         383 TEST_USER  T2              TABLE
                384         511 TEST_USER  T1              TABLE
                512         639 TEST_USER  T2_I1           INDEX
                640      65,535 free       free

Table t1 has moved into the space vacated by index t1_i1, so the tablespace highwater mark has not moved up.

If you do feel the need to reclaim space from a tablespace by rebuilding objects, you can find that it’s quite hard to decide the order in which the objects should be moved/rebuilt to minimise the work you (or rather, Oracle) has to do; if you remember that any table you move will release its index space anyway and insert a step to mark those indexes unusable before you move the table you may find it’s much easier to work out a good order for moving the tables.

Footnote: I appreciate that some readers may already take advantage of the necessity of rebuilding indexes by dropping indexes before moving tables – but I think it’s a nice feature that we can now make them unusable and get the same benefit without introducing a risk of error when using a script to recreate an index we’ve dropped.


Oracle OpenWorld 2014 – XML and JSON presentations, meetings and Hands-On Labs

Oracle OpenWorld is about to start and I am looking forward to it. This year I will be able to learn and enjoy it a...
class="readmore">Read More

Oracle Open World Schedule

I will be in San Francisco a little early this year and as I’m not longer an ACE Director, it’s to attend the IOUC, (International Oracle User Conference) at Redwood Shores on Sept. 25th and 26th.  I’ll be representing RMOUG for the two day event that has representatives from user groups from all over the world.  Our lovely host, Mary Lou Dopart and her team, will ensure we have another great year learning how we can provide the best user group for our members in the Oracle community.

After the a lovely weekend in San Francisco with friends, I’ll be heading over with the rest of the ACE Directors to stay at the Hilton, (as this is the hotel with the worst Wi-Fi in town, I will be bringing signal flags to communicate with my fellow guests out the windows in the courtyard,which seemed successful last year… :))  As Tim is an ACED, I go where Tim goes and my coworkers have quickly called “unfair!”, but hey, being Tim Gorman’s partner has its serious perks… :)

The week will be filled with an incredible list of meetings, events, parties and dinners, but I will also be working my combat boots off for Oracle.

I have two sessions this year:

Zero to Manageability in One Hour: Build a Solid Foundation for Oracle Enterprise Manager 12c [CON8134] — Wednesday, Oct. 1st 1:45pm, Moscone South 303

Oracle Exadata Database Machine Administration and Monitoring Made Easy [CON7726] —  Wednesday, Oct. 1st 4:45pm, Moscone South 104

I don’t have my AWR Warehouse talk for Oak Table World, (sorry folks, total mix up…) but I do have an OTN article on the topic coming out and I’ve sent my use cases to support Jagan Arthreya’s session, (Monday, Sept. 29th 1:30pm, Moscone South 104 [CON8449]).

When not presenting, I’ll be found at a number of demo grounds, hands on labs and AWR Warehouse Demo Grounds.  Scheduled times are:

Monday-  Afternoon, EM Demo Grounds

Tuesday-  Morning, EM Demo Grounds, Afternoon, DBaaS Demo Grounds

Events I’ll be attending outside of Oracle company events:

Sunday:  IOUG Happy Hour, followed by the Oracle ACE Dinner.

Monday:  ODTUG Happy Hour, followed by Friends of Pythian

Tuesday: ODTUG Community Meetup

Wednesday:  OTN Bloggers Meetup

Thursday: Recording of Exadata Session for Oracle Database Learning Stream

Then Tim, (Tim GORMAN,for all of those who keep congratulating Tim Hall on our upcoming nuptials…LOL)  and I fly back on Friday, met by any of our out of town family so we can get married on Sunday, Oct. 5th.  Yeah, you heard me right….we are going to get married right after Oracle Open World… :)







Copyright © DBA Kevlar [Oracle Open World Schedule], All Right Reserved. 2014.

Private Cloud and Lifecycle Management with EM12c at OOW14

Yet again the monster that is Oracle Open World is about to take over San Francisco. I won’t be making it this year, but considering we had something like 60,000 attendees last year I’d hate to see what the numbers are gonna be this year! :)

To try and make it a little easier for you to find all the Private Cloud and Lifecycle Management with EM12c material, here are the ones I know about. If you know of any I’ve missed, feel free to add them in the comments field below! Note that this is only the Private Cloud and Lifecycle Management material – the complete master list of all EM material can be found in the master Focus On EM12c document.

Hands-On Labs

I like to start off with the hands-on labs (HOL) since to me the greatest value is in getting your hands dirty with the technology. Some of these may already be full by the time you read this so be quick if you want to add them to your schedule!

  • Oracle Enterprise Manager Database as a Service: Automation for Broader Cloud Services – In this hands-on lab, you will explore the latest release of Oracle Enterprise Manager, 12c, which enables rapid enterprisewide adoption of database as a service (DBaaS), assisted by a new service catalog and self-service provisioning features that simplify high availability, disaster recovery, and fast data cloning. Participants will choose from a variety of basic to advanced exercises that cover:
    • Creating a DBaaS platform for new database provisioning, using standardized pluggable DBaaS
    • Deploying a comprehensive service catalog
    • Exploring new agile data refresh capabilities with Snap Clone and the Oracle Recovery Manager (Oracle RMAN) feature of Oracle Database
    • Maintaining the cloud DBaaS infrastructure

    This is scheduled twice – Monday 1:15 pm – 2:15 pm and Tuesday 5:15 pm – 6:15 pm.

  • Rapidly Mass-Deploy Oracle Fusion Middleware with Oracle Enterprise Manager 12c Provisioning – Enterprises are increasingly using Oracle Enterprise Manager 12c to rapidly mass-deploy Oracle Fusion Middleware components. The secure and stable provisioning platform of Oracle Enterprise Manager enables administrators to quickly provision a large number of environments on virtual or physical hosts, based on gold images. In this hands-on lab, you’ll learn how to discover and verify an Oracle SOA environment, based on best practices, and how to make a clone of this environment and save it to Oracle Enterprise Manager. You’ll then learn how to use this cloned image to provision a new instance of the initial environment. This HOL is only scheduled once – Monday 2:45 pm to 3:45 pm
  • Achieving Standardization with Oracle Enterprise Manager Database Lifecycle Management – Database deployment standardization is the key to maximizing DBA efficiency and is a first step toward the cloud. Oracle Enterprise Manager 12c supports comprehensive lifecycle management geared toward standardization, including automated provisioning; patching; and change, configuration, and compliance management. This hands-on lab focuses on these key features and how they support standardization. This HOL is scheduled twice – Monday 4:15 pm – 5:15 pm and Wednesday 4:15 pm to 5:15 pm.
  • Private Cloud Self-Service, Oracle Fusion Middleware PaaS with Oracle Enterprise Manager 12c – The Oracle Enterprise Manager platform enables enterprises to quickly deploy a private middleware PaaS cloud integrated with other services such as IaaS or DBaaS and empowers them to deliver an agile, secure, and stable self-service model for deploying Java platforms and applications to their users. In this session, you’ll learn how to create and publish services from the administrator’s viewpoint and how to make those available to self-service users. You’ll also learn how to provision and configure those services for use, deploy your Java applications, and integrate a middleware PaaS cloud with a database PaaS cloud. Finally, you’ll see how you can monitor your deployed applications and configure policies for performance- or schedule-based scaling. This HOL is scheduled twice – Tuesday 3:45 pm – 4:45 pm and Thursday 1 pm – 2 pm.

Conference Sessions

These are the standard presentations slots. They’ve been cut back to 45 minutes this year, so you’ll either need to ask questions quickly or grab the speakers afterwards if you have more detailed questions. Those marked with an asterisk are presentations by Oracle employees, those with a plus sign are co-presented by Oracle and customer speakers. those without either are totally presented by customers.

Monday September 29

*Best Practices from Oracle Cloud Delivered On-Premises with Oracle Enterprise Manager – CON8018, Shailesh Dwivedi, 1:30 – 2:15.
Database Software Currency: Using Oracle Enterprise Manager 12c Provisioning and Patching – CON3178, Gary Henderson, 2:45 – 3:30.
*How to Deliver Oracle Fusion Middleware as a Service to Your Organization – CON7953, Vikas Anand, Pyounguk Cho, Shailesh Dwivedi, Yoav Eilat, Kelly Goetsch, 4:00 – 4:45.
Why Database as a Service Will Be a Breakaway Technology at Société Générale – CON2436, Christian BILIEN, 5:15 – 6:00

Tuesday September 30

*Snap Clone: Snapshot Your Data Without Snapping Your Storage – CON8122, Bala Kuchibhotla, Subhadeep Sengupta, 10:45 – 11:30
*Create a DBaaS Catalog in an Hour with a PaaS-Ready Infrastructure – CON5748, Roger Bitar, Kenneth Kutzer, Ramin Moazen, 12:00 – 12:45

Wednesday October 1

*Oracle Infrastructure Systems Management with Oracle Enterprise Manager and Ops Center – CON4954, Simon Hayler, 10:15 – 11:00
+Databases to Oracle Exadata: The Saga Continues for Oracle Enterprise Manager–Based Patching – CON8121 , Brian Bong, Dee Hicks, Hari Srinivasan, 10:15 – 11:00
+Middleware as a Service: Converged Solution for Administrators and DevOps – CON8091, Ron Clanton, Gebhard Herget, Henrik Blixt,
12:45 – 1:30
+DBaaS 2.0: Rapid Provisioning, Richer Services, Integrated Testing, and More – CON8016, George Mamvura Mamvura, Javier Ruiz, Adeesh Fulay, 3:30 – 4:15
*Database as a Service (DBaaS) Cookbook: Strategies and Tips for Successful Deployment – CON8260, Ashish Agrawal, Adeesh Fulay, GP Gongloor, 4:45 – 5:30

Thursday October 2

+Enterprise Architecture Approach to Developing a DBaaS Private Cloud at Boeing – CON3028, Sami Turan, Enes Yildirim, Yunas Nadiadi, 9:30 – 10:15
PDBaaS with Oracle Enterprise Manager 12c – CON4448, Leighton Nelson, 12:00 – 12:45
+Security Compliance and Data Governance: Dual Problems, Single Solution – CON8015, Steve Ries, David Wolf, 1:15 – 2:00


Finally, there are demonstrations available for you to look at whenever the Demogrounds are open (usually closed only during the keynotes, IIRC):

Automation and Storage Savings with Database as a Service and Snap Clone – Moscone South, Left – SLD-102
Complete Database Lifecycle Management – Moscone South, Left – SLD-107
Middleware PaaS in Private Cloud with Oracle Enterprise Manager 12c – Moscone South, Left – SLM-111
Oracle SuperCluster and Oracle VM for SPARC Management with Oracle Enterprise Manager Ops Center 12c – Moscone South, Center – SC-158

Hope that helps you sort out what’s of most value to you at OOW14!

#CloneAttack at Oracle OpenWorld 2014

Delphix and Dbvisit will be at the OTN Lounge in the lobby of Moscone South from 3:30 – 5:00pm on Monday 29-Sept.  Come join us to hear about #CloneAttack and #RepAttack, two great hands-on learning opportunities.


#CloneAttack is your chance to install a complete Delphix lab environment on your Windows or Mac laptop for you to play with and experiment at any time.  Experts Kyle Hailey, Steve Karam, Adam Bowen, Ben Prusinski, and I will be sharing USB “thumb” drives with the virtual machine OVA files for the lab environment, and we will be working one-on-one with you to help you get everything up and running, then to show you basic use-cases for cloning with Delphix.

Bring your laptop, bring your VMware, and get some data virtualization into your virtual life!

At the same time, #CloneAttack will be joined by #RepAttack by Dbvisit, where Arjen Visser, Jan Karremans, and the team will be helping you replicate Oracle to Oracle for zero downtime upgrades.

This just in!  #MonitorAttack from Confio SolarWinds will also be joining the party at the CCM on Tuesday to show you how to quickly and easily install Confio Ignite and enjoy the great features there.


Children’s Creativity Museum, 221 4th St, San Francisco


Tuesday, Sept 30 from 10am – 5pm PDT

Requirements and preparation before arriving:

  • Mac: VMware Fusion (free trial version download)
  • Win: VMware Workstation (free trial version download)
  • Oracle: Oracle11gR2 (free trial version – disk1, disk2)
  • at least 8 GB RAM
  • at least 50 GB free disk space, but preferably 100 GB free
  • at least 2 Ghz CPU, preferably dual-core or better

Oracle Midlands Event #6 : Registration Open

Registration for Oracle Midlands Event #6 is now open. Here’s the information from the website.


I’ve registered. See you there!



Oracle Midlands Event #6 : Registration Open was first posted on September 18, 2014 at 10:53 pm.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

Oracle EMEA Customer Support Services Excellence Award 2014

The corporation announced today that I got the Customer Services Excellence Award 2014 in the category ‘Customer Champion’ for the EMEA region. It is an honor to be listed there together with these excellent professionals that I proudly call colleagues.

CSS Excellence Award 2014

Blog articles at Toad World

I just wanted to point you to the blog articles I've been posting over at Toad World in case you're wondering why there aren't many articles showing up here.

So, go take a look!

Karen's Toad World Blog

Oracle Midlands Event #5 : Summary

Oracle Midlands Event #5 happened last night.

First up was Martin Widlake speaking about clustering data to improve performance. The cool and scary thing about Oracle is you often go into session like this thinking it’s all going to be stuff you already know, then you realise how much you either didn’t know in the first place, or had forgotten. A couple of times Martin asked questions of the audience and I felt myself shrinking back in my seat and chanting the mantra, “Don’t pick me!”, in my head. :)

After food and some chatting there was freebie prize draw. I won a RedGate t-shirt. I shall have to remember to take it to OOW and “accidentally” wear it whilst walking past the RedGate booth. :)

Next up was Ron Ekins speaking about data virtualization and instant cloning. The session started off with a general introduction to the subject and a quick look at the products available to achieve it. This session also included a live demo of Delphix. The more I hear about Delphix, the more impressed I am. Judging by the Oracle geeks they’ve drawn into the company in recent years (Kyle Haley, Steve Karam and Tim Gorman) it seems I’m not alone in that opinion. :)

Thanks to RedGate for sponsoring the event, allowing it remain free.  Thanks to Mike Dirden for organising the event. Thanks to the speakers travelling to Birmingham to teach us some stuff. Thanks also to the attendees, without whom this would not happen. Please keep spreading the word. :)



PS. The next event will be with Bjoern Rost in November. Keep an eye on the Oracle Midlands site for details, which should be appearing in the next few days.

Oracle Midlands Event #5 : Summary was first posted on September 17, 2014 at 11:12 am.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

About index range scans, disk re-reads and how your new car can go 600 miles per hour!

Despite the title, this is actually a technical post about Oracle, disk I/O and Exadata & Oracle In-Memory Database Option performance. Read on :)

If a car dealer tells you that this fancy new car on display goes 10 times (or 100 or 1000) faster than any of your previous ones, then either the salesman is lying or this new car is doing something radically different from all the old ones. You don’t just get orders of magnitude performance improvements by making small changes.

Perhaps the car bends space around it instead of moving – or perhaps it has a jet engine built on it (like the one below :-) :

Anyway, this blog entry is a prelude to my upcoming Oracle In-Memory Database Option series and here I’ll explain one of the radical differences between the old way of thinking and modern (In-Memory / Smart Scan) thinking that allow such performance improvements.

To set the scope and and clarify what I mean by the “old way of thinking”: I am talking about reporting, analytics and batch workloads here – and the decades old mantra “if you want more speed, use more indexes”.

I’m actually not going to talk about the In-Memory DB option here – but I am going to walk you through the performance numbers of one index range scan. It’s a deliberately simple and synthetic example executed on my laptop, but it should be enough to demonstrate one important point.

Let’s say we have a report that requires me to visit 20% of rows in an orders table and I’m using an index range scan to retrieve these rows (let’s not discuss whether that’s wise or not just yet). First, I’ll give you some background information about the table and index involved.

My test server’s buffer cache is currently about 650 MB:

SQL> show sga

Total System Global Area 2147483648 bytes
Fixed Size                  2926472 bytes
Variable Size             369100920 bytes
Database Buffers          687865856 bytes
Redo Buffers               13848576 bytes
In-Memory Area           1073741824 bytes

The table I am accessing is a bit less than 800 MB in size, about 100k blocks:

SQL> @seg soe.orders

---------- ------ -------------  ------------- -------- 
       793 SOE    ORDERS         TABLE           101504 

I have removed some irrelevant output from the output below, I will be using the ORD_WAREHOUSE_IX index for my demo:

SQL> @ind soe.orders
Display indexes where table or index name matches %soe.orders%...

------------ ----------- ------------------ ---- --------------- ----
SOE          ORDERS      ORDER_PK              1 ORDER_ID
                         ORD_WAREHOUSE_IX      1 WAREHOUSE_ID
                                               2 ORDER_STATUS

------------ ----------- ----------------- ---------- ---- -------- ---- ---- -- ------- --------- ---------- --------- ----------------- ------ ---------
SOE          ORDERS      ORDER_PK          NORMAL/REV YES  VALID    NO   N     3   15801   7148950    7148950   7148948 20140913 16:17:29 16     VISIBLE
             ORDERS      ORD_WAREHOUSE_IX  NORMAL     NO   VALID    NO   N     3   17860      8685    7148950   7082149 20140913 16:18:03 16     VISIBLE

I am going to do an index range scan on the WAREHOUSE_ID column:

SQL> @descxx soe.orders

Col# Column Name                    Null?      Type                      NUM_DISTINCT        Density  NUM_NULLS HISTOGRAM       NUM_BUCKETS Low Value                        High Value
---- ------------------------------ ---------- ------------------------- ------------ -------------- ---------- --------------- ----------- -------------------------------- --------------------------------
   1 ORDER_ID                       NOT NULL   NUMBER(12,0)                   7148950   .00000013988          0                           1 1                                7148950
   9 WAREHOUSE_ID                              NUMBER(6,0)                        999   .00100100100          0                           1 1                                999

Also, I enabled SQL trace and event 10298 – “ORA-10298: ksfd i/o tracing”, more about that later:

SQL> ALTER SESSION SET EVENTS '10298 trace name context forever, level 1';

Session altered.


PL/SQL procedure successfully completed.


Ok, now we are ready to run the query! (It’s slightly formatted):

SQL> SELECT /*+ MONITOR INDEX(o, o(warehouse_id)) */ 
         soe.orders o 
         warehouse_id BETWEEN 400 AND 599;

Let’s check the basic autotrace figures:

          0  recursive calls
          0  db block gets
    1423335  consistent gets
     351950  physical reads
          0  redo size
        347  bytes sent via SQL*Net to client
        357  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

What?! We have done 351950 physical reads?! This is 351950 blocks read via physical read operations. This is about 2.7 GB worth of IOs done just for this query! Our entire table size was under 800MB and the index size under 150MB. Shouldn’t indexes allow us to visit less blocks than the table size?!

Let’s dig deeper – by breaking down this IO number by execution plan line (using a SQL Monitoring report in this case):

Global Stats
| Elapsed |   Cpu   |    IO    | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes |
|      48 |      25 |       23 |     1 |     1M | 352K |   3GB |

SQL Plan Monitoring Details (Plan Hash Value=16715356)
| Id |               Operation                |       Name       | Execs |   Rows   | Read | Read  | Activity |       Activity Detail       |
|    |                                        |                  |       | (Actual) | Reqs | Bytes |   (%)    |         (# samples)         |
|  0 | SELECT STATEMENT                       |                  |     1 |        1 |      |       |          |                             |
|  1 |   SORT AGGREGATE                       |                  |     1 |        1 |      |       |          |                             |
|  2 |    TABLE ACCESS BY INDEX ROWID BATCHED | ORDERS           |     1 |       1M | #ff0000;">348K |   #ff0000;">3GB |    96.30 | Cpu (1)                     |
|    |                                        |                  |       |          |      |       |          | db file parallel read (25)  |
|  3 |     INDEX RANGE SCAN                   | ORD_WAREHOUSE_IX |     1 |       1M | 3600 |  28MB |     3.70 | db file sequential read (1) |

So, most of these IOs come from accessing the table (after fetching relevant ROWIDs from the index). 96% of response time of this query was also spent in that table access line. We have done about ~348 000 IO requests for fetching blocks from this table. This is over 3x more blocks than the entire table size! So we must be re-reading some blocks from disk again and again for some reason.

Let’s confirm if we are having re-reads. This is why I enabled the SQL trace and event 10298. I can just post-process the tracefile and see if IO operations with the same file# and block# combination do show up.

However, using just SQL trace isn’t enough because multiblock read wait events don’t show all blocks read (you’d have to infer this from the starting block# and count#), the “db file parallel read” doesn’t show any block#/file# info at all in SQL Trace (as this “vector read” wait event encompasses multiple different block reads under a single wait event).

The classic single block read has the file#/block# info:

WAIT #139789045903344: nam='db file sequential read' ela= 448 file#=2 block#=1182073 blocks=1 obj#=93732 tim=156953721029

The parallel read wait events don’t have individual file#/block# info (just total number of files/blocks involved):

WAIT #139789045903344: nam='db file parallel read' ela= 7558 files=1 blocks=127 requests=127 obj#=93696 tim=156953729450

Anyway, because we had plenty of db file parallel read waits that don’t show all the detail in SQL Trace, I also enabled the event 10298 that gives us following details below (only tiny excerpt below):

ksfd_osdrqfil:fob=0xce726160 bufp=0xbd2be000 #ff0000;">blkno=1119019 nbyt=8192 flags=0x4
ksfdbio:rq=0x7f232c4edb00 fob=0xce726160 aiopend=126
ksfd_osdrqfil:fob=0xce726160 bufp=0x9e61a000 #ff0000;">blkno=1120039 nbyt=8192 flags=0x4
ksfdbio:rq=0x7f232c4edd80 fob=0xce726160 aiopend=127
ksfdwtio:count=127 aioflags=0x500 timeout=2147483647 posted=(nil)
ksfdchkio:ksfdrq=0x7f232c4edb00 completed=1
ksfdchkio:ksfdrq=0x7f232c4edd80 completed=0
WAIT #139789045903344: nam='db file parallel read' ela= 6872 files=1 blocks=127 requests=127 obj#=93696 tim=156953739197

So, on Oracle on Linux x86_64 with xfs filesystem with async IO enabled and filesystemio_options = SETALL we get the “ksfd_osdrqfil” trace entries to show us the block# Oracle read from a datafile. It doesn’t show the file# itself, but it shows the accessed file state object address (FOB) in SGA and as it was always the same in the tracefile, I know duplicate block numbers listed in trace would be for the same datafile (and not for a block with the same block# in some other datafile). And the tablespace I used for my test had a single datafile anyway.

Anyway, I wrote a simple script to summarize whether there were any disk re-reads in this tracefile (of a select statement):

$ grep ^ksfd_osdrqfil LIN121_ora_11406.trc | awk '{ print $3 }' | sort | uniq -c | sort -nr | head -20
     10 blkno=348827
     10 blkno=317708
      9 blkno=90493
      9 blkno=90476
      9 blkno=85171
      9 blkno=82023
      9 blkno=81014
      9 blkno=80954
      9 blkno=74703
      9 blkno=65222
      9 blkno=63899
      9 blkno=62977
      9 blkno=62488
      9 blkno=59663
      9 blkno=557215
      9 blkno=556581
      9 blkno=555412
      9 blkno=555357
      9 blkno=554070
      9 blkno=551593

Indeed! The “worst” blocks have been read in 10 times – all that for a single query execution.

I only showed 20 top blocks here, but even when I used “head -10000″ and “head -50000″ above, I still saw blocks that had been read in to buffer cache 8 and 4 times respectively.

Looking into earlier autotrace metrics, my simple index range scan query did read in over 3x more blocks than the total table and index size combined (~350k blocks read while the table had only 100k blocks)! Some blocks have gotten kicked out from buffer cache and have been re-read back into cache later, multiple times.

Hmm, let’s think further: We are accessing only about 20% of a 800 MB table + 150 MB index, so the “working set” of datablocks used by my query should be well less than my 650 MB buffer cache, right? And as I am the only user in this database, everything should nicely fit and stay in buffer cache, right?

Actually, both of the arguments above are flawed:

  1. Accessing 20% of rows in a table doesn’t automatically mean that we need to visit only 20% blocks of that table! Maybe all of the tables’s blocks contain a few of the rows this index range scan needs? So we might need to visit all of that table’s blocks (or most of them) and extract only a few matching rows from each block. But nevertheless, the “working set” of required blocks for this query would include almost all of the table blocks, not only 20%. We must read all of them in at some point in the range scan.So, the matching rows in table blocks are not tightly packed and physically in correspondence with the index range scan’s table access driving order, but are potentially “randomly” scattered all over the table.This means that an index range scan may come back and access some data block again and again to get a yet-another row from it when the ROWID entries in index leaf blocks point there. This is what I call buffer re-visits(Now scroll back up and see what is that index’es clustering factor :-)

  2. So what, all the buffer re-visits should be really fast as the previously read block is going to be in buffer cache, right?Well, not really. Especially when the working set of blocks read is bigger than buffer cache. But even if it is smaller, the Oracle buffer cache isn’t managed using basic LRU replacement logic (since 8.1.6). New blocks that get read in to buffer cache will be put into the middle of the “LRU” list and they work their way up to the “hot” end only if they are touched enough times before someone manages to flush them out. So even if you are a single user of the buffer cache, there’s a chance that some just recently read blocks get aged out from buffer cache – by the same query still running – before they get hot enough. And this means that your next buffer re-visit may turn into a disk block re-read that we saw in the tracefiles.If you combine this with the reality of production systems where there’s a thousand more users trying to do what you’re doing, at the same time, it becomes clear that you’ll be able to use only a small portion of the total buffer cache for your needs. This is why people sometimes configure KEEP pools – not that the KEEP pool is somehow able to keep more blocks in memory for longer per GB of RAM, but simply for segregating the less important troublemakers from more important… troublemakers :)


So what’s my point here – in the context of this blog post’s title?

Let’s start from Exadata – over the last years it has given many customers order(s) of magnitude better analytics, reporting and batch performance compared to their old systems, if done right of course. In other words, instead of indexing even more, performing wide index range scans with millions of random block reads and re-reads, they ditched many indexes and started doing full table scans. Full table scans do not have such “scaling problems” like a wide index range scan (or a “wide” nested loop join driving access to another table). In addition you got all the cool stuff that goes really well with full scans – multiblock reads, deep prefetching, partition-wise hash joins, partition pruning and of course all the throughput and Smart Scan magic on Exadata).

An untuned complex SQL on a complex schema with lots of non-ideal indexes may end up causing a lot of “waste IO” (don’t have a better term) and similarly CPU usage too. And often it’s not simple to actually fix the query – as it may end up needing a significant schema adjustment/redesign that would require also changing the application code in many different places (ain’t gonna happen). With defaulting reporting to full table scans, you can actually eliminate a lot of such waste, assuming that you have a high-througput – and ideally smart – IO subsystem. (Yes, there are always exceptions and special cases).

We had a customer who had a reporting job that ran almost 2000x faster after moving to Exadata (from 17 hours to 30 seconds or something like that). Their first reaction was: “It didn’t run!” Indeed it did run and it ran correctly. Such radical improvement came from the fact that the new system – compared to the old system – was doing multiple things radically better. It wasn’t just an incremental tweak of adding a hint or a yet another index without daring to do more significant changes.

In this post I demoed just one of the problems that’s plaguing many of the old-school Oracle DW and reporting systems. While favoring full table scanning had always been counterintuitive for most Oracle shops out there, it was the Exadata’s hardware, software and also the geek-excitement surrounding it, what allowed customers to take the leap and switch from the old mindset to new. I expect the same from the Oracle In-Memory Database Option. More about this in a following post.