Oakies Blog Aggregator

Oracle Exadata Database Machine Handily Handles The Largest Database In Oracle IT. What Does That Really Mean?

In my recent post entitled Oracle Executives Underestimate SPARC SuperCluster I/O Capability–By More Than 90 Percent! I offered some critical thinking regarding the nonsensical performance claims attributed to the Sun SPARC SuperCluster T4 in one of the keynotes at Oracle Openworld 2011. In the comment thread of that post a reader asks:

All – has anyone measured the actual IOPS from disk as well as from flash in your Exadata (production) environment and compare with what the Oracle white paper or CXO presentations claimed?

That is a good question. It turns out the reader is in luck, because there happens to be really interesting public information that can answer his question. According to this searchoracle.techtarget.com article, Campbell Webb, Oracle’s VP of Product Development IT, refers to Oracle’s Beehive email and collaboration database as “Oracle’s largest backend database.” Elsewhere in the article, the author writes:

Oracle’s largest in-house database is a 101-terabyte database running Beehive, the company’s in-house email and collaboration software, running on nine Oracle Exadata boxes.

“Exadata boxes?” After that misnomer I’m glad I don’t have to stick with the techtarget.com piece to finish this post.  As it turns out, Oracle’s Campbell Webb delivered a presentation on the Beehive system at Oracle Openworld 2011.  The presentation (PDF) can be found here. I’ll focus on some screenshots of that PDF to finish out this post.

According to the following slide from the presentation, we glean the following facts about Oracle’s Beehive database:

  • The Beehive database is approximately 93TB
  • Redo generation peaks at 20MB/s—although it is unclear whether that is per instance or the aggregate of all instances of this Real Application Clusters database
  • User SQL executions peak at roughly 30,000 per second

The techtarget.com piece quotes Campbell Webb as stating the configuration is 9 racks of Exadata gear with 24 instances of the database—but “only” 16 are currently active. That is a lot of Oracle instances and, indeed, a lot of instances can drive a great deal of physical I/O. Simply put, a 9-rack Exadata system is gargantuan.

The following is a zoom-in photo of slide 12 from the presentation. It spells out that the configuration has the standard 14 Exadata Storage Servers per rack (126 / 14 == 9) and that the hosts are X2-2 models. In a standard configuration there would be 72 database hosts in a 9-rack X2-2 setup, but the techtarget.com article quotes Webb as stating 16 are active and there are only 24 in total. More on that later.

With this much gear we should expect astounding database throughput statistics. That turns out to not be the case. The following slide shows:

  • 4,000,000 logical I/O per second at peak utilization. That’s 250,000 db block gets + consistent gets (cache buffers chains walks) per second per active host (16 hosts). That’s a good rate of SGA buffer pool cache activity—but not a crushing load for 2-socket Westmere EP servers.
  • The physical read to write ratio is 88:12.
  • Multiblock physical I/Os are fulfilled by Exadata Storage Servers in 6 milliseconds or less on average
  • Single block reads are largely satisfied in Exadata Smart Flash Cache as is evidenced by the 1ms waits
  • Finally, database physical I/O peaks at 176,000 per second

176,000 IOPS
With 126 storage servers there is roughly 47TB of Exadata Smart Flash Cache. Considering the service times for single block reads there is clear evidence that the cache management is keeping the right data in the cache. That’s a good thing.
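Incidentally, for anyone who wants to answer the reader’s question about measured disk and flash IOPS on their own system, a minimal sketch against GV$SYSSTAT is shown below. It is only an illustration; the statistic names are the standard ones, but adapt it to your environment and tooling.

-- Sample these cumulative counters twice, subtract, and divide by the
-- elapsed seconds to get read IOPS, write IOPS and the portion of reads
-- satisfied in Exadata Smart Flash Cache.
select inst_id, name, value
from   gv$sysstat
where  name in ('physical read total IO requests',
                'physical write total IO requests',
                'cell flash cache read hits')
order by inst_id, name;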

On the other hand, I see a cluster of 16 2U dual-socket Westmere-EP Real Application Clusters servers driving peak IOPS of 176,000. Someone please poke me with a stick because I’m bored to death—falling asleep. Nine racks of Exadata have capacity for 13,500,000 IOPS (read operations only, of course). This database is driving roughly 1% of that.

Nine racks of Exadata should have 72 database hosts. I understand not racking them if you don’t need them, but the configuration is using fewer than 2 active hosts per rack—although, yes, there are 24 cabled (fewer than 3 per rack). Leaving out 48 X2-2 hosts is 96U—more than a full rack’s worth of aggregate wasted space. I don’t understand that. The servers are likely in the racks—powered off. You, the Oracle customer, can’t do that because you aren’t Oracle Product Development IT. You’ll be looking at capex—or a custom Exadata configuration if you need 16 hosts fed by 126 cells.

Parting Thoughts
It is not difficult to configure a Real Application Clusters system capable of beating 16 2-socket Westmere EP servers, with their 176,000 IOPS demand, with far, far less than 9 racks of hardware. It would be Oracle software just the same. And, once a modern, best-of-breed system is happily steaming along its way hustling 176,000 IOPS, you could even call it an “Engineered System.” There’d be no Exadata bragging rights though, just a good system handling a moderate workload. There is nothing about this workload that can’t be easily handled with conventional, best-of-breed storage. EMC technology with FAST quickly comes to mind.

Beehive is Oracle’s largest database and it runs on a huge Exadata configuration. Those two facts put together do not make any earth-shattering proof point when you study the numbers.

I don’t get it. Well, actually I do.

By the way, did I mention that 176,000 IOPS is not a heavy IOPS load–especially when only 12% of them are writes?

Filed under: oracle

What Have I Let Myself in For! – UKOUG this year

One of my favourite Oracle happenings of the year is fast approaching, the UK Oracle User Group technical conference {see/click on the link on the right margin}. I’ve blogged before ( like here, last year) why I think it is so good.

I try to present at the conference each year, and I go whether I am presenting or not.

However, this year I think I might have got myself into trouble. I put forward 3 talks, expecting one or possibly two to get through. One on Index Organized Tables, one on IT disasters and one as an introduction to database design – I’ve moaned about it being a dying art so I figured I should get off my backside and do something positive about it. Each talk is in a different stream.

Well, the IOT talk was accepted, the Disasters talk was rejected and the Database Design intro was put on the reserve list. I was happy with that. I did three talks the first year I presented and promised myself never to be that stupid again {I spent most of the conference in the Speaker’s lounge or my hotel putting the talks together and tweaking slides}.

What I was not expecting was for the OakTable to ask me to do the IOT talk on the OakTable Sunday. Yikes! {The OakTable Sunday is a great opportunity to see a set of presentations by people who really know their stuff in a smaller setting – You really want to get along to it if you can}. However I had two reasons not to do it:

  1. I would have to miss one of the other OakTable talks.
  2. That thing I said about people presenting who really know their stuff.

I was told that (1) was not a problem as the talks would be repeated in the main conference so I would have an opportunity to see the one I missed and (2) stop being so British and do it. {In fact, one friend on the OakTable told me off after the last conference for my criticism of my own presentation that year – “yes it was poor for you but normally you do a good job, so keep doing it”}. Of course I said yes.

Then it struck me: I was presenting twice now, once on Sunday and then again on Wednesday in hall 5 {I’ll probably not simply repeat the contents; at the OakTable Sunday I’ll assume a little more knowledge from the audience and dig a bit deeper technically, while in the main conference I’ll cover the basics more, for those utterly new to IOTs}. At least it was only one set of slides to prepare.

A few days later I got a mail from the UKOUG office. A gap had appeared in the Development stream; would I be willing to do my “Oracle Lego – an introduction to database design” talk – but beef it up a little? Yes, sure. What do you mean by beef it up? The dev stream guys wanted something that went into more detail and was more about some of the more challenging systems I’ve worked on. So we exchanged a few emails and it quickly became apparent that some wanted the intro talk I had originally proposed, to get people going with database design. Others felt there would be more of an audience for a more in-depth talk, so could I span both? I had to say no. I remember attending my Oracle database design course in 1993. It was 5 days long. If my memory serves there was also a second course a couple of weeks later that covered more advanced design for 3 days! I can talk fast but not 8 days fast. They were effectively asking for two quite different presentations, an intro and then a review of more challenging examples. “OK,” they said, “do Oracle Lego – but if another gap comes up, could you do the intermediate talk?” Err, OK… So I wrote a quick synopsis for “Oracle Meccano” {Meccano is a toy construction kit made up of miniature girders, plates, bolts and stuff you can make proper things out of. If you liked Lego you would love Meccano as you got older}.

Since then I have been slightly anxious about getting an email from the UKOUG about a gap in the development stream for the conference…

This week I have started preparing the presentations for real {which so far has resulted in me breaking my server, finding a load of notes on blogs I was going to write and then doing this post} so I contacted the ladies in charge of the agenda and asked if I was now off the hook for the Oracle Meccano talk. “Yes, no more gaps, it is not on the agenda.” Phew. “But could you put it together in case of last-minute cancellations?” *sigh*. OK.

So I will, but I’m not signing up to do any Session Chairing, which I was about to. If you see me at the conference and I look a little crazed, it’s because I got a mail from the UKOUG just before the event about a sudden gap…

At least there is no chance I will be asked to do the Disasters talk at short notice, I saw the scores it got by the paper reviewers :-) .

Good blogs to read

If you’re looking for more good Oracle blogs to read, here are two very good ones:
 
Enjoy :)

Matching the Expected Output – Analytic RANK, ROW_NUMBER, DENSE_RANK, or Something Different?

November 16, 2011 I noticed an interesting request on the comp.databases.oracle.misc usenet group, and I thought that I would make an attempt at solving the request.  The original poster (OP) stated that he had a table with data similar to the following: TX   ID   DEPT   LOCATION   LOAD 1    99    A        NY       12 2    99    A        [...]

UKOUG Agenda

As in previous years the UKOUG allows you to create a personalized agenda for the upcoming conference. To give you a flavour of some of the excellent presentations and to encourage you to register I’ve reproduced mine below, as with previous years this is an ideal wishlist, I’ll probably “die” halfway through various days and [...]

Configuring VNC Server on Fedora 16…

When Fedora 15 rocked up it brought with it a replacement for the old init-style startup called systemd. In Fedora 15, it was pretty easy to ignore this change as everything (including vncserver) ran pretty much the same as before.

Fedora 16 is a little more “aggressive” about its use of systemd. When you issue an old-style service command, you are left in no doubt that things have changed.

[root@homer system]# service nfs stop
Redirecting to /bin/systemctl  stop nfs.service
[root@homer system]#

Once again, not a big deal in itself.

So that brings me to the reason for this post. The configuration of VNC Server has changed completely between Fedora 15 and Fedora 16. By default you can’t get away with editing the “/etc/sysconfig/vncservers” file anymore. That issue prompted me to knock up a quick article to remind myself how to do the systemd-style setup.

I included the old method for the sake of comparison.

Cheers

Tim…




Why does the same job run slower on an exact clone?

A customer was working with a clone copy of their production database for reports, but the reports on the cloned database were running slower than the same reports on production, and they wanted to know why.

Ostensibly the machines were similar and the memory was the same, though it’s always worth verifying. The first thing I checked was the load profile. If you have OEM you can look at the performance page and/or top activity page to see the load profile. The load profile of production was much more CPU bound and much less I/O bound than the clone, so there was clearly a difference.

The customer knew the time the report was run on the clone but was unsure exactly when it had been run on production, though they knew it had been run within the past day.

On the clone box, during the report run, the load profile (produced with a SQL script that reproduces the OEM performance graph) looked like this:

07-NOV  CLONE
TM     AAS  GRAPH
----- ----  ----------------------------------------------------
11:00   .2  +              6
12:00   .2  -              6
13:00   .1                 6
14:00   .1                 6
15:00   .1                 6
16:00 14.9  ++++++ooooooooo6ooooooooooooooooooooooooooooooooooo-
17:00   .0                 6

The reports were run between 4pm and 5pm (i.e. 16:00-17:00), and that’s easy to see from the load chart.
The “+” characters represent CPU, “o” represents I/O, and “-” represents other wait time.
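For reference, a minimal sketch of the kind of query against DBA_HIST_ACTIVE_SESS_HISTORY that can produce such an hourly CPU, I/O and wait profile is shown below. This is my own illustration, not necessarily the exact script that was used.

-- AWR keeps one ASH sample every 10 seconds, i.e. 360 samples per hour,
-- so dividing the sample counts by 360 gives average active sessions (AAS).
select to_char(sample_time,'HH24') tm,
       round(sum(case when session_state = 'ON CPU' then 1 else 0 end)/360,1) cpu,
       round(sum(case when session_state = 'WAITING'
                       and wait_class = 'User I/O' then 1 else 0 end)/360,1) io,
       round(sum(case when session_state = 'WAITING'
                       and wait_class != 'User I/O' then 1 else 0 end)/360,1) other
from   dba_hist_active_sess_history
where  sample_time > sysdate - 1
group by to_char(sample_time,'HH24')
order by 1;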
Now, the reports on production were supposed to have been run around 7am, but the client wasn’t sure.
Here is what the load profile on production looked like:

07-NOV PRODUCTION
time AAS GRAPH
----- --- ---------------------------------------------------------------------
00:00 1.5 ++++o
01:00 1.1 +++
02:00  .4 +
03:00  .2 +
04:00  .4 ++
05:00 1.5 ++ooo
06:00 1.6 +++oo
07:00 3.2 ++++++++ooo-
08:00 3.6 ++++++++++ooo--
09:00 6.1 +++++++++++++++++oooooo----
10:00 4.7 +++++++++++++++++ooo
11:00 5.3 +++++++++++++++++++ooo-
12:00 0.0 +++++++++++++++++++++++++++++++++oooooooo---------------------------
13:00 0.5 ++++++++++++++++++++++++oooooooooooooooooooo
14:00 19.2++++++++++++++++++++++++++++++++oooooooooooooooooooooooooooooooooooooo
15:00 6.5 +++++++++++++++oooooo

The customer thought the report had been run at 8am on production, yet at 8am there is little I/O wait and only some CPU. Looking at the top SQL from the clone during the report versus the production database at 8am didn’t show any of the same top SQL. At this point I asked the customer to send me AWR exports of the two databases. I imported the AWR exports, giving the clone database DBID=1 and production DBID=2.

The first thing I checked was the I/O latency in the AWR repository data. Both production and the clone showed I/O averaging around 6ms, so latency was unlikely to be the problem.
Clone latency during the report:

BTIME           EVENT_NAME                        AVG_MS         CT
--------------- ------------------------------ --------- ----------
07-NOV-11 16:00 db file scattered read              6.48        4,246
07-NOV-11 16:00 db file sequential read             8.15    4,760,454   **
07-NOV-11 16:00 direct path read                    3.50      680,192
07-NOV-11 16:00 direct path write                  10.22           14
07-NOV-11 16:00 direct path write temp               .62            2

Production looked like this (for example):

BTIME           EVENT_NAME                        AVG_MS         CT
--------------- ------------------------------ --------- ----------
07-NOV-11 14:00 db file scattered read              2.93    148,226
07-NOV-11 14:00 db file sequential read             6.35  4,961,761  **
07-NOV-11 14:00 direct path read                    2.32  2,706,322
07-NOV-11 14:00 direct path write                    .63        204
07-NOV-11 14:00 direct path write temp               .86        820

Thus the clone looks slightly slower, but not by enough to explain the I/O load seen on the clone.
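For the record, per-hour latency figures like the ones above can be pulled from DBA_HIST_SYSTEM_EVENT by differencing consecutive snapshots. The following is only a sketch of that approach (here filtered on the imported clone, dbid=1), not the exact script used:

select to_char(s.begin_interval_time,'DD-MON-YY HH24:MI') btime,
       e.event_name,
       round((e.time_waited_micro
               - lag(e.time_waited_micro) over
                   (partition by e.instance_number, e.event_name order by e.snap_id)) / 1000
             / nullif(e.total_waits
               - lag(e.total_waits) over
                   (partition by e.instance_number, e.event_name order by e.snap_id), 0), 2) avg_ms,
       e.total_waits
               - lag(e.total_waits) over
                   (partition by e.instance_number, e.event_name order by e.snap_id) ct
from   dba_hist_system_event e,
       dba_hist_snapshot     s
where  e.snap_id         = s.snap_id
and    e.dbid            = s.dbid
and    e.instance_number = s.instance_number
and    e.dbid            = 1
and    e.wait_class      = 'User I/O'
order by s.begin_interval_time, e.event_name;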

Then I checked the top SQL during the report, which gave:

select
     SQL_ID ,
     sum(decode(session_state,'ON CPU',1,0))     "CPU",
     sum(decode(session_state,'WAITING',1,0))    -
     sum(decode(session_state,'WAITING', decode(wait_class, 'User I/O',1,0),0))    "WAIT" ,
     sum(decode(session_state,'WAITING', decode(wait_class, 'User I/O',1,0),0))    "IO" ,
     sum(decode(session_state,'ON CPU',1,1))     "TOTAL"
from   dba_hist_active_sess_history
where SQL_ID is not NULL
 and  dbid=1
group by sql_id
order by sum(decode(session_state,'ON CPU',1,1))   desc

 

   SQL_ID           CPU   WAIT     IO  TOTAL
   ------------- ------ ------ ------ ------
  4sfx95xxxxc03   31.0     .0  736.0    767

I wasn’t confident that the customer knew when the report was run on production, so I just searched for this SQL_ID over the last 12 hours. On production the top query shows a radically different I/O profile:

select
     SQL_ID ,
     sum(decode(session_state,'ON CPU',1,0))     "CPU",
     sum(decode(session_state,'WAITING',1,0))    -
     sum(decode(session_state,'WAITING', decode(wait_class, 'User I/O',1,0),0))    "WAIT" ,
     sum(decode(session_state,'WAITING', decode(wait_class, 'User I/O',1,0),0))    "IO" ,
     sum(decode(session_state,'ON CPU',1,1))     "TOTAL"
from   dba_hist_active_sess_history
where SQL_ID is not NULL
 and  dbid=2
 and  sql_id = '4sfx95xxxxc03'
group by sql_id
order by sum(decode(session_state,'ON CPU',1,1))   desc
/

 

   SQL_ID           CPU   WAIT        IO  TOTAL
   ------------- ------ ------ --------- ------
   4sfx95xxxxc03   12.0     .0     39.00     51

Now I wanted to verify the stats for each SQL statement to make sure the query was doing the same work:

VDB
   SNAP_ID      EXECS       DISK        LIO        RWS        CPU     ELAPSE    IO_TIME       READS
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------  ----------
     18798      25648    2746798   21669637      12554 1768040000 7867477859 7430523627     2746594
Prod
     18887     116449      52123    3026366      77023   67260000  377033758  313874605       52122
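For completeness, here is a sketch of the kind of DBA_HIST_SQLSTAT aggregation that yields totals like these, one row per imported DBID. The column list is an assumption on my part; the original script may differ.

select dbid,
       sum(executions_delta)             execs,
       sum(disk_reads_delta)             disk,
       sum(buffer_gets_delta)            lio,
       sum(rows_processed_delta)         rws,
       sum(cpu_time_delta)               cpu,
       sum(elapsed_time_delta)           elapse,
       sum(iowait_delta)                 io_time,
       sum(physical_read_requests_delta) reads
from   dba_hist_sqlstat
where  sql_id = '4sfx95xxxxc03'
group by dbid;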

The query is being executed more on Prod while doing less disk I/O and using less CPU.
Now let’s verify its execution plan:

      DB SQL_ID        PLAN_HASH_VALUE
--------- ------------- ---------------
    clone 4sfx95xxxxc03      1805550729
     Prod 4sfx95xxxxc03      3038004819

The execution plans are different, so let’s look at the two execution plans.

(I used a direct query on DBA_HIST_SQL_PLAN but you can also use the package provided by Oracle, for example:

select * from table(dbms_xplan.display_awr('4sfx95xxxxc03',1805550729,1)); -- sql_id, plan_hash, DBID

)
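A direct query along these lines works as well; this is just a sketch (dbid=1 being the imported clone):

select id,
       lpad(' ', depth) || operation || ' ' || options plan_step,
       object_name
from   dba_hist_sql_plan
where  sql_id          = '4sfx95xxxxc03'
and    plan_hash_value = 1805550729
and    dbid            = 1
order by id;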

Plan Hash
VDB                                             Production
-----------                                     ------------
1805550729                                      3038004819  

OPERATION
----------------------------------              --------------------
SELECT STATEMENT_                               SELECT STATEMENT_
 SORT_ORDER BY                                    SORT_ORDER BY FILTER_
  FILTER_                                           FILTER_
   NESTED LOOPS_                                    NESTED LOOPS_
    NESTED LOOPS_                                    NESTED LOOPS_
->   TABLE ACCESS_BY INDEX ROWID PAY_CHECK           TABLE ACCESS_BY INDEX ROWID PAY_EARNINGS
->    INDEX_RANGE SCAN I1PAY_CHECK                      INDEX_RANGE SCAN EPAY_EARNINGS
->   INDEX_RANGE SCAN PAY_EARNINGS                   INDEX_UNIQUE SCAN PAY_CHECK
->  TABLE ACCESS_BY INDEX ROWID PAY_EARNINGS        TABLE ACCESS_BY INDEX ROWID PAY_CHECK
   TABLE ACCESS_BY INDEX ROWID PAY_CHECK           TABLE ACCESS_BY INDEX ROWID PAY_CHECK
    INDEX_RANGE SCAN I0PAY_CHECK                      INDEX_RANGE SCAN I0PAY_CHECK

So the execution plans have changed. Now why have they changed? That’s a story for another day, but the plan would only change if the data and/or the statistics on the tables had changed.

REFERENCE: DBA HIST Views

Star Transformation And Cardinality Estimates

If you want to make use of Oracle's cunning Star Transformation feature then you need to be aware of the fact that the star transformation logic - as the name implies - assumes that you are using a proper star schema.

Here is a nice example of what can happen if you attempt to use star transformation but your model obviously doesn't really correspond to what Oracle expects:

drop table d;

purge table d;

drop table t;

purge table t;

create table t
as
select
rownum as id
, mod(rownum, 100) + 1 as fk1
, 1000 + mod(rownum, 10) + 1 as fk2
, 2000 + mod(rownum, 100) + 1 as fk3
, rpad('x', 100) as filler
from
dual
connect by
level <= 1000000
;

exec dbms_stats.gather_table_stats(null, 't')

create bitmap index t_fk1 on t (fk1);

create bitmap index t_fk2 on t (fk2);

create bitmap index t_fk3 on t (fk3);

create table d
as
select
rownum as id
, case when rownum between 1 and 100 then 'Y' else 'N' end as is_flag_d1
, case when rownum between 1001 and 1010 then 'Y' else 'N' end as is_flag_d2
, case when rownum between 2001 and 2100 then 'Y' else 'N' end as is_flag_d3
, rpad('x', 100) as vc1
from
dual
connect by
level <= 10000
;

exec dbms_stats.gather_table_stats(null, 'd', method_opt => 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 254 IS_FLAG_D1, IS_FLAG_D2, IS_FLAG_D3');

This is a simplified example of a model where multiple, potentially small, dimensions are stored in a single physical table and the separate dimensions are represented by views that filter the corresponding dimension data from the base table.
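For illustration, the dimension views in such a model might look like this (the view names are made up for this example):

create or replace view dim_d1 as
select id, vc1 from d where is_flag_d1 = 'Y';

create or replace view dim_d2 as
select id, vc1 from d where is_flag_d2 = 'Y';

create or replace view dim_d3 as
select id, vc1 from d where is_flag_d3 = 'Y';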

So we have a fact table with one million rows and a "collection" dimension table that holds three dimensions, among others.

In order to enable the star transformation, bitmap indexes are created on the foreign keys of the fact table.

The dimension table has histograms on the flag columns to tell the optimizer about the non-uniform distribution of the column data.

Now imagine a query where we query the fact table (and possibly do some filtering on the fact table by other means, like other dimensions or direct filtering on the fact table) but need to join these three dimensions just for display purposes - the dimensions themselves are not filtered, so the join will not filter out any data.

Let's first have a look at the execution plan of such a simple query with star transformation disabled:

select /*+ no_star_transformation */
count(*)
from
t f
, (select * from d where is_flag_d1 = 'Y') d1
, (select * from d where is_flag_d2 = 'Y') d2
, (select * from d where is_flag_d3 = 'Y') d3
where
f.fk1 = d1.id
and f.fk2 = d2.id
and f.fk3 = d3.id
;

SQL> explain plan for
2 select /*+ no_star_transformation */
3 count(*)
4 from
5 t f
6 , (select * from d where is_flag_d1 = 'Y') d1
7 , (select * from d where is_flag_d2 = 'Y') d2
8 , (select * from d where is_flag_d3 = 'Y') d3
9 where
10 f.fk1 = d1.id
11 and f.fk2 = d2.id
12 and f.fk3 = d3.id
13 ;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display(format => 'BASIC +ROWS +PREDICATE'));
Plan hash value: 77569906

------------------------------------------------
| Id  | Operation               | Name | Rows  |
------------------------------------------------
|   0 | SELECT STATEMENT        |      |     1 |
|   1 |  SORT AGGREGATE         |      |     1 |
|*  2 |   HASH JOIN             |      |   940K|
|*  3 |    TABLE ACCESS FULL    | D    |   100 |
|*  4 |    HASH JOIN            |      |   945K|
|*  5 |     TABLE ACCESS FULL   | D    |   100 |
|*  6 |     HASH JOIN           |      |   950K|
|*  7 |      TABLE ACCESS FULL  | D    |    10 |
|   8 |      TABLE ACCESS FULL  | T    |  1000K|
------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("F"."FK3"="D"."ID")
3 - filter("IS_FLAG_D3"='Y')
4 - access("F"."FK1"="D"."ID")
5 - filter("IS_FLAG_D1"='Y')
6 - access("F"."FK2"="D"."ID")
7 - filter("IS_FLAG_D2"='Y')

So clearly the optimizer got it quite right - the join to the dimensions is not going to filter out anything significant - and the slight reduction in rows comes from the calculations based on the generated histograms.

But now try the same again with star transformation enabled:

SQL> explain plan for
2 select /*+ star_transformation opt_param('star_transformation_enabled', 'temp_disable') */
3 count(*)
4 from
5 t f
6 , (select * from d where is_flag_d1 = 'Y') d1
7 , (select * from d where is_flag_d2 = 'Y') d2
8 , (select * from d where is_flag_d3 = 'Y') d3
9 where
10 f.fk1 = d1.id
11 and f.fk2 = d2.id
12 and f.fk3 = d3.id
13 ;

Explained.

SQL>
SQL> select * from table(dbms_xplan.display(format => 'BASIC +ROWS +PREDICATE'));
Plan hash value: 459231705

------------------------------------------------------------
| Id  | Operation                          | Name  | Rows  |
------------------------------------------------------------
|   0 | SELECT STATEMENT                   |       |     1 |
|   1 |  SORT AGGREGATE                    |       |     1 |
|*  2 |   HASH JOIN                        |       |     9 |
|*  3 |    HASH JOIN                       |       |     9 |
|*  4 |     HASH JOIN                      |       |    10 |
|*  5 |      TABLE ACCESS FULL             | D     |    10 |
|   6 |      TABLE ACCESS BY INDEX ROWID   | T     |    10 |
|   7 |       BITMAP CONVERSION TO ROWIDS  |       |       |
|   8 |        BITMAP AND                  |       |       |
|   9 |         BITMAP MERGE               |       |       |
|  10 |          BITMAP KEY ITERATION      |       |       |
|* 11 |           TABLE ACCESS FULL        | D     |   100 |
|* 12 |           BITMAP INDEX RANGE SCAN  | T_FK1 |       |
|  13 |         BITMAP MERGE               |       |       |
|  14 |          BITMAP KEY ITERATION      |       |       |
|* 15 |           TABLE ACCESS FULL        | D     |   100 |
|* 16 |           BITMAP INDEX RANGE SCAN  | T_FK3 |       |
|  17 |         BITMAP MERGE               |       |       |
|  18 |          BITMAP KEY ITERATION      |       |       |
|* 19 |           TABLE ACCESS FULL        | D     |    10 |
|* 20 |           BITMAP INDEX RANGE SCAN  | T_FK2 |       |
|* 21 |     TABLE ACCESS FULL              | D     |   100 |
|* 22 |    TABLE ACCESS FULL               | D     |   100 |
------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("F"."FK3"="D"."ID")
3 - access("F"."FK1"="D"."ID")
4 - access("F"."FK2"="D"."ID")
5 - filter("IS_FLAG_D2"='Y')
11 - filter("IS_FLAG_D1"='Y')
12 - access("F"."FK1"="D"."ID")
15 - filter("IS_FLAG_D3"='Y')
16 - access("F"."FK3"="D"."ID")
19 - filter("IS_FLAG_D2"='Y')
20 - access("F"."FK2"="D"."ID")
21 - filter("IS_FLAG_D1"='Y')
22 - filter("IS_FLAG_D3"='Y')

What an astonishing result: Not only will Oracle now try to access all rows of the fact table by single-block random I/O, which by itself can be a disaster for larger real-life fact tables, in particular when dealing with Exadata features like Smart Scans that are only possible with multi-block direct-path reads, but furthermore, if this were part of a more complex execution plan, look at the cardinality estimates: they are off by five orders of magnitude - very likely a recipe for disaster for any step that follows.

The point here is simple: The Star Transformation calculation model obviously doesn't cope with the "collection" of dimensions in a single table very well, but assumes a dimensional model where each dimension is stored in separate table(s). If you don't adhere to that model the calculation will be badly wrong and the results possibly disastrous.

Here the Star Transformation assumes a filter on the dimension tables that effectively filters nothing, but this is something the current calculation model is not aware of. If you put the three dimensions into separate tables, no "artificial" filter is required and hence the calculation won't be misled.
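A minimal sketch of that reshaped model, again with made-up names, would be three small physical dimension tables carved out of D, and a join that no longer needs any flag predicates:

create table dim1 as select id, vc1 from d where is_flag_d1 = 'Y';
create table dim2 as select id, vc1 from d where is_flag_d2 = 'Y';
create table dim3 as select id, vc1 from d where is_flag_d3 = 'Y';

exec dbms_stats.gather_table_stats(null, 'dim1')
exec dbms_stats.gather_table_stats(null, 'dim2')
exec dbms_stats.gather_table_stats(null, 'dim3')

select count(*)
from   t f, dim1 d1, dim2 d2, dim3 d3
where  f.fk1 = d1.id
and    f.fk2 = d2.id
and    f.fk3 = d3.id;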

Of course one could argue that the star transformation optimization seems to do a poor job, since the normal optimization based on the same input data produces a much better estimate, but at least for the time being that's the way this transformation works, and the data model you choose had better reflect this.

Profiling trace files with preprocessor external tables in 11g and some parallel execution hacking

If you work with SQL Trace files (and profile them) then you should check out the awesome novel use of the “external table preprocessor” feature explained by Adrian Billington here:

Ironically just a day after writing my “Evil things” article, I noticed a note in MOS about how to enable an event 10384 at level 16384 to get a parallel plan to be executed in serial:

  • How to force that a Parallel Query runs in serial with the Parallel Execution Plan [ID 1114405.1]
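At the session level, setting the event would look something like the following. This is my reading of the note, so test it carefully (and at your own risk) before relying on it:

alter session set events '10384 trace name context forever, level 16384';

-- ... run the parallel query whose row source statistics you want to see ...

alter session set events '10384 trace name context off';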

This way you can still see the A-rows and other feedback like buffer gets and PIOs by row source even if the plan is a parallel plan. The problem is that even with GATHER_PLAN_STATISTICS enabled (or the equivalent parameter(s)) the PX slaves don’t pass their actual time, rows and buffer gets/PIOs stats back to the QC to be displayed in V$SQL_PLAN_STATISTICS / DBMS_XPLAN output. With parallel slaves, all you’d see would be the QC-generated numbers and not the PX slave stuff.

So if you set that magic event (at your own risk) then even parallel plans would be executed by the QC only (basically a parallel plan executed entirely in serial) and you’ll still see all the A-rows and buffer gets/physical read numbers just as with serial plans. But remember yesterday’s article ;-)

Alternative options for getting such runtime stats for a parallel query would be:

  1. Use the Real-Time SQL Monitoring feature (only on 11g and with diagnostics+tuning pack licenses)
  2. Enable SQL Trace for the QC, run your parallel query (you’ll have to wait until it finishes or cancel the query with CTRL+C) and consolidate the STAT# lines from all the PX slave sessions – each PX slave dumps the STAT# lines with row-source level stats like regular serial queries

Update: Coskan commented on Twitter asking why not use ALLSTATS ALL in DBMS_XPLAN.DISPLAY_CURSOR instead of ALLSTATS LAST – and indeed, in this case you would see the A-rows, buffer gets and other stats of both the QC and the PX slaves aggregated together. Although I had used ALLSTATS ALL to compare average execution stats to the LAST stats, I had never thought of using it for aggregating the PX slave stats together with the QC stats like that.
 
Basically what happens (in 10g+) with PX is that the QC and all the PX slaves execute the same SQL_ID (though sometimes a different child version, especially when you’re running cross-instance PX, in which case SQL Monitoring is a better option). ALLSTATS ALL aggregates the stats of all executions of the particular SQL_ID (and child number). So if you run something with 4 PX slaves with GATHER_PLAN_STATISTICS enabled, they all add their execution stats to the “ALL” columns. It’s just that the QC is always the last one to finish executing the query (that’s when the exec stats get updated in the V$SQL_PLAN_STATISTICS views), which is why ALLSTATS LAST always shows the limited QC stats and not the PX slave ones. But ALLSTATS ALL solves that problem.
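In practice that looks something like the following sketch; the table name and parallel degree are made up for illustration, and in SQL*Plus you should keep serveroutput off so that DISPLAY_CURSOR picks up the statement you just ran:

select /*+ parallel(t 4) gather_plan_statistics */ count(*) from t;

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS ALL'));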
 
However the catch is that ALLSTATS ALL would accumulate all executions of this query (including your previous runs of it), so if you want to see fresh stats of only your latest query execution, you should adjust the SQL_ID so that a new cursor would be created. And you can adjust the SQL_ID by simply adding a comment like /* test 2 */ somewhere into your query text…
Thanks Coskan! :)

Immortals…

Immortals is a lack-luster and truly forgettable film. So forgettable I’m already starting to doubt I went to see it.

Visually it is dark and grimy, similar to 300, which I was also not a big fan of. I imagined the film would be really big and epic, but instead most of the scenes were really small scale. Some of the sets looked positively “amateur dramatic” at times. I guess someone got a bit bored during the CGI touching up.

It’s not a really bad film, but it’s not good either. It’s just kinda meh, which has got to be one of the worst reactions a film can get. If I were to make a film I would hope it was either amazing or really terrible. Mediocre is so not worth it.

I’m still eagerly anticipating the new Twilight film, because I know it is going to be really terrible! Something I can really and truly hate with every fibre of my being! :)

Cheers

Tim…