October 6, 2011 Hammering a Square Peg into a Round Hole: Fine Edges are Lost, Gaps in Detail http://www.amazon.com/Oracle-Database-Performance-Tuning-Recipes/dp/1430... (Back to the Previous Post in the Series) In an effort for my review to be fair, I have completed the review for the second half of the “Oracle Database 11g Performance Tuning Recipes” book (omitting the pages for chapters [...]
This is the penultimate day of #OOW11 and I am here in the hotel lobby trying to put some order around the myriad nuggets of information I have picked up over the last several days.
The announcements this year have centered on the introduction of various new products from Oracle - Oracle Database Cloud, Cloud Control, Database Appliance, Big Data Appliance, Exalytics, T4 SuperCluster and so on. One interesting pattern that emerges from the announcements, and that differs from all previous years, is the introduction of several engineered and assembled systems, each performing some type of task - specialized or generic. Oracle has announced machines in the past too, but never so many at the same time, which led April Sims (Executive Editor, Select Journal) to observe that this year can be summed up in one phrase - Rise of the Machines.
But many of the folks I met in person or online were struggling to wrap their heads around the whole lineup. It's quite clear that they were unclear (no pun intended) about how these products differ and which situation each one fits. It's perfectly normal to be a little confused about the sweet spots of each product, considering the glut of information on them and their seemingly overlapping functionality. In the Select Journal Editorial Board meeting we had earlier this morning, I committed to writing about the differences between the various systems announced at #OOW11, and their usages, for the Select Journal 2012 Q1 edition. I didn't realize at the time what a tall order that was: I will need to reach out to several product managers and executives inside Oracle to understand the functional differences between these machines. Well, now that I have firmly put my foot in my mouth, I will have to do just that. [Update on 4/29/2012: I have done that. Please see below.]
At the demogrounds I learned about Oracle Data Loader for Hadoop and Oracle R Enterprise, two exciting technologies that will change the way we collect and analyze large data sets, especially unstructured ones. Another new technology, centered around Cloud Control, was Data Subsetting. It allows you to pull a subset of data from the source system to create test data, mask it if necessary, and even find sensitive data based on a format. A tool like this has been overdue for quite some time.
Again, I really need to collect my thoughts and sort through the information overload I was subjected to at OOW. This was the best OOW ever.
Update on April 29th, 2012
I knew I had to wrap my head around these announcements and sort through the features available in the engineered machines. And I did exactly that. I presented a paper of the same name - Rise of the Machines - at Collaborate 2012, the annual conference of the Independent Oracle Users Group. Here is the presentation. In that session I explained the features of six machines - Oracle Database Appliance, Exadata, Exalogic, SPARC SuperCluster, Exalytics and Big Data Appliance - the differences between them, and where each one should be used. Please download the session if you want to know more about the topic.
A variation on Tom Kyte's invaluable RUNSTATS utility that compares the resource consumption of two alternative units of work. Designed to work in constrained developer environments, it builds on the original with enhancements such as "pause and resume" functionality, time model statistics and the option to report on specific statistics. ***Update*** Now available in two formats: 1) as a PL/SQL package and 2) as a free-standing SQL*Plus script (i.e. no installation/database objects needed). January 2007 (updated October 2011)
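For context, Tom Kyte's original RUNSTATS harness is driven with three calls wrapped around the two units of work being compared. A minimal sketch follows, assuming the original RUNSTATS_PKG is installed in the schema and a test table T exists; the "pause and resume" calls added by the variation described above are not shown, as its exact API differs:

```sql
-- Compare a set-based insert against a row-by-row loop,
-- assuming Tom Kyte's original RUNSTATS_PKG is installed
-- and an empty table T (n number) exists.
exec runstats_pkg.rs_start;

-- Approach 1: single set-based statement
insert into t select rownum from dual connect by level <= 10000;
commit;

exec runstats_pkg.rs_middle;

-- Approach 2: row-by-row loop
begin
  for i in 1 .. 10000 loop
    insert into t values (i);
  end loop;
  commit;
end;
/

-- Report statistics and latches whose values differ by more than 500
exec runstats_pkg.rs_stop(500);
```

The final parameter is a reporting threshold: only statistics whose values differ between the two runs by more than that amount are printed, which keeps the output focused on the meaningful differences.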
A variation on Jonathan Lewis's SNAP_MY_STATS package to report the resource consumption of a unit of work between two snapshots. Designed to work in constrained developer environments, this version has enhancements such as time model statistics and the option to report on specific statistics. ***Update*** Now available in two formats: 1) as a PL/SQL package and 2) as a free-standing SQL*Plus script (i.e. no installation/database objects needed). June 2007 (updated October 2011)
Monday: I went to some presentations, hung around in the OTN lounge and ate at every possible opportunity. Tanel Poder’s presentation on “Large-Scale Consolidation onto Oracle Exadata: Planning, Execution, and Validation” was pretty cool.
In the evening I planned to meet a former colleague at the OTN party. I decided the best way to find him was to visit every food station at the party, which of course meant sampling the goods. Unfortunately I spent too much time eating and not enough time looking for him. Sorry Ian! The cool thing about Open World is you can enter a giant tent full of thousands of people and pretty much guarantee you will bump into loads of people you know.
Tuesday: I spent most of Tuesday helping out at RAC Attack in the OTN Lounge. I did manage to get to see Greg Rahn’s presentation called “Real-World Performance: How Oracle Does It”, which focussed on Real-Time SQL Monitoring. Greg’s presentation style is really easy to listen to and you know this isn’t just theoretical knowledge. He’s in the trenches doing this stuff as part of the Real-World Performance Group.
As the afternoon progressed I felt a little tired, so I went back to the hotel, puked and fell asleep. I think this was more to do with being over-tired than anything else. That meant I missed some of the later sessions and didn’t hook up with anyone in the evening.
This morning I feel a little ropey, but I’m going to head on down to RAC Attack again and see if I can make myself useful. Tonight is the appreciation event, but I’m not sure if I will be able to “appreciate it” unless I get a major energy injection at some point today.
There seems to be a little confusion out there about the certification status of Oracle Database 11gR2, especially with the release of the 11.2.0.3 patchset, which fixes all the issues associated with RAC installs on OL/RHEL 6.1.
Currently, 11gR2 is *NOT* certified on OL6 or RHEL6. How do I know? My Oracle Support says so! You can check for yourself by searching for Oracle Database 11gR2 in the Certifications section of My Oracle Support.
From the results you will see that Oracle Database 11.2.0.3 is certified on OL and RHEL 5.x. Oracle does not differentiate between the respins of a major version. You will also notice that it is not currently certified on OL6 or RHEL6.
Having said that, we can expect this certification really soon. Why? Because Red Hat has submitted all the certification information to Oracle and (based on previous certifications) expects it to happen some time in Q4 this year, which is any time between now and the end of the year.
With a bit of luck, by the time I submit this post MOS certification will get updated and I will happily be out of date…
Just a little follow-up to my previous note on hybrid columnar compression. The following is the critical section of code I extracted from the trace file after tracing a run of the advisor code against a table of 1,000,000 rows:
create table "TEST_USER".DBMS_TABCOMP_TEMP_ROWID1 tablespace "USERS" nologging
as
select /*+ FULL(mytab) NOPARALLEL(mytab) */ rownum rnum, mytab.*
from   "TEST_USER"."T1" mytab
where  rownum <= 1000001

create table "TEST_USER".DBMS_TABCOMP_TEMP_ROWID2 tablespace "USERS" nologging
as
select /*+ FULL(mytab) NOPARALLEL(mytab) */ *
from   "TEST_USER".DBMS_TABCOMP_TEMP_ROWID1 mytab
where  rnum >= 1

alter table "TEST_USER".DBMS_TABCOMP_TEMP_ROWID2 set unused(rnum)

create table "TEST_USER".DBMS_TABCOMP_TEMP_UNCMP tablespace "USERS" nologging
as
select /*+ FULL(mytab) NOPARALLEL (mytab) */ *
from   "TEST_USER".DBMS_TABCOMP_TEMP_ROWID2 mytab

create table "TEST_USER".DBMS_TABCOMP_TEMP_CMP organization heap
tablespace "USERS" compress for archive high nologging
as
select /*+ FULL(mytab) NOPARALLEL (mytab) */ *
from   "TEST_USER".DBMS_TABCOMP_TEMP_UNCMP mytab

drop table "TEST_USER".DBMS_TABCOMP_TEMP_ROWID1 purge
drop table "TEST_USER".DBMS_TABCOMP_TEMP_ROWID2 purge
drop table "TEST_USER".DBMS_TABCOMP_TEMP_UNCMP purge
drop table "TEST_USER".DBMS_TABCOMP_TEMP_CMP purge
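For reference, the statements above are what the advisor generates under the covers when you call DBMS_COMPRESSION.GET_COMPRESSION_RATIO. A minimal sketch of the driving call follows; the owner, table and tablespace names match the trace above, but the parameter and constant names should be checked against your version's documentation:

```sql
-- Sketch of the 11gR2 advisor call that produces the trace above.
-- Parameter and constant names assumed from the documented
-- DBMS_COMPRESSION.GET_COMPRESSION_RATIO interface.
set serveroutput on

declare
  l_blkcnt_cmp    pls_integer;
  l_blkcnt_uncmp  pls_integer;
  l_row_cmp       pls_integer;
  l_row_uncmp     pls_integer;
  l_cmp_ratio     number;
  l_comptype_str  varchar2(100);
begin
  dbms_compression.get_compression_ratio(
    scratchtbsname => 'USERS',
    ownname        => 'TEST_USER',
    tabname        => 'T1',
    partname       => null,
    comptype       => dbms_compression.comp_for_archive_high,
    blkcnt_cmp     => l_blkcnt_cmp,
    blkcnt_uncmp   => l_blkcnt_uncmp,
    row_cmp        => l_row_cmp,
    row_uncmp      => l_row_uncmp,
    cmp_ratio      => l_cmp_ratio,
    comptype_str   => l_comptype_str);

  dbms_output.put_line('Compression type  : ' || l_comptype_str);
  dbms_output.put_line('Compressed blocks : ' || l_blkcnt_cmp);
  dbms_output.put_line('Uncompressed blks : ' || l_blkcnt_uncmp);
  dbms_output.put_line('Compression ratio : ' || l_cmp_ratio);
end;
/
```

Note that the scratch tablespace parameter determines where the four temporary DBMS_TABCOMP_TEMP_% tables are created, which ties in with the space warning in Note 2 below.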
Note: in my example the code seems to pick the first 1M rows in the table; if this is the way Oracle works for larger volumes of data this might give you an unrepresentative set of data and misleading results. I would guess, though, that this may be a side effect of using a small table in the test; it seems likely that with a much larger table - perhaps in the tens of millions of rows - Oracle would use a sample clause to select the data. If Oracle does use the sample clause, then the time to do the test will be influenced by the time it takes to do a full tablescan of the entire data set.
Note 2: The code to drop all four tables runs only at the end of the test. If you pick a large sample size you will need enough free space in the tablespace to create three tables holding data of around that sample size, plus the final compressed table. This might be more space, and take more time, than you initially predict.
Note 3: There are clues in the trace file suggesting that Oracle may choose to sort the data (presumably by adding an order by clause in the final CTAS) to maximise compression.
Note 4: You’ve got to wonder why Oracle creates two copies of the data before coming up with the final compressed copy. You might also wonder why the UNCMP copy isn’t created with PCTFREE 0 to allow for a more reasonable comparison between the “free” option for archiving the table and the compressed version. (It would also be more useful to have a comparison between the free “basic compression” and the HCC compression, rather than the default 10% free space copy.)
For reference (though not to be taken too seriously) the following figures show the CPU and Elapsed times for creating the four tables:
Table      CPU     Ela
------   -----   -----
Rowid1    1.12    6.67
Rowid2    0.70    0.20
UNCMP     0.59    7.31
CMP      18.29    0.04
Don’t ask me why the elapsed times don’t make sense; but do note that this was 11.2.0.2 on 32-bit Windows running in a VM.
And a few more statistics for comparison, showing block sizes of the test table of 1M rows:
Original size:                             10,247
Data size reported by dbms_compression:    10,100
Final size reported by dbms_compression:    2,438
Original table recreated at pctfree 0:      9,234
Original table with basic compression:      8,169
Optimal sort and basic compression:         6,781
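The last three figures above can be reproduced with simple CTAS variants followed by a segment-size query. A minimal sketch, assuming the 1M-row test table is called T1 in the current schema (the new table names are illustrative, and the "optimal sort" column choice is workload-specific):

```sql
-- Recreate the source table with no free space per block,
-- and again with basic (free) compression, for comparison
-- against the HCC figures reported by the advisor.
create table t1_pctfree0 pctfree 0 as
select * from t1;

create table t1_basic compress basic as
select * from t1;

-- An "optimal sort" variant would add an ORDER BY on whichever
-- column maximises duplication within each block.

-- Compare the resulting segment sizes in blocks
select segment_name, blocks
from   user_segments
where  segment_name in ('T1', 'T1_PCTFREE0', 'T1_BASIC')
order  by blocks;
```

The point of the PCTFREE 0 copy is that it is the cheapest possible "archival" format you can get without any compression at all, so it is a fairer baseline for the HCC ratio than the default 10% free space copy the advisor uses.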
There’s no question that HCC can give you much better results than basic compression – but it’s important to note that the data patterns and basic content make a big difference to how well the data can be compressed.
Footnote: The question of how indexes work with HCC tables came up in one of the presentations I went to. The correct answer is: “not very well”.
Today (Tuesday, October 4th, 2011) I gave my presentation at OOW 2011, in the Hotel InterContinental at 13:15, titled "Getting the Best from the Cost Based Optimizer". The session was already sold out on Thursday and I really expected a full room - approximately 270 people. The room was too small for everyone who wanted to attend: although no seats were available, quite a lot of people were standing at the back.
This time my presentation was not a very technical one (in my eyes). I just wanted to point out problems I see many times in real life when people don't really know about certain features or behaviors of the CBO. Unfortunately one hour was not enough to explain in detail why things go wrong and to show all the relevant details. But as I said, the aim was to point out the problems and to give directions on what one has to do to get rid of them, and judging by the reactions of the audience after the presentation, the goal was achieved.
After the presentation I was answering questions for more than half an hour, so I almost missed the meeting for the beta program of the next release of the database. Because so many people were interested in the presentation slides, I uploaded them to my home web page immediately upon arrival at SFO airport.
Here is the abstract for "Getting the Best from the Cost Based Optimizer":
Oracle Database 11g brings many new enhancements to PL/SQL. These will improve the performance, functionality, and security of your applications and will increase your productivity as a developer. This session will present the Oracle database 11g new PL/SQL language features and enhancements that can be used to improve programming functionality, performance and usability. Participants will learn about new trigger options, PL/SQL function result cache, bulk binding, new security features and more.
The agenda was:
The presentation is available here for download, but you must be registered.