The only significant problem I have with WordPress is that I forget that the “save draft” control is a discreet white button on the left of the screen and the publish control is a loud blue one on the right. This means I publish things by accident from time to time, sometimes when they’re little more than a few notes I’ve collected.
I’ve just done that with some cut-n-paste from an email discussion I’ve been having – it’s an introduction, then a load of fragments that I was hoping to find time to write up in about 10 days’ time. So if you’re one of the subscribers who just received an article from my blog, you might want to delete it and wait for it to be written up properly.
BTW – the last time I checked there were 501 subscribers to the blog, which I guess doesn’t sound too bad for a high-tech blog on such a narrow subject.
I wrote a short note last week that linked to a thread on the Russian Oracle forum about indexing, and if you’ve followed the thread you will have seen a demonstration that seemed to be proving the point that there were cases where an index rebuild would be beneficial.
Of course it’s not difficult to come up with cases where index rebuilds should make a difference – but it’s harder to come up with demonstrations that look realistic, so I thought I’d review the example to explain why it doesn’t really work as a good example of why you might need to think about rebuilding some production index.
The code is simple – create a table with an index, insert a batch of rows, check the index usage, repeat a few times, then rebuild the index and show that the space usage drops dramatically. Here’s the starting code (with a couple of minor changes):
create table testin (s varchar2(1000));

create index testin on testin (s) tablespace test_16k;

execute dbms_random.seed(0)

insert into testin
select dbms_random.string('P', 1000) s
from dual
connect by level <= 1000
;
commit;

validate index testin;

select height, blocks, lf_blks, lf_rows, lf_rows_len, pct_used
from index_stats
;
You’ll note that the test index is using a blocksize of 16KB. Since the key value is 1,000 characters it does make sense to use one of the larger block sizes for the index – although, to highlight the more common variation on that particular theme, it’s a good idea to think about using larger block sizes for index organized tables, because a single index entry is often quite large and this can have an unpleasant effect on leaf block splits.
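In case you want to reproduce the test, here’s a minimal sketch of the tablespace setup the script assumes. The tablespace name test_16k comes from the script, but the datafile path and sizes are purely illustrative, and a buffer cache for the non-default block size has to exist before the tablespace can be created:

rem     Sketch only - the datafile name and the sizes are made up for illustration.

alter system set db_16k_cache_size = 32M;

create tablespace test_16k
        datafile '/u01/oradata/TEST/test_16k_01.dbf' size 100m
        blocksize 16k
;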
The call to dbms_random.string(‘P’,1000) generates a string of 1,000 characters selected from the full character set – which maximises the possible variation in values.
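If you want to see the sort of value that produces, a quick (and entirely optional) check from SQL*Plus – with a shorter length of 20 just so the output is readable:

rem     Sanity check only, not part of the original test;
rem     'P' draws from the full set of printable characters.

select dbms_random.string('P', 20) sample_key from dual;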
After the first validation, the code did the following:
insert into testin select ...
commit;
validate index testin;
select ... from index_stats;

insert into testin select ...
commit;
validate index testin;
select ... from index_stats;

alter index testin rebuild;
validate index testin;
select ... from index_stats;
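For convenience, here’s a sketch of one of those repeated cycles with the elided statements written out in full, on the assumption that each insert simply repeats the statement from the setup script:

insert into testin
select dbms_random.string('P', 1000) s
from dual
connect by level <= 1000
;
commit;

validate index testin;

select height, blocks, lf_blks, lf_rows, lf_rows_len, pct_used
from index_stats
;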
This is what the four sets of results from index_stats looked like when I ran the test:
    HEIGHT     BLOCKS    LF_BLKS    LF_ROWS LF_ROWS_LEN   PCT_USED
---------- ---------- ---------- ---------- ----------- ----------
         2        128         99       1000     1013000         63
         2        256        217       2000     2026000         58
         2        384        320       3000     3039000         59
         2        256        215       3000     3039000         87     -- rebuilt
Quite clearly the rebuild has had a significant effect on the index – the size has dropped by nearly 30 percent. But apart from telling us that a rebuild can reduce the size of an index, does it tell us anything about what’s likely to happen in a real system ?
The answer is yes. It has given us a good indication that indexes with very long keys are likely to operate with a larger amount of free space per leaf block than indexes with short keys. The pct_used of around 60% is significantly lower than the value of around 70% that you might expect for the more general case of a b-tree index with random data arrival. (In fact when I ran the test using an 8KB block size the usage was closer to 50% – but then you can only get seven of these index entries into an 8KB block – giving a guaranteed wastage of 1KB).
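If you want to check the arithmetic behind that, a small query you could run immediately after each “validate index” gives the average entry size and a rough block-usage figure. The rough figure ignores block and row overheads, so it will come out a little different from the pct_used that index_stats reports:

rem     Rough arithmetic only - 16384 is the index block size used in this test.

select
        round(lf_rows_len / lf_rows)                 avg_entry_bytes,
        round(lf_rows / lf_blks, 1)                  avg_entries_per_leaf,
        round(100 * lf_rows_len / (lf_blks * 16384)) rough_pct_of_block
from    index_stats
;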
Has it told us, though, that this type of index merits a rebuild ? Possibly not until you answer a few questions, of which the most important is probably: “What’s going to happen next on your production system ?”
We’ve started with 1,000 rows inserted at a time – is this going to go on for the lifetime of the index and, if so, how long is that lifetime ? Is our test unrepresentative because in “real life” we would be inserting 1,000 rows into a table holding 10,000,000 rows ?
Perhaps, following an initial bit of batch loading, we’re going to be dribbling rows into the table in ones and twos – which, again, would make the test unrepresentative of the full lifetime of the index. The impact of an occasional big batch operation (often a delete rather than an insert) on an index is often an indicator that special measures should be taken – to the extent that sometimes we drop an index before a batch job and rebuild it afterwards.
Maybe we’re going to be deleting 1,000 rows at random from now on just before, or just after, we do the inserts – possibly with, or without, a commit in between. How would that affect any ideas we have about rebuilds ?
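Just to make that scenario concrete, a hedged sketch of what such a random delete might look like – one of many possible variants; the order by dbms_random.value is simply a lazy way of picking 1,000 rows at random:

rem     Illustrative variant only - not part of the original test.

delete from testin
where rowid in (
        select  rid
        from    (
                select rowid rid from testin order by dbms_random.value
                )
        where   rownum <= 1000
);
commit;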
The fact that a rebuild of this index at this point in time gives a significant space saving doesn’t necessarily mean much in terms of the continued use and maintenance of the index. And just to show how potentially misleading this demonstration could be, let’s just repeat the load one more time, and see what the index looks like.
Here are two sets of results – one showing what happens on the next insert if you have rebuilt the index, the other showing what happens if you simply extend the test by one more cycle without rebuilding the index:
    HEIGHT     BLOCKS    LF_BLKS    LF_ROWS LF_ROWS_LEN   PCT_USED
---------- ---------- ---------- ---------- ----------- ----------
         2        256        215       3000     3039000         87     -- rebuilt
         2        512        432       4000     4052000         58     -- after next insert

    HEIGHT     BLOCKS    LF_BLKS    LF_ROWS LF_ROWS_LEN   PCT_USED
---------- ---------- ---------- ---------- ----------- ----------
         2        384        320       3000     3039000         59     -- not rebuilt
         2        512        413       4000     4052000         61     -- after next insert
Not only is the “final” index larger because you rebuilt it part way through the test – to get to that larger state you have done 217 index leaf block splits as you inserted the data, compared to 93 index leaf block splits in the case where you didn’t rebuild the index. As a rough guideline, that’s about 4.5MB of extra redo log generated (trivial in absolute terms, but a significant fraction of the total redo generated).
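If you want to watch those numbers for yourself, the relevant session statistics can be captured before and after each insert – a minimal sketch (the statistic names are standard; remember that v$mystat shows cumulative values for your session, so you have to difference two snapshots yourself):

select  sn.name, ms.value
from    v$statname sn, v$mystat ms
where   ms.statistic# = sn.statistic#
and     sn.name in ('leaf node splits', 'leaf node 90-10 splits', 'redo size')
order by sn.name
;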
To me, the importance of the original demonstration is not that it shows us an index that gets smaller when we rebuild it; its importance lies in the fact that it reminds us that we have to think carefully about how well our tests represent real scenarios.
Day 2 was all about my own presentation really as I find it next to impossible to concentrate on anything before it and I'm always bushed afterwards.
I did pop my head into Margaret Norman's presentation briefly because I knew she'd be talking about statistics changes in 11g which was very pertinent to my own presentation. There was quite a run on Statistics presentations actually, with Margaret, Maria Colgan from Oracle and me all giving related presentations in fairly quick succession. Poor Maria had to go last and even going immediately after Margaret, it occurred to me that people might have heard it all in her presentation. I suppose it's always good to get different views and presentation styles, though.
As for my own presentation. Sigh. What conflicting emotions. On the one hand, it felt very good, I had a lot of fun, got through 70+ slides and think I delivered some very important messages to try to save people from some of the pain and misery my colleagues and I have been through over the past 18 months.
People have been extremely complimentary afterwards and I was unusually pleased. The Symposium seems to bring the best out of me.
Then, after some much needed sleep, I woke up to an email from Wolfgang Breitling pointing out that I said something that maybe I didn't mean and which (this is my description) was very stupid. I think it's probably worth a further post so I can be explicit about the mistake for those who attended the presentation and will put a note in the slides when I upload them.
As always, my brain was fried after presenting. In fact, the more on-form I've been, the more tired I tend to be because there's a lot of nervous energy expended. I had planned on attending Riyaj Shamsudeen's 2-part RAC Troubleshooting presentation but when the moment came, I just couldn't imagine my brain being able to process that, so I went back to my room, chilled out a bit and changed out of my suit.
Next up was Maria Colgan's Oracle Optimizer - Top Tips to get optimal SQL execution every time. But, as I'd already seen this at UKOUG, it was just a flying visit to see how she went. I heard the usual positive feedback later, despite her clearly suffering from a cold. I've experienced that feeling of desperately needing to cough while you're wearing a microphone! Not fun. As I've said before, I think this is a really useful presentation from a great presenter, well worth catching if you ever get a chance to hear her.
In the end, I gave in to what was starting to feel inevitable and grabbed a couple of hours' sleep, which was precisely what I needed, but meant missing the Miladin Modrakovic/Dan Fink dilemma. One good thing about the Symposium is that I can pick up all of the materials later and have a good look through them.
Last of the day was Tim Gorman's Forensic Analysis using AWR and ASH, which I thoroughly enjoyed. There wasn't exactly show-stopping new information for me - I wouldn't expect that - but I absolutely loved the way he presented using ASH to identify the source of a production error, rather than just for performance analysis. It's something I've done many times and it can be a great tool. Tim's relaxed but authoritative presentation style reflects someone who's clearly been there, done it and bought the T-shirt, and he hit the content level just right. Do a few things very well rather than bore people to tears with 60 minutes of mumbling your way through the slides!
Sadly, I spent the next hour or so debating my error with Wolfgang, Maria and Jacco in the hotel bar. As soon as I took a closer look at both paper and slides, I realised I'd screwed up, but party night showed up to soothe my furrowed brow. I had a great time chatting to Mark Williams, Wolfgang and Maria and a cast of hundreds before going just a little crazy in the company of the Beatles-Piano-Human-Jukebox Mark Williams and Stephan Haisley. Great fun all round and I had a whale of a time, although it did mean I didn't get to bed until 3am. I'm too old for that nonsense now.
Disclosure: I'm attending this year's Hotsos Symposium with the help of the Oracle ACE Director program, which is paying my travel and accommodation expenses. The time off work is at my own expense.
March 10, 2011 A couple of days ago I saw an interesting question that asked what is the difference between using the FIRST_ROWS(100) hint in a SQL statement and using ROWNUM<=100 in the WHERE clause with the original SQL statement moved to an inline view. For example: SELECT /*+ FIRST_ROWS(100) */ C1, C2 [...]
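For anyone who hasn't seen the original question, here's a sketch of the two forms being compared (the table name T1 and the filter column C3 are purely illustrative, not taken from the article):

SELECT /*+ FIRST_ROWS(100) */ C1, C2
FROM   T1
WHERE  C3 = 0
ORDER BY C1;

SELECT C1, C2
FROM   (
        SELECT C1, C2
        FROM   T1
        WHERE  C3 = 0
        ORDER BY C1
       )
WHERE  ROWNUM <= 100;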
A couple of weeks ago I listed a number of scripts from 18.104.22.168 relating to AWR reports – it seems only sensible to publish a corresponding list for Statspack. In fact, there are two such lists – one for “traditional” Statspack, and one for “standby statspack” – a version of statspack you can pre-install so that you can run statspack reports against a standby database.
Statspack – the basic source of information is $ORACLE_HOME/rdbms/admin/spdoc.txt
spreport.sql -> Generates a Statspack Instance report
sprepins.sql -> Generates a Statspack Instance report for the database and instance specified
sprepsql.sql -> Generates a Statspack SQL report for the SQL Hash Value specified
sprsqins.sql -> Generates a Statspack SQL report for the SQL Hash Value specified, for the database and instance specified
sppurge.sql  -> Purges a limited range of Snapshot Id's for a given database instance
sptrunc.sql  -> Truncates all Performance data in Statspack tables
spuexp.par   -> An export parameter file supplied for exporting the whole PERFSTAT user
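As a quick reminder of how the pieces fit together, a minimal usage sketch – run as the PERFSTAT user; spreport.sql prompts for the begin and end snapshot ids and a report file name:

execute statspack.snap

rem     ... run the workload you're interested in ...

execute statspack.snap

@?/rdbms/admin/spreport.sql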
Standby Statspack – the basic source of information is $ORACLE_HOME/rdbms/admin/sbdoc.txt
sbreport.sql - Create a report
sbaddins.sql - Add a standby database instance to the configuration
sblisins.sql - List instances in the standby configuration
sbdelins.sql - Delete an instance from the standby configuration
sbpurge.sql  - Purge a set of snapshots
A warning note about standby statspack – there is a significant structural change from 22.214.171.124 to 126.96.36.199 allowing support of multiple standby databases. I haven’t looked closely at it yet, but it did cross my mind that it might be possible to install the 188.8.131.52 version in an 184.108.40.206 database to avoid late upgrade issues – I may get around to testing the idea one day.
Followers of the blog will know I’ve had a little bit of trouble with Oracle Norway and money disappearing from my company credit card.
This is just a quick note to say the issue has now been resolved and all the money is safely back in my account. Thank you to everyone who got involved in the process of resolving it. Your help is much appreciated.
The (M)other (O)f (A)ll (T)uning (S)cripts. A tuning and diagnostics utility for sqlplus co-developed with Tanel Poder. Includes active session sampling and a TOP-style utility for displaying instance activity at regular refresh intervals. Supports versions from 10g Release 2 onwards. September 2010 (updated March 2011)
The first day was my usual mish-mash of managing to attend a few presentations, further slide-polishing and beers with friends.
We kicked off with Kerry Osborne's keynote address which was a walk through IT history, both in a general sense through the move from mainframes and single-vendor proprietary systems and back again, and also in a personal sense as Kerry discussed his own history in IT. Kerry's a fun speaker so he managed to keep everyone's attention well for an hour and raise quite a few laughs.
After a quick (smoking) break, we had Kerry again on Tuning Exadata. I didn't learn too much here but, in fairness, I might have learned a lot more had he not had to answer so many basic Exadata questions which sucked up a lot of the time slot. I heard a few people complain about this afterwards, so it wasn't just me.
Continuing the opening single stream of presentations was Cary Millsap's Thinking Clearly About Performance. Regardless of how many new facts I may (not) learn in this particular presentation, I am virtually guaranteed that if I listen to Cary for an hour, he'll remind me of how we should approach performance problems, and he always makes me think about things just slightly differently. He's a great speaker - not to be missed.
As soon as we split into two streams, I had my first tough decision to make between Gerwin Hendriksen's GAAP presentation and Thinking Clearly in SQL by Toon Koppelaars. In the end I went with Toon on the basis that I can read Gerwin's paper later. It was a good choice as Toon did a great job of making the maths behind solving tricky problems in SQL much more understandable to me than it had been before.
At that point, my day dissolved into a combination of catching up on both sleep and slide preparation then I just had time to attend Stephan Haisley's presentation on Goldengate Performance. I was absolutely determined to attend this because I'd always admired his work but had never heard him present for one reason or another. Sure enough, he's a great presenter and I learned plenty of useful tips that I'll share when I'm back in the office.
The day wrapped up with Netezza Night. Yes, Netezza night. It seemed a strange gig to have them sponsor a Hotsos event and there was a strange little speech for a couple of minutes but, hey, they bought the beer, so I'm not complaining.
Disclosure: I'm attending this year's Hotsos Symposium with the help of the Oracle ACE Director program, which is paying my travel and accommodation expenses. The time off work is at my own expense.
How do you find out if there are any restrictions (or interesting, possibly undocumented, values) for the system parameters ? Once upon a time I used to use a value that was (probably) illegal so that I could find out from the resulting error message what the list of valid values was, for example:
SQL> alter system set optimizer_features_enable='fred';
alter system set optimizer_features_enable='fred'
*
ERROR at line 1:
ORA-00096: invalid value fred for parameter optimizer_features_enable, must be
from among 220.127.116.11.1, 18.104.22.168, 10.2.0.4, 10.2.0.3, 10.2.0.2, 10.2.0.1,
10.1.0.6, 10.1.0.5, 10.1.0.4, 10.1.0.3, 10.1.0, 9.2.0, 9.0.1, 9.0.0, 8.1.7,
8.1.6, 8.1.5, 8.1.4, 8.1.3, 8.1.0, 8.0.7, 8.0.6, 8.0.5, 8.0.4, 8.0.3, 8.0.0
This little trick became unnecessary in 10.2 with the arrival of view v$parameter_valid_values. For example (from an 22.214.171.124 instance):
SQL> select ordinal, isdefault, value
  2  from v$parameter_valid_values
  3  where name = 'optimizer_features_enable'
  4  order by ordinal
  5  /

   ORDINAL ISDEFAUL VALUE
---------- -------- ---------------------------
         1 FALSE    8.0.0
         2 FALSE    8.0.3
         3 FALSE    8.0.4
         4 FALSE    8.0.5
         5 FALSE    8.0.6
         6 FALSE    8.0.7
         7 FALSE    8.1.0
         8 FALSE    8.1.3
         9 FALSE    8.1.4
        10 FALSE    8.1.5
        11 FALSE    8.1.6
        12 FALSE    8.1.7
        13 FALSE    9.0.0
        14 FALSE    9.0.1
        15 FALSE    9.2.0
        16 FALSE    10.1.0
        17 FALSE    10.1.0.3
        18 FALSE    10.1.0.4
        19 FALSE    10.1.0.5
        20 FALSE    10.1.0.6
        21 FALSE    10.2.0.1
        22 FALSE    10.2.0.2
        23 FALSE    10.2.0.3
        24 FALSE    10.2.0.4
        25 TRUE     126.96.36.199
        26 FALSE    188.8.131.52.1

26 rows selected.
But a quick test demonstrates that you don’t get to see the values for hidden parameters – which is why I wrote a quick script to query the underlying x$ structure. Since it’s looking at an x$ it will only work if you’re connected to the SYS schema:
set pagesize 60
set linesize 90
set trimspool on

column name_kspvld_values      format a40 heading "Name"
column value_kspvld_values     format a40 heading "Value"
column isdefault_kspvld_values format a4  heading "Def"

break on name_kspvld_values skip 1

spool valid_values

select
        name_kspvld_values,
        decode(
                isdefault_kspvld_values,
                        'TRUE',  '***',
                        'FALSE', null,
                        '?'
        )       isdefault_kspvld_values,
        value_kspvld_values
from
        x$kspvld_values
order by
        name_kspvld_values,
        ordinal_kspvld_values
;

spool off

Name                                     Def  Value
---------------------------------------- ---- ----------------------------------------
_always_anti_join                             HASH
                                              MERGE
                                              NESTED_LOOPS
                                              CHOOSE
                                              OFF

_always_semi_join                             HASH
                                              MERGE
                                              NESTED_LOOPS
                                              CHOOSE
                                              OFF

_backup_kgc_scheme                       ***  ZLIB
                                              BZIP2
                                              LZO

_cluster_library                              SKGXN
                                              CLSS
...
You’ll notice when you run it that it only covers the parameters that take string values – there’s no information there about default values for numeric parameters or boolean parameters. Unfortunately there are (in my 184.108.40.206 instance) 107 distinct parameters listed in this view but 283 parameters of type 2 in x$ksppi (one of the things underneath v$parameter), so at first sight it looks as if some (quite a lot of) parameters are missing – but that’s because some string parameters (such as control_files, remote_listener, log_archive_dest) are open-ended in their content and can’t be constrained by a simple list of values.
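A quick way to cross-check those two counts for yourself (again connected as SYS, since both objects are x$ structures; type 2 in x$ksppi is the string parameter type):

select
        (select count(distinct ksppinm)            from x$ksppi where ksppity = 2) string_parameters,
        (select count(distinct name_kspvld_values) from x$kspvld_values)           listed_parameters
from    dual
;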
Yes, the title of this blog entry is a come-on. I am ever-so-slightly apologetic (smiley face).
This post follows the longest dry spell in my blogging over the last five years. I haven’t posted since early January and thus I am quite overdue for the next installment in my series regarding the Oracle Database 11g Direct NFS clonedb feature. I set out to make the next installment yesterday but before doing so I visited the analytics for my blog readership to see what’s been happening. I discovered that essentially nobody comes to this blog through Exadata related search terms anymore. That surprised me. Indeed, for the first—what—two or so years after Exadata went into general availability the first page worth of Google search results always included some of my posts. I can’t find any of my Exadata posts in the first several pages Google spoon-feeds me now when I google “Exadata.” This isn’t a wounded-soul post. I do have a point to make. Humor me for a moment while I show the top twenty search terms that have directed readers to my blog since January 1, 2011.
oracle on flash SSD            188
oracle nfs clonedb             182
oracle on nfs                  122
oracle fibre channel           115
huge pages allocated           104
real application clusters       92
automatic memory management     82
oracle file systems             75
greenplum versus exadata        70
So, as far as search terms go there seems to be a lack of traffic coming to this site for Exadata-related information. The page views for my Exadata posts are high, but the search terms are lightly-weighted. This means folks generally read Exadata-related material here after being directed here by a non-related search term. Oh well. I’d ordinarily say, “so what.” However, it is unbelievable to me how many people ask me questions each and every day that would be unnecessary if not for a quick read of one of the entries I posted before Oracle Open World 2010. That post, entitled Seven Fundamentals Everyone Should Know Before Attending Openworld 2010 Sessions, might be better named You Really Need to Know This Little Bit About Exadata Before Anyone Else Tries to Tell You Anything About Exadata. Folks, if you get a moment and you care at all about Exadata, please do read that short blog entry. It will enhance your experience with your sales folks or any other such Exadata advocates. Indeed, who wants to be introduced to a technology solution by the folks trying to sell it to you? Now, don’t get me wrong. I’m not saying Exadata sales folks are prone to offering misinformation. What I’m trying to say is your interaction with sales folks will be enhanced if you don’t find yourself in such remedial space as the very definition of the product and its most basic fundamentals. That leads me to point out some of the folks who have taken the helm from me where Exadata blog content is concerned.
Oaktable Network Members Booting Up Exadata Blogging
Fellow Oaktable Network member Kerry Osborne blogs about Exadata, in addition to his current efforts to write a book on the topic. I’ve seen the content of his book in my role as Technical Editor. I think you will all find it a must-read regarding Exadata because it is shaping up to be a very, very good book. I have the utmost of respect for fellow Oaktable Network members like Kerry. In addition to Kerry, Frits Hoogland (a recent addition to the Oaktable Network) is also producing helpful Exadata-related content. Oracle’s Uwe Hesse blogs frequently about Exadata-related matters as well. So, there, I’ve pointed out the places people graze for Exadata content these days. But I can’t stop there.
We Believe the Oracle Data Sheets
The content I’ve seen in blogs seems to mostly confirm the performance claims stated in Oracle Data Sheet materials. Let me put it another way. We all know the latest Exadata table/index scan rates (e.g., 25 GB/s HDD full rack or 70GB/s combined Flash + HDD). We’ve seen the Data Sheets and we believe the cited throughput numbers. I have an idea—but first let me put on my sarcasm hat. I’m going to predict that the next person to blog about Exadata will start out by blogging something very close to the following:
My big_favorite_table has many columns and a bazillion rows. On disk it requires 200 gigabytes of storage but with mirroring it takes up 400 gigabytes. When I run the following query—even without Exadata Smart Flash Cache—it only takes eight seconds on my full-rack Exadata configuration to get the result:
SQL> select count(*) from big_favorite_table where pk_number < 0;

  COUNT(*)
----------
         0
Don’t get me wrong. It is important for folks to validate the Data Sheet numbers with their own personal testing. But folks, please, we believe the light-scan rates are what the marketing literature states. I’m probably not alone in my desire to see blogs on users’ experience in solving particularly complex analytical data analysis problems involving vast amounts of data stored in Exadata. That sort of blogging is where social networking truly adds value—you know, going “beyond the Data Sheet.”
So what does all this have to do with the infrequent nature of my blogging? Well, I’ll just have to leave that for a future entry. And, no, the Data Sheets on Exadata Database Machine are not inaccurate.