Buffer States

Here’s a bit of geek stuff that I’ve been meaning to write up for nearly a year – to the day, more or less – and I’ve finally been prompted to finish the job off by the re-appearance on the OTN database forum of the standard “keep cache” question:

    Why isn’t Oracle keeping an object “properly” when it’s smaller than the db_keep_cache_size and it has been assigned to the buffer_pool keep ?

This is a two-part note – and in the first part I’m just going to run a query and talk about the results. The query is one that has to be run by SYS because it references a couple of x$ structures, and this particular version of the query was engineered specifically for a particular client.

select
        obj,
        state,
        bitand(bh.flag,power(2,13))     cur,
        count(*)        ct
from
        x$bh            bh,
        x$kcbwds        wds
where
        wds.addr = bh.set_ds
and     wds.set_id  between 1 and 24
group by
        obj,
        state,
        bitand(bh.flag,power(2,13))
order by
        obj,
        state,
        bitand(bh.flag,power(2,13))
;

You’ll notice I’m joining x$bh (the “buffer header” array) to x$kcbwds (the “working data set” array) where I’ve picked sets 1 to 24. On this particular system these were the sets for the KEEP cache. (If you want a generic query to isolate a particular cache then there’s an example here that identifies the RECYCLE cache by reference – but I wanted the query in this note to run as efficiently as possible against this production system, so I did a preliminary lookup against x$kcbwbpd and then used the literal set ids).
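For reference, here’s a minimal sketch of the sort of preliminary lookup I mean – it reads x$kcbwbpd (the buffer pool descriptor array) to find the low and high working data set ids for each configured pool. Treat the column names as indicative rather than definitive, since x$ structures do change between versions:

select
        bp_name,
        bp_lo_sid,
        bp_hi_sid
from
        x$kcbwbpd
where
        bp_size > 0
;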

 

Here are a few lines from the resulting output:


       OBJ      STATE        CUR         CT
---------- ---------- ---------- ----------
     40158          1       8192          1

     40189          1          0      87233
                            8192     272789

                    3          0      69804
                            8192     393868

     40192          1          0         87
                            8192      12197

                    3          0      30763
                            8192       1994

...

    117291          1          0        498
                            8192       4419

                    3          0       3001
                            8192         15

    117294          1          0        243
                            8192       3544

                    3          0       1245
                            8192         23

4294967294          3          0          2
**********                       ----------
sum                                 1216072

Since we’re looking at x$ structures – which rarely have any official documentation – the rest of this note isn’t guaranteed to be correct, and things do change with version, so I need to stress that this specific example comes from 9.2.0.6. This is what I think the results show:

The state column is instance-related and is essentially something that’s useful in a RAC environment. State 1 translates to ‘XCUR’ (exclusive current) which means that this instance has exclusive rights to the most recent version of the block; state 3 translates to ‘CR’ (only valid for consistent read).

Bit 13 of the flag column is set if the buffer has been “gotten in current mode”. (If you’re interested in the other bits there’s a page on my old website that might keep you entertained for a while – I haven’t yet updated it to 11g, though.)

The problem for the client was this – the total size of all the data segments in the KEEP cache was about 6GB and the total size of the KEEP cache was about 10GB, yet the database was still reporting a constant trickle of physical reads to the objects and, when the code to “re-load” the cache was executed at a quiet period at the start of the day, some 60,000 physical blocks had to be read. With a 10GB cache for 6GB of data, would you really expect to see this I/O?

Take a look at the figures for object 40189:

There are 272,789 buffers for blocks that were “gotten in current mode” (bit 13 is set) and are also “exclusive current” (state 1) to the instance, but there are also 393,868 buffers that were originally “gotten in current mode” but are now “only valid for consistent read”.

Similarly there are 87,233 buffers for blocks that weren’t “gotten in current mode” but are “exclusive current” to the instance – in other words they are the most up to date version of the block but weren’t fetched with a “db block get”, and again there are 69,804 buffers holding blocks that were not “gotten in current mode” but which are now “only valid for consistent read”.

Buffers that are “only valid for consistent read” are buffers holding blocks that have been generated through one of Oracle’s mechanisms for creating CR (consistent read) clones. As you can see, then, a block that is a CR clone may still be flagged as “gotten in current mode”. In fact, in line with Oracle’s generally “lazy” approach to work you can even find (in some versions of Oracle, at least) CR clones that still have the “dirty” bit set in the flag, even though CR clones can never really be dirty and are NEVER written to disc.

Take another look at the buffer counts – this KEEP cache is sized at 1.2M buffers (10GB), but object 40189 alone has taken out 460,000 of those buffers (3.6GB) in block clones, and for this object there are more clones than originals (at 360,000, which happens to be just a few thousand blocks less than the size of the table). So, when you’re thinking about creating a KEEP cache, remember that you have to allow for block cloning – simply setting the db_keep_cache_size to something “a bit bigger” than the object you want to keep cached may not even be close to adequate.

Part 2 to follow soon.

How important is multi-browser support for specific apps?

In a thread on the OakTable mailing list, James Morle pointed out that Oracle’s Web Conferencing software was IE only. A point that has been mentioned by Jake from The AppsLab a number of times in relation to his need for a Windows VM on his Mac. The discussion turned to the relative browser share and multi-browser support. This post is a minor rearrangement of my posts to that thread.

Before I launch into the body of the post, keep in mind I am talking about complex (typically GUI style) apps with a specific purpose that run from within a browser, not just general web pages!

The breakdown of browser stats from my website over the last month was:

  • Internet Explorer: 42.32%
  • Firefox: 38.68%
  • Chrome: 15.19%

That’s pretty similar to the figures quoted in the summary on Wikipedia:

  • Internet Explorer: 43.55%
  • Firefox: 29.0%
  • Chrome: 13.89%

That surprised me because in the past I’ve always found my stats for IE much lower than the general stats quoted. I’ve always assumed this was because Oracle geeks try out alternative browsers much more than the general public. Most “normal” people I know use IE. Most geeks I know don’t. Now they seem to match. Does this mean more regular folk are moving to Firefox & Chrome, or is this all being skewed by browsers on mobile devices?

The stats for mobile devices are shown here, but I am not sure if these get included in the general stats also. If so, I would expect some of the Chrome hits to be coming from Android devices and some of the Safari hits to be coming from iPhone and iPad devices. If that is the case, then using the general browser market share stats may not be too clever when deciding whether to support a specific browser for your app. Maybe OS usage is a better option.

Looking at the OS usage stats on Wikipedia, Windows is still kicking butt on the desktop, so all these people have access to IE as well as their preferred browser.

  • Windows XP (41.15%)
  • Windows 7 (26.35%)
  • Windows Vista (14.57%)
  • Mac OS X (7.07%)
  • iOS (iPhone) (2.20%)
  • Linux (1.65%)

Compare that to my site, where Linux is a distant second:

  • Windows: 90.56%
  • Linux: 6.37%
  • Macintosh: 2.33%
  • iPhone: 0.21%

For a browser-based app you expect to be run from the desktop, forcing people to use (or have access to) IE is not that much to ask. The vast majority can, if pushed, switch to IE for that specific task.

I don’t think you can lump mobile and non-mobile into one pot. Mobile apps have so many constraints to consider that they will invariably be treated as a separate project that must *definitely* be multi-browser compliant or a native app.

Browser-based apps that are intended for desktop users are different because about 90% of the time (according to the stats) they will be used on a Windows PC with access to IE.

Obviously, your intention should always be to build apps that are multi-browser compliant, but depending on the nature and purpose of the specific app, having to open IE to run it will have zero impact on the vast majority of users (both home and business) until Windows loses its desktop dominance or the desktop ceases to exist…

Going back to the app that started this thread, Oracle’s Web Conferencing, is it a problem that this is IE only? Well it’s a pain for me because I’m a Mac and Linux user, but it’s not insurmountable because I can use a VM. I’ve never needed or wanted to use this functionality from a mobile device, so the IE constraint hasn’t affected me in that respect. In this case it’s very much a business app, so the vast majority of users will be sitting at a Windows PC. With that in mind, this is one of those cases where the IE constraint is annoying, but acceptable.

Cheers

Tim…




DBCA failing to create RAC databases

This is a weird problem I ran into today. As part of an automation project the code deploys RAC One databases across a cluster, depending on the capacity available on each node. The nodes are currently BL685c G6 servers with 128G RAM, but will be upgraded to G7s later.

Now, my problem was that after the weekend we couldn’t deploy any more RAC One databases, except on one node. DBCA simply created single instance databases instead. Newly created databases were properly registered in the OCR, and their build completed ok, but not as RAC One databases. Take for example this database:

$ srvctl config database -d MYDB
Database unique name: MYDB
Database name: MYDB
Oracle home: /u01/app/oracle/product/11.2.0.2
Oracle user: oracle
Spfile: +DATA/MYDB/spfileMYDB.ora
Domain: example.com
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: MYDB
Database instances: MYDB
Disk Groups: DATA
Mount point paths:
Services:
Type: SINGLE
Database is administrator managed

How come? We were sure that we passed the RACOneNode flag to dbca, and it can be seen in the command line. Trying again, I spotted the full command line in the process table (alongside the sys and system passwords … you should change these as soon as DBCA completes!):

[rac]oracle@node2.example.com $ ps -ef|grep MYDB
oracle   14865 14854 65 11:22 ?        00:00:07 /u01/app/oracle/product/11.2.0.2/jdk/jre/bin/java -Doracle.installer.not_bootstrap=true -DORACLE_HOME=/u01/app/oracle/product/11.2.0.2 -DSET_LAF= -Dsun.java2d.font.DisableAlgorithmicStyles=true -Dice.pilots.html4.ignoreNonGenericFonts=true -DDISPLAY= -DJDBC_PROTOCOL=thin -mx128m -classpath ... oracle.sysman.assistants.dbca.Dbca -silent -createDatabase -templateName /u01/app/oracle/product/admin/templates/Default.dbc -gdbName MYDB.example.com -RACOneNode -RACOneNodeServiceName MYDB_APP.example.com -sid MYDB -sysPassword xxx -systemPassword xxx -emConfiguration NONE -totalMemory 4096 -storageType ASM -asmSysPassword xxx -diskGroupName DATA -initParams db_create_file_dest=+DATA,cpu_count=1 -nodelist node2
oracle   15415  5109  0 11:22 pts/0    00:00:00 grep MYDB

So why the problem? Looking at the dbca trace file I found these lines:

[main] [ 2011-03-14 14:15:31.845 GMT ] [SQLEngine.initialize:363]  Starting Reader Thread...
[main] [ 2011-03-14 14:15:31.927 GMT ] [OracleHome.initOptions:1240]  executing: startup nomount pfile='/u01/app/oracle/product/11.2.0.2/dbs/initDBUA0.ora'
[main] [ 2011-03-14 14:15:55.417 GMT ] [SQLEngine.done:2167]  Done called
[main] [ 2011-03-14 14:15:55.418 GMT ] [OracleHome.initOptions:1247]  ORA-00304: requested INSTANCE_NUMBER is busy

oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-00304: requested INSTANCE_NUMBER is busy

 at oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeImpl(SQLEngine.java:1655)
 at oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeSql(SQLEngine.java:1903)
 at oracle.sysman.assistants.util.OracleHome.initOptions(OracleHome.java:1241)
 at oracle.sysman.assistants.dbca.backend.SilentHost.initialize(SilentHost.java:179)
 at oracle.sysman.assistants.dbca.Dbca.execute(Dbca.java:116)
 at oracle.sysman.assistants.dbca.Dbca.main(Dbca.java:180)
[main] [ 2011-03-14 14:15:55.420 GMT ] [OracleHome.initOptions:1250]  executing: select parameter from v$option where value='TRUE'
[main] [ 2011-03-14 14:15:55.420 GMT ] [SQLEngine.reInitialize:735]  Reinitializing SQLEngine...

Interesting – the ORA-00304 error sticks out. (The DBCA logs are in $ORACLE_BASE/cfgtoollogs/dbca/dbName/ in 11.2, by the way.) Further down the logfile it then determines that the RAC option is not available. This isn’t true – and I checked on each node:

$ cd $ORACLE_HOME/rdbms/lib
$ nm -r libknlopt.a | grep -c kcsm.o
1

That was identical on all nodes, so we definitely had RAC compiled into the oracle binary. I also compared the size and timestamp of all oracle binaries in $ORACLE_HOME, only to find them identical. However, dbca didn’t seem impressed by my contradictory evidence and went on creating single instance databases. That was now becoming a little inconvenient.

I then tried relocating one of the successfully created RAC One databases to the nodes where we had problems building them, hoping to find out more about the problem. At this stage I was convinced there was a problem with semaphores or other SysV IPC.

I certainly didn’t want to use the Windows Fix and reboot!

Moving On

So to recap, we should be able to build RAC (One Node) databases as the option is compiled into the binary, and yet it doesn’t work. From the trace I gathered that Oracle builds an auxiliary instance first, and uses initDBUA0.ora in $ORACLE_HOME to start it. So where are its logs/where’s the diagnostic dest? It turns out to be $ORACLE_HOME/log/ – simply set your ADR base to this location and use the familiar commands. And this finally gave me a clue:

*** 2011-03-14 12:06:40.104
2011-03-14 12:06:40.104: [ CSSCLNT]clssgsGroupJoin: member in use group(0/DBDBUA0)
kgxgnreg: error: status 14
kgxgnreg: error: member number 0 is already in use
kjxgmjoin: can not join the group (DBDBUA0) with id 0 (inst 1)
kjxgmjoin: kgxgn error 3

So somewhere else in the cluster there had to be a DBUA0 instance that prevented my new instance from starting. A quick trawl through the process table on all nodes revealed that DBUA was active on node6. Shutting that down solved the problem!
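For what it’s worth, the trawl itself can be as simple as something like the following sketch – it assumes passwordless ssh between the cluster nodes, and the node names are made up for illustration:

for node in node1 node2 node3 node4 node5 node6; do
  echo "== $node =="
  ssh $node "ps -ef | grep '[p]mon_DBUA'"   # the auxiliary instance shows up as ora_pmon_DBUA0
done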

Summary

DBCA is a nice tool for creating databases, and together with user-definable templates it is really flexible. From a technical point of view it works as follows:

  • For RAC and RAC One Node it tries to create an auxiliary instance, called DBUA0, as a cluster database. If DBUA0 is already in use on the same node, it will use DBUA1 etc.
  • Next it will rename the database to what we assign on the command line
  • It then performs a lot more actions which are not of relevance here.

In my case, one of these DBUA0 auxiliary instances was still present on a different node in the cluster as a result of a crashed database creation. When a subsequent call to dbca created another auxiliary (cluster!) DBUA0 instance on a different node, it wasn’t aware that there already was a DBUA0, and LMON refused to create it. This is expected behaviour – instance names have to be unique across the cluster. The DBUA0 of node2, for example, clashed with the one on node6.

Why did it work on node6 then, I hear you ask? DBCA seems to have code logic to establish that DBUA0 on a node is in use, and uses DBUA1 next.

Update

I got this update from Oracle Support who acknowledge this as a bug:

Notes 16-Mar-2011 1:21:16 GMT+00:00 PM Oracle Support
Unscheduled
Generic Note
————————
Dear Martin,

Following bug is created for this particular issue.

Bug 11877668 – DBCA DOESN’T CREATE A RAC ONE DATABASE IN SILENT MODE

Captain’s Fury…

Captain’s Fury is the fourth book in the Codex Alera series by Jim Butcher. The book starts two years on from where the Cursor’s Fury ended.

I mentioned in my previous post that the last 100 pages of Cursor’s Fury were exciting. Captain’s Fury makes the previous book seem tame in comparison. It’s like a juggernaut ramming into your head for 600 pages. I felt physically drained by the end of it. Totally cool.

Cheers

Tim…

Another Day, Another....

Another day, another day of Real World Performance. We are bringing it East. The first four events were held in Los Angeles, San Diego, Seattle and Phoenix. The next are in

  • Houston Texas, Wednesday April 20th
  • Atlanta Georgia, Friday April 22nd
Click on the link above to get more information and see some feedback from the first four events. It is a day long event - with early bird registration breaks if you register before March 18th! It'll be $75 off if you register early. That is just $100 for IOUG members and $150 for non-members. Given that you get a delicious box lunch as part of the deal - that's a pretty good price.

There is life in the old dog yet…

My Blog has been unforgivably quiet of late. All I can say in my defence is “work”. I’ve blogged about this before, and mentioned it in presentations, but most of us are too tied up in the day job to test things properly, let alone blog or in other ways present on how things work. We fix the problem in front of us and move on to the next “critical, absolute priority 1A plus, must-be-fixed-today” issue. So like most of you, that has been my life for…ohh, months.
{It makes me even more thankful to those who continually find the time to test properly and blog about what they have found, on top of their “earning” work}

However, my working life is looking like it will return to a more reasonable balance soon.

So, this blog post is a statement of intent that I WILL be blogging again in the next week or so. OK, so this post is not of any use to anyone reading, but it means I now feel morally obliged to follow up on the statement (and that is actually the intent of this blog post).

I’m going to blog on two technical areas.

One is gathering Stats, which I am utterly sick and tired of doing in my working life. So I figure if I tell anyone who will listen all I know about gathering system, fixed object, dictionary and object stats and give some hints as to what I have come to think of as methods and techniques for doing so, I might not have to think about it any more. I can tell people to read my blog and not hire me to do it for them. Maybe a career limiting move but I was never that bright :-)

The second topic will be Index Organised Tables (IOTs). They are great. Ohhh, they have drawbacks and concerns, after all no tuning trick is For Free, and I know one person (who I shall refer to only as Dave) whose career was almost destroyed due to an Oracle 9 to 10 “feature” on IOTs involving corruption. But IOTs are {in my opinion} a vastly under-used feature of Oracle and could be useful to many Oracle sites. If anyone wants help with them, I’ll let you hire me for that and I will come and help gladly. So long as no bl00dy stats are involved :-)

So, having drawn my own line in the sand to do some Technical Blogs (I actually have enough waffle-based Friday Philosophy topics to last 2 years but have promised myself to balance them with decent technical posts) I better go and write them.

Popularity vs Performance vs Productivity

A more popular car will have better servicing as parts and skills will be common. A fast car is compelling but sometimes speed is easily sacrificed for comfort.

Recently the question of language efficiency versus productivity has been nagging me. The nagging question includes what languages, frameworks and packages are best for collecting data, analyzing data and exposing the data in a rich UI via a web browser.

Productivity

Here is an interesting graphic on the speed of programs in various languages:

http://hinchcliffe.org/img/weblanguagecomparison2.png

Now the above graphic might be hard to read (wish it was larger), but what it indicates is that Python is roughly 10x slower in execution time than the equivalent programs in Java. On the other hand, the expected productivity for a programmer is 5-10x higher in Python than in Java. Now on the surface my reaction is “OK, it’s faster to program in Python, ok, but Java is 10x faster so it’s clearly the choice!”, but if I put the numbers into real life I’m like OMG – imagine a Python program that takes me 1 day to write; that same program would take me 5-10 days in Java!? That’s a huge deal. Hands down, I’m going to program in Python (or something other than Java). I can deal with some runtime performance issues.

Should a company program in Java or Python? What if Python would take a year to bring to market? What if the Java version took 5-10 years?! When asked whether, if he could go back, he would recreate Slideshare in some other language than Ruby, Jonathan Boutelle said the question is moot. If they hadn’t written it in Ruby but in Java, they wouldn’t even be having the conversation. Sure, Ruby had some scaling issues for them, but they released Slideshare to the market and became successful.

Here is another graphic on productivity from the book “From Java to Ruby”:

The data is based on a study of productivity in C, Java, Perl, Python: http://page.mi.fu-berlin.de/prechelt/Biblio//jccpprt_computer2000.pdf

As far as productivity comparisons, I think the following two images of Python vs Java explain it nicely:

http://blog.chenzhongke.com/2010/05/why-python-is-more-productive-than-java.html

For a down and dirty view of Java sprawl that further undermines productivity, check out:

http://chaosinmotion.com/blog/?p=622

(as a rebuttal, try perusing hello world in GNU packaging: http://ftp.gnu.org/gnu/hello/hello-2.6.tar.gz )

Performance

Here is a fascinating site with empirical data on the speed of languages:

http://shootout.alioth.debian.org/

Current speed might be a bit misleading, as popularity affects the effort put into fixing a language’s issues; a language that is popular, though perhaps less efficient, will see improvements. For example, it’s interesting to see the improvements in Java over the years, and now the improvements in Ruby. Though Ruby is slower than Scala, and Scala’s productivity may even be better than Ruby’s, Scala doesn’t yet have the market momentum, and thus the assurance that it is a language to invest in now.

Also, if the speed of the language is an issue, the solution is not to throw the baby out with the bathwater, i.e. throw out the productive programming language altogether and go for a fast one such as C or Java, but to find the areas of slowness in the productive framework and replace those with a fast function in another language, i.e. polyglot programming.

Popularity

Another interesting image on the current usage of languages:

http://www.dataists.com/2010/12/ranking-the-popularity-of-programming-langauges/

Languages mentioned in job postings as % and % increase

http://www.soa-at-work.com/2010/02/it-job-trends-which-technologies-you.html

The issue of showing total growth versus % growth lends itself well to heat map representation. Here is a heat map from O’Reilly:

(Interesting to all the Oracle folks: PL/SQL shows 51% growth from 2007 to 2008.)

Here is data from O’Reilly on growth over time, showing Python and C# growing steadily:

http://radar.oreilly.com/2009/02/state-of-the-computer-book-mar-22.html

http://radar.oreilly.com/2011/02/2010-book-market-4.html

Tiobe Index

http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html

Here is Google Trends normalized by C programming (without normalization all trends are down, because Google Trends shows hits normalized by all traffic, and as general traffic grows relative to computer geek searches, geek searches trend downwards).

Interesting how Ruby seems to be trending down (as normalized against C searches).

Another perspective: Google Insights:

Just for Fun

Here is how I started my morning: a fun video on Python that catalyzed the above discussion:

Other links

hello world in different JVM/CLR languages http://carlosqt.blogspot.com/2010/06/most-active-net-and-jvm-languages.html

Oracle Exadata Performance series – Part 1: Should I use Hugepages on Linux Database Nodes?

There was a question in LinkedIn forum about whether Linux Hugepages should be used in Oracle Exadata Database layer, as they aren’t enabled by default during ACS install. I’m putting my answer into this blog entry – apparently LinkedIn forums have a limit of 4000 characters per reply… (interestingly familiar number, by the way…:)

So, I thought it’s time to start writing the Oracle Exadata Performance series articles that I’ve been planning for a while… with some war stories from the field, some things I came across while researching for the Expert Oracle Exadata book, etc.

I’ve previously published an article about Troubleshooting Exadata Smart Scan performance and some slides from my experience with VLDB Data Warehouse migrations to Exadata.

Here’s the first article (initially planned as a short response in LinkedIn, but it turned out much longer):

As far as I’ve heard, the initial decision not to enable hugepages by default was made because hugepages aren’t flexible & dynamic enough – you always have to configure the hugepages at OS level to match your desired SGA size (to avoid wastage). Different shops may want radically different SGA sizes: a larger SGA for single-block-read oriented databases like transactional/OLTP or OLAP cubes, but a smaller SGA for smart scan/parallel scan oriented DWs. If you configure 40GB of hugepages on a node, but only use 1GB of SGA, then 39GB of memory is just reserved, not used, wasted – as hugepages are pre-allocated. AMM, using regular pages, will only use the pages it touches, so there’s no memory wastage due to pre-allocation.

So, Oracle chose an approach which is more universal and doesn’t require extra OS-level configuration (which isn’t hard at all if you pay attention, but not all people do). As a result, fewer people will end up in trouble with their first deployments, although they might not be getting the most out of their hardware.

However, before enabling hugepages “because it makes things faster” you should ask yourself what exact benefit would they bring you?

There are 3 main reasons why hugepages may be useful in Linux:

1) Smaller kernel memory usage, thanks to fewer PTEs resulting from larger page sizes

This means fewer pagetable entries (PTEs) and less kernel memory usage. The bigger your SGA and the more processes you have logged on, the bigger the memory usage.

You can measure this in your case – just “grep Page /proc/meminfo” and see how big a portion of your RAM has been used by “PageTables”. Many people have blogged about this, but Kevin Closson’s blog is probably the best source to read about it.

2) Lower CPU usage due to less TLB misses in CPU and soft page-fault processing when accessing SGA.

It’s harder to measure this on Linux with standard tools, although it is certainly possible (on Solaris you can just run prstat -m to get microstate accounting and look at the TFL, DFL and TRP stats).
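For what it’s worth, on Linux kernels where perf is available, something along the following lines can give a rough idea for a given server process – the pid is hypothetical and the exact event names depend on your CPU and perf version:

$ perf stat -e dTLB-loads,dTLB-load-misses -p 12345 sleep 60   # sample the process for 60 seconds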

Anyway, the catch here is that if you are running parallel scans and smart scans, then you don’t access much of the buffer cache in the SGA at all; all IOs or smart scan result-sets are read directly into the PGAs of the server processes – which don’t use large pages at all, regardless of whether hugepages for the SGA have been configured or not. There are some special cases, like when a block clone has to be rolled back for read consistency and you have to access some undo blocks via the buffer cache… but again this should be a small part of the total workload.

So, in a DW which uses mostly smart scans or direct path reads, there won’t be much of a CPU efficiency win from large pages, as you bypass the buffer cache anyway and use the small pages of private process memory. All the sorting, hashing etc. happens using small pages anyway. Again I have to mention that on (my favorite OS) Solaris it is possible to configure even PGAs to use large pages (via the _realfree_heap_pagesize_hint parameter)… so it’ll be interesting to see how this would help DW workloads on the Exadata X2-8 monsters which can run Solaris 11.

3) Lock SGA pages into RAM so they won’t be paged out when memory shortage happens (for whatever reason).

Hugepages are pre-allocated and never paged out. So, when you have an extreme memory shortage, your SGAs won’t be paged out “by accident”. Of course it’s better to ensure that such memory shortages don’t happen – configure the SGA/PGA_AGGREGATE_TARGET sizes properly and don’t allow third-party programs to consume crazy amounts of memory. There is also the lock_sga parameter in Oracle, which should allow you to do this on Linux with small pages too, but I have never used it on Linux so I don’t know whether it works at all, and with 11g AMM the mlock() calls perhaps aren’t supported on the /dev/shm files at all (I haven’t checked and don’t care – it’s better to stay away from extreme memory shortages). Read more about how the AMM MEMORY_TARGET (/dev/shm) works in my article written back in 2007 when 11g came out ( Oracle 11g internals – Automatic Memory Management ).

So, the only realistic win (for a DW workload) would be the reduction in size of the kernel pagetable structures – and you can measure this using the PageTables statistic in /proc/meminfo. Kevin demonstrated in his article that 500 connections to an instance with a ~8 GB SGA consisting of small pages resulted in 7 GB of kernel pagetables usage, while the usage with large pages (still 500 connections, 8 GB SGA) was about 265 MB. So you could win over 6 GB of RAM, which you could then give to PGA_AGGREGATE_TARGET or use to further increase the SGA. The more processes you have connected to Oracle, the more pagetable space is used… Similarly, the bigger the SGA is, the more pagetable space is used…

This is great, but the tradeoff here is manageability and the extra effort you have to put in to always check whether the large pages actually got used or not. After starting up your instance, you should really check whether HugePages_Free in /proc/meminfo shrank and HugePages_Rsvd increased (when an instance has just started up and Oracle hasn’t touched all the SGA pages yet, some pages will show up as Rsvd – reserved).

With a single instance per node this is trivial – you know how much SGA you want and pre-allocate the amount of hugepages for that. If you want to increase the SGA, you’ll have to shut down the instance and increase the Linux hugepages setting too. This can be done dynamically by issuing a command like echo N > /proc/sys/vm/nr_hugepages (where N is the number of huge pages), BUT in real life this may not work out well: if the Linux kernel can’t free enough small pages from the right physical RAM locations to consolidate 2 or 4 MB contiguous pages, the command may fail to create the requested number of new hugepages.

And that means you may have to restart the whole node to make the change. Note that if you increase your SGA beyond what the configured hugepages can hold (or you forget to increase the memlock setting in /etc/security/limits.conf accordingly), then your instance will silently just use small pages, while all the memory pre-allocated for hugepages stays reserved for hugepages and is not usable for anything else!
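As an illustration only – the numbers below are made up for a hypothetical 10 GB SGA using 2 MB hugepages, not taken from any particular Exadata install – the static setup might look roughly like this:

# /etc/sysctl.conf – reserve 5120 x 2 MB hugepages (10 GB) plus a little headroom
vm.nr_hugepages = 5200

# /etc/security/limits.conf – let the oracle user lock that much memory (values in KB)
oracle   soft   memlock   10649600
oracle   hard   memlock   10649600

# after instance startup, verify that the pages actually got used
grep Huge /proc/meminfo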

So, this may become more of a problem when you have multiple database instances per cluster node or you expect to start up and shut down instances on different nodes based on demand (or when some cluster nodes fail).

Long story short – I do configure hugepages in “static” production environments, to save kernel memory (and some CPU time for OLTP-type environments that use the buffer cache heavily), also on Exadata. However, for various test and development environments with lots of instances per server and constant action, I don’t bother myself (and the client) with hugepages and make everyone’s life easier… Small instances with a small number of connections won’t use that many PTEs anyway…

For production environments with multiple database instances per node (and where failovers are expected) I would take the extra effort to ensure that whatever hugepages I have preallocated won’t get silently wasted because an instance wants more SGA than the available hugepages can accommodate. You can do this by monitoring the HugePages entries in /proc/meminfo as explained above. And remember, the ASM instance (which is started before the DB instances) will also grab itself some hugepages when it starts!


Symposium 2011 - My Presentation

I think the best approach here is to focus on the technical details of the mistake first and then follow with any whining, self-justification or philosophy.

The mistake I made in my presentation was to suggest at least twice that simply adding a new partition to a composite partitioned object is enough to completely invalidate the aggregated global stats at the table level. That was based on me taking a valid example from the white paper (listings 13 and 14) and an earlier blog post and (badly) converting it to a simple partitioned table example.

The bit I missed in the conversion (and subsequently reinforced verbally) was that it's the subsequent gathering of statistics on a single subpartition, and not all of the subpartitions, that causes the stats to go bad, because some subpartitions are then missing statistics. To put it more elegantly, I liked the way Wolfgang Breitling expressed it to me - 'A DDL operation will not invalidate the stats'. He's quite right. What invalidates the stats is making the wrong calls to DBMS_STATS.
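To make that concrete, here's a minimal sketch of the kind of sequence I mean - the table, partition and subpartition names are made up for illustration and aren't taken from the paper:

-- Adding a new partition (and its subpartitions) is just DDL; the aggregated
-- global stats are still usable at this point.
alter table sales add partition p_2011_04
        values less than (to_date('01-05-2011','dd-mm-yyyy'));

-- Gathering stats on only ONE of the new subpartitions leaves its siblings
-- without statistics, so the partition-level and global stats can no longer be
-- aggregated correctly - this is the "wrong call" that does the damage.
begin
        dbms_stats.gather_table_stats(
                ownname     => user,
                tabname     => 'SALES',
                partname    => 'P_2011_04_SP1',
                granularity => 'SUBPARTITION');
end;
/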

The white paper hasn't changed because the example there was always correct and reflected a real world issue. Here are the specific corrections I made to the slides.

Changed whole section to missing *Sub*partition stats to reflect the example in the paper correctly

Slide 35 - Changed
Solution - quickly add a new partition *and gather stats* (incorrectly gather stats, as it happens)

Slides 36-39
Changed diagrams to show new partition and subpartitions being added but stats only being gathered on one of the new subpartitions, which invalidates both the global and partition stats for the new partition because not all of the underlying component subpartitions have had stats gathered on them.

Essentially, the simple act of adding partitions and subpartitions does not invalidate aggregated global stats, but partially-gathered stats will.

Slides 47-49
Fixed repeated implication that adding partition invalidates aggregated stats - it does not.

Slide 53
Toned down some of the negativity about Dynamic Sampling after discussion with Wolfgang

With the benefit of time and reflection, it was a bad mistake but I don't think it was anywhere near invalidating the presentation, which I still think was one of my better ones (no demos, you see ;-)). I also think it was a perfect example of several essentials of sharing technical information, either through presentations or articles.

- By publishing the scripts you've used and the results, others can look at the tests and see where you've gone wrong.
- By publishing the scripts you've used and the results, you can immediately see where you've gone wrong when someone questions your results! (As soon as I opened my laptop after being questioned by Wolfgang and Maria, I realised what I'd done.)
- When trying to translate real results to pretty pictures, make sure you don't screw up the essential detail in an effort to make things look simpler!
- Don't rush your slides ;-)
- Don't decide that what is on your slides must be true when you've already got a paper showing the truth!
- Don't believe anything any presenter tells you without seeing the results and then checking for yourself. Trust but Verify is the common mantra.

Anyway, my thanks to Maria Colgan and particularly to Wolfgang Breitling for doing the right thing by highlighting my error and discussing it in some depth so that everyone can get to the correct information. Although I was naturally a little grumpy that I'd made a mistake in an area I actually know quite a lot about, the subsequent discussions with Wolfgang are adding to our collected pool of knowledge in very interesting ways.

All of these links are to updated materials.

http://www.slideshare.net/dougburns/statistics-on-partitioned-objects
http://oracledoug.com/stats_slides.pdf
http://oracledoug.com/stats.pptx
http://oracledoug.com/stats.docx

Symposium 2011 - Day 3

I woke up at 6:30 but just closed my eyes for a moment and it was 9:15! ;-) That was the first presentation blown away - another tough choice between Margaret Norman discussing Parallel Execution and Karl Arao on AWR (nice write-up by Karl). I only hope they're more timely and reliable in uploading their slides than I've proved to be this time so that, once I've got my portal access sorted out, I can look through all of the Symposium materials.

I managed to show up just a little late for Toon Koppelaars - Triggers Considered Harmful, Considered Harmful. Which I was very keen to see. If you have looked at AM4DP, the book co-authored by Toon and Lex de Haan, you will know that it discusses in detail the use of triggers to implement more detailed business logic in the database as assertions. Contrast that with the oft-quoted Tom Kyte mantras about the dangers of database triggers and there's a dilemma to resolve. In the end it turns out that Tom and Toon agree on many inappropriate uses of triggers and even when they disagree on using triggers to implement business logic, it's largely because Tom doubts the majority of developers' ability to do so safely and correctly. I tend to agree with Tom, based on the majority of people I've worked with. As this is a very descriptive paragraph, I'll let them both know in case I've misrepresented their views. Fascinating debate, anyway.

I also managed to catch Maria Colgan's - Implement Best Practices for Extreme Performance with Oracle Data Warehousing - well, the first half of it, as she got so caught up in the subject that she ran out of time a little and didn't get through all of the material. I reckon that's what happens when you're having too much fun presenting to an audience who are interested in the details ;-) It was no big deal as, again, people can review the slides and her white paper later. However, I really must pull her up on what I am sure was a major error in her presentation! As far as I'm concerned, and I've now checked, the plural of synopsis is synopses! Not sure I'll be able to trust another word she says, now!

Which meant I missed Mark Farnham's: Physical Ordering of Data: Is it Ever Useful? Yet another tough call.

As the sessions switched to a single stream for the final afternoon, I was determined to have at least one beer with Alex Gorbachev, who I hadn't seen nearly enough of, although I saw even less of Marco Gralike, who went down with what appeared to be food poisoning at the start of the week and was holed up in his room for the entire conference! Poor guy. Alex was convinced the hotel bar would be open at noon but alas not, and I just about convinced him we shouldn't break in and serve ourselves! ;-) We didn't want to miss Tanel Põder's presentation, so it looked like beers were out and Alex had to head to the airport immediately after the Farewell ceremony :-( Then, in a genius moment, I remembered my mini-bar and produced one can of beer each, which we cracked open in the back row just as Tanel was starting to present on The Most Complex Oracle Performance Problem I've Ever Seen. Tanel was as interesting and easy to listen to as always and ran through a good range of the different tools used to diagnose a problem with millions of .aud files in the audit directory. Oh my god, that sounded so familiar to me! (Well, apart from the fact that we'd managed to build up a more impressive 20 million.)

The final session of the conference was Tom Kyte's Just in Time. Why Just in Time? Because this was a presentation written as the Symposium progressed with the intention to wrap up the major themes of the conference and anything that Tom had (re-)learned. Nice idea. Tom pointed out that it meant he'd been very much an attendee at the conference for a change and recognised the challenge of 3 days having your head filled with information! Here's some evidence, too, courtesy of Hotsos and (I suspect) Becky Goodman.

He kicked off with a list of funnies that he'd learned - including the fact that I used to write home computer games for a living and gave it all up for Oracle - and then moved through a bunch of subjects, including how systems have changed over the years but we keep reintroducing the same old problems. He also had a section on how prevalent Optimiser Statistics presentations had been at the conference which reflected the fact that it's an issue Oracle customers are really scared about or at least focussed on. I think he implied that he might have learnt something about the aggregation process from my presentation, which was cool, but not as cool as how he handled the mistake I'd made in my own presentation. Something else he learned - Never be afraid to admit you made a mistake. I'd emailed him the night before to ask if he'd mention it so that people knew to download the (cough) updated slides. (Oh, ok, so as he pointed out, maybe there was never a version 1.0 incorrect slides uploaded yet but let's not be too picky here ;-) ) He was very kind to point out that many people would want to sweep their mistakes under the carpet so I was actually quite touched by how he handled it and thanked him afterwards. Thanks to Jacco Landlust as well for suggesting this approach. Maria Colgan had offered me 5 minutes of her presentation to talk through it but, given her slack time-keeping, I'm glad I went with Tom. Tee hee :-)

Gary Goodman then ran us all through some farewells, including the embarrassing but welcome round of applause for the presenters and the results of the charity auction, and then it was just time to say cheerio to a few people before heading for a night of drinks and more fun with several of my old Symposium friends. Not people I get to see very often. However, I'd prefer to draw a discreet veil over the moment when I was telling the story of a Cuddly Toy who likes to hang upside down on a rope in our living room singing 'No one I think is in my tree. I mean it must be high or low ....'.

You had to be there. Or perhaps not.

Disclosure: I'm attending this year's Hotsos Symposium with the help of the Oracle ACE Director program, which is paying my travel and accommodation expenses. The time off work is at my own expense.