Oakies Blog Aggregator

Friday Philosophy – OK, so I am on Twitter Now

Not a very exciting Friday Philosophy this week I’m afraid, just a self-publicising announcement that I am now on Twitter. I’ve put the WordPress widget on the blog for a while (days or weeks, I don’t know); my Twitter name is MDWidlake. {I was a little surprised mwidlake had gone already, but that says more about how rare I consider my surname to be than naivety, I hope.} It seems you can click on a part of the widget to follow me, which is a pretty safe thing to do as I am not very verbal as yet.

As I said, I’m not very active at the moment; I’m mostly following a few friends and seeing what people use Twitter for. So far it mostly seems to be about:

  • Random stuff posted when bored
  • Complaining about work or, more specifically, tasks that are proving trickier than hoped
  • Drinking
  • Random stuff posted when bored
  • Articles that have caught someone’s eye
  • …or more often, about tweets about articles that have caught someone’s eye
  • Chatty stuff that only makes sense between social peers (and isn’t that one of the main points of something like Twitter?)
  • Random stuff posted when bored
  • Cuddly toys. I think that is a result of low sample size and that Doug Burns is away at a conference. I worry about his sanity sometimes.

Niall Litchfield, Neil Chandler and Doug Burns were right {thanks again for your advice, gents}, there is some nice stuff on there and I’ve already seen some articles and web pages I found interesting via it – but I have also failed to get on with proper work-like stuff I should have been doing as a result.

I also like the chatty extension to real social engagement that Twitter gives but I hold out on my final decision as to whether this makes up for the negative impact it seems to have on real, meeting-in-person socialising.

The interface to Twitter seems a bit, well, rubbish to me. I know, I’ve been on there for all of a week and I am probably missing the Bleedin’ Obvious, but it seems the stuff I see in Timeline, the default view, is just a subset of what the people I follow say. I suspect that it’s got something to do with whether the person a tweet is replying to is on my follow list. To understand half the social stuff you have to go clicking around in people’s full tweet histories and follow the thread back. Surely there is an easier way than this; maybe some connect-by tree-walk SQL could be invoked…

I’ve already dropped one person off my “following” list. I only followed one celebrity and I decided I could live without the random musings of Simon Pegg. I can imagine people get addicted to following several dozen b to z level celebs, maybe it’s like constantly living in some sort of poor quality reality tv show {Personally I tend to avoid all reality TV, I prefer reality. Except that I am forced to watch that dancing thing on BBC by my wife. And like most men who make that sort of defence, I can’t quite explain away why I still watch it if she is away…}.

So, don’t expect too much in the way of interesting, witty, insightful or even existing tweets from me as yet, but if you want to follow, heck you can always drop me like a sack of manure any time you like :-) .

Little Things Doth Crabby Make – Part XVII. I See xfs_mkfile(8) Making Fragmented Files.

BLOG UPDATE 21-NOV-2011: The comment thread for this post is extremely relevant.

 

I recently had an “exchange of ideas” with an individual. It was this individual’s assertion that modern systems exhibit memory latencies measured in microseconds.

Since I haven’t worked on a system with microsecond memory since late in the last millennium, I sort of let the conversation languish.

The topic of systems speeds and feeds was fresh on my mind from that conversation when I encountered something that motivated me to produce this installment in the Little Things Doth Crabby Make series.

This installment in the series has to do with disk scan throughput and file system fragmentation. But what does that have to do with modern systems’ memory latency? Well, I’ll try to explain.

Even though I haven’t had the displeasure of dealing with microsecond memory this century, I do recall that such ancient systems were routinely fed (and swamped) by just a few hundred megabytes per second of disk scan throughput.

I try to keep things like that in perspective when I’m fretting over the loss of 126MB/s, like I was the other day. Especially when the 126MB/s is a paltry 13% degradation on the systems I was analyzing! Modern systems are a modern marvel!

But what does any of that have to do with XFS and fragmentation? Please allow me to explain. I had a bit of testing going where 13% (for 126MB/s) did make me crabby (it’s Little Things Doth Crabby Make after all).

The synopsis of the test, and thus the central topic of this post, was:

  1. Create and initialize a 32GB file whilst the server is otherwise idle
  2. Flush the Linux page cache
  3. Use dd(1) to scan the file with 64KB reads — measure performance
  4. Use xfs_bmap(8) to report on file extent allocation and fragmentation

Step number 1 in the test varied the file creation/initialization method between the following three techniques/tools:

  1. xfs_mkfile(8)
  2. dd(1) with 1GB writes (yes, this works if you have sufficient memory)
  3. dd(1) with 64KB writes

The following screen-scrape shows that the xfs_mkfile(8) case rendered a file that delivered scan performance significantly worse than the two dd(1) cases. The degradation was 13%:

# xfs_mkfile 32g testfile
 # sync;sync;sync;echo "3" > /proc/sys/vm/drop_caches
 # dd if=testfile of=/dev/null bs=64k
 524288+0 records in
 524288+0 records out
 34359738368 bytes (34 GB) copied, 40.8091 seconds, 842 MB/s
 # xfs_bmap -v testfile > frag.xfs_mkfile.out 2>&1
 # rm -f testfile
 # dd if=/dev/zero of=testfile bs=1024M count=32
 32+0 records in
 32+0 records out
 34359738368 bytes (34 GB) copied, 22.1434 seconds, 1.6 GB/s
 # sync;sync;sync;echo "3" > /proc/sys/vm/drop_caches
 # dd if=testfile of=/dev/null bs=64k
 524288+0 records in
 524288+0 records out
 34359738368 bytes (34 GB) copied, 35.5057 seconds, 968 MB/s
 # xfs_bmap -v testfile > frag.ddLargeWrites.out 2>&1
 # rm testfile
 # df -h .
 Filesystem Size Used Avail Use% Mounted on
 /dev/sdb 2.7T 373G 2.4T 14% /data1
 # dd if=/dev/zero of=testfile bs=1M count=32678
 32678+0 records in
 32678+0 records out
 34265366528 bytes (34 GB) copied, 21.6339 seconds, 1.6 GB/s
 # sync;sync;sync;echo "3" > /proc/sys/vm/drop_caches
 # dd if=testfile of=/dev/null bs=64k
 522848+0 records in
 522848+0 records out
 34265366528 bytes (34 GB) copied, 35.3932 seconds, 968 MB/s
 # xfs_bmap -v testfile > frag.ddSmallWrites.out 2>&1

I was surprised by the xfs_mkfile(8) case. Let’s take a look at the xfs_bmap(8) output.

First, the two maps from the dd(1) files:

# cat frag.ddSmallWrites.out
 testfile:
 EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
 0: [0..9961471]: 1245119816..1255081287 6 (166187576..176149047) 9961472
 1: [9961472..26705919]: 1342791800..1359536247 7 (84037520..100781967) 16744448
 2: [26705920..43450367]: 1480316192..1497060639 8 (41739872..58484319) 16744448
 3: [43450368..66924543]: 1509826928..1533301103 8 (71250608..94724783) 23474176
 #
 # cat frag.ddLargeWrites.out
 testfile:
 EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
 0: [0..9928703]: 1245119816..1255048519 6 (166187576..176116279) 9928704
 1: [9928704..26673151]: 1342791800..1359536247 7 (84037520..100781967) 16744448
 2: [26673152..43417599]: 1480316192..1497060639 8 (41739872..58484319) 16744448
 3: [43417600..67108863]: 1509826928..1533518191 8 (71250608..94941871) 23691264

The mapping of file offsets to extents is quite close in the two dd(1) cases. Moreover, XFS gave me 4 extents for my 32GB file. I like that… but…

So what about the xfs_mkfile(8) case? Well, not so good.

I’ll post a blog update when I figure out more about what’s going on. In the meantime, I’ll just paste it and that will be the end of this post for the time being:

# cat frag.xfs_mkfile.out
testfile:
 EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
 0: [0..10239]: 719289592..719299831 4 (1432..11671) 10240
 1: [10240..14335]: 719300664..719304759 4 (12504..16599) 4096
 2: [14336..46591]: 719329072..719361327 4 (40912..73167) 32256
 3: [46592..78847]: 719361840..719394095 4 (73680..105935) 32256
 4: [78848..111103]: 719394608..719426863 4 (106448..138703) 32256
 5: [111104..143359]: 719427376..719459631 4 (139216..171471) 32256
 6: [143360..175615]: 719460144..719492399 4 (171984..204239) 32256
 7: [175616..207871]: 719492912..719525167 4 (204752..237007) 32256
 8: [207872..240127]: 719525680..719557935 4 (237520..269775) 32256
 [...3,964 lines deleted...]
 3972: [51041280..51073535]: 1115787376..1115819631 6 (36855136..36887391) 32256
 3973: [51073536..51083775]: 1115842464..1115852703 6 (36910224..36920463) 10240
 3974: [51083776..51116031]: 1115852912..1115885167 6 (36920672..36952927) 32256
 3975: [51116032..54897663]: 1142259368..1146040999 6 (63327128..67108759) 3781632
 3976: [54897664..55078911]: 1146077440..1146258687 6 (67145200..67326447) 181248
 3977: [55078912..56094207]: 1195607400..1196622695 6 (116675160..117690455) 1015296
 3978: [56094208..67108863]: 1245119816..1256134471 6 (166187576..177202231) 11014656
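Until I do, a possible workaround if you need a contiguous test file is to preallocate the space up front instead of writing through xfs_mkfile(8). A minimal sketch, assuming xfs_io(8) is installed (this is my suggestion, not something verified in this post):

# preallocate 32GB in as few extents as XFS will give us, then inspect the map
# xfs_io -f -c "falloc 0 32g" testfile
# xfs_bmap -v testfile | head

If falloc produces a similarly fragmented map, the problem lies more in the allocator state of the file system than in xfs_mkfile(8) itself.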

Filed under: oracle

Procedural SQL*Plus and Password Encryption

One small but bothersome issue I’ve had for 20 years is how to drive a program like SQL*Plus with a shell script to make it procedural. One approach is to run each SQL statement through its own SQL*Plus invocation, causing a connection and exit for every statement. Connecting and disconnecting is costly. Ideally, I just want to open up a connection, send commands to the connection and get the responses back. Of course, languages like Java, Perl and Python all have Oracle connection APIs, but what if I just want a simple shell script and don’t have access to Perl or Python with the Oracle APIs, or access to Java? Can’t I just do it in shell? Can’t I just connect SQL*Plus to a named pipe and echo my commands into the pipe?

Well, yes, but there is an annoying obstacle. After echoing the first command into the pipe, which SQL*Plus dutifully executes, SQL*Plus exits. I first ran into this problem about 20 years ago and didn’t solve it at the time. A few years later, I figured out a trick to make it work and have been using it ever since. The trick is to have a second process run a “tail -f” of an empty file into the pipe. With this second process tailing, SQL*Plus doesn’t exit when reading from the pipe.

Since I first started using this I’ve never looked into exactly why it works. I think that when SQL*Plus tries to read from the pipe after the first command has been sent, the OS says “no more data”, and SQL*Plus exits. With the second process doing the “tail -f”, the OS instead tells SQL*Plus “waiting for more data to send you”, and SQL*Plus waits. I would love a more detailed explanation.

#!/bin/ksh
SID=orcl
PORT=1521
function usage
{
       echo "Usage: $(basename $0)    [sid] [port]"
       echo "  username        database username"
       echo "  username        database password"
       echo "  host            hostname or IP address"
       echo "  sid             optional database sid (default: orcl)"
       echo "  port            optional database port (default: 1521)"
       exit 2
}
[[ $# -lt 3 ]] && usage
[[ $# -gt 5 ]] && usage
[[ $# -gt 0 ]] && UN=$1
[[ $# -gt 1 ]] && PW=$2
[[ $# -gt 2 ]] && HOST=$3
[[ $# -gt 3 ]] && SID=$4
[[ $# -gt 4 ]] && PORT=$5
  MKNOD=/etc/mknod
  DEBUG=0
  OPEN=sqlplus.open
  PIPE=sqlplus.pipe
  CLEAN=sqlplus.clean
  sh ./$CLEAN > /dev/null 2>&1
  echo "" > $CLEAN
  echo "rm $OPEN" >> $CLEAN
  echo "rm $PIPE" >> $CLEAN
  rm $OPEN $PIPE > /dev/null 2>&1
  touch  $OPEN
  cmd="$MKNOD $PIPE p"
  eval $cmd
  tail -f $OPEN >> $PIPE &
  OPENID="$!"
  echo "kill -9 $OPENID" >> $CLEAN
  # run SQLPLUS silent unless DEBUG is 2 or higher
  SILENT=""
  if [ $DEBUG -lt 2 ]; then
      SILENT="-s"
  fi
  CONNECT="$UN/$PW@(DESCRIPTION= \
                     (ADDRESS_LIST=             \
                         (ADDRESS=              \
                             (PROTOCOL=TCP)     \
                             (HOST=$HOST)       \
                             (PORT=$PORT)))     \
                      (CONNECT_DATA=            \
                             (SERVER=DEDICATED) \
                             (SID=$SID)))"
   cmd="sqlplus $SILENT \"$CONNECT\" < $PIPE > /dev/null &"
   cmd="sqlplus $SILENT \"$CONNECT\" < $PIPE  &"
   echo "$cmd"
   echo "PIPE $PIPE"
   eval $cmd
   SQLID="$!"
   echo "kill -9 $SQLID" >> $CLEAN

Example execution

$ ./sqlplus_pipe.sh  scott tiger 192.168.1.2
PIPE sqlplus.pipe
$ echo 'select * from dual;' > sqlplus.pipe
D
-
X
$ echo 'exit' > sqlplus.pipe
$ sh sqlplus.clean

The above code will create a SQL*Plus connection reading its input from a pipe. It also creates a cleanup file to remove the pipe file and kill the tail process. If you create multiple connections, the file names will have to be prefixed or suffixed with some unique string to keep them separate; a timestamp would do, or some information about which target each connection serves.
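For instance, a minimal naming sketch (my own illustration, not part of the original script; the variable names are hypothetical):

  # suffix every work file with a timestamp plus this shell's PID
  SUFFIX="$(date +%Y%m%d%H%M%S).$$"
  OPEN=sqlplus.open.$SUFFIX
  PIPE=sqlplus.pipe.$SUFFIX
  CLEAN=sqlplus.clean.$SUFFIX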

Now what does this have to do with password encryption?
Well, there are packages that handle password encryption. Oracle has a cool thing called wallets, which can be set up so that SQL*Plus can connect without a password.
Oracle’s Doc
http://download.oracle.com/docs/cd/B19306_01/network.102/b14266/cnctslsh.htm
Here is a quick setup:
http://www.oracle-base.com/articles/10g/SecureExternalPasswordStore_10gR2.php
Unfortunately, Oracle’s wallet method requires files that can’t be redistributed. These file dependencies legally have to be installed by the end user, whereas Oracle Instant Client can be redistributed. So what if I’m redistributing a tool that uses Instant Client? Then the wallet solution is out of the question, at least for easy installs.
Now what if I create my own binary to handle password encryption, like “Oracle Password Repository”?
http://opr.sourceforge.net/
This doesn’t help that much for protection, because a user can actually get the text password. What I want is a way to encrypt passwords and hide the unencrypted passwords from the user. Sure, once users connect they have access to the database, but I want to prevent them from walking off with a file full of clear-text passwords. One solution, like the Oracle wallet with its “cwallet.sso” files, is to have a binary that creates the connections for SQL*Plus over a pipe and then hands the pipe to the users. It’s a bit of a hack, especially for interactive users, but for scripts that run SQL*Plus it not only centralizes the passwords, it encrypts them and helps prevent users from getting access to, and walking away with, a set of clear-text passwords.
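To make that concrete, here is a rough sketch of such a broker, reusing the pipe trick from above. This is entirely my own illustration; decrypt_password is an imaginary helper standing in for whatever encryption tool you use:

#!/bin/ksh
# hypothetical broker: this script and passwords.enc are owned by a service
# account; ordinary users can write SQL to the pipe but never see the password
PIPE=/tmp/sqlplus.pipe.$$
OPEN=/tmp/sqlplus.open.$$
touch $OPEN
/etc/mknod $PIPE p
chmod 622 $PIPE                                  # users may write, not read
PW=$(decrypt_password /etc/passwords.enc scott)  # imaginary decryption helper
tail -f $OPEN >> $PIPE &
sqlplus -s "scott/$PW@orcl" < $PIPE &
echo "send SQL to $PIPE"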

NOTE:

Regarding the beginning of this blog post and the problem of SQL*Plus exiting after receiving the first command via a named pipe: here is what truss output looks like when echoing ‘select * from dual;’ into the pipe, with and without a second process tailing (nothing) into the pipe. In the first example, a second process is doing a “tail -f” of an empty file into the pipe while ‘select * from dual;’ is echoed into it; SQL*Plus reads the command, executes it and stays connected:

fstat64(1, 0x08044440)                          = 0
write(1, "\n", 1)                               = 1
write(1, " D\n", 2)                             = 2
write(1, " -\n", 2)                             = 2
write(1, " X\n", 2)                             = 2
write(1, "\n", 1)                               = 1
read(0, 0x0813FAD4, 5120)       (sleeping...)

In the second example there is no “tail -f”; we just echo ‘select * from dual;’ into the pipe, which SQL*Plus executes before exiting:

write(1, "\n", 1)                               = 1
write(1, " D\n", 2)                             = 2
write(1, " -\n", 2)                             = 2
write(1, " X\n", 2)                             = 2
write(1, "\n", 1)                               = 1
write(1, " S Q L >  ", 5)                       = 5
read(0, 0x0813F714, 5120)                       = 0
write(4, "\0\r\0\006\0\0\0\0\003\t".., 13)      = 13
read(4, "\011\0\006\0\0\0\0\0\t01".., 2064)     = 17
write(1, " D i s c o n n e c t e d".., 95)      = 95
write(1, " W i t h   t h e   P a r".., 78)      = 78
write(4, "\0\n\0\006\0\0\0\0 @", 10)            = 10
close(4)

Here is a good explanation: http://linux.die.net/man/7/pipe

If all file descriptors referring to the write end of a pipe have been closed, then an attempt to read(2)
from the pipe will see end-of-file (read(2) will return 0).

The part that isn’t explained, for me, is that a reader will wait until the write end has been opened. So the EOF doesn’t happen until there has been an open and a close, and all open write file descriptors have been closed; adding a never-finishing writer therefore keeps the reader from ever seeing an EOF.
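You can watch this behaviour without SQL*Plus at all. A minimal sketch, assuming bash (or ksh) and mkfifo:

mkfifo p

# no persistent writer: the reader sees EOF as soon as echo closes the pipe
cat p &                     # reader blocks until a writer opens the FIFO
echo one > p                # open, write, close: cat prints "one", then exits on EOF
wait

# with a never-finishing writer the reader keeps waiting, just like SQL*Plus
cat p &                     # fresh reader
CATPID=$!
tail -f /dev/null > p &     # holds the write end open without ever writing
TAILPID=$!
echo two > p                # cat prints "two" and keeps reading; no EOF
kill $TAILPID               # last writer gone, so cat finally sees EOF
wait $CATPID
rm p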

Here is the code for OpenSolaris (thanks to Erik Schrock for the links)

 

16K CR gets for UQ index scan? Easy. Kind of.

I had planned to write about this for quite some time, but failed to do so. Sorry about that. Today I finally got the time and the desire to describe the situation in the title. It was observed on Oracle 9.2.0.8 running on Solaris SPARC; it manifested itself as severe CPU burning at 100% utilization, with ‘latch free’ in first place in the Top 5 wait events of the Statspack report.

Here is part of a 1 hour report:

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:            -72,280.51            -18,601.83
              Logical reads:            411,881.06            106,000.09
              Block changes:              2,515.53                647.39
             Physical reads:                665.17                171.18
            Physical writes:                218.18                 56.15
                 User calls:                394.87                101.62
                     Parses:                 79.46                 20.45
                Hard parses:                  0.94                  0.24
                      Sorts:                885.81                227.97
                     Logons:                  0.05                  0.01
                   Executes:              1,160.92                298.77
               Transactions:                  3.89

  % Blocks changed per Read:    0.61    Recursive Call %:     75.88
 Rollback per transaction %:    5.17       Rows per Sort:      5.95

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:   99.98       Redo NoWait %:    100.00
            Buffer  Hit   %:   99.84    In-memory Sort %:    100.00
            Library Hit   %:   99.93        Soft Parse %:     98.82
         Execute to Parse %:   93.16         Latch Hit %:     89.32
Parse CPU to Parse Elapsd %:   84.20     % Non-Parse CPU:     99.57

 Shared Pool Statistics        Begin   End
                               ------  ------
             Memory Usage %:   90.41   91.42
    % SQL with executions>1:   31.20   35.28
  % Memory for SQL w/exec>1:   30.16   30.05

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
latch free                                      2,529,903      49,710    48.22
CPU time                                                       31,694    30.75
db file sequential read                         1,228,775      13,164    12.77
buffer busy waits                                 269,229       3,076     2.98
enqueue                                             2,276       2,567     2.49 

And this is Latch sleep breakdown:

                                      Get                            Spin &
Latch Name                       Requests      Misses      Sleeps Sleeps 1->4
-------------------------- -------------- ----------- ----------- ------------
cache buffers chains        4,190,999,464 ###########   2,524,798 467719813/20
                                                                  75503/172064
                                                                  /31324/0
library cache                  27,510,901      44,028         931 43116/895/15
                                                                  /2/0
row cache enqueue latch        10,692,790      27,738          16 27722/16/0/0
                                                                  /0
cache buffer handles            6,478,263       7,240          66 7175/64/1/0/
                                                                  0
shared pool                     2,657,115       5,706         412 5300/400/6/0
                                                                  /0
...

As you can see, ‘latch free’ waits are almost entirely accounted to the infamous ‘cache buffers chains’ latch gets. You can easily find information on what this latch is, why contention on it could happen, and how to diagnose it. Here I’ll just show you what I found by accident (just staring at the report). I noticed that two queries – #2 and #3 in the SQL Ordered by Gets section – which are nothing but unique index accesses to a table, had ~16K gets per execution:

                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
    333,759,439       21,538       15,496.3   22.4  7028.63  24546.46 4091048759
Module: JDBC Thin Client
SELECT 1 FROM T WHERE ID = :B1 FOR UPDATE

    259,164,764       16,137       16,060.3   17.4  5391.01  18934.74  307775861
Module: JDBC Thin Client
select ID from T where ID=:1

How could that possibly happen? At first I thought the index used by the queries was missing, but that was not the case. My second thought was that the index had exploded due to modifications (I found one query that had probably inserted lots of rows into the table, although rows_processed for the statement was zero) and its height had increased, causing other sessions to do much more work when accessing both index and table blocks. That was just an idea which I didn’t know how to confirm or dismiss, so I asked some wise guys, and they were kind enough to point out to me that an index height increase is not necessary for a spike in CR gets for concurrent SQL access. All that is needed is an uncommitted transaction inserting rows into the table somewhere near the normally accessed data. That sounded reasonable to me and I was OK with this explanation. I did confirm that the long-running INSERT statement was indeed trying to add millions of rows to the table due to an application bug (a missing condition), but it failed to do so because of a transaction timeout on the WebLogic side set to half an hour. So the statement was actually rolled back, and that’s why rows_processed for the query was zero.
But when I tried to reproduce this situation to prepare a test case for this blog entry, I was not able to. A simple indexed access to a table with an in-flight transaction inserting a huge amount of data is definitely not enough to reproduce it. So it must have been one of those scenarios in which a (table or index) block has been updated many times, and subsequent transactions have to apply undo to reverse the changes made to the block by other transactions. It’s very easy to set up, so I won’t walk through it here, though a rough sketch appears after the table below. Also, I think it’s impossible to identify which block caused the issue with just a standard Statspack report in hand. Anyway, here is a statistics comparison showing that undo access was the cause of the increased LIO and ‘cache buffers chains’ latch contention:

Statname                                      Normal             R1             R2
----------------------------------------- ------------ -------------- --------------
branch node splits                                  29            329            421
consistent changes                          44,192,320    779,500,412  1,269,485,409
  (number of times undo was applied to construct a consistent block version)
consistent gets                            209,640,205  1,035,154,852  1,466,065,189
consistent gets - examination              142,031,460    889,574,780  1,366,583,799
  (number of times a block, such as undo or index root, was accessed under
   protection of the ‘cache buffers chains’ latch)
data blocks consistent reads -
  undo records applied                      43,973,030    779,361,353  1,269,187,599
  (number of undo records applied while constructing a CR block copy)
leaf node splits                                 6,200         90,367        113,873
transaction tables consistent read -
  undo records applied                           1,824         80,503          4,863
user calls                                   6,526,934      2,277,792      1,429,810
Filed under: Oracle, Performance Tagged: indexes, read consistency

Simplified GNS setup for RAC 11.2.0.2 and newer

One of the main problems I have seen with GNS (Grid Naming Service) installations was that you couldn’t really see whether your DNS and DHCP configuration was correct until it was too late. This has been addressed, but the fact is little known. There are a number of checks you can run before starting Oracle Universal Installer, and this post is about them.

What is the Grid Naming Service?

I was initially drawn towards GNS when it was first released with 11.2.0.1. It is aimed at environments where the Oracle DBAs take on (yet another) piece of work, namely DNS administration. By virtue of “subdomain delegation”, the master DNS server responsible for “example.com” hands off requests for a subdomain, rac.example.com in this example, to an Oracle-managed process. This was quite poorly documented initially, prompting me to figure it out myself in an earlier post: http://martincarstenbach.wordpress.com/2009/10/02/build-your-own-11-2-ra...

The problem with GNS in 11.2.0.1 was that you couldn’t really test whether the DNS setup was sufficient for Oracle Universal Installer to work, and I had a few attempts at the installation (the discussion here takes into account that I might not have been able to perform sufficient checking!).

Implementing DNS and DHCP

As a quick reminder, here is the subdomain delegation bit you add to your forward resolution zone file (with bind as the example):

$ORIGIN rac.localdomain.
@       IN      NS      gns.rac.localdomain.
gns.rac.localdomain.    IN      A       192.168.99.150

Here, I have a domain called “localdomain” for which 192.168.99.10 is the primary name server. Anything for “*.rac.localdomain” will be handed off to gns.rac.localdomain with IP address 192.168.99.150. This IP address will later be defined as the GNS VIP, and it seems to be the only address that must be registered in DNS. This is one of the biggest changes: with “manual” name resolution you’d add the SCAN, VIP and public names for all cluster nodes to DNS before installation.

DHCP is very simple to set up, and the example in the previously mentioned post is perfectly sufficient to get started; a minimal sketch follows below.
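Something along these lines, assuming the ISC dhcpd and a made-up address range (my sketch, not the author’s actual configuration):

subnet 192.168.99.0 netmask 255.255.255.0 {
        # pool the GNS-managed node VIPs and SCAN VIPs are leased from
        range 192.168.99.100 192.168.99.140;
        option subnet-mask 255.255.255.0;
        option routers 192.168.99.1;
        default-lease-time 86400;
        max-lease-time 86400;
}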

Implementing on the host

Below is a sample “/etc/resolv.conf” I used; ensure that your /etc/nsswitch.conf sets “hosts” to files and then DNS, which is the documented way.

options attempts:2
options timeout:1

search rac.localdomain localdomain
nameserver 192.168.99.150
nameserver 192.168.99.10

The “attempts” and “timeout” options were new to my builds, and the search order must include the GNS domain before lookups bubble up to the corporate domain. This doesn’t have to be a top-level domain, by the way; I just kept it simple.

The hostname, as defined in /etc/hosts (and NOT in DNS, as I said), has to be in the DNS subdomain. I opted for rac11203gnsnode1.rac.localdomain, with IP address 192.168.99.34. Nothing else is defined in /etc/hosts; optionally you could define the private interconnect addresses in there as well.
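For illustration, the relevant /etc/hosts entry would look something like this (a reconstruction from the names above, not copied from the system):

127.0.0.1       localhost.localdomain localhost
192.168.99.34   rac11203gnsnode1.rac.localdomain rac11203gnsnode1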

Testing and validating the setup

The cluvfy utility has been enhanced so you can test GNS before going through the installation process only to find out it all failed. Before you install, you invoke “runcluvfy comp gns -precrsinst” as in this example:

[oracle@rac11203gnsnode1 ~]$ runcluvfy comp gns -precrsinst -domain rac.localdomain -vip 192.168.99.150 -verbose -n rac11203gnsnode1

Verifying GNS integrity

Checking GNS integrity...
Checking if the GNS subdomain name is valid...
The GNS subdomain name "rac.localdomain" is a valid domain name
Checking if the GNS VIP is a valid address...
GNS VIP "192.168.99.150" resolves to a valid IP address
Checking the status of GNS VIP...

GNS integrity check passed

Verification of GNS integrity was successful.

The command is fairly self-explanatory: pass the subdomain you want GNS to manage and the GNS VIP, and off it goes. With my settings in the zone file (the reverse zone file doesn’t have any entries) it completed successfully.

The GNS integrity can also be verified post installation as in this example:

[oracle@rac11203gnsnode1 ~]$ cluvfy comp gns -postcrsinst -verbose

Verifying GNS integrity

Checking GNS integrity...
Checking if the GNS subdomain name is valid...
The GNS subdomain name "rac.localdomain" is a valid domain name
Checking if the GNS VIP belongs to same subnet as the public network...
Public network subnets "192.168.99.0" match with the GNS VIP "192.168.99.0"
Checking if the GNS VIP is a valid address...
GNS VIP "192.168.99.150" resolves to a valid IP address
Checking the status of GNS VIP...
Checking if FDQN names for domain "rac.localdomain" are reachable

GNS resolved IP addresses are reachable

GNS resolved IP addresses are reachable

GNS resolved IP addresses are reachable
Checking status of GNS resource...
Node          Running?                  Enabled?
------------  ------------------------  ------------------------
rac11203gnsnode1  yes                       yes

GNS resource configuration check passed
Checking status of GNS VIP resource...
Node          Running?                  Enabled?
------------  ------------------------  ------------------------
rac11203gnsnode1  yes                       yes

GNS VIP resource configuration check passed.

GNS integrity check passed

Verification of GNS integrity was successful.
[oracle@rac11203gnsnode1 ~]$

Post installation

The srvctl utility has a few more options as well to query the GNS state. The most comprehensive one is the “-a” flag:

[oracle@rac11203gnsnode1 ~]$ srvctl config gns -a
GNS is enabled.
GNS is listening for DNS server requests on port 53
GNS is using port 5353 to connect to mDNS
GNS status: OK
Domain served by GNS: rac.localdomain
GNS version: 11.2.0.3.0
GNS VIP network: ora.net1.network

There are more granular options returning version, status and so on. The most useful of these seems to be “-l”, listing all allocated IPs (note that 192.168.99.34 is NOT supplied by GNS, but defined in the hosts file). Since we only connect via VIP or SCAN, this is not a problem.

For my 1 node experimental systems, these IP addresses were used:

[oracle@rac11203gnsnode1 ~]$ srvctl config gns -l
Name                 Type Value
gnsclu-scan1-vip     A    192.168.99.36
gnsclu-scan2-vip     A    192.168.99.37
gnsclu-scan3-vip     A    192.168.99.38
rac11203gnsnode1-vip A    192.168.99.35
scan                 A    192.168.99.36
scan                 A    192.168.99.37
scan                 A    192.168.99.38
[oracle@rac11203gnsnode1 ~]$

Finally {}

I found the following crsctl option very useful for quickly setting up a DNS test “server”. Surely not something to be done in production, but interesting nevertheless.

[oracle@rac11203gnsnode1 ~]$ crsctl start testdns -h

Usage:

 crsctl start testdns [-address <IP_address>] [-port <port>] [-domain <domain>] [-once] [-v]
 Start a test DNS listener that listens on the given address at the given port and for specified domain
Where
 IP_address IP address to be used by the listener (defaults to hostname)
 port The port on which the listener will listen. Default value is 53.
 domain The domain query for which to listen. By default, all domain queries are processed.

-once Flag indicating that DNS listener should exit after one DNS query packet is received
 -v Verbose output
[oracle@rac11203gnsnode1 ~]$

You might also consider this interesting DNS query tool:

[oracle@rac11203gnsnode1 ~]$ crsctl query dns -servers
CRS-10018: the following configuration was found on the system:
CRS-10019: There are 2 domains in search order. They are:
rac.localdomain
localdomain
CRS-10022: There are 2 name servers. They are:
192.168.99.150
192.168.99.10
CRS-10020: number of retry attempts for name lookup is: 4
CRS-10021: timeout for each name lookup is: 5
[oracle@rac11203gnsnode1 ~]$

And if you don’t like the manual page for nslookup, you could use this little command instead to interrogate your DNS setup:

[oracle@rac11203gnsnode1 ~]$ crsctl query dns -h
Usage:
 crsctl query dns -servers
 Lists the system configured DNS server, search paths, attempt and timeout values

crsctl query dns -name <name> [-dnsserver <DNS_server_address>] [-port <port>] [-attempts <attempts>] [-timeout <timeout>] [-v]
 Returns a list of addresses returned by DNS lookup of the name with the specified DNS server
Where
 name Fully qualified domain name to lookup
 DNS_server_address Address of the DNS server on which name needs to be looked up
 port Port on which DNS server is listening
 attempts Number of retry attempts
 timeout Timeout in seconds
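For example, to resolve one of the GNS-managed names directly against the GNS VIP, something like this should work (an illustrative invocation of mine based on the usage above, not output captured from the cluster):

[oracle@rac11203gnsnode1 ~]$ crsctl query dns -name rac11203gnsnode1-vip.rac.localdomain -dnsserver 192.168.99.150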

Summary

The handling of GNS has improved greatly, and a lot more information is available about what’s happening under the covers. Unknown to many, the main Clusterware tools undergo revisions all the time, so it’s worth running crsctl commands on the various objects available with the “-h” flag to stay up to date.

Spam Poetry

Here is a spam comment that I thought deserved publishing in its entirety but without the spam links. I hope you all enjoy the lyricism as much as I did. Incomparable Blog. I tot up this Blog to my bookmarks.Thanks for alluring the in the nick of time b soon to examine this, I lean [...]

New Defaults, Old Side Effects

When 11.2 came out I posted about deferred segment creation at http://orawin.info/blog/2010/04/25/new-features-new-defaults-new-side-effects/ and a couple of odd side effects. Oracle published Note 1050193.1, which makes the quite extraordinary claim that sequences are not guaranteed to generate all consecutive values starting with the ‘START WITH’ value. It’s absolutely true that sequences don’t guarantee no gaps – but [...]

Skipped Initial Sequence Value on Oracle 11.2

I’m seeing an oddity with newly created sequences skipping the initial value, i.e. I create the sequence and the first use of it returns the value 2, not 1.

{Update: see the post comments and this post by Niall – this is a side effect of deferred segment creation. The first insert silently fails, but gets the sequence value, and is repeated behind the scenes once the segment is there. It is fixed in 11.2.0.3; my db is at 11.2.0.1 as I got sick of the download of the latest version failing on my poor broadband connection and got it off a mate – turns out it is not the latest version.
Thanks for the enlightenment, Niall.}
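If you are stuck on a version with the bug, a sketch of a way to sidestep it (my suggestion, using the standard 11.2 options, untested against this exact case) is to force the segment to exist before the first insert:

-- per table, at creation time:
create table test_fn
(forn_id number(5) not null
,forname varchar2(30) not null
,sex_ind char(1) not null)
segment creation immediate
tablespace users;

-- or instance-wide:
alter system set deferred_segment_creation = false scope=both;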

This is on Oracle 11.2.0.1 on Windows (XP, Vista and now Windows 7 64 bit). I am sure I did not see this on Oracle 10.1 and 10.2 on linux, Tru64 and windows.

I create a set of test tables for when I run courses or just want to test some things: a few tables holding people, names, addresses, phone calls and monthly call summaries. I do this with a bunch of pretty crude scripts that create the data in steps. Only on Oracle 11.2 did it all stop working. I tracked the problem down to the skipping of the first value for the first sequence I create and use. I’ve just confirmed this on a new server I have created.

I’ve boiled it down to the following test case. I ensure my test table and two sequences are not there, create them and then insert 10 records for men and 10 for women. For the women the records are numbered 1 to 10. For the men they are numbered 2 to 11!!!

My code:

-- seq_oddity.sql
-- BUG ON 11.2.0.1 ??
-- though I create both sequences afresh, the first use of seq_m is getting 2, yes two!
-- cannot work out why, so fixed by setting the row with forn_id of 130 to 1.
set timi off
spool seq_oddity.lst
--
drop table test_fn purge;
drop sequence seq_m;
drop sequence seq_f;
create table test_fn
(forn_id number(5) not null
,forname varchar2(30) not null
,sex_ind char(1) not null)
tablespace users
/
create sequence seq_m;
create sequence seq_f;
insert into test_fn values (seq_m.nextval,'ALAN','M');
exec dbms_output.put_line ('I have just created male name number '||seq_m.currval);
insert into test_fn values (seq_m.nextval,'BARRY','M');
insert into test_fn values (seq_m.nextval,'CHRIS','M');
insert into test_fn values (seq_m.nextval,'DAVID','M');
insert into test_fn values (seq_m.nextval,'EDWARD','M');
insert into test_fn values (seq_m.nextval,'JANG','M');
insert into test_fn values (seq_m.nextval,'GARY','M');
insert into test_fn values (seq_m.nextval,'HARRY','M');
insert into test_fn values (seq_m.nextval,'IAN','M');
insert into test_fn values (seq_m.nextval,'JAMES','M');
exec dbms_output.put_line ('I created 10 men and last value was '||seq_m.currval);
--
--
--
insert into test_fn values (seq_f.nextval,'ALISON','F');
exec dbms_output.put_line ('I have just created female name number '||seq_f.currval);
insert into test_fn values (seq_f.nextval,'BARBARA','F');
insert into test_fn values (seq_f.nextval,'CHERYL','F');
insert into test_fn values (seq_f.nextval,'DAWN','F');
insert into test_fn values (seq_f.nextval,'ELAINE','F');
insert into test_fn values (seq_f.nextval,'FRANCIS','F');
insert into test_fn values (seq_f.nextval,'GILLIAN','F');
insert into test_fn values (seq_f.nextval,'CHERRY','F');
insert into test_fn values (seq_f.nextval,'INGRID','F');
insert into test_fn values (seq_f.nextval,'JANET','F');
exec dbms_output.put_line ('I created 10 women and last value was '||seq_f.currval);
--
select sex_ind,min(forn_id),max(forn_id)
from test_fn
group by sex_ind
/
--
spool off

The results are:

mdw11> @seq_oddity
drop table test_fn purge
           *
ERROR at line 1:
ORA-00942: table or view does not exist

drop sequence seq_m
              *
ERROR at line 1:
ORA-02289: sequence does not exist

drop sequence seq_f
              *
ERROR at line 1:
ORA-02289: sequence does not exist

Table created.

Sequence created.

Sequence created.

1 row created.

I have just created male name number 2

1 row created.
1 row created.
...
1 row created.

I created 10 men and last value was 11


1 row created.

I have just created female name number 1

1 row created.
1 row created.
...
1 row created.
I created 10 women and last value was 10


S MIN(FORN_ID) MAX(FORN_ID)
- ------------ ------------
M            2           11
F            1           10
2 rows selected.

mdw11>

I suppose I should trace this and check for known oracle bugs but I wondered if anyone else had seen it.

BGOUG: I’m on my way…

I’m at the airport waiting for my flight to Sofia (via Munich), on my way to the BGOUG conference in Hissarya.

It’s early and I’m totally knackered. I had a killer tabata-style circuits class last night and a nightmare tabata-style kettlebell class the night before. I still have bruises from the latter. If only I could stop eating rubbish I would look quite buff. :)

This is my last overseas event of the year and I’m glad it is the BGOUG event. Anything else and I think I would be looking for an excuse not to go. It will be great to get back amongst my Bulgarian buddies again.

Cheers

Tim…




Mike Carey: Dead Men’s Boots…

Number 3 in the Felix Castor series from Mike Carey, Dead Men’s Boots is about ghosts of criminals possessing humans to effectively allow them to live forever. I got about two-thirds of the way through this book, then left it on a plane and was stuck for a couple of weeks. As soon as I got home I bought another copy and continued.

I am now totally invested in Felix Castor. I’ve got two more books to read (and another due out at the end of the year) and I’m already starting to get separation anxiety at the thought of finishing the series. This is exactly how I felt about Harry Dresden when I was reading The Dresden Files.

Cheers

Tim…