It seems I am doing a lot of fixing broken stuff recently. So this time I have been asked to repair a broken 8 node RAC cluster on OEL 5.5 with Oracle RAC 184.108.40.206. The system has been moved into a different, more secure network, and its firewalls prevented all access to the machines except for ILO. Another way of “security through obscurity”. The new network didn’t allow any clients to connect to any of the 8 node RAC which means that it is actually quite expensive kit to sit idle. The cluster is not in production, it’s still being build to specification but this accessibility problem has been a holdup to the project for a little while now. Yesterday has been a breakthrough-the netops team found an error to their configuration and for the first time the hosts could be accessed via ssh. Unfortunately for me that access is possible via audited gateways using PowerBroker to which I don’t have access.An alternative was the ILO interface which has not yet been hardened to production standards. So after some discussion internally I was given the ILO access credentials. This is good and bad: good, because it was a thoroughly broken system, and bad because there is no copy and paste with a java based console. And if that wasn’t bad enough, I had to contend myself with 80×24 characters on the console (however in very big letters). I pretty much needed all of my 24″ screen to display it. But I digress.
When logging on, I found the following situation:
Running /sbin/ifconfig has been a dream on this machine – I saw all 3 SCAN IPs on it, and all 8 node virtual IP addresses. Plus it has 6 NICs for Oracle, bonded into pairs of 2. And this is exactly where the confusion starts. I found the following bonded interfaces defined:
It took a while to figure out why these interfaces were named as they were, but apparently the suffix is a VLAN name. It also filtered through that one of my colleagues has tried to replace the previously used bond0.212 with bond0 as the public interconnect. He was however not successful in doing so, leaving the cluster in the state it was in.
He used the following commands to update the public interface:
$ oifcfg getif bond1.251 172.xxx.0 global cluster_interconnect bond0 10.2xxx8.0 global public
He also changed the vip configuration, with the result shown here:
srvctl config vip -n node11 VIP exists: /node1-vip/10.2xx8.13/10.2xx8.0/255.255.255.0/bond0, hosting node node11
The VIP however remained unimpressed:
srvctl start vip -n node1 PRCR-1079 : Failed to start resource ora.node1.vip CRS-2674: Start of 'ora.net1.network' on 'node1' failed CRS-2632: There are no more servers to try to place resource 'ora.node1.vip' on that would satisfy its placement policy
That’s where I have been asked to cast a keen eye over the installation.
First of all I could find nothing wrong with what has been done so far. So starting my investigation I first thought there was something wrong with the public network so I decided to shut it down:
# ifdown bond0
I then checked the network configuration of /etc/sysconfig/network-scripts. The setting is shown here:
device=bond0 bonding_opts="use_carrier=0 miimon=0 mode=1 arp_interval=10000 arp_ip_target=10.xxx.4 primary=eth0" bootproto=none onboot=yes network=10.2xxx.0 netmask=255.255.254.0 ipaddr=10.xxx.2 userctl=no
device=eth0 hwaddr=f4:ce:46:87:fa:d0 bootproto=none onboot=yes master=bond0 slave=y userctl=no
device=eth1 hwaddr=f4:ce:46:87:fa:d4 bootproto=none onboot=yes master=bond0 slave=yes userctl=no
The MAC addresses of ifcfg-eth* matched the output from the ifconfig command. In the lab I occasionally have the problem that my configurartion files don’t match the real MAC addresses and therefore my NICs don’t come up. But this wasn’t the case here.
I then checked if the kernel module is loaded correctly. Usually you’d find that in /etc/modprobe.conf but there was not entry. I added these lines as per the documentation:
alias bond0 bonding alias bond1 bonding alias bond1.251 bonding
With that all done I brought the bond0 interface back up (don’t ever try to bring down the private interconnect-it will cause a node eviction!). Still nothing. The output of crsctl status resource -t remained “OFFLINE” for resource ora.net1.network. BTW, you cannot manually start that a network resource using srvctl (it’s an ora.* resource so don’t even think about trying crsctl start resource ora.net1.network :). All you can do with a network resource is to get its configuration (srvctl config network -k 1…) and modify it (srvctl modify network -k 1…)
ORAROOTAGENT is responsible for starting the network, and it will try to do so every second or so. That’s CRSD’s ORAROOTAGENT by the way, the log file is in $GRID_HOME/log/`hostname -s`/agent/crsd/orarootagent_root/orarootagent_root.log.
After the modification to bond0 I could now ping the IP associated with bond0 so at least that was a success. One thing I learned that day is that the MAC address of the bonded NIC matches the primary eth* interface’s NIC, in my case it was that of eth0, i.e. f4:ce:46:87:fa:d0. If one of the enslaved NICs failed it would probably assume the failback NIC’s MAC address. So in summary:
At this point I could see no reason why starting of the network failed. Maybe a typo in the configuration? The network configuration can be queried with 2 commands: oifcfg and servctl config network. So I tried oifcfg first.oifcfg getif returns:
bond0 10.xx.x2.0 "good" bond0 10.xx.x8.0 "old/bad" bind1.251 172.xx.xx.160 interconnect bind1.251 169.254.0.0
Hmmm, where’s that second bond0 interface from? The bond1.251 interface is in use and working, the 172.xxx IP matches the IP address assigned in ifcfg-bon1.251. The second entry for bind1.251 is created by the HAIP resource and has to do with the high available cluster interconnect which uses multicasting for communication (to the frustration of many users who upgraded to 220.127.116.11 only to find out that the lower stack doesn’t start on the second and other nodes).
So to be sure that I was seeing something unusual I compared the output with another node on the cluster. There I found I only have 3 interfaces …. bond0 and bond1 + the UDP multicast address. I initially tried to remove the bad network with oifcfg delif but that didn’t work. I then verified the output of srvctl config network to see if it matched what I expected to. And here was a surprise: the output of the network listed a wrong subnet mask. Instead of 255.255.254.0 (note the “254″!) i found 255.255.255.0. That was easy to fix and while I was back again trying to delete the old network using oifcfg I suddenly realised that the cluster has sprung back into life. Small typo-big consequences! Finally all the resources depending on ora.net1.network were started, including SCAN VIPs, SCAN listeners, listeners, VIPs…
References for NIC bonding on RHEL5
Followers of the blog will know I’ve had a little trouble with Oracle Norway randomly taking money off me for no reason.
Today I got the money refunded, but there was a snag. I was refunded less than the amount that was taken. I’m guessing this has something to do with exchange rates etc. So as it stands I am about £40 out of pocket, which is significantly better than the several thousand pounds I was before.
As you can image, I sent an email off this morning asking for the missing money. Let’s see how quickly that is dealt with.
February 11, 2011 (Updated February 25, 2011) On August 26, 2010, just before cancelling my first three month old order for the “Oracle Tuning: The Definitive Reference Second Edition” book, I wrote the following in a blog article (side note: the second order for the book is now more than three months old, and I have [...]
I’m really not sure what to think about the collaboration between Nokia and Microsoft. Prior to my recent switch to HTC, I’ve always used Nokia phones, so I have a soft spot for the company, but this recent announcement has me in two minds.
My first reaction was this move is a complete disaster for Nokia and a big bonus for Microsoft. Nokia ships a serious amount of phones, so Microsoft will quickly get some impressive numbers, which is great for them, but what do Nokia get out of it? Currently it seems they get a mobile platform that nobody really wants or cares about.
My second reaction was maybe this is the right move for Nokia. Symbian has a lot of the market share at the moment, but it is going down hill very quickly. They need to make a move, but where to go? If they go Android they will be just another manufacturer in the mix. They would be better off than they are now, but could they dominate this market? If they go Windows they could mark themselves out as the dominant force in this market. The other offerings in this space look rather weak. As Windows Phone develops, with Nokia’s help, maybe this could be a very attractive market.
Of course, only time will tell, but I know one thing. As the mobile OS market currently stands, I won’t be buying a Nokia phone running Windows Phone.
A bit before Christmas I got an email from Kevin Closson asking me to take a look at a new undocumented Direct NFS (DNFS) feature in the 18.104.22.168 patch set. I think he wanted to see what a regular DBA would think of it. What with Christmas and some family issues, I didn’t get too much done. As soon as I hit the first hurdle I kinda caved in and left if for the new year.
Well, January came and went, then I finally got round to looking at it again. I like to think my constant questions and dumb mistakes has helped to prepare Oracle for the sort of thing that will happen when other idiots like me are let loose on it. Anyway, the result of that little journey is documented here.
As I’ve said in the article, things are still in a state of flux and I will no doubt have to do some alterations once the My Oracle Support (MOS) Note 1210656.1 is released that will properly document it.
So as a regular DBA what do I think? I think it is awesome!
Old-style cloning of databases isn’t hard, but it’s boring and can take ages depending on the size of the database and storage being used. It’s one of those tasks that always makes me sigh, before I get off my ass and start it. Clonedb turns that on its head because it is really quick and simple. There is a bit of setup, but that is really going to be a one-time thing on most servers. You are doing your backups anyway, so there is no big deal there. Now you can just run a script and bang, you have a running clone.
I think this is going to please a lot of DBAs out there!
Warning – make sure you read to the end of this post.
Someone sent me an email this morning asking how Oracle calculates the index cardinality of an index range scan. Now, as I’ve often said, I don’t do private email about Oracle – unless it just happens to catch my attention and looks like something that is sufficiently generic to be worth publishing.
Today’s emailer was a lucky winner – he’d sent me a very short email that took about 30 seconds to read, contained a significant error, and (at first sight) probably had the right information in it for me to work with. Here’s the problem, as a cut-n-paste from an SQL*Plus session:
SQL> explain plan for select * from admlxa.QRT_BENCH where QRT_BENCH_DATE < :a3; Explained. Elapsed: 00:00:00.01 SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ----------------- Plan hash value: 2896103184 ----------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 424K| 81M| 3170 (4)| 00:00:05 | | 1 | TABLE ACCESS BY INDEX ROWID| QRT_BENCH | 424K| 81M| 3170 (4)| 00:00:05 | |* 2 | INDEX RANGE SCAN | IDX_QRT_BENCH_1 | 78876 | | 303 (5)| 00:00:01 | ----------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("QRT_BENCH_DATE"<:A3)
Look closely at the “Rows” column – there’s clearly a logic error appearing here. If you select 78,876 rowids from an index you can’t possibly acquire 424,000 rows from the table – so where have those two numbers come from ?
The supporting information looked like this:
num_rows for QRT_BENCH table = 8480798 num_distinct for QRT_BENCH_DATE column = 458 num_rows for IDX_QRT_BENCH_1 = 8763975 distinct_keys for IDX_QRT_BENCH_1 = 537
Of course, I really needed to know whether this was a single-column or multi-column index if I wanted to model the problem correctly and do further checks on when the error appeared, but this was good enough to get started. We note, first, that the 424K for the table cardinality is just the standard “5% due to range predicate with unknown value”: round(0.05 * 8480798) = 424040.
Step 2: Does 5.376 look familiar — it may be a coincidence, but isn’t that really close to 1% of the number of distinct keys in the index ?
At this point I don’t have any time to investigate in detail, but a scratch hypothesis is that Oracle is calculating something like: 5 * (number of rows in table / (number of distinct keys in index)); and maybe that magic five appears through a piece of code that takes 5%, but for some reason then divides by the 1% associated with the selectivity normally associated with function(col).
If I had the time (and the data set) I’d start playing with dbms_stats.set_index_stats() et.al. to see the effect of changing num_distinct and num_rows to see if my initial guess was somewhere in the right ballpark. As it is I’ve just emailed this note back to the source.
Before I published this note I got a reply from the original correspondent, with the following comment:
Sure enough – five years ago I had published some details about exactly this “feature” – and that 537 vs. 5.376 really was just a coincidence.
I decided to publish the note anyway for three reasons –
I’ve been ignoring the “social network” phenomenon for a long time but I’ve finally given in and, as some people have already discovered, signed up to LinkedIn. I don’t know whether this is a good, bad, or pointless thing to do, but I thought I’d give it a go for a while and see what happens.
At present, though, I’m only linking to people who fall into one of several restrictive categories: I know them “outside” the internet, or have worked for them, or collaborated with them, or had a length technical exchange with them either by email or in person at a technical conference.
So if you send me a “Join my network” and I don’t do anything about it, please don’t be offended – it just means you don’t belong to a fairly restricted group. (On the other hand, since I’ve averaged about 3 days per site at 40 client sites per year for the last 10 years, it’s possible that you’ve been let down by my appalling memory for names.)
Just had a really pleasent exchange with Oracle support. I was after a way to purge the repository database of an OEM 11.1 Grid Control installation without having to blow it all away. Unfortunately, there is no such option. However, what I liked was this final update from the support member:Generic Note
From sunny Colorado – blue sky and SNOW! – I do wish we could have provided a better option.But I do want to thank you so much for your kindness and patience. You are the best kind of customer to work with. That means a lot, in these challenging jobs.
The whole SR was well and competently managed by Thom, and at no time did he come up with techniques to buy more time by asking for irrelevant log files or similar. I wish more support staff were like him.
I read a post this morning by Grant Ronald talking about fusion apps. In Grant’s post he mentioned things that people have been saying about Fusion over the years. Middleware and Apps are not my specialist field, but I get to hear a lot about them from the conferences and ACE Director meetings, so I have been witness to the Oracle Fusion myth from the beginning.
Cast your mind back several years and the whole concept of Fusion was launched at OOW. We were told that the middleware stack was going to become a single coherent product, rather than the buggy rag-tag bunch of technologies we had in 9iAS and AS10g. Sounds good so far, but then all the existing stuff got rebranded as Fusion Middleware when the products it was made up of hadn’t significantly changed. That’s confusing.
Fast forward a bit and we were expecting something like real Fusion Middleware to appear, then the BEA buyout was announced and WebLogic became the core of Fusion Middleware. Oh. So this wonderful coherent product that Oracle had been developing and we were expecting soon was swapped for a best-of-breed app server from an acquisition. Strange and a little disconcerting, but at least we have a better app server now, except that some of the existing features still required you to install the old AS10g stuff. Still the name Fusion is plastered everywhere.
Fast forward a bit more and we have got to a point where applying the term “Fusion” to the middleware stack is less insulting, but if anyone experienced Fusion along the way they would probably have been left with a bad feeling about what Fusion actually means. It’s very hard to overcome a bad first impression. Are Oracle really surprised that the term “Fusion” is associated with myth and confusion?
OK. That’s the Middleware. What about Fusion Apps? Well, the name includes the word “Fusion”, so it takes on all the bad connotations associated with the infancy of Fusion Middleware. Added to that, since the original announcement of Fusion Apps there have been numerous acquisitions, all of which have no doubt added to the confusion about what Fusion Apps actually is. Then we are told there is no natural upgrade from eBusiness Suite to Fusion Apps. It’s a new product and we have to migrate data to it as we would any new ERP. Next we are told that the initial release will only be a subset of the modules we require, so we will have to run it alongside eBusiness Suite. Wow. This is really confusing. That sounds like a half-finished ERP built on a half-finished middleware stack. Once again, are Oracle really surprised people react like this?
Now I’m not saying the Fusion Middleware is bad. It’s come a long way. I’m also not saying Fusion Apps are bad. I’ve seen the demos and they look amazing. I’ve also talked to people in that field who are genuinely impressed and exited by it. I believe it will be a big eye opener and possibly a game-changer for a lot of people. What I’m saying is I can totally understand when people on the outside of our little goldfish bowl have a really bad and confused impression of anything containing the term “Fusion”, because it does have a very long and sordid history.
In my opinion the term Fusion needs to be scrapped and replaced, then perhaps we can forget the history and focus on the now. Kinda like they did with Beehive.
I’ve always wanted some sort of calendar view of load where I could see patterns across the same days of the week and same hours of the day and then be able to pick different periods and diff them:
The first step of such a display would be selecting the data in such a way to represent the graphic. A graphic should be harder to do than a similar, though less powerful, ascii representation.
So here is a quick sketch of looking at redo generation over time. Right now I have to monitor how well redo can be archived which depends on how it fast it is generated. (typically what I’m more interested in is the general performance which is what the above graphic addressed)
The output below shows the redo generated per hour on average (picking the maximum hour in the bucket which is 4 hours long below) and the maximum generation at the minute level. I may or may not be able to archive redo at the up to the minute but I do want to make sure I can keep up with each hour.
DAY HOUR MAX(MB) w1 w2 w3 w4 w5 SAT 0 9 5 4 4 3 SAT 4 12 4 5 4 4 SAT 8 1 0 0 0 0 SAT 12 0 0 0 0 0 SAT 16 1 0 0 0 0 SAT 20 4 1 0 0 1 SUN 0 10 3 3 1 3 SUN 4 13 6 5 3 5 SUN 8 7 6 6 0 6 SUN 12 7 0 3 0 4 SUN 16 1 0 0 1 0 SUN 20 8 3 2 0 3 MON 0 8 3 2 3 2 MON 4 7 2 3 2 1 MON 8 5 1 0 2 0 MON 12 1 0 0 1 0 MON 16 1 0 0 0 0 MON 20 7 2 2 0 2 TUE 0 14 6 5 7 4 TUE 4 7 1 1 1 2 TUE 8 3 0 0 0 0 TUE 12 1 1 0 0 0 0 TUE 16 1 1 0 0 0 0 TUE 20 3 1 1 1 1 1 WED 0 8 3 2 3 2 2 WED 4 7 2 1 3 2 2 WED 8 8 1 0 0 2 3 WED 12 7 1 0 0 1 1 WED 16 1 1 0 0 0 1 WED 20 4 1 1 1 1 1 THU 0 15 7 8 6 6 6 THU 4 8 2 1 1 1 1 THU 8 1 1 0 0 0 1 THU 12 16 1 11 0 0 1 THU 16 1 1 0 0 0 1 THU 20 4 1 1 1 1 1 FRI 0 11 2 2 2 2 2 FRI 4 8 3 1 1 1 1 FRI 8 4 1 0 0 0 0
set heading off set feedback off set pagesize 0 with pivot_data as ( select WW pivot_col , DY , D , HH , max(maxval) mv , max(average) av from ( select distinct begin_time, 4*trunc(to_char(begin_time,'HH24')/4) HH, to_char(begin_time,'DY') DY, mod(to_char(begin_time,'D'),7) D, mod(to_char(begin_time,'WW'),5) WW, average/(1024*1024) average, maxval/(1024*1024) maxval, snap_id from DBA_HIST_SYSMETRIC_SUMMARY where dbid=&&DBID and metric_name='Redo Generated Per Sec' and begin_time > sysdate - 31 ) group by HH,D,DY,WW ) select DY ||' '|| HH ||' '|| round(max(mv),0) ||' '|| round(max(w1),0) ||' '|| round(max(w2),0) ||' '|| round(max(w3),0) ||' '|| round(max(w4),0) ||' '|| round(max(w5),0) from ( select * from pivot_data pivot ( avg(av) for pivot_col in ( 1 as w1,2 as w2,3 as w3 ,4 as w4 ,5 as w5 ) ) ) group by DY,D,HH order by D,HH / set heading on set feedback on set pagesize 30
set pagesize 100 col DY for A4 col HH for 99 col mx for 99 col w1 for 99 col w2 for 99 col w3 for 99 col w4 for 99 col w5 for 99 with pivot_data as ( select WW , DY , D , HH , max(maxval) mv , max(average) av from ( select distinct begin_time, 4*trunc(to_char(begin_time,'HH24')/4) HH, to_char(begin_time,'DY') DY, mod(to_char(begin_time,'D'),7) D, mod(to_char(begin_time,'WW'),5) WW, average/(1024*1024) average, maxval/(1024*1024) maxval, snap_id from DBA_HIST_SYSMETRIC_SUMMARY where metric_name='Redo Generated Per Sec' and begin_time > sysdate - 31 ) group by HH,D,DY,WW ) select DY, HH, round(max( mv ),0) mx, round(max( decode( WW, 0 , av, null ) ),0) w1, round(max( decode( WW, 1 , av, null ) ),0) w2, round(max( decode( WW, 2 , av, null ) ),0) w3, round(max( decode( WW, 3 , av, null ) ),0) w4, round(max( decode( WW, 4 , av, null ) ),0) w5 from pivot_data group by DY,D,HH order by D,HH /