Oakies Blog Aggregator

Little Things Doth Crabby Make – Part XVI (Addendum). Hey ls(1) And du(1) Are Supposed To Agree.

My last installment in the Little Things Doth Crabby Make  series had a lot of readers stepping up to remind me that ls(1) and du(1) aren’t always supposed to report the same size-related information on files. Uh, I actually knew that!

The post wasn’t about sparse files or any other such remedial aspects of file sizes.

In the post I mentioned that I was taking some rather unseemly actions against my XFS file system.

One particular unseemly thing I did was the result of a bug in a small piece of my code. Imagine for a moment that the loff_t variable sz in the following snippet was stupidly uninitialized/unassigned and the program steps on this syscall(__NR_fallocate,,,,) landmine.

 if ((ret = syscall(__NR_fallocate, fd, 0, (loff_t)0, (loff_t)sz)) != 0 )
 perror ("syscall.fallocate");

Well, if whatever happens to be stored in the variable sz is a really large value you’ll have a.out (allocate_file in my case) spinning in kernel mode for the rest of your life (at least on a 2.6.18 kernel). However, I got tired of it shortly after I snapped the following top(1) information:

 top - 11:47:27 up 3 days, 17 min, 4 users, load average: 1.00, 1.00, 1.00
 Tasks: 481 total, 2 running, 479 sleeping, 0 stopped, 0 zombie
 Cpu(s): 0.0%us, 4.2%sy, 0.0%ni, 95.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
 Mem: 49451520k total, 4065088k used, 45386432k free, 121492k buffers
 Swap: 50339636k total, 1044k used, 50338592k free, 3609352k cached
 12682 root 25 0 3648 308 248 R 99.7 0.0 880:23.09 allocate_file
 3997 root 15 0 13008 1416 816 R 1.0 0.0 10:25.16 top
 10100 gpadmin 15 0 111m 17m 2032 S 1.0 0.0 9:13.49 collectl
 1 root 15 0 10352 692 580 S 0.0 0.0 0:13.40 init
 2 root RT -5 0 0 0 S 0.0 0.0 0:00.10 migration/0
 3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
 4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
 5 root RT -5 0 0 0 S 0.0 0.0 0:00.10 migration/1
 6 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/1
 7 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/1
 8 root RT -5 0 0 0 S 0.0 0.0 0:00.21 migration/2
 9 root 34 19 0 0 0 S 0.0 0.0 0:00.08 ksoftirqd/2
 10 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/2
 11 root RT -5 0 0 0 S 0.0 0.0 0:04.91 migration/3
 12 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/3
 13 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/3
 14 root RT -5 0 0 0 S 0.0 0.0 0:00.09 migration/4

It turned out my stupid error put the file system up to the task of allocating nearly 14TB to a file in a file system with about 200GB free. My mistake. However, the call should have failed instead of leaving me with a kernel-mode process that required a server reset to clear. But, alas, I was using a very old interface. If the particular test system I was investigating was running a more recent kernel I would have called fallocate(2) and the situation would most likely have been different but the kernel was older than the 2.6.23 minimum requirement for the fallocate(2) call.
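For illustration only, here is a minimal sketch of the kind of guard that would have caught my goof before the kernel ever saw it. It is not the code from allocate_file: the helper name checked_preallocate is hypothetical, it uses posix_fallocate(3) rather than the raw syscall, and it assumes a platform with a 64-bit off_t. The free-space check is advisory at best, since other writers can consume space between the check and the allocation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/statvfs.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original program): preallocate len
 * bytes starting at offset 0, but refuse requests that are non-positive
 * or larger than the free space fstatvfs(3) reports for the file system.
 * Returns 0 on success or an errno-style value on failure.
 */
int checked_preallocate(int fd, off_t len)
{
    struct statvfs vfs;
    unsigned long long avail;

    assert(fd >= 0);

    if (len <= 0)
        return EINVAL;              /* catches zero and negative garbage */

    if (fstatvfs(fd, &vfs) != 0)
        return errno;

    /* Bytes available to unprivileged users: free blocks * fragment size. */
    avail = (unsigned long long)vfs.f_bavail * vfs.f_frsize;
    if ((unsigned long long)len > avail)
        return ENOSPC;              /* e.g. 14TB requested, 200GB free */

    /* posix_fallocate returns the error number directly; errno is not set. */
    return posix_fallocate(fd, 0, len);
}
```

With a check like this, an uninitialized sz that happened to hold a huge value would come back as ENOSPC instead of being handed to the file system.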

So what does this have to do with ls(1) and du(1)? Well, I had a lot of programs running that were thrashing the file system. I unearthed a race condition of some sort where my looping call to ls(1) managed to catch a glimpse of the file being populated by PID 12682 (see the top(1) output above). The ls(1) command reported zero bytes. The next line of the script executed microseconds (or less) later, at which point du(1) was of the opinion the file was 287GB. Both the initial and subsequent df(1) information was consistent. I haven’t studied the transactional nature of this old rendition of fallocate so I can’t speculate what was going on. The only thing executing on the system at the time was, indeed, several invocations of the allocate_file program. It turns out that none of them branched to that call with an uninitialized grenade, as it were.
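As a reminder of where the two numbers come from, the sketch below reads both from a single stat(2) call: st_size is the byte count that ls -l prints, and st_blocks is the allocation count that du is based on. The helper name file_sizes is hypothetical, and the 512-byte unit for st_blocks is the Linux convention, so treat that as an assumption on other platforms. This does not explain the race I saw; it only shows that the two values are separate fields that can legitimately differ.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: report the apparent size (what ls -l shows) and
 * the allocated size (what du is based on) for one file.
 * Returns 0 on success, -1 if stat(2) fails.
 */
int file_sizes(const char *path, long long *apparent, long long *allocated)
{
    struct stat st;

    assert(path != NULL && apparent != NULL && allocated != NULL);

    if (stat(path, &st) != 0)
        return -1;

    *apparent  = (long long)st.st_size;          /* logical length in bytes */
    *allocated = (long long)st.st_blocks * 512;  /* Linux: 512-byte units   */
    return 0;
}
```

Calling this in a loop against a file that is being preallocated would show whether the allocated figure runs ahead of the apparent size, which is the sort of disagreement described above.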

I was unable to reproduce the situation and lost interest after fixing that stupid bug in the allocate_file program.

If there is any moral to this story it would be that the level of unpredictability is unpredictable if a process unpredictably asks the kernel to do something it cannot possibly do such as allocate terabytes to a file in gigabytes of free space. I would predict, however, that >2.6.23 fallocate() would handle my goofy mistake differently.

I hate it when I can’t reproduce a problem.

Filed under: oracle

BGOUG Autumn Conference

Well, it was an enjoyable few days in Bulgaria, to say the least, although perhaps it tailed off a little towards the end through my own fault!

The trip out was fun because between the ACE Director Program and the Bulgarian Oracle User Group all my expenses were covered this time, so I treated myself to an upgrade to business. The only sensible flights I could get were on BA so, although I am a Star Alliance junkie and had to forgo air miles this time, I was pleased that BA's service was as good as I remembered. Terminal 5 was very swish and efficient, the lounge great and the inflight service very comfy with a nice breakfast ;-)

As I've said before, the overseas conferences I attend always seem to be organised by extremely passionate and friendly people who go out of their way to welcome and take care of you. So, when I arrived at SOF I met Dimitri Gielis who had been waiting for me for a while with Stoyan Ivanov of the BGOUG board who was going to drive us both to Hissarya. At a little over 2 hours, it's not a taxi trip I would have been too confident of! So thanks to Stoyan for the lifts both from and to the airport and the good conversation :-)

It was still good to arrive at Hissarya, though, which was obviously a beautiful small town that I would investigate later ...maybe on Saturday after my presentations?

Although the Hotel Hissar Spa Complex was splendid, Dimitri and I both struggled with the layout at first and I suspect others did as well. When the lady at reception recommended I wait for the porter to show me to the room, I laughed that off, so she drew me a picture of various lifts, floor plans and so on. Despite me leaving the desk confidently, she may have had a point. Because the complex consists of several buildings built at different times on a hill-side, it turned out that the crucial part I didn't understand was that the 3rd floor became the 8th floor when you walked through a set of fire doors. It was a while before I found my room and it took me a few attempts to get used to it, but I got there eventually. (Which was pretty fortunate, given the state of my navigational skills during subsequent attempts!)

After a quick beer or two with Dimitri and the sudden realisation that this was a hotel bar with beers for £1 (gulp!) we moved on to dinner which was a surprisingly large couple of tables once all the speakers and BGOUG crowd showed up. The main things I remember are

- Several techies being less than positive about RAC. Some were more convinced and it was just dinner talk, but was funny when added to ....
- Carol Merriman's pre-presentation terror about having to talk to a bunch of techies about the Oracle Database Appliance.
- My ability to scrape cheese off stuff and eat it if I'm hungry enough. (The stuff under the cheese, that is, not the cheese itself!)
- Milena making sure I had a personal bottle of white wine even though rakia and red wine are the only alcoholic drinks a sensible man would drink in Bulgaria! (Although there seems to be quite an appetite for whisky too!)

All in all it was a great way to prepare everyone for the conference to come ...

I only managed to attend a couple of presentations on Day 1. The first was Dimitri's presentation on Apex charts, which was an eye-opener for me. I admit it's a long time since I looked at Apex, things have clearly moved on a lot with Apex 4, and Dimitri has a deeper knowledge than most of what's new in Apex and what might be coming along soon. Definitely worth another look in one of my many free hours. (That's a joke)

For the rest of the day I flitted between lunch, some final presentation preparation and popping into Carol's presentation on the Oracle Database Appliance. I didn't expect to learn very much after attending Openworld but wanted to go along to offer a little moral support because I knew she was nervous about it. In the end, she did a pretty good overview and there was time for an interesting Q & A session at the end where I think I was able to help out a little. I'd summarise the feeling in the room as being - ouch, still a little expensive for small customers - but maybe it's about your definition of SME and maybe the definition changes in different markets?

With the final preparation for my own presentation I only really had the chance to catch the tail-end of Toon Koppelaar's second presentation but I know Toon quite well and am very familiar with his work from the Hotsos Symposium and plenty of other conferences. Later he would chide me about my lack of progress on his and Lex's book ;-)

I think the latest Statistics on Partitioned Objects presentation went reasonably well. It felt fun for me and that's always a good sign, the room was very busy and lots of people were complimentary about it later. I was particularly touched that several other presenters like Toon were so complimentary about it. But it also made me realise that it does need some work before UKOUG because it's too long and there's probably a little too much whining about old problems and not enough discussion about new solutions. Regardless, I was reasonably happy and ready for the party.

Oh, and what a party! Memories include :-

- Self-service buffet, so I could avoid any sign of cheese ;-)
- Performances from Bulgarian dancers followed by a majority of the attendees joining in an extended dancing snake that unfurled around the room. If you imagine a side-ways conga with Tim Hall waving a napkin above his head, you'd get a reasonable picture! I'd include some photos of Tim but either they're a little blurred or I simply can't risk him posting photos of me dancing very briefly in retaliation!
- Toon realising that if he was due to leave for his flight at 3:30, he might just have to stay awake.
- Brilliant Bulgarian guys insisting I drink the strongest Rakia and whisky and all sorts of stuff I wouldn't normally touch but their company was so good I didn't feel I could refuse ;-)
- The dangers of me finding an onsite night-club complete with ashtrays the night before a presentation.
- Deciding at 4:30 that maybe I should go to bed.

I think the most positive things I can say about Saturday morning's 10am presentation are that I was there, I did talk for an hour (whilst bolting down glasses of water) and that it was well-attended. I felt pretty bad about this presentation because people made the effort to show up after the party (they have much stronger constitutions than mine!) and maybe thought it would be ok after the previous evening's presentation, but it wasn't close. I need to remember that although it is just about possible to present after a few hours' sleep, it's not recommended!

So, erm, then it all goes a bit quiet as I managed to grab some extra sleep before the final night's dinner which was a much more low-key affair but none-the-less enjoyable for that before the trip home on Sunday.

It was a terrific conference, I met lots of new fun friends and you can be sure I'll be going back if I get another opportunity and I am not scheduled to speak in the first slot after the party ;-)

Usual disclaimer: My travel and accommodation expenses were covered primarily by BGOUG with some assistance from the Oracle ACE Director program. I believe I may have said something quite nice about the Oracle Database Appliance at some point but I've forgotten what it was now .... My thoughts on the BGOUG and Milena Gerova's efforts are all completely positive. Several conference veterans agree - this lady is the ultimate conference organiser!

Solaris IPS documentation

Oracle’s Image Packaging Service (IPS) is documented here:

From here there is a step by step example of creating a package here:

Here are some of the steps


  mkdir proto
  mkdir proto/opt
  # add the package files in opt
  pkgsend generate proto | pkgfmt > mypkg.p5m.1
  cat  << EOF > mypkg.mog
set name=pkg.fmri value=mypkg@1.0,5.11-0
set name=pkg.summary value="This is our example package"
set name=pkg.description value="This is a full description of \
all the interesting attributes of this example package."
set name=variant.arch value=\$(ARCH)
set name=info.classification \
link path=usr/share/man/index.d/mysoftware target=opt/mysoftware/man
EOF
  pkgmogrify -DARCH=`uname -p` mypkg.p5m.1 mypkg.mog  | pkgfmt > mypkg.p5m.2
  pkgdepend generate -md proto mypkg.p5m.2 | pkgfmt > mypkg.p5m.3
  pkgdepend resolve -m mypkg.p5m.3
  pkglint mypkg.p5m.3.res
  pkgrepo create /scratch/my-repository
  pkgrepo -s /scratch/my-repository set publisher/prefix=mypublisher

Now, with these steps done, the package is ready to be published. Here is the command as stated in the documentation.

  pkgsend -s /scratch/my-repository/ publish -d proto mypkg.p5m.4.res

What happens? The pkgsend command just hangs.
After using “truss” it’s easy to see that pkgsend is waiting for input from stdin.

  24440:  read(0, 0xFEF79C20, 1024)       (sleeping...)

Hitting control D, “^D”, gives:

   pkgsend: The URI '/scratch/my-repository/' contains an unsupported scheme ''.

which points to a slightly different syntax:

   pfexec pkgsend -s file:///scratch/my-repository/ publish -d proto mypkg.p5m.4.res

hitting ^D gives:

  'The specified FMRI, 'pkg:/mypkg.p5m.4.res', has an invalid version.

There is not much information on this error, but some here

In every case, pkgsend is trying to read from standard in. The documentation does say that pkgsend will read from standard in when the manifest is not on the command line:

publish [-b bundle ...] [-d source ...] [-s repo_uri_or_path] [-T pattern] [--no-catalog] [manifest ...]

Publishes a package using the specified package manifests to the target package repository, retrieving files for the package from the provided sources. If multiple manifests are specified, they are joined in the order provided. If a manifest is not specified, the manifest is read from stdin.

But on the command line used, the manifest is specified, so what is happening?

Scanning the command line options turns up:

        pkgsend [options] command [cmd_options] [operands]
  Packager subcommands:
        pkgsend open [-en] pkg_fmri
        pkgsend add action arguments
        pkgsend import [-T pattern] [--target file] bundlefile ...
        pkgsend include [-d basedir] ... [-T pattern] [manifest] ...
        pkgsend close [-A | [--no-index] [--no-catalog]]
        pkgsend publish [-d basedir] ... [-T pattern] [--no-index]
          [--fmri-in-manifest | pkg_fmri] [--no-catalog] [manifest] ...
        pkgsend generate [-T pattern] [--target file] bundlefile ...
        pkgsend refresh-index
        -s repo_uri     target repository URI
        --help or -?    display usage message

and there is an interesting command line option that looks a bit reminiscent of the last error: “--fmri-in-manifest”. Giving it a shot:

  pfexec pkgsend -s file:///scratch/my-repository/ publish -d proto --fmri-in-manifest mypkg.p5m.4.res


And it works!

Select For Update – In What Order are the Rows Locked?

November 21, 2011 A recent thread in the usenet group asked whether or not a SELECT FOR UPDATE statement locks rows in the order specified by the ORDER BY clause.  Why might this be an important question?  Possibly if the SKIP LOCKED clause is implemented in the SELECT FOR UPDATE statement?  Possibly if a [...]

HOWTO: XDB Repository Events – An Introduction

Oracle XMLDB Repository Events, IMHO, was one of the coolest functionalities introduced in Oracle 11.1. In principle they are a kind of event “triggers” that get fired during actions / methods on objects in the XDB Repository. One of the disadvantages of this functionality is that they are very “sparsely” documented in the Oracle XMLDB

Read More…

CBO isn’t perfect

And you should remember that. Here is a nice example of how the Cost Based Optimizer can miss an obvious option (which is available to the human eye, and to the Oracle run-time with a hint) while searching for the best plan. The CBO simply doesn’t consider Index Skip Scan with constant ‘in list’ predicates in the query, although it costs skip scan for a join. Such bits are always popping up here and there, so you just can’t say “The Cost Based Optimizer examines all of the possible plans for a SQL statement …”, even if the Optimizer Team tells you the CBO should do so. There will always be places where the CBO does less than is possible to arrive at the best plan and will need help from your side, such as re-written SQL or a hint.

Filed under: CBO, Hints, Oracle Tagged: skip scan

HOWTO: Implement Versioning via Oracle XMLDB

For a long time, the database has had some versioning capabilities, long before features like “Edition Based Redefinition” in Oracle 11gR2 appeared. This versioning, via XMLDB functionality, is based on its XDB Repository access to the database. The XDB Repository is a file/folder metaphor that acts as a file server. You can enable this functionality

Read More…

UKOUG 2011: Using your Database as a Fileserver

One of the coolest things in Oracle 11g and onwards is a functionality called XDB Repository Events. Most of you probably know that based on XMLDB functionality in the database, the database also can be used in a File server kind of way by enabling the XDB Repository HTTP/FTP or WebDAV functionality via DBMS_XDB. XDB

Read More…

HOWTO: Consume Anydata via XMLType (and back)

This was a small mind exercise on the OakTable website (OakTable Challenge) for a person regarding how to go from a relational table to an anydata datatype table and back, which I, of course, approached via an “XMLType” way of thinking. Probably the whole thing is not that practical and/or can be optimized in various ways, but …

Continue reading »

Revived Boston Area DBA SIG Meeting

The DBA SIG of the Northeast Oracle User Group has been revived (thank you, Lyson and Jeane) and I was honored to be the speaker of the first session of what I hope will be a long list of very successful sessions, like the old days.

I started at 7 PM and finished at midnight - a solid 5 hours later! Thank you for your patience. It just made my day to have you in the audience that late. I hope you found it useful.

Here is the slide deck and the scripts I used during my session. As in the past, I cherish the moments and would greatly appreciate your feedback.