
Oakies Blog Aggregator

Detail in the Aggregate

SQL Server 2012 – the "Denali" release – brings improved support for window
functions as defined in the SQL standard. One of the things you can now do
is to mix summary and detail results in the...



Read the full post at www.gennick.com/database.

DOAG 2011 Unconference Wrap-Up

My sessions at DOAG all went well, and I particularly liked the Unconference ones. As promised I held two of them, and they turned out rather different.

The first one only had a couple of attendees (including OakTable fellow Christian Antognini from Switzerland), so we could gather around my laptop and do a real "Optimizer Hacking Session". Actually we had to do that because the projector promised by DOAG wasn't there. I talked mainly about some common traps when performing SQL statement troubleshooting and what to do in order to avoid them, but also showed some cool stuff that I'll shortly blog about separately.

The second session two hours later was attended by many more people than I expected, so there was no chance at all to gather around my laptop. Originally I planned to talk about some common pitfalls that make the optimizer's estimates go wrong (even with 100% computed statistics) and what to do about them, but I then realized that I should at least give a short summary of what I'd shown in the first session to those who hadn't attended it (which was most of the audience anyway). This started off an interesting discussion with many questions - a surprising number revolved around the usage of user-defined PL/SQL functions.

Since even the WiFi connection didn't work properly, I could only briefly mention some important articles that should be read if one wants a good understanding of how the optimizer can (indeed, should!) be helped when dealing with PL/SQL functions.

So for reference I repeat here a summary of relevant articles:

1. Expert Oracle Practices (the book I co-authored), Chapter 7, "PL/SQL and the CBO" by OakTable fellow Joze Senegacnik

2. OakTable Member Adrian Billington on pl/sql functions and cbo costing, which is inspired by Joze's work above

3. Adrian Billington on "setting cardinality for pipelined and table functions"

4. Adrian Billington's generic "Collection Cardinality Utility"; for background info see the article above - the source code includes a description of when and how to use it

5. The problem of the CBO ignoring the projection for costing, described on my blog
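
As a small taste of what those articles cover, the basic mechanism for telling the CBO about the cost and selectivity of a user-defined function is ASSOCIATE STATISTICS. A minimal sketch follows (the function names are hypothetical; see the material above for how to derive sensible values and for the more powerful extensible-optimizer interface):

-- tell the CBO that predicates on my_filter_fn() let through only about 5% of the rows
associate statistics with functions my_filter_fn default selectivity 5;

-- tell the CBO what a single call to my_costly_fn() costs (CPU units, I/O blocks, network)
associate statistics with functions my_costly_fn default cost (10000, 2, 0);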

There were some other questions in the same context, for example what a Real Time SQL Monitoring report would look like with user-defined PL/SQL functions - here is an example you can view / download (remember that the file can be shared since it is self-contained), taken from a query quite similar to the one described in my blog post. By the way, it's a query that tries out an approach discussed during the session: whether there is a chance of getting the "projection" cost right by playing clever tricks that combine projection and selection. I didn't get it to work as desired; at least the "selection" cost showed up, but that doesn't address the actual problem of the "projection" cost in my particular test case. So Oracle still attempts to merge the views and then has to run the user-defined function many more times than necessary - in fact the query represents the worst case, because the user-defined function is called for both selection and projection.

Sample Real Time SQL Monitoring Report

You can see from the report that at least some PL/SQL time shows up, but it also shows a deficiency of the report - it doesn't show detailed logical I/O values, only a summary of Buffer Gets.

And interestingly, although nothing about this is visible from the end-user report, the report file itself (in the XML data section) contains at least a couple of optimizer parameter settings - to address another question raised during the session.
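
For anyone who wants to produce such a self-contained report for their own statements, a minimal sketch looks like this (assuming 11g Release 2; the SQL_ID is a placeholder to be filled in):

set long 1000000 longchunksize 1000000 linesize 32767 pagesize 0 trimspool on
spool sql_monitor_report.html

select dbms_sqltune.report_sql_monitor(
         sql_id => '&sql_id',    -- placeholder: SQL_ID of the monitored statement
         type   => 'ACTIVE'      -- 'ACTIVE' produces the self-contained interactive report
       )
from   dual;

spool off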

All in all I really liked these sessions and I hope the attendees enjoyed them as much as I did.

Last, but not least: it's not definite yet, but very likely I'll be giving a few free "Optimizer Hacking Sessions" on the internet in the future, so stay tuned!

I Can Help You Trace It

The first product I ever created after leaving Oracle Corporation in 1999 was a 3-day course about optimizing Oracle performance. The experiences of teaching this course from 2000 through 2003 (heavily revising the material each time I taught it) added up to the knowledge that Jeff Holt and I needed to write Optimizing Oracle Performance (2003).

Between 2000 and 2006, I spent many weeks on the road teaching this 3-day course. I stopped teaching it in 2006. An opportunity to take or teach a course ought to be a joyous experience, and this one had become too much of a grind. I didn’t figure out how to fix it until this year. How I fixed it is the story I’d like to tell you.

The Problem

The problem was simply inefficiency. The inefficiency began with the structure of the course, the 3-day lecture marathon. Realize, 6 × 3 = 18 hours of sitting in a chair, listening attentively to a single voice (my voice) is the equivalent of a 6-week university term of a 3-credit-hour course, taught straight through in three days. No hour-plus homework assignment after each hour of lecture to reinforce the lessons; but a full semester’s worth of listening to one voice, straight through, for three days. What retention rate would you expect from a university course compressed into just 3 days?

So, I optimized. I have created a new course that lasts one day (not even an exhausting full day at that). But how can a student possibly learn as much in 1 day as we used to teach in 3 days? Isn’t a 1-day event bound to be a significantly reduced-value experience?

On the contrary, I believe our students benefit even more now than they used to. Here are the big differences, so you can see why.

The Time Savings

In the 3-day course, I would spend half a day explaining why people should abandon their old system-wide-ratio-based ways of managing system performance. In the new 1-day course, I spend less than an hour explaining the Method R approach to thinking about performance. The point of the new course is not to convince people to abandon anything they’re already doing; it’s to show students the tremendous additional opportunities that are available to them if they’ll just look at what Oracle trace files have to offer. Time savings: 2 hours.

In the 3-day course, I would spend a full day explaining how to interpret trace data. By hand. There were a few little lab exercises, about an hour’s worth. Students would enter dozens of numbers from trace files into laptops or pocket calculators and write results on worksheets. In the new 1-day course, the software tools that a student needs to interpret files of any size—or even directories full of files—are included in the price of the course. Time savings: 5 hours.

In the 3-day course, I would spend half a day explaining how to collect trace data. In the new 1-day course, the software tools that a student needs to get started collecting trace files are included in the price of the course. For software architectures that require more work than our software can do for you, there’s detailed instruction in the course book. Time savings: 3 hours.

In the 3-day course, I would spend half a day working through about five example cases using a software tool to which students would have access for 30 days after they had gone home. In the new 1-day course, I spend one hour working through about eight example cases using software tools that every student will take home and keep forever. I can spend less time per case yet teach more because the cases are thoroughly documented in the course book. So, in class, we focus on the high-level decision making instead of the gnarly technical details you’ll want to look up later anyway. Time savings: 3 hours.

...That’s 13 classroom hours we’ve eliminated from the old 3-day experience. I believe that in these 13 hours, I was teaching material that students weren’t retaining to begin with.

The Book

The next big difference: the book.

In the old 3-day course, I distributed two books: (1) the “Course Notebook,” which was a black and white listing of the course PowerPoint slides, and (2) a copy of Optimizing Oracle Performance (O’Reilly 2003). The O’Reilly book was great, because it contained a lot of detail that you would want to look up after the course. But of course it doesn’t contain any new knowledge we’ve learned since 2003. The Course Notebook, in my opinion, was never worth much to begin with. (In my opinion, no PowerPoint slide printout is worth much to begin with.)

The Mastering Oracle Trace Data (MOTD) book we give each student in my new 1-day course is a full-color, perfect-bound book that explains the course material and far more in deep detail. It is full-color for an important reason. It’s not gratuitous or decorative; it’s because I’ve been studying Edward Tufte. I use color throughout the book to communicate detailed, high-resolution information faster to your brain.

Color in the book helps to reduce student workload and deliver value long after a student has left the classroom. In this class, there is no collection of slide printouts like you’ve archived after every Oracle class you’ve been to since the 1980s. The MOTD book is way better than any other material I’ve ever distributed in my career. I’ve heard students tell their friends that you have to see it to believe it.

“A paper record tells your audience that you are serious, responsible, exact, credible. For deep analysis of evidence and reasoning about complex matters, permanent high-resolution displays [that is, paper] are an excellent start.” —Edward Tufte

The Software

So, where does a student recoup all the time we used to spend going through trace files, and studying how to collect trace data on half a dozen different software architectures? In the thousands of man-hours we’ve invested into the software that we give you when you come to the course. Instead of explaining every little detail about quirks in Oracle trace data that change between Oracle versions 10.1 and 10.2 and 11.2 or 11.2.0.2 and 11.2.0.4, the software does the work for you. Instead of having to explain all the detail work, we have time to explain how to use the results of our software to make decisions about your data.

What’s the catch? Of course, we hope you’ll love our software and want to buy it. The software we give you is completely full-featured and yours to keep forever, but the license limits you to using it only with one login id, and it doesn’t include patches and upgrades, which we release a few times each year. We hope you’ll love our software so much that you’ll want to buy a license that lets you use it on any of your systems and that includes the right to upgrade as we fix bugs and add features. We hope you’ll love it so much that you encourage your colleagues to buy it.

But there’s really no catch. You get software and a course (and a book and a shirt) for less than the daily rate that we used to charge for just a course.

A Shirt?

MOTD London 2011-09-08: “I can help you trace it.”

Yes, a shirt. Each student receives a Method R T-shirt that says, “I can help you trace it.” We don’t give these things away to anyone except for students in my MOTD course. So if you see one, the person wearing it can, in actual fact, Help You Trace It.

The Net Result

The net result of all this optimization is benefits on several fronts:

  • The course costs a lot less than it used to. The fee is presently only about 25% of the 3-day course’s price, and the whole experience requires less than ⅓ of the time away from work that the original course did.
  • In the new course, our students don’t have to work so hard to make productive use of the course material. The book and the software take so much of the pressure off. We do talk about what the fields in raw trace data mean—I think it’s necessary to know that in order to use the data properly, and have productive debates with your sys/SAN/net/etc. administration colleagues. But we don’t spend your time doing exercises to untangle nested (recursive) calls by hand. The software you take home does that for you. That’s why it is so much easier for a student to put this course to work right away.
  • Since the course duration is only one day, I can visit far more cities and meet far more students each year. That’s good for students who want to participate, and it’s great for me, because I get to meet more people.

Plans

The only thing missing from our Mastering Oracle Trace Data course right now is you. I have taught the event now in Southlake, Texas (our home town), in Copenhagen, and in London. It’s field-tested and ready to roll. We have several cities on my schedule right now. I’ll be teaching the course in Birmingham UK on the day after UKOUG wraps up, December 8. I’ll be doing Orlando and Tampa in mid-December. I’ll teach two courses this coming January in Manhattan and Long Island. There’s Billund (Legoland) DK in April. We have more plans in the works for Seattle, Portland, Dallas, and Cleveland, and we’re looking for more opportunities.

Share the word by linking the official
MOTD sticker to http://method-r.com/.

My wish is for you to help me book more cities in North America and Europe (I hope to expand beyond that soon). If you are part of a company or a user group with colleagues who would be interested in attending the course, I would love to hear from you. Registering en masse saves you money. The magic number for discounting is 10 students on a single registration from one company or user group.

I can help you trace it.

Planet Earth

I know it’s another post that’s not about Oracle, but someone sent me this video link a couple of days ago and it’s too wonderful not to share. (I’ve just got back from Iceland, so the Aurora Borealis at 1:05 is particularly relevant)

Coming Soon

Just a quick reminder if you’re in Northern Europe: I’ll be in Denmark on 13th December presenting a new tutorial event Beating the Oracle Optimizer.

If you’ve been on my “Designing Optimal SQL” course, this is a follow up that takes you in the realm of designing “strange” SQL for those occasions when the Optimizer isnt going to do well enough with “normal” SQL.

In other news:

Don’t forget the UKOUG annual conference in Birmingham from 5th – 7th Dec.

Anyone attending the full event can register for a free seat at the Oak Table Day on Sunday 4th Dec. where I’ll be presenting a geeky session about Redo.

If your abstract wasn’t selected this year, then I see there’s a series of “unconference” slots set aside in the exhibition gallery on the Monday to Wednesday.

 

Hinting

As I’ve often pointed out, this blog isn’t AskTom, or the OTN forum, so I don’t expect to have people asking me to solve their problems; neither do I answer email questions about specific problems. Occasionally, though, questions do appear that are worth a little public airing, and one of these came in by email a couple of weeks ago. The question is longer than the answer I sent, my contribution to the exchange doesn’t start until the heading: “My Reply”.

Last week I accidentally found a very interesting thing about the use_hash hint. That is, when you join two tables using a unique column from one table and you have an equality predicate on that column, you cannot use a hint to make them use a hash join. I know that it does not make sense to use a hash join in this case because a nested loop is the best way to do it. The point is why Oracle ignores the hint here.

Here is the test case.

--Table creation

create table t1 as
select  rownum id,
        object_name name
from    all_objects
where   rownum <= 1000
;

create table t2 as
select
        mod(rownum,20)+1 id
from    dual
connect by
        rownum <= 1000;

-- Index creation (for table T1, we can create a primary key index or unique index)

alter table t1 add constraint t1_pk primary key(id);
create index ind_t2 on t2(id);

-- Gather table statistics here
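-- For example (a hedged sketch; any sensible stats-gathering call will do):
-- exec dbms_stats.gather_table_stats(user, 'T1', cascade => true)
-- exec dbms_stats.gather_table_stats(user, 'T2', cascade => true)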

select    /*+ ordered use_hash(t2) */
        *
from    t1, t2
where   t1.id = t2.id
and     t1.id = 1
;

Here the use_hash hint will be ignored. Without rewriting the query, Oracle only uses a nested loop (which is the best choice; the other join methods make no sense at all).

In your article, you said:

Why do people think that Oracle “ignores” hints? There are two main reasons.

    1.  The available hints are not properly documented
    2.  the hints are rarely used properly – because they are not documented properly.
    3.  (there are a few bugs that make things go really wrong anyway) – and yes, I know that’s the third reason of two.

In the above test case, reasons 1 and 2 do not apply. So is it an Oracle feature or a bug?

We can argue that this is a feature, since in this case Oracle really does know that a nested loop is the best thing. But then consider this query against table T1 alone:

select  /*+ full(t1) */
        *
from    t1
where   t1.id = 1

By that argument the above query should also ignore the FULL hint, but it does not.

Now, if we change the query to:

select  /*+ ordered use_hash(t2) */
        *
from    t1,t2
where   t1.id = t2.id
and     t1.id in (1,2)
;

The hint works and Oracle uses a hash join here.

Since Oracle can use transitive closure when generating the query plan, we now rewrite the first query to an equivalent one:

select  /*+ ordered use_hash(t2) */
        *
from    t1,t2
where   t1.id = t2.id
and     t2.id = 1
;

Oracle will obey the hint here even though it knows we want t1.id = 1.
So it looks like this is more a bug than a feature. What do you think?

My reply

If you read chapter 6 of Cost Based Oracle – Fundamentals, somewhere around page 142, you will see what’s going on. There is an inconsistency in this part of the optimizer code which could be addressed but might be hard to change. (So the answer to your question is that this is more like a bug than a feature – but it’s a side effect of a more significant defect in the code, rather than a very local bug.)

In the first case (predicate on t1), Oracle uses transitive closure to generate a predicate on t2 - and as it does so it drops the join predicate. This makes the hash join impossible and puts the hint out of context.

In the second case, Oracle keeps the join predicate (and that’s the inconsistency in behaviour), so the hash join is still legal and therefore the hint has to be obeyed.
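
As an aside (my illustration, not part of the original exchange), you can see which predicates survive the transformation by checking the Predicate Information section of the execution plan:

explain plan for
select  /*+ ordered use_hash(t2) */
        *
from    t1, t2
where   t1.id = t2.id
and     t1.id = 1
;

select * from table(dbms_xplan.display);

-- With the predicate on t1 the join predicate has been dropped from the plan,
-- so a hash join is impossible. Repeat with "and t2.id = 1" instead and the
-- access predicate "T1"."ID"="T2"."ID" is still present, so the hash join
-- (and hence the use_hash hint) remains legal.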

I would guess that the logic works like this:

Predicate on t1: We generate a transitive predicate, but the source predicate is equality on a unique column, so the join predicate is redundant and can be dropped.

Predicate on t2: We generate a transitive predicate, but the source predicate is not guaranteed to be a single-row predicate, so the join predicate has to be kept and checked. (The “not unique” requirement is also why the join with the in-list behaves the way it does.)

Arguably Oracle could introduce a second pass in the optimizer code which could note that the generated predicate has resulted in a predicate with equality on a unique key, and with this change in place the optimizer could decide to drop the join predicate and we would be back to consistent behaviour. In fact, to my mind, the code should never drop predicates – but it needs to change so that it recognises “redundant” predicates properly and doesn’t double count them in the calculation of join selectivity.

Friday Philosophy – OK, so I am on Twitter Now

Not a very exciting Friday Philosophy this week I’m afraid, just a self-publicising announcement that I am now on Twitter. I’ve put the wordpress widget on the blog for a while (days or weeks, I don’t know); my twitter name is MDWidlake. {I was a little surprised mwidlake had gone already, but that says more about how rare I consider my surname to be than naivety, I hope}. It seems you can click on a part of the widget to follow me, which is a pretty safe thing to do as I am not very verbal as yet.

As I said, I’m not very active at the moment; I’m mostly just following a few friends and seeing what people use Twitter for. So far it mostly seems to be about:

  • Random stuff posted when bored
  • Complaining about work or, more specifically, tasks that are proving trickier than hoped
  • Drinking
  • Random stuff posted when bored
  • Articles that have caught someone’s eye
  • …or more often, about tweets about articles that have caught someone’s eye
  • Chatty stuff that only makes sense between social peers (and isn’t that one of the main points of something like Twitter?)
  • Random stuff posted when bored
  • Cuddly toys. I think that is a result of low sample size and that Doug Burns is away at a conference. I worry about his sanity sometimes.

Niall Litchfield, Neil Chandler and Doug Burns were right {thanks again for your advice, gents}: there is some nice stuff on there, and I’ve already seen some articles and web pages I found interesting via it – but I have also failed to get on with proper work-like stuff I should have been doing as a result.

I also like the chatty extension to real social engagement that Twitter gives, but I’m holding off on a final decision as to whether this makes up for the negative impact it seems to have on real, meeting-in-person socialising.

The interface to Twitter seems a bit, well, rubbish to me. I know, I’ve been on there for all of a week and I am probably missing the Bleedin’ Obvious, but it seems the stuff I see in Timeline, the default view, is just a subset of what the people I follow say. I suspect it’s got something to do with whether the person the tweet is replying to is on my follow list. To understand half the social stuff you have to go clicking around on people’s full tweet history and follow the thread back. Surely there is an easier way than this; maybe some connect-by tree-walk SQL could be invoked…

I’ve already dropped one person off my “following” list. I only followed one celebrity and I decided I could live without the random musings of Simon Pegg. I can imagine people get addicted to following several dozen b to z level celebs, maybe it’s like constantly living in some sort of poor quality reality tv show {Personally I tend to avoid all reality TV, I prefer reality. Except that I am forced to watch that dancing thing on BBC by my wife. And like most men who make that sort of defence, I can’t quite explain away why I still watch it if she is away…}.

So, don’t expect too much in the way of interesting, witty, insightful or even existing tweets from me as yet, but if you want to follow, heck you can always drop me like a sack of manure any time you like :-) .

Little Things Doth Crabby Make – Part XVII. I See xfs_mkfile(8) Making Fragmented Files.

BLOG UPDATE 21-NOV-2011: The comment thread for this post is extremely relevant.

 

I recently had an “exchange of ideas” with an individual. It was this individual’s assertion that modern systems exhibit memory latencies measured in microseconds.

Since I haven’t worked on a system with microsecond-memory since late in the last millennium I sort of let the conversation languish.

The topic of systems speeds and feeds was fresh on my mind from that conversation when I encountered something that motivated me to produce this installment in the Little Things Doth Crabby Make series.

This installment in the series has to do with disk scan throughput and file system fragmentation. But what does that have to do with modern systems’ memory latency? Well, I’ll try to explain.

Even though I haven’t had the displeasure of dealing with microsecond memory this century, I do recall such ancient systems being routinely fed (and swamped) by just a few hundred megabytes per second of disk scan throughput.

I try to keep things like that in perspective when I’m fretting over the loss of 126MB/s like I was the other day. Especially when the 126MB/s is a paltry 13% degradation in the systems I was analyzing! Modern systems are a modern marvel!

But what does any of that have to do with XFS and fragmentation? Please allow me to explain. I had a bit of testing going where 13% (for 126MB/s) did make me crabby (it’s Little Things Doth Crabby Make after all).

The synopsis of the test, and thus the central topic of this post, was:

  1. Create and initialize a 32GB file whilst the server is otherwise idle
  2. Flush the Linux page cache
  3. Use dd(1) to scan the file with 64KB reads — measure performance
  4. Use xfs_bmap(8) to report on file extent allocation and fragmentation

Step number 1 in the test varied the file creation/initialization method between the following three techniques/tools:

  1. xfs_mkfile(8)
  2. dd(1) with 1GB writes (yes, this works if you have sufficient memory)
  3. dd(1) with 64KB writes
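
Pulling those steps together, a consolidated sketch of the test loop might look like this (file name, mount point and loop structure are mine, not the exact commands used; the real session is in the screen-scrape below):

#!/bin/bash
# sketch of the test harness: create the file three ways, then scan and map it each time
FILE=/data1/testfile
SIZE_GB=32

for method in xfs_mkfile dd_large_writes dd_small_writes
do
  rm -f $FILE
  case $method in
    xfs_mkfile)       xfs_mkfile ${SIZE_GB}g $FILE ;;                              # creation method 1
    dd_large_writes)  dd if=/dev/zero of=$FILE bs=1024M count=$SIZE_GB ;;          # creation method 2
    dd_small_writes)  dd if=/dev/zero of=$FILE bs=1M count=$((SIZE_GB * 1024)) ;;  # creation method 3
  esac
  sync; sync; sync; echo 3 > /proc/sys/vm/drop_caches   # step 2: flush the Linux page cache
  dd if=$FILE of=/dev/null bs=64k                       # step 3: scan with 64KB reads, note MB/s
  xfs_bmap -v $FILE > frag.$method.out 2>&1             # step 4: record the extent layout
done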

The following screen-scrape shows that the xfs_mkfile(8) case rendered a file that delivered scan performance significantly worse than the two dd(1) cases. The degradation was 13%:

# xfs_mkfile 32g testfile
 # sync;sync;sync;echo "3" > /proc/sys/vm/drop_caches
 # dd if=testfile of=/dev/null bs=64k
 524288+0 records in
 524288+0 records out
 34359738368 bytes (34 GB) copied, 40.8091 seconds, 842 MB/s
 # xfs_bmap -v testfile > frag.xfs_mkfile.out 2>&1
 # rm -f testfile
 # dd if=/dev/zero of=testfile bs=1024M count=32
 32+0 records in
 32+0 records out
 34359738368 bytes (34 GB) copied, 22.1434 seconds, 1.6 GB/s
 # sync;sync;sync;echo "3" > /proc/sys/vm/drop_caches
 # dd if=testfile of=/dev/null bs=64k
 524288+0 records in
 524288+0 records out
 34359738368 bytes (34 GB) copied, 35.5057 seconds, 968 MB/s
 # xfs_bmap -v testfile > frag.ddLargeWrites.out 2>&1
 # rm testfile
 # df -h .
 Filesystem Size Used Avail Use% Mounted on
 /dev/sdb 2.7T 373G 2.4T 14% /data1
 # dd if=/dev/zero of=testfile bs=1M count=32678
 32678+0 records in
 32678+0 records out
 34265366528 bytes (34 GB) copied, 21.6339 seconds, 1.6 GB/s
 # sync;sync;sync;echo "3" > /proc/sys/vm/drop_caches
 # dd if=testfile of=/dev/null bs=64k
 522848+0 records in
 522848+0 records out
 34265366528 bytes (34 GB) copied, 35.3932 seconds, 968 MB/s
 # xfs_bmap -v testfile > frag.ddSmallWrites.out 2>&1

I was surprised by the xfs_mkfile(8) case. Let’s take a look at the xfs_bmap(8) output.

First, the two maps from the dd(1) files:

# cat frag.ddSmallWrites.out
 testfile:
 EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
 0: [0..9961471]: 1245119816..1255081287 6 (166187576..176149047) 9961472
 1: [9961472..26705919]: 1342791800..1359536247 7 (84037520..100781967) 16744448
 2: [26705920..43450367]: 1480316192..1497060639 8 (41739872..58484319) 16744448
 3: [43450368..66924543]: 1509826928..1533301103 8 (71250608..94724783) 23474176
 #
 # cat frag.ddLargeWrites.out
 testfile:
 EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
 0: [0..9928703]: 1245119816..1255048519 6 (166187576..176116279) 9928704
 1: [9928704..26673151]: 1342791800..1359536247 7 (84037520..100781967) 16744448
 2: [26673152..43417599]: 1480316192..1497060639 8 (41739872..58484319) 16744448
 3: [43417600..67108863]: 1509826928..1533518191 8 (71250608..94941871) 23691264

The mapping of file offsets to extents is quite close in the dd(1) file cases. Moreover, XFS gave me 4 extents for my 32GB file. I like that… but…

So what about the xfs_mkfile(8) case? Well, not so good.

I’ll post a blog update when I figure out more about what’s going on. In the meantime, I’ll just paste it and that will be the end of this post for the time being:

# cat frag.xfs_mkfile.out
testfile:
 EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
 0: [0..10239]: 719289592..719299831 4 (1432..11671) 10240
 1: [10240..14335]: 719300664..719304759 4 (12504..16599) 4096
 2: [14336..46591]: 719329072..719361327 4 (40912..73167) 32256
 3: [46592..78847]: 719361840..719394095 4 (73680..105935) 32256
 4: [78848..111103]: 719394608..719426863 4 (106448..138703) 32256
 5: [111104..143359]: 719427376..719459631 4 (139216..171471) 32256
 6: [143360..175615]: 719460144..719492399 4 (171984..204239) 32256
 7: [175616..207871]: 719492912..719525167 4 (204752..237007) 32256
 8: [207872..240127]: 719525680..719557935 4 (237520..269775) 32256
 [...3,964 lines deleted...]
 3972: [51041280..51073535]: 1115787376..1115819631 6 (36855136..36887391) 32256
 3973: [51073536..51083775]: 1115842464..1115852703 6 (36910224..36920463) 10240
 3974: [51083776..51116031]: 1115852912..1115885167 6 (36920672..36952927) 32256
 3975: [51116032..54897663]: 1142259368..1146040999 6 (63327128..67108759) 3781632
 3976: [54897664..55078911]: 1146077440..1146258687 6 (67145200..67326447) 181248
 3977: [55078912..56094207]: 1195607400..1196622695 6 (116675160..117690455) 1015296
 3978: [56094208..67108863]: 1245119816..1256134471 6 (166187576..177202231) 11014656


Procedural SQL*Plus and Password Encryption

One small but bothersome issue I’ve had for 20 years is how to drive a program like SQL*Plus with a shell script to make it procedural. One approach is to just run single commands through SQL*Plus, causing a connection and exit for every SQL statement. Connecting and disconnecting is costly. Ideally, I just want to open up a connection, send commands to the connection and get the response. Of course languages like java, perl and python all have Oracle connection APIs, but what if I just want a simple shell script and don’t have access to perl or python with the Oracle APIs, or access to java? Can’t I just do it in shell? Can’t I just connect SQL*Plus to a named pipe and echo my commands into the pipe? Well yes, but there is an annoying obstacle. After echoing the first command into the pipe, which SQL*Plus dutifully executes, SQL*Plus exits. I first ran into this problem about 20 years ago and didn’t solve it at the time. A few years later, I figured out a trick to make it work and have been using it ever since. The trick is to have a process run a “tail -f” of an empty file into the pipe. With this second process tailing, SQL*Plus doesn’t exit when reading from the pipe. Since I first started using this I’ve never looked into exactly why. I think that when SQL*Plus tries to read from the pipe after the first command has been sent, the OS says “no more data” and SQL*Plus exits. With the second process doing the “tail -f”, the OS instead tells SQL*Plus “waiting for more data to send you”, and SQL*Plus waits. I would love a more detailed explanation.

#!/bin/ksh
SID=orcl
PORT=1521
function usage
{
       echo "Usage: $(basename $0) username password host [sid] [port]"
       echo "  username        database username"
       echo "  password        database password"
       echo "  host            hostname or IP address"
       echo "  sid             optional database sid (default: orcl)"
       echo "  port            optional database port (default: 1521)"
       exit 2
}
[[ $# -lt 3 ]] && usage
[[ $# -gt 5 ]] && usage
[[ $# -gt 0 ]] && UN=$1
[[ $# -gt 1 ]] && PW=$2
[[ $# -gt 2 ]] && HOST=$3
[[ $# -gt 3 ]] && SID=$4
[[ $# -gt 4 ]] && PORT=$5
  MKNOD=/etc/mknod
  DEBUG=0
  OPEN=sqlplus.open
  PIPE=sqlplus.pipe
  CLEAN=sqlplus.clean
  sh ./$CLEAN > /dev/null 2>&1
  echo "" > $CLEAN
  echo "rm $OPEN" >> $CLEAN
  echo "rm $PIPE" >> $CLEAN
  rm $OPEN $PIPE > /dev/null 2>&1
  touch  $OPEN
  cmd="$MKNOD $PIPE p"
  eval $cmd
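  # keep a writer attached to the pipe so SQL*Plus never sees EOF after the first command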
  tail -f $OPEN >> $PIPE &
  OPENID="$!"
  echo "kill -9 $OPENID" >> $CLEAN
  # run SQLPLUS silent unless DEBUG is 2 or higher
  SILENT=""
  if [ $DEBUG -lt 2 ]; then
      SILENT="-s"
  fi
  CONNECT="$UN/$PW@(DESCRIPTION= \
                     (ADDRESS_LIST=             \
                         (ADDRESS=              \
                             (PROTOCOL=TCP)     \
                             (HOST=$HOST)       \
                             (PORT=$PORT)))     \
                      (CONNECT_DATA=            \
                             (SERVER=DEDICATED) \
                             (SID=$SID)))"
    # quieter alternative: send SQL*Plus output to /dev/null instead of the terminal
    #cmd="sqlplus $SILENT \"$CONNECT\" < $PIPE > /dev/null &"
    cmd="sqlplus $SILENT \"$CONNECT\" < $PIPE  &"
   echo "$cmd"
   echo "PIPE $PIPE"
   eval $cmd
   SQLID="$!"
   echo "kill -9 $SQLID" >> $CLEAN

Example execution

$ ./sqlplus_pipe.sh  scott tiger 192.168.1.2
PIPE sqlplus.pipe
$ echo 'select * from dual;' > sqlplus.pipe
D
-
X
$ echo 'exit' > sqlplus.pipe
$ sh sqlplus.clean

The above code will create a SQL*Plus connection reading its input from a pipe. It also creates a cleanup file to remove the pipe file and kill the tail process. If creating multiple connections, the file names will have to be pre/post-pended with some string to keep them separate - it could be a timestamp, or info about which target.
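
For example, a hedged sketch of that idea (the tag format is illustrative):

# tag the per-connection files so multiple connections don't collide
TAG=$(date +%Y%m%d%H%M%S)      # or use the target host / SID instead
OPEN=sqlplus.open.$TAG
PIPE=sqlplus.pipe.$TAG
CLEAN=sqlplus.clean.$TAG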

Now what does this have to do with password encryption?
Well, there are packages that handle password encryption. Oracle has a cool thing called wallets, which can be set up so that SQL*Plus can connect without a password.
Oracle’s Doc
http://download.oracle.com/docs/cd/B19306_01/network.102/b14266/cnctslsh.htm
Here is a quick setup:
http://www.oracle-base.com/articles/10g/SecureExternalPasswordStore_10gR2.php
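In essence the wallet setup boils down to something like this (a hedged sketch; the paths and TNS alias are hypothetical, see the links above for the full procedure):

# create a wallet and store a credential for a TNS alias
mkstore -wrl /home/oracle/wallet -create
mkstore -wrl /home/oracle/wallet -createCredential mydb_alias scott tiger

# sqlnet.ora must point at the wallet and allow it to supply the password:
#   WALLET_LOCATION = (SOURCE = (METHOD = FILE)(METHOD_DATA = (DIRECTORY = /home/oracle/wallet)))
#   SQLNET.WALLET_OVERRIDE = TRUE

# then SQL*Plus connects without a password appearing anywhere on the command line
sqlplus /@mydb_alias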
Unfortunately Oracle’s wallet method requires files that can’t be redistributed. These file dependencies legally have to be installed by the end user, whereas the Oracle Instant Client can be redistributed. So what if I’m redistributing a tool that uses Instant Client? Then the wallet solution is out of the question, at least for easy installs.
Now what if I create my own binary to handle password encryption, like “Oracle Password Repository”
http://opr.sourceforge.net/
This doesn’t help that much for protection, because a user can actually get the text password. What I want is a way to encrypt passwords and hide  the unencrypted passwords from the user. Sure I want  them to connect, then they have access to the database,  but I want to prevent them from walking off with a file full of clear text passwords.  One solution, like the Oracle wallet with the “cwallet.sso” files, is to have a binary that creates the connections  for SQL*Plus over a pipe and then pass the pipe to the users. Bit of a hack, especially for an interactive users, but for scripts that run SQL*Plus it not only centralizes the passwords, but it encrypts and  helps prevent the user from getting access to and walking away with a set of clear text passwords.

NOTE:

Regarding the beginning of this blog post and the problem of SQL*Plus exiting after receiving the first command via a named pipe, here is what truss output looks like when echoing ‘select * from dual’ into the pipe, with and without a second process tailing (nothing) into the pipe. The first example has a second process doing a “tail -f” of an empty file into the pipe while ‘select * from dual’ is echoed into the pipe; SQL*Plus reads the command, executes it and stays connected:

fstat64(1, 0x08044440)                          = 0
write(1, "\n", 1)                               = 1
write(1, " D\n", 2)                             = 2
write(1, " -\n", 2)                             = 2
write(1, " X\n", 2)                             = 2
write(1, "\n", 1)                               = 1
read(0, 0x0813FAD4, 5120)       (sleeping...)

In the second example there is no “tail -f”; we just echo ‘select * from dual;’ into the pipe, which SQL*Plus executes before exiting:

write(1, "\n", 1)                               = 1
write(1, " D\n", 2)                             = 2
write(1, " -\n", 2)                             = 2
write(1, " X\n", 2)                             = 2
write(1, "\n", 1)                               = 1
write(1, " S Q L >  ", 5)                       = 5
read(0, 0x0813F714, 5120)                       = 0
write(4, "\0\r\0\006\0\0\0\0\003\t".., 13)      = 13
read(4, "\011\0\006\0\0\0\0\0\t01".., 2064)     = 17
write(1, " D i s c o n n e c t e d".., 95)      = 95
write(1, " W i t h   t h e   P a r".., 78)      = 78
write(4, "\0\n\0\006\0\0\0\0 @", 10)            = 10
close(4)

Here is a good explanation http://linux.die.net/man/7/pipe

If all file descriptors referring to the write end of a pipe have been closed, then an attempt to read(2)
from the pipe will see end-of-file (read(2) will return 0).

The part that isn’t explained, for me, is that a reader will wait until the write end has been opened. So EOF doesn’t happen until there has been an open and a close and all open write file descriptors have been closed; thus adding a never-finishing writer keeps the reader from ever seeing an EOF.
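
A quick way to see that behaviour outside SQL*Plus (a hedged sketch using made-up file names):

mkfifo demo.pipe
cat demo.pipe &                   # reader blocks in open() until a writer opens the pipe
echo "first" > demo.pipe          # writer opens, writes, closes: cat prints "first", sees EOF, exits

tail -f /dev/null > demo.pipe &   # a writer that opens the pipe and never closes it
cat demo.pipe &                   # new reader
echo "second" > demo.pipe         # cat prints "second" and keeps waiting: no EOF this time
# when finished: kill the background tail and cat, then rm demo.pipe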

Here is the code for OpenSolaris (thanks to Erik Schrock for the link).