Oakies Blog Aggregator

I'm not a DBA anymore...

After nine years and nine months of running the database that hosts asktom, I've retired.

Not from answering questions, but rather from being the DBA and semi-SA for the machine that was asktom.oracle.com.

Asktom.oracle.com is now synonymous with APEX.oracle.com - Joel Kallman facilitated the move (here is the story on that). So, I'm now running in a database with thousands of other applications. And - it seems a bit faster than my old hardware (which was held together with some glue and duct-tape - the drives were failing left and right...).

For some stories on APEX and the ability to handle a pretty large load - I suggest you glance at:

But now, for the first time in years, I won't be afraid to check asktom when I land and get off of the plane. If something goes wrong - the data center guys will fix it :)

And yes, the intermittent "some IE users get an error on asktom" issue was found and fixed.

On the Importance of Diagnosing Before Resolving

Today a reader posted a question I like at our Method R website. It's about the story I tell in the article called, "Can you explain Method R so even my boss could understand it?" The story is about sending your son on a shopping trip, and it takes him too long to complete the errand. The point is that an excellent way to fix any kind of performance problem is to profile the response time for the properly chosen task, which is the basis for Method R (both the method and the company).

Here is the profile that details where the boy's time went during his errand:

                      --Duration---
Subtask             minutes     %  Executions
------------------  -------  ----  ----------
Talk with friends        37   62%           3
Choose item              10   17%           5
Walk to/from store        8   13%           2
Pay cashier               5    8%           1
------------------  -------  ----  ----------
Total                    60  100%
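For readers who like to see the arithmetic, a profile like this is simply durations aggregated by subtask and expressed as percentages of the total. Here is a minimal SQL sketch of that calculation, assuming a purely hypothetical errand_timings(subtask, seconds) table that records one row per timed call (the table is mine, not part of the article):

-- Aggregate per-call timings into a response-time profile (illustrative only).
select subtask,
       round(sum(seconds) / 60, 1)                        minutes,
       round(100 * ratio_to_report(sum(seconds)) over ()) pct,
       count(*)                                           executions
  from errand_timings
 group by subtask
 order by sum(seconds) desc;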

I went on to describe that the big leverage in this profile is the elimination of the subtask called "Talk with friends," which will reduce response time by 62%.

The interesting question that a reader posted is this:

Not sure this is always the right approach. For example, let's imagine the son has to pick 50 items:
Talk          3 times    37 minutes
Choose item  50 times    45 minutes
Walk          2 times     8 minutes
Pay           1 time      5 minutes
Working on "choose item" is maybe not the right thing to do...

Let's explore it. Here's what the profile would look like if this were to happen:

         --Duration---
Subtask  minutes     %  Executions
-------  -------  ----  ----------
Choose        45   47%          50
Talk          37   39%           3
Walk           8    8%           2
Pay            5    5%           1
-------  -------  ----  ----------
Total         95  100%

The point of the inquiry is this:

The right answer in this case, too, is to begin with eliminating Talk from the profile. That's because, even though it's not ranked at the very top of the profile, Talk is completely unnecessary to the real goal (grocery shopping). It's a time-waster that simply shouldn't be in the profile. At all. But with Cary's method of addressing the profile from the top downward, you would instead focus on the "Choose" line, which is the wrong thing.

In chapters 1 through 4 of our book about Method R for Oracle, I explained the method much more thoroughly than I did in the very brief article. In my brevity, I skipped past an important point. Here's a summary of the Method R steps for diagnosing and resolving performance problems using a profile:

  1. (Diagnosis phase) For each subtask (row in the profile), visiting the subtasks in order of descending duration, ask:
    1. Can you eliminate any executions without sacrificing required function?
    2. Can you improve (reduce) individual execution latency?
  2. (Resolution phase) Choose the candidate solution with the best net value (that is, the greatest value of benefit minus cost).

Here's a narrative of executing the steps of the diagnostic phase, one at a time, upon the new profile, which—again—is this:

         --Duration---
Subtask  minutes     %  Executions
-------  -------  ----  ----------
Choose        45   47%          50
Talk          37   39%           3
Walk           8    8%           2
Pay            5    5%           1
-------  -------  ----  ----------
Total         95  100%
  1. Execution elimination for the Choose subtask: If you really need all 50 items, then no, you can't eliminate any Choose executions.
  2. Latency optimization for the Choose subtask: Perhaps you could optimize the mean latency (which is .9 minutes per item). My wife does this. For example, she knows better where the items in the store are, so she spends less time searching for them. (I, on the other hand, can get lost in my own shower.) If, for example, you could reduce mean latency to, say, .8 minutes per item by giving your boy a map, then you could save (.9 – .8) × 50 = 5 minutes (5%). (Note that we don't execute the solution yet; we're just diagnosing right now.)
  3. Execution elimination for the Talk subtask: Hmm, seems to me like if your true goal is fast grocery shopping, then you don't need your boy executing any of these 3 Talk events. Proposed time savings: 37 minutes (39%).
  4. Latency optimization for the Talk subtask: Since you can eliminate all Talk calls, there's no need to bother thinking about latency reduction. ...Unless you're prey to some external constraint (like social advancement, say, in an attempt to maximize your probability of having rich and beautiful grandchildren someday), in which case you should think about latency reduction instead of execution elimination.
  5. Execution elimination for the Walk subtask: Well, the boy has to get there, and he has to get back, so this "executions=2" figure looks optimal. (Those Oracle applications we often see that process one row per network I/O call would have 50 Walk executions, one for each Choose call.)
  6. Latency optimization for the Walk subtask: Walking takes 4 minutes each way. Driving might take less time, but then again, it might actually take even more. Will driving introduce new dependent subtasks? Warm Up? Park? De-ice? Even driving doesn't eliminate all the walking... Plus, there's not a lot of leverage in optimizing Walk, because it accounts for only 8% of total response time to begin with, so it's not worth a whole lot of bother trying to shave it down by some marginal proportion, especially since inserting a car into your life (or letting your boy drive yours) is no trivial matter.
  7. Execution elimination for the Pay subtask: The execution count on Pay is already optimized down to the legally required minimum. No real opportunity for improvement here without some kind of radical architecture change.
  8. Latency optimization for the Pay subtask: It takes 5 minutes to Pay? That seems a bit much. So you should look at the payment process. Or should you? Even if you totally eliminate Pay from the profile, it's only going to save 5% of your time. But, if every minute counts, then yes, you look at it. ...Especially if there might be an easy way to improve it. If the benefit comes at practically no cost, then you'll take it, even if the benefit is only small. So, imagine that you find out that the reason Pay was so slow is that it was executed by writing a check, which required waiting for store manager approval. Using cash or a credit/debit card might improve response time by, say, 4 minutes (4%).

Now you're done assessing the effects of (1) execution elimination and (2) latency reduction for each line in the profile. That ends the diagnostic phase of the method. The next step is the resolution phase: to determine which of these candidate solutions is the best. Given the analysis I've walked you through, I'd rank the candidate solutions in this order:

  1. Eliminate all 3 executions of Talk. That'll save 37 minutes (39%), and it's easy to implement; you don't have to buy a car, apply for a credit card, train the boy how to shop faster, or change the architecture of how shopping works. You can simply discard the "requirement" to chat, or you can specify that it be performed only during non-errand time windows.
  2. Optimize Pay latency by using cash or a card, if it's easy enough to give your boy access to cash or a card. That will save 4 minutes, which—by the way—will be a more important proportion of the total errand time after you eliminate all the Talk executions.
  3. Finally, consider optimizing Choose latency. Maybe giving your son a map of the store will help. Maybe you should print your grocery list more neatly so he can read it without having to ask for help. Maybe by simply sending him to the store more often, he'll get faster as his familiarity with the place improves.

That's it.

So the point I want to highlight is this:

I'm not saying you should stick to the top line of your profile until you've absolutely conquered it.

It is important to pass completely through your profile to construct your set of candidate solutions. Then, on a separate pass, you evaluate those candidate solutions to determine which ones you want to implement, and in what order. That first full pass is key. You have to do it for Method R to be reliable for solving any performance problem.

Every day I learn something new...

Two things

One, here is something new I learned today. I wasn't aware of it; it's a nice "stealth" feature :)

The other thing - I have a podcast on 11g Release 2 "things" posted on oracle.com if you are interested....

Is plan_hash_value a final say?

I was reviewing a performance issue with a client recently. The problem was that increased global cache (GC) waits were causing an application slowdown affecting a few critical business functions. Using one of my scripts, gc_traffic.sql, and graphing the results in an Excel spreadsheet, we established that there was a marked increase in GC traffic compared to a week earlier. Similar jobs run every day, so comparing the same weekday across two weeks is sufficient to show the GC traffic increase. The graph plots total blocks against AWR snap time at 30-minute intervals.

[Graph: GC traffic comparison, current week vs. prior week]

Identifying the objects generating this increased GC traffic was essential to finding the root cause. Using ADDM and AWR reports, we were able to quickly determine that the increase in GC traffic was localized to a few SQL statements, and we decided to focus on one SQL statement with an obvious increase in elapsed time compared to the prior week. So, the first question asked: is there a change in the plan? We reviewed the plan_hash_value and quickly determined that it had not changed.

A little bit of history: the vendor had performed a few changes over the weekend as part of some bug fixes. The vendor's argument was that since there was no change to the plan_hash_value, the SQL execution plan had not changed, and so the slowdown could not be due to their changes. My client's argument was that there had been no changes to the environment, and the problem started right after the application changes.

There are many different things that can go wrong without a change to the execution plan. We can ignore those conditions for now since (a) there had been no changes to the environment, (b) there were no visible changes to the data, and (c) there were no error messages, and the average CR receive time was consistent with prior weeks. Well, let's cut to the chase. It boiled down to one question: "Can a SQL plan change without a change in plan_hash_value?" What do you think?

plan_hash_value and hash_value

The hash_value of a SQL statement is generated from the text of the statement, while the plan_hash_value is generated from the execution plan of that statement (more accurately, from the child cursor's execution plan; exactly what goes into generating plan_hash_value is not published). It is a general belief that plan_hash_value will change with even the slightest change in the execution plan. But that is not always the case!
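A quick way to see the two values side by side is to query V$SQL. This is just a sketch; the sql_text filter is an illustrative placeholder, so adjust it to the statement you care about:

-- Both hash values for the matching child cursors (sketch; adjust the filter).
select sql_id, child_number, hash_value, plan_hash_value
  from v$sql
 where sql_text like 'select * from t1 where n1=%';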

Test case

We will use a small table to explore this issue.

prompt Test case #1: Initial test case with index on columns (n1, n2).
prompt ==========
create table t1 (n1 number, n2 number, v1 varchar2(100));
create index t1_n1 on t1(n1,n2);

explain plan for select * from t1 where n1=:b1 and n2=:b2;
select * from table (dbms_xplan.display);
prompt plan hash value in this case is 626762252

Plan hash value: 626762252
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    53 |  4134 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    53 |  4134 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("N1"=TO_NUMBER(:B1) AND "N2"=TO_NUMBER(:B2))

So, we got the plan_hash_value shown above (626762252). In the following test case, we will recreate the index with the columns reordered as (n2, n1).

prompt Test case #2: Index with re-ordered columns
prompt ==========
drop index t1_n1;
create index t1_n1 on t1(n2,n1);

explain plan for select * from t1 where n1=:b1 and n2=:b2;
select * from table (dbms_xplan.display);

Plan hash value: 626762252

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    53 |  4134 |     9   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    53 |  4134 |     9   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |  5298 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("N2"=TO_NUMBER(:B2) AND "N1"=TO_NUMBER(:B1))

Notice that in the test result above, the access predicate changed to reflect the index column reordering, but the plan_hash_value did not change. The point is that the execution plan can change without a change in plan_hash_value (here, due to a change in an underlying index).

In the next test case, let's drop a column from that index.

prompt Test case #3: dropping a column from index t1_n1.

drop index t1_n1;
create index t1_n1 on t1(n1);

explain plan for select * from t1 where n1=:b1 and n2=:b2;

Plan hash value: 626762252

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    53 |  4134 |    41   (3)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T1    |    53 |  4134 |    41   (3)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |  2119 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N2"=TO_NUMBER(:B2))
   2 - access("N1"=TO_NUMBER(:B1))

We see that the predicates changed, but the plan_hash_value did not. In the test case below, we will change the index to a function-based index and test the SQL statement again. A few more self-evident test cases follow.

prompt Test case #4: Index is a function based index. Still, no change in the plan_hash_value.

drop index t1_n1;
create index t1_n1 on t1(to_char(n1));
explain plan for select * from t1 where to_char(n1)=:b1 and n2=:b2;

Plan hash value: 626762252
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    53 |  4134 |   127   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T1    |    53 |  4134 |   127   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |  2119 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N2"=TO_NUMBER(:B2))
   2 - access(TO_CHAR("N1")=:B1)

prompt Test case #5: Different schema, same SQL. plan_hash_value did not change.

Plan hash value: 626762252
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    78 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T1    |     1 |    78 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |     3 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("N2"=TO_NUMBER(:B2))
   2 - access("N1"=TO_NUMBER(:B1))

plan_hash_value is also case-sensitive with respect to index names. In the test cases below, we will create indexes with case-sensitive names.

prompt Test case #6: Lower-case index name. plan_hash_value changed.
create index "t1_n1" on t1(n1,n2);
explain plan for select * from t1 where n1=:b1 and n2=:b2;

Plan hash value: 2252949961
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    53 |  4134 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    53 |  4134 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | t1_n1 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

prompt Test case #7: Upper-case index name. plan_hash_value did not change.
create index "T1_N1" on t1(n1,n2);
explain plan for select * from t1 where n1=:b1 and n2=:b2;
Plan hash value: 626762252

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    53 |  4134 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |    53 |  4134 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

prompt Test case #8: A trailing space in the index name changes the plan_hash_value, though.
create index "T1_N1 " on t1(n1,n2);
explain plan for select * from t1 where n1=:b1 and n2=:b2;
Plan hash value: 1377541522
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |    53 |  4134 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |    53 |  4134 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_N1  |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Summary

In summary, plan_hash_value is a good indicator of whether the plan changed, but it is not the final say. Don't get me wrong, I do use plan_hash_value, but in addition to comparing CPU time and elapsed time. The execution plan can change even when there is no change to the plan_hash_value. The salient points from these test cases are:

  1. Plan_hash_value depends on only part of the execution plan, not on the complete plan.
  2. Moving a predicate from the filter predicates to the access predicates, or vice versa, does not affect plan_hash_value (one way to catch such a change is sketched after this list).
  3. Changes in the parallelism of a query do not affect plan_hash_value. For example, if a query used 4 parallel slaves today and 16 parallel slaves yesterday, that change is not visible through plan_hash_value. This behavior is expected, as parallel_adaptive_multiuser can allocate slaves depending upon the state of the instance.
  4. Plan_hash_value is case-sensitive with respect to index/table names, and it is also sensitive to white-space characters in those names.
  5. Plan_hash_value is not sensitive to index type. For example, if the index is changed to a function-based index, as in test case #4, the plan_hash_value remains the same as long as the index name does not change.
  6. Plan_hash_value is not sensitive to schema either; the same SQL statement accessing objects in different schemas can have the same plan_hash_value.
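Since plan_hash_value ignores predicate changes like the ones above, one practical cross-check is to compare the displayed plans with their predicate sections included. A minimal sketch, assuming the cursor of interest is still in the shared pool (substitute the real sql_id):

-- Show the plan together with its access and filter predicates for one cursor;
-- compare this output between the good and bad periods, not just the hash value.
select *
  from table(dbms_xplan.display_cursor('&sql_id', null, 'TYPICAL'));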

Back to our problem. One of the indexes had been recreated with a column removed, which caused the optimizer to apply a filter predicate at the table level, tremendously increasing the number of table block accesses and leading to more logical reads, more global cache waits, etc. The problem was amplified because this SQL statement was executed very frequently and concurrently from all RAC instances.

In my client's defense, this application change was tested thoroughly. But, alas, the test data chosen for the performance test did not probe this specific issue, so the problem did not show up in development. Essentially, the data chosen for the performance benchmark suite was a poor choice.

As they say in Gaelic, "Go raibh maith agat" ("thank you") to my client for allowing me to post this blog.
This can also be read in a traditional document format: plan_hash_value_and_gc.
Update 1: Added the document and also corrected a typo.

Identifying Application Engine Source Code

One of the recurring challenges of performance tuning PeopleSoft systems is that having identified a problem SQL statement with the aid of a database utility, such as Oracle Enterprise Manager or session trace, you often need to identify the source so that you can implement a change.

Stored Outlines and SQL Profiles do not work well with PeopleSoft. Bind variables in Application Engines often become literals before the SQL statement is submitted to the database, unless the ReUse Statement feature has been enabled, which it is not by default.

I wrote about this in PeopleSoft for the Oracle DBA; the book contains a script called stmtid.sql (listing 11-23, page 280) that adds an identifying comment to each of the stored statements used by PeopleSoft COBOL programs.

Now, I have produced a similar script for Application Engine programs. The script aeid.sql adds identification comments containing the name of the object. It directly modifies the PeopleTools tables for Application Engine steps and PeopleSoft SQL objects. These comments will be seen in database tools and traces, so when your database tools find a problem statement, it is easy to find the source.

Below is part of a screenshot from Oracle Enterprise Manager. You can see the identifying comment that the script added to the SQL.

The comment identifies the name of the Application Engine program, the section, the step and the step type. In this example, the SQL comes from Application Engine program GPGB_EDI_PRC, in a section called EDI_PRC, in a SQL-type step called Step01.

For example, if you need to adjust this SQL statement, perhaps to add a hint to it, you don't have to spend time working out where it came from. It is not uncommon to find many similar SQL statements in a program, and where dynamic code is used to generate the SQL, it can be very tricky to find the exact source.

Using aeid.sql

  • You will need to edit the script each time, to specify which Application Engine programs and SQL objects are to be commented. The script is not perfect. It does not handle some very long statements, so you cannot simply instrument the entire system in one go.
  • The script is designed to run in SQL*Plus.
  • It produces a report of statements that it has altered.
  • The script does not commit. You should either commit or rollback the update depending upon whether you are satisfied with the output.
  • I suggest that the script only be run in pre-production environments. The comments will of course be migrated along with any other changes.


Using the TRANSLATE() Function to Detect Numeric Data

When I made the first post to this blog, I had several ideas for future entries.
Those seem to have vanished with the wind, as I could not remember any of them.
Google notebook to the rescue - I have started recording blog ideas.
Now I just need more time to write entries here.

Now that I've made my lame excuse for ignoring this for a few weeks, I will move on to the technical content.

We've recently had a requirement to move some data from one application into another. The apps are similar in purpose, but not identical.

One of the challenges in doing this is to clean up the data into a form that our production app can use.

The source of the data has allowed a revision number for part numbers to be an alphanumeric column in a table. As you well might guess, this 'feature' was somewhat abused.

The data for a single part number may contain numerous rows in a table, each with a different revision number. Some of the revision 'numbers' are numeric, some are alpha.

We are only interested in those that are numeric.

Consider the following example data:

Part#  Revision
-----  --------
123    AA
123    AB
123    AC
123    01
123    02
123    03

The parts we need to import into our system are only those with revisions of 01,02 or 03.

The problem was how to exclude non-numeric revisions in a table.

You might consider the following SQL statement adequate

select * from parts where revision in ('01','02','03',...)

The IN operator could be converted to use a table of known numeric revisions:

select * from parts where revision in (select revision from good_revisions)

That would work, but would require building the table, which I would rather not do.

The data has not all arrived in one fell swoop; the next batch of data could break this method unless the good_revisions table is maintained. Ugh.

Compounding the problem is that the data is not consistent.
The revision could be ' 2' or '2 ' or '02'.

It seemed that this would call for a regular expression.
Had I built the migration schemas in a 10g database, I could have used the REGEXP_INSTR function to find the numeric-only revisions.

As the application is still in 9i, I used a 9i database to build the migration schemas.
Exercising my flawless 20/20 hindsight I realized I should have used 10g for those schemas.

Too late to change now, not enough time to move everything.

The next choice is to use the OWA_PATTERN package to find numeric only revisions.

From a purely functional perspective, this works perfectly.

From a 'waiting around staring at the computer while a job finishes' it was less than perfect.
Using OWA_PATTERN practically guarantees that any SQL it is used in will be slow.

There had to be a faster way.

At that point the TRANSLATE function came to mind.
While I have always found this function rather obtuse, it seemed it might do the job.

The TRANSLATE function accepts a string, a list of characters to locate in the string, and a list of characters to translate them to.

How can that be used to detect numeric only character data?

It works by transforming all numeric characters into a known character, and checking to see if the string consists solely of that character.
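As a quick illustration before the larger test below, here is what that transformation does to a couple of literal values (a sketch you can run from SQL*Plus):

-- 'AB' contains no digits, so it passes through unchanged;
-- '12' is all digits, so every character becomes '|'.
select translate('AB', '0123456789', '||||||||||') non_numeric,
       translate('12', '0123456789', '||||||||||') numeric_only
  from dual;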

An example is the best way to show this.

Create a test table:

create table detect_numeric
as
select cast(chr(a.rownumber) || chr(b.rownumber) as varchar2(2)) fake_number
from
( select rownum rownumber from all_Tables where rownum <= 1000 ) a,
( select rownum rownumber from all_Tables where rownum <= 1000 ) b
where a.rownumber between 48 and 90
and b.rownumber between 48 and 90
/

Here's an example of using OWA_PATTERN to find numeric data:

SQL> l
1 select fake_number
2 from detect_numeric
3 where owa_pattern.amatch(fake_number,1,'^\d+$') > 0
4* order by 1
SQL> /

FA
--
00
01
...
98
99

100 rows selected.

Here's how to do the same thing using the TRANSLATE function

SQL> l
1 select fake_number
2 from detect_numeric
3 where '||' = translate(fake_number,'0123456789','||||||||||')
4* order by 1
SQL> /

FA
--
00
01
...
98
99

100 rows selected.

The TRANSLATE function works by changing all numeric characters to the '|' symbol.
If the returned value is '||' then this must be all numeric data.

There's a caveat with using this method. The character used in the TRANSLATE function must not appear in the data being checked.

This example is simplified in that it does not account for nulls, spaces or varying data lengths.
Nonetheless it works quite well.
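If the data might legitimately contain the '|' character, or might carry leading or trailing spaces, a variation that deletes the digits instead of replacing them sidesteps both issues. This is a sketch against the parts table from the earlier example, not part of the original test:

-- 'x' maps to 'x'; every digit maps to nothing, so an all-digit value
-- translates to an empty string, which Oracle treats as NULL.
select *
  from parts
 where translate(trim(revision), 'x0123456789', 'x') is null
   and trim(revision) is not null;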

Is it faster?

In the test I ran, the TRANSLATE function is two orders of magnitude faster than OWA_PATTERN.

Tom Kyte's run_stats utility was used to compare the run times and resource usage of both methods.

Running both methods in a loop 20 times yielded the following run times:

10.265529 secs
.010235 secs

PL/SQL procedure successfully completed.

OWA_PATTERN required 10 seconds, while the TRANSLATE function did the same thing in 0.01 seconds.

The resource usage was much better for TRANSLATE as well;

SQL> @run_stats

NAME                                              RUN1       RUN2       DIFF
---------------------------------------------- ---------- ---------- ----------
...
LATCH.SQL memory manager workarea list latch         268          0       -268
LATCH.checkpoint queue latch                         640          0       -640
STAT...redo size                                   27764      28404        640
STAT...Elapsed Time                                 1028          3      -1025
STAT...recursive cpu usage                          1029          3      -1026
STAT...session pga memory max                      74048      65536      -8512
STAT...session uga memory                          65464          0     -65464
STAT...session uga memory max                      65464          0     -65464
STAT...session pga memory                          74048          0     -74048

49 rows selected.

As long as I was playing with this, it seemed a good idea to test on 10g as well, using the REGEXP_INSTR function rather than the slow OWA_PATTERN package.

The results were surprising.

While REGEXP_INSTR was very fast, TRANSLATE was still the faster method.

.048662 secs
.008829 secs

PL/SQL procedure successfully completed.

On 10gR2 the TRANSLATE function was still several times faster than REGEXP_INSTR, and a little faster than it had been on 9i.

That should not be too big a surprise, as many PL/SQL optimizations were included in 10gR2, but this was still somewhat unexpected.

------

Niall Litchfield (Niall Litchfield's Blog) wondered why I had not tried using an is_number function such as the one shown below.

create or replace function is_number( chk_data_in varchar2 )
return number
is
   dummy number(38,4);
begin
   dummy := to_number(chk_data_in);
   return 1;
exception
   when value_error then
      return 0;
   when others then
      raise;
end;
/

Quite frankly, I forgot about it.

Here are timings for both 9i and 10g. As expected, TRANSLATE is still quite a bit faster.

9i:

SQL> @th3
.092713 secs
.009951 secs

PL/SQL procedure successfully completed.

10gR2:

SQL> @th3
.362097 secs
.008479 secs

PL/SQL procedure successfully completed.

Shell Tricks

DBAs from time to time must write shell scripts. If your environment is strictly Windows based, this article may hold little interest for you.

Many DBAs however rely on shell scripting to manage databases. Even if you use OEM for many tasks, you likely use shell scripts to manage some aspects of DBA work.

Lately I have been writing a number of scripts to manage database statistics: gathering, deleting, and importing and exporting, both to and from statistics tables and exp files.

Years ago I started using the shell builtin getopts to gather arguments from the command line. A typical use might look like the following:

while getopts d:u:s:T:t:n: arg
do
   case $arg in
      d) DATABASE=$OPTARG
         echo DATABASE: $DATABASE;;
      u) USERNAME=$OPTARG
         echo USERNAME: $USERNAME;;
      s) SCHEMA=$OPTARG
         echo SCHEMA: $SCHEMA;;
      T) TYPE=$OPTARG
         echo TYPE: $TYPE;;
      t) TABLE_NAME=$OPTARG;;
         #echo TABLE_NAME: $TABLE_NAME
      n) OWNER=$OPTARG
         echo OWNER: $OWNER;;
      *) echo "invalid argument specified"; usage; exit 1;;
   esac
done

In this example, the valid arguments are -d, -u, -s, -T, -t and -n. All of these arguments require a value.

The command line arguments might look like this:
somescript.sh -d orcl -u system -s scott

If an invalid argument such as -z is passed, the script will exit with the exit code set to 1.

For the script to work correctly, some checking of the arguments passed to the script must be done.

For this script, the rules are as follows:

  • -d and -u must always be set
  • -s must be set if -T is 'SCHEMA'
  • -t and -n must both have a value or be blank
  • -s must be used with -T

In this example, values for -T other than 'SCHEMA' are not being checked.

The usual method (at least for me) to test the validity of command line arguments has always been to use the test (or [ ]) operator with combinations of arguments. For the command line arguments just discussed, the tests might look like the following:
[ -z "$DATABASE" -o -z "$USERNAME" ] && {
echo Database or Username is blank
exit 2
}

# include schema name if necessary
[ "$TYPE" == 'SCHEMA' -a -z "$SCHEMA" ] && {
echo Please include schema name
exit 3
}

# both owner and tablename must have a value, or both be blank
[ \( -z "$TABLE_NAME" -a -n "$OWNER" \) -o \( -n "$TABLE_NAME" -a -z "$OWNER" \) ] && {
echo Please specify both owner and tablename
echo or leave both blank
exit 4
}

# if -s is set, so must -T
[ -n "$SCHEMA" -a -z "$TYPE" ] && {
echo Please include a type with -T
exit 5
}

As you can see, there are a fair number of tests involved to determine the validity of the command line arguments. You may have guessed why I skipped one for this demo - I just did not want to write any more tests.

Validating command line arguments really gets difficult with a larger number of possible arguments. Worse yet, any later modification to the script that requires a new command line argument becomes a dreaded task that is put off as long as possible due to the complexity of testing the validity of the arguments.

While writing a script that had 11 possible arguments, I was dreading writing the command line argument validation section, and I thought there must be a better way. It seemed that there must be a simple method of using regular expressions to validate combinations of command line arguments. I had never seen this done, and after spending a fair bit of time googling the topic it became apparent that there was no code available for a cut-and-paste solution, so it seemed a nice opportunity to be innovative.

After experimenting a bit, I found what I think is a better way. The method I use is to concatenate all possible command line arguments into a ':' delimited string, and then use a set of pre-prepared regexes to determine whether or not the command line arguments are valid. One immediately obvious drawback to this method is that arguments containing the ':' character cannot be used; however, the delimiting character can easily be changed if needed.

Using the same example as before, the command line arguments are all concatenated into a string and converted to upper case:
ALLARGS=":$USERNAME:$DATABASE:$OWNER:$TABLE_NAME:$SCHEMA:$TYPE:"
# upper case args
ALLARGS=$(echo $ALLARGS | tr "[a-z]" "[A-Z]")

Next, a series of regular expressions is created. The first two are generic and may or may not be used as building blocks for other regular expressions. The others each correspond to a specific command line argument.


# alphanumeric only, at least 1 character
export ALNUM1="[[:alnum:]]+"
# alphanumeric only, at least 3 characters
export ALNUM3="[[:alnum:]]{3,}"
# username - alphanumeric only at least 3 characters
export USER_RE=$ALNUM3
# database - alphanumeric only at least 3 characters
export DATABASE_RE=$ALNUM3
# owner - alphanumeric and _ and $ characters
export OWNER_RE='[[:alnum:]_$]+'
# table_name - alphanumeric and _, # and $ characters
export TABLE_RE='[[:alnum:]_#$]+'
# schema - alphanumeric and _ and $ characters
export SCHEMA_RE='[[:alnum:]_$]+'

These regular expressions could use further refinement (such as requiring the username to start with an alphabetic character), but they are sufficient for this demonstration.

Next, the regular expressions are concatenated into ':' delimited strings, with each possible command line argument represented either by its corresponding regex or by null. The regexes are stuffed into a bash array. For our example, it looks like this:
#   :   user        :  db           :  owner        :  table     : schema        : type
VALID_ARGS=(
":$USER_RE:$DATABASE_RE:$OWNER_RE:$TABLE_RE::(DICTIONARY_STATS|SYSTEM_STATS|FIXED_OBJECTS_STATS):" \
":$USER_RE:$DATABASE_RE::::(DICTIONARY_STATS|SYSTEM_STATS|FIXED_OBJECTS_STATS):" \
":$USER_RE:$DATABASE_RE:$OWNER_RE:$TABLE_RE:$SCHEMA_RE:(SCHEMA):" \
":$USER_RE:$DATABASE_RE:::$SCHEMA_RE:SCHEMA:")

Notice that there are four different combinations of command line arguments represented.

In all cases the USERNAME and DATABASE are required and must correspond to the regex provided.
In the first combination of arguments, the owner and table must also be specified, and type (-T) must be either one of DICTIONARY_STATS, SYSTEM_STATS or FIXED_OBJECTS_STATS.
In the second possible combination, the only argument allowed in addition to DATABASE and USERNAME is the type (-T) argument.
The third combination requires the OWNER, TABLE_NAME and SCHEMA argument to have a valid value, and the TYPE argument must be set to SCHEMA.
The final combination of arguments requires just the SCHEMA argument and the TYPE argument must be set to SCHEMA, in addition to the USERNAME and DATABASE arguments.
By now you likely want to know just how these regular expressions are tested. The following function is used to test the command line arguments against each regular expression:
function validate_args {
   typeset arglist
   arglist=$1

   while shift
   do
      [ -z "$1" ] && break
      if [ $(echo $arglist | grep -E $1) ]; then
         return 0
      fi
   done
   return 1
}

Here's how it is used in the script:

# VALID_ARGS must NOT be quoted or it will appear as a single arg in the function
validate_args $ALLARGS ${VALID_ARGS[*]}

While this method may appear somewhat confusing at first, it becomes less so after using it a few times. It greatly simplifies the use of many command line arguments that may appear in differing combinations.
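Because validate_args simply returns 0 for a match and 1 otherwise, the caller can act on its exit status directly. Here is a small sketch of how that check might look in the calling script (usage is the same helper function referenced in the getopts example above):

# Bail out if no valid combination of arguments matched.
if ! validate_args $ALLARGS ${VALID_ARGS[*]}
then
   echo "invalid combination of command line arguments"
   usage
   exit 1
fi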
As far as I know, this method only works properly with the bash shell. I have done testing on only two shells, bash and ksh. It does not work properly on ksh.
Here's a demonstration of the ksh problem. The following script is run from both ksh and bash:
function va {

echo ARG1: $1
}


R1="[[:alnum]]+"
R2="[[:alnum]]{3,}"

va $R1
va $R2
And here are the results:

18:9-jkstill-18 > ksh t3
ARG1: [[:alnum]]+
ARG1: [[:alnum]]3
[ /home/jkstill/bin ]

jkstill-18 > bash t3
ARG1: [[:alnum]]+
ARG1: [[:alnum]]{3,}
[ /home/jkstill/bin ]


Notice that when the script is run with ksh, the '{', '}' and ',' are removed from the regular expression. I could find no combination of quoting and escape characters that could prevent that from happening. This method of command line argument validation could be made to work using ksh if those characters are not used in the regexes. That would be rather limiting though.

One other drawback you may have noticed with this method of validating command line arguments is that when an error condition is encountered, the exit code is always 1. With the [] method it was easy to exit with different codes to indicate the nature of the error. Something similar could likely be done by embedding a code into each set of regexes, but I will leave that as an exercise for the reader.
The complete prototype script, as well as a test script can be downloaded:
The next article will include a set of functions used along with the validate_args() function to make shell scripts a bit more robust.

Oracle 11.2 Grid Infrastructure Installation

So Oracle 11gR2 was released earlier this week, and I am trying to install it on a 4-node cluster. I had previously installed the beta release on the same hardware/operating system configuration, so I was reasonably confident that a similar Advanced installation would work first time.

All went well until the final CVU check, which is the last step of the installation. Oracle 11.2 requires a SCAN (Single Client Access Name) to be specified. Three SCAN addresses are required for each cluster, irrespective of the number of nodes, and the SCAN addresses are maintained as VIPs by the Clusterware. Instead of connecting to listener VIPs, in Oracle 11.2 applications should connect to the SCAN VIPs, which then redirect them to the listeners.

In the beta it was possible to configure the SCAN addresses in /etc/hosts. For example I have been using:

192.168.2.97 cluster3-scan.juliandyke.com cluster3
192.168.2.98 cluster3-scan.juliandyke.com cluster3
192.168.2.99 cluster3-scan.juliandyke.com cluster3

Probably incorrect from a networking perspective, but enough to get through the installation procedure.

In the production version, CVU checks that the SCAN address is set in DNS. Tracing the CVU shows that it looks in /etc/nsswitch.conf and then runs nslookup. By default /etc/nsswitch.conf includes

hosts: files dns

which means: look in /etc/hosts first and, if the name is not found there, go to DNS. CVU does check /etc/nsswitch.conf, but only to grumble if the hosts entry does not include dns. If the dns entry is present, CVU runs nslookup; if it is not, CVU does not run nslookup, but it does not fall back to /etc/hosts either.

This means that, at present, DNS must be configured for RAC installations. This is going to be a bit frustrating for DBAs who just want to try installing RAC on a test system or in a VM environment, as they are going to have to get to grips with DNS or find a workaround.
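For reference, the DNS side of this amounts to three A records for the single SCAN name, answered round-robin. Using the addresses from my /etc/hosts attempt above, the zone entries would look something like the following (illustrative BIND-style records; the exact zone layout depends on your DNS server):

; Three A records for the one SCAN name, returned in round-robin order
cluster3-scan.juliandyke.com.    IN A    192.168.2.97
cluster3-scan.juliandyke.com.    IN A    192.168.2.98
cluster3-scan.juliandyke.com.    IN A    192.168.2.99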

DNS is also required for the GNS (Typical) installation, so for now it looks unavoidable.

I just wanted the screenshots to be error-free, so my workaround was:

cd /u01/app/11.2.0/grid/bin
mv cluvfy cluvfy.bak

create a new file called cluvfy.c containing:

main ()
{
return 0;
}

then build it in place with make's default rule:

make cluvfy

Retry the CVU check, which then completes successfully, and finally restore the original binary:

mv cluvfy.bak cluvfy

I guess I could have used Perl or bash to achieve the same effect.

ASM Dynamic Volume Manager and ASM Clustered File System

Two of the top features in 11gR2 are the ASM Dynamic Volume Manager (ADVM) and ASM Clustered File System (ACFS). What is the big deal about these two?

ADVM allows you to create a volume from an ASM diskgroup. Here is an example where we created a volume called asm_vol1 of 100 MB on a diskgroup called DATA:


ASMCMD [+] > volcreate -G DATA -s 100M asm_vol1

Internally it issues the command


alter diskgroup DATA add volume 'asm_vol1' size 100M;

Now you enable the volume you just created:


ASMCMD [+] > volenable -G DATA asm_vol1

Internally it issues:

alter diskgroup DATA enable volume 'asm_vol1';

You can perform other operations such as resize, delete and disable as well (a quick sketch follows); more on those later in a full-length article.
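For completeness, here is roughly what those follow-on operations look like from asmcmd, using the same volume as above; treat the exact options as a sketch to verify against the asmcmd help on your version:

ASMCMD [+] > volresize -G DATA -s 200M asm_vol1
ASMCMD [+] > voldisable -G DATA asm_vol1
ASMCMD [+] > voldelete -G DATA asm_vol1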

Now that the volume is created, what can you do with it? Well, like all volumes, you can create a filesystem on it. Here is an example of creating a filesystem called acfs1:


[root@oradba2 ~]# mkdir /acfs1
[root@oradba2 ~]# /sbin/mkfs -t acfs /dev/asm/asm_vol1-207
mkfs.acfs: version = 11.2.0.1.0.0
mkfs.acfs: on-disk version = 39.0
mkfs.acfs: volume = /dev/asm/asm_vol1-207
mkfs.acfs: volume size = 268435456

Register the mount point:


[root@oradba2 ~]# /sbin/acfsutil registry -a -f /dev/asm/asm_vol1-207 /acfs1
acfsutil registry: mount point /acfs1 successfully added to Oracle Registry

If you get an error, use the force option:


[root@oradba2 /]# /sbin/mkfs.acfs -f /dev/asm/asm_vol1-207
mkfs.acfs: version = 11.2.0.1.0.0
mkfs.acfs: on-disk version = 39.0
mkfs.acfs: volume = /dev/asm/asm_vol1-207
mkfs.acfs: volume size = 268435456
mkfs.acfs: Format complete.


Now you mount the filesystem:


[root@oradba2 /]# /bin/mount -t acfs /dev/asm/asm_vol1-207 /acfs1

Now if you check the filesystems, you will notice a new one: /acfs1


[root@oradba2 /]# df -k
Filesystem            1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      149313708  22429752 119176924  16% /
/dev/sda1                101086     11765     84102  13% /boot
tmpfs                    907628    591640    315988  66% /dev/shm
/dev/asm/asm_vol1-207
                         262144     37632    224512  15% /acfs1

This new filesystem is actually carved out of the ASM diskspace. This can be used as a regular filesystem:


[root@oradba2 /]# cd /acfs1
[root@oradba2 acfs1]# ls
lost+found
[root@oradba2 acfs1]# touch 1

So, what's the big deal about it? Plenty.

First, this is part of ASM management, so all the bells and whistles of ASM - such as asynchronous I/O - apply to this filesystem.

Second, this is now a "cluster" filesystem; it is visible across a cluster. So, now you have a fully functional generic filesystem visible across the cluster.
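For instance, once the mount point is in the ACFS registry, the same filesystem can be mounted on the other nodes of the cluster too. A sketch, assuming a second node named oradba1 (a hypothetical hostname) that sees the same ADVM volume device:

[root@oradba1 ~]# /bin/mount -t acfs /dev/asm/asm_vol1-207 /acfs1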

Third, this is now protected by the Grid infrastructure, if you have installed it. Remember from my earlier posts that in 11gR2 you can now have a restartable grid infrastructure even on a single instance.

More on ASM Dynamic Volume Manager later in a full length article. But I hope this makes you really interested.