Optimisation

Here’s a recent request from the OTN database forum – how do you make this query go faster (tkprof output supplied):

 select a.rowid
   from  a, b
   where A.MARK IS NULL
     and a.cntry_code = b.cntry_code and b.dir_code='XX' and b.numb_type='XXX'
     and upper(Trim(replace(replace(replace(replace(replace(replace(replace(a.co_name,'*'),'&'),'-'),'/'),')'),'('),' '))) like
         upper(Trim(substr(replace(replace(replace(replace(replace(replace(replace(b.e_name,'*'),'&'),'-'),'/'),')'),'('),' '),1,25)))||'%';

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        3   3025.53    3260.11       8367       7950          0          31
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        6   3025.54    3260.13       8367       7950          0          31

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 74

Rows     Row Source Operation
-------  ---------------------------------------------------
     31  HASH JOIN
 302790   INDEX FAST FULL SCAN OBJ#(39024) (object id 39024)   -- B 500,000 in the table
  55798   TABLE ACCESS FULL OBJ#(78942)                        -- A 175,000 in the table



-- and from some "explain plan" tool
SELECT STATEMENT  CHOOSE Cost: 333  Bytes: 52,355,940  Cardinality: 608,790 
3 HASH JOIN  Cost: 333  Bytes: 52,355,940  Cardinality: 608,790 
  1 INDEX FAST FULL SCAN UNIQUE B_PK Cost: 4  Bytes: 503,022  Cardinality: 12,898 
    2 TABLE ACCESS FULL A Cost: 215  Bytes: 3,150,034  Cardinality: 67,022

One thing you might note from the spartan tkprof output – this is an old version of Oracle (9.2.0.1 to be exact).

The first thing to note is that most of the time is spent on the CPU (3,025 of the 3,260 seconds elapsed), and maybe that deeply nested replace() has something to do with it. Now replace() and translate() are things I use so rarely that I usually get them wrong first time, but I think the predicate could be replaced by:

upper(translate(a.co_name, 'x*&-/)( ', 'x')) like upper(substr(translate(b.e_name, 'x*&-/)( ', 'x'),1,25))||'%'
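As a sanity check on that rewrite, here is a minimal Python sketch of the two approaches; it models the string logic only, not Oracle itself. The function names and sample strings are made up for illustration. (The dummy 'x' in the SQL is there because Oracle's translate() returns NULL if the third argument is an empty string; Python's deletion table doesn't need that trick.)

```python
# Characters stripped by the original nested replace() calls, including the space.
STRIP_CHARS = "*&-/)( "

def normalise_with_replace(name: str) -> str:
    """Model of upper(trim(replace(replace(...)))) from the original query."""
    for ch in STRIP_CHARS:
        name = name.replace(ch, "")
    # Oracle's trim() removes spaces only, so strip spaces explicitly.
    return name.strip(" ").upper()

# str.maketrans("", "", chars) builds a table that deletes every listed character.
DELETE_TABLE = str.maketrans("", "", STRIP_CHARS)

def normalise_with_translate(name: str) -> str:
    """Model of upper(translate(name, 'x*&-/)( ', 'x'))."""
    return name.translate(DELETE_TABLE).upper()

# Hypothetical sample values, not taken from the real tables.
samples = ["Smith & Jones (UK) Ltd", "  A-B/C * D  ", "plain", ""]
for s in samples:
    assert normalise_with_replace(s) == normalise_with_translate(s)
```

Since every space has already been removed by the time trim() is called, the trim() has nothing left to do, which is why it can be dropped from the translate() version.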

Beyond making the code slightly less eye-boggling, though, I don’t think this is going to help much. Consider the information we have about the sizes of the rowsources involved.

If we can trust the tkprof row counts as being the complete output from the first execution of the statement (there seem to have been 2 in the trace file), we selected roughly 300,000 rows from B (through an index fast full scan) and 56,000 rows from A and then joined them with a hash join. A hash join requires equality predicates, and the only join predicate in the query that could be used is the one “a.cntry_code = b.cntry_code”.

Now, if cntry_code is short for “country code” we have a scaling problem: there are only a few hundred countries in the world, so even if we generously allow for 600 distinct codes, on average each row in the A table (56,000 rows acquired) is going to find roughly 500 rows in the B table (300,000 rows divided across 600 countries). So at run time the hash join will generate a rowsource of at least 56,000 * 500 = 28 million rows; then Oracle is going to do that complicated bit of textual manipulation on two columns, compare them, and find that ultimately only 31 rows match!
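The arithmetic can be written out as a small sketch. It assumes the rounded tkprof row counts and an even spread of rows across country codes; the function name is made up for illustration.

```python
def estimated_join_rows(rows_a: int, rows_b: int, distinct_codes: int) -> int:
    """Rows produced by joining on cntry_code alone, assuming rows are
    spread evenly across the distinct codes (a simplifying assumption)."""
    matches_per_a_row = rows_b // distinct_codes
    return rows_a * matches_per_a_row

rows_a = 56_000    # rows acquired from A (tkprof row count, rounded)
rows_b = 300_000   # rows acquired from B (tkprof row count, rounded)

print(estimated_join_rows(rows_a, rows_b, 600))  # 28000000
print(estimated_join_rows(rows_a, rows_b, 200))  # 84000000
```

Fewer distinct codes means more matches per row, so if the data holds fewer than 600 codes the intermediate rowsource only gets bigger.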

So how can we do less work?

If we’re lucky we can make the hash join much more efficient by thinking about what that nasty textual predicate means. We compare to see if one string looks like it’s starting with the first 25 characters of the other string – but if it does then the two strings have to be identical on the first 25 characters, and a hash join works with equality. So let’s just add in a new predicate to see what effect it has:

upper(substr(translate(a.co_name, 'x*&-/)( ', 'x'),1,25)) = upper(substr(translate(b.e_name, 'x*&-/)( ', 'x'),1,25))
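To see why the extra predicate should help, here is a rough Python model of a hash join that builds on B and probes with A, once keyed on the country code alone and once with the 25-character prefix added to the key. The rows, names and function are hypothetical, and this is a sketch of the idea rather than of Oracle's implementation. One assumption to flag: the equality only follows from the LIKE test when the stripped b.e_name is at least 25 characters long (a shorter e_name can match a longer co_name without the two 25-character prefixes being equal), and it also assumes e_name contains no LIKE wildcards such as '%' or '_'. Both are worth checking against the real data.

```python
from collections import defaultdict

STRIP = str.maketrans("", "", "*&-/)( ")

def norm(name):
    """Model of upper(translate(name, 'x*&-/)( ', 'x'))."""
    return name.translate(STRIP).upper()

def join_pairs(a_rows, b_rows, use_prefix_key):
    """Hash join on cntry_code, optionally adding the 25-character prefix
    to the hash key. Returns (candidate pairs examined, matching pairs)."""
    build = defaultdict(list)
    for code, e_name in b_rows:
        prefix = norm(e_name)[:25]
        key = (code, prefix) if use_prefix_key else code
        build[key].append(prefix)
    candidates = 0
    matches = []
    for code, co_name in a_rows:
        full = norm(co_name)
        key = (code, full[:25]) if use_prefix_key else code
        for prefix in build.get(key, []):
            candidates += 1
            if full.startswith(prefix):  # stands in for LIKE prefix || '%'
                matches.append((co_name, prefix))
    return candidates, matches

# Hypothetical rows: (cntry_code, name). Every e_name is 25+ characters once stripped.
b_rows = [
    ("GB", "International Widget Manufacturing Co"),
    ("GB", "Consolidated Amalgamated Holdings & Sons"),
    ("FR", "International Widget Manufacturing Co"),
]
a_rows = [
    ("GB", "International Widget Manufacturing Company (UK)"),
    ("GB", "Unrelated Trading Partners of Great Britain"),
]

cands_code_only, matches_code_only = join_pairs(a_rows, b_rows, False)
cands_with_prefix, matches_with_prefix = join_pairs(a_rows, b_rows, True)
assert matches_code_only == matches_with_prefix
print(cands_code_only, cands_with_prefix)  # 4 1
```

On this toy data both versions find the same single match, but the composite key examines one candidate pair instead of four; the hope is that the same effect, scaled up, removes most of the 28 million intermediate rows.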

I’ve made the suggestion on the forum; now I’m waiting to see if it has a beneficial effect (or whether I’ve made a silly mistake in my logic or guesswork).