Oakies Blog Aggregator

Bitmap Efficiency

An interesting observation came up on the Oracle-L list server a few days ago that demonstrated how clever the Oracle software is at minimising run-time work, and how easy it is to think you know what an execution plan means when you haven’t actually thought through the details – and the details might make a difference to performance.

The original question was about a very large table with several bitmap indexes, and an anomaly that appeared as a query changed its execution plan.  Here are the critical sections from the plans (extracted from memory with rowsource execution statistics enabled):

--------------------------------------------------------------------------------------------------------
|  Id |Operation                        | Name       | Starts | E-Rows | A-Rows |     A-Time | Buffers |
--------------------------------------------------------------------------------------------------------
|   6 |    TABLE ACCESS BY INDEX ROWID  |       FACT |      1 |      1 |     24 |00:00:00.01 |      31 |
|   7 |     BITMAP CONVERSION TO ROWIDS |            |      1 |        |     24 |00:00:00.01 |       7 |
|   8 |      BITMAP AND                 |            |      1 |        |      1 |00:00:00.01 |       7 |
|*  9 |       BITMAP INDEX SINGLE VALUE |     FACT_0 |      1 |        |      1 |00:00:00.01 |       3 |
|* 10 |       BITMAP INDEX SINGLE VALUE |  FACT_DIM1 |      1 |        |      4 |00:00:00.01 |       4 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
     9 - access("FACT"."C0"=243001)
    10 - access("FACT"."C1"="DIMENSION1"."ID")


-------------------------------------------------------------------------------------------------------
|  Id | Operation                      | Name       | Starts | E-Rows | A-Rows |     A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
|   7 |    BITMAP CONVERSION TO ROWIDS |            |      5 |        |      8 |00:00:00.01 |     119 |
|   8 |     BITMAP AND                 |            |      5 |        |      1 |00:00:00.01 |     119 |
|*  9 |      BITMAP INDEX SINGLE VALUE |  FACT_DIM1 |      5 |        |     20 |00:00:00.01 |      28 |
|* 10 |      BITMAP INDEX SINGLE VALUE |  FACT_DIM2 |      5 |        |    140 |00:00:00.01 |      78 |
|* 11 |      BITMAP INDEX SINGLE VALUE |     FACT_0 |      5 |        |      5 |00:00:00.01 |      13 |
|  12 |   TABLE ACCESS BY INDEX ROWID  |       FACT |      8 |      1 |      8 |00:00:00.01 |       8 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
     9 - access("FACT"."C1"="DIMENSION1"."ID")
    10 - access("FACT"."C2"="DIMENSION2"."ID")
    11 - access("FACT"."C0"=243001)

The first plan shows the steps leading to a single access (Starts = 1) to the FACT table after combining two bitmap indexes; the second shows the second child of a nested loop join where Oracle has combined three bitmap indexes to access the FACT table – operation 7 (and its descendants) executes 5 times in this case. I’ve included the related parts of the predicate section so that you can see that the predicates at operations 9 and 10 of the first plan are the same as the predicates at operations 9 and 11 of the second plan.

So here’s the question – if one access to fact_dim1 requires 4 buffer visits, why does it take 28 buffer visits to do the same thing 5 times (and with the same value every time)? Conversely, if one access to fact_0 requires 3 buffer visits, why do 5 visits to do the same thing take only 13 buffer visits? (Note: the arithmetic is made a little more obscure by the way in which index branch blocks may be pinned during nested loop joins.)

Then there’s a further question – not visible in the plan: the A-Rows of a “BITMAP INDEX SINGLE VALUE” operation is the number of bitmap chunks in the rowsource, and we can see that the key values for index fact_dim2 have a significant number of bitmap chunks per key (5 executions returned 140 bitmap chunks). This scale, though, applies to all three indexes – in fact a follow-up email pointed out that a typical key value in EVERY ONE of the three indexes consisted of about 100 bitmap chunks, so why can’t we see those hundreds in the execution plan?

So this is where we’re at: we have an execution plan where we haven’t visited all the bitmap chunks for a bitmap key, and the order in which the bitmap indexes are used in the plan seems to have some effect on the choice of leaf blocks you visit when accessing the chunks. So (a) could a change in the order of indexes make a significant difference to the number of bitmap chunks you visit and the resulting performance, and (b) is there a way to control the order in which you visit the indexes? That’s where the note starts to get a bit technical – if you don’t want to read any more, the answers are: (a) yes, but probably not significantly, and (b) yes.

Demo

To investigate what goes on inside a “BITMAP AND” I created a table with two bitmap indexes and used a very large setting for pctfree for the indexes so that they had to be stored with a large number of bitmap chunks per key. Here’s the code that I used, with some results from an instance of 12.1.0.2:


create table people
nologging
as
with generator as (
        select  --+ materialize 
                rownum id 
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        mod(rownum-1, 1e2)      id_town_home,
        trunc((rownum-1)/1e4)   id_town_work,
        rpad('x',10,'x')        small_vc,
        rpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'PEOPLE',
                method_opt       => 'for all columns size 1'
        );
end;
/

create bitmap index pe_home on people(id_town_home) nologging pctfree 95;
create bitmap index pe_work on people(id_town_work) nologging pctfree 95;

select
        index_name, distinct_keys, num_rows, leaf_blocks, avg_leaf_blocks_per_key
from
        user_indexes
where
        table_name = 'PEOPLE'
order by
        index_name
;


INDEX_NAME           DISTINCT_KEYS   NUM_ROWS LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY
-------------------- ------------- ---------- ----------- -----------------------
PE_HOME                        100      30399       15200                     152
PE_WORK                        100       1800         907                       9

As you can see I’ve generated two columns (id_town_home, id_town_work) with 100 distinct values and 10,000 rows each, but with very different data distributions – the rows for any given value for id_town_home are uniformly spread across the entire table, every hundredth row; while the rows for any given value of id_town_work are very tightly clustered as a group of 10,000 consecutive rows. As a consequence the index entry (bitmap string) for a typical key value for id_town_home is enormous and has to be broken into 304 chunks spread across 152 leaf blocks (2 index entries per leaf block), while the index entry for a typical key value for id_town_work is much shorter, but still requires 18 chunks spread across 9 leaf blocks.
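A quick sanity check of that arithmetic (a sketch using only the statistics shown above – remember that num_rows for a bitmap index counts index entries, i.e. bitmap chunks, not table rows):

select
        index_name,
        round(num_rows    / distinct_keys)      chunks_per_key,
        round(leaf_blocks / distinct_keys)      leaf_blocks_per_key
from
        user_indexes
where
        table_name = 'PEOPLE'
order by
        index_name
;

INDEX_NAME           CHUNKS_PER_KEY LEAF_BLOCKS_PER_KEY
-------------------- -------------- -------------------
PE_HOME                         304                 152
PE_WORK                          18                   9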

So what will I see if I run the following query, and force it to use a BITMAP AND of the two indexes, in the two different orders:

select
        /*+ index_combine(pe) */
        max(small_vc)
from
        people pe
where
        id_town_home = 50
and     id_town_work = 50
;

Based on a very simple interpretation of the typical execution plan and using the index stats shown above we might expect to see roughly A-Rows = 18 with 9 buffer gets (plus a few more for segment headers and branch blocks) on the id_town_work index and A-Rows = 304 with 152 buffer gets on the id_town_home index to allow Oracle to generate and compare the two bit strings – but here are the two plans with their execution stats, generated in 12.1.0.2, and each run after flushing the buffer cache:

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |         |      1 |        |      1 |00:00:00.01 |     118 |    117 |
|   1 |  SORT AGGREGATE                      |         |      1 |      1 |      1 |00:00:00.01 |     118 |    117 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| PEOPLE  |      1 |    100 |    100 |00:00:00.01 |     118 |    117 |
|   3 |    BITMAP CONVERSION TO ROWIDS       |         |      1 |        |    100 |00:00:00.01 |      18 |     17 |
|   4 |     BITMAP AND                       |         |      1 |        |      1 |00:00:00.01 |      18 |     17 |
|*  5 |      BITMAP INDEX SINGLE VALUE       | PE_WORK |      1 |        |     18 |00:00:00.01 |      14 |     13 |
|*  6 |      BITMAP INDEX SINGLE VALUE       | PE_HOME |      1 |        |      4 |00:00:00.01 |       4 |      4 |
-------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |         |      1 |        |      1 |00:00:00.01 |     122 |    120 |
|   1 |  SORT AGGREGATE                      |         |      1 |      1 |      1 |00:00:00.01 |     122 |    120 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| PEOPLE  |      1 |    100 |    100 |00:00:00.01 |     122 |    120 |
|   3 |    BITMAP CONVERSION TO ROWIDS       |         |      1 |        |    100 |00:00:00.01 |      22 |     20 |
|   4 |     BITMAP AND                       |         |      1 |        |      1 |00:00:00.01 |      22 |     20 |
|*  5 |      BITMAP INDEX SINGLE VALUE       | PE_HOME |      1 |        |      5 |00:00:00.01 |       8 |      7 |
|*  6 |      BITMAP INDEX SINGLE VALUE       | PE_WORK |      1 |        |     18 |00:00:00.01 |      14 |     13 |
-------------------------------------------------------------------------------------------------------------------

We have NOT touched anything like the entire bit-string for the id_town_home index – a bit-string that spans 152 leaf blocks! Clearly Oracle is doing something clever to minimise the work, and it’s so clever that switching the order of these two extremely different indexes in the plan has made virtually no difference to the work done. Obviously I can’t tell you exactly what the code is doing, but I think I can produce a reasonable guess about what’s going on.

The pe_work index has the smaller number of leaf blocks per key, which makes it the better starting choice for the AND in this case, so the optimizer’s default starting action was to pick the first couple of chunks of that index key value; and Oracle immediately sees that the first rowid that it could possibly need in its result set is roughly in the middle of the table – remember that the “key” columns of a bitmap index are (real_key, first_rowid_of_chunk, last_rowid_of_chunk, compressed_bitstring).

Since it now knows the lowest possible rowid that it could need, Oracle can probe the pe_home index by (id_town_home=50, {target_rowid}) – which will let it go to a bitmap index chunk that’s roughly in the middle of the full range of 152. Then Oracle can expand the bitstrings from the chunks it has, reading new chunks as needed from each of the indexes until the 18 chunks / 9 leaf blocks from the pe_work index have been used up (and that range would have aligned with just two or three chunks from the pe_home index), at which point Oracle can see there are no more rows in the table that could match both predicates, so it doesn’t need to read the next 75 chunks of the pe_home index.

Conversely, when I forced Oracle to use the (inappropriate) pe_home index first, it read the first couple of chunks, then read the first couple of chunks of the pe_work index, at which point it discovered that it didn’t need any of the pe_home index prior to (roughly) chunk 75, so it jumped straight to the right chunk to align with pe_work and carried on from there. That’s why the forced, less efficient, plan that visited pe_home first visited just a couple more leaf blocks than the plan the optimizer selected for itself.

Bottom line on performance (tl;dr) – Oracle is sufficiently smart about checking the start and end ranges on bitmap indexes (rather than arbitrarily expanding the entire bitmap for each key) that even for very large bitmap index entries it will probably only access a couple of “redundant” leaf blocks per index, even if it picks the worst possible order for using the indexes. You’re far more likely to notice Oracle picking the wrong indexes (because you know the data better) than you are to spot it using the right indexes in the wrong order – and given that bitmap indexes tend to be relatively small and well buffered (compared to the tables), and given the relatively large number of rows we pick by random I/O from fact tables, a little extra work in the bitmap indexes is unlikely to make a significant difference to the performance of most queries.

Closing fact: in the unlikely circumstances that you do spot the special case where it will make a difference (and it will probably be a difference in CPU usage) then you can dictate the order of the indexes with the undocumented bitmap_tree() hint.  I may get round to writing up the variations one day but, for this simple case, the index_combine() hint that I used to force the BITMAP AND turned into the following bitmap_tree() hint in the outline:

bitmap_tree(@sel$1 pe@sel$1 and((people.id_town_work) (people.id_town_home)))

bitmap_tree( @query_block     table_name@query_block     and( ({first index definition}) ({second index definition}) ) )

Obviously not suitable to throw into production code casually – check with Oracle support if you think it’s really necessary – but if you wanted to reverse the order of index usage in this case you could just swap the order of the index definitions. If you thought there was a third index that should be used you could include its definition (note that it’s table_name.column_name – the index definition – in the brackets).
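For example, here’s a sketch (not something to copy blindly into production) of the hint applied inline to reverse the index order for my query – a bare statement’s only query block defaults to the name sel$1, so the outline text can be reused directly with the two index definitions swapped:

select
        /*+
                bitmap_tree(@sel$1 pe@sel$1 and((people.id_town_home) (people.id_town_work)))
        */
        max(small_vc)
from
        people pe
where
        id_town_home = 50
and     id_town_work = 50
;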

My reference: bitmap_control_02.sql

(Not So) Internal Dialogue

When I wrote about rehearsals in my public speaking tips series, I mentioned talking through small sections of the presentations as a means for rehearsals. I do this a lot! I live on my own, so this is not an internal dialogue. I say this stuff out loud.

This morning I was talking through some ideas as I left the house and cleared the ice off the car. I continued during the journey to work, including when I got out of the car to get a coffee from the Costa Express at the garage. Even as I was unlocking the office door.

If you happen to see me doing this in the street, I’m not talking to an imaginary friend. It’s an imaginary audience. If you’re gonna do crazy, you might as well do it at scale… :)

Cheers

Tim…



FIRST_ROWS vs FIRST_ROWS_n

You might be thinking that FIRST_ROWS is pretty much the same as FIRST_ROWS_1, since both aim to get the first row back as quickly as possible. However, there are some important distinctions to make here. The traditional FIRST_ROWS optimizer mode is not a genuine cost-based option; it is more a hybrid between rule-based and cost-based optimization. This can be observed via the following example:



SQL> create table T
  2  as select rownum x, rpad(rownum,100,'x') padding
  3  from dual
  4  connect by level <= 2000
  5  order by dbms_random.value;

Table created.

SQL>
SQL> create index IX on T ( x ) ;

Index created.

SQL> set autotrace traceonly explain
SQL>
SQL> select * from T where x < 100;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    99 | 10395 |    12   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |    99 | 10395 |    12   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("X"<100)

SQL> alter session set optimizer_mode = first_rows;

Session altered.

SQL>
SQL> set autotrace traceonly explain
SQL> select * from T where x < 100;

Execution Plan
----------------------------------------------------------
Plan hash value: 3947747388

--------------------------------------------------------------------------------------------
| Id  | Operation                           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |      |    99 | 10395 |    99   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T    |    99 | 10395 |    99   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IX   |    99 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("X"<100)

Notice that the optimizer has chosen a path with a HIGHER cost, so in the case of FIRST_ROWS the cost is not the sole determining factor: there are some heuristics in place to bias the optimizer toward index access.

You could quite naturally assume that FIRST_ROWS_1 is the same as FIRST_ROWS, but this is NOT the case: the optimizer uses the cost calculation and no other heuristics. You can also see that the ‘n’ in FIRST_ROWS_n acts as an important driver in determining cardinalities for query result sets.



SQL> alter session set optimizer_mode = first_rows_1;

Session altered.

SQL> select * from T where x < 100;

Execution Plan
----------------------------------------------------------
Plan hash value: 3947747388

--------------------------------------------------------------------------------------------
| Id  | Operation                           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |      |     2 |   210 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T    |     2 |   210 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IX   |       |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("X"<100)

SQL> alter session set optimizer_mode = first_rows_10;

Session altered.

SQL> select * from T where x < 100;

Execution Plan
----------------------------------------------------------
Plan hash value: 3947747388

--------------------------------------------------------------------------------------------
| Id  | Operation                           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |      |    11 |  1155 |    13   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T    |    11 |  1155 |    13   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IX   |       |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("X"<100)

SQL>
SQL> alter session set optimizer_mode = first_rows_100;

Session altered.

SQL> select * from T where x < 100;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    99 | 10395 |    12   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |    99 | 10395 |    12   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("X"<100)

SQL>
SQL>


So we can start to see the methodology at work here. The optimizer is "taking our word" that, independent of what the table statistics imply, it will only need to fetch the first 'n' rows from the relevant row source. It is a true cost adjustment rather than a change to the optimization scheme (as the original FIRST_ROWS is).
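As a footnote: if you don't want to change the optimizer_mode for the whole session, you can get the same effect for a single statement with the first_rows(n) hint – a minimal sketch:

select /*+ first_rows(10) */ * from T where x < 100;

which should reproduce the index-driven plan shown above for first_rows_10.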

Add primary key.

I thought I had written this note a few years ago, on OTN or Oracle-L if not on my blog, but I can’t find any sign of it so I’ve decided it’s time to write it (again) – starting as a question about the following code:


create table t1
as
with generator as (
        select  rownum  id
        from            dual
        connect by
                        rownum <= 1000
)
select
        rownum                                  id,
        trunc((rownum-1)/50)                    clustered,
        mod(rownum,20000)                       scattered,
        lpad(rownum,10)                         vc_small,
        rpad('x',100,'x')                       vc_padding
from
        generator       g1,
        generator       g2
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

alter system flush buffer_cache;

alter table t1 add constraint t1_pk primary key(id, scattered);

I’ve generated a table with 1,000,000 rows, including a column that’s guaranteed to be unique; then I’ve added a (two-column) primary key constraint to that table.

Because of the guaranteed unique column the call to add constraint will succeed. Because Oracle will automatically create a unique index to support that constraint, it will have to do a tablescan of the table. So here’s the question: HOW MANY TIMES will it tablescan that table (and how many rows will it scan)?

Space for thought …

The answer is three tablescans, 3 million rows.

Oracle will scan the table to check the validity of adding a NOT NULL definition and constraint for the id column, repeat the scan to do the same for the scattered column, then do one final scan to accumulate the key values and rowids, sort them, and create the index.
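If you want to verify the count on your own system, one option (a sketch – it assumes you can query v$mystat and v$statname) is to snapshot your session’s scan statistics immediately before and after the add constraint; the ‘table scan rows gotten’ figure should jump by roughly 3 million for this 1M-row table:

select
        sn.name, ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
and     sn.name in ('table scans (long tables)', 'table scan rows gotten')
;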

Knowing this, you may be able to find ways to modify bulk data loading operations to minimise overheads.
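For example, one possible strategy (a sketch only – whether the two NOT NULL validations collapse into a single scan may be version-dependent, so test before relying on it) is to add the NOT NULL definitions explicitly and pre-create the unique index, so that the final add constraint has nothing left to scan:

alter table t1 modify (id not null, scattered not null);

create unique index t1_pk on t1(id, scattered);

alter table t1 add constraint t1_pk primary key(id, scattered) using index t1_pk;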

The most recent version I’ve tested this on is 12.1.0.2.

See also: https://jonathanlewis.wordpress.com/2012/03/02/add-constraint/

My reference: pk_overhead.sql

Histograms on data (not the optimizer)

I’ve been doing a series on Analytics here, but if you need to group data into ranges you don’t necessarily need a fancy analytic function. The WIDTH_BUCKET function can easily determine histogram groupings within a table.


SQL> SELECT CUST_ID, CUST_LAST_NAME, CUST_CREDIT_LIMIT,
  2     WIDTH_BUCKET(CUST_CREDIT_LIMIT, 100, 5000, 10) credit_bucket
  3     FROM sh.customers
  4     WHERE  country_id = 52787
  5     ORDER BY CUST_CREDIT_LIMIT;

   CUST_ID CUST_LAST_NAME       CUST_CREDIT_LIMIT CREDIT_BUCKET
---------- -------------------- ----------------- -------------
     44282 Justice                           1500             3
     50671 Sandoval                          1500             3
     26284 Lotto                             1500             3
      4827 Kessel                            1500             3
     27671 Lin                               1500             3
     17284 Fellows                           1500             3
      9671 Rittenour                         1500             3
     35283 Gatewood                          1500             3
      9670 Oppy                              1500             3
     35284 Thomas                            1500             3
     18670 Callihan                          1500             3
     18671 Sager                             1500             3
    102218 Hornick                           1500             3
     26283 Geiss                             1500             3
     45827 Rowe                              1500             3
     42898 Haske                             3000             6
      5519 Ballenger                         3000             6
     33898 Batterton                         3000             6
     46518 Orm                               3000             6
     24899 Lightfoot                         3000             6
     47211 Cartwright                        3000             6
     20744 Gravel                            3000             6
    102723 Myczkowski                        3000             6
    101478 Bishop                            3000             6
      1050 Overton                           3000             6
       700 Burnns                            3000             6
      1023 Newcomer                          3000             6
     38744 Tazelar                           3000             6
     29744 Durby                             3000             6
     41514 Burgess                           5000            11
     45828 Wood                              5000            11
    102129 Cay                               5000            11
     40128 Cain                              5000            11
     23515 Figgens                           5000            11
     32514 Lengel                            5000            11
    101613 Spivak                            7000            11
       470 Sandstrum                         7000            11
       763 Dutton                            7000            11
     36667 Capps                             7000            11
       446 Jeffreys                          7000            11
     42302 Everrett                          7000            11
     46519 East                              7000            11
       708 Door                              7000            11
    102178 Bloom                             7000            11
       123 Rockwell                          7000            11
    104403 Fenton                            7000            11
     13133 Colven                            7000            11
     27666 Remler                            7000            11
      4133 Carr                              9000            11
     10362 Tate                              9000            11
     19362 Rosenblum                         9000            11
     28362 Titus                             9000            11
     28458 Kohler                            9000            11
      6208 Vail                              9000            11
      2749 Kimball                           9000            11
     15208 Trimmer                           9000            11
     11748 Stokley                           9000            11
     24208 Baley                             9000            11
     26975 Baer                              9000            11
     35975 Grubb                             9000            11
     44974 Grier                             9000            11
       529 Barone                            9000            11
      2750 Jansen                           10000            11
    103571 Kennedy                          10000            11
       346 Elliott                          10000            11
     47209 Lieberman                        10000            11
     34590 Barden                           10000            11
     28469 Adams                            10000            11
      6209 Crocker                          10000            11
     43589 Eppling                          10000            11
    100761 Zwolinsky                        10000            11
     15209 Fernandez                        10000            11
    100824 Roy                              11000            11
    103845 Moy                              15000            11
     35402 Elkin                            15000            11

75 rows selected.

where the 4 arguments are:

  • Expression to be evaluated
  • Minimum (start) value
  • Maximum (end) value
  • Number of buckets in the distribution

Notice that if a value falls outside the min and max, WIDTH_BUCKET returns a value one below the first bucket (i.e. 0) or one more than the bucket count (here 11).
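A quick check of the boundary behaviour (a sketch against dual – the exact column formatting depends on your SQL*Plus settings):

SQL> select width_bucket(50,   100, 5000, 10) below_min,
  2         width_bucket(7000, 100, 5000, 10) above_max
  3  from dual;

 BELOW_MIN  ABOVE_MAX
---------- ----------
         0         11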

Automatic date formats

Notice in all of the examples below that the date string does not match the format mask.

When a date conversion fails, Oracle tries some similar formats before giving up. This behaviour is actually documented:

 

Original Format Element    Additional Format Elements to Try if Original Fails
-----------------------    ---------------------------------------------------
'MM'                       'MON' and 'MONTH'
'MON'                      'MONTH'
'MONTH'                    'MON'
'YY'                       'YYYY'
'RR'                       'RRRR'

 


SQL> select to_date('01JAN2000','ddmmyyyy') from dual;

TO_DATE('
---------
01-JAN-00

SQL> select to_date('01JAN1999','ddmmyyyy') from dual;

TO_DATE('
---------
01-JAN-99

SQL> select to_date('01JAN2000','dd/mm/yy') from dual;

TO_DATE('
---------
01-JAN-00

SQL>  select to_date('01JAN2000','dd-mm-yyyy') from dual;

TO_DATE('
---------
01-JAN-00

SQL> 


Needless to say, relying on any kind of implied format for dates (or any other datatype for that matter) is generally going to get you into strife eventually :)
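If you’d rather have the conversion fail than silently “succeed” with a guessed format, the FX (format exact) modifier disables this fuzziness – a minimal sketch (the exact ORA- error you get may vary by version):

SQL> select to_date('01JAN2000','fxddmmyyyy') from dual;
select to_date('01JAN2000','fxddmmyyyy') from dual
               *
ERROR at line 1:
ORA-01858: a non-numeric character was found where a numeric was expected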

SQL*Plus quick tip

SQL*Plus does not appear to be bothered by the direction of slashes, i.e. “/” or “\”, in its scripts.

Hence, as long as you have a “C:\tmp” folder on your Windows drive (or wherever your SQL*Plus binary is installed/running from), you can do:


spool /tmp/blah
@/tmp/my_file.sql

and it will work on Unix and Windows without alteration.

(This works for SQLcl as well, by the way.)

SAMPLE costing

People often think that because the SAMPLE clause deals with only a subset of the data, the query should immediately be a much faster operation. And whilst sometimes this is indeed the case, it is not a guarantee. The optimizer probably gives us a hint (no pun intended) as to how we should interpret the SAMPLE clause.

Let’s compare a standard scan of a table with a sampled scan.



SQL> create table T as select * from dba_Objects;

Table created.

SQL>
SQL> set autotrace traceonly explain
SQL> select * from T;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 96399 |    10M|   451   (1)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    | 96399 |    10M|   451   (1)| 00:00:01 |
--------------------------------------------------------------------------

SQL>
SQL> select * from T sample (1);

Execution Plan
----------------------------------------------------------
Plan hash value: 2767392432

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |   964 |   108K|   450   (1)| 00:00:01 |
|   1 |  TABLE ACCESS SAMPLE| T    |   964 |   108K|   450   (1)| 00:00:01 |
----------------------------------------------------------------------------

The “Rows” estimate seems to make sense, ie, a one percent sample will return about one percent of the rows, but notice that the Cost is the same in both cases. Now at first thought you might think that this is wrong – after all, if I only have to produce 1/100th of the data, then why would it cost the same? Well, it’s because we are sampling rows. To get 1/100th of the rows, it is still quite possible that we’d have to scan every single block in the table – that’s what sampling is, getting a pseudo-randomly distributed set of rows.

What most people think sampling is, is picking a random subset of blocks and then grabbing the rows from those blocks. You can indeed do that, and the costs are reflected accordingly.




SQL> select * from T sample block (1);

Execution Plan
----------------------------------------------------------
Plan hash value: 2767392432

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |   964 |   108K|     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS SAMPLE| T    |   964 |   108K|     6   (0)| 00:00:01 |
----------------------------------------------------------------------------


Don’t forget, though, that as a table grows from empty, it’s quite possible that rows clumped together in blocks are not as random a sample of the data as you might think :)

My BIWA Summit Presentations

Here are the two BIWA Summit 2016 presentations I delivered today. The first one is a collection of high-level thoughts (and opinions) of mine, and the second one is more technical.

 

NB! If you want to move to the "New World" - and benefit from the awesomeness of Hadoop, without having to re-engineer your existing applications - check out Gluent, my new startup that will make history! ;-)

Why ask how, when why is so much more fun?

OK. So the original quote from Spawn is exactly the opposite, but let’s go with it… :)

A few times in the past I’ve been asked questions and started to give a direct answer, then someone smarter has jumped in and asked the killer question. Why? Quite often it’s easy to answer the initial question, so rather than understand the reason for the question, you just respond and pat yourself on the back. That’s great, but without knowing the context of the question, the “right answer” could actually be the “wrong answer”. As Tom always says, “The answer to every question is *it depends*!”

I had another situation like that recently. The question was, “How can I install VNC on a Linux box?” Pretty simple answer, and I know a guy who wrote an article on that, so I pointed them to the article. Job done!

Then I got a pang of guilt and the conversation went like this…

  • Q: Why do you want to install VNC?
  • A: Because my boss told me to.
  • Q: But why does your boss want you to install VNC?
  • A: Because the network connection breaks sometimes, making “ssh -X user@host” a dodgy solution.

Now I have nothing against VNC itself, but installing it on a server is one more attack vector to worry about, especially if it’s not necessary. Knowing the context allowed me to talk about silent installs, command line DBCA, running things in the background, even the screen command.

If the person goes away and installs VNC, that’s no skin off my nose, but just answering how, without knowing the context, could well have opened them, or me, up to criticism down the line.

So next time you answer a question and are about to enable smug mode, ask yourself if you have actually helped, or just taken the easy route.

Cheers

Tim…

