Search

Top 60 Oracle Blogs

Recent comments

Fetch First vs. Rownum

I’ve pointed out fairly frequently that if you’re running Standard Edition but would like to take advantage of a few features of the Partitioning option then you might be able to do something appropriate with Partition Views (but I’ve just discovered while searching my blog for a suitable item to link to that I haven’t published any of my PV notes on the blog).

I’ve also pointed out that while 12c allows you to use “fetch first N rows” instead of “where rownum <= N” there’s a hidden threat to using the feature because “fetch first N” turns into a hidden row_number() over() analytic function.

Today’s note is a combination of these two topics, prompted by a request to solve a particular coding issue that has appeared a few times on the Oracle Developer Forum and is likely to be recognisable to a number of people.

I’ll start with a very simple model demonstrateing the simplest use of “fetch first N”:


rem
rem     Script:         fetch_first_union.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t_odd
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'O'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum - 1                  id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

alter table t_odd modify(flag not null, class not null, id not null);
alter table t_odd add constraint to_chk_odd check (flag = 'O');

create index to_i1 on t_odd(class, id);

With this data set I want to write a query that selects rows for class A where id > 9500, ordered by id – but I only want the first two rows. Here’s a very simple query that gets the result I want, followed by the execution plan from 12.2.0.1 (the A-Rows and E-Rows from 19.3 are slightly different):

set serveroutput off
set linesize 180

alter session set statistics_level = all;

select  /*+ index(t_odd (class, id)) */
        *
from
        t_odd
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    95 (100)|      2 |00:00:00.01 |       6 |
|*  1 |  VIEW                         |       |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       6 |
|*  2 |   WINDOW NOSORT STOPKEY       |       |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       6 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T_ODD |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       6 |
|*  4 |     INDEX RANGE SCAN          | TO_I1 |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       3 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "T_ODD"."CLASS","T_ODD"."ID")<=2)
   4 - access("CLASS"='A' AND "ID">9500)

The plan – with its stats – shows us that we’ve done an index range scan of an index which will return the data in exactly the order we want, and the “fetch first 2 rows” has been translated into the row_number() over() that we expected; but to our great joy the “window sort stopkey” makes the processing stop very early because Oracle recognises that the base data is arriving in the right order so it isn’t necessary to fetch all of it and sort it. The A-Rows column confirms this interpretation of what has happened.

You might notice, by the way, that the optimizer has costed the query as if it were fetching all the rows even though it “knows” that it’s going to fetch only the first two rows. That’s why I had to include the index hint to make the optimizer use the obvious index – a popular alternative is to use the /*+ first_rows(N) */ hint where N matches, or is similar to, the number of rows you want to fetch. If I had omitted the hint the optimizer would have done a full tablescan and then applied a “window sort pushed rank” operation to sort and limit the result to 2 rows.

So now we come to the real problem: the user has a “current” table and an identical “history” table, and would like to replace the table reference with a reference to a union all view for their clients on Standard Edition, or to a partitioned table for clients running Enterprise Edition – and they don’t really want to do any other code changes. So let’s see what happens when we model the union all. I started with a table called t_odd that held only odd values for id, so I’m going to add a table called t_even that holds only even values for id.

create table t_even
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'E'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum                      id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

alter table t_even modify(flag not null, class not null, id not null);
alter table t_even add constraint te_chk_even check (flag = 'E');

create index te_i1 on t_even(class, id);

create or replace view v_bare 
as
select * from t_odd
union all
select * from t_even
/

select
        /*+ 
                index(vw.t_odd  (class, id)) 
                index(vw.t_even (class, id)) 
        */
        *
from
        v_bare vw
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;

As you can see t_even is an identically structured table with similar data, and I’ve created a union all view on top of the two tables, changing the query to reference the view rather than referencing a table. Thanks to the costing anomaly (combined with the small size of the tables) I’ve had to supply a couple of “global” hints to tell the optimizer to use the indexes to access the two tables. So how well does the optimizer do its job when we have a union all view?


----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |        |      1 |        |   192 (100)|      2 |00:00:00.01 |     190 |       |       |          |
|*  1 |  VIEW                                   |        |      1 |      2 |   192   (2)|      2 |00:00:00.01 |     190 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK               |        |      1 |    404 |   192   (2)|      2 |00:00:00.01 |     190 |  2048 |  2048 | 2048  (0)|
|   3 |    VIEW                                 | V_BARE |      1 |    404 |   191   (1)|    404 |00:00:00.01 |     190 |       |       |          |
|   4 |     UNION-ALL                           |        |      1 |        |            |    404 |00:00:00.01 |     190 |       |       |          |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| T_ODD  |      1 |    202 |    95   (0)|    202 |00:00:00.01 |      95 |       |       |          |
|*  6 |       INDEX RANGE SCAN                  | TO_I1  |      1 |    202 |     2   (0)|    202 |00:00:00.01 |       2 |       |       |          |
|   7 |      TABLE ACCESS BY INDEX ROWID BATCHED| T_EVEN |      1 |    202 |    95   (0)|    202 |00:00:00.01 |      95 |       |       |          |
|*  8 |       INDEX RANGE SCAN                  | TE_I1  |      1 |    202 |     2   (0)|    202 |00:00:00.01 |       2 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "VW"."CLASS","VW"."ID")<=2)
   6 - access("CLASS"='A' AND "ID">9500)
   8 - access("CLASS"='A' AND "ID">9500)

Answer: Bad luck, the optimizer isn’t smart enought to find a cheap way through this query. It’s fetched all the relevant data from the two tables before applying the window sort (which it does with some efficiency – the pushed rank) to produce the right answer. As you can see from the A-Rows column, though, it’s had to acquire a couple of hundred rows from each table before getting down to the 2 rows we wanted.

Partitioned Tables

So let’s try to solve the problem by buying into the partitioning option and creating a list-partitioned table with two partitions, one flagged for current data and one flagged for history data – or ‘O’dd and ‘E’ven data as I’ve created them in my model.



create table t_pt (
        flag,
        class,
        id,
        v1,
        padding
)
partition by list (flag) (
        partition pO values('O'),
        partition pE values('E')
)
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'O'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum - 1                  id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;

insert into t_pt
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        'E'                             flag,
        chr(65 + mod(rownum,26))        class,
        2 * rownum                      id,
        lpad(2 * rownum,10,'0')         v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4   -- > comment to avoid WordPress format issue
;


create index tp_i1 on t_pt(class, id) local;
alter table t_pt modify (flag not null, class not null, id not null);

execute dbms_stats.gather_table_stats(user,'t_pt',method_opt=>'for all columns size 1', cascade=>true, granularity=>'ALL')

Note particularly that I have created a local index on this partitioned table – so there’s a very close correspondance between the two tables in the previous example and the two partitions in this example. Here’s the plan when I query the partitioned table for the first two rows:


select  /*+ index(t_pt (class, id)) */
        *
from
        t_pt
where
        class = 'A'
and     id > 9500
order by
        class, id
fetch 
        first 2 rows only
;


--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |       |      1 |        |   190 (100)|      2 |00:00:00.01 |     189 |       |       |          |
|*  1 |  VIEW                                        |       |      1 |      2 |   190   (2)|      2 |00:00:00.01 |     189 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK                    |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |     189 |  2048 |  2048 | 2048  (0)|
|   3 |    PARTITION LIST ALL                        |       |      1 |    404 |   189   (1)|    404 |00:00:00.01 |     189 |       |       |          |
|   4 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PT  |      2 |    404 |   189   (1)|    404 |00:00:00.01 |     189 |       |       |          |
|*  5 |      INDEX RANGE SCAN                        | TP_I1 |      2 |    404 |     4   (0)|    404 |00:00:00.01 |       4 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "T_PT"."CLASS","T_PT"."ID")<=2)
   5 - access("CLASS"='A' AND "ID">9500)

The optimizer has let us down again. The plan shows us that we have to acquire all the relevant data from the two partitions before applying the row_number() analytic function and discarding all but the two rows we want. (Again we can check the A-Rows column to see that we have started by fetching a total of 404 rows from the table.)

But what happens if we fall back to the good old-fashioned (non-standard) rownum method:


select
        *
from    (
        select  /*+ index(t_pt (class, id)) */
                *
        from
                t_pt
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        )
where
        rownum <= 2
;

----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                      | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                               |       |      1 |        |   190 (100)|      2 |00:00:00.01 |       6 |       |       |          |
|*  1 |  COUNT STOPKEY                                 |       |      1 |        |            |      2 |00:00:00.01 |       6 |       |       |          |
|   2 |   VIEW                                         |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |       6 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY                       |       |      1 |    404 |   190   (2)|      2 |00:00:00.01 |       6 |  2048 |  2048 | 2048  (0)|
|   4 |     PARTITION LIST ALL                         |       |      1 |    404 |   189   (1)|      4 |00:00:00.01 |       6 |       |       |          |
|*  5 |      COUNT STOPKEY                             |       |      2 |        |            |      4 |00:00:00.01 |       6 |       |       |          |
|   6 |       TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T_PT  |      2 |    404 |   189   (1)|      4 |00:00:00.01 |       6 |       |       |          |
|*  7 |        INDEX RANGE SCAN                        | TP_I1 |      2 |    404 |     4   (0)|      4 |00:00:00.01 |       4 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=2)
   3 - filter(ROWNUM<=2)
   5 - filter(ROWNUM<=2)
   7 - access("CLASS"='A' AND "ID">9500)

Oracle really knows how to use rownum well – notice how there is a count stopkey operation as a child to the partition list all operation, and that’s where our rownum <= 2 predicate is first applied. For each partition Oracle finds “the first two rows” and after it has collected two rows from every partition it sorts them (again with a stopkey) to find the top two in that subset. Check the A-Rows column – we selected a total of 4 rows from the table (2 per partition) and reduced that to 2 rows at operation 3.

Conclusion

There are some patterns of processing where partitioned tables can be a lot friendlier to coders than partition views; if you do have to stick with Standard Edition you can usually get what you want but the coding investment may be significantly higher. Even with partitioned tables, though, there are some “old-fashioned” Oracle methods that do a much nicer job than some of the new-fangled “ANSI” mechanisms.

Footnote

Part of the problem presented here revolves around the desire to keep a pattern of SQL generation that already exists, doing nothing more than replacing a table name with a view (or partitioned table) name.

As we’ve seen, if you start with a simple heap table and try to replace it with a partitioned table you have to use the rownum mechanism rather than the fetch first N rows mechanism.

If you’re running with Standard Edition you can’t do anything simple to replace a table name with the name of a union all view; you’d have to change your code generator to apply all the predicates twice (once for each table) and then apply the rownum predicate or fetch first directive again outside the union all. In other words you have to emulate exactly what Oracle EE manages to do with partitioned tables and rownum.

select
        flag, class, id, v1
from
        (
        select
                /*+ index(t_odd (class, id)) */
                flag, class, id, v1
        from
                t_odd
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        fetch
                first 2 rows only
        )
        union all
        (
        select
                /*+ index(t_even (class, id)) */
                flag, class, id, v1
        from
                t_even
        where
                class = 'A'
        and     id > 9500
        order by
                class, id
        fetch
                first 2 rows only
        )
order by
        class, id
fetch 
        first 2 rows only
;

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |        |      1 |        |   192 (100)|      2 |00:00:00.01 |       8 |       |       |          |
|*  1 |  VIEW                              |        |      1 |      2 |   192   (2)|      2 |00:00:00.01 |       8 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK          |        |      1 |      4 |   192   (2)|      2 |00:00:00.01 |       8 |  2048 |  2048 | 2048  (0)|
|   3 |    VIEW                            |        |      1 |      4 |   191   (1)|      4 |00:00:00.01 |       8 |       |       |          |
|   4 |     UNION-ALL                      |        |      1 |        |            |      4 |00:00:00.01 |       8 |       |       |          |
|   5 |      VIEW                          |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|*  6 |       VIEW                         |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|*  7 |        WINDOW NOSORT STOPKEY       |        |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       4 | 73728 | 73728 |          |
|   8 |         TABLE ACCESS BY INDEX ROWID| T_ODD  |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       4 |       |       |          |
|*  9 |          INDEX RANGE SCAN          | TO_I1  |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       2 |       |       |          |
|* 10 |      VIEW                          |        |      1 |      2 |    95   (0)|      2 |00:00:00.01 |       4 |       |       |          |
|* 11 |       WINDOW NOSORT STOPKEY        |        |      1 |    202 |    95   (0)|      2 |00:00:00.01 |       4 | 73728 | 73728 |          |
|  12 |        TABLE ACCESS BY INDEX ROWID | T_EVEN |      1 |    202 |    95   (0)|      3 |00:00:00.01 |       4 |       |       |          |
|* 13 |         INDEX RANGE SCAN           | TE_I1  |      1 |    202 |     2   (0)|      3 |00:00:00.01 |       2 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_007"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "from$_subquery$_006"."CLASS","from$_subquery$_006"."ID")<=2)
   6 - filter("from$_subquery$_003"."rowlimit_$$_rownumber"<=2)
   7 - filter(ROW_NUMBER() OVER ( ORDER BY "CLASS","ID")<=2)
   9 - access("CLASS"='A' AND "ID">9500)
  10 - filter("from$_subquery$_005"."rowlimit_$$_rownumber"<=2)
  11 - filter(ROW_NUMBER() OVER ( ORDER BY "CLASS","ID")<=2)
  13 - access("CLASS"='A' AND "ID">9500)


As you can see, the E-Rows still predicts a lot of work, but the A-Rows tells us the work was kept to the minimum we want.