Oakies Blog Aggregator

Power BI Reporting with Oracle Essbase

Last week I presented a few sessions at ODTUG’s KSCOPE 2019 conference in Seattle.  One of these sessions was with my wonderful co-presenter and Oracle Application Cloud expert, Opal Alapat.  I connected with her when the call for papers opened and asked her if there would be interest in doing a session on connecting Power BI to Essbase in the Oracle Analytics Cloud (OAC).  There was no desire to do a bake-off or competitive session, just a morbid curiosity about what options there were to connect the two and the outcome of building reports from valuable data with such a powerful BI tool.


Opal agreed it was a good idea and, as the conference chair, took it on herself to present the opportunity to the conference committee.  It is understandable that it could have been received poorly, considering the vendor history and a fear of blatant marketing sessions.  Lucky for us, we both have a long history of presenting strong technical sessions and a great love of Oracle technology, so this wonderful and brave conference gave us a chance to show our stuff.

The Technical World is Shrinking

I’ve taken on this type of new frontier before, realizing that few IT shops are truly single-platform, so I thank the conference publicly for the opportunity.  The attendees demonstrated their appreciation with a ton of questions during the session and even more afterwards.

We started the session by making it clear that we would demonstrate both the Oracle Analytics Cloud, with sample data from Essbase, and Power BI connected to it.  We also stated that we had no interest in a competitive bake-off but hoped simply to show the opportunities to provide more value with the data.

We both gave clear explanations of the architecture, requirements and licensing that would be part of our solutions, and, since I was working in a cross-platform cloud solution, the hurdles I faced, the options I attempted to work around them, what was successful, what failed, and so on.

The Use-Case

The OAC environment that Opal gave me access to contained an example schema with data based on an audio-video store’s revenue over multiple years.  I’d never worked with OAC before, but I was quickly able to find five methods to connect Power BI to it, either importing or direct-connecting to the data:

  1.  Use the REST API from DataDirect to connect to the Oracle cloud.
  2.  Use the web interface with Power BI and “scrape the website”, allowing Power BI to build the table by example.
  3.  Connect with an ODBC driver.
  4.  Via the OAC interface, export to Excel and then import into Power BI.
  5.  Via the OAC interface, export to a table and then import into Power BI as a CSV file.


REST API

Although #1 may sound like a great option, the REST API needed a Docker container or a dedicated server to run the service.  I had the container running on my laptop, but along with the more complicated setup, the connection dropped intermittently and the container crashed from time to time.

Web “Scraping”

Connecting to the website to scrape the data isn’t difficult, as Power BI has the ability to bypass the HTML and simply build out the table by example.  You enter a few examples of the data you want it to build a table from, and it’s quite intuitive and will take it from there.  My goal was a repeatable, automated refresh option, and with the authentication to OAC, this made web scraping less attractive.

ODBC Driver

This was the option that, in my research, appeared to be the most viable, and it failed terribly.  The authentication method to OAC, and the steps Oracle has taken to steer users towards its preferred tools, made ODBC the least simple method of connecting to the data, and I chose to give up rather than keep working towards a functioning test.

Export to Excel

The export to Excel offers a cube format for those who:

  1.  want the cube “feel” when modeling
  2.  want to use Power Pivot, or are more familiar with Excel than Power BI

As I’m definitely an RDBMS DBA/dev rather than an Excel person, this wasn’t my preference.

Export to table

This turned out to be my favorite.  The export was a simple table in CSV format that was easy to import into Power BI and easy to build visuals and do predictive analysis on.  The table was straightforward and required no formatting or massaging once brought into Power BI.  To build a model that could be updated on a regular basis, I scheduled a job to export the table at a regular interval to a shared folder and then used a simple shell script to refresh it, first renaming the file to a generic name instead of the unique name that is the default.  I then updated the data source settings to point to the shared folder, which meant no manual processing once the reports were built.


Huge Win-Win

To complete my demonstration, I went through my Power BI reports, both the design and the row-level security by region and user, as well as how easy it was, thanks to the data structure, to build out visualizations with predictive analysis.  This was really crucial, since I have no background or experience with the data that was provided to me.  I was simply the report designer and let Power BI do the heavy lifting.

Take-Aways from the Session

  1.  Attendees are often using Power BI with these Oracle data solutions, but have very little reference material on how to be successful with it.
  2.  The ability to direct-connect via the REST API was a bit shaky and complicated to set up, whereas the ability to automate the export-to-flat-file option (either cube or table) was the most stable.
  3.  “Web scraping” using build-by-example and then creating templates to automate the data massaging is possible with Essbase, but it does require manual refreshes due to credential authentication.
  4.  The reports in Power BI were robust; multi-tab reports and dashboards were available even with simple data.  The predictive analysis and quantity of visualizations allowed the user to build out expanded reports, both interactive and paginated.
  5.  For those who have on-premises Essbase, the (brand new!) Essbase connector is available for download and can be used with Power BI for those environments.

Sample Reports

One of the great things about the use case from Opal’s and my session is that the data is all sample data and can be made available to everyone, demonstrating how easy it is for anyone to get value from it, even someone like me who’d never worked with it before!

If you’re interested in seeing a high-level example of the reports created, I’ve added screenshots of the reports here:

[Screenshots: sample Power BI report pages built from the OAC Essbase data]

Now each of the KSCOPE attendees has access to the PBIX files with sample data to work with, including the cube example.  All the reports, unlike the screenshots, are interactive, allowing the user to dig into the data, and row-level security can be assigned to those using the report depending on region and login name, (this role would have to be updated if using something other than Azure Active Directory to log in, of course.)

Have a happy 4th of July everyone!!

 




Oracle Refresh Group consistency with nested materialized views.

A Refresh Group can be used to refresh a list of materialized views and ensures that all data is read with transaction consistency. This post is about the meaning of consistency when one materialized view depends on the other one. That’s just a quick test to understand the behavior. Any comment is welcome (preferably on twitter — @FranckPachot)

Documentation

The documentation mentions:

To preserve referential integrity and transactional (read) consistency among multiple materialized views, Oracle Database can refresh individual materialized views as part of a refresh group. After refreshing all of the materialized views in a refresh group, the data of all materialized views in the group correspond to the same transactionally consistent point in time.

Materialized View Concepts and Architecture

But what does that mean when one materialized view depends on another? This can happen when we build a detailed aggregate and then build further aggregates on top of it. What I would like in this case is all materialized views showing data as it was at the same point-in-time in the original tables. But is that possible? Here is a little test.

Demo

I create a DEMO table where I’ll store a sequence number and a timestamp:

SQL> connect demo/demo@//localhost/PDB1
Connected.
SQL> set echo on
SQL> create table DEMO as select 0 seq,current_timestamp now
from xmltable('1 to 1000');
Table created.

I run a background job that will constantly increase the sequence number and update the timestamp:

SQL> set escape on
SQL> host { echo "set echo on" ; echo "whenever sqlerror exit failure" ; echo "exec loop update demo set seq=seq+1,now=current_timestamp where rownum=1; commit; end loop;" ; } | sqlplus -s demo/demo@//localhost/PDB1 \& :
SQL> host sleep 1;

This is a loop that updates at a high rate, so do it in a noarchivelog database.
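
For readability, here is a sketch of the logic that the one-liner above pipes into the background SQL*Plus session: an endless PL/SQL loop that bumps SEQ and the timestamp on one row and commits after every update.

-- sketch of what the piped one-liner runs in the background session
begin
  loop
    update demo set seq = seq + 1, now = current_timestamp where rownum = 1;
    commit;
  end loop;
end;
/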

Now that I have this constantly changing table, I create a first materialized view DEMOMV1 which is just a select on DEMO:

12:17:07 SQL> create materialized view DEMOMV1 
as select * from DEMO;
Materialized view created.
12:17:07 SQL> select max(seq),max(now) 
from DEMOMV1;
   MAX(SEQ)                               MAX(NOW)
___________ ______________________________________
21393 01-JUL-19 12.17.07.221922000 PM GMT

This mview contains the original data as-of its creation time: SEQ=21393

And then I create DEMOMV2 that queries both the DEMO table and the DEMOMV1 materialized view, to show the consistency easily:

12:17:07 SQL> create materialized view DEMOMV2 as
select 'MVIEW' source,seq,now from DEMOMV1
union all
select 'TABLE' source,seq,now from DEMO;
Materialized view created.
12:17:07 SQL> select source,max(seq),max(now) 
from DEMOMV2 group by source;
SOURCE MAX(SEQ) MAX(NOW)
_________ ___________ ______________________________________
MVIEW 21393 01-JUL-19 12.17.07.221922000 PM GMT
TABLE 22099 01-JUL-19 12.17.07.336869000 PM GMT

I see the original data as of two points in time:

  • DEMO table was read at the time of the creation of this DEMOMV2 mview, which contains data at this point-in-time where SEQ was 22099
  • DEMOMV1 was read at the time of the creation of this DEMOMV2 mview, which contains data from the last refresh of DEMOMV1, which means the original data when SEQ was 21393

So far so good, this is exactly what I expected. The refreshes were done one after the other. I cannot expect to see the original data as of the same point-in-time.

Refresh Group

Here is the creation of the refresh group:

12:17:07 SQL> exec dbms_refresh.make('DEMO.DEMORGROUP', list=>'DEMOMV1,DEMOMV2', next_date=>sysdate+100, interval=>'null');
PL/SQL procedure successfully completed.

and the current state of my tables:

12:17:07 SQL> select max(seq),max(now) from DEMO;
   MAX(SEQ)                               MAX(NOW)
___________ ______________________________________
22214 01-JUL-19 12.17.07.377349000 PM GMT
12:17:07 SQL> select max(seq),max(now) from DEMOMV1;
   MAX(SEQ)                               MAX(NOW)
___________ ______________________________________
21393 01-JUL-19 12.17.07.221922000 PM GMT
12:17:07 SQL> select source,max(seq),max(now) from DEMOMV2 group by source;
   SOURCE    MAX(SEQ)                               MAX(NOW)
_________ ___________ ______________________________________
MVIEW 21393 01-JUL-19 12.17.07.221922000 PM GMT
TABLE 22099 01-JUL-19 12.17.07.336869000 PM GMT
12:17:07 SQL> select max(seq),max(now) from DEMO;
   MAX(SEQ)                               MAX(NOW)
___________ ______________________________________
22241 01-JUL-19 12.17.07.386421000 PM GMT

DEMO had some changes, but the materialized views are still at the same state as they were not refreshed.

Now refreshing the group (I enable some traces before):

12:17:07 SQL> alter session set tracefile_identifier=rgroup
events='10979 trace name context forever, level 1
: 10046 trace name context forever, level 3';
Session altered.
12:17:07 SQL> exec dbms_refresh.refresh('DEMO.DEMORGROUP');
PL/SQL procedure successfully completed.

DEMO has newer sequence:

12:17:07 SQL> select max(seq),max(now) from DEMO;
   MAX(SEQ)                               MAX(NOW)
___________ ______________________________________
22713 01-JUL-19 12.17.07.532700000 PM GMT

Then I expect the materialized views to have been refreshed when DEMO was at a point where SEQ was between 22214 and 22713.

12:17:07 SQL> select max(seq),max(now) from DEMOMV1;
   MAX(SEQ)                               MAX(NOW)
___________ ______________________________________
22507 01-JUL-19 12.17.07.464718000 PM GMT

That’s right, this refresh of DEMOMV1 occurred when SEQ was 22507.

Now the big question is about DEMOMV2:

12:17:07 SQL> select source,max(seq),max(now) 
from DEMOMV2 group by source;
   SOURCE    MAX(SEQ)                               MAX(NOW)
_________ ___________ ______________________________________
MVIEW 22507 01-JUL-19 12.17.07.464718000 PM GMT
TABLE 22610 01-JUL-19 12.17.07.499890000 PM GMT

Same as when I did the refreshes manually, one by one: DEMOMV2 was refreshed after DEMOMV1 and sees DEMOMV1 as-of this point in time where SEQ was 22507. And data from the original table is fresher than that: SEQ was 22610 when DEMOMV2 was refreshed.

This means that if you build aggregates on top of aggregates, while the base table is changing, then the summaries will not show the aggregate data as-of the same point-in-time. It is perfectly normal that they are stale, but they don’t have the same staleness because they were nested.

Flashback query?

For real consistency, I would have expected DEMOMV2 to read DEMO as of the same point-in-time as the other mviews in the refresh group. But here it behaved like a read-committed isolation level transaction: consistency is at statement level.

We can think of Flashback Query in this case. This would be fine to read DEMO. But reading DEMOMV1 with flashback query would ignore the changes done in the transaction, and I want to read the refreshed DEMOMV1 or I’ll get values from the previous refresh.

Anyway, flashback mode is disallowed here:

ORA-08182: operation not supported while in Flashback mode
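
Here is a quick sketch of the attempt I mean (hypothetical, just to show where the error is raised): put the session in flashback mode at a chosen SCN and then run the refresh.

-- sketch only: flashback the session to a chosen SCN...
variable scn number
exec :scn := dbms_flashback.get_system_change_number
exec dbms_flashback.enable_at_system_change_number(:scn)
-- ...and try to refresh the group while in flashback mode
exec dbms_refresh.refresh('DEMO.DEMORGROUP')
-- ORA-08182: operation not supported while in Flashback mode
exec dbms_flashback.disable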

Refresh order

We do not define any order when we add a materialized view to a refresh group. And, as we have seen, the order matters when one mview reads another one. The refresh group takes care of the dependencies as we can see in the 10979 trace:

The ‘Sched. In’ and ‘Out’ link the dependencies. Here DEMOMV2 (which is #2 in the group) takes input from DEMOMV1 (which is #1) and then is scheduled after it.

Actually, in this example, the refresh works exactly as if I had refreshed DEMOMV2 with the nested option, triggering the refresh of the other mviews it depends on, like:

SQL> exec dbms_mview.refresh('DEMO.DEMOMV2',nested=>true);

Serializable

Finally, what I would have expected here is that the refresh was done in a SET TRANSACTION ISOLATION LEVEL SERIALIZABLE so that the refresh of DEMOMV2 sees DEMO at the same query SCN as when DEMOMV1 was refreshed, and sees DEMOMV1 as refreshed (uncommitted changes visible from the current transaction). Here is how it should work in my opinion, showing it with normal tables:

12:07:14 SQL> set transaction isolation level serializable;
Transaction succeeded.
12:07:14 SQL> delete from DEMOT1;
1 row deleted.
12:07:14 SQL> insert into DEMOT1  select * from DEMO;
1 row created.
12:07:14 SQL> delete from DEMOT2;
2 rows deleted.
12:07:14 SQL> insert into DEMOT2 
select 'MVIEW' source,seq,now from DEMOT1
union all
select 'TABLE' source,seq,now from DEMO;
2 rows created.
12:07:14 SQL> select max(seq),max(now) from DEMO;
   MAX(SEQ)                               MAX(NOW)
___________ ______________________________________
2023485 02-JUL-19 12.07.14.394374000 PM GMT
12:07:14 SQL> select max(seq),max(now) from DEMOT1;
   MAX(SEQ)                               MAX(NOW)
___________ ______________________________________
2023485 02-JUL-19 12.07.14.394374000 PM GMT
12:07:14 SQL> select source,max(seq),max(now) from DEMOT2 group by source;
   SOURCE    MAX(SEQ)                               MAX(NOW)
_________ ___________ ______________________________________
MVIEW 2023485 02-JUL-19 12.07.14.394374000 PM GMT
TABLE 2023485 02-JUL-19 12.07.14.394374000 PM GMT

This is what I would like to do with a materialized view refresh, but unfortunately:

ORA-08177: can’t serialize access for this transaction

This failed in the refresh of DEMOMV1:

/* MV_REFRESH (INS) */INSERT /*+ BYPASS_RECURSIVE_CHECK */ INTO "DEMO"."DEMOMV1" select * from DEMO

As for the reason it doesn’t work as I would expect, my guess is that it has something to do with the fact that a refresh is somehow considered DDL:

https://asktom.oracle.com/pls/asktom/asktom.search?tag=refresh-on-materialized-view#152095200346634471

So… be careful if you have materialized views on top of one another, you want a consistent refresh, and the base tables are changing. And test it. Consistency issues are always discovered too late, in production. Do you have UAT checking for that?

Using DbVisualizer to work with #Oracle, #PostgreSQL and #Exasol

As a Database Developer or Database Administrator, it becomes increasingly unlikely that you will work with only one platform.

It’s quite useful to have one single tool to handle multiple different database platforms. And that’s exactly the ambition of DbVisualizer.

As a hypothetical scenario, let’s assume you are a database admin who works on a project to migrate from Oracle to EDB Postgres and Exasol.

The goal might be to replace the corporate Oracle database landscape, moving the OLTP part to EDB Postgres and the DWH / Analytics part to Exasol.

Instead of having to switch constantly between say SQL Developer, psql and EXAplus, a more efficient approach would be using DbVisualizer for all three.

I created one connection for each of the three databases here for my demo:

[Screenshot: the three database connections in DbVisualizer]

Now let’s see if statements I do in Oracle also work in EDB Postgres and in Exasol:

[Screenshot: Oracle]

[Screenshot: EDB]

[Screenshot: Exasol]

Works the same for all three! The convenient thing here is that I just had to select the Database Connection from the pull down menu while leaving the statement as it is. No need to copy & paste even.
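
The exact statements are only visible in the screenshots, but as a hypothetical illustration, this is the kind of query that can be typed once and run unchanged against all three connections:

-- hypothetical example: Oracle-style syntax that EDB Postgres and Exasol
-- also accept (both provide a DUAL table for compatibility)
select current_timestamp, 6 * 7 as answer
from   dual;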

What about schemas and tables?

[Screenshot: Oracle]

In EDB, I need to create a schema accordingly:

[Screenshot: EDB]

 

In Exasol, schema and table can be created in the same way:

[Screenshot: Exasol]

Notice that the data types got silently translated into the proper Exasol data types:

[Screenshot: Exasol]

There is no DBA_TABLES in Exasol, though:

[Screenshot: Exasol]

Of course, there’s much more to check and test upon migration, but I think you get an idea of how a universal SQL client like DbVisualizer might help for such purposes.

 

opt_estimate 4

In the previous article in this series on the opt_estimate() hint I mentioned the “query_block” option for the hint. If you can identify a specific query block that becomes an “outline_leaf” in an execution plan (perhaps because you’ve deliberately given a query block name to an inline subquery and applied the no_merge() hint to it) then you can use the opt_estimate() hint to tell the optimizer how many rows will be produced by that query block (each time it starts). The syntax of the hint is very simple:


opt_estimate(@{query block name}  query_block  rows={number of rows})

As with other options for the hint, you can use scale_rows=, min=, max= as alternatives (the last seems to be used in the code generated by Oracle for materialized view refreshes) but the simple “rows=N” is likely to be the most popular. In effect it does the same as the “non-specific” version of the cardinality() hint – which I’ve suggested from time to time as a way of telling the optimizer the size of a data set in a materialized CTE (“with” subquery), e.g.


set serveroutput off

with demo as (
        select  /*+
                        qb_name(mat_cte)
                        materialize
                        cardinality(@mat_cte 11)
--                      opt_estimate(@mat_cte query_block rows=11)
                */
                distinct trunc(created)    date_list
        from    all_objects
)
select  * from demo
;

select * from table(dbms_xplan.display_cursor);
    

Regardless of whether you use the opt_estimate() or the cardinality() hint above, the materialized temporary table will be reported with 11 rows. (Note that in this case, where the hint is inside the query block it applies to, the “@mat_cte” isn’t necessary.)
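
In other words (a sketch based on the query above), because the hint sits inside the query block it names, the reference can be dropped:

with demo as (
        select  /*+
                        qb_name(mat_cte)
                        materialize
                        opt_estimate(query_block rows=11)
                */
                distinct trunc(created)    date_list
        from    all_objects
)
select  * from demo
;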

In the previous article I generated some data with a script called opt_est_gby.sql to show you the effects of the group_by and having options of the opt_estimate() hint, and pointed out that there were cases where you might also want to include the query_block option as well. Here’s a final example query showing the effect, with the scale_rows feature, after creating a table t2 as a copy of t1 but setting pctfree 75 (to make a tablescan more expensive) and creating an index on t2(id):


create table t2 pctfree 75 as select * from t1;
create index t2_i1 on t2(id);

select
        t2.n1, t1ct
from
        t2,
        (
        select  /*+
                        qb_name(main)
                        opt_estimate(@main group_by scale_rows=4)
                        opt_estimate(@main having scale_rows=0.4)
                        opt_estimate(@main query_block scale_rows=0.5)
                */
                mod(n1,10), count(*) t1ct
        from    t1
        group by
                mod(n1,10)
        having
                count(*) > 100
        ) v1
where
        t2.id = v1.t1ct
;

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     8 |   168 |    27   (8)| 00:00:01 |
|   1 |  NESTED LOOPS                |       |     8 |   168 |    27   (8)| 00:00:01 |
|   2 |   NESTED LOOPS               |       |     8 |   168 |    27   (8)| 00:00:01 |
|   3 |    VIEW                      |       |     8 |   104 |    10  (10)| 00:00:01 |
|*  4 |     FILTER                   |       |       |       |            |          |
|   5 |      HASH GROUP BY           |       |     8 |    32 |    10  (10)| 00:00:01 |
|   6 |       TABLE ACCESS FULL      | T1    |  3000 | 12000 |     9   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN          | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID| T2    |     1 |     8 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(COUNT(*)>100)
   7 - access("T2"."ID"="V1"."T1CT")


I’ve inlined the last query (with the two opt_estimate() hints) that I used in the previous article, and added a third opt_estimate() hint to that inline view. In this case I didn’t have to add a no_merge() hint because the numbers worked in my favour, but to be safe in a production environment that’s a hint that I should have included.
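
For reference, here is a sketch of where that no_merge() hint could go, placed inside the inline view so that it applies to its own query block:

        -- the inline view from the query above, with no_merge added (sketch)
        select  /*+
                        qb_name(main)
                        no_merge
                        opt_estimate(@main group_by scale_rows=4)
                        opt_estimate(@main having scale_rows=0.4)
                        opt_estimate(@main query_block scale_rows=0.5)
                */
                mod(n1,10), count(*) t1ct
        from    t1
        group by
                mod(n1,10)
        having
                count(*) > 100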

You may recall that the hash group by on its own resulted in a prediction of 200 rows, and with the having clause the prediction dropped to 10 rows (standard 5%). With my three opt_estimate() hints in place I should see the effects of the following arithmetic:


group by      200       * 4   = 800
having        5% of 800 * 0.4 =  16
query block   16        * 0.5 =   8

As you can see, the cardinality prediction for the VIEW operation is, indeed, 8 – so the combination of hints has worked. It’s just a shame that we can’t see the three individual steps in the arithmetic as we walk the plan.

A Warning

As always I can only repeat – hinting is not easy; and “not easy” usually translates to “not stable / not safe” (and, thanks to a Freudian slip while typing, “not sage”). You probably don’t know how to do it properly, except in the very simplest cases, and we don’t really know how Oracle is interpreting the hints (particularly the undocumented ones). Here’s an example of how puzzling even the opt_estimate(query_block) hint can be – as usual starting with some data:

rem
rem     Script:         opt_estimate_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select * from all_objects;

create table t2
as
select * from all_objects;

As you can see, I’ve been a bit lazy with this example (which I wrote a couple of years ago) and it uses all_objects as a convenient source of data. Unfortunately this means you won’t necessarily be able to reproduce exactly the results I’m about to show you, which I produced on a small instance of 12.2.0.1. I’m going to examine four versions of a simple query which:

  • restricts the rows from t1,
  • finds the unique set of object_types in that subset of t1
  • then joins to t2 by object_type

select
        /*+ 
                qb_name(main)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;

The first version is my unhinted baseline (where, in my case, Oracle doesn’t use complex view merging), the second forces complex view merging of the inline aggregate view, then queries 3 and 4 repeat queries 1 and 2 but tell the optimizer that the number of distinct object_type values is 14 (roughly half the actual number in my case). But there is an oddity in the last query – I’ve told the optimizer how many rows it should estimate for the inline view but I’ve also told it to get rid of the inline view and merge it into the outer query block; so what effect is that going to have? My hope would be that the hint would have to be ignored because it’s going to apply to a query block that doesn’t exist in the final plan, and that makes it irrelevant and unusable. Here are the four execution plans:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    27 |   351 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    27 |   486 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           | 61776 |  5489K|  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    14 |   182 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    14 |   252 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           |    14 |  1274 |  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

The first plan tells us that most of the rows in t1 have created > 1st March 2017 and there are (estimated) 27 distinct values for object_type; and there are 61,776 rows in t2 (which is basically the same as t1), and none of them are eliminated by the join on object_type from the inline view.

The second plan (with the forced complex view merging) shows Oracle changing the view with “distinct” into a (right) semi-join between t2 and t1 with the internal view name of VM_NWVW_1 – and the cardinality is correct.

The third plan shows that my hint telling the optimizer to assume the original inline view produces 14 rows has been accepted and, not surprisingly, when we claim that we have roughly half the number of object_type values the final estimate of rows in the join is roughly halved.

So what happens in the fourth plan when our hint applies to a view that no longer exists? I think the optimizer should have discarded the hint as irrelevant the moment it merged the view. Unfortunately it seems to have carried the hint up into the merged view and used it to produce a wildly inaccurate estimate for the final cardinality. If this had been a three-table join this is the sort of error that could make a sensible hash join into a third table become an unbelievably stupid nested loop join. If you had thought you were doing something incredibly clever with (just) the one opt_estimate() hint, the day might come when a small change in the statistics resulted in the optimizer using a view merge strategy you’d never seen before and producing a catastrophic execution plan in (say) an overnight batch that then ran “forever”.

Hinting is hard, you really have to be extremely thorough in your hints and make sure you cover all the options that might appear. And then you might still run into something that looks (as this does) like a bug.

Footnote

Here’s a closing thought: even if you manage to tell the optimizer exactly how many rows will come out of a query block to be joined to the next table in the query, you may still get a very bad plan unless you can also tell the optimizer how many distinct values of the join column(s) there are in that data set. Which means you may also have to learn all about the (even more undocumented) column_stats() hint.

 

MERGE and ORA-8006

I’m sure there will be a slew of post-Kscope wrap-up posts coming out into the blogosphere, so in lieu of that, and the fact that I’m just stuck in an airport waiting for a flight, I’ll offer something slightly more technical. I did a post a while back about a curious error, “unable to get a stable set of rows”, when using MERGE. Here is another variant which can occur when you allow rows to physically move during a MERGE.

“How is that possible?” I hear you ask. Easy. All we need is a partitioned table with ENABLE ROW MOVEMENT.



SQL> create table t (pk number primary key, x number)
  2      partition by list (pk)
  3      (partition p1 values(1),
  4       partition p2 values(2)
  5      )
  6  enable row movement;

Table created.

SQL>
SQL> insert into t values (1, 1);

1 row created.

SQL> commit;

Commit complete.

SQL>
SQL> merge into t
  2  using (select 1 idx, 2 new_val from dual
  3         connect by level <= 2
  4        ) u
  5  on (t.x = u.idx)
  6  when matched then
  7    update set pk=new_val;
merge into t
           *
ERROR at line 1:
ORA-08006: specified row no longer exists

You can see that the merge is going to first change the PK column values from 1 to 2, which will move the row from one partition to another. The second row from the source (CONNECT BY LEVEL <= 2) will go hunting for that original value of 1 in its original partition and not find it there. Hence it “no longer exists”.

This is just another example of why you might want to consider cleansing input data for sensibility before aimlessly throwing it at a MERGE command.
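
As a hedged sketch of that kind of cleansing (one option among many): collapse the source to a single row per join key before the MERGE, so that no target row can ever be matched twice.

merge into t
using (select idx, max(new_val) new_val        -- one row per join key
       from  (select 1 idx, 2 new_val from dual
              connect by level <= 2)
       group by idx
      ) u
on (t.x = u.idx)
when matched then
  update set pk = u.new_val;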


Hi Vivek,
Right, but in a mixed workload, reporting is done at the same time as OLTP. Parallel Query is OK (when the resource manager manages to keep resources for the OLTP), but PDML blocks all modifications on its target tables and on all tables linked to them by referential integrity. It is more than a “reporting” service. It is more like an “I am alone on these tables” service.

Hi Gg,

Thanks for the feedback.
The reason is documented in https://support.oracle.com/epmos/faces/DocContentDisplay?id=2433906.1: parallel processes have their own transactions and the coordinator uses two-phase commit (2PC). Concurrent transactions could mess up everything.

opt_estimate 3

This is just a quick note to throw out a couple of the lesser-known options for the opt_estimate() hint – and they may be the variants that are most useful, since they address a problem where the optimizer can produce consistently bad cardinality estimates. The first is the “group by” option – a hint that I once would have called a “strategic” hint but which more properly ought to be called a “query block” hint. Here’s the simplest possible example (tested under 12.2, 18.3 and 19.2):


rem
rem     Script:         opt_est_gby.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem 

create table t1
as
select
        rownum                  id,
        mod(rownum,200)         n1,
        lpad(rownum,10,'0')     v1,
        rpad('x',100)           padding
from
        dual
connect by
        level <= 3000
;

set autotrace on explain

prompt  =============================
prompt  Baseline cardinality estimate
prompt  (correct cardinality is 10)
prompt  Estimate will be 200
prompt  =============================

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

I’ve generated a table of 3,000 rows with a column n1 holding 15 rows each of 200 distinct values. The query then aggregates on mod(n1,10) so it has to return 10 rows, but the optimizer doesn’t have a mechanism for inferring this and produces the following plan – the Rows value from the HASH GROUP BY at operation 1 is the only thing we’re really interested in here:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   200 |   800 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |   200 |   800 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

It looks as if the optimizer’s default position is to use num_distinct from the underlying column as the estimate for the aggregate. We can work around this in the usual two ways with an opt_estimate() hint. First, let’s tell the optimizer that it’s going to over-estimate the cardinality by a factor of 10:


select  /*+
                qb_name(main)
                opt_estimate(@main group_by, scale_rows = 0.1)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    20 |    80 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    20 |    80 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

The hint uses group_by as the critical option parameter, and then I’ve used the standard scale_rows=nnn to set a scaling factor that should be used to adjust the result of the default calculation. At 10% (0.1) this gives us an estimate of 20 rows.

Alternatively, we could simply tell the optimizer how many rows we want it to believe will be generated for the aggregate – let’s just tell it that the result will be 10 rows.

select  /*+
                qb_name(main)
                opt_estimate(@main group_by, rows = 10)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    10 |    40 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

We use the same group_by as the critical parameter, with rows=nnn.

Next steps

After an aggregation there’s often a “having” clause, so you might consider using the group_by option to fix up the cardinality of the having clause if you know what the normal effect of the having clause should be. For example: “having count(*) > NNN” will use the optimizer’s standard 5% “guess” and “having count(*) = NNN” will use the standard 1% guess. However, having seen the group_by option I took a guess that there might be a having option to the opt_estimate() hint as well, so I tried it. With autotrace enabled, here are three queries: first the unhinted baseline (which uses the standard 5% on my having clause), then a couple of others with hints to tweak the cardinality:

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main having scale_rows=0.4)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main group_by scale_rows=2)
                opt_estimate(@main having scale_rows=0.3)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

The first query gives us the baseline cardinality of 10 (5% of 200). The second query scales the having cardinality down by a factor of 0.4 (which means an estimate of 4). The final query first doubles the group by cardinality (to 400), then scales the having cardinality (which would have become 20) down by a factor of 0.3, with the nett effect of producing a cardinality of 6. Here are the plans.

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    10 |    40 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |   --  10
|   2 |   HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |   -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     4 |    16 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   4
|   2 |   HASH GROUP BY     |      |     4 |    16 |    10  (10)| 00:00:01 |    -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     6 |    24 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   6
|   2 |   HASH GROUP BY     |      |     6 |    24 |    10  (10)| 00:00:01 |    -- 400
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

It’s a little sad that the FILTER operation shows no estimate while the HASH GROUP BY operation shows the estimate after the application of the having clause. It would be nice to see the plan reporting the figures which I’ve added at the end of the lines for operations 1 and 2.

You may wonder why one would want to increase the estimate for the group by then reduce it for the having. While I’m not going to go to the trouble of creating a worked example it shouldn’t be too hard to appreciate the idea that the optimizer might use complex view merging to postpone a group by until after a join – so increasing the estimate for a group by might be necessary to ensure that that particular transformation doesn’t happen, while following this up with a reduction to the having might then ensure that the next join is a nested loop rather than a hash join. Of course, if you don’t need to be this subtle you might simply take advantage of yet another option to the opt_estimate() hint, the query_block option – but that will (probably) appear in the next article in this series.

 

Oracle Materialized View Refresh Group atomicity — How to prove transactional consistency with LogMiner

I recently had to prove to myself, and then to Oracle Support, that a Materialized View Group was not refreshed atomically, as it should be according to the documentation:

Add materialized views to a refresh group to ensure transactional consistency between the related materialized views in the refresh group. When a refresh group is refreshed, all materialized views that are added to a particular refresh group are refreshed at the same time.

Database Administrator's Guide

AskTOM

The first idea was suggested by the user who encountered the issue. He had read a similar question on AskTOM, which is a very good idea:

Ask TOM does DBMS_REFRESH.REFRESH a atomic refresh of mviews?

There, Connor quickly shows that there are no intermediate commits by enabling SQL_TRACE and looking at the XCTEND lines from the trace dump.

sql_trace

So I did the same in my case and here is an extract from the interesting lines, just grepping the inserts and the XCTEND:

Clearly, the first 4 tables were done in the same transaction. But we can see some commits between the 4 others. This seemed to confirm what the user had observed: a query on the tables shows data from different points in time. And then I opened an SR to file a bug.

However, the support engineer did not accept this as proof, because the XCTEND markers can come from recursive transactions. And he is totally right. With SQL_TRACE you can prove that it is atomic, but you cannot prove that it is not.

LogMiner

When it comes to transactions, LogMiner is the right tool. It is incredibly powerful (all persistent changes on your database go to the redo stream, and LogMiner can read the most interesting information out of it). And it is incredibly easy to use — at least until the latest release, where Oracle removed many replication features which may overlap with GoldenGate — a product sold separately.

Here I’m refreshing the materialized view group LSA.CCDB_VIEWS, which contains 8 MVIEWs. I am tracking the SCN before (scn1) and after (scn2).

set numwidth 16 linesize 200 pagesize 1000
column scn1 new_value scn1
column scn2 new_value scn2
column sid format 999999 new_value sid
column seg_type format 99
column seg_name format a30
column seg_owner format a12
column operation format a12
set linesize 1000
alter database add supplemental log data;
select current_timestamp,current_scn scn1,sys_context('userenv','sid') sid from v$database;
exec dbms_refresh.refresh('LSA.CCDB_VIEWS');
commit;
select current_timestamp,current_scn scn2,sys_context('userenv','sid') sid from v$database;
alter database drop supplemental log data;

Note that as I don’t have supplemental logging enabled here, I enable it just for this test. It is not a bad idea to enable it always, as long as the redo size is acceptable.

Then I start LogMiner for this SCN range. I use CONTINUOUS_MINE as I am on 18c here (it has been deprecated, de-supported and even removed in 19c).
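
Something like the following (a sketch only; the exact options and filtering are assumptions, not the original script):

-- start LogMiner on the SCN range captured above, reading the dictionary
-- from the online catalog and mining the redo continuously
begin
  dbms_logmnr.start_logmnr(
    startscn => &scn1,
    endscn   => &scn2,
    options  => dbms_logmnr.dict_from_online_catalog
              + dbms_logmnr.continuous_mine
  );
end;
/
-- list the mined changes on the refreshed tables, grouped by transaction
select xidusn||'.'||xidslt||'.'||xidsqn xid,
       seg_owner, seg_name, operation, count(*) changes
from   v$logmnr_contents
where  seg_owner = 'LSA'
group by xidusn, xidslt, xidsqn, seg_owner, seg_name, operation
order by 1;
exec dbms_logmnr.end_logmnr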

Glitches

Here’s a question just in from Oracle-L that demonstrates the pain of assuming things work consistently when sometimes Oracle development hasn’t quite finished a bug fix or enhancement. Here’s the problem – which starts from the “scott.emp” table (which I’m not going to create in the code below):

rem
rem     Script:         fbi_fetch_first_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem 

-- create and populate EMP table from SCOTT demo schema

create index e_sort1 on emp (job, hiredate);
create index e_low_sort1 on emp (lower(job), hiredate);

set serveroutput off
alter session set statistics_level = all;
set linesize 156
set pagesize 60

select * from emp where job='CLERK'         order by hiredate fetch first 2 rows only; 
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last outline alias'));

select * from emp where lower(job)='clerk' order by hiredate fetch first 2 rows only; 
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last outline alias'));

Both queries use the 12c “fetch first” feature to select two rows from the table. We have an index on (job, hiredate) and a similar index on (lower(job), hiredate), and given the similarity of the queries and the respective indexes (get the first two rows by hiredate where job/lower(job) is ‘CLERK’/’clerk’) we might expect to see the same execution plan in both cases with the only change being the choice of index used. But here are the plans:


select * from emp where job='CLERK'         order by hiredate fetch
first 2 rows only

Plan hash value: 92281638

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |      1 |        |     2 (100)|      2 |00:00:00.01 |       4 |
|*  1 |  VIEW                         |         |      1 |      2 |     2   (0)|      2 |00:00:00.01 |       4 |
|*  2 |   WINDOW NOSORT STOPKEY       |         |      1 |      3 |     2   (0)|      2 |00:00:00.01 |       4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| EMP     |      1 |      3 |     2   (0)|      3 |00:00:00.01 |       4 |
|*  4 |     INDEX RANGE SCAN          | E_SORT1 |      1 |      3 |     1   (0)|      3 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."HIREDATE")<=2)
   4 - access("JOB"='CLERK')


select * from emp where lower(job)='clerk' order by hiredate fetch
first 2 rows only

Plan hash value: 4254915479

-------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             |      1 |        |     1 (100)|      2 |00:00:00.01 |       2 |       |       |          |
|*  1 |  VIEW                                 |             |      1 |      2 |     1   (0)|      2 |00:00:00.01 |       2 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK             |             |      1 |      1 |     1   (0)|      2 |00:00:00.01 |       2 |  2048 |  2048 | 2048  (0)|
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| EMP         |      1 |      1 |     1   (0)|      4 |00:00:00.01 |       2 |       |       |          |
|*  4 |     INDEX RANGE SCAN                  | E_LOW_SORT1 |      1 |      1 |     1   (0)|      4 |00:00:00.01 |       1 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."HIREDATE")<=2)
   4 - access("EMP"."SYS_NC00009$"='clerk')


As you can see, with the “normal” index Oracle is able to walk the index “knowing” that the data is appearing in order, and stopping as soon as possible (almost) – reporting the WINDOW operation as “WINDOW NOSORT STOPKEY”. On the other hand with the function-based index Oracle retrieves all the data by index, sorts it, then applies the ranking requirement – reporting the WINDOW operation as “WINDOW SORT PUSHED RANK”.

Clearly it’s not going to make a lot of difference to performance in this tiny case, but there is a threat that the whole data set for ‘clerk’ will be accessed – and that’s the first performance threat, with the additional threat that the optimizer might decide that a full tablescan would be more efficient than the index range scan.

Can we fix it?

Yes, Bob, we can. The problem harks back to a limitation that probably got fixed some time between 10g and 11g – here are two, simpler, queries against the emp table and the two new indexes, each with the resulting execution plan when run under Oracle 10.2.0.5:


select ename from emp where       job  = 'CLERK' order by hiredate;
select ename from emp where lower(job) = 'clerk' order by hiredate;

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     3 |    66 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP     |     3 |    66 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | E_SORT1 |     3 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("JOB"='CLERK')


--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |     3 |    66 |     3  (34)| 00:00:01 |
|   1 |  SORT ORDER BY               |             |     3 |    66 |     3  (34)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| EMP         |     3 |    66 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | E_LOW_SORT1 |     3 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(LOWER("JOB")='clerk')

The redundant SORT ORDER BY is present in 10g even for a simple index range scan. By 11.2.0.4 the optimizer was able to get rid of the redundant step, but clearly there’s a little gap in the code relating to the over() clause that hasn’t acquired the correction – even in 18.3.0.0 (or 19.2 according to a test on https://livesql.oracle.com).

To fix the 10g problem you just had to include the first column of the index in the order by clause: the result doesn’t change, of course, because you’re simply prefixing the required columns with a column which holds the single value you were probing the index for, but suddenly the optimizer realises that it can do a NOSORT operation – so the “obvious” guess was to do the same for this “fetch first” example:

select * from emp where lower(job)='clerk' order by lower(job), hiredate fetch first 2 rows only;

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |      1 |        |     3 (100)|      2 |00:00:00.01 |       4 |
|*  1 |  VIEW                         |             |      1 |      2 |     3  (34)|      2 |00:00:00.01 |       4 |
|*  2 |   WINDOW NOSORT STOPKEY       |             |      1 |      1 |     3  (34)|      2 |00:00:00.01 |       4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| EMP         |      1 |      1 |     2   (0)|      3 |00:00:00.01 |       4 |
|*  4 |     INDEX RANGE SCAN          | E_LOW_SORT1 |      1 |      1 |     1   (0)|      3 |00:00:00.01 |       2 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."SYS_NC00009$","EMP"."HIREDATE")<=2)
   4 - access("EMP"."SYS_NC00009$"='clerk')

It’s just one of those silly little details where you can waste a HUGE amount of time (in a complex case) because it never crossed your mind that something that clearly ought to work might need testing for a specific use case – and I’ve lost count of the number of times I’ve been caught out by this type of “not quite finished” anomaly.

Footnote

If you follow the URL to the Oracle-L thread you’ll see that Tanel Poder has supplied a couple of MoS Document Ids discussing the issue and warning of other bugs with virtual column / FBI translation, and has shown an alternative workaround that takes advantage of a hidden parameter.