Oakies Blog Aggregator

Enforcing a re-parse of a cursor in Autonomous Database. Using a hammer

While setting up a demo environment in my Autonomous Database I found that one of my tuning techniques – evicting a cursor from the shared pool – isn’t quite as straightforward as it could be. In fact, at the time of writing it wasn’t possible to force a cursor re-parse in Autonomous Database (to the best of my knowledge).

The problem

Executing my flush.sql script failed, and from what I can tell, dbms_shared_pool is not exposed to users in Autonomous Transaction Processing database at the moment:

SQL> show user
USER is "ADMIN"
SQL> desc sys.dbms_shared_pool
ERROR:
ORA-04043: object sys.dbms_shared_pool does not exist

If dbms_shared_pool isn’t accessible, it makes it pretty hard to purge a cursor with the surgical precision this package offers.
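For context, on a database where the package is available, purging a single cursor usually looks something like the following sketch (the address and hash value shown are made-up placeholders, not real values):

SQL> select address, hash_value from v$sqlarea where sql_id = '&sql_id';

SQL> -- plug the two values into the purge call; 'C' flags the object as a cursor
SQL> exec sys.dbms_shared_pool.purge('000000008A7B3C10, 3141592653', 'C')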

A potential workaround exists

There is a workaround, however it affects all cursors referencing a table. It’s neither granular nor very elegant, and it has the potential to do more harm than good due to the potentially high number of cursor invalidations. I’d say the workaround sits somewhere between dbms_shared_pool.purge and flushing the shared pool altogether.

Due to its broad scope, be very careful applying this workaround: exhaust all other established methods first, and refrain from using it unless you are absolutely sure about what you are about to do. Consider yourself warned.

Setting the scene

As always, I’m using Swingbench for this example. I created the SOE schema in the Autonomous Transaction Processing database; it’s quite straightforward to do and documented on Dominic Giles’ blog. I had to massage the data model a little after creating a rather large number of rows to reproduce my classic bind variable issue.

Consider the ORDERS table, and more specifically, ORDER_MODE_EXT:

SOE> select count(*), order_mode_ext from orders group by order_mode_ext;

  COUNT(*) ORDER_MOD
---------- ---------
  12625080 online
    113920 undefined
    231820 direct

3 rows selected. 

I created an index and frequency histogram on order_mode_ext, and I’m triggering an index-based access path by using a SQL statement and a bind variable. You will notice that I’m running SQL scripts; these are of course not mine, they are Tanel Poder’s.
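The exact commands aren’t shown in this post, but the setup would have looked roughly like this (my sketch, not the post’s exact statements; the index name matches the one appearing in the execution plans below, the size 254 choice for the histogram is an assumption):

create index i_orders_omext on soe.orders (order_mode_ext);

begin
    dbms_stats.gather_table_stats(
        ownname    => 'SOE',
        tabname    => 'ORDERS',
        method_opt => 'for columns order_mode_ext size 254' -- frequency histogram: only 3 distinct values
    );
end;
/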

SQL> var om varchar2(20)
SQL> exec :om := 'undefined' 

PL/SQL procedure successfully completed.

SQL> select /* atp-test */ /*+ gather_plan_statistics */ count(*) from orders where order_mode_ext = :om;

  COUNT(*)
----------
    113920

1 row selected.

SQL_ID: 1cw6t95y0jcsm 

By the way, modern SQL*Plus can print the SQL ID if you “set feedback on sql_id”, as you can see. Let’s have a look at the execution plan:

SQL> select plan_hash_value, child_number, invalidations, executions, is_bind_aware, is_bind_sensitive 
 2 from v$sql where sql_id = '1cw6t95y0jcsm';

PLAN_HASH_VALUE CHILD_NUMBER INVALIDATIONS EXECUTIONS I I
--------------- ------------ ------------- ---------- - -
      545586833            0             0          1 N Y

SQL> @xi 1cw6t95y0jcsm %
eXplain the execution plan for sqlid 1cw6t95y0jcsm child %...

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------
SQL_ID  1cw6t95y0jcsm, child number 0
-------------------------------------
select /* atp-test */ /*+ gather_plan_statistics */ count(*) from
orders where order_mode_ext = :om

Plan hash value: 545586833

-----------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                |      1 |        |   311 (100)|      1 |00:00:00.01 |     358 |
|   1 |  SORT AGGREGATE   |                |      1 |      1 |            |      1 |00:00:00.01 |     358 |
|*  2 |   INDEX RANGE SCAN| I_ORDERS_OMEXT |      1 |    113K|   311   (1)|    113K|00:00:00.01 |     358 |
-----------------------------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - :1 (VARCHAR2(30), CSID=873): 'undefined'

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ORDER_MODE_EXT"=:OM)


25 rows selected. 

Tanel’s scripts prove the expected index-based access path was chosen for my bind value. Index-based access is OK for this particular bind value (‘undefined’) because it matches only a small subset of the table. But what about the other extreme, ‘online’ orders? Let’s try passing a different value in the bind variable:

SQL> exec :om := 'online'

PL/SQL procedure successfully completed.

SQL> select /* atp-test */ /*+ gather_plan_statistics */ count(*) from orders where order_mode_ext = :om;

  COUNT(*)
----------
  12625080

1 row selected. 
 
SQL_ID: 1cw6t95y0jcsm

Again, let’s have a look at the shared pool:

SQL>@xi 1cw6t95y0jcsm %
eXplain the execution plan for sqlid 1cw6t95y0jcsm child %...

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------
SQL_ID  1cw6t95y0jcsm, child number 0
-------------------------------------
select /* atp-test */ /*+ gather_plan_statistics */ count(*) from
orders where order_mode_ext = :om

Plan hash value: 545586833

-----------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                |      1 |        |   311 (100)|      1 |00:00:01.01 |   33760 |
|   1 |  SORT AGGREGATE   |                |      1 |      1 |            |      1 |00:00:01.01 |   33760 |
|*  2 |   INDEX RANGE SCAN| I_ORDERS_OMEXT |      1 |    113K|   311   (1)|     12M|00:00:01.36 |   33760 |
-----------------------------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - :1 (VARCHAR2(30), CSID=873): 'undefined'

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ORDER_MODE_EXT"=:OM)


25 rows selected.

SQL>select plan_hash_value, child_number, invalidations, executions, is_bind_aware, is_bind_sensitive
 2 from v$sql where sql_id = '1cw6t95y0jcsm';

PLAN_HASH_VALUE CHILD_NUMBER INVALIDATIONS EXECUTIONS I I
--------------- ------------ ------------- ---------- - -
      545586833            0             0          2 N Y  

No change after the second execution (I know, this isn’t the full story).

Triggering the re-parse

Let’s assume further that PHV 545586833 (the index-based plan) causes immense problems for the application and you need the database to reparse quickly since you know the “wrong” bind variable has been passed to the cursor as a one-off. You also realise that you forgot to implement plan stability measures to prevent this problem from occurring in the first place.

Keeping with the story, you don’t want to wait for Adaptive Cursor Sharing to help you, and need to take action. In the past, you could have used dbms_shared_pool to remedy the situation, but again, it’s not available at the time of writing.

There is another option though, but it comes with caveats.

From what I can tell the process you are about to read is the only option to force a reparse of a cursor referencing the table, but it’s a blunt and harsh method. It will force a re-parse of all cursors referencing the table. Only use it if there is no other option! You may end up fixing one problem and causing many others.

A table-DDL operation causes Oracle to re-parse all SQL statements referring to the table. After the most careful consideration of the side-effects on other SQL statements, you have been given the green light by management to go ahead. The potential gain is considered worth the pain.

Before I made any changes, here’s the state of play:

SQL> select object_name, object_type, sysdate as now, to_char(last_ddl_time,'dd.mm.yyyy hh24:mi:ss')
 2 from dba_objects where object_name = 'ORDERS' and owner = 'SOE';

OBJECT_NAME                    OBJECT_TYPE             NOW                 TO_CHAR(LAST_DDL_TI
------------------------------ ----------------------- ------------------- -------------------
ORDERS                         TABLE                   2020-10-25 19:43:36 25.10.2020 19:06:34

The degree of parallelism on my table is already 1 (noparallel), so let’s try a DDL operation that doesn’t actually change anything.

SQL> select degree from dba_tables where owner = 'SOE' and table_name = 'ORDERS';

DEGREE
----------------------------------------
         1

SQL> alter table soe.orders noparallel;

Table altered.

SQL> select degree from dba_tables where owner = 'SOE' and table_name = 'ORDERS';

DEGREE
----------------------------------------
         1

SQL> select object_name, object_type, sysdate as now, to_char(last_ddl_time,'dd.mm.yyyy hh24:mi:ss')
 2 from dba_objects where object_name = 'ORDERS' and owner = 'SOE';

OBJECT_NAME                    OBJECT_TYPE             NOW                 TO_CHAR(LAST_DDL_TI
------------------------------ ----------------------- ------------------- -------------------
ORDERS                         TABLE                   2020-10-25 19:44:24 25.10.2020 19:44:12

No change was made to the table’s DOP, but the last DDL time changed. This will force ALL cursors referencing this table to become invalidated.

Now let’s run the “problem” query again, and check the execution statistics:

SQL> select /* atp-test */ /*+ gather_plan_statistics */ count(*) from orders where order_mode_ext = :om;

  COUNT(*)
----------
  12625080

1 row selected.

SQL_ID: 1cw6t95y0jcsm

SQL>@xi 1cw6t95y0jcsm %
eXplain the execution plan for sqlid 1cw6t95y0jcsm child %...

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------
SQL_ID  1cw6t95y0jcsm, child number 0
-------------------------------------
select /* atp-test */ /*+ gather_plan_statistics */ count(*) from
orders where order_mode_ext = :om

Plan hash value: 3862103574

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |      1 |        |   594 (100)|      1 |00:00:01.10 |   34845 |
|   1 |  SORT AGGREGATE               |                |      1 |      1 |            |      1 |00:00:01.10 |   34845 |
|*  2 |   INDEX STORAGE FAST FULL SCAN| I_ORDERS_OMEXT |      1 |     12M|   594  (31)|     12M|00:00:01.44 |   34845 |
-----------------------------------------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - :1 (VARCHAR2(30), CSID=873): 'online'

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("ORDER_MODE_EXT"=:OM)


25 rows selected.

SQL>select plan_hash_value, child_number, invalidations, executions, is_bind_aware, is_bind_sensitive
 2 from v$sql where sql_id = '1cw6t95y0jcsm';

PLAN_HASH_VALUE CHILD_NUMBER INVALIDATIONS EXECUTIONS I I
--------------- ------------ ------------- ---------- - -
     3862103574            0             1          1 N Y   

As a direct consequence of the DDL operation against the orders table, a new plan has been chosen for SQL ID 1cw6t95y0jcsm (and for all other statements referencing the table!). As you can see there isn’t a new child cursor for SQL ID 1cw6t95y0jcsm; the old cursor has been evicted completely.

Remember that a DDL operation such as the one shown in this post is a blunt weapon and only to be used as a last resort.
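If this scenario keeps recurring, the plan stability measures mentioned earlier are the better long-term fix. As a sketch only (assuming the full-scan plan is the one you want to pin, and that DBMS_SPM is available in your environment), a SQL plan baseline could be loaded straight from the cursor cache:

declare
    l_plans pls_integer;
begin
    -- load the current (full scan) plan for the statement as an accepted baseline
    l_plans := dbms_spm.load_plans_from_cursor_cache(
                   sql_id          => '1cw6t95y0jcsm',
                   plan_hash_value => 3862103574
               );
    dbms_output.put_line(l_plans || ' plan(s) loaded');
end;
/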

Video : Scalable Sequences in Oracle Database 18c Onward

In today’s video we’ll discuss Scalable Sequences, which were documented for the first time in Oracle 18c.
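For reference (my illustration, not taken from the video or article), the syntax boils down to something like this:

create sequence order_seq
  start with 1
  increment by 1
  scale extend;  -- generated values are prefixed with instance/session digits to spread index inserts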

The video is based on this article.

The star of today’s video is David Peak, who is now working on the Oracle Pandemic Response Systems. This video is a throwback to a hotel we stayed at in São Paulo a few years back.

Cheers

Tim…


Vagrant & Docker Builds : APEX 20.2 and other updates

The recent release of APEX 20.2 has triggered a build frenzy.

Vagrant

All my GitHub Vagrant builds that include APEX have been updated to APEX 20.2. The builds themselves are unchanged. This was literally an update to the environment files, so it took longer to test the builds than it did to make the changes.

While I was at it, I did a couple of extra updates. I updated Tomcat to version 9.0.39 on all relevant builds, and updated the optional patch script for the single instance database 19c on OL8 build to use the October 2020 bundle patch. The GI bundle isn’t available yet, so I’ve not altered the OL8 19c RAC build. That will happen soon.

Update: I’ve got the GI bundle patch now, and the OL8 19c RAC build has been updated to use it.

There will of course be more updates to the builds once we get the new versions of AdoptOpenJDK, ORDS and SQLcl, which are probably coming soon.

Packer

I mentioned in my VirtualBox 6.1.16 post I would be updating the oraclebase/oracle-7 and oraclebase/oracle-8 vagrant boxes to include the VirtualBox 6.1.16 guest additions. Those are done now.

Docker

This is pretty much the same as the Vagrant story.

The relevant GitHub Docker builds for Oracle database and ORDS containers have been updated to include APEX 20.2.

I’ve also added Tomcat 9.0.39 to the ORDS builds, and updated the optional patch script for the database 19c on OL8 build to use the October 2020 bundle patch.

Once again, more changes will appear as the new versions of AdoptOpenJDK, ORDS and SQLcl appear.

Conclusion

Automation is awesome! A few minutes and we are bang up to date!

Cheers

Tim…


What is a database backup (back to the basics)

By Franck Pachot

TL;DR:

  • do not consider a dump (like PostgreSQL pg_dump or Oracle expdp) as a database backup
  • do not consider that your backup is successful if you didn’t test recovery
  • databases provide physical database backups, easy and safe to restore and recover to any point-in-time between first backup and point of failure
  • managed databases provide an easy recovery interface, but don’t trust it before you try it
  • and…

I’ve written this after reading “We deleted the production database by accident”.
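To make the distinction between a dump and a backup concrete, here is a minimal sketch (my illustration, not from the post) of a physical backup and a point-in-time recovery with Oracle RMAN; the timestamp is just a placeholder:

RMAN> backup database plus archivelog;

# later, after the accident, recover to just before the failure:
RMAN> run {
        set until time "to_date('2020-10-25 11:59:00','yyyy-mm-dd hh24:mi:ss')";
        restore database;
        recover database;
      }
RMAN> alter database open resetlogs;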

Column Groups

Here’s an odd little detail about the statistics of column groups. At first glance it’s counter-intuitive but it’s actually an “obvious” (once you’ve thought about it for a bit) consequence of the approximate_ndv() algorithm for gathering stats.

I’ll present it as a question:

I have a table with two columns: flag and v1. Although the columns are not declared as non-null, neither holds any nulls. If there are 26 distinct values for flag, and 1,000,000 distinct values for v1, what’s the smallest number of distinct values I should see if I create the column group (flag, v1) ?

The question is, of course, a little ambiguous – there’s the number of distinct values that the column (group) holds and the number that a fresh gather of statistics reports it as holding. Here are the stats from a test run of a simple script that creates, populates and gathers stats on my table:

select  column_name, num_distinct 
from    user_tab_cols 
where   table_name = 'T1'
/

COLUMN_NAME                      NUM_DISTINCT
-------------------------------- ------------
FLAG                                       26
ID                                    1000000
V1                                     999040
SYS_STUQ#TO6BT1REX3P1BKO0ULVR9         989120

There are actually 1,000,000 distinct values for v1 (it’s a varchar2() representation of the id column), but the approximate_ndv() mechanism can have an error of (I believe) up to roughly 1.3%, so Oracle’s estimate here is a little bit off.

The column group (represented by the internal column definition SYS_STUQ#TO6BT1REX3P1BKO0ULVR9) must hold (at least) 1,000,000 distinct values – but the error in this case is a little larger than the error in v1, with the effect that the number of combinations appears to be less than the number of distinct values for v1!

There’s not much difference in this case between actual and estimate, but the test demonstrates the potential for a significant difference between the estimate and the arithmetic that Oracle would do if the column group didn’t exist. Nominally the optimizer would assume there were 26 million distinct values (though in this case I had only created 1M rows in the table and the optimizer’s sanity check would cap that 26M at the row count).

So, although the difference between actual and estimate is small, we have to ask the question – are there any cases where the optimizer will ignore the column group stats because of a sanity check that “proves” the estimate is “wrong” – after all it must be wrong if the num_distinct is less than the num_distinct of one of the components. Then again maybe there’s a sanity check that only ignores the column group if the estimate is “wrong enough”, but allows for small variations.

I mention this only because an odd optimizer estimate has shown up recently on the Oracle-L mailing list, and the only significant difference I can see (at present) is that a bad plan appears for a partition where this column group anomaly shows up in the stats, but a good plan appears when the column group anomaly isn’t present.

Footnote:

If you want to recreate the results above, here’s the model I’ve used (tested on 19.3.0.0 and 11.2.0.4):

rem
rem     Script:         column_group_stats_5.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             11.2.0.4
rem 

execute dbms_random.seed(0)

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        chr(65 + mod(rownum,26))        flag,
        rownum                          id,
        lpad(rownum,10,'0')             v1
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6   -- > comment to avoid WordPress format issue
order by
        dbms_random.value
/


select  column_name, num_distinct 
from    user_tab_cols 
where   table_name = 'T1'
/

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1 for columns(v1, flag) size 1'
        );
end;
/
 
select  column_name, num_distinct 
from    user_tab_cols 
where   table_name = 'T1'
/

Footnote 2:

As an interesting little statistical quirk, if I defined the column group as (flag, v1) rather than (v1, flag) the estimate for the column group num_distinct was 1,000,000.
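Incidentally (my addition, not part of Jonathan’s script), dbms_stats can translate between the internal SYS_STU... name and the column group definition, and can create the (flag, v1) variant explicitly:

select dbms_stats.show_extended_stats_name(user, 'T1', '(v1, flag)') cg_name from dual;

select dbms_stats.create_extended_stats(user, 'T1', '(flag, v1)') cg_name from dual;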

VirtualBox 6.1.16

VirtualBox 6.1.16 has been released.

The downloads and changelog are in the usual places.

I’ve done the installation on Windows 10, macOS and Oracle Linux 7 hosts with no drama.

I’ll be running new Packer builds for the oraclebase/oracle-7 and oraclebase/oracle-8 vagrant boxes, so they should appear with the new version of the guest additions over the next day or so.

Cheers

Tim…

PS. It seems I didn’t write a blog post about the release of 6.1.14, but I am using it on all my servers. Interesting… I can only assume I was abducted by aliens immediately after upgrading to 6.1.14. It’s the only possible explanation!


Video : Vagrant Oracle Real Application Clusters (RAC) Build

In today’s video we’ll discuss how to build a 2-node RAC setup using Vagrant.

This video is based on the OL8 19c RAC build, but it’s similar to that of the OL7 19c RAC build also. If you don’t have access to the patches from MOS, stick with the OL7 build, as it will work with the 19.3 base release. The GitHub repos are listed here.

If you need some more words to read, you can find descriptions of the builds here, as well as a beginners guide to Vagrant.

The video is a talk through of using the build, not an explanation of each individual step in the build, as that would be a really long video. If you just want a RAC to play with, run the build and you’ll have one. If you want to learn about the steps involved in doing a RAC build, read the scripts that make up the build. Please don’t ask for a GUI step through of a build or I’ll be forced to ask you to read Why no GUI installations anymore?

Why a One-Week Report for AWR Sizing in Azure

It’s not uncommon for different recommended practices to arise around technical sizing and optimization. For many, it’s a compromise between the most accurate data and ease of access vs. the impact on production environments, which is no different from what we face when sizing Oracle on Azure.

As we know it’s important to lift and shift the workload rather than the hardware, so we must have a simple way to perform this task. The question comes up repeatedly as to why our preferred method is to work with a one-week Automatic Workload Repository (AWR) report. This post focuses on a small test of how different windows of AWR data collection impacted our recommended architecture and sizing in Azure for Oracle.

The examples used are from real customers, with the database and host information masked to protect the customer’s environments.  These customers provided me with both hourly, peak and/or one-week window reports for their AWR workloads.

Example 1

Our first example is a single node of a RAC database, with two submissions from the customer: a one-hour AWR report and a one-week report. What you will note is that the calculated sizing for host memory and CPU is quite close for both. Our challenge is that with Oracle on Azure, it’s almost always about the IOPS:

[Image: awr_exam1]

With a one-hour window, multiple reports need to be gathered, since a single one is likely to miss:

  • What was the peak workload?
  • What night-time batch processing occurs?
  • What demands do backups put on the database?
  • What are the top performance hits over time?
  • What long-running processes occur and the calculations from those that run over the one-hour report?

As stated earlier, IOPS are one of the biggest concerns with Oracle workloads on Azure. We have considerable vCPU and memory we can allocate, but storage needs to be carefully designed to meet the needs of these IO-hungry databases. As you can see from our first example, the IOPS doubled between the one-hour report and the one-week report. Also notice there are different values for the average, which comes straight from AWR, and for what we calculate as the peak workload. The peak values are derived with the understanding that the averages may miss workloads even in the one-week report, so we grant some “wiggle room” in our calculations. This is where the “Peak” values come in, and we size for those, never for the averages displayed in the worksheet.
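For anyone who wants to locate the peak interval themselves before submitting a report, a query along these lines against the AWR history can rank the busiest snapshots by I/O requests (my sketch, not the sizing worksheet’s logic; it needs the Diagnostics Pack, and the statistic names may need adjusting):

with io as (
  select sn.begin_interval_time,
         st.stat_name,
         -- dba_hist_sysstat values are cumulative, so take per-snapshot deltas
         st.value - lag(st.value) over (
             partition by st.stat_name, st.instance_number, st.dbid
             order by sn.snap_id
         ) as delta
  from   dba_hist_sysstat  st
  join   dba_hist_snapshot sn
         on  sn.snap_id         = st.snap_id
         and sn.dbid            = st.dbid
         and sn.instance_number = st.instance_number
  where  st.stat_name in ('physical read total IO requests',
                          'physical write total IO requests')
)
select begin_interval_time, round(sum(delta)) as io_requests
from   io
group  by begin_interval_time
order  by 2 desc nulls last
fetch  first 10 rows only;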

[Image: awr_exam2]

These numbers use a separate table that is configurable by workload. A workload that is heavier in memory, CPU or IOPS can be adjusted to grant more leverage to that area in the output, a decision the Oracle specialist makes when we expect more workload in a given area.

The sizing for the above example database would fall to a few VM solutions:

  • 4 vCPU
  • 39 GiB memory

There are several VMs that have 4 vCPU and 32 GiB of memory. I would then advise my customer which one of the E-series with an “S” designation (more flexibility in managed disk if we need to stripe more disks for redo, archive logs, etc.) with 32 GiB of memory would be best for their database.

For storage, our calculations state we require:

  • 5K IOPS and 3.2K MBPs.

There are only a few managed disk solutions that can provide this.  It’s not an exact value and it doesn’t need to be.  If you look at the table for storage, you’ll quickly see that this corresponds to only a few solutions for this type of heavy IO workload.  Even Ultra Disk is going to be pushed past its limit:

          Capacity per disk (GiB)   IOPS per disk   Throughput per disk (MB/s)
Minimum   4                         100             1
Maximum   65536                     160000          2000

 

This means we automatically bring in a solution that isn’t bound by the limits of the individual VM or the managed disk, which for Azure means Azure NetApp Files (ANF). Would we have known to bring in this option if we’d worked with the 1-hour AWR report? No; we may very well have recommended premium or ultra disk, missing the fact that the workload would end up demanding an architecture with fewer restrictions on IO.

This is not to say the customer must go with the recommendations. They could also undertake a serious optimization exercise, eliminate the heavy IO on the database, and decrease the resource needs in the cloud. As an optimization specialist, I’m never going to turn down this option from anyone; it works towards their long-term satisfaction in the cloud. It’s always about efficient use of your resources, but the first course is to lift and shift the workload as-is, which means that is what I need to architect for.

Example 2

This second example is from an average single instance database to be migrated to Azure.

I’ve numbered the workloads to make them unique in the output; three were submitted to Microsoft:

[Image: awr_exam3]

  1. A 1-hour workload
  2. A 1-hour peak workload
  3. A 1-week workload report

Workload can also be a perception thing for those who manage and monitor a database. I’m all about the data for a reason: how often have you met with IT teams who are sure that a certain time is the busiest for their application or database, and then activity reporting paints a very different picture? Here, the peak load resulted in less vCPU than the one-week report, and the IO ended up mixed: greater MBPs, but fewer IOPS in the 1-hour peak. When we take the wiggle room into consideration, we can see that the one-week peak of 17156.98 MBPs more than covers the 1-hour peak workload of 9126.86 MBPs.

In other words, the “wiggle room” applied to the one-week report would have captured the peak workload, satisfying the sizing exercise and sizing the Oracle workload for Azure correctly.

Example 3

The third example is from an Exadata: another large example, but an interesting one. This customer wanted to see the difference between a unique peak workload they had run one time (which they had an AWR report for) and their normal weekly workload, and it produced quite mixed results:

[Image: awr_exam4]

As you can see from the above output, the one-week report produced far less IO but significantly higher vCPU requirements. This is where we realize that an identified peak can be a peak for one resource and not another. It’s about the pain point, rarely across all resources.

In this case, the customer again required significant IOPS, and optimization would be needed to meet their needs. They had already faced this with other cloud vendors, and the fact that we produced the data for a baseline, along with optimization recommendations, was the reason they decided to go with Azure. The plan here is to optimize on-prem and then re-evaluate the workload, but again, the important reason to bring up this example is that a peak workload may not provide you with the results that you hoped for.

Again, when we look at the workload averages and compare them to the one-week peak numbers, we more than cover them, which is the goal with the peak numbers.

And the Consensus Is

Have we lost anything by having the customer run a single one-week report vs. running multiple reports to gather various peaks and/or running queries to collect data on peak workloads?

The answer in 98% of cases is No.

Do we decrease the need for manual human calculations of different workloads to come up with exact numbers for workloads with the one-week method of AWR sizing?  Do we decrease the chance for human error by simplifying to a single workload report as well?

Yes and Yes.

Are we using these numbers to size out the storage requirements for Azure NetApp Files, (ANF) or other advanced Azure services or products?

No. Those calculations are done with ANF’s own scripts and are understood by the customer to require a bit more investigation to ensure they get the best performance from the service.  Our calculations only tell us that we will need the service to meet the IO needs of the customer’s workload, which is accurate.

The one-week report offers the simplest and most effective sizing solution for estimating workloads for Azure: it doesn’t require extensive effort from the customer’s resources, it doesn’t put extra pressure on the customer’s databases, and it simplifies sizing the workload for Azure for those who may not be as familiar with Oracle as some of us.

 

 




Automattic Awarded Coveted Spot on Forbes Cloud 100 List

Automattic — a leader in publishing and e-commerce software and the parent company behind the industry-leading brands WordPress.com, WooCommerce, WordPress VIP, Jetpack, Tumblr, and more — was awarded a coveted spot on the prestigious Forbes Cloud 100 list, the annual ranking of the world’s top private cloud companies. In partnership with Bessemer Venture Partners and Salesforce Ventures, the Forbes Cloud 100 recognizes standouts in tech’s hottest categories from disruptive startups to internet giants.

A pioneer in democratizing publishing and e-commerce, WordPress powers 38 percent of all websites globally, has 10x the content management market share of its nearest competitor, and is the platform of choice for tens of millions of websites around the world. 

WooCommerce, Automattic’s e-commerce solution, powers 30 percent of the top one million global e-commerce websites — allowing anyone to sell anything from anywhere. With WooCommerce, people can build exactly the business they want, with everything they need to run their store on a single platform. 

Automattic’s technology also powers the largest brands on the web. The WordPress VIP Platform is used by more than 250 enterprises, including Facebook, Microsoft, Spotify, and CNN, to publish content to hundreds of millions of readers and users. VIP’s purpose-built infrastructure delivers flexibility, security, and control with unrivaled performance and effortless scaling.

Automattic’s innovation is also attracting a growing and diverse array of platform interactions  —  e.g. 1.7 million new users registering each month across the Automattic ecosystem, 1.2 billion monthly unique visitors on WordPress.com, and 9 billion monthly page views on Tumblr. 

“We are incredibly proud to be included in the Forbes Cloud 100 list — for the fifth year in a row — among so many other noteworthy companies,” said Matt Mullenweg, CEO, Automattic. “Our passion is making the web a better place, and I credit the extraordinary results over the years to the talented and wonderful people — both inside and outside our organization — who bring the Automattic vision to life every day.”

Join Performance for UUID, STRING, and INTEGER with CockroachDB

overview

To continue on the UUID performance thread, I was recently asked by a customer how joins perform with various data types. I had not run a specific test, but suspected performance would be driven mostly by the size of the data types.

I wanted to verify my assumptions with real test data that shows the core performance of joins with CockroachDB.

the schema, data, and queries

For this test, two tables were created for each data type (UUID, STRING, and INTEGER). The first table had one million rows and the second had 200k matching primary keys.

schema:

create table u1 (id uuid primary key);
create table u2 (id uuid primary key);

create table s1 (id string primary key);
create table s2 (id string primary key);

create table i1 (id integer primary key);
create table i2 (id integer primary key);

data load:

-- the snippet as published referenced a "token" column; gen_random_uuid() and the
-- tables' "id" column are assumed here so the script runs against the schema above
insert into u1 select gen_random_uuid() from generate_series(1,1000000);
insert into u2 select id from u1 offset 400000 limit 200000;

insert into s1 select id::string from u1;
insert into s2 select id::string from u1 offset 400000 limit 200000;

insert into i1 select gs from generate_series(1,1000000) as gs;
insert into i2 select gs from generate_series(400001,600000) as gs;

queries:

select count(*) from u1 join u2 using (id);
select count(*) from s1 join s2 using (id);
select count(*) from i1 join i2 using (id);
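If you want to see how CockroachDB executes the joins (and get the timings), any of these can be prefixed with EXPLAIN ANALYZE, for example:

explain analyze select count(*) from u1 join u2 using (id);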

sizing of types and tables

The data type sizes are as follows:

  • UUID :: 16 bytes
  • STRING :: 36 bytes to store UUID as string
  • INTEGER :: 8 bytes for default 64bit values

CockroachDB uses prefix compression to compress the primary key values, so the best way to compare is to look at the size of the ranges for the various tables.

-- UUID table (u1)
--
SQL> show ranges from table u1;
  start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
  NULL      | NULL    |       43 |     40.054517 |            1 |                       | {1}      | {""}

-- STRING table (s1) :: Storing UUIDs
--
SQL> show ranges from table s1;
  start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
  NULL      | NULL    |       45 |            60 |            1 |                       | {1}      | {""}

-- INTEGER table (i1) :: Storing INTEGER values
--
SQL> show ranges from table i1;
  start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities
------------+---------+----------+---------------+--------------+-----------------------+----------+---------------------
  NULL      | NULL    |       49 |     24.934101 |            1 |                       | {1}      | {""}

As you can see, the ranges are compressed pretty well, which will surely help improve the performance of joins for all data types.

          Raw size (MB)   Stored (MB)   Compression ratio
INTEGER   61.0            24.9          2.45
UUID      122.1           40.1          3.05
STRING    274.7           60.0          4.58

… and the winner is

As expected, the performance does follow the size of the various columns. The String data type did perform better than expected, mainly due to the compression of the prefix and data within CockroachDB.

Please feel free to take CockroachDB for a spin and try this for yourself. The instructions to reproduce are in my GitHub repository for join tests.