Troubleshooting

Here’s a question to provoke a little thought if you’ve got nothing more entertaining to do on a Sunday evening.  What threats do you think of when you see a statement like the following in (say) an AWR report, or in a query against v$sql ?

update tableX set
        col001 = :1, col002 = :2, col003 = :3, ...
        -- etc. 
        -- the names are supposed to indicate that the statement updates 302 columns
        -- etc.
        col301 = :301, col302 = :302
where
        pk_col = :303
;

I’ll be writing up some suggestions tomorrow (Monday, UK BST), possibly linking to a few other articles for background reading.

Update

The first three comments have already hit the high points, but I’m going to jot down a few notes anyway.

The first two things that really (should) make an impact are:

  • There’s a table in the application with (at least) 303 columns – anything over 255 is a potential disaster area
  • An update statement that updates 302 columns is probably machine generated by a non-scalable application

A secondary detail that might be useful is recognising the pattern of the text – lower case for column names, simple “:nnn” for bind variables.  As it stands I don’t recognise the possible source for this style, but I know it’s not SQL embedded in PL/SQL (which would be all capitals with “:Bnnn” as bind variable names), it’s not part of a distributed query from a remote database (which would be in capitals with quoted names, and table aliases like “A1, A2, …”), and it’s not “raw” Hibernate code, which produces meaningless table and column aliases based on numbers with a “t” for table and “c” for column.

So let’s think about possible problems and symptoms relating to the two obvious issues:

Wide tables

Once you have more than 255 (real) columns in a table – even if that count includes columns that have been marked unused – Oracle will have to split rows into “row pieces” that do not exceed 255 columns and chain those pieces together. Oracle will try to be as efficient as possible – with various damage-limiting code changes appearing across versions – attempting to store these row pieces together and keeping their number to a minimum, but there are a number of anomalies that can appear that have a significant impact on performance.
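
A quick way of finding out whether you have any candidates for this problem is simply to count columns per table from the data dictionary. Here’s a minimal sketch (note that it won’t include columns already marked unused – dba_unused_col_tabs can fill that gap if you need it):

select
        owner, table_name, count(*) column_count
from
        dba_tab_columns
group by
        owner, table_name
having
        count(*) > 255
order by
        count(*) desc
;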

Simply having to visit two row pieces to pick up a column in the 2nd row piece (even if it is in the same block) adds to the cost of processing; but when you have to visit a second block to acquire a 2nd (or 3rd, or 4th) row piece the costs can be significant. As a quirky little extra, Oracle’s initial generation of row pieces creates them from the end backwards – so a row with 256 columns starts with a row piece of one column followed by a row piece of 255 columns: so you may find that you have to fetch multiple row pieces for virtually every row you access.

It’s worth noting that row splitting is based only on the columns that have been used in the row. If your data is restricted to the first 255 columns of a row then the entire row can be stored as a single row piece (following the basic rule that “trailing nulls take no space”); but as soon as you start to update such a row by populating columns past the 255 boundary Oracle will start splitting from the end – and it may create a new trailing row piece each time you populate a column past the current “highest” column.  In an extreme case I’ve managed to show an example of a single row consisting of 746 row pieces, each in a different block (though that’s a bug/feature that’s been addressed in very recent versions of Oracle).
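
If you want to see the splitting behaviour for yourself, here’s a sketch of a simple demo (all names invented, and the exact counter behaviour varies with version): build a 300-column table with dynamic SQL, insert a row using only the first column, then update a column past the 255 boundary and check chain_cnt – which is populated by analyze, not by dbms_stats.

declare
        m_sql   varchar2(32767) := 'create table wide_t(';
begin
        for i in 1..300 loop
                m_sql := m_sql || 'col' || to_char(i,'fm000') || ' number,';
        end loop;
        m_sql := rtrim(m_sql, ',') || ')';
        execute immediate m_sql;
end;
/

insert into wide_t(col001) values(1);
commit;

update wide_t set col300 = 300 where col001 = 1;
commit;

-- analyze (not dbms_stats) populates chain_cnt
analyze table wide_t compute statistics;

select chain_cnt from user_tables where table_name = 'WIDE_T';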

If rows have been split across multiple blocks then one of the nastier performance problems appears with direct path read tablescans. As Oracle follows the pointer to a secondary row piece it will do a physical read of the target block then immediately forget the target block; so, for example, if you have inserted 20 (short) rows into a block then updated all of them in a way that makes them split, and all their 2nd row pieces go to the same block further down the table, you can find that Oracle does a single direct path read that picks up the head pieces, then 20 “db file sequential read” calls to the same block to pick up the follow-on pieces. (The same effect appears with simple migrated rows.) Conversely, if you did the same tablescan using “db file scattered read” requests then Oracle might record a single, highly deceptive “table fetch continued row” because it managed to pin the target block and revisit it 20 times.

Often a very wide row (large number of columns) means the table is trying to hold data for multiple types of object. So a table of 750 columns may use the first 20 columns for generic data, columns 21 to 180 for data for object type A, 181 to 395 for data for object type B, and so on.  This can lead to rows with a couple of hundred used columns and several hundred null columns in the middle of each row – taking one byte per null column and burning up lots of CPU as Oracle walks a row to find a relevant column. A particularly nasty impact can appear from this type of arrangement when you upgrade an application:  imagine you have millions of rows of the “type A” above which use only the first 180 columns.  For some reason the application adds one new “generic” column that (eventually) has to be populated for each row – as the column is populated for a type A row the row grows by 520 (null-counting) bytes and splits into at least 3 pieces. The effect could be catastrophic for anyone who had been happy with their queries reporting type A data.

One of the difficulties of handling rows that are chained due to very high column counts is that the statistics can be quite confusing (and subject to change across versions). The most important clue comes from “table fetch continued row”; but this can’t tell you whether your “continued” rows are migrated or chained (or both), which table they come from, and whether you’ve been fetching the same small number of rows multiple times or many different rows. Unfortunately the segment statistics (v$segstat / v$segment_statistics) don’t capture the number of continued fetches by segment – it would be nice if they did since it ought to be a rare (and therefore low-cost) event. The best you can do, probably, is to look at the v$sql_monitor report for queries that report tablescans against large tables but also report large numbers of single block reads in the tablescan – and for repeatable cases enable SQL trace with wait tracing against suspect queries to see if they show the characteristic mix of direct path reads and repeated db file sequential reads.
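
A simple starting point, though, is to see which sessions are accumulating the statistic; a quick sketch:

select
        s.sid, s.value
from
        v$statname      n,
        v$sesstat       s
where
        n.name       = 'table fetch continued row'
and     s.statistic# = n.statistic#
and     s.value      > 0
order by
        s.value desc
;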

Update every column

The key performance threat in statements that update every column – including the ones that didn’t change – is that Oracle doesn’t compare before and after values when doing the update. Oracle’s code path assumes you know what you’re doing, so it saves every “old” value to an undo record (which gets copied to the redo) and writes every “new” value to a redo change vector.  (Fortunately Oracle does check index definitions to see which index entries really have suffered changes, so it doesn’t visit index leaf blocks unnecessarily.) It’s possible that some front-end tool that adopts this approach has a configuration option that switches from “one SQL statement for all updates” to “construct minimal statement based on screen changes”.

The simple trade-off between these two options is the undo/redo overhead vs. parsing and optimisation overheads as the tool creates custom statements on demand. In the case of the table with more than 255 columns, of course, there’s the added benefit that an update of only the changed columns might limit the update to columns that are in the first rowpiece, eliminating the need (some of the time) to chase pointers to follow-up pieces.
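
If you want to put a number on the undo/redo side of that trade-off, it’s enough to snapshot a couple of session statistics around each form of the update. A sketch – run the query, execute one variant of the update, run the query again, and compare the deltas:

select
        n.name, s.value
from
        v$statname      n,
        v$mystat        s
where
        s.statistic# = n.statistic#
and     n.name in ('redo size', 'undo change vector size')
order by
        n.name
;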

Limiting the update can help with undo and redo, of course, but if the tool always pulls the entire row to the client anyway you still have network costs to consider. With the full row pulled and then updated you may find it takes several SQL*Net roundtrips to transfer the whole row between client and server.  In a quick test on a simple 1,000 column table with an update that set every column in a single row to null (using a bind variable) I found that a default setup couldn’t even get 1,000 NULLs (let alone “real values”) across the network without incurring a wait on “SQL*Net more data from client”.


variable b1 number
exec :b1 := null;

update t1 set
        col0001 = :b1,
        col0002 = :b1,
        ...
        col1000 = :b1
;

Although “SQL*Net message to/from client” is typically (though not always correctly) seen as an ignorable wait, you need to be a little more suspicious of losing time to “SQL*Net more data to/from client”. The “more data” waits mean you’re sending larger volumes of information across the network, and maybe you can eliminate some of the data or make the transfer more efficient – perhaps a simple increase in the SDU (session data unit) in the tnsnames.ora, listener.ora, or sqlnet.ora (for default_sdu_size) might be helpful.
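
For reference, this is roughly what the SDU settings look like – the service and host names are placeholders, and 32767 is simply the traditional maximum rather than a recommendation:

# sqlnet.ora - default used when nothing more specific is negotiated
DEFAULT_SDU_SIZE = 32767

# tnsnames.ora - per-connection negotiation
MYSERVICE =
  (DESCRIPTION =
    (SDU = 32767)
    (ADDRESS = (PROTOCOL = TCP)(HOST = myhost)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = myservice))
  )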

Warning

One of the features of trouble-shooting from cold is that you don’t know very much about the system you’re looking at – so it’s nice to be able to take a quick overview of a system looking for any potentially interesting anomalies and take a quick note of what they are and what thoughts they suggest before you start asking questions and digging into the system. This article is just a couple of brief notes along the lines of: “that doesn’t look nice – what questions does it prompt?”

Troubleshooting

An anecdote with a moral.

Many years ago – in the days of Oracle 7.2.3, when parallel query and partition views were in their infancy and when RAC was still OPS (Oracle Parallel Server) – I discovered a bug that caused parallel queries against partition views to crash (Maxim – mixing two new features is a good way to find bugs). I no longer recall the exact syntax but the problem revolved around the way Oracle used to rewrite queries for parallel execution. In outline it was something like the following:


create or replace view v1
as
select * from t1
union all
select * from t2
;

select  /*+ parallel(v1 2) */ * 
from    v1 
where   pv_col between 1 and 10
and     date_col = to_date('1-Apr-1999','dd-mm-yyyy')
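
For context, the “rules about partition views” that the next paragraph refers to boiled down (among other things) to declaring non-overlapping check constraints on the underlying tables – something like the following sketch, matching the 0 – 3,000 and 3,000 – 6,000 ranges used later:

alter table t1 add constraint t1_pv_range
        check (pv_col >= 0    and pv_col < 3000);

alter table t2 add constraint t2_pv_range
        check (pv_col >= 3000 and pv_col < 6000);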

If you had followed the rules about partition views then Oracle would generate some code that managed to combine the partitioning definitions with the view definition and query predicates and come up with rewritten code for the parallel query slaves that looked something like (e.g.)

select  {list of columns}
from    t1
where   pv_col between 1 and 10 
and     pv_col >= 0 and pv_col < 3000 -- > comment to avoid wordpress format issue 
and     date_col = to_date(:SYS_B1,'dd-mm-yyyy') 
union all 
select  {list of columns} 
from    t2 
where   pv_col between 1 and 10 
and     pv_col >= 3000 and pv_col < 6000 -- > comment to avoid wordpress format issue
and     date_col = to_date(:SYS_B2,'dd-mm-yyyy')

In this case I’ve defined my partitions (using constraints on the underlying tables) so that t1 will hold rows where pv_col >= 0 and pv_col < 3000, and t2 will hold rows where pv_col >= 3000 and pv_col < 6000. As you can see the optimizer has expanded the query with the view text and  pulled the constraints into the query and will be able to bypass all access to t2 because it can reduce the pv_col predicates on t2 into the contradiction “1 = 0”. Here’s the basic form of the execution plan we’d be looking for with partition elimination:


-----------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |    15 |  1500 |     2   (0)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10000 |    15 |  1500 |     2   (0)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    VIEW                 | V1       |    15 |  1500 |     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
|   4 |     UNION-ALL           |          |       |       |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR  |          |    12 |  1308 |     2   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS FULL | T1       |    12 |  1308 |     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
|*  7 |      FILTER             |          |       |       |            |          |  Q1,00 | PCWC |            |
|   8 |       PX BLOCK ITERATOR |          |     1 |   109 |     2   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|*  9 |        TABLE ACCESS FULL| T2       |     1 |   109 |     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------


Note the FILTER at operation 7 – the predicate there is actually “NULL IS NOT NULL” for new versions of Oracle, but would have been “1 = 0” for older versions, so operations 8 and 9 would not get executed.

Notice, also, that my date predicate changes from a literal to a bind variable. This just happened to be the way things were done for parallel query in that version of Oracle. So now we can get to the bug. Our data was actually partitioned by day using a date(-only) column, and the initial target was to keep two years of data. The trouble was that certain queries kept crashing with Oracle error “ORA-12801: error signaled in parallel query server”.

Naturally I created a simplified model (smaller tables, fewer partitions in the views) to try and track down the problem – and the problem disappeared. So I took a crashing query from production, and started creating partition views with fewer and fewer tables until the query stopped crashing, and what I discovered was the following:

  • If you had 2 dates in the query it crashed if the view held 128 or more tables
  • If you had 3 dates in the query it crashed if the view held 86 or more tables

Think about the arithmetic for a moment: 2 * 128 = 256, 3 * 86 = 258. Does this give you a clue ?

What it suggested to me was that someone in Oracle Development had used a single byte to index the array of bind variables that they defined for use with the generated parallel query text, so when you had 2 dates and needed 256 bind variables the counter overflowed, and when you had 3 dates and needed 258 bind variables the counter overflowed. Having made the hypothesis I predicted that a query would crash if there were 4 dates and 64 partitions, but survive if there were only 63 partitions. (And I was right.)

When I called this in to Oracle support (who remembers those early days when you were supposed to “phone” Oracle support on a “land-line”) and suggested the source of the problem I was told that there was “no limit on the number of bind variables a query could handle in Oracle”. Notice how this is essentially a true statement – but has nothing to do with my suggestion.

Several months (maybe even a couple of years) later – long after the client had upgraded to 7.3.2 then 7.3.4 and I was no longer on site – I got a call from Oracle support who wanted to close the TAR (as SRs used to be known) because they’d discovered the problem and it was … see above. I got the feeling that no-one had considered my suggestion for a long time because they “knew” it had to be wrong.

The moral(s) of the story

  • Listen to the question carefully – you may not have heard what you were assuming you would hear.
  • Listen to the answer carefully – it may sound like a convincing response to your question while not being relevant to the actual question.
  • “It’s not supposed to do that” isn’t the same as “That didn’t happen” or (to mis-quote a well-known philosophical problem): “you can’t turn an ‘ought not’ into a ‘did not’”.

One thing that’s worth emphasising is that everyone (and that does include me) will occasionally hear what they’re expecting to hear and completely misunderstand the point of the question. So when someone says something which is clearly very silly pause for thought then ask them, with care and precision, if they just said what you thought they said – maybe what they said and what you heard were very different. (The same is true for twitter, list servers, forums etc. – it’s very easy to misinterpret a short piece of text, and it may be the way it’s been written but it may be the way it’s been read.)

 

Split Partition

This is a little case study on “pre-emptive trouble-shooting”, based on a recent question on the ODC database forum asking about splitting a range-based partition into two at a value above the current highest value recorded in a max_value partition.

The general principle for splitting (range-based) partitions is that if the split point is above the current high value Oracle will recognise that it can simply rename the existing partition and create a new, empty partition, leaving all the indexes (including the global and globally partitioned indexes) in a valid state. There are, however, three little wrinkles to this particular request:

  • first is that the question relates to a system running 10g
  • second is that there is a LOB column in the table
  • third is that the target is to have the new (higher value) partition(s) in a different tablespace

It’s quite possible that 10g won’t have all the partition maintenance capabilities of newer versions; if anything is going to go wrong, LOBs are always a highly dependable point of failure; and since the examples in the manuals tend to be very simple, maybe any attempt to introduce complications like tablespace specification will cause problems.

So, before you risk doing the job in production, what are you going to test?

In Oracle terms we want to check the following:

  • Will Oracle have silently copied/rebuilt some segments rather than simply renaming old segments and creating new, empty segments?
  • Will the segments end up where we want them?
  • Will all the indexes stay valid?

To get things going, the OP had supplied a framework for the table and told us about two indexes, and had then given us two possible SQL statements to do the split, stating that he (or she) had tested them and they both worked. Here’s the SQL (with a few tweaks) that creates the table and indexes. I’ve also added some data – inserting one row into each partition.

rem
rem     Script:         split_pt_lob.sql
rem     Author:         Jonathan Lewis
rem     Dated:          July 2019
rem
rem     Last tested 
rem             12.2.0.1
rem             10.2.0.5
rem

define m_old_ts = 'test_8k'
define m_new_ts = 'assm_2'

drop table part_tab purge;

create table part_tab(
  pt_id            NUMBER,
  pt_name          VARCHAR2(30),
  pt_date          DATE default SYSDATE,
  pt_lob           CLOB,
  pt_status        VARCHAR2(2)
)
tablespace &m_old_ts
lob(pt_lob) store as (tablespace &m_old_ts)
partition by range (pt_date)
(
  partition PRT1 values less than (TO_DATE('2012-01-01', 'YYYY-MM-DD')),
  partition PRT2 values less than (TO_DATE('2014-09-01', 'YYYY-MM-DD')),
  partition PRT_MAX values less than (MAXVALUE)
)
/

alter table part_tab
add constraint pt_pk primary key(pt_id)
/

create index pt_i1 on part_tab(pt_date, pt_name) local
/

insert into part_tab(
    pt_id, pt_name, pt_date, pt_lob, pt_status
)
values(
    1,'one',to_date('01-Jan-2011'),rpad('x',4000),'X'
)
/

insert into part_tab(
    pt_id, pt_name, pt_date, pt_lob, pt_status
)
values(
    2,'two',to_date('01-Jan-2013'),rpad('x',4000),'X'
)
/

insert into part_tab(
    pt_id, pt_name, pt_date, pt_lob, pt_status
)
values(
    3,'three',to_date('01-Jan-2015'),rpad('x',4000),'X'
)
/

commit;

execute dbms_stats.gather_table_stats(null,'part_tab',cascade=>true,granularity=>'ALL')

We were told that

The table has
– Primary Key on pt_id column with unique index (1 different table has a FK constraint that refers to this PK)
– Composite index on pt_date and pt_name columns

This is why I’ve added a primary key constraint (which will generate a global index) and created an index on (pt_date,pt_name) – which I’ve created as a local index since it contains the partitioning column.

The description of the requirement was:

  • The Task is to split partition(PRT_MAX) to a different tablespace
  • New partition(s) won’t have data at the moment of creation

And the two “tested” strategies were:

alter table part_tab split partition PRT_MAX at(TO_DATE('2019-08-01', 'YYYY-MM-DD')) into (
        PARTITION PRT3    tablespace &m_old_ts,
        PARTITION PRT_MAX tablespace &m_new_ts
);

alter table part_tab split partition PRT_MAX at(TO_DATE('2019-08-01', 'YYYY-MM-DD')) into (
        PARTITION PRT3    tablespace &m_old_ts LOB (pt_lob) store as (TABLESPACE &m_old_ts), 
        PARTITION PRT_MAX tablespace &m_new_ts LOB (pt_lob) store as (TABLESPACE &m_new_ts)
)
;
 

If we’re going to test these strategies properly we will need queries similar to the following:


break on object_name skip 1
select object_name, subobject_name, object_id, data_object_id  from user_objects order by object_name, subobject_name;

break on index_name skip 1
select index_name, status from user_indexes;
select index_name, partition_name, status from user_ind_partitions order by index_name, partition_name;

break on segment_name skip 1
select segment_name, partition_name, tablespace_name from user_segments order by segment_name, partition_name;

First – what are the object_id and data_object_id for each object before and after the split? Have we created new “data objects” while splitting, or has an existing data (physical) object simply changed its name?

Secondly – are there any indexes or index partitions that are no longer valid?

Finally – which tablespaces do the physical objects reside in?

On a test run of the first, simpler, split statement here are the before and after results for the object_id and data_object_id, followed by the post-split results for index and segment details:


Before Split
============

OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23677          23677
                                 PRT2                        23678          23678
                                 PRT_MAX                     23679          23679
                                                             23676

PT_I1                            PRT1                        23690          23690
                                 PRT2                        23691          23691
                                 PRT_MAX                     23692          23692
                                                             23689

PT_PK                                                        23688          23688

SYS_IL0000023676C00004$$         SYS_IL_P252                 23685          23685
                                 SYS_IL_P253                 23686          23686
                                 SYS_IL_P254                 23687          23687

SYS_LOB0000023676C00004$$        SYS_LOB_P249                23681          23681
                                 SYS_LOB_P250                23682          23682
                                 SYS_LOB_P251                23683          23683
                                                             23680          23680

After split
===========

OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23677          23677
                                 PRT2                        23678          23678
                                 PRT3                        23693          23679
                                 PRT_MAX                     23679          23694
                                                             23676

PT_I1                            PRT1                        23690          23690
                                 PRT2                        23691          23691
                                 PRT3                        23700          23692
                                 PRT_MAX                     23699          23699
                                                             23689

PT_PK                                                        23688          23688

SYS_IL0000023676C00004$$         SYS_IL_P252                 23685          23685
                                 SYS_IL_P253                 23686          23686
                                 SYS_IL_P257                 23697          23687
                                 SYS_IL_P258                 23698          23698

SYS_LOB0000023676C00004$$        SYS_LOB_P249                23681          23681
                                 SYS_LOB_P250                23682          23682
                                 SYS_LOB_P255                23695          23683
                                 SYS_LOB_P256                23696          23696
                                                             23680          23680


INDEX_NAME                       STATUS
-------------------------------- --------
PT_I1                            N/A
PT_PK                            VALID
SYS_IL0000023676C00004$$         N/A


INDEX_NAME                       PARTITION_NAME         STATUS
-------------------------------- ---------------------- --------
PT_I1                            PRT1                   USABLE
                                 PRT2                   USABLE
                                 PRT3                   USABLE
                                 PRT_MAX                USABLE

SYS_IL0000023676C00004$$         SYS_IL_P252            USABLE
                                 SYS_IL_P253            USABLE
                                 SYS_IL_P257            USABLE
                                 SYS_IL_P258            USABLE


SEGMENT_NAME              PARTITION_NAME         TABLESPACE_NAME
------------------------- ---------------------- ------------------------------
PART_TAB                  PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_I1                     PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_PK                                            TEST_8K

SYS_IL0000023676C00004$$  SYS_IL_P252            TEST_8K
                          SYS_IL_P253            TEST_8K
                          SYS_IL_P257            TEST_8K
                          SYS_IL_P258            TEST_8K

SYS_LOB0000023676C00004$$ SYS_LOB_P249           TEST_8K
                          SYS_LOB_P250           TEST_8K
                          SYS_LOB_P255           TEST_8K
                          SYS_LOB_P256           TEST_8K

Before the split partition PRT_MAX – with 4 segments: table, index, LOB, LOBINDEX – has object_id = data_object_id, with the values: 23679 (table), 23692 (index), 23683 (LOB), 23687 (LOBINDEX); and after the split these reappear as the data_object_id values for partition PRT3 (though the object_id values are larger than the data_object_id values) – so we infer that Oracle has simply renamed the various PRT_MAX objects to PRT3 and created new, empty PRT_MAX objects.

We can also see that all the indexes (including the global primary key index) have remained valid. We also note that the data_object_id of the primary key index has not changed, so Oracle didn’t have to rebuild it to ensure that it stayed valid.

There is a problem, though: the LOB segment and LOBINDEX segment for the new PRT_MAX partition are not in the desired target tablespace. So we need to check the effects of the second version of the split command where we add the specification of the LOB tablespaces. This is what we get – after rerunning the entire test script from scratch:


OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23727          23727
                                 PRT2                        23728          23728
                                 PRT_MAX                     23729          23729
                                                             23726

PT_I1                            PRT1                        23740          23740
                                 PRT2                        23741          23741
                                 PRT_MAX                     23742          23742
                                                             23739

PT_PK                                                        23738          23738

SYS_IL0000023726C00004$$         SYS_IL_P272                 23735          23735
                                 SYS_IL_P273                 23736          23736
                                 SYS_IL_P274                 23737          23737

SYS_LOB0000023726C00004$$        SYS_LOB_P269                23731          23731
                                 SYS_LOB_P270                23732          23732
                                 SYS_LOB_P271                23733          23733
                                                             23730          23730


OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23727          23727
                                 PRT2                        23728          23728
                                 PRT3                        23743          23743
                                 PRT_MAX                     23729          23744
                                                             23726

PT_I1                            PRT1                        23740          23740
                                 PRT2                        23741          23741
                                 PRT3                        23750          23750
                                 PRT_MAX                     23749          23749
                                                             23739

PT_PK                                                        23738          23738

SYS_IL0000023726C00004$$         SYS_IL_P272                 23735          23735
                                 SYS_IL_P273                 23736          23736
                                 SYS_IL_P277                 23747          23747
                                 SYS_IL_P278                 23748          23748

SYS_LOB0000023726C00004$$        SYS_LOB_P269                23731          23731
                                 SYS_LOB_P270                23732          23732
                                 SYS_LOB_P275                23745          23745
                                 SYS_LOB_P276                23746          23746
                                                             23730          23730

INDEX_NAME                       STATUS
-------------------------------- --------
PT_I1                            N/A
PT_PK                            UNUSABLE
SYS_IL0000023726C00004$$         N/A

INDEX_NAME                       PARTITION_NAME         STATUS
-------------------------------- ---------------------- --------
PT_I1                            PRT1                   USABLE
                                 PRT2                   USABLE
                                 PRT3                   UNUSABLE
                                 PRT_MAX                USABLE

SYS_IL0000023726C00004$$         SYS_IL_P272            USABLE
                                 SYS_IL_P273            USABLE
                                 SYS_IL_P277            USABLE
                                 SYS_IL_P278            USABLE

SEGMENT_NAME              PARTITION_NAME         TABLESPACE_NAME
------------------------- ---------------------- ------------------------------
PART_TAB                  PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_I1                     PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_PK                                            TEST_8K

SYS_IL0000023726C00004$$  SYS_IL_P272            TEST_8K
                          SYS_IL_P273            TEST_8K
                          SYS_IL_P277            TEST_8K
                          SYS_IL_P278            ASSM_2

SYS_LOB0000023726C00004$$ SYS_LOB_P269           TEST_8K
                          SYS_LOB_P270           TEST_8K
                          SYS_LOB_P275           TEST_8K
                          SYS_LOB_P276           ASSM_2


Before looking at the more complex details the first thing that leaps out to hit the eye is the word UNUSABLE – which appears for the status of the (global) primary key index and for the PRT3 partition of the local index PT_I1. The (empty) PRT_MAX LOB and LOBINDEX partitions are where we wanted them, but by specifying the location we seem to have broken two index segments that will need to be rebuilt.
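
For the record, the after-the-event repair would be something like the following sketch – easy to type, but rebuilding a global index on a big production table is not a cheap operation:

alter index pt_pk rebuild;
alter index pt_i1 rebuild partition prt3;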

It gets worse, because if we check the data_object_id of the original PRT_MAX partition (23729) and its matching index partition (23742) we see that they don’t correspond to the (new) PRT3 data_object_id values which are 23743 and 23750 respectively – the data has been physically copied from one data object to another completely unnecessarily; moreover the same applies to the LOB and LOBINDEX segments – the data object ids for the PRT_MAX LOB and LOBINDEX partitions were 23733 and 23737, the new PRT3 data object ids are 23746 and 23747.

If you did a test with only a tiny data set you might not notice the implicit threat that these changes in data_object_id tell you about – you’re going to be copying the whole LOB segment when you don’t need to.

Happy Ending (maybe)

A quick check with 12.2 suggested that Oracle had got much better at detecting that it didn’t need to copy LOB data and invalidate indexes with the second form of the code; but the OP was on 10g – so that’s not much help. However it was the thought that Oracle might misbehave when you specified tablespaces that made me run up this test – in particular I had wondered if specifying a tablespace for the partition that would end up holding the existing data might trigger an accident, so here’s a third variant of the split statement I tested, with the results on the indexes, segments, and data objects. Note that I specify the tablespace only for the new (empty) segments:


alter table part_tab split partition PRT_MAX at(TO_DATE('2019-08-01', 'YYYY-MM-DD')) into (
    PARTITION PRT3,
    PARTITION PRT_MAX tablespace &m_new_ts  LOB (pt_lob) store as (TABLESPACE &m_new_ts)
)
/

OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23752          23752
                                 PRT2                        23753          23753
                                 PRT_MAX                     23754          23754
                                                             23751

PT_I1                            PRT1                        23765          23765
                                 PRT2                        23766          23766
                                 PRT_MAX                     23767          23767
                                                             23764

PT_PK                                                        23763          23763

SYS_IL0000023751C00004$$         SYS_IL_P282                 23760          23760
                                 SYS_IL_P283                 23761          23761
                                 SYS_IL_P284                 23762          23762

SYS_LOB0000023751C00004$$        SYS_LOB_P279                23756          23756
                                 SYS_LOB_P280                23757          23757
                                 SYS_LOB_P281                23758          23758
                                                             23755          23755

OBJECT_NAME                      SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID
-------------------------------- ---------------------- ---------- --------------
PART_TAB                         PRT1                        23752          23752
                                 PRT2                        23753          23753
                                 PRT3                        23768          23754
                                 PRT_MAX                     23754          23769
                                                             23751

PT_I1                            PRT1                        23765          23765
                                 PRT2                        23766          23766
                                 PRT3                        23775          23767
                                 PRT_MAX                     23774          23774
                                                             23764

PT_PK                                                        23763          23763

SYS_IL0000023751C00004$$         SYS_IL_P282                 23760          23760
                                 SYS_IL_P283                 23761          23761
                                 SYS_IL_P287                 23772          23762
                                 SYS_IL_P288                 23773          23773

SYS_LOB0000023751C00004$$        SYS_LOB_P279                23756          23756
                                 SYS_LOB_P280                23757          23757
                                 SYS_LOB_P285                23770          23758
                                 SYS_LOB_P286                23771          23771
                                                             23755          23755
INDEX_NAME                       STATUS
-------------------------------- --------
PT_I1                            N/A
PT_PK                            VALID
SYS_IL0000023751C00004$$         N/A

INDEX_NAME                       PARTITION_NAME         STATUS
-------------------------------- ---------------------- --------
PT_I1                            PRT1                   USABLE
                                 PRT2                   USABLE
                                 PRT3                   USABLE
                                 PRT_MAX                USABLE

SYS_IL0000023751C00004$$         SYS_IL_P282            USABLE
                                 SYS_IL_P283            USABLE
                                 SYS_IL_P287            USABLE
                                 SYS_IL_P288            USABLE

SEGMENT_NAME              PARTITION_NAME         TABLESPACE_NAME
------------------------- ---------------------- ------------------------------
PART_TAB                  PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_I1                     PRT1                   TEST_8K
                          PRT2                   TEST_8K
                          PRT3                   TEST_8K
                          PRT_MAX                ASSM_2

PT_PK                                            TEST_8K

SYS_IL0000023751C00004$$  SYS_IL_P282            TEST_8K
                          SYS_IL_P283            TEST_8K
                          SYS_IL_P287            TEST_8K
                          SYS_IL_P288            ASSM_2

SYS_LOB0000023751C00004$$ SYS_LOB_P279           TEST_8K
                          SYS_LOB_P280           TEST_8K
                          SYS_LOB_P285           TEST_8K
                          SYS_LOB_P286           ASSM_2

All the indexes and index partitions stay valid; the new empty segments all end up in the target tablespace, and all the data object ids for the old PRT_MAX partitions became the data object ids for the new PRT3 partitions. Everything we want, and no physical rebuilds of any data sets.

Moral:

When you’re testing, especially when you’re doing a small test while anticipating a big data set, don’t rely on the clock; check the data dictionary (and trace files, if necessary) carefully to find out what activity actually took place.

Footnote:

It’s possible that there are ways to fiddle around with the various default attributes of the partitioned table to get the same effect – but since 12.2 is much better behaved anyway there’s no point in me spending more time looking for alternative solutions to a 10g problem.

 

Free Space

Several years ago I wrote a note about reporting dba_free_space and dba_extents to produce a map of the space usage in a tablespace in anticipation of messing about with moving or rebuilding objects to try and reduce the size of the files in the tablespace.  In the related page where I published the script I pointed out that a query against dba_extents would be expensive because it makes use of structure x$ktfbue which generates the information dynamically by reading segment header blocks. I also pointed out in a footnote to the original article that if you’ve enabled the recyclebin and have “dropped” some objects then there will be some space that is reported as free but is not quite free since the extents will still be allocated. This brings me to the topic for today’s blog.

While visiting a client site recently I came across an instance that was running a regular report to monitor available space in the database. Basically this was a query against the view dba_free_space. Surprisingly it was taking a rather long time to complete – and the reason for this came in two parts: first, the recyclebin was enabled and had some objects in it, and secondly there were no stats on the fixed object x$ktfbue.
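
As a quick sanity check it’s easy to see how much of the “free” space in each tablespace is really just dropped objects sitting in the recyclebin – a sketch (the space column is reported in blocks):

select
        ts_name, count(*) objects, sum(space) blocks
from
        dba_recyclebin
group by
        ts_name
order by
        ts_name
;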

In the case of the client the particular query produced a plan that included the following lines:


Id  Operation             Name              Rows    Bytes  Cost (%CPU)  Time
--  --------------------- ----------------  ----   ------  -----------  --------
63  HASH JOIN                               2785     212K     46  (85)  00:00:01
64    TABLE ACCESS FULL   RECYCLEBIN$       1589    20657      7   (0)  00:00:01
65    FIXED TABLE FULL    X$KTFBUE          100K    6347K     38 (100)  00:00:01 

This is part of the view where Oracle calculates the size of all the extents of objects in the recyclebin so that they can be reported as free space. Notice that in this plan (which is dependent on version, system stats, object_stats and various optimizer parameters) the optimizer has chosen to do a hash join between the recyclebin (recyclebin$) and the x$ structure – and that has resulted in a “full tablescan” of x$ktfbue, which means Oracle reads the segment header block of every single segment in the entire database. (I don’t know where the row stats came from as there were no stats on x$ktfbue, and this plan was pulled from the AWR history tables so the query had been optimised and captured some time in the past.)

If there had been nothing in the recyclebin the hash join and two tablescans wouldn’t have mattered; unfortunately the recyclebin had been enabled and there were a few rows in recyclebin$, so the “tablescan” happened. Here’s a cut-n-paste from a much simpler query run against a fairly new (no 3rd party app) database running 12.1.0.2 to give you some idea of the impact:


SQL> execute snap_events.start_snap

PL/SQL procedure successfully completed.

SQL> select count(*) from x$ktfbue;

  COUNT(*)
----------
      8774

1 row selected.

SQL> execute snap_events.end_snap
---------------------------------------------------------
Session Events - 01-Aug 21:28:13
---------------------------------------------------------
Event                                             Waits   Time_outs        Csec    Avg Csec    Max Csec
-----                                             -----   ---------        ----    --------    --------
Disk file operations I/O                              7           0           0        .018           1
db file sequential read                           5,239           0          14        .003           6
SQL*Net message to client                             7           0           0        .000           0
SQL*Net message from client                           7           0       1,243     177.562         572
events in waitclass Other                             3           1           0        .002           0

PL/SQL procedure successfully completed.

On my little laptop, with nothing else going on, I’ve managed to get away with “only” 5,239 single block reads, and squeezed them all into just 14 centiseconds (local SSD helps). The client wasn’t so lucky – they were seeing tens of thousands of real physical reads.

The ideal solution, of course, was to purge the recyclebin and disable the feature – it shouldn’t be necessary to enable it on a production system – but that’s something that ought to require at least some paperwork. In the short term gathering stats on the fixed table helped because the plan changed from a hash join with “tablescan” of x$ktfbue to a nested loop with an “indexed” access path, looking more like the following (from a query against just recyclebin$ and x$ktfbue):

---------------------------------------------------------------------------------------------
| Id  | Operation                | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                  |       |       |     4 (100)|          |
|   1 |  NESTED LOOPS            |                  |     7 |   182 |     4   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL      | RECYCLEBIN$      |     6 |    66 |     4   (0)| 00:00:01 |
|*  3 |   FIXED TABLE FIXED INDEX| X$KTFBUE (ind:1) |     1 |    15 |     0   (0)|          |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("BUE"."KTFBUESEGBNO"="BLOCK#" AND "BUE"."KTFBUESEGTSN"="TS#" AND
              "BUE"."KTFBUESEGFNO"="FILE#"))

This was still fairly resource-intensive for the client, but was something of an improvement – they had a lot more than 6 items in their recyclebin.
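
For completeness – and assuming the paperwork does get signed off – the longer-term fix is a one-off purge and a parameter change. A sketch (note that disabling the recyclebin at the system level is a deferred change, so it only affects sessions created afterwards):

purge dba_recyclebin;
alter system set recyclebin = off deferred;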

Part of the problem, of course, is that x$ktfbue is one of the objects that Oracle skips when you gather “fixed object” stats – it can be a bit expensive for exactly the reason that querying it can be expensive, all those single block segment header reads.

If you want to check the stats and gather them (as a one-off, probably) here’s some suitable SQL:


select
        table_name, num_rows, avg_row_len, sample_size, last_analyzed
from
        dba_tab_statistics
where
        owner = 'SYS'
and     table_name = 'X$KTFBUE'
;

begin
        dbms_stats.gather_table_stats('SYS','X$KTFBUE');
end;
/

Summary

You probably shouldn’t have the recyclebin enabled in a production system; but if you do, and if you also run a regular report on free space (as many sites seem to do) make sure (a) you have a regular routine to minimise the number of objects that it accumulates and (b) gather statistics (occasionally) on x$ktfbue to minimise the overhead of the necessary join between recyclebin$ and x$ktfbue.

Plugzilla!

Cloning a pluggable database takes time, and for environments where you’d like to use clones as part of unit testing, or other elements of Agile development, it would be nice to be able to bring a clone into operation in the smallest time possible. One mechanism for that is sparse storage clones aka snapshot copy, but depending on your database version and your storage infrastructure, you might hit some limitations.

Enter …. Plugzilla! This PL/SQL package allows you to clone pluggable databases extremely quickly by having pluggable databases pre-cloned in advance.

Example

Let’s say you have a development pluggable database called PDB1. You want to let developers take clones of this as quickly and as often as they like and at various stages in its life cycle. Here is how we might do it with plugzilla.

On Sunday June 1st, we’ve just built (say) version 3.1 of our app into PDB1. I want a frozen copy of PDB1 that can be used as a seed for developers to clone from. So I’ll do:


SQL> set serverout on
SQL> exec plugzilla.new_seed_from_existing(p_source=>'PDB1',p_seed=>'PDB31');
alter session set container = cdb$root
seed=SEED_PDB31
src=PDB1,mode=READ WRITE
alter pluggable database PDB1 close immediate
alter pluggable database PDB1 open read only
create pluggable database SEED_PDB31 from PDB1 file_name_convert=('PDB1','SEED_PDB31')
alter pluggable database SEED_PDB31 open restricted
alter pluggable database SEED_PDB31 close immediate
alter pluggable database SEED_PDB31 open read only
alter pluggable database PDB1 close immediate
alter pluggable database PDB1 open
alter session set container = cdb$root

PL/SQL procedure successfully completed.

This will create a pluggable database called SEED_PDB31 (all the seeds will have the prefix “SEED_”, but see the package constants in the source code if you want to change this).

On Wednesday June 4th, we’ve just built (say) version 3.2 of our app into PDB1. I again want a frozen copy of PDB1 that can be used as a seed for developers to clone from. So I’ll do:


SQL> exec plugzilla.new_seed_from_existing(p_source=>'PDB1',p_seed=>'PDB32');
alter session set container = cdb$root
seed=SEED_PDB32
src=PDB1,mode=READ WRITE
alter pluggable database PDB1 close immediate
alter pluggable database PDB1 open read only
create pluggable database SEED_PDB32 from PDB1 file_name_convert=('PDB1','SEED_PDB32')
alter pluggable database SEED_PDB32 open restricted
alter pluggable database SEED_PDB32 close immediate
alter pluggable database SEED_PDB32 open read only
alter pluggable database PDB1 close immediate
alter pluggable database PDB1 open
alter session set container = cdb$root

PL/SQL procedure successfully completed.

So now we have two seed copies of PDB1 at different points in time. These are the pluggables that form the base for any developer to clone from.

We now call plugzilla.preclone (although more likely you would have this as a scheduler job). This will look for any seeds (we have 2 from above) and pre-create ‘n’ pluggable copies of those databases, where ‘n’ is defined by the package constant g_reserve_copies.


SQL> exec plugzilla.preclone
alter session set container = cdb$root
Processing seed: SEED_PDB31
- pending clones: 0
- building 3 pre-clones
create pluggable database PEND_PDB3100001 from SEED_PDB31 file_name_convert=('SEED_PDB31','PEND_PDB3100001')
create pluggable database PEND_PDB3100002 from SEED_PDB31 file_name_convert=('SEED_PDB31','PEND_PDB3100002')
create pluggable database PEND_PDB3100003 from SEED_PDB31 file_name_convert=('SEED_PDB31','PEND_PDB3100003')
Processing seed: SEED_PDB32
- pending clones: 0
- building 3 pre-clones
create pluggable database PEND_PDB3200001 from SEED_PDB32 file_name_convert=('SEED_PDB32','PEND_PDB3200001')
create pluggable database PEND_PDB3200002 from SEED_PDB32 file_name_convert=('SEED_PDB32','PEND_PDB3200002')
create pluggable database PEND_PDB3200003 from SEED_PDB32 file_name_convert=('SEED_PDB32','PEND_PDB3200003')
alter session set container = cdb$root

PL/SQL procedure successfully completed.

These “pending” clones are fully operational pluggables that are yet to be claimed by a developer. They are pre-created so that when a developer wants a clone, they can do it very quickly.

So when a developer wants a sandbox pluggable database of the application as of version 3.1, they simply ask for one:


SQL> exec plugzilla.clone_from_seed('PDB31')
alter session set container = cdb$root
Found pending pdb PEND_PDB3100001 to use
alter pluggable database PEND_PDB3100001 open restricted
alter session set container = PEND_PDB3100001
alter pluggable database PEND_PDB3100001 rename global_name to PDB3100001
alter pluggable database PDB3100001 close immediate
alter pluggable database PDB3100001 open
alter session set container = cdb$root

PL/SQL procedure successfully completed.

The first available pending pluggable database is picked from the list and renamed to PDB3100001. This is an automatically generated name, but developers can choose their own. Because this is just a rename, irrespective of size, the developer will have their pluggable database available to them within 5-10 seconds.

If we want a second sandbox clone, this time with a custom name, I’ll simply run the routine again.


SQL> exec plugzilla.clone_from_seed('PDB31','MYDB') 
alter session set container = cdb$root
Found pending pdb PEND_PDB3100002 to use
alter pluggable database PEND_PDB3100002 open restricted
alter session set container = PEND_PDB3100002
alter pluggable database PEND_PDB3100002 rename global_name to MYDB
alter pluggable database MYDB close immediate
alter pluggable database MYDB open
alter session set container = cdb$root

PL/SQL procedure successfully completed.

When the developer is done with their clone, they drop it.


SQL> exec plugzilla.drop_clone('MYDB')
alter session set container = cdb$root
Dropping clone MYDB
alter pluggable database MYDB close immediate
drop pluggable database MYDB including datafiles
alter session set container = cdb$root

PL/SQL procedure successfully completed.

Note that this does not “return” the pluggable to the pool of available pluggables, because it could contain changes, which means it will have diverged from its initial seed. It is completely dropped and the space freed up. It is ‘preclone’ alone that keeps a pre-allocation of pluggables available. Because the numeric suffix continues to rise, there is a cap of 99999 pluggables that could be created. If your application is not deployed by then, you’ve got bigger issues to worry about.

At any time, the table PLUGZILLA_META contains the state of plugzilla. After the above operations, it would look like this:


SQL> select * from plugzilla_meta;

RELATIONSHIP                   CHILD                PARENT
------------------------------ -------------------- -----------------
SEED_FROM_PDB                  SEED_PDB31           PDB1                 => we took a seed PDB31 from PDB1
SEED_FROM_PDB                  SEED_PDB32           PDB1                 => we took a seed PDB32 from PDB1
PENDING_CLONE                  PEND_PDB3100003      SEED_PDB31           => preclone built these
PENDING_CLONE                  PEND_PDB3200001      SEED_PDB32
PENDING_CLONE                  PEND_PDB3200002      SEED_PDB32
PENDING_CLONE                  PEND_PDB3200003      SEED_PDB32
CLONE_FROM_SEED                PDB3100001           SEED_PDB31           => we took a preclone and converted it to a clone

Notes

  • There are many, many options in terms of cloning for pluggable databases. This package goes with the Keep-It-Simple policy: it clones pluggables by making the source read only, cloning the datafiles, and replacing the existing pluggable name with a new one. You are of course free to modify the package to choose a cloning mechanism that best suits your needs.
  • Don’t forget – you’re messing with pluggable databases here. Don’t be THAT person that drops the wrong data!
  • When a pending pluggable becomes owned by a developer, the files are not being moved or renamed. This is done to keep the operation nice and snappy. A datafile move (done from within the database) is a copy operation. You could conceivably extend the package to call out to the OS to perform the appropriate “move” commands if you really wanted those file names changed, but Plugzilla doesn’t do that (currently).

Download plugzilla source code here

Autonomous Transaction Processing – your slice of the pie

I grabbed the following screen shot from a slide deck I’ve been giving about Autonomous Transaction Processing (ATP). It shows the services that are made available to you once you create your database. At first glance it looks like a simple tiered structure, where the lower down the tier you are, the smaller the slice of the database resource “pie” you get.

image

And this is true in terms of the share of resources assigned to each service. But it is also important to note the third column, namely the parallelism that each service runs with. For example, here’s a simple little SQL demo that creates a table and performs a couple of set-based deletes on that table. I’ll start by connecting to the LOW service.


SQL> set echo on
SQL> connect xxx/xxx@myservice_low
Connected.
SQL> drop table t purge;

Table T dropped.

SQL> create table t as select * from all_objects;

Table T created.

SQL> delete from t where rownum <= 1000;

1,000 rows deleted.

SQL> delete from t where rownum <= 1000;

1,000 rows deleted.

I might think to myself “Hmmm…I want to make sure that these deletes get a larger slice of the pie, so I’ll use the MEDIUM service instead”. Let’s see what happens when I do that.


SQL> connect xxx/xxx@myservice_medium
Connected.
SQL> drop table t purge;

Table T dropped.

SQL> create table t as select * from all_objects;

Table T created.

SQL> delete from t where rownum <= 1000;

1,000 rows deleted.

SQL> delete from t where rownum <= 1000;

Error starting at line : 5 in command -
delete from t where rownum <= 1000
Error report -
ORA-12838: cannot read/modify an object after modifying it in parallel

My script now crashes. So it’s important to be aware of the parallelism facilities defined for MEDIUM and HIGH: operations will be run in parallel whenever possible. You get faster queries and faster DML, but there are implications for the way you write your scripts, the locking that will result, and how commit processing must be handled. HIGH and MEDIUM are not just “bigger” versions of the LOW service.
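
If you do want to run set-based DML like this on the MEDIUM or HIGH services, a couple of options exist. Here is a sketch (using the same table T as above): either end each parallel DML transaction with a commit before touching the table again, or disable parallel DML for the session.


-- option 1: commit after each parallel DML statement, so the next statement
-- is not reading an object modified in parallel within the same transaction
delete from t where rownum <= 1000;
commit;
delete from t where rownum <= 1000;
commit;

-- option 2: keep the DML serial for this session
alter session disable parallel dml;
delete from t where rownum <= 1000;
delete from t where rownum <= 1000;


Which option is appropriate depends on whether the extra commits fit your transactional requirements.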

This is all outlined in the docs:

Predefined Database Service Names for Autonomous Transaction Processing

The predefined service names provide different levels of performance and concurrency for Autonomous Transaction Processing.

  • tpurgent: The highest priority application connection service for time critical transaction processing operations. This connection service supports manual parallelism.
  • tp: A typical application connection service for transaction processing operations. This connection service does not run with parallelism.
  • high: A high priority application connection service for reporting and batch operations. All operations run in parallel and are subject to queuing.
  • medium: A typical application connection service for reporting and batch operations. All operations run in parallel and are subject to queuing. Using this service the degree of parallelism is limited to four (4).
  • low: A lowest priority application connection service for reporting or batch processing operations. This connection service does not run with parallelism.

TL;DR: For standard transactional activities on ATP, make sure your application uses the TP or TPURGENT services. If you need faster performance for volume operations, then HIGH and MEDIUM are your friends, but understand the locking and commit implications.

Parse Solution

In the “Parse Puzzle” I posted a couple of days ago I showed a couple of extracts from an AWR report that showed contradictory results about the time the instance spent in parsing and hard parsing, and also showed an amazing factor of 4 difference between the DB Time and the “SQL ordered by Elapsed Time”. My example was modelling a real world anomaly I had come across, but was engineered to exaggerate the effect to make it easy to see what was going on.

The key feature was VPD (virtual private database) a.k.a. FGAC (fine grained access control) or RLS (row-level security). I’ve created a “policy function” (the thing that generates the “security predicate”) that executes an extremely expensive SQL query; then I’ve created a policy with policy_type = ‘DYNAMIC’ so that the function gets executed every time a query against a particular table is executed. In fact my example holds three tables, each table has its own policy function, and each policy function calls the same very expensive piece of SQL.
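
For reference, attaching a policy of that type looks something like the sketch below. The table and function names match the ones that appear in the AWR extract further down, but the policy name is made up for illustration:


begin
        dbms_rls.add_policy(
                object_name     => 'FGAC_REF1',
                policy_name     => 'FGAC_REF1_POLICY',             -- hypothetical name
                policy_function => 'FGAC_PACK.FGAC_PREDICATE1',
                statement_types => 'select',
                policy_type     => dbms_rls.dynamic                -- predicate function runs on every parse and execute
        );
end;
/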

To see the effect this has on the AWR report I’ll now supply the contents of the “SQL ordered by Elapsed Time” and work through the list (though not in the order shown) explaining what each statement represents:


SQL ordered by Elapsed Time               DB/Inst: OR18/or18  Snaps: 2059-2060
-> Resources reported for PL/SQL code includes the resources used by all SQL
   statements called by the code.
-> % Total DB Time is the Elapsed Time of the SQL statement divided
   into the Total Database Time multiplied by 100
-> %Total - Elapsed Time  as a percentage of Total DB time
-> %CPU   - CPU Time      as a percentage of Elapsed Time
-> %IO    - User I/O Time as a percentage of Elapsed Time
-> Captured SQL account for  302.6% of Total DB Time (s):             158
-> Captured PL/SQL account for  101.7% of Total DB Time (s):             158

        Elapsed                  Elapsed Time
        Time (s)    Executions  per Exec (s)  %Total   %CPU    %IO    SQL Id
---------------- -------------- ------------- ------ ------ ------ -------------
           156.8              1        156.84   99.4   99.8     .0 1ubpdythth4q1
Module: SQL*Plus
select id, f_rls(n1, n2, n3) from fgac_base where rownum .le. 10                   -- edited to avoid WP  format issue

           156.8             33          4.75   99.3   99.8     .0 9dhvggqtk2mxh
Module: SQL*Plus
 select count(*) X from waste_cpu connect by n .gt. prior n start with n = 1       -- edited to avoid WP  format issue

            53.2             10          5.32   33.7   99.8     .0 5531kmrvrzcxy
Module: SQL*Plus
SELECT COUNT(*) FROM FGAC_REF3 WHERE ID = :B1

            53.2             11          4.83   33.7   99.8     .0 8g2uv26waqm8g
Module: SQL*Plus
begin :con := "FGAC_PACK"."FGAC_PREDICATE3"(:sn, :on); end;

            52.7             10          5.27   33.4   99.7     .0 awk070fhzd4vs
Module: SQL*Plus
SELECT COUNT(*) FROM FGAC_REF1 WHERE ID = :B1

            52.7             11          4.79   33.4   99.7     .0 c8pwn9j11gw5s
Module: SQL*Plus
begin :con := "FGAC_PACK"."FGAC_PREDICATE1"(:sn, :on); end;

            50.9             10          5.09   32.3   99.9     .0 964u0zv0rwpw1
Module: SQL*Plus
SELECT COUNT(*) FROM FGAC_REF2 WHERE ID = :B1

            50.9             11          4.63   32.3   99.9     .0 bgqf405f34u4v
Module: SQL*Plus
begin :con := "FGAC_PACK"."FGAC_PREDICATE2"(:sn, :on); end;

             2.8              1          2.79    1.8   98.4     .0 fhf8upax5cxsz
BEGIN sys.dbms_auto_report_internal.i_save_report (:rep_ref, :snap_id, :pr_class
, :rep_id, :suc); END;

             2.6              1          2.64    1.7   98.4     .0 0w26sk6t6gq98
SELECT XMLTYPE(DBMS_REPORT.GET_REPORT_WITH_SUMMARY(:B1 )) FROM DUAL

             2.4              1          2.43    1.5   98.3     .0 1q1spprb9m55h
WITH MONITOR_DATA AS (SELECT INST_ID, KEY, NVL2(PX_QCSID, NULL, STATUS) STATUS,
FIRST_REFRESH_TIME, LAST_REFRESH_TIME, REFRESH_COUNT, PROCESS_NAME, SID, SQL_ID,
 SQL_EXEC_START, SQL_EXEC_ID, DBOP_NAME, DBOP_EXEC_ID, SQL_PLAN_HASH_VALUE, SQL_
FULL_PLAN_HASH_VALUE, SESSION_SERIAL#, SQL_TEXT, IS_FULL_SQLTEXT, PX_SERVER#, PX

                          ------------------------------------------------------

The first statement is an SQL statement that calls a PL/SQL function f_rls() for 10 consecutive rows in an “ordinary table”. This is the query that actually takes 157 seconds to complete as far as the client SQL*Plus session is concerned.

The function (called 10 times) is passed three values: n1, n2, n3. The function uses n1 to query table FGAC_REF1, n2 to query table FGAC_REF2, and n3 to query FGAC_REF3 – and we can see those three queries appearing as statements 5, 7, and 3 (in that order) in the output. The main query takes 157 seconds to complete because each of those statements takes approximately 52 seconds to complete its 10 executions.

But each of those three statements references a table with a policy that causes a predicate function to be executed for each parse and execute of the statement (one parse, 10 executes) and we can see 11 executions each of calls to fgac_pack.fgac_predicateN (N in 1,2,3), which take roughly 4.8 seconds each, for a total of about 52 seconds across 11 executions.

But those calls to the functions (11 each) all execute the same “connect by” query that appears as statement 2 in the list – for a total of 33 calls to a SQL statement that averages 4.75 seconds – and almost all of the “real” database time is in those 33 calls (33 * 4.75 = 156.75).

So we count 157 seconds once because that’s the time spent in the “connect by” queries, then count that time again (under PL/SQL execution) because the query is called from the policy functions, then again (under SQL execution) because the functions are called by the “select count(*) from fgac_refN” queries, and then one final time (under SQL execution) for the driving query. This gives us a total of 300% of the actual database time apparently being spent in SQL and 100% apparently being spent in PL/SQL.

You’ll notice that “real” optimisation of the SQL that does run would have taken just fractions of a second (as we saw in the Instance Activity Statistics); but one execution of the “connect by” query would have been associated with the first parse call of each of the fgac_refN queries, so 15 seconds of policy function / connect by query time would have been accounted under the parse time elapsed / hard parse time elapsed we saw in the Time Model statistics.

One of the strangest things about the reporting showed up in the ASH figures – which only ever managed to sample three SQL_IDs, reporting them as “on CPU” in every single sample, and those three SQL_IDs were for the “select count(*) from fgac_refN” queries; the “connect by” queries – the real consumer of CPU resource – didn’t get into the ASH sample at all.

Footnote

I did two things to make an important anomaly very obvious – I added a CPU intensive query to the policy function so that it was easy to see where the time was really going, and I made the VPD policy_type “dynamic” so that the policy function would execute on every parse and execute against the underlying table.

In real life the typical SQL called within a policy function is very lightweight, and policies tend to be declared with type “context_sensitive”, a combination that minimises the number of calls to the function and the cost of each call. Unfortunately there are 3rd party applications on the market that include complex PL/SQL frameworks in their policy functions and then have the application server reset the session context for every call to the database.

And that’s where I came in – looking at a system where 10% of the CPU was being spent on parsing that apparently couldn’t possibly be happening.

Take a COPY out of SQLcl’s book

As the world continues to get smaller, it makes sense for all databases to be defined to handle more than just US7ASCII, which is why the default character set for all recent versions of the Oracle database is AL32UTF8. In that way, we can handle the various challenges that come with languages other than English.


SQL> create table t ( x varchar2(30 char));

Table created.

SQL> insert into t values ('안녕미미 99~12900원 시즌오프 원피');

1 row created.

But let’s see what happens when we copy that data using some legacy facilities in SQL Plus.


SQL> COPY TO scott/tiger@db18_pdb1 CREATE t1 USING SELECT * FROM t;

Array fetch/bind size is 15. (arraysize is 15)
Will commit when done. (copycommit is 0)
Maximum long size is 80. (long is 80)
Table T1 created.

   1 rows selected from DEFAULT HOST connection.
   1 rows inserted into T1.
   1 rows committed into T1 at scott@db18_pdb1.

SQL> desc scott.t1
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 X                                      VARCHAR2(120)

Notice that the definition of table T1 does not match that of the source. The COPY command in SQL Plus dates back to a time before CHAR and BYTE semantics existed, and we even stress that in the documentation.

“12.16 COPY.
The COPY command is not being enhanced to handle datatypes or features introduced with, or after Oracle8i. The COPY command is likely to be deprecated in a future release.”

Luckily, we of course have a newer version of SQL Plus, namely SQLcl. Running the same command under SQLcl is the easy solution here.


C:\>sqlcl

SQLcl: Release 19.2.1 Production on Mon Aug 05 10:46:43 2019

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Mon Aug 05 2019 10:46:43 +08:00

Connected to:
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.6.0.0.0

SQL> COPY TO scott/tiger@db18_pdb1 CREATE t2 USING SELECT * FROM t;
Array fetch/bind size is 15 (less if there are longs or blobs). (arraysize is 15)
Will commit when done. (copycommit is 0)
Maximum long size is 80. (long is 80)
Table T2 created.
    1 rows selected from DEFAULT HOST connection.
    1 rows inserted into T2.
    1 rows committed into T2 at scott@db18_pdb1.

SQL> desc scott.t2
Name Null? Type
---- ----- -----------------
X          VARCHAR2(30 CHAR)
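
DESCRIBE makes the difference obvious here, but if you want to check column semantics programmatically (for example across a whole schema after a copy), the data dictionary records it. A small sketch, assuming the SCOTT schema used above:


select column_name, data_type, char_length, char_used
from   all_tab_columns
where  owner = 'SCOTT'
and    table_name in ('T1', 'T2');


CHAR_USED comes back as ‘B’ for byte semantics and ‘C’ for character semantics.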

Parse Puzzle

Here are some details from an AWR report covering a few minutes in the lifetime of an instance of 18.3. It’s a carefully constructed demonstration and all I’ve done is take an AWR snapshot, execute a single SQL statement, then take another snapshot, so the only thing captured by the report is the work done in that brief time interval. The purpose of the exercise is to demonstrate how some Oracle features can make a complete nonsense of the AWR. (I have, as I often do, produced a model that reproduces an effect that can appear in production, but exaggerates the effect to make it more clearly visible.)

First the Time Model statistics:

                                                                % of  % of Total
Statistic Name                                       Time (s) DB Time   CPU Time
------------------------------------------ ------------------ ------- ----------
sql execute elapsed time                                157.8   100.0
DB CPU                                                  157.5    99.8       97.3
parse time elapsed                                       13.6     8.6
hard parse elapsed time                                  13.6     8.6
PL/SQL execution elapsed time                             0.3      .2
PL/SQL compilation elapsed time                           0.0      .0
hard parse (sharing criteria) elapsed time                0.0      .0
repeated bind elapsed time                                0.0      .0
DB time                                                 157.8
background elapsed time                                   7.9
background cpu time                                       4.4                2.7
total CPU time                                          161.9
                          ------------------------------------------------------


Note particularly the parse times – the Time Model shows 13.6 seconds spent in (hard) parsing.

Note also that (with a small error) DB time = DB CPU = SQL execute elapsed time, and the background time is very small (in particular it’s less than the parse time). This background time, by the way, is probably related to things that Oracle does behind the scenes when you take an AWR snapshot or run an AWR report.

Given the significant amount of time spent in hard parsing let’s take a look at the Instance Activity statistics – picking only the statistics relating to parse calls:


Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
parse count (describe)                            0            0.0           0.0
parse count (failures)                            0            0.0           0.0
parse count (hard)                              325            2.1         325.0
parse count (total)                           1,662           10.5       1,662.0
parse time cpu                                   39            0.3          39.0
parse time elapsed                               42            0.3          42.0

Although the Time Model thinks Oracle has spent 13.6 seconds in (hard) parse time, the Instance Activity Statistics say it has only spent 0.42 seconds (the statistic is reported in hundredths of a second). That’s a fairly significant difference of opinion. So let’s see if we can find out more from the “SQL ordered by …” sections, and I’m only going to show you one heading as a teaser for the rest of the weekend:


SQL ordered by Elapsed Time               DB/Inst: OR18/or18  Snaps: 2059-2060
-> Resources reported for PL/SQL code includes the resources used by all SQL
   statements called by the code.
-> % Total DB Time is the Elapsed Time of the SQL statement divided
   into the Total Database Time multiplied by 100
-> %Total - Elapsed Time  as a percentage of Total DB time
-> %CPU   - CPU Time      as a percentage of Elapsed Time
-> %IO    - User I/O Time as a percentage of Elapsed Time
-> Captured SQL account for  302.6% of Total DB Time (s):             158
-> Captured PL/SQL account for  101.7% of Total DB Time (s):             158

How do you squeeze 400% of the elapsed time into SQL and PL/SQL executions? (Observation: it’s not an IBM P9 taking advantage of SMT/4)

One last set of stats – which will have some room for statistical error since they come from v$active_session_history:


select
        in_parse, in_hard_parse, sql_id, sql_exec_id, count(*)
from
        v$active_session_history  ash
where
        session_id = &m_session_id
and     sample_id between &m_start_sample_id and &m_end_sample_id
group by
        in_parse, in_hard_parse, sql_id, sql_exec_id
order by
        count(*)
/

I I SQL_ID	  SQL_EXEC_ID	COUNT(*)
- - ------------- ----------- ----------
Y Y 964u0zv0rwpw1		       3
Y Y awk070fhzd4vs		       4
Y Y 5531kmrvrzcxy		       5

N N 5531kmrvrzcxy		      42
N N 964u0zv0rwpw1		      42
N N awk070fhzd4vs		      51

6 rows selected.

So the ASH information seems to agree (reasonably closely) with the Time Model statistics – in the interval of the snapshot it has noted 12 samples (which statistically represent 12 seconds) of hard parse time (and, though my query doesn’t show it, the “event” is null, i.e. “on CPU”).
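
As an aside, if you want to watch the same discrepancy between your own pairs of snapshots without running a full AWR report, the raw figures are available from the underlying views. A quick sketch – just remember that the units differ (the time model values are in microseconds, the parse statistics in centiseconds):


select stat_name, round(value/1e6, 2) as seconds
from   v$sys_time_model
where  stat_name in ('parse time elapsed', 'hard parse elapsed time');

select name, round(value/100, 2) as seconds
from   v$sysstat
where  name in ('parse time cpu', 'parse time elapsed');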

 

To be continued, some time later this week …

Transaction management in PostgreSQL and what is different from Oracle

TL;DR: Everything is different about transaction behavior. This may also change your ideas about “database independent” applications.

I like to explain some PostgreSQL concepts from an Oracle DBA point of view. There are many things that differ between the two RDBMS and it is important to understand them.

Auto commit

Here is a short example where I create a table, insert one row and rollback:

psql -U postgres demo ### PostgreSQL 11.4
 drop table if exists DEMO;
create table DEMO as select 1 n,current_timestamp t;
select * from DEMO;
insert into DEMO values (2,current_timestamp);
select * from DEMO;
rollback;
select * from DEMO;
insert into DEMO values (3,current_timestamp);
\q

I never executed any commit, yet my changes are saved and visible to everyone. This is the auto-commit default in psql. I think I prefer the Oracle default, where the commit is my decision and only what I’ve committed is visible:

sqlplus demo/demo@//localhost/PDB1 ### Oracle Database 19.4
 column t format a35
exec for i in (select table_name from user_tables where table_name='DEMO') loop execute immediate 'drop table "'||i.table_name||'" cascade constraints'; end loop;
create table DEMO as select 1 n,current_timestamp t from dual;
select * from DEMO;
insert into DEMO values (2,current_timestamp);
select * from DEMO;
rollback;
select * from DEMO;
insert into DEMO values (3,current_timestamp);
quit

So, is Oracle the best for transaction management?

Well… autocommit default is only one thing. Let’s go further.

First, this is only a client setting. SQL*Plus has it set to off:

SQL> show autocommit
autocommit OFF

But JDBC drivers may have it set to on. So, rather than a nice thing about the database, it is just a nice default of SQL*Plus.
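
To emphasise that it is purely a client choice, both clients let you flip the behaviour. A small sketch of the client commands only – nothing changes in the database itself:


-- SQL*Plus: commit after every statement, like the psql default
set autocommit on

-- psql: keep changes pending until an explicit commit, like the SQL*Plus default
\set AUTOCOMMIT off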

Commit on exit

And wait… are all sqlplus defaults so smart? Let’s come back to my Oracle database where I committed nothing:

sqlplus demo/demo@//localhost/PDB1 ### Oracle Database 19.4
 column t format a35
select * from DEMO;
quit

So, are all sqlplus defaults so smart? Apparently not: I didn’t commit the last insert, but it was committed anyway – not because of auto-commit, but because of exit-commit:

SQL> show exitcommit
exitcommit ON

That’s not a nice default. If I quit without saying ‘commit’, I want a rollback. It is highly recommended to set exitcommit off to avoid any unexpected commit (also recommended for scripts that use WHENEVER SQLERROR EXIT: make sure it mentions ROLLBACK, because COMMIT is the default).
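
A minimal sketch of those settings as you might put them at the top of a script (or in a login.sql):


set exitcommit off
whenever sqlerror exit failure rollback


With those two lines, both an unhandled error and a plain exit end with a rollback rather than a silent commit.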

DDL auto-commit

And that’s not all… The first row (n=1) was never explicitly committed. It was inserted with DDL (CREATE TABLE), and DDL is always auto-committed. That’s the Oracle database itself, nothing to do with the client: you cannot be transactional with DDL.

Start transaction

Back to PostgreSQL, the default is auto-commit but I have the choice. I can explicitly start a transaction and then I’ll have to explicitly commit it.

psql -U postgres demo ### PostgreSQL 11.4
start transaction;
drop table if exists DEMO;
create table DEMO as select 1 n,current_timestamp t;
select * from DEMO;
insert into DEMO values (2,current_timestamp);
select * from DEMO;
rollback;
select * from DEMO;
start transaction;
insert into DEMO values (3,current_timestamp);
\q
psql -U postgres demo ### PostgreSQL 11.4
select * from DEMO;
\q

See? I have the 3 rows from the first run, which were all auto-committed. But now that I explicitly started a transaction, everything was transactional, even the DDL: the DROP TABLE, the CREATE TABLE and the INSERT were explicitly rolled back. And even the last INSERT was implicitly rolled back on exit.

Now, who is the winner in transaction management? There’s even more: you can send a multi-statement command to the backend and it will be processed as an implicit transaction.

Note that START TRANSACTION is the ANSI SQL syntax, but PostgreSQL accepts also BEGIN, BEGIN TRANSACTION and BEGIN WORK.

Set Transaction name

Don’t think that there is no “begin transaction” in Oracle. SET TRANSACTION starts one, in order to define the isolation level, or simply to give the transaction a name.

This example looks at the transaction address in V$SESSION and V$TRANSACTION

sqlplus demo/demo@//localhost/PDB1 ### Oracle Database 19.4
select saddr,taddr from v$session 
where sid=sys_context('userenv','sid');
select count(*) from DEMO;
select saddr,taddr from v$session
where sid=sys_context('userenv','sid');
update DEMO set t=current_timestamp;
select saddr,taddr from v$session
where sid=sys_context('userenv','sid');
rollback;
select saddr,taddr from v$session
where sid=sys_context('userenv','sid');
set transaction name 'my_transaction';
select saddr,taddr from v$session
where sid=sys_context('userenv','sid');
quit

This illustrates when a transaction starts (visible as a TADDR in V$SESSION and a row in V$TRANSACTION): at the first INSERT/DELETE/UPDATE/MERGE/SELECT FOR UPDATE, or at a SET TRANSACTION.

Autonomous Transaction

Not available in PostgreSQL but possible in Oracle: we can have autonomous transactions nested within the main one. This is very convenient in some limited cases, like logging an error to a table (and committing that insert) before the rollback of the main transaction.
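
The classic shape of that use case is something like this sketch (the table and procedure names are made up for illustration):


create table error_log ( logged_at timestamp, msg varchar2(4000) );

create or replace procedure log_error( p_msg varchar2 ) as
  pragma autonomous_transaction;
begin
  insert into error_log values ( systimestamp, p_msg );
  commit;   -- commits only this autonomous transaction, not the caller's pending work
end;
/


The transcript below takes a different angle and simply shows what V$TRANSACTION reports while an autonomous transaction is active inside another one: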

sqlplus demo/demo@//localhost/PDB1 ### Oracle Database 19.4
 set serveroutput on
create table DEMO2 as select * from DEMO;
select saddr,taddr from v$session
where sid=sys_context('userenv','sid');
select count(*) from DEMO;
select saddr,taddr from v$session
where sid=sys_context('userenv','sid');
update DEMO set t=current_timestamp;
select saddr,taddr from v$session
where sid=sys_context('userenv','sid');
select addr,ses_addr from v$transaction;
declare
pragma autonomous_transaction;
begin
update DEMO2 set t=current_timestamp;
for i in ( select addr,ses_addr from v$transaction) loop
dbms_output.put_line(
'Transaction: ADDR: '||i.addr||' SES_ADDR: '||i.ses_addr
);
end loop;
rollback;
end;
/
rollback;
quit

This shows that there can be multiple transactions for the same session. The TADDR in V$SESSION is only the address of the top-level one.

Statement-level rollback

Still in Oracle: when a statement fails, the modifications done by this statement are rolled back, but not the previous modifications. The transaction can continue (you can retry, or make an alternative change):

sqlplus demo/demo@//localhost/PDB1 ### Oracle Database 19.4
column t format a35
exec for i in (select table_name from user_tables where table_name='DEMO') loop execute immediate 'drop table "'||i.table_name||'" cascade constraints'; end loop;
create table DEMO as select 1 n,current_timestamp t from dual;
alter table DEMO add unique(n);
select * from DEMO;
insert into DEMO values (2,current_timestamp);
select * from DEMO;
insert into DEMO values (1,current_timestamp);
select * from DEMO;
quit



That’s different in PostgreSQL where the transaction cannot continue when you encounter an error:

psql -U postgres demo ### PostgreSQL 11.4
drop table if exists DEMO;
start transaction;
create table DEMO(n,t) as select 1 n,current_timestamp t;
alter table DEMO add unique(n);
select * from DEMO;
insert into DEMO values (2,current_timestamp);
select * from DEMO;
insert into DEMO values (1,current_timestamp);
select * from DEMO;
rollback;
select * from DEMO;
\q

ERROR: current transaction is aborted, commands ignored until end of transaction block

Here I rolled back. I could also issue a commit to terminate the transaction, but it will roll back anyway:

# commit;
ROLLBACK

Savepoint

Actually, we can achieve statement-level rollback even in PostgreSQL, using savepoints:

psql -U postgres demo ### PostgreSQL 11.4
start transaction;
drop table if exists DEMO;
create table DEMO(n,t) as select 1 n,current_timestamp t;
alter table DEMO add unique(n);
select * from DEMO;
insert into DEMO values (2,current_timestamp);
select * from DEMO;
savepoint my_before_insert_savepoint;
insert into DEMO values (1,current_timestamp);
select * from DEMO;
rollback to my_before_insert_savepoint;
select * from DEMO;
\q

With Oracle you don’t need to do this, because there is an implicit savepoint before each statement execution.

Well, the PostgreSQL client psql can do the same:

\set ON_ERROR_ROLLBACK on

and then psql will automatically create a “pg_psql_temporary_savepoint”.

PostgreSQL 11 Procedures

Before version 11, all commands (which can be a single statement, multiple statements, or PL/pgSQL anonymous blocks or stored code) ran in an atomic context, with no intermediate commits possible without a round trip to the client. Like this:

psql -U postgres demo ### PostgreSQL 11.4
drop table if exists DEMO;
create table DEMO(n,t) as select 1 n,current_timestamp t;
alter table DEMO add unique(n);
create or replace procedure my_inserts(n1 int, n2 int) as
$$
begin
for i in n1..n2 loop
insert into DEMO values (i,current_timestamp);
end loop;
end;
$$ language plpgsql;
call my_inserts(10,12);
call my_inserts(5,15);
select * from DEMO;
\q

The first call inserted the values 10 to 12 and they were committed, because I am running with the default AUTOCOMMIT. But the second call encountered a duplicate key (at value 10) and the whole call was rolled back. In PostgreSQL 11 I can add an intermediate commit so that the first rows are committed before I encounter the error.

psql -U postgres demo ### PostgreSQL 11.4
drop table if exists DEMO;
create table DEMO(n,t) as select 1 n,current_timestamp t;
alter table DEMO add unique(n);
create or replace procedure my_inserts(n1 int, n2 int) as
$$
begin
for i in n1..n2 loop
insert into DEMO values (i,current_timestamp);
commit;
end loop;
end;
$$ language plpgsql;
call my_inserts(10,12);
call my_inserts(5,15);
select * from DEMO;
\q

Here the rows 5 to 9 have been committed before encountering the exception for row 10.

Still, no need for an explicit BEGIN here. The COMMIT in the loop will end the transaction (started implicitly on the server as I did no BEGIN before) and start a new one. The last transaction started will be committed implicitly.

Additionally, you can look at the timestamps in the two previous demos. In PostgreSQL, current_timestamp is constant for the duration of the transaction.
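
You can see this for yourself because PostgreSQL exposes separate functions for the transaction, statement and wall-clock times. A quick sketch:


start transaction;
select current_timestamp, statement_timestamp(), clock_timestamp();
-- run the same select again a little later, still in the same transaction
select current_timestamp, statement_timestamp(), clock_timestamp();
commit;


current_timestamp (the transaction timestamp) stays the same across both selects, statement_timestamp() changes for each statement, and clock_timestamp() moves even within a statement.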

Invalid transaction termination

Here is the same, but with AUTOCOMMIT off:

psql -U postgres demo ### PostgreSQL 11.4
drop table if exists DEMO;
create table DEMO(n,t) as select 1 n,current_timestamp t;
alter table DEMO add unique(n);
create or replace procedure my_inserts(n1 int, n2 int) as
$$
begin
for i in n1..n2 loop
insert into DEMO values (i,current_timestamp);
commit;
end loop;
end;
$$ language plpgsql;
\set AUTOCOMMIT off
call my_inserts(10,12);
call my_inserts(5,15);
select * from DEMO;
rollback;
select * from DEMO;
\q

Actually, this is what made me start this blog post. But that’s already a lot, and further investigation of this will be in the next post:
AUTOCOMMIT and PostgreSQL transaction management in procedures

Implicit transactions

Ok, just one more on this. The error above is not really caused by AUTOCOMMIT but by the way psql handles AUTOCOMMIT. Actually, with PostgreSQL, AUTOCOMMIT is not a BEGIN statement added by the psql client, but the backend server creating an implicit transaction when there is not already one. It is even the opposite: when AUTOCOMMIT is off, the client adds a “BEGIN” so that the server does not create an implicit one. This means that the “invalid transaction termination” error also occurs without auto-commit, whenever we CALL the procedure (or run a DO block that commits or rolls back) from within an explicit transaction:

ERROR: invalid transaction termination

\echo :AUTOCOMMIT
begin transaction;
do $$
begin
update DEMO set t=null;
rollback;
end
$$;
rollback;
\set AUTOCOMMIT off
\echo :AUTOCOMMIT
do $$
begin
update DEMO set t=current_timestamp;
commit;
end
$$;

This long post is only a short sample of the many differences between Oracle and PostgreSQL transaction management. And I’ve only used psql and sqlplus here. Do you want to test the same from Java JDBC and Python psycopg2?

Comments welcome on Twitter.