Oakies Blog Aggregator

Successful Evangelism

I’ve been asked what it takes to be a successful evangelist, and I’ve realized that pinning down what makes one successful is often like holding sand in your hands: no matter how tightly you close your fists, it’s difficult to contain the grains.

The term evangelist is one that receives either very positive or very negative responses.  I’m not a fan of the term, but whether you use it or call them advocates, representatives or influencers, they are essential to the business, product or technology they become the voice for.

Those that I view as successful evangelists in the communities that I am part of?

There are a number of folks I’m sure I missed whom I also admire as I interact with them and observe their contributions, but these are a few that come to mind when I think of fellow evangelists.

What makes an evangelist successful?  It may not be what you think.

1. It’s Not Just About the Company

Most companies think they hire an evangelist to promote and market the company, and yet, when all you do is push out company info and company marketing, people STOP listening to you.  What you say, do and are interested in should drive people to want to know more about you, including the company you work for and what that company does.

All of these folks talk about interests outside of work.  They post about their lives, their interests, and contribute to their communities.  This is what it means to be really authentic and to set an example.  People want to be more like them because they see the value they add to the world beyond just talking points.

2.  They’re Authentic

Authenticity is something most find very elusive.  If you’re just copying what another does, there’s nothing authentic about that.  There’s nothing wrong with finding a tip or tidbit that someone else is using and adopting it, but it has to WORK for you.  I was just part of a conversation yesterday where Jeff and I were discussing that he doesn’t use Buffer (a social media scheduling tool), whereas I live by it.  It doesn’t work for Jeff and there’s nothing wrong with that.  We are individuals, and what makes us powerful evangelists is that we figured out what works for each of us.

3.  In the Know

As a technical evangelist, you can’t just read the docs and think you’re going to be received well.  Theory is not practice, and I’ve had a couple of disagreements with managers explaining why I needed to work with the product.  I’ve had to battle for hardware to build out what I was expected to talk on, and the one time I didn’t fight for it, I paid for it drastically.  I won’t write on a topic unless I can test it out on my own.  Being in the trenches provides a point of view no document can provide.

Documentation is secondary to experience.

4.  Your View is Outward

This is a difficult one for most companies when they’re trying to create evangelists from internal employees.  Those who are deeply involved at the company level may interact well with others, but won’t redirect to an external view.  I’ve had people ask me why my husband isn’t doing as much as I am in the community.  Due to his position, he must be more internally focused and customer facing.  My job is very separate from my fellow employees’.  I must always be focused outward and spend at least 95% of my time interacting with the community.  You’ll notice all of the folks listed are continually interacting with people outside of their company and are considered very “approachable.”

We volunteer our time in the community- user groups, board of directors, events and partnering with companies.  We socialize, as we know our network is essential to the companies we represent.

5.  We Promote

I wish I did more public promotion like I see some of these other folks.  I’m like my parents-  I stand up for others and support them on initiatives and goals.  I do a lot of mentoring, but less when I’m blogging.  My mother was never about empty compliments and I did take after her on this.  I’m just not very good at remembering to compliment people on social media and feel I lack in this area, but I continually watch others do this for folks in the community and this is so important.

We make sure to work with those who may need introductions in our network or support in the community, and we reach out to offer our help.  In the public view this is quite transparent, so when others pay this forward or return the favor, it can appear that people just bend over backwards for us, but we have often been there for the folks in question in the past, with no expectations, and people remember this.

We do promote our company, but for the right reasons: the company has done something good for the community, or has something special going on.  We rarely push out straight marketing, as it just doesn’t come across very well from us.  It’s not authentic.

Additional Recommendations

  • Refrain from internet arguments, social media confrontations

I’m not saying to be a pushover.  I literally have friends muted and even blocked.  There’s nothing wrong with NOT being connected to individuals that have very different beliefs or social media behavior.  You shouldn’t take it personally– this is professional and you should treat it as such.

You may find (especially as a woman or a person of color) that certain individuals will challenge you on ridiculous topics and battle you on little details.  This is just the standard over-scrutinizing that we go through, and if it’s not too bad, I tell people to just ignore it and not respond.  If it escalates, don’t hesitate to mute or block the person.  You’re not there to entertain them, and by removing your contributions from their feed – “out of sight, out of mind” – you offer peace to both of you.


Postgres vs. Oracle access paths III – Partial Index

In the previous post I said that an Index Only Access needs to find all rows in the index. Here is a case where, with similar data, Postgres can find all rows but Oracle needs additional considerations.

In the previous post I’ve executed:
select sum(n) from demo1
The execution plan was:

Aggregate (cost=295.29..295.30 rows=1 width=8) (actual time=2.192..2.193 rows=1 loops=1)
Output: sum(n)
Buffers: shared hit=30
-> Index Only Scan using demo1_n on public.demo1 (cost=0.29..270.29 rows=10000 width=4) (actual time=0.150..1.277 rows=10000 loops=1)
Output: n
Heap Fetches: 0
Buffers: shared hit=30

Basically, this reads all values of the column N and then aggregates them to the sum.
If I remove the SUM() I have only the part that reads all values from N:

explain (analyze,verbose,costs,buffers) select n from demo1 ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_n on public.demo1 (cost=0.29..270.29 rows=10000 width=4) (actual time=0.150..1.284 rows=10000 loops=1)
Output: n
Heap Fetches: 0
Buffers: shared hit=30
Planning time: 0.440 ms
Execution time: 1.972 ms

Oracle

This sounds logical. Now let’s run the same query, a simple ‘select n from demo1’, in Oracle:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID ad4z7tpt0dkta, child number 0
-------------------------------------
select /*+ */ n from demo1
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 397 (100)| 10000 |00:00:00.01 | 1451 |
| 1 | TABLE ACCESS FULL| DEMO1 | 1 | 10000 | 397 (0)| 10000 |00:00:00.01 | 1451 |
--------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

Here the access path is different: a full table scan instead of an index only access (Index Fast Full Scan). It is not a cost decision. If we try to force an index access with INDEX_FFS() or INDEX() hints, the query will still do a Full Table Scan. The reason is that an index only access is possible only if all columns and all rows are present in the index. But Oracle does not always index all rows: an Oracle index has no entry for the rows where all the indexed columns are null.
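A quick way to see this on the demo1 table (a minimal sketch, assuming the column N is still nullable at this point) is to add a row with a null N and compare the counts:

insert into demo1(n,a,x) values (null,1,'x');
-- the new row is in the table but, with all indexed columns null, it gets no index entry
select count(*) from demo1;                     -- 10001 rows in the table
select count(n) from demo1;                     -- 10000 non-null values of N
select count(*) from demo1 where n is not null; -- 10000 rows that an index only access can return
rollback;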

Where n is not null

If I run the same query with the purpose of showing only non-null values, with a ‘where n is not null’ predicate, then an index only access is possible:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 2gbjpw5u0v9cw, child number 0
-------------------------------------
select /*+ */ n from demo1 where n is not null
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 7 (100)| 10000 |00:00:00.01 | 28 |
| 1 | INDEX FAST FULL SCAN| DEMO1_N | 1 | 10000 | 7 (0)| 10000 |00:00:00.01 | 28 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("N" IS NOT NULL)

Constraints

An alternative, if we know that we will never have null values here, is to give the information to the optimizer that there are no null values in the column N:
In Oracle:
alter table demo1 modify n not null;
This is the equivalent of the PostgreSQL
alter table demo1 alter column n set not null;
Then, in addition to enforcing the constraint, it informs the optimizer that there are no null values and that all rows can be found in the index:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID ad4z7tpt0dkta, child number 0
-------------------------------------
select /*+ */ n from demo1
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 7 (100)| 10000 |00:00:00.01 | 28 |
| 1 | INDEX FAST FULL SCAN| DEMO1_N | 1 | 10000 | 7 (0)| 10000 |00:00:00.01 | 28 |
-------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

Additional columns

Even if the column can have some null values, it is easy in Oracle to make the index cover those rows as well, just by adding a non-null column or expression. And if you don’t need an additional column, you can even add a constant, such as in the following index definition:

create unique index demo1_n on demo1(n,0);

This works because all index entries have at least one non null value. But looking at the buffers you can see that this additional byte (0 is stored in 1 byte) has a little overhead (31 blocks read here instead of 28):

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID ad4z7tpt0dkta, child number 0
-------------------------------------
select /*+ */ n from demo1
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 8 (100)| 10000 |00:00:00.01 | 31 |
| 1 | INDEX FAST FULL SCAN| DEMO1_N | 1 | 10000 | 8 (0)| 10000 |00:00:00.01 | 31 |
-------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "N"[NUMBER,22]

Oracle Partial Indexes

In Oracle, all indexes that include a nullable column are partial indexes: not all rows are indexed, and an index access is possible only if the WHERE clause, or a constraint, guarantees that we don’t need the non-indexed rows. Combined with an expression, this can be a way to implement partial indexes, when the expression returns null for a specific condition. Oracle even provides computed columns (aka virtual columns) so that the expression does not have to be coded in the WHERE clause of the query.
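Here is a hedged sketch of that virtual column variant (the column and index names are made up for illustration): the expression is declared once with the table, indexed, and then referenced by name in queries:

alter table demo1 add (n_top10 number generated always as (case when n<=10 then n end) virtual);
create index demo1_n_top10_vc on demo1(n_top10);
-- the query references the virtual column instead of repeating the expression
select n_top10 from demo1 where n_top10<=5;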

As an example with expressions, the following index has entries only for the values up to 10:
create index demo1_n_top10 on demo1(case when n<=10 then n end);

However, to use it, we must mention the expression explicitly:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 863drbjwayrt7, child number 0
-------------------------------------
select /*+ */ (case when n<=10 then n end) from demo1 where (case when
n<=10 then n end)<=5
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 (100)| 4 |00:00:00.01 | 2 |
|* 1 | INDEX RANGE SCAN| DEMO1_N_TOP10 | 1 | 5 | 1 (0)| 4 |00:00:00.01 | 2 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("DEMO1"."SYS_NC00004$"<=5)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "DEMO1"."SYS_NC00004$"[NUMBER,22]

We can see that internally a virtual column (“SYS_NC00004$”) has been created for the indexed expression, and it is used for the predicate and for the projection, which use the same expression. There is another possibility with the ‘partial index’ feature introduced in 12c, but it does not have the flexibility of a predicate: it is based on partitioning, where only some partitions can be indexed.
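For completeness, a sketch of that 12c syntax (the table and partition names are illustrative), where indexing is switched off for some partitions and the index is declared partial:

create table demo2 ( n number, x varchar2(10) )
  partition by range(n)
  ( partition p_old values less than (5000)     indexing off,
    partition p_new values less than (maxvalue) indexing on );
create index demo2_n on demo2(n) local indexing partial;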

Postgres Partial Indexes

Postgres does not need those workarounds. An index indexes all rows, including null entries, and partial indexes can be defined with a where clause:
create index demo1_n_top10 on demo1(n) where n<=10;

No need to change the query. As long as the result can come from the partial index, we can use the column without an expression on it:

explain (analyze,verbose,costs,buffers) select n from demo1 where n<=5 ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_n_top10 on public.demo1 (cost=0.14..4.21 rows=4 width=4) (actual time=0.114..0.114 rows=5 loops=1)
Output: n
Index Cond: (demo1.n <= 5)
Heap Fetches: 0
Buffers: shared hit=2
Planning time: 0.557 ms
Execution time: 0.129 ms

Here the smaller partial index (demo1_n_top10) has been chosen by the query planner.

As you see I’ve not used exactly the same condition. The query planner understood that n<=5 (in the WHERE clause) is a subset of n<=10 (in the index definition). However, if the predicate is too different, it cannot use the index:

fpa=# explain (analyze,verbose,costs,buffers) select n from demo1 where 2*n<=10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using demo1_n on public.demo1 (cost=0.29..320.29 rows=3333 width=4) (actual time=0.020..1.086 rows=5 loops=1)
Output: n
Filter: ((2 * demo1.n) <= 10)
Rows Removed by Filter: 9995
Heap Fetches: 0
Buffers: shared hit=30

Here, instead of “Index Cond” we have a simple “Filter”. The Index Only Scan has read all the rows, and they were filtered afterward (“Rows Removed by Filter”).

Index condition

With the VERBOSE option of EXPLAIN we see the condition used by the index access:
Index Cond: (demo1.n <= 5)
‘Index Cond.’ is not a simple filter removing rows after an operation, but it is the condition which is used for fast access to the index entries in the sorted index structure. We have the equivalent in Oracle with the ‘+predicate’ format of dbms_xplan:

Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("N"<=5)
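For reference, the Oracle plans in this series come from dbms_xplan.display_cursor; a minimal sketch of a call that adds this predicate section for the last executed statement is:

select * from table(dbms_xplan.display_cursor(format=>'allstats last +predicate'));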

Before going further on index access for WHERE clause predicates, the next post will show the major characteristic of indexes (besides the fact that they store a redundant subset of columns and rows): they are maintained sorted and may return the resulting rows in order.

 

The article Postgres vs. Oracle access paths III – Partial Index appeared first on Blog dbi services.

Random null value in plsql generated data

I have added another small feature to the testdata library. The ability to create random data is good, but
sometimes you also need to have "gaps" in that data. So I made it possible to randomly generate null instead of a
value for any generated field. Simply surround the generator function in parentheses and, after the parentheses, add
the percentage chance of a null value.

Rebuilding Indexes

One of the special events that can make it necessary to rebuild an index is the case of the “massive DML”, typically a bulk delete that purges old data from a table. You may even find cases where it’s a good idea to mark a couple of your indexes as unusable before doing a massive delete and then rebuild them after the delete.
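A minimal sketch of that approach, with illustrative table and index names:

alter index t1_idx1 unusable;
alter index t1_idx2 unusable;
-- the bulk delete no longer has to maintain those two indexes
delete from t1 where created < add_months(sysdate,-36);
commit;
-- rebuild them (optionally online and/or parallel) once the purge is done
alter index t1_idx1 rebuild;
alter index t1_idx2 rebuild;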

Despite the fact that a massive delete is an obvious special case, it’s still not necessary in many cases to worry about a rebuild afterwards because the space freed by the delete will be smoothly reused over time with very little variation in performance. There is, however, one particular feature that increases the probability of a rebuild becoming necessary – global (or globally partitioned) indexes on partitioned tables. The problem (and the absence of problem in non-partitioned tables) is in the nature of the rowid.

For non-partitioned tables, and partitioned tables with local indexes, the rowid stored in an index is (assuming we’re thinking only of heap tables) stored as a sequence of 6 bytes consisting, in order, of: (tablespace relative file number, block number within file, row number within block). If the table is non-partitioned, or if this is an index segment from a locally partitioned index, all the index entries will be pointing to the same table segment and Oracle knows which segment that is from the data dictionary information – so Oracle can derive the data_object_id of the table segment and convert the tablespace relative file number into the absolute file number to navigate to the right row in the table.

When the index is global or globally partitioned any index entry may point to any of the table’s segments, so the rowid that is stored in the index is expanded by a further 4 bytes to hold the data_object_id of the table segment it is pointing to – and the data_object_id is the leading component: (data_object_id, tablespace relative file number, block number within file, row number within block). Think about what this means when you start to drop “old” partitions and add new partitions. Compare this with what happens when you simply delete a large volume of old data from a table and starting inserting new data. There’s an important difference to the way in which indexes will evolve.
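You can see those components by decomposing a few rowids with the dbms_rowid package (a sketch against an arbitrary heap table t1):

select dbms_rowid.rowid_object(rowid)        data_object_id,
       dbms_rowid.rowid_relative_fno(rowid)  relative_fno,
       dbms_rowid.rowid_block_number(rowid)  block_no,
       dbms_rowid.rowid_row_number(rowid)    row_no
from   t1
where  rownum <= 3;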

Purging data

When you delete a large volume of data from a (simple, heap) table you will create a lot of empty space in a lot of existing table blocks. If that delete is supposed to get rid of “old” data (and to keep the description simple we’ll assume it’s the first time you’ve done this) it’s very likely that the delete will result in lots of empty blocks near the start of the table – rows that were inserted at the same time will tend to be physically close to each other in the table. This means that future inserts will soon start to reuse those table blocks. Think about what this means for index entries – especially for non-unique keys.

Assume you have 100 rows with value ‘XXX’ for an indexed column. Breaking the rowid into its component parts the index entries will be (‘XXX’,{file_id, block_id, row_number}).  Now assume you delete the oldest 10 rows then, over time, insert 10 more rows with the same key value. You’ve deleted the 10 index entries with the lowest values for (file_id, block_id) but the space that’s become available in the table will be in and around exactly that range of blocks – so the new index entries will probably end up looking very similar to the deleted index entries and inserted in and around the existing index entries for value ‘XXX’, so over time the index is unlikely to allocate much new space.

Now think about what happens when your table is partitioned but the index is global; your index entries are (‘XXX’,{data_object_id, file_id, block_id, row_number}). When you drop the oldest partition you will probably[1] delete all the index entries with the lowest data_object_id. When you start inserting new rows for ‘XXX’ the new table partition will have a data_object_id that is going to be higher than any previous data_object_id – which means you’re going to be inserting rows into the right-hand (high-value) edge of this section of the index. In some cases – typically those where you have a couple of leaf blocks per key value – the index may end up growing significantly because the insertion point for rows in the new partition isn’t in the leaf block with the available space, and it won’t be until you’ve done a few more bulk deletes and the leaf blocks relating to the oldest table partitions become completely empty that the space can be reused.

An example of this type of behaviour probably appeared on the OTN database forum quite recently.  Of course, there are various reasons why indexes can become inefficient, and the degree of inefficiency may only become noticeable over a fairly long period of time; moreover there are various reasons why global indexes are a little problematic, and various reasons why a bulk delete (which is what executing “alter table drop partition” does to a global index) has unpleasant side effects dependent somewhat on the number (and size) of the partitions and on how many you try to drop in one go.
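For reference, the statement in question looks like this (the table and partition names are illustrative):

alter table sales drop partition p_2016_q1 update global indexes;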

There’s not a lot you can do about this quirk of global indexes, but it’s always worth taking extra care with partitioned tables and focusing even more carefully on a strategic review of indexes:

  • Does this index really need to exist at all?
  • Could this index be replaced by a selective function-based index?
  • Does this index really need to be global / globally partitioned?
  • How big is this index compared to the size it ought to be?
  • Should this index be (basic) compressed?
  • Is this index likely to be disrupted by a historic purge – or is there another reason for its undesirable behaviour?

 

[1] “probably delete all the index entries with the lowest data_object_id” – I have to say “probably” because if you’ve executed a “move partition” at any time a new data_object_id will have been generated for the partition, so the oldest partition could, in principle, have the highest data_object_id. The issue of changing data_object_ids brings a whole new level of complexity to global indexes – but only in a few special cases, fortunately.

 

 

Postgres vs. Oracle access paths II – Index Only Scan

In the previous post I’ve explained a sequential scan by accident: my query needed only one column which was indexed, and I expected to read the index rather than the table. And I had to hint the Oracle example to get the same because the Oracle optimizer chooses the index scan over the table scan in that case. Here is where I learned a big difference between Postgres and Oracle. They both use MVCC to query without locking, but Postgres MVCC is for table rows (tuples) only whereas Oracle MVCC is for all blocks – tables and indexes.

So this second post is about Index Only Scan and the second constant you find in the documentation for the query planner:
random_page_cost (floating point)
Sets the planner’s estimate of the cost of a non-sequentially-fetched disk page. The default is 4.0.


I am here in the situation after the previous post: created table and index, have run a query which did a sequential scan on the table:

explain (analyze,verbose,costs,buffers) select sum(n) from demo1 ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1554.00..1554.01 rows=1 width=8) (actual time=17.430..17.430 rows=1 loops=1)
Output: sum(n)
Buffers: shared hit=1429
-> Seq Scan on public.demo1 (cost=0.00..1529.00 rows=10000 width=4) (actual time=0.031..13.011 rows=10000 loops=1)
Output: n, a, x
Buffers: shared hit=1429
Planning time: 1.791 ms
Execution time: 17.505 ms

Index Only Scan

I want to understand why the query planner did not choose an access to the index only. This is where hints are useful: force a plan that is not chosen by the optimizer in order to check if this plan is possible, and then check its cost:

/*+ IndexOnlyScan(demo1) */
explain (analyze,verbose,costs,buffers) select sum(n) from demo1 ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1727.29..1727.30 rows=1 width=8) (actual time=5.424..5.425 rows=1 loops=1)
Output: sum(n)
Buffers: shared hit=1429 read=29
-> Index Only Scan using demo1_n on public.demo1 (cost=0.29..1702.29 rows=10000 width=4) (actual time=0.177..4.613 rows=10000 loops=1)
Output: n
Heap Fetches: 10000
Buffers: shared hit=1429 read=29
Planning time: 0.390 ms
Execution time: 5.448 ms

From there you see that an Index Only Scan is possible but more expensive. The estimated cost is higher than the Seq Scan (cost=0.29..1702.29 instead of cost=0.00..1529.00). And the execution statistics shows that I’ve read the 1429 table pages in addition to the 29 pages of the index.

From the hit/read statistics we can note that the create table has left all the table pages in the buffer cache, but this is not the case for the create index. But that’s another story. My concern is why an index only access goes to read all table blocks in addition to the index ones, which brings the cost 1727.30-1554.01=173.29 higher than the sequential scan.

The clue is in this line showing that all my rows were fetched from the heap pages, which are the table blocks: Heap Fetches: 10000

Tuple visibility

In ACID databases, a modification must not be visible by others until the transaction completion (commit). There are two ways to achieve that. The first way is to read the latest version of data: lock in share mode what you read, so that no concurrent update can happen. The other solution is to query a previous version of data (MVCC – Multi Version Concurrency Control) where uncommitted changes are not visible. Both Oracle and Postgres use MVCC which is great because you can have transactions and queries on the same database. But they do the versioning at a different level.

Oracle MVCC is physical, at block level. Then everything is versioned: tables as well as indexes, with their transaction information (ITL) which, with the help of the transaction table, gives all information about visibility: committed or not, and with the commit SCN. With this architecture, a modified block can be written to disk even with uncommitted changes and there is no need to re-visit it later once the transaction is committed.

Postgres MVCC is logical, at row (‘tuple’) level: a new version is a new row, and committed changes set the visibility of the row. The table row is versioned but not the index entry. If you access by index, you still need to go to the table to see if the row is visible to you. This is why I had heap fetches here and the table blocks were read.

This explains that the cost of Index Only Scan is high here. In addition to about 30 index blocks to read, I’ve read about 1429 table blocks. But that can be worse. For each index entry, and I have 10000 of them, we need to go to the table row, which is exactly what the 10000 heap fetches are. But I’m lucky because I have a very good clustering factor: I have created the table with increasing values for the column N (generated by generate_series). With a bad clustering factor (physical storage of rows in the table not correlated with the order of index) you would see up to 10000 additional shared hits. Thankfully, the query planner estimates this and has switched to table scan which is cheaper in this case.

Vacuum and Visibility Map

Going to the table rows to check whether they are committed would always be more expensive than a table scan. The Postgres vacuum process maintains a Visibility Map as a bitmap of pages that have been vacuumed and have no more tuples to vacuum. This means that all rows in those pages are visible to all transactions. When there is an update on the page, the flag is unset, and remains unset until the modification is committed and the vacuum runs on it. This visibility flag is used by the Index Only Scan to know whether it needs to get to the page.
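If you want to look at the visibility map itself, the pg_visibility extension exposes it (a sketch, assuming the extension is installed; it is available since PostgreSQL 9.6):

create extension pg_visibility;
-- per-page flags for the first pages of the table
select blkno, all_visible, all_frozen from pg_visibility_map('demo1') limit 5;
-- summary over the whole table
select * from pg_visibility_map_summary('demo1');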

Let’s run the vacuum and try again the same query:

vacuum demo1;
VACUUM
 
explain (analyze,verbose,costs,buffers) select sum(n) from demo1 ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=295.29..295.30 rows=1 width=8) (actual time=2.192..2.193 rows=1 loops=1)
Output: sum(n)
Buffers: shared hit=30
-> Index Only Scan using demo1_n on public.demo1 (cost=0.29..270.29 rows=10000 width=4) (actual time=0.150..1.277 rows=10000 loops=1)
Output: n
Heap Fetches: 0
Buffers: shared hit=30
Planning time: 0.450 ms
Execution time: 2.213 ms

Here, without any hint, the query planner has chosen the Index Only Scan which is now less expensive than a Seq Scan: cost=0.29..270.29

Cost of Index Only Scan

The initial cost of 0.29 is calculated from cpu_operator_cost, which defaults to 0.0025, meaning that about 0.29/0.0025=116 operations were charged here. This cost is minimal and I won’t go into the details.
Then, to get rows we have to

  • read 30 blocks from the index. Those seem to be random scan (with random_page_cost=4) and then the cost for all rows is 4*30=120
  • process the index entries (with cpu_index_tuple_cost=0.005) and then the cost for all 10000 rows is 0.005*10000=50
  • process the result rows (with cpu_tuple_cost=0.01) and then the cost for all 10000 rows is 0.01*10000=100

This brings the cost to the total of 270.29
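As a quick sanity check of that arithmetic with the default planner constants:

select 0.29 + 4*30 + 0.005*10000 + 0.01*10000 as index_only_scan_total_cost;
-- 0.29 + 120 + 50 + 100 = 270.29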

For the operation above, the SUM(N), the costing is exactly the same as in the previous post on Seq Scan: 25 (cpu_operator_cost=0.0025 for 10000 rows) added to the initial cost, because the sum can only be returned once all rows are processed, plus an additional 0.01 for the result row.

Oracle

In the previous post I used the FULL() hint to compare Oracle Full Table Scan to Postgres Seq Scan, but by default, Oracle chose an index only access because the index covers all the rows and columns we need.

All columns that we need:

In the previous post we have seen the column projection (from the +projection format of dbms_xplan):

Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) SUM("N")[22]
2 - (rowset=256) "N"[NUMBER,22]

I need only the column N from the table DEMO1, and this column is in the index DEMO1_N

All rows that we need:

In Oracle an index does not have an entry for every row but only for rows where at least one of the indexed columns is not null. Here, because we have no WHERE clause predicate on N, and because we have not declared the column N as NOT NULL, the access by index may not return all rows. However, the SUM() function does not need to know about the null values, because they don’t change the sum, and so the optimizer can safely choose to do an index only access.

Here is the query without hints:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 6z194712fvcfu, child number 0
-------------------------------------
select /*+ */ sum(n) from demo1
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 7 (100)| 1 |00:00:00.01 | 26 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.01 | 26 |
| 2 | INDEX FAST FULL SCAN| DEMO1_N | 1 | 10000 | 7 (0)| 10000 |00:00:00.01 | 26 |
--------------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) SUM("N")[22]
2 - "N"[NUMBER,22]

This plan looks very similar to the Postgres one after the vacuum: 26 buffers, which is approximately the number of blocks in my index here. However, Oracle does not have the ‘vacuum’ requirement because the MVCC applies to the index and Oracle does not need to go to the table to undo the uncommitted changes. But there is something else here. If you remember the previous post, the Oracle cost=1 is equivalent to the cost of a random read (single block) and the cost of reading one block through a larger I/O (multiblock read) is, with default statistics, about 0.27 times that. Here, 7/26=0.2692, which shows that the cost is based on multiblock reads. Oracle can read indexes with INDEX FAST FULL SCAN in the same way it reads tables with FULL TABLE SCAN: with larger I/O. We don’t need any ordering of rows here, because we just do the sum, so we don’t need to follow the chain of leaf blocks scattered within the index segment. We just read all blocks as they come, with fast I/O.

Index Fast Full Scan is possible in Oracle because MVCC is at block level for indexes as well as tables. You can just read the blocks as of the point in time of the query, without being concerned by concurrent operations that update the index entries or split the blocks. Postgres Index Only Scan is limited because MVCC is on tables only, so it must scan the index in the order of the leaves, and must read the visibility map and maybe the table pages.

In Oracle, an index can be used to vertically partition a table, as redundant storage of a few columns in order to avoid full table scans on large rows, allowing queries to completely avoid reading the table when the index covers all required rows and columns. We will see more about the ‘all rows’ requirement in the next post.

 

The article Postgres vs. Oracle access paths II – Index Only Scan appeared first on Blog dbi services.

Oracle Cloud: script to stop all PaaS services

With metered cloud services, keeping all your instances running may become expensive. The goal is to start them only when you need them. Here is a script that stops all instances you have on the Oracle Cloud Service PaaS. You can schedule it, for example, to stop them at the end of the business day, or when they have not been active for a long time. The script uses the REST API called with curl, JSON output parsed with jq, and HTTP statuses explained with links.

In the first part of the script, I set the variables. Set them to your user:password, identity domain, cloud service url, ssh key:

u="MyEmail@Domain.net:MyPassword"
i=a521642
r=https://dbcs.emea.oraclecloud.com
k="ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxAEm1WHYbJa50t61YhM53u4sljjSFGK458fgdljjkNqfihcRxSf2ENw6iaYhiBTPogG9IDaEqW+SbwpororD2/bep16/hHybGswD34jU7bf9kaaKi5gOgASChid4e322zrnwOtlzfHiiquhiUDgLwpQxCYVV5zU1RQ2NS3F3a45bepqkn/GuPY5x/KSn576+7HBCYVbao/PTwZAeTVbo6Xb4ZQQrOIsLJxxDCQsr0/g7ZS8/OJHt8kotydu13n3rANB2y312XtTsW9mAwpfXuCuqDM5+dIjUdmtflkYtqsfrqSpLevVfVt1L7hqo+JGo7OBO0doVs6lQSCblZhYHh Me@MyLaptop"

Here is the script. It starts by downloading the certificate if it is not already there, then queries for all non-stopped services and stops them. Finally, the last line displays the status of all services.


[ -f cacert.pem ] || curl --remote-name --time-cond cacert.pem https://curl.haxx.se/ca/cacert.pem
 
for s in $( curl -s --request GET --cacert cacert.pem --user $u --header "X-ID-TENANT-NAME:$i" $r/paas/service/dbcs/api/v1.1/instances/$i | jq -r '.services[]|select(.status!="Stopped")|.service_name' )
do
# call the 'Stop service' REST API and get the http status
httpstatus=$(curl --include --request POST --cacert cacert.pem --user $u --header "X-ID-TENANT-NAME:$i" --header "Content-Type:application/json" --data '{"lifecycleState":"Stop"}' $r/paas/service/dbcs/api/v1.1/instances/$i/$s | awk '{print >"/dev/stderr"} /^HTTP/{print $2}')
# look for http status in documentation
links -dump -width 300 https://docs.oracle.com/en/cloud/paas/java-cloud/jsrmr/Status%20Codes.html | grep -B 1 -A 1 " $httpstatus "
done
 
sleep 1
curl -s --request GET --cacert cacert.pem --user $u --header "X-ID-TENANT-NAME:$i" $r/paas/service/dbcs/api/v1.1/instances/$i | jq .

The script requires:

  • curl to call the REST API
  • jq to format and extract the returned JSON
  • links to get the HTTP status description from the documentation

The Cloud is all about automation, and the REST API makes it very easy to do this from the command line or a script.

 

The article Oracle Cloud: script to stop all PaaS services appeared first on Blog dbi services.

Basic Index Compression Made Simple (It Ain’t Easy)

I’ve discussed Index Compression a number of times as it’s an excellent way of minimizing the size of indexes without unnecessary, expensive index rebuilds. One of the nice features of Advanced Index Compression is that not only does it potentially compress indexes more effectively than possible with Basic Index Compression, but it also eliminates […]

Display Data Guard configuration in SQL Developer

The latest version of SQL Developer, the 17.2 one released after Q2 of 2017, has a new item in the DBA view showing the Data Guard configuration. This is the occasion to show how you can cascade log shipping in Oracle 12c.

A quick note about this new versioning: this is the release for 2017 Q2 and the version number has more digits to mention the exact build time. Here this version is labeled 17.2.0.188.1159 and we can see when it has been built:

SQL> select to_date('17.x.0.188.1159','rr."x.0".ddd.hh24mi') build_time from dual;
 
BUILD_TIME
--------------------
07-JUL-2017 11:59:00

Non-Cascading Standby

Here is my configuration with two standby databases:

DGMGRL> show configuration
Configuration - orcl
 
Protection Mode: MaxPerformance
Members:
orcla - Primary database
orclb - Physical standby database
orclc - Physical standby database
 
Fast-Start Failover: DISABLED
 
Configuration Status:
SUCCESS (status updated 9 seconds ago)

I have only the LogXptMode defined here, without any RedoRoutes

DGMGRL> show database orcla LogXptMode
LogXptMode = 'SYNC'

With this configuration, the broker has set the following log destinations on orcla, orclb and orclc:

INSTANCE_NAME NAME VALUE
---------------- -------------------- -------------------------------------------------------------------------------------------------------------
ORCLA log_archive_dest_1 location=USE_DB_RECOVERY_FILE_DEST, valid_for=(ALL_LOGFILES, ALL_ROLES)
ORCLA log_archive_dest_2 service="ORCLB", SYNC AFFIRM delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=300
db_unique_name="orclb" net_timeout=30, valid_for=(online_logfile,all_roles)
ORCLA log_archive_dest_3 service="ORCLC", SYNC AFFIRM delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=300
db_unique_name="orclc" net_timeout=30, valid_for=(online_logfile,all_roles)
 
INSTANCE_NAME NAME VALUE
---------------- -------------------- -------------------------------------------------------------------------------------------------------------
ORCLB log_archive_dest_1 location=/u01/fast_recovery_area
 
INSTANCE_NAME NAME VALUE
---------------- -------------------- -------------------------------------------------------------------------------------------------------------
ORCLC log_archive_dest_1 location=/u01/fast_recovery_area

In the latest SQL Developer you have the graphical representation of it from the DBA view / Dataguard / console:

[Screenshot SDDG001: SQL Developer Data Guard console]

Cascading Standby

In 12c we can define cascading standbys: instead of the primary shipping the redo to all standby databases, you can have the primary ship to one standby only, which can then forward the redo to another one. You define that with the RedoRoutes property:


DGMGRL> edit database orcla set property redoroutes = '(local:orclb) (orclb:orclc async)';
Property "redoroutes" updated
DGMGRL> edit database orclb set property redoroutes = '(orcla:orclc async) (local:orcla)';
Property "redoroutes" updated

The first route defined in each property is applied when orcla is the primary database:

  • on orcla (local:orclb) means that orcla sends redo to orclb when primary
  • on orclb (orcla:orclc async) means that orclb sends redo to orclc when orcla is primary. LogXptMode is SYNC but overridden here with ASYNC

The second route defined in each property is applied when orclb is the primary database:

  • on orcla (orclb:orclc async) means that orcla sends redo to orclc when orclb is primary. LogXptMode is SYNC but overridden here with ASYNC
  • on orclb (local:orcla) means that orclb sends redo to orcla when primary

With this configuration, and orcla still being the primary, the broker has set the following log destination on orcla, orclb and orclc:


INSTANCE_NAME NAME VALUE
---------------- -------------------- -------------------------------------------------------------------------------------------------------------
ORCLA log_archive_dest_1 location=USE_DB_RECOVERY_FILE_DEST, valid_for=(ALL_LOGFILES, ALL_ROLES)
ORCLA log_archive_dest_2 service="ORCLB", SYNC AFFIRM delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=300
db_unique_name="orclb" net_timeout=30, valid_for=(online_logfile,all_roles)
 
INSTANCE_NAME NAME VALUE
---------------- -------------------- -------------------------------------------------------------------------------------------------------------
ORCLB log_archive_dest_1 location=/u01/fast_recovery_area
ORCLB log_archive_dest_2 service="ORCLC", ASYNC NOAFFIRM delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=3
00 db_unique_name="orclc" net_timeout=30, valid_for=(standby_logfile,all_roles)
 
INSTANCE_NAME NAME VALUE
---------------- -------------------- -------------------------------------------------------------------------------------------------------------
ORCLC log_archive_dest_1 location=/u01/fast_recovery_area

The show configuration from DGMGRL displays them indented to see the cascading redo shipping:

DGMGRL> show configuration
Configuration - orcl
 
Protection Mode: MaxPerformance
Members:
orcla - Primary database
orclb - Physical standby database
orclc - Physical standby database (receiving current redo)
 
Fast-Start Failover: DISABLED
 
Configuration Status:
SUCCESS (status updated 27 seconds ago)

And SQL Developer Data Guard console shows:
[Screenshot SDDG002: SQL Developer Data Guard console showing the cascading standby]

Switchover

Now, the goal of defining several routes is to have all log destinations automatically changed when the database roles change.
I’m doing a switchover:


Connected to "orclb"
Connected as SYSDG.
DGMGRL> switchover to orclb;
Performing switchover NOW, please wait...
New primary database "orclb" is opening...
Operation requires start up of instance "ORCLA" on database "orcla"
Starting instance "ORCLA"...
ORACLE instance started.
Database mounted.
Database opened.
Connected to "orcla"
Switchover succeeded, new primary is "orclb"

Now it is orcla which cascades the orclb redo to orclc:

DGMGRL> show configuration;
Configuration - orcl
 
Protection Mode: MaxPerformance
Members:
orclb - Primary database
orcla - Physical standby database
orclc - Physical standby database (receiving current redo)
 
Fast-Start Failover: DISABLED
 
Configuration Status:
SUCCESS (status updated 74 seconds ago)

Here is how it is displayed from SQL Developer:

[Screenshot SDDG003: SQL Developer Data Guard console after the switchover]

We have seen how the configuration is displayed from DGMGRL and graphically from SQL Developer. Of course, you can also query the Data Guard configuration:

SQL> select * from V$DATAGUARD_CONFIG;
 
DB_UNIQUE_NAME PARENT_DBUN DEST_ROLE CURRENT_SCN CON_ID
-------------- ----------- --------- ----------- ------
orcla orclb PHYSICAL STANDBY 3407900 0
orclc orcla PHYSICAL STANDBY 3408303 0
orclb NONE PRIMARY DATABASE 0 0

and the broker configuration:

SQL> select * from V$DG_BROKER_CONFIG;
 
DATABASE CONNECT_IDENTIFIER DATAGUARD_ROLE REDO_SOURCE ENABLED STATUS VERSION CON_ID
-------- ------------------ -------------- ----------- ------- ------ ------- ------
orcla ORCLA PHYSICAL STANDBY -UNKNOWN- TRUE 0 11.0 0
orclb ORCLB PRIMARY -N/A- TRUE 0 11.0 0
orclc ORCLC PHYSICAL STANDBY orcla TRUE 0 11.0 0

This is another reason to use the broker. Once the configuration is set up and tested, you have nothing else to think about when you do a switchover. The log archive destinations are automatically updated depending on the database roles.

 

The article Display Data Guard configuration in SQL Developer appeared first on Blog dbi services.

Postgres vs. Oracle access paths I – Seq Scan

Here is the first test I’ve done for my Postgres vs. Oracle access paths series and the first query did a sequential scan. It illustrates the first constant you find in the documentation for the query planner:
seq_page_cost (floating point)
Sets the planner’s estimate of the cost of a disk page fetch that is part of a series of sequential fetches. The default is 1.0.

Table creation

I start by creating a very simple table with 10000 rows and 3 columns. The first column (n) is indexed:

create table demo1 as select generate_series n , 1 a , lpad('x',1000,'x') x from generate_series(1,10000);
SELECT 10000
create unique index demo1_n on demo1(n);
CREATE INDEX
 
analyze verbose demo1;
INFO: analyzing "public.demo1"
INFO: "demo1": scanned 1429 of 1429 pages, containing 10000 live rows and 0 dead rows; 10000 rows in sample, 10000 estimated total rows
ANALYZE
select relkind,relname,reltuples,relpages from pg_class where relname='demo1';
relkind | relname | reltuples | relpages
---------+---------+-----------+----------
r | demo1 | 10000 | 1429
 
select relkind,relname,reltuples,relpages from pg_class where relname='demo1_n';
relkind | relname | reltuples | relpages
---------+---------+-----------+----------
i | demo1_n | 10000 | 30

I checked the table and index statistics that will be used by the optimizer: 10000 rows, all indexed, 1429 table blocks and 30 index blocks. Note that blocks are called pages, but that’s the same idea: the minimal size read and written to disk. They are also called buffers as they are read into a buffer and cached in the buffer cache.

Here is how I create a similar table in Oracle:

create table demo1 as select rownum n , 1 a , lpad('x',1000,'x') x from xmltable('1 to 10000');
Table created.
create unique index demo1_n on demo1(n);
Index created.
exec dbms_stats.gather_table_stats(user,'demo1');
PL/SQL procedure successfully completed.
 
select table_name,num_rows,blocks from user_tables where table_name='DEMO1';
 
TABLE_NAME NUM_ROWS BLOCKS
---------- ---------- ----------
DEMO1 10000 1461
 
select index_name,num_rows,leaf_blocks,blevel from user_indexes where table_name='DEMO1';
 
INDEX_NAME NUM_ROWS LEAF_BLOCKS BLEVEL
---------- ---------- ----------- ----------
DEMO1_N 10000 20 1

The same rows are stored in 1461 table blocks and the index entries in 20 blocks. Both use 8k blocks, but with different storage layouts and different defaults. This is about 7 rows per table block, for rows that are approximately larger than 1k, and about 500 index entries per index block to store the number for column N plus the pointer to the table row (a few bytes, called TID in Postgres or ROWID in Oracle). I’ll not get into the details of those numbers here.
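A rough check of those densities from the Postgres catalog (Oracle has the equivalent information in user_tables and user_indexes):

select relname, reltuples::bigint as row_count, relpages,
       round(reltuples/relpages) as rows_per_page
from   pg_class
where  relname in ('demo1','demo1_n');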

My goal is to detail the execution plans and the execution statistics.

Postgres Seq Scan

I start with a very simple query on my table: SELECT SUM(N) from DEMO1;


explain (analyze,verbose,costs,buffers) select sum(n) from demo1 ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1554.00..1554.01 rows=1 width=8) (actual time=4.616..4.616 rows=1 loops=1)
Output: sum(n)
Buffers: shared hit=1429
-> Seq Scan on public.demo1 (cost=0.00..1529.00 rows=10000 width=4) (actual time=0.011..3.614 rows=10000 loops=1)
Output: n, a, x
Buffers: shared hit=1429
Planning time: 0.468 ms
Execution time: 4.661 ms

This query does a sequential scan (Seq Scan), which is the equivalent of Oracle Full Table Scan: read all rows from the table. You might tell me that it would be cheaper to scan the index because the index I’ve created holds all required columns. We will see that in the next post. Here, after having created the table as I did above, the query planner prefers to scan the table.

Here are the maths: my table has 1429 pages and each page access during a sequential scan has cost=1 as defined by:

show seq_page_cost;
seq_page_cost
---------------
1

Here, I see a cost estimated from 0 to 1529 for the Seq Scan operation.
The first number, 0.00 is the initialization cost estimating the work done before returning any rows. A Seq Scan has nothing to do before, and reading the first block can already return rows.
The second number is the cost to return all rows. We have seen that the scan itself costs 1429 but the rows (tuples) must be read and processed. This is evaluated using the following constant:

show cpu_tuple_cost;
cpu_tuple_cost
----------------
0.01

For 10000 rows, the cost to process them is 0.01*10000=100 which is an additional cost over the Seq Scan 1429 to get it to 1529. This explains cost=0.00..1529.00

Then there is a SUM operation applied to 10000 rows and there is a single parameter for the CPU cost of operators and functions:

show cpu_operator_cost;
cpu_operator_cost
-------------------
0.0025

The sum (Aggregate) operation adds 0.0025*10000=25 to the cost and then the cost is 1554. You can see this cost in the minimal cost for the query, the first number in cost=1554.00..1554.01, which is the cost before retrieving any rows. This makes sense because before retrieving the first row we need to read (Seq Scan) and process (Aggregate) all rows, which is exactly what the cost of 1554 is.

Then there is an additional cost when we retrieve all rows. It is only one row here because it is a sum without group by, and this adds the default cpu_tuple_cost=0.01 to the initial cost: 1554.01
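Putting those numbers together as a quick check with the default constants:

select 1429*1 + 10000*0.01                         as seq_scan_total,     -- 1529
       1429*1 + 10000*0.01 + 10000*0.0025          as aggregate_startup,  -- 1554
       1429*1 + 10000*0.01 + 10000*0.0025 + 1*0.01 as aggregate_total;    -- 1554.01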

In summary, the total cost of the query is cost=1554.00..1554.01 and we have seen that it depends on:
– number of pages in the table
– number of rows from the result of the scan (we have no where clause here)
– number of rows summed and retrieved
– the planner parameters seq_page_cost, cpu_tuple_cost, and cpu_operator_cost

Oracle Full Table Scan

When I run the same query on Oracle, the optimizer chooses an index fast full scan rather than a table full scan because all rows and columns are in the index that I’ve created:

  • all rows, because SUM(N) does not need the rows where N is null (and those are not stored in the index)
  • all columns, because I need nothing other than the values for N

We will see that in the next post; for the moment, in order to compare with Postgres, I forced a full table scan with the FULL() hint.

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID bhsjquhh6y08q, child number 0
-------------------------------------
select /*+ full(demo1) */ sum(n) from demo1
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 397 (100)| 1 |00:00:00.01 | 1449 |
| 1 | SORT AGGREGATE | | 1 | 1 | | 1 |00:00:00.01 | 1449 |
| 2 | TABLE ACCESS FULL| DEMO1 | 1 | 10000 | 397 (0)| 10000 |00:00:00.01 | 1449 |
---------------------------------------------------------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) SUM("N")[22]
2 - (rowset=256) "N"[NUMBER,22]

We have seen that Postgres cost=1 is for sequential scans (similar to what we call multiblock reads in Oracle) and random reads (single block reads) have by default cost=4 according to random_page_cost.

Oracle cost unit is based on single block reads and this is why the cost here (397) is lower than the number of blocks (1461). Different units. Postgres counts cost=1 for reads and counts a higher cost when a seek is involved. Oracle counts cost=1 for single block reads (including seek) and lower cost for larger I/O size.
With the default system statistics, latency is estimated at 10 milliseconds and transfer at 4KB/ms, so the single block read time is estimated at 12 milliseconds (10 + 8192/4096).
Again with the default system statistics, where the optimizer estimates 8 blocks per multiblock read, the multiblock read time is estimated at 26 milliseconds (10 + 8*8192/4096), which is on average 26/8=3.25 milliseconds per block. This means that the per-block ratio of multiblock vs. single block read is very similar for Oracle (3.25/12=0.2708) and Postgres (seq_page_cost/random_page_cost=1/4=0.25) with default parameters.

Our table is stored in 1461 blocks and the full table scan involves reading all of them plus some segment header blocks: 1461*0.2708≈396.
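Written out with the default (noworkload) system statistics values used above (ioseektim=10 ms, iotfrspeed=4096 bytes/ms, 8k blocks, and 8 blocks per multiblock read):

select 10 + 8192/4096                                     as sreadtim_ms,         -- 12
       10 + 8*8192/4096                                    as mreadtim_ms,         -- 26
       (10 + 8*8192/4096)/8                                as mread_per_block_ms,  -- 3.25
       round(((10 + 8*8192/4096)/8) / (10 + 8192/4096), 4) as cost_ratio           -- 0.2708
from dual;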

There is also the costing of CPU (the equivalent of cpu_tuple_cost), which is included here, but I’ll not go into the details, which are more complex than in Postgres and depend on your processor frequency. The goal of these posts is Postgres. For Oracle, all this is explained in Jonathan Lewis’s and Chris Antognini’s books.

But basically, the idea is the same: Postgres Seq Scan and Oracle Full table Scan read the contiguous table blocks sequentially and the cost mainly depends on the size of the table (number of blocks) and the estimated time for sequential I/O (where bandwidth counts more than latency).

Buffers

In my tests, I’ve not only explained the query, but I executed it to get execution statistics. This is done with EXPLAIN ANALYZE in Postgres and DBMS_XPLAN.DISPLAY_CURSOR in Oracle. The statistics include the number of blocks read at each plan operation, with the BUFFERS option in Postgres and with STATISTICS_LEVEL=ALL in Oracle.


explain (analyze,buffers) select sum(n) from demo1 ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Aggregate (cost=1554.00..1554.01 rows=1 width=8) (actual time=3.622..3.622 rows=1 loops=1)
Buffers: shared hit=1429
-> Seq Scan on demo1 (cost=0.00..1529.00 rows=10000 width=4) (actual time=0.008..1.724 rows=10000 loops=1)
Buffers: shared hit=1429
Planning time: 0.468 ms
Execution time: 4.661 ms

‘Buffers’ displays the number of blocks that have been read by the Seq Scan and is exactly the number of pages in my table. ‘shared hit’ means that they come from the buffer cache.

Let’s run the same when the cache is empty:

explain (analyze,verbose,costs,buffers) select sum(n) from demo1 ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1554.00..1554.01 rows=1 width=8) (actual time=13.837..13.837 rows=1 loops=1)
Output: sum(n)
Buffers: shared read=1429
-> Seq Scan on public.demo1 (cost=0.00..1529.00 rows=10000 width=4) (actual time=0.042..12.506 rows=10000 loops=1)
Output: n, a, x
Buffers: shared read=1429
Planning time: 3.754 ms
Execution time: 13.906 ms

The buffers are now ‘shared read’ instead of ‘shared hit’. In Postgres, the number of logical reads, as we know them in Oracle, is the sum of hits and reads. In Oracle, all blocks are counted as logical reads, which includes the smaller set of physical reads.
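
If you prefer cumulative counters to per-execution figures, pg_statio_user_tables tracks the same distinction; a minimal sketch (the Oracle-style ‘logical reads’ would be the sum of the two columns):

select relname, heap_blks_read, heap_blks_hit
from pg_statio_user_tables
where relname='demo1';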

IO calls

Here is more about the reads when the block is not in the buffer cache. On Linux, we can trace the system calls to see how those sequential I/Os are implemented.

I get the ‘relfilenode’:

postgres=# select relname,relnamespace,reltype,relowner,relfilenode,relpages,reltuples from pg_class where relname='demo1';
relname | relnamespace | reltype | relowner | relfilenode | relpages | reltuples
---------+--------------+---------+----------+-------------+----------+-----------
demo1 | 2200 | 42429 | 10 | 42427 | 1429 | 10000
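
As a side note, pg_relation_filepath() returns the same information directly, as a path relative to the data directory:

select pg_relation_filepath('demo1');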

I get the pid of my session process:

select pg_backend_pid();
-[ RECORD 1 ]--+------
pg_backend_pid | 30732

I can trace system calls:

strace -p 30732

And look at the trace concerning my file (identified with its ‘relfilenode’):

30732 open("base/12924/42427", O_RDWR) = 33
30732 lseek(33, 0, SEEK_END) = 11706368
30732 open("base/12924/42427_vm", O_RDWR) = 43
30732 lseek(33, 0, SEEK_END) = 11706368
30732 lseek(33, 0, SEEK_END) = 11706368
30732 lseek(33, 0, SEEK_SET) = 0
30732 read(33, "\4004\220\3 \4 \360\233\30\10\340\227\30\10"..., 8192) = 8192
30732 read(33, "\4004\220\3 \4 \360\233\30\10\340\227\30\10"..., 8192) = 8192
30732 read(33, "\4004\220\3 \4 \360\233\30\10\340\227\30\10"..., 8192) = 8192
... 1429 read(33) in total

We see two open() calls with the relfilenode of my table in the file name: one for the table and one for the visibility map.
The file descriptor for the table file is 33 and I've grepped only the related calls.
The lseek(33,0,SEEK_END) goes to the end of the file (11706368 bytes, which is 11706368/8192=1429 pages).
The lseek(33,0,SEEK_SET) goes back to the beginning of the file.
Subsequent read() calls read the whole file, page by page (8192 bytes), in sequential order.

This is how sequential reads are implemented in Postgres: one lseek() followed by sequential read() calls. The I/O size is always the same (8k here). The benefit of a sequential scan is not larger I/O calls but simply the absence of seeks in between. The optimization is left to the underlying layers: filesystem and read-ahead.

This is very different from Oracle. Without going into the details, here is the kind of system calls you see during a full table scan:

open("/u01/oradata/CDB1A/PDB/users01.dbf", O_RDWR|O_DSYNC) = 9
fcntl(9, F_SETFD, FD_CLOEXEC) = 0
fcntl(9, F_DUPFD, 256) = 258
...
pread(258, "\6\242\2\5\3\276\25%\2\4\24\270\1\313!\1x\25%"..., 1032192, 10502144) = 1032192
pread(258, "\6\242\202\5\3\300\25%\2\4\16\247\1\313!\1x\25%"..., 1032192, 11550720) = 1032192
pread(258, "\6\242\2\6\3\302\25%\2\4x\226\1\313!\1x\25%"..., 417792, 12599296) = 417792

Those are also sequential reads of contiguous blocks, but done with a larger I/O size (126 blocks here). So in addition to the absence of seek() calls, Oracle is optimized to do fewer I/O calls, without relying on the underlying optimization at OS level.
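
The multiblock I/O size is capped by the db_file_multiblock_read_count parameter (expressed in blocks); one quick way to check it, assuming a SQL*Plus session:

show parameter db_file_multiblock_read_count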

Oracle can also trace the system calls with wait events, which gives more information about the database calls:

WAIT #140315986764280: nam='db file scattered read' ela= 584 file#=12 block#=1282 blocks=126 obj#=74187 tim=91786554974
WAIT #140315986764280: nam='db file scattered read' ela= 485 file#=12 block#=1410 blocks=126 obj#=74187 tim=91786555877
WAIT #140315986764280: nam='db file scattered read' ela= 181 file#=12 block#=1538 blocks=51 obj#=74187 tim=91786556380

The name ‘scattered’ is misleading: ‘db file scattered read’ waits are actually multiblock reads, i.e. more than one block read in a single I/O call (‘scattered’ refers to the buffers in the cache where the blocks land, not to the disk access pattern). Oracle does not rely on Operating System read-ahead, and this is why we can (and should) use direct I/O and asynchronous I/O when the database buffer cache is correctly sized.
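
For reference, the wait events above come from SQL trace; a minimal sketch to produce such a trace for the current session (the classic event 10046 at level 8, which includes wait events), assuming you have the privilege to set events:

alter session set events '10046 trace name context forever, level 8';
-- run the full table scan, then stop tracing:
alter session set events '10046 trace name context off';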

Output and Projection

I’ve run the EXPLAIN with the VERBOSE option which shows the ‘Output’ for each operation, and I’ve done the equivalent in Oracle by adding the ‘+projection’ format in DBMS_XPLAN.

In the Oracle execution plan, we see the columns remaining in the result of each operation, after the projection:

Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) SUM("N")[22]
2 - (rowset=256) "N"[NUMBER,22]

Operation 2, the full table scan, reads all rows with all columns, but projects only the one we need: N.

In the Postgres equivalent, it seems that the Output mentions the columns available before the projection because we see all table columns here:

explain verbose select sum(n) from demo1 ;
QUERY PLAN
-------------------------------------------------------------------------
Aggregate (cost=1554.00..1554.01 rows=1 width=8)
Output: sum(n)
-> Seq Scan on public.demo1 (cost=0.00..1529.00 rows=10000 width=4)
Output: n, a, x

I prefer to see the columns after the projection, and I use this a lot in Oracle to know which columns are actually needed from the table. A great optimization is possible when we have a covering index containing all the selected columns, so that we don't have to go to the table at all. But we will see that in the next post about Index Only Scan.
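
As a quick teaser for that post, such a covering index for this particular query could be as simple as the following (the index name is mine, for illustration only):

create index demo1_n on demo1(n);
-- with this index, the projected column N is available without visiting the table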

 

The post Postgres vs. Oracle access paths I – Seq Scan appeared first on Blog dbi services.

Postgres unique constraint

I'll start a series on Postgres vs. Oracle access paths because I know Oracle and I'm learning Postgres. While preparing it, I came upon some surprises, because I'm so used to Oracle that I take some behavior for granted in any SQL database. I recently posted a tweet about one of them, comparing the latest Postgres version to the earliest Oracle version I have on my laptop.
The goal of the tweet was exactly what I said above: show my surprise, using Oracle 7 as a reference because this is the version where I started to learn SQL. And there's no judgment behind this surprise: I can't compare software I've been using for more than 20 years with software I'm just learning. I have great admiration for the Oracle design and architecture choices, but I also have great admiration for what the Postgres community is doing.

In my tweet I updated a primary key. I don't think I've ever designed, in real life, a primary key that has to be updated later. For each table we need an immutable key to identify rows, for referential integrity constraints or for replication. The value must be known from the first insert (which is why the columns are declared not null) and it is never updated. It makes sense to use the primary key for that, as it is unique and not null.

Actually, a better case would be a simple unique constraint where we just exchange two rows. A real-life example is a list of items, probably having a surrogate key as the primary key and a unique key on an item number. When the user wants to move an item up, we just run an update on two rows, exchanging their numbers, as sketched below. The unique constraint just ensures that we have only distinct values, so that a select … order by will always return the values in the same order.
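
To make that concrete, here is a minimal sketch (table, columns and values are invented for illustration): ‘moving an item up’ exchanges two values of the unique column in a single statement.

create table items (
 item_id bigserial primary key, -- immutable surrogate key
 label text not null,
 item_no integer not null unique -- the number the user re-orders
);
-- move the item at position 5 up: exchange item_no 4 and 5 in one update
-- (with a non-deferrable unique constraint, this may fail depending on row order)
update items
 set item_no = case item_no when 4 then 5 when 5 then 4 end
 where item_no in (4,5);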

All similar cases have the same consequence: when the update is processed row by row, uniqueness may be violated temporarily. But at the end of the statement, the constraint is still satisfied.

Here is the initial example with updating all rows:


create table demo as select generate_series n from generate_series(1,2);
SELECT 2
alter table demo add constraint demo_pk primary key(n);
ALTER TABLE
select * from demo;
n
---
1
2
(2 rows)
 
begin transaction;
BEGIN
update demo set n=n-1;
UPDATE 2
select * from demo;
n
---
0
1
(2 rows)

This works. I inserted the rows in ascending order of n. Decreasing the value doesn't violate uniqueness at any time, because the update processes the rows from the beginning to the end.

However, when we increase the value, we have a duplicate value until we process the next row. And by default, Postgres fails:

update demo set n=n+1;
ERROR: duplicate key value violates unique constraint "demo_pk"
DETAIL: Key (n)=(1) already exists.

The bad thing is that the behavior of the application depends on the physical order of the rows and the order in which they are processed. This violates Codd's rule about physical data independence. In addition, SQL statements are supposed to process a set of rows, not to behave like low-level row-by-row processing.

But there is also a very good thing: because the constraint is validated row by row, you know exactly which value violates the constraint (here: “DETAIL: Key (n)=(1) already exists”).

So my statement failed, and in Postgres this seems to abort the whole transaction:

commit;
ROLLBACK

My second surprise is that the failure of one statement aborts the whole transaction. There is no error at commit, but it simply tells me that it has done a rollback instead of the commit.
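
If the rest of the transaction must survive such an error, the usual Postgres pattern is a savepoint; a minimal sketch on the same demo table:

begin transaction;
savepoint before_renumber;
update demo set n=n+1; -- fails with the duplicate key error
rollback to savepoint before_renumber; -- undoes only the failed statement
-- the transaction is no longer aborted and can still be committed
commit;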

deferrable

So, I compared with Oracle, where this statement is always successful because temporary violations that are resolved later, within the same statement, do not violate the constraint. I compared it with the oldest version I have on my laptop (Oracle 7.3) to show that this is not something I have ever seen arrive as a new feature, because I started with Oracle 7. And this kind of thing is the reason why I like SQL: doing the same with a procedural language requires an intermediate update to be sure that there is no duplicate at any time.

The Postgres community is very responsive, especially when it may look like something works better in Oracle than in Postgres (which was not the case here, and which was not the goal of my tweet anyway: tweets are short and may not express the tone properly).

Solutions were quickly proposed: a deferred constraint (example in this blog post).

I know deferred constraints in Oracle. They are similar in Postgres and here is the solution proposed:


alter table demo drop constraint demo_pk;
ALTER TABLE
alter table demo add constraint demo_pk primary key(n) deferrable initially deferred;
ALTER TABLE
begin transaction;
BEGIN
update demo set n=n-1;
UPDATE 2
select * from demo;
n
---
0
1
(2 rows)
 
update demo set n=n+1;
UPDATE 2

That seems good. Because the constraint validation is deferred, the update is successful.

However, this is not what I want. I want the previous statement to succeed, but I want the following statement to fail:

insert into demo values(1);
INSERT 0 1

Because the constraint is deferred, this statement succeeds, and it is only at commit that it fails:

commit;
ERROR: duplicate key value violates unique constraint "demo_pk"
DETAIL: Key (n)=(1) already exists.

Why do I think this is not the right solution? First, because I want the statement to fail as soon as possible. And in addition, I want the commit to be fast. Doing expensive things at commit should be avoided if possible: it is the point where all the work is supposed to be done already and you just want to save it (make it durable and visible to others).

deferrable initially immediate

Actually, the solution is to declare the constraint as deferrable, but not deferred.

alter table demo drop constraint demo_pk;
ALTER TABLE
alter table demo add constraint demo_pk primary key(n) deferrable initially immediate;
ALTER TABLE

This says that it is deferrable, but not deferred (unless you decide to set the constraint deferred for your transaction). That way, it accepts temporary constraint violations as long as they are resolved by the end of the statement.

Now, my update statement is successful:

begin transaction;
BEGIN
update demo set n=n-1;
UPDATE 2
select * from demo;
n
---
0
1
(2 rows)
update demo set n=n+1;
UPDATE 2

Any other statement that violates the constraint fails immediately:

insert into demo values(1);
ERROR: duplicate key value violates unique constraint "demo_pk"
DETAIL: Key (n)=(1) already exists.
commit;
ROLLBACK
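
Note that DEFERRABLE also leaves the choice to the transaction: SET CONSTRAINTS can switch the check to deferred when a specific transaction needs it. A minimal sketch with the same demo_pk constraint:

begin transaction;
set constraints demo_pk deferred;
insert into demo values(1); -- accepted for now, no immediate check
commit; -- a duplicate would only be detected here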

Documentation

The nice thing is that this is documented! I didn't find it immediately because it is in the ‘Compatibility’ part of the ‘create table’ documentation. I'm not yet used to the Postgres documentation. I stopped at the ‘DEFERRED’ definition, which mentions: “A constraint that is not deferrable will be checked immediately after every command.”

But later, the Compatibility section adds something more specific about unique constraints:

Non-deferred Uniqueness Constraints
When a UNIQUE or PRIMARY KEY constraint is not deferrable, PostgreSQL checks for uniqueness immediately whenever a row is inserted or modified. The SQL standard says that uniqueness should be enforced only at the end of the statement; this makes a difference when, for example, a single command updates multiple key values. To obtain standard-compliant behavior, declare the constraint as DEFERRABLE but not deferred (i.e., INITIALLY IMMEDIATE). Be aware that this can be significantly slower than immediate uniqueness checking.

That’s another good point. Postgres documentation is clear and gives the right solution. We just have to read it to the end.

A side note for my French-speaking readers: the Postgres documentation has been translated into French by Guillaume Lelarge, who also translated Markus Winand's book and website. In both cases, the translation is as good as the original.

Performance

The documentation mentions ‘significantly slower’. Here is a test on 100000 rows with a non-deferrable constraint:

create table demo as select generate_series n from generate_series(1,100000);
SELECT 100000
alter table demo add constraint demo_pk primary key(n);
ALTER TABLE
vacuum demo;
VACUUM
select * from pgstatindex('demo_pk');
version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
2 | 1 | 2260992 | 3 | 1 | 274 | 0 | 0 | 89.83 | 0
(1 row)

Here is the update n=n-1 where all rows are updated but none violates the constraint at any time:

explain (analyze,verbose,costs,buffers)update demo set n=n-1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Update on public.demo (cost=0.00..1693.00 rows=100000 width=10) (actual time=425.699..425.699 rows=0 loops=1)
Buffers: shared hit=578646 read=1202 dirtied=1267
-> Seq Scan on public.demo (cost=0.00..1693.00 rows=100000 width=10) (actual time=0.013..16.186 rows=100000 loops=1)
Output: (n - 1), ctid
Buffers: shared hit=443

This update has read 578646+1202=579848 buffers.

Now creating the deferrable constraint:

alter table demo drop constraint demo_pk;
ALTER TABLE
alter table demo add constraint demo_pk primary key(n) deferrable initially immediate;
ALTER TABLE
vacuum demo;
VACUUM
select * from pgstatindex('demo_pk');
version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
2 | 1 | 2260992 | 3 | 1 | 274 | 0 | 0 | 89.83 | 0

And do the n=n+1 update:

explain (analyze,verbose,costs,buffers)update demo set n=n+1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Update on public.demo (cost=0.00..2135.00 rows=100000 width=10) (actual time=481.868..481.868 rows=0 loops=1)
Buffers: shared hit=679405 read=760 dirtied=825
-> Seq Scan on public.demo (cost=0.00..2135.00 rows=100000 width=10) (actual time=0.268..16.329 rows=100000 loops=1)
Output: (n + 1), ctid
Buffers: shared hit=885
Planning time: 0.237 ms
Trigger PK_ConstraintTrigger_75314 for constraint demo_pk: time=174.976 calls=99999
Execution time: 663.799 ms

This reads more buffers (679405+760=680165), and we can see that an internal trigger (PK_ConstraintTrigger_75314) has been run to re-check the unique constraint at the end of the statement. But that is only about 17% more here, for this special case where all rows are updated.

However, a more realistic test case exchanging only two values is much cheaper:


explain (analyze,verbose,costs,buffers) update demo set n=case when n=2 then 2000 when n=2000 then 2 end where n in (2,2000);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Update on public.demo (cost=8.85..16.60 rows=2 width=10) (actual time=0.079..0.079 rows=0 loops=1)
Buffers: shared hit=23
-> Bitmap Heap Scan on public.demo (cost=8.85..16.60 rows=2 width=10) (actual time=0.016..0.055 rows=2 loops=1)
Output: CASE WHEN (n = 2) THEN 2000 WHEN (n = 2000) THEN 2 ELSE NULL::integer END, ctid
Recheck Cond: (demo.n = ANY ('{2,2000}'::integer[]))
Heap Blocks: exact=3
Buffers: shared hit=9
-> Bitmap Index Scan on demo_pk (cost=0.00..8.85 rows=2 width=0) (actual time=0.009..0.009 rows=4 loops=1)
Index Cond: (demo.n = ANY ('{2,2000}'::integer[]))
Buffers: shared hit=6
Planning time: 0.137 ms
Trigger PK_ConstraintTrigger_75322 for constraint demo_pk: time=0.005 calls=1
Execution time: 0.120 ms

In my opinion, the overhead here is totally acceptable, especially given the fact that this re-check displays exactly which value violates the constraint in case there is a duplicate.

But I'm going too fast here. I've not even started my blog series about access paths, where I'll explain the cost of the execution plans, starting with the simplest: Seq Scan. Follow my blog or Twitter to get informed. There will be nothing about ‘which is better, Oracle or Postgres?’, but I'm convinced that knowing the differences helps to understand how each works, and to design an application that behaves correctly when ported from one to the other.

 

The post Postgres unique constraint appeared first on Blog dbi services.