Oakies Blog Aggregator

Oracle and postgres disk IO performance

This post is about one of the fundamentally important properties of a database: how IO is done. The test case I studied is doing a simple full table scan of a single large table. In both Oracle and postgres the table doesn’t have any indexes or constraints, which is not a realistic example, but this doesn’t change the principal topic of the study: doing a table scan.

I used a publicly available dataset from the US bureau of transportation statistics called FAF4.5.1_database.zip
The zipped file is 347MB, unzipped size 1.7GB.

In both cases Oracle Linux 7.7 (64 bit) is used, running in VirtualBox, with the storage being a USB3 SSD. Number of CPUs is 4, memory size is 6G. Filesystem type: xfs.
The Oracle version used is Oracle 19.5, the Postgresql version used is 12.1.
For Postgresql, the postgresql.conf file is not changed, except for max_parallel_workers_per_gather which is set to 0 to make postgres use a single process.
For Oracle, the parameter that I think is important is filesystemio_options='setall'. Oracle uses filesystem-based storage (so no ASM).
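For reference, a minimal sketch of checking both settings, assuming a psql session on the postgres side and a SQL*Plus session with access to v$parameter on the Oracle side:

-- postgres: the only non-default setting used for this test
SET max_parallel_workers_per_gather = 0;
SHOW max_parallel_workers_per_gather;

-- Oracle: confirm direct and asynchronous IO are both enabled
select name, value from v$parameter where name = 'filesystemio_options';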

This is the table definition for Oracle:

create table faf451 (
  fr_origin varchar2(3),
  dms_orig varchar2(3),
  dms_dest varchar2(3),
  fr_dest varchar2(3),
  fr_inmode varchar2(1),
  dms_mode varchar2(1),
  fr_outmode varchar2(1),
  sctg2 varchar2(2),
  trade_type varchar2(1),
  tons number,
  value number,
  tmiles number,
  curval number,
  wgt_dist number,
  year varchar2(4)
);

This is the table definition for Postgresql:

create table faf451 (
  fr_origin varchar(3),
  dms_orig varchar(3),
  dms_dest varchar(3),
  fr_dest varchar(3),
  fr_inmode varchar(1),
  dms_mode varchar(1),
  fr_outmode varchar(1),
  sctg2 varchar(2),
  trade_type varchar(1),
  tons double precision,
  value double precision,
  tmiles double precision,
  curval double precision,
  wgt_dist double precision,
  year varchar(4)
);

In order for the data to be easily loadable into postgres using copy from, I had to remove the "" (double double quotes) for the empty numeric fields. In Oracle I could specify optionally enclosed by '"'. For Oracle I used an external table definition to load the data.

Now, before doing any benchmarks, I have an idea where this is going. Oracle uses direct IO (DIO), so linux page cache management and "double buffering" are avoided. Also, Oracle does asynchronous IO (AIO), which means submitting IOs is separated from waiting for the notification that the submitted IOs are ready, and on top of that Oracle submits multiple IO requests at the same time. And again on top of that, Oracle does multi-block IO, which means that instead of requesting each 8K database block individually, it groups adjacent blocks and requests them in one go, up to a combined size of 1MB, which means it can request up to 128 8K blocks in one IO. Postgres requests every block synchronously, so one 8K block at a time, waiting for each request to finish. That gave me a strong idea where this was going.
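As a quick sanity check of the multi-block claim, the 1MB upper limit can be derived from two Oracle parameters. This is just a sketch, assuming access to v$parameter; with an 8K block size and the default multiblock read count of 128 it works out to the 1MB mentioned above:

select a.value as block_size,
       b.value as multiblock_read_count,
       a.value * b.value as max_read_bytes
from   v$parameter a, v$parameter b
where  a.name = 'db_block_size'
and    b.name = 'db_file_multiblock_read_count';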

It should be noted that postgres explicitly depends on the operating system page cache for buffering, as a design principle. Because of DIO, blocks that are read by Oracle are not cached in the operating system page cache.

I executed my benchmark in the following way:
– A run for every size is executed 5 times.
– At the start of every run for a certain size (so before every “batch” of 5 runs), the page cache is flushed: (echo 3 > /proc/sys/vm/drop_caches).
– Before each individual run, the database cache is flushed (systemctl restart postgresql-12 for postgres, alter system flush buffer_cache for oracle).

I started off with 2G from the dataset, and then simply performed a 'copy from' again to load the same dataset into the table in postgres. Oracle required a bit more work. Oracle was able to store the same data in far fewer blocks; the size became 1.18G. In order to have both postgres and oracle scan the same amount of data, I calculated roughly how many rows I needed to add to the table to make it 2G, and copied that data to a separate 2G table, which I could then insert to increase the size of the test table by 2G. This way in both oracle and postgres I could test with a 2G table and add 2G at a time until I reached 20G.
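On the Oracle side, the growth steps could look something like the sketch below; faf451_2g is a hypothetical helper table, and the row count is a placeholder calculated to make it roughly 2G:

create table faf451_2g as
select * from faf451 where rownum <= &rows_for_2g;

-- repeated once per 2G increment until the test table reaches 20G
insert /*+ append */ into faf451 select * from faf451_2g;
commit;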

These are the results. As you can see in the legend: oracle is orange, postgres is blue.
[Graph: postgres vs oracle scan results]

What we see, is that postgres is a bit slower with the first run of 5 for the smaller dataset sizes, which becomes less visible with larger datasets.
Also, postgres is way faster if the dataset fits into the page cache and it has been read into it. This is logical because postgres explicitly uses the page cache as a secondary cache, and the test is the only activity on this server, so it hasn’t been flushed by other activity.

What was totally shocking to me is that postgres performs on par with oracle, and both are roughly able to perform at the maximum IO speed of my disk: 300MB/s, especially when the dataset is bigger, i.e. beyond the page cache size.

It wasn’t shocking that oracle could reach the total bandwidth of the disk: oracle uses all the techniques to optimise IO for bandwidth. But how can postgres do the same, NOT deploying these techniques, reading 8K at a time??

The first thing to check is whether postgres is doing something other than I suspected. This can simply be checked with strace:

poll_wait(3, [{EPOLLIN, {u32=18818136, u64=18818136}}], 1, -1) = 1
recvfrom(11, "Q\0\0\0!select count(*) from faf451"..., 8192, 0, NULL, NULL) = 34
lseek(20, 0, SEEK_END)                  = 335740928
lseek(20, 0, SEEK_END)                  = 335740928
kill(1518, SIGUSR1)                     = 0
pread64(5, "\f\0\0\0\310ILc\0\0\0\0D\1\210\1\0 \4 \0\0\0\0\230\237\312\0000\237\312\0"..., 8192, 846061568) = 8192
pread64(5, "\f\0\0\0HcLc\0\0\0\0D\1\210\1\0 \4 \0\0\0\0\230\237\312\0000\237\312\0"..., 8192, 846069760) = 8192
pread64(5, "\f\0\0\0\260|Lc\0\0\0\0D\1\210\1\0 \4 \0\0\0\0\230\237\312\0000\237\312\0"..., 8192, 846077952) = 8192
pread64(5, "\f\0\0\0000\226Lc\0\0\0\0D\1\210\1\0 \4 \0\0\0\0\230\237\312\0000\237\312\0"..., 8192, 846086144) = 8192
…etc…

The above strace output shows only 4 rows of pread64() calls, but this goes on. So no “secret” optimisation there.

Luckily, my VM has a new enough version of Linux for it to be able to use eBPF, so I can use biosnoop. Biosnoop is a tool to look at IO on one of the lower layers of the linux kernel, the block device interface (hence ‘bio’). This is the biosnoop output:

# /usr/share/bcc/tools/biosnoop
TIME(s)        COMM           PID    DISK    T  SECTOR    BYTES   LAT(ms)
0.000000000    postmaster     4143   sdb     R  66727776  708608     5.51
0.006419000    postmaster     4143   sdb     R  66731720  77824     11.06
0.006497000    postmaster     4143   sdb     R  66734432  786432    11.03
0.011550000    postmaster     4143   sdb     R  66731872  1310720   16.17
0.013470000    postmaster     4143   sdb     R  66729160  1310720   18.86
0.016439000    postmaster     4143   sdb     R  66735968  1310720   14.61
0.019220000    postmaster     4143   sdb     R  66738528  786432    15.20

Wow…so here it’s doing IOs of up to 1MB! So somewhere between postgres itself and the block device, the IOs magically grew to sizes up to 1MB…that’s weird. The only thing that sits between postgres and the block device is the linux kernel, which includes page cache management.

To get an insight into that, I ran 'perf record -g -p PID' during the scan, and then perf report to look at the recorded perf data. This is what I found:

Samples: 21K of event 'cpu-clock', Event count (approx.): 5277000000
  Children      Self  Command     Shared Object       Symbol                                                                  ◆
-   41.84%     3.63%  postmaster  libpthread-2.17.so  [.] __pread_nocancel                                                    ▒
   - 38.20% __pread_nocancel                                                                                                  ▒
      - 38.08% entry_SYSCALL_64_after_hwframe                                                                                 ▒
         - 37.95% do_syscall_64                                                                                               ▒
            - 35.87% sys_pread64                                                                                              ▒
               - 35.51% vfs_read                                                                                              ▒
                  - 35.07% __vfs_read                                                                                         ▒
                     - 34.97% xfs_file_read_iter                                                                              ▒
                        - 34.69% __dta_xfs_file_buffered_aio_read_3293                                                        ▒
                           - 34.32% generic_file_read_iter                                                                    ▒
                              - 21.10% page_cache_async_readahead                                                             ▒
                                 - 21.04% ondemand_readahead                                                                  ▒
                                    - 20.99% __do_page_cache_readahead                                                        ▒
                                       + 14.14% __dta_xfs_vm_readpages_3179                                                   ▒
                                       + 5.07% __page_cache_alloc                                                             ▒
                                       + 0.97% radix_tree_lookup                                                              ▒
                                       + 0.54% blk_finish_plug                                                                ▒

If you look at rows 13-15 you see that the kernel is performing readahead. This is an automatic function in the linux kernel which checks whether the requests are sequential in nature, and when that's true performs readahead, so that the scan is made faster.

PL/SQL That is not DEFINER or INVOKER rights - BUG?

I always understood that PL/SQL objects in the database that are not explicitly changed to INVOKER rights - CURRENT_USER or DEFINER. During testing of an issue with PFCLScan where a customer was using our project template to analyse PL/SQL and....[Read More]

Posted by Pete On 24/01/20 At 03:19 PM

Friday Philosophy – Presenting Sex

These slides are from the first ever presentation I did. And yes, the presentation was at work.

My first real job after college was as a trainee analyst programmer for the National Health Service and, as part of the “graduate training program”, we attended a short course on presentation skills. As you would expect, we all had to do a presentation at the end. As most of us had never had a job before and we were dull with no interesting hobbies, most of the presentations I could see being put together around me were a little… boring. I decided to try something different.

If you think the photographs with this article are a little poor with odd reflections, that is because the original images are printed on transparent acetate sheets and were displayed by putting them on an overhead projector – a large box-thing with a blindingly bright light that shone through the acetate sheet and cast the image onto a screen or wall via a mirror/lens bit. Laptops had not been invented back then and I’m not even sure a PC (if you could afford a PC 386SX) could drive a projector. This was all manual – you had to change sheets by hand. At least you never had problems with connecting the overhead projector to the acetate sheet; you simply put the sheet on upside down and back-to-front and had to reposition it 3 times before you got it right. This is important, I could not quickly flick between images.

When I put up my first sheet, with the cute little couple holding hands, our tutor screeched and said to one of the other delegates “Oh God! I thought you were kidding when you said what he was presenting on!”. Before I could even take this opening image off the projector she had stepped forward and told me I could not present this.

“Why not?” I asked, “we are all adults!”. She refused to let me swap to my first proper slide, “This is not the sort of topic that is suitable at work.”

Stand off.

“Well, what do you think I am going to talk about?”. Our tutor was now quite flustered. “I think we all know what you are presenting on – and I shudder to think what the next slide is going to be of!” (or something like that, this was a long time ago). I had no choice. I got her to come closer and look at my next couple of slides…

Her relief was visible. She could immediately see the next slides were not based on “The Joy of Sex” or similar publications and after she’d looked at each of my acetate sheets carefully (just to make sure) I was allowed to continue.


[Image: https://mwidlake.files.wordpress.com/2020/01/sex1.jpg]

Of course, this had somewhat diluted the tension & expectation that had been building up, but I felt I had milked the initial surprise as much as I was going to be able to. I moved onto the next slide and most of the audience was disappointed by the lack of limbs, bodies and appendages to be seen. As you can see to the left, the next slide was an odd set of little diagrams of dots & letters and what many of us would now recognise as a sort-of family tree diagram. As some of you know, my degree had been in genetics (and zoology, but that is by-the-by).

There is a very interesting thing about sex, as in sexual reproduction. What is the point? Well, apart from the immediate pleasure for animals like us that seem to enjoy the initial act, why do we mix the genomes of two organisms to produce new offspring? It is not mandatory, many organisms (especially plants and bacteria) employ asexual reproduction. All the children are effectively clones of the adult. There is no finding a mate or the need for pollen to arrive, the actual process biologically is a lot simpler & more reliable, and you don’t need males running around using up resources for, let’s face it, a pretty small contribution to the effort. Asexual reproduction is a lot quicker, simpler, needs less energy. A species that does away with sex can out-compete sexy competition.

 

[Image: https://mwidlake.files.wordpress.com/2020/01/sex2.jpg]

My little talk was all about that, about why you have male and female, why mixing the genes from two individuals and, over time, across the gene pool of your species, is beneficial. I won’t bore you with the details here.

That first presentation of mine went down very well and it was remembered by everyone there. A lot of people (who had not even been there for the premier of that talk) mentioned it to me when I left the company. It made an impression on me too – if you can grab people’s attention at the start of a presentation, it really helps make it a success.

And, of course, as anyone in marketing will tell you – Sex Sells.

In this case, even the lack of sex.

Online Statistics Collection during Bulk Loads on Partitioned Tables

Introduction

One of the enhancements to statistics collection and management in Oracle 12c was the ability of the database to automatically collect statistics during either a create-table-as-select operation or during the initial insert into a freshly created or freshly truncated table, provided that insert is done in direct-path mode (i.e. using the APPEND hint).
When that occurs, there is an additional operation in the execution plan: OPTIMIZER STATISTICS GATHERING.

----------------------------------------------------------------------------------------------------
| Id | Operation                        | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|  0 | INSERT STATEMENT                 |                 |       |       |  495K(100) |          |
|  1 |  LOAD AS SELECT                  |                 |       |       |            |          |
|  2 |   OPTIMIZER STATISTICS GATHERING |                 |   70M |   11G |  495K  (2) | 00:00:20 |
|  3 |    TABLE ACCESS FULL             | XXXXXXXXXXXXXXX |   70M |   11G |  495K  (2) | 00:00:20 |
----------------------------------------------------------------------------------------------------

The motivation for this blog was encountering a bulk insert into a partitioned table where the statistics gathering operation consumed a very significant amount of time. Partitioning gives you more things to consider.

A Simple Test

I created a simple test that compares the time taken by online statistics collection on partitioned and non-partitioned tables, with the explicit collection of statistics using DBMS_STATS. I have four tables with the same structure.

  • T1: Not partitioned. Data will be copied from this table to each of the others. 
  • T2: Partitioned. Online statistics only. 
  • T3: Partitioned. Explicitly gathered statistics. 
  • T4: Partitioned. Explicitly gathered incremental statistics.
CREATE TABLE T1 (a number, b varchar2(1000), c number) NOLOGGING;
CREATE TABLE T2 (a number, b varchar2(1000), c number)
PARTITION BY RANGE (a) INTERVAL(100) (PARTITION t_part VALUES less than (101)) NOLOGGING;
CREATE TABLE T3 (a number, b varchar2(1000), c number)
PARTITION BY RANGE (a) INTERVAL(100) (PARTITION t_part VALUES less than (101)) NOLOGGING;
CREATE TABLE T4 (a number, b varchar2(1000), c number)
PARTITION BY RANGE (a) INTERVAL(100) (PARTITION t_part VALUES less than (101)) NOLOGGING;

I loaded 100 million rows into each in direct-path mode. The partitioned tables end up with 100 partitions, each with 1 million rows. I have also suppressed redo logging during the direct-path insert by creating the tables with the NOLOGGING attribute.
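The post doesn't show the load itself, so here is a hypothetical generator, not the author's script, that is at least consistent with the column statistics shown later (A with 10,000 distinct values, giving 100 interval partitions of 1 million rows each, and C with 100 distinct values):

INSERT /*+APPEND*/ INTO t1
SELECT MOD(ROWNUM,10000)+1                        a
,      RPAD(TO_CHAR(MOD(ROWNUM,10000)),100,'x')   b
,      MOD(ROWNUM,100)+1                          c
FROM   (SELECT 1 FROM dual CONNECT BY level <= 10000)
,      (SELECT 1 FROM dual CONNECT BY level <= 10000);
COMMIT;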

EXEC dbms_stats.set_table_prefs(user,'T3','INCREMENTAL','FALSE');
EXEC dbms_stats.set_table_prefs(user,'T4','INCREMENTAL','TRUE');

The following set of tests will be run for different combinations of:

  • Parallel hint on query, or not 
  • Parallel hint on insert, or not 
  • Table parallelism specified 
  • Parallel DML enabled or disabled at session level 
  • Column-specific METHOD_OPT table preference specified or not. 

I enabled SQL trace, from which I was able to obtain the elapsed time of the various statements, and from which I could determine the amount of time spent on online statistics gathering using the timings on the OPTIMIZER STATISTICS GATHERING operation in the execution plan in the trace.
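The post doesn't show the exact tracing calls; one way to get an equivalent trace (an assumption on my part, not necessarily how the author did it) is:

ALTER SESSION SET tracefile_identifier = 'stats_test';
EXEC dbms_session.session_trace_enable(waits=>TRUE, binds=>FALSE)
-- ... run the inserts and the statistics calls ...
EXEC dbms_session.session_trace_disable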

TRUNCATE TABLE T2;
TRUNCATE TABLE T3;
EXEC dbms_stats.delete_table_stats(user,'T2');
EXEC dbms_stats.delete_table_stats(user,'T3');
EXEC dbms_stats.delete_table_stats(user,'T4');

INSERT /*+APPEND &inshint*/ into T2 i SELECT /*+&selhint*/ * from t1 s;
INSERT /*+APPEND &inshint NO_GATHER_OPTIMIZER_STATISTICS*/ into T3 i SELECT /*+&selhint*/ * from t1 s;
INSERT /*+APPEND &inshint NO_GATHER_OPTIMIZER_STATISTICS*/ into T4 i SELECT /*+&selhint*/ * from t1 s;
commit;
EXEC dbms_stats.gather_table_stats(user,'T3');
EXEC dbms_stats.gather_table_stats(user,'T4');

Quirks

It was while building this test that I discovered a couple of quirks:

What Statistics Are Normally Collected by Online Statistics Gathering? 

After just the initial insert, I can see that I have table statistics on T1 and T2, but not on T3 and T4.

SELECT table_name, num_rows, last_analyzed FROM user_tables WHERE table_name LIKE 'T_' ORDER BY 1;

TABLE_NAME   NUM_ROWS LAST_ANALYZED
---------- ---------- -----------------
T1           10000000 14:07:36 16/01/20
T2           10000000 14:07:36 16/01/20
T3
T4

I also have column statistics on T1 and T2, but no histograms.

break on table_name skip 1
SELECT table_name, column_name, num_distinct, global_stats, histogram, num_buckets, last_analyzed
FROM user_tab_columns where table_name like 'T_' order by 1,2;

TABLE_NAME COLUMN_NAME  NUM_DISTINCT GLO HISTOGRAM       NUM_BUCKETS LAST_ANALYZED
---------- ------------ ------------ --- --------------- ----------- -----------------
T1         A                   10000 YES NONE                      1 14:06:58 16/01/20
           B                   10000 YES NONE                      1 14:06:58 16/01/20
           C                     100 YES NONE                      1 14:06:58 16/01/20

T2         A                   10000 YES NONE                      1 14:07:11 16/01/20
           B                   10000 YES NONE                      1 14:07:11 16/01/20
           C                     100 YES NONE                      1 14:07:11 16/01/20

T3         A                         NO  NONE
           B                         NO  NONE
           C                         NO  NONE

T4         A                         NO  NONE
           B                         NO  NONE
           C                         NO  NONE

However, I do not have any partition statistics (I have only shown the first and last partition of each table in this report).

break on table_name skip 1
SELECT table_name, partition_position, partition_name, num_rows, last_analyzed
FROM user_tab_partitions WHERE table_name like 'T_' ORDER BY 1,2 nulls first;

TABLE_NAME PARTITION_POSITION PARTITION_NAME         NUM_ROWS LAST_ANALYZED
---------- ------------------ -------------------- ---------- -----------------
T2                          1 T_PART
                          100 SYS_P20008

T3                          1 T_PART
                          100 SYS_P20107

T4                          1 T_PART
                          100 SYS_P20206

Online optimizer statistics gathering only collects statistics at table level but not partition or sub-partition level. Histograms are not collected.

From Oracle 18c, there are two undocumented parameters that modify this behaviour. Both default to false. Interestingly, both are enabled in the Oracle Autonomous Data Warehouse.

  • If _optimizer_gather_stats_on_load_hist=TRUE, histograms are collected on all columns during online statistics collection. 
  • If _optimizer_gather_stats_on_load_all=TRUE, statistics are collected online during every direct-path insert, not just the first one into a segment (a sketch of setting them follows this list). 
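A sketch of switching them on, assuming they can be modified at session level; they are hidden parameters, so treat this as something for a test system and check with Oracle Support before relying on it:

ALTER SESSION SET "_optimizer_gather_stats_on_load_hist" = TRUE;
ALTER SESSION SET "_optimizer_gather_stats_on_load_all"  = TRUE;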

Do I Need Partition Statistics?

Statistics will be collected on partitions that do not have them when the automatic statistics collection job runs in the next database maintenance window. The question is whether you can manage without them until then.
"The optimizer will use global or table level statistics if one or more of your queries touches two or more partitions. The optimizer will use partition level statistics if your queries do partition elimination, such that only one partition is necessary to answer each query. If your queries touch two or more partitions the optimizer will use a combination of global and partition level statistics."
 – Oracle The Data Warehouse Insider Blog: Managing Optimizer Statistics in an Oracle Database 11g - Maria Colgan
It will depend upon the nature of the SQL in the application. If the optimizer does some partition elimination, and the data is not uniformly distributed across the partitions, then partition statistics are likely to be beneficial. If there is no partition elimination, then you might question whether partitioning (or at least the current partitioning strategy) is appropriate!

What is the Fastest Way to Collect Statistics on Partitioned Tables?

Let's look at how long it takes to insert data into, and then subsequently collect statistics on, the tables in my example. This test was run on Oracle 19c on one compute node of a virtualised Exadata X4 machine with 16 CPUs.  This table shows elapsed time and the total DB time, including all parallel server processes, for each operation.

Test scenarios (these are the numbered columns in the results below):

 #  Comment                                    Table       Insert Hint  Select Hint  Parallel DML  Stats Degree  Method Opt
--  -----------------------------------------  ----------  -----------  -----------  ------------  ------------  ------------------------
 1  Serial Insert & Statistics                 NOPARALLEL  blank        blank        DISABLE       none          none
 2  Parallel Insert & Statistics               NOPARALLEL  PARALLEL(i)  PARALLEL(s)  DISABLE       DEFAULT       none
 3  Parallel SQL & Statistics                  NOPARALLEL  blank        PARALLEL     DISABLE       DEFAULT       none
 4  Parallel DML, Insert, Select & Statistics  NOPARALLEL  PARALLEL(i)  PARALLEL(s)  ENABLE        DEFAULT       none
 5  Parallel DML, SQL & Statistics             NOPARALLEL  blank        PARALLEL     ENABLE        DEFAULT       none
 6  Parallel Tables                            PARALLEL    blank        blank        DISABLE       none          none
 7  Parallel Tables & DML                      PARALLEL    blank        blank        ENABLE        none          none
 8  Parallel Tables, DML & Method Opt          PARALLEL    blank        blank        ENABLE        none          ... FOR COLUMNS SIZE 1 A

Elapsed Time (s):

Table  Operation                                1        2        3        4        5        6        7        8
-----  ----------------------------------  -------  -------  -------  -------  -------  -------  -------  -------
T2     Insert (Online Stats Gathering)      172.46   160.86   121.61   108.29    60.31   194.47    23.57    20.57
       Optimizer Statistics Gathering        82.71    55.59    55.90        -        -        -        -        -
T3     Insert (NO_GATHER_OPTIMIZER_STATS)   125.40   156.36   124.18    20.62    29.01   199.20    20.97    21.15
       Explicit Stats                       122.80   146.25    63.23    15.99    24.88    24.58    24.99    24.62
T4     Insert (NO_GATHER_OPTIMIZER_STATS)   123.18   158.15   147.04    20.44    29.91   204.61    20.65    20.60
       Incremental Explicit Stats            80.51   104.85    46.05    23.42    23.14    23.21    22.60    23.03

DB Time (s):

Table  Operation                                1        2        3        4        5        6        7        8
-----  ----------------------------------  -------  -------  -------  -------  -------  -------  -------  -------
T2     Insert (Online Stats Gathering)         174      163      169      359      337      248      366      308
T3     Insert (NO_GATHER_OPTIMIZER_STATS)      128      193      160      290      211      236      312      326
       Explicit Stats                          122      146       63      265      305      262      335
T4     Insert (NO_GATHER_OPTIMIZER_STATS)      126      194      167      295      205      233      304      295
       Incremental Explicit Stats               80      105        2      281      266      300      179      226
  • It is difficult to determine the actual duration of the OPTIMIZER STATISTICS GATHERING operation, short of measuring the effect of disabling it. The time in the above table has been taken from SQL trace files. That duration is always greater than the amount saved by disabling online statistics gathering with the NO_GATHER_OPTIMIZER_STATS hint. However, the amount of time accounted in Active Session History (ASH) for that line in the execution plan is usually less than the elapsed saving. 
    • E.g. for the serial insert, 83s was accounted for OPTIMIZER STATISTICS GATHERING in the trace, while ASH showed only 23s of database time for that line of the plan. However, perhaps the only meaningful measurement is that disabling online statistics gathering saved 47s. 
  • DML statements, including insert statements in direct-path mode, only actually execute in parallel if parallel DML is enabled. Specifying a degree of parallelism on the table, or a parallel hint, is not enough. Parallel DML must be enabled:
    • either at session level
ALTER SESSION ENABLE PARALLEL DML;
    • or for the individual statement.
insert /*+APPEND ENABLE_PARALLEL_DML*/ into T2 SELECT * from t1;
    • Specifying parallel insert with a hint, without enabling parallel DML will not improve performance and can actually degrade it.
    • Specifying parallel query without running the insert in parallel can also degrade performance.
  • Online statistics will be collected in parallel if
    • either the table being queried has a degree of parallelism,
    • or a parallel hint applies to the table being queried, or the entire statement,
    • or parallel DML has been enabled 
  • Where statistics are collected explicitly (i.e. with a call to DBMS_STATS.GATHER_TABLE_STATS) they are collected in parallel if 
    • either, the DEGREE is specified (I specified a table statistics preference),
EXEC dbms_stats.set_table_prefs(user,'T3','DEGREE','DBMS_STATS.DEFAULT_DEGREE');
    • or the table has a degree of parallelism.
ALTER TABLE T3 PARALLEL;
  • Incremental statistics are generally faster to collect because they calculate table-level statistics from partition-level statistics, saving a second pass through the data.
  • When parallel DML is enabled at session level, I found that the performance of statistics collection also improves.

Conclusion

Overall, the best performance was obtained when the tables were altered to use parallelism, and parallel DML was enabled; then the query, insert and statistics collection are performed in parallel.
However, the improved performance of parallelism comes at a cost.  It can be a brutal way of bringing more resource to bear on an activity.  A parallel operation can be expected to use more database time across all the parallel server processes than the same operation processed serially.  My best results were obtained by activating all of the CPUs on the server without regard for any other activity.  Too many concurrent parallel operations have the potential to overload a system.  Remember also, that while the parallel attribute remains on the table any subsequent query will also run in parallel.
Suppressing online statistics collection saves total database time whether working in parallel or not. The saving in elapsed time is reduced when the insert and query are running in parallel.  The time taken to explicitly collect statistics will exceed that saving because it is doing additional work to collect partition statistics not done during online statistics collection.
Using incremental statistics for partitioned tables will also reduce the total amount of work and database time required to gather statistics, but may not significantly change the elapsed time to collect statistics.
If you need table statistics but can manage without partition statistics until the next maintenance window, then online statistics collection is very effective. However, I think the general case will be to require partition statistics, so you will probably need to explicitly collect statistics instead.  If you want histograms, then you will also need to explicitly collect statistics.

Structuring Content : Think Pyramid!

[Image: https://oracle-base.com/blog/wp-content/uploads/2020/01/pyramid-2611048_...]

This post is about making sure you get your message across by structuring your content correctly, and giving people convenient jump-off points when they’ve hit the level that is appropriate for them.

In my post about plain language, I mentioned a discussion on structuring content, and linked to a previous post called The Art of Miscommunication. I thought it was worth digging into content structure a little more.

We can think of the content structure as a pyramid. Starting at the top we keep things short and simple, then each layer down gets progressively more detailed. A person consuming the content can get to a level appropriate for them and stop, safe in the knowledge they have not missed something vital.

Level 1

What do I need to know about this content?

  • What is it about?
  • Is it really meant for me?
  • Are there any actions assigned to me?
  • Is it even worth my time reading this, or have I been included for the hell of it?

If we think about emails, blog posts and videos, it’s important we understand what the content is about early. This allows us to decide if this is the right content for us. In my opinion this is about the subject of the email, or title of blogs and videos, along with a short first paragraph.

Using the example of an email, it might be important that some of the management chain understand a situation is happening, but they may not understand the detail of the issue, or have time to dig into it further.

Here is a silly example of an email subject and first paragraph, which represents how I think the top level of the pyramid should work.

“Payroll run failed. Will be fixed by this afternoon!

This morning’s payroll run failed. Jayne is on the case, diagnosed the problem and is confident it will be fixed by this afternoon. The P1 incident number is INC123456.”

It’s short! It tells me what I need to know. It gives me some confidence I don’t need to worry about things unless I hear different. At this point I might feel safe to jump out of the email. I know it’s a Priority 1 (P1) incident, which means it will have a wash-up meeting, so I don’t need to respond asking someone to document and communicate the root cause. I feel most higher-level managers should be happy with this, and be able to duck out now.

Level 2

This is where we add a little more detail. We are still going to keep things short and simple. We will already have lost some of the readers, so the people left behind are here because they want something with a bit more depth. Maybe something like this.

“At 06:18 we got a notification to say the payroll process had died. It got escalated to Jayne, who checked the application logs. It looked like the payroll run had been hanging for a while and then died.

She asked the DBAs to check the database while she checked the OS layer on the app server. The DBAs said the database went really quiet at that time, like no new requests were coming through from the app layer, but didn’t think it was the database that was causing the problem.

Jayne noticed a Jenkins agent on the app server was grabbing 100% of the CPU, which is what killed the payroll run.

The Jenkins agent was restarted. The payroll run was restarted. Everyone is monitoring it, and they’re confident it will complete by 13:00.”

There is no evidence here, but it clearly describes what happened, and what is being done about it. If I didn’t feel confident about closing the email after the first level, I should now.

Level 3 and Beyond

In the case of an email, I don’t think anything beyond this point makes sense. Large emails and posts look daunting, and I get the impression people just “file them” to be looked at later. Maybe that’s just me.

SQLcmd on my Mac: Seg fault

Trying to connect to SQL Server from my Mac, so I installed sqlcmd:

brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release
brew update
brew install mssql-tools

then get error

$ sqlcmd -S kylelfsqls.xxxxxx.us-west-2.amazonaws.com,1433 -U kylelf  
Sqlcmd: Error: Microsoft ODBC Driver 17 for SQL Server : SSL Provider: [OpenSSL library could not be loaded, make sure OpenSSL 1.0 or 1.1 is installed].
Sqlcmd: Error: Microsoft ODBC Driver 17 for SQL Server : Client unable to establish connection.
Segmentation fault: 11

Didn’t find much info on Google, so blogging here.
I ran

ln -s /usr/local/opt/openssl@1.1 /usr/local/opt/openssl

and it worked

WITH Subquery

Here’s another anomaly that appears when you mix and match Oracle features. In this case it’s “With” subqueries (common table expressions / CTEs) and Active Dataguard (ADG) Standby databases. The problem appeared on the Oracle-l listserver and luckily for the OP another member of the list had seen it before and could point to a relevant MOS document id which explained the issue and supplied a workaround.

The OP had their standby database opened read-only for reporting and found the following oddity in the extended SQL trace file for one of their reports:


WAIT #140196872952648: nam='db file scattered read' ela= 1588 file#=4097 block#=579715946 blocks=128 obj#=-39778567 tim=17263910670242
WAIT #140196872952648: nam='db file scattered read' ela= 1495 file#=4097 block#=579715947 blocks=128 obj#=-39778567 tim=17263910672065
WAIT #140196872952648: nam='db file scattered read' ela= 1671 file#=4097 block#=579715948 blocks=128 obj#=-39778567 tim=17263910674042
WAIT #140196872952648: nam='db file scattered read' ela= 1094 file#=4097 block#=579715949 blocks=128 obj#=-39778567 tim=17263910675443

Before pointing out the oddity (if you haven’t spotted it already) I’ll just explain a few of the numbers that are a little unusual.

  • File# = 4097: the user has parameter db_files = 4096, so this is the first Temp file.
  • Block# = 579,715,946: the database is 120TB, and the temporary tablespace is a “bigfile” tablespace so it’s okay for the file to hold more than 579M blocks.
  • Obj# < 0: Negative object numbers are a characteristic of materialized CTEs: if you look at the execution plan a materialized CTE will be reported as a table with a name like  SYS_TEMP_FDA106F9_E259E68.  If you take the first hexadecimal number and treat it as a 32-bit signed integer you get the value that would be reported as the obj# in the trace file.  (Converting to decimal and subtracting power(2,32) is one way of doing the arithmetic; a worked example follows this list.)
  • tim= nnnnnnnn:  this is the timestamp (usually in microseconds), and we can see intervals of roughly 1,400 to 2,000 microseconds between these lines.
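Here is that arithmetic as a quick query, using the hexadecimal portion of the temp-table name quoted above; it returns -39778567, which matches the obj# in the wait lines at the top of this post:

select to_number('FDA106F9','XXXXXXXX') - power(2,32) obj# from dual;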

So here’s the oddity: in this set of 4 consecutive waits we’re waiting for multiblock reads of 128 blocks – but each read starts one block after the previous read. It’s as if Oracle is reading 128 blocks and forgetting everything after the first one. And the timestamps are significant because they tell us that this isn’t a case of Oracle spending so much time between reads that the other blocks fall off the end of  the buffer cache before the query reaches them.

I think I’ve seen a pattern like this once before but it would have been quite a long time ago and I can’t find any notes I might have made about it (and it turns out that my previous experience was not relevant to this case). Fortunately another member of Oracle-l had also seen the pattern and supplied the solution through a reference to a MOS document that led to: Doc ID 2251339.1 With Subquery Factorization Temp Table Does Not Cache in Standby in 12.1.0.2.

It’s not a bug – Oracle is supposed to do this if you manage to materialize a CTE in a Read-only Standby database. I don’t understand exactly why there’s a problem but, thanks to some feature of how consistent reads operate and how block SCNs are generated when you populate the blocks of the global temporary table (GTT) that is your materialized CTE, it’s possible for Oracle to produce the wrong results if it re-visits blocks that have been read into the cache from the GTT. So when you do a multiblock read during a tablescan of the GTT Oracle can use the first block it has read (presumably because it’s immediately pinned), but can’t use the remaining 127 – and so you get the odd pattern of consecutive blocks appearing at the start of consecutive multiblock reads.

This raises a couple of interesting (and nasty) questions.

  • First – does every 128 block read get read to the middle of the buffer cache, pushing another 128 blocks out of the buffer cache or does Oracle automatically read the blocks to the “cold” end of the LRU, minimising the impact on the rest of the cache; we hope it’s the latter.
  • Second – If I use a small fetch size while running my query might I find that I have to re-read the same block (with its 127 neighbours) many times, because Oracle releases any pinned blocks at the end of each fetch and has to re-acquire the blocks on the next fetch?

If anyone wants to test the second question by running a query from SQL*Plus with extended trace enabled the following simple query should answer the question:

alter session set events '10046 trace name context forever, level 8';
set arraysize 2

with v1 as (select /*+ materialize */ * from all_objects)
select object_name from v1;

Workarounds

There is a workaround to the issue – you can add the hint /*+ inline */ to the query to ensure that the CTE is not materialized. There is a bit of a catch to this, though (on top of the fact that you might then need to have two slightly different versions of the code if you want to run the query on production and standby) – if Oracle places the subquery text inline the optimizer may manage to merge it into the rest of the query and come up with a bad execution plan. Again you can probably work around this threat by extending the hint to read: /*+ inline no_merge */. Even then the optimizer could decide it has better statistics about the “real” table columns that it might have lost when it materialized the subquery, so it could still produce a different execution plan from the materialized plan.

As an alternative (and somewhat more brutal) workaround you could set the hidden parameter “_with_subquery” to inline either at the session or system level, or in the startup parameter file.
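Putting both workarounds in one place, here is a sketch based on the earlier test query; the hidden parameter change is the brutal option and should only be used after checking with Oracle Support:

-- 1. hint the subquery so that it is not materialized (and not merged)
with v1 as (select /*+ inline no_merge */ * from all_objects)
select object_name from v1;

-- 2. or change the hidden parameter at session (or system) level
alter session set "_with_subquery" = inline;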

 

Increasing headcount is probably not the answer!

[Image: https://oracle-base.com/blog/wp-content/uploads/2020/01/crowd-2815536_64...]

I’m incredibly irritated by tech people using headcount as a reason for their problems. From my experience, throwing bodies at problems is rarely the correct answer.

Increasing headcount only makes sense if:

  • You understand the problem.
  • You’ve defined the corrective actions.
  • You have processes in place to make new people productive quickly.

If you don’t understand the problem, and don’t already have a plan for solving it, hiring a load of people isn’t going to help you. It can actually make things worse. At best they will sit around and do nothing. At worst, they will start working and come up with a bunch of random “solutions” to your problems, which can leave you in a worse position than you started. Supporting a bunch of random crap is no fun at all.

My first job was a great example of doing things the right way.

  • The company signed a new customer. The software was used to track drug trials. Each trial had a unique identifier. The customer wanted to be able to refer to trials using the unique identifier, or a free text alias. This meant adding a trial alias to loads of screens in the application. There was also a need to upgrade the whole application from Forms 3.0 to Forms 4.0.
  • The analysis was done. Two procedures were defined and documented. One procedure gave directions on performing the upgrade. One procedure gave directions on adding the trial alias to the forms that needed it.
  • In addition to the existing workforce, the company hired four new people. Two were computer science graduates. Two, including me, were not. None of us had previous Oracle Database or Oracle Forms experience. After some basic training, we were put to work upgrading the forms and adding in the trial alias code.
  • It worked fine, because despite us being not much more than trained monkeys, the prerequisites had been put in place to allow someone with a PhD in cabbage sex to be a productive member of the team. There were no major blockers or constraints to deal with.

I’ve also seen it done the wrong way a bunch of times, but I’m not going to go there as it’s too depressing, and the people and companies involved will recognise themselves…

There are situations where bodies can help, but randomly throwing people at something is not a great solution if you’ve not put in the necessary effort up front to make it work. You should also be asking how many of the tasks should really be automated, so humans can be allocated to something more meaningful. In these cases, even if extra bodies could work, your focus should be on solving the root cause, not papering over the cracks.

When I discuss headcount, or throwing bodies at a problem, I could be talking about hiring more permanent staff, temporary staff or outsourcing projects. There is a lot to be said for the old saying, “You can’t outsource a problem!”, but it could easily be, “You can’t headcount a problem!”

Migrating Oracle Exadata Workloads to Azure

I know, I know-  there’s a number of you out there thinking-

I’m being brought in on more and more of these projects, due to a recent change for some Exadata backup components many companies kept onsite (parts that wear out more often, like PDUs and cell disks), which are no longer an option, and because moving to Azure is a viable option for these workloads if you know what to identify and address before the move to the cloud.

Engineered, Not Iron

An Exadata is an engineered system: database nodes, secondary cell nodes (also referred to as storage nodes/cell disks), InfiniBand for fast network connectivity between the nodes, specialized cache, along with software features such as Real Application Clusters (RAC), hybrid columnar compression (HCC), storage indexes (indexes in memory), and offloading technology that has logic built into it to move object scans and other intensive workloads from the primary database nodes to the cell nodes.  There are considerably more features, but understanding that Exadata is an ENGINEERED system, not just a hardware solution, is important, and it's both a blessing and a curse for those databases supported by one.  The database engineer must understand Exadata architecture and software along with database administration.  There is an added tier of performance knowledge, monitoring and patching that is involved, including knowledge of the Cell CLI, the command line interface for the cell nodes.  I could go on for hours on more details, but let's get down to what is required when I am working on a project to migrate an Exadata to Azure.

Focus on the Right Thing

I discovered that many times, it was natural for many to focus on the engineered system resources and attempt to duplicate this in Azure.  Although there are numerous features that are similar to Exadata in Azure Data Warehouse, it’s still not the correct path to focus on the engineered system.

Focus on the Database Workload.

Oracle offers us this in the form of an Automatic Workload Repository (AWR) report.  This data is always being collected, so no extra workload will be placed on the production system.  Simply generate the report on the repository and review the information provided on CPU, memory and IO, along with any red flags that should be addressed before migrating to the cloud.
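Generating the report can be as simple as running the standard script from SQL*Plus, or pulling it straight out of the repository; the dbid, instance number and snapshot ids below are placeholders:

@?/rdbms/admin/awrrpt.sql

select output
from   table(dbms_workload_repository.awr_report_text(&dbid, &inst_num, &begin_snap, &end_snap));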

It allows us to break down workloads by database.  For most Exadata investments, there were one or two databases that were the primary reason for the Exadata and the main consumers of resources.  Isolating by database allows for better sizing, pricing and licensing cost.

With the breakdown by database, we’re also evolving a spreadsheet with the information we need to calculate the Azure resources that would be needed in a lift and shift.  This makes it simpler for the customer to consume if they’re interested, too.

[Image: https://dbakevlar.com/wp-content/uploads/2020/01/awr_sizing_xls.png]

It’s an Exadata Thing

The next step is to identify which Exadata-specific features are in play that won’t come over in a lift/shift:

  1. Storage indexes that will need to be physically created that currently only exist in memory for smart scans.
  2. What offloading to cell nodes happens currently and will require faster IO and added resources to support on the IaaS cloud solution?
  3. What HCC is in place and the added IO pressure on the system once it no longer exists? What compression options are available in Oracle on IaaS for the customer to take advantage of?
  4. What objects, etc. reside in the different caches?  Flash Cache for recently used objects must be listed and tested for latency.
  5. Was the SGA shrunk to force offloading and does this mean the new databases require more memory than what is currently allocated?
  6. If using RAC (expected 98% of the time on Exadata), is there a plan to stay on RAC? If so, is it justified? If not, make sure to calculate the savings from the global cache no longer being needed, but total up the needs of each node’s current workload.
  7. Flash log–  without this temporary store for low-latency redo log writes, latency impacts must be identified and optimized through other means.

All of this data is captured in some form, within the AWR reports and via queries from the Cell CLI.  I commonly request a one week snapshot report of AWR from each database on the Exadata.  If there are monthly, quarterly or yearly reports, this can be addressed farther into the project, but gaining an idea of the workloads for each of the databases is my first step.

  1. Enter in all pertinent data into the spreadsheet, including OS, db version, RAC/Non-RAC, memory, CPU/Core, etc.
  2. Trap the percentage of CPU, memory and IO being used by the workload vs. what is available.
  3. Review the AWR and perform a database health check, including any recommendations for issues that should be addressed before the database is moved to the cloud.
  4. Identify any areas that need deeper investigation- odd block sizes, non-default parameter settings, running maintenance/admin jobs that need to be addressed, etc.

I do provide a full healthcheck report of the databases.  It’s not something they often receive, so to have this review is incredibly beneficial to their long-term satisfaction with their cloud migration.  Who wants to migrate over a problem with their database?  Resolving it before the cloud migration saves everyone time, money and headaches.

Second phase is to run Exadata specific collections:

  1. Query the cell cli for offloading percentages
  2. Pull the flashcache information from the AWR
  3. Query the amount of flash log that is in use and how often
  4. Query the objects that are currently using HCC, along with compression ratios, uncompressed sizes, and IO percentages for the workload (a query sketch follows this list)
  5. Identify storage indexes used for smart scans.  This includes the columns and the object name they reside on.
  6. Inspect the PGA: are there lots of multi-pass executions, or is the PGA sized correctly?  Will this be sufficient once off Exadata, where some hash joins may currently be run on cell nodes?
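For item 4, a starting point (an assumption on my part, not the author's collection script) is the data dictionary, which records the HCC compression type per table:

select owner, table_name, compression, compress_for
from   dba_tables
where  compress_for in ('QUERY LOW','QUERY HIGH','ARCHIVE LOW','ARCHIVE HIGH');

The same check can be repeated against dba_tab_partitions for partition-level compression settings.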

With this data collected, the next step is unique to the customer database, but most often requires:

  • Physical indexes created in the production database, but left invisible and only made visible after the migration to Azure (a sketch follows this list).
  • Implement new compression options that are available in the version of Oracle in Azure.
  • Resize the SGA and PGA to address any shrinking that was done on the Exadata.
  • Pin objects in memory and other cache features for better performance.
  • Optimize any SQL that relied on the Exadata offloading to exist previously, where optimization should have been the first route taken.
  • Place redo logs on ultra disk if required
  • If there is a specific workload that was using high degree of parallelism,(DOP) to run, it may be time to discuss an Azure DW solution to process the data using Azure Data Factory to push the ELT.  Using the right tool for the job is always preferable and there are times where the massive parallel processing and ability to offload to a solution that can handle the workload long term without impacting the existing data store is preferential.
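For the first bullet, the invisible-index approach might be sketched like this; the object names are hypothetical:

-- created ahead of the migration; the optimizer ignores the index while it is invisible
create index sales_scan_ix on sales(order_date, status) invisible;

-- made visible once the database is running in Azure
alter index sales_scan_ix visible;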

Testing Without Features on Exadata

There are numerous ways you can “shut off” Exadata features.  If the space exists on the Exadata or a test database(s) exist, the following parameter changes can be used to do this or it can be done as an optimization process as part of the lift and shift.

Stop offloading to cell nodes, disable bloom filters and disable storage indexes:

alter system set cell_offload_processing=false;
alter system set "_bloom_filter_enabled"=false;
alter system set "_kcfis_storageidx_disabled"=true;

You can then verify offloading to cell nodes is no longer occurring with the following query:

select a.name, b.value/1024/1024 MB from v$sysstat a, v$sysstat b
where a.statistic#=b.statistic# and (a.name in ('physical read total bytes', 
'physical write total bytes', 'cell IO uncompressed bytes')
or a.name like 'cell phy%');

The results should no longer show cell activity once this is put into place and you can check offloading and smart scans.  SGA can be increased and other features will still be in play, but this can then be a step by step process as we migrate to the cloud.  It also gives us the ability to shut off features and test in a controlled situation.

I won’t say migrating off of Exadata is easy, but I will say it’s easily doable.  The main goal is to ensure that customers are satisfied in the long run and this is where the steps you see in this post are important.  It’s not as much a lift and shift as it is a lift and build to the cloud.  This would be a requirement in going to any non-Exadata solution and this one will save the customer money and grant them the scalability they didn’t have with an on-prem engineered solution.

 

 




UTL_FILE_DIR and 18c revisited

A couple of years back (wow…time flies!) I made a video and a post about the de-support of UTL_FILE_DIR in 18c. This was a good thing, because the number of people opting for “utl_file_dir=*” in their init.ora file, and hence opening themselves up to all sorts of risks, seemed to be large! (The post elaborates on this more, with an example of erasing your database with a short UTL_FILE script.)

But a friend came to me with a valid rebuttal, namely that while most people tended to use UTL_FILE_DIR to read/write files to a small list of directories (most typically one), other customers had taken advantage of UTL_FILE to utilise complex directory structures with hundreds or even thousands of sub-folders. Converting that infrastructure (which is trivially implemented with “utl_file_dir=*” even with all its risks) is a tougher proposition when using directory objects in the database, because a single directory object points to a single directory.

To assist with moving away from utl_file_dir in such a circumstance, I’ve whipped up a small utility which aims to be a drop-in replacement for your UTL_FILE.FOPEN calls. The intent of the utility is that moving to the new directory objects method should simply be the case of adding a nested call. Thus if you had an existing file open call of:


f := utl_file.fopen('/u01/app/oracle','myfile.dat',...);

you simply nest the utility function as follows:


f := utl_file.fopen( utl_file_dir('/u01/app/oracle'),'myfile.dat',...);

and you should be good to go.

The way this is achieved is that the utility does its best to come up with a sensible but unique name for a directory object to map to the standard OS path that has been passed. The comments in the code explain the workings, and of course you’re free to modify it to suit your needs.



create or replace
function utl_file_dir(p_file_or_dir varchar2, 
                      p_create_dir boolean default true, 
                      p_add_hash boolean default true) return varchar2 is
  pragma autonomous_transaction;

  l_default_dir varchar2(128) := 'TEMP';
  l_delim       varchar2(1)   := '/';   -- change for Windows

  l_last_delim  int           := instr(p_file_or_dir,l_delim,-1);
  l_file        varchar2(255); 
  l_dir_path    varchar2(255);
  l_dir_object  varchar2(255);
  l_exists      int;
  l_clash       int;
begin
  if l_last_delim = 0 then
     --
     -- If no presence of a directory delimiter, then we assume the entire
     -- string is a file if it contains a '.' (ie, a file extension)
     -- and return a default directory object name (l_default_dir)
     -- otherwise we assume the string is a directory in its own right
     --
     if p_file_or_dir like '%.%' then
       l_dir_object := l_default_dir;
     else
       l_dir_path   := p_file_or_dir;
     end if;
  else
     --
     -- We have a delimiter. The directory is the contents up to
     -- the last delimiter, unless there is no file extension past
     -- that last delimiter. In that latter case, we assume the entire
     -- string is a directory in its own right
     --
     l_file      := substr(p_file_or_dir,l_last_delim+1);
     if l_file like '%.%' then
       l_dir_path     := substr(p_file_or_dir,1,l_last_delim-1);
     else
       l_dir_path     := p_file_or_dir;
     end if;
  end if;

  --
  -- Now we make a clean directory object name from the path. We could
  -- of course use any random string, but this is designed to make things
  -- a little more intuitive. 
  -- 
  -- For example '/u01/app/oracle' will become U01_APP_ORACLE
  --
  -- You have a choice here in terms of the risk element of collisions depending 
  -- on how loose your folder structure is.  For example, the two paths:
  --
  --   /u01/app/oracle/
  --   /u01/app/"oracle-"/
  --
  -- by default will map to the same clean name of U01_APP_ORACLE and we will   
  -- report an error in this instance.
  -- 
  -- Alternatively (and the default) is that we take our directory path and 
  -- grab the last few bytes from a MD5 hash on it, to greatly increase the likelihood
  -- of a non-clashing directory name.  In the above example, the clean directory names become
  --
  --   U01_APP_ORACLE_25B9C47A
  --   U01_APP_ORACLE_7D51D324
  -- 
  -- So what you lose in intuitive readability you gain in reduced chance of collision.
  -- This is controlled with "p_add_hash"
  --
  if l_dir_object is null then
     l_dir_object := regexp_replace(replace(replace(l_dir_path,l_delim,'_'),'-','_'),'[^[:alnum:] _]');
     l_dir_object := regexp_replace(trim('_' from upper(regexp_replace(l_dir_object,'  *','_'))),'__*','_');
     if p_add_hash then
       select substr(l_dir_object,1,119)||'_'||substr(standard_hash(l_dir_path,'MD5'),1,8)
       into   l_dir_object
       from   dual;
     else
       l_dir_object := substr(l_dir_object,1,128);
     end if;
  end if;

  -- Now we go ahead and create that directory on the database.
  -- The user running this function must have CREATE ANY DIRECTORY privilege granted
  -- explicitly, which means of course, you should protect this routine and perhaps add
  -- some sanity checking to make sure that no-one creates a directory to reference (say) 
  -- the objects in V$DATAFILE !
  
  if p_create_dir then
    select count(*),
           count(case when directory_path != l_dir_path then 1 end) 
    into   l_exists,
           l_clash
    from   all_directories
    where  directory_name = l_dir_object;

    if l_exists = 0 then
      execute immediate 'create directory "'||l_dir_object||'" as q''{'||l_dir_path||'}''';
    else
      --
      -- If (hash or not) we enter the nasty situation where the same clean name would
      -- map to 2 path names, we give up and go home.
      --
      if l_clash > 0 then
        raise_application_error(-20000,'Found matching directory object '||l_dir_object||' with different path from >'||l_dir_path||'<');
      end if;
    end if;
  end if;
  
  commit;
  return l_dir_object;
end;
/
sho err

Here are some examples of typical usage. By default we would actually create the directory object, but you can override this for simple testing, like I've done below.



SQL> -- Examples
SQL>
SQL> variable dirname varchar2(128)
SQL>
SQL> -- standard file
SQL> exec :dirname := utl_file_dir('/u01/app/oracle/test.dat',p_create_dir=>false)

PL/SQL procedure successfully completed.

SQL> print dirname

DIRNAME
-----------------------------------------------------------------------------------------------
U01_APP_ORACLE_9E2472BB

SQL>
SQL> -- quoted/spaces etc
SQL> exec :dirname :=  utl_file_dir('/u01/"asd app"/oracle/test.dat',p_create_dir=>false)

PL/SQL procedure successfully completed.

SQL> print dirname

DIRNAME
-----------------------------------------------------------------------------------------------
U01_ASD_APP_ORACLE_FFDC5BEA

SQL>
SQL> -- trailing delimiter.
SQL> exec :dirname :=  utl_file_dir('/u01/app/oracle/',p_create_dir=>false)

PL/SQL procedure successfully completed.

SQL> print dirname

DIRNAME
-----------------------------------------------------------------------------------------------
U01_APP_ORACLE_25B9C47A

SQL> exec :dirname :=  utl_file_dir('/u01/app/oracle--/',p_create_dir=>false)

PL/SQL procedure successfully completed.

SQL> print dirname

DIRNAME
-----------------------------------------------------------------------------------------------
U01_APP_ORACLE_7D51D324

SQL>
SQL> -- no file
SQL> exec :dirname :=  utl_file_dir('/u01/app/oracle',p_create_dir=>false)

PL/SQL procedure successfully completed.

SQL> print dirname

DIRNAME
-----------------------------------------------------------------------------------------------
U01_APP_ORACLE_9E2472BB

SQL>
SQL> -- no delimiter
SQL> exec :dirname :=  utl_file_dir('mydir',p_create_dir=>false)

PL/SQL procedure successfully completed.

SQL> print dirname

DIRNAME
-----------------------------------------------------------------------------------------------
MYDIR_33CD7707

SQL>
SQL> -- no delimiter but probably a file
SQL> exec :dirname :=  utl_file_dir('mydir.txt',p_create_dir=>false)

PL/SQL procedure successfully completed.

SQL> print dirname

DIRNAME
-----------------------------------------------------------------------------------------------
TEMP

TL;DR: This utility makes moving away from UTL_FILE_DIR less difficult.

Code at https://github.com/connormcd/misc-scripts