
Oakies Blog Aggregator

APEX Woman In Technology Award for 2014

Based on the advice of a colleague, and also because of my desire to become more involved in the wider technology community in Colorado, I joined the Colorado Technology Association (CTA) last year.  I had been involved with the Rocky Mountain Oracle Users Group (RMOUG) for over 20 years, much of that time on the board of directors, including two 3-year stints as president, but I wanted to participate in the wider community beyond my beloved Oracle technology community.

I’ve only attended three CTA events to date, but this past summer I learned that CTA is a great advocate of Women In Technology (WIT) and has done a great deal of work to encourage women to enter the technology industry and to stay in the industry.

When I began my own career in the IT industry in the early 1980s, I recall many women as colleagues, working as programmers and system administrators.  Imperceptibly, over the intervening years, the proportion of women in IT seems to have declined.  When my wife Kellyn Pot’vin-Gorman and I began dating in 2011, she made me aware that the proportion of women had dropped precipitously, to less than 25% overall.

It is a confusing trend.  Information technology, especially the technical jobs like programming, application development, testing, and systems administration, should be dominated by the girls with whom I grew up, who were so much better at math, science, and physics than us boys.  Society places pressures on girls and women to consider family ahead of career, but that makes IT even more of a natural fit, particularly today with the ability to work remotely from anywhere.

What the heck is going on?  Why is this happening?  Why has a profession in which logic and cerebral prowess is so important proved to be so antediluvian?

Kellyn has been concerned as well, and she is alarmed that she is becoming increasingly isolated as the only woman in the room in many places in the IT industry.  She has been acutely aware of the exodus of women from the IT industry, has heard the explanations of the women leaving, and has felt the pressures, large and small, herself.  I know she loves getting into the geeky stuff up to her elbows, as do I, so it is not “natural inclination” that leads her or our women colleagues to leave this work, any more than it would lead me to.  If you read her blog DBAKevlar, you’ll understand and witness her fascination with technology, and you’ll be amazed at her facility in sharing that wonder.  Likewise, her Twitter handle @DBAKevlar is a powerful channel for her interests and fun perspective, and has been cited by Forbes magazine as one of the “top 50 Twitter handles that everyone interested in Oracle should follow”.

My wife is a geek.  And I admire her.

Over the past few years, Kellyn decided to take action.  It was not clear what the root cause(s) might be, so it was not clear what form that action should take.  Everyone has their own personal viewpoint on the matter, ranging from outright misogyny to hints of discrimination, with flavors of cultural preferences and misunderstandings on occasion, and outdated stereotypes stirred in for good measure.

It is an explosive brew.

Clearly, anyone who wants to get near the third rail of sexual politics in the high-flying technology field has to tread carefully.

But you just can’t tread carefully when the root cause is the third rail itself.

As a technologist, Kellyn had learned that it is easier to socialize a solution if one has marshaled the necessary facts.  She has researched the issues facing women in the IT industry, relying on studies and basic research from academia, government, and professional societies.  She has the facts close to hand, with references.  Coupled with her own first-hand experiences and those of her female colleagues, she started sharing her findings with her colleagues, discussing women’s issues at technology conferences, and asking the basic question, “what do you think?”

She understands that men are not necessarily the full cause of the problem.  Often, it is women who seek to oppress other women, in a bid to be the “queen bee”.  Often, men are undermined or thwarted in any attempt to support the women in their lives.  Consider the ribbing that a man who takes on more responsibility for childcare receives from other women and men, the raised eyebrows, the leering jokes.  The suspicion directed toward a man in the grocery store during work hours.  The challenges of male nurses, who face the implicit assumption that they are homosexual, or questions about why they are not doctors.

These are the sorts of issues that women in technology face.

Earlier this year, Kellyn was nominated for the Colorado Technology Association’s APEX Woman In Technology award for her many tireless (and sometimes personally costly) efforts encouraging and mentoring women in the technology industry.  About 2 months ago, we learned that she was a finalist for the award.

And this past Wednesday evening, at the Hyatt Regency Denver at the Colorado Convention Center, we learned that her non-stop work had won her the award for 2014.

Concurrent statistics gathering in 12.1.0.2

Prompted by an actual task at hand I spent some time investigating an 11.2.0.2 feature: concurrent statistics gathering. It has been on my to-do list for quite some time, but so far I didn’t have a use case, and use cases make it so much easier. The question was: how can I gather statistics on a really large, partitioned table? Previously, you could resort to the degree argument in dbms_stats.gather_table_stats to ensure that statistics were gathered in parallel. This is all good, but sometimes you need more oomph. Some DBAs wrote scripts to execute individual statistics gathering jobs against partitions in parallel, using the tabname and partname arguments in dbms_stats.gather_table_stats(). But that requires manual effort – and the not-quite-so-new concurrent option is so much nicer. Let me take you along for the ride… Actually I have to tell the story starting with the happy ending, as I hit a few snags along the way. This is 12.1.0.2.1 on Oracle Linux 6.5.

Enabling concurrent statistics gathering

A flag needs to be set to enable the new concurrent mechanism for gathering statistics. The initial blog post on the subject set the flag globally:

https://blogs.oracle.com/optimizer/entry/gathering_optimizer_statistics_is_one

Scroll down to “configuration and settings” for the detail. I would personally prefer to set this at table level but, despite the Oracle documentation suggesting that this is possible, it is not. You can check the header of DBMS_STATS to see the different values that can be passed as PNAME to set_table_prefs().
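For reference, a minimal sketch of setting the flag at the database level (the value ‘TRUE’ matches the preference output shown below; check the DBMS_STATS header for the full set of accepted values):

begin
   -- enable concurrent statistics gathering for the whole database
   dbms_stats.set_global_prefs('CONCURRENT', 'TRUE');
end;
/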

After enabling the concurrent flag on the database level, you can query it:

SQL> select dbms_stats.get_prefs('CONCURRENT')
  2  from dual;

DBMS_STATS.GET_PREFS('CONCURRENT')
------------------------------------------------------------------------------------
TRUE

SQL> select dbms_stats.get_prefs('CONCURRENT',user,'BIGTAB_PART') flag from dual;

FLAG
-------------------------------------------------------------------------------------
TRUE

We should be ready for using the concurrent method now. Here is some background information about the table in question:

SQL>  select partition_name, num_rows, sample_size, last_analyzed 
  2   from user_tab_partitions where table_name = 'BIGTAB_PART'
  3  ;

PARTITION_NAME                   NUM_ROWS SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- ----------- ---------
SYS_P1254
SYS_P1253
SYS_P1252
SYS_P1251
SYS_P1250
SYS_P1249
SYS_P1248
SYS_P1247
SYS_P1246
SYS_P1245
P_MAN

11 rows selected.

SQL> select partition_name, bytes/power(1024,2) m from user_segments where segment_name = 'BIGTAB_PART';

PARTITION_NAME                          M
------------------------------ ----------
SYS_P1245                            8064
SYS_P1246                            8064
SYS_P1247                            8064
SYS_P1248                            8064
P_MAN                                8064
SYS_P1249                            8064
SYS_P1250                            8064
SYS_P1251                            8064
SYS_P1252                            8064
SYS_P1253                            8064
SYS_P1254                               8

11 rows selected.

So each partition is about 8 GB in size. A little helper script prints all the table preferences:

getting table preferences for table BIGTAB_PART
value for CASCADE is DBMS_STATS.AUTO_CASCADE
value for DEGREE is NULL
value for ESTIMATE_PERCENT is DBMS_STATS.AUTO_SAMPLE_SIZE
value for METHOD_OPT is FOR ALL COLUMNS SIZE AUTO
value for NO_INVALIDATE is DBMS_STATS.AUTO_INVALIDATE
value for GRANULARITY is AUTO
value for PUBLISH is TRUE
value for INCREMENTAL is FALSE
value for STALE_PERCENT is 10
value for CONCURRENT is TRUE
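A minimal sketch of such a helper (hypothetical; the actual script is not shown) is essentially a loop over the preference names, calling dbms_stats.get_prefs() for each:

set serveroutput on

declare
  type prefs_t is table of varchar2(30);
  l_prefs prefs_t := prefs_t(
    'CASCADE', 'DEGREE', 'ESTIMATE_PERCENT', 'METHOD_OPT', 'NO_INVALIDATE',
    'GRANULARITY', 'PUBLISH', 'INCREMENTAL', 'STALE_PERCENT', 'CONCURRENT');
begin
  dbms_output.put_line('getting table preferences for table BIGTAB_PART');
  for i in 1 .. l_prefs.count loop
    -- get_prefs() falls back from table to global preferences automatically
    dbms_output.put_line('value for ' || l_prefs(i) || ' is ' ||
      dbms_stats.get_prefs(l_prefs(i), user, 'BIGTAB_PART'));
  end loop;
end;
/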

Now you can simply execute the call to dbms_stats.gather_table_stats(user, ‘BIGTAB_PART’, degree=>4) and, thanks to all the defaults, the gathering of statistics will be performed by multiple jobs. In order for this to work you need job_queue_processes set to > 4 (the default is 1000 in 12.1.0.2) and you need a resource manager plan to be active. Refer to the Oracle blog post referenced earlier for more information about statement queueing and the execution of these jobs. The post also recommends setting parallel_adaptive_multi_user to false (it defaults to true in 11.2 and 12.1). Just be careful changing PX parameters – these can have undesired side effects.
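Putting it all together, the sequence looks roughly like this (a sketch; the resource manager plan name is just an example):

-- prerequisites: job queue processes and an active resource manager plan
alter system set job_queue_processes = 1000;
alter system set resource_manager_plan = 'DEFAULT_PLAN';

-- the gather call itself; the concurrency comes from the CONCURRENT preference
exec dbms_stats.gather_table_stats(user, 'BIGTAB_PART', degree => 4)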

Using the diagnostic queries taken from the Oracle blog post you can see what’s going on. In my case, 6 jobs were created:

select job_name, state, comments from dba_scheduler_jobs where job_class like 'CONC%'

JOB_NAME             STATE           COMMENTS
-------------------- --------------- ----------------------------------------------------------------------------------------------------
ST$T3606_6_B2        SCHEDULED       "MARTIN"."BIGTAB_PART"."SYS_P1254";"MARTIN"."BIGTAB_PART"
ST$T3606_5_B2        RUNNING         "MARTIN"."BIGTAB_PART"."SYS_P1252";"MARTIN"."BIGTAB_PART"."SYS_P1253"
ST$T3606_4_B2        SCHEDULED       "MARTIN"."BIGTAB_PART"."SYS_P1250";"MARTIN"."BIGTAB_PART"."SYS_P1251"
ST$T3606_3_B2        RUNNING         "MARTIN"."BIGTAB_PART"."SYS_P1248";"MARTIN"."BIGTAB_PART"."SYS_P1249"
ST$T3606_2_B2        SCHEDULED       "MARTIN"."BIGTAB_PART"."SYS_P1246";"MARTIN"."BIGTAB_PART"."SYS_P1247"
ST$T3606_1_B2        RUNNING         "MARTIN"."BIGTAB_PART"."P_MAN";"MARTIN"."BIGTAB_PART"."SYS_P1245"

6 rows selected.

By the way, if you are interested in the payload, you can use DBMS_METADATA:

SQL> get get_job
  1* SELECT DBMS_METADATA.GET_DDL('PROCOBJ', '&1') from dual;
SQL>

You pass it the job name and you get some pretty cryptic looking output :)

Interlude

Actually, you have to see this…

DBMS_METADATA.GET_DDL('PROCOBJ','ST$T3982_1')
-----------------------------------------------------------------------------------------


BEGIN
dbms_scheduler.create_job('"ST$T3982_1"',
job_type=>'PLSQL_BLOCK', job_action=>
'declare c dbms_stats.CContext := dbms_stats.CContext(); begin c.extend(30); c(1):=''ST$T3982_1;1;855;1;FALSE;FALSE;''; c(2):=q''#"MARTIN";#''; c(3):=q''#"BIGTAB_PART";#''; c(4):=q''#"SYS_P1250";#''; c(5):=''0;''; c(6):=''FALSE;''; c(7):=q''#FOR ALL
COLUMNS SIZE AUTO;#''; c(8):=''NULL;''; c(9):=''PARTITION;''; c(10):=''TRUE;''; c(11):=q''#NULL;#''; c(12):=q''#NULL;#''; c(13):=q''#NULL;#''; c(14):=''NULL;''; c(15):=''DATA;''; c(16):=''FALSE;''; c(17):=''GLOBAL AND PARTITION;''; c(18):=''TRUE;'';
c(19):=''0;''; c(20):='';''; c(21):=''FALSE;''; c(22):=''FALSE;''; c(23):=''4;''; c(24):=''ST$T3982;''; c(25):=''TRUE;''; c(26):=''FALSE;''; c(27):=''4.562199;''; c(28):=''228.109937;''; c(29):=''0;''; c(30):=''0;''; dbms_stats.gather_table_stats('''
','''',context=>c); end;'
, number_of_arguments=>0,
start_date=>NULL, repeat_interval=>
NULL
, end_date=>NULL,
job_class=>'"CONC_ST$T3982"', enabled=>FALSE, auto_drop=>TRUE,comments=>
'"MARTIN"."BIGTAB_PART"."SYS_P1250"'
);
dbms_scheduler.set_attribute('"ST$T3982_1"','raise_events',38);
dbms_scheduler.enable('"ST$T3982_1"');
COMMIT;
END;

This looks better when you parse it a bit more so that it reads:

DECLARE
  c dbms_stats.CContext := dbms_stats.CContext();
BEGIN
  c.extend(30);
  c(1) :='ST$T3982_1;1;855;1;FALSE;FALSE;';
  c(2) :=q'#"MARTIN";#';
  c(3) :=q'#"BIGTAB_PART";#';
  c(4) :=q'#"SYS_P1250";#';
  c(5) :='0;';
  c(6) :='FALSE;';
  c(7) :=q'#FOR ALL COLUMNS SIZE AUTO;#';
  c(8) :='NULL;';
  c(9) :='PARTITION;';
  c(10):='TRUE;';
  c(11):=q'#NULL;#';
  c(12):=q'#NULL;#';
  c(13):=q'#NULL;#';
  c(14):='NULL;';
  c(15):='DATA;';
  c(16):='FALSE;';
  c(17):='GLOBAL AND PARTITION;';
  c(18):='TRUE;';
  c(19):='0;';
  c(20):=';';
  c(21):='FALSE;';
  c(22):='FALSE;';
  c(23):='4;';
  c(24):='ST$T3982;';
  c(25):='TRUE;';
  c(26):='FALSE;';
  c(27):='4.562199;';
  c(28):='228.109937;';
  c(29):='0;';
  c(30):='0;';
  dbms_stats.gather_table_stats('','',context=>c);
END;

Interesting!

Concurrent statistics gathering works for incremental statistics as well, and is a good way to get the initial set of statistics that incremental gathering needs anyway (see Doc ID 1541543.1 for background on incremental statistics gathering).
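If you want to combine the two, the table preference is set the same way (a sketch):

-- enable incremental statistics for the partitioned table,
-- then let the concurrent jobs build the initial synopses
exec dbms_stats.set_table_prefs(user, 'BIGTAB_PART', 'INCREMENTAL', 'TRUE')
exec dbms_stats.gather_table_stats(user, 'BIGTAB_PART', degree => 4)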

The end result is statistics!

SQL> select partition_name, num_rows, sample_size, last_analyzed
  2   from user_tab_partitions where table_name = 'BIGTAB_PART';

PARTITION_NAME                   NUM_ROWS SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- ----------- ---------
SYS_P1254                              32          32 07-NOV-14
SYS_P1253                        25600000    25600000 07-NOV-14
SYS_P1252                        25600000    25600000 07-NOV-14
SYS_P1251                        25600000    25600000 07-NOV-14
SYS_P1250                        25600000    25600000 07-NOV-14
SYS_P1249                        25600000    25600000 07-NOV-14
SYS_P1248                        25600000    25600000 07-NOV-14
SYS_P1247                        25600000    25600000 07-NOV-14
SYS_P1246                        25600000    25600000 07-NOV-14
SYS_P1245                        25600000    25600000 07-NOV-14
P_MAN                            25599968    25599968 07-NOV-14

11 rows selected.

Stats not gathered concurrently if partitions are small

This all looks pretty straightforward, except that it wasn’t when I first tried. I set all the parameters as documented but still had no jobs running in the background. The first table I tried to gather statistics on concurrently was partitioned exactly like BIGTAB_PART, with the difference that all partitions were compressed for QUERY HIGH. This means that a lot less space is taken for each. I had an idea that segment size might be related, so I compressed BIGTAB_PART, and here is the result:

SQL> select partition_name, bytes/power(1024,2) m from user_segments 
  2   where segment_name = 'BIGTAB_PART';

PARTITION_NAME                                   M
------------------------------ -------------------
P_MAN                                       321.75
SYS_P1245                                   321.00
SYS_P1246                                   322.50
SYS_P1247                                   321.50
SYS_P1248                                   320.75
SYS_P1249                                   322.25
SYS_P1250                                   322.25
SYS_P1251                                   322.25
SYS_P1252                                   322.25
SYS_P1253                                   322.00
SYS_P1254                                     8.25

11 rows selected.

With the partitions this small, gathering statistics concurrently (without changing the table preferences at all) did not spawn multiple jobs. So if you gather statistics concurrently and don’t see multiple jobs for intra-table statistics gathering, remember that your partitions might actually be too small.

Presentations to go to at #DOAG2014

Like every year, there’s a long list of great speakers with interesting talks to attend at the DOAG (German Oracle User Group) annual conference. Sadly I cannot attend them all, so I’ve got to make a choice:

First day

Datenbank-Upgrade nach Oracle 12.1.0.2 – Aufwand, Vorgehen, Kunden by Mike Dietrich, Oracle

Die unheimliche Begegnung der dritten Art: XML DB für den DBA by Carsten Czarski, Oracle

Advanced RAC Programming Features by Martin Bach, Enkitec

Automatische Daten Optimierung, Heatmap und Compression 12c live by Ulrike Schwinn, Oracle

Second day

Understanding Oracle RAC Internals – The Cache Fusion Edition by Markus Michalewicz, Oracle

Die Recovery Area: Warum ihre Verwendung empfohlen ist – I have to go to that one because I present it myself :-)

Geodistributed Oracle GoldenGate and Oracle Active Data Guard: Global Data Services by Larry Carpenter, Oracle

Oracle Database In-Memory – a game changer for data warehousing? by Hermann Baer & Maria Colgan, Oracle

Oracle Distributed Transactions by Joel Goodman, Oracle

Third day

High Noon – Bessere Überlebenschancen beim Datenbank Security Shoot Out by Heinz-Wilhelm Fabry, Oracle

Tuning Tools für echte Männer und Sparfüchse – vom Leben ohne EM12c by Björn Rost, portrix Systems

Best Practices in Managing Oracle RAC Performance in Real Time by Mark Scardina, Oracle

Maximum Availability with Oracle Multitenant: Seeing Is Believing by Larry Carpenter, Oracle

Tagged: #DOAG2014

Cache Buffer Chains Demystified

You must have seen sessions waiting on the event “latch: cache buffers chains” from time to time. If you ever wondered what this means and how you can reduce time spent on it, read on. Here you will learn how the buffer cache works, how Oracle multi-versioning works, how buffers are allocated and deallocated, what a hash chain is and how buffers are linked to it, what the role of the cache buffers chains latch is and why sessions wait on it, how to find the objects causing the contention, and how to reduce the time spent on that event.

While exploring the reasons for the slowness in database sessions, you check the wait interface and see the following output:

 SQL> select state, event from v$session where sid = 123;

STATE   EVENT
------- ---------------------------
WAITING latch: cache buffers chains

This event is quite common, especially in applications that repeatedly scan a few blocks of data. To resolve it, you should understand what the cache buffers chains latch is and why sessions have to wait on it. To understand that, you must understand how the Oracle buffer cache works. We will explore these one by one, and close with how to reduce cache buffers chains latch waits.

This is the fifth in the series "100 Things You Probably Didn't Know About Oracle". If you haven't already, I urge you to read the other parts:

  • Part 1 (Commit does not force writing of buffers into the disk)
  • Part 2 (How Oracle Locking Works)
  • Part 3 (More on Interested Transaction Lists)
  • Part 4 (Can I Fit an 80MB Database Completely in an 80MB Buffer Cache?)

You will learn
•    How buffer cache works
•    How buffers are populated
•    About buffer states and versioning
•    How buffers are flushed
•    About the role of Cache Buffer Chain latch
•    How to reduce CBC Latches
•    About other kinds of latches

How Buffer Cache Works

The buffer cache (BC), resident inside the SGA of an Oracle instance, is designed to hold blocks that come from the database. When a user issues a statement such as the following:

update EMP
set NAME = 'ROB'
where EMPNO = 1

the Oracle server process assigned to the session performs the following actions:
1)    locates the block that contains the record with EMPNO = 1
2)    loads the block from the database file to an empty buffer in the buffer cache
3)    if an empty buffer is not immediately found, locates an empty buffer or forces the DBWn process to write some dirty buffers to make room
4)    updates the NAME column to ROB in the buffer

In step 1, we assume an index is present and hence the server process can locate the single block immediately. If the index is not present, Oracle will need to load all the blocks of the table EMP into the buffer cache and check for matching records one by one.

The above description has two very important concepts:
1)    a block, that is the smallest unit of storage in the database
2)    a buffer, that is a placeholder in the buffer cache used to hold a block.

Buffers are just placeholders, which may or may not be occupied. They can hold exactly one block at a time. Therefore for a typical database where the block size is set to 8KB, the buffers are also of size 8KB. If you use multiple block sizes, e.g. 4KB or 16KB, you would have to define multiple buffer caches corresponding to the other block sizes. In that case the buffer sizes will match the block sizes corresponding to those blocks.
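For example, if the default block size is 8KB but some tablespaces use 16KB blocks, a separate cache has to be configured for them, along these lines (a sketch; the size is illustrative):

-- configure a buffer cache for 16KB-block tablespaces
alter system set db_16k_cache_size = 256M;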

When buffers come to the cache, the server process must scan through them to get the value it wants. In the example shown above, the server process must find the record where EMPNO=1. To do so, it has to know the location of the blocks in the buffers. The process scans the buffers in a sequence. So, buffers should ideally be placed in a sequence, e.g. 10 followed by 20, then 30, etc. However this creates a problem. What happens when, after this careful placement of buffers, buffer #25 comes in? Since it falls between 20 and 30, it must be inserted in between, i.e. Oracle must move all the buffers after 20 one step to the right to make room for the new buffer #25. Moving memory areas around is not a good idea: it costs expensive CPU cycles, requires all actions on the buffers (even reading) to stop for the duration, and is prone to errors.

Therefore, instead of moving the buffers around, a better approach is to put them in something like a linked list. Fig 1 shows how that is done. Each of the buffers has two pointers: one to the buffer behind and one to the buffer ahead. In this figure, buffer 20 shows that 10 is in front and 30 is the one behind. This would be the case regardless of the actual position of the buffers. When 25 comes in, all we have to do is update the “behind pointer” of 20 and the “ahead pointer” of 30 to point to 25. Similarly, the “ahead pointer” and “behind pointer” of 25 are updated to point to 30 and 20 respectively. This simple update is much quicker, does not require activity to stop on any buffers except the ones being updated, and is less error-prone.

However, there is another problem. This is just one of the lists. Buffers are used for other purposes as well. For instance, the LRU algorithm needs a list of buffers in LRU order, the DBWn process needs a list of buffers for writing to the disk, etc. So physically placing the buffers on specific lists is not just impractical; a buffer cannot be on several lists at the same time. Oracle employs a simpler technique to overcome the obstacle. Rather than placing the actual buffers in the linked list, Oracle creates a simpler, much lighter structure called a buffer header as a pointer to an actual buffer. This buffer header is moved around, leaving the actual buffer in place. This way, the buffer header can be listed in many types of lists at the same time. These buffer headers are located in the shared pool, not the buffer cache. This is why you will find references to buffers in the shared pool.

Buffer Chains

The buffers are placed in strings. Compare that to rows of spots in a parking lot. Cars come in to an empty spot in a row. If they don’t find one, they go to the next row and so on. Similarly buffers are located on the cache as rows. However, unlike the parking spots which are physically located next to each other, the buffers are logically placed as a sequence in the form of a linked list, described in the above section. Each linked list of buffers is known as a buffer chain, as shown in Fig 2.

Notice how each of the three chains has a different number of buffers. This is quite normal. Buffers are occupied only when some server process brings a block into them. Otherwise the buffers are free and not linked to anything. When the buffers are freed up, perhaps because some process such as DBWn writes their contents to the disk, they are removed from the list, a process known as unlinking from the chain. So, in a normal database, buffers are constantly linked to and unlinked from chains, making each chain longer or shorter depending on the frequency of either activity. The number of buffer chains is determined by the hidden database parameter _db_block_hash_buckets, which is automatically calculated from the size of the buffer cache.

When a server process wants to access a specific buffer in the cache, it starts at the head of the chain and inspects each buffer in sequence until it finds what it needs. This is called walking the chain. You might be wondering about a nagging question here: when a buffer comes to the cache, who decides which of the three chains it should be linked to, and how? A corollary to that is the challenge faced by the server process in trying to find a specific buffer in the cache. How does the process know which chain to walk? If it always starts at chain 1, it will take an extraordinary amount of time to locate the block. Typical buffer caches are huge, so the number of chains may run into the tens of thousands, if not hundreds of thousands. Searching all the chains is not practical. On the other hand, maintaining a memory table showing which blocks are located in which buffers is not practical either, because maintaining that memory table would be time consuming and would serialize access; several processes couldn’t read the chains in parallel.

Oracle solves the problem in a neat manner. Consider the parking lot example earlier. What if you forget where you parked your car? Suppose after you come out of the mall, you find that all the cars have been buried under a thick pile of snow making identification of any of the cars impossible. So, you would have to start at the first car at the first row, dust off the snow from the license plate, check for your car, move on to the next, and so on. Sounds like a lot of work, doesn’t it? So, to help forgetful drivers, the mall marks the rows with letter codes and asks the drivers to park in the row matching the first letter of their last name. John Smith will need to park in row S, and in row S only, even if row T or row R are completely empty. In that case, when John returns to find his car and forgets where it is, he will know to definitely find it in row S. That will be the domain of his search—much, much better than searching the entire parking lot.

Similarly, Oracle determines which specific chain a buffer should be linked to. Every block is uniquely identified by a data block address (DBA). When the block comes to the buffer cache, Oracle applies a hash function to determine the buffer chain number and places the block in a buffer on that chain alone. Similarly, while looking up a specific buffer, Oracle applies the same hash function to the DBA, instantly knows the chain on which the buffer will be found, and walks that specific chain only. This makes accessing a buffer much easier compared to searching the entire cache.

To find out the data block address, you need to first get the relative file# and block#. Here is an example where I want to find out the blocks of the table named CBCTEST.

 SQL> select
  2     col1,
  3    dbms_rowid.rowid_relative_fno(rowid) rfile#,
  4    dbms_rowid.rowid_block_number(rowid) block#
  5  from cbctest;

      COL1     RFILE#     BLOCK#
---------- ---------- ----------
         1          6        220
         2          6        220
         3          6        220
         4          6        221
         5          6        221
         6          6        221

6 rows selected.

From the output we see that there are 6 rows in this table and they are all located in two blocks in a file with relative file# 6. The blocks are 220 and 221. Using this, we can get the data block address. To get the DBA of the block 220:

SQL> select dbms_utility.make_data_block_address(6,220) from dual; 

DBMS_UTILITY.MAKE_DATA_BLOCK_ADDRESS(6,220)
-------------------------------------------                                   
25166044

The output shows the DBA of that block is 25166044. If there were three chains, we could apply a modulo function that returns the remainder after dividing the DBA by 3:

SQL> select mod(25166044,3) from dual; 

MOD(25166044,3)
---------------              
1

So, we will put it in chain #1 (assuming there are three chains and the first chain starts with 0). The other block of that table, block# 221 will end up in chain #2:

SQL> select dbms_utility.make_data_block_address(6,221) from dual; 

DBMS_UTILITY.MAKE_DATA_BLOCK_ADDRESS(6,221)
-------------------------------------------                                   
25166045

SQL> select mod(25166045,3) from dual;

MOD(25166045,3)
---------------              
2

And so on. Conversely, given a DBA, we can apply the mod() function and the output shows the chain the buffer can be found on. Oracle does not use the exact mod() function shown here, but a more sophisticated hash function; the concept is similar. The exact mechanics of the function are not important: Oracle can identify the exact chain a buffer needs to go to by applying a hash function to the DBA of the buffer.

Multi-versioning of Buffers

Consider the update SQL statement shown at the beginning of this post. When Oracle updates a buffer that already exists in the buffer cache, it does not update it directly. Instead, it creates a copy of the buffer and updates that copy. When a query selects data from the block as of a certain SCN, Oracle creates a copy of the buffer as of the point in time of interest and returns the data from that copy. As you can see, there may be more than a single copy of the same block in the buffer cache. While searching for a buffer, the server process needs to search through the versions of the buffer as well. This makes the buffer chain even longer.

To find the specific buffers of a block, you can check the view V$BH (the buffer headers). The column OBJD is the object_id (actually it's the DATA_OBJECT_ID; in this case both are the same, but they may not always be). Here are the columns of interest to us:

•    FILE# - the file_id
•    BLOCK# - the block number
•    CLASS# - the type of the block, e.g. data block, segment header, etc. Shown as a code
•    STATUS - the status of the buffer, Exclusive Current, Current, etc.

To make it simpler to understand, we will use a decode() on the class# field to show the type of the block. With that, here is our query: 

select file#, block#,
       decode(class#,
           1,'data block',
           2,'sort block',
           3,'save undo block',
           4,'segment header',
           5,'save undo header',
           6,'free list',
           7,'extent map',
           8,'1st level bmb',
           9,'2nd level bmb',
           10,'3rd level bmb',
           11,'bitmap block',
           12,'bitmap index block',
           13,'file header block',
           14,'unused',
           15,'system undo header',
           16,'system undo block',
           17,'undo header',
           18,'undo block') class_type,
       status
from v$bh
where objd = 99360
order by 1,2,3
/

     FILE#     BLOCK# CLASS_TYPE        STATUS
---------- ---------- ----------------- ----------
         6        219 segment header    cr
         6        221 segment header    xcur
         6        222 data block        xcur
         6        220 data block        xcur

4 rows selected.

There are 4 buffers. In this example we have not restarted the cache, so there are two buffers for the segment header. There is one buffer for each data block – 220 and 221. The status is "xcur", which stands for Exclusive Current. It means the buffer was acquired (or filled by a block) with the intention of being modified. If the intention were merely to select, the status would show CR (Consistent Read). In this case, since rows were inserted (modifying the buffers), the blocks were acquired in xcur mode. Now, from a different session, update a single row. For easier identification I have used Sess2> as the prompt:

Sess2> update cbctest set col2 = 'Y' where col1 = 1;

1 row updated.

From the original session, check the buffers:

     FILE#     BLOCK# CLASS_TYPE        STATUS
---------- ---------- ----------------- ----------
         6        219 segment header    cr
         6        220 segment header    xcur
         6        220 data block        xcur
         6        220 data block        cr
         6        221 data block        xcur

5 rows selected.

There are 5 buffers now, up one from the previous four. Note there are two buffers for block ID 220. One CR and one xcur. Why two?

It's because when the update statement was issued, it modified the block. Instead of modifying the existing buffer, Oracle created a "copy" of the buffer and modified that. This copy now has XCUR status, because it was acquired for the purpose of being modified. The previous buffer of this block, which used to be xcur, was converted to CR. There can't be more than one XCUR buffer for a specific block; that's why it is called exclusive. If someone wants to find the most recently updated buffer, they just have to look for the copy with XCUR status. All others are marked CR.
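You can watch this happen yourself by grouping the buffer copies per block and status (a sketch, reusing the object ID from earlier):

select block#, status, count(*) copies
from v$bh
where objd = 99360
group by block#, status
order by block#, status;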

Now, from a third session, update a different row in the same block.

Sess3> update cbctest set col2 = 'Y' where col1 = 2;
1 row updated. 

From the original session, find out the buffers. 


 FILE# BLOCK# CLASS_TYPE         STATUS
------ ------ ------------------ ----------
     6    219 segment header     xcur
     6    219 segment header     cr
     6    221 data block         xcur
     6    220 data block         xcur
     6    220 data block         cr
     6    220 data block         cr
     6    220 data block         cr
     6    220 data block         cr
     6    220 data block         cr

9 rows selected.


Whoa! There are 9 buffers now. Block 220 now has 6 buffers – up from 4 earlier. This was merely a select statement which, by definition, does not change data. Why did Oracle create more buffers for it?

Again, the answer is CR processing. CR processing creates copies of the buffer and rolls them back or forward to create the CR copy as of the correct SCN. This created 2 additional CR copies. From one block, you now have 6 buffers, some of which were created as a result of a select statement. This is how Oracle creates multiple versions of the same block in the buffer cache.

Latches

Now that you know how many buffers can be created and how they are located on the chains in the buffer cache, consider another problem. What happens when two sessions want to access the buffer cache? There are several possibilities:

1)    Both processes could be after the same buffer
2)    The processes are after different buffers but the buffers are on the same chain
3)    The buffers are on different chains

Possibility #3 is not an issue, but #2 is. We can’t allow two processes to walk the same chain at the same time, so there needs to be some sort of mechanism that prevents other processes from performing an action while another process is performing it. This is provided by a mechanism called a latch. A latch is a memory structure that processes compete to acquire. Whoever gets it is said to “hold the latch”; all others must wait until the latch is available. In many respects it sounds like a lock. The purpose is the same, to provide exclusive access to a resource, but locks have queues: several processes waiting for a lock will get it, when the lock is released, in the same sequence in which they started waiting. Latches, on the other hand, are not sequential. Whenever a latch becomes available, every interested process jumps into the fray to capture it. Again, only one gets it; the others must wait. A process first loops up to 2,000 times, actively checking for the availability of the latch; this is called spinning. After that, the process sleeps for 1 ms and then retries. If not successful, it sleeps for 1 ms, then 2 ms, 2 ms, 4 ms, 4 ms, and so on until the latch is obtained. The process is said to be in a sleep state in between.

So, latches are the mechanism that makes sure no two processes access the same chain at the same time. This latch is known as the cache buffers chains (CBC) latch. There is one parent CBC latch and several child CBC latches. However, latches consume memory and CPU, so Oracle does not create as many child latches as there are chains. Instead, a single latch may protect two or more chains, as shown in Fig 3. The number of child latches is determined by the hidden parameter _db_block_hash_latches.

Latches are identified by latch# and child# (in the case of child latches). A specific latch instance is identified by its address in memory (the latch address). To find the latch that protects a specific buffer, get the file# and block# as shown earlier and issue this SQL:

select hladdr
from x$bh
where dbarfil = 6
and dbablk = 220;
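Joining X$BH to V$LATCH_CHILDREN turns that address into the child latch’s statistics directly (a sketch; the X$ views are only visible when connected as SYSDBA):

select distinct l.addr, l.gets, l.misses, l.sleeps
from x$bh b, v$latch_children l
where l.addr = b.hladdr
and b.dbarfil = 6
and b.dbablk = 220;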

Going back to CBC latches, let’s see how you can find out the correlation between chains and latches. First, find the latch# of the CBC latch. The latch# may change from version to version or across platforms, so it’s a good idea to check it.

select latch# from v$latch
where name = 'cache buffers chains';

LATCH#
------
   203

This is the parent latch. To find out the child latches (the ones that protect the chains), you should look into another view, V$LATCH_CHILDREN. To find out how many child latches there are:

SQL> select count(1) cnt from v$latch_children where latch# = 203; 

CNT
-------
  16384

If you check the values of the two hidden parameters explained earlier, you will see:

_db_block_hash_buckets 524288
_db_block_hash_latches 16384

The parameter _db_block_hash_buckets decides how many buffer chains there are, and the parameter _db_block_hash_latches decides the number of CBC latches. Did you notice the value 16384? It is the value of _db_block_hash_latches, and the count from V$LATCH_CHILDREN confirms that it is in fact the number of CBC latches.
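In case you want to check these values on your own database: hidden parameters are not visible in V$PARAMETER, but they can be read from the underlying X$ views when connected as SYSDBA (a sketch):

select i.ksppinm name, v.ksppstvl value
from x$ksppi i, x$ksppcv v
where i.indx = v.indx
and i.ksppinm in ('_db_block_hash_buckets', '_db_block_hash_latches');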

Diagnosis of CBC Latch Waits

Let’s now jump into resolving the CBC latch issues. The sessions suffering from CBC latch waits will show up in V$SESSION. Suppose one such session is SID 366. To find out the CBC latch, check the P1, P1RAW and P1TEXT values in V$SESSION, as shown below:

select p1, p1raw, p1text
from v$session where sid = 366;

P1         P1RAW          P1TEXT
---------- ---------------- -------
5553027696 000000014AFC7A70 address

P1TEXT shows the description of the P1 column, i.e. the address of the latch. In this case the address is 000000014AFC7A70. We can check the name of the latch and examine how many times it has been requested by sessions but missed.

 SQL> select gets, misses, sleeps, name
  2  from v$latch where addr = '000000014AFC7A70';

GETS  MISSES SLEEPS NAME
----- ------ ------ --------------------
49081     14     10 cache buffers chains

From the output we confirm that this is a CBC latch. It has been acquired 49,081 times, missed 14 times, and processes have gone to sleep waiting for it 10 times.

Next, identify the object whose buffer is so popular. Get the file# and block# of the buffers whose hash latch address (HLADDR) is the latch address we identified as the problem:

select dbarfil, dbablk, tch
from x$bh
where hladdr = '000000014AFC7A70';

DBARFIL DBABLK TCH
------- ------ -----
      6    220 34523

The TCH column shows the touch count, i.e. how many times the buffer has been accessed – a measure of its popularity, and hence of how likely it is to be subject to CBC latch waits. From the file# and block# we can get the object ID. The easiest way is to dump the block and get the object ID from the dump file. Here is how you dump the above-mentioned block.


alter system dump datafile 6 block min 220 block max 220;

This produces a tracefile, a part of which is shown below.


Start dump data blocks tsn: 4 file#:6 minblk 220 maxblk 220
Block dump from cache:
Dump of buffer cache at level 4 for pdb=0 tsn=4 rdba=25166044
BH (0x7ff72f6b918) file#: 6 rdba: 0x018000dc (6/220) class: 1 ba: 0x7ff7212a000
  set: 12 pool: 3 bsz: 8192 bsi: 0 sflg: 0 pwc: 39,28
  dbwrid: 0 obj: 93587 objn: 93587 tsn: [0/4] afn: 6 hint: f

Get the object ID (the value after “objn”). Using that value you can get the object name:


 SQL> select object_name
  2 from dba_objects
  3 where object_id = 93587;

OBJECT_NAME
----------------------------------------
CBCTEST
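As an alternative to dumping the block, the owning segment can usually be identified from the data dictionary as well (a sketch; note that DBA_EXTENTS lookups can be slow on very large databases):

select owner, segment_name, segment_type
from dba_extents
where file_id = 6
and 220 between block_id and block_id + blocks - 1;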

Now you know the table whose blocks are so popular that they cause CBC latch waits.

Resolving CBC Latch Waits

From the above discussion you will have made one important observation: CBC latch waits are caused by the popularity of blocks among different processes. If you reduce the popularity, you reduce the chance that two processes will wait for the same buffer. Note: you can’t completely eliminate the waits; you can only reduce them. To reduce them, reduce logical I/O. For instance, nested loops revisit the same object several times, causing the buffers to be accessed multiple times. If you rewrite the query to avoid nested loops, you will significantly reduce the chance that one process will wait for the CBC latch.

Similarly if you write a query that accesses the blocks from a table several times, you will see the blocks getting too popular as well. Here is an example of such a code:

 for i in 1..100000 loop
   select …
   into l_var
   from tablea
   where …;
   exit when sql%notfound;
end loop;

You can rewrite the code by selecting the data from the table into a collection using bulk collect, and then reading from that collection rather than from the table. The SQL_ID column of V$SESSION shows you which SQLs are causing CBC latch waits, and drilling down to the object, as shown earlier, tells you which specific object in that query is causing the problem, allowing you to devise a better solution.
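Here is a sketch of that rewrite (the table and column names are hypothetical):

declare
  type val_tab_t is table of varchar2(100);
  l_vals val_tab_t;
begin
  -- one visit to tablea instead of 100,000 single-row lookups
  select some_col bulk collect into l_vals
  from tablea;
  for i in 1 .. l_vals.count loop
    null;  -- process l_vals(i) from memory; no further buffer gets on tablea
  end loop;
end;
/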

You can also proactively look for objects contributing to the CBC latch wait in the Active Session History, as shown below:

select p1raw, count(*)
from v$active_session_history
where sample_time > sysdate - 1/24
and event = 'latch: cache buffers chains'
group by p1raw
order by 2 desc;

The P1RAW value shows the latch address, using which you can easily find the file# and block#:


select o.name, bh.dbarfil, bh.dbablk, bh.tch
from x$bh bh, sys.obj$ o
where bh.tch > 0
and bh.hladdr = '000000014AFC7A70'   -- the latch address from P1RAW
and o.obj# = bh.obj
order by bh.tch;

With the approach shown earlier, you can now get the object information from the file# and block#. Once you know the objects contributing to the CBC latch waits, you can reduce the waits by reducing the number of times the latch is requested. You do that by making the blocks of the table less popular: the fewer the rows in a block, the less popular the block will be. You can reduce the number of rows in a block by increasing PCTFREE or using ALTER TABLE … MINIMIZE RECORDS_PER_BLOCK. If that does not help, you can partition the table, which forces the data block addresses to be recomputed for each partition, making it more likely that the buffers will end up in different buffer chains and reducing competition for any one chain.
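For example (a sketch; the values are illustrative, and note that a PCTFREE change only affects blocks used by future inserts):

-- leave most of each block free so that fewer rows share a block
alter table cbctest pctfree 90;

-- or, with the table populated at the desired density, cap the rows per block
alter table cbctest minimize records_per_block;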

Conclusion

In this blog you learned how Oracle manages the buffer cache and how latches are used to ensure that only one process at a time can walk a chain to access a buffer. This latch is known as the cache buffers chains latch. You learned why this latch is obtained and how to reduce the possibility that two processes will want it at the same time. I hope this helps you understand and resolve cache buffers chains latch related waits. Your feedback will be highly appreciated.

Is Continuous Integration compatible with database applications?

Continuous integration and continuous delivery offer huge efficiency gains for companies, but is continuous integration even possible when the application’s backbone is a massive relational database? How can one spin up database copies for developers, QA, integration testing, and delivery testing? It’s not like Chef or Puppet can spin up a 10TB database copy in a few minutes the way one can spin up a Linux VM.

There is a way, and that way is called data virtualization, which allows one to spin up that 10TB database in minutes, as well as branch a copy of that 10TB database from Dev to QA (or, for that matter, several copies), all for very little storage.


Old methods of application project development and rollout have a solid history of failure to meet deadlines and budget goals.

Repeating the old methods and expecting different results is what Einstein would call insanity

“Insanity: doing the same thing over and over again and expecting different results.” – Einstein

Continuous Integration (CI), Continuous Delivery (CD) and Agile offer an opportunity to hit deadlines on budget, with tremendous efficiency gains compared to waterfall methods. With waterfall methods we try to get all the requirements, specifications and architecture designed up front, then set the development teams working, and then tackle integration and deployment near the end of the cycle. It’s impossible to precisely predict when the project will be completed and sufficiently QA’ed. Even worse, during integration, problems and bugs start to pour in, further exacerbating the problem of meeting release dates on time.

Agile, CI and CD fix these issues, but there is one huge hurdle for most shops and that hurdle is getting the right data, especially when the data is large, into the Agile, CI and CD life cycle and flowing through that lifecycle.

With Agile, Continuous Integration and Continuous Delivery we are constantly getting feedback on where we are and how fast we are going, and we are altering our course accordingly. Our course is also open to change as new information comes in on customer requirements.

Agile development calls for short sprints, among other things, for adding features and functions, and with continuous integration those features can be tested daily or multiple times a day. Further enhancing continuous integration is continuous delivery, for those systems where it makes sense: new code that has passed continuous integration is rolled into continuous delivery, meaning the code deployment is tested in a test environment. For some shops where it makes sense (famously Flickr, Facebook and Google), the code can be passed on to continuous deployment into production.

By using agile programming methods and constantly doing integration testing one can get constant feedback, do course correction, reduce technical debt and stay on top of bugs.

Compare the two approaches in the  graphs below.

In the first graph we kick off the projects. With waterfall we proceed until we are near completion and then start integration and delivery testing. At this point we come to realize how far from the mark we are. We hurriedly try to get back to the goal, but time has run out: either we release with less than the targeted functionality, or worse the wrong functionality, or we miss the deadline altogether.


With Agile and CI it’s much easier to course correct, with small iterations and the flexibility to modify designs based on incoming customer and market requirements.


With Agile and CI, code is tested in an integrated manner as soon as the first sprint is done, so bugs are caught early and kept in check. With waterfall, since it takes so much longer to get a working set of code and integration isn’t even started until near the end of the cycle, bugs accumulate significantly towards the end of the cycle.


In waterfall, deployment doesn’t even happen until the end of the cycle, because there isn’t an integrated deployable set of code until the end. The larger the project gets and the more time goes by, the more expensive and difficult the deployment is. With Agile and CI the code is constantly deployable, so the cost of deployment stays low and constant.


A waterfall project can’t even start to bring in revenue until it’s completely finished, but with Agile there is usable code early on, and with continuous deployment that code can be leveraged for revenue early.

With all these benefits, more and more shops are moving towards continuous integration and continuous delivery. With tools like Jenkins, TeamCity and Travis to run continuous integration tests, virtualization technologies such as VMware, AWS, OpenStack, Vagrant and Docker, and tools like Chef, Puppet and Ansible to run the setup and configuration, many shops have moved closer and closer to continuous integration and delivery.

But there is one huge road block.

#555555;"> #2970a6;" href="http://tibcovideo.wistia.com/medias/ao82bp3qch?mkt_tok=3RkMMJWWfF9wsRonuqrNZKXonjHpfsX56uotUK+/lMI/0ER3fOvrPUfGjI4JSdQ/cOedCQkZHblFnVgBT62+WLgNqKUE">Gene Kim lays out the top 2 bottlenecks in  IT

    #555555;">
  1. Provisioning environments for development
  2. Setting up test and QA environments

and goes on to say

#555555;">One of the most powerful things that organizations can do is to enable development and testing to get environment they need when they need it.

From Contino’s recent white paper:

Having worked with many enterprise organisations on their DevOps initiatives, the biggest pain point and source of wastage that we see across the software development lifecycle is around environment provisioning, access and management.

From an article published today in Computing [UK]  we hear the problem voiced:

#000000;">“From day one our first goal was to have more testing around the system, then it moves on from testing to continuous delivery,” 

#000000;">But to achieve this, while at same time maintaining the integrity of datasets, required a major change in the way Lear’s team managed its data.

#000000;">“It’s a big part of the system, the #0077aa;" title="Looking beyond Oracle: two high-end SQL databases you may never have heard of" href="http://www.computing.co.uk/ctg/analysis/2378456/looking-beyond-oracle-two-high-end-sql-databases-you-may-never-have-heard-of" target="_blank">database, and we wanted developers to self-serve and base their own development in their own controlled area,” he says.

#000000;"> Lear was determined to speed up this process, and began looking for a solution – although he wasn’t really sure whether such a thing actually existed.

This road block has been voiced by experts in the industry more and more as the industry moves towards continuous integration.

When performing acceptance testing or capacity testing (or even, sometimes, unit testing), the default option for many teams is to take a dump of the production data. This is problematic for many reasons (not least the size of the dataset)…

Humble, Jez; Farley, David (2010-07-27). Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation (Addison-Wesley Signature Series (Fowler)) (Kindle Locations 7285-7287). Pearson Education. Kindle Edition.

What can we do about this enormous obstacle to continuous integration: providing environments that rely on databases, when those databases are too big and complex to provide copies of for development, QA and continuous integration?

Fortunately for us there is data virtualization technology. As virtual machine technology opened the door to continuous integration, data virtualization swings it wide open for enterprise-level application development that depends on large databases.

Data virtualization is an architecture (which can be encapsulated in software, as Delphix has done) that connects to a source database, takes an initial copy, and from then on collects only the changes from the source (like EMC SRDF, NetApp SMO, or an Oracle standby database). The data is saved on storage that has snapshot capabilities (as in NetApp and ZFS, or software like Delphix that maps a snapshot filesystem onto any storage, even JBODs). The data is managed as a timeline on the snapshot storage. For example, Delphix saves 30 days of changes by default. Changes older than 30 days are purged, meaning that a copy can be made down to the second anywhere within this 30-day time window. (Some other technologies that address part of the virtual data stack are Oracle’s Snap Clone and Actifio.)

Virtual data improves businesses’ bottom line by eliminating the enormous infrastructure, bureaucracy and time drag that it takes to provision databases and data for business intelligence groups and development environments. Development environments and business intelligence groups depend on having copies of production data and databases, and data virtualization allows provisioning in a few minutes, with almost no storage overhead, by sharing duplicate blocks among all the copies.

 

As a side note, but an important one: development and QA often require that data be masked to hide sensitive information such as credit cards or patient records, so it’s important that a solution come integrated with masking technology. Data virtualization combined with masking can vastly reduce the surface area (the amount of potentially exposed data) that has to be secured, by eliminating full copies. Data virtualization also provides a chain of authority: who had access to what data at what time is recorded.

The typical architecture before data virtualization looks like the following, where a production database is copied to:

  1. backup
  2. reporting
  3. development

In development the copies are further propagated to QA and UAT, but because of the difficulty in provisioning these environments, which requires a number of teams and people (DBA, storage, system, network, backup), the environments are limited by resource constraints, and often the data is old and unrepresentative.


With data virtualization, there is a time flow of data states stored efficiently on the “data virtualization appliance”, and provisioning a copy takes only a few minutes and very little storage; it can be run by a single administrator, or even as self service by end users such as developers, QA and business analysts.


With the ability to provision large data environments quickly, easily and with few resources, it becomes easy to provision copies of environments for development and to branch those copies in minutes into multiple parallel QA lanes to enable continuous integration.


The duplicate data blocks, which are the majority in the case of large data sets, can even be shared across development versions.


Go from waterfall deployment to continuous integration.

Next Club Oracle London – 12th November 2014

The next Club Oracle London is taking place on Wednesday 12th November. You can find the details here.

This is a free evening of Oracle talks given by some of our local experts, with beer and snacks provided by e-DBA. The first event was back on the 3rd of July and it was a great evening. The coming event will have presentations by James Anthony, Dominic Giles and Jason Arneil on the 12c in-memory database, key facts gleaned from OOW14, and 12c enhancements respectively – all excellent presenters, and you get to ask them awkward questions in an open session at the end. Plus free beer and (I think) pizza to keep you going. All you can ask of a user group meeting.

I have to say, it’s not the easiest event to find out about unless you have managed to get onto the list of people who are interested (ie turned up last time, I think). There is another mention on e-DBA’s LinkedIn page here, and my friend Neil Chandler put up a quick post about it last week, but there is not a lot else out there about it!

I know e-DBA are keen for these events to stand on their own 2 feet and not be seen as just an e-DBA event, so they are not pushing it hard themselves – they are happy just to provide the support – but it seems the message is not getting out to the Oracle community about the events, so spread the word if you are from around these parts. It would be good to see it well attended and continuing. I understand they are already looking at who they would like to present next time, so there should be future events.

 

{And I still think they should call it London Oracle Club… }

Quiz night

An email from Yves Colin (who’ll be presenting on the Tuesday of UKOUG Tech14) prompted me to dig out a little script I wrote some years ago and re-run an old test, leading to this simple question: what’s the largest size array insert that Oracle will handle?

If you’re tempted to answer, watch out – it’s not exactly a trick question, but there is a bit of a catch.

Answer:

There is an internal limit of 255 on the size of array that Oracle can insert as a unit. I leave it as an exercise to the reader to decide whether or not this makes any significant difference to performance, since the effects of row size, number of indexes maintained, and disk I/O requirements may make the effects of the limit virtually undetectable. To demonstrate the limit all we need do is insert a few hundred rows into a table and do a block dump, optionally followed by a dump of the redo log.


create table t1 (n1 number not null) segment creation immediate;

alter system switch logfile;

insert into t1 select rownum from all_objects where rownum <= 1000;
commit;

execute dump_log

start rowid_count t1

/*
   FILE_NO   BLOCK_NO ROWS_IN_BLOCK
---------- ---------- -------------
         5        180           660
         5        184           340

2 rows selected.


ROWS_IN_BLOCK     BLOCKS
------------- ----------
          340          1
          660          1

2 rows selected.
*/

alter system flush buffer_cache;

-- SQL> alter system dump datafile 5 block 180;

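(The two helper scripts called above aren’t published with the post, so here is a minimal sketch of what they might look like – my assumptions, not the author’s actual code. The procedure needs select privilege on the v$ views granted directly to its owner.)

-- dump_log: wrapper that dumps one member of the current redo log group.
create or replace procedure dump_log authid current_user as
        v_member        varchar2(513);
begin
        select  f.member
        into    v_member
        from    v$log l, v$logfile f
        where   l.group# = f.group#
        and     l.status = 'CURRENT'
        and     rownum  = 1;

        execute immediate 'alter system dump logfile ''' || v_member || '''';
end;
/

-- rowid_count: rows per (file, block), then blocks per row-count.
select  dbms_rowid.rowid_relative_fno(rowid)    file_no,
        dbms_rowid.rowid_block_number(rowid)    block_no,
        count(*)                                rows_in_block
from    t1
group by
        dbms_rowid.rowid_relative_fno(rowid),
        dbms_rowid.rowid_block_number(rowid)
order by
        file_no, block_no
;

select  rows_in_block, count(*) blocks
from    (
        select  dbms_rowid.rowid_block_number(rowid)    block_no,
                count(*)                                rows_in_block
        from    t1
        group by
                dbms_rowid.rowid_relative_fno(rowid),
                dbms_rowid.rowid_block_number(rowid)
        )
group by
        rows_in_block
order by
        rows_in_block
;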
The dump_log procedure is simply a pl/sql wrapper for a call to ‘alter system dump logfile {current log}’; the script rowid_count.sql extracts the file and block numbers from rowids in the given table and aggregates them in different ways.  The reason for running the script is to find a table block with a lot of rows in it; the block I dumped actually held the first 660 rows of the insert. Here’s a tiny extract from the block dump (with one little comment added):


tab 0, row 0, @0x1904			-- 6,400 dec
tl: 6 fb: --H-FL-- lb: 0x1  cc: 1
col  0: [ 2]  c1 02
tab 0, row 1, @0x190a
tl: 6 fb: --H-FL-- lb: 0x1  cc: 1
col  0: [ 2]  c1 03
tab 0, row 2, @0x1910
tl: 6 fb: --H-FL-- lb: 0x1  cc: 1
col  0: [ 2]  c1 04
tab 0, row 3, @0x1916
tl: 6 fb: --H-FL-- lb: 0x1  cc: 1

This shows the first four rows inserted, and you can see that the offset to the first row is 6,400 bytes, and then each subsequent row is 6 bytes further down the block (when the numbers get larger the row lengths will increase to 7 bytes). The positioning of these rows is, at first sight, a little odd – you might have guessed that they would either start at offset “zero” and work down to offset “8K”, or start at “8K” and work backwards up to zero – why is the first row 3/4 of the way down the block ?

Rather than show you more row dumps, I’ll give you a carefully selected extract from the row directory:


0x12:pri[0]     offs=0x1904
0x14:pri[1]     offs=0x190a

...

0x20c:pri[253]  offs=0x1f8a
0x20e:pri[254]  offs=0x1f91		End of first 255 rows
0x210:pri[255]  offs=0x120e		Start of second 255 rows
0x212:pri[256]  offs=0x1215

...

0x40a:pri[508]  offs=0x18f6
0x40c:pri[509]  offs=0x18fd		End of second 255 rows
0x40e:pri[510]  offs=0xdf5		Start of last 150 rows
0x410:pri[511]  offs=0xdfc

...

0x536:pri[658]  offs=0x1200		End of last 150 rows
0x538:pri[659]  offs=0x1207

The first 255 rows inserted are stacked at the bottom of the block at offsets 0x1904 to 0x1f91.
The second 255 rows inserted are stacked above them at offsets 0x120e to 0x18fd (note 0x18fd + 6 = 0x1903)
The last 150 rows inserted are stacked above them at offsets 0xdf5 to 0x1207 (note 0x1207 + 6 = 0x120d)

No matter how large your attempted array insert, the maximum number of rows (or index entries) Oracle can insert into a block in a single internal array operation is 255.

Further corroboration comes from the redo log dump – here’s a tiny bit of a single change vector (i.e. a single atomic change to a single Oracle block) from the redo generated while this insert was going on:


CHANGE #18 CON_ID:0 TYP:0 CLS:1 AFN:5 DBA:0x014000c7 OBJ:95876 SCN:0x0000.007528dd SEQ:1 OP:11.11 ENC:0 RBL:0 FLG:0x0000
KTB Redo
op: 0x01  ver: 0x01
compat bit: 4 (post-11) padding: 1
op: F  xid:  0x0007.007.000009f2    uba: 0x01000bda.02a9.14
KDO Op code: QMI row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x014000c7  hdba: 0x01400083
itli: 1  ispac: 0  maxfr: 4858
tabn: 0 lock: 1 nrow: 255
slot[0]: 0
tl: 6 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 2]  c1 02
slot[1]: 1
tl: 6 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 2]  c1 03
slot[2]: 2
tl: 6 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 2]  c1 04

Although the CON_ID in line 1 tells you this dump is from 12c, the same limit holds across all (modern) versions of Oracle. The operation (OP 11.11 – in line 1) is “Insert Multiple Rows”, and at line 9 you can see “nrow: 255”. I haven’t included the rest of the change vector, but all it does is show the remaining 252 rows.
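If you want to poke at the performance question from the statistics side, one rough check (my addition, not part of the original test) is to look at the session’s redo statistics around the insert – interpret the numbers with care, since a single redo record can carry change vectors for several blocks:

-- Rough check: session-level redo statistics; run before and after the
-- 1,000-row insert and compare the deltas.
select  sn.name, ms.value
from    v$mystat ms, v$statname sn
where   ms.statistic# = sn.statistic#
and     sn.name in ('redo entries', 'redo size')
order by sn.name;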

Friday Philosophy – Is Dave Productive?

How do I know if Dave is doing his job properly? If I am his (or her*) manager, what techniques can I use to ensure I am getting my pound of flesh out of this worker drone in return for the exorbitant salary my company puts into said drone’s bank account each month?

Well, as a start there is my last Friday Philosophy all about deduction of work profile via auditory analysis of input devices (ie how fast is Dave typing) :-) I have to say, the response to that topic has been very good, I’ve had a few chats with people about it and got some interesting comments on the blog article itself. My blog hits went Ping :-)

However, I have a confession to make. I have a “history” in respect of keyboards and management of staff. Maybe one of my old colleagues will comment to confirm this, but I used to regularly walk into an office full of “my people” and bark “Type faster you B*****ds! I don’t care what it is you are doing, I just want to see those fingers flying over the keyboard!”. They all knew to ignore me, this was just one example of my pathetic sense of humour. In some ways, I was never a very good manager as I was just a bit too juvenile, irreverent and non-managerial.

I was being ironic and they knew it. I had no time for many of the Management Easy Options you so often come across in organisations, which are apparently used to help ensure the staff are working hard. What do I mean by Management Easy Options? I’ll cover a few.


You have to be at your desk for at least 8 hours.

At Your Desk. Because if you are at your desk you are working of course. And if you are not at your desk, you are not working. Hours at the desk apparently equate to productivity. So a Management Easy Option is to insist all your staff are seen to be in the office and at their desk for as long as, and preferably longer than, the average time across all staff. And that is partly why in dysfunctional companies staff are in the office so long. As lots of managers want to demonstrate that they are “good managers” by having their staff “productive” at their desks, their staff will be there longer than average…which pushes up the average…so they keep the staff there longer… *sigh*

I could spend a few pages on the academic and psychological studies that disprove the above nonsense about 8 hours of productive work – but we all know it is nonsense anyway. We talk about it at lunch or in the pub. If you are stuck at your desk longer than you can concentrate, you do other stuff that is hard to distinguish from work. Or you do poor work. WE ALL KNOW THIS so why does this myth about hours-at-desk continue? What happens to some managers’ brains such that they start managing and soon stop knowing this?!?

As a self employed worker in the London IT market, I often get given a contract to sign that specifies I must do a professional working day that “consists of 8 hours minimum each day”. For the last 5 or 6 years I have always crossed out that clause or altered it to say “8 hours maximum” or replaced it with what I feel should be the real clause, which is:

A professional working day, which is, on average across a week, to match or exceed the requirements of my manager for a day’s productivity.

If I am being asked to work a Professional Working Day then to me that means I have to achieve a day’s worth of benefit to the company for each day paid to me. If that takes me 8 hours or 6 or 9 or whatever is immaterial. As a Professional I will on average, each day, keep my manager happy that I am worth employing. If that involves 6 hours of extra work one day between 8pm and 2am, fine. But do not expect 8 hours the next day. If my manager is not happy, then you ask me to go and I will go. It really is as simple as that.

{honesty forces me to admit that at present, for the first time in years, I have that 40 hour clause in place. Because I am doing a role for a friend, and I did not want to cause a fuss by objecting to the clause. But if management ever refer to the clause, my friend knows I will simply thank management for their time to date – and I’ll be going now}.

I drifted into my own world there, but the point I really wanted to make is that hours spent at the desk in no way indicate if the job is being done. We all know that, all the managers know that (well, they will if they are any good). Some people can be at their desk 10 hours a day and, frankly, it would help the company if they were not! Other people are at their desk but spend a huge slice of the time on the web or Instant Messaging or *cough* writing blogs.


You have to be in the office.

If you are at home, you will be goofing off.
So what does the above say about the manager if that is their opinion? If you are at home, you would goof off, so therefore your staff will? Of course working from home has other considerations, such as the fact that it is only possible if your role allows you to spend some days not physically doing things in the office (pressing reset buttons on boxes? Making tea for the team?) and you are in the office enough to maintain and build proper bridges with your colleagues. I also think working from home is a privilege to earn and not a right, as some people really are incapable of working from home. I had a role a while back where when one chap was “working from home” he was actually doing all sorts of things – but his smartphone was set up to fake an online presence. He was incapable of working from home.

But in IT there really is not a need for many of us to spend all that time and unpleasantness commuting and some tasks really are done more efficiently if people can’t keep coming up to your desk and demanding their personal priorities really are your priorities too (which usually equates to they are in it up to their necks and you can dig them out).


Enforce a Clean Desk policy.

Now, there are things that should never be left on your desk: financial information, personal information (like people’s CVs or annual reviews), management information (salary reviews, plans to axe 22% of the workforce, stuff like that). But I have no time at all for the argument that a clean desk looks more professional. It does not look more professional, that is just weaselly, lying balls. It looks more like someone has implemented a draconian clean desk policy and any sign of the desk occupants being human is of no consideration.

If you walk into an office with 300 utterly clean desks, it looks like a soul-less, bitter and degrading place to slave away.

You walk into an office and you see pictures of offspring & partners, little toys (not my thing but some people like to have the gonk their boy/girlfriend gave them) and that’s just fine.

Yeah, if Malcolm has a pile of 237 Diet Coke cans in a pyramid on his desk that is not so hot, but as a manager it is your job to go tell Malcolm to recycle those damn cans. And for those of us who work in Clean Desk environments, we all know we spend a few minutes each morning pulling stuff out of our pedestals and a few minutes each evening chucking it all back in there. Great use of time, oh management clean desk police. So the Management Easy Option is to make everyone remove all signs of humanity and *also* waste time moving all useful things off your desk each evening and drag them out each morning, rather than occasionally check what people leave on their desk and, when Cherry has left details of the latest dodgy plan to hide details from the FDA on her desk, give her a seriously hard talking to.

In one job I did not have a desk pedestal, I had a locker – “Over There” at the other side of the office where my first allotted desk was. It took two or three trips each morning and at the end of the day to sort out my stuff and keep my desk “clean”. At least I docked it off the 8 hour day…


So having moaned about a few of these Easy Management Options that, in my opinion, are detrimental – how do you ensure Dave is Productive? Now, this is a complex and challenging idea and I am not sure some managers will understand it. But, the way you can tell if Dave is productive is that…

He Does His Job.

He completes the tasks assigned to him in a reasonable time frame or informs you of the reasons why the tasks are taking longer. If Dave’s role includes scooping up issues and solving them autonomously, you know Dave is doing his job as the end users are not screaming at you. In fact, if as a manager you are barely aware of Dave existing, either he is doing his job exceedingly well or you employed him to do a non-existent job (so more fool you). The bottom line is that, as Dave’s manager, your job is to aid Dave in doing his job, overcome obstacles and track that his tasks are done… ie be a proper manager, not rule by Easy Management Options.

Bottom line, to get back to my first paragraph or two, it matters not one jot how fast Dave types. If (s)he is in the office for the meetings and any core hours needed, fine. So long as a member of staff is not doing things that negatively impact their ability to do their job or those around them to do theirs, there are few blanket rules that help. All those Easy Management Options simply exist to cover the backsides of poor managers and satisfy the desire for control that comes from HR and upper management. Neither of which *Ever* abide by the rules they lay down on others.

Break free! Type slowly! Put a picture of Debbie Harry on your desk. Work from home and Go Crazy spending an hour in the afternoon combing the dog. Just make sure you do your job. In my book, that makes you worth your pay. Is it really so hard to manage people in that way?!?

(*) I have yet to meet a lady called Dave, but Dave is simply my generic name for someone working in IT. No real Dave is implied. But both sexes are.

AUSOUG conference Perth

The annual two day AUSOUG conference in Perth is well underway, and so far it’s been a wonderfully successful event. In particular, we Australians are notorious for not really getting into the networking thing, so conferences often have a lot of ‘awkward silences’ when sessions are not on, and we are meant to be (god forbid) *talking* to each other :-)

But this year has been different. Lots of banter, lots of discussion. Also, the AUSOUG committee got it exactly right this year. The venue is just the right size, the rooms are the right size, the timings of the sessions and transit times were spot on. It is a real credit to them – it shows how much they are listening to the speakers and attendees feedback.

I gave a couple of talks, which I’ll slideshare in the next couple of days, and we also had plenty of good talks as part of the OTN tour from Tim Hall, Bjorn Rost and Craig Shallahamer. Penny Cookson, Jeff Kemp and Chris Muir rounded off the talks I saw.

So if you’re ever in Perth around conference time – put it on your calendar. It’s well worth it. (www.ausoug.org.au)

About Oracle’s Code of Ethics, MOS Confidentiality Clause, and Content Stolen from My Website

I strive to keep the number of non-technical posts on this blog to a minimum. However, there are days when a non-technical post has to be written. This is one of those days.

Oracle Corporation, concerned about legal and ethical issues, not only provides its employees with a document entitled Code of Ethics and Business Conduct, but also expects every employee to adhere to the standards that the document sets. For the purpose of this post, I would like to quote part of the “Intellectual Property” section:

It is against Oracle policy to use, copy, display, or distribute third-party copyrighted software, documentation, or other materials without permission or approval from Oracle’s Legal Department. For example, you may not post another entity’s copyrighted content to any internal or external Web site or other electronic forum without first obtaining the necessary approvals.

Having said that, I encourage you to have a look at the My Oracle Support note entitled Implementing Instance Caging (1177593.1). For your convenience (and also because I am fully aware that the current situation will change quickly), even though Oracle Corporation did not give me permission to do so, I provide you this PDF file showing the content of the note as of the 5th of November 2014. In this way, you can access it from this blog without having access to My Oracle Support.

You might ask why I wrote that Oracle Corporation did not give me permission to publish the note I mentioned. The reason is quite simple… Despite the fact that I have access to My Oracle Support (among other things because the company I work for, Trivadis, is a Platinum level partner), the My Oracle Support Terms of Use clearly state the following:

You may not use, disclose, reproduce, transmit, or otherwise copy in any form or by any means the information contained in the Materials for any purpose, other than to support your authorized use of the Oracle product and/or cloud services, for which you have a current support contract, without the prior written permission of Oracle.

So, why did I do it? Because part of the content of the note was reproduced by Oracle Corporation, or rather, by one of its employees, without my permission! In fact, as you can easily see for yourself, most of the post entitled Instance Caging that I published in November 2009 was included (i.e. copy/pasted) into the note.

While I might find it flattering that people at Oracle are using my blog as a resource for technical information, such conduct goes against not only common practice, but also Oracle’s well defined code of ethics.

All the best!



PS: I did not write this post to attack Oracle Corporation, but to draw its attention to the unethical behaviour of some of its employees. Alas, doing so in public is the most effective way I am aware of.



UPDATE 2014-11-06 22:19: less than 4 hours after I published this post the note is no longer visible in My Oracle Support. Hence, the link referencing it is broken.