Oracle’s pre-RMAN hot backup mode is the subject of one of the most pervasive and persistent misconceptions about Oracle.
During an Oracle tablespace hot backup, you (or your script) put a tablespace into backup mode, copy the datafiles to disk or tape, and then take the tablespace out of backup mode. These steps are widely understood by most DBAs.
However, there is a popular misconception that datafiles are “quiesced,” “frozen,” “offlined” or “locked” during backup mode. So many people think it is true that it appears in some books on Oracle and on numerous websites. Some have even reported that they learned this from DBA class instructors.
The myth has a couple of permutations. One is that while the datafiles are allegedly not writable, changes are stored somewhere in the SGA, the redologs, the rollback segments or some combination thereof, then written back into the datafile when the tablespace is taken out of backup mode. There is a passage in the SAMS title Oracle Unleashed describing this supposed mechanism.
When you place a tablespace in backup mode, the Oracle instance notes that a backup is being performed and internally compensates for it. As you know, it is impossible to make an authentic copy of a database file that is being written to. On receipt of the command to begin the backup, however, Oracle ceases to make direct changes to the database file. It uses a complex combination of rollback segments, buffers, redo logs, and archive logs to store the data until the end backup command is received and the database files are brought back in sync. Simplifying a hot backup in this way is tantamount to classifying the USS Nimitz as a boat. The complexity of the actions taken by the Oracle RDBMS under a hot backup could consume an entire chapter and is beyond the scope of this book. What you should understand is the trade-off for taking a hot backup is increased use of rollback segments, redo logs, archive logs, and internal buffer areas within the SGA.
– Oracle Unleashed, Copyright © SAMS/Macmillan, Inc. 1997, chapter 14
No No No! Stop making stuff up! Oracle’s tablespace hot backup does not work this way at all. It is actually a simple, elegant and failure-resistant mechanism. It absolutely does not stop writing to the datafiles. It actually allows continued operation of the database almost exactly as during normal operation. Contrary to the characterization as “complex” in SAMS Oracle Unleashed, it can be summarized in a few steps:

1) The datafiles that comprise the tablespace are checkpointed.
2) The Checkpoint SCN marker in the datafile headers is frozen; it stops incrementing with further checkpoints.
3) Full images of changed database blocks are written to the redologs, in addition to the normal change vectors.

Those three actions are all that is required to guarantee consistency once the file is restored and recovery is applied. By freezing the Checkpoint SCN, any subsequent recovery on that backup copy of the file will know that it must commence at that SCN. Having an old SCN in the file header tells recovery that the file is an old one, and that it should look for the archivelog containing that SCN, and apply recovery starting there.
Note that during hot backup mode, checkpoints to datafiles are not suppressed. Only the main Checkpoint SCN flag is frozen, but CKPT continues to update a Hot Backup Checkpoint SCN in the file header.
There is a confusing side effect of having the Checkpoint SCN frozen at an SCN earlier than the true checkpointed SCN of the database. In the event of a system crash or a shutdown abort during hot backup of a tablespace, the automatic crash recovery routine during startup will look at the file headers, think that the files for that tablespace are out of date, and will suggest that you need to apply old archived redologs in order to bring them back into sync with the rest of the database. Fortunately, no media recovery is necessary. With the database started up in mount mode:
SQL> alter database end backup;
This action will bring the Checkpoint SCN in the file headers in sync with the Hot Backup Checkpoint SCN (which is a true representation of the last SCN to which the datafile is checkpointed). Once you do this, normal crash recovery can proceed during ‘alter database open;’.
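Incidentally, if you ever need to see which datafiles are currently flagged as being in backup mode, the v$backup view lists them; a query along these lines will do:

-- rows with STATUS = 'ACTIVE' are datafiles currently in backup mode
select file#, change#, time
from   v$backup
where  status = 'ACTIVE';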
By initially checkpointing the datafiles that comprise the tablespace and logging full block images to redo, Oracle guarantees that any blocks changed in the datafile while in hot backup mode will also be present in the archivelogs in case they are ever used for a recovery. Most of the Oracle user community knows that Oracle generates a greater volume of redo during hot backup mode. This is the result of Oracle logging full images of changed blocks in these tablespaces. Normally, Oracle writes a change vector to the redologs for every change, but it does not write the whole image of the database block. Full block image logging during backup eliminates the possibility that the backup will contain unresolvable split blocks. To understand this reasoning, you must first understand what a split block is.
Typically, Oracle database blocks are a multiple of O/S blocks. For example, most Unix filesystems have a default block size of 512 bytes, while Oracle’s default block size is 8k. This means that the filesystem stores data in 512 byte chunks, while Oracle performs reads and writes in 8k chunks or multiples thereof. While backing up a datafile, your backup script makes a copy of the datafile from the filesystem, using O/S utilities such as copy, dd, cpio, or OCOPY. As it is making this copy, your process is reading in O/S-block-sized increments. If DBWn happens to be writing a DB block into the datafile at the same moment that your script is reading that block’s constituent O/S blocks, your copy of the DB block could contain some O/S blocks from before the database performed the write, and some from after. This would be a split block. By logging the full block image of the changed block to the redologs, Oracle guarantees that in the event of a recovery, any split blocks that might be in the backup copy of the datafile will be resolved by overlaying them with the full legitimate image of the block from the archivelogs.
All of these mechanisms exist for the benefit of the backup copy of the files and any future recovery. They have very little effect on the current datafiles and the database being backed up. Throughout the backup, server processes read the datafiles and DBWn writes to them, just as when a backup is not taking place. The only difference in the open database files is the frozen Checkpoint SCN and the active Hot Backup Checkpoint SCN. To demonstrate the principle, we can formulate a simple proof:
Create a table and insert a row:
SQL> create table fruit (kind varchar2(32)) tablespace users;
Table created.
SQL> insert into fruit values ('orange');
1 row created.
SQL> commit;
Commit complete.
Force a checkpoint, to flush dirty blocks to the datafiles.
SQL> alter system checkpoint;
System altered.
Get the file name and block number where the row resides:
SQL> select dbms_rowid.rowid_relative_fno(rowid) file_num,
            dbms_rowid.rowid_block_number(rowid) block_num,
            kind
     from fruit;

FILE_NUM BLOCK_NUM KIND
-------- --------- ------
       4       183 orange

SQL> select name from v$datafile where file# = 4;

NAME
-----------------------------
/u01/oradata/uw01/users01.dbf
Use the dd utility to skip to block 183 and extract the DB block containing the row:
unixhost% dd bs=8k skip=183 count=1 if=/u01/oradata/uw01/users01.dbf | strings
1+0 records in
16+0 records out
orange
Now we put the tablespace into hot backup mode:
SQL> alter tablespace users begin backup;
Tablespace altered.
Update the row, commit, and force a checkpoint on the database.
SQL> update fruit set kind = 'plum';
1 row updated.
SQL> commit;
Commit complete.
SQL> alter system checkpoint;
System altered.
Extract the same block. It shows that the DB block has been written to disk during backup mode:
unixhost% dd bs=8k skip=183 count=1 if=/u01/oradata/uw01/users01.dbf | strings
1+0 records in
16+0 records out
plum
orange
Don’t forget to take the tablespace out of backup mode!
SQL> alter tablespace users end backup;
Tablespace altered.
It is quite clear from this demonstration that datafiles receive writes even during hot backup mode!
This is a post specially for Junping Zhang, who has been waiting patiently for a follow-up to my geek post about finding the system-level setting for parameter use_stored_outlines. His question was: “Is it also possible to find the session-level setting?”
The answer is yes – but it’s harder, version specific, requires a little preparation, and only allows you to see the setting for your own session.
The biggest problem is that the session-level setting is stored in the session’s “fixed global area”, and Oracle Corp. has not exposed a map of this memory as an X$ structure – so we have to find a way of mapping it ourselves – and the map will change with version and platform.
We start by using oradebug to dump a copy of the session global area to a trace file (and this operation requires you to have a fairly high level of privilege in your system), then searching through the trace file for the value “uso”. Once you’ve done this once, you find that the exact name of the variable you want is ugauso_p, and you can find its address.
Once you know the exact name of the variable you’re after you can dump its details to screen rather than to a trace file – the following is a cut-n-paste from a 10.2.0.3 session (with the first few lines of the trace file dump inserted into the middle of the output):
SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump global_area 4
Statement processed.

********************************
----- Dump of the Fixed UGA -----
ub4 ksmugmg1     [91173FC, 9117400) = 0000ABAB
kghds * ksmughpp [9117400, 9117404) = 09117404
ksmpy ksmuh_p    [9117404, 9117520) = 05E1F6D0 0000FFB8 21595A04 091E0048 ...
... etc. ...
********************************

SQL> oradebug dumpvar uga ugauso_p
qolprm ugauso_p [9117A44, 9117A68) = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

SQL> alter session set use_stored_outlines=ABCDEFG;
Session altered.

SQL> oradebug dumpvar uga ugauso_p
qolprm ugauso_p [9117A44, 9117A68) = 00000001 42410007 46454443 00000047 00000000 00000000 00000000 00000000 00000000

SQL>
The numbers in the brackets [9117A44, 9117A68) give you the address of the variable in your session’s virtual memory space – so if you can find this pair of addresses from SQL or PL/SQL you can write some code to read x$ksmmem to get the current value for the variable. One way of working out the addresses is to query x$ksmpp – the list of pga memory pointers.
SQL> select * from x$ksmpp;

ADDR           INDX    INST_ID KSMCHCOM         KSMCHPTR   KSMCHSIZ KSMCHCLS   KSMCHTYP KSMCHPAR
-------- ---------- ---------- ---------------- -------- ---------- -------- ---------- --------
091E8AA4          0          1 kpuinit env han  08EEAB80       1584 freeabl           0 00
091E8A6C          1          1 Alloc environm   08EE9B50       4144 recr           4095 08EEB108
...
091E7F0C         53          1 Fixed Uga        091173F0      20620 freeabl           0 00
...
091E7DBC         59          1 PLS non-lib hp   091CBFB0       8224 freeabl           0 08F9E02C
091E7D84         60          1 free memory      091C0010      49056 free              0 00

61 rows selected.

SQL> spool off
The item we are interested in is the “Fixed Uga” which (for this session, on this machine, on this operating system, on this version of Oracle, after this startup of the instance, etc. etc. etc.) is based at 0x091173f0. Although there are many reasons why the location of the fixed uga isn’t fixed in absolute terms, we know that for a given platform and version, the contents of the fixed uga really are “fixed”. So we can work out the offset of the ugauso_p for this platform and version by subtracting the address of the start of the fixed uga from the address we got from oradebug for ugauso_p, viz: 0x9117A44 - 0x091173F0 = 0x654 = 1,620.
Once we have the offset we can write a pl/sql function to find the start of the fixed uga, jump to the offset, then walk through virtual memory picking up the bytes we need. Basically it’s the same code that I used for the system-level parameter, with a change in how to find the starting address of the value.
create or replace function session_uso return varchar2 as
    uso         varchar2(30);
    mem_addr    x$ksmmem.addr%type := hextoraw('0E9A7838');
    mem_ct      number;
    mem_indx    number;
    raw_val     x$ksmmem.ksmmmval%type;
    indx_temp   number;
    col_size    number;
    raw_char    varchar2(64) := null;
begin

    select  type_size
    into    session_uso.col_size
    from    v$type_size
    where   type = 'SIZE_T'
    ;

    select  ksmchptr, ksmchsiz
    into    session_uso.mem_addr, session_uso.mem_ct
    from    x$ksmpp
    where   ksmchcom = 'Fixed Uga'
    ;

    select  indx
    into    session_uso.mem_indx
    from    x$ksmmem
    where   addr = session_uso.mem_addr
    ;

    dbms_output.put_line('Memory address: ' || rawtohex(session_uso.mem_addr));
    dbms_output.put_line('Start location: ' || session_uso.mem_indx);
    dbms_output.put_line('Nominal Length: ' || session_uso.mem_ct);

--
--  Assume that the offset to USO is 2504 bytes (the offset is version-specific)
--  And that the first t_type is a counter, then each
--  value after that is a letter until we hit 00 for a
--  maximum of 32 letters
--
    mem_indx := mem_indx + 2504/col_size;       -- Windows XP 32-bit (another Oracle version)
--  mem_indx := mem_indx + 1620/col_size;       -- 10.2.0.3 on Windows XP 32-bit

    for i in 1 .. 32 loop

--      Have to do the silly addition, or
--      we get a "tablescan" of x$ksmmem
        session_uso.indx_temp := session_uso.mem_indx + i;

        select  --+ index(x$ksmmem)
                ksmmmval
        into    session_uso.raw_val
        from    x$ksmmem
        where   indx = session_uso.indx_temp
        ;

        dbms_output.put_line(session_uso.raw_val);
        exit when session_uso.raw_val = hextoraw('00');
        raw_char := rawtohex(session_uso.raw_val) || rawtohex(session_uso.raw_char);

    end loop;

    if session_uso.raw_char is null then
        session_uso.uso := 'use_stored_outlines is not set';
    else
        for i in reverse 1..(length(session_uso.raw_char)-4) / 2 loop
            exit when substr(session_uso.raw_char, 2*i - 1, 2) = '00';
            session_uso.uso := session_uso.uso ||
                chr(to_number(substr(session_uso.raw_char, 2*i - 1, 2), 'XX'));
        end loop;
    end if;

    return session_uso.uso;

end;
/
Just as a quick demonstration – here’s a cut-n-paste (including the debug output) from a normal session after creating the function in the sys account of a 10.2.0.3 database and making the function publicly available through a synonym:
SQL> execute dbms_output.put_line(session_uso)
Memory address: 06F373F0
Start location: 13118004
Nominal Length: 20620
00
use_stored_outlines is not set

PL/SQL procedure successfully completed.

SQL> alter session set use_stored_outlines = long_category;

Session altered.

SQL> execute dbms_output.put_line(session_uso)
Memory address: 06F373F0
Start location: 13118004
Nominal Length: 20620
4F4C000D
435F474E
47455441
0059524F
00
LONG_CATEGORY

PL/SQL procedure successfully completed.

SQL>
Now that the "difficult" work is complete - preparing the data - we can move to building a predictive model to help identify and understand churn.
The case study suggests that separate models be built for different customer segments (high, medium, low, and very low value customer groups). To reduce the data to a single segment, a filter can be applied:
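Something along the following lines will do; churn_data_high is the view name used throughout the rest of this post, while the prepared-view name (churn_prep) and the segment column and value are placeholders for illustration:

create or replace view churn_data_high as
select *
from   churn_prep               -- placeholder name for the prepared view from part 2
where  value_segment = 'HIGH';  -- placeholder segment column and value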
It is simple to take a quick look at the predictive aspects of the data on a univariate basis. While this does not capture the more complex multi-variate effects as would occur with the full-blown data mining algorithms, it can give a quick feel as to the predictive aspects of the data as well as validate the data preparation steps. Oracle Data Mining includes a predictive analytics package which enables quick analysis.
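For example, the EXPLAIN routine in the dbms_predictive_analytics package computes per-attribute explanatory values of the kind shown below. A minimal sketch (the result table name is arbitrary; churn_data_high and churn_m6 are the segment view and churn target used later in this post):

begin
  dbms_predictive_analytics.explain(
    data_table_name     => 'churn_data_high',
    explain_column_name => 'churn_m6',
    result_table_name   => 'churn_explain_high');
end;
/

select *
from   churn_explain_high
order  by explanatory_value desc;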
ATTRIBUTE_NAME ATTRIBUTE_SUBNAME EXPLANATORY_VALUE RANK
-------------------- ----------------- ----------------- ----------
LOS_BAND .069167052 1
MINS_PER_TARIFF_MON PEAK-5 .034881648 2
REV_PER_MON REV-5 .034527798 3
DROPPED_CALLS .028110322 4
MINS_PER_TARIFF_MON PEAK-4 .024698149 5
From the above results, it is clear that some predictors do contain information to help identify churn (explanatory value > 0). The strongest uni-variate predictor of churn appears to be the customer's (binned) length of service. The second strongest churn indicator appears to be the number of peak minutes used in the most recent month. The subname column contains the interior piece of the DM_NESTED_NUMERICALS column described in the previous post. By using the object relational approach, many related predictors are included within a single top-level column.
When building a predictive model, we need to ensure that the model is not over-trained. One way to do this is to retain a held-aside dataset and test the model on unseen data. Splitting the data into training and testing datasets is straightforward by using SQL predicates and a hash function:
create or replace view churn_test_high as
select * from churn_data_high where ora_hash(customer_id,99,0) >= 60;
create or replace view churn_train_high as
select * from churn_data_high where ora_hash(customer_id,99,0) < 60;
The above statements will separate the data into a 40% random sample for testing and the remaining 60% for training the model. We can now pass the training data into an Oracle Data Mining routine to create a mining model. In this example, we will use the GLM algorithm with automatic data preparation.
create table churn_set (setting_name varchar2(30), setting_value varchar2(4000));
insert into churn_set values ('ALGO_NAME','ALGO_GENERALIZED_LINEAR_MODEL');
insert into churn_set values ('PREP_AUTO','ON');
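With the settings in place, a create_model call along the following lines builds the model. This is a sketch: customer_id and churn_m6 are the case id and target columns used elsewhere in this post.

begin
  dbms_data_mining.create_model(
    model_name          => 'CHURN_MOD_HIGH',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'churn_train_high',
    case_id_column_name => 'customer_id',
    target_column_name  => 'churn_m6',
    settings_table_name => 'churn_set');
end;
/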
Now that we have built a model - CHURN_MOD_HIGH - we can test that model against the held-aside data to see how it performs.
select actual, predicted, count(*) cnt from (
select churn_m6 actual, prediction(churn_mod_high using *) predicted
from churn_test_high)
group by actual, predicted;
The above query will show the number of correct and incorrect predictions for all combinations, often referred to as a confusion matrix.
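If a single summary number is preferred, the same predictions can be rolled up into an overall accuracy figure; a minimal sketch:

select sum(case when churn_m6 = pred then 1 else 0 end) / count(*) accuracy
from   (select churn_m6, prediction(churn_mod_high using *) pred
        from   churn_test_high);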
Thus, without having to extract data or jump through hoops to massage star schema data into a flattened form, Oracle Data Mining is able to extract insight directly from the rich database data.
This post will follow the transformation steps as described in the case study, but will use Oracle SQL as the means for preparing data. Please see the previous post for background material, including links to the case study and to scripts that can be used to replicate the stages in these posts.
1) Handling missing values for call data records
The CDR_T table records the number of phone minutes used by a customer per month and per call type (tariff). For example, the table may contain one record corresponding to the number of peak (call type) minutes in January for a specific customer, and another record associated with international calls in March for the same customer. This table is likely to be fairly dense (most type-month combinations for a given customer will be present) due to the coarse level of aggregation, but there may be some missing values. Missing entries may occur for a number of reasons: the customer made no calls of a particular type in a particular month, the customer switched providers during the timeframe, or perhaps there is a data entry problem. In the first situation, the correct interpretation of a missing entry would be to assume that the number of minutes for the type-month combination is zero. In the other situations, it is not appropriate to assume zero, but rather to derive some representative value to replace the missing entries. The referenced case study takes the latter approach. The data is segmented by customer and call type, and within a given customer-call type combination, an average number of minutes is computed and used as a replacement value.
In SQL, we need to generate additional rows for the missing entries and populate those rows with appropriate values. To generate the missing rows, Oracle's partition outer join feature is a perfect fit.
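A sketch of that join follows; cdr is the CDR_T fact table and cdre is the driving set of tariff-month combinations discussed below:

select cdre.tariff, cdre.month, cdr.cust_id, cdr.mins
from   cdr_t cdr
       partition by (cdr.cust_id)
       right outer join
       (select distinct tariff, month from cdr_t) cdre
       on (cdr.tariff = cdre.tariff and cdr.month = cdre.month);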
I have chosen to use a distinct on the CDR_T table to generate the set of values, but a more rigorous and performant (but less compact) approach would be to explicitly list the tariff-month combinations in the cdre inlined subquery rather than go directly against the CDR_T table itself.
Now that the missing rows are generated, we need to replace the missing value entries with representative values as computed on a per-customer-call type basis. Oracle's analytic functions are a great match for this step.
We can use the avg function, and specify the partition by feature of the over clause to generate an average within each customer-call type group. The nvl function will replace the missing values with the tailored, computed averages.
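Continuing the sketch from above, the replacement step looks something like this (cdr_dense is a placeholder for the densified result of the partition outer join):

select cust_id, tariff, month,
       nvl(mins, avg(mins) over (partition by cust_id, tariff)) mins
from   cdr_dense;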
2) Transposing call data records
The next transformation step in the case study involves transposing the data in CDR_T from a multiple row per customer format to a single row per customer by generating new columns for all of the tariff-month combinations. While this is feasible with a small set of combinations, it will be problematic when addressing items with higher cardinality. Oracle Data Mining does not need to transpose the data. Instead, the data is combined using Oracle's object-relational technology so that it can remain in its natural, multi-row format. Oracle Data Mining has introduced two datatypes to capture such data - DM_NESTED_NUMERICALS and DM_NESTED_CATEGORICALS.
In addition, the case study suggests adding an attribute which contains the total number of minutes per call type for a customer (summed across all months). Oracle's rollup syntax is useful for generating aggregates at different levels of granularity.
select cust_id,
       cast(collect(dm_nested_numerical(tariff || '-' || month, mins))
            as dm_nested_numericals) mins_per_tariff_mon
from
  (select cust_id, tariff, month, sum(mins) mins
   from   cdr_t
   -- the nested attribute name combines tariff and month (e.g. PEAK-5);
   -- the rollup row (month null) carries the per-tariff total across all months
   group by cust_id, tariff, rollup(month))
group by cust_id;
One of the strengths of Oracle Data Mining is the ability to mine star schemas with minimal effort. Star schemas are commonly used in relational databases, and they often contain rich data with interesting patterns. While dimension tables may contain interesting demographics, fact tables will often contain user behavior, such as phone usage or purchase patterns. Both of these aspects - demographics and usage patterns - can provide insight into behavior.
Churn is a critical problem in the telecommunications industry, and companies go to great lengths to reduce the churn of their customer base. One case study¹ describes a telecommunications scenario involving the understanding and identification of churn, where the underlying data is present in a star schema. That case study is a good example for demonstrating just how natural it is for Oracle Data Mining to analyze a star schema, so it will be used as the basis for this series of posts.
The case study schema includes four tables: CUSTOMERS, SERVICES, REVENUES, and CDR_T. The CUSTOMERS table contains one row per customer, as does the SERVICES table, and both contain a customer id that can be used to join the tables together. Most data mining tools are capable of handling this type of data, where one row of input corresponds to one case for mining. The other two tables have multiple rows for each customer. The CDR_T (call data records) table contains multiple records for each customer, capturing calling behavior. In the case study, this information is already pre-aggregated by type of call (peak, international, etc.) per month, but the information may also be available at a finer level of granularity. The REVENUES table contains the revenue per customer on a monthly basis for a five month history, so there are up to five rows per customer. Capturing the information in the CDR_T and REVENUES tables to help predict churn for a single customer requires collapsing all of this fact table information into a single "case" per customer. Most tools will require pivoting the data into columns, which has the drawbacks of densifying the data and of potentially exceeding column count limitations. The data in a fact table is often stored in sparse form (this case study aggregates it to a denser form, but it need not be this way for other mining activities), and keeping it in sparse form is highly desirable.
For fact table data that has a much larger number of interesting groups (such as per-product sales information of a large retailer), retaining the sparse format becomes critical to avoid densification of such high cardinality information. Oracle Data Mining algorithms are designed to interpret missing entries in a sparse fact table appropriately, enabling increased performance and simpler transformation processing.
Some steps in the referenced case study are not completely defined (in my opinion), and in those situations I will take my best guess as to the intended objective. This approximation is sufficient since the intent of this series of posts is to show the power and flexibility of Oracle Data Mining on a real-world scenario rather than to match the case study letter-for-letter.
The following files support reproduction of the results in this series of posts:
telcoddl.sql - SQL which creates the four tables
telcoloadproc.plb - Obfuscated SQL which creates the procedure that can generate data and populate the tables - all data is generated, and patterns are injected to make it interesting and "real-world" like
telcoprep.sql - A SQL create view statement corresponding to the data preparation steps from part 2 of this series
telcomodel.sql - A SQL script corresponding to the steps from part 3 of this series
In order to prepare a schema that can run the above SQL, a user must be created with the following privileges: create table, create view, create mining model, and create procedure (for telcoloadproc), as well as any other privileges needed for the database user (e.g., create session). Once the schema is prepared, telcoddl.sql and telcoloadproc.plb can be run to create the empty tables and the procedure for loading data. The procedure that is created is named telco_load, and it takes one optional argument - the number of customers (default 10000). The results from parts 2 and 3 of this series correspond to loading 10,000 customers.
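A sketch of the user setup described above (the user name, password, and tablespace are placeholders):

-- user name, password and tablespace quota are placeholders
create user telco identified by telco
  default tablespace users quota unlimited on users;
grant create session, create table, create view,
      create mining model, create procedure to telco;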
The sample code in these posts has been tested against an 11gR2 database. Many new features have been added in each release, so some of the referenced routines and syntax are not available in older releases; however, similar functionality can be achieved with 10g. The following modified scripts can be used with 10g (tested with 10gR2):
telcoprep_10g.sql - A SQL create view statement corresponding to the data preparation steps from part 2 of this series, including substitution for the 11g PIVOT syntax and inclusion of manual data preparation for nested columns.
telcomodel_10g.sql - A SQL script corresponding to the steps from part 3 of this series, including substitution of the Support Vector Machine algorithm for the Generalized Linear Model (which is not available in 10g), manual data preparation leveraging the transformation package, use of dbms_data_mining.apply instead of 10gR2 built-in data mining scoring functions, explicit commit of settings prior to build, and removal of the EXPLAIN routine from the script flow.
In addition, the create mining model privilege is not available in 10g.
The next part in this series will demonstrate how data in a star schema can be prepared for Oracle Data Mining. The final part in this series will mine the data by building and testing a model.
¹ Richeldi, Marco and Perrucci, Alessandro, Churn Analysis Case Study, December 17, 2002.