Search

OakieTags

Who's online

There are currently 0 users and 37 guests online.

Recent comments

Affiliations

August 2010

Joins – NLJ

This is part one of my thesis that “all joins are nested loop joins – it’s just the startup overheads that vary”; there will be a note on “Joins – HJ” and “Joins – MJ” to follow. In some ways, the claim is trivially obvious – a join simply takes two row sources and compares [...]

A Study in Tweeting

I follow @oracledatabase on Twitter for obvious reasons. They tweeted a “case study” last week on the use of Advanced Compression to save money. You can find the case study here The end customer migrated from MSSQL to Oracle for a low terabytes size datawarehouse. Unfortunately we don’t get details of the old hardware or setup, [...]

optimizer_index_cost_adj

ある二つのディスクを性能比較してみると:

Sequential Read(Full Scan) 312.3 151.3
Random Read(Index Scan) 22.6 0.988
Sequential/Random 14 153

左右を比べるとSequential Read性能が優勢(当然)。
左は圧倒的にSequential Read性能が優れているのでFull Scanが有利。
右はRandom Read性能が悪すぎるのでFull Scanが有利。
左のRandom Read性能は右より20倍優れているのでIndex Scanが20倍速い。

だから、左にはIndex Range Scanを使わせたい:
optimizer_index_cost_adj < 100

右はIndex Range ScanよりFull Scanを選択する可能性を高めたい:
optimizer_index_cost_adj > 100

因みに、Exadata1のTPC-Hベンチマークでは:
optimizer_index_cost_adj = 200 を設定している。

Optimizerが常にパーフェクトな結果を出さない理由のひとつにディスクI/O性能の環境依存が挙げられる。

話は変わって、
今日は成田までKyle Haileyさんを迎えにいってきました。
彼はエンバカデロに勤めるOptimizerのスペシャリストです。
ホテルまでの車中、僕が最近はまりまくっているNodes Parallel QueryやIn-Memory DBの話をしました。

ついでに秋葉原のメイドカフェにも行ってみました。
女性客もたくさん来てる。普通です。普通に仕事の話とかして、DBオタクの話も、、、、
、、、これで僕も、ついに本物の「OTAKU」になれました。

Strip SizeとAllocation Unit Size

前回のテストではI/O buffer sizeを最適化することによりdb_file_multiblock_read_count = 64でも安定するようになった。そして、全体を通してみると、この64の結果が一番よい:
db_file_multiblock_read_count = 64, _db_file_direct_io_count = 524288

db_file_multiblock_read_count = 128, _db_file_direct_io_count = 1048576画面右上に出ている数値は最後のサンプリングタイミングでの数値。平均ではない。

今回の試作機はKingstonのお買い得SSD4本でRAID-0を構築した。

そのときのStrip Sizeは128KBとした。

このサイズはIntel Matrix Storage ManagerでRAID0を設定するときの最大値だ。
そして、exFAT形式のファイルをフォーマットするときに、

Allocation Unit Sizeを512KBとした。128KB x 4本 = 512KB

その結果

Time Model Bug

Tasks that are performed via jobs in the database will be double accounted in the system time model that has been introduced with Oracle 10g.

So if you execute significant workload via DBMS_JOB or DBMS_SCHEDULER any system time model related statistic like DB Time, DB CPU etc. that gets recorded for that workload gets double accounted.

This bug is not particularly relevant since your top workloads will still be the same top workloads, because all other statistics (like Elapsed Time, CPU, Buffer Gets etc.) are not affected by the bug.

I mention it only here since the bug (see below for details) as of the time of writing can't yet be found on My Oracle Support in the bug database but I recently came across several AWR reports where the majority of workload was generated via job processes and therefore the time model statistics were effectively doubled.

It might help as a viable explanation if you sometimes wonder why an AWR or Statspack report only captures 50% or less of the recorded total DB Time or DB CPU and where this unaccounted time has gone. If a significant part of the workload during the reporting period has been performed by sessions controlled via DBMS_JOB or DBMS_SCHEDULER then probably most of the unaccounted time is actually not unaccounted but the time model statistics are wrong.

So if you have such an otherwise unexplainable unaccounted DB Time / DB CPU etc. you might want to check if significant workload during the reporting period was executed via the job system. Note that I don't say that this is the only possible explanation of such unaccounted time - there might be other reasons like uninstrumented waits, other bugs etc.

Of course all the percentages that are shown in the AWR / ADDM / Statspack reports that refer to "Percentage of DB Time" or "Percentage of DB CPU" will be too small in such cases.

If the majority of workload during the reporting period has been generated by jobs then you can safely assume that the time model statistics have to be divided by 2 (and the percentages have to be doubled). If you have a mixture of jobs and regular foreground sessions then it will be harder to derive the correct time model statistics.

Note that the "Active Session History" (ASH) is not affected by the bug - the ASH reports always were consistent in my tests regarding the DB Time (respectively the number of samples) and CPU time information.

The following simple test case can be used to reproduce the issue at will. Ideally you should have exclusive access to the test system since any other concurrent activity will affect the test results.

You might want to check the 1000000000 iterations of the simple PL/SQL loop on your particular CPU - on my test system this takes approx. 46 seconds to complete.

The first version assumes that a PERFSTAT user with an installed STATSPACK is present in the database since STATSPACK doesn't require an additional license. An AWR variant follows below.

alter session set nls_language = american nls_territory = america;

store set .settings replace

set echo on timing on define on

define iter="1000000000"

variable snap1 number

exec :snap1 := statspack.snap

declare
n_cnt binary_integer;
begin
n_cnt := 0;
for i in 1..&iter loop
n_cnt := n_cnt + 1;
end loop;
end;
/

variable snap2 number

exec :snap2 := statspack.snap

/* Uncomment this if you want to test via DBMS_JOB
variable job_id number

begin
dbms_job.submit(:job_id, '
declare
n_cnt binary_integer;
n_status integer;
begin
n_cnt := 0;
for i in 1..&iter loop
n_cnt := n_cnt + 1;
end loop;
n_status := dbms_pipe.send_message(''bg_job_complete'');
end;
');
end;
/

commit;
*/

/* Uncomment this if you want to test via DBMS_SCHEDULER */
begin
dbms_scheduler.create_job(
job_name => dbms_scheduler.generate_job_name
, job_type => 'PLSQL_BLOCK'
, job_action => '
declare
n_cnt binary_integer;
n_status integer;
begin
n_cnt := 0;
for i in 1..&iter loop
n_cnt := n_cnt + 1;
end loop;
n_status := dbms_pipe.send_message(''bg_job_complete'');
end;
' , enabled => true);
end;
/

declare
pipe_status integer;
begin
pipe_status := dbms_pipe.receive_message('bg_job_complete');
end;
/

declare
pipe_id integer;
begin
pipe_id := dbms_pipe.remove_pipe('bg_job_complete');
end;
/

variable snap3 number

exec :snap3 := statspack.snap

rem set heading off pagesize 0 feedback off linesize 500 trimspool on termout off echo off verify off

prompt Enter PERFSTAT password

connect perfstat

column dbid new_value dbid noprint

select dbid from v$database;

column instance_number new_value inst_num noprint

select instance_number from v$instance;

column b_id new_value begin_snap noprint
column e_id new_value end_snap noprint

select :snap1 as b_id, :snap2 as e_id from dual;
define report_name=sp_foreground.txt

@?/rdbms/admin/sprepins

column dbid new_value dbid noprint

select dbid from v$database;

column instance_number new_value inst_num noprint

select instance_number from v$instance;

column b_id new_value begin_snap noprint
column e_id new_value end_snap noprint

select :snap2 as b_id, :snap3 as e_id from dual;
define report_name=sp_background.txt

@?/rdbms/admin/sprepins

undefine iter

@.settings

set termout on

Here is the same test case but with AWR reports (requires additional diagnostic license)

alter session set nls_language = american nls_territory = america;

store set .settings replace

set echo on timing on define on

define iter="1000000000"

column snap1 new_value awr_snap1 noprint

select dbms_workload_repository.create_snapshot as snap1 from dual;

declare
n_cnt binary_integer;
begin
n_cnt := 0;
for i in 1..&iter loop
n_cnt := n_cnt + 1;
end loop;
end;
/

column snap2 new_value awr_snap2 noprint

select dbms_workload_repository.create_snapshot as snap2 from dual;

/* Uncomment this if you want to test via DBMS_JOB
variable job_id number

begin
dbms_job.submit(:job_id, '
declare
n_cnt binary_integer;
n_status integer;
begin
n_cnt := 0;
for i in 1..&iter loop
n_cnt := n_cnt + 1;
end loop;
n_status := dbms_pipe.send_message(''bg_job_complete'');
end;
');
end;
/

commit;
*/

/* Uncomment this if you want to test via DBMS_SCHEDULER */
begin
dbms_scheduler.create_job(
job_name => dbms_scheduler.generate_job_name
, job_type => 'PLSQL_BLOCK'
, job_action => '
declare
n_cnt binary_integer;
n_status integer;
begin
n_cnt := 0;
for i in 1..&iter loop
n_cnt := n_cnt + 1;
end loop;
n_status := dbms_pipe.send_message(''bg_job_complete'');
end;
' , enabled => true);
end;
/

declare
pipe_status integer;
begin
pipe_status := dbms_pipe.receive_message('bg_job_complete');
end;
/

declare
pipe_id integer;
begin
pipe_id := dbms_pipe.remove_pipe('bg_job_complete');
end;
/

column snap3 new_value awr_snap3 noprint

select dbms_workload_repository.create_snapshot as snap3 from dual;

set heading off pagesize 0 feedback off linesize 500 trimspool on termout off echo off verify off

spool awr_foreground.html

select
output
from
table(
sys.dbms_workload_repository.awr_report_html(
(select dbid from v$database)
, (select instance_number from v$instance)
, &awr_snap1
, &awr_snap2
)
);

spool off

spool awr_background.html

select
output
from
table(
sys.dbms_workload_repository.awr_report_html(
(select dbid from v$database)
, (select instance_number from v$instance)
, &awr_snap2
, &awr_snap3
)
);

spool off

spool awr_diff.html

select
output
from
table(
sys.dbms_workload_repository.awr_diff_report_html(
(select dbid from v$database)
, (select instance_number from v$instance)
, &awr_snap1
, &awr_snap2
, (select dbid from v$database)
, (select instance_number from v$instance)
, &awr_snap2
, &awr_snap3
)
);

spool off

undefine awr_snap1
undefine awr_snap2
undefine awr_snap3

undefine iter

column snap1 clear
column snap2 clear
column snap3 clear

@.settings

set termout on

And here is a sample snippet from a generated Statspack report on a single CPU system with nothing else running on the system:

Normal foreground execution:

STATSPACK report for

Database DB Id Instance Inst Num Startup Time Release RAC
~~~~~~~~ ----------- ------------ -------- --------------- ----------- ---
orcl112 1 05-Aug-10 08:21 11.2.0.1.0 NO

Host Name Platform CPUs Cores Sockets Memory (G)
~~~~ ---------------- ---------------------- ----- ----- ------- ------------
XXXX Microsoft Windows IA ( 1 0 0 2.0

Snapshot Snap Id Snap Time Sessions Curs/Sess Comment
~~~~~~~~ ---------- ------------------ -------- --------- ------------------
Begin Snap: 13 05-Aug-10 08:34:17 25 1.2
End Snap: 14 05-Aug-10 08:35:05 25 1.2
Elapsed: 0.80 (mins) Av Act Sess: 1.1
DB time: 0.87 (mins) DB CPU: 0.80 (mins)

Cache Sizes Begin End
~~~~~~~~~~~ ---------- ----------
Buffer Cache: 104M Std Block Size: 8K
Shared Pool: 128M Log Buffer: 6,076K

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~ ------------------ ----------------- ----------- -----------
DB time(s): 1.1 2.4 0.09 3.99
DB CPU(s): 1.0 2.2 0.08 3.68

Execution via Job/Scheduler:

STATSPACK report for

Database DB Id Instance Inst Num Startup Time Release RAC
~~~~~~~~ ----------- ------------ -------- --------------- ----------- ---
orcl112 1 05-Aug-10 08:21 11.2.0.1.0 NO

Host Name Platform CPUs Cores Sockets Memory (G)
~~~~ ---------------- ---------------------- ----- ----- ------- ------------
XXXX Microsoft Windows IA ( 1 0 0 2.0

Snapshot Snap Id Snap Time Sessions Curs/Sess Comment
~~~~~~~~ ---------- ------------------ -------- --------- ------------------
Begin Snap: 14 05-Aug-10 08:35:05 25 1.2
End Snap: 15 05-Aug-10 08:35:53 24 1.3
Elapsed: 0.80 (mins) Av Act Sess: 1.9
DB time: 1.55 (mins) DB CPU: 1.54 (mins)

Cache Sizes Begin End
~~~~~~~~~~~ ---------- ----------
Buffer Cache: 104M Std Block Size: 8K
Shared Pool: 128M Log Buffer: 6,076K

Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~ ------------------ ----------------- ----------- -----------
DB time(s): 1.9 92.8 0.79 7.74
DB CPU(s): 1.9 92.1 0.78 7.68

As you might have guessed my single CPU test system has not been added a second CPU when performing the same task via DBMS_SCHEDULER / DBMS_JOB yet the time model reports (almost) 2 DB Time / DB CPU seconds and active sessions per second in that case.

I have reproduced the bug on versions 10.2.0.4, 11.1.0.7 and 11.2.0.1 but very likely all versions supporting the time model are affected.

A (non-public) bug "9882245 - DOUBLE ACCOUNTING OF SYS MODEL TIMINGS FOR WORKLOAD RUN THROUGH JOBS" has been filed for it, but the fix is not available yet therefore as far as I know it is not yet part of any available patch set / PSU.

Note that there seems to a different issue with the DB CPU time model component: If you have a system that reports more CPUs than sockets (for example a Power5, Power6 or Power7 based IBM server that reports 16 sockets / 32 CPUs) then the DB CPU component gets reduced by approximately 50%, which means it is divided by 2.

This means in combination with above bug that you end up with a doubled DB Time component for tasks executed via jobs, but the DB CPU time model component is in the right ballpark since the doubled DB CPU time gets divided by 2.

I don't know if the bug fix also covers this issue, so you might want to keep this in mind when checking any time model based information.

Exadata has arrived, part 1

At the 5th of august, the database machine (half rack) has arrived at VX Company. The machine arrives wrapped (in plastic) on a pallet.

Because the whole rack is already assembled, which means all the parts are already in the rack, the machine is quite heavy (600 kilograms I’ve been told). After some pushing, pulling and lifting together, we arrived at our serverroom.

Next is the hardware configuration and attaching the network cables from the database machine to our network.

After the cables have been mounted and checked, the power is turned on, and all the database (4) and storage servers (7) are turned on!

The 9th of august we carry on configuring the machine with Oracle ‘ACS’ (Advanced Customer Support), to configure the database and storage servers.

Tagged: oracle database machine exadata

Hacking Oracle over the web and exploiting Database Vault

The BlackHat USA event in Caesars Palace Las vegas was on at the end of July and now the papers have been put up on the BlackHat site. I saw that Sumit Siddharth had posted his slides a couple of....[Read More]

Posted by Pete On 06/08/10 At 06:00 PM

Looks simple but ...

(open images in a new window to see full size)
I'm really enjoying DB Optimizers simple drill down into SQL statements that hide complexity, for example this query looks pretty simple:

but a simple double click into "BIG_STATEMENT2" shows me what's really going on:

I am an ACE Director!

Yesterday I received a mail from the Oracle ACE team (Victoria and Lillian) telling me I was nominated for ACE Director, and that the nomination was accepted.

This means I am an ACE Director now!

Thanks to the Oracle ACE team and everybody who was involved and everybody who congratulated me!

Tagged: oracle ace aced director

Michigan OakTable Symposium

I'm excited to be speaking about Visual SQL Tuning at Michigan OakTable Symposium. The symposium sounds awesome with many of my favorite Oracle experts. Looks to be the mid-year northern version of HOTSOS.
"The Michigan OakTable Symposium will take place September 16-17th, 2010, in Ann Arbor, Michigan. The Michigan OakTable Symposium brings together a world-class list of speakers on a variety of Oracle database subjects, with a focus on DBA and development subjects.
Due to the size of the conference hall, this event is strictly limited to 300 attendees! This provides a unique opportunity for easy access to all speakers, with plenty of opportunities for one-on-one interaction, discussion, and networking, with the leading experts in Oracle database administration and development.
A few of the confirmed speakers are:
ACE Directors - Doug Burns, Alex Gorbachev, Jonathan Lewis, Cary Millsap, Mogens Norgaard, Tanel Poder
ACE's - Randolph Geist, Tim Gorman, Marco Gralike, Joze Senegacnik, Riyaj Shamsudeen, Chen Shapira"