ASM:Shared Diskを実現するためのOracleコンポーネント
OEL:Oracle Enterprose Linux
CELLSRV:Cell Service Databaseとのコミュニケーション
MS:Management Service
RS:Restart Service
IORM:I/O Resource Manager
ASMがあるからOracle RACのShared Disk環境が構築できる。そして従来であれば高価なSANディスクアレイ装置を複数台用意しなければ実現できなかったI/O転送量、バンド幅を実現しているソフトウェア側の立役者的な存在といえる。
ASMと同様なオープンソースにSolaris ZFSがあるが、Oracleを使うのならASMの方がパフォーマンス面で有利。
ZFS機能概要:@IT記事より抜粋より:
Sun Flash Accelerator
HDDのキャッシュとして使われる
SSDの技術を使いHDDの転送量を上げている
Exadata Storage Server(Exadata Cell)には96GB(容量)x4枚=384GBのFlash Accelaratorが装備されている。
NANDメモリにはSAMSUNG製が使われ、コントローラには現時点で唯一6GbpsのSATA3接続が可能なMarvel社製が使われている。
同様のテクノロジー参考資料:
http://ichiroobata.blogspot.com/2010_07_01_archive.html
- Intel Xeonなどに代表される高性能なコアが作り出すあるいは必要とする並列計算データが膨大となり
- Gigabit EtherNetを従来通り用いた並列システムでは、ネットワークの通信能力を超える通信が発生し、多くの通信要求が待ち状態になって、並列処理がスムーズに進まない
- ネットワークが飽和して並列計算性能が伸びないという現象が、高速コンピュータでは少ない台数で発生し、低速コンピュータではより多くの台数を接続した時に発生してしまう
- そこでGigabit EtherNetの10倍以上の通信性能を持ち、高いバンドレートと低い遅延で、並列処理を実現する新しいネットワークファシリティ、InfiniBandが登場した
- 世界の並列コンピュータシステムをリストしたTop500 (http://www.top500.org) でも、InfiniBandが他を圧倒している
参考:株式会社コンカレントシステムズの記事より(http://www.concurrent.co.jp/products/option/infiniband.html)
http://ja.wikipedia.org/wiki/InfiniBandより抜粋
1998年後半、基幹業務に使用される各種サーバは、ムーアの法則に従い、ほぼ18カ月に2倍のペースで高速化されてきていた。しかし、サーバ内のバスアーキテクチャは機械・電気的制限により、その進化に追い付けず、コンピュータシステム性能向上のボトルネックとなっていた。
What tools do you use? What are your favorites and why? As someone who has been working Oracle performance tools for the last 15 years, I’m always interested in what the best tools are. Here is my list of tools that I’m aware of (what other tools are out there?)
| price | would I use it ? |
db perf | multi-db | sql tune | notes | ||
| Specialty | |||||||
| orasrp | free | yes | x | sql trace analyzer | |||
| 10053 Viewer | free | yes | x | replace .doc with .zip and unzip | |||
| xplan | free | yes | x | extend explain plan info from Alberto Dell Era | |||
| latchprof | free | yes | x | Tanel’s latch collection query | |||
| waitprof | free | yes | x | Tanel’s wait collection query | |||
| tvdxstat | free | yes | x | Trivadis Extended Tracefile Analysis Tool | |||
| spviewer | free | yes | x | this looks interesting for statspack/AWR viewing and comparison | |||
| awr formatter | free | yes | x | chrome plugin for AWR browsing – see screencast | |||
| snapper | free | yes | x | Tanel’s command line performance analysis | |||
| moats | free | yes | x | character mode top activity screen ! | |||
| hotsos tools | ?? | sql trace analyzer, other stuff | |||||
| DBA | |||||||
| Ashviewer | free | yes | collects ASH data itself or uses v$active_session_history – pretty cool |
||||
| ASHMon | free | yes | reads ASH data (requires license) unless SASH is installed and pointed to | ||||
| MyOra | free | x | x | Never tried it. Looks fast and covers broad functionalitybut UI looks busy | |||
| Mumbai | free | yes | x | integrates snapper and orasrp | |||
| Richmon | free | x | |||||
| Lab128 | $500/seat | yes | x | x | |||
| DB Optimizer | $1500/seat | yes | x | x | visual sql tuning diagrams | ||
| Quest Spotlight | $1000/cpu | x | x | x | |||
| Quest SQL Optimizer | $1690/seat | x | |||||
| Enterprise | |||||||
| Quest Performance Analyzer | $3000/cpu | yes | x | x | |||
| Oracle Diag Pack | $5000/cpu | yes | x | x | |||
| Oracle SQL Tuning Pack | $5000/cpu | x | x | ||||
| Confio ignite | $1500/core? | x | x | ||||
| Precise Indepth I3 | $2000/cpu? | x | x | ||||
| Monicle | ?? | ||||||
| other stuff | |||||||
| fourth elephant | |||||||
| DBtuna | |||||||
(some of the prices I got off of programmers paradise, others are word of mouth, so any clarifications on prices would be appreciated)
The tools marked “yes” are ones that I use or would use. I would use Quest’s Performance Analyzer given the chance. I’ve never used it but from the demos of the tool, I like what I see. I have never used Mumbai, but plan to and like again what I see in the demos.
All the other tools marked “yes” I use.
All the tools except “other stuff”, I consider reasonable tools for one job or another. Some are better than others in different ways. The “other stuff” tools I don’t see much point in.
I generally consider OEM with diagnostic pack too expensive and slow, but hey, if customers have it, I’ll use it ! (I had better like it as I designed the performance page and top activity page , though much of what I wanted didn’t make it in) Plus diag pack comes with v$active_session_history and all of AWR which I can query directly with OEM.
I tried to design “DB Optimizer” to be simple to install on my laptop and just point at databases and immediately start monitoring and tuning. The biggest advantage of “DB Optimizer” over other tools, for me, is the Visual SQL Tuning diagrams. Other nice features are being able to load test code by multiple concurrent sessions. With concurrent sessions running code examples, I can easily demonstrate issues that arise in the database with the profiling page.
But for absolute ease of install and speed, I have to hand it to Lab128. Lab128 runs fast, runs it’s own ASH collection (which doesn’t require diag pack) and has the option to read v$active_session_history as well.
Though Lab128 is fast, collects ASH info and can monitor multiple databases simultaneously for days on end, it still doesn’t allow me a real easily accessible centralized database of performance data. That’s why I created “S-ASH” which can collect ASH data from multiple databases into a central repository where I can run analytic queries or visualize it with ASHMon.
ASHMon and S-ASH are still rough pieces of code as I spent the last 3 years working solely on DB-Optimizer.
ASHViewer is a tool that will also read ASH data from Oracle and/or collect ASH on it’s own, which is awesome. ASHviewer can also be hooked up to S-ASH repository with some minor changes. Here are the minor changes (thanks to Marcin Przepiorowski)
SQL Baslines in 11g are the new Stored Outlines – and one of the nicest features of SQL Baselines is that you are allowed to fake them; or rather, it’s legal to generate an execution plan for one query and transfer its execution plan to another query using the packaged procedure dbms_spm.load_plans_from_cursor_cache(). This posting is a demonstration of the technique.
We start with a sample data set and a query that is going to do “the wrong thing”. As usual I’ve got a locally managed tablespace with 8KB blocks and 1MB uniform extents, freelist management, and I’m running with CPU Costing disabled (and running 11.1.0.6 in this case):
set serveroutput off
create table t1
as
with generator as (
select --+ materialize
rownum id
from dual
connect by
rownum <= 10000
)
select
rownum id,
mod(rownum,1000) n1000,
lpad(rownum,10,'0') small_vc,
rpad('x',100) padding
from
generator v1,
generator v2
where
rownum <= 10000
;
create index t1_n1 on t1(n1000);
begin
dbms_stats.gather_table_stats(
ownname => user,
tabname =>'T1',
method_opt => 'for all columns size 1'
);
end;
/
select
/*+ target_query */
id
from
t1
where
n1000 = 500
;
The final query is the one that I want to fake a baseline for. With my current set up it does an index range scan to pick up 10 rows, but I’m going to make it do a tablescan instead. I’m going to need to pull the exact text of the query from memory in a moment, so I’ll find its sql_id and child_number by searching for the “pseudo-hint” that I’ve included in the text, and I’ll report the execution plan to show that I’ve picked up the right thing (I’m assuming that there’s only one piece of SQL that’s going to include the text “target_query”, of course):
column sql_id new_value m_sql_id_1
column plan_hash_value new_value m_plan_hash_value_1
column child_number new_value m_child_number_1
select
sql_id, plan_hash_value, child_number
from
v$sql
where
sql_text like '%target_query%'
and sql_text not like '%v$sql%'
and rownum = 1
;
select * from table(dbms_xplan.display_cursor('&m_sql_id_1',&m_child_number_1));
SQL_ID 306m98cpu9yz7, child number 0
-------------------------------------
select /*+ target_query */ id from t1 where n1000 = 500
Plan hash value: 1420382924
---------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 11 |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 10 | 80 | 11 |
|* 2 | INDEX RANGE SCAN | T1_N1 | 10 | | 1 |
---------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("N1000"=500)
The next step is to create a new query (and in a production system that might simply mean running a heavily hinted version of the target query) that uses the execution plan I want to see; and I’ll use the same search technique to find it and report its plan:
select
/*+ full(t1) alternate_query */
id
from
t1
where
n1000 = 500
;
column sql_id new_value m_sql_id_2
column plan_hash_value new_value m_plan_hash_value_2
column child_number new_value m_child_number_2
select
sql_id, plan_hash_value, child_number
from
v$sql
where
sql_text like '%alternate_query%'
and sql_text not like '%v$sql%'
and rownum = 1
;
select * from table(dbms_xplan.display_cursor('&m_sql_id_2',&m_child_number_2));
SQL_ID bvpb73bb6c6uy, child number 0
-------------------------------------
select /*+ full(t1) alternate_query */ id from t1 where n1000 = 500
Plan hash value: 3617692013
----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 28 |
|* 1 | TABLE ACCESS FULL| T1 | 10 | 80 | 28 |
----------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("N1000"=500)
After been running and reporting these queries and their plans, I’ve captured the SQL_Id, child_number, and plan_hash_value for each query; and this is more than enough information to make it possible to create an SQL Baseline for one query using the execution plan for the other query.
declare m_clob clob; begin select sql_fulltext into m_clob from v$sql where sql_id = '&m_sql_id_1' and child_number = &m_child_number_1 ; dbms_output.put_line(m_clob); dbms_output.put_line( dbms_spm.load_plans_from_cursor_cache( sql_id => '&m_sql_id_2', plan_hash_value => &m_plan_hash_value_2, sql_text => m_clob, fixed => 'NO', enabled => 'YES' ) ); end; /
I used the SQL_ID and child_number from the first query to get the full SQL text of the query into an in-memory CLOB, and then use the SQL_id and plan_hash_value from the second query to associate the second plan with the first query – storing the result as a SQL Baseline that is enabled and ready for use.
You’ll have to take my word that I haven’t faked the following text, but this is what I get when I re-run the original query (flushing the shared pool first to make sure that I don’t accidentally end up picking up the original child cursor):
alter system flush shared_pool;
select
/*+ target_query */
id
from
t1
where
n1000 = 500
;
select * from table(dbms_xplan.display_cursor('&m_sql_id_1',null));
SQL_ID 306m98cpu9yz7, child number 1
-------------------------------------
select /*+ target_query */ id from t1 where n1000 = 500
Plan hash value: 3617692013
----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 28 |
|* 1 | TABLE ACCESS FULL| T1 | 10 | 80 | 28 |
----------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("N1000"=500)
Note
-----
- cpu costing is off (consider enabling it)
- SQL plan baseline SYS_SQL_PLAN_3c0ea7f3dbd90e8e used for this statement
It’s so much easier than the fiddling one had to do for stored outlines which was quite easy in 8i, but got a little nasty in 9i because of the extra (undocumented) details that looked as if they might have been necessary when the third table appeared in the outln schema. However, in 10g, the dbms_outln package was enhanced to allow you to create outlines from the library cache – see this note from Randolf Geist for more details – but remember that stored outlines will be deprecated in Oracle 12.
Footnote: the dbms_spm package is created by script $ORACLE_HOME/rdbms/admin/dbmsspm.sql, and there is a comment near the top saying:
-- Grant the execute privilege on SPM package to public. -- -- But ADMINISTER SQL MANAGEMENT OBJECT privilege is checked -- -- before a user is allowed to execute. --
Ringing in the new year with a new look on wordpress. WordPress offers me better aesthetic possibilities and customization options than blogspot. Blogspot was great to get started fast. Fast believe it or not has more appeal to me than aesthetics. Aesthetics, graphics and design are super important to me, so once I have something working and have the time I then want to spend some time improving it’s look and feel, but really the technology, functionality and easy of creation are the first and foremost.
Thanks to Tanel Poder and Greg Rahn for the template idea and for tweaks in the wordpress theme. The theme was tweaked by Greg and then repurposed by Tanel who help me get a copy and then get started.
For plugins, I used
Oracle 11g has introduced a hell lot of new features related to Parallel Execution – like statement queuing, in-memory PX and auto-DOP, to name a few. There are also small bits of enhancements here and there – like default value of the PARALLEL_EXECUTION_MESSAGE_SIZE and an introduction of an easy way to stick PX slaves to a current RAC node with PARALLEL_FORCE_LOCAL.

Over the past few weeks several people have asked me about an Exadata article entitled “Making the Most of Oracle Exadata” by Marc Fielding of Pythian. Overall it’s an informative article and touches on many of the key points of Exadata, however, even though I read (skimmed is a much better word) and briefly commented on the article back in August, after further review I found some technical inaccuracies with this article so I wanted to take the time to clarify this information for the Exadata community.
Marc writes:
style="text-decoration: underline;">Smart scans: Smart scans are Exadata’s headline feature. They provide three main benefits: reduced data transfer volumes from storage servers to databases, CPU savings on database servers as workload is transferred to storage servers, and improved buffer cache efficiency thanks to column projection. Smart scans use helper processes that function much like parallel query processes but run directly on the storage servers. Operations off-loadable through smart scans include the following:
- Predicate filtering – processing WHERE clause comparisons to literals, including logical operators and most SQL functions.
- Column projection – by looking at a query’s SELECT clause, storage servers return only the columns requested, which is a big win for wide tables.
- Joins – storage servers can improve join performance by using Bloom filters to recognize rows matching join criteria during the table scan phase, avoiding most of the I/O and temporary space overhead involved in the join processing.
- Data mining model scoring – for users of Oracle Data Mining, scoring functions like PREDICT() can be evaluated on storage servers.
I personally would not choose a specific number of benefits from Exadata Smart Scan, simply stated, the design goal behind Smart Scan is to reduce the amount of data that is sent from the storage nodes (or storage arrays) to the database nodes (why move data that is not needed?). Smart Scan does this in two ways: it applies the appropriate column projection and row restriction rules to the data as it streams off of disk. However, projection is not limited to just columns in the SELECT clause, as Marc mentions, it also includes columns in the WHERE clause as well. Obviously JOIN columns need to be projected to perform the JOIN in the database nodes. The one area that Smart Scan does not help with at all is improved buffer cache efficiency. The reason for this is quite simple: Smart Scan returns data in blocks that were created on-the-fly just for that given query — it contains only the needed columns (projections) and has rows filtered out from the predicates (restrictions). Those blocks could not be reused unless someone ran the exact same query (think of those blocks as custom built just for that query). The other thing is that Smart Scans use direct path reads (cell smart table scan) and these reads are done into the PGA space, not the shared SGA space (buffer cache).
As most know, Exadata can easily push down simple predicates filters (WHERE c1 = ‘FOO’) that can be applied as restrictions with Smart Scan. In addition, Bloom Filters can be applied as restrictions for simple JOINs, like those commonly found in Star Schemas (Dimensional Data Models). These operations can be observed in the query execution plan by the JOIN FILTER CREATE and JOIN FILTER USE row sources. What is very cool is that Bloom Filters can also pass their list of values to Storage Indexes to aid in further I/O reductions if there is natural clustering on those columns or it eliminates significant amounts of data (as in a highly selective set of values). Even if there isn’t significant data elimination via Storage Indexes, a Smart Scan Bloom Filter can be applied post scan to prevent unneeded data being sent to the database servers.
Marc writes:
style="text-decoration: underline;">Storage indexes: Storage indexes reduce disk I/O volumes by tracking high and low values in memory for each 1-megabyte storage region. They can be used to give partition pruning benefits without requiring the partition key in the WHERE clause, as long as one of these columns is correlated with the partition key. For example, if a table has order_date and processed_date columns, is partitioned on order_date, and if orders are processed within 5 days of receipt, the storage server can track which processed_date values are included in each order partition, giving partition pruning for queries referring to either order_date or processed_date. Other data sets that are physically ordered on disk, such as incrementing keys, can also benefit.
In Marc’s example he states there is correlation between the two columns PROCESSED_DATE and ORDER_DATE where PROCESSED_DATE = ORDER_DATE + [0..5 days]. That’s fine and all, but to claim partition pruning takes place when specifying ORDER_DATE (the partition key column) or PROCESSED_DATE (non partition key column) in the WHERE clause because the Storage Index can be used for PROCESSED_DATE is inaccurate. The reality is, partition pruning can only take place when the partition key, ORDER_DATE, is specified, regardless if a Storage Index is used for PROCESSED_DATE.
Partition Pruning and Storage Indexes are completely independent of each other and Storage Indexes know absolutely nothing about partitions, even if the partition key column and another column have some type of correlation, as in Marc’s example. The Storage Index simply will track which Storage Regions do or do not have rows that match the predicate filters and eliminate reading the unneeded Storage Regions.
Marc writes:
style="text-decoration: underline;">Columnar compression: Hybrid columnar compression (HCC) introduces a new physical storage concept, the compression unit. By grouping many rows together in a compression unit, and by storing only unique values within each column, HCC provides storage savings in the range of 80 90% based on the compression level selected. Since data from full table scans remains compressed through I/O and buffer cache layers, disk savings translate to reduced I/O and buffer cache work as well. HCC does, however, introduce CPU and data modification overhead that will be discussed in the next section.
The Compression Unit (CU) for Exadata Hybrid Columnar Compression (EHCC) is actually a logical construct, not a physical storage concept. The compression gains from EHCC come from column-major organization of the rows contained in the CU and the encoding and transformations (compression) that can be done because of that organization (like values are more common within the same column across rows, vs different columns in the same row). To say EHCC only stores unique values within each column is inaccurate, however, the encoding and transformation algorithms use various techniques that yield very good compression by attempting to represent the column values with as few bytes as possible.
Data from EHCC full table scans only remains fully compressed if the table scan is not a Smart Scan, in which case the compressed CUs are passed directly up to the buffer cache and the decompression will then be done by the database servers. However, if the EHCC full table scan is a Smart Scan, then only the columns and rows being returned to the database nodes are decompressed by the Exadata servers, however, predicate evaluations can be performed directly on the EHCC compressed data.
Read more: Exadata Hybrid Columnar Compression Technical White Paper
Marc also writes:
Use columnar compression judiciously: Hybrid columnar compression (HCC) in Exadata has the dual advantages of reducing storage usage and reducing I/O for large reads by storing data more densely. However, HCC works only when data is inserted using bulk operations. If non-compatible operations like single-row inserts or updates are attempted, Exadata reverts transparently to the less restrictive OLTP compression method, losing the compression benefits of HCC. When performing data modifications such as updates or deletes, the entire compression unit must be uncompressed and written in OLTP-compressed form, involving an additional disk I/O penalty as well.
EHCC does require bulk direct path load operations to work. This is because the compression algorithms that are used for EHCC need sets of rows as input, not single rows. What is incorrect with Marc’s comments is that when a row in a CU is modified (UPDATE or DELETE), the entire CU is not uncompressed and changed to non-EHCC compression, only the rows that are UPDATED are migrated to non-EHCC compression. For DELETEs no row migrations take place at all. This is easily demonstrated by tracking ROWIDs as in the example at the bottom of this post.
Marc writes:
style="text-decoration: underline;">Flash cache: Exadata s flash cache supplements the database servers buffer caches by providing a large cache of 384 GB per storage server and up to 5 TB in a full Oracle Exadata Database Machine, considerably larger than the capacity of memory caches. Unlike generic caches in traditional SAN storage, the flash cache understands database-level operations, preventing large non-repeated operations such as backups and large table scans from polluting the cache. Since flash storage is nonvolatile, it can cache synchronous writes, providing performance benefits to commit-intensive applications.
While flash (SSD) storage is indeed non-volatile, the Exadata Smart Flash Cache is volatile – it loses all of its contents if the Exadata server is power cycled. Also, since the Exadata Smart Flash is currently a write-through cache, it offers no direct performance advantages to commit-intensive applications, however, it does offer indirect performance advantages by servicing read requests that would otherwise be serviced by the HDDs, thus allowing the HDDs to service more write operations.
Read more: Exadata Smart Flash Cache Technical White Paper
--
-- EHCC UPDATE example - only modified rows migrate
--
SQL> create table order_items1
2 compress for query high
3 as
4 select rownum as rnum, x.*
5 from order_items x
6 where rownum <= 10000;
Table created.
SQL> create table order_items2
2 as
3 select rowid as rid, x.*
4 from order_items1 x;
Table created.
SQL> update order_items1
2 set quantity=10000
3 where rnum in (1,100,1000,10000);
4 rows updated.
SQL> commit;
Commit complete.
SQL> select b.rnum, b.rid before_rowid, a.rowid after_rowid
2 from order_items1 a, order_items2 b
3 where a.rnum(+) = b.rnum
4 and (a.rowid != b.rid
5 or a.rowid is null)
6 order by b.rnum
7 ;
RNUM BEFORE_ROWID AFTER_ROWID
--------------- ------------------ ------------------
1 AAAWSGAAAAAO1aTAAA AAAWSGAAAAAO1aeAAA
100 AAAWSGAAAAAO1aTABj AAAWSGAAAAAO1aeAAB
1000 AAAWSGAAAAAO1aTAPn AAAWSGAAAAAO1aeAAC
10000 AAAWSGAAAAAO1aXBEv AAAWSGAAAAAO1aeAAD
--
-- EHCC DELETE example - no rows migrate
--
SQL> create table order_items1
2 compress for query high
3 as
4 select rownum as rnum, x.*
5 from order_items x
6 where rownum <= 10000;
Table created.
SQL> create table order_items2
2 as
3 select rowid as rid, x.*
4 from order_items1 x;
Table created.
SQL> delete from order_items1
2 where rnum in (1,100,1000,10000);
4 rows deleted.
SQL> commit;
Commit complete.
SQL> select b.rnum, b.rid before_rowid, a.rowid after_rowid
2 from order_items1 a, order_items2 b
3 where a.rnum(+) = b.rnum
4 and (a.rowid != b.rid
5 or a.rowid is null)
6 order by b.rnum
7 ;
RNUM BEFORE_ROWID AFTER_ROWID
--------------- ------------------ ------------------
1 AAAWSIAAAAAO1aTAAA
100 AAAWSIAAAAAO1aTABj
1000 AAAWSIAAAAAO1aTAPn
10000 AAAWSIAAAAAO1aXBEv
name="googleone_share_1" style="position:relative;z-index:5;float: right; margin-left: 10px;">
Recent comments
17 weeks 1 day ago
26 weeks 6 days ago
28 weeks 4 days ago
31 weeks 6 days ago
34 weeks 1 day ago
43 weeks 4 days ago
45 weeks 1 day ago
46 weeks 1 day ago
46 weeks 2 days ago
49 weeks 20 hours ago