Search

OakieTags

Who's online

There are currently 0 users and 32 guests online.

Recent comments

Affiliations

Oakies Blog Aggregator

Notes on Learning MySQL (as an Oracle DBA)

This post originally appeared over at Pythian. There are also some very smart comments over there that you shouldn’t miss, go take a look!

I spent some time last month getting up to speed on MySQL. One of the nice perks of working at Pythian is the ability to study during the workday. They could have easily said “You are an Oracle DBA, you don’t need to know MySQL. We have enough REAL MySQL experts”, but they didn’t, and I appreciate.

So how does an Oracle DBA goes about learning MySQL?
Obviously you start by reading the docs. Specifically, I looked for the MySQL equivalent of the famous Oracle “Concepts Guide”.
Unfortunately, it doesn’t exist. I couldn’t find any similar overview of the architecture and the ideas behind the database. The first chapter of “High Performance MySQL” had a high level architecture review, which was useful but being just one chapter in a book, it lacked many of the details I wanted to learn. Peter Zaitsev’s “InnoDB Architecture” presentation had the kind of information I needed – but covered just InnoDB.

Thats really too bad because I definitely feel the lack – which I can easily tell you what Oracle does when you connect to a database, run a select, an update, commit or rollback – I can’t say the same about MySQL. So far I managed without this knowledge, but I have a constant worry that this will come back and bite me later.

Lacking a concepts guide, I read the documentation I had access to: Sheeri has nice presentations available for Pythian employees (and probably customers too. I’m not sure if she ever released them to the whole world). The official documentation is not bad either – it covers syntax without obvious errors and serves as a decent “how do I do X?” guide.

But reading docs is only half the battle. The easier half too. So I installed MySQL 5.1 on my Ubuntu from ready packages. Then I installed MySQL 5.5 from the tarball – which was not nearly as much fun, but by the time this worked I know much more about where everything is located and the various ways one can mis-configure MySQL.

Once the installation was successfull, I played a bit with users, schemas and databases. MySQL is weird – Schemas are called databases, users have many-to-many relation with databases. If a user logs in from a differnet IP, it is almost like a different user. If you delete all the data files and restart MySQL – it will create new empty data files instead. You can easily start a new MySQL server on the same physical box by modifying one file and creating few directories.

MySQL docs make a very big deal about storage engines. There are only 2 things that are important to rememeber though: MyISAM is non-transactional and is used for mysql schema (the data dictionary), it doesn’t have foreign keys or row level locks. InnoDB is transactional, has row level locks and is used everywhere else.

There are a confusing bunch of tools for backing up MySQL. MySQLDump is the MySQL equivalent of Export. Except that it creates a file full of the SQL commands required to recreate the database. These files can grow huge very fast, but it is very easy to restore from them, restore any parts of the schema or even modifying the data or schema before restoring.
XTRABackup is a tool for consistent backups of InnoDB schema (remember that in MyISAM there are no transactions so consistent backups is rather meaningless). It is easy to use – one command to backup, two commands to restore. You can do PITR of sorts with it, and you can restore specific data files. It doesn’t try to manage the backup policies for you the way RMAN does – so cleaning old backups is your responsibility.

Replication is considered a basic skill, not an advanced skill like in the Oracle world. Indeed once you know how to restore from a backup, setting up replication is trivial. It took me about 2 hours to configure my first replication in MySQL. I think in Oracle Streams it took me few days, and that was on top of years of other Oracle experience.

Having access to experienced colleagues who are happy to spend time teaching a newbie is priceless. I already mentioned Sheeri’s docs. Chris Schneider volunteered around 2 hours of his time to introduce me to various important configuration parameters, innoDB secrets and replication tips and tricks. Raj Thukral helped me by providing step by step installation and replication guidance and helping debug my work. I’m so happy to work with such awesome folks.

To my shock and horror, at that point I felt like I was done. I learned almost everything important there was to know about MySQL. It took a month. As an Oracle DBA, after two years I still felt like a complete newbie, and even today there are many areas I wish I had better expertise. I’m sure it is partially because I don’t know how much I don’t know, but MySQL really is a rather simple DB – there is less to tweak, less to configure, fewer components, less tools to learn.

Jonathan Lewis once said that he was lucky to learn Oracle with version 6, because back then it was still relatively simple to learn, but the concepts didn’t change much since so what he learned back then is still relevant today. Maybe in 10 years I’ll be saying the same about MySQL.

Joins

There are only three join mechanisms used by Oracle: merge join, hash join and nested loop join. Here’s an important thought: all three join methods are nested loop joins with different startup cost. Discuss. (I’ll be saying more about this in about a week’s time.) And while I’m asking questions: what’s the perfect tense of [...]

試作機でTPC-Hベンチマークテストを行う

前回の続きでKingstonのお買い得SSD4本で構築したRAID-0のTPC-Hベンチマークを行った。

初めは、1セッション、パラレル度=6でテスト:

そのときのディスク転送量は:
300-400MB/s程度しか出ていない。

CrystalDiskMarkで計った限界量は
1GBのReadで最高522MB/sを出しているのに、およそ120MB足りない。

そこで、4セッションにして、もう一度実行してみた:

限界転送量が出た。

522MB/sはすごい。でも、1セッションだと、パラレル度を増やしても、そこまで行かなかった。
Indexレンジスキャンなどが動かないように:
optimizer_index_cost_adj=1000;
と設定している。
それにTemporary sortも起こさないようにPGA_AGREGATE_TARGETも十分にとっている。
そしてCPUは合計で30%-70%程度しか消費していない。
そして、限界転送量を出し続けているにもかかわらず、qphは減速を繰り返し、最後は計測時間のタイムオーバーとなった:

  • Heavy SQLは全てdirect path read
  • 1セッションだと限界転送量は出ないのか?
  • 同時4セッションで失速する理由は?
  • 始めはqph性能を上げるのではなくdirect path readで522MB/sの限界転送量を出すことにこだわってみたい。

    waits event micro_sec sql
    7 direct path read 101175 select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
    from customer, orders, lineitem
    where o_orderkey in
    ( select l_orderkey
    from lineitem
    group by l_orderkey
    having sum(l_quantity) > 313)
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey
    group by c_name, c_custkey
    , o_orderkey
    , o_orderdate
    , o_totalprice
    order by o_totalprice desc, o_orderdate
    4 direct path read 106464 select nation, o_year, sum(amount) as sum_profit
    from ( select n_name as nation, extract(year from o_orderdate) as o_year
    , l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
    from part, supplier, lineitem, partsupp, orders, nation
    where s_suppkey = l_suppkey
    and ps_suppkey = l_suppkey
    and ps_partkey = l_partkey
    and p_partkey = l_partkey
    and o_orderkey = l_orderkey
    and s_nationkey = n_nationkey
    and p_name like '%navy%') profit
    group by nation, o_year
    order by nation, o_year desc
    4 direct path read 111444 select l_shipmode, sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 else 0 end) as high_line_count
    , sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count
    from orders, lineitem
    where o_orderkey = l_orderkey
    and l_shipmode in ('SHIP', 'FOB')
    and l_commitdate < l_receiptdate
    and l_shipdate < l_commitdate
    and l_receiptdate >= date '1997-01-01'
    and l_receiptdate < date '1997-01-01' + interval '1' year
    group by l_shipmode
    order by l_shipmode
    4 direct path read 112964 select s_name, count(*) as numwait
    from supplier, lineitem l1, orders, nation
    where s_suppkey = l1.l_suppkey
    and o_orderkey = l1.l_orderkey
    and o_orderstatus ='F'
    and l1.l_receiptdate > l1.l_commitdate
    and exists
    ( select * from lineitem l2
    where l2.l_orderkey = l1.l_orderkey
    and l2.l_suppkey <> l1.l_suppkey)
    and not exists
    ( select * from lineitem l3
    where l3.l_orderkey = l1.l_orderkey
    and l3.l_suppkey <> l1.l_suppkey
    and l3.l_receiptdate > l3.l_commitdate)
    and s_nationkey= n_nationkey
    and n_name = 'CHINA'
    group by s_name
    order by numwait desc, s_name
    9 direct path read 127263 select s_name, count(*) as numwait
    from supplier, lineitem l1, orders, nation
    where s_suppkey = l1.l_suppkey
    and o_orderkey = l1.l_orderkey
    and o_orderstatus = 'F'
    and l1.l_receiptdate > l1.l_commitdate
    and exists
    ( select * from lineitem l2
    where l2.l_orderkey = l1.l_orderkey
    and l2.l_suppkey <> l1.l_suppkey)
    and not exists
    ( select * from lineitem l3
    where l3.l_orderkey = l1.l_orderkey
    and l3.l_suppkey <> l1.l_suppkey
    and l3.l_receiptdate > l3.l_commitdate)
    and s_nationkey = n_nationkey
    and n_name = 'EGYPT'
    group by s_name
    order by numwait desc, s_name
    5 direct path read 161197 select supp_nation, cust_nation, l_year, sum(volume) as revenue
    from
    ( select n1.n_name as supp_nation, n2.n_name as cust_nation
    , extract(year from l_shipdate) as l_year, l_extendedprice * (1 - l_discount) as volume
    from supplier, lineitem, orders, customer, nation n1, nation n2
    where s_suppkey = l_suppkey
    and o_orderkey = l_orderkey
    and c_custkey = o_custkey
    and s_nationkey = n1.n_nationkey
    and c_nationkey = n2.n_nationkey
    and ( (n1.n_name = 'JAPAN' and n2.n_name = 'ETHIOPIA')
    or (n1.n_name = 'ETHIOPIA' and n2.n_name = 'JAPAN'))
    and l_shipdate between date '1995-01-01'
    and date '1996-12-31') shipping
    group by supp_nation, cust_nation, l_year
    order by supp_nation, cust_nation, l_year
    22 direct path read 169514 select s_suppkey, s_name, s_address, s_phone, total_revenue
    from supplier, revenue0
    where s_suppkey = supplier_no
    and total_revenue =
    ( select max(total_revenue) from revenue0)
    order by s_suppkey

    SQL> select count(*),event,sum(TIME_WAITED) TIME_WAITED,SQL_TEXT
    from v$active_session_history a, v$sql b
    where sample_time > sysdate -1/144
    and event is not null
    and a.sql_id is not null
    and a.sql_id=b.sql_id
    group by event, SQL_TEXT
    order by TIME_WAITED;


    Partition-Wise Join of List-Partitioned Tables

    When two tables are equi-partitioned on their join keys, the query optimizer is able to take advantage of partition-wise joins. To make sure that the tables are equi-partitioned, as of Oracle Database 11g reference partitioning can be used. In fact, per definition, with reference partitioning all “related” tables have exactly the same partitioning schema. If [...]

    外部表 in SSD

    TPC-Hの環境をKingstonのお買い得SSD4本で構築したRAID-0に構築した。
    以前のCrystalDiskMarkの結果は以下のとおりだった(SATA直結のRAID-0、exFAT 512K):
    比較対象としたのは、Western Digital社のWD6000HLHX。SATAIII 6Gb/秒で接続可能な10,000 RPMハードドライブ
    CrystalDiskMarkの結果は以下のとおりだった:
    150万件のORDERSを外部表として使う:


    SQL> desc orders
    名前 NULL? 型
    ----------------------------------------- -------- ----------------------------
    O_ORDERDATE DATE
    O_ORDERKEY NOT NULL NUMBER
    O_CUSTKEY NOT NULL NUMBER
    O_ORDERPRIORITY CHAR(15)
    O_SHIPPRIORITY NUMBER
    O_CLERK CHAR(15)
    O_ORDERSTATUS CHAR(1)
    O_TOTALPRICE NUMBER
    O_COMMENT VARCHAR2(79)


    SQL> select count(*) from orders;

    COUNT(*)
    ----------
    1500000


    Parallel度は4でテストを行った:
    SQL> alter session force parallel query parallel 4;

    テストSQL:
    SQL> select * from orders where O_COMMENT like '%abcde%';

    内部表 外部表
    SSD 0.82秒 2.15秒
    HDD 3.08秒 2.74秒

    テストは、外部表を作成後マシンのリブートを行い、すべてのキャッシュをクリア後の1回目の結果を取った。

    結果Full Scanに於いて、
    SSDの内部表、通常のテーブル、が圧倒的に速い。
    HDD上では内部も外部も速度は変わらない。
    SSDの外部表がHDDの内部表よりも速い。

    150万件のORDERテーブルでもdirect path readでScanすると「いずれにせよ高速」だということが分かる。

    外部表作成SQLとテストSQLの実行計画:


    SQL> CREATE TABLE orders_xt
    2 ORGANIZATION EXTERNAL
    3 (
    4 TYPE ORACLE_DATAPUMP
    5 DEFAULT DIRECTORY UNLOADER_DATA
    6 LOCATION ( 'orders_xt.dmp' )
    7 )
    8 AS
    9 SELECT *
    10 FROM orders;
     
    内部表
    --------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
    --------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 75000 | 8203K| 568 (2)| 00:00:07| | | |
    | 1 | PX COORDINATOR | | | | | | | | |
    | 2 | PX SEND QC (RANDOM)| :TQ10000 | 75000 | 8203K| 568 (2)| 00:00:07| Q1,00 | P->S | QC (RAND) |
    | 3 | PX BLOCK ITERATOR | | 75000 | 8203K| 568 (2)| 00:00:07| Q1,00 | PCWC | |
    |* 4 | TABLE ACCESS FULL| ORDERS | 75000 | 8203K| 568 (2)| 00:00:07| Q1,00 | PCWP | |
    --------------------------------------------------------------------------------------------------------------
     

    外部表
    ------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
    ------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 20 | 2780 | 3 (0)| 00:00:01 | | | |
    | 1 | PX COORDINATOR | | | | | | | | |
    | 2 | PX SEND QC (RANDOM) | :TQ10000 | 20 | 2780 | 3 (0)| 00:00:01 | Q1,00 | P->S | QC (RAND) |
    | 3 | PX BLOCK ITERATOR | | 20 | 2780 | 3 (0)| 00:00:01 | Q1,00 | PCWC | |
    |* 4 | EXTERNAL TABLE ACCESS FULL| ORDERS_XT | 20 | 2780 | 3 (0)| 00:00:01 | Q1,00 | PCWP | |
    ------------------------------------------------------------------------------------------------------------------------

    最後に、
    TPC-HのORDERSを外部表とするのはリアリティがある。
    CSVで定期的に注文情報が更新されそのままDWHに反映されるシステムはスマートだ。

    New Outfit, New Style

    Back on June 17th WordPress 3.0 “Thelonious” was released and it offered up a handful of new features. Just a few days ago (July 29th) the 3.0.1 release went GA so I decided it was time to investigate what the new 3.0 ready themes had to offer. After looking through a handful of themes I decided to give the Magazine Basic theme a try for now. It offered a 1024 pixel wide layout and threaded comments; two of the features I was really looking for.  Feel free to share your comments: good, bad or otherwise.  Thanks! Here is a capture of the previous version just in case you don’t recall what it looked like (click for full size).

    Cloning Oracle Home from RAC to Stand-Alone

    This post originally appeared over at Pythian. There are also some very smart comments over there that you shouldn’t miss, go take a look!

    This should have been the easiest task on my todo list: Install Oracle 10.2.0.3 EE standalone on a new Linux RHEL 5 server, later to be used as a standby for a production RAC system. This means 2 lines of “runinstall -silent …”, less than 5 minutes of DBA work and maybe 20 minutes of waiting. I did not expect to spend over 5 hours doing this.

    Problems started when I discovered that I don’t have the 10.2.0.3 patchset and another patch that exists on production and should be installed on the standby. I had to wait for my Metalink credentials to be approved for this customer CSI before I could download the patches for them.

    “Why don’t you just clone the software from production?” asked a helpful colleague.

    Sounds like a great suggestion. I cloned Oracle software before and it is a simple process: tar $ORACLE_HOME, copy the tar file to the new server, untar, run the cloning script which will register the new home with the inventory, and you are done!

    In theory, at least.

    Here is what actually happened:

    1. Tar, copy, untar, script
    2. Ran OPatch to verify that the new oracle home is in the inventory and that I see the correct version and patches.
    3. OPatch is showing two nodes. Oops. I didn’t realize oracle home has information about the cluster – didn’t Oracle move the inventory elsewhere? Spend an hour looking for the cause of this.
    4. Found that the two nodes are mentioned in  $ORACLE_HOME/inventory/ContentsXML/oraclehomeproperties.xml
    5. Removed this file.
    6. Deattached Oracle Home to clean inventory without deleting the software.
    7. Ran the clone script again
    8. Yay! OPatch looks good now.
    9. Decided to create test database  to be extra sure everything is fine
    10. NETCA failed with linking error. Spent an hour figuring out why. Cursed a bit.
    11. Had to install libc-devel, 32 bit version. Too bad RDA didn’t catch this.
    12. Created test database, but SQLPLUS now fails with linking error.  More cursing. Wondered what I did to deserve this.
    13. libaio.so.1 was missing so I had to install the 64 bit version of libaio. Too bad RDA was silent about this as well.
    14. Couldn’t start the database because the database couldn’t find the cluster. Why was it even looking for a cluster? Spent an hour figuring out why. Ah, because I copied the software from a RAC server and it was linked as RAC database.
    15. Relinked everything with RAC_OFF option.
    16. Finally things are working. Too bad it is 8pm already.

    What I should have done: (I’m not sure if it is supported by Oracle, but at least it works)

    1. Double check that we have all RPMs.
    2. Tar, copy, untar
    3. remove $ORACLE_HOME/inventory/ContentsXML/oraclehomeproperties.xml
    4. run clone.pl: clone/bin/clone.pl ORACLE_HOME=/appl/oracle/product/10.2.0/db_1 ORACLE_HOME_NAME=OraDb10g_home1
    5. Relink as non-RAC:  make -f ins_rdbms.mk rac_off
    6. Verify with OPatch.
    7. Create test DB:
      netca /silent /responsefile ~/netca.rsp
      dbca -silent -createDatabase -templateName General_Purpose.dbc -gdbName TST -sid TST -SysPassword xxx -SystemPassword xxxx -emConfiguration NONE -datafileDestination /appl/oracle/oradata  -storageType FS -characterSet WE8ISO8859P1 -nationalcharacterSet AL16UTF16 -memoryPercentage 40
    8. Go for a nice afternoon ride.

    I hope that I’m not the only DBA who always have to find the most difficult way to accomplish a task, and that this post will be useful to others. Perhaps the best piece of advice I can offer is to avoid this type of cloning in the first place.

    Exadata v2 Smart Scan Performance Troubleshooting article

    I finally finished my first Exadata performance troubleshooting article.

    This explains one bug I did hit when stress testing an Exadata v2 box, which caused smart scan to go very slow – and how I troubleshooted it:

    Thanks to my secret startup company I’ve been way too busy to write anything serious lately, but apparently staying up until 6am helped this time! :-) Anyway, maybe next weekend I can repeat this and write Part 2 in the Exadata troubleshooting series ;-)

    Enjoy! Comments are welcome to this blog entry as I haven’t figured out a good way to enable comments in the google sites page I’m using…

    Share/Bookmark

    書き込みキャッシュと読み込みキャッシュ

    1.書き込みキャッシュ
    Write Cacheがパフォーマンスに及ぼす影響は今回のテストで分かった。
    また、揮発性のWrite Cacheだけの書き込みがブロック障害の原因になる可能性もあるということも気になる。
    しかし、それを言っているときりがない。HDD自体にもWrite Cacheはあるのだから

    今回のディスク WD6000HLHX

    32 MB キャッシュ   600 GB,SATA 6 Gb/s,10,000 RPM

    同タイプでRAID用「高ストライピング前提」「少Write Cacheタイプ」のモデルもある
    WD3000BLFS  16 MB キャッシュ   300 GB,SATA 3 Gb/s,10,000 RPM
    →だから容量が少ない。だから安全性優先でWrite Cacheが少ない。

    2.読み込みキャッシュ

    ハイブリッド型RAIDカードタイプ
    Adaotec社HPより

    アダプテックMaxIQ SSD キャッシュ - 8倍のIO/秒
    アダプテックMaxIQ SSDキャッシュソフトウェアは、最高のリードパフォーマンスを発揮するために、読み出し頻度の高い(「ホット」)データを見分けて、このデータをSSDキャッシュ内にコピーする特許出願中の「ラーンドパス」アルゴリズムを導入しています。

    SSD in HDDタイプ  シーゲイト Momentus® XT ドライブ
    シーゲートHPより

    Adaptive Memory テクノロジは、頻繁に使用されるアプリケーションとデータ・ファイルを知的にモニタし、それらをドライブのソリッド・ステート部分に配置して、素早く呼び出しできるようにします。

    Flash PCI Express Cardタイプ(以前紹介したioXtream)

    「第4回 Oracle ExadataによるDWH高速化」より

    ほとんどのデータを、低コストなストレージに格納しておき、利用頻度の高いデータのみを透過的にFlashメモリに移動します。このキャッシング・アルゴリズムは、現時点では公開されておりませんが、Read効率を求めないREDOログ・ファイルなどに対しては、データのキャッシングは行いません。キャッシュすべきオブジェクトをアプリケーションから明示的に設定することも可能です。

    select count(*) ...????

    最後に、

    2TBのHDDは通常64MBの大きなWrite Cacheが装備されている。
    そのため、停電によるブロック障害の可能性が高くなる。
    だからSun Flash Cardの中央に白いバッテリーが見える。
    そしてExadata(SATA版)では168玉X64MB=1GBのWrite Cacheが書き込み性能を支えている。

    14 Minute video overview of DB Optimizer

    There are a lot of powerful features in DB Optimizer so I put together a quick 14 minute video to go over them:

    1. Profiler - view database load, find top SQL and send SQL to SQL Tuner for tuning
    2. Tuner - supply a SQL statement and have DB Optimizer's SQL Tuner automatically tune the query
    3. Editor - write and debug SQL code with code assistance and quick fixes
    4. Load Editor - test SQL code with multiple concurrent users


    DB Optimizer XE product demo from Kyle Hailey on Vimeo.