
hadoop

Big Data is the Commercial Supercomputing in the Age of Datafication

NERSC's Hopper (image credit: NERSC; design: Caitlin Youngquist/LBNL; photo: Roy Kaltschmidt/LBNL)


Oracle Big Data Appliance Delivery Day

It’s been about two and a half years since Enkitec took delivery of our first Exadata. (I blogged about it here: Weasle Stomping Day) Getting our hands on Exadata was very cool for all of us geeks. A lot has changed since then, but we’re still a bunch of geeks at heart and so this week we indulged our geekdom once again with the delivery of our Big Data Appliance (BDA). In case you haven’t heard about it, Oracle has released an engineered system that is designed to host “Big Data” (which is not my favorite term, but I’ll have to save that for some other time).  The Hadoop ecosystem has taken off in the last couple of years and this is Oracle’s initial foray into the arena.

Exadoop

We started on an interesting mad scientist kind of project a couple of days ago.

One of our long time customers bought an Exadata last month. They went live with one system last week and are in the process of migrating several others. The Exadata has an interesting configuration. The sizing exercise done prior to the purchase indicated a need for 3 compute nodes, but the data volume was relatively small. In the end, a half rack was purchased and all four compute nodes were licensed, but 4 of the 7 storage servers were not licensed. So it’s basically a half rack with only 3 storage servers.

Linux 6 Transparent Huge Pages and Hadoop Workloads

This past week I spent some time setting up and running various Hadoop workloads on my CDH cluster. After some Hadoop jobs had been running for several minutes, I noticed something quite alarming: the system CPU percentages were extremely high.
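The symptom here is a high system (kernel-mode) CPU share. As a hedged illustration of where that number comes from, the sketch below computes %sys the way sampling tools such as top and mpstat do in principle: read the cumulative counters on the aggregate cpu line of /proc/stat twice, then divide the growth of the system counter by the growth of all counters. It assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are hypothetical, and it counts only the system field, whereas other tools may fold irq and softirq time into their system figure.

```python
import os
import time

# Order of the counters on the aggregate "cpu" line of /proc/stat (proc(5)).
# guest and guest_nice are omitted because the kernel already includes them
# in user and nice.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict:
    """Turn an aggregate 'cpu ...' line into {field: cumulative ticks}."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError(f"not the aggregate cpu line: {line!r}")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels report fewer columns; treat missing counters as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def system_cpu_percent(before: dict, after: dict) -> float:
    """Percentage of elapsed CPU time spent in kernel mode between samples."""
    elapsed = sum(after[f] - before[f] for f in FIELDS)
    if elapsed <= 0:
        return 0.0
    return 100.0 * (after["system"] - before["system"]) / elapsed


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first line of /proc/stat, which aggregates all CPUs."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = parse_cpu_line(read_cpu_line())
    time.sleep(1.0)
    second = parse_cpu_line(read_cpu_line())
    print(f"%sys over the last second: {system_cpu_percent(first, second):.1f}")
```

On a Linux host this prints the kernel-mode share of CPU time over a one-second window; a value that stays high while Hadoop jobs run is the kind of reading described above.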

Platform Details

This cluster consists of 2s8c16t (2 sockets, 8 cores, 16 threads per node) Xeon L5630 nodes with 96 GB of RAM, running CentOS Linux 6.2 with Java 1.6.0_30. The details of those are:

OOW 2011 – NoSQL Databases and Oracle Database Environments

I am currently at a presentation by Patrick Schwanke of Quest Germany about easy, high-speed connectivity between NoSQL and Oracle databases. It is not really what I had planned, but as Alex Nuijten mentioned in an earlier post, unstructured data and its handling are gaining ground, so I thought it would be a good idea to start …


Oracle Big Data Appliance — Oracle’s Bold Move Into Big Data Space

Oracle Big Data Appliance (BDA) is being announced at the Oracle OpenWorld keynote as I'm posting this. It will take some time before it is actually available for shipment, and some details will likely change, but here is what we know so far about Oracle Big Data Appliance. A rack with InfiniBand, full of [...]

Oracle Big Data Appliance — What’s Cooking?

Many analysts are suggesting that a big data appliance will be announced at this OOW. Based on the Oracle OpenWorld focus sessions published on oracle.com (PDF documents), the following technologies will most likely be key: Hadoop, NoSQL, a Hadoop data loader for Oracle, and the R language. Want more details? You'll have to wait for them. [...]

How Do You Handle HAVING with Shared Nothing?

In my previous test I compared shared-nothing Hadoop/Hive with shared-everything Oracle. Oracle won by a landslide: by my calculation, it would take a 100-node Hadoop environment just to reach the speed of Oracle SE.

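Internal issues such as I/O optimization account for most of the gap, but for the HAVING clause, which a shared-nothing environment fundamentally cannot implement, this test rewrites the query as follows: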

For Oracle:
select ps_partkey,sum(ps_supplycost * ps_availqty) as value
from partsupp, supplier, nation
where ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'INDIA'
group by ps_partkey having
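The HAVING condition is cut off in this excerpt. Assuming it has the shape of TPC-H Q11, which this query otherwise matches (each part's total compared with a fraction of the nation-wide total), one common way to remove HAVING is to stage the work: aggregate per group, compute the single global threshold separately, then filter the aggregated rows with an ordinary WHERE. The sketch below checks that equivalence with Python's built-in sqlite3 module. It is not the rewrite the author actually ran on Hive; FRACTION = 0.1, the sample rows, and the helper names are hypothetical.

```python
import sqlite3

# Hypothetical threshold: the excerpt's HAVING condition is cut off, so this
# value is only for illustration.
FRACTION = 0.1

SCHEMA = """
CREATE TABLE nation   (n_nationkey INTEGER, n_name TEXT);
CREATE TABLE supplier (s_suppkey INTEGER, s_nationkey INTEGER);
CREATE TABLE partsupp (ps_partkey INTEGER, ps_suppkey INTEGER,
                       ps_supplycost REAL, ps_availqty INTEGER);
"""

# Join and filter shared by both forms (taken from the excerpt's query).
JOINED = """
FROM partsupp, supplier, nation
WHERE ps_suppkey = s_suppkey
  AND s_nationkey = n_nationkey
  AND n_name = 'INDIA'
"""


def load_sample(conn: sqlite3.Connection) -> None:
    """Load a few made-up rows; supplier 12 is outside INDIA."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO nation VALUES (?, ?)",
                     [(1, "INDIA"), (2, "CHINA")])
    conn.executemany("INSERT INTO supplier VALUES (?, ?)",
                     [(10, 1), (11, 1), (12, 2)])
    conn.executemany("INSERT INTO partsupp VALUES (?, ?, ?, ?)",
                     [(1, 10, 2.0, 100), (1, 11, 1.0, 50), (2, 10, 1.0, 10),
                      (3, 11, 5.0, 100), (3, 12, 100.0, 100), (4, 10, 0.5, 40)])


def having_form(conn: sqlite3.Connection) -> list:
    """Single query: HAVING compares each group with a global subquery."""
    return conn.execute(f"""
        SELECT ps_partkey, SUM(ps_supplycost * ps_availqty) AS value
        {JOINED}
        GROUP BY ps_partkey
        HAVING SUM(ps_supplycost * ps_availqty) >
               (SELECT SUM(ps_supplycost * ps_availqty) * {FRACTION} {JOINED})
        ORDER BY ps_partkey
    """).fetchall()


def rewritten(conn: sqlite3.Connection) -> list:
    """Same result without HAVING, staged as separate steps."""
    # Step 1: per-part aggregate, materialized as its own result set.
    conn.execute("DROP TABLE IF EXISTS part_value")
    conn.execute(f"""
        CREATE TEMP TABLE part_value AS
        SELECT ps_partkey, SUM(ps_supplycost * ps_availqty) AS value
        {JOINED}
        GROUP BY ps_partkey
    """)
    # Step 2: one global total over all groups, scaled into a threshold.
    (threshold,) = conn.execute(
        "SELECT SUM(value) * ? FROM part_value", (FRACTION,)).fetchone()
    # Step 3: the HAVING predicate becomes an ordinary WHERE filter.
    return conn.execute(
        "SELECT ps_partkey, value FROM part_value WHERE value > ? "
        "ORDER BY ps_partkey", (threshold,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    print(having_form(conn))
    print(rewritten(conn))
```

On the sample data both functions return [(1, 250.0), (3, 500.0)]. The staging is the point for a shared-nothing engine: the threshold depends on every group, so it has to be computed as its own step before the per-group filter can be applied.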

TPC-H on Hadoop/Hive

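Scanning more than a petabyte of data is too much for a single server.
That is where Hadoop comes in. But putting together 100 servers just for a test goes beyond the scope of a hobby, so that is not an option.
For now, I will compare against Oracle using four machines in the configuration below, with the data volume set to 32 GB.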
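To put the single-server limit in numbers, here is a back-of-the-envelope scan-time estimate. The sustained scan rate of 1 GB/s per server is a hypothetical round figure, not a measurement from these tests, and the 100-server case assumes perfectly linear scaling.

```latex
t_{1} = \frac{1\ \text{PB}}{1\ \text{GB/s}}
      = \frac{10^{15}\ \text{B}}{10^{9}\ \text{B/s}}
      = 10^{6}\ \text{s} \approx 11.6\ \text{days}
\qquad
t_{100} = \frac{t_{1}}{100} = 10^{4}\ \text{s} \approx 2.8\ \text{hours}
```

Even under these generous assumptions a single machine needs days for one full scan, which is why the work gets spread across many nodes.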

With the OS also set to CentOS 5.5 (x86-64), here is Oracle's performance at DOP=6 on a single AMD Phenom II X6 1100T Black Edition BOX (3.3 GHz, 6 cores), the machine used this time: