Parameterizing Hive Actions in Oozie Workflows

A very common request I get from my customers is to parameterize the query executed by a Hive action in their Oozie workflow.
For example, the dates used in the query depend on a result of a previous action. Or maybe they depend on something completely external to the system – the operator just decides to run the workflow on specific dates.

There are many ways to do this, including using EL expressions, capturing output from shell action or java action.
Here’s an example of how to pass the parameters through the command line. This assumes that whoever triggers the workflow (a human or an external system) has the correct value and just needs to pass it to the workflow so it can be used by the query.
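A minimal sketch of what such a workflow can look like — all names here (hive-wf, query.hql, runDate, RUN_DATE) are illustrative, not from the original post:

```xml
<!-- workflow.xml: a Hive action that receives runDate from the command line
     and forwards it to the script as the Hive variable RUN_DATE -->
<workflow-app name="hive-wf" xmlns="uri:oozie:workflow:0.4">
  <start to="hive-node"/>
  <action name="hive-node">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>query.hql</script>
      <!-- becomes ${RUN_DATE} inside query.hql -->
      <param>RUN_DATE=${runDate}</param>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Hive action failed</message></kill>
  <end name="end"/>
</workflow-app>
```

The operator then supplies the value at submission time, e.g. by putting `runDate=2013-11-17` in the job properties file and running `oozie job -config job.properties -run`; inside query.hql the value is referenced as `${RUN_DATE}`.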

Here’s what the query looks like:

I/O Benchmarking tools

This blog post will be a place to park ideas and experiences with I/O benchmark tools, and will be updated on an ongoing basis.

Please feel free to share your own experiences with these tools or others in the comments!

There are a number of tools out there for I/O benchmark testing, such as:

  • fio
  • IOZone
  • bonnie++
  • FileBench
  • Tiobench
  • orion

My choice for best of breed is fio
(thanks to Eric Grancher for suggesting it).
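To give a feel for how fio is driven, here is a small sample job file — the job name, size, and target directory are made up for illustration, and `libaio` assumes Linux:

```ini
; randread.fio -- hypothetical 4k random-read test
[global]
ioengine=libaio      ; Linux native async I/O
direct=1             ; bypass the page cache
runtime=60           ; run each job for 60 seconds
time_based

[randread-4k]
rw=randread          ; random reads
bs=4k                ; 4 KiB block size
size=1g              ; 1 GiB file per job
iodepth=32           ; queue depth per job
numjobs=4            ; four concurrent workers
directory=/mnt/test  ; filesystem under test
```

Run it with `fio randread.fio`; fio reports IOPS, bandwidth and latency percentiles per job and in aggregate.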


The cost of Oracle

It’s not uncommon for people on one hand to extol the functionality, performance and features of Oracle, whilst on the other hand lamenting the potential high cost of the product.

I’m not pontificating here – I’m commonly one of these people. So much good stuff in Oracle… yet so much to pay to get that good stuff :-(

So in the interests of fairness, I thought I’d share a little story where an Oracle solution was implemented with total expenditure of: ZERO

When the Oracle wait interface isn’t enough

Oracle has done a great job with the wait interface. It has given us the opportunity to profile the time spent in Oracle processes, by keeping track of CPU time and waits (which is time spent not running on CPU). With every new version Oracle has enhanced the wait interface, making the waits more detailed. Tuning typically means trying to get rid of waits as much as possible.

Tech13 Agenda

It’s coming up to the time when I have to think about which presentations to go to at UKOUG Tech13 – always difficult to decide whether to see topics I’m familiar with to find out how much I didn’t know, or whether to see topics which I don’t know to get some sort of intelligent briefing. Here’s my starting thought:

Super Sunday:
12:30 Me, on compression (index, basic and OLTP – not HCC)
13:40 Tony Hasler: “Why does the optimizer sometimes get the plan wrong”
15:00 Kyle Hailey: “Oracle transaction locks and analysis”
16:00 Neil Chandler: “10046 trace – powerful, or pointless in the real world”

Once you’ve done your I/O…there’s still more to do !

The world is obsessed with I/O nowadays….

This is understandable – we’re in the middle of a pioneering period for I/O – flash, SSD, MLC, SLC, with ever more sophisticated transport mechanisms – infiniband, and the like.

But don’t forget that once you get those blocks back to Oracle, you need to “consume” them, i.e., get those rows and get that data…

And that’s not free !

For example, let’s look at two tables, both 500 megabytes, so the I/O cost to consume them is roughly the same.

The first one has ~50-byte rows.
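A quick back-of-envelope calculation shows why row width matters even when the bytes read are identical — the ~500-byte figure for the second table is my assumption for contrast, since the excerpt only gives the first:

```python
# Same 500 MB of I/O, very different row counts to decode afterwards.
TABLE_BYTES = 500 * 1024 * 1024  # 500 MiB per table

rows_narrow = TABLE_BYTES // 50   # table with ~50-byte rows
rows_wide = TABLE_BYTES // 500    # hypothetical table with ~500-byte rows

print(rows_narrow)  # 10485760 -> ~10.5 million rows to extract
print(rows_wide)    # 1048576  -> ~1 million rows, a tenth of the CPU work
```

Every one of those rows has to be located in its block and its columns decoded, so the narrow-row table costs roughly ten times the CPU to consume despite identical I/O.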

Data Simplicity

Complexity costs us a lot. Managing data in databases is a big chunk of that cost. Applications voraciously consume ever-larger quantities of data, driving storage spend and increased IT budget scrutiny. Delivering application environments is already so complex that the teams of experts dedicated to that delivery demand many control points, and much coordination. The flood of data and the complex delivery process make delivery of environments slower and more difficult, and can lengthen refresh times so much that stale data becomes the norm. Complexity also grows as IT tries to accommodate the flood of data while application owners expect Service Level Agreements, backup/recovery protections, and compliance requirements to remain constant.

Why Data Agility is more valuable than schema consolidation

If you’ve been an Oracle data professional for any length of time, you’ve undoubtedly had a conversation about reducing your total number of databases by consolidating application schemas from each individual database into separate schemas in one monolithic database. Certainly, in the days before shared dictionaries, pluggable databases, and thin cloning this argument could be made easily because of the high price of the database license. Consolidation is a winning argument, but doing it at the cost of data agility turns it into a loser. Let’s explore why.

The Argument for Schema Consolidation

What Delphix does in 1 minute 22 seconds




For a quick write-up on what Delphix and database virtualization is, see

For a quick write-up on the use cases for Delphix and database virtualization, see



Designing IT for Data Manufacture


photo by Jason Mrachina

As a (recovering) Mechanical Engineer, one of the things I’ve studied in the past is Design for Assembly (DFA). In a nutshell, the basic concepts of DFA are to reduce assembly time and cost by using fewer parts and process steps, standardizing the parts and process steps you do use, automating, making parts easy to grasp, insert and connect, removing time-wasting process steps like having to orient the parts, and so on.