This is the fourth of twelve articles in a series called Operationally Scalable Practices. The first article gives an introduction and the second article contains a general overview. In short, this series suggests a comprehensive and cogent blueprint to best position organizations and DBAs for growth.
I hesitate greatly to use the word “standard” having seen the ways it gets bandied about by IT groups everywhere. But the ubiquitous presence of this term and the manifold frustrations which always accompany it only prove that there’s something relevant here. We’re all familiar with the basic ideas behind so-called standardization. Anyone who has had a job maintaining more than one server for more than one month got frustrated at some point because something worked in one place and unexpectedly failed in another. The reason? Something was different between those two places. And so began the crusade for “standardization” – at least for those of us who had the time, energy, comfort level and motivation to try changing things!
The basic idea behind standards is to make things similar. The more evolved and realistic version of standards is that there’s process wrapped around the differences. (Hope you have those change history systems from OSP #1: The Foundation ready! http://ardentperf.com/2013/11/06/osp-1-the-foundation/) The “standard platform” itself has a lifecycle – and there will be new standard platforms to come on whatever schedule makes sense for your business (personally I think 2-4 years is a good place to start). Even though you may still have more than one configuration, by limiting them it becomes realistic to develop predictable processes for moving between them. Of course there will always be bespoke systems to manage; some will be legacy, some will be special-purpose. Smart standardization is not idealistic but rather business-driven. Hope I’m not sounding repetitious by saying that! For each exception, carefully weigh the costs and the benefits over the long term.
As a brief aside here: I do think outside opinions (user groups, professional networks and paid consultation) are very valuable and generally under-utilized. But use them mainly to test your own thinking for flaws. Consultants can sometimes come across very confident and convincing about the right way to do something (it’s an important skill for success in that field) but nobody knows your actual infrastructure and operations better than you. Get lots of input – even pay for some – then question everything and don’t follow advice that you don’t understand. Blame me if the consultants resent you for it!
Successfully moving into standards works best when you start at the bottom of the stack: with your physical hardware. I’ve worked at larger companies that are further along this road, and they sometimes have the spec nailed down to the peripheral model numbers and firmware versions. But how can a small company get started on this when they purchase infrequently and have a bunch of existing servers to deal with? A few suggestions:
When you survey existing equipment, don’t worry if you can’t get things exactly identical – just get as close as possible. It’s worthwhile just to get the CPU core count, memory size and disk spindle counts to match – or at least minimize the variation. That alone will help a lot when you tackle standards for the layers above the hardware.
Maybe you’ll need a “high-storage” option or a “large-memory” one (Amazon’s options for EC2 instances are informative when defining options). Minimizing variation means that you only allow defined configurations: any app that needs a little extra of some resource needs to go all the way up to the next tier or option you’ve defined. Any app that needs less of some resource isn’t allowed to save money by trimming the hardware. After all, one of the major benefits of good standards is to make consolidation easy – so a bunch of those small apps should share one server. No in-betweens or compromises. In this sense, having a “standard” simply comes down to the courage and ability to convincingly say no to anything else. You need the foresight to see the benefits and the hindsight to recount the costs.
This is already enough material for digesting and discussing, so I’ll split this topic into two articles. In the next article I’ll get into some very specific suggestions from an Oracle database perspective. Any thoughts so far? What do you agree or disagree with?
Just a quick note/announcement that we had our annual Michigan Oracle Users Summit yesterday at the VisTaTech Center on the campus of Schoolcraft College in Livonia, MI. It was a good conference, and I think everyone who made it out enjoyed their time there and saw some excellent presentations.
I did two presentations, Understanding and Interpreting Deadlocks and All About Indexes. The presentations will be available at the MOUS website, but I also wanted to make them available here. (See the links above.)
P.S. I finally, just today, got my storage for my GoldenGate test boxes, so I’ll be proceeding with that testing soon. Stay tuned for my next blog post in that series, in a week or so.
Oracle 12c has increased the maximum length of character-based columns to 32K bytes – but don’t get too excited: the values are stored out of line (so similar in cost to LOBs), and you need to modify the parameter file and data dictionary (starting the database in upgrade mode) before you can use them.
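For reference, enabling the extended types looks roughly like this – MAX_STRING_SIZE and the utl32k.sql dictionary script are the documented mechanism, but the exact steps differ for RAC and multitenant, and the switch to EXTENDED is one-way, so check the 12c documentation before trying it anywhere that matters:

    -- Rough sketch: enable 32K ("extended") varchar2/nvarchar2/raw in 12c.
    -- Test on a disposable database first; MAX_STRING_SIZE=EXTENDED cannot be reverted.
    ALTER SYSTEM SET max_string_size = EXTENDED SCOPE = SPFILE;
    SHUTDOWN IMMEDIATE
    STARTUP UPGRADE
    @?/rdbms/admin/utl32k.sql   -- updates the data dictionary for extended types
    SHUTDOWN IMMEDIATE
    STARTUP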
Richard Foote has a pair of articles on indexing such columns:
Be cautious about enabling this option and test carefully – there are going to be a number of side effects, and some of them may require a significant investment of time to resolve. The first one that came to my mind: if you’ve created a function-based index on a PL/SQL function that returns a varchar2() type, and haven’t explicitly created the index on a substr() of the return value, then the data type of the function’s return value will change from the current default of varchar2(4000) to varchar2(32767) – which means the index will become invalid and can’t be rebuilt or recreated.
Obviously you can redefine the index to include an explicit substr() call – but then you have to find all the code that was supposed to use the index and modify it accordingly.
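A minimal sketch of the difference (the customers table, cust_id column and f_customer_label function here are hypothetical, and the function would need to be declared DETERMINISTIC to be usable in a function-based index at all):

    -- Index on the bare function call: the key inherits the function's default
    -- return length, which jumps from varchar2(4000) to varchar2(32767) once the
    -- extended types are enabled - too long for an index key, so the index breaks.
    create index cust_label_idx on customers (f_customer_label(cust_id));

    -- Index on an explicit substr() of the return value: the key length is pinned,
    -- so it is unaffected by the change in the function's declared return size.
    create index cust_label_idx2 on customers (substr(f_customer_label(cust_id), 1, 100));

The catch, as noted above, is that queries will only pick up the second index if they reference the identical substr(f_customer_label(cust_id), 1, 100) expression – which is exactly why all the dependent code has to be found and changed.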
When I read Eliyahu Goldratt’s The Goal in grad school, one of the key things that stuck with me is that there’s always a bottleneck, and that the process only moves as fast as the bottleneck allows it to move. The Theory of Constraints methodology posits three key measures for an organization: Throughput, Inventory, and Operational Expense. Sitting down to think about this, I reasoned that we could use those same metrics to measure the total cost of data for copies, clones and backups.
For every bit of data that enters the door through Production, we could say that Throughput represents the data generated to create or update the copies, clones and backups. Inventory could represent the number of copies of each bit of production data that sits on a copy, clone, or backup. And Operational Expense represents all of the labor and resources spent creating and transmitting that data from Production to its final resting place.
When expressed in these terms, the compelling power of thin cloning is clear. Let me show you what I mean by a little Thought Experiment:
If I had a fairly standard application with 1 Production source, 8 downstream copies, a 4-week backup scheme (weekly full / daily incremental), and a plan to refresh the downstream systems on average once per week, what would the metrics look like with and without thin cloning?
TOC Metrics for Cloned/Copied/Backed-up Data (without thin cloning):
- Throughput: 8 * Daily Change Rate of Production
- Inventory (copies): 8 * Full Size of Production
- Inventory (backups): 4 * Full Size of Production (1 full/week for 4 weeks) + 24 * Daily Change Rate of Production (6 incrementals/week for 4 weeks)
- Operational Expense: 8 shipments and applications of change data per day; 1 backup operation per day
With Delphix thin cloning, these metrics change significantly. The shared data footprint eliminates most of the shipping, application and redundancy. So:
TOC Metrics for Cloned/Copied/Backed-up Data using thin clones:
- Throughput: 1 * Daily Change Rate of Production – change is shipped and applied to the shared footprint once
- Inventory: 1 * Full Size of Production (before being compressed) + 28 * Daily Change Rate of Production – a full copy is only ever taken once; otherwise it is incremental forever
- Operational Expense: 1 shipment and application of change data per day; 0 backup operations per day – since change is applied to the common copy, backups are just redundant operations
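Putting the two scenarios side by side as a quick back-of-the-envelope comparison, with Δ standing for the daily change rate of production and F for the full size of production (same figures as above):

$$
\begin{aligned}
\text{Throughput:}\ & 8\Delta \rightarrow 1\Delta \quad \text{(8x less data moving per day)}\\
\text{Inventory:}\ & 8F + 4F + 24\Delta \rightarrow 1F + 28\Delta \quad \text{(full-size terms no longer scale with the copy count)}\\
\text{Operational Expense:}\ & 8 + 1 \rightarrow 1 + 0 \ \text{operations per day}
\end{aligned}
$$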
The thought experiment reflects what we see every day with Delphix. The Throughput of data that has to move through a system is significantly less (8x less in our experiment). And, it gets relatively more efficient as you scale. The Inventory of data that has to be kept by the system is not driven by the number of copies, but rather is driven by the change rate and the amount of change kept. Unless you are completely flopping over your copies downstream (in which case you have different problems), this also gets relatively more efficient as you scale. And finally, when it comes to Operational Expense, you’re not just getting more efficient, you’re actually eliminating whole classes of effort and radically simplifying others.
The bottom line here is that Data has been THE BIG BOTTLENECK for a long time in our applications. And, with thin cloning from Delphix, you’ve finally got the power to take control of that bottleneck, measure the true impact, and reduce your Total Cost of Data by delivering the right data to the right person at a fraction of the cost you pay today.
Finally, at long long last, I have a spare 30 minutes in my life to complete this blog entry !! As discussed previously, Oracle has extended the maximum length of varchar2, nvarchar and raw columns to 32K, but this comes with some challenges when it comes to indexing such columns due to restrictions on the […]
I’ve been fortunate to attend and present at many Oracle conferences over the years but the one I would love to get to one year is the UKOUG conference. It always seems to have a great line-up of speakers and I’ve heard lots of positive feedback. Unfortunately, it’s a long long way from home, but […]
Great interview with Gene Kim, author of The Phoenix Project, discussing the Theory of Constraints and delays in project development. Here is an excerpt starting from minute 6:50
That’s all fine and dandy, but how do we fix it? Isn’t the problem of provisioning environments something we just have to live with? So what? That’s life, right? Wrong! There is a solution. Database virtualization enables provisioning the largest and most difficult part of development and QA environments in minutes and on demand. What a clear and powerful testament to the power of database virtualization.
Gene Kim, quoted above, is the author of The Phoenix Project, a great book laying out a captivating (at least for geeks) adventure of finding out where and what the constraints are at a company, and how to implement steps to alleviate them, taking the company from a losing position in the market to market leader.
What is the impact of the major constraint in IT, the inability to provision environments on demand? A quote from the IT Revolution Manifesto:
According to analysts, global IT spending in 2010 was approximately $10 trillion. But nearly 70% of IT projects fail, and nearly 50% of IT work is unplanned work or rework. If we conservatively estimate that 20% of IT work is wasted, that’s $2 trillion of value each year that we’re letting slip through our fingers.
Many people read that and think it’s ridiculous, and maybe those numbers are high, but whether they are or not, it’s clear that project failure has a massive worldwide impact.
Worldwide cost of IT failure: $6 trillion – http://sistemas.uniandes.edu.co/~isis4617/dokuwiki/lib/exe/fetch.php?media=principal:itcomplexitywhitepaper.pdf
What are the issues that are leading to project delays and project failures?
#1 Issue Facing Bank CIOs in 2014: “Dealing with the Complexity of Data”
“Implementing effective data access and … nimble infrastructure … banks will be able to … actually harnessing the data and using it to their advantage.”
#2 issue: Innovation – “Frankly, technology executives have no choice but to emphasize innovation.”
How do we know when we have successful data access and nimble infrastructure?
One of the best predictors of DevOps performance is that IT Operations can make environments available on demand to Development and Test, so that they can build and test the application in an environment that is synchronized with Production.
What are the initiatives in the industry to address data agility?
IT trends: Gartner’s 2013 predictions for information management
#2 Technologies helping information producers and consumers organize, share and exchange data on the move
What technology is at the forefront of data agility? Delphix.
Delphix enables IT Operations to make environments available on demand to Development and Test in minutes, with almost no storage overhead.
In the 3 years that Delphix has been on the market, it has been adopted by
The list goes on.
“The status quo is pre-ordained failure”
The status quo is the biggest impediment in the industry. Those companies who innovate succeed. Delphix is sorting the wheat from the chaff. Delphix is the most powerful innovation I’ve seen in years. The companies that are adopting Delphix and innovating are succeeding. Minimum application development project improvement that we’ve documented so far is 20%. Average development project acceleration is 2x. We’ve seen two companies improve application development by 5x using Delphix.
More about Delphix at http://www.kylehailey.com/what-is-delphix/
A little over a year ago, I wrote a review of a book called Girl 99 by Andrew P. Jones. I got an email from the author a few days ago to say his latest book, Untogether Lives, was released on Kindle, so I downloaded it straight away. Here’s what it’s all about.
Untogether Lives is a collection of fourteen stories that peek through the curtains of an eclectic cast, struggling to keep mind, body and the world around them together. From an amputee shoe thief, to an unlikely arsonist, to a sexually frustrated quadriplegic.
Predominantly dark and occasionally disturbing, these stories are not for the faint-hearted, but neither are they without humour. Not everyone in Untogether Lives gets a happy ending, and not everyone survives – but, hey, that’s life for you.
I loved it! It seems strange saying that when the subject matter is so dark, but it’s true. The writing style works really well for me. The content is very different to what I normally read, but it was good being taken out of my comfort zone. What really amazes me is the amount of connection I felt to some of the characters, even in the shorter stories. Despite the darkness, there is humour in there too. This book is definitely not for the sensitive souls out there, but I think it is a great collection of stories. I’m looking forward to the next book from the author!
Oaktable World continues its global tour with the next stop in Manchester, UK during the UKOUG.
Check out the awesome lineup
And it’s free. Just be sure to register to reserve your place.
The event is offered free of charge to all visitors to the UKOUG Tech13 Conference. Registration is required in order to gain entry, but please note that the event will be intentionally oversubscribed to allow delegates to attend a mixture of sessions alongside Tech13 sessions. As a consequence of the oversubscription, entry to specific sessions will be on a first come, first served basis, so please arrive with plenty of time for each session even after registration.
The event will be held across the street from the Tech13 conference at: Premier Inn, 7-11 Lower Mosley Street, Manchester, Greater Manchester M2 3DW
About a month or two ago, I was doing some work toward developing a process to patch CRS out-of-place using cloned golden images. I held off on publishing anything because I wanted to do some testing, but we’ve been so busy with deployments and maintenance over the past month that I haven’t had a chance. I think it might benefit a few people if I go ahead and post the work I’ve done so far, even though I’m not finished. Thus… note that this material is still very much a work in progress.
This is a relevant topic for any organization who deploys Oracle-based clusters with regularity and who needs a solid process for managing the software. Many large companies already use a package/clone approach to manage various patch-levels of database software (I’ve been directly involved in this). The company produces a golden-tarball of the database software which includes the current corporate standard patch-level including one-offs that remediate previously encountered bugs. That golden-tarball is the only thing that needs to be copied or deployed anywhere. It gets internally version-controlled and centrally distributed and reduces both time and mistakes during the deployment of Oracle software to new systems and updates to existing systems. This approach is very advantageous and fully supported by Oracle.
The same process hasn’t yet been possible with CRS because Oracle has not yet produced documentation to sufficiently decouple the software installation part from the runtime update part. Therefore, with CRS we can’t replace the typical opatch-based approach with a tarball and clone approach.
The official docs do cover the following four scenarios for CRS:
These four procedures offer a starting point for our investigation toward a single, central, internally version controlled golden-tarball of CRS with all required patches. It’s also worthwhile to review the docs about adding and removing nodes. The biggest outstanding problem I see is PSUs and one-offs. I would like to create a golden-tarball of CRS with the additional PSUs and one-offs which installs into a new GRID_HOME reflecting its internal version number that we assign (e.g. 11.2.0/grid_23). On my existing systems I’d like to just automatically deploy this directory everywhere then have a simple rolling process to switch the active CRS from the old home to this one. Easy with RAC homes, apparently impossible with CRS homes…
Recently I discovered one additional interesting bit of information in an Oracle support note. (Apparently it’s been published for a few years now and I didn’t find it until now!) Note 1136544.1 gives an official technique for out-of-place PSU application on CRS. I’m not entirely sure but the steps in this note may have originally been generated by the new oplan utility. One of the interesting things about this note is that it uses a perl script called patch112.pl and a perl library called crsconfig_lib to reconfigure CRS to run in a new directory. The first three of the aforementioned processes (create, add, upgrade) from the official docs eventually call OUI and a root script to set up the copied CRS home. The fourth aforementioned process (move) calls rootcrs.pl to reconfigure CRS to a new directory. From my reading through patch112.pl and crsconfig_lib, I can see that it’s updating the OLR location and init scripts in /etc. The cdata directory in the grid home contains the OLR. Based on some cleanup procedures in the documentation, I think that the crf directory (the new cluster health monitor) might also contain clusterware config or state files that would need to be retained when using a golden-tarball. Note 1136544.1 is also interesting because – unlike the move process above – it grabs an inconsistent snapshot of these two directories (i.e. copies them without shutting down CRS first) – and CRS doesn’t care when it switches over to the cloned and patched home but happily starts up and continues on.
Based on all of this, I think that the golden-tarball/clone approach for CRS might actually be possible with some small modifications to the procedure in Note 1136544.1:
The real trick here is that we have to be very careful to copy over all the important configuration files without missing anything and yet copy absolutely no binaries! The installation will be corrupted if we overwrite a patched file with an un-patched file. I think that the list above should be safe and sufficiently complete but as I said before, I haven’t tested this yet. I will likely give it a try sometime over the next few months and post my results. In the meantime I’m very interested in feedback about this idea – let me know what you think!