Interesting question on human mistakes was posted on the DBA Managers Forum discussions today.
As human beings, we are sometimes make mistakes. How do you make sure that your employees won’t make mistakes and cause downtime/data loss/etc on your critical production systems?
I don’t think we can avoid this technically, probably working procedures is the solution.
I’d like to hear your thoughts.
I typed my thoughts and as I was finishing, I thought that it makes sense to post it on the blog too so here we go…
The keys to prevent mistakes are low stress levels, clear communications and established processes. Not a complete list but I think these are the top things to reduce the number of mistakes we make managing data infrastructure or for that matter working in any critical environment be it IT administration, aviation engineering or medical surgery field. It’s also a matter of personality fit – depending on your balance between mistakes tolerance and agility required, you will favor hiring one individual or another.
Regardless of how much you try, there are still going to be human errors and you have to account for them in the infrastructure design and processes. The real disasters happen when many things align like several failure combined with few human mistakes. The challenge is to find the right balance between efforts invested in making no mistakes and efforts invested into making your environment errors-proof to the point when risk or human mistake is acceptable to the business.
Those are the general ideas.
Just a few examples of the practical solutions to prevent mistakes when it comes to Oracle DBA:
Some of the items to limit impact of the mistakes:
Both lists can go on very long. Old article authored by Paul Vallee is very relevant top this topic — The Seven Deadly Habits of a DBA…and how to cure them.
Feel free to post your thoughts and example. How do you approach human mistakes in managing production data infrastructure?