TOOL now in 4 eBook formats

Our book is now for sale in the following 4 eBook formats / devices:

Note the online versions are in color if the devices support color (it looks great on an iPad!).  We think the color content matters quite a bit, so if your device is black and white we recommend buying the print version (feel free to buy both!) to get the full experience.

Enjoy!  The eBook price is $9.99.  Cheers, -Chris


How to Get Your Life Back – Part III

Continuing on with my four-part series on how to create a world-class problem management function, proactively take control of service quality and get back that precious time you're currently spending on fire-fighting.

Once you’ve created a work environment where problems are freely revealed and you’ve achieved teamwork across functions via clear leadership and shared goals, the next key ingredient is making technical problem solving a core competency.  An effective work environment isn’t enough without your own CSI team who can roll up their sleeves and figure out tough problems.  In our book we describe these as the detective Columbos of your org – the Crackerjack detectives that always get their man.  Calling out the need for technical problem solving expertise may seem like obvious guidance but it’s actually a common gap I see in many problem management functions.  Few organizations do it really well.

I previously blogged about the “Car Talk” guys from National Public Radio (Tom and Ray Magliozzi, also known as Click and Clack) and how they leverage an encyclopedic knowledge of automotive engineering to solve their callers’ issues without ever even seeing the car.  They’re both MIT grads, which provides their foundational understanding of engineering mechanics.  They also have an intuitive feel for how things work that comes from years of taking cars apart and putting them back together again in the garage.  Combining the two, it’s remarkable how quickly they’re able to figure out a caller’s problem and explain it clearly.

Another analogy that helps make this point is from the world of medicine.  Doctors have a substantial baseline of scientific knowledge they acquire before ever seeing a patient.  By having an in-depth understanding of physiology, pharmacology, biology, etc., they possess a foundational expertise they’re able to call upon when confronted with a difficult diagnosis.  WebMD’s Boolean logic is fine for common ailments but insufficient for more complex and challenging illnesses.

Same goes for technical troubleshooting.  There are way too many IT administrators or engineers who think troubleshooting is about calling up the vendor support line (I can save you some time, they’ll tell you to upgrade to the latest patch level) or tinkering with settings in a console without a true understanding of the technology itself.  As technology becomes more mature these skills seem to be in even shorter supply since many new IT professionals haven’t known a world where you have to build solutions from the base elements of technology. 

In my experience the best IT problem solvers operate the same way as good mechanics or doctors.  They tend to have a foundation of classical technical knowledge but almost always possess a never-ending curiosity about the way systems work.  They’re easy to spot.  They like to get their hands dirty.  They’re the ones that have built their own computers, run open source technology on their home PC, can hack just about anything and insist upon seeing for themselves what the code is actually trying to do (as opposed to what the developers say it does).  At corporate IT shops they are the ones who know where all the log files are for the relevant systems and how to interpret them.  They go to the logs first before dragging in other SMEs.

In the end, what type of police detective, auto mechanic or doctor do you want working for you? Think about that when there’s a temptation to skimp on your problem management investment.  It’s important to hire for this competency and to develop it through training.  Don’t assume it’s going to come naturally.

Next up, the final post in the series, the Holy Grail: Determining meta-problems and making calculated strategic improvements.

-Mike Hagan-


Does your system have "The Nerves"?

Recently I watched a program called Recipe for Murder. It must be Mike's recent posts that got me thinking about Problem Management as I watched the show. It was a documentary about a crime wave that occurred in the 1950s in Sydney, Australia. Despite images of harmony in the country, crime was rampant in Sydney and over 100 people died of poisoning in 1953.

The "Excellent Nervine"... soothes, quiets, and strengthens the Nerves.

When the first cases of illness were diagnosed by doctors, no obvious cause could be found. Apparently, during that period when the docs didn't know what they were dealing with, they often concluded that patients were simply suffering from "The nerves". That was a good catch-all for saying there was something wrong with them but we didn't know what exactly. The doctors would prescribe some miracle syrup, such as root beer, Coca Cola, or Dr Guretin's Nerve Syrup (pictured). Did they really expect those remedies to solve the problem? Probably not, and that might have been okay if there was really no way to understand the cause of the problems and a possible solution.

Unfortunately, in IT we too often see the same attitude towards system problems. Administrators take the easy path of rebooting a server, or service desk analysts ask a user to reboot their PC to take care of that unstable desktop. Do they really expect the problem to go away? Are they crossing their fingers that the same user won't call back with the same issue?

Fortunately, in 1952 a couple of astute detectives in Sydney started applying methodical investigation techniques to try to understand why so many people were dropping like flies. They quickly started noticing some commonalities between cases, and suspected foul play. They were open minded and explored all possibilities. When they suspected that poisoning might be the cause, and started sampling the cakes and tea that some women were serving their "loved ones", they found that the samples tested positive for Thallium, the active ingredient in rat poison. Colorless, tasteless and odorless, Thallium was the perfect murder ingredient. The product was banned in most of the world, but Sydney had an estimated rat population of 1 million at the time so it was still allowed to be used.

If it wasn't for this great detective work (and the eventual conviction of three serial murderers), more people might have been poisoned. In IT, we've all been faced with problems that seem to poison our days too. Yet too often bogus remedies are applied and the underlying problems are allowed to continue disrupting the flow of business, and in some cases to cause crises. In The Opposite of Luck, we advocate treating problems like criminals, even serial killers. It's amazing how applying a methodical approach of preserving the "crime scene", searching for and analyzing evidence, collecting facts, listing suspects, and zeroing in on the culprit can help make your business a better world, and IT a less scary place.


How To Get Your Life Back – Part II

Astronaut Ken Mattingly (played by Gary Sinise in “Apollo 13”) set aside his own personal disappointment to play a critical leadership role during that crisis. Teamwork across multiple engineering teams was key to addressing difficult power issues and bringing the spacecraft home safely.

Continuing the series on problem management as a critical means of producing consistent quality and keeping that damned Blackberry from dancing around on your night stand at 4am!  Last week I laid out the four essentials to succeeding at problem management and creating a living culture of constant improvement.  We've covered #1: Celebrate the revelation of problems.  Now let's focus on #2: Drive cross-functional teamwork via clear ownership and shared goals.

Working cross-functionally is the key to getting to the root cause of complex problems.  Without teamwork across all teams the hardest problems go unresolved and linger out there like unpaid parking tickets in your glove compartment – every day growing more and more dangerous.  The days of discrete mainframe applications are mostly behind us and today’s distributed computing solutions are a sprawling web of complexity spanning all areas of responsibility in the typical IT org.

How many times have you seen an incident response where one by one the various on-call representatives chime in and declare their team’s piece isn’t the culprit (database, network, app, etc.)?  After everyone's said their piece is working fine all you hear is crickets chirping … yet, the system is still down!

This is human nature and I see it all the time in organizations where teams don’t share the same quality objectives.  There’s more concern about proving what the issue isn’t than about figuring out what's wrong.  The preceding example comes from the world of incident management but it’s equally true for problem management.  To achieve the teamwork and partnership needed, teams have to share common quality goals and have effective leadership.  Without those things, they’ll just bat the issues back and forth rather than dig in and really figure things out.

To ensure accountability, prioritization and focus it’s essential to have an over-arching quality metric and goal shared by the entire IT organization.  Quality must be everyone’s business – not just the operations teams.  However, rather than just tracking availability which is a limited and one-dimensional quality measure at best, we recommend a Business Impact Index that quantifies the true pain inflicted on the enterprise by system outages so that they’re better understood and can be more easily compared across different systems. 

Ideally, the BII metric output would be lost dollars but in most companies that’s not easily done, so a simple point system can suffice.  The important thing is to measure in terms of the business damage caused (lost revenue, customer dissatisfaction, lost productivity, etc.) so that you know which incidents hurt the most.  Without that focus it’s impossible to mass your limited resources on the root causes most in need of redress.  The BII also improves IT credibility and business alignment as it shows you’re holding yourself accountable for the things that hurt the business the most.  There’s lots of additional detail about the BII and how to implement it in our book.
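To make the "simple point system" idea concrete, here is a rough sketch in Python. It is not the BII from the book: the damage categories come from the paragraph above, but the weights, the 0 to 3 severity scale, the system names and the function names are all made up for illustration. The only point is that once every incident carries a business-damage score, ranking systems by total pain becomes a simple sum.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical weights (points per severity level) for each kind of damage.
# Illustrative assumptions only, not values taken from the book.
WEIGHTS: Dict[str, int] = {
    "lost_revenue": 5,
    "customer_dissatisfaction": 3,
    "lost_productivity": 2,
}


@dataclass
class Incident:
    system: str
    # Assumed scale: severity per damage category, 0 (none) to 3 (severe).
    damage: Dict[str, int]


def impact_points(incident: Incident) -> int:
    """Weighted sum of the business damage recorded for one incident."""
    return sum(WEIGHTS[category] * severity
               for category, severity in incident.damage.items())


def rank_systems(incidents: List[Incident]) -> List[Tuple[str, int]]:
    """Total the points per system and list the most painful systems first."""
    totals: Dict[str, int] = {}
    for incident in incidents:
        totals[incident.system] = totals.get(incident.system, 0) + impact_points(incident)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data.
incidents = [
    Incident("order-entry", {"lost_revenue": 3, "customer_dissatisfaction": 2}),
    Incident("email", {"lost_productivity": 2}),
    Incident("email", {"lost_productivity": 3, "customer_dissatisfaction": 1}),
]

for system, points in rank_systems(incidents):
    print(f"{system}: {points} points")
```

With this sample data the single order-entry outage outscores two email incidents combined, which is exactly the kind of comparison a raw availability number or incident count would hide.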

Once you know the true impact of incidents you can get to work on the most critical problems.  However, doing so requires teamwork and coordination that will span the entire department.  It’s important to have a senior level leader serve as the virtual owner for the problem management function to ensure accountability, productivity and conflict resolution regardless of where root cause lies.

The senior sponsor should ensure that problems are being prioritized and addressed, and that needed progress is being made.  She should also ensure all functions are making the needed contribution so that resource gaps can be worked.  The sponsor should be prepared to intervene on stalemates, conflicts and vendor escalations. Magic can happen when folks come together to bear down on the most wanted list.

Come back next time as I discuss how to make technical troubleshooting a core competency.

-Mike Hagan-


How To Get Your Life Back (Oh, And Make Your Business More Successful In The Process)

(This is the first post in a series on the essentials for implementing high performing and results oriented problem management in your organization)

As NASA Flight Director, Gene Kranz (played by Ed Harris in "Apollo 13") led a remarkably effective problem solving organization. Not coincidentally, finger pointing wasn't part of the culture.

The recent high profile cloud service outages at Amazon and Microsoft have generated a lot of inquiries to me about the challenges of effective problem management.  I frequently give talks on the subject and it’s something a lot of organizations really struggle to do well.

I’m here to tell you, though: it’s an absolutely essential component of ensuring customer satisfaction, managing costs and maintaining high employee morale.  Only by being great at getting to the true root cause of quality impacting issues can an IT organization reach its full potential.  Solving problems and taking preventative action frees up the resource cycles and focus to allow you to concentrate on innovating for the business rather than reacting to recurring outages and crises. It also gets you off of the treadmill of reactive behavior that forces long hours and a wildly erratic work schedule. Sleeping with a smart phone under your pillow is just no way to live and it doesn't have to be that way.

These are the four essentials to succeeding at problem management and creating a living culture of constant improvement:

1)      Celebrate the revelation of problems

2)      Drive cross-functional teamwork via clear ownership and shared goals

3)      Invest in and develop technical troubleshooting as a core competency

4)      Classify your problems into meta-problems and make bold targeted institutional improvements where they’ll have the biggest impact

For today’s post we’ll focus on #1.

Celebrating the revelation of problems is first and paramount.  You simply can’t fix problems you don’t know about and you can’t know about problems if people aren’t inclined to bring them into the light of day.  This is a difficult challenge for a great many organizations that struggle to achieve the right balance of ensuring accountability while at the same time maintaining a work environment where employees feel empowered or even safe to bring issues forward.

In organizations that do this poorly, upper management goes looking for scalps after an incident or lower/middle management looks for sacrificial scalps to proactively keep the senior leader wolves at bay. In other more passive aggressive cultures, folks don’t get overtly punished for identifying problems but if they do raise something they get stuck with full ownership without the needed resources or partnership to actually address it. 

In these types of work environments it’s not surprising that employees do whatever they have to do to protect themselves.  At best, they avoid acknowledging issues that they know will bring recriminations or that they’ll get stuck owning with little chance of resolution.  At worst, they’ll deny, obfuscate, deflect, filibuster and sometimes even destroy critical data.

It’s the responsibility of leadership to create an environment where employees are held accountable for their actions and results but also where problems aren’t hidden or concealed.  I always tell my teams or clients to be mad at the problem, not the person.  I also tell them they can’t get in trouble for revealing a problem but they sure can get in trouble for concealing one.

It’s also important for your deeds to match your words.  Employees listen to what you say but will be much more focused on what you do.  Sometimes it takes a lot of discipline and restraint to avoid the blame game.  If you have a one-strike-and-you’re-out policy, even if it’s only applied once in a while, don’t be surprised if significant issues are not being brought to light.

In the end, blame wastes crucial energy that could have gone towards getting to root cause and making proactive improvements. It’s counterproductive, ineffective and bad for morale.

Tune in for my next post on problem management essential #2:  Drive cross-functional problem solving teamwork via clear ownership and shared goals.

-Mike Hagan-