More on Macondo

I’ve now had time to read the whole US Commission report on the BP Deepwater Horizon disaster in the Gulf of Mexico, including the discussion sections I’d not read earlier so as not to be influenced when I published my initial conclusions. The picture is ever clearer.

“Most, if not all, of the failures at Macondo can be traced back to underlying failures of management and communication. Better management of decision-making processes within BP and other companies, better communication within and between BP and its contractors, and effective training of key engineering and rig personnel would have prevented the Macondo incident.”

My emphasis this time is on their positive use of “would” – i.e. without doubt. My own agenda here is to pick up the communication and decision-making aspects of business management systems, but as an engineer in the downstream business, and as a human, you have to feel for the guys who made the mistakes and struggled with their consequences, in many cases to their deaths.

It’s a long time since BP was a “British” company, and any finger-pointing between BP, Halliburton and Transocean is unhelpful. It is creditable to notice lines in the official (US) report like

“As BP’s own report agrees …”

compared to

“Halliburton has to date provided nothing … “

or

“Halliburton should have …”

My point is that the responsibility is shared industrially (as the report concludes), and I see BP taking its share.

I make that point because I did make an observation earlier about the hairy-arsed “wild-catting” culture present at the sharp end of this industry, with a US frontier-freedoms mentality wherever in the world the operation is. Any sophisticated business managing such operations – however good BP is – would be unlikely to change that “by design”, and in fact should think hard before attempting to do so.

Remember this was one of the largest, newest and most sophisticated rigs in the world. There is a recommendation about the control and monitoring systems in use, particularly during the fateful period when the “kick” had already started and the fatal blow-out was on its way:

“Why did the crew miss or misinterpret these signals? One possible reason is that they had done a number of things that confounded their ability to interpret [the] signals ….

In the future, the instrumentation and displays used for well monitoring must be improved. There is no apparent reason why more sophisticated, automated alarms and algorithms cannot be built into the display system to alert the driller and mudlogger when anomalies arise. These individuals sit for 12 hours at a time in front of these displays. In light of the potential consequences, it is no longer acceptable to rely on a system that requires the right person to be looking at the right data at the right time, and then to understand its significance in spite of simultaneous activities and other monitoring responsibilities.”

Hard to argue with that? But it is very important to distinguish decision-making from decision-support. We are all relying on a tremendous amount of experience and judgement, not to mention risk-taking balls, at the upstream sharp end of the business, drilling into the unknown. There will be blood? Hopefully not, but it is part of the risk. There are some clear management and control-system safety-critical steps in all these processes, which need to be treated as such, with fail-safe steps where needed, but we need to be careful not to (try to) automate all risk out of the system. People are highly ingenious at bypassing systems that prevent them doing their job, and applying controls in the wrong places can counter-intuitively increase the risks. We need systems that support people doing their jobs, not take them out of the loop entirely. There is good reason why the human eye is brought to bear on these processes. Proper risk assessment is one thing, but knowing when to do it and what to do with the result needs focus.
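The report’s call for better alarms invites a sketch. Purely to illustrate the decision-support (not decision-making) point, here is a minimal, hypothetical kick-alarm check – the field names, readings and thresholds are invented for illustration, not taken from any real mudlogging system:

```python
# Hypothetical decision-support kick alarm. All names and thresholds
# are invented for illustration; real mudlogging systems differ.
from dataclasses import dataclass

@dataclass
class WellReading:
    flow_in_gpm: float    # mud pumped into the well
    flow_out_gpm: float   # returns measured at the flow line
    pit_gain_bbl: float   # change in active pit volume

def kick_alerts(reading: WellReading,
                flow_delta_limit_gpm: float = 25.0,
                pit_gain_limit_bbl: float = 10.0) -> list[str]:
    """Return human-readable alerts; the driller decides what to do."""
    alerts = []
    if reading.flow_out_gpm - reading.flow_in_gpm > flow_delta_limit_gpm:
        alerts.append("returns exceed pump rate - possible influx (kick)")
    if reading.pit_gain_bbl > pit_gain_limit_bbl:
        alerts.append("active pit volume rising - possible influx (kick)")
    return alerts

# The alarm draws the eye to the anomaly; the shut-in decision stays
# with the human in the loop.
for alert in kick_alerts(WellReading(800.0, 860.0, 12.0)):
    print("ALARM:", alert)
```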

There are a number of other things also borne out by the report.

If you’ve never actually experienced a disaster first hand, it is difficult to appreciate that one is actually taking place; denial is naturally human – the hope for anything but that. By definition, the safer the industry in general, the fewer participants have the necessary experience. The captain of the Titanic comes to mind. That is why drills and simulations of the worst-case risks are so important to take seriously. This point is so important it makes it into the summary paragraph above.

Integrity & pressure testing is something of which I have considerable experience. Such testing inevitably occurs late in the process – as early as possible, naturally, but nevertheless towards the end of the job. Failing such a test can therefore carry great business delay, cost and rework consequences, and all the attendant contractual responsibility wrangling that might entail. So, paradoxically, the integrity / pressure test point is when you most want failure to occur. Such tests may be potentially destructive by design, and if the job is going to fail, this is precisely when we need it to happen: when the health and safety risk is lowest and the business value risk is almost at its peak. You need to be looking for failure here. It takes balls to fail a pressure / integrity test, and the people & processes here need real authority and independence from the business productivity roles. I already mentioned the need to acknowledge safety criticality in the levels of surveillance and regulation imposed from outside the working team. Again the report (and BP’s own actions since its own investigation) recognizes this issue well. There really should have been (almost literally) alarm bells ringing before this test process even started. It could hardly have been more critical.
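To make the “looking for failure” point concrete, here is a caricature of a pressure-hold acceptance rule in which any ambiguity defaults to failure – the function and its figures are my illustrative assumptions, not any standard’s actual criteria:

```python
# Caricature of a pressure-hold acceptance test. Figures illustrative;
# the point is that the default answer is "fail".
def pressure_test_passes(readings_psi: list[float],
                         target_psi: float,
                         max_decay_psi: float = 50.0) -> bool:
    """Pass only if pressure reached target and then held within tolerance."""
    if not readings_psi or max(readings_psi) < target_psi:
        return False  # never reached test pressure: fail
    # Examine the hold period from the peak reading onwards.
    hold = readings_psi[readings_psi.index(max(readings_psi)):]
    # Any decay beyond tolerance is a failure - exactly what we are
    # looking for at this point, while the well is not yet live.
    return max(hold) - min(hold) <= max_decay_psi
```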

From the most significant failure point to an incidental one, though both are examples of the communication of information for decision-making in the summary paragraph: the confusion about whether or not the specified centralizers had actually been delivered and were available as the correct type (design class), affecting the decision as to the centralizing arrangement actually deployed. There are several ironies in that inconclusive chain of decisions, which provided the unfortunate quote used as the headline in the report.

“Who cares. It’s done … we’ll probably be fine …”

Supply-chain confusion about the type of materials actually delivered and available: how hard can it be for supplied items to be marked, and systems informed, with their true class (type)? One for the information modelling and class libraries aspects of the ISO15926 day job.
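A minimal sketch of what “marking items with their true class” might look like in data terms – the library name and class codes below are hypothetical placeholders, not actual ISO15926 reference-data identifiers:

```python
# Sketch: supplied items carry an explicit design class, so the crew is
# not guessing what was delivered. Names are hypothetical placeholders,
# not actual ISO15926 reference-data identifiers.
from dataclasses import dataclass

@dataclass(frozen=True)
class DesignClass:
    library: str   # shared reference-data library the code belongs to
    code: str      # the class (type) the item was specified against

@dataclass
class DeliveredItem:
    tag: str
    design_class: DesignClass

SPECIFIED = DesignClass("example-rdl", "CENTRALIZER-BOW-SPRING")

def matches_spec(item: DeliveredItem) -> bool:
    return item.design_class == SPECIFIED

delivered = DeliveredItem("CTZ-015",
                          DesignClass("example-rdl", "CENTRALIZER-SLIP-ON"))
print(matches_spec(delivered))   # False - flag the mismatch, don't assume
```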

The BP Commission Report

Still digesting this

They had been operating on well-known and understood tight margins of pressure balance ever since the incident during partial drilling by the earlier rig, right through completion of the drilling to the final “primary” cement job. That balance was always between too little (mud, pressure, cement, etc.) failing to control the hazardous hydrocarbons, and too much (mud, pressure, cement, etc.) destroying (the value of) the well. It may seem scary to lay people, but this is always what engineering is about – difficult judgements by responsible, moral people – we’ll “probably” be OK. It looks like “cost-cutting” to do less, but we all cost-cut (look for the best price, the most cost/value-effective option) every day.
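For lay readers, that balance can be made concrete with the standard oilfield hydrostatic conversion (0.052 psi per foot per ppg of mud weight); the well figures below are invented to show how narrow such a window can be, not Macondo’s actual numbers:

```python
# The "too little vs too much" balance, with invented figures.
def hydrostatic_psi(mud_weight_ppg: float, depth_ft: float) -> float:
    # Standard oilfield conversion: 0.052 psi per foot per ppg.
    return 0.052 * mud_weight_ppg * depth_ft

def balance(mud_weight_ppg: float, depth_ft: float,
            pore_psi: float, frac_psi: float) -> str:
    p = hydrostatic_psi(mud_weight_ppg, depth_ft)
    if p <= pore_psi:
        return "too little: formation fluids can flow in (kick risk)"
    if p >= frac_psi:
        return "too much: risk of fracturing the formation (lost well)"
    return f"in window at {p:.0f} psi, margins {p - pore_psi:.0f} / {frac_psi - p:.0f} psi"

# A narrow pore-pressure / fracture-pressure window leaves little room
# for judgement error either way.
print(balance(14.0, 18000.0, pore_psi=12900.0, frac_psi=13200.0))
```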

[At this point, I’ve only read as far as the end of the cement design and analysis – ch4, p102 – and I’ve not seen any mentions (yet) of the problems and risks associated with the BOP systems, or the topsides relief systems – serious, but secondary. But I’ll hazard a guess (based on earlier reading of BP’s own report) that the real failure is the decision to ignore the failed negative-pressure test (!), and the failure of any warning / criticality signs in BP’s higher supervisory management systems that this whole operation was on tight margins – signs which could have enforced double checks on safety-critical decision points like this one, and other additional quality surveillance. As I said earlier, the irony is that BP were one of the first to introduce “criticality” ratings to the industry, 25 years ago.]

So, continuing, reading on … a quote from the commission report (their italic emphasis, not mine), where even with hindsight their use of the conditional “would” is telling.

“At the Macondo well, the negative-pressure test was the only test performed that would have checked the integrity of the bottom-hole cement job.”

And later …

“It was therefore critical to test and confirm the ability of the well (including the primary cement job) to withstand the under-balance.”

The visiting execs and the new trainee in the team both add to the dynamics of dealing with the apparent problem at a critical moment in what was already known to be a critically-balanced situation – interesting. And then the fateful error:

“… the 1,400 psi reading on the drill pipe could only have been caused by a leak into the well. Nevertheless, at 8 pm, BP Well Site Leaders, in consultation with the crew, made a key error and mistakenly concluded the second negative test procedure had confirmed the well’s integrity.”
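Reduced to a caricature, the interpretation rule the crew had to apply is stark – the 1,400 psi figure is from the report; the rule as code is my illustrative assumption:

```python
# Caricature of negative-pressure test interpretation. The 1,400 psi
# reading is from the report; the rule as code is illustrative.
def negative_test_ok(drill_pipe_psi: float, flow_observed: bool) -> bool:
    """A well with integrity holds ~zero pressure after bleed-off.

    Sustained pressure (or flow) means fluid is leaking into the well;
    there is no benign explanation to reach for.
    """
    return drill_pipe_psi <= 0.0 and not flow_observed

print(negative_test_ok(1400.0, flow_observed=False))   # False - test failed
```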

After that, yes, the BOPs should have been a last line of defence, but weren’t … it’s history … Having been in the pressure-testing position myself on several projects, I feel for Anderson … was he amongst the dead, I wonder? [He was.]

The recommendations need reading in detail, but these look like systemic management / surveillance / regulation needs, so that what looks like a normal process in an abnormal situation doesn’t (accidentally) skip critical checks. To its credit, BP still seems to be taking the full hit of responsibility, but I doubt BP is special in this respect. These are industry needs.