truth, lies, and o-rings

I’m reading Truth, Lies, and O-Rings by Allan McDonald, about the Challenger disaster. At the time of the Shuttle disaster, the author was a senior manager at Morton Thiokol, the company responsible for making the solid rocket motors, which were the point of failure.

I’m only 16% of the way through, but it is seriously like watching a car crash. It’s mind-boggling, though I suppose it shouldn’t be, given my worldview that most people, including most engineers, are incompetent, and that engineering is very hard because, unlike many professions, there is a right answer, and the wrong answer results in failure.

In any case, I am blown away reading about how MT (Morton Thiokol) is busy analyzing their O-ring problems and is aware, from observations of hardware retrieved after previous flights, that temperature plays a role. However, they don’t really try to figure this out until the night before a proposed Shuttle launch in cold weather, at which point their engineering team hastily produces a handwritten proposal recommending a launch commit criterion of 53°F at the O-ring.

Now, when your vendor tells you it’s not safe to launch below 53°F and you’re flying humans, it seems obvious in retrospect that the proper course of action is to stand down and do a full analysis. I mean, you don’t develop launch commit criteria on the back of an envelope. The real question should have been, among other things: how can you be sure that 53°F is actually safe? (The Shuttle had previously survived a mission with the O-ring at that temperature, but that’s no guarantee it would survive a thousand, or a hundred thousand, missions at that temperature given other uncertainties.)

If no astronauts were in the picture, it would have been entirely appropriate for NASA to push back, consider proceeding at risk, pressure the vendor to reconsider, whatever. But with astronauts’ lives on the line? If your vendor pulls a new launch commit criterion out of their back pocket for a system that has supposedly been qualified to a temperature 13 degrees lower? (Keep in mind that in aerospace, to qualify something to a given temperature, you test it to temperatures 30 degrees beyond that limit, lower or higher. When the author says the booster was qualified to 40°F, it’s unclear to me whether he means it was tested at 10°F or at 40°F. I presume the former.) I would conclude that my vendor was incompetent and that a full review of their system was needed.

Well, now I get to read what NASA actually did. It’s interesting reading because my knowledge of Challenger is really quite limited.

6 thoughts on “truth, lies, and o-rings”

  1. Daniel

    Someone told me engineering is turning something worth little (aluminum) into something valuable (a rocket or airplane). I have also learned that a lot of engineering is predicting the future. For example: will this piece break? How high or how far will something fly?

    If most engineers are incompetent, would you consider yourself competent or incompetent?

    How do so many engineering ventures succeed with so much incompetence?

  2. admin Post author

    Interesting questions. To answer honestly, I should probably e-mail you or password-protect the post.

    I consider myself mostly competent, but I certainly have my moments of incompetence. Basically, I feel that recognizing that competence is a high bar is the first step to achieving it.

    I think my company and NASA both have high hiring bars, and the majority of our employees and NASA’s are competent.

    But consider our alma mater: one of the best engineering schools in the nation. Would you agree that we award degrees to some individuals who could not be called competent? Then consider all the lesser schools.

    I think the most impressive engineering feats, like landing on the moon, were accomplished by some of the most competent people in the country, who also used strategies like margins of safety to account for mistakes. (Such strategies are themselves marks of competence.)

    However, the poorly designed freeway exit I take daily, which slows my commute, is a display of incompetence. So was my husband’s old apartment building, a 17-story building that was demolished after being declared unsafe because of mistakes in the structural analysis, bad materials, or something like that.

    Incompetence is all around us. It’s why capitalism works so well, as long as opportunity is preserved: it allows the competent to succeed.

    I should also note that I wrote this post in a bit of a fury. Having now read 40% of the book, I have a lot more thoughts on it all.

  3. Daniel

    OK, I agree. There are engineers I would struggle to trust. However, processes and quality checks can improve the results. I agree that a sign of competence is acknowledging and working around one’s own potential mistakes (individual and group) through margins of safety, cross-checking results, peer review, etc.

    On the note of capitalism: capitalism works well if competence can succeed and incompetence can be superseded by competence. However, sometimes capital (money, market share, entrenched interests, etc.) can help incompetence succeed longer than it should.

  4. Sarah

    I’m going to add this book to my “to read” list. I have many thoughts and complex opinions about issues like this. I haven’t read the book, but I have general knowledge of how NASA functions and I work within an organization that was formed as a direct response to the Challenger accident, and there are just so, SO many factors at play. They had seen evidence of O-ring damage on previous flights, but Challenger happened. They knew that foam was shedding off the external tank, but Columbia happened. In retrospect it seems so frustratingly obvious what should have happened, and yet it still happens.

  5. Becca

    So my perspective on this sort of thing has really changed in the program. In my old ops organization, I felt we were really good at defining constraints and categorizing them: really needed, nice to have, etc. I thought that, as engineers, we had lots of perspective: you knew the consequences of your recommendation and its impact on other systems (or you found out as part of making your recommendation). But now that I am in the organization people come to with risk, I see lots of what I call incomplete engineering. Lots of people come to you every day and say “we recommend you only operate above this limit,” not because they understand what happens below that limit, but because they know you are definitely OK above it. That is, of course, sometimes appropriate (especially where human life is at stake). But then there is also no discussion of consequences at an appropriate level, and no integration (e.g., impact on other systems). Contractors, in particular, are risk averse because they have no incentive to push for mission success (in fact, most contracts reward them financially for recommending more analysis work in all circumstances). It leaves more questions than answers. Anyway, the decision-making clearly went wrong with Challenger, but I am beginning to understand more and more how management organizations become the way they are and fail…

  6. Becca

    Also, I think competent vs. incompetent is all a matter of perspective. There are lots of engineers who can be highly competent at one thing but, when thrust into a different situation or an integrated problem, are highly incompetent. There might be a right answer, but getting the inputs to solve for the right answer is ridiculously complicated in some cases. And then add to that organizations that don’t do a good job of placing people in jobs that suit their strengths and competence…

    Anyway.
