– Thanks for coming to Space Place tonight. Our guest speaker this month is Steve Cantley. Steve was a flight controller for the International Space Station at NASA’s Mission Control in Houston at the time that the Columbia Space Shuttle accident happened, and so he had a unique viewpoint of those events that we all witnessed in the media in various ways. And so he’s going to talk to us tonight about that disastrous loss of a spacecraft and its crew. Steve is now in Madison. He’s currently Director of Innovation and Advanced Research at a firm called bb7, a product development firm here in Madison. And among other things they provide support to the IceCube experiment, the neutrino telescope at the South Pole that we operate here in Madison as well. So I hope you’ll help me welcome Steve tonight for his talk. (audience applauding)
– Thank you Jim, thank you everyone for coming. I guess let’s just dive right in. So February 1st, 2003, communication was lost in mid-sentence as Commander Rick Husband aboard Columbia was talking to Mission Control. That was right at about 8 a.m. almost precisely, and Columbia at that point was 16 minutes from landing, but four states away over north central Texas. Columbia and all the crew aboard were lost. This happened at about 200,000 feet of altitude while Columbia was traveling at Mach 18. And if you’re used to seeing photographs or footage of work being done in an aircraft accident reconstruction, the debris field was not like that at all. The debris field covered about 2500 square miles. So let’s talk a little bit about what we’re going to cover tonight. As I said, in 2003 NASA lost its second orbiter and crew in an accident. Challenger was lost in 1986, it exploded shortly after launch. Columbia, the vehicle we’re primarily talking about tonight, was lost during reentry when she came apart over north Texas. Figuring out what happened required piecing together dozens of lines of inquiry and makes a really fascinating case study in event reconstruction and also in risk management. We’re going to start with a short clip inside Mission Control. The clip is going to cover just about six minutes of time, from the very first indication of a problem until communication with the crew is lost. And in it you’re going to watch LeRoy Cain, the Flight Director, and his team try and figure out what’s going on as seemingly unconnected problems happen on the left side of the vehicle. In a couple of spots, I’m going to toss in some comments to help understand kind of what’s happening in there because the syntax in Mission Control can be awfully dense and full of acronyms.
After the clip we’ll dig into the process of understanding the physical cause of the accident and I’m going to get a little bit deep into the technical details on a couple of occasions, but trust me it’s going to be worth it because at the end you’ll have a pretty good idea of what happened and why it happened. Once we’ve discussed that I want to talk about why this happened. Or what I should have said is, is what happened and how it happened, and then we’re going to get into why it happened. And some of the organizational issues that led to that. Hopefully you’ll feel as I do, that the lessons don’t just only apply to the space business, but they’re things that you can apply in your own life as well. Okay, so this is a timeline of what happened in Mission Control the morning of February 1st. Some of the writing is a little bit small, but I’m going to go through it, and you’ll notice that there’s an awful lot of information here that you don’t see. That’s okay, we’re going to catch that a little later. At 7:15 in the morning over the Indian Ocean, Columbia did her retroburn. That was to drop her orbit so that she could return to Earth. At 7:44 the orbiter hits what’s known as entry interface. That happens at about 400,000 feet of altitude and what happens there is the orbiter first really begins to experience the Earth’s atmosphere again.
So about five minutes after that a series of maneuvers called roll reversals happen. And what’s going on there is it’s just taking an S curve. A very long S curve to try and even out heating on the left and right sides of the vehicle. At 7:50, about 7:51, a period called peak heating began, and that lasts for 10 or 15 minutes, and it’s when the temperatures on the leading edges of the wings and the nose really get to the scary high levels. 2500 degrees, 3000 degrees. At 7:53 Columbia crossed the California coast. This happened at 231,000 feet, Mach 23, so the velocities are still astronomically high. And just a few seconds after that, shown here, you’re going to hear the flight control team react to the loss of four temperature sensors in the left wing. And they’re going to talk about the sensors going off scale low. What this means is that the instrumentation system on board the orbiter has lost its connection to those sensors. By 7:55 the leading edge would typically be at about 3000 degrees, which would be about its maximum. At 7:58, almost 7:59, also on the left side of the vehicle, the tire pressure sensors for the two tires on the left main landing gear went off scale low. Just after that, about a minute later, that was when the last real time data, telemetry from the orbiter, appeared on the console work stations in Mission Control. At 7:59:32 was the last crew communication. And at 8:12, after a phone call to someone in the Mission Control Center reported that folks in Dallas had seen Columbia breaking up as she reentered overhead, Flight Director LeRoy Cain declared a spacecraft contingency and directed search and rescue assets into the debris area. So here’s the video clip. There is a little time jump in the middle. This is about six minutes total.
– [Jeff Kling] Flight, MMACS.
– [Director Cain] Go ahead, MMACS.
– [Jeff Kling] FYI, I’ve just lost four separate temperature transducers on the left side of the vehicle, hydraulic return temperatures. Two of them on system one and one in each of systems two and three.
– [Director Cain] Four hyd return temps?
– [Jeff Kling] To the left outboard and left inboard elevons.
– [Director Cain] Okay, is there anything common to them, DSC or MDM or anything? I mean, you’re telling me you lost them all at exactly the same time.
– [Jeff Kling] No, not exactly, they were within probably four or five seconds of each other.
– [Director Cain] Okay, where are those? Where is that instrumentation located?
– [Jeff Kling] They’re all four of them are located in the aft part of the left wing, right in front of the elevons.
– [Steve] You can see all of the other flight controllers really giving a good look at their data to see if they’re seeing anything as well. And then the no commonality means that there’s no single piece of hardware that could account for the loss of all four of these sensors.
– [Jeff Kling] Two of them are to the left outboard elevon, then two of them to the left inboard.
– [Director Cain] Okay, I got you.
– [Mike Sarafin] Flight, Guidance, we’re processing drag with good residual.
– [Steve] This and the next call are just housekeeping in Mission Control.
– [Bill Foster] Flight, GC.
– Go.
– Your air to grounds are enabled for the landing count.
– [Director Cain] Thank you. GNC, Flight.
– [Mike Sarafin] Flight, GNC.
– [Director Cain] Everything look good to you, control and rates and everything is nominal, right?
– [Mike Sarafin] Control’s been stable through the rolls that we’ve done so far, Flight, we have good trims. I don’t see anything out of the ordinary.
– [Director Cain] Okay.
– Here’s the time jump.
– [Jeff Kling] Pressure on left outboard and left inboard, both tires.
– [Charlie Hobaugh] And Columbia, Houston, we see your tire pressure messages and we did not copy your last.
– [Director Cain] Is it instrumentation, MMACS?
– [Jeff Kling] Flight, MMACS, those are also off–
– [Commander Husband] Roger, uh… (radio static chattering)
– And we’re going to talk about that static a little bit as well. And so as Jim alluded, I’m Steve Cantley. I’m here because the folks that you just saw are former colleagues of mine. I had the privilege to spend a whole bunch of years probably at the best job you could imagine. I was responsible for the International Space Station’s electrical power system. Worked a whole bunch of assembly missions, a whole bunch of hours on console in between assembly missions. All told, about 2500 hours. I want to be clear that while I’m going to be talking about people and decisions and mistakes tonight, I’m not here to really indict anyone or any organization. The people that I’m going to talk about are treasured and valued former colleagues who made mistakes. The information sources for this presentation, I’ve been putting it together in various form now for years. A lot of it is drawn from the Columbia Accident Investigation Board Report. Every couple of years I dive back into that and relive this time. Another information source that bears particular comment is hindsight. People say hindsight is 20-20, at some level they’re correct, at another level hindsight is very much not 20-20.
You can forget the order in which you knew things. You can forget the order in which things became apparent to you. This is why accident investigators try to talk to people just as soon as they can after an incident. And I’ve experienced this myself as I’ve put together various versions of this presentation and I’ve written down something and then later found that no, that was out of order as I go back and search the source materials. So, what we just talked about was the end of STS-107. STS-107 being the mission designation for this particular Columbia flight. On the day of the accident, on the day after the accident, Ron Dittemore, shown here, the Shuttle Program Director, spent a lot of hours on a stage giving a press conference. And he was asked essentially the same question a lot of times, and had to give the same answer. “I don’t know, we don’t know.” There was anecdotal information in the press about a foam strike during launch. That had been analyzed extensively during the mission and the best estimate of the analysts was there was maybe some damage, but there was nothing that would constitute a safety of flight issue. So in Dittemore’s world there wasn’t, there wasn’t an issue that would have brought down Columbia because of that foam. There were anecdotal words from people who were viewing Columbia as she passed overhead during the reentry.
Out west it was still before sunrise so the sky was very dark. You could see Columbia going overhead as a bright white dot. Well, they all believed that they saw little white dots coming off of the bright white dot, debris being shed during the reentry. There was no documented evidence of any of that yet so that was also anecdote. The orbiter passed north of Kirtland Air Force Base where a PI who had an infrared telescope had the habit of focusing it on the passing shuttle as it went by. And I’ve put that here at the bottom of the slide. And after a little bit of enhancement, well maybe there’s a little bit of a plume coming off the back side of the left wing. Maybe there’s a little bit of a discontinuity here at the front of the left wing. But again, it’s early days. That’s all anecdote. So we knew Columbia had been lost with all hands, we knew that recovery of debris and remains had started.
But really nobody knew anything else. So the formal investigation began the day after the accident, Sunday, February 2nd, with the formation of the Columbia Accident Investigation Board. Now they were given nine charters, I’ve pulled the third one out because I think it bears particularly on our discussion here tonight. So they’re tasked to find the facts, figure out the root cause, and recommend preventive actions for the future, but the last sentence is especially important, which I’ve highlighted in yellow. The investigation will not be conducted or used to determine questions of culpability, legal liability, or disciplinary action. That’s a remarkably enlightened thing to put in the charter for the board and to my thinking it says we really are honestly interested in finding out what went on here. We want everyone’s honest input. We don’t want people hedging to try and protect their jobs. So before we get into a discussion of the foam I want to hit just a little bit of space shuttle architecture. We have the external tank here, which is the big orange thing that holds the liquid hydrogen and the liquid oxygen that power the orbiter through its ascent. We have the left and right solid rocket boosters. They fall off a couple of minutes into flight but provide an early kick to get the whole system a good chunk of the way up. Here’s the orbiter itself.
This one is Atlantis. We’re going to be talking a lot about a substance called reinforced carbon carbon. It is a highly temperature resistant and physically pretty tough substance that lines the leading edge of the wings. So, there are 22 RCC panels on each wing, left and right. And here you can see them on the bottom of the orbiter as well, and then the very tip of the nose is also a little RCC shield because the nose and the leading edge of the wings are the parts that really get the hottest. The main landing gear doors, we’re going to talk about those a little bit. Here’s the right main landing gear door, here’s the left main landing gear door. And you can see how close the tip of the main landing gear door comes to this curve in the wing. Physically they’re pretty close. At the back end of the orbiter near the tail are the two OMS pods. OMS, orbital maneuvering system. And this is where a lot of the hardware that forms what’s called the RCS or the reaction control system resides. The RCS is all of the little jets that the orbiter uses to hold its position when it’s in orbit, and it also is two larger thrusters just above the space shuttle main engines that are involved in the reentry burn.
So it turns backward, the two big OMS engines light off, slowing down the orbiter so that it can come in to the atmosphere. And then at the very end are the three space shuttle main engines. Once the solid rocket boosters fall off during ascent they are solely responsible for providing the thrust to get to orbit. So really, the story that we’re here to talk about tonight starts 16 days before the accident, on January 16th, when Columbia launched as part of the STS-107 mission. It was a rare mission during that time period that didn’t go to the Space Station. Almost everything at that time was focused on building the Space Station, expanding it. The part of the space program that I was so keen on. About 80 seconds in, a piece of foam broke off from the external tank at the base of what’s called the bipod. I’m showing that right here. You can see this little ramp structure, well part of that foam fell off. The bipod holds the nose of the orbiter to the external tank. It fell off, traveled down the stack and impacted the left wing.
The block of foam was initially estimated to be about 20 by 16 by six, and weigh about two and two-thirds pounds. Later they figured it was a little bit longer and a little bit narrower, maybe not quite as thick, and dropped it to more like one and three-quarters pounds. It struck the wing at 530 miles an hour or so, and left a big shower of debris after the impact. The day after the launch, analysts were able to get film from tracking cameras and that’s when this incident was seen. But at 81.7 seconds, here you can see the piece of foam, this is the first time it was visible. Two tenths of a second later, 81.9 seconds, there’s the shower of foam, pulverized foam debris, coming under the surface of the wing, left behind. The analysts figured out that the most probable impact area was either that reinforced carbon carbon leading edge on the left wing, or possibly a more shallow impact with the underside of the wing. And in that case it was more likely to be near the main landing gear door. This information was all handed off to a debris assessment team. They worked for most of the rest of the mission trying to analyze what had happened. They pretty early on started to come to their own conclusions.
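To get a feel for why a block of foam that light was taken seriously at all, a back-of-the-envelope kinetic energy estimate helps. The sketch below uses only the mass and speed figures quoted above. The function name, and the treatment of the foam as a simple point mass hitting at the quoted relative speed, are assumptions made for illustration, and the number it produces says nothing by itself about what damage resulted.

```python
# Rough kinetic energy of the foam block at impact: KE = 1/2 * m * v^2.
# Assumption: the foam is a point mass moving at the quoted relative speed.

LB_TO_KG = 0.45359237   # exact definition of the avoirdupois pound in kilograms
MPH_TO_MS = 0.44704     # exact conversion from miles per hour to metres per second


def kinetic_energy_joules(mass_lb: float, speed_mph: float) -> float:
    """Return kinetic energy in joules for a mass in pounds and a speed in mph."""
    mass_kg = mass_lb * LB_TO_KG
    speed_ms = speed_mph * MPH_TO_MS
    return 0.5 * mass_kg * speed_ms ** 2


# Figures quoted in the talk: initial estimate of two and two-thirds pounds,
# revised estimate of one and three-quarters pounds, impact at about 530 mph.
initial_estimate_j = kinetic_energy_joules(8 / 3, 530)
revised_estimate_j = kinetic_energy_joules(1.75, 530)

print(f"initial mass estimate: {initial_estimate_j / 1000:.1f} kJ")
print(f"revised mass estimate: {revised_estimate_j / 1000:.1f} kJ")
```

Even with the revised, lighter mass estimate the result is a little over 20 kilojoules, because the energy grows with the square of the speed.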
They used a numerical model to try and figure out how much damage might have occurred. And they quickly realized that the critical question was did it hit the underside of the wing, the thermal tiles there, or did it hit the RCC, the leading edge. If it hit the tile, well it probably dug through some of the tile, and if that happened, there would be a little bit of extra heat getting through to the skin beneath the tile, but there was nothing that they thought would have been a long term problem. All of this was stuff that could have been fixed upon landing, before Columbia’s next launch. The risk was if it hit at that main landing gear door, maybe it would have broken a seal that during landing would have allowed hot gas to get in. But again, they didn’t think that that was likely to be the case. If the foam instead hit the RCC panel, well everyone knows that RCC panels are really tough, and so probably nothing happened there either. It might have cracked it. That had been seen before on one mission, but that wasn’t something that was likely to cause a problem during landing either. So, we think that everything is okay, but a picture is worth a thousand words. So what we really want to do is get an image of that part of the wing. Unlike many space shuttle missions, this one didn’t have a robotic arm on it.
There was no need for it, and the weight of the robotic arm could be used to carry up more experimental supplies because this was a science mission. So no way to swing a camera out and take a look. So the only recourse is to ask an organization like the Department of Defense to use some of their assets to take a look at the wing. Or to run a spacewalk solely for the purpose of going over to the edge of the payload bay and taking a look. So, outside of appropriate channels, a lot of people on the debris assessment team tried to spin up an effort to get the DOD to take a look at the wing. Almost certainly had they done it through channels, everything would have been fine, and the request would have been made and granted. But all of these different efforts to get something looked at, there were something like eight different efforts to try and get imaging done, were all outside of channels. And so Linda Ham, the chair of the STS-107 mission management team, she turned all of those requests off, for better or for worse. It was her decision to make and had the request come in to the mission management team through the appropriate engineering channels, almost certainly the request would have been granted.
But it wasn’t. And so again, just to tie it all up, the conclusion is that the model indicated that there was likely some penetration of components of the thermal protection system, but not enough to constitute a safety of flight hazard. Other foam impacts had happened in the past. This seemed very similar to them. The term that I kind of came to hate was “It’s in-family.” And since it’s in-family, and nothing bad happened then, the assumption is nothing bad is going to happen this time. So, the Columbia Accident Investigation Board now has to start the investigation. And they are impounding data from inside Mission Control, they are trying to get their hands on all of the evidence in the form of debris and anything else, so that they can analyze it and figure out what happened.
Well we’re going to talk about three kind of happy coincidences that happened early on. The first, is all of the telemetry that comes down from the orbiter goes through the TDRS satellite system, a geosynchronous satellite system. And the data comes down at a ground station in White Sands, New Mexico. From there it gets minimally processed and then sent on to Houston into Mission Control where it gets more heavily processed and put on the console screens of people like me. There was data that was in the ground station system that never got forwarded, not for any nefarious reason, but it was very intermittent, it happened in very brief spurts, never long enough for the system to lock up and say aha, I have a good signal. In addition, a lot of the packets of data had bit errors in them. So clearly there was something there, but there were probably good physical reasons why the antenna might be sweeping across the TDRS satellite and just giving the information out in little spurts. So folks who knew what they were doing were able to take that information and decode it by hand. It ended up to cover about the 30 second period between the end of Commander Husband’s aborted comment, and the beginning of vehicle breakup. And in the last two seconds it shows a pretty stark picture of the situation on Columbia. Several master alarm warnings were going off. Any one of them is a bad day. Several of them are a tremendously bad day.
The hydraulic system controls all of the flight control surfaces on the orbiter. There are three independent hydraulic systems. All of them had no hydraulic pressure left. Their hydraulic accumulator quantities were all at zero. Clearly they had all been vented to space and bled out. Without aero control surfaces Columbia had no attitude control. There was no data at all coming from the left wing of the vehicle. It is as if the wing was simply not there. Also we talked a little bit about OMS pods. Here is the left OMS pod. There was also no data coming from the left OMS pod. Again, it is as if it wasn’t there. And someone, presumably Commander Husband, had deactivated the digital autopilot by moving the control stick. The second happy coincidence happened in two parts.
As the board undertook to understand everything that happened along the ground track during Columbia’s reentry, well it turned out that there were an awful lot of people who liked to take video tapes of the orbiter reentry. This was of course before smartphones, so all of this had to be done with VHS camcorders. One of the observers in particular happened to take his GPS coordinates before Columbia came by, so we knew exactly where he was. And the coincidence is that Columbia passed directly in front of and occulted Venus. And so you have an observer with an absolute position, you have Venus with an absolute position. Well you know exactly when the orbiter went between them, and now you have an absolute time stamp. So you can start comparing all of these videotapes. You can start correlating events to one another, and you have an absolute time reference for one of the tapes. That time reference can be now expanded out to all of them. And you can look at debris releases in the video tapes and look at telemetry on board the orbiter and start to correlate things. So that was a very, very helpful thing.
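The way one absolute time stamp spreads out to every tape can be sketched in a few lines. Everything in this example is hypothetical: the tape names, the times, and the function names are made up for illustration and are not the board’s actual data or method. The idea is only that the anchor tape gets a clock offset from the Venus occultation, and any tape that shares a recognizable event with an already-synchronized tape inherits an offset from it.

```python
# Sketch: propagate one absolute time reference across several video tapes.
# All tape names and times below are hypothetical illustration values.


def propagate_offsets(anchor_tape, anchor_offset_s, shared_events):
    """Return {tape: offset_s}, where absolute_time = tape_time + offset_s.

    shared_events is a list of (tape_a, time_a_s, tape_b, time_b_s) tuples,
    each saying the same physical event appears at time_a_s on tape_a and at
    time_b_s on tape_b (both in that tape's own running time).
    """
    offsets = {anchor_tape: anchor_offset_s}
    changed = True
    while changed:  # keep sweeping until no new tape can be synchronized
        changed = False
        for tape_a, t_a, tape_b, t_b in shared_events:
            if tape_a in offsets and tape_b not in offsets:
                offsets[tape_b] = offsets[tape_a] + t_a - t_b
                changed = True
            elif tape_b in offsets and tape_a not in offsets:
                offsets[tape_a] = offsets[tape_b] + t_b - t_a
                changed = True
    return offsets


def to_absolute(tape, tape_time_s, offsets):
    """Convert a time on one tape to the shared absolute time scale."""
    return tape_time_s + offsets[tape]


# Tape A shows the occultation 12.0 s into the recording; suppose the
# computed absolute time of the occultation is 1000.0 s on some reference clock.
anchor_offset = 1000.0 - 12.0

events = [
    ("A", 30.0, "B", 5.0),    # same debris flash seen on tapes A and B
    ("B", 20.0, "C", 42.5),   # a later flash seen on tapes B and C
]
offsets = propagate_offsets("A", anchor_offset, events)
print(offsets)
```

Tape C never overlaps with the anchor tape, but it still gets an absolute time scale through tape B, which is what lets debris events on any tape be lined up against telemetry.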
And then the third is the debris field. As I said it was very extensive. Each one of these dots is a piece of debris. Here’s the Dallas-Fort Worth metroplex. The ground track was just a little bit south of Dallas in this direction. And the vehicle break up position was right about here. So this is about 250 miles long, and about 10 miles wide. 2500 square miles of pretty dense forest over in this region, forest and swamp, needed to be combed through to recover in the end a little over 30% of the vehicle by weight. As folks started to find components from Columbia’s avionics bays, that was really the next level of excitement because they really needed to find a particular piece of equipment called MADS, which stands for the modular auxiliary data system. And by saying okay, this component was here next to MADS and we found it here, okay here’s another component that we found. It was here next to MADS, and we found that there, they were able over a period of a few weeks, to draw a pretty small circle where MADS could be, and indeed when looking in that pretty small circle came up on top of a little rise, and there it was. MADS is like the flight data recorder on steroids that you’d find in an airplane. Instead of a few parameters, it holds several thousand.
Temperatures, pressures, strain gauges, all the kinds of engineering information that you’d want if you were trying to maintain a space shuttle, but equally valuable, more valuable if you’re trying to figure out how a space shuttle came apart. And then the normal sort of reconstruction that you’re used to seeing in an aircraft accident. Except in an aircraft accident there’s usually more debris available. Again, this happened at 200,000 feet at Mach 18. That they were able to find anything useful in the forests in east Texas just kind of boggles my mind. But you can get a lot of information from what’s there and the damage that you see in what’s there. But the remarkable thing in an accident like this is there’s also a lot of information available from what’s not there. The lower surface of the left wing, there’s very little debris that actually got recovered from there. There’s a lot of debris that was recovered from the lower surface of the right wing. Why the difference? Well, there must have been something going on in the left wing that heated it up enough that when this debris got released, it wasn’t able to withstand its own little bit of reentry heat. And then you build a timeline of everything. And the MADS recorder is one of the things that really really helped with that.
I’ve pulled out a very small piece of a very large map that got built by the board. Here Columbia is traveling from the northwest to the southeast, north of the Grand Canyon. Each one of these dots is a piece of telemetry, and any time there’s something interesting, they put a little cartoon bubble and tell you what that is. With no a priori opinion about whether it’s going to be relevant, but let’s put it all together so that we can go back and look at it in sequence, and see what we can learn from it. Another piece of really important timeline was recovered by USSPACECOM. USSPACECOM is an organization, it’s part of the Air Force and they’re, one of their important jobs is to track all of the orbital debris that exists in low Earth orbit. If there’s a chunk big enough that’s going to come within about a kilometer of the Space Station within the next few days, well you stop what you’re doing on the Space Station and you do a little bit of a burn to take yourself out of its box, so it never has the chance to come close enough to you to cause a problem.
But what they found upon examining their archival data was that on flight day two, the day after launch, a small object departed the left hand side of the orbiter. They couldn’t tell exactly where it came from, but they could tell it came from the left side. They couldn’t tell you what it was, but they could tell you how much of a radar return they got off of it. And that radar return eventually became very important. We’re going to talk about that in a few minutes. So on the second chart, I showed you the first version of this with a lot of stuff grayed out. People probably can’t see it from the back, that’s fine. I’m going to talk about just a few of these things that after MADS data and the data that was hidden in the ground processing system at White Sands, well there’s a lot more that’s known. So entry interface at 7:44:09. At 7:48:39, so four minutes and 30 seconds later, a strain gauge, this was information derived from MADS, a strain gauge in the left wing begins to show an off nominal increase. This is behavior that had never been seen on any of Columbia’s entries before. So four minutes after entry interface, the vehicle is already beginning to respond to something going wrong on the left hand side of the vehicle. Start of peak heating started at right about 7:51.
Well, at 7:52 there was the first clear indication, again, this was in MADS data, of off-nominal aerodynamic forces on the vehicle. We talked about the hydraulic line return temps going off scale low just after Columbia crossed the California coast. Well just less than 10 seconds after that, the first observer on the ground observed the shedding of debris. And we wouldn’t have been able to time-tag that without the Venus transit that I mentioned earlier. Just a little bit later, less than a minute later, came debris number six, one of the brightest, possibly indicating the loss of an object of greater than a hundred pounds coming off of Columbia. The leading edge would typically reach its maximum temperature at about 7:55. At 7:58 the Kirtland Air Force Base asymmetrical profile was captured. And just seconds after that the aerodynamic forces on Columbia really started to just go well away from what was sustainable. At 7:59:23, that was the last data on MCC console workstations. Well, it also coincides with when the attitude control system was no longer able to maintain Columbia’s attitude. 7:59:32 was the last crew communication. 8 o’clock and 18 seconds was the onset of orbiter break up, here. And at 8 o’clock and 57 seconds the crew module was observed breaking up into its subcomponents. And at that point of course, we know we had lost the crew. So, the little bit of data that they were able to find in MADS, in the telemetry processing system, and with that lucky transit of Venus, it allowed the board to put together this pretty clear view of what was going on.
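One way to read a timeline like this is as elapsed time since entry interface rather than as clock time. The short sketch below does that arithmetic for the time stamps quoted in the talk. The event labels are paraphrased and the helper names are made up for this example; only the clock times come from the talk.

```python
# Sketch: express the quoted timeline as elapsed time since entry interface.
from datetime import datetime

TIME_FORMAT = "%H:%M:%S"
ENTRY_INTERFACE = "07:44:09"

# Clock times as quoted in the talk; labels are paraphrased.
EVENTS = [
    ("07:48:39", "left wing strain gauge goes off nominal"),
    ("07:59:23", "last data on MCC console workstations"),
    ("07:59:32", "last crew communication"),
    ("08:00:18", "onset of orbiter breakup"),
    ("08:00:57", "crew module breakup observed"),
]


def elapsed_seconds(start: str, end: str) -> int:
    """Whole seconds between two same-day clock times in HH:MM:SS form."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return int(delta.total_seconds())


def format_mmss(seconds: int) -> str:
    """Format a non-negative number of seconds as M:SS."""
    return f"{seconds // 60}:{seconds % 60:02d}"


since_ei = {label: elapsed_seconds(ENTRY_INTERFACE, clock) for clock, label in EVENTS}

for label, seconds in since_ei.items():
    print(f"EI + {format_mmss(seconds):>5}  {label}")
```

The first line of output reproduces the four minutes and 30 seconds between entry interface and the first off-nominal strain gauge reading mentioned above.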
So let’s just really briefly summarize what the clues tell us. This column is the right wing and the fuselage. Kind of highlighted in yellow. This column is the left wing. So the debris from the right side of the vehicle is distributed all together. The debris from the left side of the vehicle is found up range of where the vehicle came apart. That’s an important clue that something was going on on the left side. Everything on the right wing got brief heating as the vehicle came apart. Something that’s big and fluffy is going to slow down very quickly and endure very little heat. Something that is more moderate is going to get more moderate heating. So, everything on the right side got that quicker moderate heating. Everything on the left side, the heavy structure of the wing showed signs of really extended exposure to hypersonic, high heat airflow. The wing tiles from the right side of the vehicle, well the adhesive that holds the tile to the aluminum skin is stronger than the tile material. So if a tile came off, it fractured part way through the thickness. It’s not going to pull off the skin. On the left side of the vehicle, well it’s a room temperature vulcanizing adhesive. And so if the adhesive gets warm the tile adhesive releases, the tile isn’t fractured, instead it has delaminated from the vehicle. This is what was seen. How does that adhesive get warm? The only way that adhesive can get warm is by heating up the inside of the wing, and the heat leaking out through the aluminum skin. Telemetry and the MADS data showed everything is normal on the right side of the vehicle. Telemetry and the MADS data showed an accumulating set of instrumentation going off scale low on the left-hand side.
And the aerodynamics on the right side were normal, the aerodynamics on the left were not. There was excess drag from the left wing. Initially there was too little lift from the left wing, something was spoiling the lift. Later, as the wing presumably changed shape as structure inside was melted away, kind of surprisingly, the lift from the left wing actually increased. So all of this paints a pretty clear picture that something was going on on the left side of the vehicle. One of the things the Columbia board had to deal with was vast disagreement among engineers about what had happened. There was a large contingent that believed that no foam strike could possibly damage the RCC. And that people looking at that, despite what the MADS data suggested, people looking at that were just plain wrong. And that the real problem is either the foam had struck the left main landing gear door, and provided a gas entry path there, or something else completely unrelated to either of these problems had happened, and people were completely fooling themselves by examining the foam. Something else had gone wrong. And so one of the really important things that happened, and part of the justification was that object that was seen departing from the left side of the orbiter on flight day two. After the board had tested a lot of materials, one of the possibilities for what that object was was a piece of RCC maybe this big. And so let’s run an actual test and we will either learn that this dog can hunt, or we will learn that we’ve been looking in the wrong place.
– [Announcer] Three, two, one, zero. (loud noise)
– Whoa.
– Whoa.
– You heard a lot of sounds of surprise, and those were real. There were a lot of people who thought that this test was going to fail. The next view after this one is going to be from inside the wing and look at a couple of those pieces. And think about that flight day two object that was seen in radar. So let’s tie all of this together and we’ll start with the board’s conclusion. A piece of foam from the external tank likely penetrated RCC panel eight or nine, that’s shown in this little red circle here, allowing hot entry gas to enter Columbia’s wing structure during reentry. The left wing lost structural integrity. The wing spar right behind the RCC panel burned through. The orbiter comes in at about a 40 degree pitch, and so, hot gas would have been coming in angled about like this. Well eventually, after the wing breached, there was probably an outflow through the top of the wing. What is that pointed at? It’s pointed at the OMS pod right behind the wing. Why was there no information from the OMS pod? Because the OMS pod had been melted away. Its thermal protection was not sufficient, was not designed to see full reentry heat.
The digital autopilot was reacting to excess lift and drag from the left side of the orbiter. RCS jets, that’s the small jets that hold attitude, were trying to help the flight control surfaces maintain attitude, and eventually the combination of the two just wasn’t able to handle the forces trying to pull the orbiter’s nose to the left. And the wing down. And when that happened, the orbiter pitched up so that its belly was directly into its path, and there’s no aerodynamic structure that’s going to handle that for very long. Immediately after the pitch up the left wing just simply came off. And shortly after that the orbiter started breaking up into pieces, and after that happened of course, they started breaking up into smaller and smaller pieces. So if you’ve ever been in an industrial safety course, I’ve been in a couple, the instructor always talks about two core beliefs. All accidents are avoidable, and a near miss is an accident that didn’t happen. You put these two together and it builds a useful mental model of how to avoid accidents. Because any accident is actually a sequence of multiple things that have to happen in order. And if you can break an error chain at any point in that sequence, you’ve kept the accident from happening. So do I believe that all accidents are avoidable? No I do not. Does that mean that I don’t accept this first core belief as an incredibly useful thing? I absolutely accept it. My job as a mission controller was to break the error chain whenever I could see it. So what we have here on the right hand side is the underbelly of Atlantis after STS-27. So this would have been two missions after we lost Challenger.
The mission after the return to flight. And some foam came off of the nose cone of the right hand solid rocket booster and tore the belly of the orbiter up. Atlantis is actually the most heavily damaged orbiter to ever safely return from space. It was serious enough, the crew had a much better view of it than the folks in mission control did. They thought that they likely were not going to survive reentry. They very nearly didn’t. The background of this slide shows a complete missing slide, I’m sorry, a complete missing tile. And that exposed the aluminum skin to reentry heat. Just on the other side of that aluminum skin was a piece of heavy structure that supported an antenna. That structure was able to pull enough heat away from the aluminum skin to keep a burn through from happening and saved Atlantis and her crew.
A near miss is an accident that didn’t happen. And if you’re a healthy safety organization you pay attention to the near misses because by killing all the near misses you make yourself safe. So, analysis after the Columbia accident showed that the piece of foam that came off of that bipod area actually happened in about 10% of the launches. NASA was aware of three other instances, including this STS-112 mission that happened two launches before Columbia’s. But deeper examination of archival tracking film showed that there were several more than that. It turned out to be about 10% of launches lost foam from that area. On the other hand, almost every launch had foam that came off of the external tank from somewhere on it. And some of those pieces of foam ended up hitting the orbiter and damaging the thermal protection system, but never in a way that meant that we lost the orbiter on landing. STS-27 came pretty close. That was foam from the solid rocket booster, not the external tank. On STS-45 something, nobody’s sure quite what, but they’re pretty sure it was man made, something hit again, Atlantis’s right wing, and put a crack in it, in the RCC, that was about two inches long. A near miss is an accident that didn’t happen, and you need to pay attention to the near misses to keep future accidents from happening. And then on STS-112, just a few months before Columbia’s launch, a piece of foam came off, this time the same bipod area. Instead of hitting the wing, it traveled further down the vehicle, so ended up with a larger relative velocity, and hit an electronics box kind of down near the base of the left hand solid rocket booster, where it left a pretty decent dent in that electronic box’s cover. Break the error chain and fix the near misses.
Then of course we can go back even further to 1986 when Challenger was lost a little over a minute into flight because two O-rings that keep hot gasses inside the solid rocket booster failed, let hot gasses bypass this joint, where they eventually burned a hole in the liquid hydrogen tank on the external tank. Several previous missions had seen evidence that the O-rings were degraded. On two previous missions that happened with especially low launch day temperatures, the first O-ring had burned through completely. A near miss is an accident that didn’t happen. So, I promised you some of my thoughts. And they’re going to be in two or three different flavors.
First of all, in terms of management issues, there was a mindset because of this in-family problem that we talked about, where we had seen the foam hit the orbiter before and nothing bad happened. Well, it didn’t cause problems then. You have to prove to me that there is a hazard that I don’t know about. Well that’s not the way a really safety conscious organization should deal with it. If it’s something that shouldn’t be there, even if you’ve seen it before, it should be up to me to demonstrate that there isn’t a hazard this time. And that’s not what was going on. NASA was continuously asked to do more with less budget. And that just reinforces those first two bullets. And finally, Congress had imposed upon NASA a goal of having a Space Station component called Node 2, that had to be launched by February 19th, 2004, about a year after the Columbia accident. The Shuttle and Station Program Offices didn’t have any schedule margin left. They were always hovering around about 15 days positive margin or 30 days negative margin as things happened as they got closer and closer to this date. And so that well-defined a launch date requirement, a year, two years, three years into the future, imposed competing priorities on the Shuttle and Station programs and it also caused mismatched expectations between senior NASA management and the workforce. The board asked all of the senior managers, is that a hard and fast date? Would you have been able to change that date if you had really needed to? And uniformly they said, sure we think we could have explained that to Congress and it would have been okay. Folks at the working troop level saw how hard senior managers were fighting to protect that date, and everyone at my level and even several levels higher believed that missing that date was an existential threat to the Space Station. So, how could we adequately treat this risk if everyone had a different opinion of how important that date was? 
I believe in many ways that the Columbia accident is just the same sort of thing as the Challenger accident.
Both were caused by what I’ve said here is a creeping acceptance of nagging problems by the Shuttle Program Office. In the case of Challenger, it was hot gasses getting past O-rings in the solid rocket booster. And several other things as well, but that was the one that ended up destroying an orbiter. In Columbia’s case it was acceptance of foam falling off the external tank and striking the orbiter. The very first set of requirements that the orbiter was designed under said, nothing can fall off of the external tank and hit the orbiter because you risk the thermal protection system. It happened and it happened and it happened, and it became accepted. And then I have a really deep empathy for the decision-makers, like Linda Ham and Ron Dittemore because in their positions, I believe deep, deep down, I would have made the same decisions. And there’s a really good reason for that. I was in Mission Control, they had started in Mission Control. They were both shuttle flight controllers. They were really good at it. They were so good that they got promoted to flight director. They did an awesome job in the Flight Director Office. And when they had spent enough time there and were looking for other challenges they were given senior jobs in the Space Shuttle Program Office where they did really good work. They made mistakes. You can’t pay attention to everything all the time. And they got complacent about the foam and so we lost an orbiter. The key here is that in order to get into those senior positions, they had to conform to the expectations first of the people in Mission Control, who would be able to promote them to flight director, and then senior NASA management, who would be able to move them into the Program Office. And if they didn’t conform, they wouldn’t get those promotions. And I don’t mean conform here in a bad way, I mean the organization has expectations of people who are going to do well. And if you do well, you’re conforming with those expectations. 
So people who are sure, yeah I wouldn’t have made that decision.
Well, if you’re right, then that means that you would have never made it out of the bleachers into the big boy seats and big girl seats that people like Ham and Dittemore got to sit in. And so that’s the really difficult piece of all of this, knowing that I would have made the same mistakes. And so this is why in an organization like NASA, that has a very strong can-do attitude, it’s really important to listen when people from outside the business come in and say, you know what, your culture needs to be fixed in these two or three areas, I’d really like you to think about it. Instead of saying, we’re Mission Control, we know what we’re doing, no one else does it as well. Well, that’s all true, but that doesn’t mean that you can’t make an improvement and that that improvement isn’t important. And then, what are the enduring lessons, well, I worked in Mission Control when we lost Columbia. It’s on me. You know, this is, it’s one of those things that I think of as we have responsibilities that are simultaneously collective and individual. Yes, the team failed, but yes everyone on the team failed because they didn’t stand up and say, stop, this can’t go on any longer.
An author named Diane Vaughan wrote a remarkable book after the Challenger accident. She’s a sociologist, not an engineer, and so she thinks in terms of influence and hierarchy in communication, and social networks. And the term she brought to my attention is the normalization of deviance. The gradual process through which unacceptable practice or standards become acceptable. As the deviant behavior is repeated without catastrophic result, it becomes the social norm for the organization. So think about every organization that you’re part of, from your family to the United States, and where are you allowing deviance to become normalized. So one of the members of the board paraphrased this as, the unexpected becomes the expected, and then becomes the accepted. And once you start accepting foam coming off of the external tank, and hitting the orbiter, well it’s only a matter of time until something like Columbia happens. And so my parting admonition is that we can’t ever stop taking appropriate risks because the universe kind of turns gray then and life loses everything that really makes it meaningful. What we can’t have are inappropriate risks. And inappropriate risks, the analogy I would draw to the things that we’ve just been talking about, are when management has one view of the importance of a deadline, and everyone at the working level believes something entirely different about the importance of that deadline. If that’s happening, then you can’t have an honest conversation about what the risk is, what the reward is, and whether the risk and the reward match. And so, in the space biz, I would say, don’t accept a risk because you didn’t ask the right questions, imagine the right conditions, or challenge a key assumption that really needed to be challenged. And so, here we go. This is the crew of STS-107, and Columbia on her last launch. So thank you.
(audience applauding)