The other day I was watching a documentary about Mars rovers, and at one point a story was told about a computer on the rover that almost had to be completely thrown out because someone dropped a tool on a table next to it. Not on it, next to it. This same rover was also planned to land in a literal freefall, crash-landing onto airbags. And that’s not even counting the vibrations and G-forces experienced during launch and reaching escape velocity.
I’ve heard similar anecdotes about the fragility of spacecraft: Apollo astronauts being nervous that a stray floating object or foot might unintentionally rip through the thin bulkheads of the lunar lander; the Hubble Space Telescope returning blurry, almost unusable pictures due to an imperfection in its mirror 1/50th the thickness of a human hair; etc.
How can NASA and other space agencies be confident that these occasionally microscopic imperfections, which can have catastrophic consequences, will not occur under the extreme stresses experienced during launch, travel, or re-entry/landing?
EDIT: Thank you for all the responses, but I think that some of you are misunderstanding the question. I’m not asking why spacecraft parts are made out of lightweight materials and are therefore naturally more fragile than more durable ones. I’m also not asking why they need to be 100% sure that the part remains operational.
I’m asking why they can be confident that parts which have such a low potential threshold for failure can be trusted to remain operational through the stresses of flight.
> How can NASA and other space agencies be confident that these occasionally microscopic imperfections, which can have catastrophic consequences, will not occur under the extreme stresses experienced during launch, travel, or re-entry/landing?
They can’t, not to 100% certainty. They put in as much redundancy as they can and test the components as much as possible, but at some point you actually have to launch the damn thing into space. There have been cases where problems weren’t caught, resulting in things like the Challenger disaster. In that case, the O-rings in a solid rocket booster joint stiffened in the unusually cold launch-day temperatures and failed to seal properly, and the resulting hot-gas leak destroyed the shuttle during ascent.
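To see why redundancy helps so much even when no single part can be trusted completely, here’s a toy calculation. The failure probability and unit counts are made-up illustrative numbers, and it assumes failures are independent (real engineering analyses are far more involved):

```python
def system_failure_prob(unit_failure_prob: float, num_units: int) -> float:
    """Probability that ALL redundant units fail, assuming independent failures."""
    return unit_failure_prob ** num_units

# Hypothetical: each flight computer has a 1% chance of failing.
p = 0.01
for n in (1, 2, 3):
    print(f"{n} unit(s): total-failure probability = {system_failure_prob(p, n):.6f}")
```

With one computer you'd fail 1 time in 100; with a backup, 1 in 10,000; with two backups, 1 in 1,000,000. That multiplicative effect is why spacecraft carry backup computers even when each individual unit is already quite reliable.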
> I’m asking why they can be confident that parts which have such a low potential threshold for failure can be trusted to remain operational through the stresses of flight.
Because they do a lot of testing, and they have gotten really good at learning from their mistakes. Basically, every time something has gone wrong, NASA does a deep dive to understand what happened and what they can do to prevent it from happening again.
Let’s pretend we have a coathanger. One of those old wire coathangers. You can hang a coat on it pretty easily, hence the name. You might even be able to hang a really heavy coat on it. But if you were to push on the hook of that coathanger, it would twist and you could unwind it pretty easily. Then it wouldn’t hold a coat anymore. It might not be able to hold anything at all!
Now, let’s pretend this computer has a switch on it, and that switch got shaken loose. When it comes time to remotely press that switch, what if it just… doesn’t go?
Pretty specific example, right? It’s not the same thing, but here’s a case of a faulty switch [causing a crash](https://www.nature.com/articles/news041018-1) in a spacecraft. This one wasn’t installed properly and the entire mission ended in rapid, unplanned deconstruction of the craft and its samples.
I’m a little late to the party so this will likely get buried, but I’m a space systems engineer with nearly 10 years’ experience working on interplanetary science missions. I write the requirements for space missions, and I author and control the high-level processes that govern events exactly like what you’re asking about here (among other things). There’s a lot going on, and it’s not well suited for an ELI5, but I’ll do my best.
First, the tool drop next to a flight computer. This is known in NASA/aerospace jargon as an anomaly. An anomaly is ANY unexpected event that occurs during the lifecycle of any given piece of hardware or software. This could be anything from an unexpected error in your software to dropping a whole damn satellite off of the fixture supporting it during ground processing (assembly, integration, and test) prior to flight – and yes, that really has happened.

In this case, the tool drop on the bench next to the flight computer (“avionics”) is a shock event. Shock in this context means a sudden transient acceleration, which is actually pretty rare in a spaceflight context – usually only occurring at the moment the spacecraft separates from the launch vehicle. We do typically test for this shock during the environmental test campaign on a fully integrated spacecraft (known in the industry as “test like you fly, fly like you test”: we subject the spacecraft on the ground to conditions as close to expected flight as we can replicate), but we don’t typically subject individual components or subsystems like a computer to a shock test. The reason is that spacecraft structures are designed specifically to, among other things, dampen the shock at sensitive components like the flight computer.

So now we’ve had an unexpected shock event for our flight computer (that’s an anomaly) – but it happened in a totally uncontrolled manner. Likely the loads it was subjected to were fine, but to know for sure we’d have to know and analyze a lot of things that we just couldn’t: what height the tool was dropped from, the angle and surface area of impact, the weight of the tool, the mechanical properties of the bench, anything on the bench that could have dampened or amplified the vibration, etc. – you get the picture.
Okay, so there’s no way to know for sure what shock the computer experienced, so the next best option is to inspect the computer for damage – except that would mean taking it all the way apart, using a scanning electron microscope to inspect each circuit board, reassembling it, and continuing on with integration to the spacecraft. This is expensive and time consuming, and Mars launch windows only come roughly every 26 months, so a long enough delay (which this one likely would have been) means you’re paying lots of money to keep the spacecraft sitting on the ground waiting for the next launch opportunity. At this point, the mission’s risk posture comes into play: do you use the computer as-is, or accept the cost and schedule overruns? This can get very political very fast, especially for something as high profile as a Mars rover, so management was likely inclined to rely on redundancy (a backup computer or computers) in the system instead.
To answer your more general question, we design and test spacecraft for extremely specific environments and use cases. These are highly complex and expensive systems, so anything outside of those narrow parameters is approached with extreme caution, and the ultimate decision will likely come down to the organization’s or individual mission’s risk tolerance.
Every component on a satellite goes through environmental testing (thermal, shock, vibration, etc.) based on where it sits on the sat and what loads it will see at launch/release and in steady-state operation. Loads are generated from collected data and finite element analysis, and then a safety factor is added on top of that for testing. So basically you test each component to excessive levels that it will (hopefully) never see in operation. If your component doesn’t survive environmental testing, one option is to mechanically isolate it from the sat using shock and vibration absorbers.
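The load-plus-safety-factor idea above can be sketched in a couple of lines. The 1.4 factor and the 10 g flight load here are illustrative placeholders, not values from any particular mission or test standard:

```python
def qual_test_level(predicted_flight_load_g: float, safety_factor: float = 1.4) -> float:
    """Qualification test level = predicted flight load (from measured data
    and finite element analysis) multiplied by a safety factor, so the
    component is tested beyond anything it should ever see in service."""
    return predicted_flight_load_g * safety_factor

# Hypothetical component predicted to see a 10 g quasi-static load at launch:
print(qual_test_level(10.0))  # tested to 14 g
```

If the component can’t survive the 14 g qualification level, that’s when options like shock/vibration isolators come into the conversation.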
This is what we did at my old job: we would make an engineering development unit (EDU), put it through all the testing, and check performance. Then we made qualification units that we beat the shit out of. Lastly, we made flight units that got only minimal testing to ensure performance.