Random hardware faults – i.e. individual gates going nuts and driving a value they’re not supposed to – are practically expected in every electronic device, at a very low probability. When we talk about mobile or home entertainment devices, we can live with their impact. But when we talk about safety-critical designs, such as automotive or medical, we could well die from it. That explains why the ISO 26262 automotive safety standard is obsessed with analyzing and minimizing the risk they pose. While some may view that obsession as pure pain, I think it’s an exciting new challenge. I’m thrilled to join the Horizons BLOG team and get an opportunity to convince our readers of this view. If I do my job properly, I’ll get to blog much more on ISO 26262, so keep your fingers crossed.
Gates are a lot like bikes. A bike could go wrong in endless ways – I once had a bike so old it literally broke in two with me on it – but bikes usually fail in a few common ways: 70% flat tires, 15% chain-ring corrosion, 13% brakes, 2% everything else. Any bike shop could give you those numbers, and they’ll be largely similar. Which of these problems you get often depends on the way you ride and the kind of bike you have. The exact same goes for gates: though they could go wrong in endless ways, they usually go wrong in just a few, which largely depend on environmental conditions and the production process. The most common “failure modes” for gates are single event and stuck-at, which basically mean the gate gets a wrong value for one cycle or indefinitely, respectively. Your fab and some scientific measurements could give you the probability of each.
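To make the two failure modes concrete, here is a minimal Python sketch. It is a toy model I made up for illustration, not how a real fault simulator works: the `and_gate` and `run` helpers, the fault tuples, and the stimulus are all assumptions. It drives a single AND gate for four cycles and compares the fault-free output with a stuck-at-0 output and with a single-event upset in one cycle.

```python
def and_gate(a: int, b: int) -> int:
    """Fault-free behaviour of a 2-input AND gate."""
    return a & b


def run(inputs, fault=None):
    """Evaluate the gate cycle by cycle, optionally corrupting its output.

    fault is None, ("stuck_at", value) or ("single_event", cycle).
    These tuples are a hypothetical encoding chosen for this sketch.
    """
    outputs = []
    for cycle, (a, b) in enumerate(inputs):
        out = and_gate(a, b)
        if fault is not None:
            kind, arg = fault
            if kind == "stuck_at":
                out = arg      # output forced to the same value in every cycle
            elif kind == "single_event" and cycle == arg:
                out ^= 1       # output flipped in that one cycle only
        outputs.append(out)
    return outputs


stimulus = [(1, 1), (0, 1), (1, 1), (1, 0)]
golden = run(stimulus)                       # fault-free reference
stuck = run(stimulus, ("stuck_at", 0))       # wrong value indefinitely
upset = run(stimulus, ("single_event", 2))   # wrong value for one cycle

print("golden:", golden)
print("stuck :", stuck)
print("upset :", upset)
```

Comparing each faulty run against `golden` shows the difference: the stuck-at fault corrupts every cycle in which the gate should have driven 1, while the single-event fault corrupts exactly one cycle and then disappears.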
Some bike “faults” will be “safe” and others “unsafe”. With a flat tire you still get to stop on the road-side and curse, but not if you lose your brakes downhill. Some faults will be safe in one state and unsafe in another – lose your brakes on a plain road and you’re probably fine. At a high level, ISO 26262 requires that you look at the faults the gates in your design could have, then make sure the “unsafe” fault probability is below a certain number. Sticking to our bike example, we could say flat-tire and chain-ring problems are “safe”, and assuming all our trips are either down or uphill, we’re left with the 13% of brake faults as “unsafe”, plus everything hiding in the remaining 2%.
15% unsafe faults is way too much for some ISO certifications, so what do we do? The expensive way is to put in a redundant brake system. The smart way is to refine our analysis and check how many of those brake failures really happen on a downhill ride, and whether all of them are really that bad. This can be a complicated thing to do, but would sure be cheaper than shipping an additional brake system with every bike. When we come to complex ICs, “smart” needs to be “very very smart” and “cheaper” might mean you get to keep your job. That explains why, as I said, I find this such a challenging problem to solve. If you still think “fault analysis” is pure pain, I hope you see by now that “no fault analysis” can be much worse.
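The bookkeeping behind the bike example fits in a few lines of Python. This is only a sketch of the arithmetic, not an ISO 26262 metric calculation: the `unsafe_share` helper and the `exposure` weighting are my own illustrative names, and the 50% downhill figure is a made-up assumption. The failure-mode shares are the ones quoted for the bike shop.

```python
# Failure-mode shares from the bike example, in percent.
failure_modes = {
    "flat tire": 70,
    "chain-ring corrosion": 15,
    "brakes": 13,
    "everything else": 2,
}

# Modes the text treats as "safe".
safe_modes = {"flat tire", "chain-ring corrosion"}


def unsafe_share(modes, safe, exposure=None):
    """Return the percentage of faults counted as unsafe.

    exposure maps a failure mode to the fraction of situations in which
    that fault is actually dangerous. Modes not listed count in full,
    which is the pessimistic default.
    """
    exposure = exposure or {}
    return sum(
        share * exposure.get(name, 1.0)
        for name, share in modes.items()
        if name not in safe
    )


# Pessimistic view: every brake fault and every unclassified fault is unsafe.
baseline = unsafe_share(failure_modes, safe_modes)

# Refined view, under a hypothetical assumption that brake faults are
# dangerous only on the half of the riding that goes downhill.
refined = unsafe_share(failure_modes, safe_modes, {"brakes": 0.5})

print(f"baseline unsafe share: {baseline}%")
print(f"refined unsafe share:  {refined}%")
```

The refinement costs nothing to ship, which is the whole point: it only changes how much of the brake share is counted, while the unclassified 2% still counts in full until someone analyzes it.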
For more information on getting ISO 26262 faults straight, please review my full article on Verification Academy.
I look forward to your comments.