Getting ISO 26262 faults straight
Random hardware faults – i.e. individual gates going nuts and driving a value they’re not supposed to – are practically expected in every electronic device, at a very low probability. When we talk about mobile or home entertainment devices, we can live with their impact. But when we talk about safety-critical designs, such as automotive or medical, we could well die from it. That explains why the ISO 26262 automotive safety standard is obsessed with analyzing and minimizing the risk they pose. While some may view that obsession as pure pain, I think it’s an exciting new challenge. I’m thrilled to join the Horizons BLOG team and get an opportunity to convince our readers of this view. If I do my job properly, I’ll get to blog much more on ISO 26262, so keep your fingers crossed.
Gates are a lot like bikes. A bike could go wrong in endless ways – I once had a bike so old it literally broke in two with me on it – but bikes usually fail in a few common ways: 70% flat tires, 15% chain-ring corrosion, 13% brakes, 2% everything else. Any bike shop could give you those numbers, and they’ll be largely similar. Which of these problems you get often depends on the way you ride and the kind of bike you have. The exact same goes for gates: though they could go wrong in endless ways, they usually go wrong in just a few, which largely depend on environmental conditions and the production process. The most common “failure modes” for gates are single-event upsets and stuck-at faults, which basically mean the gate gets a wrong value for one cycle or indefinitely, respectively. Your fab and some scientific measurements could give you the probability of each.
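To make the two failure modes concrete, here is a minimal Python sketch of a single 2-input AND gate simulated over a few cycles. Everything in it (the function name `simulate_and_gate`, the `seu_cycle` and `stuck_at` parameters, the stimulus) is made up for illustration; it is a toy model, not how a real fault simulator works.

```python
from typing import List, Optional, Tuple


def simulate_and_gate(
    inputs: List[Tuple[int, int]],
    seu_cycle: Optional[int] = None,
    stuck_at: Optional[Tuple[int, int]] = None,
) -> List[int]:
    """Return the per-cycle output of a 2-input AND gate, with an optional fault.

    seu_cycle: cycle at which the output is inverted for that one cycle only.
    stuck_at:  (start_cycle, value); the output is forced to `value` from
               start_cycle onward.
    """
    outputs = []
    for cycle, (a, b) in enumerate(inputs):
        out = a & b  # fault-free behavior
        if seu_cycle is not None and cycle == seu_cycle:
            out ^= 1  # transient fault: wrong value for a single cycle
        if stuck_at is not None and cycle >= stuck_at[0]:
            out = stuck_at[1]  # permanent fault: value forced indefinitely
        outputs.append(out)
    return outputs


# Hypothetical stimulus, one (a, b) pair per cycle.
stimulus = [(1, 1), (1, 0), (1, 1), (0, 1), (1, 1)]

golden = simulate_and_gate(stimulus)
transient = simulate_and_gate(stimulus, seu_cycle=2)
permanent = simulate_and_gate(stimulus, stuck_at=(2, 0))

# Cycles in which each faulty run differs from the fault-free run.
transient_diff = [c for c, (g, f) in enumerate(zip(golden, transient)) if g != f]
permanent_diff = [c for c, (g, f) in enumerate(zip(golden, permanent)) if g != f]

print("golden:   ", golden)
print("transient:", transient, "differs at cycles", transient_diff)
print("permanent:", permanent, "differs at cycles", permanent_diff)
```

Note that the stuck-at run only shows a wrong value in cycles where the healthy gate would have driven the opposite value, so a permanent fault can hide for a while before it does any harm.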
Some bike “faults” will be “safe” and others “unsafe”. With a flat tire you still get to stop on the roadside and curse, but not if you lose your brakes downhill. Some faults will be safe in one state and unsafe in another – lose your brakes on a flat road and you’re probably fine. At a high level, ISO 26262 requires that you look at the faults the gates in your design could have, then make sure the “unsafe” fault probability is below a certain number. Sticking to our bike example, we could say flat-tire and chain-ring problems are “safe”, and assuming all our trips are either down or uphill, we’re left with 5% “unsafe” faults, plus everything hiding in the remaining 2%.
7% unsafe faults is far too many for some ISO certifications, so what do we do? The expensive way is to put in a redundant brake system. The smart way is to refine our analysis and check if downhill rides are really 5%, and if all of them are really that bad. This can be a complicated thing to do, but would surely be cheaper than shipping an additional brake system with every bike. When we come to complex ICs, “smart” needs to be “very very smart” and “cheaper” might mean you get to keep your job. That explains why, as I said, I find this such a challenging problem to solve. If you still think “fault analysis” is pure pain, I hope you see by now “no fault analysis” can be much worse.
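The bike arithmetic above can be sketched in a few lines of Python. The 5% and 2% figures are the ones from the example; the function name `unsafe_share` and the 0.5 refinement factor are purely hypothetical, picked only to show how a more careful analysis shrinks the number without adding any hardware.

```python
def unsafe_share(brake_related: float, other: float,
                 dangerous_fraction: float = 1.0) -> float:
    """Percentage of faults counted as unsafe.

    brake_related:      percent of faults that are brake failures on a slope
    other:              percent of unclassified faults, pessimistically unsafe
    dangerous_fraction: share of the brake-related cases that really are
                        dangerous (hypothetical refinement; 1.0 = all of them)
    """
    return brake_related * dangerous_fraction + other


# Starting point from the example: 5% brake-related plus the remaining 2%.
baseline = unsafe_share(5.0, 2.0)

# Hypothetical refinement: suppose closer analysis shows that only half of
# the brake-related cases actually end badly.
refined = unsafe_share(5.0, 2.0, dangerous_fraction=0.5)

print(f"baseline unsafe share: {baseline}%")
print(f"refined unsafe share:  {refined}%")
```

The redundant brake system would attack the same number from the other side, by making brake faults harmless, at the cost of extra hardware on every bike.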
For more information on getting ISO 26262 faults straight, please review my full article on Verification Academy.
I look forward to your comments.
Comments
You referred to stuck-at faults, which are mainly used in Design For Test (DFT). I am wondering how DFT efforts and ISO 26262 relate to each other. Does ISO 26262 define new types of faults? Can I extend DFT fault simulation to my ISO 26262 certification? Any info on this is appreciated.
Hi Amir,
There is some similarity but also a good deal of difference. DFT was designed mainly to detect production-time problems. However, with shrinking geometries and low supply voltages, problems such as stuck-at faults and single-event upsets are becoming more frequent during actual run time as well. That is, a flop that came out of the fab smiling and healthy might become stuck-at at some point in its life, while the vehicle is being used. When this happens, your design should be able to detect the situation and recover, or report it if it can violate a safety goal. Hence you can already see that the problem is no longer only a post-silicon one, but also goes well into the domain of pre-silicon verification. DFT patterns might be of some use (if the chip can be put in test mode while in the vehicle), but might not provide the “real time” recovery/diagnostics required.