How to Avoid Metastability on Reset Signal Networks, a/k/a Reset Check is the New CDC
It’s axiomatic that digital circuitry must initialize properly before it’s used. Once upon a time, verifying a design’s reset signaling was a pretty straightforward process – simply confirm the continuity of the reset signal from the pad ring to all the IPs and instances inside the DUT. Fast forward to the present, and previously unheard-of bugs on reset signal networks are being created via:
* SoCs composed of many IPs, where third-party IPs can handle clocking and reset differently
* Multiple power shutoff domains, where each domain has its own reset signaling
* Aggressive optimization of reset signaling networks to reduce power and area overhead
To illustrate, consider the following cases where reset and clock signaling are not always in the same domains, potentially leading to big trouble for the second flip-flop (dff2):
Case 1: rst1 and rst2 are different asynchronous reset signals, even though both flip-flops share the same clock domain (i.e., clk1 = clk2)
Case 2: rst1 and rst2 are different asynchronous reset signals, and clk1 and clk2 are in different clock domains
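To make the two cases concrete, here is a minimal Python sketch that classifies a path between two flops by comparing their clock and asynchronous reset signals. The `Flop` type, the `classify_crossing` function and the result labels are hypothetical names invented for this illustration; a real reset analysis also has to account for synchronizers, gating logic and reset ordering, none of which is modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flop:
    """Hypothetical, simplified view of a register (not a tool data model)."""
    name: str
    clock: str                   # name of the clock domain
    async_reset: Optional[str]   # name of the async reset, or None


def classify_crossing(src: Flop, dst: Flop) -> str:
    """Label the path src -> dst by comparing clock and reset domains."""
    same_clock = src.clock == dst.clock
    same_reset = src.async_reset == dst.async_reset
    if same_clock and same_reset:
        return "same domain"
    if same_clock:
        return "RDC"        # Case 1: same clock, different async resets
    if same_reset:
        return "CDC"        # different clocks, shared reset
    return "CDC+RDC"        # Case 2: different clocks and different resets


# Case 1: clk1 = clk2, but rst1 and rst2 differ
case1 = classify_crossing(Flop("dff1", "clk", "rst1"), Flop("dff2", "clk", "rst2"))
# Case 2: clk1 and clk2 differ, and so do rst1 and rst2
case2 = classify_crossing(Flop("dff1", "clk1", "rst1"), Flop("dff2", "clk2", "rst2"))
print(case1, case2)
```

The point of the sketch is only that the reset comparison is a separate axis from the clock comparison: a path can be clean from a CDC standpoint and still cross reset domains.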
Alternatively, to borrow from the DVCon USA 2016 paper on this subject, “Reset and Initialization, the Good, the Bad and the Ugly” by Ping Yeung of Mentor Graphics and Kaowen Liu of MediaTek, design registers can be sorted into three types: GOOD registers (those that are initialized properly), BAD registers (those that are not initialized) and UGLY registers (those that are initialized, but are subsequently corrupted).
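The three categories can be sketched as a simple check over a register’s value trace. This is an illustrative simplification and not the method from the paper: it assumes each cycle’s value is recorded as "0", "1" or "X" (unknown), and that `reset_end` is the index of the first cycle after reset is released. Both the trace format and the `classify_register` name are assumptions made for this example.

```python
def classify_register(trace, reset_end):
    """Sort a register into GOOD / BAD / UGLY from its per-cycle values.

    trace:     sequence of "0", "1" or "X" (unknown), one entry per cycle
    reset_end: index of the first cycle after reset is released
    """
    if trace[reset_end] == "X":
        return "BAD"      # never reached a known value out of reset
    if "X" in trace[reset_end + 1:]:
        return "UGLY"     # initialized, but corrupted afterwards
    return "GOOD"         # initialized and stays known


print(classify_register(list("XX0011"), 2))
print(classify_register(list("XXXX01"), 2))
print(classify_register(list("XX01X1"), 2))
```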
The bottom line is that independent “reset domains” can give rise to metastability and signal reconvergence issues similar to clock domain crossing (CDC) bugs. The proliferation of reset domains, and the increasing complexity of reset signaling topologies, means that manual inspection and verification is unsafe, creating considerable risk of unpredictable chip behavior when samples come back. Even worse, as with CDC, the metastability induced by mixing independent reset domains cannot be modeled accurately in simulation.
So what can be done?
“Simple”: leverage time-tested structural and formal CDC analysis techniques!
First, reset signaling networks look a lot like clock trees, which gives rise to the concept of reset domains, analogous to the clock domains identified in a CDC analysis. Plus, as with CDC verification, automated, exhaustive formal analysis can be applied. Building on over a decade of experience with formal-based CDC verification, Mentor has developed a fully automated formal solution that exhaustively verifies your reset signaling network.
Specifically, taking RTL as input, the Questa Reset Check app automatically performs an exhaustive, bottom-up reset tree analysis – inferring all the reset structures of a design, including gating and control logic. The app then automatically generates and proves assertions that cover numerous reset-specific structural checks. No knowledge of formal or property specification languages is required.
The results include reports on the synchronicity, polarity and set/reset functionality of all reset tree register nodes; any reset domain crossing (RDC) signals of adjacent registers within the same clock domain; and the complete matrix of clock and reset signals’ relationships.
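To picture what a matrix of clock and reset relationships contains, here is a small Python sketch that groups registers by clock and then by reset. It is not the app’s report format; the `clock_reset_matrix` function, the tuple layout and the example register names are all assumptions made for illustration.

```python
from collections import defaultdict


def clock_reset_matrix(registers):
    """Group register names by clock, then by reset.

    registers: iterable of (name, clock, reset) tuples (hypothetical layout)
    returns:   {clock: {reset: [register names]}}
    """
    matrix = defaultdict(lambda: defaultdict(list))
    for name, clock, reset in registers:
        matrix[clock][reset].append(name)
    # Convert to plain dicts so the result is easy to print and compare
    return {clock: dict(resets) for clock, resets in matrix.items()}


regs = [
    ("dff1", "clk1", "rst1"),
    ("dff2", "clk1", "rst2"),
    ("dff3", "clk2", "rst2"),
]
matrix = clock_reset_matrix(regs)
for clock in sorted(matrix):
    print(clock, matrix[clock])
```

Any clock row with more than one reset column populated is a place to look for reset domain crossings between registers in that clock domain.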
Back to our original example above, consider the following results where the detected violations are tabulated by the app:
The first class of error shown is the case where the IPs are in the same clock domain, but they are connected to different asynchronous reset signals. This is “BAD” and can lead to chip-killing metastability on the reset signals.
Another class of violation reported by the tool, shown in the lower part of the image, covers cases where asynchronous and synchronous resets are wired up in the same clock domain. Depending on the design configuration, this might not be a bad thing. If it is OK, the tool supports the creation of a “waiver” so that the connection is no longer marked as an error.
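The idea of a waiver can be sketched as a filter over a list of reported violations. This is a generic illustration only: the `Violation` type, the violation kind strings and the `apply_waivers` function are hypothetical, and the sketch does not reflect the actual waiver syntax or report format of Questa Reset Check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    """Hypothetical record for one reported reset issue."""
    kind: str        # e.g. "multiple_async_resets" or "mixed_async_sync"
    clock: str       # clock domain where the issue was found
    signals: tuple   # reset signals involved


def apply_waivers(violations, waivers):
    """Split violations into (open, waived).

    waivers: set of (kind, clock) pairs the design team has reviewed
             and accepted.
    """
    open_items, waived = [], []
    for v in violations:
        if (v.kind, v.clock) in waivers:
            waived.append(v)
        else:
            open_items.append(v)
    return open_items, waived


violations = [
    Violation("multiple_async_resets", "clk1", ("rst1", "rst2")),
    Violation("mixed_async_sync", "clk1", ("rst1", "srst")),
]
waivers = {("mixed_async_sync", "clk1")}
open_items, waived = apply_waivers(violations, waivers)
print([v.kind for v in open_items], [v.kind for v in waived])
```

Keeping waived items in a separate list, instead of dropping them, means the accepted exceptions stay visible for later review.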
As with CDC verification’s many layers of complexity, these two cases are only the starting point of a proper reset signaling analysis: there are many other second-order effects that Questa Reset Check detects and reports. Only an exhaustive formal analysis can verify all of this with mathematical certainty, and thus the Questa Reset Check app was created to help customers address these challenges.
Are you seeing “reset domain crossing” (RDC) verification issues appearing in your SoC projects? Please share your thoughts in the comments below, or contact me offline.
Until next time, may your coverage be high and your power consumption be low,
Joe Hupcey III
Comments
I am getting the following error while trying to compile library files. The error is encountered while using the “netlist load lib” command.
# Command: netlist load lib
# Command arguments:
# LICENSE: ERROR: Transaction request failed
# // License request for zncompccl feature failed
# FLEX ERROR: -15, feature: zncompccl, checkout none, server: 27000@10.100.2.12
# , reason: Cannot connect to license server system.
# FLEX ERROR: -15, feature: zncompccl, checkout none, server: 27000@10.100.2.14
# , reason: Cannot connect to license server system.
# FLEX ERROR: -15, feature: zncompccl, checkout none, server: 29000@10.100.2.20
# , reason: Cannot connect to license server system.
# Summary: 0 Fatals, 0 Errors, 0 Warnings in netlist load lib
# Fatal : No fatal messages have been queued but command returned failure. [command-32]
# Final Process Statistics: Max memory 281MB, CPU time 0s, Total time 120s
# End of log Sun Aug 9 07:37:51 2020
Hi Kapil,
Please file a support request at http://supportnet.mentor.com/ so the technical Support Team can follow-up with you on this issue.
Joe