This year’s DVCon U.S. saw many great papers, posters, and tutorials; covering almost every aspect of functional verification. Thus, in light of such tough competition, it is with great pride to share the following review of my colleagues’ Best Paper award winner: Formal Verification Experiences: Spiral Refinement Methodology for Silicon Bug Hunt, by Ping Yeung, Mark Eslinger, and Jin Hou.
First, it is important to realize that this paper focuses on a verification flow that most people would not immediately associate with formal verification: POST-silicon debug – i.e. the chip samples have come back from the fab, they are correctly setup in the test jig, but sadly they are not behaving the way they should. Based on our experience, such silicon bugs are incredibly tricky to root-cause. Most of them happen deep in functional operation under unusual combinations of events and scenarios. Indeed, Ping, et.al.’s paper is presented in the context of three such real-world failures: a DDR3 controller, a proprietary memory controller design, and packet scheduler IP.
At a 50,000ft level here is how a formal-based post-silicon debug flow works:
* Working in tandem with the design engineers, and guided by clues from the failures being observed in the lab, the formal verification engineer writes a formal testbench against the area(s) of the golden RTL that are suspected to be causing the failure(s).
* The team then iterates with the formal analysis to replicate the failures and uncover their causes, formally exploring all possible input stimuli, transactions, and combinations of external events in the selected IP(s).
* Sometimes only one bug turns out to be the trouble maker; but typically this flow finds multiple/combinations of root causes for the failure scenario(s).
* Plus (and here is where a formal approach really shines) once the bug(s) are fixed in the RTL, the engineers can exhaustively validate that the repairs effectively and completely addresses the failures AND do not introduce any new bugs or side-effects.
Piece of cake, right?
Unpacking this flow in actionable detail – taking it from the 50,000ft level bullet points above, and going down to the 500ft level by effectively elaborating on how it works in-practice, and what is required for success – is IMHO why this paper earned the Best Paper accolade. Specifically, the crux of the paper is the introduction of six specific factors that are critical to the success of this flow, and how you can re-balance each one depending on the availably of the supporting resources to do a “spiral refinement” that converges quickly.
Based on experience with multiple customers, a formal-based bug hunting flow must address the following six factors that are critical to success:
- The complexity of the design
- The quality of the sub-goals and target assertions
- The completeness of the interface constraints
- The control and orchestration of the formal engines
- The quality of the initial states for formal exploration
- The formal expertise of the users
Again, as noted above, the paper brings this framework to life with three customer case studies. In a nutshell, the process is to identify obstacles then gradually improve or refine each of the success factors so that you can “zero-in” on the bug(s) that are in the silicon. Once the bug(s) are identified, the formal verification environment can serve as a golden setup to verify any potential software and RTL fixes. It is very efficient in reducing the turn-around time and helps prevent the introduction of new issues.
In conclusion, I hope you never have to use this flow (i.e. I hope you never get “that call” from the lab engineers). But if the worst happened, this methodology could save your project so I strongly recommend downloading this paper, printing it out, and storing it in your top desk drawer in case of emergency.
Joe Hupcey III,
for the Questa Formal team
P.S. Our team is on a winning streak! Please review my post on the DAC 2020 Best Paper, Easy Deadlock Verification and Debug with Advanced Formal Verification given how critical it is for many applications to exhaustively prove the design can absolutely never deadlock.