Do You Know for Sure Your RISC-V RTL Doesn’t Contain Any Surprises?

RISC-V
RISC-V

Given the relative novelty and complexity of RISC-V RTL designs, whether you are buying a commercially supported core or downloading a popular open-source offering, there is the small but non-zero risk of unwanted surprises escaping undetected into your end-product. In order of high-to-low probability, consider:

  • The presence of a weird-yet-entirely-possible corner-case bug
  • Bugs “inside” the custom instructions that you or your vendor are creating for your application
  • Bugs “at the edges” of a custom instruction — e.g., the instruction executes correctly, but it somehow leaves the machine in a corrupted state
  • Related: undocumented and/or poorly specified new features that unwittingly open up holes in the design
  • Malicious Trojan logic surreptitiously inserted via a supply chain attack

Recently I published a long-form article in Semiconductor Engineering on all the mitigation approaches that you can take to reduce these risks, where the focus is on an exhaustive, formal-based flow to verify that a RISC-V ISA is free from gaps and inconsistencies (and any Trojan Horse logic). Hence, this post is the “Cliff Notes version” to enable you to decide if you should read the full article with detailed code examples.

Common Mitigation Approaches

  • Expert inspection of the incoming or developing RTL code: Obviously, this should be done. But as they say in the mathematical world — “this is necessary but not sufficient”.
  • Simulation-based methods: Instruction Set Simulation (ISS), automated comparison of your DUT output to mature golden models, constrained-random UVM testbenches for DUT RTL simulation, and even piping real world stimulus into hardware-assisted emulation of the DUT are all valuable and should be employed. But they all suffer from the same flaw: they are inherently incomplete as they rely on non-exhaustive stimulus generation. (And in a supply chain attack, the Trojan logic developer has deliberately created a triggering sequence of signaling and data that will likely escape detection by even the most creative white hat hacker.)

The bottom line is that the only way to be completely sure that your RISC-V RTL is free of any natural or malicious surprises is to apply exhaustive, formal methods to verify the design.

A Three Stage Formal-based Process

The recommended formal-based flow unfolds as follows:

Step 1 – Create a formal testbench that “models” the DUT specification

The foundation of this methodology is to write a set of properties that represent the behavior of each instruction in your RISC-V design. The task is to capture the effect of a given instruction on the IP’s outputs and internal architectural states — in the RISC-V world, this is the program counter (PC) and register file (RF)) for any given arbitrarily long input sequence. This is done using a purpose-built extension to IEEE SVA called Operational SVA. In a nutshell, this is a library that’s shipped with the formal-based processor verification tool; and from the verification engineer’s point-of-view, it looks like an intuitive subset of familiar SVA code.

Step 2 – Define the input constraints and checks to be run against the DUT

To complement the specification properties for each instruction, the next step is to define the input constraints and any additional output checks. Again, Operational SVA is employed, where the user now specifies a “completeness plan” to define both legal inputs and illegal signaling that the DUT should be ignoring. Rephrasing, in Step 1 above you create what is effectively a cycle-accurate model of the DUT that must be proven true for all time and all inputs; in this step you set up the constraints and any special behaviors to look out for. Add these together, and in effect you have a formal testbench that’s ready to run!

Step 3 – Execute analysis

The goal of all formal tools is to exhaustively prove that all properties are true for all time and all inputs. In the case of RISC-V processor verification, the tool works to prove that any arbitrarily long input sequence can be matched to a unique sequence of the specified Operational SVA that predicts the values of outputs and architectural states. Typically, in RISC-V DUTs that the team has worked with, failures are found in the interior of instruction’s RTL logic or in the “handoff logic” that tees up the next legal branch/instruction.

The good news is that when such issues are fixed and the formal analysis proves all the properties, the results are truly “complete” — i.e., you can be mathematically certain that there are no RTL coding errors — the formal analysis has literally proven the absence of any bugs and unexpected behaviors!

Results

Putting this methodology to the test, my colleagues did a case study using the popular Rocket Chip open-source core. Specifically, the RV64IMAFDC – sv39 vm configuration was examined. This is a 64-bit processor core with a 39-bit virtual memory system [5] and extensions, such as compressed and atomic instructions.

Although this snapshot of the Rocket Chip had been extensively verified and taped out multiple times, four previously unknown, corner-case, suspicious behaviors were identified by this flow and reported to the Rocket Core RTL developers ([2], [3], [4], and [5]). Elaborating on [5] specifically — the discovery of a non-standard instruction CEASE (Opcode 0x30500073) in the RTL: clearly the Rocket Chip team fell behind on their documentation (and they fixed this almost immediately upon filing of the GitHub pull request).

The larger point is that this demonstrates that the logic required to implement an entire instruction — dozens of lines of code and many gates of synthesized, placed, and routed logic — escaped detection by visual inspection, RTL simulation, gate level simulation, the whole back-end implementation process, and hardware prototypes in the lab. Only this formal-based approach was able to find these issues!

Summary

The complete, formal-based processor verification approach presented in this article uses an extension to IEEE SVA, Operational SVA, to formally verify that a RISC-V ISA is free from gaps and inconsistencies. Unlike constrained-random simulation testbenches, emulation, or physical prototyping, the complete set of properties and constraints exhaustively detects many types of RTL errors, as well as the presence of undocumented or poorly specified code and malicious Trojans.

If this flow is of interest, please feel free to reach out to me to discuss it further.

Joe Hupcey III
Siemens EDA

Reference Links

0 – Siemens EDA formal-based RISC-V solutions

1 – Long-form article (with code examples) in Semiconductor Engineering: Do You Know For Sure Your RISC-V RTL Doesn’t Contain Any Surprises?

2 – DIV Instruction Result not Written to Register File:
https://github.com/freechipsproject/rocket-chip/issues/1752

3 – JAL and JALR Instructions Store Different Return PC:
https://github.com/freechipsproject/rocket-chip/issues/1757

4 – Illegal Opcodes Replayed and Causing Unexpected Side Effects: https://github.com/freechipsproject/rocket-chip/issues/1861

5 – Non-standard Instruction CEASE (Opcode 0x30500073) Discovered in RTL: https://github.com/freechipsproject/rocket-chip/issues/1868

6 – Verification Horizons blog, How Can You Say That Formal Verification Is Exhaustive?, Joe Hupcey III
https://blogs.sw.siemens.com/verificationhorizons/2021/09/16/how-can-you-say-that-formal-verification-is-exhaustive/

Leave a Reply