Performance Profiling How-To (Make My Testbench Faster)

Here’s the situation…

You’re DV lead. You and your team are at month 10 of a 12-month development cycle. The RTL and testbench are feature complete, and now you’re ramping up coverage closure. The eight-hour nightly regression you had a month ago has grown into a 12-hour regression with no sign of slowing down. You’re desperate to keep it under 14 hours to keep your feedback loop tight: run the regression overnight, view the results when you get into the office, triage failures in the morning, churn through coverage data and tweak tests in the afternoon, commit changes on your way out, run the regression overnight…

Of course you’ve already taken measures to optimize the loop: test grading to cut out low-value tests, command line switches tuned for speed, backdoor register configuration and a smart wave dumping strategy. Basically, you’re running as lean as you can. The only option left is to squeeze more out of the cycles you have.

In short, you need to make your testbench faster.

The simulation profiling we discussed back in Simulation Performance Profiling Like A Pro is still pretty new to me. Well, not new new, because I’ve been using it successfully with customers for a couple of years now. But during the 18 years of verification I did before joining Siemens I did no profiling. None. Frankly, I don’t even remember hearing about simulation profiling tools – from anyone – so I can’t imagine I’m the only DV engineer without much experience.

Time to fix that.

Building Profiling Experience – Step 1

I’ve put together a tutorial to get started with the new performance profiler.

Disclaimer: I use the word “tutorial” loosely because I’m not actually going to teach you much, at least not right now. Instead, I’ll share the same advice that I give my Siemens colleagues: the best way to get started with profiling is to jump in and figure it out. That advice comes from watching colleagues and customers use the profiler. Starting from the dashboard view, everyone seems to take different cues from the data and different paths through the analysis, eventually ending at a similar outcome. Basically, level 9 curiosity and a bit of patience are far more useful than instruction when it comes to performance profiling. If you have those, read on.

The “tutorial” includes a basic DUT, a small UVM testbench and a couple of scripts to run with and without profiling. I’ve deliberately written snippets of slow code into the testbench. Functionally they get the job done, but they perform poorly. A couple of snippets are excruciatingly slow, a couple of others just slow-ish. The challenge for you is to run the test with profiling on, use the profiler in Visualizer to identify each snippet and refactor it into something faster.

You can download the “tutorial” at Questa Performance Challenge Volume 1.

Fast Facts and Hot Tips

  • You want Questa/Visualizer 2021.3 to get the latest profiling features.
  • For run-time, my VM takes about 10 minutes to run the testcase as-is. Fully optimized, the test should run about 15x faster. That’ll give you an idea of where you start and finish.
  • Don’t worry about what the test does or what the DUT is doing because it’s not at all important. I actually don’t even remember what the testcase does so I wouldn’t be able to help there anyway!
  • To give you a feel for the level of difficulty, a few people here have done the “tutorial” and fixed everything with minimal help. So it’s pretty doable.
  • The hotspots are relatively obvious and easy to identify. If you’re stuck finding the hotspots or you’re having a hard time addressing them, I’d suggest bugging a colleague for a second set of eyes and some brainstorming assistance.
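
To turn the run-time numbers above into a concrete target, the arithmetic can be sketched in a few lines of Python. The function names are hypothetical, and the 10 minute baseline is the figure quoted for one VM, so substitute your own measurement. The 150 second figure in the usage lines is a made-up intermediate result, not data from the tutorial.

```python
def target_runtime_seconds(baseline_minutes: float, speedup: float) -> float:
    """Wall-clock time to expect once the test runs `speedup` times faster."""
    return baseline_minutes * 60.0 / speedup


def achieved_speedup(baseline_seconds: float, current_seconds: float) -> float:
    """How many times faster the current run is than the baseline run."""
    return baseline_seconds / current_seconds


if __name__ == "__main__":
    # Baseline and goal from the bullet above: about 10 minutes, about 15x.
    target = target_runtime_seconds(baseline_minutes=10, speedup=15)
    print(f"target run-time: about {target:.0f} seconds")

    # Hypothetical progress check after fixing one hotspot (150 s is invented).
    print(f"speedup so far: {achieved_speedup(600, 150):.1f}x")
```

Re-running the second calculation after each refactor is a cheap way to see which fixes actually moved the needle.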

Instructions… Kind Of

Just a few words about driving the profiler and opening profiles in Visualizer. In the run.fprofile script packaged in the “tutorial”, you’ll see data collection requires one switch to vopt and the same switch with a plusarg to vsim.

vopt -fprofile ...
vsim -fprofile+perf ...

Those will get you fdb output in a directory called fprofile. Next step is loading the fdb in Visualizer with the same switch.

visualizer -fprofile+perf+dir=fprofile

Visualizer will load with the performance dashboard. From there, it’s all you.
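
If you end up repeating this flow after every refactor, the three steps can be collected in one place. The following is only a sketch of that idea, not the run.fprofile script from the “tutorial”: the function name is hypothetical, and the remaining vopt and vsim arguments (the “...” above) are deliberately left as parameters because they depend on your own setup. The only switches it uses are the ones shown above.

```python
import shlex

# Output directory named in the text above; the profiler writes its fdb here.
PROFILE_DIR = "fprofile"


def profiling_commands(vopt_args, vsim_args):
    """Return the three command lines of the flow, in the order they run.

    vopt_args and vsim_args stand in for whatever your existing run script
    already passes to each tool; nothing is assumed about their content.
    """
    return [
        ["vopt", "-fprofile", *vopt_args],
        ["vsim", "-fprofile+perf", *vsim_args],
        ["visualizer", f"-fprofile+perf+dir={PROFILE_DIR}"],
    ]


if __name__ == "__main__":
    # Dry run: print the commands rather than executing them.
    for cmd in profiling_commands(["<your vopt args>"], ["<your vsim args>"]):
        print(shlex.join(cmd))
```

Printing instead of executing keeps the sketch usable on a machine without the tools installed; swapping the print for `subprocess.run(cmd, check=True)` would be the obvious next step once the argument lists are filled in.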

What If I Don’t Get It?

Well, I’m pretty sure you will get at least some of it, if not most or all of it. But if you find yourself unable to reach 15x and you’ve also stumped a few of your colleagues with it, best to just walk away and check back when I get to the solution code snippets. I’ll be back with those in a few weeks.

Until then, happy profiling :).

-neil
