Cloud Computing Makes Overnight TAT Attainable
By Matthew Hogan and Derong Yan
During the final sign-off verification of full-chip system-on-chip (SoC) designs, the turnaround time (TAT) of each verification flow is crucial to meeting tapeout deadlines and product time-to-market (TTM). A TAT of 8-10 hours, which lets designers run verification flows and fix violations multiple times within a 24-hour period, is the goal of every SoC designer and electronic design automation (EDA) tool supplier. However, as process nodes and feature sizes shrink and ever more transistors are squeezed into SoC designs, the computing resources needed to verify a SoC design have increased significantly. During tapeout crunch time, SoC designers often want to run several types of verification flows in parallel to shorten verification cycles. But because of equipment and maintenance costs, most fabless design companies do not have hundreds or thousands of large computing servers onsite, so the verification TAT of a SoC design is often constrained by the available computing resources. Without enough servers to distribute jobs across, verification flows on a full-chip SoC design can take days to complete.
Here is some good news: cloud computing offers a farm of computing servers in different sizes, in terms of CPU count and physical memory. Depending on the verification job, users can reserve a large number of servers on demand and get instant access to virtually unlimited computing resources to meet peak demand during tapeout crunch. For SoC design companies willing to pay for it, cloud computing makes it possible to turn multi-day verification jobs into essentially overnight runs, reducing their product TTM. Of course, the actual impact of cloud computing on verification TAT depends on the types of verification flows, the SoC designs, and the EDA tools used.
The Calibre® PERC™ reliability verification platform is widely used for the verification of electrostatic discharge (ESD) reliability issues in SoC designs at intellectual property (IP), large block, and full-chip levels. To study the benefits of running Calibre PERC flows in the cloud, we ran a Calibre PERC flow on a full-chip SoC design using one major cloud service provider. The SoC design was based on an advanced process node from a leading semiconductor foundry.
In our experiments, we ran the same Calibre PERC flow (using the same SoC design and the same rule decks) a total of three times using different numbers of cloud servers: (1) on one cloud server using Calibre MT multi-threading technology, (2) on 5 cloud servers using Calibre MTflex flexible multi-threading technology, and (3) on 51 cloud servers using Calibre MTflex technology.
When running on one server, the Calibre PERC flow took more than 4 days (106 hours) to complete. On 5 servers with Calibre MTflex, the runtime improved significantly, but the flow still took more than a day (31 hours). On 51 servers with Calibre MTflex, however, the flow completed in 9.5 hours, which would allow more than two full runs in each 24-hour period.
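As a quick check of these figures, the short Python sketch below computes the speedup of each run relative to the single-server run and how many back-to-back runs fit into 24 hours. The names `runtimes_h`, `speedup`, and `runs_per_day` are illustrative only and are not part of any Calibre tool; the calculation also ignores job setup, queueing, and the time spent fixing violations between runs.

```python
# Reported wall-clock runtimes (hours), keyed by number of cloud servers.
runtimes_h = {1: 106.0, 5: 31.0, 51: 9.5}

HOURS_PER_DAY = 24.0


def speedup(baseline_h: float, run_h: float) -> float:
    """Wall-clock speedup of a run relative to a baseline run."""
    return baseline_h / run_h


def runs_per_day(run_h: float) -> float:
    """Back-to-back runs that fit in 24 hours (ignores setup and fix-up time)."""
    return HOURS_PER_DAY / run_h


for servers, hours in runtimes_h.items():
    print(f"{servers:>2} servers: {hours:6.1f} h, "
          f"{speedup(runtimes_h[1], hours):5.2f}x vs. 1 server, "
          f"{runs_per_day(hours):4.2f} runs/day")
```

At 9.5 hours per run, roughly 2.5 runs fit into a day: two complete runs, with time left over to review results and fix violations.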
When using cloud servers, users are charged based on the number and type of servers used and the total server usage time. Measured in server-hours, the usage of our second and third runs was (5 servers x 31 hours) : (51 servers x 9.5 hours), or 155 : 484.5, a cost ratio of about 1:3.13. In this particular example, that means a company willing to spend roughly 3X as much can get its verification run done roughly 3X as fast (9.5 hours instead of 31). Of course, the ratio between cost and runtime improvement will vary with the type of Calibre PERC flow, the rule decks, and the SoC design.
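A short Python sketch of this cost arithmetic follows. It assumes, as a simplification, that both runs used the same server type at the same hourly price and that each server was billed only for the flow's runtime; under that assumption, cost is proportional to server-hours. The helper `server_hours` is hypothetical and does not model any particular cloud provider's billing.

```python
def server_hours(servers: int, runtime_h: float) -> float:
    """Billable server-hours, assuming identical server types, each billed
    for exactly the flow's wall-clock runtime."""
    return servers * runtime_h


run2 = server_hours(5, 31.0)   # 155.0 server-hours (5 servers, MTflex)
run3 = server_hours(51, 9.5)   # 484.5 server-hours (51 servers, MTflex)

cost_ratio = run3 / run2       # relative spend of run 3 vs. run 2
time_ratio = 31.0 / 9.5        # how many times faster run 3 finished

print(f"cost ratio 1:{cost_ratio:.2f}, speedup {time_ratio:.2f}x")
```

Running it prints `cost ratio 1:3.13, speedup 3.26x`, so in this example the extra spend and the runtime gain are roughly proportional.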
Within most fabless SoC design companies, it is usually difficult for a design team to acquire 50 servers at once to run a Calibre PERC flow. However, they can easily acquire 50 or more servers on the cloud with a few mouse clicks, and instantly turn a multi-day Calibre PERC flow into an overnight run. With cloud technology making virtually limitless resources available, the critical decision now becomes the cost-benefit analysis. What is an overnight run worth to your company?
For more details, please check out our technical paper, "Reliability verification in the cloud delivers significant runtime benefits." Have questions about running Calibre PERC reliability verification in the cloud? Let us know!