What is the winning formula for running library characterization on the cloud?

According to IDG’s 2020 Cloud Computing Study, 32% of organizations’ IT budgets will be allocated to cloud computing this year. That’s an impressive number, considering that just a few years ago IT hardware spending was focused mostly on on-premises compute resources. Today, an increasing number of semiconductor and EDA workloads are being prepared for cloud deployment or built for the cloud.

Library characterization is one such application. Many library characterization teams are constrained by hardware when delivering characterized .lib files to their users. Since a characterization run typically needs up to hundreds of millions of SPICE measurements, achieving quick turnaround times depends largely on having more CPUs to work with.

Cloud platforms address this by offering a large pool of compute that supplements on-premises infrastructure, allowing characterization teams to “burst” through high-priority tasks. At Siemens, we actively work with cloud providers to facilitate this. As an example, check out our whitepaper on AWS cloud characterization.

It’s tempting to stack as many CPUs as feasible to complete characterization jobs quickly. In an ideal world, the total CPU cost of a cloud-launched job would remain constant regardless of the number of CPUs used, since CPUs × time would remain the same. However, each additional CPU incurs an efficiency penalty that depends on the task, the software, the cloud configuration, and how Amdahl’s law impacts that particular computational task. In addition, there may be non-compute tasks (like engineering time spent on library validation) that put a floor on the total schedule time required.
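To make the efficiency penalty concrete, here is a minimal sketch of Amdahl’s law applied to job cost. The function names, the parallel fraction, and the price per CPU-hour are all illustrative assumptions, not measurements from any characterization tool or cloud provider, and the model ignores real-world effects such as licensing, storage, and startup overhead.

```python
def amdahl_speedup(n_cpus: int, parallel_fraction: float) -> float:
    """Speedup predicted by Amdahl's law for a job whose parallelizable
    share of single-CPU runtime is `parallel_fraction` (0.0 to 1.0)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cpus)


def job_runtime_and_cost(n_cpus: int,
                         single_cpu_hours: float,
                         parallel_fraction: float,
                         price_per_cpu_hour: float) -> tuple[float, float]:
    """Return (wall-clock hours, total compute cost) for a job that would
    take `single_cpu_hours` on one CPU. All inputs are hypothetical."""
    runtime = single_cpu_hours / amdahl_speedup(n_cpus, parallel_fraction)
    # Every CPU is billed for the full wall-clock time, busy or idle.
    cost = n_cpus * runtime * price_per_cpu_hour
    return runtime, cost


if __name__ == "__main__":
    # Assumed values for illustration only.
    for n in (1, 10, 100, 1000):
        hours, dollars = job_runtime_and_cost(
            n, single_cpu_hours=1000.0,
            parallel_fraction=0.99, price_per_cpu_hour=0.05)
        print(f"{n:>5} CPUs: {hours:8.1f} h, ${dollars:8.2f}")
```

With a parallel fraction of exactly 1.0, the cost stays flat as CPUs are added, which is the ideal case described above. With anything less, runtime keeps shrinking but total cost climbs, because the serial portion is paid for on every CPU.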

So, we’re faced with a new tradeoff now. How many CPUs do we run in parallel, before we’re no longer maximizing our utilization of cloud resources?

In order to answer this, we should ask ourselves: What is the cost of additional throughput, and how does that compare with our target runtime?

For example, reducing days of runtime to an overnight run is very desirable for characterization teams, as it maximizes the time library production teams and integration teams can work together. On the other hand, reducing runtime from 4 hours to 2 hours may not be a priority if it means spending thousands of dollars more on that run.
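One way to frame this comparison is to compute the extra dollars paid per hour of runtime saved when moving from one CPU count to the next. The sketch below assumes you already have runtime and cost estimates for a few candidate configurations; the helper name and all of the numbers are hypothetical and chosen only to mirror the example above.

```python
def marginal_cost_per_hour_saved(options):
    """Given (n_cpus, runtime_hours, cost_usd) tuples sorted by increasing
    n_cpus, return (from_cpus, to_cpus, hours_saved, extra_cost,
    usd_per_hour_saved) for each consecutive pair of options."""
    steps = []
    for (n0, t0, c0), (n1, t1, c1) in zip(options, options[1:]):
        hours_saved = t0 - t1
        extra_cost = c1 - c0
        # If the larger configuration saves no time, treat the rate as infinite.
        rate = extra_cost / hours_saved if hours_saved > 0 else float("inf")
        steps.append((n0, n1, hours_saved, extra_cost, rate))
    return steps


if __name__ == "__main__":
    # Hypothetical estimates, not measured data.
    candidates = [
        (500, 72.0, 2000.0),    # multi-day run
        (2000, 20.0, 2300.0),   # overnight run
        (8000, 4.0, 4000.0),
        (16000, 2.0, 7000.0),
    ]
    for n0, n1, saved, extra, rate in marginal_cost_per_hour_saved(candidates):
        print(f"{n0} -> {n1} CPUs: saves {saved:.0f} h "
              f"for ${extra:.0f} (${rate:.2f} per hour saved)")
```

In this made-up data, going from days to overnight costs a few dollars per hour saved, while the final step from 4 hours to 2 hours costs $1,500 per hour saved. Whether that last step is worth it depends on the team’s target runtime, not on the compute alone.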

We should of course keep in mind the ways to improve cost-effectiveness that scale with the number of CPUs, such as using the right cloud configuration (e.g., a VM type and storage type fit for the task), cloud-optimized software, and low-cost CPUs like spot instances. Taking these steps while balancing runtime targets against cost is a sound strategy that will allow us to get the most out of our time and budget for every characterization job.
