Designing the next generation of power-efficient AI chips, part 1

Artificial intelligence continues to take on an increasingly important role in new software and technology each year. But even as AI helps make the latest smartphones smarter or self-driving cars safer, it comes with an often unseen cost: running the calculations behind AI is not free and can consume a great deal of power, especially given the constrained battery of a device like a smartphone. While this certainly represents a challenge, advanced High-Level Synthesis (HLS) tools, such as Siemens EDA's Catapult, let designers get the most out of every watt used to run the latest AI algorithms. In a recent podcast I discussed these possibilities with Russell Klein, Program Director at Siemens EDA and part of the Catapult HLS team. Below are a few highlights of our conversation. You can listen to the full podcast here or check out the transcript here.

AI, at its core, is simply a collection of large and very complex algorithms, commonly run on graphics processing units (GPUs) or neural processing units (NPUs) that are well equipped to handle the massive parallel computing these algorithms demand. When it comes to power efficiency, however, the general-purpose nature of GPUs and NPUs leaves them far behind more specialized implementations. As Russ points out, GPUs carry significant extra hardware for processing graphics, and both GPUs and NPUs must be able to run any algorithm a user might throw at them. That flexibility drastically limits what can be cut or optimized when designing the chip, compared to a purpose-built AI accelerator created for a single algorithm.
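To make the scale of that parallel workload concrete, here is a minimal, hypothetical C++ sketch of a single small convolutional layer of the kind these chips execute; the sizes and names are illustrative, not from the podcast. Each output value is an independent chain of multiply-accumulate (MAC) operations, which is exactly the regular, massively parallel structure that GPUs, NPUs, and purpose-built accelerators are all built to exploit.

```cpp
#include <cstdio>

// Hypothetical sizes, chosen only for illustration: one small
// convolutional layer with 16 input channels, 16 output channels,
// a 3x3 kernel, and a 32x32 input image.
constexpr int IN_CH = 16, OUT_CH = 16, K = 3, DIM = 32;

static float in[IN_CH][DIM][DIM];       // zero-initialized sample input
static float w[OUT_CH][IN_CH][K][K];    // zero-initialized sample weights

// One output pixel is a chain of IN_CH * K * K multiply-accumulate
// (MAC) operations, and every output pixel is independent of the
// others -- regular, massively parallel work.
float conv_pixel(int oc, int y, int x) {
    float acc = 0.0f;
    for (int ic = 0; ic < IN_CH; ++ic)
        for (int ky = 0; ky < K; ++ky)
            for (int kx = 0; kx < K; ++kx)
                acc += in[ic][y + ky][x + kx] * w[oc][ic][ky][kx];
    return acc;
}

int main() {
    long outputs = (long)OUT_CH * (DIM - K + 1) * (DIM - K + 1);
    long macs = outputs * IN_CH * K * K;   // roughly 2.1 million MACs
    std::printf("MACs for this one layer: %ld\n", macs);
    std::printf("sample output: %f\n", conv_pixel(0, 0, 0));
}
```

Even this toy layer requires on the order of two million MACs, and a real network stacks dozens of such layers, which is why the hardware that executes them dominates the power budget.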

Russ explains that, compared to general-purpose chips, AI accelerators created with High-Level Synthesis to run a single algorithm, or a family of algorithms, can deliver much better performance per watt in a smaller package. To realize these gains, HLS starts from the algorithm itself as described in C or C++, along with the conditions under which it will run: the process node, the chip architecture, and so on. Those conditions can include the expected range of input values. For example, if all inputs are normalized to the range 0 to 1, certain tricks and optimizations become possible in hardware that would not be if the input range were unbounded, as sketched below.
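Catapult works from the C++ source itself (and Catapult designs commonly use bit-accurate fixed-point datatypes for exactly this purpose), but the benefit of a bounded input range can be sketched in plain, self-contained C++. In this hypothetical example, because inputs are known to stay in the normalized 0-to-1 range, each value fits in an 8-bit, all-fraction fixed-point format, so the multiply that HLS maps to hardware can become a small 8x8-bit integer multiplier instead of a full floating-point unit.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical illustration: inputs are known to be normalized to
// [0, 1) -- strictly below 1.0 for this simple sketch -- so 8
// fractional bits (Q0.8 format) represent them with no integer part.
// HLS can then synthesize a small integer multiplier rather than a
// floating-point unit, an optimization impossible for unbounded inputs.
using q0_8 = std::uint8_t;                 // value = raw / 256.0

q0_8 to_q0_8(float x) {                    // quantize a normalized input
    return static_cast<q0_8>(x * 256.0f);
}

float from_q0_16(std::uint16_t raw) {      // widen the result back to float
    return raw / 65536.0f;
}

// Multiplying two Q0.8 values yields a Q0.16 result: an 8x8 -> 16-bit
// integer multiply, which is what the synthesized datapath reduces to.
std::uint16_t mul_q0_8(q0_8 a, q0_8 b) {
    return static_cast<std::uint16_t>(a) * b;
}

int main() {
    q0_8 a = to_q0_8(0.5f), b = to_q0_8(0.25f);
    std::printf("0.5 * 0.25 ~= %f\n", from_q0_16(mul_q0_8(a, b)));
}
```

In silicon, that difference, an 8-bit integer multiplier versus a 32-bit floating-point one, translates directly into less area and less power per operation, which is precisely the kind of optimization a general-purpose GPU or NPU cannot bake in.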

While nothing HLS does couldn't also be done by a team of engineers using conventional methods, the speed at which it operates is truly impressive, and vital in the realm of AI. Even compared to other areas of computer science and technology, AI is a new and rapidly changing field, with advances in algorithms arriving on a scale of months rather than years. With a manual design process, this can easily mean that in the 18 to 24 months it takes to deliver finished silicon, the algorithm the chip was originally designed to run has been supplanted by a newer, better version. By contrast, HLS can do the same work in a fraction of the time, allowing hardware designers to leverage the latest advances in AI to their fullest extent as they are developed.

AI accelerators designed using HLS have countless applications, and they will be especially vital for integrating AI into power-constrained settings such as smartphone digital assistants, edge data processing for smart sensors, and the data processing required for fully self-driving electric vehicles. Applications like these are not only power limited but demand the fast processing normally associated with large, power-hungry hardware. With purpose-built accelerators, however, it's possible to deliver both the required speed and the power efficiency.

As AI continues to take center stage in countless industries, the ability to quickly design and create chips that run the right algorithm efficiently for any application will only grow more important. HLS is a vital tool in this process, one that will help carry AI into areas where current chips simply cannot go and help bring about a true next generation of smart, connected technology.


Siemens Digital Industries Software is driving transformation to enable a digital enterprise where engineering, manufacturing and electronics design meet tomorrow. Xcelerator, the comprehensive and integrated portfolio of software and services from Siemens Digital Industries Software, helps companies of all sizes create and leverage a comprehensive digital twin that provides organizations with new insights, opportunities and levels of automation to drive innovation.

For more information on Siemens Digital Industries Software products and services, visit siemens.com/software or follow us on LinkedIn, Twitter, Facebook and Instagram.

Siemens Digital Industries Software – Where today meets tomorrow.

This article first appeared on the Siemens Digital Industries Software blog at https://blogs.sw.siemens.com/thought-leadership/2023/03/14/designing-the-next-generation-of-high-efficiency-ai-chips-part-1/