
Designing the next generation of power-efficient AI chips, part 1

By Spencer Acain

Artificial intelligence takes on an increasingly important role in new software and technology each year. However, even as AI helps make the latest smartphones smarter or self-driving cars safer, it carries an often unseen cost: running the calculations AI needs to function is not free and can consume a great deal of power, especially relative to the limited battery capacity of a device like a smartphone. While this certainly represents a challenge, advanced High-Level Synthesis (HLS) tools, such as Siemens EDA's Catapult, let designers get the most out of every watt of power used to run the latest AI algorithms. In a recent podcast I discussed these possibilities with Russell Klein, Program Director at Siemens EDA and part of the Catapult HLS team. Below are a few highlights of our conversation. You can listen to the full podcast here or check out the transcript here.

AI, at its core, is a collection of large, complex algorithms that are commonly run on graphics processing units (GPUs) or neural processing units (NPUs), both of which are well equipped to handle the massively parallel computation AI demands. When it comes to power efficiency, however, the general-purpose nature of GPUs and NPUs causes them to fall far behind more specialized implementations. Russ highlights two reasons: GPUs carry significant extra hardware for processing graphics, and both GPUs and NPUs must be able to run any algorithm someone might want to use. That flexibility drastically limits what can be cut or optimized when designing the chip, compared to a purpose-built AI accelerator targeting a single algorithm.
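
To make that concrete, consider what these algorithms actually compute. The heart of most neural networks is little more than nested multiply-accumulate loops, as in the simplified C++ sketch below (an illustrative example, not code from the podcast). It is exactly this regular, independent arithmetic that GPUs and NPUs parallelize, and that a purpose-built accelerator can hard-wire for a single network.

// Illustrative only: the core of a neural-network layer is a
// matrix-vector multiply, i.e. rows of independent multiply-accumulates.
// Every output element can be computed in parallel, which is why AI maps
// well to GPUs, NPUs, and purpose-built accelerators alike.
const int ROWS = 128;
const int COLS = 128;

void layer(const float weights[ROWS][COLS], const float in[COLS],
           float out[ROWS]) {
  for (int r = 0; r < ROWS; r++) {    // each row is independent
    float acc = 0.0f;
    for (int c = 0; c < COLS; c++) {
      acc += weights[r][c] * in[c];   // multiply-accumulate
    }
    out[r] = acc;
  }
}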

Russ explains that, compared to general-purpose chips, AI accelerators created using High-Level Synthesis to run a single algorithm or family of algorithms can offer much better performance per watt in a smaller package. To realize these gains, HLS starts from the algorithm itself as it is described in C or C++, along with the conditions under which the algorithm will run, such as the process node and chip architecture. Conditions can also include the expected ranges of the values the algorithm operates on: if, for example, all inputs are normalized to the range of 0 to 1, certain tricks or optimizations become possible in hardware that wouldn't be if the input range were unbounded.
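
As a sketch of the kind of optimization Russ describes, the C++ below assumes the Algorithmic C fixed-point datatypes (ac_fixed) that Catapult supports; the function, bit widths, and vector length are illustrative choices, not details from the podcast. Because the inputs are known to stay between 0 and 1, narrow fixed-point types can replace floating-point hardware entirely:

#include <ac_fixed.h>

// Inputs are known to be normalized to [0, 1), so no integer bits are
// needed: 8 fractional bits cover the whole range.
typedef ac_fixed<8, 0, false>  input_t;   // unsigned, purely fractional
typedef ac_fixed<8, 2, true>   weight_t;  // assumes weights fit in [-2, 2)
typedef ac_fixed<24, 9, true>  acc_t;     // wide enough to sum 64 products

const int N = 64;

// Written as plain C++ for HLS; the tool can unroll or pipeline this loop
// into parallel fixed-point multiply-accumulate units, which are far
// smaller and lower power than their floating-point equivalents.
acc_t dot_product(const input_t in[N], const weight_t w[N]) {
  acc_t acc = 0;
  for (int i = 0; i < N; i++) {
    acc += in[i] * w[i];
  }
  return acc;
}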

While nothing HLS does couldn't also be done by a team of engineers using conventional methods, the speed at which it operates is impressive, and vital in the realm of AI. Even compared to other areas of computer science and technology, AI is a young and rapidly changing field, with advances in algorithms arriving on a scale of months rather than years. With a manual design process, that pace could easily mean that in the 18 to 24 months it takes to deliver finished silicon, the algorithm the chip was originally designed to run has been supplanted by a newer, better version. By contrast, HLS can produce the same design in a fraction of the time, allowing hardware designers to fully leverage the latest advances in AI as they are developed.

AI accelerators designed using HLS have countless applications. They will be especially vital for integrating AI into power-constrained applications such as smartphone digital assistants, edge data processing for smart sensors, and the data processing required for fully self-driving electric vehicles. Applications like these are not only power limited but also demand the fast processing normally associated with large, power-hungry hardware. With purpose-built accelerators, however, it is possible to deliver both the required speed and power efficiency.

As AI continues to take center stage in countless industries, the ability to quickly design and create chips that run the right algorithm efficiently for any given application will only grow more important. HLS is a vital tool in this process, one that will help AI reach areas where current chips simply cannot go, helping bring about a true next generation of smart, connected technology.


Siemens Digital Industries Software helps organizations of all sizes digitally transform using software, hardware and services from the Siemens Xcelerator business platform. Siemens’ software and the comprehensive digital twin enable companies to optimize their design, engineering and manufacturing processes to turn today’s ideas into the sustainable products of the future. From chips to entire systems, from product to process, across all industries. Siemens Digital Industries Software – Accelerating transformation.

This article first appeared on the Siemens Digital Industries Software blog at https://blogs.sw.siemens.com/thought-leadership/2023/03/14/designing-the-next-generation-of-high-efficiency-ai-chips-part-1/