{"id":13009,"date":"2026-03-09T17:00:06","date_gmt":"2026-03-09T21:00:06","guid":{"rendered":"https:\/\/blogs.sw.siemens.com\/thought-leadership\/?p=13009"},"modified":"2026-03-26T12:35:54","modified_gmt":"2026-03-26T16:35:54","slug":"the-importance-of-flexibility-in-ai-chip-design-podcast-transcript","status":"publish","type":"post","link":"https:\/\/blogs.sw.siemens.com\/thought-leadership\/the-importance-of-flexibility-in-ai-chip-design-podcast-transcript\/","title":{"rendered":"The importance of flexibility in AI chip design podcast &#8211; Transcript"},"content":{"rendered":"\n<p>AI is not a one size fits all solution for approaching industrial problems, rather, it\u2019s important to tailor different models to different applications to ensure the best results no matter the use case. Equally so, the chips that run these AI models should not be one size fits all either, with highly customized chips offering far better performance and efficiency when it comes to accelerating AI in everything from a datacenter to a smart camera.<\/p>\n\n\n\n<p>To find out more check out the full episode <a href=\"https:\/\/blogs.sw.siemens.com\/podcasts\/ai-spectrum\/the-importance-of-flexibility-in-ai-chip-design\/\">here<\/a> or keep reading for a transcript of that conversation.<\/p>\n\n\n\n<p>Spencer Acain:<\/p>\n\n\n\n<p>Hello, and welcome to the AI Spectrum Podcast. I&#8217;m your host, Spencer Acain. In this series, we explore a wide range of AI topics from all across Siemens and how they&#8217;re applied to different technologies. Today I am joined once again by Russell Klein, program director for Siemens EDA&#8217;s High Level Synthesis team. Previously, we&#8217;ve talked about what it takes to design chips with HLS and why that&#8217;s helpful for artificial intelligence and even how AI itself is helping to flatten the learning curve with complex tools like those used in the chip design process, like HLS tools. 
But now, to round things out here, I think we talked about this a little at the beginning, but I&#8217;d like to come back to the topic of energy efficiency and power, because we&#8217;re adding more AI to everything, and of course, we&#8217;re going to need better chips to help run all of that, and HLS helps with that.<\/p>\n\n\n\n<p>So yeah, could you maybe elaborate on that just a little bit more? Like you were talking about being able to work with chips where you see a result that you&#8217;re not happy with early on, and then through HLS you can quickly change it and redefine it. How might you want to do that? How could you make a chip better by doing that? Are you able to get big power improvements, or estimate how much power your final chip is going to consume? Because obviously, if you&#8217;re going to be incorporating this in a laptop or something, you really need an efficient chip to accelerate your AI with. Is that something that HLS is really helping with?<\/p>\n\n\n\n<p>Russell Klein:<\/p>\n\n\n\n<p>Yeah, it all comes down to customization. So if we look at trying to deploy AI algorithms, we can run AI algorithms on really generalized chips like CPUs. A CPU is what&#8217;s called Turing complete. It can run any program that anybody could ever devise. It&#8217;s extremely general purpose, but it really doesn&#8217;t excel at executing AI algorithms. And we can go to more specialized architectures like GPUs or tensor processing units, TPUs, and here, we make the hardware a bit more specialized. So GPUs were originally built to process arrays that represented images, but they can be used to process arrays that represent the tensors in an AI algorithm. So GPUs are what the early AI developers migrated to from CPUs. 
They&#8217;re a bit more specialized in that they do things that are unique to processing arrays of data, which is what you need to do in the context of an AI application to be able to run it both quickly and efficiently.<\/p>\n\n\n\n<p>So if we look at how fast an AI algorithm runs on a GPU versus a CPU, we&#8217;re going to find it both runs faster and takes less energy to execute that algorithm. So not necessarily looking at power, but looking at energy. Now, if we want to go beyond a GPU or a TPU, we can get even higher levels of performance and efficiency by customizing the implementation. And what I mean by that is that a GPU may have 512 processors. If we&#8217;ve got an AI algorithm that&#8217;s going on our smartwatch and it needs to process arrays of 90 elements, giving it a GPU that has 512 processors means that we&#8217;re probably going to have a lot of wasted hardware, because the arrays that we&#8217;re processing are just smaller. Or it could be that we&#8217;re in a data center and we&#8217;ve got a GPU with 512 processing elements, but we&#8217;re going to be doing a design that&#8217;s just about 30% larger than that. Say it&#8217;s got 700 elements in the arrays that it&#8217;s processing.<\/p>\n\n\n\n<p>There, we&#8217;re going to have to push things in and out of the GPU. And so when we create our own hardware, when we create a customized implementation for performing these AI operations, we can outperform a GPU, in other words, run faster than a GPU could, and also use less energy, because we&#8217;re going to be moving the data around less and those processing elements can be more efficient. Another thing to talk about is the operators that are in the GPU. So if we look at a GPU, typically it has floating point math processors in it that will handle everything from 10 to the 30-something power all the way down to one over 10 to the 30-something power. So it&#8217;s got an enormous range of numbers that it can process. 
But if we look at most AI algorithms, almost all of our numbers are between zero and one. We might get some numbers that creep up to a few hundred, but we&#8217;re not going to see numbers in the hundreds of thousands or tens of millions.<\/p>\n\n\n\n<p>By deploying multipliers and adders and math operations that exactly match the algorithm that we are going to be running, not anything that anybody can dream up, but the one that we&#8217;re running, we can shrink those operators and make them so they&#8217;re going to take up less area and consume less power to perform that same operation. And so, by creating these customized implementations of something that will execute our AI, we can dramatically reduce the area and the power that&#8217;s used, and speed things up at the same time. But creating that custom implementation means hardware design.<\/p>\n\n\n\n<p>So again, if we do this at the Verilog level, this becomes a really long process and we&#8217;ve got one shot at getting it right. Whereas with High-Level Synthesis, we can now look at the very abstract algorithm and use a compiler to get us to that Verilog implementation. We&#8217;re now working on being able to take Python neural network descriptions as input to our High-Level Synthesis tool, to be realized ultimately in Verilog; this is the thing we call Catapult AI NN. 
There, we&#8217;re taking in these neural network descriptions directly and creating Verilog implementations that are going to be able to execute that one particular neural network very quickly and very efficiently.<\/p>\n\n\n\n<p>Spencer Acain:<\/p>\n\n\n\n<p>So you&#8217;re talking about, if you wanted to put it in a laptop, for instance, something to accelerate a copilot, right?<\/p>\n\n\n\n<p>Russell Klein:<\/p>\n\n\n\n<p>So you mentioned that if we wanted to put it on a laptop, it needs to be efficient, but really, think about if we wanted to put it on your smartwatch, or on an edge embedded system, something that&#8217;s going to get all of its power from, say, a very tiny solar panel or by just absorbing energy from the environment around it, either heat or motion. Those are areas where we&#8217;re going to want those devices to get smarter. We can&#8217;t plug an NVIDIA GPU into them. It just really isn&#8217;t practical. But by using High-Level Synthesis, we can take these algorithms that represent those neural networks and create a very bespoke implementation in hardware that&#8217;s going to do really well at running just this one algorithm. And that level of customization gives us both efficiency and performance that just can&#8217;t be achieved with these general-purpose GPUs or general-purpose CPUs. It&#8217;s really all about the customization.<\/p>\n\n\n\n<p>And that customization can be targeted to a field programmable gate array so that it can be reprogrammed for different purposes. For example, if we had a data center and we wanted to run algorithms faster by customizing the hardware implementation, we could put that into an FPGA so that when a new algorithm came along, we could reprogram it. 
But if we&#8217;re targeting an edge system where that function is really well known and isn&#8217;t going to change in the future, we can build that into an ASIC, which is going to be even smaller and even more energy efficient than an FPGA. We can build that into an edge system to get the highest level of efficiency and performance for the AI that we&#8217;re deploying there.<\/p>\n\n\n\n<p>Spencer Acain:<\/p>\n\n\n\n<p>Yeah, that definitely makes a lot of sense. That ability to really customize and get the most efficiency out of it is going to be critical, I think, in widespread AI adoption, certainly. You can&#8217;t cram a GPU into every camera on a factory floor or something like that, or every smartwatch, like you said.<\/p>\n\n\n\n<p>Russell Klein:<\/p>\n\n\n\n<p>And that level of customization is really what is going to allow an implementation for a specific algorithm to outperform that generalized function. It&#8217;s not that we can build a better GPU; it&#8217;s that we&#8217;re building a very specific piece of hardware that has been highly tailored to that one algorithm, and that&#8217;s really the key to getting that efficiency and performance.<\/p>\n\n\n\n<p>Spencer Acain:<\/p>\n\n\n\n<p>Makes sense. So, to round things out here, where do you see the next step forward for AI and the chip design process? Where are things going from here, maybe even just with AI in general?<\/p>\n\n\n\n<p>Russell Klein:<\/p>\n\n\n\n<p>Well, one of the things that&#8217;s really interesting is that as these tools get smarter, both at the High-Level Synthesis level and in traditional RTL synthesis and the gates-to-GDSII flow, all of those different stages, if you look at how those tasks are being completed today in industry, it&#8217;s really a one-way conveyor belt. We create the RTL, then we take that RTL through synthesis, and then we take those synthesized gates and turn them into a layout. 
One of the things that&#8217;s really impractical to do today is to look at, say, the gate-level design and say, &#8220;You know what? This would be a lot better if the step upstream had done things just a little bit differently.&#8221; Today, you do a little bit of that by the synthesis designer running the tool, looking at what came out, and then maybe tweaking things a little bit.<\/p>\n\n\n\n<p>So within a single step we can iterate, but we can&#8217;t iterate any more broadly than that. It&#8217;s just too difficult to go all the way back to the beginning and run through the process again. And it&#8217;s really difficult to know, when we&#8217;re down at the layout level, whether we could have done something differently in the upstream tools that would really improve this layout. With AI, with understanding the context of all of these decisions that we&#8217;re making throughout the tools, we can now start to think about globally optimizing that chip design.<\/p>\n\n\n\n<p>So right now, we can optimize the High-Level Synthesis, we can optimize the gate-level synthesis, we can optimize the layout, and each of these individual steps can be optimized. But no one human can get the context of that entire breadth of steps across the entire chip in their head; AI is going to be able to do that. It&#8217;s going to be able to look at these steps in context, and we can now start to bridge between these different tool domains in a way that just isn&#8217;t practical with the amount of information that we could humanly process. AI is going to be able to get a handle on the context across these tools, and we can be a little bit smarter about that whole process. It&#8217;s going to bring optimizations that we&#8217;ve only imagined so far.<\/p>\n\n\n\n<p>Spencer Acain:<\/p>\n\n\n\n<p>Yeah. It really sounds like context is the key piece for AI in really developing something here for engineering, for design, for industry in general. 
That was some great insight, Russ. Thank you. I think that&#8217;s about all the time we have here today, so thank you for joining me.<\/p>\n\n\n\n<p>Once again, I have been your host, Spencer Acain, on the AI Spectrum Podcast. Tune in again next time as we continue exploring the exciting world of AI.<\/p>\n\n\n\n<p>Russell Klein:<\/p>\n\n\n\n<p>Thank you.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><strong>Siemens Digital Industries Software<\/strong>\u00a0helps organizations of all sizes digitally transform using software, hardware and services from the <a href=\"https:\/\/xcelerator.siemens.com\/global\/en.html\" target=\"_blank\" rel=\"noreferrer noopener\">Siemens Xcelerator<\/a> business platform. Siemens\u2019 software and the comprehensive digital twin enable companies to optimize their design, engineering and manufacturing processes to turn today\u2019s ideas into the sustainable products of the future. From chips to entire systems, from product to process, across all industries.\u00a0<a href=\"http:\/\/www.siemens.com\/software\" target=\"_blank\" rel=\"noreferrer noopener\">Siemens Digital Industries Software<\/a>\u00a0\u2013 Accelerating transformation.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI is not a one-size-fits-all solution for approaching industrial problems; rather, it\u2019s important to tailor different 
models&#8230;<\/p>\n","protected":false},"author":82958,"featured_media":13011,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spanish_translation":"","french_translation":"","german_translation":"","italian_translation":"","polish_translation":"","japanese_translation":"","chinese_translation":"","footnotes":""},"categories":[1,13763],"tags":[12,194,11,2,4,175],"industry":[],"product":[],"coauthors":[2461],"class_list":["post-13009","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news","category-podcast-transcript","tag-artificial-intelligence","tag-digital-transformation","tag-digital-twin","tag-digitalization","tag-simulation","tag-xcelerator"],"featured_image_url":"https:\/\/blogs.sw.siemens.com\/wp-content\/uploads\/sites\/19\/2026\/03\/siemens-questa-one-newsroom_original.jpg","_links":{"self":[{"href":"https:\/\/blogs.sw.siemens.com\/thought-leadership\/wp-json\/wp\/v2\/posts\/13009","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.sw.siemens.com\/thought-leadership\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.sw.siemens.com\/thought-leadership\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/thought-leadership\/wp-json\/wp\/v2\/users\/82958"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/thought-leadership\/wp-json\/wp\/v2\/comments?post=13009"}],"version-history":[{"count":1,"href":"https:\/\/blogs.sw.siemens.com\/thought-leadership\/wp-json\/wp\/v2\/posts\/13009\/revisions"}],"predecessor-version":[{"id":13010,"href":"https:\/\/blogs.sw.siemens.com\/thought-leadership\/wp-json\/wp\/v2\/posts\/13009\/revisions\/13010"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/thought-leadership\/wp-json\/wp\/v2\/media\/13011"}],"wp:attachment":[{"href":"https:\/\/blogs.sw.siemens.com\/thought-leadership\/wp-json\/wp\/v2\
/media?parent=13009"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/thought-leadership\/wp-json\/wp\/v2\/categories?post=13009"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/thought-leadership\/wp-json\/wp\/v2\/tags?post=13009"},{"taxonomy":"industry","embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/thought-leadership\/wp-json\/wp\/v2\/industry?post=13009"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/thought-leadership\/wp-json\/wp\/v2\/product?post=13009"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blogs.sw.siemens.com\/thought-leadership\/wp-json\/wp\/v2\/coauthors?post=13009"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}