
Designing the next generation of AI chips Part 2 – Transcript

By Spencer Acain

I recently had the opportunity to speak with Russell Klein, program director at Siemens EDA and a member of the Catapult HLS team, about the importance of custom AI accelerators, the ways AI itself is helping with the design process of these chips, and how HLS can help integrate EDA capabilities into the MBSE process. Check out the podcast here or keep reading for a full transcript.

Spencer Acain: Hello and welcome to the AI Spectrum podcast. I’m your host, Spencer Acain. In this series we talk to experts all across Siemens about a wide range of AI topics and how they are applied to different technologies. Today I am joined once again by Russell Klein, program director at Siemens EDA and a member of the CATAPULT HLS team.

Spencer Acain: Continuing where we left off last time, how do the AI accelerators created using a tool like CATAPULT compare with those from other companies that produce similar technologies such as, for example, NVIDIA?

Russell Klein: Yeah, so NVIDIA has, you know, they’ve got a lot of really smart engineers working on AI accelerators. But the challenge that NVIDIA has is that for them to create a successful deployment, they need an accelerator that will get adopted by, you know, 5,000 different projects or 10,000 different projects. So they need an AI accelerator that’s gonna work for absolutely any inference that anybody brings to it. If you’re building an inferencing accelerator for your one specific algorithm, you can do a lot of customization that’s going to get you a faster, more efficient, smaller implementation than a general purpose NVIDIA chip, and you’re going to be able to do that with fewer resources. So for example, an NVIDIA machine learning accelerator needs to support any floating point number that anybody might think of bringing into an inference at any point in time in the future. Whereas if we look at a specific inference: generally AI algorithms will normalize their numbers around 0, so they tend to be, you know, between one and minus one. Occasionally you’ll accumulate things up to larger numbers, but if you can look at a specific algorithm and see, you know, we never go beyond 20 on the positive side or minus 30 on the negative side, that means you can have much smaller memories. You can have much smaller registers and, really important, much smaller multipliers. Multipliers can be the biggest area consumer in the design. NVIDIA can’t build a smaller multiplier because somebody might come with an algorithm someday in the future that’s using larger numbers, so they can’t take that shortcut. But if you are doing a specific implementation, you can easily do that, and that saves you a lot of area and a lot of power.

Russell Klein: Also, data movement tends to be one of the limiting factors in terms of performance on inferencing, and if we’re customizing an implementation, we can size those caches to exactly what we need. We can move the data around in a way that’s optimal for the execution of that inference. Again, that’s something NVIDIA can’t take advantage of if they’re gonna build an inferencing accelerator that works for 1000 different designs.
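
To make the sizing argument concrete, here is a minimal Python sketch of how a known value range translates into operand width. The range of -30 to +20 comes from the interview; the choice of 10 fractional bits, the 32-bit reference width, and the quadratic cost model (a common first-order approximation for an array multiplier, whose cell count grows with the product of its operand widths) are illustrative assumptions, not figures from CATAPULT or from any NVIDIA device.

```python
import math


def signed_integer_bits(min_value: float, max_value: float) -> int:
    """Smallest two's-complement width whose integer range covers [min_value, max_value]."""
    lowest = math.floor(min_value)
    highest = math.ceil(max_value)
    bits = 1
    # A signed field of `bits` bits covers -2^(bits-1) .. 2^(bits-1) - 1.
    while not (-(2 ** (bits - 1)) <= lowest and highest <= 2 ** (bits - 1) - 1):
        bits += 1
    return bits


def fixed_point_bits(min_value: float, max_value: float, fractional_bits: int) -> int:
    """Total fixed-point width: integer part (with sign) plus the chosen fraction."""
    return signed_integer_bits(min_value, max_value) + fractional_bits


def relative_multiplier_cost(width: int, reference_width: int = 32) -> float:
    """Assumed first-order model: multiplier area scales with width squared."""
    return (width / reference_width) ** 2


# Range quoted in the interview; 10 fractional bits is an assumption.
INTEGER_BITS = signed_integer_bits(-30, 20)
TOTAL_BITS = fixed_point_bits(-30, 20, fractional_bits=10)
RELATIVE_COST = relative_multiplier_cost(TOTAL_BITS)

if __name__ == "__main__":
    print(INTEGER_BITS, TOTAL_BITS, RELATIVE_COST)
```

Under these assumptions, the multiplier for the narrowed operands costs a fraction of a 32-bit one, which is the kind of saving a general purpose device cannot count on.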

Spencer Acain: So it sounds like CATAPULT, HLS, this type of technology is really all about getting the most out of every transistor you’ve put on your board, basically. Just getting the absolute best performance by cutting away anything that’s not immediately necessary for your specific task. Was this a result you could get without CATAPULT? Could you be using other tools for this, or non-HLS tools, I guess?

Russell Klein: Well, CATAPULT’s not doing any magic here, right? It’s automating a lot of the steps in traditional hardware design. So if you had a large enough budget and you had enough engineers, you certainly could do the same thing that CATAPULT or high level synthesis is doing. You probably couldn’t do it in the same time frame because, you know, computers work a lot of this math a lot faster than humans can, no matter how many humans you bring to bear on the problem. So at a practical level, high level synthesis is really going to be the fastest path from that algorithm description into a hardware implementation that can be realized in silicon. And again, we’ve talked about this a little bit, but CATAPULT enables that designer to look at a number of different architectures very quickly. They can look at the design space by changing the nature of the architecture of what they’re building through that abstract description and try out a number of different implementations. Now if you take a look at a traditional manual RTL hardware development cycle, what you typically find is that once they have an architecture that kind of sort of works, they go to production. There are very few teams that are sufficiently resourced that they can even look at two materially different architectures within a single project. You know, maybe there are some folks at Google or Amazon who have so much money coming into the hardware design project that they could try two or three. But beyond those guys, as soon as you’ve got one working implementation, you’re off to production, because you need to get that product out there and earning money. So that same level of design exploration that we can do at a practical level with high level synthesis just really isn’t practical for traditional hardware design teams.
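
The kind of design-space exploration described here can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how CATAPULT evaluates architectures: the names `Candidate`, `enumerate_candidates`, and `smallest_meeting_latency` are hypothetical, latency is assumed to be the number of multiply-accumulate operations divided by the number of parallel multipliers, and area is assumed to be proportional to the multiplier count.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothetical architecture: how many multipliers run in parallel."""
    multipliers: int
    latency_cycles: int
    area_units: int


def enumerate_candidates(num_macs, multiplier_options):
    """Build one candidate per multiplier count using the toy cost model."""
    return [
        Candidate(
            multipliers=m,
            latency_cycles=math.ceil(num_macs / m),  # assumed: perfect parallelism
            area_units=m,                            # assumed: area ~ multiplier count
        )
        for m in multiplier_options
    ]


def smallest_meeting_latency(candidates, max_latency_cycles):
    """Pick the smallest-area candidate that meets the latency budget, if any."""
    feasible = [c for c in candidates if c.latency_cycles <= max_latency_cycles]
    return min(feasible, key=lambda c: c.area_units) if feasible else None


# Illustrative workload: 1024 multiply-accumulates, budget of 100 cycles.
CANDIDATES = enumerate_candidates(1024, [1, 2, 4, 8, 16, 32, 64])
BEST = smallest_meeting_latency(CANDIDATES, max_latency_cycles=100)

if __name__ == "__main__":
    for candidate in CANDIDATES:
        print(candidate)
    print("chosen:", BEST)
```

The point of the sketch is only that comparing several architectures becomes cheap once each one is a parameter change in an abstract description, rather than a separate hand-written RTL effort.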

Spencer Acain: It sounds kind of like generative design over on the NX side of things, where you have a computer that’s capable of exploring a much wider design space in a very similar amount of time compared to what a team of engineers could do manually. And it lets you really explore a bigger design space and maybe find a better solution than you would have found by just going with whatever the first working thing is.

Russell Klein: Yeah. Although right now, this exploration of different alternatives is a manual process. So it’s not generative AI, but you can imagine that in the fullness of time we will be applying AI techniques to this architecture search space, where it can become something that is directed by artificial intelligence to go look at, you know, what are practical architectures? And let’s go try two or three of them out. So I kind of see that in the future, where we’ll be using that type of AI to be able to create a generative design of finding the right architecture, where right now it requires the human to think about, you know, how many multipliers are we going to use here, what kind of pipelines are we going to construct? All that’s in the purview of the designer today.

Spencer Acain: I see. So while it’s still a fairly manual process, it would lend itself well to automation by AI in the future, once you can kind of get the training and get the kinks worked out of that sort of system. And that’s something you’re aiming for?

Russell Klein: Indeed, ultimately High-Level Synthesis is gonna embody a lot of AI to be able to do generative design in this space, but we’re not there yet. If you take a look at any compilation technology, and in High-Level Synthesis we are parsing a high-level language description, there’s a lot of compiler technology under the hood there. Compilers use a lot of heuristics, that is, they look at a few metrics and say usually when this metric is small and that metric is big, the right implementation is X, and where something else happens the right implementation is Y. Compilers are littered with heuristics, and that’s a really rich targeting ground for very simple AI, where it can look at a broader set of metrics and come out with better outcomes than what we’re traditionally programming into compilers in terms of heuristics. So for example, one of the things that CATAPULT or any High-Level Synthesis tool needs to do is, when it’s got an array of numbers that are going to be operated on, it has to decide: do I put those in registers or do I put them in memories? Well, if I put them in registers, they’re gonna burn more power. It’s gonna take up more area. It’s less efficient in terms of space and power. But I can get to the contents of that data in parallel. If I put it in a memory, it’s going to be serialized, and that could slow things down. And right now, we use a heuristic that says if it’s smaller than a number X, and the user can come in and say X is 256 or 1024 or whatever, we’ll go registers, otherwise we’ll go memory, and the user can go in and override it. But you can imagine that an AI could say, well, how’s this data really being used? OK, I’ll pick this type of memory and I’ll bank it this way, or interleave it that way, or no, we really need it in registers. So those kinds of decisions can start to be made by artificial intelligence built into the tool.
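
The threshold heuristic described above is simple enough to write down directly. The following Python sketch is illustrative only: `choose_storage` and its parameter names are hypothetical and are not CATAPULT settings; the default of 256 is one of the example values mentioned in the interview.

```python
from typing import Optional


def choose_storage(
    num_elements: int,
    register_threshold: int = 256,
    user_override: Optional[str] = None,
) -> str:
    """Map an array to "registers" or "memory" with a size-threshold heuristic.

    Arrays smaller than the threshold go to registers (parallel access, but
    more area and power); everything else goes to memory (denser, but access
    is serialized). An explicit user override always wins.
    """
    if user_override is not None:
        return user_override
    return "registers" if num_elements < register_threshold else "memory"


if __name__ == "__main__":
    print(choose_storage(64))                              # below the threshold
    print(choose_storage(4096))                            # above the threshold
    print(choose_storage(4096, user_override="registers")) # designer overrides
```

A heuristic like this looks at a single metric, the array size. The suggestion in the interview is that a learned model could also weigh how the data is actually accessed before deciding on storage type, banking, or interleaving.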

Russell Klein: If we look at an even higher level, ultimately I think the AI’s gonna start to make really good decisions on how we partition between hardware and software. So one of the challenges you’ve got is you could find this really heavy lifting, like our 2D convolution, lots of computation going on there. OK, let’s go pull that into an accelerator. Well, the convolution feeds into what’s called an activation function, and the activation function may be very lightweight. So for example, a linear rectifier just says if it’s a negative number, make it 0; if it’s a positive number, leave it alone. Very low computational complexity, so you say OK, let’s leave that in software, but it means that you now have a big communication path between the accelerator outside the CPU and the linear rectifier in the CPU, and that slows things down. So one thing you recognize really quickly is, oh, that activation function needs to go in the accelerator, not because it’s computationally complex, but because I need to minimize communication paths. Well, AI is going to be able to go find a lot of those things and, you know, not have the user have to get some experience to go figure those things out.
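
A short Python sketch shows why the linear rectifier is cheap to compute yet expensive to leave in software. The `relu` function follows the definition given in the interview. The transfer count in `boundary_transfers` is a deliberately crude, hypothetical model: it assumes the layer after the activation also runs in the accelerator, so every convolution output would cross to the CPU and back.

```python
def relu(value: float) -> float:
    """Linear rectifier: negative inputs become 0, positive inputs pass through."""
    return value if value > 0 else 0.0


def boundary_transfers(num_outputs: int, activation_in_accelerator: bool) -> int:
    """Count values crossing the accelerator/CPU boundary around the activation.

    Assumption: the next layer also runs in the accelerator. If the activation
    runs on the CPU, each output goes out and comes back (2 transfers each);
    if it runs in the accelerator, intermediate values never leave it.
    """
    if activation_in_accelerator:
        return 0
    return 2 * num_outputs


if __name__ == "__main__":
    convolution_outputs = [-1.5, 0.0, 2.25, -0.1, 7.0]
    print([relu(v) for v in convolution_outputs])
    print(boundary_transfers(len(convolution_outputs), activation_in_accelerator=False))
    print(boundary_transfers(len(convolution_outputs), activation_in_accelerator=True))
```

The activation costs a single comparison per value, so under this model the partitioning decision is driven entirely by the communication it avoids.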

Spencer Acain: Well, that makes sense to me. It sounds like you could really be using AI to pick up all the little things that would be difficult for a human to spot. The connections, the correlations, just the smallest details that can easily fall between the cracks when you’re designing something very large and very complicated with a lot of moving parts, so to speak.

Russell Klein: Exactly. Yep.

Spencer Acain: So, I guess before we wrap this up, is there anything else interesting that you’re currently working on that you’d like to share with us here?

Russell Klein: One of the things that we’re working on right now: we’ve got some partners over in Siemens who are working on a model based systems engineering flow, MBSE, and High-Level Synthesis has a lot of synergy with that type of flow. What’s really interesting about this is that with High-Level Synthesis, we can take this abstract algorithm and we can say, if we were to put it in hardware, this is the area it would consume. This is the power it would consume. These are the latencies and the throughputs that we’d get. Well, in a model based system engineering flow, what you want to be able to do is describe that system at a very abstract level and then start to do some analysis about how we deploy this system: what goes in software, what goes in hardware, you know, what systems are we gonna collect together on this board versus that board, and so forth. And what you wanna do is make an informed decision. When you say, well, that looks like it’s gonna be a lot of computation, maybe it needs to go in hardware, maybe we need to get an ASIC or an FPGA to perform this function, you’d really like to know at that level, well, what’s that gonna cost? How big is that ASIC gonna be? What kind of FPGA do I need to bring to the party to perform that efficiently? Right now in an MBSE flow, without some of the functionality from a High-Level Synthesis tool, you can make good guesses, right? Again, heuristics like we were talking about earlier. With High-Level Synthesis, and taking a portion of the technology that we’re using there, you could answer those questions deterministically and make much better and more informed decisions in the model based system engineering flow than would be possible without this capability.

Spencer Acain: Well, that sounds like it would be a really nice benefit: getting everything integrated together, from the design and physical hardware side down to the chips and control side, and having a single way of looking at everything that can really bring out the most efficient and best designs for combined hardware.

Russell Klein: Indeed. Also, as we go to implementation, right, so with the model based system engineering flow, we’re gonna use that to describe the system. Once we’ve described it, analyzed it and figured out here’s what we want to build, the next thing you do is go to implementation. That’s where High-Level Synthesis can say: those functions, I can implement those in hardware, and boom, they’re right there for you. So as we look at the MBSE flow going into design and the traditional hardware design tools that we’ve got here at Siemens EDA, there’s a really nice flow that can be created there as well.

Spencer Acain: That sounds like a great look toward the future of what HLS can do when combined with other tools, but I think that would be a great place to stop, because we are just about out of time. So thank you for joining me here, Russ. It’s been very informative.

Russell Klein: All right. Thank you very much.

Spencer Acain: Once again, I have been your host Spencer Acain and this has been the AI Spectrum podcast. Tune in next time as we continue our dive into the exciting world of AI.




This article first appeared on the Siemens Digital Industries Software blog at https://blogs.sw.siemens.com/thought-leadership/2023/03/28/designing-the-next-generation-of-ai-chips-part-2-transcript/