Episode 3: Identifying hardware design challenges and AI at the edge – the transcript

Previously, I summarized Episode 3: Identifying hardware design challenges and AI at the edge, where our experts Ellie Burns and Mike Fingeroff got together to discuss different compute platforms and their limitations, which are leading to a surge of new platform development. They also cover the many challenges that hardware designers face as they try to move AI to IoT edge devices. To listen to the podcast, click here. For those who prefer reading to listening, here is the transcript of Episode 3:

Mike Fingeroff: Today we will discuss different compute platforms and their limitations, which is leading to a surge of new platform development. We will discuss the many challenges that hardware designers face as they try to move AI to IoT edge devices.

Mike Fingeroff: A good example of AI and the Cloud is that, every day, as I’m doing paper searches, my newsfeed is continuously popping up with new articles that are talking about the latest and greatest developments in AI and ML, particularly trying to move to the edge. So, what are some of the challenges of implementing the latest cutting-edge neural network algorithms on today’s compute platforms?

Ellie Burns: That’s a really interesting question because the algorithms are constantly changing. How does this really work between hardware and software? And typically, if you want maximum flexibility, then you basically write your AI and ML in software. That means running algorithms on a CPU – like an Intel processor that’s in your laptop or other processors that are in the Cloud. That’s what’s typically the most flexible. But what we’re starting to see is those processors do not have enough compute power. People are trying to use a GPU instead. If you remember the history of the GPU, you know NVIDIA as your gaming platform. Well, the GPU really found a new life because it was very, very good at parallel processing, which is what is needed for all of these big neural networks. The GPU has the computing horsepower to actually run these networks. And then, we’ve also seen Google coming along with the TPU. All of these solutions are programmable. And that’s great! That’s really what we need, and the algorithms are changing all the time. The solution needs to be very, very flexible.

Ellie Burns: However, we’re starting to see this second wave where, because AI is so compute intensive, the solutions use a monstrous amount of energy. Let me give you a quick example. ResNet does multiply and accumulate, and filters. ResNet takes 3.87 billion multiply/accumulate operations per image. Imagine taking a video that’s got thousands of images just in a couple of seconds. It takes 3.87 billion operations for just one image. You have billions and billions of math operations just to do a single video. Another example: the Baidu Chinese speech recognition takes 20 billion, billion math operations. This is really causing an explosion of new compute platform development. The industry now is saying, “Okay, whoa! CPU and GPU, TPU – these things just can’t handle the load and they consume way too much power!” This is causing our industry to look at different silicon platforms in order to get the performance up and keep the power down.
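Editor's note: the arithmetic behind the ResNet figure quoted above can be sketched in a few lines of Python. The 3.87 billion multiply/accumulate operations per image comes from the transcript; the frame rate and clip length are assumptions chosen only for illustration, and the function name `macs_for_video` is hypothetical.

```python
# Back-of-the-envelope arithmetic for the ResNet figure quoted in the transcript.
MACS_PER_IMAGE = 3.87e9  # multiply/accumulate operations per image (from the transcript)


def macs_for_video(frames_per_second: float, seconds: float) -> float:
    """Total multiply/accumulate operations needed to run the network on every frame."""
    return MACS_PER_IMAGE * frames_per_second * seconds


# Assumption (not from the transcript): a 30 frames-per-second clip that lasts 2 seconds.
total = macs_for_video(frames_per_second=30, seconds=2)
print(f"{total:.3e} MACs")  # prints 2.322e+11 MACs
```

Under these assumed numbers, a two-second clip already needs roughly 232 billion multiply/accumulate operations, which is the scale of load the speakers are describing.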

Ellie Burns: We’re seeing a huge drive at multiple companies to develop silicon platforms. And this is why we’re seeing a gigantic investment in the semiconductor industry. They’re really looking to improve the TOPS (trillion operations per second) or inferencing per second. But they’re also looking at TOPS per watt. They are having to find new and different hardware platforms to be able to deal with the energy consumption. Example companies include Google, Wave Computing, and Graphcore. But what I think is really interesting is, we’re starting to see a couple of companies and a couple of devices out there that are looking more brain-like, which is really fascinating. They are embracing analog designs. When I first started my career, analog was an old school methodology. It was really the mature engineers that were doing analog. Now, what we’re starting to see is that analog significantly lowers the power. Companies like Mythic are bringing analog into the mix for inference so that they can drop the power by orders of magnitude. And that’s really more trying to emulate how our brains work. And there’s even new research and new networks for things like spiking neural networks. Companies are trying more and more to closely emulate what our brains do, which have had millions and millions of years of evolution to get it right at the lowest possible power.
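Editor's note: the TOPS-per-watt metric mentioned above is simply throughput divided by power. The short Python sketch below shows the calculation; the two devices and all of their numbers are hypothetical, chosen only to show why a chip with lower raw TOPS can still win on efficiency. They do not describe any real product.

```python
def tops_per_watt(ops_per_second: float, watts: float) -> float:
    """Efficiency metric: trillions of operations per second delivered per watt."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return (ops_per_second / 1e12) / watts


# Hypothetical devices (illustrative numbers only, not measurements of real chips).
data_center_chip = tops_per_watt(ops_per_second=100e12, watts=250)  # 100 TOPS at 250 W
edge_chip = tops_per_watt(ops_per_second=4e12, watts=2)             # 4 TOPS at 2 W

print(f"data center: {data_center_chip:.2f} TOPS/W")
print(f"edge:        {edge_chip:.2f} TOPS/W")
```

With these made-up figures, the edge device delivers far fewer operations per second but several times more operations per watt, which is the trade-off the discussion turns on.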

Mike Fingeroff: If we look at the current state of the art in both training and inferencing, what we’re seeing right now is that training is happening offline on GPU farms somewhere in the Cloud, where you’ve got the compute power. But you’re expending a lot of energy and you have to do offline processing of the data. As we see more and more applications moving to the edge, do you foresee this split between training and inferencing changing over time?

Ellie Burns: Yes, definitely! At first, not a lot of the new ICs and the new platforms I was telling you about were targeting the edge. AI silicon was really targeting the data center. And that’s still where a lot of the investment is. But what we’re starting to see is that the investment at the edge is going to surpass that. That is where we’re seeing the fastest growth because you really need to be able to do that inferencing closer to where the data is coming in. It’s much more energy efficient to process the data locally. So, let’s say, for example, you have your glasses that are processing an image as you’re seeing it. It takes a tremendous amount of energy to actually connect to the Internet and pass that entire image, all of those streams of images, up to the Cloud to have any processing done, and then have it come back to your glasses. It’s much more energy efficient to process bits and pieces of it at the edge, and then send only what you need back up to the Cloud. We’re definitely seeing that on the inferencing side. The edge is where we need to do the investment and therefore we need lower power solutions.

Ellie Burns: But we’re also seeing that many of the companies want to be able to say, “Okay, I need to also be able to learn. I’ve got all this new data coming in.” They want to be able to do inferencing at the edge. But we’re also starting to see companies say, “Can we have the same kind of platform and infrastructure to be able to do some limited amount of learning and training at the edge?” And then lastly they are thinking about power. AI takes billions of operations and it stores weights in memory. One of the things that’s maybe not quite so obvious to everybody is that one of the biggest problems in AI, the one that causes so much of the energy consumption, is the data movement. It’s not just the core compute that is causing the energy consumption, but it’s really about moving around these datasets. I’ve got millions and millions of these weights and they have to be efficiently moved through the computational units.

Ellie Burns: And one of the things that folks are finding out is that the energy that is used by a complete inference is very dependent on how much data you have to move around. So just think about it – you don’t want to move that data all the way up into the Cloud and then all the way back. You want to process that locally, in the most efficient way possible, because IoT edge designs are about getting the power consumption down.

Mike Fingeroff: Thanks, Ellie. It is an exciting time in AI as we see new silicon solutions being created to support new applications. The balance of performing the billions of calculations on massive data sets on silicon versus the Cloud seems to be a huge challenge. Tune in to our next podcast, where Ellie and I will discuss the gaps in the current silicon design flow and the challenge of creating a generic solution to AI.
