Convolutional Neural Network Quantization for Low-Power | Webinar

By russelklein

Inferencing for Convolutional Neural Networks (CNNs) is notoriously compute-intensive. This makes CNNs ideal candidates for hardware acceleration, which is faster and more power-efficient than running software on general-purpose CPUs. Training and inferencing are typically done using floating-point representations of the features, weights, and biases. Using a fixed-point representation reduces the size and power of the operators in the accelerator. With a purpose-built accelerator, fixed-point operators can be any width; they are not limited to 8 or 16 bits. QKeras, or quantized Keras, is a library built on TensorFlow that allows developers to specify quantized fixed-point operations for each layer. It enables training and inferencing with reduced-precision representations. This webinar describes how to use QKeras and High-Level Synthesis to produce a bespoke quantized Convolutional Neural Network accelerator, and compares the accuracy, power, performance, and area of different quantizations. Join the live event to get all your questions answered by our experts.
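To make the idea of arbitrary-width fixed-point operands concrete, here is a minimal sketch in plain Python. It is not the QKeras API and not the flow shown in the webinar; the helper name `quantize_fixed`, the sample weights, and the choice of one sign bit with all remaining bits fractional are assumptions made purely for illustration. It rounds a few values to signed fixed-point at several bit widths and reports the worst-case rounding error for each.

```python
def quantize_fixed(x, total_bits, frac_bits):
    """Round x to a signed fixed-point value (hypothetical helper).

    total_bits counts the sign bit; frac_bits of them sit right of the
    binary point. Values outside the representable range saturate.
    """
    scale = 1 << frac_bits                 # one LSB equals 1 / scale
    lo = -(1 << (total_bits - 1))          # most negative integer code
    hi = (1 << (total_bits - 1)) - 1       # most positive integer code
    code = max(lo, min(hi, round(x * scale)))
    return code / scale


# Hypothetical weights, chosen only to lie inside [-1, 1).
weights = [0.7312, -0.2051, 0.0489, -0.9127]

worst_error = {}
for bits in (4, 6, 8, 12):                 # widths need not be 8 or 16
    frac = bits - 1                        # assumption: 1 sign bit, rest fractional
    quantized = [quantize_fixed(w, bits, frac) for w in weights]
    worst_error[bits] = max(abs(w - q) for w, q in zip(weights, quantized))
    print(f"{bits:2d} bits: worst-case error {worst_error[bits]:.6f}")
```

Each additional bit halves the size of one step, so the rounding error bound shrinks while the operator grows. Finding where accuracy stops improving enough to justify the extra area and power is the kind of trade-off the webinar explores.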

View the Webinar

What You Will Learn:

  • How to use QKeras to determine the optimal operand sizing for a hardware accelerator that deploys a neural network
  • How to determine the area, performance, and energy of a neural network accelerator
  • How to compare software performance against hardware accelerated performance, and make informed trade-off decisions

Who Should Watch:

  • Developers of neural networks that will be deployed on the edge or in other contexts where low power and efficiency are required in addition to high performance.


This article first appeared on the Siemens Digital Industries Software blog at