
Efficient training of AI Vision for factory automation

By Spencer Acain

This three-part series examines some of the challenges of training AI for industrial robotics and explores emerging training methods that address them. This first post looks at today’s established approach to training vision-guided industrial robots, which is cumbersome and time-consuming. Posts 2 and 3 will then discuss emerging techniques that achieve the same or better results in less time and with fewer resources, both human and capital equipment.

Part 1: The massive challenge of AI Training.

Endowing robots with human-like motor skills and the ability to perform tasks in a natural way is one of the important goals of robotics, with huge potential to boost industrial automation. A promising way to achieve this is by equipping robots with the ability to learn new skills by themselves, similar to how a human would learn. However, acquiring new motor skills is not a simple task. For robots, the continuous exploration space is massive – a robot can be at any given position at any given time and interact with its environment in infinite ways.

Artificial Intelligence (AI) and Machine Learning (ML) are emerging as the technologies of choice for training robots to perform an almost unlimited range of tasks. As part of the AI and ML revolution, today’s robots are starting to make real-time decisions based on input from a variety of sensing devices. Leveraging this sensor data enables robots to perform industrial tasks that previously required human workers, such as part detection, random part grasping, assembly, and wiring. Machine learning algorithms based on deep neural networks act as the “brains” behind these complex robotic skills. Unlike a traditional program, an ML model is not explicitly programmed; it is trained for a specific task using one of several training schemes.

In factory automation, robots rely predominantly on vision sensing. Cameras are used to acquire production line data, but image data is among the most challenging kinds of data to use for AI training. This is due to the large amount of information encoded in a single picture (millions of pixels per image, all of which must be processed) and to the time-consuming process of building training data sets. Consequently, building, validating, and deploying these technologies for industrial robotic systems is laborious and slow.
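
To put rough numbers on the “millions of pixels” point, here is a minimal Python sketch. The 1920x1080 resolution, three color channels, one byte per value, and the 10,000-image data set are illustrative assumptions rather than figures from any specific production line, and the function names are hypothetical.

```python
def values_per_image(width: int, height: int, channels: int = 3) -> int:
    """Number of raw numeric values in one uncompressed image."""
    return width * height * channels


def dataset_bytes(num_images: int, width: int, height: int,
                  channels: int = 3, bytes_per_value: int = 1) -> int:
    """Uncompressed size, in bytes, of a data set of same-sized images."""
    return num_images * values_per_image(width, height, channels) * bytes_per_value


if __name__ == "__main__":
    # Assumed example: a 1920x1080 RGB camera with 8 bits per channel.
    print("values per image:", values_per_image(1920, 1080))
    # Assumed example: 10,000 such images, stored uncompressed.
    print("data set size in GB:", dataset_bytes(10_000, 1920, 1080) / 1e9)
```

Under these assumptions, a single frame already carries more than six million values, and a modest data set runs to tens of gigabytes before any labeling has been done.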

Training requires setting up an experiment rig in which the actual parts, robots, sensors, and other peripherals are all integrated and dedicated exclusively to the training effort. Moreover, the robot must perform its task across an extremely wide range of conditions to generate enough training examples. Depending on the use case and the availability of pre-trained models, anywhere from hundreds to millions of training examples are needed to achieve a robust algorithm. All this ties up expensive capital equipment for months on end, diverting it from its intended work on the production line.
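
A back-of-the-envelope estimate shows how quickly rig time adds up. The sketch below is illustrative only: the 30 seconds per attempt (including repositioning the part) and 16 operating hours per day are assumed values, and `rig_days` is a hypothetical helper, not part of any library.

```python
def rig_days(num_examples: int, seconds_per_attempt: float,
             hours_per_day: float) -> float:
    """Days of rig operation needed to collect num_examples attempts."""
    total_hours = num_examples * seconds_per_attempt / 3600
    return total_hours / hours_per_day


if __name__ == "__main__":
    # Assumed values: 100,000 attempts, 30 s each, rig running 16 h per day.
    days = rig_days(100_000, seconds_per_attempt=30, hours_per_day=16)
    print(f"estimated rig time: {days:.1f} days")
```

Under these assumed values, 100,000 attempts would occupy the rig for roughly 52 days of operation, before accounting for safety stops or equipment downtime.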

What’s more, human oversight and continual assistance are required to reposition the parts after every attempt and to monitor each task the robot executes, so that the correct training feedback (was the attempt a success or a failure?) can be provided. The technician must also frequently stop the robot when there is a safety issue or a risk of damaging the product or equipment. The task is even more daunting when training a robot to assemble a new product. Most likely only a few prototypes are available, since the mass production process has yet to be designed and implemented. This severely restricts the number of experiments that can be run and the amount of data that can be collected in the real world.

Two major AI training techniques can be used for visual training: supervised learning and reinforcement learning. Each offers certain benefits and presents certain challenges. In the second part of this series, we’ll take a closer look at those techniques, what makes them tick, and why you might consider using one or the other in a factory.

Check out part 2 here.




This article first appeared on the Siemens Digital Industries Software blog at https://blogs.sw.siemens.com/thought-leadership/2022/03/07/efficient-training-of-ai-vision-for-factory-automation/