Thought Leadership

Bringing AI to edge applications

By Spencer Acain

Artificial intelligence, especially with the recent explosion of large language models (LLMs), is finding its way into more and more software, where it can serve a wide variety of roles. What many of these systems have in common, especially those backed by LLMs, is their reliance on large, energy-hungry server farms to run inference on massive, multi-billion-parameter models.

However, as AI sees more use every day, it is becoming increasingly clear that these practices are not sustainable at scale. AI data centers do not benefit from the economies of scale that many other industries do, and even while idle these systems consume vast quantities of energy. While many potential solutions are being actively researched, one of particular interest for industrial applications may be comparatively simple: taking AI out of the cloud and bringing it down into edge devices and applications.

AI models: a problem of size

One of the reasons many AI models run in data centers is simply their size. For example, GPT-3.5, the model behind the original ChatGPT, is estimated to use between 150-180 billion parameters to do everything from holding a conversation to writing code. Responding to a single query can take the resources of an entire server loaded with AI accelerators – several kilowatts of power just to answer a question as simple as "how are you doing today?". Yet when responding to a simple query, a large portion of those billions of parameters may contribute little or nothing to the final result, since they are responsible for handling different sets of information.
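The arithmetic behind that size problem is easy to sketch: model weights dominate the memory footprint, at roughly parameter count times bytes per parameter. A minimal illustration in Python – the parameter counts and precisions below are illustrative assumptions, not measurements of any specific model:

```python
def model_memory_gb(params_billions, bytes_per_param):
    """Rough weight footprint: parameters * bytes per parameter."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# A ~175B-parameter model stored as 16-bit floats (2 bytes each)
# needs hundreds of gigabytes -- multiple server-class accelerators:
large = model_memory_gb(175, 2)   # 350.0 GB
# A 1B-parameter model quantized to 4 bits (0.5 bytes each)
# fits comfortably on an ordinary workstation or laptop:
small = model_memory_gb(1, 0.5)   # 0.5 GB
print(f"large: {large:.0f} GB, small: {small:.1f} GB")
```

This ignores activations and runtime buffers, but the two orders of magnitude between the results are the heart of the edge-deployment argument.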

As AI models grow larger, necessitating ever more powerful hardware to run them, the cost of running all AI services in data centers simply isn't sustainable, either economically or environmentally. As the honeymoon period of generative AI comes to an end, companies are finding it increasingly difficult to translate consumer interest into financially viable products, simply because of the electricity and amortized hardware costs behind every inference. While moving this load out of the data center isn't always a viable solution, for AI-driven assistive features in professional software, it may be the future.

Bringing AI out of the cloud

Instead of continuing to scale up with ever-larger general-purpose models, for specific applications – such as a recommender system within a CAE program or a code-copilot feature in an IDE – it might make more sense to go smaller and bring the models onto the devices where they will be used. With proper design and training, a model with 10, 5, or even 1 billion parameters can offer speed and accuracy similar to much larger models within its specific area of training.

The shift to smaller, local models offers many benefits. Most modern consumer- and workstation-grade computers already contain hardware capable of accelerating AI tasks that, while insufficient to run something as large as ChatGPT, could easily run a much smaller model specialized for a single task. This eliminates much of the inefficiency of general-purpose models in large data centers, making AI inference cheaper and more sustainable.
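That deployment decision can be sketched as a simple capacity check: do the quantized weights, plus some runtime overhead, fit in the local accelerator's memory? The 20% overhead factor and the example model sizes below are assumptions for illustration, not benchmarks:

```python
def fits_locally(params_billions, bytes_per_param, accel_mem_gb, overhead=1.2):
    """Rough check: model weights plus ~20% runtime overhead
    (activations, KV cache) must fit in accelerator memory."""
    needed_gb = params_billions * bytes_per_param * overhead
    return needed_gb <= accel_mem_gb

# A 7B-parameter model quantized to 4 bits on a 12 GB consumer GPU:
small_fits = fits_locally(7, 0.5, 12)    # ~4.2 GB needed -> True
# A 175B-parameter model in 16-bit floats on the same card:
large_fits = fits_locally(175, 2, 12)    # ~420 GB needed -> False
```

In practice, real deployment tools make finer-grained choices (per-layer quantization, CPU offload), but the same weights-versus-memory comparison drives them.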

In the industrial world, protection of data, IP, and other intangible assets is vitally important, and it is one of the key hurdles when bringing AI into the workplace. The simple act of bringing models down from the cloud and onto the workstation can solve most of these challenges. AI-powered tools that run on local models eliminate the added risk of sending sensitive company and customer data off-site and outside the control of a company's own IT infrastructure. Because of this, adopting AI in edge environments will speed implementation overall, helping organizations reap the benefits of AI sooner.

What’s next for AI?

While AI, much like the internet or the cloud, may seem like something that exists amorphously and can only be accessed through a computer or smartphone, the reality is there's nothing stopping someone from downloading an AI model and running it at home or at work right now. AI isn't something that needs to be relegated to the server farms of big hyperscalers in the form of massive, monolithic models trying to do everything. Instead, it can be scaled and adapted, made to fit where it's needed and do only what is required of it, whether that's a general-purpose chatbot in the cloud or simply predicting the next word you want to type on your smartphone.
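That last example is less exotic than it sounds: next-word prediction can be done with models vastly smaller than an LLM. A toy bigram predictor – a deliberately minimal stand-in for the specialized on-device models discussed above, with a made-up corpus – fits in a few lines:

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count word pairs: a minimal 'next word' model."""
    words = text.lower().split()
    model = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model

def predict(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

corpus = "the dog chased the cat and the dog barked"
model = train_bigram(corpus)
print(predict(model, "the"))  # "dog" -- seen twice, vs. "cat" once
```

Production keyboards use far richer models, but the principle is the same: a small, task-specific model running entirely on the device, with no data ever leaving it.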

Siemens Digital Industries Software helps organizations of all sizes digitally transform using software, hardware and services from the Siemens Xcelerator business platform. Siemens’ software and the comprehensive digital twin enable companies to optimize their design, engineering and manufacturing processes to turn today’s ideas into the sustainable products of the future. From chips to entire systems, from product to process, across all industries. Siemens Digital Industries Software – Accelerating transformation.


This article first appeared on the Siemens Digital Industries Software blog at