How do you engineer GenAI Solutions? Siemens and the ELFMo Project

By Robin Bornoff

Generative AI promises to change the way we work and live. Although still a nascent technology, we’re now seeing impressive demonstrations of everything from conversational chatbots through to automated code creation assistants and software co-pilots. These tools promise to transform the way businesses operate, enabling faster design, smarter automation, and entirely new kinds of products and services.

But there’s a catch.

The reality is GenAI needs engineering.

Engineering here means applying rigorous, structured, and repeatable methods to design, adapt, integrate, and validate GenAI-based solutions so that they are safe, effective, reliable, and compliant for real-world use.

For industrial companies, simply ‘using an LLM’ is not enough. Tough questions have to be answered:

How do we ensure the GenAI’s outputs are reliable and accurate?
How do we adapt a foundation model to satisfy our specific use-cases?
How do we manage risks – legal, ethical, technical – throughout its lifecycle?
How do we comply with regulations such as the EU AI Act?

At Siemens Digital Industries Software we know that adopting a new technology isn’t just about making it work productively for our users; it’s also about ensuring that it does so reliably, effectively, and sustainably.

That’s why we’re one of the partners in the ITEA4 ELFMo (Engineering Large Foundational Models) project, a European collaboration to develop the methods, tools, and processes needed to reliably integrate Large Foundation Models (LFMs) into industrial software and services.

The core challenge: Engineering GenAI for industrial use

Deploying GenAI in our own Simcenter CAE solutions demands a rigorous approach to some specific challenges. Outputs must be highly accurate, traceable, and explainable – unchecked hallucinations or errors can lead to flawed simulations. The black-box nature of LFMs makes explainability and verification, both absolutely essential, hard to achieve. Without a disciplined, risk-aware approach, integrating GenAI into our tools risks undermining user trust in critical engineering workflows.

ELFMo: Building the blueprint for industrial GenAI adoption

The ELFMo consortium brings together European industrial partners, research institutions, and technology providers with a shared vision: to make the integration of Large Foundation Models (LFMs) into enterprise and industrial applications reliable, safe, and effective. In doing so, it aims to help European companies adopt GenAI in a way that is trustworthy, efficient, and aligned with European values and regulations.

Innovation has to ‘move the needle’, and ELFMo aims to achieve this in four areas:

Risk-based decision making
Tools and methods for adaptation
Evidence-based quality and compliance
Open-source/open-access standards

How does one actually develop industrial GenAI?

Successfully integrating GenAI into industrial and engineering software isn’t just another software project. Unlike classical software development – where behavior is explicitly coded and deterministic – working with Large Foundation Models (LFMs) means shaping, adapting, and validating systems whose behavior emerges from vast, opaque training data. This process demands new kinds of design decisions, new forms of testing, and careful risk management to ensure the AI behaves reliably, safely, and in line with business needs and constraints.

In traditional software projects, requirements are implemented through clear logic and well-defined interfaces. By contrast, LFM-based systems rely on training data, prompt design, and model selection to achieve the desired behavior, often with inherent unpredictability. Adapting GenAI for industrial use means not only integrating it into existing architectures, but also systematically managing issues such as hallucination throughout its lifecycle.

ELFMo will develop, test, and refine appropriate methodologies, but there are already recognised stages in the development of an LFM-based solution:

Use-case definition

As with all product development, be it software or not, it starts with clearly defining the problem to solve:

What is the industrial or engineering challenge?
Who are the stakeholders?
What are the success criteria, regulatory constraints, and operational boundaries?
What failure modes are unacceptable?

This step aligns business objectives with technical feasibility and sets the baseline for risk assessment.
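To make the use-case questions concrete, one option is to record the answers in a machine-checkable form. The sketch below is illustrative only: `UseCaseSpec`, its fields, and the example metrics and failure modes are hypothetical, and it assumes every success metric is ‘higher is better’. It shows how unacceptable failure modes can act as a veto that no metric score can offset.

```python
from dataclasses import dataclass, field


@dataclass
class UseCaseSpec:
    """Hypothetical record of a GenAI use case, written down before any model is chosen."""

    challenge: str
    stakeholders: list[str]
    # Minimum acceptable value per success metric; all metrics assumed "higher is better"
    success_criteria: dict[str, float]
    # Failure modes that must never be observed, however good the metrics are
    unacceptable_failures: set[str] = field(default_factory=set)

    def is_acceptable(self, measured: dict[str, float], observed_failures: set[str]) -> bool:
        """True only if every threshold is met and no unacceptable failure mode occurred."""
        meets_thresholds = all(
            measured.get(name, float("-inf")) >= threshold  # a missing metric counts as a fail
            for name, threshold in self.success_criteria.items()
        )
        return meets_thresholds and not (observed_failures & self.unacceptable_failures)


# Hypothetical example values, for illustration only
spec = UseCaseSpec(
    challenge="Answer engineers' questions about simulation setup",
    stakeholders=["simulation engineers", "support team", "compliance officer"],
    success_criteria={"accuracy": 0.90, "source_citation_rate": 0.95},
    unacceptable_failures={"invented_material_property", "unsafe_operating_advice"},
)
```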

Architecture design

Next comes designing the AI system architecture:

What type of adaptation is needed? Options include fine-tuning, retrieval-augmented generation (RAG), and agentic workflows.
What data flows, interfaces, and user interactions are required?

A well-defined architecture helps plan for maintainability, security, and integration challenges up front.
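As an illustration of the data flow in a RAG-style architecture, here is a minimal sketch. It assumes a simple word-overlap retriever as a stand-in for a real vector search, and a hypothetical `generate` callable in place of an actual LLM client; the document ids and texts are invented. Returning the retrieved source ids alongside the answer is one way to plan for traceability up front.

```python
from typing import Callable


def tokenize(text: str) -> set[str]:
    """Lower-cased bag of words with surrounding punctuation stripped."""
    return {word.strip(".,;:!?()").lower() for word in text.split()}


def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Ids of the k documents sharing the most words with the query.

    Word overlap is only a stand-in for the embedding-based search a real system would use.
    """
    query_words = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda doc_id: len(query_words & tokenize(corpus[doc_id])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, corpus: dict[str, str], doc_ids: list[str]) -> str:
    """Ground the model in the retrieved passages and ask it to cite their ids."""
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in doc_ids)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\n\nQuestion: {query}"
    )


def answer(query: str, corpus: dict[str, str], generate: Callable[[str], str], k: int = 2) -> dict:
    """Data flow: query -> retrieval -> prompt -> model, keeping sources for traceability."""
    doc_ids = retrieve(query, corpus, k)
    prompt = build_prompt(query, corpus, doc_ids)
    return {"answer": generate(prompt), "sources": doc_ids, "prompt": prompt}


# Hypothetical knowledge base and a stubbed model call
corpus = {
    "mesh-01": "Refine the mesh near walls to resolve the boundary layer.",
    "solver-02": "Reduce the time step if the transient solver diverges.",
    "licence-03": "Licences are managed by the site administrator.",
}
result = answer(
    "Why does my transient solver diverge?",
    corpus,
    generate=lambda prompt: "Reduce the time step [solver-02].",  # stub in place of a real LLM
    k=1,
)
```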

Model shortlisting

This step involves selecting candidate foundation models by weighing various, often conflicting, factors:

Model size and cost constraints
Open-source vs. commercial models
Licensing and IP considerations
Language, domain coverage, and existing benchmarks

It’s not about simply ‘picking the biggest model’, but about balancing capability, cost, risk, and alignment with the use-case needs.
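One way to make this trade-off explicit is a weighted score applied after hard constraints. In the sketch below the candidate names, scores, costs, and weights are all hypothetical placeholders, not real models or measurements: licensing acts as a pass/fail gate, the budget caps cost, and the remaining candidates are ranked on capability minus weighted cost and risk.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    capability: float    # 0..1, e.g. normalised score on domain benchmarks (higher is better)
    risk: float          # 0..1, e.g. assessed licensing/IP/safety risk (lower is better)
    monthly_cost: float  # estimated serving cost in any consistent currency
    licence_ok: bool     # hard gate: licence permits the intended commercial use


def shortlist(candidates: list[Candidate], weights: dict[str, float], budget: float, top_n: int = 2) -> list[str]:
    """Drop candidates failing hard constraints, then rank the rest by a weighted trade-off."""
    eligible = [c for c in candidates if c.licence_ok and c.monthly_cost <= budget]

    def score(c: Candidate) -> float:
        return (
            weights["capability"] * c.capability
            - weights["cost"] * (c.monthly_cost / budget)  # cost as a share of the budget
            - weights["risk"] * c.risk
        )

    return [c.name for c in sorted(eligible, key=score, reverse=True)[:top_n]]


# Hypothetical candidates, for illustration only
candidates = [
    Candidate("large-commercial", capability=0.92, risk=0.3, monthly_cost=9000, licence_ok=True),
    Candidate("mid-open", capability=0.85, risk=0.2, monthly_cost=3000, licence_ok=True),
    Candidate("small-open", capability=0.70, risk=0.1, monthly_cost=800, licence_ok=True),
    Candidate("restricted-licence", capability=0.95, risk=0.5, monthly_cost=2000, licence_ok=False),
]
picked = shortlist(candidates, weights={"capability": 1.0, "cost": 0.3, "risk": 0.5}, budget=10_000)
```

With these made-up numbers the largest model does not make the shortlist: its cost outweighs its capability edge, and the most capable model is excluded outright by its licence.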

Model benchmarking

Once candidate models are shortlisted, their functional and non-functional behaviour needs to be compared. Typical metrics include:

Quantitative metrics: BLEU, ROUGE (for text), accuracy, F1 score, perplexity etc.
Task-specific benchmarks: domain-relevant test sets, simulation validation outputs
Risk and failure mode analysis: e.g. known hallucination rates, out-of-domain robustness etc.
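Two of these metrics can be sketched in a few lines. The token-level F1 below is the bag-of-words overlap commonly used for question-answering benchmarks, and perplexity is the exponential of the mean negative log-likelihood per token; the latter assumes the model (or its API) exposes per-token log-probabilities. `mean_f1` and its `model` callable are hypothetical helpers for running a domain test set.

```python
import math
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model answer and a reference answer (case-insensitive)."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-likelihood per token; lower means less 'surprised'."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def mean_f1(model, test_set: list[tuple[str, str]]) -> float:
    """Average token F1 of a model callable over (question, reference answer) pairs."""
    return sum(token_f1(model(question), ref) for question, ref in test_set) / len(test_set)
```

For example, the answer "the mesh is too coarse" against the reference "mesh too coarse" has precision 3/5 and recall 1, giving an F1 of 0.75.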

Data collection and preparation

Any AI is highly dependent on the data used to train it, or the data it accesses when in operation. Data really is king when it comes to GenAI and so needs to be treated accordingly:

Curate, clean, and annotate domain-specific datasets
Ensure representative coverage of scenarios and edge cases
Address class imbalance or bias in training data
Protect sensitive or proprietary IP
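A minimal sketch of some of these checks, assuming each record is a dict with hypothetical `text` and `label` keys: exact duplicates are removed after case and whitespace normalisation, class imbalance is reported as the ratio of the largest to the smallest class, and required scenario labels that are missing entirely are listed. Real pipelines would add near-duplicate detection, bias audits, and IP screening, which are not shown here.

```python
from collections import Counter


def normalise(text: str) -> str:
    """Case- and whitespace-insensitive key used to spot duplicates."""
    return " ".join(text.lower().split())


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first occurrence of each normalised text."""
    seen, unique = set(), []
    for record in records:
        key = normalise(record["text"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def imbalance_ratio(records: list[dict]) -> float:
    """Size of the largest class divided by the smallest (1.0 means perfectly balanced)."""
    counts = Counter(record["label"] for record in records)
    return max(counts.values()) / min(counts.values())


def coverage_gaps(records: list[dict], required_labels: set[str]) -> list[str]:
    """Scenario labels the use case needs but the dataset does not contain at all."""
    present = {record["label"] for record in records}
    return sorted(required_labels - present)


# Hypothetical support-ticket style records
records = [
    {"text": "Mesh too coarse near inlet", "label": "meshing"},
    {"text": "mesh too  coarse near inlet", "label": "meshing"},  # duplicate after normalising
    {"text": "Solver diverges at step 40", "label": "solver"},
    {"text": "Residuals stall after 200 iterations", "label": "solver"},
    {"text": "Boundary layer not resolved", "label": "meshing"},
    {"text": "Licence server unreachable", "label": "licensing"},
]
clean = deduplicate(records)
```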

Model instruction, training and tuning

This step adapts the LFM to best satisfy the given use case and is at the core of the development process:

Prompt engineering: few-shot, chain-of-thought, RAG database adaptation etc.
Parameter-efficient tuning: adapters, LoRA, prefix tuning etc.
Full fine-tuning (where feasible, which is often not the case!)
Integration with MLOps pipelines for tracking experiments, parameters, and artifacts
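To illustrate why parameter-efficient tuning is attractive, here is a pure-Python sketch of a LoRA-style linear layer: the pre-trained weight matrix `W` stays frozen and only a low-rank update `B A` (rank `r`, scaled by `alpha / r`) would be trained, with `B` initialised to zero so that the adapted layer initially behaves exactly like the original. The class and function names are illustrative; no training loop is shown, and real work would use an established library rather than this sketch.

```python
import random


def matvec(matrix: list[list[float]], vector: list[float]) -> list[float]:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Linear layer h = W x + (alpha / r) * B (A x); W is frozen, only A and B would be trained."""

    def __init__(self, weight: list[list[float]], rank: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight       # frozen pre-trained weights, d_out x d_in
        self.scale = alpha / rank  # LoRA scaling factor
        # A (rank x d_in): small random init; B (d_out x rank): zeros, so B A = 0 at the start
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x: list[float]) -> list[float]:
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self) -> int:
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of a rank-r update, versus d_out * d_in for full fine-tuning."""
    return rank * (d_out + d_in)


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=2.0)
```

For a 4096 × 4096 projection and rank 8, `lora_param_count` gives 65,536 trainable parameters compared with 16,777,216 in the full matrix, roughly 0.4%.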

Evaluation

A comprehensive evaluation of the behaviour of the LFM is essential before field deployment:

Functional testing: Does the model answer questions correctly, reliably, and consistently?
Non-functional testing: latency, throughput, scalability
Trustworthiness metrics:
- Correctness: accuracy against ground truth
- Robustness: perturbation testing, out-of-domain generalization
- Explainability: availability of attention maps, traceable prompts, RAG source tracking
- Fairness and Bias: subgroup analysis
- Safety: rejection of unsafe inputs, adversarial testing
- Uncertainty estimation: calibrated confidence scores
- Determinism: repeatability
Human-in-the-loop testing: Subject Matter Expert review for critical tasks
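Two of the trustworthiness metrics above lend themselves to a short sketch. Expected calibration error (ECE) groups predictions into equal-width confidence bins and averages the gap between stated confidence and observed accuracy, weighted by bin size; repeatability is measured here as the share of repeated runs of the same prompt that agree with the most common output. Both functions are illustrative and assume that the system reports a confidence per answer and that correctness labels come from ground truth or expert review.

```python
from collections import Counter, defaultdict


def expected_calibration_error(confidences: list[float], correct: list[bool], n_bins: int = 10) -> float:
    """Size-weighted mean gap between confidence and accuracy over equal-width bins."""
    bins = defaultdict(list)
    for conf, ok in zip(confidences, correct):
        # Bin index in [0, n_bins - 1]; a confidence of exactly 1.0 goes into the top bin
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for members in bins.values():
        accuracy = sum(ok for _, ok in members) / len(members)
        mean_conf = sum(conf for conf, _ in members) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


def repeatability(outputs: list[str]) -> float:
    """Share of repeated runs (same prompt, same settings) agreeing with the most common output."""
    normalised = [" ".join(output.split()) for output in outputs]
    return Counter(normalised).most_common(1)[0][1] / len(normalised)
```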

Assurance and governance

Finally, ongoing assurance and governance ensure compliance and trust over the LFM’s lifecycle:

Continuous monitoring for model drift, hallucinations, degraded performance
Auditing: maintaining logs, explainability artifacts, model documentation
Regulatory compliance: meeting EU AI Act requirements for risk assessment and documentation
Ethical considerations: bias mitigation, transparency to users
Security and IP protection
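For continuous monitoring, one common way to flag drift is the population stability index (PSI), which compares a baseline distribution (for example, queries per topic at validation time) with recent traffic. In the sketch below the 0.2 alert threshold is an assumed rule of thumb, the bin counts are invented, and the audit-record fields are hypothetical; it only shows how a drift score can be written into an append-only log alongside the prompt, response, and retrieved sources.

```python
import json
import math
from datetime import datetime, timezone

DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb alert level; tune per use case


def population_stability_index(expected_counts: list[int], actual_counts: list[int], eps: float = 1e-6) -> float:
    """PSI = sum((a - e) * ln(a / e)) over the bins of a monitored distribution."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_share = max(e / e_total, eps)  # floor avoids log(0) for empty bins
        a_share = max(a / a_total, eps)
        psi += (a_share - e_share) * math.log(a_share / e_share)
    return psi


def audit_record(model_id: str, prompt: str, response: str, sources: list[str], drift: float) -> str:
    """One JSON line for an append-only audit log (field names are hypothetical)."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,
        "prompt": prompt,
        "response": response,
        "sources": sources,
        "drift_psi": round(drift, 4),
        "drift_alert": drift > DRIFT_THRESHOLD,
    })


baseline = [40, 30, 20, 10]   # hypothetical queries per topic at validation time
this_week = [10, 20, 30, 40]  # hypothetical queries per topic in production
drift = population_stability_index(baseline, this_week)
```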

Looking ahead: Building the future of industrial GenAI

ELFMo is a three-year collaborative effort running through to 2027. Its goal isn’t to create a single product, but to deliver a comprehensive framework of methods, tools, and best practices that any organisation can use to safely and effectively adopt Large Foundation Models.

For Siemens, this means helping to shape a future where GenAI is not a risky experiment, but a reliable component of engineering and industrial software – one that can be integrated confidently into simulation tools, product lifecycle management systems, and automation solutions. As ELFMo progresses, the outcomes will support not only Siemens’ own solutions, but the broader industrial ecosystem in Europe.

ELFMo Plenary Meeting, Siemens Belgium, May 2025

Disclaimer

This is a research exploration by the Simcenter Technology Innovation team. Our mission: to explore new technologies, to seek out new applications for simulation, and boldly demonstrate the art of the possible where no one has gone before. Therefore, this blog represents only potential product innovations and does not constitute a commitment for delivery. Questions? Contact us at Simcenter_ti.sisw@siemens.com.