The 5 best FAIR data secrets to empower process engineers – Transcript
John Nixon: Hello and welcome to today’s episode of the Industry Forward Podcast, where we explore key trends, transformative technologies and real-world innovations that are reshaping the fields from consumer-packaged goods (CPG), to life sciences, to energy and beyond. I’m your host, John Nixon, global vice president of Process Industries at Siemens Digital Industry Software. My guest today is Bill Hahn. He’s the director of Solutions Consulting at Siemens Digital Industry Software. And I’m always, always happy to have him as a guest. Welcome, Bill!

William Hahn: Thanks, John. [I’m] excited for this discussion we’re about to have.
John Nixon: Well today, Bill and I are continuing our discussion on the impact of digitalization on process industries. And specifically, today I want to speak about FAIR data and 5 lessons it can teach us.
Data has to be findable, accessible, interoperable and repeatable—or reusable. And we often hear this as an acronym called FAIR, F-A-I-R. And FAIR data, it feeds digital twins, simulation and AI. Am I reading this correctly?
William Hahn: One hundred percent, John. If you think about physics-based simulations, right? The simulation tools themselves need to have access to the data. They need the input data in the format that they’re able to read, and they will provide output data in the form of results.
[Those results] need to go somewhere that eventually, a human or AI will be able to interpret what those results are. So, the data needs to be findable, accessible, interoperable, repeatable, not just for us, but for the tools that are going to be part of the digital twin, right? Whether it’s a simulation tool, whether it’s an AI agent, right? That data has to be accessible to those tools in the right format.
And then there’s the contextualization of it: “Okay, the data is findable and accessible, but is it the right version? Was there a change made to the data, and now there’s an updated version, while our simulation is running off of an older version of the data?”
So, I think it all comes back to: “to support FAIR, we need to first look at the data management aspect of it.” What platform is supporting the single source of truth for this FAIR data that makes it accessible in the right context to all the different formats that we’ll need?
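To make that data-management point concrete, here is a minimal, hypothetical sketch of a versioned “single source of truth.” The names below (DatasetRecord, DataBackbone, the turbine-inlet-conditions dataset) are invented for illustration and are not a Siemens or Teamcenter API; the idea is simply that every version is kept, every consuming tool asks the backbone by name, and lineage travels with the data.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    """One version of a managed dataset: FAIR hinges on this metadata."""
    name: str            # findable: a stable, searchable identifier
    version: int         # which revision a simulation actually consumed
    uri: str             # accessible: where the payload lives
    fmt: str             # interoperable: a format the consuming tool can read
    derived_from: list = field(default_factory=list)  # lineage / context

class DataBackbone:
    """Toy single source of truth: keeps every version, never overwrites."""
    def __init__(self):
        self._records = {}  # name -> list of DatasetRecord, oldest first

    def publish(self, record: DatasetRecord):
        self._records.setdefault(record.name, []).append(record)

    def latest(self, name: str) -> DatasetRecord:
        # A simulation or AI agent asks for data by name and always gets
        # the newest version, plus the lineage that explains it.
        return self._records[name][-1]

backbone = DataBackbone()
backbone.publish(DatasetRecord("turbine-inlet-conditions", 1, "s3://bucket/v1.csv", "csv"))
backbone.publish(DatasetRecord("turbine-inlet-conditions", 2, "s3://bucket/v2.csv", "csv",
                               derived_from=["lab-run-0042"]))
print(backbone.latest("turbine-inlet-conditions").version)  # -> 2, not a stale copy
```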
John Nixon: Bill, as you’re saying this, several points come to mind. First, FAIR data is not an abstract data science concept. When I hear you, it becomes very evident it is what makes data usable across the lifecycle—from research and development to engineering and on into operations.
John changes the topic
In practice, this would require several technology capabilities working together. Here are some thoughts I’d like to share with you.
First, you need a strong data backbone. You talked about that, and lifecycle management coupled with that, so data is findable and traceable as we discussed. In our portfolio here at Siemens, this is where platforms like Teamcenter come in. We’re managing engineering, simulation and process data with clear relationships and history.
Another thought that comes to mind, Bill, as you’re talking about this, is that: FAIR data, it depends on semantics, schemas and context. As you were just discussing, data values alone, they appear meaningless without understanding assumptions, conditions, intent [and] what you often call requirements. And model-based definitions and physics-based simulation models, these embed this meaning directly into the data.
A third point that comes to mind is: digital twins act as the ultimate test of FAIR data. If data isn’t accessible, interoperable, or repeatable, the digital twin simply won’t work. It will immediately show where your data strategy is falling apart. Twins expose gaps in data quality immediately.
Another point that comes to mind is that simulation platforms are powerful FAIR data generators. They create structured, repeatable, synthetic data where sensors don’t exist—enabling learning from bench to pilot to scale. Bill, this reminds me of the example we’ve discussed in the past: if you’re looking at, for example, a steam or a gas turbine, well, I can’t pop a sensor on one of the fans inside the turbine. But guess what? I can create synthetic sources of data in areas of the combustion chamber where I would never be able to place a physical sensor. So, it’s exciting to me, the level of fidelity in these physics-informed twins, where we can create these synthetic sensors for synthetic data, allowing us to gain insights that would have eluded us before.
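As a rough illustration of that synthetic-sensor idea, here is a hypothetical sketch. The exponential temperature profile below is a stand-in, not a real combustor model; an actual digital twin would use CFD and thermodynamics. The point is only that a physics-based model can emit structured, repeatable records at “virtual” measurement stations where no physical probe could survive.

```python
import math

def synthetic_gas_temperature(x_fraction: float, t_cooled_k: float, t_flame_k: float) -> float:
    """Hypothetical 1-D stand-in for a physics-based combustor model:
    temperature decays exponentially from the flame zone toward the
    cooled outlet. A real twin would use CFD, not this toy curve."""
    return t_cooled_k + (t_flame_k - t_cooled_k) * math.exp(-3.0 * x_fraction)

# "Virtual sensors" at stations where no physical probe could survive
stations = [0.0, 0.25, 0.5, 0.75, 1.0]   # normalized position along the chamber
synthetic_data = [
    {"station": x, "temperature_K": round(synthetic_gas_temperature(x, 800.0, 2200.0), 1)}
    for x in stations
]
for row in synthetic_data:
    print(row)   # structured, repeatable records ready to feed an AI or a report
```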
And finally, I’ll say this, Bill: FAIR data is a prerequisite for AI. AI doesn’t fix poor data; it amplifies it. When data lineage, assumptions and context are clear, then AI can drive optimization, prediction and decision-making with confidence. I would say you can’t have FAIR data if you don’t have a system of record that also understands relationships. It’s not just files and folders.
William Hahn: John, I agree, one hundred percent. I mean, to summarize: “data has to be trustable,” right? If you don’t trust the data (either the inputs, the outputs, or throughout the process) it doesn’t matter what the results in front of you are going to say, you’re not going to trust it, right?
So, how do you get the trust level of the data? I think using physics is one. Where we know there are equations and laws that are governing the responses (John Nixon: Right.) that we can actually trust the data that’s coming out of the other side.
But when you look at coupling that with AI, how do you trust the AI response? How do you trust that there aren’t hallucinations, right? Like you said, that depends on the quality of the data [input]. It’s the garbage in, garbage out scenario, right? And the way that you trust what the response of the AI is going to be is by controlling the data that it has access to, right? And how it uses that.
In the end, the AI that we have today is a giant probability engine, right? It’s going through each letter, each word, each response and calculating the probability of the next word. When you break it down like that, right? It’s just calculating probability.
So, how much you leave up to that AI and how much you leave up for the AI to guess, based on probabilities, determines how accurate your results are going to be. So, instead of feeding it a bunch of unstructured data sources that aren’t related, where it now has to guess how to use the data and also how the data is related, imagine feeding it a data model and an ontology that asserts those relationships. So, that’s one less thing the AI has to guess on, right?
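Here is a minimal sketch of what “asserting those relationships” could look like, assuming a toy ontology of invented plant entities (Pump-101, Reactor-A, TT-204); it is not any specific knowledge-graph product. Instead of handing an AI a pile of unrelated documents, you retrieve only the asserted facts about an entity and put those in its context, so the relationships are given rather than guessed.

```python
# A toy ontology: relationships are asserted up front, not inferred by the model.
triples = [
    ("Pump-101",  "feeds",       "Reactor-A"),
    ("Reactor-A", "has_sensor",  "TT-204"),
    ("TT-204",    "measures",    "outlet temperature"),
    ("Reactor-A", "runs_recipe", "Batch-Recipe-7"),
]

def grounded_context(entity: str) -> list[str]:
    """Collect only the asserted facts about an entity, so a downstream
    LLM prompt contains relationships it does not have to guess."""
    return [f"{s} {p} {o}" for (s, p, o) in triples if s == entity or o == entity]

prompt_facts = grounded_context("Reactor-A")
print(prompt_facts)
# ['Pump-101 feeds Reactor-A', 'Reactor-A has_sensor TT-204', 'Reactor-A runs_recipe Batch-Recipe-7']
```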
Same thing with physics-based simulation. Instead of the AI just looking at the data and saying, well, I think this is how the turbine’s going to perform because the millions of data points I have are kind of pointing in that direction—maybe? If we can put a physics-based model on top of that, we now have confidence that not only is our assumption correct, but it’s grounded in the laws of physics. You’re that much more confident in the response.
John Nixon: Well, it’s been real exciting to speak with you today, Bill, on this Process Industries podcast. I want to thank you, as always, for a very stimulating conversation. And before we close off, I’d like to recap some takeaways that I’ve heard from you today, Bill.
First, the digital twin and the data strategy that supports that digital twin are key to your long-term AI strategy. Behind that AI we need not only the data, but also the physics enablement that ensures it’s operating within the parameters of physics and thermodynamics, the real world that we live in every day.
Second, that we need to ensure we have FAIR data: it’s fully interoperable, well orchestrated and traceable. And you used this word earlier in our conversation: it’s trusted. When you say it’s trusted, it means that from stage gate to stage gate, from concept in the laboratory to a transfer of that technology and on to [developing] it at scale with recipes that span a global footprint, there has to be a means to show that, with the ever-progressing complexity and maturity of that data, you have carried forward all of the traceability necessary. You can look at either an asset or a product and understand what has led to it today and where it is going forward.
Have I encapsulated all of those or is there anything I might have missed, Bill?
William Hahn: As usual, John, that was perfect.
John Nixon: Well, as always, Bill, thank you for being here. I am so glad to have you here as a guest.
William Hahn: Thanks John, great being on the podcast today.
John Nixon: Well, I want to thank our audience for joining us today on the Industry Forward Podcast. I hope this episode helped you process your process industry knowledge. Please join us next time and we’ll continue to explore technologies and ideas that are shaping the future of industry. For now, goodbye!

John Nixon – Global Vice President of Process Industries at Siemens Digital Industry Software
As Global Vice President of Process Industries at Siemens, John leads a global team that helps process industries leverage digital solutions that enhance efficiency, accelerate innovation and achieve sustainability goals.
John has over three decades of experience in strategy, operations and technology deployment for energy, chemicals, life sciences and CPG. He is well versed in the operational and business pressures of industry, including regulatory demands, decarbonization, talent gaps and the push for innovation.

William Hahn, Director of Solutions Consulting at Siemens Digital Industry Software
As Director of Solutions Consulting at Siemens, Bill leads the digital transformation initiatives for large enterprise clients across life sciences, CPG, energy, chemical and infrastructure industries.
Bill has over 18 years of experience in digital strategy development, systems engineering and business process optimization. He specializes in end-to-end software development and implementations, aligning technology solutions with business objectives like risk management, compliance and operational resilience.


