Ontologies: How to make sense of the data explosions in process industries – Transcript
John Nixon: Hello and welcome to today’s episode of the Industry Forward Podcast, where we explore the key trends, transformative technologies and real-world innovations that are reshaping fields from aerospace to energy and beyond. I’m your host, John Nixon, Global Vice President of Process Industries at Siemens Digital Industry Software. And I’m excited to kick off a new series in which we will explore the impact of digitalization on the process industries.

I’m especially excited about the process industries because my background, built over 30-plus years in industry, has taken me from chemicals to energy to infrastructure, in China, Canada, Mexico, Europe and every continent on this planet. It’s been an incredible adventure, an incredible journey to find myself here today on this podcast.
And with me today, my guest, is William Hahn, Director of Solutions Consulting at Siemens Digital Industry Software. Bill, welcome to the podcast. And can you take a moment, tell our listeners a little bit about yourself?
William Hahn: Yeah, thanks for having me, John. A little bit about me, like you mentioned, I’m the Director of Solutions Consulting at Siemens. I cover several process industries, including energy, chemicals, infrastructure, consumer packaged goods, as well as life sciences. And as part of that role, I manage our technical teams that help bring all our solutions directly to our customers.
Some background, I started my career, 10 years at NASA, working systems engineering and integration for the space station program, as well as spending several years in the oil and gas [industry] in asset integrity and operations. And now I find myself, for the past 8 and a half years or so, at Siemens working in the software space.
And I can tell you, John, from all the different industries that I’ve worked in, both in industry and at Siemens, we’re at a real inflection point right now in terms of the explosion of data happening, especially in the process industries.
John Nixon: Bill, you just hinted at an explosion of data. I always say: data and complexity are like entropy. They always seem to be increasing. How does ever-increasing complexity affect the process industry?
William Hahn: Yeah, I think sometimes having too much data is almost as bad as not having enough data. You have to use the right data at the right time in the right context. And understanding when to use the data is just as important as having the data itself. So, I think the more that data starts exploding, the more data that’s available at our fingertips, the more time we have to spend to make sure that we properly correlate the data.
John Nixon: When you talk about the sheer volume of data, and we look at this, it’s almost an exponential growth with it, I think about the ability to communicate that data visually. Having worked in the field, I know how important it is to bring that digital twin; to bring that data as a twin to the workface. So, communicating visually around that data, not only at the workface, but also when you think about a concept of a metaverse, the ability to look at all of this data in context, to me, we need a framework that can operate at that order of magnitude. Instead of just looking at a pump or a pipe, when you look at entire systems and entire plants, there’s a degree of maturity around that scale that is necessary today, especially in a world where every headline is about artificial intelligence.
I want to talk about that for a moment, Bill. Not every process industry is at the same maturity level when it comes to data. The energy sector can be working with 40-, 50-, 60-year-old equipment that captures little or no data. Then you have the medical industry that must deal with the human body, where we have only scratched the surface of data that needs to be discovered around that complex system. How do process industries catch up to each other and address these issues?
William Hahn: Yeah, I think, John, you made a lot of good points there. Just look at it at a high level, right? I think there are different audiences for the data now. Just looking at the two examples that you mentioned: the human audience, right, where we talk about visual navigation of the data, but now we also have AI audiences, right? We have these AI agents that are acting on behalf of humans that need to contextualize that data in a different way. The way AI needs to visualize that data and understand that data is going to be different than how you or I need to contextualize that data and understand that data.
So, we have this combination of needing to manage the visual complexity, like you said, but also from an AI perspective: how do we organize that data and create the relationships and understandings, call it ontologies, that AI needs to contextualize that data and make sense of it. Which is going to be different than how you or I do that, right?
John Nixon: Bill, we often use terms like taxonomy or ontology. Could you just take a moment, for our listeners, let’s explain what we mean by taxonomy and ontology.
William Hahn: Yeah, I think, John, ontology is a term you’ll hear a lot. I almost envision it more like a web of data. So, if you think about traditional databases, which are tables and rows, right, and then joins between those tables. Imagine instead a web of data where each data point is a node, right? And there are connections, there are relationships between each node. You can get this really complex web of data that cuts across disciplines and across systems. We can start creating all this interconnectedness regardless of data model, right?
To use a practical example, it doesn’t matter what your data model in PLM (product lifecycle management) is, or your data model in ERP (enterprise resource planning), right? You kind of extract that out into an ontology and say, you know, here’s my web of data, all the data objects and the interconnected relationships in one system and the other system. And then we bridge that: how do those data nodes connect to the data nodes in this other system. And we start building out a fabric, right, a web of data that’s completely interconnected to truly understand not just the relationships within systems, but the relationships across systems.
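The “web of data” Bill describes can be sketched as a tiny property graph. This is purely illustrative: the node names, system prefixes and relationship labels below are hypothetical examples, not any real Siemens or customer data model.

```python
# Minimal sketch of an ontology as a web of data: nodes from different
# systems (PLM, ERP, CMMS) linked by named relationships. All identifiers
# here are made up for illustration.

class Ontology:
    def __init__(self):
        self.edges = {}  # node -> list of (relationship, target node)

    def add_edge(self, source, relationship, target):
        self.edges.setdefault(source, []).append((relationship, target))

    def neighbors(self, node):
        return self.edges.get(node, [])

onto = Ontology()
# Relationships within one system...
onto.add_edge("PLM:pump-P101-design", "has_bom", "PLM:bom-P101")
# ...and bridges across systems, independent of each system's data model:
onto.add_edge("PLM:pump-P101-design", "realized_as", "ERP:asset-4711")
onto.add_edge("ERP:asset-4711", "maintained_by", "CMMS:task-MX-88")

for rel, target in onto.neighbors("PLM:pump-P101-design"):
    print(rel, "->", target)
```

The point of the sketch is that the bridge edges (`realized_as`, `maintained_by`) live outside any one system’s schema, which is what lets the web span PLM, ERP and maintenance data at once.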
John Nixon: If I may, Bill, let me take that a step further as somebody who used to work in the field. If I’m looking at, let’s say, a pump, there’s schedule data associated with that pump from my scheduling system. There’s cost data, you mentioned ERP, and logistics information associated with that pump from the ERP system. I’m also going to have operations and maintenance history on that pump. So, I may have a computerized maintenance management system, a CMMS, associated with that pump.
So, I look at all the different sources of data—could be a CMMS, TMMT (total materials management traceability), ERP, EAM (enterprise asset management)—a whole series of very different sources of data. And what you’re saying is that in the ontology, the node is the asset itself, with its spider web of connections out to all of these other sources of data that holistically define everything associated with that pump.
William Hahn: Yeah, think about it, John, from a practical example using an AI copilot, right? I can point a large language model (LLM), an AI copilot, any of the available models today, at the maintenance management system. And then I can also point it at the ERP system and maybe also point it at the PLM system.
So, what do I get from that? Now, my AI has access to those three databases. But what the AI doesn’t know is how does the cost attribute in ERP relate to a scheduled maintenance task in the maintenance management system, which also relates to a bill of materials in the PLM system from a design perspective, right?
So, it has to guess, based on contextual clues, what those relationships are. And every time we know AI guesses, it just opens up the door for hallucination. So, what we’re doing with an ontology is we’re basically asserting that relationship. Saying “no, the cost node actually has a line that goes to the schedule node, which has a line that goes to the bill of material node.” So, we’ve created that relationship even across those different data models and different systems. Seemingly unrelated data, we’ve given it a relationship that the AI can traverse.
John Nixon: So, Bill, what this tells me is that when I use an LLM, I have a conversation with the AI and I say, I’m considering doing X to this pump. Maybe I want to disassemble it and reassemble it as part of a maintenance procedure. But because I’ve done that, I’ve now taken it out of service. Because of the ontologies, or relationships, I’ve created from that asset to other sources of data, it could tell me: if you take that out of service, here are the downstream effects. This system that it’s part of will no longer be in service. Because that’s no longer in service, the following products won’t be produced, or any number of other scenarios. So, what you’re telling me is that because of the relationships you’ve set up between sources of data that would otherwise sit in silos, an ontology, a map of relationships between all of those sources, the probabilistic guessing, as you like to say, that AI does becomes far more accurate, hyper-accurate even, about the downstream effects of an action.
William Hahn: Yeah, and I’ll add one little thing to that comment, John. I think that was perfect, but it will tell you confidently. The thing with AI is, AI is always going to give you an answer. It’s always going to try. Whether it’s right or not is up to the data source, right?
This is where hallucinations can become really problematic, because when you have a whole lot of data that on its own in silos is seemingly accurate, the wrong response to a user will actually sound pretty good. It’ll sound pretty accurate. So now I’m making decisions off of bad information. Because the AI actually made it sound good to me.
And how do I avoid those hallucinations? It’s by reducing, like you said, the guesses that AI has to make.
John Nixon: The concept of synthetic data, there’s such a huge opportunity with the technology we have today to uncover the data we don’t know that we don’t know in these industries. And I’m so excited about that. In fact, it harks back to a cliché I’ve heard many times: that data is the new oil. In the process industries, especially energy, we like to quote that often. So, what can process industries do to get more valuable insights and extract value, or even generate revenue, from this data?
William Hahn: Yeah, I think there’s some huge upside and some pitfalls in that area. Maybe I’ll address the pitfalls first, right? Unlike oil, when we pull oil out of the ground, for the most part we know what to do with it, right? You know, we can refine it. We can use it in chemical processes, and so on and so forth. But when it comes to data, I think we don’t always know what to do with it.
And it comes back to the conversation we had about contextualization. Take synthetic data as an example, say from a digital twin or a simulation. How you or I might want to interpret it might be visually, in the industrial metaverse: I want to look at synthetic or simulation data from a digital twin of this mixing reactor, for example, because we want to be able to visually see how it’s performing. That’s how our brains work, right?
But from an AI perspective, the AI probably just wants the raw data stream, right? Where are all the raw data points, the thousands, the millions of data points from the simulation, that it can cut through in a very short amount of time to make sense of that simulation the same way we’re making sense of it visually?
So, to avoid the pitfalls, we need to understand how we manage that data. Where we store it. How we store it. How we understand what’s the latest version of that data. And then how do we contextualize that data on top of that storage?
And I think a lot of what I see in the industry today, there’s a lot of data lakes, right? Hey, I just want to throw my data in a data lake, and we’ll figure it out from there, right? But the data is only one piece of it. The contextualization is the other piece. So, okay, you have all the data in the data lake. How are you going to make sense of it?
Well, you need to have visual navigation, and you need to have reporting and analytics and dashboarding and ways that humans can understand that data.
And then for AI, I need to organize that data in ontologies so that the AI can actually traverse the relationships that, from a human perspective, we may just assume by looking at the data. Hey, there’s a correlation between A and B. AI may or may not make that same assumption, right? So that ontology might have to be created as part of the data set.
John Nixon: So, Bill, let’s wrap this up. We’ve had a great conversation today. What’s really important, we’ve talked about data, but it’s data in context, right? We say that over and over again. Contextualization is so very important. Bill, you did a great job talking about, you know, not only the factory floor but also the field, and really looking at all the complexity that comes with that, and the ability now, with AI, to supplement and extract the insights that provide that force multiplier. It’s been just tremendous to have that conversation. At this point, we’re really looking at a revolution in contextualization. And I’m so excited about the future Industry Forward podcasts, because we’re going to start really diving into conversations across the process industries that drive data, contextualization and the transformation we’re seeing before us. It’s been just incredible. So, I’m asking you to come back and join us for a future Industry Forward podcast. You’re going to gain a lot of insight, and I look forward to these wonderful conversations.
Well, I hope this helped our listeners process their process industry knowledge. Did you get that pun, Bill? Thank you for listening. Please join us next time.
William Hahn: Love it! Thanks so much.
John Nixon – Global Vice President of Process Industries at Siemens Digital Industry Software

As Global Vice President for Process Industries at Siemens, John leads a global team that helps process industries leverage digital solutions that enhance efficiency, accelerate innovation and achieve sustainability goals.
John has over three decades of experience in strategy, operations and technology deployment for energy, chemicals, life sciences and CPG. He is well versed in the operational and business pressures of industry, including regulatory demands, decarbonization, talent gaps and the push for innovation.
William Hahn, Director of Solutions Consulting at Siemens Digital Industry Software

As Director of Solutions Consulting at Siemens, Bill leads the digital transformation initiatives for large enterprise clients across life sciences, CPG, energy, chemical and infrastructure industries.
Bill has over 18 years of experience in digital strategy development, systems engineering and business process optimization. He specializes in end-to-end software development and implementations that align technology solutions with business objectives like risk management, compliance and operational resilience.


