Exploring massive engineering data analytics with Ian McGann (Series 2 Episode 1)

By Stephen Ferguson
Interview with Ian McGann

Launching Series 2!

In the first episode of a new season of the Engineer Innovation podcast we enter the brave new world of Massive Engineering Data Analytics, which lies at the crossroads of artificial intelligence, machine learning and the digital twin.

In 1993, when I first started work, there were 15.6 exabytes of storage in the world – that’s the sum total of all the digital data that our species had ever created. 30 years later and we store up that much data every 15 days.

Collecting, cleaning, storing and processing massive amounts of data is a non-trivial process. If you can overcome some of the obstacles, then the rewards can be immense.

Our guest this episode is the irrepressible Ian McGann, who is Director of Innovation for the Simcenter team and an expert in Massive Engineering Data Analytics.

In this episode we discuss:

Turning huge amounts of data into engineering decisions.

We’ve been doing that for years of course, but in the past, we needed the accumulated experience of engineers to bridge the gap between everything that was learned from previous design iterations. Now we can deploy machine learning algorithms that use all of the available data, including data from previous iterations and real-world performance and use that to make even better engineering decisions.

Collecting, storing and analyzing massive quantities of data.

That data can come from simulation, it can come from validation experiments, or it can come from sensor data from the product being used in service. Using the executable digital twin, it’s possible to combine all of these things, using simulation to extend sensor data to virtually measure in locations where you wouldn’t be physically able to place a sensor. Whereas previously we would have sorted, filtered and processed that data to make it useful, these days we can also feed it to a machine learning algorithm, which can then predict results that we might not have measured before.

The difference between engineers and data scientists

In data science, 80% of the work is in transforming the data into a state which is usable by the machine learning algorithms. Since most data is inherently dirty to some extent, much of that effort is involved with identifying potential outliers in the training set. However, the good news is that you don’t have to be a data scientist to get involved in Massive Engineering Data Analytics.

  • Turning huge amounts of data into engineering decisions
  • Collecting, storing and analyzing massive quantities of data
  • The difference between engineers and data scientists


Stephen Ferguson:  Hello, and welcome to the Engineer Innovation podcast. With me today is Ian McGann, who is a director of innovation as part of the Simcenter team. When we started this podcast, the topics that we really wanted to focus on were digital twins, machine learning, artificial intelligence, and also cloud computing. And today we’re going to be talking to Ian about something which lies at the crossroads of all those topics, which is massive engineering data analytics. Hello, Ian. Hi.


Ian McGann:  0:41

Thanks for having me. Good to see you, Stephen.


Stephen Ferguson:  0:43

It’s good to see you too. Let’s start off by talking about this word massive, because as engineers we’ve been involved in the big data game since before it was fashionable. I’ve been in engineering for 30 years, and I still have a filing cabinet full of engineering analysis files stored on DVDs and magnetic tapes. So what does massive mean? Because that’s quite an emotive word in the context of massive engineering data analytics.


Ian McGann:  1:11

Maybe to go back to the point that you’re making: you had a lot of data that you were storing and keeping on tapes and stuff like that. What did you use the data for?


Stephen Ferguson:  1:18

So the intent was really, of course, that it was always a backup. We’d do a simulation, we’d exploit it for what it was worth, and then we’d finish using it. We would’ve exploited it as much as we could in the time available. And then we’d put it on a tape, so that if we ever had to for QA purposes, if we got called back to verify results, or if the project came alive again and we had to do further iterations, we’d always have that information available. Although 95% of the time we never did, and the tapes just sat gathering dust in a filing cabinet until we no longer had the tape machines to recover the data.


Ian McGann:  1:54

One of the things you said is that you create a model and you exploit it. So you use that model at that time to make decisions, to act and to decide to do something. For me, when it comes to big data engineering, or massive engineering data analytics, there may be subtle differences. But it’s collecting your data, storing your data and analysing your data with a goal. And I think this is important: the goal is to turn large amounts of data into decisions. But you were making decisions on a single point that you had. You said, okay, we did a simulation, and we used it, and then we stored it away. And then the next time a problem came along, you had a new model, and you did a new simulation, and you made decisions based on that simulation. And you as the engineer, because you probably did the previous analysis, were building up in your mind the knowledge of the previous generations, and you could make decisions based on it. So your experience was a big part of it. Massive engineering data analytics tries to capture all of the data, all the previous variations that you have, to help you make an even better decision, because I’m sure there are things you forget. There are lots of things I forget when I’m working. So it’s all about turning large quantities of data into decisions. And it involves collecting, storing and analysing that data to do so.


Stephen Ferguson:  3:13

Where is that data coming from? Principally, is it coming from simulation models, or experiments, or field tests?


Ian McGann:  3:19

Yeah, all of the above. So consistency is nice and key for a lot of these things. Take modal as an example: NASTRAN solver 103. We’ve been solving NASTRAN solver 103 for the last 20 years on models. And we’ve learned the eigenfrequencies of the structure; we know roughly where they are and where they lie. And it’s never a big surprise, let’s say: when we go out and we do the measurement, we’re pretty spot on, right? In that case, if I did the simulation, and I was very familiar with the structure, I would trust the simulation. I would say, yep, it’s good. I’ve done this hundreds of times, I know what I’m getting, the result is good, and I can rely on that data. In the case where I can’t rely on it, where it’s a new kind of structure, I’ve never simulated it before, it’s got some nonlinear components, then maybe I want to rely on the test data, the actual physical data that you get after the prototype is available. So the source of the data really depends on your level of comfort and the accuracy level of the data as well. But then there are areas where you can’t measure. You can’t put cameras in, you can’t measure inside a vat of molten steel, it would just melt. In those areas you need the simulation; you have to have the simulation data, that’s the only thing you’ve got. So in some cases you have to have simulation, and in other cases it really depends on where the best result is coming from and where you would trust it. In a lot of cases, I would trust the simulation. For modal testing? Perfect, right? For stress-strain testing it’s great: simulation gives you more information than you could get on the test, where you have to have strain gauges. But then there’s also the combination, and we’ll come to that; that’s the digital twin aspect of it.


Stephen Ferguson:  4:55

So what does the data have to be? Are we talking post-processed engineering data, or are we talking simulation files and simulation results? How are we feeding the algorithms behind this?


Ian McGann:  5:06

In some cases, it depends on what you want to get out of it. So if you want to get the eigenfrequencies as a result out of the machine learning or out of the data... well, we’ve gone a step too far already. So let’s say we have large quantities of data, and we want to turn those into decisions. To help us make those decisions, we can search, query and filter on the data, or we can also feed that data into machine learning algorithms. And those machine learning algorithms predict for us results that we might not have measured before. That’s the other part to it. So in those cases, when we’re using machine learning, the type of results we want out would be the same type we put in: I want to know the eigenfrequencies, I want to know the stress and the strain, I want to know the flow rates, and I want to know where the hotspots are, right? So the results we want as an output are the same results that we take from our large amounts of data to feed into that machine learning. And if you’ve followed any of the machine learning discussions related to ADAS and autonomous vehicles, there it’s all about annotating and labelling the data. So I have an image and I say that’s a person and that’s a car, and I label everything. In the world of stress and strain, or in the world of acoustics, labelling the data means: well, that’s a booming noise, or that’s a modulation, or that’s a certain stress and a certain value, right? So the labelling is a little different. But still, here’s a time series, and I’m labelling that time series data as being the tone-to-noise ratio. And how did I get that? Well, I processed the data. I think of data analytics, let’s say, as having stages, right? In your world, you would do a simulation, so you gain information. And then maybe you process that in some way, and you gain some insight. And then you could also do some optimization with it.
So you can say, I want to put variations on the design, and then do some optimization: information, to insight, to optimization. In analytics, we would call that descriptive or diagnostic analytics; that would be your information and insight. And then predictive and prescriptive analytics as you go over towards the optimization side. So getting data is nice. But as you sort, filter and process that data, you start to work towards the insight and the optimization levels. Am I making sense?


Stephen Ferguson:  7:19

Yeah, absolutely. How much work is involved in getting that data into the correct kind of labelled format, a format that the algorithms can understand and that we can feed to the machine learning algorithms?


Ian McGann:  7:32

Most of the algorithms I’ve seen today take a CSV file as input, and then you’ll see algorithms that take images as well. So if I have an image, I can feed that in to the model. But a lot of the time, the algorithms that I’ve seen are working with CSV files, so scalar values that you’re basically inputting. Which means that at a certain point I have to translate and transform that data to the point where the algorithm will take it. And that’s a big part of massive engineering data analytics. When you look at the whole world of data science, 80% of the work is transforming your data, wrangling your data, moving your data, preparing it, finding outliers in your data, right? Because what happens is, if you take road load data acquisition, the guy drives around a track 100 times, and then one time he hits the curb when he’s not supposed to, and that causes extra data. I want to make sure that curb is never in the inputs into my analytics, because it wasn’t intentional, let’s say; it’s an outlier. So that type of work you have to do before you feed it in. Good data is extremely important.
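
A minimal sketch of the clean-up step Ian describes, assuming each lap of a road load measurement has already been reduced to a single scalar such as its peak load. The function name, the sample numbers and the choice of a median-based (MAD) modified z-score with a 3.5 cut-off are illustrative assumptions, not the method used by the Simcenter team.

```python
from statistics import median


def flag_outlier_laps(peak_loads, threshold=3.5):
    """Return the indices of laps whose peak load is a robust outlier.

    Uses the modified z-score built on the median absolute deviation (MAD),
    so a single curb strike does not distort the yardstick used to find it.
    """
    med = median(peak_loads)
    mad = median(abs(x - med) for x in peak_loads)
    if mad == 0:
        # More than half the laps are identical, so MAD gives no usable
        # scale; this sketch simply flags nothing in that case.
        return []
    return [i for i, x in enumerate(peak_loads)
            if 0.6745 * abs(x - med) / mad > threshold]


# Hypothetical peak loads for eight laps; lap 6 is the curb strike.
laps = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0, 10.1]
curb_hits = flag_outlier_laps(laps)
clean_laps = [x for i, x in enumerate(laps) if i not in curb_hits]
print(curb_hits)  # prints [6]
```

A median-based score is used here rather than mean and standard deviation because one large outlier inflates the standard deviation and can hide itself.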


Stephen Ferguson:  8:39

So if I look at some engineering simulation results, yeah, I’ll use my engineering intuition and experience, as you say, to spot the outliers, to spot the bad data points. Sometimes I’ll do that accurately. Sometimes it won’t be accurate, and maybe I should have included that point in the set. But that’s a different point. So is that part of the process still manual, or can we train the algorithms to identify and spot the outliers?


Ian McGann:  9:00

Yes, you can use machine learning and algorithms as a way to sort and filter your data as well. Yeah, absolutely. So there are technologies out there that give you a map type of display, where you basically throw all your data into the unstructured learning, and the output is scatter plots. And from those scatter plots, you’ll see islands kind of form. And you can label those islands. Or maybe you find something interesting about those islands, and you label them. What I always find is that the outliers will be far away. They won’t be near any island, let’s say. Those are really good indications of data points that are not relevant for your dataset. And then you take those and you basically exclude them from your input. By the way, when you’re feeding the data in, say you have maybe 1000 data points: you’re going to take 80% of that data and use that for the training, and then the other 20% for validation. Now, if one of those outlier data points is used for the validation, let’s say, you’re not going to get a good result, because it’s outside your norm. And that’s why you want to make sure they’re excluded. You also want to find out why.
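
The two steps above, dropping points that sit far from every "island" and then splitting 80/20, can be sketched in a few lines. This is only an illustration under stated assumptions: the points are taken to be 2D scatter-plot coordinates, "far from any island" is approximated by the mean distance to the three nearest neighbours, and the function names, the factor of 3 and the synthetic data are all hypothetical.

```python
import math
import random
from statistics import median


def isolated_points(points, k=3, factor=3.0):
    """Indices of points far from any cluster.

    A point is flagged when its mean distance to its k nearest neighbours
    exceeds `factor` times the median of that quantity over all points.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    typical = median(scores)
    return [i for i, s in enumerate(scores) if s > factor * typical]


def train_validation_split(items, train_fraction=0.8, seed=0):
    """Shuffle reproducibly, then cut into training and validation sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Two synthetic "islands" of 25 points each, plus one stray point.
island_a = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
island_b = [(10 + x * 0.1, 10 + y * 0.1) for x in range(5) for y in range(5)]
points = island_a + island_b + [(50.0, -40.0)]

strays = set(isolated_points(points))
clean = [p for i, p in enumerate(points) if i not in strays]
train, validation = train_validation_split(clean)
print(len(train), len(validation))  # prints 40 10
```

Removing the stray point before the split matters for the reason given in the conversation: if it landed in the validation 20%, the model would be scored against a case it was never meant to represent.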


Stephen Ferguson:  10:07

So it comes down to basically making sure that you’re training appropriately. And then when you use it in anger, you can spot those things and exclude them if necessary, which makes sense. So we talked about the effort that’s involved in data processing, which I guess will decline as we go on with this, because there’ll be more and more tools available and people will have more experience doing this. Is it something that regular engineers like me can access? Or do you have to be a data scientist to make this useful at the moment?


Ian McGann:  10:33

I would say there is a gap in knowledge, understanding and experience between the data scientist and the traditional engineer that comes out of college, right? Depending on whether you’re mechanical or electrical or whatever, there is a gap. Even if we took an engineer 10 years ago and said, hey, here’s a job, you’re now going to be an NVH specialist, they would have zero NVH experience; it’s very hard to get that in college. Regardless, you’re going to build it up. And we see universities today transforming their programmes. Pretty much every university out there is transforming its programmes to say that if you are an engineer, you’re going to have machine learning in the curriculum; it’s going to be a topic that you’re going to learn a lot about. So data science is part of the curriculum now. So the engineers coming out are more familiar with Python and machine learning processes than they ever have been before. That’s changing. But myself, it took a while. I had to go and learn it, and I had to figure it out. And I still don’t know everything. And I’m not a Python user; I’m still at that level of trying to understand the big picture. Thankfully, there are teams here that know Python very well and can build up the models and do the work. One of them was an acoustic specialist, and he’s now writing a machine learning algorithm to take all of his knowledge, all of his experience, and put that into a machine learning algorithm. So basically, he’s saying, I want this machine learning algorithm to be an acoustic specialist. So copy him, and put him all over the place. And we actually have a good use case for that.


Stephen Ferguson:  11:56

And that’s a good point. So you’ve got gnarly old guys like me and you, looking at the end of our careers; we’re closer to the end of our careers than we are to the start. You’ve got all these younger engineers coming through who have got this built-in machine learning and data science expertise, because they’re learning it in college and because they’re young and impressionable and can take on the new information. I guess one of the use cases for all this is that people like us, people who’ve worked in industry for a long while, accumulate experience, but it’s all stuck inside our heads: engineering experience applied to projects. And I guess what machine learning is doing is taking that experience that you’ve learned from individual projects, individual simulations, and making it more transferable and general. Is that correct, would you say?


Ian McGann:  12:36

Yeah, it’s definitely helping to capture the experience that we have, and then transfer it. To reference a use case, we’re working with Ferrari at the moment. And if you’ve ever been in a Ferrari, or observed one, it’s an amazing sound, feeling, experience, right? So it’s not just about the drive and the dynamics of the vehicle. The sound of the vehicle is super. It’s just an amazing car to feel, in terms of the sound, in terms of the drive. Great. At some point, if you ever get the chance, if somebody says, hey, do you want to take a ride? Say yes.


Stephen Ferguson:  13:09

Not in the driver’s seat, because I don’t trust my driving. So I’d have to be in the passenger seat. But yes, when I get the opportunity.


Ian McGann:  13:14

I’m what they call Black Lake. So I could never get into that situation where I was lucky. But yeah, so the idea there is that they have a person who knows what the Ferrari should sound like, how it should sound and how it should perform. So they have that experienced person. But like you and me, that experienced person is getting to a point where he’s gonna retire at some point, right? He’s gonna be saying, thank you very much, Ferrari, and goodbye. So we’re doing projects with Ferrari to basically try and capture the sound, or that feeling that you get of the Ferrari. And then basically, when a novice, a person who doesn’t know or doesn’t have that experience, let’s say, is driving the car, we can tell him if it sounds like it should sound or not. That’s the key point. If it doesn’t sound like it should, they obviously then have to fix something or change something; it’s tuning. And that’s great in the design stage as well, when you’re really early in the design stage and you’re trying to make sure that, okay, we’re capturing that Ferrari experience. You are not relying on a person; you can actually use that machine learning algorithm to tell you, yes, you’ve reached it, or no, you need to keep working. That’s just one use case of capturing our crusty experience, as you might put it.


Stephen Ferguson:  14:29

Yes. So what is the payoff? When we started, we talked a lot about the obstacles to accessing this. Yeah. Which probably isn’t fair. What is the payoff for mastering massive engineering data analytics?


Ian McGann:  14:41

Good question. When you look at balancing, and I don’t really mean the type of balancing that a colleague here behind me does, who’s got a unicycle, though maybe it is a bit like that! I mean the kind of balancing where I’m trying to get the optimal energy management of the car. I recently bought a car, and it was an electric car, and I’m looking at a database called EV-database.org, and it gives you the range of all the cars. And I’m closely focused on that. I’m trying to figure out the range; I want to get a certain range out. I know I do a drive every week back and forth to go see my son, and I want to be able to make sure I can do that without any problems. So the range estimate is really important. And I don’t trust the car companies to give me the correct range. Never.


Stephen Ferguson:  15:24

Never, just as you never used to trust your ICE engine car manufacturer for the MPG, because those were just specialised, optimised, ideal-circumstances estimates that you could never recreate in reality.


Ian McGann:  15:37

Yeah, exactly. So the EV database, EV-database.org, is a great site. They actually give you an independent study on that. So it’s like, okay, well, we’ve tested the cars, and this is what we’re estimating the range to be. So that range estimate becomes really important. Now I’m on the other side: I’m an automotive OEM, right, I’m an automotive company. And I need to produce a really good range on this car. But at the same time, I also want to have a really good driving experience. And I want to have a really good NVH experience, so the vibration and sound. Road noise is extremely tiring, extremely annoying, rattles... So anyway, I want to have a really good experience. And then there’s the balancing act that you have to go through to get that. To do energy management, like battery power consumption and thermal management, you’ve got to understand the drag coefficient, so there’s your CFD simulation. You’ve got to understand the different driver profiles. You’ve got to take an estimated energy consumption: the thermal, the fluid, the air co. It’s a cold morning, I’m turning on the air co, the energy rate goes down. So they’ve got to give me the estimate in cold weather, and they’ve got to give me the estimate, in miles, in hot weather. There’s the battery capacity and the age of the battery; that all has a big influence on all this. And then you’ve got an NVH guy who says, I want double-glazed windows, and I want the best trim ever. And I want this to be a cocoon, and I don’t want to hear a thing when I’m in that car. And the vehicle dynamics people are saying, we need a lighter car, a lower centre of gravity, rougher suspension. So the NVH guys are gonna be crying, but the dynamics people are like, yep. So now I’ve got to balance all of these attributes. And that balancing act is not easy. So the payoff of all of this is huge when it comes to being able to get an answer really fast. I’m in a room, and I have to make a decision.
And I’ve got management saying to me, okay, what will happen if we change the trim on the door from trim A to trim B? And I need to be able to say, it’s going to have XYZ influence in terms of costs, in terms of NVH performance, in terms of dynamic performance, and so forth. Machine learning can give me that feedback nearly instantaneously, right? So we train all these models, and that’s the kind of use case we’re working with customers on. We take all the NVH attributes, all the KPIs, key performance indicators they would call them. And we take the different types of vehicle inputs: the tires, the tread type, the length, the budget, the windows, the trim, the headliner. We take all of that input, and we basically feed that into the machine learning algorithm. And the output is that a person can now build up a Frankenstein car. Or he has the car and the manager says to him, hey, what happens if we change the trim? And he just pulls down a menu and changes the trim and immediately gets an answer on the result. And it’s super accurate. It’s really good.
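
To make the "change the trim and get an instant answer" idea concrete, here is a deliberately tiny stand-in for a trained model. It assumes that past vehicles are described by categorical options and one NVH KPI, and it uses a simple additive main-effects fit rather than a real machine learning algorithm. The class name, attribute names and KPI values are all hypothetical.

```python
from collections import defaultdict


class AdditiveSurrogate:
    """Toy surrogate: KPI = overall mean + one learned offset per option."""

    def fit(self, configs, kpis):
        self.mean = sum(kpis) / len(kpis)
        sums = defaultdict(float)
        counts = defaultdict(int)
        for config, kpi in zip(configs, kpis):
            for attribute, option in config.items():
                sums[(attribute, option)] += kpi
                counts[(attribute, option)] += 1
        # Offset of each option = its average KPI minus the overall average.
        self.effects = {key: sums[key] / counts[key] - self.mean
                        for key in sums}
        return self

    def predict(self, config):
        # Options never seen during fitting contribute no offset.
        return self.mean + sum(self.effects.get((a, o), 0.0)
                               for a, o in config.items())


# Hypothetical history: cabin noise in dB(A) for four past configurations.
history = [
    {"trim": "A", "tires": "X"},
    {"trim": "A", "tires": "Y"},
    {"trim": "B", "tires": "X"},
    {"trim": "B", "tires": "Y"},
]
noise = [60.0, 61.0, 58.0, 59.0]

model = AdditiveSurrogate().fit(history, noise)
before = model.predict({"trim": "A", "tires": "X"})
after = model.predict({"trim": "B", "tires": "X"})
print(f"Changing trim A to trim B shifts the KPI by {after - before:+.1f} dB(A)")
```

Once fitted, each what-if query is a dictionary lookup and a sum, which is why the answer comes back in the meeting rather than a week later. A production model would need to capture interactions between options, which this additive sketch ignores.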


Stephen Ferguson:  18:37

So I used to work for an automotive OEM 25 years ago, and we used to have those Monday morning design meetings. You’d spend a weekend stressing, trying to get your simulation results working so you could deliver the results from the simulations. But if you had to make those decisions, you were talking a week away, two weeks away. So to be able to do those things in almost real time, I think, is incredible. How about the question of trust, though? We’re making those instantaneous predictions, and you’re saying that they’re very accurate because they’re trained using lots of previous data. But how much can we trust them? Are they pre-validated, or do we have to validate those assumptions afterwards?


Ian McGann:  19:11

So of the data set that you’re feeding into it, 80% is used to train and 20% is used to validate, and you then check it in the future. Right, you’re bringing up an important point: I can trust that data, and the results of that model, for only so long before it starts to give me inaccurate results. So you have to have in place an industrialised way to update those models, to feed them new data over time. New tests are done, or new simulations are done; we’re doing drag coefficients all the time. So I’m doing these new simulations and I’m constantly feeding that data in. Say every month, I take all the simulations that were done on drag coefficients, I feed them into the machine learning model, and it updates. So now it’s got new information, and it’s improving itself over time. At the same time, if you completely automate it, you’ve got to do checks. You have to check, okay, are there outliers? There is a need for a person there to make sure that the quality of the data that I’m putting into the system is still good. If I’m using test data, I need to make sure that I have consistency. So template-based measurement is what we’re pushing there: automatically capturing the temperature, automatically capturing the road surface conditions, all of these things, right? And automatically doing spike detection and quality checks. So there are certain things that we do to help the process, to keep the data consistent and clean, and you need those. Otherwise, the work in terms of preparing all of that data becomes too big.
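
The automated checks mentioned above can be pictured as a gate that every new measurement must pass before it joins the next retraining batch. The sketch below is an assumption-laden illustration: the metadata field names, the spike rule (a sample that jumps away from both neighbours by more than five times the typical step) and the example runs are all hypothetical.

```python
from statistics import median

# Hypothetical fields a measurement template would always record.
REQUIRED_METADATA = ("temperature_c", "road_surface")


def find_spikes(signal, factor=5.0):
    """Indices of single-sample spikes in a time series.

    A sample is a spike when it differs from both neighbours, in the same
    direction, by more than `factor` times the median sample-to-sample step.
    """
    steps = [abs(b - a) for a, b in zip(signal, signal[1:])]
    if not steps:
        return []
    typical = median(steps)
    if typical == 0:
        return []  # flat signal: no usable scale in this sketch
    limit = factor * typical
    spikes = []
    for i in range(1, len(signal) - 1):
        left = signal[i] - signal[i - 1]
        right = signal[i] - signal[i + 1]
        if abs(left) > limit and abs(right) > limit and left * right > 0:
            spikes.append(i)
    return spikes


def passes_quality_gate(run):
    """True when the run has complete metadata and a spike-free signal."""
    has_metadata = all(run.get(key) is not None for key in REQUIRED_METADATA)
    return has_metadata and not find_spikes(run["signal"])


new_runs = [
    {"temperature_c": 18.5, "road_surface": "dry asphalt",
     "signal": [0.0, 0.1, 0.2, 0.1, 0.2, 0.1, 0.0, 0.1]},
    {"temperature_c": 19.0, "road_surface": "dry asphalt",
     "signal": [0.0, 0.1, 0.2, 0.1, 5.0, 0.2, 0.1, 0.0, 0.1]},
    {"temperature_c": None, "road_surface": "wet asphalt",
     "signal": [0.0, 0.1, 0.2, 0.1, 0.2]},
]
accepted = [run for run in new_runs if passes_quality_gate(run)]
print(len(accepted), "of", len(new_runs), "runs accepted for retraining")
```

Rejected runs are the ones a person should look at, which matches the point that automation does not remove the need for someone to check data quality.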


Stephen Ferguson:  20:36

So where does the digital twin come into this, the executable digital twin? We talked about the fact that you can’t always measure where you want to measure. I guess that’s the route into the xDT, isn’t it?


Ian McGann:  20:47

That’s definitely one of them: the virtual sensors. There are a lot of digital twins, like documentation twins. In the Simcenter organisation we’re focusing on the performance-based digital twin: I want a digital twin of the dynamics of the car, or the acoustics, etc.; it’s performance related. And then we have an executable digital twin. And the executable digital twin is a virtual representation of a physical asset. I’m trying to remember the definition in my head. So it’s a virtual representation of a physical asset, and it’s linked to the physical asset. So when the physical asset moves, the virtual asset moves, and you link it through a sensor. And that sensor doesn’t have to be the most complex, deep sensor in a place you cannot always measure. And the advantage there is that you can take a use case like the one we have for CNC machines. The CNC machine is moving and it picks up a new drill bit. And as it starts to spin up, sometimes there’s an imbalance. And that imbalance is usually caused by dirt or particles that go up into the fibre, I don’t know what you call that! And we basically detect that in real time. With the potential types of imbalance you get, you can measure it, but it’s hard to measure right at that location. So what we do is we actually measure it in a different location, and we use a digital twin to help us get to the exact location we need. Basically, this is a digital twin that we built into the machine, so that it informs you as an operator that there’s an imbalance. The operator will then activate the air system to blow out the bit and clean it, and then it will reattach and continue with the job. So that’s a type of virtual sensor: I can’t measure in location A, I can measure in location B, and I’m going to create a model to help me do that.
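
The "measure at B, estimate at A" idea can be sketched with the simplest possible model: a straight line calibrated on simulation results where both locations are known. This is an assumption for illustration only; a real executable digital twin would use a reduced-order physics model, and the function name and numbers below are hypothetical.

```python
def fit_virtual_sensor(sim_at_b, sim_at_a):
    """Least-squares line mapping a reading at B to an estimate at A.

    `sim_at_b` and `sim_at_a` are paired values taken from simulation runs
    in which both locations can be evaluated.
    """
    n = len(sim_at_b)
    mean_b = sum(sim_at_b) / n
    mean_a = sum(sim_at_a) / n
    sxx = sum((b - mean_b) ** 2 for b in sim_at_b)
    sxy = sum((b - mean_b) * (a - mean_a)
              for b, a in zip(sim_at_b, sim_at_a))
    gain = sxy / sxx
    offset = mean_a - gain * mean_b
    return lambda reading_at_b: gain * reading_at_b + offset


# Hypothetical simulated vibration amplitudes at the two locations.
sim_b = [1.0, 2.0, 3.0, 4.0]   # where a physical sensor can be mounted
sim_a = [2.5, 4.5, 6.5, 8.5]   # where we actually want to know the value

estimate_at_a = fit_virtual_sensor(sim_b, sim_a)
print(estimate_at_a(5.0))      # live reading at B, estimated value at A
```

The design choice worth noting is that the calibration happens offline against simulation, while the live step is a cheap evaluation, which is what lets a virtual sensor keep up with the machine in real time.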


Stephen Ferguson:  22:38

So you can’t measure in the centre of your vat of molten steel, which is your earlier example. But you can put sensors on the wall outside of the vat, and you can use simulation to work out what the results should be in the place that you’re really interested in.


Ian McGann:  22:52

Yeah. Now in order to do that, though, those are simulations that you run in real time. So we have to take that simulation and reduce it down; we have reduced order modelling technology to reduce it down so that it can run in real time. Not every simulation has to run in real time, and that’s the other thing: some simulations can run in the cloud, and they can run periodically with new data inputs. We have a system set up for a water company. This is a nice one, and I actually like it because it manages the entire water infrastructure of a country. Think of the number of pumps, think of the number of valves, the number of loaders, all of these systems running. And it’s extremely complex. And you want to be more efficient; you want to make sure that you can predict what the usage is going to be, so you can get the water to the location where it needs to be, so that it’s available for the people. That’s the job: efficiency. And you can suffer, because if you’re not predicting well enough, you’re pumping and it’s costing more money. There’s night tariff and day tariff, and maybe you have solar panels out there. I saw that recently, solar panels on top of the water reservoirs, and we’ll drink to that. So solar panels on top of the water reservoir collect the energy right where the water is, basically powering it during the day. So it’s cheaper during the day. So if I can predict the usage, and pump when I have the energy, I can save myself a lot of money. So that’s machine learning to help you predict. But it’s not only about the machine learning. I also have physics models of all the different valves and all the different pumps to help me predict when things will fail that I haven’t ever measured before. Does that make sense? Yeah. So machine learning is great for when I have measured it and I have that data. What if I haven’t measured it? If it’s something that has never broken before, if it’s something that’s never happened before, never fails?
It’s not part of my failure mode and effects analysis, the FMEA. It hasn’t failed before, so I don’t know. Physics models are needed for that, because they can predict when that failure will occur; it’s physics. They’ll say to you, oh, this is going to fail. If your model is updating in real time with the real data feeding into it, it’ll tell you your pump is going to overload, or your pump needs to cool down, even though we’ve never seen this before because it’s an unusual case or situation or whatever. So there’s a need for that combination. I would call it model-based machine learning. It’s great, because now you can have that mix of machine learning, which is experience, like what we were talking about with you and me, us crusty old guys with all of our experience, combined with a physics-based model that gives you insight into the structure as it is today. Because the model updates: the model I have today is different to the model I had a month ago. It’s updated itself, and it’s updating every day, right? So there’s that combination. And then we also have ones where we have machine learning with real-time physics models linking in to the cloud, where you have very complex CFD calculations running. And those complex CFD calculations are updating every 10 minutes or so. So we’re feeding it with new inputs, it runs, it delivers the output, and then it updates the overall system network, making improvements. We did that for syringes; a nice example, that kind of use case. And we did it for a food industry extruder for potato chips. So there are a couple of examples; it’s being used in a lot of different areas now.
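
One simple way to picture the combination of measured experience and physics is a predictor that trusts the data where measurements exist and falls back to a physics model outside that range. This is a sketch under stated assumptions, not the approach used in the water project: the data-driven part is plain linear interpolation, the physics part is the pump affinity law (shaft power scaling with the cube of speed) with a made-up coefficient, and all names and numbers are hypothetical.

```python
from bisect import bisect_left


def make_hybrid_predictor(measured_x, measured_y, physics_model):
    """Return predict(x) -> (value, source).

    Inside the measured range the value comes from the data; outside it,
    where nothing has ever been measured, it comes from the physics model.
    """
    pairs = sorted(zip(measured_x, measured_y))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]

    def from_data(x):
        i = bisect_left(xs, x)
        if xs[i] == x:
            return ys[i]
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    def predict(x):
        if xs[0] <= x <= xs[-1]:
            return from_data(x), "data"
        return physics_model(x), "physics"

    return predict


def pump_power_kw(speed_rpm):
    """Affinity-law estimate; the coefficient is a hypothetical value."""
    return 1.0e-9 * speed_rpm ** 3


# Hypothetical measured pump power (kW) at speeds seen in service (rpm).
predict_power = make_hybrid_predictor(
    [1000, 1500, 2000], [1.0, 3.4, 8.1], pump_power_kw)

print(predict_power(1250))  # within the measured range: uses the data
print(predict_power(2500))  # never measured: uses the physics model
```

Returning the source alongside the value keeps it visible whether an answer rests on experience or on physics, which matters most in exactly the unusual cases described above.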


Stephen Ferguson:  26:27

So I recently spoke to somebody at a conference who was using executable digital twins in military satellites. So satellites, we fly them into space, and they orbit, but sometimes they’re disturbed by solar forces or by impacts and have to adjust their attitude. So they use momentum wheels, which basically spin up and create a torque, which changes the angle of the satellite. Now, in order to measure the health of those momentum wheels, they’ve got a limited number of sensors, and they use an executable digital twin. But they also use it for a kind of cybersecurity as well. So if there was an attack on that satellite, a cyber attack, they might lose some of those sensors. But using the executable digital twin, they can recreate the data from the missing sensors and keep the asset operational, which I guess is another usage in a kind of cybersecurity-aware world.


Ian McGann:  27:22

Yeah, it’s cool. Actually, I didn’t know that example existed. I love that use case, because it’s not only usable for extending your sensor range out to things that you can’t measure, it’s also usable when something has been damaged and a sensor has failed, and we need to understand the latest information. That’s a nice example, because you can’t get up there and access it. And the really great thing is that you can take the model from a week ago, before the incident, and the model now, and compare the two and see, okay, what is different about this structure that might give us some insight? Because we can’t physically look at it. We can’t see it, we can’t touch it, unless there’s somebody there. But yeah, a really cool example.
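
As a rough illustration of the two ideas in this exchange (recreating a lost sensor signal from a model, and comparing the model from a week ago with the model now), here is a minimal Python sketch. It is a toy stand-in only: a real executable digital twin would be a reduced-order physics model, whereas this uses a straight-line fit, and the sensor names and numbers are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares fit y ~ slope * x + intercept from healthy-sensor history."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def virtual_sensor(model, x_now):
    """Estimate the reading of a lost sensor from one that still works."""
    slope, intercept = model
    return slope * x_now + intercept


# Hypothetical data: wheel speed (rpm) and bearing temperature (deg C).
speed = [1000, 2000, 3000, 4000]
temp_week_ago = [30.0, 40.0, 50.0, 60.0]
temp_now = [33.0, 45.0, 57.0, 69.0]  # latest data recorded before the sensor was lost

model_week_ago = fit_line(speed, temp_week_ago)
model_now = fit_line(speed, temp_now)

# Temperature sensor lost: reconstruct its reading from wheel speed alone.
estimated_temp = virtual_sensor(model_now, 2500)

# Compare the two models to see what has changed in the asset.
slope_change = model_now[0] - model_week_ago[0]
print(f"estimated temperature: {estimated_temp:.1f} deg C")
print(f"slope change since last week: {slope_change:.4f} deg C per rpm")
```

The change in the fitted parameters plays the role of the "what is different about this structure" comparison: nobody can inspect the asset, but the shift in the model hints that something about it has changed.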


Stephen Ferguson:  28:09

So a digital twin mirrors a real physical asset operating under the circumstances it was designed for, and perhaps some extra circumstances it wasn’t designed for. To what extent can we take the learning, the data accumulated by that digital twin, and use it for future generations of that product, or that vehicle, or whatever? Is that a use case? Can we learn from real-life experience and build that into our next generation of designs?


Ian McGann:  28:35

Not only the next generation, but existing products too. So imagine you’re changing the way you sell a tractor. Today, you sell a tractor and you say, I’m going to sell it for its life. Here you go, you pay a price. Or I could charge you based on the damage that you do to the tractor. Now, there’s an expected damage, right? It’s got a damage curve. And if you’re using the tractor like you normally should, that won’t change. And I can now measure that with an executable digital twin and say, well, you were doing light work, you weren’t abusing the system, you were using it as intended, and so you’re paying less than you normally would because it’s actually in good condition. So apply that not just to a tractor, apply it to every car, every scooter that’s out there, those e-scooters that are going around. I want to know, okay, one of them has been damaged more than it should have been, and I need to go and find it. So that is in the context of what we would call customer correlation, CUCO. This is a project that started in Germany and Europe around 1982. We were working on what we call CUCO, customer correlation, which was trying to understand how people use products in different environments, right? How people use cars, in that case, but it’s not just for cars, you can apply it to anything. I’ve seen Scottish people take a french fry machine and dip a pizza in it, and I was like, why? People use products in ways you never expect. They even did a Mars bar: they took a Mars bar, put it in some batter and then threw it in.


Stephen Ferguson:  30:05

It’s part of their cultural heritage now, a deep fried Mars bar.


Ian McGann: 


Okay, you can’t always expect people to use your products in the way that you’ve designed them. And the executable digital twin, because you have that extended sensor range, let’s say, allows you to get that information earlier. And the one thing, like you were talking about with security, is that I don’t have to send the data, so the model is secure. I can keep the data on the vehicle or on the component, and the data is basically updating the model. Then I send the model back, because the model is small enough, and I can rerun that model locally in a secure way and get the extended data that I need. And I can compare models. I can see the difference between the model in India and the model in China, and see, okay, how are these systems being used differently? And the reason you might do that is because when you want to compare something, you try to take the same input loading for both and say, okay, if I use this input loading on both models, what is the difference? How do they perform? Whereas the input loading in China might be different to the input loading in India or Australia or Europe, I’d say.
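
One hedged way to picture the "expected damage versus actual damage" comparison from this part of the conversation is Palmgren-Miner linear damage accumulation. That is a standard fatigue technique swapped in here for illustration, not necessarily what the speakers use. The S-N curve constants and the load spectra below are made-up values.

```python
def cycles_to_failure(stress_mpa, c=1e12, m=3.0):
    """Hypothetical S-N curve N = c * S**(-m); c and m are made-up values."""
    return c * stress_mpa ** (-m)


def miner_damage(load_spectrum):
    """Palmgren-Miner sum of n_i / N_i over {stress_mpa: cycle_count}."""
    return sum(n / cycles_to_failure(s) for s, n in load_spectrum.items())


# Hypothetical load spectra: stress amplitude (MPa) -> counted cycles.
expected_use = {100: 1000, 200: 100}
light_use = {100: 1000, 200: 0}
heavy_use = {100: 500, 200: 500}

baseline = miner_damage(expected_use)
for name, spectrum in [("light", light_use), ("heavy", heavy_use)]:
    ratio = miner_damage(spectrum) / baseline
    print(f"{name} use: {ratio:.2f} x the expected damage")
```

With these invented numbers the heavy-use profile accumulates 2.5 times the expected damage even though it contains fewer cycles overall, which is the kind of signal that could flag one scooter or tractor out of a fleet.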


Stephen Ferguson:  31:15

That’s really interesting. Going back again to massive engineering data analytics, where are we on that journey as engineers? Like I said at the start, all of us have been big data people since before big data was a thing. But to what extent is industry tooled up at the moment to use that data? Are we early on the journey? Or is it already a mature approach?


Ian McGann: 


Definitely not mature. No, we’re still early. If you look at machine learning in our daily use, you’ve got Netflix telling you what you need to watch next because you liked this show. I personally hate that. And you’ve got nicer examples, like improving our energy usage, because we’re optimising the temperature and the lighting, and we’re decreasing the overall wasted energy in buildings. We’re using machine learning in finance, spotting odd behaviour. This is a nice one, where suddenly it’s, hey, we see that you’re spending something that you normally wouldn’t spend...



Stephen Ferguson: 

My wife does that for me. But I guess algorithms can do it as well.


Ian McGann: 


Yeah, the algorithms and things that remind you: are you really doing that? And then you get what they call large language models, LLMs, like ChatGPT, where you say, hey, write a poem about my son, and you give it all the conditions and you get a poem. What they’re finding with tools like ChatGPT is that they go even further than expected. They can now help you generate executable code, so they’re expanding out even further than people originally thought they could. So the use cases are expanding. In our world of engineering, I can’t rely on the same type of machine learning algorithm to help me make design decisions. It’s not industrialised enough, right? It’s okay if it’s Netflix telling me what to watch. It’s okay if the thermostat didn’t turn off on time. But if I’m at, let’s say, Siemens Energy, and I’m relying on this machine learning model to help me run an actionable control loop, where it’s going to shut this generator off, help it cool down properly, then turn it on at the appropriate time and speed it back up again, I can’t rely on the same type of algorithm. I have to have good data. Good data is key.


Stephen Ferguson:  33:28

Yeah. And you have to be able to trust the decisions, don’t you in those circumstances as well? Yeah. Because the consequences of getting them wrong are much higher.


Ian McGann: 


And there are the consequences of how quickly I come to rely on it. I don’t know if that makes sense. It’s like, I very quickly get comfortable with the tool, and I rely on its results. And then suddenly it’s wrong, and I’m still relying on it, and I don’t question it enough. So you have to keep questioning it. So what we’re building is this Bayesian neural network. Don’t ask me more about that, please.


Stephen Ferguson:  33:58

It’s one of my favourite topics. But anyway, I’ll let you explain. Can you quickly explain Bayesian to us?


Ian McGann: 


It gives you the accuracy level of the prediction. So it tells you, okay, here’s a predicted result, and this is the level of accuracy, this is the level of quality that you’re getting out of it, so you know how far you can trust it. And I think that’s what’s nice about that technology: we have to build those types of safeguards into the model that we’re building. Because if I’m going to really rely on this result to make a very costly decision, or one that could potentially harm somebody, I need to make sure that I have some sort of fallback that says, well, here’s the prediction, and this is the quality of that prediction. So when I’m making that decision, I know it within a certain confidence level. Why? Because I can always say, I’m sorry, I can’t tell you right now, I’m going to have to run a new test or a new simulation, and I’ll come back to you in a week. Like you were saying, a week’s time, that’s what it takes. And that should be okay.
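
The "prediction plus quality of the prediction, with a fallback" idea can be sketched in a few lines of Python. This is not a Bayesian neural network: it swaps in the simplest Bayesian model available, a one-parameter Bayesian linear regression with known noise, so that the predictive uncertainty has a closed form. All numbers, names and thresholds are assumptions for illustration.

```python
import math


def posterior(xs, ys, noise_std, prior_std):
    """Posterior mean and variance of w in y = w * x + noise, prior w ~ N(0, prior_std**2)."""
    precision = 1.0 / prior_std**2 + sum(x * x for x in xs) / noise_std**2
    w_mean = (sum(x * y for x, y in zip(xs, ys)) / noise_std**2) / precision
    return w_mean, 1.0 / precision


def predict(x_new, w_mean, w_var, noise_std):
    """Predictive mean and standard deviation at x_new."""
    return w_mean * x_new, math.sqrt(x_new**2 * w_var + noise_std**2)


def decide(x_new, w_mean, w_var, noise_std, max_std):
    """Return (prediction, std), or (None, std) when the prediction is too uncertain."""
    y_mean, y_std = predict(x_new, w_mean, w_var, noise_std)
    if y_std > max_std:
        return None, y_std  # "I can't tell you right now": run a new test or simulation
    return y_mean, y_std


# Hypothetical measurements, noise level and acceptance threshold.
xs, ys = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]
w_mean, w_var = posterior(xs, ys, noise_std=0.2, prior_std=10.0)

inside = decide(2.0, w_mean, w_var, noise_std=0.2, max_std=0.5)
far_out = decide(20.0, w_mean, w_var, noise_std=0.2, max_std=0.5)
print("inside the data range:", inside)
print("far outside the data range:", far_out)
```

Inside the range of the training data the model answers; far outside it, the predictive spread grows past the threshold and the sketch declines to answer, which mirrors the "I’ll come back to you in a week" fallback.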


Stephen Ferguson:  34:51

I think what the Bayesian stuff does is mean that you need much more evidence to convince you of things you wouldn’t normally expect. So if I show you a picture of any baby, you’d accept it was a baby. If I show you a picture of a dog dressed up in baby’s clothes, you’re going to question that and say, well, I’m going to take a lot more convincing before I accept that’s your baby. I mean, that’s a very crude example, but that’s what Bayesian inference does, as far as I understand it anyway. So yeah, it’s really cool that we’re building that kind of technology into machine learning, because that’s a question I’ve had for a while, so I’m glad to hear you use those words as well. Say you’re involved with simulation, you’re involved with test, you do validation; you’re a kind of standard company that is literate in engineering simulation and test and uses it on a day-to-day basis, but you want to get involved in massive engineering data analytics. What are the first steps you think you need to take?


Ian McGann: 


The first step is that you should have a good process in place. In the testing world, that means that I have requirements: there’s a requirements management tool, and it’s requesting a test to come in. And from then on I have a consistent approach to performing that actual measurement. In other words, I have a naming convention. I know this channel, accelerometer or microphone, has a certain name, and for that specific test that’s been requested, it’s the same all the time. It doesn’t change from microphone 1 to microphone 22. It’s microphone front left, whatever; it’s got a name because of its location, and that location is fixed. So that kind of consistency, that kind of procedure, is important. Another thing that’s very valuable is, well, what are you measuring (I’m totally in the measurement world) or what are you simulating? I need to know what that device under investigation, or under test, I’d say, actually is. So I have to have a descriptive view of that. In other words, the bill of materials becomes really important. It is this vehicle with this tire configuration, with this trim, with these leather seats. That is what I’m testing, that is what I’m simulating, and I have that stored somewhere where I can track it. That link is important, because this is the result I’m getting, this is what I measured, and all of that information goes into the machine learning. So the bill of materials becomes super important, and it needs to be something that evolves. A bill of materials isn’t a static thing. It evolves over time, parts change, et cetera, so I need to have that historical view. The enterprise BOM, let’s say, is key for that. So: having the bill of materials, knowing what you measured, having that available. And then having a consistent simulation approach. I have a CAD file, I go through a mesh, I’m using this type of meshing, I’m defining these types of boundary conditions.
This is my velocity boundary condition at the inlet, this is my outlet boundary condition. This is where my microphone placement is going to be for my vibro-acoustic simulation. And I’m going to get a sound pressure between these frequency ranges with this level of accuracy per frequency, let’s say, or per half frequency. And I’m going to be able to use that then to get the overall sound pressure at the driver’s ear in dB(A) or sones or whatever format it is, as long as I’m consistent. That’s what’s important. The KPI tool I use, the formula I use to calculate my key performance indicator: consistency.
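
A minimal Python sketch of the consistency points above: location-based channel names, a link from each measurement to a BOM revision, and one fixed KPI formula. The naming pattern, the record fields and the identifiers are all hypothetical. The KPI shown is the standard energetic sum of band levels, and it applies no A-weighting.

```python
import math
import re
from dataclasses import dataclass, field

# Hypothetical convention: sensor type, then fixed location words, e.g. mic_front_left.
CHANNEL_NAME = re.compile(r"^(mic|acc)(_[a-z]+)+$")


@dataclass
class MeasurementRecord:
    test_id: str
    bom_revision: str  # which exact product configuration was tested
    band_levels_db: dict = field(default_factory=dict)  # channel name -> band levels

    def add_channel(self, name, levels_db):
        """Store band levels, rejecting names that break the convention."""
        if not CHANNEL_NAME.match(name):
            raise ValueError(f"channel name breaks the convention: {name}")
        self.band_levels_db[name] = list(levels_db)


def overall_level_db(levels_db):
    """Energetic sum of band levels: 10 * log10(sum(10 ** (L / 10)))."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


record = MeasurementRecord(test_id="T-001", bom_revision="BOM-rev-7")
record.add_channel("mic_front_left", [60.0, 60.0])
kpi = overall_level_db(record.band_levels_db["mic_front_left"])
print(f"{record.bom_revision} mic_front_left overall level: {kpi:.1f} dB")
```

Because every record carries the same channel names and the same KPI function, results from different tests and simulations can be lined up against each other before any of it is fed to a machine learning model.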


Stephen Ferguson:  38:14

So to finish off, how can Siemens help people on that journey towards massive engineering data analytics?


Ian McGann: 


I think we’re the only company, actually, that has the testing, that has the simulation, and that has the BOM, the bill of materials, that enterprise BOM, which is Teamcenter. So we’ve got Teamcenter, and we have MindSphere for the cloud-based calculation, processing and collection of data in the field, let’s say. And I have factory floor data; I want all of that. I want to know how this part was manufactured, and all the associated data related to that, right? MindSphere helps me gather all that field data, and this data becomes really important. Teamcenter tells me what the data was collected for: this vehicle, or this component, or this motorcycle, right? And then it also helps me manage all the models associated with that. Because it’s not just one digital twin. I’ve got one for the CFD, I’ve got one for acoustics, I’ve got one for this, and I’ve got the test data validating them all, right? All of that feeds into machine learning algorithms to do that kind of prediction. I think Siemens is the only company in the world that has all of those tools across design, manufacturing and in-service. There’s nobody else out there. Now, I’m not saying that you have to come to us for everything. I’m an engineer, so I wouldn’t do that. I don’t just rely on Netflix for my movies.


Stephen Ferguson:  39:38

And quite often you can’t, because there are other tools outside of our ecosystem, outside of our product range, but in the engineering ecosystem, and you can still bring those in. Yeah?


Ian McGann: 


Yes. And we’re open. I think that’s the beautiful thing about it: there’s a mandate. My management tells us, you have to be agnostic. You have to be able to work with other tools, you have to be able to play nice. It’s not a greenfield; customers have other simulation and other testing tools out there, so make sure you work with them. And then there’s capturing that workflow. We have machine learning tools as well to capture that workflow, feed it into the algorithms and keep that updating over time. I’d say we are very well suited if somebody says, hey, I want to get into this, and I’m already using this and this tool from Siemens, how can you help me? I think we can help you a lot.


Stephen Ferguson:  40:25

Excellent. And I think that’s a great time to finish this podcast. So all that remains is for me to thank Ian for being a wonderful guest. And thank you for listening to another edition of the Engineer Innovation podcast. Thank you.


This episode of the Engineer Innovation podcast is powered by Simcenter.  Turn product complexity into a competitive advantage with Simcenter solutions that empower your engineering teams to push the boundaries, solve the toughest problems, and bring innovations to market faster.



Stephen Ferguson - Host

Stephen Ferguson is a fluid-dynamicist with more than 30 years of experience in applying advanced simulation to the most challenging problems that engineering has to offer, for companies such as WS Atkins, BMW, CD-adapco and Siemens. Stephen has tapes full of STAR-CD models from the 1990s in the dark recesses of his filing cabinet: CSET NEWS VSET ANY.

Ian McGann – Guest

Engineer Innovation Podcast

A podcast series for engineers by engineers, Engineer Innovation focuses on how simulation and testing can help you drive innovation into your products and deliver the products of tomorrow, today.

This article first appeared on the Siemens Digital Industries Software blog at https://blogs.sw.siemens.com/podcasts/engineer-innovation/exploring-massive-engineering-data-analytics-with-ian-mcgann-episode-11/