In this episode of the Engineer Innovation podcast AI Expert Justin Hodges and (not so expert) Stephen Ferguson explain how engineers can start using Artificial Intelligence and Machine Learning TODAY, and quickly achieve massive productivity savings and extra insight into their simulations.
In the second half of the podcast (available on YouTube) we go through a “code-along-with Justin” example using a publicly available dataset.
We talk about:
- Using ChatGPT to massively increase engineering productivity
- The key steps required to perform any AI or Machine Learning project
- Some common pitfalls
- A way to rapidly assess which AI approach is most useful for your project
- Learning resources that will allow you to get started in AI today
This episode of the Engineer Innovation podcast is brought to you by Siemens Digital Industries Software — bringing electronics, engineering and manufacturing together to build a better digital future.
If you enjoyed this episode, please leave a 5-star review to help get the word out about the show.
For more unique insights on all kinds of cutting-edge topics, tune into siemens.com/simcenter-podcast.
This podcast is available to watch here (including the Part 2 Demo).
Here are the links to the resources mentioned in this episode:
- A bunny jumping on the back of a dog (created in Midjourney)
- Kaggle Machine Learning Competitions – including the Digit Recognizer competition in which Stephen ranked 563
- Machine Learning Specialization on Coursera
- Justin’s DataSet
- Lazy Predict
- Simcenter Studio
- Justin’s LinkedIn profile
Hello, and welcome to the Engineer Innovation Podcast. With me today is a returning guest, who’s Justin Hodges. How are you doing, Justin?
Great, excellent. I love talking about AI and it’s always fun to chat with you.
The last time you were on the podcast I think was in, I’m going to say it was probably April. I think the episode came out in June as well, and I started that interview, I was trying to be provocative. You were on with Remi Duquette, who’s another expert in AI, and I was trying to be provocative and I was going to say, “What do you think about ChatGPT?” thinking that you were going to be dismissing it as a toy. But since then, I think the whole world has gone crazy with generative AI, with large language models.
And on this podcast, what we wanted to do, I think, is start off, we’ll have a chat about generative AI and what the current state of it is. But then, in the second half of it, you’re going to go through some real-life examples, because I think there’s only so much talking we can do about these things, and what we want to do is you want to encourage engineers to do some proper generative AI, practical AI themselves. And so we’re going to have some kind of code-along examples where people can follow what you are doing and do something productive and practical straight after or during this podcast. Does that sound like a plan, Justin?
We’ll go through an example that you can replicate on a different screen with the podcast on one screen and Python or Simcenter Studio on a separate screen. I made the example such that you don’t need to wait on me or request any files. You can just code along. And like you said, it’ll cover the basics.
We’ve mentioned the words generative AI, and this whole topic moves so quickly, it can be quite exhausting. Can you just explain for people listening who don’t know what generative AI means to you?
To make things, I guess that’s the simplest way, but at its core, it’s learning from vast amounts of data and it’s generating very complex high-fidelity models with minimal human intervention. And when I say make things, you’ve seen examples of ChatGPT, so it’s predicting and generating the next words in conversation, you’ve seen probably what feels like forever ago now with DALL-E, where it could produce images based on texts. So you could say, “Make a picture of a bunny jumping on the back of a dog,” and then it’ll make a picture for you and generate that for you. But it’s really just simple examples there. It could be music editing tools, social media content, apps, things like that.
So, really the purpose is about creating. And one fine point coming from CAE in mechanical engineering, aerospace, et cetera, is a lot of the times we focus on fitting a model to our data. And that simple exercise of saying, “I have this data, please fit a model,” wouldn’t quite qualify as generative AI or generative engineering. However, you can do a lot with those models, and in that case, you can use them in generative AI contexts where you actually do produce things like, “What’s the best design? Please tell me what the geometry should look like or what my flow field should look like.” Stuff like that.
Or helping you to write code as well, which is what I’ve been using it for, is I’m a 50-year-old guy, kind of FORTRAN 77 generation, I like to play around in Python, but I’m a rotten Python programmer. And so what I’ve been playing around with ChatGPT for is helping it to write Python code to solve some quite complicated problems. And in terms of productivity, my productivity as an engineer and a programmer, I’m going to say it’s at least five times, but I’m doing things that I didn’t think I’d be able to do, because instead of looking at Google, trying to figure out how to debug my code, how to work with a new library, what I’m doing is I sit down with ChatGPT, I describe the problem that I’m trying to solve, and usually you have to tell it not to write any code to start with while you’re still kind of describing the problem, and we have a conversation about the problem that I’m trying to solve, and then it helps me write the code.
On like four or five different projects which I’ve managed to solve in the last couple of months, which I’ve been thinking about for years, ChatGPT is writing almost all of the code for me. I literally don’t have to write a single line of code. Quite often it’ll make mistakes, so if you use the Code Interpreter version of ChatGPT, it’s quite funny, because you can see it making mistakes constantly, running the code itself in the background and fixing the mistakes. So it’ll make mistakes and we’ll work through those mistakes together. I’ll run the code, it won’t work. I’ll tell it what the errors are and it’ll debug the code. But I’ve got to some really, really interesting places with it and I’m just completely inspired by it. Yeah, it’s really, really changed the way I think about engineering.
Yeah, me too. There’s two quotes I’ll give, one silly, one less silly. One’s from Elon Musk years before ChatGPT, and he’s just talking about technology and cell phones, and he basically said we’re turning into cyborgs, right? Because with this little thing in your possession, you’re vastly more capable. And I think that this is a natural extension to continue that. And then the other one, I don’t remember what learning resource it was, but one of the online classes on generative AI, in kind of the opening preamble, they gave a really nice quote that I wrote down. It said, for applications of generative AI, “Lifting the dirty, dull, dangerous and difficult tasks from humanity’s shoulders so we can simply focus on the very essence of our work. The vision, the idea, and the purpose.” And I think that’s spot on. I think that that’s really what you just kind of said, is there’s a lot of stuff there that could take up the time and give you more time to focus on the vision, the idea and the purpose.
Let’s be honest as well, as simulation engineers, lots of what you have to do is dull and dirty, isn’t it? Building meshes, tuning under-relaxation parameters. There are still dirty bits that you don’t want to do, and really what you’re interested in is getting some data out and using that data to inform decisions as well. So anything you can do to kind of automate that and take the heavy lifting out of it I think is really useful, and I think that’s the point about AI. Lots of people are threatened by AI and you hear lots of conversations, we’ve had conversations, is this going to be the end of engineers? But the truth of it is there’s not enough engineers in the world. There’s not enough of us. There’s not enough engineering simulation in the world, and this is just helping us all to be many times more productive.
My other observation as well is that I’ve been using it for Python, but when I first started doing CFD 30 years ago, everything was on a command line. And so we had to learn the language of the CFD code, C set, new set, or V set, and I got really good at that, and for the first 10 years of my career, I was that guy typing in command lines, no point-and-click user interfaces. But people didn’t want to do that. That went away and people started using point-and-click user interfaces because that was easier. But actually, the way we interact with each other is through language, isn’t it? Instead of learning the language of the computer or learning how to use an interface, what I want to do is tell the computer, in English or whatever language you speak, to do the things that I want it to do.
And for me, that really is a big part of the future of engineering simulation, is having these conversations with an AI assistant that helps you implement things. I’ve found quite often that when I’ve got stuck with ChatGPT and I’ve asked to do things and it can’t do it, the problem is me usually. It’s because I’ve asked the wrong questions, and I’ve gone away and I’ve researched a topic and asked the questions in the proper way, and then ChatGPT comes back with some really good stuff as well. So I think there’s definitely a role for us, and I think it can make everybody more intelligent, I think that’s what artificial intelligence is doing, and more productive.
Yeah. Gosh, there’s so many thoughts. One thing that was somewhat interesting: maybe 10 years ago I was at an energy [inaudible] machinery conference, and the keynote speaker was a guy who owned, or was essentially in charge of, a large energy power plant grid management sort of company. And in his keynote, the first thing he said was, “What’s the most transformative thing in our industry that we should talk about for the next few years?” He said, “The iPhone.” Unrelated, but it completely changed the paradigm on how people expect to do their work, get their information, do their analysis, communicate, and the speed at which they do so. And I think that’s a bigger gap when you hear iPhone and energy management versus generative AI and the way that we use our software. But I think it’s the same sort of paradigm shift, right?
The reality is no one would buy software if it was limited to using command line now. And I think that will continually change more and more, and you can’t help but appreciate some of the cross-pollination from the mainstream generative AI that you see where now you can make apps in ridiculously fast amounts of time, exchange information, exchange models from different places and use them freely and in an open-source way. So, that’s one thing I would definitely say is a staple here in what’s at play, is this paradigm shift. The other point you touched on is cool because as far as we’re going to have to learn and change things and adapt to build up, people always say, “How can we trust AI? How can we trust generative AI?” And I think there’s probably three or four main things to talk about there, but one of them is controllability, or sorry, education.
I think we have the responsibility to promote responsible and ethical use, and that really relies on us. That means we have to understand limitations, best practices and stuff like that, and I think it’s cool now because you can ask anyone really who’s used ChatGPT to look at a few different prompts and they can probably reason out which one’s the older model, which one’s the new model. And so you already see people getting a really nuanced understanding of what the models do and don’t do. So yeah, it’s a freight train, but it’s going in a good direction.
And I agree completely, but I think in order to do that, one of the first things you do is you have to stick your toe into the water and start using it. I think as engineers, we’re quite good at getting new technology and pushing all the buttons and trying to break it as well, so it’s about having some practical experience as well. And so on the last podcast, I asked you what are some things people could do in order to start using AI and ML? And you suggested Kaggle as this library of competitions as well, and so last week sometime, I found myself browsing through that, found the competition on digital image recognition, so recognizing numbers that are written down, and there’s a competition on there to do that, you can choose whatever technology you choose. But I go on ChatGPT, work out how to import the data, ask ChatGPT how I would write a neural network, write a neural network, start playing with the neural network. And the important bit is ChatGPT is not doing that playing. It’s me trying to tune the hyperparameters to get good results, getting up to 96% recognition, which is I think top 1,300 in the competition, and then going back to ChatGPT and saying, “How can I get better?” And it says, “You should try a convolutional neural network.” So, downloading one of those, implementing that, playing around a bit, and then suddenly I’m at 98.5% recognition, top 600 in the competition.
I’ve done all this not understanding neural networks very much, not being a great Python programmer, but managed to do this in an evening, actually doing it in bed, which is a bad idea, because I can never ever sleep when I do these things. But I think things like neural networks, and we’re going to talk a bit in a minute about different types of machine learning models, they sound really intimidating because you think ChatGPT’s got billions of parameters and can I do anything on my computer? But absolutely, you can write a simple neural network, you could be up and running inside an hour, you might not understand all of it and you might not be an expert in it, but you can get running and get experimenting, and I think that’s incredibly valuable. It really is. So, I think that’s going to be the segue into the next bit. Let’s start talking about some of the ways that people can start doing machine learning.
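To give a feel for how small "a simple neural network" can be, here is a minimal sketch in plain Python. It is not the code from the episode: the toy data, the function name `train_tiny_network`, the layer size, the learning rate and the epoch count are all assumptions chosen for illustration. It trains one hidden layer with tanh activations and a sigmoid output on a two-class problem using stochastic gradient descent.

```python
import math
import random

def train_tiny_network(X, y, hidden=8, epochs=500, lr=0.3, seed=0):
    """Train a one-hidden-layer binary classifier with plain SGD (illustrative only)."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        # Hidden layer (tanh), then a single sigmoid output probability.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(W1, b1)]
        z = sum(w * hj for w, hj in zip(W2, h)) + b2
        return h, 1.0 / (1.0 + math.exp(-z))

    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, target in zip(X, y):
            h, p = forward(x)
            # Binary cross-entropy; the small constant guards against log(0).
            total -= target * math.log(p + 1e-12) + (1 - target) * math.log(1 - p + 1e-12)
            dz = p - target  # gradient of the loss w.r.t. the output pre-activation
            for j in range(hidden):
                dh = dz * W2[j] * (1.0 - h[j] ** 2)  # back-propagate through tanh
                W2[j] -= lr * dz * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * dh * x[i]
                b1[j] -= lr * dh
            b2 -= lr * dz
        losses.append(total / len(X))

    def predict(x):
        return forward(x)[1]

    return predict, losses

# Made-up toy data: class 0 clusters near (0, 0), class 1 near (1, 1).
X = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (0.9, 0.8), (0.8, 1.0), (1.0, 0.7)]
y = [0, 0, 0, 1, 1, 1]
predict, losses = train_tiny_network(X, y)
```

For real work you would normally reach for a library such as Keras, PyTorch or scikit-learn rather than hand-written loops; the point here is only that the moving parts fit on one screen, and that `hidden`, `lr` and `epochs` are the kind of hyperparameters you end up tuning by hand.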
Well, first of all, that’s amazing, your testimony there. I think you’re good at podcasts, but you might’ve missed your calling, because that’s a pretty impressive thing to do. I think I’d lose motivation and end up watching a documentary and falling asleep at the time that you got to your second iteration of your machine learning model. That’s awesome. Okay, so the question is how people can start using. Would you like to talk about that in the context of this world of computer-aided engineering in our industry and simulation?
What I want is everybody who’s listening to this podcast now or watching this podcast to go away and just start doing some of this stuff as well. So, basically, what I want to do is break down some of those obstacles to entry and demonstrate to everybody that this is technology which is useful now. And with the generative AI, you can get it to lead you through it as well, so you don’t necessarily have to read loads of textbooks or search loads of websites. You can be up and running within an hour or two I think.
Yeah. So, my go-to recipe for that, for just how do I lower the barrier to entry and get involved, I’m a huge fan of Kaggle and a huge fan of Google Colab, and basically you can use either for free compute. So as long as you have a computer that can access a web browser, then you can make a free account, get some limited amount of hours to run projects on either platform, and they even offer GPUs, so you’re not limited to a very, very slow small dataset problem. Like your experience with CNNs, you can use a GPU and start getting involved. And I love Kaggle in particular, not only so that I can constantly be humbled at the people who keep winning competitions and how I can’t win them ever, but because also it’s the best place to learn hands-on. People publish their notebooks and publish basically their scores and how they got the code they used to get their scores whenever they try to do a project.
The data is all available for open use, so you can completely log in for free, get access to a GPU, no setting up your environment, no setting up libraries and dependencies and anything. You get the data, you have discussion boards for popular competitions where people explain all the critical steps, and it’s a vast space. So if you want to specialize in time series forecasting, there’s probably a dozen plus competitions there that have really fruitful content. That’s the best way to get hands-on. And then my full spectrum of recommendations, if you really want to learn and invest time to build up what you can do in machine learning as a non-expert, is to precede that hands-on exercise with the Machine Learning Specialization on Coursera. That’s probably my favorite one and probably one of the most popular ones by members or students or whatever. That gives you just kind of a sample of everything I would say.
And then the reality is after you do the hands-on bit, pick a lane, pick an area. It’s pretty overwhelming, the advancements and how rapidly they happen in machine learning, so I think if you don’t try to specialize in one thing or spend a season of your life doing specific types of problems or specific types of models, you may remain kind of at the circus. Which is fine if that’s what you want, but people tend to probably gravitate towards certain subjects and then practice a bunch of competitions and tutorials and readings on that little refined space. So yeah, I mean at a high level, that’s kind of what I would recommend to get started, and it’s very achievable.
Obviously the most of who we talk to, and our group at Simcenter is simulation and test products and experts, people with mechanical engineering backgrounds and aerospace backgrounds, and while the majority of our coursework for us does not involve any formal programming, Python’s a great language to learn, and also we’ve dealt with very complicated things like partial differential equations that have no solutions and numerical approaches that are the best way to approximate those. I mean, writing those codes is very, very complex and there’s plenty of empowerment that we should feel having spent time in that area to come into this and approach machine learning as stuff that’s similar to regression, which we’ve heard before and we’ve learned before, and that’ll be a segue when we show the hands-on bit. I’ll show kind of the union of our maybe traditional regression experience to machine learning and how there’s probably more overlap than we may realize.
And I think as engineers as well, we’re used to using models, which can be hard to use, don’t always work first time, sometimes you have to kick them a few times before they produce anything useful. And so all those skills are useful for machine learning, as is the ability to work with large datasets as well, because we’ve been in the big data game for as long as CFD’s been around for 30 years, and so we’re used to dealing with data, cleaning data, and so those are all really helpful skills. Because there’s a lot of data cleaning involved in machine learning as well, is what I found.
Yeah, and the analysis tools we’re used to to look at our results from big studies or simulations, it’s not that different. There’s transferable knowledge there into machine learning on how we can analyze data. And a lot of that is done in the steps in the beginning of the process where you’re analyzing your data to make it suitable for machine learning models, and we’ll touch on that a bit in the hands-on bit that I’m super excited about for the second half of the podcast, but you’re right, people should feel empowered.
Should we talk about that now then? Should we talk about the steps? Are you going to take us through the steps you need to go through in a typical AI problem?
Yeah, sure. The steps here are kind of like a flow diagram. There’s a lot of different pieces to a certain block, depending on your problem. You may have a classification problem or a regression problem, and the reality is obviously the prep work for either is different, but if you try to abstract to a very high level, I basically have it to six steps or five steps. And the first one is really problem definition, and this is so important because especially in the last three years, when other industries started taking on machine learning and there was a learning curve, a huge amount of time in meetings and in SOWs and clarifications was on what data we have and what problem we want to solve. And that’s really just about deciding on the business value and the business problem you’re trying to answer.
And in the case of, say, CFD simulation, you could be saying, “I want to make predictions for this set of geometries, and I’m okay with the model having these limitations.” Because you can’t take it and use it everywhere, so you kind of have to have an iterative discussion about what data you’ll have and what insights that would offer you later once you have the model. So, the problem definition is the compromise on both of those and then realizing does that answer a business problem? Does that give me a return on investment to set up the model? And that problem definition step is key in the first step.
Now, the second being data collection, so this one’s not an extensive one to talk about until you get into the details of, well, sometimes it could be physical measurement data, sometimes it could be simulation data, which could be images, CSV files, pictures, could be all kinds of things. But in essence, you just agreed upon the mandate and charter and vector of the problem and how you’re going to solve it. You collect the respective data and there’s some analysis tools in there to just make sure that you’ve done what you hoped for and collected the right data.
Now, I’ll call it step three A, B, and C, because it’s very related in terms of you may have to cycle through these 3A, 3B, 3C, but they’re separate steps. The first one would be data preparation. The analogy would be if I’m running, let’s go back to a computational fluid dynamics problem, we’re running a bunch of simulations and the goal is really to ensure safety and a good performance of the component that you’re designing. So, obviously you’re going to notice outliers in terms of the performance, in terms of the geometry, in terms of the operating conditions that this part could experience, and you really want to run a study to cover all of that. So, in the same way that you would want to have edge detection or outlier detection, or you’d make this decision on, well, do I focus on the average behavior or do I focus on the extreme behaviors? Or maybe it’s an aircraft and you want to focus on the extreme behaviors because that’s where you may have failure of some kind.
So, in the same way that you sort of judiciously decide how you’re going to look at your data and samples, and maybe excluding some or including some, is also a very relevant step in machine learning pipelines for data preparation, because the reality is you’re going to fit a model and do you want it to capture the mean trends or where all the points are densely populated, or should you include the outliers or not? Those will have definite impacts on your results. And there’s all kinds of things that fold into data preparation. Without being too long-winded, maybe your data’s highly skewed, maybe the data you collected isn’t ideal and you’d rather transform it or do those sorts of things that just kind of make the models better to digest the data, make better predictions.
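As a rough illustration of two of the preparation choices mentioned here, flagging outliers and taming skewed data, the sketch below uses only the Python standard library (`statistics.quantiles` needs Python 3.8 or later). The Tukey-fence rule, the `log(1 + x)` transform, the function names and the pressure-drop numbers are all assumptions picked for the example; whether you drop, keep or separately model the flagged points is the engineering judgment being described.

```python
import math
import statistics

def iqr_outlier_mask(values, k=1.5):
    """Return True for each value lying outside the Tukey fences Q1 - k*IQR, Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]

def log_transform(values):
    """Compress a right-skewed, non-negative feature with log(1 + x)."""
    return [math.log1p(v) for v in values]

# Hypothetical pressure-drop results from a batch of simulations (made-up numbers).
pressure_drop = [10.2, 11.0, 9.8, 10.5, 10.9, 11.3, 9.9, 10.1, 48.0]

mask = iqr_outlier_mask(pressure_drop)
kept = [v for v, is_outlier in zip(pressure_drop, mask) if not is_outlier]
flagged = [v for v, is_outlier in zip(pressure_drop, mask) if is_outlier]
```

Here `flagged` holds the single extreme run. For an aircraft-style problem where the extremes are the interesting part, you would keep that point and perhaps sample more around it rather than discard it.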
So, you prep the data, as I said, it’s kind of like a coupled step like 3A, 3B, 3C. You prep the data, you do some model training, and you iteratively look at different models, different model hyperparameters or settings, and after that, you evaluate. People break their datasets down into chunks and they use some for training, some for testing, and some for, at the end when it’s all said and done, how does it perform? And in this sense, it may be iterative that you realize, “Oh, it didn’t perform that well. So, why? Oh, well, I didn’t apply this sort of manipulation to my data, or I had missing values, or my data’s causing some sort of other issue. Let me tweak it.” And then you, again, fit the models and then evaluate. So, you kind of cyclically do that until you’re happy, make it as interpretable as possible.
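The "break the dataset into chunks" idea can be sketched in a few lines of standard-library Python. The split fractions, the function name and the three labels (training, validation for comparing models and settings, and a held-out test set for the final check) are assumptions for illustration, not a prescription from the episode.

```python
import random

def train_val_test_split(samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then carve off a held-out test set and a validation set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n = len(shuffled)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# 100 hypothetical simulation run IDs.
train, val, test = train_val_test_split(range(100))
```

The iterative loop described above then fits on `train`, compares models and hyperparameters on `val`, and only looks at `test` once, at the end, so the final score is not flattered by all the tweaking.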
And then the last step is you deploy it, you use it, you take advantage of it, you’ve made this thing, now go realize the value of it, whether you’re using it as a reduced order model and you put it in a digital twin or an executable digital twin, or whether you’re going to deploy it and have the model managed over time in your internal system for when others call on it for predictions. So in that sense, to recap, define the problem, collect the data, iteratively go through tweaking the data, training and looking at the results, and then once you cycle through and you’re happy with if you’ve satisfied your initial business objective problem definition, then you send it off and deploy it and manage it.
So, in terms of effort, engineering effort, where do you have to concentrate your effort? What’s the hours split or the proportional split between those different stages? Is that easy to say or not? Or does it depend on the problem?
It’s a dirty question because there’s no one size fits all, but life is full of dirty questions, so it’s a perfectly good one to ask. Most people would say that they spend not, I don’t want to say the least amount of time, but probably not the most amount of time on picking different models to use for training and tweaking those models to be ideal in terms of optimizing the settings or the hyperparameters. Usually the other steps are more time-consuming. So, in a case where you have no data and you have to generate simulation results, that could be time-consuming, or maybe lesser so if you have a bunch of data existing and you just have to spend time to ask different team members what the sampling was for those different sets of simulations they ran, collect them into one place, analyze them, stuff like that. But usually the beginning steps take more time than just simply training different models and looking at them.
So, I guess that the time that you invest in one, two and three, or 3A, will pay you back double in 3B, 3C and four. I think that’s the point, isn’t it? Is that you need to put the upfront work in making sure you’re working with the proper data rather than have to troubleshoot at the backend. Like when you’re doing a CFD problem, if you don’t build a good mesh, well, you can try and fix that by changing under-relaxation parameters and try and deal with the errors downstream, but you’re better off investing the time in building a good mesh. I think it’s probably true that you need to spend a lot of time on one, two, and three, so you can pay off at the end of the process as well.
Yeah, that’s a theorem that we all learned in school at some point, garbage in, garbage out. So, it’s no different to a CFD model, to collecting physical test measurement data, to machine learning model fitting. But there is a nice thing here, is that for a few years it was all about big data, bigger models, bigger datasets that the models can digest. So just get a ton of data, fit it to a huge model, it’ll cost tens or hundreds of thousands of dollars to do this as far as the electricity bill. But definitely lately, however you want to call it, but maybe a year or two ago from the present time, there’s been a huge shift to smaller models, smaller data. I think there’s a big quote, “The era of big data is dead,” where it’s all focusing on lesser data.
So for us, that’s nice, because that means as pioneers in the field make more and more advanced models that can handle smaller amounts of data, maybe you need less data as far as the simulations you have to go off and run. So, it may not be as intimidating as it may seem, and it certainly depends on the problem. Some problems don’t need very much data. Some problems, the data’s cheap, sometimes it’s not. Sometimes it’s a pain, it’s expensive, it’s big, clunky, but it’s not always. But what about in your case? You’re a Kaggle grand master now. So, give us some postmortem on that experience. I mean, where did you spend the most amount of your time?
With the Kaggle digit recognition thing, the dataset is a well-established dataset, so I just had to download it and use it and train on it. The first problem I started on, which is kind of an engineering problem, but where I live, they’ve just changed the flight paths for aircraft and so we have lots of airplanes flying over our house making lots of noise, and a friend of mine’s got an aerial up and is recording, tracking all the airplanes, and we had 18 gigs of data for about a year of aircraft flying over it. And I thought, “You know what? It would be fun to get that data for any given day and plot it on a graph so I can see which aircraft are going where.” And so I did that, and you look at the sky above your house, and I live in a rural area, but there’s hundreds and hundreds and hundreds of tracks going all over the place.
And so I decided how am I going to decide which of these airplanes are flying to Stansted or are flying to Heathrow, are flying to Luton? Different airplanes at different heights. And if I tried to do that myself, I’d have to look at the altitudes, look at where they were going. So what I decided to do was, firstly, I had to split up 18 gigs of data into usable chunks so I could read it into my model, and so basically I started off by working on one day’s worth of data rather than trying to do it for a whole year’s worth as well. Then I split it up, I read it in, I found a way of classifying some of the flights.
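One way to work through a file that is too big to load at once is to stream it and hand back one day at a time. The sketch below is an assumption about how that could look, not the code used in the episode: the column names (`timestamp`, `hex_id`, `lat`, `lon`, `alt_ft`), the ISO timestamp format and the sample rows are all hypothetical, and it relies on the file already being in time order.

```python
import csv
import io
from itertools import groupby

def iter_days(lines):
    """Yield (date, rows) one day at a time from a time-ordered CSV stream."""
    reader = csv.DictReader(lines)
    # groupby only merges adjacent rows, so the input must be sorted by time.
    for day, rows in groupby(reader, key=lambda row: row["timestamp"][:10]):
        yield day, list(rows)

# Tiny in-memory stand-in for the real multi-gigabyte file (made-up rows).
sample = io.StringIO(
    "timestamp,hex_id,lat,lon,alt_ft\n"
    "2023-05-01T08:00:00,4ca123,51.90,0.10,6000\n"
    "2023-05-01T08:00:05,4ca123,51.91,0.12,5900\n"
    "2023-05-02T09:30:00,406abc,51.80,0.05,12000\n"
)

day_counts = {day: len(rows) for day, rows in iter_days(sample)}
```

With a real file you would pass an open file handle (`open(path, newline="")`) instead of the `StringIO` object; only one day of rows is held in memory at a time, which is what makes it practical to develop on a single day before running the whole year.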
So if you look at, there’s some databases of aircraft where you can go in and I wrote an interface to an API, so I could track, I look at the hex ID of an aircraft and work out where some of them were going to, and then I just drew a box in the sky, and for the ones that I knew, I trained a neural network, a simple neural network, which I think was two layers, about 16 neurons, not a lot, it was really simple, to look at the coordinates of where airplanes entered that box and the altitude of where they entered the box and where they left it. And from that, I could with 99% accuracy within a few hours predict any aircraft flying over my house.
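The "box in the sky" features can be sketched as follows. This is a simplified guess at the idea rather than the actual implementation: the box limits, the track format of `(lat, lon, altitude)` tuples and the function names are hypothetical, and the entry and exit points are approximated by the first and last recorded samples inside the box, which assumes each aircraft passes through only once.

```python
def inside(point, box):
    """True if a (lat, lon, alt) sample lies within the box's lat/lon limits."""
    lat, lon, _alt = point
    return (box["lat_min"] <= lat <= box["lat_max"]
            and box["lon_min"] <= lon <= box["lon_max"])

def box_features(track, box):
    """Return [entry_lat, entry_lon, entry_alt, exit_lat, exit_lon, exit_alt],
    or None if the track never enters the box."""
    hits = [p for p in track if inside(p, box)]
    if not hits:
        return None
    return [*hits[0], *hits[-1]]

# Hypothetical box and one made-up track of (lat, lon, altitude in feet) samples.
box = {"lat_min": 51.8, "lat_max": 52.0, "lon_min": 0.0, "lon_max": 0.3}
track = [
    (51.70, -0.10, 8000),
    (51.85, 0.05, 7000),
    (51.90, 0.15, 6000),
    (51.95, 0.25, 5000),
    (52.10, 0.40, 4000),
]

features = box_features(track, box)
```

Each flight is reduced to six numbers, and the flights whose destination is already known supply the labels. That fixed-length vector is the kind of input a small classifier can be trained on.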
And so this spaghetti map of all different aircraft suddenly turned into a color-coded map where I could see which aircraft were going in which direction, which for me, and again, I did this at 11 o’clock at night and didn’t sleep because I was so excited all night that I’ve managed to solve this problem which would’ve taken me months, if not years, to try and do it manually. And I did it in a few hours by just playing around with neural networks. And so up until then, neural networks were things that other people had used, but using ChatGPT to say, “I’ve got this problem, a classification problem, classic machine learning problem, how would I set up a simple neural network?” And it just did, it wrote the simple code for me, and I fine-tuned it in a few hours and it just really works, and then I applied that set to a year’s worth of data and have published the maps everywhere, and it’s been really useful for people as well.
So, that was my biggest challenge, but it’s always… So, the data preparation step for me, I’m more and more aware that most data is dirty in some way, and so cleaning up that data, splitting it up, getting it into a form that the neural network could find useful was the most difficult bit for me, but also the most rewarding in terms of return on investment, because once you’ve done it, you can do lots of clever things.
That’s cool. It’s clear that we have successfully converted you to a machine learning soul.
Man, I’ve gone down the rabbit hole.
So, it’s always a pleasure to see your passion, but what’s interesting is-
I’ve gone down the wormhole, it’s an obsession. I blame you completely, you and Remi, for this, because the first two times I talked to you guys, I was talking to you as experts, and there was never any prospect that I was going to start doing this myself. And now I spend most of my leisure time on it. I need to kind of dial it back a bit because it’s getting a bit too much, just because there are all these problems that I wanted to solve and I’ve had ideas in my head and I couldn’t solve them, but all of a sudden, overnight I can solve them all because I’ve got the programming skills through ChatGPT, and I can tack machine learning onto them as well. I’m evangelical in my love of machine learning and AI, so I thank you for that, Justin. It’s a good thing and a bad thing.
Good. Now, it’s always a pleasure to talk to a fellow nerd, and that’s a cool testimony, but I’ll just point out one thing that’s interesting, is you’re right, you didn’t spend that much time researching the best model or doing more complicated manipulations or combinations of models to come up with the best answer, and you probably, if you took it to the next level, you’d spend more time on steps in the process that aren’t related to model fitting. You’d probably say things like, “Okay, let me zoom out and look at the problem definition,” and then certain things become very important.
Like, okay, what percentage of airplanes that go through there are American Airlines passenger flights, real big ones? Probably the majority. Okay, well, then how does your model fare when you have a tiny corporate jet or a private plane? It could be a completely different sort of behavior that you miss because maybe only 1% of flights do that. So then, if you’re running a classification problem, you have to use techniques to deal with what’s called class imbalance. You have 99% of examples that are these commercial passenger planes that anyone can book, and then 1% of them are business jets for like six people. And so you have to zoom out and think at a high level about the business problem that you’re trying to solve with machine learning, and then things like that become very important.
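To make the class imbalance point concrete, here is a small Python sketch using the 99% and 1% split from the conversation. The label names and the helper `balanced_class_weights` are made up for the example, and inverse-frequency weighting is just one common heuristic for handling imbalance, not something the speakers say they used.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so rare classes count more."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# 99 airliners and 1 business jet, mirroring the split described above.
labels = ["airliner"] * 99 + ["business_jet"] * 1

# A lazy model that always answers "airliner" looks accurate but never finds a jet.
always_airliner = ["airliner"] * len(labels)
accuracy = sum(p == t for p, t in zip(always_airliner, labels)) / len(labels)
jets_found = sum(
    p == t == "business_jet" for p, t in zip(always_airliner, labels)
)
jet_recall = jets_found / labels.count("business_jet")

weights = balanced_class_weights(labels)
print("accuracy:", accuracy, "jet recall:", jet_recall)
print("class weights:", weights)
```

The weights can then be fed into a training loss so that a mistake on the rare class costs as much, in total, as mistakes on the common class. Per-class recall is usually a more honest metric than overall accuracy in this situation.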
One reason I love having the blessing of a job that is focused on machine learning in CAE is that instead of thinking in my day-to-day work about passenger planes versus business jets, I’m thinking about things I learned in grad school, from physics and from engineering and thermodynamics and heat transfer and stuff. And so it’s cool to be able to look at machine learning from a physics perspective and then try to ascertain if things are being conserved, if these core concepts and things we know from engineering are being preserved in the machine learning. And that’s not always possible, but I mean, it’s just a union of two fun things, right? So, thanks for sharing your story. It’s also illustrative of the steps in the pipeline that people take when doing a machine learning exercise, I think.
This episode of the Engineer Innovation Podcast is powered by Simcenter. Turn product complexity into a competitive advantage with Simcenter solutions that empower your engineering teams to push the boundaries, solve the toughest problems, and bring innovations to market faster.
Stephen Ferguson – Host
Stephen Ferguson is a fluid-dynamicist with more than 30 years of experience in applying advanced simulation to the most challenging problems that engineering has to offer for companies such as WS Atkins, BMW, CD-adapco and Siemens. Stephen’s experience of AI and ML is limited to late-night experiments trying to convince ChatGPT to do something genuinely useful.
Justin Hodges – Guest
Senior AI/ML Technical Specialist, Product Management at Siemens Digital Industries Software. He has a bachelor’s, master’s, and Ph.D. in Mechanical Engineering specializing in Thermofluids and a passion for AI and ML.
Take a listen to Justin’s previous appearance on the Engineer Innovation Podcast: “Adapting to a New Era of AI”
Engineer Innovation Podcast
A podcast series for engineers by engineers, Engineer Innovation focuses on how simulation and testing can help you drive innovation into your products and deliver the products of tomorrow, today.