Thought Leadership

Understanding information with AI Podcast – Transcript

Interpreting different types of information is a task humans are inherently good at with just a little guidance; from there, conclusions can be drawn and connections made. Artificial intelligence, by contrast, requires far more training, but it can analyze information and build connections in wholly different ways than humans can, offering a novel perspective on key data.

Check out the full episode here or keep reading for a transcript of that conversation.

Spencer Acain: Hello and welcome to the AI Spectrum podcast. I’m your host, Spencer Acain. In this series, we explore a wide range of AI topics from all across Siemens and how they are applied to different technologies. Today, I am joined by Dr. James Loach, head of research at Senseye Predictive Maintenance. During the last episode, you gave a really great explanation of what foundation models are, and more specifically what the time series foundation model that you’ve developed at Senseye is and what it does. But I’d like to know now: is this something you could have done without a foundation model? Is this a leap forward for you, or is it just the natural next step, taking an approach like that of an LLM and applying it to something like time series data?

James Loach: Yeah, I think it’s quite analogous to what’s happened with language, right? Pre-transformer language models were able to do all kinds of language-related tasks with some ability and some usefulness. But what you saw with the transformer architecture, and the scaling laws that held following it, is this really massive jump in capability. These models are just much, much better and more useful than the things that went before. But the things they can do are, in some ways, not different in kind; you can see earlier systems prefiguring what they can do. And with time series models, everything I’ve just described is something Senseye has done for years: we’ve done anomaly detection and trend detection, we’ve done time series matching, and we did forecasting with quite a nice statistical model. But what these models give you is a big step up in performance, potential in some areas and actual in the case of forecasting. Building a statistical forecasting model that can deal with any kind of random, crazy machine data and produce reasonable forecasts is really quite difficult, and we spent a lot of time on it. We were quite pleased we built it, and it would have been something like state of the art. But still, if you look at it afresh, you can see the limitations of it all over the place. It’s a model with some kind of basic view of the world, that time series are composed of spikes and trends and seasonalities, and it’s trying to assemble whatever combination of these best fits the data. And it’s often struggling to do that. It was reasonable quality and very useful for users.
But you look at it and, with pen on paper, any person can really do a forecast as good as that system, or better. You can see the limitations. But when you go to a good time series foundation model, you get situations like we have in Senseye, where every forecast is really good. It’s a massive difference, and it’s very weird as well, because when you launch this and give it to customers, it somehow doesn’t have a very big effect in terms of feedback and things like that. It’s just doing forecasts; every forecast looks good and reasonable, and it just works, and it basically fades into the background, you know? It does exactly what you’d expect it to do, it does it really well, and then it kind of disappears. Which somehow is what you want. But with the previous system, you could always look at those forecasts and kind of see in your head how each one could be a little bit better, what was not quite right about it. That feeling is gone, right?
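The classical approach James describes, viewing a series as trends plus seasonalities and assembling the combination that best fits the data, can be sketched in a few lines. This is an illustrative toy, not Senseye's actual model; the components, basis, and numbers below are all invented for the example:

```python
# Toy sketch of a classical decomposition forecaster: assume the series
# is trend + seasonality + noise, fit those components by least squares,
# then extrapolate them as the forecast. (Hypothetical, for illustration.)
import numpy as np

def decompose_and_forecast(y, period, horizon):
    """Fit intercept + linear trend + one seasonal harmonic, then
    extrapolate `horizon` steps ahead."""
    t = np.arange(len(y))
    X = np.column_stack([
        np.ones_like(t, dtype=float),        # intercept
        t,                                   # linear trend
        np.sin(2 * np.pi * t / period),      # seasonal (sin)
        np.cos(2 * np.pi * t / period),      # seasonal (cos)
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    tf = np.arange(len(y), len(y) + horizon)
    Xf = np.column_stack([
        np.ones_like(tf, dtype=float),
        tf,
        np.sin(2 * np.pi * tf / period),
        np.cos(2 * np.pi * tf / period),
    ])
    return Xf @ coef

# Synthetic machine-like signal: upward trend plus a daily cycle.
rng = np.random.default_rng(0)
t = np.arange(200)
y = 0.05 * t + 2.0 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.3, 200)
forecast = decompose_and_forecast(y, period=24, horizon=48)
```

The limitation James points at is visible here: the model can only express futures built from the components it was told about, so anything outside "trend plus seasonality" is fit poorly.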

Spencer Acain: Yeah, it’s just not very impressive until you take a step back and look at how good it is. And you’re like, wait, this is incredible. This is exactly what I always wanted out of this. And that’s that intelligence, I guess, that you could really only get from a transformer or large foundation model.

James Loach: Yeah. And with language models, I think we’re only just beginning to calm down, in a way, and get used to them existing. The existence of those things, compared to the world before, is just such a massive, massive difference. And I don’t know that people our age will ever fully get used to the fact that these things exist, in the way that kids coming up will; they’ll just accept it. It’s just a thing, right? And it’s really crazy. One example I think is very striking is the Turing test. I remember this as a teenager, and it was this big thing. And language models just blow past it, right? No one has any interest in the Turing test anymore. It seemed like a thing, and it turned out that actually, with enough processing power and this particular architecture, perfect grammar and perfect conversation just happens, boom, right? And…

Spencer Acain: All right. Well, it sounds like there were a lot of benefits that you got out of taking this route, pursuing this time series foundation model. But what were some of the challenges? I mean, it can’t have been easy to go from that base model you mentioned to the final result, where it’s so seamless it blends right into the background. So what were some of the hurdles you faced along the way?

James Loach: Yeah, so the basic analogy is very strong, so these models should work. But you can’t build them in an absolutely straightforward way, because essentially there’s no equivalent to words, or bits of words, for continuous time series values. The vocabulary for a language model can be finite, but for time series it’s essentially infinite. So when you’re building these models, I think this is the main question: what is your element? What’s your equivalent to the token? And there are different approaches to this. People use individual data point values, discretizing them, or people use little patches of time series, things like this. And then there are other model modifications that you have to make for this to work for time series. When we first became aware of this, there were early attempts to do it that claimed to work well and do well on metrics. But our experience in testing these models, initially and for most models still today, is that the real performance, away from the metrics, is quite poor. These things can get good numbers compared to statistical models, but when you look at individual forecasts, a lot of these models still produce results that are poor for some fraction of time series, and that feels not acceptable for our products. So you have this strong analogy, and you have a lot of people trying things, a lot of things that look promising in terms of numbers, but when you see the output, the performance can be poor. And you have other things going on as well: there are certain classes of models that generate outputs which, again, can look nice, but cannot be useful.
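The tokenization question raised here, turning an infinite vocabulary of continuous values into a finite one, can be illustrated with the "discretize each value" approach James mentions. This is a hedged sketch, not any particular model's actual scheme; the bin count, range, and scaling choice are invented:

```python
# Hypothetical sketch of value discretization for a time series
# foundation model: scale the series, then map each value onto a
# finite set of bins so it can play the role of a token.
import numpy as np

def tokenize_series(y, n_bins=1024, low=-5.0, high=5.0):
    """Mean-absolute-scale the series, then quantize to integer tokens."""
    scale = np.mean(np.abs(y)) or 1.0           # avoid dividing by zero
    scaled = y / scale
    edges = np.linspace(low, high, n_bins - 1)  # n_bins-1 edges -> n_bins bins
    tokens = np.digitize(scaled, edges)         # each value -> bin index
    return tokens, scale

def detokenize(tokens, scale, n_bins=1024, low=-5.0, high=5.0):
    """Map tokens back to approximate values: bin centres, rescaled."""
    centres = np.linspace(low, high, n_bins)
    return centres[tokens] * scale

y = np.array([0.1, 0.5, 2.0, -1.0, 0.0])
tokens, scale = tokenize_series(y)
reconstructed = detokenize(tokens, scale)
```

The round trip is lossy by design: the model only ever sees bin indices, which is exactly the trade-off behind choosing what the "element" should be.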
So for example, in the Senseye use case, we need probabilistic forecasts, right? Your threshold might be 10 amps; that might be the point at which you’re quite happy to fix the machine. You’ve got some data, and what you want to know is the probability that you’re going to hit that threshold. Obviously, it would be nice to know for certain whether you would or you wouldn’t, but that’s just not how the world is. There are many possible futures for a time series, and you want to understand where it’s likely to go and how strongly the values are clustered, things like that. You basically want probabilistic forecasts, but a lot of these models would only produce what are called point forecasts: taking a line and producing a line. That can look nice, but it may not be very useful for practical things. And then the third class of situation you get is where models do seem to get it, producing consistently high-quality forecasts, but have what we would see as some kind of fatal flaw for our use case. Anyway, looking at the models that exist, you have that sort of setup. And we approached it initially, like I said, by trying to build our own models; some of this has got to work, right? The scale of these models is such that an organization like Senseye, or my small team, can afford to build these kinds of models, so we tried to build them from scratch. And we got into a situation which you often do in machine learning in general.
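The 10-amp threshold example can be made concrete: given sampled future trajectories from a probabilistic forecaster, the quantity James describes is just the fraction of sampled paths that touch the threshold. The toy sampling model and all numbers below are invented for illustration:

```python
# Illustrative sketch of using a probabilistic forecast to estimate
# the chance of hitting a maintenance threshold (hypothetical numbers).
import numpy as np

def threshold_probability(samples, threshold):
    """samples: (n_paths, horizon) array of sampled forecast paths.
    Returns the fraction of paths that reach the threshold."""
    hits = (samples >= threshold).any(axis=1)
    return hits.mean()

rng = np.random.default_rng(1)
n_paths, horizon = 1000, 96
# Toy stand-in for sampled futures: current drifting upward from 8 A.
steps = rng.normal(loc=0.02, scale=0.1, size=(n_paths, horizon))
samples = 8.0 + np.cumsum(steps, axis=1)

p = threshold_probability(samples, threshold=10.0)
```

A point forecast collapses all those paths into one line, which is why it cannot answer "how likely am I to hit 10 amps this week?" no matter how good the line looks.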
You have to do a certain amount of training, there’s a certain amount of cost, you get results of a certain quality, and you have to be making bets as you go along as to whether you’re willing to invest more money in training and retraining, whether you really believe it’s going to reach the kind of quality that you want. Very often you’re making these trade-offs, and often it has to be quite intuitive. We actually abandoned our first approach because we didn’t think the payback was there in that quality-versus-cost trade-off. And then we picked up on that third class of models that I mentioned. We came across a model, AWS’s Chronos, which worked in a very simple, elegant way and kind of got time series; it produced very good probabilistic forecasts, but didn’t have certain things that we needed. And then our approach was to take that base model and add on those things. So we specialized it to machine data of the kind that we have. And we set the model in a sort of larger system that allows it to do long-term forecasts, because we want to forecast over days or weeks, which is something the model wasn’t able to do natively. Another property of that model, which is quite common, is that it’s mean-reverting, so it’s very reluctant to follow trends in the data. And of course, for our use case, we want trend following. So we ended up taking that model, building this kind of system around it, and heavily fine-tuning it and specializing it on industrial data, and on data with very strong trends in it of all different kinds.
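The "larger system" for long-term forecasting described here can be sketched as an autoregressive rollout: call a short-horizon model repeatedly, feeding each forecast chunk back in as context until the desired horizon is reached. The stand-in model below is a naive trend extrapolator, not the actual foundation model; the window sizes and data are assumptions:

```python
# Hypothetical sketch of wrapping a short-horizon forecaster in a
# rollout loop to reach days-or-weeks horizons it can't do natively.
import numpy as np

def short_horizon_model(context, steps):
    """Stand-in for the base model: naive trend extrapolation over
    a short window (the real system would call a foundation model)."""
    slope = (context[-1] - context[-8]) / 7.0
    return context[-1] + slope * np.arange(1, steps + 1)

def long_horizon_forecast(history, total_steps, window=24):
    """Autoregressive rollout: forecast `window` steps at a time,
    appending each chunk to the context for the next call."""
    context = list(history)
    out = []
    while len(out) < total_steps:
        steps = min(window, total_steps - len(out))
        chunk = short_horizon_model(np.array(context), steps)
        out.extend(chunk)
        context.extend(chunk)
    return np.array(out)

history = 5.0 + 0.1 * np.arange(50)   # linear ramp: 5.0 .. 9.9
forecast = long_horizon_forecast(history, total_steps=72)
```

One design note this makes visible: errors compound across chunks, which is part of why the base model had to be fine-tuned before being trusted inside such a loop.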
And then distilling it down, making the model smaller while trying to retain its properties, so that it could serve Senseye in an affordable way. So we started off with a kind of first-principles approach, decided at some point that we weren’t going to make it, and at that time had another route open up, which was to take a base model and make it work for our use case using all kinds of tricks and games, one of which we have a patent submitted on. And that was the path that worked out. So that gives you a sense of the way these things work in practice.
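The distillation step mentioned here follows a standard pattern: fit a smaller "student" model to the outputs of a larger "teacher" so that behavior is retained at lower serving cost. A minimal sketch with toy polynomial models, purely illustrative (the real teacher and student are neural networks):

```python
# Hedged sketch of knowledge distillation: the student learns from the
# teacher's predictions ("soft targets"), not from raw training labels.
import numpy as np

rng = np.random.default_rng(2)

# "Teacher": a fixed, richer model (here, a cubic polynomial).
teacher_coef = np.array([0.05, 0.3, -1.0, 0.5])   # highest degree first

def teacher(x):
    return np.polyval(teacher_coef, x)

# "Student": a smaller model (quadratic basis) fit by least squares
# to the teacher's outputs on representative inputs.
x = rng.uniform(-1, 1, 500)
soft_targets = teacher(x)
X = np.column_stack([np.ones_like(x), x, x**2])
student_coef, *_ = np.linalg.lstsq(X, soft_targets, rcond=None)

def student(x):
    return student_coef[0] + student_coef[1] * x + student_coef[2] * x**2
```

The student cannot represent the cubic term exactly, so distillation is always a trade of some fidelity for a cheaper model, mirroring the quality-versus-cost bets described above.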

Spencer Acain: Yeah, it sounds like there are always risks, costs, and trade-offs, especially in such a cutting-edge field. Sometimes you’ll start down a path and then find it doesn’t work out, it’s not the right way, and you have to backtrack before you’ve wasted too much time and money on it. But that is all the time we have for this episode. So once again, I have been your host, Spencer Acain, joined by Dr. James Loach from Senseye. Tune in again next time as we continue exploring the exciting world of AI.


Siemens Digital Industries Software helps organizations of all sizes digitally transform using software, hardware and services from the Siemens Xcelerator business platform. Siemens’ software and the comprehensive digital twin enable companies to optimize their design, engineering and manufacturing processes to turn today’s ideas into the sustainable products of the future. From chips to entire systems, from product to process, across all industries. Siemens Digital Industries Software – Accelerating transformation.

Spencer Acain


This article first appeared on the Siemens Digital Industries Software blog at https://blogs.sw.siemens.com/thought-leadership/understanding-information-with-ai-podcast-transcript/