Predictive Maintenance: Common Challenges & How to Overcome Them

In this video, AI Solutions Portfolio Director Ramon Perez explains some of the common challenges of predictive maintenance, particularly the lack of labeled cases, signal-to-noise problems, and the difficulty of interpreting results. To handle these challenges, Ramon discusses several strategies: redefining failure further upstream, creating a labeling mechanism with human feedback, building a system with various model types, and providing visuals and summary statistics to aid decision-making. He highlights the need for an augmented intelligence (“human-in-the-loop”) approach to solving predictive maintenance problems.

More on Predictive Maintenance: Spotlight on Aegir

Aegir is a machine learning solution that enables hydropower producers to monitor the performance of their assets and take preventative maintenance measures prior to failures.

The energy industry depends on massive, costly machines, and energy companies must ensure uninterrupted delivery to maintain consumer confidence. In hydropower, a standard-sized turbine generates more than $50,000 per day in electricity, so every day a turbine is out of service represents a substantial opportunity cost.


Video Transcript

Hey guys, I’m Ramon. I’m coming to you today to talk about predictive maintenance.

Why is it So Hard?

And the question I’ve got for you all is, why is it so hard? I was talking to one of our clients in the aircraft industry about the challenge of fan blades coming off on an airplane. This is obviously an extremely dangerous scenario. Incredibly rare. Airplanes fly millions and millions of miles between these types of events. But trying to make a prediction of this type of rare event and optimize maintenance around it is quite a challenge.

Lack of Labeled Cases (aircraft)

So what makes predictive maintenance so hard? First of all, as I described, the lack of labeled cases. You might find, like I mentioned in aircraft, you have sensors inside of an airplane engine, which are producing enormous amounts of information, but the actual event that you’re trying to predict is incredibly rare. So that’s problem number one. We see that in a ton of different types of predictive maintenance cases.

Signal to Noise Problems (power plant)

Signal-to-noise problems. In our work in hydroelectric power plants, we’re trying to predict when equipment could fail. We have thousands of sensors. So a particular power plant might have 5,000 sensors. Some of these sensors are sampling at 200 hertz, so you get 200 records every second. Some of them are sampling once a second, some of them once a minute, some every 10 minutes, some every couple of hours, and some you might not see a record from for days at a time, depending on the type of sensor. So just parsing all of that data and actually mapping it back to maintenance records that show particular types of failures, degradation of the equipment and the parts, that’s a real challenge.
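To make the alignment problem concrete, here is a minimal sketch of putting sensors with different sampling rates onto a common time grid and then attaching a label from maintenance records. The sensor names, the 60-second grid, the 60-second warning horizon, and the helper functions are all illustrative assumptions, not details from the video.

```python
from bisect import bisect_right
from statistics import mean


def resample_mean(readings, bucket_seconds):
    """Average (timestamp_seconds, value) readings into fixed-width time buckets."""
    buckets = {}
    for ts, value in readings:
        bucket_start = int(ts // bucket_seconds) * bucket_seconds
        buckets.setdefault(bucket_start, []).append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


def forward_fill(resampled, grid):
    """For each grid time, use the latest bucket at or before it (None if there is none)."""
    starts = sorted(resampled)
    filled = []
    for t in grid:
        i = bisect_right(starts, t)
        filled.append(resampled[starts[i - 1]] if i else None)
    return filled


def label_rows(rows, failure_times, horizon_seconds):
    """Mark a row 1 if a recorded failure falls within horizon_seconds after the row's time."""
    for row in rows:
        row["failure_soon"] = int(
            any(0 <= f - row["t"] <= horizon_seconds for f in failure_times)
        )
    return rows


# Hypothetical data: a fast sensor (4 readings per second, standing in for 200 Hz)
# and a slow sensor that reports once a minute.
vibration = [(t / 4, 1.0 + 0.1 * (t % 4)) for t in range(4 * 120)]
temperature = [(0, 40.0), (60, 42.0)]

grid = [0, 60]  # common one-minute grid
vib = forward_fill(resample_mean(vibration, 60), grid)
temp = forward_fill(resample_mean(temperature, 60), grid)
aligned = [{"t": t, "vibration": v, "temperature": c} for t, v, c in zip(grid, vib, temp)]

# Hypothetical maintenance record: a failure logged 100 seconds into the window.
labeled = label_rows(aligned, failure_times=[100], horizon_seconds=60)
```

Averaging the fast sensor and carrying the slow sensor forward is only one possible choice. Very sparse sensors may need a limit on how long a stale value is carried.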

Hard to Interpret Results (where do I begin?)

The signal-to-noise problems, we spend quite a lot of time trying to deal with. And then there are hard-to-interpret results, because ultimately what you want is to feed a decision process. If you’re going to say that a particular part inside the power plant is likely to fail, or that this particular aircraft engine is likely to lose a fan blade, you wanna give as much early warning as you can, but you also want to help the maintenance team know where to start. So in this case, where do I begin?

And we’ve got some wonderful model types that can be incredibly valuable at making predictions. In the deep learning realm, there are long short-term memory (LSTM) models that have become quite good for time series data on some of these sensors, but they tend to be very hard to interpret.

So what the maintenance team wants is something that can actually help them understand: what did the model think was the cause of this particular problem, so I know where to go and look to make the fix? Or maybe the model says there’s a problem here, but I need to go into the plant and get some more information to determine if there really is a problem, and what happens next. So interpretability becomes really, really important.
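One simple way to give a maintenance team a starting point is to rank sensors by how far each one sits from its own normal behavior. This sketch is an illustration only, not the approach used in the video. It assumes a per-sensor z-score is a good enough proxy for "where to look first", and the sensor names and values are made up.

```python
from statistics import mean, stdev


def rank_sensor_deviations(baseline, current):
    """Rank sensors by the size of their deviation from baseline, in standard deviations."""
    ranked = []
    for sensor, history in baseline.items():
        mu, sigma = mean(history), stdev(history)
        z = (current[sensor] - mu) / sigma if sigma > 0 else 0.0
        ranked.append((sensor, round(z, 2)))
    # Largest absolute deviation first: the first place for a person to go and look.
    return sorted(ranked, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical baseline history and current readings for one generator.
baseline = {
    "bearing_temp_c": [60.0, 61.0, 59.0, 60.0, 60.0],
    "vibration_mm_s": [2.0, 2.1, 1.9, 2.0, 2.0],
    "oil_pressure_bar": [4.0, 4.1, 3.9, 4.0, 4.0],
}
current = {"bearing_temp_c": 60.5, "vibration_mm_s": 2.6, "oil_pressure_bar": 3.95}

ranking = rank_sensor_deviations(baseline, current)
```

Here the vibration sensor comes out on top, which tells the team where to begin even if the model that raised the alert is itself hard to interpret.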

How to Handle This?

Redefine Failure Further Upstream (heart attack)

Now, there are things you can do to try to handle this. The first, which basically comes from our experience in the industry, is redefining failure further upstream. So I work with a life sciences company that’s trying to predict heart failure, heart attacks, and it’s obviously not moral to just put a bunch of sensors on a bunch of people and then wait for them to have a heart attack just so you’ve got great data for your models. You can’t do that, but what you could do is say, “Let’s look a little further upstream of a heart attack.” So are there particular kinds of wobbles that we see in the dataset that are anomalous? Can we establish some sort of baseline of normality and start introducing anomaly detection earlier on, and perhaps redefine that as an early indicator of a problem?
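The idea of a baseline of normality plus anomaly detection can be sketched in a few lines. This is a minimal illustration that assumes the first readings represent normal behavior and that a fixed z-score threshold is a reasonable anomaly test. The threshold of 3.0 and the sample values are assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(values, baseline_size, threshold=3.0):
    """Return indices of later values more than `threshold` standard deviations from the baseline."""
    baseline = values[:baseline_size]
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return []  # a flat baseline gives no scale to measure deviations against
    return [
        i
        for i, v in enumerate(values[baseline_size:], start=baseline_size)
        if abs(v - mu) / sigma > threshold
    ]


# Hypothetical signal: eight normal readings, then a "wobble" at index 9.
signal = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 12.5, 10.1]
early_warnings = flag_anomalies(signal, baseline_size=8)
```

The flagged indices are candidates for the redefined, upstream "failure" event, not confirmed problems.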

Create a Labeling Mechanism (feedback loop)

So, in the case of heart attack right now, this is going to require some subject matter expertise, because basically, instead of a supervised model, you’re looking at an unsupervised model. You are going to say, “Well, we’re gonna represent these anomalies, but we may need a doctor or some sort of input from an expert to help us label these.”

So what we’ve done is kind of a synthetic label, if you will, that can be fed back into a model later in sort of a semi-supervised approach, creating a labeling mechanism. So in the power plant example, one of the things we do is a model might say, well, we see some indicators of risk. Give that to a human. The human goes into the plant, takes a look around and says, “Okay, well, this looks like a real problem,” or “This is completely understandable. I know why the model said that this is a problem, but we have taken a look and we know that it’s not a problem.” So that feedback loop, basically we want a mechanism where the humans can take a look at the model results, go work the problem, come back and provide their feedback to the model so that the model can actually think of that as now a labeled case.

So you are now labeling as you go. Over the course of time, you feed that into your model so your model gets better and you move towards a better predictive model, or supervised learning. So that’s another mechanism where you can take an unsupervised model towards a supervised model if you introduce that feedback loop. Next, building a system with various modeling types.
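The labeling mechanism described above can be sketched as a small data structure: the model raises alerts, a person reviews them, and reviewed alerts become labeled cases for later supervised training. The class names, fields, and example values are hypothetical and only illustrate the flow.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Alert:
    asset_id: str
    features: List[float]
    risk_score: float
    label: Optional[int] = None  # stays None until a human reviews the alert


@dataclass
class FeedbackLoop:
    alerts: List[Alert] = field(default_factory=list)

    def raise_alert(self, asset_id, features, risk_score):
        """The (unsupervised) model reports indicators of risk."""
        alert = Alert(asset_id, features, risk_score)
        self.alerts.append(alert)
        return alert

    def record_review(self, alert, is_real_problem):
        """A person inspects the asset and reports back: 1 = real problem, 0 = not a problem."""
        alert.label = int(is_real_problem)

    def labeled_cases(self):
        """Reviewed alerts, ready to use as training data for a supervised model."""
        return [(a.features, a.label) for a in self.alerts if a.label is not None]


loop = FeedbackLoop()
a1 = loop.raise_alert("generator-1", [0.9, 0.1], risk_score=0.8)
a2 = loop.raise_alert("generator-2", [0.2, 0.3], risk_score=0.6)
a3 = loop.raise_alert("generator-3", [0.5, 0.5], risk_score=0.7)  # not yet reviewed

loop.record_review(a1, is_real_problem=True)
loop.record_review(a2, is_real_problem=False)
training_data = loop.labeled_cases()
```

Both confirmed problems and explained false alarms are kept, since a supervised model needs examples of each.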

Build a System with Various Model Types (hierarchy)

I think a really important thing to consider is that you’re not likely going to have one model that rules them all, like Gollum in “The Lord of the Rings” here. And so you need to start thinking about a system of models, some local models, some global models. One of the things we’ve done in power plants is look at maybe a generator-specific model that just tries to interpret behavior from that generator, but also a global model, because a single generator may not give you as much signal as you need. So maybe incorporating lots and lots of generators together. Also anomaly models. Also some subject matter-driven models, where the subject matter experts say, when this happens, plus this happens, plus this happens, we know that’s a bad situation. So you generate a work order or something to that effect, or use that as an engineered feature. So you really wanna think of this as kind of a hierarchy of modeling types that can work together in a solution that, again, introduces a feedback loop and incorporates a human in the problem.
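A system of models like this can be sketched as several scores combined into one. In this illustration the "local" and "global" models are stand-ins that compare a reading with one generator's history and with fleet-wide history, and the expert rule, thresholds, and weights are all assumed values rather than anything from the video.

```python
from statistics import mean, stdev


def deviation_score(value, history):
    """Squash the z-score of `value` against `history` into the range [0, 1)."""
    sigma = stdev(history)
    z = abs(value - mean(history)) / sigma if sigma > 0 else 0.0
    return z / (1.0 + z)


def expert_rule(reading):
    """Hypothetical expert rule: three conditions that together mean a bad situation."""
    bad = reading["temp"] > 80 and reading["vibration"] > 5 and reading["oil_pressure"] < 3
    return 1.0 if bad else 0.0


def combine_scores(scores, weights):
    """Weighted average of the individual model scores."""
    total = sum(weights.values())
    return sum(scores[name] * weights[name] for name in weights) / total


reading = {"temp": 85.0, "vibration": 6.0, "oil_pressure": 2.5}
generator_history = [2.0, 2.2, 1.9, 2.1]          # this generator's vibration
fleet_history = [2.0, 3.0, 4.0, 5.0, 2.5, 3.5]    # vibration across many generators

scores = {
    "local": deviation_score(reading["vibration"], generator_history),
    "global": deviation_score(reading["vibration"], fleet_history),
    "expert_rule": expert_rule(reading),  # could also be fed to a model as an engineered feature
}
weights = {"local": 1.0, "global": 1.0, "expert_rule": 2.0}
combined_risk = combine_scores(scores, weights)
```

A weighted average is the simplest way to combine the layers. In practice the combination could itself be a model trained on the labeled cases that the feedback loop produces.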

Provide Visuals, Summary Stats, and Other Info to Aid Decision Making (solution)

And then lastly, because you really wanna think about this as a human-in-the-loop type of problem, you wanna introduce visuals, summary statistics, and other information that can help guide the decision-making process. So this really comes down to thinking of this as a solution to the problem rather than a model.
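As a small illustration of summary statistics that support a decision, the sketch below compares each sensor's recent readings with its baseline and prints a plain-text report with the biggest changes first. The sensor names, values, and report layout are assumptions, and the percent-change calculation assumes a nonzero baseline mean.

```python
from statistics import mean


def summarize(sensor, history, recent):
    """Summary statistics comparing recent readings with the sensor's baseline."""
    base, now = mean(history), mean(recent)
    return {
        "sensor": sensor,
        "baseline_mean": round(base, 2),
        "recent_mean": round(now, 2),
        "recent_min": min(recent),
        "recent_max": max(recent),
        "pct_change": round(100 * (now - base) / base, 1),  # assumes base != 0
    }


def format_report(rows):
    """Plain-text table, sorted so the largest changes appear first."""
    lines = [f"{'sensor':<18}{'baseline':>10}{'recent':>10}{'min':>8}{'max':>8}{'change %':>10}"]
    for r in sorted(rows, key=lambda r: abs(r["pct_change"]), reverse=True):
        lines.append(
            f"{r['sensor']:<18}{r['baseline_mean']:>10}{r['recent_mean']:>10}"
            f"{r['recent_min']:>8}{r['recent_max']:>8}{r['pct_change']:>10}"
        )
    return "\n".join(lines)


# Hypothetical readings for two sensors on one generator.
rows = [
    summarize("bearing_temp_c", [60.0, 60.0], [60.6, 60.6]),
    summarize("vibration_mm_s", [2.0, 2.0, 2.0, 2.0], [2.4, 2.6]),
]
report = format_report(rows)
print(report)
```

A report like this sits alongside the model output so that the person making the maintenance decision can see what changed and by how much.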

And this is a challenge for a lot of data scientists, because we want to think about how can my model solve this problem? But that’s really not gonna get it. In this particular case, predictive maintenance is really, really challenging because the failures tend to be very rare, and they can oftentimes be very expensive. So the results need to have a human in the process who can make very good decisions. And so we wanna think about augmenting a human, that person’s natural creativity and experience, with the speed of a machine. So we really wanna get towards an augmented intelligence type of solution for solving predictive maintenance problems.

So thank you so much for your time and I hope that’s been helpful.