
Similarity-Based Remaining Useful Life Estimation

This example shows how to build a complete Remaining Useful Life (RUL) estimation workflow, including the steps for preprocessing, selecting trendable features, constructing a health indicator by sensor fusion, training similarity RUL estimators, and validating prognostics performance. The example uses the training data from the PHM2008 challenge dataset [1].

Data Preparation

Since the dataset is small, it is feasible to load the whole degradation data set into memory. Download and unzip the data set to the current directory. Use the helperLoadData helper function to load and convert the training text file to a cell array of tables (a sketch of an equivalent loader appears after the data preview below). The training data contains 260 run-to-failure simulations. This group of measurements is called an "ensemble".

url = 'https://ssd.mathworks.com/supportfiles/nnet/data/TurbofanEngineDegradationSimulationData.zip';
websave('TurbofanEngineDegradationSimulationData.zip',url);
unzip('TurbofanEngineDegradationSimulationData.zip')
degradationData = helperLoadData('train_FD002.txt');
degradationData(1:5)
ans=5×1 cell array
    {149×26 table}
    {269×26 table}
    {206×26 table}
    {235×26 table}
    {154×26 table}

Each ensemble member is a table with 26 columns. The columns contain data for the machine ID, time stamp, 3 operating conditions and 21 sensor measurements.

head(degradationData{1})
    id    time    op_setting_1    op_setting_2    op_setting_3    sensor_1    sensor_2    sensor_3    sensor_4    sensor_5    sensor_6    sensor_7    sensor_8    sensor_9    sensor_10    sensor_11    sensor_12    sensor_13    sensor_14    sensor_15    sensor_16    sensor_17    sensor_18    sensor_19    sensor_20    sensor_21
    __    ____    ____________    ____________    ____________    ________    ________    ________    ________    ________    ________    ________    ________    ________    _________    _________    _________    _________    _________    _________    _________    _________    _________    _________    _________    _________

    1      1         34.998            0.84           100          449.44      555.32      1358.6      1137.2       5.48           8       194.64      2222.7      8341.9       1.02         42.02       183.06       2387.7       8048.6       9.3461        0.02          334         2223           100        14.73       8.8071  
    1      2         41.998          0.8408           100             445       549.9      1353.2      1125.8       3.91        5.71       138.51      2211.6        8304       1.02          42.2       130.42       2387.7       8072.3       9.3774        0.02          330         2212           100        10.41       6.2665  
    1      3         24.999          0.6218            60          462.54      537.31      1256.8      1047.5       7.05        9.02       175.71      1915.1      8001.4       0.94         36.69       164.22         2028       7864.9       10.894        0.02          309         1915         84.93        14.08       8.6723  
    1      4         42.008          0.8416           100             445      549.51        1354      1126.4       3.91        5.71       138.46      2211.6        8304       1.02         41.96       130.72       2387.6       8068.7       9.3528        0.02          329         2212           100        10.59       6.4701  
    1      5             25          0.6203            60          462.54      537.07      1257.7      1047.9       7.05        9.03       175.05      1915.1      7993.2       0.94         36.89       164.31         2028       7861.2       10.896        0.02          309         1915         84.93        14.13       8.5286  
    1      6         25.005          0.6205            60          462.54      537.02      1266.4      1048.7       7.05        9.03       175.17      1915.2      7996.1       0.94         36.78       164.27         2028       7868.9       10.891        0.02          306         1915         84.93        14.28        8.559  
    1      7         42.004          0.8409           100             445      549.74      1347.5      1127.2       3.91        5.71       138.71      2211.6      8307.8       1.02         42.19       130.49       2387.7       8075.5       9.3753        0.02          330         2212           100        10.62       6.4227  
    1      8         20.002          0.7002           100          491.19      607.44      1481.7      1252.4       9.35       13.65       334.41      2323.9      8709.1       1.08         44.27       315.11         2388       8049.3       9.2369        0.02          365         2324           100        24.33       14.799  
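
The helperLoadData function ships with this example. If the helper is unavailable, a minimal loader along the following lines produces the same cell-array-of-tables structure. This is a sketch, not the shipped helper; the function name localLoadData is hypothetical, and the column trimming assumes the raw file is whitespace-delimited with 26 numeric values per row.

function degradationData = localLoadData(filename)
% Sketch of a loader equivalent to helperLoadData (assumptions noted above)
raw = readmatrix(filename, 'FileType', 'text');
raw = raw(:, 1:26);  % drop trailing NaN columns caused by trailing delimiters, if any

% Name the columns: machine ID, time stamp, 3 operating conditions, 21 sensors
opNames     = arrayfun(@(k) sprintf('op_setting_%d', k), 1:3,  'UniformOutput', false);
sensorNames = arrayfun(@(k) sprintf('sensor_%d', k),     1:21, 'UniformOutput', false);
t = array2table(raw, 'VariableNames', [{'id', 'time'}, opNames, sensorNames]);

% Split the rows into one table per engine, one cell per ensemble member
degradationData = arrayfun(@(k) t(t.id == k, :), unique(t.id), 'UniformOutput', false);
end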

Split the degradation data into a training data set and a validation data set for later performance evaluation.

rng('default')  % To make sure the results are repeatable
numEnsemble = length(degradationData);
numFold = 5;
cv = cvpartition(numEnsemble, 'KFold', numFold);
trainData = degradationData(training(cv, 1));
validationData = degradationData(test(cv, 1));

Specify groups of variables of interest.

varNames = string(degradationData{1}.Properties.VariableNames);
timeVariable = varNames{2};
conditionVariables = varNames(3:5);
dataVariables = varNames(6:26);

Visualize a sample of the ensemble data.

nsample = 10;
figure
helperPlotEnsemble(trainData, timeVariable, ...
    [conditionVariables(1:2) dataVariables(1:2)], nsample)

Working Regime Clustering

As shown in the previous section, there is no clear trend showing the degradation process in each run-to-failure measurement. In this and the next section, the operating conditions will be used to extract clearer degradation trends from sensor signals.

Notice that each ensemble member contains 3 operating conditions: "op_setting_1", "op_setting_2", and "op_setting_3". First, let's extract the table from each cell and concatenate them into a single table.

trainDataUnwrap = vertcat(trainData{:});
opConditionUnwrap = trainDataUnwrap(:, cellstr(conditionVariables));

Visualize all operating points on a 3D scatter plot. It clearly shows 6 regimes and the points in each regime are in very close proximity.

figure
helperPlotClusters(opConditionUnwrap)

Let's use clustering techniques to locate the 6 clusters automatically. Here, the K-means algorithm is used. K-means is one of the most popular clustering algorithms, but it can converge to a local optimum. It is good practice to repeat the K-means clustering algorithm several times with different initial conditions and pick the result with the lowest cost. In this case, the algorithm runs 5 times and the results are identical.

opts = statset('Display', 'final');
[clusterIndex, centers] = kmeans(table2array(opConditionUnwrap), 6, ...
    'Distance', 'sqeuclidean', 'Replicates', 5, 'Options', opts);
Replicate 1, 1 iterations, total sum of distances = 0.331378.
Replicate 2, 1 iterations, total sum of distances = 0.331378.
Replicate 3, 1 iterations, total sum of distances = 0.331378.
Replicate 4, 1 iterations, total sum of distances = 0.331378.
Replicate 5, 1 iterations, total sum of distances = 0.331378.
Best total sum of distances = 0.331378

Visualize the clustering results and the identified cluster centroids.

figure
helperPlotClusters(opConditionUnwrap, clusterIndex, centers)

As the plot illustrates, the clustering algorithm successfully finds the 6 working regimes.

Working Regime Normalization

Let's perform a normalization on measurements grouped by different working regimes. First, compute the mean and standard deviation of each sensor measurement grouped by the working regimes identified in the last section.

centerstats = struct('Mean', table(), 'SD', table());
for v = dataVariables
    centerstats.Mean.(char(v)) = splitapply(@mean, trainDataUnwrap.(char(v)), clusterIndex);
    centerstats.SD.(char(v))   = splitapply(@std,  trainDataUnwrap.(char(v)), clusterIndex);
end
centerstats.Mean
ans=6×21 table
    sensor_1    sensor_2    sensor_3    sensor_4    sensor_5    sensor_6    sensor_7    sensor_8    sensor_9    sensor_10    sensor_11    sensor_12    sensor_13    sensor_14    sensor_15    sensor_16    sensor_17    sensor_18    sensor_19    sensor_20    sensor_21
    ________    ________    ________    ________    ________    ________    ________    ________    ________    _________    _________    _________    _________    _________    _________    _________    _________    _________    _________    _________    _________

     489.05      604.91        1502      1311.2      10.52       15.493      394.33        2319      8785.8         1.26      45.487       371.45       2388.2       8135.5       8.6645          0.03      369.69        2319           100       28.527       17.115  
     462.54      536.86      1262.7      1050.2       7.05       9.0277      175.43      1915.4      8015.6      0.93992      36.798       164.57       2028.3       7878.6       10.913          0.02      307.34        1915         84.93       14.263       8.5587  
        445       549.7      1354.4      1127.7       3.91       5.7158      138.63        2212      8328.1       1.0202      42.146       130.55       2388.1       8089.7       9.3745          0.02      331.06        2212           100       10.588       6.3516  
     491.19      607.56      1485.5        1253       9.35       13.657       334.5        2324        8730       1.0777      44.446       314.88       2388.2       8066.1       9.2325      0.022112      365.37        2324           100       24.455       14.674  
     518.67      642.67      1590.3      1408.7      14.62        21.61      553.37      2388.1      9063.3          1.3      47.538       521.41       2388.1         8142       8.4419          0.03      393.18        2388           100        38.82       23.292  
     449.44       555.8      1366.7      1131.5       5.48       8.0003      194.45        2223      8356.4       1.0204      41.981       183.03       2388.1       8072.5       9.3319          0.02      334.22        2223           100       14.831       8.8983  

centerstats.SD
ans=6×21 table
     sensor_1     sensor_2    sensor_3    sensor_4     sensor_5     sensor_6     sensor_7    sensor_8    sensor_9    sensor_10     sensor_11    sensor_12    sensor_13    sensor_14    sensor_15    sensor_16     sensor_17    sensor_18    sensor_19     sensor_20    sensor_21
    __________    ________    ________    ________    __________    _________    ________    ________    ________    __________    _________    _________    _________    _________    _________    __________    _________    _________    __________    _________    _________

    1.0346e-11    0.47582      5.8057      8.5359     3.2687e-13    0.0047002    0.67361     0.095286     18.819     0.00017552      0.2538      0.53621     0.098213      16.507      0.038392     6.6965e-16       1.49          0                 0     0.14441      0.08598 
    4.6615e-12    0.36083       5.285      6.8365      1.137e-13    0.0042209    0.44862      0.27006     14.479     0.00087595     0.20843      0.34493      0.28592      13.427      0.043297     3.4003e-16     1.2991          0        3.5246e-12     0.11218     0.066392 
             0    0.43282      5.6422      7.6564     1.1014e-13    0.0049312    0.43972      0.30478     18.325      0.0015563     0.23965      0.34011       0.3286      16.773      0.037476     1.0652e-15     1.4114          0                 0     0.10832     0.064709 
    1.8362e-11    0.46339      5.7888      7.7768     2.5049e-13    0.0047449     0.6046      0.12828      17.58       0.004193     0.23826      0.49843      0.13156      15.457      0.038886       0.004082     1.4729          0                 0     0.13531     0.079592 
    1.2279e-11     0.4967      6.0349      8.9734      6.573e-14    0.0013408    0.86985      0.07082     19.889     2.7314e-14     0.26508      0.73713     0.070545      17.105      0.037289     6.6619e-16     1.5332          0                 0     0.17871       0.1065 
    1.4098e-11     0.4401      5.6013      7.4461     2.2828e-13    0.0017508    0.47564      0.28634     17.446      0.0019595     0.23078      0.37631      0.30766      15.825      0.038139     3.3656e-16     1.4133          0                 0     0.11347     0.067192 

The statistics in each regime can be used to normalize the training data. For each ensemble member, extract the operating points of each row, compute the distance to each cluster center, and find the nearest cluster center. Then, for each sensor measurement, subtract the mean and divide by the standard deviation of that cluster. If the standard deviation is close to 0, set the normalized sensor measurement to 0 because a nearly constant sensor measurement is not useful for remaining useful life estimation. Refer to the last section, "Helper Functions", for more details on the regimeNormalization function.

trainDataNormalized = cellfun(@(data) regimeNormalization(data, centers, centerstats), ...
    trainData, 'UniformOutput', false);

Visualize the data normalized by working regime. Degradation trends for some sensor measurements are now revealed after normalization.

figure
helperPlotEnsemble(trainDataNormalized, timeVariable, dataVariables(1:4), nsample)

Trendability Analysis

Now select the most trendable sensor measurements to construct a health indicator for prediction. For each sensor measurement, a linear degradation model is estimated and the slopes of the signals are ranked.

numSensors = length(dataVariables);
signalSlope = zeros(numSensors, 1);
warn = warning('off');
for ct = 1:numSensors
    tmp = cellfun(@(tbl) tbl(:, cellstr(dataVariables(ct))), trainDataNormalized, 'UniformOutput', false);
    mdl = linearDegradationModel(); % create model
    fit(mdl, tmp); % train model
    signalSlope(ct) = mdl.Theta;
end
warning(warn);

Sort the signal slopes and select 8 sensors with the largest slopes.

[~, idx] = sort(abs(signalSlope), 'descend');
sensorTrended = sort(idx(1:8))
sensorTrended = 8×1

     2
     3
     4
     7
    11
    12
    15
    17

Visualize the selected trendable sensor measurements.

figure
helperPlotEnsemble(trainDataNormalized, timeVariable, dataVariables(sensorTrended(3:6)), nsample)

Notice that some of the most trendable signals show positive trends, while others show negative trends.

Construct Health Indicator

This section focuses on fusing the sensor measurements into a single health indicator, with which a similarity-based model is trained.

All the run-to-failure data is assumed to start with a healthy condition. The health condition at the beginning is assigned a value of 1 and the health condition at failure is assigned a value of 0. The health condition is assumed to be linearly degrading from 1 to 0 over time. This linear degradation is used to help fuse the sensor values. More sophisticated sensor fusion techniques are described in the literature [2-5].

for j=1:numel(trainDataNormalized)
    data = trainDataNormalized{j};
    rul = max(data.time)-data.time;
    data.health_condition = rul / max(rul);
    trainDataNormalized{j} = data;
end

Visualize the health condition.

figure
helperPlotEnsemble(trainDataNormalized, timeVariable, "health_condition", nsample)

The health condition of all ensemble members changes from 1 to 0 with varying degradation rates.

Now fit a linear regression model of Health Condition with the most trendable sensor measurements as regressors:

Health Condition ~ 1 + Sensor2 + Sensor3 + Sensor4 + Sensor7 + Sensor11 + Sensor12 + Sensor15 + Sensor17

trainDataNormalizedUnwrap = vertcat(trainDataNormalized{:});

sensorToFuse = dataVariables(sensorTrended);
X = trainDataNormalizedUnwrap{:, cellstr(sensorToFuse)};
y = trainDataNormalizedUnwrap.health_condition;
regModel = fitlm(X,y);
bias = regModel.Coefficients.Estimate(1)
bias = 0.5000
weights = regModel.Coefficients.Estimate(2:end)
weights = 8×1

   -0.0323
   -0.0300
   -0.0527
    0.0057
   -0.0646
    0.0054
   -0.0431
   -0.0379

Construct a single health indicator by multiplying the sensor measurements by their associated weights.

trainDataFused = cellfun(@(data) degradationSensorFusion(data, sensorToFuse, weights), trainDataNormalized, ...
    'UniformOutput', false);

Visualize the fused health indicator for training data.

figure
helperPlotEnsemble(trainDataFused, [], 1, nsample)
xlabel('Time')
ylabel('Health Indicator')
title('Training Data')

The data from multiple sensors are fused into a single health indicator. The health indicator is smoothed by a moving average filter. See the degradationSensorFusion helper function in the last section for more details.

Apply the Same Operation to Validation Data

Repeat the regime normalization and sensor fusion process with the validation data set.

validationDataNormalized = cellfun(@(data) regimeNormalization(data, centers, centerstats), ...
    validationData, 'UniformOutput', false);
validationDataFused = cellfun(@(data) degradationSensorFusion(data, sensorToFuse, weights), ...
    validationDataNormalized, 'UniformOutput', false);

Visualize the health indicator for validation data.

figure
helperPlotEnsemble(validationDataFused, [], 1, nsample)
xlabel('Time')
ylabel('Health Indicator')
title('Validation Data')

Build Similarity RUL Model

Now build a residual-based similarity RUL model using the training data. In this setting, the model fits a 2nd-order polynomial to each fused health indicator.

The distance between data i and data j is computed by the 1-norm of the residual:

$d(i,j) = \|y_j - \hat{y}_{j,i}\|_1$

where $y_j$ is the health indicator of machine $j$, and $\hat{y}_{j,i}$ is the estimated health indicator of machine $j$ using the 2nd-order polynomial model identified on machine $i$.

The similarity score is computed by the following formula:

$\mathrm{score}(i,j) = \exp\left(-d(i,j)^2\right)$

Given one ensemble member in the validation data set, the model will find the nearest 50 ensemble members in the training data set, fit a probability distribution based on the 50 ensemble members, and use the median of the distribution as an estimate of RUL.
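
For intuition, the following sketch computes this distance and score by hand for two synthetic fused health indicator curves. The variables tI, yI, tJ, and yJ are illustrative only, and the actual residualSimilarityModel implementation may differ in details such as standardization.

% Two synthetic health indicator curves (illustrative, not from the dataset)
tI = (1:150)'; yI = 1 - 3.0e-5*tI.^2;  % machine i, fails later
tJ = (1:120)'; yJ = 1 - 3.2e-5*tJ.^2;  % machine j, fails earlier

pI    = polyfit(tI, yI, 2);   % 2nd-order polynomial identified on machine i
yJhat = polyval(pI, tJ);      % machine i's model evaluated at machine j's times
d     = norm(yJ - yJhat, 1);  % d(i,j): 1-norm of the residual
score = exp(-d^2)             % similarity score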

mdl = residualSimilarityModel(...
    'Method', 'poly2',...
    'Distance', 'absolute',...
    'NumNearestNeighbors', 50,...
    'Standardize', 1);

fit(mdl, trainDataFused);

Performance Evaluation

To evaluate the similarity RUL model, use the first 50%, 70%, and 90% of the lifetime of a sample validation ensemble member to predict its RUL.

breakpoint = [0.5, 0.7, 0.9];
validationDataTmp = validationDataFused{3}; % use one validation data for illustration

Use the validation data before the first breakpoint, which is 50% of the lifetime.

bpidx = 1;
validationDataTmp50 = validationDataTmp(1:ceil(end*breakpoint(bpidx)),:);
trueRUL = length(validationDataTmp) - length(validationDataTmp50);
[estRUL, ciRUL, pdfRUL] = predictRUL(mdl, validationDataTmp50);

Visualize the validation data truncated at 50% and its nearest neighbors.

figure
compare(mdl, validationDataTmp50);

Visualize the estimated RUL compared to the true RUL and the probability distribution of the estimated RUL.

figure
helperPlotRULDistribution(trueRUL, estRUL, pdfRUL, ciRUL)

There is a relatively large error between the estimated RUL and the true RUL when the machine is in an intermediate health stage. In this example, the 10 most similar curves are close at the beginning, but bifurcate as they approach the failure state, resulting in roughly two modes in the RUL distribution.

Use the validation data before the second breakpoint, which is 70% of the lifetime.

bpidx = 2;
validationDataTmp70 = validationDataTmp(1:ceil(end*breakpoint(bpidx)), :);
trueRUL = length(validationDataTmp) - length(validationDataTmp70);
[estRUL,ciRUL,pdfRUL] = predictRUL(mdl, validationDataTmp70);

figure
compare(mdl, validationDataTmp70);

figure
helperPlotRULDistribution(trueRUL, estRUL, pdfRUL, ciRUL)

When more data is observed, the RUL estimate improves.

Use the validation data before the third breakpoint, which is 90% of the lifetime.

bpidx = 3;
validationDataTmp90 = validationDataTmp(1:ceil(end*breakpoint(bpidx)), :);
trueRUL = length(validationDataTmp) - length(validationDataTmp90);
[estRUL,ciRUL,pdfRUL] = predictRUL(mdl, validationDataTmp90);

figure
compare(mdl, validationDataTmp90);

figure
helperPlotRULDistribution(trueRUL, estRUL, pdfRUL, ciRUL)

When the machine is close to failure, the RUL estimate improves even further in this example.

Now repeat the same evaluation procedure for the whole validation data set and compute the error between estimated RUL and true RUL for each breakpoint.

numValidation = length(validationDataFused);
numBreakpoint = length(breakpoint);
error = zeros(numValidation, numBreakpoint);

for dataIdx = 1:numValidation
    tmpData = validationDataFused{dataIdx};
    for bpidx = 1:numBreakpoint
        tmpDataTest = tmpData(1:ceil(end*breakpoint(bpidx)), :);
        trueRUL = length(tmpData) - length(tmpDataTest);
        [estRUL, ~, ~] = predictRUL(mdl, tmpDataTest);
        error(dataIdx, bpidx) = estRUL - trueRUL;
    end
end

Visualize the histogram of the error for each breakpoint together with its probability distribution.

[pdf50, x50] = ksdensity(error(:, 1));
[pdf70, x70] = ksdensity(error(:, 2));
[pdf90, x90] = ksdensity(error(:, 3));

figure
ax(1) = subplot(3,1,1);
hold on
histogram(error(:, 1), 'BinWidth', 5, 'Normalization', 'pdf')
plot(x50, pdf50)
hold off
xlabel('Prediction Error')
title('RUL Prediction Error using first 50% of each validation ensemble member')

ax(2) = subplot(3,1,2);
hold on
histogram(error(:, 2), 'BinWidth', 5, 'Normalization', 'pdf')
plot(x70, pdf70)
hold off
xlabel('Prediction Error')
title('RUL Prediction Error using first 70% of each validation ensemble member')

ax(3) = subplot(3,1,3);
hold on
histogram(error(:, 3), 'BinWidth', 5, 'Normalization', 'pdf')
plot(x90, pdf90)
hold off
xlabel('Prediction Error')
title('RUL Prediction Error using first 90% of each validation ensemble member')
linkaxes(ax)

Plot the prediction error in a box plot to visualize the median, the 25th-75th percentile range, and outliers.

figure
boxplot(error, 'Labels', {'50%', '70%', '90%'})
ylabel('Prediction Error')
title('Prediction error using different percentages of each validation ensemble member')

Compute and visualize the mean and standard deviation of the prediction error.

errorMean = mean(error)
errorMean = 1×3

   -1.1217    9.5186    9.6540

errorMedian = median(error)
errorMedian = 1×3

    1.3798   11.8172   10.3457

errorSD = std(error)
errorSD = 1×3

   21.7315   13.5194   12.3083

figure
errorbar([50 70 90], errorMean, errorSD, '-o', 'MarkerEdgeColor','r')
xlim([40, 100])
xlabel('Percentage of validation data used for RUL prediction')
ylabel('Prediction Error')
legend('Mean Prediction Error with 1 Standard Deviation Error Bar', 'Location', 'south')

It is shown that the error becomes more concentrated around 0 (fewer outliers) as more data is observed.

References

[1] Saxena, A., and K. Goebel. "PHM08 Challenge Data Set." NASA Ames Prognostics Data Repository (http://ti.arc.nasa.gov/project/prognostic-data-repository), NASA Ames Research Center, Moffett Field, CA, 2008.

[2] Roemer, Michael J., Gregory J. Kacprzynski, and Michael H. Schoeller. "Improved diagnostic and prognostic assessments using health management information fusion." AUTOTESTCON Proceedings, 2001. IEEE Systems Readiness Technology Conference. IEEE, 2001.

[3] Goebel, Kai, and Piero Bonissone. "Prognostic information fusion for constant load systems." 2005 8th International Conference on Information Fusion. Vol. 2. IEEE, 2005.

[4] Wang, Peng, and David W. Coit. "Reliability prediction based on degradation modeling for systems with multiple degradation measures." 2004 Annual Symposium on Reliability and Maintainability (RAMS). IEEE, 2004.

[5] Jardine, Andrew K. S., Daming Lin, and Dragan Banjevic. "A review on machinery diagnostics and prognostics implementing condition-based maintenance." Mechanical Systems and Signal Processing 20.7 (2006): 1483-1510.

Helper Functions

function data = regimeNormalization(data, centers, centerstats)
% Perform normalization for each observation (row) of the data
% according to the cluster the observation belongs to.
conditionIdx = 3:5;
dataIdx = 6:26;

% Perform row-wise operation
data{:, dataIdx} = table2array(...
    rowfun(@(row) localNormalize(row, conditionIdx, dataIdx, centers, centerstats), ...
    data, 'SeparateInputs', false));
end

function rowNormalized = localNormalize(row, conditionIdx, dataIdx, centers, centerstats)
% Normalization operation for each row.

% Get the operating points and sensor measurements
ops = row(1, conditionIdx);
sensor = row(1, dataIdx);

% Find which cluster center is the closest
dist = sum((centers - ops).^2, 2);
[~, idx] = min(dist);

% Normalize the sensor measurements by the mean and standard deviation of the cluster.
% Reassign NaN and Inf to 0.
rowNormalized = (sensor - centerstats.Mean{idx, :}) ./ centerstats.SD{idx, :};
rowNormalized(isnan(rowNormalized) | isinf(rowNormalized)) = 0;
end

function dataFused = degradationSensorFusion(data, sensorToFuse, weights)
% Combine measurements from different sensors according 
% to the weights, smooth the fused data and offset the data
% so that all the data start from 1

% Fuse the data according to weights
dataToFuse = data{:, cellstr(sensorToFuse)};
dataFusedRaw = dataToFuse*weights;

% Smooth the fused data with moving mean
stepBackward = 10;
stepForward = 10;
dataFused = movmean(dataFusedRaw, [stepBackward stepForward]);

% Offset the data to 1
dataFused = dataFused + 1 - dataFused(1);
end
