Accelerate Signal Feature Extraction and Classification Using a Parallel Pool of Workers

Since R2024b

This example uses signal feature extraction objects to extract multidomain features that you can use to identify faulty bearing signals in mechanical systems [1]. Feature extraction objects enable you to efficiently compute multiple features by reducing the number of times that signals are transformed into a particular domain. The example shows how to compute features using a parallel pool of CPU workers while running on:

  • An AMD EPYC 7313P 16-Core Processor @ 3GHz CPU worker

The acceleration results may vary based on the available hardware resources.

This example extends the workflow described in Machine Learning and Deep Learning Classification Using Signal Feature Extraction Objects. To learn how to extract features and train models using a GPU, see Accelerate Signal Feature Extraction and Classification Using a GPU.

Download and Prepare Data

The data set contains acceleration signals collected from rotating machines in a bearing test rig and from real-world machines such as an oil pump bearing, an intermediate speed bearing, and a planet bearing. There are 34 files in total. The signals in the files are sampled at fs = 48828 Hz. The filenames describe the signals they contain:

  • HealthySignal_*.mat Healthy signals

  • InnerRaceFault_*.mat Signals with inner race faults

  • OuterRaceFault_*.mat Signals with outer race faults

Download the data files into a temporary directory.

dataURL = "https://www.mathworks.com/supportfiles/SPT/data/rollingBearingDataset.zip";
datasetFolder = fullfile(tempdir,"rollingBearingDataset");
zipFile = fullfile(tempdir,"rollingBearingDataset.zip");
if ~exist(datasetFolder,"dir")
    websave(zipFile,dataURL);
    unzip(zipFile,datasetFolder);
end

Create a signalDatastore object to access the data in the files and obtain the labels that refer to the signal category.

sds = signalDatastore(datasetFolder);

The filenames in the data set include the labels. Get a list of labels from the filenames in the datastore using the filenames2labels function.

labels = filenames2labels(sds,ExtractBefore=pattern("Signal"|"Fault"));
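To see what the ExtractBefore option does, this minimal sketch applies an equivalent pattern to a few hypothetical filenames using the extractBefore function. The names in demoNames are illustrative assumptions that only follow the naming scheme described earlier; they are not read from the data set.

```matlab
% Hypothetical filenames that follow the naming scheme of the data set
demoNames = ["HealthySignal_1" "InnerRaceFault_2" "OuterRaceFault_3"];

% Keep the text that precedes the first occurrence of "Signal" or "Fault"
demoPattern = pattern("Signal") | pattern("Fault");
demoLabels = categorical(extractBefore(demoNames,demoPattern));

% List the resulting label categories
disp(categories(demoLabels))
```

The text before the keyword is the signal category, so each file maps to one of three categorical labels.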

Set Up Feature Extraction Objects

Set up the feature extractors that extract multidomain features from the signals. Use these features to implement machine learning and deep learning solutions that classify signals as healthy, as having inner race faults, or as having outer race faults [2].

Use the signalTimeFeatureExtractor, signalFrequencyFeatureExtractor, and signalTimeFrequencyFeatureExtractor objects to extract features from all the signals.

  • For time domain, use root-mean-square value, impulse factor, standard deviation, and clearance factor as features.

  • For frequency domain, use median frequency, band power, power bandwidth, and peak amplitude of the power spectral density (PSD) as features.

  • For time-frequency domain, use spectral kurtosis [3] of the signal spectrogram as a feature.

Create a signalTimeFeatureExtractor object to extract time-domain features using the sample rate fs.

fs = 48828;

timeFE = signalTimeFeatureExtractor(SampleRate=fs, ...
    RMS=true, ...
    ImpulseFactor=true, ...
    StandardDeviation=true, ...
    ClearanceFactor=true);
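As a rough illustration of what these time-domain features measure, the following sketch computes them directly for a short hypothetical vector xDemo. It uses the commonly cited definitions of impulse factor and clearance factor; it is assumed, not confirmed by this example, that the extractor follows the same definitions.

```matlab
% Hypothetical short signal used only for illustration
xDemo = [1 -2 3 -4];

% Root-mean-square value
rmsDemo = sqrt(mean(xDemo.^2));

% Standard deviation
stdDemo = std(xDemo);

% Impulse factor: peak magnitude divided by the mean magnitude
impulseDemo = max(abs(xDemo))/mean(abs(xDemo));

% Clearance factor: peak magnitude divided by the squared mean of the
% square roots of the magnitudes
clearanceDemo = max(abs(xDemo))/mean(sqrt(abs(xDemo)))^2;
```

Impulse factor and clearance factor both grow when a signal contains isolated large peaks, which is why they are often used as indicators of impacts caused by bearing defects.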

Create a signalFrequencyFeatureExtractor object to extract frequency-domain features.

freqFE = signalFrequencyFeatureExtractor(SampleRate=fs, ...
    MedianFrequency=true, ...
    BandPower=true, ...
    PowerBandwidth=true, ...
    PeakAmplitude=true);
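To build intuition for some of the frequency-domain features, this sketch computes comparable quantities for a hypothetical 1 kHz sine wave using the Signal Processing Toolbox functions medfreq, bandpower, and periodogram. The test signal and the variable names are assumptions made for illustration, and the PSD estimator settings that the extractor uses internally may differ from the defaults used here.

```matlab
% Hypothetical one-second, 1 kHz test tone sampled at the example rate
fsDemo = 48828;
tDemo = (0:fsDemo-1).'/fsDemo;
xDemo = sin(2*pi*1000*tDemo);

% Median frequency of the power spectrum, in Hz
fMedian = medfreq(xDemo,fsDemo);

% Average power of the signal
pBand = bandpower(xDemo);

% Location and value of the largest PSD estimate
[pxx,f] = periodogram(xDemo,[],[],fsDemo);
[peakPSD,peakIdx] = max(pxx);
fPeak = f(peakIdx);
```

For a pure tone, the median frequency and the PSD peak both sit near the tone frequency, and the average power is close to half the squared amplitude.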

Create a signalTimeFrequencyFeatureExtractor object to extract time-frequency-domain features.

timeFreqFE = signalTimeFrequencyFeatureExtractor(SampleRate=fs, ...
    TimeSpectrum=true);

setExtractorParameters(timeFreqFE,"scalogram", ...
    VoicesPerOctave=16,FrequencyLimits=[50 20000]);

Train SVM Classifier Using Multidomain Features

Extract Multidomain Features

Extract multidomain features using a single CPU worker and a parallel pool of CPU workers and measure the computation times.

Extract features using a single CPU worker.

tStart = tic;
SVMFeatures = cellfun(@(a,b,c) [a b c],extract(timeFE,sds),extract(freqFE,sds), ...
    extract(timeFreqFE,sds),UniformOutput=false);
tCPU = toc(tStart);

Repeat the process using a parallel pool of CPU workers. Set the UseParallel flag in the extract functions of all the feature extractors to true. Start a parallel pool of workers before you start measuring the computation time because the parallel pool can take some time to start.

if isempty(gcp("nocreate"))
    parpool("processes");
end
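The attainable speedup depends on how many workers the pool provides. As a small sketch, you can query the current pool without starting a new one. The variable names demoPool and numWorkers are illustrative.

```matlab
% Query the current pool; "nocreate" prevents a new pool from starting
demoPool = gcp("nocreate");

if isempty(demoPool)
    numWorkers = 0;
else
    numWorkers = demoPool.NumWorkers;
end

fprintf("Parallel pool workers available: %d\n",numWorkers)
```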

Obtain multidomain features using a parallel pool of workers.

tStart = tic;
[~] = cellfun(@(a,b,c) [a b c], extract(timeFE,sds,UseParallel=true), ...
    extract(freqFE,sds,UseParallel=true),extract(timeFreqFE,sds,UseParallel=true), ...
    UniformOutput=false);
tPool = toc(tStart);

Compare the run times to see the increase in speed you get when you use a parallel pool of CPU workers for feature extraction.

bar(["CPU" "Parallel Pool of Workers"],[tCPU tPool],0.8,FontSize=12, ...
    Labels = ["" (num2str(round(tCPU/tPool,1))+"x faster")])
title("Feature Extraction Time: CPU vs. Parallel Pool of Workers")
ylabel("Run Time (seconds)")

Train SVM Classifier Model

Use the multidomain features to train a multiclass SVM classifier and observe the classification accuracy.

Obtain the feature matrix from the cell array of multidomain features.

featureMatrix = cell2mat(SVMFeatures);
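The following toy sketch shows how the cellfun and cell2mat steps combine per-signal features into one matrix. It assumes, as the cellfun call in this example implies, that each extractor returns one cell per signal. The cell arrays a, b, and c are hypothetical stand-ins for the outputs of the three extractors for two signals.

```matlab
% Hypothetical per-signal feature rows from three extractors (two signals)
a = {[1 2]; [3 4]};            % stand-in for time-domain features
b = {5; 6};                    % stand-in for frequency-domain features
c = {[7 8 9]; [10 11 12]};     % stand-in for time-frequency features

% Concatenate the three feature rows of each signal
combined = cellfun(@(x,y,z) [x y z],a,b,c,UniformOutput=false);

% Stack the rows so that each signal becomes one row of the matrix
demoMatrix = cell2mat(combined);
disp(size(demoMatrix))
```

Each row of the resulting matrix holds all features of one signal, which is the layout that the partitioning and training steps expect.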

Split the feature matrix into training and testing feature data sets. Obtain their corresponding labels. Reset the random number generator for reproducible results.

rng("default")
cvp = cvpartition(labels,Holdout=0.25);

trainingResponse = labels(cvp.training);
testResponse = labels(cvp.test);

trainMatrix = featureMatrix(cvp.training,:);
testMatrix = featureMatrix(cvp.test,:);
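When you pass grouping labels to cvpartition, the holdout split is stratified by default, so each class appears in both partitions in roughly the original proportions. This sketch checks that behavior on hypothetical labels. The class counts in demoLabels are made up for illustration and do not reflect the data set.

```matlab
rng("default")

% Hypothetical labels with made-up class counts
demoLabels = categorical([repmat("Healthy",8,1); ...
    repmat("InnerRace",12,1); repmat("OuterRace",12,1)]);

% Hold out 25% of the observations for testing
demoCvp = cvpartition(demoLabels,Holdout=0.25);

% Count the observations of each class in both partitions
trainCounts = countcats(demoLabels(training(demoCvp)));
testCounts = countcats(demoLabels(test(demoCvp)));
disp([trainCounts testCounts])
```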

Compute the mean and the standard deviation of the training feature matrix. Use the results to normalize the training and testing feature matrices. Using statistics from only the training features prevents testing data from leaking into the training process, and normalizing ensures that all the features have the same weight.

[trainMatrixNorm,testMatrixNorm] = helperGetNormalizedSVMFeatureMatrices(trainMatrix,testMatrix);
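As a small worked example of this normalization, the following sketch standardizes hypothetical training and testing matrices using only the training statistics. The matrices trainDemo and testDemo are illustrative values, not features from the data set.

```matlab
% Hypothetical feature matrices: rows are observations, columns are features
trainDemo = [1 10; 3 30; 5 50];
testDemo = [3 20];

% Statistics come from the training data only
muDemo = mean(trainDemo,1);        % [3 30]
sigmaDemo = std(trainDemo,0,1);    % [2 20]

% Apply the same shift and scale to both sets
trainDemoNorm = (trainDemo - muDemo)./sigmaDemo;
testDemoNorm = (testDemo - muDemo)./sigmaDemo;
```

After normalization, each training column has zero mean and unit standard deviation, while the test row is expressed in the same units without influencing the statistics.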

Obtain training and testing predictors from the normalized training and testing matrices.

trainingPredictors = array2table(trainMatrixNorm);
testPredictors = array2table(testMatrixNorm);

Use the training predictors to train an SVM classifier using a single CPU worker.

SVMModel = fitcecoc(trainingPredictors,trainingResponse);

Use the normalized test features to analyze the accuracy of the SVM classifier.

predictedLabels = predict(SVMModel,testMatrixNorm);

figure
cm = confusionchart(testResponse,predictedLabels, ...
    ColumnSummary="column-normalized",RowSummary="row-normalized");

Calculate the classifier accuracy.

accuracy = trace(cm.NormalizedValues)/sum(cm.NormalizedValues,"all");
fprintf("The classification accuracy on the test partition is %2.1f%%",accuracy*100)
The classification accuracy on the test partition is 100.0%
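The accuracy is the fraction of test observations on the diagonal of the confusion matrix, which equals the fraction of predictions that match the true labels. This sketch confirms the equivalence on hypothetical labels. The vectors trueDemo and predDemo are made up for illustration.

```matlab
% Hypothetical true and predicted labels that share one set of categories
cats = ["Healthy" "InnerRace" "OuterRace"];
trueDemo = categorical(["Healthy" "InnerRace" "OuterRace" "InnerRace"],cats);
predDemo = categorical(["Healthy" "InnerRace" "InnerRace" "InnerRace"],cats);

% Accuracy as the fraction of matching labels
accDirect = mean(predDemo == trueDemo);

% Accuracy from the confusion matrix diagonal
cmDemo = confusionmat(trueDemo,predDemo);
accFromCm = trace(cmDemo)/sum(cmDemo,"all");
```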

Train LSTM Network Using Features

Set Up Feature Extraction Objects for Training LSTM Network

Each signal in the signalDatastore object sds has around 150,000 samples. Window each signal into 2000-sample frames and extract multidomain features from each frame. Set FrameSize for all three feature extractors to 2000 to achieve the signal framing.

timeFE.FrameSize = 2000;
freqFE.FrameSize = 2000;
timeFreqFE.FrameSize = 2000;
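To illustrate the effect of framing, this sketch splits a hypothetical 150,000-sample signal into 2000-sample frames and computes one RMS value per frame. It assumes nonoverlapping frames and that any trailing incomplete frame is discarded; check the FrameOverlapLength and IncompleteFrameRule properties of the extractors for the settings actually in use.

```matlab
rng("default")

% Hypothetical signal with roughly the length of the signals in the data set
numSamples = 150000;
frameSize = 2000;
xDemo = randn(numSamples,1);

% Number of complete, nonoverlapping frames
numFrames = floor(numSamples/frameSize);

% Arrange the signal so that each column is one frame
frames = reshape(xDemo(1:numFrames*frameSize),frameSize,numFrames);

% One feature value per frame gives a short sequence over time
frameRMS = sqrt(mean(frames.^2,1)).';
disp(size(frameRMS))
```

Under these assumptions, a 150,000-sample signal reduces to a sequence of 75 feature vectors, which is much shorter than the raw signal.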

Features extracted from frames correspond to a sequence of features over time that has lower dimension than the original signal. The dimension reduction helps the LSTM network to train faster. The workflow in this section follows these steps:

  1. Split the signal datastore and labels into training and test sets.

  2. For each signal in the training and test sets, use all three feature extractor objects to extract features for multiple signal frames. Concatenate the multidomain features to obtain the feature matrix.

  3. Normalize the training and testing feature matrices.

  4. Train the recurrent deep learning network using the labels and feature matrices.

  5. Classify the signals using the trained network.

Split the labels into training and testing sets. Use 70% of the labels for the training set and the remaining 30% for the testing set. Use splitlabels to obtain the desired partition of the labels. This function ensures that each split contains label proportions similar to those of the entire data set. Obtain the corresponding datastore subsets from the signalDatastore object. Reset the random number generator for reproducible results.

rng("default")

splitIndices = splitlabels(labels,0.7,"randomized");

trainIdx = splitIndices{1};
trainLabels = labels(trainIdx);
testIdx = splitIndices{2};
testLabels = labels(testIdx);
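As a quick sanity check of the partition, this sketch runs splitlabels on hypothetical labels and verifies that the two index sets are disjoint and together cover every label. The label vector demoLabels is made up for illustration.

```matlab
rng("default")

% Hypothetical labels: 10 observations for each of three classes
demoLabels = categorical(repelem(["Healthy"; "InnerRace"; "OuterRace"],10));

% Split 70% for training and the remainder for testing
demoIdx = splitlabels(demoLabels,0.7,"randomized");
demoTrainIdx = demoIdx{1};
demoTestIdx = demoIdx{2};

% The index sets must not overlap and must cover all labels
noOverlap = isempty(intersect(demoTrainIdx,demoTestIdx));
coversAll = numel(demoTrainIdx) + numel(demoTestIdx) == numel(demoLabels);

% Class counts in each split
disp([countcats(demoLabels(demoTrainIdx)) countcats(demoLabels(demoTestIdx))])
```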

Obtain the training and testing signalDatastore subsets from sds. You extract multidomain features from the signals in these subsets.

trainDs = subset(sds,trainIdx); 
testDs = subset(sds,testIdx);

Extract and Normalize Multidomain Features

Measure the speedup from using a parallel pool for LSTM feature extraction. First, extract the training and testing features using a single CPU worker.

tStart = tic;
trainFeaturesCPU = cellfun(@(a,b,c) [a b c], ...
    extract(timeFE,trainDs),extract(freqFE,trainDs),extract(timeFreqFE,trainDs), ...
    UniformOutput=false);
testFeaturesCPU = cellfun(@(a,b,c) [a b c], ...
    extract(timeFE,testDs),extract(freqFE,testDs),extract(timeFreqFE,testDs), ...
    UniformOutput=false);
tCPU_LSTM = toc(tStart);

Obtain multidomain training and testing features from the signalDatastore subsets using a parallel pool of workers.

tStart = tic;
trainFeatures = cellfun(@(a,b,c) [a b c], ...
    extract(timeFE,trainDs,UseParallel=true), ...
    extract(freqFE,trainDs,UseParallel=true), ...
    extract(timeFreqFE,trainDs,UseParallel=true), ...
    UniformOutput=false);

testFeatures = cellfun(@(a,b,c) [a b c], ...
    extract(timeFE,testDs,UseParallel=true), ...
    extract(freqFE,testDs,UseParallel=true), ...
    extract(timeFreqFE,testDs,UseParallel=true), ...
    UniformOutput=false);

tPool_LSTM = toc(tStart);

Compare the run times to see the increase in speed you get when you use a parallel pool of CPU workers for feature extraction.

bar(["CPU" "Parallel Pool of Workers"],[tCPU_LSTM tPool_LSTM],0.8,FontSize=12, ...
    Labels = ["" (num2str(round(tCPU_LSTM/tPool_LSTM,1))+"x faster")])
title("Feature Extraction Time: CPU vs. Parallel Pool of Workers")
ylabel("Run Time (seconds)")

Normalize trainFeatures and testFeatures using the trainFeatures statistics.

[trainFeaturesNorm,testFeaturesNorm] = ...
    helperGetNormalizedLSTMFeatureMatrices(trainFeatures,testFeatures);
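For sequence data, the normalization statistics are pooled over all frames of all training sequences and then applied to every sequence. This sketch mirrors that idea on hypothetical cell arrays. The names demoTrain, demoTest, and applyNorm are illustrative and are not part of the helper function.

```matlab
% Hypothetical feature sequences: each cell is numFrames-by-numFeatures
demoTrain = {[1 10; 3 30]; [5 50; 7 70; 9 90]};
demoTest = {[5 50; 5 50]};

% Pool all training frames to compute the statistics
pooled = cell2mat(demoTrain);
muDemo = mean(pooled,1);           % [5 50]
sigmaDemo = std(pooled,0,1);
sigmaDemo(sigmaDemo == 0) = 1;     % guard against constant features

% Apply the same shift and scale to every sequence
applyNorm = @(seq) (seq - muDemo)./sigmaDemo;
demoTrainNorm = cellfun(applyNorm,demoTrain,UniformOutput=false);
demoTestNorm = cellfun(applyNorm,demoTest,UniformOutput=false);
```

The sequences keep their original lengths, so sequences with different numbers of frames can share one set of statistics.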

Train LSTM Network

Train an LSTM network using the training features and their corresponding labels.

numFeatures = size(trainFeatures{1},2);
numClasses = 3;
 
layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(50,OutputMode="last")
    fullyConnectedLayer(numClasses)
    softmaxLayer];
 
options = trainingOptions("adam", ...
    Shuffle="every-epoch", ...    
    Plots="training-progress", ...
    ExecutionEnvironment="cpu", ...
    MaxEpochs=80, ...
    Verbose=false);

net = trainnet(trainFeaturesNorm,trainLabels,layers,"crossentropy",options);

Use the trained network to classify the signals in the test data set and analyze the accuracy of the network.

scores = minibatchpredict(net,testFeaturesNorm);
classNames = categories(labels);
predTest = scores2label(scores,classNames);
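Conceptually, converting scores to labels amounts to picking the class with the highest score for each observation. This sketch reproduces that idea in plain MATLAB on hypothetical scores. It is an illustration of the concept and not the implementation of scores2label.

```matlab
% Hypothetical softmax scores: one row per observation, one column per class
demoScores = [0.7 0.2 0.1; 0.1 0.3 0.6];
demoClasses = ["Healthy" "InnerRace" "OuterRace"];

% Index of the highest score in each row
[~,maxIdx] = max(demoScores,[],2);

% Map the indices to class names
demoPred = categorical(demoClasses(maxIdx),demoClasses).';
disp(demoPred)
```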
 
figure
cm = confusionchart(testLabels,predTest, ...
    ColumnSummary="column-normalized",RowSummary="row-normalized");

References

[1] Riaz, Saleem, Hassan Elahi, Kashif Javaid, and Tufail Shahzad. “Vibration Feature Extraction and Analysis for Fault Diagnosis of Rotating Machinery - A Literature Survey.” Asia Pacific Journal of Multidisciplinary Research 5, no. 1 (2017): 103–110.

[2] Caesarendra, Wahyu, and Tegoeh Tjahjowidodo. “A Review of Feature Extraction Methods in Vibration-Based Condition Monitoring and Its Application for Degradation Trend Estimation of Low-Speed Slew Bearing.” Machines 5, no. 4 (December 2017): 21. https://doi.org/10.3390/machines5040021

[3] Tian, Jing, Carlos Morillo, Michael H. Azarian, and Michael Pecht. “Motor Bearing Fault Detection Using Spectral Kurtosis-Based Feature Extraction Coupled With K-Nearest Neighbor Distance Analysis.” IEEE Transactions on Industrial Electronics 63, no. 3 (March 2016): 1793–1803. https://doi.org/10.1109/TIE.2015.2509913

Helper Functions

helperGetNormalizedSVMFeatureMatrices – This function normalizes the training and test feature matrices using the mean and standard deviation of the training feature matrix. These normalized features are going to be used for training an SVM model.

function [trainMatrixNorm,testMatrixNorm] = helperGetNormalizedSVMFeatureMatrices(trainMatrix,testMatrix)
% Compute normalization parameters from training data
featureMean = mean(trainMatrix, 1);
featureStd = std(trainMatrix, 0, 1);

% Handle zero-variance features to avoid division by zero
featureStd(featureStd == 0) = 1;

% Normalize both using TRAINING parameters
trainMatrixNorm = (trainMatrix - featureMean) ./ featureStd;
testMatrixNorm = (testMatrix - featureMean) ./ featureStd;
end

helperGetNormalizedLSTMFeatureMatrices – This function normalizes the training and test feature matrices using the mean and standard deviation of the training feature matrix. The normalized features are going to be used for training an LSTM network.

function [trainFeaturesNorm,testFeaturesNorm] = helperGetNormalizedLSTMFeatureMatrices(trainFeatures,testFeatures)
% Compute normalization parameters from training data
trainMatrix = cell2mat(trainFeatures);
featureMean = mean(trainMatrix, 1);
featureStd = std(trainMatrix, 0, 1);

% Handle zero-variance features
zeroVarIdx = featureStd == 0;
featureStd(zeroVarIdx) = 1;  % Avoid division by zero

% Normalize training sequences
trainFeaturesNorm = cell(size(trainFeatures));
for i = 1:numel(trainFeatures)
    trainFeaturesNorm{i} = (trainFeatures{i} - featureMean) ./ featureStd;
end

if nargin == 2
    % Normalize test sequences using TRAINING parameters
    testFeaturesNorm = cell(size(testFeatures));
    for i = 1:numel(testFeatures)
        testFeaturesNorm{i} = (testFeatures{i} - featureMean) ./ featureStd;
    end
else
    testFeaturesNorm = [];
end
end
