How can I read .ogg audio datasets for training and applying LSTM in Matlab according to the following code?

I have MATLAB code that performs the following steps to train and apply an LSTM network. I need to change the first three steps so that the model is trained on our own dataset, which consists of .ogg audio files. Please create and use some .ogg audio files as sample data and give me the code.
The following steps are for your information:
1. Three classes of audio signals are generated and labeled as 'white', 'brown', and 'pink'. Each class has 1000 samples.
2. 800 samples from each class are used as the training samples to train the deep neural network, so there are 800*3 = 2400 samples in the training dataset. Their labels are their class names 'white', 'brown', and 'pink'. (Lines 29 and 30)
3. 200 samples from each class are used as the validation samples to test the performance of the deep neural network, so there are 600 samples in the validation dataset. Their labels are their class names 'white', 'brown', and 'pink'. (Lines 32 and 33)
4. Extract features from the training dataset and the validation dataset.
5. Define the structure of the neural network model (LSTM).
6. Set the training options.
7. Train the model iteratively using the training dataset, and test the model using the validation dataset at every iteration.
8. Finish training and get the trained model.
9. Generate a test dataset and use the trained model to classify it into the three classes 'white', 'brown', and 'pink'.
Our dataset has 2 classes, 'normal' and 'anomaly', instead of the three classes 'white', 'brown', and 'pink' used in this example. We know which signals are normal and which are anomalous, so the class of each signal is known and no labeling is needed. You can separate our data into three parts: for example, 80% of all normal and anomaly signals for training (2 classes), 10% for validation, and 10% for testing.
code: '''
fs = 44.1e3;        % sample rate in Hz
duration = 0.5;     % length of each signal in seconds
N = duration*fs;    % number of samples per signal

% Generate 1000 signals per class, one signal per column.
wNoise = 2*rand([N,1000]) - 1;
wLabels = repelem(categorical("white"),1000,1);
bNoise = filter(1,[1,-0.999],wNoise);
bNoise = bNoise./max(abs(bNoise),[],'all');
bLabels = repelem(categorical("brown"),1000,1);
pNoise = pinknoise([N,1000]);
pLabels = repelem(categorical("pink"),1000,1);

% Listen to and plot the first signal of each class.
sound(wNoise(:,1),fs)
melSpectrogram(wNoise(:,1),fs)
title('White Noise')
sound(bNoise(:,1),fs)
melSpectrogram(bNoise(:,1),fs)
title('Brown Noise')
sound(pNoise(:,1),fs)
melSpectrogram(pNoise(:,1),fs)
title('Pink Noise')

% Split each class: first 800 signals for training, last 200 for validation.
audioTrain = [wNoise(:,1:800),bNoise(:,1:800),pNoise(:,1:800)];
labelsTrain = [wLabels(1:800);bLabels(1:800);pLabels(1:800)];
audioValidation = [wNoise(:,801:end),bNoise(:,801:end),pNoise(:,801:end)];
labelsValidation = [wLabels(801:end);bLabels(801:end);pLabels(801:end)];

% Define the feature extractor before it is used.
aFE = audioFeatureExtractor("SampleRate",fs, ...
    "SpectralDescriptorInput","melSpectrum", ...
    "spectralCentroid",true, ...
    "spectralSlope",true);

% Extract training features and convert them to a cell array of
% numFeatures-by-numHops sequences, one cell per signal.
featuresTrain = extract(aFE,audioTrain);
[numHopsPerSequence,numFeatures,numSignals] = size(featuresTrain)
featuresTrain = permute(featuresTrain,[2,1,3]);
featuresTrain = squeeze(num2cell(featuresTrain,[1,2]));
numSignals = numel(featuresTrain)
[numFeatures,numHopsPerSequence] = size(featuresTrain{1})

% Same processing for the validation set.
featuresValidation = extract(aFE,audioValidation);
featuresValidation = permute(featuresValidation,[2,1,3]);
featuresValidation = squeeze(num2cell(featuresValidation,[1,2]));

% Network: sequence input, LSTM, then one output per class.
layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(50,"OutputMode","last")
    fullyConnectedLayer(numel(unique(labelsTrain)))
    softmaxLayer
    classificationLayer];

options = trainingOptions("adam", ...
    "Shuffle","every-epoch", ...
    "ValidationData",{featuresValidation,labelsValidation}, ...
    "Plots","training-progress", ...
    "Verbose",false);

net = trainNetwork(featuresTrain,labelsTrain,layers,options);

% Classify one newly generated test signal per class.
wNoiseTest = 2*rand([N,1]) - 1;
classify(net,extract(aFE,wNoiseTest)')
bNoiseTest = filter(1,[1,-0.999],wNoiseTest);
bNoiseTest = bNoiseTest./max(abs(bNoiseTest),[],'all');
classify(net,extract(aFE,bNoiseTest)')
pNoiseTest = pinknoise(N);
classify(net,extract(aFE,pNoiseTest)')
'''

Answers (1)

Kiran Felix Robert on 8 Oct 2020
Hi Pooyan,
The audioread function can be used to read .ogg files.
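Since the question also asks for sample .ogg files, here is a minimal sketch that writes a few synthetic ones with audiowrite and reads one back with audioread. The scratch folder, the file count, and the signal content (low-level noise for "normal", the same noise plus a 3 kHz tone for "anomaly") are placeholders chosen only for illustration; they are not taken from your dataset. The sample rate and duration match the values in your code.

```matlab
fs = 44.1e3;                 % sample rate in Hz (same as in the question)
duration = 0.5;              % seconds per file (same as in the question)
N = round(duration*fs);      % samples per file
numFiles = 3;                % hypothetical small count for the demonstration
dataFolder = tempname;       % hypothetical scratch folder
mkdir(dataFolder);

t = (0:N-1)'/fs;             % time axis as a column vector
for i = 1:numFiles
    x = 0.1*randn(N,1);                 % placeholder "normal" signal
    y = x + 0.5*sin(2*pi*3000*t);       % placeholder "anomaly" signal
    audiowrite(fullfile(dataFolder,sprintf('normal_%d.ogg',i)),x,fs);
    audiowrite(fullfile(dataFolder,sprintf('anomaly_%d.ogg',i)),y,fs);
end

% Read one file back. audioread returns the samples (one column per
% channel) and the sample rate stored in the file.
[signal,fsRead] = audioread(fullfile(dataFolder,'normal_1.ogg'));
info = audioinfo(fullfile(dataFolder,'normal_1.ogg'));
```

Ogg Vorbis is a lossy format, so the samples read back will not be bit-identical to the ones written. The info struct reports SampleRate, NumChannels, and TotalSamples if you want to check a file before loading it.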
The code uses 3 classes, each containing 1000 signals, and each signal has N = 22050 samples (0.5 s at 44.1 kHz) for your deep learning application.
Similarly, assuming you have 100 audio files of 1000 samples each for both the normal and the anomaly class, you can use a loop to read the files and then split the data for training, validation, and testing.
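Real recordings may not all have exactly the assumed length, and some may be stereo. In that case, one possible approach (an assumption on my side, not something your data necessarily needs) is to mix each recording down to mono and then zero-pad or trim it to a fixed length before storing it. The two placeholder recordings below are random data standing in for the output of audioread.

```matlab
numSamples = 1000;      % target length assumed in this answer

% Placeholder recordings (hypothetical): one long stereo, one short mono.
recordings = {randn(1500,2), randn(600,1)};

signals = zeros(numSamples,numel(recordings));   % one signal per column
for k = 1:numel(recordings)
    x = mean(recordings{k},2);        % average the channels to get mono
    x(end+1:numSamples,1) = 0;        % zero-pad when shorter than the target
    signals(:,k) = x(1:numSamples);   % trim when longer than the target
end
```

Storing one signal per column matches the layout of wNoise, bNoise, and pNoise in your code, so the result can be passed to extract(aFE,...) in the same way.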
The following code shows an example (assuming you have named the files normal_1.ogg, normal_2.ogg, …, normal_100.ogg and anomaly_1.ogg, anomaly_2.ogg, …, anomaly_100.ogg):
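numFiles = 100;
numSamples = 1000;
normal = zeros(numSamples,numFiles);     % one signal per column
anomaly = zeros(numSamples,numFiles);
for i = 1:numFiles
normal_name = strcat('normal_',num2str(i),'.ogg');
anomaly_name = strcat('anomaly_',num2str(i),'.ogg');
% Read the first numSamples samples; fs is the sample rate stored in the file.
[x,fs] = audioread(normal_name,[1 numSamples]);
normal(:,i) = x(:,1);                    % keep the first channel only
[x,fs] = audioread(anomaly_name,[1 numSamples]);
anomaly(:,i) = x(:,1);
end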
The arrays above can then be split into training, validation, and test sets as required.
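A minimal sketch of the 80/10/10 split mentioned in the question, assuming each class is stored with one signal per column. The random arrays are placeholders standing in for the data read from the .ogg files, and audioTest and labelsTest are hypothetical names added here; the other output names match the variables used in your code.

```matlab
% Placeholder data (hypothetical): one signal per column.
numSamples = 1000;
numFiles = 100;
normal = randn(numSamples,numFiles);
anomaly = randn(numSamples,numFiles);

% Shuffle the columns of each class so every part is a random draw.
rng(0)                                   % fixed seed, only for repeatability
normal = normal(:,randperm(numFiles));
anomaly = anomaly(:,randperm(numFiles));

% Column ranges for an 80/10/10 split within each class.
numTrain = round(0.8*numFiles);
numVal = round(0.1*numFiles);
idxTrain = 1:numTrain;
idxVal = numTrain+1:numTrain+numVal;
idxTest = numTrain+numVal+1:numFiles;

audioTrain = [normal(:,idxTrain),anomaly(:,idxTrain)];
audioValidation = [normal(:,idxVal),anomaly(:,idxVal)];
audioTest = [normal(:,idxTest),anomaly(:,idxTest)];

% Labels follow the column order: all normal signals, then all anomaly.
makeLabels = @(n) [repelem(categorical("normal"),n,1); ...
                   repelem(categorical("anomaly"),n,1)];
labelsTrain = makeLabels(numel(idxTrain));
labelsValidation = makeLabels(numel(idxVal));
labelsTest = makeLabels(numel(idxTest));
```

Splitting inside each class keeps the normal/anomaly ratio the same in all three parts. The feature extraction and network definition in your code should then work on audioTrain and audioValidation unchanged, since the output layer size is taken from numel(unique(labelsTrain)).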