Invalid training data. Responses must be nonempty.

Hello,
I am trying to build a simple network that will recognize gender from voice. I have many recordings. I read them into an audioDatastore, but I can't get them into sequenceInputLayer. I have tried everything. I know my network may not work well with these layers, but I just want to get it running first and then make it accurate. Every recording is longer than 6000 samples.
It gives me this error:
Error using trainNetwork (line 183)
Invalid training data. Responses must be nonempty.
Error in Program2 (line 31)
net = trainNetwork(audioTrain,layers, options)
clc;
close all;
clear all;
net = network
audio = audioDatastore(fullfile('E:\Projekt\M or F'), ...
    'IncludeSubfolders',true, ...
    'FileExtensions','.wav', ...
    'LabelSource','foldernames');
labelCount = countEachLabel(audio)
numTrainFiles = 1000;
[audioTrain,audioValidation] = splitEachLabel(audio,numTrainFiles,'randomize');
layers = [ ...
    sequenceInputLayer(6000)
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];
options = trainingOptions("adam", ...
    "MaxEpochs",4, ...
    "MiniBatchSize",256, ...
    "Plots","training-progress", ...
    "Verbose",false, ...
    "Shuffle","every-epoch", ...
    "LearnRateSchedule","piecewise", ...
    "LearnRateDropFactor",0.1, ...
    "LearnRateDropPeriod",1, ...
    "ValidationFrequency",100);
net = trainNetwork(audioTrain,layers, options)

Accepted Answer

jibrahim on 1 Mar 2021
Hi Martin,
You can't pass an audioDatastore directly to the network. Create a transform datastore that organizes the data into (audio,label) pairs.
The code below is a simple example where we try to recognize a speaker using an idea similar to yours. The accuracy is not good, but hopefully it is a good starting point.
If you have not done so already, I also recommend looking into the gender ID example in Audio Toolbox.
You might have better luck extracting features from the audio, rather than passing the raw audio to a network.
In any case, here is some example code:
% Download the FSDD data set
url = 'https://ssd.mathworks.com/supportfiles/audio/FSDD.zip';
datasetFolder = tempdir;
unzip(url,datasetFolder)
% Create datastore
% Use speaker name in file name as label
ads = audioDatastore(fullfile(datasetFolder,'FSDD'), ...
    'IncludeSubfolders',true);
[~,filenames] = fileparts(ads.Files);
ads.Labels = categorical(extractBetween(filenames,'_','_'));
[adsTrain,adsValidation] = splitEachLabel(ads,.9);
inputSize = 500;
numHiddenUnits = 100;
numClasses = length(unique(ads.Labels));
layers = [ ...
    sequenceInputLayer(inputSize)
    bilstmLayer(numHiddenUnits,"OutputMode","sequence")
    bilstmLayer(numHiddenUnits,"OutputMode","last")
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
% Transformed datastores to be passed directly to network
tdsTrain = transform(@(x,info)processData(x,inputSize,info),adsTrain,'IncludeInfo',true);
tdsValidation = transform(@(x,info)processData(x,inputSize,info),adsValidation,'IncludeInfo',true);
options = trainingOptions("adam", ...
    "MaxEpochs",4, ...
    "MiniBatchSize",256, ...
    "Plots","training-progress", ...
    "Verbose",false, ...
    "Shuffle","every-epoch", ...
    "LearnRateSchedule","piecewise", ...
    "LearnRateDropFactor",0.1, ...
    "LearnRateDropPeriod",1, ...
    "ValidationData",tdsValidation, ...
    "ValidationFrequency",100);
net = trainNetwork(tdsTrain,layers, options)
Here is the transform function I used:
function [data,info] = processData(audio,inputSize,info)
    % Break the audio into sequences of length inputSize with
    % overlap inputSize/2
    audio = buffer(audio,inputSize,floor(inputSize/2));
    % Convert each column (one sequence) into a cell array entry
    audio = mat2cell(audio,inputSize,ones(1,size(audio,2))).';
    % Replicate the file's label, one copy per sequence
    label = repmat(info.Label,size(audio,1),1);
    % trainNetwork expects predictors first, responses last
    data = table(audio,label);
end
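Regarding the feature-extraction suggestion above, here is a rough sketch of what the transform function could look like if you feed MFCC sequences instead of raw audio. This assumes Audio Toolbox's mfcc function with its default settings; processFeatures is a hypothetical name, and audioDatastore supplies info.SampleRate on each read:

```matlab
% Sketch: emit one MFCC feature sequence per file instead of raw audio.
% Requires Audio Toolbox. Default mfcc settings; parameters not tuned.
function [data,info] = processFeatures(audio,info)
    % mfcc returns frames in rows and coefficients in columns
    coeffs = mfcc(audio,info.SampleRate);
    % Sequence format for the network is numFeatures-by-numTimeSteps,
    % so transpose before wrapping in a cell
    data = table({coeffs.'},info.Label, ...
        'VariableNames',{'Features','Label'});
end
```

With this approach, the sequenceInputLayer input size must match the number of coefficients per frame (size(coeffs,2) here), not the raw sample count.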
