Retraining YAMNet for audio classification returns channel mismatch error in "deep.internal.train.Trainer/train"

I am retraining YAMNet for a binary classification task, operating on spectrograms of audio signals. My training audio has two classes, positive and negative. The audio is preprocessed and features are extracted using yamnetPreprocess(). When training the network, trainnet() produces the following error:
Error using deep.internal.train.Trainer/train (line 74)
Number of channels in predictions (2) must match the number of
channels in the targets (3).
Error in deep.internal.train.ParallelTrainer>iTrainWithSplitCommunicator (line 227)
remoteNetwork = train(remoteTrainer, remoteNetwork, workerMbq);
Error in deep.internal.train.ParallelTrainer/computeTraining (line 127)
spmd
Error in deep.internal.train.Trainer/train (line 59)
net = computeTraining(trainer, net, mbq);
Error in deep.internal.train.trainnet (line 54)
net = train(trainer, net, mbq);
Error in trainnet (line 42)
[net,info] = deep.internal.train.trainnet(mbq, net, loss, options, ...
Error in train_DenseNet_detector_from_semi_synthetic_dataset (line 192)
[trained_network, train_info] = trainnet(trainFeatures, trainLabels', net, "crossentropy", options);
My understanding of this error is that it indicates a mismatch between the number of classes the network expects, and the number of classes in the dataset. I do not see how this can be possible, considering the number of classes in the network is explicitly set by the number of classes in the datastore:
classNames = unique(ads.Labels);
numClasses = numel(classNames);
net = audioPretrainedNetwork("yamnet", NumClasses=numClasses);
My script is based on the audioPretrainedNetwork MATLAB tutorial, and there are no functional differences in the way I'm building the datastores or preprocessing the data. The training options and the call to trainnet() are configured as follows:
options = trainingOptions('adam', ...
InitialLearnRate = initial_learn_rate, ...
MaxEpochs = max_epochs, ...
MiniBatchSize = mini_batch_size, ...
Shuffle = "every-epoch", ...
Plots = "training-progress", ...
Metrics = "accuracy", ...
Verbose = 1, ...
ValidationData = {single(validationFeatures), validationLabels'}, ...
ValidationFrequency = validationFrequency,...
ExecutionEnvironment="parallel-auto");
[trained_network, train_info] = trainnet(trainFeatures, trainLabels', net, "crossentropy", options);
Relevant variable dimensions are as follows:
>> unique(ads.Labels)
ans =
2×1 categorical array
negative
positiveNoisy
>> size(trainLabels)
ans =
1 16240
>> size(trainFeatures)
ans =
96 64 1 16240
>> size(validationLabels)
ans =
1 6960
>> size(validationFeatures)
ans =
96 64 1 6960
The only real differences between my script and the MATLAB tutorial are that I'm using parallel execution in the training solver, and the datastore OutputEnvironment is set to "gpu". If I set ExecutionEnvironment = "auto" instead of "parallel-auto" and set ads.OutputEnvironment = 'cpu', the error stack is shorter, but the problem is the same:
Error using trainnet (line 46)
Number of channels in predictions (2) must match the number of channels in
the targets (3).
Error in train_DenseNet_detector_from_semi_synthetic_dataset (line 189)
[trained_network, train_info] = trainnet(trainFeatures, trainLabels', net, "crossentropy", options);
Please could someone give me some advice? The root cause of this is buried in the Deep Learning Toolbox, and it's a little beyond me right now.
Thanks,
Ben

Accepted Answer

Joss Knight
Joss Knight on 3 Oct 2024
I think the issue will be that your label data is a categorical type with three categories. Run
categories(trainLabels)
to confirm. You might need to delete the unused category using removecats.
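A minimal sketch of that check and fix, assuming the trainLabels and validationLabels variables from the question (removecats with no category list drops every category that has no remaining observations):
% List the categories carried by the label array -- this can include more
% categories than are actually present in the data
categories(trainLabels)
% Drop unused categories before calling trainnet
trainLabels = removecats(trainLabels);
validationLabels = removecats(validationLabels);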

More Answers (3)

Joss Knight
Joss Knight on 3 Oct 2024
It looks like your network is returning output with three channels instead of two. Could you try running analyzeNetwork(net) to see what it is outputting?

Ben
Ben on 3 Oct 2024
Hi Joss, thanks for your speedy reply.
The last layer is a softmax layer with activations 2(C) x 1(B) and zero learnables. The complete dlnetwork analysis is attached.
It all looks correct, no?
  4 Comments
Ben
Ben on 3 Oct 2024
>> Zpredict = predict(net, trainFeatures(:,:,:,1:miniBatchSize));
Error using dlnetwork/predict (line 658)
Uninitialized dlnetwork object. Use the initialize function to initialize
network before calling predict.
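A minimal sketch of one way past this particular error, assuming the network's input layer defines the 96-by-64-by-1 spectrogram size and using the same trainFeatures and miniBatchSize variables as above:
% Initialize the learnable parameters; with an input layer that defines
% the input size, no example inputs are needed
net = initialize(net);
% Predict on one mini-batch of features, formatted spatial-spatial-channel-batch
X = dlarray(single(trainFeatures(:,:,:,1:miniBatchSize)), "SSCB");
scores = predict(net, X);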
I have been trying to modify YAMNet manually, rather than relying on audioPretrainedNetwork(), and I think I've found a bug in subset()...
My audioDatastore is created from a folder which has three sub-directories called "negative", "positiveClean" and "positiveNoisy", and uses those folder names as labels. The files in each folder also contain these labels in their file names (except they are all lower case, with an underscore in the middle).
This script has two modes, one of which does not use the data in the folder "positiveClean". Currently I am removing that data with the subset() function, as per the example in the subset documentation:
% Build audioDataStore object containing training datasets
ads = audioDatastore(trainingDataPath, "IncludeSubfolders", true,...
"FileExtensions",".wav", "LabelSource", "foldernames", ...
OutputDataType="single");
% Choose which audio files to include based on the script mode:
if mode == 1
% Get logical index for all filenames that do NOT contain 'positive_clean'
NotPositiveClean = cellfun(@(c) ~contains(c, 'positive_clean'), ads.Files);
% Create a subset of the datastore containing all files that are not 'positiveClean'
ads = subset(ads, NotPositiveClean);
disp("Training for detection tasks only. Excluding 'positiveClean' folder.")
elseif mode == 2
disp("Training for both detection and denoising tasks...")
end
% Split the full dataset into training data and validation data.
split = [trainPercentage/100, (100-trainPercentage)/100];
[ads_train, ads_validation] = splitEachLabel(ads, split(1), split(2));
I am then preprocessing, then modifying the network as follows:
% Load the pretrained network
[net, ~] = audioPretrainedNetwork("yamnet");
% Convert the network to a layer graph
lgraph = layerGraph(net);
% Remove the last layers
layersToRemove = {'dense', 'softmax'};
lgraph = removeLayers(lgraph, layersToRemove);
% Create a classification layer
newClassLayer = classificationLayer('Name', 'new_classoutput', 'Classes', classNames)
The "newClassLayer" contains 3 labels, despite being built directly from "classNames", which appears to contain only two classes:
>> newClassLayer
newClassLayer =
ClassificationOutputLayer with properties:
Name: 'new_classoutput'
Classes: [negative positiveClean positiveNoisy]
ClassWeights: 'none'
OutputSize: 3
Hyperparameters
LossFunction: 'crossentropyex'
>> classNames
classNames =
2×1 categorical array
negative
positiveNoisy
Remembering here that classNames is built from ads_train.
classNames = unique(ads_train.Labels);
numClasses = numel(classNames);
Something is very fishy here...
Joss Knight
Joss Knight on 3 Oct 2024
Edited: Joss Knight on 3 Oct 2024
Yes, I see. It's not enough just to remove instances of one of the classes from the data, because that class is still one of the label categories. You are going to need to remove that category from your targets using removecats; see my other Answer.
To simplify:
% Create label data with 3 classes
randomLabels = categorical(randi(3, 1, 100));
mycats = categories(randomLabels)   % 3 categories: '1', '2' and '3'
%   mycats = 3x1 cell array: {'1'} {'2'} {'3'}
% Remove all the '2's
randomLabels(randomLabels==mycats(2)) = [];
mycats = categories(randomLabels)   % Still 3 categories!
%   mycats = 3x1 cell array: {'1'} {'2'} {'3'}
% Remove the '2' category from the data
randomLabels = removecats(randomLabels, mycats(2));
mycats = categories(randomLabels)   % Now there's only '1' and '3'
%   mycats = 2x1 cell array: {'1'} {'3'}



Ben
Ben on 4 Oct 2024
Edited: Ben on 4 Oct 2024
Ok great, thank you Joss.
I have resolved my issue by using removecats() inside the conditional statement that removes my unused data from the dataset.
% Get logical index for all the files we want to keep
keepIdx = cellfun(@(c)...
~contains(c, 'filename_substring_indicating_unwanted_file'), ads.Files);
% Create a subset of the datastore containing all files shown as "true" in keepIdx
ads_subset = subset(ads, keepIdx);
ads_subset.Labels = removecats(ads_subset.Labels, ...
"label_associated_with_removed_files");
This really does seem like a bug, or at the very least, an undocumented quirk of subset(). The example in the subset function's official documentation does not show this additional step being necessary, and removecats() is not referenced anywhere on that page.
The order of relevant operations in my code is as follows:
  • build "ads"
  • "ads" contains 600 files, 600 labels, 3 unique
  • categories(ads.Labels) = 3x1 cell array {'negative'}{'positiveClean'}{'positiveNoisy'}
  • Get indices "keepIdx" of files in "ads" with label {'negative'} or {'positiveNoisy'}
  • Create new datastore "ads_subset" using subset() and "keepIdx"
  • "ads_subset" contains 400 files, 400 labels, 2 unique
  • categories(ads_subset.Labels) = 3x1 cell array {'negative'}{'positiveClean'}{'positiveNoisy'}
and I think most would agree this is illogical and unexpected behaviour. Is it possible to log a bug report for this?
Additionally, to improve clarity on how to troubleshoot this kind of issue, could I suggest that the YAMNet Transfer Learning tutorial on this page set the network class size as:
numClasses = numel(categories(adsTrain.Labels));
net = audioPretrainedNetwork("yamnet",NumClasses=numClasses);
where currently, it shows:
classNames = unique(adsTrain.Labels);
numClasses = numel(classNames);
net = audioPretrainedNetwork("yamnet",NumClasses=numClasses);
Many thanks again for your help :)
  2 Comments
Joss Knight
Joss Knight on 4 Oct 2024
I'll pass on your comments.
I don't think this is quite as clear-cut as you make out. You have asked your underlying datastore to use the folder names as the label source; this information is gathered on construction of the original audioDatastore. subset() shouldn't make any assumptions about your choice of labels after that. You may have removed all the data from one class because you want to fine-tune your model to favour other classes, or for many other reasons. To put it another way: a model that accepts data from a datastore should also accept data from a subset of that datastore, but if subset() pruned any missing classes, it wouldn't.
Nevertheless you raise some interesting points, in particular your point about using numel(categories(...)) instead of unique is a very good one.
Thanks.

