Detect Music in Simulink Using YAMNet

The YAMNet network requires audio signals to be resampled to the rate the network was trained on (16 kHz) and then converted to overlapping mel spectrograms. The Sound Classifier block performs this preprocessing and feature extraction so that its input to the network matches what was used to train YAMNet.
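For readers who want to see the same preprocessing outside Simulink, the following is a minimal sketch, assuming an Audio Toolbox release that includes the yamnetPreprocess function. The synthetic tone, its duration, and the variable names are placeholders chosen for illustration; they are not part of this example's model.

```matlab
% Assumed input: two seconds of a synthetic 440 Hz tone at 44.1 kHz.
fs = 44100;                          % sample rate of the placeholder audio
t = (0:2*fs-1).'/fs;                 % time stamps as a column vector
audioIn = 0.5*sin(2*pi*440*t);       % placeholder mono signal

% Resample to 16 kHz and extract overlapping mel spectrograms.
features = yamnetPreprocess(audioIn,fs);

% The first two dimensions are spectrogram frames and mel bands;
% the fourth dimension indexes the overlapping spectrograms.
[numFrames,numBands,~,numSpectrograms] = size(features);
```

Each page of features along the fourth dimension is one mel spectrogram of the kind YAMNet expects as input.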

To use YAMNet, install a pretrained YAMNet network in a location on the MATLAB® path. If a pretrained network is not installed, run the yamnetGraph function, and the software provides a download link. Click the link and unzip the file to a location on the MATLAB path.

Alternatively, execute the following commands to download and unzip the YAMNet model to your temporary directory.

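% Download the zipped model to the temporary directory
downloadFolder = fullfile(tempdir,'YAMNetDownload');
loc = websave(downloadFolder,'https://ssd.mathworks.com/supportfiles/audio/yamnet.zip');
% Unzip the model into the temporary directory
YAMNetLocation = tempdir;
unzip(loc,YAMNetLocation)
% Add the unzipped yamnet folder to the MATLAB path
addpath(fullfile(YAMNetLocation,'yamnet'))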

Get all music sounds in the AudioSet ontology. The ontology covers a wide range of everyday sounds, including human and animal sounds, natural and environmental sounds, and musical and miscellaneous sounds. Use the yamnetGraph function to obtain a graph of the AudioSet ontology and a list of all sounds supported by YAMNet. The dfsearch function returns a vector of the sounds under 'Music' in the order of their discovery using depth-first search.

[ygraph, allSounds] = yamnetGraph;
musicSounds = dfsearch(ygraph,"Music");

Find the location of these musical sounds in the list of supported sounds.

[~,musicIndices] = intersect(allSounds,musicSounds);
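The two steps above can be tried on a small stand-in graph that needs no model download. This is only a sketch: the node names, toySounds, and the other variable names are hypothetical and do not reproduce the actual AudioSet ontology or the YAMNet class list.

```matlab
% Toy stand-in for the ontology graph (illustrative node names only).
parents  = ["Sounds" "Sounds" "Music" "Music"];
children = ["Music" "Speech" "Guitar" "Piano"];
toyGraph = digraph(parents,children);

% Toy stand-in for the list of supported sounds.
toySounds = ["Speech" "Music" "Guitar" "Dog" "Piano"];

% Depth-first search from "Music" returns "Music" and its descendants.
toyMusic = string(dfsearch(toyGraph,"Music"));

% intersect returns the common values in sorted order; the second
% output gives the position of each common value in toySounds.
[commonSounds,toyIndices] = intersect(toySounds,toyMusic);
```

Because intersect sorts the common values ("Guitar", "Music", "Piano"), toyIndices is [3;2;5] rather than being in ascending order. This ordering does not affect a later maximum over the selected scores.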

The detectMusic model detects the musical sounds in input audio. Open and run the model. The model starts by reading in an audio signal to classify using two From Multimedia File blocks. The first block reads in a musical sound signal and the second block reads in an ambiance signal that is not music. Both signals have a sample rate of 44100 Hz and contain 441 samples per channel. Using the Manual Switch (Simulink) block, you can choose one of the two signals.

The Sound Classifier block in the model outputs the scores and labels for the input audio. The Selector (Simulink) block picks the scores related to music using the vector of indices given by musicIndices. If the maximum value of these scores is greater than 0.2, then the input is considered to contain music. The Scope (Simulink) block plots the maximum value of the scores, and the Activation dial in the model shows this value as well. Using the Audio Device Writer block, confirm that you hear music when the plot shows a score greater than 0.2.

open_system("detectMusic.slx")
sim("detectMusic.slx")

close_system("detectMusic.slx",0)
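The decision logic that the model applies to the classifier scores can be sketched at the command line. The score values and index positions below are made up for illustration (YAMNet itself outputs 521 scores per prediction); only the 0.2 threshold comes from the model description.

```matlab
% Hypothetical scores for a five-class toy problem.
toyScores = [0.05 0.60 0.10 0.02 0.35];

% Assumed positions of the music-related classes in toyScores.
toyMusicIndices = [2; 5];

% Select the music scores and compare the largest one to the threshold.
threshold = 0.2;
musicScore = max(toyScores(toyMusicIndices));
isMusic = musicScore > threshold;
```

Here musicScore is 0.60, so isMusic is true. With real data, toyScores would be replaced by one row of network scores and toyMusicIndices by musicIndices.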
