YAMNet

YAMNet sound classification network

Since R2021b

  • YAMNet block

Libraries:
Audio Toolbox / Deep Learning

Description

The YAMNet block uses a pretrained sound classification network, trained on the AudioSet dataset [1], to predict audio events from the AudioSet ontology.
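At the MATLAB command line, you can sketch an equivalent workflow with the Audio Toolbox functions yamnetPreprocess and yamnet (a minimal sketch, assuming those functions are available in your release; the audio file name is hypothetical):

% Read audio, convert it to YAMNet mel spectrograms, and classify.
[audioIn,fs] = audioread("speech.wav");  % hypothetical audio file
S = yamnetPreprocess(audioIn,fs);        % 96-by-64-by-1-by-N mel spectrograms
net = yamnet;                            % pretrained YAMNet network
sounds = classify(net,S)                 % one predicted label per spectrogram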

Ports

Input

Mel spectrograms, specified as a 96-by-64 matrix or a 96-by-64-by-1-by-N array, where:

  • 96 –– Represents the number of 25 ms frames in each mel spectrogram

  • 64 –– Represents the number of mel bands spanning 125 Hz to 7.5 kHz

  • N –– Represents the number of channels

You can use the YAMNet Preprocess block to generate mel spectrograms with these dimensions.

Data Types: single | double
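As a quick check of these dimensions, the command-line counterpart of the YAMNet Preprocess block, yamnetPreprocess, returns arrays of exactly this shape (a sketch; the synthetic signal is an assumption for illustration):

fs = 16e3;                         % YAMNet operates on 16 kHz audio
audioIn = randn(3*fs,1,"single");  % 3 s of synthetic audio, illustration only
S = yamnetPreprocess(audioIn,fs);
size(S)                            % 96-by-64-by-1-by-N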

Output

Predicted sound label, returned as an enumerated scalar.

Data Types: enumerated

Predicted activation or score values for each supported sound label, returned as a 1-by-521 vector, where 521 is the number of classes in YAMNet.

Data Types: single

Class labels for predicted scores, returned as a 1-by-521 vector.

Data Types: enumerated
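At the command line, the relationship between the scores and labels ports can be sketched as follows, assuming the network returned by yamnet exposes its 521 class names on its final layer:

net = yamnet;
S = yamnetPreprocess(audioIn,fs);   % mel spectrograms, as in the earlier sketch
scores = predict(net,S);            % one row of 521 activations per spectrogram
labels = net.Layers(end).Classes;   % 521 AudioSet class names
[~,idx] = max(scores,[],2);         % top-scoring class per spectrogram
topLabels = labels(idx)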

Parameters

Size of mini-batches to use for prediction, specified as a positive integer. Larger mini-batch sizes require more memory but can lead to faster predictions.

Data Types: int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
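This parameter plays the same role as the MiniBatchSize name-value argument of classify and predict at the command line, for example (a sketch under the same assumptions as above):

% Classify 64 spectrograms per batch, trading memory for throughput.
sounds = classify(net,S,MiniBatchSize=64);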

Enable the output port sound, which outputs the classified sound.

Enable the output ports scores and labels, which output all predicted scores and associated class labels.

Block Characteristics

Data Types: double | single
Direct Feedthrough: no
Multidimensional Signals: no
Variable-Size Signals: no
Zero-Crossing Detection: no

References

[1] Gemmeke, Jort F., Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R. Channing Moore, Manoj Plakal, and Marvin Ritter. “Audio Set: An Ontology and Human-Labeled Dataset for Audio Events.” 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2017, pp. 776–80. DOI.org (Crossref), doi:10.1109/ICASSP.2017.7952261.

[2] Hershey, Shawn, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, R. Channing Moore, Manoj Plakal, et al. “CNN Architectures for Large-Scale Audio Classification.” 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2017, pp. 131–35. DOI.org (Crossref), doi:10.1109/ICASSP.2017.7952132.

Version History

Introduced in R2021b