Pretrained Models

Transfer learning, sound classification, feature embeddings, pretrained audio deep learning networks

Audio Toolbox™ provides MATLAB® and Simulink® support for pretrained audio deep learning networks. Locate and classify sounds with YAMNet and estimate pitch with CREPE. Extract VGGish or OpenL3 feature embeddings to input to machine learning and deep learning systems. Use i-vector systems to produce compact representations of audio signals for applications such as speaker recognition, verification, identification, and diarization. Use detectspeechnn to perform voice activity detection (VAD).

Using pretrained deep learning networks requires Deep Learning Toolbox™. The Audio Toolbox pretrained networks are available in Deep Network Designer (Deep Learning Toolbox).
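As a rough illustration of the workflows described above, the following minimal sketch calls four of the high-level functions on one recording. It assumes that Audio Toolbox and Deep Learning Toolbox are installed and that the pretrained models are available on your system (some releases prompt you to download them first). The audio file name is an assumption: it is a sample file used in Audio Toolbox examples, and any mono recording can be substituted.

```matlab
% Read a recording. The file name is an assumption (an Audio Toolbox
% sample file); replace it with any mono audio file.
[audioIn, fs] = audioread("Counting-16-44p1-mono-15secs.wav");

% Classify the sounds present in the signal using YAMNet.
% The result is a string array of sound class names.
sounds = classifySound(audioIn, fs);

% Extract VGGish feature embeddings.
% Each row is one 128-element embedding for a frame of audio.
embeddings = vggishEmbeddings(audioIn, fs);

% Estimate the fundamental frequency over time using CREPE.
f0 = pitchnn(audioIn, fs);

% Detect regions of speech. Each row of roi holds the start and end
% sample indices of one detected speech region.
roi = detectspeechnn(audioIn, fs);
```

The embeddings can then be passed to a classifier or network of your choice. When you need finer control over each stage, the separate preprocessing, network, and postprocessing functions listed below can be used in place of these one-call wrappers.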

Functions

vggishEmbeddings - Extract VGGish feature embeddings (Since R2022a)
vggish - VGGish neural network (Since R2020b)
vggishPreprocess - Preprocess audio for VGGish feature extraction (Since R2021a)
classifySound - Classify sounds in audio signal (Since R2020b)
yamnet - YAMNet neural network (Since R2020b)
yamnetGraph - Graph of YAMNet AudioSet ontology (Since R2020b)
yamnetPreprocess - Preprocess audio for YAMNet classification (Since R2021a)
openl3Embeddings - Extract OpenL3 feature embeddings (Since R2022a)
openl3 - OpenL3 neural network (Since R2021a)
openl3Preprocess - Preprocess audio for OpenL3 feature extraction (Since R2021a)
pitchnn - Estimate pitch with deep learning neural network (Since R2021a)
crepe - CREPE neural network (Since R2021a)
crepePreprocess - Preprocess audio for CREPE deep learning network (Since R2021a)
crepePostprocess - Postprocess output of CREPE deep learning network (Since R2021a)
speakerRecognition - Pretrained speaker recognition system (Since R2021b)
ivectorSystem - Create i-vector system (Since R2021a)
detectspeechnn - Detect boundaries of speech in audio signal using AI (Since R2023a)
vadnet - Voice activity detection (VAD) neural network (Since R2023a)
vadnetPreprocess - Preprocess audio for voice activity detection (VAD) network (Since R2023a)
vadnetPostprocess - Postprocess frame-based VAD probabilities (Since R2023a)

Blocks

VGGish Embeddings - Extract VGGish embeddings (Since R2022a)
VGGish Preprocess - Preprocess audio for VGGish feature extraction (Since R2022a)
VGGish - VGGish embeddings extraction network (Since R2022a)
Sound Classifier - Classify sounds in audio signal (Since R2021b)
YAMNet - YAMNet sound classification network (Since R2021b)
YAMNet Preprocess - Preprocess audio for YAMNet classification (Since R2021b)
OpenL3 Embeddings - Extract OpenL3 embeddings (Since R2022b)
OpenL3 Preprocess - Preprocess audio for OpenL3 embeddings extraction (Since R2022b)
OpenL3 - OpenL3 embeddings extraction network (Since R2022b)
Deep Pitch Estimator - Estimate pitch with CREPE deep learning neural network (Since R2023a)
CREPE - CREPE deep pitch estimation neural network (Since R2023a)
CREPE Preprocess - Preprocess audio for CREPE deep pitch estimation (Since R2023a)
CREPE Postprocess - Postprocess output of CREPE pitch estimation network (Since R2023a)

Apps

Deep Network Designer - Design, visualize, and train deep learning networks

Topics