Machine Learning and Deep Learning for Audio
Audio Toolbox™ provides functionality to develop machine learning and deep learning solutions for audio, speech, and acoustic applications, including speaker identification, speech command recognition, and acoustic scene recognition.
Use audioDatastore to ingest large audio data sets and process files in parallel.
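As a rough illustration of folder-based ingestion (this is a hedged Python sketch, not the audioDatastore API), the pattern is to scan a directory tree and take each file's containing folder name as its label; all paths and helper names below are hypothetical:

```python
# Sketch: ingest an audio data set from a folder tree where each subfolder
# name is the class label, similar in spirit to labeling from folder names.
import tempfile
from pathlib import Path

def ingest_audio_dataset(root):
    """Return (file_path, label) pairs, one per audio file under root."""
    dataset = []
    for wav in sorted(Path(root).rglob("*.wav")):
        dataset.append((wav, wav.parent.name))  # label = containing folder
    return dataset

# Build a tiny synthetic folder tree to demonstrate the scan.
root = Path(tempfile.mkdtemp())
for label in ("speech", "noise"):
    d = root / label
    d.mkdir()
    (d / "clip1.wav").touch()
    (d / "clip2.wav").touch()

pairs = ingest_audio_dataset(root)
labels = sorted({label for _, label in pairs})
print(len(pairs), labels)  # 4 files across two labels
```

In practice the per-file work (reading, feature extraction) would then be distributed across workers, which is what parallel file processing over a datastore amounts to.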
Use Signal Labeler to build audio data sets by annotating audio recordings manually and automatically.
Use audioDataAugmenter to create randomized pipelines of built-in or custom signal processing methods for augmenting and synthesizing audio data sets.
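The idea of a randomized augmentation pipeline can be sketched as follows (a hedged Python illustration, not the audioDataAugmenter API; the parameter ranges and helper names are assumptions). Each call draws fresh random parameters for a chain of simple augmentations, here volume control and additive noise:

```python
# Sketch: a randomized augmentation pipeline with per-call random parameters.
import random

def apply_gain(x, gain):
    """Volume control: scale every sample by a gain factor."""
    return [s * gain for s in x]

def add_noise(x, noise_std, rng):
    """Additive Gaussian noise with a given standard deviation."""
    return [s + rng.gauss(0.0, noise_std) for s in x]

def augment(x, rng):
    """Apply a randomized chain of augmentations to one signal."""
    y = apply_gain(x, gain=rng.uniform(0.5, 1.5))              # random volume
    y = add_noise(y, noise_std=rng.uniform(0.001, 0.01), rng=rng)  # random noise
    return y

rng = random.Random(0)
signal = [0.0, 0.5, -0.5, 1.0]
augmented = augment(signal, rng)
print(len(augmented) == len(signal))  # same length, perturbed samples
```

Pitch shifting and time stretching would slot into the same chain as additional stages, but need resampling and phase-vocoder machinery omitted here for brevity.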
Use audioFeatureExtractor to extract combinations of different features while sharing intermediate computations.
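Sharing intermediate computations means, for example, computing one magnitude spectrum per frame and deriving several spectral descriptors from it. The following is a minimal Python sketch of that idea, not the audioFeatureExtractor API; the DFT and descriptor formulas are standard, while the helper names are assumptions:

```python
# Sketch: compute the magnitude spectrum once, then reuse it for multiple
# spectral descriptors (the shared-intermediate idea).
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitude: the shared intermediate for all spectral features."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) for k in range(n // 2 + 1)]

def spectral_centroid(mag):
    """Magnitude-weighted mean bin index."""
    total = sum(mag)
    return sum(k * m for k, m in enumerate(mag)) / total if total else 0.0

def spectral_spread(mag):
    """Magnitude-weighted standard deviation around the centroid."""
    c = spectral_centroid(mag)
    total = sum(mag)
    return math.sqrt(sum((k - c) ** 2 * m
                         for k, m in enumerate(mag)) / total) if total else 0.0

# One 8-sample frame of a pure tone at bin 1: both features reuse one DFT.
frame = [math.sin(2 * math.pi * t / 8) for t in range(8)]
mag = magnitude_spectrum(frame)    # computed once
centroid = spectral_centroid(mag)  # derived feature 1
spread = spectral_spread(mag)      # derived feature 2
print(round(centroid, 3), round(spread, 3))
```

A mel spectrogram and MFCCs reuse the same spectrum in the same way (mel filter bank, then log and DCT), which is why extracting several features together is cheaper than extracting each independently.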
Audio Toolbox also provides access to third-party APIs for text-to-speech and speech-to-text, and it includes pretrained VGGish and YAMNet models so that you can perform transfer learning, classify sounds, and extract feature embeddings. Using pretrained networks requires Deep Learning Toolbox™.
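The transfer-learning recipe with pretrained audio networks is to treat their output embeddings as fixed feature vectors and fit a lightweight classifier on top. A hedged Python sketch of that recipe follows; the 3-D vectors stand in for real VGGish/YAMNet-style embeddings, and the nearest-centroid classifier and all names here are illustrative assumptions, not the toolbox workflow:

```python
# Sketch: classify sounds from fixed pretrained-network embeddings using a
# simple nearest-centroid classifier (the basic transfer-learning recipe).
import math

def centroid(vectors):
    """Mean of a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def classify(embedding, centroids):
    """Return the label whose class centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(embedding, centroids[label]))

# Hypothetical 3-D embeddings standing in for pretrained-network outputs.
train = {
    "dog_bark": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "siren":    [[0.1, 0.9, 0.8], [0.0, 1.0, 0.9]],
}
centroids = {label: centroid(vecs) for label, vecs in train.items()}
prediction = classify([0.85, 0.15, 0.05], centroids)
print(prediction)  # nearest to the dog_bark centroid
```

With real embeddings the classifier is typically a small trained model rather than centroids, but the structure is the same: the pretrained network is frozen and only the head is fit to the new labels.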
- Dataset Management and Labeling
Ingest, create, and label large data sets
- Feature Extraction
Mel spectrogram, MFCC, pitch, spectral descriptors
- Data Augmentation
Augmentation pipelines, shift pitch and time, stretch time, control volume and noise
- Segmentation
Detect and isolate speech and other sounds
- Pretrained Models
Transfer learning, sound classification, feature embeddings, pretrained audio deep learning networks
- Speech Transcription and Synthesis
Use a pretrained model or third-party APIs for text-to-speech and speech-to-text
- Code Generation and GPU Support
Generate portable C/C++/MEX functions and use GPUs to deploy or accelerate processing