Audio Toolbox

Design and analyze speech, acoustic, and audio processing systems


Audio Toolbox™ provides tools for audio processing, speech analysis, and acoustic measurement. It includes algorithms for audio signal processing (such as equalization and dynamic range control) and acoustic measurement (such as impulse response estimation, octave filtering, and perceptual weighting). It also provides algorithms for audio and speech feature extraction (such as MFCC and pitch) and audio signal transformation (such as gammatone filter bank and Mel-spaced spectrogram).

Toolbox apps support live algorithm testing, impulse response measurement, and audio signal labeling. The toolbox provides streaming interfaces to ASIO, WASAPI, ALSA, and CoreAudio sound cards and MIDI devices, and tools for generating and hosting standard audio plugins such as VST and Audio Units.

With Audio Toolbox, you can import, label, and augment audio datasets, as well as extract features and transform signals for machine learning and deep learning. You can prototype audio processing algorithms in real time by streaming low-latency audio while tuning parameters and visualizing signals. You can also validate your algorithm by turning it into an audio plugin to run in external host applications such as Digital Audio Workstations. Plugin hosting lets you use external audio plugins like regular objects to process MATLAB® arrays. Sound card connectivity enables you to run custom measurements on real-world audio signals and acoustic systems.

Audio Streaming with Sound Cards

Connect to standard laptop and desktop sound cards for streaming low-latency multichannel audio between any combination of files and live inputs and outputs.

Connectivity to Standard Audio Drivers

Read and write audio samples from and to sound cards (such as USB or Thunderbolt™) using standard audio drivers (such as ASIO, WASAPI, CoreAudio, and ALSA) across Windows®, Mac®, and Linux® operating systems.

Multichannel sound cards.

Low-Latency Multichannel Audio Streaming

Process live audio in MATLAB with milliseconds of round-trip latency.

Live raw input from four-channel microphone array.

Machine Learning and Deep Learning

Label, augment, create, and ingest audio and speech datasets, extract features, and compute time-frequency transformations. Develop audio and speech analytics with Statistics and Machine Learning Toolbox™, Deep Learning Toolbox™, or other machine learning tools.

Pre-Trained Deep Learning Models

Use popular deep learning models pre-trained with large audio datasets to carry out complex audio processing tasks: classify sound events in audio recordings with YAMNet and extract audio embeddings with VGGish.

Word cloud displaying the sound types identified by classifySound in a particular audio segment.

Audio and Speech Feature Extraction

Extract low-level features for speech and audio analytics, including Mel frequency cepstral coefficients (MFCC), gammatone cepstral coefficients (GTCC), pitch, harmonicity, and spectral descriptors. Feed the extracted features to deep learning architectures that operate on time series, such as those based on LSTM layers.

Select buffering options and features of interest interactively while using Audio Feature Extractor in Live Editor.
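As an illustration of one of the low-level features listed above, the following Python sketch estimates pitch from a single frame by picking the lag that maximizes the frame's autocorrelation. It is a minimal textbook method, not the toolbox's pitch algorithm; the function name `estimate_pitch`, the 50 to 400 Hz search range, and the synthetic 200 Hz test tone are all assumptions made for this example.

```python
import math

def estimate_pitch(frame, fs, fmin=50.0, fmax=400.0):
    """Return the pitch (Hz) whose lag maximizes the frame's autocorrelation.

    Hypothetical helper: fmin/fmax bound the search and are illustrative defaults.
    """
    lag_min = int(fs / fmax)
    lag_max = int(fs / fmin)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        if score > best_score:
            best_lag, best_score = lag, score
    return fs / best_lag

# Synthetic test frame: a 200 Hz sine sampled at 8 kHz (assumed values).
fs = 8000
tone = [math.sin(2 * math.pi * 200.0 * n / fs) for n in range(1024)]
pitch_hz = estimate_pitch(tone, fs)
```

In a feature-extraction pipeline, a function like this would typically run on successive overlapping frames, producing one pitch value per frame as a time series.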

Time-Frequency Transformations

Transform signals into time-frequency representations using a modified discrete cosine transform (MDCT), short-time Fourier transform (STFT), or the more compact Mel-spaced spectrogram. Decompose signals into perceptually spaced frequency bands using gammatone filter banks. Feed the resulting representations to deep learning models that operate on two-dimensional data, such as those based on CNN layers.

Live Mel spectrogram of speech commands.
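To make "Mel-spaced" concrete, the Python sketch below converts between hertz and mels and places band centers at equal mel intervals, so that bands crowd together at low frequencies and spread out at high frequencies. It assumes the common 2595·log10(1 + f/700) mapping; other mel variants exist, and the toolbox's own defaults may differ. The function names and the eight-band, 0 to 8000 Hz layout are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Hz to mel using the 2595*log10(1 + f/700) mapping (an assumed variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_centers(num_bands, f_low, f_high):
    """Center frequencies (Hz) of num_bands bands equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (num_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(num_bands)]

# Illustrative layout: 8 bands between 0 Hz and 8 kHz.
centers = mel_band_centers(8, 0.0, 8000.0)
spacings = [b - a for a, b in zip(centers, centers[1:])]
```

A Mel spectrogram then sums STFT power within overlapping filters centered on these frequencies, which is why it is more compact than the full STFT.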

Label and Annotate Audio Datasets

Assign ground-truth labels and annotations to audio recordings and datasets, either manually or automatically. Detect regions of speech in audio signals. Automate speech transcription using speech-to-text cloud-based services.

Region-of-interest labels in Audio Labeler app.

Ingest Large Audio Datasets

Index and read from large collections of audio recordings using audioDatastore. Randomly split lists of audio files according to labels. Parallelize processing tasks using tall arrays for data augmentation, time-frequency transformations, and feature extraction.

Datastore pointing to the Google speech command dataset.

Augment and Synthesize Audio and Speech Datasets

Set up randomized data augmentation pipelines using combinations of pitch shifting, time stretching, and other audio processing effects. Create synthetic speech recordings from text using text-to-speech cloud-based services.

Formant estimation for timbre-invariant pitch shifting.

Audio Processing Algorithms and Effects

Generate standard waveforms, apply common audio effects, and design audio processing systems with dynamic parameter tuning and live visualization.

Audio Filters and Equalizers

Model and apply parametric EQ, graphic EQ, shelving, and variable-slope filters. Design and simulate digital crossover, octave, and fractional-octave filters.

Interactive tuning of a three-band crossover filter with live visualization.
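For readers unfamiliar with parametric EQ, one band can be modeled as a second-order peaking filter. The Python sketch below uses the widely circulated "Audio EQ Cookbook" biquad formulas, which is an assumption for illustration and not necessarily the design the toolbox uses. The function names and the 1 kHz, +6 dB, Q = 1 settings are hypothetical.

```python
import cmath
import math

def peaking_eq_coeffs(fs, f0, gain_db, q):
    """Biquad (b, a) coefficients for a peaking band, normalized so a[0] == 1."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin]
    return [c / a[0] for c in b], [c / a[0] for c in a]

def magnitude_db(b, a, f, fs):
    """Magnitude response in dB of the biquad at frequency f."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))

# Illustrative band: +6 dB at 1 kHz, Q = 1, 48 kHz sample rate.
b, a = peaking_eq_coeffs(48000.0, 1000.0, 6.0, 1.0)
gain_at_center = magnitude_db(b, a, 1000.0, 48000.0)
gain_at_dc = magnitude_db(b, a, 0.0, 48000.0)
```

The response reaches the requested gain at the center frequency and returns to 0 dB away from it; a multiband parametric EQ cascades several such sections.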

Dynamic Range Control and Effects

Model and apply dynamic range processing algorithms such as compressor, limiter, expander, and noise gate. Add artificial reverberation with recursive parametric models.

Interactive tuning of the dynamic response of a compressor.
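The core of a compressor is its static characteristic: above a threshold, each additional decibel of input yields only 1/ratio decibels of output. The Python sketch below computes the resulting gain for a hard-knee curve. It deliberately omits attack and release smoothing, knee width, and make-up gain; the function name and default settings are hypothetical.

```python
def compressor_gain_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Gain (dB, always <= 0) a hard-knee compressor applies at a given input level."""
    if level_db <= threshold_db:
        return 0.0  # below threshold: signal passes unchanged
    compressed_db = threshold_db + (level_db - threshold_db) / ratio
    return compressed_db - level_db

quiet_gain = compressor_gain_db(-30.0)                  # below threshold
loud_gain = compressor_gain_db(0.0)                     # 20 dB over threshold
limiter_gain = compressor_gain_db(0.0, ratio=float("inf"))
```

With the assumed defaults, an input 20 dB above the threshold is reduced by 15 dB. Setting the ratio to infinity turns the same curve into a limiter, which holds the output at the threshold.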

System Simulation with Block Diagrams

Design and simulate system models using libraries of audio processing blocks for Simulink®. Tune parameters and visualize system behavior using interactive controls and dynamic plots.

Detail of a multiband dynamic range compressor model in Simulink.

Real-Time Audio Prototyping

Validate audio processing algorithms with interactive real-time listening tests in MATLAB.

Live Parameter Tuning via User Interfaces

Automatically create user interfaces for tunable parameters of audio processing algorithms. Test individual algorithms with the Audio Test Bench app and tune parameters in running programs with auto-generated interactive controls.

Interactive tuning of a custom three-band parametric EQ using Audio Test Bench.

MIDI Connectivity for Parameter Control and Message Exchange

Interactively change parameters of MATLAB algorithms by using MIDI control surfaces. Control external hardware or respond to events by sending and receiving any type of MIDI message.

MIDI message and audio signal flow written in MATLAB for a musical instrument synthesizer.

Acoustic Measurements and Spatial Audio

Measure system responses, analyze and meter signals, and design spatial audio processing systems.

Standard-Based Metering and Analysis

Apply sound pressure level (SPL) meters and loudness meters to recorded or live signals. Analyze signals with octave and fractional-octave filters. Apply standard-compliant A-, C-, or K-weighting filters to raw recordings.

Visualization of different SPL measurements across two-thirds-octave bands.
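As background on frequency weighting, the A-weighting curve has a closed-form analog magnitude response. The Python sketch below evaluates the commonly published formula with its rounded pole frequencies (20.6, 107.7, 737.9, and 12194 Hz) and the 2.00 dB normalization that places 0 dB at 1 kHz. It evaluates the curve only and is not a digital filter design; the function name is hypothetical.

```python
import math

def a_weighting_db(f_hz):
    """A-weighting gain in dB at frequency f_hz (analog magnitude response)."""
    f2 = f_hz * f_hz
    num = (12194.0 ** 2) * f2 * f2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    # The +2.00 dB offset normalizes the curve to 0 dB at 1 kHz.
    return 20.0 * math.log10(num / den) + 2.00

gain_1k = a_weighting_db(1000.0)
gain_100 = a_weighting_db(100.0)
```

The curve attenuates low frequencies strongly (roughly 19 dB at 100 Hz), reflecting the ear's reduced sensitivity there at moderate levels.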

Impulse Response Measurement

Measure impulse and frequency responses of acoustic and audio systems with maximum-length sequences (MLS) and exponential swept sinusoids (ESS). Get started with the Impulse Response Measurer app. Automate measurements by programmatically generating excitation signals and estimating system responses.

Impulse Response Measurer app.
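The MLS technique relies on a property of maximum-length sequences: their circular autocorrelation is nearly an impulse, so cross-correlating the measured response with the excitation recovers the impulse response. The Python sketch below demonstrates this with a tiny 15-sample sequence and a simulated three-tap system. The register length, tap positions, helper names, and the "system" are all assumptions for illustration; real measurements use much longer sequences and averaging.

```python
def mls(n_bits=4, taps=(4, 3)):
    """Generate a +/-1 maximum-length sequence of length 2**n_bits - 1.

    taps are 1-based register positions XORed to form the feedback bit;
    (4, 3) is a maximal-length choice for a 4-bit register.
    """
    state = [1] * n_bits
    seq = []
    for _ in range(2 ** n_bits - 1):
        seq.append(1.0 if state[-1] else -1.0)
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return seq

def circular_convolve(x, h):
    """Simulate a system with impulse response h driven by periodic x."""
    n = len(x)
    return [sum(h[m] * x[(i - m) % n] for m in range(len(h))) for i in range(n)]

def circular_xcorr(y, x):
    """Circular cross-correlation of y with x at lags 0..n-1."""
    n = len(x)
    return [sum(y[i] * x[(i - k) % n] for i in range(n)) for k in range(n)]

excitation = mls()
true_ir = [1.0, 0.5, 0.25]          # hypothetical system under test
response = circular_convolve(excitation, true_ir)
n = len(excitation)
estimated_ir = [r / (n + 1) for r in circular_xcorr(response, excitation)]
```

Because the autocorrelation is -1 rather than 0 away from lag zero, the estimate carries a small constant offset of sum(true_ir) / (n + 1), which shrinks as the sequence gets longer.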

Efficient Convolution with Room Impulse Responses

Convolve signals with long impulse responses efficiently using frequency-domain overlap-and-add or overlap-and-save implementations. Trade off latency for computational speed using automatic impulse response partitioning.

Impulse response lasting 5 seconds, or over 220,000 samples at 44,100 Hz.
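The overlap-and-add idea can be sketched in a few lines: split the input into blocks, multiply each block's spectrum by the impulse response's spectrum, and add the overlapping tails of consecutive output blocks. The Python example below is a minimal version under stated assumptions. It uses a small recursive radix-2 FFT written only to keep the example self-contained, a single unpartitioned impulse response, and hypothetical function names and block size. It does not reproduce the toolbox's partitioned implementation.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out

def ifft(spectrum):
    """Inverse FFT via the conjugation identity."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spectrum])]

def overlap_add(x, h, block=8):
    """Linear convolution of x with h, one block of input at a time."""
    m = len(h)
    nfft = 1
    while nfft < block + m - 1:      # room for each block's full output
        nfft *= 2
    h_spec = fft(list(h) + [0.0] * (nfft - m))
    y = [0.0] * (len(x) + m - 1)
    for start in range(0, len(x), block):
        seg = list(x[start:start + block])
        seg_spec = fft(seg + [0.0] * (nfft - len(seg)))
        yb = ifft([a * b for a, b in zip(seg_spec, h_spec)])
        for i in range(len(seg) + m - 1):
            y[start + i] += yb[i].real   # tails overlap the next block
    return y

def direct_convolve(x, h):
    """Reference time-domain convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y

signal = [math.sin(0.3 * n) for n in range(50)]
ir = [1.0, -0.5, 0.25, 0.125, -0.0625]
fast = overlap_add(signal, ir, block=8)
slow = direct_convolve(signal, ir)
```

The block size sets the trade-off mentioned above: smaller blocks lower the latency before the first output is available, while larger blocks amortize the FFT cost over more samples.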

Spatial Audio

Encode and decode different ambisonic formats. Interpolate spatially sampled head-related transfer functions (HRTF).

Example of desired sound source position and nearest angles where HRTF measurements are available.
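As a small example of ambisonic encoding, the Python sketch below pans a mono sample into four first-order channels from a source direction. It assumes the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalization) and azimuth measured counterclockwise from straight ahead. These are assumptions for illustration; other formats order and scale the channels differently, and the function name is hypothetical.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode a mono sample to first-order ambisonics as [W, Y, Z, X].

    Assumes ACN ordering and SN3D normalization (AmbiX); azimuth is
    counterclockwise from the front, elevation is upward from the horizon.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                    # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)      # left-right
    z = sample * math.sin(el)                     # up-down
    x = sample * math.cos(az) * math.cos(el)      # front-back
    return [w, y, z, x]

front = encode_first_order(1.0, 0.0, 0.0)
left = encode_first_order(1.0, 90.0, 0.0)
above = encode_first_order(1.0, 0.0, 90.0)
```

A source straight ahead excites only W and X, a source to the left only W and Y, and a source overhead only W and Z; decoding reverses the mapping for a given loudspeaker or binaural layout.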

Generate and Host Audio Plugins

Prototype audio processing algorithms written in MATLAB as standard audio plugins; use external audio plugins as regular MATLAB objects.

Generation of Audio Plugins

Generate VST plugins, AU plugins, and standalone executable plugins directly from MATLAB code without requiring manual design of user interfaces. For more advanced plugin prototyping, generate ready-to-build JUCE C++ projects (requires MATLAB Coder™).

Multi-band parametric EQ example: VST plugin generated from MATLAB code and running in REAPER.

Hosting of External Audio Plugins

Use external VST and AU plugins as regular MATLAB objects. Change plugin parameters and programmatically process MATLAB arrays. Alternatively, automate associations of plugin parameters with user interfaces and MIDI controls. Host plugins generated from your MATLAB code for increased execution efficiency.

Example of external VST plugin for audio denoising (Accusonus ERA-N) and programmatic interface in MATLAB.

Target Embedded and Real-Time Audio Systems

Use add-on C-code generation products to implement audio processing designs on software devices and automate connectivity to multichannel audio interfaces.

Low-Cost and Mobile Devices

Prototype audio processing designs on Raspberry Pi™ by using on-board or external multichannel audio interfaces. Create interactive control panels as mobile apps for Android® or iOS devices.

A Raspberry Pi 3 board. 

Zero-Latency Systems

Prototype audio processing designs with single-sample inputs and outputs for adaptive noise control, hearing aid validation, or other applications requiring minimum round-trip DSP latency. Automatically target Speedgoat audio machines and ST Discovery boards directly from Simulink models.

Latest Features

YAMNet Sound Classification

Classify sound recordings using deep learning (Deep Learning Toolbox required)

VGGish Audio Embeddings

Extract high-level audio features using deep learning (Deep Learning Toolbox required)

Generalized Cepstral Coefficients and Delta Features

Compute MFCC, GTCC, BFCC, and other types of cepstral coefficients, auditory spectrograms, and delta features

Octave Analysis for Inaudible Frequencies

Analyze signals with enhanced octave filter designs using octaveFilter, octaveFilterBank, and splMeter

Acoustic Fluctuation

Measure perceived acoustic fluctuation

GPU Acceleration for Feature Extraction

Accelerate additional functions for feature extraction using compatible GPU cards (Parallel Computing Toolbox required)

See release notes for details on any of these features and corresponding functions.