mfcc
Extract MFCC, log energy, delta, and delta-delta of audio signal
Syntax
coeffs = mfcc(audioIn,fs)
coeffs = mfcc(___,Name=Value)
[coeffs,delta,deltaDelta,loc] = mfcc(___)
mfcc(___)
Description
coeffs = mfcc(audioIn,fs) returns the mel-frequency cepstral coefficients for the audio input signal audioIn, sampled at fs Hz.
coeffs = mfcc(___,Name=Value) specifies options using one or more name-value arguments.
Example: coeffs = mfcc(audioIn,fs,LogEnergy="replace") returns mel-frequency cepstral coefficients for the audio input signal sampled at fs Hz. The first coefficient in the coeffs vector is replaced with the log energy value.
[coeffs,delta,deltaDelta,loc] = mfcc(___) also returns the delta, delta-delta, and location of samples corresponding to each window of data. You can specify an input combination from any of the previous syntaxes.
mfcc(___) with no output arguments plots the mel-frequency cepstral coefficients. Before plotting, the coefficients are normalized to have mean 0 and standard deviation 1.
If the input is in the time domain, the coefficients are plotted against time.
If the input is in the frequency domain, the coefficients are plotted against frame number.
If the log energy is extracted, then it is also plotted.
Examples
Compute Mel Frequency Cepstral Coefficients
Compute the mel-frequency cepstral coefficients of a speech signal using the mfcc function. The function returns delta, the change in coefficients, and deltaDelta, the change in delta values. Depending on whether you set the LogEnergy argument to "append" or "replace", the log energy value that the function computes is either prepended to the coefficients vector or replaces its first element.
Read an audio signal from the Counting-16-44p1-mono-15secs.wav file using the audioread function. The mfcc function processes the entire speech data in a batch. Based on the number of input rows, the window length, and the overlap length, mfcc partitions the speech into 1551 frames and computes the cepstral features for each frame. Each row in the coeffs matrix corresponds to the log-energy value followed by the 13 mel-frequency cepstral coefficients for the corresponding frame of the speech file. The function also computes loc, the location of the last sample in each input frame.
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
[coeffs,delta,deltaDelta,loc] = mfcc(audioIn,fs);
Plot the normalized coefficients.
mfcc(audioIn,fs)
Extract MFCC from Frequency-Domain Audio
Read in an audio file and convert it to a frequency representation.
[audioIn,fs] = audioread("Rainbow-16-8-mono-114secs.wav");
win = hann(1024,"periodic");
S = stft(audioIn,"Window",win,"OverlapLength",512,"Centered",false);
To extract the mel-frequency cepstral coefficients, call mfcc with the frequency-domain audio. Ignore the log-energy.
coeffs = mfcc(S,fs,"LogEnergy","Ignore");
In many applications, MFCC observations are converted to summary statistics for use in classification tasks. Plot a probability density function for one of the mel-frequency cepstral coefficients to observe its distribution.
nbins = 60;
coefficientToAnalyze = 4;
histogram(coeffs(:,coefficientToAnalyze+1),nbins,"Normalization","pdf")
title(sprintf("Coefficient %d",coefficientToAnalyze))
Input Arguments
audioIn
— Input signal
vector | matrix | 3-D array
Input signal, specified as a vector, matrix, or 3-D array.
If audioIn is real, it is interpreted as a time-domain signal and must be a column vector or a matrix. Columns of the matrix are treated as independent audio channels.
If audioIn is complex, it is interpreted as a frequency-domain signal. In this case, audioIn must be an L-by-M-by-N array, where L is the number of DFT points, M is the number of individual spectra, and N is the number of individual channels.
Data Types: single | double
Complex Number Support: Yes
fs
— Sample rate (Hz)
positive scalar
Sample rate of the input signal in Hz, specified as a positive scalar.
Data Types: single | double
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: [coeffs,delta,deltaDelta,loc] = mfcc(audioIn,fs,LogEnergy="replace",DeltaWindowLength=5) returns mel-frequency cepstral coefficients for the audio input signal sampled at fs Hz. The first coefficient in the coeffs vector is replaced with the log energy value. A set of 5 cepstral coefficients is used to compute the delta and the delta-delta values.
Window
— Window applied in time domain
hamming(round(0.03*fs),"periodic") (default) | vector
Window applied in time domain, specified as a real vector. The number of elements in the vector must be in the range [1,size(audioIn,1)] and must also be greater than OverlapLength.
Data Types: single | double
OverlapLength
— Number of overlapping samples between adjacent windows
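round(fs*0.02) (default) | integer
Number of overlapping samples between adjacent windows, specified as an integer smaller than the number of elements in Window.
NumCoeffs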
— Number of coefficients returned
13 (default) | positive scalar integer
Number of coefficients returned for each window of data, specified as an integer in the range [2, v], where v is the number of valid passbands.
The number of valid passbands is defined as sum(BandEdges <= floor(fs/2))-2. A passband is valid if its edges fall below fs/2, where fs is the sample rate of the input audio signal, specified as the second argument, fs.
Data Types: single | double
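As a rough illustration of the passband count described above, the following base-MATLAB sketch evaluates the expression for a hypothetical set of band edges. The variable names and edge values are invented for the example and are not the function's defaults.

```matlab
% Hypothetical sample rate and band edges (Hz); not the mfcc defaults.
fs = 8000;
bandEdges = [100 300 600 1000 1500 2200 3100 4200 5500];

% Count the edges that do not exceed floor(fs/2), then subtract 2:
% each triangular filter spans three consecutive edges, so n edges
% give n-2 filters.
numValidPassbands = sum(bandEdges <= floor(fs/2)) - 2;

% NumCoeffs would then have to lie in the range [2, numValidPassbands].
fprintf('Valid passbands: %d\n', numValidPassbands);
```

Here seven edges lie at or below 4000 Hz, so the count is five.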
BandEdges
— Band edges of filter bank (Hz)
row vector
Band edges of the filter bank in Hz, specified as a nonnegative monotonically increasing row vector in the range [0, fs/2]. The number of band edges must be in the range [4, 160]. The mfcc function designs half-overlapped triangular filters based on BandEdges. This means that all band edges, except for the first and last, are also center frequencies of the designed bandpass filters.
By default, BandEdges is a 42-element vector, which results in a 40-band filter bank that spans approximately 133 Hz to 6864 Hz. The default bands are spaced as described in [2].
Data Types: single | double
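To make the relationship between band edges and half-overlapped triangular filters concrete, here is a minimal base-MATLAB sketch. The band edges, FFT length, and variable names are hypothetical. The filters are left unnormalized, which may differ from the weighting mfcc applies internally.

```matlab
% Hypothetical band edges (Hz) and frequency grid; values are illustrative.
fs = 16000;
nfft = 320;                                  % gives a 50 Hz bin spacing
f = (0:nfft/2)' * fs / nfft;                 % one-sided frequency axis (Hz)
bandEdges = [200 400 700 1100 1600 2200];    % 6 edges -> 4 filters

numFilters = numel(bandEdges) - 2;
filterBank = zeros(numel(f), numFilters);
for k = 1:numFilters
    lo  = bandEdges(k);      % left edge, where the filter starts at 0
    ctr = bandEdges(k+1);    % center, where the filter peaks at 1
    hi  = bandEdges(k+2);    % right edge, where the filter returns to 0
    rising  = (f - lo) / (ctr - lo);
    falling = (hi - f) / (hi - ctr);
    % The triangle is the lower of the two ramps, clipped at zero.
    filterBank(:,k) = max(0, min(rising, falling));
end
```

Each interior edge (400, 700, 1100, and 1600 Hz here) is the peak of one filter and the boundary of its neighbors, which is what "half-overlapped" refers to.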
FFTLength
— Number of bins for calculating DFT
numel(Window) (default) | positive scalar integer
Number of bins used to calculate the discrete Fourier transform (DFT) of windowed input samples, specified as a positive scalar integer. The FFT length must be greater than or equal to the number of elements in Window.
Data Types: single | double
Rectification
— Type of non-linear rectification
"log"
(default) | "cubic-root"
Type of nonlinear rectification applied prior to the discrete cosine transform, specified as "log" or "cubic-root".
Data Types: char | string
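The following base-MATLAB sketch shows where the rectification sits relative to the discrete cosine transform. The band energies are invented, and the DCT-II basis is unscaled. Scaling conventions vary, so the absolute values are not expected to match mfcc output.

```matlab
% Hypothetical filter-bank energies for one frame (illustrative values).
bandEnergies = [4.0; 2.5; 1.2; 0.8; 0.5; 0.3];
numBands = numel(bandEnergies);
numCoeffs = 4;

% The two rectification choices named in the text.
rectLog   = log(bandEnergies);
rectCubic = bandEnergies.^(1/3);

% Unscaled DCT-II basis: row k+1 holds cos(pi*k*(n+0.5)/numBands).
k = (0:numCoeffs-1)';
n = 0:numBands-1;
dctBasis = cos(pi * k * (n + 0.5) / numBands);

% Cepstral coefficients for each rectification.
cepstraLog   = dctBasis * rectLog;
cepstraCubic = dctBasis * rectCubic;
```

With this basis, the zeroth coefficient is simply the sum of the rectified band values, and a flat spectrum yields zeros in all higher coefficients.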
DeltaWindowLength
— Number of coefficients for calculating delta and delta-delta
9 (default) | odd integer greater than 2
Number of coefficients used to calculate the delta and the delta-delta values, specified as an odd integer greater than 2. If unspecified, DeltaWindowLength defaults to 9.
Deltas are computed using the audioDelta function.
Data Types: single | double
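One common way to define a delta over an odd-length window is a least-squares slope estimate. The base-MATLAB sketch below assumes that definition. It is not taken from audioDelta, whose normalization and startup behavior may differ, and the variable names are hypothetical.

```matlab
% Illustrative feature track: one coefficient that rises by 3 per frame.
x = 3 * (1:20)';
deltaWindowLength = 9;              % odd, as the argument requires
M = (deltaWindowLength - 1) / 2;    % half-width of the window

% Least-squares slope at frame n: sum(k.*x(n+k)) / sum(k.^2), k = -M..M.
k = (-M:M)';
kernel = flipud(k) / sum(k.^2);     % flipped because conv reverses the kernel

% 'valid' keeps only frames whose full window lies inside the data, so
% the startup region is simply dropped in this sketch.
slope = conv(x, kernel, 'valid');
```

For this linear ramp the estimate is 3 at every retained frame. Applying the same operation to the result would give a delta-delta of zero.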
LogEnergy
— Specify how the log energy is shown
"append"
(default) | "replace"
| "ignore"
How the log energy is shown in the coefficients vector output, specified as one of these values:
"append": The function prepends the log energy to the coefficients vector. The length of the coefficients vector is 1 + NumCoeffs.
"replace": The function replaces the first coefficient with the log energy of the signal. The length of the coefficients vector is NumCoeffs.
"ignore": The function does not calculate or return the log energy.
Data Types: char | string
Output Arguments
coeffs
— Mel-frequency cepstral coefficients (MFCCs)
matrix | 3-D array
Mel-frequency cepstral coefficients, returned as an L-by-M matrix or an L-by-M-by-N array, where:
L: Number of analysis windows the audio signal is partitioned into. The input size, Window, and OverlapLength control this dimension: L = floor((size(audioIn,1) - numel(Window))/(numel(Window) - OverlapLength)) + 1.
M: Number of coefficients returned per frame. This value is determined by NumCoeffs and LogEnergy. When LogEnergy is set to:
"append": The function prepends the log energy value to the coefficients vector. The length of the coefficients vector is 1 + NumCoeffs.
"replace": The function replaces the first coefficient with the log energy of the signal. The length of the coefficients vector is NumCoeffs.
"ignore": The function does not calculate or return the log energy. The length of the coefficients vector is NumCoeffs.
N: Number of input channels (columns). This value is size(audioIn,2).
Data Types: single | double
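As a worked instance of the expression for L, the base-MATLAB sketch below uses the documented default window and overlap lengths with a hypothetical 2-second input. The last-sample indices assume the first window starts at sample 1, which is an assumption of this sketch.

```matlab
% Illustrative input length: 2 seconds at 44.1 kHz (not the example file).
fs = 44100;
numSamples = 2 * fs;

% Default window and overlap lengths documented for mfcc.
windowLength  = round(0.03 * fs);    % 1323 samples
overlapLength = round(0.02 * fs);    % 882 samples
hopLength = windowLength - overlapLength;

% Number of analysis windows, following the expression for L.
numFrames = floor((numSamples - windowLength) / hopLength) + 1;

% Last-sample index of each window, assuming windows start at sample 1.
lastSample = windowLength + (0:numFrames-1)' * hopLength;
```

With these values the hop is 441 samples, the input splits into 198 windows, and the final window ends exactly at sample 88200.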
delta
— Change in coefficients
matrix | array
Change in coefficients from one frame of data to another, returned as an L-by-M matrix or an L-by-M-by-N array. The delta array is the same size and data type as the coeffs array.
Data Types: single | double
loc
— Location of the last sample in each input frame
vector
Location of the last sample in each analysis window, returned as a column vector with the same number of rows as coeffs.
Data Types: single | double
Algorithms
MFCC
Mel-frequency cepstrum coefficients are popular features extracted from speech signals for use in recognition tasks. In the source-filter model of speech, cepstral coefficients are understood to represent the filter (vocal tract). The vocal tract frequency response is relatively smooth, whereas the source of voiced speech can be modeled as an impulse train. As a result, the vocal tract can be estimated by the spectral envelope of a speech segment.
The motivating idea of mel-frequency cepstral coefficients is to compress information about the vocal tract (smoothed spectrum) into a small number of coefficients based on an understanding of the cochlea. Although there is no hard standard for calculating the coefficients, the basic steps are outlined by the diagram.
The default mel filter bank linearly spaces the first 10 triangular filters and logarithmically spaces the remaining filters.
Log Energy
The information contained in the zeroth mel-frequency cepstral coefficient is often augmented with or replaced by the log energy. The log energy calculation depends on the input domain.
If the input (audioIn) is a time-domain signal, the log energy is computed using the following equation:
If the input (audioIn) is a frequency-domain signal, the log energy is computed using the following equation:
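A minimal base-MATLAB sketch of one common pair of definitions follows. It assumes the time-domain log energy of a frame is the natural logarithm of the sum of squared samples, and the frequency-domain log energy is the logarithm of the summed squared DFT magnitudes divided by the DFT length. These definitions are assumptions for illustration and may differ from what mfcc computes, for example in how windowing or one-sided spectra are handled.

```matlab
% Illustrative frame: a short sinusoid (values chosen only for the example).
fs = 16000;
n = (0:479)';
frame = 0.5 * sin(2*pi*440*n/fs);

% Assumed time-domain definition: log of the sum of squared samples.
logEnergyTime = log(sum(frame.^2));

% Assumed frequency-domain definition: log of the summed squared DFT
% magnitudes divided by the DFT length (full two-sided spectrum).
X = fft(frame);
logEnergyFreq = log(sum(abs(X).^2) / numel(X));
```

Under these assumptions the two values agree by Parseval's relation, which is why the division by the DFT length appears in the frequency-domain form.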
References
[1] Rabiner, Lawrence R., and Ronald W. Schafer. Theory and Applications of Digital Speech Processing. Upper Saddle River, NJ: Pearson, 2010.
[2] Auditory Toolbox. https://engineering.purdue.edu/~malcolm/interval/1998-010/AuditoryToolboxTechReport.pdf
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
Version History
Introduced in R2018a
R2024b: WindowLength has been removed
The WindowLength parameter has been removed from the mfcc function. Use the Window parameter instead.
In releases prior to R2020b, you could only specify the length of a time-domain window. The window was always designed as a periodic Hamming window. You can replace instances of the code
coeffs = mfcc(audioIn,fs,WindowLength=1024);
with this code:
coeffs = mfcc(audioIn,fs,Window=hamming(1024,"periodic"));
R2020b: Delta and delta-delta computation
The delta and delta-delta calculations are now computed using the audioDelta function, which has a different startup behavior than the previous algorithm. The default value of the DeltaWindowLength parameter has changed from 2 to 9. A delta window length of 2 is no longer supported.
R2020b: WindowLength will be removed in a future release
The WindowLength parameter will be removed from the mfcc function in a future release.