Compute delta features
Delta of Audio Features
Read in an audio file.
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
Create an audioFeatureExtractor object to extract some spectral features over time from the audio. Call extract to extract the audio features.
afe = audioFeatureExtractor('SampleRate',fs, ...
    'spectralCentroid',true, ...
    'spectralSlope',true);
audioFeatures = extract(afe,audioIn);
Use audioDelta to approximate the first derivative of the spectral features over time.
deltaAudioFeatures = audioDelta(audioFeatures);
Plot the spectral features and the delta of the spectral features.
map = info(afe);

tiledlayout(2,1)
nexttile
plot(audioFeatures(:,map.spectralCentroid))
ylabel('Spectral Centroid')
nexttile
plot(deltaAudioFeatures(:,map.spectralCentroid))
ylabel('Delta Spectral Centroid')
xlabel('Frame')

tiledlayout(2,1)
nexttile
plot(audioFeatures(:,map.spectralSlope))
ylabel('Spectral Slope')
nexttile
plot(deltaAudioFeatures(:,map.spectralSlope))
ylabel('Delta Spectral Slope')
xlabel('Frame')
Delta and Delta-Delta of MFCC
The delta and delta-delta of mel frequency cepstral coefficients (MFCC) are often used with the MFCC for machine learning and deep learning applications.
Read in an audio file.
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
Use the designAuditoryFilterBank function to design a one-sided frequency-domain mel filter bank.
analysisWindowLength = round(fs*0.03);
fb = designAuditoryFilterBank(fs,"FFTLength",analysisWindowLength);
Use the stft function to convert the audio signal to a complex, one-sided frequency-domain representation. Convert the STFT to magnitude and apply the frequency-domain filtering.
[S,~,t] = stft(audioIn,fs, ...
    "Window",hann(analysisWindowLength,"periodic"), ...
    "FrequencyRange","onesided");
auditorySTFT = fb*abs(S);
Use the cepstralCoefficients function to extract the MFCC.
melcc = cepstralCoefficients(auditorySTFT);
Use the audioDelta function to compute the delta MFCC. Call audioDelta again to compute the delta-delta MFCC. Plot the results.
deltaWindowLength = 21;
melccDelta = audioDelta(melcc,deltaWindowLength);
melccDeltaDelta = audioDelta(melccDelta,deltaWindowLength);

coefficientToDisplay = 4;

tiledlayout(3,1)
nexttile
plot(t,melcc(:,coefficientToDisplay+1))
ylabel('Coefficient ' + string(coefficientToDisplay))
nexttile
plot(t,melccDelta(:,coefficientToDisplay+1))
ylabel('Delta')
nexttile
plot(t,melccDeltaDelta(:,coefficientToDisplay+1))
xlabel('Time (s)')
ylabel('Delta-Delta')
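Because the delta and delta-delta are often used together with the MFCC, one possible arrangement is to concatenate the three matrices column-wise into a single feature matrix with one row per frame. The following is a minimal sketch of that step only. The random arrays are placeholders standing in for melcc, melccDelta, and melccDeltaDelta, and the sizes numFrames and numCoeffs are hypothetical.

```matlab
% Illustrative sketch: placeholder data, not output from the example above.
numFrames = 100;                           % hypothetical number of analysis frames
numCoeffs = 13;                            % hypothetical number of coefficients per frame

melcc           = randn(numFrames,numCoeffs);   % stand-in for the MFCC
melccDelta      = randn(numFrames,numCoeffs);   % stand-in for the delta MFCC
melccDeltaDelta = randn(numFrames,numCoeffs);   % stand-in for the delta-delta MFCC

% One row per frame; columns hold the MFCC, then delta, then delta-delta.
features = [melcc, melccDelta, melccDeltaDelta];
```

The concatenation works because all three matrices have the same size, so the combined matrix has three times as many columns as the MFCC matrix.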
Delta of Streaming Signals
You can calculate the delta of streaming signals by passing state in and out of the audioDelta function. Create a
dsp.AudioFileReader object to read an audio file frame-by-frame. Create an
audioDeviceWriter object to write audio to your speaker. Create a
timescope object to visualize the change in harmonic ratio over time.
fileReader = dsp.AudioFileReader("FemaleSpeech-16-8-mono-3secs.wav", ...
    "SamplesPerFrame",32,"PlayCount",3);

deviceWriter = audioDeviceWriter("SampleRate",fileReader.SampleRate);

scope = timescope("SampleRate",fileReader.SampleRate/fileReader.SamplesPerFrame, ...
    "TimeSpanSource","Property", ...
    "TimeSpan",3, ...
    "YLimits",[-1,1], ...
    "Title","Delta of Harmonic Ratio");
While the audio file has unread frames of data:
Read a frame from the audio file
Calculate the harmonic ratio of that frame
Calculate the delta of the harmonic ratio
Write the audio frame to your speaker
Write the change in the harmonic ratio to your scope
On each call to audioDelta, overwrite the previous state. Initialize the state using an empty array.
z = [];
while ~isDone(fileReader)
    audioIn = fileReader();
    hr = harmonicRatio(audioIn,fileReader.SampleRate, ...
        "Window",hann(fileReader.SamplesPerFrame,'periodic'), ...
        "OverlapLength",0);
    [deltaHR,z] = audioDelta(hr,5,z);
    deviceWriter(audioIn);
    scope(deltaHR)
end
release(scope)
x — Audio feature
scalar | vector | matrix | array
Audio feature, specified as a scalar, vector, or matrix. Columns of the input are treated as independent channels.
deltaWindowLength — Window length over which to calculate delta
9 (default) | odd integer
Window length over which to calculate delta, specified as an odd integer equal to or greater than 3.
initialCondition — Initial condition of filter
[] (default) | vector | matrix | array
Initial condition of the filter used to calculate the delta, specified as a vector, matrix, or multi-dimensional array. The first dimension of initialCondition must equal deltaWindowLength-1. The remaining dimensions of initialCondition must match the remaining dimensions of the input x. The default initial condition, [], is equivalent to initializing the state with all zeros.
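To illustrate why the state matters, here is a small sketch using the base MATLAB filter function, which also accepts and returns filter state. It assumes the delta is computed by an FIR filter of length deltaWindowLength, in line with the least-squares description on this page. The coefficients and variable names are hypothetical stand-ins, not the toolbox source.

```matlab
% Illustrative sketch only; filter() is a stand-in for audioDelta.
deltaWindowLength = 5;
M = (deltaWindowLength - 1)/2;
m = (-M:M).';
b = flipud(m)/sum(m.^2);               % least-squares slope weights (assumed)

x = sin(2*pi*(0:63).'/16);             % single-channel test signal (column vector)

% One-shot processing of the whole signal.
oneShot = filter(b,1,x);

% Streaming: process two chunks, passing the state out of one call
% and into the next.
[chunk1,z] = filter(b,1,x(1:32));      % state starts at zero
[chunk2,z] = filter(b,1,x(33:end),z);  % resume from the saved state
streamed = [chunk1; chunk2];

maxDifference = max(abs(oneShot - streamed));
stateLength = size(z,1);               % number of state rows for this FIR filter
```

In this sketch, maxDifference is zero to within floating-point error and stateLength equals deltaWindowLength-1. Dropping the state would restart each chunk from zeros and introduce a transient at every chunk boundary.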
delta — Delta of audio features
vector | matrix | array
Delta of audio features, returned as a vector or matrix with the same dimensions as the input x.
finalCondition — Final condition of filter
vector | matrix | array
Final condition of filter, returned as a vector, matrix, or multi-dimensional array. The final condition is returned as the same size as the initialCondition.
The audioDelta function uses a least-squares approximation of the local slope over a region centered on sample x(k), which includes M samples before the current sample and M samples after the current sample.
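Written out, a least-squares slope over a window of 2M+1 samples is the sum of m*x(k+m) for m from -M to M, divided by the sum of m^2 over the same range. The following is a minimal sketch of that formula as a causal FIR filter in base MATLAB. It is an illustration under that assumption, not the toolbox source: the variable names are hypothetical, and audioDelta may differ in normalization, delay, or edge handling.

```matlab
% Illustrative sketch only; not the audioDelta implementation.
deltaWindowLength = 9;                 % odd window length, 2*M+1 samples
M = (deltaWindowLength - 1)/2;         % samples on each side of the center
m = (-M:M).';                          % offsets relative to the center sample

% filter() convolves, so reverse the offsets to weight x(k+m) by m.
b = flipud(m)/sum(m.^2);

% Test signal: a ramp with a known slope of 3 per sample.
x = 3*(1:50).' + 2;

% Causal filtering: slopeEstimate(n) is the slope centered on sample n-M.
slopeEstimate = filter(b,1,x);

% Discard the first deltaWindowLength-1 outputs, which are affected by the
% zero initial state.
steadyState = slopeEstimate(deltaWindowLength:end);
```

For a ramp, every value in steadyState equals the ramp's slope to within floating-point error, and a constant input gives zero because the weights sum to zero.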
[1] Rabiner, Lawrence R., and Ronald W. Schafer. Theory and Applications of Digital Speech Processing. Upper Saddle River, NJ: Pearson, 2010.