stretchAudio
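Description
audioOut = stretchAudio(audioIn,alpha) applies time-scale modification (TSM) to audioIn using the speedup factor alpha. A factor greater than 1 shortens the audio, a factor less than 1 lengthens it, and the sample rate is unchanged.
audioOut = stretchAudio(audioIn,alpha,Name=Value) specifies options using one or more name-value arguments. For example, stretchAudio(audioIn,0.75,Method="wsola") applies TSM using the WSOLA method.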
Examples
Apply TSM
Read in an audio signal. Listen to the audio signal and plot it over time.
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
t = (0:size(audioIn,1)-1)/fs;
plot(t,audioIn)
xlabel('Time (s)')
ylabel('Amplitude')
title('Original Signal')
axis tight
grid on
sound(audioIn,fs)
Use stretchAudio to apply a speedup factor of 1.5. Listen to the modified audio signal and plot it over time. The sample rate remains the same, but the duration of the signal has decreased.
audioOut = stretchAudio(audioIn,1.5);
t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, Speedup Factor = 1.5')
axis tight
grid on
sound(audioOut,fs)
Slow down the original audio signal by a factor of 0.75. Listen to the modified audio signal and plot it over time. The sample rate remains the same as the original audio, but the duration of the signal has increased.
audioOut = stretchAudio(audioIn,0.75);
t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, Speedup Factor = 0.75')
axis tight
grid on
sound(audioOut,fs)
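As a rough check on these results, the output duration is approximately the input duration divided by the speedup factor, because output windows are spaced at δ ≈ η/α (see Algorithms) while the sample rate stays fixed. The arithmetic sketch below assumes the 15-second duration and 44.1 kHz sample rate suggested by the file name; the lengths that stretchAudio actually returns can differ by a few frames.

```matlab
% Rough duration estimate for TSM: output duration is approx. input duration / alpha.
% Assumes a 15 s input at 44.1 kHz, as suggested by the file name.
fs = 44100;                 % sample rate (unchanged by TSM)
durIn = 15;                 % input duration in seconds
alpha = [1.5 0.75];         % speedup factors used in the example
durOut = durIn ./ alpha     % approximate output durations in seconds
nOut = round(durOut*fs)     % approximate output lengths in samples
```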
Apply TSM to Frequency-Domain Audio
stretchAudio supports TSM on frequency-domain audio when you use the default vocoder method. Applying TSM to frequency-domain audio enables you to reuse your STFT computation for multiple TSM factors.
Read in an audio signal. Listen to the audio signal and plot it over time.
[audioIn,fs] = audioread('FemaleSpeech-16-8-mono-3secs.wav');
sound(audioIn,fs)
t = (0:size(audioIn,1)-1)/fs;
plot(t,audioIn)
xlabel('Time (s)')
ylabel('Amplitude')
title('Original Signal')
axis tight
grid on
Convert the audio signal to the frequency domain.
win = sqrt(hann(256,'periodic'));
ovrlp = 192;
S = stft(audioIn,'Window',win,'OverlapLength',ovrlp,'Centered',false);
Speed up the audio signal by a factor of 1.4. Specify the window and overlap length used to create the frequency-domain representation.
alpha = 1.4;
audioOut = stretchAudio(S,alpha,'Window',win,'OverlapLength',ovrlp);
sound(audioOut,fs)
t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, TSM Factor = 1.4')
axis tight
grid on
Slow down the audio signal by a factor of 0.8. Specify the window and overlap length used to create the frequency-domain representation.
alpha = 0.8;
audioOut = stretchAudio(S,alpha,'Window',win,'OverlapLength',ovrlp);
sound(audioOut,fs)
t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, TSM Factor = 0.8')
axis tight
grid on
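Because the STFT is computed only once, you can pass the same S to stretchAudio for several TSM factors in a loop. The sketch below assumes Audio Toolbox and Signal Processing Toolbox are installed and replaces the speech file with a synthetic 440 Hz tone so that it runs on its own; the list of factors in alphas is illustrative.

```matlab
% Reuse one STFT for several TSM factors. A synthetic tone stands in for speech.
fs = 8000;
t = (0:3*fs-1)'/fs;
audioIn = 0.5*sin(2*pi*440*t);              % 3 s test tone (assumption)

win = sqrt(hann(256,'periodic'));
ovrlp = 192;
S = stft(audioIn,'Window',win,'OverlapLength',ovrlp,'Centered',false);

alphas = [0.8 1.4 2];                       % illustrative TSM factors
audioOuts = cell(size(alphas));
for k = 1:numel(alphas)
    % Only alpha changes; the STFT S is computed once and reused.
    audioOuts{k} = stretchAudio(S,alphas(k),'Window',win,'OverlapLength',ovrlp);
end
outLengths = cellfun(@(y) size(y,1),audioOuts)   % shorter output for larger alpha
```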
Increase Fidelity Using Phase-Locking
The default TSM method (vocoder) enables you to additionally apply phase-locking to increase the fidelity to the original audio.
Read in an audio signal. Listen to the audio signal and plot it over time.
[audioIn,fs] = audioread("SpeechDFT-16-8-mono-5secs.wav");
sound(audioIn,fs)
t = (0:size(audioIn,1)-1)/fs;
plot(t,audioIn)
xlabel('Time (s)')
ylabel('Amplitude')
title('Original Signal')
axis tight
grid on
Phase-locking adds a nontrivial computational load to TSM and is not always required, so it is disabled by default. Apply a speedup factor of 1.8 to the input audio signal without phase-locking. Listen to the modified audio signal and plot it over time.
alpha = 1.8;
tic
audioOut = stretchAudio(audioIn,alpha);
processingTimeWithoutPhaseLocking = toc
processingTimeWithoutPhaseLocking = 0.0798
sound(audioOut,fs)
t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, alpha = 1.8, LockPhase = false')
axis tight
grid on
Apply the same 1.8 speedup factor to the input audio signal, this time enabling phase-locking. Listen to the modified audio signal and plot it over time.
tic
audioOut = stretchAudio(audioIn,alpha,"LockPhase",true);
processingTimeWithPhaseLocking = toc
processingTimeWithPhaseLocking = 0.1154
sound(audioOut,fs)
t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, alpha = 1.8, LockPhase = true')
axis tight
grid on
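A single tic/toc measurement, as used above, can vary noticeably from run to run. For a steadier comparison, the following sketch uses the MATLAB timeit function, which calls a function several times and returns a typical execution time. It assumes Audio Toolbox is installed and uses a synthetic test signal in place of the speech file, so the times it reports will differ from the ones shown above.

```matlab
% Compare processing time with and without phase-locking using timeit.
fs = 16000;
t = (0:5*fs-1)'/fs;
audioIn = 0.5*sin(2*pi*300*t) + 0.1*randn(size(t));   % synthetic test signal (assumption)
alpha = 1.8;

tWithout = timeit(@() stretchAudio(audioIn,alpha));
tWith = timeit(@() stretchAudio(audioIn,alpha,"LockPhase",true));
fprintf("Without phase-locking: %.4f s\nWith phase-locking:    %.4f s\n",tWithout,tWith)
```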
Increase Fidelity Using WSOLA Delta
The waveform similarity overlap-add (WSOLA) TSM method enables you to specify the maximum number of samples to search for the best signal alignment. By default, WSOLA delta is the number of samples in the analysis window minus the number of samples overlapped between adjacent analysis windows. Increasing the WSOLA delta increases the computational load but might also increase fidelity.
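With the default Window and OverlapLength values listed under Name-Value Arguments, the default WSOLA delta works out to 256 samples, so the WSOLADelta value of 1024 used later in this example searches a range four times as wide. The arithmetic below only restates those documented defaults.

```matlab
% Default WSOLA search range derived from the documented defaults.
winLength = 1024;                          % numel(sqrt(hann(1024,'periodic')))
overlapLength = round(0.75*winLength);     % default OverlapLength: 768
defaultDelta = winLength - overlapLength   % default WSOLADelta: 256 samples
customDelta = 1024;                        % value used later in this example
ratio = customDelta/defaultDelta           % search range is 4 times as wide
```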
Read in an audio signal. Listen to the first 10 seconds of the audio signal.
[audioIn,fs] = audioread('RockGuitar-16-96-stereo-72secs.flac');
sound(audioIn(1:10*fs,:),fs)
Apply a TSM factor of 0.75 to the input audio signal using the WSOLA method. Listen to the first 10 seconds of the resulting audio signal.
alpha = 0.75;
tic
audioOut = stretchAudio(audioIn,alpha,"Method","wsola");
processingTimeWithDefaultWSOLADelta = toc
processingTimeWithDefaultWSOLADelta = 19.4403
sound(audioOut(1:10*fs,:),fs)
Apply a TSM factor of 0.75 to the input audio signal, this time increasing the WSOLA delta to 1024. Listen to the first 10 seconds of the resulting audio signal.
tic
audioOut = stretchAudio(audioIn,alpha,"Method","wsola","WSOLADelta",1024);
processingTimeWithIncreasedWSOLADelta = toc
processingTimeWithIncreasedWSOLADelta = 25.5306
sound(audioOut(1:10*fs,:),fs)
Input Arguments
audioIn
— Input signal
column vector | matrix | 3-D array
Input signal, specified as a column vector, matrix, or 3-D array. How the function interprets audioIn depends on the complexity of audioIn and the value of Method:
- If audioIn is real, audioIn is interpreted as a time-domain signal. In this case, audioIn must be a column vector or matrix. Columns are interpreted as individual channels. This syntax applies when Method is set to 'vocoder' or 'wsola'.
- If audioIn is complex, audioIn is interpreted as a frequency-domain signal. In this case, audioIn must be an L-by-M-by-N array, where L is the FFT length, M is the number of individual spectra, and N is the number of channels. This syntax applies only when Method is set to 'vocoder'.
Data Types: single | double
Complex Number Support: Yes
alpha
— TSM factor
positive scalar
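TSM factor, specified as a positive scalar. Values greater than 1 speed up (shorten) the audio, and values less than 1 slow it down (lengthen it).
Data Types: single | double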
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'Window',kbdwin(512)
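To illustrate the two calling conventions, the sketch below passes the same options with Name=Value syntax (R2021a and later) and with the older comma-separated form. It assumes Audio Toolbox is installed and uses a synthetic tone as input; because the options are identical, both calls are expected to return the same output.

```matlab
% Same options, two syntaxes. A synthetic tone stands in for real audio.
fs = 16000;
audioIn = 0.5*sin(2*pi*440*(0:fs-1)'/fs);                       % 1 s tone (assumption)
y1 = stretchAudio(audioIn,1.2,Method="wsola",WSOLADelta=512);      % R2021a and later
y2 = stretchAudio(audioIn,1.2,'Method','wsola','WSOLADelta',512);  % earlier releases too
sameOutput = isequal(y1,y2)
```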
Method
— Method used to time-scale audio
'vocoder' (default) | 'wsola'
Method used to time-scale audio, specified as 'vocoder' or 'wsola'. Set Method to 'vocoder' to use the phase vocoder method. Set Method to 'wsola' to use the WSOLA method.
If Method is set to 'vocoder', audioIn can be real or complex. If Method is set to 'wsola', audioIn must be real.
Data Types: char | string
Window
— Window applied in time domain
sqrt(hann(1024,'periodic')) (default) | real vector
Window applied in the time domain, specified as a real vector. The number of elements in the vector must be in the range [1, size(audioIn,1)] and must also be greater than OverlapLength.
Note
If you use stretchAudio with frequency-domain input, you must specify Window as the same window used to transform audioIn to the frequency domain.
Data Types: single | double
OverlapLength
— Number of samples overlapped between adjacent windows
round(0.75*numel(Window)) (default) | scalar in the range [0, numel(Window))
Number of samples overlapped between adjacent windows, specified as an integer in the range [0, numel(Window)).
Note
If you use stretchAudio with frequency-domain input, you must specify OverlapLength as the same overlap length used to transform audioIn to a time-frequency representation.
Data Types: single | double
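The Window and OverlapLength constraints above can be checked before calling stretchAudio. The anonymous function isValidFraming below is a hypothetical helper, not part of stretchAudio; it encodes only the constraints documented here for time-domain input: a real vector window with between 1 and size(audioIn,1) elements, and an integer overlap that is at least 0 and less than numel(Window).

```matlab
% Hypothetical helper (not part of stretchAudio) encoding the documented
% constraints on Window and OverlapLength for time-domain input.
isValidFraming = @(win,ovrlp,numSamples) ...
    isvector(win) && isreal(win) && ...
    numel(win) >= 1 && numel(win) <= numSamples && ...
    isscalar(ovrlp) && ovrlp >= 0 && ovrlp == round(ovrlp) && ...
    ovrlp < numel(win);

numSamples = 44100;                              % e.g. 1 s of audio at 44.1 kHz
w = ones(512,1);                                 % any real 512-point window
ok1 = isValidFraming(w,384,numSamples)           % true: 0 <= 384 < 512
ok2 = isValidFraming(w,512,numSamples)           % false: overlap must be < numel(Window)
ok3 = isValidFraming(ones(50000,1),0,numSamples) % false: window longer than the signal
```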
LockPhase
— Apply identity phase-locking
false (default) | true
Apply identity phase-locking, specified as false or true.
Dependencies
To enable this name-value argument, set Method to 'vocoder'.
Data Types: logical
WSOLADelta
— Maximum samples used to search for best signal alignment
numel(Window)-OverlapLength (default) | nonnegative scalar
Maximum number of samples used to search for the best signal alignment, specified as a nonnegative scalar.
Dependencies
To enable this name-value argument, set Method to 'wsola'.
Data Types: single | double
Output Arguments
audioOut
— Time-scale modified audio
column vector | matrix
Time-scale modified audio, returned as a column vector or matrix of independent channels.
Algorithms
Phase Vocoder
The phase vocoder algorithm is a frequency-domain approach to TSM [1][2]. The basic steps of the phase vocoder algorithm are:
- The algorithm windows a time-domain signal at interval η, where η = numel(Window) - OverlapLength. The windows are then converted to the frequency domain.
- To preserve horizontal (across time) phase coherence, the algorithm treats each bin as an independent sinusoid whose phase is computed by accumulating the estimates of its instantaneous frequency.
- To preserve vertical (across an individual spectrum) phase coherence, the algorithm locks the phase advance of groups of bins to the phase advance of local peaks. This step applies only if LockPhase is set to true.
- The algorithm returns the modified spectrogram to the time domain, with windows spaced at intervals of δ, where δ ≈ η/α. α is the speedup factor specified by the alpha input argument.
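The steps above can be sketched in plain MATLAB. This is a simplified illustration under stated assumptions, not the stretchAudio implementation: it omits phase-locking, handles a single channel, builds a square-root periodic Hann window by hand, and runs on a synthetic 440 Hz tone. Following the description above, it analyzes the input at hop η = numel(Window) - OverlapLength and resynthesizes at hop δ = round(η/α).

```matlab
% Phase vocoder TSM sketch (no phase-locking), mono, base MATLAB only.
fs = 8000;
x = sin(2*pi*440*(0:fs-1)'/fs);            % 1 s synthetic test tone (assumption)
alpha = 1.5;                               % speedup factor
winLen = 256;                              % plays the role of numel(Window)
ovrlp = 192;                               % plays the role of OverlapLength

n = (0:winLen-1)';
win = sqrt(0.5 - 0.5*cos(2*pi*n/winLen));  % square root of a periodic Hann window
Ha = winLen - ovrlp;                       % analysis hop, eta
Hs = round(Ha/alpha);                      % synthesis hop, delta (approx. eta/alpha)
omega = 2*pi*n/winLen;                     % nominal bin frequencies (rad/sample)
numFrames = floor((numel(x) - winLen)/Ha) + 1;

y = zeros((numFrames-1)*Hs + winLen,1);
wsum = zeros(size(y));
prevPhase = zeros(winLen,1);
synthPhase = zeros(winLen,1);
for m = 1:numFrames
    X = fft(win .* x((m-1)*Ha + (1:winLen)));  % window and transform
    phase = angle(X);
    if m == 1
        synthPhase = phase;                    % start from the input phase
    else
        % Phase advance beyond the nominal one, wrapped to [-pi, pi],
        % gives an instantaneous-frequency estimate for each bin.
        dphi = phase - prevPhase - omega*Ha;
        dphi = dphi - 2*pi*round(dphi/(2*pi));
        instFreq = omega + dphi/Ha;
        synthPhase = synthPhase + instFreq*Hs; % accumulate phase at the new hop
    end
    prevPhase = phase;
    frame = real(ifft(abs(X) .* exp(1i*synthPhase)));
    idx = (m-1)*Hs + (1:winLen);
    y(idx) = y(idx) + win .* frame;            % overlap-add at hop Hs
    wsum(idx) = wsum(idx) + win.^2;
end
nz = wsum > 1e-6;
y(nz) = y(nz) ./ wsum(nz);                     % undo the window overlap gain

Y = abs(fft(y));
[~,iPeak] = max(Y(1:floor(numel(y)/2)));
fPeak = (iPeak-1)*fs/numel(y)                  % dominant output frequency (Hz)
```

Because each bin's phase advances by its estimated instantaneous frequency times δ, the tone keeps its pitch near 440 Hz while the output is roughly 1/α as long as the input.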
WSOLA
The WSOLA algorithm is a time-domain approach to TSM [1][2]. WSOLA is an extension of the overlap-add (OLA) algorithm. In the OLA algorithm, a time-domain signal is windowed at interval η, where η = numel(Window) - OverlapLength. To construct the time-scale modified output audio, the windows are spaced at interval δ, where δ ≈ η/α. α is the TSM factor specified by the alpha input argument.
The OLA algorithm does a good job of recreating the magnitude spectra but can introduce phase jumps between windows. The WSOLA algorithm attempts to smooth the phase jumps by searching WSOLADelta samples around the η interval for a window that minimizes phase jumps. The algorithm searches for the best window iteratively, so that each successive window is chosen relative to the previously selected window.
If WSOLADelta is set to 0, then the algorithm reduces to OLA.
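The OLA and WSOLA steps above can be sketched in plain MATLAB. This is a simplified illustration under stated assumptions, not the stretchAudio implementation: it processes one channel, uses a periodic Hann synthesis window, scores candidate segments with a plain (unnormalized) cross-correlation, and runs on a synthetic 220 Hz tone. Input segments are taken near multiples of η = numel(Window) - OverlapLength, shifted by up to wsolaDelta samples in either direction, and overlap-added at hop δ = round(η/α).

```matlab
% WSOLA TSM sketch, mono, base MATLAB only.
fs = 8000;
x = sin(2*pi*220*(0:fs-1)'/fs);        % 1 s synthetic test tone (assumption)
alpha = 0.75;                          % TSM factor (< 1 lengthens the audio)
winLen = 1024;                         % plays the role of numel(Window)
ovrlp = 768;                           % plays the role of OverlapLength
wsolaDelta = winLen - ovrlp;           % search range, like the WSOLADelta default

n = (0:winLen-1)';
win = 0.5 - 0.5*cos(2*pi*n/winLen);    % periodic Hann window for overlap-add
Ha = winLen - ovrlp;                   % analysis hop, eta
Hs = round(Ha/alpha);                  % synthesis hop, delta (approx. eta/alpha)
numFrames = floor((numel(x) - winLen)/Ha) + 1;

% Zero-pad so every candidate segment stays inside the signal.
xp = [zeros(wsolaDelta,1); x; zeros(winLen + wsolaDelta + Hs,1)];
y = zeros((numFrames-1)*Hs + winLen,1);
wsum = zeros(size(y));
prevStart = wsolaDelta + 1;
for m = 0:numFrames-1
    nominal = wsolaDelta + m*Ha + 1;   % nominal segment start in xp
    best = nominal;
    if m > 0
        % The natural continuation of the previously chosen segment is
        % what the next segment should resemble to avoid phase jumps.
        target = xp(prevStart + Hs + (0:winLen-1));
        bestScore = -Inf;
        for d = -wsolaDelta:wsolaDelta
            score = sum(xp(nominal + d + (0:winLen-1)) .* target);  % cross-correlation
            if score > bestScore
                bestScore = score;
                best = nominal + d;
            end
        end
    end
    seg = xp(best + (0:winLen-1));
    idx = m*Hs + (1:winLen);
    y(idx) = y(idx) + win .* seg;      % overlap-add at hop Hs
    wsum(idx) = wsum(idx) + win;
    prevStart = best;
end
nz = wsum > 1e-6;
y(nz) = y(nz) ./ wsum(nz);             % undo the window overlap gain

Y = abs(fft(y));
[~,iPeak] = max(Y(1:floor(numel(y)/2)));
fPeak = (iPeak-1)*fs/numel(y)          % dominant output frequency (Hz)
```

Setting wsolaDelta to 0 collapses the search loop to the nominal position, which is plain OLA.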
References
[1] Driedger, Jonathan, and Meinard Müller. "A Review of Time-Scale Modification of Music Signals." Applied Sciences, Vol. 6, Issue 2, 2016.
[2] Driedger, Jonathan. "Time-Scale Modification Algorithms for Music Audio Signals." Master's thesis, Saarland University, Saarbrücken, Germany, 2011.
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
Usage notes and limitations:
- Method must be set to 'vocoder'.
- LockPhase must be set to false.
- Using gpuArray (Parallel Computing Toolbox) input with stretchAudio is recommended only for a GPU with compute capability 7.0 ("Volta") or above. Other hardware might not offer any performance advantage. To check your GPU compute capability, see ComputeCapability in the output from the gpuDevice (Parallel Computing Toolbox) function. For more information, see GPU Computing Requirements (Parallel Computing Toolbox).
For an overview of GPU usage in MATLAB®, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
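The following sketch, assuming Audio Toolbox and Parallel Computing Toolbox are installed, moves the input to the GPU only when canUseGPU reports a supported device whose ComputeCapability is 7.0 or higher, and otherwise runs on the CPU. The default Method ('vocoder') and LockPhase (false) values already satisfy the GPU limitations listed above. The synthetic input signal is an assumption standing in for real audio.

```matlab
% Run stretchAudio on the GPU when a suitable device is available.
fs = 16000;
audioIn = single(0.5*sin(2*pi*220*(0:2*fs-1)'/fs));   % 2 s synthetic tone (assumption)

useGPU = canUseGPU();
if useGPU
    d = gpuDevice;
    % ComputeCapability is returned as text, for example '7.5'.
    useGPU = str2double(d.ComputeCapability) >= 7.0;
end
if useGPU
    x = gpuArray(audioIn);
else
    x = audioIn;
end

% Defaults Method='vocoder' and LockPhase=false meet the GPU limitations.
audioOut = gather(stretchAudio(x,1.5));
fprintf("Ran on GPU: %d, output samples: %d\n",useGPU,size(audioOut,1))
```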
Version History
Introduced in R2019b