Why does the spectralCentroid function in MATLAB produce such different frequency ranges from the Python Librosa equivalent function?

Hi,
I am conducting a spectral centroid analysis on a relatively short sample of a drumstick hitting a drum. I have different samples that I want to compare to each other, essentially to observe how the timbre of the drum changes.
The problem arises when I use two different libraries and languages to conduct the same spectral centroid analysis.
First, some information on the sample and parameters:
  • Audio involves a bouncing drumstick - repeating percussive sounds that get exponentially close together (about 8 distinct hits can be heard)
  • .wav files come from stereo recordings through a sound pressure meter (doubling as a live microphone) using the software Wavepad Sound Editor (Norsonic)
  • .wav files are then sampled at 44100 Hz through MATLAB and Librosa audio reading functions.
  • The .wav files are 2.5 to 3 seconds long, which works out to a sample size of 100-130k data points
  • The samples are NOT normalised
  • Besides sample rate, all other parameters are default according to the respective functions (e.g. window type, overlap; although some parameters can be found in one but not the other)
MATLAB Audio Toolbox
I used the MATLAB Audio Toolbox spectralCentroid function, following the example found in https://www.mathworks.com/help/audio/ref/spectralcentroid.html .
Here is my code:
%time
[audioIn,fs] = audioread('C:\Users\Yixuan\Desktop\Collab\BARE.wav'); %audioIn=samples (one column per channel), fs=sampling frequency
L = length(audioIn); %length
whos audioIn;
whos fs;
Ts=1/fs; %Sampling period
%spectral centroid
centroid = spectralCentroid(audioIn,fs);
t = linspace(0,size(audioIn,1)/fs,size(centroid,1));
%plot graph
figure(1);
plot(t,centroid)
xlabel('Time (s)')
ylabel('Centroid (Hz)')
title('Spectral Centroid in Time Domain');
Python Librosa
I then used the same .wav files with the Librosa spectral_centroid function (https://librosa.github.io/librosa/generated/librosa.feature.spectral_centroid.html). I ran it in a Jupyter Notebook. The exact code I used follows this tutorial https://towardsdatascience.com/extract-features-of-music-75a3f9bc265d.
Here is the code I used (Python, not MATLAB)
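import librosa

# Raw string, so the backslashes in the Windows path are not read as escape sequences
audio_path1 = r"C:\Users\Yixuan\Desktop\Collab\BARE.wav"
# sr=96000 resamples to 96 kHz on load; librosa.load also mixes down to mono by default
x1, sr1 = librosa.load(audio_path1, sr=96000)
# Pass the signal as y=...; newer Librosa releases require keyword arguments here
spectral_centroids1 = librosa.feature.spectral_centroid(y=x1, sr=sr1)[0]
frames1 = range(len(spectral_centroids1))
t1 = librosa.frames_to_time(frames1, sr=sr1)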
Then I used pandas DataFrame to plot the graph (for future convenience).
The Output
The following is for one of the .wav files. All the files are quite similar (bouncing drumstick).
MATLAB output (blue):
Librosa output (green):
What I surmised so far
A graph of the normalised centroid values, overlapped with the waveform, by Librosa:
Both graphs have similar shape, especially the valleys and peaks which correspond to the hits and the silences respectively. This makes sense as the drum has a fundamental frequency of about 180 Hz (and lowest overtones at 300-500Hz), while during the silences, high frequency noise should dominate.
I care about the centroid values during the hits (the valleys), which are about 250Hz for MATLAB and 950Hz for Librosa. And of course, the difference in the general range is huge, with the background noise residing at ~600Hz for MATLAB and ~3000Hz (and much higher in other samples) for Librosa.
This is the problem. As MATLAB's result is significantly lower and less sensible (the centroid should be much higher due to the sheer amount of higher frequencies involved), hopefully someone can point out the issue in MATLAB's result and validate Librosa's result?
I have tried sampling a 440Hz pure sine wave (generated by online websites).
  • The Librosa function gives a stable centroid value of ~445 Hz;
  • The MATLAB function gives fluctuating centroid value between 435 and 455 Hz.
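For reference, the sine-wave check can be reproduced without either library. The following is a minimal NumPy sketch, not the code of spectralCentroid or Librosa: the helper name `frame_centroids` is made up for illustration, the tone is synthesised rather than downloaded, and the frame settings (2048-sample periodic Hann window, hop of 512, magnitude spectrum, no padding at the edges) are assumptions chosen to resemble Librosa's defaults.

```python
import numpy as np

def frame_centroids(x, sr, n_fft=2048, hop=512):
    """Magnitude-weighted spectral centroid (Hz) for each full frame of x.

    Hypothetical helper for illustration; frames are not padded at the edges.
    """
    window = np.hanning(n_fft + 1)[:-1]          # periodic Hann window
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)   # bin frequencies in Hz
    n_frames = 1 + (len(x) - n_fft) // hop
    out = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + n_fft] * window
        mag = np.abs(np.fft.rfft(frame))
        out[i] = np.sum(freqs * mag) / np.sum(mag)  # weighted mean frequency
    return out

sr = 44100
t = np.arange(sr) / sr                  # one second of sample times
tone = np.sin(2 * np.pi * 440.0 * t)    # synthetic 440 Hz sine (assumed test signal)
centroids = frame_centroids(tone, sr)
print(centroids.min(), centroids.max())
```

Every frame should land close to 440 Hz here, so a pure tone mostly confirms that the frequency axis is right. It says little about how broadband sounds such as drum hits are weighted.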
Since my sampling may be the issue, I have tried increasing the sample rate to increase the window size (as suggested by a friend):
  • Increasing the sample rate for Librosa to 96000Hz, which lowers the centroid values by about 100Hz (hit = 850Hz)
  • Increasing the sample rate for MATLAB (by interpolating the samples by 4 times), which lowers the centroid values by about 150Hz (hit = 50Hz) and creates a lot more fluctuations
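audioIn2 = interp(audioIn(:,1), 4); %upsample the first channel by a factor of 4
%Note: audioIn2 now has an effective sample rate of 4*fs, but fs is still passed below,
%so the reported frequencies come out 4 times too low and the time axis 4 times too long
centroid = spectralCentroid(audioIn2,fs);
t = linspace(0,size(audioIn2,1)/fs,size(centroid,1));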
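One thing worth checking in the MATLAB lines above: after interpolating by 4, the signal has four times as many samples per second, so any analysis function needs to be given 4*fs rather than fs. The NumPy sketch below illustrates the effect on a synthetic tone. The helper `centroid_hz` is hypothetical and computes a single whole-signal centroid, which is a simplification of what either library does.

```python
import numpy as np

def centroid_hz(x, sr):
    """Magnitude-weighted spectral centroid (Hz) of the whole signal x.

    Hypothetical helper for illustration only.
    """
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    return np.sum(freqs * mag) / np.sum(mag)

fs = 44100
up = 4                                   # upsampling factor, as in interp(x, 4)
n = np.arange(4 * 8192)
# A 440 Hz tone as it would look after 4x upsampling: the true rate is up * fs.
tone_up = np.sin(2 * np.pi * 440.0 * n / (up * fs))

right = centroid_hz(tone_up, up * fs)    # sample rate matches the data
wrong = centroid_hz(tone_up, fs)         # original rate passed by mistake
print(right, wrong)
```

With the original rate, the centroid comes out at a quarter of the value obtained with the matching rate, because every bin frequency is scaled by the sample rate that was passed in.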
Obviously, they are still miles apart.
What is the cause of this discrepancy and how should I remove it? Thanks!
  2 Comments
Luuk van Oosten on 3 Apr 2020
Dear Yixuan,
Did you investigate the (Hamming) window size used? Because if you just use
centroid = spectralCentroid(audioIn,fs)
MATLAB fills all remaining Name-Value pair arguments with the default values, which apparently are suboptimal for your problem (compared to your Python thing).
I suggest you investigate the influence of, for example:
centroid = spectralCentroid(audioIn,fs,'Window',YixuansWindow)
But you could also adjust other parameters, such as
'OverlapLength', 'FFTLength', 'Range', and 'SpectrumType'
As to the direct answer to your question "Why does the spectralCentroid function in MATLAB produce such different frequency ranges from the Python Librosa equivalent function?"
Probably because they are apparently not so equivalent, and Python Librosa just happens to work in a more plug-and-play way on your data, but I have no doubt that with some tweaking you can make it work in MATLAB.
Have fun!
jibrahim on 3 Apr 2020
Edited: jibrahim on 3 Apr 2020
With the syntax highlighted in my answer, you get the following for spectralCentroid versus Librosa. I use the audio file Counting-16-44p1-mono-15secs.wav as a test signal (the file ships with Audio Toolbox).


Accepted Answer

jibrahim on 3 Apr 2020
Edited: jibrahim on 3 Apr 2020
Hi Yixuan,
I took a look at the librosa code. As Luuk mentions, it is a matter of different defaults.
1) spectralCentroid uses a default window length of 30 ms. Librosa uses a default of 2048 samples. I feel a default in ms is better than a default in samples, since 2048 samples might be too long or short depending on your sample rate.
2) spectralCentroid uses a default overlap length of 20 ms. Librosa uses a default of 2048-512 samples.
3) spectralCentroid uses the power spectrogram by default. Librosa uses the magnitude spectrogram.
4) Librosa pads the audio with zeros to center the results. MATLAB does not.
Here is how to call spectralCentroid to match Librosa (within floating point errors):
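windowLength = 2048; % Librosa's default frame length, in samples
win = hann(windowLength,'periodic'); % Librosa's default window is a periodic Hann
overlapLength = windowLength - 512; % Librosa's default hop is 512 samples
% Pad the start to mimic Librosa's centering (assumes audioIn is a mono column vector)
audioIn = [zeros(windowLength/2,1);audioIn];
centroidML = spectralCentroid(audioIn,fs,'Window',win,'OverlapLength',overlapLength,'SpectrumType','magnitude');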
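To see why point 3 alone can shift the values so much, here is a small NumPy sketch comparing magnitude and power weighting on the same signal. It is an illustration under assumptions, not either library's implementation: `centroid_hz` is a made-up helper, and the two-tone test signal (strong 200 Hz, weak 3000 Hz) is hypothetical.

```python
import numpy as np

def centroid_hz(x, sr, spectrum_type="magnitude"):
    """Whole-signal spectral centroid (Hz) with magnitude or power weighting.

    Hypothetical helper for illustration only.
    """
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    weights = mag ** 2 if spectrum_type == "power" else mag
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    return np.sum(freqs * weights) / np.sum(weights)

sr = 44100
t = np.arange(4096) / sr
# Hypothetical test signal: a strong 200 Hz component plus a weaker 3000 Hz one.
x = 1.0 * np.sin(2 * np.pi * 200.0 * t) + 0.2 * np.sin(2 * np.pi * 3000.0 * t)

c_mag = centroid_hz(x, sr, "magnitude")
c_pow = centroid_hz(x, sr, "power")
print(c_mag, c_pow)
```

Squaring the spectrum gives the strongest component even more weight, so the power-weighted centroid sits closer to the dominant low tone than the magnitude-weighted one. That is the same direction as the gap reported in the question.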
  2 Comments
Yixuan Leow on 3 Apr 2020
Thanks Jibrahim! I managed to match my MATLAB results with the Librosa!
It seems like I can also change the window length in samples if I want to? Really appreciate the help.
jibrahim on 3 Apr 2020
Edited: jibrahim on 3 Apr 2020
Yep, you can set the window to a vector of any length. But if you skip specifying the window, then the default length is round(fs*0.03) samples, and it's a Hamming window by default.
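As a quick sanity check on the two defaults, the arithmetic can be written out. This is a plain-Python sketch: the helper name `matlab_default_window` is invented for illustration and simply evaluates the round(fs*0.03) rule quoted above, and 2048 is the Librosa frame length mentioned in the answer.

```python
import math

LIBROSA_N_FFT = 2048  # Librosa's default frame length, in samples

def matlab_default_window(fs):
    """Samples in a 30 ms window, i.e. round(fs*0.03) (hypothetical helper)."""
    return math.floor(fs * 0.03 + 0.5)  # round half up for positive values

for fs in (44100, 96000):
    n_matlab = matlab_default_window(fs)
    matlab_ms = 1000.0 * n_matlab / fs        # always about 30 ms
    librosa_ms = 1000.0 * LIBROSA_N_FFT / fs  # depends on the sample rate
    print(fs, n_matlab, round(matlab_ms, 2), round(librosa_ms, 2))
```

At 44100 Hz the fixed 2048-sample frame spans roughly 46 ms, and at 96000 Hz roughly 21 ms, whereas the 30 ms default keeps the same duration and changes the sample count instead.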
