I am conducting a spectral centroid analysis on a relatively short sample of a drumstick hitting a drum. I have different samples that I want to compare to each other, essentially to observe how the timbre of the drum changes.
The problem arises when I use two different libraries and languages to conduct the same spectral centroid analysis.
First, some information on the sample and parameters:
- Audio involves a bouncing drumstick: repeated percussive hits whose spacing shrinks rapidly as the bounce decays (about 8 distinct hits can be heard)
- .wav files come from stereo recordings through a sound pressure meter (doubling as a live microphone) using the software Wavepad Sound Editor (Norsonic)
- .wav files are then read at 44100 Hz with the MATLAB and Librosa audio-reading functions.
- The .wav files are 2.5 to 3 seconds long, which works out to a sample size of 100-130k data points
- The samples are NOT normalised
- Besides sample rate, all other parameters are default according to the respective functions (e.g. window type, overlap; although some parameters can be found in one but not the other)
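For reference, the quantity I expect both functions to report is the weighted mean frequency of each frame's spectrum. The NumPy sketch below spells that out with the framing parameters exposed, so the two setups can be compared on equal terms. It is not the implementation of either library: the name `frame_centroid` is hypothetical, and the defaults shown (2048-sample Hann window, hop of 512, magnitude weighting) are only my understanding of Librosa's defaults. MATLAB's defaults may differ and should be checked against its documentation.

```python
import numpy as np

def frame_centroid(x, sr, n_fft=2048, hop=512, power=False):
    """Per-frame spectral centroid (Hz) of a mono signal x.

    power=False weights each bin by |X|, power=True by |X|^2.
    Assumes len(x) >= n_fft; no padding or centring is applied.
    """
    window = np.hanning(n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)      # bin centre frequencies in Hz
    n_frames = 1 + (len(x) - n_fft) // hop
    out = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + n_fft] * window
        mag = np.abs(np.fft.rfft(frame))
        weights = mag ** 2 if power else mag
        # Weighted mean frequency; the small constant avoids division by zero.
        out[i] = np.sum(freqs * weights) / (np.sum(weights) + 1e-12)
    return out
```

Running the same function on the same mono signal with different `n_fft`, `hop` and `power` settings shows how much of a difference comes from parameters alone.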
MATLAB Audio Toolbox
Here is my code:
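% Read the recording (stereo, so audioIn has two columns)
[audioIn,fs] = audioread('C:\Users\Yixuan\Desktop\Collab\BARE.wav');
L = length(audioIn);
% Spectral centroid with all other parameters left at their defaults
centroid = spectralCentroid(audioIn,fs);
% Time axis spanning the clip, one point per centroid frame
t = linspace(0,size(audioIn,1)/fs,size(centroid,1));
plot(t,centroid);
title('Spectral Centroid in Time Domain');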
Here is the Librosa code I used (Python, not MATLAB):
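import librosa

# Raw string so the backslashes in the Windows path are not read as escapes
audio_path1 = r"C:\Users\Yixuan\Desktop\Collab\BARE.wav"
# librosa.load resamples to the requested rate and mixes down to mono by default
x1, sr1 = librosa.load(audio_path1, sr=96000)
# The result has shape (1, n_frames); take the first row to get one value per frame
spectral_centroids1 = librosa.feature.spectral_centroid(y=x1, sr=sr1)[0]
frames1 = range(len(spectral_centroids1))
t1 = librosa.frames_to_time(frames1, sr=sr1)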
Then I used pandas DataFrame to plot the graph (for future convenience).
The following is for one of the .wav files. All the files are quite similar (bouncing drumstick).
MATLAB output (blue):
Librosa output (green):
What I surmised so far
A graph of the normalised centroid values, overlapped with the waveform, by Librosa:
Both graphs have a similar shape, especially the valleys and peaks, which correspond to the hits and the silences respectively. This makes sense: the drum has a fundamental frequency of about 180 Hz (with its lowest overtones at 300-500 Hz), while during the silences high-frequency noise should dominate.
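To illustrate that reasoning, here is a small synthetic sketch in plain NumPy, not my recording and not either library's function. It builds one decaying 180 Hz "hit" on top of faint white noise and computes a magnitude-weighted centroid per frame. The decay time, noise level, frame size and hop are all made-up values, so the absolute numbers are not comparable to my graphs. Only the pattern matters: a low centroid while the tone dominates and a high one once only noise is left.

```python
import numpy as np

sr = 44100
n_fft, hop = 2048, 512                      # illustrative frame and hop sizes
rng = np.random.default_rng(1)
t = np.arange(sr) / sr                      # one second of audio

# Synthetic "hit": a 180 Hz tone decaying over ~50 ms, on top of faint white noise.
hit = np.exp(-t / 0.05) * np.sin(2 * np.pi * 180.0 * t)
x = hit + 0.001 * rng.standard_normal(sr)

window = np.hanning(n_fft)
freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
centroids = []
for start in range(0, len(x) - n_fft + 1, hop):
    mag = np.abs(np.fft.rfft(x[start:start + n_fft] * window))
    centroids.append(np.sum(freqs * mag) / np.sum(mag))   # weighted mean frequency
centroids = np.array(centroids)

print("during the hit:", round(centroids[0]), "Hz")
print("during silence:", round(centroids[-1]), "Hz")
```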
I care about the centroid values during the hits (the valleys), which are about 250 Hz for MATLAB and 950 Hz for Librosa. The difference in the overall range is also huge, with the background noise sitting at ~600 Hz for MATLAB and ~3000 Hz (and much higher in other samples) for Librosa.
This is the problem. MATLAB's result is significantly lower and seems less plausible to me (I would expect the centroid to be much higher given how much high-frequency content is present), so I am hoping someone can point out the issue in MATLAB's result and validate Librosa's.
I have also tried analysing a 440 Hz pure sine wave (generated on a website).
- The Librosa function gives a stable centroid value of ~445 Hz;
- The MATLAB function gives a centroid value that fluctuates between 435 and 455 Hz.
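A synthetic tone removes any doubt about the downloaded file. The sketch below generates a 440 Hz sine, optionally adds low-level white noise, and computes a single-frame centroid with either magnitude or power weighting. The helper `centroid_hz`, the Hann window, the 2048-sample frame and the noise level are illustrative assumptions, and which weighting each library uses by default would need checking against its documentation. For a clean tone both weightings should land close to 440 Hz, so a sine test alone may not separate them. With broadband noise added, the magnitude-weighted value is pulled upwards much more than the power-weighted one.

```python
import numpy as np

def centroid_hz(frame, sr, power=False):
    """Centroid (Hz) of one Hann-windowed frame; power=True weights by |X|^2."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    weights = mag ** 2 if power else mag
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(np.sum(freqs * weights) / np.sum(weights))

sr = 44100
n = 2048                                        # illustrative frame length
t = np.arange(n) / sr
tone = np.sin(2 * np.pi * 440.0 * t)            # clean 440 Hz sine
rng = np.random.default_rng(0)
noisy = tone + 0.01 * rng.standard_normal(n)    # same tone plus faint white noise

for name, sig in [("clean", tone), ("noisy", noisy)]:
    print(name,
          "magnitude:", round(centroid_hz(sig, sr), 1),
          "power:", round(centroid_hz(sig, sr, power=True), 1))
```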
Since my sampling may be the issue, I have tried increasing the sample rate to increase the window size (as suggested by a friend):
- Increasing the sample rate for Librosa to 96000 Hz, which lowers the centroid values by about 100 Hz (hit = 850 Hz)
- Increasing the sample rate for MATLAB (by interpolating the samples by a factor of 4), which lowers the centroid values by about 150 Hz (hit = 50 Hz) and creates a lot more fluctuation
audioIn2 = interp(audioIn(:,1), 4);
centroid = spectralCentroid(audioIn2,fs);
t = linspace(0,size(audioIn2,1)/fs,size(centroid,1));
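One thing that may matter in this experiment: after interpolating by 4 the data contain four times as many samples per second, so the rate handed to the analysis would normally be `4*fs` rather than `fs`. The Python sketch below shows the effect on a synthetic 440 Hz tone. It upsamples by zero-padding the spectrum, which is a stand-in for MATLAB's `interp` and not the same algorithm, and the helper `centroid_hz` is hypothetical. With the original rate, the reported centroid comes out at roughly a quarter of the true frequency.

```python
import numpy as np

def centroid_hz(frame, sr):
    """Magnitude-weighted centroid (Hz) of one Hann-windowed frame."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(np.sum(freqs * mag) / np.sum(mag))

fs = 44100
x = np.sin(2 * np.pi * 440.0 * np.arange(fs) / fs)   # 1 s, exactly periodic

factor = 4
# Band-limited upsampling by zero-padding the spectrum (stand-in for interp).
x_up = np.fft.irfft(np.fft.rfft(x), n=factor * len(x)) * factor

frame = x_up[:8192]
wrong = centroid_hz(frame, fs)            # original rate passed unchanged
right = centroid_hz(frame, factor * fs)   # rate matches the upsampled data
print("with fs:", round(wrong, 1), "with 4*fs:", round(right, 1))
```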
Obviously, they are still miles apart.
What is the cause of this discrepancy and how should I remove it? Thanks!