Audio to Mel Spectrogram
Hello, I am working on a sound classification problem. My task is to create mel spectrograms with three different window lengths (93 ms, 46 ms, and 23 ms), which is achieved by setting n_fft to 2048, 1024, and 512 respectively. Each spectrogram I get has shape (128, 216), where 128 is the number of frequency bins and 216 is the number of frames, but I don't understand where the 3 in (128, 216, 3) comes from. Can someone help me understand the right side of the attached image, the DL part?

2 Comments
Mathieu NOE
on 22 Sep 2023
You have 3 time windows, so you are computing 3 spectrograms, each of which is an array of size 128 x 216.
At the end, your 3 spectrograms are stored in a 3D array of size 128 x 216 x 3.
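To make the shapes concrete, here is a minimal Python/NumPy sketch of the stacking step. It rests on assumptions the question does not state: a 22050 Hz sample rate (which would make 2048, 1024, and 512 samples come out at roughly 93, 46, and 23 ms), a hop length of 512 samples shared by all three spectrograms, and a 5 second clip (which would give 216 frames with centered framing). The helper names are hypothetical, and the zero arrays are only placeholders for real mel spectrograms.

```python
import numpy as np

SR = 22050           # assumed sample rate in Hz (not stated in the question)
HOP = 512            # assumed hop length, shared by all three spectrograms
N_MELS = 128         # number of mel bands, from the question
N_SAMPLES = 5 * SR   # assumed 5 s clip

def window_ms(n_fft, sr=SR):
    """Duration of an n_fft-sample analysis window in milliseconds."""
    return 1000.0 * n_fft / sr

def n_frames(n_samples, hop=HOP):
    """Frame count when the signal is padded so frames are centered."""
    return 1 + n_samples // hop

def stack_spectrograms(specs):
    """Stack equally shaped 2D spectrograms along a new last axis."""
    return np.stack(specs, axis=-1)

n_ffts = (2048, 1024, 512)

# Placeholders: in practice each array would be the mel spectrogram
# computed with the corresponding n_fft and the same hop length.
specs = [np.zeros((N_MELS, n_frames(N_SAMPLES))) for _ in n_ffts]

stacked = stack_spectrograms(specs)

for n_fft in n_ffts:
    print(n_fft, "samples is about", round(window_ms(n_fft)), "ms")
print("single spectrogram:", specs[0].shape)
print("stacked array:", stacked.shape)
```

The frame count depends only on the hop length and the signal length, not on n_fft, which is why all three spectrograms can share the 128 x 216 shape and be stacked. The third axis then indexes the window length, much like the colour channels of an image.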
Mudasser Ahmad
on 22 Sep 2023
Answers (0)