Saving images quickly for huge datasets

ads = audioDatastore(fulfolder, ...
    'IncludeSubfolders',true, ...
    'FileExtensions','.wav','LabelSource','foldernames');
ads.Files = natsortfiles(ads.Files);
fs = 44100; % sampling rate for melSpectrogram
for i = 1:numel(ads.Files)
    [~,filename] = fileparts(ads.Files{i});
    readingdata = read(ads);
    % Pre-process audio data: mix multichannel audio down to mono
    if width(readingdata)>1
        readingdata = mean(readingdata,2);
    end
    % Repeat clips shorter than one second
    if length(readingdata)<fs
        readingdata = [readingdata;readingdata];
    end
    savePath = fullfile(image_save,filename);
    % Save spectrogram as image
    spectro(readingdata,fs,savePath);
end

function spectro(audiodata, fs, savePath)
    % With no output argument, melSpectrogram plots into the current figure
    melSpectrogram(audiodata,fs);
    colorbar('off');
    axis off;
    f = gcf;
    saveas(f,savePath,'jpg');
    % Crop spectrogram data only
    file = [savePath,'.jpg'];
    img = imread(file);
    crop_im = imcrop(img,[115 50 675 535]);
    imwrite(crop_im,file,"jpg");
end
I have written this code, which saves the mel spectrogram image of each audio sample into a specified folder and then crops it.
My problem is that I have 5136 audio samples, and saving each image takes a very long time.
I would like to know whether there is a quicker way to get these images saved to my folder. I have kept my device running for almost two days and it is still only saving the 1100th image.
Just as I can run a training process on my GPU, is there a way to offload this work to my GPU?

Answers (2)

Joss Knight on 14 Apr 2022
It's hard to say what will speed things up, since we don't know which part of the process is slow. Is saving slow? Is computing the spectrogram slow? Try running the MATLAB profiler on a subset of the data to see where the bottlenecks are.
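As a rough sketch of the profiling step, the pattern below wraps a small workload in the profiler. The loop with pause is only a placeholder (an assumption for illustration); in practice it would be replaced by the question's loop restricted to a handful of files.

```matlab
% Sketch only: the loop below is a stand-in workload, not the real code.
profile on
for k = 1:3
    pause(0.01);             % replace with the per-file processing to measure
end
profile off
stats = profile('info');     % struct; stats.FunctionTable lists time per function
profile viewer               % opens the interactive report
```

The report shows how much time goes to plotting, saveas, imread, imwrite and the spectrogram itself, which indicates which of the suggestions is worth trying first.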
If it's file I/O that's slow, you can try parallelizing with a parallel construct such as parfor. You might also try the datastore writeall function, for which you can define a WriteFcn that would essentially contain the code of your spectro function. writeall lets you set the UseParallel option to true.
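A minimal sketch of the parfor route, under several assumptions: the helper name saveSpectrogramsInParallel is hypothetical, the demo clips are synthetic noise, each file's own sample rate is used instead of the fixed 44100, and the spectrogram is written as a plain grayscale image rather than through a figure. It needs Audio Toolbox for melSpectrogram; without Parallel Computing Toolbox the parfor loop simply runs serially.

```matlab
% Demo input (assumed data): two one-second noise clips in a temp folder.
inDir = tempname;  mkdir(inDir);
outDir = tempname; mkdir(outDir);
fs = 44100;
for k = 1:2
    audiowrite(fullfile(inDir, sprintf('clip%d.wav', k)), 0.1*randn(fs, 1), fs);
end
d = dir(fullfile(inDir, '*.wav'));
files = fullfile(inDir, {d.name});        % in the question this would be ads.Files

saveSpectrogramsInParallel(files, outDir);

function saveSpectrogramsInParallel(files, outDir)
    % Hypothetical helper: each iteration is independent, so parfor can
    % distribute the files across workers.
    parfor i = 1:numel(files)
        [x, fsFile] = audioread(files{i});    % each worker reads its own file
        if size(x, 2) > 1
            x = mean(x, 2);                   % mix down to mono, as in the question
        end
        S = melSpectrogram(x, fsFile);        % output requested, so no plot is drawn
        img = rescale(10*log10(S + eps));     % dB values scaled to [0, 1]
        [~, name] = fileparts(files{i});
        imwrite(flipud(img), fullfile(outDir, [name '.jpg']));
    end
end
```

Whether this helps depends on where the profiler says the time is going; if the disk is the limit, more workers may not give a proportional speed-up.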
If it's the spectrogram computation that's slow, and you have a GPU, maybe running on the GPU will help. Just move your data to the GPU, for instance, melSpectrogram(gpuArray(audiodata),fs).
  1 Comment
Joss Knight on 14 Apr 2022
Oh, I've noticed that you're saving a figure to disk, then loading it again in order to crop it using imcrop. This is highly inefficient. Do not use saveas, use print, and work with the options to axis, axes and print to get the output you're after.

jibrahim on 14 Apr 2022
Hi Joenam,
A couple of things I noticed in your code:
1) You rely on melSpectrogram to generate a plot for you, which is fine, but that will be a bottleneck, as you generate a plot for every file. Returning the spectrogram instead (S = melSpectrogram(...) with an output argument does not generate a plot) and saving S to a file is likely to be faster.
2) For each audio file, you write an image file, but then you read it, and then write it again. You would save time by pre-processing S, and writing the image file once, with no need to read it again.
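The two points above can be sketched as follows. This is only an illustration under assumptions: the helper name writeMelImage, the 80 dB dynamic range, and the parula colormap are choices made here, not something the thread specifies, and the input is assumed to be mono (as after the averaging step in the question). It requires Audio Toolbox.

```matlab
% Demo call with one second of noise standing in for a real recording.
fs = 44100;
x = randn(fs, 1);
outFile = fullfile(tempdir, 'example.jpg');
writeMelImage(x, fs, outFile);

function writeMelImage(audiodata, fs, outFile)
    % Hypothetical helper: writes the spectrogram matrix to disk once,
    % with no figure, saveas, imread or imcrop involved.
    S = melSpectrogram(audiodata, fs);     % bands-by-frames matrix, no plot
    SdB = 10*log10(S + eps);               % power to decibels
    SdB = max(SdB, max(SdB(:)) - 80);      % assumed 80 dB dynamic range
    SdB = flipud(SdB);                     % low frequencies at the bottom row
    idx = round(rescale(SdB, 1, 256));     % map to colormap indices 1..256
    rgb = ind2rgb(idx, parula(256));       % indexed image to RGB
    imwrite(rgb, outFile);                 % format inferred from the extension
end
```

Because S contains only spectrogram data, there are no axes or margins to crop. If a fixed image size is needed for training, imresize can be applied to rgb before imwrite.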
  1 Comment
Joenam Coutinho on 15 Apr 2022
I am not quite clear on point no. 2.
I tried cropping the mel spectrogram before saving it, in order to save the time spent reading and writing, but I am unable to feed S into imcrop. It gives me the error 'Expected DATA to be nonempty.'
I feel I am doing something wrong, but I do not know where I am going wrong.
