Storing 200 GB of audio spectrograms in a tall table: is this possible?

Hi,
I'm processing 200 GB of 1-minute audio files. For each file I store in a table the file name, a timestamp and the spectrum (1x64000) for each of the 60 seconds. Then I save each table to a MAT file:
for f=1:length(totFiles)
    % Audio data
    File=tot(f).name(1:end-4);
    Fecha=datetime(str2double(File(9:12)),str2double(File(13:14)),...
        str2double(File(15:16)),str2double(File(18:19)),...
        str2double(File(20:21)),str2double(File(22:23)));
    % Audio read
    [x,fs]=audioread(fullfile(tot(f).folder,tot(f).name));
    long=length(x)/fs; % audio length in s
    % Spectrum calculation for each second
    xf=reshape(x,1*fs,[]);
    sp=pwelch(xf,fs,fs/2,fs,fs,'power'); % Careful if wlen ~= 1*fs
    % Table creation
    T(1:60,:)=table((Fecha+seconds(1:60))',...
        strcat(repmat(File,60,1),suff),sp',...
        'VariableNames',{'Fecha','File','sp'});
    location='/Volumes/Almacén/matlab/espectrosCortegada/';
    save(strcat(location,'espectrosFile_',num2str(f),'.mat'),'T');
    clear T x;
end
The problem is that when I want to load all these files into a tall array through a datastore, I get this error:
ds=datastore('/Volumes/Almacén/matlab/espectrosCortegada/*.mat')
Error using datastore
Cannot determine the datastore type for the specified location.
Specify the 'Type' name-value pair argument to indicate the type of datastore to create.
>> ds=datastore('/Volumes/Almacén/matlab/espectrosCortegada/*.mat','Type','file')
Error using datastore
Incorrect number of input arguments. Specify a function handle with the 'ReadFcn' parameter.
Any clue on how to tackle this problem, or whether this is even possible?
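For readers hitting the same error: the second message names what is missing, since a file-type datastore needs a 'ReadFcn'. Below is a minimal sketch, assuming each MAT file holds a single table variable named T, as in the loop above. It first writes two small demo tables to a temporary folder (the folder name, the contents and the 8-column stand-in spectra are made up so the snippet runs on its own); for the real data, set location to the espectrosCortegada folder and skip the demo loop. As I understand the fileDatastore documentation, 'UniformRead',true is what lets tall stack the per-file tables into one tall table instead of a tall cell array.

```matlab
% --- Demo data (hypothetical): two files shaped like the question's tables ---
location = fullfile(tempdir, "espectrosDemo");   % hypothetical demo folder
if ~isfolder(location), mkdir(location); end
for f = 1:2
    Fecha = datetime(2023,3,23,0,f,0) + seconds(1:60)';  % 60 per-second stamps
    File  = repmat("demoFile" + f, 60, 1);                % stand-in file names
    sp    = rand(60, 8);                                  % stand-in for 1x64000 spectra
    T = table(Fecha, File, sp, 'VariableNames', {'Fecha','File','sp'});
    save(fullfile(location, "espectrosFile_" + f + ".mat"), "T");
end

% Run tall operations in the local MATLAB session (no parallel pool needed).
mapreducer(0);

% A file datastore needs a ReadFcn: this one returns the table stored as T.
% UniformRead=true means the outputs of successive reads are concatenated vertically.
fds = fileDatastore(fullfile(location, "espectrosFile_*.mat"), ...
    "ReadFcn", @(fname) getfield(load(fname, "T"), "T"), ...
    "UniformRead", true);

tT = tall(fds);                      % tall table with variables Fecha, File, sp
meanSp = gather(mean(tT.sp, 1));     % average spectrum over every stored second
nRows  = gather(height(tT));         % total number of one-second rows
```

Adding the same 'ReadFcn' (and 'UniformRead') to the second datastore(...,'Type','file') call should create an equivalent object; fileDatastore just makes the type explicit.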

Accepted Answer

jibrahim on 23 Mar 2023
Hi David,
Please find below a possible solution that uses Audio Toolbox functionality. It uses a sample dataset as an example.
% Download the Free Spoken Digit Data Set (FSDD).
% FSDD consists of 2000 recordings of four speakers saying the numbers 0
% through 9 in English.
downloadFolder = matlab.internal.examples.downloadSupportFile("audio","FSDD.zip");
dataFolder = tempdir;
unzip(downloadFolder,dataFolder)
dataset = fullfile(dataFolder,"FSDD");
% Create an audioDatastore that points to the dataset.
ads = audioDatastore(dataset,IncludeSubfolders=true);
% Create a transformed datastore that computes spectra from audio data.
% Here, use pwelch.
adsSpec = transform(ads,@(x)pwelch(x,'power'));
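If you need one spectrum per second, as in the question, rather than one spectrum per file, the transform function can split each file into 1-second segments first. This is a sketch, not part of the accepted answer: perSecond is a hypothetical helper that keeps only the first channel and drops any trailing partial second, and the sample rate and synthetic signal are demo assumptions. With IncludeInfo=true, transform passes the read info (for an audioDatastore it includes SampleRate) and expects the info back as a second output, hence the deal call in the commented line.

```matlab
fs = 8000;                          % demo sample rate (assumption)
x  = randn(3*fs + 100, 2);          % 3 s plus a partial second of 2-channel noise

% Hypothetical helper: first channel, whole seconds only, one column per second,
% then pwelch per column with a 1-second window as in the question.
% Output: one row per second, fs/2+1 frequency bins per row.
perSecond = @(x, fs) pwelch( ...
    reshape(x(1:floor(size(x, 1)/fs)*fs, 1), fs, []), ...
    fs, fs/2, fs, fs, 'power').';

sp = perSecond(x, fs);              % 3-by-4001 for this demo signal

% With the audioDatastore above, the transform could then be:
% adsSpec = transform(ads, @(x, info) deal(perSecond(x, info.SampleRate), info), ...
%     IncludeInfo=true);
```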
% Use writeall to write spectra to disk. Set UseParallel to
% true to perform writing in parallel.
outputLocation = fullfile(tempdir,"MyFeatures");
writeall(adsSpec,outputLocation,WriteFcn=@myCustomWriter,UseParallel=true);
% Create a signalDatastore that points to the out-of-memory features. The
% read function returns a spectrum/timestamp pair.
sds = signalDatastore(outputLocation,IncludeSubfolders=true, ...
SignalVariableNames=["spec","timestamp"],ReadOutputOrientation="row");
% Read one pair of spectrum/timestamp
y = read(sds)
% Create a tall array from the datastore
t = tall(sds);
function myCustomWriter(spec,writeInfo,~)
% myCustomWriter(spec,writeInfo,~) writes spectra/time stamps
% pair to MAT files.
filename = strrep(writeInfo.SuggestedOutputName,".wav",".mat");
% also write a time stamp as an example
timestamp = datetime('now');
save(filename,"spec","timestamp");
end
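myCustomWriter stamps every file with datetime('now'). To keep the recording time instead, a variant could parse it from the source file name, which writeall exposes as writeInfo.ReadInfo.FileName for an audioDatastore. The lines below are a sketch that assumes the question's naming scheme (year in characters 9-12, month 13-14, day 15-16, hour 18-19, minute 20-21, second 22-23) and one spectrum row per second; the file name and the random spec are hypothetical stand-ins. Inside the writer, srcFile would be writeInfo.ReadInfo.FileName, and timestamp would be saved alongside spec as before.

```matlab
% Hypothetical stand-ins for what the writer receives.
srcFile = fullfile(tempdir, 'sensor1_20230323_101500.wav');  % made-up name in the question's layout
spec    = rand(60, 8);               % stand-in: one spectrum per second

% Parse the start time from fixed character positions of the base name
% (same positions as the question's Fecha computation).
[~, base] = fileparts(srcFile);
t0 = datetime(str2double(base(9:12)), str2double(base(13:14)), ...
    str2double(base(15:16)), str2double(base(18:19)), ...
    str2double(base(20:21)), str2double(base(22:23)));

% One timestamp per spectrum row, 1..N seconds after the start as in the question.
timestamp = t0 + seconds(1:size(spec, 1)).';
```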
  7 Comments
jibrahim on 4 Apr 2023
Not sure if this is what you want, but if you change one line of code in Procesado to:
features = [Kurtosis,Entropy];
then this works:
t = tall(sds);
Y = mean(cell2mat(t(:,1)));
Y = gather(Y)
