Fast subsetting or indexing of data
I am working with large datasets, which I subset into various categories and save as smaller files. My current approach works, but it is time-consuming and error-prone because it involves a lot of copy and paste.
For example, I have split many files into those with boats and those without boats, and then split those by season. Is there a faster way to do this, where I apply the same command to a prescribed set of variables?
%% Comparisons... Season using water temp
boatsAbsent_t=boatsAbsent.Var1; %time variables
[BA_spring, BA_summer, BA_autumn, BA_winter]=indexSeasons(boatsAbsent_t); %index times into seasons
boatsPresent_t=boatsPresent.Var1;
[BP_spring, BP_summer, BP_autumn, BP_winter]=indexSeasons(boatsPresent_t);
%Subset PSD outputs and write to file
S=withtol(BA_spring,seconds(1));
BA_spring=boatsAbsent(S,:);
writetable(timetable2table(BA_spring),...
fullfile(folder,strcat(site,'_PSD_boatsAbsent_Spring.csv')));
S=withtol(BA_summer,seconds(1));
BA_summer=boatsAbsent(S,:);
writetable(timetable2table(BA_summer),...
fullfile(folder,strcat(site,'_PSD_boatsAbsent_Summer.csv')));
S=withtol(BA_autumn,seconds(1));
BA_autumn=boatsAbsent(S,:);
writetable(timetable2table(BA_autumn),...
fullfile(folder,strcat(site,'_PSD_boatsAbsent_Autumn.csv')));
S=withtol(BA_winter,seconds(1));
BA_winter=boatsAbsent(S,:);
writetable(timetable2table(BA_winter),...
fullfile(folder,strcat(site,'_PSD_boatsAbsent_Winter.csv')));
S=withtol(BP_spring,seconds(1));
BP_spring=boatsPresent(S,:); %subset rows, as in the boatsAbsent blocks above
writetable(timetable2table(BP_spring),...
fullfile(folder,strcat(site,'_PSD_boatsPresent_Spring.csv')));
S=withtol(BP_summer,seconds(1));
BP_summer=boatsPresent(S,:);
writetable(timetable2table(BP_summer),...
fullfile(folder,strcat(site,'_PSD_boatsPresent_Summer.csv')));
S=withtol(BP_autumn,seconds(1));
BP_autumn=boatsPresent(S,:);
writetable(timetable2table(BP_autumn),...
fullfile(folder,strcat(site,'_PSD_boatsPresent_Autumn.csv')));
S=withtol(BP_winter,seconds(1));
BP_winter=boatsPresent(S,:);
writetable(timetable2table(BP_winter),...
fullfile(folder,strcat(site,'_PSD_boatsPresent_Winter.csv')));
3 Comments
Stephen23
on 29 Sep 2020
Meta-data is data, and data does not belong in variable names! Sticking meta-data into variable names, e.g. the season names:
BA_spring, BA_summer, BA_autumn, BA_winter
means that you force yourself into writing slow, inefficient code or doing lots of copy-and-paste. Rik correctly recommends that you put all of your data in arrays, rather than splitting it into separate variables.
Accepted Answer
Rik
on 29 Sep 2020
Whenever you find yourself copy-pasting code in MATLAB, you should consider using an array and a loop instead.
seasons={'Spring','Summer','Autumn','Winter'};
boatsPresent_t=boatsPresent.Var1; %time variables
boatsAbsent_t=boatsAbsent.Var1; %time variables
BP=cell(1,4);BA=cell(1,4);
[BP{:}]=indexSeasons(boatsPresent_t); %index times into seasons
[BA{:}]=indexSeasons(boatsAbsent_t); %index times into seasons
for n=1:numel(seasons)
    S=withtol(BP{n},seconds(1));
    BP_part=boatsPresent(S,:);
    writetable(timetable2table(BP_part),...
        fullfile(folder,strcat(site,'_PSD_boatsPresent_',seasons{n},'.csv')));
    S=withtol(BA{n},seconds(1));
    BA_part=boatsAbsent(S,:);
    writetable(timetable2table(BA_part),...
        fullfile(folder,strcat(site,'_PSD_boatsAbsent_',seasons{n},'.csv')));
end
If you have more states than just present and absent you should consider putting those states in an array so you can use it to generate logical indices.
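As a rough sketch of that idea, the state can be stored as a categorical column in a single timetable, and each state/season combination becomes one logical index. Everything here is made up for illustration: the data, the third state 'boatsUnknown', the site name 'siteA', the temporary output folder, and the month-based season rule, which only stands in for indexSeasons because its implementation is not shown in this thread.

```matlab
% Hypothetical data: one timetable with a categorical state column
% instead of separate boatsPresent/boatsAbsent variables.
times = datetime(2020,1,1) + calmonths(0:11)';   % one sample per month
state = categorical(repmat({'boatsPresent';'boatsAbsent';'boatsUnknown'},4,1));
PSD   = (1:12)';                                  % placeholder values
data  = timetable(times, state, PSD);

% Assumed season rule (by calendar month); a stand-in for indexSeasons.
seasonNames   = {'Winter','Spring','Summer','Autumn'};
monthToSeason = [1 1 2 2 2 3 3 3 4 4 4 1];        % Jan..Dec -> index into seasonNames
seasonIdx = reshape(monthToSeason(month(data.Properties.RowTimes)), [], 1);

site   = 'siteA';      % hypothetical site name
folder = tempname;     % hypothetical output folder (unique temp directory)
mkdir(folder);

states = categories(data.state);                  % every state present in the data
for s = 1:numel(states)
    for n = 1:numel(seasonNames)
        % Logical index: rows with this state AND this season
        rows = data.state == states{s} & seasonIdx == n;
        part = data(rows,:);
        writetable(timetable2table(part), ...
            fullfile(folder, strcat(site,'_PSD_',states{s},'_',seasonNames{n},'.csv')));
    end
end
```

With this layout, adding another state means adding rows with a new category, not another block of copied code.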
5 Comments
Rik
on 30 Sep 2020
If you want to have a dynamic field name you need to use this syntax:
name='foo';
S.(name)='bar';
But what is wrong with the code you posted? You shouldn't be storing data (i.e. the season) in a variable name. If you do, that will cause the same issue every time you want to use the variables.
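For reference, a minimal sketch of the dynamic field name syntax applied to the season names. The struct name result and the values in counts are placeholders, not anything from the code discussed above.

```matlab
seasons = {'Spring','Summer','Autumn','Winter'};
counts  = [3 5 2 4];                  % placeholder data, one value per season

result = struct();
for n = 1:numel(seasons)
    result.(seasons{n}) = counts(n);  % dynamic field name: season name as field
end

% Read the values back without hard-coding any field name
names = fieldnames(result);
for n = 1:numel(names)
    fprintf('%s: %d\n', names{n}, result.(names{n}));
end
```

The season still lives in data (the seasons cell array) and is only used as a key, so the loop keeps working if the list of seasons changes.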