For loop only working/filling cell array for half of data
I am trying to use a for loop to fill a cell array containing tables with various statistics (e.g. mean, median ...) for sites within a large dataset.
The aim is to end up with a cell array 1x42, with a table for each variable.
The loop seems to work only for the first 16 variables; the remaining tables are empty. However, if I run the same loop body for a single variable (e.g. i = 20), the code works and the output is a filled table.
Code and input data are attached.
clear variables; clc; load x.mat;
for i = 1:(size(x,2))
x = x(~isnan(table2array(x(:,i))),:);
[site_num,ia,obs_count] = unique(x.site_num,'sorted');
ans_mean = accumarray(obs_count,table2array(x(:,i)),[],@(x)mean(x,'omitnan')); ans_mean = [array2table(ans_mean)];
txt1 = x(:,i).Properties.VariableNames; txt2 = ans_mean.Properties.VariableNames; header = strcat(txt1,{'_'},txt2); ans_mean = renamevars(ans_mean,'ans_mean',header);
ans_median = accumarray(obs_count,table2array(x(:,i)),[],@(x)median(x,'omitnan')); ans_median = [array2table(ans_median)];
txt1 = x(:,i).Properties.VariableNames; txt2 = ans_median.Properties.VariableNames; header = strcat(txt1,{'_'},txt2); ans_median = renamevars(ans_median,'ans_median',header);
ans_std = accumarray(obs_count,table2array(x(:,i)),[],@(x)std(x,'omitnan')); ans_std = [array2table(ans_std)];
txt1 = x(:,i).Properties.VariableNames; txt2 = ans_std.Properties.VariableNames; header = strcat(txt1,{'_'},txt2); ans_std = renamevars(ans_std,'ans_std',header);
ans_lq = accumarray(obs_count,table2array(x(:,i)),[],@(x)quantile(x,0.25)); ans_lq = [array2table(ans_lq)];
txt1 = x(:,i).Properties.VariableNames; txt2 = ans_lq.Properties.VariableNames; header = strcat(txt1,{'_'},txt2); ans_lq = renamevars(ans_lq,'ans_lq',header);
ans_uq = accumarray(obs_count,table2array(x(:,i)),[],@(x)quantile(x,0.75)); ans_uq = [array2table(ans_uq)];
txt1 = x(:,i).Properties.VariableNames; txt2 = ans_uq.Properties.VariableNames; header = strcat(txt1,{'_'},txt2); ans_uq = renamevars(ans_uq,'ans_uq',header);
obs_count = array2table(accumarray(obs_count,1)); txt1 = x(:,i).Properties.VariableNames; header = strcat(txt1,{'_'},{'obs_count'}); obs_count = renamevars(obs_count,'Var1',header);
all{i} = [array2table(site_num) ans_mean ans_median ans_std ans_lq ans_uq obs_count];
end
Any thoughts/help/tips would be greatly appreciated! Thank you!
Apologies if my code is quite inefficient, I'm still in the learning process :)
Accepted Answer
Karim
on 11 Nov 2022
Edited: Karim
on 12 Nov 2022
One issue is the reuse of the variable name "x" directly after entering the loop: you overwrite your original data with a copy from which the rows containing a NaN in the current column have been removed. Because every iteration filters the already-filtered table, after a few iterations no data is left.
It's better to create a temporary variable (I called it "currData") to hold the data you are working on in the current iteration. I shortened the code a bit and added a few comments.
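To see the effect in isolation, here is a minimal sketch on a made-up table (the table T and its column names are hypothetical, not taken from x.mat). It repeats only the filtering step from the question and records how many rows survive each iteration:

```matlab
% Toy table (hypothetical data): each column has a NaN in a different row
T = table([1;1;2;2], [NaN;2;3;4], [5;NaN;7;8], [9;10;NaN;12], ...
    'VariableNames', {'site_num','a','b','c'});

rowsLeft = zeros(1, width(T));
for i = 1:width(T)
    % same pattern as in the question: the filtered table replaces the original
    T = T(~isnan(T{:,i}), :);
    rowsLeft(i) = height(T);
end

disp(rowsLeft)   % 4 3 2 1: every iteration permanently drops more rows
```

With 42 columns that each have NaNs in different rows, the table can easily be empty well before the loop finishes, which would match the empty tables in the question.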
% load mat file
load(websave('myFile', "https://www.mathworks.com/matlabcentral/answers/uploaded_files/1189013/x.mat"));
% allocate a cell array for the output data
AllData = cell(1,size(x,2));
for i = 1:size(x,2)
% extract data for current loop, and convert to array
% EDIT: included Stephen23's proposal to extract the data
currData = x{:,i};
% figure out which values are a number
NumIdx = ~isnan( currData );
% only keep the numbers for further processing
currData = currData(NumIdx);
% find the unique (sorted) site numbers among the rows that hold a valid value
[site_num,~,obs_count] = unique(x.site_num(NumIdx) ,'sorted');
% get the name of the current variable
currVarName = x(:,i).Properties.VariableNames + "_";
% do the processing
ans_mean = accumarray(obs_count,currData,[],@(x)mean(x,'omitnan'));
ans_median = accumarray(obs_count,currData,[],@(x)median(x,'omitnan'));
ans_std = accumarray(obs_count,currData,[],@(x)std(x,'omitnan'));
ans_lq = accumarray(obs_count,currData,[],@(x)quantile(x,0.25));
ans_uq = accumarray(obs_count,currData,[],@(x)quantile(x,0.75));
% create the table names for the current variable
varNames = [ currVarName + "site_num";
currVarName + "ans_mean";
currVarName + "ans_median";
currVarName + "ans_std";
currVarName + "ans_lq";
currVarName + "ans_uq";
currVarName + "obs_count" ];
% gather the data in a table
currTable = table(site_num, ans_mean, ans_median, ans_std, ans_lq, ans_uq, accumarray(obs_count,1),...
'VariableNames',varNames);
% store the table in the output cell array
AllData{i} = currTable;
end
% have a look at the data in the output cell
AllData
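If you want to check the approach without the attached file, the following is a reduced sketch on a made-up table (the data and the names grp, site_mean and n_obs are my own, and only the mean and the observation count are computed to keep it short). It asserts that the input table keeps all its rows and that every output table is filled:

```matlab
% Toy table (hypothetical data) standing in for x.mat
x = table([1;1;2;2;3], [NaN;2;3;4;5], [5;NaN;7;8;NaN], ...
    'VariableNames', {'site_num','a','b'});

nRowsBefore = height(x);
AllData = cell(1, width(x));
for i = 1:width(x)
    currData = x{:,i};              % copy of column i; x itself is not modified
    NumIdx   = ~isnan(currData);    % rows with a valid value in this column
    [site_num, ~, grp] = unique(x.site_num(NumIdx));
    site_mean = accumarray(grp, currData(NumIdx), [], @mean);
    n_obs     = accumarray(grp, 1);
    AllData{i} = table(site_num, site_mean, n_obs);
end

% the input is unchanged and every table has at least one row
assert(height(x) == nRowsBefore)
assert(all(cellfun(@(t) height(t) > 0, AllData)))
disp(AllData{3})
```

Note that a site with no valid values for a given variable (site 3 for column "b" here) simply does not appear in that variable's table, so the tables can have different heights.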