Creating a loop to use all columns in a dataset array

Hello, I have a dataset with 10 columns. The first column contains the names of the variables and the next 9 contain the data for the variables across 9 periods. For each period, I want to take the data that are lower than that period's median. I use the following command for the first period: s1=data(data.mv1<median(data.mv1),{'Name','mv1',}), where mv1 is the header of the first period's column and s1 is a new dataset that contains only the variables I want. My question is: can I write a for loop that will automatically do this for all 9 periods, thus giving me s1,s2,...,s9?

1 Comment

There must be one or two typos in "s1=data(data.mv1<median(data.mv1),{'Name','mv1',})"
Please fix them by editing your question.


Accepted Answer

data = dataset({('a':'z').','names'},{rand(26,2),'mv1','mv2'});
EDIT
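% Retrieve the column (variable) names, from the second column onwards
varnames = data.Properties.VarNames(2:end);
nV = numel(varnames);
% Preallocate a cell array, one cell per period
C = cell(1,nV);
% Loop over each data column (variable)
for v = 1:nV
    % Logical index of the rows below this column's median
    idx = data.(varnames{v}) < median(data.(varnames{v}));
    % Option 1: keep only the values below the median
    C{v} = data.(varnames{v})(idx);
    % Option 2 (overwrites option 1 if both lines are kept): keep a dataset with
    % the name column plus this period; v+1 because varnames skips column 1
    C{v} = data(idx,[1 v+1]);
end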

5 Comments

Thank you for your answer. A few brief questions: if I use C{v} I get a 'Cell contents assignment to a non-cell array object.' error, whereas if I use a new letter (for example s{v}) it works fine. Is there any way, though, for s{v} to display the names of the variables along with the data? Also, your code creates one cell array that contains all the new datasets I wanted. How do I split them so that I have a separate element for each s{v}? Again, thank you for your time.
C{v} gave an error because I forgot to change the preallocation to a cell array. Now it's fixed.
I added another line in the loop, but I don't know if it does what you're asking. Choose either the first or the new line.
Can you elaborate on "how do I split..."?
The line you added does display the names like I wanted, but the variable n is not defined. I assumed you wanted to write 'v+1' instead of 'n', so I tried that and it works. The end result of the code is a cell array with dimensions '1*nV', meaning each cell of the first row (up to the nV-th cell) contains a dataset, namely C{v}. My question is: how do I extract these datasets to the MATLAB workspace so I can use them as variables?
Sorry for the distraction, it should be v (why v+1?).
Why do you want to extract them to the workspace? I suggest keeping them in the cell array, which is easier to reference. Read http://matlab.wikia.com/wiki/FAQ#How_can_I_create_variables_A1.2C_A2.2C....2CA10_in_a_loop.3F
If I use v, the first dataset doesn't show the data for the first period but instead repeats the names of the variables. Agreed, it's much easier if I keep them in the cell array; unfortunately, that only occurred to me right after I posted. Thanks again for all your help.
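As a minimal sketch of the outcome of this comment thread, the loop below keeps every sub-dataset in the cell array and reads them back by index instead of creating s1, s2, ... in the workspace. The sample data, the column name 'Name', and the variables s2, names2 and vals2 are assumptions made for illustration; dataset arrays also require the Statistics Toolbox.

```matlab
% Hypothetical sample data: a name column followed by two period columns
data = dataset({('a':'z').','Name'}, {rand(26,2),'mv1','mv2'});

varnames = data.Properties.VarNames(2:end);   % period columns only
nV = numel(varnames);
C = cell(1, nV);                              % one cell per period

for v = 1:nV
    col = data.(varnames{v});
    idx = col < median(col);                  % rows below this period's median
    % Column 1 holds the names; period v sits in column v+1 of data
    C{v} = data(idx, [1 v+1]);
end

% Use the results straight from the cell array
s2     = C{2};       % sub-dataset for the second period
names2 = s2.Name;    % names of the rows that were kept
vals2  = s2.mv2;     % the corresponding values
```

Indexing with C{v} plays the role that the numbered names s1, ..., s9 would have played, and it still works if the number of periods changes.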


More Answers (2)

It would be easy if you did not use symbols like 's1' and 'mv1', which have an index inside the name. Better to use an index as an index: s{1}, s{2}, ... and mv{1}, mv{2}, ...
Hi,
I guess this leads to the question of how to get to data.mv1 when the name mv1 is given in a variable.
Note that data.mv1 is the same as data.('mv1'). So if you have e.g.
header = {'mv1', ...};
then you could do
for i=1:length(header)
    col = data.(header{i});
    % do your median thing; dataset subscripts need both a row and a column index
    dataNew = data(col<median(col),:);
end
Hope this helps,
Titus
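The equivalence between data.mv1 and data.('mv1') mentioned in this answer can be checked with a short sketch. The tiny dataset and the variable names colName, a and b are made up for illustration, and dataset arrays require the Statistics Toolbox.

```matlab
% Hypothetical one-column dataset
data = dataset({(1:5).','mv1'});

colName = 'mv1';        % column name held in a variable
a = data.mv1;           % fixed name written out in the code
b = data.(colName);     % same column, name looked up at run time

sameColumn = isequal(a, b);   % true when both forms return the same data
```

The parenthesised form is what makes a loop over column names possible, because the name can come from a cell array element such as header{i}.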

2 Comments

Is your header a cell array? When trying to run your code I get a 'Dataset array subscripts must be two-dimensional.' error. Also, shouldn't there be a subscript on dataNew, like dataNew{i}?
Yes, header is a cell array containing the names. And yes, there should be some subscript, depending on what you want to do further with the reduced data ...

