How to output specific rows from tables depending on values within the table?

I have a table of variable I have pulled form an excel spread sheet (the actual file is 45 columns X 2000 rows). But this gives the idea of what I am trying to achieve.
I would like to find each separate participants (identified by their 'ID') maximum jump height for that 'season' of testing and remove the other rows.
The Table I'm working with is like this (but extended):
[Season] [TrialNo] [ID] [AgeGroup] [bodyMass] [JumpHeight] [Force] [FlighTime] [LandingForce]
'Pre' 1 0001 U14 40 35 685 0.3 1100
'Pre' 2 0001 U14 40 32 630 0.25 1200
'Pre' 1 0002 U14 42 40 750 0.42 1000
'Pre' 2 0002 U14 42 36 700 0.4 1300
'Pre' 1 0003 U14 45 32 610 0.3 1111
'Pre' 2 0003 U14 45 28 600 0.3 1600
'Post' 1 0001 U14 40 35 685 0.3 1100
'Post' 2 0001 U14 40 32 630 0.25 1200
'Post' 1 0002 U14 42 40 750 0.42 1000
'Post' 2 0002 U14 42 36 700 0.4 1300
'Post' 1 0003 U14 45 32 610 0.3 1111
'Post' 2 0003 U14 45 28 600 0.3 1600
What I aim to end up with is something more like
[Season] [TrialNo] [ID] [AgeGroup] [bodyMass] [JumpHeight] [Force] [FlighTime] [LandingForce]
'Pre' 1 0001 U14 40 35 685 0.3 1100
'Pre' 1 0002 U14 42 40 750 0.42 1000
'Pre' 1 0003 U14 45 32 610 0.3 1111
'Post' 1 0001 U14 40 35 685 0.3 1100
'Post' 1 0002 U14 42 40 750 0.42 1000
'Post' 1 0003 U14 45 32 610 0.3 1111

2 Comments

JumpHeight seems to be already sorted for the ID for every pair or rows, so if that's always the case you could just delete every other row.
Alternatively, if its not always sorted, you could sort every two rows by JumpHeight.
Could you attach a sample spreadsheet?
Attached is a sample of the data I'm working with. As you can see jump height is random within the three trials.

Sign in to comment.

Answers (3)

If you're using R2018a and only interested in maximum JumpHeight can use groupsummary on table T:
>> GT = groupsummary (T,{'Season','ID'},'max','JumpHeight')
GT =
6×4 table
Season ID GroupCount max_JumpHeight
______ __ __________ ______________
'Post' 1 2 35
'Post' 2 2 40
'Post' 3 2 32
'Pre' 1 2 35
'Pre' 2 2 40
'Pre' 3 2 32
If you want to get all the row information or on an earlier release can use findgroups / splitapply workflow
idx = findgroups(T.Season,T.ID);
GT = splitapply(@maxidx,T,idx);
GT.Properties.VariableNames = T.Properties.VariableNames
function T = maxidx(varargin)
[~,i] = max(varargin{6});
tmpvarargout = cellfun(@(x) x(i,:),varargin,'UniformOutput',false);
T = table(tmpvarargout{:});
end
Sample Output
GT =
6×9 table
Season TrialNo ID AgeGroup bodyMass JumpHeight Force FlighTime LandingForce
______ _______ __ ________ ________ __________ _____ _________ ____________
'Post' 1 1 14 40 35 685 0.3 1100
'Post' 1 2 14 42 40 750 0.42 1000
'Post' 1 3 14 45 32 610 0.3 1111
'Pre' 1 1 14 40 35 685 0.3 1100
'Pre' 1 2 14 42 40 750 0.42 1000
'Pre' 1 3 14 45 32 610 0.3 1111
Edit: Script assumes JumpHeight is 6th column in table, might have to modify for correct position

6 Comments

As Razvan says, in R2018a and later, groupsummary sounds like the way to go.
Your question is sort of phrased as, "how do I pull out the data for each participant/season to find their maximum?" You may not have meant that, but just in case: most times, you can work on the data in the table across all groups, without needing to created separate sub-datasets. groupsummary, but also findgroups/splitapply, or varfun (with a grouping variable). The latter two choices are available in versions before R2018a.
That secondary output is exactly what I'm after. From there I will be able to calculate group means and graph individual results, which I can do easy enough. Just struggling to get that script working.
Unfortunately I'm only operating R2014a. Also attached is a sample of the data I'm actually working with
I am able to get the data with changing the index in the script to 9:
T = readtable('Raw_Data_Sample.xls');
idx = findgroups(T.Season,T.ID);
GT = splitapply(@maxidx,T,idx);
GT.Properties.VariableNames = T.Properties.VariableNames
function T = maxidx(varargin)
[~,i] = max(varargin{9});
tmpvarargout = cellfun(@(x) x(i,:),varargin,'UniformOutput',false);
T = table(tmpvarargout{:});
end
Output I get:
>> head(GT)
ans =
8×10 table
Year Season TrialNumber AgeGroup ID TestDate BodyMass PushForce JumpHeight Take_off_Velocity
___________ ____________ ___________ ________ _____ ____________________ ________ _________ __________ _________________
'2016-2017' 'Pre-season' 3 'U14' 17637 9/18/2016 4:58:00 PM 37.8 1136.6 0.394 2.69
'2016-2017' 'Pre-season' 3 'U14' 17864 9/18/2016 5:10:00 PM 49.56 1256.9 0.302 2.35
'2016-2017' 'Pre-season' 2 'U14' 17917 9/18/2016 5:07:00 PM 40.01 853.48 0.282 2.27
'2016-2017' 'Pre-season' 2 'U14' 18069 9/18/2016 4:47:00 PM 38.51 835.35 0.336 2.48
'2016-2017' 'Pre-season' 3 'U14' 18133 9/18/2016 5:05:00 PM 53.66 1277.5 0.364 2.57
'2016-2017' 'Pre-season' 3 'U13' 18891 9/18/2016 4:32:00 PM 40.76 870.99 0.423 2.81
'2016-2017' 'Pre-season' 3 'U13' 18935 9/18/2016 4:21:00 PM 42.36 1145.8 0.284 2.27
'2016-2017' 'Pre-season' 3 'U13' 19054 9/18/2016 4:25:00 PM 41.3 800.82 0.289 2.26
>>
Unfortunately I get the Error message: Undefined function 'findgroups' for the input arguments of type 'cell'
After running the section of script T = readtable
('Raw_Data_Sample.xls');
idx = findgroups(T.Season,T.ID);
GT = splitapply(@maxidx,T,idx);
GT.Properties.VariableNames = T.Properties.VariableNames
Any suggestions?
Also as a fix I tried changing the (T.Season,T.ID) to {T.Season,T.ID} and that changed the error message to Undefined variable "findgroups" or class "findgroups"
I had missed the fact that you're on R2014a. findgroups/splitapply were introduced in R2016b.
I think the way to try to get it in R2014a is to use sortrows and unique with the rows flag to find the indexing to the first sorted highest value.
ST = sortrows(T,{'Season' 'ID' 'JumpHeight'},{'ascend' 'ascend' 'descend'});
% taking advantage here of the fact that ST is sorted by JumpHeight and unique returns first element
[~,idx] = unique([double(categorical(ST.Season)) ST.ID],'rows');
GT = ST(idx,:)

Sign in to comment.

I am a little uncertain about the type of data structure you use. For completeness I therefore entered your data in an Excel workbook that I read using readtable.
If there are always exactly two trials per ID and season:
T0 = readtable('Byrne.xlsx');
rows = [];
for i = 1:2:size(T0,1)-1
[~,j] = max(T0.JumpHeight(i:i+1));
rows = [rows;i+j-1];
end
T2 = T0(rows,:);
If the number of trials may vary:
T0 = sortrows(T0,'ID');
T0 = sortrows(T0,'Season','descend');
J = [find(diff([0;T0.ID])~=0);size(T0,1)]; % Indices for each ID change
rows = [];
for i = 1:length(J)-1
[~,j] = max(T0.JumpHeight(J(i):J(i+1)-1));
rows = [rows;J(i)+j-1];
end
T2 = T0(rows,:);
In more recent versions of MATLAB there are several ways to do this. In R2014a, do a grouped varfun, using @max as the function to apply, ID and Season as the grouping Variables, and JumpHight as the InputVariable.

Categories

Asked:

on 5 Jun 2018

Answered:

on 3 Jul 2018

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!