How to output specific rows from tables depending on values within the table?

Question

0 votes

I have a table of variable I have pulled form an excel spread sheet (the actual file is 45 columns X 2000 rows). But this gives the idea of what I am trying to achieve.

I would like to find each separate participants (identified by their 'ID') maximum jump height for that 'season' of testing and remove the other rows.

The Table I'm working with is like this (but extended):

[Season] [TrialNo] [ID] [AgeGroup] [bodyMass] [JumpHeight] [Force] [FlighTime] [LandingForce]
 'Pre'      1      0001    U14         40         35         685       0.3          1100  
 'Pre'      2      0001    U14         40         32         630       0.25         1200  
 'Pre'      1      0002    U14         42         40         750       0.42         1000  
 'Pre'      2      0002    U14         42         36         700       0.4          1300  
 'Pre'      1      0003    U14         45         32         610       0.3          1111  
 'Pre'      2      0003    U14         45         28         600       0.3          1600  
 'Post'     1      0001    U14         40         35         685       0.3          1100  
 'Post'     2      0001    U14         40         32         630       0.25         1200  
 'Post'     1      0002    U14         42         40         750       0.42         1000  
 'Post'     2      0002    U14         42         36         700       0.4          1300  
 'Post'     1      0003    U14         45         32         610       0.3          1111  
 'Post'     2      0003    U14         45         28         600       0.3          1600

What I aim to end up with is something more like

    [Season] [TrialNo] [ID] [AgeGroup] [bodyMass] [JumpHeight] [Force] [FlighTime] [LandingForce]
     'Pre'      1      0001    U14         40         35         685       0.3          1100    
     'Pre'      1      0002    U14         42         40         750       0.42         1000  
     'Pre'      1      0003    U14         45         32         610       0.3          1111  
     'Post'     1      0001    U14         40         35         685       0.3          1100  
     'Post'     1      0002    U14         42         40         750       0.42         1000  
     'Post'     1      0003    U14         45         32         610       0.3          1111

2 Comments
Show None Hide None

Paolo on 5 Jun 2018

Edited: Paolo on 5 Jun 2018

JumpHeight seems to be already sorted for the ID for every pair or rows, so if that's always the case you could just delete every other row.

Alternatively, if its not always sorted, you could sort every two rows by JumpHeight.

Could you attach a sample spreadsheet?

Sean Byrne on 5 Jun 2018

Raw_Data_Sample.xls

Attached is a sample of the data I'm working with. As you can see jump height is random within the three trials.

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Razvan Carbunescu on 5 Jun 2018

Edited: Razvan Carbunescu on 5 Jun 2018

Open in MATLAB Online

1 vote

If you're using R2018a and only interested in maximum JumpHeight can use groupsummary on table T:

>> GT = groupsummary (T,{'Season','ID'},'max','JumpHeight')
GT =
6×4 table
    Season    ID    GroupCount    max_JumpHeight
    ______    __    __________    ______________
    'Post'    1         2               35      
    'Post'    2         2               40      
    'Post'    3         2               32      
    'Pre'     1         2               35      
    'Pre'     2         2               40      
    'Pre'     3         2               32

If you want to get all the row information or on an earlier release can use findgroups / splitapply workflow

idx = findgroups(T.Season,T.ID);
GT = splitapply(@maxidx,T,idx);
GT.Properties.VariableNames = T.Properties.VariableNames
function T = maxidx(varargin)
   [~,i] = max(varargin{6});
   tmpvarargout = cellfun(@(x) x(i,:),varargin,'UniformOutput',false);
   T = table(tmpvarargout{:});
end

Sample Output

GT =
6×9 table
    Season    TrialNo    ID    AgeGroup    bodyMass    JumpHeight    Force    FlighTime    LandingForce
    ______    _______    __    ________    ________    __________    _____    _________    ____________
    'Post'       1       1        14          40           35         685        0.3           1100    
    'Post'       1       2        14          42           40         750       0.42           1000    
    'Post'       1       3        14          45           32         610        0.3           1111    
    'Pre'        1       1        14          40           35         685        0.3           1100    
    'Pre'        1       2        14          42           40         750       0.42           1000    
    'Pre'        1       3        14          45           32         610        0.3           1111

Edit: Script assumes JumpHeight is 6th column in table, might have to modify for correct position

6 Comments
Show 4 older comments Hide 4 older comments

Razvan Carbunescu on 5 Jun 2018

Open in MATLAB Online

I am able to get the data with changing the index in the script to 9:

T = readtable('Raw_Data_Sample.xls');
idx = findgroups(T.Season,T.ID);
GT = splitapply(@maxidx,T,idx);
GT.Properties.VariableNames = T.Properties.VariableNames
function T = maxidx(varargin)
   [~,i] = max(varargin{9});
   tmpvarargout = cellfun(@(x) x(i,:),varargin,'UniformOutput',false);
   T = table(tmpvarargout{:});
end

Output I get:

>> head(GT)

ans =

8×10 table
       Year           Season       TrialNumber    AgeGroup     ID            TestDate          BodyMass    PushForce    JumpHeight    Take_off_Velocity
    ___________    ____________    ___________    ________    _____    ____________________    ________    _________    __________    _________________
    '2016-2017'    'Pre-season'         3          'U14'      17637    9/18/2016 4:58:00 PM      37.8       1136.6        0.394             2.69       
    '2016-2017'    'Pre-season'         3          'U14'      17864    9/18/2016 5:10:00 PM     49.56       1256.9        0.302             2.35       
    '2016-2017'    'Pre-season'         2          'U14'      17917    9/18/2016 5:07:00 PM     40.01       853.48        0.282             2.27       
    '2016-2017'    'Pre-season'         2          'U14'      18069    9/18/2016 4:47:00 PM     38.51       835.35        0.336             2.48       
    '2016-2017'    'Pre-season'         3          'U14'      18133    9/18/2016 5:05:00 PM     53.66       1277.5        0.364             2.57       
    '2016-2017'    'Pre-season'         3          'U13'      18891    9/18/2016 4:32:00 PM     40.76       870.99        0.423             2.81       
    '2016-2017'    'Pre-season'         3          'U13'      18935    9/18/2016 4:21:00 PM     42.36       1145.8        0.284             2.27       
    '2016-2017'    'Pre-season'         3          'U13'      19054    9/18/2016 4:25:00 PM      41.3       800.82        0.289             2.26

>>

Sean Byrne on 7 Jun 2018

Also as a fix I tried changing the (T.Season,T.ID) to {T.Season,T.ID} and that changed the error message to Undefined variable "findgroups" or class "findgroups"

Razvan Carbunescu on 7 Jun 2018

Open in MATLAB Online

I had missed the fact that you're on R2014a. findgroups/splitapply were introduced in R2016b.

I think the way to try to get it in R2014a is to use sortrows and unique with the rows flag to find the indexing to the first sorted highest value.

ST = sortrows(T,{'Season' 'ID' 'JumpHeight'},{'ascend' 'ascend' 'descend'});
% taking advantage here of the fact that ST is sorted by JumpHeight and unique returns first element
[~,idx] = unique([double(categorical(ST.Season)) ST.ID],'rows');
GT = ST(idx,:)

Sign in to comment.

Answer 2

Are Mjaavatten on 5 Jun 2018

Edited: Are Mjaavatten on 11 Jun 2018

Open in MATLAB Online

0 votes

I am a little uncertain about the type of data structure you use. For completeness I therefore entered your data in an Excel workbook that I read using readtable.

If there are always exactly two trials per ID and season:

T0 = readtable('Byrne.xlsx');
rows = [];
for i = 1:2:size(T0,1)-1
  [~,j] = max(T0.JumpHeight(i:i+1));
  rows = [rows;i+j-1];
end
T2 = T0(rows,:);

If the number of trials may vary:

T0 = sortrows(T0,'ID');
T0 = sortrows(T0,'Season','descend');
J = [find(diff([0;T0.ID])~=0);size(T0,1)];  % Indices for each ID change
rows = [];
for i = 1:length(J)-1
  [~,j] = max(T0.JumpHeight(J(i):J(i+1)-1));
  rows = [rows;J(i)+j-1];
end
T2 = T0(rows,:);

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

Answer 3

Peter Perkins on 3 Jul 2018

0 votes

In more recent versions of MATLAB there are several ways to do this. In R2014a, do a grouped varfun, using @max as the function to apply, ID and Season as the grouping Variables, and JumpHight as the InputVariable.

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

How to output specific rows from tables depending on values within the table?

2 Comments
Show None Hide None

Answers (3)

6 Comments
Show 4 older comments Hide 4 older comments

0 Comments
Show -2 older comments Hide -2 older comments

0 Comments
Show -2 older comments Hide -2 older comments

Categories

Tags

Community Treasure Hunt

How to output specific rows from tables depending on values within the table?

2 Comments Show None Hide None

Answers (3)

6 Comments Show 4 older comments Hide 4 older comments

0 Comments Show -2 older comments Hide -2 older comments

0 Comments Show -2 older comments Hide -2 older comments

Categories

Tags

See Also

Community Treasure Hunt

2 Comments
Show None Hide None

6 Comments
Show 4 older comments Hide 4 older comments

0 Comments
Show -2 older comments Hide -2 older comments

0 Comments
Show -2 older comments Hide -2 older comments