Ignore missing data in a table group
    14 views (last 30 days)
  
       Show older comments
    
I have a table with some missing data. I am using findgroups and splitapply to do some calculations on columns of the table, but when a group has a missing value the calculation returns missing. 
I would like to ignore the missing value in these calculations but without removing the enitre row- other colums have valid data. 
>> T = readtable('messy.csv','TreatAsEmpty',{'.','NA'})
T =
  21×5 table
       A         B          C          D       E  
    ________    ____    __________    ____    ____
    {'afe1'}       3    {'yes'   }       3       3
    {'egh3'}     NaN    {'no'    }       7       7
    {'wth4'}       3    {'yes'   }       3       3
    {'atn2'}      23    {'no'    }      23      23
    {'arg1'}       5    {'yes'   }       5       5
    {'jre3'}    34.6    {'yes'   }    34.6    34.6
    {'wen9'}     234    {'yes'   }     234     234
    {'ple2'}       2    {'no'    }       2       2
    {'dbo8'}       5    {'no'    }       5       5
    {'oii4'}       5    {'yes'   }       5       5
    {'wnk3'}     245    {'yes'   }     245     245
    {'abk6'}     563    {0×0 char}     563     563
    {'pnj5'}     463    {'no'    }     463     463
    {'wnn3'}       6    {'no'    }       6       6
    {'oks9'}      23    {'yes'   }      23      23
    {'wba3'}     NaN    {'yes'   }     NaN      14
    {'pkn4'}       2    {'no'    }       2       2
    {'adw3'}      22    {'no'    }      22      22
    {'poj2'}     -99    {'yes'   }     -99     -99
    {'bas8'}      23    {'no'    }      23      23
    {'gry5'}     NaN    {'yes'   }     NaN      21
    >> [G,gen]=findgroups(T(:,[3])); %find groups based on column C
    >> gen
gen =
  2×1 table
       C   
    _______
    {'no' }
    {'yes'}
    %% find mean of columns E and D based on Groups in column C ('no' or 'yes')
    >> MeanE=splitapply(@mean,T(:,5),G)
MeanE =
          61.4444444444444
          44.4181818181818
    % works 
    >> MeanD=splitapply(@mean,T(:,4),G) 
MeanD =
          61.4444444444444
                       NaN
    % Does not work for 'yes' group due to NaN values in column 4
    >> MeanD=splitapply(@mean,rmmissing(T(:,4)),G);
    %fails because rmmissing(T(:,4)) and G are differnet sizes
I would like to be able to ignore the NaN values in column D when calcualting the mean. I can't seem to make it work with ismissing or rmmissing, and feel like this should be simpler than splitting them manually. 
2 Comments
  Johan
      
 on 27 Oct 2021
				I'm not used to working with table but maybe instead of using @mean you can define a function and use this in your splitapply call.
mymean = @(x) mean(x,'omitnan');
splitapply(mymean,T(:,4),G) 
Accepted Answer
  Ive J
      
 on 27 Oct 2021
        As Johan also suggested you may use omitnant flag. Also, consider using groupsummary and groupfilter :
m = groupsummary(T, 'C', @(x)mean(x, 'omitnan'), {'D', 'E'})
        C         GroupCount    fun1_D    fun1_E
    __________    __________    ______    ______
    {0×0 char}         1           563       563
    {'no'    }         9        61.444    61.444
    {'yes'   }        11          50.4    44.418
More Answers (0)
See Also
Categories
				Find more on Data Type Identification in Help Center and File Exchange
			
	Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!

