Main Content

groupfilter

Filter by group

Since R2019b

Description

Table Data

example

G = groupfilter(T,groupvars,method) returns the rows of table or timetable T that satisfy the group-wise filtering condition specified in method. The filtering condition method is a function handle applied to each nongrouping variable. Groups are defined by rows in the variables in groupvars that have the same unique combination of values. For example, G = groupfilter(T,"Trial",@(x) numel(x) > 5) groups the data in T by Trial, and keeps the rows that belong to groups with more than five trials.

You can use groupfilter functionality interactively by adding the Compute by Group task to a live script.

example

G = groupfilter(T,groupvars,groupbins,method) specifies to bin rows in groupvars according to binning scheme groupbins prior to grouping. For example, G = groupfilter(T,"SaleDate","year",@(x) numel(x) > 5) bins the data in SaleDate by year, and keeps the rows whose bin has more than five elements.

example

G = groupfilter(___,datavars) specifies the table variables to apply the filtering method to for any of the previous syntaxes. For example, G = groupfilter(T,"Trial",@(x) x == max(x),"Height") keeps the rows of T that correspond to the maximum height for each trial.

G = groupfilter(___,"IncludedEdge",LR) specifies the included bin edge as "left" or "right" to indicate which end of the bin interval is inclusive. You can use IncludeEdge with any previous syntax that specifies groupbins.

Array Data

B = groupfilter(A,groupvars,method) returns the rows of vector or matrix A that satisfy the group-wise filtering condition specified in method. The filtering condition method is a function handle applied to all column vectors. Groups are defined by rows in the column vectors in groupvars that have the same unique combination of values.

You can use groupfilter functionality interactively by adding the Compute by Group task to a live script.

example

B = groupfilter(A,groupvars,groupbins,method) specifies to bin rows in groupvars according to binning scheme groupbins prior to grouping.

B = groupfilter(A,groupvars,groupbins,method,"IncludedEdge",LR) specifies whether to include the left or right edge in each bin when groupbins is specified.

example

[B,BG] = groupfilter(A,___) also returns the unique grouping vector combinations corresponding to the rows in B.

Examples

collapse all

Create a table containing two variables.

groupID = [1 1 1 2 2 3]';
sample = [3 1 2 9 8 5]';
T = table(groupID,sample)
T=6×2 table
    groupID    sample
    _______    ______

       1         3   
       1         1   
       1         2   
       2         9   
       2         8   
       3         5   

Group by ID number, and return rows corresponding to groups with more than two samples.

Gnumel = groupfilter(T,"groupID",@(x) numel(x) > 2)
Gnumel=3×2 table
    groupID    sample
    _______    ______

       1         3   
       1         1   
       1         2   

Return rows whose group samples are between 0 and 6.

Gvals = groupfilter(T,"groupID",@(x) min(x) > 0 && max(x) < 6)
Gvals=4×2 table
    groupID    sample
    _______    ______

       1         3   
       1         1   
       1         2   
       3         5   

Create a table containing two variables that represent a day number and temperature.

daynum = [1 1 1 1 2 2 2 2]';
temp = [67 65 71 55 61 79 58 78]';
T = table(daynum,temp)
T=8×2 table
    daynum    temp
    ______    ____

      1        67 
      1        65 
      1        71 
      1        55 
      2        61 
      2        79 
      2        58 
      2        78 

Group by day number, and return the largest two temperatures for each day.

G = groupfilter(T,"daynum",@(x) ismember(x,maxk(x,2)))
G=4×2 table
    daynum    temp
    ______    ____

      1        67 
      1        71 
      2        79 
      2        78 

Create a table of dates and corresponding profits.

timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 4 10; ...
                       2017 4 14; 2017 4 30; 2017 5 25; ...
                       2017 5 29; 2017 5 21]);
profit = [2032 3071 1185 2587 1998 2899 3112 909 2619]';
T = table(timeStamps,profit)
T=9×2 table
    timeStamps     profit
    ___________    ______

    04-Mar-2017     2032 
    02-Mar-2017     3071 
    15-Mar-2017     1185 
    10-Apr-2017     2587 
    14-Apr-2017     1998 
    30-Apr-2017     2899 
    25-May-2017     3112 
    29-May-2017      909 
    21-May-2017     2619 

Group the dates by month, and return rows that correspond to the maximum profit for that month.

Gmax = groupfilter(T,"timeStamps","month",@(x) x == max(x))
Gmax=3×3 table
    timeStamps     profit    month_timeStamps
    ___________    ______    ________________

    02-Mar-2017     3071         Mar-2017    
    30-Apr-2017     2899         Apr-2017    
    25-May-2017     3112         May-2017    

Return rows whose month had an average profit greater than $2300.

Gavg = groupfilter(T,"timeStamps","month",@(x) mean(x) > 2300)
Gavg=3×3 table
    timeStamps     profit    month_timeStamps
    ___________    ______    ________________

    10-Apr-2017     2587         Apr-2017    
    14-Apr-2017     1998         Apr-2017    
    30-Apr-2017     2899         Apr-2017    

Create a table T that contains information about nine individuals.

groupID = [1 2 3 1 2 3 1 2 3]';
Height = [62 61 59 66 70 72 57 67 71]';
HealthStatus = categorical(["Poor"; "Good"; "Fair"; "Poor"; "Fair"; "Excellent"; "Poor"; "Excellent"; "Fair"]);
T = table(groupID,Height,HealthStatus)
T=9×3 table
    groupID    Height    HealthStatus
    _______    ______    ____________

       1         62       Poor       
       2         61       Good       
       3         59       Fair       
       1         66       Poor       
       2         70       Fair       
       3         72       Excellent  
       1         57       Poor       
       2         67       Excellent  
       3         71       Fair       

Group by ID number, and return rows for groups that contain only members with a minimum height of 60.

G1 = groupfilter(T,"groupID",@(x) min(x) >= 60,"Height")
G1=3×3 table
    groupID    Height    HealthStatus
    _______    ______    ____________

       2         61       Good       
       2         70       Fair       
       2         67       Excellent  

Group by ID number, and return rows for groups that contain only members whose health status is Poor.

G2 = groupfilter(T,"groupID",@(x) all(x == "Poor"),"HealthStatus")
G2=3×3 table
    groupID    Height    HealthStatus
    _______    ______    ____________

       1         62          Poor    
       1         66          Poor    
       1         57          Poor    

Create a vector of dates and a vector of corresponding profit values.

timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ...
                       2017 3 14; 2017 3 31; 2017 3 25; ...
                       2017 3 29; 2017 3 21; 2017 3 18]);
profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';

Group by day of the week, and compute the maximum profit for each group. Display the maximum profits and their corresponding groups.

[maxDailyProfit,dayOfWeek] = groupfilter(profit,timeStamps, ...
    "dayname",@(x) x == max(x))
maxDailyProfit = 5×1

        3071
        1185
        2899
        3112
        2619

dayOfWeek = 5x1 categorical
     Thursday 
     Wednesday 
     Friday 
     Saturday 
     Tuesday 

Input Arguments

collapse all

Input table, specified as a table or timetable.

Input array, specified as a column vector or a group of column vectors stored as a matrix.

Grouping variables or vectors, specified as one of these options:

  • For array input data, groupvars can be either a column vector with the same number of rows as A or a group of column vectors arranged in a matrix or a cell array.

  • For table or timetable input data, groupvars indicates which variables to use to compute groups in the data. You can specify the grouping variables with any of the options in this table.

    Indexing SchemeExamples

    Variable names:

    • A string or character vector

    • A string array or cell array of character vectors

    • A pattern object

    • "A" or 'A' — A variable named A

    • ["A" "B"] or {'A','B'} — Two variables named A and B

    • "Var"+digitsPattern(1) — Variables named "Var" followed by a single digit

    Variable index:

    • An index number that refers to the location of a variable in the table

    • A vector of numbers

    • A logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing 0 or false values.

    • 3 — The third variable from the table

    • [2 3] — The second and third variables from the table

    • [false false true] — The third variable

    Function handle:

    • A function handle that takes a table variable as input and returns a logical scalar

    • @isnumeric — All the variables containing numeric values

    Variable type:

    • A vartype subscript that selects variables of a specified type

    • vartype("numeric") — All the variables containing numeric values

Example: groupfilter(T,"Var3",method)

Filtering method, specified as a function handle.

method defines the function used to filter out members from each group. The function must return a logical scalar or a logical column vector with the same number of rows as the input data indicating which group members to select.

  • If the function returns a logical scalar, then either all members of the group are filtered out (when the value is false) or none are filtered out (when the value is true).

  • If the function returns a logical vector, then members of groups are filtered out when the corresponding element is false, and members are kept when the corresponding element is true.

To define the function handle, use a syntax of the form @(inputargs) mymethod, where mymethod depends on inputargs.

  • A function can filter for rows corresponding to groups that meet a condition. For example, @(x) mean(x) > 10 passes to the output only rows corresponding to groups with a group mean greater than 10.

  • A function can filter for rows that meet a condition within their corresponding group. For example, @(x) x == max(x) passes to the output only rows corresponding to the maximum value of rows within their group.

For more information, see Create Function Handle and Anonymous Functions.

When groupfilter applies the method to more than one nongrouping variable at a time, the method returns a logical scalar or vector for each variable. For each row, the corresponding values in all returned scalars or vectors must be true to pass the row to the output.

Binning scheme for grouping variables or vectors, specified as one or more of the following binning methods. Grouping variables or vectors and binning scheme arguments must be the same size, or one of them can be scalar.

  • "none" — No binning.

  • Vector of bin edges — The bin edges define the bins. You can specify the edges as numeric values or as datetime values for datetime grouping variables or vectors.

  • Number of bins — The number determines how many equally spaced bins to create. You can specify the number of bins as a positive integer scalar.

  • Length of time (bin width) — The length of time determines the width of each bin. You can specify the bin width as a duration or calendarDuration scalar for datetime or duration grouping variables or vectors.

  • Name of time unit (bin width) — The name of the time unit determines the width of each bin. You can specify the bin width as one of the options in this table for datetime or duration grouping variables or vectors.

    ValueDescriptionData Type
    "second"

    Each bin is 1 second.

    datetime and duration
    "minute"

    Each bin is 1 minute.

    datetime and duration
    "hour"

    Each bin is 1 hour.

    datetime and duration
    "day"

    Each bin is 1 calendar day. This value accounts for daylight saving time shifts.

    datetime and duration
    "week"Each bin is 1 calendar week.datetime only
    "month"Each bin is 1 calendar month.datetime only
    "quarter"Each bin is 1 calendar quarter.datetime only
    "year"

    Each bin is 1 calendar year. This value accounts for leap days.

    datetime and duration
    "decade"Each bin is 1 decade (10 calendar years).datetime only
    "century"Each bin is 1 century (100 calendar years).datetime only
    "secondofminute"

    Bins are seconds from 0 to 59.

    datetime only
    "minuteofhour"

    Bins are minutes from 0 to 59.

    datetime only
    "hourofday"

    Bins are hours from 0 to 23.

    datetime only
    "dayofweek"

    Bins are days from 1 to 7. The first day of the week is Sunday.

    datetime only
    "dayname"Bins are full day names, such as "Sunday".datetime only
    "dayofmonth"Bins are days from 1 to 31.datetime only
    "dayofyear"Bins are days from 1 to 366.datetime only
    "weekofmonth"Bins are weeks from 1 to 6.datetime only
    "weekofyear"Bins are weeks from 1 to 54.datetime only
    "monthname"Bins are full month names, such as "January".datetime only
    "monthofyear"

    Bins are months from 1 to 12.

    datetime only
    "quarterofyear"Bins are quarters from 1 to 4.datetime only

Table variables to operate on, specified as one of the options in this table. datavars indicates which variables of the input table or timetable to apply the filtering methods to. Other variables not specified by datavars pass through to the output without being operated on. groupfilter applies the filtering methods to the specified variables and uses the results to remove rows from all variables. When datavars is not specified, groupfilter operates on each nongrouping variable.

Indexing SchemeExamples

Variable names:

  • A string or character vector

  • A string array or cell array of character vectors

  • A pattern object

  • "A" or 'A' — A variable named A

  • ["A" "B"] or {'A','B'} — Two variables named A and B

  • "Var"+digitsPattern(1) — Variables named "Var" followed by a single digit

Variable index:

  • An index number that refers to the location of a variable in the table

  • A vector of numbers

  • A logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing 0 or false values.

  • 3 — The third variable from the table

  • [2 3] — The second and third variables from the table

  • [false false true] — The third variable

Function handle:

  • A function handle that takes a table variable as input and returns a logical scalar

  • @isnumeric — All the variables containing numeric values

Variable type:

  • A vartype subscript that selects variables of a specified type

  • vartype("numeric") — All the variables containing numeric values

Example: groupfilter(T,groupvars,method,["Var1" "Var2" "Var4"])

Included bin edge for binning scheme, specified as either "left" or "right", indicating which end of the bin interval is inclusive.

You can specify LR only if you also specify groupbins, and the value applies to all binning methods for all grouping variables or vectors.

Output Arguments

collapse all

Output table for table or timetable input data, returned as a table or timetable. G contains the rows in T that satisfy the group-wise filtering method.

Output array for array input data, returned as a vector or matrix. B contains the rows in A that satisfy the group-wise filtering method.

Grouping vectors for array input data, returned as a column vector or cell array of column vectors. BG contains the unique grouping vector or binned grouping vector combinations that correspond to the rows in B.

Alternative Functionality

Live Editor Task

You can use groupfilter functionality interactively by adding the Compute by Group task to a live script.

Compute by Group task in the Live Editor

Extended Capabilities

Version History

Introduced in R2019b

expand all