Recommendations on how to approach XLOOKUP/VLOOKUP/SUMIF operations on a table

16 views (last 30 days)
I have a very large table which has defined column names, and several of the initial columns with only strings in them. These are like descriptive variables. The remaining columns (30 in total) store numerical values.
I'm trying to come up with a systematic approach on how to isolate the rows that contain strings from multiple colums, and then do the arithmetic operations on the numbers located in the same rows of the remaining columns.
Most of the answers I found in the community discuss predefined locations in the table, as in "I'd like to sum up the values in rows x to y in column z". I'm trying to come up with a more general and flexible approach.
Allow me to try and illustrate my question on a severely simplified example. Assume I have a table:
| Material | Sigma | L | LR |
|---|---|---|---|
| Aluminum | Plus | .12 | 111 |
| Aluminum | Minus | .34 | 222 |
| Steel | Plus | .56 | 333 |
| PET | Minus | .78 | 444 |
| Steel | Minus | .90 | 555 |
| PET | Plus | .21 | 666 |
I am able to load in the table via:
my_data = readtable('raw_data.csv','PreserveVariableNames',true);
For example, I am able to average all the values in column LR by isolating it first into an array and then using the mean function on it:
sum_LR = mean(my_data.LR)
But what I don't know is how to do this only for the rows where Material is Steel, or where Sigma is Minus, or both at the same time. Say, what is the sum of LR for Steel - how to get 888? Or what is the average value of L where the Material is x and at the same time Sigma is y.
I tried to modify this solution but I'm unable to come up with anything that will work. So I'd like to aks you only for advice on how should I approach this?
Thank you.

Answers (1)

Cris LaPierre
Cris LaPierre on 17 Aug 2021
Perhaps you want to look into grpstats or groupsummary?
We can provide more specific advice if you can be more specific. Can you attach your file and some examples of calculations you need to compute? Use the paperclip icon to attach files to your post.
  4 Comments
Eric Sofen
Eric Sofen on 17 Aug 2021
Chris's suggestion of groupsummary or grpstat (or varfun or findgroups/splitapply) is the right approach if you want to calculate the statistics for each of the groups - that is, mean for steel, mean for Aluminum, mean for PET. You can specify multiple grouping variables in these functions to do the various combinations: means separately for PET/Plus, PET/Minus, Steel/Plus...
If you only want one of these, you can do some logical indexing on the table based on the first few variables to select the subtable or array of interest, then calculate your statistics on that.
For example, if you just want the mean of LR for Steel/Minus, use {} to extract the LR data as an array just where the material and sigma are correct. Then pass that to mean.
mean(t{t.Material =="Steel" & t.Sigma == "Minus","LR"})
Milos Krsmanovic
Milos Krsmanovic on 17 Aug 2021
OK, thank you very much @Cris LaPierre, @Eric Sofen. Both of those approaches are exactly what I was looking for.
I tried:
mean(my_data.LR(my_data.Material == "Steel" & my_data.Sigma == "Minus"))
i.e. the concateanated version of Chris' version and also Eric's proposal:
mean(my_data{my_data.Material =="Steel" & my_data.Sigma == "Minus","LR"})
and they both worked like a charm. Plus - no additional sub-tables were created, which is great. And I learned more about the syntax, I knew this could be made so simple in MATLAB.
Thank you!

Sign in to comment.

Categories

Find more on Data Preprocessing in Help Center and File Exchange

Tags

Products


Release

R2019b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!