Conditional average (need help with speed)

Question

0 votes

I have a table that looks like this:

country_id   year      M       T      average_T
          2000      10      76     NaN 
          2001      5       39     Mean of 76 and 62
          2002      NaN     37     Mean of 39 =39
          2003      15      5      NaN
          2004      10      28     Mean of 5 and 2
          2005      10      8      Mean of 8=8
          1999      15      1      NaN
          2000      10      62     Mean of 1=1
          2001      20      32     Mean of 76 and 62
          2002      10      72     Mean of 32=32
          2003      15      2      Mean of 5 and 2

I want to calculate the column average_T. For each row, it takes the year and M value of the previous row for the same country and averages T over all rows in the table that share that year and M value. (The first entry for each country_id is NaN because we don't know the past year's T for those entries.)

I have written code that does this, but it is far too slow to run on my large data set:

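N = height(mytable);                 % number of rows in the table
mytable.average_T = NaN(N,1);
for k = 2:N
    % only rows preceded by a row from the same country get a value
    if mytable{k,'country_id'} == mytable{k-1,'country_id'}
        % mean of T over all rows sharing the previous row's year and M
        mytable.average_T(k) = mean(mytable.T(mytable.M==mytable.M(k-1) & ...
            mytable.year==mytable.year(k-1)), 'omitnan');
    end
end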


Answer 1

dpb on 16 Jan 2021

Edited: dpb on 17 Jan 2021


0 votes

Grouping variables and rowfun to the rescue...

tMeans=rowfun(@(x) mean(x,'omitnan'),mytable,'InputVariables','T','GroupingVariables',{'year','M'});

11 Comments

dpb on 17 Jan 2021

Edited: dpb on 17 Jan 2021

No, it makes no sense. You either compute the average over each country ID as a group as well, or you don't group by country. If you keep the country ID, then that is the ID of the group; if you don't use country as a grouping variable, then there is no way to associate any ordering of the elements that contributed to an average with the average itself; that information is gone.

As noted in the other Q? on the same subject, you could keep the set of countries that were included in the averaging, but that is all it is; there is no order to associate with the mean.

Or, in a similar vein to the comment in the other Q?, you could add an auxiliary variable holding the row number in the table, pass it through the function, and keep it with the group. That would identify the members of the group, but again, while it could be sorted, it only identifies who is in the group; the order has no meaning for the computed mean.

BTW, this last ID would just be the grouping index you can get from findgroups; it may be that the information it contains is what you're actually looking for here, but the request as couched just doesn't make sense.

As for the previous-year part, you can simply associate each year's computed average with the previous year after the fact, or create another year variable equal to the actual year+1 and use that as the grouping variable instead.
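A minimal sketch of the "associate after the fact" idea, for illustration only. It assumes the rows are sorted by country and then by year, so that the preceding row of the same country is the previous year, as the loop in the question also assumes. The country_id values 1 and 2 are made up because the post does not show them, and meanT, sameCountry, prevKey, found, loc and ok are hypothetical names.

```matlab
% Sample data modeled on the question. The country_id values (1 and 2) are
% assumed; the original post does not show them.
country_id = [1 1 1 1 1 1 2 2 2 2 2]';
year = [2000 2001 2002 2003 2004 2005 1999 2000 2001 2002 2003]';
M    = [10 5 NaN 15 10 10 15 10 20 10 15]';
T    = [76 39 37 5 28 8 1 62 32 72 2]';
mytable = table(country_id, year, M, T);

% Mean of T for every (year, M) group; rows with NaN in M belong to no group.
tMeans = rowfun(@(x) mean(x,'omitnan'), mytable, 'InputVariables','T', ...
    'GroupingVariables',{'year','M'}, 'OutputVariableNames','meanT');

% A row gets a value only if the preceding row is from the same country.
N = height(mytable);
sameCountry = [false; mytable.country_id(2:end) == mytable.country_id(1:end-1)];

% Lookup key for each row: the (year, M) pair of the preceding row.
prevKey = [NaN NaN; mytable.year(1:end-1) mytable.M(1:end-1)];

% Match the keys against the group table; keys containing NaN never match.
[found, loc] = ismember(prevKey, [tMeans.year tMeans.M], 'rows');
ok = sameCountry & found;

mytable.average_T = NaN(N,1);
mytable.average_T(ok) = tMeans.meanT(loc(ok));
disp(mytable)
```

The group means are computed once and the per-row work is a single vectorized lookup, which is where the speed-up over the loop should come from.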

Mia Dier on 17 Jan 2021

Amazing thank you! :)

dpb on 17 Jan 2021

NB: You could do the same thing with findgroups and splitapply without building the output table from rowfun, too.
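A sketch of that findgroups/splitapply route, under the same assumptions as the question's loop: rows sorted by country and then year, so the preceding row of the same country is the previous year. The country_id values 1 and 2 are made up, and g, valid, meanT, rowMean and sameCountry are hypothetical names. Because the group index g says which (year, M) group each row belongs to, the previous row's group mean can be read off directly with a one-row shift.

```matlab
% Sample data modeled on the question. The country_id values (1 and 2) are
% assumed; the original post does not show them.
country_id = [1 1 1 1 1 1 2 2 2 2 2]';
year = [2000 2001 2002 2003 2004 2005 1999 2000 2001 2002 2003]';
M    = [10 5 NaN 15 10 10 15 10 20 10 15]';
T    = [76 39 37 5 28 8 1 62 32 72 2]';
mytable = table(country_id, year, M, T);
N = height(mytable);

% Group index for every (year, M) pair; g is NaN where M is NaN.
g = findgroups(mytable.year, mytable.M);
valid = ~isnan(g);

% One mean per group, computed only from rows that belong to a group.
meanT = splitapply(@(x) mean(x,'omitnan'), mytable.T(valid), g(valid));

% Expand back to one value per row: the mean of that row's own group.
rowMean = NaN(N,1);
rowMean(valid) = meanT(g(valid));

% Shift by one row within each country to get the previous row's group mean.
sameCountry = [false; mytable.country_id(2:end) == mytable.country_id(1:end-1)];
k = find(sameCountry);
mytable.average_T = NaN(N,1);
mytable.average_T(k) = rowMean(k-1);
disp(mytable)
```

Rows without a group are filtered out before calling splitapply so the sketch does not rely on how splitapply treats NaN group numbers.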
