How to use retime to plot the monthly mean and stdev of a continuous time series

Hello there,
I have a dataset containing y-values in a multi-year time series. My intention is to plot the monthly mean with shaded error bars. The dataset spans many months and crosses into the following years: the attached data runs from February 2020 until December 2022.
But first, I want to clean the data by selecting only the y-values within a certain range. Here is my first attempt:
T = readtable('datamine');
Tr = table2timetable(T, 'RowTimes','time');
Tx = table(Tr.time,Tr.value_y,'VariableNames',{'time_a','val'});
idx = Tx.val>= 12 & Tx.val< 13.4; % selecting only y-value with a certain range
Tx_mean = retime(Tx(idx,:),'monthly','mean'); % get monthly averaged values ==> Failed
Tx_std = retime(Tx(idx,:),'monthly',@std); % get monthly stdev values ==> Failed
I tried to use the retime function directly, but it failed. Does anyone know how to properly get a continuous monthly mean and its shaded error bars?
More or less, the plot should have a continuous monthly x-axis that crosses into the following years.
Thanks!
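The retime calls in the question fail because Tx is built with table(...), which produces a plain table, and retime only accepts a timetable. A minimal sketch, assuming synthetic 10-minute data in place of the attached file (the names t, y, TT, TTin, TTmean, and TTstd are hypothetical), that keeps the data as a timetable, applies the question's range filter by row, and then calls retime:

```matlab
% Hypothetical synthetic data standing in for the attached file:
% one reading every 10 minutes from 15-Feb-2020 to 31-Dec-2022
t = (datetime(2020,2,15,9,50,0):minutes(10):datetime(2022,12,31,23,50,0))';
rng(0)                                   % reproducible noise
y = 12.7 + 0.4*randn(size(t));
TT = timetable(y, 'RowTimes', t, 'VariableNames', {'value_y'});

% Keep the data as a timetable and filter rows by value range
idx  = TT.value_y >= 12 & TT.value_y < 13.4;
TTin = TT(idx, :);

% retime works on timetables; 'monthly' bins are calendar months,
% so the result runs continuously across year boundaries
TTmean = retime(TTin, 'monthly', 'mean');
TTstd  = retime(TTin, 'monthly', @std);
```

With real data, TT would come from readtable followed by table2timetable instead of the synthetic lines.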

Accepted Answer

Star Strider on 19 Oct 2024
The appropriate way to express the uncertainty of each mean is the standard error of the mean (SEM), given by:
$$\mathrm{SEM} = \frac{\sigma}{\sqrt{N}}$$
where σ is the standard deviation and N is the number of data values used to calculate it.
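In MATLAB terms the formula is std(x)/sqrt(numel(x)), which is the same definition as the anonymous function SEM used below. A small check on the first five readings displayed for T1 (the names x, sigma, N, and semManual are only for illustration); note that std uses the sample (N-1) normalization by default:

```matlab
x = [12.887 13.136 13.127 12.894 12.816];  % first five readings shown for T1
sigma = std(x);              % sample standard deviation (N-1 normalization)
N = numel(x);                % number of values
semManual = sigma / sqrt(N); % SEM computed step by step

SEM = @(v) std(v)/sqrt(numel(v));   % same definition as in the answer
fprintf('SEM = %.6f\n', SEM(x))
```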
T1 = readtable('datamine.xlsx')
T1 = 147152x2 table
    value_y            time
    _______    ____________________
    12.887     15-Feb-2020 09:50:00
    13.136     15-Feb-2020 10:00:00
    13.127     15-Feb-2020 10:09:59
    12.894     15-Feb-2020 10:20:00
    12.816     15-Feb-2020 10:30:00
    12.355     15-Feb-2020 10:39:59
    12.317     15-Feb-2020 10:50:00
    12.922     15-Feb-2020 10:59:59
    13.162     15-Feb-2020 11:10:00
    13.163     15-Feb-2020 11:19:59
    13.109     15-Feb-2020 11:30:00
    13.139     15-Feb-2020 11:39:59
    13.148     15-Feb-2020 11:50:00
    12.937     15-Feb-2020 12:00:00
    12.387     15-Feb-2020 12:10:00
    12.695     15-Feb-2020 12:20:00
TT1 = table2timetable(T1)
TT1 = 147152x1 timetable
            time            value_y
    ____________________    _______
    15-Feb-2020 09:50:00    12.887
    15-Feb-2020 10:00:00    13.136
    15-Feb-2020 10:09:59    13.127
    15-Feb-2020 10:20:00    12.894
    15-Feb-2020 10:30:00    12.816
    15-Feb-2020 10:39:59    12.355
    15-Feb-2020 10:50:00    12.317
    15-Feb-2020 10:59:59    12.922
    15-Feb-2020 11:10:00    13.162
    15-Feb-2020 11:19:59    13.163
    15-Feb-2020 11:30:00    13.109
    15-Feb-2020 11:39:59    13.139
    15-Feb-2020 11:50:00    13.148
    15-Feb-2020 12:00:00    12.937
    15-Feb-2020 12:10:00    12.387
    15-Feb-2020 12:20:00    12.695
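Called without options, table2timetable uses the first datetime (or duration) variable in the table as the row times, which is why time becomes the row-time column above even though it is the second variable of T1. A minimal sketch with a hypothetical two-row table T of the same layout:

```matlab
% Hypothetical table with the same layout as T1: value first, time second
value_y = [12.887; 13.136];
time    = datetime(2020,2,15,9,50,0) + minutes([0; 10]);
T  = table(value_y, time);

TT = table2timetable(T);                 % 'time' becomes the row times
disp(TT.Properties.DimensionNames{1})    % name of the row-time dimension
disp(TT.Properties.VariableNames)        % only value_y remains a data variable
```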
SEM = @(x) std(x)/sqrt(numel(x));
TT1momean = retime(TT1, 'monthly', 'mean')
TT1momean = 35x1 timetable
       time        value_y
    ___________    _______
    01-Feb-2020    12.949
    01-Mar-2020    12.659
    01-Apr-2020    12.587
    01-May-2020    12.393
    01-Jun-2020    12.535
    01-Jul-2020    12.77
    01-Aug-2020    12.802
    01-Sep-2020    12.6
    01-Oct-2020    12.585
    01-Nov-2020    13.011
    01-Dec-2020    12.637
    01-Jan-2021    12.353
    01-Feb-2021    12.727
    01-Mar-2021    12.884
    01-Apr-2021    12.835
    01-May-2021    12.701
TT1mosem = retime(TT1, 'monthly', SEM)
TT1mosem = 35x1 timetable
       time         value_y
    ___________    _________
    01-Feb-2020    0.0093507
    01-Mar-2020    0.0067858
    01-Apr-2020    0.00657
    01-May-2020    0.0067406
    01-Jun-2020    0.0038461
    01-Jul-2020    0.0050221
    01-Aug-2020    0.0055277
    01-Sep-2020    0.0039343
    01-Oct-2020    0.0054015
    01-Nov-2020    0.0074985
    01-Dec-2020    0.0082383
    01-Jan-2021    0.0049619
    01-Feb-2021    0.0080578
    01-Mar-2021    0.004472
    01-Apr-2021    0.0032843
    01-May-2021    0.0033188
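retime applies a function handle to the values in each time bin, so any function that reduces a vector to a scalar (here SEM) can serve as the aggregation method, and 'count' gives the bin sizes. A small sketch with hypothetical readings that straddle a year boundary, showing that the monthly bins continue from December into January:

```matlab
% Hypothetical data: three readings in Dec 2020, two in Jan 2021
t = datetime([2020 12 1; 2020 12 15; 2020 12 31; 2021 1 5; 2021 1 20]);
v = [1; 2; 3; 10; 20];
TT = timetable(v, 'RowTimes', t);

SEM = @(x) std(x)/sqrt(numel(x));
TTsem = retime(TT, 'monthly', SEM);      % one row per calendar month
TTn   = retime(TT, 'monthly', 'count');  % number of readings per month
disp(TTsem)
disp(TTn)
```

For December the SEM is std([1 2 3])/sqrt(3) = 1/sqrt(3), and for January it is std([10 20])/sqrt(2) = 5.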
TT1monum = retime(TT1, 'monthly', 'count');
[Nmin,Nmax] = bounds(TT1monum{:,1})
Nmin = 331
Nmax = 4464
% TT1Time = TT1mosem.time;
% TT1SEM = TT1mosem{:,1};
figure
plot(TT1momean.time, TT1momean{:,1}, '-k')
hold on
% Shaded band: mean +/- 1.96*SEM, the normal-approximation 95% confidence interval
patch([TT1mosem.time; flip(TT1mosem.time)], ...
    [TT1momean{:,1}-TT1mosem{:,1}*1.96; flip(TT1momean{:,1}+TT1mosem{:,1}*1.96)], ...
    'r', 'FaceAlpha',0.25, 'EdgeColor','r')
hold off
grid
xlabel('Time')
ylabel('Value')
title('Mean ±95% CI')
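The patch call draws the shaded band as one closed polygon: the x vector runs forward along the month times and then back in reverse (flip), while the y vector follows the lower bound forward and the upper bound backward. A sketch of just that vertex construction, using the first four monthly means and SEMs listed above (the names tm, mu, se, xv, and yv are hypothetical):

```matlab
% First four monthly times, means, and SEMs from the tables above
tm = datetime(2020, 2:5, 1)';
mu = [12.949; 12.659; 12.587; 12.393];
se = [0.0093507; 0.0067858; 0.00657; 0.0067406];

z  = 1.96;                           % normal 95% multiplier, as in the answer
xv = [tm; flip(tm)];                 % forward along time, then back
yv = [mu - z*se; flip(mu + z*se)];   % lower bound forward, upper bound back
% xv and yv are the vertex vectors passed to patch(xv, yv, ...) in the answer
```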
The SEM values (roughly 0.003 to 0.009 in the months listed above) are very small compared with the mean values (about 12.35 to 13.01), so the shaded band is barely visible. I separately counted the number of values in each month, and since every month has more than 330 of them (Nmin = 331), using the 95% confidence intervals from the normal distribution is a safe estimate. It is not necessary to use the t-distribution, since with this many degrees of freedom it closely approximates the normal distribution.
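The factor 1.96 is the 97.5th percentile of the standard normal distribution, so mean ± 1.96·SEM covers about 95%. A sketch computing it with base-MATLAB erfcinv (norminv(0.975) from the Statistics and Machine Learning Toolbox gives the same value, if that toolbox is available), plus the resulting half-width for the largest monthly SEM shown above, which illustrates why the band is so thin relative to means near 12.5:

```matlab
p = 0.975;                        % upper point for a two-sided 95% interval
z = -sqrt(2) * erfcinv(2*p);      % inverse standard normal CDF via erfcinv
fprintf('z = %.4f\n', z)

% Half-width of the band for the largest monthly SEM listed (Feb 2020)
halfWidth = z * 0.0093507;
fprintf('half-width = %.4f\n', halfWidth)
```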
Release: R2022a
