averaging over a timeseries

39 views (last 30 days)
Sara on 25 Oct 2012
Answered: Johan Sjöstedt on 27 Jul 2015
I want to compare two timeseries, one which has hourly sampling and one which has five minute samples. They start and end and different times, but basically I want both to start at one am and end at midnight and be the same length.
However, I don't know how to calculate an hourly average from my five minute samples and make sure it matches the timestamps of the hourly ones.

per isakson on 26 Oct 2012
Edited: per isakson on 3 Nov 2012
This example might help you find a solution to your specific problem, i.e. "I don't know how to calculate an hourly average from my five minute samples"
Create an irregular time series with time steps typically
sdn = now + 6*sort( rand( 1000, 1 ) ); % serial data numbers
val = randn( size( sdn ) ) + sin( sdn );
Create points in time of an hourly time series
sdn1 = floor( sdn(1) );
sdn2 = ceil( sdn(end) );
% Create a series with minimal rounding errors
sdn_hourly = bsxfun( @plus, (sdn1:sdn2), transpose((0:23)/24) );
sdn_hourly = sdn_hourly(:);
Reduce the irregular time series to a regular hourly time series
[ ~, ixs ] = histc( sdn, sdn_hourly );
valh = accumarray( ixs, val, size( sdn_hourly ), @mean, nan );
.
--- [EDIT, 2012-11-02]: in response to comment ---
Averaging of regular time series causes some problems because of
• floating point precision; a rounding error may cause "13:00:00" to be included in the previous OR the succeeding "hourly interval"
• definition of time intervals. We want an half-closed interval because one value, e.g. "13:00:00", must not be included in two "hourly intervals".
My solution to these two problems are
• use seconds and whole numbers to represent time; I use serial-second-number, ssn. The resolution of one second is good enough for me. Before "ssn" my code was littered with "abs(sdn-sdn1)<eps" and similar.
• use the half-closed interval, [t1,t2). t such that t1<=t<t2 is in the interval. The function, histc, supports the half-open interval [). I let t1 either be a point in time or the time interval, [t1,t2). (The colon operator and linspace returns closed intervals: both the endpoints are included.)
• the half-closed interval (t1,t2] and
• that t2 is used as timestamp of the interval, (t1,t2]
I cannot see a good solution to that. Here is my experiment. The last three lines of code calculates hourly averages according to your requirements. (Replace @sum by @mean and do the testing that I haven't done.)
N = 2;
Create an regular time series with 5-minute time steps
vec = datevec( now );
sdn_5m = datenum(vec(1),vec(2),vec(3),0, transpose(0:5:N*24*60-0.01), 0 );
val_5m = ones( size( sdn_5m ) );
Create points in time of an hourly time series
sdn1 = floor( sdn_5m(1) );
sdn2 = ceil( sdn_5m(end) );
% Create a series with minimal rounding errors
sdn_hourly = bsxfun( @plus, (sdn1:sdn2), transpose((0:23)/24));
sdn_hourly = sdn_hourly(:);
Reduce a regular 5-minute series to a regular hourly time series [t1,t2)
[ ~, ixs ] = histc( sdn_5m, sdn_hourly );
val_5m_h1 = accumarray( ixs, val_5m, size( sdn_hourly ), @sum, nan );
assert( all( val_5m_h1(not(isnan(val_5m_h1)) ) == 12 ), 'a:b:c' ...
, 'Not all hourly sum are equal to twelve' )
%%Reduce to (t1,t2]
half_a_sec = 1/(24*60*60*2); % quick fix for this case
[ ~, ixs ] = histc( sdn_5m - half_a_sec, sdn_hourly );
val_5m_h2 = accumarray( ixs+1, val_5m, size( sdn_hourly ), @sum, nan );
per isakson on 2 Nov 2012
See above

Ryan G on 25 Oct 2012
You could try utilizing the tstool in MATLAB and see if that can ease this process at all.
Otherwise you have to make decisions on how you want to analyze your data. Do you just want the endpoints of the 5 minute sample data to match the hourly data? Do you want to find the average over the hour with the 5 minute data?
Ryan G on 25 Oct 2012
You could probably do something simple like this:
myOldData = datenum(DateVector); %convert date/time to number
myInterval = 60/5; %60minutes/hour 5 minutes/sample
for i = 1:1000
myNewData(i) = mean(myOldData(((i-1)*myInterval+1):i*myInterval));
end
Where myOldData is your data in vector form. Someone on here could probably do this without a for-loop I'm guessing, I'm not that good.

Johan Sjöstedt on 27 Jul 2015
For the sake of variety, and also to see if anyone has an input regarding the adequacy of my solution, here's another way to do it. Note that in my case I might have had other conditions to consider than you.
My case involves measurements from 2 different machines. The first one delivers measurements each second, the other one delivers mean values over each minute. I need to calculate average values over each minute from machine #1 to be able to plot against the values from machine #2.
function [M]=mean_minutes(x)
l=length(x); % Get number of elements in input vector.
N=round(l/60); % Number of intervals, to get 60 second intervals
n=round(l/N); % Length of each interval.
M=zeros(N,1); % Pre-allocate vector of the right size, filled with zeros.
for i = 1:N % For as many intervals we have..
M(i) = mean(x(((n*i-(n-1):i*n))));
% ..fill every element corresponding to minute-value element from machine #2 with the mean value over each minute from machine #1. Ex: 60 days of measuring -> data over 86400 seconds -> l=86400, N=1440, n=60. (n*i-(n-1):i*n) gives for i=1: 60*i-(60-1):1*60 -> Data from second 1 to 60. For i=2: 60*2-(60-1):2*60 -> Data from second 61 to 120 etc...
end
end
Thereafter, these intervals over which I calculate mean values from machine #1 needs to be synchronized so that they correspond to the exact same intervals of seconds over which machine #2 calculates its mean values (they might have different clock settings and also be subject to drift, eg. different time coefficients).