Handling irregular observations: maybe more progress needs to be made by the MATLAB team

Dear all,
Since my analysis uses irregular time series observations that do not have a standard frequency (such as daily, monthly, quarterly or yearly), I was wondering how useful MATLAB can be in this case.
As an example, please take a look at the following link, which shows how SAS (which I am not familiar with) can handle such problems "automatically".
I paste the table "Output 14.3.1 Measured Defect Rates":
Obs date defects
1 13JAN1992 55
2 27JAN1992 73
3 19FEB1992 84
4 08MAR1992 69
5 27MAR1992 66
6 05APR1992 77
7 29APR1992 63
8 11MAY1992 81
9 25MAY1992 89
10 07JUN1992 94
11 23JUN1992 105
12 11JUL1992 97
13 15AUG1992 112
14 29AUG1992 89
15 10SEP1992 77
16 27SEP1992 8
We have irregular observations, and after the interpolation we get these monthly averages:
Obs date defects
1 JAN1992 59.323
2 FEB1992 82.000
3 MAR1992 66.909
4 APR1992 70.205
5 MAY1992 82.762
6 JUN1992 99.701
7 JUL1992 101.564
8 AUG1992 105.491
9 SEP1992 79.206
I had a discussion with Oleg about one of my previous questions on how to obtain monthly averages from irregular observations. If I apply Oleg's approach, half of the values in the output matrix interpData{b} are the same as in the original input matrix A. But as you can see from the second table above, none of the SAS values is the same as any value in the first table.
Is it possible to do something similar to what the SAS program does? If not, it is a pity that a program as powerful as MATLAB is worse than SAS at converting irregular time series observations to other frequencies.
Thank you

10 Comments

The SAS value for September,
9 SEP1992 79.206
looks strange given the two momentary(?) September values
15 10SEP1992 77
16 27SEP1992 8
What do you have? Are they daily values? What exactly does
15 10SEP1992 77
represent? The average over the time period (29AUG1992, 10SEP1992]?
Do you know the SAS EXPAND procedure?
Well, I am not sure. But regarding my bimonthly data ( http://www.mathworks.de/matlabcentral/answers/44968-data-frequency-conversion-problem), is there anything else I can do to obtain monthly averages?
Thanks
@per: the last value is 82 (I checked the link).
If you only have bimonthly averages, you cannot "obtain" monthly averages; it is not possible. The best you can do is estimate the monthly averages, and for that, extra information is important. Do you have any information on the underlying time series beyond the data? E.g., do you have an idea of how it varies during a year?
No, I have no information on the underlying time series beyond the data. How can I estimate the monthly averages? Using Oleg's approach?
I am asking because you are the experts.
@Oleg: You are right about "82". However, the SAS example upsets me a bit.
What is given is a couple of daily values of "Sampled Defect Rates" per month. The figures give me the impression that the calculated monthly values are weighted averages, where the weight of each sample is half the time to each of its two neighbours. Instead of three digits after the decimal point, I would like to see an estimate of the error.
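A related and simpler construction shows how every monthly value can differ from every observed value. The sketch below assumes each sample is a point value of an underlying rate that varies linearly between samples (and is extrapolated linearly at both ends), builds a daily series with interp1, and averages it over each calendar month with accumarray. It is not SAS's method (as far as I know, PROC EXPAND fits a cubic spline by default rather than straight lines), so it will not reproduce the SAS table exactly. The last observation is taken as 82, as noted in the thread.

```matlab
% Observation dates (datenum) and values from the table above,
% with the last value taken as 82
t = datenum(1992, [1 1 2 3 3 4 4 5 5 6 6 7 8 8 9 9], ...
    [13 27 19 8 27 5 29 11 25 7 23 11 15 29 10 27]);
y = [55 73 84 69 66 77 63 81 89 94 105 97 112 89 77 82];

% Daily grid from 1 Jan to 30 Sep 1992
days = datenum(1992,1,1):datenum(1992,9,30);

% ASSUMPTION: straight lines between samples, linear extrapolation at both ends
daily = interp1(t, y, days, 'linear', 'extrap');

% Mean of the daily values within each calendar month (months 1..9)
dv = datevec(days);
monthlyAvg = accumarray(dv(:,2), daily(:), [], @mean)
```

For January this gives about 58.6, against 59.323 in the SAS table; the gap reflects the different curve and end treatment, not an error in either.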
@salva: I'm not an expert in statistics. However, I know that if the data represent outdoor ambient temperatures in Sweden, then one approach might be appropriate (a lot is known about the weather in Sweden), and with "Sampled Defect Rates", another.
Why do you need monthly data?
@salva: You write "for some others I have monthly and for the rest I have more irregular time series observations."
Do you have reason to believe that data from different countries share certain statistical properties? ... Your problem is more about your specific data and statistical methods, and less about MATLAB functions.
Well, I have these data, and what I know is that each value is a 4-, 5-, 6-, 8- or 9-week average. From these values I have to obtain estimated monthly averages (via interpolation? via weighted averages?). I would be grateful if you could give me some guidelines on how to obtain them.
I have no other way of solving this problem apart from asking you guys.
Thank you
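One common way to turn multi-week averages into monthly estimates is overlap weighting: treat the series as constant within each reporting period and average it over each calendar month, which amounts to a weighted average of the period values with weights equal to the days of overlap. The sketch below is only that, under that stated assumption; the periods and values are hypothetical, and working on a daily grid takes the unequal month lengths into account automatically.

```matlab
% HYPOTHETICAL reporting periods: consecutive 4-, 6- and 9-week blocks
% starting on 1 Jan 1992, each with its average value
pStart = datenum(1992,1,1) + [0 28 70];   % first day of each period
pEnd   = pStart + [28 42 63] - 1;         % last day of each period
pVal   = [17 71 43];                      % average over each period

% ASSUMPTION: the series is constant within each period (equal to its average)
days  = pStart(1):pEnd(end);
daily = zeros(size(days));
for k = 1:numel(pVal)
    daily(days >= pStart(k) & days <= pEnd(k)) = pVal(k);
end

% Average the daily values within each calendar month; this is a weighted
% average of the period values with weights = days of overlap with the month
dv = datevec(days);
monthId = (dv(:,1) - dv(1,1))*12 + dv(:,2) - dv(1,2) + 1;   % 1 = first month
monthlyAvg = accumarray(monthId, daily(:), [], @mean)
```

Here the last period ends on 12 May, so the May estimate rests on partial coverage and deserves less trust than the others.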
I'd like to pose a question. Assume you have bimonthly data:
Jan&Feb 17
Mar&Apr 71
May&Jun 43
and I claim that the "best" monthly averages are
Jan 17
Feb 17
Mar 71
Apr 71
May 43
Jun 43
I guess you don't agree, but what arguments would you use to convince me that there are "better" estimates?
There is no magic trick!
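per's point can be checked numerically. The sketch below takes the three bimonthly values from the example, builds two different monthly series (the repeated values above, and a second one in which each pair is tilted by an arbitrary, hypothetical amount d), and re-aggregates both. Both reproduce the observed bimonthly averages exactly, so the data alone cannot tell them apart.

```matlab
% Bimonthly averages from the example above (Jan&Feb, Mar&Apr, May&Jun)
b = [17 71 43];

% Candidate 1: repeat each bimonthly value for both months
m1 = kron(b, [1 1]);                 % 17 17 71 71 43 43

% Candidate 2: tilt each pair while keeping its mean
% (d is ARBITRARY and hypothetical; any value works equally well)
d  = [-10 10 -5];
m2 = reshape([b - d; b + d], 1, []); % 27 7 61 81 48 38

% Re-aggregate a monthly series to bimonthly averages
agg = @(m) mean(reshape(m, 2, []), 1);
agg1 = agg(m1)
agg2 = agg(m2)
```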

Sign in to comment.

 Accepted Answer

I had a look at SAS, and honestly I don't understand how they got those values!
My approach was to take intra-month averages (trying to interpret SAS's method) and then interpolate them:
% Observation number, date string and measured defect rate
A = {
1 '13JAN1992' 55
2 '27JAN1992' 73
3 '19FEB1992' 84
4 '08MAR1992' 69
5 '27MAR1992' 66
6 '05APR1992' 77
7 '29APR1992' 63
8 '11MAY1992' 81
9 '25MAY1992' 89
10 '07JUN1992' 94
11 '23JUN1992' 105
12 '11JUL1992' 97
13 '15AUG1992' 112
14 '29AUG1992' 89
15 '10SEP1992' 77
16 '27SEP1992' 82}
% Convert dates to serial dates and store with data in a double matrix
data = [datenum(A(:,2),'ddmmmyyyy') cat(1,A{:,3})];
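% Retrieve year, month and day of each observation
[yy, mm, dd] = datevec(data(:,1));
% Row subscripts: month index counted from the first observed month,
% repeated once for each of the two data columns (date and value)
subsr = repmat((yy-yy(1))*12 + mm-mm(1) + 1,2,1);
% Column subscripts: 1 for the dates, 2 for the values
subsc = repmat(1:size(data,2),size(data,1),1);
% Average dates and values within each month (avgData is months-by-2);
% @mean avoids the Statistics Toolbox dependency of @nanmean (no NaNs here)
avgData = accumarray([subsr subsc(:)], data(:),[],@mean);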
% Interpolate the monthly averages at the first day of each month, Jan-Sep 1992
xi = datenum(1992,1:9,1);
intData = interp1(avgData(:,1),avgData(:,2),xi,'linear','extrap')
% For comparison: direct interpolation of the raw observations, without averaging
intData2 = interp1(data(:,1),data(:,2),xi,'linear','extrap');
% Plot the raw data and both interpolations
plot(data(:,1),data(:,2),'-db',xi,intData,'--om',xi,intData2,'-.+r')
axis tight
grid on
set(gca,'Xtick',xi)
datetick('x','mmm yy','keepticks')
legend('your data','interpolation of averages','direct interpolation','location','NorthWest')
I feel a clarification is needed in response to salva's comments:
I don't know how many times I have already said this, but manipulating data is dodgy, even more so the way SAS does it, which is not CLEAR from the link.
If you're doing research in finance/economics and you manipulate your data because you need it at certain points in time (e.g., at the beginning of the month), the result is already artificial, but acceptable.
Do you think SAS is fancy because it changes ALL the values? Well, I assure you SAS's power isn't that.
MATLAB may lack some functions, but nobody stops you from writing your own and sharing it on the FEX.
MATLAB is not just a program but a programming language and it's not limited to statistics!
So yes, SAS could be more suited for statistical analysis because it has more built-in functions.

8 Comments

@Oleg: Your approach is great! I will accept it! Can I also apply the same approach to my previous question about bimonthly data, or is that different? If so, can you show me how to do it? I am totally amazed (and a bit lost at the same time :))
Thanks, Oleg! Regarding the data that I showed in my previous question, should I apply the approach given in this question or the one you proposed in the previous question?
Thanks
As per isakson already said, you cannot get a finer representation from a coarser one.
So, let's get back to bimonthly data. Suppose this is the amount of money you get at the end of every 2 months:
Feb 20
Apr 40
Jun 33
Now, can you tell me how much you got on average for Jan, Mar and May?
I assure you that you can still answer that question. But what if I ask you which specific values gave you the average for Jan, Mar and May?
By now you should have the intuition that you can come up with numbers that answer the previous question, but here's the catch: you invented them, because the only true values are those observed in Feb, Apr and Jun!
So, if I understand correctly, I can apply both of your approaches (the one in this thread and the one in the previous thread). Am I right?
per isakson mentioned that "The figures give me the impression that the calculated monthly values are weighted averages". So does this mean that I can use weighted averages instead of interpolation?
Thanks
Let's put it this way: it appears you have different datasets, and before deciding what to do, ask yourself this question:
"Does the manipulation create genuine results, or do I get garbage out of it?"
Therefore, I do not recommend even interpolation, because the absence of data for a month is itself some information about the behaviour of that series.
Can you use my second method with bimonthly data? I doubt it.
As per says, it's not about MATLAB here, but about understanding what you need to do with your data.
Why do you need it monthly in the first place?
Linear interpolation IS a weighted average, if you think about it: you take a bit of the value on the left and a bit of the value on the right, and the bit is the weight.
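Oleg's remark can be written out for one point. Between two observations (t1, y1) and (t2, y2), linear interpolation at x equals y1*(t2 - x)/(t2 - t1) + y2*(x - t1)/(t2 - t1), so each observation's weight is the relative distance to the opposite neighbour and the two weights sum to 1. The sketch below uses the 27 Jan and 19 Feb 1992 observations from the table and compares the hand-computed weighted average with interp1 at 1 Feb 1992.

```matlab
% Two neighbouring observations from the table above
t1 = datenum(1992,1,27);  y1 = 73;
t2 = datenum(1992,2,19);  y2 = 84;
x  = datenum(1992,2,1);                 % point to estimate (1 Feb 1992)

% Weights: the closer observation gets the larger share; they sum to 1
wLeft  = (t2 - x) / (t2 - t1);          % 18/23
wRight = (x - t1) / (t2 - t1);          % 5/23
yWeighted = wLeft*y1 + wRight*y2

% The same number from interp1
yInterp = interp1([t1 t2], [y1 y2], x)
```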
Thanks, Oleg. So if neither interpolation nor the second method is appropriate, what other options do I have? I suppose many people around the world face the same problem as I do, so how do they solve it? From what you say, I get the feeling that this problem (converting data available every two months into estimated monthly averages) has no solution. Regarding your question "Why do you need it monthly in the first place?", the answer is simple: I have data sets with different frequencies (bimonthly, monthly and irregular observations), and I decided to create a common data frequency in order to perform some kind of econometric analysis later.
Thanks
First of all, you have to quantify how much of the population you lose if you completely discard the irregular and the bimonthly series. Then decide which series to keep.
Then I would suggest applying some selection rules, a very standard approach: filter out of the analysis the series that do not pass them, i.e., those with very irregular spacing in time. To decide on the rules, refer to the literature that has already dealt with your type of analysis/data.
You can aggregate the monthly data to the bimonthly frequency; that wouldn't affect your results as much as interpolation would.
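A minimal sketch of that aggregation, using hypothetical monthly values. Because months differ in length, it weights each month by its number of days (from eomday); if each monthly value is itself an average of daily data, this equals the average over the whole two-month span. The unweighted mean of each pair is the simpler alternative and is shown for comparison.

```matlab
% HYPOTHETICAL monthly averages for Jan..Jun 1992
mVal   = [10 20 30 40 50 60];
nDays  = eomday(repmat(1992,1,6), 1:6);  % days per month (1992 is a leap year)
pairId = ceil((1:6)/2);                  % 1 1 2 2 3 3: Jan&Feb, Mar&Apr, May&Jun

% Day-weighted average within each two-month block
biAvg = accumarray(pairId(:), mVal(:).*nDays(:)) ./ accumarray(pairId(:), nDays(:))

% Unweighted alternative: plain mean of each pair of months
biAvgSimple = mean(reshape(mVal, 2, []), 1)
```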
Hi Oleg,
Thank you for your reply. I think that for my purposes it would be convenient to focus only on transforming bimonthly data into monthly data. Specifically, I am asking how I can modify the approach you proposed here so that it takes into account that months do not have the same length. I have actually opened a new question for this purpose here.
I think this is the most interesting point for me at the moment.
Cheers

Sign in to comment.

More Answers (0)

Categories

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!