How can I replace outliers with the median of previous observations?

Hello, I have some outliers in a 206×174 data matrix. I want to replace them with the median of the 5 previous observations using a loop.
How can I do that?
[EDITED, copied from Answer section, Jan]
I will be more clear: outliers are observations of stationary series whose absolute deviation from the median exceeds six times the interquartile range. I want to replace them with the median of the preceding five observations. Thanks.
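A minimal sketch of the detection rule stated above (absolute deviation from the column median greater than six times that column's interquartile range), assuming the data sit in a matrix `NUM` with time in rows and variables in columns. The example data and the name `isOutlier` are hypothetical. In MATLAB, `iqr` belongs to the Statistics and Machine Learning Toolbox, and `NUM - med` relies on implicit expansion (R2016b or later) in place of `repmat`.

```matlab
% Hypothetical example data: 10 observations (rows) of 2 variables (columns)
NUM = [1 2; 2 3; 3 4; 2 2; 1 3; 50 2; 2 4; 3 3; 2 -40; 1 2];

med = median(NUM, 1);          % median of each column (1-by-n)
q   = iqr(NUM);                % interquartile range of each column (1-by-n)
absdev = abs(NUM - med);       % |x - median| for every element (implicit expansion)
isOutlier = absdev > 6 * q;    % true where the deviation exceeds 6 times the IQR

[rows, cols] = find(isOutlier);   % row and column index of every outlier
```

With this data only the 50 in column 1 and the -40 in column 2 are flagged, so `rows` is `[6; 9]` and `cols` is `[1; 2]`.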

7 Comments

If you post what you have done so far, inserting the required changes would be easier. Please explain what "previous" exactly means when you operate on a matrix. And specify what you mean by "outlier": do you already have a method to detect them? If not, how are they recognized?
[t, n] = size(NUM);           % t = number of rows (time), n = number of columns (variables)
X = median(NUM);              % median of each column
X1 = repmat(X, t, 1);         % t-by-n matrix: each column repeats that column's median
NUM1 = abs(NUM - X1);         % absolute deviation of each observation from its column median
Y = zeros(1, n);
for j = 1:n
    Y(j) = iqr(NUM(:, j));    % interquartile range (3rd minus 1st quartile) of column j
end
Y1 = repmat(Y, t, 1);
NUM2 = 6 * Y1;                % threshold: six times the interquartile range
outliers = NUM1 - NUM2;       % positive where |x - median| > 6*IQR
[x, w] = find(outliers > 0);  % x is the row and w the column of each outlier
I now want to replace them with the median of the 5 preceding observations.
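Once the outlier positions are known, the replacement can be written as a loop over the indices returned by `find`. This is a sketch under two assumptions the thread does not settle: the medians are taken from the original (uncleaned) data, and an outlier in row 1, which has no preceding observations, is left unchanged. For rows 2 to 5 the window simply uses the rows that exist. The data and mask below are hypothetical.

```matlab
% Hypothetical data: two variables with one outlier each
NUM = [10 1; 11 2; 12 3; 13 4; 14 5; 15 6; 99 7; 16 -50; 17 9];
isOutlier = false(size(NUM));
isOutlier(7, 1) = true;                 % in practice this mask comes from the 6*IQR rule
isOutlier(8, 2) = true;

[x, w] = find(isOutlier);               % row x(k) and column w(k) of each outlier
cleaned = NUM;                          % work on a copy so the medians use the original data
for k = 1:numel(x)
    i = x(k);
    j = w(k);
    if i > 1                            % row 1 has no preceding rows: left unchanged (assumption)
        prev = NUM(max(1, i-5):i-1, j); % the (up to) 5 preceding rows of column j
        cleaned(i, j) = median(prev);
    end
end
```

Here `cleaned(7,1)` becomes 13 (the median of 11 to 15) and `cleaned(8,2)` becomes 5 (the median of 3 to 7); every other element is unchanged.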
You just need to use logical indexing; there is no need for find:
NUM(outliers > 0) = X1(outliers > 0);   % replaces each outlier with its column median
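The logical-indexing idea in the comment above can be checked on a small hypothetical example: assigning through a logical mask gives the same result as looping over the `find` indices. Note that this replaces outliers with their column median (`X1`), not with the median of the preceding rows; logical indexing works directly whenever the replacement values are available as a matrix of the same size as the data.

```matlab
NUM = [1 2; 2 3; 50 4; 3 -40];              % hypothetical data
isOutlier = logical([0 0; 0 0; 1 0; 0 1]);  % hypothetical outlier mask
X1 = repmat(median(NUM), size(NUM, 1), 1);  % column medians, repeated for every row

A = NUM;
A(isOutlier) = X1(isOutlier);               % logical indexing: no find needed

B = NUM;
[x, w] = find(isOutlier);                   % the same replacement via row/column indices
for k = 1:numel(x)
    B(x(k), w(k)) = X1(x(k), w(k));
end
```

Both columns have median 2.5 here, so `A` and `B` are identical, with `A(3,1)` and `A(4,2)` set to 2.5.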
Please post clarifications of the question by editing the question. That is where readers expect to find all the necessary information.
Let me ask you again: what does "preceding" mean when you process a matrix? The 5 rows before, the 5 columns before, or 5 other matrices processed before? Do you already have the indices of the outliers, or is this part of the question?
The more time we waste guessing, the less time is left for answering.
Preceding means the 5 rows before. Yes, I use the find command to get the row and column of the outliers; I don't know if there is a better way.
So I will repeat: I have data of 206×174 observations. Rows are time observations and columns are variables. I want to find the outliers in each variable series, defined as observations whose absolute deviation from the median is greater than 6 times the interquartile range.
After that I want to replace each outlier with the median of the previous 5 rows. Thanks.
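A loop-free alternative for the "median of the previous 5 rows" rule, assuming MATLAB R2016a or later for `movmedian`: a trailing window `[4 0]` gives the median of rows i-4 to i, and shifting that result down by one row gives the median of rows i-5 to i-1. With the default `'shrink'` endpoint handling, rows 2 to 5 use the rows that exist; leaving row 1 unchanged is an assumption. The data and mask are hypothetical.

```matlab
% Hypothetical data and outlier mask
NUM = [10 1; 11 2; 12 3; 13 4; 14 5; 15 6; 99 7; 16 -50; 17 9];
isOutlier = false(size(NUM));
isOutlier(7, 1) = true;
isOutlier(8, 2) = true;

M = movmedian(NUM, [4 0]);                  % row i: median of rows max(1,i-4)..i, per column
P = [NaN(1, size(NUM, 2)); M(1:end-1, :)];  % shift down: row i holds median of rows max(1,i-5)..i-1

replace = isOutlier;
replace(1, :) = false;                      % row 1 has no preceding rows: left unchanged (assumption)
cleaned = NUM;
cleaned(replace) = P(replace);              % logical indexing, no loop needed
```

This gives the same result as looping over the `find` indices: 13 in `cleaned(7,1)` and 5 in `cleaned(8,2)`. If the data contain NaN, `movmedian` returns NaN for any window containing one unless the `'omitnan'` flag is passed.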
% Now we remove outliers as in the paper of Stock and Watson (2005); NUM holds the data
[t, n] = size(NUM);            % t = number of rows (time), n = number of columns (variables)
X = median(NUM);               % median of each column of NUM
X1 = repmat(X, t, 1);          % t-by-n matrix: each column repeats that column's median
NUM1 = abs(NUM - X1);          % absolute deviation from the column median
Y = zeros(1, n);
for j = 1:n
    Y(j) = iqr(NUM(:, j));     % interquartile range (3rd minus 1st quartile) of column j
end
Y1 = repmat(Y, t, 1);
NUM2 = 6 * Y1;                 % threshold: six times the interquartile range
outliers = NUM1 - NUM2;        % positive where |x - median| > 6*IQR
v = ones(t, n);
v(outliers > 0) = 0;           % v is 0 at outliers and 1 elsewhere
% Note that some problems arise for very smooth series, so we exclude them
% from the outlier treatment
v(:, [39; 84; 86; 92; 95]) = 1;
[x, w] = find(v == 0);         % x is the row and w the column of each remaining outlier
NUMclean = NUM;                % copy of the data; non-outliers keep their values
for k = 1:numel(x)
    i = x(k);
    j = w(k);
    if i > 1                   % row 1 has no preceding observations, so it is left unchanged
        % median of the (up to) 5 preceding rows of the same column
        NUMclean(i, j) = median(NUM(max(1, i-5):i-1, j));
    end
end
disp('Done')


Answers (1)

Something like this should work:
yourthreshold = 10;
Data(Data > yourthreshold) = median(Data(:));   % median of all elements of Data
This replaces all values greater than 10 with the median of the whole matrix.


Asked on 19 Jul 2012
