# How to remove outliers from time-series signal data using cubic spline interpolation?

Yared Daniel on 13 May 2021
Commented: Yared Daniel on 13 May 2021
Hello everybody, I am new to MATLAB and need an algorithm for my project that solves the following problem.
I have a 1D time-series signal like this:
```matlab
x = [180 142 213 78 90 192 40 38 105 162 270 262 211 170 72 28 40 55 90 201];
L = numel(x);
fs = 4;
t = 0:1/fs:(L-1)/fs;
```
I want to detect the outliers (values less than 50 or greater than 200), remove them, and fill the gaps using cubic spline interpolation.

DGM on 13 May 2021
If you want to use the standard tools:
```matlab
x = [180 142 213 78 90 192 40 38 105 162 270 262 211 170 72 28 40 55 90 201];
L = numel(x);
fs = 4;
t = 0:1/fs:(L-1)/fs; % [0 4.75]
limits = [50 200];
% find valid data locations
goodmask = x >= limits(1) & x <= limits(2);
% select only x,t corresponding to good samples
xc = x(goodmask);
tc = t(goodmask);
% interpolate back to original timebase
xr = interp1(tc,xc,t,'spline');
% or perhaps use a finer timebase?
tf = linspace(min(t),max(t),100);
xf = interp1(tc,xc,tf,'spline');
plot(t,x,'k:'); % all original samples
hold on; grid on
plot(tc,xc,'ko') % valid samples
plot(t,xr,'g') % same timebase
plot(tf,xf,'b') % finer timebase
```
Of course, nothing stops the interpolated result from projecting beyond the limits again, since that may simply be where a spline through the remaining samples goes.
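To check whether that happens for this data, one can compare the spline values at the removed samples against the limits. A minimal sketch using the question's data; the names `good`, `stillOut` and `xclamped` are illustrative, and clamping with `min`/`max` is only one possible remedy (an assumption, not something proposed in this thread), since it flattens the signal at the limits:

```matlab
% Data and timebase from the question
x = [180 142 213 78 90 192 40 38 105 162 270 262 211 170 72 28 40 55 90 201];
fs = 4;
t = (0:numel(x)-1)/fs;
limits = [50 200];

% Keep samples inside the limits; the spline replaces the rest
good = x >= limits(1) & x <= limits(2);
xr = interp1(t(good), x(good), t, 'spline', 'extrap');

% Replaced samples whose interpolated value still violates the limits
stillOut = ~good & (xr < limits(1) | xr > limits(2));
disp([t(stillOut).', xr(stillOut).'])   % columns: time (s), value

% Hypothetical remedy: clamp the interpolated signal to the limits
xclamped = min(max(xr, limits(1)), limits(2));
```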
If this is some sort of homework where you have to write a cubic spline interpolation routine, that's a different story.
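For the homework case, here is a minimal sketch of a hand-written cubic spline, assuming *natural* end conditions (zero second derivative at the first and last kept sample). Note that `interp1(...,'spline')` uses not-a-knot end conditions instead, so the two agree at the kept samples but can differ between them, mostly near the ends. The names `M`, `A`, `rhs` and `xs` are illustrative:

```matlab
% Data, timebase and valid samples as in the answer above
x = [180 142 213 78 90 192 40 38 105 162 270 262 211 170 72 28 40 55 90 201];
fs = 4;
t = (0:numel(x)-1)/fs;
limits = [50 200];
good = x >= limits(1) & x <= limits(2);
tc = t(good);
xc = x(good);

n = numel(tc);
h = diff(tc);                        % interval widths between kept samples

% Tridiagonal system for the second derivatives M at the knots
A = zeros(n);
rhs = zeros(n, 1);
A(1,1) = 1;                          % natural end condition: M(1) = 0
A(n,n) = 1;                          % natural end condition: M(n) = 0
for i = 2:n-1
    A(i,i-1) = h(i-1);
    A(i,i)   = 2*(h(i-1) + h(i));
    A(i,i+1) = h(i);
    rhs(i) = 6*((xc(i+1) - xc(i))/h(i) - (xc(i) - xc(i-1))/h(i-1));
end
M = A \ rhs;

% Evaluate the piecewise cubic at every original sample time
xs = zeros(size(t));
for q = 1:numel(t)
    k = find(tc <= t(q), 1, 'last');  % interval that contains t(q)
    if isempty(k), k = 1; end
    k = min(k, n-1);                   % times past the last knot reuse the last cubic
    a = tc(k+1) - t(q);
    b = t(q) - tc(k);
    hk = h(k);
    xs(q) = M(k)*a^3/(6*hk) + M(k+1)*b^3/(6*hk) ...
          + (xc(k)/hk - M(k)*hk/6)*a + (xc(k+1)/hk - M(k+1)*hk/6)*b;
end
```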
Yared Daniel on 13 May 2021
I am so grateful for your help dear,
But let me clarify my question. I am preprocessing a time-series signal recorded from a sensor, similar to the array above. I want to remove samples that are greater than 200 or less than 50, which are considered noise or spikes, and replace them with values obtained by cubic spline interpolation on the timebase t=0:1/fs:(L-1)/fs; % [0 4.75]
Regards
DGM on 13 May 2021
I'm confused. That description seems to be exactly what this part does (as opposed to xf):
```matlab
% interpolate back to original timebase
xr = interp1(tc,xc,t,'spline');
```
That is the complete removal of those samples followed by cubic spline interpolation on the given timebase.
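If the goal is literally to keep the good samples and overwrite only the rejected ones, the same `interp1` call can be evaluated just at the rejected times. A minimal sketch (the names `bad` and `xclean` are illustrative); at the kept samples it returns the original data exactly, and elsewhere it matches `xr`, because the spline is built from the same kept samples:

```matlab
x = [180 142 213 78 90 192 40 38 105 162 270 262 211 170 72 28 40 55 90 201];
fs = 4;
t = (0:numel(x)-1)/fs;
limits = [50 200];

% Logical mask of rejected samples (noise or spikes)
bad = x < limits(1) | x > limits(2);

% Copy the signal and overwrite only the rejected samples
xclean = x;
xclean(bad) = interp1(t(~bad), x(~bad), t(bad), 'spline', 'extrap');
```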
If it's not quite what you want, you can always unaccept the answer. Accepting an answer before you're satisfied makes it much less likely that others might try to offer different solutions.

Steven Lord on 13 May 2021
1. Use `>` and `<`, or the `isoutlier` function, to determine where the outliers are located.
2. Use `filloutliers`, specifying `'OutlierLocations'` with the vector of outlier locations determined in step 1 as that argument's value. Specify `'spline'`, `'pchip'`, or perhaps `'makima'` as the `fillmethod` input.
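The two ways of locating outliers in step 1 can flag very different samples. A minimal sketch comparing fixed thresholds with the default `isoutlier` rule (more than three scaled MAD from the median) on the question's data, followed by step 2; it assumes a MATLAB release that has `isoutlier` and `filloutliers` (R2017a or later). By my arithmetic the median here is 123.5 and the scaled MAD is about 101.6, so the default rule flags none of these samples, which is why explicit limits matter for this signal:

```matlab
x = [180 142 213 78 90 192 40 38 105 162 270 262 211 170 72 28 40 55 90 201];
limits = [50 200];

% Step 1a: fixed thresholds from the question
byLimits = x < limits(1) | x > limits(2);

% Step 1b: isoutlier's default rule (median +/- 3 scaled MAD)
byMAD = isoutlier(x);

% The same default rule written out by hand, for comparison
c = -1/(sqrt(2)*erfcinv(3/2));          % scale factor, about 1.4826
smad = c*median(abs(x - median(x)));
byHand = abs(x - median(x)) > 3*smad;

fprintf('flagged by limits: %d, by isoutlier: %d\n', nnz(byLimits), nnz(byMAD));

% Step 2: fill the threshold-based locations by spline interpolation
xfo = filloutliers(x, 'spline', 'OutlierLocations', byLimits);
```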
DGM on 13 May 2021
Edited: DGM on 13 May 2021
As long as the same identification method and interpolation option are chosen, this replicates the results above. isoutlier() and filloutliers() are new to me. I'm still finding all sorts of new things since I got to try something newer than R2015b.
```matlab
x = [180 142 213 78 90 192 40 38 105 162 270 262 211 170 72 28 40 55 90 201];
L = numel(x);
fs = 4;
t = 0:1/fs:(L-1)/fs; % [0 4.75]
limits = [50 200];
% find outlier locations
badmask = x < limits(1) | x > limits(2);
% using filloutliers
xrfo = filloutliers(x,'spline','OutlierLocations',badmask);
plot(t,x,'k:'); hold on; grid on
plot(t,xrfo,'b')
```
Using 'makima' or 'pchip' instead of 'spline' might reduce how far the interpolated result projects beyond the thresholds again.
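One way to check this is to evaluate both interpolants on a fine grid and compare their ranges between the first and last kept samples. A minimal sketch using `interp1` (the grid size and variable names are arbitrary; `'makima'` could be added the same way in releases that support it). Because `pchip` is shape-preserving, on each interval it stays between the two neighbouring kept values, so inside the data span it cannot leave the limits; the spline has no such guarantee:

```matlab
x = [180 142 213 78 90 192 40 38 105 162 270 262 211 170 72 28 40 55 90 201];
fs = 4;
t = (0:numel(x)-1)/fs;
limits = [50 200];
good = x >= limits(1) & x <= limits(2);
tc = t(good);
xc = x(good);

% Fine grid restricted to the span of kept samples (no extrapolation)
tf = linspace(tc(1), tc(end), 400);
xs = interp1(tc, xc, tf, 'spline');
xp = interp1(tc, xc, tf, 'pchip');

fprintf('spline range: [%.1f, %.1f]\n', min(xs), max(xs));
fprintf('pchip  range: [%.1f, %.1f]\n', min(xp), max(xp));
```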
Yared Daniel on 13 May 2021
Thanks, this worked well.