How can I remove outliers in my data using Cook's Distance?

I have a large dataset: six .xlsx files with ~400,000 rows each. I want to use Cook's distance to identify the outliers in the fourth column of each dataset and then delete the corresponding rows. How would I do that?
Fatemah Ebrahim on 29 Jun 2020
Edited: Fatemah Ebrahim on 29 Jun 2020
Hi! I'm applying the code they used to one of the .xlsx files, as follows:
X = A_t; % where this is a datetime value
Y = Adata(:,4); % where we are pulling the fourth column of the table
mdl = fitlm(X,Y);
plotDiagnostics(mdl,'cookd')
find((mdl.Diagnostics.CooksDistance)>3*mean(mdl.Diagnostics.CooksDistance))
And I am getting this error:
Error using classreg.regr.TermsRegression/handleDataArgs (line 550)
Predictor variables must be numeric vectors, numeric matrices, or categorical vectors.
Error in LinearModel.fit (line 1184)
[X,y,haveDataset,otherArgs] = LinearModel.handleDataArgs(X,varargin{:});
Error in fitlm (line 121)
model = LinearModel.fit(X,varargin{:});
Please let me know if you have any idea how to address this error; there does not seem to be much information about it. Thanks!
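The error message itself points at the cause: fitlm accepts only numeric or categorical predictors, and A_t is a datetime. Below is a minimal sketch of one possible workaround. It assumes that Adata is a table (for example, read with readtable) and that A_t is a datetime column vector with one entry per row of Adata. The sketch converts the timestamps to elapsed seconds and extracts column 4 with brace indexing, so that Y is a numeric vector rather than a one-variable table. It then deletes the rows flagged by the same 3*mean(Cook's distance) cutoff used in the question, which is a common rule of thumb and not a formal test. The synthetic data at the top, including the injected outlier and the column names Time/B/C/D, is hypothetical and only makes the sketch runnable. Replace it with your own A_t and Adata.

```matlab
% --- Hypothetical stand-in data (replace with your own A_t and Adata) ---
rng(0);                                         % reproducible noise
n = 100;
A_t = datetime(2020,6,29) + minutes(0:n-1)';    % datetime column vector
y = 2*(0:n-1)' + randn(n,1);                    % roughly linear signal
y(50) = 500;                                    % one injected outlier
Adata = table(A_t, rand(n,1), rand(n,1), y, ...
    'VariableNames', {'Time','B','C','D'});     % column 4 holds the signal

% --- Fit a line against time and compute Cook's distance ---
X = seconds(A_t - A_t(1));     % fitlm needs a numeric predictor, so use elapsed seconds
Y = Adata{:,4};                % brace indexing returns a numeric vector, not a sub-table
mdl = fitlm(X, Y);

cooksD = mdl.Diagnostics.CooksDistance;
isOutlier = cooksD > 3*mean(cooksD, 'omitnan');   % rule-of-thumb cutoff from the question

% --- Delete the flagged rows, keeping A_t aligned with Adata ---
Adata(isOutlier, :) = [];
A_t(isOutlier) = [];
fprintf('Removed %d of %d rows\n', nnz(isOutlier), n);
```

Cook's distance measures how strongly each point influences this particular fit, here a straight line in time. A flagged row is therefore "influential relative to a linear trend", which may or may not match what you consider an outlier in column 4. If Adata is a numeric matrix instead of a table, Adata(:,4) already returns a numeric vector, and only the datetime conversion is needed. To process all six files, the same steps could be wrapped in a loop over dir('*.xlsx'), using readtable to load each file and writetable to save the cleaned result.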


