Finding outliers in a dataset

Salma fathi on 2 Aug 2022
Answered: Cris LaPierre on 2 Aug 2022
Hello, the image shows plots of my dataset. I am trying to remove outliers from the dataset so that I can later use it to train a machine learning model.
However, rmoutliers seems to be treating a lot of important data points as outliers. Is there another approach I could use to get rid of the outliers?
The top plot is the whole dataset, and the bottom plot is the result after removing the outliers with the following lines:
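% 'mean' flags values more than 3 standard deviations from each column's mean;
% rmoutliers then removes every row that has an outlier in any column
nonOutliers = rmoutliers(Matrix3, 'mean');
figure(3); tiledlayout(2,1);
nexttile;   % top: the full dataset
scatter(Matrix3(:,1), Matrix3(:,2), 1);
nexttile;   % bottom: the dataset after outlier removal
scatter(nonOutliers(:,1), nonOutliers(:,2), 1);
ylim([0 1e13]);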
  1 Comment
Monica Roberts on 2 Aug 2022
One thing to consider: what do you consider to be outliers when you look at the graph? Right now, rmoutliers isn't using the x-values to decide what is an outlier; it checks each column of Matrix3 separately and removes any row that is flagged in either column. You may want to split your data into chunks and pass each chunk to rmoutliers. I'd start where the data shoots up, group every ~200 values of x, pass those chunks to rmoutliers, and see what happens.
There are also other parameters you can pass to rmoutliers. For instance, 'mean' may not be the best method for detecting outliers in this dataset. Have you tried the others? The 'movmean' or 'movmedian' methods, for instance, use a moving window and might do the chunking I've described.
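A minimal sketch of the chunking idea, assuming "~200 values of x" means chunks of 200 consecutive points after sorting by x, and that only the y column should be checked for outliers. The synthetic data, `chunkSize`, `keep`, and `cleaned` are hypothetical stand-ins; replace the data lines with your own Matrix3. For simplicity it chunks the whole x range instead of starting where the data shoots up.

```matlab
% Hypothetical synthetic data standing in for Matrix3 (column 1 = x, column 2 = y):
% y is flat until x = 1800 and then rises steeply; one spike is injected at x = 500
rng(0);
x = (1:2000)';
y = 1e9 * (1 + max(x - 1800, 0).^2 / 10) .* (1 + 0.05*randn(2000, 1));
y(500) = 20 * y(500);
Matrix3 = [x y];

% Sort by x so that each chunk covers a contiguous range of x
data = sortrows(Matrix3, 1);

chunkSize = 200;              % assumed chunk size; tune it for your data
nRows = size(data, 1);
keep = true(nRows, 1);        % rows to keep after cleaning

for first = 1:chunkSize:nRows
    idx = (first:min(first + chunkSize - 1, nRows))';
    % Look for outliers in y only, judged against this chunk alone
    [~, removed] = rmoutliers(data(idx, 2), 'mean');
    keep(idx(removed)) = false;
end

cleaned = data(keep, :);
fprintf('Removed %d of %d rows\n', nnz(~keep), nRows);
```

Because each chunk gets its own mean and standard deviation, the large but legitimate y-values at high x are compared with their neighbors rather than with the whole column.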
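A sketch comparing the global 'mean' method with the 'movmedian' method on the y column only, using hypothetical synthetic data in place of Matrix3. The window length of 200 points is an assumption for illustration, not a recommended value. With a moving method, each point is judged against the median of its neighbors, so a steep but smooth rise in y is less likely to be removed than with a single global threshold.

```matlab
% Hypothetical synthetic data standing in for Matrix3:
% flat until x = 1800, steep rise afterwards, one spike injected at x = 500
rng(0);
x = (1:2000)';
y = 1e9 * (1 + max(x - 1800, 0).^2 / 10) .* (1 + 0.05*randn(2000, 1));
y(500) = 20 * y(500);
data = sortrows([x y], 1);    % moving windows assume rows ordered by x

% Global 'mean' method: one threshold for the whole y column
[~, tfGlobal] = rmoutliers(data(:, 2), 'mean');

% 'movmedian' method: each point is compared with the median of a
% 200-point window around it (assumed window length; tune it for your data)
[~, tfMoving] = rmoutliers(data(:, 2), 'movmedian', 200);

fprintf('mean: %d flagged, movmedian: %d flagged\n', nnz(tfGlobal), nnz(tfMoving));
fprintf('flagged among the last 100 points: mean %d, movmedian %d\n', ...
    nnz(tfGlobal(end-99:end)), nnz(tfMoving(end-99:end)));

cleaned = data(~tfMoving, :);
```

The second output of rmoutliers is a logical vector marking the removed entries, so you can also overlay the discarded points on your original scatter plot to check what each method throws away.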


Answers (1)

Cris LaPierre on 2 Aug 2022
If you process your data in a live script, consider interactively exploring different ways to detect and remove outliers using the Clean Outlier Data live task. See here:

Release

R2022a