isoutlier
Find outliers in data
Syntax
Description
TF = isoutlier(A) returns a logical array whose elements are true when an outlier is detected in the corresponding element of A.
- If A is a matrix, then isoutlier operates on each column of A separately.
- If A is a multidimensional array, then isoutlier operates along the first dimension of A whose size does not equal 1.
- If A is a table or timetable, then isoutlier operates on each variable of A separately.
By default, an outlier is a value that is more than three scaled median absolute deviations (MAD) from the median.
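The default rule can be reproduced by hand, which may help when checking why a point was flagged. The following is a minimal sketch that uses the sample vector from the examples on this page; the variable names (med, scaledMAD, manualTF, builtinTF) are illustrative only.

```matlab
% Sample data reused from the examples on this page.
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

% Scaling constant (approximately 1.4826) used in the scaled MAD definition.
c = -1/(sqrt(2)*erfcinv(3/2));

med       = median(A);                 % center value
scaledMAD = c*median(abs(A - med));    % scaled median absolute deviation

% Flag points more than three scaled MAD from the median.
manualTF = abs(A - med) > 3*scaledMAD;

% Compare with the built-in default detection.
builtinTF = isoutlier(A);
disp(isequal(manualTF, builtinTF))
```

For this vector, both approaches flag the values 100 and 300.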
You can use isoutlier functionality interactively by adding the Clean Outlier Data task to a live script.
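TF = isoutlier(___,Name,Value) specifies additional options for detecting outliers using one or more name-value arguments. For example, isoutlier(A,"SamplePoints",t) detects outliers in array A relative to the corresponding elements of a time vector t.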
Examples
Find the outliers in a vector of data. A logical 1 in the output indicates the location of an outlier.
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];
TF = isoutlier(A)
TF = 1×15 logical array
   0   0   0   1   0   0   0   0   1   0   0   0   0   0   0
Define outliers as points more than three standard deviations from the mean, and find the locations of outliers in a vector.
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];
TF = isoutlier(A,"mean")
TF = 1×15 logical array
   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0
Use a moving detection method to detect local outliers in a sine wave that corresponds to a time vector.
Create a vector of data containing a local outlier.
x = -2*pi:0.1:2*pi;
A = sin(x);
A(47) = 0;
Create a time vector that corresponds to the data in A.
t = datetime(2017,1,1,0,0,0) + hours(0:length(x)-1);
Define outliers as points more than three local scaled MAD from the local median within a sliding window. Find the locations of the outliers in A relative to the points in t with a window size of 5 hours. Plot the data and detected outliers.
TF = isoutlier(A,"movmedian",hours(5),"SamplePoints",t);
plot(t,A)
hold on
plot(t(TF),A(TF),"x")
legend("Original Data","Outlier Data")

Find outliers for each row of a matrix.
Create a matrix of data containing outliers along the diagonal.
A = magic(5) + diag(200*ones(1,5))
A = 5×5
   217    24     1     8    15
    23   205     7    14    16
     4     6   213    20    22
    10    12    19   221     3
    11    18    25     2   209
Find the locations of outliers based on the data in each row.
TF = isoutlier(A,2)
TF = 5×5 logical array
   1   0   0   0   0
   0   1   0   0   0
   0   0   1   0   0
   0   0   0   1   0
   0   0   0   0   1
Locate an outlier in a vector of data and visualize the outlier.
Create a vector of data containing a local outlier.
x = 1:10; A = [60 59 49 49 58 100 61 57 48 58];
Locate the outlier using the default detection method "median".
[TF,L,U,C] = isoutlier(A);
Plot the original data, the outlier, and the thresholds and center value determined by the detection method. The center value is the median of the data, and the upper and lower thresholds are three scaled MAD above and below the median.
plot(x,A)
hold on
plot(x(TF),A(TF),"x")
yline([L U C],":",["Lower Threshold","Upper Threshold","Center Value"])
legend("Original Data","Outlier Data")

Input Arguments
Input data, specified as a vector, matrix, multidimensional array, table, or timetable.
- If A is a table, then its variables must be of type double or single, or you can use the DataVariables argument to list double or single variables explicitly. Specifying variables is useful when you are working with a table that contains variables with data types other than double or single.
- If A is a timetable, then isoutlier operates only on the table elements. If row times are used as sample points, then they must be unique and listed in ascending order.
Data Types: double | single | table | timetable
Method for detecting outliers, specified as one of these values.
| Method | Description |
|---|---|
| "median" | Outliers are defined as elements more than three scaled MAD from the median. The scaled MAD is defined as c*median(abs(A-median(A))), where c=-1/(sqrt(2)*erfcinv(3/2)). |
| "mean" | Outliers are defined as elements more than three standard deviations from the mean. This method is faster but less robust than "median". |
| "quartiles" | Outliers are defined as elements more than 1.5 interquartile ranges above the upper quartile (75 percent) or below the lower quartile (25 percent). This method is useful when the data in A is not normally distributed. |
| "grubbs" | Outliers are detected using Grubbs' test for outliers, which removes one outlier per iteration based on hypothesis testing. This method assumes that the data in A is normally distributed. |
| "gesd" | Outliers are detected using the generalized extreme Studentized deviate test for outliers. This iterative method is similar to "grubbs", but can perform better when there are multiple outliers masking each other. |
To detect outliers using a specified range, use the isbetween function.
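As a rough illustration of how the choice of method changes the result, the sketch below runs each method from the table on the sample vector used in the examples and counts the flagged elements. The names methodNames and counts are illustrative, and the loop is only a convenient way to compare methods, not a recommended workflow.

```matlab
% Sample data reused from the examples on this page.
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

% Detection methods listed in the table above.
methodNames = ["median" "mean" "quartiles" "grubbs" "gesd"];
counts = zeros(size(methodNames));

for k = 1:numel(methodNames)
    TF = isoutlier(A, methodNames(k));   % logical mask for this method
    counts(k) = nnz(TF);                 % number of flagged elements
    fprintf("%-10s flags %d element(s)\n", methodNames(k), counts(k));
end
```

On this data, "median" and "quartiles" flag both 100 and 300, while "mean" flags only 300 because the large values inflate the standard deviation.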
Percentile thresholds, specified as a two-element row vector whose elements are in the interval [0, 100]. The first element indicates the lower percentile threshold, and the second element indicates the upper percentile threshold. The first element of threshold must be less than the second element.
For example, a threshold of [10 90] defines outliers as points below the 10th percentile and above the 90th percentile.
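A short sketch of the percentile-based syntax follows, assuming the sample vector from the examples. It also requests the lower and upper thresholds so that the flagged values can be checked against them; the name outlierValues is illustrative.

```matlab
% Sample data reused from the examples on this page.
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

% Flag points below the 10th percentile or above the 90th percentile,
% and return the thresholds that were used.
[TF, L, U] = isoutlier(A, "percentiles", [10 90]);

% Values that fall outside the interval [L, U].
outlierValues = A(TF);
disp(outlierValues)
```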
Moving method for detecting outliers, specified as one of these values.
| Method | Description |
|---|---|
| "movmedian" | Outliers are defined as elements more than three local scaled MAD from the local median over a window length specified by window. This method is also known as a Hampel filter. |
| "movmean" | Outliers are defined as elements more than three local standard deviations from the local mean over a window length specified by window. |
Window length, specified as a positive integer scalar, a two-element vector of positive integers, a positive duration scalar, or a two-element vector of positive durations.
When window is a positive integer scalar, the window is centered about the current element and contains window-1 neighboring elements. If window is even, then the window is centered about the current and previous elements.
When window is a two-element vector of positive integers [b f], the window contains the current element, b elements backward, and f elements forward.
When A is a timetable or SamplePoints is specified as a datetime or duration vector, window must be of type duration, and the windows are computed relative to the sample points.
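The two integer forms of window can be compared directly. In this sketch, the short vector A and the variable names are made up for illustration. A centered window of length 5 covers the same elements as [2 2], while [3 0] looks only at the current element and the three before it.

```matlab
% Illustrative data with one obvious spike at the fourth element.
A = [1 2 3 50 5 6 7];

% Centered window of five elements (two back, current, two forward).
TFcentered = isoutlier(A, "movmedian", 5);

% The same window written explicitly as [b f].
TFexplicit = isoutlier(A, "movmedian", [2 2]);

% Trailing window: current element plus the three elements before it.
TFtrailing = isoutlier(A, "movmedian", [3 0]);

disp(isequal(TFcentered, TFexplicit))
```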
Operating dimension, specified as a positive integer scalar. If no value is specified, then the default is the first array dimension whose size does not equal 1.
Consider an m-by-n input matrix, A:
- isoutlier(A,1) detects outliers based on the data in each column of A and returns an m-by-n matrix.
- isoutlier(A,2) detects outliers based on the data in each row of A and returns an m-by-n matrix.
For table or timetable input data, dim is not supported and operation is along each table or timetable variable separately.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: isoutlier(A,"mean",ThresholdFactor=4)
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: isoutlier(A,"mean","ThresholdFactor",4)
Data Options
Sample points, specified as a vector of sample point values or one of the options in the following table when the input data is a table. The sample points represent the x-axis locations of the data, and must be sorted and contain unique elements. Sample points do not need to be uniformly sampled. The vector [1 2 3 ...] is the default.
When the input data is a table, you can specify the sample points as a table variable using one of these options.
| Indexing Scheme |
|---|
| Variable name |
| Variable index |
| Function handle |
| Variable type |
Note
This name-value argument is not supported when the input data is a timetable. Timetables use the vector of row times as the sample points. To use different sample points, you must edit the timetable so that the row times contain the desired sample points.
Moving windows are defined relative to the sample points. For example, if t is a vector of times corresponding to the input data, then isoutlier(rand(1,10),"movmean",3,"SamplePoints",t) has a window that represents the time interval between t(i)-1.5 and t(i)+1.5.
When the sample points vector has data type datetime or duration, the moving window length must have type duration.
Example: isoutlier(A,"SamplePoints",0:0.1:10)
Example: isoutlier(T,"SamplePoints","Var1")
Data Types: single | double | datetime | duration
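The effect of sample points on a moving window can be sketched as follows. The data and the irregular spacing in tIrregular are made up for illustration. With sample points equal to the default [1 2 3 ...], the result matches the call without SamplePoints. With irregular spacing, the same window length covers a different set of neighbors for each element.

```matlab
% Illustrative data with one obvious spike at the fourth element.
A = [1 2 3 50 5 6 7];

% Uniform sample points identical to the default [1 2 3 ...].
t = 1:numel(A);
TFdefault = isoutlier(A, "movmedian", 5);
TFuniform = isoutlier(A, "movmedian", 5, "SamplePoints", t);

% Nonuniform sample points: a window of length 5 now spans a range of
% sample point values rather than a fixed number of elements.
tIrregular = [0 1 2 3 10 11 12];
TFirregular = isoutlier(A, "movmedian", 5, "SamplePoints", tIrregular);

disp(isequal(TFdefault, TFuniform))
```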
Table variables to operate on, specified as one of the options in this table. The DataVariables value indicates which variables of the input table to examine for outliers. The data type associated with the indicated variables must be double or single.
The first output TF contains false for variables not specified by DataVariables unless the value of OutputFormat is "tabular".
| Indexing Scheme |
|---|
| Variable name |
| Variable index |
| Function handle |
| Variable type |
Example: isoutlier(T,"DataVariables",["Var1" "Var2" "Var4"])
Output data type, specified as one of these values:
- "logical": For table or timetable input data, return the output TF as a logical array.
- "tabular": For table input data, return the output TF as a table. For timetable input data, return the output TF as a timetable.
For vector, matrix, or multidimensional array input data, OutputFormat is not supported.
Example: isoutlier(T,"OutputFormat","tabular")
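The interaction between DataVariables and OutputFormat can be sketched with a small table. The table T and its variable names Temp and Label are hypothetical, and the second call assumes a release in which OutputFormat is available.

```matlab
% Hypothetical table with one numeric variable and one string variable.
Temp  = [20; 21; 19; 80; 20; 22];
Label = ["a"; "b"; "c"; "d"; "e"; "f"];
T = table(Temp, Label);

% Logical output: same size as T, with false in the column for Label
% because Label is not listed in DataVariables.
TFlogical = isoutlier(T, "DataVariables", "Temp");

% Tabular output: a table that contains only the examined variable.
TFtable = isoutlier(T, "DataVariables", "Temp", "OutputFormat", "tabular");

disp(TFlogical)
disp(TFtable)
```

Restricting DataVariables to the numeric variable is what allows the call to succeed even though T also contains a string variable.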
Outlier Detection Options
Detection threshold factor (the ThresholdFactor name-value argument), specified as a nonnegative scalar.
For methods "median" and "movmedian", the detection threshold factor replaces the number of scaled MAD, which is 3 by default.
For methods "mean" and "movmean", the detection threshold factor replaces the number of standard deviations from the mean, which is 3 by default.
For methods "grubbs" and "gesd", the detection threshold factor is a scalar ranging from 0 to 1. Values close to 0 result in a smaller number of outliers, and values close to 1 result in a larger number of outliers. The default detection threshold factor is 0.05.
For the "quartiles" method, the detection threshold factor replaces the number of interquartile ranges, which is 1.5 by default.
This name-value argument is not supported when the specified method is "percentiles".
Maximum outlier count, for the "gesd" method only, specified as a positive integer scalar. The MaxNumOutliers value specifies the maximum number of outliers returned by the "gesd" method. For example, isoutlier(A,"gesd","MaxNumOutliers",5) returns no more than five outliers.
The default value for MaxNumOutliers is the integer nearest to 10 percent of the number of elements in A. Setting a larger value for the maximum number of outliers makes it more likely that all outliers are detected but at the cost of reduced computational efficiency.
The "gesd" method assumes the nonoutlier input data is sampled from an approximate normal distribution. When the data is not sampled in this way, the number of returned outliers might exceed the MaxNumOutliers value.
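The outlier detection options can be tried on the sample vector from the examples. This is a sketch only; the variable names are illustrative, and the specific factor values are chosen to show the effect rather than as recommendations.

```matlab
% Sample data reused from the examples on this page.
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

% Default "mean" rule: more than three standard deviations from the mean.
TFdefault = isoutlier(A, "mean");

% A larger factor widens the accepted band, so fewer points are flagged.
TFwide = isoutlier(A, "mean", "ThresholdFactor", 4);

% A smaller factor for the "median" rule narrows the accepted band.
TFnarrow = isoutlier(A, "median", "ThresholdFactor", 2);

% Request a maximum number of outliers from the "gesd" method.
TFgesd = isoutlier(A, "gesd", "MaxNumOutliers", 3);

fprintf("mean, factor 3: %d flagged\n", nnz(TFdefault));
fprintf("mean, factor 4: %d flagged\n", nnz(TFwide));
```

For this vector, the value 300 lies roughly 3.6 standard deviations from the mean, so it is flagged with the default factor of 3 but not with a factor of 4.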
Output Arguments
Outlier indicator, returned as a vector, matrix, multidimensional array, table, or timetable.
TF is the same size as A unless the value of OutputFormat is "tabular". If the value of OutputFormat is "tabular", then TF has only variables corresponding to the DataVariables specified.
Data Types: logical
Lower threshold used by the outlier detection method, returned as a scalar, vector, matrix, multidimensional array, table, or timetable. For example, the lower threshold value of the default outlier detection method is three scaled MAD below the median of the input data.
If method is used for outlier detection, then L has the same size as A in all dimensions except for the operating dimension where the length is 1. If movmethod is used, then L has the same size as A.
Data Types: double | single | table | timetable
Upper threshold used by the outlier detection method, returned as a scalar, vector, matrix, multidimensional array, table, or timetable. For example, the upper threshold value of the default outlier detection method is three scaled MAD above the median of the input data.
If method is used for outlier detection, then U has the same size as A in all dimensions except for the operating dimension where the length is 1. If movmethod is used, then U has the same size as A.
Data Types: double | single | table | timetable
Center value used by the outlier detection method, returned as a scalar, vector, matrix, multidimensional array, table, or timetable. For example, the center value of the default outlier detection method is the median of the input data.
If method is used for outlier detection, then C has the same size as A in all dimensions except for the operating dimension where the length is 1. If movmethod is used, then C has the same size as A.
Data Types: double | single | table | timetable
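The size rules for L, U, and C can be checked with the matrix from the row-wise example. In this sketch, the variable names and the window length of 3 are illustrative.

```matlab
% Matrix from the row-wise example on this page.
A = magic(5) + diag(200*ones(1,5));

% Default method along columns: thresholds and center are 1-by-5.
[~, Lcol, Ucol, Ccol] = isoutlier(A);

% Default method along rows: thresholds and center are 5-by-1.
[~, Lrow, Urow, Crow] = isoutlier(A, 2);

% Moving method: thresholds and center have the same size as A.
[~, Lmov, Umov, Cmov] = isoutlier(A, "movmedian", 3);

disp(size(Lcol))
disp(size(Lrow))
disp(size(Lmov))
```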
More About
For a finite-length vector A made up of N scalar observations, the median absolute deviation (MAD) is defined as
$$\mathrm{MAD} = \operatorname{median}\left(\left|A_i - \operatorname{median}(A)\right|\right)$$
for i = 1,2,...,N.
The scaled MAD is defined as c*median(abs(A-median(A))), where c=-1/(sqrt(2)*erfcinv(3/2)).
Alternative Functionality
Live Editor Task
You can use isoutlier functionality interactively by adding the Clean Outlier Data task to a live script.

Extended Capabilities
The isoutlier function supports tall arrays with the following usage notes and limitations:
- The "percentiles", "grubbs", and "gesd" methods are not supported.
- The "movmedian" and "movmean" methods do not support tall timetables.
- The SamplePoints and MaxNumOutliers name-value arguments are not supported.
- The value of DataVariables cannot be a function handle.
- Computation of isoutlier(A), isoutlier(A,"median",...), or isoutlier(A,"quartiles",...) along the first dimension is supported only for tall column vectors A.
For more information, see Tall Arrays.
Usage notes and limitations:
- The "movmean" and "movmedian" methods for detecting outliers do not support timetable input data, datetime SamplePoints values, or duration SamplePoints values.
- String and character array inputs must be constant.
This function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment.
The isoutlier function supports GPU array input with these usage notes and limitations:
- The "movmedian" moving method is not supported.
- The SamplePoints and DataVariables name-value arguments are not supported.
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2017a
For table or timetable input data, return a tabular output TF instead of a logical array by setting the OutputFormat name-value argument to "tabular".
For table input data, specify the sample points as a table variable using the SamplePoints name-value argument.
See Also
Functions
rmoutliers | isbetween | ischange | islocalmax | islocalmin | filloutliers | ismissing