# fillmissing

Fill missing values

## Syntax

``F = fillmissing(A,'constant',v)``
``F = fillmissing(A,method)``
``F = fillmissing(A,movmethod,window)``
``F = fillmissing(___,dim)``
``F = fillmissing(___,Name,Value)``
``[F,TF] = fillmissing(___)``

## Description

`F = fillmissing(A,'constant',v)` fills missing entries of an array or table with the constant value `v`. If `A` is a matrix or multidimensional array, then `v` can be either a scalar or a vector. When `v` is a vector, each element specifies the fill value in the corresponding column of `A`. If `A` is a table or timetable, then `v` can also be a cell array.

Missing values are defined according to the data type of `A`:

• `NaN`: `double`, `single`, `duration`, and `calendarDuration`

• `NaT`: `datetime`

• `<missing>`: `string`

• `<undefined>`: `categorical`

• `' '`: `char`

• `{''}`: `cell` of character arrays

If `A` is a table, then the data type of each column defines the missing value for that column.
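As a minimal sketch of the vector form of `v`, the following fills each column of a small matrix with its own constant. The matrix and fill values are made up for illustration.

```matlab
% Illustrative 2-by-2 matrix with one missing value in each column.
A = [1 NaN; NaN 4];

% v has one element per column of A:
% column 1 is filled with 10, column 2 is filled with 20.
F = fillmissing(A,'constant',[10 20]);
% F is [1 20; 10 4]
```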

`F = fillmissing(A,method)` fills missing entries using the method specified by `method`, which can be one of the following:

• `'previous'`: previous non-missing value

• `'next'`: next non-missing value

• `'nearest'`: nearest non-missing value

• `'linear'`: linear interpolation of neighboring, non-missing values (numeric, `duration`, and `datetime` data types only)

• `'spline'`: piecewise cubic spline interpolation (numeric, `duration`, and `datetime` data types only)

• `'pchip'`: shape-preserving piecewise cubic spline interpolation (numeric, `duration`, and `datetime` data types only)
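To compare how the methods behave, the sketch below applies four of them to the same short vector. The input is an arbitrary illustration, chosen so that `'nearest'` has no ties.

```matlab
% Two interior missing values between the known values 1 and 4.
A = [1 NaN NaN 4];

Fprev = fillmissing(A,'previous');  % carries 1 forward:   [1 1 1 4]
Fnext = fillmissing(A,'next');      % carries 4 backward:  [1 4 4 4]
Fnear = fillmissing(A,'nearest');   % closest neighbor:    [1 1 4 4]
Flin  = fillmissing(A,'linear');    % straight line 1..4:  [1 2 3 4]
```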

`F = fillmissing(A,movmethod,window)` fills missing entries using a moving window mean or median with window length `window`. For example, `fillmissing(A,'movmean',5)` fills data with a moving average using a window length of 5.
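A small worked case, assuming the fill value is computed from the non-missing values inside the window: with a window of length 3, the missing third element is replaced by the mean of its two neighbors.

```matlab
% The window of length 3 around element 3 covers elements 2 through 4.
A = [1 2 NaN 4 5];

% The non-missing values in that window are 2 and 4, so the fill value is 3.
F = fillmissing(A,'movmean',3);
% F is [1 2 3 4 5]
```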

`F = fillmissing(___,dim)` specifies the dimension of `A` to operate along. By default, `fillmissing` operates along the first dimension whose size does not equal 1. For example, if `A` is a matrix, then `fillmissing(A,'linear',2)` operates across the columns of `A`, filling missing data row by row.

`F = fillmissing(___,Name,Value)` specifies additional parameters for filling missing values using one or more name-value pair arguments. For example, if `t` is a vector of time values, then `fillmissing(A,'linear','SamplePoints',t)` interpolates the data in `A` relative to the times in `t`.

`[F,TF] = fillmissing(___)` also returns a logical array corresponding to the entries of `A` that were filled.
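The second output can be used to count or locate the filled entries. A minimal sketch with made-up data:

```matlab
% Two of the four entries are missing.
A = [NaN 2 NaN 4];

% TF marks the positions that were filled.
[F,TF] = fillmissing(A,'constant',0);
% F is [0 2 0 4]; TF is logical [1 0 1 0]

nFilled = nnz(TF);   % number of filled entries (2 here)
```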

## Examples

Create a vector that contains `NaN` values and replace each `NaN` with the previous non-missing value.

```
A = [1 3 NaN 4 NaN NaN 5];
F = fillmissing(A,'previous')
```
```
F = 1×7

     1     3     3     4     4     4     5
```

Use interpolation to replace `NaN` values in non-uniformly sampled data.

Define a vector of non-uniform sample points and evaluate the sine function over the points.

```
x = [-4*pi:0.1:0, 0.1:0.2:4*pi];
A = sin(x);
```

Inject `NaN` values into `A`.

`A(A < 0.75 & A > 0.5) = NaN;`

Fill the missing data using linear interpolation, and return the filled vector `F` and the logical vector `TF`. The value 1 (`true`) in entries of `TF` corresponds to the values of `F` that were filled.

`[F,TF] = fillmissing(A,'linear','SamplePoints',x);`

Plot the original data and filled data.

```
plot(x,A,'.', x(TF),F(TF),'o')
xlabel('x');
ylabel('sin(x)')
legend('Original Data','Filled Missing Data')
```

Use a moving median to fill missing numeric data.

Create a vector of sample points `x` and a vector of data `A` that contains missing values.

```
x = linspace(0,10,200);
A = sin(x) + 0.5*(rand(size(x))-0.5);
A([1:10 randi([1 length(x)],1,50)]) = NaN;
```

Replace `NaN` values in `A` using a moving median with a window of length 10, and plot both the original data and the filled data.

```
F = fillmissing(A,'movmedian',10);
plot(x,F,'r.-',x,A,'b.-')
legend('Filled Missing Data','Original Data')
```

Create a matrix with missing entries and fill across the columns (second dimension) one row at a time using linear interpolation. For each row, fill leading and trailing missing values with the nearest non-missing value in that row.

```
A = [NaN NaN 5 3 NaN 5 7 NaN 9 NaN;
     8 9 NaN 1 4 5 NaN 5 NaN 5;
     NaN 4 9 8 7 2 4 1 1 NaN]
```
```
A = 3×10

   NaN   NaN     5     3   NaN     5     7   NaN     9   NaN
     8     9   NaN     1     4     5   NaN     5   NaN     5
   NaN     4     9     8     7     2     4     1     1   NaN
```
```
F = fillmissing(A,'linear',2,'EndValues','nearest')
```
```
F = 3×10

     5     5     5     3     4     5     7     8     9     9
     8     9     5     1     4     5     5     5     5     5
     4     4     9     8     7     2     4     1     1     1
```

Fill missing values for table variables with different data types.

Create a table whose variables include `categorical`, `double`, and `char` data types.

```
A = table(categorical({'Sunny';'Cloudy';''}),[66;NaN;54],{'';'N';'Y'},[37;39;NaN],...
    'VariableNames',{'Description' 'Temperature' 'Rain' 'Humidity'})
```
```
A=3×4 table
    Description    Temperature    Rain    Humidity
    ___________    ___________    ____    ________

    Sunny              66         ''         37
    Cloudy             NaN        'N'        39
    <undefined>        54         'Y'        NaN
```

Replace all missing entries with the value from the previous entry. Because the missing entry in the `Rain` variable is the first element, it has no previous value and is not replaced.

`F = fillmissing(A,'previous')`
```
F=3×4 table
    Description    Temperature    Rain    Humidity
    ___________    ___________    ____    ________

    Sunny              66         ''         37
    Cloudy             66         'N'        39
    Cloudy             54         'Y'        39
```

Replace the `NaN` values from the `Temperature` and `Humidity` variables in `A` with 0.

`F = fillmissing(A,'constant',0,'DataVariables',{'Temperature','Humidity'})`
```
F=3×4 table
    Description    Temperature    Rain    Humidity
    ___________    ___________    ____    ________

    Sunny              66         ''         37
    Cloudy             0          'N'        39
    <undefined>        54         'Y'        0
```

Alternatively, use the `isnumeric` function to identify the numeric variables to operate on.

`F = fillmissing(A,'constant',0,'DataVariables',@isnumeric)`
```
F=3×4 table
    Description    Temperature    Rain    Humidity
    ___________    ___________    ____    ________

    Sunny              66         ''         37
    Cloudy             0          'N'        39
    <undefined>        54         'Y'        0
```

## Input Arguments

`A`: Input data, specified as a vector, matrix, multidimensional array, table, or timetable.

If `A` is a timetable, then only table values are filled. If the associated vector of row times contains a `NaT` or `NaN` value, then `fillmissing` produces an error. Row times must be unique and listed in ascending order.

Data Types: `double` | `single` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64` | `logical` | `char` | `string` | `cell` | `table` | `timetable` | `categorical` | `datetime` | `duration` | `calendarDuration`

`v`: Fill constant, specified as a scalar, vector, or cell array. `v` can be a vector when `A` is a matrix or multidimensional array. `v` can be a cell array when `A` is a table or timetable.

Data Types: `double` | `single` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64` | `logical` | `char` | `cell` | `categorical` | `datetime` | `duration`

`method`: Fill method, specified as one of the following:

| Method | Description |
| --- | --- |
| `'previous'` | previous non-missing value |
| `'next'` | next non-missing value |
| `'nearest'` | nearest non-missing value |
| `'linear'` | linear interpolation of neighboring, non-missing values (numeric, `duration`, and `datetime` data types only) |
| `'spline'` | piecewise cubic spline interpolation (numeric, `duration`, and `datetime` data types only) |
| `'pchip'` | shape-preserving piecewise cubic spline interpolation (numeric, `duration`, and `datetime` data types only) |
| `'makima'` | modified Akima cubic Hermite interpolation (numeric, `duration`, and `datetime` data types only) |

`movmethod`: Moving method to fill missing data, specified as one of the following:

| Method | Description |
| --- | --- |
| `'movmean'` | Moving average over a window of length `window` (numeric data types only) |
| `'movmedian'` | Moving median over a window of length `window` (numeric data types only) |

`window`: Window length, specified as a positive integer scalar, a two-element vector of positive integers, a positive duration scalar, or a two-element vector of positive durations.

When `window` is a positive integer scalar, the window is centered about the current element and contains `window-1` neighboring elements. If `window` is even, then the window is centered about the current and previous elements. If `window` is a two-element vector of positive integers `[b f]`, then the window contains the current element, `b` elements backward, and `f` elements forward.

When `A` is a timetable or `'SamplePoints'` is specified as a `datetime` or `duration` vector, `window` must be of type `duration`, and the windows are computed relative to the sample points.

Data Types: `double` | `single` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64` | `duration`
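The two-element form `[b f]` can be checked by hand on a short vector. This is a sketch with made-up data, assuming the fill value is the mean of the non-missing values inside the window.

```matlab
% Element 3 is missing.
A = [2 4 NaN 6 14];

% [2 1]: current element, 2 backward, 1 forward -> elements 1 through 4.
% Non-missing values are 2, 4, 6, so the fill value is 4.
Fback = fillmissing(A,'movmean',[2 1]);

% [1 2]: current element, 1 backward, 2 forward -> elements 2 through 5.
% Non-missing values are 4, 6, 14, so the fill value is 8.
Ffwd = fillmissing(A,'movmean',[1 2]);
```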

`dim`: Dimension to operate along, specified as a positive integer scalar. If no value is specified, then the default is the first array dimension whose size does not equal 1.

When `A` is a table or timetable, `dim` is not supported. `fillmissing` operates along each table or timetable variable separately.

Consider a two-dimensional input array, `A`.

• If `dim=1`, then `fillmissing` fills `A` column by column.

• If `dim=2`, then `fillmissing` fills `A` row by row.

Data Types: `double` | `single` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64`
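The effect of `dim` is easiest to see on a matrix where a whole column is missing. The matrix below is an illustrative assumption, not taken from the examples above.

```matlab
% Column 2 is entirely missing; row 2 also ends with a missing value.
A = [1 NaN 3;
     4 NaN NaN];

% dim = 1: fill down each column. Column 2 has no non-missing value,
% so it stays NaN; the last entry of column 3 takes the value above it.
Fcol = fillmissing(A,'previous',1);   % [1 NaN 3; 4 NaN 3]

% dim = 2: fill along each row, using the value to the left.
Frow = fillmissing(A,'previous',2);   % [1 1 3; 4 4 4]
```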

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `fillmissing(A,'DataVariables',{'Temperature','Altitude'})` fills only the columns corresponding to the `Temperature` and `Altitude` variables of an input table.

Method for handling endpoints, specified as the comma-separated pair consisting of `'EndValues'` and one of `'extrap'`, `'previous'`, `'next'`, `'nearest'`, `'none'`, or a constant scalar value. The endpoint fill method handles leading and trailing missing values based on the following definitions:

| Method | Description |
| --- | --- |
| `'extrap'` | same as `method` |
| `'previous'` | previous non-missing value |
| `'next'` | next non-missing value |
| `'nearest'` | nearest non-missing value |
| `'none'` | no fill value |
| scalar | constant value (numeric, `duration`, and `datetime` data types only) |

Data Types: `double` | `single` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64` | `logical` | `datetime` | `duration`
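The endpoint options differ only in how leading and trailing missing values are treated. The sketch below uses a made-up vector whose first and last entries are missing, and applies linear interpolation to the interior in every case.

```matlab
% Leading and trailing NaN, plus one interior NaN.
A = [NaN 2 NaN 4 NaN];

% Default ('extrap'): endpoints use the same method, here linear extrapolation.
Fextrap = fillmissing(A,'linear');                         % [1 2 3 4 5]

% 'nearest': endpoints copy the closest non-missing value.
Fnear = fillmissing(A,'linear','EndValues','nearest');     % [2 2 3 4 4]

% 'none': endpoints are left missing.
Fnone = fillmissing(A,'linear','EndValues','none');        % [NaN 2 3 4 NaN]

% Scalar: endpoints are set to a constant.
Fconst = fillmissing(A,'linear','EndValues',0);            % [0 2 3 4 0]
```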

Sample points for fill method, specified as the comma-separated pair consisting of `'SamplePoints'` and a vector. The sample points represent the location of the data in `A`, and must be sorted and contain unique elements. Sample points do not need to be uniformly sampled. If `A` is a timetable, then the default sample points vector is the vector of row times. Otherwise, the default vector is `[1 2 3 ...]`.

Moving windows are defined relative to the sample points. For example, if `t` is a vector of times corresponding to the input data, then `fillmissing(rand(1,10),'movmean',3,'SamplePoints',t)` has a window that represents the time interval between `t(i)-1.5` and `t(i)+1.5`.

When the sample points vector has data type `datetime` or `duration`, then the moving window length must have type `duration`.

This name-value pair is not supported when the input data is a timetable.

Data Types: `double` | `single` | `datetime` | `duration`
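Sample points change the result of interpolation when the data is not evenly spaced. A minimal sketch, with the sample locations `t` chosen arbitrarily for illustration:

```matlab
% The missing value sits at t = 1, much closer to the first point than the last.
A = [0 NaN 10];
t = [0 1 10];

% Default sample points [1 2 3]: the missing value is halfway, so it becomes 5.
Fdefault = fillmissing(A,'linear');

% With 'SamplePoints', interpolation is relative to t, so the value at t = 1 is 1.
Fsampled = fillmissing(A,'linear','SamplePoints',t);
```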

Table variables to fill, specified as the comma-separated pair consisting of `'DataVariables'` and a variable name, a cell array of variable names, a numeric vector, a logical vector, or a function handle. The `'DataVariables'` value indicates which columns of the input table to fill, and can be one of the following:

• A character vector specifying a single table variable name

• A cell array of character vectors where each element is a table variable name

• A vector of table variable indices

• A logical vector whose elements each correspond to a table variable, where `true` includes the corresponding variable and `false` excludes it

• A function handle that returns a logical scalar, such as `@isnumeric`

Example: `'Age'`

Example: `{'Height','Weight'}`

Example: `@iscategorical`

Data Types: `char` | `cell` | `single` | `double` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64` | `logical` | `function_handle`
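The variable-name and logical-vector forms can be sketched on a small table. The table and its variable names `a` and `b` are hypothetical.

```matlab
% Hypothetical two-variable table with one missing value in each variable.
T = table([1;NaN;3],[NaN;5;6],'VariableNames',{'a','b'});

% Select by name: only variable a is filled; b keeps its NaN.
Fname = fillmissing(T,'previous','DataVariables','a');
% Fname.a is [1;1;3], Fname.b is [NaN;5;6]

% Select by logical vector: only the second variable (b) is filled.
Fmask = fillmissing(T,'next','DataVariables',[false true]);
% Fmask.a is [1;NaN;3], Fmask.b is [5;5;6]
```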

Known missing indicator, specified as the comma-separated pair consisting of `'MissingLocations'` and a logical vector, matrix, or multidimensional array of the same size as `A`. The indicator elements can be `true` to indicate a missing value in the corresponding location of `A` or `false` otherwise.

Data Types: `logical`
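One possible use of `'MissingLocations'` is to treat a sentinel value as missing. The sketch below assumes that `-999` marks missing readings in the data, which is an illustrative convention rather than anything `fillmissing` defines, and assumes a release that supports this name-value pair.

```matlab
% Assumed convention: -999 marks a missing reading.
A = [1 -999 3 -999 5];

% Logical mask of the same size as A, true where a value is known to be missing.
mask = (A == -999);

% Only the flagged locations are filled, here by linear interpolation.
F = fillmissing(A,'linear','MissingLocations',mask);
% F is [1 2 3 4 5]
```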

## Output Arguments

`F`: Filled data, returned as a vector, matrix, multidimensional array, table, or timetable. `F` is the same size as `A`.

Data Types: `double` | `single` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64` | `logical` | `char` | `string` | `cell` | `table` | `timetable` | `categorical` | `datetime` | `duration` | `calendarDuration`

`TF`: Filled data indicator, returned as a vector, matrix, or multidimensional array. `TF` is a logical array where 1 (`true`) corresponds to entries in `F` that were filled and 0 (`false`) corresponds to unchanged entries. `TF` is the same size as `A` and `F`.

Data Types: `logical`

