## List of Deep Learning Layers

This page provides a list of deep learning layers in MATLAB®.

To learn how to create networks from layers for different tasks, see the following examples.

Task | Learn More |
---|---|
Create deep learning networks for image classification or regression. | Create Simple Deep Learning Network for Classification |
Create deep learning networks for sequence and time series data. | |
Create deep learning networks for audio data. | Train Speech Command Recognition Model Using Deep Learning |
Create deep learning networks for text data. | |

### Deep Learning Layers

Use the following functions to create different layer types. Alternatively, use the **Deep Network Designer** app to create networks interactively.

To learn how to define your own custom layers, see Define Custom Deep Learning Layers.
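Each function in the tables below returns a layer object, and a simple network is an array of those objects stacked in order. The sketch below assumes 28-by-28 grayscale images and 10 classes; the filter count, filter size, and pooling settings are illustrative choices rather than recommendations.

```matlab
% Minimal sketch: a layer array for image classification.
% Input size and class count are assumptions for illustration.
inputSize  = [28 28 1];
numClasses = 10;

layers = [
    imageInputLayer(inputSize)                   % input + data normalization
    convolution2dLayer(3, 8, 'Padding', 'same')  % 8 filters of size 3-by-3
    batchNormalizationLayer                      % normalize each channel
    reluLayer                                    % element-wise nonlinearity
    maxPooling2dLayer(2, 'Stride', 2)            % halve height and width
    fullyConnectedLayer(numClasses)              % one output per class
    softmaxLayer                                 % class probabilities
    classificationLayer];                        % cross-entropy loss
```

You can pass an array like this to `trainNetwork` together with training data and the options returned by `trainingOptions`.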

#### Input Layers

Layer | Description |
---|---|
`imageInputLayer` | An image input layer inputs 2-D images to a network and applies data normalization. |
`image3dInputLayer` | A 3-D image input layer inputs 3-D images or volumes to a network and applies data normalization. |
`pointCloudInputLayer` | A point cloud input layer inputs 3-D point clouds to a network and applies data normalization. You can also input point cloud data such as 2-D lidar scans. |
`sequenceInputLayer` | A sequence input layer inputs sequence data to a network. |
`featureInputLayer` | A feature input layer inputs feature data to a network and applies data normalization. Use this layer when you have a data set of numeric scalars representing features (data without spatial or time dimensions). |
`roiInputLayer` | An ROI input layer inputs images to a Fast R-CNN object detection network. |
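The input layer is chosen from the shape of a single observation. The sketch below creates one input layer per data layout; every size is a hypothetical value, and `'zscore'` is used only to show that feature input normalization is configurable.

```matlab
% One input layer per data layout (all sizes are illustrative assumptions).
imgLayer  = imageInputLayer([224 224 3]);       % height x width x channels
volLayer  = image3dInputLayer([64 64 64 1]);    % height x width x depth x channels
seqLayer  = sequenceInputLayer(12);             % 12 features per time step
featLayer = featureInputLayer(20, 'Normalization', 'zscore');  % 20 scalar features
```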

#### Convolution and Fully Connected Layers

Layer | Description |
---|---|
`convolution1dLayer` | A 1-D convolutional layer applies sliding convolutional filters to 1-D input. |
`convolution2dLayer` | A 2-D convolutional layer applies sliding convolutional filters to 2-D input. |
`convolution3dLayer` | A 3-D convolutional layer applies sliding cuboidal convolution filters to 3-D input. |
`groupedConvolution2dLayer` | A 2-D grouped convolutional layer separates the input channels into groups and applies sliding convolutional filters. Use grouped convolutional layers for channel-wise separable (also known as depth-wise separable) convolution. |
`transposedConv2dLayer` | A transposed 2-D convolution layer upsamples two-dimensional feature maps. |
`transposedConv3dLayer` | A transposed 3-D convolution layer upsamples three-dimensional feature maps. |
`fullyConnectedLayer` | A fully connected layer multiplies the input by a weight matrix and then adds a bias vector. |
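The size arithmetic behind these layers can be checked by hand. This plain-MATLAB sketch does not call the layer functions: `convOut` and `tconvOut` are hypothetical helpers for one spatial dimension, ignoring dilation, where `padTotal` is the padding added to both ends combined and `crop` is the cropping applied to each side. In the 2-D and 3-D cases the same rule applies to each spatial dimension independently.

```matlab
% Hypothetical helpers for output size along one spatial dimension (no dilation).
convOut  = @(in, filt, stride, padTotal) floor((in + padTotal - filt) ./ stride) + 1;
tconvOut = @(in, filt, stride, crop) stride .* (in - 1) + filt - 2 .* crop;

h1 = convOut(28, 3, 1, 2);   % 3-wide filter, stride 1, one pixel per side -> 28
h2 = convOut(28, 3, 2, 0);   % no padding, stride 2 -> 13
h3 = tconvOut(14, 2, 2, 0);  % transposed convolution upsamples 14 -> 28

% Fully connected layer: Y = W*X + b, where W is outputSize-by-inputSize.
W = [1 0 2; 0 1 -1];         % 2 outputs, 3 inputs (illustrative weights)
b = [0.5; -0.5];
x = [1; 2; 3];
y = W*x + b;                 % [7.5; -1.5]
```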

#### Sequence Layers

Layer | Description |
---|---|
`sequenceInputLayer` | A sequence input layer inputs sequence data to a network. |
`lstmLayer` | An LSTM layer learns long-term dependencies between time steps in time series and sequence data. |
`lstmProjectedLayer` | An LSTM projected layer learns long-term dependencies between time steps in time series and sequence data using projected learnable weights. |
`bilstmLayer` | A bidirectional LSTM (BiLSTM) layer learns bidirectional long-term dependencies between time steps of time series or sequence data. These dependencies can be useful when you want the network to learn from the complete time series at each time step. |
`gruLayer` | A GRU layer learns dependencies between time steps in time series and sequence data. |
`convolution1dLayer` | A 1-D convolutional layer applies sliding convolutional filters to 1-D input. |
`transposedConv1dLayer` | A transposed 1-D convolution layer upsamples one-dimensional feature maps. |
`maxPooling1dLayer` | A 1-D max pooling layer performs downsampling by dividing the input into 1-D pooling regions, then computing the maximum of each region. |
`averagePooling1dLayer` | A 1-D average pooling layer performs downsampling by dividing the input into 1-D pooling regions, then computing the average of each region. |
`globalMaxPooling1dLayer` | A 1-D global max pooling layer performs downsampling by outputting the maximum of the time or spatial dimensions of the input. |
`sequenceFoldingLayer` | A sequence folding layer converts a batch of image sequences to a batch of images. Use a sequence folding layer to perform convolution operations on time steps of image sequences independently. |
`sequenceUnfoldingLayer` | A sequence unfolding layer restores the sequence structure of the input data after sequence folding. |
`flattenLayer` | A flatten layer collapses the spatial dimensions of the input into the channel dimension. |
`wordEmbeddingLayer` | A word embedding layer maps word indices to vectors. |
`peepholeLSTMLayer` | A peephole LSTM layer is a variant of an LSTM layer, where the gate calculations use the layer cell state. |
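A typical use of the sequence layers is sequence-to-label classification, where an LSTM summarizes each sequence and a classifier follows it. The feature, hidden-unit, and class counts below are assumptions chosen for illustration.

```matlab
% Sketch: sequence-to-label classification with an LSTM.
numFeatures    = 12;    % features per time step (assumed)
numHiddenUnits = 100;   % LSTM state size (assumed)
numClasses     = 9;     % number of labels (assumed)

layers = [
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits, 'OutputMode', 'last')   % output only the last time step
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
```

For sequence-to-sequence tasks, setting `'OutputMode'` to `'sequence'` makes the LSTM output every time step instead; `bilstmLayer` and `gruLayer` accept the same option.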

#### Activation Layers

Layer | Description |
---|---|
`reluLayer` | A ReLU layer performs a threshold operation on each element of the input, where any value less than zero is set to zero. |
`leakyReluLayer` | A leaky ReLU layer performs a threshold operation, where any input value less than zero is multiplied by a fixed scalar. |
`clippedReluLayer` | A clipped ReLU layer performs a threshold operation, where any input value less than zero is set to zero and any value above the clipping ceiling is set to that clipping ceiling. |
`eluLayer` | An ELU activation layer performs the identity operation on positive inputs and an exponential nonlinearity on negative inputs. |
`geluLayer` | A Gaussian error linear unit (GELU) layer weights the input by its probability under a Gaussian distribution. |
`tanhLayer` | A hyperbolic tangent (tanh) activation layer applies the tanh function on the layer inputs. |
`swishLayer` | A swish activation layer applies the swish function on the layer inputs. |
`softplusLayer` | A softplus layer applies the softplus activation function `Y = log(1 + exp(X))`, which ensures that the output is always positive. This activation function is a smooth, continuous version of `reluLayer`. You can incorporate this layer into the deep neural networks you define for actors in reinforcement learning agents. This layer is useful for creating continuous Gaussian policy deep neural networks, for which the standard deviation output must be positive. |
`functionLayer` | A function layer applies a specified function to the layer input. |
`preluLayer` | A PReLU layer performs a threshold operation, where for each channel, any input value less than zero is multiplied by a scalar learned at training time. |
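The activation layers apply element-wise formulas, so they can be written as anonymous functions. In this sketch the scale, ceiling, and alpha parameters are passed explicitly; `leakyReluLayer` uses a scale of 0.01 by default, `eluLayer` uses alpha = 1 by default, and `clippedReluLayer` requires a ceiling. The GELU line uses the exact `erf`-based form, and softplus is written in an algebraically equivalent form that avoids overflow for large inputs.

```matlab
% Element-wise activation functions as anonymous functions.
relu        = @(x) max(x, 0);
leakyRelu   = @(x, scale) max(x, 0) + scale .* min(x, 0);  % PReLU: same form, learned per-channel scale
clippedRelu = @(x, ceiling) min(max(x, 0), ceiling);
elu         = @(x, alpha) max(x, 0) + alpha .* (exp(min(x, 0)) - 1);
gelu        = @(x) 0.5 .* x .* (1 + erf(x ./ sqrt(2)));      % exact erf-based GELU
swish       = @(x) x ./ (1 + exp(-x));                       % x .* sigmoid(x)
softplus    = @(x) max(x, 0) + log1p(exp(-abs(x)));          % equals log(1 + exp(x))

x = [-2 -0.5 0 0.5 2];
yRelu    = relu(x);             % [0 0 0 0.5 2]
yLeaky   = leakyRelu(x, 0.01);  % [-0.02 -0.005 0 0.5 2]
yClipped = clippedRelu(x, 1);   % [0 0 0 0.5 1]
yElu     = elu(x, 1);
yTanh    = tanh(x);
ySoft    = softplus(x);         % every element is positive
```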

#### Normalization Layers

Layer | Description |
---|---|
`batchNormalizationLayer` | A batch normalization layer normalizes a mini-batch of data across all observations for each channel independently. To speed up training of the convolutional neural network and reduce the sensitivity to network initialization, use batch normalization layers between convolutional layers and nonlinearities, such as ReLU layers. |
`groupNormalizationLayer` | A group normalization layer normalizes a mini-batch of data across grouped subsets of channels for each observation independently. To speed up training of the convolutional neural network and reduce the sensitivity to network initialization, use group normalization layers between convolutional layers and nonlinearities, such as ReLU layers. |
`instanceNormalizationLayer` | An instance normalization layer normalizes a mini-batch of data across each channel for each observation independently. To improve the convergence of training the convolutional neural network and reduce the sensitivity to network hyperparameters, use instance normalization layers between convolutional layers and nonlinearities, such as ReLU layers. |
`layerNormalizationLayer` | A layer normalization layer normalizes a mini-batch of data across all channels for each observation independently. To speed up training of recurrent and multilayer perceptron neural networks and reduce the sensitivity to network initialization, use layer normalization layers after the learnable layers, such as LSTM and fully connected layers. |
`crossChannelNormalizationLayer` | A channel-wise local response (cross-channel) normalization layer carries out channel-wise normalization. |
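The normalization layers differ mainly in which elements share a mean and variance. This plain-MATLAB sketch contrasts batch and layer normalization on a small channels-by-observations matrix with arbitrary values. The use of population variance and epsilon = 1e-5 are assumptions for illustration, and the learnable scale and offset are omitted. Group and instance normalization follow the same pattern, computing statistics over channel groups or single channels within each observation.

```matlab
% X: 3 channels (rows) by 4 observations (columns); values are arbitrary.
X = [1 2 3 4; 10 20 30 40; -1 0 1 2];
epsilon = 1e-5;

% Batch normalization: one mean/variance per channel, across observations.
muB  = mean(X, 2);
varB = var(X, 1, 2);                    % population variance (divide by N)
XB   = (X - muB) ./ sqrt(varB + epsilon);

% Layer normalization: one mean/variance per observation, across channels.
muL  = mean(X, 1);
varL = var(X, 1, 1);
XL   = (X - muL) ./ sqrt(varL + epsilon);

% Both layers would then apply a learnable scale and offset: Y = gamma.*Xhat + beta.
```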

#### Utility Layers

Layer | Description |
---|---|
`dropoutLayer` | A dropout layer randomly sets input elements to zero with a given probability. |
`crop2dLayer` | A 2-D crop layer applies 2-D cropping to the input. |
`crop3dLayer` | A 3-D crop layer crops a 3-D volume to the size of the input feature map. |
`scalingLayer` | A scaling layer linearly scales and biases an input array `U`, giving an output `Y = Scale.*U + Bias`. You can incorporate this layer into the deep neural networks you define for actors or critics in reinforcement learning agents. This layer is useful for scaling and shifting the outputs of nonlinear layers, such as `tanhLayer` and `sigmoidLayer`. |
`quadraticLayer` | A quadratic layer takes an input vector and outputs a vector of quadratic monomials constructed from the input elements. This layer is useful when you need a layer whose output is a quadratic function of its inputs. For example, you can use it to recreate the structure of quadratic value functions such as those used in LQR controller design. |
`stftLayer` | An STFT layer computes the short-time Fourier transform (STFT) of the input. |
`cwtLayer` | A CWT layer computes the continuous wavelet transform (CWT) of the input. |
`modwtLayer` | A MODWT layer computes the maximal overlap discrete wavelet transform (MODWT) and the MODWT multiresolution analysis (MRA) of the input. |
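Two of the utility layers reduce to one-line formulas. This sketch follows the usual inverted-dropout formulation (drop each element with probability `p` during training and rescale the survivors by `1/(1-p)`, with no dropout at prediction time) and then applies the scaling-layer formula `Y = Scale.*U + Bias` to map tanh output into (0, 10). The probability, seed, and scale and bias values are illustrative assumptions.

```matlab
% Dropout at training time (inverted dropout) with probability p.
p = 0.5;
X = ones(4, 5);
rng(0);                               % fixed seed so the mask is reproducible
mask = rand(size(X)) < p;             % true marks an element to drop
Ytrain = X;
Ytrain(mask) = 0;
Ytrain(~mask) = X(~mask) ./ (1 - p);  % rescale the kept elements
Ypredict = X;                         % no dropout at prediction time

% Scaling layer: Y = Scale.*U + Bias, mapping tanh output (-1,1) to (0,10).
Scale = 5;
Bias  = 5;
U = tanh([-3 0 3]);
Y = Scale .* U + Bias;
```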

#### Resizing Layers

Layer | Description |
---|---|
`resize2dLayer` | A 2-D resize layer resizes 2-D input by a scale factor, to a specified height and width, or to the size of a reference input feature map. |
`resize3dLayer` | A 3-D resize layer resizes 3-D input by a scale factor, to a specified height, width, and depth, or to the size of a reference input feature map. |

#### Pooling and Unpooling Layers

Layer | Description |
---|---|
`averagePooling1dLayer` | A 1-D average pooling layer performs downsampling by dividing the input into 1-D pooling regions, then computing the average of each region. |
`averagePooling2dLayer` | A 2-D average pooling layer performs downsampling by dividing the input into rectangular pooling regions, then computing the average of each region. |
`averagePooling3dLayer` | A 3-D average pooling layer performs downsampling by dividing three-dimensional input into cuboidal pooling regions, then computing the average values of each region. |
`globalAveragePooling1dLayer` | A 1-D global average pooling layer performs downsampling by outputting the average of the time or spatial dimensions of the input. |
`globalAveragePooling2dLayer` | A 2-D global average pooling layer performs downsampling by computing the mean of the height and width dimensions of the input. |
`globalAveragePooling3dLayer` | A 3-D global average pooling layer performs downsampling by computing the mean of the height, width, and depth dimensions of the input. |
`maxPooling1dLayer` | A 1-D max pooling layer performs downsampling by dividing the input into 1-D pooling regions, then computing the maximum of each region. |
`maxPooling2dLayer` | A 2-D max pooling layer performs downsampling by dividing the input into rectangular pooling regions, then computing the maximum of each region. |
`maxPooling3dLayer` | A 3-D max pooling layer performs downsampling by dividing three-dimensional input into cuboidal pooling regions, then computing the maximum of each region. |
`globalMaxPooling1dLayer` | A 1-D global max pooling layer performs downsampling by outputting the maximum of the time or spatial dimensions of the input. |
`globalMaxPooling2dLayer` | A 2-D global max pooling layer performs downsampling by computing the maximum of the height and width dimensions of the input. |
`globalMaxPooling3dLayer` | A 3-D global max pooling layer performs downsampling by computing the maximum of the height, width, and depth dimensions of the input. |
`maxUnpooling2dLayer` | A 2-D max unpooling layer unpools the output of a 2-D max pooling layer. |
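Pooling with a region size equal to the stride splits the input into non-overlapping regions and reduces each one to a single value, while global pooling reduces a whole dimension. This plain-MATLAB sketch uses a pool size and stride of 2 and arbitrary input values.

```matlab
% 1-D pooling, pool size 2, stride 2 (non-overlapping regions).
x = [1 3 2 8 5 4];
regions   = reshape(x, 2, []);    % each column is one pooling region
maxPooled = max(regions, [], 1);  % [3 8 5]
avgPooled = mean(regions, 1);     % [2 5 4.5]

% Global pooling collapses the whole time (or spatial) dimension.
globalMax = max(x);               % 8
globalAvg = mean(x);              % 23/6

% 2-D max pooling of a 4-by-4 map with 2-by-2 regions and stride 2.
A = magic(4);
pooled = zeros(2, 2);
for i = 1:2
    for j = 1:2
        block = A(2*i-1:2*i, 2*j-1:2*j);
        pooled(i, j) = max(block(:));
    end
end
% pooled is [16 13; 14 15]
```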

#### Combination Layers

Layer | Description |
---|---|
`additionLayer` | An addition layer adds inputs from multiple neural network layers element-wise. |
`multiplicationLayer` | A multiplication layer multiplies inputs from multiple neural network layers element-wise. |
`depthConcatenationLayer` | A depth concatenation layer takes inputs that have the same height and width and concatenates them along the third dimension (the channel dimension). |
`concatenationLayer` | A concatenation layer takes inputs and concatenates them along a specified dimension. The inputs must have the same size in all dimensions except the concatenation dimension. |
`weightedAdditionLayer` | A weighted addition layer scales and adds inputs from multiple neural network layers element-wise. |
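In terms of array sizes, the combination layers behave like ordinary MATLAB array operations. This sketch uses random height-by-width-by-channel arrays with illustrative sizes to show which inputs each operation accepts and what size it produces; the fixed weights in the last step are hypothetical stand-ins for the learnable weights of a weighted addition layer.

```matlab
% Height x width x channel feature maps (sizes are illustrative).
A = rand(8, 8, 16);
B = rand(8, 8, 32);
C = rand(8, 8, 16);

added    = A + C;                  % addition: all sizes must match -> 8x8x16
multed   = A .* C;                 % element-wise multiplication -> 8x8x16
depthCat = cat(3, A, B);           % depth concatenation: channels add -> 8x8x48
widthCat = cat(2, A, C);           % concatenation along dimension 2 -> 8x16x16
w = [0.3 0.7];                     % hypothetical weights
weighted = w(1) .* A + w(2) .* C;  % weighted addition -> 8x8x16
```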

#### Object Detection Layers

Layer | Description |
---|---|
`roiInputLayer` | An ROI input layer inputs images to a Fast R-CNN object detection network. |
`roiMaxPooling2dLayer` | An ROI max pooling layer outputs fixed-size feature maps for every rectangular ROI within the input feature map. Use this layer to create a Fast or Faster R-CNN object detection network. |
`roiAlignLayer` | An ROI align layer outputs fixed-size feature maps for every rectangular ROI within an input feature map. Use this layer to create a Mask R-CNN network. |
`anchorBoxLayer` | An anchor box layer stores anchor boxes for a feature map used in object detection networks. |
`regionProposalLayer` | A region proposal layer outputs bounding boxes around potential objects in an image as part of the region proposal network (RPN) within Faster R-CNN. |
`ssdMergeLayer` | An SSD merge layer merges the outputs of feature maps for subsequent regression and classification loss computation. |
`yolov2TransformLayer` (Computer Vision Toolbox) | A transform layer of the you only look once version 2 (YOLO v2) network transforms the bounding box predictions of the last convolution layer in the network to fall within the bounds of the ground truth. Use the transform layer to improve the stability of the YOLO v2 network. |
`spaceToDepthLayer` | A space to depth layer permutes the spatial blocks of the input into the depth dimension. Use this layer when you need to combine feature maps of different sizes without discarding any feature data. |
`depthToSpace2dLayer` | A 2-D depth to space layer permutes data from the depth dimension into blocks of 2-D spatial data. |
`rpnSoftmaxLayer` | A region proposal network (RPN) softmax layer applies a softmax activation function to the input. Use this layer to create a Faster R-CNN object detection network. |
`focalLossLayer` | A focal loss layer predicts object classes using focal loss. |
`rpnClassificationLayer` | A region proposal network (RPN) classification layer classifies image regions as either object or background by using a cross-entropy loss function. Use this layer to create a Faster R-CNN object detection network. |
`rcnnBoxRegressionLayer` | A box regression layer refines bounding box locations by using a smooth L1 loss function. Use this layer to create a Fast or Faster R-CNN object detection network. |
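The focal loss and box regression layers are built on standard loss formulas. This sketch compares focal loss with plain cross-entropy for an easy, a moderate, and a hard example, and evaluates smooth L1 loss on box-offset residuals. The values gamma = 2 and alpha = 0.25 are assumptions for illustration, not a statement of the defaults of `focalLossLayer`.

```matlab
% Predicted probability of the true class: easy, moderate, hard example.
p = [0.9 0.6 0.1];
gamma = 2;      % focusing parameter (assumed value)
alpha = 0.25;   % class-balancing weight (assumed value)

ce    = -log(p);                               % standard cross-entropy
focal = -alpha .* (1 - p).^gamma .* log(p);    % down-weights easy examples

% Smooth L1 on residuals d = prediction - target:
% quadratic for |d| < 1, linear for |d| >= 1.
smoothL1 = @(d) (abs(d) < 1) .* 0.5 .* d.^2 + (abs(d) >= 1) .* (abs(d) - 0.5);
lossBox  = smoothL1([0.5 -2 0]);               % [0.125 1.5 0]
```

The ratio `focal./ce` equals `alpha.*(1 - p).^gamma`, so a confidently classified example contributes far less to the loss than a misclassified one.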

#### Output Layers

Layer | Description |
---|---|
`softmaxLayer` | A softmax layer applies a softmax function to the input. |
`sigmoidLayer` | A sigmoid layer applies a sigmoid function to the input such that the output is bounded in the interval (0,1). |
`classificationLayer` | A classification layer computes the cross-entropy loss for classification and weighted classification tasks with mutually exclusive classes. |
`regressionLayer` | A regression layer computes the half-mean-squared-error loss for regression tasks. |
`pixelClassificationLayer` | A pixel classification layer provides a categorical label for each image pixel or voxel. |
`dicePixelClassificationLayer` | A Dice pixel classification layer provides a categorical label for each image pixel or voxel using generalized Dice loss. |
`focalLossLayer` | A focal loss layer predicts object classes using focal loss. |
`rpnSoftmaxLayer` | A region proposal network (RPN) softmax layer applies a softmax activation function to the input. Use this layer to create a Faster R-CNN object detection network. |
`rpnClassificationLayer` | A region proposal network (RPN) classification layer classifies image regions as either object or background by using a cross-entropy loss function. Use this layer to create a Faster R-CNN object detection network. |
`rcnnBoxRegressionLayer` | A box regression layer refines bounding box locations by using a smooth L1 loss function. Use this layer to create a Fast or Faster R-CNN object detection network. |
`yolov2OutputLayer` | An output layer of the you only look once version 2 (YOLO v2) network refines the bounding box locations by minimizing the mean squared error loss between the predicted locations and ground truth. |
`tverskyPixelClassificationLayer` | A Tversky pixel classification layer provides a categorical label for each image pixel or voxel using Tversky loss. |
`sseClassificationLayer` | A classification SSE layer computes the sum of squares error loss for classification problems. |
`maeRegressionLayer` | A regression MAE layer computes the mean absolute error loss for regression problems. |
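The softmax, sigmoid, classification, and regression layers compute standard formulas, sketched here in plain MATLAB with arbitrary scores and targets. The half-mean-squared-error line averages over all responses, which is an assumption about normalization rather than a description of exactly how `regressionLayer` scales its loss.

```matlab
% Scores for 3 classes (rows) and 2 observations (columns); values are arbitrary.
Z = [2.0 0.5; 1.0 0.5; 0.1 3.0];
T = [1 0; 0 0; 0 1];                  % one-hot targets

% Softmax: subtract each column's max for numerical stability.
E = exp(Z - max(Z, [], 1));
Y = E ./ sum(E, 1);                   % each column sums to 1

% Cross-entropy for mutually exclusive classes, averaged over observations.
N = size(Z, 2);
crossEntropy = -sum(T(:) .* log(Y(:))) / N;

% Sigmoid: output strictly between 0 and 1.
s = 1 ./ (1 + exp(-[-5 0 5]));

% Half-mean-squared-error for targets t and predictions y (assumed normalization).
t = [1.0 2.0 3.0];
y = [1.5 2.0 2.0];
hmse = 0.5 * mean((t - y).^2);
```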

## See Also

`trainingOptions` | `trainNetwork` | Deep Network Designer

## Related Topics

- Example Deep Learning Networks Architectures
- Build Networks with Deep Network Designer
- Specify Layers of Convolutional Neural Network
- Set Up Parameters and Train Convolutional Neural Network
- Define Custom Deep Learning Layers
- Create Simple Deep Learning Network for Classification
- Sequence Classification Using Deep Learning
- Pretrained Deep Neural Networks
- Deep Learning Tips and Tricks