Deep Learning Data Formats
Most deep learning networks and functions operate on different dimensions of the input data in different ways.
For example, an LSTM operation iterates over the time dimension of the input data and a batch normalization operation normalizes over the batch dimension of the input data.
Data can have many different types of layouts:
Data can have different numbers of dimensions. For example, you can represent image and video data as 4-D and 5-D arrays, respectively.
Dimensions of data can represent different things. For example, image data has two spatial dimensions, one channel dimension, and one batch dimension.
Data can have dimensions in multiple permutations. For example, a batch of sequences can be represented as a 3-D array with dimensions corresponding to channels, time steps, and observations. These dimensions can be in any order.
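To make the last point concrete, here is a small sketch that stores the same batch of sequences in two different dimension orders. The sizes (12 channels, 10 time steps, 32 observations) are made up for illustration. Nothing in the arrays themselves records which dimension is which.

```matlab
% Illustrative sizes only: 12 channels, 10 time steps, 32 observations.
X1 = rand(12,10,32);          % channels-by-time-by-observations

% Same data with observations second and time steps third.
X2 = permute(X1,[1 3 2]);     % channels-by-observations-by-time

sz1 = size(X1);               % [12 10 32]
sz2 = size(X2);               % [12 32 10]
```

Both arrays hold identical values, so the software needs extra layout information to know which dimension to treat as time.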
To ensure that the software operates on the correct dimensions, you can provide data layout information in different ways:
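| Option | Scenario | Usage |
| --- | --- | --- |
| Provide data with dimensions in a specific permutation | Network with an input layer and the data has the required layout | Pass data directly to network or function. |
| Provide data with labeled dimensions | Network with an input layer and the data does not have the required layout | Create a formatted dlarray object. |
| Provide data with labeled dimensions | Deep learning model defined as a function that uses multiple deep learning operations | Create a formatted dlarray object. |
| Provide data with labeled dimensions | Custom layer that uses multiple deep learning operations | Create a layer that inherits from nnet.layer.Formattable. |
| Provide data with additional layout information | Deep learning functions that require layout information and you want to preserve the layout of the data | Specify layout information using the appropriate input argument. For example, the DataFormat argument of the dlconv function. |
| Provide data with additional layout information | Model functions where dimensions change between functions. For example, when one function must treat the third dimension as time, and a second function must treat the third dimension as spatial. | Specify layout information using the appropriate input argument. For example, the DataFormat argument of the dlconv function. |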
To provide input data with labeled dimensions or input data with additional layout information, you can use data formats.
A data format is a string of characters, where each character describes the type of the corresponding dimension of the data.
The characters are:
"S" (spatial)
"C" (channel)
"B" (batch)
"T" (time)
"U" (unspecified)
For example, for an array containing a batch of sequences where the first, second, and
third dimensions correspond to channels, observations, and time steps, respectively, you can
specify that it has the format "CBT".
When you use dlnetwork objects with input layers or when you use the
trainnet function, if your data already has the layout required by
the network, then it is usually easiest to provide input data with the dimensions in the
permutation that the network requires. In this case, you can input your data directly and
not specify layout information. The required format depends on the type of input layer:
Feature input layer
2-D image input layer
3-D image input layer
Sequence input layer
When your data has a different layout, providing formatted data or data format information
is usually easier than reshaping and preprocessing your data. For example, if you have
sequence data, where the first, second, and third dimensions correspond to channels,
observations, and time steps, respectively, then it is usually easier to specify the string
"CBT" instead of permuting and preprocessing the data to have the
layout required by the software.
To create formatted input data, create a dlarray object
and specify the format using the second argument. For example, for an array
X that represents a batch of sequences, where the first, second, and
third dimensions correspond to channels, observations, and time steps, respectively, use:
X = dlarray(X,"CBT");
When you create a formatted dlarray object, the software automatically permutes the dimensions such that the format has dimensions in this order:
"S" (spatial)
"C" (channel)
"B" (batch)
"T" (time)
"U" (unspecified)
For example, if you specify a format of
"TCB" (time, channel, batch),
then the software automatically permutes the dimensions so that it has format
"CBT" (channel, batch, time).
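As a minimal sketch of this reordering, assuming Deep Learning Toolbox is available, the following creates a formatted dlarray from random data and inspects the result. The array sizes (10 time steps, 12 channels, 32 observations) and the variable names are made up for illustration.

```matlab
% Illustrative sizes only: 10 time steps, 12 channels, 32 observations.
data = rand(10,12,32);

% Label the dimensions as time, channel, batch.
X = dlarray(data,"TCB");

% The dimensions are reordered on creation, so the stored format
% and the size both follow the new order.
fmt = dims(X);   % 'CBT'
sz = size(X);    % [12 32 10]
```

Because the reordering happens when the object is created, index into X using the permuted order rather than the order of the original array.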
To provide additional layout information with unformatted data, specify the formats using
the appropriate input argument of the function. For example, to apply the
dlconv operation to an unformatted dlarray object
X that represents a batch of images, where the first two dimensions
correspond to the spatial dimensions and the third and fourth dimensions correspond to the
channel and batch dimensions, respectively, use:
Y = dlconv(X,weights,bias,DataFormat="SSCB");
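For a fuller picture, here is a hedged end-to-end sketch of the same call with concrete inputs, assuming Deep Learning Toolbox is available. The image size, filter count, and random values are placeholders chosen for illustration, not values from this page.

```matlab
% Illustrative sizes only: 16 three-channel images of size 28-by-28.
X = dlarray(rand(28,28,3,16));        % unformatted dlarray

% Eight 3-by-3 filters over 3 input channels, with one bias per filter.
weights = dlarray(rand(3,3,3,8));
bias = dlarray(zeros(8,1));

% Tell dlconv how to interpret the dimensions of X.
Y = dlconv(X,weights,bias,DataFormat="SSCB");

% Y stays unformatted, with dimensions in the same order as X.
outFmt = dims(Y);    % empty, because Y has no format
outSize = size(Y);   % [26 26 8 16] with the default stride and no padding
```

Passing the format as an argument leaves the layout of X untouched, which is useful when a later function needs to interpret the same dimensions differently.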