
Supported Networks, Layers, Boards, and Tools

Supported Pretrained Networks

Deep Learning HDL Toolbox™ supports code generation for series convolutional neural networks (CNNs or ConvNets). You can generate code for any trained CNN whose computational layers are supported for code generation. For a full list, see Supported Layers. You can use one of the pretrained networks listed in the table to generate code for your target Intel® or Xilinx® FPGA boards.

For each network, the entry lists its type, whether the shipping bitstreams support it for the single and INT8 data types on the ZCU102, ZC706, and Arria10 SoC boards, and its application area.

AlexNet

AlexNet convolutional neural network.

  • Type: Series network
  • Single and INT8 data types (with shipping bitstreams): No on ZCU102, ZC706, and Arria10 SoC. To use the bitstream, enable the LRNBlockGeneration property of the processor configuration for the bitstream and generate the bitstream again.
  • Application area: Classification

LogoNet

Logo recognition network (LogoNet) is a logo identification network developed in MATLAB®. For more information, see Logo Recognition Network.

  • Type: Series network
  • Single data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • INT8 data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • Application area: Classification

DigitsNet

Digit classification network. See Create Simple Deep Learning Neural Network for Classification.

  • Type: Series network
  • Single data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • INT8 data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • Application area: Classification

Lane detection

LaneNet convolutional neural network. For more information, see Deploy Transfer Learning Network for Lane Detection.

  • Type: Series network
  • Single and INT8 data types (with shipping bitstreams): No on ZCU102, ZC706, and Arria10 SoC. To use the bitstream, enable the LRNBlockGeneration property of the processor configuration for the bitstream and generate the bitstream again.
  • Application area: Classification

VGG-16

VGG-16 convolutional neural network. For the pretrained VGG-16 model, see vgg16.

  • Type: Series network
  • Single data type (with shipping bitstreams): ZCU102: No, the network exceeds the PL DDR memory size. ZC706: No, the network exceeds the FC module memory size. Arria10 SoC: Yes.
  • INT8 data type (with shipping bitstreams): ZCU102: Yes. ZC706: No, the network exceeds the FC module memory size. Arria10 SoC: Yes.
  • Application area: Classification

VGG-19

VGG-19 convolutional neural network. For the pretrained VGG-19 model, see vgg19.

  • Type: Series network
  • Single data type (with shipping bitstreams): ZCU102: No, the network exceeds the PL DDR memory size. ZC706: No, the network exceeds the FC module memory size. Arria10 SoC: Yes.
  • INT8 data type (with shipping bitstreams): ZCU102: Yes. ZC706: No, the network exceeds the FC module memory size. Arria10 SoC: Yes.
  • Application area: Classification

Darknet-19

Darknet-19 convolutional neural network. For the pretrained Darknet-19 model, see darknet19.

  • Type: Series network
  • Single data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • INT8 data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • Application area: Classification

Radar Classification

Convolutional neural network that uses micro-Doppler signatures to identify and classify objects. For more information, see Bicyclist and Pedestrian Classification by Using FPGA.

  • Type: Series network
  • Single data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • INT8 data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • Application area: Classification and Software Defined Radio (SDR)

Defect Detection snet_defnet

snet_defnet is a custom AlexNet network used to identify and classify defects. For more information, see Defect Detection.

  • Type: Series network
  • Single and INT8 data types (with shipping bitstreams): No on ZCU102, ZC706, and Arria10 SoC. To use the bitstream, enable the LRNBlockGeneration property of the processor configuration for the bitstream and generate the bitstream again.
  • Application area: Classification

Defect Detection snet_blemdetnet

snet_blemdetnet is a custom convolutional neural network used to identify and classify defects. For more information, see Defect Detection.

  • Type: Series network
  • Single and INT8 data types (with shipping bitstreams): No on ZCU102, ZC706, and Arria10 SoC. To use the bitstream, enable the LRNBlockGeneration property of the processor configuration for the bitstream and generate the bitstream again.
  • Application area: Classification

DarkNet-53

Darknet-53 convolutional neural network. For the pretrained DarkNet-53 model, see darknet53.

  • Type: Directed acyclic graph (DAG) network
  • Single data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • INT8 data type (with shipping bitstreams): ZCU102: Yes. ZC706: Yes. Arria10 SoC: No.
  • Application area: Classification

ResNet-18

ResNet-18 convolutional neural network. For the pretrained ResNet-18 model, see resnet18.

  • Type: Directed acyclic graph (DAG) network
  • Single data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • INT8 data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • Application area: Classification

ResNet-50

ResNet-50 convolutional neural network. For the pretrained ResNet-50 model, see resnet50.

  • Type: Directed acyclic graph (DAG) network
  • Single data type (with shipping bitstreams): ZCU102: No, the network exceeds the PL DDR memory size. ZC706: No, the network exceeds the PL DDR memory size. Arria10 SoC: Yes.
  • INT8 data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • Application area: Classification

ResNet-based YOLO v2

You only look once (YOLO) is an object detector that decodes the predictions from a convolutional neural network and generates bounding boxes around the objects. For more information, see Vehicle Detection Using DAG Network Based YOLO v2 Deployed to FPGA.

  • Type: Directed acyclic graph (DAG) network
  • Single data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • INT8 data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • Application area: Object detection

MobileNetV2

MobileNet-v2 convolutional neural network. For the pretrained MobileNet-v2 model, see mobilenetv2.

  • Type: Directed acyclic graph (DAG) network
  • Single data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • INT8 data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • Application area: Classification

GoogLeNet

GoogLeNet convolutional neural network. For the pretrained GoogLeNet model, see googlenet.

  • Type: Directed acyclic graph (DAG) network
  • Single and INT8 data types (with shipping bitstreams): No on ZCU102, ZC706, and Arria10 SoC. To use the bitstream, enable the LRNBlockGeneration property of the processor configuration for the bitstream and generate the bitstream again.
  • Application area: Classification

PoseNet

Human pose estimation network.

  • Type: Directed acyclic graph (DAG) network
  • Single data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • INT8 data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • Application area: Segmentation

U-Net

U-Net convolutional neural network designed for semantic image segmentation.

  • Type: Directed acyclic graph (DAG) network
  • Single data type (with shipping bitstreams): No on ZCU102, ZC706, and Arria10 SoC. The network exceeds the PL DDR memory size.
  • INT8 data type (with shipping bitstreams): ZCU102: No, the network exceeds the PL DDR memory size. ZC706: No, the network exceeds the PL DDR memory size. Arria10 SoC: Yes.
  • Application area: Segmentation

SqueezeNet-based YOLO v3

The you-only-look-once (YOLO) v3 object detector is a multi-scale object detection network that uses a feature extraction network and multiple detection heads to make predictions at multiple scales.

  • Type: dlnetwork object
  • Single data type (with shipping bitstreams): ZCU102: Yes. ZC706: Yes. Arria10 SoC: No.
  • INT8 data type (with shipping bitstreams): No on ZCU102, ZC706, and Arria10 SoC
  • Application area: Object detection

Sequence-to-sequence classification

Classify each time step of sequence data using a long short-term memory (LSTM) network. See Run Sequence-to-Sequence Classification on FPGAs by Using Deep Learning HDL Toolbox.

  • Type: Long short-term memory (LSTM) network
  • Single data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • INT8 data type (with shipping bitstreams): No on ZCU102, ZC706, and Arria10 SoC
  • Application area: Sequence data classification

Time series forecasting

Forecast time series data using a long short-term memory (LSTM) network. See Run Sequence Forecasting on FPGA by Using Deep Learning HDL Toolbox.

  • Type: Long short-term memory (LSTM) network
  • Single data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • INT8 data type (with shipping bitstreams): No on ZCU102, ZC706, and Arria10 SoC
  • Application area: Forecast time series data

Word-by-word text generation

Generate text word by word by using a long short-term memory (LSTM) network. See Generate Word-By-Word Text on FPGAs by Using Deep Learning HDL Toolbox.

  • Type: Long short-term memory (LSTM) network
  • Single data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • INT8 data type (with shipping bitstreams): No on ZCU102, ZC706, and Arria10 SoC
  • Application area: Sequence data prediction

YAMNet

Pretrained audio classification network. See yamnet (Audio Toolbox) and Deploy YAMNet Networks to FPGAs with and without Cross-Layer Equalization.

  • Type: Series network
  • Single data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • INT8 data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • Application area: Audio data classification

Semantic Segmentation Using Dilated Convolutions

Semantic segmentation network that uses dilated convolution layers to increase the coverage area without increasing the number of computational parameters. See Deploy Semantic Segmentation Network Using Dilated Convolutions on FPGA.

  • Type: Series network
  • Single data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • INT8 data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • Application area: Segmentation

Time series forecasting

Forecast time series data using a gated recurrent unit (GRU) network. See Run Sequence Forecasting Using a GRU Layer on an FPGA.

  • Type: Gated recurrent unit (GRU) layer network
  • Single data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • INT8 data type (with shipping bitstreams): No on ZCU102, ZC706, and Arria10 SoC
  • Application area: Forecast time series data

Pruned image classification network

Pruned image classification network. See Deploy Image Recognition Network on FPGA with and Without Pruning.

  • Type: Series network
  • Single data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • INT8 data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • Application area: Image classification

Very-deep super-resolution (VDSR) network

Create high-resolution images from low-resolution images by using VDSR networks. See Increase Image Resolution Using VDSR Network Running on FPGA.

  • Type: Series network
  • Single data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • INT8 data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • Application area: Image processing

YOLO v4 tiny

The you only look once version 4 (YOLO v4) object detection network is a one-stage object detection network composed of three parts: backbone, neck, and head. See Detect Objects Using YOLOv4-tiny Network Deployed to FPGA.

  • Type: dlnetwork object
  • Single data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • INT8 data type (with shipping bitstreams): Yes on ZCU102, ZC706, and Arria10 SoC
  • Application area: Object detection

Supported Layers

Deep Learning HDL Toolbox supports the layers listed in these tables.

Input Layers

Layer | Layer Type: Hardware (HW) or Software (SW) | Description and Limitations | INT8 Compatible

imageInputLayer

SW

An image input layer inputs 2-D images to a network and applies data normalization. The zerocenter and zscore normalization options can run on hardware if the HardwareNormalization argument of the compile method is enabled and the input data is of single data type. If the HardwareNormalization option is not enabled or the input data type is int8, the normalization runs in software. Normalization specified using a function handle is not supported. See Image Input Layer Normalization Hardware Implementation. When the Normalization property is set to none, you cannot use the activations function with the imageInputLayer.

Yes. Runs as single datatype in SW.

featureInputLayer

SW

A feature input layer inputs feature data to a network and applies data normalization.

Yes

sequenceInputLayer

SW

A sequence input layer inputs sequence data to a network.

Yes

wordEmbeddingLayer (Text Analytics Toolbox)

SW

A word embedding layer maps word indices to vectors.

No

Convolution and Fully Connected Layers

Layer | Layer Type: Hardware (HW) or Software (SW) | Layer Output Format | Description and Limitations | INT8 Compatible

convolution2dLayer

HW

Convolution (Conv)

A 2-D convolutional layer applies sliding convolutional filters to the input.

When generating code for a network using this layer, these limitations apply:

  • Filter size must be 1-201.

  • Stride size must be 1-100 and square.

  • Padding size must be in the range 0-100.

  • Dilation factor must be square and at most [16 16].

  • Padding value is not supported.

  • When the dilation factor is a multiple of three, the dilated filter size, (FilterSize - 1)*DilationFactor + 1, must not exceed the maximum filter size of 201. For all other dilation factors, the filter size can be as large as the maximum filter size of 201.

Yes

groupedConvolution2dLayer

HW

Convolution (Conv)

A 2-D grouped convolutional layer separates the input channels into groups and applies sliding convolutional filters. Use grouped convolutional layers for channel-wise separable (also known as depth-wise separable) convolution.

Code generation is supported for a 2-D grouped convolution layer that has the NumGroups property set to 'channel-wise'.

When generating code for a network using this layer, these limitations apply:

  • Filter size must be 1-201 and square, for example, [1 1] or [14 14]. When NumGroups is set to 'channel-wise', the filter size must be 3-14.

  • Stride size must be 1-100 and square.

  • Padding size must be in the range 0-100.

  • Dilation factor must be [1 1].

  • When NumGroups is not set to 'channel-wise', the number of groups must be 1 or 2.

  • The input feature number must be greater than a single multiple of the square root of the ConvThreadNumber.

  • When NumGroups is not set to 'channel-wise', the number of filters per group must be a multiple of the square root of the ConvThreadNumber.

Yes

transposedConv2dLayer

HW

Convolution (Conv)

A transposed 2-D convolution layer upsamples feature maps.

When generating code for a network using this layer, these limitations apply:

  • Filter size must be 1-64 and square.

  • Stride size must be 1-64 and square.

  • Padding size must be in the range 0-8.

  • Padding value is not supported.

Yes

fullyConnectedLayer

HW

Fully Connected (FC)

A fully connected layer multiplies the input by a weight matrix, and then adds a bias vector.

When generating code for a network using this layer, these limitations apply:

Yes
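To check a layer against the convolution2dLayer limits listed above before compiling, the rules can be written as simple range checks. The following Python sketch is illustrative only and is not part of Deep Learning HDL Toolbox: Conv2dConfig and conv2d_violations are hypothetical names, and the handling of dilation factors that are multiples of three assumes the dilated filter size is (FilterSize - 1)*DilationFactor + 1 and must stay within 201.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Documented convolution2dLayer limits.
MAX_FILTER = 201
MAX_STRIDE = 100
MAX_PADDING = 100
MAX_DILATION = 16


@dataclass
class Conv2dConfig:
    """Hypothetical description of one convolution layer's geometry."""
    filter_size: Tuple[int, int]        # (height, width)
    stride: Tuple[int, int]             # (height, width)
    padding: Tuple[int, int, int, int]  # (top, bottom, left, right)
    dilation: Tuple[int, int]           # (height, width)


def dilated_filter_size(filter_size: int, dilation: int) -> int:
    """Extent covered by a dilated filter: (FilterSize - 1) * DilationFactor + 1."""
    return (filter_size - 1) * dilation + 1


def conv2d_violations(cfg: Conv2dConfig) -> List[str]:
    """Return the documented limits that cfg breaks (an empty list means none)."""
    problems = []
    if any(not 1 <= f <= MAX_FILTER for f in cfg.filter_size):
        problems.append("filter size must be 1-201")
    if cfg.stride[0] != cfg.stride[1] or not 1 <= cfg.stride[0] <= MAX_STRIDE:
        problems.append("stride must be 1-100 and square")
    if any(not 0 <= p <= MAX_PADDING for p in cfg.padding):
        problems.append("padding must be 0-100")
    d_h, d_w = cfg.dilation
    if d_h != d_w or not 1 <= d_h <= MAX_DILATION:
        problems.append("dilation must be square and at most [16 16]")
    elif d_h % 3 == 0 and any(
        dilated_filter_size(f, d_h) > MAX_FILTER for f in cfg.filter_size
    ):
        # Assumed reading of the rule: for multiples of three, the dilated
        # extent of the filter is also capped at the 201 filter size limit.
        problems.append("dilated filter size must not exceed 201")
    return problems


if __name__ == "__main__":
    ok = Conv2dConfig((3, 3), (1, 1), (1, 1, 1, 1), (2, 2))
    bad = Conv2dConfig((101, 101), (2, 1), (0, 0, 0, 0), (3, 3))
    print(conv2d_violations(ok))  # []
    print(conv2d_violations(bad))
```

In the second configuration, the non-square stride is rejected, and a dilation factor of 3 stretches a 101-wide filter to (101 - 1)*3 + 1 = 301, which exceeds 201.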

Activation Layers

Layer | Layer Type: Hardware (HW) or Software (SW) | Layer Output Format | Description and Limitations | INT8 Compatible

reluLayer

HW

Layer is fused.

A ReLU layer performs a threshold operation to each element of the input where any value less than zero is set to zero.

A ReLU layer is supported only when it is preceded by any of these layers:

  • Convolution

  • Fully Connected

  • Adder

Yes

leakyReluLayer

HW

Layer is fused.

A leaky ReLU layer performs a threshold operation where any input value less than zero is multiplied by a fixed scalar.

A leaky ReLU layer is supported only when it is preceded by any of these layers:

  • Convolution

  • Fully Connected

  • Adder

Yes

clippedReluLayer

HW

Layer is fused.

A clipped ReLU layer performs a threshold operation where any input value less than zero is set to zero and any value above the clipping ceiling is set to that clipping ceiling value.

A clipped ReLU layer is supported only when it is preceded by any of these layers:

  • Convolution

  • Fully Connected

  • Adder

Yes

tanhLayer

HW

Inherit from input

A hyperbolic tangent (tanh) activation layer applies the tanh function on the layer inputs.

Yes. Runs as single datatype in HW.

swishLayer

HW

Inherit from input

A swish layer applies the swish activation function on layer inputs.

No

dlhdl.layer.mishLayer

HW

Inherit from input

A mish layer applies the mish activation function on layer inputs.

No
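The activation layers above compute simple element-wise functions. This plain-Python sketch restates them for reference only: it does not model layer fusion or hardware data types, the default leaky ReLU scale of 0.01 is an assumption for the example, and the swish and mish formulas are the standard definitions, x*sigmoid(x) and x*tanh(softplus(x)).

```python
import math


def relu(x: float) -> float:
    """ReLU: any value less than zero is set to zero."""
    return max(0.0, x)


def leaky_relu(x: float, scale: float = 0.01) -> float:
    """Leaky ReLU: negative values are multiplied by a fixed scalar (0.01 assumed)."""
    return x if x >= 0 else scale * x


def clipped_relu(x: float, ceiling: float) -> float:
    """Clipped ReLU: negatives become zero; values above the ceiling become the ceiling."""
    return min(max(0.0, x), ceiling)


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x: float) -> float:
    """Swish: x * sigmoid(x)."""
    return x * sigmoid(x)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    for v in (-2.0, 0.5, 8.0):
        # tanh is taken directly from the standard library.
        print(v, relu(v), leaky_relu(v), clipped_relu(v, 6.0),
              math.tanh(v), swish(v), mish(v))
```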

Normalization, Dropout, and Cropping Layers

Layer | Layer Type: Hardware (HW) or Software (SW) | Layer Output Format | Description and Limitations | INT8 Compatible

batchNormalizationLayer

HW

Layer is fused.

A batch normalization layer normalizes each input channel across a mini-batch.

A batch normalization layer is supported when preceded by an image input layer, convolution layer, or as a standalone layer.

Yes

crossChannelNormalizationLayer

HW

Convolution (Conv)

A channel-wise local response (cross-channel) normalization layer carries out channel-wise normalization.

The WindowChannelSize must be in the range of 3-9 for code generation.

Yes. Runs as single datatype in HW.

dropoutLayer

NoOP on inference

NoOP on inference

A dropout layer randomly sets input elements to zero with a given probability.

Yes

resize2dLayer (Image Processing Toolbox)

HW

Inherit from input

A 2-D resize layer resizes 2-D input by a scale factor, to a specified height and width, or to the size of a reference input feature map.

When generating code for a network using this layer, these limitations apply:

  • The Method property must be set to nearest.

  • The GeometricTransformationMode property must be set to half-pixel.

  • The NearestRoundingMode property must be set to round.

  • The ratio of the output size to the input size must be an integer in the range 2-256.

The resize2dLayer is not supported for Intel FPGA and SoC boards.

Yes
crop2dLayer

HW

Inherit from input

A 2-D crop layer applies 2-D cropping to the input.

When generating code for a network using this layer, these limitations apply:

  • The input to the layer can have at most 512 rows and at most 512 columns.

  • The output of the layer must have at least three columns.

Yes
dlhdl.layer.sliceLayer

HW

A slice layer divides the input to the layer into groups of equal size along the channel dimension of the image.

When generating code for a network using this layer, these limitations apply:

  • Number of input channels to the layer must be a multiple of the Groups property of the layer.

  • The group size should be a multiple of the convolution thread number.

  • The slice layer should not be followed by a custom layer.

Yes
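Layers marked "Layer is fused" do not run as a separate step. For batch normalization, a common fusion technique is to fold the per-channel statistics into the weights and bias of the preceding convolution. The sketch below shows that arithmetic in plain Python on a single flattened input patch; it assumes this standard folding scheme and an epsilon of 1e-5, the helper names are hypothetical, and it does not describe how Deep Learning HDL Toolbox implements fusion internally.

```python
import math
from typing import List, Tuple


def fold_batchnorm(weights: List[List[float]], bias: List[float],
                   mean: List[float], var: List[float],
                   gamma: List[float], beta: List[float],
                   eps: float = 1e-5) -> Tuple[List[List[float]], List[float]]:
    """Fold per-output-channel batch normalization into conv weights and bias.

    For channel c, with s = gamma[c] / sqrt(var[c] + eps):
        w'[c] = w[c] * s
        b'[c] = (bias[c] - mean[c]) * s + beta[c]
    """
    folded_w, folded_b = [], []
    for w_c, b_c, m, v, g, bt in zip(weights, bias, mean, var, gamma, beta):
        s = g / math.sqrt(v + eps)
        folded_w.append([w * s for w in w_c])
        folded_b.append((b_c - m) * s + bt)
    return folded_w, folded_b


def conv_at(weights, bias, patch):
    """Convolution output of every channel for one flattened input patch."""
    return [sum(w * p for w, p in zip(w_c, patch)) + b_c
            for w_c, b_c in zip(weights, bias)]


def batchnorm(y, mean, var, gamma, beta, eps=1e-5):
    """Inference-time batch normalization of per-channel values y."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, m, v, g, bt in zip(y, mean, var, gamma, beta)]


if __name__ == "__main__":
    W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
    b = [0.1, -0.2]
    stats = dict(mean=[0.3, -0.4], var=[2.0, 0.5], gamma=[1.2, 0.8], beta=[0.05, -0.1])
    patch = [1.0, 2.0, -0.5]
    unfused = batchnorm(conv_at(W, b, patch), **stats)
    Wf, bf = fold_batchnorm(W, b, **stats)
    fused = conv_at(Wf, bf, patch)
    print(unfused, fused)  # equal up to floating-point rounding
```

Because the folded layer produces the same result as convolution followed by normalization, the normalization costs nothing extra at inference time.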

Pooling and Unpooling Layers

Layer | Layer Type: Hardware (HW) or Software (SW) | Layer Output Format | Description and Limitations | INT8 Compatible

maxPooling2dLayer

HW

Convolution (Conv)

A max pooling layer performs downsampling by dividing the layer input into rectangular pooling regions and computing the maximum of each region.

When generating code for a network using this layer, these limitations apply:

  • Pool size must be 1-201.

  • Stride size must be 1-100 and square.

  • Padding size must be in the range 0-100.

HasUnpoolingOutputs is supported. When this parameter is enabled, these limitations apply for code generation for this layer:

  • Pool size must be 2-by-2 or 3-by-3.

  • The stride size must be the same as the filter size.

  • Padding size is not supported.

  • Pool size and stride size must be square. For example, [2 2].

Yes

No, when HasUnpoolingOutputs is enabled.

maxUnpooling2dLayer

HW

Convolution (Conv)

A max unpooling layer unpools the output of a max pooling layer.

No

averagePooling2dLayer

HW

Convolution (Conv)

An average pooling layer performs downsampling by dividing the layer input into rectangular pooling regions and computing the average values of each region.

When generating code for a network using this layer, these limitations apply:

  • Pool size must be 1-201.

  • Stride size must be 1-100 and square.

  • Padding size must be in the range 0-100.

Yes

globalAveragePooling2dLayer

HW

Convolution (Conv)

A global average pooling layer performs downsampling by computing the mean of the height and width dimensions of the input.

When generating code for a network using this layer, these limitations apply:

  • The pool size must be 1-201 and square.

Yes
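When HasUnpoolingOutputs is enabled, the max pooling layer also outputs the positions of its maxima so that a max unpooling layer can put each value back where it came from. The following plain-Python sketch, for a single channel with a square pool of size 2 or 3, stride equal to the pool size, and no padding, illustrates that pairing. The function names and the (row, col) index format are hypothetical and do not mirror the toolbox's internal representation.

```python
from typing import List, Tuple

Index = Tuple[int, int]


def max_pool_with_indices(x: List[List[float]], pool: int = 2
                          ) -> Tuple[List[List[float]], List[List[Index]]]:
    """Max pooling with a square pool, stride equal to the pool size, no padding.

    Returns the pooled values and the (row, col) position in x of each maximum.
    """
    if pool not in (2, 3):
        raise ValueError("with unpooling outputs, pool size must be 2 or 3")
    out_rows, out_cols = len(x) // pool, len(x[0]) // pool
    pooled, indices = [], []
    for i in range(out_rows):
        vals, locs = [], []
        for j in range(out_cols):
            window = [(i * pool + di, j * pool + dj)
                      for di in range(pool) for dj in range(pool)]
            # max() keeps the first position on ties.
            r, c = max(window, key=lambda rc: x[rc[0]][rc[1]])
            vals.append(x[r][c])
            locs.append((r, c))
        pooled.append(vals)
        indices.append(locs)
    return pooled, indices


def max_unpool(pooled: List[List[float]], indices: List[List[Index]],
               out_size: Tuple[int, int]) -> List[List[float]]:
    """Scatter pooled values back to their recorded positions; zeros elsewhere."""
    rows, cols = out_size
    out = [[0.0] * cols for _ in range(rows)]
    for vals, locs in zip(pooled, indices):
        for v, (r, c) in zip(vals, locs):
            out[r][c] = v
    return out


if __name__ == "__main__":
    x = [[1, 5, 2, 0],
         [3, 4, 8, 1],
         [0, 2, 1, 1],
         [7, 6, 3, 9]]
    pooled, idx = max_pool_with_indices(x, pool=2)
    print(pooled)  # [[5, 8], [7, 9]]
    print(max_unpool(pooled, idx, (4, 4)))
```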

Combination Layers

Layer | Layer Type: Hardware (HW) or Software (SW) | Layer Output Format | Description and Limitations | INT8 Compatible

additionLayer

HW

Inherit from input.

An addition layer adds inputs from multiple neural network layers element-wise.

You can generate code for this layer with the int8 data type when the layer is combined with a leaky ReLU or clipped ReLU layer.

When generating code for a network using this layer, these limitations apply:

  • Both input layers must have the same output layer format. For example, both layers must have the conv output format, or both must have the FC output format.

Yes

depthConcatenationLayer

HW

Inherit from input.

A depth concatenation layer takes inputs that have the same height and width and concatenates them along the third dimension (the channel dimension).

When generating code for a network using this layer, these limitations apply:

  • The input activation feature number must be a multiple of the square root of the ConvThreadNumber.

  • Layers that have a conv output format and layers that have an FC output format cannot be concatenated together.

Yes

multiplicationLayer

HW

Inherit from input

A multiplication layer multiplies inputs from multiple neural network layers element-wise.

Yes
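The depth concatenation constraints can be checked with a few lines of code. In this plain-Python sketch, feature maps are nested lists indexed [row][col][channel], each input's channel count stands in for the "input activation feature number", and a ConvThreadNumber of 16 is assumed for the default; depth_concat is a hypothetical helper, not a toolbox function.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # [row][col][channel]


def depth_concat(inputs: List[FeatureMap], conv_thread_number: int = 16) -> FeatureMap:
    """Concatenate feature maps along the channel (third) dimension.

    Checks that all inputs share height and width and that each input's
    channel count is a multiple of sqrt(conv_thread_number).
    """
    root = math.isqrt(conv_thread_number)
    if root * root != conv_thread_number:
        # Assumption for this sketch: the thread number is a perfect square.
        raise ValueError("conv_thread_number must be a perfect square")
    height, width = len(inputs[0]), len(inputs[0][0])
    for x in inputs:
        if len(x) != height or len(x[0]) != width:
            raise ValueError("inputs must have the same height and width")
        if len(x[0][0]) % root != 0:
            raise ValueError(f"channel count must be a multiple of {root}")
    return [[[v for x in inputs for v in x[r][c]] for c in range(width)]
            for r in range(height)]


if __name__ == "__main__":
    a = [[[1, 2, 3, 4]]]  # 1-by-1 map with 4 channels
    b = [[[5, 6, 7, 8]]]
    print(depth_concat([a, b]))  # [[[1, 2, 3, 4, 5, 6, 7, 8]]]
```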

Sequence Layers

Layer | Layer Type: Hardware (HW) or Software (SW) | Description and Limitations | INT8 Compatible

lstmLayer

HW

An LSTM layer learns long-term dependencies between time steps in time series and sequence data. The layer performs additive interactions, which can help improve gradient flow over long sequences during training.

When generating code for a network using this layer, these limitations apply:

  • The input must be of single data type.

  • The OutputMode property must be set to sequence.

No

gruLayer

HW

A GRU layer is an RNN layer that learns dependencies between time steps in time series and sequence data.

When generating code for a network using this layer, these limitations apply:

  • Inputs must be of single data type.

  • You must set the GRU layer OutputMode to sequence.

No
lstmProjectedLayer

HW

A projected LSTM layer is a type of deep learning layer that enables compression by reducing the number of stored learnable parameters.

When generating code for a network using this layer, these limitations apply:

  • The input must be of single data type.

  • The OutputMode property must be set to sequence.

No
gruProjectedLayer

HW

A projected GRU layer is a type of deep learning layer that enables compression by reducing the number of stored learnable parameters.

When generating code for a network using this layer, these limitations apply:

  • Inputs must be of single data type.

  • The OutputMode property must be set to sequence.

No
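Setting OutputMode to sequence means the layer returns its hidden state at every time step rather than only at the last one. The following single-unit LSTM in plain Python is a hedged illustration of the standard LSTM recurrence, where the cell update c = f*c + i*g is the additive interaction mentioned above. It is not the toolbox's hardware implementation, and the parameter dictionary layout is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Gate = Tuple[float, float, float]  # (input weight, recurrent weight, bias)


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def lstm_sequence(xs: List[float], params: Dict[str, Gate],
                  h0: float = 0.0, c0: float = 0.0) -> List[float]:
    """Single-unit LSTM over a scalar sequence.

    Returns the hidden state at every time step, which corresponds to
    OutputMode 'sequence'; OutputMode 'last' would keep only outputs[-1].
    """
    def pre(gate: str, x: float, h: float) -> float:
        w, r, b = params[gate]
        return w * x + r * h + b

    h, c = h0, c0
    outputs = []
    for x in xs:
        i = sigmoid(pre("input", x, h))    # input gate
        f = sigmoid(pre("forget", x, h))   # forget gate
        g = math.tanh(pre("cell", x, h))   # candidate cell value
        o = sigmoid(pre("output", x, h))   # output gate
        c = f * c + i * g                  # additive cell-state update
        h = o * math.tanh(c)
        outputs.append(h)
    return outputs


if __name__ == "__main__":
    params = {
        "input": (0.5, 0.1, 0.0),
        "forget": (0.2, 0.3, 1.0),
        "cell": (1.0, -0.4, 0.0),
        "output": (0.6, 0.2, 0.1),
    }
    seq = lstm_sequence([0.5, -1.0, 2.0], params)
    print(len(seq), seq)  # one hidden state per time step
```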

Output Layer

Layer | Layer Type: Hardware (HW) or Software (SW) | Description and Limitations | INT8 Compatible

softmaxLayer

SW and HW

A softmax layer applies a softmax function to the input.

If the softmax layer is implemented in hardware:

  • The inputs must be in the range -87 to 88.

  • A softmax layer followed by an adder layer or a depth concatenation layer is not supported.

  • The inputs to this layer must have one of these formats: 1-by-N, N-by-1, 1-by-1-by-N, N-by-1-by-1, or 1-by-N-by-1.

  • If the convolution module of the deep learning processor is enabled, the square root of the convolution thread number must be an integral power of two. If not, the layer is implemented in software.

Yes. Runs as single datatype in SW.

classificationLayer

SW

A classification layer computes the cross-entropy loss for multiclass classification problems that have mutually exclusive classes.

Yes

regressionLayer

SW

A regression layer computes the half mean squared error loss for regression problems.

Yes

sigmoidLayer

HW

A sigmoid layer applies a sigmoid function to the input.

The sigmoid layer is implemented in the custom module of the deep learning processor configuration and runs as single datatype in HW.

Yes. Runs as single datatype in HW.
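The following sketch evaluates softmax directly and applies the hardware-related checks listed for the softmax layer: the input range of -87 to 88, and the requirement that the square root of the convolution thread number be an integral power of two. One plausible reason for that input range, offered here only as an assumption, is that exp(x) stays a finite, normal single-precision value for x in [-87, 88]: exp(88) is about 1.65e38, below the single-precision maximum of about 3.4e38, and exp(-87) is about 1.6e-38, above the smallest normal value of about 1.18e-38. The function names are hypothetical.

```python
import math
from typing import List

HW_SOFTMAX_MIN = -87.0  # documented lower input bound for hardware softmax
HW_SOFTMAX_MAX = 88.0   # documented upper input bound


def hw_softmax(xs: List[float]) -> List[float]:
    """Softmax exp(x_i) / sum_j exp(x_j) after checking the documented input range.

    No max-subtraction is applied; the range check is what keeps exp() well behaved
    (assumption: in single precision, see the note above).
    """
    if any(not HW_SOFTMAX_MIN <= x <= HW_SOFTMAX_MAX for x in xs):
        raise ValueError("softmax inputs must be in the range -87 to 88")
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_fits_hw(conv_thread_number: int) -> bool:
    """True if sqrt(ConvThreadNumber) is an integer power of two."""
    root = math.isqrt(conv_thread_number)
    return (root > 0 and root * root == conv_thread_number
            and root & (root - 1) == 0)


if __name__ == "__main__":
    print(hw_softmax([1.0, 2.0, 3.0]))
    print(softmax_fits_hw(16), softmax_fits_hw(9))  # True False
```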

Keras and ONNX Layers

Layer | Layer Type: Hardware (HW) or Software (SW) | Layer Output Format | Description and Limitations | INT8 Compatible

nnet.keras.layer.FlattenCStyleLayer

HW

Layer is fused.

Flatten activations into 1-D layers assuming C-style (row-major) order.

A nnet.keras.layer.FlattenCStyleLayer is supported only when it is followed by a fully connected layer.

Yes

nnet.keras.layer.ZeroPadding2dLayer

HW

Layer is fused.

Zero padding layer for 2-D input.

A nnet.keras.layer.ZeroPadding2dLayer is supported only when it is followed by a convolution layer, maxpool layer, or a grouped convolution layer.

Yes

nnet.onnx.layer.FlattenInto2dLayer

HW

Layer is fused.

Flattens a MATLAB 2D image batch in the way ONNX does, producing a 2D output array with CB format.

A nnet.onnx.layer.FlattenInto2dLayer layer is fused with the following fully connected layer.

Yes
nnet.onnx.layer.FlattenLayer

HW

Layer is fused.

Flatten layer for ONNX™ network.

A nnet.onnx.layer.FlattenLayer layer must be followed by a fully connected layer or a depth concatenation layer.

If the layer following the nnet.onnx.layer.FlattenLayer layer is a depth concatenation layer:

  • All inputs to the depth concatenation layer must flatten to layers of the same type.

  • The layer following the depth concatenation layer must be a fully connected layer.

  • Flatten layers that are inputs to depth concatenation layers must not share inputs.

Yes
flattenLayer

HW

Layer is fused.

A flatten layer collapses the spatial dimensions of the input into the channel dimension.

A flatten layer should be followed by a fully connected layer.

No
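nnet.keras.layer.FlattenCStyleLayer flattens in C-style (row-major) order, whereas MATLAB stores arrays in column-major order, so the same activations produce differently ordered vectors. The difference is easiest to see on a small array. This plain-Python sketch is illustrative only; the [row][col][channel] nesting is an assumption for the example, and flatten is a hypothetical helper, not part of any toolbox API.

```python
from typing import List


def flatten(x: List[List[List[float]]], order: str = "C") -> List[float]:
    """Flatten a [row][col][channel] nested list.

    order="C": row-major; the last index (channel) varies fastest.
    order="F": column-major; the first index (row) varies fastest.
    """
    rows, cols, chans = len(x), len(x[0]), len(x[0][0])
    if order == "C":
        return [x[r][c][k] for r in range(rows) for c in range(cols) for k in range(chans)]
    if order == "F":
        return [x[r][c][k] for k in range(chans) for c in range(cols) for r in range(rows)]
    raise ValueError("order must be 'C' or 'F'")


if __name__ == "__main__":
    x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # 2 rows, 2 columns, 2 channels
    print(flatten(x, "C"))  # [1, 2, 3, 4, 5, 6, 7, 8]
    print(flatten(x, "F"))  # [1, 5, 3, 7, 2, 6, 4, 8]
```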

Custom Layers

Layer | Layer Type: Hardware (HW) or Software (SW) | Layer Output Format | Description and Limitations | INT8 Compatible

Custom Layers

HW

Inherit from input

Custom layers, with or without learnable parameters, that you define for your problem. To learn how to define your custom deep learning layers, see Create Deep Learning Processor Configuration for Custom Layers.

No

Supported Boards

These boards are supported by Deep Learning HDL Toolbox:

Third-Party Synthesis Tools and Version Support

Deep Learning HDL Toolbox has been tested with:

  • Xilinx Vivado® Design Suite 2023.1

  • Intel Quartus® Prime Standard 22.1.1

Image Input Layer Normalization Hardware Implementation

To enable hardware implementation of the normalization functions for the image input layer, set the HardwareNormalization argument of the compile method to auto or on. When HardwareNormalization is set to auto, the compile method looks for the presence of addition and multiplication layers to implement the normalization function on hardware. The normalization is implemented on hardware by:

  • Creating a new constant layer. This layer holds the value to be subtracted.

  • Using existing addition and multiplication layers. The layers used depend on the normalization function being implemented.

Constant Layer Buffer Content

This table describes the value stored in the constant layer buffer.

Normalization Function | Number of Constants | Constant Layer Buffer Value
zerocenter | 1 | -Mean
zscore | 2 | The first constant value is -Mean. The second constant value is 1/StandardDeviation.
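The constant layer table implies that hardware normalization reduces to one element-wise addition followed, for zscore only, by one element-wise multiplication: (x + (-Mean)) * (1/StandardDeviation) = (x - Mean)/StandardDeviation. This plain-Python sketch uses a single scalar mean and standard deviation for brevity (an assumption for the example), and both function names are hypothetical.

```python
from typing import List, Optional


def normalization_constants(function: str, mean: float,
                            std: Optional[float] = None) -> List[float]:
    """Constant layer buffer contents, following the table above."""
    if function == "zerocenter":
        return [-mean]
    if function == "zscore":
        if std is None:
            raise ValueError("zscore needs a standard deviation")
        return [-mean, 1.0 / std]
    raise ValueError(f"unsupported normalization: {function}")


def apply_hw_normalization(x: List[float], constants: List[float]) -> List[float]:
    """Add the first constant, then multiply by the second one if present."""
    out = [v + constants[0] for v in x]        # addition layer
    if len(constants) == 2:
        out = [v * constants[1] for v in out]  # multiplication layer (zscore only)
    return out


if __name__ == "__main__":
    pixels = [10.0, 20.0, 30.0]
    print(apply_hw_normalization(pixels, normalization_constants("zerocenter", 20.0)))
    print(apply_hw_normalization(pixels, normalization_constants("zscore", 20.0, 10.0)))
```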
