Resource Sharing Guidelines for Vector Processing and Matrix Multiplication
Resource sharing is an area optimization in which HDL Coder™ identifies multiple functionally equivalent resources and replaces them with a single resource. The data is time-multiplexed over the shared resource to perform the same operations. To learn more about how resource sharing works, see Resource Sharing.
You can follow these guidelines to learn how to use resource sharing with streaming when processing 1-D vectors and 2-D matrices. Each guideline has a severity level that indicates the level of compliance requirements. To learn more, see HDL Modeling Guidelines Severity Levels.
Use StreamingFactor for Resource Sharing of Vector Signals
Guideline ID
3.1.9
Severity
Informative
Description
To reduce circuit area of a Subsystem block that performs the same computation on each element of a 1-D vector, use the Subsystem HDL block property StreamingFactor. For a vector signal that has N elements, set StreamingFactor to N. By using time-division multiplexing to process each element, you can compute the result with a smaller number of operations. The clock frequency of the operators becomes N times faster than that of the original model.
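For example, assuming a hypothetical subsystem myModel/DUT that processes a 10-element vector, you can set the StreamingFactor HDL block property from the command line by using hdlset_param:
hdlset_param('myModel/DUT', 'StreamingFactor', 10)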
When the subsystem containing resources to be shared uses multiple vector signals with different sizes, the clock frequency is multiplied by the least common multiple of the vector sizes, which can reduce the maximum achievable target frequency. To achieve the desired frequency:
- Add logic for demultiplexing the vector signal before it enters the subsystem and for multiplexing the signal that leaves the subsystem. You can then specify a SharingFactor on the subsystem instead of the StreamingFactor, as shown in the sketch after this list.
- Pad the smaller vector signals to match the size of the largest vector signal, and then specify the StreamingFactor.
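As a sketch of the first option, assuming a hypothetical subsystem myModel/DUT that already has the demultiplexing and multiplexing logic around it, you can specify the SharingFactor instead of the StreamingFactor:
hdlset_param('myModel/DUT', 'SharingFactor', 10)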
Open the model hdlcoder_vector_stream_gain.
open_system('hdlcoder_vector_stream_gain')
set_param('hdlcoder_vector_stream_gain', 'SimulationCommand', 'Update')

The model accepts a 10-element vector signal as input and multiplies each element by a gain value that is one more than the previous value.
open_system('hdlcoder_vector_stream_gain/Gain_Stream')

To see the simulation results, simulate the model and open the Scope block.
sim('hdlcoder_vector_stream_gain')
open_system('hdlcoder_vector_stream_gain/Show Processing Time')

The Gain_Stream subsystem has a StreamingFactor set to 10. To generate HDL code for this subsystem, run the makehdl function:
makehdl('hdlcoder_vector_stream_gain/Gain_Stream')
After generating HDL code, to see the effect of the streaming optimization, open the generated model and navigate inside the Gain_Stream subsystem.
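By default, HDL Coder prefixes the name of the generated model with gm_. Assuming the default prefix, you can open the streamed subsystem in the generated model from the command line:
open_system('gm_hdlcoder_vector_stream_gain/Gain_Stream')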

The vector data is serialized on the input side, and the output side parallelizes the serial data back into a vector. Conversely, when the circuit to be shared is small, this optimization can increase the total circuit size. The Gain block inside the shared subsystem runs at a rate that is 10 times faster than the model base rate, which avoids an increase in the subsystem latency; the resulting reduction in maximum achievable clock frequency is balanced by the area savings on the target hardware.
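To confirm the area savings, one option is to enable the resource utilization report when you generate code. This sketch assumes the same subsystem as above:
makehdl('hdlcoder_vector_stream_gain/Gain_Stream', 'ResourceReport', 'on')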
Use SharingFactor and HDL Block Properties for Sharing Matrix Multiplication Operations
Guideline ID
3.1.10
Severity
Informative
Description
The Matrix Multiply block is a Product block that has the Multiplication block parameter set to Matrix(*). In the HDL Block Properties dialog box, the HDL architecture is set to Matrix Multiply and you can specify the DotProductStrategy.
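For example, assuming a hypothetical Matrix Multiply block at myModel/DUT/Matrix Multiply, you can set the architecture and the dot-product strategy from the command line:
hdlset_param('myModel/DUT/Matrix Multiply', 'Architecture', 'Matrix Multiply')
hdlset_param('myModel/DUT/Matrix Multiply', 'DotProductStrategy', 'Fully Parallel')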
DotProductStrategy Settings
| DotProductStrategy | Description | 
|---|---|
| 'Fully Parallel' (default) | Performs the multiplication and addition operations in parallel. An [MxN]*[NxM] matrix multiplication requires N*M*M multipliers. | 
| 'Parallel Multiply-Accumulate' | Uses the Parallel architecture of the Multiply-Accumulate block to implement the matrix multiplication. This architecture deploys multiple Multiply-Add blocks in parallel and accumulates the results. | 
| 'Serial Multiply-Accumulate' | Uses the Serial architecture of the Multiply-Accumulate block to implement the matrix multiplication. This mode performs N-times oversampling, and the number of multipliers becomes M*M. | 
To share resources and further reduce the number of multipliers when you have multiple Matrix Multiply blocks in the same subsystem, set DotProductStrategy to Fully Parallel and specify the SharingFactor on the parent subsystem.
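As a sketch, assuming a hypothetical subsystem myModel/MatMul that contains two Matrix Multiply blocks named Matrix Multiply1 and Matrix Multiply2:
hdlset_param('myModel/MatMul/Matrix Multiply1', 'DotProductStrategy', 'Fully Parallel')
hdlset_param('myModel/MatMul/Matrix Multiply2', 'DotProductStrategy', 'Fully Parallel')
hdlset_param('myModel/MatMul', 'SharingFactor', 2)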
For multiplications in which one operand is complex and the other is real, the number of multipliers doubles; when both operands are complex, it quadruples, as shown in the following table.
Number of Multipliers Generated by Multiplication of [MxN]*[NxM]
| Multiplication Type | Fully Parallel/Parallel Multiply-Accumulate | Serial Multiply-Accumulate | 
|---|---|---|
| Real x Real | N*M*M | M*M | 
| Complex x Real | N*M*M*2 | M*M*2 | 
| Complex x Complex | N*M*M*4 | M*M*4 | 
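For example, for a [3x2]*[2x3] real-by-real multiplication (M = 3, N = 2), the Fully Parallel and Parallel Multiply-Accumulate strategies use N*M*M = 2*3*3 = 18 multipliers, and the Serial Multiply-Accumulate strategy uses M*M = 9. If one operand is complex, these counts double to 36 and 18; if both operands are complex, they quadruple to 72 and 36.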
For floating-point matrix multiplication, select Use Floating Point. In this case, you must use the Fully Parallel DotProductStrategy. Because this mode does not use element-wise operations and performs the multiplication and addition operations in parallel, use the SharingFactor instead of the StreamingFactor to share resources and save circuit area.
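As a command-line sketch, assuming a hypothetical model myModel whose DUT subsystem contains the Matrix Multiply block, and assuming your HDL Coder release supports the native floating-point configuration API, you can enable floating-point code generation and then share resources with the SharingFactor:
fpconfig = hdlcoder.createFloatingPointTargetConfig('NativeFloatingPoint');
hdlset_param('myModel', 'FloatingPointTargetConfiguration', fpconfig);
hdlset_param('myModel/DUT', 'SharingFactor', 4)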
For an example that shows how to perform streaming matrix multiplication using floating-point types, see HDL Code Generation for Streaming Matrix Multiply System Object.