Real Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor

Examples

Implement Hardware-Efficient Real Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor

How to use the Real Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor block.

Algorithms to Determine Fixed-Point Types for Real Q-less QR Matrix Solve A'AX=B

Derivation of algorithms for determining fixed-point types for real Q-less QR matrix solve.

Determine Fixed-Point Types for Real Q-less QR Matrix Solve A'AX=B

Use fixed.realQlessQRFixedpointTypes to determine fixed-point types for computation of the real least-squares matrix equation.

Compute Forgetting Factor Required for Streaming Input Data

Use fixed.forgettingFactor and fixed.forgettingFactorInverse to compute forgetting factor.

Ports

Input

A(i,:) — Rows of real matrix A
vector

Rows of real matrix A, specified as a vector. A is an infinitely tall matrix of streaming data. If B is single or double, A must be the same data type as B. If A is a fixed-point data type, A must be signed, use binary-point scaling, and have the same word length as B. Slope-bias representation is not supported for fixed-point data types.

Data Types: single | double | fixed point

B — Matrix B
matrix

Real matrix B, specified as a matrix. B is an n-by-p matrix where n ≥ 2. If A is single or double, B must be the same data type as A. If B is a fixed-point data type, B must be signed, use binary-point scaling, and have the same word length as A. Slope-bias representation is not supported for fixed-point data types.

Data Types: single | double | fixed point

validInA — Whether A input is valid
`Boolean` scalar

Whether the A(i,:) input is valid, specified as a Boolean scalar. This control signal indicates when the data from the A(i,:) input port is valid. When this value is 1 (true) and the readyA value is 1 (true), the block captures the values at the A(i,:) input port. When this value is 0 (false), the block ignores the input samples.

After sending a true validInA signal, there may be some delay before readyA is set to false. To ensure all data is processed, you must wait until readyA is set to false before sending another true validInA signal.

Data Types: Boolean

validInB — Whether B input is valid
`Boolean` scalar

Whether B input is valid, specified as a Boolean scalar. This control signal indicates when the data from the B input port is valid. When this value is 1 (true) and the readyB value is 1 (true), the block captures the values at the B input port. When this value is 0 (false), the block ignores the input samples.

After sending a true validInB signal, there may be some delay before readyB is set to false. To ensure all data is processed, you must wait until readyB is set to false before sending another true validInB signal.

Data Types: Boolean

restart — Whether to clear internal states
`Boolean` scalar

Whether to clear internal states, specified as a Boolean scalar. When this value is 1 (true), the block stops the current calculation and clears all internal states. When this value is 0 (false) and the validInA and validInB values are both 1 (true), the block begins a new subframe.

Data Types: Boolean

Output

X — Matrix X
vector | matrix

Matrix X, returned as a vector or matrix.

Data Types: single | double | fixed point

validOut — Whether output data is valid
`Boolean` scalar

Whether the output data is valid, returned as a Boolean scalar. This control signal indicates when the data at the output port X is valid. When this value is 1 (true), the block has successfully computed a row of X. When this value is 0 (false), the output data is not valid.

Data Types: Boolean

readyA — Whether block is ready for input A
`Boolean` scalar

Whether the block is ready for input A, returned as a Boolean scalar. This control signal indicates when the block is ready for new input data. When this value is 1 (true) and validInA value is 1 (true), the block accepts input data in the next time step. When this value is 0 (false), the block ignores input data in the next time step.

After sending a true validInA signal, there may be some delay before readyA is set to false. To ensure all data is processed, you must wait until readyA is set to false before sending another true validInA signal.

Data Types: Boolean

readyB — Whether block is ready for input B
`Boolean` scalar

Whether the block is ready for input B, returned as a Boolean scalar. This control signal indicates when the block is ready for new input data. When this value is 1 (true) and validInB value is 1 (true), the block accepts input data in the next time step. When this value is 0 (false), the block ignores input data in the next time step.

After sending a true validInB signal, there may be some delay before readyB is set to false. To ensure all data is processed, you must wait until readyB is set to false before sending another true validInB signal.

Data Types: Boolean

Parameters

Number of columns in matrix A and rows in matrix B — Number of columns in matrix A and rows in matrix B
`4` (default) | positive integer-valued scalar

Number of columns in matrix A and rows in matrix B, specified as a positive integer-valued scalar.

Programmatic Use

Block Parameter: n

Type: character vector

Values: positive integer-valued scalar

Default: 4

Number of columns in matrix B — Number of columns in matrix B
`1` (default) | positive integer-valued scalar

Number of columns in matrix B, specified as a positive integer-valued scalar.

Programmatic Use

Block Parameter: p

Type: character vector

Values: positive integer-valued scalar

Default: 1

Forgetting factor — Forgetting factor applied after each row of matrix is factored
0.99 (default) | real positive scalar

Forgetting factor applied after each row of the matrix is factored, specified as a real positive scalar. The output is updated as each row of A is input indefinitely.

Programmatic Use

Block Parameter: forgettingFactor

Type: character vector

Values: real positive scalar

Default: 0.99

Regularization parameter — Regularization parameter
0 (default) | real nonnegative scalar

Regularization parameter, specified as a nonnegative scalar. Small, positive values of the regularization parameter can improve the conditioning of the problem and reduce the variance of the estimates. While biased, the reduced variance of the estimate often results in a smaller mean squared error when compared to least-squares estimates.

Programmatic Use

Block Parameter: regularizationParameter

Type: character vector

Values: real nonnegative scalar

Default: 0

Output datatype — Data type of output matrix X
`fixdt(1,18,14)` (default) | `double` | `single` | `fixdt(1,16,0)` | `<data type expression>`

Data type of the output matrix X, specified as fixdt(1,18,14), double, single, fixdt(1,16,0), or as a user-specified data type expression. The type can be specified directly, or expressed as a data type object such as Simulink.NumericType.

Programmatic Use

Block Parameter: OutputType

Type: character vector

Values: 'fixdt(1,18,14)' | 'double' | 'single' | 'fixdt(1,16,0)' | '<data type expression>'

Default: 'fixdt(1,18,14)'
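To make the default output type concrete, the following Python sketch models what a signed type with an 18-bit word length and 14 fractional bits can represent. The helper name `quantize_fixdt` is hypothetical, and the rounding (round half to even) and saturation behavior are assumptions chosen for illustration; they are not taken from the block's implementation.

```python
def quantize_fixdt(value, signed=True, word_length=18, fraction_length=14):
    """Quantize a real value to a binary-point-scaled fixed-point grid.

    Assumptions for illustration only: round to nearest (ties to even,
    which is what Python's round() does) and saturate on overflow.
    """
    scale = 2 ** fraction_length
    if signed:
        lo, hi = -(2 ** (word_length - 1)), 2 ** (word_length - 1) - 1
    else:
        lo, hi = 0, 2 ** word_length - 1
    stored = round(value * scale)        # stored integer before saturation
    stored = max(lo, min(hi, stored))    # clamp to the representable range
    return stored / scale


resolution = 2 ** -14                    # spacing between representable values
largest = quantize_fixdt(100.0)          # saturates at the top of the range
smallest = quantize_fixdt(-100.0)        # saturates at the bottom of the range
```

Under these assumptions, fixdt(1,18,14) covers the range from -8 to 8 - 2^-14 in steps of 2^-14, so entries of X outside that range would need a type with more integer bits.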

Tips

Use fixed.forgettingFactor to compute the forgetting factor, α, for an infinite number of rows with the equivalent gain of a matrix with m rows.
Use fixed.forgettingFactorInverse to compute the number of rows, m, of a matrix with equivalent gain corresponding to forgetting factor α.
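The relationship between m and α in the tips above can be sketched in Python. This is an assumption-laden illustration, not the toolbox implementation: it assumes the gain of the infinitely tall weighted matrix is approximated by the integral of α^(2x) over x from 0 to infinity, which equals -1/(2 ln α), and sets that equal to m. The function names are hypothetical Python stand-ins.

```python
import math


def forgetting_factor(m):
    """Assumed relation: alpha = exp(-1 / (2 m)) for an equivalent m-row gain."""
    return math.exp(-1.0 / (2.0 * m))


def forgetting_factor_inverse(alpha):
    """Inverse of the assumed relation: m = -1 / (2 ln(alpha)), for 0 < alpha < 1."""
    return -1.0 / (2.0 * math.log(alpha))


def geometric_row_count(alpha):
    """Exact discrete sum of alpha^(2k) for k = 1, 2, ... as a cross-check."""
    return alpha ** 2 / (1.0 - alpha ** 2)


alpha = forgetting_factor(50)              # forgetting factor for roughly 50 rows
m_back = forgetting_factor_inverse(alpha)  # recovers 50 up to rounding
m_discrete = geometric_row_count(alpha)    # close to 50, within the approximation
```

The discrete sum and the integral approximation differ by about half a row, which is small compared with typical values of m.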

Algorithms

Q-less QR Decomposition with Forgetting Factor

The Real Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor block implements the following recursion to compute the upper-triangular factor R of continuously streaming 1-by-n row vectors A(k,:) using forgetting factor α. It is as if matrix A were infinitely tall. A forgetting factor in the range 0 < α < 1 prevents R from growing without bound.

$\begin{matrix} R_{0} = \mathrm{zeros}(n, n) \\ [\sim, R_{1}] = \mathrm{qr}\left(\begin{bmatrix} R_{0} \\ A(1, :) \end{bmatrix}, 0\right) \\ R_{1} = \alpha R_{1} \\ [\sim, R_{2}] = \mathrm{qr}\left(\begin{bmatrix} R_{1} \\ A(2, :) \end{bmatrix}, 0\right) \\ R_{2} = \alpha R_{2} \\ \vdots \\ [\sim, R_{k}] = \mathrm{qr}\left(\begin{bmatrix} R_{k-1} \\ A(k, :) \end{bmatrix}, 0\right) \\ R_{k} = \alpha R_{k} \\ \vdots \end{matrix}$
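The recursion can be sketched in plain Python. This is a minimal floating-point illustration, not the block's hardware architecture: it assumes each new row is folded into R with Givens rotations, and the names `qless_qr_update` and `gram` are hypothetical. The resulting R has a nonnegative diagonal, so its signs may differ from those returned by a library qr, but R'R is the same.

```python
import math


def qless_qr_update(R, row, alpha):
    """Fold one row into upper-triangular R, then scale by alpha.

    Equivalent to taking the R factor of [R; row] and multiplying it by alpha.
    """
    n = len(row)
    R = [r[:] for r in R]        # work on copies so the inputs stay untouched
    v = list(row)
    for j in range(n):
        a, b = R[j][j], v[j]
        r = math.hypot(a, b)
        if r == 0.0:
            continue             # nothing to eliminate in this column
        c, s = a / r, b / r
        for k in range(j, n):    # rotate row j of R against the incoming row
            t = c * R[j][k] + s * v[k]
            v[k] = -s * R[j][k] + c * v[k]
            R[j][k] = t
    return [[alpha * x for x in r] for r in R]


def gram(M):
    """Return M'M for a matrix stored as a list of rows."""
    n = len(M[0])
    return [[sum(r[i] * r[j] for r in M) for j in range(n)] for i in range(n)]


n, alpha = 3, 0.99
rows = [[1.0, 2.0, 0.5], [0.0, 1.0, -1.0], [2.0, -1.0, 1.0], [0.5, 0.5, 3.0]]

R = [[0.0] * n for _ in range(n)]          # R_0 = zeros(n, n)
for a in rows:                             # stream A(1,:), A(2,:), ...
    R = qless_qr_update(R, a, alpha)

# Explicit weighted matrix: row i (1-based) is scaled by alpha^(k - i + 1).
k = len(rows)
weighted = [[alpha ** (k - i) * x for x in a] for i, a in enumerate(rows)]
```

Comparing `gram(R)` with `gram(weighted)` shows that older rows are down-weighted geometrically, which is what keeps R bounded for a stream of unlimited length.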

Q-less QR Decomposition with Forgetting Factor and Tikhonov Regularization

The output X_k after processing the kth input A(k,:) is computed using the following iteration.

$\begin{matrix} R_{0} = \lambda I_{n} \\ [\sim, R_{1}] = \mathrm{qr}\left(\begin{bmatrix} R_{0} \\ A(1, :) \end{bmatrix}, 0\right) \\ R_{1} = \alpha R_{1} \\ X_{1} = R_{1} \backslash (R'_{1} \backslash B) \\ [\sim, R_{2}] = \mathrm{qr}\left(\begin{bmatrix} R_{1} \\ A(2, :) \end{bmatrix}, 0\right) \\ R_{2} = \alpha R_{2} \\ X_{2} = R_{2} \backslash (R'_{2} \backslash B) \\ \vdots \\ [\sim, R_{k}] = \mathrm{qr}\left(\begin{bmatrix} R_{k-1} \\ A(k, :) \end{bmatrix}, 0\right) \\ R_{k} = \alpha R_{k} \\ X_{k} = R_{k} \backslash (R'_{k} \backslash B) \\ \vdots \end{matrix}$

This is mathematically equivalent to computing A'_kA_kX = B, where A_k is defined as follows, though the block never actually creates A_k.

$A_{k} = \begin{bmatrix} \alpha^{k} \lambda I_{n} \\ \begin{bmatrix} \alpha^{k} & & & \\ & \alpha^{k-1} & & \\ & & \ddots & \\ & & & \alpha \end{bmatrix} A(1:k, :) \end{bmatrix}$
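A short derivation may make the equivalence easier to see. It is a sketch that relies only on the fact that the economy-size QR factorization preserves the Gram matrix of the stacked input, and it uses ' for transpose as elsewhere on this page. One step of the iteration, including the scaling by α, gives

$R'_{k} R_{k} = \alpha^{2} \left( R'_{k-1} R_{k-1} + A(k,:)' A(k,:) \right), \quad R'_{0} R_{0} = \lambda^{2} I_{n}$

Unrolling this recursion k times gives

$R'_{k} R_{k} = \alpha^{2k} \lambda^{2} I_{n} + \sum_{i=1}^{k} \alpha^{2(k-i+1)} A(i,:)' A(i,:) = A'_{k} A_{k}$

so X_k = R_k \ (R_k' \ B) solves A_k'A_k X = B. Row i of A carries the weight α^(k-i+1), so the most recent row is scaled by α and the first row by α^k, and the regularization term decays at the same rate as the first row.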

Forward and Backward Substitution

When an upper triangular factor is ready, then forward and backward substitution are computed with the current input B to produce output X.

$X = R_{k} \backslash (R'_{k} \backslash B)$
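The two triangular solves can be sketched in plain Python for a single column of B (p = 1). This is a floating-point illustration with hypothetical function names, and it assumes R has a nonzero diagonal; it does not model the block's fixed-point arithmetic.

```python
def forward_substitution(L, b):
    """Solve L y = b for a lower-triangular L with a nonzero diagonal."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    return y


def backward_substitution(U, y):
    """Solve U x = y for an upper-triangular U with a nonzero diagonal."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x


def solve_normal_equations(R, b):
    """Compute x = R \\ (R' \\ b), which solves R'R x = b."""
    Rt = [list(col) for col in zip(*R)]   # R' is lower triangular
    y = forward_substitution(Rt, b)       # forward step: R' y = b
    return backward_substitution(R, y)    # backward step: R x = y


R = [[2.0, 1.0, 0.5],
     [0.0, 3.0, -1.0],
     [0.0, 0.0, 1.5]]
b = [1.0, 2.0, 3.0]
x = solve_normal_equations(R, b)
```

For a B with several columns, the same two steps would be applied to each column independently.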

Choosing the Implementation Method

Systolic implementations prioritize speed of computations over space constraints, while burst implementations prioritize space constraints at the expense of speed of the operations. The following table illustrates the tradeoffs between the implementations available for matrix decompositions and solving systems of linear equations.

Implementation	Throughput	Latency	Area
Systolic	C	O(n)	O(mn²)
Partial-Systolic	C	O(m)	O(n²)
Partial-Systolic with Forgetting Factor	C	O(n)	O(n²)
Burst	O(n)	O(mn)	O(n)

Where C is a constant proportional to the word length of the data, m is the number of rows in matrix A, and n is the number of columns in matrix A.

For additional considerations in selecting a block for your application, see Choose a Block for HDL-Optimized Fixed-Point Matrix Operations.

AMBA AXI Handshake Process

This block uses the AMBA AXI handshake protocol [1]. The valid/ready handshake process is used to transfer data and control information. This two-way control mechanism allows both the manager and subordinate to control the rate at which information moves between manager and subordinate. A valid signal indicates when data is available. The ready signal indicates that the block can accept the data. Transfer of data occurs only when both the valid and ready signals are high.
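The transfer rule can be illustrated with a small cycle-by-cycle model in Python. This is a generic valid/ready sketch with hypothetical names, not a cycle-accurate model of this block: the producer keeps valid high while it has a row pending, and a row moves only in a cycle where ready is also high.

```python
def stream_rows(rows, ready_pattern):
    """Return (cycle, row) pairs for every cycle in which a transfer occurs.

    The producer asserts valid whenever it still has a row to send and
    holds that row until the consumer's ready signal is high.
    """
    pending = list(rows)
    accepted = []
    for cycle, ready in enumerate(ready_pattern):
        valid = bool(pending)
        if valid and ready:                  # transfer only when both are high
            accepted.append((cycle, pending.pop(0)))
    return accepted


rows = ["A1r1", "A1r2", "A1r3"]
ready_pattern = [1, 0, 0, 1, 0, 0, 1]        # consumer is busy between rows
accepted = stream_rows(rows, ready_pattern)
# accepted == [(0, "A1r1"), (3, "A1r2"), (6, "A1r3")]
```

Here the upstream source is faster than the consumer, so the consumer's ready signal sets the pace of the transfers.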

Block Timing

The Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor blocks accept matrix A row-by-row and matrix B as a single vector. After accepting the first valid pair of A and B matrices, the block outputs the X matrices row by row continuously.

For example, assume that the input A matrix is 3-by-3. Additionally assume that validIn asserts before ready, meaning that the upstream data source is faster than the QR decomposition.

In the figure,

A1r1 is the first row of the first A matrix, A1r2 is the second row of the first A matrix, and so on.
validIn to ready — From a successful A row input to the block being ready to accept the next row.
validOut to validOut — Because the Forward Backward Substitution block runs continuously, it generates output at a constant rate. This is the delay between two adjacent valid outputs.
Last row validIn to validOut — From the last (mth) row input to the block starting to output the solution.
This block is always ready to accept B matrices, so readyB is always asserted.

The following table provides details of the timing for the Real Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor block. Latency depends on the size of matrix A and the data types of the A and B matrices. In the table:

n is the number of columns in matrix A.
wl represents the word length of the input data in matrix A.

Input Data Type	`validIn` to `ready` (cycles)	`validOut` to `validOut` (cycles)	Last Row `validIn` to `validOut` (cycles)
Fixed point `fi`	wl + 7	4n² + 25n + 5 + 2*n*wl + 2*n*nextpow2(wl)	4n² + 25n + 5 + 2*n*wl + 2*n*nextpow2(wl) + (wl + 6)*n + 2
Scaled double `fi`	wl + 7	4n² + 23n + 5 + 2*n*wl	4n² + 25n + 5 + 2*n*wl + (wl + 4)*n + 2
`double`	60	4n² + 21n + 5	4n² + 80n + 7
`single`	31	4n² + 21n + 5	4n² + 51n + 7
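As a worked example, the fixed-point row of the table can be evaluated in Python. The sketch assumes the adjacent factors in the table are products (n times wl, and n times nextpow2(wl)) and that nextpow2 returns the smallest exponent p with 2^p ≥ wl, as the MATLAB function of that name does for positive integers. The function names are hypothetical, and the other rows of the table are not covered.

```python
def nextpow2(x):
    """Smallest integer p such that 2**p >= x, for a positive integer x."""
    return (x - 1).bit_length()


def fixed_point_timing(n, wl):
    """Cycle counts for fixed-point input, following the table's first row."""
    valid_out_to_valid_out = (4 * n * n + 25 * n + 5
                              + 2 * n * wl + 2 * n * nextpow2(wl))
    return {
        "validIn_to_ready": wl + 7,
        "validOut_to_validOut": valid_out_to_valid_out,
        "last_row_validIn_to_validOut": valid_out_to_valid_out + (wl + 6) * n + 2,
    }


timing = fixed_point_timing(n=16, wl=16)
# timing["validIn_to_ready"] == 23
# timing["validOut_to_validOut"] == 2069
# timing["last_row_validIn_to_validOut"] == 2423
```

The quadratic term dominates for larger n, so doubling the number of columns roughly quadruples the interval between valid outputs.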

Hardware Resource Utilization

This block supports HDL code generation using the Simulink® HDL Workflow Advisor. For an example, see HDL Code Generation and FPGA Synthesis from Simulink Model (HDL Coder) and Implement Digital Downconverter for FPGA (DSP HDL Toolbox).

In R2022b: The following tables show the post place-and-route resource utilization results and timing summary, respectively.

This example data was generated by synthesizing the block on a Xilinx® Zynq® UltraScale+™ RFSoC ZCU111 evaluation board. The synthesis tool was Vivado® v.2020.2 (win64).

The following parameters were used for synthesis.

Block parameters:
- n = 16
- p = 1
- Matrix A dimension: inf-by-16
- Matrix B dimension: 16-by-1
Input data type: sfix16_En14
Target frequency: 250 MHz

Resource	Usage	Available	Utilization (%)
CLB LUTs	120582	425280	28.35
CLB Registers	90769	850560	10.67
DSPs	4	4272	0.09
Block RAM Tile	0	1080	0.00
URAM	0	80	0.00

Timing Parameter	Value
Requirement	4 ns
Data Path Delay	3.853 ns
Slack	0.129 ns
Clock Frequency	258.33 MHz

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.

Slope-bias representation is not supported for fixed-point data types.

HDL Code Generation
Generate VHDL, Verilog and SystemVerilog code for FPGA and ASIC designs using HDL Coder™.

HDL Coder™ provides additional configuration options that affect HDL implementation and synthesized logic.

HDL Architecture

This block has one default HDL architecture.

HDL Block Properties

General
ConstrainedOutputPipeline	Number of registers to place at the outputs by moving existing delays within your design. Distributed pipelining does not redistribute these registers. The default is `0`. For more details, see ConstrainedOutputPipeline (HDL Coder).
InputPipeline	Number of input pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is `0`. For more details, see InputPipeline (HDL Coder).
OutputPipeline	Number of output pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is `0`. For more details, see OutputPipeline (HDL Coder).

Restrictions

Supports fixed-point data types only.

Version History

Introduced in R2020b

R2023a: Smart unrolling for improved resource utilization

This block depends on a partial-systolic QR decomposition block. Since R2023a, when you update the diagram, the loop that forms the partial-systolic pipeline in the QR decomposition block is unrolled. This updated internal architecture removes dead operations in simulation and generated code, so it requires fewer hardware resources. The block simulates with clock-accurate and bit-true fidelity with respect to the library versions of these blocks in previous releases.

R2022a: Support for Tikhonov regularization parameter

The Real Partial-Systolic Matrix Solve Using Q-less QR Decomposition with Forgetting Factor block now supports the Tikhonov Regularization parameter.
