CORDIC Square Root HDL Optimized
Libraries:
Fixed-Point Designer HDL Support / Math Operations
Description
The CORDIC Square Root HDL Optimized block returns the square root of u, computed using a CORDIC-based implementation optimized for HDL code generation.
Examples
How to Use CORDIC Square Root HDL Optimized Block
This example shows how to use the CORDIC Square Root HDL Optimized block to compute the square root of real nonnegative scalars.
CORDIC-Based Square Root
The CORDIC Square Root HDL Optimized block uses a CORDIC algorithm in hyperbolic vectoring mode to approximate the square root (see Compute Square Root Using CORDIC). This CORDIC-based algorithm differs from the Simulink® Sqrt block, which uses bisection and Newton-Raphson methods. The algorithm in the CORDIC Square Root HDL Optimized block requires only iterative shift-add operations.
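To make the idea of hyperbolic vectoring concrete, here is a minimal double-precision sketch in plain MATLAB. It is not the block's implementation. It assumes the input has already been normalized to the range [0.5, 2), uses the textbook initialization x = v + 0.25, y = v - 0.25 (so that x^2 - y^2 = v), and repeats the iterations at shift values 4 and 13, as standard hyperbolic CORDIC requires for convergence. The variable names are illustrative only.

```matlab
% Hyperbolic vectoring CORDIC square root, floating-point sketch.
v = 1.7;                 % hypothetical input, assumed normalized to [0.5, 2)
maxShift = 15;           % maximum shift value (word length 16 minus 1)

x = v + 0.25;            % x^2 - y^2 = v for this initialization
y = v - 0.25;
gain = 1;                % accumulated CORDIC gain
nIter = 0;               % number of shift-add iterations performed
nextRepeat = 4;          % shift values 4, 13, 40, ... are executed twice

for k = 1:maxShift
    reps = 1;
    if k == nextRepeat
        reps = 2;
        nextRepeat = 3*nextRepeat + 1;
    end
    for r = 1:reps
        % Vectoring mode: choose the direction that drives y toward zero.
        if y >= 0
            d = -1;
        else
            d = 1;
        end
        xNew = x + d*y*2^(-k);   % in hardware, 2^(-k) is a right shift
        yNew = y + d*x*2^(-k);
        x = xNew;
        y = yNew;
        gain = gain*sqrt(1 - 2^(-2*k));
        nIter = nIter + 1;
    end
end

sqrtApprox = x/gain;     % x converges to gain*sqrt(v)
```

In hardware the gain is a constant known in advance, so the final division would typically be replaced by a precomputed scaling. The sketch divides only to keep the arithmetic easy to follow.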
I/O Interface
The CORDIC Square Root HDL Optimized block is fully pipelined. It can accept input data on any cycle, including consecutive clock cycles. Use validIn to indicate a valid input. When the block finishes the computation, it sets validOut to true for one clock cycle. For inputs sent on consecutive clock cycles, validOut is also true on consecutive clock cycles.
Customizable CORDIC Maximum Shift Value and Number of Iterations Per Pipeline Register
This block uses iterative normalization and CORDIC algorithms. If the input is fixed point or scaled double, the block uses multiple steps for computation. The normalization uses nextpow2(u.WordLength) iterations. The number of CORDIC iterations depends on the CORDIC maximum shift value. A larger word length can provide higher resolution but needs more iterations to process. This block can perform multiple iterations per pipeline stage, which lowers latency at the cost of a longer critical path in the generated HDL design.
For example, if the word length of the input u is 16, normalization requires 4 iterations. If the Automatically select CORDIC maximum shift value based on input word length parameter is selected, this block uses 16 - 1 = 15 as the CORDIC maximum shift value, which requires 17 iterations. The total number of iterations is 4 + 17 = 21, and the latency of the block is 2 + ceil(total number of iterations/nIterPerReg). If the number of iterations per pipeline register is set to 1, the block latency is 23; if it is set to 2, the block latency is 13; and so on. If the number of iterations per pipeline register is greater than or equal to the total number of required iterations, the block performs all iterations in one pipeline stage and the total latency is minimized to 3.
The total number of iterations and block latency can be calculated using the embblk.latency.cordicSqrtHDLOptimizedLatency function.
If the input is floating point, the block latency is 0.
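The arithmetic in the fixed-point example above can be reproduced in a few lines of plain MATLAB. This is a sketch, not a substitute for embblk.latency.cordicSqrtHDLOptimizedLatency. The rule used for the CORDIC iteration count (one iteration per shift value, plus one extra for each of the repeated shift values 4, 13, 40, ...) is an assumption based on the standard hyperbolic CORDIC schedule; it reproduces the 17 iterations quoted for a maximum shift value of 15.

```matlab
wordLength = 16;                        % word length from the example
normIter = nextpow2(wordLength);        % normalization iterations: 4
maxShift = wordLength - 1;              % automatic maximum shift value: 15

% Assumed schedule: shift values 4, 13, 40, ... are each executed twice.
cordicIter = maxShift;
repeatIdx = 4;
while repeatIdx <= maxShift
    cordicIter = cordicIter + 1;
    repeatIdx = 3*repeatIdx + 1;
end

totalIter = normIter + cordicIter;      % 4 + 17 = 21
nIterPerReg = [1 2 21];                 % settings discussed in the text
latency = 2 + ceil(totalIter./nIterPerReg);
```

With these values, latency evaluates to [23 13 3], matching the figures given in the text.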
Define Simulation Parameters
Specify the number of input samples.
numSamples = 10;
Specify the data type as fixed, scaledDouble, single, or double.
DT = 'fixed';
For fixed-point data types, specify the word length and fraction length.
wordLength = 16;
FractionLength = 10;
If the Automatically select CORDIC maximum shift value based on input word length parameter is not selected, define the maximum CORDIC shift value. For fixed-point data types, this value cannot exceed wordLength - 1.
autoMaxVal = "on";
maximumShiftValue = wordLength - 1;
Generate Input Data
Generate input data u. The input value must be a real nonnegative scalar.
rng('default');
u = abs(randn(1,numSamples));
Cast to Selected Data Type
Cast the input data u to the selected data type.
switch lower(DT)
    case 'fixed'
        u = cast(u,'like',fi([],1,wordLength,FractionLength));
    case 'scaleddouble'
        u = cast(u,'like',fi([],1,wordLength,FractionLength,'DataType','ScaledDouble'));
    case 'single'
        u = single(u);
    case 'double'
        u = double(u);
    otherwise
        u = double(u);
end
Configure Block Pipeline
Check how many iterations the block requires for the selected data type.
[~, totalIterations] = embblk.latency.cordicSqrtHDLOptimizedLatency(u,1,maximumShiftValue)
totalIterations = 21
Define the number of iterations to be performed in one pipeline stage.
nIterPerReg = 1;
Open the Model
Open the CORDICSquareRootModel model.
model = 'CORDICSquareRootModel';
open_system(model);
Simulate the Model
Configure the model workspace and run the simulation.
fixed.example.setModelWorkspace(model,'u',u,'numSamples',numSamples,'maximumShiftValue',maximumShiftValue,...
    'nIterPerReg',nIterPerReg);
set_param([model,'/CORDIC Square Root HDL Optimized'],'autoMaximumShiftVal',autoMaxVal);
out = sim(model);
Verify Output Solutions
Compare the fixed-point result from the CORDIC Square Root HDL Optimized block with the floating-point result from the MATLAB sqrt function.
yBuiltIn = sqrt(double(u))';
y = out.y(1:numSamples);
absError = (double(y) - yBuiltIn)
absError = 10×1
10^{-3} ×
0.1450
0.7312
0.0029
0.8692
0.2197
0.9328
0.2752
0.5076
0.9682
0.1284
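As a quick sanity check, the error magnitudes listed above can be compared with the resolution of the example data type. The sketch below reads the displayed scale factor as 10^-3 and uses the magnitudes as printed; any signs may have been lost in the display. It also assumes that the output has the same fraction length as the input, which this page does not state.

```matlab
FractionLength = 10;                    % fraction length from the example
lsb = 2^(-FractionLength);              % resolution: 9.765625e-4

% Error magnitudes copied from the output above, scaled by 1e-3.
absErrorMag = 1e-3*[0.1450 0.7312 0.0029 0.8692 0.2197 ...
                    0.9328 0.2752 0.5076 0.9682 0.1284];

withinOneLsb = all(absErrorMag < lsb);  % true if every error is below 1 LSB
maxErrorInLsb = max(absErrorMag)/lsb;   % largest error, in units of 1 LSB
```

Under these assumptions, every listed error is smaller than one least significant bit of a 10-bit fraction.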
Block Latency
The block latency is the number of clock cycles between a valid input and the cycle when the corresponding output becomes valid. The latency of this block depends on the data type, the CORDIC maximum shift value, and the Number of iterations per pipeline register.
Calculate the expected latency and total number of iterations. The CORDIC maximum shift value can be empty if the Automatically select CORDIC maximum shift value based on input word length parameter is selected.
[explatency, ~] = embblk.latency.cordicSqrtHDLOptimizedLatency(u,nIterPerReg,maximumShiftValue)
explatency = 23
Retrieve block latency from the simulation.
tDataIn = find(out.logsout.get('validIn').Values.Data == 1);
tDataOut = find(out.logsout.get('validOut').Values.Data == 1);
actualLatency = tDataOut(1:numSamples) - tDataIn(1:numSamples)
actualLatency = 10×1
23
23
23
23
23
23
23
23
23
23
Ports
Input
u — Value to take square root of
nonnegative real-valued scalar
Value to take square root of, specified as a nonnegative real-valued scalar.
If u is a fixed-point or scaled double data type, u must use binary-point scaling. Slope-bias representation is not supported for fixed-point data types. Only binary-point scaled fixed-point data types are supported for code generation.
Data Types: single | double | fixed point
validIn — Whether input is valid
Boolean scalar
Whether input is valid, specified as a Boolean scalar. This control signal indicates when the data from the u input port is valid. When this value is 1 (true), the block captures the values at the u input port. When this value is 0 (false), the block ignores input samples.
Data Types: Boolean
restart — Whether to clear internal registers
Boolean scalar
Whether to clear internal registers, specified as a Boolean scalar. When this value is 1 (true), the block stops the current calculation and clears all internal registers. When this value is 0 (false) and the validIn value is 1 (true), the block begins a new subframe.
Data Types: Boolean
Output
y — CORDIC-based approximation of square root of input
real-valued scalar
CORDIC-based approximation of square root of input, returned as a real-valued scalar.
Data Types: single | double | fixed point
validOut — Whether output data is valid
Boolean scalar
Whether output data is valid, returned as a Boolean scalar. This control signal indicates when the data at the output port y is valid. When this value is 1 (true), the output data is valid. When this value is 0 (false), the output data is not valid.
Data Types: Boolean
Parameters
To edit block parameters interactively, use the Property Inspector. From the Simulink® Toolstrip, on the Simulation tab, in the Prepare gallery, select Property Inspector.
Automatically select CORDIC maximum shift value based on input word length — Automatically select CORDIC maximum shift value based on input word length
on (default) | off
Automatically select CORDIC maximum shift value based on input word length. When this parameter is selected, the default CORDIC maximum shift value depends on the data type of the input u:
If the input u is fixed-point or scaled double, the default is the word length minus 1.
If the input u is single, the default is 23.
If the input u is double, the default is 52.
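The selection rule listed above can be written out as a short MATLAB sketch. The variable names DT and wordLength follow the example earlier on this page; defaultMaxShift is an illustrative name, not a block parameter. Note that 23 and 52 are the numbers of fraction bits in IEEE single and double precision, respectively.

```matlab
DT = 'fixed';            % 'fixed', 'scaleddouble', 'single', or 'double'
wordLength = 16;         % used only for fixed-point and scaled double inputs

switch lower(DT)
    case {'fixed','scaleddouble'}
        defaultMaxShift = wordLength - 1;   % word length minus 1
    case 'single'
        defaultMaxShift = 23;               % fraction bits in single
    case 'double'
        defaultMaxShift = 52;               % fraction bits in double
    otherwise
        error('Unsupported data type: %s', DT);
end
```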
Programmatic Use
To set the block parameter value programmatically, use the set_param function.
To get the block parameter value programmatically, use the get_param function.
Parameter: autoMaximumShiftVal
Values: on (default) | off
Data Types: char | string
CORDIC maximum shift value — Maximum shift value of hyperbolic vectoring CORDIC
10 (default) | positive integer-valued scalar
Maximum shift value of hyperbolic vectoring CORDIC, specified as a positive integer-valued scalar.
Dependencies
To enable this parameter, deselect the Automatically select CORDIC maximum shift value based on input word length parameter.
Programmatic Use
To set the block parameter value programmatically, use the set_param function.
To get the block parameter value programmatically, use the get_param function.
Parameter: maximumShiftValue
Values: 10 (default) | positive integer-valued scalar
Data Types: char | string
Number of iterations per pipeline register — Number of CORDIC iterations to perform in pipeline stage
1 (default) | positive integer-valued scalar
Number of CORDIC iterations to perform in pipeline stage, specified as a positive integer-valued scalar. For more information, see Customizable Pipelining.
Programmatic Use
To set the block parameter value programmatically, use the set_param function.
To get the block parameter value programmatically, use the get_param function.
Parameter: nIterPerReg
Values: 1 (default) | positive integer-valued scalar
Data Types: char | string
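As an illustration of the programmatic interface, the following sketch sets and reads back the three parameters documented above. It assumes that the CORDICSquareRootModel example model is on the MATLAB path and that the block is named as in the example; the values 12 and 2 are arbitrary choices for illustration. Parameter values are passed as character vectors, in line with the data types listed above.

```matlab
% Assumes the example model is available; adjust the block path for your model.
model = 'CORDICSquareRootModel';
blk = [model,'/CORDIC Square Root HDL Optimized'];
load_system(model);

% Turn off automatic selection so that the manual shift value takes effect.
set_param(blk,'autoMaximumShiftVal','off');
set_param(blk,'maximumShiftValue','12');   % arbitrary illustrative value
set_param(blk,'nIterPerReg','2');          % two iterations per pipeline stage

% Read the settings back as character vectors.
currentShift = get_param(blk,'maximumShiftValue');
currentIterPerReg = get_param(blk,'nIterPerReg');
```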
More About
Algorithms
CORDIC
CORDIC is an acronym for COordinate Rotation DIgital Computer. The Givens rotation-based CORDIC algorithm is one of the most hardware-efficient algorithms available because it requires only iterative shift-add operations (see References). The CORDIC algorithm eliminates the need for explicit multipliers.
For details of the CORDIC-based algorithm used in this block, see Compute Square Root Using CORDIC.
How to Interface with the CORDIC Square Root HDL Optimized Block
Because it is fully pipelined, the CORDIC Square Root HDL Optimized block can accept input data on any cycle, including consecutive clock cycles. To send input data to the block, the validIn signal must be true. When the block has finished the computation and is ready to send the output, it sets validOut to true for one clock cycle. For inputs sent on consecutive cycles, validOut is also set to true on consecutive cycles.
The latency of the block is measured from an input to its corresponding output: for example, from In1 to Out1, from In2 to Out2, from In3 to Out3, and so on.
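The relationship between validIn and validOut can be pictured with a small behavioral model in plain MATLAB. This is only a sketch of the handshake timing, not of the block's datapath: it assumes a fixed latency of 23 cycles (the value from the word-length-16 example with one iteration per pipeline register) and models validOut as validIn delayed by that many cycles.

```matlab
latency = 23;                     % latency from the word-length-16 example
numCycles = 40;                   % length of the simulated window

validIn = false(1,numCycles);
validIn([1 2 3 7]) = true;        % three back-to-back inputs, then one more

% Behavioral assumption: a fully pipelined block delays validIn by 'latency'.
validOut = [false(1,latency), validIn(1:end-latency)];

% Measure latency the same way as in the example: match inputs to outputs.
tIn = find(validIn);
tOut = find(validOut);
measuredLatency = tOut - tIn;     % one entry per input sample
```

Each input sees the same latency, and inputs on consecutive cycles produce outputs on consecutive cycles.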
Use the embblk.latency.cordicSqrtHDLOptimizedLatency function to calculate the latency and the total number of iterations of the block.
Customizable Pipelining
The CORDIC Square Root HDL Optimized block uses a fully pipelined architecture that implements iterative normalization and a CORDIC-based square root algorithm. If the input u is a fixed-point or scaled double data type, the block uses multiple pipeline stages for computation. The normalization requires nextpow2(u.WordLength) iterations. The number of CORDIC iterations depends on the CORDIC maximum shift value. A larger word length can provide higher resolution, but requires more iterations to process. The CORDIC Square Root HDL Optimized block can perform multiple iterations per pipeline stage. This results in lower latency at the cost of a longer critical path in the generated HDL code.
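One way to see why normalization can take nextpow2(WordLength) iterations is a binary-search shifter: each stage conditionally shifts left by half the previous amount (8, 4, 2, 1 for a 16-bit word) until the most significant bit is set. The sketch below illustrates that idea on an unsigned 16-bit stored integer. It is an assumption made for illustration, not a description of the block's internal normalization, and the input value 300 is arbitrary.

```matlab
wordLength = 16;
nStages = nextpow2(wordLength);   % 4 stages for a 16-bit word
x = uint16(300);                  % hypothetical nonzero stored integer
shiftTotal = 0;                   % total left shift applied

for s = nStages-1:-1:0
    step = 2^s;                   % 8, 4, 2, 1
    % If the top 'step' bits are all zero, shift left by 'step'.
    if bitshift(x, -(wordLength - step)) == 0
        x = bitshift(x, step);
        shiftTotal = shiftTotal + step;
    end
end

msbSet = x >= 2^(wordLength-1);   % true once the leading one is in the MSB
```

For the input 300, which has 7 leading zeros in 16 bits, the stages shift by 4, 2, and 1, for a total of 7.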
For example, if the word length of the input u is 16, normalization requires 4 iterations. If the Automatically select CORDIC maximum shift value based on input word length parameter is selected, the CORDIC maximum shift value is 16 - 1 = 15, which requires 17 iterations. The total number of iterations is 4 + 17 = 21, and the latency of the block is 2 + ceil(total number of iterations/nIterPerReg). If the number of iterations per pipeline register is set to 1, the block latency is 23; if it is set to 2, the block latency is 13; and so on. If the number of iterations per pipeline register is greater than or equal to the total number of required iterations, the block performs all iterations in one pipeline stage and the total latency is minimized to 3.
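The latency formula can also be inverted to pick a pipelining setting for a latency budget. The sketch below is plain arithmetic on the formula latency = 2 + ceil(totalIter/nIterPerReg). The target latencies are hypothetical, and each must be at least 3, the minimum latency for fixed-point inputs.

```matlab
totalIter = 21;                         % total iterations for word length 16
targetLatency = [23 13 5 3];            % hypothetical latency budgets, in cycles

% Smallest nIterPerReg such that 2 + ceil(totalIter/nIterPerReg) <= target.
nIterPerReg = ceil(totalIter./(targetLatency - 2));

% Latency actually obtained with those settings.
achievedLatency = 2 + ceil(totalIter./nIterPerReg);
```

Choosing the smallest setting that meets the budget keeps the number of iterations per stage, and therefore the critical path, as short as the budget allows.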
Hardware Resource Utilization
This block supports HDL code generation using the Simulink HDL Workflow Advisor. For an example, see HDL Code Generation and FPGA Synthesis from Simulink Model (HDL Coder) and Implement Digital Downconverter for FPGA (DSP HDL Toolbox).
This example data was generated by synthesizing the block on a Xilinx® Zynq®-7000 xc7z045 SoC. The synthesis tool was Vivado® v2023.1 (win64).
The following parameters were used for synthesis.
Input data type: sfix16_en10
Automatically select CORDIC maximum shift value based on input word length: on
Number of iterations per pipeline register: 1
Target frequency: 200 MHz
Resource | Usage | Available | Utilization (%)
--- | --- | --- | ---
Slice LUTs | 966 | 218600 | 0.44
Slice Registers | 670 | 437200 | 0.15
DSPs | 0 | 900 | 0.00
Block RAM Tile | 0 | 545 | 0.00
URAM | 0 | 0 |
Timing | Value
--- | ---
Requirement | 5 ns (200 MHz)
Data Path Delay | 2.983 ns
Slack | 2.01 ns
Clock Frequency | 334.45 MHz
Extended Capabilities
HDL Code Generation
Generate VHDL, Verilog and SystemVerilog code for FPGA and ASIC designs using HDL Coder™.
HDL Coder™ provides additional configuration options that affect HDL implementation and synthesized logic.
This block has one default HDL architecture.
General |
--- | ---
ConstrainedOutputPipeline | Number of registers to place at the outputs by moving existing delays within your design. Distributed pipelining does not redistribute these registers. The default is
InputPipeline | Number of input pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is
OutputPipeline | Number of output pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is
Only binary-point scaled fixed-point data types are supported for code generation.
Version History
Introduced in R2024a