## Fixed-Point Precision Rules for Avoiding Overflow in FIR Filters

Fixed-point FIR filters are commonly implemented on digital signal processors, FPGAs, and ASICs. A fixed-point filter uses fixed-point arithmetic and is represented by an equation with fixed-point coefficients. If the accumulator and output of the FIR filter do not have sufficient bits to represent their data, overflow occurs and distorts the signal. Use these two rules to determine FIR filter precision settings automatically. The aim is to minimize resource utilization (memory/storage and processing elements) while avoiding overflow. Because the rules are optimized based on the input precision, coefficient precision, and the coefficient values, the FIR filter must have nontunable coefficients.

The precision rules define the minimum and the maximum values of the FIR filter output. To determine these values, perform min/max analysis on the FIR filter coefficients.

### Output Limits for FIR Filters

FIR filter is defined by:

$$y[n]={\displaystyle \sum _{k=0}^{N-1}{h}_{k}x[n-k]}$$

*x[n]*is the input signal.*y[n]*is the output signal.*h*is the_{k}*k*filter coefficient.^{th}*N*is the length of the filter.

**Output Limits for FIR Filters with Real Input and Real
Coefficients**

Let the minimum value of the input signal be
*X _{min}*, where

*X*≤ 0, and the maximum value be

_{min}*X*, where

_{max}*X*≥ 0. The minimum output occurs when you multiply the positive coefficients by

_{max}*X*and the negative coefficients by

_{min}*X*. Similarly, the maximum output occurs when you multiply the positive coefficients by

_{max}*X*and the negative coefficients by

_{max}*X*.

_{min}If the sum of all the positive coefficients is

$${G}^{+}={\displaystyle \sum _{k=0,{h}_{k}>0}^{N-1}{h}_{k}}$$

and the sum of all the negative coefficients is denoted as

$${G}^{-}={\displaystyle \sum _{k=0,{h}_{k}<0}^{N-1}{h}_{k}}$$

then you can express the minimum output of the filter as

$${Y}_{\mathrm{min}}={G}^{+}{X}_{\mathrm{min}}+{G}^{-}{X}_{\mathrm{max}}$$

and the maximum output of the filter as

$${Y}_{\mathrm{max}}={G}^{+}{X}_{\mathrm{max}}+{G}^{-}{X}_{\mathrm{min}}$$

Therefore, the output of the filter lies in the interval
[*Y _{min}*,

*Y*].

_{max}**Complex Filter Convolution Equations**

You can define a complex filter (complex inputs and complex coefficients) in terms of the real and imaginary parts of its signals and coefficients:

$$\begin{array}{l}\mathrm{Re}(y[n])={\displaystyle \sum _{k=0}^{N-1}\mathrm{Re}({h}_{k})\mathrm{Re}(x[n-k])-{\displaystyle \sum _{k=0}^{N-1}\mathrm{Im}({h}_{k})\mathrm{Im}(x[n-k])}}\\ \mathrm{Im}(y[n])={\displaystyle \sum _{k=0}^{N-1}\mathrm{Re}({h}_{k})\mathrm{Im}(x[n-k])+{\displaystyle \sum _{k=0}^{N-1}\mathrm{Im}({h}_{k})\mathrm{Re}(x[n-k])}}\end{array}$$

The complex filter is decomposed into four real filters as depicted in the signal flow diagram. Each signal is annotated with an interval denoting its range.

**Output Limits for FIR Filters with Complex Input and Complex
Coefficients**

You can extend the real filter min/max analysis to complex filters. Assume that both
the real and imaginary parts of the input signal lie in the interval
[*X _{min}*,

*X*].

_{max}The complex filter contains two instances of the filter Re(*h _{k}*). Both filters have the same input range and therefore the same output
range in the interval [

*V*,

^{re}_{min}*V*]. Similarly, the complex filter contains two instances of the filter Im(

^{re}_{max}*h*). Both filters have the same output range in the interval [

_{k}*V*,

^{im}_{min}*V*].

^{im}_{max}Based on the min/max analysis of real filters, you can express
*V ^{re}_{min}*,

*V*,

^{re}_{max}*V*, and

^{im}_{min}*V*as:

^{im}_{max}$$\begin{array}{l}{V}_{\mathrm{min}}^{re}={G}_{re}^{+}{X}_{\mathrm{min}}+{G}_{re}^{-}{X}_{\mathrm{max}}\\ {V}_{\mathrm{max}}^{re}={G}_{re}^{+}{X}_{\mathrm{max}}+{G}_{re}^{-}{X}_{\mathrm{min}}\\ {V}_{\mathrm{min}}^{im}={G}_{im}^{+}{X}_{\mathrm{min}}+{G}_{im}^{-}{X}_{\mathrm{max}}\\ {V}_{\mathrm{max}}^{im}={G}_{im}^{+}{X}_{\mathrm{max}}+{G}_{im}^{-}{X}_{\mathrm{min}}\end{array}$$

*G*is the sum of the positive real parts of^{+}_{re}*h*, given by_{k}$${G}_{re}^{+}={\displaystyle \sum _{k=0,\mathrm{Re}({h}_{k})>0}^{N-1}\mathrm{Re}({h}_{k})}$$

*G*is the sum of the negative real parts of^{-}_{re}*h*, given by_{k}$${G}_{re}^{-}={\displaystyle \sum _{k=0,\mathrm{Re}({h}_{k})<0}^{N-1}\mathrm{Re}({h}_{k})}$$

*G*is the sum of the positive imaginary parts of^{+}_{im}*h*, given by_{k}$${G}_{im}^{+}={\displaystyle \sum _{k=0,\mathrm{Im}({h}_{k})>0}^{N-1}\mathrm{Im}({h}_{k})}$$

*G*is the sum of the negative imaginary parts of^{-}_{im}*h*, given by_{k}$${G}_{im}^{-}={\displaystyle \sum _{k=0,\mathrm{Im}({h}_{k})<0}^{N-1}\mathrm{Im}({h}_{k})}$$

The minimum and maximum values of the real and imaginary parts of the output are:

$$\begin{array}{l}{Y}_{\mathrm{min}}^{re}={V}_{\mathrm{min}}^{re}-{V}_{\mathrm{max}}^{im}\\ {Y}_{\mathrm{max}}^{re}={V}_{\mathrm{max}}^{re}-{V}_{\mathrm{min}}^{im}\\ {Y}_{\mathrm{min}}^{im}={V}_{\mathrm{min}}^{re}+{V}_{\mathrm{min}}^{im}\\ {Y}_{\mathrm{max}}^{im}={V}_{\mathrm{max}}^{re}+{V}_{\mathrm{max}}^{im}\end{array}$$

The worst-case minimum and maximum on either the real or imaginary part of the output is given by

$$\begin{array}{l}{Y}_{\mathrm{min}}=\mathrm{min}({Y}_{\mathrm{min}}^{re},{Y}_{\mathrm{min}}^{im})\\ {Y}_{\mathrm{max}}=\mathrm{max}({Y}_{\mathrm{max}}^{re},{Y}_{\mathrm{max}}^{im})\end{array}$$

### Fixed-Point Precision Rules

The fixed-point precision rules define the output word length and fraction length of the filter in terms of the accumulator word length and fraction length.

**Full-Precision Accumulator Rule**

Assume that the input is a signed or unsigned fixed-point signal with word length
*W _{x}* and fraction length

*F*. Also assume that the coefficients are signed or unsigned fixed-point values with fraction length

_{x}*F*. You can now define full precision as the fixed-point settings that minimize the word length of the accumulator while avoiding overflow or any loss of precision.

_{h}The accumulator fraction length is equal to the product fraction length, which is the sum of the input and coefficient fraction lengths.

$${F}_{a}={F}_{x}+{F}_{h}$$

If

*Y*= 0, then the accumulator is unsigned with word length_{min}$${W}_{a}=\lceil {\mathrm{log}}_{2}({Y}_{\mathrm{max}}{2}^{{F}_{a}}+1)\rceil $$

If *Y _{min}* < 0, then the accumulator is signed with word length

$${W}_{a}=\lceil {\mathrm{log}}_{2}(\mathrm{max}(-{Y}_{\mathrm{min}}{2}^{{F}_{a}},{Y}_{\mathrm{max}}{2}^{{F}_{a}}+1))\rceil +1$$

The ceil operator rounds to the nearest integer towards +∞.

**Output Same Word Length as Input Rule**

This rule sets the output word length to be the same as the input word length. Then,
it adjusts the fraction length to avoid overflow.
*W _{q}* is the output word length and

*F*is the output fraction length.

_{q}Truncate the accumulator to make the output word length same as the input word length.

$${W}_{q}={W}_{x}$$

.

Set the output fraction length *F _{q}* to

$${F}_{q}={F}_{a}-({W}_{a}-{W}_{x})$$

.

### Polyphase Interpolators and Decimators

You can extend these rules to polyphase FIR interpolators and decimators.

**FIR Interpolators**

Treat each polyphase branch of the FIR interpolator as a separate FIR filter. The output data type of the FIR interpolator is the worst-case data type of all the polyphase branches.

**FIR Decimators**

For decimators, the polyphase branches add up at the output. Hence, the output data type is computed as if it were a single FIR filter with all the coefficients of all the polyphase branches.