Accumulator Data Type not used in ERT generated code

Here's an example model (Made/tested using R2017b):
As you can see, I've made sure all data types used anywhere in this model are 'uint16'. I've also configured the Sum block to use 'uint16' as the accumulator data type, by inheriting it from the first input.
Here is the _step() function that gets generated when building this model using embedded coder (ERT target):
/* Model step function */
void testCounter_step(void)
{
  /* UnitDelay: '<Root>/Unit Delay' */
  testCounter_B.UnitDelay = testCounter_DW.UnitDelay_DSTATE;

  /* Outport: '<Root>/count' incorporates:
   *  Constant: '<Root>/Constant1'
   *  Sum: '<Root>/Sum'
   */
  testCounter_Y.count = (uint16_T)(((uint32_T)testCounter_B.UnitDelay) +
    ((uint32_T)((uint16_T)1U)));

  /* Update for UnitDelay: '<Root>/Unit Delay' incorporates:
   *  Outport: '<Root>/count'
   */
  testCounter_DW.UnitDelay_DSTATE = testCounter_Y.count;
}
The actual sum operation converts both inputs to uint32, performs the addition, then converts the result back to uint16. Why is it doing this? I thought that specifying the accumulator data type to be the same as the first input would avoid these unnecessary conversions.
For reference, here are the hardware settings (which should support 16-bit native math using 'short'). Let me know if there are any other model settings that need to be looked at.

Accepted Answer

Andy Bartlett on 17 Mar 2021
Edited: Andy Bartlett on 17 Mar 2021
The C language was designed to closely match what computers do. It is common for a computer's CPU to be based around a particular base integer type, such as 16 bits or 32 bits. The core CPU registers will all be this size. Most machine instructions like addition, subtraction, etc. work on this size integer. Some might be available for bigger types, but that is not important to this discussion. The size of this base integer register type will be the size a C compiler declares for int. Other than load and store, it is uncommon for scalar machine instructions like add to work on types smaller than the base integer type.
As an example, let's consider your CPU-compiler pair with int of 32 bits, short of 16 bits, and char of 8 bits. The CPU will have machine-level instructions to add 32-bit values, but not 16-bit or 8-bit values. Instead, a load will pull the 8-bit or 16-bit value from memory and put it into a 32-bit register, with appropriate "sign extending" of the extra MS bits based on whether it was an unsigned or signed load instruction. The machine-level instruction for 32-bit addition will then be executed. If an immediate downcast back to 8 or 16 bits is desired, then the least significant bits would be stored into memory, or a bitwise operation would "whack" the MS bits of the register to make it waddle-and-quack like an 8-bit or 16-bit value.
Because of this CPU reality, the C language has what are called the "Usual Unary and Binary Rules" for type promotion. For example, the rules state that binary operations on short or char operands will have those operands promoted to int before the actual operation is performed. This C language behavior matches what was just described at the machine level, except the subsequent downcast does not automatically happen. If the extra instructions to downcast are desired, the C code author must explicitly include the cast or assign the expression to a smaller type.
The generated C code you showed is just being verbose about the upcasts to integer prior to the addition.
testCounter_Y.count = (uint16_T)(((uint32_T)testCounter_B.UnitDelay) + ((uint32_T)((uint16_T)1U)));
If the text of the code were instead this:
testCounter_Y.count = testCounter_B.UnitDelay + ((uint16_T)1U);
The generated machine code would still be identical (*). Following C language rules, the inputs to the addition would be upcast to integer (32 bits). The addition would be a 32-bit operation. The assignment would still have to do a downcast from the 32-bit integer register into the 16-bit variable. Textually it may look more efficient, but it is really the same.
In summary, there is no efficiency penalty. The text of the C code is just being explicit about what the C language rules dictate and what the CPU will naturally do at the machine instruction level.
(*) There is one small difference between the two pieces of text. C rules say the second text would promote to signed 32-bit int instead of unsigned. But you can still expect the efficiency to be the same assuming a reasonably smart compiler.
  4 Comments
Andy Bartlett on 17 Mar 2021
Hi Paul,
The accumulator type becomes beneficial when the output has less precision or less range than a full precision implementation step would have.
The sum block does internal steps in the range and precision of the accumulator type and then the final answer is cast to the output type.
For example, suppose the sum block has three inputs and is configured as
y = u1 - u2 + u3
with each input being uint32 type
output type is also uint32
and overflows are configured to saturate
The internal operations involve these steps
accum = cast_to_accum_type( u1 )
temp = cast_to_accum_type( u2 )
accum = sat( accum - temp )
temp = cast_to_accum_type( u3 )
accum = sat( accum + temp )
y = cast_to_out_type( accum )
Consider this input
u1 = 0
u2 = 105
u3 = 110
If the accumulator type is uint32 then strange answers can occur.
accum = u1 = 0
temp = 105
accum = sat( accum - temp ) = sat( 0 - 105 ) = 0
temp = 110
accum = sat( accum + temp ) = sat( 0 + 110 ) = 110
y = accum = 110
Now consider doing the operations in an int64 accumulator type
accum = u1 = 0
temp = 105
accum = sat( accum - temp ) = sat( 0 - 105 ) = -105
temp = 110
accum = sat( accum + temp ) = sat( -105 + 110 ) = 5
y = sat( accum ) = sat( 5 ) = 5
With the big accumulator, the ideal answer of 5 is produced.
With the small accumulator identical to the output type, the answer is off by 105.
Regards,
Andy
Paul Rancuret
Paul Rancuret on 17 Mar 2021
Yes that's true. I generally avoid setting blocks to saturate on overflow (both for this reason and to prevent dead code in cases where I know realistically it will never overflow). Thanks again for your answers, this clarification is helpful for me and others I'm sure!
