GNU Arm Embedded Toolchain

G++ generating redundant code

Asked by Michael Haben on 2016-09-09

(This is using the 2016q2 release of the toolchain, GCC 5.4.1).

Compile the following first with arm-none-eabi-gcc and then with arm-none-eabi-g++, without any optimisation (i.e. -O0):

void doStuff(void);
int getNum(void);

void compare1(int n)
{
    if(n > 33)
    {
        doStuff();
    }
    if(getNum() > 44)
    {
        doStuff();
    }
}

GCC generates pretty much what I'd expect:

  27 0010 08301BE5 ldr r3, [fp, #-8]
  28 0014 210053E3 cmp r3, #33
  29 0018 000000DA ble .L2
  30 001c FEFFFFEB bl doStuff
  31 .L2:
  32 0020 FEFFFFEB bl getNum
  33 0024 0030A0E1 mov r3, r0
  34 0028 2C0053E3 cmp r3, #44
  35 002c 000000DA ble .L4
  36 0030 FEFFFFEB bl doStuff
  37 .L4:

But from G++:

  32 0010 08301BE5 ldr r3, [fp, #-8]
  33 0014 210053E3 cmp r3, #33
  34 0018 000000DA ble .L2
  35 001c FEFFFFEB bl _Z8doStuffv
  36 .L2:
  37 0020 FEFFFFEB bl _Z6getNumv
  38 0024 0030A0E1 mov r3, r0
  39 0028 2C0053E3 cmp r3, #44
  40 002c 0130A0C3 movgt r3, #1 <<< why is
  41 0030 0030A0D3 movle r3, #0 <<< all this
  42 0034 FF3003E2 and r3, r3, #255 <<< extra code
  43 0038 000053E3 cmp r3, #0 <<< generated?
  44 003c 0000000A beq .L4
  45 0040 FEFFFFEB bl _Z8doStuffv
  46 .L4:

It gets worse if G++ is given the -mthumb switch:

  40 0012 FFF7FEFF bl _Z6getNumv
  41 0016 0200 movs r2, r0
  42 0018 0123 movs r3, #1 <<< again
  43 001a 2C2A cmp r2, #44
  44 001c 00DC bgt .L3
  45 001e 0023 movs r3, #0 <<< that
  46 .L3:
  47 0020 1B06 lsls r3, r3, #24 <<< redundant
  48 0022 1B0E lsrs r3, r3, #24 <<< code
  49 0024 01D0 beq .L5 <<< and even an extra branch instruction!
  50 0026 FFF7FEFF bl _Z8doStuffv
  51 .L5:

And with the -mcpu=cortex-m3 switch:

  40 0016 0346 mov r3, r0
  41 0018 2C2B cmp r3, #44
  42 001a CCBF ite gt <<< good
  43 001c 0123 movgt r3, #1 <<< use
  44 001e 0023 movle r3, #0 <<< of
  45 0020 DBB2 uxtb r3, r3 <<< Thumb-2
  46 0022 002B cmp r3, #0 <<< conditional-execution!
  47 0024 01D0 beq .L4
  48 0026 FFF7FEFF bl _Z8doStuffv
  49 .L4:

It seems to be using a uint8_t as a flag: initially setting it true, changing it to false if the condition is not met, then zero-extending the uint8_t to 32 bits before branching on its value. That is 7 instructions instead of 2, and it uses an extra register. So, two questions: why is it using a flag at all, and why is the flag handled as a byte rather than a word?
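For illustration, the flag-based sequence in the G++ listings corresponds roughly to the source below. This is only a sketch of what the generated code appears to compute, not a statement of how GCC represents it internally. The stub bodies of doStuff() and getNum(), the variables calls and nextNum, and the name compare1AsGenerated are hypothetical, added so that the fragment compiles on its own. One relevant language fact: in C++ the result of a relational expression has type bool (in C it has type int), and bool is normally one byte on ARM, which may be why the temporary is narrowed to 8 bits.

```cpp
#include <type_traits>

// Hypothetical stubs so the sketch is self-contained.
static int calls = 0;     // counts calls to doStuff()
static int nextNum = 0;   // value returned by getNum()

void doStuff() { ++calls; }
int getNum() { return nextNum; }

// In C++ a comparison yields bool; the operand is not evaluated here.
static_assert(std::is_same<decltype(getNum() > 44), bool>::value,
              "relational expressions have type bool in C++");

// Source-level picture of what the -O0 G++ output seems to do.
void compare1AsGenerated(int n)
{
    if (n > 33)
    {
        doStuff();
    }
    bool flag = getNum() > 44;  // materialise the comparison result (movgt/movle)
    if (flag)                   // narrow to a byte, compare with 0, branch
    {
        doStuff();
    }
}
```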

Admittedly, the redundant instructions disappear if I enable optimisation at -O1, but that's not ideal for debugging.

(Why am I bothered? Because I'm looking at whether the Keil product justifies its cost, and one of the differences I've found is the efficiency of code it generates without optimisations enabled. It would really be better for us devs if we could keep on using Atmel Studio and feel confident that arm-g++ wasn't eating our precious code-space and CPU cycles with stuff like this! Friday rant over...)

Thanks,
Mike H

Question information

Language: English
Status: Solved
For: GNU Arm Embedded Toolchain
Assignee: No assignee
Solved by: Michael Haben
Solved: 2016-09-12
Last query: 2016-09-12
Last reply: 2016-09-09

Andre Vieira (andre-simoesdiasvieira) said on 2016-09-09:

Hi Mike,

Usually I would discard such things as "O0 is supposed to be dumb", but this does look rather odd. I suggest you file a bug upstream, see https://gcc.gnu.org/bugzilla/ .

May I ask why you don't use "-Og"?

Regards,
Andre

Michael Haben (mike.haben) said on 2016-09-09:

Hi Andre,
thanks for replying - I'm glad I'm not the only one who regards this as a bug! I'll file it with GNU...

I've read of other people having problems debugging even using -Og - it seems that "optimizations that do not interfere with debugging" don't always not-interfere. I'd feel more confident in a compiler that didn't generate oddities like this in the first place, rather than relying on the optimiser to clean them up.

best regards,
Mike H.

Michael Haben (mike.haben) said on 2016-09-12:

Reply from one of the GCC maintainers:

<QUOTE>
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is not a bug which we are going to fix.
The reason why C++ is different is due to the getNum() needs to be wrapped with
a tree which allows for eh (exception handling) to be correct. GCC likes to wrap "complex"
statement but does not go and remove the wrapping when it can be proved it is
no longer needed.
If there was a temp associated with getNum you would need a cleanup statements
which is why the wrapping happens in the first place.

Basically GCC project has mentioned -O0 code generation is not always the best,
C++ is even worse.

By the way you get the same code between the two front-ends if you used a
variable for the return of getNum().
</QUOTE>
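The last point in the quote can be sketched as follows: the return value of getNum() is stored in a local variable before the comparison, instead of being compared inside the if condition. This is a minimal sketch, assuming that is what the maintainer means by "used a variable"; the resulting code generation has not been checked here. The stub bodies, the variables doStuffCalls and fakeNum, and the name compare2 are hypothetical and exist only so the fragment compiles on its own.

```cpp
// Hypothetical stubs so the sketch is self-contained.
static int doStuffCalls = 0;  // counts calls to doStuff()
static int fakeNum = 0;       // value returned by getNum()

void doStuff() { ++doStuffCalls; }
int getNum() { return fakeNum; }

// Same logic as compare1(), but the call result goes through a named local.
void compare2(int n)
{
    if (n > 33)
    {
        doStuff();
    }
    int num = getNum();  // hold the result in an int variable first
    if (num > 44)        // then compare the variable, not the call expression
    {
        doStuff();
    }
}
```

The behaviour is identical to compare1(); only the way the call result reaches the comparison changes.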

So... Clang, anyone?

Michael Haben (mike.haben) said on 2016-09-12:

Eventually answered my own question, sort-of: using the single optimisation-switch -ftree-ter (Temporary Expression Replacement) eliminates the extraneous code. Keepin' the faith with GCC!
