microblaze: Fix -Os right shift optimization is allowed into delay slot #37

Draft
wants to merge 1 commit into
base: zephyr-gcc-12.2.0

alpsayin

In picolibc testing, this optimization was found to emit a multi-instruction
sequence that the compiler then squeezes into a single delay slot. As a result,
only the first instruction emitted by the optimization runs and the rest is skipped.
The optimization is generated by

  [(set (match_operand:SI 0 "register_operand" "=&d")
       (ashiftrt:SI (match_operand:SI 1 "register_operand"  "d")
                   (match_operand:SI 2 "immediate_operand" "I")))]
  "(INTVAL (operands[2]) > 5 && optimize_size)"
  {
    operands[3] = gen_rtx_REG (SImode, MB_ABI_ASM_TEMP_REGNUM);

    output_asm_insn ("ori\t%3,r0,%2", operands);
    if (REGNO (operands[0]) != REGNO (operands[1]))
        output_asm_insn ("addk\t%0,%1,r0", operands);

    output_asm_insn ("addik\t%3,%3,-1", operands);
    output_asm_insn ("bneid\t%3,.-4", operands);
    return "sra\t%0,%0";
  }
  [(set_attr "type"    "arith")
  (set_attr "mode"    "SI")
  (set_attr "length"  "20")]
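
For readers less familiar with MicroBlaze, the sequence this pattern emits can be modelled roughly as follows. This is a Python sketch and not part of the patch: the names `sra1` and `shift_loop` are made up for illustration, and it assumes only that the delay-slot instruction of `bneid` runs whether or not the branch is taken.

```python
MASK32 = 0xFFFFFFFF


def sra1(x):
    """Arithmetic right shift by one bit on a 32-bit value (like `sra rD, rA`)."""
    x &= MASK32
    return (x >> 1) | (x & 0x80000000)


def shift_loop(src, count):
    """Illustrative model of the five-instruction sequence for `src >> count`.

    Returns the shifted value and how many single-bit shifts were executed.
    """
    tmp = count & MASK32              # ori   %3, r0, %2
    dst = src & MASK32                # addk  %0, %1, r0  (omitted when %0 == %1)
    shifts = 0
    while True:
        tmp = (tmp - 1) & MASK32      # addik %3, %3, -1
        taken = tmp != 0              # bneid %3, .-4
        dst = sra1(dst)               # sra   %0, %0  (delay slot: runs either way)
        shifts += 1
        if not taken:
            break
    return dst, shifts
```

The point of the sketch is that the pattern is a loop spanning five instructions (20 bytes, matching the `length` attribute), so it only produces the right result if all of them execute in order.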

But the arith type is not excluded from delay slots (somehow):

[(and (eq_attr "type" "!branch,call,jump,icmp,multi,no_delay_arith,no_delay_load,no_delay_store,no_delay_imul,no_delay_move,darith")
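
A rough way to read that condition is as a deny-list on the `type` attribute. The Python sketch below models only the type test quoted above; any further terms of the `and` are ignored, and the names `NO_DELAY_SLOT_TYPES` and `may_fill_delay_slot` are hypothetical.

```python
# Types excluded from delay slots, copied from the quoted eq_attr test.
NO_DELAY_SLOT_TYPES = frozenset({
    "branch", "call", "jump", "icmp", "multi",
    "no_delay_arith", "no_delay_load", "no_delay_store",
    "no_delay_imul", "no_delay_move", "darith",
})


def may_fill_delay_slot(insn_type):
    """True if the quoted type test alone would allow this insn in a delay slot."""
    return insn_type not in NO_DELAY_SLOT_TYPES


# "arith" is not on the deny-list, so the 20-byte shift sequence is a candidate.
print(may_fill_delay_slot("arith"))
# "multi" is on the deny-list.
print(may_fill_delay_slot("multi"))
```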

The code generated by the optimization is at 191b8-191c8:

   191a8:	bc830194 	bgti	r3, 404		// 1933c
    if (subnormal_y) { /* subnormal y */
   191ac:	b0007ff0 	imm	32752
   191b0:	a47c0000 	andi	r3, r28, 0
   191b4:	be2301ec 	bneid	r3, 492		// 193a0
   191b8:	a2400014 	ori	r18, r0, 20
   191bc:	131e0000 	addk	r24, r30, r0
   191c0:	3252ffff 	addik	r18, r18, -1
   191c4:	be32fffc 	bneid	r18, -4		// 191c0
   191c8:	93180001 	sra	r24, r24
...
        iy = (hy >> 20) - 1023;
   193a0:	b810fe40 	brid	-448		// 191e0
   193a4:	3318fc01 	addik	r24, r24, -1023

where operands are:

operands[0] = r24
operands[1] = r30
operands[2] = 20
operands[3] = r18

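Only the ori at 191b8 sits in the delay slot of the bneid at 191b4, so when that
branch is taken the addk and the shift loop never run. As a result, this code
returns an iy (r24) value of whatever was already in r24, minus 1023.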
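
The failure can be reproduced with a toy interpreter over the instructions from the dump above. This is only a sketch: it handles just the opcodes that appear here, ignores the carry flag, and the initial register values (a junk r24, and an hy of 0x40100000 in r30) are made up for illustration. The names `PROGRAM`, `execute` and `run` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# Address -> instruction, transcribed from the disassembly.
PROGRAM = {
    0x191B4: ("bneid", "r3", 492),
    0x191B8: ("ori", "r18", "r0", 20),
    0x191BC: ("addk", "r24", "r30", "r0"),
    0x191C0: ("addik", "r18", "r18", -1),
    0x191C4: ("bneid", "r18", -4),
    0x191C8: ("sra", "r24", "r24"),
    0x193A0: ("brid", -448),
    0x193A4: ("addik", "r24", "r24", -1023),
}


def execute(insn, regs):
    """Execute one non-branch instruction on the register dict."""
    op, *a = insn
    if op == "ori":
        regs[a[0]] = (regs[a[1]] | a[2]) & MASK32
    elif op == "addk":
        regs[a[0]] = (regs[a[1]] + regs[a[2]]) & MASK32
    elif op == "addik":
        regs[a[0]] = (regs[a[1]] + a[2]) & MASK32
    elif op == "sra":
        x = regs[a[1]]
        regs[a[0]] = (x >> 1) | (x & 0x80000000)
    else:
        raise ValueError("unsupported opcode: " + op)


def run(program, regs, pc):
    """Run until pc leaves the transcribed fragment; return the final pc."""
    while pc in program:
        op, *a = program[pc]
        if op in ("bneid", "brid"):
            if op == "bneid":
                taken, offset = regs[a[0]] != 0, a[1]
            else:
                taken, offset = True, a[0]
            execute(program[pc + 4], regs)   # the delay slot always executes
            pc = pc + offset if taken else pc + 8
        else:
            execute(program[pc], regs)
            pc += 4
    return pc


# Branch at 191b4 taken (r3 != 0): only the ori runs before jumping to 193a0.
regs = {"r0": 0, "r3": 0x40100000, "r18": 0, "r24": 0xDEADBEEF, "r30": 0x40100000}
final_pc = run(PROGRAM, regs, 0x191B4)
print(hex(final_pc), hex(regs["r24"]), regs["r18"])
```

With these inputs the intended result would be (0x40100000 >> 20) - 1023 = 2, but r24 ends up as the junk value minus 1023 because the addk and the loop are skipped.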

The fix is simple. I've redeclared the size-optimized right shift as type multi, which is

  1. not allowed in delay slots
  2. the same type the other shift optimizations already use (they're left shift optimizations):

  [(set_attr "type" "multi")
  [(set_attr "type" "multi")

Currently under test via zephyrproject-rtos/sdk-ng#647

Signed-off-by: Alp Sayin <[email protected]>
alpsayin force-pushed the zephyr-gcc-12.2.0-bad-Os-shift-optimisation branch from 39eb6ac to 0e3007a on October 18, 2024 17:06