Conversation

@pfusik (Contributor) commented Nov 6, 2025

This changes muls by 3 << C from (X << (C + 2)) - (X << C)
to (X << (C + 1)) + (X << C).
If Zba is available, the output is not affected as we emit
(shl (sh1add X, X), C) instead.

There are two advantages:

  • ADD is more compressible
  • Often a reduced instruction count, based on the heuristic that
    (X << (C + 1)) is more likely to have another use than (X << (C + 2))
@llvmbot (Member) commented Nov 6, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Piotr Fusik (pfusik)

Changes

ADD is more compressible.


Patch is 130.66 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/166757.diff

14 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/mul.ll (+28-28)
  • (modified) llvm/test/CodeGen/RISCV/pr145360.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rv32xtheadba.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rv32zba.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/rv64xtheadba.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rv64zba.ll (+22-22)
  • (modified) llvm/test/CodeGen/RISCV/rvv/calling-conv-fastcc.ll (+11-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/extract-subvector.ll (+4-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll (+36-39)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll (+600-640)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vreductions-fp-sdnode.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/srem-seteq-illegal-types.ll (+20-20)
  • (modified) llvm/test/CodeGen/RISCV/xqciac.ll (+2-2)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 995ae75da1c30..411ca744b1e7e 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -16477,12 +16477,12 @@ static SDValue expandMulToAddOrSubOfShl(SDNode *N, SelectionDAG &DAG,
   uint64_t MulAmtLowBit = MulAmt & (-MulAmt);
   ISD::NodeType Op;
   uint64_t ShiftAmt1;
-  if (isPowerOf2_64(MulAmt + MulAmtLowBit)) {
-    Op = ISD::SUB;
-    ShiftAmt1 = MulAmt + MulAmtLowBit;
-  } else if (isPowerOf2_64(MulAmt - MulAmtLowBit)) {
+  if (isPowerOf2_64(MulAmt - MulAmtLowBit)) {
     Op = ISD::ADD;
     ShiftAmt1 = MulAmt - MulAmtLowBit;
+  } else if (isPowerOf2_64(MulAmt + MulAmtLowBit)) {
+    Op = ISD::SUB;
+    ShiftAmt1 = MulAmt + MulAmtLowBit;
   } else {
     return SDValue();
   }
diff --git a/llvm/test/CodeGen/RISCV/mul.ll b/llvm/test/CodeGen/RISCV/mul.ll
index 4c9a98cabb15f..4533e14c672e7 100644
--- a/llvm/test/CodeGen/RISCV/mul.ll
+++ b/llvm/test/CodeGen/RISCV/mul.ll
@@ -1185,29 +1185,29 @@ define i32 @muli32_p384(i32 %a) nounwind {
 ; RV32I-LABEL: muli32_p384:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: slli a1, a0, 7
-; RV32I-NEXT: slli a0, a0, 9
-; RV32I-NEXT: sub a0, a0, a1
+; RV32I-NEXT: slli a0, a0, 8
+; RV32I-NEXT: add a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32IM-LABEL: muli32_p384:
 ; RV32IM: # %bb.0:
 ; RV32IM-NEXT: slli a1, a0, 7
-; RV32IM-NEXT: slli a0, a0, 9
-; RV32IM-NEXT: sub a0, a0, a1
+; RV32IM-NEXT: slli a0, a0, 8
+; RV32IM-NEXT: add a0, a0, a1
 ; RV32IM-NEXT: ret
 ;
 ; RV64I-LABEL: muli32_p384:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: slli a1, a0, 7
-; RV64I-NEXT: slli a0, a0, 9
-; RV64I-NEXT: sub a0, a0, a1
+; RV64I-NEXT: slli a0, a0, 8
+; RV64I-NEXT: add a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IM-LABEL: muli32_p384:
 ; RV64IM: # %bb.0:
 ; RV64IM-NEXT: slli a1, a0, 7
-; RV64IM-NEXT: slli a0, a0, 9
-; RV64IM-NEXT: subw a0, a0, a1
+; RV64IM-NEXT: slli a0, a0, 8
+; RV64IM-NEXT: addw a0, a0, a1
 ; RV64IM-NEXT: ret
   %1 = mul i32 %a, 384
   ret i32 %1
@@ -1217,29 +1217,29 @@ define i32 @muli32_p12288(i32 %a) nounwind {
 ; RV32I-LABEL: muli32_p12288:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: slli a1, a0, 12
-; RV32I-NEXT: slli a0, a0, 14
-; RV32I-NEXT: sub a0, a0, a1
+; RV32I-NEXT: slli a0, a0, 13
+; RV32I-NEXT: add a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32IM-LABEL: muli32_p12288:
 ; RV32IM: # %bb.0:
 ; RV32IM-NEXT: slli a1, a0, 12
-; RV32IM-NEXT: slli a0, a0, 14
-; RV32IM-NEXT: sub a0, a0, a1
+; RV32IM-NEXT: slli a0, a0, 13
+; RV32IM-NEXT: add a0, a0, a1
 ; RV32IM-NEXT: ret
 ;
 ; RV64I-LABEL: muli32_p12288:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: slli a1, a0, 12
-; RV64I-NEXT: slli a0, a0, 14
-; RV64I-NEXT: sub a0, a0, a1
+; RV64I-NEXT: slli a0, a0, 13
+; RV64I-NEXT: add a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IM-LABEL: muli32_p12288:
 ; RV64IM: # %bb.0:
 ; RV64IM-NEXT: slli a1, a0, 12
-; RV64IM-NEXT: slli a0, a0, 14
-; RV64IM-NEXT: subw a0, a0, a1
+; RV64IM-NEXT: slli a0, a0, 13
+; RV64IM-NEXT: addw a0, a0, a1
 ; RV64IM-NEXT: ret
   %1 = mul i32 %a, 12288
   ret i32 %1
@@ -2117,14 +2117,14 @@ define i64 @muland_demand(i64 %x) nounwind {
 ; RV32IM: # %bb.0:
 ; RV32IM-NEXT: andi a0, a0, -8
 ; RV32IM-NEXT: slli a2, a1, 2
-; RV32IM-NEXT: slli a1, a1, 4
-; RV32IM-NEXT: sub a1, a1, a2
+; RV32IM-NEXT: slli a1, a1, 3
+; RV32IM-NEXT: add a1, a1, a2
 ; RV32IM-NEXT: li a2, 12
 ; RV32IM-NEXT: mulhu a2, a0, a2
 ; RV32IM-NEXT: add a1, a2, a1
 ; RV32IM-NEXT: slli a2, a0, 2
-; RV32IM-NEXT: slli a0, a0, 4
-; RV32IM-NEXT: sub a0, a0, a2
+; RV32IM-NEXT: slli a0, a0, 3
+; RV32IM-NEXT: add a0, a0, a2
 ; RV32IM-NEXT: ret
 ;
 ; RV64I-LABEL: muland_demand:
@@ -2133,16 +2133,16 @@ define i64 @muland_demand(i64 %x) nounwind {
 ; RV64I-NEXT: srli a1, a1, 2
 ; RV64I-NEXT: and a0, a0, a1
 ; RV64I-NEXT: slli a1, a0, 2
-; RV64I-NEXT: slli a0, a0, 4
-; RV64I-NEXT: sub a0, a0, a1
+; RV64I-NEXT: slli a0, a0, 3
+; RV64I-NEXT: add a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IM-LABEL: muland_demand:
 ; RV64IM: # %bb.0:
 ; RV64IM-NEXT: andi a0, a0, -8
 ; RV64IM-NEXT: slli a1, a0, 2
-; RV64IM-NEXT: slli a0, a0, 4
-; RV64IM-NEXT: sub a0, a0, a1
+; RV64IM-NEXT: slli a0, a0, 3
+; RV64IM-NEXT: add a0, a0, a1
 ; RV64IM-NEXT: ret
   %and = and i64 %x, 4611686018427387896
   %mul = mul i64 %and, 12
@@ -2171,15 +2171,15 @@ define i64 @mulzext_demand(i32 signext %x) nounwind {
 ; RV64I-LABEL: mulzext_demand:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: slli a1, a0, 32
-; RV64I-NEXT: slli a0, a0, 34
-; RV64I-NEXT: sub a0, a0, a1
+; RV64I-NEXT: slli a0, a0, 33
+; RV64I-NEXT: add a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IM-LABEL: mulzext_demand:
 ; RV64IM: # %bb.0:
 ; RV64IM-NEXT: slli a1, a0, 32
-; RV64IM-NEXT: slli a0, a0, 34
-; RV64IM-NEXT: sub a0, a0, a1
+; RV64IM-NEXT: slli a0, a0, 33
+; RV64IM-NEXT: add a0, a0, a1
 ; RV64IM-NEXT: ret
   %ext = zext i32 %x to i64
   %mul = mul i64 %ext, 12884901888
diff --git a/llvm/test/CodeGen/RISCV/pr145360.ll b/llvm/test/CodeGen/RISCV/pr145360.ll
index 1c77fadbd4b7d..013bab4ce6292 100644
--- a/llvm/test/CodeGen/RISCV/pr145360.ll
+++ b/llvm/test/CodeGen/RISCV/pr145360.ll
@@ -27,11 +27,11 @@ define i32 @unsigned(i32 %0, ptr %1) {
 ; CHECK-NEXT: slli a4, a3, 32
 ; CHECK-NEXT: mulhu a2, a2, a4
 ; CHECK-NEXT: srli a2, a2, 36
-; CHECK-NEXT: slli a4, a2, 5
-; CHECK-NEXT: slli a2, a2, 3
-; CHECK-NEXT: sub a2, a2, a4
+; CHECK-NEXT: slli a4, a2, 3
+; CHECK-NEXT: slli a2, a2, 4
+; CHECK-NEXT: add a2, a2, a4
 ; CHECK-NEXT: srliw a4, a0, 3
-; CHECK-NEXT: add a2, a0, a2
+; CHECK-NEXT: sub a2, a0, a2
 ; CHECK-NEXT: mulw a0, a4, a3
 ; CHECK-NEXT: sw a2, 0(a1)
 ; CHECK-NEXT: ret
@@ -68,10 +68,10 @@ define i32 @unsigned_div_first(i32 %0, ptr %1) {
 ; CHECK-NEXT: slli a3, a3, 32
 ; CHECK-NEXT: mulhu a2, a2, a3
 ; CHECK-NEXT: srli a2, a2, 36
-; CHECK-NEXT: slli a3, a2, 5
-; CHECK-NEXT: slli a4, a2, 3
-; CHECK-NEXT: sub a4, a4, a3
-; CHECK-NEXT: add a0, a0, a4
+; CHECK-NEXT: slli a3, a2, 3
+; CHECK-NEXT: slli a4, a2, 4
+; CHECK-NEXT: add a3, a4, a3
+; CHECK-NEXT: sub a0, a0, a3
 ; CHECK-NEXT: sw a0, 0(a1)
 ; CHECK-NEXT: mv a0, a2
 ; CHECK-NEXT: ret
diff --git a/llvm/test/CodeGen/RISCV/rv32xtheadba.ll b/llvm/test/CodeGen/RISCV/rv32xtheadba.ll
index 0e4a5c07020ee..fd341da86599f 100644
--- a/llvm/test/CodeGen/RISCV/rv32xtheadba.ll
+++ b/llvm/test/CodeGen/RISCV/rv32xtheadba.ll
@@ -98,8 +98,8 @@ define i32 @addmul6(i32 %a, i32 %b) {
 ; RV32I-LABEL: addmul6:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: slli a2, a0, 1
-; RV32I-NEXT: slli a0, a0, 3
-; RV32I-NEXT: sub a0, a0, a2
+; RV32I-NEXT: slli a0, a0, 2
+; RV32I-NEXT: add a0, a0, a2
 ; RV32I-NEXT: add a0, a0, a1
 ; RV32I-NEXT: ret
 ;
@@ -136,8 +136,8 @@ define i32 @addmul12(i32 %a, i32 %b) {
 ; RV32I-LABEL: addmul12:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: slli a2, a0, 2
-; RV32I-NEXT: slli a0, a0, 4
-; RV32I-NEXT: sub a0, a0, a2
+; RV32I-NEXT: slli a0, a0, 3
+; RV32I-NEXT: add a0, a0, a2
 ; RV32I-NEXT: add a0, a0, a1
 ; RV32I-NEXT: ret
 ;
@@ -193,8 +193,8 @@ define i32 @addmul24(i32 %a, i32 %b) {
 ; RV32I-LABEL: addmul24:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: slli a2, a0, 3
-; RV32I-NEXT: slli a0, a0, 5
-; RV32I-NEXT: sub a0, a0, a2
+; RV32I-NEXT: slli a0, a0, 4
+; RV32I-NEXT: add a0, a0, a2
 ; RV32I-NEXT: add a0, a0, a1
 ; RV32I-NEXT: ret
 ;
@@ -269,8 +269,8 @@ define i32 @mul96(i32 %a) {
 ; RV32I-LABEL: mul96:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: slli a1, a0, 5
-; RV32I-NEXT: slli a0, a0, 7
-; RV32I-NEXT: sub a0, a0, a1
+; RV32I-NEXT: slli a0, a0, 6
+; RV32I-NEXT: add a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32XTHEADBA-LABEL: mul96:
diff --git a/llvm/test/CodeGen/RISCV/rv32zba.ll b/llvm/test/CodeGen/RISCV/rv32zba.ll
index a6dbd94caad4f..ea9d117f2e2e3 100644
--- a/llvm/test/CodeGen/RISCV/rv32zba.ll
+++ b/llvm/test/CodeGen/RISCV/rv32zba.ll
@@ -85,8 +85,8 @@ define i32 @addmul6(i32 %a, i32 %b) {
 ; RV32I-LABEL: addmul6:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: slli a2, a0, 1
-; RV32I-NEXT: slli a0, a0, 3
-; RV32I-NEXT: sub a0, a0, a2
+; RV32I-NEXT: slli a0, a0, 2
+; RV32I-NEXT: add a0, a0, a2
 ; RV32I-NEXT: add a0, a0, a1
 ; RV32I-NEXT: ret
 ;
@@ -135,8 +135,8 @@ define i32 @addmul12(i32 %a, i32 %b) {
 ; RV32I-LABEL: addmul12:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: slli a2, a0, 2
-; RV32I-NEXT: slli a0, a0, 4
-; RV32I-NEXT: sub a0, a0, a2
+; RV32I-NEXT: slli a0, a0, 3
+; RV32I-NEXT: add a0, a0, a2
 ; RV32I-NEXT: add a0, a0, a1
 ; RV32I-NEXT: ret
 ;
@@ -210,8 +210,8 @@ define i32 @addmul24(i32 %a, i32 %b) {
 ; RV32I-LABEL: addmul24:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: slli a2, a0, 3
-; RV32I-NEXT: slli a0, a0, 5
-; RV32I-NEXT: sub a0, a0, a2
+; RV32I-NEXT: slli a0, a0, 4
+; RV32I-NEXT: add a0, a0, a2
 ; RV32I-NEXT: add a0, a0, a1
 ; RV32I-NEXT: ret
 ;
@@ -310,8 +310,8 @@ define i32 @mul96(i32 %a) {
 ; RV32I-LABEL: mul96:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: slli a1, a0, 5
-; RV32I-NEXT: slli a0, a0, 7
-; RV32I-NEXT: sub a0, a0, a1
+; RV32I-NEXT: slli a0, a0, 6
+; RV32I-NEXT: add a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32ZBA-LABEL: mul96:
@@ -1272,8 +1272,8 @@ define ptr @shl_add_knownbits(ptr %p, i32 %i) {
 ; RV32I-NEXT: slli a1, a1, 18
 ; RV32I-NEXT: srli a1, a1, 18
 ; RV32I-NEXT: slli a2, a1, 1
-; RV32I-NEXT: slli a1, a1, 3
-; RV32I-NEXT: sub a1, a1, a2
+; RV32I-NEXT: slli a1, a1, 2
+; RV32I-NEXT: add a1, a1, a2
 ; RV32I-NEXT: srli a1, a1, 3
 ; RV32I-NEXT: add a0, a0, a1
 ; RV32I-NEXT: ret
diff --git a/llvm/test/CodeGen/RISCV/rv64xtheadba.ll b/llvm/test/CodeGen/RISCV/rv64xtheadba.ll
index f4964288e3541..c57dfca1389b6 100644
--- a/llvm/test/CodeGen/RISCV/rv64xtheadba.ll
+++ b/llvm/test/CodeGen/RISCV/rv64xtheadba.ll
@@ -94,8 +94,8 @@ define i64 @addmul6(i64 %a, i64 %b) {
 ; RV64I-LABEL: addmul6:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: slli a2, a0, 1
-; RV64I-NEXT: slli a0, a0, 3
-; RV64I-NEXT: sub a0, a0, a2
+; RV64I-NEXT: slli a0, a0, 2
+; RV64I-NEXT: add a0, a0, a2
 ; RV64I-NEXT: add a0, a0, a1
 ; RV64I-NEXT: ret
 ;
@@ -113,8 +113,8 @@ define i64 @disjointormul6(i64 %a, i64 %b) {
 ; RV64I-LABEL: disjointormul6:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: slli a2, a0, 1
-; RV64I-NEXT: slli a0, a0, 3
-; RV64I-NEXT: sub a0, a0, a2
+; RV64I-NEXT: slli a0, a0, 2
+; RV64I-NEXT: add a0, a0, a2
 ; RV64I-NEXT: or a0, a0, a1
 ; RV64I-NEXT: ret
 ;
@@ -151,8 +151,8 @@ define i64 @addmul12(i64 %a, i64 %b) {
 ; RV64I-LABEL: addmul12:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: slli a2, a0, 2
-; RV64I-NEXT: slli a0, a0, 4
-; RV64I-NEXT: sub a0, a0, a2
+; RV64I-NEXT: slli a0, a0, 3
+; RV64I-NEXT: add a0, a0, a2
 ; RV64I-NEXT: add a0, a0, a1
 ; RV64I-NEXT: ret
 ;
@@ -227,8 +227,8 @@ define i64 @addmul24(i64 %a, i64 %b) {
 ; RV64I-LABEL: addmul24:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: slli a2, a0, 3
-; RV64I-NEXT: slli a0, a0, 5
-; RV64I-NEXT: sub a0, a0, a2
+; RV64I-NEXT: slli a0, a0, 4
+; RV64I-NEXT: add a0, a0, a2
 ; RV64I-NEXT: add a0, a0, a1
 ; RV64I-NEXT: ret
 ;
@@ -527,8 +527,8 @@ define i64 @mul96(i64 %a) {
 ; RV64I-LABEL: mul96:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: slli a1, a0, 5
-; RV64I-NEXT: slli a0, a0, 7
-; RV64I-NEXT: sub a0, a0, a1
+; RV64I-NEXT: slli a0, a0, 6
+; RV64I-NEXT: add a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64XTHEADBA-LABEL: mul96:
@@ -990,8 +990,8 @@ define signext i32 @mulw192(i32 signext %a) {
 ; RV64I-LABEL: mulw192:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: slli a1, a0, 6
-; RV64I-NEXT: slli a0, a0, 8
-; RV64I-NEXT: subw a0, a0, a1
+; RV64I-NEXT: slli a0, a0, 7
+; RV64I-NEXT: addw a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64XTHEADBA-LABEL: mulw192:
diff --git a/llvm/test/CodeGen/RISCV/rv64zba.ll b/llvm/test/CodeGen/RISCV/rv64zba.ll
index d4b228828c04d..cc3a7a195e3b4 100644
--- a/llvm/test/CodeGen/RISCV/rv64zba.ll
+++ b/llvm/test/CodeGen/RISCV/rv64zba.ll
@@ -489,8 +489,8 @@ define i64 @addmul6(i64 %a, i64 %b) {
 ; RV64I-LABEL: addmul6:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: slli a2, a0, 1
-; RV64I-NEXT: slli a0, a0, 3
-; RV64I-NEXT: sub a0, a0, a2
+; RV64I-NEXT: slli a0, a0, 2
+; RV64I-NEXT: add a0, a0, a2
 ; RV64I-NEXT: add a0, a0, a1
 ; RV64I-NEXT: ret
 ;
@@ -514,8 +514,8 @@ define i64 @disjointormul6(i64 %a, i64 %b) {
 ; RV64I-LABEL: disjointormul6:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: slli a2, a0, 1
-; RV64I-NEXT: slli a0, a0, 3
-; RV64I-NEXT: sub a0, a0, a2
+; RV64I-NEXT: slli a0, a0, 2
+; RV64I-NEXT: add a0, a0, a2
 ; RV64I-NEXT: or a0, a0, a1
 ; RV64I-NEXT: ret
 ;
@@ -564,8 +564,8 @@ define i64 @addmul12(i64 %a, i64 %b) {
 ; RV64I-LABEL: addmul12:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: slli a2, a0, 2
-; RV64I-NEXT: slli a0, a0, 4
-; RV64I-NEXT: sub a0, a0, a2
+; RV64I-NEXT: slli a0, a0, 3
+; RV64I-NEXT: add a0, a0, a2
 ; RV64I-NEXT: add a0, a0, a1
 ; RV64I-NEXT: ret
 ;
@@ -692,8 +692,8 @@ define i64 @addmul24(i64 %a, i64 %b) {
 ; RV64I-LABEL: addmul24:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: slli a2, a0, 3
-; RV64I-NEXT: slli a0, a0, 5
-; RV64I-NEXT: sub a0, a0, a2
+; RV64I-NEXT: slli a0, a0, 4
+; RV64I-NEXT: add a0, a0, a2
 ; RV64I-NEXT: add a0, a0, a1
 ; RV64I-NEXT: ret
 ;
@@ -1250,8 +1250,8 @@ define i64 @mul96(i64 %a) {
 ; RV64I-LABEL: mul96:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: slli a1, a0, 5
-; RV64I-NEXT: slli a0, a0, 7
-; RV64I-NEXT: sub a0, a0, a1
+; RV64I-NEXT: slli a0, a0, 6
+; RV64I-NEXT: add a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64ZBA-LABEL: mul96:
@@ -1490,8 +1490,8 @@ define i64 @zext_mul96(i32 signext %a) {
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: slli a0, a0, 32
 ; RV64I-NEXT: srli a1, a0, 27
-; RV64I-NEXT: srli a0, a0, 25
-; RV64I-NEXT: sub a0, a0, a1
+; RV64I-NEXT: srli a0, a0, 26
+; RV64I-NEXT: add a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64ZBA-LABEL: zext_mul96:
@@ -1568,8 +1568,8 @@ define i64 @zext_mul12884901888(i32 signext %a) {
 ; RV64I-LABEL: zext_mul12884901888:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: slli a1, a0, 32
-; RV64I-NEXT: slli a0, a0, 34
-; RV64I-NEXT: sub a0, a0, a1
+; RV64I-NEXT: slli a0, a0, 33
+; RV64I-NEXT: add a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64ZBA-LABEL: zext_mul12884901888:
@@ -2180,8 +2180,8 @@ define signext i32 @mulw192(i32 signext %a) {
 ; RV64I-LABEL: mulw192:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: slli a1, a0, 6
-; RV64I-NEXT: slli a0, a0, 8
-; RV64I-NEXT: subw a0, a0, a1
+; RV64I-NEXT: slli a0, a0, 7
+; RV64I-NEXT: addw a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64ZBA-LABEL: mulw192:
@@ -3899,8 +3899,8 @@ define i64 @regression(i32 signext %x, i32 signext %y) {
 ; RV64I-NEXT: sub a0, a0, a1
 ; RV64I-NEXT: slli a0, a0, 32
 ; RV64I-NEXT: srli a1, a0, 29
-; RV64I-NEXT: srli a0, a0, 27
-; RV64I-NEXT: sub a0, a0, a1
+; RV64I-NEXT: srli a0, a0, 28
+; RV64I-NEXT: add a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64ZBA-LABEL: regression:
@@ -4034,8 +4034,8 @@ define i64 @bext_mul12(i32 %1, i32 %2) {
 ; RV64I-NEXT: srlw a0, a0, a1
 ; RV64I-NEXT: andi a0, a0, 1
 ; RV64I-NEXT: slli a1, a0, 2
-; RV64I-NEXT: slli a0, a0, 4
-; RV64I-NEXT: sub a0, a0, a1
+; RV64I-NEXT: slli a0, a0, 3
+; RV64I-NEXT: or a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64ZBANOZBB-LABEL: bext_mul12:
@@ -4832,8 +4832,8 @@ define ptr @shl_add_knownbits(ptr %p, i64 %i) {
 ; RV64I-NEXT: slli a1, a1, 50
 ; RV64I-NEXT: srli a1, a1, 50
 ; RV64I-NEXT: slli a2, a1, 1
-; RV64I-NEXT: slli a1, a1, 3
-; RV64I-NEXT: sub a1, a1, a2
+; RV64I-NEXT: slli a1, a1, 2
+; RV64I-NEXT: add a1, a1, a2
 ; RV64I-NEXT: srli a1, a1, 3
 ; RV64I-NEXT: add a0, a0, a1
 ; RV64I-NEXT: ret
diff --git a/llvm/test/CodeGen/RISCV/rvv/calling-conv-fastcc.ll b/llvm/test/CodeGen/RISCV/rvv/calling-conv-fastcc.ll
index bd912193c4fed..39732602cc85e 100644
--- a/llvm/test/CodeGen/RISCV/rvv/calling-conv-fastcc.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/calling-conv-fastcc.ll
@@ -72,9 +72,8 @@ define fastcc <vscale x 64 x i32> @ret_split_nxv64i32(ptr %x) {
 ; CHECK-NEXT: csrr a2, vlenb
 ; CHECK-NEXT: vl8re32.v v8, (a1)
 ; CHECK-NEXT: slli a3, a2, 3
-; CHECK-NEXT: slli a4, a2, 5
 ; CHECK-NEXT: slli a2, a2, 4
-; CHECK-NEXT: sub a4, a4, a3
+; CHECK-NEXT: add a4, a2, a3
 ; CHECK-NEXT: add a5, a1, a2
 ; CHECK-NEXT: vl8re32.v v16, (a5)
 ; CHECK-NEXT: add a5, a1, a3
@@ -112,16 +111,16 @@ define fastcc <vscale x 128 x i32> @ret_split_nxv128i32(ptr %x) {
 ; CHECK-NEXT: addi a3, a3, 16
 ; CHECK-NEXT: vs8r.v v8, (a3) # vscale x 64-byte Folded Spill
 ; CHECK-NEXT: slli a3, a2, 3
-; CHECK-NEXT: slli a4, a2, 5
-; CHECK-NEXT: slli a5, a2, 4
+; CHECK-NEXT: slli a4, a2, 4
+; CHECK-NEXT: slli a5, a2, 5
 ; CHECK-NEXT: slli a2, a2, 6
-; CHECK-NEXT: sub a6, a4, a3
-; CHECK-NEXT: add a7, a4, a3
-; CHECK-NEXT: sub t0, a2, a5
+; CHECK-NEXT: add a6, a4, a3
+; CHECK-NEXT: add a7, a5, a3
+; CHECK-NEXT: add t0, a5, a4
 ; CHECK-NEXT: sub a2, a2, a3
 ; CHECK-NEXT: add t1, a1, a3
-; CHECK-NEXT: add t2, a1, a5
-; CHECK-NEXT: add t3, a1, a4
+; CHECK-NEXT: add t2, a1, a4
+; CHECK-NEXT: add t3, a1, a5
 ; CHECK-NEXT: vl8re32.v v8, (t1)
 ; CHECK-NEXT: csrr t1, vlenb
 ; CHECK-NEXT: slli t1, t1, 4
@@ -157,12 +156,12 @@
 ; CHECK-NEXT: addi a1, a1, 16
 ; CHECK-NEXT: vl8r.v v0, (a1) # vscale x 64-byte Folded Reload
 ; CHECK-NEXT: vs8r.v v0, (a0)
-; CHECK-NEXT: add a4, a0, a4
-; CHECK-NEXT: vs8r.v v16, (a4)
 ; CHECK-NEXT: add a5, a0, a5
+; CHECK-NEXT: vs8r.v v16, (a5)
+; CHECK-NEXT: add a4, a0, a4
 ; CHECK-NEXT: addi a1, sp, 16
 ; CHECK-NEXT: vl8r.v v16, (a1) # vscale x 64-byte Folded Reload
-; CHECK-NEXT: vs8r.v v16, (a5)
+; CHECK-NEXT: vs8r.v v16, (a4)
 ; CHECK-NEXT: add a3, a0, a3
 ; CHECK-NEXT: csrr a1, vlenb
 ; CHECK-NEXT: slli a1, a1, 4
diff --git a/llvm/test/CodeGen/RISCV/rvv/extract-subvector.ll b/llvm/test/CodeGen/RISCV/rvv/extract-subvector.ll
index 7c9a283dd54bc..ed0eb810aa04a 100644
--- a/llvm/test/CodeGen/RISCV/rvv/extract-subvector.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/extract-subvector.ll
@@ -291,7 +291,8 @@ define <vscale x 2 x i8> @extract_nxv32i8_nxv2i8_6(<vscale x 32 x i8> %vec) {
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: csrr a0, vlenb
 ; CHECK-NEXT: srli a1, a0, 2
-; CHECK-NEXT: sub a0, a0, a1
+; CHECK-NEXT: srli a0, a0, 1
+; CHECK-NEXT: add a0, a0, a1
 ; CHECK-NEXT: vsetvli a1, zero, e8, m1, ta, ma
 ; CHECK-NEXT: vslidedown.vx v8, v8, a0
 ; CHECK-NEXT: ret
@@ -314,7 +315,8 @@ define <vscale x 2 x i8> @extract_nxv32i8_nxv2i8_22(<vscale x 32 x i8> %vec) {
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: csrr a0, vlenb
 ; CHECK-NEXT: srli a1, a0, 2
-; CHECK-NEXT: sub a0, a0, a1
+; CHECK-NEXT: srli a0, a0, 1
+; CHECK-NEXT: add a0, a0, a1
 ; CHECK-NEXT: vsetvli a1, zero, e8, m1, ta, ma
 ; CHECK-NEXT: vslidedown.vx v8, v10, a0
 ; CHECK-NEXT: ret
diff --git a/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll b/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll
index ac9f26314a9ab..2590d2b0b77ee 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vector-deinterleave.ll
@@ -475,16 +475,15 @@ define {<vscale x 16 x i1>, <vscale x 16 x i1>, <vscale x 16 x i1>, <vscale x 16
 ; CHECK-NEXT: csrr a0, vlenb
 ; CHECK-NEXT: slli a0, a0, 3
 ; CHECK-NEXT: sub sp, sp, a0
-; CHECK-NEXT: vsetvli a0, zero, e8, m2, ta, ma
+; CHECK-NEXT: vsetvli a0, zero, e8, m1, ta, ma
 ; CHECK-NEXT: vmv1r.v v8, v0
 ; CHECK-NEXT: csrr a0, vlenb
-; CHECK-NEXT: vmv.v.i v10, 0
 ; CHECK-NEXT: srli a1, a0, 2
-; CHECK-NEXT: sub a2, a0, a1
-; CHECK-NEXT: vsetvli a3, zero, e8, m1, ta, ma
-; CHECK-NEXT: vslidedown.vx v0, v0, a2
 ; CHECK-NEXT: srli a0, a0, 1
+; CHECK-NEXT: add a2, a0, a1
+; CHECK-NEXT: vslidedown.vx v0, v0, a2
 ; CHECK-NEXT: vsetvli a2,... [truncated]
@pfusik requested a review from topperc, November 12, 2025 17:37
; CHECK-NEXT: csrr a0, vlenb
; CHECK-NEXT: srli a1, a0, 2
; CHECK-NEXT: sub a0, a0, a1
; CHECK-NEXT: srli a0, a0, 1
Collaborator
What happened here?

Contributor Author
We pessimised X * 3 / 4 from X - (X >> 2) to (X >> 1) + (X >> 2).

Contributor Author
I can try to fix that tomorrow, by checking if it's an anti-heuristic pattern (see the description).

Collaborator
I notice that with Zba, this code currently produces

 csrr a0, vlenb
 srli a0, a0, 3
 sh1add a0, a0, a0
 slli a0, a0, 1

it could be

 csrr a0, vlenb
 srli a0, a0, 2
 sh1add a0, a0, a0
Contributor Author
Thanks, I'll try to address that too, as a separate PR.

Contributor Author
Fixed this pessimization. Please review.

Contributor Author
> I notice that with Zba, this code currently produces

#168019

@pfusik (Contributor Author) commented Nov 12, 2025

I expanded the description.

Collaborator @topperc left a comment

LGTM


@pfusik merged commit e0aec1f into llvm:main, Nov 13, 2025
10 checks passed
