Skip to content

Conversation

@DimitryAndric
Copy link
Collaborator

Fixes #99396

The result of VirtRegAuxInfo::weightCalcHelper can be influenced by x87 excess precision, which can result in slightly different register choices when the compiler is hosted on x86_64 or i386. This leads to different object file output when cross-compiling to i386, or native.

Similar to 7af3432, add a volatile qualifier to the local Weight variable to force it onto the stack, and avoid the excess precision.

@llvmbot
Copy link
Member

llvmbot commented Jul 23, 2024

@llvm/pr-subscribers-llvm-support
@llvm/pr-subscribers-llvm-regalloc

@llvm/pr-subscribers-backend-x86

Author: Dimitry Andric (DimitryAndric)

Changes

Fixes #99396

The result of VirtRegAuxInfo::weightCalcHelper can be influenced by x87 excess precision, which can result in slightly different register choices when the compiler is hosted on x86_64 or i386. This leads to different object file output when cross-compiling to i386, or native.

Similar to 7af3432, add a volatile qualifier to the local Weight variable to force it onto the stack, and avoid the excess precision.


Full diff: https://github.com/llvm/llvm-project/pull/100165.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/CalcSpillWeights.cpp (+3-1)
  • (added) llvm/test/CodeGen/X86/pr99396.ll (+59)
diff --git a/llvm/lib/CodeGen/CalcSpillWeights.cpp b/llvm/lib/CodeGen/CalcSpillWeights.cpp index 1d767a3484bca..60055e6ba0cd3 100644 --- a/llvm/lib/CodeGen/CalcSpillWeights.cpp +++ b/llvm/lib/CodeGen/CalcSpillWeights.cpp @@ -257,7 +257,9 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, SlotIndex *Start, return -1.0f; } - float Weight = 1.0f; + // Force Weight onto the stack so that x86 doesn't add hidden precision, + // similar to HWeight below. + volatile float Weight = 1.0f; if (IsSpillable) { // Get loop info for mi. if (MI->getParent() != MBB) { diff --git a/llvm/test/CodeGen/X86/pr99396.ll b/llvm/test/CodeGen/X86/pr99396.ll new file mode 100644 index 0000000000000..7bc1153b57990 --- /dev/null +++ b/llvm/test/CodeGen/X86/pr99396.ll @@ -0,0 +1,59 @@ +; RUN: llc < %s -mtriple=i386-unknown-freebsd -enable-misched -relocation-model=pic | FileCheck %s + +target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-i128:128-f64:32:64-f80:32-n8:16:32-S128" +target triple = "i386-unknown-freebsd15.0" + +@c = external local_unnamed_addr global ptr + +declare i32 @fn2() local_unnamed_addr + +declare i32 @fn3() local_unnamed_addr + +define noundef i32 @fn4() #0 { +entry: + %0 = load i32, ptr @fn4, align 4 +; CHECK: movl fn4@GOT(%ebx), %edi +; CHECK-NEXT: movl (%edi), %edx + %1 = load ptr, ptr @c, align 4 +; CHECK: movl c@GOT(%ebx), %eax +; CHECK-NEXT: movl (%eax), %esi +; CHECK-NEXT: testl %esi, %esi + %cmp.g = icmp eq ptr %1, null + br i1 %cmp.g, label %if.then.g, label %if.end3.g + +if.then.g: ; preds = %entry + %2 = load i32, ptr inttoptr (i32 1 to ptr), align 4 + %cmp1.g = icmp slt i32 %2, 0 + br i1 %cmp1.g, label %if.then2.g, label %if.end3.g + +if.then2.g: ; preds = %if.then.g + %.g = load volatile i32, ptr null, align 2147483648 + br label %f.exit + +if.end3.g: ; preds = %if.then.g, %entry + %h.i.g = icmp eq i32 %0, 0 + br i1 %h.i.g, label %f.exit, label %while.body.g + +while.body.g: ; preds = %if.end3.g, %if.end8.g + %buff.addr.019.g = phi ptr [ %incdec.ptr.g, %if.end8.g ], [ @fn4, %if.end3.g ] + %g.addr.018.g = phi i32 [ %dec.g, %if.end8.g ], [ %0, %if.end3.g ] + %call4.g = tail call i32 @fn3(ptr %1, ptr %buff.addr.019.g, i32 %g.addr.018.g) + %cmp5.g = icmp slt i32 %call4.g, 0 + br i1 %cmp5.g, label %if.then6.g, label %if.end8.g + +if.then6.g: ; preds = %while.body.g + %call7.g = tail call i32 @fn2(ptr null) + br label %f.exit + +if.end8.g: ; preds = %while.body.g + %dec.g = add i32 %g.addr.018.g, 1 + %incdec.ptr.g = getelementptr i32, ptr %buff.addr.019.g, i32 1 + store i64 0, ptr %1, align 4 + %h.not.g = icmp eq i32 %dec.g, 0 + br i1 %h.not.g, label %f.exit, label %while.body.g + +f.exit: ; preds = %if.end8.g, %if.then6.g, %if.end3.g, %if.then2.g + ret i32 0 +} + +attributes #0 = { "frame-pointer"="all" "tune-cpu"="generic" } 
%0 = load i32, ptr @fn4, align 4
; CHECK: movl fn4@GOT(%ebx), %edi
; CHECK-NEXT: movl (%edi), %edx
%1 = load ptr, ptr @c, align 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use named values in tests

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I haven't got much experience with writing these tests, what does this mean? I'm intending here to check the exact registers used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%0 should be %load or some other name etc.

float Weight = 1.0f;
// Force Weight onto the stack so that x86 doesn't add hidden precision,
// similar to HWeight below.
volatile float Weight = 1.0f;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we wrap this in a macro in Support somewhere to only do this if using degenerate x87 handling?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, maybe a specific stack_float_t type?

Comment on lines 3 to 4
target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-i128:128-f64:32:64-f80:32-n8:16:32-S128"
target triple = "i386-unknown-freebsd15.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are redundant with the command line arguments


/// Type to force float point values onto the stack, so that x86 doesn't add
/// hidden precision, avoiding rounding differences on various platforms.
using stack_float_t = volatile float;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason this would go in support would be to conditionalize this on using the volatile only when x87 is in use

Fixes llvm#99396 The result of `VirtRegAuxInfo::weightCalcHelper` can be influenced by x87 excess precision, which can result in slightly different register choices when the compiler is hosted on x86_64 or i386. This leads to different object file output when cross-compiling to i386, or native. Similar to 7af3432, we need to add a `volatile` qualifier to the local `Weight` variable to force it onto the stack, and avoid the excess precision. Define `stack_float_t` in `MathExtras.h` for this purpose, and use it.

/// Type to force float point values onto the stack, so that x86 doesn't add
/// hidden precision, avoiding rounding differences on various platforms.
#if defined(__i386__) || defined(_M_IX86)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is the exact condition but ok

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment