lukel97 commented Sep 16, 2025

In the register allocator we define non-trivial rematerialization as the rematerialization of an instruction with virtual register uses.

We have been able to perform non-trivial rematerialization for a while, but it has been prevented by default unless specifically overridden by the target in TargetInstrInfo::isReMaterializableImpl. The comment in the default implementation gives the original reasoning: rematerializing might extend the live range of a virtual register. In practice we don't do this, because LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize instructions whose virtual register operands are already live at the use sites.
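To make the availability condition concrete, here is a minimal toy model in Python. It is not LLVM code: `Segment`, `value_at`, and `all_uses_available_at` are hypothetical names invented for this sketch, and the real check in LiveRangeEdit handles more cases (physical registers, undef operands, subregisters). The idea it illustrates is that an instruction may be rematerialized at a use site only if each virtual register it reads still holds, at that site, the same value it held at the original definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Toy live-range segment: the register holds value `value_id` on [start, end)."""
    start: int
    end: int
    value_id: int


def value_at(segments, slot):
    """Return the value id live at `slot`, or None if the register is dead there."""
    for seg in segments:
        if seg.start <= slot < seg.end:
            return seg.value_id
    return None


def all_uses_available_at(vreg_uses, live_ranges, def_slot, use_slot):
    """Toy availability check (hypothetical helper, not the LLVM API).

    Every virtual register read by the original instruction must be live at
    `use_slot` and carry the same value it carried at `def_slot`.
    Simplification: a register that is dead at the original def is rejected.
    """
    for vreg in vreg_uses:
        segments = live_ranges.get(vreg, [])
        value_at_def = value_at(segments, def_slot)
        if value_at_def is None or value_at(segments, use_slot) != value_at_def:
            return False
    return True


# Illustrative live ranges: %a is unchanged, %b is redefined at slot 8,
# and %c dies at slot 10.
live_ranges = {
    "%a": [Segment(0, 20, 0)],
    "%b": [Segment(0, 8, 0), Segment(8, 20, 1)],
    "%c": [Segment(0, 10, 0)],
}

# Original instruction at slot 4, candidate rematerialization point at slot 16.
print(all_uses_available_at(["%a"], live_ranges, 4, 16))
print(all_uses_available_at(["%a", "%b"], live_ranges, 4, 16))
print(all_uses_available_at(["%c"], live_ranges, 4, 16))
```

In this model, rematerialization never has to extend a live range: a candidate whose operand is dead or redefined at the use site is simply rejected.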

https://reviews.llvm.org/D106408 had originally tried to remove this restriction but it was reverted after some performance regressions were reported. We think it is likely that the regressions were caused by the fact that the old isTriviallyReMaterializable API sometimes returned true for non-trivial rematerializations.

However, #160377 recently split the API into separate trivial and non-trivial versions and updated the call sites accordingly, and #160709 and #159180 fixed heuristics that weren't accounting for the difference between the two.

With these fixes in place, this patch proposes to again allow non-trivial rematerialization by default, which significantly reduces the number of spills and reloads across various targets.

| llvm-test-suite | regalloc.NumSpills geomean | regalloc.NumReloads geomean |
| --- | --- | --- |
| -target riscv64-linux-gnu -march=rva23u64 -O3 | -5.2% | -8.1% |
| -target arm64-apple-darwin -O3 | -10.8% | -11.6% |
| -target x86_64-linux-gnu -O3 | -6.2% | -6.5% |

| SPEC CPU 2017 | regalloc.NumSpills geomean | regalloc.NumReloads geomean |
| --- | --- | --- |
| -target riscv64-linux-gnu -march=rva23u64 -O3 | -4.9% | -5.5% |
| -target x86_64-linux-gnu -O3 | -3.2% | -4.0% |

I wasn't able to build SPEC CPU 2017 on arm64-apple-darwin due to incompatibilities with the macOS SDK headers.

This also allows us to rematerialize loads and stores on RISC-V in a future patch.
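For readers who want to sanity-check the geomean columns, the following Python sketch shows one common way to compute a geometric-mean difference from per-test before/after counts. It is an assumption about the method, not a copy of the script that produced the figures in this PR: `geomean_diff` is a hypothetical helper, and skipping tests with a zero count on either side is a choice made here so that the logarithm is defined. The three sample pairs are NumSpills values from the per-test AArch64 listing in this thread.

```python
import math


def geomean_diff(pairs):
    """Geometric mean of rhs/lhs ratios, minus 1, as a fraction.

    `pairs` is an iterable of (lhs, rhs) counts. Pairs with a zero on either
    side are skipped (an assumption of this sketch, not a documented rule of
    any particular tool).
    """
    ratios = [rhs / lhs for lhs, rhs in pairs if lhs > 0 and rhs > 0]
    if not ratios:
        return 0.0
    mean_log = sum(math.log(r) for r in ratios) / len(ratios)
    return math.exp(mean_log) - 1.0


# A few (lhs, rhs) NumSpills pairs: oourafft, himenobmtxpa, flops.
sample = [(139.0, 141.0), (224.0, 222.0), (43.0, 42.0)]
print(f"{geomean_diff(sample):+.1%}")
```

Because the geometric mean works on ratios, a test that halves its spills and one that doubles them cancel out exactly, which is why it is preferred over an arithmetic mean of percentages for this kind of comparison.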

lukel97 commented Sep 16, 2025

I've attached the exact results for each test below.

AArch64 llvm-test-suite

| Program | regalloc.NumSpills lhs | regalloc.NumSpills rhs | regalloc.NumSpills diff | regalloc.NumReloads lhs | regalloc.NumReloads rhs | regalloc.NumReloads diff |
| --- | --- | --- | --- | --- | --- | --- |
| test-suite :: SingleSource/Benchmarks/Misc/oourafft.test | 139.00 | 141.00 | 1.4% | 309.00 | 304.00 | -1.6% |
| test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test | 1276.00 | 1277.00 | 0.1% | 2088.00 | 1978.00 | -5.3% |
| test-suite :: MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt.test | 2.00 | 2.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl.test | 2.00 | 2.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt.test | 2.00 | 2.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl.test | 2.00 | 2.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt.test | 2.00 | 2.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl.test | 2.00 | 2.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt.test | 2.00 | 2.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl.test | 2.00 | 2.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl.test | 6.00 | 6.00 | 0.0% | 9.00 | 9.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl.test | 2.00 | 2.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/Searching-flt/Searching-flt.test | 6.00 | 6.00 | 0.0% | 9.00 | 9.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-dbl/StatementReordering-dbl.test | 2.00 | 2.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-flt/StatementReordering-flt.test | 2.00 | 2.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/Symbolics-dbl/Symbolics-dbl.test | 2.00 | 2.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/Trimaran/enc-3des/enc-3des.test | 5.00 | 5.00 | 0.0% | 4.00 | 4.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/Trimaran/enc-rc4/enc-rc4.test | 69.00 | 69.00 | 0.0% | 68.00 | 68.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/Trimaran/netbench-crc/netbench-crc.test | 1.00 | 1.00 | 0.0% | 1.00 | 1.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/VersaBench/dbms/dbms.test | 1.00 | 1.00 | 0.0% | 2.00 | 2.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt.test | 3.00 | 3.00 | 0.0% | 6.00 | 6.00 | 0.0% |
| test-suite :: MicroBenchmarks/Builtins/Int128/Builtins.test | 14.00 | 14.00 | 0.0% | 28.00 | 28.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/llubenchmark/llu.test | 2.00 | 2.00 | 0.0% | 3.00 | 3.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/Expansion-dbl/Expansion-dbl.test | 4.00 | 4.00 | 0.0% | 7.00 | 7.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/Ptrdist/ks/ks.test | 1.00 | 1.00 | 0.0% | 1.00 | 1.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl.test | 9.00 | 9.00 | 0.0% | 12.00 | 12.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt.test | 8.00 | 8.00 | 0.0% | 11.00 | 11.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/ControlLoops-dbl/ControlLoops-dbl.test | 3.00 | 3.00 | 0.0% | 6.00 | 6.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/ControlLoops-flt/ControlLoops-flt.test | 3.00 | 3.00 | 0.0% | 6.00 | 6.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl.test | 3.00 | 3.00 | 0.0% | 6.00 | 6.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/Equivalencing-dbl/Equivalencing-dbl.test | 2.00 | 2.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt.test | 2.00 | 2.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt.test | 3.00 | 3.00 | 0.0% | 6.00 | 6.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl.test | 3.00 | 3.00 | 0.0% | 6.00 | 6.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/GlobalDataFlow-dbl/GlobalDataFlow-dbl.test | 9.00 | 9.00 | 0.0% | 12.00 | 12.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt.test | 10.00 | 10.00 | 0.0% | 13.00 | 13.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/IndirectAddressing-dbl/IndirectAddressing-dbl.test | 2.00 | 2.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt.test | 2.00 | 2.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-dbl/InductionVariable-dbl.test | 2.00 | 2.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt.test | 2.00 | 2.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl.test | 5.00 | 5.00 | 0.0% | 8.00 | 8.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt.test | 4.00 | 4.00 | 0.0% | 7.00 | 7.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt.test | 3.00 | 3.00 | 0.0% | 6.00 | 6.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test | 123.00 | 123.00 | 0.0% | 124.00 | 124.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/Olden/voronoi/voronoi.test | 20.00 | 20.00 | 0.0% | 22.00 | 22.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/blas/symm/symm.test | 12.00 | 12.00 | 0.0% | 12.00 | 12.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/blas/syrk/syrk.test | 5.00 | 5.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/blas/trmm/trmm.test | 5.00 | 5.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/bicg/bicg.test | 5.00 | 5.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/doitgen/doitgen.test | 5.00 | 5.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/mvt/mvt.test | 5.00 | 5.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/cholesky/cholesky.test | 5.00 | 5.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/gramschmidt/gramschmidt.test | 7.00 | 7.00 | 0.0% | 8.00 | 8.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/lu/lu.test | 7.00 | 7.00 | 0.0% | 7.00 | 7.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/ludcmp/ludcmp.test | 2.00 | 2.00 | 0.0% | 2.00 | 2.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/medley/deriche/deriche.test | 5.00 | 5.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/medley/floyd-warshall/floyd-warshall.test | 5.00 | 5.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/medley/nussinov/nussinov.test | 7.00 | 7.00 | 0.0% | 7.00 | 7.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/stencils/fdtd-2d/fdtd-2d.test | 7.00 | 7.00 | 0.0% | 17.00 | 17.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/stencils/seidel-2d/seidel-2d.test | 5.00 | 5.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/SmallPT/smallpt.test | 42.00 | 42.00 | 0.0% | 64.00 | 64.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Stanford/FloatMM.test | 11.00 | 11.00 | 0.0% | 11.00 | 11.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Stanford/Perm.test | 1.00 | 1.00 | 0.0% | 4.00 | 4.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/blas/syr2k/syr2k.test | 5.00 | 5.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/blas/gemver/gemver.test | 2.00 | 2.00 | 0.0% | 2.00 | 2.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/BenchmarkGame/partialsums.test | 5.00 | 5.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/datamining/covariance/covariance.test | 5.00 | 5.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/BenchmarkGame/puzzle.test | 4.00 | 4.00 | 0.0% | 4.00 | 4.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/CoyoteBench/almabench.test | 99.00 | 99.00 | 0.0% | 91.00 | 91.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/McGill/misr.test | 12.00 | 12.00 | 0.0% | 14.00 | 14.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Misc-C++/Large/ray.test | 20.00 | 20.00 | 0.0% | 22.00 | 22.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Misc-C++/Large/sphereflake.test | 16.00 | 16.00 | 0.0% | 21.00 | 21.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Misc-C++/oopack_v1p8.test | 42.00 | 42.00 | 0.0% | 42.00 | 42.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Misc-C++/stepanov_container.test | 11.00 | 11.00 | 0.0% | 21.00 | 21.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Misc/ReedSolomon.test | 5.00 | 5.00 | 0.0% | 10.00 | 10.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Misc/fbench.test | 1.00 | 1.00 | 0.0% | 3.00 | 3.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Misc/flops-5.test | 14.00 | 14.00 | 0.0% | 15.00 | 15.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Misc/flops-6.test | 13.00 | 13.00 | 0.0% | 13.00 | 13.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Misc/flops-8.test | 14.00 | 14.00 | 0.0% | 15.00 | 15.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Misc/mandel-2.test | 5.00 | 5.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Misc/mandel.test | 1.00 | 1.00 | 0.0% | 1.00 | 1.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Misc/perlin.test | 1.00 | 1.00 | 0.0% | 1.00 | 1.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Misc/richards_benchmark.test | 3.00 | 3.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/datamining/correlation/correlation.test | 5.00 | 5.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/Ptrdist/anagram/anagram.test | 3.00 | 3.00 | 0.0% | 3.00 | 3.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test | 3.00 | 3.00 | 0.0% | 6.00 | 6.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/Olden/power/power.test | 22.00 | 22.00 | 0.0% | 32.00 | 32.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/MiBench/automotive-basicmath/automotive-basicmath.test | 4.00 | 4.00 | 0.0% | 4.00 | 4.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/McCat/04-bisect/bisect.test | 5.00 | 5.00 | 0.0% | 5.00 | 5.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/FreeBench/distray/distray.test | 25.00 | 25.00 | 0.0% | 26.00 | 26.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/XSBench/XSBench.test | 32.00 | 32.00 | 0.0% | 34.00 | 34.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/RSBench/rsbench.test | 59.00 | 59.00 | 0.0% | 90.00 | 90.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/Pathfinder/PathFinder.test | 17.00 | 17.00 | 0.0% | 12.00 | 12.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test | 39.00 | 39.00 | 0.0% | 37.00 | 37.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG/HPCCG.test | 45.00 | 45.00 | 0.0% | 54.00 | 54.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/HACCKernels/HACCKernels.test | 19.00 | 19.00 | 0.0% | 26.00 | 26.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/BitBench/five11/five11.test | 3.00 | 3.00 | 0.0% | 3.00 | 3.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk.test | 199.00 | 199.00 | 0.0% | 456.00 | 343.00 | -24.8% |
| test-suite :: MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk.test | 6.00 | 6.00 | 0.0% | 7.00 | 7.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk.test | 25.00 | 25.00 | 0.0% | 26.00 | 26.00 | 0.0% |
| test-suite :: MultiSource/Applications/spiff/spiff.test | 31.00 | 31.00 | 0.0% | 37.00 | 37.00 | 0.0% |
| test-suite :: MultiSource/Applications/siod/siod.test | 10.00 | 10.00 | 0.0% | 13.00 | 13.00 | 0.0% |
| test-suite :: MultiSource/Applications/kimwitu++/kc.test | 80.00 | 80.00 | 0.0% | 79.00 | 79.00 | 0.0% |
| test-suite :: MultiSource/Applications/hbd/hbd.test | 44.00 | 44.00 | 0.0% | 88.00 | 87.00 | -1.1% |
| test-suite :: MicroBenchmarks/SLPVectorization/SLPVectorizationBenchmarks.test | 6.00 | 6.00 | 0.0% | 12.00 | 12.00 | 0.0% |
| test-suite :: MicroBenchmarks/MemFunctions/MemFunctions.test | 11.00 | 11.00 | 0.0% | 157.00 | 157.00 | 0.0% |
| test-suite :: MicroBenchmarks/LoopVectorization/LoopVectorizationBenchmarks.test | 129.00 | 129.00 | 0.0% | 464.00 | 464.00 | 0.0% |
| test-suite :: MicroBenchmarks/LoopVectorization/LoopInterleavingBenchmarks.test | 129.00 | 129.00 | 0.0% | 464.00 | 464.00 | 0.0% |
| test-suite :: MicroBenchmarks/LoopVectorization/LoopEpilogueVectorizationBenchmarks.test | 129.00 | 129.00 | 0.0% | 464.00 | 464.00 | 0.0% |
| test-suite :: MicroBenchmarks/ImageProcessing/Blur/blur.test | 9.00 | 9.00 | 0.0% | 9.00 | 9.00 | 0.0% |
| test-suite :: MicroBenchmarks/ImageProcessing/BilateralFiltering/BilateralFilter.test | 12.00 | 12.00 | 0.0% | 16.00 | 16.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/McCat/08-main/main.test | 8.00 | 8.00 | 0.0% | 17.00 | 17.00 | 0.0% |
| test-suite :: SingleSource/Benchmarks/Stanford/RealMM.test | 11.00 | 11.00 | 0.0% | 11.00 | 11.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/Olden/bh/bh.test | 15.00 | 15.00 | 0.0% | 15.00 | 15.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/Olden/em3d/em3d.test | 1.00 | 1.00 | 0.0% | 2.00 | 2.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/Olden/health/health.test | 1.00 | 1.00 | 0.0% | 1.00 | 1.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/MiBench/telecomm-FFT/telecomm-fft.test | 11.00 | 11.00 | 0.0% | 26.00 | 26.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/MiBench/security-rijndael/security-rijndael.test | 2.00 | 2.00 | 0.0% | 2.00 | 2.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/Olden/bisort/bisort.test | 1.00 | 1.00 | 0.0% | 1.00 | 1.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/MiBench/network-dijkstra/network-dijkstra.test | 1.00 | 1.00 | 0.0% | 1.00 | 1.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test | 123.00 | 123.00 | 0.0% | 124.00 | 124.00 | 0.0% |
| test-suite :: MultiSource/Applications/SPASS/SPASS.test | 1438.00 | 1428.00 | -0.7% | 3053.00 | 3038.00 | -0.5% |
| test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test | 688.00 | 682.00 | -0.9% | 949.00 | 940.00 | -0.9% |
| test-suite :: SingleSource/Benchmarks/Misc/himenobmtxpa.test | 224.00 | 222.00 | -0.9% | 606.00 | 531.00 | -12.4% |
| test-suite :: MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test | 732.00 | 724.00 | -1.1% | 1873.00 | 1835.00 | -2.0% |
| test-suite :: MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow.test | 255.00 | 252.00 | -1.2% | 295.00 | 286.00 | -3.1% |
| test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test | 306.00 | 300.00 | -2.0% | 563.00 | 539.00 | -4.3% |
| test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test | 95.00 | 93.00 | -2.1% | 192.00 | 188.00 | -2.1% |
| test-suite :: MultiSource/Applications/ALAC/encode/alacconvert-encode.test | 95.00 | 93.00 | -2.1% | 192.00 | 188.00 | -2.1% |
| test-suite :: SingleSource/Benchmarks/Misc/flops.test | 43.00 | 42.00 | -2.3% | 49.00 | 47.00 | -4.1% |
| test-suite :: MultiSource/Benchmarks/Bullet/bullet.test | 1052.00 | 1025.00 | -2.6% | 1627.00 | 1602.00 | -1.5% |
| test-suite :: MicroBenchmarks/LCALS/SubsetCRawLoops/lcalsCRaw.test | 193.00 | 188.00 | -2.6% | 768.00 | 905.00 | 17.8% |
| test-suite :: MicroBenchmarks/LCALS/SubsetCLambdaLoops/lcalsCLambda.test | 191.00 | 186.00 | -2.6% | 766.00 | 908.00 | 18.5% |
| test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test | 201.00 | 195.00 | -3.0% | 434.00 | 410.00 | -5.5% |
| test-suite :: MultiSource/Applications/JM/lencod/lencod.test | 2662.00 | 2582.00 | -3.0% | 5693.00 | 5610.00 | -1.5% |
| test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test | 2240.00 | 2169.00 | -3.2% | 3627.00 | 3530.00 | -2.7% |
| test-suite :: MultiSource/Benchmarks/McCat/05-eks/eks.test | 123.00 | 119.00 | -3.3% | 169.00 | 170.00 | 0.6% |
| test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test | 2378.00 | 2295.00 | -3.5% | 4447.00 | 4357.00 | -2.0% |
| test-suite :: MicroBenchmarks/LCALS/SubsetBRawLoops/lcalsBRaw.test | 81.00 | 78.00 | -3.7% | 185.00 | 176.00 | -4.9% |
| test-suite :: MicroBenchmarks/LCALS/SubsetBLambdaLoops/lcalsBLambda.test | 81.00 | 78.00 | -3.7% | 185.00 | 176.00 | -4.9% |
| test-suite :: MultiSource/Applications/sqlite3/sqlite3.test | 1051.00 | 1010.00 | -3.9% | 2931.00 | 2869.00 | -2.1% |
| test-suite :: MicroBenchmarks/ImageProcessing/AnisotropicDiffusion/AnisotropicDiffusion.test | 49.00 | 47.00 | -4.1% | 64.00 | 60.00 | -6.2% |
| test-suite :: MultiSource/Benchmarks/Ptrdist/bc/bc.test | 44.00 | 42.00 | -4.5% | 85.00 | 81.00 | -4.7% |
| test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test | 1132.00 | 1080.00 | -4.6% | 1605.00 | 1519.00 | -5.4% |
| test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test | 282.00 | 269.00 | -4.6% | 537.00 | 491.00 | -8.6% |
| test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test | 347.00 | 331.00 | -4.6% | 1479.00 | 1467.00 | -0.8% |
| test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test | 1298.00 | 1236.00 | -4.8% | 3792.00 | 3537.00 | -6.7% |
| test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test | 203.00 | 193.00 | -4.9% | 336.00 | 314.00 | -6.5% |
| test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test | 769.00 | 730.00 | -5.1% | 1601.00 | 1466.00 | -8.4% |
| test-suite :: MultiSource/Benchmarks/Prolangs-C/gnugo/gnugo.test | 59.00 | 56.00 | -5.1% | 131.00 | 130.00 | -0.8% |
| test-suite :: MultiSource/Benchmarks/sim/sim.test | 112.00 | 106.00 | -5.4% | 257.00 | 241.00 | -6.2% |
| test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test | 207.00 | 195.00 | -5.8% | 344.00 | 324.00 | -5.8% |
| test-suite :: MultiSource/Benchmarks/MallocBench/cfrac/cfrac.test | 17.00 | 16.00 | -5.9% | 25.00 | 24.00 | -4.0% |
| test-suite :: MultiSource/Benchmarks/Fhourstones/fhourstones.test | 17.00 | 16.00 | -5.9% | 9.00 | 8.00 | -11.1% |
| test-suite :: MultiSource/Benchmarks/Ptrdist/yacr2/yacr2.test | 64.00 | 60.00 | -6.2% | 103.00 | 96.00 | -6.8% |
| test-suite :: SingleSource/Benchmarks/Stanford/Oscar.test | 32.00 | 30.00 | -6.2% | 104.00 | 92.00 | -11.5% |
| test-suite :: MultiSource/Applications/d/make_dparser.test | 141.00 | 132.00 | -6.4% | 432.00 | 342.00 | -20.8% |
| test-suite :: MultiSource/Applications/oggenc/oggenc.test | 1202.00 | 1123.00 | -6.6% | 2446.00 | 2123.00 | -13.2% |
| test-suite :: MultiSource/Applications/SIBsim4/SIBsim4.test | 299.00 | 279.00 | -6.7% | 678.00 | 637.00 | -6.0% |
| test-suite :: SingleSource/Benchmarks/Linpack/linpack-pc.test | 104.00 | 97.00 | -6.7% | 255.00 | 256.00 | 0.4% |
| test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test | 265.00 | 247.00 | -6.8% | 743.00 | 736.00 | -0.9% |
| test-suite :: MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode.test | 187.00 | 174.00 | -7.0% | 207.00 | 195.00 | -5.8% |
| test-suite :: MultiSource/Applications/ClamAV/clamscan.test | 1714.00 | 1593.00 | -7.1% | 3779.00 | 3353.00 | -11.3% |
| test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test | 193.00 | 178.00 | -7.8% | 431.00 | 397.00 | -7.9% |
| test-suite :: MultiSource/Benchmarks/nbench/nbench.test | 154.00 | 142.00 | -7.8% | 218.00 | 192.00 | -11.9% |
| test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test | 369.00 | 339.00 | -8.1% | 508.00 | 466.00 | -8.3% |
| test-suite :: MultiSource/Applications/minisat/minisat.test | 12.00 | 11.00 | -8.3% | 18.00 | 15.00 | -16.7% |
| test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test | 363.00 | 331.00 | -8.8% | 521.00 | 477.00 | -8.4% |
| test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test | 1124.00 | 1023.00 | -9.0% | 2202.00 | 1981.00 | -10.0% |
| test-suite :: SingleSource/Benchmarks/Polybench/stencils/adi/adi.test | 10.00 | 9.00 | -10.0% | 15.00 | 12.00 | -20.0% |
| test-suite :: MultiSource/Benchmarks/FreeBench/analyzer/analyzer.test | 10.00 | 9.00 | -10.0% | 23.00 | 20.00 | -13.0% |
| test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test | 47.00 | 42.00 | -10.6% | 47.00 | 42.00 | -10.6% |
| test-suite :: MultiSource/Applications/obsequi/Obsequi.test | 63.00 | 56.00 | -11.1% | 75.00 | 69.00 | -8.0% |
| test-suite :: MultiSource/Benchmarks/Rodinia/backprop/backprop.test | 72.00 | 64.00 | -11.1% | 91.00 | 99.00 | 8.8% |
| test-suite :: MultiSource/Benchmarks/PAQ8p/paq8p.test | 197.00 | 175.00 | -11.2% | 236.00 | 210.00 | -11.0% |
| test-suite :: MultiSource/Benchmarks/Rodinia/srad/srad.test | 42.00 | 37.00 | -11.9% | 58.00 | 55.00 | -5.2% |
| test-suite :: MultiSource/Applications/hexxagon/hexxagon.test | 63.00 | 54.00 | -14.3% | 108.00 | 86.00 | -20.4% |
| test-suite :: MultiSource/Applications/lua/lua.test | 35.00 | 30.00 | -14.3% | 48.00 | 39.00 | -18.8% |
| test-suite :: SingleSource/Benchmarks/Misc/whetstone.test | 20.00 | 17.00 | -15.0% | 20.00 | 17.00 | -15.0% |
| test-suite :: MicroBenchmarks/harris/harris.test | 20.00 | 17.00 | -15.0% | 31.00 | 26.00 | -16.1% |
| test-suite :: MultiSource/Benchmarks/Olden/mst/mst.test | 6.00 | 5.00 | -16.7% | 11.00 | 10.00 | -9.1% |
| test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/SimpleMOC/SimpleMOC.test | 112.00 | 90.00 | -19.6% | 240.00 | 220.00 | -8.3% |
| test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test | 285.00 | 229.00 | -19.6% | 464.00 | 389.00 | -16.2% |
| test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test | 5.00 | 4.00 | -20.0% | 8.00 | 5.00 | -37.5% |
| test-suite :: MultiSource/Benchmarks/Prolangs-C/agrep/agrep.test | 168.00 | 134.00 | -20.2% | 421.00 | 351.00 | -16.6% |
| test-suite :: MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1.test | 19.00 | 15.00 | -21.1% | 21.00 | 13.00 | -38.1% |
| test-suite :: MultiSource/Benchmarks/VersaBench/beamformer/beamformer.test | 74.00 | 58.00 | -21.6% | 86.00 | 68.00 | -20.9% |
| test-suite :: MultiSource/Benchmarks/BitBench/uuencode/uuencode.test | 4.00 | 3.00 | -25.0% | 4.00 | 3.00 | -25.0% |
| test-suite :: SingleSource/Benchmarks/BenchmarkGame/fannkuch.test | 4.00 | 3.00 | -25.0% | 4.00 | 3.00 | -25.0% |
| test-suite :: MultiSource/Benchmarks/Prolangs-C/bison/mybison.test | 76.00 | 55.00 | -27.6% | 96.00 | 69.00 | -28.1% |
| test-suite :: SingleSource/Benchmarks/Polybench/stencils/heat-3d/heat-3d.test | 14.00 | 10.00 | -28.6% | 17.00 | 11.00 | -35.3% |
| test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test | 2903.00 | 2051.00 | -29.3% | 3027.00 | 2231.00 | -26.3% |
| test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test | 647.00 | 447.00 | -30.9% | 741.00 | 541.00 | -27.0% |
| test-suite :: MultiSource/Benchmarks/Trimaran/enc-md5/enc-md5.test | 3.00 | 2.00 | -33.3% | 3.00 | 2.00 | -33.3% |
| test-suite :: SingleSource/Benchmarks/Adobe-C++/functionobjects.test | 12.00 | 8.00 | -33.3% | 22.00 | 11.00 | -50.0% |
| test-suite :: SingleSource/Benchmarks/McGill/chomp.test | 28.00 | 18.00 | -35.7% | 33.00 | 22.00 | -33.3% |
| test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/durbin/durbin.test | 5.00 | 3.00 | -40.0% | 8.00 | 5.00 | -37.5% |
| test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding.test | 475.00 | 271.00 | -42.9% | 479.00 | 275.00 | -42.6% |
| test-suite :: MultiSource/Applications/sgefa/sgefa.test | 30.00 | 17.00 | -43.3% | 34.00 | 16.00 | -52.9% |
| test-suite :: SingleSource/Benchmarks/Polybench/stencils/jacobi-2d/jacobi-2d.test | 17.00 | 9.00 | -47.1% | 17.00 | 9.00 | -47.1% |
| test-suite :: MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt.test | 6.00 | 3.00 | -50.0% | 9.00 | 6.00 | -33.3% |
| test-suite :: MultiSource/Benchmarks/Olden/perimeter/perimeter.test | 2.00 | 1.00 | -50.0% | 2.00 | 1.00 | -50.0% |
| test-suite :: MultiSource/Applications/viterbi/viterbi.test | 11.00 | 5.00 | -54.5% | 15.00 | 9.00 | -40.0% |
| test-suite :: MultiSource/Benchmarks/Rodinia/hotspot/hotspot.test | 5.00 | 2.00 | -60.0% | 5.00 | 2.00 | -60.0% |
| test-suite :: MultiSource/Benchmarks/FreeBench/neural/neural.test | 16.00 | 6.00 | -62.5% | 26.00 | 12.00 | -53.8% |
| test-suite :: SingleSource/Benchmarks/Stanford/Puzzle.test | 3.00 | 1.00 | -66.7% | 5.00 | 1.00 | -80.0% |
| test-suite :: SingleSource/Benchmarks/Adobe-C++/stepanov_vector.test | 91.00 | 25.00 | -72.5% | 103.00 | 29.00 | -71.8% |
| test-suite :: MultiSource/Benchmarks/Rodinia/pathfinder/pathfinder.test | 4.00 | 1.00 | -75.0% | 4.00 | 1.00 | -75.0% |
| test-suite :: SingleSource/Benchmarks/Misc-C++-EH/spirit.test | 144.00 | 18.00 | -87.5% | 448.00 | 20.00 | -95.5% |
| test-suite :: SingleSource/Benchmarks/Adobe-C++/stepanov_abstraction.test | 87.00 | 3.00 | -96.6% | 87.00 | 3.00 | -96.6% |
| test-suite :: MultiSource/Benchmarks/SciMark2-C/scimark2.test | 2.00 | 0.00 | -100.0% | 2.00 | | -100.0% |
| test-suite :: MicroBenchmarks/ImageProcessing/Dilate/Dilate.test | 0.00 | 0.00 | | | | |
| test-suite :: MicroBenchmarks/ImageProcessing/Dither/Dither.test | 0.00 | 0.00 | | | | |
| test-suite :: MicroBenchmarks/ImageProcessing/Interpolation/Interpolation.test | 0.00 | 0.00 | | | | |
| test-suite :: MicroBenchmarks/LoopInterchange/LoopInterchange.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Applications/aha/aha.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Applications/lambda-0.1.3/lambda.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/BitBench/drop3/drop3.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/BitBench/uudecode/uudecode.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/FreeBench/mason/mason.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/McCat/01-qbsort/qbsort.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/McCat/03-testtrie/testtrie.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/McCat/09-vor/vor.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/McCat/12-IOtest/iotest.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/McCat/17-bintr/bintr.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/MiBench/network-patricia/network-patricia.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/MiBench/security-sha/security-sha.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/MiBench/telecomm-CRC32/telecomm-CRC32.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/NPB-serial/is/is.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/Olden/treeadd/treeadd.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/Olden/tsp/tsp.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/Prolangs-C++/city/city.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/Prolangs-C++/employ/employ.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/Prolangs-C++/life/life.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/Prolangs-C++/ocean/ocean.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/Prolangs-C++/primes/primes.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/Prolangs-C++/simul/simul.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/Ptrdist/ft/ft.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/Trimaran/netbench-url/netbench-url.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/VersaBench/8b10b/8b10b.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/VersaBench/bmm/bmm.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/mediabench/adpcm/rawcaudio/rawcaudio.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/mediabench/adpcm/rawdaudio/rawdaudio.test | 0.00 | 0.00 | | | | |
| test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/BenchmarkGame/Large/fasta.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/BenchmarkGame/n-body.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/BenchmarkGame/nsieve-bits.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/BenchmarkGame/recursive.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/BenchmarkGame/spectral-norm.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/CoyoteBench/huffbench.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/CoyoteBench/lpbench.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Dhrystone/dry.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Dhrystone/fldry.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/McGill/queens.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Misc-C++/bigfib.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Misc-C++/mandel-text.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Misc-C++/stepanov_v1p2.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Misc/aarch64-init-cpu-features.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Misc/dt.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Misc/evalloop.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Misc/ffbench.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Misc/flops-1.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Misc/flops-2.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Misc/flops-3.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Misc/flops-4.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Misc/flops-7.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Misc/fp-convert.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Misc/lowercase.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Misc/matmul_f64_4x4.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Misc/pi.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Misc/revertBits.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Misc/salsa20.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/blas/gesummv/gesummv.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/atax/atax.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/trisolv/trisolv.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Polybench/stencils/jacobi-1d/jacobi-1d.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout-C++/EH/Shootout-C++-except.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-ackermann.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-ary.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-ary2.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-ary3.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-fibo.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-hash.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-hash2.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-heapsort.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists1.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-matrix.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-methcall.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-moments.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-nestedloop.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-objinst.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-random.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-sieve.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-strcat.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout/Shootout-ackermann.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout/Shootout-ary3.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout/Shootout-fib2.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout/Shootout-hash.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout/Shootout-heapsort.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout/Shootout-lists.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout/Shootout-matrix.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout/Shootout-methcall.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout/Shootout-nestedloop.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout/Shootout-objinst.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout/Shootout-random.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout/Shootout-sieve.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Shootout/Shootout-strcat.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Stanford/Bubblesort.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Stanford/Queens.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Stanford/Quicksort.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Stanford/Towers.test | 0.00 | 0.00 | | | | |
| test-suite :: SingleSource/Benchmarks/Stanford/Treesort.test | 0.00 | 0.00 | | | | |
| test-suite :: tools/fpcmp-target.test | 0.00 | 0.00 | | | | |
| test-suite :: tools/test/abrt.test | 0.00 | 0.00 | | | | |
| test-suite :: tools/test/check_env.test | 0.00 | 0.00 | | | | |
| test-suite :: tools/test/ret0.test | 0.00 | 0.00 | | | | |
| test-suite :: tools/test/ret1.test | 0.00 | 0.00 | | | | |
| Geomean difference | | | -10.8% | | | -11.6% |

RISC-V llvm-test-suite

Metric: regalloc.NumSpills,regalloc.NumReloads

Program regalloc.NumSpills regalloc.NumReloads
lhs rhs diff lhs rhs diff
test-suite :: SingleSource/Benchmarks/Misc/ffbench.test 0.00 1.00 inf% 1.00 inf%
test-suite :: SingleSource/Benchmarks/Misc-C++/Large/sphereflake.test 3.00 4.00 33.3% 4.00 6.00 50.0%
test-suite :: MultiSource/Applications/spiff/spiff.test 63.00 65.00 3.2% 84.00 84.00 0.0%
test-suite :: MultiSource/Benchmarks/Trimaran/enc-3des/enc-3des.test 32.00 33.00 3.1% 40.00 41.00 2.5%
test-suite :: SingleSource/Benchmarks/Misc/himenobmtxpa.test 199.00 203.00 2.0% 277.00 274.00 -1.1%
test-suite :: MultiSource/Applications/kimwitu++/kc.test 140.00 141.00 0.7% 626.00 627.00 0.2%
test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test 273.00 274.00 0.4% 498.00 502.00 0.8%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test 399.00 400.00 0.3% 683.00 694.00 1.6%
test-suite :: MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt.test 19.00 19.00 0.0% 26.00 26.00 0.0%
test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl.test 13.00 13.00 0.0% 19.00 19.00 0.0%
test-suite :: MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test 13.00 13.00 0.0% 19.00 19.00 0.0%
test-suite :: MultiSource/Benchmarks/TSVC/GlobalDataFlow-dbl/GlobalDataFlow-dbl.test 19.00 19.00 0.0% 26.00 26.00 0.0%
test-suite :: MicroBenchmarks/Builtins/Int128/Builtins.test 8.00 8.00 0.0% 12.00 12.00 0.0%
test-suite :: MultiSource/Benchmarks/TSVC/IndirectAddressing-dbl/IndirectAddressing-dbl.test 13.00 13.00 0.0% 20.00 20.00 0.0%
test-suite :: MultiSource/Benchmarks/TSVC/ControlLoops-flt/ControlLoops-flt.test 9.00 9.00 0.0% 14.00 14.00 0.0%
test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-dbl/InductionVariable-dbl.test 29.00 29.00 0.0% 36.00 36.00 0.0%
test-suite :: MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt.test 28.00 28.00 0.0% 35.00 35.00 0.0%
test-suite :: MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl.test 30.00 30.00 0.0% 40.00 40.00 0.0% test-suite :: MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt.test 32.00 32.00 0.0% 43.00 43.00 0.0% test-suite :: MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl.test 5.00 5.00 0.0% 6.00 6.00 0.0% test-suite :: MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt.test 4.00 4.00 0.0% 5.00 5.00 0.0% test-suite :: MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt.test 14.00 14.00 0.0% 21.00 21.00 0.0% test-suite :: MultiSource/Benchmarks/TSVC/ControlLoops-dbl/ControlLoops-dbl.test 9.00 9.00 0.0% 14.00 14.00 0.0% test-suite :: MultiSource/Benchmarks/TSVC/Searching-flt/Searching-flt.test 5.00 5.00 0.0% 7.00 7.00 0.0% test-suite :: MultiSource/Benchmarks/Olden/voronoi/voronoi.test 2.00 2.00 0.0% 2.00 2.00 0.0% test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 142.00 142.00 0.0% 145.00 145.00 0.0% test-suite :: MultiSource/Benchmarks/NPB-serial/is/is.test 1.00 1.00 0.0% 1.00 1.00 0.0% test-suite :: MultiSource/Benchmarks/Olden/bh/bh.test 21.00 21.00 0.0% 22.00 22.00 0.0% test-suite :: MultiSource/Benchmarks/Olden/em3d/em3d.test 1.00 1.00 0.0% 2.00 2.00 0.0% test-suite :: MultiSource/Benchmarks/Olden/mst/mst.test 4.00 4.00 0.0% 8.00 8.00 0.0% test-suite :: MultiSource/Benchmarks/Olden/power/power.test 66.00 66.00 0.0% 64.00 64.00 0.0% test-suite :: MultiSource/Benchmarks/Prolangs-C++/city/city.test 7.00 7.00 0.0% 10.00 10.00 0.0% test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt.test 62.00 62.00 0.0% 86.00 86.00 0.0% test-suite :: MultiSource/Benchmarks/Prolangs-C++/employ/employ.test 2.00 2.00 0.0% 1.00 1.00 0.0% test-suite :: MultiSource/Benchmarks/Prolangs-C++/ocean/ocean.test 3.00 3.00 0.0% 3.00 3.00 0.0% test-suite :: MultiSource/Benchmarks/Ptrdist/ks/ks.test 4.00 4.00 0.0% 6.00 6.00 0.0% test-suite :: MultiSource/Benchmarks/Ptrdist/yacr2/yacr2.test 
113.00 113.00 0.0% 178.00 178.00 0.0% test-suite :: MicroBenchmarks/ImageProcessing/AnisotropicDiffusion/AnisotropicDiffusion.test 11.00 11.00 0.0% 12.00 12.00 0.0% test-suite :: MultiSource/Benchmarks/SciMark2-C/scimark2.test 3.00 3.00 0.0% 3.00 3.00 0.0% test-suite :: MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl.test 5.00 5.00 0.0% 7.00 7.00 0.0% test-suite :: MultiSource/Benchmarks/Trimaran/enc-md5/enc-md5.test 26.00 26.00 0.0% 29.00 29.00 0.0% test-suite :: MultiSource/Benchmarks/TSVC/Symbolics-dbl/Symbolics-dbl.test 18.00 18.00 0.0% 28.00 28.00 0.0% test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/blas/syr2k/syr2k.test 3.00 3.00 0.0% 11.00 3.00 -72.7% test-suite :: SingleSource/Benchmarks/Misc/fbench.test 1.00 1.00 0.0% 1.00 1.00 0.0% test-suite :: SingleSource/Benchmarks/Misc/flops.test 6.00 6.00 0.0% 6.00 6.00 0.0% test-suite :: SingleSource/Benchmarks/Misc/mandel-2.test 6.00 6.00 0.0% 6.00 6.00 0.0% test-suite :: SingleSource/Benchmarks/Misc/whetstone.test 17.00 17.00 0.0% 16.00 16.00 0.0% test-suite :: SingleSource/Benchmarks/Polybench/datamining/correlation/correlation.test 4.00 4.00 0.0% 4.00 4.00 0.0% test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/blas/symm/symm.test 2.00 2.00 0.0% 3.00 3.00 0.0% test-suite :: SingleSource/Benchmarks/Polybench/medley/deriche/deriche.test 3.00 3.00 0.0% 3.00 3.00 0.0% test-suite :: MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt.test 18.00 18.00 0.0% 28.00 28.00 0.0% test-suite :: SingleSource/Benchmarks/Polybench/stencils/adi/adi.test 19.00 19.00 0.0% 29.00 27.00 -6.9% test-suite :: SingleSource/Benchmarks/Polybench/stencils/heat-3d/heat-3d.test 26.00 26.00 0.0% 46.00 46.00 0.0% test-suite :: SingleSource/Benchmarks/SmallPT/smallpt.test 39.00 39.00 0.0% 56.00 56.00 0.0% test-suite :: SingleSource/Benchmarks/Stanford/FloatMM.test 12.00 12.00 0.0% 13.00 13.00 0.0% test-suite :: SingleSource/Benchmarks/Stanford/Oscar.test 24.00 24.00 0.0% 162.00 140.00 -13.6% test-suite :: 
SingleSource/Benchmarks/Stanford/Queens.test 5.00 5.00 0.0% 6.00 6.00 0.0% test-suite :: SingleSource/Benchmarks/Misc-C++/stepanov_v1p2.test 8.00 8.00 0.0% 8.00 8.00 0.0% test-suite :: SingleSource/Benchmarks/Misc-C++/stepanov_container.test 1.00 1.00 0.0% 2.00 2.00 0.0% test-suite :: SingleSource/Benchmarks/Misc-C++/bigfib.test 3.00 3.00 0.0% 3.00 3.00 0.0% test-suite :: SingleSource/Benchmarks/Misc-C++/Large/ray.test 25.00 25.00 0.0% 26.00 26.00 0.0% test-suite :: SingleSource/Benchmarks/McGill/misr.test 11.00 11.00 0.0% 14.00 14.00 0.0% test-suite :: SingleSource/Benchmarks/CoyoteBench/almabench.test 74.00 74.00 0.0% 76.00 76.00 0.0% test-suite :: SingleSource/Benchmarks/BenchmarkGame/puzzle.test 1.00 1.00 0.0% 4.00 4.00 0.0% test-suite :: SingleSource/Benchmarks/BenchmarkGame/partialsums.test 6.00 6.00 0.0% 9.00 9.00 0.0% test-suite :: SingleSource/Benchmarks/BenchmarkGame/n-body.test 56.00 56.00 0.0% 56.00 56.00 0.0% test-suite :: SingleSource/Benchmarks/Adobe-C++/stepanov_vector.test 10.00 10.00 0.0% 41.00 41.00 0.0% test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test 8.00 8.00 0.0% 8.00 8.00 0.0% test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 142.00 142.00 0.0% 145.00 145.00 0.0% test-suite :: MultiSource/Benchmarks/llubenchmark/llu.test 6.00 6.00 0.0% 7.00 7.00 0.0% test-suite :: MultiSource/Benchmarks/VersaBench/dbms/dbms.test 9.00 9.00 0.0% 16.00 16.00 0.0% test-suite :: MultiSource/Benchmarks/VersaBench/beamformer/beamformer.test 134.00 134.00 0.0% 150.00 150.00 0.0% test-suite :: MultiSource/Benchmarks/Trimaran/enc-rc4/enc-rc4.test 4.00 4.00 0.0% 4.00 4.00 0.0% test-suite :: MultiSource/Benchmarks/MiBench/security-sha/security-sha.test 3.00 3.00 0.0% 3.00 3.00 0.0% test-suite :: MultiSource/Benchmarks/MiBench/telecomm-FFT/telecomm-fft.test 15.00 15.00 0.0% 24.00 24.00 0.0% test-suite :: SingleSource/Benchmarks/Stanford/RealMM.test 14.00 14.00 0.0% 15.00 15.00 0.0% test-suite :: 
MultiSource/Benchmarks/MiBench/security-rijndael/security-rijndael.test 1.00 1.00 0.0% test-suite :: MultiSource/Applications/ALAC/encode/alacconvert-encode.test 126.00 126.00 0.0% 223.00 223.00 0.0% test-suite :: MicroBenchmarks/harris/harris.test 21.00 21.00 0.0% 32.00 32.00 0.0% test-suite :: MultiSource/Benchmarks/FreeBench/distray/distray.test 101.00 101.00 0.0% 99.00 99.00 0.0% test-suite :: MultiSource/Benchmarks/BitBench/five11/five11.test 2.00 2.00 0.0% 2.00 2.00 0.0% test-suite :: MultiSource/Benchmarks/Fhourstones/fhourstones.test 15.00 15.00 0.0% 9.00 9.00 0.0% test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test 126.00 126.00 0.0% 223.00 223.00 0.0% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/XSBench/XSBench.test 29.00 29.00 0.0% 39.00 39.00 0.0% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/RSBench/rsbench.test 44.00 44.00 0.0% 67.00 67.00 0.0% test-suite :: MultiSource/Benchmarks/FreeBench/neural/neural.test 2.00 2.00 0.0% 3.00 3.00 0.0% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/Pathfinder/PathFinder.test 23.00 23.00 0.0% 21.00 21.00 0.0% test-suite :: MultiSource/Applications/lambda-0.1.3/lambda.test 19.00 19.00 0.0% 25.00 25.00 0.0% test-suite :: MultiSource/Applications/minisat/minisat.test 22.00 22.00 0.0% 32.00 32.00 0.0% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/HACCKernels/HACCKernels.test 20.00 20.00 0.0% 42.00 42.00 0.0% test-suite :: MultiSource/Applications/siod/siod.test 31.00 31.00 0.0% 36.00 36.00 0.0% test-suite :: MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk.test 18.00 18.00 0.0% 20.00 20.00 0.0% test-suite :: MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow.test 315.00 315.00 0.0% 328.00 328.00 0.0% test-suite :: MultiSource/Benchmarks/FreeBench/analyzer/analyzer.test 9.00 9.00 0.0% 9.00 9.00 0.0% test-suite :: MultiSource/Benchmarks/MiBench/network-dijkstra/network-dijkstra.test 1.00 1.00 0.0% 1.00 1.00 0.0% test-suite :: 
MicroBenchmarks/SLPVectorization/SLPVectorizationBenchmarks.test 8.00 8.00 0.0% 12.00 12.00 0.0% test-suite :: MicroBenchmarks/ImageProcessing/BilateralFiltering/BilateralFilter.test 8.00 8.00 0.0% 9.00 9.00 0.0% test-suite :: MultiSource/Benchmarks/MiBench/automotive-basicmath/automotive-basicmath.test 3.00 3.00 0.0% 6.00 6.00 0.0% test-suite :: MicroBenchmarks/ImageProcessing/Blur/blur.test 20.00 20.00 0.0% 22.00 22.00 0.0% test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test 1168.00 1167.00 -0.1% 1518.00 1487.00 -2.0% test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test 1096.00 1095.00 -0.1% 1441.00 1410.00 -2.2% test-suite :: MicroBenchmarks/LCALS/SubsetBRawLoops/lcalsBRaw.test 970.00 968.00 -0.2% 1261.00 1229.00 -2.5% test-suite :: MicroBenchmarks/LCALS/SubsetBLambdaLoops/lcalsBLambda.test 970.00 968.00 -0.2% 1261.00 1229.00 -2.5% test-suite :: MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode.test 220.00 219.00 -0.5% 314.00 311.00 -1.0% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test 388.00 386.00 -0.5% 563.00 553.00 -1.8% test-suite :: MicroBenchmarks/LCALS/SubsetCLambdaLoops/lcalsCLambda.test 1023.00 1017.00 -0.6% 1379.00 1343.00 -2.6% test-suite :: MicroBenchmarks/LCALS/SubsetCRawLoops/lcalsCRaw.test 1026.00 1019.00 -0.7% 1382.00 1345.00 -2.7% test-suite :: MultiSource/Applications/SPASS/SPASS.test 1395.00 1385.00 -0.7% 2902.00 2890.00 -0.4% test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test 2437.00 2418.00 -0.8% 6140.00 6324.00 3.0% test-suite :: MultiSource/Benchmarks/Prolangs-C/bison/mybison.test 112.00 111.00 -0.9% 151.00 149.00 -1.3% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/SimpleMOC/SimpleMOC.test 72.00 71.00 -1.4% 165.00 161.00 -2.4% test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 557.00 549.00 -1.4% 909.00 887.00 -2.4% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test 983.00 967.00 -1.6% 1506.00 
1479.00 -1.8% test-suite :: MultiSource/Benchmarks/Rodinia/backprop/backprop.test 56.00 55.00 -1.8% 58.00 57.00 -1.7% test-suite :: MultiSource/Benchmarks/Rodinia/srad/srad.test 55.00 54.00 -1.8% 78.00 73.00 -6.4% test-suite :: MultiSource/Applications/sqlite3/sqlite3.test 980.00 962.00 -1.8% 2618.00 2453.00 -6.3% test-suite :: MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl.test 54.00 53.00 -1.9% 77.00 76.00 -1.3% test-suite :: MultiSource/Benchmarks/Prolangs-C/agrep/agrep.test 210.00 206.00 -1.9% 436.00 437.00 0.2% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG/HPCCG.test 46.00 45.00 -2.2% 62.00 59.00 -4.8% test-suite :: MultiSource/Applications/hexxagon/hexxagon.test 91.00 89.00 -2.2% 161.00 159.00 -1.2% test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test 1529.00 1495.00 -2.2% 3341.00 3254.00 -2.6% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 2984.00 2916.00 -2.3% 6552.00 6425.00 -1.9% test-suite :: MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test 1212.00 1183.00 -2.4% 3061.00 2987.00 -2.4% test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test 230.00 223.00 -3.0% 438.00 430.00 -1.8% test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test 451.00 437.00 -3.1% 589.00 570.00 -3.2% test-suite :: MultiSource/Benchmarks/nbench/nbench.test 187.00 181.00 -3.2% 229.00 221.00 -3.5% test-suite :: MultiSource/Benchmarks/Rodinia/hotspot/hotspot.test 28.00 27.00 -3.6% 30.00 30.00 0.0% test-suite :: MultiSource/Benchmarks/PAQ8p/paq8p.test 252.00 243.00 -3.6% 366.00 356.00 -2.7% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 1006.00 970.00 -3.6% 1814.00 1773.00 -2.3% test-suite :: MultiSource/Applications/lua/lua.test 76.00 73.00 -3.9% 124.00 128.00 3.2% test-suite :: MultiSource/Applications/hbd/hbd.test 50.00 48.00 -4.0% 95.00 86.00 -9.5% test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test 294.00 282.00 -4.1% 441.00 423.00 -4.1% test-suite :: 
MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT.test 310.00 297.00 -4.2% 1064.00 1051.00 -1.2% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 959.00 917.00 -4.4% 1346.00 1271.00 -5.6% test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 2358.00 2252.00 -4.5% 3689.00 3441.00 -6.7% test-suite :: MultiSource/Benchmarks/McCat/05-eks/eks.test 66.00 63.00 -4.5% 78.00 75.00 -3.8% test-suite :: MultiSource/Benchmarks/sim/sim.test 176.00 168.00 -4.5% 353.00 353.00 0.0% test-suite :: MultiSource/Benchmarks/Ptrdist/bc/bc.test 43.00 41.00 -4.7% 152.00 150.00 -1.3% test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 294.00 280.00 -4.8% 449.00 429.00 -4.5% test-suite :: MultiSource/Applications/SIBsim4/SIBsim4.test 265.00 251.00 -5.3% 656.00 655.00 -0.2% test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/doitgen/doitgen.test 18.00 17.00 -5.6% 31.00 30.00 -3.2% test-suite :: MultiSource/Benchmarks/MallocBench/espresso/espresso.test 244.00 230.00 -5.7% 386.00 365.00 -5.4% test-suite :: MultiSource/Benchmarks/Prolangs-C/gnugo/gnugo.test 17.00 16.00 -5.9% 31.00 30.00 -3.2% test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl.test 15.00 14.00 -6.7% 21.00 20.00 -4.8% test-suite :: MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt.test 15.00 14.00 -6.7% 21.00 20.00 -4.8% test-suite :: MultiSource/Benchmarks/MallocBench/cfrac/cfrac.test 30.00 28.00 -6.7% 50.00 48.00 -4.0% test-suite :: MultiSource/Applications/oggenc/oggenc.test 744.00 691.00 -7.1% 1439.00 1357.00 -5.7% test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test 28.00 26.00 -7.1% 32.00 30.00 -6.2% test-suite :: MultiSource/Benchmarks/TSVC/Expansion-dbl/Expansion-dbl.test 55.00 51.00 -7.3% 66.00 62.00 -6.1% test-suite :: MultiSource/Applications/ClamAV/clamscan.test 2016.00 1869.00 -7.3% 4485.00 4069.00 -9.3% test-suite :: MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt.test 54.00 50.00 -7.4% 
65.00 61.00 -6.2% test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/gramschmidt/gramschmidt.test 13.00 12.00 -7.7% 16.00 15.00 -6.2% test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl.test 39.00 36.00 -7.7% 48.00 44.00 -8.3% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test 62.00 57.00 -8.1% 80.00 69.00 -13.7% test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test 35.00 32.00 -8.6% 32.00 29.00 -9.4% test-suite :: SingleSource/Benchmarks/Polybench/stencils/fdtd-2d/fdtd-2d.test 23.00 21.00 -8.7% 48.00 46.00 -4.2% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1163.00 1058.00 -9.0% 2094.00 1970.00 -5.9% test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-dbl/StatementReordering-dbl.test 11.00 10.00 -9.1% 14.00 13.00 -7.1% test-suite :: MultiSource/Benchmarks/TSVC/StatementReordering-flt/StatementReordering-flt.test 11.00 10.00 -9.1% 14.00 13.00 -7.1% test-suite :: MicroBenchmarks/LoopVectorization/LoopEpilogueVectorizationBenchmarks.test 65.00 59.00 -9.2% 83.00 77.00 -7.2% test-suite :: MicroBenchmarks/LoopVectorization/LoopInterleavingBenchmarks.test 65.00 59.00 -9.2% 83.00 77.00 -7.2% test-suite :: MicroBenchmarks/LoopVectorization/LoopVectorizationBenchmarks.test 65.00 59.00 -9.2% 83.00 77.00 -7.2% test-suite :: SingleSource/Benchmarks/Misc/oourafft.test 43.00 39.00 -9.3% 228.00 214.00 -6.1% test-suite :: MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk.test 107.00 97.00 -9.3% 272.00 239.00 -12.1% test-suite :: MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt.test 32.00 29.00 -9.4% 41.00 37.00 -9.8% test-suite :: MultiSource/Applications/d/make_dparser.test 219.00 197.00 -10.0% 676.00 569.00 -15.8% test-suite :: MultiSource/Benchmarks/TSVC/Equivalencing-dbl/Equivalencing-dbl.test 19.00 17.00 -10.5% 25.00 23.00 -8.0% test-suite :: MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt.test 19.00 17.00 -10.5% 25.00 23.00 -8.0% 
test-suite :: SingleSource/Benchmarks/Adobe-C++/functionobjects.test 9.00 8.00 -11.1% 39.00 38.00 -2.6% test-suite :: MultiSource/Benchmarks/BitBench/uuencode/uuencode.test 9.00 8.00 -11.1% 11.00 10.00 -9.1% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test 1610.00 1431.00 -11.1% 2979.00 2724.00 -8.6% test-suite :: MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1.test 17.00 15.00 -11.8% 15.00 13.00 -13.3% test-suite :: SingleSource/Benchmarks/McGill/chomp.test 24.00 21.00 -12.5% 28.00 24.00 -14.3% test-suite :: MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt.test 8.00 7.00 -12.5% 11.00 10.00 -9.1% test-suite :: MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl.test 8.00 7.00 -12.5% 11.00 10.00 -9.1% test-suite :: MultiSource/Applications/viterbi/viterbi.test 16.00 14.00 -12.5% 27.00 25.00 -7.4% test-suite :: MultiSource/Applications/sgefa/sgefa.test 7.00 6.00 -14.3% 7.00 6.00 -14.3% test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 768.00 626.00 -18.5% 1218.00 1076.00 -11.7% test-suite :: SingleSource/Benchmarks/Linpack/linpack-pc.test 51.00 41.00 -19.6% 93.00 85.00 -8.6% test-suite :: SingleSource/Benchmarks/Polybench/medley/nussinov/nussinov.test 5.00 4.00 -20.0% 11.00 14.00 27.3% test-suite :: SingleSource/Benchmarks/Polybench/stencils/jacobi-2d/jacobi-2d.test 11.00 8.00 -27.3% 17.00 14.00 -17.6% test-suite :: MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl.test 7.00 5.00 -28.6% 10.00 7.00 -30.0% test-suite :: MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt.test 7.00 5.00 -28.6% 10.00 7.00 -30.0% test-suite :: MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl.test 3.00 2.00 -33.3% 5.00 4.00 -20.0% test-suite :: MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt.test 3.00 2.00 -33.3% 5.00 4.00 -20.0% test-suite :: MultiSource/Applications/aha/aha.test 3.00 2.00 -33.3% 3.00 2.00 -33.3% test-suite :: SingleSource/Benchmarks/Stanford/Quicksort.test 3.00 
2.00 -33.3% 3.00 2.00 -33.3% test-suite :: MultiSource/Applications/obsequi/Obsequi.test 138.00 90.00 -34.8% 191.00 153.00 -19.9% test-suite :: SingleSource/Benchmarks/Misc/ReedSolomon.test 17.00 11.00 -35.3% 30.00 15.00 -50.0% test-suite :: SingleSource/Benchmarks/Misc-C++-EH/spirit.test 306.00 159.00 -48.0% 422.00 1.00 -99.8% test-suite :: MultiSource/Benchmarks/Ptrdist/anagram/anagram.test 6.00 3.00 -50.0% 6.00 3.00 -50.0% test-suite :: SingleSource/Benchmarks/BenchmarkGame/fannkuch.test 9.00 4.00 -55.6% 14.00 4.00 -71.4% test-suite :: MultiSource/Benchmarks/Rodinia/pathfinder/pathfinder.test 2.00 0.00 -100.0% 2.00 -100.0% test-suite :: MicroBenchmarks/ImageProcessing/Dilate/Dilate.test 0.00 0.00 test-suite :: MicroBenchmarks/ImageProcessing/Dither/Dither.test 0.00 0.00 test-suite :: MicroBenchmarks/ImageProcessing/Interpolation/Interpolation.test 0.00 0.00 test-suite :: MicroBenchmarks/LoopInterchange/LoopInterchange.test 0.00 0.00 test-suite :: MicroBenchmarks/MemFunctions/MemFunctions.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/BitBench/drop3/drop3.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/BitBench/uudecode/uudecode.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/FreeBench/mason/mason.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/McCat/01-qbsort/qbsort.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/McCat/03-testtrie/testtrie.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/McCat/04-bisect/bisect.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/McCat/08-main/main.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/McCat/09-vor/vor.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/McCat/12-IOtest/iotest.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/McCat/17-bintr/bintr.test 0.00 0.00 test-suite :: 
MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/MiBench/network-patricia/network-patricia.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/MiBench/telecomm-CRC32/telecomm-CRC32.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/Olden/bisort/bisort.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/Olden/health/health.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/Olden/perimeter/perimeter.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/Olden/treeadd/treeadd.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/Olden/tsp/tsp.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/Prolangs-C++/life/life.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/Prolangs-C++/primes/primes.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/Prolangs-C++/simul/simul.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/Ptrdist/ft/ft.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/Trimaran/netbench-crc/netbench-crc.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/Trimaran/netbench-url/netbench-url.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/VersaBench/8b10b/8b10b.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/VersaBench/bmm/bmm.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/mediabench/adpcm/rawcaudio/rawcaudio.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/mediabench/adpcm/rawdaudio/rawdaudio.test 0.00 0.00 test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Adobe-C++/stepanov_abstraction.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/BenchmarkGame/Large/fasta.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/BenchmarkGame/nsieve-bits.test 0.00 
0.00 test-suite :: SingleSource/Benchmarks/BenchmarkGame/recursive.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/BenchmarkGame/spectral-norm.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/CoyoteBench/huffbench.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/CoyoteBench/lpbench.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Dhrystone/dry.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Dhrystone/fldry.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/McGill/queens.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc-C++/mandel-text.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc-C++/oopack_v1p8.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc/dt.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc/evalloop.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc/flops-1.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc/flops-2.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc/flops-3.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc/flops-4.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc/flops-5.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc/flops-6.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc/flops-7.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc/flops-8.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc/fp-convert.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc/lowercase.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc/mandel.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc/matmul_f64_4x4.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc/perlin.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc/pi.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc/revertBits.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc/richards_benchmark.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Misc/salsa20.test 0.00 0.00 test-suite :: 
SingleSource/Benchmarks/Polybench/datamining/covariance/covariance.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/blas/gemver/gemver.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/blas/gesummv/gesummv.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/blas/syrk/syrk.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/blas/trmm/trmm.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/atax/atax.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/bicg/bicg.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/mvt/mvt.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/cholesky/cholesky.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/durbin/durbin.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/lu/lu.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/ludcmp/ludcmp.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/trisolv/trisolv.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Polybench/medley/floyd-warshall/floyd-warshall.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Polybench/stencils/jacobi-1d/jacobi-1d.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Polybench/stencils/seidel-2d/seidel-2d.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout-C++/EH/Shootout-C++-except.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-ackermann.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-ary.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-ary2.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-ary3.test 0.00 0.00 test-suite :: 
SingleSource/Benchmarks/Shootout-C++/Shootout-C++-fibo.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-hash.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-hash2.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-heapsort.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists1.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-matrix.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-methcall.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-moments.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-nestedloop.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-objinst.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-random.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-sieve.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-strcat.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout/Shootout-ackermann.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout/Shootout-ary3.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout/Shootout-fib2.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout/Shootout-hash.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout/Shootout-heapsort.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout/Shootout-lists.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout/Shootout-matrix.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout/Shootout-methcall.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout/Shootout-nestedloop.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout/Shootout-objinst.test 0.00 0.00 test-suite :: 
SingleSource/Benchmarks/Shootout/Shootout-random.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout/Shootout-sieve.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Shootout/Shootout-strcat.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Stanford/Bubblesort.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Stanford/Perm.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Stanford/Puzzle.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Stanford/Towers.test 0.00 0.00 test-suite :: SingleSource/Benchmarks/Stanford/Treesort.test 0.00 0.00 Geomean difference -5.3% -8.2% 
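
For readers checking the tables by hand, the `diff` column appears to be the relative change from `lhs` to `rhs`: 3.00 to 4.00 gives 33.3%, and 2.00 to 0.00 gives -100.0%. The sketch below reproduces that arithmetic. The helper names `pct_diff` and `geomean_diff` are hypothetical. How the summary row treats rows with zero or missing counts is not stated here, so skipping them is an assumption, and this sketch is not expected to reproduce the reported geomean exactly.

```python
import math


def pct_diff(lhs, rhs):
    """Relative change from lhs to rhs, in percent (matches the 'diff' column)."""
    if lhs == 0:
        # A zero baseline has no finite relative change.
        return math.inf if rhs > 0 else math.nan
    return (rhs / lhs - 1.0) * 100.0


def geomean_diff(pairs):
    """Geometric mean of rhs/lhs, expressed as a percentage change.

    Assumption: rows where either value is zero are skipped, because the
    geometric mean is undefined for them.
    """
    ratios = [rhs / lhs for lhs, rhs in pairs if lhs > 0 and rhs > 0]
    if not ratios:
        return math.nan
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0


if __name__ == "__main__":
    # A few (lhs, rhs) NumSpills pairs taken from the RISC-V table.
    rows = [(3.0, 4.0), (63.0, 65.0), (2.0, 0.0), (0.0, 1.0)]
    for lhs, rhs in rows:
        print(f"{lhs:.2f} {rhs:.2f} {pct_diff(lhs, rhs):.1f}%")
    print(f"geomean over nonzero rows: {geomean_diff(rows):.1f}%")
```

Averaging ratios geometrically keeps a few tiny benchmarks (for example 3 spills dropping to 2) from dominating the summary the way an arithmetic mean of percentages would.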

x86_64 llvm-test-suite

Metric: regalloc.NumSpills,regalloc.NumReloads

Program regalloc.NumSpills regalloc.NumReloads
lhs rhs diff lhs rhs diff
test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test 594.00 594.00 0.0% 737.00 737.00 0.0%
test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding.test 202.00 202.00 0.0% 215.00 215.00 0.0%
test-suite :: SingleSource/Benchmarks/BenchmarkGame/spectral-norm.test 9.00 9.00 0.0% 8.00 8.00 0.0%
test-suite :: SingleSource/Benchmarks/BenchmarkGame/n-body.test 25.00 25.00 0.0% 11.00 11.00 0.0%
test-suite :: SingleSource/Benchmarks/CoyoteBench/almabench.test 199.00 199.00 0.0% 201.00 201.00 0.0%
test-suite :: SingleSource/Benchmarks/BenchmarkGame/partialsums.test 16.00 16.00 0.0% 19.00 19.00 0.0%
test-suite :: SingleSource/Benchmarks/BenchmarkGame/recursive.test 7.00 7.00 0.0% 9.00 9.00 0.0%
test-suite :: SingleSource/Benchmarks/Misc-C++/oopack_v1p8.test 3.00 3.00 0.0% 3.00 3.00 0.0%
test-suite :: SingleSource/Benchmarks/McGill/misr.test 14.00 14.00 0.0% 18.00 18.00 0.0%
test-suite :: SingleSource/Benchmarks/McGill/queens.test 1.00 1.00 0.0% 1.00 1.00 0.0%
test-suite :: SingleSource/Benchmarks/Misc-C++/Large/ray.test 27.00 27.00 0.0% 30.00 30.00 0.0%
test-suite :: SingleSource/Benchmarks/CoyoteBench/fftbench.test 32.00 32.00 0.0% 36.00 38.00 5.6%
test-suite :: SingleSource/Benchmarks/Stanford/Puzzle.test 2.00 2.00 0.0% 2.00 2.00 0.0%
test-suite :: SingleSource/Benchmarks/Stanford/Queens.test 6.00 6.00 0.0% 7.00 7.00 0.0%
test-suite :: SingleSource/Benchmarks/Stanford/RealMM.test 26.00 26.00 0.0%
test-suite :: SingleSource/Benchmarks/Stanford/FloatMM.test 26.00 26.00 0.0%
test-suite :: SingleSource/Benchmarks/Shootout/Shootout-matrix.test 25.00 25.00 0.0% 30.00 32.00 6.7%
test-suite :: SingleSource/Benchmarks/Misc-C++/mandel-text.test 3.00 3.00 0.0% 3.00 3.00 0.0%
test-suite :: SingleSource/Benchmarks/Misc-C++/bigfib.test 13.00 13.00 0.0% 32.00 31.00 -3.1%
test-suite :: SingleSource/Benchmarks/Misc-C++/Large/sphereflake.test 17.00 17.00 0.0% 16.00 16.00 0.0%
test-suite :: SingleSource/Benchmarks/Misc-C++/stepanov_container.test 44.00 44.00 0.0% 51.00 51.00 0.0%
test-suite :: SingleSource/Benchmarks/Misc-C++/stepanov_v1p2.test 27.00 27.00 0.0% 38.00 38.00 0.0%
test-suite :: SingleSource/Benchmarks/Misc/ReedSolomon.test 18.00 18.00 0.0% 27.00 22.00 -18.5%
test-suite :: SingleSource/Benchmarks/Misc/fbench.test 65.00 65.00 0.0% 58.00 58.00 0.0%
test-suite :: SingleSource/Benchmarks/Misc/flops.test 52.00 52.00 0.0% 29.00 29.00 0.0%
test-suite :: SingleSource/Benchmarks/Misc/flops-8.test 4.00 4.00 0.0% 1.00 1.00 0.0%
test-suite :: SingleSource/Benchmarks/Misc/flops-6.test 4.00 4.00 0.0% 1.00 1.00 0.0%
test-suite :: SingleSource/Benchmarks/Misc/flops-5.test 4.00 4.00 0.0% 1.00 1.00 0.0%
test-suite :: SingleSource/Benchmarks/Polybench/datamining/correlation/correlation.test 1.00 1.00 0.0% 1.00 1.00 0.0%
test-suite :: SingleSource/Benchmarks/Misc/whetstone.test 26.00 26.00 0.0% 25.00 25.00 0.0%
test-suite :: SingleSource/Benchmarks/Misc/pi.test 1.00 1.00 0.0% 1.00 1.00 0.0%
test-suite :: SingleSource/Benchmarks/Misc/salsa20.test 13.00 13.00 0.0% 10.00 10.00 0.0%
test-suite :: SingleSource/Benchmarks/Misc/mandel.test 5.00 5.00 0.0% 2.00 2.00 0.0%
test-suite :: SingleSource/Benchmarks/Misc/mandel-2.test 7.00 7.00 0.0% 6.00 6.00 0.0%
test-suite :: SingleSource/Benchmarks/Misc/perlin.test 23.00 23.00 0.0% 21.00 21.00 0.0%
test-suite :: SingleSource/Benchmarks/Misc/matmul_f64_4x4.test 6.00 6.00 0.0% 7.00 7.00 0.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/blas/gemver/gemver.test 3.00 3.00 0.0% 5.00 5.00 0.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/mvt/mvt.test 2.00 2.00 0.0% 5.00 5.00 0.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/doitgen/doitgen.test 31.00 31.00 0.0% 30.00 30.00 0.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/bicg/bicg.test 1.00 1.00 0.0% 1.00 1.00 0.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/blas/syrk/syrk.test 5.00 5.00 0.0% 21.00 21.00 0.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/blas/syr2k/syr2k.test 5.00 5.00 0.0% 19.00 19.00 0.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/blas/symm/symm.test 2.00 2.00 0.0% 7.00 7.00 0.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/blas/gesummv/gesummv.test 1.00 1.00 0.0% 1.00 1.00 0.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/ludcmp/ludcmp.test 3.00 3.00 0.0% 3.00 3.00 0.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/gramschmidt/gramschmidt.test 5.00 5.00 0.0% 6.00 6.00 0.0%
test-suite :: SingleSource/Benchmarks/Polybench/medley/deriche/deriche.test 2.00 2.00 0.0% 2.00 2.00 0.0%
test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-sieve.test 1.00 1.00 0.0% 1.00 1.00 0.0%
test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-matrix.test 94.00 94.00 0.0% 173.00 175.00 1.2%
test-suite :: SingleSource/Benchmarks/SmallPT/smallpt.test 91.00 91.00 0.0% 145.00 145.00 0.0%
test-suite :: SingleSource/Benchmarks/Shootout/Shootout-hash.test 2.00 2.00 0.0% 9.00 9.00 0.0%
test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/durbin/durbin.test 9.00 9.00 0.0% 11.00 11.00 0.0%
test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-ary3.test 1.00 1.00 0.0% 2.00 2.00 0.0%
test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-moments.test 5.00 5.00 0.0% 5.00 5.00 0.0%
test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-ary.test 1.00 1.00 0.0% 2.00 2.00 0.0%
test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 436.00 432.00 -0.9% 355.00 351.00 -1.1%
test-suite :: SingleSource/Benchmarks/Linpack/linpack-pc.test 410.00 403.00 -1.7% 442.00 445.00 0.7%
test-suite :: 
SingleSource/Benchmarks/Misc-C++-EH/spirit.test 167.00 163.00 -2.4% 513.00 507.00 -1.2% test-suite :: SingleSource/Benchmarks/McGill/chomp.test 34.00 33.00 -2.9% 43.00 41.00 -4.7% test-suite :: SingleSource/Benchmarks/Misc/himenobmtxpa.test 103.00 98.00 -4.9% 107.00 103.00 -3.7% test-suite :: SingleSource/Benchmarks/Polybench/stencils/adi/adi.test 15.00 14.00 -6.7% 29.00 26.00 -10.3% test-suite :: SingleSource/Benchmarks/BenchmarkGame/fannkuch.test 9.00 8.00 -11.1% 15.00 6.00 -60.0% test-suite :: SingleSource/Benchmarks/Adobe-C++/functionobjects.test 50.00 44.00 -12.0% 103.00 91.00 -11.7% test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-hash.test 8.00 7.00 -12.5% 13.00 12.00 -7.7% test-suite :: SingleSource/Benchmarks/Misc/ffbench.test 14.00 12.00 -14.3% 17.00 15.00 -11.8% test-suite :: SingleSource/Benchmarks/Polybench/stencils/heat-3d/heat-3d.test 7.00 6.00 -14.3% 7.00 6.00 -14.3% test-suite :: SingleSource/Benchmarks/Stanford/Oscar.test 52.00 44.00 -15.4% 88.00 77.00 -12.5% test-suite :: SingleSource/Benchmarks/Misc/oourafft.test 150.00 126.00 -16.0% 551.00 497.00 -9.8% test-suite :: SingleSource/Benchmarks/Polybench/stencils/fdtd-2d/fdtd-2d.test 20.00 16.00 -20.0% 36.00 30.00 -16.7% test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-hash2.test 13.00 10.00 -23.1% 15.00 12.00 -20.0% test-suite :: SingleSource/Benchmarks/CoyoteBench/lpbench.test 24.00 18.00 -25.0% 29.00 26.00 -10.3% test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/cholesky/cholesky.test 3.00 2.00 -33.3% 4.00 3.00 -25.0% test-suite :: SingleSource/Benchmarks/Adobe-C++/stepanov_vector.test 90.00 59.00 -34.4% 95.00 52.00 -45.3% test-suite :: SingleSource/Benchmarks/Adobe-C++/stepanov_abstraction.test 122.00 62.00 -49.2% 103.00 25.00 -75.7% test-suite :: SingleSource/Benchmarks/Polybench/medley/nussinov/nussinov.test 4.00 2.00 -50.0% 13.00 11.00 -15.4% test-suite :: SingleSource/Benchmarks/Polybench/stencils/jacobi-2d/jacobi-2d.test 2.00 1.00 -50.0% 
5.00 5.00 0.0% test-suite :: SingleSource/Benchmarks/BenchmarkGame/Large/fasta.test test-suite :: SingleSource/Benchmarks/BenchmarkGame/nsieve-bits.test test-suite :: SingleSource/Benchmarks/BenchmarkGame/puzzle.test test-suite :: SingleSource/Benchmarks/CoyoteBench/huffbench.test test-suite :: SingleSource/Benchmarks/Dhrystone/dry.test test-suite :: SingleSource/Benchmarks/Dhrystone/fldry.test test-suite :: SingleSource/Benchmarks/Misc/dt.test test-suite :: SingleSource/Benchmarks/Misc/evalloop.test test-suite :: SingleSource/Benchmarks/Misc/flops-1.test test-suite :: SingleSource/Benchmarks/Misc/flops-2.test test-suite :: SingleSource/Benchmarks/Misc/flops-3.test test-suite :: SingleSource/Benchmarks/Misc/flops-4.test test-suite :: SingleSource/Benchmarks/Misc/flops-7.test test-suite :: SingleSource/Benchmarks/Misc/fp-convert.test test-suite :: SingleSource/Benchmarks/Misc/lowercase.test test-suite :: SingleSource/Benchmarks/Misc/revertBits.test test-suite :: SingleSource/Benchmarks/Misc/richards_benchmark.test test-suite :: SingleSource/Benchmarks/Polybench/datamining/covariance/covariance.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/blas/trmm/trmm.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/kernels/atax/atax.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/lu/lu.test test-suite :: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/trisolv/trisolv.test test-suite :: SingleSource/Benchmarks/Polybench/medley/floyd-warshall/floyd-warshall.test test-suite :: SingleSource/Benchmarks/Polybench/stencils/jacobi-1d/jacobi-1d.test test-suite :: SingleSource/Benchmarks/Polybench/stencils/seidel-2d/seidel-2d.test test-suite :: SingleSource/Benchmarks/Shootout-C++/EH/Shootout-C++-except.test test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-ackermann.test test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-ary2.test test-suite :: 
SingleSource/Benchmarks/Shootout-C++/Shootout-C++-fibo.test test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-heapsort.test test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists.test test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists1.test test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-methcall.test test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-nestedloop.test test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-objinst.test test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-random.test test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-strcat.test test-suite :: SingleSource/Benchmarks/Shootout/Shootout-ackermann.test test-suite :: SingleSource/Benchmarks/Shootout/Shootout-ary3.test test-suite :: SingleSource/Benchmarks/Shootout/Shootout-fib2.test test-suite :: SingleSource/Benchmarks/Shootout/Shootout-heapsort.test test-suite :: SingleSource/Benchmarks/Shootout/Shootout-lists.test test-suite :: SingleSource/Benchmarks/Shootout/Shootout-methcall.test test-suite :: SingleSource/Benchmarks/Shootout/Shootout-nestedloop.test test-suite :: SingleSource/Benchmarks/Shootout/Shootout-objinst.test test-suite :: SingleSource/Benchmarks/Shootout/Shootout-random.test test-suite :: SingleSource/Benchmarks/Shootout/Shootout-sieve.test test-suite :: SingleSource/Benchmarks/Shootout/Shootout-strcat.test test-suite :: SingleSource/Benchmarks/Stanford/Bubblesort.test test-suite :: SingleSource/Benchmarks/Stanford/Perm.test test-suite :: SingleSource/Benchmarks/Stanford/Quicksort.test test-suite :: SingleSource/Benchmarks/Stanford/Towers.test test-suite :: SingleSource/Benchmarks/Stanford/Treesort.test Geomean difference -6.2% -6.5% 

RISC-V SPEC CPU 2017

Metric: regalloc.NumSpills,regalloc.NumReloads

| Program | regalloc.NumSpills lhs | rhs | diff | regalloc.NumReloads lhs | rhs | diff |
|---|---|---|---|---|---|---|
| test-suite :: External/SPEC/CINT2017speed/605.mcf_s/605.mcf_s.test | 103.00 | 104.00 | 1.0% | 196.00 | 196.00 | 0.0% |
| test-suite :: External/SPEC/CINT2017rate/505.mcf_r/505.mcf_r.test | 103.00 | 104.00 | 1.0% | 196.00 | 196.00 | 0.0% |
| test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test | 6569.00 | 6568.00 | -0.0% | 15245.00 | 15177.00 | -0.4% |
| test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test | 4373.00 | 4361.00 | -0.3% | 9975.00 | 9647.00 | -3.3% |
| test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test | 4373.00 | 4361.00 | -0.3% | 9975.00 | 9647.00 | -3.3% |
| test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test | 43211.00 | 42791.00 | -1.0% | 76421.00 | 75706.00 | -0.9% |
| test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test | 11214.00 | 11035.00 | -1.6% | 24565.00 | 24268.00 | -1.2% |
| test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test | 11214.00 | 11035.00 | -1.6% | 24565.00 | 24268.00 | -1.2% |
| test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test | 305.00 | 300.00 | -1.6% | 434.00 | 420.00 | -3.2% |
| test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test | 305.00 | 300.00 | -1.6% | 434.00 | 420.00 | -3.2% |
| test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test | 1897.00 | 1858.00 | -2.1% | 4173.00 | 4063.00 | -2.6% |
| test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test | 1897.00 | 1858.00 | -2.1% | 4173.00 | 4063.00 | -2.6% |
| test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test | 707.00 | 690.00 | -2.4% | 1049.00 | 1026.00 | -2.2% |
| test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test | 707.00 | 690.00 | -2.4% | 1049.00 | 1026.00 | -2.2% |
| test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test | 12927.00 | 12452.00 | -3.7% | 25357.00 | 24688.00 | -2.6% |
| test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test | 3461.00 | 3333.00 | -3.7% | 8736.00 | 7990.00 | -8.5% |
| test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test | 3461.00 | 3333.00 | -3.7% | 8736.00 | 7990.00 | -8.5% |
| test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test | 1778.00 | 1703.00 | -4.2% | 2978.00 | 2808.00 | -5.7% |
| test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test | 676.00 | 646.00 | -4.4% | 1256.00 | 1189.00 | -5.3% |
| test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test | 676.00 | 646.00 | -4.4% | 1256.00 | 1189.00 | -5.3% |
| test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test | 289.00 | 265.00 | -8.3% | 552.00 | 515.00 | -6.7% |
| test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test | 289.00 | 265.00 | -8.3% | 552.00 | 515.00 | -6.7% |
| test-suite :: External/SPEC/CINT2017speed/657.xz_s/657.xz_s.test | 297.00 | 269.00 | -9.4% | 523.00 | 470.00 | -10.1% |
| test-suite :: External/SPEC/CINT2017rate/557.xz_r/557.xz_r.test | 297.00 | 269.00 | -9.4% | 523.00 | 470.00 | -10.1% |
| test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test | 57.00 | 50.00 | -12.3% | 58.00 | 51.00 | -12.1% |
| test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test | 53.00 | 46.00 | -13.2% | 54.00 | 47.00 | -13.0% |
| test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test | 1665.00 | 1397.00 | -16.1% | 2642.00 | 2243.00 | -15.1% |
| test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test | 1665.00 | 1397.00 | -16.1% | 2642.00 | 2243.00 | -15.1% |
| Geomean difference | | | -4.9% | | | -5.5% |

x86_64 SPEC CPU 2017

| Program | regalloc.NumSpills lhs | rhs | diff | regalloc.NumReloads lhs | rhs | diff |
|---|---|---|---|---|---|---|
| test-suite :: External/SPEC/CFP2017rate/519.lbm_r/519.lbm_r.test | 94.00 | 94.00 | 0.0% | 83.00 | 83.00 | 0.0% |
| test-suite :: External/SPEC/CFP2017speed/619.lbm_s/619.lbm_s.test | 94.00 | 94.00 | 0.0% | 83.00 | 83.00 | 0.0% |
| test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test | 10919.00 | 10907.00 | -0.1% | 25216.00 | 25094.00 | -0.5% |
| test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test | 5367.00 | 5319.00 | -0.9% | 7102.00 | 6922.00 | -2.5% |
| test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test | 15560.00 | 15405.00 | -1.0% | 38061.00 | 37357.00 | -1.8% |
| test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test | 15560.00 | 15405.00 | -1.0% | 38061.00 | 37357.00 | -1.8% |
| test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test | 5004.00 | 4935.00 | -1.4% | 14091.00 | 13422.00 | -4.7% |
| test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test | 5004.00 | 4935.00 | -1.4% | 14091.00 | 13422.00 | -4.7% |
| test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test | 373.00 | 367.00 | -1.6% | 829.00 | 813.00 | -1.9% |
| test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test | 373.00 | 367.00 | -1.6% | 829.00 | 813.00 | -1.9% |
| test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test | 31906.00 | 31247.00 | -2.1% | 56840.00 | 54384.00 | -4.3% |
| test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test | 7248.00 | 7092.00 | -2.2% | 18435.00 | 17948.00 | -2.6% |
| test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test | 7248.00 | 7092.00 | -2.2% | 18435.00 | 17948.00 | -2.6% |
| test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test | 47779.00 | 46669.00 | -2.3% | 87284.00 | 86015.00 | -1.5% |
| test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test | 2539.00 | 2456.00 | -3.3% | 5032.00 | 4861.00 | -3.4% |
| test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test | 2539.00 | 2456.00 | -3.3% | 5032.00 | 4861.00 | -3.4% |
| test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test | 948.00 | 917.00 | -3.3% | 1376.00 | 1343.00 | -2.4% |
| test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test | 948.00 | 917.00 | -3.3% | 1376.00 | 1343.00 | -2.4% |
| test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test | 588.00 | 568.00 | -3.4% | 722.00 | 693.00 | -4.0% |
| test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test | 588.00 | 568.00 | -3.4% | 722.00 | 693.00 | -4.0% |
| test-suite :: External/SPEC/CINT2017speed/605.mcf_s/605.mcf_s.test | 166.00 | 160.00 | -3.6% | 403.00 | 392.00 | -2.7% |
| test-suite :: External/SPEC/CINT2017rate/505.mcf_r/505.mcf_r.test | 166.00 | 160.00 | -3.6% | 403.00 | 392.00 | -2.7% |
| test-suite :: External/SPEC/CINT2017rate/520.omnetpp_r/520.omnetpp_r.test | 2298.00 | 2158.00 | -6.1% | 3386.00 | 3192.00 | -5.7% |
| test-suite :: External/SPEC/CINT2017speed/620.omnetpp_s/620.omnetpp_s.test | 2298.00 | 2158.00 | -6.1% | 3386.00 | 3192.00 | -5.7% |
| test-suite :: External/SPEC/CINT2017speed/657.xz_s/657.xz_s.test | 787.00 | 734.00 | -6.7% | 1023.00 | 923.00 | -9.8% |
| test-suite :: External/SPEC/CINT2017rate/557.xz_r/557.xz_r.test | 787.00 | 734.00 | -6.7% | 1023.00 | 923.00 | -9.8% |
| test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test | 5268.00 | 4786.00 | -9.1% | 10284.00 | 9114.00 | -11.4% |
| test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test | 5268.00 | 4786.00 | -9.1% | 10284.00 | 9114.00 | -11.4% |
| Geomean difference | | | -3.2% | | | -4.0% |
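
The "Geomean difference" rows aggregate the per-program changes into one figure. Here is a minimal sketch of how such a figure can be computed, assuming it is the geometric mean of the per-program rhs/lhs ratios minus one. The comparison script's handling of missing or zero values is not shown in this thread, so the hypothetical helper `geomeanDifference` below simply requires positive inputs.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical helper: relative change of rhs versus lhs, aggregated as a
// geometric mean. Returns a fraction, so -0.05 means "5% fewer".
// Assumes every lhs and rhs value is strictly positive.
double geomeanDifference(const std::vector<std::pair<double, double>> &Rows) {
  if (Rows.empty())
    return 0.0;
  double LogSum = 0.0;
  for (const std::pair<double, double> &Row : Rows)
    LogSum += std::log(Row.second / Row.first); // log of the rhs/lhs ratio
  return std::exp(LogSum / static_cast<double>(Rows.size())) - 1.0;
}
```

Feeding it (lhs, rhs) pairs such as (94, 94), (10919, 10907) and (5367, 5319) from the NumSpills columns above would give the geomean change over just those three programs. Averaging in log space means a halving and a doubling cancel out, which a plain arithmetic mean of percentages would not do.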

@hstk30-hw hstk30-hw requested review from asb, efriedma-quic, guy-david and preames and removed request for asb and guy-david September 17, 2025 01:09
Collaborator

@preames preames left a comment

So, I think this is generally the right direction, and am very supportive of the work.

However, I want to suggest a code structure change. I think we need to split the current isTriviallyReMaterializable into two. Version 1 keeps the current behavior. Version 2 explicitly allows the virtual regs, and the caller takes the responsibility for checking liveness.

As was discussed in the old Phabricator review, I think there's an important difference between "we know this instruction is going to be rematerializable in the future" and "we know this instruction is rematerializable right now". The latter question gets to use a lot more information.

The tricky bit is that I think we already have this distinction in the current code, and just don't realize it. Several of the backends (AMDGPU, RISC-V for vector ops) already allow rematerialization of instructions with live virt regs!

My suggestion would be something along the lines of removing Trivially from the name of isTriviallyReMaterializable, and instead passing a boolean argument named "DisallowVRegUses". Most callers pass true, with the one in LRE passing false.

This also allows targets to "opt in" to the new behavior. Backends that want to keep the old behavior could unconditionally pass true to the generic implementation in their target hook.
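
To make the proposed shape concrete, here is a minimal sketch of a predicate with a `DisallowVRegUses` flag. It is an illustration under stated assumptions, not LLVM code: `Operand`, `Instr` and this `isReMaterializable` signature are hypothetical stand-ins, and the toy model ignores the other conditions the real check applies (for example, restrictions on physical register operands).

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for machine operands and instructions; not the LLVM types.
struct Operand {
  bool IsReg;     // false for immediates
  bool IsVirtual; // only meaningful when IsReg is true
  bool IsDef;     // true for the register the instruction defines
};

struct Instr {
  std::vector<Operand> Operands;
};

// Hypothetical predicate. DisallowVRegUses == true models the old "trivial"
// behaviour; false accepts virtual register uses and leaves the liveness
// check to the caller.
bool isReMaterializable(const Instr &MI, bool DisallowVRegUses) {
  for (const Operand &MO : MI.Operands) {
    if (!MO.IsReg || MO.IsDef)
      continue; // immediates and the def itself never block remat here
    if (MO.IsVirtual && DisallowVRegUses)
      return false; // a virtual register use is "non-trivial"
  }
  return true;
}
```

In this model a load-immediate passes with either flag value, while an add that reads a virtual register passes only when the caller sets `DisallowVRegUses` to false and takes on the liveness check itself.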

@preames
Collaborator

preames commented Sep 22, 2025

#160153 is a starting point on the alternative approach I was suggesting in my last comment.

@preames
Collaborator

preames commented Sep 26, 2025

Just to note, a bunch of changes have gone in with the goal of making this change more straightforward. An API reorg (#160377) made it easier to audit the call sites and their expectations. We've been auditing call sites one by one to try to figure out the expected behavior between trivial and non-trivial remat (since we actually have both already, just in a much less aggressive form).

We had two heuristic changes which were prerequisites for this change, both have now landed:
#159180, and #160709

At this point, I think we're ready to rebase this, double check the perf impact again, and then move forward with landing this in the next few days. We'll need to audit the remaining callsites for non-trivial remat one more time, and maybe we'll find another blocker, but at the moment, I don't know of any.

Luke, when you rebase, please make sure to adjust the framing on the review description. As we've discussed, this isn't actually introducing the concept of non-trivial remat - we had two backends abusing the prior APIs to achieve this - it's "simply" greatly increasing how aggressive we are about non-trivial remat by default. Framing it that way should make it easier to understand for later readers.

@lukel97 lukel97 changed the title from "[RegAlloc] Allow rematerialization with virtual reg uses" to "[RegAlloc] Remove default restriction on non-trivial rematerialization" Sep 30, 2025
@lukel97 lukel97 force-pushed the regalloc/allow-virtreg-remat branch from b51bc96 to 608eabb Compare September 30, 2025 10:25
Stacked on llvm#159180. Unless overridden by the target, we currently only allow rematerialization of instructions with immediate or constant physical register operands, i.e. no virtual registers. The comment states that this is because we might increase a live range of the virtual register, but we don't actually do this. LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize instructions whose virtual registers are already live at the use sites. This patch relaxes this constraint, which significantly reduces the number of reloads across various targets. This is another attempt at https://reviews.llvm.org/D106408, but llvm#159180 aims to have addressed the issue with the weights that may have caused the previous regressions.
@lukel97 lukel97 force-pushed the regalloc/allow-virtreg-remat branch from 608eabb to 6c47182 Compare September 30, 2025 10:40
@lukel97
Contributor Author

lukel97 commented Sep 30, 2025

This should be ready for review now. I've rebased and rerun the results on rva23u64 -O3 and arm64-apple-darwin -O3, and there was virtually no change (< 0.1%) from the previous results in the number of registers spilled/reloaded. I've also updated the PR description to clarify that we actually previously had non-trivial remat, and to mention the other work that went into untangling the API.

@llvmbot
Member

llvmbot commented Sep 30, 2025

@llvm/pr-subscribers-backend-systemz
@llvm/pr-subscribers-llvm-globalisel
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

In the register allocator we define non-trivial rematerialization as the rematerialization of an instruction with virtual register uses.

We have been able to perform non-trivial rematerialization for a while, but it has been prevented by default unless specifically overridden by the target in TargetInstrInfo::isReMaterializableImpl. The original reasoning, given by the comment in the default implementation, is that we might increase a live range of the virtual register, but we don't actually do this. LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize instructions whose virtual registers are already live at the use sites.

https://reviews.llvm.org/D106408 had originally tried to remove this restriction but it was reverted after some performance regressions were reported. We think it is likely that the regressions were caused by the fact that the old isTriviallyReMaterializable API sometimes returned true for non-trivial rematerializations.

However #160377 recently split the API out into a separate non-trivial and trivial version and updated the call-sites accordingly, and #160709 and #159180 fixed heuristics which weren't accounting for the difference between non-trivial and trivial.

With these fixes in place, this patch proposes to again allow non-trivial rematerialization by default which reduces a significant amount of spills and reloads across various targets.

| | llvm-test-suite regalloc.NumSpills geomean | llvm-test-suite regalloc.NumReloads geomean |
|---|---|---|
| -target riscv64-linux-gnu -march=rva23u64 -O3 | -5.2% | -8.1% |
| -target arm64-apple-darwin -O3 | -10.8% | -11.6% |
| -target x86_64-linux-gnu -O3 | -6.2% | -6.5% |

| | SPEC CPU 2017 regalloc.NumSpills geomean | SPEC CPU 2017 regalloc.NumReloads geomean |
|---|---|---|
| -target riscv64-linux-gnu -march=rva23u64 -O3 | -4.9% | -5.5% |
| -target x86_64-linux-gnu -O3 | -3.2% | -4.0% |

I wasn't able to build SPEC CPU 2017 on arm64-apple-darwin due to incompatibilities with the macOS SDK headers.

This also allows us to rematerialize loads and stores on RISC-V in a future patch.
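
The liveness argument in the description can be illustrated with a toy model. The sketch below is a simplified stand-in for the idea behind LiveRangeEdit::allUsesAvailableAt, not its actual implementation. `Segment`, `LiveIntervals`, `valueAt` and `allUsesAvailableAtSketch` are hypothetical names, and slot indexes are plain integers. The assumption is that rematerializing at a use site is only acceptable when each virtual register read by the original instruction still holds the same value there, so no live range has to be extended.

```cpp
#include <cassert>
#include <map>
#include <vector>

// One live segment [Start, End) of a virtual register carrying value ValueId.
struct Segment {
  int Start;
  int End;
  int ValueId;
};

// Hypothetical liveness table: virtual register number -> its segments.
using LiveIntervals = std::map<int, std::vector<Segment>>;

// Returns the value id live at Idx, or -1 if the register is not live there.
int valueAt(const LiveIntervals &LIS, int VReg, int Idx) {
  auto It = LIS.find(VReg);
  if (It == LIS.end())
    return -1;
  for (const Segment &S : It->second)
    if (S.Start <= Idx && Idx < S.End)
      return S.ValueId;
  return -1;
}

// True if every virtual register used by the original instruction (at
// OrigIdx) is live with the same value at UseIdx.
bool allUsesAvailableAtSketch(const LiveIntervals &LIS,
                              const std::vector<int> &VRegUses, int OrigIdx,
                              int UseIdx) {
  for (int VReg : VRegUses) {
    int OrigVal = valueAt(LIS, VReg, OrigIdx);
    if (OrigVal == -1 || valueAt(LIS, VReg, UseIdx) != OrigVal)
      return false;
  }
  return true;
}
```

Under this model, an instruction whose operand is redefined or dead before the use site is rejected, so accepting virtual register uses in the target-independent predicate does not by itself lengthen any live range.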


Patch is 218.03 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159211.diff

28 Files Affected:

  • (modified) llvm/lib/CodeGen/TargetInstrInfo.cpp (-6)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/split-wide-shifts-multiway.ll (+415-397)
  • (modified) llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll (+6-6)
  • (modified) llvm/test/CodeGen/AArch64/machine-combiner-copy.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/machine-licm-sub-loop.ll (+23-24)
  • (modified) llvm/test/CodeGen/AArch64/peephole-and-tst.ll (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/reserveXreg-for-regalloc.ll (+1-5)
  • (modified) llvm/test/CodeGen/AArch64/tbl-loops.ll (+4-4)
  • (modified) llvm/test/CodeGen/ARM/combine-movc-sub.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/pr69586.ll (+102-102)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/reductions.ll (+3-3)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/spillingmove.ll (+72-79)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/varying-outer-2d-reduction.ll (+40-40)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination.ll (+6-7)
  • (modified) llvm/test/CodeGen/Thumb2/mve-float16regloops.ll (+40-42)
  • (modified) llvm/test/CodeGen/Thumb2/mve-float32regloops.ll (+49-51)
  • (modified) llvm/test/CodeGen/Thumb2/mve-gather-increment.ll (+138-140)
  • (modified) llvm/test/CodeGen/Thumb2/mve-phireg.ll (+14-16)
  • (modified) llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll (+238-281)
  • (modified) llvm/test/CodeGen/Thumb2/mve-qrintrsplat.ll (+10-12)
  • (modified) llvm/test/CodeGen/Thumb2/mve-scatter-increment.ll (+5-5)
  • (modified) llvm/test/CodeGen/Thumb2/mve-vecreduce-addpred.ll (+6-10)
  • (modified) llvm/test/CodeGen/Thumb2/mve-vecreduce-mlapred.ll (+41-51)
  • (modified) llvm/test/CodeGen/X86/AMX/amx-greedy-ra-spill-shape.ll (+37-45)
  • (modified) llvm/test/CodeGen/X86/dag-update-nodetomatch.ll (+2-3)
  • (modified) llvm/test/CodeGen/X86/delete-dead-instrs-with-live-uses.mir (+2-2)
  • (modified) llvm/test/CodeGen/X86/inalloca-invoke.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/licm-regpressure.ll (+56-6)
diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index 2f3b7a2c8fcdf..3c41bbeb4b327 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -1657,12 +1657,6 @@ bool TargetInstrInfo::isReMaterializableImpl(
     // same virtual register, though.
     if (MO.isDef() && Reg != DefReg)
       return false;
-
-    // Don't allow any virtual-register uses. Rematting an instruction with
-    // virtual register uses would length the live ranges of the uses, which
-    // is not necessarily a good idea, certainly not "trivial".
-    if (MO.isUse())
-      return false;
   }
 
   // Everything checked out.
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/split-wide-shifts-multiway.ll b/llvm/test/CodeGen/AArch64/GlobalISel/split-wide-shifts-multiway.ll
index ed68723e470a2..41f7ab89094ad 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/split-wide-shifts-multiway.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/split-wide-shifts-multiway.ll
@@ -1219,14 +1219,14 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
 ;
 ; GISEL-LABEL: test_shl_i1024:
 ; GISEL: ; %bb.0: ; %entry
-; GISEL-NEXT: sub sp, sp, #416
-; GISEL-NEXT: stp x28, x27, [sp, #320] ; 16-byte Folded Spill
-; GISEL-NEXT: stp x26, x25, [sp, #336] ; 16-byte Folded Spill
-; GISEL-NEXT: stp x24, x23, [sp, #352] ; 16-byte Folded Spill
-; GISEL-NEXT: stp x22, x21, [sp, #368] ; 16-byte Folded Spill
-; GISEL-NEXT: stp x20, x19, [sp, #384] ; 16-byte Folded Spill
-; GISEL-NEXT: stp x29, x30, [sp, #400] ; 16-byte Folded Spill
-; GISEL-NEXT: .cfi_def_cfa_offset 416
+; GISEL-NEXT: sub sp, sp, #432
+; GISEL-NEXT: stp x28, x27, [sp, #336] ; 16-byte Folded Spill
+; GISEL-NEXT: stp x26, x25, [sp, #352] ; 16-byte Folded Spill
+; GISEL-NEXT: stp x24, x23, [sp, #368] ; 16-byte Folded Spill
+; GISEL-NEXT: stp x22, x21, [sp, #384] ; 16-byte Folded Spill
+; GISEL-NEXT: stp x20, x19, [sp, #400] ; 16-byte Folded Spill
+; GISEL-NEXT: stp x29, x30, [sp, #416] ; 16-byte Folded Spill
+; GISEL-NEXT: .cfi_def_cfa_offset 432
 ; GISEL-NEXT: .cfi_offset w30, -8
 ; GISEL-NEXT: .cfi_offset w29, -16
 ; GISEL-NEXT: .cfi_offset w19, -24
@@ -1242,38 +1242,44 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
 ; GISEL-NEXT: ldp x10, x11, [x1]
 ; GISEL-NEXT: mov w8, w2
 ; GISEL-NEXT: lsr x9, x8, #6
-; GISEL-NEXT: and x16, x8, #0x3f
+; GISEL-NEXT: and x12, x8, #0x3f
+; GISEL-NEXT: str x0, [sp, #144] ; 8-byte Folded Spill
+; GISEL-NEXT: and x14, x8, #0x3f
 ; GISEL-NEXT: mov w13, #64 ; =0x40
-; GISEL-NEXT: sub x21, x13, x16
-; GISEL-NEXT: str x0, [sp, #112] ; 8-byte Folded Spill
-; GISEL-NEXT: mov x24, x16
-; GISEL-NEXT: lsl x25, x10, x16
+; GISEL-NEXT: and x16, x8, #0x3f
+; GISEL-NEXT: lsl x0, x10, x12
 ; GISEL-NEXT: cmp x9, #0
-; GISEL-NEXT: lsr x26, x10, x21
-; GISEL-NEXT: lsl x2, x11, x16
-; GISEL-NEXT: lsr x23, x11, x21
-; GISEL-NEXT: mov x22, x21
-; GISEL-NEXT: csel x12, x25, xzr, eq
+; GISEL-NEXT: sub x2, x13, x14
+; GISEL-NEXT: lsr x3, x10, x2
+; GISEL-NEXT: lsl x6, x11, x14
+; GISEL-NEXT: and x14, x8, #0x3f
+; GISEL-NEXT: csel x12, x0, xzr, eq
 ; GISEL-NEXT: cmp x9, #1
-; GISEL-NEXT: str x1, [sp, #312] ; 8-byte Folded Spill
+; GISEL-NEXT: lsr x20, x11, x2
 ; GISEL-NEXT: csel x12, xzr, x12, eq
 ; GISEL-NEXT: cmp x9, #2
-; GISEL-NEXT: str x23, [sp, #208] ; 8-byte Folded Spill
+; GISEL-NEXT: mov x24, x0
 ; GISEL-NEXT: csel x12, xzr, x12, eq
 ; GISEL-NEXT: cmp x9, #3
-; GISEL-NEXT: stp x24, x22, [sp, #40] ; 16-byte Folded Spill
+; GISEL-NEXT: mov x7, x3
 ; GISEL-NEXT: csel x12, xzr, x12, eq
 ; GISEL-NEXT: cmp x9, #4
+; GISEL-NEXT: mov x28, x1
 ; GISEL-NEXT: csel x12, xzr, x12, eq
 ; GISEL-NEXT: cmp x9, #5
+; GISEL-NEXT: and x21, x8, #0x3f
 ; GISEL-NEXT: csel x12, xzr, x12, eq
 ; GISEL-NEXT: cmp x9, #6
+; GISEL-NEXT: str x6, [sp, #24] ; 8-byte Folded Spill
 ; GISEL-NEXT: csel x12, xzr, x12, eq
 ; GISEL-NEXT: cmp x9, #7
+; GISEL-NEXT: str x28, [sp, #304] ; 8-byte Folded Spill
 ; GISEL-NEXT: csel x12, xzr, x12, eq
 ; GISEL-NEXT: cmp x9, #8
+; GISEL-NEXT: str x7, [sp, #272] ; 8-byte Folded Spill
 ; GISEL-NEXT: csel x12, xzr, x12, eq
 ; GISEL-NEXT: cmp x9, #9
+; GISEL-NEXT: str x20, [sp, #112] ; 8-byte Folded Spill
 ; GISEL-NEXT: csel x12, xzr, x12, eq
 ; GISEL-NEXT: cmp x9, #10
 ; GISEL-NEXT: csel x12, xzr, x12, eq
@@ -1290,13 +1296,13 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
 ; GISEL-NEXT: cmp x8, #0
 ; GISEL-NEXT: csel x10, x10, x12, eq
 ; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: str x10, [sp, #192] ; 8-byte Folded Spill
-; GISEL-NEXT: csel x10, xzr, x26, eq
+; GISEL-NEXT: str x10, [sp, #232] ; 8-byte Folded Spill
+; GISEL-NEXT: csel x10, xzr, x3, eq
 ; GISEL-NEXT: cmp x9, #0
-; GISEL-NEXT: orr x10, x2, x10
+; GISEL-NEXT: orr x10, x6, x10
 ; GISEL-NEXT: csel x10, x10, xzr, eq
 ; GISEL-NEXT: cmp x9, #1
-; GISEL-NEXT: csel x10, x25, x10, eq
+; GISEL-NEXT: csel x10, x0, x10, eq
 ; GISEL-NEXT: cmp x9, #2
 ; GISEL-NEXT: csel x10, xzr, x10, eq
 ; GISEL-NEXT: cmp x9, #3
@@ -1327,25 +1333,24 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
 ; GISEL-NEXT: cmp x9, #15
 ; GISEL-NEXT: csel x13, xzr, x13, eq
 ; GISEL-NEXT: cmp x8, #0
-; GISEL-NEXT: lsl x20, x12, x16
+; GISEL-NEXT: lsl x26, x12, x14
 ; GISEL-NEXT: csel x11, x11, x13, eq
 ; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: str x11, [sp, #184] ; 8-byte Folded Spill
-; GISEL-NEXT: csel x11, xzr, x23, eq
+; GISEL-NEXT: str x11, [sp, #224] ; 8-byte Folded Spill
+; GISEL-NEXT: csel x11, xzr, x20, eq
 ; GISEL-NEXT: cmp x9, #0
-; GISEL-NEXT: orr x11, x20, x11
-; GISEL-NEXT: lsr x15, x12, x21
-; GISEL-NEXT: lsl x14, x10, x16
+; GISEL-NEXT: orr x11, x26, x11
+; GISEL-NEXT: lsr x15, x12, x2
+; GISEL-NEXT: lsl x30, x10, x16
 ; GISEL-NEXT: csel x11, x11, xzr, eq
 ; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: lsr x17, x10, x21
-; GISEL-NEXT: csel x13, xzr, x26, eq
+; GISEL-NEXT: lsr x17, x10, x2
+; GISEL-NEXT: csel x13, xzr, x3, eq
 ; GISEL-NEXT: cmp x9, #1
-; GISEL-NEXT: str x20, [sp, #8] ; 8-byte Folded Spill
-; GISEL-NEXT: orr x13, x2, x13
+; GISEL-NEXT: orr x13, x6, x13
 ; GISEL-NEXT: csel x11, x13, x11, eq
 ; GISEL-NEXT: cmp x9, #2
-; GISEL-NEXT: csel x11, x25, x11, eq
+; GISEL-NEXT: csel x11, x0, x11, eq
 ; GISEL-NEXT: cmp x9, #3
 ; GISEL-NEXT: csel x11, xzr, x11, eq
 ; GISEL-NEXT: cmp x9, #4
@@ -1375,23 +1380,23 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
 ; GISEL-NEXT: cmp x8, #0
 ; GISEL-NEXT: csel x11, x12, x11, eq
 ; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: str x11, [sp, #176] ; 8-byte Folded Spill
+; GISEL-NEXT: str x11, [sp, #216] ; 8-byte Folded Spill
 ; GISEL-NEXT: csel x11, xzr, x15, eq
 ; GISEL-NEXT: cmp x9, #0
-; GISEL-NEXT: orr x11, x14, x11
+; GISEL-NEXT: orr x11, x30, x11
 ; GISEL-NEXT: csel x11, x11, xzr, eq
 ; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x12, xzr, x23, eq
+; GISEL-NEXT: csel x12, xzr, x20, eq
 ; GISEL-NEXT: cmp x9, #1
-; GISEL-NEXT: orr x12, x20, x12
+; GISEL-NEXT: orr x12, x26, x12
 ; GISEL-NEXT: csel x11, x12, x11, eq
 ; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x12, xzr, x26, eq
+; GISEL-NEXT: csel x12, xzr, x3, eq
 ; GISEL-NEXT: cmp x9, #2
-; GISEL-NEXT: orr x12, x2, x12
+; GISEL-NEXT: orr x12, x6, x12
 ; GISEL-NEXT: csel x11, x12, x11, eq
 ; GISEL-NEXT: cmp x9, #3
-; GISEL-NEXT: csel x11, x25, x11, eq
+; GISEL-NEXT: csel x11, x0, x11, eq
 ; GISEL-NEXT: cmp x9, #4
 ; GISEL-NEXT: csel x11, xzr, x11, eq
 ; GISEL-NEXT: cmp x9, #5
@@ -1421,33 +1426,33 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
 ; GISEL-NEXT: lsl x0, x12, x16
 ; GISEL-NEXT: csel x10, x10, x13, eq
 ; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: str x10, [sp, #168] ; 8-byte Folded Spill
+; GISEL-NEXT: str x10, [sp, #208] ; 8-byte Folded Spill
 ; GISEL-NEXT: csel x10, xzr, x17, eq
 ; GISEL-NEXT: cmp x9, #0
 ; GISEL-NEXT: orr x10, x0, x10
-; GISEL-NEXT: lsr x27, x12, x21
+; GISEL-NEXT: lsr x4, x12, x2
 ; GISEL-NEXT: lsl x19, x11, x16
 ; GISEL-NEXT: csel x10, x10, xzr, eq
 ; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: lsr x3, x11, x21
+; GISEL-NEXT: mov x16, x15
 ; GISEL-NEXT: csel x13, xzr, x15, eq
 ; GISEL-NEXT: cmp x9, #1
-; GISEL-NEXT: stp x27, x0, [sp, #240] ; 16-byte Folded Spill
-; GISEL-NEXT: orr x13, x14, x13
-; GISEL-NEXT: mov x7, x3
+; GISEL-NEXT: str x4, [sp, #248] ; 8-byte Folded Spill
+; GISEL-NEXT: orr x13, x30, x13
+; GISEL-NEXT: str x0, [sp, #48] ; 8-byte Folded Spill
 ; GISEL-NEXT: csel x10, x13, x10, eq
 ; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x23, eq
+; GISEL-NEXT: csel x13, xzr, x20, eq
 ; GISEL-NEXT: cmp x9, #2
-; GISEL-NEXT: orr x13, x20, x13
+; GISEL-NEXT: orr x13, x26, x13
 ; GISEL-NEXT: csel x10, x13, x10, eq
 ; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x13, xzr, x26, eq
+; GISEL-NEXT: csel x13, xzr, x3, eq
 ; GISEL-NEXT: cmp x9, #3
-; GISEL-NEXT: orr x13, x2, x13
+; GISEL-NEXT: orr x13, x6, x13
 ; GISEL-NEXT: csel x10, x13, x10, eq
 ; GISEL-NEXT: cmp x9, #4
-; GISEL-NEXT: csel x10, x25, x10, eq
+; GISEL-NEXT: csel x10, x24, x10, eq
 ; GISEL-NEXT: cmp x9, #5
 ; GISEL-NEXT: csel x10, xzr, x10, eq
 ; GISEL-NEXT: cmp x9, #6
@@ -1473,8 +1478,8 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
 ; GISEL-NEXT: cmp x8, #0
 ; GISEL-NEXT: csel x10, x12, x10, eq
 ; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: str x10, [sp, #160] ; 8-byte Folded Spill
-; GISEL-NEXT: csel x10, xzr, x27, eq
+; GISEL-NEXT: str x10, [sp, #200] ; 8-byte Folded Spill
+; GISEL-NEXT: csel x10, xzr, x4, eq
 ; GISEL-NEXT: cmp x9, #0
 ; GISEL-NEXT: orr x10, x19, x10
 ; GISEL-NEXT: csel x10, x10, xzr, eq
@@ -1486,20 +1491,22 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) {
 ; GISEL-NEXT: tst x8, #0x3f
 ; GISEL-NEXT: csel x12, xzr, x15, eq
 ; GISEL-NEXT: cmp x9, #2
-; GISEL-NEXT: orr x12, x14, x12
+; GISEL-NEXT: and x15, x8, #0x3f
+; GISEL-NEXT: orr x12, x30, x12
 ; GISEL-NEXT: csel x10, x12, x10, eq
 ; GISEL-NEXT: tst x8, #0x3f
-; GISEL-NEXT: csel x12, xzr, x23, eq
+; GISEL-NEXT: csel x12, xzr, x20, eq
 ; GISEL-NEXT: cmp x9, #3
-; GISEL-NEXT: orr x12, x20, x12
+; GISEL-NEXT: orr x12, x26, x12
 ; GISEL-NEXT: csel x10, x12, x10, eq
 ; GISEL-NEXT:
tst x8, #0x3f -; GISEL-NEXT: csel x12, xzr, x26, eq +; GISEL-NEXT: csel x12, xzr, x3, eq ; GISEL-NEXT: cmp x9, #4 -; GISEL-NEXT: orr x12, x2, x12 +; GISEL-NEXT: lsr x3, x11, x2 +; GISEL-NEXT: orr x12, x6, x12 ; GISEL-NEXT: csel x10, x12, x10, eq ; GISEL-NEXT: cmp x9, #5 -; GISEL-NEXT: csel x10, x25, x10, eq +; GISEL-NEXT: csel x10, x24, x10, eq ; GISEL-NEXT: cmp x9, #6 ; GISEL-NEXT: csel x10, xzr, x10, eq ; GISEL-NEXT: cmp x9, #7 @@ -1522,21 +1529,23 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: cmp x9, #15 ; GISEL-NEXT: csel x13, xzr, x13, eq ; GISEL-NEXT: cmp x8, #0 -; GISEL-NEXT: lsl x4, x12, x16 +; GISEL-NEXT: lsl x22, x12, x15 ; GISEL-NEXT: csel x11, x11, x13, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: str x11, [sp, #152] ; 8-byte Folded Spill +; GISEL-NEXT: str x11, [sp, #192] ; 8-byte Folded Spill ; GISEL-NEXT: csel x11, xzr, x3, eq ; GISEL-NEXT: cmp x9, #0 -; GISEL-NEXT: orr x11, x4, x11 -; GISEL-NEXT: lsl x30, x10, x16 -; GISEL-NEXT: lsr x28, x10, x21 +; GISEL-NEXT: orr x11, x22, x11 +; GISEL-NEXT: lsl x5, x10, x15 +; GISEL-NEXT: lsr x27, x10, x2 ; GISEL-NEXT: csel x11, x11, xzr, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x13, xzr, x27, eq +; GISEL-NEXT: csel x13, xzr, x4, eq ; GISEL-NEXT: cmp x9, #1 -; GISEL-NEXT: str x30, [sp, #200] ; 8-byte Folded Spill +; GISEL-NEXT: mov x25, x27 ; GISEL-NEXT: orr x13, x19, x13 +; GISEL-NEXT: mov x14, x5 +; GISEL-NEXT: str x27, [sp, #328] ; 8-byte Folded Spill ; GISEL-NEXT: csel x11, x13, x11, eq ; GISEL-NEXT: tst x8, #0x3f ; GISEL-NEXT: csel x13, xzr, x17, eq @@ -1544,30 +1553,29 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: orr x13, x0, x13 ; GISEL-NEXT: csel x11, x13, x11, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x13, xzr, x15, eq +; GISEL-NEXT: csel x13, xzr, x16, eq ; GISEL-NEXT: cmp x9, #3 -; GISEL-NEXT: orr x13, x14, x13 +; GISEL-NEXT: orr x13, x30, x13 ; GISEL-NEXT: csel x11, x13, x11, eq ; GISEL-NEXT: tst x8, #0x3f 
-; GISEL-NEXT: csel x13, xzr, x23, eq +; GISEL-NEXT: csel x13, xzr, x20, eq ; GISEL-NEXT: cmp x9, #4 -; GISEL-NEXT: orr x13, x20, x13 +; GISEL-NEXT: orr x13, x26, x13 ; GISEL-NEXT: csel x11, x13, x11, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x13, xzr, x26, eq +; GISEL-NEXT: csel x13, xzr, x7, eq ; GISEL-NEXT: cmp x9, #5 -; GISEL-NEXT: orr x13, x2, x13 +; GISEL-NEXT: orr x13, x6, x13 ; GISEL-NEXT: csel x11, x13, x11, eq ; GISEL-NEXT: cmp x9, #6 -; GISEL-NEXT: lsr x13, x12, x21 -; GISEL-NEXT: csel x11, x25, x11, eq +; GISEL-NEXT: lsr x13, x12, x2 +; GISEL-NEXT: csel x11, x24, x11, eq ; GISEL-NEXT: cmp x9, #7 ; GISEL-NEXT: csel x11, xzr, x11, eq ; GISEL-NEXT: cmp x9, #8 -; GISEL-NEXT: mov x6, x13 +; GISEL-NEXT: mov x15, x13 ; GISEL-NEXT: csel x11, xzr, x11, eq ; GISEL-NEXT: cmp x9, #9 -; GISEL-NEXT: str x6, [sp, #256] ; 8-byte Folded Spill ; GISEL-NEXT: csel x11, xzr, x11, eq ; GISEL-NEXT: cmp x9, #10 ; GISEL-NEXT: csel x11, xzr, x11, eq @@ -1584,18 +1592,18 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: cmp x8, #0 ; GISEL-NEXT: csel x11, x12, x11, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: str x11, [sp, #144] ; 8-byte Folded Spill +; GISEL-NEXT: str x11, [sp, #184] ; 8-byte Folded Spill ; GISEL-NEXT: csel x11, xzr, x13, eq ; GISEL-NEXT: cmp x9, #0 -; GISEL-NEXT: orr x11, x30, x11 +; GISEL-NEXT: orr x11, x5, x11 ; GISEL-NEXT: csel x11, x11, xzr, eq ; GISEL-NEXT: tst x8, #0x3f ; GISEL-NEXT: csel x12, xzr, x3, eq ; GISEL-NEXT: cmp x9, #1 -; GISEL-NEXT: orr x12, x4, x12 +; GISEL-NEXT: orr x12, x22, x12 ; GISEL-NEXT: csel x11, x12, x11, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x12, xzr, x27, eq +; GISEL-NEXT: csel x12, xzr, x4, eq ; GISEL-NEXT: cmp x9, #2 ; GISEL-NEXT: orr x12, x19, x12 ; GISEL-NEXT: csel x11, x12, x11, eq @@ -1605,22 +1613,22 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: orr x12, x0, x12 ; GISEL-NEXT: csel x11, x12, x11, eq ; GISEL-NEXT: tst x8, #0x3f -; 
GISEL-NEXT: csel x12, xzr, x15, eq +; GISEL-NEXT: csel x12, xzr, x16, eq ; GISEL-NEXT: cmp x9, #4 -; GISEL-NEXT: orr x12, x14, x12 +; GISEL-NEXT: orr x12, x30, x12 ; GISEL-NEXT: csel x11, x12, x11, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x12, xzr, x23, eq +; GISEL-NEXT: csel x12, xzr, x20, eq ; GISEL-NEXT: cmp x9, #5 -; GISEL-NEXT: orr x12, x20, x12 +; GISEL-NEXT: orr x12, x26, x12 ; GISEL-NEXT: csel x11, x12, x11, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x12, xzr, x26, eq +; GISEL-NEXT: csel x12, xzr, x7, eq ; GISEL-NEXT: cmp x9, #6 -; GISEL-NEXT: orr x12, x2, x12 +; GISEL-NEXT: orr x12, x6, x12 ; GISEL-NEXT: csel x11, x12, x11, eq ; GISEL-NEXT: cmp x9, #7 -; GISEL-NEXT: csel x11, x25, x11, eq +; GISEL-NEXT: csel x11, x24, x11, eq ; GISEL-NEXT: cmp x9, #8 ; GISEL-NEXT: csel x11, xzr, x11, eq ; GISEL-NEXT: cmp x9, #9 @@ -1635,39 +1643,34 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: csel x11, xzr, x11, eq ; GISEL-NEXT: cmp x9, #14 ; GISEL-NEXT: csel x12, xzr, x11, eq -; GISEL-NEXT: ldp x11, x5, [x1, #64] +; GISEL-NEXT: ldp x11, x1, [x1, #64] ; GISEL-NEXT: cmp x9, #15 ; GISEL-NEXT: csel x12, xzr, x12, eq ; GISEL-NEXT: cmp x8, #0 ; GISEL-NEXT: csel x12, x10, x12, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: lsl x21, x11, x16 -; GISEL-NEXT: str x12, [sp, #136] ; 8-byte Folded Spill -; GISEL-NEXT: csel x12, xzr, x28, eq +; GISEL-NEXT: lsl x23, x11, x21 +; GISEL-NEXT: str x12, [sp, #176] ; 8-byte Folded Spill +; GISEL-NEXT: csel x12, xzr, x27, eq ; GISEL-NEXT: cmp x9, #0 -; GISEL-NEXT: orr x12, x21, x12 -; GISEL-NEXT: lsr x10, x11, x22 -; GISEL-NEXT: mov x16, x19 +; GISEL-NEXT: orr x12, x23, x12 +; GISEL-NEXT: lsr x21, x11, x2 +; GISEL-NEXT: str x23, [sp, #288] ; 8-byte Folded Spill ; GISEL-NEXT: csel x12, x12, xzr, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: mov x1, x16 ; GISEL-NEXT: csel x13, xzr, x13, eq ; GISEL-NEXT: cmp x9, #1 -; GISEL-NEXT: str x16, [sp, #304] ; 8-byte Folded Spill -; GISEL-NEXT: orr 
x13, x30, x13 +; GISEL-NEXT: orr x13, x5, x13 ; GISEL-NEXT: csel x12, x13, x12, eq ; GISEL-NEXT: tst x8, #0x3f ; GISEL-NEXT: csel x13, xzr, x3, eq ; GISEL-NEXT: cmp x9, #2 -; GISEL-NEXT: lsl x3, x5, x24 -; GISEL-NEXT: orr x13, x4, x13 +; GISEL-NEXT: orr x13, x22, x13 ; GISEL-NEXT: csel x12, x13, x12, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: stp x21, x3, [sp, #216] ; 16-byte Folded Spill -; GISEL-NEXT: csel x13, xzr, x27, eq +; GISEL-NEXT: csel x13, xzr, x4, eq ; GISEL-NEXT: cmp x9, #3 ; GISEL-NEXT: orr x13, x19, x13 -; GISEL-NEXT: mov x19, x28 ; GISEL-NEXT: csel x12, x13, x12, eq ; GISEL-NEXT: tst x8, #0x3f ; GISEL-NEXT: csel x13, xzr, x17, eq @@ -1675,27 +1678,30 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: orr x13, x0, x13 ; GISEL-NEXT: csel x12, x13, x12, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x13, xzr, x15, eq +; GISEL-NEXT: csel x13, xzr, x16, eq ; GISEL-NEXT: cmp x9, #5 -; GISEL-NEXT: orr x13, x14, x13 +; GISEL-NEXT: orr x13, x30, x13 ; GISEL-NEXT: csel x12, x13, x12, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x13, xzr, x23, eq +; GISEL-NEXT: csel x13, xzr, x20, eq ; GISEL-NEXT: cmp x9, #6 -; GISEL-NEXT: orr x13, x20, x13 +; GISEL-NEXT: orr x13, x26, x13 ; GISEL-NEXT: csel x12, x13, x12, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x13, xzr, x26, eq +; GISEL-NEXT: csel x13, xzr, x7, eq ; GISEL-NEXT: cmp x9, #7 -; GISEL-NEXT: orr x13, x2, x13 +; GISEL-NEXT: orr x13, x6, x13 ; GISEL-NEXT: csel x12, x13, x12, eq ; GISEL-NEXT: cmp x9, #8 -; GISEL-NEXT: csel x12, x25, x12, eq +; GISEL-NEXT: and x13, x8, #0x3f +; GISEL-NEXT: csel x12, x24, x12, eq ; GISEL-NEXT: cmp x9, #9 +; GISEL-NEXT: lsl x10, x1, x13 ; GISEL-NEXT: csel x12, xzr, x12, eq ; GISEL-NEXT: cmp x9, #10 ; GISEL-NEXT: csel x12, xzr, x12, eq ; GISEL-NEXT: cmp x9, #11 +; GISEL-NEXT: stp x10, x15, [sp, #312] ; 16-byte Folded Spill ; GISEL-NEXT: csel x12, xzr, x12, eq ; GISEL-NEXT: cmp x9, #12 ; GISEL-NEXT: csel x12, xzr, x12, eq @@ 
-1708,69 +1714,69 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: cmp x8, #0 ; GISEL-NEXT: csel x11, x11, x12, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: str x11, [sp, #128] ; 8-byte Folded Spill -; GISEL-NEXT: csel x11, xzr, x10, eq +; GISEL-NEXT: str x11, [sp, #168] ; 8-byte Folded Spill +; GISEL-NEXT: csel x11, xzr, x21, eq ; GISEL-NEXT: cmp x9, #0 -; GISEL-NEXT: orr x11, x3, x11 +; GISEL-NEXT: orr x11, x10, x11 +; GISEL-NEXT: mov x10, x23 ; GISEL-NEXT: csel x11, x11, xzr, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x12, xzr, x28, eq +; GISEL-NEXT: csel x12, xzr, x27, eq ; GISEL-NEXT: cmp x9, #1 -; GISEL-NEXT: mov x28, x4 -; GISEL-NEXT: orr x12, x21, x12 -; GISEL-NEXT: str x28, [sp, #32] ; 8-byte Folded Spill +; GISEL-NEXT: mov x27, x24 +; GISEL-NEXT: orr x12, x23, x12 +; GISEL... [truncated] 
@llvmbot
Member

llvmbot commented Sep 30, 2025

@llvm/pr-subscribers-backend-x86

Author: Luke Lau (lukel97)

Changes

In the register allocator we define non-trivial rematerialization as the rematerialization of an instruction with virtual register uses.

We have been able to perform non-trivial rematerialization for a while, but it has been prevented by default unless specifically overridden by the target in TargetInstrInfo::isReMaterializableImpl. The original reasoning, given by the comment in the default implementation, is that we might lengthen the live range of a virtual register, but we don't actually do this: LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize instructions whose virtual registers are already live at the use sites.
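
To make the distinction concrete, here is a minimal C++ sketch of the legality condition described above. It is a toy model, not LLVM code: ToyLiveRange, ToyRematCandidate, isTrivial and allUsesLiveAt are hypothetical names, and a live range is simplified to a single half-open interval of instruction slots. The real LiveRangeEdit::allUsesAvailableAt operates on LLVM's own live-interval data structures and may check more than this.

```cpp
#include <cassert>
#include <vector>

// Toy model (assumption): a live range is one half-open interval
// [Start, End) of instruction slots. LLVM's real live intervals are richer.
struct ToyLiveRange {
  unsigned Start;
  unsigned End;

  bool liveAt(unsigned Slot) const {
    assert(Start <= End && "malformed live range");
    return Start <= Slot && Slot < End;
  }
};

// A rematerialization candidate, described only by the live ranges of the
// virtual registers it reads. Hypothetical type, for illustration only.
struct ToyRematCandidate {
  std::vector<ToyLiveRange> VRegUses;
};

// Trivial rematerialization: the instruction reads no virtual registers.
bool isTrivial(const ToyRematCandidate &C) { return C.VRegUses.empty(); }

// Non-trivial rematerialization is only accepted when every virtual register
// the instruction reads is already live at the slot where we want to
// recompute it, so no live range has to be extended.
bool allUsesLiveAt(const ToyRematCandidate &C, unsigned UseSlot) {
  for (const ToyLiveRange &LR : C.VRegUses)
    if (!LR.liveAt(UseSlot))
      return false;
  return true;
}
```

With uses live over [0, 10) and [4, 8), allUsesLiveAt accepts slot 5 and rejects slot 9, where the second register is no longer live and rematerializing would require extending its live range.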

https://reviews.llvm.org/D106408 had originally tried to remove this restriction but it was reverted after some performance regressions were reported. We think it is likely that the regressions were caused by the fact that the old isTriviallyReMaterializable API sometimes returned true for non-trivial rematerializations.

However, #160377 recently split the API into separate trivial and non-trivial versions and updated the call sites accordingly, and #160709 and #159180 fixed heuristics that weren't accounting for the difference between the two.

With these fixes in place, this patch proposes to again allow non-trivial rematerialization by default, which significantly reduces the number of spills and reloads across various targets.

| Target flags | llvm-test-suite regalloc.NumSpills geomean | llvm-test-suite regalloc.NumReloads geomean |
|---|---|---|
| -target riscv64-linux-gnu -march=rva23u64 -O3 | -5.2% | -8.1% |
| -target arm64-apple-darwin -O3 | -10.8% | -11.6% |
| -target x86_64-linux-gnu -O3 | -6.2% | -6.5% |

| Target flags | SPEC CPU 2017 regalloc.NumSpills geomean | SPEC CPU 2017 regalloc.NumReloads geomean |
|---|---|---|
| -target riscv64-linux-gnu -march=rva23u64 -O3 | -4.9% | -5.5% |
| -target x86_64-linux-gnu -O3 | -3.2% | -4.0% |

I wasn't able to build SPEC CPU 2017 on arm64-apple-darwin due to incompatibilities with the macOS SDK headers.
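
For readers unfamiliar with geomean figures like the ones above, the following C++ sketch shows one common way such a number can be derived. The PR does not say how its figures were computed, so this assumes the geomean is taken over per-benchmark after/before ratios; geomeanPercentChange is a hypothetical helper and any inputs passed to it here are made up.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of the per-benchmark ratios After[i] / Before[i], reported
// as a percentage change (negative means fewer spills or reloads).
// Assumption: every count is strictly positive; benchmarks with zero spills
// would need to be filtered out or handled separately.
double geomeanPercentChange(const std::vector<double> &Before,
                            const std::vector<double> &After) {
  assert(Before.size() == After.size() && !Before.empty());
  double LogSum = 0.0;
  for (std::size_t I = 0; I < Before.size(); ++I) {
    assert(Before[I] > 0.0 && After[I] > 0.0);
    // Summing logs avoids overflow from multiplying many ratios together.
    LogSum += std::log(After[I] / Before[I]);
  }
  double Geomean = std::exp(LogSum / static_cast<double>(Before.size()));
  return (Geomean - 1.0) * 100.0;
}
```

Because the geomean works on ratios, a benchmark going from 100 to 90 spills weighs the same as one going from 200 to 180, which is why it is usually preferred over a plain sum when benchmarks differ widely in size.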

This also allows us to rematerialize loads and stores on RISC-V in a future patch.


Patch is 218.03 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159211.diff

28 Files Affected:

  • (modified) llvm/lib/CodeGen/TargetInstrInfo.cpp (-6)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/split-wide-shifts-multiway.ll (+415-397)
  • (modified) llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll (+6-6)
  • (modified) llvm/test/CodeGen/AArch64/machine-combiner-copy.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/machine-licm-sub-loop.ll (+23-24)
  • (modified) llvm/test/CodeGen/AArch64/peephole-and-tst.ll (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/reserveXreg-for-regalloc.ll (+1-5)
  • (modified) llvm/test/CodeGen/AArch64/tbl-loops.ll (+4-4)
  • (modified) llvm/test/CodeGen/ARM/combine-movc-sub.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/pr69586.ll (+102-102)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/reductions.ll (+3-3)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/spillingmove.ll (+72-79)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/varying-outer-2d-reduction.ll (+40-40)
  • (modified) llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination.ll (+6-7)
  • (modified) llvm/test/CodeGen/Thumb2/mve-float16regloops.ll (+40-42)
  • (modified) llvm/test/CodeGen/Thumb2/mve-float32regloops.ll (+49-51)
  • (modified) llvm/test/CodeGen/Thumb2/mve-gather-increment.ll (+138-140)
  • (modified) llvm/test/CodeGen/Thumb2/mve-phireg.ll (+14-16)
  • (modified) llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll (+238-281)
  • (modified) llvm/test/CodeGen/Thumb2/mve-qrintrsplat.ll (+10-12)
  • (modified) llvm/test/CodeGen/Thumb2/mve-scatter-increment.ll (+5-5)
  • (modified) llvm/test/CodeGen/Thumb2/mve-vecreduce-addpred.ll (+6-10)
  • (modified) llvm/test/CodeGen/Thumb2/mve-vecreduce-mlapred.ll (+41-51)
  • (modified) llvm/test/CodeGen/X86/AMX/amx-greedy-ra-spill-shape.ll (+37-45)
  • (modified) llvm/test/CodeGen/X86/dag-update-nodetomatch.ll (+2-3)
  • (modified) llvm/test/CodeGen/X86/delete-dead-instrs-with-live-uses.mir (+2-2)
  • (modified) llvm/test/CodeGen/X86/inalloca-invoke.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/licm-regpressure.ll (+56-6)
diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index 2f3b7a2c8fcdf..3c41bbeb4b327 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -1657,12 +1657,6 @@ bool TargetInstrInfo::isReMaterializableImpl(
     // same virtual register, though.
     if (MO.isDef() && Reg != DefReg)
       return false;
-
-    // Don't allow any virtual-register uses. Rematting an instruction with
-    // virtual register uses would length the live ranges of the uses, which
-    // is not necessarily a good idea, certainly not "trivial".
-    if (MO.isUse())
-      return false;
   }
 
   // Everything checked out.
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/split-wide-shifts-multiway.ll b/llvm/test/CodeGen/AArch64/GlobalISel/split-wide-shifts-multiway.ll index ed68723e470a2..41f7ab89094ad 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/split-wide-shifts-multiway.ll +++ b/llvm/test/CodeGen/AArch64/GlobalISel/split-wide-shifts-multiway.ll @@ -1219,14 +1219,14 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; ; GISEL-LABEL: test_shl_i1024: ; GISEL: ; %bb.0: ; %entry -; GISEL-NEXT: sub sp, sp, #416 -; GISEL-NEXT: stp x28, x27, [sp, #320] ; 16-byte Folded Spill -; GISEL-NEXT: stp x26, x25, [sp, #336] ; 16-byte Folded Spill -; GISEL-NEXT: stp x24, x23, [sp, #352] ; 16-byte Folded Spill -; GISEL-NEXT: stp x22, x21, [sp, #368] ; 16-byte Folded Spill -; GISEL-NEXT: stp x20, x19, [sp, #384] ; 16-byte Folded Spill -; GISEL-NEXT: stp x29, x30, [sp, #400] ; 16-byte Folded Spill -; GISEL-NEXT: .cfi_def_cfa_offset 416 +; GISEL-NEXT: sub sp, sp, #432 +; GISEL-NEXT: stp x28, x27, [sp, #336] ; 16-byte Folded Spill +; GISEL-NEXT: stp x26, x25, [sp, #352] ; 16-byte Folded Spill +; GISEL-NEXT: stp x24, x23, [sp, #368] ; 16-byte Folded Spill +; GISEL-NEXT: stp x22, x21, [sp, #384] ; 16-byte Folded Spill +; GISEL-NEXT: stp x20, x19, [sp, #400] ; 16-byte Folded Spill +; GISEL-NEXT: stp x29, x30, [sp, #416] ; 16-byte Folded Spill +; 
GISEL-NEXT: .cfi_def_cfa_offset 432 ; GISEL-NEXT: .cfi_offset w30, -8 ; GISEL-NEXT: .cfi_offset w29, -16 ; GISEL-NEXT: .cfi_offset w19, -24 @@ -1242,38 +1242,44 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: ldp x10, x11, [x1] ; GISEL-NEXT: mov w8, w2 ; GISEL-NEXT: lsr x9, x8, #6 -; GISEL-NEXT: and x16, x8, #0x3f +; GISEL-NEXT: and x12, x8, #0x3f +; GISEL-NEXT: str x0, [sp, #144] ; 8-byte Folded Spill +; GISEL-NEXT: and x14, x8, #0x3f ; GISEL-NEXT: mov w13, #64 ; =0x40 -; GISEL-NEXT: sub x21, x13, x16 -; GISEL-NEXT: str x0, [sp, #112] ; 8-byte Folded Spill -; GISEL-NEXT: mov x24, x16 -; GISEL-NEXT: lsl x25, x10, x16 +; GISEL-NEXT: and x16, x8, #0x3f +; GISEL-NEXT: lsl x0, x10, x12 ; GISEL-NEXT: cmp x9, #0 -; GISEL-NEXT: lsr x26, x10, x21 -; GISEL-NEXT: lsl x2, x11, x16 -; GISEL-NEXT: lsr x23, x11, x21 -; GISEL-NEXT: mov x22, x21 -; GISEL-NEXT: csel x12, x25, xzr, eq +; GISEL-NEXT: sub x2, x13, x14 +; GISEL-NEXT: lsr x3, x10, x2 +; GISEL-NEXT: lsl x6, x11, x14 +; GISEL-NEXT: and x14, x8, #0x3f +; GISEL-NEXT: csel x12, x0, xzr, eq ; GISEL-NEXT: cmp x9, #1 -; GISEL-NEXT: str x1, [sp, #312] ; 8-byte Folded Spill +; GISEL-NEXT: lsr x20, x11, x2 ; GISEL-NEXT: csel x12, xzr, x12, eq ; GISEL-NEXT: cmp x9, #2 -; GISEL-NEXT: str x23, [sp, #208] ; 8-byte Folded Spill +; GISEL-NEXT: mov x24, x0 ; GISEL-NEXT: csel x12, xzr, x12, eq ; GISEL-NEXT: cmp x9, #3 -; GISEL-NEXT: stp x24, x22, [sp, #40] ; 16-byte Folded Spill +; GISEL-NEXT: mov x7, x3 ; GISEL-NEXT: csel x12, xzr, x12, eq ; GISEL-NEXT: cmp x9, #4 +; GISEL-NEXT: mov x28, x1 ; GISEL-NEXT: csel x12, xzr, x12, eq ; GISEL-NEXT: cmp x9, #5 +; GISEL-NEXT: and x21, x8, #0x3f ; GISEL-NEXT: csel x12, xzr, x12, eq ; GISEL-NEXT: cmp x9, #6 +; GISEL-NEXT: str x6, [sp, #24] ; 8-byte Folded Spill ; GISEL-NEXT: csel x12, xzr, x12, eq ; GISEL-NEXT: cmp x9, #7 +; GISEL-NEXT: str x28, [sp, #304] ; 8-byte Folded Spill ; GISEL-NEXT: csel x12, xzr, x12, eq ; GISEL-NEXT: cmp x9, #8 +; GISEL-NEXT: str x7, 
[sp, #272] ; 8-byte Folded Spill ; GISEL-NEXT: csel x12, xzr, x12, eq ; GISEL-NEXT: cmp x9, #9 +; GISEL-NEXT: str x20, [sp, #112] ; 8-byte Folded Spill ; GISEL-NEXT: csel x12, xzr, x12, eq ; GISEL-NEXT: cmp x9, #10 ; GISEL-NEXT: csel x12, xzr, x12, eq @@ -1290,13 +1296,13 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: cmp x8, #0 ; GISEL-NEXT: csel x10, x10, x12, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: str x10, [sp, #192] ; 8-byte Folded Spill -; GISEL-NEXT: csel x10, xzr, x26, eq +; GISEL-NEXT: str x10, [sp, #232] ; 8-byte Folded Spill +; GISEL-NEXT: csel x10, xzr, x3, eq ; GISEL-NEXT: cmp x9, #0 -; GISEL-NEXT: orr x10, x2, x10 +; GISEL-NEXT: orr x10, x6, x10 ; GISEL-NEXT: csel x10, x10, xzr, eq ; GISEL-NEXT: cmp x9, #1 -; GISEL-NEXT: csel x10, x25, x10, eq +; GISEL-NEXT: csel x10, x0, x10, eq ; GISEL-NEXT: cmp x9, #2 ; GISEL-NEXT: csel x10, xzr, x10, eq ; GISEL-NEXT: cmp x9, #3 @@ -1327,25 +1333,24 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: cmp x9, #15 ; GISEL-NEXT: csel x13, xzr, x13, eq ; GISEL-NEXT: cmp x8, #0 -; GISEL-NEXT: lsl x20, x12, x16 +; GISEL-NEXT: lsl x26, x12, x14 ; GISEL-NEXT: csel x11, x11, x13, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: str x11, [sp, #184] ; 8-byte Folded Spill -; GISEL-NEXT: csel x11, xzr, x23, eq +; GISEL-NEXT: str x11, [sp, #224] ; 8-byte Folded Spill +; GISEL-NEXT: csel x11, xzr, x20, eq ; GISEL-NEXT: cmp x9, #0 -; GISEL-NEXT: orr x11, x20, x11 -; GISEL-NEXT: lsr x15, x12, x21 -; GISEL-NEXT: lsl x14, x10, x16 +; GISEL-NEXT: orr x11, x26, x11 +; GISEL-NEXT: lsr x15, x12, x2 +; GISEL-NEXT: lsl x30, x10, x16 ; GISEL-NEXT: csel x11, x11, xzr, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: lsr x17, x10, x21 -; GISEL-NEXT: csel x13, xzr, x26, eq +; GISEL-NEXT: lsr x17, x10, x2 +; GISEL-NEXT: csel x13, xzr, x3, eq ; GISEL-NEXT: cmp x9, #1 -; GISEL-NEXT: str x20, [sp, #8] ; 8-byte Folded Spill -; GISEL-NEXT: orr x13, x2, x13 +; GISEL-NEXT: orr x13, 
x6, x13 ; GISEL-NEXT: csel x11, x13, x11, eq ; GISEL-NEXT: cmp x9, #2 -; GISEL-NEXT: csel x11, x25, x11, eq +; GISEL-NEXT: csel x11, x0, x11, eq ; GISEL-NEXT: cmp x9, #3 ; GISEL-NEXT: csel x11, xzr, x11, eq ; GISEL-NEXT: cmp x9, #4 @@ -1375,23 +1380,23 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: cmp x8, #0 ; GISEL-NEXT: csel x11, x12, x11, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: str x11, [sp, #176] ; 8-byte Folded Spill +; GISEL-NEXT: str x11, [sp, #216] ; 8-byte Folded Spill ; GISEL-NEXT: csel x11, xzr, x15, eq ; GISEL-NEXT: cmp x9, #0 -; GISEL-NEXT: orr x11, x14, x11 +; GISEL-NEXT: orr x11, x30, x11 ; GISEL-NEXT: csel x11, x11, xzr, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x12, xzr, x23, eq +; GISEL-NEXT: csel x12, xzr, x20, eq ; GISEL-NEXT: cmp x9, #1 -; GISEL-NEXT: orr x12, x20, x12 +; GISEL-NEXT: orr x12, x26, x12 ; GISEL-NEXT: csel x11, x12, x11, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x12, xzr, x26, eq +; GISEL-NEXT: csel x12, xzr, x3, eq ; GISEL-NEXT: cmp x9, #2 -; GISEL-NEXT: orr x12, x2, x12 +; GISEL-NEXT: orr x12, x6, x12 ; GISEL-NEXT: csel x11, x12, x11, eq ; GISEL-NEXT: cmp x9, #3 -; GISEL-NEXT: csel x11, x25, x11, eq +; GISEL-NEXT: csel x11, x0, x11, eq ; GISEL-NEXT: cmp x9, #4 ; GISEL-NEXT: csel x11, xzr, x11, eq ; GISEL-NEXT: cmp x9, #5 @@ -1421,33 +1426,33 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: lsl x0, x12, x16 ; GISEL-NEXT: csel x10, x10, x13, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: str x10, [sp, #168] ; 8-byte Folded Spill +; GISEL-NEXT: str x10, [sp, #208] ; 8-byte Folded Spill ; GISEL-NEXT: csel x10, xzr, x17, eq ; GISEL-NEXT: cmp x9, #0 ; GISEL-NEXT: orr x10, x0, x10 -; GISEL-NEXT: lsr x27, x12, x21 +; GISEL-NEXT: lsr x4, x12, x2 ; GISEL-NEXT: lsl x19, x11, x16 ; GISEL-NEXT: csel x10, x10, xzr, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: lsr x3, x11, x21 +; GISEL-NEXT: mov x16, x15 ; GISEL-NEXT: csel x13, xzr, x15, eq ; 
GISEL-NEXT: cmp x9, #1 -; GISEL-NEXT: stp x27, x0, [sp, #240] ; 16-byte Folded Spill -; GISEL-NEXT: orr x13, x14, x13 -; GISEL-NEXT: mov x7, x3 +; GISEL-NEXT: str x4, [sp, #248] ; 8-byte Folded Spill +; GISEL-NEXT: orr x13, x30, x13 +; GISEL-NEXT: str x0, [sp, #48] ; 8-byte Folded Spill ; GISEL-NEXT: csel x10, x13, x10, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x13, xzr, x23, eq +; GISEL-NEXT: csel x13, xzr, x20, eq ; GISEL-NEXT: cmp x9, #2 -; GISEL-NEXT: orr x13, x20, x13 +; GISEL-NEXT: orr x13, x26, x13 ; GISEL-NEXT: csel x10, x13, x10, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x13, xzr, x26, eq +; GISEL-NEXT: csel x13, xzr, x3, eq ; GISEL-NEXT: cmp x9, #3 -; GISEL-NEXT: orr x13, x2, x13 +; GISEL-NEXT: orr x13, x6, x13 ; GISEL-NEXT: csel x10, x13, x10, eq ; GISEL-NEXT: cmp x9, #4 -; GISEL-NEXT: csel x10, x25, x10, eq +; GISEL-NEXT: csel x10, x24, x10, eq ; GISEL-NEXT: cmp x9, #5 ; GISEL-NEXT: csel x10, xzr, x10, eq ; GISEL-NEXT: cmp x9, #6 @@ -1473,8 +1478,8 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: cmp x8, #0 ; GISEL-NEXT: csel x10, x12, x10, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: str x10, [sp, #160] ; 8-byte Folded Spill -; GISEL-NEXT: csel x10, xzr, x27, eq +; GISEL-NEXT: str x10, [sp, #200] ; 8-byte Folded Spill +; GISEL-NEXT: csel x10, xzr, x4, eq ; GISEL-NEXT: cmp x9, #0 ; GISEL-NEXT: orr x10, x19, x10 ; GISEL-NEXT: csel x10, x10, xzr, eq @@ -1486,20 +1491,22 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: tst x8, #0x3f ; GISEL-NEXT: csel x12, xzr, x15, eq ; GISEL-NEXT: cmp x9, #2 -; GISEL-NEXT: orr x12, x14, x12 +; GISEL-NEXT: and x15, x8, #0x3f +; GISEL-NEXT: orr x12, x30, x12 ; GISEL-NEXT: csel x10, x12, x10, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x12, xzr, x23, eq +; GISEL-NEXT: csel x12, xzr, x20, eq ; GISEL-NEXT: cmp x9, #3 -; GISEL-NEXT: orr x12, x20, x12 +; GISEL-NEXT: orr x12, x26, x12 ; GISEL-NEXT: csel x10, x12, x10, eq ; GISEL-NEXT: 
tst x8, #0x3f -; GISEL-NEXT: csel x12, xzr, x26, eq +; GISEL-NEXT: csel x12, xzr, x3, eq ; GISEL-NEXT: cmp x9, #4 -; GISEL-NEXT: orr x12, x2, x12 +; GISEL-NEXT: lsr x3, x11, x2 +; GISEL-NEXT: orr x12, x6, x12 ; GISEL-NEXT: csel x10, x12, x10, eq ; GISEL-NEXT: cmp x9, #5 -; GISEL-NEXT: csel x10, x25, x10, eq +; GISEL-NEXT: csel x10, x24, x10, eq ; GISEL-NEXT: cmp x9, #6 ; GISEL-NEXT: csel x10, xzr, x10, eq ; GISEL-NEXT: cmp x9, #7 @@ -1522,21 +1529,23 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: cmp x9, #15 ; GISEL-NEXT: csel x13, xzr, x13, eq ; GISEL-NEXT: cmp x8, #0 -; GISEL-NEXT: lsl x4, x12, x16 +; GISEL-NEXT: lsl x22, x12, x15 ; GISEL-NEXT: csel x11, x11, x13, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: str x11, [sp, #152] ; 8-byte Folded Spill +; GISEL-NEXT: str x11, [sp, #192] ; 8-byte Folded Spill ; GISEL-NEXT: csel x11, xzr, x3, eq ; GISEL-NEXT: cmp x9, #0 -; GISEL-NEXT: orr x11, x4, x11 -; GISEL-NEXT: lsl x30, x10, x16 -; GISEL-NEXT: lsr x28, x10, x21 +; GISEL-NEXT: orr x11, x22, x11 +; GISEL-NEXT: lsl x5, x10, x15 +; GISEL-NEXT: lsr x27, x10, x2 ; GISEL-NEXT: csel x11, x11, xzr, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x13, xzr, x27, eq +; GISEL-NEXT: csel x13, xzr, x4, eq ; GISEL-NEXT: cmp x9, #1 -; GISEL-NEXT: str x30, [sp, #200] ; 8-byte Folded Spill +; GISEL-NEXT: mov x25, x27 ; GISEL-NEXT: orr x13, x19, x13 +; GISEL-NEXT: mov x14, x5 +; GISEL-NEXT: str x27, [sp, #328] ; 8-byte Folded Spill ; GISEL-NEXT: csel x11, x13, x11, eq ; GISEL-NEXT: tst x8, #0x3f ; GISEL-NEXT: csel x13, xzr, x17, eq @@ -1544,30 +1553,29 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: orr x13, x0, x13 ; GISEL-NEXT: csel x11, x13, x11, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x13, xzr, x15, eq +; GISEL-NEXT: csel x13, xzr, x16, eq ; GISEL-NEXT: cmp x9, #3 -; GISEL-NEXT: orr x13, x14, x13 +; GISEL-NEXT: orr x13, x30, x13 ; GISEL-NEXT: csel x11, x13, x11, eq ; GISEL-NEXT: tst x8, #0x3f 
-; GISEL-NEXT: csel x13, xzr, x23, eq +; GISEL-NEXT: csel x13, xzr, x20, eq ; GISEL-NEXT: cmp x9, #4 -; GISEL-NEXT: orr x13, x20, x13 +; GISEL-NEXT: orr x13, x26, x13 ; GISEL-NEXT: csel x11, x13, x11, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x13, xzr, x26, eq +; GISEL-NEXT: csel x13, xzr, x7, eq ; GISEL-NEXT: cmp x9, #5 -; GISEL-NEXT: orr x13, x2, x13 +; GISEL-NEXT: orr x13, x6, x13 ; GISEL-NEXT: csel x11, x13, x11, eq ; GISEL-NEXT: cmp x9, #6 -; GISEL-NEXT: lsr x13, x12, x21 -; GISEL-NEXT: csel x11, x25, x11, eq +; GISEL-NEXT: lsr x13, x12, x2 +; GISEL-NEXT: csel x11, x24, x11, eq ; GISEL-NEXT: cmp x9, #7 ; GISEL-NEXT: csel x11, xzr, x11, eq ; GISEL-NEXT: cmp x9, #8 -; GISEL-NEXT: mov x6, x13 +; GISEL-NEXT: mov x15, x13 ; GISEL-NEXT: csel x11, xzr, x11, eq ; GISEL-NEXT: cmp x9, #9 -; GISEL-NEXT: str x6, [sp, #256] ; 8-byte Folded Spill ; GISEL-NEXT: csel x11, xzr, x11, eq ; GISEL-NEXT: cmp x9, #10 ; GISEL-NEXT: csel x11, xzr, x11, eq @@ -1584,18 +1592,18 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: cmp x8, #0 ; GISEL-NEXT: csel x11, x12, x11, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: str x11, [sp, #144] ; 8-byte Folded Spill +; GISEL-NEXT: str x11, [sp, #184] ; 8-byte Folded Spill ; GISEL-NEXT: csel x11, xzr, x13, eq ; GISEL-NEXT: cmp x9, #0 -; GISEL-NEXT: orr x11, x30, x11 +; GISEL-NEXT: orr x11, x5, x11 ; GISEL-NEXT: csel x11, x11, xzr, eq ; GISEL-NEXT: tst x8, #0x3f ; GISEL-NEXT: csel x12, xzr, x3, eq ; GISEL-NEXT: cmp x9, #1 -; GISEL-NEXT: orr x12, x4, x12 +; GISEL-NEXT: orr x12, x22, x12 ; GISEL-NEXT: csel x11, x12, x11, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x12, xzr, x27, eq +; GISEL-NEXT: csel x12, xzr, x4, eq ; GISEL-NEXT: cmp x9, #2 ; GISEL-NEXT: orr x12, x19, x12 ; GISEL-NEXT: csel x11, x12, x11, eq @@ -1605,22 +1613,22 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: orr x12, x0, x12 ; GISEL-NEXT: csel x11, x12, x11, eq ; GISEL-NEXT: tst x8, #0x3f -; 
GISEL-NEXT: csel x12, xzr, x15, eq +; GISEL-NEXT: csel x12, xzr, x16, eq ; GISEL-NEXT: cmp x9, #4 -; GISEL-NEXT: orr x12, x14, x12 +; GISEL-NEXT: orr x12, x30, x12 ; GISEL-NEXT: csel x11, x12, x11, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x12, xzr, x23, eq +; GISEL-NEXT: csel x12, xzr, x20, eq ; GISEL-NEXT: cmp x9, #5 -; GISEL-NEXT: orr x12, x20, x12 +; GISEL-NEXT: orr x12, x26, x12 ; GISEL-NEXT: csel x11, x12, x11, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x12, xzr, x26, eq +; GISEL-NEXT: csel x12, xzr, x7, eq ; GISEL-NEXT: cmp x9, #6 -; GISEL-NEXT: orr x12, x2, x12 +; GISEL-NEXT: orr x12, x6, x12 ; GISEL-NEXT: csel x11, x12, x11, eq ; GISEL-NEXT: cmp x9, #7 -; GISEL-NEXT: csel x11, x25, x11, eq +; GISEL-NEXT: csel x11, x24, x11, eq ; GISEL-NEXT: cmp x9, #8 ; GISEL-NEXT: csel x11, xzr, x11, eq ; GISEL-NEXT: cmp x9, #9 @@ -1635,39 +1643,34 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: csel x11, xzr, x11, eq ; GISEL-NEXT: cmp x9, #14 ; GISEL-NEXT: csel x12, xzr, x11, eq -; GISEL-NEXT: ldp x11, x5, [x1, #64] +; GISEL-NEXT: ldp x11, x1, [x1, #64] ; GISEL-NEXT: cmp x9, #15 ; GISEL-NEXT: csel x12, xzr, x12, eq ; GISEL-NEXT: cmp x8, #0 ; GISEL-NEXT: csel x12, x10, x12, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: lsl x21, x11, x16 -; GISEL-NEXT: str x12, [sp, #136] ; 8-byte Folded Spill -; GISEL-NEXT: csel x12, xzr, x28, eq +; GISEL-NEXT: lsl x23, x11, x21 +; GISEL-NEXT: str x12, [sp, #176] ; 8-byte Folded Spill +; GISEL-NEXT: csel x12, xzr, x27, eq ; GISEL-NEXT: cmp x9, #0 -; GISEL-NEXT: orr x12, x21, x12 -; GISEL-NEXT: lsr x10, x11, x22 -; GISEL-NEXT: mov x16, x19 +; GISEL-NEXT: orr x12, x23, x12 +; GISEL-NEXT: lsr x21, x11, x2 +; GISEL-NEXT: str x23, [sp, #288] ; 8-byte Folded Spill ; GISEL-NEXT: csel x12, x12, xzr, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: mov x1, x16 ; GISEL-NEXT: csel x13, xzr, x13, eq ; GISEL-NEXT: cmp x9, #1 -; GISEL-NEXT: str x16, [sp, #304] ; 8-byte Folded Spill -; GISEL-NEXT: orr 
x13, x30, x13 +; GISEL-NEXT: orr x13, x5, x13 ; GISEL-NEXT: csel x12, x13, x12, eq ; GISEL-NEXT: tst x8, #0x3f ; GISEL-NEXT: csel x13, xzr, x3, eq ; GISEL-NEXT: cmp x9, #2 -; GISEL-NEXT: lsl x3, x5, x24 -; GISEL-NEXT: orr x13, x4, x13 +; GISEL-NEXT: orr x13, x22, x13 ; GISEL-NEXT: csel x12, x13, x12, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: stp x21, x3, [sp, #216] ; 16-byte Folded Spill -; GISEL-NEXT: csel x13, xzr, x27, eq +; GISEL-NEXT: csel x13, xzr, x4, eq ; GISEL-NEXT: cmp x9, #3 ; GISEL-NEXT: orr x13, x19, x13 -; GISEL-NEXT: mov x19, x28 ; GISEL-NEXT: csel x12, x13, x12, eq ; GISEL-NEXT: tst x8, #0x3f ; GISEL-NEXT: csel x13, xzr, x17, eq @@ -1675,27 +1678,30 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: orr x13, x0, x13 ; GISEL-NEXT: csel x12, x13, x12, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x13, xzr, x15, eq +; GISEL-NEXT: csel x13, xzr, x16, eq ; GISEL-NEXT: cmp x9, #5 -; GISEL-NEXT: orr x13, x14, x13 +; GISEL-NEXT: orr x13, x30, x13 ; GISEL-NEXT: csel x12, x13, x12, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x13, xzr, x23, eq +; GISEL-NEXT: csel x13, xzr, x20, eq ; GISEL-NEXT: cmp x9, #6 -; GISEL-NEXT: orr x13, x20, x13 +; GISEL-NEXT: orr x13, x26, x13 ; GISEL-NEXT: csel x12, x13, x12, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x13, xzr, x26, eq +; GISEL-NEXT: csel x13, xzr, x7, eq ; GISEL-NEXT: cmp x9, #7 -; GISEL-NEXT: orr x13, x2, x13 +; GISEL-NEXT: orr x13, x6, x13 ; GISEL-NEXT: csel x12, x13, x12, eq ; GISEL-NEXT: cmp x9, #8 -; GISEL-NEXT: csel x12, x25, x12, eq +; GISEL-NEXT: and x13, x8, #0x3f +; GISEL-NEXT: csel x12, x24, x12, eq ; GISEL-NEXT: cmp x9, #9 +; GISEL-NEXT: lsl x10, x1, x13 ; GISEL-NEXT: csel x12, xzr, x12, eq ; GISEL-NEXT: cmp x9, #10 ; GISEL-NEXT: csel x12, xzr, x12, eq ; GISEL-NEXT: cmp x9, #11 +; GISEL-NEXT: stp x10, x15, [sp, #312] ; 16-byte Folded Spill ; GISEL-NEXT: csel x12, xzr, x12, eq ; GISEL-NEXT: cmp x9, #12 ; GISEL-NEXT: csel x12, xzr, x12, eq @@ 
-1708,69 +1714,69 @@ define void @test_shl_i1024(ptr %result, ptr %input, i32 %shift) { ; GISEL-NEXT: cmp x8, #0 ; GISEL-NEXT: csel x11, x11, x12, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: str x11, [sp, #128] ; 8-byte Folded Spill -; GISEL-NEXT: csel x11, xzr, x10, eq +; GISEL-NEXT: str x11, [sp, #168] ; 8-byte Folded Spill +; GISEL-NEXT: csel x11, xzr, x21, eq ; GISEL-NEXT: cmp x9, #0 -; GISEL-NEXT: orr x11, x3, x11 +; GISEL-NEXT: orr x11, x10, x11 +; GISEL-NEXT: mov x10, x23 ; GISEL-NEXT: csel x11, x11, xzr, eq ; GISEL-NEXT: tst x8, #0x3f -; GISEL-NEXT: csel x12, xzr, x28, eq +; GISEL-NEXT: csel x12, xzr, x27, eq ; GISEL-NEXT: cmp x9, #1 -; GISEL-NEXT: mov x28, x4 -; GISEL-NEXT: orr x12, x21, x12 -; GISEL-NEXT: str x28, [sp, #32] ; 8-byte Folded Spill +; GISEL-NEXT: mov x27, x24 +; GISEL-NEXT: orr x12, x23, x12 +; GISEL... [truncated] 
Contributor Author

@lukel97 lukel97 Sep 30, 2025

Because this now gets rematerialized, we need another way of showing that the register pressure is too high, so I copied what was originally done in https://reviews.llvm.org/D106408 and converted it to an MIR test.

Collaborator

@preames preames left a comment

LGTM

A couple of points for later reference:

  • As the compile time concern reported against https://reviews.llvm.org/D106408 was never identified (i.e. no reproducer shared), we may see a regression on some workloads after this lands. Please do not revert unless a reproducer is available! I have a rough idea of the possible cause, but need a reproducer to confirm a fix.
  • If for some reason this doesn't stick, we will probably move to enabling this selectively by target. AMDGPU already does this. RISC-V allows a couple specific cases, and we do want this more broadly.
  • I have audited the remaining callsites of TII->isReMaterializable. I think the ones that are left all want the non-trivial behavior; hopefully we didn't miss anything.
  • See #161972 for a possible opt-quality improvement. (I wonder if the scheme in eliminateDeadDefs interacts with the compile time point above.)
@lukel97 lukel97 enabled auto-merge (squash) October 4, 2025 21:59
@lukel97 lukel97 merged commit 795a115 into llvm:main Oct 4, 2025
9 checks passed
@llvm-ci
Collaborator

llvm-ci commented Oct 4, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-fast running on sanitizer-buildbot3 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/169/builds/15646

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using ld.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/main.py:74: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 92405 tests, 64 workers --
Testing: 0.. 10.. 20.. 30.. 40.. 50..
FAIL: lld :: MachO/read-workers.s (9965 of 92405)
******************** TEST 'lld :: MachO/read-workers.s' FAILED ********************
Exit Code: -6
Command Output (stdout):
--
# RUN: at line 2
/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llvm-mc -filetype=obj -triple=x86_64-apple-darwin /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/test/MachO/read-workers.s -o /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/lld/test/MachO/Output/read-workers.s.tmp.o
# executed command: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llvm-mc -filetype=obj -triple=x86_64-apple-darwin /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/test/MachO/read-workers.s -o /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/lld/test/MachO/Output/read-workers.s.tmp.o
# note: command had no output on stdout or stderr
# RUN: at line 5
ld64.lld -arch x86_64 -platform_version macos 11.0 11.0 -syslibroot /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/test/MachO/Inputs/MacOSX.sdk -lSystem -fatal_warnings --read-workers=0 /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/lld/test/MachO/Output/read-workers.s.tmp.o -o /dev/null
# executed command: ld64.lld -arch x86_64 -platform_version macos 11.0 11.0 -syslibroot /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/test/MachO/Inputs/MacOSX.sdk -lSystem -fatal_warnings --read-workers=0 /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/lld/test/MachO/Output/read-workers.s.tmp.o -o /dev/null
# note: command had no output on stdout or stderr
# RUN: at line 6
ld64.lld -arch x86_64 -platform_version macos 11.0 11.0 -syslibroot /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/test/MachO/Inputs/MacOSX.sdk -lSystem -fatal_warnings --read-workers=1 /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/lld/test/MachO/Output/read-workers.s.tmp.o -o /dev/null
# executed command: ld64.lld -arch x86_64 -platform_version macos 11.0 11.0 -syslibroot /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/test/MachO/Inputs/MacOSX.sdk -lSystem -fatal_warnings --read-workers=1 /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/lld/test/MachO/Output/read-workers.s.tmp.o -o /dev/null
# note: command had no output on stdout or stderr
# RUN: at line 7
ld64.lld -arch x86_64 -platform_version macos 11.0 11.0 -syslibroot /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/test/MachO/Inputs/MacOSX.sdk -lSystem -fatal_warnings --read-workers=2 /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/lld/test/MachO/Output/read-workers.s.tmp.o -o /dev/null
# executed command: ld64.lld -arch x86_64 -platform_version macos 11.0 11.0 -syslibroot /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/test/MachO/Inputs/MacOSX.sdk -lSystem -fatal_warnings --read-workers=2 /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/lld/test/MachO/Output/read-workers.s.tmp.o -o /dev/null
# .---command stderr------------
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
# | #0 0x00005559961af436 ___interceptor_backtrace /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:4530:13
# | #1 0x00005559964424e8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:834:13
# | #2 0x000055599643bc89 llvm::sys::RunSignalHandlers() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/Signals.cpp:0:5
# | #3 0x00005559964445ee SignalHandler(int, siginfo_t*, void*) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:426:38
# | #4 0x00007fc8d0e458d0 (/lib/x86_64-linux-gnu/libc.so.6+0x458d0)
# | #5 0x00007fc8d0ea49bc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0xa49bc)
# | #6 0x00007fc8d0e4579e raise (/lib/x86_64-linux-gnu/libc.so.6+0x4579e)
# | #7 0x00007fc8d0e288cd abort (/lib/x86_64-linux-gnu/libc.so.6+0x288cd)
# | #8 0x000055599623211c (/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld+0xc82a11c)
# | #9 0x000055599622ffbe __sanitizer::Die() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_termination.cpp:52:5
# | #10 0x0000555996210c1b push_back /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common.h:543:7
# | #11 0x0000555996210c1b __asan::ScopedInErrorReport::~ScopedInErrorReport() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_report.cpp:193:29
# | #12 0x0000555996212aad __asan::ReportGenericError(unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long, unsigned int, bool) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_report.cpp:536:1
# | #13 0x0000555996213676 __asan_report_load1 /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_rtl.cpp:128:1
# | #14 0x0000555997051f06 operator() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/MachO/Driver.cpp:352:5
# | #15 0x0000555997051f06 operator() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/MachO/Driver.cpp:372:11
Step 10 (stage2/asan_ubsan check) failure: stage2/asan_ubsan check (failure)
...
(log output identical to step 2 above)
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 6, 2025
llvm#159211)

In the register allocator we define non-trivial rematerialization as the rematerialization of an instruction with virtual register uses.

We have been able to perform non-trivial rematerialization for a while, but it has been prevented by default unless specifically overridden by the target in `TargetInstrInfo::isReMaterializableImpl`. The original reasoning for this, given by the comment in the default implementation, is that we might increase a live range of the virtual register, but we don't actually do this. LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize instructions whose virtual registers are already live at the use sites.

https://reviews.llvm.org/D106408 had originally tried to remove this restriction but it was reverted after some performance regressions were reported. We think it is likely that the regressions were caused by the fact that the old isTriviallyReMaterializable API sometimes returned true for non-trivial rematerializations.

However llvm#160377 recently split the API out into a separate non-trivial and trivial version and updated the call-sites accordingly, and llvm#160709 and llvm#159180 fixed heuristics which weren't accounting for the difference between non-trivial and trivial.

With these fixes in place, this patch proposes to again allow non-trivial rematerialization by default, which eliminates a significant number of spills and reloads across various targets.

For llvm-test-suite built with -O3 -flto, we get the following geomean reduction in reloads:
- arm64-apple-darwin: 11.6%
- riscv64-linux-gnu: 8.1%
- x86_64-linux-gnu: 6.5%
@alexfh
Contributor

alexfh commented Oct 25, 2025

Hi @lukel97, is this change expected to affect compilation time a lot? We see quite significant regressions on some protobuf-generated files:

  • before
   ---User Time---   --System Time--   --User+System--   ---Wall Time---   --- Name ---
   130.3845 ( 61.6%)   0.6329 ( 3.6%)   131.0175 ( 57.1%)   131.0921 ( 57.1%)   Greedy Register Allocator #2
  • after
   ---User Time---   --System Time--   --User+System--   ---Wall Time---   --- Name ---
   399.9995 ( 79.2%)   0.8087 ( 4.4%)   400.8082 ( 76.5%)   401.0294 ( 76.5%)   Greedy Register Allocator #2

This results in around 2x end-to-end compilation slowdown.

Do you see any opportunities for optimization? Is there a way to mitigate this using compiler flags?

I can try preparing a reduced test case, but it's going to take some time.
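
As a rough cross-check of the "around 2x" figure, the overall compile time can be backed out of the two timing rows above. This is only a sketch: it assumes the percentage in the wall-time column is the pass's share of total wall time, and the names `PassTiming`, `impliedTotalSeconds` and `slowdown` are made up for illustration.

```cpp
// One row of a per-pass timing table: the pass's wall time in seconds and
// the fraction of total wall time it accounts for (assumed meaning of the
// percentage column).
struct PassTiming {
  double WallSeconds;
  double ShareOfTotal;
};

// Total wall time implied by a pass's own time and its share of the total.
double impliedTotalSeconds(const PassTiming &T) {
  return T.WallSeconds / T.ShareOfTotal;
}

// Slowdown factor going from Before to After.
double slowdown(double Before, double After) { return After / Before; }
```

With `PassTiming{131.0921, 0.571}` before and `PassTiming{401.0294, 0.765}` after, the register allocator itself is roughly 3.06x slower, and the implied totals of about 230 s and 524 s give roughly 2.3x end to end, which is in line with the slowdown reported above.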

@preames
Collaborator

preames commented Oct 26, 2025

Hi @lukel97, is this change expected to affect compilation time a lot? We see quite significant regressions on some protobuf-generated files:

...

I can try preparing a reduced test case, but it's going to take some time.

To quote myself from earlier in the thread: "As the compile time concern reported against https://reviews.llvm.org/D106408 was never identified (i.e. no reproducer shared), we may see a regression on some workloads after this lands. Please do not revert unless a reproducer is available! I have a rough idea of the possible cause, but need a reproducer to confirm a fix."

Once a reproducer is available, we can definitely revert while this is investigated. A compile time regression of this magnitude is definitely not acceptable; we just need a reproducer with which to identify it.

@lukel97
Contributor Author

lukel97 commented Oct 26, 2025

Thanks for reporting this @alexfh, a reproducer would definitely be much appreciated. In the meantime you can probably work around it in your target's isReMaterializableImpl by appending something like

for (auto &MO : MI.uses())
  if (MO.isReg() && MO.getReg() && MO.getReg().isVirtual())
    return false;

Which should restore the previous behaviour.
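
To make the effect of that loop concrete, here is a self-contained sketch of the same check. It deliberately does not use LLVM's headers: `Operand`, `Instr`, `hasVirtualRegUse` and `isReMaterializableSketch` are hypothetical stand-ins, not the real `MachineOperand`, `MachineInstr` or `TargetInstrInfo` API, and `BaseAnswer` stands for whatever the target's existing implementation would have returned.

```cpp
#include <vector>

// Hypothetical stand-in for a machine operand; models only what the check
// needs and is not LLVM's MachineOperand.
struct Operand {
  bool IsReg;      // register operand (as opposed to e.g. an immediate)
  unsigned Reg;    // register number, 0 meaning "no register"
  bool IsVirtual;  // virtual register, i.e. not yet assigned by regalloc
  bool IsUse;      // read by the instruction (as opposed to defined)
};

// Hypothetical stand-in for a machine instruction.
struct Instr {
  std::vector<Operand> Operands;
};

// Same condition as the loop above: does any use operand name a virtual
// register?
bool hasVirtualRegUse(const Instr &MI) {
  for (const Operand &MO : MI.Operands)
    if (MO.IsUse && MO.IsReg && MO.Reg != 0 && MO.IsVirtual)
      return true;
  return false;
}

// Sketch of the workaround: reject non-trivial rematerialization (any
// virtual register use), otherwise keep the target's existing answer.
bool isReMaterializableSketch(const Instr &MI, bool BaseAnswer) {
  if (hasVirtualRegUse(MI))
    return false;
  return BaseAnswer;
}
```

Under these assumptions, an instruction whose only operand is an immediate keeps its existing answer, while one that reads a virtual register is always rejected, which matches the pre-patch default described in this PR.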

@preames Should we revert this now or wait until we get a reproducer first? I will be travelling tomorrow but feel free to go ahead and revert on my behalf if needed.

@alexfh
Contributor

alexfh commented Oct 28, 2025

It turned out to be rather challenging to extract a shareable test case for this. The original input is equivalent to ~100M IR, which is already close to being a compiler stress test. It also seems like we're only seeing the compilation time regression with memory sanitizer enabled, which likely expands the scope of what could go wrong (though the register allocator seems to be sufficiently remote and independent from sanitizer instrumentation).

I wonder if a Clang profile collected while compiling the problematic file before and after this commit would be helpful to diagnose the problem?

@alexfh
Contributor

alexfh commented Oct 28, 2025

Thanks for reporting this @alexfh, a reproducer would definitely be much appreciated. In the meantime you can probably work around it in your target's isReMaterializableImpl by appending something like

for (auto &MO : MI.uses())
  if (MO.isReg() && MO.getReg() && MO.getReg().isVirtual())
    return false;

Which should restore the previous behaviour.

Would it be possible to add a flag for this? We need a way to disable the new behavior for individual files without resorting to patching Clang.

@preames Should we revert this now or wait until we get a reproducer first? I will be travelling tomorrow but feel free to go ahead and revert on my behalf if needed.

I'm not asking for a revert. I don't see a ton of evidence that this commit is problematic across the board. So far it was just a single file from a huge codebase.

@preames
Collaborator

preames commented Nov 3, 2025

I wonder if a Clang profile collected while compiling the problematic file before and after this commit would be helpful to diagnose the problem?

Yes please, this might very well be sufficient. I'd encourage you to file an issue with all the information you have. I'm requesting an issue only because it sounds like this may be a hunt for a bit, and I want to leave this review description open for other reports without risking things getting interwoven and lost.

@rnk
Collaborator

rnk commented Nov 4, 2025

I wanted to share that I think the reduction in reloads seemed promising, and @weiguozhi is benchmarking this change internally at Google. It's probably too early to say, and you can make what you will of our workloads, but we're seeing something between insignificant changes and improvements close to the noise threshold.

Clearly this is triggering superlinear register allocation behavior and that deserves to be investigated, but overall, thanks for revisiting this. 👍

@lukel97
Contributor Author

lukel97 commented Nov 21, 2025

@alexfh Hi Alexander, did you have any luck getting a reproducer for the compile time regression? Or are the sources being compiled publicly available?

@alexfh
Contributor

alexfh commented Nov 28, 2025

@alexfh Hi Alexander, did you have any luck getting a reproducer for the compile time regression? Or are the sources being compiled publicly available?

As per my earlier comment (#159211 (comment)), extracting a shareable test case for this would be rather problematic, given that it seems to be a regression on a single file from our huge codebase. I'll grab a Clang profile on the problematic input and post it here; maybe that will be enough to see any optimization opportunities.
