Skip to content

Conversation

@CrooseGit
Copy link
Contributor

  • This patch begins modifies the CI pipeline to run all the tests under both release and dev profiles, as currently they are just being ran in release twice.

This is part 2/2 of this PR

@rustbot
Copy link
Collaborator

rustbot commented Nov 28, 2025

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@CrooseGit
Copy link
Contributor Author

r? @folkertdev
Here's part two, thanks for reviewing the first part.

@rustbot rustbot assigned folkertdev and unassigned Amanieu Nov 28, 2025
Copy link
Contributor

@folkertdev folkertdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it stands this PR makes CI ~6 minutes longer. That is really more than I'm comfortable with...

@sayantn I believe we had some ideas earlier of running some x86 tests under qemu, and only the avx512 ones with the intel emulator?

Comment on lines 904 to 908
_mm_ucomineq_sh
_mm_ucomineq_sh

# Fail in dev mode
_mm256_fnmsub_ph
_mm256_fnmadd_ph
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a known issue? it's surprising given the implementation

#[inline] #[target_feature(enable = "avx512fp16,avx512vl")] #[cfg_attr(test, assert_instr(vfnmsub))] #[unstable(feature = "stdarch_x86_avx512_f16", issue = "127213")] pub fn _mm256_fnmsub_ph(a: __m256h, b: __m256h, c: __m256h) -> __m256h { unsafe { simd_fma(simd_neg(a), b, simd_neg(c)) } }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be surprised if its known as it fails in debug and until now it was not being run in debug.
Here's the failure:
https://github.com/rust-lang/stdarch/actions/runs/19740825453/job/56564044181?pr=1949#logs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wild, miri gives an answer different to both C and Rust in this example. On the other hand, the rust assembly looks allright to me. I don't have access to a machine with +avx512fp16 so I can't test the real truth.

Copy link
Contributor

@sayantn sayantn Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem is that Rust doesn't guarantee NaN sign/payload propagation, and those tests have differing NaN sign bits

this reflects a problem in our printing logic for fp16 values - we shouldn't compare bit patterns, just numerical values

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we pick a different input then? As it stands the functions would also be ignored in release mode now which is unfortunate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably, or a better path would be to actually print fp16 values correctly in C (which might be a bit difficult, @madhav-madhusoodanan you were trying this out, right?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah that is the real problem, I see.

slightly cursed idea: can we just compile the rust hex printing logic to asm and include it? C just cannot print these numbers

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we cannot rely on comparing the bit patterns at all for floats, Rust doesn't guarantee that NaN sign/payload will be preserved across functions - it should print NaN or something equivalent for nans. One way would be to convert fp16s to floats in C and the printing them normally (not bitwise)

@CrooseGit CrooseGit changed the title Dev/reucru01/run ci in dev Updates CI pipeline to run tests under in debug as well as release Nov 28, 2025
@sayantn
Copy link
Contributor

sayantn commented Nov 29, 2025

@folkertdev We can try that, I don't know how much of a benefit that would really be though. A massive portion of x86 intrinsics is avx512, so the sde run will still be the bottleneck

@sayantn
Copy link
Contributor

sayantn commented Nov 29, 2025

@CrooseGit could you rebase and test this please now that #1968 is merged?

@CrooseGit CrooseGit force-pushed the dev/reucru01/run-ci-in-dev branch from 10a4851 to e8e320f Compare December 1, 2025 11:22
@sayantn
Copy link
Contributor

sayantn commented Dec 1, 2025

@CrooseGit this won't test all x86 intrinsics - the tool by default only checks 5% of x86 intrinsics as it is very slow. But as we are adding a new run here, we should be sure that it passes, so could you push a temporary commit to check that the full x86 CI passes. It's just one line (in ci/intrinsic-test.sh, L78)

- : "${TEST_SAMPLE_INTRINSICS_PERCENTAGE:=5}" + : "${TEST_SAMPLE_INTRINSICS_PERCENTAGE:=100}" 
@folkertdev
Copy link
Contributor

And to not increase CI times further for the debug run that number should probably be even lower (idk, 3%?). Maybe we can set up some manually-triggered CI thing to run the tests for all intrinsics?

@sayantn
Copy link
Contributor

sayantn commented Dec 1, 2025

Iiuc #1952 does a pretty good reduction in runtime right?

@folkertdev
Copy link
Contributor

Right then we can bump it again

@sayantn
Copy link
Contributor

sayantn commented Dec 1, 2025

I see that _kadd_maskN functions are erroring due to addition overflow. We have to use wrapping_add there

@sayantn
Copy link
Contributor

sayantn commented Dec 1, 2025

@CrooseGit I have fixed the issue in #1969, it'd be great if you can rebase and push

@CrooseGit
Copy link
Contributor Author

@sayantn
Of course, will do.
Would you like all the tests ran again, or back to just a subset?

@folkertdev
Copy link
Contributor

Just the subset should be fine, these were the only 4 failures

Failed to run rust program for intrinsic _kadd_mask16 Failed to run rust program for intrinsic _kadd_mask32 Failed to run rust program for intrinsic _kadd_mask64 Failed to run rust program for intrinsic _kadd_mask8 4 differences found (tested 5530 intrinsics) 

perhaps you can validate them specifically manually just to be sure

@CrooseGit CrooseGit force-pushed the dev/reucru01/run-ci-in-dev branch from 2c035bf to a32c9cd Compare December 2, 2025 11:30
@folkertdev
Copy link
Contributor

This still takes 18 minutes in the longest job

https://github.com/rust-lang/stdarch/actions/runs/19857105546/job/56897656177

Can you tweak the number of tests run when using debug to make CI not run longer than it currently does (so, ~12 minutes total)

@CrooseGit
Copy link
Contributor Author

This still takes 18 minutes in the longest job

https://github.com/rust-lang/stdarch/actions/runs/19857105546/job/56897656177

Can you tweak the number of tests run when using debug to make CI not run longer than it currently does (so, ~12 minutes total)

@folkertdev the link you included refers to the regular run-docker.sh job, and I'm not quite sure how we could easily cut down the number of tests ran for that. Do you have any suggestions?

@folkertdev
Copy link
Contributor

Hmm right. @sayantn could we split that job in two? We could at least split out the avx512 from the rest, but then the avx512 might still take too long.

@sayantn
Copy link
Contributor

sayantn commented Dec 2, 2025

That's a nice idea, but I don't think this pr should be blocked on that (personally ~6min CI time increase isn't a lot, at least temporarily).

@adamgemmell
Copy link
Contributor

fwiw I agree that 6 mins isn't much on stdarch which has a pretty low velocity of PRs, especially when fixing it costs coverage.

That said, do these extra runs here have a purpose when optimisations are turned off?

@sayantn
Copy link
Contributor

sayantn commented Dec 2, 2025

I don't think so, they might have been more useful in the early days of Rust to detect miscompilations. I don't think they serve any purpose now, even in release mode, considering that now we also have intrinsic-test support for x86

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

6 participants