- Notifications
You must be signed in to change notification settings - Fork 306
Updates CI pipeline to run tests under in debug as well as release #1967
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| r? @folkertdev |
folkertdev left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it stands this PR makes CI ~6 minutes longer. That is really more than I'm comfortable with...
@sayantn I believe we had some ideas earlier of running some x86 tests under qemu, and only the avx512 ones with the intel emulator?
| _mm_ucomineq_sh | ||
| _mm_ucomineq_sh | ||
| | ||
| # Fail in dev mode | ||
| _mm256_fnmsub_ph | ||
| _mm256_fnmadd_ph |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a known issue? it's surprising given the implementation
#[inline] #[target_feature(enable = "avx512fp16,avx512vl")] #[cfg_attr(test, assert_instr(vfnmsub))] #[unstable(feature = "stdarch_x86_avx512_f16", issue = "127213")] pub fn _mm256_fnmsub_ph(a: __m256h, b: __m256h, c: __m256h) -> __m256h { unsafe { simd_fma(simd_neg(a), b, simd_neg(c)) } }There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be surprised if its known as it fails in debug and until now it was not being run in debug.
Here's the failure:
https://github.com/rust-lang/stdarch/actions/runs/19740825453/job/56564044181?pr=1949#logs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wild, miri gives an answer different to both C and Rust in this example. On the other hand, the rust assembly looks allright to me. I don't have access to a machine with +avx512fp16 so I can't test the real truth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem is that Rust doesn't guarantee NaN sign/payload propagation, and those tests have differing NaN sign bits
this reflects a problem in our printing logic for fp16 values - we shouldn't compare bit patterns, just numerical values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we pick a different input then? As it stands the functions would also be ignored in release mode now which is unfortunate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably, or a better path would be to actually print fp16 values correctly in C (which might be a bit difficult, @madhav-madhusoodanan you were trying this out, right?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah that is the real problem, I see.
slightly cursed idea: can we just compile the rust hex printing logic to asm and include it? C just cannot print these numbers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we cannot rely on comparing the bit patterns at all for floats, Rust doesn't guarantee that NaN sign/payload will be preserved across functions - it should print NaN or something equivalent for nans. One way would be to convert fp16s to floats in C and the printing them normally (not bitwise)
| @folkertdev We can try that, I don't know how much of a benefit that would really be though. A massive portion of x86 intrinsics is avx512, so the sde run will still be the bottleneck |
| @CrooseGit could you rebase and test this please now that #1968 is merged? |
10a4851 to e8e320f Compare | @CrooseGit this won't test all x86 intrinsics - the tool by default only checks 5% of x86 intrinsics as it is very slow. But as we are adding a new run here, we should be sure that it passes, so could you push a temporary commit to check that the full x86 CI passes. It's just one line (in |
| And to not increase CI times further for the debug run that number should probably be even lower (idk, 3%?). Maybe we can set up some manually-triggered CI thing to run the tests for all intrinsics? |
| Iiuc #1952 does a pretty good reduction in runtime right? |
| Right then we can bump it again |
| I see that |
| @CrooseGit I have fixed the issue in #1969, it'd be great if you can rebase and push |
| @sayantn |
| Just the subset should be fine, these were the only 4 failures perhaps you can validate them specifically manually just to be sure |
2c035bf to a32c9cd Compare | This still takes 18 minutes in the longest job https://github.com/rust-lang/stdarch/actions/runs/19857105546/job/56897656177 Can you tweak the number of tests run when using |
@folkertdev the link you included refers to the regular run-docker.sh job, and I'm not quite sure how we could easily cut down the number of tests ran for that. Do you have any suggestions? |
| Hmm right. @sayantn could we split that job in two? We could at least split out the avx512 from the rest, but then the avx512 might still take too long. |
| That's a nice idea, but I don't think this pr should be blocked on that (personally ~6min CI time increase isn't a lot, at least temporarily). |
| fwiw I agree that 6 mins isn't much on stdarch which has a pretty low velocity of PRs, especially when fixing it costs coverage. That said, do these extra runs here have a purpose when optimisations are turned off? |
| I don't think so, they might have been more useful in the early days of Rust to detect miscompilations. I don't think they serve any purpose now, even in release mode, considering that now we also have intrinsic-test support for x86 |
This is part 2/2 of this PR