- Notifications
You must be signed in to change notification settings - Fork 798
[SYCL][HIP] Add AMDGPU reflect pass to choose between safe and unsafe AMDGPU atomics #11467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ldrumm left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely happy about introducing another reflect pass, but I appreciate that we'll likely get better perf than unconditionally prefetching.
What happens to __oclc_amdgpu_reflect if this pass isn't run?
The func will remain in the module and a linking error will result at some stage. Which is the same behaviour as the LLVM reflect pass |
379d3a6 to e5657aa Compare e5657aa to b4617ef Compare b4617ef to 57ae613 Compare 5a8c9ac to 9051df1 Compare 1872497 to bad104b Compare 0eda761 to 982ea7a Compare 982ea7a to 4d17f4f Compare df15e57 to 7f2771b Compare - Change getNumOperands to arg size - Move size check above the for loop
| @frasercrmck all your comments have been addressed. Thanks for review! In terms of function vs module pass - I understand that it might be slightly more optimal to make this a module pass, however I also think for comprehensibility this pass should not diverge too much from NVVMReflect, which is a func pass. Let me know if you think that this is OK |
frasercrmck left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| Friendly ping @intel/dpcpp-tools-reviewers |
ldrumm left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good modulo some minor code nits
- Use a vector of CallInsts instead of Instructions. - Change assert(fasle) to report_fatal_error.
| Thanks @ldrumm changes made |
6c5ef4e to c4ab1ed Compare - Use auto - Use drop_back to remove null byte - Replace hip_be with hip
Change opt test to use update_test_checks.py
AMDGPU reflect pass is needed to choose between safe and unsafe atomics
at the libclc level. In the long run we will delete this patch as work
is being done to ensure correct lowering of atomic instructions. See
patches:
llvm/llvm-project#85052
llvm/llvm-project#69229
This work is necessary as malloc shared atomics rely on PCIe atomics
which can have patchy and unreliable support. Therefore, we want to be
able to choose at compile time whether we should use safe atomics using
CAS (which PCIe should support), or if we want to rely of the
availability of the newest PCIe atomics, if malloc shared atomics are
desired.
Also changes the implementation of
atomic_or,atomic_andso that theycan choose between the safe or unsafe version based on the AMDGPU
reflect value.