Closed
Description
Is there a way to get functionality equivalent to __launch_bounds__() in SYCL?
I am porting an optimized CUDA kernel to SYCL and keeping the same launch configuration (<<<....>>>) parameters. Without an equivalent of __launch_bounds__() in SYCL, the launch fails with the error below, because it requests more threads per block than the kernel's register count allows.
Any suggestions? (a) One solution is to adjust the global and local iteration space of the nd_range, but I wasn't sure whether this would be a portable and performant approach when switching to other devices (e.g., PVC, MI250X).
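For reference, option (a) could look roughly like the sketch below. It is plain C++ so it compiles without a SYCL toolchain. The names LaunchShape and fit_launch are hypothetical helpers, not part of SYCL, and the limit is passed in as a parameter. In a real program I assume the limit would come from a per-kernel query such as sycl::info::kernel_device_specific::work_group_size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper type (not part of SYCL): a 1-D launch shape.
struct LaunchShape {
    std::size_t global;  // total work-items, always a multiple of local
    std::size_t local;   // work-group size actually used
};

// Clamp the requested work-group size to max_wg_size, then round the global
// size up to a whole number of work-groups.
// Assumption: in a SYCL program max_wg_size would be obtained with something
// like
//   kernel.get_info<sycl::info::kernel_device_specific::work_group_size>(dev)
// so that the limit reflects this kernel's resource usage on this device.
LaunchShape fit_launch(std::size_t n_items, std::size_t requested_local,
                       std::size_t max_wg_size) {
    assert(requested_local > 0 && max_wg_size > 0);
    const std::size_t local = std::min(requested_local, max_wg_size);
    const std::size_t groups = (n_items + local - 1) / local;  // ceiling division
    return {groups * local, local};
}
```

With this, fit_launch(n, 1024, limit) would feed sycl::nd_range<1>{shape.global, shape.local}. Because the global size is rounded up, the kernel body needs a guard such as "if (id < n) ..." to skip the padding work-items. Whether the clamped size performs well on other devices is exactly the open question.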
Error:
PI CUDA ERROR:
Value: 701
Name: CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES
Description: too many resources requested for launch
Function: cuda_piEnqueueKernelLaunch
Source Location: ..../llvm_sycl/sycl/plugins/cuda/pi_cuda.cpp:3214
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES)
-5 (PI_ERROR_OUT_OF_RESOURCES)
frobnitzem