About error code "invalid device function" by nvcc with compute_ and sm_ compile option

Question

I hope you can help me to figure out the correct compiler option required for the below card:

> ./deviceQuery Starting... > > CUDA Device Query (Runtime API) version (CUDART static linking) > > Detected 1 CUDA Capable device(s) > > Device 0: "GeForce GTX 780 Ti" > CUDA Driver Version / Runtime Version 7.0 / 6.5 > CUDA Capability Major/Minor version number: 3.5 > Total amount of global memory: 3072 MBytes (3220897792 > bytes) > (15) Multiprocessors, (192) CUDA Cores/MP: > 2880 CUDA Cores > GPU Clock rate: 1020 MHz (1.02GHz) > Memory Clock rate: 3500 Mhz > Memory Bus Width: 384-bit > L2 Cache Size: 1572864 bytes ... Maximum Texture > Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), > 3D=(4096, 4096, 4096) Maximum Layered 1D Texture Size, (num) layers > 1D=(16384), 2048 layers Maximum Layered 2D Texture Size, (num) > layers 2D=(16384, 16384), 2048 layers Total amount of constant > memory: 65536 bytes Total amount of shared memory per > block: 49152 bytes Total number of registers available per > block: 65536 Warp size: 32 > Maximum number of threads per multiprocessor: 2048 Maximum number > of threads per block: 1024 Max dimension size of a thread > block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size > (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: > 2147483647 bytes Texture alignment: 512 > bytes Concurrent copy and kernel execution: Yes with 1 copy > engine(s) Run time limit on kernels: Yes > Integrated GPU sharing Host Memory: No Support host > page-locked memory mapping: Yes Alignment requirement for > Surfaces: Yes Device has ECC support: > Disabled Device supports Unified Addressing (UVA): Yes Device > PCI Bus ID / PCI location ID: 3 / 0 Compute Mode: > < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > > > deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.0, CUDA > Runtime Version = 6.5, NumDevs = 1, Device0 = GeForce GTX 780 Ti > Result = PASS

I have a piece of cuda code and debug with nvcc (CUDA 6.5). When I added those options:

-arch compute_20 -code sm_20

then program gave me this error:

error code invalid device function

If I remove those options (nvcc source -o exe), the program runs fine. Can anyone help me figure out which compute_ and sm_ is suitable for my card by looking at the output of ./deviceQuery? I read from the nvidia manual that using the correct option of compute_ and sm_ for the card results in significant speed up . Has anyone observed quantitatively this speed up?

Thanks

Today I had the same strange problem with using MSVC from VS2022 + NVCC from CUDA Toolkit 12.4. And I could not understand what was wrong. One workaround that helped is actually eliminating C++ /GL (learn.microsoft.com/en-us/cpp/build/reference/…) and Linker /LTCG (learn.microsoft.com/en-us/cpp/build/reference/…) — Konstantin Burlachenko
– Konstantin Burlachenko, Commented Jun 24, 2024 at 0:30

Robert Crovella · Accepted Answer · 2019-01-22 03:19:54Z

"Invalid Device Function" error in CUDA generally means you have compiled with GPU architecture settings that don't match or are not compatible with the GPU you are running on.

The general process to solve this is to run the deviceQuery sample code on your GPU, determine the compute capability major and minor versions from the output, and use that to select compile architecture settings for your GPU.

if you your GPU is architecture compute capability X.Y, then a very simple choice would be:

-arch=sm_XY

Can anyone help me figure out which compute_ and sm_ is suitable for my card by looking at the output of ./deviceQuery?

Following your example, the correct settings for GTX 780 Ti are:

-arch compute_35 -code sm_35

The above will generate code that will run on a cc3.5 device (only). I think it's better just to specify:

-arch=sm_35

which is a shorthand for slightly more complicated version:

-gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35

This will generate code that will run on a cc3.5 or newer device. The 3.5/35 number arises from this line in your deviceQuery output:

Capability Major/Minor version number: 3.5

If you want to understand the switch options/differences better, I suggest you review the nvcc manual and this question/answer.

For more description of the behavior of the -arch switch, see here.

Thanks Robert. Just a small question: "CUDA Capability Major/Minor version number: 3.5" => does it mean that _35 for the compute_35 and sm_35 for this card?
Yes, the 3.5 tells us that we want to use compute_35 and sm_35 to target this card.

Collectives™ on Stack Overflow

About error code "invalid device function" by nvcc with compute_ and sm_ compile option

1 Answer 1

2 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Linked

Related