22 questions
0 votes
1 answer
493 views
If I tell NVCC to -gencode arch=native, what do I use for the code= argument?
Suppose my machine has GPUs with compute capabilities XX and YY. Having read: https://stackoverflow.com/a/35657430/1593077 I know I can call nvcc like so: nvcc \ -o myapp \ -gencode arch=...
0 votes
1 answer
353 views
CUDA -arch for older GPUs when compiling only host code that calls CUDA APIs or third-party libs
Assume the CUDA version installed only supports my old GPU when -arch=sm_35 is passed. Otherwise, kernels do not execute. Suppose I now only call CUDA Runtime APIs (cudaMalloc, cudaFree, etc.) in my C++...
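For concreteness, a minimal sketch of that situation, assuming a purely host-side program: it calls the Runtime API but launches no kernels, so no device code for any particular sm_XX is ever executed (file and allocation size are illustrative):

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        // Host-only Runtime API usage: no kernel launches, so the
        // -arch value baked into the binary is never exercised.
        void* buf = nullptr;
        cudaError_t err = cudaMalloc(&buf, 1 << 20);  // 1 MiB on the device
        if (err != cudaSuccess) {
            std::printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        cudaFree(buf);
        return 0;
    }

A file like this can even be built by a plain C++ compiler and linked against cudart, with no nvcc (and hence no -arch) involved at all.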
-2 votes
1 answer
3k views
TensorFlow Warning: TensorFlow was not built with CUDA kernel binaries compatible with compute capability 8.6
I have an old Intel Core i7 950 CPU with no AVX support, a newer NVIDIA RTX 3060 Ti GPU with compute capability 8.6, and the Windows 10 OS. Despite the default TensorFlow distribution requiring AVX ...
0 votes
1 answer
280 views
How to check for which CUDA compute capabilities kernels are available?
Is there a way to check at runtime for which CUDA compute capabilities the current program was compiled? Or do the arch=compute_xx,code=sm_xx flags set any defines which could be checked? Background is ...
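One compile-time angle on this, as a sketch: recent nvcc versions (11.5 and later, an assumption worth verifying for your toolkit) define __CUDA_ARCH_LIST__ in host code as a comma-separated list of the architectures the program was compiled for, which can then be printed at runtime:

    #include <cstdio>

    // Stringify the macro so its value can be printed.
    #define STR2(x) #x
    #define STR(x) STR2(x)

    int main() {
    #ifdef __CUDA_ARCH_LIST__
        // Prints e.g. "500,860" when built with -gencode for sm_50 and sm_86.
        std::printf("Compiled for: %s\n", STR(__CUDA_ARCH_LIST__));
    #else
        std::printf("__CUDA_ARCH_LIST__ not defined by this compiler.\n");
    #endif
        return 0;
    }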
1 vote
1 answer
59 views
CMake idiom regarding minimum microarchitecture checking
Suppose I have a CUDA project and I'm writing its CMakeLists.txt. In my project, I have several .cu source files with kernels, each of which has a minimum NVIDIA microarchitecture version it supports. ...
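Orthogonal to the CMake machinery, a per-file minimum can also be enforced at compile time inside each .cu source; a minimal sketch, with sm_70 as an arbitrary example threshold:

    // Top of a kernel source file that requires at least sm_70.
    // __CUDA_ARCH__ is only defined during device compilation passes,
    // hence the defined() check.
    #if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 700
    #error "This translation unit requires compute capability 7.0 or higher."
    #endif

    __global__ void voltaOnlyKernel() {
        // ... code relying on sm_70+ features ...
    }

This turns a mismatched architecture into a hard compile error rather than a silent runtime failure, complementing whatever check the CMakeLists.txt performs.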
1 vote
1 answer
3k views
Understanding Warp Scheduler Utilization in CUDA: Maximum Concurrent Warps vs Resident Warps
In CUDA compute capability 8.6, each Streaming Multiprocessor (SM) has four warp schedulers. Each warp scheduler can schedule up to 16 warps concurrently, meaning that theoretically up to 64 warps ...
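The resident-vs-schedulable distinction can be checked against the hardware directly; a sketch that derives the per-SM resident-warp limit from the device properties and compares it with what a given kernel actually achieves (the kernel and block size are placeholders):

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void dummyKernel() {}

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        // On compute capability 8.6 this prints 48 (1536 threads / 32).
        std::printf("Max resident warps per SM: %d\n",
                    prop.maxThreadsPerMultiProcessor / prop.warpSize);

        int blockSize = 256;  // 8 warps per block (placeholder)
        int numBlocks = 0;
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(&numBlocks, dummyKernel,
                                                      blockSize, 0);
        std::printf("Resident warps for dummyKernel: %d\n",
                    numBlocks * blockSize / prop.warpSize);
        return 0;
    }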
0 votes
2 answers
3k views
CUDA atomicAdd_block is undefined
According to CUDA Programming Guide, "Atomic functions are only atomic with respect to other operations performed by threads of a particular set ... Block-wide atomics: atomic for all CUDA ...
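The *_block variants only exist when device code is generated for compute capability 6.0 or newer, so a working sketch has to be compiled with something like -arch=sm_60 (kernel and launch shape are illustrative):

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void countWithinBlock(int* perBlockCounter) {
        // Atomic only with respect to threads in the same block;
        // without -arch=sm_60 or newer, nvcc reports
        // atomicAdd_block as undefined.
        atomicAdd_block(&perBlockCounter[blockIdx.x], 1);
    }

    int main() {
        int* counters = nullptr;
        cudaMalloc(&counters, 4 * sizeof(int));
        cudaMemset(counters, 0, 4 * sizeof(int));
        countWithinBlock<<<4, 128>>>(counters);
        cudaDeviceSynchronize();
        cudaFree(counters);
        return 0;
    }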
1 vote
2 answers
465 views
Pre 8.x equivalent of __reduce_max_sync() in CUDA
cuda-memcheck has detected a race condition in the code that does the following: condition = /*different in each thread*/; __shared__ int owner[nWarps]; /* ... owner[i] is initialized to blockDim.x+1 */ ...
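For reference, the usual pre-8.x substitute for __reduce_max_sync is a shuffle-based butterfly reduction; a minimal sketch for a full 32-thread warp (the *_sync shuffle intrinsics require CUDA 9 or later):

    // Warp-wide max: after the loop, every lane holds the maximum.
    __device__ unsigned warpReduceMax(unsigned val) {
        const unsigned fullMask = 0xffffffffu;
        for (int offset = 16; offset > 0; offset >>= 1) {
            unsigned other = __shfl_xor_sync(fullMask, val, offset);
            if (other > val) val = other;
        }
        return val;
    }

With a partial mask the loop needs more care, since __shfl_xor_sync only exchanges values between lanes named in the mask.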
1 vote
0 answers
1k views
Setting a constraint in a Slurm job script for GPU compute capability
I am trying to set a constraint so that my job would only run on GPUs with compute capability higher than (or equal to) 7. Here is my script named torch_gpu_sanity_venv385-11.slurm: #!/bin/bash #SBATCH --...
7 votes
4 answers
25k views
How can I get CMake to automatically detect the value for CUDA_ARCHITECTURES?
Newer versions of CMake (3.18 and later) are "aware" of the choice of CUDA architectures which compilation of CUDA code targets. Targets have a CUDA_ARCHITECTURES property, which, when set, ...
1 vote
1 answer
864 views
Cache behaviour in Compute Capability 7.5
These are my assumptions: There are two types of loads, cached and uncached. In the first one, the traffic goes through L1 and L2, while in the second one, the traffic goes only through L2. The ...
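One concrete control point for these load paths, as an illustration rather than a full model of the 7.5 hierarchy: __ldg() forces a read through the read-only data cache, which on Turing is unified with L1:

    __global__ void copyCached(const float* __restrict__ in,
                               float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            // Explicitly loads via the read-only/L1 data path.
            out[i] = __ldg(&in[i]);
        }
    }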
0 votes
2 answers
1k views
Compile CUDA code with CMake and 3.5 compute capability
I need to compile CUDA code that uses dynamic parallelism with CMake. The code is: #include <stdio.h> __global__ void childKernel() { printf("Hello "); } __global__ void ...
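For reference, a complete minimal dynamic-parallelism program in the spirit of that excerpt; independent of the CMake question, on the command line it needs relocatable device code, e.g. nvcc -arch=sm_35 -rdc=true dp.cu (older toolkits may additionally need -lcudadevrt):

    #include <cstdio>

    __global__ void childKernel() {
        printf("Hello from child\n");
    }

    __global__ void parentKernel() {
        // Device-side launch: this is what requires compute
        // capability 3.5+ and -rdc=true.
        childKernel<<<1, 1>>>();
    }

    int main() {
        parentKernel<<<1, 1>>>();
        cudaDeviceSynchronize();  // waits for parent and child grids
        return 0;
    }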
1 vote
2 answers
2k views
Maximum number of concurrent kernels & virtual code architecture
So I found this Wikipedia resource, Maximum number of resident grids per device (Concurrent Kernel Execution), and for each compute capability it gives a number of concurrent kernels, which I assume ...
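Note that the resident-grid limit is an upper bound; kernels only run concurrently at all when they are launched into distinct non-default streams and leave each other enough SM resources. A minimal sketch of the setup (kernel body and counts are placeholders):

    #include <cuda_runtime.h>

    __global__ void busyKernel() {
        // Placeholder busy-work so the kernels overlap in time.
        for (volatile int i = 0; i < 1000000; ++i) {}
    }

    int main() {
        const int kNumStreams = 4;
        cudaStream_t streams[kNumStreams];
        for (int i = 0; i < kNumStreams; ++i)
            cudaStreamCreate(&streams[i]);

        // Small grids in different streams may execute concurrently,
        // up to the per-architecture resident-grid limit.
        for (int i = 0; i < kNumStreams; ++i)
            busyKernel<<<1, 32, 0, streams[i]>>>();

        cudaDeviceSynchronize();
        for (int i = 0; i < kNumStreams; ++i)
            cudaStreamDestroy(streams[i]);
        return 0;
    }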
27 votes
3 answers
26k views
What utility/binary can I call to determine an NVIDIA GPU's Compute Capability?
Suppose I have a system with a single GPU installed, and suppose I've also installed a recent version of CUDA. I want to determine what's the compute capability of my GPU. If I could compile code, ...
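If compiling a small program is an option after all, one Runtime API call answers it; a deviceQuery-style sketch:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int count = 0;
        if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
            std::printf("No CUDA device found.\n");
            return 1;
        }
        for (int dev = 0; dev < count; ++dev) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, dev);
            // Prints e.g. "Device 0: 8.6" for an RTX 3060 Ti.
            std::printf("Device %d: %d.%d\n", dev, prop.major, prop.minor);
        }
        return 0;
    }

Recent drivers also expose this without compiling anything, via an nvidia-smi query (the exact query name varies by driver version, so treat that as something to check).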
0 votes
2 answers
4k views
Cannot use GPU with TensorFlow
I have TensorFlow installed with CUDA 7.5 and cuDNN 5.0. My graphics card is an NVIDIA GeForce 820M with capability 2.1. However, I get this error: Ignoring visible gpu device (device: 0, name: GeForce ...