22 questions
0 votes
1 answer
493 views
If I tell NVCC to -gencode arch=native, what do I use for the code= argument?
Suppose my machine has GPUs with compute capabilities XX and YY. Having read: https://stackoverflow.com/a/35657430/1593077 I know I can call nvcc like so: nvcc \ -o myapp \ -gencode arch=...
0 votes
1 answer
353 views
CUDA -arch for older GPUs when compiling only host code that calls CUDA APIs or third-party libs
Assume the CUDA version installed only supports my old GPU when -arch=sm_35 is passed. Otherwise, kernels do not execute. Suppose I now only call CUDA Runtime APIs (cudaMalloc, cudaFree, etc.) in my C++...
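For concreteness, a minimal sketch of that situation, assuming a purely host-side program: it calls the Runtime API but launches no kernels, so no device code for any particular sm_XX is ever executed (file and allocation size are illustrative):

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        // Host-only Runtime API usage: no kernel launches, so the
        // -arch value baked into the binary is never exercised.
        void* buf = nullptr;
        cudaError_t err = cudaMalloc(&buf, 1 << 20);  // 1 MiB on the device
        if (err != cudaSuccess) {
            std::printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        cudaFree(buf);
        return 0;
    }

A file like this can even be built by a plain C++ compiler and linked against cudart, with no nvcc (and hence no -arch) involved at all.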
-2 votes
1 answer
3k views
TensorFlow Warning: TensorFlow was not built with CUDA kernel binaries compatible with compute capability 8.6
I have an old Intel Core i7 950 CPU with no AVX support, a newer NVIDIA RTX 3060 Ti GPU with compute capability 8.6, and the Windows 10 OS. Despite the default TensorFlow distribution requiring AVX ...
0 votes
1 answer
280 views
How to check for which CUDA compute capabilities kernels are available?
Is there a way to check at runtime for which CUDA compute capabilities the current program was compiled? Or do the arch=compute_xx,code=sm_xx flags set any defines which could be checked? Background is ...
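One compile-time angle on this, as a sketch: recent nvcc versions (11.5 and later, an assumption worth verifying for your toolkit) define __CUDA_ARCH_LIST__ in host code as a comma-separated list of the architectures the program was compiled for, which can then be printed at runtime:

    #include <cstdio>

    // Stringify the macro so its value can be printed.
    #define STR2(x) #x
    #define STR(x) STR2(x)

    int main() {
    #ifdef __CUDA_ARCH_LIST__
        // Prints e.g. "500,860" when built with -gencode for sm_50 and sm_86.
        std::printf("Compiled for: %s\n", STR(__CUDA_ARCH_LIST__));
    #else
        std::printf("__CUDA_ARCH_LIST__ not defined by this compiler.\n");
    #endif
        return 0;
    }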
1 vote
1 answer
59 views
CMake idiom regarding minimum microarchitecture checking
Suppose I have a CUDA project and I'm writing its CMakeLists.txt. In my project, I have several .cu source files with kernels, each of which has a minimum NVIDIA microarchitecture version it supports. ...
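Orthogonal to the CMake machinery, a per-file minimum can also be enforced at compile time inside each .cu source; a minimal sketch, with sm_70 as an arbitrary example threshold:

    // Top of a kernel source file that requires at least sm_70.
    // __CUDA_ARCH__ is only defined during device compilation passes,
    // hence the defined() check.
    #if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 700
    #error "This translation unit requires compute capability 7.0 or higher."
    #endif

    __global__ void voltaOnlyKernel() {
        // ... code relying on sm_70+ features ...
    }

This turns a mismatched architecture into a hard compile error rather than a silent runtime failure, complementing whatever check the CMakeLists.txt performs.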
1 vote
1 answer
3k views
Understanding Warp Scheduler Utilization in CUDA: Maximum Concurrent Warps vs Resident Warps
In CUDA compute capability 8.6, each Streaming Multiprocessor (SM) has four warp schedulers. Each warp scheduler can schedule up to 16 warps concurrently, meaning that theoretically up to 64 warps ...
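The resident-vs-schedulable distinction can be checked against the hardware directly; a sketch that derives the per-SM resident-warp limit from the device properties and compares it with what a given kernel actually achieves (the kernel and block size are placeholders):

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void dummyKernel() {}

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        // On compute capability 8.6 this prints 48 (1536 threads / 32).
        std::printf("Max resident warps per SM: %d\n",
                    prop.maxThreadsPerMultiProcessor / prop.warpSize);

        int blockSize = 256;  // 8 warps per block (placeholder)
        int numBlocks = 0;
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(&numBlocks, dummyKernel,
                                                      blockSize, 0);
        std::printf("Resident warps for dummyKernel: %d\n",
                    numBlocks * blockSize / prop.warpSize);
        return 0;
    }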
0 votes
2 answers
3k views
CUDA atomicAdd_block is undefined
According to CUDA Programming Guide, "Atomic functions are only atomic with respect to other operations performed by threads of a particular set ... Block-wide atomics: atomic for all CUDA ...
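The *_block variants only exist when device code is generated for compute capability 6.0 or newer, so a working sketch has to be compiled with something like -arch=sm_60 (kernel and launch shape are illustrative):

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void countWithinBlock(int* perBlockCounter) {
        // Atomic only with respect to threads in the same block;
        // without -arch=sm_60 or newer, nvcc reports
        // atomicAdd_block as undefined.
        atomicAdd_block(&perBlockCounter[blockIdx.x], 1);
    }

    int main() {
        int* counters = nullptr;
        cudaMalloc(&counters, 4 * sizeof(int));
        cudaMemset(counters, 0, 4 * sizeof(int));
        countWithinBlock<<<4, 128>>>(counters);
        cudaDeviceSynchronize();
        cudaFree(counters);
        return 0;
    }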
1 vote
2 answers
465 views
Pre 8.x equivalent of __reduce_max_sync() in CUDA
cuda-memcheck has detected a race condition in the code that does the following: condition = /*different in each thread*/; __shared__ int owner[nWarps]; /* ... owner[i] is initialized to blockDim.x+1 */ ...
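For reference, the usual pre-8.x substitute for __reduce_max_sync is a shuffle-based butterfly reduction; a minimal sketch for a full 32-thread warp (the *_sync shuffle intrinsics require CUDA 9 or later):

    // Warp-wide max: after the loop, every lane holds the maximum.
    __device__ unsigned warpReduceMax(unsigned val) {
        const unsigned fullMask = 0xffffffffu;
        for (int offset = 16; offset > 0; offset >>= 1) {
            unsigned other = __shfl_xor_sync(fullMask, val, offset);
            if (other > val) val = other;
        }
        return val;
    }

With a partial mask the loop needs more care, since __shfl_xor_sync only exchanges values between lanes named in the mask.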
1 vote
0 answers
1k views
Setting a constraint in a Slurm job script for GPU compute capability
I am trying to set a constraint so that my job would only run on GPUs with compute capability higher than (or equal to) 7. Here is my script named torch_gpu_sanity_venv385-11.slurm: #!/bin/bash #SBATCH --...
7 votes
4 answers
25k views
How can I get CMake to automatically detect the value for CUDA_ARCHITECTURES?
Newer versions of CMake (3.18 and later) are "aware" of the choice of CUDA architectures which compilation of CUDA code targets. Targets have a CUDA_ARCHITECTURES property, which, when set, ...
1 vote
1 answer
864 views
Cache behaviour in Compute Capability 7.5
These are my assumptions: There are two types of loads, cached and uncached. In the first one, the traffic goes through L1 and L2, while in the second one, the traffic goes only through L2. The ...
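One concrete control point for these load paths, as an illustration rather than a full model of the 7.5 hierarchy: __ldg() forces a read through the read-only data cache, which on Turing is unified with L1:

    __global__ void copyCached(const float* __restrict__ in,
                               float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            // Explicitly loads via the read-only/L1 data path.
            out[i] = __ldg(&in[i]);
        }
    }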
0 votes
2 answers
1k views
Compile CUDA code with CMake and 3.5 compute capability
I need to compile CUDA code that uses dynamic parallelism with CMake. The code is: #include <stdio.h> __global__ void childKernel() { printf("Hello "); } __global__ void ...
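For reference, a complete minimal dynamic-parallelism program in the spirit of that excerpt; independent of the CMake question, on the command line it needs relocatable device code, e.g. nvcc -arch=sm_35 -rdc=true dp.cu (older toolkits may additionally need -lcudadevrt):

    #include <cstdio>

    __global__ void childKernel() {
        printf("Hello from child\n");
    }

    __global__ void parentKernel() {
        // Device-side launch: this is what requires compute
        // capability 3.5+ and -rdc=true.
        childKernel<<<1, 1>>>();
    }

    int main() {
        parentKernel<<<1, 1>>>();
        cudaDeviceSynchronize();  // waits for parent and child grids
        return 0;
    }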
1 vote
2 answers
2k views
Maximum number of concurrent kernels & virtual code architecture
So I found this Wikipedia resource, Maximum number of resident grids per device (Concurrent Kernel Execution), and for each compute capability it gives a number of concurrent kernels, which I assume ...
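Note that the resident-grid limit is an upper bound; kernels only run concurrently at all when they are launched into distinct non-default streams and leave each other enough SM resources. A minimal sketch of the setup (kernel body and counts are placeholders):

    #include <cuda_runtime.h>

    __global__ void busyKernel() {
        // Placeholder busy-work so the kernels overlap in time.
        for (volatile int i = 0; i < 1000000; ++i) {}
    }

    int main() {
        const int kNumStreams = 4;
        cudaStream_t streams[kNumStreams];
        for (int i = 0; i < kNumStreams; ++i)
            cudaStreamCreate(&streams[i]);

        // Small grids in different streams may execute concurrently,
        // up to the per-architecture resident-grid limit.
        for (int i = 0; i < kNumStreams; ++i)
            busyKernel<<<1, 32, 0, streams[i]>>>();

        cudaDeviceSynchronize();
        for (int i = 0; i < kNumStreams; ++i)
            cudaStreamDestroy(streams[i]);
        return 0;
    }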
27 votes
3 answers
26k views
What utility/binary can I call to determine an NVIDIA GPU's Compute Capability?
Suppose I have a system with a single GPU installed, and suppose I've also installed a recent version of CUDA. I want to determine what's the compute capability of my GPU. If I could compile code, ...
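If compiling a small program is an option after all, one Runtime API call answers it; a deviceQuery-style sketch:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int count = 0;
        if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
            std::printf("No CUDA device found.\n");
            return 1;
        }
        for (int dev = 0; dev < count; ++dev) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, dev);
            // Prints e.g. "Device 0: 8.6" for an RTX 3060 Ti.
            std::printf("Device %d: %d.%d\n", dev, prop.major, prop.minor);
        }
        return 0;
    }

Recent drivers also expose this without compiling anything, via an nvidia-smi query (the exact query name varies by driver version, so treat that as something to check).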
0 votes
2 answers
4k views
Cannot use GPU with TensorFlow
I have TensorFlow installed with CUDA 7.5 and cuDNN 5.0. My graphics card is an NVIDIA GeForce 820M with capability 2.1. However, I get this error: Ignoring visible gpu device (device: 0, name: GeForce ...