0 votes
1 answer
493 views

Suppose my machine has GPUs with compute capabilities XX and YY. Having read: https://stackoverflow.com/a/35657430/1593077 I know I can call nvcc like so: nvcc \ -o myapp \ -gencode arch=...
einpoklum • 138k
0 votes
1 answer
353 views

Assume the CUDA version installed only supports my old GPU when -arch=sm_35 is passed. Otherwise, kernels do not execute. Suppose I now only call CUDA Runtime APIs (cudaMalloc, cudaFree, etc) in my C++...
md657 • 13
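A minimal sketch of the situation described above, under the assumption that the program contains and launches no kernels at all: pure runtime-API calls such as cudaMalloc/cudaFree involve no device code image, so there is nothing whose architecture could mismatch, whereas an actual kernel launch on an unsupported device would fail with an error such as cudaErrorNoKernelImageForDevice.

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        void* p = nullptr;
        // No device code is involved here, only the runtime and the driver.
        cudaError_t err = cudaMalloc(&p, 1 << 20);   // allocate 1 MiB on the device
        if (err != cudaSuccess) {
            std::printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        err = cudaFree(p);
        std::printf("cudaFree: %s\n", cudaGetErrorString(err));
        return 0;
    }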
-2 votes
1 answer
3k views

I have an old Intel Core i7 950 CPU with no AVX support, a newer NVIDIA RTX 3060 Ti GPU with compute capability 8.6, and Windows 10. Despite the default TensorFlow distribution requiring AVX ...
Thot • 65
0 votes
1 answer
280 views

Is there a way to check at runtime for which CUDA compute capabilities the current program was compiled? Or do the arch=compute_xx,code=sm_xx flags set any defines which could be checked? Background is ...
Dominic • 491
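One hedged sketch for the question above: query the attributes of any kernel with cudaFuncGetAttributes. Note that this only reports the code version actually selected for the current device, not every architecture embedded in a fat binary.

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void probe() {}   // dummy kernel, used only so its attributes can be queried

    int main() {
        cudaFuncAttributes attr;
        cudaError_t err = cudaFuncGetAttributes(&attr, probe);
        if (err != cudaSuccess) {
            std::printf("query failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        // binaryVersion is the compute capability of the SASS chosen for this device
        // (e.g. 86 for sm_86); ptxVersion is the PTX virtual architecture.
        std::printf("binaryVersion=%d ptxVersion=%d\n", attr.binaryVersion, attr.ptxVersion);
        return 0;
    }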
1 vote
1 answer
59 views

Suppose I have a CUDA project and I'm writing its CMakeLists.txt. In my project, I have several .cu source files with kernels, each of which has a minimum NVIDIA microarchitecture version it supports. ...
einpoklum • 138k
1 vote
1 answer
3k views

In CUDA compute capability 8.6, each Streaming Multiprocessor (SM) has four warp schedulers. Each warp scheduler can schedule up to 16 warps concurrently, meaning that theoretically up to 64 warps ...
Tokubara • 490
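The excerpt above reasons from four warp schedulers to a theoretical resident-warp count. A quick sketch to check the actual per-SM limit on a given device (assuming device 0): divide the per-SM thread limit by the warp size; for compute capability 8.6 this typically gives 1536 / 32 = 48 resident warps rather than 64.

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;
        // Resident-warp limit per SM = max resident threads per SM / warp size.
        int residentWarps = prop.maxThreadsPerMultiProcessor / prop.warpSize;
        std::printf("CC %d.%d: %d threads/SM max -> %d resident warps per SM\n",
                    prop.major, prop.minor,
                    prop.maxThreadsPerMultiProcessor, residentWarps);
        return 0;
    }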
0 votes
2 answers
3k views

According to CUDA Programming Guide, "Atomic functions are only atomic with respect to other operations performed by threads of a particular set ... Block-wide atomics: atomic for all CUDA ...
user2348209
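A minimal sketch of the three atomic scopes the quoted guide text distinguishes, assuming a device of compute capability 6.0 or newer (where the _block and _system suffixed variants are available):

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void scopes(int* global_counter) {
        __shared__ int block_counter;
        if (threadIdx.x == 0) block_counter = 0;
        __syncthreads();
        atomicAdd_block(&block_counter, 1);   // atomic only w.r.t. threads of this block
        atomicAdd(global_counter, 1);         // atomic w.r.t. the whole device (default scope)
        atomicAdd_system(global_counter, 1);  // atomic w.r.t. the host and peer devices too
    }

    int main() {
        int* counter;
        cudaMalloc(&counter, sizeof(int));
        cudaMemset(counter, 0, sizeof(int));
        scopes<<<2, 128>>>(counter);
        int host = 0;
        cudaMemcpy(&host, counter, sizeof(int), cudaMemcpyDeviceToHost);
        std::printf("counter = %d\n", host);   // expect 2 * 128 * 2 = 512
        cudaFree(counter);
        return 0;
    }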
1 vote
2 answers
465 views

cuda-memcheck has detected a race condition in the code that does the following: condition = /*different in each thread*/; __shared__ int owner[nWarps]; /* ... owner[i] is initialized to blockDim.x+1 */ ...
Serge Rogatch
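The full code is not shown in the excerpt, so the following is only a hedged sketch of the general pattern: electing one "owner" thread per warp into a shared array without a race, by replacing plain conditional writes with an atomic minimum. The block shape and the per-thread condition are placeholders, not taken from the question.

    #include <cuda_runtime.h>
    #include <cstdio>

    #define N_WARPS 8   // hypothetical block shape: 8 warps (256 threads) per block

    __global__ void electOwner(int* result) {
        __shared__ int owner[N_WARPS];
        int warp = threadIdx.x / warpSize;
        bool condition = (threadIdx.x % 3 == 0);              // placeholder per-thread condition

        if (threadIdx.x < N_WARPS) owner[threadIdx.x] = blockDim.x + 1;   // "no owner yet"
        __syncthreads();

        if (condition)
            atomicMin(&owner[warp], (int)threadIdx.x);        // race-free election
        __syncthreads();

        if ((int)threadIdx.x == owner[warp])
            result[warp] = threadIdx.x;                       // only the elected thread writes
    }

    int main() {
        int* result;
        cudaMalloc(&result, N_WARPS * sizeof(int));
        electOwner<<<1, N_WARPS * 32>>>(result);
        int host[N_WARPS];
        cudaMemcpy(host, result, sizeof(host), cudaMemcpyDeviceToHost);
        for (int i = 0; i < N_WARPS; ++i) std::printf("warp %d owner: %d\n", i, host[i]);
        cudaFree(result);
        return 0;
    }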
1 vote
0 answers
1k views

I am trying to set a constraint so that my job would only run on GPUs with compute capability higher (or equal) to 7. Here is my script named torch_gpu_sanity_venv385-11.slurm: #!/bin/bash #SBATCH --...
Mona Jalal • 38.9k
7 votes
4 answers
25k views

Newer versions of CMake (3.18 and later), are "aware" of the choice of CUDA architectures which compilation of CUDA code targets. Targets have a CUDA_ARCHITECTURES property, which, when set, ...
einpoklum • 138k
1 vote
1 answer
864 views

These are my assumptions: There are two types of loads, cached and uncached. In the first one, the traffic goes through L1 and L2, while in the second one, the traffic goes only through L2. The ...
rm95 • 317
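As a small illustration of the two load paths described in the assumptions above, the sketch below contrasts a regular global load with a load through the read-only data cache via __ldg() (available from compute capability 3.5). Whether regular global loads are additionally cached in L1 depends on the architecture and on the -Xptxas -dlcm=ca|cg compiler setting.

    #include <cuda_runtime.h>

    __global__ void copyAdd(const float* __restrict__ in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float a = in[i];           // regular global load (L2, plus L1 depending on settings)
            float b = __ldg(&in[i]);   // load via the read-only data cache
            out[i] = a + b;
        }
    }

    int main() {
        const int n = 1024;
        float *in, *out;
        cudaMalloc(&in, n * sizeof(float));
        cudaMalloc(&out, n * sizeof(float));
        cudaMemset(in, 0, n * sizeof(float));
        copyAdd<<<(n + 255) / 256, 256>>>(in, out, n);
        cudaDeviceSynchronize();
        cudaFree(in); cudaFree(out);
        return 0;
    }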
0 votes
2 answers
1k views

I need to compile CUDA code that uses dynamic parallelism with CMake. The code is: #include <stdio.h> __global__ void childKernel() { printf("Hello "); } __global__ void ...
Gianluca Brilli
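A completed sketch of the truncated snippet above (the kernel names come from the excerpt; the rest is assumed): a parent kernel that launches a child kernel from device code. Building it requires relocatable device code and the device runtime library, e.g. nvcc -rdc=true ... -lcudadevrt, or CUDA_SEPARABLE_COMPILATION ON in CMake.

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void childKernel() {
        printf("Hello ");
    }

    __global__ void parentKernel() {
        childKernel<<<1, 1>>>();      // device-side launch (dynamic parallelism)
        cudaDeviceSynchronize();      // classic CDP pattern; removed from device code in CUDA 12
        printf("World!\n");
    }

    int main() {
        parentKernel<<<1, 1>>>();
        cudaDeviceSynchronize();
        return 0;
    }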
1 vote
2 answers
2k views

So I found this Wikipedia resource, Maximum number of resident grids per device (Concurrent Kernel Execution), and for each compute capability it lists a number of concurrent kernels, which I assume ...
user2255757
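Whatever the exact resident-grid limit for a given compute capability, kernels only have a chance to overlap when they are launched into different non-default streams and enough SM resources remain free. A minimal sketch (stream count and kernel body are arbitrary placeholders):

    #include <cuda_runtime.h>

    __global__ void busy(float* p) {
        float v = p[threadIdx.x];
        for (int i = 0; i < 10000; ++i) v = v * 1.0001f + 0.5f;   // just burn some cycles
        p[threadIdx.x] = v;
    }

    int main() {
        const int kStreams = 4;                 // illustrative stream count
        float* buf;
        cudaMalloc(&buf, kStreams * 32 * sizeof(float));
        cudaMemset(buf, 0, kStreams * 32 * sizeof(float));
        cudaStream_t streams[kStreams];
        for (int i = 0; i < kStreams; ++i) cudaStreamCreate(&streams[i]);
        for (int i = 0; i < kStreams; ++i)
            busy<<<1, 32, 0, streams[i]>>>(buf + i * 32);   // candidates for concurrent execution
        cudaDeviceSynchronize();
        for (int i = 0; i < kStreams; ++i) cudaStreamDestroy(streams[i]);
        cudaFree(buf);
        return 0;
    }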
27 votes
3 answers
26k views

Suppose I have a system with a single GPU installed, and suppose I've also installed a recent version of CUDA. I want to determine what's the compute capability of my GPU. If I could compile code, ...
einpoklum • 138k
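For reference, the compile-and-run route that the question mentions as a baseline (the question itself is looking for a way that avoids compiling anything) would look roughly like this sketch, assuming device 0:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int major = 0, minor = 0;
        // Query the compute capability of device 0 directly from the runtime.
        cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, 0);
        cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, 0);
        std::printf("Compute capability of device 0: %d.%d\n", major, minor);
        return 0;
    }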
0 votes
2 answers
4k views

I have TensorFlow installed with CUDA 7.5 and cuDNN 5.0. My graphics card is an NVIDIA GeForce 820M with compute capability 2.1. However, I get this error: Ignoring visible gpu device (device: 0, name: GeForce ...
Keerthana Gopalakrishnan
