Questions tagged [cuda]
CUDA is a parallel computing platform and programming model for Nvidia GPUs (Graphics Processing Units). CUDA provides an interface to Nvidia GPUs through a variety of programming languages, libraries, and APIs.
55 questions
2 votes
1 answer
90 views
RAII Wrapper For CUDA Pointers
I was recently working on my CUDA wrappers library, and this particular class is one of the oldest pieces of code in the entire project. Since that time, I added tons of other features (for example <...
12 votes
1 answer
799 views
Strongly-typed CUDA device memory
When I discovered that CUDA device memory was represented by plain old void* I was horrified by having to deal with C-style type safety and resource ownership (i.e. ...
7 votes
1 answer
265 views
RAII Wrapper For Registering/Mapping CUDA Resources
I've implemented a resource management class for CUDA interop using RAII to ensure exception safety. The goal is to handle the registration/unregistration and mapping/unmapping of graphics resources (...
1 vote
0 answers
94 views
Sphere Generation System With CUDA-OpenGL Interop
This is a follow-up to my previous question; this one focuses more on the actual tessellation pipeline. What I changed from the previous question: implemented the async sphere ...
1 vote
0 answers
67 views
CUDA Sphere Tessellation With Support For LOD
I was working on my version of "Universe Sandbox", and the first thought that comes to mind is "where the hell are my planets?" I thought loading models sucks, so I made this thing. It'...
8 votes
1 answer
291 views
CUDA/NVRTC context switching function
I've implemented a feature in my C++ fractal explorer application to switch between CUDA and NVRTC. The main reason for the NVRTC/Driver API context is to support runtime compilation of custom CUDA ...
15 votes
1 answer
2k views
CUDA Mandelbrot Kernel
I'm looking for feedback and suggestions on improving the performance and quality of my CUDA kernel for rendering the Mandelbrot set. I've implemented a "ping-pong" style coloring and ...
3 votes
1 answer
103 views
Tracking total iterations in CUDA fractal renderer
I'm developing a fractal renderer in CUDA and need advice on tracking the total number of iterations performed during rendering. This is important for real-time dragging and zooming performance. ...
6 votes
0 answers
168 views
Fractal Rendering on GPU with CUDA
I am writing a fractal renderer using CUDA, SFML, and C++. I recently optimized it to use less memory, and now I am going to optimize the actual fractals because, for some reason, that is what is most holding back ...
2 votes
1 answer
85 views
I have a PyTorch module that takes in some parameters and predicts the difference between one of its inputs and the target
One instance of the following module uses up to almost 75% of my VRAM, so I was wondering how I could reduce that without slowing down runtime too much. The code is below: ...
3 votes
1 answer
129 views
PyTorch code running slow for Deep Q-learning (Reinforcement Learning)
I'm a new student in reinforcement learning. Below is the code that I wrote for deep Q-learning: ...
1 vote
0 answers
252 views
A CUDA kernel for a matrix product as outer product vectors
To multiply the matrices A and B using the outer product of vectors, we can express each column of matrix A as a column vector and each row of matrix B as a row vector. Then, we can take the outer ...
2 votes
1 answer
173 views
Applying cointegration function from statsmodels on a large dataframe
I need to apply the coint function from the statsmodels library pairwise to 207 time series with 1397 points each. Currently, it takes between 35 and 40 minutes on my computer with an Intel 24 Cores ...
5 votes
3 answers
237 views
Summation over different determinants that are independently computed using CUDA
Do you have any suggestions for improving the efficiency of the code below? I believe that better optimization can be implemented in the GPU function cuKer_sum, which is located in the ...
5 votes
1 answer
223 views
CUDA kernel to compare pairs of matrices
My first time writing anything significant in CUDA. This kernel takes two arrays representing square matrices and compares them pair-wise. It takes into consideration large input arrays, and ...