Profiling with NVIDIA Nsight Compute
So far we have worked in a lab-like environment: the programs in this book are small, and most of them have a single kernel that illustrates the technique we are learning. Real-world programs are different. Many distinct functions interact, each passing its results on to the next, to produce what our users need. In such scenarios, profiling tools become essential for identifying the hot spots where we should concentrate our optimization efforts.
Optimizing code without proper information can lead us to spend too much time on a piece of code that is rarely executed. Blind optimization can go wrong in many ways, but a common outcome is that after applying various optimizations we see no measurable improvement at all. This is why it is important to use tools that show us where the bottlenecks really are.
Configuring access to GPU performance counters
We need to make sure...