Summary
In this chapter we learned how to use a new tool in our IDE that lets us debug our code visually. It is worth becoming proficient with, because there will be many times when you must find the proverbial needle in the haystack. After that, we returned to a topic first introduced in Chapter 5, this time exploring many aspects of CUDA streams as well as the impact of data transfer size on performance. We learned how chunk size, transfer size and the number of data partitions relate to one another.
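As a reminder of how these three quantities can fit together, here is a minimal host-side sketch. It assumes the data is split into equal-sized chunks with one transfer per partition, which may differ from the exact scheme used in the chapter's examples. The helper names `num_partitions`, `partition_elems` and `transfer_bytes` are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: number of partitions needed to cover total_elems
// when each partition holds at most chunk_elems elements.
// Assumes chunk_elems > 0. Uses ceiling division so a partial last chunk counts.
constexpr std::size_t num_partitions(std::size_t total_elems,
                                     std::size_t chunk_elems) {
    return (total_elems + chunk_elems - 1) / chunk_elems;
}

// Hypothetical helper: number of elements in partition `index`.
// Every partition is full except possibly the last one, which holds the
// remainder; indices past the end of the data yield 0.
constexpr std::size_t partition_elems(std::size_t total_elems,
                                      std::size_t chunk_elems,
                                      std::size_t index) {
    return index * chunk_elems >= total_elems
               ? 0
               : (total_elems - index * chunk_elems < chunk_elems
                      ? total_elems - index * chunk_elems
                      : chunk_elems);
}

// Hypothetical helper: size in bytes of one transfer, given the number of
// elements in the partition and the size of one element.
constexpr std::size_t transfer_bytes(std::size_t elems, std::size_t elem_size) {
    return elems * elem_size;
}
```

For example, 1000 elements split into chunks of 256 give 4 partitions. With 4-byte elements, the first three transfers are 1024 bytes each and the last one is 928 bytes. A smaller chunk size means more partitions and more, smaller transfers, which is the trade-off whose performance impact we measured.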
Using the profiler introduced in Chapter 7, we were able to visualize the overlap of memory transfers and computation. This not only shows that the new technique really works, but also gives us one more use for NVIDIA Nsight Compute, which we learned about previously.
We concluded the chapter by discussing the use of multiple GPUs, even though this is a less common setup.
In the next chapter we will learn how to expose our code...