Understanding the thread, block, and grid concepts
In Chapter 4 we finally executed truly parallel CUDA programs, and we saw that a launch configuration is required when invoking a kernel. More than that, this launch configuration is part of the program itself. Looking more closely at the vector addition program, the CPU version uses a for loop, whereas the GPU version replaces that loop with a launch configuration. We therefore need to discuss the concepts that determine how execution takes place. Threads, blocks, and grids work together, but let’s get to know them individually first.
Threads
A thread is the basic unit of execution in CUDA. Each thread has its own unique ID, which allows it to independently access a specific portion of the data. For a given problem, we define enough threads to cover the data that must be processed. Thread IDs can be one-dimensional, two-dimensional, or three-dimensional (threadIdx.x, threadIdx.y, threadIdx.z), depending on how the data is organized.
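To make the one-dimensional case concrete, here is a minimal sketch of the vector addition kernel discussed above, in which each thread uses threadIdx.x to pick out the single element it is responsible for. The kernel name vecAdd and the use of unified (managed) memory are assumptions for illustration; the chapter's actual example may differ in these details.

```cuda
#include <cstdio>

// Each thread adds exactly one pair of elements. With a single block of
// n threads, threadIdx.x indexes the data directly.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = threadIdx.x;   // one-dimensional thread ID within the block
    if (i < n)             // guard in case more threads than elements exist
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 256;
    float *a, *b, *c;
    // Unified memory is accessible from both host and device (assumption).
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = i; b[i] = 2 * i; }

    vecAdd<<<1, n>>>(a, b, c, n);  // launch configuration: 1 block, n threads
    cudaDeviceSynchronize();       // wait for the kernel to finish

    printf("c[10] = %.0f\n", c[10]);  // 10 + 20 = 30
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Note how the launch configuration `<<<1, n>>>` replaces the CPU version's for loop: instead of one thread iterating n times, n threads each run the kernel body once.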