I'm working on a project where I need my CUDA device to make computations on a struct containing pointers.

    typedef struct StructA {
        int* arr;
    } StructA;

When I allocate memory for the struct and then copy it to the device, only the struct itself is copied, not the data its pointer refers to. Right now I work around this by allocating the device array first, then setting the host struct to use that new pointer (which resides on the GPU). The following code sample describes this approach using the struct from above:

    #define N 10

    int main() {
        int h_arr[N] = {1,2,3,4,5,6,7,8,9,10};
        StructA *h_a = (StructA*)malloc(sizeof(StructA));
        StructA *d_a;
        int *d_arr;

        // 1. Allocate device struct.
        cudaMalloc((void**) &d_a, sizeof(StructA));

        // 2. Allocate device pointer.
        cudaMalloc((void**) &(d_arr), sizeof(int)*N);

        // 3. Copy pointer content from host to device.
        cudaMemcpy(d_arr, h_arr, sizeof(int)*N, cudaMemcpyHostToDevice);

        // 4. Point to device pointer in host struct.
        h_a->arr = d_arr;

        // 5. Copy struct from host to device.
        cudaMemcpy(d_a, h_a, sizeof(StructA), cudaMemcpyHostToDevice);

        // 6. Call kernel.
        kernel<<<N,1>>>(d_a);

        // 7. Copy struct from device to host.
        cudaMemcpy(h_a, d_a, sizeof(StructA), cudaMemcpyDeviceToHost);

        // 8. Copy pointer from device to host.
        cudaMemcpy(h_arr, d_arr, sizeof(int)*N, cudaMemcpyDeviceToHost);

        // 9. Point to host pointer in host struct.
        h_a->arr = h_arr;
    }

My question is: is this the way to do it?
It seems like an awful lot of work, and I remind you that this is a very simple struct. If my struct contained a lot of pointers or structs with pointers themselves, the code for allocation and copy will be quite extensive and confusing.
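The kernel itself is not shown in the sample. For illustration only, here is a minimal sketch of what it could look like, assuming it simply doubles each element in place. The kernel body and the `launch_kernel` wrapper are hypothetical; only the `<<<N,1>>>` launch shape (N blocks of one thread) is taken from the code above.

```cuda
#include <cuda_runtime.h>

typedef struct StructA { int* arr; } StructA;

// Hypothetical kernel body (assumption: doubles each element in place).
// With a <<<n,1>>> launch every block has exactly one thread, so the
// block index alone identifies the element.
__global__ void kernel(StructA* a)
{
    int i = blockIdx.x;
    a->arr[i] *= 2;   // a->arr is dereferenced on the GPU, so it must be a device pointer
}

// Hypothetical wrapper: launches the kernel and reports launch or
// execution errors instead of ignoring them.
cudaError_t launch_kernel(StructA* d_a, int n)
{
    kernel<<<n, 1>>>(d_a);
    cudaError_t err = cudaGetLastError();   // catches invalid launch configurations
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();         // waits for completion, surfaces runtime faults
}
```

The dereference of `a->arr` inside the kernel is the reason step 4 is needed: whatever address the struct carries when it is copied to the device is the address the GPU will read from.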
h_a is (or should be) an "image" of the device structure held in host memory. Assigning it to hold a pointer in host memory is probably some combination of bad practice/wrong/device memory leak depending on what your true intentions are. After you have copied the contents of d_a back to h_a you have "come full circle" and are back where you started from.

[...] h_a to d_arr (step 4). So when I copy the data back, I also have to set the pointer in h_a to the array I just copied it to. I agree that step 7 is redundant in my example above because there is no other information held in the struct, but if there were, that step wouldn't be redundant. Or am I completely mistaken?
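One way to reconcile the two points of view is to keep the "image" and the real host struct as separate objects, so the host struct never has to hold a device pointer and nothing needs to be patched back afterwards. The sketch below is only an illustration under that assumption: the helper names `structA_to_device`, `structA_from_device`, and `structA_free_device` are hypothetical, and only `cudaMalloc`, `cudaMemcpy`, and `cudaFree` come from the CUDA runtime.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

typedef struct StructA { int* arr; } StructA;

// Hypothetical helper: deep-copies a host StructA holding n ints to the
// device. The host struct is left untouched; a local "image" carries the
// device pointer instead. Returns the device struct, or NULL on failure.
StructA* structA_to_device(const StructA* h_src, int n)
{
    StructA image;                 // host-side image of the device struct
    StructA* d_a = NULL;
    size_t bytes = sizeof(int) * (size_t)n;

    image.arr = NULL;
    if (cudaMalloc((void**)&image.arr, bytes) != cudaSuccess)
        return NULL;
    if (cudaMalloc((void**)&d_a, sizeof(StructA)) != cudaSuccess) {
        cudaFree(image.arr);
        return NULL;
    }
    if (cudaMemcpy(image.arr, h_src->arr, bytes, cudaMemcpyHostToDevice) != cudaSuccess ||
        cudaMemcpy(d_a, &image, sizeof(StructA), cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(image.arr);
        cudaFree(d_a);
        return NULL;
    }
    return d_a;
}

// Hypothetical helper: copies the results back into the host struct's own
// buffer. The device struct is read into a local image, so h_dst->arr keeps
// pointing at host memory the whole time.
cudaError_t structA_from_device(StructA* h_dst, const StructA* d_a, int n)
{
    StructA image;
    cudaError_t err = cudaMemcpy(&image, d_a, sizeof(StructA), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) return err;
    // Non-pointer fields (StructA has none) would be copied from image to h_dst here.
    return cudaMemcpy(h_dst->arr, image.arr, sizeof(int) * (size_t)n, cudaMemcpyDeviceToHost);
}

// Hypothetical helper: releases the device array and the device struct.
void structA_free_device(StructA* d_a)
{
    StructA image;
    if (d_a == NULL) return;
    if (cudaMemcpy(&image, d_a, sizeof(StructA), cudaMemcpyDeviceToHost) == cudaSuccess)
        cudaFree(image.arr);
    cudaFree(d_a);
}
```

With helpers like these, the nine steps collapse into a call to `structA_to_device`, the kernel launch, a call to `structA_from_device`, and a final `structA_free_device`. A struct with more pointers would still need one allocation and one copy per pointer, but that bookkeeping would live in one place instead of being repeated at every call site.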