cudaFree is not freeing memory

Question

The code below calculates the dot product of two vectors a and b. The correct result is 8192. When I run it for the first time the result is correct. Then when I run it for the second time the result is the previous result + 8192 and so on:

1st iteration: result = 8192
2nd iteration: result = 8192 + 8192
3rd iteration: result = 8192 + 8192 + 8192
and so on.

I checked by printing it to the screen, and the device variable dev_c is not freed. What's more, writing to it produces something like a sum: the result is the previous value plus the new one being written. I guess that could have something to do with the atomicAdd() operation, but cudaFree(dev_c) should erase it anyway.

#define N 8192
#define THREADS_PER_BLOCK 512
#define NUMBER_OF_BLOCKS (N/THREADS_PER_BLOCK)
#include <stdio.h>

__global__ void dot( int *a, int *b, int *c )
{
    __shared__ int temp[THREADS_PER_BLOCK];
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    temp[threadIdx.x] = a[index] * b[index];
    __syncthreads();

    if( 0 == threadIdx.x ) {
        int sum = 0;
        for( int i = 0; i < THREADS_PER_BLOCK; i++ ) {
            sum += temp[i];
        }
        atomicAdd(c, sum);
    }
}

int main( void )
{
    int *a, *b, *c;
    int *dev_a, *dev_b, *dev_c;
    int size = N * sizeof( int );

    cudaMalloc( (void**)&dev_a, size );
    cudaMalloc( (void**)&dev_b, size );
    cudaMalloc( (void**)&dev_c, sizeof(int) );

    a = (int*)malloc(size);
    b = (int*)malloc(size);
    c = (int*)malloc(sizeof(int));

    for( int i = 0; i < N; i++ ) {
        a[i] = 1;
        b[i] = 1;
    }

    cudaMemcpy( dev_a, a, size, cudaMemcpyHostToDevice );
    cudaMemcpy( dev_b, b, size, cudaMemcpyHostToDevice );

    dot<<< N/THREADS_PER_BLOCK, THREADS_PER_BLOCK >>>( dev_a, dev_b, dev_c );

    cudaMemcpy( c, dev_c, sizeof(int), cudaMemcpyDeviceToHost );

    printf("Dot product = %d\n", *c);

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);

    free(a);
    free(b);
    free(c);

    return 0;
}

Community · Accepted Answer · 2017-05-23 11:52:04Z

12

cudaFree doesn't erase anything; it simply returns memory to a pool to be re-allocated. cudaMalloc doesn't guarantee the value of memory that has been allocated. You need to initialize memory (both global and shared) that your program uses in order to get consistent results. The same is true for malloc and free, by the way.
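One way to apply this to the code in the question is to zero dev_c with cudaMemset before launching the kernel. The sketch below is only an illustration: the helper name dot_product_ones is hypothetical (it is not in the original post), the kernel is the one from the question, and error checking of the CUDA calls is left out for brevity.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 8192
#define THREADS_PER_BLOCK 512

// Kernel from the question: each block sums its products in shared memory,
// then thread 0 adds the block's partial sum to *c with atomicAdd.
__global__ void dot(const int *a, const int *b, int *c)
{
    __shared__ int temp[THREADS_PER_BLOCK];
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    temp[threadIdx.x] = a[index] * b[index];
    __syncthreads();

    if (threadIdx.x == 0) {
        int sum = 0;
        for (int i = 0; i < THREADS_PER_BLOCK; i++) {
            sum += temp[i];
        }
        atomicAdd(c, sum);
    }
}

// Hypothetical helper (not in the original post): computes the dot product
// of two all-ones vectors of length N on the device and returns it.
int dot_product_ones(void)
{
    int size = N * sizeof(int);
    int *a = (int *)malloc(size);
    int *b = (int *)malloc(size);
    int c = -1;
    int *dev_a, *dev_b, *dev_c;

    for (int i = 0; i < N; i++) {
        a[i] = 1;
        b[i] = 1;
    }

    cudaMalloc((void **)&dev_a, size);
    cudaMalloc((void **)&dev_b, size);
    cudaMalloc((void **)&dev_c, sizeof(int));

    cudaMemcpy(dev_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, size, cudaMemcpyHostToDevice);

    // The fix: cudaMalloc does not clear memory, so zero the accumulator
    // explicitly before any block adds to it.
    cudaMemset(dev_c, 0, sizeof(int));

    dot<<<N / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(dev_a, dev_b, dev_c);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    free(a);
    free(b);

    return c;
}

int main(void)
{
    // Repeated runs reuse freed device memory; the result should not grow.
    for (int run = 0; run < 3; run++) {
        printf("Dot product = %d\n", dot_product_ones());
    }
    return 0;
}
```

With the cudaMemset in place, every run starts the accumulation from zero, so the result no longer depends on whatever the previous allocation left behind.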

edited May 23, 2017 at 11:52

CommunityBot

11 silver badge

answered Oct 27, 2012 at 13:39

Robert Crovella

154k12 gold badges254 silver badges300 bronze badges


4 Comments

ZviBar Over a year ago

I initialized the shared variable within the kernel and it works like a charm. Thanks!

Rajind Ruparathna Over a year ago

@ZviBar I am having the same issue. Can you please tell me how you initialized it within the kernel? Doesn't every thread reset that value if we initialize it inside the kernel? Thank you in advance.

ZviBar Over a year ago

@RajindRuparathna that was over three years ago and I haven't touched Cuda since then. Sorry.

Rajind Ruparathna Over a year ago

Doing a cudaMemcpy using an initialized array worked for me. cudaMemcpy(dev_c, c, size,cudaMemcpyHostToDevice); where c is an array initialized to zero using a for loop.
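The host-to-device copy described in the last comment might look roughly like the sketch below. It is a reduced example rather than the commenter's actual code: the kernel add_one, the helper count_threads, and the launch dimensions are all made up for illustration. It also assumes the number of bytes copied matches the allocation, which is sizeof(int) for a single accumulator.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define BLOCKS 4
#define THREADS 256

// Illustrative kernel: every thread adds 1 to the accumulator.
__global__ void add_one(int *c)
{
    atomicAdd(c, 1);
}

// Hypothetical helper: returns the number of threads that ran, counted
// in an accumulator that was zeroed by copying a host value to the device.
int count_threads(void)
{
    int c = 0;      // host value, explicitly initialized to zero
    int *dev_c;

    cudaMalloc((void **)&dev_c, sizeof(int));

    // Initialize the device accumulator from the zeroed host value.
    // The byte count must not exceed the size of the dev_c allocation.
    cudaMemcpy(dev_c, &c, sizeof(int), cudaMemcpyHostToDevice);

    add_one<<<BLOCKS, THREADS>>>(dev_c);

    cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev_c);

    return c;
}

int main(void)
{
    printf("Threads counted = %d\n", count_threads());
    return 0;
}
```

For a plain zero fill, cudaMemset does the same job without needing a host buffer; the copy is more useful when the initial value is not a repeated byte pattern.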

Joachim Isaksson · Accepted Answer · 2012-10-27 13:39:28Z

From the documentation of cudaMalloc():

The memory is not cleared.

That means that dev_c is not initialized, and your atomicAdd(c,sum); will add to any random value that happens to be stored in memory at the returned position.
