1

cuda-memcheck has detected a race condition in the code that does the following:

condition = /*different in each thread*/; shared int owner[nWarps]; /* ... owner[i] is initialized to blockDim.x+1 */ if(condition) { owner[threadIdx.x/32] = threadIdx.x; } 

So basically this code computes the owner thread for each warp based on some condition. For some warp there could be no owner, but for some the number of owners can be more than 1, and then a race condition happens because multiple threads assign a value to the same shared memory region.

After trying the docs, I think what I need can be done with:

const uint32_t mask = __ballot_sync(0xffffffff, condition); if(mask != 0) { const unsigned max_owner = __reduce_max_sync(mask, threadIdx.x); if(threadIdx.x == max_owner) { // at most 1 thread assigns here per warp owner[threadIdx.x/32] = max_owner; } } 

However, my attempt has 2 issues:

  1. I don't really need to find the max thread - it's enough to select any 1 thread for each warp if there is a thread with condition==true
  2. It requires CUDA compute capability 8.x, while I need to support devices of 5.2 compute capability

Could you please help me solve the above issues?

2
  • Why is letting the race condition decide, not suitable? If you declare the shared memory volatile and synchronize the warp, you can read back and know, who the owner is. Commented Oct 17, 2021 at 8:42
  • @Sebastian , cuda-memcheck --tool racecheck complains about such code with level ERROR. Commented Oct 17, 2021 at 18:26

2 Answers 2

1

The following function seems to solve the problem:

void SetOwnerThread(int* dest, const bool condition) { const uint32_t mask = __ballot_sync(0xffffffff, condition); if(!mask) { return; } const uint32_t lowest_bit = mask & -mask; const uint32_t my_bit = (1 << (threadIdx.x & 31)); if(lowest_bit == my_bit) { dest = threadIdx.x; } } 
Sign up to request clarification or add additional context in comments.

Comments

1

No, this is not the right way. You should be using __ffs``. __ffs` will tell you which is the first set bit, where 1 is the LSB and 32 is the MSB. 0 means no bits set.

const uint32_t min_owner_plus_1 = __ffs(__ballot_sync(0xffffffff, condition)); if (min_owner_plus_1 != 0) { const unsigned min_owner = min_owner_plus_1 - 1; if (threadIdx.x == min_owner) { // at most 1 thread assigns here per warp owner[threadIdx.x/32] = min_owner; } } This will only use a single warp sync operation instead of 2. 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.