Skip to main content
deleted 9 characters in body
Source Link
Loki Astari
  • 97.7k
  • 5
  • 126
  • 341

Expanding on comments:

I would reverse the splitting. I would split the writable memory across the threads so each thread wrote to a unique section of memory (it would be good if this were divided by page size as it may help performance). We will call the section of memory that a thread writes to a page.

Then each thread will loop over all the primes and remove them from the page associated with the thread. Using the original code as a starting point it would look like this:

std::size_t sectionSizePerThreadpageSizePerThread = static_cast<std::size_t>(std::ceil( static_cast<float>(n) / // Size of write memory. static_cast<float>(std::thread::hardware_concurrency()) for (std::size_t first = 0u; first < n; first += sectionSizePerThreadpageSizePerThread) { auto last = std::min(first+sectionSizePerThreadfirst+pageSizePerThread, n); // Spawn a thread to strike out the multiples // of the prime numbers corresponding to the // elements of primes between first and last threads.emplace_back( [&primes, &is_prime](Integer begin, Integer end) { for (auto prime: primes) { Integer first = begin/prime; first += (first < begin) ? prime : 0; for (Integer n = first ; n < end ; n += prime) { is_prime[n].store(false, std::memory_order_relaxed); } } }, first, last); } 

Note: It does not matter if read memory is_prime is shared across all the threads as each thread can have its own copy in the local cache and not be affected by other threads (as it is read only).

Expanding on comments:

I would reverse the splitting. I would split the writable memory across the threads so each thread wrote to a unique section of memory (it would be good if this were divided by page size as it may help performance). We will call the section of memory that a thread writes to a page.

Then each thread will loop over all the primes and remove them from the page associated with the thread. Using the original code as a starting point it would look like this:

std::size_t sectionSizePerThread = static_cast<std::size_t>(std::ceil( static_cast<float>(n) / // Size of write memory. static_cast<float>(std::thread::hardware_concurrency()) for (std::size_t first = 0u; first < n; first += sectionSizePerThread) { auto last = std::min(first+sectionSizePerThread, n); // Spawn a thread to strike out the multiples // of the prime numbers corresponding to the // elements of primes between first and last threads.emplace_back( [&primes, &is_prime](Integer begin, Integer end) { for (auto prime: primes) { Integer first = begin/prime; first += (first < begin) ? prime : 0; for (Integer n = first ; n < end ; n += prime) { is_prime[n].store(false, std::memory_order_relaxed); } } }, first, last); } 

Note: It does not matter if read memory is_prime is shared across all the threads as each thread can have its own copy in the local cache and not be affected by other threads (as it is read only).

Expanding on comments:

I would reverse the splitting. I would split the writable memory across the threads so each thread wrote to a unique section of memory (it would be good if this were divided by page size as it may help performance). We will call the section of memory that a thread writes to a page.

Then each thread will loop over all the primes and remove them from the page associated with the thread. Using the original code as a starting point it would look like this:

std::size_t pageSizePerThread = static_cast<std::size_t>(std::ceil( static_cast<float>(n) / // Size of write memory. static_cast<float>(std::thread::hardware_concurrency()) for (std::size_t first = 0u; first < n; first += pageSizePerThread) { auto last = std::min(first+pageSizePerThread, n); // Spawn a thread to strike out the multiples // of the prime numbers corresponding to the // elements of primes between first and last threads.emplace_back( [&primes, &is_prime](Integer begin, Integer end) { for (auto prime: primes) { Integer first = begin/prime; first += (first < begin) ? prime : 0; for (Integer n = first ; n < end ; n += prime) { is_prime[n].store(false, std::memory_order_relaxed); } } }, first, last); } 

Note: It does not matter if read memory is_prime is shared across all the threads as each thread can have its own copy in the local cache and not be affected by other threads (as it is read only).

Source Link
Loki Astari
  • 97.7k
  • 5
  • 126
  • 341

Expanding on comments:

I would reverse the splitting. I would split the writable memory across the threads so each thread wrote to a unique section of memory (it would be good if this were divided by page size as it may help performance). We will call the section of memory that a thread writes to a page.

Then each thread will loop over all the primes and remove them from the page associated with the thread. Using the original code as a starting point it would look like this:

std::size_t sectionSizePerThread = static_cast<std::size_t>(std::ceil( static_cast<float>(n) / // Size of write memory. static_cast<float>(std::thread::hardware_concurrency()) for (std::size_t first = 0u; first < n; first += sectionSizePerThread) { auto last = std::min(first+sectionSizePerThread, n); // Spawn a thread to strike out the multiples // of the prime numbers corresponding to the // elements of primes between first and last threads.emplace_back( [&primes, &is_prime](Integer begin, Integer end) { for (auto prime: primes) { Integer first = begin/prime; first += (first < begin) ? prime : 0; for (Integer n = first ; n < end ; n += prime) { is_prime[n].store(false, std::memory_order_relaxed); } } }, first, last); } 

Note: It does not matter if read memory is_prime is shared across all the threads as each thread can have its own copy in the local cache and not be affected by other threads (as it is read only).