
I have limited experience with multithreading, and I'm currently looking at the PyTorch code, where a for loop is parallelized using their custom parallel_for implementation (similar constructs exist in other codebases and in C++ itself):

https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp#L2747

My question is: why is it parallelizing over the number of threads? In most cases where I see a for loop parallelized, it divides the domain (e.g., the indices of an array), but here the loop range is the thread count itself. Is this a standard way of multithreading?
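For reference, the pattern looks roughly like this (my simplified reading of it, not the exact kernel at the link; process_all is just a hypothetical wrapper):

```cpp
#include <ATen/Parallel.h>

void process_all(int64_t n) {
  const int64_t num_threads = at::get_num_threads();
  // The parallel range is [0, num_threads), not [0, n):
  at::parallel_for(0, num_threads, 1, [&](int64_t begin, int64_t end) {
    for (int64_t t = begin; t < end; ++t) {
      // Each "iteration" t then derives its own slice of the real domain.
      const int64_t lo = t * n / num_threads;
      const int64_t hi = (t + 1) * n / num_threads;
      for (int64_t i = lo; i < hi; ++i) {
        // ... process item i ...
      }
    }
  });
}
```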

1 Answer


Say you want a parallel_for loop over 4000 items, and you have 2 CPUs (threads) available. You can choose an arbitrary domain size of 1000. Each thread then needs to process 2 of those domains: you've factored the problem into 2*2*1000.
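A minimal standalone sketch of that factoring, using plain std::thread rather than ATen's parallel_for (process_item is a hypothetical stand-in for the real loop body):

```cpp
#include <algorithm>
#include <cstdint>
#include <thread>
#include <vector>

void process_item(int64_t) { /* stand-in for the real work */ }

int main() {
  const int64_t n = 4000;        // total items
  const int64_t domain = 1000;   // arbitrary domain size -> 4 domains
  const int num_threads = 2;
  const int64_t num_domains = (n + domain - 1) / domain;

  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([=] {
      // Thread t takes domains t, t + num_threads, ...: 2 domains each here.
      for (int64_t d = t; d < num_domains; d += num_threads) {
        const int64_t begin = d * domain;
        const int64_t end = std::min(begin + domain, n);
        for (int64_t i = begin; i < end; ++i) process_item(i);
      }
    });
  }
  for (auto& w : workers) w.join();
}
```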

If you don't choose an arbitrary domain size, but let the thread count set it, you factor the problem into 2*2000. This is a bit simpler, and there's less per-domain scheduling overhead: each thread gets a single domain.
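The same job with the thread count setting the domain size, so each thread owns exactly one contiguous slice (same assumptions as above; process_item is hypothetical):

```cpp
#include <cstdint>
#include <thread>
#include <vector>

void process_item(int64_t) { /* stand-in for the real work */ }

int main() {
  const int64_t n = 4000;
  const int num_threads = 2;  // domain size becomes n / num_threads = 2000

  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([=] {
      // Thread t owns one contiguous domain: [t*n/T, (t+1)*n/T).
      const int64_t begin = t * n / num_threads;
      const int64_t end = (t + 1) * n / num_threads;
      for (int64_t i = begin; i < end; ++i) process_item(i);
    });
  }
  for (auto& w : workers) w.join();
}
```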


2 Comments

ah I see, that makes sense. How do you decide which approach to use? Is it usually obvious, or do you need to benchmark empirically to see which one might be better for a given use case?
@westcoaststudent: Empirically? You start with the easiest solution, which is a plain old for. Usually that's good enough. The next step is to just choose the parallel_for form that's easiest to use. PyTorch really is exceptional software; most programmers will never write software that is as widely used. There's no point spending one day to make software run 1 second faster, unless that software is run at least thousands of times.
