I am working on fractal rendering software. The basic setup is that I have a big 2-dimensional array (the picture), in which values are incremented.
The simple rendering process is:

    while (iteration < maxIteration) { update some pixel in the array; }

This is stupidly simple to parallelize: just have several threads do this simultaneously. Each thread will (very likely) be working on a different pixel at any given moment, and even if there is an update collision in the array, that is fine. The array is shared among the threads!
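To make this concrete, here is a stripped-down sketch of the structure as described above (class and method names are made up, and the pixel choice is faked with a random number generator; the real code is linked in the edit below):

    import java.util.concurrent.ThreadLocalRandom;

    // Simplified sketch, not the real renderer: N workers share one big array
    // and one iteration counter, as described in the question.
    public class RendererSketch {
        static final int WIDTH = 1920, HEIGHT = 1080, CHANNELS = 4;
        static final long MAX_ITERATION = 100_000_000L;

        static final long[] picture = new long[WIDTH * HEIGHT * CHANNELS]; // shared, unsynchronized
        static volatile long currentIteration = 0;                         // shared counter

        // How the counter is bumped today: a synchronized increment.
        static synchronized void bumpIteration() {
            currentIteration++;
        }

        static void workerLoop() {
            while (currentIteration < MAX_ITERATION) {
                // Stand-in for the fractal orbit: the pixel that gets hit is
                // effectively random, so the write pattern is scattered.
                int pixel = ThreadLocalRandom.current().nextInt(WIDTH * HEIGHT) * CHANNELS;
                picture[pixel]++;   // racy increment; collisions are rare and tolerated
                bumpIteration();
            }
        }

        public static void main(String[] args) throws InterruptedException {
            Thread[] workers = new Thread[16];
            for (int i = 0; i < workers.length; i++) {
                workers[i] = new Thread(RendererSketch::workerLoop);
                workers[i].start();
            }
            for (Thread t : workers) t.join();
        }
    }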
However, to keep track of the total number of iterations done, I need iteration to be volatile, which I suspect slows the code down a little.
What baffles me is that I get virtually the same speed with 4 threads as with 16 threads, and I run this on a 64-core machine (verified by Runtime.getRuntime().availableProcessors()).
One issue is that I have no control over where in the array the threads work, so the problem might be a bad case of cache misses? The array is the size of a full-HD image, 1920 x 1080 x 4 longs, i.e. roughly 66 MB, which is much larger than the per-core caches, so most of these scattered writes will miss the cache.
Thus, I am looking for possible causes, and solutions to them, since I think this might be a common type of problem.
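One mitigation I have considered, but not tried (and I am not sure it attacks the real bottleneck), is to give every thread its own private accumulation buffer and merge the buffers once at the end. That removes any sharing of cache lines between cores while rendering, but it does not make the scattered accesses themselves cache-friendly, and it costs roughly 66 MB per thread. A rough sketch with made-up names:

    import java.util.concurrent.ThreadLocalRandom;

    // Hedged sketch, not the real code: per-thread buffers merged at the end,
    // so no cache line is ever written by two cores during rendering.
    public class LocalBufferSketch {
        static final int WIDTH = 1920, HEIGHT = 1080, CHANNELS = 4;

        static long[] render(int nThreads, long iterationsPerThread) throws InterruptedException {
            final int size = WIDTH * HEIGHT * CHANNELS;
            final long[][] local = new long[nThreads][size];   // ~66 MB per thread!
            Thread[] workers = new Thread[nThreads];
            for (int t = 0; t < nThreads; t++) {
                final long[] mine = local[t];
                workers[t] = new Thread(() -> {
                    for (long i = 0; i < iterationsPerThread; i++) {
                        // stand-in for the fractal orbit: an effectively random pixel
                        int pixel = ThreadLocalRandom.current().nextInt(WIDTH * HEIGHT) * CHANNELS;
                        mine[pixel]++;                          // private buffer, no races at all
                    }
                });
                workers[t].start();
            }
            for (Thread w : workers) w.join();

            long[] total = new long[size];                      // single-threaded merge
            for (long[] buf : local)
                for (int i = 0; i < size; i++) total[i] += buf[i];
            return total;
        }
    }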
Edit: The code I am trying to optimize is available here (sourceforge). The class ThreadComputator represents one worker thread, and all of these threads perform iterations. The number of iterations done so far is stored in the shared variable currentIteration, which (in the current code) is incremented inside a synchronized block.
All threads write to the Histogram object, which is essentially a big array of doubles. Writes to it do not need to be atomic: overwrites (lost updates) will be rare, and that error is tolerated.
I assume I could use an AtomicInteger instead (or an AtomicLong).
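The change I have in mind would look roughly like this (a sketch, not the actual code): replace the synchronized block with an AtomicLong, and optionally let each thread push its count in batches so the shared counter is touched less often. The batching is just an idea, not something the current code does:

    import java.util.concurrent.atomic.AtomicLong;

    // Sketch of the counter change: a lock-free shared counter instead of a
    // synchronized block, updated in batches to reduce contention.
    public class IterationCounter {
        static final long MAX_ITERATION = 100_000_000L;
        static final AtomicLong currentIteration = new AtomicLong();

        static void workerLoop() {
            final int BATCH = 10_000;   // push to the shared counter only every BATCH iterations
            long localCount = 0;
            while (currentIteration.get() < MAX_ITERATION) {
                // ... do one iteration, update one pixel in the histogram ...
                if (++localCount == BATCH) {
                    currentIteration.addAndGet(localCount);
                    localCount = 0;
                }
            }
            // Flush the remainder; the total can overshoot MAX_ITERATION by up to
            // BATCH * numberOfThreads iterations, which I assume is acceptable here.
            currentIteration.addAndGet(localCount);
        }
    }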