
I’m using ParallelMap on a very large dataset (millions of elements), but it quickly consumes all available RAM.

For example:

result = ParallelMap[func, data] 

This fills up RAM completely.
To reduce memory usage, I tried splitting the data into 20 or more parts, but it doesn't seem to help. I'm not sure whether this is the right approach and I simply need more chunks, or whether something is fundamentally wrong with this method.

blockSize = Ceiling[Length[data]/20];
parts = Partition[data, blockSize, blockSize, {1, 1}, {}];
result = Reap[
  Do[
    partialRes = ParallelMap[func, part, Method -> "FinestGrained"];
    Sow[partialRes];
    Clear[partialRes],
    {part, parts}
  ]
]
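(For reference: as written, result is the raw Reap output, so the sown chunks end up in result[[2, 1]] and something like Join @@ result[[2, 1]] is still needed to recombine them, assuming func returns one result per input element.)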

Even with the data split into 20 or more chunks, RAM still fills up.
Is there a better way to manage memory when using ParallelMap on large datasets?

  • If the output of your function is about as large as its input, then splitting the data will not improve things. Saving the output of each part to the hard disk and joining the parts afterwards may help (see the sketch after these comments). Commented Nov 5 at 10:34
  • @azerbajdzan I guess you're right. The output is quite large, and it probably causes the RAM to overflow. Commented Nov 5 at 14:42
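Following the comment above, here is a minimal sketch of the write-each-chunk-to-disk approach, assuming func and data as in the question. The chunk count (20), the directory under $TemporaryDirectory, the file naming, and the "MX" file format are placeholder choices, not the only way to do this:

blockSize = Ceiling[Length[data]/20];
parts = Partition[data, blockSize, blockSize, {1, 1}, {}];
dir = FileNameJoin[{$TemporaryDirectory, "parallelmap-chunks"}];
If[! DirectoryQ[dir], CreateDirectory[dir]];

(* Process one chunk at a time; Export returns the file name, so only
   file names (not results) are kept between iterations. *)
files = Table[
   Export[
     FileNameJoin[{dir, "part" <> IntegerString[i, 10, 4] <> ".mx"}],
     ParallelMap[func, parts[[i]], Method -> "FinestGrained"]
   ],
   {i, Length[parts]}
];

(* Reassemble later, only when the full result is actually needed.
   Assumes func returns one result per element, so each chunk result is a list. *)
result = Join @@ (Import /@ files);

With this layout, the main kernel only ever holds one chunk's input and output at a time, so the chunk count directly controls peak memory; if 20 chunks still exhausts RAM, increasing the number of chunks (smaller blockSize) should reduce the peak further, at the cost of more disk I/O.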
