As @AlbertRetey said, parallelization in Mathematica is not extremely efficient. It is only worth it if the original code took a relatively long time to execute. With a multi-second execution time, it may be worth thinking about parallelization. With sub-second execution times, it probably won't be possible to gain anything.
This is the useful and practical answer to your question. I am still interested in what it is exactly that's slow here, hence my other answer. But that's mostly of theoretical interest.
There are techniques other than the parallel tools that can speed things up. I'm going to show a few below:
    p[x_] = x^2;
    Table[p[i], {i, 10^5}]; // AbsoluteTiming
    (* {0.075937, Null} *)

The original timing is less than 0.1 s on my computer (M11.0.1), so the parallel tools aren't the right choice for further speedups. What else can we do?
It's good to be aware that Table automatically compiles its argument when it can. Let's help it do this by inlining the code:
    Table[i^2, {i, 10^5}]; // AbsoluteTiming
    (* {0.00322, Null} *)

This is a more than 20x speedup. When the structure of p allows for it (e.g. symbolic expressions), we can simply do
    Table[p[i] // Evaluate, {i, 10^5}]; // AbsoluteTiming
    (* {0.002974, Null} *)

Another way is vectorization:
    Range[10^5]^2; // AbsoluteTiming
    (* {0.00082, Null} *)

    p[Range[10^5]]; // AbsoluteTiming
    (* {0.000837, Null} *)

This is another 3.5x speedup. See here for more info on vectorization.
Vector arithmetic is extremely efficient. It will typically be faster than naive C code because it makes use of SIMD instructions and is often internally parallelized. (It uses a not so naive C implementation internally.)
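As a small illustration (my own example, not from the original answer, with a hypothetical polynomial p2), vectorization also works directly for somewhat less trivial functions, because the basic arithmetic operations are all Listable:

    (* p2 is an illustrative polynomial; Plus, Times and Power
       are Listable, so applying it to a vector stays vectorized *)
    p2[x_] := 3 x^2 + 2 x + 1
    p2[Range[10^5]]; // AbsoluteTiming

Here the whole result is computed with three vector operations instead of 10^5 separate function evaluations.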
Finally, there is another way to parallelize in Mathematica: by making listable compiled functions.
    cf = Compile[{{x, _Integer}}, x^2,
      RuntimeAttributes -> {Listable}, Parallelization -> True]

    cf[Range[10^5]]; // AbsoluteTiming
    (* {0.003539, Null} *)

Since it's Listable, this will do element-by-element processing (i.e. it won't use vector arithmetic), but it makes use of all the CPU cores in your computer. The Compile documentation page has more examples.
As you can see, it is slower than the other approaches for this particular trivial case. But I assume that you showed this case only as an example. In practice, we will usually have more complex cases, where this approach may be worth it.
In general, here are some ways to try to speed up applying a function to a list:

1. If the function is really simple and can be translated to vector arithmetic, do that first. However, this gets complicated quickly, and is not always possible. The first problem is typically too many branches (functions like If).
2. Next, try to write the function in a way that operations like Map and Table can automatically compile it.
3. Next, Compile it manually. Make use of vector arithmetic inside of compiled functions too. If this is not possible, make a compiled function that operates on single elements, but make it parallelized and listable, then apply it to a list of values in one go.
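A sketch of using vector arithmetic inside a compiled function (my own illustration, reusing the squaring example): declaring the argument as a rank-1 integer tensor lets the compiled code operate on the whole vector at once, rather than element by element:

    (* {x, _Integer, 1} declares a rank-1 (vector) argument,
       so x^2 is evaluated as a single vector operation inside Compile *)
    cfVec = Compile[{{x, _Integer, 1}}, x^2];
    cfVec[Range[10^5]]; // AbsoluteTiming

This is the counterpart of the Listable version above: instead of parallelizing over single-element calls, it pushes the loop into compiled vector arithmetic.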