Still, I'll use an implementation of the 1D FDTD method (you can simply think of it as a kind of explicit finite difference scheme for Maxwell's equations) as the example. Just for completeness, here are the 1D Maxwell's equations:
$$\mu \frac{\partial H_y}{\partial t}=\frac{\partial E_z}{\partial x}$$ $$\epsilon \frac{\partial E_z}{\partial t}=\frac{\partial H_y}{\partial x}$$
and the corresponding finite difference equations:
$$H_y^{q+\frac{1}{2}}\left[m+\frac{1}{2}\right]=H_y^{q-\frac{1}{2}}\left[m+\frac{1}{2}\right]+\frac{\Delta_t}{\mu \Delta_x}\left(E_z^q[m+1]-E_z^q[m]\right)$$ $$E_z^{q+1}[m]=E_z^q[m]+\frac{\Delta_t}{\epsilon \Delta_x}\left(H_y^{q+\frac{1}{2}}\left[m+\frac{1}{2}\right]-H_y^{q+\frac{1}{2}}\left[m-\frac{1}{2}\right]\right)$$
The toy code implementing this difference scheme, which I've used repeatedly in several posts, is:
    ie = 200;
    ez = ConstantArray[0., {ie + 1}]; (* E_z samples at the integer grid points *)
    hy = ConstantArray[0., {ie}];     (* H_y samples at the staggered half points *)

    fdtd1d = Compile[{{steps}},
       Module[{ie = ie, ez = ez, hy = hy},
        Do[
         ez[[2 ;; ie]] += (hy[[2 ;; ie]] - hy[[1 ;; ie - 1]]); (* E_z update *)
         ez[[1]] = Sin[n/10];                                  (* hard source at the left end *)
         hy[[1 ;; ie]] += (ez[[2 ;; ie + 1]] - ez[[1 ;; ie]]), (* H_y update *)
         {n, steps}];
        ez]];

    result = fdtd1d[10000]; // AbsoluteTiming

Notice that constants like $\mu$ and $\Delta_t$ are omitted for simplicity.
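For comparison, one can also try compiling the same scheme down to C. This is a sketch of my own, not part of the original benchmark: it assumes a working C compiler is installed, and the name fdtd1dC is hypothetical.

    (* same scheme, but compiled to C with runtime checks relaxed *)
    fdtd1dC = With[{ie = 200},
       Compile[{{steps, _Integer}},
        Module[{ez = Table[0., {ie + 1}], hy = Table[0., {ie}]},
         Do[
          ez[[2 ;; ie]] += (hy[[2 ;; ie]] - hy[[1 ;; ie - 1]]);
          ez[[1]] = Sin[n/10];
          hy[[1 ;; ie]] += (ez[[2 ;; ie + 1]] - ez[[1 ;; ie]]),
          {n, steps}];
         ez],
        CompilationTarget -> "C", RuntimeOptions -> "Speed"]];

    resultC = fdtd1dC[10000]; // AbsoluteTiming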
Personally I think it's a typical example of implementing the finite difference method (FDM), so here's the question: has this piece of code (at least nearly) reached the speed limit of Mathematica? In fact, several months ago I found that if the code is rewritten in Julia, it runs about 4 times faster.
Indeed, I know this might be a case where one should use the best-suited tool for the job. But I've already gained some stubborn pride in using Mathematica and am unwilling to spend time learning a new programming language (Wolfram is practically my first programming language; I learned some VB before, but gave it all back to my teacher once I started using Mathematica), so I still want to find out whether the Mathematica version of the code can be made faster.
If this is indeed the limit, I'd like to know why there's such a big difference.
Any help would be appreciated.
Comments:

- Running CompilePrint[fdtd1d], we find that the first three instructions in the compiled function are calls to MainEvaluate. If you add CompilationOptions -> {"InlineExternalDefinitions" -> True} and repeat, the first two instructions are now CopyTensor, although there is virtually no difference in the timings between the two functions. This may mean nothing, or it may be something worth investigating.
- Adding "InlineExternalDefinitions" -> True, or changing the beginning part to With[{ie2 = ie, ez2 = ez, hy2 = hy}, Compile[{{steps}}, Module[{ie = ie2, ez = ez2, hy = hy2}, …, or to With[{ie = 200}, Compile[{{steps}}, Module[{ez = Table[0., {ie + 1}], hy = Table[0., {ie}]}, …, doesn't help; the bottleneck is inside Do.
- … "CompileOptions" (-O3, -Ofast, etc.), but didn't get a speed-up… Could you give an example? (BTW, -Ofast does help in speeding up this code :D)
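For reference, here is a minimal sketch of the inspection described in the first comment (my own reconstruction of the commands the comments refer to; the name fdtd1dInlined is mine, and CompilePrint is loaded from the CompiledFunctionTools` context):

    Needs["CompiledFunctionTools`"]
    CompilePrint[fdtd1d] (* bytecode listing; the MainEvaluate calls show up here *)

    (* recompile with the external definitions inlined, then inspect again *)
    fdtd1dInlined = Compile[{{steps}},
       Module[{ie = ie, ez = ez, hy = hy},
        Do[
         ez[[2 ;; ie]] += (hy[[2 ;; ie]] - hy[[1 ;; ie - 1]]);
         ez[[1]] = Sin[n/10];
         hy[[1 ;; ie]] += (ez[[2 ;; ie + 1]] - ez[[1 ;; ie]]),
         {n, steps}];
        ez],
       CompilationOptions -> {"InlineExternalDefinitions" -> True}];
    CompilePrint[fdtd1dInlined]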