Still, I'll use an implementation of the 1D FDTD method (you can simply think of it as a kind of explicit finite difference scheme for Maxwell's equations) as the example. Just for completeness, here are the 1D Maxwell's equations:
$$\mu \frac{\partial H_y}{\partial t}=\frac{\partial E_z}{\partial x}$$ $$\epsilon \frac{\partial E_z}{\partial t}=\frac{\partial H_y}{\partial x}$$
and the corresponding finite difference equations:
$$H_y^{q+\frac{1}{2}}\left[m+\frac{1}{2}\right]=H_y^{q-\frac{1}{2}}\left[m+\frac{1}{2}\right]+\frac{\Delta_t}{\mu \Delta_x}\left(E_z^q[m+1]-E_z^q[m]\right)$$ $$E_z^{q+1}[m]=E_z^q[m]+\frac{\Delta_t}{\epsilon \Delta_x}\left(H_y^{q+\frac{1}{2}}\left[m+\frac{1}{2}\right]-H_y^{q+\frac{1}{2}}\left[m-\frac{1}{2}\right]\right)$$
The toy code implementing this difference scheme, which I've used repeatedly in several posts, is:
    ie = 200;
    ez = ConstantArray[0., {ie + 1}]; (* E_z samples at the integer grid points *)
    hy = ConstantArray[0., {ie}];     (* H_y samples at the staggered half points *)

    fdtd1d = Compile[{{steps}},
       Module[{ie = ie, ez = ez, hy = hy},
        Do[
         ez[[2 ;; ie]] += (hy[[2 ;; ie]] - hy[[1 ;; ie - 1]]); (* E_z update *)
         ez[[1]] = Sin[n/10];                                  (* hard source at the left end *)
         hy[[1 ;; ie]] += (ez[[2 ;; ie + 1]] - ez[[1 ;; ie]]), (* H_y update *)
         {n, steps}];
        ez]];

    result = fdtd1d[10000]; // AbsoluteTiming

Notice that constants like $\mu$ and $\Delta_t$ are omitted for simplicity.
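For comparison, one can also try compiling the same scheme down to C. This is a sketch of my own, not part of the original benchmark: it assumes a working C compiler is installed, and the name fdtd1dC is hypothetical.

    (* same scheme, but compiled to C with runtime checks relaxed *)
    fdtd1dC = With[{ie = 200},
       Compile[{{steps, _Integer}},
        Module[{ez = Table[0., {ie + 1}], hy = Table[0., {ie}]},
         Do[
          ez[[2 ;; ie]] += (hy[[2 ;; ie]] - hy[[1 ;; ie - 1]]);
          ez[[1]] = Sin[n/10];
          hy[[1 ;; ie]] += (ez[[2 ;; ie + 1]] - ez[[1 ;; ie]]),
          {n, steps}];
         ez],
        CompilationTarget -> "C", RuntimeOptions -> "Speed"]];

    resultC = fdtd1dC[10000]; // AbsoluteTiming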
Personally I think it's a typical example of implementing the finite difference method (FDM), so here's the question: has this piece of code (at least nearly) reached the speed limit of Mathematica? In fact, several months ago I found that if the code is rewritten in Julia, it runs about 4 times faster.
Indeed, I know this might be a case where one should use the best-suited tool for the job. But I've already gained some stubborn pride in using Mathematica and am unwilling to spend time learning a new programming language (Wolfram is practically my first programming language; I learned some VB before, but gave it all back to my teacher once I started using Mathematica), so I still want to find out whether the Mathematica version of the code can be made faster.
If this is indeed the limit, I'd like to know why there's such a big difference.
Any help would be appreciated.
Comments:

- Running CompilePrint[fdtd1d], we find that the first three instructions in the compiled function are calls to MainEvaluate. If you add CompilationOptions -> {"InlineExternalDefinitions" -> True} and repeat, the first two instructions are now CopyTensor, although there is virtually no difference in the timings between the two functions. This may mean nothing, or it may be something worth investigating.
- Adding "InlineExternalDefinitions" -> True, or changing the beginning part to With[{ie2 = ie, ez2 = ez, hy2 = hy}, Compile[{{steps}}, Module[{ie = ie2, ez = ez2, hy = hy2}, …, or to With[{ie = 200}, Compile[{{steps}}, Module[{ez = Table[0., {ie + 1}], hy = Table[0., {ie}]}, …, doesn't help; the bottleneck is inside Do.
- … "CompileOptions" (-O3, -Ofast, etc.), but didn't get a speed-up… Could you give an example? (BTW, -Ofast does help in speeding up this code :D)
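For reference, here is a minimal sketch of the inspection described in the first comment (my own reconstruction of the commands the comments refer to; the name fdtd1dInlined is mine, and CompilePrint is loaded from the CompiledFunctionTools` context):

    Needs["CompiledFunctionTools`"]
    CompilePrint[fdtd1d] (* bytecode listing; the MainEvaluate calls show up here *)

    (* recompile with the external definitions inlined, then inspect again *)
    fdtd1dInlined = Compile[{{steps}},
       Module[{ie = ie, ez = ez, hy = hy},
        Do[
         ez[[2 ;; ie]] += (hy[[2 ;; ie]] - hy[[1 ;; ie - 1]]);
         ez[[1]] = Sin[n/10];
         hy[[1 ;; ie]] += (ez[[2 ;; ie + 1]] - ez[[1 ;; ie]]),
         {n, steps}];
        ez],
       CompilationOptions -> {"InlineExternalDefinitions" -> True}];
    CompilePrint[fdtd1dInlined]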