I am doing an analysis of experimental results in which I need to apply the same GaussianFilter hundreds of times to different data. As explained in the documentation, GaussianFilter just convolves the data with a Gaussian kernel. Does it recompute the kernel every time I call the function, or does it somehow preserve and reuse the previous kernel? Would it be computationally more efficient for me to precompute the kernel (which I could do easily by applying GaussianFilter to a KroneckerDelta array), then do hundreds of ListConvolves instead of hundreds of GaussianFilters?
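As a concrete illustration of the precomputation idea, here is a minimal sketch; the radius r = 10 and the random data are placeholders, and UnitVector is used as a compact way to build the KroneckerDelta-style impulse array:

    r = 10;  (* placeholder filter radius *)
    (* the filter's impulse response is exactly its kernel *)
    ker = GaussianFilter[UnitVector[2 r + 1, r + 1], r];
    data = RandomReal[{-1, 1}, 1000];  (* stand-in for one data set *)
    (* cyclic convolution; result has the same length as data *)
    smoothed = ListConvolve[ker, data, {-1, -1}];

Note that ListConvolve with cyclic overhang {-1, -1} pads periodically, while GaussianFilter's default padding is "Fixed", so the two agree only away from the boundaries.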
2 Answers
Here I have implemented three different versions of Gaussian filtering (for periodic data). It took me a while to adjust the constants, and some of them might still be wrong.
Prepare the Gaussian kernel
    n = 200000;
    σ = .1;
    t = Subdivide[-1. Pi, 1. Pi, n - 1];
    ker = 1/Sqrt[2 Pi]/σ Exp[-(t/σ)^2/2];
    (* swap the two halves so the kernel is centered at index 1, as cyclic convolution expects *)
    ker = Join[ker[[Quotient[n, 2] + 1 ;;]], ker[[;; Quotient[n, 2]]]];

Generate noisy function
    u = Sin[t] + Cos[2 t] + 1.5 Cos[3 t] + .5 RandomReal[{-1, 1}, Length[t]];

The three methods with their timings. As Niki Estner pointed out, GaussianFilter with the option Method -> "Gaussian" performs much better than GaussianFilter with the default method.
    kerhat = 2 Pi/Sqrt[N@n] Fourier[ker];
    vConvolve = (2. Pi/n) ListConvolve[ker, u, {-1, -1}]; // RepeatedTiming // First
    vFFT = Re[Fourier[InverseFourier[u] kerhat]]; // RepeatedTiming // First
    vFilter = GaussianFilter[u, 1./(Pi) σ n, Padding -> "Periodic"]; // RepeatedTiming // First
    vGaussian = GaussianFilter[u, 1./(Pi) σ n, Padding -> "Periodic", Method -> "Gaussian"]; // RepeatedTiming // First

0.0038

0.0058

0.055

0.0072
    ListLinePlot[{u, vFFT, vFilter, vConvolve}]

From further experiments with different values of n, GaussianFilter seems to be slower by a factor of 10-20 over a wide range of n (from n = 1000 to n = 1000000). So it seems that it does use some FFT-based method (because it has the same speed asymptotics), but maybe some crucial part of the algorithm is not compiled (the factor of 10 is somewhat of an indicator for that), or it does not use the fastest FFT implementation available. A bit weird.
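A sweep of this kind can be reproduced along the following lines; this is only a sketch, with the sizes chosen for illustration, the radius 0.1 n/Pi matching the 1./(Pi) σ n used above for σ = 0.1, and a bare Fourier call as a rough scaling baseline rather than the full three-FFT method:

    (* time GaussianFilter against a plain FFT for several n *)
    Table[
      u = RandomReal[{-1, 1}, n];
      {n,
       First@RepeatedTiming[GaussianFilter[u, 0.1 n/Pi, Padding -> "Periodic"];],
       First@RepeatedTiming[Fourier[u];]},
      {n, {1000, 10000, 100000, 1000000}}]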
So, to my own surprise, your idea of computing the kernel once does help, but for quite unexpected reasons.
- IIRC, it's using a discrete Gaussian kernel, which involves non-compilable modified Bessel functions, so that might be the reason for the added computational effort you observe. – J. M.'s missing motivation, Mar 15, 2019 at 16:35
- Yes, that is one reason I proposed using GaussianFilter itself to generate the kernel. – Leon Avery, Mar 15, 2019 at 16:36
- If I add Method -> "Gaussian" to the GaussianFilter call, it's about as fast as the other two. – Niki Estner, Mar 16, 2019 at 12:32
- @NikiEstner Ah, that's good to know! Somewhat unexpected that we have to call out twice for a "Gaussian" in order to get one, isn't it? – Henrik Schumacher, Mar 16, 2019 at 13:01
- @HenrikSchumacher: Your results are about 6 times faster than mine. My platform is a Xeon E3-1270 v3, 3.695 GHz; 4 cores, 8 threads; 32 GB ECC memory; Windows 8.1 Professional, 64-bit. I'm not used to such shenanigans. Can you suggest a reason? In any case, thanks for the good work. – CElliott, Mar 20, 2019 at 16:39
It's hard to know for sure, but one way to test for caching is to apply a single command to lots of data sets, or to apply the command separately to each set. For instance:
    n = 5000;
    data = RandomReal[{-1, 1}, {n, 10000}];
    GaussianFilter[#, 100] & /@ data; // AbsoluteTiming
    Do[GaussianFilter[data[[i]], 100], {i, n}] // AbsoluteTiming
    Do[GaussianFilter[data[[i]], 100 + RandomInteger[{-15, 15}]], {i, n}] // AbsoluteTiming

The second line generates 5000 different data sets, each of length 10000. The third line applies one GaussianFilter to all of the sets at once. The fourth line applies a separate GaussianFilter to each set. The final line additionally forces GaussianFilter to recompute its kernel on each call by varying the radius. The timings are pretty much the same. This suggests that, whatever is happening internally, the time needed to compute the Gaussian filter parameters is pretty negligible.
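For comparison with the question's proposal, one could add a precomputed-kernel variant to the same experiment; a sketch, reusing n and data from above, with the kernel extracted once via an impulse as suggested in the question:

    (* extract the radius-100 kernel once, then reuse it in the loop *)
    ker = GaussianFilter[UnitVector[201, 101], 100];
    (* without an overhang spec the result is slightly shorter than each
       data set, which does not matter for a timing comparison *)
    Do[ListConvolve[ker, data[[i]]], {i, n}] // AbsoluteTiming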

- GaussianFilter employs FFT (FFT, multiply, inverse FFT). In this case, it would be a very bad idea to replace it by a convolution unless the convolution is also implemented by FFT: naive convolution would require $O(n^2)$ flops for a vector of length $n$, while the FFT method needs only $O(n \log(n))$ flops.