Many kernels are available for a univariate KDE. R uses a Gaussian kernel by default, but the efficiency discussion seems to support the use of the Epanechnikov kernel. What should influence the choice of kernel for univariate exploratory analysis?
- Since you're doing EDA, one thought is to use a range of kernels and look at the results. In most applications you will find the choice of kernel makes little difference; the bandwidth is more important by far and usually is worth some exploration and visual fine-tuning. The largest qualitative difference among kernel shapes is between those that are discontinuous and those that are highly differentiable. (Discontinuous--uniform--kernels actually are routinely used in 2D analyses, despite the discontinuous effects they produce.) — whuber ♦, Oct 15, 2014
- @whuber Could you provide some examples of discontinuous kernels in EDA? I remember seeing the Epanechnikov one, but there were so many data points that it looked smooth anyway. — Simon Kuang, Oct 15, 2014
2 Answers
This is not really a data visualization question. The information is fairly readily available online; e.g., http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/AV0405/MISHRA/kde.html mentions using AMISE to select the bandwidth, and the same approach could be used to select a kernel. But for EDA, you would want to work as is recommended for histograms: re-plot with different bandwidths to learn different things about the data. Sometimes a different kernel may be helpful, but the normal kernel is generally useful, and I think the bandwidth matters more than the actual kernel.
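The "re-plot with different bandwidths and kernels" advice can be sketched with a minimal pure-Python KDE (the data values and evaluation point here are made up for illustration; any real library such as R's `density()` would be used in practice):

```python
import math

def gaussian(u):
    # Standard normal kernel
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def epanechnikov(u):
    # Parabolic kernel, zero outside [-1, 1]
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kde(data, x, h, kernel):
    # Kernel density estimate at point x with bandwidth h:
    # average of scaled kernels centered at each data point
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)

# Hypothetical sample for illustration
data = [1.2, 1.9, 2.1, 2.8, 3.0, 5.5]

# The estimate changes far more across bandwidths than across kernels
for h in (0.3, 0.6, 1.2):
    g = kde(data, 2.0, h, gaussian)
    e = kde(data, 2.0, h, epanechnikov)
    print(f"h={h}: gaussian={g:.3f}, epanechnikov={e:.3f}")
```

Plotting the estimate over a grid of x values for each (kernel, bandwidth) pair makes the comparison visual, which is the point for EDA.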
I would suggest adding the tags distributions and nonparametric; you may get better answers under those topics.
The framework of regularization theory (see Regularization Theory and Neural Networks Architectures by Girosi et al.) provides a systematic way to search for a good kernel.
The idea is that the kernel is determined by a smoothness stabilizer, which is analogous to controlling complexity in the MDL sense, or to the bias-variance decomposition of the error.
Concretely, you minimize $$ H(f) = \sum_{i}\left(f(x_{i})-y_{i}\right)^{2} + \lambda ||Df||^{2} $$ where $D$ is a differential operator, for example $\frac{d^{2}}{dx^{2}}$. It can be proved that the minimizer has the form $$ f(x) = \sum_{i}c_{i}G(x-x_{i}) $$ where $G$ is the Green's function associated with the regularizer. By means of cross-validation you can then search for good values of $\lambda$ and of the order of the differential operator.
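As a sketch of how the solution is computed in practice: plugging $f(x)=\sum_i c_i G(x-x_i)$ into the objective leads to the linear system $(G + \lambda I)c = y$, where $G_{ij} = G(x_i - x_j)$. The example below assumes a Gaussian basis function (which Girosi et al. derive as the Green's function of a particular stabilizer); the data points and $\lambda$ value are hypothetical:

```python
import math

def gaussian_green(r, sigma=1.0):
    # Gaussian basis function; assumed here as the Green's function
    # of the chosen smoothness stabilizer
    return math.exp(-r * r / (2.0 * sigma ** 2))

def solve(A, b):
    # Naive Gaussian elimination with partial pivoting on the
    # augmented matrix [A | b]
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit(xs, ys, lam):
    # Regularized coefficients: solve (G + lam * I) c = y
    n = len(xs)
    G = [[gaussian_green(xs[i] - xs[j]) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(G, ys)

def predict(xs, c, x):
    # Evaluate f(x) = sum_i c_i * G(x - x_i)
    return sum(ci * gaussian_green(x - xi) for ci, xi in zip(c, xs))

# Hypothetical one-dimensional data
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.8, 0.9, 0.1]
c = fit(xs, ys, lam=0.1)
print([round(predict(xs, c, x), 3) for x in xs])
```

Larger $\lambda$ shrinks the fitted values toward a smoother function, while $\lambda \to 0$ recovers exact interpolation of the data; cross-validating over $\lambda$ is what the answer describes.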