For simplicity, let's assume that we are talking about some really simple kernel, say the triangular kernel:
$$ K(x) = \begin{cases} 1 - |x| & \text{if } x \in [-1, 1] \\ 0 & \text{otherwise} \end{cases} $$
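As a quick numerical sanity check (not part of the original answer), one can verify in R that this triangular kernel is a valid density: non-negative, symmetric, and integrating to one.

```r
# Triangular kernel: 1 - |x| on [-1, 1], zero elsewhere
K <- function(x) ifelse(x >= -1 & x <= 1, 1 - abs(x), 0)

# Non-negative everywhere
stopifnot(all(K(seq(-2, 2, by = 0.01)) >= 0))
# Symmetric around zero
stopifnot(K(0.3) == K(-0.3))
# Integrates to 1 over its support (triangle with base 2 and height 1)
stopifnot(abs(integrate(K, -1, 1)$value - 1) < 1e-6)
```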
Recall that in kernel density estimation, to estimate the density $\hat f_h$ we average $n$ kernels parametrized by the bandwidth $h$, each centered at a data point $x_i$:
$$ \hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^n K_h (x - x_i) = \frac{1}{nh} \sum_{i=1}^n K\Big(\frac{x-x_i}{h}\Big) $$
Notice that by $\frac{x-x_i}{h}$ we mean that the difference between some $x$ and the point $x_i$ is re-scaled by the factor $h$. Most kernels (the Gaussian being a notable exception) are supported on $(-1, 1)$, so after rescaling they return densities equal to zero for points outside the $(x_i-h, x_i+h)$ range. Put differently, $h$ is a scale parameter for the kernel that stretches its support from $(-1, 1)$ to $(-h, h)$.
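To make the rescaling concrete, here is a small check (using the same triangular kernel) that the scaled kernel $K_h(x - x_i) = \frac{1}{h} K\big(\frac{x - x_i}{h}\big)$ vanishes outside $(x_i - h, x_i + h)$ while still integrating to one; the values of `xi` and `h` are arbitrary choices for illustration.

```r
K  <- function(x) ifelse(x >= -1 & x <= 1, 1 - abs(x), 0)
Kh <- function(x, xi, h) K((x - xi) / h) / h  # scaled kernel centered at xi

xi <- 2
h  <- 0.5

# Zero outside the rescaled support (xi - h, xi + h) = (1.5, 2.5)
stopifnot(Kh(1.4, xi, h) == 0, Kh(2.6, xi, h) == 0)
# Positive inside it
stopifnot(Kh(2, xi, h) > 0)
# The 1/h factor keeps the total mass equal to 1
mass <- integrate(function(x) Kh(x, xi, h), xi - h, xi + h)$value
stopifnot(abs(mass - 1) < 1e-6)
```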
This is illustrated in the plots below, where $n=7$ points are used to estimate kernel densities with different bandwidths $h$ (the colored points on top mark the individual values, the colored lines are the kernels, and the gray line is the overall kernel density estimate). As you can see, $h < 1$ makes the kernels narrower, while $h > 1$ makes them wider. Changing $h$ influences both the individual kernels and the final kernel density estimate, since the estimate is a mixture distribution of the individual kernels. Larger $h$ makes the kernel density estimate smoother, while smaller $h$ ties the kernels more tightly to the individual data points; with $h \rightarrow 0$ you would end up with just a bunch of Dirac delta functions centered at the $x_i$ points.
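The limiting behavior can also be checked numerically. The following sketch (my addition, reusing the same setup as the plotting code) verifies that the estimate integrates to one for any $h$, and that for a tiny $h$ a kernel's entire mass of $\frac{1}{n}$ sits inside the narrow window $(x_i - h, x_i + h)$, which is the Dirac-delta-like concentration described above.

```r
set.seed(123)
K   <- function(x) ifelse(x >= -1 & x <= 1, 1 - abs(x), 0)
x   <- rnorm(7, sd = 3)
kde <- function(t, data, h) sapply(t, function(ti) mean(K((ti - data) / h)) / h)

# The estimate integrates to 1 regardless of the bandwidth
for (h in c(0.5, 1, 2)) {
  total <- integrate(function(t) kde(t, x, h), min(x) - h, max(x) + h,
                     subdivisions = 2000)$value
  stopifnot(abs(total - 1) < 1e-3)
}

# As h -> 0, each kernel's mass of 1/n concentrates around its datapoint:
# integrating over (x[1] - h, x[1] + h) captures kernel 1's full mass
h <- 0.01
m <- integrate(function(t) kde(t, x, h), x[1] - h, x[1] + h)$value
stopifnot(m >= 1/7 - 1e-3)
```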
And the R code that produced the plots:
```r
set.seed(123)
n <- 7
x <- rnorm(n, sd = 3)

# triangular kernel
K <- function(x) ifelse(x >= -1 & x <= 1, 1 - abs(x), 0)

# kernel density estimate at points x, given data, bandwidth h, and kernel K
kde <- function(x, data, h, K) {
  n <- length(data)
  out <- outer(x, data, function(xi, yi) K((xi - yi) / h))
  rowSums(out) / (n * h)
}

xx <- seq(-8, 8, by = 0.001)
for (h in c(0.5, 1, 1.5, 2)) {
  plot(NA, xlim = c(-4, 8), ylim = c(0, 0.5), xlab = "", ylab = "",
       main = paste0("h = ", h))
  for (i in 1:n) {
    # each mixture component K_h(x - x_i)/n, so the colored lines sum to the estimate
    lines(xx, K((xx - x[i]) / h) / (n * h), type = "l", col = rainbow(n)[i])
    rug(x[i], lwd = 2, col = rainbow(n)[i], side = 3, ticksize = 0.075)
  }
  lines(xx, kde(xx, x, h, K), col = "darkgray")
}
```

For more details you can check the great introductory books by Silverman (1986) and Wand & Jones (1995).
Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman & Hall/CRC.
Wand, M.P. and Jones, M.C. (1995). Kernel Smoothing. London: Chapman & Hall/CRC.
