Suppose I want to estimate the probability $p=\Pr((U,V)\in A)$ from a random sample $\{(U_i,V_i)\}_{i=1}^N$. The simplest approach is the sample mean $\widehat{p}=\frac{1}{N}\sum_{i=1}^N 1((U_i,V_i)\in A)$, i.e., the relative frequency estimator based on the indicator function; the weak law of large numbers guarantees that $\widehat{p}$ is consistent. However, the indicator function is nonsmooth, and I want a smoothed estimator. I am familiar with kernel smoothing (kernel density estimation and the Nadaraya-Watson estimator), so I am considering something that might look like the following: $\widehat{p}_s=\frac{1}{N}\sum_{i=1}^N \frac{1}{h^2} k\left(\frac{(U_i,V_i)???}{h}\right)????$, where $k(\cdot)$ is the kernel function and $h$ is the bandwidth. The difficulty is that I do not know what to write inside the kernel function (the question marks), so I do not know how to proceed.
So my question is: how can I construct a smoothed estimator (based on kernel smoothing) for this probability, and when is it consistent?
It would be great if you could lay out the conditions for the kernel and bandwidth so that the estimator is consistent for the probability of interest, and prove its consistency under your conditions.
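To make the setup concrete, here is a minimal Python sketch. It assumes, purely for illustration, that $A$ is a rectangle $[a_1,b_1]\times[a_2,b_2]$; the function names and arguments are hypothetical. The first function is the relative frequency estimator $\widehat{p}$ from above. The second is only one candidate I could imagine for a smoothed version: it integrates a product Gaussian kernel density estimate over $A$, which replaces each indicator by a product of normal CDF differences. I am not claiming this is the right construction, and whether (and under which conditions on $k$ and $h$) something like it is consistent is exactly what I am asking.

```python
import numpy as np
from math import erf, sqrt

# Standard normal CDF, vectorized so it can be applied to NumPy arrays.
_std_normal_cdf = np.vectorize(lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0))))


def indicator_estimate(u, v, a1, b1, a2, b2):
    """Relative frequency estimator: share of points falling in the rectangle."""
    inside = (u >= a1) & (u <= b1) & (v >= a2) & (v <= b2)
    return float(np.mean(inside))


def smoothed_estimate(u, v, a1, b1, a2, b2, h):
    """Candidate smoothed estimator (an assumption, not an established answer).

    Integrates a product Gaussian kernel density estimate with bandwidth h
    over the rectangle, so each observation contributes a weight in [0, 1]
    instead of a hard 0/1 indicator.
    """
    weight_u = _std_normal_cdf((b1 - u) / h) - _std_normal_cdf((a1 - u) / h)
    weight_v = _std_normal_cdf((b2 - v) / h) - _std_normal_cdf((a2 - v) / h)
    return float(np.mean(weight_u * weight_v))


# Simulated example: independent standard normals, A = [0, 1] x [0, 1].
rng = np.random.default_rng(0)
u_sample = rng.standard_normal(20_000)
v_sample = rng.standard_normal(20_000)

p_hat = indicator_estimate(u_sample, v_sample, 0.0, 1.0, 0.0, 1.0)
p_smooth = smoothed_estimate(u_sample, v_sample, 0.0, 1.0, 0.0, 1.0, h=0.05)
print(p_hat, p_smooth)
```

In this simulated example the true probability is $(\Phi(1)-\Phi(0))^2$, so both numbers can be compared against it for different choices of `h`.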