Revisions to Bandwidth parameters in multivariate KDE using scipy.stats.gaussian_kde

added 89 characters in body

edited Nov 18, 2022 at 12:48

I was looking for an answer to this bandwidth matrix optimisation problem and found this excellent other thread, so I thought I'd drop it here:

Difference in bandwidth for scikit-learn KDE and multivariate KDE of statsmodels (https://stackoverflow.com/questions/67189978/difference-in-bandwidth-for-scikit-learn-kde-and-multivariate-kde-of-statsmodels)

In short, it says that the sklearn KernelDensity() implementation uses its bandwidth as a single multiplier of a diagonal matrix (so the second case in Tim's answer), while statsmodels' KDEMultivariate() estimates a different multiplier for each dimension (so the third picture, I believe). I am not sure how this compares to scipy, which multiplies the covariance matrix of the data by a single scalar. It looks to fall into the same case as KDEMultivariate(), but with a little less control over the dimension-specific tuning. From what I understand (again from that other Stack Overflow answer), both scipy and KDEMultivariate() use a rule of thumb to come up with the covariance matrix.
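To make the scipy part concrete, here is a minimal sketch that inspects what gaussian_kde actually stores. The sample data, the seed and the variable names are my own assumptions for illustration; the attributes used (factor, covariance, d, n) are public attributes of scipy.stats.gaussian_kde. As far as I can tell, the kernel covariance is the sample covariance of the data multiplied by the square of the scalar bandwidth factor, and by default that factor comes from Scott's rule.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical correlated 2-D sample; gaussian_kde expects shape (d, n).
rng = np.random.default_rng(0)
data = rng.multivariate_normal(
    mean=[0.0, 0.0],
    cov=[[1.0, 0.8], [0.8, 2.0]],
    size=500,
).T

# Default bandwidth (Scott's rule of thumb).
kde = gaussian_kde(data)
d, n = kde.d, kde.n

# Scott's factor for unweighted data: n ** (-1 / (d + 4)).
scott_factor = n ** (-1.0 / (d + 4))

# The kernel covariance is the data covariance scaled by factor ** 2,
# so the off-diagonal (correlation) terms of the data are kept.
expected_cov = np.cov(data) * kde.factor ** 2

print("factor:", kde.factor)
print("kernel covariance:\n", kde.covariance)

# Passing a scalar as bw_method uses it directly as the factor.
kde_half = gaussian_kde(data, bw_method=0.5)
print("manual factor:", kde_half.factor)
```

So there is only one knob (the scalar factor), but the matrix it scales is the full data covariance rather than a diagonal one.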

added 264 characters in body

edited Nov 18, 2022 at 12:06

Magi

answered Nov 18, 2022 at 11:59

Magi
