How to perform clustering on heavily right-skewed and zero-inflated data

Question

I am currently clustering continuous variables such as AOV, RPV, and conversion rate (conversions/visits). The variables are heavily right-skewed with long tails, and one variable is zero-inflated: more than 50% of its values are zero. Overall, most of my data is concentrated near the origin. The variables are also on different scales. Traditional clustering such as k-means is not performing well, because the data is clearly not spherical, which is the cluster shape k-means implicitly favors.

I need suggestions on how to proceed: which clustering approach to use, how to transform the data, and how to handle the zero inflation. The number of clusters is not pre-defined; it should be determined from the data.

Welcome to CV. Please spell out AOV and RPV. Also, please tell us what you want the clusters to be like. That is, how do you want these skewed variables to be treated? What will you do with the clusters? What do you mean by "optimal"?
– Peter Flom, Commented Sep 24 at 16:52
I edited your question to make it clearer and more grammatical. Please check that I did not change what you intended to ask.
– Peter Flom, Commented Sep 24 at 16:56
This will depend on which characteristics of a cluster are relevant in your specific application. There is software for mixtures of skew-normal and skew-t distributions; however, these may have difficulties with a large percentage of zero values. One consideration is whether a zero on such a variable is a distinctive enough feature that you want those observations separated from the others by the clustering. Another consideration is whether a transformation would help.
– Christian Hennig, Commented Sep 24 at 17:24
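The two considerations in the last comment (treating zeros as a distinct feature, and transforming the skewed values) can be sketched as a preprocessing step. This is only an illustration, not a recommendation from the commenters: the function name `preprocess`, the choice of `log1p`, the median/IQR scaling, and the sample values are all assumptions, and the sketch assumes the variable is non-negative.

```python
import math
import statistics


def preprocess(values):
    """Turn one non-negative, zero-inflated variable into two features.

    Returns a list of (is_zero, scaled_log) tuples:
      is_zero    -- 1.0 if the raw value is exactly zero, else 0.0
      scaled_log -- log1p(value), centered and scaled by the median and
                    IQR of the non-zero observations only
    """
    if any(v < 0 for v in values):
        raise ValueError("this sketch assumes non-negative values")

    is_zero = [1.0 if v == 0 else 0.0 for v in values]
    logged = [math.log1p(v) for v in values]  # log1p(0) == 0, so zeros are safe

    # Use only the non-zero part for the scale estimate; with more than half
    # of the values at zero, the overall IQR could collapse to 0.
    nonzero = [x for x, z in zip(logged, is_zero) if z == 0.0]
    if len(nonzero) >= 2:
        q1, median, q3 = statistics.quantiles(nonzero, n=4)
        iqr = q3 - q1
    else:
        median, iqr = 0.0, 0.0
    if iqr == 0:
        iqr = 1.0  # fall back to centering only

    return [(z, (x - median) / iqr) for x, z in zip(logged, is_zero)]


# Hypothetical conversion rates, invented for illustration only.
conversion_rate = [0, 0, 0, 0.02, 0.05, 0, 0.4, 0, 0.01, 0.9]
rows = preprocess(conversion_rate)
for raw, (flag, scaled) in zip(conversion_rate, rows):
    print(f"{raw:>5} -> is_zero={flag:.0f}, scaled_log={scaled:+.3f}")
```

Whether the zero indicator should be kept as a feature depends on whether "zero versus non-zero" is a meaningful distinction for the clusters you want. The resulting features could then be passed to a model-based method such as a Gaussian mixture, where the number of components can be chosen by a criterion such as BIC rather than fixed in advance.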
