Clustering not producing even clusters

Question

I'm using k-means to cluster processes running on machines.

Dataset sample :

machine name,process
m1,java
m2,tomcat
m1,word
m3,excel

Build a matrix of associated counts :

machine,java,tomcat,word,excel
m1,1,0,1,0
m2,0,1,0,0
m3,0,0,0,1
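For illustration, a count matrix like the one above could be built along these lines. This is only a minimal sketch in Python using the standard library; the names `observations` and `build_count_matrix` are made up for this example, and the data is just the four sample rows shown earlier.

```python
from collections import Counter

# Hypothetical sample mirroring the (machine, process) pairs in the question.
observations = [
    ("m1", "java"),
    ("m2", "tomcat"),
    ("m1", "word"),
    ("m3", "excel"),
]


def build_count_matrix(pairs):
    """Return (machines, processes, rows), where rows[i][j] counts how often
    processes[j] was observed on machines[i]."""
    counts = Counter(pairs)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    machines = list(dict.fromkeys(m for m, _ in pairs))
    processes = list(dict.fromkeys(p for _, p in pairs))
    rows = [[counts[(m, p)] for p in processes] for m in machines]
    return machines, processes, rows


machines, processes, rows = build_count_matrix(observations)
for name, row in zip(machines, rows):
    print(name, row)
```

Each row is one machine and each column one process, so most entries are zero once the number of distinct processes grows.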

I then run k-means against this dataset (I have tried both Euclidean and Manhattan distance functions). The dataset is extremely sparse, which I think is why the generated clusters don't make much sense: many machines get grouped into the same cluster because they are very similar.

How can I get clusters that each contain an approximately equal number of points? Or is this perhaps not possible due to the sparseness of the data, and should I instead try to cluster on different attributes of the dataset?

How many attributes are you considering in your dataset? And how many examples?
– Pablo Suau, Commented Apr 10, 2015 at 13:14

Has QUIT--Anony-Mousse · Accepted Answer · 2015-04-28 15:29:24Z

Cluster analysis is not supposed to produce partitions of equal size. It is meant to discover structure in the data.

If the majority of objects is highly similar, then this majority is supposed to be in the majority cluster.

Consider the case where all your data is identical. Any clustering algorithm producing more than one cluster has failed, in my opinion...

So you may be using the wrong class of algorithms for your problem.
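To make the point about cluster sizes concrete, here is a small sketch of plain Lloyd-style k-means in pure Python. It is not the asker's setup: the data, the starting centroids, and the helper names (`squared_distance`, `kmeans`, `cluster_sizes`) are all assumptions chosen for illustration. It uses squared Euclidean distance and leaves an empty cluster's centroid unchanged.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid, ties go to the lowest index.
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: mean of each cluster; an empty cluster keeps its centroid.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_sizes(labels, k):
    """Number of points assigned to each of the k clusters."""
    return [labels.count(i) for i in range(k)]


# Hypothetical data: six machines with the same processes, two that differ.
skewed = [[1, 0, 1, 0]] * 6 + [[0, 1, 0, 0], [0, 0, 0, 1]]
skewed_sizes = cluster_sizes(kmeans(skewed, [skewed[0], skewed[6]]), 2)

# Hypothetical data: every machine is identical.
identical = [[1, 0, 1, 0]] * 5
identical_sizes = cluster_sizes(kmeans(identical, [identical[0], identical[1]]), 2)

print(skewed_sizes)     # [6, 2]
print(identical_sizes)  # [5, 0]
```

With these starting centroids, the six similar machines stay together and the second cluster is either small or empty. The uneven sizes reflect the data rather than a fault in the algorithm.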
