Clustering and Analysis in Data Mining
What is Clustering?
Clustering is the process of grouping a set of physical or abstract objects into classes of similar objects.
Why Clustering?
Clustering methods used in data mining are typically expected to meet the following requirements:
  • Scalability
  • Ability to deal with different types of attributes
  • Discovery of clusters with arbitrary shape
  • Minimal requirements for domain knowledge to determine input parameters
  • Ability to deal with noisy data
  • Incremental clustering and insensitivity to the order of input records
  • High dimensionality
  • Constraint-based clustering
  • Interpretability and usability
Data types in Cluster Analysis
  • Data matrix (or object-by-variable structure)
  • Interval-scaled variables
  • Binary variables
  • Categorical variables
  • Discrete ordinal variables
  • Ratio-scaled variables
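As a minimal sketch of how a data matrix of interval-scaled variables can be turned into pairwise dissimilarities, the example below standardizes each variable and then computes Euclidean distances. The function names and the choice of mean absolute deviation for scaling are assumptions made for illustration; other standardizations and distance measures are equally valid.

```python
import math

def standardize(column):
    """Scale one interval-scaled variable to zero mean and unit mean absolute deviation."""
    mean = sum(column) / len(column)
    mad = sum(abs(x - mean) for x in column) / len(column)
    # A constant column carries no information, so map it to zeros.
    return [(x - mean) / mad if mad else 0.0 for x in column]

def dissimilarity_matrix(data):
    """Build an n-by-n matrix of Euclidean distances from an n-by-p data matrix."""
    columns = [standardize(list(col)) for col in zip(*data)]
    rows = list(zip(*columns))
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]

# Hypothetical data: two variables measured on very different scales.
data = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
matrix = dissimilarity_matrix(data)
```

Standardizing first keeps the variable with the larger unit from dominating the distance.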
Methods used in clustering:
  • Partitioning methods
  • Hierarchical methods
  • Density-based methods
  • Grid-based methods
  • Model-based methods
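To illustrate the partitioning family, here is a minimal sketch of k-means, a widely used partitioning method that the slides do not name explicitly. The function name, the caller-supplied initial centroids, and the iteration cap are assumptions for the example.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])
```

The result depends on the initial centroids, which is why practical implementations usually try several starting configurations.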
Hierarchical methods in clustering
There are two types of hierarchical clustering methods:
  • Agglomerative hierarchical clustering
  • Divisive hierarchical clustering
Agglomerative hierarchical clustering
This bottom-up strategy starts by placing each object in its own cluster and then merges these atomic clusters into larger and larger clusters, until all of the objects are in a single cluster or until certain termination conditions are satisfied.
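The bottom-up merging can be sketched as follows. This is an illustrative version only: it assumes single-link (minimum pairwise) distance between clusters and uses a target number of clusters as the termination condition; the function names are hypothetical.

```python
import math

def single_link(a, b):
    """Distance between two clusters: the closest pair of points across them."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points, num_clusters=1):
    """Start with one cluster per object and merge the closest pair until num_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters = agglomerative(points, num_clusters=3)
```

This brute-force search recomputes all cluster distances on every merge, so it is only suitable for small inputs.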
Divisive hierarchical clustering
This top-down strategy does the reverse of agglomerative hierarchical clustering: it starts with all objects in one cluster and subdivides that cluster into smaller and smaller pieces, until each object forms a cluster on its own or until certain termination conditions are satisfied, such as a desired number of clusters being obtained or the diameter of each cluster falling within a certain threshold.
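A minimal sketch of the top-down strategy, using both termination conditions mentioned above (a desired number of clusters and a diameter threshold). The splitting rule is an assumption chosen for brevity: the two farthest points of the widest cluster act as seeds, and every member joins the nearer seed. It is not a specific published algorithm.

```python
import math
from itertools import combinations

def diameter(cluster):
    """Largest pairwise distance within a cluster (0.0 for a single object)."""
    return max((math.dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)

def split(cluster):
    """Split around the two farthest points; each member joins the nearer seed."""
    seed_a, seed_b = max(combinations(cluster, 2), key=lambda pq: math.dist(*pq))
    left = [p for p in cluster if math.dist(p, seed_a) <= math.dist(p, seed_b)]
    right = [p for p in cluster if math.dist(p, seed_a) > math.dist(p, seed_b)]
    return left, right

def divisive(points, max_clusters, max_diameter):
    """Start with one cluster and keep splitting the widest one until a stop condition holds."""
    clusters = [list(points)]
    while len(clusters) < max_clusters:
        widest = max(clusters, key=diameter)
        if diameter(widest) <= max_diameter:
            break  # every cluster is already within the threshold
        clusters.remove(widest)
        clusters.extend(split(widest))
    return clusters

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters = divisive(points, max_clusters=3, max_diameter=2.0)
```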
Density-Based methods in clustering
  • DBSCAN: A Density-Based Clustering Method Based on Connected Regions with Sufficiently High Density
  • OPTICS: Ordering Points to Identify the Clustering Structure
  • DENCLUE: Clustering Based on Density Distribution Functions
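The idea behind DBSCAN, growing clusters from connected regions of sufficiently high density, can be sketched as below. This is a simplified brute-force version for illustration; the parameter names `eps` and `min_pts` follow common usage, and the neighborhood search is a plain linear scan rather than an indexed lookup.

```python
import math

NOISE = -1

def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx] (including itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE if it is in no dense region."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster, not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the region
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
# labels == [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point at (50, 50) has no dense neighborhood and is reported as noise.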
Grid-Based methods in clustering
  • STING: STatistical INformation Grid
    STING is a grid-based multiresolution clustering technique in which the spatial area is divided into rectangular cells.
  • WaveCluster: Clustering Using Wavelet Transformation
    WaveCluster is a multiresolution clustering algorithm that first summarizes the data by imposing a multidimensional grid structure onto the data space. It then uses a wavelet transformation to transform the original feature space, finding dense regions in the transformed space.
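The multiresolution grid idea can be sketched with two small functions: one that summarizes 2-D points into rectangular cells, and one that builds the next coarser level from those summaries. This is a loose illustration in the spirit of STING, not the published algorithm; the stored statistics (count and coordinate sums), the 2x2 aggregation, and all names are assumptions.

```python
from collections import defaultdict

def cell_statistics(points, cell_size):
    """Bottom level: per-cell count and coordinate sums for 2-D points."""
    stats = defaultdict(lambda: {"count": 0, "sum_x": 0.0, "sum_y": 0.0})
    for x, y in points:
        cell = stats[(int(x // cell_size), int(y // cell_size))]
        cell["count"] += 1
        cell["sum_x"] += x
        cell["sum_y"] += y
    return dict(stats)

def coarsen(stats):
    """Next level up: each parent cell aggregates a 2x2 block of child cells."""
    parent = defaultdict(lambda: {"count": 0, "sum_x": 0.0, "sum_y": 0.0})
    for (ix, iy), child in stats.items():
        cell = parent[(ix // 2, iy // 2)]
        for key in cell:
            cell[key] += child[key]
    return dict(parent)

points = [(0.5, 0.5), (1.5, 0.5), (0.2, 1.7), (3.1, 3.9)]
fine = cell_statistics(points, cell_size=1.0)
coarse = coarsen(fine)
```

Because parent statistics are computed from child summaries alone, queries at a coarse level never need to revisit the raw points.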
Model-Based Clustering Methods
  • Expectation-Maximization
  • Conceptual clustering
  • Neural network approach
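As an illustration of Expectation-Maximization, the sketch below fits a one-dimensional Gaussian mixture. The choice of a Gaussian mixture model, the fixed iteration count, the variance floor, and the function names are assumptions for the example; the slides name EM without specifying a model.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, iterations=50):
    """Fit a 1-D Gaussian mixture by alternating E-steps and M-steps."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [weights[c] * gaussian_pdf(x, means[c], variances[c]) for c in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from the weighted data.
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            weights[c] = n_c / len(data)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / n_c
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / n_c
            variances[c] = max(var, 1e-6)  # floor avoids a collapsed component
    return means, variances, weights

data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
means, variances, weights = em_gmm_1d(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
```

Unlike k-means, each object receives a soft membership in every cluster, and the hard assignment is simply the component with the highest responsibility.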
Methods of Clustering High-Dimensional Data
  • CLIQUE: A Dimension-Growth Subspace Clustering Method
    CLIQUE (CLustering In QUEst) was the first algorithm proposed for dimension-growth subspace clustering in high-dimensional space.
  • PROCLUS: A Dimension-Reduction Subspace Clustering Method
    PROCLUS (PROjected CLUStering) is a typical dimension-reduction subspace clustering method. That is, instead of starting from single-dimensional spaces, it starts by finding an initial approximation of the clusters in the high-dimensional attribute space. Each dimension is then assigned a weight for each cluster, and the updated weights are used in the next iteration to regenerate the clusters.
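The dimension-growth idea can be sketched in two steps: find dense intervals in each single dimension, then build two-dimensional candidates only from intervals that were already dense. This is a simplified illustration in the spirit of CLIQUE, not the full algorithm; the equal-width intervals, the absolute count threshold, and the function names are assumptions.

```python
from collections import Counter
from itertools import combinations

def dense_units_1d(points, interval, threshold):
    """Dense (dimension, interval index) units: those holding at least threshold points."""
    counts = Counter()
    for p in points:
        for dim, value in enumerate(p):
            counts[(dim, int(value // interval))] += 1
    return {unit for unit, n in counts.items() if n >= threshold}

def dense_units_2d(points, interval, threshold):
    """Grow to 2-D: only pairs of dense 1-D units are counted as candidates."""
    dense_1d = dense_units_1d(points, interval, threshold)
    counts = Counter()
    for p in points:
        units = [(dim, int(value // interval)) for dim, value in enumerate(p)]
        for a, b in combinations(units, 2):
            if a in dense_1d and b in dense_1d:
                counts[(a, b)] += 1
    return {unit for unit, n in counts.items() if n >= threshold}

points = [(0.1, 0.2, 9.0), (0.3, 0.4, 5.5), (0.2, 0.1, 2.1), (7.5, 0.3, 3.3)]
dense = dense_units_2d(points, interval=1.0, threshold=3)
# dense == {((0, 0), (1, 0))}: dimensions 0 and 1 form a dense subspace unit
```

Pruning works because a unit cannot be dense in two dimensions unless its projections onto each dimension are dense as well.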
Constraint-Based Cluster Analysis
Constraint-based clustering finds clusters that satisfy user-specified preferences or constraints. A few categories of constraints are:
  • Constraints on individual objects
  • Constraints on the selection of clustering parameters
  • Constraints on distance or similarity functions
  • User-specified constraints on the properties of individual clusters
  • Semi-supervised clustering based on “partial” supervision
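One common way to express partial supervision is through pairwise constraints: pairs of objects that must share a cluster and pairs that must not. The sketch below only checks a finished clustering against such constraints. The must-link and cannot-link terminology and the function name are assumptions for illustration and do not come from the slides.

```python
def satisfies_constraints(labels, must_link, cannot_link):
    """Return True if a clustering respects every pairwise constraint.

    labels maps each object to its cluster id; must_link and cannot_link
    are lists of (object, object) pairs.
    """
    for a, b in must_link:
        if labels[a] != labels[b]:
            return False  # objects required to be together were separated
    for a, b in cannot_link:
        if labels[a] == labels[b]:
            return False  # objects required to be apart were grouped
    return True

labels = {"a": 0, "b": 0, "c": 1}
ok = satisfies_constraints(labels, must_link=[("a", "b")], cannot_link=[("a", "c")])
```

A constrained clustering algorithm would run a check like this during assignment, rejecting any placement that breaks a constraint.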
Visit more self-help tutorials
Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free and self-guiding, and does not involve any additional support.
Visit us at www.dataminingtools.net
