 INTRODUCTION  STING  WAVECLUSTER  CLIQUE-Clustering in QUEST  FAST PROCESSING TIME
 The grid-based clustering approach uses a multi-resolution grid data structure.
 The object space is quantized into a finite number of cells that form a grid structure.
 The major advantage of this method is fast processing time: it depends only on the number of cells in each dimension of the quantized space, not on the number of data objects.
 STING: STatistical INformation Grid.
 The spatial area is divided into rectangular cells.
 There are several levels of cells, at different levels of resolution.
 Each high-level cell is partitioned into several lower-level cells.
 Statistical attributes (mean, maximum, minimum) are stored in each cell.
 Computation is query-independent.
 Parallel processing is supported.
 Data is processed in a single pass.
 Clustering quality depends on the granularity of the lowest grid level.
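The grid statistics above can be sketched in a few lines. This is a minimal, illustrative simplification (not the full STING algorithm, and the function name and grid size are assumptions): points in the unit square are assigned to rectangular cells, and each cell stores summary statistics of an attribute value.

```python
import numpy as np

def cell_stats(points, n_cells=4):
    """Partition (x, y, value) points in [0, 1)^2 into an n_cells x n_cells
    grid and return per-cell (count, mean, min, max) of the attribute value."""
    stats = {}
    for x, y, value in points:
        cell = (int(x * n_cells), int(y * n_cells))  # which rectangular cell
        stats.setdefault(cell, []).append(value)
    return {c: (len(v), float(np.mean(v)), min(v), max(v))
            for c, v in stats.items()}

points = [(0.1, 0.1, 5.0), (0.2, 0.15, 7.0), (0.8, 0.9, 3.0)]
stats = cell_stats(points)
print(stats[(0, 0)])  # two points fall in cell (0, 0)
```

In STING proper, higher-level cells aggregate these statistics from their children, so queries can be answered from the precomputed summaries alone, which is why computation is query-independent.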
 A multi-resolution clustering approach that applies a wavelet transform to the feature space.
 A wavelet transform is a signal-processing technique that decomposes a signal into different frequency sub-bands.
 Both grid-based and density-based.
 Input parameters: the number of cells for each dimension, the wavelet, and the number of applications of the wavelet transform.
 Complexity O(N).
 Detects arbitrarily shaped clusters at different scales.
 Not sensitive to noise and not sensitive to input order.
 Only applicable to low-dimensional data.
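The core WaveCluster idea can be sketched with one level of a 2-D Haar transform (an assumed simplification; WaveCluster itself supports other wavelets and multiple decomposition levels): quantize the data into a grid of cell counts, then keep the low-frequency (approximation) sub-band, where clusters appear as connected regions of high average density.

```python
import numpy as np

def haar_lowpass_2d(grid):
    """One level of 2-D Haar averaging: each output cell is the mean of a
    2x2 block of input cells, i.e. the LL (approximation) sub-band."""
    g = np.asarray(grid, dtype=float)
    return (g[0::2, 0::2] + g[0::2, 1::2] +
            g[1::2, 0::2] + g[1::2, 1::2]) / 4.0

grid = np.zeros((4, 4))
grid[:2, :2] = 8            # a dense 2x2 block of cells (one "cluster")
ll = haar_lowpass_2d(grid)
print(ll)                   # the cluster survives as one high-valued cell
```

Clustering is then done on the transformed grid; applying the transform repeatedly yields the coarser scales at which differently sized clusters become visible.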
CLIQUE can be considered as both density-based and grid-based:
1. It partitions each dimension into the same number of equal-length intervals.
2. It partitions an m-dimensional data space into non-overlapping rectangular units.
3. A unit is dense if the fraction of total data points contained in the unit exceeds an input model parameter.
4. A cluster is a maximal set of connected dense units within a subspace.
 Attempts to optimize the fit between the data and some mathematical model.
 ASSUMPTION: data are generated by a mixture of underlying probability distributions.
 TECHNIQUES:
 Expectation-maximization
 Conceptual clustering
 Neural network approach
 An ITERATIVE REFINEMENT ALGORITHM used to find parameter estimates; an EXTENSION OF K-MEANS.
 Assigns an object to a cluster according to a weight representing its probability of membership.
 Starts with an initial estimate of the parameters.
 Iteratively reassigns scores and re-estimates the parameters.
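The loop described above can be sketched for a 1-D two-component Gaussian mixture (an assumed simplification with equal, fixed variances): the E-step computes the membership weights, and the M-step re-estimates each mean as a weighted average, which is exactly the soft-assignment extension of k-means.

```python
import numpy as np

def em_means(x, mu, n_iter=25, sigma=1.0):
    """Estimate the two component means of a 1-D Gaussian mixture by EM."""
    x = np.asarray(x, dtype=float)
    mu = np.array(mu, dtype=float)
    for _ in range(n_iter):
        # E-step: responsibility (membership weight) of each component
        d = np.exp(-0.5 * ((x[:, None] - mu[None, :]) / sigma) ** 2)
        w = d / d.sum(axis=1, keepdims=True)
        # M-step: update each mean as the responsibility-weighted average
        mu = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
    return mu

data = [0.0, 0.2, -0.1, 5.0, 5.3, 4.9]
print(sorted(em_means(data, mu=[0.5, 4.0])))  # means near 0 and 5
```

Replacing the soft weights `w` with hard 0/1 assignments recovers ordinary k-means.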
 A form of clustering in machine learning.
 Produces a classification scheme for a set of unlabeled objects.
 Finds a characteristic description for each concept (class).
 COBWEB
 A popular and simple method of incremental conceptual learning.
 Creates a hierarchical clustering in the form of a classification tree.
Classification tree (animal example):
Animal: P(C0)=1.0, P(scales|C0)=0.25
  Fish: P(C1)=0.25, P(scales|C1)=1.0
  Amphibian: P(C2)=0.25, P(moist|C2)=1.0
  Mammal/bird: P(C3)=0.5, P(hair|C3)=0.5
    Mammal: P(C4)=0.5, P(hair|C4)=1.0
    Bird: P(C5)=0.5, P(feathers|C5)=1.0
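Each node of such a tree stores the conditional probabilities of attribute values given the concept. A minimal sketch of how those node statistics are estimated from counts (the toy data below is hypothetical, chosen to echo the animal example):

```python
# Hypothetical labeled toy data echoing the classification tree above.
animals = [
    {"class": "fish",   "scales": True,  "hair": False},
    {"class": "mammal", "scales": False, "hair": True},
    {"class": "mammal", "scales": False, "hair": True},
    {"class": "bird",   "scales": False, "hair": False},
]

def p_attr_given_class(data, attr, cls):
    """Estimate P(attr | class) as the fraction of class members with attr."""
    members = [a for a in data if a["class"] == cls]
    return sum(a[attr] for a in members) / len(members)

print(p_attr_given_class(animals, "hair", "mammal"))   # all mammals have hair
print(p_attr_given_class(animals, "scales", "mammal")) # no mammal has scales
```

COBWEB itself builds the tree incrementally, placing each new object so as to maximize a category-utility score computed from exactly these probabilities.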
 Represents each cluster as an exemplar, acting as a “prototype” of the cluster.
 New objects are assigned to the cluster whose exemplar is the most similar according to some distance measure.
SELF-ORGANIZING MAP (SOM)
 Competitive learning.
 Involves a hierarchical architecture of several units (neurons).
 The organization of the units forms a feature map.
 Used, for example, for web document clustering.
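One competitive-learning step of a SOM can be sketched as follows (a 1-D map of units for brevity; the learning rate and neighborhood radius are illustrative constants): the best-matching unit wins the competition, and it and its neighbors move toward the input, which is what gradually organizes the units into a feature map.

```python
import numpy as np

def som_step(weights, x, lr=0.5, radius=1):
    """One update step. weights: (n_units, dim) array; x: input vector."""
    # competition: find the best-matching unit (closest weight vector)
    bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    # cooperation: the winner and its map neighbors move toward the input
    for j in range(len(weights)):
        if abs(j - bmu) <= radius:
            weights[j] += lr * (x - weights[j])
    return weights, bmu

w = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
w, winner = som_step(w, np.array([1.1, 0.9]))
print(winner)  # unit 1 is the best-matching unit
```

Repeating this over many inputs, with the learning rate and radius shrinking over time, yields the trained feature map.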
FEATURE TRANSFORMATION METHODS
 PCA, SVD: summarize the data by creating linear combinations of the attributes.
 But they do not remove any attributes; the transformed attributes can be complex to interpret.
FEATURE SELECTION METHODS
 Find the most relevant subset of attributes with respect to the class labels.
 Example: entropy analysis.
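The feature-transformation point can be sketched with PCA computed via SVD (the toy matrix below is illustrative): the principal components are linear combinations of the original attributes, so nothing is removed, but the new axes no longer correspond to single, interpretable attributes.

```python
import numpy as np

# Toy data: the first attribute has far more variance than the second.
X = np.array([[2.0, 0.1], [4.0, -0.2], [6.0, 0.15], [8.0, -0.05]])

Xc = X - X.mean(axis=0)                       # center each attribute
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
projected = Xc @ Vt.T                         # scores along the components

print(Vt[0])  # first component: a linear combination of both attributes,
              # dominated by the high-variance one
```

Feature selection, by contrast, would simply keep the original attributes deemed most relevant (e.g. by entropy analysis), which preserves interpretability.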

Grid-based method & model-based clustering method
