
A conceptually different clustering approach is point-set partitioning. Essentially this procedure is a form of K-means clustering, as described in Section 4.2.12 on classification. The user specifies the value K, and a set of K random points is placed in the study region as seed points. Each point in the dataset is then allocated to its nearest seed point. The points assigned to each seed are then used to create a new set of seed points, located at the centres of these initial groupings, and the allocation step is repeated. The procedure continues until the sum of distances (or squared distances) from each point to its cluster centre cannot be reduced significantly by further iterations. Some implementations run the procedure multiple times with different initial seeds, selecting as the final result the solution that minimises overall cluster dispersion. Another widely used option, particularly with large datasets and many variables (dimensions), involves ‘training’ the selection by analysing the clusters for a subset of the data and then using the best solution set as the starting point for the entire dataset. Crimestat, working on 2-dimensional point-set clustering, attempts to identify very good starting points for the initial seeds by a form of simple density analysis (placing a grid over the point set and identifying distinct areas of point concentration).
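
The basic procedure, together with the multiple-restart option, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the implementation used by any particular package: the study region is taken to be the bounding box of the data, dispersion is measured as the sum of squared distances, and the function names (`assign`, `dispersion`, `kmeans`, `best_of_restarts`) and default parameter values are hypothetical.

```python
import math
import random


def assign(points, centres):
    """Allocation step: index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
            for p in points]


def dispersion(points, centres, labels):
    """Sum of squared distances from each point to its cluster centre."""
    return sum(math.dist(p, centres[lab]) ** 2
               for p, lab in zip(points, labels))


def kmeans(points, k, max_iter=100, tol=1e-9, rng=None):
    """One K-means run on 2-D points; returns (centres, labels, dispersion)."""
    if rng is None:
        rng = random.Random()
    xs, ys = zip(*points)
    # Seed points: K random locations inside the bounding box of the data
    # (an assumed stand-in for "the study region").
    centres = [(rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
               for _ in range(k)]
    previous = math.inf
    for _ in range(max_iter):
        labels = assign(points, centres)
        current = dispersion(points, centres, labels)
        # Stop when a further iteration no longer reduces the dispersion.
        if previous - current <= tol:
            break
        previous = current
        # Update step: move each seed to the centre (mean) of its group.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # a seed that attracted no points stays where it is
                centres[j] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    labels = assign(points, centres)
    return centres, labels, dispersion(points, centres, labels)


def best_of_restarts(points, k, restarts=10, seed=0):
    """Repeat the run with different seeds; keep the least dispersed result."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng=rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[2])
```

Calling `best_of_restarts(points, 3)` on a list of `(x, y)` tuples returns the cluster centres, a cluster index for each point, and the total dispersion of the best of the ten runs. Because a single run can stall in a poor configuration (for example when one seed attracts no points), the restarts matter more than their simplicity suggests.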

Unlike NNh clustering, the K-means procedure assigns every event to exactly one cluster, and the clusters do not form hierarchical groupings. Its dependence on the user's choice of K, and the sub-optimal nature of the underlying algorithm (which is not guaranteed to find the partition with the lowest overall dispersion), are distinct weaknesses. Dependence on K can be reduced by systematically increasing K from 1 upwards, examining the weighting of cluster centres on the problem variables and plotting the total and average cluster dispersion values as K increases.
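
The idea of tabulating dispersion against K can be illustrated with a short, self-contained Python sketch. Several choices here are assumptions made for the example rather than part of the procedure described above: seeds are drawn from the data points themselves, total dispersion is the sum of squared distances to cluster centres, average dispersion is that total divided by the number of points, and the function names `total_dispersion` and `dispersion_curve` are hypothetical.

```python
import math
import random


def total_dispersion(points, k, restarts=10, max_iter=100, seed=0):
    """Lowest sum of squared point-to-centre distances over several runs."""
    rng = random.Random(seed)
    best = math.inf
    for _ in range(restarts):
        # Assumption: the k seeds are sampled from the data points.
        centres = rng.sample(points, k)
        for _ in range(max_iter):
            # Allocate every point to its nearest centre.
            groups = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda j: math.dist(p, centres[j]))
                groups[nearest].append(p)
            # Move each centre to the mean of its group (empty groups stay put).
            new_centres = [
                tuple(sum(c) / len(g) for c in zip(*g)) if g else centres[j]
                for j, g in enumerate(groups)
            ]
            if new_centres == centres:
                break
            centres = new_centres
        sse = sum(math.dist(p, centres[j]) ** 2
                  for j, g in enumerate(groups) for p in g)
        best = min(best, sse)
    return best


def dispersion_curve(points, k_max):
    """Rows of (K, total dispersion, average dispersion per point)."""
    rows = []
    for k in range(1, k_max + 1):
        total = total_dispersion(points, k)
        rows.append((k, total, total / len(points)))
    return rows


if __name__ == "__main__":
    # Three compact groups of four points each (illustrative data only).
    data = [(x + dx, y + dy)
            for x, y in [(0, 0), (10, 0), (0, 10)]
            for dx in (0, 1) for dy in (0, 1)]
    for k, total, average in dispersion_curve(data, 5):
        print(f"K={k}  total={total:.2f}  average={average:.2f}")
```

Plotting the second or third column against K typically shows the dispersion falling steeply at first and then levelling off; the value of K at which the curve flattens is a reasonable candidate, although the choice remains a matter of judgement.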
