Parent topic Previous topic Next topic 

Harvey (1969, p.346) describes the key steps of multivariate classification as follows:

·         quantitative analysis of the inter-relationships among the attributes or among the objects

·         transformation or reduction of the correlations to a geometric structure with known properties (usually Euclidean)

·         grouping or clustering of the objects or attributes on the basis of the distance measured in this transformed space, and once the classes have been firmly identified

·         the development of rules for assigning phenomena to classes

Important additional steps to those listed by Harvey are in-process and post-process analysis of the quality of classification achieved. In-process quality analysis attempts to use multiple classification approaches (typically soft classifiers ― see further, below) to evaluate the robustness and strength of the classes that have been defined. Alternatively or additionally, random samples of classified data (e.g. pixels) are taken and compared with the ground truth, by field survey (using new or previously collected data).

Classification of datasets based purely on attribute information (i.e. ignoring spatial relationships) can readily be conducted using general purpose statistical and mathematical toolsets. Some techniques, such as clustering based on multivariate analysis of variance, require that input datasets be Normally distributed, but many methods make no such assumptions. Most statistical and mathematical libraries provide a range of tools to facilitate classification and assignment. These typically include:

·         facilities that can be used to reduce the complexity (dimensionality) of the source datasets, such as factor analysis, principal components analysis and multidimensional scaling

·         facilities to identify clusters within the dataset, either on a single level for a fixed number of clusters (e.g. K-means clustering) or on a hierarchical basis (typically techniques that rely on some form of linkage function, usually based on a function of the simple n-dimensional Euclidean distance between points within separate clusters); and

·         facilities for optimally assigning new observations to existing classes (e.g. linear discriminant analysis)

Before we examine the application of such methods to the analysis of multi-band image data (the main application area for such methods within GIS and remotely-sensed image processing packages), we first summarise the K-means and linear discriminant analysis procedures mentioned above.

K-means clustering attempts to partition a multivariate dataset into K distinct (non-overlapping) clusters such that points within a cluster are as close as possible in multi-dimensional space, and as far away as possible from points in other clusters. The input dataset can be viewed as a set of objects with an associated set of n (typically real-valued) measures or attributes. Each object can then be viewed as a single point or vector x={x1,x2,x3…,xn} in n‑dimensional space. The cluster procedure is then as follows:

·         The method starts with a set of K initial cluster centres. These might be assigned: (i) as K random locations in n-space within the observed data ranges; (ii) as K uniformly sited locations within the observed data ranges; (iii) as a user-defined set of K locations; (iv) by selecting K datapoints at random from the observed set as cluster centres; or (v) as a set of K locations obtained by clustering a small subset of the data using random or uniform allocation. A minimum separation distance for starting points may be specified

·         The distance between every point and each of the K means is computed, based on some prior defined metric (often using squared Euclidean, L22, or City block, L1, as these are both fast to compute ― see further, Section 4.4.1). The Idrisi software package, for example, uses Euclidean distances and the general expression:

where dik defines the distance from the ith pixel in an image to the kth centroid, Xin is the n-element (band) vector of the ith pixel, and Cin is the n-element vector of the kth centroid

·         Points are assigned to the nearest centre (i.e. it is a locally optimising greedy heuristic; see further, Section 7.2.2.1). Any entirely empty clusters may be discarded and (optionally) a new start point added

·         The location of the centre of each cluster is then re-computed based on the set of points allocated to that centre, and the previous step is then repeated. This sequence is iterated until no further reassignments occur, or a preset number of iterations have been completed, or no cluster centre is moved by more than a pre-specified small amount. The total distance (DSUM) of all points to the centres of the K clusters to which they have been assigned is calculated

·         Each point is then re-examined in turn (Phase 2 of the process) and checked to see if DSUM is reduced if the point is assigned to another cluster. If DSUM is reduced the point is reassigned to the cluster that results in the maximum overall reduction

·         Stopping criteria may be determined by setting a limit to the number of iterations of the process (e.g. 50), or the level of change in DSUM, or the number or percentage of data points that continue to be re-allocated (e.g. 1%), or a combination of these options

·         Special consideration may be given to the re-allocation of data items where the cluster membership is very small (e.g. contain less than 1% of the data). The option to remove low membership clusters and re-allocate their members (i.e. consolidating the clusters to provide a smaller number of larger clusters) is quite common

·         The entire process may then be repeated with a different choice of starting points. This may result in an overall reduction in the objective function, DSUM, since the algorithm is not guaranteed to provide an optimal partitioning

·         Generic clustering algorithms, such as K-means, may result in many spatially isolated pixels or zones, which may be members of a well-represented class (>>1%). In some instances a smoothing filter can consolidate such occurrences with their surrounding zones, but must be applied with care since the resultant classification will knowingly contain mixtures

·         It is often also the practice to assign different weights to clusters when finalising the classes, in accordance with a priori interpretation of the resulting outcomes. The benefits of so doing should be set out in a clear and transparent manner in order to aid interpretation of the results by third parties

Linear Discriminant Analysis (LDA), as applied within Idrisi, assumes we have P classes and P linear discriminant functions. These functions, which are rather like linear regression equations, are computed by analysis of training site data comprised of m image bands. Having computed the weights for each of these linear functions, a set of P weighted sums is then calculated for each pixel in the target image and the sum that has the highest value or score, Si, is taken as the class to which the pixel is to be assigned. The sum is of the form:

where ci is a constant for class i, xj is the observed pixel value in class j, and wi,j is the pre-computed weight applying to band j and class i. Hence with m=7 spectral bands there would be a set of 7+1 associated weights for each discriminant function. If there were P=5 such functions (i.e. 5 classes), there would be a matrix, W, of 40 weight values, and each pixel would be assigned to one of the 5 classes.

Most GIS packages, such as such as TNTMips, Idrisi, GRASS and ArcGIS Spatial Analyst, include a range of multivariate classification procedures. These are principally for image classification and pixel assignment, as are those provided within the specialised image processing suites, ERDAS from Leica and ENVI from RSI. The most commonly provided include K-means, ISODATA, and Maximum Likelihood procedures. More details, with specific reference to multi-band and hyperspectral image analysis, are provided in Sections 4.2.12.3 and 4.2.12.5, below. Details of image classification techniques that use artificial neural network (ANN) methods are provided in Section 8.3. Details of many of the traditional methods described may be found in Tou and Gonzales (1974, Chapter 3), Ball and Hall (1965), Richards (1986) and Burrough and McDonnell (1998, Ch.11).

  Back to Top    Back to Home Parent topic Previous topic Next topic