|
|
In this subsection we provide a brief description of many of the multi-band classifier procedures generally available, with special reference to those provided within one of the main raster-focused GIS packages, Idrisi. Other packages, notably ENVI, also provide many of these facilities.
Within GIS and remotely-sensed image processing software multivariate classification methods are almost exclusively applied to image datasets — typically multi-band image files. The objective is to identify distinct areas or features (collections of pixels or cells) such as forest, water, grassland and buildings and assign all occurrences of such features to distinct classes. This is either performed semi-automatically (supervised, requiring training site data) or automatically (unsupervised).
Supervised classification (S) involves the manual analysis of selected images to identify features of interest and assign polygonal regions to user-defined classes. The statistical characteristics of these regions within and across rasters, sometimes described as signatures, can then be used to guide subsequent classification. Special toolsets are generally provided for signature development. It may also be necessary to normalise the datasets (to ensure the range of values in each band is similar) and/or to (dimensionally) transform the datasets in some way before proceeding to the formal clustering and assignment processes (for example using techniques such as principal components analysis, PCA, or factor analysis, FA).
Unsupervised classification (U) either does not require training data to be specified in advance, or performs a form of automated selection from the input images. With these techniques it is often helpful to specify more clusters than you would ultimately like or expect and then to combine some of those produced if they seem essentially the same in terms of your own objectives rather than purely computational factors. For example separate clusters may be produced for deciduous and coniferous vegetation cover, but if the objective is to classify areas as woodland without separating these by type, post classification combination may be effective. Most of the more sophisticated raster-based GIS packages provide such facilities, for example TNTMips (see the TNTMips Image Classification Guide) and Idrisi (see the Idrisi Guide to GIS and Image Processing), as well as the more generic ArcGIS Spatial Analyst toolset and the specialised image processing suites, ERDAS from Leica and ENVI from RSI. Several of these packages refer to Richards (1986) as a primary source for many of the standard procedures they adopt for classification and interested readers are recommended to refer to this volume for more information.
Each pixel of a multi-band image to be classified can be regarded as being comprised of a vector x={x1,x2,x3…,xm} of data values, one per band. Typically the data are real-valued and positive. In so-called remotely-sensed multi-band image processing the number of bands is generally not very large, typically 7 or fewer. In such multi-band images it is often the case that most of the useful information is contained in just two or three bands, for example the red, near infra-red and middle infra-red bands, and in some instances these may be used rather than the full set. Hyperspectral images, on the other hand, are comprised of a very large number of finely divided spectral bands, from 10s to 100s of such bands. A third type of multi-band image is one in which the images are co-registered but are not all derived from remote-sensing ― some of the bands or raster layers may be derived from other forms of survey and/or be computationally derived (e.g. rasters generated by rasterizing zonal, linear or point vector data). An example involving the analysis of such data using artificial neural network (ANN) methods is provided in Section 8.3.3.2.
Table 4‑6 summarises the principal set of so-called hard classifiers (H), some of which also provide soft classification as an option. Hard classifiers assign every pixel in the target image to one of n classes, or to a null class (usually coded as 0). Soft classifiers (S) assign pixels to classes with a value that indicates their degree of membership of that class. This may be expressed as a probability value, pk, with the sum over all k=1..P classes being 1 (or 100%), or may be a value that is not regarded as a probability and may not sum to 1 (typically summing to less than 1). Display of soft classified images is often as a series of separate layers, one per class, with the pixels being colour- or grey-scale coded to indicate their degree of membership.
The output of soft classifiers can be hardened, in a variety of ways. One approach is to take the output from a soft classification and generate a set of classification maps that provide the first (most likely/highest scoring), second etc. class assignments as separate displays. Another approach is to utilise ancillary data, such as (buffered) GIS vector data that is overlaid on the classified results. For example, this may assist in identifying classes that are or are not built-up land (i.e. increasing the strength of evidence for specific classes) based on the known distance from roads. Yet another approach is to utilise prior geostatistical analysis of higher resolution remotely-sensed imagery obtained from a smaller area, or field survey data, to provide enhanced guidance for the classification process.
Most of the classification procedures described in Table 4‑6 are non-hierarchical, i.e. they classify pixels in the target image into distinct classes that are not grouped into levels. Some do provide a tree-like structure (e.g. CTA). In this case, conceptually one might imagine high level groups such as Built, Wetland, Forest, Grass, and Rock. Forest, for example, might be divided into deciduous and evergreen. Deciduous might be further divided into species-type subgroups. In Table 4‑6 the H/S column identifies whether the classification is hard or soft, with some classifiers providing both options. The S/U column identifies whether the procedure is essentially supervised (S) or unsupervised (U). In addition, and not described in detail here, are mixture analyses, in which pixels may be allocated to one or more than one class simultaneously. For example, a pixel might be assigned to two different kinds of woodland or two different kinds of soil type, thus regarded as being of explicitly mixed rather than single class membership. Mixture assignments are usually on an hierarchical basis, e.g. with 3 classes [A], [B] and [C] the 7 possible mixtures would be [A], [B], [C], [A,B], [A,C], [B,C], and [A,B,C]. Clearly there may be a very large number of mixture class combinations, so this approach must be applied with caution.
Table 4‑6 Image classification facilities — Selected classifiers
|
Classifier (Idrisi function shown in CAPS) |
S/U |
H/S |
Description |
|
Simple one-pass clustering |
U |
H |
A technique that generates up to P clusters by assigning each input cell to the nearest cluster if its Euclidean distance is less than a given threshold. If not the cell becomes a new cluster centre. It principal merit is speed, but its quality of assignment may not be acceptable (see also, PIPED, below) |
|
Parallelepiped |
S |
H |
Based on a set of lower and upper threshold values determined for a signature on each band. To be assigned to a particular class, a pixel must exhibit values within this range (absolute limits or standard deviation, for training dataset) for every band considered. Non-unique assignment with no assignment (Class=0) if a pixel lies outside the threshold for all signatures. Very fast, but generally not recommended for use because the class ‘boxes’ defined by the thresholds typically overlap, meaning that pixel assignment to specific classes is arbitrary in these regions |
|
Minimum distance to mean |
S |
H |
Essentially the same as simple one-pass clustering but cluster centres are pre-determined by analysis of a training dataset, using the mean values on each band for a signature. Pixels are assigned to the class with the mean closest to the value of that pixel. Applied when the number of pixels used to define signatures is very small or when training sites are not well defined. No assignment (Class=0) is made if a pixel lies outside a maximum search distance set by the user for all signatures. Data may be raw values or pre-normalised. Use in preference to MAXLIKE if training sites are not well defined in terms of classes. Fast, often performing well, but does not account for the variability between classes since it uses mean values only ― MAXLIKE (below) is generally a preferable approach (see also ISOCLUST) |
|
Maximum likelihood (MAXLIKE) |
S |
H |
A method that uses statistical analysis (variance and covariance) of a training dataset (class signatures) whose contents are assumed to be Normally distributed (prior transformation of the input dataset may therefore be advisable). It seeks to determine the probability (or likelihood) that a cell should be assigned to a particular cluster, with assignment being based on the computed Maximum Likelihood value. MAXLIKE operates in a similar manner to MINDIST but takes account of correlation between bands. Pixels are assigned to the most likely class based on a comparison of the posterior probability that it belongs to, each of the signatures being considered (i.e. Bayesian). Prior probabilities may be set in various ways, including uniform (all classes equally likely) or using separate knowledge of some or all classes, e.g. 30% is expected to be woodland, 20% grassland, etc, where the labels woodland and grassland correspond to specific signatures. Unique assignment ― no assignment (Class=0) is made if a pixel lies outside a pre-specified probability level (e.g. 1%, 5%, i.e. less than x% likelihood of belonging to any of the signature classes). Requires number of pixels >10 times number of bands. Limitations (Idrisi): number of bands <65; number of signatures<256. Quality heavily affected by homogeneity of training sites. Relatively fast (see also ISOCLUST) |
|
(Stepwise) Linear Discriminant (FISHER) |
S |
H |
This is essentially a Linear Discriminant Analysis (LDA) method (see Section 4.2.12.2), which attempts to compute linear functions of the dataset variables that best explain or discriminate between values in a training dataset. With the stepwise variant new linear functions are added incrementally, orthogonal to each other, and then these functions are used to assign all data points to the classes. The criterion function minimised in such methods is usually Mahalanobis distance, or the D2 function. Linear Discriminant Analysis of the training site data is used to form a set of linear functions that express the degree of support for each class. The assigned class for each pixel is then that class which receives the highest support after evaluation of all functions. In the Idrisi implementation there are as many classification functions as there are classes. Each function allows classification scores to be computed for each pixel for each class. Limitations (Idrisi): number of bands <65; number of signatures<256. Unique assignment. All pixels are assigned to classes. Relatively fast |
|
K-nearest neighbours |
S |
H,S |
In this procedure the Euclidean distance is calculated for each pixel in an input image to all sampled training pixels in all classes. Then, for the specified k, only those k-nearest neighbours are examined in determining an image pixel’s class or degree of membership to a class. For the hard classification output, the class that dominates the k-nearest neighbours is assigned to that pixel. For the soft classification output, the proportion for each category among the k-nearest neighbours is evaluated. For each class an image is produced with each pixel assigned its respective proportion. Variable speed of operation, depending on parameters, especially size of k-value |
|
Histogram clustering |
U |
H |
CLUSTER uses a histogram peak technique of cluster analysis. This is equivalent to looking for the peaks in a one-dimensional histogram, where a peak is defined as a value with a greater frequency than its neighbours on either side (see Figure 4‑22). Once the peaks have been identified, all possible values are assigned to the nearest peak and the divisions between classes fall at the midpoints between peaks. In Idrisi a maximum of 7 image bands can be used as input, and pixels are assigned to a maximum of 255 classes. A 1-dimensional to 7-dimensional histogram is used to find the peaks. A peak is a class where the frequency is higher than all of its non-diagonal neighbours (broad level), or higher than all but one of its non-diagonal neighbours (fine level). Input images are pre-processed into 256 grey-scale values, generally with the tails of the histograms cut off (e.g. 2.5% at each end). CLUSTER can be used to develop class signatures that may then be applied to supervised classification systems like MAXLIKE. Fast |
|
U |
H |
Similar to the K-means procedure but at each iteration the various clusters are examined to see if they would benefit from being combined or split, based on a number of criteria: (i) combination — if two cluster centres are closer than a pre-defined tolerance they are combined and a new mean calculated as the cluster centre; (ii) if the number of members in a cluster is below a given level (e.g. 20) the cluster is discarded and the members re-assigned to the closest cluster; and (iii) separation — if the number of members, or the standard deviation, or the average distance from the cluster centre exceed pre-defined values then the cluster may be split. ISOCLUST is an automated procedure that combines the CLUSTER operation with the MAXLIKE cluster assignment process, applied iteratively ― both are described above. It is similar to the ISODATA procedure (see Dunn, 1973). CLUSTER is used to generate initial cluster centres (seed locations) and these define the signatures which are then used by MAXLIKE to determine cluster assignments. Restrictions are as per CLUSTER and MAXLIKE. The procedure is quite slow, depending on the complexity of the images and the number of clusters, but is claimed to produce very strong cluster assignment. |
|
|
K-means |
U |
H |
The classical K-means algorithm has been described earlier in this Guide (Section 4.2.12.2). Limitations (Idrisi): n<65 image bands and K<256 clusters. Very dependent on selection of number of clusters and their initial centre locations (seed locations). This is a heuristic, greedy algorithm for minimizing SSE (Sum of Squared Errors) hence it may not converge to a global optimum. Three options are provided in Idrisi for seed locations: random partitioning ― each pixel is assigned at random to one of K clusters and the cluster centres are computed from the assigned points; random centres ― randomly selects K pixels as the initial centres and assigns other pixels to these; and diagonal values ― the range of pixel data values in n-space is computed and values assigned at even spacing along this [min] to [max] n-vector. Relatively fast |
|
Fuzzy c-means (FCM) |
U |
H/S |
Similar to the K-means procedure, originally developed by Dunn (1973) but uses weighted distances rather than unweighted distances. Weights are computed from prior analysis of training data for a specified number of classes. These cluster centres then define the classes and all cells are assigned a membership weight for each cluster. The process then proceeds as for K-means but with distances weighted by the prior assigned membership coefficients (see also, Section 4.2.13.4) |
|
Minimum distribution angle (MDA) |
U |
H |
An iterative procedure similar to K-means but instead of computing the distance from points to selected centres this method treats cell centres and data points as directed vectors from the origin. The angle between the data point and the cluster centre vector provides a measure of similarity of attribute mix (ignoring magnitude). This concept is similar to considering mixes of red and blue paint to produce purple. It is the proportions that matter rather than the amounts of paint used. See also, Spectral angle clustering in Section 4.2.12.5 |
|
Multi-level perceptron (MLP) |
S/U |
H/S |
A neural network technique. Described in detail in Section 8.3.1 et seq with a worked example. Fairly slow |
|
Self organising maps |
S/U |
H |
A neural network technique based on Kohonen (1990). Described in detail in Section 8.3.3 et seq. With the unsupervised model variant K-means is generally used to locate the initial clusters. Moderate speed of clustering |
|
Fuzzy ARTMAP |
S/U |
H |
ART stands for Adaptive Resonance Theory, a form of neural network model. Fuzzy ART is a clustering algorithm that operates on vectors with fuzzy input patterns (real numbers between 0.0 and 1.0) and incorporates an incremental learning approach which allows it to learn continuously (remembering previous learned states). The MAP part of the name refers to the use of an additional MAP layer in the neural network model when used in supervised mode. For more details see Mannan and Roy (1998), or Fischer (2006, Ch.11) |
|
Classified tree analysis |
S |
H/S |
A univariate hierarchical data splitting procedure, that progressively divides the training dataset pixels into two classes based on a splitting rule, and then further subdivides these two classes (i.e. a binary tree structure). Division (splitting) may be achieved using the entropy statistic or similar measures (e.g. gain ratio, Gini statistic). See further, Zambon et al. (2006) and Idrisi’s focus paper describing the technique |
Figure 4‑22 SPOT Band 1 image histogram - distinct peaks highlighted (CLUSTER)

|
|