
A neural network-based unsupervised classification procedure known as a Self-Organising Map (SOM) can be developed from an extension to the single spectrum angular mapping (SAM) analysis described in Section 4.2.12.3. In the following example the same 512x614 pixel image with 224 spectral layers provides the sample dataset. The procedure operates broadly as per the earlier discussion. The number of classes in this process is preset, in this case to 512, using a 16x32 SOM neural network. Each cell in the net, 1…k, is initially assigned a random vector, mk, of values ― typically these will be random numbers for each spectral band drawn from a uniform distribution over the observed range for the band in question or for all bands in the image.
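The random initialisation described above might be sketched as follows. This is a minimal illustration in plain Python, not the implementation used for the example in the text: the function name `init_codebook`, the `band_ranges` argument and the three-band toy ranges are all hypothetical, and units are assumed to be stored in row-major order.

```python
import random

def init_codebook(band_ranges, rows, cols, seed=0):
    """Return rows*cols random vectors, one value per spectral band.

    band_ranges is a list of (min, max) tuples giving the observed
    range of each band; each component is drawn uniformly from it.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return [
        [rng.uniform(lo, hi) for (lo, hi) in band_ranges]
        for _ in range(rows * cols)
    ]

# Toy example: 3 bands rather than 224, but the same 16x32 grid (512 units).
band_ranges = [(0.0, 1.0), (10.0, 20.0), (-5.0, 5.0)]
codebook = init_codebook(band_ranges, rows=16, cols=32)
```

To draw from the range of all bands in the image instead, the same (min, max) pair would simply be repeated for every band.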

The next step is to train the neural network by comparing a randomly selected pixel from the source image, with spectral vector x say, to the vectors of each neural network unit. This comparison is achieved using a norm:

$$\lVert x - m_k \rVert$$

that measures the separation between the sample spectral vector, x, and the kth unit in the neural network, mk. The measure most commonly applied is the Euclidean distance, of the form:

$$\lVert x - m_k \rVert = \sqrt{\sum_i (x_i - m_{ki})^2}$$

where xi and mki are the ith elements of their respective vectors. The best matching unit (BMU) is the SOM unit, c say, which has the smallest separation from the input vector as computed using the selected norm. The components or weights of this BMU vector are then modified using a very simple general formula, written here for a unit j (with j=c for the BMU itself):

$$m_j(t+1) = m_j(t) + p(t)\,[x(t) - m_j(t)]$$

where t is the time period (iteration number or epoch) and p(t) is an adjustment factor that varies with time, t. In practice p(t) consists of two components:

$$p(t) = \alpha(t)\,h_{cj}(t)$$

The first of these components, α(t), is known as the learning rate, and is simply a fractional value that is (typically) adjusted over time. For example, if T is the total number of steps or epochs in the training period, α(t) might be taken as a simple decreasing linear function of time over the interval [0,T], starting at α(0)=0.5:

$$\alpha(t) = \alpha(0)\,(1 - t/T) \quad \text{or} \quad \alpha_t = \alpha_0\,(1 - t/T)$$

The second component is the grid neighbourhood time-adjusted kernel function, hcj(t). All vectors (j) within a specified neighbourhood of the BMU, c (typically all those that relate to adjacent grid cells and/or within a specified radius, r, of c) are adjusted using a pre-selected formula, i.e. not just the central BMU, at c. In its simplest form, all map vectors (j) within grid radius r of c are adjusted equally, i.e. hcj=1 if the grid distance dcj≤r, otherwise hcj=0. Two-dimensional kernel functions of the type previously described in connection with density modelling are commonly used (see further, Table 8‑7), particularly Gaussian or truncated Gaussian functions.

This adjustment can be seen as migrating point c in k-space and its near neighbours towards the sample point x. This process, which is a form of sequential regression, is repeated for a substantial sample of pixels, often involving thousands of iterations. If the input data are 1-, 2- or 3-dimensional the progress of the adjustment process can be visualised, for example by displaying the results every 10 epochs, providing a form of video of the progress of the procedure.
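The training procedure described above can be sketched as a short loop. This is an illustrative sketch under several stated assumptions rather than a definitive implementation: the function names are hypothetical, units are stored in row-major order on a rectangular grid, the neighbourhood kernel is Gaussian, the radius shrinks linearly to a floor of 1, and the BMU search uses squared Euclidean distance (which ranks units identically to the unsquared norm). Plain Python is used for clarity; a practical version would use an array library.

```python
import math
import random

def sq_dist(x, m):
    """Squared Euclidean separation between sample x and unit vector m."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, m))

def find_bmu(x, codebook):
    """Index c of the unit with the smallest separation from x."""
    return min(range(len(codebook)), key=lambda k: sq_dist(x, codebook[k]))

def train_som(samples, codebook, rows, cols, steps, alpha0=0.5, seed=0):
    """Train the codebook in place on a rows x cols grid and return it."""
    rng = random.Random(seed)
    radius0 = max(rows, cols) / 2.0           # assumed starting radius
    for t in range(steps):
        x = rng.choice(samples)               # randomly selected sample
        c = find_bmu(x, codebook)
        frac = 1.0 - t / steps
        alpha = alpha0 * frac                 # linear learning rate alpha(t)
        sigma = max(1.0, radius0 * frac)      # assumed linear radius decay
        c_row, c_col = divmod(c, cols)
        for j, m in enumerate(codebook):
            j_row, j_col = divmod(j, cols)
            d2 = (j_row - c_row) ** 2 + (j_col - c_col) ** 2
            # p(t) = alpha(t) * h_cj(t), with a Gaussian kernel for h_cj
            p = alpha * math.exp(-d2 / (2.0 * sigma ** 2))
            for i in range(len(m)):
                m[i] += p * (x[i] - m[i])     # move unit j towards x
    return codebook

# Toy run: two well-separated 2-D clusters and a 3x3 map.
rng = random.Random(1)
samples = [[rng.uniform(0.0, 0.2), rng.uniform(0.0, 0.2)] for _ in range(50)]
samples += [[rng.uniform(0.8, 1.0), rng.uniform(0.8, 1.0)] for _ in range(50)]
rows, cols = 3, 3
codebook = [[rng.random(), rng.random()] for _ in range(rows * cols)]
train_som(samples, codebook, rows, cols, steps=2000)
```

Because p(t) never exceeds 0.5 here, each update is a weighted average of the old unit vector and the sample, so the trained vectors stay within the range spanned by the initial vectors and the data.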

Figure 8‑20 SOM classification of remotely-sensed hyperspectral data

When the process has completed the individual units or grid cells will have converged to a set of vectors that provide a form of ‘best representation’ of the sample data. The SOM grid will contain similar vectors in close proximity to each other. These vectors can then be used to create a classified version of the source image, using the spectral angle method described earlier. The spectrum of each pixel in the source dataset is compared with the model spectra in the SOM grid and the BMU determines the classification applied. Coding nearby units in the SOM with similar or the same colours and applying these to the source dataset produces a geographic map that both identifies regions that are similar in spectral data and at the same time offers a fine level of discrimination. Figure 8‑20 illustrates this classification process for the Cuprite dataset discussed earlier in this Guide (Section 4.2.12.3).
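The classification step described above might look like the following sketch, in which each pixel is assigned the index of the SOM unit with the smallest spectral angle. The function names, the tiny three-band codebook and the 2x2 image are hypothetical, and returning an angle of π/2 for an all-zero spectrum is an assumed convention.

```python
import math

def spectral_angle(x, m):
    """Angle (radians) between pixel spectrum x and model spectrum m."""
    dot = sum(xi * mi for xi, mi in zip(x, m))
    nx = math.sqrt(sum(xi * xi for xi in x))
    nm = math.sqrt(sum(mi * mi for mi in m))
    if nx == 0.0 or nm == 0.0:
        return math.pi / 2.0                  # assumed value for zero spectra
    cos_angle = max(-1.0, min(1.0, dot / (nx * nm)))  # guard against rounding
    return math.acos(cos_angle)

def classify(image, codebook):
    """Return, for each pixel, the index of its best matching unit."""
    return [
        [min(range(len(codebook)),
             key=lambda k: spectral_angle(px, codebook[k]))
         for px in row]
        for row in image
    ]

# Toy example: three model spectra and a 2x2 image with 3 bands.
codebook = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
image = [
    [[2.0, 0.1, 0.0], [0.1, 3.0, 0.0]],
    [[5.0, 5.0, 4.9], [3.0, 2.9, 3.1]],
]
classes = classify(image, codebook)
print(classes)  # [[0, 1], [2, 2]]
```

The spectral angle depends only on the direction of a spectrum, not its magnitude, so pixels that differ only in overall brightness fall into the same class.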

Issues: There are a number of important observations that should be made regarding the SOM procedure described:

·         Initialisation: the choice of the initial set of map vectors can influence both the speed of the computation and its outcome. Speed may be considerably improved by judicious choice of initial vectors in some problems, i.e. rather than random vector assignment. Multiple runs of the procedure with different initialised vectors may help to produce more consistent or robust results than a single run. For example, it is entirely possible that two or more groups of similar vectors will be located within the SOM, but remain isolated from each other rather than forming a single, cohesive grouping

·         Pre-processing: with remotely sensed data, factors such as atmospheric scattering, topography, surface lighting (solar effects) and data capture variables (e.g. view angle, stripes) may warrant pre-processing of the data to minimise or remove known effects

·         Normalisation: whilst it may be reasonable to expect the data values of a hyperspectral image to have a broadly similar range across the spectral bands, this is not the case for general attribute data. For example, with attributes such as cases of a particular disease in sampled zones the data range might be 0-100, whilst other variables, such as the zone population or a pollutant measurement, may be in a much larger range. The latter variables might dominate the BMU computation and hence hide variations in key attributes. Pre-normalisation of the data vectors, for example by ensuring each has unit variance, may be advisable in such cases. Normalisations supported within the SOM Toolbox include: variance (vectors normalised to unit variance and 0 mean); range (vector components normalised to a [0,1] range); log (simple logarithmic transformation); softmax (vector components are initially variance normalised and the results converted using the logistic function into a [0,1] range with logistic curve form); and histogram (data values are replaced by histogram bins with approximately equal numbers of observations in each bin; bins are ordered 1, 2, 3 etc. and then finally scaled to give a [0,1] data range). Note that output data and visualisations can be produced for non-normalised data, for normalised data, or for data that has been normalised and then de-normalised prior to output

·         Missing data: the computation of the distance metric assumes that all spectral bands or all attribute sets in the sample and final datasets are present. If one or more gaps in the data exist this can either be ignored and assumed to have little or no impact on the result, or the missing attribute data could be assumed to apply to all samples, hence effectively removing this component from the analysis

·         Masking and weighting: it may be desirable to mask off (binary weight) or apply a variable weighting to some data in the sample or final datasets. This could be at the attribute level or the feature level

·         Learning and tuning: it is commonly the case that different rates of adjustment are required for an initial learning period (sometimes called the ordering phase) and the subsequent fine-tuning phase of learning. In both instances similar time-dependent functions may be used, but each with different initial parameters. For example, the default parameters in the MATLAB Neural Network toolbox are: 0.9 (ordering-phase learning rate, for an initial 1000 steps); max grid cell separation distance (ordering-phase neighbourhood distance>1); 0.02 (tuning-phase learning rate, for subsequent steps); and 1 (tuning-phase neighbourhood distance, i.e. immediate neighbours only)

·         Distance metrics: a variety of grid or lattice distance metrics, dcj, may be employed. Examples include: Euclidean; Manhattan (rectilinear); Box distance (directly adjacent cells in a rectangular grid 3x3 cell Moore neighbourhood=1 distance unit away, or the outer cells of an expanded Moore neighbourhood of 5x5 cells=2 units away etc.); and Link distance (based on connectivity steps between grid cells or nodes). Note that calculations, especially at grid edges, will vary according to the chosen grid form and topology (e.g. rectangular or hexagonal; sheet, cylinder or toroidal)

·         Neighbourhood functions and Learning rate functions: Table 8‑7 provides a summary of some of the commonly used functions. Essentially the neighbourhood functions are kernel functions of the type encountered previously
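As a rough illustration of the Normalisation point in the list above, the five transformations named there might be written as follows for a single attribute (one column of values). These are sketches of the general ideas, not the SOM Toolbox code: the function names are hypothetical, the offset used in the log transform is an assumption made so that the logarithm is always defined, each column is assumed to contain at least two distinct values, and `bins` is assumed to be at least 2.

```python
import math
import statistics

def norm_variance(col):
    """Scale to zero mean and unit (population) variance."""
    mu = statistics.mean(col)
    sd = statistics.pstdev(col)
    return [(v - mu) / sd for v in col]

def norm_range(col):
    """Scale linearly to the [0,1] range."""
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]

def norm_log(col):
    """Logarithmic transform; the (v - min + 1) offset is an assumption."""
    lo = min(col)
    return [math.log(v - lo + 1.0) for v in col]

def norm_softmax(col):
    """Variance normalise, then squash into (0,1) with the logistic function."""
    return [1.0 / (1.0 + math.exp(-z)) for z in norm_variance(col)]

def norm_histogram(col, bins):
    """Replace values by equal-count bin numbers, scaled to [0,1]."""
    order = sorted(range(len(col)), key=lambda i: col[i])
    out = [0.0] * len(col)
    for rank, i in enumerate(order):
        b = rank * bins // len(col)   # bin 0..bins-1, roughly equal counts
        out[i] = b / (bins - 1)
    return out

# Toy attributes on very different scales.
cases = [0, 25, 50, 100]
population = [1000, 50000, 200000, 1000000]
print(norm_range(cases))              # [0.0, 0.25, 0.5, 1.0]
print(norm_histogram(population, 2))  # [0.0, 0.0, 1.0, 1.0]
```

After any of these transformations the two attributes contribute on comparable scales to the BMU distance calculation.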

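Table 8‑7 SOM neighbourhood and learning rate functions

| Functions | Name | Expression |
|---|---|---|
| Neighbourhood functions | Bubble | $h_{cj}(t) = \mathbf{1}(\sigma_t - d_{cj})$ |
| | Gaussian | $h_{cj}(t) = \exp\left(-d_{cj}^2 / 2\sigma_t^2\right)$ |
| | Cut Gaussian | $h_{cj}(t) = \exp\left(-d_{cj}^2 / 2\sigma_t^2\right)\,\mathbf{1}(\sigma_t - d_{cj})$ |
| | Epanechnikov | $h_{cj}(t) = \max\{0,\ 1 - (\sigma_t - d_{cj})^2\}$ |
| Learning rate functions | Linear | $\alpha(t) = \alpha_0\,(1 - t/T)$ |
| | Power | $\alpha(t) = \alpha_0\,(0.005/\alpha_0)^{t/T}$ |
| | Inverse | $\alpha(t) = \alpha_0 / (1 + 100\,t/T)$ |

Neighbourhood functions: σt is the neighbourhood radius at time t, dcj is the distance between map units c and j on the grid using the chosen grid metric, and 1(x) is the step function, with 1(x)=0 if x<0 and 1(x)=1 if x>=0.

Learning rate functions: T is the overall training length (steps) and α0 is the initial learning rate.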
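The functions named in Table 8‑7, together with the simpler grid distance metrics listed earlier, might be coded as in the sketch below. The expressions follow the forms understood to be given in the SOM Toolbox documentation, which is an assumption here. In particular the constants 0.005 and 100 in the power and inverse learning rates and the form of the Epanechnikov kernel should be checked against that documentation. The function names are hypothetical, and the grid distances assume a rectangular sheet topology.

```python
import math

def step(x):
    """Step function 1(x): 0 if x < 0, otherwise 1."""
    return 0.0 if x < 0 else 1.0

# Neighbourhood functions of grid distance d and radius sigma.
def bubble(d, sigma):
    return step(sigma - d)

def gaussian(d, sigma):
    return math.exp(-d * d / (2.0 * sigma * sigma))

def cut_gaussian(d, sigma):
    return gaussian(d, sigma) * step(sigma - d)

def epanechnikov(d, sigma):
    return max(0.0, 1.0 - (sigma - d) ** 2)   # assumed SOM Toolbox form

# Learning rate functions of step t, training length T, initial rate a0.
def lr_linear(t, T, a0):
    return a0 * (1.0 - t / T)

def lr_power(t, T, a0, a_end=0.005):          # a_end is an assumed constant
    return a0 * (a_end / a0) ** (t / T)

def lr_inverse(t, T, a0, b=100.0):            # b is an assumed constant
    return a0 / (1.0 + b * t / T)

def grid_distance(a, b, metric="euclidean"):
    """Distance between grid cells a and b, given as (row, col) tuples."""
    dr, dc = abs(a[0] - b[0]), abs(a[1] - b[1])
    if metric == "euclidean":
        return math.hypot(dr, dc)
    if metric == "manhattan":
        return float(dr + dc)
    if metric == "box":                        # Moore-neighbourhood rings
        return float(max(dr, dc))
    raise ValueError("unknown metric: " + metric)

d = grid_distance((0, 0), (3, 4))              # 5.0
print(bubble(d, 6.0), cut_gaussian(d, 4.0))    # 1.0 0.0
```

Link distance is omitted because it depends on the connectivity of the chosen lattice (for example rectangular or hexagonal).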

 

Idrisi (V15 and later) includes a number of neural network modelling tools, principally provided to facilitate image analysis and classification. In Section 8.3.1.4 we showed an example of the use of Idrisi’s MLP neural network facility as part of a landcover change modelling (LCM) process.

In Figure 8‑21 and Figure 8‑22 we show the standard SOM NN facility applied to a multi-band satellite image set. We have selected unsupervised training, but supervised training is also supported by this module. As with the MLP module, multiple input rasters are specified: here they consist of 3 SPOT images covering a region of Malawi in East Africa, with one of the bands (Band 2) de‑striped before processing. The input layer thus has 3 nodes, whilst the SOM output layer must be specified as an n-by-n grid. We have specified an 8x8 grid with initial clustering performed by k-means and a maximum of 10 clusters. The default initial neighbourhood radius was selected, and this reduces to 1 as training proceeds.

The process commences with the ‘coarse tuning’ phase, during which the network attempts to identify clusters in the input data and group similar clusters in the output layer. The second stage is to classify the map using the learnt weights, assigning a cluster ID to every pixel of the image space. In this case nine clusters were identified and then mapped as shown in Figure 8‑22.

Figure 8‑21 Self Organising Map (SOM) classification - Idrisi

Figure 8‑22 SOM classified 3-band image

A. Source image — SPOT band 1

B. SOM classified result (9 classes)

                                                                                                     
