Radial basis function networks


Radial basis function neural networks (RBFNNs) are essentially a combination of the ideas found in multi-layer perceptrons (MLPs, Section 8.3.1, Introduction to artificial neural networks) and those described earlier in this Guide (Section 6.6.4, Radial basis and spline functions) under the main heading ‘deterministic surface interpolation’, subsection ‘radial basis functions’. In addition to interpolation they can be used for function approximation, as explained further below. As before, we are attempting to find a model, or approximation, for the function mapping, f: X→Y, where X is the input vector or set of vectors and Y is the set of matching output vectors. The standard (regression-type) RBFNN model uses the same three-layer topology (including bias nodes) as the MLP, but the output is modeled as a linear combination of basis functions, of the form:

y(x) = Σi wiφi(x),  i = 1, 2, …, m
where the activation function, φ(), is a radial basis function. This function operates not on the input data vectors, but on the distance of input data vectors, x, from a pre-selected ‘center’, c. This center vector, c, has the same dimensionality as the input vector, x. Figure 8‑27 illustrates the architecture of the RBFNN model we describe.

The hidden layer computes the distance from the input layer data to each of the centers, and this set of distances, D, is then transformed before being passed to the output layer. When these center vectors are set to be the same as the input vectors the RBFNN solution will fit f exactly at the input data points, and so may be used for function (or surface) interpolation and grid generation (i.e. evaluating f at new data points).

Figure 8‑27 Radial basis function NN model

Diagram based on Abdi (1994, p271)
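
To make the architecture shown in Figure 8‑27 concrete, the following minimal sketch (in Python with NumPy; the function and variable names such as rbfnn_forward, centers and weights are illustrative rather than drawn from any particular package) computes the hidden-layer distances D for a batch of input vectors, transforms them with a Gaussian basis function, and forms the weighted linear combination at the output layer:

import numpy as np

def rbfnn_forward(X, centers, weights, sigma=1.0):
    # X: (n, d) array of input vectors; centers: (m, d) array of center vectors, c
    # weights: (m,) array of output-layer weights, W
    # Hidden layer: Euclidean distance of every input vector from every center
    D = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)   # shape (n, m)
    # Transform the distances with a radial basis (here Gaussian) activation function
    phi = np.exp(-D**2 / (2.0 * sigma**2))
    # Output layer: linear combination of the transformed distances
    return phi @ weights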

When there are fewer center vectors than input vectors the RBFNN approximates f. In this second case the center vectors are often initialized as random vectors within the data range of the input vectors. Making this operation explicit, the equation above can be re-written as:

y(x) = Σi wiφ(||x − ci||),  i = 1, 2, …, m
As with other feedforward ANNs the matrix of weights, W, is trained using back-propagation. RBFNNs are becoming more widely used, partly because they are often faster to train and tend to suffer less than MLPs from local minima. However, their structure can make them somewhat cumbersome and carries the risk of producing over-fitted models. Radial basis functions, φ(), that may be used in the expressions above include the multiquadric and simple power functions (as described in Section 6.6.4), but in most RBFNN implementations the Gaussian function is the preferred choice:

φ(d) = exp(−d²/(2σ²))
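
Coded element-wise, these activation functions might look as follows (a sketch; the multiquadric constant and the σ parameterization shown here are illustrative choices rather than definitions taken from the text):

import numpy as np

def gaussian(d, sigma=1.0):
    # Gaussian RBF: equals 1 at zero distance and decays as distance grows
    return np.exp(-d**2 / (2.0 * sigma**2))

def multiquadric(d, c=1.0):
    # Multiquadric RBF (see Section 6.6.4): increases with distance
    return np.sqrt(d**2 + c**2)
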
The major attraction of the Gaussian is that if the centers, c, are distinct and the distance matrix D is square, the transformed matrix φ(D) is non-singular and positive definite, which are important characteristics for standard inversion. Here the function φ() is defined as being applied element-wise to the components of D. We can illustrate the RBFNN procedure with the approximation of a univariate function, given a set of input data values x and responses, or output values, y (compare this description to that given in Section 6.6.4). Let the vector c, of centers, be the same as x. This means that every element of c will have a distance of 0 from its matching input element. Clearly if the set x is large then c will also be large. Effectively our dataset consists of n (x,y) pairs and the vector c is simply the set of x-values, {x}. The distance matrix D={dij} is an n by n symmetric matrix of the separation of each x-value from every other x-value, and so has a zero principal diagonal. This distance matrix is then transformed by the activation function, φ(), where φ() is the Gaussian function with variance parameter σ²=1. Hence we have:

φ(D) = {exp(−dij²/2)}
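
Continuing the univariate example, a short sketch (with hypothetical data values) that takes the centers to be the x-values themselves, builds the symmetric n-by-n distance matrix D with its zero principal diagonal, and applies the element-wise Gaussian transform with σ²=1:

import numpy as np

# Hypothetical univariate dataset of n (x, y) pairs
x = np.array([0.0, 1.0, 2.5, 4.0, 5.5])
y = np.array([1.2, 0.7, 2.9, 3.1, 1.8])

c = x.copy()                         # centers taken to be the x-values (interpolation case)
D = np.abs(x[:, None] - c[None, :])  # n-by-n symmetric distance matrix, zero principal diagonal
phi_D = np.exp(-D**2 / 2.0)          # element-wise Gaussian transform with sigma^2 = 1
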
We still require the weights matrix, W, in order to be able to compute the output values using the expression provided earlier. With classical RBF methods we can compute this matrix directly by inversion, as:

W = [φ(D)]⁻¹y
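
In code, and continuing the sketch above, this direct solution can be obtained as follows (np.linalg.solve is used in place of an explicit matrix inverse, which is the usual numerical practice):

# Solve phi(D) W = y for the weights directly (classical RBF approach)
W = np.linalg.solve(phi_D, y)

# The fitted function can then be evaluated at new x locations
x_new = np.linspace(x.min(), x.max(), 50)
phi_new = np.exp(-np.abs(x_new[:, None] - c[None, :])**2 / 2.0)
y_new = phi_new @ W
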
Using RBFNN methods the weights matrix is instead initialized with random values in the range of the input data, x, and these are used for forward propagation. The result will be differences between the expected outputs (the y values) and those produced, i.e. errors, which are then used to adjust the set of weights by back propagation. The weight matrix is thus learnt over a number of epochs, avoiding the need for matrix inversion. This becomes increasingly valuable for larger problems, or where the structure of the problem means that the matrix D cannot be inverted. When this approach is applied to the dataset illustrated earlier in Figure 8‑19, the results are directly comparable, but the learning process is smoother (displays a more consistent rate of improvement) than that seen with the basic MLP approach (Figure 8‑20). RBFNN support has recently been added to the range of machine learning classifiers supported by the Idrisi GIS package from Clark Labs (Selva edition onwards).
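
A minimal sketch of this iterative alternative, continuing the example above and using a simple gradient-descent update of the output weights in place of the full back-propagation machinery (the learning rate and number of epochs are arbitrary illustrative values), might look like this:

rng = np.random.default_rng(0)

# Initialize the weights randomly within the range of the input data, x
W = rng.uniform(x.min(), x.max(), size=len(c))

learning_rate = 0.1
for epoch in range(2000):
    y_pred = phi_D @ W                    # forward propagation through the network
    error = y - y_pred                    # differences between expected and produced outputs
    W += learning_rate * phi_D.T @ error  # adjust the weights to reduce the squared error

# As the epochs proceed, the learned W should approach the directly computed
# solution, without phi(D) ever being inverted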