Radial basis function neural networks (RBFNNs) essentially combine the ideas found in multi-layer perceptrons (MLPs, Section 8.3.1.1) with those described earlier in this Guide (Section 6.6.4) under the main heading ‘deterministic surface interpolation’, subsection ‘radial basis functions’. In addition to interpolation they can be used for function approximation, as explained further below. As before, we are attempting to find a model, or approximation, for the function mapping f:X→Y, where X is the input vector or set of vectors and Y is the set of matching output vectors. The standard (regression-type) RBFNN model uses the same three-layer topology (including bias nodes) as the MLP, but the output is modelled as a linear combination of basis functions, of the form:
$$y = \sum_{i} w_i \, \varphi\left(\lVert x - c_i \rVert\right)$$
where the activation function, φ(), is a radial basis function. This function operates not on the input data vectors, but on the distance of input data vectors, x, from a pre-selected ‘centre’, c. This centre vector, c, has the same dimensionality as the input vector, x. Figure 8‑18 illustrates the architecture of the RBFNN model we describe.
Figure 8‑18 Radial basis function NN model

Diagram based on Abdi (1994, p271)
The hidden layer computes the distance from each input data vector to each of the centres, and this set of distances, D, is then transformed before being passed to the output layer. When the centre vectors are set to be the same as the input vectors the RBFNN solution will fit f at the input data points, so it may be used for function (or surface) interpolation and grid generation (i.e. evaluating f at new data points). When the set of centre vectors contains fewer elements than the input set the RBFNN approximates f. In this second case the centre vectors are often initialised as random vectors within the data range of the input vectors. Making this operation explicit, the equation above can be re-written as:
$$y = \varphi(D)\,W$$
As with other feedforward ANNs the matrix of weights, W, is trained using back-propagation. RBFNNs are becoming more widely used, partly because they are often faster to train and tend to suffer less than MLPs from local minima. However, their structure can make them somewhat cumbersome and brings the risk of producing models that are over-fitted. Radial basis functions, φ(), that may be used in the expressions above include the multi-quadric and simple power functions (as described in Section 6.6.4), but in most RBFNN implementations the Gaussian function is the preferred choice:
$$\varphi(d) = \exp\left(-\frac{d^{2}}{2\sigma^{2}}\right)$$
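The model described so far can be sketched in a few lines of Python with NumPy. This is a minimal illustration rather than a definitive implementation: the function names `gaussian_rbf` and `rbf_forward`, the sample data and the specific Gaussian form exp(−d²/(2σ²)) are assumptions made for the example, and bias terms are omitted for brevity.

```python
import numpy as np

def gaussian_rbf(d, sigma=1.0):
    """Gaussian basis function, applied element-wise to distances d.

    Assumed form: exp(-d^2 / (2 * sigma^2)).
    """
    return np.exp(-d**2 / (2.0 * sigma**2))

def rbf_forward(X, centres, weights, sigma=1.0):
    """Forward pass: distances to centres -> Gaussian transform -> weighted sum.

    X: (n, p) input vectors; centres: (m, p); weights: (m,) or (m, q).
    """
    # Euclidean distance from every input vector to every centre, shape (n, m)
    D = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    return gaussian_rbf(D, sigma) @ weights

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(20, 2))   # 20 made-up 2-D input vectors

# Approximation case: fewer centres than inputs, drawn at random
# within the data range of the inputs
centres = rng.uniform(X.min(axis=0), X.max(axis=0), size=(5, 2))
weights = rng.normal(size=5)               # untrained, random weights
y_hat = rbf_forward(X, centres, weights)   # one output value per input vector
```

Note that each centre has the same dimensionality as the input vectors, so the distance calculation is well defined whatever the number of centres.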
The major attraction of the Gaussian is that if the centres, c, are distinct and the distance matrix D is square, the transformed matrix φ(D) is non-singular and positive definite, which are important characteristics for standard inversion. Here the function φ() is applied element-wise to the components of D.

We can illustrate the RBFNN procedure with the approximation of a univariate function, given a set of input data values x and responses, or output values, y (compare this with the description in Section 6.6.4). Let the vector of centres, c, be the same as x, so that every element of c has a distance of 0 from its matching input element. Clearly if x contains many values then so does c. Effectively our dataset consists of n (x,y) pairs and the vector c is simply the set of x-values, {x}. The distance matrix D={d_ij} is an n by n symmetric matrix of the separation of each x-value from every other x-value, and so has a zero principal diagonal. This distance matrix is then transformed by the activation function, φ(), where φ() is the Gaussian function with variance parameter σ²=1. Hence we have:
$$\varphi(D) = \left\{ \exp\left(-d_{ij}^{2}/2\right) \right\}$$
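For a small, made-up set of x-values the matrices D and φ(D) can be built and their stated properties checked numerically. The data values and variable names below are purely illustrative, and the Gaussian is assumed to take the form exp(−d²/2) for σ²=1.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.5, 4.0, 6.0])   # illustrative x-values; centres c = x

# n-by-n matrix of the separation of each x-value from every other x-value
D = np.abs(x[:, None] - x[None, :])

# Gaussian transform with sigma^2 = 1, applied element-wise
Phi = np.exp(-D**2 / 2.0)

is_symmetric = np.allclose(D, D.T)
zero_diagonal = bool(np.all(np.diag(D) == 0.0))
# Smallest eigenvalue of the symmetric matrix Phi; a positive value
# indicates that Phi is positive definite (and hence non-singular)
min_eigenvalue = np.linalg.eigvalsh(Phi).min()
```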
We still require the weights matrix, W, in order to be able to compute the output values using the expression provided earlier. With classical RBF methods we can compute this matrix directly by inversion, as:
$$W = \varphi(D)^{-1}\,y$$
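The classical, direct calculation might be sketched as follows. The data are invented for illustration (a sine curve sampled at five points), and `np.linalg.solve` is used in place of an explicit inverse, which gives the same weights but is generally the more numerically stable choice.

```python
import numpy as np

def phi(D):
    """Gaussian transform with sigma^2 = 1 (assumed form), element-wise."""
    return np.exp(-D**2 / 2.0)

x = np.array([0.0, 1.0, 2.5, 4.0, 6.0])   # illustrative inputs; centres c = x
y = np.sin(x)                              # illustrative responses

Phi = phi(np.abs(x[:, None] - x[None, :]))  # transformed n-by-n distance matrix
W = np.linalg.solve(Phi, y)                 # solves Phi W = y for the weights

# With centres equal to the inputs, the model reproduces y at the data points
fitted = Phi @ W

# Evaluate the fitted function at new points (e.g. for grid generation):
# distances are now from each new point to each centre, giving a 13-by-5 matrix
x_new = np.linspace(0.0, 6.0, 13)
y_new = phi(np.abs(x_new[:, None] - x[None, :])) @ W
```

Because the centres coincide with the inputs, `fitted` matches `y` to within rounding error, which is the interpolation property described earlier.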
Using RBFNN methods the weights matrix is instead initialised with random values in the range of the input data, x, and these are used for forward propagation. The result will be differences between the expected outputs (the y values) and those produced, i.e. errors, which are then used to adjust the set of weights by back-propagation. The weight matrix is thus learnt over a number of epochs, avoiding the need for matrix inversion. This is of particular value for larger problems, or where the structure of the problem means that the transformed matrix φ(D) cannot be inverted (for example, when it is not square). When this approach is applied to the dataset illustrated earlier in Figure 8‑10, the results are directly comparable, but the learning process is smoother (displays a more consistent rate of improvement) than that seen with the basic MLP approach (Figure 8‑11).
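Because the output layer is linear, back-propagation for W reduces to a simple gradient step on the squared error. The sketch below shows the idea on invented data; the learning rate, number of epochs and random seed are arbitrary choices made for the illustration, not values taken from any particular implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

x = np.array([0.0, 1.0, 2.5, 4.0, 6.0])   # illustrative inputs; centres c = x
y = np.sin(x)                              # illustrative responses
Phi = np.exp(-np.abs(x[:, None] - x[None, :])**2 / 2.0)  # Gaussian, sigma^2 = 1

# Initialise the weights with random values in the range of the input data
W = rng.uniform(x.min(), x.max(), size=x.size)

learning_rate = 0.2   # hypothetical value, chosen small enough to be stable here
errors = []           # mean squared error recorded at the start of each epoch

for epoch in range(5000):
    y_hat = Phi @ W                      # forward propagation
    residual = y_hat - y                 # errors: produced minus expected output
    errors.append(float(np.mean(residual**2)))
    # Gradient of 0.5 * sum(residual^2) with respect to W is Phi^T residual
    W -= learning_rate * (Phi.T @ residual)
```

Plotting `errors` against the epoch number gives a learning curve of the kind discussed above; no matrix inversion is involved at any stage.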