When each node in the network is connected to every node in the next layer by a feedforward link, the network is commonly referred to as a multi-layer perceptron (MLP, Figure 8‑7). The perceptron part of this name refers to its early development, in a simpler form, as a model of neuron triggering in the retina of the eye.
Figure 8‑7 MLP 3-5-2 with bias nodes

By convention all nodes in the hidden layer and the output layer are also connected to a bias node. This feeds a constant input value, 1, into the set of weights, and acts in a similar manner to the constant term in a regression model. Bias node weights are treated in the same manner as other node weights. In an MLP with one bias node associated with each feedforward layer, n input nodes, one hidden layer with m hidden nodes and p output nodes, the W and Z weight matrices have dimensions (n+1)×m and (m+1)×p respectively. The MLP then has the architecture shown in Figure 8‑7. The effective number of parameters, λ, for such a network is the total number of feedforward connections (including the bias connections), so in this example, with n=3, m=5 and p=2, λ=(3+1)×5+(5+1)×2=20+12=32.
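The parameter count above can be checked with a few lines of Python. This is a minimal sketch; the function name `mlp_parameter_count` is hypothetical, and it assumes exactly one hidden layer and one bias node per feedforward layer, as in the text.

```python
def mlp_parameter_count(n, m, p):
    """Return the shapes of W and Z and the total number of weights.

    Assumes one hidden layer and one bias node feeding each of the
    hidden and output layers (the 3-5-2 layout described in the text).
    """
    w_shape = (n + 1, m)  # input nodes plus bias -> hidden nodes
    z_shape = (m + 1, p)  # hidden nodes plus bias -> output nodes
    total = w_shape[0] * w_shape[1] + z_shape[0] * z_shape[1]
    return w_shape, z_shape, total


w_shape, z_shape, total = mlp_parameter_count(3, 5, 2)
print(w_shape, z_shape, total)  # (4, 5) (6, 2) 32
```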
Input data are weighted according to their wij values and combined (typically summed) in the hidden layer. This weighted sum is then modified by what is known as an activation function. This two-step process of summation of inputs and then modification of this sum by an activation function, f, to create the output value can be illustrated at the node level as shown below (Figure 8‑8):
Figure 8‑8 ANN hidden node structure
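The two-step process at a single hidden node (weighted summation, then activation) might be sketched as follows. The function names, the example inputs and the weights are all illustrative assumptions rather than values from the text; the logistic function with β=1 is assumed as the activation.

```python
import math


def logistic(x, beta=1.0):
    """Logistic (sigmoid) activation with slope parameter beta."""
    return 1.0 / (1.0 + math.exp(-beta * x))


def node_output(inputs, weights, bias_weight, activation=logistic):
    """Compute the output of one node.

    Step 1: weighted sum of the inputs, plus the bias weight applied
            to the bias node's constant input of 1.
    Step 2: pass that sum through the activation function.
    """
    weighted_sum = bias_weight * 1.0 + sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)


# Hypothetical values chosen so that the weighted sum is exactly zero.
out = node_output([2.0, 1.0, 4.0], [0.5, -1.0, -0.25], bias_weight=1.0)
print(out)  # 0.5, since the logistic function returns 0.5 at x=0
```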

Various activation functions can be used, but the most commonly used in spatial analysis is the logistic or sigmoid function:
$$f(x) = \frac{1}{1 + e^{-\beta x}}$$
where β is a slope parameter, typically set to β=1. With this function, if x=0 the output value is 0.5. For x large and negative the output approaches 0, whilst for large positive x it approaches 1 (see Figure 8‑9). As β becomes large and positive the slope of this function increases until it becomes essentially a step function from 0 to 1. The logistic function provides a model that approximates the actual response behaviour of biological neurons to stimuli, but is primarily used owing to its convenient mathematical properties, as explained further below. Other commonly cited activation functions include simple step or threshold functions (of limited use in spatial analysis problems), the tanh() function (widely used) and simple linear activation (sometimes this name is used for the identity function, which involves no change). The basic tanh() function has a range [‑1,1] but may be adjusted to provide a [0,1] range if required by taking (tanh(x)+1)/2. This rescaled form is identical to the logistic function with β=2, so the two functions are closely related. Experience suggests that for some problems the tanh() function, which produces positive and negative values, can produce faster learning than logistic functions. Graphs of each of these functions are shown in Figure 8‑9. Radial basis activation functions are described separately in Section 8.3.2.
Figure 8‑9 Sample activation functions
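The activation functions discussed above can be compared numerically with a short Python sketch. The function names are illustrative, and the step function's behaviour at exactly x=0 (returning 1) is an assumption, since conventions differ. The final check confirms that the rescaled tanh, (tanh(x)+1)/2, coincides with the logistic function when β=2.

```python
import math


def logistic(x, beta=1.0):
    """Logistic (sigmoid) function; output lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-beta * x))


def rescaled_tanh(x):
    """tanh shifted and scaled from the range [-1, 1] to [0, 1]."""
    return (math.tanh(x) + 1.0) / 2.0


def step(x, threshold=0.0):
    """Simple threshold function (assumed to return 1 at the threshold)."""
    return 1.0 if x >= threshold else 0.0


def identity(x):
    """Linear (identity) activation: the weighted sum is left unchanged."""
    return x


for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(x, logistic(x), math.tanh(x), step(x), identity(x))
    # The rescaled tanh matches the logistic function with beta = 2.
    assert abs(rescaled_tanh(x) - logistic(x, beta=2.0)) < 1e-12
```

Note that this naive form of `logistic` can overflow in `math.exp` for very large negative values of βx; a production implementation would normally guard against that.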

If we denote the set of inputs as a data matrix, X, and the set of outputs as a data matrix, Y, an artificial neural network is simply a mathematical operation that maps X to Y, i.e. f:X→Y. Some authorities describe this operation as a network function. Typically this mapping is a form of non-linear weighted sum; indeed, non-linearity is an essential feature of ANNs applied to most problems. As noted above, the input data {xi} are modified twice, first by weighted summation and then by use of the activation function. The same procedure applies to the next layer, where the hidden layer output values, hj, are summed and optionally modified, in the same way as the inputs, to produce the final results in the output layer, yk:
$$h_j = g\left(\sum_{i} w_{ij} x_i\right), \qquad y_k = g\left(\sum_{j} z_{jk} h_j\right)$$
where wij and zjk are the elements of the weight matrices W and Z, and each sum includes the bias node with its constant input of 1.

Although the same activation function, g(), may be used for all nodes and layers, some implementations apply different activation functions at the output layer from those used for the hidden layers (e.g. linear or identity functions).
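The complete mapping from inputs to outputs for a 3-5-2 MLP can be sketched in plain Python. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the weight values are arbitrary placeholders (a real network would learn them during training), and the bias weights are assumed to occupy row 0 of each weight matrix.

```python
import math


def logistic(x, beta=1.0):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-beta * x))


def layer(values, weights, activation):
    """Apply one feedforward layer.

    `weights` has len(values) + 1 rows and one column per receiving node;
    row 0 is assumed to hold the bias weights (bias input is the constant 1).
    """
    augmented = [1.0] + list(values)
    n_out = len(weights[0])
    return [
        activation(sum(augmented[i] * weights[i][j] for i in range(len(augmented))))
        for j in range(n_out)
    ]


def mlp_forward(x, W, Z, hidden_act=logistic, output_act=logistic):
    """Map an input vector x to outputs y through a single hidden layer."""
    h = layer(x, W, hidden_act)      # hidden layer values h_j
    return layer(h, Z, output_act)   # output layer values y_k


# Placeholder weights for a 3-5-2 network: W is 4x5, Z is 6x2.
W = [[0.1 * (i - j) for j in range(5)] for i in range(4)]
Z = [[0.05 * (j + k - 2) for k in range(2)] for j in range(6)]

x = [0.2, 0.7, 0.1]
print(mlp_forward(x, W, Z))                             # logistic output layer
print(mlp_forward(x, W, Z, output_act=lambda s: s))     # identity output layer
```

Passing a different `output_act` corresponds to the case described above, where the output layer uses a linear or identity function while the hidden layer keeps the logistic function.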