
In this example we suppose that we have a total of 21 measured values of a response (or dependent) variable, y, for a set of equally spaced values of an independent variable, x. (The example is based on sample MATLAB code produced by Roger Jang, Computer Science Department, Tsing Hua University, Taiwan, and was also tested using the NetLab code library.) The data suggest a highly non-linear relationship (Figure 8‑10A). The (x,y) pairs provide the training data for the model, which has one input node and one output node, plus a bias node for each layer. The tanh() activation function was used, and the network was trained by standard gradient descent back-propagation. A learning rate of 0.02 was used with a momentum factor of 0.8. A higher learning rate (0.05) produced far less stable RMSE behaviour (Figure 8‑11A and B). Training stopped after 1000 epochs (iterations) or once the RMSE fell below 0.01, whichever came first. All weights were initialised with random values drawn uniformly from the range [-0.5, 0.5]. The number of hidden nodes was chosen by varying it between 3 and 10 and examining both the fit of the model and the RMSE curve.
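The training procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used to produce Figures 8‑10 and 8‑11. It makes several assumptions of its own:

- Training is batch (whole-dataset) gradient descent on half the sum of squared errors.
- tanh is applied at both the hidden and output nodes, so targets must lie in (-1, 1).
- The target is a hypothetical stand-in, y = 0.8 sin(πx), on 21 equally spaced points in [-1, 1], because the original data values are not given.

The helper names `train_mlp`, `predict` and `add_bias` are illustrative only.

```python
import numpy as np


def add_bias(a):
    """Append a column of ones (the bias node) to a 2-D array of node outputs."""
    return np.column_stack([a, np.ones(a.shape[0])])


def train_mlp(x, y, n_hidden=4, lr=0.02, momentum=0.8,
              max_epochs=1000, target_rmse=0.01, seed=0):
    """Batch gradient-descent back-propagation with momentum for a 1-n_hidden-1 MLP.

    Assumption: tanh on both hidden and output nodes, so y must lie in (-1, 1).
    Returns the two weight matrices and the RMSE recorded at each epoch.
    """
    rng = np.random.default_rng(seed)
    X = add_bias(x[:, None])                                # (n, 2): input node + bias node
    # All weights drawn uniformly from [-0.5, 0.5]; last column holds the bias weights
    W1 = rng.uniform(-0.5, 0.5, size=(n_hidden, 2))
    W2 = rng.uniform(-0.5, 0.5, size=(1, n_hidden + 1))
    step1, step2 = np.zeros_like(W1), np.zeros_like(W2)     # previous updates (momentum)
    history = []
    for _ in range(max_epochs):                             # stop after max_epochs ...
        H = np.tanh(X @ W1.T)                               # hidden outputs (n, n_hidden)
        Hb = add_bias(H)                                    # hidden outputs + bias node
        out = np.tanh(Hb @ W2.T).ravel()                    # network output (n,)
        err = out - y
        rmse = np.sqrt(np.mean(err ** 2))
        history.append(rmse)
        if rmse < target_rmse:                              # ... or once RMSE < target
            break
        # Back-propagate gradients of 0.5 * sum(err**2), summed over all points
        d_out = (err * (1.0 - out ** 2))[:, None]           # (n, 1)
        g2 = d_out.T @ Hb                                   # (1, n_hidden + 1)
        d_hid = (d_out @ W2[:, :n_hidden]) * (1.0 - H ** 2) # (n, n_hidden); bias weight excluded
        g1 = d_hid.T @ X                                    # (n_hidden, 2)
        # Momentum: new step = -lr * gradient + momentum * previous step
        step2 = -lr * g2 + momentum * step2
        step1 = -lr * g1 + momentum * step1
        W2 = W2 + step2
        W1 = W1 + step1
    return W1, W2, history


def predict(W1, W2, x):
    """Evaluate the trained network at the points x."""
    H = np.tanh(add_bias(x[:, None]) @ W1.T)
    return np.tanh(add_bias(H) @ W2.T).ravel()


# Hypothetical stand-in data: 21 equally spaced x values, non-linear response in (-1, 1)
x = np.linspace(-1.0, 1.0, 21)
y = 0.8 * np.sin(np.pi * x)

W1, W2, history = train_mlp(x, y, n_hidden=4)
print(f"epochs: {len(history)}, initial RMSE: {history[0]:.3f}, final RMSE: {history[-1]:.3f}")
```

You can mimic the experimentation described in the text by re-running with `n_hidden` set to each value from 3 to 10, or with `lr=0.05`. The resulting RMSE curves depend on the random seed and on the assumed data, so they will not match the figures exactly.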

Figure 8‑10 MLP: Test data and fitted model

A. Test response data

B. Fitted solution curve

The fitted solution shown in Figure 8‑10B was based on 4 hidden nodes, this being the smallest number that provided a satisfactory fit to the sample data. As might be expected, running the same dataset through a different MLP algorithm requires different settings to achieve the same or similar results. For example, the problem described above was also tested using the NetLab software, which uses a different (generally faster) training procedure known as scaled conjugate gradients. In this model a tanh() activation function was again used for the hidden nodes, whilst a linear function was used for the output node. Weight matrices were initialised with scaled random values drawn from a Normal distribution, and a learning rate of 0.02 was again used. This program variant, however, requires no momentum parameter. It yielded slightly better results than those above in around 150 iterations.

Figure 8‑11 MLP: RMSE curves

A. RMSE vs. epochs, learning rate=0.02

B. RMSE for learning rate=0.05

It is apparent from this discussion that this ANN modelling process is effectively a form of nonlinear regression. It is similar to iterative least squares regression (ILSR), previously mentioned in connection with Geographically Weighted Regression in Section 5.6.3. It is also clear that the process requires a degree of interaction and experimentation with the software tools in order to determine suitable parameter and node settings for a given function estimation problem.
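To make the regression interpretation concrete, the network described above can be written as an explicit nonlinear regression function. It has one input x, n hidden tanh nodes with bias, and one output node with bias. The notation below is ours, not the original source's:

- w^{(1)}_j and b^{(1)}_j are the input-to-hidden weights and biases.
- w^{(2)}_j and b^{(2)} are the hidden-to-output weights and bias.
- g is the output activation: tanh in the first model and the identity (linear) in the NetLab variant.

$$
\hat{y}(x;\boldsymbol{\theta}) = g\!\left(b^{(2)} + \sum_{j=1}^{n} w^{(2)}_j \tanh\!\left(w^{(1)}_j x + b^{(1)}_j\right)\right)
$$

Assuming a squared-error training criterion, the parameters θ are chosen to minimise the error over the 21 training pairs. Progress is monitored through the RMSE:

$$
E(\boldsymbol{\theta}) = \tfrac{1}{2}\sum_{i=1}^{21}\left(\hat{y}(x_i;\boldsymbol{\theta}) - y_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{21}\sum_{i=1}^{21}\left(\hat{y}(x_i;\boldsymbol{\theta}) - y_i\right)^2} = \sqrt{\frac{2E}{21}}
$$

With n = 4 hidden nodes, θ contains 3n + 1 = 13 parameters: n input weights, n hidden biases, n output weights and one output bias. These are estimated from 21 observations, which is a nonlinear least-squares problem. Back-propagation with momentum and scaled conjugate gradients are then two different iterative ways of minimising an objective of this form.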
