In Section 6.5, Gridding and interpolation methods, we noted that interpolation typically involves generating a fine rectangular grid covering the study region and then estimating the surface value or height at every grid intersection or cell. The estimation process uses a simple linear expression to compute grid values:

$$z_j = \sum_{i=1}^{n} \lambda_i z_i$$

where zj is the z-value to be estimated for location j, the λi are a set of estimated weights and the zi are the known (measured) values at the n points (xi,yi). As zj is a simple weighted average, an additional constraint is required to ensure that the weights sum to 1:

$$\sum_{i=1}^{n} \lambda_i = 1$$

The interpolation problem is to determine the optimum weights to be used. If λi=0 for all i except for the measured point closest to the grid intersection, which then has weight 1, this represents a form of nearest-neighbor interpolation. If all n points in the dataset were used and weighted equally, every point would have weight 1/n and every grid location would be given the same z-value, the mean of the measured values. In many cases, as per Tobler’s First Law, measured points closer to location j are more likely to have values similar to zj than those further afield, and hence warrant stronger weighting than observations that are a long way away. Indeed, there may be little to gain from including points that lie beyond a given radius, or that are not among the m<n nearest points, or that lie in certain directions.
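
The weighting choices just described can be compared side by side. The following is a minimal sketch, not the method of any particular package: the sample coordinates and values are hypothetical, and the inverse-distance scheme (with an assumed `power` parameter) is used only as one simple way of weighting closer points more strongly.

```python
import numpy as np

def nearest_neighbour_weights(dist):
    """Weight 1 for the closest measured point, 0 for all others."""
    w = np.zeros_like(dist, dtype=float)
    w[np.argmin(dist)] = 1.0
    return w

def equal_weights(dist):
    """Weight 1/n for every point, so the estimate is the global mean."""
    return np.full(dist.shape, 1.0 / dist.size)

def inverse_distance_weights(dist, power=2.0, eps=1e-12):
    """Illustrative distance-decay weights (an assumption), normalised to sum to 1."""
    raw = 1.0 / np.maximum(dist, eps) ** power
    return raw / raw.sum()

def estimate(weights, z):
    """Weighted average: z_j = sum_i lambda_i * z_i."""
    return float(np.dot(weights, z))

# Hypothetical measured points (x_i, y_i) and values z_i.
xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
z = np.array([10.0, 12.0, 14.0, 20.0])

# Grid location j at which a value is to be estimated.
target = np.array([0.2, 0.0])
dist = np.linalg.norm(xy - target, axis=1)

for scheme in (nearest_neighbour_weights, equal_weights, inverse_distance_weights):
    w = scheme(dist)
    print(scheme.__name__, "sum of weights:", round(float(w.sum()), 6),
          "estimate:", round(estimate(w, z), 3))
```

In each scheme the weights sum to 1, so the estimate always lies between the smallest and largest measured values.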

Each of the methods described has its own approach to computing these weights. In each case the weights are determined by the choice of model, algorithm and user-defined parameters. This differs from the approach described in Section 6.7, Geostatistical Interpolation Methods, where the structure of the input data is first analyzed and a model and preferred set of parameters are then identified on that basis (i.e. these are not directly selected by the user). Despite this apparent distinction, all methods may be subjected to a procedure known as cross-validation, which can assist in the choice of model and parameters. The simplest form of cross-validation is systematic point removal and estimation. The procedure is as follows:

• the model and parameters are chosen

• one by one, each of the original source points is removed from the dataset and the model is used to estimate the value at that point. The difference between the input (true) value and the estimated value is calculated and stored

• statistics are then prepared providing measures such as the mean error, mean absolute error, root mean squared error (RMSE), and the maximum and minimum errors, together with the locations at which these occur

• the user may then modify the model parameters or even select another model, repeating the process until the error estimates are at an acceptable level

• the gridding process then proceeds and the data are mapped
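
The point-removal loop above can be sketched in a few lines. This is an illustrative sketch only: the interpolator `idw_predict` is a hypothetical stand-in (simple inverse distance weighting with an assumed `power` parameter), the data are synthetic, and the error is taken as estimated minus true, which is a sign convention chosen here rather than one stated in the text.

```python
import numpy as np

def idw_predict(xy_known, z_known, xy_target, power=2.0):
    """Hypothetical stand-in model: inverse distance weighted estimate."""
    d = np.linalg.norm(xy_known - xy_target, axis=1)
    if np.any(d == 0):
        # Target coincides with a measured point: return its value.
        return float(z_known[np.argmin(d)])
    w = 1.0 / d ** power
    return float(np.sum(w * z_known) / np.sum(w))

def leave_one_out(xy, z, power=2.0):
    """Remove each source point in turn and estimate its value from the rest."""
    n = len(z)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        errors[i] = idw_predict(xy[keep], z[keep], xy[i], power) - z[i]
    return errors

def summarise(errors):
    """Summary statistics of the kind listed in the procedure."""
    return {
        "mean_error": float(errors.mean()),
        "mean_abs_error": float(np.abs(errors).mean()),
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
        "max_error": float(errors.max()),
        "min_error": float(errors.min()),
        "index_of_largest_abs_error": int(np.argmax(np.abs(errors))),
    }

# Synthetic source data (an assumption): a tilted plane plus noise.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 10, size=(30, 2))
z = 2.0 * xy[:, 0] + xy[:, 1] + rng.normal(0, 0.5, size=30)

# Repeat the procedure for several candidate parameter values.
for power in (1.0, 2.0, 3.0):
    stats = summarise(leave_one_out(xy, z, power))
    print("power", power, "RMSE", round(stats["rmse"], 3))
```

Comparing the printed RMSE values across candidate settings corresponds to the step in which the user modifies parameters and repeats the process before gridding.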

There are variants of this procedure (for example, selection of a random subset of the source data points for simultaneous removal, a technique known as jackknifing), and various alternative cross-validation methods. The latter include:

• re-sampling: obtaining additional data (as opposed to resampling the existing data)

• detailed modeling of related datasets (e.g. easy-to-measure variables): examples might include distance from a feature such as a river or point-source of pollution, or measurement of a highly correlated variable, such as using light emission at night as a surrogate for human activity leading to increased CO2 emissions

• detailed modeling incorporating related datasets: look at possible stratification, modeling non-stationarity, boundaries/faults

• comparison with independent data of the same variate (e.g. aerial photographs)
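
Both the random-subset variant mentioned earlier and comparison with independent data of the same variate come down to scoring estimates at locations that were withheld from the fit. The sketch below illustrates this under stated assumptions: `idw_predict` is again a hypothetical inverse distance weighted interpolator, the data are synthetic, and the "independent" check points are simulated by a random split rather than drawn from a genuinely separate source.

```python
import numpy as np

def idw_predict(xy_known, z_known, xy_target, power=2.0):
    """Hypothetical stand-in model: inverse distance weighted estimate."""
    d = np.linalg.norm(xy_known - xy_target, axis=1)
    if np.any(d == 0):
        return float(z_known[np.argmin(d)])
    w = 1.0 / d ** power
    return float(np.sum(w * z_known) / np.sum(w))

def check_against(xy_src, z_src, xy_check, z_check, power=2.0):
    """Errors (estimated minus observed) at check points not used as sources."""
    est = np.array([idw_predict(xy_src, z_src, p, power) for p in xy_check])
    return est - z_check

def random_split(n, n_check, seed=0):
    """Randomly choose n_check indices to withhold; return (source, check) indices."""
    idx = np.random.default_rng(seed).permutation(n)
    return idx[n_check:], idx[:n_check]

# Synthetic data (an assumption): a tilted plane plus noise.
rng = np.random.default_rng(1)
xy = rng.uniform(0, 10, size=(40, 2))
z = 2.0 * xy[:, 0] + xy[:, 1] + rng.normal(0, 0.5, size=40)

src, chk = random_split(len(z), n_check=10)
errors = check_against(xy[src], z[src], xy[chk], z[chk])
print("RMSE on withheld points:", float(np.sqrt(np.mean(errors ** 2))))
```

With real independent observations, the random split would be replaced by passing the separately collected locations and values as `xy_check` and `z_check`.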