Point samples may be unduly clustered spatially, for a variety of reasons. For example, samples from boreholes and wells may provide the basis for a chemical analysis of groundwater supplies, and the distribution of these may be clustered. Geological and subsea surveys frequently involve intensive data collection in localised areas, with sparsely sampled areas elsewhere. Practical constraints, such as access in built-up or industrialised zones, may also dictate sampling schemes that exhibit strong clustering. And of course, there may be clustering as a feature of the sampling design (e.g. stratified sampling, repeat sampling in small areas to obtain a representative measure of selected attributes). The latter may have been designed to ensure that different regions of interest (ROIs) are represented adequately, or that suspected areas of greater local variation are sampled in more detail than areas that are suspected of being more uniform.
Measured attributes in such instances may not be representative of population (whole region) attributes because observations in close proximity to one another may exhibit strong positive spatial autocorrelation: neighbouring measurements often have very similar attribute values (see further Section 5.5.1). This results in attributes within these regions having undue weight in subsequent calculations. In the extreme, almost all observations may have been taken in a small region with consistently high or low attribute values, whilst very few have been taken from all remaining parts of the study area. Where spatial autocorrelation is present, clustering may therefore substantially bias computed mean values, estimated regression parameters, and confidence intervals.
A partial solution to problems of this kind is known as spatial declustering. Essentially this involves removing or reducing the known or estimated adverse effects of clustering in order to obtain a more representative picture of the underlying population data and/or to ensure that techniques such as feature extraction and surface modelling operate in an acceptable and useful manner. There are several approaches that may be adopted, each of which involves adjusting the sample values prior to further analysis. One of the simplest procedures for declustering involves defining a regular grid over the sampled points (much as in the grid generation procedure described in subsection 5.1.2.1). The grid cell size is selected such that it is meaningful for the problem at hand (e.g. feature extraction) and/or ensures that the average number of points falling in a grid cell is typically 1. Cells which contain many sample points may then be regarded as clustered or possibly over-sampled, and a statistic such as the median value of the measured attribute(s) across all sampled points in that cell may then be used as the single assigned cell (centre) value. Another commonly provided declustering technique based on this grid-overlay approach is to use the density of points as a weighting function. For example, cells with 0 points have zero weight, cells with 1 point have a weight of 1, and cells with n points have each point weighted 1/n (hence in effect this is a simple averaging procedure). In reality both of these procedures amount to a kind of stratification of already sampled locations subsequent to their selection. It is important to note that procedures of this kind are no substitute for randomness in the selection of locations to be sampled and can amount to very dubious practice if the intention is subsequently to build an inferential statistical model using the observations that are retained.
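The two grid-overlay procedures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the method used by any particular package: the function names `cell_decluster` and `weighted_mean`, the grid origin, the cell size, and the sample data are all hypothetical and chosen only to make the effect visible.

```python
from collections import defaultdict
from statistics import median


def cell_decluster(points, cell_size, origin=(0.0, 0.0)):
    """Grid-overlay declustering for a list of (x, y, value) samples.

    Returns (weights, cell_medians): each point in a cell holding n points
    receives weight 1/n, and each occupied cell is assigned the median of
    the values it contains.
    """
    cells = defaultdict(list)
    for idx, (x, y, _) in enumerate(points):
        # Integer (column, row) index of the grid cell containing the point.
        key = (int((x - origin[0]) // cell_size),
               int((y - origin[1]) // cell_size))
        cells[key].append(idx)

    weights = [0.0] * len(points)
    cell_medians = {}
    for key, members in cells.items():
        for idx in members:
            weights[idx] = 1.0 / len(members)
        cell_medians[key] = median(points[i][2] for i in members)
    return weights, cell_medians


def weighted_mean(values, weights):
    """Weighted arithmetic mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical data: four clustered high values in one cell, plus one
# low value in each of three other cells.
samples = [
    (0.1, 0.1, 10.0), (0.2, 0.3, 12.0), (0.4, 0.2, 11.0), (0.3, 0.4, 13.0),
    (1.5, 0.5, 2.0), (0.5, 1.5, 4.0), (1.5, 1.5, 3.0),
]
weights, cell_medians = cell_decluster(samples, cell_size=1.0)
values = [v for _, _, v in samples]

naive_mean = sum(values) / len(values)             # dominated by the cluster
declustered_mean = weighted_mean(values, weights)  # each occupied cell counts once
```

With these invented values the unweighted mean is pulled towards the clustered cell, whereas the 1/n weighting gives each occupied cell the same total influence. The result remains sensitive to the choice of cell size and grid origin, which is one reason the procedure needs justification for the problem at hand.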
In a similar vein, and as an alternative to count-based weighting, area-based weighting is provided as an option in several packages. This involves generating a set of Voronoi regions around each sample point, which results in small areas for closely spaced points and large areas for sparsely arranged sample points. The weights applied are then directly related to these areas. This method is simple but needs some justification and/or validation in terms of the problem under consideration, and may suffer from serious edge-effect problems, depending on how the Voronoi regions are computed (e.g. to the edge of the mapped region, or to the minimum bounding rectangle (MBR) or convex hull of the sample point set). Hybridised variants of area-based weighting (e.g. by adjusting the weights using known physical boundaries and/or nearest neighbour distances) have been shown to substantially reduce mean absolute error (MAE) and root mean squared error (RMSE) in some instances, e.g. see Dubois and Saisana (2002). Revised point-weighting schemes of this kind can be generated within GIS packages and then applied to the target attributes prior to further analysis. The scheme proposed by Dubois and Saisana, for example, which they tested on DEM data for Switzerland, was of the form:
![]()
where $w_i$ is the weight applied to the $i$th sample point, $i = 1, 2, \ldots, n$; $s_i$ is the area of the $i$th Voronoi region; $s_m$ is the average area of all the Voronoi regions (i.e. study area/$n$); and $d_i^2$ is the squared distance of the $i$th sample point to its nearest neighbour. Models of this type do not have universal application, and selection of appropriate declustering procedures requires careful analysis of the sample data, sub-sampling and cross-validating against some form of ground truth where necessary, and then applying adjustments in a manner appropriate to the problem and dataset to hand.
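The plain area-based weighting described earlier (not the hybrid Dubois and Saisana scheme) can be illustrated with a raster approximation: every node of a fine grid over the study rectangle is assigned to its nearest sample point, and the node counts approximate the Voronoi areas clipped to that rectangle. The sketch below is an assumption-laden illustration, not a package implementation: the function name `approx_voronoi_areas`, the unit-square bounds, the resolution, and the point set are hypothetical, and taking the weight as the ratio of each region's area to the average area is one simple reading of "directly related to these areas".

```python
def approx_voronoi_areas(points, bounds, resolution=100):
    """Approximate the Voronoi region area of each (x, y) point.

    Regions are clipped to bounds = (xmin, ymin, xmax, ymax). The rectangle
    is divided into resolution x resolution cells and each cell centre is
    assigned to its nearest sample point (ties go to the earlier point).
    """
    xmin, ymin, xmax, ymax = bounds
    dx = (xmax - xmin) / resolution
    dy = (ymax - ymin) / resolution
    counts = [0] * len(points)
    for row in range(resolution):
        cy = ymin + (row + 0.5) * dy
        for col in range(resolution):
            cx = xmin + (col + 0.5) * dx
            nearest = min(
                range(len(points)),
                key=lambda i: (points[i][0] - cx) ** 2 + (points[i][1] - cy) ** 2,
            )
            counts[nearest] += 1
    cell_area = dx * dy
    return [c * cell_area for c in counts]


# Hypothetical sample: three tightly clustered points and one isolated point.
pts = [(0.20, 0.20), (0.25, 0.20), (0.20, 0.25), (0.80, 0.80)]
areas = approx_voronoi_areas(pts, bounds=(0.0, 0.0, 1.0, 1.0))

mean_area = sum(areas) / len(areas)          # study area / n
area_weights = [a / mean_area for a in areas]  # assumed weighting: area ratio
```

Changing `bounds` (for example from the mapped region to the bounding rectangle of the points) alters the areas of the outermost regions, which is the edge-effect problem noted above. An exact tessellation from a computational geometry library would normally replace this brute-force approximation for real datasets.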