Areal interpolation


The polygon-on-polygon overlay process highlights an important problem: “how should attributes be assigned from input polygons to newly created polygons whose size and shape differ from the original set?” This problem is closely related to that of re-districting. Suppose that we have a set of non-overlapping polygons that represent census tracts, and a second set of polygons that represent hospital service districts, generated (for example) as simple Voronoi polygons around each hospital (see further, Section 4.2.14, Tessellations and triangulations). We wish to estimate the population numbers and age/sex mix for the service districts. Based on this information we might assign budgets to the hospitals or iteratively adjust the service districts to provide a more even set of service areas (see further, Section 4.2.11, Districting and re-districting).

In a simple overlay procedure we calculate the proportion of each census tract that intersects with each separate service district (by area) and then assign this proportion of the relevant attribute to the hospital in question. This assumes that the distribution of each attribute is constant, or uniform, within each census tract, while also ensuring that the total counts for each attribute remain consistent with the census figures (so-called volume-preserving or pycnophylactic assignment). The standard intersection operator in many GIS packages will not carry out such proportional assignment, but will simply carry over the source attributes to the target polygons. Assuming the source attribute table includes an explicit area value, and the intersected target provides an explicit or intrinsic measure of the intersection areas, the proportional allocation procedure may be carried out by adding a calculated field containing the necessary adjustments. Note that this discussion assumes spatially extensive attributes (see Section 2.1.2, Attributes), whereas with spatially intensive attributes alternative procedures must be applied (for example, initially using kernel density estimation; see Section 4.3.4, Density, kernels and occupancy, for more details).
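The calculated-field step can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the intersection areas have already been produced by a GIS overlay, and the function name, zone identifiers and figures are all hypothetical.

```python
from collections import defaultdict

def area_weighted_estimates(source_values, source_areas, intersections):
    """Allocate a spatially extensive attribute from source to target zones.

    source_values: {source_id: count, e.g. population}
    source_areas:  {source_id: total area of the source zone}
    intersections: iterable of (source_id, target_id, intersection_area)
    Assumes the attribute is uniformly distributed within each source zone.
    """
    estimates = defaultdict(float)
    for source_id, target_id, overlap_area in intersections:
        # Proportion of the source zone falling inside this target zone.
        share = overlap_area / source_areas[source_id]
        estimates[target_id] += source_values[source_id] * share
    return dict(estimates)

# Hypothetical figures: three tracts split across two service districts.
population = {"T1": 1000, "T2": 600, "T3": 400}
tract_area = {"T1": 4.0, "T2": 2.0, "T3": 1.0}
overlaps = [
    ("T1", "A", 3.0), ("T1", "B", 1.0),
    ("T2", "A", 0.5), ("T2", "B", 1.5),
    ("T3", "B", 1.0),
]
district_population = area_weighted_estimates(population, tract_area, overlaps)
```

Because the intersection areas for each tract sum to the tract's own area, the district estimates sum to the original census total, which is the volume-preserving property described above.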

Figure 4‑16 and Figure 4‑17 illustrate this process for test UK census Output Areas in part of Manchester. The population totals for each source area are shown in Figure 4‑16, together with the area of overlap with a sample 1 km square region and a selected source polygon intersection (highlighted in red).

Figure 4‑16 Areal interpolation from census areas to a single grid cell


Figure 4‑17 shows the result of applying area-based proportional assignment to the square region in question. The region highlighted in red in Figure 4‑16 has its population estimate of 173 reduced to 32 in this case, with the sum of these proportional assignments (1632) providing the estimated total population for the entire square.
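The arithmetic for the highlighted polygon can be written out as follows. The symbols are introduced here for illustration, and the overlap fraction is inferred from the two quoted figures rather than stated in the text: with $P_s$ the source population, $A_s$ the source area and $A_{s \cap t}$ the area of overlap with the square,

$$\hat{P}_{s \to t} = P_s \, \frac{A_{s \cap t}}{A_s}, \qquad 173 \times \frac{A_{s \cap t}}{A_s} \approx 32 \;\Rightarrow\; \frac{A_{s \cap t}}{A_s} \approx 0.18$$

so a little under one fifth of the highlighted polygon lies within the square.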

Variants of this procedure attempt to correct for the often unrealistic assumption of uniformity in the spatial distribution within polygons. If data are available for smaller sample regions (e.g. unit postcode areas in the above example) then these may be utilized in a similar manner to that already described. Another alternative is to model the distribution of the variable of interest (e.g. population) as a continuous surface. Essentially this procedure assigns the attribute value of interest to a suitable polygon center and then calculates the attribute value over a fine grid within the polygon by taking into account the values of the attribute at adjacent or nearby polygon centers.

Figure 4‑17 Proportionally assigned population values


For example, if the source attribute values were much higher east of the target polygon than west, the values assigned within the source polygon might be assumed to show a slope from east to west rather than be distributed uniformly. Once values for the source polygons have been estimated on the fine grid, they are summed for each source polygon and adjusted to make sure that the totals match those for the original polygon. The individual grid cells may then be re-assigned to the target polygons (the hospital service areas in our example), giving a hopefully more accurate picture of the attribute values for these service areas. Current mainstream GIS packages do not tend to support the latter procedure, although simple sequences of operations, scripts or programs can be written to make such field assignments.
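The adjustment and re-assignment steps might be scripted along the following lines. This is a sketch under stated assumptions: each fine grid cell is taken to lie wholly within one source and one target polygon, the raw cell values come from some surface model not shown here, and the function and variable names are hypothetical.

```python
from collections import defaultdict

def reaggregate(cells, source_totals):
    """Rescale raw grid-cell values to source totals, then sum by target.

    cells: list of (source_id, target_id, raw_value), one entry per cell
    source_totals: {source_id: known total for the source polygon}
    Assumes every source polygon has a positive sum of raw cell values.
    """
    raw_sums = defaultdict(float)
    for source_id, _, raw_value in cells:
        raw_sums[source_id] += raw_value

    target_totals = defaultdict(float)
    for source_id, target_id, raw_value in cells:
        # Scale factor that makes this source polygon's cells sum to its total.
        scale = source_totals[source_id] / raw_sums[source_id]
        target_totals[target_id] += raw_value * scale
    return dict(target_totals)

# Hypothetical example: two source polygons, five grid cells, two targets.
tract_totals = {"S1": 100.0, "S2": 40.0}
cells = [
    ("S1", "A", 1.0), ("S1", "A", 2.0), ("S1", "B", 5.0),
    ("S2", "B", 3.0), ("S2", "B", 1.0),
]
service_totals = reaggregate(cells, tract_totals)
```

The rescaling is what keeps the result volume-preserving: whatever shape the modeled surface takes inside a source polygon, the cells belonging to that polygon still sum to its original total.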

The SURPOP online software utility was an example of such a program. In this instance the population, Pi, of each grid cell was approximated as:

$$P_i = \sum_{j=1}^{N} P_j \, w_{ij}$$

where Pj is the population of “centroid” j, and wij is the weight applied to centroid j for grid cell i. The number N is determined by the windowsize used, i.e. the size of a moving NxN grid cell region. The weights distribute a proportion of each centroid’s population value to the cells within the window and are determined as a form of distance decay function (see also, Section 4.4.5, Distance decay models) with a decay parameter, α:
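One distance decay form that is consistent with the symbols defined here, given as an assumption rather than as the function confirmed for SURPOP, equals 1 at the centroid itself and falls to 0 once the cell is as far from the centroid as the average inter-centroid distance:

$$w_{ij} = \begin{cases} \left( \dfrac{d^{2} - s_{ij}^{2}}{d^{2} + s_{ij}^{2}} \right)^{\alpha}, & s_{ij} < d \\[1ex] 0, & s_{ij} \ge d \end{cases}$$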

where d is the average inter-centroid distance within the sampled window, and sij is the distance between cell i and centroid j. Following tests on UK enumeration districts, α is typically set to 1 in this model. The values chosen for the grid resolution, windowsize and decay parameter, together with the selection of centroid locations, all affect the resulting surface model.
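A centroid-based surface of this general kind can be sketched as below. This is not the SURPOP implementation. It makes three simplifying assumptions that should be read as such: the decay function is an assumed form that equals 1 at the centroid and 0 at distance d, a single fixed d is supplied instead of being computed within a moving window, and each centroid's weights are normalized so that its whole population is distributed. All names and figures are hypothetical.

```python
import math

def decay_weight(s, d, alpha=1.0):
    """Assumed distance decay: 1 at s = 0, falling to 0 for s >= d."""
    if s >= d:
        return 0.0
    return ((d * d - s * s) / (d * d + s * s)) ** alpha

def population_surface(centroids, cell_centers, d, alpha=1.0):
    """Spread centroid populations over grid cells: P_i = sum_j P_j * w_ij.

    centroids: list of (x, y, population)
    cell_centers: list of (x, y) grid cell centers
    Weights for each centroid are normalized to sum to 1 over all cells.
    """
    surface = [0.0] * len(cell_centers)
    for cx, cy, pop in centroids:
        raw = [decay_weight(math.hypot(x - cx, y - cy), d, alpha)
               for x, y in cell_centers]
        total = sum(raw)
        if total == 0.0:
            continue  # no cell within range of this centroid; it is skipped
        for i, w in enumerate(raw):
            surface[i] += pop * w / total
    return surface

# Hypothetical 5 x 5 grid of unit cells, stored row by row.
cell_centers = [(col + 0.5, row + 0.5) for row in range(5) for col in range(5)]
centroids = [(1.5, 1.5, 120.0), (3.5, 2.5, 80.0)]
surface = population_surface(centroids, cell_centers, d=2.0, alpha=1.0)
```

Changing the grid spacing, d or alpha in this sketch alters how sharply population is concentrated around each centroid, which mirrors the sensitivity to parameter choices noted above.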

The adjusted volume-preserving assignment approaches described above have significant drawbacks where the geography of the study region is highly variable and/or where attributes do not vary in a similar manner (do not exhibit strong positive covariance). For example, population is typically concentrated in urban areas and not spread evenly across arbitrary zones. At urban-rural borders and in rural areas with distinct settlements a far better approach would be to utilize ancillary information, such as land-use, road network density (by type) or remotely sensed imagery, to adjust attribute allocation. These ancillary data, which are often selected on the basis that they are readily available in a convenient form for use within GIS packages, can then be used as a form of weighting. Recent tests (see for example, Hawley and Moellering, 2005) have shown that the quality of areal interpolation can be substantially improved by such methods.
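A simple form of ancillary weighting, often called dasymetric allocation, can be sketched as follows. It is an illustration only and not the procedure tested by Hawley and Moellering: it assumes a three-way overlay of source zones, target zones and land-use classes has already been carried out, and the class weights, names and figures are hypothetical.

```python
from collections import defaultdict

def dasymetric_estimates(source_values, pieces, class_weights):
    """Allocate source counts to targets using land-use weighted areas.

    source_values: {source_id: count}
    pieces: iterable of (source_id, target_id, land_use_class, area),
            one entry per fragment of the three-way overlay
    class_weights: {land_use_class: relative density weight}
    Assumes each source zone has a positive total weighted area.
    """
    pieces = list(pieces)
    weighted_area = defaultdict(float)
    for source_id, _, land_use, area in pieces:
        weighted_area[source_id] += area * class_weights[land_use]

    estimates = defaultdict(float)
    for source_id, target_id, land_use, area in pieces:
        # Share of the source count given to this fragment.
        share = area * class_weights[land_use] / weighted_area[source_id]
        estimates[target_id] += source_values[source_id] * share
    return dict(estimates)

# Hypothetical tract of area 4.0, mostly open land inside district A.
population = {"T1": 1000}
weights = {"residential": 1.0, "open": 0.0}
fragments = [
    ("T1", "A", "residential", 0.5), ("T1", "A", "open", 2.5),
    ("T1", "B", "residential", 1.5),
]
weighted_result = dasymetric_estimates(population, fragments, weights)
```

In this example district A covers three quarters of the tract by area, so plain area weighting would assign it 750 people, whereas weighting by residential land assigns it 250. The tract total is preserved in both cases.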