Distance decay models

Almost by definition, spatial modeling makes extensive use of distance measures. In many instances a suitable metric is selected and used directly, but in many others some generalized function of distance is used. Typically these models assume that the strength of the relationship between locations, or the effect one location has upon another, diminishes as separation increases. A wide range of GIS analysis techniques make use of this concept; several are described briefly below and are addressed separately in later chapters of this Guide. There are, of course, many situations in which such models are not appropriate, for example telecommunications links, and cases where time-distance computations are more relevant (e.g. single- or multi-modal network travel modeling, discussed further in Section 7.1 et seq., Introduction to Network and Location Analysis).

The most widely used distance decay models are those in which distance is introduced as an inverse function raised to some power, typically 1 or 2. Thus the value of some variable of interest, $z$, at location $j$, $z_j$, might be modeled as some function, $f()$, of the attribute values, $z_i$, associated with other locations, $i$, weighted by the inverse of the distance separating locations $i$ and $j$, $d_{ij}$, raised to a power, $\beta$:

$$z_j = f\left(\sum_{i} \frac{z_i}{d_{ij}^{\beta}}\right)$$

The exponent, β, has the effect of reducing the influence of other locations as the distance to these increases. With β=0 distance has no effect, whilst with β=1 the contribution of each location is inversely proportional to its distance. Values of β>>1 rapidly diminish the contribution to the expression from locations that are more remote.
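The effect of the exponent can be illustrated with a minimal sketch. It assumes that the function f() is a normalized weighted average (the form commonly used for IDW interpolation); the function name `idw_estimate` and the sample data are hypothetical and chosen only for illustration.

```python
import math

def idw_estimate(target, samples, beta=2.0):
    """Inverse-distance-weighted average of sample values at `target`.

    target  : (x, y) location at which a value is wanted
    samples : list of ((x, y), z) pairs
    beta    : distance decay exponent (assumed >= 0)
    """
    num = 0.0
    den = 0.0
    for (x, y), z in samples:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return z  # coincident sample: return it rather than divide by zero
        w = 1.0 / d ** beta  # weight falls as distance rises
        num += w * z
        den += w
    return num / den

# Hypothetical sample points at distances 1, sqrt(2) and 2 from the target
samples = [((0.0, 0.0), 10.0), ((1.0, 0.0), 20.0), ((0.0, 3.0), 40.0)]
target = (0.0, 1.0)
for beta in (0, 1, 2, 8):
    print(beta, round(idw_estimate(target, samples, beta), 3))
```

With β=0 the estimate is the simple mean of the samples; as β increases the estimate moves towards the value at the nearest sample.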

One of the commonest applications of this kind of model is in surface interpolation. Most GIS and surface analysis packages provide support for such operations (see further, Section 6.6.1, Deterministic Interpolation Methods). However, inverse distance weighting (IDW), as it is known, is used in many other situations. An example is the study by Draper et al. (2005) referred to in Section 3.1, Spatial analysis as a process. The distance of cancer cases and controls from overhead power lines was calculated and the inverse and inverse squared distances were then used as weights in modeling the observations. The idea behind this approach was to take into account the fact that electromagnetic radiation intensity diminishes with distance from the source.

Inverse distance modeling is one of the main techniques used in the fields of transportation, travel demand modeling and trade area analysis. The propensity to travel between pairs of locations is often assumed to be related to the separation between the locations and some measure of the size of the source and destination zones. The basic model is derived by analogy with the force of gravity:

$$T_{ij} = \alpha \frac{M_i M_j}{d_{ij}^{\beta}}$$

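Here the interaction (e.g. number of trips) between zones $i$ and $j$ is taken to be proportional to the product of some measure of the mass, $M$, at each location, divided by the distance between the zones raised to the power $\beta$. As before, $\beta$ is a parameter to be estimated, as is the scaling constant $\alpha$. The general form of such ‘spatial interaction’ models is:

$$T_{ij} = A_i O_i B_j D_j f(d_{ij})$$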

where $T_{ij}$ is the number of trips between zones $i$ and $j$, $O_i$ is a measure of the size of the origin zone $i$ (e.g. the total number of commuter trips originating in zone $i$), $D_j$ is a measure of the size of the destination zone (e.g. the total number of workplaces in zone $j$), and $f(d_{ij})$ is some appropriate function of the distance or degree of separation of zones $i$ and $j$. $A_i$ and $B_j$ are balancing factors of the form:

$$A_i = \left[\sum_{j} B_j D_j f(d_{ij})\right]^{-1}, \qquad B_j = \left[\sum_{i} A_i O_i f(d_{ij})\right]^{-1}$$

These latter expressions are designed to ensure that the modeled row and column totals add up to the total number of trips from and to each zone (i.e. a so-called doubly-constrained model, which assumes that the totals are correct and not subject to sampling error). Note that by taking logarithms the model is transformed into a (weighted) linear sum, and is often analyzed in this form. In this case the weights are powers applied to the origin, destination and distance variables that must be estimated from sample data.
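Because each balancing factor depends on the other, they are usually found iteratively. The following is a minimal sketch of that procedure, assuming the standard reciprocal-sum form of the balancing factors and that the origin and destination totals sum to the same overall number of trips. The function name, the zone sizes and the inverse-square impedance are illustrative assumptions, not taken from any particular package.

```python
def doubly_constrained(O, D, dist, f, max_iter=200, tol=1e-9):
    """Estimate T[i][j] = A[i]*O[i]*B[j]*D[j]*f(dist[i][j]).

    A and B are updated alternately until the modeled row totals
    match the origin totals O to within `tol`.
    """
    n, m = len(O), len(D)
    F = [[f(dist[i][j]) for j in range(m)] for i in range(n)]
    A = [1.0] * n
    B = [1.0] * m
    for _ in range(max_iter):
        A = [1.0 / sum(B[j] * D[j] * F[i][j] for j in range(m)) for i in range(n)]
        B = [1.0 / sum(A[i] * O[i] * F[i][j] for i in range(n)) for j in range(m)]
        # Column totals now match D by construction; check the row totals
        row_err = max(
            abs(A[i] * O[i] * sum(B[j] * D[j] * F[i][j] for j in range(m)) - O[i])
            for i in range(n)
        )
        if row_err < tol:
            break
    return [[A[i] * O[i] * B[j] * D[j] * F[i][j] for j in range(m)] for i in range(n)]

# Hypothetical example: 2 origin zones, 3 destination zones, 300 trips in total
O = [100.0, 200.0]
D = [120.0, 80.0, 100.0]
dist = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
T = doubly_constrained(O, D, dist, lambda d: d ** -2.0)
for row in T:
    print([round(t, 2) for t in row])
```

Dropping one of the two updates (holding A or B fixed at 1) gives a singly constrained variant of the same sketch.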

For a fuller discussion and background to such modeling see CATMOG 2 and CATMOG 4; Roy (2004), which provides a very complete and up-to-date book on the subject; Miller and Shaw (2001, Ch.8); Wilson (1967); and Fischer (2006, Ch.3, reprinted from a paper of 2000). Fischer’s paper describes both the basic spatial interaction model outlined above and alternative methods for model calibration, including least squares, maximum likelihood and, in Parts II and IV of the same volume, neural network methods.

Common variants of this type of model are the origin-constrained and destination-constrained versions, in which either the $A_i$ or the $B_j$ are determined by balancing, but not both. In trade area modeling, gravity models or Huff models (which are origin constrained) are often used. The former typically use inverse distance power models coupled with measures of location attraction, such as retail square footage. Vertical Mapper, for the MapInfo product, is an example of GIS software that provides such facilities.

Typical models for $f(d_{ij})$ include: simple inverse distance models (as above); negative exponential models; and combined or Gamma models. All three types (plus others) are supported by TransCAD as impedance functions within its trip distribution modeling facility. CrimeStat incorporates similar functions within its trip distribution modeling component. The statistical models within GeoBUGS provide for a generalized powered exponential family of distance decay models of the form:

$$f(d) = \exp\left[-(\phi d)^{k}\right]$$

where φ is the principal decay parameter relating the decay pattern to the effective range of correlations that are meaningful. Note that the parameter k acts as a smoothing factor and with k=2 this is essentially a Gaussian distance decay function. Geographically Weighted Regression (GWR) utilizes similar decay functions (the authors describe these as kernel functions — see further, Section 5.6.3, Geographically Weighted Regression (GWR)), this time of the form:
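Two kernel forms widely used in the GWR literature are the Gaussian and bi-square kernels, where $d$ is distance and $h$ is the bandwidth. They are given here as an illustration, on the assumption that they are representative; they are not necessarily the exact expressions the text refers to.

```latex
% Gaussian kernel (unbounded)
f(d) = \exp\left[-\frac{1}{2}\left(\frac{d}{h}\right)^{2}\right]

% Bi-square kernel (bounded: zero at and beyond the bandwidth)
f(d) =
\begin{cases}
  \left[1 - \left(\dfrac{d}{h}\right)^{2}\right]^{2} & d < h \\
  0 & \text{otherwise}
\end{cases}
```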

In these functions the parameter, $h$, is also known as the bandwidth. A small bandwidth results in very rapid distance decay, whereas a larger value will result in a smoother weighting scheme. The Gaussian model is an unbounded function, and as an alternative GeoBUGS provides the following bounded or disk model of distance decay:

$$f(d) = \frac{2}{\pi}\left[\cos^{-1}\left(\frac{d}{\alpha}\right) - \frac{d}{\alpha}\sqrt{1 - \frac{d^{2}}{\alpha^{2}}}\right], \quad d < \alpha$$

with f(d)=0 otherwise. This model decays slightly more rapidly than a straight line over the interval [0,α), especially over the initial part of the interval. The parameter, α, here has a similar role to that of the bandwidth parameter in the GWR expressions above.
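The comparison with a straight line can be checked numerically. The sketch below assumes the disk model takes the form f(d) = (2/π)[arccos(d/α) − (d/α)√(1 − (d/α)²)] for d < α, as given in the GeoBUGS documentation. The function names and the value α=10 are illustrative only.

```python
import math

def disk_decay(d, alpha):
    """Bounded 'disk' decay: 1 at d = 0, falling to 0 at d = alpha."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    if d >= alpha:
        return 0.0
    r = d / alpha
    return (2.0 / math.pi) * (math.acos(r) - r * math.sqrt(1.0 - r * r))

def linear_decay(d, alpha):
    """Straight-line decay from 1 at d = 0 to 0 at d = alpha, for comparison."""
    return max(0.0, 1.0 - d / alpha)

alpha = 10.0  # hypothetical range parameter
for d in (0.0, 2.5, 5.0, 7.5, 10.0):
    print(d, round(disk_decay(d, alpha), 4), round(linear_decay(d, alpha), 4))
```

The initial slope of the disk function is −4/(πα), compared with −1/α for the straight line, which is why the difference is most marked at short distances.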

All such models may suffer from problems associated with clustering — for example where a number of alternative (competing) destinations are equidistant from an origin zone, but may be located either in approximately the same direction or in very different directions. Such clustering has implications for modeling, since competition and even interaction between destination zones is inadequately represented. Most GIS and related software packages do not provide direct support for analysis of such factors, and it is up to the user to specify and implement an appropriate modeling framework to address these issues.

These are all models based on a mathematical function of distance. In addition to such models other models or measurements of impedance are extensively used, typically in the form of origin-destination (O-D) matrices. The latter may be derived from measurements or from alternative distance-related models such as intervening opportunities, see Stouffer (1940), or lagged adjacencies (e.g. see the spatial weights modeling options in GeoDa, and Section 5.5.2, Global spatial autocorrelation). Distance, d, in this instance is a free-space or network-related measure of impedance to interaction (e.g. network distance, time, cost or adjacency). The most widely implemented distance impedance expressions are of the form:

$\alpha/d^{\beta}$, $\alpha e^{-\beta d}$ and $\alpha e^{-\gamma d}/d^{\beta}$

where α, β and γ are parameters to be selected or estimated. Figure 4‑74 and Figure 4‑75 illustrate the first two of these models, in each case standardized with an α value to provide a value of 10 at d=0.1 units and a range of values for β. The Gamma model provides curves that are intermediate between the two sets shown, depending on the specific values of γ and β chosen or fitted from sample data.
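The standardization described above can be sketched as follows. The sketch assumes the three expressions are the inverse power, negative exponential and Gamma forms with positive β and γ; the function names and the helper `alpha_for` are hypothetical.

```python
import math

def inverse_power(d, alpha, beta):
    """alpha / d**beta"""
    return alpha / d ** beta

def neg_exponential(d, alpha, beta):
    """alpha * exp(-beta * d)"""
    return alpha * math.exp(-beta * d)

def gamma_model(d, alpha, beta, gamma):
    """alpha * exp(-gamma * d) / d**beta"""
    return alpha * math.exp(-gamma * d) / d ** beta

def alpha_for(func, target, d0, *params):
    """Value of alpha that makes func(d0, alpha, *params) equal to target."""
    return target / func(d0, 1.0, *params)

d0, target = 0.1, 10.0  # standardize every curve to the value 10 at d = 0.1
for beta in (0.5, 1.0, 2.0):
    a_inv = alpha_for(inverse_power, target, d0, beta)
    a_exp = alpha_for(neg_exponential, target, d0, beta)
    for d in (0.1, 0.5, 1.0, 2.0):
        print(beta, d,
              round(inverse_power(d, a_inv, beta), 4),
              round(neg_exponential(d, a_exp, beta), 4))
```

The same helper works for the Gamma form, for example `alpha_for(gamma_model, target, d0, 1.0, 0.5)`.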

Figure 4‑74 Inverse distance decay, $\alpha/d^{\beta}$

Figure 4‑75 Exponential distance decay, $\alpha e^{-\beta d}$

Pure inverse distance models (and the Gamma model) have a serious problem when distances are small, since the expression tends to infinity as d tends to 0. A partial solution is to add a small distance adjustment to such formulas, replacing d with d+ε where ε>0, or to exclude distances below a given threshold from the inverse distance calculations (possibly defining a fixed value for the function in such cases). These approaches stabilize computations but are essentially arbitrary. The exponential model has the advantage that it tends to a finite value, α, as d tends to 0, so it may be preferable if observations are known to be very closely placed or even coincident.
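The two workarounds can be compared with the exponential form in a short sketch. The function names and the default values of ε and the threshold are arbitrary assumptions chosen for illustration, and the fixed value used below the threshold is simply the function value at the threshold.

```python
import math

def inverse_power_offset(d, alpha=1.0, beta=2.0, eps=1e-3):
    """alpha / (d + eps)**beta: finite at d = 0, but eps is an arbitrary choice."""
    return alpha / (d + eps) ** beta

def inverse_power_floor(d, alpha=1.0, beta=2.0, d_min=0.01):
    """Treat any distance below d_min as d_min, giving a fixed value there."""
    return alpha / max(d, d_min) ** beta

def neg_exponential(d, alpha=1.0, beta=2.0):
    """alpha * exp(-beta * d): equals alpha at d = 0 with no adjustment needed."""
    return alpha * math.exp(-beta * d)

for d in (0.0, 0.005, 0.05, 0.5):
    print(d, inverse_power_offset(d), inverse_power_floor(d), neg_exponential(d))
```

The results at very small distances are dominated by the choice of ε or threshold, which is the sense in which these adjustments are arbitrary.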

Models of this kind may be constrained in various ways: by limiting the maximum range to be included in computations; by limiting the maximum number of locations to be considered; by restricting the range of values the parameters may take; and by explicitly taking account of barriers (e.g. inaccessible regions). More sophisticated analysis would take into account aspects of the dynamics and uncertainty associated with the interactions or flows being modeled. Somewhat different models of distance decay are incorporated into radial basis interpolation and geostatistical analysis (see further, Sections 6.6.4, Radial basis and spline functions, and 6.7, Geostatistical Interpolation Methods).

The analysis of data variation by distance or distance bands (e.g. variogram and correlogram analysis) is designed to identify overall patterns of variation in a measured variable with linear (Euclidean) distance. In such instances a maximum distance or range is identified if possible, and the pattern of variation in the measured variable is modeled to reflect the diminishing associations observed as distance increases. Here the models effectively treat distance as an independent variable and the observations (generally averaged by distance band) as a form of dependent variable.
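Averaging by distance band can be sketched as below, using the semivariance (half the squared difference between paired values) as the measure of variation. This is an illustrative assumption about the measure; the function name, band width and sample points are hypothetical.

```python
import math

def semivariance_by_band(points, band_width, max_distance):
    """Mean semivariance 0.5*(z_a - z_b)**2 of point pairs, grouped by distance band.

    points : list of (x, y, z) tuples
    Returns a list of (band_midpoint, mean_semivariance, pair_count),
    omitting bands that contain no pairs.
    """
    n_bands = int(math.ceil(max_distance / band_width))
    sums = [0.0] * n_bands
    counts = [0] * n_bands
    for a in range(len(points)):
        xa, ya, za = points[a]
        for b in range(a + 1, len(points)):
            xb, yb, zb = points[b]
            d = math.hypot(xa - xb, ya - yb)
            if d >= max_distance:
                continue  # beyond the maximum range considered
            k = min(int(d // band_width), n_bands - 1)
            sums[k] += 0.5 * (za - zb) ** 2
            counts[k] += 1
    return [((k + 0.5) * band_width, sums[k] / counts[k], counts[k])
            for k in range(n_bands) if counts[k] > 0]

# Hypothetical transect in which z rises steadily with x
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (2.0, 0.0, 2.0), (3.0, 0.0, 3.0)]
bands = semivariance_by_band(points, band_width=1.0, max_distance=3.5)
for mid, gamma, n in bands:
    print(mid, gamma, n)
```

The band midpoints play the role of the independent variable and the band averages that of the dependent variable when a model is fitted.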