Metrics

Navigation:  Building Blocks of Spatial Analysis > Distance Operations >

Metrics

Previous pageReturn to chapter overviewNext page

Introduction

The term metric in connection with spatial analysis normally refers to the form of distance calculation applied, although it has a number of other meanings (e.g. the metric system of weights and measures; in the Manifold GIS package it means the sequence of coordinates that defines a vector object). Here we stick to its use in connection with distance computation. The essential features of the notion of distance are:

a finite or infinite set of distinct, definable objects, and
a rule or set of rules for determining the degree of separation of object pairs (a “measure”).

From a mathematical perspective a metric dij, denoting a measure of separation between points i and j, should satisfy a number of specific conditions. These are:

The standard formulas dE and dS we discussed earlier in this Guide meet all four of these requirements, but there are many situations in which one or more of these conditions fails to be met. Particularly common are cases in which either the triangle inequality or symmetry conditions fail. Typically these occur in transport networks, where one-way routing systems apply or where there are variable tariffs or traffic flow asymmetries. Circular routes, such as the inner and outer circuits of Glasgow’s “Clockwork Orange” underground system (Figure 4‑54) provide an example of asymmetry. This service operates one clockwise and one anti-clockwise route, each of length 10.4 kms and transit time of 24 minutes. On a given circuit, the journey-time from Kelvinbridge to Hillhead, for example, which is 2 to 3 minutes, is clearly not the same as Hillhead to Kelvinbridge (over 20 minutes), and between stations it is not possible to change circuits.

Figure 4‑54 Glasgow's Clockwork Orange Underground

clip0107

Asymmetry is also a characteristic of a number of telecommunications networks, where some modem standards and some broadband internet technologies (e.g. ADSL) operate in this way — information is generally received at a faster rate than it can be transmitted. A similar situation applies in many hybrid broadcast/telecommunications systems, such as satellite, coax or fiber-optic downstream and copper/cable upstream, and in some mobile network architectures. Pure broadcast technologies, such as terrestrial television and radio, are strictly one-way and only facilitate two-way communication via completely separate media (such as post, telephone or email). This kind of asymmetry also reflects a fundamental human characteristic — our ability to receive and process information is far greater than our ability to respond to it in an active manner.

If the triangularity constraint is not met we have the possibility that

which offends our notion of “between-ness” and thus of conventional notions of distance and space. For example, this would suggest that the "distance" from London to Madrid plus the "distance" from Madrid to Lima might be less than (cheaper than, faster than) the direct "distance" from London to Lima. A similar pattern can be found with railway and other transport timetables, where express trains and stopping trains in combination may reach a destination faster than a direct stopping train, even though the rail distance traveled may be greater. It may not be possible to draw a time- or space-consistent map of such data, but this does not mean that the measure used is not valuable. Indeed, in such cases non-cartographic representations (e.g. GIS internal representation, to/from matrices) of the data are often more meaningful and useful in optimization and routing problems, and as such are explicitly supported in GIS-T packages such as TransCAD and Cube.

Terrestrial distances

Most GIS software uses the standard Euclidean formula, dE, when computing distances between plane coordinate pairs and for calculation of “local” distances, as in the case of the vector buffer operator (see further, Section 4.4.4, Buffering). It is important to check to see if distance computations provided represent true distances (e.g. in miles or kms) since Euclidean computations on raw latitude/longitude values produces incorrect results. Some packages, such as MapInfo, provide the option of using spherical distance, dS, as a default. Manifold computes distances either using Euclidean or Ellipsoidal methods (supported via its functions Distance() and DistanceEarth(), and in its Tracker facility).

The MATLab Mapping Toolbox provides a range of distance and associated mapping functions, which help to illustrate some of the differences that apply on larger scale maps. Figure 4‑55 shows a map of the North Atlantic on a Mercator projection, with the great circle path and the line of constant bearing, or rhumb line between Boston, Mass. and Bristol, England. The MATLab function distance() will compute the length of each path, taking the latitude and longitude of Boston and Bristol as inputs, and optionally a geoid parameter. The GRASS module geodist.c provides similar functionality. In the example shown the great circle distance is computed as 5105.6 kms based on its default terrestrial radius of 6371 kms; the Haversine formula we gave in Section 4.2.1, Length and area for vector data, produces the same result, whilst the rhumb line distance is 5283.4 kms.

Figure 4‑55 Great circle and constant bearing paths, Boston to Bristol

clip0108

The geoid parameter, rather confusingly named, provides the option to select a model of the Earth which has an ellipsoidal form, specified by a two element vector: (i) the length of the major semi-axis of the ellipsoid and; (ii) its eccentricity or flattening coefficient. If these parameters are not specified it is assumed that the two semi-axes are equal, and thus provide a spherical or great circle measure. If geoid (ellipsoid) values are provided then an approximation method is used to calculate distances. It first treats the two terrestrial coordinates as if they were Cartesian coordinates in three dimensions and calculates the Euclidean distance between them. This value, which is the chord length of an elliptical section, is then adjusted by a correction factor to produce the final estimate of ellipsoidal distance. In our example this gives a revised distance of 5110.4 kms, a few kilometers longer than the great circle distance, as expected. Finally, yet another alternative is to use the more accurate Vincenty (1975) algorithm to obtain the ellipsoidal distance, which gives a figure of 5119.7 kms for the matching ellipsoid in this example.

Extended Euclidean and Lp-metric distances

Euclidean distance is the most widely used GIS metric. In 2D space (planar or projected space) the Euclidean distance between points a(x1,y1) and b(x2,y2) is a special case of the n-dimensional Euclidean metric. Within GIS this extended Euclidean metric is used to determine 3D distances (across surfaces and within solid structures) and in multi-dimensional analyses such as clustering. For 3D problems using {x,y,z} as our coordinate representation the measure is simply:

More generally we may represent our points a and b using subscripts to represent the dimensions 1,2,…n. Thus we have a=(a1,a2) and b=(b1,b2) for the 2D case. The general form of the Euclidean metric in n-space then becomes:

This formula is processor-intensive to compute for very large number of point pairs and dimensions, mainly due to the square root operation. For some applications, such as K-means clustering (see Section 4.2.12, Classification and clustering) and line generalization (see Section 4.2.4, Line Smoothing and point-weeding) the square root is not taken, which breaks its metric behavior, but remains suitable for these specific purposes. In other areas, such as incremental grid operations, approximations to the Euclidean metric are taken to avoid such problems (see further, Section 4.4.2, Cost distance).

The formula above also arises, with some modifications, in the analysis of statistical datasets. The standard Euclidean metric in n-space takes no account of the average variation of the data points. However, it can be normalized (made scale free) by calculating the variance of the point set components, σi2, and using these as a divisor:

This is known as the normalized Euclidean metric (see also, Section 4.3.3, Ratios, indices, normalization, standardization and rate smoothing) and is essentially the same as the Euclidean metric if the variances of point set components (the average variation of values or attributes of each dimensional component are equal. Pre-normalizing the attributes of input datasets (e.g. raster files) and then computing conventional n-dimensional Euclidean distances will produce the same result. This expression is also closely related to a more general measure, introduced by P C Mahalanobis in 1936 and named after him, which is used to determine the similarity of an unknown sample set to a known one. In addition to the variances of the components Mahalanobis distance takes into account the covariances that may exist. This more general measure is used in a number of GIS packages in connection with the analysis of remote sensing data.

For applications that require network distance measures between locations, Euclidean distance is generally inadequate and computing large numbers of (shortest) network distances may be impractical or inappropriate. In order to obtain more useful measures of inter-location distance many authors have suggested that the standard Euclidean distance formula be amended in various ways. The most frequently used modifications are based on replacing the use of the square and square root operations — i.e. replacing the powers of 2 and 1/2 — with other values, typically p and 1/p, estimated from sampling the network to be analyzed. We shall refer to such modifications as the family of Lp metrics or p-norms, with L2 being the standard Euclidean measure. They are amongst a range of alternative metrics supported by the Locational Analysis package, LOLA, which provides output in ArcGIS shape file format (SHP) as an option (see further, Section 7.4.1 et seq., Location problems).

The distance between two points in the plane: a(x1,y1) and b(x2,y2) with this metric is defined as dp(a,b), where:

Figure 4‑56 plots “circles” using various values for p, with the case p=2 (the Euclidean metric) giving a perfect circle — the only value which provides a rotationally invariant distance measure. The lines show the position of a point at a fixed distance from the center using the simple pth power/pth root model.

Figure 4‑56 p-metric circles

clip0110

In geographic literature the case p=1 is variously called the Manhattan, Taxicab, Rectilinear or City block metric, and is supported in a number of software packages, including the now defunct LOLA project and the Crimestat package (in the latter instance with both Plane and Spherical computations). With p=2 we have the standard Euclidean metric and with p=∞ we have the so-called minimax or Chebyshev metric used in a number of optimization problems. Note that the shape of the “circle” for p=1 is a square at a 45 degree angle, and this is the same as the shape for p=∞.

Values for p of approximately 1.7 have been found by many studies to approximate road distances within and between towns, and such estimates may be used in place of direct-line distance (where p=2) in problems such as locational analysis and the modeling of travel patterns — see for example, Love et al. (1972) and (1988). An alternative is to use a simple route factor (typically around 1.3) as a multiplier and to continue using Euclidean distances, but inflated by this route factor. Note that if p<1 the triangle inequality no longer holds, i.e. it is possible for the distance (or time) from A to B plus the distance from B to C to be less than the distance (or time) from A to C. This characteristic is quite common in transport infrastructures, where one-way systems, timetabled routes and traffic congestion alter the expected paths significantly. Numerous variations on the simple Lp metric model have been studied, including using a mixture of powers for the x- and y-directions (p and q), adding weighting factors for the two directions, including a different power, s, for the root etc. Most of these variations provide modest improvements in estimation of transport route distances at the expense of increased complexity in determination of the parameters. A summary of the possible values for p in the Lp metrics and their interpretation is provided in Table 4‑9.

One of the most striking features of the Lp metric diagram (Figure 4‑56) is that for every value of p except 2 the “circles” show rotational bias — maximized at 45 degrees. Similar bias exists if other variants of the Lp metric family are plotted. This highlights the fact that a distance value obtained by measurement and application of any of these modified formulas will change if the reference frame is rotated. If this seems a bit obscure, imagine measuring the distance between two points randomly located on a rectangular street network oriented north-south/east-west. In general this will involve finding the length of a route in the north-south direction and adding the length of a route in the east-west direction. If, however, we could ignore the physical network and rotate and translate the pattern of streets about one of the points until the two points both lay along one road (a rotated north-south route), the length measured would be shorter. This comment applies equally to grid files, where distances (and many other measures) are often calculated in an incremental manner which is affected by the grid orientation.

Table 4‑9 Interpretation of p-values

0<p<1

The Lp formula does not yield a metric because the triangle inequality no longer holds — for example, construction of minimal spanning trees on the basis of progressively connecting nearest-neighbors is not valid. Routes in cities may have values of p<1 indicating that routes are complex — a value of p=2/3 indicates that distances are effectively twice as far as direct line estimates would suggest. The locus of points in this case is concave

1<p<2

The distance, d, is a metric which is intermediate between the Manhattan grid-like street pattern and the normal Euclidean metric, and is thus applicable to more complex city street networks

p>2

As p increases greater weight is given to the largest components of distance in the network. Consider the distance from (0,0) to (1,1‑E) where 0<E<1. For p=1 d=2‑E; for p=2 and E<<1 d≈√2(1‑E) and as p increases, d1. In all cases as p∞ then d= max(|x1‑x2|, |y1‑y2|)

The above observations on rotational relationships can be used in problem solving — for example, in some cases it may be possible to solve selected problems by applying the sequence of actions:

rotate the problem coordinate frame
solve the problem using the metric appropriate to the rotated frame, and then
rotate the frame back to its original position.

For example, Okabe et al. (2000, Section 3.7) show that in order to determine the Voronoi diagram for a set of planar points under the L metric, the set of points may be rotated by an angle of +π/4 radians and the regions computed using the L1 metric, with the resulting diagram rotated back by ‑π/4 radians to produce the final solution.

Although the p‑metric family provides real value in analytic studies, this lack of rotational invariance is undesirable in the context of distance measurement. Clearly it would be simple for suppliers to include such distance expressions in their GIS packages, but the estimation of the parameters and the interpretation of their results require considerable care. In practice the advent of more powerful hardware and software, and greatly improved network datasets, has meant that many problems can be tackled by explicit network analysis rather than the application of alternative metrics (see further, Chapter 7, Network and Location Analysis).