
If (xi,yi) are the coordinate pairs of a point set then the Mean Centre, M1(x0,y0), is simply the average of the coordinate values, as we saw earlier:
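With the optional weights discussed in the next paragraph, the mean centre can be written as follows. This is a reconstruction from the surrounding definitions, using the same symbols (xi, yi, wi, x0, y0), and is offered as a sketch of the expression rather than a quotation of it:

```latex
x_0 = \frac{\sum_{i} w_i x_i}{\sum_{i} w_i},
\qquad
y_0 = \frac{\sum_{i} w_i y_i}{\sum_{i} w_i}
```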

In this version of the expression for M1 there is an additional (optional) component, wi, representing weights associated with the points included in the calculation. For point sets the weights might be the number of crimes recorded at a particular location, or the number of beds in a hospital; for multiple polygons, where these are represented by their individual centres or centroids, the weights would typically be an attribute value associated with each polygon. Clearly, if all weights=1 the formula simplifies to a calculation based purely on the coordinate values, with the number of points, n=∑wi.
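As a minimal sketch of the weighted mean centre calculation just described, the following Python function averages the coordinates using optional weights. The function name `mean_centre` and the sample coordinates are hypothetical and chosen purely for illustration; unit weights are assumed when none are supplied.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def mean_centre(points: Sequence[Point],
                weights: Optional[Sequence[float]] = None) -> Point:
    """Weighted mean centre (M1); unit weights are assumed if none are given."""
    if weights is None:
        weights = [1.0] * len(points)
    if len(weights) != len(points):
        raise ValueError("points and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    x0 = sum(w * x for w, (x, _) in zip(weights, points)) / total
    y0 = sum(w * y for w, (_, y) in zip(weights, points)) / total
    return x0, y0


# Hypothetical coordinates (not the points A-F used elsewhere in this section).
pts = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(mean_centre(pts))                # (2.0, 2.0)
print(mean_centre(pts, [1, 1, 5, 1]))  # (3.0, 3.0), pulled towards (4, 4)
```

With unit weights the divisor is simply the number of points, which matches the simplification noted above.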

M1 in this case is the centre of gravity, or centroid, of the point set, assuming the locations used in the calculation are treated as point masses. M1 is the location that minimises the sum of (weighted) squared distances to all the points in the set, is simple and fast to compute, and is widely used. However, it is not the point that minimises the sum of the (weighted) distances to each of the points. The latter is sometimes known as the (bivariate) “median centre” (although Crimestat, for example, uses the term median centre to mean simply the middle values of the x- and y-coordinates). It is best described as the MAT point (the centre of Minimum Aggregate Travel, M6). This point may be determined using the following iterative procedure, with M1 as the initial pair (x0,y0) and k=0,1…:
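A form of the update step consistent with the definitions in the text is given below. It is the familiar distance-weighted averaging scheme (often associated with Weiszfeld) and is offered as a reconstruction under that assumption, not as a quotation of the original expression:

```latex
x_{k+1} = \frac{\sum_{i} w_i x_i / d_{i,k}}{\sum_{i} w_i / d_{i,k}},
\qquad
y_{k+1} = \frac{\sum_{i} w_i y_i / d_{i,k}}{\sum_{i} w_i / d_{i,k}},
\qquad
d_{i,k} = \sqrt{(x_i - x_k)^2 + (y_i - y_k)^2}
```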

In this expression di,k is the distance from the ith point to the kth estimated optimal location. It is usual to adjust distances in this, and similar formulas involving division by interpoint distances, by a small increment and/or to apply code checks to avoid division by zero or by values close to zero. Iteration continues until the change in the objective function (the cumulative distance), or in both coordinates, is less than some pre-specified small value. These formulas for M1 and M6 can be derived by taking the standard equation for distance, dE, or distance squared, dE², partially differentiating first with respect to x and then with respect to y, and equating the results to 0 to determine the minimum value.

An extension to this type of weighted mean is provided by the Geographically Weighted Regression (GWR) software package, and is described further in Section 5.6.3. With the GWR software it is possible to compute a series of locally defined (geographically weighted) means, and associated variances, which may then be analysed and/or mapped.
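The iterative procedure for M6 described above can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the code used by any particular package: the function name `mat_point`, the tolerance `tol`, the iteration cap `max_iter` and the small distance increment `eps` are all hypothetical choices, and convergence is tested on the change in both coordinates.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def mat_point(points: Sequence[Point],
              weights: Optional[Sequence[float]] = None,
              tol: float = 1e-9,
              max_iter: int = 1000,
              eps: float = 1e-12) -> Point:
    """Estimate the MAT point (M6) by distance-weighted averaging."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    # Start from the (weighted) mean centre M1, as the text suggests.
    x = sum(w * px for w, (px, _) in zip(weights, points)) / total
    y = sum(w * py for w, (_, py) in zip(weights, points)) / total
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for w, (px, py) in zip(weights, points):
            # The small increment eps guards against division by zero
            # when the estimate coincides with a data point.
            d = math.hypot(px - x, py - y) + eps
            num_x += w * px / d
            num_y += w * py / d
            denom += w / d
        new_x, new_y = num_x / denom, num_y / denom
        # Stop once both coordinates change by less than tol.
        if abs(new_x - x) < tol and abs(new_y - y) < tol:
            return new_x, new_y
        x, y = new_x, new_y
    return x, y


# Hypothetical convex quadrilateral; its diagonals cross at (2, 2).
quad = [(0.0, 0.0), (6.0, 0.0), (6.0, 6.0), (0.0, 3.0)]
print(mat_point(quad))
```

For the four vertices of a convex quadrilateral the point of minimum aggregate distance is the intersection of the diagonals, so the estimate here should settle close to (2, 2), whereas the mean centre of the same points is (3, 2.25).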

The positions of M1, M3 and M6 are illustrated in Figure 4‑11, using the same set of coordinates as for the vertices A-F of the polygon described in subsection 4.2.5.1.

Figure 4‑11 Point set centres

Note that M1=M2 in this case, and for both M1 and M3 (the MBR centre) their position is unchanged from the polygonal case. Also note that if we move point B to B′ the position of M6 is unchanged (although the cumulative distance will be greater). This observation is true for each of the points A-F: each can be moved away from the MAT point to an arbitrary distance along a line connecting it to M6, and the position of M6 will be unaffected. This would not generally be the case with M1 or M3.
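One way to see why M6 is unaffected, sketched here under the assumption that M6 does not coincide with any of the data points, is to write out the condition obtained by setting the partial derivatives of the aggregate distance to zero at a candidate location (x, y):

```latex
\frac{\partial}{\partial x}\sum_{i} w_i d_i = -\sum_{i} w_i \,\frac{x_i - x}{d_i} = 0,
\qquad
\frac{\partial}{\partial y}\sum_{i} w_i d_i = -\sum_{i} w_i \,\frac{y_i - y}{d_i} = 0,
\qquad
d_i = \sqrt{(x_i - x)^2 + (y_i - y)^2}
```

The ratios (x_i − x)/d_i and (y_i − y)/d_i are the components of the unit vector pointing from the candidate location towards point i. Sliding a point outwards along its ray from M6 changes d_i but not that unit vector, so the condition remains satisfied at the same location, and because the aggregate distance is a convex function that location remains the minimiser.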

The locations described for different types of centre, both here and in subsection 4.2.5.1, often assume that all points (or polygons) are weighted equally. If the set of weights, wi, are not all equal, the locations of M1 and M6 will be altered, but M3 will be unchanged. The effect of unequal weights is to “pull” the location of M1 and M6 towards the locations with higher weights. For example, if point B in our previous example had a weight of 3, M1 would be moved to (9.00,5.63) just to the right of M6 in Figure 4‑11, M3 would be unaltered, and M6 would be altered to (10.41,4.29) which is very close to point C. Any weights in the dataset that are zero or missing will generally result in the point being removed from the calculation. Such occurrences require checking to ensure valid data is not being discarded due to incomplete information or errors.

There is no difference between weighted calculations where the weights are integer values and the standard calculations for unit weights if some points in a set are co-located. For example, if point B is recorded in a dataset 3 times, its effective weight would be 3, and instead of 6 points there would be 8 to consider. Co-located point recording is very common, especially in crime and medical datasets where each incident is associated with a nominal rather than precise location. Examples of co-located data might be the closest street intersection to an incident, the nominal coordinates of a shopping mall, the location of the doctor’s surgery where a patient is registered, or a location that recurs because incidents or cases have been rounded to the nearest 50 metres for data protection reasons. It is often a good idea to execute queries looking for duplicate and unweighted locations before conducting analyses of this type, since mapped datasets may not reveal the true underlying patterns. Likewise it is important to check that co-located data are meaningful for the analysis to be undertaken — surgery location is not generally a substitute for a patient’s home address.
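The equivalence between repeated records and integer weights, and the kind of duplicate query recommended above, can be illustrated with a short Python sketch. The records below are hypothetical, and exact coordinate matching is assumed; real datasets may need rounding or a distance tolerance before duplicates are counted.

```python
from collections import Counter

# Hypothetical incident records; (5.0, 1.0) is recorded three times.
records = [(2.0, 3.0), (5.0, 1.0), (5.0, 1.0), (5.0, 1.0), (7.0, 4.0)]

# Count how many times each location occurs and list the co-located ones.
counts = Counter(records)
duplicates = {loc: c for loc, c in counts.items() if c > 1}
print(duplicates)  # {(5.0, 1.0): 3}

# Mean centre from the raw records, each with unit weight.
n = len(records)
raw_centre = (sum(x for x, _ in records) / n,
              sum(y for _, y in records) / n)

# Mean centre from the unique locations, weighted by their counts.
total = sum(counts.values())
weighted_centre = (sum(c * x for (x, _), c in counts.items()) / total,
                   sum(c * y for (_, y), c in counts.items()) / total)

print(raw_centre, weighted_centre)  # the two centres coincide
```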

The preceding calculations have all been carried out using the standard Euclidean metric, dE. As previously stated, this is the standard for GIS packages, but other metrics may be more appropriate depending on the problem at hand (see further, Section 4.4.1). Specialist packages like LOLA and Crimestat support a range of other metrics. Under the city block metric (L1) and the minimax metric (L∞), the MAT point, M6, is no longer guaranteed to be unique: all points within a defined region may be equally close to the input point set (Crimestat only provides a single location, so LOLA or similar facilities are preferable for such computations). For example, the point set in Figure 4‑11 with the L1 metric has an MAT solution point set (a rectangle) bounded by (4,4) and (10,8). Clearly, if the point set lay on a network the location of M6 would again be different.
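For the city block metric the non-uniqueness can be made concrete, because the x- and y-components of the aggregate distance separate and each is minimised by any value between the middle order statistics of the corresponding coordinate. The sketch below assumes unweighted points; the function names and the coordinates are hypothetical and are not those of Figure 4‑11.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def city_block_mat_region(points: Sequence[Point]) -> Tuple[Point, Point]:
    """Lower-left and upper-right corners of the L1 MAT region (unit weights)."""
    xs = sorted(x for x, _ in points)
    ys = sorted(y for _, y in points)
    n = len(points)
    lo, hi = (n - 1) // 2, n // 2  # the same index when n is odd
    return (xs[lo], ys[lo]), (xs[hi], ys[hi])


def city_block_total(points: Sequence[Point], p: Point) -> float:
    """Aggregate city block distance from location p to all points."""
    return sum(abs(x - p[0]) + abs(y - p[1]) for x, y in points)


# Hypothetical point set with an even number of points.
pts = [(0.0, 0.0), (2.0, 5.0), (3.0, 1.0), (6.0, 4.0), (8.0, 2.0), (9.0, 7.0)]
lower, upper = city_block_mat_region(pts)
print(lower, upper)  # (3.0, 2.0) (6.0, 4.0)

# Every location inside the rectangle gives the same aggregate distance.
print(city_block_total(pts, lower),
      city_block_total(pts, upper),
      city_block_total(pts, (4.5, 3.0)))
```

With an odd number of unweighted points the two corners coincide and the solution collapses to a single location.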

To add to the confusion, Crimestat provides three further measures of centrality for point sets. Each involves variations on the way the mean is calculated (see further, Table 1‑4): the geometric mean (the x- and y-coordinates are calculated using the sum of the logarithms of the coordinates, averaging and then taking antilogs); the harmonic mean (the x- and y-coordinates are calculated using the reciprocal of the coordinates, averaging and then taking the reciprocal of the result); and the triangulated mean (a Crimestat “special”). The first two alternative means are less sensitive to outliers (extreme values) than the conventional mean centre, whilst the last is claimed to represent the directionality of the data better. The Crimestat manual, Chapter 4, provides more details of each measure, with examples. Note that the harmonic mean is vulnerable to coordinate values which are 0 or close to 0.
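The geometric and harmonic mean centres follow directly from the verbal definitions above, as the Python sketch below shows. It assumes unweighted points with strictly positive coordinates (logarithms and reciprocals are undefined or unstable otherwise); the function names and sample coordinates are hypothetical, and Crimestat's own implementation may differ in detail.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def geometric_mean_centre(points: Sequence[Point]) -> Point:
    """Average the logarithms of each coordinate, then take antilogs."""
    n = len(points)
    gx = math.exp(sum(math.log(x) for x, _ in points) / n)
    gy = math.exp(sum(math.log(y) for _, y in points) / n)
    return gx, gy


def harmonic_mean_centre(points: Sequence[Point]) -> Point:
    """Average the reciprocals of each coordinate, then take the reciprocal."""
    n = len(points)
    hx = n / sum(1.0 / x for x, _ in points)
    hy = n / sum(1.0 / y for _, y in points)
    return hx, hy


# Hypothetical coordinates, all strictly positive.
pts = [(1.0, 2.0), (4.0, 8.0)]
print(geometric_mean_centre(pts))  # approximately (2.0, 4.0)
print(harmonic_mean_centre(pts))   # approximately (1.6, 3.2)
```

For comparison, the conventional mean centre of these two points is (2.5, 5.0), so both alternatives sit closer to the smaller coordinate values.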
