Proximity matrix comparisons

In many instances data are collected from a series of point-like locations and recorded with both location and attribute information. A distance matrix can then readily be constructed from the location data, and in some instances a similar matrix of ‘proximity’ can be constructed from the attribute data. For example, at each location the genetic make-up of species may be recorded and a measure derived enabling the ‘distance’ (proximity, similarity) between these genetic measures to be calculated for each pair of locations. The result is two (or more) square matrices of matching dimensions. Typically these matrices are symmetric, and the researcher is interested in examining whether there is any correlation between the proximity data matrix and the spatial matrix. In doing so it must be borne in mind that attribute measures at nearby locations are unlikely to be independent, which violates the independence assumption of classical correlation statistics.
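As a minimal sketch of the two matrices described above, the following Python fragment builds a spatial distance matrix and an attribute 'proximity' matrix for the same set of locations. The sample coordinates, the single numeric attribute, and the use of absolute difference as the attribute measure are all assumptions made for illustration; in practice the attribute measure (e.g. a genetic distance) would be chosen to suit the data.

```python
import math

def euclidean_matrix(points):
    """Symmetric matrix of Euclidean distances between (x, y) locations."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)]
            for i in range(n)]

def attribute_matrix(values):
    """Symmetric matrix of absolute differences in one attribute per location.

    Absolute difference is an illustrative choice of 'proximity' measure.
    """
    n = len(values)
    return [[abs(values[i] - values[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical sample data: four locations, one attribute value at each
locations = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 5.0)]
attribute = [1.0, 2.5, 4.0, 2.0]

D_space = euclidean_matrix(locations)   # spatial distance matrix
D_attr = attribute_matrix(attribute)    # attribute proximity matrix
```

Both matrices are square, of matching dimensions, symmetric, and have zeros on the diagonal, which is the form assumed in the discussion that follows.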

The Mantel test, which is widely used in spatial ecology, is one approach to addressing this issue. The test, devised by Mantel (1967) for analysis of spatio-temporal datasets (see Section 5.4.4, Hot spot and cluster analysis), computes a form of product moment correlation statistic, r, between matching pairs of entries in the two distance matrices. The null hypothesis is that this correlation is zero. Only the upper or lower triangles of the two matrices are required for the computation, as the matrices are symmetric. The resulting correlation coefficient is then tested against a simulated probability distribution, obtained by randomly permuting the rows and columns of one of the matrices (applying the same reordering to both rows and columns) and then computing the value of r under this permutation. Repeated random permutations (1000+) and computations of r yield an ordered set of values whose spread includes the original r-value. Suppose the proportion of permuted values that are at least as large as the observed value is 7%. This suggests that, under the null hypothesis, there is roughly a 7% chance of seeing a correlation as large as the observed value or greater, i.e. the observed value is quite high relative to the permutation distribution (even though its absolute value is often not very large). Depending on the significance level adopted, this might be regarded as a significant result, indicating that spatial factors (or spatial autocorrelation; see further Section 5.5.2, Global spatial autocorrelation) in the data are important. The test can be extended in a number of ways, for example by grouping the Euclidean distances into distinct classes and then computing separate correlation coefficients for each distance class, thereby generating a Mantel correlogram. For more details, Urban (2003) provides an excellent summary.
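The permutation procedure just described can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation used by any of the packages mentioned in this section: the function names, the one-sided test, the fixed random seed, and the (count + 1) / (permutations + 1) form of the p-value (a common convention that counts the observed arrangement as one of the permutations) are all assumptions. The sample data are hypothetical: six points on a line whose attribute differences are exactly twice their spatial separation.

```python
import math
import random

def upper_triangle(m, order=None):
    """Upper-triangle entries of m, optionally after relabelling the objects.

    The same ordering is applied to rows and columns together.
    """
    n = len(m)
    idx = order if order is not None else list(range(n))
    return [m[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Product moment correlation (assumes neither list is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel_test(a, b, permutations=999, seed=0):
    """One-sided Mantel test: returns (observed r, permutation p-value)."""
    rng = random.Random(seed)
    x = upper_triangle(a)
    r_obs = pearson(x, upper_triangle(b))
    n = len(b)
    count = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # random relabelling of the objects in matrix b
        if pearson(x, upper_triangle(b, order)) >= r_obs:
            count += 1
    # Assumed convention: include the observed arrangement in the count
    return r_obs, (count + 1) / (permutations + 1)

# Hypothetical data: six points on a line; attribute distance = 2 x spacing
coords = [0, 1, 2, 3, 4, 5]
space = [[float(abs(p - q)) for q in coords] for p in coords]
attr = [[2.0 * abs(p - q) for q in coords] for p in coords]

r_obs, p_value = mantel_test(space, attr)
print(f"r = {r_obs:.3f}, p = {p_value:.3f}")
```

Because the attribute distances here are an exact multiple of the spatial distances, the observed r is 1 and very few random relabellings can match it, so the p-value is small. With real data r is typically much lower, and it is its position within the permutation distribution, rather than its absolute size, that is of interest.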

Many GIS and non-GIS software packages, especially those designed for use in spatial ecology, support computation of Mantel tests. These include packages such as PASSaGE, the R-Project (see the topic page and specifically the ade4 package), and purely statistical packages such as XLStat (for Excel).

Related techniques that involve matrix comparisons include the work of Hubert and Golledge (1982) and the grid-based methods described in Section 5.3.2, Crosstabulated grid data, the Kappa Index and Cramer’s V statistic. That section describes a range of statistics used to compare grid datasets, especially remotely sensed data, including Chi-Square statistics as well as the two measures named in its title. Hubert and Golledge address the case in which two rectangular matrices represent data on the same n objects (e.g. cities) and the same m attributes (e.g. crime rates, by type, in consecutive years). The authors devise an index that measures the degree to which the two matrices are similar, together with a significance testing strategy that takes into account the possible dependency among the m attributes. Spatial dependence issues remain, and an approach to addressing these through a measure of spatial association due to Tjøstheim (1978) is discussed. As an alternative, with more direct recognition of spatial effects, autocorrelation-based regression techniques (see Section 5.6.4, Spatial autoregressive and Bayesian modeling) may be a preferred approach to the analysis of this type of dataset.