Event data, such as disease incidence, often have associated temporal information, for example the date of birth, the date of death, or the date of diagnosis. In such cases it is often of interest to determine whether there is evidence of spatio-temporal clustering. The question was initially raised by Knox (1964) when investigating cases of childhood leukemia in a region of NE England. Knox proposed a simple means of testing for such clustering, or space-time interaction. He suggested that for each case, i, one could compute the (Euclidean) distance to every other case, j, and record this in a table or matrix as the set {xij}. Likewise one could compute the separation in time between each pair of events, and produce a second set {yij}. For the data under consideration Knox then sought to identify a critical distance, D, and critical time interval, T, that appeared to be meaningful for the problem being examined. Using these critical values, the pairs of cases could then be classified into a 2x2 contingency table (close or not close in space, by close or not close in time), with the suspected spatio-temporal cluster being the count of pairs that fell in the first cell, i.e. the cell with xij<D and yij<T.

Knox had 96 cases and hence there were 96(95)/2=4560 pairs to consider. Of these there were 152 close pairs in time and 25 close pairs in space, with 5 being close in both space and time. The expected number of pairs close in both can be estimated from the marginal totals, as for any contingency table, i.e. as 152 x 25/4560=0.8333. The question then arises as to how to test whether the observed value of 5 is significantly greater than the expected value, which is <1. At the time Knox proposed using the Poisson distribution with mean 0.8333 as the basis for significance testing, which suggested that a value of 5 was highly significant.
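The Knox calculation can be sketched in a few lines of Python. This is a minimal illustration rather than Knox's own procedure: the function names (`knox_table`, `knox_expected`, `poisson_upper_tail`) are hypothetical, the pair classification assumes planar coordinates and numeric times, and the significance value is the Poisson upper-tail probability P(X >= 5) with the mean taken from the figures quoted above.

```python
import math
from itertools import combinations

def knox_table(points, times, D, T):
    """Classify every unordered pair of cases into a 2x2 table.

    Keys are (close_in_space, close_in_time); values are pair counts.
    """
    table = {(True, True): 0, (True, False): 0,
             (False, True): 0, (False, False): 0}
    for i, j in combinations(range(len(points)), 2):
        x_ij = math.dist(points[i], points[j])   # Euclidean distance
        y_ij = abs(times[i] - times[j])          # separation in time
        table[(x_ij < D, y_ij < T)] += 1
    return table

def knox_expected(close_space, close_time, n_pairs):
    """Expected count of pairs close in both, from the marginal totals."""
    return close_space * close_time / n_pairs

def poisson_upper_tail(observed, mean):
    """P(X >= observed) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean ** k / math.factorial(k)
                for k in range(observed))
    return 1.0 - below

# Figures quoted in the text for Knox's leukemia data.
n_pairs = 96 * 95 // 2
expected = knox_expected(25, 152, n_pairs)
p_value = poisson_upper_tail(5, expected)
print(n_pairs, round(expected, 4), p_value)
```

With these inputs the tail probability comes out well below 0.01, in line with the description of the observed value as highly significant.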

Subsequently many statisticians examined Knox’s approach and ideas, highlighting a number of problems with the method and suggesting a range of modifications and developments. Of particular interest is the paper by Mantel (1967), in which he develops Knox’s ideas and which has led not only to tests for spatio-temporal clustering, but also to a more general methodology that has been adopted widely in spatial ecology (see further, Section 5.4.5, below). Mantel proposed using a test statistic of the form:

$$Z = \sum_{i} \sum_{j} x_{ij} \, y_{ij}$$

This measure has the advantage of including the actual distance and time measures, rather than discarding this information. Knox’s test can be seen as a special case of Mantel’s Z statistic if the xij<D are coded as 1 and yij<T are also coded as 1, and otherwise entries are coded as 0. The Mantel statistic then simply counts the number of close pairs. Note that, depending on how the summation limits are defined, it may be necessary to divide the result by 2, and that in this form distances are treated as symmetric.
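As a rough illustration of the relationship between the two statistics, the sketch below computes a cross-product sum over the unordered pairs i<j (so no division by 2 is needed) and then recovers the Knox count by recoding both tables as 0/1 indicators. The helper names, the four sample cases, and the thresholds D=2 and T=5 are invented for the example.

```python
import math

def pairwise(values, measure):
    """Full symmetric matrix of measure(values[i], values[j])."""
    n = len(values)
    return [[measure(values[i], values[j]) for j in range(n)]
            for i in range(n)]

def mantel_z(x, y):
    """Sum of x[i][j] * y[i][j] over unordered pairs i < j."""
    n = len(x)
    return sum(x[i][j] * y[i][j]
               for i in range(n) for j in range(i + 1, n))

def indicator(matrix, threshold):
    """Code entries below the threshold as 1, all others as 0."""
    return [[1 if v < threshold else 0 for v in row] for row in matrix]

# Hypothetical data: two tight space-time groups of two cases each.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
times = [0.0, 2.0, 30.0, 31.0]

x = pairwise(points, math.dist)
y = pairwise(times, lambda a, b: abs(a - b))

z_raw = mantel_z(x, y)                                    # uses actual distances and times
z_knox = mantel_z(indicator(x, 2.0), indicator(y, 5.0))   # count of close pairs
print(z_raw, z_knox)
```

Here `z_knox` is simply the number of pairs that are close in both space and time, which is the first cell of the Knox table.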

To test the significance of the observed Z-value Mantel proposed using a Monte Carlo simulation approach. A simple procedure is to randomly permute the rows and columns of one of the matrices, typically the locations matrix, applying the same reordering to both rows and columns so that the matrix remains a valid distance table. After each permutation the Z-statistic is recomputed. A set of, say, 1000 permutations is performed, generating a reference distribution for Z under the assumption that the space-time matching is random. The observed Z-value can then be compared with this distribution to obtain an estimate of the significance of the observed result.
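The permutation procedure might be sketched as follows. This is an illustrative implementation under stated assumptions, not Mantel's published algorithm: the one-sided comparison, the fixed seed, and the (count + 1)/(permutations + 1) form of the estimated p-value are choices made for the example, and the data are artificial (eight cases whose positions and times are perfectly matched).

```python
import random

def mantel_z(x, y):
    """Sum of x[i][j] * y[i][j] over unordered pairs i < j."""
    n = len(x)
    return sum(x[i][j] * y[i][j]
               for i in range(n) for j in range(i + 1, n))

def permute_matrix(m, order):
    """Apply the same reordering to the rows and columns of m."""
    return [[m[a][b] for b in order] for a in order]

def mantel_permutation_test(x, y, n_perm=999, seed=0):
    """Return the observed Z and a one-sided Monte Carlo p-value."""
    rng = random.Random(seed)
    observed = mantel_z(x, y)
    order = list(range(len(x)))
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # random relabelling of the cases
        if mantel_z(permute_matrix(x, order), y) >= observed:
            at_least += 1
    # Assumed convention: count the observed arrangement as one permutation.
    return observed, (at_least + 1) / (n_perm + 1)

# Artificial example: case k sits at position k and occurs at time k,
# so spatial and temporal separations coincide exactly.
positions = list(range(8))
x = [[abs(a - b) for b in positions] for a in positions]
y = [[abs(a - b) for b in positions] for a in positions]

observed, p = mantel_permutation_test(x, y)
print(observed, p)
```

Because the two tables are identical in this example, almost no random relabelling can match the observed Z, and the estimated p-value is correspondingly small.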

If the Mantel Z-statistic is amended slightly, normalizing it to fall in the range [-1,1], it can be seen to be a form of product moment correlation coefficient. This is achieved by adjusting the data values by subtracting the mean value in each table and dividing by the observed standard deviation (i.e. a z-transform), and then dividing the sum of products by the number of pairs, N=n(n-1)/2, less 1 (the degrees of freedom):

$$r = \frac{1}{N-1} \sum_{i} \sum_{j>i} \left( \frac{x_{ij} - \bar{x}}{s_x} \right) \left( \frac{y_{ij} - \bar{y}}{s_y} \right), \qquad N = \frac{n(n-1)}{2}$$
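A minimal sketch of the normalized statistic follows. It assumes the sum runs over the N=n(n-1)/2 unordered pairs and that the standard deviations are sample values (divisor N-1), in which case the result coincides with the ordinary Pearson correlation between the two sets of pairwise values. The function names and test data are illustrative only.

```python
import math

def upper_triangle(m):
    """Entries m[i][j] with i < j, i.e. one value per unordered pair."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def mantel_r(x, y):
    """Normalized Mantel statistic: mean cross-product of z-scores."""
    xs, ys = upper_triangle(x), upper_triangle(y)
    n_pairs = len(xs)
    mean_x = sum(xs) / n_pairs
    mean_y = sum(ys) / n_pairs
    # Sample standard deviations (divisor: number of pairs less 1).
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in xs) / (n_pairs - 1))
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in ys) / (n_pairs - 1))
    total = sum(((a - mean_x) / sd_x) * ((b - mean_y) / sd_y)
                for a, b in zip(xs, ys))
    return total / (n_pairs - 1)

# Artificial check: identical tables give r = 1, reversed values give r = -1.
positions = list(range(6))
x = [[abs(a - b) for b in positions] for a in positions]
y_reversed = [[5 - v for v in row] for row in x]

r_same = mantel_r(x, x)
r_opposite = mantel_r(x, y_reversed)
print(r_same, r_opposite)
```

Significance would still be assessed by permutation, since the pairwise values are not independent observations.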

Several issues may make the analysis of such data more difficult than appears at first sight. These include: whether the examination of pairwise data is sufficient (are 3-way interactions important, for example?); whether any factors in the determination of the temporal data may bias the results (for example, how close are onset of a disease and diagnosis? are cases detected because they are being especially looked for, or defined as cases where they were not so defined in the past or elsewhere in the study region?); how critical time and distance values are to be determined (for Knox tests), and whether it is reasonable to assume this distance is constant when underlying population densities may vary considerably; region pre-selection, in that if one examines a region chosen because an unexpectedly large number of cases have been reported, testing for space-time clustering may show no significant spatio-temporal effects, which is actually the result of the pre-selection rather than a failure to detect a spatio-temporal effect; changes in the underlying population distribution in the study region over time, which will affect the results; and whether the distance measure applied is appropriate. On the last point, Mantel suggested that for contagious diseases in particular a reciprocal transform be used to adjust for the overly strong influence of large distances and times on the results (with a constant included to avoid the distortions apparent with very small times and distances).
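The reciprocal transform can be illustrated briefly. The text does not give the exact expression, so the form 1/(value + constant) used here is an assumption, as are the function name and the choice to leave the diagonal (a case paired with itself) at zero.

```python
def reciprocal_transform(matrix, constant):
    """Replace each off-diagonal entry v by 1 / (v + constant).

    Large separations shrink toward 0, while the constant caps the
    value that very small separations can reach at 1 / constant.
    Diagonal entries are set to 0 (an assumption made for this sketch).
    """
    return [[0.0 if i == j else 1.0 / (v + constant)
             for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]

# Illustrative distances: one very close pair and one very distant case.
distances = [[0.0, 0.1, 50.0],
             [0.1, 0.0, 49.9],
             [50.0, 49.9, 0.0]]

closeness = reciprocal_transform(distances, constant=1.0)
print(closeness)
```

The transformed tables can then be used in place of the raw distance and time tables when computing the cross-product statistic.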

Jacquez (1996) proposed using k-nearest neighbors (k-NN) rather than an explicit distance measure. This test is similar to those described above, but does not rely on distance directly. Instead he defines the k-nearest neighbors of a given case as the set of cases as near or nearer to that case than the kth NN. He then defines a similar expression to the basic Mantel or Knox statistic:

$$J_k = \sum_{i} \sum_{j} x_{ij}^{(k)} \, y_{ij}^{(k)}$$

In this statistic the x and y variables are binary indicators, as in the Knox model (1 if case j is among the k nearest neighbors of case i in space, or in time respectively, and 0 otherwise), so the totals are counts. As with Mantel’s test he proposes permuting one or other table to generate a suitable reference distribution (he suggests permuting the rows of the time matrix). In a range of tests Jacquez demonstrated that the k-NN approach was more powerful than the standard Mantel and Knox methods and is less susceptible to several of the issues described above. The value of k may be varied to test the sensitivity of the results to the value chosen.
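The k-NN statistic can be sketched as below. This is an illustration under assumptions rather than the published algorithm: ties are handled by including every case as near as or nearer than the kth neighbor, the reference distribution is generated by relabelling the cases in the time table (rows and columns reordered together), and the names, data and seed are invented for the example.

```python
import math
import random

def pairwise(values, measure):
    n = len(values)
    return [[measure(values[i], values[j]) for j in range(n)]
            for i in range(n)]

def knn_indicator(dist, k):
    """ind[i][j] = 1 if j is as near or nearer to i than i's kth NN."""
    n = len(dist)
    ind = [[0] * n for _ in range(n)]
    for i in range(n):
        cutoff = sorted(dist[i][j] for j in range(n) if j != i)[k - 1]
        for j in range(n):
            if j != i and dist[i][j] <= cutoff:
                ind[i][j] = 1
    return ind

def knn_statistic(space_dist, time_dist, k):
    """Count of ordered pairs that are k-NN in both space and time."""
    xs = knn_indicator(space_dist, k)
    ys = knn_indicator(time_dist, k)
    n = len(xs)
    return sum(xs[i][j] * ys[i][j] for i in range(n) for j in range(n))

def knn_permutation_p(space_dist, time_dist, k, n_perm=999, seed=0):
    """One-sided Monte Carlo p-value, relabelling cases in the time table."""
    rng = random.Random(seed)
    observed = knn_statistic(space_dist, time_dist, k)
    order = list(range(len(time_dist)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        shuffled = [[time_dist[a][b] for b in order] for a in order]
        if knn_statistic(space_dist, shuffled, k) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: two tight space-time groups of two cases each.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
times = [0.0, 2.0, 30.0, 31.0]
space_dist = pairwise(points, math.dist)
time_dist = pairwise(times, lambda a, b: abs(a - b))

j1 = knn_statistic(space_dist, time_dist, k=1)
p = knn_permutation_p(space_dist, time_dist, k=1)
print(j1, p)
```

Repeating the calculation for several values of k gives a simple check of how sensitive the result is to that choice.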

Kulldorff and Hjalmars (1999) sought to address another of the weaknesses of the Knox test, that of population shift bias. They showed that if one has the background population data over time, and therefore information on population growth or decline in the various parts of the study region, this element of bias can be removed. This is achieved by randomly assigning cases to a given region and timeslot in proportion to the actual population (or population at risk) in that region at that time. Computations and tests then proceed much as Mantel and Jacquez describe. The difficulty with this approach is that it requires access to data that may not be readily available.
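The random assignment step might look something like the following. This is a simplified sketch of the idea only, not the authors' full procedure: the population table, region names, periods and the function `simulate_cases` are all hypothetical, and the step of recomputing the test statistic for each simulated data set is left out.

```python
import random

def simulate_cases(population, n_cases, rng):
    """Assign n_cases to (region, period) cells with probability
    proportional to the population at risk in each cell."""
    cells = list(population)
    weights = [population[c] for c in cells]
    return rng.choices(cells, weights=weights, k=n_cases)

# Hypothetical population at risk by region and period. The northern
# population grows over time while the southern region empties.
population = {
    ("north", 1990): 1000,
    ("north", 2000): 3000,
    ("south", 1990): 2000,
    ("south", 2000): 0,
}

rng = random.Random(42)
simulated = simulate_cases(population, n_cases=96, rng=rng)

# Tally simulated cases per cell; a cell with no population gets none.
counts = {cell: simulated.count(cell) for cell in population}
print(counts)
```

Repeating the simulation many times, and computing the chosen space-time statistic for each simulated data set, yields a reference distribution that already reflects the population shifts.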

A range of spatial and spatio-temporal cluster analysis tools are provided in the National Cancer Institute’s (NCI) SaTScan software, which is available free of charge from http://www.satscan.org/. These include purely spatial scanning statistics (as described further in Section 5.2.6), the scanning version of the permutation-type models described above, and the space-time scan statistics developed by Kulldorff (1997). The latter extends the spatial scanning approach to include a time dimension, with the scanning window being a cylinder whose radius is the spatial extent of the scan and whose height is the time interval scanned. A key feature of this software is the identification of both the existence of significant clusters (by size) and where these clusters are located.
