Copyright (c) 2006-2012 Home page: www.spatialanalysisonline.com.


2.4    Spatial Statistics

2.4.1        Spatial probability

Humans will never have a complete understanding of everything that happens on the Earth’s surface, and so it is often convenient to resort to thinking in terms of probabilities. In principle one could completely characterize the physics of a human hand and a coin, but in practice it is much more productive to assign probabilities to the outcomes of a coin toss. In similar fashion spatial analysts may avoid the virtual impossibility of predicting exactly where landslides will occur by assigning them probabilities based on patterns of known causes, such as clay soils, rainfall, and earthquakes. A map of probabilities assigns each location a value between 0 and 1, forming a probability field.

Such a map considers only the probability of a single, isolated event, however. The probability that two points a short distance apart will both be subject to landslide is not simply the product of the two probabilities, as it would be if the two outcomes were independent. This conclusion can be seen as another manifestation of Tobler’s First Law. For example, if the probability of a landslide at Point A is ½ and at Point B a short distance away is also ½, the probability that both will be affected is more than ½ × ½ = ¼, and possibly even as much as ½. Technically, the marginal probabilities of isolated events may not be as useful as the joint probabilities of related events. Joint probabilities, however, are properties of pairs of points, and are thus impossible to display in map form unless the number of points is very small.
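The arithmetic of the landslide example can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that the dependence between the two outcomes is summarized by a single correlation coefficient rho between the two yes/no events. The function name joint_probability and the chosen values of rho are hypothetical and do not come from the text.

```python
import math

def joint_probability(p_a, p_b, rho):
    """P(both A and B occur) for two yes/no events with correlation rho.

    Relies on the identity cov(A, B) = P(A and B) - p_a * p_b, which holds
    for indicator (0/1) variables. rho is an assumed summary of dependence.
    """
    spread = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    return p_a * p_b + rho * spread

# Marginal probabilities of 1/2 at both points, as in the landslide example.
for rho in (0.0, 0.5, 1.0):
    print(rho, joint_probability(0.5, 0.5, rho))
```

With independent outcomes (rho = 0) the joint probability is the product ¼. With perfectly dependent outcomes (rho = 1) it rises to ½, the upper limit mentioned above.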

2.4.2        Probability density

One of the most useful applications of probability to the Earth’s surface concerns uncertainty about location. Suppose the location of a point has been measured using GPS. Inevitably the measurements are subject to uncertainty, in this case amounting to an average error of 5 m in both the east-west and north-south directions. Standard methods exist for analyzing measurement error, based on the assumption, well justified by theory, that errors in a measurement form a bell curve or Gaussian distribution. Spatially, one can think of the east-west and north-south bell curves as combining to form a bell. But the surface formed by the bell is not a surface of probability in the sense of Section 2.4.1: it does not vary between 0 and 1, and it does not give the marginal probability of the presence of the point. Instead, the bell is a surface of probability density, and the probability that the point lies within any defined area is equal to the volume under the bell’s surface over that area. The volume under the entire bell is exactly 1, reflecting the fact that the point is certain to lie somewhere.
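The distinction between density and probability can be checked numerically. The following sketch assumes that the 5 m error can be read as the standard deviation of an independent Gaussian error in each direction, which is an interpretation rather than something the text states. The names density and volume, the grid extent, and the cell size are all illustrative choices.

```python
import math

SIGMA = 5.0  # assumed standard deviation of the error in each direction, in metres

def density(x, y, sigma=SIGMA):
    """Circular Gaussian probability density (per square metre) at offset (x, y)."""
    return math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def volume(inside, half_width=40.0, step=0.5):
    """Midpoint-rule volume under the density over grid cells where inside(x, y) is true."""
    n = int(round(2 * half_width / step))
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * step
        for j in range(n):
            y = -half_width + (j + 0.5) * step
            if inside(x, y):
                total += density(x, y) * step * step
    return total

peak = density(0.0, 0.0)                                # a density, not a probability
whole = volume(lambda x, y: True)                       # volume under the entire bell
within_5m = volume(lambda x, y: x * x + y * y <= 25.0)  # point lies within 5 m of the measurement
exact_5m = 1 - math.exp(-0.5)                           # closed form for a circle of radius SIGMA
print(peak, whole, within_5m, exact_5m)
```

The volume under the whole surface comes out very close to 1, while the height of the surface itself is not bounded by 1: calling density(0.0, 0.0, sigma=0.1) for a much more precise instrument gives a value of about 15.9 per square metre.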

It is easy to confuse probability density with spatial probability, since both are fields. But they have very different purposes and contexts. Probability density is most often encountered in analyses of positional uncertainty, including uncertainty over the locations of points and lines.

2.4.3        Uncertainty

Any geographic dataset is only a representation of reality, and it inevitably leaves its user with uncertainty about the nature of the real world that is being represented. This uncertainty may concern positions, as discussed in Section 2.4.2, but it may also concern attributes, and even topological relationships. Uncertainty in data will propagate into uncertainty about conclusions derived from data. For example, uncertainty in positions will cause uncertainty in distances computed from those positions, in the elements of a W matrix, and in the results of analyses based on that matrix.
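One simple way to see this propagation is by simulation. The sketch below is an illustration only: it assumes independent Gaussian errors with a standard deviation of 5 m in every coordinate of two points that are truly 100 m apart, and it measures how much the computed distance varies as a result. The function name simulate_distances and all numerical values are hypothetical.

```python
import math
import random
import statistics

def simulate_distances(a, b, sigma, trials=20000, seed=1):
    """Distances between two points after adding Gaussian noise to every coordinate."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        ax, ay = a[0] + rng.gauss(0, sigma), a[1] + rng.gauss(0, sigma)
        bx, by = b[0] + rng.gauss(0, sigma), b[1] + rng.gauss(0, sigma)
        out.append(math.hypot(bx - ax, by - ay))
    return out

# True positions 100 m apart; each measured coordinate is perturbed independently.
distances = simulate_distances((0.0, 0.0), (100.0, 0.0), sigma=5.0)
mean_distance = statistics.mean(distances)
spread = statistics.stdev(distances)
print(mean_distance, spread)
```

Under these assumptions the spread of the computed distance is roughly 7 m, larger than the error in either point alone, because the errors in both points contribute. The same approach could in principle be applied to any quantity derived from the positions.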

Uncertainty can be due to the inaccuracy or limitations of measuring instruments, since an instrument that measures a property to limited accuracy leaves its user uncertain about the true value of the property. It can be due to vagueness in definitions, when land is assigned to classes that are not rigorously defined, so that different observers may classify the same land differently. Uncertainty can also be due to missing or inadequate documentation, when the user is left to guess as to the meaning or definitions of certain aspects of the data. Clearly it is important to spatial analysts to know about the uncertainties present in data, and to investigate how those uncertainties impact the results of analysis. A range of techniques has been developed, and there is a rich literature on uncertainty in spatial data and its impacts (see further Zhang and Goodchild, 2002, and Longley et al., 2010, Ch. 6).

2.4.4        Statistical inference

One of the most important tools of science is statistical inference, the practice of reasoning from the analysis of samples to conclusions about the larger populations from which the samples were drawn. Entire courses are devoted to the topic, and to its detailed techniques — the t, F, and Chi-Squared tests, linear modeling, and many more. Today it is generally accepted that any result obtained from an experiment, through the analysis of a sample of measurements or responses to a survey, will be subjected to a significance test to determine what conclusions can reasonably be drawn about the larger world that is represented by the measurements or responses.

The earliest developments in statistical inference were made in the context of controlled experiments. For example, much was learned about agriculture by sowing test plots with new kinds of seeds, or submitting test plots to specific treatments. In such experiments it is reasonable to assume that every sample was created through a random process, and that the samples collectively represent what might have happened had the sample been much larger; in other words, what might be expected to happen under similar conditions in the universe of all similar experiments, whether conducted by the experimenter or by a farmer. Because there is variation in the experiment, it is important to know whether the variation observed in the sample is sufficiently large to reach conclusions about variation in the universe as a whole. Figure 2-12 illustrates this process of statistical inference. A sample is drawn from the population by a random process. Data are then collected about the sample, and analyzed. Finally, inferences are made about the characteristics of the population, within the bounds of uncertainty inherent in the sampling process.

Figure 2-12 The process of statistical inference
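The three steps of this process (draw a random sample, analyze it, infer something about the population) can be sketched as follows. The population of plot yields, its size, and the sample size are invented for illustration, and the 95% interval uses the usual normal approximation, which is an assumption of the sketch rather than a method prescribed by the text.

```python
import math
import random
import statistics

rng = random.Random(42)

# Hypothetical population: yields of 10,000 plots, in arbitrary units.
population = [rng.gauss(50.0, 8.0) for _ in range(10000)]

# Step 1: draw a sample from the population by a random process.
sample = rng.sample(population, 100)

# Step 2: collect and analyze data about the sample.
mean = statistics.mean(sample)
std_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Step 3: infer a plausible range for the population mean
# (normal approximation, roughly 95% confidence).
low, high = mean - 1.96 * std_error, mean + 1.96 * std_error
print(low, high, statistics.mean(population))
```

The width of the interval expresses the bounds of uncertainty inherent in the sampling process. The calculation is only meaningful if the sample really was drawn randomly and independently, which is the assumption examined in the rest of this section.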

These techniques have become part of the standard apparatus of science, and it is unusual for scientists to question the assumptions that underlie them. But the techniques of spatial analysis are applied in very different circumstances from the controlled experiments of the agricultural scientist or psychologist. Rather than creating a sample by laying out experimental plots or recruiting participants in a survey, the spatial analyst typically has to rely on so-called natural experiments, in which the variation among samples is the result of circumstances beyond the analyst’s control.

In this context the fundamental principles of statistical inference raise three important questions: (i) were the members of the sample selected randomly and independently from a larger population, or did Tobler’s First Law virtually ensure lack of independence, and/or did the basic heterogeneity of the Earth’s surface virtually ensure that samples drawn in another location would be different? (ii) what universe is represented by the samples? and (iii) is it possible to reason from the results of the analysis to conclusions about the universe?

All too often the answers to these questions are negative. Spatial analysis is often conducted on all of the available data, so there is no concept of a universe from which the data were drawn, and about which inferences are to be made. It is rarely possible to argue that sample observations were drawn independently, unless they are spaced far apart. Specialized methods have been devised that circumvent these problems to some degree, and they will be discussed at various points in the book. More often, however, the analyst must be content with results that apply only to the sample under analysis, and cannot be generalized to some larger universe.
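The practical consequence of lack of independence can be illustrated with a small simulation. The sketch below is not one of the specialized methods referred to above. It assumes, as a simple one-dimensional stand-in for spatial dependence, that observations are taken along a transect and that each value is correlated with its neighbor (a first-order autoregressive process). The names transect and compare, and the values of n and rho, are hypothetical.

```python
import math
import random
import statistics

def transect(n, rho, rng):
    """n values along a line, each correlated (rho) with its neighbor; true mean is 0."""
    x = rng.gauss(0, 1)
    values = [x]
    for _ in range(n - 1):
        x = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        values.append(x)
    return values

def compare(n=50, rho=0.8, repeats=2000, seed=7):
    """Return (actual spread of the sample mean, spread claimed by s / sqrt(n))."""
    rng = random.Random(seed)
    means, naive_errors = [], []
    for _ in range(repeats):
        values = transect(n, rho, rng)
        means.append(statistics.mean(values))
        naive_errors.append(statistics.stdev(values) / math.sqrt(n))
    return statistics.stdev(means), statistics.mean(naive_errors)

actual, claimed = compare()
print(actual, claimed)
```

With strongly correlated neighbors, the sample mean varies from one transect to the next far more than the standard formula for independent observations suggests, so a significance test based on that formula would be overconfident. Setting rho=0.0 in compare brings the two figures back into approximate agreement.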

