Copyright (c) 2006-2012 Home page: www.spatialanalysisonline.com.
Humans will
never have a complete understanding of everything that happens on the Earth’s
surface, and so it is often convenient to resort to thinking in terms of
probabilities. In principle one could completely characterize the physics of a
human hand and a coin, but in practice it is much more productive to assign
probabilities to the outcomes of a coin toss. In similar fashion spatial
analysts may avoid the virtual impossibility of predicting exactly where
landslides will occur by assigning them probabilities based on patterns of
known causes, such as clay soils, rainfall, and earthquakes. A map of
probabilities assigns each location a value between 0 and 1, forming a
probability field.
Such a map
considers only the probability of a single, isolated event, however. The
probability that two points a short distance apart will both be subject to
landslide is not simply the product of the two probabilities, as it would be if
the two outcomes were independent, a conclusion that can be seen as another
manifestation of Tobler’s First Law. For example, if the probability of a landslide at Point A is ½ and at
Point B a short distance away is also ½, the probability that both will be
affected is more than ½ × ½ = ¼, and possibly even as much as ½. Technically, the marginal probabilities of isolated
events may be less useful than the joint
probabilities of related events; yet joint probabilities are properties of
pairs of points and thus impossible to display in map form unless the number of
points is very small.
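The landslide example can be made concrete with a little arithmetic. The sketch below is a minimal illustration, assuming that the dependence between the two outcomes is summarized by the correlation coefficient rho between their 0/1 indicator variables (an assumption made here for illustration; the text does not specify a dependence model). With rho = 0 the joint probability reduces to the product of the marginals, ¼; with rho = 1 it rises to ½. The function names `joint_probability` and `number_of_pairs` are hypothetical; the second shows why joint probabilities resist mapping, since n points carry n(n − 1)/2 pairwise values.

```python
import math

def joint_probability(p_a, p_b, rho):
    """P(A and B) for two binary events with marginal probabilities p_a, p_b
    and correlation rho between their 0/1 indicator variables.

    Not every rho is admissible for every pair of marginals; for the
    equal marginals of 1/2 used below, any rho in [-1, 1] is valid.
    """
    # Cov(I_A, I_B) = P(A and B) - p_a * p_b, and Cov = rho * sd_a * sd_b,
    # where a 0/1 variable with probability p has sd = sqrt(p * (1 - p)).
    sd_a = math.sqrt(p_a * (1 - p_a))
    sd_b = math.sqrt(p_b * (1 - p_b))
    return p_a * p_b + rho * sd_a * sd_b

def number_of_pairs(n_points):
    """Joint probabilities belong to pairs: n points give n(n-1)/2 of them."""
    return n_points * (n_points - 1) // 2

# Landslide example from the text: both marginal probabilities are 1/2.
for rho in (0.0, 0.5, 1.0):
    print(rho, joint_probability(0.5, 0.5, rho))
# 0.0 0.25   (independent: the product of the marginals)
# 0.5 0.375
# 1.0 0.5    (perfectly dependent)

print(number_of_pairs(1000))  # 499500 pairwise values for only 1000 points
```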
One of the most
useful applications of probability to the Earth’s surface concerns uncertainty
about location. Suppose the location of a point has been measured using GPS. Inevitably the measurement is subject to uncertainty, in this
case amounting to an average error of 5 m in both the east-west and north-south
directions. Standard methods exist for analyzing measurement error, based on
the assumption, well justified by theory, that errors in a measurement form a
bell curve or Gaussian distribution. Spatially, one can think of the east-west and north-south bell curves
as combining to form a bell. But the surface formed by the bell is not a
surface of probability in the sense of Section 2.4.1: its values are not confined to the range 0 to 1, and they do not
give the marginal probability of the presence of the point. Instead, the bell
is a surface of probability density (probability per unit area),
and the probability that the point lies within any defined area is equal to the
volume under the bell’s surface over that area. The volume of the entire bell is
exactly 1, reflecting the fact that the point is certain to lie somewhere.
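The distinction between density and probability can be checked numerically. The following is a sketch, assuming that the 5 m average error can be treated as the standard deviation of an independent Gaussian error in each direction (a simplification: an average absolute error and a standard deviation are not the same quantity). It sums density × cell area over a fine grid to approximate the volume under the bell. Over the whole plane the volume comes out at 1; over a disk of radius 5 m it approaches the closed-form value 1 − exp(−r²/(2σ²)) for a circular Gaussian; and the peak density for a 10 cm error exceeds 1, which no probability can do. The helper names `gaussian_density` and `probability_in_region` are hypothetical.

```python
import math

def gaussian_density(x, y, sigma):
    """Circular bivariate Gaussian density (per square metre) centred on (0, 0)."""
    return math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)

def probability_in_region(inside, sigma, half_width, cell):
    """Approximate the volume under the density over a region (midpoint rule).

    inside(x, y) -> bool says whether a cell centre belongs to the region;
    integration covers the square [-half_width, half_width] x [-half_width, half_width].
    """
    n = int(round(2 * half_width / cell))
    volume = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * cell
        for j in range(n):
            y = -half_width + (j + 0.5) * cell
            if inside(x, y):
                volume += gaussian_density(x, y, sigma) * cell * cell
    return volume

sigma = 5.0  # assumed standard deviation (m) in each direction
everywhere = probability_in_region(lambda x, y: True, sigma, 6 * sigma, 0.25)
within_5m = probability_in_region(lambda x, y: x * x + y * y <= 25.0, sigma, 6 * sigma, 0.25)

print(round(everywhere, 4))  # total volume under the bell: 1.0
print(round(within_5m, 3), round(1 - math.exp(-0.5), 3))  # grid estimate vs closed form
print(gaussian_density(0, 0, 0.1))  # about 15.9: a density, not a probability
```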
It is easy to
confuse probability density with spatial probability, since both are fields.
But they have very different purposes and contexts. Probability density is most
often encountered in analyses of positional uncertainty, including uncertainty
over the locations of points and lines.
Any geographic
dataset is only a representation of reality, and it inevitably leaves its user
with uncertainty about the nature of the real world that is being represented.
This uncertainty may concern positions, as discussed in Section 2.4.2, but it may also concern attributes, and even
topological relationships. Uncertainty in data will propagate into uncertainty
about conclusions derived from data. For example, uncertainty in positions will
cause uncertainty in distances computed from those positions, in the elements
of a W matrix, and in the results of
analyses based on that matrix.
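One common way to study such propagation is Monte Carlo simulation: perturb the data many times according to an assumed error model and observe how a derived quantity varies. The sketch below assumes independent Gaussian positional errors of 5 m per axis, two hypothetical points 1 km apart, and inverse-distance weighting as one possible rule for forming an element of W (the text does not prescribe a weighting rule). For points this far apart the spread of the computed distance should be close to 5√2 ≈ 7.1 m, because to first order only the along-line components of the two errors matter. The function name `simulate_distances` is hypothetical.

```python
import math
import random
import statistics

def simulate_distances(p, q, sigma, trials, seed=0):
    """Monte Carlo propagation of positional error into a computed distance.

    Each trial perturbs both points with independent Gaussian errors
    (standard deviation sigma in x and in y) and records the distance.
    """
    rng = random.Random(seed)
    distances = []
    for _ in range(trials):
        px = p[0] + rng.gauss(0, sigma)
        py = p[1] + rng.gauss(0, sigma)
        qx = q[0] + rng.gauss(0, sigma)
        qy = q[1] + rng.gauss(0, sigma)
        distances.append(math.hypot(px - qx, py - qy))
    return distances

# Hypothetical points 1 km apart, each located with a 5 m error per axis.
d = simulate_distances((0.0, 0.0), (1000.0, 0.0), sigma=5.0, trials=20000, seed=1)
# Inverse-distance weights: one possible (assumed) way of forming a W element.
w = [1.0 / x for x in d]

print(statistics.mean(d), statistics.stdev(d))  # mean near 1000 m, spread near 7.1 m
print(statistics.mean(w), statistics.stdev(w))  # the weight inherits the uncertainty
```

The same loop can wrap any downstream analysis: replacing the distance calculation with a full analysis based on W yields a distribution of results rather than a single answer.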
Uncertainty can
be due to the inaccuracy or limitations of measuring instruments, since an
instrument that measures a property to limited accuracy leaves its user
uncertain about the true value of the property. It can be due to vagueness in
definitions, when land is assigned to classes that are not rigorously defined,
so that different observers may classify the same land differently. Uncertainty
can also be due to missing or inadequate documentation, when the user is left
to guess as to the meaning or definitions of certain aspects of the data.
Clearly it is important for spatial analysts to know about the uncertainties
present in data, and to investigate how those uncertainties impact the results
of analysis. A range of techniques has been developed, and there is a rich
literature on uncertainty in spatial data and its impacts (see further Zhang and Goodchild, 2002, and Longley et al., 2010, Ch. 6).
One of the most
important tools of science is statistical inference, the practice of reasoning
from the analysis of samples to conclusions about the larger populations from
which the samples were drawn. Entire courses are devoted to the topic, and to
its detailed techniques: the t, F, and chi-squared tests, linear modeling, and many more. Today it is generally accepted that
any result obtained from an experiment, through the analysis of a sample of
measurements or responses to a survey, will be subjected to a significance test to determine what
conclusions can reasonably be drawn about the larger world that is represented
by the measurements or responses.
The earliest
developments in statistical inference were made in the context of controlled
experiments. For example, much was learned about agriculture by sowing test
plots with new kinds of seeds, or submitting test plots to specific treatments.
In such experiments it is reasonable to assume that every sample was created
through a random process, and that the samples collectively represent what
might have happened had the sample been much larger — in other words, what
might be expected to happen under similar conditions in the universe of all
similar experiments, whether conducted by the experimenter or by a farmer.
Because there is variation in the experiment, it is important to know whether
the variation observed in the sample is sufficiently large to reach conclusions
about variation in the universe as a whole. Figure
2‑12 illustrates this process of statistical inference. A
sample is drawn from the population by a random process. Data are then
collected about the sample, and analyzed. Finally, inferences are made about
the characteristics of the population, within the bounds of uncertainty
inherent in the sampling process.
Figure 2‑12 The process of statistical inference
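The cycle in Figure 2‑12 can be simulated when the population is known, which is never the case in practice. The sketch below uses a synthetic, hypothetical population, draws a simple random sample, and applies the familiar normal-approximation 95% confidence interval for a mean, one inferential procedure among many. Repeating the cycle shows the sense in which inference is bounded by sampling uncertainty: roughly 95% of the intervals should contain the population mean, but no single interval is guaranteed to. The function name `mean_with_ci` is hypothetical.

```python
import math
import random
import statistics

def mean_with_ci(sample, z=1.96):
    """Sample mean with an approximate 95% confidence interval (normal approximation)."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m, m - z * se, m + z * se

rng = random.Random(7)
# Hypothetical population of 100,000 measurements (unknown in a real study).
population = [rng.gauss(50.0, 10.0) for _ in range(100_000)]
true_mean = math.fsum(population) / len(population)

# One pass through the cycle: random sample -> analysis -> inference.
estimate, lower, upper = mean_with_ci(rng.sample(population, 100))
print(f"mean {estimate:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")

# Repeating the cycle: about 95% of the intervals should contain the true mean.
repeats = 500
hits = 0
for _ in range(repeats):
    _, lo, hi = mean_with_ci(rng.sample(population, 100))
    hits += lo <= true_mean <= hi
coverage = hits / repeats
print(f"coverage {coverage:.3f}")
```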

These
techniques have become part of the standard apparatus of science, and it is
unusual for scientists to question the assumptions that underlie them. But the
techniques of spatial analysis are applied in very different circumstances from
the controlled experiments of the agricultural scientist or psychologist.
Rather than creating a sample by laying out experimental plots or recruiting
participants in a survey, the spatial analyst typically has to rely on
so-called natural experiments, in
which the variation among samples is the result of circumstances beyond the
analyst’s control.
In this context
the two fundamental principles of statistical inference raise important
questions: (i) were the members of the sample selected randomly and
independently from a larger population, or did Tobler’s First Law virtually ensure a lack of independence, and/or did the basic
heterogeneity of the Earth’s surface virtually ensure that samples drawn in
another location would be different? (ii) what universe is represented by the
samples? and (iii) is it possible to reason from the results of the analysis to
conclusions about the universe?
All too often
the answers to these questions are negative. Spatial analysis is often
conducted on all of the available data, so there is no concept of a universe
from which the data were drawn, and about which inferences are to be made. It
is rarely possible to argue that sample observations were drawn independently,
unless they are spaced far apart. Specialized methods have been devised that
circumvent these problems to some degree, and they will be discussed at various
points in the book. More often, however, the analyst must be content with
results that apply only to the sample under analysis, and cannot be generalized
to some larger universe.
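The consequences of ignoring a lack of independence can be illustrated by simulation. The sketch below assumes a one-dimensional transect in which each value is correlated with its neighbour through a first-order autoregressive process with coefficient rho = 0.8, a hypothetical stand-in for Tobler's First Law rather than a model taken from the text. It compares the actual sampling variability of the mean of 100 observations with the standard error computed as if those observations were independent. For this process the variance of the mean is inflated by roughly (1 + rho)/(1 − rho) = 9, so the naive standard error is too small by a factor near 3, and nominal 95% intervals cover the true value far less often than claimed. The names `ar1_transect` and `naive_check` are hypothetical.

```python
import math
import random
import statistics

def ar1_transect(n, rho, rng):
    """Values along a transect, each correlated with its neighbour.

    A first-order autoregressive process with mean 0 and unit variance,
    used here as an assumed, one-dimensional form of spatial dependence.
    """
    values = [rng.gauss(0.0, 1.0)]
    innovation_sd = math.sqrt(1.0 - rho * rho)
    for _ in range(n - 1):
        values.append(rho * values[-1] + rng.gauss(0.0, innovation_sd))
    return values

def naive_check(n=100, rho=0.8, repeats=1000, seed=3):
    """Compare the true spread of sample means with the independence-based SE."""
    rng = random.Random(seed)
    means, naive_ses, hits = [], [], 0
    for _ in range(repeats):
        x = ar1_transect(n, rho, rng)
        m = statistics.mean(x)
        se = statistics.stdev(x) / math.sqrt(n)  # formula assumes independence
        means.append(m)
        naive_ses.append(se)
        hits += abs(m) <= 1.96 * se  # the true mean of the process is 0
    actual_se = statistics.stdev(means)
    return actual_se / statistics.mean(naive_ses), hits / repeats

ratio, coverage = naive_check()
print(ratio)     # how many times the naive formula understates the error
print(coverage)  # far below the nominal 0.95
```

Spacing helps because the correlation at lag k in this process is rho^k: keeping only every tenth value reduces the correlation between retained samples to 0.8^10 ≈ 0.11, which is why widely spaced observations come closer to independence.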