Statistical Methods and Spatial Data

The “R Spatial” project, part of the open source R statistical software project, includes many facilities for spatial data handling, display and statistical analysis. As such it provides perhaps the most complete collection of software tools available to spatial analysts. These include facilities for:

Point pattern analysis
Geostatistics
Disease mapping and analysis
Spatial regression, and
Ecological analysis

Details of the main routines that are available via this project are listed in the “R-Project spatial statistics software packages” Appendix at the end of this Guide. As will be seen, the functionality provided incorporates many of the core statistical methods that are discussed in the Sections that follow.

Many of the techniques of spatial analysis described elsewhere in this Guide make use of statistical measures and methods. These include the contextual discussion in Section 2.3, Spatial Statistics, the listing of univariate statistical measures in Table 1‑3, and directional analysis in Section 4.5. Statistical procedures with specific spatial extensions are described in this chapter and in Section 6.7, Geostatistical Interpolation Methods. In each case ideas and methods from mainstream statistics have been extended and developed in order to address the specific needs of spatial datasets. These extensions differ in kind from the way multivariate statistics are derived from their univariate counterparts, because they depend upon the fundamental organizing concepts of distance, direction, contiguity and scale. In many instances classical hypothesis testing and inferential procedures cannot be applied to spatial problems (or at least, not without reservations and/or the use of specialized modeling techniques). This may be because the datasets do not satisfy classical independence or distributional requirements and/or because the sampling framework is unknown or unsuitable.
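As an illustration of why the classical independence assumption is often violated, the following minimal sketch computes global Moran's I (one of the measures listed later in Table 5‑2) for two small synthetic grids. The 4 x 4 grid, the binary rook-contiguity weights and the helper names `rook_weights` and `morans_i` are assumptions made for this example only; they are not taken from the Guide or from any particular software package.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Binary rook-contiguity weights for a regular nrows x ncols grid (assumed layout)."""
    n = nrows * ncols
    w = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if r + 1 < nrows:              # neighbour in the next row
                j = (r + 1) * ncols + c
                w[i, j] = w[j, i] = 1.0
            if c + 1 < ncols:              # neighbour in the next column
                j = r * ncols + (c + 1)
                w[i, j] = w[j, i] = 1.0
    return w

def morans_i(values, w):
    """Global Moran's I = (n / S0) * (z'Wz) / (z'z), with z the deviations from the mean."""
    z = np.asarray(values, dtype=float).ravel()
    z = z - z.mean()
    return (z.size / w.sum()) * (z @ w @ z) / (z @ z)

w = rook_weights(4, 4)
gradient = np.tile(np.arange(4.0), (4, 1))      # smooth trend across columns
checker = np.indices((4, 4)).sum(axis=0) % 2    # alternating 0/1 cells

i_gradient = morans_i(gradient, w)   # positive: neighbouring cells are similar
i_checker = morans_i(checker, w)     # negative: neighbouring cells are dissimilar
print(i_gradient, i_checker)
```

Under independence the expected value of I is -1/(n-1), which is close to zero for all but very small n. Values well away from this, as in the smooth-trend grid, indicate that neighbouring observations carry overlapping information, so that tests which assume independent samples tend to overstate the evidence.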

Research in this field over the past 50 years or so has produced a collection of core statistical procedures for spatial analysis. Many of these procedures are descriptive and/or exploratory. Such methods rest on strong statistical foundations, but recognize that the requirements and assumptions of classical statistics are often not strictly met. However, recent advances in spatial modeling and estimation procedures, inferential methods, and associated software tools mean that it is now possible to model relatively complex, finely structured, large spatial datasets using a wide range of methods.

Cressie (1993) proposed a taxonomy of spatial statistics based on the underlying model of the dataset being considered. The three main topics he identified using this approach are:

point pattern analysis: corresponding to a location-specific view of the data (discussed here in Sections 5.4, Point Sets and Distance Statistics and 5.6, Spatial Regression)
lattice or regional analysis: corresponding to zonal models of space, notably planar enforced sets of regions (such as administrative or census districts — discussed here in many of the sections of this chapter), and
geostatistical modeling: applying to a continuous field view of the underlying dataset — core issues relating to spatial modeling are discussed in Section 5.5, Spatial Autocorrelation, and geostatistical methods as applied to interpolation of field data in Section 6.7, Geostatistical Interpolation Methods

Cressie’s basic taxonomy has been revised and developed further by Anselin (2002, p.14). He summarizes key aspects associated with the object/field distinction in spatial data models and their implications as shown in Table 5‑1. This analysis highlights not only the more obvious differences between the two main data models, but also the implications of these in terms of the analytical approaches implied. It also identifies the focus on interpolation and infill in the case of fields, versus extrapolation and domain expansion (spatial and temporal prediction) in the case of vector structures. The distinctions between the various groupings are not always clear, and methods applied in one area are often carried across in part to others. For example, a grid may represent a field view, a lattice view or an aggregated point set view of underlying data. Some aspects of this issue are described in Section 5.3, Grid-based Statistics and Metrics.

Table 5‑1 Implications of Data Models

 

| | Object | Field |
|---|---|---|
| GIS | vector | raster |
| Spatial Data | points, lines, polygons | surfaces |
| Location | discrete | continuous |
| Observations | process realization | sample |
| Spatial Arrangement | spatial weights | distance function |
| Statistical Analysis | lattice | geostatistics |
| Prediction | extrapolation | interpolation |
| Models | lag and error | error |
| Asymptotics | expanding domain | infill |
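The remark above that a grid may represent an aggregated point set view of the underlying data can be sketched as follows. This is an illustrative example only: the synthetic uniform point set, the 10 x 10 study area and the 5 x 5 cell grid are all assumptions. It uses NumPy's `histogram2d` to aggregate point locations into cell counts and then computes a simple variance-mean ratio of those counts.

```python
import numpy as np

# Synthetic point set: 200 locations drawn uniformly over an assumed 10 x 10 study area.
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 10.0, size=(200, 2))

# Aggregate the points to a 5 x 5 grid of square cells (quadrats), each 2 x 2 units.
edges = np.linspace(0.0, 10.0, 6)
counts, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=[edges, edges])

# Variance-mean ratio of the cell counts; values near 1 are what a
# completely random (Poisson) pattern would tend to produce.
vmr = counts.var(ddof=1) / counts.mean()
print(counts.astype(int))
print(vmr)
```

The same array of counts could then be treated as a lattice (cells with neighbours and spatial weights) or smoothed into a density surface. This is why the boundaries between the groupings are not always clear. Note that the result depends on the chosen cell size and grid origin.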

In the field of ecological data analysis, Perry et al. (2002) provide an excellent review of spatial pattern analysis and statistical methods. Their ‘guidelines’ paper includes recommendations and examples of many of the techniques discussed in the subsequent sections of this chapter. Their commentary applies primarily to ecological data (e.g. entomology, animal and plant distribution, forestry), where the emphasis is on one-dimensional (transect) analysis and sampled rectangular zones (quadrats), but much of their discussion applies equally to spatial data from other application areas. A version of the summary table they include in their paper is provided as Table 5‑2.

Table 5‑2 Description of methods for analysis of spatial data in ecology

| Method | Type of data | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|---|
| Ripley’s K and L | Point (x,y) | n | y | y | (y) | n | 1,2 | y |
| Quadrat variance methods* | Point (x,y) | n | (n) | y | n | n | 1 | n |
| Block quadrat variance methods* | Point with attributes (x,y,z) | n | (n) | y | (y) | n | 2 | n |
| Correlograms, Moran’s I, Geary C etc. | Point with attributes (x,y,z) | n | y | y | y | n | 1,2 | y |
| Geostatistics: variograms | Point with attributes (x,y,z) | n | n | y | y | n | 1,2 | y |
| Geostatistics: Kriging | Point with attributes (x,y,z) | y | y | y | y | n | 1,2 | y |
| Angular correlation* | Point with attributes (x,y,z) | n | y | n | y | n | 2 | y |
| Wavelets* | Point with attributes (x,y,z) | n | n | y | n | y | 1,2 | n |
| SADIE: spatial analysis by distance indices* | Point with attributes (x,y,z) | n | y | n | n | y | 1,2 | y |
| Landscape metrics | Area with attributes (A,z) | (n) | (n) | n | (n) | n | 2 | y |
| Variance-mean (Morisita, Taylor etc.)* | Non-spatial (attributes only) | n | n | n | n | n | 1,2 | y |
| Nearest neighbor | Point (x,y) | n | n | n | n | n | 1,2 | y |

Key: A: Model based? B: Allows hypothesis tests? C: Information available at multiple scales? D: Information available on anisotropy? E: Information available on local pattern? F: Applicable in 1 dimension, 2 dimensions, or both (1,2)? G: Irregularly spaced units allowed? n=no, y=yes, (n)=rarely, (y)=possible; * technique not currently covered in this Guide but supported in ecological analysis software such as SAM and PASSaGE. For more details see Perry et al. (2002).