Spatial autoregressive and Bayesian modeling

Navigation:  Data Exploration and Spatial Statistics > Spatial Regression >

Spatial autoregressive and Bayesian modeling

Previous pageReturn to chapter overviewNext page

Spatial autoregressive modeling

Many of the techniques that are briefly described in this final subsection originate from time series analysis and were subsequently developed from the mid-1950s within the discipline known as spatial statistics. They have been applied and substantially extended in the last 25 years, notably by econometricians, geographers and medical statisticians. Additional disciplines that have made extensive use of these techniques include the actuarial, ecological and environmental sciences. Detailed discussion of the methods and underlying theory may be found in Cressie (1993), Bailey and Gatrell (1995), Anselin (1988), Anselin and Bera (1998), Anselin (2002) and Haining (2003). Haining (2003) discusses the various forms of model that may be appropriate for the statistical analysis of spatial data. He notes the choice of a particular modeling approach often reflects the data types, preferences and experience within the disciplines involved, with social sciences, earth sciences and medical researchers all tending to focus on different approaches.

The procedures have been implemented in: SpaceStat; S-Plus; the R-Spatial project; MATLab Spatial Statistics Toolbox (Pace et al.); WinBUGS; SAM; PySal; and GeoDa packages, amongst others. A number of these have been specifically developed to deal with large (and often sparse) matrix difficulties that arise with detailed regional and national datasets.

A pure spatial autoregressive model simply consists of a spatially lagged version of the dependent variable, y:

As can be seen this is similar to a standard linear regression model where the first term is constructed from a predefined n by n spatial weighting matrix, W, applied to the observed variable, y, together with a spatial autoregression parameter, ρ, which typically has to be estimated from the data. Anselin (2008, p257) describes spatial lag models as “a formal representation of the equilibrium outcome of processes of social and spatial interaction”. Essentially a spatial lag model is expressing the notion that the value of a variable at a given location is related to the values of the same variable measured at nearby locations, reflecting some kind of interaction effect. The spatial weights matrix, W, is almost always standardized such that its rows sum to 1, hence it is effectively including a weighted average of neighboring values into the regression equation. Note that in this case W is not necessarily symmetric (compare this with the symmetry requirements on conditional autoregressive, or CAR, models that are discussed later in this section).

For an individual observation the basic spatial lagged autoregression equation is simply:

Note the similarity of this model to a series of simultaneous equations (hence the description of such models as simultaneous autoregressive, or SAR, models). The model can also be compared with a simple time series autoregressive model, from which it owes its origins:

Since the dependent variable, y, appears on both sides of the expression:

it can be re-arranged to give:

from which we can obtain an expression for the variance of y as:

hence

where C is the variance-covariance matrix.

This derivation has made no distributional assumptions regarding the response variable or the errors. In this model the spatial weights matrix, W, is effectively raised to the power 1, so only first-order neighbors are included in the autoregressive function, and for this reason this model is therefore sometimes described as having a ‘first-order’ specification.

Although widely used, Wall (2004) has pointed out significant weaknesses in the interpretation of the spatial structure of commonly applied SAR and CAR spatial weighting schemes. These models were originally designed for use on infinite regular lattices, rather than finite irregular lattices (which have edges and variable numbers of neighbors per zone) and in this latter context are less well behaved. She recommends considering use of geostatistical models as an alternative or additional approach when analyzing lattice (zonal) datasets, since in this approach the covariance function is modeled directly. Of course, the use of geostatistical methods also have their weaknesses in this context, notably the need to use zone centroids or similar arbitrary points rather than the zones themselves when constructing the experimental variogram (see further, Sections 6.7.1, Core concepts in Geostatistics).

If we add additional predictor variables, x, to the pure spatially lagged autoregression model described earlier we have a mixed regressive spatial autoregressive model (mrsa):

As can be seen this is the same as a standard linear regression model with the addition of the autoregressive component (the SAM software refers to this formulation as a ‘lagged response model’). As before we can re-arrange this expression to solve for y:

The design of this kind of mixed model incorporates spatial autocorrelation together with the influence of other (aspatial) predictor variables. The objective of this revised approach is to obtain a significant improvement over a standard OLS model. The level of improvement will depend on how well the revised model represents or explains the source data, and to an extent this will vary depending on the detailed form of the weighting matrix, W.

Theoretical analyses have shown that this type of model can be derived from a variety of different processes, including direct processes such as spatial diffusion, certain forms of spatial interaction (including spillover and gravity or potential-type process models), and indirect processes such as resource distribution. This lack of a well-defined link between process and form is commonplace in spatial analysis, and is well-documented in fields such as point set clustering and fractal analysis. That is also applies here, in spatial regression modeling, should come as no surprise.

A second approach to SAR modeling is known as the spatial error model. This model is applied when there appears to be significant spatial autocorrelation, but tests for spatial lag effects do not suggests that inclusion of the latter would provide a significant improvement. A decision diagram for selecting the appropriate model based on a set of additional diagnostics (Lagrangian multiplier test statistics) is included in GeoDa (see the tutorial materials for a discussion of their use). The spatial error model (from GeoDa) is defined as:

Hence the basic model is as per the standard linear model, but now the error term is assumed to be made up of a spatially weighted error vector, λWε, and a vector of iid errors, u. We can re-arrange the expression for ε, above, to obtain:

hence the error variance-covariance matrix, C, in this case is given by:

The Georgia, USA educational attainment dataset used in Section 5.6.3, Geographically Weighted Regression (GWR) and in Table 5‑12 can be analyzed in a similar manner using spatial autoregressive methods. If this is conducted within GeoDa the OLS results match those within GWR (although the AIC values differ slightly owing to the differences in the detailed expressions applied). However, to apply a spatial autoregressive model a spatial weights matrix is required. In the following example we have set the spatial weights to be defined by simple first-order rook’s move contiguity (adjacent edges), and then examined the GeoDa diagnostics to determine which form of regression model seems most appropriate to apply. In this instance the spatial error model was identified as the most appropriate and the regression re-run using this model. The results are summarized in Table 5‑13, which is simply an extended version of the Table 5‑12, including the new autoregressive parameter estimates.

Table 5‑13 Georgia dataset — comparative regression estimates and diagnostics

Predictor variables

Global parameter estimate

Spatial error model parameter estimates

GWR parameter estimates

Total population, β1

0.24 x10‑4

0.24 x10‑4

0.14 to 0.28 x10‑4

% rural, β2

‑0.044

‑0.046

‑0.06 to ‑0.03

% elderly, β3

‑0.06*

‑0.099*

‑0.26 to ‑0.06

% foreign born, β4

1.26

1.196

0.51 to 2.42

% poverty, β5

‑0.15

‑0.145

‑0.20 to –0.00

% black, β6

0.022*

0.013*

‑0.04 to 0.08

Intercept, β0

14.78

15.46

12.62 to 16.49

lambda, λ

 

0.313

 

Diagnostics

 

 

 

Residual SS (RSS)

1816

1708

1506

Adjusted R2

0.63

0.67

0.68

Effective parameters

7

7

12.81

AIC/AICc

855.4

846.0

839.2

* not significant

Although the RSS value in Table 5‑13 is not as low as with GWR, the model is intrinsically far simpler and enables a more global view of the relationship between variables. There is an argument for utilizing both global regression and GWR approaches when analyzing datasets of this type, since they provide different perspectives on the data, and different insights into the use of such data for predictive purposes. The spatial error model applied in this example was defined above as:

Observing that:

we have:

Hence this expression models the dependent variable y as a combination of a general (global) linear trend component, Xβ, plus a pure spatial autocorrelation component, λWy, plus a (negative) neighboring trend or predictor component, λWXβ, plus a set of iid random errors, u (the SAM software refers to this formulation as a ‘lagged predictor model’)

Comparing this to the mrsa model above:

we see that the spatial error model can be viewed as a form of mixed spatial lag model with an additional spatial component, the neighboring trend, λWXβ.

These types of model can be generalized still further (Haining, 2003, p355), for example as:

where the scalars α, ρ, and φ, and the vectors β and δ are all parameters to be estimated, and the final term represents spatial autocorrelation on the errors. Clearly one could proceed from the generalized model to the particular, or vice versa. Likewise one could progressively increase or decrease the set of explanatory variables in the model.

Given the considerable complexity of spatial phenomena, Haining suggests a data-driven approach to statistical modeling, which can be seen as fitting comfortably within the Data and Analysis components of the PPDAC framework described in Section 3.1, Spatial analysis as a process, and Figure 3‑3. His approach commences with ESDA, proceeds to model specification for the current data, and then progresses to an iterative cycle of selection and implementation of parameter estimation, assessment of model fit and re-specification where necessary.

Conditional autoregressive and Bayesian modeling

A somewhat different conceptual model, which in practice may produce similar results, is known as conditional autoregressive modeling (CAR). The essential idea here is that the probability of values estimated at any given location are conditional on the level of neighboring values. The standard or ‘proper’ CAR model for the expectation of a specific observation, yi, is of the form:

where μi is the expected value at i, and ρ is a spatial autocorrelation parameter that determines the size and nature (positive or negative) of the spatial neighborhood effect. The summation term in this expression is simply the weighted sum of the mean adjusted values at all other locations j — this may or may not be a reasonable assumption for a particular problem under consideration.

In the standard CAR model spatial weights are often computed using some form of distance decay function. The range of this function may be unbounded or set to a value beyond which the weights are taken as 0. This range might be determined from some a priori knowledge relating to the problem at hand, or perhaps estimated from a semivariogram or correlogram (see further, Section 6.7.1, Core concepts in Geostatistics).

In the CAR model the covariance matrix is of the form:

and if the conditional variances of y are assumed constant this simplifies to:

Requirements on the specification of the weighting matrix, W, and conditional variance matrix, M, include: (i) M is an n by n diagonal matrix with mii>0; (ii) to ensure symmetry of the variance-covariance matrix wijmji=wjimij; and (iii) 0>ρ>ρmax (typically) where ρmax is determined from the largest eigenvalues of M‑1/2WM1/2.

In the study by Lichstein et al. (2002), cited in Section 5.6.2, Simple regression and trend surface modeling, they chose to use CAR following the recommendation of Cressie (1993) and because they felt it to be more appropriate for their study. They found no real difference in the results obtained with the CAR model from those achieved using SAR modeling of the type described in the previous section. Wall (2004) likewise found little difference between CAR and SAR models in her analysis of educational data across USA states.

A range of CAR models are supported by the GeoBUGS extension to the WinBUGS package. This software is specifically designed to support Bayesian rather than frequentist statistical modeling, and uses computationally intensive techniques (Markov Chain Monte Carlo or MCMC simulation with Gibbs sampling) to obtain the fitted parameter estimates and confidence intervals. An ArcGIS tool (Adjacency for WinBUGS) is available from the USGS to generate the spatial adjacency matrix required for WinBUGS CAR models. The application in this case is for modeling and mapping avian abundance, especially for migratory bird species whose conservation is of concern.

Haining (2003) discusses the use of such Bayesian models, in which additional (prior) information (for example, national or regional crime survey data) is used to strengthen the modeling process and reduce bias in local estimates. The Bayesian approach treats the unknown parameters (e.g. the vector β) as a set of random variables, just like the data, to which may be associated prior distributions. The prior guesses for these parameters (possibly ‘borrowed’ from spatially adjacent and/or regional or national information) are then combined with the likelihood of the observed data to obtain posterior distributions for the parameters, from which inferential analysis proceeds. Essentially this provides a broader range of modeling approaches than pure (classical) frequentist analysis, and has been shown to result in substantial improvements over using simple rate data such as standard mortality ratios (SMRs); see for example, Yasui et al. (2000) for a fuller discussion of this question. These kinds of model have mostly been applied in epidemiological studies, for both mapping and modeling purposes, but have also been applied to other forms of spatial data (e.g. see Li et al., 2007).

In the so-called proper CAR model (WinBUGS function car.proper) the variance-covariance matrix is positive definite. The example values given in the WinBUGS manual for M and W based on expected counts, Ei, are of the form:

mii=1/Ei

wij=(Ej/Ei)1/2 for neighboring areas i,j or

wij=0 otherwise

This particular example relates to the Sudden Infant Death Syndrome (SIDS) data described in Section 4.3.3, Ratios, indices, normalization, standardization and rate smoothing, and in Cressie and Chan (1989) and more recently revisited by Berke (2004). Here the definition of neighboring area was not based on adjacency but on distance between county seats (d<30 miles), a value determined from an examination of an experimental variogram (an estimate of a variogram based on sample data). The specific model applied in this case was actually of the form:

where the term in curly brackets is a (Euclidean) distance decay function, with k selected as 0, 1 or 2, and C(k) is a constant of proportionality to ensure results are easily compared across different values of k. In this study the authors chose k=1 as this provided the best results when considered from a likelihood perspective, hence their weights were of the form:

Edge effects in this model are quite significant, since over a third of counties lie on the State boundary and clearly States do not represent closed systems for many (most) applications.

In this example the ‘proper’ (or autoGaussian) model fitted for this dataset was not applied to the full raw dataset, but to a Freeman-Tukey variance-stabilizing square root transform of the data (see Table 1‑3 and Freeman and Tukey, 1950) with Anson County omitted as an outlier. This county is the one picked up as an outlier in Figure 4‑40, the “Excess risk rate map for SIDS data”.

Cressie and Chan had looked for non-spatial explanatory variables based on population density, percentage urban, number of hospital beds per 100,000 population, median family income and non-white live-birth rate. They then extended their analysis to include spatial patterns, but even after doing so could not adequately explain the observed variations in the data for this period, or for the subsequent 5 year period. It remains the case that the causes of SIDS are not fully understood, but medical research has shown that the placement of very young children on their back when sleeping, the use of pacifiers (dummies) and avoidance of overheating, all help to reduce the risks involved substantially. It is reasonable to suggest that the spatial variations observed and their changes over time might have been, in part, a reflection of cultural and social factors (such as advice given to mothers by local medical staff). These factors were not explicitly picked up by the non-spatial explanatory variables. Although such factors may be related to race-specific customs, it is likely that the spatial variations observed and modeled may have reflected variations in these advisory and behavioral factors. Certainly it would have warranted a very close examination of such factors in counties with unusually high and low death rates in each time period.

An intrinsic version of the CAR model (IAR or ICAR) is also supported, in which the variance-covariance matrix is not positive definite, but is semi-definite (WinBUGS functions car.normal and the robust variant car.l1). The intrinsic version (applied initially in an image processing context) is based on pairwise differences between the observed values (similar to the computations used in variogram analysis, from which it originates — see Matheron (1973) and Künsch (1987) for a detailed mathematical treatment) and is now a more popular choice of CAR model for many researchers. Intrinsic models are a generalization of the standard conditional autoregressive models to support certain types of non-stationarity. The example values given in the WinBUGS manual for M and W for the intrinsic CAR model, based on Besag et al., (1991) and Besag and Kooperberg (1995) are of the form:

mii=1/ni where ni is the number of areas adjacent to i, and

or wij=0 otherwise

The use of simple 1/0 weighting schemes for CAR models is not really appropriate for finite irregular lattices, and frequently a row-adjusted scheme of the form W*={wij*} is used, where wij*=wij/wi. (often written within this field as wij/wi+). Hence the expected conditional means, for example, refer to an average rather than a summation. The symmetry requirement for CAR models cited earlier, i.e. wijmji=wjimij implies that the conditional variances should be proportional to 1/wi+.

Having fitted the chosen model to the sample data, the residuals may be examined by mapping and/or by using the Moran I correlogram, I(h), to identify any remaining patterns. If the residuals appear to show little or no spatial pattern it supports the view that the fitted model provides a good representation of the observed spatial patterns. However, as noted earlier, different models with fundamentally different interpretations may provide equally good fits to the data, hence drawing inferences from such models is difficult. Detailed examinations of the likely processes that apply for the particular dataset under consideration are vital for such analyses.

In the examples cited in this subsection, the response variable, y, has been assumed to be continuous. As with GWR, autoregressive models have been developed to handle discrete and binary data, for example autoLogistic and autoPoisson models — see Haining (2003, Chapters 9 and 10) for more details. Haining (2003, p.367 et seq) provides examples of the use of WinBUGS for Bayesian autoregressive modeling of burglaries in Sheffield, UK, by ward (Binomial logistic model) and children excluded from school (Poisson model). He includes sample code and data for these examples, together with maps of the results and provisional interpretations.

Griffith (2005) tested six alternative spatial regression models using data on cases of West Nile Virus (WNV) in the USA, by State (for current mapping of cases see Figure 5-39 below and the USGS website: http://diseasemaps.usgs.gov). These tests included SAR, CAR and spatial filtering specifications (see further, Section 5.6.5, Spatial filtering models). He concluded that none of the models provided an ideal specification for the observed data, but a number of general lessons could be learnt to assist analysts in determining the best approach to use in disease map modeling:

switching between alternative model specifications should yield similar intercept values − if markedly different values are obtained an analyst should be suspicious and ascertain why
non-Normal data (such as the WNV cases) are best described with non-Normal probability models
it appears from the tests on WNV data that Spatial filtering models can be used to explore the nature of spatial autocorrelation effects (positive and negative effects) more quickly and effectively than SAR and CAR models

Figure 5-39 West Nile Virus incidence, USA 2014

wnv2014