
Many of the techniques briefly described in this final subsection originate in time series analysis and were subsequently developed, from the mid-1950s onwards, within the discipline known as spatial statistics. They have been applied and substantially extended in the last 25 years, notably by econometricians, geographers and medical statisticians, and have also been used extensively in the actuarial, ecological and environmental sciences. Detailed discussion of the methods and underlying theory may be found in Cressie (1993), Bailey and Gatrell (1995), Anselin (1988), Anselin and Bera (1998), Anselin (2002) and Haining (2003). The procedures have been implemented in the SpaceStat, S-Plus, MATLAB Spatial Statistics Toolbox (Pace et al.), WinBUGS and GeoDa packages, amongst others. Several of these have been developed specifically to handle the large (and often sparse) matrices that arise with detailed regional and national datasets.

A pure spatial autoregressive (SAR) model simply consists of a spatially lagged version of the dependent variable, y:

$$y = \rho W y + \varepsilon$$

As can be seen, this resembles a standard linear regression model in which the single explanatory term is formed by applying a predefined n×n spatial weights matrix, W, to the observed variable, y, and multiplying the result by a spatial autoregression parameter, ρ, which typically has to be estimated from the data. The spatial weights matrix, W, is almost always standardised so that each of its rows sums to 1, and its diagonal elements are zero, so that no observation is treated as its own neighbour. For an individual observation, i, the equation is simply:

$$y_i = \rho \sum_{j=1}^{n} w_{ij}\, y_j + \varepsilon_i$$
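A minimal sketch of how a row-standardised W and the spatial lag Wy might be constructed, assuming a hypothetical 3×3 grid of cells with rook (shared-edge) contiguity. The helper name `rook_weights`, the data values and ρ = 0.5 are illustrative only and are not taken from any of the packages named above.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Hypothetical helper: row-standardised rook (shared-edge) weights for a grid.

    Cells are numbered row by row. The diagonal stays zero, so no cell is its
    own neighbour, and each row is divided by that cell's neighbour count.
    """
    n = nrows * ncols
    W = np.zeros((n, n))
    for i in range(n):
        r, c = divmod(i, ncols)
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= rr < nrows and 0 <= cc < ncols:
                W[i, rr * ncols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

W = rook_weights(3, 3)
y = np.arange(9, dtype=float)  # hypothetical observed values, one per cell
lag = W @ y                    # (Wy)_i = sum_j w_ij * y_j, i.e. the mean of i's neighbours
rho = 0.5                      # hypothetical value of the autoregression parameter

print(lag[4])                  # centre cell: mean of cells 1, 3, 5 and 7 -> 4.0
print(rho * lag[0])            # systematic part for corner cell 0: 0.5 * (1 + 3) / 2 -> 1.0
```

Because each row sums to 1, the lag for every cell is an average of its neighbours' values, so ρ acts on a quantity measured on the same scale as y itself.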

Note the similarity of this model to a simple time series autoregressive model (from which it is derived):

$$y_t = \rho\, y_{t-1} + \varepsilon_t$$
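To make the link concrete: if W is replaced by a hypothetical "time-lag" matrix with ones on its first sub-diagonal, (Wy)_t is simply y_{t−1} and the SAR expression collapses to the AR(1) model. The main difference in the spatial case is that dependence runs in every direction (neighbours on all sides) rather than only backwards in time. A sketch with made-up values:

```python
import numpy as np

n = 5
# Hypothetical "time-lag" weights: w[t, t-1] = 1, so each period's only neighbour
# is the previous one. Row 0 is all zeros (no predecessor), so unlike a spatial W
# this matrix is not fully row-standardised.
W_time = np.eye(n, k=-1)
y = np.array([2.0, 4.0, 3.0, 5.0, 1.0])  # hypothetical series
rho = 0.5                                 # hypothetical autoregression parameter

lagged = W_time @ y   # [0, y_0, y_1, y_2, y_3]
print(lagged)         # [0. 2. 4. 3. 5.]
print(rho * lagged)   # AR(1) systematic part, rho * y_{t-1}
```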

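Since the dependent variable, y, appears on both sides of the expression, the terms in y can be collected on the left-hand side, where I is the n×n identity matrix:

$$(I - \rho W)\, y = \varepsilon$$

and the result re-arranged to solve for y, provided (I − ρW) is invertible (for a row-standardised W this holds whenever |ρ| < 1):

$$y = (I - \rho W)^{-1} \varepsilon$$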

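from which we can obtain an expression for the variance of y as:

$$\operatorname{var}(y) = (I - \rho W)^{-1}\, \operatorname{var}(\varepsilon)\, (I - \rho W^{T})^{-1}$$

hence, if the errors are uncorrelated with constant variance, var(ε) = σ²I:

$$C = \sigma^{2} (I - \rho W)^{-1} (I - \rho W^{T})^{-1} = \sigma^{2} \left[ (I - \rho W^{T})(I - \rho W) \right]^{-1}$$

where C is the variance-covariance matrix of y. Beyond this assumption about the error variance, the derivation makes no distributional assumptions (such as normality) regarding the response variable or the errors. Furthermore, the matrix ρW does not have to be symmetric. The equivalent result for the conditional autoregressive (CAR) model, which we discuss later in this subsection, is:

$$C = \sigma^{2} (I - \rho W)^{-1}$$

In this case, with a constant conditional variance σ², W must be symmetric for C to be a valid covariance matrix.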
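The SAR and CAR covariance expressions can be checked numerically. This sketch assumes a hypothetical 4×4 rook grid and arbitrary parameter values. For the CAR case it switches to the symmetric binary adjacency matrix, here called `B`, and scales ρ by B's largest eigenvalue so that the result stays positive definite; both choices are assumptions made for illustration.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Hypothetical helper: row-standardised rook (shared-edge) weights for a grid."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for i in range(n):
        r, c = divmod(i, ncols)
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= rr < nrows and 0 <= cc < ncols:
                W[i, rr * ncols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

W = rook_weights(4, 4)        # row-standardised, hence not symmetric
n = W.shape[0]
I = np.eye(n)
rho, sigma2 = 0.6, 2.0        # hypothetical parameter values

# SAR: C = sigma2 (I - rho W)^{-1} (I - rho W^T)^{-1}
A_inv = np.linalg.inv(I - rho * W)
C_sar = sigma2 * A_inv @ A_inv.T
# The same matrix written as sigma2 [(I - rho W^T)(I - rho W)]^{-1}
C_sar_alt = sigma2 * np.linalg.inv((I - rho * W.T) @ (I - rho * W))

# CAR with constant conditional variance: C = sigma2 (I - rho B)^{-1}, B symmetric.
B = (W > 0).astype(float)                     # binary (unstandardised) rook adjacency
rho_car = 0.9 / np.linalg.eigvalsh(B).max()   # keeps I - rho_car * B positive definite
C_car = sigma2 * np.linalg.inv(I - rho_car * B)

print(np.allclose(C_sar, C_sar_alt))          # True
print(np.round(np.diag(C_sar)[[0, 5]], 3))    # corner vs interior cell variances
```

Even with a non-symmetric W, the SAR covariance is symmetric. It is also not a multiple of the identity: cells with fewer neighbours (such as the corner) end up with different variances from interior cells.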

If we add further predictor variables, X, to the pure SAR model we obtain a mixed regressive spatial autoregressive model (mrsa), defined as follows:

$$y = \rho W y + X\beta + \varepsilon$$

As can be seen, this is a standard linear regression model with the SAR component added. This kind of mixed model explicitly incorporates spatial autocorrelation while retaining the influence of other (aspatial) predictor variables. The aim is to obtain a significant improvement over a standard OLS model. The size of the improvement depends on how well the revised model represents or explains the source data, and this in turn depends partly on the detailed form of the weights matrix, W.
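Because Wy is correlated with ε, ordinary least squares applied to [Wy, X] gives biased and inconsistent estimates of ρ, so models of this kind are usually fitted by maximum likelihood (see, e.g., Anselin, 1988). The following is a minimal sketch, not any package's implementation. It assumes Gaussian errors, a hypothetical 15×15 rook grid, simulated data with ρ = 0.5, and a simple grid search over ρ on the concentrated log-likelihood. The function name `fit_spatial_lag` is illustrative only.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Hypothetical helper: row-standardised rook (shared-edge) weights for a grid."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for i in range(n):
        r, c = divmod(i, ncols)
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= rr < nrows and 0 <= cc < ncols:
                W[i, rr * ncols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def fit_spatial_lag(y, X, W, grid=np.linspace(-0.99, 0.99, 199)):
    """ML fit of y = rho*W*y + X*beta + eps by grid search over rho.

    For a fixed rho, beta and sigma^2 have closed-form ML solutions, leaving the
    concentrated log-likelihood (constants dropped):
        -n/2 * ln(sigma^2(rho)) + ln|I - rho*W|
    """
    n = len(y)
    I = np.eye(n)
    Wy = W @ y
    best = None
    for rho in grid:
        y_star = y - rho * Wy                           # (I - rho W) y
        beta, *_ = np.linalg.lstsq(X, y_star, rcond=None)
        resid = y_star - X @ beta
        sigma2 = resid @ resid / n
        _, logdet = np.linalg.slogdet(I - rho * W)      # ln|I - rho W|
        loglik = -0.5 * n * np.log(sigma2) + logdet
        if best is None or loglik > best[0]:
            best = (loglik, rho, beta, sigma2)
    return best[1], best[2], best[3]

# Simulated example: y = (I - rho W)^{-1} (X beta + eps), rho = 0.5, beta = (1, 2)
rng = np.random.default_rng(42)
W = rook_weights(15, 15)
n = W.shape[0]
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - 0.5 * W, X @ np.array([1.0, 2.0]) + rng.normal(size=n))

rho_hat, beta_hat, sigma2_hat = fit_spatial_lag(y, X, W)
print(round(rho_hat, 2), np.round(beta_hat, 2), round(sigma2_hat, 2))
```

A grid search is used here only for transparency. Production code typically replaces it with a one-dimensional optimiser and uses sparse or eigenvalue-based log-determinants, which is one of the large-matrix issues mentioned at the start of this subsection.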

Theoretical analyses have shown that this type of model can be derived from a variety of different processes, including direct processes such as spatial diffusion, certain forms of spatial interaction (including spillover and gravity or potential-type process models), and indirect processes such as resource distribution. This lack of a well-defined link between process and form is commonplace in spatial analysis and is well documented in fields such as point set clustering and fractal analysis. That it also applies here, in spatial regression modelling, should come as no surprise.

A second approach to SAR modelling is known as the spatial error model. This model is applied when there appears to be significant spatial autocorrelation but tests for spatial lag effects do not suggest that including a lag term would provide a significant improvement. A decision diagram for selecting the appropriate model on the basis of a set of additional diagnostics (Lagrange multiplier test statistics) is included in the GeoDa tutorial materials. The spatial error model (as defined in GeoDa) is:

$$y = X\beta + \varepsilon, \qquad \varepsilon = \lambda W \varepsilon + u$$

Hence the basic model is as per the standard linear model, but now the error term is assumed to be made up of a spatially weighted vector, λWε, and a vector of iid errors, u.
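The Lagrange multiplier diagnostics used to choose between the lag and error forms can be computed from OLS residuals alone. This sketch uses the standard LM-error and LM-lag formulas (see, e.g., Anselin, 1988) on data simulated from a spatial error model with λ = 0.7 on a hypothetical 15×15 rook grid. Each statistic is compared with the χ²(1) 5% critical value of 3.84. GeoDa also reports robust variants of both tests, which this sketch omits, and `lm_tests` is an illustrative name.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Hypothetical helper: row-standardised rook (shared-edge) weights for a grid."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for i in range(n):
        r, c = divmod(i, ncols)
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= rr < nrows and 0 <= cc < ncols:
                W[i, rr * ncols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def lm_tests(y, X, W):
    """LM-error and LM-lag statistics computed from OLS residuals."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    s2 = e @ e / n
    T = np.trace(W.T @ W + W @ W)
    lm_error = (e @ (W @ e) / s2) ** 2 / T
    WXb = W @ (X @ beta)
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # residual-maker matrix
    lm_lag = (e @ (W @ y) / s2) ** 2 / (WXb @ M @ WXb / s2 + T)
    return lm_error, lm_lag

# Simulated spatial error data: eps = 0.7 W eps + u, y = X beta + eps
rng = np.random.default_rng(1)
W = rook_weights(15, 15)
n = W.shape[0]
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = np.linalg.solve(np.eye(n) - 0.7 * W, rng.normal(size=n))
y = X @ np.array([1.0, 2.0]) + eps

lm_error, lm_lag = lm_tests(y, X, W)
print(f"LM-error = {lm_error:.1f}, LM-lag = {lm_lag:.1f} (5% critical value 3.84)")
```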

The Georgia educational attainment dataset used to illustrate GWR can be analysed in a similar manner using SAR methods. If this is done within GeoDa, the OLS results match those obtained within GWR (although the AIC values differ slightly owing to differences in the detailed expressions used). To apply an SAR model, however, a spatial weights matrix is required. In the following example the spatial weights were defined by simple rook's move contiguity (zones sharing a common edge), and the GeoDa diagnostics were then examined to determine which form of SAR regression model seemed most appropriate. In this instance the spatial error model was identified as the most appropriate, and the regression was re-run using this model. The results are summarised in Table 5-12, which is simply an extended version of Table 5-11 that includes the new SAR parameter estimates. Although the residual sum of squares (RSS) is not as low as with GWR, the model is intrinsically far simpler and provides a more global view of the relationships between variables. There is an argument for using both global OLS/SAR and GWR approaches when analysing datasets of this type, since they provide different perspectives on the data and different insights into its use for predictive purposes.
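Once the diagnostics point to the spatial error model, it can be fitted by maximum likelihood. This is a minimal sketch, again assuming Gaussian errors and a grid search on the concentrated log-likelihood. It uses simulated data (λ = 0.7) on a hypothetical rook grid, not the Georgia dataset. `fit_spatial_error` is an illustrative name and is not GeoDa's routine.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Hypothetical helper: row-standardised rook (shared-edge) weights for a grid."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for i in range(n):
        r, c = divmod(i, ncols)
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= rr < nrows and 0 <= cc < ncols:
                W[i, rr * ncols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def fit_spatial_error(y, X, W, grid=np.linspace(-0.99, 0.99, 199)):
    """ML fit of y = X beta + eps, eps = lam W eps + u, by grid search over lam.

    For a fixed lam, filtering with (I - lam W) reduces the model to OLS on
    (I - lam W) y and (I - lam W) X with iid errors u; the Jacobian term
    ln|I - lam W| completes the concentrated log-likelihood.
    """
    n = len(y)
    I = np.eye(n)
    best = None
    for lam in grid:
        A = I - lam * W
        y_f, X_f = A @ y, A @ X                     # spatially filtered data
        beta, *_ = np.linalg.lstsq(X_f, y_f, rcond=None)
        u = y_f - X_f @ beta
        sigma2 = u @ u / n
        _, logdet = np.linalg.slogdet(A)
        loglik = -0.5 * n * np.log(sigma2) + logdet
        if best is None or loglik > best[0]:
            best = (loglik, lam, beta, sigma2)
    return best[1], best[2], best[3]

# Simulated example with lam = 0.7 and beta = (1, 2)
rng = np.random.default_rng(7)
W = rook_weights(15, 15)
n = W.shape[0]
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = np.linalg.solve(np.eye(n) - 0.7 * W, rng.normal(size=n))
y = X @ np.array([1.0, 2.0]) + eps

lam_hat, beta_hat, sigma2_hat = fit_spatial_error(y, X, W)
print(round(lam_hat, 2), np.round(beta_hat, 2), round(sigma2_hat, 2))
```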

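Given the error term:

$$\varepsilon = \lambda W \varepsilon + u$$

and observing that also:

$$\varepsilon = y - X\beta$$

we have:

$$y - X\beta = \lambda W (y - X\beta) + u \quad \Rightarrow \quad y = X\beta + \lambda W y - \lambda W X\beta + u$$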

Hence this expression models the dependent variable y as a combination of a general (global) linear trend component, Xβ, plus a pure SAR component, λWy, minus a neighbouring trend component, λWXβ, plus a set of iid random errors, u. Comparing this to the mrsa model above:

$$y = \rho W y + X\beta + \varepsilon$$

we see that the spatial error model can be viewed as a form of mixed spatial lag model, with λ in place of ρ, plus an additional autoregressive component, the neighbouring trend λWXβ, whose coefficients are not free but are fixed by λ and β.
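The decomposition described above can be checked numerically. This sketch simulates a spatial error model on a hypothetical 5×5 rook grid, with arbitrary λ and β, and confirms that y equals the global trend plus λWy, minus the neighbouring trend λWXβ, plus u.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Hypothetical helper: row-standardised rook (shared-edge) weights for a grid."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for i in range(n):
        r, c = divmod(i, ncols)
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= rr < nrows and 0 <= cc < ncols:
                W[i, rr * ncols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
W = rook_weights(5, 5)
n = W.shape[0]
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, -0.5])   # hypothetical coefficients
lam = 0.4                      # hypothetical lambda
u = rng.normal(size=n)         # iid errors

eps = np.linalg.solve(np.eye(n) - lam * W, u)   # solves eps = lam W eps + u
y = X @ beta + eps                              # spatial error model

# Lag form: global trend + lam*W*y - neighbouring trend lam*W*X*beta + u
y_lag_form = X @ beta + lam * (W @ y) - lam * (W @ (X @ beta)) + u
print(np.allclose(y, y_lag_form))   # True
```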

 

Table 5-12 Georgia dataset: SAR comparative regression estimates and diagnostics

| Predictor variables | Global (OLS) parameter estimates | SAR-E (spatial error) parameter estimates | GWR parameter estimates |
|---|---|---|---|
| Total population, β1 | 0.24 ×10⁻⁴ | 0.24 ×10⁻⁴ | 0.14 to 0.28 ×10⁻⁴ |
| % rural, β2 | −0.044 | −0.046 | −0.06 to −0.03 |
| % elderly, β3 | −0.06* | −0.099* | −0.26 to −0.06 |
| % foreign born, β4 | 1.26 | 1.196 | 0.51 to 2.42 |
| % poverty, β5 | −0.15 | −0.145 | −0.20 to −0.00 |
| % black, β6 | 0.022* | 0.013* | −0.04 to 0.08 |
| Intercept, β0 | 14.78 | 15.46 | 12.62 to 16.49 |
| Lambda, λ | | 0.313 | |
| Diagnostics | | | |
| Residual SS | 1816 | 1708 | 1506 |
| Adjusted R² | 0.63 | 0.67 | 0.68 |
| Effective parameters | 7 | 7 | 12.81 |
| AIC/AICc | 855.4 | 846.0 | 839.2 |

* not significant

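This type of model can be generalised still further (Haining, 2003, p. 355), for example as (with 1 denoting an n-vector of ones):

$$y = \alpha \mathbf{1} + \rho W y + X\beta + W X d + (I - \phi W)^{-1} u$$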

where the scalars α, ρ, and φ, and the vectors β and d are all parameters to be estimated, and the final term represents an SAR on the errors. Clearly one could proceed from the generalised model to the particular, or vice versa. Likewise one could progressively increase or decrease the set of explanatory variables in the model. Given the considerable complexity of spatial phenomena, Haining suggests a data-driven approach to statistical modelling, which can be seen as fitting comfortably within the Data and Analysis components of the PPDAC framework described in Section 3.2.1 and Figure 3‑4. His approach commences with ESDA, proceeds to model specification for the current data, and then progresses to an iterative cycle of selection and implementation of parameter estimation, assessment of model fit and re-specification where necessary.
