A somewhat different conceptual model, which may provide similar results, is conditional autoregressive (CAR) modelling. Lichstein et al. (2002), cited earlier, chose CAR rather than SAR modelling, following the recommendation of Cressie (1993) and because they considered it more appropriate for their study. They found no real difference between the results obtained with the CAR model and those achieved using SAR modelling.
The standard or proper CAR model for the expectation of a specific observation, yi, is of the form:
$$E(y_i \mid y_j,\ j \neq i) = \mu_i + \rho \sum_{j \neq i} w_{ij}\,(y_j - \mu_j)$$
where $\mu_i$ is the expected value at i, and $\rho$ is a spatial autocorrelation parameter that determines the size and nature (positive or negative) of the spatial neighbourhood effect. The summation term in this expression is simply the weighted sum of the mean-adjusted values at all other locations j; whether this is a reasonable assumption depends on the particular problem under consideration. Note also that SAR weighting schemes ($W_s$) can be converted to conditional or CAR schemes ($W_c$) through a matrix relation of the form:
$$\mathbf{W}_c = \mathbf{W}_s + \mathbf{W}_s^{T} - \mathbf{W}_s^{T}\mathbf{W}_s$$
although the reverse is not generally possible.
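As a minimal sketch of the two relations just described, the following Python (NumPy) fragment evaluates the CAR conditional expectation and converts a SAR weights matrix into a CAR one. It assumes the conditional mean takes the form μi + ρ Σ wij (yj − μj), and that the conversion is Wc = Ws + WsT − WsT Ws with the autocorrelation parameter absorbed into the weights. The function names and the four-area example are hypothetical and are not taken from any of the studies cited.

```python
import numpy as np

def car_conditional_mean(y, mu, W, rho):
    """Conditional expectation of each y_i given the values at all other sites.

    Assumed form: mu_i + rho * sum_{j != i} w_ij * (y_j - mu_j).
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    W = np.asarray(W, dtype=float).copy()
    np.fill_diagonal(W, 0.0)  # a site is never treated as its own neighbour
    return mu + rho * W @ (y - mu)

def sar_to_car_weights(B):
    """Convert scaled SAR weights B (rho already absorbed) to CAR weights C.

    Assumed relation: C = B + B^T - B^T B, chosen so that
    (I - B)^T (I - B) equals I - C.
    """
    B = np.asarray(B, dtype=float)
    return B + B.T - B.T @ B

# Hypothetical example: four areas in a row, binary adjacency weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([3.0, 5.0, 4.0, 8.0])
mu = np.full(4, 5.0)

cond_mean = car_conditional_mean(y, mu, W, rho=0.2)

B = 0.2 * W                 # SAR weights scaled by rho
C = sar_to_car_weights(B)   # corresponding CAR weights
```

In this sketch the converted matrix C acquires non-zero entries between areas that share a neighbour (through the BT B term), so a sparse SAR scheme corresponds to a denser CAR scheme.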
In the standard CAR model spatial weights are often computed using some form of distance decay function. The range of this function may be unbounded or set to a value beyond which the weights are taken as 0. This range might be determined from some a priori knowledge relating to the problem at hand, or perhaps estimated from a semivariogram or correlogram (see further, Section 6.7.1.15). Also, as noted earlier, in the CAR model the covariance matrix is of the form:
$$\mathbf{C} = (\mathbf{I} - \rho\mathbf{W})^{-1}\mathbf{M}$$
and if the conditional variances of y are assumed constant this simplifies to:
$$\mathbf{C} = \sigma^2(\mathbf{I} - \rho\mathbf{W})^{-1}$$
Requirements on the specification of the weighting matrix, W, and conditional variance matrix, M, include: (i) M is an n by n diagonal matrix with $m_{ii}>0$; (ii) to ensure symmetry of the variance-covariance matrix, $w_{ij}m_{jj}=w_{ji}m_{ii}$; and (iii) $0<\rho<\rho_{max}$ (typically), where $\rho_{max}$ is the reciprocal of the largest eigenvalue of $M^{-1/2}WM^{1/2}$.
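These requirements can be checked numerically. The sketch below assumes a covariance matrix of the form (I − ρW)⁻¹M, tests the symmetry condition wij mjj = wji mii, and derives the admissible range for ρ as the reciprocals of the smallest and largest eigenvalues of M^(-1/2) W M^(1/2). The function names are hypothetical, and W is assumed to have a zero diagonal and at least one non-zero weight.

```python
import numpy as np

def check_car_spec(W, m):
    """Validate W and the diagonal m of M; return (rho_min, rho_max)."""
    W = np.asarray(W, dtype=float)
    m = np.asarray(m, dtype=float)
    if np.any(m <= 0):
        raise ValueError("all conditional variances m_ii must be positive")
    WM = W * m[None, :]            # element [i, j] is w_ij * m_jj
    if not np.allclose(WM, WM.T):
        raise ValueError("symmetry condition w_ij m_jj = w_ji m_ii fails")
    # M^{-1/2} W M^{1/2}; symmetric whenever the condition above holds.
    A = W * np.sqrt(m)[None, :] / np.sqrt(m)[:, None]
    eig = np.linalg.eigvalsh((A + A.T) / 2.0)
    return 1.0 / eig.min(), 1.0 / eig.max()

def car_covariance(W, m, rho):
    """Covariance matrix (I - rho W)^{-1} M for a CAR specification."""
    W = np.asarray(W, dtype=float)
    m = np.asarray(m, dtype=float)
    return np.linalg.solve(np.eye(len(m)) - rho * W, np.diag(m))

# Hypothetical example: four areas in a row, constant conditional variance.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
m = np.full(4, 2.0)

rho_min, rho_max = check_car_spec(W, m)
sigma_ok = car_covariance(W, m, rho=0.5)    # rho inside the admissible range
sigma_bad = car_covariance(W, m, rho=0.7)   # rho above rho_max
```

With ρ inside the range the resulting matrix is symmetric and positive definite; with ρ above ρmax it has a negative eigenvalue and is no longer a valid covariance matrix.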
A range of CAR models is supported by the GeoBUGS extension to the WinBUGS package. This software is specifically designed to support Bayesian rather than frequentist statistical modelling, and uses computationally intensive techniques (Markov Chain Monte Carlo, or MCMC, simulation with Gibbs sampling) to obtain the fitted parameter estimates and their interval estimates (credible intervals). Haining (2003) discusses the use of such Bayesian models, in which additional (prior) information (for example, national or regional crime survey data) is used to strengthen the modelling process and reduce bias in local estimates. The Bayesian approach treats the unknown parameters (e.g. the vector β) as a set of random variables, just like the data, to which prior distributions may be assigned. These priors are then combined with the likelihood of the observed data to obtain posterior distributions for the parameters, from which inferential analysis proceeds. Essentially this provides a broader range of modelling approaches than pure (classical) frequentist analysis, and has been shown to result in substantial improvements over using simple rate data such as standardised mortality ratios (SMRs). See, for example, Yasui et al. (2000) for a fuller discussion of this question.
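The way a prior and a likelihood combine can be illustrated with a deliberately simple, non-spatial stand-in for the CAR models discussed here. The sketch assumes observed counts that are Poisson with mean Ei θi and a Gamma(a, b) prior on each relative risk θi, for which the posterior mean is (a + Oi)/(b + Ei). It is not the model fitted by WinBUGS or by any of the cited authors, and the counts and prior parameters are invented for illustration.

```python
import numpy as np

def smr(observed, expected):
    """Raw standardised mortality ratio O_i / E_i for each area."""
    return np.asarray(observed, dtype=float) / np.asarray(expected, dtype=float)

def poisson_gamma_posterior_mean(observed, expected, a, b):
    """Posterior mean of theta_i under O_i ~ Poisson(E_i * theta_i) and a
    Gamma(shape=a, rate=b) prior: (a + O_i) / (b + E_i)."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return (a + observed) / (b + expected)

# Hypothetical counts for three areas; the prior has mean a / b = 1.
observed = np.array([0.0, 3.0, 12.0])
expected = np.array([1.5, 2.0, 10.0])

raw = smr(observed, expected)
smoothed = poisson_gamma_posterior_mean(observed, expected, a=2.0, b=2.0)
```

Areas with small expected counts are pulled most strongly towards the prior mean, which is the sense in which such estimates are more stable than raw SMRs. A CAR prior replaces the common prior mean with one informed by neighbouring areas.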
In the so-called proper CAR model (WinBUGS function car.proper) the variance-covariance matrix is positive definite. The example values given in the WinBUGS manual for M and W based on expected counts, Ei, are of the form:
$m_{ii} = 1/E_i$
$w_{ij} = (E_j/E_i)^{1/2}$ for neighbouring areas i, j, or
$w_{ij} = 0$ otherwise
This particular example relates to the Sudden Infant Death Syndrome (SIDS) data described in Section 4.3.3 and in Cressie and Chan (1989) and more recently revisited by Berke (2004). Here the definition of neighbouring area was not based on adjacency but on distance between county seats (d<30 miles), a value determined from an examination of an experimental variogram (an estimate of a variogram based on sample data). The specific model applied in this case was actually of the form:
$$w_{ij} = C(k)\,\{d_{ij}^{-k}\}\,(E_j/E_i)^{1/2}$$
where the term in curly brackets is a distance decay function, with k selected as 0, 1 or 2, and C(k) is a constant of proportionality to ensure results are easily compared across different values of k. In this study the authors chose k=1 as this provided the best results when considered from a likelihood perspective, hence their weights were of the form:
$$w_{ij} = C(1)\,d_{ij}^{-1}\,(E_j/E_i)^{1/2}$$
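A weighting scheme of this kind can be sketched as follows. The code assumes weights of the form C(k) dij^(-k) (Ej/Ei)^(1/2) for pairs of areas closer than a distance threshold, with mii = 1/Ei. The coordinates, expected counts and the default C(k) = 1 are hypothetical, and coincident points are treated as non-neighbours.

```python
import numpy as np

def distance_decay_car_weights(coords, E, k=1, max_dist=30.0, C_k=1.0):
    """Return (W, M) for a proper CAR model with distance-decay weights.

    Assumed form: w_ij = C_k * d_ij**(-k) * sqrt(E_j / E_i) when
    0 < d_ij < max_dist, and 0 otherwise; M = diag(1 / E_i).
    """
    coords = np.asarray(coords, dtype=float)
    E = np.asarray(E, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))        # pairwise distances
    neighbours = (d > 0) & (d < max_dist)
    ratio = np.sqrt(E[None, :] / E[:, None])    # element [i, j] is sqrt(E_j / E_i)
    W = np.zeros_like(d)
    W[neighbours] = C_k * d[neighbours] ** (-float(k)) * ratio[neighbours]
    return W, np.diag(1.0 / E)

# Hypothetical county-seat coordinates (miles) and expected counts.
coords = [(0.0, 0.0), (10.0, 0.0), (50.0, 0.0), (10.0, 20.0)]
E = [2.0, 8.0, 4.0, 1.0]

W, M = distance_decay_car_weights(coords, E, k=1, max_dist=30.0)

# Symmetry condition for a CAR model: w_ij * m_jj equals w_ji * m_ii.
WM = W * np.diag(M)[None, :]
is_symmetric = np.allclose(WM, WM.T)
```

Setting k=0 removes the distance decay and reduces the weights to the (Ej/Ei)^(1/2) form given above. In this example the third area has no neighbours within the threshold, so its row of W is entirely zero.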
Edge effects in this model are quite significant, since over a third of counties lie on the State boundary and clearly States do not represent closed systems for many (most) applications.
In this example the ‘proper’ (or autoGaussian) model fitted for this dataset was not applied to the full raw dataset, but to a Freeman-Tukey variance-stabilising square root transform of the data (see Table 1‑4 and Freeman and Tukey, 1950) with Anson County omitted as an outlier. This county is the one picked up as an outlier in Figure 4‑39, the Excess risk rate map for SIDS data.
Cressie and Chan had looked for non-spatial explanatory variables based on population density, percentage urban, number of hospital beds per 100,000 population, median family income and non-white live-birth rate. They then extended their analysis to include spatial patterns, but even after doing so could not adequately explain the observed variations in the data for this period, or for the subsequent 5-year period. It remains the case that the causes of SIDS are not fully understood, but medical research has shown that the placement of children on their backs when sleeping, the use of pacifiers (dummies) and avoidance of overheating all help to reduce the risks involved substantially. It is reasonable to suggest that the spatial variations observed and their changes over time might have been, in part, a reflection of cultural and social factors (such as advice given to mothers by local medical staff). These factors were not explicitly picked up by the non-spatial explanatory variables. Although such factors may be related to race-specific customs, it is likely that the spatial variations observed and modelled may have reflected variations in these advisory and behavioural factors. Certainly it would have warranted a very close examination of such factors in counties with unusually high and low death rates in each time period.
An intrinsic version of the CAR model (IAR or ICAR) is also supported, in which the variance-covariance matrix is not positive definite but positive semi-definite (WinBUGS functions car.normal and the robust variant car.l1). The intrinsic version (applied initially in an image processing context) is based on pairwise differences between the observed values, similar to the computations used in variogram analysis, from which it originates (see Matheron (1973) and Künsch (1987) for a detailed mathematical treatment), and is now a more popular choice of CAR model for many researchers. Intrinsic models are a generalisation of the standard conditional autoregressive models to support certain types of non-stationarity. The example values given in the WinBUGS manual for M and W for the intrinsic CAR model, based on Besag et al. (1991) and Besag and Kooperberg (1995), are of the form:
$m_{ii} = 1/n_i$
where $n_i$ is the number of areas adjacent to i, and
$w_{ij} = 1$ for neighbouring areas
or
$w_{ij} = 0$ otherwise
The use of simple 1/0 weighting schemes for SAR or CAR models is not really appropriate for finite irregular lattices, and frequently a row-adjusted scheme of the form $W^*=\{w_{ij}^*\}$ is used, where $w_{ij}^*=w_{ij}/w_{i.}$ and $w_{i.}$ is the sum of the weights in row i (often written within this field as $w_{ij}/w_{i+}$). Hence the expected conditional means, for example, refer to an average rather than a summation. The symmetry requirement for CAR models cited earlier, i.e. $w_{ij}m_{jj}=w_{ji}m_{ii}$, implies that the conditional variances should be proportional to $1/w_{i+}$.
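The row-adjusted scheme and its link to the intrinsic model can be sketched numerically. The code assumes binary adjacency weights, mii = 1/ni, and a precision matrix of the form M⁻¹(I − W*), which reduces to D − W where D holds the neighbour counts. The function names and the four-area example are hypothetical.

```python
import numpy as np

def row_standardise(W):
    """Divide each row of W by its row sum w_i+ so that rows sum to 1."""
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1)
    if np.any(row_sums == 0):
        raise ValueError("every area needs at least one neighbour")
    return W / row_sums[:, None]

def icar_precision(W):
    """Precision matrix M^{-1} (I - W*) with m_ii = 1 / w_i+ (equals D - W)."""
    W = np.asarray(W, dtype=float)
    n_i = W.sum(axis=1)
    return np.diag(n_i) @ (np.eye(len(n_i)) - row_standardise(W))

# Hypothetical example: four areas in a row, binary adjacency weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

W_star = row_standardise(W)
m = 1.0 / W.sum(axis=1)

# Symmetry condition: w*_ij m_jj equals w*_ji m_ii when m_ii = 1 / w_i+.
WM = W_star * m[None, :]
is_symmetric = np.allclose(WM, WM.T)

Q = icar_precision(W)
x = np.array([1.0, 2.0, 4.0, 7.0])
quad_form = x @ Q @ x

# Sum of squared differences over each neighbouring pair, counted once.
i, j = np.nonzero(np.triu(W, k=1))
pairwise = ((x[i] - x[j]) ** 2).sum()
```

The quadratic form x'Qx equals the sum of squared differences between neighbouring values, and Q has a zero eigenvalue because adding a constant to every area leaves those differences unchanged. This is the sense in which the intrinsic model is semi-definite rather than positive definite.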
Although such weighting schemes are widely used, Wall (2004) has pointed out significant weaknesses in their spatial interpretation when applied in SAR and CAR models of this type. She recommends the use of geostatistical models as an alternative or additional approach, especially when attempting to understand the spatial structure of lattice (zonal) datasets.
Having fitted the chosen CAR or SAR model to the sample data, the residuals may be examined by mapping and/or by using the Moran's I correlogram, I(h), to identify any remaining patterns. If the residuals show little or no spatial pattern, this supports the view that the fitted model provides a good representation of the observed spatial patterns. However, as noted earlier, different models with fundamentally different interpretations may provide equally good fits to the data, hence drawing inferences from such models is difficult. Detailed examinations of the likely processes that apply for the particular dataset under consideration are vital for such analyses.
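A residual check of this kind might be sketched as below. The code computes Moran's I for a single weights matrix as (n/S0)(z'Wz)/(z'z), where z holds the mean-centred residuals and S0 is the sum of all weights, and attaches a simple permutation-based pseudo p-value. The function names and the small examples are hypothetical, and the residuals are assumed not to be constant. Repeating the calculation with weights matrices defined for successive lags h would give the correlogram I(h).

```python
import numpy as np

def morans_i(residuals, W):
    """Moran's I = (n / S0) * (z' W z) / (z' z), z = mean-centred residuals."""
    e = np.asarray(residuals, dtype=float)
    W = np.asarray(W, dtype=float)
    z = e - e.mean()
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

def permutation_p_value(residuals, W, n_perm=999, seed=0):
    """Two-sided pseudo p-value from random relabelling of the residuals."""
    e = np.asarray(residuals, dtype=float)
    rng = np.random.default_rng(seed)
    expected = -1.0 / (len(e) - 1)          # E[I] under no autocorrelation
    observed = morans_i(e, W)
    sims = np.array([morans_i(rng.permutation(e), W) for _ in range(n_perm)])
    extreme = np.sum(np.abs(sims - expected) >= abs(observed - expected))
    return (extreme + 1) / (n_perm + 1)

# Hypothetical example: four areas in a row, binary adjacency weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

i_alternating = morans_i([1.0, -1.0, 1.0, -1.0], W)   # neighbours disagree
i_clustered = morans_i([1.0, 1.0, -1.0, -1.0], W)     # neighbours mostly agree
p_value = permutation_p_value([1.0, 1.0, -1.0, -1.0], W, n_perm=199)
```

Values of I close to its expectation of −1/(n−1) indicate little remaining spatial pattern in the residuals. With only four areas the permutation test has almost no power and is included here purely to show the mechanics.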
In the examples cited in this subsection, the response variable, y, has been assumed to be continuous. As with GWR, autoregressive models have been developed to handle discrete and binary data, for example autoLogistic and autoPoisson models — see Haining (2003, Chapters 9 and 10) for more details. Haining (2003, p.367 et seq) provides examples of the use of WinBUGS for Bayesian autoregressive modelling of burglaries in Sheffield, UK, by ward (Binomial logistic model) and children excluded from school (Poisson model). He includes sample code and data for these examples, together with maps of the results and provisional interpretations.