Analytical methodologies

As with all scientific disciplines, spatial analysis has its fair share of terminology, abbreviations, methods and tools. In this subsection we start by examining spatial analysis in the broader context of analytical methodologies, and attempt to highlight the similarities with methods applied in a range of disciplines. We then extend the discussion to consider some of the distinctive aspects of spatial and GIS analysis.

Earlier we suggested that the process of spatial analysis often proceeds in a simple sequence from problem specification to outcome. In reality this is an over-simplification — the analytical process is far more complex and iterative than this summary of steps suggests. So, for example, Mitchell (2005) suggests the sequence for spatial data analysis shown in Figure 3‑1, below. Here it appears that there is a natural flow from start to finish, but this is rarely the case. Not only is the process iterative, but at each stage one often looks back to the previous step and re-evaluates the validity of the decisions made. Explicit recognition of this need is incorporated in the approach of Haining (2003, p.359) to “Data-driven statistical modeling”, which is otherwise very similar to Mitchell’s model.

Figure 3‑1 Analytical process — Mitchell

After Mitchell (2005)

The need for a more iterative approach is partly a reflection of adopting scientific methods, but also a recognition that most analytical tasks take place within a much broader context. For example, pragmatic decisions often have to be made based on a series of common questions:

how well defined is the problem I am seeking to address?
how much time, money and resources can I afford to apply to this problem?
what research has previously been carried out on problems of this type, and what strengths and weaknesses has it shown?
who will be the recipients of the results, and what are their expectations and requirements?
if I commence by examining the requirements of the outcome closely, what implications does this have for my selection of techniques and data, and what caveats should I apply to the scope and validity of my analyses — e.g. in relation to sample/region size, timespan, attribute set, validation etc.?
how will I deal with data inadequacies — e.g. missing data, unsuitable data, or delays in receiving or obtaining access to key datasets?
how will I deal with limitations and errors in the software I have chosen to use?
what are the implications of producing wrong or misleading results?
are there independent and verifiable means for validating the results obtained?

This list is by no means exhaustive but serves to illustrate the kind of questioning that should form part of any analytical exercise, in GIS and many other fields. In the context of GIS, readers are recommended to study the “Framework for theory” section of the paper by Steinitz (1993), which proposes a six-step approach to landscape planning with many resonances in our own methodological discussions, below. Also recommended is the technical report by Stratton (2006), “Guidance on spatial wildfire analysis”, which again echoes many of the observations we make in the following subsections.

In supplier-client relationships it is often useful to start with the customer’s requirements and expectations, including the form in which results are to be presented (visualizations, data analyses, reports, public participation exercises etc.). From a clear understanding of the problem and expectations, one can then work back through the process stages, identifying what needs to be included and what might be excluded in order to have the best chance of meeting those expectations and requirements within the timescale and resourcing available. Echoes of these ideas are found in the commentary on cartographic modeling provided with the Idrisi GIS package:

“In developing a cartographic model we find it most useful to begin with the final product and proceed backwards in a step-by-step manner towards the existing data. This process guards against the tendency to let the available data shape the final product.”
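
This backward-chaining idea can be sketched in code. The following is an illustration only, not Idrisi’s own cartographic modeling language: the final product (a suitability grid) is specified first, and each requirement is traced back to the input layers that must exist to support it. All layer names, class codes and thresholds are invented, and numpy is assumed.

    import numpy as np

    # PRODUCT (defined first): suitability, 1 = suitable site, 0 = not
    #   requires: gentle slope AND acceptable land use AND near a road
    #     gentle slope        <- slope raster          (slope < 10 degrees)
    #     acceptable land use <- land-use raster       (class is 3 or 4)
    #     near a road         <- road-distance raster  (distance < 500 m)

    def suitability(slope, landuse, road_dist):
        gentle = slope < 10.0                 # degrees; hypothetical threshold
        usable = np.isin(landuse, [3, 4])     # hypothetical class codes
        close = road_dist < 500.0             # metres; hypothetical threshold
        return (gentle & usable & close).astype(np.uint8)

    # Toy 3 x 3 grids standing in for real rasters
    slope = np.array([[5.0, 12.0, 8.0], [3.0, 9.0, 15.0], [7.0, 4.0, 2.0]])
    landuse = np.array([[3, 3, 1], [4, 2, 4], [3, 4, 4]])
    road_dist = np.array([[100.0, 300.0, 700.0],
                          [450.0, 200.0, 100.0],
                          [600.0, 350.0, 90.0]])
    print(suitability(slope, landuse, road_dist))

Working backwards in this way makes explicit which inputs are actually needed, rather than letting whatever data happens to be available dictate the result.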

An interesting example of the modern analytical process at work is a study of the incidence of childhood cancer in relation to distance from high voltage power lines by Draper et al. (2005). The client in this instance was not a single person or organization, but health policy makers and practitioners, together with academic and medical specialists in the field. With such scrutiny, political sensitivity and implications for the health of many children, great care in defining the problem to be addressed, selecting the data, conducting the analysis and reporting the findings was essential. In this instance some of their initial findings were reported by the press and radio prior to the intended publication date, so the authors took the decision to publish ahead of schedule despite the issues this might raise. The researchers summarized their approach in the form shown in Figure 3‑2:

Figure 3‑2 Analytical process — Draper


After Draper et al. (2005)

Each step of this process is described clearly, qualifying the information and referring to earlier work where appropriate. In addition, the paper clarifies the roles of the various participants in the research, addresses issues relating to competing interests (one of the researchers involved was an employee of the main power grid company), and deals with issues of funding and ethical approval. All of these elements provide transparency and enable “clients” to evaluate the quality and importance of the research. The form of publication (via the British Medical Journal) also facilitated extensive online discussion, with comments on the paper by acknowledged experts from around the world being published electronically, together with the authors’ responses to these comments.

An interesting aspect of this research, and of a significant number of similar studies, is that it deals with rare events. In general it is far more difficult to draw definitive conclusions about such events, but this does not detract from the importance of attempting to carry out research of this kind. What it does highlight is the need to identify and report on the scope of the study, and to identify and highlight any obvious weaknesses in the data or methodology. In the cancer study just cited, for example, the datasets used were details of the location of power grid lines (by type, including any changes that had occurred during the 30+ year study period) together with around 30,000 records from the national cancer registry. No active participation from patients or their families was involved, and homes were not visited to measure actual levels of electromagnetic (EM) radiation. These observations raise important but unstated questions:

how accurate are the grid line datasets and the cancer registry records?
is home address at birth an appropriate measure (i.e. a surrogate for exposure to EM radiation)?
is vertical as well as horizontal proximity to high voltage lines of importance?
is proximity to pylons carrying insulators and junction equipment, rather than just the lines, an important factor?
is voltage variability important?
were the controls similarly located relative to the power lines as the cases, and if not, could this account for some aspects of the results reported?
given that the research findings identified a pattern of increased risk for certain cancers that was not a monotonic function of distance, can this result be explained? are the observations purely chance occurrences, or are there some other factors at work?
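
Several of these questions hinge on how “proximity” is measured. At its simplest it is the nearest planar distance from each address to the line network, grouped into bands. A minimal sketch follows, assuming the shapely library; the coordinates and band boundaries are wholly invented and are not those of the Draper study:

    from shapely.geometry import LineString, Point

    # Hypothetical high-voltage line and birth addresses (planar coords, metres)
    power_line = LineString([(0, 0), (5000, 0)])
    homes = {"case_1": Point(1000, 120), "case_2": Point(2500, 900),
             "control_1": Point(1800, 450), "control_2": Point(4000, 2500)}

    # Illustrative distance bands of the kind used in proximity studies
    bands = {"0-200 m": (0, 200), "200-600 m": (200, 600),
             ">600 m": (600, float("inf"))}

    for name, home in homes.items():
        d = home.distance(power_line)   # horizontal straight-line distance only
        label = next(lbl for lbl, (lo, hi) in bands.items() if lo <= d < hi)
        print(f"{name}: distance {d:.0f} m, band {label}")

Note that a measure of this kind captures horizontal proximity only, and says nothing about vertical clearance, pylons or voltage variability, precisely the kinds of limitation the questions above are probing.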

The latter issue is often described as confounding, i.e. attributing a result to a factor included in the analysis when it is in reality driven by one or more other factors that did not form part of the analysis. For example, in this instance one might also consider that: (i) socio-economic status and lifestyle factors may be associated with proximity to overhead power lines, pylons and transformers; (ii) population densities may vary with distance from power lines; (iii) pre-natal exposure may be important; and (iv) the location of nurseries and playgroups may be relevant. Any of these factors may be important to incorporate in the research. In this instance the authors did analyze some of these points but did not include them in their reporting, whilst other points remain for them to consider further and comment on in a future publication. The “distance from power lines” study is an example of modern scientific research in action; indeed, it has provided an important input to the debate on health and safety, leading to national recommendations affecting future residential building programmes.
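
The standard probe for confounding is to add the candidate factor to the model and see whether the apparent effect survives. The sketch below uses logistic regression via the statsmodels library on synthetic data in which socio-economic status drives both proximity and risk, so proximity appears to matter until it is adjusted for; all variable names and numbers are invented:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(42)
    n = 5000

    # Synthetic confounder: lower socio-economic score makes living near a
    # power line more likely AND (by construction) raises disease risk;
    # proximity itself has NO direct effect on risk here
    ses = rng.normal(0.0, 1.0, n)
    near_line = rng.binomial(1, 1.0 / (1.0 + np.exp(ses)))
    case = rng.binomial(1, 1.0 / (1.0 + np.exp(2.5 + 0.8 * ses)))

    # Unadjusted model: proximity appears to carry risk
    m1 = sm.Logit(case, sm.add_constant(near_line)).fit(disp=0)
    # Adjusted model: with ses included, the proximity coefficient
    # should shrink toward zero
    m2 = sm.Logit(case,
                  sm.add_constant(np.column_stack([near_line, ses]))).fit(disp=0)

    print("unadjusted proximity coefficient:", round(m1.params[1], 3))
    print("adjusted proximity coefficient:  ", round(m2.params[1], 3))

In the power-line study the analogous step would be re-running the proximity analysis with socio-economic and population-density measures included.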

Mackay and Oldford (2000) have produced an excellent discussion of the context and role of (statistical) analysis within the broader framework of scientific research methods: “Scientific method, statistical method and the speed of light”. Their paper studies the work carried out in 1879 by A. A. Michelson, a 24-year-old naval ensign, and his (largely successful) efforts to obtain an accurate estimate of the speed of light. The paper is both an exercise in understanding the role of statistical analysis within scientific research, and a fascinating perspective on a cornerstone of spatial analysis, that of distance determination. They conclude that statistical analysis is defined in large measure by its methodology, in particular its focus on seeking an understanding of a population from sample data. In a similar way, spatial analysis and GIS analysis are defined by the methods they adopt, focusing on the investigation of spatial patterns and relationships as a guide to a broader understanding of spatial processes. As such, spatial analysis is defined both by its material (spatial datasets) and by its methods. These methods span the full range of univariate and multivariate statistical techniques, coupled with a wide range of specifically spatial tools and a broad mix of modeling and optimization techniques.
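
That core statistical move, estimating a population quantity from repeated sample measurements, can be made concrete in a few lines. The measurements below are invented and are not Michelson’s data (he made around 100 such determinations); the pattern, a sample mean with a t-based confidence interval, assuming scipy is available, is the point:

    import numpy as np
    from scipy import stats

    # Invented repeated measurements of a single fixed quantity (km/s)
    x = np.array([299850.0, 299870.0, 299810.0, 299880.0, 299840.0,
                  299860.0, 299830.0, 299890.0, 299820.0, 299855.0])

    mean = x.mean()
    sem = stats.sem(x)   # standard error of the mean
    lo, hi = stats.t.interval(0.95, len(x) - 1, loc=mean, scale=sem)
    print(f"estimate: {mean:.1f} km/s, 95% CI: ({lo:.1f}, {hi:.1f})")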

Mackay and Oldford describe the statistical method in terms of a sequence of steps they label PPDAC: Problem; Plan; Data; Analysis; and Conclusions. This methodology is very similar to those we have already described, although a little broader in scope. In their paper they expand upon each step and relate it to the process adopted by Michelson. This approach emphasizes the place of formal analysis as very much a part of a process, rather than a distinct discipline that can be considered in isolation. Daskin (1995, Ch. 9) makes a very similar case within the context of planning facility locations, seeing this as an iterative process culminating in implementation and ongoing monitoring. Spatial analysis as a whole may be considered to follow very similar lines, although the steps need clarification and some amendment to Mackay and Oldford’s formulation.

A summary of a revised PPDAC approach is shown in Figure 3‑3, below. As can be seen from the diagram, although the clockwise sequence (1→5) applies as the principal flow, each stage may and often will feed back to the previous stage. In addition, it may well be beneficial to examine the process in the reverse direction, starting with Problem definition and then examining expectations as to the format and structure of the Conclusions (without pre-judging the outcomes!). This procedure then continues, step-by-step, in an anti-clockwise manner (e→a), determining the implications of these expectations for each stage of the process. A more detailed commentary on the PPDAC stages, particularly as they relate to problems in spatial analysis, is provided in Section 3.3, Spatial analysis and the PPDAC model. In addition, an example application of the methodology to noise mapping is provided on the accompanying website in the RESOURCES section.

Figure 3‑3 PPDAC as an iterative process
