Geospatial analysis and model building

Geospatial analysis often involves the design and execution of one or more ‘models’. In many instances pre-existing models, or simple variants of these, are used. Such models may be provided within standard GIS packages, offered by third parties as add-ons, or specially developed for specific applications, such as airborne pollution dispersal modeling, flood modeling, wildfire management or traffic forecasting. In other cases models are designed using a number of approaches and tools, and then applied to address the specific problem at hand. The construction of such models follows the well-trodden procedure known as the “build-fit-criticize” cycle. This envisages the designer building a model on paper, on computer, or as a physical analogue, and then fitting the model by evaluating its behavior under various conditions and datasets. By examining the performance of the model (the criticize part of the cycle) in terms of outputs, processing speed, completeness, robustness, efficiency, quality and, ultimately, fitness for purpose, the model can be progressively improved and fine-tuned to meet the requirements.

Elsewhere in this Guide are many examples of specific and generic models that have been developed for use in the geospatial field. These range from simple deterministic models of behavior (e.g. distance decay models), to so-called ‘cartographic modeling’ (essentially map algebra and related operations), through to complex micro-simulation modeling frameworks that may be used for a wide range of bottom-up and dynamic modeling problems. Extended discussion of models and modeling in the context of geospatial analysis has been provided by Harvey (1968, Chs. 10-12), Goodchild et al. (1993) and Longley et al. (2010, Chs. 8 and 16), and readers are recommended to review these discussions before embarking upon spatial model building. In particular, Longley et al. (2010) characterize geospatial modeling, as opposed to analysis, as involving: multiple stages (perhaps representing different points in time); implementing ideas and hypotheses; and experimenting with policy options and scenarios.

Unfortunately, as with many other areas of GIS, the term model has acquired a wide range of meanings. In a number of GIS packages, such as ArcGIS and Idrisi, model building refers specifically to the use of graphical tools to design and execute a series of geoprocessing tasks. This provides both a way of designing and communicating the workflow associated with a task, and a means of implementing that workflow in a convenient manner. A very simple ArcGIS example illustrates this (Figure 3‑5).

Figure 3‑5 Simple GIS graphical model (ESRI ArcGIS)


In this example two items have been dragged to the ArcGIS ModelBuilder design window. The first is a point set file (a shapefile of lung cancer cases from Lancashire, UK, which is an X,Y point set); the second is the geoprocessing function, Kernel Density, which generates two linked items in the graphical window: the processing function itself and the output raster. All that remains is to link the input shapefile to the processing function (identifying the input and flow) and then to run the ‘model’. Each of the graphical entities can be clicked to specify attributes and parameters, and the model saved for future use. When the model is run, an output raster of density data is generated according to the parameters set in the model.

Clearly this is a trivial case — essentially an instance of the simplest general-purpose model structure:

INPUT → PROCESS → OUTPUT

This form of model can readily be developed in a number of ways. For example, in Figure 3‑5 we have a single input, a single process and a single output. Many individual geospatial processing operations require more than one input – for example, the Overlay and Intersect functions (note that the order in which links are established may be important). Likewise, some processing functions will produce multiple outputs.

Even this reflects a very simple model structure. In many real-world workflow situations a whole series of input-process-output steps is involved, with connections between the various outputs and the next stage of inputs. Such models may be structured into a series of sub-models, nested within a broader overall structure designed to address a wider range of problems or a complex task. The process of analyzing a large task and breaking it down into smaller, more manageable components is typical of the systems analysis and business analysis exercises that precede systems design and implementation.
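As a schematic, package-neutral illustration, such a multi-stage workflow amounts to a chain of functions in which each stage’s output becomes the next stage’s input. In the Python sketch below the three stage operations are trivial stand-ins for real geoprocessing steps:

import numpy as np

def reclassify(grid):
    # Sub-model 1: threshold the input to a binary (0/1) grid
    return (grid > grid.mean()).astype(np.uint8)

def dilate(binary):
    # Sub-model 2: crude one-cell 'buffer' by neighborhood dilation
    p = np.pad(binary, 1)
    neighbors = p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:]
    return ((neighbors | binary) > 0).astype(np.uint8)

def workflow(grid):
    binary = reclassify(grid)   # the output of sub-model 1 ...
    zone = dilate(binary)       # ... is the input to sub-model 2
    return grid * zone          # sub-model 3: mask the original grid by the zone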

A very wide range of models can be constructed using this kind of approach, and optionally exported to a high-level programming language or script, such as Python or VBScript. Once in the explicit form of the preferred scripting language the functionality of the model can be examined closely, the ordering of links double-checked, extra functionality added, test datasets used to validate the model and outputs, and the ‘program’ retained for repeated use, perhaps with more complex parameterization of inputs.
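By way of illustration, the Figure 3‑5 model exported to Python might look broadly like the following sketch. This is a hypothetical reconstruction, assuming ArcGIS with the Spatial Analyst extension; the file names and default parameters are illustrative only:

import arcpy
from arcpy.sa import KernelDensity

arcpy.CheckOutExtension("Spatial")          # license the Spatial Analyst tools

in_points = "lung_cancer_cases.shp"         # illustrative input point shapefile
density = KernelDensity(in_points, "NONE")  # "NONE": each point counts equally
density.save("cancer_density.tif")          # the single output density raster

Once in this form, parameters such as cell size and search radius can be inspected and amended directly, and the script re-run against new datasets.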

The type of model described above is relatively simple, in that it flows from input to output in a well-defined manner, using pre-existing processing functionality. This is a static, deterministic model – the result is a specific output which will always be generated in the same manner from a given input. This is ideal for a range of well-defined processing tasks, but is still limited in terms of model functionality.

One way to make high-level models of this type become more like traditional concepts of high-level or ‘top-down’ model-building is to incorporate new, purpose-designed, processing modules within the flow. This could involve entirely new model component design, or the development of components that incorporate pre-existing facilities within the GIS package or from third party toolsets (which may or may not be GIS-package related). Typically such models are designed as block diagrams with flows, rather as with the simple models described earlier, but with a much wider range of operations within and between blocks permitted. In particular, one development of the high-level model framework is to permit iterative behavior, leading to models with a dynamic-like structure. In such models one or more output datasets becomes the input data for the next iteration. A series of iterations are then run, with intermediate outputs providing a time-like profile for the model. Indeed, generating a large number of iterations may produce a series of raster grids which provide a sufficient number of data ‘frames’ to generate a video of the process as it develops.
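In outline, and independent of any particular package, this iterative structure amounts to a simple feedback loop. In the sketch below, step() is a diffusion-like placeholder for whatever purpose-designed processing module the real model would apply at each iteration:

import numpy as np

def step(grid):
    # Placeholder processing module: a simple diffusion-like smoothing
    p = np.pad(grid, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] + grid) / 5.0

def run_model(initial_grid, n_iterations):
    frames = [initial_grid]     # retain intermediate outputs as time-like 'frames'
    grid = initial_grid
    for _ in range(n_iterations):
        grid = step(grid)       # the output of this iteration ...
        frames.append(grid)     # ... becomes the input to the next
    return frames               # e.g. for assembling a video of the process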

Another common development for high-level models is to build in stochastic behavior for some components, thus enabling a range of outcomes to be generated from a single starting set of data. In such models the results are obtained from repeated runs of the model, generating a range of outcomes and a probability ‘envelope’ of results. Non-deterministic models of this type are especially appropriate in forecasting or predictive studies.
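Schematically, and reusing the step() placeholder above, repeated stochastic runs and the resulting envelope might be sketched as follows (the noise term and run counts are purely illustrative):

import numpy as np

rng = np.random.default_rng(1)

def stochastic_step(grid):
    # step() as above, plus a small random perturbation at each cell
    return step(grid) + rng.normal(0.0, 0.01, grid.shape)

def outcome_envelope(initial_grid, n_runs=100, n_iterations=20):
    finals = []
    for _ in range(n_runs):                 # repeated runs of the model
        grid = initial_grid
        for _ in range(n_iterations):
            grid = stochastic_step(grid)
        finals.append(grid)
    finals = np.stack(finals)
    # per-cell 5th-95th percentile range: a simple probability 'envelope'
    return np.percentile(finals, [5, 95], axis=0)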

Figure 3‑6 illustrates the kind of model of this type that can be constructed using Idrisi. In this instance a static model was initially designed to predict areas of likely residential expansion within land use classes designated as ‘forest’. The initial input is the pattern of residential land use in 1991 (RESIDENTIAL91) together with a pre-existing low density residential suitability raster (LDRESSUIT), which is based on factors such as proximity to roads, terrain, etc. Both these input files are grids with matching size (565×452 cells) and resolution. In the model diagram rectangular boxes are grids and processing modules are parallelograms. The details of the model are not especially important here, but one can see that the process flows from top left (inputs) to bottom right (output), and is concluded with an overlay of the original input grid and the model output grid (NEWRESID) to produce a map which highlights the predicted growth.

Figure 3‑6 Dynamic residential growth model (Idrisi)


This static model forms part of the tutorial data distributed with the software, so readers can work through this example themselves if they have access to Idrisi. There are two aspects of this particular model we wish to highlight: the first is the process box marked RANDOM which appears early in the process — this takes as input the LANDUSE91 grid and introduces a small random chance that forested land might convert to residential in any part of the study area; the second aspect is the red line at the bottom of the model, which takes the output (NEWRESID) and feeds it back as the input for the next iteration of the model. This feedback loop provides a form of dynamic modeling, and enables the overall model to be run for a number of iterations, with intermediate growth maps and associated data being optionally retained for closer examination. Clearly these kinds of macro-level models can provide a very powerful framework for certain classes of geospatial analysis, especially when addressing complex recurring processing tasks and when examining alternative scenarios in multi-criteria decision making problems.

GIS data and analytical processes often contribute one part of a larger multi-disciplinary modeling process. Consider, for example, the FlamMap wildfire modeling system. Input to this model includes eight GIS raster layers that describe the ‘fuels’ and topography: elevation, slope, aspect, fuel model, canopy cover, canopy height, crown base height and crown bulk density; these layers are then combined in the model with an Initial Fuel Moistures file, and optionally with a custom Fuel Model and weather and wind files. Within the model, further models are applied in combination: a surface fire model; a crown fire initiation model; a crown fire spread model; and a dead fuel moisture model. This example illustrates how many components make up a typical real-world operational model. In this instance GIS provides key inputs and is also used to map key outputs, which are produced in the form of both raster and vector files for subsequent mapping. These outputs are then utilized in a second stage of spatial analysis, covering issues such as risk and impact assessment (Figure 3‑7), fire management planning, and comparison of modeled minimum travel time (MTT) and travel paths of a fire front with actual data. The FlamMap input and output data used to create Figure 3‑7, together with accompanying metadata and output files, were obtained from ForestERA.

Figure 3‑7 Modeling wildfire risks, Arizona, USA


© Forest Ecosystem Restoration Analysis (ForestERA) project, Northern Arizona University (NAU).

Dynamic modeling may be implemented in many other ways. These range from traditional mathematical models, e.g. using computational fluid dynamics (CFD) to model flow processes; to micro-scale simulation using agent-based modeling (ABM, see further Section 8.1 which provides an Introduction to Geocomputation and Section 8.2 which addresses Geosimulation); and a variety of real-time dynamical systems. Examples of the latter include systems that support interfaces for receiving geospatial data in real-time (such as satellite and mobile field-based georeferenced data), and systems that provide interactive interfaces that allow users to explore and manipulate the physical and parameter space of the models they are using. In this context new forms of interactive and dynamic visualization are of increasing importance, facilitating and supporting the analysis and decision-making process.

Models that generate relatively narrow solution envelopes, and that have been validated against current or past datasets, may provide greater confidence in forecast values than models that produce a wide spread of possible outcomes. On the other hand, such models may give a narrow range of results because they are built to mimic existing data (as occurs with many neural network models) or because the range of possible outcomes has been limited by the model design and parameterization. Models that generate consistently good predictions over time, with a wide range of input data, can reasonably be regarded as more robust and less uncertain, and hence to this degree are ‘better’ than models that have limited predictive success or which have had limited testing with real-world data. It is highly desirable that models that provide predictions (spatial, temporal) include estimates of the uncertainty associated with these predictions, as this helps the end user when making judgments about the results presented, for example in risk assessment situations. For example, the FlamMap output used to create Figure 3‑7 includes a matching uncertainty raster, as does the modeling of zinc concentrations in the soil in Figures 6‑46B and 6‑47B. Ideally, for major areas of long-term concern, models are continually refined and improved in the light of experience, testing, and new and improved sources of data and infrastructure.
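Reusing the repeated-run sketch given earlier, a minimal way to attach such an uncertainty estimate is to retain a per-cell spread measure alongside the per-cell prediction (the choice of standard deviation here is illustrative; real systems may use percentiles or formal confidence intervals):

def predict_with_uncertainty(finals):
    # finals: NumPy array of shape (n_runs, rows, cols), e.g. np.stack of the
    # final grids from the repeated runs sketched above
    prediction = finals.mean(axis=0)     # per-cell forecast raster
    uncertainty = finals.std(axis=0)     # matching per-cell uncertainty raster
    return prediction, uncertainty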

In many types of geospatial model it is normal for the outcomes to vary according to the way in which input parameters are varied and the relative importance, or weight, that is attached to the various model components – these elements can be thought of as the ‘parameter space’ for a given model. For example, the decision as to where to locate a waste-processing plant will depend on a large number of input variables, each of which may have a different weight or set of weights applied: construction costs; air pollution; noise; transport access; proximity to water supplies; proximity to existing and planned residential areas; usage costs; visibility etc. The selection of these weights, typically by specialists in the individual disciplines and/or involving public participation, will influence the range of outcomes that might subsequently be evaluated by the planning team. Although not widely available within GIS packages, such multi-criteria decision making (MCDM) frameworks can be, and have been, formalized (e.g. using AHP, ANP and similar procedures), and are implemented in some GIS packages (such as Idrisi).
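In its simplest (weighted linear combination) form, exploring this parameter space amounts to varying a weight vector over normalized criterion grids. The sketch below uses random grids as stand-ins for real criterion rasters, with purely illustrative layer names and weights:

import numpy as np

def weighted_overlay(layers, weights):
    # Combine criterion grids (normalized to a common 0-1 scale) into a
    # single suitability surface using a weighted linear combination
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # weights normalized to sum to 1
    return sum(wi * layer for wi, layer in zip(w, layers))

# Illustrative usage with random stand-ins for real criterion rasters
rng = np.random.default_rng(0)
cost, noise, access = (rng.random((100, 100)) for _ in range(3))
suitability = weighted_overlay([cost, noise, access], [0.5, 0.2, 0.3])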

It is important to note that such procedures can be extremely heavy users of computer resources, not just processing power but also memory and disk storage, especially where sizable intermediate raster files are generated and retained. This in turn leads to important questions of model design and implementation. For example, it may be effective to use a high-level modeling facility to create the basic structure and program for a particular problem, and then to re-implement all or part of this process by modifying the generated code or implementing the model in a different software environment. This might be a high-resource environment such as a network service or Grid computing facility, or a special-purpose high-performance application. A simple alteration to the storage of intermediate grids would be to store the initial grid plus the changes between successive grids (i.e. incremental differences), since the latter can generally be stored in a far more compact manner.
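The incremental-difference idea can be sketched as follows, assuming integer-coded class grids in which relatively few cells change per iteration, so that only the changed cells need be stored:

import numpy as np

def encode_diff(prev, curr):
    # Record only the cells that changed between successive iterations
    rows, cols = np.nonzero(prev != curr)
    return rows, cols, curr[rows, cols]

def decode_diff(prev, diff):
    # Rebuild the next grid from the previous grid plus the stored changes
    rows, cols, values = diff
    curr = prev.copy()
    curr[rows, cols] = values
    return curr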

An illustration of the value of implementing geospatial models in a different software environment is the widespread use of application-specific modeling tools that accept standard GIS files as input and generate GIS data files as optional output (sometimes referred to as ‘loose coupling’). Many engineering-based models adopt this approach — examples include: groundwater modeling (e.g. the Groundwater Modeling System from EMS‑i); sound and pollution modeling (e.g. the CadnaA noise mapping facilities from DataKustik); telecommunications planning (e.g. the Cellular Expert software from HNIT-Baltic — actually an example of close coupling); and wildfire management (see, for example, the FlamMap fire behavior modeling and mapping system discussed earlier). Researchers and analysts in the GIS field should always investigate work carried out in other disciplines, especially where the problem under investigation has strong application-specific features. In such cases there may well be tools, models, datasets and techniques readily available, or suitable for modification, that can be of enormous assistance.