Modeling dynamic processes within GIS


Geocomputation is a cutting-edge research area within the field of GIS and geospatial analysis. For this reason it is strongly influenced by recent developments in programming, data processing and interface design. Nowhere is this more apparent than in the modeling of dynamic processes.

Current commercial and public domain GIS software systems all contain numerous tools for acquiring, pre-processing, and transforming data. Their use in modeling includes data management, format conversion, projection change, resampling, and raster-vector conversion. GIS also include excellent tools for visualization/mapping, rendering, querying, and analyzing model results, as well as assessing the accuracies and uncertainties associated with inputs and outputs.

Typically, all of the capabilities described above are accessible via end-user graphical and command line interfaces. However, these capabilities have recently become accessible through Application Programming Interfaces (APIs), via software libraries. The exposure of GIS APIs is a significant recent improvement, as external programmers now have access to the underlying software components or ‘objects’ upon which GIS software vendors base their systems. This is perhaps the most pertinent enhancement, as many of the techniques used in GIS analysis are more robust if they can be linked with an extensive toolkit of methods for simulation (see further, below).

Recently in GIS there has also been a move to use industry-standard programming languages (e.g. Java, C++, and Visual Basic) and scripting languages (e.g. Python, VBScript, and JScript) rather than proprietary, home-grown scripting languages (e.g. ESRI’s Arc Macro Language, AML, or Avenue). Interoperability standards such as the Microsoft .NET framework facilitate this process by allowing compliant packages to be called from the same script. In addition to scripts, graphical flowcharts can be used to express sequences of operations that define a model (see further, Section 3.4, Geospatial analysis and model building). One of the first graphic platforms for conceptualizing and implementing spatial models was the ERDAS software, which allows the user to build complex modeling sequences from primitive operations. ESRI’s ArcGIS and Clark Labs’ Idrisi are other examples of GIS products that allow models to be authored and executed in a graphical environment. In principle, graphic-model building can be used for dynamic modeling via an iterative process, where the output of one time step becomes the input for the next. However, this method poses two problems:

the GIS will not have been designed for an iterative process, requiring the user to re-enter the data at the beginning of each time step, and
the time required to run a model can be considerable

The former of these problems can be overcome with scripting languages; both can potentially be overcome by integrating the GIS with a simulation/modeling (s/m) system better equipped for the task at hand. Before exploring the possibilities of linking GIS and s/m systems, the following subsection evaluates the capability of GIS to handle space-time information, which computer simulations generate in volume.
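As a minimal sketch of how a script removes the need to re-enter data at each time step, the loop below feeds the output grid of one step back in as the input to the next. The grid values, the `diffuse_step` update rule, and the `rate` parameter are illustrative assumptions and do not correspond to any particular GIS package.

```python
from typing import List

Grid = List[List[float]]


def diffuse_step(grid: Grid, rate: float = 0.2) -> Grid:
    """Return a new grid in which each cell moves toward the mean of its
    4-connected neighbours (a hypothetical update rule for illustration)."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [row[:] for row in grid]  # copy, so the input is left untouched
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                grid[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            if not neighbours:  # a 1 x 1 grid has nothing to average
                continue
            mean = sum(neighbours) / len(neighbours)
            new_grid[r][c] = grid[r][c] + rate * (mean - grid[r][c])
    return new_grid


def run_model(initial: Grid, steps: int) -> List[Grid]:
    """Iterate the model, keeping every time step as a snapshot."""
    snapshots = [initial]
    state = initial
    for _ in range(steps):
        state = diffuse_step(state)  # the output of this step ...
        snapshots.append(state)      # ... is the input to the next one
    return snapshots


initial_grid = [
    [0.0, 0.0, 0.0],
    [0.0, 9.0, 0.0],
    [0.0, 0.0, 0.0],
]
snapshots = run_model(initial_grid, steps=5)
```

The script addresses only the first problem (manual re-entry of data); the run time of such a loop still grows with the grid size and the number of steps.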

Representing time and change within GIS

The subject of time within GIS has received a considerable amount of attention. Heywood et al. (2006) comment that, ideally, GIS would be able to represent temporal change using methods that explicitly represent spatial change, as well as different states through time. Furthermore, methods allowing direct manipulation and comparison of simulated or observational data in temporal and spatial dimensions should be catered for. In reality there are two main challenges restricting the integration of time within GIS:

1) continuous data over a period of time are rarely available for an entity or system of interest; and

2) data models and structures able to record, store, and visualize information about an object in different temporal states are still in their infancy

In the context of geocomputation the former challenge is less of a constraint since computer simulation is capable of generating an abundance of data over a continuous period of time, while much progress has been made on the latter issue. The following discussion outlines issues related to the representation of time and change, as well as approaches for incorporating space-time information within GIS. The basic objective of any temporal database is to record change over time, where change can be thought of as an event or collection of events. An event might be a change in state of one or more locations, entities, or both. Changes that might affect an event can be distinguished in terms of their temporal pattern. Peuquet (2005) has suggested four types:

Continuous: events occurring throughout some period of time
Majorative: events occurring most of the time
Sporadic: events occurring some of the time, and
Unique: events that only occur once

The distribution of events within these temporal patterns can also be very complex (e.g. chaotic, cyclic, or steady state). This is complicated further because change may occur at various rates as well (e.g. from sudden to gradual). Hence, duration and frequency are important descriptive characteristics within this taxonomy of temporal patterns.

There are three approaches for capturing space-time information within a GIS:

location-based
time-based, and
entity-based

The only method of viewing a data model within existing GIS as a space-time representation is as a temporal series of spatially-registered ‘snapshots’. Invariably this approach employs a raster data model with only a single information type stored (e.g. elevation, density, precipitation, etc.) for each cell at any one point in time. Information for the entire layer is stored for each time step, regardless of whether change has occurred since the previous step. There are several criticisms of this approach. Firstly, the data volume increases enormously, because redundant information is stored in consecutive snapshots. Secondly, the state of a spatial entity can only be retrieved by querying cells of adjacent snapshots, because information is stored implicitly between each time step. Finally, the exact point when change occurred cannot be determined.

Langran (1992) has proposed a modification of this approach. The temporal-raster (or grid) approach allows multiple values to be stored for each pixel. Each time a pixel changes, the new value is stored together with the time at which the change occurred, which can result in a variable number of records for each cell. Recording the time at which change has occurred allows values to be sorted by time. The most recent value for each cell can therefore be retrieved, which represents the present state of the system. The obvious advantage of this approach is the reduction of redundant data stored for each cell.
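The difference between full snapshots and per-cell change records can be sketched as follows. This is an illustrative structure under our own assumptions, not Langran's implementation: the class name `TemporalRaster`, its methods, and the sample grids are all hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


class TemporalRaster:
    """Per-cell change histories: a value is stored only when it changes."""

    def __init__(self) -> None:
        self._times: Dict[Cell, List[float]] = {}
        self._values: Dict[Cell, List[float]] = {}

    def record(self, cell: Cell, time: float, value: float) -> None:
        """Add a record; times must arrive in increasing order for each cell."""
        times = self._times.setdefault(cell, [])
        values = self._values.setdefault(cell, [])
        if times and time <= times[-1]:
            raise ValueError("records must be added in increasing time order")
        if values and values[-1] == value:
            return  # unchanged since the previous record, so store nothing
        times.append(time)
        values.append(value)

    def value_at(self, cell: Cell, time: float) -> Optional[float]:
        """Value in force at `time`, or None before the first record."""
        times = self._times.get(cell, [])
        i = bisect_right(times, time)
        return self._values[cell][i - 1] if i else None

    def current(self, cell: Cell) -> Optional[float]:
        """Most recent value, i.e. the present state of the cell."""
        values = self._values.get(cell)
        return values[-1] if values else None

    def record_count(self) -> int:
        return sum(len(v) for v in self._values.values())


# Four snapshots of a 2 x 2 layer; only two cell changes occur after t = 0.
snapshot_series = [
    [[1, 1], [1, 1]],  # t = 0
    [[1, 1], [1, 2]],  # t = 1: cell (1, 1) changes
    [[1, 1], [1, 2]],  # t = 2: no change
    [[3, 1], [1, 2]],  # t = 3: cell (0, 0) changes
]
raster = TemporalRaster()
for t, grid in enumerate(snapshot_series):
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            raster.record((r, c), t, value)

snapshot_cells = sum(len(row) for grid in snapshot_series for row in grid)
change_records = raster.record_count()
```

In this toy case the snapshot series stores 16 cell values, while the change-based structure stores 6 records and can still answer "what was the value of this cell at time t?".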

Peuquet and Duan (1995) have proposed a time-based approach to storing space-time information within a GIS, where change is stored as a sequence of events through time. Time is stored in increasing order from an initial point, with the temporal interval correlating to successive events. An event is recorded at the time when the amount of accumulated change is considered significant, or by another domain-specific rule. This type of representation has the advantage of facilitating time-based queries, and the addition of a new event is straightforward as it can simply be added to the end of the timeline. Furthermore, an important feature of any model is its ability to represent alternative versions of the same reality. The concept of representing multiple realities over time is called branching. Branching allows various model simulation runs to be compared, or simulation results to be compared to observed data. The time-based approach facilitates the branching of time in order to represent alternative or parallel sequences of events resulting from specific scenarios, because it is a strictly ordinal representation.
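A time-ordered event list with branching might be sketched as below. This is a simplified illustration rather than the structure published by Peuquet and Duan: the `Event` and `Timeline` classes, their method names, and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


@dataclass
class Event:
    time: float
    changes: Dict[Cell, float]  # cells that changed at this time, with new values


@dataclass
class Timeline:
    """Ordered event list; a branch shares its parent's events up to branch_time."""
    events: List[Event] = field(default_factory=list)
    parent: Optional["Timeline"] = None
    branch_time: float = float("-inf")

    def add_event(self, time: float, changes: Dict[Cell, float]) -> None:
        """New events are simply appended to the end of the timeline."""
        last = self.events[-1].time if self.events else self.branch_time
        if time <= last:
            raise ValueError("events must be appended in increasing time order")
        self.events.append(Event(time, dict(changes)))

    def branch(self, at_time: float) -> "Timeline":
        """Start an alternative sequence of events that diverges after at_time."""
        return Timeline(parent=self, branch_time=at_time)

    def history(self) -> List[Event]:
        """All events on this line, including inherited ones, in time order."""
        inherited: List[Event] = []
        if self.parent is not None:
            inherited = [e for e in self.parent.history() if e.time <= self.branch_time]
        return inherited + self.events

    def state_at(self, time: float) -> Dict[Cell, float]:
        """Replay events up to `time` to reconstruct the cell values."""
        state: Dict[Cell, float] = {}
        for event in self.history():
            if event.time > time:
                break
            state.update(event.changes)
        return state

    def events_between(self, start: float, end: float) -> List[Event]:
        """A time-based query: which events fall in [start, end]?"""
        return [e for e in self.history() if start <= e.time <= end]


baseline = Timeline()
baseline.add_event(0, {(0, 0): 1.0, (0, 1): 1.0})
baseline.add_event(5, {(0, 1): 2.0})
baseline.add_event(9, {(0, 0): 4.0})

scenario = baseline.branch(at_time=5)  # alternative run diverging after t = 5
scenario.add_event(7, {(0, 0): 0.5})
```

Comparing `baseline.state_at(10)` with `scenario.state_at(10)` then corresponds to comparing two simulation runs, or a run with observed data, from a shared starting history.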

Finally, several entity-based space-time models have been proposed. Conceptually these models extend the topological vector approach (e.g. coverage model), tracking changes in the geometry of entities incrementally through time. The amendment vector model was the first of this type, and extended frameworks have been proposed subsequently. Besides maintaining the integrity of entities and their changing topology, these approaches are able to represent asynchronous changes to entity geometries. However, the space-time topology of these vectors becomes increasingly complex as amendments accumulate through time. In addition, aspatial entity attributes can change over time. To record aspatial changes, a separate relational database is often used. However, if change occurs at a different rate between the spatial and aspatial aspects of an entity, maintaining the identity of individual entities becomes difficult, especially when entities split or merge.

Object-oriented data models have transformed the entity-based storage of space-time information within GIS and have become mainstream within commercial GIS (e.g. the geodatabase structure within ArcGIS). They have grown increasingly sophisticated, catering for a powerful modeling environment. The object-oriented data model provides a cohesive representation that allows the identity of objects, as well as complex interrelationships, to be maintained through time. Specifically, temporal and location behavior can be assigned as an attribute of features rather than the space itself, which has the distinct advantage of allowing objects to be updated asynchronously.
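The idea that time belongs to each feature rather than to the layer as a whole can be illustrated with a small sketch. The `TemporalFeature` class, its property names, and the sample coordinates are hypothetical and are not drawn from the geodatabase model or any other product.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class TemporalFeature:
    """A feature with a persistent identity and its own timestamped history."""
    feature_id: str
    _history: Dict[str, List[Tuple[float, Any]]] = field(default_factory=dict)

    def set(self, name: str, time: float, value: Any) -> None:
        """Record a new value for one property (e.g. 'geometry', 'land_use')."""
        versions = self._history.setdefault(name, [])
        if versions and time <= versions[-1][0]:
            raise ValueError("updates to a property must be in increasing time order")
        versions.append((time, value))

    def get(self, name: str, time: float) -> Any:
        """Value of a property at `time`, or None if it was not yet defined."""
        versions = self._history.get(name, [])
        i = bisect_right([t for t, _ in versions], time)
        return versions[i - 1][1] if i else None


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
half = [(0, 0), (10, 0), (10, 5), (0, 5)]

parcel = TemporalFeature("parcel-17")
parcel.set("geometry", 2000, square)
parcel.set("land_use", 2000, "agriculture")
parcel.set("land_use", 2004, "residential")  # attribute changes, geometry does not
parcel.set("geometry", 2009, half)           # geometry changes later, on its own clock

road = TemporalFeature("road-3")
road.set("geometry", 2006, [(0, 12), (10, 12)])  # updated independently of the parcel
```

Because each object carries its own history, spatial and aspatial properties can change at different rates without losing the identity of the feature.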

Despite the advantages of the object-oriented data model, Reitsma and Albrecht (2006) observe that, to date, no data model or data structure allows the representation of processes (i.e. recording a process that has changed the state of an object within a model). Consequently, queries about where a process is occurring at an instant of time cannot be expressed with these current approaches. Notwithstanding, object-oriented data models are the canonical approach to the storage of space-time data generated by agent-based models and their visualization within GIS. Nevertheless, the visualization of agent-based models within GIS is currently limited to a temporal series of snapshots.

Linkage/coupling versus integration/embedding

Models implemented as direct extensions of an underlying GIS, through either graphic model-building or scripts, generally make two assumptions: 1) all operations required by the model are available in the GIS (or in another system called by the model); and 2) the GIS provides sufficient performance to handle the execution of the model. In reality, a GIS will often fail to provide adequate performance, especially with very large datasets and a large number of iterations, because it has not been designed as a simulation/modeling (s/m) engine. This one-size-fits-all approach inherent in GIS is a limiting factor, and attention has therefore focused on linking GIS with s/m systems, through either coupling or integration/embedding.

In situations where GIS and s/m systems already exist (e.g. as commercial products), or the cost of rebuilding the functionality of one system into another is too great, the systems can be coupled. Coupling can be broadly defined as the connection of two stand-alone systems by data transfer. Three types of coupling are distinguishable, although these are only a subset of the much larger fields of enterprise application integration (Linthicum, 2000) and software interoperability (Sondheim et al., 2005). The attributes of each approach from loose to tight/close are described below. Table 8‑1 summarizes the competing objectives of the different coupling approaches.

Loose Coupling: A loose connection usually involves the asynchronous operation of functions within each system, with data exchanged between systems in the form of files. For example, the GIS might be used to prepare inputs, which are then passed to the s/m system, where after execution the results of the model are returned to the GIS for display and analysis. This approach requires the GIS and s/m system to understand the same data format; if no common format is available an additional piece of software will be required to convert formats in both directions. Occasionally, specific new programs must be developed to perform format modifications.
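The file-exchange pattern can be sketched in a few lines. In this illustration both "systems" are stand-in Python functions and CSV is assumed as the common format; the function names and file names are hypothetical, and a real GIS and s/m system would each run as a separate program.

```python
import csv
import tempfile
from pathlib import Path
from typing import List

Grid = List[List[float]]


def write_grid(path: Path, grid: Grid) -> None:
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(grid)


def read_grid(path: Path) -> Grid:
    with path.open(newline="") as f:
        return [[float(v) for v in row] for row in csv.reader(f)]


def gis_prepare_inputs(path: Path) -> None:
    """Stand-in for the GIS exporting a prepared input layer."""
    write_grid(path, [[1.0, 2.0], [3.0, 4.0]])


def model_run(input_path: Path, output_path: Path) -> None:
    """Stand-in for the s/m system: read the file, compute, write the results."""
    grid = read_grid(input_path)
    write_grid(output_path, [[v * 2.0 for v in row] for row in grid])


def gis_load_results(path: Path) -> Grid:
    """Stand-in for the GIS importing model output for display and analysis."""
    return read_grid(path)


with tempfile.TemporaryDirectory() as workdir:
    inputs = Path(workdir) / "inputs.csv"
    outputs = Path(workdir) / "outputs.csv"
    gis_prepare_inputs(inputs)           # 1. the GIS writes a file
    model_run(inputs, outputs)           # 2. the model runs on its own
    results = gis_load_results(outputs)  # 3. the GIS reads the results back
```

The two sides share nothing but the files, which is why the approach is quick to set up and easy to debug but slow to execute.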

Moderate Coupling: Essentially this category encapsulates techniques between loose and tight/close coupling. For example, Westervelt (2002) advocates remote procedure calls (RPCs) and shared database access links between the GIS and s/m system, allowing indirect communication between the systems. Inevitably, this reduces the execution speed of the integrated system, and decreases the ability to simultaneously execute components belonging to the different software.
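A shared database link of the kind mentioned above might look like the following sketch, which uses Python's standard `sqlite3` module. The table layout, the function names, and the risk rule are assumptions made for illustration; in practice the GIS and the s/m system would each open their own connection to the same database rather than share one in-memory connection.

```python
import sqlite3
from typing import List, Tuple


def gis_write_inputs(db: sqlite3.Connection) -> None:
    """Stand-in for the GIS publishing cell attributes to the shared database."""
    db.execute(
        "CREATE TABLE cells (id INTEGER PRIMARY KEY, elevation REAL, risk REAL)"
    )
    db.executemany(
        "INSERT INTO cells (id, elevation) VALUES (?, ?)",
        [(1, 2.0), (2, 15.0), (3, 4.5)],
    )
    db.commit()


def model_update_risk(db: sqlite3.Connection, threshold: float = 5.0) -> None:
    """Stand-in for the s/m system writing its results back in place."""
    db.execute(
        "UPDATE cells SET risk = CASE WHEN elevation < ? THEN 1.0 ELSE 0.0 END",
        (threshold,),
    )
    db.commit()


def gis_read_results(db: sqlite3.Connection) -> List[Tuple[int, float]]:
    """Stand-in for the GIS querying the model output for mapping."""
    return db.execute("SELECT id, risk FROM cells ORDER BY id").fetchall()


shared_db = sqlite3.connect(":memory:")
gis_write_inputs(shared_db)
model_update_risk(shared_db)
risk_by_cell = gis_read_results(shared_db)
shared_db.close()
```

Neither system calls the other directly; the database is the intermediary, which is the sense in which communication is indirect.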

Table 8‑1 Agent-based modeling and GIS coupling

| Objective and Explanation | Loose | Moderate | Close/Tight |
|---|---|---|---|
| Integration Speed: The programmer time involved in linking the programs | Fast | Medium | Slow |
| Programmer Expertise: Required level of software development expertise | Low | High | Medium |
| Multiple Authorship Avoidance: In some instances it might be necessary for the programmer to modify the original software product. Any alteration reduces the ownership responsibility. Major alterations could totally sever this link, resulting in limited or no support by the original author(s) | High | Medium | Low |
| Execution Speed: How rapidly does the integrated software execute? | Slow | Medium | Fast |
| Simultaneous Execution: Can components of the system run simultaneously and communicate with one another? Can the components operate on separate platforms? | Low | Low | High |
| Debugging: How difficult is it to locate execution errors in the linked system? | Easy | Moderate | Hard |

Adapted from Westervelt (2002); grayed boxes are considered more desirable characteristics

Tight or Close Coupling: This type of linkage is characterized by the simultaneous operation of systems allowing direct inter-system communication during the program execution. For example, standards such as Microsoft’s COM and .NET frameworks allow a single script to invoke commands from both systems (Ungerer and Goodchild, 2002). A variant of this approach allows inter-system communication by different processes that may be run on one or more networked computers (i.e. distributed processing).

Coupling has often been the preferred approach for linking GIS and s/m systems. However, this has tended to result in very specialized and isolated solutions, which have prevented the standardization of linkage. An alternative to coupling is to embed or to integrate the required functionality of either the GIS or s/m system within the dominant system using its underlying programming language. The final system is either referred to as GIS-centric or modeling-centric depending on which system is dominant. In both instances, the GIS tools or modeling capabilities can be executed by calling functions from the dominant system, usually through a Graphical User Interface (GUI). Compared to coupling, an embedded or integrated system is more likely to appear seamless to a user. However, in the past, integration has been based on existing closed and monolithic GIS and simulation systems, which pose a risk of designing systems which are also closed, monolithic, and therefore costly.

Interest in modeling-centric systems has increased considerably over recent years, predominantly because of the development of s/m toolkits with scripting capabilities that do not require advanced computer programming skills (Gilbert and Bankes, 2002). Often the s/m toolkit can access GIS functions, such as data management and visualization capabilities, from a GIS software library. For example, the Repast toolkit exploits functions from GeoTools (a Java GIS software library) for importing and exporting data, Java Topology Suite (JTS) for data manipulation, and OpenMap for visualization. The toolkit itself maintains the agents and environment (i.e. their attributes), using identity relationships for communication between the different systems. Functions available from GIS software libraries reduce the development time of a model, and are likely to be more efficient because they have been developed over many years with attention to efficiency. Additionally, the use of standard GIS tools for spatial analysis improves the functional transparency of a model, as it makes use of well-known and well-understood algorithms. Alternatively, spatial data management and analysis functions can be developed within the modeling toolkit, although this strategy may impose substantial costs in terms of time: to program the model; to update spatial data; and to use spatial analysis functions within the model.

The GIS-centric approach is an attractive alternative, not least because the large user-base of some GIS expands the potential user-base for the final model. Analogous to the modeling-centric approach, GIS-centric integration can be carried out using software libraries of s/m functions accessed through the GIS interface (see, for example, software tools from Natureserve and SAIC).

Brown et al. (2005) have proposed an alternative approach which straddles both the GIS-centric and modeling-centric frameworks. Rather than providing functionality within one system, this middleware-based approach manages connections between systems, allowing a model to make use of the functionality available within the GIS or the s/m toolkit most appropriate for a given task. Thus, the middleware approach allows the s/m toolkit to handle the identity and relationship of, and between, agents and their environments. The GIS would then manage spatial features as well as temporal and topological relationships of the model. Essentially, the s/m toolkit handles what it is designed for (i.e. implementing the model), while the GIS can be used to run the model and provide visualization of the output. An example of this approach is the ABM extension within ArcGIS (referred to as Agent Analyst), which allows users to create, edit, and run Repast (Python language variant, RepastPy) models from within ArcGIS.
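The division of labor in a middleware arrangement can be sketched as a small dispatcher that routes each task to whichever system provides it. This is a conceptual illustration only and does not reflect how Agent Analyst or the framework of Brown et al. is implemented: the `Middleware` class, the two stand-in functions, and the "crowded" rule are all hypothetical.

```python
import math
from typing import Any, Callable, Dict, List, Tuple


class Middleware:
    """Routes each named task to whichever system registered it."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[..., Any]] = {}
        self._owners: Dict[str, str] = {}

    def register(self, system: str, task: str, handler: Callable[..., Any]) -> None:
        if task in self._handlers:
            raise ValueError(f"task {task!r} is already registered")
        self._handlers[task] = handler
        self._owners[task] = system

    def owner(self, task: str) -> str:
        return self._owners[task]

    def call(self, task: str, *args: Any, **kwargs: Any) -> Any:
        return self._handlers[task](*args, **kwargs)


def gis_neighbours_within(
    positions: Dict[str, Tuple[float, float]], agent_id: str, radius: float
) -> List[str]:
    """Stand-in for a GIS spatial query: ids of other agents within `radius`."""
    x0, y0 = positions[agent_id]
    return sorted(
        other
        for other, (x, y) in positions.items()
        if other != agent_id and math.hypot(x - x0, y - y0) <= radius
    )


def toolkit_update_agent(state: Dict[str, Any], neighbour_count: int) -> Dict[str, Any]:
    """Stand-in for an s/m toolkit rule: 'crowded' means two or more neighbours."""
    return dict(state, crowded=neighbour_count >= 2)


middleware = Middleware()
middleware.register("gis", "neighbours_within", gis_neighbours_within)
middleware.register("toolkit", "update_agent", toolkit_update_agent)

positions = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 1.0), "d": (9.0, 9.0)}
agents = {agent_id: {"crowded": False} for agent_id in positions}

# One model step: spatial relationships come from the GIS side,
# agent state and behaviour from the toolkit side.
for agent_id in list(agents):
    near = middleware.call("neighbours_within", positions, agent_id, 1.5)
    agents[agent_id] = middleware.call("update_agent", agents[agent_id], len(near))
```

Every call passes through the dispatcher, which is convenient for reuse but adds overhead on each interaction between the two systems.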

Commenting on their experience in implementing the Agent Analyst software and dynamic modeling, ESRI scientists Johnston and Maguire (2007) found that a number of important software issues arose with this approach, confirming many of the observations we have made earlier:

how to store and manage many outputs from simulations and scenarios
how time is handled explicitly
synchronizing input time series with different time intervals
tools to analyze the simulation results
metrics to compare and evaluate different scenarios
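One of these issues, synchronizing input time series with different time intervals, can be illustrated with a short sketch. It assumes a "last observation carried forward" rule, which is only one of several possible choices (interpolation or aggregation may be more appropriate for some variables); the function names and sample series are hypothetical.

```python
from bisect import bisect_right
from typing import Any, Dict, List, Sequence, Tuple

Series = List[Tuple[float, Any]]  # (time, value) pairs sorted by time


def value_at(series: Series, time: float) -> Any:
    """Last observed value at or before `time` (None before the first one)."""
    times = [t for t, _ in series]
    i = bisect_right(times, time)
    return series[i - 1][1] if i else None


def synchronize(
    series_by_name: Dict[str, Series], model_times: Sequence[float]
) -> List[Dict[str, Any]]:
    """Build one row of inputs per model time step from all series."""
    return [
        {"time": t, **{name: value_at(s, t) for name, s in series_by_name.items()}}
        for t in model_times
    ]


rainfall = [(0, 0.0), (1, 4.2), (2, 0.5), (3, 0.0), (4, 7.1)]  # one value per day
land_use = [(0, "pasture"), (3, "urban")]                      # irregular updates

rows = synchronize(
    {"rainfall": rainfall, "land_use": land_use},
    model_times=[0, 2, 4],  # the model steps every two days
)
```

Each row then holds a consistent set of inputs for one model time step, regardless of how often each source series was recorded.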

Current experience suggests that researchers interested in developing a geospatial model involving many interacting agents (possibly tens of thousands) with complex behaviors and interactions should consider either GIS-centric or modeling-centric integration rather than a middleware approach. In an integrated system the GIS is embedded in the s/m toolkit, or vice versa, whereas the middleware approach is essentially a form of tight coupling (see above). At present, for many problems, commencing with a modeling-centric approach is likely to yield more effective results with fewer technical problems.