Spatial analysis, GIS and software tools


Our objective in producing this Guide is to be comprehensive in terms of concepts and techniques (but not necessarily exhaustive), representative and independent in terms of software tools, and above all practical in terms of application and implementation. However, we believe that it is no longer appropriate to think of a standard, discipline-specific textbook as capable of satisfying every kind of new user need. Accordingly, an innovative feature of our approach here is the range of formats and channels through which we disseminate the material.

Given the vast range of spatial analysis techniques that have been developed over the past half century, many topics can only be covered to a limited depth, whilst others have been omitted because they are not implemented in current mainstream GIS products. This is a rapidly changing field and increasingly GIS packages include analytical tools as standard built‑in facilities or as optional toolsets, add‑ins or “analysts”. In many instances such facilities are provided by the original software suppliers (commercial vendors or collaborative non‑commercial development teams), whilst in other cases facilities have been developed and are provided by third parties. Many products offer software development kits (SDKs), programming languages and language support, scripting facilities and/or special interfaces for developing one’s own analytical tools or variants.

In addition, a wide variety of web-based or web-deployed tools have become available, enabling datasets to be analyzed and mapped, including dynamic interaction and drill-down capabilities, without the need for local GIS software installation. These include Java applets, Flash-based mapping, AJAX and Web 2.0 applications, and interactive Virtual Globe explorers, some of which are described in this Guide. They illustrate the direction that many toolset and service providers are taking.

Throughout this Guide there are numerous examples of the use of software tools that facilitate geospatial analysis. In addition, some subsections of the Guide and the software section of the accompanying website provide summary information about such tools and links to their suppliers. Commercial software products rarely provide access to source code or full details of the algorithms employed. Typically they provide references to books and articles on which procedures are based, coupled with online help and “white papers” describing their parameters and applications. This means that results produced using one package on a given dataset can rarely be exactly matched to those produced using any other package or through hand‑crafted coding. There are many reasons for these inconsistencies, including: differences in the software architectures of the various packages and the algorithms used to implement individual methods; errors in the source materials or their interpretation; coding errors; inconsistencies arising out of the ways in which different GIS packages model, store and manipulate information; and differing treatments of special cases (e.g. missing values, boundaries, adjacency, obstacles and distance computations).
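As a simple illustration of the last of these sources of inconsistency, the following minimal Python sketch (with invented coordinates; it reflects no specific package's algorithm) shows how the same pair of longitude/latitude points yields quite different results depending on whether distances are computed on a plane or on a sphere. Projected versus ellipsoidal computations introduce further variation of the same kind.

```python
import math

def planar_distance(p, q):
    """Naive Euclidean distance, treating lon/lat degrees as planar x/y."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def haversine_distance(p, q, radius_km=6371.0):
    """Great-circle distance on a sphere, via the haversine formula."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

london, paris = (-0.13, 51.51), (2.35, 48.86)   # (lon, lat), illustrative only
print(planar_distance(london, paris))      # about 3.6 "degrees" -- not a true distance
print(haversine_distance(london, paris))   # roughly 340 km along the great circle
```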

Non‑commercial packages sometimes provide source code and test data for some or all of the analytical functions provided, although it is important to understand that “non‑commercial” often does not mean that users can download the full source code. Source code greatly aids understanding, reproducibility and further development. Such software will often also provide details of known bugs and restrictions associated with functions — although this information may also be provided with commercial products it is generally less transparent. In this respect non‑commercial software may meet the requirements of scientific rigor more fully than many commercial offerings, but is often provided with limited documentation, training tools, cross‑platform testing and/or technical support, and thus is generally more demanding on the users and system administrators. In many instances open source and similar not-for-profit GIS software may also be less generic, focusing on a particular form of spatial representation (e.g. a grid or raster spatial model). Like some commercial software, it may also be designed with particular application areas in mind, such as addressing problems in hydrology or epidemiology.

The process of selecting software tools encourages us to ask: (i) “what is meant by geospatial analysis techniques?” and (ii) “what should we consider to be GIS software?” To some extent the answer to the second question is the simpler, if we are prepared to be guided by self-selection. For our purposes we focus principally on products that claim to provide geographic information systems capabilities, supporting at least 2D mapping (display and output) of raster (grid based) and/or vector (point/line/polygon based) data, with a minimum of basic map manipulation facilities. We concentrate our review on a number of the products most widely used or with the most readily accessible analytical facilities. This leads us beyond the realm of pure GIS. For example: we use examples drawn from packages that do not directly provide mapping facilities (e.g. CrimeStat) but which provide input and/or output in widely used GIS-mappable formats; products that include some mapping facilities but whose primary purpose is spatial or spatio-temporal data exploration and analysis (e.g. GS+, STIS/SpaceStat, GeoDa, PySal); and products that are general- or special-purpose analytical engines incorporating mapping capabilities (e.g. MATLAB with the Mapping Toolbox, WinBUGS with GeoBUGS) — for more details on these and other example software tools, please see the website page:

http://www.spatialanalysisonline.com/software.html

The more difficult of the two questions above is the first — what should be considered as “geospatial analysis”? In conceptual terms, the phrase identifies the subset of techniques that are applicable when, as a minimum, data can be referenced on a two-dimensional frame and relate to terrestrial activities. The results of geospatial analysis will change if the location or extent of the frame changes, or if objects are repositioned within it: if they do not, then “everywhere is nowhere”, location is unimportant, and it is simpler and more appropriate to use conventional, aspatial, techniques.
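To make the point concrete, the short Python sketch below (invented sites and attribute values) contrasts an aspatial statistic, which is unchanged when objects are repositioned, with a simple spatial statistic that responds immediately to relocation:

```python
import math

def mean_nearest_neighbour(points):
    """Average distance from each point to its nearest neighbour (a spatial measure)."""
    return sum(
        min(math.dist(p, q) for q in points if q is not p)
        for p in points
    ) / len(points)

values = [3.0, 7.0, 5.0]                 # attribute values observed at three sites
clustered = [(0, 0), (0, 1), (1, 0)]     # one arrangement of the sites
dispersed = [(0, 0), (0, 10), (10, 0)]   # the same sites, repositioned

print(sum(values) / len(values))          # aspatial mean: 5.0 in both cases
print(mean_nearest_neighbour(clustered))  # 1.0
print(mean_nearest_neighbour(dispersed))  # 10.0 -- the spatial result has changed
```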

Many GIS products apply the term (geo)spatial analysis in a very narrow context. In the case of vector-based GIS this typically means operations such as: map overlay (combining two or more maps or map layers according to predefined rules); simple buffering (identifying regions of a map within a specified distance of one or more features, such as towns, roads or rivers); and similar basic operations. This reflects (and is reflected in) the use of the term spatial analysis within the Open Geospatial Consortium (OGC) “simple feature specifications” (see further Table 4‑2). For raster-based GIS, widely used in the environmental sciences and remote sensing, this typically means a range of actions applied to the grid cells of one or more maps (or images) often involving filtering and/or algebraic operations (map algebra). These techniques involve processing one or more raster layers according to simple rules resulting in a new map layer, for example replacing each cell value with some combination of its neighbors’ values, or computing the sum or difference of specific attribute values for each grid cell in two matching raster datasets. Descriptive statistics, such as cell counts, means, variances, maxima, minima, cumulative values, frequencies and a number of other measures and distance computations are also often included in this generic term “spatial analysis”.
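As a minimal illustration of map algebra, the following NumPy sketch (toy grids; it mimics no particular GIS product's syntax) applies a 3x3 focal mean filter to one raster layer, computes the cell-by-cell difference of two matching layers, and reports simple descriptive statistics:

```python
import numpy as np

def focal_mean(grid):
    """Replace each cell with the mean of its 3x3 neighbourhood.
    Edges are handled here by edge-padding; real packages treat borders
    differently, one of the special cases that causes results to diverge."""
    padded = np.pad(grid, 1, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    return windows.mean(axis=(2, 3))

layer_a = np.array([[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0],
                    [7.0, 8.0, 9.0]])
layer_b = np.ones_like(layer_a)           # a second, matching raster layer

print(focal_mean(layer_a))                # neighbourhood (filtering) operation
print(layer_a - layer_b)                  # cell-by-cell algebraic operation
print(layer_a.mean(), layer_a.max())      # simple descriptive statistics
```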

At this point, however, we have covered only the most basic facilities, albeit those that may be the most frequently used by the greatest number of GIS professionals. To this initial set must be added a large variety of statistical techniques (descriptive, exploratory, explanatory and predictive) that have been designed specifically for spatial and spatio-temporal data. Today such techniques are of great importance in social and political sciences, despite the fact that their origins may often be traced back to problems in the environmental and life sciences, in particular ecology, geology and epidemiology. Note also that spatial statistics is largely an observational science (like astronomy) rather than an experimental science (like agronomy or pharmaceutical research). This aspect of geospatial science has important implications for analysis, particularly the application of a range of statistical methods to spatial problems.
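One widely used example of such a technique is Moran's I, a global measure of spatial autocorrelation. The sketch below uses an invented five-zone adjacency matrix; packages such as PySal (mentioned above) provide tested implementations with a far richer choice of spatial weights:

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I: (n / S0) * (z'Wz) / (z'z), where z holds deviations
    from the mean and S0 is the sum of all spatial weights."""
    z = x - x.mean()
    s0 = w.sum()
    return (len(x) / s0) * (z @ w @ z) / (z @ z)

# Five zones arranged in a chain, each adjacent to its immediate neighbours:
w = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # values trend smoothly along the chain

print(morans_i(x, w))   # 0.5 here: positive spatial autocorrelation
```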

Limiting the definition of geospatial analysis to 2D mapping operations and spatial statistics remains too restrictive for our purposes. There are other very important areas to be considered. These include: surface analysis — in particular analyzing the properties of physical surfaces, such as gradient, aspect and visibility, and analyzing surface-like data “fields”; network analysis — examining the properties of natural and man-made networks in order to understand the behavior of flows within and around such networks; and locational analysis. GIS-based network analysis may be used to address a wide range of practical problems such as route selection and facility location, and problems involving flows such as those found in hydrology. In many instances location problems relate to networks and as such are often best addressed with tools designed for this purpose, but in others existing networks may have little or no relevance or may be impractical to incorporate within the modeling process. Problems that are not specifically network constrained, such as new road or pipeline routing, regional warehouse location, mobile phone mast positioning, pedestrian movement or the selection of rural community health care sites, may be effectively analyzed (at least initially) without reference to existing physical networks. Locational analysis “in the plane” is also applicable where suitable network datasets are not available, or are too large or expensive to be utilized, or where the location algorithm is very complex or involves the examination or simulation of a very large number of alternative configurations.
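As a minimal illustration of the first of these areas, surface analysis, the sketch below derives slope and aspect from a toy elevation grid using simple central differences. Production GIS tools typically use 3x3 estimators such as Horn's method and differ in their aspect conventions, so their results will differ in detail:

```python
import numpy as np

dem = np.array([[10.0, 11.0, 12.0],
                [11.0, 13.0, 14.0],
                [13.0, 15.0, 17.0]])   # toy elevations in metres
cell_size = 30.0                        # assumed grid resolution in metres

# Central-difference gradients; np.gradient returns d/drow then d/dcolumn
dz_dy, dz_dx = np.gradient(dem, cell_size)

slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))   # steepest-slope angle
aspect = np.degrees(np.arctan2(-dz_dx, dz_dy))          # one common convention

print(slope)    # per-cell slope in degrees
print(aspect)   # per-cell aspect; packages differ in convention and method
```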

A further important aspect of geospatial analysis is visualization (or geovisualization) — the use, creation and manipulation of images, maps, diagrams, charts, 3D static and dynamic views, high resolution satellite imagery and digital globes, and their associated tabular datasets (see further Slocum et al. (2008), Dodge et al. (2008), Longley et al. (2010, ch. 13) and the work of the GeoVista project team). For further insights into how some of these developments may be applied, see Andrew Hudson-Smith (2008) “Digital Geography: Geographic visualization for urban environments” and Martin Dodge and Rob Kitchin’s earlier “Atlas of Cyberspace”, which is now available as a free downloadable document.

GIS packages and web-based services increasingly incorporate a range of such tools, providing static or rotating views, draping images over 2.5D surface representations, providing animations and fly-throughs, dynamic linking and brushing and spatio-temporal visualizations. This latter class of tools has been, until recently, the least developed, reflecting in part the limited range of suitable compatible datasets and the limited set of analytical methods available, although this picture is changing rapidly. One recent example is the availability of image time series from NASA’s Earth Observation Satellites, yielding vast quantities of data on a daily basis (e.g. Aqua mission, commenced 2002; Terra mission, commenced 1999).

Geovisualization is the subject of ongoing research by the International Cartographic Association (ICA) Commission on Geovisualization, which has organized a series of workshops and publications addressing developments in geovisualization, notably with a cartographic focus.

As datasets, software tools and processing capabilities develop, 3D geometric and photo-realistic visualization are becoming a sine qua non of modern geospatial systems and services — see Andy Hudson-Smith’s “Digital Urban” blog for a regularly updated commentary on this field. We expect to see an explosion of tools, services and datasets in this area over the coming years — many examples are included as illustrations in this Guide. Other examples readers may wish to explore include: the static and dynamic visualizations at 3DNature and similar sites; the 2D and 3D Atlas of Switzerland; urban 3D modeling programmes such as LandExplorer and CityGML; and the integration of GIS technologies and data with digital globe software, e.g. data from Digital Globe and GeoEye/Satellite Imaging, and Earth-based frameworks such as Google Earth, Microsoft Virtual Earth, NASA Worldwind and Edushi (Chinese). There are also automated translators between GIS packages such as ArcGIS and digital Earth models (see for example Arc2Earth).

These novel visualization tools and facilities augment the core tools utilized in spatial analysis throughout many parts of the analytical process: exploration of data; identification of patterns and relationships; construction of models; dynamic interaction with models; and communication of results — see, for example, the recent work of the city of Portland, Oregon, which has used 3D visualization to communicate the results of zoning, crime analysis and other key local variables to the public. Another example is the 3D visualizations provided as part of the web-accessible London Air Quality network (see example at the front of this Guide). These are designed to enable:

users to visualize air pollution in the areas where they work, live or walk;
transport planners to identify the most polluted parts of London;
urban planners to see how building density affects pollution concentrations in the City and other high-density areas; and
students to understand pollution sources and dispersion characteristics.

Physical 3D models and hybrid physical-digital models are also being developed and applied to practical analysis problems. For example: 3D physical models constructed from plaster, wood, paper and plastics have been used for many years in architectural and engineering planning projects; hybrid sandtables are being used to help firefighters in California visualize the progress of wildfires (see Figure 1‑1A, below); very large sculptured solid terrain models (e.g. see STM) are being used for educational purposes, to assist land use modeling programmes, and to facilitate participatory 3D modeling in less-developed communities (P3DM); and 3D digital printing technology is being used to rapidly generate 3D landscapes and cityscapes from GIS, CAD and/or VRML files with planning, security, architectural, archaeological and geological applications (see Figure 1‑1B, below, and the websites of Z Corporation and Stratasys for more details). Large landscape models are created from multiple individual prints, each typically only around 20cm x 20cm x 5cm, assembled in much the same manner as raster file mosaics.

Figure 1‑1A: 3D Physical GIS models: Sand-in-a-box model, Albuquerque, USA


Figure 1‑1B: 3D Physical GIS models: 3D GIS printing


GIS software, notably in the commercial sphere, is driven primarily by demand and applicability, as manifest in willingness to pay. Hence, to an extent, the facilities available often reflect commercial and resourcing realities (including the development of improvements in processing and display hardware, and the ready availability of high quality datasets) rather than the status of development in geospatial science. Indeed, there may be many capabilities available in software packages that are provided simply because it is extremely easy for the designers and programmers to implement them, especially those employing object-oriented programming and data models. For example, a given operation may be provided for polygonal features in response to a well-understood application requirement, which is then easily enabled for other features (e.g. point sets, polylines) despite the fact that there may be no known or likely requirement for the facility.

Despite this cautionary note, for specific well-defined or core problems, software developers will frequently utilize the most up-to-date research on algorithms in order to improve the quality (accuracy, optimality) and efficiency (speed, memory usage) of their products. For further information on algorithms and data structures, see the online NIST Dictionary of algorithms and data structures.

Furthermore, the quality, variety and efficiency of spatial analysis facilities provide an important discriminator between commercial offerings in an increasingly competitive and open market for software. However, the ready availability of analysis tools does not imply that one product is necessarily better or more complete than another — it is the selection and application of appropriate tools in a manner that is fit for purpose that is important. Guidance documents exist in some disciplines that assist users in this process, e.g. Perry et al. (2002) dealing with ecological data analysis, and to a significant degree we hope that this Guide will assist users from many disciplines in the selection process.