Cross tabulations and conditional choropleth plots


Mapped zonal data typically consist of a single variable, or a ratio of two variables in which one acts as a normalization factor, for example the ratio

r = persons_in_employment / total_population

Separate maps may be created for each variable or ratio of interest, but these are typically independent entities that may be difficult to compare and interpret. It would be relatively straightforward to create a series of maps of a particular variable of interest, for example the reported rate of lung cancer by health district, where each map shows the rate for areas in which the proportion of smokers is high, medium or low. This is a form of control or “conditioning” on the information shown, and this simple approach can be implemented within any GIS.

The approach can be extended to two (or more) variables by crosstabulating the source data. If the crosstabulation is carried out on categorical data (e.g. sex, racial grouping) then a map may again be generated for each cell in the crosstabulation. With unclassified, continuously varying data, however, it is useful to examine the effect of specific levels of such data on the spatial distribution. Specialized visualization tools have been developed recently to support operations of this type, including CCMaps and extensions to GeoDa (derived from the ideas developed in the CCMaps project). They are known as interactive Conditional Choropleth mapping tools and may be dynamically linked to other visualizations such as box plots, histograms and scatterplots. The aim of such software, in the words of the original authors, is to:

Stimulate analytical reasoning
Detect the unexpected
Discover the unexpected, and
Stimulate hypothesis generation

Figure 5‑12 illustrates this procedure using data on lung cancer mortality rates, by county, for the USA; see Carr et al. (2000, 2002) for a brief description of the method and this particular dataset. There are 9 maps in total. The colored bar at the top shows how the counties have been classified, with, for example, 34% in the blue (low) category, corresponding to 63.7-375 deaths per 100,000. The breakpoints on this scale may be dragged to provide alternative classification levels, the effects of which are dynamically updated in the map windows. Each map row represents one level of the percentage of the population below the USA designated poverty level (right hand slider scale) and each map column represents one level of the recorded annual precipitation. The region in the South-East of the USA (top right in the set of map windows) appears to have a high incidence of lung cancer mortality and a high score for both conditioning variables.
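The allocation of zones to map panels described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the CCMaps or GeoDa implementation: it assumes three levels (low, medium, high) for each conditioning variable, consistent with the nine maps in the figure, and the field names `poverty_pct` and `precip`, the breakpoint values and the sample zones are all hypothetical. Values equal to a breakpoint are placed in the higher class, which is an arbitrary choice.

```python
from bisect import bisect_right


def level(value, breaks):
    """Return 0 (low), 1 (medium) or 2 (high) for a value, given two
    ascending breakpoints. A value equal to a breakpoint falls in the
    higher class (an arbitrary convention for this sketch)."""
    return bisect_right(breaks, value)


def panel_cells(zones, row_breaks, col_breaks):
    """Group zone ids by (row_level, col_level), i.e. by map panel.

    Rows are conditioned on the hypothetical field 'poverty_pct' and
    columns on the hypothetical field 'precip'."""
    cells = {(r, c): [] for r in range(3) for c in range(3)}
    for zone in zones:
        r = level(zone["poverty_pct"], row_breaks)
        c = level(zone["precip"], col_breaks)
        cells[(r, c)].append(zone["id"])
    return cells


# Made-up example data: four zones and two pairs of slider breakpoints.
zones = [
    {"id": "A", "poverty_pct": 8.0, "precip": 20.0},
    {"id": "B", "poverty_pct": 15.0, "precip": 45.0},
    {"id": "C", "poverty_pct": 25.0, "precip": 60.0},
    {"id": "D", "poverty_pct": 22.0, "precip": 55.0},
]
cells = panel_cells(zones, row_breaks=(10.0, 20.0), col_breaks=(30.0, 50.0))
```

Moving a slider corresponds to calling `panel_cells` again with different breakpoints; each non-empty cell then lists the zones that would be highlighted in the corresponding map window.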

The figures in the top right of each mapped window show the weighted mean mortality rate, and the R-squared value in the lower right corner shows the percentage of the overall variability accounted for by these weighted means. By adjusting the two conditioning variable sliders (which are actually a form of box plot) or by using a built-in search facility (described as cognostics), a combination of slider values that maximizes R-squared can be obtained; in this example the maximum is just below 43%. For more details regarding the application of CCmapping and associated linked visualizations (e.g. conditional box-plots and conditional scatterplots) please refer to the CCMaps and/or GeoDa documentation and referenced articles.
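As a rough illustration of the R-squared figure, the sketch below computes the share of weighted variability in a rate that is accounted for by the weighted means of the panels. It assumes that the weights are zone populations and that R-squared is defined as 1 minus the ratio of the within-panel to the total weighted sum of squares; the source does not state the exact formula used by CCMaps, so both are assumptions, and the function names and sample values are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def conditioned_r_squared(rates, weights, cell_ids):
    """Return (R-squared, per-cell weighted means).

    R-squared here is 1 - SS_within / SS_total, with squared deviations
    weighted by the supplied weights (assumed to be populations)."""
    overall = weighted_mean(rates, weights)
    ss_total = sum(w * (y - overall) ** 2 for y, w in zip(rates, weights))

    # Collect the rates and weights that fall in each panel (cell).
    groups = {}
    for y, w, c in zip(rates, weights, cell_ids):
        ys, ws = groups.setdefault(c, ([], []))
        ys.append(y)
        ws.append(w)
    means = {c: weighted_mean(ys, ws) for c, (ys, ws) in groups.items()}

    ss_within = sum(
        w * (y - means[c]) ** 2 for y, w, c in zip(rates, weights, cell_ids)
    )
    return 1.0 - ss_within / ss_total, means


# Made-up example: four zones with equal weights, split between two panels.
r2, means = conditioned_r_squared(
    rates=[10.0, 12.0, 30.0, 32.0],
    weights=[1.0, 1.0, 1.0, 1.0],
    cell_ids=["low", "low", "high", "high"],
)
```

A simple search of the kind described could evaluate this function over a grid of candidate breakpoints and keep the combination with the largest value, although the search strategy actually used by the software is not described here.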

Figure 5‑12 Conditional Choropleth mapping
