Districting and re-districting

A common analysis problem involves the combination of many small zones (typically stored as polygons) into a smaller number of merged larger zones or districts. This merging process is usually subject to a set of spatial and attribute-related constraints. Spatial constraints might be:

districts must consist of adjacent (coterminous) regions
districts must have sensible shapes, e.g. be reasonably compact

Attribute constraints might be:

no district may have fewer than 100 people
all districts must have a similar number of people (e.g. within a target range)
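Constraints of this kind can be checked mechanically for any candidate plan. The sketch below is a minimal illustration, assuming zones are supplied as an adjacency dictionary (zone to set of neighboring zones) plus a population dictionary. The helper names `is_contiguous` and `check_plan`, the 100-person minimum and the 100 to 130 target range are hypothetical, chosen to mirror the examples above; compactness is not tested here.

```python
from collections import deque


def is_contiguous(zones, adj):
    """True if the zones form a single connected block (breadth-first search)."""
    start = next(iter(zones))
    seen, queue = {start}, deque([start])
    while queue:
        for n in adj[queue.popleft()]:
            if n in zones and n not in seen:
                seen.add(n)
                queue.append(n)
    return seen == zones


def check_plan(assign, adj, pop, min_pop=100, target_range=(100, 130)):
    """Return (district, problem) pairs for a plan mapping zone -> district.

    Hypothetical helper: contiguity is the spatial constraint; min_pop and
    target_range stand in for the attribute constraints.
    """
    districts = {}
    for zone, d in assign.items():
        districts.setdefault(d, set()).add(zone)
    low, high = target_range
    problems = []
    for d in sorted(districts):
        zones = districts[d]
        total = sum(pop[z] for z in zones)
        if not is_contiguous(zones, adj):
            problems.append((d, "not contiguous"))
        if total < min_pop:
            problems.append((d, f"population {total} below minimum {min_pop}"))
        if not low <= total <= high:
            problems.append((d, f"population {total} outside target {low}-{high}"))
    return problems


# Hypothetical example: four zones in a row, 0-1-2-3, with made-up populations
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
pop = {0: 60, 1: 50, 2: 30, 3: 90}
print(check_plan({0: "X", 1: "X", 2: "Y", 3: "Y"}, adj, pop))  # []
print(check_plan({0: "X", 2: "X", 1: "Y", 3: "Y"}, adj, pop))  # 3 problems for X, 2 for Y
```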

This kind of regionalization problem comes in many forms, depending on the precise nature of the problem to be tackled and the constraints applied. For example, problems may seek solutions for which a given attribute exceeds some pre-specified threshold value (see further, Duque et al., 2012), or may seek to define zones that maximize within-region flows, often referred to as functional regions. In general, for non-trivial numbers of source zones, these problems can only be solved using heuristic algorithms. Although these will not produce provably optimal solutions (even if such solutions exist), they may be able to produce good quality solutions (substantial improvements over random or manual solutions) within a reasonable amount of processing time. In some instances solutions can be compared with known optimal solutions, enabling the quality of alternative solution procedures to be assessed formally.
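For very small problems a known optimum can be found by exhaustive search, which is one way heuristic results can be benchmarked. The sketch below is illustrative only. It assumes a hypothetical objective, the sum of squared deviations of district populations from the mean, and a hypothetical 3 x 3 grid of zones. Because it enumerates all M^N labelings of N zones into M districts, it quickly becomes infeasible as N grows, which is exactly why heuristics are needed.

```python
import itertools
from collections import deque


def contiguous(zones, adj):
    """True if the zones form one connected block of the adjacency graph."""
    zones = set(zones)
    start = next(iter(zones))
    seen, queue = {start}, deque([start])
    while queue:
        for n in adj[queue.popleft()]:
            if n in zones and n not in seen:
                seen.add(n)
                queue.append(n)
    return seen == zones


def best_partition(adj, pop, m):
    """Exhaustive search over all m**N labelings (tiny problems only)."""
    zones = sorted(adj)
    target = sum(pop.values()) / m
    best, best_cost = None, float("inf")
    for labels in itertools.product(range(m), repeat=len(zones)):
        groups = [[z for z, lab in zip(zones, labels) if lab == d] for d in range(m)]
        # Every district must be non-empty and contiguous
        if any(not g or not contiguous(g, adj) for g in groups):
            continue
        cost = sum((sum(pop[z] for z in g) - target) ** 2 for g in groups)
        if cost < best_cost:
            best, best_cost = groups, cost
    return best, best_cost


# Hypothetical 3 x 3 grid of zones numbered row by row, with made-up populations
adj = {r * 3 + c: {(r + dr) * 3 + c + dc
                   for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                   if 0 <= r + dr < 3 and 0 <= c + dc < 3}
       for r in range(3) for c in range(3)}
pop = {z: 10 + z for z in adj}
groups, cost = best_partition(adj, pop, 3)
print(groups, cost)
```

A heuristic run on the same grid and objective can then be scored against `cost` to see how close it comes to the optimum.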

Several GIS packages now facilitate the automatic or semi-automatic creation of districts, or the re-organization of existing districts. These operations have many applications, from the designation of service areas to the definition (and re-definition) of census and electoral districts. For example, within Manifold (with the Business Tools extension) there are facilities to enable an existing set of districts to be re-assigned automatically to one of N new districts, such that each new district has a similar (balanced) value by area (an intrinsic attribute) or by a selected attribute (optionally weighted). This automated process may yield unsatisfactory results (for example, very convoluted districts), so an alternative visual interface is provided to enable the user to make manual alterations and see the results in terms of the attribute or attributes of interest. In the case of ArcGIS, Districting is provided as a free downloadable add-in, and is essentially a manual operation supported by map and statistical windows. The CommonGIS package includes a districting facility as part of its integrated Library of Optimization Algorithms for Geographical Information Systems (LOGIS).

Districting and re-districting are generally processes of agglomeration or construction. The initial set of regions is reduced to a smaller set, according to selected rules. Automating this process involves a series of initial allocations, comparison of the results with the constraints and targets, and then re-allocation of selected smaller regions until the targets are met as closely as possible. As with many such problems, it may not be possible to prove that a given solution is the best possible (i.e. is optimal), but only that it is the best that has been found using the procedures adopted.

Creating new districts can be a confusing process. In addition to the kinds of issue we have already discussed, there are two important effects to be aware of: scale (grouping or statistical) effects; and zoning (arrangement) effects.

The first of these issues, scale effects, is best understood through a simple example. Consider the employment statistics shown in Table 4‑4. Areas A and B each contain a total of 100,000 people, who are classified as either employed or unemployed. In Area A, 10% of both Europeans and Asians are unemployed (i.e. equal proportions), and likewise in Area B the proportions are equal (this time 20% unemployed). So we might expect that combining Areas A and B would give us 200,000 people with an equal proportion of Europeans and Asians unemployed (we would guess this to be 15%), but this is not the case: 13.6% of Europeans and 18.3% of Asians are seen to be unemployed! The reason for this unexpected result is that the two groups are distributed very differently between the areas. Of the 140,000 Europeans, 90,000 live in Area A with its 10% rate, whereas 50,000 of the 60,000 Asians live in Area B with its 20% rate, so each combined rate is a differently weighted average of the two area rates. This kind of problem is widespread in spatial datasets and is best addressed by the use of multiple scales of analysis, if possible and appropriate, as part of the initial research process (e.g. using variance/semi-variance analysis techniques, fractal analysis, or creating optimized districts with specified attributes as discussed above).

Table 4‑4 Regional employment data: grouping effects

                Employed (000s)   Unemployed (000s)   Total (000s) (Unemployed %)
Area A
  European            81                  9                 90 (10%)
  Asian                9                  1                 10 (10%)
  Total               90                 10                100 (10%)
Area B
  European            40                 10                 50 (20%)
  Asian               40                 10                 50 (20%)
  Total               80                 20                100 (20%)
A and B
  European           121                 19                140 (13.6%)
  Asian               49                 11                 60 (18.3%)
  Total              170                 30                200 (15%)
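The arithmetic behind Table 4‑4 can be reproduced directly. Combining the areas means summing the counts for each group and recomputing the rate, so each combined rate is weighted by where that group lives. A minimal sketch using the figures from the table (in thousands):

```python
# Employment counts (000s) from Table 4-4: (employed, unemployed)
data = {
    "A": {"European": (81, 9), "Asian": (9, 1)},
    "B": {"European": (40, 10), "Asian": (40, 10)},
}


def unemployment_rate(employed, unemployed):
    """Percentage of the group that is unemployed."""
    return 100 * unemployed / (employed + unemployed)


for area, groups in data.items():
    for group, (emp, unemp) in groups.items():
        print(area, group, f"{unemployment_rate(emp, unemp):.1f}%")  # all 10.0% or 20.0%

# Aggregate A and B by summing counts for each group, then recompute the rate
combined = {}
for groups in data.values():
    for group, (emp, unemp) in groups.items():
        e, u = combined.get(group, (0, 0))
        combined[group] = (e + emp, u + unemp)

for group, (emp, unemp) in combined.items():
    print("A+B", group, f"{unemployment_rate(emp, unemp):.1f}%")
# A+B European 13.6%
# A+B Asian 18.3%
```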

The second issue, zoning effects, arises from the way in which voting and census areas are defined: their shape, and the way in which they are aggregated, affect the results and can even change which party is elected. This is not to say that a particular arrangement will have such effects, but that it is possible to deliberately produce a districting plan that meets specific criteria, at least for a single attribute. This question (with or without reference to scale effects) is often referred to by the awkward name Modifiable Areal Unit Problem, or MAUP, as described by Openshaw (1977, 1984) and more recently discussed in some detail by Wong (2008).

Figure 4‑18 illustrates this issue for an idealized region consisting of 9 small voting districts. The individual zone, row, column and overall totals of voters are shown in diagram A. There are 1420 voters in total, of whom roughly 56% will vote for the red party (R), listed first in each zone, and 44% for the blue party (B), listed second. With 9 voting districts we would expect roughly 5 to be won by the reds and 4 by the blues under a “first past the post” voting system (the party with the majority in a voting district wins that district), as is indeed the case in this example. However, if these zones are not themselves the voting districts, but combinations of adjacent zones are used to define the voting areas, then the results may be quite different. As diagrams B to F show, under a first-past-the-post system the outcome could range from one in which every district is won by the reds (case C) to one in which 75% of the districts are won by the blues (case F). Note that the solutions shown are not unique: several arrangements of adjacent zones will give the same voting results.
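The effect of zone arrangement on first-past-the-post outcomes can be demonstrated in a few lines. The vote counts below are hypothetical and are not the values shown in Figure 4‑18. A 3 x 3 grid of zones, each with 100 voters, is grouped into three contiguous districts, first by rows and then by columns.

```python
# Hypothetical red-party votes in a 3 x 3 grid of zones, 100 voters per zone
red = {(r, c): v for r, row in enumerate([[70, 70, 70],
                                          [45, 45, 45],
                                          [45, 45, 45]])
       for c, v in enumerate(row)}
blue = {zone: 100 - v for zone, v in red.items()}


def seats(districts):
    """Count districts won by each party under first past the post."""
    won = {"R": 0, "B": 0}
    for zones in districts:
        r = sum(red[z] for z in zones)
        b = sum(blue[z] for z in zones)
        if r != b:                      # ties are left undecided
            won["R" if r > b else "B"] += 1
    return won


zones_as_districts = [[z] for z in red]
rows = [[(r, c) for c in range(3)] for r in range(3)]
cols = [[(r, c) for r in range(3)] for c in range(3)]
print(seats(zones_as_districts))  # {'R': 3, 'B': 6}
print(seats(rows))                # {'R': 1, 'B': 2}
print(seats(cols))                # {'R': 3, 'B': 0}
```

The votes never change, yet the reds, with about 53% of the total, win one district or all three depending only on how the zones are grouped.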

Figure 4‑18 Grouping data — Zone arrangement effects on voting results

So it is not just the process of grouping that generates confusing results, but also the pattern of grouping, which is of great interest to those responsible for defining and revising electoral district boundaries. Nor is this problem confined to voting patterns. For example, if the information being gathered relates to the proportions of trace metals (for example lead and zinc) in the soil, similar issues arise: aggregating samples within different field boundaries could show that in some arrangements the proportion of lead exceeds that of zinc, whilst other arrangements show the opposite. Similarly, measures of inter-zonal flows and spatial autocorrelation statistics (see further, Section 5.5.2, Global spatial autocorrelation) are significantly affected by the scale and pattern of zone grouping.

In practice the problem of zone arrangement may not be as significant an issue as it might appear. As Professor Richard Webber, the noted expert on geodemographics, has remarked (private communication): “I have yet to come across any real world example of a conclusion being invalidly reached as a result of this hypothetical possibility”. A wise strategy is to examine the data at various levels of aggregation, including the lowest (finest) level possible, and check that scale and zoning effects are understood and are unlikely to distort interpretations. Automated zoning was used to create the output areas for the UK 2001 Census of Population (see further Martin, 2000, and Openshaw, 1977). The procedure, known as AZP, was based on a 7-step approach, as follows:

Step 1: Start by generating a random zoning system of N small zones into M regions, where M<N.
Step 2: Make a list of the M regions.
Step 3: Select and remove any region K at random from this list.
Step 4: Identify a set of zones bordering on members of region K that could be moved into region K without destroying the internal contiguity of the donor region(s).
Step 5: Randomly select zones from this list until either there is a local improvement in the current value of the objective function or a move that is as good as the current best. Then make the move, update the list of candidate zones, and return to step 4, or else repeat step 5 until the list is exhausted.
Step 6: When the list for region K is exhausted, return to step 3, select another region, and repeat steps 4 to 6.
Step 7: Repeat steps 2 to 6 until no further improving moves are made.
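The seven steps can be sketched in Python as follows. This is a simplified illustration, not Openshaw's implementation. The objective function is assumed to be the sum of squared deviations of region populations from the mean (the UK application also used social homogeneity and shape terms). Zones are given as a hypothetical adjacency dictionary that is assumed to be connected, and the step numbers in the comments refer to the list above.

```python
import random
from collections import deque


def is_connected(zones, adj):
    """True if the set of zones is non-empty and forms one contiguous block."""
    zones = set(zones)
    if not zones:
        return False
    start = next(iter(zones))
    seen, queue = {start}, deque([start])
    while queue:
        for n in adj[queue.popleft()]:
            if n in zones and n not in seen:
                seen.add(n)
                queue.append(n)
    return seen == zones


def random_contiguous_zoning(adj, m, rng):
    """Step 1: grow m contiguous regions outward from m random seed zones."""
    assign = {z: r for r, z in enumerate(rng.sample(sorted(adj), m))}
    while len(assign) < len(adj):
        frontier = [z for z in adj if z not in assign and any(n in assign for n in adj[z])]
        z = rng.choice(frontier)
        assign[z] = assign[rng.choice([n for n in adj[z] if n in assign])]
    return assign


def azp(adj, pop, m, seed=0):
    """Simplified AZP sketch: balance region populations by single-zone moves."""
    rng = random.Random(seed)
    assign = random_contiguous_zoning(adj, m, rng)
    grand = sum(pop.values())
    totals = [0] * m
    for z, r in assign.items():
        totals[r] += pop[z]

    def dev(total):
        # m^2 times the squared deviation from the mean region population,
        # kept in integers so that "equally good" moves compare exactly
        return (m * total - grand) ** 2

    improved = True
    while improved:                                   # Step 7
        improved = False
        regions = list(range(m))                      # Step 2
        rng.shuffle(regions)
        while regions:
            k = regions.pop()                         # Step 3 (list is shuffled)
            while True:
                members = {r: {z for z in assign if assign[z] == r} for r in range(m)}
                # Step 4: zones bordering region k whose removal leaves the
                # donor region contiguous and non-empty
                candidates = [z for z in adj
                              if assign[z] != k
                              and any(assign[n] == k for n in adj[z])
                              and is_connected(members[assign[z]] - {z}, adj)]
                rng.shuffle(candidates)               # Step 5: random order
                moved = False
                for z in candidates:
                    d = assign[z]
                    delta = (dev(totals[d] - pop[z]) + dev(totals[k] + pop[z])
                             - dev(totals[d]) - dev(totals[k]))
                    if delta <= 0:                    # improving or equally good
                        assign[z] = k
                        totals[d] -= pop[z]
                        totals[k] += pop[z]
                        improved = improved or delta < 0
                        moved = True
                        break                         # refresh list: back to Step 4
                if not moved:                         # Step 6: list exhausted
                    break
    return assign, totals


# Hypothetical 4 x 4 grid of zones with made-up populations
side = 4
adj = {r * side + c: {(r + dr) * side + c + dc
                      for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if 0 <= r + dr < side and 0 <= c + dc < side}
       for r in range(side) for c in range(side)}
pop = {z: 10 + (z * 7) % 13 for z in adj}
assign, totals = azp(adj, pop, m=4)
print(totals)
```

Equal-valued moves are accepted, as in step 5, but only strict improvements trigger another full pass, so the loop is guaranteed to stop. The result is a local optimum that depends on the random seed; in practice AZP is typically run several times from different starting zonings.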

In the UK case the initial small zones were taken as unit postcodes, each of which identifies roughly 10-14 addresses. Unfortunately, these individual addresses are stored as a list of georeferenced point data (nominal centers) rather than as postcode areas, so the first stage of the analysis involved generating Voronoi polygons from these points for the whole of the UK. These initial polygons were then merged to form unit postcode areas (Figure 4‑19), based on the generated polygons and road alignments. Once these basic building blocks had been constructed, the automated zoning process could begin. This is illustrated as a series of assignments in Figure 4‑20, starting with an arbitrary assignment of contiguous unit postcode areas (Figure 4‑20a), and then progressively selecting such areas for possible assignment to an adjacent zone (e.g. the pink regions in Figure 4‑20b and Figure 4‑20e). Assignment rules included target population size and a measure of social homogeneity, coupled with a measure of shape compactness and enforced contiguity. This process was then applied to a number of test areas across the UK, as illustrated by the two scenarios, A and B, for part of Manchester shown in Figure 4‑21 (computed zones are shown with colors assigned on a ‘unique values’ basis). These scenarios were as follows:
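Generating Voronoi polygons around address points and then merging them by postcode amounts to labeling every location with the postcode of its nearest address point. The sketch below approximates this on a regular grid of cells instead of computing exact polygon geometry. This raster approach is swapped in for illustration and is not the method used for the UK census. The point coordinates and postcode labels are made up, and the road alignments used in the UK work are ignored.

```python
def postcode_raster(points, width, height):
    """Label each grid cell with the postcode of its nearest address point.

    points: list of (x, y, postcode). Cells sharing a label together
    approximate the union of the Voronoi polygons of that postcode's points.
    """
    grid = []
    for row in range(height):
        labels = []
        for col in range(width):
            cx, cy = col + 0.5, row + 0.5           # cell center
            nearest = min(points, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
            labels.append(nearest[2])
        grid.append(labels)
    return grid


# Hypothetical address points: two share postcode "A", one has postcode "B"
points = [(0.5, 0.5, "A"), (0.5, 1.5, "A"), (3.5, 1.0, "B")]
for labels in postcode_raster(points, width=4, height=2):
    print(" ".join(labels))
# A A B B
# A A B B
```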

A: Threshold population=100; Target population=150; Shape constraint P2A=(perimeter squared)/area; no social homogeneity constraint applied
B: Threshold population=100; Target population=250; Shape constraint P2A=(perimeter squared)/area; social homogeneity constraint = tenure and dwelling type intra-area correlations
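The P2A shape measure used in both scenarios is simple to compute for a polygon. A minimal sketch, assuming each polygon is supplied as an ordered list of (x, y) vertices with no holes: the perimeter is summed edge by edge and the area is obtained with the shoelace formula. Lower values indicate more compact shapes, the measure does not depend on the size of the polygon, and a circle gives the minimum possible value, 4π (about 12.57).

```python
import math


def p2a(vertices):
    """Perimeter squared divided by area for a simple polygon [(x, y), ...]."""
    n = len(vertices)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1          # shoelace formula
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return perimeter ** 2 / (abs(twice_area) / 2)


print(p2a([(0, 0), (1, 0), (1, 1), (0, 1)]))   # 16.0 (unit square)
print(p2a([(0, 0), (4, 0), (4, 1), (0, 1)]))   # 25.0 (elongated 4 x 1 strip)
```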

If scenario B is modified to exclude the shape constraint, the assignment of areas is broadly similar, with detailed variations resulting in less compact final areas. Based on this methodology the Great Britain Ordnance Survey developed a full set of Output Areas (OAs) with a target population of 125 households and a minimum of 40. These OAs were used for the 2001 census, enabling OA data and postcodes to be linked, and providing a set of OAs that can be grouped into higher level units, such as wards and counties, as required. A total of 175,434 OAs were created in this manner, with 37.5% containing 120-129 households, 79.6% containing 110-139 households, and just 5% containing 40-99 households.

Figure 4‑19 Creating postcode polygons

Figure 4‑20 Automated Zone Procedure (AZP)

Figure 4‑21 AZP applied to part of Manchester, UK

A. Target population 150

B. Target population 250
