Boundaries and zone membership

Navigation:  Building Blocks of Spatial Analysis > Geometric and Related Operations >

Boundaries and zone membership

Previous pageReturn to chapter overviewNext page

The definition and selection of boundaries is often a difficult, but nevertheless essential element in many areas of spatial analysis. One group of problems where this is especially important is in the analysis of point patterns — sets of points in the plane representing cases of a particular disease, or the location of ancient artifacts, or a particular species of plant. If the density of points (frequency of occurrence per unit area) is required then the total area under examination needs to be defined. Definition of this area, A, requires identification of a boundary, although there is now a variety of special procedures (known as kernel density methods) available for computation of densities without pre-defined fixed boundaries (see further, Section 4.3.4, Density, kernels and occupancy).

A further problem relates to the process of modeling the real world with abstracted linear forms (i.e. polylines and polygons). There are a variety of ways of modeling linear forms — mathematical, statistical, fractal, zonal — and each of these models will have implications for analysis. Data recorded in a GIS database, either as vector or grid elements, have already been modeled and to some extent the kinds of analyses that may be carried out are pre-determined. For example, most GIS packages do not provide facilities for analyzing the complexity or “texture” of modeled lines and surfaces, although some, such as Idrisi and Fragstats do support such analyses by examining grid datasets at various scales and/or using variable sized windows. The ArcGIS add-in, Hawth’s Tools/GME provides a line metrics tool for generating measures of estimated fractal dimension and sinuosity of stored polylines. For fractal dimension this software computes, for each polyline:

where n is the number of line segments that make up the polyline, d is the distance between the start and end points of the polyline, and L is the total length of the polyline, i.e. the cumulative length of all polyline segments. Sinuosity is computed as a simple ratio: d/L, for each polyline. In both cases the results are added as new fields to the feature attribute table.

Convex hulls

For point sets in the plane a simple and often useful boundary is the convex hull (Figure 4‑27). This is taken as the convex polygon of smallest area that completely encloses the set (it may also be used for line and polygon sets).

Figure 4‑27 Convex hull of sample point set

clip0063.zoom66

If the data you are working with consist solely of a single point set then the convex hull of this set encloses all the information you have available, even though you may know that the data extend beyond these limits.

The convex hull may also be useful in the analysis of optimum location and networking problems. For example the center of gravity of a set of points in the plane will always lie inside or on the convex hull of the point set, and the sequence of points which comprises the convex hull form part of the solution to certain network routing problems (systematic minimal tours of a set of locations). However, it is important to note that the convex hull of a set of objects (points, lines etc.) is affected markedly by extreme values, so outliers and/or errors can have a substantial impact on the shape, area, perimeter length and centroid of a convex hull. For example, one or two extreme values can substantially increase the area enclosed by the convex hull. Typically, however, convex hulls under-estimate area for many applications and hence will over-estimate densities if computed using an area measure based upon this hull.

By definition the convex hull includes a number of points as part of its boundary. For some problems this may be satisfactory whilst for others it may not be suitable. For example, when analyzing the distribution of points in a study region it may be preferable to minimize the potential distortions to sample statistics by avoiding points that lie on or close to the convex hull. Creating an inner buffer of the convex hull and analyzing the subset of points lying within the core of the region, with or without consideration of points lying in the buffer zone, may be preferable. Alternatively the convex hull may be uniformly increased (buffered) in size, for example by a given percentage of area or by a specified distance such as half the mean distance to the nearest-neighbors of enclosed points.

Non-convex hulls

The computation of convex hulls is a fast and useful facility, but does not necessarily match the requirements of the analyst attempting to define a region enclosing a set of objects. We have already noted, for example, that the method is strongly affected by the location of outliers, and as a result may enclose large regions that are of no interest or relevance to the analysis. Whilst the convex hull of a point set in the plane is unique, any number and variety of non-convex hulls may be generated. Such alternative hulls may meet certain pre-defined application-related criteria. Examples might include: the hull must be a polygon; the polygon must be as near convex as possible; the polygon must have the smallest possible area enclosing the set as possible; the polygon must reflect the density of points in the sample; the polygon must assign each point its own nominal region and include all points.

There are three principal methods for generating non-convex polygonal hulls (NCPHs) that meet specified criteria. These are: (a) expansion; (b) contraction; and (c) density contouring. Some of these methods may be implemented using most GIS packages, but the majority require some programming. TNTMips includes facilities for computing several of the variants described within its Polygon Fitting facility. These algorithms were originally developed to help identify animal “home ranges” but may be utilized in many areas — see for example, Harvey and Barbour (1965), White and Garrott (1990, pp 145-82, “Home range estimation”) and Mitchell (2006).

Case (a) expansion: in this case the non-convex hull is created by assigning each point a surrounding region and then growing this region until all points are covered and a continuous external hull is obtained. This may be achieved in several ways. One method is to compute Voronoi polygons for all the points and use the external boundary of the (finite) set as the NCPH (see for example Figure 4‑33B). This leaves the problem of how to treat the points that make up the convex hull. An alternative non-polygonal approach is to buffer all the points and increase the buffer width until a single merged zone is formed. This process may highlight the need to consider the benefits of using several sub-regions rather than a single region — for example, in cases where many of the observations lie near the region boundary with a large central region containing few observations. Yet another alternative is to overlay the study region with a rectangular grid and define the NCPH with reference to this (for example, requiring that the grid includes every point and a minimum number of cells surrounding every point). Where necessary interstitial gaps in the grid are grown to ensure a contiguous overall region is achieved without necessarily growing the external boundary.

Case (b) contraction: this case involves reduction of the convex hull according to selected rules. The simplest of these is a systematic area minimization procedure. First the area of the convex hull is computed. Next a point on the convex hull is removed and the new convex hull recomputed. The area of this new region is calculated and stored. The process is repeated for all points on the original convex hull and the point that results in the greatest decrease in area is permanently removed. The procedure then iterates until only a pre-defined percentage of the original points remain (e.g. 90%) or the area has been reduced by a given percentage. The result is a convex hull of an optimized subset of the original points. A second procedure, which results in a NCPH, involves a form of shrinkage of the convex hull around the point set, rather like a plastic bag full of small balls having the air removed. In this method the longest linear segment of the convex hull is selected and replaced with two segments connecting the original two end points via an intermediate point that is the closest to the original line — almost the reverse of point-weeding (Section 4.2.4). The process continues until the overall area enclosed has been reduced by a pre-determined amount, or a number of iterations have been reached — 10 is a typical number to use. Hybridization of this method with the preceding method is possible.

Note that in the majority of contraction and expansion methods the result is either a grid or polygon form, with the latter having straight line edges. Procedures also exist in which the restriction of convexity, connectedness and/or to straight edges is relaxed, e.g. forming so-called alpha hulls or alpha shapes where concavities and unconnected sets are permitted. Consider a circle or disc of radius 1/alpha. When alpha is small (close to zero) the disc radius is large (infinite at 0). With a large enough radius the disc can be placed over any given point set such that the set is fully enclosed and precisely two points from the set lie on its boundary (Figure 4‑28A). These are then typically connected by a straight line, and a complete hull can thus be generated. Figure 4‑28B to D illustrate the results obtained by varying alpha, as explained below.

Figure 4‑28 Alpha hulls

clip0064.zoom80

clip0065.zoom80

a. alpha>0

b. alpha>0, increasing positive

clip0066.zoom80

clip0067.zoom80

c. alpha<0

d. alpha<0, increasing negative

In Figure 4‑28B, with larger alpha the disc radius is reduced and only a limited number of circles can be drawn that enclose the point set and have a pair of points from the set on their boundary. The resulting alpha shape excludes many of the original points. Figure 4‑28C and Figure 4‑28D illustrate the shapes formed as alpha becomes increasingly negative. In this case the complement of the positive disc shapes are used. Some have likened this process in 3D to using an ice-cream scoop of a given radius to extract the 3D solid surrounding the immovable point set ― smaller scoops can remove more material until the form becomes increasingly convex and starts to break up.

Specialized tools exist for constructing such hulls in 2D and 3D and are of particular application in visualization of environmental and similar datasets. The images shown in Figure 4‑28 were generated using the interactive Java applet developed by François Bélair whilst at the Computational Geometry Laboratory of McGill University.

Case (c) density contouring: this last of the three cases involves overlaying the study region with a (fine) grid and then computing an estimated point density at each grid point. There are several ways in which the density can be estimated, but the most widely used, within GIS and related packages, is kernel estimation (discussed in some detail in Section 4.3.4, Density, kernels and occupancy). Once a density value has been assigned to each intersection or cell, the region taken for the NPCH may be determined by specifying a threshold density and using the grid directly, or by contouring the grid. The final result may then be used directly or converted to polygonal form, depending on the requirement.

Minimum Bounding Rectangles (MBRs)

One difficulty with convex hulls is that they are slightly awkward shapes to work with, and in many cases it is simpler to deal with a boundary that is a regular shape, such as a circle or rectangle. Rectangles that align with the coordinate system and enclose selected features compactly are known as Minimum Bounding Rectangles, or Minimum Enclosing (or Envelope) Rectangles (MBRs or MERs).

In most instances it is very fast to determine whether the MBR for one feature or set of features is completely contained within a study region or within the MBR of another set of features. Identifying where the center of an MBR lies is also very simple and thus they are useful for first-stage identification of simple overlays and membership.

MBRs and convex hulls provide an indication of the limits to the area over which you can reliably interpolate values and many GIS packages take one or other of these boundaries as the limit for procedures such as interpolation, generation of contour lines or the creation of triangulated irregular networks (TINs). Figure 4‑29 shows the result of interpolating income per capita data to a fine grid using the census map shown earlier in Figure 4‑10 (darker areas have higher per capita income). In this case the ArcGIS Spatial Analyst facility has been used, and as can be seen the interpolation process extends to the MBR of census tract centers and no further.

We can also see, in this example:

the interpolation process replaces census tracts with a grid surface of values, comprising around 250x500 cells, which we have classified into 9 groups for display purposes
the generated surface pattern is closely tied to the designated centroid locations; and
interpolation has occurred for our rogue locations, including those that should have been screened out as being lakes (no data) prior to interpolation, and two small polygons which are classified as land but for which no census data are provided.

Figure 4‑29 Interpolation within “centroid” MBR

clip0068

Data source: US Census Bureau

Another problem with hulls (MBRs, convex polygons or more general hull forms) is that by definition they enclose the points for which you have data, and you may be aware of other points, which could or should be taken into consideration, but which lie outside this region. For example, the nearest tree to one on the boundary may actually be in the adjacent field and not within the pre-defined boundary (Figure 4‑30).

Figure 4‑30 Point locations inside and outside bounding polygon

clip0069.zoom79

MBRs and other simple shapes are provided as options for feature enclosure in many GIS packages and in more specialized toolsets such as Crimestat (which is concerned with point events). Frequently the aim is to select a boundary which is reasonable and meaningful for subsequent analysis, often analyses that involve statistical calculations that rely on knowing the point density in advance. As boundaries are altered, the area, A, is changed and this can have a dramatic influence on the analytical results. This may distort analysis, for example giving poor interpolation results near to the region boundary or under-estimating the average distances between sampled points.

Fuzzy boundaries

A number of GIS packages include special facilities for identifying, selecting and analyzing boundaries. These may be the borders of distinct zones or areas, within which values on one or more attributes are relatively homogeneous, and distinct from those in adjacent areas, or they may be zones of rapid spatial change in more continuously varying attribute data. Amongst the tools available for boundary determination are raster-based GIS products, like Idrisi, and the more specialized family of spatial analysis tools from Biomedware. Indeed Biomedware has a specific product, Boundaryseer, for boundary detection and analysis, using techniques such as overlap and sub-boundary statistical analysis (the pattern analysis software, PASSaGE, provides similar facilities). The problem these products seek to address is that of identifying where and how boundaries between different zones should be drawn and interpreted when the underlying data varies continuously, often without sharp breaks. For example, soil types may be classified by the proportion of sand, clay, organic matter and mineral content. Each of these variables will occur in a mix across a study region and may gradually merge from one recognized type to another.

In order to define boundaries between areas with, for example, different soil types, clear polygon-like boundaries are inappropriate. In such cases a widely used procedure is to allocate all grid cells to a set of k zones or classes based on fuzzy membership rather than binary or 0/1 membership. The idea here is to assign a number between 0 and 1 to indicate how much membership of the zone or class the cell is to be allocated, where 0 means not a member and 0.5 means the cell’s grade of membership is 50% (but this is not a probabilistic measure). If a zone is mapped with a fuzzy boundary, the 50% or 0.5 set of cells is sometime regarded as the equivalent to the boundary or delimiter (cross-over point) in a crisp, discrete model situation. Using this notion, crisp polygon boundaries or identified zone edges in raster modeled data may be replaced with a band based on the degree of zone membership. Alternatively grid datasets may be subjected to fuzzy classification procedures whereby each grid cell acquires a membership value for one of k fuzzy classes, and a series of membership maps are generated, one for each class (see further, the discussion on soft classifiers in Section 4.2.12, Classification and clustering). Subsequent processing on these maps may then be used to locate boundaries.

Several fuzzy membership functions (MFs) have been developed to enable the assignment process to be automated and to reflect expert opinion on the nature and extent of transitional zones. Burrough and McDonnell (1998, Ch.11) provides an excellent discussion of fuzzy sets and their applications in spatial classification and boundary determination problems. The GIS product Idrisi, which utilizes some of Burrough’s work, provides four principal fuzzy MFs:

Sigmoidal or (double) s-shaped functions, which are produced by a combination of linear and cos2() functions in Idrisi’s case, or as an algebraic expression of the form
m=1/(1+a(z‑c)2)

where a is a parameter that varies the width of the function, z is the property being modeled (e.g. proportion of clay) and c is the value of this proportion at the function midpoint (see Figure 4‑31; here membership values of >0.5 are regarded as being definitely members of the set A)

J-shaped functions, which are rather like the sigmoidal MFs but with the rounded top sliced off flat over some distance, x (if x=0 then the two sides of the J meet at a point). The equation used in this case is of the form:
m=1/(1+((z‑a)/(a‑c))2)

Linear functions, which are like the J-shaped function but with linear sides, like the slope of a pitched roof, and are thus simple to calculate and have a fixed and well-defined extent, and

User-defined functions, which are self-explanatory. In most applications MFs are symmetric, although monotonic increasing or decreasing options are provided in Idrisi

An alternative to using MFs as spreading functions is to apply a two-stage process: first, use a (fuzzy) classification procedure to assign a membership value (mik=0.0‑1.0) to each grid cell, i, for each of k classes. This value is based on how similar the cell attributes are to each of k (pre-selected) attribute types or clusters. These may then be separately mapped, as discussed earlier.

Figure 4‑31 Sigmoidal fuzzy membership functions

clip0070

Having generated the membership assignments boundaries are then generated by a second algorithm. Boundaryseer provides three alternative procedures:

Wombling (a family of procedures named after the author, W H Womble, 1951)
Confusion Index (CI) measures, which are almost self-explanatory, and
Classification Entropy (CE) based on information theoretic ideas

Wombling involves examining the gradient of the surface or surfaces under consideration in the immediate neighborhood of each cell or point. Typically, with raster datasets, this process examines the four cells in Rook’s or Bishop’s move position relative to the target cell or point — boundaries are identified based on the rate of change of a linear surface fitted through these four points, with high or sudden rates of change being the most significant. Wombling methods can be applied to vector and image datasets as well as raster data. Wombling edge-detection is provided in a number of other software products, including the SAM and PASSaGE macroecology packages.

The Confusion Index (CI) works on the presumption that if we compute the ratio of the second highest membership value of a cell i, mi2, to the highest, mi1, then any values close to 1 indicate that the cell could realistically be assigned to either class, and hence is likely to be on the boundary between the two classes.

Finally, Classification Entropy, devised by Brown (1998), creates a similar value to the CI measure, again in the range [0,1], but this time using the normalized entropy measure:

where the summation extends over all the k-classes and the measure applies to each of the j cells or locations. This latter kind of measure is used in a number of spatial analysis applications (see further, Section 4.6).

Fuzzy procedures are by no means the only methods for detecting and mapping boundaries. Many other techniques and options exist. For example, the allocation of cells to one of k-classes can be undertaken using purely deterministic (spatially constrained) clustering procedures or probabilistic methods, including Bayesian and belief-based systems. In the latter methods cells are assigned probabilities of membership of the various classes. A fuller description of such methods is provided in the documentation for the Boundaryseer and Idrisi packages. As is apparent from this discussion, boundary detection is not only a complex and subtle process, but is also closely related to general-purpose classification and clustering methods, spatially-constrained clustering methods and techniques applied in image processing.

As noted earlier, boundaries may also be subject to analysis, for example to test hypotheses such as: “is the overlap observed between these two boundaries significant, and if so in what way, or could it have occurred at random?”; “is the form of a boundary unusual or related to those of neighboring areas?” Tools to answer such questions are provided in Boundaryseer but are otherwise not generally available.

Breaklines and natural boundaries

Natural boundaries, such as coastlines, rivers, lake shores, cliffs, man-made obstacles, geological intrusions and fault lines are widespread and are frequently significant in terms of understanding spatial processes. When conducting spatial analysis it is often important to examine datasets for the existence of such features, and to take explicit account of their impact where appropriate. A large number of structures and arrangements may be encountered, and the standard analytical tools provided within your preferred GIS package may not handle these adequately for your purposes. For example, simply masking out regions, such as lakes, areas of sea, regions where the underlying geology is not igneous etc., is often misleading, if all this procedure does is to hide those areas which are covered. Ideally the analytical procedure applied should be able to take account of the breakline or natural boundary in a meaningful manner. In the case of breaklines there are three different types that tend to be supported, principally in connection with surface analysis and modeling (see further, Chapter 6, Surface and Field Analysis).

Terminology usage varies here, so we shall use the form provided within ArcGIS 3D Analyst for convenience:

Case 1: hard breaklines, which are well-defined surface structures, whose shape and heights are known and fixed, and which mark a significant change of surface continuity — for example, indicating a road, a stream, a dam or a major change of slope which has been surveyed along its length (e.g. a ridge). Such breaklines are usually specified as a set of triples {x,y,z} and intermediate values are usually determined by simple linear interpolation
Case 2: soft breaklines are similar to hard breaklines, but do not imply a change in surface continuity. They indicate that the known values along the breakline should be maintained, for example when creating a TIN of the surface, but no sharp change in the continuity of the underlying surface is implied
Case 3: faults are more complex, in that they often have a spatial extent due to displacement and not just a simple linear form

The surface analysis package, Surfer, models breaklines as 3D polylines and faults as 2D polylines or polygons, reflecting their variable form. ArcGIS models breaklines as 3D polylines and faults as 2D polylines. Software packages treat these types of boundary in very different ways. Cases (1) and (2) take into account all data points provided, although in Case (1) z-values computed for the linear breakline take precedence over points that may lie on the other side of the line from the point being analyzed. Case (3) is quite different, since fault lines and zones are treated as barriers. In ArcGIS data values beyond a barrier are not included in computations, where the notion of beyond is defined by visibility from the source point; in Surfer linear barriers are treated as obstacles, which may be skirted around (involving considerable additional distances and extra processing), whilst polygonal faults are treated more like self-contained regions, with interpolation inside and outside the polygonal regions being treated separately.