Length and area for raster datasets


Length and area computations may also be meaningful for grid datasets, and in this case the base calculations are very simple, reflecting the grid cell size or resolution: the number of cells is summed and multiplied by the edge length or the edge length squared, as appropriate. The difficulty lies in defining what constitutes a line or “polygon”. A set of contiguous cells (a clump or patch) with a common attribute value or classification (e.g. deciduous woodland) has an area determined by the number of cells in the clump and the grid resolution. The perimeter length of the clump is a little less obvious. Taking the perimeter as the total length of the cell edges that lie on the external boundary of the clump appears to be the simplest definition, although this will be much longer than a vectorised (smoothed) equivalent measure (typically around 50% longer). Furthermore, depending on the software used, differences in the definition of a clump (for example, whether a clump can include holes, open edges that are not part of the perimeter, or cells whose attributes do not match those of the remaining cells) will result in variations in both area and perimeter calculations.
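The cell-counting calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the method of any particular package. The function name `clump_area_perimeter`, the nested-list grid and the 10-unit cell size are all assumptions made for the example. The sketch treats every cell with the target value as part of one clump, and it counts the edges around an interior hole as perimeter, which is only one of the possible definitions mentioned above.

```python
def clump_area_perimeter(grid, target, cell_size):
    """Return (area, perimeter) for all cells equal to `target`.

    Area is the cell count times cell_size squared. Perimeter counts every
    cell edge that borders a non-target cell or the edge of the grid, so the
    edges around interior holes are included (an assumed definition).
    """
    rows, cols = len(grid), len(grid[0])
    n_cells = 0
    exposed_edges = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != target:
                continue
            n_cells += 1
            # Inspect the four edge-sharing (rook) neighbours.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                outside = not (0 <= rr < rows and 0 <= cc < cols)
                if outside or grid[rr][cc] != target:
                    exposed_edges += 1
    return n_cells * cell_size ** 2, exposed_edges * cell_size


# Hypothetical 5x5 grid: a ring of eight "1" cells around a one-cell hole.
grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
area, perimeter = clump_area_perimeter(grid, target=1, cell_size=10.0)
print(area, perimeter)  # prints: 800.0 160.0
```

Of the 16 exposed edges in this example, 12 lie on the outer boundary and 4 surround the hole, so a definition that ignores holes would report a perimeter of 120 rather than 160.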

Many environmental science datasets (e.g. soil type, land-use, vegetation cover) show a near-continuous spread of attribute values across cells, with few if any distinct sharp breaks. In this case boundaries are less definitive, determined more by common consent or other procedures (see further, Section 4.2.9, Overlay and combination operations) than by clear demarcation of cell values. The determination of areas and perimeters of zones in such cases is less clear, although some progress has been made using the concept of membership functions (MFs). A further issue with gridded data is that the gridding model (or imaging process) results in distortions or constraints on the way in which data are processed. The distortions are primarily of three kinds, relating to orientation, metrics and resolution:

Orientation: with a single rectangular grid, the allocation of source data values to cells and the calculations of lengths and areas will alter if the grid is first rotated; with two or more grids, a common orientation and resolution are essential if data are to be combined in any way. This may mean that source data (e.g. remotely sensed images) will have been manipulated prior to provision of the gridded data to ensure matching is possible, with each source item receiving separate treatment to achieve this end, and/or that data will require resampling during processing. Of course, with multi-band data relating to a single image the orientation and resolution will always match.
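The effect of orientation on a cell-count area can be illustrated with a small sketch. It assumes a centre-point allocation rule (a cell belongs to a shape if the cell centre falls inside it), which is only one of several possible rules. The function name `cells_inside_square` and its parameters are hypothetical. Rotating the shape relative to the grid is used here as the equivalent of rotating the grid.

```python
import math


def cells_inside_square(side, angle_deg, cell_size=1.0, half_extent=6):
    """Count grid cells whose centres fall inside a square of the given side.

    The square is centred on a grid corner (the origin) and rotated by
    angle_deg relative to the grid axes. Centre-point allocation is an
    assumed rule; other rules (e.g. majority area) can give other counts.
    """
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    half = side / 2.0
    count = 0
    for i in range(-half_extent, half_extent):
        for j in range(-half_extent, half_extent):
            # Cell centre in grid coordinates.
            x = (i + 0.5) * cell_size
            y = (j + 0.5) * cell_size
            # The same point expressed in the square's own (rotated) axes.
            u = x * cos_t + y * sin_t
            v = -x * sin_t + y * cos_t
            if abs(u) <= half and abs(v) <= half:
                count += 1
    return count


aligned = cells_inside_square(side=4.0, angle_deg=0.0)
rotated = cells_inside_square(side=4.0, angle_deg=45.0)
print(aligned, rotated)  # prints: 16 12
```

The square has a true area of 16 cell units in both cases, but under this allocation rule the cell count falls from 16 to 12 when the square sits at 45 degrees to the grid.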

Metrics: in order to compute distances for gridded data it is common to add up the number of cells comprising a line or boundary and multiply this number by the cell edge length, E. This is essentially a “rook’s move” calculation, zig-zagging across the grid, although it may be adjusted by also allowing diagonal or “bishop’s moves” (see Figure 4‑2). Diagonal lengths are then taken as E√2.

Figure 4‑2 3x3 grid neighborhood


This 3x3 neighborhood model is used in many areas of geospatial analysis, including distance computations, image processing, cellular automata methods (where the complete 8-cell set that surrounds the central cell is known as the Moore neighborhood; see further, Section 8.1.2, Cellular automata (CA)), surface analysis, and spatial autocorrelation analysis, amongst others.

These methods provide correct values for the distance between cells only in 4 or 8 directions. Distances in all other directions (for example, to the cells highlighted in gray in Figure 4‑3) are overestimated, by up to 41% for the 4-point rook’s move model and by up to about 8% for the 8-point queen’s move model. Each of the squares shown in gray below is 2.24 units from the central cell but would be calculated as either 3 units away (rook’s move) or 2.41 units away (queen’s move). The position of these gray squares is sometimes described as Knight’s move, again by reference to the movement of chess pieces.

Figure 4‑3 5x5 grid neighborhood
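The three distance measures discussed above can be compared directly for any cell offset. The following is a minimal sketch; the function name `grid_distances` and its arguments are assumptions made for illustration. The queen's move distance takes as many diagonal steps (each of length E√2) as possible and covers the remainder with straight steps.

```python
import math


def grid_distances(dx, dy, cell_size=1.0):
    """Return (rook, queen, euclidean) distances for a cell offset (dx, dy)."""
    ax, ay = abs(dx), abs(dy)
    # Rook's move: horizontal and vertical steps only.
    rook = (ax + ay) * cell_size
    # Queen's move: diagonal steps where possible, straight steps for the rest.
    diagonal_steps = min(ax, ay)
    straight_steps = max(ax, ay) - diagonal_steps
    queen = (straight_steps + diagonal_steps * math.sqrt(2)) * cell_size
    # Straight-line distance between cell centres.
    euclid = math.hypot(ax, ay) * cell_size
    return rook, queen, euclid


# Knight's move offset: two cells in one direction, one in the other.
rook, queen, euclid = grid_distances(2, 1)
print(f"rook={rook:.2f} queen={queen:.2f} euclidean={euclid:.2f}")
# prints: rook=3.00 queen=2.41 euclidean=2.24
```

For offsets along a row, a column or an exact diagonal, the queen's move value equals the straight-line value, which is why only the intermediate directions are in error.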


Similar issues arise in computations of cost distance, path alignment and the calculation of surface gradient, aspect and related measures (e.g. material flows). This is not always a question of what is correct or incorrect in some absolute sense, since this will be determined by the application and adequacy or appropriateness of the approach adopted, but it is a significant issue to be aware of.

Resolution: the cell size (and shape) of a grid or image file and the range of attribute values assigned to cells have an important role to play in spatial measures and analysis. Amongst the more obvious effects is that finer resolution images/grids provide a more detailed representation of points, lines and areas, as well as a finer breakdown of associated attribute data. But there are many additional factors to be considered, including the increase in data and processing overhead with finer resolutions. Another factor is the implication that an attribute value assigned to a cell applies throughout the cell, so larger cells imply a greater degree of averaging, including the possibility of missing important variations within the cell. At one extreme, assuming cells are assigned a single attribute value, a grid consisting of a single huge cell will show no variation in attribute values at all, whilst variation between cells will tend to increase as cell sizes are reduced and the number of cells increases. At the other extreme, with cells that are extremely small (and thus very large in number) every cell may be different and contain a unique value or a single observation (presence/absence). Likewise, with attributes showing binary presence/absence all cells will show either presence or absence, whilst with continuous variables (e.g. terrain height, soil moisture levels) entries may be continuous (e.g. any real positive number) or categorized (e.g. coded as 0-255 in an 8-bit attribute coding). Cell entries may also be counts of events, in which case the earlier comments on cell size have a direct bearing: with one large cell all N observations will fall in the single grid cell, whilst with smaller cell sizes there will be a range of values 0 ≤ n ≤ N, until with very small cell sizes the only values for n will be 0 and 1, unless events are precisely co-located.
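The dependence of event counts on cell size can be shown with a short sketch. The six point coordinates below are invented purely for illustration, and the function name `count_events` is hypothetical. Only occupied cells are stored, so cells with n = 0 are implied rather than listed.

```python
from collections import Counter


def count_events(points, cell_size):
    """Count events per grid cell; cells are indexed by (column, row)."""
    return Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )


# Hypothetical event locations in a 10 x 10 unit study area (N = 6).
points = [(1.2, 3.4), (1.3, 3.5), (4.8, 0.7), (7.1, 7.9), (7.15, 7.95), (9.0, 2.2)]

for size in (10.0, 2.0, 0.01):
    counts = count_events(points, size)
    print(size, len(counts), max(counts.values()))
# prints:
# 10.0 1 6
# 2.0 4 2
# 0.01 6 1
```

With a single 10-unit cell all six events fall in one cell, with 2-unit cells the largest count is 2, and with 0.01-unit cells every occupied cell holds exactly one event.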