Length and area for vector data

Navigation:  Building Blocks of Spatial Analysis > Geometric and Related Operations >

Length and area for vector data

Previous pageReturn to chapter overviewNext page

Since the majority of GIS mapping involves the use of datasets that are provided in projected, plane coordinates, distances and associated operations such as area calculations are carried out using Euclidean geometry. Polygon areas (and the areas of sections of polygons) are calculated using integration by the trapezoidal or Simpson’s rule, taking advantage of the fact that progressing clockwise around all the polygon vertices generates the desired result.

For example, in Figure 4‑1 the area between the x-axis and the two vertical lines at x1 and x2 is simply the width between the two lines, i.e. (x2‑x1) times the average height of the two verticals, i.e. (y1+y2)/2, giving a rectangle like the first one shown to the right of the polygon. Thus this rectangle has area:

The area between the x-axis and the two verticals at x2 and x3 is shown by the second narrow rectangle, and if we then subtract the shorter and wider rectangle, which represents the area outside of the polygon between the two vertical lines of interest, we obtain the area of the part of the overall polygon we are interested in (A,B,C,D). More generally, if we take the (n‑1) distinct coordinates of the polygon vertices to be the set {xi,yi} with (xn,yn)=(x1,y1) then the total area of the polygon is given by:

Figure 4‑1 Area calculation using Simpson’s rule


Traversing a polygon in a clockwise direction generates a sequence of positive and negative rectangular sections (negative when xi is less than xi+1) whose total sums to a positive value, the area of the polygon. Multiplying out the brackets the formula can also be written as

This is an expression we use again in Section 4.2.5, Centroids and centers, in connection with the determination of polygon centroids. If the order is reversed the total is the same but has a negative sign. This feature can be used to remove unwanted polygons contained within the main polygon (e.g. areas of water or built-up zones within a district when calculating available land area for farming or the area to be included and excluded in density or similar areal computations). The procedure fails if the polygon is self-crossing (which should not occur in well-structure GIS files) or if some of the y‑coordinates are negative — in this latter case either positive-only coordinate systems should be used or a fixed value added (temporarily) to all y-coordinate values to ensure they all remain positive during the calculation process.

Lengths and areas are normally computed using Cartesian (x,y) coordinates, which is satisfactory for most purposes where small regions (under 100 kms x 100 kms) are involved. For larger areas, such as entire countries or continents, and where data are stored and manipulated in latitude/longitude form, lengths (distances) and areas should be computed using spherical or similar measures (e.g. ellipsoidal, geoidal). If one uses Cartesian calculations over large regions the results will be incorrect — for example, using the standard ArcGIS measure tool to show the distance between Cairo and Capetown on a map of Africa will give a value substantially different from the “true”, great circle distance (depending on the projection applied in ArcGIS v9.3 and above the great circle distance can be obtained if the SHIFT key is held down whilst using the measure tool). Computation of polygon areas using spherical or ellipsoidal models is complex, requiring careful numerical integration. Functions such as AREAINT in the Mapping Toolbox of MATLab provide such facilities, and the source code for the module can be inspected and amended if required. It should also be noted that in most cases only two coordinates are utilized, whereas on surfaces with variable heights three dimensional distances, areas and volumetric analysis may be more appropriate (see further, Section 4.2.3, Surface area).

In some cases GIS software products will select the appropriate calculation method based on the dataset, whilst in others (e.g. MapInfo) the option to specify Cartesian or Spherical computation may be provided where relevant. If we identify two mapped locations as having Cartesian coordinates (xi,yi) and (xj,yj) then the distance between them, dij, is simply the familiar:

The length of a polyline or perimeter length of a polygon is then computed as the sum of the individual segments comprising the feature. Distances calculated using this formula, or the 3D equivalent, we shall denote dE, or Euclidean distance.

A straight line, l, in the x-y plane can be expressed as an equation: ax+by+c=0, where a, b and c are constants. Using this formulation the Euclidean distance of a point p(xp,yp) from this line is given by:

Within GIS the distance from a point to a line segment or set of line segments (e.g. a polyline) is frequently needed. This requires computation of one of three possible distances for each (point, line segment) pair: (i) if a line can be extended from the sample point that is orthogonal to the line segment, then the distance can be computed using the general formula dp,l provided above; (ii) and (iii) if case (i) does not apply, the sample point must lie in a position that means the closest point of the line segment is one of its two end points. The minimum of these two is taken in this instance. The process is then repeated for each part of the polyline and the minimum of all the lengths computed selected.

In spherical coordinates the (great circle) distance equation often quoted is the so-called Cosine formula:

where: R is the radius of the Earth (e.g. the average radius of the WGS84 ellipsoid); (φ,λ) pairs are the latitude and longitude values for the points under consideration; and ‑180≤λ≤180, ‑90≤φ≤90 (in degrees, although such calculations normally are computed using radians; n.b. 180 degrees=Pi radians and one radian=57.30 degrees). This formula is sensitive to computational errors in some instances (very small angular differences). A safer equation to use is the Haversine formula:

Most GIS packages do not specify how they carry out such operations, although in many cases basic computations are made using the Euclidean metric and reported in decimal degrees. Distances calculated using the Haversine formula we shall denote dS, or Spherical distance.