Directional analysis of linear datasets


Directional analysis applied to linear forms (line segments, polylines, or smooth curves) is sometimes known as alignment analysis or lineament analysis. Line elements within a GIS are normally directed, in the sense that they have a start and end point determined during data capture. This directed aspect of the line may or may not be relevant, and if relevant may be incorrect for the analysis you are considering. Hence all such datasets need to be examined carefully before proceeding with analysis.

In addition to line-based data from which orientation is derived, the same types of analysis may be applied to point datasets with one or more associated directional attributes. Examples include recordings of the initial migration direction of birds of various types from a small area of woodland; the dispersal pattern of seeds from a tree; the wind direction recorded at a series of meteorological stations at hourly intervals; or the direction of a particular type of crime event in a city with respect to a given reference point. Such cases are simply applications of directional (circular) statistics to a defined dataset. However, if such data are stored within a GIS as attributes in a table associated with a point set, conventional descriptive statistics such as the mean and variance should not be applied to them directly. Similar concerns apply to any form of modulo or cyclic data, for example fields recording the time of day in minutes from midnight, or the numeric day of the week for events recorded over a number of weeks.
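To see why the arithmetic mean fails for cyclic attributes, consider two events recorded at 23:00 and 01:00. The sketch below (Python, standard library only; the function name `circular_mean_minutes` is hypothetical) maps minutes from midnight onto a 24-hour clock face, averages the resulting unit vectors, and maps the direction of the resultant back to minutes. The same idea would apply to day-of-week codes, using a period of 7 instead of 1440.

```python
import math

MINUTES_PER_DAY = 24 * 60


def circular_mean_minutes(times):
    """Circular mean of times of day given in minutes from midnight.

    Each time is treated as an angle on a 24-hour clock face; the unit
    vectors are summed and the direction of the resultant is converted
    back to minutes. Returns None when the resultant is (numerically)
    zero, i.e. when no meaningful mean exists.
    """
    s = sum(math.sin(2 * math.pi * t / MINUTES_PER_DAY) for t in times)
    c = sum(math.cos(2 * math.pi * t / MINUTES_PER_DAY) for t in times)
    if math.hypot(s, c) < 1e-12:
        return None
    angle = math.atan2(s, c) % (2 * math.pi)
    return angle * MINUTES_PER_DAY / (2 * math.pi)


events = [23 * 60, 1 * 60]               # 23:00 and 01:00
print(sum(events) / len(events))         # 720.0, i.e. noon: misleading
print(circular_mean_minutes(events))     # approximately 0 (or 1440), i.e. midnight
```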

Data with directional information are essentially of two types: directed (or vector) data, where the direction is unique and specified as an angle in the range [0,360) degrees or [0,2π) using radian measure; and oriented or bi-directional data, often referred to as axial data, where only the orientation is defined, so that a line at 30° cannot be distinguished from one at 210°. Axial data are normally doubled and reduced to a 360° range using modulo arithmetic, so that the two opposite directions coincide; the doubled values are then processed, and the resulting directions are halved, re-converting them to a [0,180) or [0,π) range.
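A minimal sketch of the doubling procedure for axial data, in Python with only the standard library (the function name `axial_mean_deg` is illustrative). Orientations in degrees are doubled, averaged as unit vectors using the vector approach described later in this section, and the resulting direction is halved back to [0,180).

```python
import math


def axial_mean_deg(orientations):
    """Mean orientation of undirected (axial) data in degrees, in [0, 180).

    Doubling maps theta and theta + 180 to the same angle; the doubled
    angles are averaged as unit vectors and the result is halved.
    """
    s = sum(math.sin(math.radians(2 * a)) for a in orientations)
    c = sum(math.cos(math.radians(2 * a)) for a in orientations)
    doubled_mean = math.degrees(math.atan2(s, c)) % 360.0
    return (doubled_mean / 2.0) % 180.0


# Lines recorded at 30 and 210 degrees have the same orientation.
print(axial_mean_deg([30, 210]))    # approximately 30.0
# Naive averaging of 10 and 170 gives 90 (perpendicular to both lines);
# the axial mean is approximately 0 (equivalently 180), i.e. north-south.
print((10 + 170) / 2, axial_mean_deg([10, 170]))
```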

Determination and processing of line direction within a GIS is problematic for a number of reasons:

the way in which a line is represented, the level of detail it portrays, and the extent to which real world features are generalized
the data capture process and composite form of line representation (polylines) — where is the true start and end of the line for directional analysis purposes?
the nature of circular (or cyclic) measure: the orientation of an undirected line at 90 degrees to the vertical is indistinguishable from one at 270 degrees; furthermore the difference between 280 and 90 is not taken to be 280‑90=190, but 360‑280+90=170. Similarly the mean direction of 3 lines at 280 (or ‑80), 90 and 90 is not sensibly calculated as (280+90+90)/3=153.3 or as (‑80+90+90)/3=33.3; a consistent and meaningful definition of such an average is needed. With two angles of 350 and 10 degrees from due north this is even more obvious, as their arithmetic mean of 180 (i.e. due south) does not make sense
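The angular difference described in the last item can be computed by taking the absolute difference modulo the period and then going the shorter way round the circle. A short Python sketch (the function name is hypothetical); for undirected lines the period is 180 rather than 360 degrees:

```python
def angular_difference(a, b, period=360.0):
    """Smallest separation between two angles on a circle of the given period."""
    d = abs(a - b) % period
    return min(d, period - d)


print(angular_difference(280, 90))                # 170.0, not 190
print(angular_difference(350, 10))                # 20.0
print(angular_difference(90, 270, period=180.0))  # 0.0: same undirected line
```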

We treat each of these issues in turn. If the source forms are suspected of being fractal in nature across a very wide range of scales, then in theory they may exhibit no reliable tangents (directions) at any point. In practice the process of data capture accepts that our model within a GIS involves simplification of the real world, and that the data may be represented as a collection of non-fractal discrete elements (polylines or pixels). Assuming this representation is acceptable and meaningful for the problem being investigated, analysis of the captured data can proceed. Particular attention should be given to the location of the initial and final points (end nodes) in such cases, as these may offer more acceptable point pairs for the analysis of broader directional patterns than any of the intervening line segments.

More problematic is the issue of generalization. Subsequent to data capture, feature representation within the GIS may be subject to some form of generalization as discussed earlier (Section 4.2.3, Surface area). These processes can significantly alter the direction of linear forms, especially the component parts (segments) of polylines. For these reasons, when performing directional analysis on polylines, a range of alternative ways of describing the line direction may be needed. Examples include: end node to end node; linear best fit to all nodes; disaggregated analysis (treating all line segments as separate elements); and weighted analysis (treating a polyline as a weighted average of the directions of its component segments, e.g. weighted by segment length). Smooth curves (e.g. contours) may be treated in a similar manner, approximating them by polylines and applying the same concepts as for standard polylines. A GIS package that supports such selection facilities is TNTMips; most other packages have more limited facilities.
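Three of these alternatives can be sketched in Python as follows, assuming a polyline is a list of (easting, northing) vertex tuples and that directions are bearings in degrees clockwise from grid north. The function names are hypothetical, and the linear best-fit option is omitted for brevity. Because the stream example later in this section treats lines as undirected, the length-weighted version here averages doubled angles (axial data).

```python
import math


def bearing(p, q):
    """Bearing of segment p -> q in degrees clockwise from grid north, in [0, 360)."""
    return math.degrees(math.atan2(q[0] - p[0], q[1] - p[1])) % 360.0


def end_node_direction(line):
    """Direction from first to last vertex; intermediate vertices are ignored."""
    return bearing(line[0], line[-1])


def segment_directions(line):
    """Disaggregated analysis: one (bearing, length) pair per segment."""
    return [(bearing(p, q), math.dist(p, q)) for p, q in zip(line, line[1:])]


def length_weighted_orientation(line):
    """Undirected orientation in [0, 180): segment angles doubled, weighted by length."""
    segs = segment_directions(line)
    s = sum(w * math.sin(math.radians(2 * b)) for b, w in segs)
    c = sum(w * math.cos(math.radians(2 * b)) for b, w in segs)
    return (math.degrees(math.atan2(s, c)) % 360.0) / 2.0 % 180.0


line = [(0.0, 0.0), (0.0, 2.0), (1.0, 3.0)]
print(end_node_direction(line))           # about 18.43
print(segment_directions(line))           # two (bearing, length) pairs: 0 deg / 2.0 and 45 deg / about 1.414
print(length_weighted_orientation(line))  # about 17.63
```

For directed segments, weighting by length makes the vector sum of the segments telescope to the end-node displacement, so a directed length-weighted mean coincides with the end-node direction; the two measures only diverge when segments are treated as undirected, as here, or are weighted in some other way.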

These approaches address the first and second issues listed above. One way to address the third is to treat the lines or line segments as vectors (i.e. having an origin, a magnitude and a direction with respect to that origin) and then to use trigonometric functions of the directions, rather than the directions themselves, to perform the computations. This gives the direction (and optionally the magnitude) of the resultant vector, r (the mean effect of each vector “pulling” in its own direction). For example:

let the N+1 points determining a polyline define a set of N directions {θi}, i=1,…,N, from a given origin with respect to a predefined direction (e.g. grid north)
compute the two vector components (northing and easting): Vn=Σcosθi and Ve=Σsinθi

The resultant vector, r, has mean or preferred direction tan⁻¹(Ve/Vn), where the quadrant is determined by the signs of Ve and Vn (the two-argument arctangent, often written atan2(Ve, Vn)). For example, using our earlier example with three vectors at ‑80, 90 and 90 degrees from north, the resultant mean direction is +80.3; with the two vectors at 350 and 10 the resultant is 0 (due north). The directional mean by itself is of limited value unless the underlying data demonstrate a consistent directional pattern that one wishes to summarize. If a set of vectors shows an arbitrary or random pattern, the mean direction could take almost any value and is of little use.
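The vector-mean calculation can be sketched in Python as below (the function name is hypothetical). It uses `math.atan2(ve, vn)` rather than a plain arctangent of the ratio so that the correct quadrant is obtained, and it reproduces the two worked values quoted above.

```python
import math


def mean_direction_deg(angles_deg):
    """Mean direction of unit vectors, degrees clockwise from north, in [0, 360)."""
    vn = sum(math.cos(math.radians(a)) for a in angles_deg)  # northing component
    ve = sum(math.sin(math.radians(a)) for a in angles_deg)  # easting component
    return math.degrees(math.atan2(ve, vn)) % 360.0


print(round(mean_direction_deg([-80, 90, 90]), 1))      # 80.3
print(round(mean_direction_deg([350, 10]), 6) % 360.0)  # 0.0 (due north)
```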

If all N component vectors have unit magnitude, or are simply provided as angular measurements, the length or magnitude of the resultant vector is:

$$|r|=\sqrt{V_n^2+V_e^2}$$

If θi=0 for all i, every cosine term is 1 and every sine term is 0, so Vn=N and Ve=0, hence |r|=N, where |r| denotes the vector magnitude of r. Dividing through by N standardizes |r| so that the length of the mean vector, r*=|r|/N, lies in the range [0,1]. Larger values indicate greater clustering of the sample vectors around the mean direction.

The circular variance is simply var=1‑|r|/N, or var=1‑r*, and again lies in the range [0,1]. The circular standard deviation (in radians) is defined as:

$$s=\sqrt{-2\ln(1-\mathrm{var})}=\sqrt{-2\ln r^*}$$
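The summary measures for unit vectors can be computed together, as in the Python sketch below (the function name is hypothetical). It takes the circular standard deviation as √(−2 ln r*) in radians, converted to degrees for display; the clamp on r* guards against floating-point rounding pushing it fractionally above 1.

```python
import math


def circular_summary(angles_deg):
    """Return (|r|, r*, circular variance, circular SD in degrees) for unit vectors."""
    n = len(angles_deg)
    if n == 0:
        raise ValueError("at least one angle is required")
    vn = sum(math.cos(math.radians(a)) for a in angles_deg)
    ve = sum(math.sin(math.radians(a)) for a in angles_deg)
    r = math.hypot(vn, ve)        # resultant length, 0 <= |r| <= N
    r_star = min(r / n, 1.0)      # mean resultant length in [0, 1]; clamp rounding
    var = 1.0 - r_star            # circular variance in [0, 1]
    if r_star > 0.0:
        sd = math.degrees(math.sqrt(-2.0 * math.log(r_star)))
    else:
        sd = math.inf             # no preferred direction at all
    return r, r_star, var, sd


print(circular_summary([350, 10]))          # r* = cos(10 deg), so var is about 0.015
print(circular_summary([0, 90, 180, 270]))  # r* close to 0: no clustering
```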

If the point set for analysis includes the vector length, vi, then the expression for the directional mean can be generalized (weighted by length) as Vn=Σvicosθi and Ve=Σvisinθi. In this case the resultant vector magnitude, if normalized by the number of vectors, will in general no longer lie in the range [0,1]. Assuming that we have a suitably encoded dataset consisting of distinct or linked polylines (e.g. a stream network), we can explore directional trends in the entire dataset or in subsections of the data. This process is illustrated in Figure 4‑76 for a sample stream network in the Crow Butte region of Washington State, using tools from the TNTMips package.
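A sketch of the length-weighted form in Python (the function name and sample values are hypothetical). It reports the resultant normalized both by the number of vectors N, which as noted can exceed 1, and by the total length Σvi, which by the triangle inequality always stays within [0,1]; the latter is offered as one possible normalization, not as the one used by any particular package.

```python
import math


def weighted_mean_vector(angles_deg, lengths):
    """Length-weighted resultant of vectors given as bearings and lengths.

    Returns (mean direction in degrees from north, |r| / N, |r| / sum of lengths).
    """
    vn = sum(v * math.cos(math.radians(a)) for a, v in zip(angles_deg, lengths))
    ve = sum(v * math.sin(math.radians(a)) for a, v in zip(angles_deg, lengths))
    r = math.hypot(vn, ve)
    direction = math.degrees(math.atan2(ve, vn)) % 360.0
    return direction, r / len(angles_deg), r / sum(lengths)


# Two vectors: 3 units due north and 4 units due east.
print(weighted_mean_vector([0, 90], [3, 4]))  # about (53.13, 2.5, 0.714)
```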

Figure 4‑76 Directional analysis of streams

A. Hydrology, Crow Butte region

B. Direction roses: end point and segment versions


The two direction rose diagrams show the number of stream segments having orientations in 10-degree groupings (i.e. rather like a circular frequency diagram or histogram). The pattern is completely symmetric because in this case lines are regarded as undirected, so each line contributes to two opposite sectors.
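One way to produce the counts behind such a rose diagram is sketched below (Python; the function name is hypothetical and `bin_width` is assumed to divide 360). For undirected lines each bearing is counted in its own sector and in the opposite one, which yields the symmetric pattern.

```python
from collections import Counter


def rose_counts(bearings_deg, bin_width=10, undirected=True):
    """Frequency per direction sector for a rose diagram (sector 0 starts at north).

    With undirected=True every bearing b also contributes at b + 180, which
    produces the symmetric pattern expected for undirected lines.
    """
    nbins = int(360 // bin_width)
    counts = Counter()
    for b in bearings_deg:
        for d in ((b, b + 180.0) if undirected else (b,)):
            # The final "% nbins" guards against d % 360.0 rounding up to 360.0.
            counts[int((d % 360.0) // bin_width) % nbins] += 1
    return [counts[i] for i in range(nbins)]


freq = rose_counts([5, 47, 52, 183])
print(freq[0], freq[4], freq[5], freq[18])  # 2 1 1 2
```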

Any given section of the rose diagram can be selected, and the number of components (the sample frequency) will be displayed. Higher frequencies are shown as longer segments, although some software provides the option of using areas rather than lengths to indicate frequency variations on rose diagrams. Furthermore, the line elements that contribute to specific directional bands can be identified on the associated map, shown here in red (a form of linking familiar to most users of GIS packages). In the upper rose diagram the directional data for all lines were treated as end node to end node. The lower rose diagram shows the results when all component parts of polylines are included in the analysis; the spread of orientations is much more even. As with almost all forms of GIS analysis, it is important to take account of the effects of the region selection process: as can be seen here, many of the streams are arbitrarily dissected by the boundary, which may affect the interpretation of results.

Similar functionality is provided within ArcGIS, in its Linear Directional Mean facility. Whilst this does provide a choice of metric (Euclidean or Manhattan) and selection of directed or non-directed computation, selection of alternative modes of interpreting polylines (as described above) is not provided so polylines would need to be segmented (or combined) as required prior to analysis.

For a more complete set of tools, including distributional analysis, specialized software such as Oriana from Kovach Computer Services is required. This facilitates analysis of point-like datasets or datasets that have been extracted from a GIS in this form. Input is in tabular form (text, CSV, Excel etc.), and graphing and distributional analysis of single or multiple datasets is supported. Oriana may be unable to process large datasets (such as grid outputs in column format), in which case generic programs such as MATLAB and the Surfer-related program Grapher may be more effective. For those wishing to use license-free software there are a number of options, including Matplotlib, the charting and graphing library for the Python language. The formulas used by Oriana follow those provided in Fisher (1993) and Mardia and Jupp (1999). Measures are provided which compare the observed distribution to a uniform distribution and/or to the von Mises distribution, which is the circular equivalent of the Normal distribution. Graphically, a von Mises distribution with its mean at π closely resembles a Normal distribution plotted with the x-axis in the range [0,2π), particularly when the distribution is tightly concentrated. The general form of the von Mises distribution is:

$$f(\theta)=\frac{e^{\kappa\cos(\theta-\alpha)}}{2\pi I_0(\kappa)},\qquad 0\le\theta<2\pi$$

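where α is the mean direction of the distribution in the range [0,2π), and κ>0 is a shape parameter known as the concentration. κ plays a role roughly inverse to that of the standard deviation: the larger κ, the more tightly the distribution clusters around α. I0(κ) is the modified Bessel function of the first kind, of order 0 (see Section 1.4.2, Statistical measures and related formulas). The circular variance of this distribution is:

$$\mathrm{var}=1-\frac{I_1(\kappa)}{I_0(\kappa)}$$

where I1(κ) is the modified Bessel function of the first kind, of order 1.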

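This modified Bessel function is relatively straightforward to compute for integer orders, although it requires summation of an infinite series:

$$I_0(\kappa)=\sum_{k=0}^{\infty}\frac{1}{(k!)^2}\left(\frac{\kappa}{2}\right)^{2k}$$

and

$$I_1(\kappa)=\sum_{k=0}^{\infty}\frac{1}{k!\,(k+1)!}\left(\frac{\kappa}{2}\right)^{2k+1}$$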
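A minimal Python sketch of the von Mises quantities, evaluating the integer-order Bessel series iteratively until the terms become negligible (function names are hypothetical). It is adequate for moderate κ; floating-point overflow sets in for very large κ, and in practice library routines such as SciPy's `scipy.special.i0` and `scipy.special.i1` would normally be preferred.

```python
import math


def bessel_i(order, kappa, tol=1e-16):
    """Modified Bessel function of the first kind I_p(kappa) for integer p >= 0.

    Sums the power series sum_k (kappa/2)^(2k+p) / (k! (k+p)!), updating each
    term from the previous one and stopping once terms are negligible.
    """
    half = kappa / 2.0
    term = half ** order / math.factorial(order)  # k = 0 term
    total = term
    k = 0
    while term > tol * total:
        k += 1
        term *= half * half / (k * (k + order))
        total += term
    return total


def von_mises_pdf(theta, alpha, kappa):
    """Von Mises density at theta (radians), mean direction alpha, concentration kappa."""
    return math.exp(kappa * math.cos(theta - alpha)) / (2.0 * math.pi * bessel_i(0, kappa))


def von_mises_circular_variance(kappa):
    """Circular variance 1 - I1(kappa) / I0(kappa) of a von Mises distribution."""
    return 1.0 - bessel_i(1, kappa) / bessel_i(0, kappa)


# The density should integrate to 1 over [0, 2*pi); an evenly spaced sum is
# very accurate for smooth periodic functions.
n = 1000
print(sum(von_mises_pdf(2 * math.pi * i / n, math.pi, 2.0) for i in range(n)) * 2 * math.pi / n)
print(von_mises_circular_variance(0.5), von_mises_circular_variance(5.0))
```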

Several types of spatial dataset consist of multiple components that incorporate directional information. For example: wind direction and speed over time for a specific location or set of locations; single grid files of derived gradient data (slope, or gradient magnitude and aspect, or direction of maximum gradient — see further, Section 6.2, Surface Geometry); dual grid files containing either direction and magnitude data (polar information) in separate grids, or Cartesian data (x and y components) in each grid, for example simple optimal path tracking data (see further, Section 4.4.2, Cost distance). Plotting such data may utilize a variety of display methods.
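Converting a pair of Cartesian component grids into direction and magnitude grids is a per-cell calculation. The Python sketch below assumes the grids are equally sized nested lists, that x is the east-positive component and y the north-positive component, and that direction is reported as the bearing the vector points towards, clockwise from grid north. Wind data conventionally record the direction the wind blows from, which would require adding 180 degrees. The function name is hypothetical.

```python
import math


def cartesian_to_polar(x_grid, y_grid):
    """Convert per-cell (x, y) components into (bearing, magnitude) grids.

    Bearings are in degrees clockwise from grid north, in [0, 360); cells with
    zero magnitude have no defined direction and are returned as None.
    """
    bearings, magnitudes = [], []
    for x_row, y_row in zip(x_grid, y_grid):
        b_row, m_row = [], []
        for x, y in zip(x_row, y_row):
            m = math.hypot(x, y)
            m_row.append(m)
            b_row.append(math.degrees(math.atan2(x, y)) % 360.0 if m > 0 else None)
        bearings.append(b_row)
        magnitudes.append(m_row)
    return bearings, magnitudes


x = [[1.0, 0.0], [0.0, -1.0]]
y = [[0.0, 1.0], [0.0, 0.0]]
print(cartesian_to_polar(x, y))  # ([[90.0, 0.0], [None, 270.0]], [[1.0, 1.0], [0.0, 1.0]])
```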

For temporal data a variant of the rose diagram can be utilized that provides stacked histogram functionality in the radial direction in addition to the circular histogram facility provided by a standard rose diagram. The Grapher program, available from the same providers as Surfer, is an example of a package that supports such displays, as do Oriana and Matplotlib, but many statistical graphing packages provide similar functionality. This kind of diagram is illustrated in Figure 4‑77, where the radial extent shows the frequency of measured wind speeds (hourly intervals) separated into speed categories in knots (with zero speed values removed). The black arrow shows the mean vector (direction and speed) for the entire dataset of almost 4500 records.
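The data behind a stacked wind rose of this kind amount to a two-way frequency table of direction sector against speed class, plus a mean vector. A Python sketch follows; the sector width, speed class boundaries and function names are illustrative assumptions, not the settings used for Figure 4‑77. Sectors are centred on north and calm (zero-speed) records are dropped from the table, as in the figure; the mean vector is simply taken over whatever records are passed in.

```python
import math


def wind_rose_table(directions_deg, speeds, sector_width=30, speed_breaks=(5, 10, 15)):
    """Counts per (direction sector, speed class), ignoring zero-speed records.

    Sector 0 is centred on north. speed_breaks are upper class limits, so
    (5, 10, 15) gives the classes (0, 5], (5, 10], (10, 15] and above 15.
    """
    nsec = int(360 // sector_width)
    table = [[0] * (len(speed_breaks) + 1) for _ in range(nsec)]
    for d, s in zip(directions_deg, speeds):
        if s <= 0:
            continue  # calms are removed
        sector = int(((d + sector_width / 2) % 360.0) // sector_width) % nsec
        speed_class = sum(1 for b in speed_breaks if s > b)
        table[sector][speed_class] += 1
    return table


def mean_wind_vector(directions_deg, speeds):
    """Mean of speed-weighted vectors: (direction in degrees from north, magnitude)."""
    n = len(directions_deg)
    vn = sum(s * math.cos(math.radians(d)) for d, s in zip(directions_deg, speeds)) / n
    ve = sum(s * math.sin(math.radians(d)) for d, s in zip(directions_deg, speeds)) / n
    return math.degrees(math.atan2(ve, vn)) % 360.0, math.hypot(vn, ve)


dirs = [350, 10, 16, 90]
spds = [3, 12, 0, 20]
table = wind_rose_table(dirs, spds)
print(table[0], table[1], table[3])  # [1, 0, 1, 0] [0, 0, 0, 0] [0, 0, 0, 1]
```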

Figure 4‑77 Two-variable wind rose


In some instances smoothed contour plots of wind direction are created rather than stacked histograms. This enables multiple plots to be superimposed, as is the case for selected sites in the London Air Quality Network. Here the plots show pollution levels, by pollutant type, as the magnitude, with the average wind direction over a selected time period as the directional component.