
GIS toolsets and related software incorporate few facilities that directly address issues of sampling and sample design. Most commonly the terms sampling and resampling in GIS are used to refer to the frequency with which an existing dataset (raster image or in some cases, vector object) is sampled for simple display or processing purposes (e.g. when overlaying multiple data layers, or computing surface transects). These operations are not directly related to questions of statistical sampling. Two aspects of statistical sampling are explicitly supported within several GIS packages. These are: (i) the selection (sampling) of specific point or grid cell locations within an existing dataset; and (ii) the removal of spatial bias from collected datasets, using a procedure known as declustering. We describe each of these in subsections 5.1.2.1 and 5.1.2.2.

A number of GIS software packages, such as TNTMips, ENVI, Idrisi and GRASS, provide tools to assist in the selection of sample points, grid cells or regions of interest (ROI) from input datasets. Often these datasets are remote-sensing images, which may or may not have been subjected to some form of initial classification procedure. Examples of the facilities provided are listed below:

ENVI — takes a raster image file as input and provides three types of sampling, which it describes as:

·         Stratified random sampling, which may be proportionate or disproportionate. In the former case random samples are drawn from each class or ROI in proportion to the class or region size. In the latter case users essentially specify the sample size for each class or ROI themselves, although the elements are still randomly selected from within each class or ROI

·         Equalized random sampling, which selects an equal number of observations at random from each class or ROI

·         Random sampling, which ignores classes or ROIs and simply selects a predefined number of cells or points at random

The selected points or cells (which may be output as a separate georeferenced list or table) are then used in post-classification analysis, in each case comparing the classification with ground truth obtained from field survey or other independent data sources.
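The three sampling types described above can be sketched in a few lines of Python. This is a minimal illustration operating on a small classified grid held as a list of lists; it is not ENVI's implementation, and the function names and parameters (`stratified_sample`, `fraction`, `per_class`) are hypothetical.

```python
import random
from collections import defaultdict

def group_cells_by_class(class_grid):
    """Map each class label to the list of (row, col) cells carrying it."""
    cells = defaultdict(list)
    for r, row in enumerate(class_grid):
        for c, label in enumerate(row):
            cells[label].append((r, c))
    return cells

def stratified_sample(class_grid, fraction=None, per_class=None, rng=None):
    """Draw cells at random, without replacement, from every class.

    fraction           -> proportionate: the same share of every class
    per_class (int)    -> equalized: the same count from every class
    per_class (dict)   -> disproportionate: user-chosen count per class
    """
    if fraction is None and per_class is None:
        raise ValueError("give either fraction or per_class")
    rng = rng or random.Random()
    sample = {}
    for label, cells in group_cells_by_class(class_grid).items():
        if fraction is not None:
            n = round(fraction * len(cells))
        elif isinstance(per_class, dict):
            n = per_class.get(label, 0)
        else:
            n = per_class
        # Never ask for more cells than the class contains.
        sample[label] = rng.sample(cells, min(n, len(cells)))
    return sample

def simple_random_sample(class_grid, n, rng=None):
    """Ignore classes and draw n distinct cells from the whole grid."""
    rng = rng or random.Random()
    all_cells = [(r, c) for r, row in enumerate(class_grid)
                 for c in range(len(row))]
    return rng.sample(all_cells, n)
```

Calling `stratified_sample(grid, fraction=0.25)` gives a proportionate sample, `per_class=2` an equalized one, and `per_class={1: 5, 2: 1}` a disproportionate one.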

Idrisi offers similar facilities to ENVI via its SAMPLE function, providing random, systematic or stratified random point (cell) sampling from an input image (grid). The selection process for stratified random sampling simply involves regarding the input image as being constructed from rectangular blocks of cells, and then sampling random cells within these larger blocks.
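The block-based idea can be illustrated as follows. This is a sketch of the general technique only, assuming one cell is drawn per block; it does not reproduce Idrisi's SAMPLE function, and `block_stratified_sample` and its parameters are hypothetical names.

```python
import random

def block_stratified_sample(n_rows, n_cols, block_rows, block_cols, rng=None):
    """Pick one random cell from each rectangular block of a grid.

    The grid is tiled with blocks of block_rows x block_cols cells;
    blocks on the lower and right edges are truncated to fit the grid.
    """
    rng = rng or random.Random()
    picks = []
    for r0 in range(0, n_rows, block_rows):
        for c0 in range(0, n_cols, block_cols):
            r1 = min(r0 + block_rows, n_rows)  # exclusive upper bounds
            c1 = min(c0 + block_cols, n_cols)
            picks.append((rng.randrange(r0, r1), rng.randrange(c0, c1)))
    return picks
```

Because every block contributes a cell, the sample covers the whole image while each cell's position within its block remains random.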

GRASS provides simple random sampling which may be combined with masking to create forms of stratified random samples. This facility may be somewhat cumbersome to implement. GRASS also provides a facility to generate random sets of cells that are at least D units apart, where D is a user-specified buffer distance. Because the minimum separation spreads the cells more evenly across the region, the result can resemble a stratified sample more than a purely random one, and it is suggested that D should be derived with reference to observed levels of spatial autocorrelation (cf. Sections 5.5 and 6.7).
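A minimum-separation constraint of this kind can be sketched with simple rejection sampling: candidate points are drawn uniformly and discarded if they fall within D of a point already accepted. The code below is an illustration under that assumption, not the algorithm GRASS uses; the function name, the rectangular region and the `max_tries` safeguard are all hypothetical.

```python
import math
import random

def random_points_min_distance(n, width, height, d, rng=None, max_tries=10000):
    """Draw up to n random points in a width x height rectangle,
    no two of which are closer together than d.

    Returns fewer than n points if max_tries candidates are exhausted,
    which happens when d is too large for the region.
    """
    rng = rng or random.Random()
    points = []
    tries = 0
    while len(points) < n and tries < max_tries:
        tries += 1
        candidate = (rng.uniform(0, width), rng.uniform(0, height))
        # Accept only if the candidate respects the buffer distance.
        if all(math.dist(candidate, p) >= d for p in points):
            points.append(candidate)
    return points
```

As D grows relative to the region, accepted points are forced into an increasingly even spacing, which is why the result starts to look stratified rather than random.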

TNTMips supports a range of point sampling facilities to be used within vector polygons (e.g. field boundaries, Figure 5‑2). These provide the more familiar form of statistical sampling framework that would precede field studies, and have application in research areas such as soil composition (e.g. for precision farming), groundwater analysis, geological studies, or ecological research. However, they could equally well be applied to urban environments, as a precursor to environmental monitoring or even household surveys. The software provides for a two-stage process: (i) the creation of grids within the polygonal regions to be studied; and (ii) the selection of points within these grid structures. Grids are of user-definable size (edge length or area), shape (triangular, hexagonal, square, linear strips or random rectangles), and orientation (angle of rotation). A selection of generated grids is shown in Figure 5‑2.

Figure 5‑2 Grid generation examples

A. Square grid

B. Hexagonal grid

C. Random rectangular grid, 60°

Figure 5‑3 Grid sampling examples within hexagonal grid, 1 hectare area

A. Regular (cell center)

B. Systematic (random offset)

C. Random, no center bias

 

Where a regular cell framework has been generated the software then supports creation of sampling points within each cell (single points in this example). The methods supported are: regular (center of cell, Figure 5‑3A); systematic unaligned (Figure 5‑3B), in which the first cell point is selected with random x,y coordinates and each subsequent point reuses either the x or the y coordinate of the previous cell while the other coordinate is selected at random, alternating on a column-by-column basis; and random (Figure 5‑3C). In the latter two cases a weighting factor is provided that biases selection towards the center of the cell (100 = no bias, 1 = maximum bias). Selected sample points that nominally fall inside a cell but outside of the polygon boundary are excluded.
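The two-stage process can be sketched for the simplest case: an unrotated square grid laid over the polygon's bounding box, one point per cell, with points outside the polygon discarded. The sketch covers only the regular and random methods, omitting systematic unaligned sampling and the center-bias weighting. It is not TNTMips code, and the function names are hypothetical.

```python
import math
import random

def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def grid_sample(polygon, cell, mode="center", rng=None):
    """One sample point per square cell of edge length `cell`.

    mode="center" places the point at the cell center (regular);
    mode="random" places it uniformly at random within the cell.
    Points falling outside the polygon are excluded.
    """
    rng = rng or random.Random()
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    x0, y0 = min(xs), min(ys)
    n_cols = math.ceil((max(xs) - x0) / cell)
    n_rows = math.ceil((max(ys) - y0) / cell)
    points = []
    for i in range(n_rows):
        for j in range(n_cols):
            if mode == "center":
                fx, fy = 0.5, 0.5
            else:
                fx, fy = rng.random(), rng.random()
            x, y = x0 + (j + fx) * cell, y0 + (i + fy) * cell
            if point_in_polygon(x, y, polygon):
                points.append((x, y))
    return points
```

For a right triangle with legs of length 4 and a cell size of 1, `grid_sample(triangle, 1.0)` keeps only the cell centers that lie strictly inside the triangle.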

With general purpose GIS packages it is straightforward to generate a random, regular or partially randomized point set (within or externally to the GIS), and then to compute the intersection of this set with pre-defined polygons or grid cells. With this approach simple point-sampling schemes may be created, although precise matching to polygon forms, sample numbers or attribute weightings may be difficult. Purpose-built add-ins, such as Hawth’s Tools for ArcGIS, provide a range of tailored sampling facilities. These include: (i) generation of random points, with a range of selection options (including use of raster or polygon reference layers; see Figure 5‑4A, Mississippi, USA); (ii) random selection from an existing feature set (points, lines, polygons; see Figure 5‑4B, which shows 200 radioactivity monitoring sites in Germany, with a random sample of 30 sites (red/large dots) recording <100 units of radiation and 30 sites (crosses) recording >=100 units); and (iii) conditional point sampling, designed for case-control analysis and similar applications. The last of these facilitates a variety of random point generation methods in a region surrounding specified source points.
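As an illustration of generating random points in a region around source points, the sketch below assumes the region is a disc of fixed radius centered on each source and that points should be uniform over its area. This is one plausible method only, not a description of how Hawth's Tools works; `random_points_around` and its parameters are hypothetical.

```python
import math
import random

def random_points_around(sources, radius, n_per_source, rng=None):
    """Generate n_per_source uniform random points in a disc of the
    given radius around each source point.

    Returns a list of (source, point) pairs so that each generated
    point can be matched back to its source (e.g. case and controls).
    """
    rng = rng or random.Random()
    pairs = []
    for sx, sy in sources:
        for _ in range(n_per_source):
            # sqrt() on the radius makes the density uniform over the
            # disc's area rather than concentrated near the center.
            r = radius * math.sqrt(rng.random())
            theta = rng.uniform(0.0, 2.0 * math.pi)
            point = (sx + r * math.cos(theta), sy + r * math.sin(theta))
            pairs.append(((sx, sy), point))
    return pairs
```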

Random points in the plane may be used as sampling points or in connection with modeling, for example as part of a Monte Carlo simulation of a probability distribution. Random points can also be generated on a network. Naturally the distribution of such points will be affected by the distribution of the network links in the plane, and may thus appear clustered with respect to the plane (excluding the network). The corollary of this observation is that a clustered point pattern in the plane that is, in fact, a set of points on a network, may actually be a random uniform distribution when shortest network path distances are used rather than Euclidean planar distances. For example, the set of 100 points in Figure 5‑5A (Tripolis, in Greece) appears to be far from random, but in fact this is a random uniform (Poisson) point set on a network, as shown in Figure 5‑5B. This example was generated using the SANET software, which also provides a wide range of tools for analyzing observed point patterns on or almost on networks such as that illustrated (see further, Section 5.4.1).
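Uniform random points on a network can be sketched by choosing a link with probability proportional to its length and then choosing a uniform position along that link. The code assumes straight-line links given as coordinate pairs; it is an illustration of the idea, not the procedure used by SANET, and the function name is hypothetical.

```python
import math
import random

def random_points_on_network(links, n, rng=None):
    """Generate n points uniformly distributed along a network.

    links is a list of ((x1, y1), (x2, y2)) straight segments. Longer
    links receive proportionally more points, so the density per unit
    of network length is constant.
    """
    rng = rng or random.Random()
    lengths = [math.dist(a, b) for a, b in links]
    points = []
    for a, b in rng.choices(links, weights=lengths, k=n):
        t = rng.random()  # fractional position along the chosen link
        points.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return points
```

Plotted without the network, points generated this way trace out the links and so look clustered in the plane, even though they are uniform with respect to network length.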

The term quadrat sampling is applied to schemes in which information on all static point data (e.g. trees, birds’ nests etc.) is collected using an overlay of regular form (e.g. a square or hexagonal grid). Collected data are then aggregated to the level of the quadrat, whose size, orientation and internal variability will all affect the resultant figures (e.g. counts). Very small quadrats will ultimately contain 0 or 1 point objects, whilst very large quadrats will contain almost all the observations and hence will be of little value in understanding the variability of the data over space. An alternative to procedures based on lattices of quadrats is to “drop” quadrats onto the study area at random. Such quadrats may be of any size or shape, but circular forms have the advantage of being directionally invariant. A disadvantage with this approach is that some areas may be sampled more than once unless precautions are taken to exclude areas already sampled.
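Both forms of quadrat sampling can be sketched briefly: counts on a lattice of square quadrats, and counts within circular quadrats dropped at random. This is a minimal illustration with hypothetical function names; the random version takes no precautions against overlapping quadrats, so areas may be sampled repeatedly.

```python
import math
import random
from collections import Counter

def quadrat_counts(points, x0, y0, size, n_cols, n_rows):
    """Count points in each cell of a lattice of square quadrats.

    (x0, y0) is the lower-left corner of the lattice and size is the
    quadrat edge length. Returns a list of rows of counts.
    """
    counts = Counter()
    for x, y in points:
        col = math.floor((x - x0) / size)
        row = math.floor((y - y0) / size)
        if 0 <= col < n_cols and 0 <= row < n_rows:
            counts[(row, col)] += 1
    return [[counts[(r, c)] for c in range(n_cols)] for r in range(n_rows)]

def random_circular_quadrats(points, width, height, radius, k, rng=None):
    """Drop k circular quadrats at random inside a width x height
    study area and count the points falling within each.

    Centers are kept at least `radius` from the edges so that every
    quadrat lies wholly inside the study area.
    """
    rng = rng or random.Random()
    results = []
    for _ in range(k):
        center = (rng.uniform(radius, width - radius),
                  rng.uniform(radius, height - radius))
        n = sum(1 for p in points if math.dist(p, center) <= radius)
        results.append((center, n))
    return results
```

Re-running `quadrat_counts` with different values of `size` shows the scale effect described above: shrinking the quadrats drives the counts towards 0 or 1, while enlarging them gathers most points into a few cells.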

Figure 5‑4 Random point generation examples — ArcGIS

A. Random sample points, 5 per county

B. Stratified random selection, 30% of each stratum

Figure 5‑5 Random point samples on a network

A. Point set in the plane

B. Random point set on a network

SANET software: Prof A Okabe et al; Network data (Tripolis, Greece): S Sirigos; see also Figure 7‑16
