Quadrat analysis of grid datasets


As noted earlier, quadrat sampling is the term used to describe a procedure of sampling and recording point-based data within regularly shaped regions (typically a grid of square cells; see further Thomas, 1977, CATMOG 12). Since grid data are already in this form, it is possible to analyze grids where the contents of grid cells (or blocks of cells) are regarded as counts of point objects (Figure 5‑15). In the example illustrated, a 5x5 grid has been used to collect data on point events, shown by the symbol x. The event distribution has then been coded as counts in the grid below. Simple statistics may then be computed, such as the mean number of points per cell/cell block (4 in this example), and the variance of this measure (4.59 in this case).

If the distribution of points across the set of grid cells is random, it can be modeled using the Poisson distribution. The Poisson distribution is applicable where events (points in our case) are independent, there are a large number of events (typically 100+), and the probability of an individual event occurring (e.g. a point falling in any particular location) is small and uniform. It is derived as an approximation to the Binomial distribution by applying these conditions.

As noted in Table 1‑3, the Poisson distribution has the form:

$$p(x) = \frac{m^{x} e^{-m}}{x!}, \qquad x = 0, 1, 2, \ldots$$

where m is the mean and x is the count of events. In our example the (sample) mean is 4, so the individual terms of the distribution may be computed (i.e. for x=0,1,2,3…) and used as a set of “expected” values, under the null hypothesis that the observed frequency distribution is random. The set of n observed frequency values may then be compared to the set of expected values using a simple Chi-square test, to obtain an estimate of the probability that the data reflects a random distribution of events.
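As a minimal sketch of how the "expected" values might be generated, the Python fragment below evaluates the Poisson terms for a mean of 4 and scales them by the 25 cells of the example grid. The names poisson_pmf, n_cells and mean_count are illustrative choices, not taken from any particular package, and the cut-off at x=10 is an assumption made to match the frequency classes used in this example.

```python
import math


def poisson_pmf(x: int, m: float) -> float:
    """Probability of exactly x events in a cell when the mean count is m."""
    return (m ** x) * math.exp(-m) / math.factorial(x)


n_cells = 25       # number of quadrats in the 5x5 example grid
mean_count = 4.0   # sample mean number of points per cell

# Expected number of cells containing x points, for x = 0..10
# (the upper limit of 10 is an assumption for this illustration)
expected = [n_cells * poisson_pmf(x, mean_count) for x in range(11)]

for x, e in enumerate(expected):
    print(f"x={x:2d}  expected cells={e:.1f}")
```

The probabilities for x=0 to 10 do not quite sum to 1, so the expected counts sum to slightly less than 25; the small remainder belongs to classes above 10.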

Figure 5‑15 Quadrat counts

3  2  6  2  2
2  4  3  7  3
2  6  6  9  4
5  6  3  5  5
3  7  3  2  0

For the example shown above, a frequency analysis of the form shown in Table 5‑6 may be drawn up. The sum row shows the total number of observations made (grid cells) and the value of the χ² statistic. The degrees of freedom (DF) in this case are 11‑1‑1=9, because there are 11 frequency classes, the total count is known (‑1DF), and the mean (m) has been estimated from the sample (‑1DF). The 5% critical value of the Chi-square distribution with 9 DF, obtained from tables or by computation, is χ²(0.05, 9)=16.9. The observed value of 9.3 is well within the expectation for a random pattern, so we cannot reject the null hypothesis on the basis of this information.

Table 5‑6 Simple Chi-square frequency table computation

Freq   Obs, O   Exp, E   |O-E|   |O-E|²/E
0      1        0.5      0.5     0.64
1      0        1.8      1.8     1.83
2      6        3.7      2.3     1.49
3      6        4.9      1.1     0.25
4      2        4.9      2.9     1.70
5      3        3.9      0.9     0.21
6      4        2.6      1.4     0.75
7      2        1.5      0.5     0.18
8      0        0.7      0.7     0.74
9      1        0.3      0.7     1.35
10     0        0.1      0.1     0.13
Sum    25                        χ²=9.3
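The computation behind Table 5‑6 can be sketched in a few lines of Python. This is an illustration only: the counts are the 25 values from Figure 5‑15 read row by row, the variable names are arbitrary, and the 11 frequency classes (0 to 10) follow the table. No critical value is computed here; the figure of 16.9 quoted in the text would come from tables or a statistics library.

```python
import math

# Quadrat counts from Figure 5-15, read row by row
counts = [3, 2, 6, 2, 2,
          2, 4, 3, 7, 3,
          2, 6, 6, 9, 4,
          5, 6, 3, 5, 5,
          3, 7, 3, 2, 0]

n_cells = len(counts)
mean_count = sum(counts) / n_cells   # sample mean, estimated from the data

n_classes = 11                       # frequency classes 0..10, as in Table 5-6

# Observed number of cells holding exactly x points
observed = [counts.count(x) for x in range(n_classes)]

# Expected number of cells under a Poisson model with the sample mean
expected = [
    n_cells * mean_count ** x * math.exp(-mean_count) / math.factorial(x)
    for x in range(n_classes)
]

# Chi-square statistic: sum over classes of (O - E)^2 / E
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# One DF lost for the known total, one for the estimated mean
dof = n_classes - 1 - 1

print(f"mean={mean_count:.2f}  chi-square={chi_sq:.1f}  DF={dof}")
```

Working from unrounded expected values, as here, gives a statistic that agrees with the tabulated χ² to one decimal place.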

Aggregating rows to ensure most table cell counts are greater than 5 (e.g. by grouping frequencies into four classes: (0,1), (2,3), (4,5) and (6+)) gives a χ² statistic of 4.6 with 4‑1‑1=2 degrees of freedom, and χ²(0.05, 2)=6, confirming the previous result. This kind of test is supported within the spatstat software, although the authors point to an important weakness: the test provides only an indication of whether or not the pattern appears to be drawn from a homogeneous Poisson process, rather than providing additional information (e.g. in what way does a pattern depart from the null hypothesis?). Furthermore, a close match of a sample to a Poisson frequency distribution does not, of itself, guarantee that the sample is truly random. Consider the case of a sample collected along a transect which has the following observed sequence of values (e.g. counts of a particular insect or plant in each 1 meter section of the transect):

0,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,6,6,6,7,7,8,9

This sequence has frequencies for the classes 0, 1, 2, 3, … of 1, 2, 4, 5, 5, 4, 3, 2, 1, 1, and these are close to the frequencies one would expect from a Poisson distribution with mean 4, as per Table 5‑6. However, the data clearly show a linear trend in the observed values along the transect and thus depart strongly from a random pattern. Similar issues may arise in 2D samples in which bilinear or other trends exist (see, for example, Figure 5‑27). The assumption in many analytical techniques is that the underlying probability of occurrence is homogeneously distributed across the sample region; if it can be shown that this is not the case, then simple models will give misleading results. Software such as spatstat does enable non-stationary Poisson process models, such as those with a trend or a variation in process intensity that depends upon a covariate such as soil conditions, to be fitted using maximum likelihood methods. Essentially, a stationary process is one that is invariant under Euclidean translation.
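The point about the transect can be checked with a short Python sketch. It tabulates the class frequencies, which ignore the order of the observations, and then measures the association between position along the transect and the observed count using a Pearson correlation coefficient. The correlation is simply one convenient, assumed way of exposing the trend; it is not a test described in the text, and the helper name pearson is illustrative.

```python
import math
from collections import Counter

# Counts recorded in successive 1 meter sections of the transect
transect = [0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4,
            4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9]

# Class frequencies: these depend only on the values, not on their order
freq = Counter(transect)
frequencies = [freq.get(x, 0) for x in range(max(transect) + 1)]


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Position along the transect (section index) against observed count
positions = list(range(len(transect)))
r = pearson(positions, transect)

print("frequencies:", frequencies)
print(f"correlation of count with position: {r:.2f}")
```

Shuffling the same values would leave the frequency table unchanged but would generally remove the correlation, which is why a frequency-based test alone cannot detect this kind of departure from randomness.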

The observed and expected frequency distributions can also be compared by computing the maximum absolute difference in their cumulative probability distributions and then applying the Kolmogorov-Smirnov (KS) test statistic. The KS procedure is in some respects a more powerful and flexible approach than the Chi-square test. Testing procedures of this type are rarely supported directly within GIS and related packages, other than specialized tools such as spatstat, where the test is provided as the kstest() function. In this instance, the kstest() function may be applied to datasets divided on the basis of some covariate of interest, such as terrain slope. KS procedures are widely supported in statistical packages such as Minitab, SPSS and Stata, and may be readily computed programmatically or by use of a generic tool such as Excel. Note that KS tests should really be reserved for expected distributions and covariates that are continuous; otherwise there may be difficulties in estimating the appropriate mean value for the expected distribution, and tied values can weaken the test's applicability.
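The arithmetic of the KS comparison can be sketched as follows, reusing the quadrat counts of Figure 5‑15 and a Poisson model with the sample mean. This only illustrates how the maximum absolute difference between the two cumulative distributions is obtained; it does not compute a significance level, and, as cautioned above, the counts are discrete and contain many ties, so the result should not be read as a formal test. All variable names are illustrative.

```python
import math

# Quadrat counts from Figure 5-15, read row by row
counts = [3, 2, 6, 2, 2,
          2, 4, 3, 7, 3,
          2, 6, 6, 9, 4,
          5, 6, 3, 5, 5,
          3, 7, 3, 2, 0]

n_cells = len(counts)
mean_count = sum(counts) / n_cells
max_class = 10   # compare the distributions over classes 0..10

obs_cum = 0.0    # running observed cumulative proportion
exp_cum = 0.0    # running Poisson cumulative probability
d_max = 0.0      # largest absolute difference found so far
x_at_max = 0     # class at which that difference occurs

for x in range(max_class + 1):
    obs_cum += counts.count(x) / n_cells
    exp_cum += mean_count ** x * math.exp(-mean_count) / math.factorial(x)
    diff = abs(obs_cum - exp_cum)
    if diff > d_max:
        d_max, x_at_max = diff, x

print(f"maximum absolute difference D={d_max:.3f} at x={x_at_max}")
```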