Table 1‑3, below, provides a list of common measures (univariate statistics) applied to datasets, and associated formulas for calculating the measure from a sample dataset in summation form (rather than integral form) where necessary. In some instances these formulas are adjusted to provide estimates of the population values rather than those obtained from the sample of data one is working on.
Many of the measures can be extended to two-dimensional forms in a very straightforward manner, and thus they provide the basis for numerous standard formulas in spatial statistics. For a number of univariate statistics (variance, skewness, kurtosis) we refer to the notion of (estimated) moments about the mean. These are computations of the form
mr=(1/n)∑(xi−x̄)^r, r=1,2,3,…
When r=1 this summation will be 0, since this is just the difference of all values from the mean. For values of r>1 the expression provides measures that are useful for describing the shape (spread, skewness, peakedness) of a distribution, and simple variations on the formula are used to define the correlation between two or more datasets (the product moment correlation). The term moment in this context comes from physics, i.e. like ‘momentum’ and ‘moment of inertia’, and in a spatial (2D) context provides the basis for the definition of a centroid — the center of mass or center of gravity of an object, such as a polygon (see further, Section 4.2.5, Centroids and centers).
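As an illustrative sketch only (plain Python, standard library; the function name `moment` is ours, not from any package), the rth moment about the mean can be computed directly, confirming that the first moment is always zero and that the second moment is the population variance:

```python
from statistics import mean

def moment(xs, r):
    """rth (estimated) moment about the mean: m_r = (1/n) * sum((x - xbar)**r)."""
    xbar = mean(xs)
    return sum((x - xbar) ** r for x in xs) / len(xs)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(moment(data, 1))  # first moment about the mean is always 0
print(moment(data, 2))  # second moment = population variance (here 4.0)
```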
Table 1‑3 Common formulas and statistical measures
This table of measures has been divided into 9 subsections for ease of use. Each is provided with its own subheading:
•  Counts and specific values 
•  Measures of centrality 
•  Measures of spread 
•  Measures of distribution shape 
•  Measures of complexity and dimensionality 
•  Common distributions 
•  Data transforms and back transforms 
•  Selected functions 
•  Matrix expressions 
For more details on these topics, see the relevant topic within the StatsRef website.
Counts and specific values
Measure 
Definition 
Expression(s) 

Count 
The number of data values in a set 
Count({xi})=n 
Top m, Bottom m 
The set of the largest (smallest) m values from a set. May be generated via an SQL command 
Topm{xi}={Xn‑m+1,…Xn‑1,Xn}; Botm{xi}={X1,X2,… Xm}; 
Variety 
The number of distinct, i.e. different, data values in a set. Some packages refer to the variety as diversity, which should not be confused with information-theoretic and other diversity measures 

Majority 
The most common, i.e. most frequent, data values in a set. Similar to mode (see below), but often applied to raster datasets at the neighborhood or zonal level. For general datasets the term should only be applied to cases where a given class accounts for 50% or more of the total 

Minority 
The least common, i.e. least frequently occurring, data values in a set. Often applied to raster datasets at the neighborhood or zonal level 

Maximum, Max 
The maximum value of a set of values. May not be unique 
Max{xi}=Xn 
Minimum, Min 
The minimum value of a set of values. May not be unique 
Min{xi}=X1 
Sum 
The sum of a set of data values 
Sum{xi}=∑xi=x1+x2+…+xn 
Measures of centrality
Measure 
Definition 
Expression(s) 

Mean (arithmetic) 
The arithmetic average of a set of data values (also known as the sample mean where the data are a sample from a larger population). Note that if the set {fi} are regarded as weights rather than frequencies the result is known as the weighted mean. Other mean values include the geometric and harmonic mean. The population mean is often denoted by the symbol μ. In many instances the sample mean is the best (unbiased) estimate of the population mean and is sometimes denoted by μ with a ^ symbol above it, or as a variable such as x with a bar above it. 
Mean=(1/n)∑xi; Weighted mean=∑fixi/∑fi 
Mean (harmonic) 
The harmonic mean, H, is the mean of the reciprocals of the data values, which is then adjusted by taking the reciprocal of the result. The harmonic mean is less than or equal to the geometric mean, which is less than or equal to the arithmetic mean 
H=n/∑(1/xi) 
Mean (geometric) 
The geometric mean, G, is the mean defined by taking the products of the data values and then adjusting the value by taking the nth root of the result. The geometric mean is greater than or equal to the harmonic mean and is less than or equal to the arithmetic mean 
G=(∏xi)^(1/n), hence ln(G)=(1/n)∑ln(xi) 
Mean (power) 
The general (limit) expression for mean values. Values for p give the following means: p=1 arithmetic; p=2 root mean square; p=‑1 harmonic. Limit values for p (i.e. as p tends to these values) give the following means: p=0 geometric; p=‑∞ minimum; p=∞ maximum 
Mp=((1/n)∑xi^p)^(1/p) 
Trimmean, TM, t, Olympic mean 
The mean value computed with a specified percentage (proportion), t/2, of values removed from each tail to eliminate the highest and lowest outliers and extreme values. For small samples a specific number of observations (e.g. 1) rather than a percentage, may be ignored. In general an equal number, k, of high and low values should be removed and the number of observations summed should equal n(1‑t) expressed as an integer. This variant is sometimes described as the Olympic mean, as is used in scoring Olympic gymnastics for example 
TMt=(1/k)∑xi, summed over the k=n(1−t) retained central values, t∈[0,1] 
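A minimal sketch of the trimmed (Olympic) mean in plain Python (the function name `trimmed_mean` and the integer-truncation choice for the per-tail count are our assumptions; packages vary in how they round):

```python
def trimmed_mean(xs, t):
    """Mean with proportion t/2 of values removed from each tail; t in [0, 1]."""
    xs = sorted(xs)
    k = int(len(xs) * t / 2)               # number trimmed from each end
    kept = xs[k:len(xs) - k] if k else xs  # retain the central n(1-t) values
    return sum(kept) / len(kept)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]    # 100 is an outlier
print(trimmed_mean(data, 0.2))  # drops 1 and 100; mean of 2..9 = 5.5
```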
Mode 
The most common or frequently occurring value in a set. Where a set has one dominant value or range of values it is said to be unimodal; if there are several commonly occurring values or ranges it is described as multimodal. Note that (arithmetic mean−mode)≈3(arithmetic mean−median) for many unimodal distributions 

Median, Med 
The middle value in an ordered set of data if the set contains an odd number of values, or the average of the two middle values if the set contains an even number of values. For a continuous distribution the median is the 50% point (0.5) obtained from the cumulative distribution of the values or function 
Med{xi}=X(n+1)/2, n odd 
Med{xi}=(Xn/2+Xn/2+1)/2, n even 
Midrange, MR 
The middle value of the range, i.e. the value halfway between the minimum and maximum 
MR{xi}=(X1+Xn)/2 
Root mean square (RMS) 
The root of the mean of squared data values. Squaring removes negative values 
RMS=√((1/n)∑xi^2) 
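The ordering harmonic ≤ geometric ≤ arithmetic ≤ RMS noted in the entries above can be checked with a short sketch (plain Python; function names are ours, for illustration only):

```python
import math

def arithmetic(xs): return sum(xs) / len(xs)
def geometric(xs):  return math.exp(sum(math.log(x) for x in xs) / len(xs))
def harmonic(xs):   return len(xs) / sum(1.0 / x for x in xs)
def rms(xs):        return math.sqrt(sum(x * x for x in xs) / len(xs))

data = [1.0, 2.0, 4.0]
# For positive data: harmonic <= geometric <= arithmetic <= RMS
print(harmonic(data), geometric(data), arithmetic(data), rms(data))
```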
Measures of spread
Measure 
Definition 
Expression(s) 

Range 
The difference between the maximum and minimum values of a set 
Range{xi}=Xn‑X1 
Lower quartile (25%), LQ 
In an ordered set, 25% of data items are less than or equal to the upper bound of this range. For a continuous distribution the LQ is the set of values from 0% to 25% (0.25) obtained from the cumulative distribution of the values or function. Treatment of cases where n is even or odd, and where i runs from 1 to n or from 0 to n, varies between implementations 
LQ={X1, … X(n+1)/4} 
Upper quartile (75%), UQ 
In an ordered set, 75% of data items are less than or equal to the upper bound of this range. For a continuous distribution the UQ is the set of values from 75% (0.75) to 100% obtained from the cumulative distribution of the values or function. Treatment of cases where n is even or odd, and where i runs from 1 to n or from 0 to n, varies between implementations 
UQ={X3(n+1)/4, … Xn} 
Interquartile range, IQR 
The difference between the lower and upper quartile values, hence covering the middle 50% of the distribution. The interquartile range can be obtained by taking the median of the dataset, then finding the median of the upper and lower halves of the set. The IQR is then the difference between these two secondary medians 
IQR=UQ−LQ 
Trimrange, TR, t 
The range computed with a specified percentage (proportion), t/2, of the highest and lowest values removed to eliminate outliers and extreme values. For small samples a specific number of observations (e.g. 1) rather than a percentage, may be ignored. In general an equal number, k, of high and low values are removed (if possible) 
TRt=Xn(1−t/2)−Xnt/2, t∈[0,1]; TR50%=IQR 
Variance, Var, σ2, s2 , μ2 
The average squared difference of values in a dataset from their population mean, μ, or from the sample mean (also known as the sample variance where the data are a sample from a larger population). Differences are squared to remove the effect of negative values (the summation would otherwise be 0). The third formula is the frequency form, where frequencies have been standardized, i.e. ∑fi=1. Var is a function of the 2nd moment about the mean. The population variance is often denoted by the symbol μ2 or σ2. The estimated population variance is often denoted by s2 or by σ2 with a ^ symbol above it 
Var=(1/n)∑(xi−μ)^2 
s^2=(1/(n−1))∑(xi−x̄)^2 
Var=∑fi(xi−x̄)^2, where ∑fi=1 
Standard deviation, SD, s or RMSD 
The square root of the variance, hence it is the Root Mean Squared Deviation (RMSD). The population standard deviation is often denoted by the symbol σ. SD* shows the estimated population standard deviation (sometimes denoted by σ with a ^ symbol above it or by s) 
SD=√((1/n)∑(xi−x̄)^2) 
SD*=√((1/(n−1))∑(xi−x̄)^2) 
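The distinction between the population variance (n divisor) and the estimated population variance (n−1 divisor, Bessel's correction) can be illustrated with a short sketch (plain Python; function names are ours):

```python
import math

def var_pop(xs):
    """Population variance: mean squared deviation, n divisor."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / len(xs)

def var_est(xs):
    """Estimated population variance: n-1 divisor (Bessel's correction)."""
    return var_pop(xs) * len(xs) / (len(xs) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(var_pop(data), math.sqrt(var_pop(data)))  # 4.0 and SD = 2.0
print(var_est(data))                            # 32/7, slightly larger
```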
Standard error of the mean, SE 
The estimated standard deviation of the mean values of n samples from the same population. It is simply the sample standard deviation reduced by a factor equal to the square root of the number of samples, n≥1 
SE=SD/√n 
Root mean squared error, RMSE 
The standard deviation of samples from a known set of true values, xi*. If xi* are estimated by the mean of sampled values RMSE is equivalent to RMSD 
RMSE=√((1/n)∑(xi−xi*)^2) 
Mean deviation/error, MD or ME 
The mean deviation of samples from the known set of true values, xi* 
ME=(1/n)∑(xi−xi*) 
Mean absolute deviation/error, MAD or MAE 
The mean absolute deviation of samples from the known set of true values, xi* 
MAE=(1/n)∑|xi−xi*| 
Covariance, Cov 
Literally the pattern of common (or co) variation observed in a collection of two (or more) datasets, or partitions of a single dataset. Note that if the two sets are the same the covariance is the same as the variance 
Cov(x,y)=(1/n)∑(xi−x̄)(yi−ȳ); Cov(x,x)=Var(x) 
Correlation/ product moment or Pearson’s correlation coefficient, r 
A measure of the similarity between two (or more) paired datasets. The correlation coefficient is the ratio of the covariance to the product of the standard deviations. If the two datasets are the same or perfectly matched this will give a result=1 
r=Cov(x,y)/(SDx·SDy) 
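A minimal sketch of the product moment correlation as the ratio of the covariance to the product of the standard deviations (plain Python; the function name `pearson_r` is ours), confirming that identical or perfectly matched sets give r=1:

```python
def pearson_r(xs, ys):
    """Pearson correlation: covariance over product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sdx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sdy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sdx * sdy)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, xs))                 # identical sets -> 1
print(pearson_r(xs, [2, 4, 6, 8, 10]))   # perfect linear match -> 1
```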
Coefficient of variation, CV 
The ratio of the standard deviation to the mean, sometimes computed as a percentage. If this ratio is close to 1, and the distribution is strongly skewed with a long upper tail, it may suggest the underlying distribution is Exponential. Note, mean values close to 0 may produce unstable results 
CV=SD/mean 
Variance mean ratio, VMR 
The ratio of the variance to the mean, sometimes computed as a percentage. If this ratio is close to 1, and the distribution is unimodal and relates to count data, it may suggest the underlying distribution is Poisson. Note, mean values close to 0 may produce unstable results 
VMR=Var/mean 
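Both ratios can be sketched in a few lines (plain Python; function names and the hypothetical count data are ours, for illustration only):

```python
def cv(xs):
    """Coefficient of variation: SD / mean (population SD, n divisor)."""
    m = sum(xs) / len(xs)
    sd = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    return sd / m

def vmr(xs):
    """Variance mean ratio: Var / mean (population variance, n divisor)."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    return var / m

counts = [3, 5, 4, 6, 2, 4, 5, 3]   # hypothetical count data
print(cv(counts), vmr(counts))
```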
Measures of distribution shape
Measure 
Definition 
Expression(s) 

Skewness, α3 
If a frequency distribution is unimodal and symmetric about the mean it has a skewness of 0. Values greater than 0 suggest skewness of a unimodal distribution to the right, whilst values less than 0 indicate skewness to the left. A function of the 3rd moment about the mean (denoted by α3 with a ^ symbol above it for the sample skewness) 
α3=m3/(m2)^(3/2), where mr=(1/n)∑(xi−x̄)^r 
Kurtosis, α4 
A measure of the peakedness of a frequency distribution. More pointy distributions tend to have high kurtosis values. A function of the 4th moment about the mean. It is customary to subtract 3 from the raw kurtosis value (which is the kurtosis of the Normal distribution) to give a figure relative to the Normal (denoted by α4 with a ^ symbol above it for the sample kurtosis) 
α4=m4/(m2)^2−3, where mr=(1/n)∑(xi−x̄)^r 
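A brief sketch of moment-based skewness and excess kurtosis (plain Python; function names are ours, and no small-sample adjustment factors are applied), confirming that a symmetric set has skewness 0:

```python
def moment(xs, r):
    """rth moment about the mean: (1/n) * sum((x - xbar)**r)."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** r for x in xs) / len(xs)

def skewness(xs):
    return moment(xs, 3) / moment(xs, 2) ** 1.5

def kurtosis_excess(xs):
    """Raw kurtosis minus 3, i.e. relative to the Normal distribution."""
    return moment(xs, 4) / moment(xs, 2) ** 2 - 3.0

sym = [1, 2, 3, 4, 5]
print(skewness(sym))          # symmetric about the mean -> 0.0
print(kurtosis_excess(sym))   # flatter than Normal -> negative
```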
Measures of complexity and dimensionality
Measure 
Definition 
Expression(s) 

Information statistic (Entropy), I (Shannon’s) 
A measure of the amount of pattern, disorder or information, in a set {xi} where pi is the proportion of events or values occurring in the ith class or range. Note that if pi=0 then pilog2(pi) is 0. I takes values in the range [0,log2(k)]. The lower value means all data fall into one category, whilst the upper means all data are evenly spread 
I=−∑pi log2(pi), i=1,…,k 
Information statistic (Diversity), Div 
Shannon’s entropy statistic (see above) standardized by the number of classes, k, to give a range of values from 0 to 1 
Div=I/log2(k) 
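Both statistics can be sketched directly from the definitions (plain Python; function names are ours), confirming the two limiting cases noted above:

```python
import math

def entropy(ps):
    """Shannon entropy I = -sum(p * log2(p)), with 0*log2(0) taken as 0."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

def diversity(ps, k):
    """Entropy standardized by log2(k) to lie in [0, 1]."""
    return entropy(ps) / math.log2(k)

print(entropy([1.0, 0.0, 0.0, 0.0]))   # all data in one class -> 0.0
print(diversity([0.25] * 4, 4))        # evenly spread over k=4 -> 1.0
```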
Dimension (topological), DT 
Broadly, the number of (intrinsic) coordinates needed to refer to a single point anywhere on the object. The dimension of a point=0, a rectifiable line=1, a surface=2 and a solid=3. See text for fuller explanation. The value 2.5 (often denoted 2.5D) is used in GIS to denote a planar region over which a single-valued attribute has been defined at each point (e.g. height). In mathematics topological dimension is now equated to a definition similar to cover dimension (see below) 
DT=0,1,2,3,… 
Dimension (capacity, cover or fractal), DC 
Let N(h) represent the number of small elements of edge length h required to cover an object. For a line of length 1, N(h)=1/h elements are needed. For a unit plane surface N(h)=1/h2 small squares of side length h are needed, and for a unit volume N(h)=1/h3 small cubes are needed. More generally N(h)=1/hD, where D is the dimension of the object, so N(h)=h‑D and thus log(N(h))=‑Dlog(h) and so Dc=‑log(N(h))/log(h). Dc may be fractional, in which case the term fractal is used 
Dc≥0 
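The covering argument can be sketched numerically (plain Python; the function name `cover_dimension` is ours), recovering the familiar integer dimensions for a line and a square:

```python
import math

def cover_dimension(n_elements, h):
    """Dc = -log(N(h)) / log(h), for N(h) covering elements of edge length h."""
    return -math.log(n_elements) / math.log(h)

# A unit square covered by 16 small squares of side 1/4:
print(cover_dimension(16, 1 / 4))   # -> 2.0
# A unit line covered by 4 segments of length 1/4:
print(cover_dimension(4, 1 / 4))    # -> 1.0
```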
Common distributions
Measure 
Definition 
Expression(s) 

Uniform (continuous) 
All values in the range are equally likely. Mean=a/2, variance=a^2/12. Here we use f(x) to denote the probability distribution associated with continuous-valued variables x, also described as a probability density function 
f(x)=1/a, 0≤x≤a 

Binomial (discrete) 
The terms of the Binomial give the probability of x successes out of n trials, for example 3 heads in 10 tosses of a coin, where p=probability of success and q=1‑p=probability of failure. Mean, m=np, variance=npq. Here we use p(x) to denote the probability distribution associated with discrete-valued variables x 
p(x)=n!/(x!(n−x)!) p^x q^(n−x), x=0,1,…,n 

Poisson (discrete) 
An approximation to the Binomial when p is very small and n is large (>100), but the mean m=np is fixed and finite (usually not large). Mean=variance=m 
p(x)=e^(−m)m^x/x!, x=0,1,2,… 

Normal (continuous) 
The distribution of a measurement, x, that is subject to a large number of independent, random, additive errors. The Normal distribution may also be derived as an approximation to the Binomial when p is not small (e.g. p≈1/2) and n is large. If μ=mean and σ=standard deviation, we write N(μ,σ) as the Normal distribution with these parameters. The Normal or z-transform z=(x‑μ)/σ changes (normalizes) the distribution so that it has a zero mean and unit variance, N(0,1). The distribution of n mean values of independent random variables drawn from any underlying distribution is also Normal (Central Limit Theorem) 
f(x)=(1/(σ√(2π)))exp(−(x−μ)^2/(2σ^2)) 
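The Poisson approximation to the Binomial noted above can be checked with a short sketch (plain Python; function names are ours). With n large, p small and m=np fixed, the two probability functions agree closely:

```python
import math

def binom_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, m):
    """Poisson probability with mean m."""
    return math.exp(-m) * m ** x / math.factorial(x)

# Small p, large n, fixed mean m = n*p = 2:
n, p = 1000, 0.002
for x in range(5):
    print(x, round(binom_pmf(x, n, p), 5), round(poisson_pmf(x, n * p), 5))
```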
Data transforms and back transforms
Measure 
Definition 
Expression(s) 

Log 
If the frequency distribution for a dataset is broadly unimodal and positively skewed (long upper tail), the natural log transform (logarithms base e) will adjust the pattern to make it more symmetric/similar to a Normal distribution. For variates whose values may range from 0 upwards a value of 1 is often added before taking the transform. Back transform with the exp() function 
z=ln(x) or z=ln(x+1); n.b. ln(x)=loge(x)=log10(x)/log10(e); x=exp(z) or x=exp(z)−1 
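A minimal round-trip sketch of the z=ln(x+1) form and its back transform (plain Python; the sample data are ours, chosen to have a long upper tail):

```python
import math

skewed = [1, 2, 2, 3, 5, 9, 40, 120]        # long upper tail
z = [math.log(x + 1) for x in skewed]       # forward transform ln(x+1)
back = [math.exp(v) - 1 for v in z]         # back transform exp(z)-1
print([round(b, 9) for b in back])          # recovers the original values
```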
Square root 
A transform that may adjust the dataset to make it more similar to a Normal distribution. For variates whose values may range from 0 upwards a value of 1 is often added to the transform. For 0≤x≤1 (e.g. rate data) the combined form of the transform is often used, and is known as the Freeman-Tukey (FT) transform 
z=√x or z=√(x+1); FT: z=√x+√(x+1) 
Logit 
Often used to transform binary response data, such as survival/non-survival or present/absent, to provide a continuous value in the range (‑∞,∞), where p is the proportion of the sample that is 1 (or 0). The inverse or back-transform is shown as p in terms of z. This transform avoids concentration of values at the ends of the range. For samples where proportions p may take the values 0 or 1 a modified form of the transform may be used. This is typically achieved by adding 1/2n to the numerator and denominator, where n is the sample size. Often used to correct S-shaped (logistic) relationships between response and explanatory variables 
z=ln(p/(1−p)); p=e^z/(1+e^z) 
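A minimal sketch of the logit and its back transform (plain Python; function names are ours), confirming the round trip and that p=0.5 maps to z=0:

```python
import math

def logit(p):
    """Forward transform: z = ln(p / (1 - p)), for 0 < p < 1."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Back transform: p = e^z / (1 + e^z)."""
    return math.exp(z) / (1 + math.exp(z))

print(logit(0.5))             # -> 0.0
print(inv_logit(logit(0.9)))  # round trip recovers 0.9
```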
Normal, ztransform 
This transform normalizes or standardizes the distribution so that it has a zero mean and unit variance. If {xi} is a set of n sample mean values from any probability distribution with mean μ and variance σ2 then the z-transform shown here as z2 will be distributed N(0,1) for large n (Central Limit Theorem). The divisor in this instance is the standard error. In both instances the standard deviation must be non-zero 
z=(x−μ)/σ; z2=(x̄−μ)/(σ/√n) 
BoxCox, power transforms 
A family of transforms defined for positive data values only, that often can make datasets more Normal; k is a parameter. The inverse or back-transform is also shown as x in terms of z 
z=(x^k−1)/k, k≠0; x=(1+kz)^(1/k) 
Angular transforms (FreemanTukey) 
A transform for proportions, p, designed to spread the set of values near the end of the range. k is typically 0.5. Often used to correct S-shaped relationships between response and explanatory variables. If p=x/n then the Freeman-Tukey (FT) version of this transform is the averaged version shown. This is a variance-stabilizing transform 
z=sin^(−1)(p^k); FT: z=(1/2)[sin^(−1)(√(x/(n+1)))+sin^(−1)(√((x+1)/(n+1)))] 
Selected functions
Measure 
Definition 
Expression(s) 

Bessel functions of the first kind 
Bessel functions occur as the solution to specific differential equations. They are described with reference to a parameter known as the order, shown as a subscript. For non-negative real orders Bessel functions can be represented as an infinite series. Order 0 expansions are shown here for standard (J) and modified (I) Bessel functions. Usage in spatial analysis arises in connection with directional statistics and spline curve fitting. See the Mathworld website entry for more details 
J0(x)=∑((−1)^k/(k!)^2)(x/2)^(2k) and I0(x)=∑(1/(k!)^2)(x/2)^(2k), k=0,1,2,… 
Exponential integral function, E1(x) 
A definite integral function. Used in association with spline curve fitting. See the Mathworld website entry for more details 
E1(x)=∫x→∞ (e^(−t)/t)dt, x>0 
Gamma function, Γ 
A widely used definite integral function: Γ(x)=∫0→∞ t^(x−1)e^(−t)dt. For positive integer values of x: Γ(x)=(x‑1)! and, for even x, Γ(x/2)=(x/2‑1)!. Also Γ(3/2)=(1/2)!=(√π)/2. See the Mathworld website entry for more details 
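The stated special values can be checked directly with Python's standard library Gamma function:

```python
import math

print(math.gamma(5))              # Gamma(5) = (5-1)! = 24
print(math.gamma(1.5))            # Gamma(3/2) = (1/2)! = sqrt(pi)/2
print(math.sqrt(math.pi) / 2)     # same value, computed directly
```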
Matrix expressions
Measure 
Definition 
Expression(s) 

Identity 
A matrix with diagonal elements 1 and off-diagonal elements 0 
I; AI=IA=A 

Determinant 
Determinants are only defined for square matrices. Let A be an n by n matrix with elements {aij}. The matrix Mij here is a subset of A known as the minor, formed by eliminating row i and column j from A. An n by n matrix, A, with Det=0 is described as singular, and such a matrix has no inverse. If Det(A) is very close to 0 it is described as ill-conditioned 
|A|, Det(A)=∑j(−1)^(i+j)aijDet(Mij), expanding along row i 
Inverse 
The matrix equivalent of division in conventional algebra. For a matrix, A, to be invertible its determinant must be non-zero, and ideally not very close to zero. A matrix that has an inverse is by definition non-singular. A symmetric real-valued matrix is positive definite if all its eigenvalues are positive, whereas a positive semi-definite matrix allows for some eigenvalues to be 0. A matrix, A, that is invertible satisfies the relation AA‑1=I 
A‑1; AA‑1=A‑1A=I 
Transpose 
A matrix operation in which the rows and columns are transposed, i.e. in which elements aij are swapped with aji for all i,j. The inverse of a transposed matrix is the same as the transpose of the matrix inverse 
AT or A′ (AT)–1=(A‑1)T 
Symmetric 
A matrix in which element aij=aji for all i,j 
A=AT 
Trace 
The sum of the diagonal elements of a matrix, aii. The sum of the eigenvalues of a matrix equals its trace 
Tr(A) 
Eigenvalue, Eigenvector 
If A is a real-valued k by k square matrix and x is a non-zero real-valued vector, then a scalar λ that satisfies the equation shown in the adjacent column is known as an eigenvalue of A and x is an eigenvector of A. There are k eigenvalues of A, each with a corresponding eigenvector. The matrix A can be decomposed into three parts, as shown, where E is a matrix of its eigenvectors and D is a diagonal matrix of its eigenvalues 
(A‑λI)x=0 A=EDE‑1 (diagonalization) 
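For a 2 by 2 symmetric matrix the eigenvalues can be found by hand from the characteristic polynomial λ^2−Tr(A)λ+Det(A)=0, which also illustrates the trace and determinant identities above. A minimal sketch (plain Python; the function name `eig2_sym` is ours):

```python
import math

def eig2_sym(a, b, d):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]],
    from the characteristic polynomial lam^2 - Tr*lam + Det = 0."""
    tr, det = a + d, a * d - b * b
    disc = math.sqrt(tr * tr - 4 * det)   # always real for symmetric matrices
    return (tr + disc) / 2, (tr - disc) / 2

l1, l2 = eig2_sym(2.0, 1.0, 2.0)
print(l1, l2)        # eigenvalues of [[2,1],[1,2]]: 3.0 and 1.0
print(l1 + l2)       # equals the trace (4.0)
print(l1 * l2)       # equals the determinant (3.0)
```

Both eigenvalues are positive here, so the matrix is positive definite in the sense described in the Inverse entry above.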