
Measure

Definition

Expression(s)

Range

The difference between the maximum and minimum values of a set

$\mathrm{Range}\{x_i\} = X_n - X_1$

Lower quartile (25%), LQ

In an ordered set, 25% of data items are less than or equal to the upper bound of this range. For a continuous distribution the LQ is the set of values from 0% to 25% (0.25) obtained from the cumulative distribution of the values or function. The treatment of even and odd n, and of indexing from 1 to n or from 0 to n, varies.

$LQ = \{X_1, \ldots, X_{(n+1)/4}\}$

Upper quartile (75%), UQ

In an ordered set, 75% of data items are less than or equal to the lower bound of this range. For a continuous distribution the UQ is the set of values from 75% (0.75) to 100% obtained from the cumulative distribution of the values or function. The treatment of even and odd n, and of indexing from 1 to n or from 0 to n, varies.

$UQ = \{X_{3(n+1)/4}, \ldots, X_n\}$

Inter-quartile range, IQR

The difference between the lower and upper quartile values, hence covering the middle 50% of the distribution. The inter-quartile range can be obtained by taking the median of the dataset, then finding the median of the upper and lower halves of the set. The IQR is then the difference between these two secondary medians

IQR=UQ-LQ
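The median-of-halves procedure described above can be sketched in Python as follows. This is one convention among several, not a definitive implementation: the function names are illustrative, and excluding the middle value from both halves when n is odd is an assumption, since the source notes that the treatment of odd and even n varies.

```python
from statistics import median


def value_range(values):
    """Difference between the maximum and minimum values."""
    return max(values) - min(values)


def quartiles_by_halves(values):
    """Return (LQ, median, UQ) using the median-of-halves method."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = xs[:half]
    # Assumption: for odd n the middle value belongs to neither half.
    upper = xs[half + (n % 2):]
    return median(lower), median(xs), median(upper)


def iqr(values):
    """Inter-quartile range: upper quartile value minus lower quartile value."""
    lq, _, uq = quartiles_by_halves(values)
    return uq - lq


data = [7, 1, 3, 9, 5, 11, 13, 15]
lq, med, uq = quartiles_by_halves(data)
print(value_range(data), lq, med, uq, iqr(data))
```

Other quartile conventions (for example, interpolating between order statistics) can give slightly different values on small samples.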

Trim-range, TR, t

The range computed with a specified percentage (proportion), t/2, of the highest and of the lowest values removed, to eliminate outliers and extreme values. For small samples a specific number of observations (e.g. 1), rather than a percentage, may be ignored. In general an equal number, k, of high and low values is removed (if possible)

$TR_t = X_{n(1-t/2)} - X_{nt/2}, \quad t \in [0,1]$

$TR_{50\%} = IQR$
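A minimal sketch of the trim-range, assuming that k values are dropped from each tail and that a proportion t is converted to k by rounding down. Both function names and the rounding rule are illustrative assumptions, not taken from the source.

```python
def trimmed_range(values, k=1):
    """Range after removing the k lowest and the k highest values."""
    xs = sorted(values)
    if k < 0 or 2 * k >= len(xs):
        raise ValueError("k must leave at least one value")
    kept = xs[k:len(xs) - k]
    return kept[-1] - kept[0]


def trimmed_range_by_proportion(values, t):
    """Trim a total proportion t, i.e. t/2 from each tail."""
    if not 0 <= t < 1:
        raise ValueError("t must satisfy 0 <= t < 1")
    # Assumption: the number removed per tail is rounded down.
    k = int(len(values) * t / 2)
    return trimmed_range(values, k)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(trimmed_range(data, k=0))   # plain range, outlier included
print(trimmed_range(data, k=1))   # one value dropped from each end
print(trimmed_range_by_proportion(data, 0.5))
```

With t = 0.5 a quarter of the values is dropped from each tail, which corresponds to the relation TR50% = IQR up to the quartile convention in use.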

Variance, Var, σ², s², μ₂

The average squared difference of values in a dataset from their population mean, μ, or from the sample mean (in which case it is also known as the sample variance, where the data are a sample from a larger population). Differences are squared to remove the effect of negative values (the sum of the unsquared differences from the mean is always 0). In the frequency form of the expression the frequencies have been standardised, i.e. ∑fᵢ = 1. Var is a function of the second moment about the mean. The population variance is often denoted by the symbol μ₂ or σ².

The estimated population variance is often denoted by s² or by σ² with a ^ symbol above it
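The three forms of the variance described above can be sketched as follows. The function names are illustrative; using the divisor n − 1 for the estimated population variance, and treating the frequency form as a population variance, are assumptions based on common practice rather than on expressions stated in the text.

```python
def population_variance(xs):
    """Mean squared difference from the mean (divisor n)."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


def estimated_population_variance(xs):
    """Assumed estimator: squared differences from the sample mean, divisor n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def frequency_variance(values, freqs):
    """Frequency form; freqs are standardised so that they sum to 1."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    mu = sum(f * x for x, f in zip(values, freqs))
    return sum(f * (x - mu) ** 2 for x, f in zip(values, freqs))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))
print(estimated_population_variance(data))
# Same data expressed as distinct values with relative frequencies.
print(frequency_variance([2, 4, 5, 7, 9], [1/8, 3/8, 2/8, 1/8, 1/8]))
```

The frequency form reproduces the population variance of the raw data because each distinct value is weighted by its relative frequency.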

Standard deviation, SD, s or RMSD

The square root of the variance, hence it is the Root Mean Squared Deviation (RMSD). The population standard deviation is often denoted by the symbol σ. SD* shows the estimated population standard deviation (sometimes denoted by σ with a ^ symbol above it or by s)

Standard error of the mean, SE

The estimated standard deviation of the mean values of n samples from the same population. It is simply the sample standard deviation divided by the square root of the number of samples, n ≥ 1
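A short sketch of the standard deviation and the standard error of the mean. The names are illustrative, and the use of the n − 1 divisor for the estimated standard deviation that feeds the standard error is an assumption.

```python
import math


def population_sd(xs):
    """Root mean squared deviation from the mean (divisor n)."""
    n = len(xs)
    mu = sum(xs) / n
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / n)


def estimated_sd(xs):
    """Assumed estimator of the population SD (divisor n - 1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


def standard_error(xs):
    """Estimated SD divided by the square root of the sample size."""
    return estimated_sd(xs) / math.sqrt(len(xs))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(data), estimated_sd(data), standard_error(data))
```

Because of the square root in the denominator, quadrupling the sample size halves the standard error for a given standard deviation.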

 

Root mean squared error, RMSE

The standard deviation of samples from a known set of true values, xi*. If xi* are estimated by the mean of sampled values RMSE is equivalent to RMSD

Mean deviation/error, MD or ME

The mean deviation of samples from the known set of true values, xi*

Mean absolute deviation/error, MAD or MAE

The mean absolute deviation of samples from the known set of true values, xi*
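The three error measures above differ only in how the individual differences from the true values xi* are combined. A minimal sketch, with illustrative function names and made-up example numbers:

```python
import math


def _errors(observed, true_values):
    """Pairwise differences between observed and true values."""
    if len(observed) != len(true_values) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return [x - x_true for x, x_true in zip(observed, true_values)]


def rmse(observed, true_values):
    """Root mean squared error."""
    errs = _errors(observed, true_values)
    return math.sqrt(sum(e ** 2 for e in errs) / len(errs))


def mean_error(observed, true_values):
    """Mean deviation/error: signed, so positive and negative errors cancel."""
    errs = _errors(observed, true_values)
    return sum(errs) / len(errs)


def mean_absolute_error(observed, true_values):
    """Mean absolute deviation/error: signs are discarded."""
    errs = _errors(observed, true_values)
    return sum(abs(e) for e in errs) / len(errs)


observed = [2.0, 4.0, 6.0, 8.0]      # hypothetical measurements
true_values = [1.0, 5.0, 6.0, 10.0]  # hypothetical known values
print(rmse(observed, true_values))
print(mean_error(observed, true_values))
print(mean_absolute_error(observed, true_values))
```

The mean error indicates bias (systematic over- or under-estimation), whereas RMSE and MAE measure the overall size of the errors, with RMSE weighting large errors more heavily.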

Covariance, Cov

Literally the pattern of common (or co-) variation observed in a collection of two (or more) datasets, or partitions of a single dataset. Note that if the two sets are the same the covariance is the same as the variance

Cov(x,x)=Var(x)

Correlation / product moment or Pearson’s correlation coefficient, r

A measure of the similarity between two (or more) paired datasets. The correlation coefficient is the ratio of the covariance to the product of the standard deviations. If the two datasets are the same or perfectly matched the result is 1

$r = \mathrm{Cov}(x,y) / (SD_x \, SD_y)$
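The relations Cov(x,x) = Var(x) and r = Cov(x,y)/(SDx SDy) can be checked with a small sketch. The divisor n is used throughout as an assumption; with n − 1 in both the covariance and the standard deviations the factor cancels, so r is unchanged. Function names are illustrative.

```python
import math


def covariance(xs, ys):
    """Mean product of paired deviations from the respective means (divisor n)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))  # Cov(x, x) is Var(x)
    sd_y = math.sqrt(covariance(ys, ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance(xs, ys) / (sd_x * sd_y)


x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]       # perfectly matched with x
print(covariance(x, x))     # equals the variance of x
print(correlation(x, y))    # approximately 1
print(correlation(x, y[::-1]))  # reversed pairing, approximately -1
```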

Coefficient of variation, CV

The ratio of the standard deviation to the mean, sometimes computed as a percentage. If this ratio is close to 1, and the distribution is strongly right (positively) skewed, it may suggest the underlying distribution is Exponential. Note, mean values close to 0 may produce unstable results

$CV = SD / \bar{x}$

Variance mean ratio, VMR

The ratio of the variance to the mean, sometimes computed as a percentage. If this ratio is close to 1, and the distribution is unimodal and relates to count data, it may suggest the underlying distribution is Poisson. Note, mean values close to 0 may produce unstable results

$VMR = \mathrm{Var} / \bar{x}$
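Both ratios follow directly from the definitions above. The sketch below uses Python's standard `statistics` module with the population (divisor n) forms, which is an assumption; the function names are illustrative. A zero mean is rejected outright, reflecting the instability noted for means close to 0.

```python
from statistics import mean, pstdev, pvariance


def coefficient_of_variation(xs):
    """Standard deviation divided by the mean."""
    m = mean(xs)
    if m == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return pstdev(xs) / m


def variance_mean_ratio(xs):
    """Variance divided by the mean."""
    m = mean(xs)
    if m == 0:
        raise ValueError("VMR is undefined when the mean is 0")
    return pvariance(xs) / m


counts = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical count data
print(coefficient_of_variation(counts))
print(variance_mean_ratio(counts))
```

Multiplying either result by 100 gives the percentage form mentioned in the definitions.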
