Measure |
Definition |
Expression(s) |
|
Range |
The difference between the maximum and minimum values of a set |
$\mathrm{Range}\{x_i\} = X_n - X_1$ |
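Here is a minimal Python sketch of the range. It assumes the data are a plain list of numbers. Sorting first makes the ordered values $X_1, \ldots, X_n$ explicit; for this purpose it is equivalent to `max(values) - min(values)`. The function name `data_range` is hypothetical.

```python
def data_range(values):
    """Range = X_n - X_1, where X_1..X_n are the values sorted in ascending order."""
    ordered = sorted(values)  # X_1 <= X_2 <= ... <= X_n
    return ordered[-1] - ordered[0]


print(data_range([4, 8, 15, 16, 23, 42]))  # 38
```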
|
Lower quartile (25%), LQ |
In an ordered set, 25% of the data items are less than or equal to the upper bound of this range. For a continuous distribution, the LQ is the set of values from 0% to 25% (0.25) of the cumulative distribution of the values or function. Conventions vary for how the cases where n is even or odd are treated, and for whether the index i runs from 1 to n or from 0 to n |
$LQ = \{X_1, \ldots, X_{(n+1)/4}\}$ |
|
Upper quartile (75%), UQ |
In an ordered set, 75% of the data items are less than or equal to the upper bound of this range. For a continuous distribution, the UQ is the set of values from 75% (0.75) to 100% of the cumulative distribution of the values or function. Conventions vary for how the cases where n is even or odd are treated, and for whether the index i runs from 1 to n or from 0 to n |
$UQ = \{X_{3(n+1)/4}, \ldots, X_n\}$ |
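The LQ and UQ formulas index positions $(n+1)/4$ and $3(n+1)/4$, which are often not whole numbers, so some rule is needed to resolve them. The sketch below assumes one common choice: linear interpolation between neighbouring ordered values. For n ≥ 3 this gives the same cut points as Python's `statistics.quantiles` with its default `method='exclusive'` (up to floating-point rounding). The function name `quartile_bound` is hypothetical. The LQ and UQ ranges are then the ordered values on either side of the interpolated bounds.

```python
import math
import statistics


def quartile_bound(values, p):
    """Value at 1-based position (n + 1) * p in the ordered data.

    p = 0.25 gives the upper bound of the LQ range, p = 0.75 the lower bound
    of the UQ range. Non-integer positions are resolved by linear
    interpolation (an assumed convention; others exist).
    """
    x = sorted(values)
    n = len(x)
    pos = (n + 1) * p          # e.g. (n + 1)/4 for the LQ
    lo = math.floor(pos)       # 1-based index of the ordered value just below
    frac = pos - lo
    if lo < 1:                 # position falls before X_1: clamp
        return x[0]
    if lo >= n:                # position falls at or after X_n: clamp
        return x[-1]
    return x[lo - 1] + frac * (x[lo] - x[lo - 1])


data = [2, 4, 4, 5, 7, 9, 10, 12]               # n = 8
lq = quartile_bound(data, 0.25)                # position 2.25
uq = quartile_bound(data, 0.75)                # position 6.75
lq_set = [v for v in sorted(data) if v <= lq]  # the LQ range itself
uq_set = [v for v in sorted(data) if v >= uq]  # the UQ range itself
print(lq, uq, lq_set, uq_set)                  # 4.0 9.75 [2, 4, 4] [10, 12]
print(statistics.quantiles(data, n=4))         # [4.0, 6.0, 9.75]
```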
|
Inter-quartile range, IQR |
The difference between the upper and lower quartile values (the lower bound of the UQ range minus the upper bound of the LQ range), hence covering the middle 50% of the distribution. The IQR can be obtained by finding the median of the dataset and then the medians of the lower and upper halves of the ordered set; the IQR is the difference between these two secondary medians |
$\mathrm{IQR} = UQ - LQ$ |
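The median-of-halves procedure described for the IQR can be sketched with Python's standard `statistics.median`. When n is odd, this sketch assumes the overall median is left out of both halves; including it is another common convention. On some data the result can differ slightly from interpolation-based quartiles, which is one more instance of the conventions varying. The function name is hypothetical.

```python
import statistics


def iqr_median_of_halves(values):
    """IQR as the difference between the medians of the upper and lower halves.

    Assumed convention: for odd n the overall median is excluded from both
    halves. Requires at least two values.
    """
    x = sorted(values)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = x[:half]
    upper = x[half + 1:] if n % 2 else x[half:]
    return statistics.median(upper) - statistics.median(lower)


print(iqr_median_of_halves([1, 3, 5, 7, 9, 11, 13]))     # halves [1, 3, 5] and [9, 11, 13]: 8
print(iqr_median_of_halves([2, 4, 4, 5, 7, 9, 10, 12]))  # 9.5 - 4.0 = 5.5
```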
|
Trim-range, TR, t |
The range computed after removing a specified proportion, t/2, of the highest values and the same proportion of the lowest values, to exclude outliers and extreme values. For small samples, a fixed number of observations (e.g. 1) may be removed instead of a percentage. In general, an equal number, k, of high and low values is removed (if possible) |
$TR_t = X_{n(1-t/2)} - X_{nt/2}$, $0 \le t \le 1$; $TR_{50\%} = \mathrm{IQR}$ |
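The sketch below computes the trim-range by removing an equal number, k, of the lowest and highest values. Converting a proportion t into k needs a rounding rule. Rounding n·t/2 down is an assumption made here, so for some n the result will not coincide exactly with the index formula. Both function names are hypothetical.

```python
def trim_range(values, k):
    """Range of the data after discarding the k lowest and k highest values."""
    x = sorted(values)
    if k < 0 or 2 * k >= len(x):
        raise ValueError("k must satisfy 0 <= k < n/2")
    return x[len(x) - 1 - k] - x[k]


def trim_range_prop(values, t):
    """TR_t: remove a proportion t/2 from each end, with k = floor(n*t/2) (assumed rounding)."""
    k = int(len(values) * t / 2)
    return trim_range(values, k)


data = [3, 4, 5, 5, 6, 7, 8, 9, 10, 95]  # one extreme value
print(trim_range(data, 0))         # ordinary range: 95 - 3 = 92
print(trim_range(data, 1))         # drop 3 and 95: 10 - 4 = 6
print(trim_range_prop(data, 0.2))  # t = 20%, so k = 1: 6
```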
|
Variance, Var, σ², s², μ₂ |
The average squared difference of the values in a dataset from their population mean, $\mu$, or, when the data are a sample from a larger population, from the sample mean, $\bar{x}$ (the sample variance). Differences are squared to remove the effect of negative values (the sum of the raw differences would otherwise be 0). The estimated population variance divides the sum of squares by $n-1$ rather than $n$. The third formula is the frequency form, where the frequencies have been standardised, i.e. $\sum f_i = 1$. The population variance is the second moment about the mean, and is often denoted by $\mu_2$ or $\sigma^2$. The estimated population variance is often denoted by $s^2$ or $\hat{\sigma}^2$ |
$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$; $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$; $\sigma^2 = \sum_{i} f_i (x_i - \mu)^2$ |
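The three variance formulas translate directly into Python. The frequency form assumes the data are grouped into distinct values with counts, which the sketch standardises so that $\sum f_i = 1$. The function names are hypothetical. The results can be cross-checked against the standard library's `statistics.pvariance` (divisor n) and `statistics.variance` (divisor n − 1).

```python
import statistics


def population_variance(values):
    """sigma^2: mean squared deviation from the mean, dividing by n."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """s^2: estimated population variance, dividing by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def frequency_variance(classes, counts):
    """Frequency form: sum of f_i * (x_i - mu)^2, with the f_i standardised to sum to 1."""
    total = sum(counts)
    f = [c / total for c in counts]
    mu = sum(fi * xi for fi, xi in zip(f, classes))
    return sum(fi * (xi - mu) ** 2 for fi, xi in zip(f, classes))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))                             # 4.0
print(sample_variance(data))                                 # 32/7, about 4.571
print(frequency_variance([2, 4, 5, 7, 9], [1, 3, 2, 1, 1]))  # 4.0 (same data, grouped)
print(statistics.pvariance(data), statistics.variance(data))
```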
|
Standard deviation, SD, s or RMSD |
The square root of the variance, hence the Root Mean Squared Deviation (RMSD). The population standard deviation is often denoted by $\sigma$. SD* denotes the estimated population standard deviation (sometimes written $\hat{\sigma}$ or $s$) |
$SD = \sigma = \sqrt{\sigma^2}$; $SD^* = \sqrt{s^2}$ |
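Python's standard library provides both versions of the standard deviation: `statistics.pstdev` (σ, divisor n) and `statistics.stdev` (SD*, divisor n − 1). To make the RMSD reading concrete, the sketch also computes σ by hand as a root mean squared deviation. The helper name `rmsd` is hypothetical.

```python
import math
import statistics


def rmsd(values):
    """Root mean squared deviation from the mean, i.e. the population SD."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


data = [2, 4, 4, 4, 5, 5, 7, 9]
sigma = statistics.pstdev(data)   # population SD: square root of the 1/n variance
sd_star = statistics.stdev(data)  # SD*: square root of the 1/(n - 1) variance
print(sigma, rmsd(data))          # 2.0 2.0
print(sd_star)                    # sqrt(32/7), about 2.138
```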
|
Standard error of the mean, SE |
The estimated standard deviation of the sample mean, i.e. of the means that would be obtained from repeated samples of size $n$ drawn from the same population. It is the estimated population standard deviation, SD*, divided by the square root of the sample size, $n \ge 2$ |
$SE = SD^*/\sqrt{n}$ |
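The following sketch computes the standard error using SD* (the n − 1 form), as the text specifies. The second part is a small simulation with arbitrary assumed parameters: a normal population with σ = 2, samples of size 25, and a fixed seed. It illustrates what the SE estimates, namely the spread of the means of repeated samples, which should come out close to σ/√n = 0.4. The function name is hypothetical.

```python
import math
import random
import statistics


def standard_error(values):
    """SE of the mean: SD* (n - 1 form) divided by the square root of n."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    return statistics.stdev(values) / math.sqrt(n)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_error(data))  # sqrt(32/7) / sqrt(8), about 0.756

# Simulation (assumed parameters): SD of the means of 2000 samples of size 25
rng = random.Random(1)
means = [statistics.fmean(rng.gauss(10, 2) for _ in range(25)) for _ in range(2000)]
print(statistics.stdev(means))  # should be close to 2 / sqrt(25) = 0.4
```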
|
Root mean squared error, RMSE |
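The root mean squared difference between sample values, $x_i$, and a known set of true values, $x_i^*$. If every $x_i^*$ is replaced by the mean of the sampled values, the RMSE equals the RMSD |
$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - x_i^*)^2}$ |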
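Here is a minimal RMSE sketch. It assumes paired lists of sample values and true values of equal length; the observed and true numbers are made-up illustration data. The final line checks the equivalence noted in the definition: with every true value replaced by the sample mean, the RMSE reduces to the population standard deviation (RMSD). The function name is hypothetical.

```python
import math
import statistics


def rmse(observed, truth):
    """Root mean squared error of observed values against true values x_i*."""
    if len(observed) != len(truth):
        raise ValueError("observed and truth must have the same length")
    n = len(observed)
    return math.sqrt(sum((x - t) ** 2 for x, t in zip(observed, truth)) / n)


observed = [2.1, 3.9, 6.2, 7.8]
truth = [2.0, 4.0, 6.0, 8.0]
print(rmse(observed, truth))  # sqrt(0.1 / 4), about 0.158

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.fmean(data)
print(rmse(data, [mean] * len(data)), statistics.pstdev(data))  # 2.0 2.0
```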
|
Mean deviation/error, MD or ME |
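The mean (signed) deviation of sample values, $x_i$, from a known set of true values, $x_i^*$. Positive and negative deviations can cancel, so MD indicates systematic bias rather than the typical size of the errors |
$\mathrm{MD} = \frac{1}{n}\sum_{i=1}^{n}(x_i - x_i^*)$ |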
|
Mean absolute deviation/error, MAD or MAE |
The mean absolute deviation of sample values, $x_i$, from a known set of true values, $x_i^*$ |
$\mathrm{MAD} = \frac{1}{n}\sum_{i=1}^{n}\lvert x_i - x_i^*\rvert$ |
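The sketch below computes MD and MAD on made-up paired data, assuming deviations are taken as sample minus true value ($x_i - x_i^*$). The opposite sign convention is also used for MD. In this example the positive and negative errors cancel, so MD is essentially 0 while MAD is not. The abbreviation MAD is also widely used for the median absolute deviation, which is a different measure. Function names are hypothetical.

```python
def mean_deviation(observed, truth):
    """MD: mean of the signed deviations x_i - x_i*."""
    return sum(x - t for x, t in zip(observed, truth)) / len(observed)


def mean_absolute_deviation(observed, truth):
    """MAD: mean of the absolute deviations |x_i - x_i*|."""
    return sum(abs(x - t) for x, t in zip(observed, truth)) / len(observed)


observed = [2.1, 3.9, 6.2, 7.8]
truth = [2.0, 4.0, 6.0, 8.0]
print(mean_deviation(observed, truth))           # about 0: the errors cancel
print(mean_absolute_deviation(observed, truth))  # about 0.15
```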
|
Covariance, Cov |
Literally, the pattern of common (or co-) variation observed in a collection of two (or more) datasets, or partitions of a single dataset. If the two datasets are the same, the covariance equals the variance |
$\mathrm{Cov}(x,y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$; $\mathrm{Cov}(x,x) = \mathrm{Var}(x)$ |
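This sketch computes the covariance in the 1/n form, with an optional n − 1 divisor for sample estimates; the `sample` flag is a hypothetical parameter, not part of any library. The last line checks the identity Cov(x, x) = Var(x) on a small made-up dataset.

```python
def covariance(x, y, sample=False):
    """Mean product of deviations from the means; divides by n - 1 if sample=True."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    s = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return s / (n - 1 if sample else n)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(covariance(x, y))               # 6 / 5 = 1.2
print(covariance(x, y, sample=True))  # 6 / 4 = 1.5

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(covariance(data, data))         # equals the population variance: 4.0
```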
|
Correlation, product moment or Pearson’s correlation coefficient, r |
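A measure of the linear association between two paired datasets (for more than two datasets, r is computed for each pair). The correlation coefficient is the ratio of the covariance to the product of the two standard deviations, and lies between −1 and 1. If the two datasets are identical, or one is an exact increasing linear function of the other, r = 1 |
$r = \dfrac{\mathrm{Cov}(x,y)}{SD_x\,SD_y}$ |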
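Here r = Cov(x, y)/(SD_x SD_y) is translated directly into Python, using the 1/n forms throughout. The divisor cancels, so the n − 1 forms give the same r. The second example illustrates that r = 1 whenever one dataset is an exact increasing linear function of the other, not only when the two are identical. The function name and data are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sdx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sdy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    if sdx == 0 or sdy == 0:
        raise ValueError("r is undefined when either dataset is constant")
    return cov / (sdx * sdy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_r(x, y))                       # sqrt(0.6), about 0.775
print(pearson_r(x, [2 * a + 1 for a in x]))  # 1.0, up to rounding
```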
|
|
Coefficient of variation, CV |
The ratio of the standard deviation to the mean, sometimes expressed as a percentage. If this ratio is close to 1 and the distribution is strongly right (positively) skewed, the underlying distribution may be Exponential, whose standard deviation equals its mean. Note that mean values close to 0 may produce unstable results |
$\mathrm{CV} = SD/\bar{x}$ (×100 for a percentage) |
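This sketch of the CV assumes SD* (the n − 1 form) in the numerator and optionally returns a percentage. It refuses a zero mean outright, in line with the warning about unstable results near 0. The simulated exponential sample (rate 0.5 and seed 42, both arbitrary choices) should give a CV near 1, because an Exponential distribution's standard deviation equals its mean. The function name is hypothetical.

```python
import math
import random


def coefficient_of_variation(values, as_percent=False):
    """CV = SD* / mean, optionally multiplied by 100."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    cv = sd / mean
    return 100 * cv if as_percent else cv


rng = random.Random(42)
sample = [rng.expovariate(0.5) for _ in range(100_000)]        # Exponential with mean 2
print(coefficient_of_variation(sample))                         # close to 1
print(coefficient_of_variation([9, 10, 11], as_percent=True))  # 10.0
```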
|
Variance mean ratio, VMR |
The ratio of the variance to the mean, sometimes expressed as a percentage. If this ratio is close to 1 and the data are unimodal counts, the underlying distribution may be Poisson, whose variance equals its mean. Note that mean values close to 0 may produce unstable results |
$\mathrm{VMR} = \mathrm{Var}/\bar{x}$ |
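This sketch of the VMR assumes the n − 1 sample variance. Python's standard `random` module does not include a Poisson sampler, so simulated counts are drawn with Knuth's multiplication method, which is simple but only suitable for small means. The rate (3.0), seed and sample size are arbitrary assumptions. For Poisson counts the VMR should come out close to 1. Both function names are hypothetical.

```python
import math
import random


def poisson_knuth(rng, lam):
    """One Poisson(lam) draw via Knuth's multiplication method (small lam only)."""
    threshold = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > threshold:  # multiply uniforms until the product drops to e^-lam
        k += 1
        p *= rng.random()
    return k


def variance_mean_ratio(counts):
    """VMR = sample variance (n - 1 form) / mean."""
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("VMR is undefined for a zero mean")
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean


rng = random.Random(7)
counts = [poisson_knuth(rng, 3.0) for _ in range(20_000)]
print(variance_mean_ratio(counts))  # close to 1 for Poisson counts
```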
|
|