I have noticed the use of “standard error” creeping into the analyses and reporting across a range of different datasets and studies. In almost all cases, its use has been erroneous and is, in fact, misleading. Below is a discussion to highlight this point.
Firstly, a crucial point that you must always remember is that a mean value by itself is meaningless — to make sense of a mean value, some indication of the variation around the mean is required!
We use this variation around means (or medians) as a way to preliminarily assess whether such values are significantly different. At its simplest, two populations of data with, for example, mean values of 900 and 1000 would be judged to be very different if the ranges of values in the two populations do not overlap, i.e. every value in one population lies above every value in the other.
But what if there is a substantial overlap between the two ranges?
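To make the idea concrete, here is a minimal sketch (only the two mean values, 900 and 1000, come from the example above; the spreads, sample sizes and seed are illustrative assumptions) that simulates two pairs of groups and checks whether their observed ranges overlap:

```python
import numpy as np

rng = np.random.default_rng(42)

# Two groups with means of 900 and 1000 (from the example above).
# The standard deviations and sample sizes are illustrative assumptions:
# a small spread gives clearly separated ranges, a large spread gives overlap.
tight_a = rng.normal(loc=900, scale=15, size=30)
tight_b = rng.normal(loc=1000, scale=15, size=30)

wide_a = rng.normal(loc=900, scale=150, size=30)
wide_b = rng.normal(loc=1000, scale=150, size=30)

def ranges_overlap(x, y):
    """True if the observed ranges (min to max) of x and y overlap."""
    return x.min() <= y.max() and y.min() <= x.max()

print(ranges_overlap(tight_a, tight_b))  # almost certainly False: an easy call
print(ranges_overlap(wide_a, wide_b))    # almost certainly True: hard to judge
```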
It is difficult to make an assessment as to whether these two mean values are significantly different or not (remember, significance is usually verified using a statistical test). Thus, one needs a different statistic; when data are normally distributed this statistic is the variance. The variance is the average squared difference between each data point and the mean: the greater the spread of values in a population of data, the greater the variance will be. The sample variance is calculated as: \[ s^2=\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1} \] where the \(x_i\) are the individual data points, \(\bar{x}\) is the sample mean and \(n\) is the sample size.
However, variance is based on squared values and is therefore no longer in the same units as the mean. Variance is predominantly used in statistical theory, whereas biologists use another statistic: the standard deviation. Standard deviation is the preferred statistic for reporting the variation around a mean because it is in the same units as the mean, and is therefore far more intuitive to interpret. Its calculation is straightforward: it is simply the square root of the variance.
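As a quick illustration (the sample values below are made up purely for demonstration), this sketch computes the sample variance and standard deviation by hand and checks the result against NumPy's built-in estimator:

```python
import numpy as np

# Made-up sample values, purely for illustration (arbitrary units).
x = np.array([1056.0, 998.0, 1102.0, 951.0, 1003.0, 1047.0, 989.0, 1010.0])

n = x.size
mean = x.mean()

# Sample variance: the average squared deviation from the mean
# (using the usual n - 1 denominator), reported in squared units.
variance = np.sum((x - mean) ** 2) / (n - 1)

# Standard deviation: the square root of the variance,
# which puts the spread back into the same units as the mean.
sd = np.sqrt(variance)

print(f"mean = {mean:.1f}, variance = {variance:.1f}, sd = {sd:.1f}")
print(np.isclose(sd, x.std(ddof=1)))  # True: matches NumPy's sample SD
```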
So, what then is “standard error”? This statistic is calculated by dividing the standard deviation by the square root of the sample size: \[ se=\frac{sd}{\sqrt{n}} \]
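As a tiny worked example (the numbers are assumptions chosen to give round results), with a standard deviation of 100, quadrupling the sample size halves the standard error:

```python
import numpy as np

sd = 100.0                   # assumed sample standard deviation
for n in (25, 100, 400):     # quadrupling n each time
    se = sd / np.sqrt(n)     # standard error of the mean
    print(n, se)             # 25 -> 20.0, 100 -> 10.0, 400 -> 5.0
```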
Thus, increasing your sample size will reduce the value of the standard error. As a quick example, data sampled from a simulated distribution are summarised in Table 1 below.
Table 1: Mean, standard deviation (SD) and standard error (SE) for samples of increasing size (n) drawn from a simulated distribution.

| n | Mean | SD | SE |
|---|---|---|---|
| 10 | 1,056.7 | 104.7 | 33.1 |
| 25 | 1,049.7 | 79.0 | 15.8 |
| 50 | 1,004.4 | 92.6 | 13.1 |
| 100 | 990.4 | 100.8 | 10.1 |
| 1,000 | 996.7 | 98.1 | 3.1 |
| 10,000 | 998.7 | 99.9 | 1.0 |
| 100,000 | 1,000.1 | 99.9 | 0.3 |
| 1,000,000 | 1,000.1 | 99.9 | 0.1 |
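The exact code behind Table 1 is not shown in this post, but a simulation of this kind can be sketched as follows (the true mean of 1000 comes from the text below; the true standard deviation of roughly 100 is inferred from the SD column):

```python
import numpy as np

rng = np.random.default_rng(1)

# True parameters of the simulated distribution: the true mean (1000) is given
# in the text below the table; the true SD (~100) is inferred from the SD column.
TRUE_MEAN, TRUE_SD = 1000, 100

print(f"{'n':>9} {'Mean':>9} {'SD':>7} {'SE':>6}")
for n in (10, 25, 50, 100, 1_000, 10_000, 100_000, 1_000_000):
    sample = rng.normal(TRUE_MEAN, TRUE_SD, size=n)
    mean = sample.mean()
    sd = sample.std(ddof=1)      # sample standard deviation
    se = sd / np.sqrt(n)         # standard error of the mean
    print(f"{n:>9} {mean:>9.1f} {sd:>7.1f} {se:>6.1f}")
```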
You will notice in Table 1 that as the sample size (n) increases, the estimated (or sampled) mean value closes in on the mean value used to simulate the data (i.e. 1000), also known as the ‘true’ mean. The same applies to the standard deviation. However, the standard error just gets smaller and smaller…
What is going on here?
The full name for standard error is the standard error of the mean. Thus, standard error is an estimate of how precisely the sample mean estimates the true mean! With a low sample size, the standard error is large, and the estimated mean is some way off the true mean (the one used to simulate the data). But as the sample size increases, the standard error declines, as does the difference between the estimated mean and the true mean.
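One way to see this is to draw many repeated samples from the same distribution: the spread of the resulting sample means is exactly what the standard error is estimating from a single sample. A minimal sketch, again assuming a true mean of 1000 and a true standard deviation of 100:

```python
import numpy as np

rng = np.random.default_rng(7)

TRUE_MEAN, TRUE_SD = 1000, 100   # same illustrative parameters as above
n = 25

# Draw many independent samples of size n and record each sample mean.
sample_means = rng.normal(TRUE_MEAN, TRUE_SD, size=(10_000, n)).mean(axis=1)

# The spread of those sample means is what the standard error estimates.
print(sample_means.std(ddof=1))   # empirical SD of the sample means (~20)
print(TRUE_SD / np.sqrt(n))       # theoretical SE: sd / sqrt(n) = 20.0

# A single sample's SE is an estimate of that same quantity.
one_sample = rng.normal(TRUE_MEAN, TRUE_SD, size=n)
print(one_sample.std(ddof=1) / np.sqrt(n))
```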
Thus, the standard error should only ever be reported when you wish to convey how accurately the mean value itself has been estimated. It does not (!) provide any indication of the variation present in the dataset. I said above that a mean value by itself is meaningless; the standard error only tells you a little bit more about the mean value, so the mean remains meaningless as a description of the data. In almost all cases, you should report a statistic that gives some indication of the variation around the mean (e.g. the standard deviation).
Just to be absolutely clear — there is no way you can report mean values with standard error bars and make judgement calls on whether different groups (or ‘populations’) within the data are significantly different from one another.