Department of Environmental Science, AUT

The standard deviation and the standard error: Prerequisites

Standard Deviation and Standard Error

Content you should have understood before watching this video:

  • Number 1, ‘Variables’
  • Number 2, ‘Variation’
  • Number 3, ‘Measuring Variation’

Standard deviation

  • The variance has one drawback: it is measured in squared units
  • Squared units are hard to interpret, so we take the square root
  • This is the standard deviation (\(s\), sometimes \(sd\)):

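\[s = \sqrt{\frac{\sum(x_i-\bar{x})^2}{n-1}} = \sqrt{\frac{5.2}{4}} = 1.14\]

For the friends data below, \(\bar{x} = 2.6\), the sum of squared deviations is 5.2 and \(n - 1 = 4\).

NB: conventionally, \(s\) denotes the sample standard deviation, while \(\sigma\) denotes the population standard deviation.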

In R:

friends <- c(1, 2, 3, 3, 4)
sd(friends)
[1] 1.140175
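
The numbers in the formula can be reproduced step by step. This is a minimal sketch using only base R; it computes the sum of squared deviations, divides by \(n - 1\) and takes the square root, then compares the result with sd().

```r
friends <- c(1, 2, 3, 3, 4)
n <- length(friends)                     # number of observations (5)
ss <- sum((friends - mean(friends))^2)   # sum of squared deviations: 5.2
s <- sqrt(ss / (n - 1))                  # divide by n - 1, then take the square root
s
sd(friends)                              # the built-in function gives the same value
```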

Sample standard deviation: why divide by n-1?
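
One way to see why the sample formula divides by \(n - 1\) is a small simulation. This is a sketch under assumptions chosen only for illustration: samples of size 5 are drawn from a normal population whose variance is known to be 4, and each sample's variance is computed twice, once dividing by \(n\) and once by \(n - 1\). Because the deviations are measured from the sample mean rather than the unknown population mean, dividing by \(n\) underestimates the population variance on average (by a factor of \((n-1)/n\)); dividing by \(n - 1\) removes that bias (Bessel's correction).

```r
set.seed(1)              # make the simulation reproducible
sigma <- 2               # assumed population standard deviation (variance = 4)
n <- 5                   # small sample size, where the bias is largest
reps <- 100000           # number of simulated samples

var_n <- numeric(reps)   # variance estimates dividing by n
var_n1 <- numeric(reps)  # variance estimates dividing by n - 1
for (i in seq_len(reps)) {
  sample_i <- rnorm(n, mean = 0, sd = sigma)
  ss <- sum((sample_i - mean(sample_i))^2)  # sum of squared deviations
  var_n[i] <- ss / n
  var_n1[i] <- ss / (n - 1)
}

mean(var_n)    # close to 4 * (n - 1) / n = 3.2: too small on average
mean(var_n1)   # close to the true variance of 4
```

Strictly, the correction makes the variance unbiased; its square root \(s\) is still slightly biased low, but that bias is small.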

Standard deviation and standard error

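Consider two samples, x and y, with almost the same standard deviation but very different sample sizes (2 versus 6 observations):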

x = c(10, 20)
y = c(5, 18, 22, 13, 9, 23)
sd(x)
[1] 7.071068
sd(y)
[1] 7.238784

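Although y has three times as many observations, and should therefore give a more reliable estimate of the mean, the two standard deviations are almost identical. So the standard deviation does not indicate how well we can estimate the mean. For that purpose we use the standard error of the mean, where \(s\) (sd in R) is the sample standard deviation: \[s.e. = \frac{s}{\sqrt{n}}\]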

sd(x)/sqrt(2)
[1] 5
sd(y)/sqrt(6)
[1] 2.955221
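
The standard error can be read as the standard deviation of the sample mean across repeated samples. A minimal simulation sketch, assuming a hypothetical normal population with mean 15 and standard deviation 7 (values chosen only to resemble y): draw many samples of size 6, compute each sample's mean, and compare the spread of those means with \(\sigma/\sqrt{n}\).

```r
set.seed(42)   # make the simulation reproducible
sigma <- 7     # assumed population standard deviation (hypothetical)
n <- 6         # same sample size as y

# Mean of each of 50,000 simulated samples of size n
means <- replicate(50000, mean(rnorm(n, mean = 15, sd = sigma)))

sd(means)          # spread of the sample means across repeated samples
sigma / sqrt(n)    # theoretical standard error: about 2.86
```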

Important to remember

  • The variance and the standard deviation describe the same thing:
    • The spread of a variable, i.e. how much variability there is
    • The higher the value, the higher the variability
    • With increasing sample size, we obtain a more precise estimate of the variability (but the standard deviation does not systematically get smaller)
  • The standard error:
    • measures how precisely the sample mean estimates the population mean
    • decreases as the number of observations grows (in proportion to \(1/\sqrt{n}\)), because a larger sample gives a more precise estimate of the mean
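
The contrast between the two groups of points above can be checked by drawing samples of increasing size from the same hypothetical population (normal, mean 15, standard deviation 7; both values are assumptions for illustration). In this sketch the standard deviation fluctuates around 7 rather than shrinking, while the standard error falls roughly by half each time the sample size is quadrupled.

```r
set.seed(123)                       # make the simulation reproducible
sizes <- c(5, 20, 80, 320, 1280)    # each size is 4 times the previous one

# One sample per size; record n, the standard deviation and the standard error
result <- t(sapply(sizes, function(n) {
  v <- rnorm(n, mean = 15, sd = 7)  # assumed population: mean 15, sd 7
  c(n = n, sd = sd(v), se = sd(v) / sqrt(n))
}))
result
```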

Calculating the standard error in R:

friends <- c(1, 2, 3, 3, 4)
sd(friends)/sqrt(5) #or:
[1] 0.509902
sd(friends)/sqrt(length(friends)) #which of the two is better?
[1] 0.509902
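
Counting the observations with length() is generally the safer choice, because n then updates automatically when data are added or removed. A small reusable helper makes this explicit; the function name se() and its na.rm argument are hypothetical, not part of base R. Missing values are dropped before counting, since length() would otherwise count them as observations.

```r
# Hypothetical helper: standard error of the mean.
# n is counted from the data rather than typed in; NAs are dropped by default.
se <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

friends <- c(1, 2, 3, 3, 4)
se(friends)          # [1] 0.509902
se(c(friends, NA))   # same value: the NA is ignored
```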

The most important points in a nutshell

  • We use the standard error to show how well we can estimate the mean, so most ‘error bars’ show the s.e., not the s.d.
  • The standard deviation and the variance characterise the spread of a variable
  • In any case, always state what your error bars represent!
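
As an illustration of labelled error bars, the sketch below plots the means of the example samples x and y with ±1 standard error, using only base R graphics (barplot() and arrows()); the plot title states what the bars represent. The layout choices (bar chart, arrow head length) are arbitrary.

```r
x <- c(10, 20)
y <- c(5, 18, 22, 13, 9, 23)
groups <- list(x = x, y = y)

means <- sapply(groups, mean)                               # both means are 15
ses <- sapply(groups, function(v) sd(v) / sqrt(length(v)))  # 5 and 2.955221

# Bar chart of the means; barplot() returns the bar midpoints on the x axis
mids <- barplot(means, ylim = c(0, max(means + ses) * 1.1),
                ylab = "Mean", main = "Error bars: mean +/- 1 s.e.")
# Error bars drawn as arrows with flat (90 degree) heads at both ends
arrows(mids, means - ses, mids, means + ses, angle = 90, code = 3, length = 0.1)
```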