9 - Confidence Intervals

Department of Environmental Science, AUT

Confidence Intervals: Prerequisites

Confidence Intervals

Content you should have understood before watching this video:

Number 2, ‘Variables’
Number 4, ‘Basic Statistical Metrics’
Number 5, ‘Standard Deviation and Standard Error’
Number 7, ‘Distributions’
Number 8, ‘Quantiles and Probabilities’

Calculating confidence intervals

A 95% confidence interval is an interval constructed so that, if we took many samples and computed an interval from each, about 95% of those intervals would contain the true mean.
In plain language, it gives us a range within which the true mean likely lies.
Confidence intervals are computed by subtracting an error term from the mean and adding it to the mean:

\[CI = \text{mean} \pm \text{error}\]

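For a 95% confidence interval and large samples (n > 30), the error is the 97.5% quantile of the standard normal distribution (1.96) times the standard error, i.e. the standard deviation divided by the square root of the sample size. The 97.5% quantile is used because a 95% interval leaves 2.5% in each tail. For example, with a mean of 10, a standard deviation of 1.58, and a sample size of 30: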

\[CI = 10 \pm 1.96 \frac{1.58}{\sqrt{30}} = 10 \pm 0.57\]

So the true mean is likely to lie between 9.43 and 10.57.
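
The arithmetic above can be checked with a few lines of R. This is only a sketch; the variable names (`m`, `s`, `n`, `z`) are chosen here for illustration and are not part of the lecture.

```r
# Values taken from the worked example above
m <- 10        # sample mean
s <- 1.58      # sample standard deviation
n <- 30        # sample size

z <- qnorm(0.975)           # 97.5% quantile of the standard normal, about 1.96
error <- z * s / sqrt(n)    # quantile times standard error
ci <- c(lower = m - error, upper = m + error)
round(ci, 2)
```

Rounded to two decimals, this gives 9.43 and 10.57, matching the hand calculation.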

Note that for small samples things are a bit different (the t distribution is usually used instead of the normal distribution), but we won’t go into that here.

In R (note the small sample size):

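x <- c(8, 12, 10, 9, 11)   # sample values
m <- mean(x)               # sample mean
n <- length(x)             # sample size (5)
# error term: 97.5% normal quantile times the standard error
error <- qnorm(p = 0.975) * sd(x) / sqrt(n)
lower <- m - error
upper <- m + error
lower
[1] 8.614096
upper
[1] 11.3859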

The formula for a 95% CI is simple:

\[CI = \text{mean} \pm \text{quantile}_{97.5} \frac{\text{sd}}{\sqrt{n}}\]

Adapt the quantile if you would like a different confidence level, e.g. use the 95% quantile (about 1.645) for a 90% CI.
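
To make the role of the quantile explicit, the formula can be wrapped in a small function. The helper `ci_normal()` and its `level` argument are hypothetical names introduced here for illustration; the sketch assumes the same normal-based recipe as above.

```r
# Hypothetical helper (not from the lecture): normal-based CI for a mean
ci_normal <- function(x, level = 0.95) {
  alpha <- 1 - level
  q <- qnorm(1 - alpha / 2)              # 0.975 quantile for 95%, 0.95 for 90%
  error <- q * sd(x) / sqrt(length(x))   # quantile times standard error
  c(lower = mean(x) - error, upper = mean(x) + error)
}

x <- c(8, 12, 10, 9, 11)
ci95 <- ci_normal(x)                 # 95% CI
ci90 <- ci_normal(x, level = 0.90)   # 90% CI, narrower than the 95% CI
ci95
ci90
```

Lowering the confidence level shrinks the quantile, so the 90% interval is narrower than the 95% interval for the same data.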

An example

You want to provide a confidence interval on post-operative recovery (number of days, following approximately a normal distribution). You have 157 patients, and the average recovery is 83 days, with a standard deviation of 9 days. Give a 90% confidence interval for the mean recovery time.
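
One possible way to work through this example in R, assuming the normal-based recipe from the previous slides (the sample is large, n = 157). For a 90% interval, 5% is left in each tail, so the 95% quantile is used. Variable names are chosen here for illustration.

```r
m <- 83     # mean recovery time in days
s <- 9      # standard deviation in days
n <- 157    # number of patients

q <- qnorm(0.95)            # 95% quantile for a 90% CI, about 1.645
error <- q * s / sqrt(n)    # quantile times standard error
ci <- c(lower = m - error, upper = m + error)
round(ci, 2)
```

Under these assumptions, the 90% confidence interval is roughly 81.82 to 84.18 days.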

The most important points in a nutshell

A 95% confidence interval does NOT mean that our particular interval contains the true mean with a 95% chance. Instead, it means that intervals built this way would contain the true mean about 95% of the time if we took many samples from a population.
Let’s not forget: it assumes normality of the data!
Follow the ‘recipe’ strictly to compute CIs; it is not difficult!
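
The "multiple samples" interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: the population mean and standard deviation, the sample size, the number of repetitions, and the seed are all arbitrary values picked here, not taken from the lecture.

```r
set.seed(1)        # arbitrary seed, for reproducibility only
true_mean <- 50    # assumed population mean
true_sd <- 10      # assumed population standard deviation
n <- 100           # size of each sample
reps <- 10000      # number of repeated samples

# For each repetition: draw a sample, build a 95% CI, check if it covers true_mean
covered <- replicate(reps, {
  x <- rnorm(n, mean = true_mean, sd = true_sd)
  error <- qnorm(0.975) * sd(x) / sqrt(n)
  (mean(x) - error <= true_mean) && (true_mean <= mean(x) + error)
})

mean(covered)      # share of intervals that contain the true mean
```

The resulting share should land close to 0.95. Any single interval either contains the true mean or does not; the 95% describes how often the procedure succeeds across repeated samples.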