Content you should have understood before watching this video:
- Number 2, ‘Variables’
- Number 4, ‘Basic Statistical Metrics’
- Number 5, ‘Standard Deviation and Standard Error’
- Number 7, ‘Distributions’
- Number 8, ‘Quantiles and Probabilities’
Calculating confidence intervals
A 95% confidence interval is an interval constructed so that, if we took many samples and computed the interval from each one, it would contain the true mean 95% of the time.
In plain language, it gives us an idea within which range the true mean likely lies.
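The "95% of the time" idea can be illustrated with a small simulation. This is only a sketch: the population mean, standard deviation, sample size, and number of repetitions below are made-up values chosen for illustration, not taken from the course material.

```r
set.seed(1)            # fixed seed so the run is repeatable
true_mean = 10         # assumed population mean (illustrative)
true_sd = 1.5          # assumed population standard deviation (illustrative)
n = 50                 # size of each sample
n_samples = 10000      # how many samples we pretend to take

# For each simulated sample, compute a 95% CI and check
# whether it contains the true mean
covered = replicate(n_samples, {
  x = rnorm(n, mean = true_mean, sd = true_sd)
  error = qnorm(p = .975) * sd(x) / sqrt(n)
  (mean(x) - error <= true_mean) && (true_mean <= mean(x) + error)
})

# Share of intervals that contain the true mean
mean(covered)
```

The share should come out close to 0.95: each individual interval either contains the true mean or it does not, but across many samples about 95% of the intervals do.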
Confidence intervals are computed by subtracting/adding an error term to the mean:
For a 95% confidence interval and large samples (>30), the error term is the 97.5% quantile of the standard normal distribution (1.96) times the standard error. For example, with a mean of 10, a standard deviation of 1.58, and a sample size of 30:
\[CI = 10 \pm 1.96 \frac{1.58}{\sqrt{30}} = 10 \pm 0.57\]
So the true mean is likely to lie between 9.43 and 10.57.
Note that at small sample sizes, things are a bit different, but we won’t go into that here.
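The arithmetic of the example with a mean of 10, a standard deviation of 1.58, and a sample size of 30 can be checked in R. This is a minimal sketch; the variable names are chosen here for illustration.

```r
m = 10                     # sample mean from the example
s = 1.58                   # sample standard deviation from the example
n = 30                     # sample size from the example

q = qnorm(p = .975)        # 97.5% quantile of the standard normal, about 1.96
error = q * s / sqrt(n)    # quantile times the standard error

round(error, 2)            # 0.57
round(c(m - error, m + error), 2)   # 9.43 10.57
```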
Calculating confidence intervals
In R (note the small sample size):
x = c(8, 12, 10, 9, 11)
m = mean(x)
n = 5
error = qnorm(p = .975)*sd(x)/sqrt(n)
lower = m - error
upper = m + error
lower
[1] 8.614096
upper
[1] 11.3859
The formula for a 95% CI is simple:
\[CI = \text{mean} \pm \text{quantile}_{97.5} \frac{\text{sd}}{\sqrt{n}}\]
Adapt the quantile value if you would like to calculate a different interval, e.g. use the 95% quantile (1.645) for a 90% CI.
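One way to avoid looking up the quantile by hand is to wrap the formula in a small function. The helper `ci_normal` below is hypothetical (it is not part of R or of the course material) and simply applies the normal-quantile formula for a chosen confidence level.

```r
# Hypothetical helper: normal-quantile CI for the mean of x
ci_normal = function(x, level = 0.95) {
  m = mean(x)
  n = length(x)
  # A 95% CI uses the 97.5% quantile, a 90% CI the 95% quantile, etc.
  q = qnorm(p = 1 - (1 - level) / 2)
  error = q * sd(x) / sqrt(n)
  c(lower = m - error, upper = m + error)
}

x = c(8, 12, 10, 9, 11)
ci_normal(x, level = 0.95)   # same interval as the step-by-step version
ci_normal(x, level = 0.90)   # narrower interval
```

Lowering the confidence level shrinks the quantile and therefore the error term, so the 90% interval is narrower than the 95% one.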
An example
You want to provide a confidence interval for post-operative recovery time (number of days, approximately following a normal distribution). You have 157 patients, and the average recovery is 83 days, with a standard deviation of 9 days. Give a 90% confidence interval for the mean recovery time.
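One possible way to work through this example in R, following the same recipe with the 95% quantile for a 90% CI (a sketch; the variable names are chosen here for illustration):

```r
m = 83     # average recovery time in days
s = 9      # standard deviation in days
n = 157    # number of patients

# For a 90% CI, 5% is left out on each side, so we need the 95% quantile
error = qnorm(p = .95) * s / sqrt(n)

lower = m - error
upper = m + error
round(c(lower, upper), 2)   # 81.82 84.18
```

Under these assumptions, the 90% confidence interval for the mean recovery time runs from roughly 81.8 to 84.2 days.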
The most important points in a nutshell
- A 95% confidence interval does NOT mean that the interval from our sample contains the true mean with a 95% chance. Instead, it indicates that the true mean would be included 95% of the time if we took multiple samples from a population and computed an interval from each.
- Let’s not forget: it assumes normality of the data!
- Strictly follow the ‘recipe’ to compute CIs; it is not difficult!