Prediction intervals

Introduction

A prediction interval is a pair of lower and upper limits within which a future measurement taken from a population will fall with a given probability
It is based on what has been observed in a sample taken from the given population
Prediction intervals therefore describe the range of individual future measurements
Confidence intervals describe the uncertainty in estimates of population parameters (e.g. the mean), which cannot be observed directly (the whole population cannot be included in a sample so as to calculate the true parameter)
Therefore, under the assumption of a normal distribution of a variable in a population, a confidence interval is used to estimate a true mean or standard deviation, or at least the limits between which it may be located
- For a given confidence level, c%, if an experiment is repeated an infinite number of times, the calculated intervals will contain the true population parameter c% of the time
A prediction interval is simply concerned with the limits within which a future measurement of that variable will fall, and expresses this as a percentage probability
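
As a sketch of how the two intervals differ in the simplest case, assume a normally distributed variable with sample mean \(\bar{x}\), sample standard deviation \(s\), sample size \(n\) and significance level \(\alpha\) (these symbols are introduced here for illustration and are not part of the original notes). The standard textbook forms are then:

\[ \bar{x} \pm t_{1-\alpha/2,\,n-1} \, \frac{s}{\sqrt{n}} \qquad \text{(confidence interval for the mean)} \]

\[ \bar{x} \pm t_{1-\alpha/2,\,n-1} \, s \sqrt{1 + \frac{1}{n}} \qquad \text{(prediction interval for one new measurement)} \]

The extra \(1\) under the square root carries the variability of an individual measurement, which is why the prediction interval stays wide even when \(n\) is large.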

A serum total cholesterol value is taken from \(500\) patients after treatment with a new cholesterol-lowering drug
The sample mean is \(180.7\) mg/dL, with a standard deviation of \(19.4\) mg/dL
The \(95\)% confidence interval for the mean is \(179\) to \(182.4\) mg/dL
- This means that if the experiment were repeated infinitely many times, with a random sample taken from the population each time, then in \(95\)% of cases the calculated interval would contain the true population parameter
The \(95\)% prediction interval for a new measurement is \(142.5\) to \(218.9\) mg/dL
- This means that, with a probability of \(95\)%, a new measurement taken at random from the population would be within these limits
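
The repeated-sampling interpretation can be checked with a small simulation. The following is only a sketch: the population values (\(\mu = 180\), \(\sigma = 20\)), the sample size of 50 per repetition and the 2000 repetitions are arbitrary assumptions, and the intervals are computed with the usual t-based formulas for a normal variable.

```r
# Hypothetical simulation settings (assumptions, chosen for illustration only)
set.seed(1)
n_rep <- 2000   # number of repeated experiments
n     <- 50     # sample size in each experiment
mu    <- 180    # true population mean
sigma <- 20     # true population standard deviation

covers <- replicate(n_rep, {
  x      <- rnorm(n, mean = mu, sd = sigma)
  x_bar  <- mean(x)
  s      <- sd(x)
  t_crit <- qt(0.975, df = n - 1)

  ci_half <- t_crit * s / sqrt(n)          # half-width of the 95% confidence interval
  pi_half <- t_crit * s * sqrt(1 + 1 / n)  # half-width of the 95% prediction interval

  x_new <- rnorm(1, mean = mu, sd = sigma) # one future measurement

  c(ci = abs(mu - x_bar) <= ci_half,       # does the CI contain the true mean?
    pi = abs(x_new - x_bar) <= pi_half)    # does the PI contain the new measurement?
})

# Proportion of experiments in which each interval did its job
coverage <- rowMeans(covers)
coverage
```

Both proportions should come out close to 0.95, although they answer different questions: the first is about the true mean, the second about a single new measurement.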

Set the pseudo-random seed
Draw \(500\) values from a normal distribution with \(\mu = 180\) and \(\sigma = 20\)
Save the dataset as a data.frame

set.seed(123)
df <- data.frame(Cholesterol = round(rnorm(500, mean = 180, sd = 20), digits = 0))

mean(df$Cholesterol)

## [1] 180.694

sd(df$Cholesterol)

## [1] 19.44285

ci <- predict(lm(df$Cholesterol ~ 1),
              interval = "confidence")
ci[1,]

##      fit      lwr      upr 
## 180.6940 178.9856 182.4024

pi <- predict(lm(df$Cholesterol ~ 1),
              interval = "predict")

## Warning in predict.lm(lm(df$Cholesterol ~ 1), interval = "predict"): predictions on current data refer to _future_ responses

pi[1,]

##      fit      lwr      upr 
## 180.6940 142.4559 218.9321
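
For an intercept-only model such as this one, the same limits can be computed by hand from the sample mean, the sample standard deviation and a t quantile. The sketch below recreates the data set so that it runs on its own; the names `ci_manual` and `pi_manual` are illustrative and do not appear in the original code.

```r
# Recreate the simulated data set used above
set.seed(123)
df <- data.frame(Cholesterol = round(rnorm(500, mean = 180, sd = 20), digits = 0))

n      <- nrow(df)
x_bar  <- mean(df$Cholesterol)
s      <- sd(df$Cholesterol)
t_crit <- qt(0.975, df = n - 1)   # two-sided 95% quantile of the t distribution

# Confidence interval for the mean: uses the standard error of the mean
ci_manual <- x_bar + c(-1, 1) * t_crit * s / sqrt(n)

# Prediction interval for one new measurement: adds the spread of an individual value
pi_manual <- x_bar + c(-1, 1) * t_crit * s * sqrt(1 + 1 / n)

round(ci_manual, 4)
round(pi_manual, 4)
```

These values should agree with the `lwr` and `upr` columns returned by `predict()` above, which makes it clear that the only difference between the two intervals is the factor \(s/\sqrt{n}\) versus \(s\sqrt{1 + 1/n}\).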