Once the model is fitted we can use it to make predictions on the value of the response variable \(Y\), for any given value of \(x=x_0\).
For this example we will continue to use the {rocket} data.
rocket<-read.csv("https://raw.githubusercontent.com/kitadasmalley/sp21_MATH376LMT/main/data/rocketProp.csv",
header=TRUE)
colnames(rocket)<-c("obs", "shear", "age")
# fitting and store the SLR model
mod<-lm(shear~age, rocket)
We simple input \(x=x_0\) into the fitted equation \[\hat{\beta}_0+\hat{\beta}_1 x_0\]
# create a new data object using the name of the x variable
# evaluate at age =15
newdata<-data.frame(age=15)
# use the predict function to evaluate the fitted equation at x
predict(mod, newdata)
## 1
## 2070.518
\[\mu_{Y|x_0}=E[Y|x_0]=\beta_0+\beta_1 x_0\]
The form of a \(100(1-\alpha)%\) confidence interval for the mean response is given by
\[(\hat{\beta}_0+\hat{\beta}_1 x_0)\pm t_{df=n-2, \alpha/2}^* \times \sqrt{MS(Res)\times (\frac{1}{n}+\frac{(x_0-\bar{x})^2}{\sum_{i=1}^n (x_i-\bar{x})^2})}\]
# confidence interval for x=15
predict(mod, newdata, interval="confidence")
## fit lwr upr
## 1 2070.518 2024.289 2116.748
A confidence interval for the mean response does not reflecct the bounds in which we are realisitically expecting to observe a single new observation at \(x=x_0\). A single observation has more variablity the average of observations. Therefore, we account for that by changing the error term.
The form of a \(100(1-\alpha)%\) prediction interval for the mean response is given by
\[(\hat{\beta}_0+\hat{\beta}_1 x_0)\pm t_{df=n-2, \alpha/2}^* \times \sqrt{MS(Res)\times (1+\frac{1}{n}+\frac{(x_0-\bar{x})^2}{\sum_{i=1}^n (x_i-\bar{x})^2})}\]
# prediction interval for x=15
predict(mod, newdata, interval="predict")
## fit lwr upr
## 1 2070.518 1863.382 2277.655
confBand<-predict(mod, interval="confidence")
predBand<-predict(mod, interval="predict")
## Warning in predict.lm(mod, interval = "predict"): predictions on current data refer to _future_ responses
# rename cols in predBand to be unique from confBand
colnames(predBand)<-c("fit2", "lwr2", "upr2")
# create a new dataframe with the data and new bands
newDF<-cbind(rocket, confBand, predBand)
# use tidyverse to make graphic of bands
# install.packages(“tidyverse”)
library(tidyverse)
ggplot(newDF, aes(x=age, y=shear))+
geom_point()+
geom_abline(slope=mod$coefficients[2], intercept=mod$coefficients[1],
color="blue", lty=2, lwd=1)+
geom_line(aes(y=lwr), color="green", lty=2, lwd=1)+
geom_line(aes(y=upr), color="green", lty=2, lwd=1)+
geom_line(aes(y=lwr2), color="red", lty=2, lwd=1)+
geom_line(aes(y=upr2), color="red", lty=2, lwd=1)+
theme_bw()
Caution: Be careful of order of operations
age<-rocket$age
shear<-rocket$shear
beta_0<-mod$coefficients[1]
beta_0
## (Intercept)
## 2627.822
beta_1<-mod$coefficients[2]
beta_1
## age
## -37.15359
x_0<-15
pointEst<-beta_0+beta_1*x_0
pointEst
## (Intercept)
## 2070.518
ss_res<-sum((shear-mod$fitted)^2)
n<-length(mod$residuals)
ms_res<-ss_res/(n-2)
seConf<-sqrt(ms_res*(1 / n + (x_0 - mean(age))^2 / (sum((age - mean(age))^2))))
seConf
## [1] 22.00456
pointEst+c(-1,1)*qt(0.975, n-2)*seConf
## [1] 2024.289 2116.748
sePred<-sqrt(ms_res*(1+(1 / n) + (x_0 - mean(age))^2 / (sum((age - mean(age))^2))))
sePred
## [1] 98.59301
pointEst+c(-1,1)*qt(0.975, n-2)*sePred
## [1] 1863.382 2277.655