We will be using the Rocket dataset:

A rocket motor is manufactured by bonding an igniter propellant and a sustainer propellant together inside a metal housing. The shear strength of the bond between the two types of propellants is an important quality characteristic. It is suspected that shear strength is related to the age in weeks of the batch of sustainer propellant. Twenty observations on shear strength and the age of the corresponding batch of propellant have been collected in the dataset.

1. Creating a Simple Linear Model in R

1) Import the data

rocket<-read.csv("https://raw.githubusercontent.com/kitadasmalley/sp21_MATH376LMT/main/data/rocketProp.csv",
                 header=TRUE)

2) Rename the columns so they are easier to work with

colnames(rocket)<-c("obs", "shear", "age")

shear<-rocket$shear
age<-rocket$age

3) Make a plot

library(tidyverse)

ggplot(data=rocket, aes(x=age, y=shear))+
  geom_point()+
  geom_smooth(method="lm", se=FALSE)

4) Run a simple linear regression model on shear against age and store the results

mod<-lm(shear~age, data=rocket)

2. Hypothesis Test for the Slope

Step 1: State the hypotheses

\[H_0: \beta_{1}=0\] \[H_A: \beta_{1}\neq 0\]

Step 2: Reference Distribution

Student t-distribution with \(n-2\) degrees of freedom.

What is \(n\)?

# the dimension is the number of row 
n<-dim(rocket)[1]
n
## [1] 20

Step 3: Test Statistic

\[t=\frac{\hat{\beta}_1-0}{SE(\hat{\beta_1})}\] where \[SE(\hat{\beta_1})=\frac{\sqrt{SS(Res)/(n-2)}}{\sqrt{\sum(x_i-\bar{x})^2}}=\frac{\sqrt{MS(Res)}}{\sqrt{\sum(x_i-\bar{x})^2}}\] Recall: \[SS(Res)=\sum_{i=1}^n(Y_i-\hat{Y}_i)^2\]

Let’s derive these components!

Numerator:

1) Call out the model coefficients
mod$coefficients
## (Intercept)         age 
##  2627.82236   -37.15359
2) The second element of the coefficient vector is the estimated slope, \(\hat{\beta}_1\). Store this and save it to be used later.
beta_1<-mod$coefficients[2]
beta_1
##       age 
## -37.15359

Denominator:

3) In order to estimate the standard error associated with \(\beta_1\) we will need to use the model residuals. Let begin by taking a look at the model residuals.
mod$residuals
##           1           2           3           4           5           6 
##  106.758301  -67.274574  -14.593631   65.088687 -215.977609 -213.604131 
##           7           8           9          10          11          12 
##   48.563824   40.061618    8.729573   37.567141   20.374323  -88.946393 
##          13          14          15          16          17          18 
##   80.817415   71.175153  -45.143358   94.442278    9.499187   37.097528 
##          19          20 
##  100.684823  -75.320154
4) We really wanted the squared residuals, because positive and negative residuals would cancel out. Notice how R performs exponentiation element wise here.
mod$residuals^2
##           1           2           3           4           5           6 
## 11397.33476  4525.86831   212.97408  4236.53718 46646.32750 45626.72480 
##           7           8           9          10          11          12 
##  2358.44497  1604.93327    76.20545  1411.29011   415.11305  7911.46082 
##          13          14          15          16          17          18 
##  6531.45451  5065.90236  2037.92279  8919.34388    90.23455  1376.22657 
##          19          20 
## 10137.43356  5673.12555
5) Compute the sum of squared residuals, \(SS(Res)\).
ss_res<-sum(mod$residuals^2)
ss_res
## [1] 166254.9

6) Use the above to calculate the mean square of the residuals, MS(Res).

ms_res<-ss_res/(n-2)
ms_res
## [1] 9236.381

7) Calculate the standard error for \(\hat{\beta}_1\).

se_b1<-sqrt(ms_res/sum((age-mean(age))^2))
se_b1
## [1] 2.889107
8) Combine the numerator and denominator together to get the test statistic.
test_stat<-beta_1/se_b1
9) Obtain the p-value for a two-sided test.
pt(abs(test_stat), df=n-2, lower.tail=FALSE)*2
##          age 
## 1.643344e-10

What kind of conclusion can we make from this?

3. Confidence Interval for the Slope Parameter, \(\beta_1\)

\[\hat{\beta}_1 \pm t_{df, \alpha/2}^* \times SE(\hat{\beta}_1)\]

1) Find the point estimate

Recall from above:

beta_1<-mod$coefficients[2]
beta_1
##       age 
## -37.15359
2) Find the critical value for 95% confidence from a t-distribution with \(n-2\) degrees of freedom
qt(0.975, df=n-2)
## [1] 2.100922
3) Find the standard error for \(\hat{\beta}_1\)

Recall from above:

se_b1<-sqrt(ms_res/sum((age-mean(age))^2))
se_b1
## [1] 2.889107
4) Put it together and what do you get?…
beta_1+c(-1,1)*qt(0.975, df=n-2)*se_b1
## [1] -43.22338 -31.08380
5) Check your solution with the confidence interval funcation
confint(mod)
##                  2.5 %    97.5 %
## (Intercept) 2534.99540 2720.6493
## age          -43.22338  -31.0838