Q1)

Q2)

Q3)

Q4)

Column definitions come from the following source: https://ia601406.us.archive.org/15/items/in.ernet.dli.2015.225800/2015.225800.Statistical-Methods_text.pdf

head(df, 3)
##   abrasion_loss hardness tensile_strength
## 1           372       45              162
## 2           206       55              233
## 3           175       61              232
mdl <- lm(abrasion_loss ~ tensile_strength + hardness, data=df)
summary(mdl); plot(mdl, which = 1)
## 
## Call:
## lm(formula = abrasion_loss ~ tensile_strength + hardness, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -79.385 -14.608   3.816  19.755  65.981 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      885.1611    61.7516  14.334 3.84e-14 ***
## tensile_strength  -1.3743     0.1943  -7.073 1.32e-07 ***
## hardness          -6.5708     0.5832 -11.267 1.03e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 36.49 on 27 degrees of freedom
## Multiple R-squared:  0.8402, Adjusted R-squared:  0.8284 
## F-statistic:    71 on 2 and 27 DF,  p-value: 1.767e-11

a)

\[ \theta = \mathbb{E}(Y | \text{tensile_strength} = 200, \text{hardness} = 70) \]

Under the model \(y = X\beta + \epsilon\), we estimate \(\theta\) with \(\hat{\theta}\), the fitted value from \(\hat{y} = X\hat{\beta}\) at this point.

# Least Squares estimate : 
predict(mdl, newdata = 
          data.frame(
            tensile_strength = c(200), 
            hardness = c(70))
        )
##        1 
## 150.3407

Recall that, with normal errors \(\epsilon \sim N(0, \sigma^2 I)\): \(\hat{\beta} \sim N(\beta,\sigma^2(X^{T}X)^{-1})\)

Let \(\vec{x}_o = (1 \ 200 \ 70)^{T}\)

Therefore : \(\vec{x}_o^{T}\hat{\beta} \sim N(\theta, \vec{x}_o^{T} \sigma^2(X^{T}X)^{-1}\vec{x}_o)\)

In practice the true \(\sigma^2\) is unknown, so we estimate it from the data with \(\hat{\sigma}^2 = SS_{\text{Err}}/\text{df} = \frac{\sum (y_i - \hat{y}_i)^2}{n - (p + 1)}\). Because \(\sigma^2\) is replaced by \(\hat{\sigma}^2\), inference uses the t-distribution with \(n - (p + 1)\) degrees of freedom (here \(30 - 3 = 27\), matching the residual degrees of freedom in the summary output).
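The quantities above can be checked numerically. The sketch below is only illustrative: it fits the same formula to a synthetic data frame (`sim`, with made-up values, not the real data), computes \(\hat{\sigma}^2\) and the standard error of \(\vec{x}_o^{T}\hat{\beta}\) by hand, and compares them with what `predict()` and `sigma()` report. The names `sim`, `fit`, `sigma2_hat`, and `se_theta` are assumptions introduced for this example.

```r
# Synthetic stand-in for the data (values are invented, NOT the real measurements).
set.seed(1)
n <- 30
sim <- data.frame(
  tensile_strength = runif(n, 120, 240),
  hardness         = runif(n, 45, 90)
)
sim$abrasion_loss <- 885 - 1.4 * sim$tensile_strength - 6.6 * sim$hardness +
  rnorm(n, sd = 35)

fit <- lm(abrasion_loss ~ tensile_strength + hardness, data = sim)

# sigma^2 hat = SS_Err / (n - (p + 1)), with p = 2 predictors
p <- 2
sigma2_hat <- sum(residuals(fit)^2) / (n - (p + 1))

# x_o in the same column order as the design matrix:
# (Intercept), tensile_strength, hardness
x_o <- c(1, 200, 70)
X   <- model.matrix(fit)

theta_hat <- sum(x_o * coef(fit))
se_theta  <- sqrt(sigma2_hat * drop(t(x_o) %*% solve(crossprod(X)) %*% x_o))

# Cross-check against the built-in computations
builtin <- predict(
  fit,
  newdata = data.frame(tensile_strength = 200, hardness = 70),
  se.fit  = TRUE
)
all.equal(sigma2_hat, sigma(fit)^2)
all.equal(theta_hat, unname(builtin$fit))
all.equal(se_theta, unname(builtin$se.fit))
```

Replacing `sim` with the real `df` (and `fit` with `mdl`) should give the hand-computed standard error for the estimate at tensile strength 200 and hardness 70.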

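In short, \(\theta\) is a fixed, unknown number, and \(\hat{\theta}\) is normally distributed around it. Once \(\sigma^2\) is replaced by \(\hat{\sigma}^2\), the studentized quantity \((\hat{\theta} - \theta) / \sqrt{\hat{\sigma}^2 \, \vec{x}_o^{T}(X^{T}X)^{-1}\vec{x}_o}\) follows a t-distribution with \(n - (p + 1)\) degrees of freedom. The t-distribution accounts for the extra uncertainty from estimating \(\sigma^2\), and it tends towards the normal for large \(n\).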
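A t-based confidence interval for \(\theta\) follows directly: \(\hat{\theta} \pm t_{0.975,\, n-(p+1)} \cdot \text{se}(\hat{\theta})\). The sketch below builds that interval by hand and compares it with `predict(..., interval = "confidence")`. It again uses a synthetic data frame (`sim`, invented values) so that it runs on its own; the 95% level and the names `sim`, `fit`, `manual_ci`, and `builtin_ci` are assumptions for illustration.

```r
# Synthetic stand-in for the data (values are invented, NOT the real measurements).
set.seed(1)
n <- 30
sim <- data.frame(
  tensile_strength = runif(n, 120, 240),
  hardness         = runif(n, 45, 90)
)
sim$abrasion_loss <- 885 - 1.4 * sim$tensile_strength - 6.6 * sim$hardness +
  rnorm(n, sd = 35)

fit    <- lm(abrasion_loss ~ tensile_strength + hardness, data = sim)
new_pt <- data.frame(tensile_strength = 200, hardness = 70)

# Point estimate, its standard error, and the residual degrees of freedom
pr     <- predict(fit, newdata = new_pt, se.fit = TRUE)
t_crit <- qt(0.975, df = pr$df)

# Interval built by hand from the t quantile
manual_ci <- c(lwr = unname(pr$fit - t_crit * pr$se.fit),
               upr = unname(pr$fit + t_crit * pr$se.fit))

# Same interval from predict()
builtin_ci <- predict(fit, newdata = new_pt,
                      interval = "confidence", level = 0.95)

manual_ci
builtin_ci

# t quantiles approach the normal quantile as the degrees of freedom grow
qt(0.975, df = c(5, 27, 1000))
qnorm(0.975)
```

With the real model, `predict(mdl, newdata = ..., interval = "confidence")` gives the corresponding interval around the least squares estimate.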

Q5)