Let's define :
\[ H_o : \beta_1 = \beta_2 \\ H_a : \beta_1 \ne \beta_2 \\ \text{or, equivalently,} \\ H_o : A\vec{\beta}=\vec{0} \\ H_a : A\vec{\beta}\ne\vec{0} \]
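Here the contrast picks out the slope difference; with \(\vec{\beta} = (\beta_0, \beta_1, \beta_2)^T\) (the ordering used in the code below),
\[ A\vec{\beta} = \begin{pmatrix} 0 & 1 & -1 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix} = \beta_1 - \beta_2 \]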
I have outlined the steps to this problem above.
# generate key objects :
invXTX <- matrix(
c(1, 0.25, 0.25,
0.25, 0.5, -0.25,
0.25, -0.25, 2),
nrow = 3,
byrow = TRUE
)
A <- matrix(c(0, 1, -1))        # contrast: t(A) %*% beta = beta1 - beta2
bohat <- matrix(c(10, 12, 15))  # estimated coefficients (beta0, beta1, beta2)
s2 <- 2
n <- 15
alpha <- .05
df <- n - ncol(invXTX)  # n - (p + 1)
Calculations :
V_Ab <- (t(A) %*% invXTX %*% A)*s2  # est. Var(t(A) %*% beta-hat)
Ab <- t(A) %*% bohat                # observed difference: beta1-hat - beta2-hat
T_stat <- Ab/sqrt(V_Ab)
T_crit <- qt(1-alpha/2, df)
Conclusion based on test statistic :
abs(T_stat) >= T_crit # two-sided test; FALSE means fail to reject H0
## [,1]
## [1,] FALSE
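Equivalently, we can cross-check with the two-sided p-value (a quick sketch, same decision as above):
2 * pt(-abs(T_stat), df)  # p-value; exceeds alpha = .05, so fail to reject H0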
As we can see, our difference isn't extreme enough to reject at our preferred confidence level.
Generate confidence interval and see if 0 is inside of the range :
M_err <- T_crit * sqrt(V_Ab)
R <- c(Ab - M_err,Ab + M_err); R # 0 is in range
## [1] -8.33698 2.33698
Therefore, we fail to reject the null. Both approaches reach the same conclusion, good.
The key difference now is that we are asking about multiple coefficients at once, a question about the model as a whole, so we use an F-test.
# calc sums of squares
SST <- 120        # total SS, given
SSE <- s2 * df    # error SS = sigma-hat^2 * df
SSR <- SST - SSE  # regression SS
# calc df's
df_err <- df
df_R <- 2   # regression df = number of predictors
df_T <- n - 1
Calculate the F-stat and critical value, then test the hypothesis :
F_stat <- (SSR/df_R)/(SSE/df_err)
F_crit <- qf(1 - alpha, df1 = df_R, df2 = df_err)
F_stat >= F_crit # TRUE: data extreme enough to reject H0
## [1] TRUE
As we can see, we should reject the null hypothesis, since at least one of our regression coefficients is significant.
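As a cross-check (a sketch, not one of the original steps), the p-value route agrees:
pf(F_stat, df1 = df_R, df2 = df_err, lower.tail = FALSE)  # p-value; below alpha, so reject H0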
I have outlined the logic of parts a and b above. Part a is answered within the photo.
Notice \(b_3 < 0\), indicating C1 increases yield. We will test whether this difference is real and significant :
n <- 30                           # sample size for this problem
T_s <- -0.32/.36                  # t = estimate / se
T_c <- qt(1-alpha/2, n - (3+1))   # df = n - (p + 1), p = 3 predictors
# hyp tst :
abs(T_s) >= T_c # FALSE: no statistically significant difference; it appears minimal
## [1] FALSE
Confidence interval :
M_err <- T_c * .36
-0.32 + c(-M_err, M_err) # as we can see, 0 is in this!
## [1] -1.0599906 0.4199906
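And the p-value tells the same story (a quick sketch):
2 * pt(-abs(T_s), n - (3+1))  # two-sided p-value, well above alpha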
dat <- read.table(text = "
y x1 x2 x3
25.5 1.74 5.30 10.8
31.2 6.32 5.42 9.4
25.9 6.22 8.41 7.2
38.4 10.52 4.63 8.5
18.4 1.19 11.60 9.4
26.7 1.22 5.85 9.9
26.4 4.10 6.62 8.0
25.9 6.32 8.72 9.1
32.0 4.08 4.42 8.7
25.2 4.15 7.60 9.2
39.7 10.15 4.83 9.4
35.9 1.72 3.12 7.6
26.5 1.70 5.30 8.2
", header = TRUE)
# Change col names :
dat$Survival_pct <- dat$y
dat$wt1 <- dat$x1
dat$wt2 <- dat$x2
dat$wt3 <- dat$x3
dat <- dat[,-(1:4)]
# Design matrix :
X <- cbind(int = rep(1, nrow(dat)), dat)
X <- X[,-2] # take out resp var
# Calc : XTX
XTX <- t(as.matrix(X))%*%as.matrix(X); XTX
## int wt1 wt2 wt3
## int 13.00 59.4300 81.8200 115.400
## wt1 59.43 394.7255 360.6621 522.078
## wt2 81.82 360.6621 576.7264 728.310
## wt3 115.40 522.0780 728.3100 1035.960
# Calc : (XTX)^-1
invXTX <- solve(XTX); invXTX
## int wt1 wt2 wt3
## int 8.06479464 -0.082592705 -0.094195115 -0.790526876
## wt1 -0.08259271 0.008479816 0.001716687 0.003720020
## wt2 -0.09419511 0.001716687 0.016629424 -0.002063308
## wt3 -0.79052688 0.003720020 -0.002063308 0.088601286
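A quick sanity check on the inverse (a sketch):
round(invXTX %*% XTX, 10)  # should print the 4 x 4 identity (up to rounding)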
# Calc : XTy
XTy <- t(as.matrix(X)) %*% dat$Survival_pct; XTy
## [,1]
## int 377.700
## wt1 1877.911
## wt2 2247.285
## wt3 3339.300
plot(x=dat$wt1,y = dat$Survival_pct)
abline(lm(Survival_pct~wt1, data = dat))
As we can see, there appears to be a downward then an upward trend; perhaps a parabolic curve fits better.
plot(x=dat$wt2,y = dat$Survival_pct)
abline(lm(Survival_pct~wt2, data = dat))
There appears to be a downward trend with wt2, with less variance than the next plot.
plot(x=dat$wt3,y = dat$Survival_pct)
abline(lm(Survival_pct~wt3, data = dat))
There appears to be a downward trend, with more variance than the plot above.
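To see all the pairwise relationships at once (including any collinearity among the weights), a scatterplot matrix is a handy extra view:
pairs(dat)  # pairwise scatterplots of survival and the three weights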
# Direct computation using our tools :
beta <- invXTX %*% XTy; beta
## [,1]
## int 39.4815184
## wt1 1.0092246
## wt2 -1.8726572
## wt3 -0.3666997
# proof in the pudding: compare with lm()
mdl <- lm(Survival_pct ~ wt1 + wt2 + wt3, data = dat);mdl
##
## Call:
## lm(formula = Survival_pct ~ wt1 + wt2 + wt3, data = dat)
##
## Coefficients:
## (Intercept) wt1 wt2 wt3
## 39.4815 1.0092 -1.8727 -0.3667
Notice both are the least squares estimates and give the same thing.
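A side note, anticipating the numerical point raised later: explicitly forming \((X^TX)^{-1}\) can lose precision, and solving the normal equations directly gives the same estimates more stably (a sketch):
solve(XTX, XTy)  # solves (X^T X) b = X^T y without explicitly inverting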
Fitted values :
# Predicts based on data
as.matrix(predict(mdl)) # make mat for pretty output
## [,1]
## 1 27.35213
## 2 32.26304
## 3 27.36961
## 4 38.31121
## 5 15.51270
## 6 26.12740
## 7 28.28875
## 8 26.19328
## 9 32.13172
## 10 26.06397
## 11 37.23324
## 12 32.58778
## 13 28.26518
# Direct computation :
yhat <- as.matrix(X) %*% beta; yhat
## [,1]
## [1,] 27.35213
## [2,] 32.26304
## [3,] 27.36961
## [4,] 38.31121
## [5,] 15.51270
## [6,] 26.12740
## [7,] 28.28875
## [8,] 26.19328
## [9,] 32.13172
## [10,] 26.06397
## [11,] 37.23324
## [12,] 32.58778
## [13,] 28.26518
Notice the R function and the direct computation give the same fitted values.
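We can confirm that programmatically (a quick sketch):
all.equal(as.numeric(yhat), unname(predict(mdl)))  # TRUE, up to floating-point tolerance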
# lazy way :
# Confidence interval (note: predict() defaults to level = 0.95) :
predict(mdl,
newdata = data.frame(wt1=3, wt2=8, wt3=9),
interval = "confidence")
## fit lwr upr
## 1 24.22764 22.46864 25.98664
Recall :
\[ \text{CI}: \quad \text{est.} \pm t_{\text{crit}} \cdot \text{se}(\text{est.}) \]
We use the t-distribution for the critical value, with df = n - (p + 1) :
alpha <- .1; df <- nrow(dat) - ncol(X)  # 90% interval; df = n - (p + 1)
crit <- qt(1 - alpha/2, df); crit
## [1] 1.833113
Recall that our regression line is \(E(y \mid \vec{x})\) and \(\hat{y}\) is the estimate we use based on our input value \(\vec{x}_o = (3, 8, 9)\).
So we need \(\text{se}(\hat{y})\), which from our last HW we know comes from :
\[ \vec{x}_o^{T}\hat{\beta} \sim N\!\left(\vec{x}_o^{T}\vec{\beta},\; \sigma^2\, \vec{x}_o^{T} (X^{T}X)^{-1}\vec{x}_o\right) \]
Except we have data rather than the true \(\sigma^2\), so with unknown variance we switch to the t-distribution with \(n - (p + 1)\) degrees of freedom and estimate \(\sigma^2\) with \(\hat{\sigma}^2 = SS_{\text{Err}}/\text{df} = \frac{\sum (y_i - \hat{y}_i)^2}{\text{df}}\).
Therefore :
var_est <- sum((dat$Survival_pct - yhat)^2) /df
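As a sanity check, this matches the residual variance lm() reports:
all.equal(var_est, summary(mdl)$sigma^2)  # same sigma-hat^2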
With that in hand :
xo <- data.frame(int = 1, wt1=3, wt2=8, wt3=9)
xoT <- as.matrix(xo)
xo_pred <- xoT %*% beta
Merr <- sqrt((xoT %*% invXTX %*% t(xoT)) * var_est) * crit
CI90 <- c(lwr = xo_pred - Merr, upper = xo_pred + Merr); CI90
## lwr upper
## 22.80225 25.65302
The gap here is not a numerical issue: predict() defaults to a 95% interval (level = 0.95), while our manual interval uses alpha = 0.1, i.e. a 90% interval. (HW 5, question 2, covers the genuinely numerical mismatches that can arise from the XTX calculation.)
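To confirm, pass the matching level to predict(); this should reproduce the manual interval above:
predict(mdl,
        newdata = data.frame(wt1=3, wt2=8, wt3=9),
        interval = "confidence", level = 0.90)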
# lazy way :
# prediction interval (also level = 0.95 by default) :
predict(mdl,
newdata = data.frame(wt1=3, wt2=8, wt3=9),
interval = "prediction")
## fit lwr upr
## 1 24.22764 19.14559 29.30969
And the prediction interval's standard error just adds a "1 +" under the square root, so :
\[ \hat{\sigma}\sqrt{\,1 + x_0^\top (X^\top X)^{-1} x_0\,} \]
se_pred <- sqrt(1 + xoT %*% invXTX %*% t(xoT)) * sqrt(var_est)
M_err <- se_pred * crit
c(lwr = xo_pred - M_err, upr = xo_pred + M_err)
## lwr upr
## 20.10946 28.34582
Which again lines up once the levels match: the lazy output above is a 95% interval, while our manual one is 90%.
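Same check for the prediction interval, now with level = 0.90:
predict(mdl,
        newdata = data.frame(wt1=3, wt2=8, wt3=9),
        interval = "prediction", level = 0.90)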
# lazy sol
summary(mdl)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.4815184 5.9855444 6.5961449 9.971689e-05
## wt1 1.0092246 0.1940887 5.1998116 5.641984e-04
## wt2 -1.8726572 0.2717976 -6.8898960 7.147095e-05
## wt3 -0.3666997 0.6273747 -0.5844987 5.732428e-01
Recall :
\[ T = \frac{\hat{\beta}_i}{\text{se}(\hat{\beta}_i)} \]
So we need \(\text{se}(\hat{\beta}_i)\). Recall from HW 6 :
\[ \hat{\beta}_i \sim \mathcal{N}\!\left( \beta_i,\; \sigma^2 \big[(X^{\top}X)^{-1}\big]_{\,i+1,\,i+1}\right) \]
So we will estimate \(\sigma\) with \(\hat{\sigma}\). Therefore :
cov_mat <- var_est * invXTX
seb1 <- sqrt(cov_mat[2,2]); seb1
## [1] 0.1940887
seb2 <- sqrt(cov_mat[3,3]); seb2
## [1] 0.2717976
seb3 <- sqrt(cov_mat[4,4]); seb3
## [1] 0.6273747
T_val <- summary(mdl)$coefficients[-1,"Estimate"] / c(seb1, seb2, seb3); T_val
## wt1 wt2 wt3
## 5.1998116 -6.8898960 -0.5844987
# Notice that these are equal :
summary(mdl)$coefficients[-1,"t value"]
## wt1 wt2 wt3
## 5.1998116 -6.8898960 -0.5844987
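And the p-values follow from the same t-values (a quick sketch, using the df = 9 from above):
2 * pt(-abs(T_val), df)  # matches summary(mdl)$coefficients[-1, "Pr(>|t|)"]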