| Y | X1 | X2 |
|---|---|---|
| 42.0 | 7.0 | 33.0 |
| 33.0 | 4.0 | 41.0 |
| 75.0 | 16.0 | 7.0 |
| 28.0 | 3.0 | 49.0 |
| 91.0 | 21.0 | 5.0 |
| 55.0 | 8.0 | 31.0 |
Assume the standard multiple regression model with independent normal error terms. Compute \(b\), \(e\), \(H\), \(SSE\), \(R^2\), \(s^2\{b\}\), and \(\hat{Y}\) for \(X_1 = 10\), \(X_2 = 30\). The computations can be done with software or by hand, although by hand they would be lengthy.
scalar <- c(1, 1, 1, 1, 1, 1)  # column of ones for the intercept term
Y <- c(42, 33, 75, 28, 91, 55)
X1 <- c(7, 4, 16, 3, 21, 8)
X2 <- c(33, 41, 7, 49, 5, 31)
# design matrix X: intercept column followed by X1 and X2
X <- matrix(c(scalar, X1, X2), 6, 3, byrow = FALSE)
X
## [,1] [,2] [,3]
## [1,] 1 7 33
## [2,] 1 4 41
## [3,] 1 16 7
## [4,] 1 3 49
## [5,] 1 21 5
## [6,] 1 8 31
\[ \boldsymbol{\hat{\beta}} = \boldsymbol{(X^TX)}^{-1}\boldsymbol{(X^TY)} \]
library(matlib)
b <- inv(t(X) %*% X) %*% t(X) %*% Y
b
## [,1]
## [1,] 33.9321020
## [2,] 2.7847707
## [3,] -0.2643979
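The same coefficients can be obtained in base R without the matlib dependency; as a sketch, `solve(A, b)` solves the normal equations directly, which is numerically preferable to forming the inverse explicitly.

```r
# Base-R alternative: solve the normal equations X^T X b = X^T Y directly
b_base <- solve(t(X) %*% X, t(X) %*% Y)
b_base
```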
H <- X %*% inv(t(X) %*% X) %*% t(X)
H
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.23143639 0.25168006 0.21178834 0.1488734 -0.05475455 0.21099418
## [2,] 0.25168006 0.31240977 0.09437951 0.2662835 -0.14787196 0.22314063
## [3,] 0.21178834 0.09437951 0.70442097 -0.3191731 0.10446756 0.20412257
## [4,] 0.14887339 0.26628346 -0.31917314 0.6142637 0.14143589 0.14834214
## [5,] -0.05475455 -0.14787196 0.10446756 0.1414359 0.94040059 0.01632796
## [6,] 0.21099418 0.22314063 0.20412257 0.1483421 0.01632796 0.19708945
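As a quick sanity check, the hat matrix should be symmetric and idempotent; a minimal sketch:

```r
# H is symmetric (H = H^T) and idempotent (H H = H), up to rounding
all.equal(H, t(H))
all.equal(H %*% H, H)
```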
\[ e = Y - \hat{Y} = Y - HY \]
e <- Y - H %*% Y
e
## [,1]
## [1,] -2.70036663
## [2,] -1.23087135
## [3,] -1.63764825
## [4,] -1.33091751
## [5,] -0.09029763
## [6,] 6.98606687
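These residuals satisfy the properties used in the hint at the end of this section; a quick check (up to floating-point rounding):

```r
# sum(e) = 0 and X^T e = 0, since e is orthogonal to the column space of X
sum(e)
t(X) %*% e
```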
\[ SSE = e^T e \]
SSE <- t(e) %*% e
SSE
## [,1]
## [1,] 62.07354
# SSErr <- sum(e^2)
# SSErr
\[ R^2 = \frac{SSRegression}{SSTotal} = 1 - \frac{SSError}{SSTotal} \]
SST <- sum((Y - mean(Y))^2) # total sum of squares
SST
## [1] 3072
R2 <- 1 - SSE/SST # R^2
R2
## [,1]
## [1,] 0.9797938
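Equivalently, \(R^2\) can be computed from the regression sum of squares, since \(SSR = SSTO - SSE\); a quick check:

```r
# R^2 via the regression sum of squares
SSR <- SST - SSE
SSR / SST  # should equal R2
```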
\[ MSE = \frac{SSE}{n-p} \]
Here \(n = 6\) and \(p = 3\) parameters (the intercept and two slopes) are estimated, so SSE has \(n - p = 3\) degrees of freedom:
MSE <- SSE / (6 - 3)
MSE
## [,1]
## [1,] 20.69118
\[ s^2\{b\} = MSE (X^T X)^{-1} \]
s2b <- MSE[1,1] * inv(t(X) %*% X)
s2b
## [,1] [,2] [,3]
## [1,] 715.47117 -34.1589184 -13.5949377
## [2,] -34.15892 1.6616665 0.6440675
## [3,] -13.59494 0.6440675 0.2624679
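The estimated standard errors of the individual coefficients are the square roots of the diagonal entries of \(s^2\{b\}\):

```r
# Standard errors of b0, b1, b2
sqrt(diag(s2b))
```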
Xone <- c(1, 10, 30)  # X_h: intercept, X1 = 10, X2 = 30
yhat <- t(Xone) %*% b
yhat
## [,1]
## [1,] 53.84787
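As a cross-check (a sketch, not part of the original hand computation), `lm()` reproduces the matrix results:

```r
# Fit the same model with lm() and compare
fit_lm <- lm(Y ~ X1 + X2)
coef(fit_lm)                                   # should match b
predict(fit_lm, data.frame(X1 = 10, X2 = 30))  # should match yhat
```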
brand <- read.table("./data/CH06PR05.txt")
brand %>%
rename(brand = "V1", moisture = "V2", sweetness = "V3") -> brand
brand
# columns V1, V2, V3 correspond to Y, X1, X2
The data were collected to study the relation between degree of brand liking (\(Y\)) and the moisture content (\(X_1\)) and sweetness (\(X_2\)) of the product.

(a) Fit a regression model to these data and state the estimated regression function. Interpret \(b_1\). \[
\hat{Y} = 37.6500 + 4.4250X_1 + 4.3750X_2
\]

- The slope \(b_1 = 4.425\) means that the mean degree of brand liking increases by 4.425 for each one-unit increase in moisture content, with sweetness held constant.
reg <- lm(brand ~ moisture + sweetness, data = brand)
summary(reg)
##
## Call:
## lm(formula = brand ~ moisture + sweetness, data = brand)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.400 -1.762 0.025 1.587 4.200
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.6500 2.9961 12.566 1.20e-08 ***
## moisture 4.4250 0.3011 14.695 1.78e-09 ***
## sweetness 4.3750 0.6733 6.498 2.01e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.693 on 13 degrees of freedom
## Multiple R-squared: 0.9521, Adjusted R-squared: 0.9447
## F-statistic: 129.1 on 2 and 13 DF, p-value: 2.658e-09
par(mfrow = c(2, 2))
plot(reg)
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
ncvTest(reg)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 0.6261627, Df = 1, p = 0.42877
With \(p = 0.429\), the test gives no evidence of non-constant error variance.
To test for lack of fit, compare the reduced (`reg`) model against the full model that treats moisture and sweetness as factors:
full <- lm(brand ~ as.factor(moisture) + as.factor(sweetness), data = brand)
anova(reg, full)
anova(full, reg)
anova(reg)
summary(reg)
##
## Call:
## lm(formula = brand ~ moisture + sweetness, data = brand)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.400 -1.762 0.025 1.587 4.200
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.6500 2.9961 12.566 1.20e-08 ***
## moisture 4.4250 0.3011 14.695 1.78e-09 ***
## sweetness 4.3750 0.6733 6.498 2.01e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.693 on 13 degrees of freedom
## Multiple R-squared: 0.9521, Adjusted R-squared: 0.9447
## F-statistic: 129.1 on 2 and 13 DF, p-value: 2.658e-09
Joint confidence intervals for the two slope coefficients at individual level \(1 - 0.005 = 0.995\) (a Bonferroni family confidence of 99% for \(\beta_1\) and \(\beta_2\)):
confint(reg, level = (1 - 0.005))
## 0.25 % 99.75 %
## (Intercept) 27.545738 47.754262
## moisture 3.409483 5.440517
## sweetness 2.104236 6.645764
About 95.2% of the variation in the degree of brand liking is explained by the moisture and sweetness independent variables.
summary(reg)$r.square
## [1] 0.952059
The adjusted \(R^2\) is only slightly smaller: about 94.5% of the variation in brand liking is explained after adjusting for the number of predictors in the model.
summary(reg)$adj.r.square
## [1] 0.9446834
With 99% confidence, the mean degree of brand liking is between 73.88111 and 80.66889 when the moisture content is 5 and the sweetness of the product is 4.
predict(reg, tibble(moisture = 5, sweetness = 4), interval = "confidence", level = 0.99)
## fit lwr upr
## 1 77.275 73.88111 80.66889
With 99% confidence, the degree of brand liking for a new observation is between 68.48077 and 86.06923 when the moisture content is 5 and the sweetness of the product is 4.
predict(reg, tibble(moisture = 5, sweetness = 4), interval = "prediction", level = 0.99)
## fit lwr upr
## 1 77.275 68.48077 86.06923
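The prediction interval can also be reproduced by hand; a minimal sketch using the model matrix and MSE of the fitted `reg` object:

```r
# 99% prediction interval at X_h = (moisture = 5, sweetness = 4)
xh <- c(1, 5, 4)                    # intercept, moisture, sweetness
Xb <- model.matrix(reg)
mse <- sum(residuals(reg)^2) / df.residual(reg)
se_pred <- sqrt(drop(mse * (1 + t(xh) %*% solve(t(Xb) %*% Xb) %*% xh)))
fit_h <- sum(xh * coef(reg))
fit_h + c(-1, 1) * qt(0.995, df.residual(reg)) * se_pred
```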
Remark. Now you can check if you did #3h correctly.
Hints. First, show that the sample averages of \(Y_i\) and \(\hat{Y}_i\) are the same. Then, write the sample correlation coefficient between \(Y\) and \(\hat Y\) as \[
r_{Y\widehat{Y}} = \frac{\sum
(Y_i-\overline{Y})(\widehat{Y}_i-\overline{Y})}{\sqrt{\sum
(Y_i-\overline{Y})^2\sum(\widehat{Y}_i-\overline{Y})^2}}
= \frac{\sum
(\widehat{Y}_i-\overline{Y}+Y_i-\widehat{Y}_i)(\widehat{Y}_i-\overline{Y})}{\sqrt{\sum
(Y_i-\overline{Y})^2\sum(\widehat{Y}_i-\overline{Y})^2}}
\] and use known properties of residuals \(\sum e_i=0\), \(\sum X_{ij} e_i=0\), \(\sum \widehat{Y}_i e_i=0\).
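A quick numerical check of this identity with the brand model fitted above:

```r
# Squared correlation between Y and the fitted values equals R^2
cor(brand$brand, fitted(reg))^2   # should match summary(reg)$r.squared
```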