library(tidyverse)
Consider the Supervisor Performance Data in Table 3.3.
table3.3 <- read_tsv("Table3.3.txt")
The estimated regression coefficients are listed below, each under its associated variable:
fit3.3 <- lm(Y~., data = table3.3)
round(coefficients(fit3.3), 4)
## (Intercept) X1 X2 X3 X4 X5
## 10.7871 0.6132 -0.0731 0.3203 0.0817 0.0384
## X6
## -0.2171
sum1 <- sum(fit3.3$fitted.values)
sum2 <- sum(table3.3$Y)
\(\sum\limits_{i=1}^{n} \hat y_i =\) 1939 and \(\sum\limits_{i=1}^{n} y_i =\) 1939, so the fitted values and the observed responses have the same sum.
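The equality of the two sums is not specific to this data set: in any least-squares fit that includes an intercept, the residuals sum to zero, so the fitted values and the observed responses have the same total. A minimal sketch on simulated data follows; the variables `x1`, `x2`, `y` and the seed are hypothetical and are not taken from Table 3.3.

```r
# Simulated data, for illustration only (not the Supervisor Performance Data)
set.seed(1)
n  <- 30
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 10 + 0.6 * x1 - 0.2 * x2 + rnorm(n)

# Least-squares fit that includes an intercept
fit_sim <- lm(y ~ x1 + x2)

# The residuals sum to zero (up to floating-point error),
# so the fitted values and the responses have the same sum
sum_fitted   <- sum(fitted(fit_sim))
sum_observed <- sum(y)
all.equal(sum_fitted, sum_observed)
```

The property relies on the intercept column; it does not hold in general for a model fitted without one.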
Predict students' scores on the final exam using their scores on the two previous exams (Table 3.10).
table3.10 <- read_tsv("Table3.10.txt")
fit_P1 <- lm(`F` ~ P1, data = table3.10)
coefP1 <- round(coefficients(fit_P1), 4)
fit_P2 <- lm(`F` ~ P2, data = table3.10)
coefP2 <- round(coefficients(fit_P2), 4)
fit_P1P2 <- lm(`F` ~., data = table3.10)
coefP1P2 <- round(coefficients(fit_P1P2), 4)
\(\underline{\text{Model 1:}} \hspace{0.25cm} \hat F =\) -22.3424 \(+\) 1.2605 \(P_1\)
\(\underline{\text{Model 2:}} \hspace{0.25cm} \hat F =\) -1.8535 \(+\) 1.0043 \(P_2\)
\(\underline{\text{Model 3:}} \hspace{0.25cm} \hat F =\) -14.5005 \(+\) 0.4883 \(P_1\) \(+\) 0.672 \(P_2\)
For each model, I will use a two-sided t-test on the intercept, with \(H_0: \beta_0 = 0\) and \(H_A: \beta_0 \neq 0\).
Model 1 and Model 2 each have \(20\) residual degrees of freedom (d.o.f.) and Model 3 has \(19\) d.o.f.
At a significance level of \(\alpha = 0.05\), the critical t-values for a two-tailed test are:
\(t_{crit \ \left (\frac{\alpha}{2}, \ 20 \right )} = \pm 2.086 \hspace{1.5cm} t_{crit \ \left (\frac{\alpha}{2}, \ 19 \right )} = \pm 2.093\)
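The critical values can be reproduced with the t-distribution quantile function instead of a printed table. A small sketch, where the variable names `alpha`, `df`, and `t_crit` are illustrative:

```r
# Two-sided test at alpha = 0.05: put alpha / 2 in each tail
alpha <- 0.05

# Residual degrees of freedom: 20 for Models 1 and 2, 19 for Model 3
df <- c(20, 19)

# Upper-tail critical values; the lower-tail values are their negatives
t_crit <- qt(1 - alpha / 2, df = df)
round(t_crit, 3)
```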
# Saving Model Summaries
sumP1 <- summary(fit_P1)
sumP2 <- summary(fit_P2)
sumP1P2 <- summary(fit_P1P2)
# Standard Errors
seP1B0 <- sumP1$coefficients[1,2]
seP2B0 <- sumP2$coefficients[1,2]
seP1P2B0 <- sumP1P2$coefficients[1,2]
\(t^* = \frac{\hat\beta_0}{s.e.(\hat\beta_0)}\)
\(\underline{\text{Model 1:}} \hspace{0.25cm} t^* = \frac{-22.342}{11.564} = -1.932\)
\(\underline{\text{Model 2:}} \hspace{0.25cm} t^* = \frac{-1.854}{7.562} = -0.245\)
\(\underline{\text{Model 3:}} \hspace{0.25cm} t^* = \frac{-14.501}{9.236} = -1.57\)
For all three models, \(|t^*| < |t_{crit}|\). As a result, we fail to reject the null hypothesis at \(\alpha = 0.05\): none of the intercepts is significantly different from zero.
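As a cross-check on the decision rule, the test statistics and two-sided p-values can be recomputed from the rounded estimates and standard errors reported above. This is a sketch using those rounded inputs, so the results may differ slightly from the values stored in the model summaries; the vector names are illustrative.

```r
# Intercept estimates and standard errors as reported above (rounded)
est <- c(-22.342, -1.854, -14.501)
se  <- c(11.564, 7.562, 9.236)
df  <- c(20, 20, 19)   # residual d.o.f. for Models 1, 2, 3

# Test statistic t* = estimate / standard error
t_star <- est / se

# Two-sided critical values and p-values
t_crit <- qt(0.975, df = df)
p_val  <- 2 * pt(-abs(t_star), df = df)

# TRUE would mean "reject H0" for that model
reject <- abs(t_star) > t_crit
data.frame(model = 1:3, t_star = round(t_star, 3), reject = reject)
```

The critical-value rule and the p-value rule are equivalent: \(|t^*| < |t_{crit}|\) exactly when the two-sided p-value exceeds \(\alpha\).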
To determine which predictor is better, I will compare the \(R^2\) values that \(P_1\) and \(P_2\) yield in simple linear regression (SLR; Model 1 and Model 2) and in regression through the origin (RTO; new models).
# RTO: F ~ P1
fit_0P1 <- lm(`F` ~ 0 + P1, data = table3.10)
coef0P1 <- round(coefficients(fit_0P1), 4)
sum0P1 <- summary(fit_0P1)
# RTO: F ~ P2
fit_0P2 <- lm(`F` ~ 0 + P2, data = table3.10)
coef0P2 <- round(coefficients(fit_0P2), 4)
sum0P2 <- summary(fit_0P2)
# sumP1$r.squared
# sumP2$r.squared
# sum0P1$r.squared
# sum0P2$r.squared
Recall Previously Fitted Models:
\(\underline{\text{Model 1:}} \hspace{0.25cm} \hat F =\) -22.3424 \(+\) 1.2605 \(P_1\)
\(\underline{\text{Model 2:}} \hspace{0.25cm} \hat F =\) -1.8535 \(+\) 1.0043 \(P_2\)
New Fitted RTO Models:
\(\underline{\text{Model 1.1:}} \hspace{0.25cm} \hat F =\) 0.9913 \(P_1\)
\(\underline{\text{Model 2.1:}} \hspace{0.25cm} \hat F =\) 0.9822 \(P_2\)
\(\underline{\text{Model 1:}} \hspace{0.25cm} R^2 = 0.8023\)
\(\underline{\text{Model 2:}} \hspace{0.25cm} R^2 = 0.8600\)
\(\underline{\text{Model 1.1:}} \hspace{0.25cm} R^2 = 0.9959\)
\(\underline{\text{Model 2.1:}} \hspace{0.25cm} R^2 = 0.9975\)
\(\underline{\textbf{Conclusion:}} \hspace{0.25cm} P_2\) is a better predictor of \(F\), as it yields a higher \(R^2\) than \(P_1\) in both the SLR models (Model 2 vs. Model 1) and the RTO models (Model 2.1 vs. Model 1.1).
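One caveat when reading these values: for a model without an intercept, `summary()` reports an \(R^2\) based on the uncentered total sum of squares \(\sum y_i^2\) rather than \(\sum (y_i - \bar y)^2\). The RTO values are therefore not on the same scale as the SLR values, although comparing Model 1.1 with Model 2.1 still compares like with like. A sketch on simulated data, where `x`, `y`, and the seed are hypothetical and not taken from Table 3.10:

```r
# Simulated data, for illustration only
set.seed(2)
x <- runif(25, min = 50, max = 100)
y <- -10 + 1.1 * x + rnorm(25, sd = 5)

fit_slr <- lm(y ~ x)       # with intercept
fit_rto <- lm(y ~ 0 + x)   # regression through the origin

# With an intercept: total sum of squares is centered at mean(y)
r2_slr <- 1 - sum(residuals(fit_slr)^2) / sum((y - mean(y))^2)

# Without an intercept: total sum of squares is uncentered
r2_rto <- 1 - sum(residuals(fit_rto)^2) / sum(y^2)

# Both hand-computed values match what summary() reports
all.equal(r2_slr, summary(fit_slr)$r.squared)
all.equal(r2_rto, summary(fit_rto)$r.squared)
```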
Which of the three models with intercepts would you use to predict the final examination scores for a student who scored 78 and 85 on the first and second preliminary examinations, respectively? What is your prediction in this case?
# sumP1$adj.r.squared
# sumP2$adj.r.squared
# sumP1P2$adj.r.squared
Since Model 3 has two predictors while Models 1 and 2 have one each, I will compare adjusted \(R^2\) values, which penalize additional predictors:
\(\underline{\text{Model 1:}} \hspace{0.25cm} R_{\textit{adj}}^2 = 0.7924\)
\(\underline{\text{Model 2:}} \hspace{0.25cm} R_{\textit{adj}}^2 = 0.8530\)
\(\underline{\text{Model 3:}} \hspace{0.25cm} R_{\textit{adj}}^2 = 0.8744\)
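For the two single-predictor models, the adjusted values can be checked by hand from the \(R^2\) values reported earlier, using \(R_{adj}^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}\). The sketch below assumes \(n = 22\), which is inferred from the 20 residual d.o.f. of the SLR models rather than stated in the text; Model 3 is omitted because its unadjusted \(R^2\) is not reported.

```r
# Assumption: n = 22 observations (20 residual d.o.f. + 2 estimated parameters)
n <- 22

# Reported R^2 values and number of predictors for Models 1 and 2
r2 <- c(0.8023, 0.8600)
p  <- c(1, 1)

# Adjusted R^2 penalizes each additional predictor
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)
```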
\(\underline{\textbf{Conclusion:}} \hspace{0.25cm}\) I would use Model 3, which has the highest adjusted \(R^2\): \(\hat F =\) -14.5005 + 0.4883 \(P_1\) + 0.672 \(P_2\)
If \(P_1 = 78\) and \(P_2 = 85\), Model 3 predicts \(\hat F = 80.71\).
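The prediction can be verified directly from the rounded Model 3 coefficients. A minimal sketch, in which the names `beta`, `x_new`, and `f_hat` are illustrative:

```r
# Rounded Model 3 coefficients as reported above
beta <- c(-14.5005, 0.4883, 0.672)

# New observation: 1 for the intercept, then P1 = 78 and P2 = 85
x_new <- c(1, 78, 85)

# Predicted final exam score
f_hat <- sum(beta * x_new)
round(f_hat, 2)
```

With the fitted object from above, `predict(fit_P1P2, newdata = data.frame(P1 = 78, P2 = 85))` should give the same value up to rounding, assuming `P1` and `P2` are the only predictor columns in `table3.10`.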