BANA7052 Fall 2025 Homework 3
Eli Bales and Kazuhide Watanabe
November 11, 2025
Tools & Packages
# Load the dataset and packages
library(dplyr)
library(ggplot2)
library(kableExtra)
library(patchwork)
Problem 1
Alumni Donation Data (Multiple Linear Regression). Continuing with the alumni donation data, fit a multiple linear regression model in which the alumni giving rate (alumni_giving_rate) is the response variable (Y), and the percentage of classes with fewer than 20 students (percent_of_classes_under_20) and the student/faculty ratio (student_faculty_ratio) are the predictors.
(a) What is your final estimated model?
alumni <- read.csv('/Users/kazuhidewatanabe/Desktop/BANA_7052/alumni.csv')
alumni_fit <- lm(alumni_giving_rate ~ percent_of_classes_under_20 + student_faculty_ratio, data = alumni)
summary_alumni_fit <- summary(alumni_fit)
The estimated multiple linear regression model is:
\[\hat{Y} = 39.6556 + 0.1662X_1 - 1.7021X_2\]
(b) What is the predicted alumni giving rate for an observation
with percent_of_classes_under_20 = 50 and student_faculty_ratio =
10?
\[\hat{Y} = 39.6556 + 0.1662(50) - 1.7021(10)\] \[\hat{Y} = 30.9446\]
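The same prediction can be obtained directly in R with predict(); a minimal sketch, assuming the fitted object alumni_fit from part (a):

# Predicted giving rate for a school with 50% small classes and a 10:1 student/faculty ratio
new_obs <- data.frame(percent_of_classes_under_20 = 50,
                      student_faculty_ratio = 10)
predict(alumni_fit, newdata = new_obs)  # should return approximately 30.94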
(c) Test the statistical significance of the regression coefficients using t-tests; use \(\alpha = 0.05\). Obtain the t-statistics and p-values, interpret the results, make a conclusion (i.e., reject or do not reject), and explain why. Note: please explain what the null hypothesis is.
coef_table_alum <- as.data.frame(summary_alumni_fit$coefficients)
names(coef_table_alum) <- c("Estimate", "Std Error", "t Value", "p Value")
coef_table_alum$`p Value` <- format(coef_table_alum$`p Value`, scientific = TRUE, digits = 2)
coef_table_alum %>%
kbl(
caption = "<span style='font-size:12pt; white-space:nowrap;'>
Table 1: Estimated Coefficients for Model Y<sub>2</sub> ~ X<sub>1</sub> + X<sub>2</sub> (Alumni Data)
</span>",
align = "c",
digits = 3
) %>%
kable_classic(full_width = F, html_font = "Cambria") %>%
column_spec(2, width = "3cm") %>%
column_spec(3, width = "3cm") %>%
column_spec(4, width = "3cm") %>%
column_spec(5, width = "3cm")
|  | Estimate | Std Error | t Value | p Value |
|---|---|---|---|---|
| (Intercept) | 39.656 | 13.508 | 2.936 | 5.2e-03 |
| percent_of_classes_under_20 | 0.166 | 0.163 | 1.022 | 3.1e-01 |
| student_faculty_ratio | -1.702 | 0.442 | -3.850 | 3.7e-04 |
The \(X_1\) and \(X_2\) coefficients differ in significance, since the p-values are:
\[ X_1: 0.31 > 0.05 \] \[ X_2: 0.00037 < 0.05 \]
For each coefficient \(\beta_j\), the t-test hypotheses are:
\[ H_0: \beta_j = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0 \]
\(H_0\): the predictor in question (percentage of small classes, or student-faculty ratio) has no linear relationship with the alumni giving rate, once the other predictor is accounted for.
(a) \(X_1\)
\[ t = 1.022, \quad p = 0.31 \]
Don’t reject \(H_0\) → \(X_1\) is not a significant predictor of \(Y\).
This means small class size percentage is not a significant predictor of alumni giving rate in this model, after controlling for student–faculty ratio.
(b) \(X_2\)
\[ t = -3.850, \quad p = 0.00037 \]
Reject \(H_0\) → \(X_2\) is a significant negative predictor of \(Y\).
This indicates a significant negative relationship: as the student–faculty ratio increases (larger classes per teacher), the alumni giving rate tends to decrease.
Conclusion:
The model suggests that student–faculty ratio is a key factor influencing alumni donations, while percent of small classes does not show a significant independent effect.
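These conclusions can also be cross-checked with 95% confidence intervals for the coefficients; a minimal sketch using the fitted object alumni_fit (an interval that contains 0 corresponds to failing to reject \(H_0\) at \(\alpha = 0.05\)):

# 95% confidence intervals for the intercept and both slopes
confint(alumni_fit, level = 0.95)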
(d) What is the F statistic? Is it significant? Clearly write
out the null hypothesis, F-statistic, and p-value and interpret the test
results.
f_table <- data.frame(
Statistic = "F",
`F_Value` = 28.79,
`Numerator df` = 2,
`Denominator df` = 45,
`p Value` = "8.866e-09"
)
f_table %>%
kbl(
caption = "<span style='font-size:12pt; white-space:nowrap;'>
Table 2: F-Statistics for Model Y<sub>2</sub> ~ X<sub>1</sub> + X<sub>2</sub> (Alumni Data)
</span>",
align = "c",
digits = 3
) %>%
kable_classic(full_width = F, html_font = "Cambria") %>%
column_spec(2, width = "3cm") %>%
column_spec(3, width = "3cm") %>%
column_spec(4, width = "3cm") %>%
column_spec(5, width = "3cm")
| Statistic | F Value | Numerator df | Denominator df | p Value |
|---|---|---|---|---|
| F | 28.79 | 2 | 45 | 8.866e-09 |
For the overall F-test, the hypotheses are:
\[ H_0: \beta_1 = \beta_2 = 0 \quad \text{vs.} \quad H_1: \text{At least one } \beta_j \neq 0 \]
Interpretation
Because the p-value is extremely small (much less than 0.05), we reject the null hypothesis.
The F-test result indicates that the overall regression model is statistically significant. This means that percent_of_classes_under_20 and student_faculty_ratio, taken together, significantly explain variation in the alumni giving rate. Even though one predictor (\(X_1\)) individually was not significant in its t-test, the joint contribution of both predictors improves the model fit significantly relative to a model with no predictors.
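The F-statistic values in Table 2 were entered manually; for reference, they can also be pulled directly from the summary object. A minimal sketch using summary_alumni_fit from part (a):

# Overall F-statistic with its numerator and denominator degrees of freedom
f_stats <- summary_alumni_fit$fstatistic
f_stats

# p-value for the overall F-test
pf(f_stats["value"], f_stats["numdf"], f_stats["dendf"], lower.tail = FALSE)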
(e) What is the value of the coefficient of determination?
Please interpret.
cat(
" Multiple R-squared:", round(summary_alumni_fit$r.squared, 4), "\n",
"Adjusted R-squared:", round(summary_alumni_fit$adj.r.squared, 4), "\n"
)
## Multiple R-squared: 0.5613
## Adjusted R-squared: 0.5418
The multiple regression model explains approximately 56.13 percent of the variation in the response variable (R² = 0.5613). After adjusting for the number of predictors, the model accounts for about 54.18 percent of the variation (Adjusted R² = 0.5418). These values indicate that the predictors collectively provide a moderate level of explanatory power. While the model captures more than half of the variability in the outcome, a substantial portion remains unexplained, suggesting that additional predictors or nonlinear relationships may also influence the response.
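As a sanity check, \(R^2\) can be recomputed from its definition, \(1 - SSE/SST\); a minimal sketch using alumni_fit and the alumni data:

# R-squared from first principles
sse <- sum(residuals(alumni_fit)^2)
sst <- sum((alumni$alumni_giving_rate - mean(alumni$alumni_giving_rate))^2)
1 - sse / sst  # should match summary_alumni_fit$r.squared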
(f) What is the correlation coefficient \(r_1\) between percent_of_classes_under_20 and alumni_giving_rate and the correlation coefficient \(r_2\) between student_faculty_ratio and alumni_giving_rate? Do you see any relationship between \(r_1\), \(r_2\), and \(R^2\)?
r1 <- cor(alumni$percent_of_classes_under_20, alumni$alumni_giving_rate)
r2 <- cor(alumni$student_faculty_ratio, alumni$alumni_giving_rate)
\[ r_1 = 0.6456504 \]
This is the correlation between percent_of_classes_under_20 and alumni_giving_rate: schools with a larger share of small classes tend to have higher giving rates.
\[ r_2 = -0.7423975 \]
This is the correlation between student_faculty_ratio and alumni_giving_rate: schools with a higher student-faculty ratio tend to have lower giving rates.
r_summary_table <- data.frame(
Statistic = c("Correlation r\u2081 (percent_of_classes_under_20)",
"Correlation r\u2082 (student_faculty_ratio)",
"r\u2081\u00B2",
"r\u2082\u00B2",
"Model R\u00B2",
"Adjusted R\u00B2"),
Value = c(0.6456504,
-0.7423975,
0.6456504^2,
(-0.7423975)^2,
0.5613,
0.5418)
)
r_summary_table %>%
mutate(Value = round(Value, 3)) %>%
kbl(
caption = "<span style='font-size:12pt; white-space:nowrap;'>
Table 3: Relationship between r<sub>1</sub>, r<sub>2</sub>, and R<sup>2</sup>
</span>",
digits = 3,
escape = FALSE
) %>%
kable_classic(full_width = FALSE, html_font = "Cambria") %>%
column_spec(1, width = "9cm") %>%
column_spec(2, width = "3cm")
| Statistic | Value |
|---|---|
| Correlation r₁ (percent_of_classes_under_20) | 0.646 |
| Correlation r₂ (student_faculty_ratio) | -0.742 |
| r₁² | 0.417 |
| r₂² | 0.551 |
| Model R² | 0.561 |
| Adjusted R² | 0.542 |
In simple linear regression,
\[ R^2 = r^2. \]
In multiple linear regression, however, the predictors share information with each other, so their individual contributions overlap. As a result,
\[ R^2 \neq r_1^2 + r_2^2. \]
Here \(r_1^2 + r_2^2 = 0.417 + 0.551 = 0.968\), which far exceeds the model \(R^2 = 0.561\): the squared simple correlations count the overlapping explained variation twice, while \(R^2\) counts it only once.
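This can be checked numerically; a minimal sketch reusing r1, r2, and the fitted model from above (the last line simply reports how strongly the two predictors are correlated with each other):

r1^2 + r2^2                    # about 0.97, well above the model R-squared
summary_alumni_fit$r.squared   # about 0.56
cor(alumni$percent_of_classes_under_20, alumni$student_faculty_ratio)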
Problem 2
Simulation Study (Simple Linear Regression). Assume mean function \(E\left(Y|X\right) = 10 + 5 X_1 - 2X_2\):
(a) Generate \(N = 1000\)
observations from the model: \(Y \sim 10 + 5
X_1 - 2X_2 + \varepsilon\) where \(X_1
\sim N\left(\mu = 2, \sigma = 0.1\right)\), \(X_2 \sim N\left(\mu = 0, \sigma =
0.4\right)\), and \(\epsilon \sim
N\left(\mu = 0, \sigma = 0.5\right)\).
set.seed(7052)
n <- 1000
X_1 <- rnorm(n, mean = 2, sd = 0.1)
X_2 <- rnorm(n, mean = 0, sd = 0.4)
error <- rnorm(n, mean = 0, sd = 0.5)
Y <- 10 + 5*X_1 - 2*X_2 + error
df <- data.frame(Y, X_1, X_2)
We generated the data by simulating the predictor variables \(X_1\) and \(X_2\) and the random error term using the rnorm() function. The response variable \(Y\) was then constructed according to the multiple linear regression model:
\[ Y = 10 + 5X_1 - 2X_2 + \varepsilon, \]
where \(\varepsilon \sim N(\mu = 0, \sigma = 0.5)\).
(b) Fit a multiple linear regression to the simulated data from
part a. What is the estimated prediction equation? Report the estimated
coefficients and their standard errors. Are they significant? Clearly
write out the null and alternative hypotheses, observed t-statistic(s),
p-value(s), and interpret the estimates and test results. What is the fitted model's MSE?
# Fit multiple linear regression with two predictors
model_multi <- lm(Y ~ X_1 + X_2, data = df)
summary_model_multi <- summary(model_multi)
The estimated multiple linear regression model is:
\[\hat{Y} = 10.2640 + 4.8778X_1 - 2.0389X_2\]
coef_table <- as.data.frame(summary_model_multi$coefficients)
names(coef_table) <- c("Estimate", "Std Error", "t Value", "p Value")
coef_table$`p Value` <- format(coef_table$`p Value`, scientific = TRUE, digits = 2)
coef_table %>%
kbl(
caption = "<span style='font-size:12pt; white-space:nowrap;'>
Table 4: Estimated Coefficients for Model Y<sub>2</sub> ~ X<sub>1</sub> + X<sub>2</sub> (n = 1000; Error SD = 0.5)
</span>",
align = "c",
digits = 3
) %>%
kable_classic(full_width = F, html_font = "Cambria") %>%
column_spec(2, width = "3cm") %>%
column_spec(3, width = "3cm") %>%
column_spec(4, width = "3cm") %>%
column_spec(5, width = "3cm")
|  | Estimate | Std Error | t Value | p Value |
|---|---|---|---|---|
| (Intercept) | 10.264 | 0.306 | 33.538 | 1.1e-165 |
| X_1 | 4.878 | 0.153 | 31.981 | 5.0e-155 |
| X_2 | -2.039 | 0.038 | -53.156 | 3.3e-293 |
All coefficients of this equation are significant; for the slopes, the p-values satisfy:
\[ X_1: 5.0 \times 10^{-155} < 0.05 \] \[ X_2: 3.3 \times 10^{-293} < 0.05 \]
For each coefficient \(\beta_j\), the t-test hypotheses are:
\[ H_0: \beta_j = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0 \]
(a) \(X_1\)
\[ t = 31.981, \quad p = 5.0 \times 10^{-155} \]
Reject \(H_0\) → \(X_1\) is a significant positive predictor of \(Y\).
(b) \(X_2\)
\[ t = -53.156, \quad p = 3.3 \times 10^{-293} \]
Reject \(H_0\) → \(X_2\) is a significant negative predictor of \(Y\).
cat("Mean Squared Error:",(mse <- mean(model_multi$residuals^2)))
## Mean Squared Error: 0.2338812
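One detail worth noting: mean(model_multi$residuals^2) divides the residual sum of squares by \(n\), whereas the MSE used for inference divides by \(n - p\). With \(n = 1000\) and \(p = 3\) the difference is negligible, but the two can be compared directly; a minimal sketch:

# SSE/n, as reported above
mean(model_multi$residuals^2)

# SSE/(n - p), the residual variance estimate used by lm()
sigma(model_multi)^2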
(c) Repeat part b), but re-simulate the data and change the
error term to \(\epsilon \sim N\left(\mu = 0,
\sigma = 1\right)\) (i.e., standard normal).
# Changing the error sig value to 1
set.seed(7052)
n <- 1000
X_1 <- rnorm(n, mean = 2, sd = 0.1)
X_2 <- rnorm(n, mean = 0, sd = 0.4)
error2 <- rnorm(n, mean = 0, sd = 1)
Y2 <- 10 + 5*X_1 - 2*X_2 + error2
df2 <- data.frame(Y2, X_1, X_2)
# Fit multiple linear regression with two predictors
model_multi2 <- lm(Y2 ~ X_1 + X_2, data = df2)
summary_model_multi2 <- summary(model_multi2)
The estimated multiple linear regression model is:
\[\hat{Y} = 10.5279 + 4.7555X_1 - 2.0778X_2\]
coef_table2 <- as.data.frame(summary_model_multi2$coefficients)
names(coef_table2) <- c("Estimate", "Std Error", "t Value", "p Value")
coef_table2$`p Value` <- format(coef_table2$`p Value`, scientific = TRUE, digits = 3)
coef_table2 %>%
kbl(
caption = "<span style='font-size:12pt; white-space:nowrap;'>
Table 5: Estimated Coefficients for Model Y<sub>2</sub> ~ X<sub>1</sub> + X<sub>2</sub> (n = 1000; Error SD = 1)
</span>",
align = "c",
digits = 3
) %>%
kable_classic(full_width = F, html_font = "Cambria") %>%
column_spec(2, width = "3cm") %>%
column_spec(3, width = "3cm") %>%
column_spec(4, width = "3cm") %>%
column_spec(5, width = "3cm")
|  | Estimate | Std Error | t Value | p Value |
|---|---|---|---|---|
| (Intercept) | 10.528 | 0.612 | 17.200 | 2.91e-58 |
| X_1 | 4.756 | 0.305 | 15.590 | 3.35e-49 |
| X_2 | -2.078 | 0.077 | -27.085 | 1.56e-121 |
All coefficients of this equation remain significant; for the slopes, the p-values satisfy:
\[ X_1: 3.35 \times 10^{-49} < 0.05 \] \[ X_2: 1.56 \times 10^{-121} < 0.05 \]
For each coefficient \(\beta_j\), the t-test hypotheses are:
\[ H_0: \beta_j = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0 \]
(a) \(X_1\)
\[ t = 15.590, \quad p = 3.35 \times 10^{-49} \]
Reject \(H_0\) → \(X_1\) is a significant positive predictor of \(Y\).
(b) \(X_2\)
\[ t = -27.085, \quad p = 1.56 \times 10^{-121} \]
Reject \(H_0\) → \(X_2\) is a significant negative predictor of \(Y\).
cat("Mean Squared Error:",(mse <- mean(model_multi2$residuals^2)))
## Mean Squared Error: 0.9355248
(d) Repeat parts a)–c) using \(n = 400\). What do you conclude? What is the effect on the model parameter estimates when the error variance gets smaller? What is the effect when the sample size gets bigger?
set.seed(7052)
n2 <- 400
X_1_n <- rnorm(n2, mean = 2, sd = 0.1)
X_2_n <- rnorm(n2, mean = 0, sd = 0.4)
error_n <- rnorm(n2, mean = 0, sd = 0.5)
Y_n <- 10 + 5*X_1_n - 2*X_2_n + error_n
df_n <- data.frame(Y_n, X_1_n, X_2_n)
# Fit multiple linear regression with two predictors
model_multi_n <- lm(Y_n ~ X_1_n + X_2_n, data = df_n)
summary_model_multi_n <- summary(model_multi_n)
The estimated multiple linear regression model is:
\[\hat{Y} = 10.698 + 4.662X_1 - 1.958X_2\]
coef_table_n <- as.data.frame(summary_model_multi_n$coefficients)
names(coef_table_n) <- c("Estimate", "Std Error", "t Value", "p Value")
coef_table_n$`p Value` <- format(coef_table_n$`p Value`, scientific = TRUE, digits = 3)
coef_table_n %>%
kbl(
caption = "<span style='font-size:12pt; white-space:nowrap;'>
Table 6: Estimated Coefficients for Model Y<sub>2</sub> ~ X<sub>1</sub> + X<sub>2</sub> (n = 400; Error SD = 0.5)
</span>",
align = "c",
digits = 3
) %>%
kable_classic(full_width = F, html_font = "Cambria") %>%
column_spec(2, width = "3cm") %>%
column_spec(3, width = "3cm") %>%
column_spec(4, width = "3cm") %>%
column_spec(5, width = "3cm")
|  | Estimate | Std Error | t Value | p Value |
|---|---|---|---|---|
| (Intercept) | 10.698 | 0.521 | 20.532 | 2.32e-64 |
| X_1_n | 4.662 | 0.259 | 18.023 | 1.72e-53 |
| X_2_n | -1.958 | 0.065 | -30.081 | 1.99e-104 |
All coefficients of this equation are significant; for the slopes, the p-values satisfy:
\[ X_1: 1.72 \times 10^{-53} < 0.05 \] \[ X_2: 1.99 \times 10^{-104} < 0.05 \]
For each coefficient \(\beta_j\), the t-test hypotheses are:
\[ H_0: \beta_j = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0 \]
(a) \(X_1\)
\[ t = 18.023, \quad p = 1.72 \times 10^{-53} \]
Reject \(H_0\) → \(X_1\) is a significant positive predictor of \(Y\).
(b) \(X_2\)
\[ t = -30.081, \quad p = 1.99 \times 10^{-104} \]
Reject \(H_0\) → \(X_2\) is a significant negative predictor of \(Y\).
cat("Mean Squared Error:",(mse <- mean(model_multi_n$residuals^2)))
## Mean Squared Error: 0.2557567
set.seed(7052)
# Changing the error sig value to 1
n2 <- 400
X_1_n <- rnorm(n2, mean = 2, sd = 0.1)
X_2_n <- rnorm(n2, mean = 0, sd = 0.4)
error2_n <- rnorm(n2, mean = 0, sd = 1)
Y2_n <- 10 + 5*X_1_n - 2*X_2_n + error2_n
df2_n <- data.frame(Y2_n, X_1_n, X_2_n)
# Fit multiple linear regression with two predictors
model_multi2_n <- lm(Y2_n ~ X_1_n + X_2_n, data = df2_n)
summary_model_multi2_n <- summary(model_multi2_n)
The estimated multiple linear regression model is:
\[\hat{Y} = 11.3965 + 4.3247X_1 - 1.9156X_2\]
coef_table2_n <- as.data.frame(summary_model_multi2_n$coefficients)
names(coef_table2_n) <- c("Estimate", "Std Error", "t Value", "p Value")
coef_table2_n$`p Value` <- format(coef_table2_n$`p Value`, scientific = TRUE, digits = 3)
coef_table2_n %>%
kbl(
caption = "<span style='font-size:12pt; white-space:nowrap;'>
Table 7: Estimated Coefficients for Model Y<sub>2</sub> ~ X<sub>1</sub> + X<sub>2</sub> (n = 400; Error SD = 1)
</span>",
align = "c",
digits = 3
) %>%
kable_classic(full_width = F, html_font = "Cambria") %>%
column_spec(2, width = "3cm") %>%
column_spec(3, width = "3cm") %>%
column_spec(4, width = "3cm") %>%
column_spec(5, width = "3cm")
|  | Estimate | Std Error | t Value | p Value |
|---|---|---|---|---|
| (Intercept) | 11.397 | 1.042 | 10.936 | 1.64e-24 |
| X_1_n | 4.325 | 0.517 | 8.359 | 1.08e-15 |
| X_2_n | -1.916 | 0.130 | -14.716 | 1.98e-39 |
All coefficients of this equation remain significant; for the slopes, the p-values satisfy:
\[ X_1: 1.08 \times 10^{-15} < 0.05 \] \[ X_2: 1.98 \times 10^{-39} < 0.05 \]
For each coefficient \(\beta_j\), the t-test hypotheses are:
\[ H_0: \beta_j = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0 \]
(a) \(X_1\)
\[ t = 8.359, \quad p = 1.08 \times 10^{-15} \]
Reject \(H_0\) → \(X_1\) is a significant positive predictor of \(Y\).
(b) \(X_2\)
\[ t = -14.716, \quad p = 1.98 \times 10^{-39} \]
Reject \(H_0\) → \(X_2\) is a significant negative predictor of \(Y\).
cat("Mean Squared Error:",(mse <- mean(model_multi2_n$residuals^2)))
## Mean Squared Error: 1.023027
As the sample size \(n\) increases, the MSE values in our results change only slightly; this does not mean that the model fitted to more data performs any worse.
The MSE in this context represents the model’s estimate of the true
variance of the residuals:
\[ \widehat{\sigma}^2_{MSE} = \frac{SSE}{n - p} \]
This value estimates the underlying error variance \(\sigma^2\), not predictive accuracy.
Because \(E[\widehat{\sigma}^2_{MSE}] =
\sigma^2\), the expected MSE should remain approximately constant
across different \(n\) values.
As \(n\) increases, the standard errors of the coefficient estimates shrink (compare the n = 1000 and n = 400 tables), while the observed MSE stays close to the true error variance \(\sigma^2\).
Conceptually, increasing the sample size gives the regression model
more information to estimate both coefficients and residual variance
accurately.
The model becomes more precise, and the MSE becomes a
more consistent estimator of the true error
variance.
The standard deviation of the errors, \(\sigma\), measures the amount of random
noise around the regression line.
Since the MSE estimates \(\sigma^2\),
any change in \(\sigma\) directly
affects the expected MSE.
When \(\sigma\) decreases, the expected MSE falls in proportion to \(\sigma^2\). This pattern is reflected clearly in our simulation results: when the error standard deviation was \(\sigma = 1\), the MSE values were approximately 0.94–1.02, and when it was reduced to \(\sigma = 0.5\), the MSE values dropped to about 0.23–0.26. The ratio of the two MSE levels,
\[ \frac{0.24}{0.96} \approx 0.25, \]
matches the ratio of the error variances, \((0.5/1)^2 = 0.25\).
Conceptually, a smaller \(\sigma\)
means less random noise, so the regression line fits the data more
tightly.
The smaller the \(\sigma\), the smaller
the average squared residuals (MSE), and the more accurate and
precise the model becomes.
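A compact way to see this scaling is to re-simulate the same model at several error SDs and compare the resulting MSEs; a minimal sketch (the chosen SD values 0.25, 0.5, and 1 are arbitrary):

set.seed(7052)
sapply(c(0.25, 0.5, 1), function(sig) {
  x1 <- rnorm(1000, mean = 2, sd = 0.1)
  x2 <- rnorm(1000, mean = 0, sd = 0.4)
  y  <- 10 + 5*x1 - 2*x2 + rnorm(1000, mean = 0, sd = sig)
  mean(lm(y ~ x1 + x2)$residuals^2)  # each value should be close to sig^2
})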
(e) What about the MSE from each model?
cat(
" Mean Squared Error (n = 1000; Error SD = 0.5):", mean(model_multi$residuals^2), "\n",
"Mean Squared Error (n = 1000; Error SD = 1 ):", mean(model_multi2$residuals^2), "\n",
"Mean Squared Error (n = 400; Error SD = 0.5):", mean(model_multi_n$residuals^2), "\n",
"Mean Squared Error (n = 400; Error SD = 1 ):", mean(model_multi2_n$residuals^2), "\n"
)
## Mean Squared Error (n = 1000; Error SD = 0.5): 0.2338812
## Mean Squared Error (n = 1000; Error SD = 1 ): 0.9355248
## Mean Squared Error (n = 400; Error SD = 0.5): 0.2557567
## Mean Squared Error (n = 400; Error SD = 1 ): 1.023027
Consistent with the discussion above, the MSE tracks the error variance \(\sigma^2\) (roughly 0.25 when \(\sigma = 0.5\) and roughly 1 when \(\sigma = 1\)) and is essentially unaffected by the sample size.
Problem 3
Multiple Linear Regression in Matrix Notation
(a) Write out the multiple linear regression model with normal
errors in matrix form.
General form
\[ \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N\!\left(\mathbf{0}, \,\sigma^2 \mathbf{I}_n\right) \]
Matrix
Let \(p=3\) (intercept, \(X_1\), \(X_2\)). Then
\[ \mathbf{Y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix},\quad \mathbf{X} = \begin{bmatrix} 1 & X_{11} & X_{21} \\ \vdots & \vdots & \vdots \\ 1 & X_{1n} & X_{2n} \end{bmatrix},\quad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix},\quad \boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}. \]
(b) State the model assumptions from part (a)
Assumptions:
- Linearity: the relationship between \(Y\) and the predictors is linear in the parameters, meaning \[ E(Y \mid X) = X\boldsymbol{\beta}. \]
- Independent, normally distributed errors with constant variance: each \(\varepsilon_i \sim N(0, \sigma^2)\), or, in vector notation, \[ \boldsymbol{\varepsilon} \sim N\!\left(\mathbf{0},\, \sigma^2 \mathbf{I}_n\right). \]
- The design matrix \(X\) is full rank, meaning there is no perfect multicollinearity. If the model has \(p\) predictors, the model matrix must have rank \(p + 1\) (including the intercept column).
(c) Write out the model matrix \(X\) for the model in part (a)
For our two-predictor model (\(p=3\)):
\[ \mathbf{X} = \begin{bmatrix} 1 & X_{11} & X_{21}\\ 1 & X_{12} & X_{22}\\ \vdots & \vdots & \vdots\\ 1 & X_{1n} & X_{2n} \end{bmatrix} \quad\text{(size } n \times 3\text{)}. \]
Then \(\boldsymbol{\beta} = \begin{bmatrix}\beta_0\\ \beta_1\\ \beta_2\end{bmatrix}\), and \(\mathbf{Y} = \begin{bmatrix}Y_1\\ \vdots\\ Y_n\end{bmatrix}\)
(d) What is the least squares estimate of \(\beta\) in part (a)
\[ b = \hat{\boldsymbol{\beta}} = \left(\mathbf{X}^\top \mathbf{X}\right)^{-1}\mathbf{X}^\top \mathbf{Y} \]
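This formula can be verified numerically against lm(); a minimal sketch using the alumni model from Problem 1:

# Model matrix: intercept column plus the two predictors
X <- model.matrix(~ percent_of_classes_under_20 + student_faculty_ratio, data = alumni)
y <- alumni$alumni_giving_rate

# b = (X'X)^{-1} X'y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
beta_hat  # should match coef(alumni_fit)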
(e) What does it mean for the estimate in part d) to be
unbiased?
If our least squares estimate from the previous part is unbiased, that means the random “noise” in the data averages out across repeated samples, and OLS does not systematically overestimate or underestimate the true coefficients.
This does not mean that every individual sample
produces the exact true values.
Instead, it means that on average, across many samples drawn
from the same population, the estimator hits the true parameter.
Formally, the unbiasedness condition is: \[ E\!\left[\hat{\boldsymbol{\beta}} \mid X\right] = \boldsymbol{\beta}, \] or equivalently for each coefficient, \[ E(\hat{\beta}_j) = \beta_j \]
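A small repeated-sampling check of this property, using the Problem 2 data-generating model; a minimal sketch (the 500 replications and sample size of 200 are arbitrary choices):

set.seed(7052)
beta_hats <- replicate(500, {
  x1 <- rnorm(200, mean = 2, sd = 0.1)
  x2 <- rnorm(200, mean = 0, sd = 0.4)
  y  <- 10 + 5*x1 - 2*x2 + rnorm(200, mean = 0, sd = 0.5)
  coef(lm(y ~ x1 + x2))
})
rowMeans(beta_hats)  # averages should be close to the true values (10, 5, -2)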