BANA7052_Group2

BANA7052 Fall 2025 Homework 3

Eli Bales Kazuhide Watanabe

November 11, 2025

Tools & Packages

# Load the dataset and packages
library(dplyr)
library(ggplot2)
library(kableExtra)
library(patchwork)

Problem 1

Alumni Donation Data (Multiple Linear Regression). Continuing with the alumni donation data, fit a multiple linear regression model to the data, where the alumni giving rate (alumni_giving_rate) is the response variable (Y), and the percentage of classes with fewer than 20 students (percent_of_classes_under_20) and student/faculty ratio (student_faculty_ratio) as the predictors.

(a) What is your final estimated model?

alumni <- read.csv('/Users/kazuhidewatanabe/Desktop/BANA_7052/alumni.csv')

alumni_fit <- lm(alumni_giving_rate ~ percent_of_classes_under_20 + student_faculty_ratio, data = alumni)

summary_alumni_fit <- summary(alumni_fit)

Estimated Model

The estimated multiple linear regression model is:

\[\hat{Y} = 39.6556 + 0.1662X_1 + (-1.7021)X_2\]

(b) What is the predicted alumni giving rate for an observation with percent_of_classes_under_20 = 50 and student_faculty_ratio = 10?

Predicted Alumni Giving Rate \(X_1 = 50, X_2 = 10\)

\[\hat{Y} = 39.6556 + 0.1662(50) + (-1.7021)(10)\] \[\hat{Y} = 30.9446\]

(c) Test the statistical significance of the regression coefficient using t-tests; use \(\alpha = 0.05\). Obtain the t-statistics and p-values, interpret the results, make a conclusion (i.e.reject or not reject), and explain why. Note: please explain what the null hypothesis is.

coef_table_alum <- as.data.frame(summary_alumni_fit$coefficients)

names(coef_table_alum) <- c("Estimate", "Std Error", "t Value", "p Value")

coef_table_alum$`p Value` <- format(coef_table_alum$`p Value`, scientific = TRUE, digits = 2)

coef_table_alum %>%
  kbl(
    caption = "<span style='font-size:12pt; white-space:nowrap;'>
    Table 1: Estimated Coefficients for Model Y<sub>2</sub> ~ X<sub>1</sub> + X<sub>2</sub> (Alumni Data)
    </span>",
    align = "c",
    digits = 3
  ) %>%
  kable_classic(full_width = F, html_font = "Cambria") %>%
  column_spec(2, width = "3cm") %>%  
  column_spec(3, width = "3cm") %>%  
  column_spec(4, width = "3cm") %>%
  column_spec(5, width = "3cm")

Table 1: Estimated Coefficients for Model Y₂ ~ X₁ + X₂ (Alumni Data)
	Estimate	Std Error	t Value	p Value
(Intercept)	39.656	13.508	2.936	5.2e-03
percent_of_classes_under_20	0.166	0.163	1.022	3.1e-01
student_faculty_ratio	-1.702	0.442	-3.850	3.7e-04

Significance

The \(X_1\) and \(X_2\) coefficient of this equation differ in significance as the p-value for:

\[ X_1: 0.31 > 0.05 \] \[ X_2: 0.00037 < 0.05 \]

Null & Alternative Hypothesis

For each coefficient \((\beta_i)\), the hypotheses are defined as:

\[ H_0: \beta_1 +\beta_2 = 0 \quad \text{vs.} \quad H_1: \text{At least one } \beta_j \neq 0 \]

\(H_0\): Neither the percentage of small classes nor the student-faculty ratio has any linear relationship with alumni giving rate.

(a) \(X_1\)

\[ t = 1.022, \quad p = 0.31 \]

Don’t reject \(H_0\) → \(X_1\) is not a significant predictor of \(Y\).

This means small class size percentage is not a significant predictor of alumni giving rate in this model, after controlling for student–faculty ratio.

(b) \(X_2\)

\[ t = -3.850, \quad p = 0.00037 \]

Reject \(H_0\) → \(X_2\) is a significant negative predictor of \(Y\).

This indicates a significant negative relationship: as the student–faculty ratio increases (larger classes per teacher), the alumni giving rate tends to decrease.

Conclusion:

The model suggests that student–faculty ratio is a key factor influencing alumni donations, while percent of small classes does not show a significant independent effect.

(d) What is the F statistic? Is it significant? Clearly write out the null hypothesis, F-statistic, and p-value and interpret the test results.

f_table <- data.frame(
  Statistic = "F",
  `F_Value` = 28.79,
  `Numerator df` = 2,
  `Denominator df` = 45,
  `p Value` = "8.866e-09"
)

f_table %>%
  kbl(
    caption = "<span style='font-size:12pt; white-space:nowrap;'>
    Table 2: F-Statistics for Model Y<sub>2</sub> ~ X<sub>1</sub> + X<sub>2</sub> (Alumni Data)
    </span>",
    align = "c",
    digits = 3
  ) %>%
  kable_classic(full_width = F, html_font = "Cambria") %>%
  column_spec(2, width = "3cm") %>%  
  column_spec(3, width = "3cm") %>%  
  column_spec(4, width = "3cm") %>%
  column_spec(5, width = "3cm")

Table 2: F-Statistics for Model Y₂ ~ X₁ + X₂ (Alumni Data)
Statistic	F_Value	Numerator.df	Denominator.df	p.Value
F	28.79	2	45	8.866e-09

Null & Alternative Hypothesis

For each coefficient \((\beta_i)\), the hypotheses are defined as:

\[ H_0: \beta_1 +\beta_2 = 0 \quad \text{vs.} \quad H_1: \text{At least one } \beta_j \neq 0 \]

Interpretation

Because the p-value is extremely small (much less than 0.05), we reject the null hypothesis.

The F-test result indicates that the overall regression model is statistically significant. This means that percent_of_classes_under_20 and student_faculty_ratio, taken together, significantly explain variation in the alumni giving rate. Even though one predictor (\(X_1\)) individually was not significant in its t-test, the joint contribution of both predictors improves the model fit significantly relative to a model with no predictors.

(e) What is the value of the coefficient of determination? Please interpret.

cat(
  " Multiple R-squared:", round(summary_alumni_fit$r.squared, 4), "\n",
  "Adjusted R-squared:", round(summary_alumni_fit$adj.r.squared, 4), "\n"
)

##  Multiple R-squared: 0.5613 
##  Adjusted R-squared: 0.5418

The multiple regression model explains approximately 56.13 percent of the variation in the response variable (R² = 0.5613). After adjusting for the number of predictors, the model accounts for about 54.18 percent of the variation (Adjusted R² = 0.5418). These values indicate that the predictors collectively provide a moderate level of explanatory power. While the model captures more than half of the variability in the outcome, a substantial portion remains unexplained, suggesting that additional predictors or nonlinear relationships may also influence the response.

(f) What is the correlation coefficient \(r_1\) between percent_of_classes_under_20 and alumni_giving_rate and the correlation coefficient \(r_2\) betwen student_faculty_ratio and alumni_giving_rate? Do you see any relationship between \(r_1\), \(r_2\), and \(R^2\)?

r1 <- cor(alumni$percent_of_classes_under_20, alumni$alumni_giving_rate)
r2 <- cor(alumni$student_faculty_ratio, alumni$alumni_giving_rate)

Interpretation of \(r_1\)

\[ r_1 = 0.6456504 \]

This is the correlation between percent_of_classes_under_20 and alumni_giving_rate.

Interpretation

The correlation is positive and moderately strong.
As the percent of small classes increases, the alumni giving rate tends to increase.

Interpretation of \(r_2\)

\[ r_2 = −0.7423975 \]

This is the correlation between student_faculty_ratio and alumni_giving_rate.

Interpretation

The correlation is negative and strong.
As the student-to-faculty ratio increases (meaning more students per instructor → larger classes), the alumni giving rate tends to decrease.

Relationship between \(r_1\) \(r_2\) and \(R^2\)

r_summary_table <- data.frame(
  Statistic = c("Correlation r\u2081 (percent_of_classes_under_20)",
                "Correlation r\u2082 (student_faculty_ratio)",
                "r\u2081\u00B2",
                "r\u2082\u00B2",
                "Model R\u00B2",
                "Adjusted R\u00B2"),
  Value = c(0.6456504,
            -0.7423975,
            0.6456504^2,
            (-0.7423975)^2,
            0.5613,
            0.5418)
)

r_summary_table %>%
  mutate(Value = round(Value, 3)) %>%
  kbl(
    caption = "<span style='font-size:12pt; white-space:nowrap;'>
      Table 3: Relationship between r<sub>1</sub>, r<sub>2</sub>, and R<sup>2</sup>
    </span>",
    digits = 3,
    escape = FALSE
  ) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria") %>%
  column_spec(1, width = "9cm") %>%
  column_spec(2, width = "3cm")

Table 3: Relationship between r₁, r₂, and R²
Statistic	Value
Correlation r₁ (percent_of_classes_under_20)	0.646
Correlation r₂ (student_faculty_ratio)	-0.742
r₁²	0.417
r₂²	0.551
Model R²	0.561
Adjusted R²	0.542

In simple linear regression

\[ R^2 = r^2 \]

But in multiple linear regression the predictors:

can overlap
may explain shared portions of variance
and the model adjusts for that overlap

So:

\(r_2^2\) ≈ 0.551 is the strongest single-predictor
\(r_1^2\) ≈ 0.417 also explains a lot of variation
but the two predictors overlap, so their combined predictive value is not the sum
therefore the model’s \(R^2\) of 0.5613 is slightly above \(r_2^2\), but far below \(r_1^2 + r_2^2\)

Problem 2

Simulation Study (Simple Linear Regression). Assume mean function \(E\left(Y|X\right) = 10 + 5 X_1 - 2X_2\):

(a) Generate \(N = 1000\) observations from the model: \(Y \sim 10 + 5 X_1 - 2X_2 + \varepsilon\) where \(X_1 \sim N\left(\mu = 2, \sigma = 0.1\right)\), \(X_2 \sim N\left(\mu = 0, \sigma = 0.4\right)\), and \(\epsilon \sim N\left(\mu = 0, \sigma = 0.5\right)\).

set.seed(7052)
n <- 1000
X_1 <- rnorm(n, mean = 2, sd = 0.1)
X_2 <- rnorm(n, mean = 0, sd = 0.4)
error <- rnorm(n, mean = 0, sd = 0.5)
Y <- 10 + 5*X_1 - 2*X_2 + error

df <- data.frame(Y, X_1, X_2)

We generated the data by simulating the predictor variable \(X\) and the random error term using the rnorm() function. The response variable \(Y\) was then constructed according to the simple linear regression model:

\[ Y = 10 + 5X_1 + 2X_2 + \varepsilon, \]

where \(\varepsilon \sim N(0, 0.5)\).

(b) Fit a multiple linear regression to the simulated data from part a. What is the estimated prediction equation? Report the estimated coefficients and their standard errors. Are they significant? Clearly write out the null and alternative hypotheses, observed t-statistic(s), p-value(s), and interpret the estimates and test results. What is fitted model’s MSE?

# Fit multiple linear regression with two predictors
model_multi <- lm(Y ~ X_1 + X_2, data = df)

summary_model_multi <- summary(model_multi)

Estimated Model

The estimated simple linear regression model is:

\[\hat{Y} = 10.2640 + 4.8778X_1 + (-2.0389)X_2\]

Estimated Coefficients

coef_table <- as.data.frame(summary_model_multi$coefficients)

names(coef_table) <- c("Estimate", "Std Error", "t Value", "p Value")

coef_table$`p Value` <- format(coef_table$`p Value`, scientific = TRUE, digits = 2)

coef_table %>%
  kbl(
    caption = "<span style='font-size:12pt; white-space:nowrap;'>
    Table 4: Estimated Coefficients for Model Y<sub>2</sub> ~ X<sub>1</sub> + X<sub>2</sub> (n = 1000; Error SD = 0.5)
    </span>",
    align = "c",
    digits = 3
  ) %>%
  kable_classic(full_width = F, html_font = "Cambria") %>%
  column_spec(2, width = "3cm") %>%  
  column_spec(3, width = "3cm") %>%  
  column_spec(4, width = "3cm") %>%
  column_spec(5, width = "3cm")

Table 4: Estimated Coefficients for Model Y₂ ~ X₁ + X₂ (n = 1000; Error SD = 0.5)
	Estimate	Std Error	t Value	p Value
(Intercept)	10.264	0.306	33.538	1.1e-165
X_1	4.878	0.153	31.981	5.0e-155
X_2	-2.039	0.038	-53.156	3.3e-293

Significance

All coefficients of this equation are significant as the p-value for:

\[ X_1: 5.0 \times 10^{-155} < 0.05 \] \[ X_2: 3.3 \times 10^{-293} < 0.05 \]

Null & Alternative Hypothesis

For each coefficient \((\beta_i)\), the hypotheses are defined as:

\[ H_0: \beta_1 +\beta_2 = 0 \quad \text{vs.} \quad H_1: \text{At least one } \beta_j \neq 0 \]

(a) \(X_1\)

\[ t = 31.981, \quad p = 5.0 \times 10^{-155} \]

Reject \(H_0\) → \(X_1\) is a significant positive predictor of \(Y\).

(b) \(X_2\)

\[ t = -53.156, \quad p = 3.3 \times 10^{-293} \]

Reject \(H_0\) → \(X_2\) is a significant negative predictor of \(Y\).

Model Performance

cat("Mean Squared Error:",(mse <- mean(model_multi$residuals^2)))

## Mean Squared Error: 0.2338812

(c) Repeat part b), but re-simulate the data and change the error term to \(\epsilon \sim N\left(\mu = 0, \sigma = 1\right)\) (i.e., standard normal).

# Changing the error sig value to 1
set.seed(7052)
n <- 1000
X_1 <- rnorm(n, mean = 2, sd = 0.1)
X_2 <- rnorm(n, mean = 0, sd = 0.4)
error2 <- rnorm(n, mean = 0, sd = 1)
Y2 <- 10 + 5*X_1 - 2*X_2 + error2

df2 <- data.frame(Y2, X_1, X_2)

# Fit multiple linear regression with two predictors
model_multi2 <- lm(Y2 ~ X_1 + X_2, data = df2)

summary_model_multi2 <- summary(model_multi2)

Estimated Model

The estimated simple linear regression model is:

\[\hat{Y} = 10.5279 + 4.7555X_1 + (-2.0778)X_2\]

Estimated Coefficients

coef_table2 <- as.data.frame(summary_model_multi2$coefficients)

names(coef_table2) <- c("Estimate", "Std Error", "t Value", "p Value")

coef_table2$`p Value` <- format(coef_table2$`p Value`, scientific = TRUE, digits = 3)

coef_table2 %>%
  kbl(
    caption = "<span style='font-size:12pt; white-space:nowrap;'>
    Table 5: Estimated Coefficients for Model Y<sub>2</sub> ~ X<sub>1</sub> - X<sub>2</sub> (n = 1000; Error SD = 1)
    </span>",
    align = "c",
    digits = 3
  ) %>%
  kable_classic(full_width = F, html_font = "Cambria") %>%
  column_spec(2, width = "3cm") %>%  
  column_spec(3, width = "3cm") %>%  
  column_spec(4, width = "3cm") %>%
  column_spec(5, width = "3cm")

Table 5: Estimated Coefficients for Model Y₂ ~ X₁ - X₂ (n = 1000; Error SD = 1)
	Estimate	Std Error	t Value	p Value
(Intercept)	10.528	0.612	17.200	2.91e-58
X_1	4.756	0.305	15.590	3.35e-49
X_2	-2.078	0.077	-27.085	1.56e-121

Significance

All coefficients of this equation are significant as the p-value for:

\[ X_1: 3.35 \times 10^{-49} < 0.05 \] \[ X_2: 1.56 \times 10^{-121} < 0.05 \]

Null & Alternative Hypothesis

For each coefficient \((\beta_i)\), the hypotheses are defined as:

\[ H_0: \beta_1 +\beta_2 = 0 \quad \text{vs.} \quad H_1: \text{At least one } \beta_j \neq 0 \]

(a) \(X_1\)

\[ t = 15.590, \quad p = 3.35 \times 10^{-49} \]

Reject \(H_0\) → \(X_1\) is a significant positive predictor of \(Y\).

(b) \(X_2\)

\[ t = -27.085, \quad p = 1.56 \times 10^{-121} \]

Reject \(H_0\) → \(X_2\) is a significant negative predictor of \(Y\).

Model Performance

cat("Mean Squared Error:",(mse <- mean(model_multi2$residuals^2)))

## Mean Squared Error: 0.9355248

(d) Repeat parts a)–c) using \(n = 400\). What do you conclude? What is the effect to the model parameter estimates when the error variance gets smaller? What is the effect when the sample size gets bigger?

set.seed(7052)
n2 <- 400
X_1_n   <- rnorm(n2, mean = 2, sd = 0.1)
X_2_n   <- rnorm(n2, mean = 0, sd = 0.4)
error_n <- rnorm(n2, mean = 0, sd = 0.5)
Y_n <- 10 + 5*X_1_n - 2*X_2_n + error_n

df_n <- data.frame(Y_n, X_1_n, X_2_n)

# Fit multiple linear regression with two predictors
model_multi_n <- lm(Y_n ~ X_1_n + X_2_n, data = df_n)

summary_model_multi_n <- summary(model_multi_n)

Estimated Model

The estimated multi linear regression model is:

\[\hat{Y} = 10.698 + 4.662X_1 - (-1.958)X_2\]

Estimated Coefficients

coef_table_n <- as.data.frame(summary_model_multi_n$coefficients)

names(coef_table_n) <- c("Estimate", "Std Error", "t Value", "p Value")

coef_table_n$`p Value` <- format(coef_table_n$`p Value`, scientific = TRUE, digits = 3)

coef_table_n %>%
  kbl(
    caption = "<span style='font-size:12pt; white-space:nowrap;'>
    Table 6: Estimated Coefficients for Model Y<sub>2</sub> ~ X<sub>1</sub> + X<sub>2</sub> (n = 400; Error SD = 0.5)
    </span>",
    align = "c",
    digits = 3
  ) %>%
  kable_classic(full_width = F, html_font = "Cambria") %>%
  column_spec(2, width = "3cm") %>%  
  column_spec(3, width = "3cm") %>%  
  column_spec(4, width = "3cm") %>%
  column_spec(5, width = "3cm")

Table 6: Estimated Coefficients for Model Y₂ ~ X₁ + X₂ (n = 400; Error SD = 0.5)
	Estimate	Std Error	t Value	p Value
(Intercept)	10.698	0.521	20.532	2.32e-64
X_1_n	4.662	0.259	18.023	1.72e-53
X_2_n	-1.958	0.065	-30.081	1.99e-104

Significance

All coefficients of this equation are significant as the p-value for:

\[ X_1: 1.72 \times 10^{-53} < 0.05 \] \[ X_2: 1.99 \times 10^{-104} < 0.05 \]

Null & Alternative Hypothesis

For each coefficient \((\beta_i)\), the hypotheses are defined as:

\[ H_0: \beta_1 +\beta_2 = 0 \quad \text{vs.} \quad H_1: \text{At least one } \beta_j \neq 0 \]

(a) \(X_1\)

\[ t = 18.023, \quad p = 1.72 \times 10^{-53} \]

Reject \(H_0\) → \(X_1\) is a significant positive predictor of \(Y\).

(b) \(X_2\)

\[ t = -30.081, \quad p = 1.99 \times 10^{-104} \]

Reject \(H_0\) → \(X_2\) is a significant negative predictor of \(Y\).

Model Performance

cat("Mean Squared Error:",(mse <- mean(model_multi_n$residuals^2)))

## Mean Squared Error: 0.2557567

set.seed(7052)
# Changing the error sig value to 1
n2 <- 400
X_1_n   <- rnorm(n2, mean = 2, sd = 0.1)
X_2_n   <- rnorm(n2, mean = 0, sd = 0.4)
error2_n <- rnorm(n2, mean = 0, sd = 1)
Y2_n <- 10 + 5*X_1_n - 2*X_2_n + error2_n

df2_n <- data.frame(Y2_n, X_1_n, X_2_n)

# Fit multiple linear regression with two predictors
model_multi2_n <- lm(Y2_n ~ X_1_n + X_2_n, data = df2_n)

summary_model_multi2_n <- summary(model_multi2_n)

Estimated Model

The estimated simple linear regression model is:

\[\hat{Y} = 11.3965 + 4.3247X_1 - (-1.9156)X_2\]

Estimated Coefficients

coef_table2_n <- as.data.frame(summary_model_multi2_n$coefficients)

names(coef_table2_n) <- c("Estimate", "Std Error", "t Value", "p Value")

coef_table2_n$`p Value` <- format(coef_table2_n$`p Value`, scientific = TRUE, digits = 3)

coef_table2_n %>%
  kbl(
    caption = "<span style='font-size:12pt; white-space:nowrap;'>
    Table 7: Estimated Coefficients for Model Y<sub>2</sub> ~ X<sub>1</sub> + X<sub>2</sub> (n = 400; Error SD = 1)
    </span>",
    align = "c",
    digits = 3
  ) %>%
  kable_classic(full_width = F, html_font = "Cambria") %>%
  column_spec(2, width = "3cm") %>%  
  column_spec(3, width = "3cm") %>%  
  column_spec(4, width = "3cm") %>%
  column_spec(5, width = "3cm")

Table 7: Estimated Coefficients for Model Y₂ ~ X₁ + X₂ (n = 400; Error SD = 1)
	Estimate	Std Error	t Value	p Value
(Intercept)	11.397	1.042	10.936	1.64e-24
X_1_n	4.325	0.517	8.359	1.08e-15
X_2_n	-1.916	0.130	-14.716	1.98e-39

Significance

All coefficients of this equation are significant as the p-value for:

\[ X_1: 1.08 \times 10^{-15} < 0.05 \] \[ X_2: 1.98 \times 10^{-39} < 0.05 \]

Null & Alternative Hypothesis

For each coefficient \((\beta_i)\), the hypotheses are defined as:

\[ H_0: \beta_1 +\beta_2 = 0 \quad \text{vs.} \quad H_1: \text{At least one } \beta_j \neq 0 \]

(a) \(X_1\)

\[ t = 10.368, \quad p = 1.08 \times 10^{-15} \]

Reject \(H_0\) → \(X_1\) is a significant positive predictor of \(Y\).

(b) \(X_2\)

\[ t = -14.667, \quad p = 1.98 \times 10^{-39} \]

Reject \(H_0\) → \(X_2\) is a significant negative predictor of \(Y\).

Model Performance

cat("Mean Squared Error:",(mse <- mean(model_multi2_n$residuals^2)))

## Mean Squared Error: 1.023027

Effect of Increasing Sample Size (\(n\))

As the sample size \(n\) increases, the Mean Squared Error (MSE) values in our results change only slightly, but this does not mean the model is performing worse.
The MSE in this context represents the model’s estimate of the true variance of the residuals:

\[ \widehat{\sigma}^2_{MSE} = \frac{SSE}{n - p} \]

This value estimates the underlying error variance \(\sigma^2\), not predictive accuracy.
Because \(E[\widehat{\sigma}^2_{MSE}] = \sigma^2\), the expected MSE should remain approximately constant across different \(n\) values.

As \(n\) increases:

The denominator \(n - p\) becomes larger, which makes the MSE estimate more stable and less variable across samples.
The sampling fluctuations in MSE become smaller. The estimate of \(\sigma^2\) becomes more reliable, even if the specific numeric value changes slightly in a single trial.
Thus, a slightly higher MSE at larger \(n\) does not indicate a worse model; it simply reflects random variation in one simulated dataset.

Conceptually, increasing the sample size gives the regression model more information to estimate both coefficients and residual variance accurately.
The model becomes more precise, and the MSE becomes a more consistent estimator of the true error variance.

Effect of Decreasing Error Standard Deviation (\(\sigma\))

The standard deviation of the errors, \(\sigma\), measures the amount of random noise around the regression line.
Since the MSE estimates \(\sigma^2\), any change in \(\sigma\) directly affects the expected MSE.

When \(\sigma\) decreases:

The residuals become smaller and less scattered around the regression line.
The sum of squared errors (\(SSE\)) decreases, and therefore, the MSE also decreases.
Mathematically, \(MSE \propto \sigma^2\); halving \(\sigma\) should reduce the MSE by approximately one-fourth: \[ (0.5)^2 = 0.25(1)^2 \]

This pattern is reflected clearly in our simulation results. When the error standard deviation was \(\sigma = 1\), the MSE values were approximately 0.94–1.02. When the error standard deviation was reduced to \(\sigma = 0.5\), the MSE values dropped to about 0.23–0.26.

The ratio: \[ \frac{0.24}{0.96} \approx 0.25 \]

Conceptually, a smaller \(\sigma\) means less random noise, so the regression line fits the data more tightly.
The smaller the \(\sigma\), the smaller the average squared residuals (MSE), and the more accurate and precise the model becomes.

Conclusion

Increasing \(n\) → the MSE estimate becomes more stable and less variable.
Decreasing \(\sigma\) → the MSE becomes smaller, indicating less noise and a tighter model fit.
Both behaviors align with theoretical expectations for the Mean Squared Error in linear regression.

(e) What about the MSE from each model?

cat(
  " Mean Squared Error (n = 100; Error SD = 0.5):", mean(model_multi$residuals^2), "\n",
  "Mean Squared Error (n = 100; Error SD =  1 ):", mean(model_multi2$residuals^2), "\n",
  "Mean Squared Error (n = 400; Error SD = 0.5):", mean(model_multi_n$residuals^2), "\n",
  "Mean Squared Error (n = 400; Error SD =  1 ):", mean(model_multi2_n$residuals^2), "\n"
)

##  Mean Squared Error (n = 100; Error SD = 0.5): 0.2338812 
##  Mean Squared Error (n = 100; Error SD =  1 ): 0.9355248 
##  Mean Squared Error (n = 400; Error SD = 0.5): 0.2557567 
##  Mean Squared Error (n = 400; Error SD =  1 ): 1.023027

Problem 3

Multiple Linear Regression in Matrix Notation

(a) Write out the multiple linear regression model with normal errors in matrix form.

General form

\[ \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N\!\left(\mathbf{0}, \,\sigma^2 \mathbf{I}_n\right) \]

\(\mathbf{Y}\) is the \(n \times 1\) response vector.
\(\mathbf{X}\) is the \(n \times p\) design matrix (first column often all ones for the intercept).
\(\boldsymbol{\beta}\) is the \(p \times 1\) vector of regression coefficients.
\(\boldsymbol{\varepsilon}\) is the \(n \times 1\) error vector with i.i.d. \(N(0,\sigma^2)\) entries.

Matrix

Let \(p=3\) (intercept, \(X_1\), \(X_2\)). Then

\[ \mathbf{Y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix},\quad \mathbf{X} = \begin{bmatrix} 1 & X_{11} & X_{21} \\ \vdots & \vdots & \vdots \\ 1 & X_{1n} & X_{2n} \end{bmatrix},\quad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix},\quad \boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}. \]

(b) State the model assumptions from part (a)

Assumptions:

The relationship between \(Y\) and the predictors is linear in the parameters, meaning \[ E(Y \mid X) = X\boldsymbol{\beta}. \]

All errors \(\varepsilon_i\) are independent and follow a normal distribution with mean 0 and variance \(\sigma^2\): \[ \varepsilon_i \sim N(0, \sigma^2). \]

Formally, in vector notation, \[ \boldsymbol{\varepsilon} \sim N\!\left(\mathbf{0},\, \sigma^2 \mathbf{I}_n\right). \]

The design matrix \(X\) is full rank, meaning there is no perfect multicollinearity.
If the model has \(p\) predictors, then the model matrix must have rank \(p + 1\) (including the intercept column).

For our two-predictor model (\(p=3\)):

\[ \mathbf{X} = \begin{bmatrix} 1 & X_{11} & X_{21}\\ 1 & X_{12} & X_{22}\\ \vdots & \vdots & \vdots\\ 1 & X_{1n} & X_{2n} \end{bmatrix} \quad\text{(size } n \times 3\text{)}. \]

Then \(\boldsymbol{\beta} = \begin{bmatrix}\beta_0\\ \beta_1\\ \beta_2\end{bmatrix}\), and \(\mathbf{Y} = \begin{bmatrix}Y_1\\ \vdots\\ Y_n\end{bmatrix}\)

(d) What is the least squares estimate of \(\beta\) in part (a)

\[ b = \hat{\boldsymbol{\beta}} = \left(\mathbf{X}^\top \mathbf{X}\right)^{-1}\mathbf{X}^\top \mathbf{Y} \]

(e) What does it mean for the estimate in part d) to be unbiased?

If our least squares estimate from the previous part is unbiased, that means the random “noise” in the data averages out across repeated samples, and OLS does not systematically overestimate or underestimate the true coefficients.

This does not mean that every individual sample produces the exact true values.
Instead, it means that on average, across many samples drawn from the same population, the estimator hits the true parameter.

Formally, the unbiasedness condition is: \[ E\!\left[\hat{\boldsymbol{\beta}} \mid X\right] = \boldsymbol{\beta}, \] or equivalently for each coefficient, \[ E(\hat{\beta}_j) = \beta_j \]