Discussion W2

Introduction

In this week’s discussion, I am using the mtcars dataset in R. This is a cross-sectional dataset that contains observations on 32 different car models from 1974.

Variable Definitions

Dependent Variable (Y): mpg - Miles per gallon (fuel efficiency)

Independent Variables (X):

wt - Weight of the car (in 1000 lbs)
hp - Horsepower

Estimating Equation

The estimating equation for our multiple linear regression is:

mpgi=β0+β1⋅wti+β2⋅hpi+ϵi

# Load dataset
data(mtcars)

# Run multivariate regression
mod <- lm(mpg ~ wt + hp, data = mtcars)
summary(mod)


Call:
lm(formula = mpg ~ wt + hp, data = mtcars)

Residuals:
   Min     1Q Median     3Q    Max 
-3.941 -1.600 -0.182  1.050  5.854 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
wt          -3.87783    0.63273  -6.129 1.12e-06 ***
hp          -0.03177    0.00903  -3.519  0.00145 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.593 on 29 degrees of freedom
Multiple R-squared:  0.8268,    Adjusted R-squared:  0.8148 
F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

# Build the design matrix X with intercept, wt, and hp
X <- cbind(1, mtcars$wt, mtcars$hp)
Y <- mtcars$mpg

# Calculate (X'X)^-1 X'Y
XtX_inv <- solve(t(X) %*% X)
XtY <- t(X) %*% Y
beta_hat <- XtX_inv %*% XtY

# Display the estimated coefficients
colnames(beta_hat) <- c("Estimate")
rownames(beta_hat) <- c("Intercept", "wt", "hp")
beta_hat

             Estimate
Intercept 37.22727012
wt        -3.87783074
hp        -0.03177295

# Compute fitted values and residuals
Y_hat <- X %*% beta_hat
residuals <- Y - Y_hat

# Degrees of freedom: n - k (n = obs, k = num parameters including intercept)
n <- nrow(X)
k <- ncol(X)
sigma2_hat <- sum(residuals^2) / (n - k)

# Variance-covariance matrix and standard errors
var_beta_hat <- as.numeric(sigma2_hat) * XtX_inv
se_beta_hat <- sqrt(diag(var_beta_hat))
names(se_beta_hat) <- c("Intercept", "wt", "hp")
se_beta_hat

 Intercept         wt         hp 
1.59878754 0.63273349 0.00902971

# Compare manual and lm() coefficients 
# Collect results into a single table (as before)
comparison <- data.frame(
  Term = c("Intercept", "wt", "hp"),
  Beta_Manual = as.numeric(beta_hat),
  Beta_lm = coef(mod),
  SE_Manual = se_beta_hat,
  SE_lm = summary(mod)$coefficients[, "Std. Error"]
)

# Round only the numeric columns
comparison$Beta_Manual <- round(comparison$Beta_Manual, 4)
comparison$Beta_lm     <- round(comparison$Beta_lm, 4)
comparison$SE_Manual   <- round(comparison$SE_Manual, 4)
comparison$SE_lm       <- round(comparison$SE_lm, 4)

print(comparison)

                 Term Beta_Manual Beta_lm SE_Manual  SE_lm
(Intercept) Intercept     37.2273 37.2273    1.5988 1.5988
wt                 wt     -3.8778 -3.8778    0.6327 0.6327
hp                 hp     -0.0318 -0.0318    0.0090 0.0090

Conclusion

As seen above, both the manual matrix algebra and the lm formula results match. They produce not only the same coefficients but also the same standard errors. The logic is that standard errors reflect how sampling variability there is in our beta estimates, based on the data’s spread and how well the model fits. Intuitively, these standard errors measure the amount of uncertainty in the coefficient estimates.