In this week’s discussion, I am using the mtcars dataset in R. This is a cross-sectional dataset that contains observations on 32 different car models from 1974.
Variable Definitions
Dependent Variable (Y): mpg - Miles per gallon (fuel efficiency)
Independent Variables (X):
wt - Weight of the car (in 1000 lbs)
hp - Horsepower
Estimating Equation
The estimating equation for our multiple linear regression is:
mpgi=β0+β1⋅wti+β2⋅hpi+ϵi
# Load datasetdata(mtcars)# Run multivariate regressionmod <-lm(mpg ~ wt + hp, data = mtcars)summary(mod)
Call:
lm(formula = mpg ~ wt + hp, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.941 -1.600 -0.182 1.050 5.854
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.22727 1.59879 23.285 < 2e-16 ***
wt -3.87783 0.63273 -6.129 1.12e-06 ***
hp -0.03177 0.00903 -3.519 0.00145 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.593 on 29 degrees of freedom
Multiple R-squared: 0.8268, Adjusted R-squared: 0.8148
F-statistic: 69.21 on 2 and 29 DF, p-value: 9.109e-12
# Build the design matrix X with intercept, wt, and hpX <-cbind(1, mtcars$wt, mtcars$hp)Y <- mtcars$mpg# Calculate (X'X)^-1 X'YXtX_inv <-solve(t(X) %*% X)XtY <-t(X) %*% Ybeta_hat <- XtX_inv %*% XtY# Display the estimated coefficientscolnames(beta_hat) <-c("Estimate")rownames(beta_hat) <-c("Intercept", "wt", "hp")beta_hat
Estimate
Intercept 37.22727012
wt -3.87783074
hp -0.03177295
# Compute fitted values and residualsY_hat <- X %*% beta_hatresiduals <- Y - Y_hat# Degrees of freedom: n - k (n = obs, k = num parameters including intercept)n <-nrow(X)k <-ncol(X)sigma2_hat <-sum(residuals^2) / (n - k)# Variance-covariance matrix and standard errorsvar_beta_hat <-as.numeric(sigma2_hat) * XtX_invse_beta_hat <-sqrt(diag(var_beta_hat))names(se_beta_hat) <-c("Intercept", "wt", "hp")se_beta_hat
Intercept wt hp
1.59878754 0.63273349 0.00902971
# Compare manual and lm() coefficients # Collect results into a single table (as before)comparison <-data.frame(Term =c("Intercept", "wt", "hp"),Beta_Manual =as.numeric(beta_hat),Beta_lm =coef(mod),SE_Manual = se_beta_hat,SE_lm =summary(mod)$coefficients[, "Std. Error"])# Round only the numeric columnscomparison$Beta_Manual <-round(comparison$Beta_Manual, 4)comparison$Beta_lm <-round(comparison$Beta_lm, 4)comparison$SE_Manual <-round(comparison$SE_Manual, 4)comparison$SE_lm <-round(comparison$SE_lm, 4)print(comparison)
Term Beta_Manual Beta_lm SE_Manual SE_lm
(Intercept) Intercept 37.2273 37.2273 1.5988 1.5988
wt wt -3.8778 -3.8778 0.6327 0.6327
hp hp -0.0318 -0.0318 0.0090 0.0090
Conclusion
As seen above, both the manual matrix algebra and the lm formula results match. They produce not only the same coefficients but also the same standard errors. The logic is that standard errors reflect how sampling variability there is in our beta estimates, based on the data’s spread and how well the model fits. Intuitively, these standard errors measure the amount of uncertainty in the coefficient estimates.