used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 609750 32.6 1384338 74 NA 715608 38.3
Vcells 1124703 8.6 8388608 64 49152 2010814 15.4
cat("\f") # Clear the consolemtcars
We begin by fitting a multiple regression model predicting miles per gallon (mpg) from the number of cylinders (cyl) and engine displacement (disp) using the built-in mtcars dataset.
Call:
lm(formula = mpg ~ cyl + disp, data = df)
Residuals:
Min 1Q Median 3Q Max
-4.4213 -2.1722 -0.6362 1.1899 7.0516
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.66099 2.54700 13.609 4.02e-14 ***
cyl -1.58728 0.71184 -2.230 0.0337 *
disp -0.02058 0.01026 -2.007 0.0542 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.055 on 29 degrees of freedom
Multiple R-squared: 0.7596, Adjusted R-squared: 0.743
F-statistic: 45.81 on 2 and 29 DF, p-value: 1.058e-09
Residuals are the differences between observed and fitted values: \(e_i = y_i - \hat{y}_i\). A key property of OLS residuals is that they sum to zero (within rounding error).
reg1$residuals Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
-0.84395255 -0.84395255 -3.28885510 1.57324352
Hornet Sportabout Valiant Duster 360 Merc 240D
4.14732774 -2.40601638 -0.25267226 -0.89226849
Merc 230 Merc 280 Merc 280C Merc 450SE
-2.61371193 -2.48751693 -3.88751693 0.11418581
Merc 450SL Merc 450SLC Cadillac Fleetwood Lincoln Continental
1.01418581 -1.08581419 -1.84730532 -2.09430892
Chrysler Imperial Fiat 128 Honda Civic Toyota Corolla
1.79401841 5.70804444 3.64629354 7.05160883
Toyota Corona Dodge Challenger AMC Javelin Camaro Z28
-4.33979314 0.08281514 -0.50535572 -1.45850859
Pontiac Firebird Fiat X1-9 Porsche 914-2 Lotus Europa
5.47067308 0.61421953 0.16432359 4.04561603
Ford Pantera L Ferrari Dino Maserati Bora Volvo 142E
1.06207504 -2.45270705 -0.76710662 -4.42126787
length(reg1$residuals)[1] 32
[1] 0
[1] 270.7403
The t-statistic for a coefficient is its estimate divided by its standard error:
\[t = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}\]
# Manually
-1.58728 / 0.71184[1] -2.229827
# Using stored coefficient
round(x = reg1$coefficients[2] / 0.71184, digits = 3) cyl
-2.23
The two-tailed p-value uses the t-distribution with \(n - (k + 1)\) degrees of freedom, where \(k\) is the number of predictors.
\[p = 2 \times P\!\left(T \leq t_{\text{obs}}\right)\]
2 * pt(q = -2.230, df = 32 - (2 + 1), lower.tail = TRUE)[1] 0.03365088
The residual standard error (RSE) is the square root of the mean squared error:
\[\hat{\sigma} = \sqrt{\frac{\sum e_i^2}{n - (k+1)}} = \sqrt{\frac{SSR}{df_{\text{error}}}}\]
Because OLS residuals have a mean of exactly zero, the deviation from the mean equals the residual itself, simplifying the calculation.
residuals <- as.data.frame(reg1$residuals)
# Standard deviation of residuals (adjusts for df automatically)
sd(as.numeric(residuals$`reg1$residuals`))[1] 2.955259
# Step-by-step manual calculation
residuals$`reg1$residuals` - mean(residuals$`reg1$residuals`) # deviation from mean [1] -0.84395255 -0.84395255 -3.28885510 1.57324352 4.14732774 -2.40601638
[7] -0.25267226 -0.89226849 -2.61371193 -2.48751693 -3.88751693 0.11418581
[13] 1.01418581 -1.08581419 -1.84730532 -2.09430892 1.79401841 5.70804444
[19] 3.64629354 7.05160883 -4.33979314 0.08281514 -0.50535572 -1.45850859
[25] 5.47067308 0.61421953 0.16432359 4.04561603 1.06207504 -2.45270705
[31] -0.76710662 -4.42126787
(residuals$`reg1$residuals` - mean(residuals$`reg1$residuals`))^2 # squared deviations [1] 0.712255903 0.712255903 10.816567877 2.475095181 17.200327418
[6] 5.788914815 0.063843269 0.796143058 6.831490036 6.187740499
[11] 15.112787915 0.013038400 1.028572866 1.178992446 3.412536939
[16] 4.386129847 3.218502069 32.581771340 13.295456586 49.725187054
[21] 18.833804476 0.006858348 0.255384408 2.127247306 29.928263937
[26] 0.377265632 0.027002242 16.367009048 1.128003399 6.015771866
[31] 0.588452574 19.547609556
[1] 270.7403
[1] 9.335872
[1] 3.055466
[1] 3.055466
summary(reg1) # confirm
Call:
lm(formula = mpg ~ cyl + disp, data = df)
Residuals:
Min 1Q Median 3Q Max
-4.4213 -2.1722 -0.6362 1.1899 7.0516
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.66099 2.54700 13.609 4.02e-14 ***
cyl -1.58728 0.71184 -2.230 0.0337 *
disp -0.02058 0.01026 -2.007 0.0542 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.055 on 29 degrees of freedom
Multiple R-squared: 0.7596, Adjusted R-squared: 0.743
F-statistic: 45.81 on 2 and 29 DF, p-value: 1.058e-09
In matrix notation, the OLS model is \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{e}\). The design matrix \(\mathbf{X}\) contains a leading column of ones (for the intercept) followed by the predictor columns.
\(\mathbf{X}\) is the design matrix — an \(n \times (k+1)\) matrix where:
\(\mathbf{X}^\top\mathbf{X}\) is the Sum of Squares and Cross Products (SSCP) matrix — a \((k+1) \times (k+1)\) symmetric matrix where:
\[\mathbf{X}^\top\mathbf{X} = \begin{bmatrix} \sum 1 & \sum x_1 & \sum x_2 \\ \sum x_1 & \sum x_1^2 & \sum x_1 x_2 \\ \sum x_2 & \sum x_1 x_2 & \sum x_2^2 \end{bmatrix}\]
Variations on this theme reveal different matrix interpretations depending on how \(\mathbf{X}\) is scaled:
| Transformation of \(\mathbf{X}\) | \(\frac{1}{n-1}\,\mathbf{X}^\top\mathbf{X}\) produces… | Diagonal | Off-diagonal |
|---|---|---|---|
| Raw \(\mathbf{X}\) | SSCP matrix | Sums of squares | Cross products |
| Centered \(\mathbf{X}\) ÷ \((n-1)\) | Variance–Covariance matrix | Variances | Covariances |
| Standardized \(\mathbf{X}\) ÷ \((n-1)\) | Pearson Correlation matrix | 1s | Correlations |
| Unit-scaled \(\mathbf{X}\) (no ÷) | Cosine Similarity matrix | 1s | Cosine similarities |
To keep things clean, all four transformations use only the predictor columns (no intercept column of 1s), since centering, standardizing, or unit-scaling a constant column is undefined.
\(\mathbf{X}_{\text{raw}}^\top \mathbf{X}_{\text{raw}}\) gives the Sum of Squares and Cross Products (SSCP) matrix — raw sums of squares on the diagonal, raw cross products off the diagonal.
Centering means subtracting the column mean from every entry in that column:
\[x_{ij}^{\text{centered}} = x_{ij} - \bar{x}_j\]
This shifts each variable so it has a mean of zero. When you compute \(\mathbf{X}_c^\top \mathbf{X}_c\) on the centered matrix and divide by \(n - 1\), the diagonal entries become variances and the off-diagonal entries become covariances — identical to cov().
\[\frac{1}{n-1}\,\mathbf{X}_c^\top \mathbf{X}_c = \mathbf{\Sigma} \quad \text{(Variance–Covariance Matrix)}\]
Column means of X_centered (should be ~0):
cyl disp
0.000000e+00 -3.907985e-14
--- Centered X'X / (n-1): Variance-Covariance Matrix ---
print(VarCov) cyl disp
cyl 3.189516 199.6603
disp 199.660282 15360.7998
# Confirm with cov()
cat("\n--- Confirm with cov() ---\n")
--- Confirm with cov() ---
cyl disp
cyl 3.189516 199.6603
disp 199.660282 15360.7998
Standardizing means centering AND dividing by the standard deviation:
\[x_{ij}^{\text{std}} = \frac{x_{ij} - \bar{x}_j}{s_j}\]
This gives each variable a mean of 0 and a standard deviation of 1. Dividing \(\mathbf{X}_s^\top \mathbf{X}_s\) by \(n - 1\) yields the Pearson correlation matrix — 1s on the diagonal, correlations off the diagonal.
\[\frac{1}{n-1}\,\mathbf{X}_s^\top \mathbf{X}_s = \mathbf{R} \quad \text{(Correlation Matrix)}\]
Unit-scaling (L2 normalization) divides each column by its Euclidean norm (length):
\[x_{ij}^{\text{unit}} = \frac{x_{ij}}{\|\mathbf{x}_j\|} \quad \text{where} \quad \|\mathbf{x}_j\| = \sqrt{\sum_i x_{ij}^2}\]
\(\mathbf{X}_u^\top \mathbf{X}_u\) (no division by \(n-1\)) gives the cosine similarity matrix — 1s on the diagonal, cosine similarities off the diagonal.
\[\mathbf{X}_u^\top \mathbf{X}_u = \text{Cosine Similarity Matrix}\]
--- Unit-scaled X'X: Cosine Similarity Matrix ---
print(CosineMat) cyl disp
cyl 1.0000000 0.9656088
disp 0.9656088 1.0000000
\((\mathbf{X}^\top\mathbf{X})^{-1}\) is the matrix inverse of the SSCP matrix. It plays the role of “dividing” in matrix algebra and is central to the OLS solution:
\[\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}\]
XtX_inv <- solve(XtX)
XtX_inv [,1] [,2] [,3]
[1,] 0.694871232 -0.1730656141 1.764992e-03
[2,] -0.173065614 0.0542769093 -7.054934e-04
[3,] 0.001764992 -0.0007054934 1.127006e-05
In practice, OLS is often computed on the centered design matrix \(\mathbf{X}_c\) (predictors mean-centered, intercept column dropped) because centering:
The relationship is:
\[(\mathbf{X}_c^\top \mathbf{X}_c)^{-1} = \left[(n-1)\,\boldsymbol{\Sigma}\right]^{-1} = \frac{1}{n-1}\,\boldsymbol{\Sigma}^{-1}\]
--- Centered X'X ---
print(XtX_c) cyl disp
cyl 98.875 6189.469
disp 6189.469 476184.795
cat("\n--- (Centered X'X)^{-1} ---\n")
--- (Centered X'X)^{-1} ---
print(XtX_c_inv) cyl disp
cyl 0.0542769093 -7.054934e-04
disp -0.0007054934 1.127006e-05
--- (1/(n-1)) * Sigma^{-1}: should match (Centered X'X)^{-1} ---
[,1] [,2]
[1,] 0.0542769093 -7.054934e-04
[2,] -0.0007054934 1.127006e-05
The variance–covariance matrix of the coefficient estimates is:
\[\text{Var}(\hat{\boldsymbol{\beta}}) = \hat{\sigma}^2 \,(\mathbf{X}^\top\mathbf{X})^{-1}\]
where \(\hat{\sigma}^2 = \frac{\sum e_i^2}{n-(k+1)}\) is the estimated error variance (mean squared error). The standard errors are the square roots of the diagonal entries of this matrix.
\[SE(\hat{\beta}_j) = \sqrt{\left[\hat{\sigma}^2 (\mathbf{X}^\top\mathbf{X})^{-1}\right]_{jj}}\]
# Estimated error variance (sigma-hat squared)
sigma2_hat <- sum(reg1$residuals^2) / 29
# Variance-covariance matrix of beta-hat
vcov_beta <- XtX_inv * sigma2_hat
vcov_beta [,1] [,2] [,3]
[1,] 6.48722874 -1.615718386 0.0164777387
[2,] -1.61571839 0.506722267 -0.0065863960
[3,] 0.01647774 -0.006586396 0.0001052158
[1] 2.54700388 0.71184427 0.01025748
# Confirm against summary output
summary(reg1)
Call:
lm(formula = mpg ~ cyl + disp, data = df)
Residuals:
Min 1Q Median 3Q Max
-4.4213 -2.1722 -0.6362 1.1899 7.0516
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.66099 2.54700 13.609 4.02e-14 ***
cyl -1.58728 0.71184 -2.230 0.0337 *
disp -0.02058 0.01026 -2.007 0.0542 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.055 on 29 degrees of freedom
Multiple R-squared: 0.7596, Adjusted R-squared: 0.743
F-statistic: 45.81 on 2 and 29 DF, p-value: 1.058e-09
# Manual checks
sqrt(6.48722874) # SE for intercept[1] 2.547004
sqrt(0.506722267) # SE for cyl[1] 0.7118443
sqrt(0.0001052158) # SE for disp[1] 0.01025748
The values match the standard errors reported by summary(reg1), confirming the matrix derivation.