OLS: From Scratch using Matrix Algebra in R

1 Introduction

One of the very first learning algorithms that you’ll encounter when studying data science and machine learning is least squares linear regression. Linear regression is one of the easiest learning algorithms to understand; it’s suitable for a wide array of problems, and is already implemented in many programming languages.

Most users are familiar with the lm() function in R, which allows us to perform linear regression quickly and easily. But one drawback to the lm() function is that it takes care of the computations to obtain parameter estimates (and many diagnostic statistics, as well) on its own, leaving the user out of the equation. So, how can we obtain these parameter estimates from scratch?

In this post, I will outline the process from first principles in R. I will use only matrices, vectors, and matrix operations to obtain parameter estimates using the closed-form linear algebraic solution. After reading this post, you’ll see that it’s actually quite simple, and you’ll be able to replicate the process with your own data sets (though using the lm() function is of course much more efficient and robust).

2 Data

The Boston data set from the MASS library contains 14 economic, geographic, and demographic variables for 506 census tracts in the Boston area, collected in the 1970s.

  1. crim: per capita crime rate by town.

  2. zn: proportion of residential land zoned for lots over 25,000 sq.ft.

  3. indus: proportion of non-retail business acres per town.

  4. chas: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).

  5. nox: nitrogen oxides concentration (parts per 10 million).

  6. rm: average number of rooms per dwelling.

  7. age: proportion of owner-occupied units built prior to 1940.

  8. dis: weighted mean of distances to five Boston employment centres.

  9. rad: index of accessibility to radial highways.

  10. tax: full-value property-tax rate per $10,000.

  11. ptratio: pupil-teacher ratio by town.

  12. black: \(1000(Bk-0.63)^2\) where Bk is the proportion of blacks by town.

  13. lstat: lower status of the population (percent).

  14. medv: median value of owner-occupied homes in $1000s.

Our target variable will be the median owner-occupied home value for each tract (in $1000s) — the column titled medv.

2.1 Estimating Equation

Please make sure to type out the estimating equation. Read up on how to do so here.

\[ Median \ Housing \ Price_i = \beta_0 + \beta_1 Crime \ Rate_i + \beta_2 Residential \ Land \ Zoned_i + \beta_3 Charles \ River \ Adjacent_i + ... + \beta_{12} Black_i + \beta_{13} \ Lower \ Socio \ Economic \ Status_i + \epsilon_i \]

It might be a good idea to split and align the equation as well. See the same equation below.

Note that \\ pushes everything after it to the next line, and & aligns everything under it in the code below.

\[\begin{align*} Median \ Housing \ Price_i = & \beta_0 \ + \beta_1 Crime \ Rate_i + \beta_2 \ Residential \ Land \ Zoned_i + \\ & \beta_3 \ Charles \ River \ Adjacent_i + ... + \beta_{12} \ Black_i + \\ & \beta_{13} \ Lower \ Socio \ Economic \ Status_i + \epsilon_i \end{align*}\]

Let’s get an idea of what this data set looks like:

# install.packages("MASS")
library(MASS)
  help(Boston)

str(Boston) # 506 rows and 14 columns.
'data.frame':   506 obs. of  14 variables:
 $ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
 $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
 $ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
 $ chas   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
 $ rm     : num  6.58 6.42 7.18 7 7.15 ...
 $ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
 $ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
 $ rad    : int  1 2 2 3 3 3 5 5 5 5 ...
 $ tax    : num  296 242 242 222 222 222 311 311 311 311 ...
 $ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
 $ black  : num  397 397 393 395 397 ...
 $ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
 $ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ dplyr::select() masks MASS::select()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
glimpse(Boston)
Rows: 506
Columns: 14
$ crim    <dbl> 0.00632, 0.02731, 0.02729, 0.03237, 0.06905, 0.02985, 0.08829,…
$ zn      <dbl> 18.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 1…
$ indus   <dbl> 2.31, 7.07, 7.07, 2.18, 2.18, 2.18, 7.87, 7.87, 7.87, 7.87, 7.…
$ chas    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ nox     <dbl> 0.538, 0.469, 0.469, 0.458, 0.458, 0.458, 0.524, 0.524, 0.524,…
$ rm      <dbl> 6.575, 6.421, 7.185, 6.998, 7.147, 6.430, 6.012, 6.172, 5.631,…
$ age     <dbl> 65.2, 78.9, 61.1, 45.8, 54.2, 58.7, 66.6, 96.1, 100.0, 85.9, 9…
$ dis     <dbl> 4.0900, 4.9671, 4.9671, 6.0622, 6.0622, 6.0622, 5.5605, 5.9505…
$ rad     <int> 1, 2, 2, 3, 3, 3, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,…
$ tax     <dbl> 296, 242, 242, 222, 222, 222, 311, 311, 311, 311, 311, 311, 31…
$ ptratio <dbl> 15.3, 17.8, 17.8, 18.7, 18.7, 18.7, 15.2, 15.2, 15.2, 15.2, 15…
$ black   <dbl> 396.90, 396.90, 392.83, 394.63, 396.90, 394.12, 395.60, 396.90…
$ lstat   <dbl> 4.98, 9.14, 4.03, 2.94, 5.33, 5.21, 12.43, 19.15, 29.93, 17.10…
$ medv    <dbl> 24.0, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15…
# install.packages("psych")
library(psych)

Attaching package: 'psych'

The following objects are masked from 'package:ggplot2':

    %+%, alpha
describe(Boston, skew = FALSE)
        vars   n   mean     sd median    min    max  range   se
crim       1 506   3.61   8.60   0.26   0.01  88.98  88.97 0.38
zn         2 506  11.36  23.32   0.00   0.00 100.00 100.00 1.04
indus      3 506  11.14   6.86   9.69   0.46  27.74  27.28 0.30
chas       4 506   0.07   0.25   0.00   0.00   1.00   1.00 0.01
nox        5 506   0.55   0.12   0.54   0.38   0.87   0.49 0.01
rm         6 506   6.28   0.70   6.21   3.56   8.78   5.22 0.03
age        7 506  68.57  28.15  77.50   2.90 100.00  97.10 1.25
dis        8 506   3.80   2.11   3.21   1.13  12.13  11.00 0.09
rad        9 506   9.55   8.71   5.00   1.00  24.00  23.00 0.39
tax       10 506 408.24 168.54 330.00 187.00 711.00 524.00 7.49
ptratio   11 506  18.46   2.16  19.05  12.60  22.00   9.40 0.10
black     12 506 356.67  91.29 391.44   0.32 396.90 396.58 4.06
lstat     13 506  12.65   7.14  11.36   1.73  37.97  36.24 0.32
medv      14 506  22.53   9.20  21.20   5.00  50.00  45.00 0.41

3 Model

Linear regression typically takes the form

\(y = X \beta + \epsilon\)

where

  • \(y\) is a vector of the response variable,

  • \(X\) is the matrix of our feature variables (sometimes called the ‘design’ matrix),

  • \(\beta\) is a vector of parameters that we want to estimate, and

  • \(\epsilon\) is a vector of error terms.
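
For the Boston data specifically, with 506 tracts and 13 predictors plus an intercept column, the dimensions are

\[ \underset{506 \times 1}{y} \; = \; \underset{506 \times 14}{X} \, \underset{14 \times 1}{\beta} \; + \; \underset{506 \times 1}{\epsilon} \]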

3.1 Setup

# Dependent variable 
y <- as.vector(Boston$medv)

# Matrix of feature variables from Boston (all columns except the last one, medv)
X <- as.matrix(Boston[-ncol(Boston)])

?rep #replicates the values in x. 
# vector of ones with same length as rows in Boston
int <- rep(x = 1, 
           times =  length(y)
           )

?cbind # Take a sequence of vector, matrix or data-frame arguments and combine by columns or rows, respectively.

# Add intercept column to X
X <- cbind(int, X)

remove(int)
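
A quick sanity check that the pieces have the dimensions the closed-form solution expects:

dim(X)      # 506 rows and 14 columns: the intercept plus 13 predictors
length(y)   # 506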

Now that we have our response vector and our X matrix, we can use them to solve for the parameters using the following closed form solution:

\(\beta=(X'X)^{-1}X'y\)

3.2 Derivation

If you are interested in the derivation, check out this OLS Matrix Proof.

A good place to start is also Wikipedia or any good undergraduate textbook on linear regression.
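
For reference, here is a brief sketch of the derivation: minimize the sum of squared residuals with respect to \(\beta\) and set the gradient to zero.

\[\begin{align*} S(\beta) = & \ (y - X\beta)'(y - X\beta) = y'y - 2\beta'X'y + \beta'X'X\beta \\ \frac{\partial S}{\partial \beta} = & -2X'y + 2X'X\beta = 0 \quad \Rightarrow \quad X'X\hat{\beta} = X'y \quad \Rightarrow \quad \hat{\beta} = (X'X)^{-1}X'y \end{align*}\]

The last step requires \(X'X\) to be invertible, i.e., no column of \(X\) may be a perfect linear combination of the others.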

4 Implementation: Matrix Algebra

We can implement this in R using our \(X\) matrix and \(y\) vector.

  • The %*% operator is simply matrix multiplication.
  • The t() function takes the transpose of a matrix, and
  • solve() calculates the inverse of any (invertible) matrix.

\(\beta=(X'X)^{-1}X'y\) can be written as -

# Implement closed-form solution
betas <- solve(t(X) %*% X) %*% t(X) %*% y
 
# Round for easier viewing
betas <- round(x = betas, 
               digits = 2
               )

betas
          [,1]
int      36.46
crim     -0.11
zn        0.05
indus     0.02
chas      2.69
nox     -17.77
rm        3.81
age       0.00
dis      -1.48
rad       0.31
tax      -0.01
ptratio  -0.95
black     0.01
lstat    -0.52

Recall that the response variable was medv - median value of owner-occupied homes in $1000s.

We can find the relationship between the response and any of the feature variables in the same manner, paying careful attention to the units in the data description:

  • We see that an increase of one unit in the number of rooms (rm) is associated with a $3,810 increase in home value.

rm - average number of rooms per dwelling.

  • The coefficient of chas tells us that homes in tracts adjacent to the Charles River (coded as 1) have a median price that is $2,690 higher than homes in tracts that do not border the river (coded as 0), when all other variables are held constant (ceteris paribus).

chas - a dummy variable which takes a value of 0 or 1 depending on whether or not the tract is adjacent to the Charles River.

There is also a more computationally efficient way to obtain the same coefficients: crossprod(X) computes \(X'X\) directly, and solve(A, b) solves the linear system \(A\beta = b\) without explicitly forming the inverse.

# Alternatively, get the same betas in a more computationally efficient manner - 
betas_check <- solve(crossprod(X), crossprod(X,y))

betas == round(x = betas_check, 
               digits = 2
               )
        [,1]
int     TRUE
crim    TRUE
zn      TRUE
indus   TRUE
chas    TRUE
nox     TRUE
rm      TRUE
age     TRUE
dis     TRUE
rad     TRUE
tax     TRUE
ptratio TRUE
black   TRUE
lstat   TRUE

5 Implementation: lm() command

Now let’s verify that these results are in fact correct. We want to compare our results to those produced by the lm() function. Most of you will already know how to do this.

# Linear regression model
lm.mod <- lm(medv ~ ., data=Boston)

# Round for easier viewing
lm.betas <- round(x = lm.mod$coefficients, 
                  digits = 2
                  )

# Create data.frame of results
?data.frame
results <- data.frame(our.results=betas, lm.results=lm.betas)

print(results)
        our.results lm.results
int           36.46      36.46
crim          -0.11      -0.11
zn             0.05       0.05
indus          0.02       0.02
chas           2.69       2.69
nox          -17.77     -17.77
rm             3.81       3.81
age            0.00       0.00
dis           -1.48      -1.48
rad            0.31       0.31
tax           -0.01      -0.01
ptratio       -0.95      -0.95
black          0.01       0.01
lstat         -0.52      -0.52

5.1 Interpretation

The parameter estimates are identical! Now you know how to obtain parameter estimates on your own. Keep in mind that you’re unlikely to favor implementing linear regression in this way over using lm(). The lm() function is very quick, and requires very little code. Using it provides us with a number of diagnostic statistics, including \(R^2\), t-statistics, and the oft-maligned p-values, among others. It also solves for the parameters using QR decomposition, which is more robust than the method I’ve presented here. Nevertheless, I wanted to show one way in which it can be done.
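
As a quick illustration of the QR route (a minimal sketch using base R's qr() and qr.coef(); not the exact internals of lm(), but the same idea), the coefficients can be recovered without explicitly inverting \(X'X\):

# QR-based solution: avoids forming solve(t(X) %*% X) explicitly
qr_betas <- qr.coef(qr(X), y)
round(qr_betas, digits = 2)   # same values as the betas computed earlier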

There are other ways to solve for the parameters in linear regression. For example, gradient descent can be used to obtain parameter estimates when the number of features is extremely large, a situation that can drastically slow solution time when using the closed-form method. Stochastic gradient descent is often used when both the number of features and the number of samples are very large.
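
For illustration, here is a minimal gradient-descent sketch for this data set; the predictors are standardized first so a single learning rate works for every column, and the learning rate and iteration count below are arbitrary, untuned choices.

# Gradient descent on the mean squared error (illustrative sketch only)
X_std <- cbind(int = 1, scale(as.matrix(Boston[, -ncol(Boston)])))  # standardized predictors + intercept
b     <- rep(0, ncol(X_std))   # start all coefficients at zero
alpha <- 0.05                  # learning rate (arbitrary choice)
for (i in 1:10000) {
  grad <- -2 / length(y) * t(X_std) %*% (y - X_std %*% b)  # gradient of the MSE
  b    <- b - alpha * grad
}
round(drop(b), digits = 2)     # approximates the OLS coefficients on the standardized scale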

# Print out base lm command output 
summary(lm.mod)

Call:
lm(formula = medv ~ ., data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.595  -2.730  -0.518   1.777  26.199 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.646e+01  5.103e+00   7.144 3.28e-12 ***
crim        -1.080e-01  3.286e-02  -3.287 0.001087 ** 
zn           4.642e-02  1.373e-02   3.382 0.000778 ***
indus        2.056e-02  6.150e-02   0.334 0.738288    
chas         2.687e+00  8.616e-01   3.118 0.001925 ** 
nox         -1.777e+01  3.820e+00  -4.651 4.25e-06 ***
rm           3.810e+00  4.179e-01   9.116  < 2e-16 ***
age          6.922e-04  1.321e-02   0.052 0.958229    
dis         -1.476e+00  1.995e-01  -7.398 6.01e-13 ***
rad          3.060e-01  6.635e-02   4.613 5.07e-06 ***
tax         -1.233e-02  3.760e-03  -3.280 0.001112 ** 
ptratio     -9.527e-01  1.308e-01  -7.283 1.31e-12 ***
black        9.312e-03  2.686e-03   3.467 0.000573 ***
lstat       -5.248e-01  5.072e-02 -10.347  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.745 on 492 degrees of freedom
Multiple R-squared:  0.7406,    Adjusted R-squared:  0.7338 
F-statistic: 108.1 on 13 and 492 DF,  p-value: < 2.2e-16

6 Supplementary

It is good practice to examine the correlations among your X variables: with perfect (or very high) multicollinearity, the \(X'X\) matrix is singular (or nearly so), so it cannot be inverted and the beta coefficients cannot be computed.
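
To make this concrete, here is a small hypothetical illustration: duplicating (a multiple of) an existing predictor makes \(X'X\) exactly singular, so the closed-form solution breaks down.

# Hypothetical illustration: add a perfectly collinear column (twice crim)
X_bad <- cbind(X, crim2 = 2 * X[, "crim"])
# solve(t(X_bad) %*% X_bad)   # errors: the system is (computationally) singular
qr(t(X_bad) %*% X_bad)$rank   # rank 14 < ncol(X_bad) = 15, so X'X is not invertible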

# correlation coefficient of all variables in the dataframe.  
?cor
corr_matrix <- cor(X,
    use = ("complete.obs"),
    method = c("pearson")
    )
Warning in cor(X, use = ("complete.obs"), method = c("pearson")): the standard
deviation is zero
The warning appears because the intercept column of ones has zero variance; its row and column are dropped (corr_matrix[-1,-1]) before plotting.
# visualise the correlation coefficient for ease
library(ggcorrplot)
?ggcorrplot
# Create a correlation plot
ggcorrplot(corr_matrix[-1,-1], 
           hc.order = TRUE, 
           type = "lower", 
           lab = TRUE,
           )

## CAN GO A BIT FANCY BY MARKING ONLY VARIABLES THAT ARE STATISTICALLY SIGNIFICANTLY CORRELATED WITH EACH OTHER...

# Compute a matrix of correlation p-values (drop the intercept column of X)
p.mat <- cor_pmat(X[, -1])

# Visualize the correlation matrix
ggcorrplot(corr_matrix[-1,-1], 
           hc.order = TRUE, 
           type = "lower", 
           lab = TRUE,
           p.mat = p.mat,     #
           sig.level = 0.01,  #
           insig = c("blank") #
           )

7 Standard Errors Intuition

In the formula \(Var(\hat{\beta})=\sigma^2 (X^{T}X)^{-1}\), both \(\sigma^2\) and \((X^{T}X)^{-1}\) are important components, and they play different roles in understanding the precision of the estimated coefficients in a linear regression model.

  1. \(\sigma^2\): Variance of the Error Term

    • \(\sigma^2\) represents the variance of the error term in the linear regression model \((y=X\beta + \epsilon)\).

    • It quantifies the spread or variability of the errors around the fitted regression line.

    • A higher \(\sigma^2\) indicates higher variability in the data points around the regression line, making it harder to precisely estimate the coefficients.

  2. \({(X^{T}X)}^{-1}\): Inverse of \(X^{T}X\), the matrix product of the transpose of \(X\) with \(X\)

    • \({X^{T}X}\) is a matrix that summarizes the relationships between the independent variables in the dataset.

    • Taking the inverse \({(X^{T}X)}^{-1}\) is a mathematical operation that is used to compute the variance-covariance matrix of the estimated coefficients.

    • The inverse of \({X^{T}X}\) is crucial for understanding how changes in the independent variables impact the precision of the coefficient estimates.

In practice:

  • A smaller \(\sigma^2\) indicates less variability in the errors, making coefficient estimates more precise.

  • A well-conditioned and non-singular \({X^{T}X}\) (i.e., \({(X^{T}X)}^{-1}\) exists) is important for the stability of the estimates. If \({X^{T}X}\) is close to being singular (highly correlated predictors), it may lead to inflated variances and less reliable coefficient estimates.

In summary, both components are crucial, and they interact in determining the precision of the estimated coefficients. A lower \(\sigma^2\) and a well-behaved \({X^{T}X}\) contribute to more precise coefficient estimates. The goal is to have a well-specified model (low error variance) and well-behaved predictors (non-singular \({X^{T}X}\)) for reliable and interpretable regression results.
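
Where does the formula come from? A short sketch, assuming the standard condition \(Var(\epsilon) = \sigma^2 I\) and non-random \(X\):

\[\begin{align*} \hat{\beta} = & \ (X^{T}X)^{-1}X^{T}y = (X^{T}X)^{-1}X^{T}(X\beta + \epsilon) = \beta + (X^{T}X)^{-1}X^{T}\epsilon \\ Var(\hat{\beta}) = & \ (X^{T}X)^{-1}X^{T} \, Var(\epsilon) \, X(X^{T}X)^{-1} = \sigma^2 (X^{T}X)^{-1} \end{align*}\]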

8 Calculating Standard Errors

You can calculate everything else that the lm() command above produced!

# Sample size and number of predictors
N   <- nrow(X)
k   <- ncol(X) - 1 # number of predictor variables (excluding the intercept column)

# Estimated Regression Coefficients
beta_hat    <-    solve( t(X) %*% X )  %*%  t(X) %*% y

# Estimated error variance (residual variance): RSS / (N - k - 1)
sigma_sq  <- 
  residual_variance <- (N-k-1)^-1 * sum((y - X %*% beta_hat)^2)
sigma_sq
[1] 22.51785
residual_std_error  <- sqrt(residual_variance)

# Variance and Std. Error of estimated coefficients of the linear model
var_betaHat       <- sigma_sq * solve(t(X) %*% X)

Let's print out \(Var(\hat{\beta})=\sigma^2 (X^{T}X)^{-1}\) -

round(var_betaHat, 
      digits = 0)
        int crim zn indus chas nox rm age dis rad tax ptratio black lstat
int      26    0  0     0    0 -11 -2   0   0   0   0       0     0     0
crim      0    0  0     0    0   0  0   0   0   0   0       0     0     0
zn        0    0  0     0    0   0  0   0   0   0   0       0     0     0
indus     0    0  0     0    0   0  0   0   0   0   0       0     0     0
chas      0    0  0     0    1   0  0   0   0   0   0       0     0     0
nox     -11    0  0     0    0  15  0   0   0   0   0       0     0     0
rm       -2    0  0     0    0   0  0   0   0   0   0       0     0     0
age       0    0  0     0    0   0  0   0   0   0   0       0     0     0
dis       0    0  0     0    0   0  0   0   0   0   0       0     0     0
rad       0    0  0     0    0   0  0   0   0   0   0       0     0     0
tax       0    0  0     0    0   0  0   0   0   0   0       0     0     0
ptratio   0    0  0     0    0   0  0   0   0   0   0       0     0     0
black     0    0  0     0    0   0  0   0   0   0   0       0     0     0
lstat     0    0  0     0    0   0  0   0   0   0   0       0     0     0
# BETTER FORMATTING 
library(knitr)
library(kableExtra)

Attaching package: 'kableExtra'
The following object is masked from 'package:dplyr':

    group_rows
# Print the table using kable with kableExtra for HTML output
kable(x = var_betaHat,
      format = "html", 
      digits = 2) %>%
  kable_styling(full_width = TRUE)  # Adjust width as needed
int crim zn indus chas nox rm age dis rad tax ptratio black lstat
int 26.05 -0.01 0 0.02 -0.12 -10.64 -1.51 0.01 -0.36 0.10 0 -0.40 0 -0.08
crim -0.01 0.00 0 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0 0.00 0 0.00
zn 0.00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0 0.00 0 0.00
indus 0.02 0.00 0 0.00 -0.01 -0.06 0.00 0.00 0.00 0.00 0 0.00 0 0.00
chas -0.12 0.00 0 -0.01 0.74 -0.11 -0.01 0.00 0.00 -0.01 0 0.01 0 0.00
nox -10.64 0.01 0 -0.06 -0.11 14.59 0.15 -0.01 0.21 -0.04 0 0.16 0 -0.01
rm -1.51 0.00 0 0.00 -0.01 0.15 0.17 0.00 0.01 0.00 0 0.01 0 0.01
age 0.01 0.00 0 0.00 0.00 -0.01 0.00 0.00 0.00 0.00 0 0.00 0 0.00
dis -0.36 0.00 0 0.00 0.00 0.21 0.01 0.00 0.04 0.00 0 0.00 0 0.00
rad 0.10 0.00 0 0.00 -0.01 -0.04 0.00 0.00 0.00 0.00 0 0.00 0 0.00
tax 0.00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0 0.00 0 0.00
ptratio -0.40 0.00 0 0.00 0.01 0.16 0.01 0.00 0.00 0.00 0 0.02 0 0.00
black 0.00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0 0.00 0 0.00
lstat -0.08 0.00 0 0.00 0.00 -0.01 0.01 0.00 0.00 0.00 0 0.00 0 0.00

kable is a function from the knitr package (extended by kableExtra) that is used to create tables in various output formats, primarily within R Markdown documents or when using R for dynamic report generation.

The variances of the estimated coefficients sit on the diagonal of this variance-covariance matrix; taking their square roots gives the usual standard errors.

coeff_std_errors  <- sqrt(diag(var_betaHat))
coeff_std_errors
        int        crim          zn       indus        chas         nox 
5.103458811 0.032864994 0.013727462 0.061495689 0.861579756 3.819743707 
         rm         age         dis         rad         tax     ptratio 
0.417925254 0.013209782 0.199454735 0.066346440 0.003760536 0.130826756 
      black       lstat 
0.002685965 0.050715278 

These should match the standard errors in the lm() output above.
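
A quick way to verify this programmatically, using the lm.mod object fitted earlier (summary(lm.mod)$coefficients stores the standard errors in its "Std. Error" column):

# Compare our hand-computed standard errors against those reported by lm()
lm_std_errors <- summary(lm.mod)$coefficients[, "Std. Error"]
max(abs(coeff_std_errors - lm_std_errors))   # should be essentially zero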

Let's continue deriving the other elements of our hand-written lm() output.

# t values of estimates are the ratio of estimated coefficients to std. errors
t_values <- beta_hat / coeff_std_errors

# p-values of t-statistics of estimated coefficients
p_values_tstat <- 2 * pt(q  = abs(t_values), 
                         df = N-k-1, 
                         lower.tail = FALSE
                         )

# assigning R's significance codes to obtained p-values
signif_codes_match <- function(x){
  ifelse(x <= 0.001,"***",
         ifelse(x <= 0.01,"**",
                ifelse(x < 0.05,"*",
                       ifelse(x < 0.1,"."," ")
                       )
                )
         )
}

signif_codes <- sapply(p_values_tstat, signif_codes_match)

# R-squared and Adjusted R-squared (refer any econometrics / statistics textbook)
R_sq      <- 1 - (N-k-1) * residual_variance / ( N        * mean((y - mean(y))^2) )
R_sq_adj  <- 1 -           residual_variance / ((N/(N-1)) * mean((y - mean(y))^2) )
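
# Equivalently, in textbook notation: R^2 = 1 - RSS/TSS and
# adjusted R^2 = 1 - (RSS/(N-k-1)) / (TSS/(N-1)),
# where RSS and TSS are the residual and total sums of squares computed just below.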

# Residual sum of squares (RSS) for the full model
RSS   <- (N-k-1) * residual_variance
# RSS for the partial model with only intercept (equal to mean), ergo, TSS
RSS0  <- TSS <- sum( (y - mean(y))^2 )

# F statistic comparing the full model with the intercept-only (partial) model
# k = numerator degrees of freedom (number of slope coefficients)
# N - k - 1 = denominator degrees of freedom (residual df of the full model)
F_stat <- ( (RSS0 - RSS)/k)  /  (RSS/(N-k-1) )

# p-values of the F statistic
p_value_F_stat <- pf(q   = F_stat, 
                     df1 = k, 
                     df2 = N-k-1, 
                     lower.tail = FALSE
                     )

# stitch the main results together
lm_results <- as.data.frame(cbind(round(x = beta_hat, digits = 5), 
                                  round(x = coeff_std_errors, digits = 5), 
                                  round(x = t_values, digits = 5), 
                                  round(x = p_values_tstat, digits = 5), 
                                  signif_codes)
                            )
colnames(lm_results) <- c("Estimate",
                          "Std. Error",
                          "t value",
                          "Pr(>|t|)",
                          ""
                          )

### Print out results of all relevant calculations -----------------------
  • This is the output from our hand-written code, replicating the lm() command output.
# print(lm_results)

# Print the table using kable with kableExtra for HTML output
kable(x = lm_results,
      format = "html") %>%
  kable_styling(full_width = TRUE)  # Adjust width as needed
Estimate Std. Error t value Pr(>|t|)
int 36.45949 5.10346 7.14407 0 ***
crim -0.10801 0.03286 -3.28652 0.00109 **
zn 0.04642 0.01373 3.38158 0.00078 ***
indus 0.02056 0.0615 0.33431 0.73829
chas 2.68673 0.86158 3.11838 0.00192 **
nox -17.76661 3.81974 -4.65126 0 ***
rm 3.80987 0.41793 9.11614 0 ***
age 0.00069 0.01321 0.0524 0.95823
dis -1.47557 0.19945 -7.398 0 ***
rad 0.30605 0.06635 4.6129 1e-05 ***
tax -0.01233 0.00376 -3.28001 0.00111 **
ptratio -0.95275 0.13083 -7.28251 0 ***
black 0.00931 0.00269 3.46679 0.00057 ***
lstat -0.52476 0.05072 -10.34715 0 ***
?cat # Outputs the objects, concatenating the representations. 
cat("Residual standard error: ", 
          round(residual_std_error, digits = 3), 
          " on ",N-k-1," degrees of freedom",
          "\nMultiple R-squared: ",R_sq," Adjusted R-squared: ",R_sq_adj,
          "\nF-statistic: ",F_stat, " on ",k," and ",N-k-1,
          " DF,  p-value: ", p_value_F_stat,"\n")
Residual standard error:  4.745  on  492  degrees of freedom 
Multiple R-squared:  0.7406427  Adjusted R-squared:  0.7337897 
F-statistic:  108.0767  on  13  and  492  DF,  p-value:  6.722175e-135 
  • This is the output from the original (built-in) lm command.
summary(lm.mod)

Call:
lm(formula = medv ~ ., data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.595  -2.730  -0.518   1.777  26.199 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.646e+01  5.103e+00   7.144 3.28e-12 ***
crim        -1.080e-01  3.286e-02  -3.287 0.001087 ** 
zn           4.642e-02  1.373e-02   3.382 0.000778 ***
indus        2.056e-02  6.150e-02   0.334 0.738288    
chas         2.687e+00  8.616e-01   3.118 0.001925 ** 
nox         -1.777e+01  3.820e+00  -4.651 4.25e-06 ***
rm           3.810e+00  4.179e-01   9.116  < 2e-16 ***
age          6.922e-04  1.321e-02   0.052 0.958229    
dis         -1.476e+00  1.995e-01  -7.398 6.01e-13 ***
rad          3.060e-01  6.635e-02   4.613 5.07e-06 ***
tax         -1.233e-02  3.760e-03  -3.280 0.001112 ** 
ptratio     -9.527e-01  1.308e-01  -7.283 1.31e-12 ***
black        9.312e-03  2.686e-03   3.467 0.000573 ***
lstat       -5.248e-01  5.072e-02 -10.347  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.745 on 492 degrees of freedom
Multiple R-squared:  0.7406,    Adjusted R-squared:  0.7338 
F-statistic: 108.1 on 13 and 492 DF,  p-value: < 2.2e-16

The output matches, as expected.

9 References