2025-10-22

Simple linear regression

\(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\)

Simple linear regression mtcars

Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.5432 -2.3647 -0.1252  1.4096  6.8727 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528,    Adjusted R-squared:  0.7446 
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
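The `Call:` line above gives the model formula, so the output can be reproduced with a short sketch like the one below. The object name `fit` is an illustrative choice, not something taken from the slides.

```r
# mtcars ships with base R (package 'datasets'), so nothing needs to be downloaded.
fit <- lm(mpg ~ wt, data = mtcars)   # 'fit' is an illustrative name

# Print the coefficient table, residual summary, R-squared and F-statistic.
summary(fit)

# Extract just the two estimates (intercept and slope on wt).
coef(fit)
```

Each extra unit of `wt` (1000 lbs in the mtcars documentation) is associated with a drop of about 5.34 mpg in the fitted line.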

Scatter plot (ggplot2): mtcars

Each point is \((x_i, y_i)\), where \(x_i = \mathrm{wt}_i\) and \(y_i = \mathrm{mpg}_i\).
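The plot for this slide is not preserved in the text. A minimal ggplot2 sketch of the same pairs might look like this; the axis labels, title, and object name `p` are assumptions.

```r
library(ggplot2)

# One point per car: x_i = wt_i (weight), y_i = mpg_i (fuel economy).
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)",          # labels are illustrative
       y = "Miles per gallon",
       title = "mtcars: mpg vs wt")

print(p)
```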

Residual math

\[\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\]

\[e_i = y_i - \hat{y}_i\]
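The two formulas can be checked numerically. The sketch below computes fitted values and residuals by hand and compares them with R's built-in extractors; the variable names `b`, `y_hat`, and `e` are illustrative.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Fitted values: intercept plus slope times x_i, using the estimated coefficients.
b <- coef(fit)
y_hat <- b[["(Intercept)"]] + b[["wt"]] * mtcars$wt

# Residuals: observed minus fitted.
e <- mtcars$mpg - y_hat

# Both should agree with the extractor functions (names are dropped before comparing).
all.equal(unname(fitted(fit)), y_hat)
all.equal(unname(residuals(fit)), e)

# Five-number view, comparable to the "Residuals:" block of summary(fit).
quantile(e)
```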

Interactive plot (plot_ly): mtcars
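The interactive widget is not preserved in the text. One way to build a comparable figure with plotly is sketched below, assuming the slide showed the observed points with the fitted line; the trace names, axis titles, and the helper data frame `d` are assumptions.

```r
library(plotly)

fit <- lm(mpg ~ wt, data = mtcars)

# Sort by wt so the fitted line is drawn left to right, then attach fitted values.
d <- mtcars[order(mtcars$wt), ]
d$fitted <- predict(fit, newdata = d)

# Markers for the observations, a line trace for the fitted values.
fig <- plot_ly(d, x = ~wt, y = ~mpg, type = "scatter", mode = "markers",
               name = "Observed") %>%
  add_lines(y = ~fitted, name = "Fitted line") %>%
  layout(xaxis = list(title = "Weight (1000 lbs)"),
         yaxis = list(title = "Miles per gallon"))

fig
```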

Residuals plot (ggplot2): mtcars
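The original figure is not preserved, so its exact form is unknown. A common way to show residuals with ggplot2 is to draw each \(e_i\) as a vertical segment from the point to the fitted line, as in this sketch; the colours, labels, and names `d` and `p_res` are assumptions.

```r
library(ggplot2)

fit <- lm(mpg ~ wt, data = mtcars)

# Attach fitted values and residuals to a copy of the data.
d <- mtcars
d$fitted <- fitted(fit)      # y-hat_i
d$resid  <- residuals(fit)   # e_i = y_i - y-hat_i

p_res <- ggplot(d, aes(x = wt, y = mpg)) +
  geom_segment(aes(xend = wt, yend = fitted), colour = "grey50") +  # one segment per residual
  geom_line(aes(y = fitted), colour = "blue") +                     # fitted regression line
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       title = "Residuals around the fitted line")

print(p_res)
```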

Math for Coefficient Estimation

The slope (\(\hat{\beta}_1\)) and intercept (\(\hat{\beta}_0\)) are estimated by:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})} {\sum_{i=1}^n (x_i - \bar{x})^2} \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

These minimize the sum of squared residuals: \[ S(\beta_0, \beta_1) = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \]
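As a check, the closed-form estimates can be computed directly from the data and compared with `lm()`. This is a sketch; the names `x`, `y`, `b0`, and `b1` are illustrative.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Slope: sum of cross-deviations over sum of squared x-deviations.
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: forces the line through the point (x-bar, y-bar).
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)

# Should match the estimates reported by lm().
coef(lm(mpg ~ wt, data = mtcars))
```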

Math for Variance and Standard Error

Residual variance, with \(n - 2\) degrees of freedom because two coefficients are estimated:

\[ s^2 = \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{n - 2} \]

Standard error of slope:

\[ SE(\hat{\beta}_1) = \sqrt{\frac{s^2}{\sum_{i=1}^n (x_i - \bar{x})^2}} \]

Standard error of intercept:

\[ SE(\hat{\beta}_0) = \sqrt{s^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^n (x_i - \bar{x})^2} \right)} \]
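These three formulas can be evaluated by hand and compared with the `Std. Error` column and the residual standard error in the `summary()` output. A minimal sketch, with illustrative names `s2`, `sxx`, `se_b0`, and `se_b1`:

```r
fit <- lm(mpg ~ wt, data = mtcars)
x <- mtcars$wt
n <- length(x)

# Residual variance: sum of squared residuals over n - 2.
s2 <- sum(residuals(fit)^2) / (n - 2)

# Sum of squared deviations of x, shared by both standard errors.
sxx <- sum((x - mean(x))^2)

se_b1 <- sqrt(s2 / sxx)                          # standard error of the slope
se_b0 <- sqrt(s2 * (1 / n + mean(x)^2 / sxx))    # standard error of the intercept

# sqrt(s2) is the "Residual standard error" line of summary(fit).
c(s = sqrt(s2), se_intercept = se_b0, se_slope = se_b1)

# The t values are the estimates divided by their standard errors.
coef(fit) / c(se_b0, se_b1)
```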

Plot: Fitted vs Actual
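The figure itself is not preserved. A fitted-versus-actual plot can be sketched as below, assuming actual `mpg` on the horizontal axis and fitted `mpg` on the vertical axis; the dashed 45-degree reference line, the labels, and the names `d` and `p_fit` are assumptions.

```r
library(ggplot2)

fit <- lm(mpg ~ wt, data = mtcars)

# Pair each observed value with its fitted value.
d <- data.frame(actual = mtcars$mpg, fitted = unname(fitted(fit)))

p_fit <- ggplot(d, aes(x = actual, y = fitted)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +  # perfect-fit reference
  labs(x = "Actual mpg", y = "Fitted mpg", title = "Fitted vs actual")

print(p_fit)
```

Points close to the dashed line are well predicted. The squared correlation between the two columns equals the model's R-squared.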