2025-11-15

Simple Linear Regression

A basic statistical method for modeling the relationship between two variables.

The Goal

We want to understand how a predictor \(x\) influences a response variable \(y\), so we model the relationship between them with a straight line.

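Mathematically: \[ y = \beta_0 + \beta_1 x + \varepsilon \]

Here \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\varepsilon\) is a random error term.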

Least Squares Estimation

To estimate the best-fitting line, we choose \(\beta_0\) and \(\beta_1\) to minimize the sum of squared residuals, \(\sum (y_i - \beta_0 - \beta_1 x_i)^2\).

Closed-form estimates: \[ \hat{\beta}_1 = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2} \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
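
The two formulas above can be checked by hand against `lm()`. The following is a minimal sketch on simulated data; the variable names `x_bar`, `y_bar`, `beta1_hat`, and `beta0_hat` are chosen here for illustration only.

```r
# Simulated data: true intercept 3, true slope 2, noise sd 3
set.seed(12)
x <- runif(80, 0, 12)
y <- 3 + 2 * x + rnorm(80, sd = 3)

# Closed-form least squares estimates
x_bar <- mean(x)
y_bar <- mean(y)
beta1_hat <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
beta0_hat <- y_bar - beta1_hat * x_bar

# Compare with the coefficients that lm() reports
fit <- lm(y ~ x)
manual <- c(beta0_hat, beta1_hat)
from_lm <- unname(coef(fit))
all.equal(manual, from_lm)
```

The two sets of estimates should agree up to floating-point tolerance, since `lm()` solves the same least squares problem.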

Dataset Example

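# Simulate 80 observations: true intercept 3, true slope 2, noise sd 3
set.seed(12)
x <- runif(80, 0, 12)
y <- 3 + 2*x + rnorm(80, sd = 3)
df <- data.frame(x, y)

# Fit the simple linear regression and print the coefficient table
model <- lm(y ~ x, data = df)
summary(model)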
## 
## Call:
## lm(formula = y ~ x, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.7721 -1.7142 -0.0674  1.8425  5.8331 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.41886    0.59929   5.705    2e-07 ***
## x            1.95822    0.08799  22.256   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.63 on 78 degrees of freedom
## Multiple R-squared:  0.864,  Adjusted R-squared:  0.8622 
## F-statistic: 495.3 on 1 and 78 DF,  p-value: < 2.2e-16

Scatterplot (ggplot)

## `geom_smooth()` using formula = 'y ~ x'
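
The message above is what ggplot2 prints when `geom_smooth()` is called without an explicit formula. The plotting code itself is not shown in the slides, so the following is only a sketch of how a scatterplot with a fitted line might be produced; the object name `p` and the choice of `method = "lm"` are assumptions.

```r
library(ggplot2)

# Same simulated data as in the dataset example
set.seed(12)
x <- runif(80, 0, 12)
y <- 3 + 2 * x + rnorm(80, sd = 3)
df <- data.frame(x, y)

# Scatterplot with a least squares line and its confidence band
p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm")  # formula left at its default, y ~ x
p
```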

Residual Plot
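
The code behind this plot is not shown in the slides. One common way to draw a residual plot, sketched here under the assumption that residuals are plotted against fitted values, is the following; `resid_df` and `p_resid` are illustrative names.

```r
library(ggplot2)

# Same simulated data and model as in the dataset example
set.seed(12)
x <- runif(80, 0, 12)
y <- 3 + 2 * x + rnorm(80, sd = 3)
df <- data.frame(x, y)
model <- lm(y ~ x, data = df)

# Collect fitted values and residuals in one data frame
resid_df <- data.frame(fitted = fitted(model), residual = resid(model))

# Residuals should scatter around zero with no visible pattern
p_resid <- ggplot(resid_df, aes(x = fitted, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed")
p_resid
```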

Interactive Plot
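
The original interactive figure is not recoverable from the text. As a rough sketch, assuming it was a plotly scatterplot of the data with the fitted line overlaid, something like this would work; `fig` and the `fitted` column are names introduced here.

```r
library(plotly)

# Same simulated data and model as in the dataset example
set.seed(12)
x <- runif(80, 0, 12)
y <- 3 + 2 * x + rnorm(80, sd = 3)
df <- data.frame(x, y)
model <- lm(y ~ x, data = df)
df$fitted <- fitted(model)

# Interactive scatterplot with the fitted regression line
fig <- plot_ly(df, x = ~x, y = ~y, type = "scatter", mode = "markers",
               name = "Observed")
fig <- add_lines(fig, x = ~x, y = ~fitted, name = "Fitted line")
fig
```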

Interpreting Slope

The slope \(\hat{\beta}_1\):

  • Represents the estimated change in \(y\) for each one-unit increase in \(x\).
  • A positive slope suggests that \(y\) increases as \(x\) increases.

Example interpretation (based on the simulated data): a one-unit increase in \(x\) is associated with an estimated increase of about 1.96 units in \(y\) on average, close to the true slope of 2 used in the simulation.
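
This interpretation can be checked directly: the difference between the predicted mean responses at two \(x\) values one unit apart equals the estimated slope. The sketch below uses \(x = 5\) and \(x = 6\), which are arbitrary values picked for illustration.

```r
# Same simulated data and model as in the dataset example
set.seed(12)
x <- runif(80, 0, 12)
y <- 3 + 2 * x + rnorm(80, sd = 3)
df <- data.frame(x, y)
model <- lm(y ~ x, data = df)

# Estimated slope
slope <- coef(model)[["x"]]

# Predicted mean response at x = 5 and x = 6
preds <- predict(model, newdata = data.frame(x = c(5, 6)))

# The one-unit difference in predictions matches the slope
one_unit_change <- unname(diff(preds))
all.equal(one_unit_change, slope)
```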

Predictions

##         fit       lwr       upr
## 1  7.335288  6.430968  8.239608
## 2 15.168153 14.582674 15.753632
## 3 23.001018 22.079262 23.922774

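These are 95% confidence intervals for the mean response at each new value of \(x\). They are narrower than prediction intervals, which describe individual future observations.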
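
The call that produced this table is not shown. Given the reported coefficients, the fitted values are consistent with new \(x\) values of 2, 6, and 10, so a sketch along these lines would produce a table of the same shape; `new_x`, `conf_int`, and `pred_int` are illustrative names, and the prediction interval is added only for comparison.

```r
# Same simulated data and model as in the dataset example
set.seed(12)
x <- runif(80, 0, 12)
y <- 3 + 2 * x + rnorm(80, sd = 3)
df <- data.frame(x, y)
model <- lm(y ~ x, data = df)

# New predictor values (inferred from the fitted values, not stated in the slides)
new_x <- data.frame(x = c(2, 6, 10))

# Confidence interval for the mean response (level defaults to 0.95)
conf_int <- predict(model, newdata = new_x, interval = "confidence")

# Prediction interval for a single new observation, for comparison
pred_int <- predict(model, newdata = new_x, interval = "prediction")

conf_int
pred_int
```

The prediction intervals are wider because they account for the noise in an individual observation as well as the uncertainty in the estimated line.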

Summary

  • Simple linear regression models a linear relationship between a predictor and a response variable.
  • Included: 2 ggplots and 1 plotly plot