2026-02-01

Why Linear Regression?

We use linear regression to describe and predict how a response \(y\) changes as \(x\) changes.

Simple linear regression models the relationship between:

  • a response variable \(y\)
  • a predictor variable \(x\)

It assumes that the mean of \(y\) is a linear function of \(x\).

## A) Example Data
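set.seed(123)  # fix the random seed so the simulated data are reproducible
data <- data.frame(
  x = runif(50, 0, 10)  # 50 predictor values drawn uniformly from [0, 10]
)
# True model: intercept 3, slope 2, normal noise with standard deviation 2
data$y <- 3 + 2 * data$x + rnorm(50, 0, 2)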

## B) ggplot #1 (scatter)

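library(ggplot2)  # provides ggplot(), geom_point(), geom_smooth()

# Scatterplot of the raw data: one point per (x, y) observation
ggplot(data, aes(x, y)) +
  geom_point() +
  theme_minimal() +
  labs(title="Scatterplot of x vs y")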

## C) ggplot #2 (line fit)

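# method="lm" overlays the least-squares line;
# se=TRUE adds a confidence band (95% by default) around it
ggplot(data, aes(x, y)) +
  geom_point() +
  geom_smooth(method="lm", se=TRUE) +
  theme_minimal() +
  labs(title="Regression Line with Confidence Band")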
## `geom_smooth()` using formula = 'y ~ x'

## D) plotly (interactive)

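library(plotly)  # provides plot_ly()

# Interactive scatterplot: hover to read off individual (x, y) values
plot_ly(data, x=~x, y=~y, type="scatter", mode="markers")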

The Regression Model

\[ y = \beta_0 + \beta_1 x + \varepsilon \]

  • \(\beta_0\): intercept
  • \(\beta_1\): slope
  • \(\varepsilon\): random error

Least Squares

We estimate the line by minimizing:

\[ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

where \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\).
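For simple linear regression, this minimization has a well-known closed-form solution: the slope estimate is the sample covariance of \(x\) and \(y\) divided by the sample variance of \(x\), and the intercept estimate makes the line pass through \((\bar{x}, \bar{y})\). The following is a minimal sketch that recreates the simulated data from the example and checks the hand-computed estimates against `lm()`. The variable names (`b0_hat`, `b1_hat`, `rss`, and so on) are illustrative and not part of the original slides.

```r
# Recreate the simulated data from the example section.
set.seed(123)
data <- data.frame(x = runif(50, 0, 10))
data$y <- 3 + 2 * data$x + rnorm(50, 0, 2)

# Closed-form least-squares estimates for simple linear regression.
x_bar <- mean(data$x)
y_bar <- mean(data$y)
b1_hat <- sum((data$x - x_bar) * (data$y - y_bar)) / sum((data$x - x_bar)^2)
b0_hat <- y_bar - b1_hat * x_bar

# Residual sum of squares at the estimates: the quantity being minimized.
rss <- sum((data$y - (b0_hat + b1_hat * data$x))^2)

# Compare with lm(); the two sets of estimates should agree up to rounding.
manual <- c(b0_hat, b1_hat)
fitted_by_lm <- unname(coef(lm(y ~ x, data = data)))
print(manual)
print(fitted_by_lm)
print(isTRUE(all.equal(manual, fitted_by_lm)))
```

Moving either estimate away from these values can only increase the sum of squared residuals, which is what "least squares" means.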

Fit the Model in R

model <- lm(y ~ x, data = data)
summary(model)
## 
## Call:
## lm(formula = y ~ x, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5116 -1.1157 -0.1313  1.0985  4.3723 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.7153     0.5442   4.989 8.37e-06 ***
## x             2.0764     0.0913  22.743  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.881 on 48 degrees of freedom
## Multiple R-squared:  0.9151, Adjusted R-squared:  0.9133 
## F-statistic: 517.2 on 1 and 48 DF,  p-value: < 2.2e-16
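
The fitted model object can also be queried directly rather than read off the printed summary. The sketch below refits the same model and pulls out the coefficients, their confidence intervals, and predictions at a few new \(x\) values. The values 2, 5, and 8 and the names `new_x` and `pred` are arbitrary choices for illustration.

```r
# Recreate the data and the fitted model from the slides.
set.seed(123)
data <- data.frame(x = runif(50, 0, 10))
data$y <- 3 + 2 * data$x + rnorm(50, 0, 2)
model <- lm(y ~ x, data = data)

# Point estimates of the intercept and slope.
print(coef(model))

# 95% confidence intervals for both coefficients.
print(confint(model, level = 0.95))

# Predicted mean response at new x values, with 95% confidence limits.
new_x <- data.frame(x = c(2, 5, 8))  # illustrative values, not from the slides
pred <- predict(model, newdata = new_x, interval = "confidence", level = 0.95)
print(cbind(new_x, pred))

# Proportion of variance in y explained by x.
print(summary(model)$r.squared)
```

In the summary output above, the estimated intercept (2.7153) and slope (2.0764) are close to the values 3 and 2 used to simulate the data, and the residual standard error (1.881) is close to the simulated noise standard deviation of 2.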

Key Assumptions

  • Linearity: mean of \(y\) changes linearly with \(x\)
  • Independence of errors
  • Constant variance (homoscedasticity)
  • Errors are approximately normal (for inference)
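
Several of these assumptions can be examined informally with residual plots. The sketch below uses base R graphics, one common option among several. A residuals-versus-fitted plot with no curve or funnel shape is consistent with linearity and constant variance, and a normal Q-Q plot that roughly follows the reference line is consistent with approximately normal errors. Independence usually has to be judged from how the data were collected rather than from a plot. The names `res` and `fit` are illustrative.

```r
# Recreate the data and the fitted model from the slides.
set.seed(123)
data <- data.frame(x = runif(50, 0, 10))
data$y <- 3 + 2 * data$x + rnorm(50, 0, 2)
model <- lm(y ~ x, data = data)

res <- resid(model)   # observed y minus fitted y
fit <- fitted(model)  # fitted values on the regression line

# Two diagnostic plots side by side.
op <- par(mfrow = c(1, 2))

# Linearity and constant variance: look for no trend and even spread around 0.
plot(fit, res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# Approximate normality: points should fall close to the reference line.
qqnorm(res)
qqline(res)

par(op)  # restore the previous graphics settings
```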

Conclusion

  • Linear regression describes how the mean of \(y\) changes with \(x\) and supports prediction
  • ggplot2 shows patterns and fitted lines clearly
  • plotly makes exploration interactive