Introduction to Simple Linear Regression

Simple linear regression is a statistical method that models the relationship between two variables by fitting a linear equation to observed data.

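\[ Y = \beta_0 + \beta_1 X + \epsilon \]

Here \(\beta_0\) is the intercept, \(\beta_1\) is the slope (the expected change in Y for a one-unit increase in X), and \(\epsilon\) is a random error term.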

When to Use Linear Regression

  • Predicting a continuous outcome variable (Y)
  • Based on a single predictor variable (X)
  • Understanding the strength and direction of the relationship between X and Y

Real-World Example: Car Speed vs. Stopping Distance

We’ll use R’s built-in cars dataset, which records the speed (mph) and stopping distance (ft) of 50 cars.

library(ggplot2)  # static plots
library(plotly)   # interactive 3D plot
data(cars)        # load the built-in dataset
head(cars)        # show the first six rows
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10

ggplot2: Scatterplot with Regression Line

ggplot(cars, aes(x = speed, y = dist)) +
  geom_point(color = "darkblue") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Speed vs. Stopping Distance",
       x = "Speed (mph)", y = "Stopping Distance (ft)")
## `geom_smooth()` using formula = 'y ~ x'

ggplot2: Residual Plot

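# Fit the model and store its residuals alongside the data
model <- lm(dist ~ speed, data = cars)
cars$residuals <- resid(model)

# Residuals should scatter around the dashed zero line with no clear pattern
ggplot(cars, aes(x = speed, y = residuals)) +
  geom_point(color = "purple") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title = "Residuals of Linear Model",
       x = "Speed (mph)", y = "Residual (ft)")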
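The fitted model can also be read numerically. The following is a minimal sketch of how one might pull out the coefficients, the R-squared, and a couple of predictions; the speeds of 10 and 20 mph and the names new_speeds and predicted are illustrative choices, not part of the original example.

```r
# Refit the model so this block runs on its own
model <- lm(dist ~ speed, data = cars)

# Estimated intercept and slope
coef(model)

# Share of the variance in dist explained by speed
summary(model)$r.squared

# Predicted stopping distance (ft) at two hypothetical speeds (mph)
new_speeds <- data.frame(speed = c(10, 20))
predicted <- predict(model, newdata = new_speeds)
predicted
```

On this dataset the slope comes out at roughly 3.93, meaning each additional mph is associated with about 3.9 ft of extra stopping distance.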

Plotly 3D Plot (Extended Example)

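# Simulate data with two predictors (x and y), which goes beyond
# simple linear regression; the true coefficients are 2, 3 and 4
set.seed(123)
x <- rnorm(100)
y <- rnorm(100)
z <- 2 + 3 * x + 4 * y + rnorm(100)

# Interactive 3D scatterplot, with points colored by the response z
plot_ly(x = ~x, y = ~y, z = ~z, type = "scatter3d", mode = "markers",
        marker = list(size = 3, color = z, colorscale = "Viridis"))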
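Because the simulated response depends on two predictors, a natural follow-up is to fit a model with both and check that the estimates land near the values used in the simulation (2, 3 and 4). This is a sketch under that assumption; the name fit3d is introduced here for illustration only.

```r
# Recreate the simulated data so this block runs on its own
set.seed(123)
x <- rnorm(100)
y <- rnorm(100)
z <- 2 + 3 * x + 4 * y + rnorm(100)

# Fit a linear model with two predictors
fit3d <- lm(z ~ x + y)

# Estimates should be close to, but not exactly, 2, 3 and 4
coef(fit3d)
```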

Mathematical Derivation

The estimated coefficients are computed as:

\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
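These formulas can be checked directly against lm() on the cars data. The sketch below applies them by hand; the variable names xs, ys, b0 and b1 are illustrative.

```r
# Predictor and response from the built-in cars dataset
xs <- cars$speed
ys <- cars$dist

# Slope: sum of cross-deviations divided by sum of squared deviations of x
b1 <- sum((xs - mean(xs)) * (ys - mean(ys))) / sum((xs - mean(xs))^2)

# Intercept: the fitted line passes through the point of means
b0 <- mean(ys) - b1 * mean(xs)

# Hand-computed estimates next to those from lm()
c(intercept = b0, slope = b1)
coef(lm(dist ~ speed, data = cars))
```

The two sets of numbers should agree up to floating-point error.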

Conclusion

  • Simple linear regression is useful for modeling linear relationships.
  • It provides interpretability and basic predictive ability.
  • Always inspect residuals and assumptions.
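As one possible way to act on the last point, the sketch below runs a few basic checks on the cars model using base R only. The choice of a Shapiro-Wilk test and of these two diagnostic plots is an assumption made for illustration; other checks may suit a given analysis better.

```r
# Refit the model so this block runs on its own
model <- lm(dist ~ speed, data = cars)
res <- resid(model)

# With an intercept in the model, residuals sum to (numerically) zero
sum(res)

# Shapiro-Wilk test of residual normality; a small p-value suggests
# a departure from normality
sw <- shapiro.test(res)
sw$p.value

# Base R diagnostic plots: residuals vs. fitted values, and a normal Q-Q plot
old_par <- par(mfrow = c(1, 2))
plot(model, which = 1:2)
par(old_par)
```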

Thank you!