2025-03-16

Introduction

\[ Y = \beta_0 + \beta_1 X + \varepsilon \]

At its simplest, linear regression describes the relationship between one predictor (X) and a response (Y) with an equation of the familiar y = mx + b form, where beta 0 is the y-intercept and beta 1 is the slope. Epsilon is the error term: the part of Y that the line does not explain.

Mathematical Underpinnings

Slope:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]

The slope is calculated using the method of least squares, which finds the line that minimizes the sum of squared vertical distances between the data points and the line.

Intercept:

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

The y-intercept is calculated with this formula, which ensures that the fitted line passes through the point of means:

\[ (\bar{x}, \bar{y}) \]
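The slope and intercept formulas can be checked numerically. The following is a minimal sketch in R, assuming the built-in `women` dataset (the same data used in Example 1). The names `x`, `y`, `beta1_hat`, `beta0_hat`, `fit`, and `manual` are illustrative and do not come from the original slides.

```r
# Built-in dataset: height (inches) and weight (lbs) of 15 women
x <- women$height
y <- women$weight

# Least-squares slope: sum of cross-deviations over sum of squared x-deviations
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: chosen so the fitted line passes through (mean(x), mean(y))
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Compare the hand-computed estimates with the coefficients from lm()
fit <- lm(weight ~ height, data = women)
manual <- c(beta0_hat, beta1_hat)

print(manual)
print(unname(coef(fit)))
print(all.equal(manual, unname(coef(fit))))
```

If the two sets of estimates agree, `all.equal()` reports `TRUE`, which suggests that `lm()` is applying the same least-squares formulas in the single-predictor case.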

Correlation vs. Regression

Correlation and regression might look similar at first glance, but there are some important differences:

  • Correlation does not provide a prediction model, while regression does.
  • Correlation is limited to two variables, while regression can include more than one predictor.
  • Correlation is symmetrical (the correlation of X with Y equals that of Y with X), while regression is directional (regressing Y on X differs from regressing X on Y).
  • Correlation is a unitless measurement, while regression coefficients carry the units of the variables.
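The differences listed above can be illustrated with a short sketch in R, again assuming the built-in `women` dataset. All variable names here (`r_xy`, `slope_y_on_x`, `x_cm`, and so on) and the example height of 66 inches are hypothetical choices made for illustration.

```r
x <- women$height   # inches
y <- women$weight   # pounds

# Symmetry: swapping the variables leaves the correlation unchanged
r_xy <- cor(x, y)
r_yx <- cor(y, x)

# Direction: regressing y on x and x on y gives two different slopes
slope_y_on_x <- unname(coef(lm(y ~ x))[2])
slope_x_on_y <- unname(coef(lm(x ~ y))[2])

# Units: converting height to centimetres leaves the correlation unchanged,
# but rescales the slope (lbs per cm instead of lbs per inch)
x_cm <- x * 2.54
r_cm <- cor(x_cm, y)
slope_cm <- unname(coef(lm(y ~ x_cm))[2])

# Prediction: only the regression model yields a predicted weight for a new height
fit <- lm(weight ~ height, data = women)
pred_66 <- predict(fit, newdata = data.frame(height = 66))

print(c(r_xy = r_xy, r_yx = r_yx, r_cm = r_cm))
print(c(y_on_x = slope_y_on_x, x_on_y = slope_x_on_y, y_on_x_cm = slope_cm))
print(pred_66)
```

The correlation stays the same across all three calls, while the slope changes with both the direction of the regression and the units of the predictor.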

Example 1

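library(ggplot2)

# Fit a simple linear regression of weight on height (built-in women dataset)
women_lm_model <- lm(weight ~ height, data = women)

# Scatter plot of the data with the least-squares line and its confidence band
hvw_plot <- ggplot(data = women, aes(x = height, y = weight)) +
  geom_point(color = "blue", size = 3, alpha = 0.8) +
  geom_smooth(method = "lm", formula = y ~ x, color = "red", se = TRUE) +
  labs(
    title = "Example 1: Women's Height vs. Weight",
    x = "Height (inches)",
    y = "Weight (lbs)"
  )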

Example 1 (cont.)

Example 2

Example 3