What is Simple Linear Regression?

Simple Linear Regression is a statistical method used to model the relationship between two variables:

  • Response variable (Y): The outcome we want to predict
  • Predictor variable (X): The variable we use to make predictions

When do we use it?

  • To understand the relationship between two continuous variables
  • To predict future values based on observed data
  • To quantify the strength of a linear relationship

Example: Predicting a car’s stopping distance based on its speed

The Linear Regression Model

The simple linear regression model is expressed as:

\[Y_i = \beta_0 + \beta_1 X_i + \epsilon_i\]

where:

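  • \(Y_i\) is the response variable for observation \(i\) (stopping distance)
  • \(X_i\) is the predictor variable for observation \(i\) (speed)
  • \(\beta_0\) is the intercept (expected stopping distance when speed = 0)
  • \(\beta_1\) is the slope (expected change in stopping distance per 1 mph increase in speed)
  • \(\epsilon_i\) is the random error term, usually assumed independent with mean 0 and constant variance \(\sigma^2\)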

The goal is to estimate \(\beta_0\) and \(\beta_1\) from our data.
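To see what estimating \(\beta_0\) and \(\beta_1\) means in practice, the sketch below simulates data from the model with known parameters and checks that `lm()` approximately recovers them. The values of `beta0`, `beta1`, `sigma`, and the speed range are hypothetical, chosen only to resemble the cars data, and the errors are assumed to be normal.

```r
# Simulate data from Y_i = beta0 + beta1 * X_i + eps_i, then estimate the
# parameters with least squares. All "true" values below are hypothetical.
set.seed(42)                          # make the simulation reproducible
n     <- 200
beta0 <- -17                          # hypothetical true intercept
beta1 <- 4                            # hypothetical true slope
sigma <- 15                           # hypothetical error standard deviation

x   <- runif(n, min = 4, max = 25)    # predictor values in a speed-like range
eps <- rnorm(n, mean = 0, sd = sigma) # independent normal errors (assumption)
y   <- beta0 + beta1 * x + eps        # responses generated by the model

sim_fit <- lm(y ~ x)                  # least squares fit
coef(sim_fit)                         # estimates should be near -17 and 4
```

Rerunning with a different seed gives slightly different estimates; that sampling variability is what the standard errors in `summary(sim_fit)` describe.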

Estimating Parameters: Least Squares Method

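We estimate \(\beta_0\) and \(\beta_1\) by choosing the values \(b_0\) and \(b_1\) that minimize the sum of squared residuals:

\[\text{SSE}(b_0, b_1) = \sum_{i=1}^{n}(Y_i - b_0 - b_1 X_i)^2\]

At the minimizing values \(\hat{\beta}_0\) and \(\hat{\beta}_1\), this equals \(\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2\), where \(\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i\) are the fitted values.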
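As a sketch of the least squares idea itself, the sum of squared residuals can be written as an R function and minimized numerically with base R's `optim()`. The function name `sse` and the starting values are illustrative choices; in practice the closed-form formulas or `lm()` are used instead.

```r
data(cars)

# Sum of squared residuals for candidate coefficients b = c(b0, b1)
sse <- function(b, x, y) {
  sum((y - b[1] - b[2] * x)^2)
}

# Minimize SSE numerically, starting from b0 = 0, b1 = 0;
# extra arguments x and y are passed through to sse()
opt <- optim(par = c(0, 0), fn = sse,
             x = cars$speed, y = cars$dist,
             method = "BFGS")

opt$par    # should be close to the least squares estimates from lm()
opt$value  # the minimized SSE
```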

The least squares estimates are:

\[\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\]

\[\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}\]

where \(\bar{X}\) and \(\bar{Y}\) are the sample means.
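The closed-form estimates can be computed directly on the cars data used in the example below and compared with `lm()`. This is a minimal sketch; the names `b0_hat` and `b1_hat` are our own.

```r
data(cars)
x <- cars$speed
y <- cars$dist

# Slope: sum of cross-deviations divided by sum of squared X-deviations
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the fitted line passes through the point (mean(x), mean(y))
b0_hat <- mean(y) - b1_hat * mean(x)

c(intercept = b0_hat, slope = b1_hat)   # approximately -17.579 and 3.932
```

The same numbers are returned by `coef(lm(dist ~ speed, data = cars))`.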

Example: Cars Dataset

We’ll use the built-in cars dataset in R:

# Load and preview the data
data(cars)
head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10

  • 50 observations from the 1920s
  • speed: Speed of car (mph)
  • dist: Stopping distance (feet)

Research Question: Can we predict stopping distance based on speed?
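In R, the model can be fit with `lm()` using the formula `dist ~ speed`; `coef()` returns the estimated intercept and slope, and `summary()` adds standard errors, t-tests, and R². A minimal sketch:

```r
data(cars)

# Fit dist = beta0 + beta1 * speed + error by least squares
fit <- lm(dist ~ speed, data = cars)

coef(fit)      # estimated intercept and slope
summary(fit)   # standard errors, t-tests, R-squared
```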

Visualizing the Relationship

The regression line shows a clear positive relationship between speed and stopping distance.
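The strength of this linear relationship can be quantified with the Pearson correlation coefficient. In simple linear regression, R² equals the square of this correlation, which the sketch below checks on the cars data.

```r
data(cars)

# Pearson correlation between speed and stopping distance
r <- cor(cars$speed, cars$dist)
r      # about 0.81

# R-squared from the fitted model equals the squared correlation
r^2                                             # about 0.65
summary(lm(dist ~ speed, data = cars))$r.squared
```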

Residual Plot

Residuals \(e_i = Y_i - \hat{Y}_i\) should be scattered randomly around zero, with roughly constant spread and no systematic pattern such as curvature or a funnel shape.
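A residual plot of this kind can be drawn with base R graphics. This sketch plots residuals against fitted values with a dashed reference line at zero; it uses base `plot()` rather than ggplot2, so it may not match how the original figure was produced.

```r
data(cars)
fit <- lm(dist ~ speed, data = cars)

# Residuals e_i = Y_i - Yhat_i
res <- resid(fit)

# Residuals vs fitted values, with a dashed reference line at zero
plot(fitted(fit), res,
     xlab = "Fitted stopping distance (feet)",
     ylab = "Residual (feet)",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 2)

# With an intercept in the model, least squares residuals sum to zero
mean(res)   # zero up to floating-point error
```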

Extension: Multiple Regression in 3D

R Code: Creating the Regression Plot

# Load ggplot2
library(ggplot2)

# Create scatter plot with regression line
ggplot(cars, aes(x = speed, y = dist)) +
  geom_point(color = "#8C1D40", size = 3, alpha = 0.7) +
  # Least squares line with a 95% confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE,
              color = "black", fill = "gray") +
  labs(title = "Speed vs Stopping Distance",
       x = "Speed (mph)",
       y = "Stopping Distance (feet)") +
  theme_minimal()

This code creates the scatter plot with the fitted regression line shown in slide 6 (Visualizing the Relationship).

Key Takeaways

Simple Linear Regression allows us to:

  • Model relationships between two variables
  • Make predictions based on observed patterns
  • Quantify the strength of relationships

Our Cars Example:

  • Strong positive relationship between speed and stopping distance
  • For every 1 mph increase in speed, the estimated mean stopping distance increases by about 3.93 feet
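Predictions come from `predict()` on the fitted model. The speeds of 10 and 20 mph below are illustrative values inside the observed range (4 to 25 mph); `interval = "confidence"` gives an interval for the mean stopping distance, while `interval = "prediction"` would give a wider interval for a single car.

```r
data(cars)
fit <- lm(dist ~ speed, data = cars)

# Predicted mean stopping distance at illustrative speeds of 10 and 20 mph
new_speeds <- data.frame(speed = c(10, 20))
pred <- predict(fit, newdata = new_speeds, interval = "confidence")
pred   # columns: fit, lwr, upr

# Predictions 1 mph apart differ by exactly the estimated slope
diff(predict(fit, newdata = data.frame(speed = c(20, 21))))   # about 3.93
```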