2025-06-08

Welcome

This presentation explores Simple Linear Regression, a foundational tool in statistics used to examine the relationship between two variables. We will cover what it is and how it works, then look at some visual and practical examples.

What is Simple Linear Regression?

Simple linear regression is a statistical method that allows us to model and analyze the relationship between two continuous variables.

We try to fit a straight line that best describes how one variable (the dependent variable) changes in response to another (the independent variable).

The general form of the simple linear regression model is:

\[ y = \beta_0 + \beta_1 x + \varepsilon \]

Where:

  • \(y\) is the dependent variable (what we want to predict)
  • \(x\) is the independent variable (the predictor)
  • \(\beta_0\) is the intercept
  • \(\beta_1\) is the slope
  • \(\varepsilon\) is the error term (the difference between the observed value of \(y\) and the value given by the line \(\beta_0 + \beta_1 x\))
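To make the model concrete, here is a minimal sketch that simulates data from it. The coefficient values, the sample size, the range of \(x\), and the choice of normally distributed errors are all assumptions made for illustration; none of them come from the slides.

```r
# Simulate data from y = beta_0 + beta_1 * x + epsilon.
# Every numeric value here is an illustrative assumption.
set.seed(42)

n      <- 100   # number of observations (assumed)
beta_0 <- 2     # intercept (assumed)
beta_1 <- 0.5   # slope (assumed)

x       <- runif(n, min = 0, max = 10)   # predictor values
epsilon <- rnorm(n, mean = 0, sd = 1)    # error term, assumed normal
y       <- beta_0 + beta_1 * x + epsilon # response

# Removing the line from y leaves exactly the error term
all.equal(y - (beta_0 + beta_1 * x), epsilon)
```

In real data only `x` and `y` are observed; the coefficients and the errors are unknown and have to be estimated.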

Components of the Regression Model

In simple linear regression, the model is:

\[ y = \beta_0 + \beta_1 x + \varepsilon \]

Each component of the equation has a specific meaning:

  • \(\beta_0\): The intercept — the value of \(y\) when \(x = 0\)
  • \(\beta_1\): The slope — the change in \(y\) for every one-unit increase in \(x\)
  • \(\varepsilon\): The error term — accounts for the variability in \(y\) not explained by \(x\)

This equation assumes that \(y\) changes linearly with \(x\) on average. The true slope and intercept are unknown; their estimates from the data define the line of best fit.
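The interpretation of the intercept and slope can be checked with a few lines of R. The coefficient values below are hypothetical and chosen only so the arithmetic is easy to follow; the error term is left out so that only the line itself is evaluated.

```r
# Hypothetical coefficients, chosen only for illustration
beta_0 <- 10
beta_1 <- 3

# The deterministic part of the model (no error term)
line <- function(x) beta_0 + beta_1 * x

line(0)            # value of y at x = 0: the intercept, 10
line(5) - line(4)  # change in y for a one-unit increase in x: the slope, 3
```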

Visualizing the Data

We begin by visualizing the relationship between two variables using a scatter plot.

Below is a scatter plot of Height vs. Weight for a small sample of individuals.
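A scatter plot of this kind can be sketched as follows. The values in `sample_data` are made up for illustration, since the slides do not list the sample, and base R graphics are used here only to avoid a package dependency.

```r
# Hypothetical sample: the numbers are invented for illustration
sample_data <- data.frame(
  height = c(60, 62, 64, 66, 68, 70, 72),       # inches
  weight = c(115, 120, 130, 140, 150, 160, 172) # pounds
)

# Scatter plot of weight against height
plot(
  weight ~ height,
  data = sample_data,
  pch  = 19,
  col  = "steelblue",
  xlab = "Height (inches)",
  ylab = "Weight (pounds)",
  main = "Height vs. Weight"
)
```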

Fitting a Regression Line

Now we fit a simple linear regression model to our data.

The goal is to find the line that best describes the relationship between height and weight.
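As a minimal sketch, fitting the line takes one call to `lm()`. The data frame below is hypothetical (the values are invented for illustration), and the height of 65 inches used for prediction is likewise an arbitrary choice.

```r
# Hypothetical sample: the numbers are invented for illustration
sample_data <- data.frame(
  height = c(60, 62, 64, 66, 68, 70, 72),       # inches
  weight = c(115, 120, 130, 140, 150, 160, 172) # pounds
)

# Fit weight as a linear function of height
model <- lm(weight ~ height, data = sample_data)

# Estimated intercept and slope:
# (Intercept) is about -178.39, height is about 4.84
coef(model)

# Predicted weight for a new person who is 65 inches tall (about 136.16)
predict(model, newdata = data.frame(height = 65))
```

For this made-up sample, the slope says that each additional inch of height is associated with roughly 4.8 more pounds of weight.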

3D Regression Visualization

Here’s a 3D plot showing how weight varies with height and age. This gives us an idea of how multiple predictors can influence an outcome.
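A model with two predictors is a multiple regression rather than a simple one, but `lm()` handles it with the same syntax. The sketch below uses an invented data frame called `people`; the name and all of its values, including the `age` column, are assumptions for illustration.

```r
# Hypothetical sample with a second predictor; all values are invented
people <- data.frame(
  height = c(60, 62, 64, 66, 68, 70, 72),       # inches
  age    = c(25, 40, 31, 52, 28, 45, 36),       # years
  weight = c(115, 120, 130, 140, 150, 160, 172) # pounds
)

# Weight modeled as a linear function of both height and age.
# Geometrically this fits a plane instead of a line.
model_2 <- lm(weight ~ height + age, data = people)

# One intercept plus one slope per predictor
coef(model_2)
```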

R Code: Creating the Regression Plot

Below is the R code used to create the linear regression plot showing how weight changes with height.

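# Fit the linear model.
# sample_data is assumed to be a data frame with numeric columns height and weight.
model <- lm(weight ~ height, data = sample_data)

# Plot the points and overlay the least squares line.
# geom_smooth(method = "lm") refits the same model internally, so the line matches `model`.
library(ggplot2)
ggplot(sample_data, aes(x = height, y = weight)) +
  geom_point(color = "steelblue", size = 3) +
  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_smooth(method = "lm", se = FALSE, color = "darkred", linewidth = 1.2) +
  labs(
    x = "Height (inches)",
    y = "Weight (pounds)",
    title = "Fitted Linear Regression Line"
  ) +
  theme_minimal(base_size = 10)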

How the Line is Estimated

To find the best-fitting regression line, we estimate the coefficients \(\beta_0\) and \(\beta_1\) by minimizing the sum of squared errors (SSE).

This means we minimize the following:

\[ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - (\beta_0 + \beta_1 x_i))^2 \]
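The minimum is found with calculus. As a sketch of the standard argument, set the partial derivatives of the SSE with respect to each coefficient to zero:

\[ \frac{\partial SSE}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0, \qquad \frac{\partial SSE}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0 \]

The first equation rearranges to \(\beta_0 = \bar{y} - \beta_1 \bar{x}\). Substituting this into the second gives

\[ \sum_{i=1}^{n} x_i \left( (y_i - \bar{y}) - \beta_1 (x_i - \bar{x}) \right) = 0 \]

Because \(\sum (y_i - \bar{y}) = 0\) and \(\sum (x_i - \bar{x}) = 0\), each \(x_i\) outside the parentheses can be replaced by \(x_i - \bar{x}\) without changing the sums, and solving for \(\beta_1\) then gives a ratio of centered sums.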

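Solving this minimization gives the least squares estimates, written with hats to distinguish them from the unknown parameters:

\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

Here \(\bar{x}\) and \(\bar{y}\) are the sample means. These estimates are the slope and intercept of the fitted regression line.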
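The formulas can be checked numerically. The sketch below computes the slope and intercept by hand, compares them with `lm()`, and confirms that nudging either coefficient increases the SSE. The data are hypothetical, and the names `slope_hat`, `intercept_hat`, and `sse` are introduced here for illustration.

```r
# Hypothetical sample: the numbers are invented for illustration
sample_data <- data.frame(
  height = c(60, 62, 64, 66, 68, 70, 72),       # inches
  weight = c(115, 120, 130, 140, 150, 160, 172) # pounds
)
x <- sample_data$height
y <- sample_data$weight

# Closed-form least squares estimates
slope_hat     <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
intercept_hat <- mean(y) - slope_hat * mean(x)

# lm() should return the same two numbers
fit <- lm(weight ~ height, data = sample_data)
all.equal(unname(coef(fit)), c(intercept_hat, slope_hat))  # TRUE

# Sum of squared errors for any candidate line
sse <- function(intercept, slope) sum((y - (intercept + slope * x))^2)

# Moving away from the estimates makes the SSE larger
sse(intercept_hat, slope_hat) < sse(intercept_hat, slope_hat + 0.1)  # TRUE
sse(intercept_hat, slope_hat) < sse(intercept_hat + 1, slope_hat)    # TRUE
```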

Summary & Key Takeaways

  • Simple linear regression is used to model the relationship between two variables.
  • The model assumes a linear relationship between an independent variable \(x\) and a dependent variable \(y\).
  • The equation of the line is: \(y = \beta_0 + \beta_1 x + \varepsilon\)
  • The slope \(\beta_1\) tells us how much \(y\) is expected to change for a one-unit increase in \(x\).
  • We use the least squares method to find the best-fitting line by minimizing the sum of squared prediction errors.
  • R makes it easy to fit a model using lm() and to visualize it with ggplot2 or plotly.
  • Simple regression is a building block for more complex models like multiple regression.