2025-11-17

Introduction

This presentation introduces simple linear regression. We explore the basic ideas, the mathematics, and the visualizations that show how one variable can be used to predict another.

What Is Simple Linear Regression?

Simple linear regression is a statistical method used to model the relationship between:

  • one independent variable (predictor)
  • one dependent variable (response)

The model assumes the relationship can be described by a straight line plus random error:

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

Where:

  • \(\beta_0\) is the intercept
  • \(\beta_1\) is the slope
  • \(\epsilon\) is the error term
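To make the roles of the three terms concrete, here is a small simulation sketch in R. The true values \(\beta_0 = 2\) and \(\beta_1 = 0.5\), the sample size, and the noise level are arbitrary choices for illustration, not values from this presentation. Fitting lm() to data generated this way should recover estimates close to the true parameters.

```r
set.seed(42)                          # reproducible random numbers

n     <- 200                          # sample size (arbitrary assumption)
beta0 <- 2                            # true intercept (illustrative assumption)
beta1 <- 0.5                          # true slope (illustrative assumption)

x   <- runif(n, min = 0, max = 10)    # predictor values
eps <- rnorm(n, mean = 0, sd = 1)     # random error term with mean zero
y   <- beta0 + beta1 * x + eps        # response generated by the linear model

fit <- lm(y ~ x)                      # estimate beta0 and beta1 from the data
coef(fit)                             # estimates should be close to 2 and 0.5
```

Rerunning with a larger n shrinks the gap between the estimates and the true values, since the error term averages out.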

Exploring the Relationship Between Two Variables

To understand simple linear regression, we start by visualizing the relationship between two quantitative variables. Here, we examine how car weight (wt) relates to miles per gallon (mpg) using the mtcars dataset.
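A minimal base-R sketch of this scatterplot follows. The slide's own plotting code is not shown, so using base graphics here is an assumption; mtcars ships with R, so no extra packages are needed. The correlation coefficient gives a one-number summary of the same linear association.

```r
# mtcars is built into R; wt is weight in 1000 lbs, mpg is miles per (US) gallon
data(mtcars)

# Scatterplot: predictor on the x-axis, response on the y-axis
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Fuel efficiency vs. car weight",
     pch = 19)

# Pearson correlation: negative, so heavier cars tend to get fewer miles per gallon
r <- cor(mtcars$wt, mtcars$mpg)
r
```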

Adding a Regression Line

A fitted regression line makes the overall trend easier to see. It gives the predicted fuel efficiency (mpg) for each car weight.
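In ggplot2 this line is commonly added with geom_smooth(method = "lm"). The sketch below instead uses base R graphics so it runs without extra packages: it fits the least-squares line with lm() and overlays it with abline(). This is a swapped-in alternative, not necessarily how the slide's figure was produced.

```r
# Fit the least-squares line for mpg as a function of weight
fit <- lm(mpg ~ wt, data = mtcars)

# Redraw the scatterplot and overlay the fitted line
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon", pch = 19)
abline(fit, col = "blue", lwd = 2)   # abline() reads the intercept and slope from the lm object
```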


The Simple Linear Regression Model

The goal of simple linear regression is to model how a response variable \(Y\) changes with a predictor variable \(X\).

We assume the following mathematical model:

\[ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \]

Where:

  • \(\beta_0\) = intercept (expected value of \(Y\) when \(X = 0\))
  • \(\beta_1\) = slope (change in the expected value of \(Y\) for each 1-unit increase in \(X\))
  • \(\epsilon_i\) = random error for observation \(i\), assumed to have mean zero
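The slope interpretation can be checked numerically: on any fitted line, the predicted values at \(X\) and \(X + 1\) differ by exactly the estimated slope. The sketch below uses the mtcars fit; the weights 3 and 4 (thousand lbs) are arbitrary illustration points.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Predicted mpg for cars weighing 3000 and 4000 lbs (wt is in 1000 lbs)
new_cars <- data.frame(wt = c(3, 4))
pred <- predict(fit, newdata = new_cars)

# The difference between the two predictions equals the slope estimate
diff(pred)
coef(fit)["wt"]

# The intercept is the prediction at wt = 0, an extrapolation well outside the observed weights
pred0 <- predict(fit, newdata = data.frame(wt = 0))
pred0
```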

Estimating the Coefficients

The parameters \(\beta_0\) and \(\beta_1\) are estimated by the method of least squares, which chooses the line that minimizes the sum of squared residuals:

\[ \min_{\beta_0, \beta_1} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2 \]
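To see this objective in action, the sketch below writes the sum of squared residuals as an R function of a candidate intercept and slope (the helper name sse is hypothetical, introduced only for this illustration) and checks that nudging lm()'s estimates in either direction makes the fit worse.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Hypothetical helper: sum of squared residuals for intercept b0 and slope b1
sse <- function(b0, b1) sum((y - b0 - b1 * x)^2)

fit <- lm(mpg ~ wt, data = mtcars)
b   <- coef(fit)                     # least-squares estimates (intercept, slope)

best <- sse(b[1], b[2])              # equals sum(residuals(fit)^2)

# Perturb each coefficient up and down; every alternative line has a larger SSE
nudges <- c(sse(b[1] + 0.5, b[2]), sse(b[1] - 0.5, b[2]),
            sse(b[1], b[2] + 0.1), sse(b[1], b[2] - 0.1))
all(nudges > best)                   # TRUE
```

Because the objective is a convex quadratic in \(\beta_0\) and \(\beta_1\), the least-squares solution is its unique minimum, so any perturbation increases it.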

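Setting the partial derivatives with respect to \(\beta_0\) and \(\beta_1\) to zero and solving gives the least-squares estimates:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \]

\[ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \]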
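These formulas translate directly into R. The sketch below computes both estimates by hand for the mtcars data and compares them with the coefficients from lm(); the two should agree up to floating-point rounding.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the fitted line passes through the point (mean(x), mean(y))
b0_hat <- mean(y) - b1_hat * mean(x)

c(intercept = b0_hat, slope = b1_hat)
coef(lm(mpg ~ wt, data = mtcars))   # should match the hand-computed values
```

Equivalently, the slope is cov(x, y) / var(x), since the \(n - 1\) denominators cancel.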

Fitting a Linear Regression Model in R

We can use R’s built-in lm() function to estimate the intercept and slope for predicting mpg from weight (wt).

model <- lm(mpg ~ wt, data = mtcars)
summary(model)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
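Once the model is fitted, the estimates and fit statistics shown in the summary can be extracted programmatically. The sketch below refits the same model so it is self-contained; the 3000 lb car (wt = 3) is an arbitrary example value. Because wt is recorded in 1000 lbs, a slope of about -5.34 means each additional 1000 lbs is associated with roughly 5.3 fewer miles per gallon on average.

```r
model <- lm(mpg ~ wt, data = mtcars)

coef(model)                   # intercept about 37.29, slope about -5.34
summary(model)$r.squared      # about 0.753, the "Multiple R-squared" above

# In simple linear regression, R-squared equals the squared correlation of x and y
cor(mtcars$wt, mtcars$mpg)^2

# Predicted mpg for a car weighing 3000 lbs (wt = 3), with a 95% prediction interval
predict(model, newdata = data.frame(wt = 3), interval = "prediction", level = 0.95)
```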

Interactive Plotly Visualization

Here is an interactive scatterplot showing the relationship between car weight and miles per gallon.
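The slide's plotting code is not shown, so the following is only a sketch of how such a widget might be built with the plotly R package, assuming it is installed (install.packages("plotly")). Hovering over a point shows the car's name along with its weight and mpg.

```r
# Assumption: the plotly package is installed
library(plotly)

# Interactive scatterplot; hover text shows each car's name plus its x and y values
p <- plot_ly(
  data = mtcars,
  x = ~wt,
  y = ~mpg,
  type = "scatter",
  mode = "markers",
  text = rownames(mtcars),
  hoverinfo = "text+x+y"
)
p  # in an R Markdown slide, printing the object renders the interactive widget
```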

Conclusion

Simple linear regression provides a powerful yet easy-to-understand method for modeling the relationship between two quantitative variables. By:

  • visualizing data with scatterplots
  • estimating the regression line
  • using R functions like lm()
  • and interpreting the results

we can understand how changes in one variable are associated with changes in another.

Linear regression is widely used in science, engineering, business, economics, and many other fields, which makes it one of the most essential tools in statistics.