2026-03-07

Introduction

Simple linear regression is a statistical method used to study the relationship between two quantitative variables.

It helps us answer questions such as:

  • Does one variable help predict another?
  • Is the relationship positive or negative?
  • How strong is the relationship?

In this presentation, we study the relationship between car weight and fuel efficiency.

What is Simple Linear Regression?

Simple linear regression models the relationship between:

  • one predictor variable \(X\)
  • one response variable \(Y\)

The goal is to find the line that best describes how \(Y\) changes as \(X\) changes.

For this example:

  • \(X =\) car weight
  • \(Y =\) miles per gallon (mpg)

Regression Model

The simple linear regression model is

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

where:

  • \(Y\) is the response variable
  • \(X\) is the predictor variable
  • \(\beta_0\) is the intercept
  • \(\beta_1\) is the slope
  • \(\epsilon\) is the random error term

This equation describes the population relationship between the variables.
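To make the roles of the terms concrete, here is a minimal sketch that simulates data from this model. Every numeric value in it (sample size, \(\beta_0\), \(\beta_1\), the error standard deviation) is a hypothetical choice for illustration, and drawing \(\epsilon\) from a normal distribution is a common assumption that this presentation does not state.

```r
# Simulate data from Y = beta_0 + beta_1 * X + epsilon.
# All numeric values below are illustrative assumptions, not estimates from real data.
set.seed(1)                       # make the random draws reproducible
n      <- 200                     # hypothetical sample size
beta_0 <- 37                      # hypothetical population intercept
beta_1 <- -5                      # hypothetical population slope
sigma  <- 3                       # hypothetical error standard deviation

x       <- runif(n, min = 1.5, max = 5.5)   # predictor values
epsilon <- rnorm(n, mean = 0, sd = sigma)   # random error term (assumed normal)
y       <- beta_0 + beta_1 * x + epsilon    # response generated by the model

# The population line is the part of Y explained by X;
# the errors are the vertical deviations of each point from that line.
population_line <- beta_0 + beta_1 * x
head(data.frame(x, y, population_line))
```

In practice \(\beta_0\) and \(\beta_1\) are unknown, and only the pairs \((x, y)\) are observed.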

Estimated Regression Line

The fitted regression line from sample data is

\[ \hat{Y} = b_0 + b_1 X \]

The slope estimate is

\[ b_1 = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2} \]

The intercept estimate is

\[ b_0 = \bar{y} - b_1 \bar{x} \]

These are the least-squares estimates: they give the line that minimizes the sum of squared residuals \(\sum (y_i - \hat{y}_i)^2\).
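The two formulas can be applied directly in R. The sketch below uses the wt and mpg columns of the built-in mtcars dataset (the same data used in the example later in this presentation) and cross-checks the hand-computed values against lm(). The variable names are chosen here for illustration.

```r
# Compute the slope and intercept estimates by hand from the formulas.
x <- mtcars$wt    # predictor: car weight
y <- mtcars$mpg   # response: miles per gallon

x_bar <- mean(x)
y_bar <- mean(y)

# b1 = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
b1 <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)

# b0 = y_bar - b1 * x_bar
b0 <- y_bar - b1 * x_bar

c(b0 = b0, b1 = b1)

# Cross-check: lm() should return the same two coefficients.
fit <- lm(mpg ~ wt, data = mtcars)
all.equal(unname(coef(fit)), c(b0, b1))
```

The final line compares the two results up to floating-point tolerance, so it is a quick way to confirm that the formulas and lm() agree.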

Example

We use the built-in mtcars dataset.

The variables used are:

  • wt: weight of the car (in 1000 lbs)
  • mpg: miles per (US) gallon

Research question:

Can car weight be used to predict fuel efficiency?

We expect heavier cars to have lower miles per gallon.

Summary of the Data

summary(mtcars[, c("wt", "mpg")])
##        wt             mpg       
##  Min.   :1.513   Min.   :10.40  
##  1st Qu.:2.581   1st Qu.:15.43  
##  Median :3.325   Median :19.20  
##  Mean   :3.217   Mean   :20.09  
##  3rd Qu.:3.610   3rd Qu.:22.80  
##  Max.   :5.424   Max.   :33.90
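
The summary describes each variable separately. As a rough first look at how the two move together, a sketch using the sample correlation coefficient is shown below. The name r is chosen here for illustration, and correlation measures only the linear part of the relationship.

```r
# Sample (Pearson) correlation between car weight and miles per gallon.
r <- cor(mtcars$wt, mtcars$mpg)
r

# The sign gives the direction of the linear relationship.
sign(r)

# In simple linear regression, r^2 equals the R-squared of the fitted line.
r^2
```

A value of r close to -1 or 1 indicates a strong linear relationship, while a value near 0 indicates a weak one.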

Scatter Plot

ggplot: Miles Per Gallon versus Weight of Car

Plotly Interactive Plot

R Code for Creating the Scatter Plot

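library(ggplot2)  # provides ggplot(), aes(), geom_point(), geom_smooth()

ggplot(
  mtcars,
  aes(x = wt, y = mpg)
) +
  geom_point(size = 1.8) +
  # least-squares line with its 95% confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(
    x = "Weight of Car (1000 lbs)",
    y = "Miles Per Gallon"
  ) +
  theme_minimal(base_size = 10)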

Interpretation of the Results

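From the regression output of lm(mpg ~ wt, data = mtcars), the coefficient of wt is negative (approximately -5.34).

This means:

  1. As car weight increases, miles per gallon tends to decrease: each additional 1000 lbs is associated with about 5.3 fewer miles per gallon on average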

  2. Heavier cars are generally less fuel efficient

The fitted model allows us to quantify this relationship and make predictions.
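
The regression output referred to here can be reproduced with a few lines of R. This is a minimal sketch: the object names fit, new_cars, and preds, and the two example weights, are hypothetical choices made for illustration.

```r
# Fit the simple linear regression of mpg on wt.
fit <- lm(mpg ~ wt, data = mtcars)

# Full regression output: coefficients, standard errors, R-squared.
summary(fit)

# Just the estimated intercept (b0) and slope (b1).
coef(fit)

# Predict mpg for two hypothetical cars weighing 2500 and 3500 lbs
# (wt is recorded in units of 1000 lbs).
new_cars <- data.frame(wt = c(2.5, 3.5))
preds <- predict(fit, newdata = new_cars, interval = "prediction")
preds
```

The columns lwr and upr in preds give a 95% prediction interval for an individual car at each weight. Predictions are most trustworthy for weights inside the observed range of the data.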

Why This Topic is Useful

Simple linear regression is useful because it helps us:

  1. understand relationships between variables

  2. make predictions

  3. summarize trends with an equation

  4. support decisions using data

It is widely used in business, science, engineering, and economics.

Conclusion

Simple linear regression is an important statistical tool for modeling the relationship between two quantitative variables.

In this example using the mtcars dataset, we found that:

  1. weight is a useful predictor of fuel efficiency

  2. the relationship between weight and mpg is negative

  3. graphs and regression output help us understand the data clearly

This shows how statistics can be used to explain and predict real-world patterns.