October 26, 2025

Introduction

Simple linear regression is a statistical model that allows us to:

-Model the relationship between two variables

-Make predictions based on the observed data

-Understand how changes in one variable affect another

Some key applications:

-Predicting sales based on advertising budget

-Estimating house prices based on square footage

-Predicting a car's fuel efficiency based on its horsepower

Mathematical Equation

Regression Equation:

\(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 \cdot x\)

Where:

-\(\hat{y}\) is the predicted value of y

-\(\hat{\beta}_0\) is the estimated y-intercept

-\(\hat{\beta}_1\) is the estimated slope coefficient

-\(x\) is the independent variable used to make the predictions

Objective:

-Minimize the sum of squared residuals

-Minimize \(\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\)
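For the single-predictor case, setting the partial derivatives of this sum with respect to \(\hat{\beta}_0\) and \(\hat{\beta}_1\) to zero gives the standard closed-form least-squares estimates. Here \(\bar{x}\) and \(\bar{y}\) denote the sample means of x and y (notation introduced for this note):

\(\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\)

\(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\)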

Example with Motor Trend Car Road Tests

We will use the mtcars dataset to analyze the relationship between:

-Independent variable (x): Horsepower (hp)

-Dependent variable (y): Miles per gallon (mpg)

Graph:

library(ggplot2)

# Scatter plot of fuel efficiency (mpg) against horsepower (hp)
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = "darkgreen", size = 3, alpha = 0.7) +
  labs(
    title = "Relationship between Horsepower and MPG",
    x = "Horsepower",
    y = "Miles per Gallon (MPG)"
  ) +
  theme_minimal()

Observation:

As horsepower increases, fuel efficiency tends to decrease.
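One way to put a number on this visual impression is the Pearson correlation coefficient. The snippet below is a minimal sketch using base R's `cor()`; the variable name `r` is only illustrative.

```r
# Pearson correlation between horsepower and fuel efficiency in mtcars
r <- cor(mtcars$hp, mtcars$mpg)

# A negative value means mpg tends to fall as hp rises
print(round(r, 3))
```

In simple linear regression, the square of this correlation equals the R-squared of the fitted model.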

Fitted Regression Model

We can fit the model in R and examine the coefficients:

model <- lm(mpg ~ hp, data = mtcars)
summary(model)
Call:
lm(formula = mpg ~ hp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.7121 -2.1122 -0.8854  1.5819  8.2360 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
hp          -0.06823    0.01012  -6.742 1.79e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.863 on 30 degrees of freedom
Multiple R-squared:  0.6024,    Adjusted R-squared:  0.5892 
F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07

Observation:

Both the intercept and the slope for the predictor variable (hp) have extremely small p-values (well below 0.001), so both are statistically significant.
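If the p-values are needed as numbers rather than read off the printed summary, they can be pulled from the coefficient table. This is a small sketch using base R only; `coef_table`, `p_values`, and `ci` are illustrative names, and the model is refit so the snippet runs on its own.

```r
# Refit the same model so this snippet is self-contained
model <- lm(mpg ~ hp, data = mtcars)

# Coefficient table as a numeric matrix:
# columns are Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(model))
p_values <- coef_table[, "Pr(>|t|)"]
print(p_values)

# 95% confidence intervals for the intercept and the slope
ci <- confint(model, level = 0.95)
print(ci)
```

A slope interval that excludes zero tells the same story as the small p-value for hp.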

The resulting regression equation is:

\(\hat{mpg} = 30.09886 -0.06823\cdot hp\)
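As a sanity check, these coefficients can be reproduced by hand with the standard closed-form least-squares formulas for a single predictor: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept is the mean of y minus the slope times the mean of x. The sketch below uses illustrative names (`x`, `y`, `slope`, `intercept`) and compares the result with `lm()`.

```r
x <- mtcars$hp
y <- mtcars$mpg

# Closed-form least-squares estimates for one predictor
slope <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
intercept <- mean(y) - slope * mean(x)

# Compare the hand-computed values with the coefficients from lm()
model <- lm(mpg ~ hp, data = mtcars)
print(c(intercept = intercept, slope = slope))
print(coef(model))
```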

Interactive Plotly Visualization
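A rough sketch of how an interactive scatter plot with the fitted line could be built, assuming the plotly package is installed. This is not necessarily the exact figure used in this presentation; `plot_data`, `fitted_mpg`, and `fig` are illustrative names.

```r
# Fit the model and compute fitted values, sorted by hp so the line draws cleanly
model <- lm(mpg ~ hp, data = mtcars)
plot_data <- mtcars[order(mtcars$hp), c("hp", "mpg")]
plot_data$fitted_mpg <- predict(model, newdata = plot_data)

# Build the figure only if plotly is available (assumption: it may not be installed)
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(plot_data, x = ~hp)
  fig <- plotly::add_markers(fig, y = ~mpg, name = "Observed")
  fig <- plotly::add_lines(fig, y = ~fitted_mpg, name = "Fitted line")
  fig <- plotly::layout(
    fig,
    title = "Horsepower vs. MPG with fitted regression line",
    xaxis = list(title = "Horsepower"),
    yaxis = list(title = "Miles per Gallon (MPG)")
  )
  # Print `fig` in an interactive session or R Markdown chunk to display it
}
```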

Residual Analysis

Observation:

The residual plot shows no obvious pattern, with points randomly scattered around the horizontal line at zero. This suggests that the linear model's assumptions are reasonably met and the fit is appropriate.
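A residual plot of this kind can be sketched in base R as follows. The names `res` and `fit` are illustrative, and the styling is an assumption rather than the exact figure described above.

```r
model <- lm(mpg ~ hp, data = mtcars)

# Residuals (observed minus predicted) and fitted values
res <- resid(model)
fit <- fitted(model)

# Residuals vs. fitted values, with a dashed reference line at zero
plot(fit, res,
     xlab = "Fitted MPG", ylab = "Residual",
     main = "Residuals vs. Fitted Values")
abline(h = 0, lty = 2)

# Numeric summary of the residuals
print(summary(res))
```

Calling `plot(model, which = 1)` gives R's built-in version of the same diagnostic with a smoothed trend line; visible curvature in that line would hint at a nonlinear relationship.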

Prediction Example

Predict MPG for a 150-horsepower car:

\(\hat{mpg} = 30.09886 -0.06823\cdot 150\)

\(\hat{mpg} = 19.86436\)

The predicted fuel efficiency for a car with 150 horsepower is approximately 19.86 mpg.
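The same prediction can be obtained from the fitted model with `predict()`. In this sketch, `new_car`, `point_pred`, and `pred_int` are illustrative names. Because `predict()` uses the unrounded coefficients, its result may differ from the hand calculation in the later decimal places.

```r
model <- lm(mpg ~ hp, data = mtcars)

# New observation: a car with 150 horsepower
new_car <- data.frame(hp = 150)

# Point prediction from the fitted line
point_pred <- predict(model, newdata = new_car)
print(point_pred)

# 95% prediction interval for an individual car with this horsepower
pred_int <- predict(model, newdata = new_car,
                    interval = "prediction", level = 0.95)
print(pred_int)
```

The prediction interval (columns `lwr` and `upr`) conveys how much an individual car's mpg may vary around the point estimate.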

Applications:

-Manufacturers can optimize power vs. efficiency trade-offs

-Car buyers can use this to compare vehicles and find the right fit for them

-Environmental agencies can assess fleet emissions

Conclusion

Key Findings:

  • Strong negative relationship: as horsepower increases, mpg tends to decrease

  • Each additional horsepower is associated with a decrease of approximately 0.07 mpg

  • The model explains 60.24% of the variance in mpg
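The variance-explained figure can be recomputed from its definition, one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch, with `ss_res`, `ss_tot`, and `r_squared` as illustrative names:

```r
model <- lm(mpg ~ hp, data = mtcars)

# Residual sum of squares: variation left unexplained by the model
ss_res <- sum(resid(model)^2)

# Total sum of squares: variation of mpg around its mean
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# R-squared by definition, compared with the value reported by summary()
r_squared <- 1 - ss_res / ss_tot
print(r_squared)
print(summary(model)$r.squared)
```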

Practical Value:

  • Consumers can estimate fuel efficiency when car shopping

  • Provides quantitative basis for environmental impact analysis

  • Demonstrates the power of simple linear regression for real-world predictions