2026-06-07

Introduction

Linear regression is a statistical method for modeling the relationship between a quantitative response and one or more explanatory variables.

  • Simple Linear Regression: Uses one independent variable to predict the outcome.

  • Multiple Linear Regression: Uses two or more independent variables, which can explain more of the variation in the outcome.

We will use the mtcars data set from the base R datasets package to demonstrate linear regression.

The Simple Linear Regression Model

We assume the response \(Y\) is linearly related to a single predictor \(X\).

The mathematical model is defined as: \[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i\]

Where:

  • \(Y_i\) is the dependent variable
  • \(X_i\) is the independent variable
  • \(\beta_0\) is the y-intercept
  • \(\beta_1\) is the slope coefficient
  • \(\varepsilon_i\) represents the error term

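The goal is to estimate \(\beta_0\) and \(\beta_1\) by fitting the line that minimizes the sum of squared residuals, that is, the squared vertical distances between the observed and fitted values of \(Y\).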
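As a minimal sketch of what least squares does, the block below fits mpg on hp with lm() and then reproduces the two coefficients from the standard closed-form estimates for a single predictor. The object names (fit_hp, b0, b1, rss) are illustrative and do not come from the original slides.

```r
# Fit the simple linear regression of mpg on hp
fit_hp <- lm(mpg ~ hp, data = mtcars)

x <- mtcars$hp
y <- mtcars$mpg

# Closed-form least-squares estimates for one predictor:
# slope = S_xy / S_xx, intercept = mean(y) - slope * mean(x)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Sum of squared residuals, the quantity the fitted line minimizes
rss <- sum(residuals(fit_hp)^2)

print(coef(fit_hp))                   # estimates from lm()
print(c(intercept = b0, slope = b1))  # the same estimates computed by hand
print(rss)
```

The hand-computed values should match coef(fit_hp) up to floating-point error, which is a quick way to confirm that lm() is solving the least-squares problem described above.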

Simple Linear Regression: Horsepower Vs. MPG

This plot indicates a negative linear relationship between horsepower and mpg. The fitted regression line slopes downward, showing that cars with more powerful engines tend to be less fuel-efficient.
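The plotting code for this slide is not shown, so the following is only a sketch of how such a figure could be produced. It assumes the ggplot2 package is installed; the object name p_hp and the axis labels are illustrative.

```r
library(ggplot2)

# Scatter plot of mpg against hp with the least-squares line overlaid
p_hp <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Horsepower (hp)", y = "Miles per gallon (mpg)")

print(p_hp)
```

Here geom_smooth(method = "lm") draws the same line that lm(mpg ~ hp, data = mtcars) would estimate, and se = FALSE hides the confidence band.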

Simple Linear Regression: Weight Vs. MPG

The interactive ggplotly plot shows the negative linear relationship between car weight and fuel efficiency. It also shows that less fuel-efficient cars tend to have higher displacement (indicated by the size of the data points), V-shaped engines (indicated by the shape of the data points), and more cylinders (indicated by the color of the data points).
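A plot with these mappings could be sketched as follows. This is an assumption about how the figure was built, not the original code; it needs the ggplot2 and plotly packages, and the names p_wt and fig_wt are illustrative. In mtcars, vs = 0 denotes a V-shaped engine and vs = 1 a straight engine.

```r
library(ggplot2)
library(plotly)

# Map displacement to size, engine shape to point shape, cylinders to color
p_wt <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(
    size  = disp,
    shape = factor(vs, labels = c("V-shaped", "Straight")),
    color = factor(cyl)
  )) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon (mpg)",
       size = "Displacement", shape = "Engine", color = "Cylinders")

# Convert the static ggplot into an interactive plotly widget
fig_wt <- ggplotly(p_wt)
fig_wt
```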

The Multiple Linear Regression Model

We assume the response \(Y\) is linearly related to two or more predictors. For example, with two predictors \(X_1\) and \(X_2\) the mathematical model expands to: \[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i\]

Taking MPG as the dependent variable and Weight and Horsepower as the predictors, the equation becomes: \[ \text{MPG}_i = \beta_0 + \beta_1 \text{Weight}_i + \beta_2 \text{Horsepower}_i + \varepsilon_i\]

Fitting the Model (R Code)

# 1. Fit the multiple linear regression model
fit <- lm(mpg ~ wt + hp, data = mtcars)

# 2. Create a grid of Weight and Horsepower values
wt_seq <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 25)
hp_seq <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 25)
grid <- expand.grid(wt = wt_seq, hp = hp_seq)

# 3. Predict MPG for the entire grid to create the Z-axis
grid$mpg <- predict(fit, newdata = grid)
# expand.grid() varies wt fastest, so rows of z_matrix index wt and columns index hp
z_matrix <- matrix(grid$mpg, nrow = 25, ncol = 25)

The Fitted Plane in 3D (plotly)

The regression plane represents the predicted mpg, which decreases as either weight or horsepower increases. The red markers indicate the observed data points.
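The plotting call itself is not shown on the slide. One way the plane and the points could be drawn with plotly is sketched below, under the assumption that weight goes on the x-axis and horsepower on the y-axis; the name fig_plane and the styling choices are illustrative. A plotly surface expects the rows of z to follow y and the columns to follow x, so the matrix is transposed here.

```r
library(plotly)

# Refit the model and rebuild the prediction grid so the block is self-contained
fit    <- lm(mpg ~ wt + hp, data = mtcars)
wt_seq <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 25)
hp_seq <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 25)
grid   <- expand.grid(wt = wt_seq, hp = hp_seq)

# Rows index wt and columns index hp, because expand.grid() varies wt fastest
z_matrix <- matrix(predict(fit, newdata = grid), nrow = 25, ncol = 25)

fig_plane <- plot_ly() %>%
  # Transpose so that rows follow y (hp) and columns follow x (wt)
  add_surface(x = wt_seq, y = hp_seq, z = t(z_matrix),
              opacity = 0.7, showscale = FALSE) %>%
  # Observed cars as red markers
  add_markers(data = mtcars, x = ~wt, y = ~hp, z = ~mpg,
              marker = list(color = "red", size = 4)) %>%
  layout(scene = list(
    xaxis = list(title = "Weight"),
    yaxis = list(title = "Horsepower"),
    zaxis = list(title = "MPG")
  ))

fig_plane
```

If the axes are swapped (horsepower on x, weight on y), z_matrix can be passed without the transpose.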

Summary

  • Key Predictors: Both Weight and Horsepower have a strong negative association with fuel efficiency.
  • Simple Linear Vs. Multiple Linear Regression: Combining these predictors explains more of the variance in mpg and yields a better fit than either single-predictor model.
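The comparison in the summary can be checked directly. The sketch below fits the two simple models and the combined model, then compares their R-squared and adjusted R-squared values; the object names are illustrative and not taken from the slides.

```r
# Fit the two single-predictor models and the two-predictor model
fit_wt   <- lm(mpg ~ wt, data = mtcars)
fit_hp   <- lm(mpg ~ hp, data = mtcars)
fit_both <- lm(mpg ~ wt + hp, data = mtcars)

# Share of variance in mpg explained by each model
r2 <- c(
  wt    = summary(fit_wt)$r.squared,
  hp    = summary(fit_hp)$r.squared,
  wt_hp = summary(fit_both)$r.squared
)

# Adjusted R-squared penalizes the extra predictor
adj_r2 <- c(
  wt    = summary(fit_wt)$adj.r.squared,
  hp    = summary(fit_hp)$adj.r.squared,
  wt_hp = summary(fit_both)$adj.r.squared
)

print(round(rbind(r2, adj_r2), 3))

# F-test for whether adding hp improves on the weight-only model
print(anova(fit_wt, fit_both))
```

R-squared can never decrease when a predictor is added to a nested model, so adjusted R-squared and the F-test are the more informative checks of whether the second predictor is worth including.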