Linear Regression Plots in ggplot2

From Simple Correlation to Multiple Predictors

Author

Abdullah Al Shamim

Published

February 10, 2026

Introduction

Linear regression plots allow us to visualize the strength and direction of relationships between variables. This guide covers Simple Linear Regression (one predictor) and Multiple Linear Regression (multiple predictors) using the mtcars dataset.


Mastering Regression Plots: Visualizing Statistical Relationships

Linear regression is a fundamental statistical method for modeling the relationship between a dependent variable and one or more independent variables. In ggplot2, the geom_smooth() function overlays a fitted model directly on your data.

Systemic Pro Tips:

  1. Confidence Intervals: Use se = TRUE (the default) to visualize uncertainty. The shaded band represents the 95% confidence interval for the regression line; change the level argument for a different coverage.
  2. Model Choice: While method = "lm" fits a straight line, method = "loess" is better for identifying non-linear local trends.
  3. Residuals: A good-looking plot doesn’t replace model diagnostics. Always check your residuals to confirm that the assumptions of the linear model (linearity, constant variance, roughly normal errors) are reasonable.
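
The residual check in tip 3 can be sketched as follows. This is a minimal example, assuming the same mpg ~ wt model used later in this guide; the names fit, diag_df, and p_resid are illustrative only.

```r
library(ggplot2)

# Fit the same straight-line model that geom_smooth(method = "lm") draws
fit <- lm(mpg ~ wt, data = mtcars)

# Collect fitted values and residuals in one data frame
diag_df <- data.frame(fitted = fitted(fit), resid = resid(fit))

# Residuals vs. fitted values: a clearly curved loess line
# suggests that a straight line is missing some structure
p_resid <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  labs(x = "Fitted MPG", y = "Residual") +
  theme_minimal()

p_resid
```

If the points scatter evenly around the dashed zero line, the straight-line fit is probably adequate; a U-shaped pattern would point toward a curved model.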

1. Simple Linear Regression

A simple regression models the relationship between two continuous variables. Here, we investigate how car weight (wt) predicts fuel efficiency (mpg).

Basic Regression Line

We use geom_smooth(method = "lm") to fit a linear model.

Code
library(tidyverse)

mtcars %>%
  ggplot(aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, color = "red", se = TRUE) +
  labs(title = "Simple Linear Regression: MPG ~ Weight",
       x = "Weight (1000 lbs)",
       y = "Miles per Gallon") +
  theme_minimal()
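
To see the numbers behind the red line, you can fit the same model directly with lm(). This is a small sketch; the object name fit is illustrative.

```r
# The model drawn by geom_smooth(method = "lm") for y = mpg, x = wt
fit <- lm(mpg ~ wt, data = mtcars)

# Intercept and slope of the fitted line
coef(fit)

# Share of the variance in mpg explained by wt
summary(fit)$r.squared
```

The intercept is roughly 37.3 and the slope roughly -5.34, so each additional 1000 lbs of weight is associated with about 5.3 fewer miles per gallon.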

Aesthetic Customization

You can customize the confidence band with the fill argument (and, optionally, alpha for its transparency) to make the plot more readable.

Code
mtcars %>%
  ggplot(aes(x = wt, y = mpg)) +
  geom_point(color = "#A88EF2", size = 3, alpha = 0.7) +
  geom_smooth(method = "lm", formula = y ~ x, color = "red", 
              fill = "lightblue", se = TRUE) +
  labs(title = "Car Weight vs. Fuel Efficiency",
       subtitle = "With 95% Confidence Band",
       x = "Weight (1000 lbs)",
       y = "MPG") +
  theme_bw()
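
For readers who want to know what the shaded band contains, the sketch below rebuilds it by hand with predict(). It assumes the same mpg ~ wt model; the names model, grid, band, and p_band are illustrative.

```r
library(ggplot2)

model <- lm(mpg ~ wt, data = mtcars)

# Evenly spaced weights across the observed range
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 80))

# predict() returns the columns fit, lwr, and upr:
# the fitted mean and its 95% confidence limits
band <- cbind(grid, predict(model, newdata = grid,
                            interval = "confidence", level = 0.95))

p_band <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_ribbon(data = band, aes(x = wt, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = "lightblue", alpha = 0.4) +
  geom_line(data = band, aes(x = wt, y = fit),
            inherit.aes = FALSE, color = "red") +
  geom_point() +
  theme_bw()

p_band
```

The band describes uncertainty about the mean MPG at each weight, not the spread of individual cars; interval = "prediction" would give the wider band for single observations.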


2. Multiple Linear Regression

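Multiple regression involves two or more predictors. A two-dimensional plot can show a second, categorical predictor by fitting a separate line for each group, using either color mapping or faceting. Because each group gets its own slope and intercept, these plots show how the weight-MPG relationship varies with the grouping variable (an interaction).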

Option A: Interaction by Color

By mapping color to a categorical variable like cylinders (cyl), we see how the relationship between weight and MPG changes across different engine types.

Code
mtcars %>%
  ggplot(aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_color_brewer(palette = "Set1", name = "Cylinders") +
  labs(title = "Multiple Regression: Interaction by Cylinder Count",
       x = "Weight (1000 lbs)",
       y = "MPG")
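
The per-color lines correspond to a model with a weight-by-cylinder interaction. The sketch below fits that model with lm() and recovers the slope for each cylinder group; the names mt, fit_int, b, and slopes are illustrative.

```r
# Treat cylinder count as a categorical predictor
mt <- transform(mtcars, cyl = factor(cyl))

# wt * cyl expands to wt + cyl + wt:cyl (separate slope and intercept per group)
fit_int <- lm(mpg ~ wt * cyl, data = mt)
b <- coef(fit_int)

# The 4-cylinder group is the reference level; the interaction terms
# are the slope differences for the 6- and 8-cylinder groups
slopes <- c("4" = b[["wt"]],
            "6" = b[["wt"]] + b[["wt:cyl6"]],
            "8" = b[["wt"]] + b[["wt:cyl8"]])
slopes
```

These slopes match the lines drawn per color, because geom_smooth() fits one regression within each color group.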

Option B: Faceted Comparison

Faceting creates “Small Multiples,” allowing for a cleaner comparison of the regression slopes for each group.

Code
mtcars %>%
  ggplot(aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~cyl, labeller = label_both) +
  labs(title = "MPG vs Weight: Grouped by Cylinder Count") +
  theme_light()
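
geom_smooth() fits each panel on its own. If you instead want to draw a single multiple regression with both predictors, one option is to fit it with lm() and plot its predictions. The sketch below assumes an additive model (mpg ~ wt + cyl), which gives every panel the same slope; the names mt, fit_add, pred, and p_add are illustrative.

```r
library(ggplot2)

# Treat cylinder count as a categorical predictor
mt <- transform(mtcars, cyl = factor(cyl))

# One model, two predictors: a shared wt slope plus a shift per cylinder group
fit_add <- lm(mpg ~ wt + cyl, data = mt)

# Store the model's fitted values next to the data
mt$pred <- as.numeric(predict(fit_add))

p_add <- ggplot(mt, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_line(aes(y = pred), color = "red") +   # fitted line from the single model
  facet_wrap(~cyl, labeller = label_both) +
  labs(title = "MPG vs Weight: Additive Model Predictions") +
  theme_light()

p_add
```

Comparing this plot with the faceted geom_smooth() version shows how much the group-specific slopes differ from the shared slope.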


3. Adding Statistical Insights

A professional plot often includes the regression equation and the R-squared (R²) value to quantify the model’s fit.

Code
library(ggpmisc)

mtcars %>% 
  ggplot(aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm") +
  # Adding the equation and R-squared value
  stat_poly_eq(formula = y ~ x,
               aes(label = paste(after_stat(eq.label), 
                                 after_stat(rr.label), 
                                 sep = "~~~")),
               parse = TRUE) +
  labs(title = "Regression Analysis with Model Statistics",
       x = "Weight", y = "MPG") +
  theme_test()
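
If ggpmisc is not available, a similar label can be built by hand from an lm() fit and placed with annotate(). This is a rough sketch rather than a replacement for stat_poly_eq(); the names fit, b, r2, eq_label, and p_eq are illustrative.

```r
library(ggplot2)

fit <- lm(mpg ~ wt, data = mtcars)
b   <- coef(fit)                  # intercept and slope
r2  <- summary(fit)$r.squared     # coefficient of determination

# Build a plain-text label, choosing the sign in front of the slope
eq_label <- sprintf("y = %.2f %s %.2f x, R^2 = %.2f",
                    b[[1]], ifelse(b[[2]] < 0, "-", "+"), abs(b[[2]]), r2)

p_eq <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  # Inf places the label at the top-right corner; hjust/vjust pull it inward
  annotate("text", x = Inf, y = Inf, label = eq_label,
           hjust = 1.1, vjust = 2) +
  labs(x = "Weight", y = "MPG") +
  theme_test()

p_eq
```

Here eq_label evaluates to "y = 37.29 - 5.34 x, R^2 = 0.75".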


Systemic Summary Toolkit

| Feature | Function / Argument | Purpose |
|---|---|---|
| Linear Model | geom_smooth(method = "lm") | Fits a straight-line relationship. |
| Confidence Band | se = TRUE | Shows the 95% confidence interval. |
| Equation | stat_poly_eq() (from ggpmisc) | Displays the fitted equation and R² on the plot. |
| Categorical Splitting | facet_wrap() | Fits and displays a separate regression line for each group. |