Linear regression plots allow us to visualize the strength and direction of relationships between variables. This guide covers Simple Linear Regression (one predictor) and Multiple Linear Regression (multiple predictors) using the mtcars dataset.
Introduction
Linear regression is a fundamental statistical method used to model the relationship between a dependent variable and one or more independent variables. In ggplot2, the geom_smooth() function provides a systematic way to overlay these models directly onto your data.
Systemic Pro Tips:
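Confidence Intervals: Use se = TRUE (the default) to visualize uncertainty. The shaded band represents the 95% confidence interval for the regression line, that is, for the fitted mean at each x value. It is not a prediction interval for individual observations.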
Model Choice: While method = "lm" fits a straight line, method = "loess" is better for identifying non-linear local trends.
Residuals: A good-looking plot doesn’t replace model diagnostics. Always check your residuals to ensure the assumptions of linear regression (linearity, constant variance, roughly normal errors) are met.
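The residual check mentioned in the last tip can be sketched as a residuals-versus-fitted plot. This is a minimal example, assuming the mpg ~ wt model on mtcars; the names fit, diag_df and p_resid are illustrative choices rather than anything prescribed by ggplot2.

```r
library(ggplot2)

# Assumption: the model of interest is mpg ~ wt on the built-in mtcars data.
fit <- lm(mpg ~ wt, data = mtcars)

# Collect fitted values and residuals in one data frame for plotting.
diag_df <- data.frame(
  fitted = fitted(fit),
  resid  = residuals(fit)
)

# Residuals vs. fitted: points should scatter evenly around the zero line.
# A clearly curved loess trend hints that a straight line misses structure.
p_resid <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  labs(x = "Fitted MPG", y = "Residual") +
  theme_minimal()

p_resid
```

Base R offers a similar view without ggplot2 via plot(fit, which = 1).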
1. Simple Linear Regression
A simple regression models the relationship between two continuous variables. Here, we investigate how car weight (wt) predicts fuel efficiency (mpg).
Basic Regression Line
We use geom_smooth(method = "lm") to fit a linear model.
Code
library(tidyverse)

mtcars %>%
  ggplot(aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(
    title = "Simple Linear Regression: MPG ~ Weight",
    x = "Weight (1000 lbs)",
    y = "Miles per Gallon"
  ) +
  theme_minimal()
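With the default formula y ~ x, the line drawn by geom_smooth(method = "lm") is an ordinary least squares fit of y on x. As a minimal sketch, the same model can be fitted directly with lm() so that its coefficients can be inspected (the names fit_simple and slope are illustrative):

```r
# Same model the plot draws: mpg regressed on wt.
fit_simple <- lm(mpg ~ wt, data = mtcars)

# Intercept and slope of the fitted line.
coef(fit_simple)

# Pull out the slope on its own; a negative value means
# heavier cars tend to get fewer miles per gallon.
slope <- unname(coef(fit_simple)["wt"])
slope
```

On mtcars the slope comes out at roughly -5.3, meaning about 5.3 fewer miles per gallon for each additional 1000 lbs.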
Aesthetic Customization
You can customize the confidence band through geom_smooth(): the fill argument sets its color and the alpha argument its transparency, which can make the plot more readable.
Code
mtcars %>%
  ggplot(aes(x = wt, y = mpg)) +
  geom_point(color = "#A88EF2", size = 3, alpha = 0.7) +
  geom_smooth(method = "lm", formula = y ~ x, color = "red", fill = "lightblue", se = TRUE) +
  labs(
    title = "Car Weight vs. Fuel Efficiency",
    subtitle = "With 95% Confidence Band",
    x = "Weight (1000 lbs)",
    y = "MPG"
  ) +
  theme_bw()
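To see what the shaded band contains, it can be rebuilt by hand. The sketch below assumes the band is the 95% confidence interval for the mean response, computed here with predict(..., interval = "confidence"), and draws it with geom_ribbon(). The names fit_band, grid, band and p_band are illustrative, and the result should closely match the band that geom_smooth() draws.

```r
library(ggplot2)

fit_band <- lm(mpg ~ wt, data = mtcars)

# Evenly spaced weights across the observed range.
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50))

# 95% confidence interval for the mean mpg at each weight.
ci <- predict(fit_band, newdata = grid, interval = "confidence", level = 0.95)
band <- cbind(grid, as.data.frame(ci))  # columns: wt, fit, lwr, upr

# Draw the band by hand. inherit.aes = FALSE keeps geom_ribbon from
# looking for the mpg column, which does not exist in `band`.
p_band <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_ribbon(data = band, aes(x = wt, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = "lightblue", alpha = 0.4) +
  geom_line(data = band, aes(x = wt, y = fit), color = "red") +
  geom_point(color = "#A88EF2", size = 3, alpha = 0.7) +
  theme_bw()

p_band
```

The band is narrowest near the average weight and widens toward the edges of the data, where the fitted mean is less certain.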
2. Multiple Linear Regression
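Multiple regression involves two or more predictors. A two-dimensional scatter plot has room for only one predictor on the x-axis, so we show a second, categorical predictor by fitting a separate line for each of its levels, using Color Mapping or Faceting. Letting the slope differ between groups is what an interaction between the two predictors means.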
Option A: Interaction by Color
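By mapping color to a categorical variable like cylinders (cyl), we see how the relationship between weight and MPG changes across different engine types. Because cyl is stored as a number in mtcars, we wrap it in factor() so that ggplot2 treats it as a grouping variable and draws one line per cylinder count.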
Code
mtcars %>%
  ggplot(aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_color_brewer(palette = "Set1", name = "Cylinders") +
  labs(
    title = "Multiple Regression: Interaction by Cylinder Count",
    x = "Weight (1000 lbs)",
    y = "MPG"
  )
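The colored lines are fitted separately per group, which gives the same slopes as a single model with a weight-by-cylinder interaction. A minimal sketch of that model, assuming cyl is treated as categorical (the names fit_int, b and slopes are illustrative):

```r
# Assumption: cyl is treated as a categorical predictor via factor().
fit_int <- lm(mpg ~ wt * factor(cyl), data = mtcars)
b <- coef(fit_int)

# The baseline group (4 cylinders) uses the plain wt coefficient;
# the other groups add their interaction term to it.
slopes <- c(
  cyl4 = unname(b["wt"]),
  cyl6 = unname(b["wt"] + b["wt:factor(cyl)6"]),
  cyl8 = unname(b["wt"] + b["wt:factor(cyl)8"])
)
slopes
```

Each entry of slopes is the slope of one colored line in the plot. If the lines look close to parallel, a simpler additive model, mpg ~ wt + factor(cyl), may describe the data about as well.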
Option B: Faceted Comparison
Faceting creates “Small Multiples,” allowing for a cleaner comparison of the regression slopes for each group.
Code
mtcars %>%
  ggplot(aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~cyl, labeller = label_both) +
  labs(title = "MPG vs Weight: Grouped by Cylinder Count") +
  theme_light()
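When comparing slopes across panels, it can help to see the numbers behind each line, including how many cars each panel contains. The following is a minimal base R sketch that fits mpg ~ wt within each cylinder group; by_cyl and group_fits are illustrative names.

```r
# Split the data into one data frame per cylinder count.
by_cyl <- split(mtcars, mtcars$cyl)

# Fit mpg ~ wt within each group and collect the results in a table.
group_fits <- do.call(rbind, lapply(names(by_cyl), function(k) {
  d <- by_cyl[[k]]
  m <- lm(mpg ~ wt, data = d)
  data.frame(
    cyl       = as.integer(k),
    n         = nrow(d),
    intercept = unname(coef(m)[1]),
    slope     = unname(coef(m)[2])
  )
}))

group_fits
```

The n column is worth a look before reading much into a slope: the six-cylinder group contains only 7 cars, so its line is estimated from little data.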
3. Adding Statistical Insights
A professional plot often includes the regression equation and the R-squared (R²) value to quantify the model’s fit.
Code
library(ggpmisc)

mtcars %>%
  ggplot(aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm") +
  # Adding the equation and R-squared value
  stat_poly_eq(
    formula = y ~ x,
    aes(label = paste(after_stat(eq.label), after_stat(rr.label), sep = "~~~")),
    parse = TRUE
  ) +
  labs(title = "Regression Analysis with Model Statistics", x = "Weight", y = "MPG") +
  theme_test()
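The R-squared value printed on the plot can be cross-checked against the model itself. The sketch below, using only base R, reads it from summary() and recomputes it from its definition, 1 - SS_res / SS_tot; the names fit_stats, r2 and r2_manual are illustrative.

```r
fit_stats <- lm(mpg ~ wt, data = mtcars)

# R-squared as reported by summary(): the share of the variance
# in mpg that is explained by wt.
r2 <- summary(fit_stats)$r.squared

# The same value from its definition: 1 - SS_res / SS_tot.
ss_res <- sum(residuals(fit_stats)^2)
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r2_manual <- 1 - ss_res / ss_tot

round(c(r2 = r2, r2_manual = r2_manual), 3)
```

Both values come out at about 0.75 for this model, so weight alone accounts for roughly three quarters of the variation in fuel efficiency in mtcars.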