Understanding the relationship between two numerical variables.
2026-04-11
Linear regression is a statistical method used to model the relationship between a predictor variable \(X\) and a response variable \(Y\).
It assumes a linear relationship of the form:
\[ Y = \beta_0 + \beta_1 X + \varepsilon \]
A simple linear regression model assumes that the response variable \(Y\) depends linearly on the predictor \(X\): \[ Y = \beta_0 + \beta_1 X + \varepsilon \]
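Where \(\beta_0\) is the intercept, \(\beta_1\) is the slope (the expected change in \(Y\) for a one-unit increase in \(X\)), and \(\varepsilon\) is the random error term.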
We also assume: \[ E[\varepsilon] = 0, \quad \text{Var}(\varepsilon) = \sigma^2 \]
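As a sketch of how the coefficients are usually estimated: the slides do not name an estimation method, so ordinary least squares is assumed here (it is what R's lm() function uses). Given observations \((x_i, y_i)\) for \(i = 1, \dots, n\), with sample means \(\bar{x}\) and \(\bar{y}\), the estimates are:
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
The residual for observation \(i\) is then \(e_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\), the observed value minus the fitted value.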
For this presentation, we will use the built-in mtcars dataset in R.
We will focus on the relationship between horsepower (hp), the predictor, and miles per gallon (mpg), the response.
This dataset contains measurements from 32 car models and is commonly used for regression examples.
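A quick way to check the dataset before modeling is sketched below. The object names hp_mpg and n_cars are illustrative and do not come from the original slides.

```r
# Load the built-in dataset (it ships with base R in the datasets package).
data(mtcars)

# Keep only the two variables used in this presentation.
hp_mpg <- mtcars[, c("hp", "mpg")]

# Number of car models in the dataset.
n_cars <- nrow(hp_mpg)
print(n_cars)

# Minimum, quartiles, mean, and maximum for each variable.
print(summary(hp_mpg))
```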
This plot shows the relationship between horsepower (hp) and miles per gallon (mpg) in the mtcars dataset.
This plot shows the residuals from the linear regression model of mpg predicted by horsepower (hp). Residual plots help us check whether the linear model assumptions are reasonable.
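One way such a residual plot could be produced is sketched below. The slides do not show this code, so the choice to plot residuals against fitted values, along with the names fit, diag_df, and resid_plot, are assumptions made for illustration.

```r
library(ggplot2)

# Fit the simple linear regression of mpg on hp.
fit <- lm(mpg ~ hp, data = mtcars)

# Collect fitted values and residuals in one data frame for plotting.
diag_df <- data.frame(
  fitted   = fitted(fit),
  residual = resid(fit)
)

# Residuals against fitted values; the dashed line marks zero.
resid_plot <- ggplot(diag_df, aes(x = fitted, y = residual)) +
  geom_point(color = "steelblue", size = 2) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(
    title = "Residuals vs Fitted Values",
    x = "Fitted mpg",
    y = "Residual"
  ) +
  theme_minimal()

print(resid_plot)
```

When reading the plot, look for curvature or a funnel shape: either would suggest that the linearity or constant-variance assumption deserves a closer look.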
This interactive 3D plot shows how horsepower (hp), weight (wt), and miles per gallon (mpg) relate to each other in the mtcars dataset.
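The slides do not say which package produced the 3D plot. The sketch below assumes the plotly package, which is one common choice for interactive 3D scatterplots in R; the name p3d and the marker size are illustrative.

```r
library(plotly)

# Interactive 3D scatterplot: hp and wt on the horizontal axes, mpg on the vertical axis.
p3d <- plot_ly(
  data = mtcars,
  x = ~hp, y = ~wt, z = ~mpg,
  type = "scatter3d", mode = "markers",
  marker = list(size = 4)
)

# Label the three axes.
p3d <- layout(
  p3d,
  scene = list(
    xaxis = list(title = "Horsepower (hp)"),
    yaxis = list(title = "Weight (wt, 1000 lbs)"),
    zaxis = list(title = "Miles per Gallon (mpg)")
  )
)

# Printing the object renders the plot in a viewer or HTML output.
p3d
```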
Below is the R code used to create the scatterplot with a regression line shown earlier in the presentation.
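library(ggplot2)

# Scatterplot of mpg against hp with a fitted least-squares line
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = "steelblue", size = 2) +
  # method = "lm" draws the linear fit; se = FALSE hides the confidence band
  geom_smooth(method = "lm", se = FALSE, color = "red", linewidth = 1) +
  labs(
    title = "Horsepower vs MPG",
    x = "Horsepower (hp)",
    y = "Miles per Gallon (mpg)"
  ) +
  theme_minimal()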
The scatterplot and fitted regression line suggest a clear negative relationship between horsepower (hp) and miles per gallon (mpg). As horsepower increases, fuel efficiency tends to decrease.
The residual plot shows that most residuals are centered around zero without a strong pattern, which supports the assumption of linearity.
Overall, the model indicates that cars with higher horsepower generally achieve lower gas mileage, and the linear regression model provides a reasonable fit for this relationship.
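To put numbers on this interpretation, the fitted coefficients and the coefficient of determination can be extracted directly from the model. This is a minimal sketch: the names fit, b, r2, and change_50, and the 50-horsepower comparison, are illustrative choices not taken from the slides.

```r
# Fit the simple linear regression of mpg on hp.
fit <- lm(mpg ~ hp, data = mtcars)

# Estimated intercept and slope.
b <- coef(fit)
print(b)

# Proportion of the variance in mpg explained by hp.
r2 <- summary(fit)$r.squared
print(r2)

# Predicted change in mpg for a 50-horsepower increase (slope times 50).
change_50 <- unname(b["hp"]) * 50
print(change_50)
```

A negative slope confirms the direction seen in the scatterplot, and the R-squared value indicates how much of the variation in mpg the single predictor accounts for.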
Simple linear regression provides a useful framework for understanding how two numerical variables are related. In our example using the mtcars dataset, we found that horsepower (hp) is negatively associated with miles per gallon (mpg), meaning cars with more powerful engines tend to have lower fuel efficiency.
The regression line, residual plot, and 3D visualization together help confirm that the linear model is a reasonable approximation for this relationship. While the model simplifies reality, it offers clear insights and a foundation for more advanced statistical methods.