This is a presentation discussing Simple Linear Regression. We will explore the basic ideas, mathematics, and visualizations that help us understand how one variable can be used to predict another.
2025-11-17
Simple linear regression is a statistical method used to model the relationship between a predictor variable \(X\) and a response variable \(Y\).
The model assumes the relationship can be expressed as a straight line:
\[ Y = \beta_0 + \beta_1 X + \epsilon \]
Where:
- \(\beta_0\) is the intercept
- \(\beta_1\) is the slope
- \(\epsilon\) is the error term
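As a rough illustration of what this model assumes, the sketch below simulates data from a straight line plus random noise and then fits a line to it with lm(). The true intercept (2), true slope (3), noise level, and sample size are arbitrary assumptions chosen for the example, not values from this presentation.

```r
# Simulate data from Y = beta0 + beta1 * X + epsilon
# (all numeric values below are illustrative assumptions)
set.seed(42)                       # make the random draws reproducible
n     <- 200                       # assumed sample size
beta0 <- 2                         # assumed true intercept
beta1 <- 3                         # assumed true slope

x       <- runif(n, min = 0, max = 10)
epsilon <- rnorm(n, mean = 0, sd = 1)    # random error term
y       <- beta0 + beta1 * x + epsilon

# Fit a straight line to the simulated data
sim_fit <- lm(y ~ x)
coef(sim_fit)                      # estimates should land near 2 and 3
```

Because of the error term, the estimates will be close to the assumed true values but not exactly equal to them.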
To understand simple linear regression, we start by visualizing the relationship between two quantitative variables. Here, we examine how car weight (wt) relates to miles per gallon (mpg) using the mtcars dataset.
A simple linear regression line helps us see the general trend. It represents the predicted relationship between car weight and fuel efficiency.
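A plot of this kind could be produced roughly as follows. This is a minimal sketch using base R graphics, which is an assumption made so the example needs no extra packages; the axis labels, title, and the name `trend` are illustrative choices.

```r
# Scatterplot of fuel efficiency against weight, using the built-in mtcars data
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Car weight vs. fuel efficiency")

# Overlay the least-squares line fitted to the same data
trend <- lm(mpg ~ wt, data = mtcars)
abline(trend, col = "blue", lwd = 2)
```

The downward-sloping line reflects that heavier cars in this dataset tend to have lower fuel efficiency.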
## The Simple Linear Regression Model
The goal of simple linear regression is to model how a response variable \(Y\) changes with a predictor variable \(X\).
We assume the following mathematical model:
\[ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \]
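Where:
- \(Y_i\) is the response for observation \(i\)
- \(X_i\) is the predictor for observation \(i\)
- \(\beta_0\) is the intercept
- \(\beta_1\) is the slope
- \(\epsilon_i\) is the error term for observation \(i\)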
The parameters \(\beta_0\) and \(\beta_1\) are estimated using the least squares method, which minimizes the total squared error:
\[ \min_{\beta_0, \beta_1} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \]
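To make the minimization concrete, the sketch below evaluates the sum of squared errors for the mtcars example (mpg as \(Y\), wt as \(X\)) at the least-squares estimates and at two nudged lines. The helper `sse()` and the sizes of the nudges are hypothetical choices made for illustration.

```r
# Sum of squared errors for a candidate line with intercept b0 and slope b1
sse <- function(b0, b1) {
  predicted <- b0 + b1 * mtcars$wt
  sum((mtcars$mpg - predicted)^2)
}

# Least-squares estimates, obtained here from R's lm() function
ls_coef <- coef(lm(mpg ~ wt, data = mtcars))
b0 <- ls_coef[["(Intercept)"]]
b1 <- ls_coef[["wt"]]

sse_ls      <- sse(b0, b1)          # at the least-squares estimates
sse_shifted <- sse(b0 + 1, b1)      # intercept nudged up by 1
sse_tilted  <- sse(b0, b1 + 0.5)    # slope nudged up by 0.5

c(least_squares = sse_ls, shifted = sse_shifted, tilted = sse_tilted)
```

Any line other than the least-squares line gives a larger total squared error, which is what the minimization expresses.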
The formulas for the least-squares estimates are:
\[ \hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})} {\sum (X_i - \bar{X})^2} \]
\[ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \]
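These two formulas can be applied directly in R. The following is a minimal sketch for the mtcars example, with wt as \(X\) and mpg as \(Y\); the variable names `beta0_hat` and `beta1_hat` are illustrative.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Slope: sum of cross-deviations divided by the sum of squared deviations of x
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: chosen so the line passes through the point (mean(x), mean(y))
beta0_hat <- mean(y) - beta1_hat * mean(x)

c(intercept = beta0_hat, slope = beta1_hat)

# Cross-check against the estimates computed by lm()
all.equal(unname(coef(lm(mpg ~ wt, data = mtcars))), c(beta0_hat, beta1_hat))
```

The hand-computed values should agree with the lm() estimates up to floating-point rounding.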
We can use R’s built-in lm() function to estimate the intercept and slope for predicting mpg from weight (wt).
model <- lm(mpg ~ wt, data = mtcars)
summary(model)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
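Individual pieces of a fitted model can also be pulled out and used for prediction. The sketch below refits the same model so that it runs on its own; the car with wt = 3 is a hypothetical example, and the names `new_car`, `r_squared`, and `predicted_mpg` are illustrative.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept and slope
coef(model)

# Proportion of the variance in mpg explained by wt
r_squared <- summary(model)$r.squared

# Predicted mpg for a hypothetical car with wt = 3 (wt is recorded in 1000 lbs)
new_car <- data.frame(wt = 3)
predicted_mpg <- predict(model, newdata = new_car)
predicted_mpg    # about 21.25, i.e. 37.2851 - 5.3445 * 3
```

Since wt is measured in units of 1000 lbs, the slope of about -5.34 means that each additional 1000 lbs of weight is associated with roughly 5.34 fewer miles per gallon, and the R-squared of about 0.75 indicates that weight accounts for about three quarters of the variation in mpg in this dataset.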
Here is an interactive scatterplot showing the relationship between car weight and miles per gallon.
Simple linear regression provides a powerful yet easy-to-understand method for modeling the relationship between two quantitative variables. By visualizing the data, specifying the model, and estimating its parameters with lm(), we can understand how changes in one variable are associated with changes in another.
Linear regression is widely used in science, engineering, business, economics, and many other fields—making it one of the most essential tools in statistics.