2025-10-19

Introduction

In this presentation, we explore simple linear regression, one of the most fundamental tools in statistics.
It models the relationship between two continuous variables: a predictor \(x\) and a response \(y\).

What is Linear Regression?

Linear regression models the relationship between variables by fitting the line that minimizes the sum of squared vertical distances (residuals) between the observed data points and the line.

\[ y = \beta_0 + \beta_1 x + \epsilon \]

Where:

  • \(y\): dependent variable
  • \(x\): independent variable
  • \(\beta_0\): intercept
  • \(\beta_1\): slope
  • \(\epsilon\): random error term
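A short simulation shows what each term of the model does. This is a minimal sketch, assuming hypothetical true values \(\beta_0 = 2\) and \(\beta_1 = 0.5\) and normally distributed errors with standard deviation 1 (none of these come from the slides). It generates data from \(y = \beta_0 + \beta_1 x + \epsilon\) and checks that `lm()` recovers estimates close to the true coefficients.

```r
set.seed(42)                           # reproducible random draws
n     <- 100
beta0 <- 2                             # hypothetical true intercept
beta1 <- 0.5                           # hypothetical true slope
x     <- runif(n, min = 0, max = 10)   # predictor
eps   <- rnorm(n, mean = 0, sd = 1)    # random error term
y     <- beta0 + beta1 * x + eps       # response generated by the model

fit <- lm(y ~ x)
coef(fit)  # estimates should land near 2 and 0.5
```

Rerunning with a different seed changes the estimates slightly. That variation is exactly what \(\epsilon\) introduces.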

Example Dataset

We’ll use the built-in mtcars dataset from R.

library(ggplot2)
data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Scatter Plot with Regression Line (ggplot2)

[Figure: scatter plot with a regression line fitted by `geom_smooth()` (formula `y ~ x`)]
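The code for this slide is not shown. A plot of this kind could be produced roughly as follows. This is a sketch under assumptions: it assumes the axes are `hp` (predictor) and `mpg` (response), matching the model used later for the residual plot, and that the line is a least-squares fit via `method = "lm"`. Colors and labels are illustrative choices.

```r
library(ggplot2)

# Assumed axes: horsepower as predictor, miles per gallon as response
p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = "steelblue") +
  # method = "lm" draws the least-squares line; passing formula explicitly
  # avoids the "using formula" message; se = TRUE adds a confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, color = "darkred") +
  labs(title = "MPG vs. Horsepower", x = "Horsepower", y = "Miles per gallon") +
  theme_minimal(base_size = 14)
p
```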

Theoretical Model

The least-squares estimates \(b_0\) and \(b_1\) of \(\beta_0\) and \(\beta_1\) are calculated as:

\[ b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})} {\sum_{i=1}^{n} (x_i - \bar{x})^2} \]

\[ b_0 = \bar{y} - b_1 \bar{x} \]

These estimates define the regression line:

\[ \hat{y}_i = b_0 + b_1 x_i \]
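The formulas can be checked directly in R. A minimal sketch, assuming the same `mpg ~ hp` model used for the residual plot, computes \(b_1\) and \(b_0\) by hand and compares them with `coef()` from `lm()`. The two should agree up to floating-point rounding.

```r
x <- mtcars$hp
y <- mtcars$mpg

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Intercept: forces the fitted line through the point (x-bar, y-bar)
b0 <- mean(y) - b1 * mean(x)

# Fitted values y-hat_i = b0 + b1 * x_i
y_hat <- b0 + b1 * x

c(b0 = b0, b1 = b1)
coef(lm(mpg ~ hp, data = mtcars))  # should match b0 and b1
```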

Residual Plot (ggplot2) Code

A residual plot helps assess model fit: if the linearity and constant-variance assumptions hold, the points should scatter randomly around zero, with no curve or funnel shape.

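library(ggplot2)

# Fit mpg as a linear function of horsepower
model <- lm(mpg ~ hp, data = mtcars)
# Store residuals (observed minus fitted mpg) alongside the data
mtcars$residuals <- resid(model)

# Residuals against the predictor, with a dashed reference line at zero
ggplot(mtcars, aes(x = hp, y = residuals)) +
  geom_point(color = "darkgreen") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray40") +
  labs(title = "Residual Plot", x = "Horsepower", y = "Residuals") +
  theme_minimal(base_size = 14)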
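Two properties of least-squares residuals hold by construction when the model has an intercept: they sum to zero and they are uncorrelated with the predictor. A minimal sketch checking both for the same model follows. Because these properties are guaranteed, the residual plot is informative only through patterns such as curvature or a spread that changes with horsepower, not through the average level of the points.

```r
# Refit the same model used for the residual plot
model <- lm(mpg ~ hp, data = mtcars)
r <- resid(model)

# Residuals sum to zero (up to floating-point error)
sum(r)
# Residuals are uncorrelated with the predictor
cor(r, mtcars$hp)
```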

Residual Plot (ggplot2) Plot


3D Visualization (plotly)

We can extend the visualization to three dimensions with plotly, for example to explore the relationships among mpg, hp, and wt (weight).
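A minimal sketch of such a plot uses plotly's `plot_ly()` with `type = "scatter3d"`. The axis assignment (hp and wt on the horizontal axes, mpg vertical) and the marker size are illustrative choices, not taken from the original slides. `wt` in mtcars is recorded in units of 1000 lbs.

```r
library(plotly)

# Interactive 3D scatter of horsepower and weight against mpg
p3d <- plot_ly(
  mtcars,
  x = ~hp, y = ~wt, z = ~mpg,
  type = "scatter3d", mode = "markers",
  marker = list(size = 4)
) %>%
  layout(scene = list(
    xaxis = list(title = "Horsepower"),
    yaxis = list(title = "Weight (1000 lbs)"),
    zaxis = list(title = "MPG")
  ))
p3d
```

The resulting widget can be rotated in the browser, which makes it easier to see that mpg falls as both horsepower and weight increase.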