2025-03-23

Introduction

Simple Linear Regression models the relationship between two variables by fitting a straight line.

The model is the equation of a line plus an error term:

\[ y = \beta_0 + \beta_1 x + \epsilon \]

where:
- \(x\) is the independent variable
- \(y\) is the dependent variable
- \(\beta_0\) is the intercept
- \(\beta_1\) is the slope
- \(\epsilon\) is the error term
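
To make the roles of these symbols concrete, here is a minimal sketch that simulates data from the model and then recovers the coefficients with `lm()`. The true values (intercept 2, slope 0.5), the noise level, and the sample size are arbitrary choices for illustration and do not come from the slides.

```r
set.seed(42)                          # make the simulation reproducible

n     <- 200                          # hypothetical sample size
beta0 <- 2                            # hypothetical true intercept
beta1 <- 0.5                          # hypothetical true slope

x   <- runif(n, min = 0, max = 10)    # independent variable
eps <- rnorm(n, mean = 0, sd = 1)     # error term
y   <- beta0 + beta1 * x + eps        # dependent variable

sim_fit <- lm(y ~ x)
coef(sim_fit)                         # estimates should land near 2 and 0.5
```

The estimates will not equal the true values exactly, because the error term adds noise to every observation.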

Use Case Example

We will use the built-in mtcars dataset in R to predict miles per gallon (mpg) based on horsepower (hp).
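
Before plotting, it can help to look at the two columns involved. The sketch below assumes nothing beyond base R; the name `cars_subset` is introduced here for illustration only.

```r
# mtcars ships with base R (datasets package), so no download is needed
cars_subset <- mtcars[, c("mpg", "hp")]

str(cars_subset)        # structure: two numeric columns, one row per car
summary(cars_subset)    # ranges and quartiles of mpg and hp
```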

Loading Packages

library(ggplot2)
library(plotly)

Code: Scatterplot of MPG vs HP

ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = "steelblue", size = 3) +
  ggtitle("Scatterplot of MPG vs Horsepower") +
  xlab("Horsepower") +
  ylab("Miles per Gallon")

Plot: Scatterplot of MPG vs HP

Fit Linear Model

## 
## Call:
## lm(formula = mpg ~ hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7121 -2.1122 -0.8854  1.5819  8.2360 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
## hp          -0.06823    0.01012  -6.742 1.79e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892 
## F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07
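
The slide shows only the printed summary. It can be reproduced with a call along these lines; the object name `model` is an assumption, since the original code chunk is not shown.

```r
# Fit mpg as a linear function of hp; the formula matches the Call: line in the output
model <- lm(mpg ~ hp, data = mtcars)

# Print coefficients, residual quantiles, R-squared, and the F-test
summary(model)
```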

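Replacing the unknown coefficients with their estimates gives the fitted line:

\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]

From the output above, \(\hat{\beta}_0 \approx 30.10\) and \(\hat{\beta}_1 \approx -0.068\), so each additional unit of horsepower is associated with roughly 0.068 fewer miles per gallon.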
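
As a rough sketch of how the fitted line is used for prediction, the block below refits the model and evaluates it at two horsepower values. The names `model`, `new_cars`, and `by_hand` and the values 100 and 200 are hypothetical choices for illustration.

```r
model <- lm(mpg ~ hp, data = mtcars)

# Hypothetical new observations: cars with 100 and 200 horsepower
new_cars <- data.frame(hp = c(100, 200))

# Point predictions from the fitted line
predicted_mpg <- predict(model, newdata = new_cars)
predicted_mpg

# Same numbers by hand from the estimated coefficients
b <- coef(model)
by_hand <- b[["(Intercept)"]] + b[["hp"]] * new_cars$hp
by_hand
```

With the coefficients reported in the summary, a 100 hp car comes out at about 30.10 - 0.068 × 100 ≈ 23.3 mpg.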

Code: Regression Line Plot

ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = "darkgreen") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  ggtitle("Regression Line on MPG vs HP") +
  theme_minimal()

Plot: Regression Line

## `geom_smooth()` using formula = 'y ~ x'

Code: 3D Interactive Plot

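# Keep the car names in a column so they can be shown on hover
mtcars$car <- rownames(mtcars)

# 3D scatter of horsepower, mpg, and weight; markers are colored by weight
fig <- plot_ly(mtcars, x = ~hp, y = ~mpg, z = ~wt, type = 'scatter3d', mode = 'markers',
               marker = list(size = 5, color = ~wt, colorscale = "Viridis"),
               text = ~paste("Car:", car))

# Title and axis labels (wt in mtcars is recorded in units of 1000 lbs)
fig <- fig %>% layout(title = "3D Plot: MPG vs HP vs Weight",
                      scene = list(xaxis = list(title = "HP"),
                                   yaxis = list(title = "MPG"),
                                   zaxis = list(title = "Weight (1000 lbs)")))

fig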

Plot: 3D Interactive Plot

Math: Deriving Coefficients

We minimize the sum of squared residuals:

\[ SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - (\beta_0 + \beta_1 x_i))^2 \]

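Taking the partial derivatives of SSR with respect to \(\beta_0\) and \(\beta_1\) and setting them to 0 yields the normal equations, whose solution gives the least-squares estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\).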
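
For the single-predictor case, solving the normal equations gives the standard closed-form estimates, where \(\bar{x}\) and \(\bar{y}\) denote the sample means:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

A minimal sketch that computes these by hand on the mtcars example and checks them against `lm()`; the variable names are chosen here for illustration.

```r
x <- mtcars$hp
y <- mtcars$mpg

# Closed-form least-squares estimates from the normal equations
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Compare with lm(); the two should agree up to floating-point error
fit <- lm(mpg ~ hp, data = mtcars)
all.equal(c(beta0_hat, beta1_hat), unname(coef(fit)))
```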

Summary and Interpretation

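  • Simple linear regression models a linear relationship between one predictor and one response, and lets us predict the response from the predictor.
  • Coefficient \(\beta_1\) is the expected change in \(y\) per unit change in \(x\); here, about 0.068 fewer mpg per additional unit of horsepower.
  • \(R^2\) is the proportion of the variance in \(y\) explained by the model; here, about 0.60.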
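
To show where \(R^2\) comes from, the sketch below computes it as one minus the ratio of the residual sum of squares to the total sum of squares, and compares the result with the value reported by `summary()`. The names `ssr`, `sst`, and `r_squared` are introduced here for illustration.

```r
model <- lm(mpg ~ hp, data = mtcars)

ssr <- sum(residuals(model)^2)                    # sum of squared residuals
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)     # total sum of squares
r_squared <- 1 - ssr / sst

r_squared
summary(model)$r.squared   # should match the manual value
```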

Thank You!

Use linear regression to analyze trends, forecast, or study relationships in datasets. Try adding confidence intervals or testing multiple predictors for deeper insight!
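
As a starting point for both suggestions, here is a small sketch using base R only. Adding weight (`wt`) as the second predictor is an arbitrary choice for illustration, and the names `ci` and `model2` are hypothetical.

```r
model <- lm(mpg ~ hp, data = mtcars)

# 95% confidence intervals for the intercept and the slope
ci <- confint(model, level = 0.95)
ci

# A second predictor: weight (wt) alongside horsepower
model2 <- lm(mpg ~ hp + wt, data = mtcars)
coef(model2)
summary(model2)$adj.r.squared   # compare with the single-predictor fit
```

In the regression line plot, setting `se = TRUE` in `geom_smooth()` draws a confidence band around the fitted line.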