What is Simple Linear Regression?

Simple linear regression is a statistical tool for describing how two variables are related. Specifically, it models the relationship between:

  • A response variable \(Y\) (what we want to predict)
  • A predictor variable \(X\) (what we use to predict)

The goal is to find the best-fitting straight line through the data. Linear regression is a supervised learning method: it needs labeled training data (observed pairs of \(X\) and \(Y\)), and it learns by minimizing the error between its predictions and the observed values.

The Model

The simple linear regression model is:

\[Y = \beta_0 + \beta_1 X + \epsilon\]

Where:

  • \(\beta_0\) = intercept (value of \(Y\) when \(X = 0\))
  • \(\beta_1\) = slope (change in \(Y\) for a one-unit increase in \(X\))
  • \(\epsilon \sim N(0, \sigma^2)\) = random error term
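To make the model concrete, we can simulate data from it with known parameters and check that `lm()` recovers them. This is a minimal sketch: the true values \(\beta_0 = 2\), \(\beta_1 = 0.5\), \(\sigma = 1\), the sample size, and the random seed are arbitrary choices for illustration, not part of the mtcars example.

```r
# Simulate data from Y = beta0 + beta1 * X + epsilon, epsilon ~ N(0, sigma^2)
set.seed(42)                      # fixed seed so the run is reproducible
n     <- 200                      # sample size (chosen for illustration)
beta0 <- 2                        # true intercept (chosen for illustration)
beta1 <- 0.5                      # true slope (chosen for illustration)
sigma <- 1                        # true error standard deviation (illustrative)

x   <- runif(n, min = 0, max = 10)       # predictor values
eps <- rnorm(n, mean = 0, sd = sigma)    # normally distributed errors
y   <- beta0 + beta1 * x + eps           # response generated by the model

# Fit the model; the estimates should land close to the true values
fit <- lm(y ~ x)
coef(fit)
```

Because the errors are random, the estimates will not equal 2 and 0.5 exactly, but they should get closer as `n` grows.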

Finding the Best Fit

In simple linear regression we need the straight line that fits the data best. To find it, we estimate \(\beta_0\) and \(\beta_1\) using Ordinary Least Squares (OLS), which minimizes the sum of squared residuals:

\[\min \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \min \sum_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2\]
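The minimization can also be carried out numerically, which makes the idea of "minimizing squared residuals" concrete. The sketch below uses base R's `optim()` on the mtcars data; the helper name `sse()`, the starting point (0, 0), and the tolerance setting are illustrative choices. The result should agree with the closed-form OLS estimates up to numerical tolerance.

```r
# Hypothetical helper: sum of squared residuals for a candidate (beta0, beta1)
sse <- function(beta, x, y) {
  y_hat <- beta[1] + beta[2] * x   # predictions from the candidate line
  sum((y - y_hat)^2)               # sum of squared residuals
}

x <- mtcars$wt    # predictor: weight (1000 lbs)
y <- mtcars$mpg   # response: miles per gallon

# Search numerically for the coefficients that minimize the SSE,
# starting from an arbitrary guess of (0, 0)
opt <- optim(par = c(0, 0), fn = sse, x = x, y = y,
             method = "BFGS", control = list(reltol = 1e-12))

opt$par    # numerical estimates of (beta0, beta1)
opt$value  # the minimized sum of squared residuals
```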

The closed-form OLS estimates of the slope and intercept are:

\[\hat{\beta}_1 = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sum(x_i - \bar{x})^2}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}\]
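The two formulas translate directly into R. This sketch computes them by hand for mtcars, with weight as \(X\) and MPG as \(Y\); the variable names are illustrative, and the estimates should match the `lm()` coefficients reported in the model summary.

```r
x <- mtcars$wt    # predictor: weight (1000 lbs)
y <- mtcars$mpg   # response: miles per gallon

# Slope: sum of cross-deviations divided by sum of squared x-deviations
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the fitted line passes through the point (mean(x), mean(y))
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Roughly 37.285 and -5.344, the same values lm() reports
c(intercept = beta0_hat, slope = beta1_hat)
```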

Example: Car Weight vs MPG

We’ll use the built-in mtcars dataset to predict MPG from weight.

library(ggplot2)

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue", size = 3, alpha = 0.7) +
  # Add the OLS line with its confidence band; stating the formula avoids a console message
  geom_smooth(method = "lm", formula = y ~ x, color = "red", se = TRUE) +
  labs(title = "Car Weight vs Fuel Efficiency",
       x = "Weight (1000 lbs)", y = "Miles Per Gallon") +
  theme_minimal()

Fitting the Model in R

# Fit the model
model <- lm(mpg ~ wt, data = mtcars)

# View results
summary(model)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
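Beyond printing the summary, the individual numbers can be extracted for further use, and `predict()` applies the fitted line to new data. In this sketch the weights 2.5 and 3.5 (thousand lbs) are hypothetical inputs chosen for illustration, not cars from the dataset.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(model))
coef_table["wt", "Estimate"]

# R-squared as stored in the summary object
summary(model)$r.squared

# Predicted MPG (with a confidence interval for the mean) for two
# hypothetical cars weighing 2,500 and 3,500 lbs
new_cars <- data.frame(wt = c(2.5, 3.5))
predict(model, newdata = new_cars, interval = "confidence")
```

Each prediction is just \(\hat{\beta}_0 + \hat{\beta}_1 x\): for a 2,500 lb car, about \(37.285 - 5.344 \times 2.5 \approx 23.9\) MPG.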

Residuals Plot

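Residuals are the differences between the observed values and the values predicted by the model, \(e_i = y_i - \hat{y}_i\). In a plot of residuals against fitted values, the points should be randomly scattered around zero; a visible pattern (such as a curve or a funnel shape) suggests that a straight line or the constant-variance assumption does not fit the data. Checking the residuals is therefore an essential step in validating the model.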
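One minimal way to draw these checks is sketched below with base R graphics (rather than ggplot2): residuals against fitted values, plus a normal Q-Q plot for the \(\epsilon \sim N(0, \sigma^2)\) assumption. The variable names are illustrative.

```r
model <- lm(mpg ~ wt, data = mtcars)

res <- resid(model)    # residuals e_i = y_i - y_hat_i
fit <- fitted(model)   # fitted values y_hat_i

# Residuals vs fitted values: look for random scatter around zero
plot(fit, res,
     xlab = "Fitted MPG", ylab = "Residual",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 2, col = "red")   # reference line at zero

# Normal Q-Q plot: points near the line support normally distributed errors
qqnorm(res)
qqline(res, col = "red")

# Because the model has an intercept, OLS residuals sum to (numerically) zero
sum(res)
```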

3D Plot: Weight, Horsepower & MPG

library(plotly)

# Interactive 3D scatter plot; marker color also encodes MPG
plot_ly(mtcars, x = ~wt, y = ~hp, z = ~mpg,
        type = "scatter3d", mode = "markers",
        marker = list(color = ~mpg, colorscale = "Viridis", size = 5)) %>%
  layout(title = "Weight, Horsepower & MPG",
         scene = list(xaxis = list(title = "Weight"),
                      yaxis = list(title = "Horsepower"),
                      zaxis = list(title = "MPG")))

Model Interpretation

From the output of summary(model):

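  • Intercept (\(\hat{\beta}_0\)) ≈ 37.3: the predicted MPG when weight = 0. No car weighs 0 lbs, so this value anchors the line rather than describing a real car.
  • Slope (\(\hat{\beta}_1\)) ≈ −5.34: each additional 1,000 lbs of weight is associated with a decrease of about 5.3 MPG
  • R² ≈ 0.75: weight explains about 75% of the variation in MPG
  • p-value for the slope ≈ 1.3 × 10⁻¹⁰ (< 0.001): the relationship is statistically significant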
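Several of the numbers in the summary can be reproduced from their definitions, which helps explain what they mean. This sketch computes \(R^2 = 1 - SSE/SST\), the residual standard error \(\sqrt{SSE/(n-2)}\), and the slope's t statistic; the variable names are illustrative.

```r
model <- lm(mpg ~ wt, data = mtcars)
y     <- mtcars$mpg
n     <- nrow(mtcars)

sse <- sum(resid(model)^2)          # residual (unexplained) sum of squares
sst <- sum((y - mean(y))^2)         # total sum of squares around the mean

r_squared <- 1 - sse / sst          # proportion of variance explained
rse       <- sqrt(sse / (n - 2))    # residual standard error on n - 2 df

# In simple regression, R^2 equals the squared correlation of X and Y
r_squared_cor <- cor(mtcars$wt, y)^2

# t statistic for the slope: estimate divided by its standard error
est     <- coef(summary(model))["wt", ]
t_slope <- est[["Estimate"]] / est[["Std. Error"]]

# Should match the summary: about 0.7528, 0.7528, 3.046, and -9.559
c(r_squared = r_squared, r_squared_cor = r_squared_cor,
  rse = rse, t_slope = t_slope)
```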

Conclusion

  • Simple linear regression finds the best-fitting straight line through the data
  • OLS chooses the line that minimizes the sum of squared residuals
  • The slope tells us the direction and magnitude of the relationship
  • R² tells us the proportion of the variation in \(Y\) explained by the model
  • Always check residuals to validate assumptions