2025-03-04

Simple Linear Regression Introduction

Linear regression is a statistical method for modeling the relationship between an independent variable and a dependent variable by fitting a straight line to a set of data points. This slideshow explains the formula, presents example models, and discusses when this approach is and is not a good fit.

Linear Regression Formula

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

This is the formula for simple linear regression, where \(Y\) is the dependent variable (response), \(X\) is the independent variable (predictor), \(\beta_0\) is the intercept, \(\beta_1\) is the slope (coefficient), and \(\epsilon\) is the error term.

Estimating the Parameters

The method of least squares estimates the parameters \(\beta_0\) and \(\beta_1\) by minimizing the sum of squared residuals:

\[\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i))^2\]
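For a single predictor, this minimization has a well-known closed-form solution: \(\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\) and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\). The sketch below computes both estimates by hand and compares them with R's built-in `lm()`. The toy values in `x` and `y` and the helper `ssr()` are made up for illustration and are not part of the original slides.

```r
# Toy data, invented purely for illustration.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Closed-form least squares estimates for slope and intercept.
x_bar <- mean(x)
y_bar <- mean(y)
beta1_hat <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
beta0_hat <- y_bar - beta1_hat * x_bar

# Sum of squared residuals for a candidate line (hypothetical helper).
ssr <- function(b0, b1) sum((y - (b0 + b1 * x))^2)
ssr_hat <- ssr(beta0_hat, beta1_hat)

# lm() solves the same least squares problem.
fit <- lm(y ~ x)

print(c(intercept = beta0_hat, slope = beta1_hat))
print(coef(fit))
```

Moving either estimate away from its least squares value, for example `ssr(beta0_hat + 0.1, beta1_hat)`, should never give a smaller sum of squared residuals than `ssr_hat`.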

Linear Regression Data

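set.seed(42)  # arbitrary seed so the simulated data is reproducible

# Simulate Y = 5 + 2 * X + noise, with noise drawn from Normal(0, sd = 10)
df <- data.frame(
  X = 1:100,
  Y = 5 + 2 * (1:100) + rnorm(100, mean = 0, sd = 10)
)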

This code generates the data set that is plotted, together with its regression line, on the next slide.
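As a rough sketch of what fitting this data looks like, the block below regenerates the same kind of data frame and fits a simple linear regression with `lm()`. The seed value, the object names `fit`, `est`, `ci`, and `pred`, and the three prediction points are assumptions added here for illustration. Because the data is simulated with intercept 5 and slope 2, the estimates should land near those values, though not exactly on them.

```r
set.seed(123)  # arbitrary seed (an assumption) for reproducibility

# Same data-generating process as on the slide: Y = 5 + 2 * X + noise.
df <- data.frame(
  X = 1:100,
  Y = 5 + 2 * (1:100) + rnorm(100, mean = 0, sd = 10)
)

# Fit Y = beta_0 + beta_1 * X by least squares.
fit <- lm(Y ~ X, data = df)

# Estimated intercept and slope, with 95% confidence intervals.
est <- coef(fit)
ci <- confint(fit, level = 0.95)
print(est)
print(ci)

# Predicted responses at a few illustrative X values.
new_points <- data.frame(X = c(25, 50, 75))
pred <- predict(fit, newdata = new_points)
print(pred)
```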

Linear Regression Plot

[Figure: plot of the generated data with the fitted regression line, drawn with `geom_smooth()` using formula `y ~ x`]

Linear Regression Use-cases

It does not always make sense to use linear regression to measure the relationship between data points. Some data does not follow a linear relationship, and in those cases a straight line will not fit the data well. The most obvious example is a chart that follows a quadratic relationship.
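One way to see this numerically is to compare how much variance a straight line explains on quadratic data against a model that includes a squared term. The sketch below does that using \(R^2\). The simulated data, the seed, and the names `quad`, `linear_fit`, and `quadratic_fit` are assumptions for illustration, not the data behind the slides.

```r
set.seed(1)  # arbitrary seed (an assumption) for reproducibility

# Simulated quadratic data: Y = 3 + X^2 + noise, with X symmetric around 0.
x <- seq(-10, 10, length.out = 100)
quad <- data.frame(X = x, Y = 3 + x^2 + rnorm(100, mean = 0, sd = 5))

# Straight-line fit versus a fit that also includes a squared term.
linear_fit <- lm(Y ~ X, data = quad)
quadratic_fit <- lm(Y ~ X + I(X^2), data = quad)

# R-squared: the share of variance in Y explained by each model.
r2_linear <- summary(linear_fit)$r.squared
r2_quadratic <- summary(quadratic_fit)$r.squared
print(c(linear = r2_linear, quadratic = r2_quadratic))
```

The second model is still fitted by least squares and is still linear in its coefficients; only the predictor has been transformed. On data like this, the straight line should explain very little of the variance while the model with the squared term explains most of it.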

Quadratic Relationship with Linear Regression

[Figure: "Linear Regression on Quadratic Data"; x-axis: Independent Variable X; y-axis: Dependent Variable Y; fitted with `geom_smooth()` using formula `y ~ x`]

Multiple Linear Regression

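Multiple linear regression extends the same idea to more than one predictor, for example \(Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon\). The following is a minimal sketch of such a fit with `lm()`. The simulated data, the coefficient values (1, 2, and -3), and the names `sim` and `multi_fit` are hypothetical and are not taken from the original slide.

```r
set.seed(7)  # arbitrary seed (an assumption) for reproducibility

# Simulated data with two predictors: Y = 1 + 2 * X1 - 3 * X2 + noise.
n <- 200
sim <- data.frame(X1 = runif(n, 0, 10), X2 = runif(n, 0, 10))
sim$Y <- 1 + 2 * sim$X1 - 3 * sim$X2 + rnorm(n, mean = 0, sd = 1)

# Fit the model with both predictors by least squares.
multi_fit <- lm(Y ~ X1 + X2, data = sim)
print(coef(multi_fit))
```

Each estimated coefficient describes the expected change in `Y` for a one-unit change in that predictor while the other predictor is held fixed.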

Conclusion

Linear regression is a good model for measuring the relationship between two variables when that relationship is approximately linear. It uses an independent and a dependent variable to fit a best-fit line. Simple linear regression also serves as the foundation for more advanced regression techniques.