2024-03-21

Introduction to Simple Linear Regression

Simple linear regression is a widely used predictive analysis technique that describes the relationship between two variables by fitting a linear equation to observed data. One variable is the explanatory (independent) variable, and the other is the dependent variable.

Mathematical Model

Simple Linear Regression Equation:

\[ Y = \beta_0 + \beta_1X + \epsilon \]

  • \(Y\) is the dependent variable,
  • \(\beta_0\) is the intercept,
  • \(\beta_1\) is the slope of the line,
  • \(X\) is the independent variable,
  • \(\epsilon\) is the error term.
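
To make the roles of the terms concrete, the following minimal sketch simulates data from the model. The parameter values, sample size, and the name `sim_data` are assumptions chosen only for illustration; they do not come from the original analysis.

```r
# Hypothetical parameter values, chosen only for illustration
set.seed(42)
beta0 <- 2     # intercept
beta1 <- 0.5   # slope
n <- 100       # number of observations

# Independent variable and normally distributed error term
X <- runif(n, min = 0, max = 10)
epsilon <- rnorm(n, mean = 0, sd = 1)

# Dependent variable generated by Y = beta0 + beta1 * X + epsilon
Y <- beta0 + beta1 * X + epsilon

sim_data <- data.frame(X = X, Y = Y)
head(sim_data)
```

Each simulated \(Y\) differs from the line \(\beta_0 + \beta_1X\) only by its error term \(\epsilon\).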

Assumptions of Simple Linear Regression

  • Independence: Observations are independent.
  • Linearity: The relationship between \(X\) and \(Y\) is linear.
  • Homoscedasticity: The error terms have constant variance.
  • Normality: The error terms are normally distributed, which is usually checked on the model's residuals.
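
As a rough illustration of how the normality assumption might be checked, the sketch below fits a model to simulated data and inspects its residuals with a Q-Q plot and a Shapiro-Wilk test. The data are hypothetical, and this is one common approach rather than the only one.

```r
# Hypothetical data, simulated so that the assumptions hold by construction
set.seed(123)
X <- seq(1, 50)
Y <- 1 + 0.8 * X + rnorm(50, mean = 0, sd = 3)
model <- lm(Y ~ X)

res <- resid(model)

# Q-Q plot: points close to the reference line suggest normal residuals
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: a small p-value is evidence against normality
normality_test <- shapiro.test(res)
normality_test$p.value
```

Independence and linearity depend largely on how the data were collected and on the form of the relationship, so they are usually judged from the study design and from plots rather than from a single test.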

Estimating Coefficients

The coefficients can be estimated with the least squares method, which minimizes the sum of the squared differences between the observed and predicted values.

To find the best-fit line for our data, we look for the intercept \(\beta_0\) and the slope \(\beta_1\) that bring the line as close as possible, in the least squares sense, to all data points.
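
For simple linear regression the least squares estimates have a closed form:

\[ \hat{\beta}_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X} \]

The sketch below computes these by hand and compares them with `lm()`. The five data points and the names `x`, `y`, `b0`, `b1`, and `fit` are hypothetical and serve only as an illustration.

```r
# Hypothetical data for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Slope: sum of cross-deviations divided by sum of squared deviations of x
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: forces the line through the point (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

# Compare with the estimates returned by lm()
fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(b0, b1))
```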

R Code for Fitting a Model
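
The code that produced the output below was not preserved. The call shown in that output is `lm(formula = Y ~ X, data = dataset)`, so a minimal sketch might look like the following. The five data values are an assumption, chosen so that `Y` equals `X`, which is consistent with the near-zero intercept and unit slope reported; the original values are not known.

```r
# Hypothetical data: five observations where Y equals X exactly
dataset <- data.frame(X = 1:5, Y = 1:5)

# Fit the simple linear regression of Y on X
model <- lm(Y ~ X, data = dataset)

# Coefficient estimates, standard errors, and goodness-of-fit statistics
summary(model)
```

Because the fit is essentially perfect, `summary()` warns that its results may be unreliable: the residuals are at the level of floating-point rounding error, so the standard errors, t values, and p-values carry little meaning. Data with some noise would avoid the warning.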

## Warning in summary.lm(model): essentially perfect fit: summary may be
## unreliable
## 
## Call:
## lm(formula = Y ~ X, data = dataset)
## 
## Residuals:
##          1          2          3          4          5 
##  2.184e-16 -4.138e-16  1.514e-16  6.514e-17 -2.111e-17 
## 
## Coefficients:
##               Estimate Std. Error    t value Pr(>|t|)    
## (Intercept) -7.944e-16  3.007e-16 -2.642e+00   0.0775 .  
## X            1.000e+00  9.065e-17  1.103e+16   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.867e-16 on 3 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 1.217e+32 on 1 and 3 DF,  p-value: < 2.2e-16

Visualizing Linear Regression
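
The plot itself was not preserved. The `geom_smooth()` message indicates that ggplot2 was used, so a sketch along these lines would draw a scatter plot with a fitted regression line. The simulated data, labels, and the name `p` are assumptions for illustration.

```r
library(ggplot2)

# Hypothetical data, simulated for illustration
set.seed(1)
dataset <- data.frame(X = 1:20)
dataset$Y <- 3 + 2 * dataset$X + rnorm(20, sd = 2)

# Scatter plot with the least squares line and its confidence band
p <- ggplot(dataset, aes(x = X, y = Y)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Simple linear regression fit", x = "X", y = "Y")
print(p)
```

When no `formula` argument is given, `geom_smooth()` reports the formula it uses, which is the message shown here.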

## `geom_smooth()` using formula = 'y ~ x'

Residuals Plot
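
The original residuals plot was not preserved, and how it was drawn is not known. One common version plots residuals against fitted values using base R graphics, as in this sketch on hypothetical simulated data.

```r
# Hypothetical data, simulated for illustration
set.seed(1)
dataset <- data.frame(X = 1:20)
dataset$Y <- 3 + 2 * dataset$X + rnorm(20, sd = 2)
model <- lm(Y ~ X, data = dataset)

# Residuals against fitted values, with a reference line at zero
plot(fitted(model), resid(model),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted values")
abline(h = 0, lty = 2)
```

Points scattered evenly around the zero line, with no curve or funnel shape, are consistent with the linearity and homoscedasticity assumptions.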

Interactive Linear Regression Plot
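
The interactive plot was not preserved. The `geom_smooth()` message suggests it was built from a ggplot2 object; one way to make such a plot interactive, assuming the plotly package is installed, is `ggplotly()`. Whether the original used plotly is an assumption, as are the data and the names `p` and `interactive_plot`.

```r
library(ggplot2)
library(plotly)

# Hypothetical data, simulated for illustration
set.seed(1)
dataset <- data.frame(X = 1:20)
dataset$Y <- 3 + 2 * dataset$X + rnorm(20, sd = 2)

# Static ggplot2 scatter plot with a fitted regression line
p <- ggplot(dataset, aes(x = X, y = Y)) +
  geom_point() +
  geom_smooth(method = "lm")

# Convert to an interactive widget with hover, zoom, and pan
interactive_plot <- ggplotly(p)
interactive_plot
```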

## `geom_smooth()` using formula = 'y ~ x'