2026-02-08

What is Simple Linear Regression

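Simple linear regression is a statistical method that models the relationship between two variables: an independent variable (the predictor) and a dependent variable (the response). It fits a straight line through the data that best describes the linear relationship between them. The usual fitting criterion is least squares, which chooses the line that minimizes the sum of squared vertical distances between the observed points and the line.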

Simple Linear Regression Model

The model expresses the dependent variable as a linear function of the independent variable plus a random error:

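\[ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \]

Variables:

  • \(Y_i\) = dependent variable for observation \(i\)
  • \(X_i\) = independent variable for observation \(i\)
  • \(\beta_0\) = intercept (expected value of \(Y\) when \(X = 0\))
  • \(\beta_1\) = slope (expected change in \(Y\) for a one-unit increase in \(X\))
  • \(\epsilon_i\) = random error term for observation \(i\)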
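To make the model concrete, the sketch below generates data from it. The function name `simulate_slr`, the parameter values, and the assumption that the errors are normally distributed with mean 0 are illustrative choices only; they are not part of the data set used later.

```r
# Sketch: generate data from Y_i = beta_0 + beta_1 * X_i + epsilon_i.
# Assumptions (illustrative only): beta_0 = 2, beta_1 = 0.5, and
# normally distributed errors with mean 0 and standard deviation sigma.
simulate_slr <- function(n, beta0 = 2, beta1 = 0.5, sigma = 1) {
  x <- runif(n, min = 0, max = 10)       # independent variable X_i
  eps <- rnorm(n, mean = 0, sd = sigma)  # random error epsilon_i
  y <- beta0 + beta1 * x + eps           # dependent variable Y_i
  data.frame(x = x, y = y)
}

set.seed(1)
sim <- simulate_slr(200)
head(sim)
```

Fitting `lm(y ~ x, data = sim)` to the simulated data should recover an intercept and slope close to the values used to generate it, which is a quick way to see what \(\beta_0\) and \(\beta_1\) mean.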

Linear Regression Formulas

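Estimated regression line:

\[ \hat{Y} = b_0 + b_1 X \]

Slope estimate:

\[ b_1 = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

Intercept estimate:

\[ b_0 = \bar{y} - b_1 \bar{x} \]

Here \(b_0\) and \(b_1\) are the least squares estimates of \(\beta_0\) and \(\beta_1\), \(x_i\) and \(y_i\) are the observed values, and \(\bar{x}\) and \(\bar{y}\) are their sample means.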
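The two estimate formulas translate almost line for line into R. A minimal sketch follows; `slr_estimates` is a hypothetical helper name, and R's built-in `lm()` is used only as a cross-check.

```r
# Least squares estimates computed directly from the formulas:
#   b1 = sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)
#   b0 = ybar - b1 * xbar
slr_estimates <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  x_bar <- mean(x)
  y_bar <- mean(y)
  b1 <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
  b0 <- y_bar - b1 * x_bar
  c(b0 = b0, b1 = b1)
}

# Points lying exactly on y = 1 + 2x
slr_estimates(c(1, 2, 3, 4), c(3, 5, 7, 9))   # b0 = 1, b1 = 2

# Points that do not lie on a line; lm() should give the same values
x2 <- c(1, 2, 3, 4, 5)
y2 <- c(2, 4, 5, 4, 5)
slr_estimates(x2, y2)   # b0 = 2.2, b1 = 0.6
coef(lm(y2 ~ x2))
```

Applied to the Auto MPG data, `slr_estimates(df$V5, df$V1)` should agree with `coef(lm(V1 ~ V5, data = df))`.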

Dataset Information

The Auto MPG data set, available from the UCI Machine Learning Repository at https://archive.ics.uci.edu/dataset/9/auto+mpg, is used below to demonstrate linear regression through plots. It contains the following information for each automobile:

  • V1 = miles per gallon
  • V2 = number of cylinders
  • V3 = displacement
  • V4 = horsepower
  • V5 = weight
  • V6 = acceleration
  • V7 = model year
  • V8 = origin
  • V9 = car name
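The V1 to V9 names are the defaults R's `read.table()` assigns when a file has no header row, which suggests the data were read that way. The sketch below works under that assumption; the file name `auto-mpg.data` and the helper `load_auto_mpg` are assumptions, while the options reflect the raw file's layout (whitespace-separated fields, double-quoted car names, and `?` marking missing horsepower values).

```r
# Sketch: load the Auto MPG data with default column names V1..V9.
# Assumption: the raw UCI file has been saved locally as "auto-mpg.data";
# adjust the path to wherever your copy lives.
load_auto_mpg <- function(path = "auto-mpg.data") {
  # No header row; fields are separated by whitespace, car names are
  # quoted, and missing horsepower values are coded as "?"
  read.table(path, header = FALSE, na.strings = "?")
}

# Usage (assumes the file exists in the working directory):
# df <- load_auto_mpg()
# str(df)
```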

MPG vs Weight ggplot

MPG vs Weight code

library(ggplot2)
# Scatter plot of fuel economy (V1) against vehicle weight (V5)
ggplot(df, aes(x = V5, y = V1)) +
  geom_point() +
  labs(title = "MPG vs Vehicle Weight",
       x = "Weight", y = "MPG")

Regression Line ggplot

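A regression line can be drawn by adding `geom_smooth(method = "lm")` to the scatter plot. The code for this plot is not shown, so the sketch below is an assumption about how it could be produced; `plot_regression_line` is a hypothetical helper name.

```r
library(ggplot2)

# Sketch: scatter plot of MPG (V1) against weight (V5) with a fitted
# least squares line. Assumes `data` has the V1..V9 columns listed above.
plot_regression_line <- function(data) {
  ggplot(data, aes(x = V5, y = V1)) +
    geom_point() +
    # method = "lm" fits the straight line y ~ x by least squares; passing
    # the formula explicitly suppresses ggplot2's "using formula" message.
    # The shaded band is the default 95% confidence interval (se = TRUE).
    geom_smooth(method = "lm", formula = y ~ x) +
    labs(title = "MPG vs Vehicle Weight with Regression Line",
         x = "Weight", y = "MPG")
}

# Usage, assuming `df` was loaded as for the earlier plot:
# plot_regression_line(df)
# coef(lm(V1 ~ V5, data = df))   # intercept and slope of the plotted line
```

Using `method = "lm"` keeps the plotted line identical to the one from `lm(V1 ~ V5)`, whereas leaving `method` unset lets ggplot2 choose a flexible smoother (loess or a GAM) that is not a straight line.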