Linear Regression (LaTeX Math)

Definition: Models the relationship between a dependent variable and one or more independent variables:

  • Dependent variable (\(Y\)): scalar outcome to predict
  • Independent variable(s) (\(X\)): predictor value(s) used to explain \(Y\)

Goal: Estimate a straight line (the line of best fit) that minimizes the difference between observed and predicted values, typically measured as the sum of squared differences.
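
One common way to make "difference" precise is the least-squares criterion, which is what R's `lm()` minimizes. As a sketch, writing \(Y_i\) for the \(i\)-th observed value, \(y_i\) for the corresponding predicted value, and \(n\) for the number of observations (the index \(i\) and \(n\) are notation introduced here, not taken from the slides):

\[\text{SSE} = \sum_{i=1}^{n} (Y_i - y_i)^2\]

The line of best fit is the line whose predictions make this sum as small as possible.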

Linear Model (LaTeX Math)

Equation of the Best Fit Line: \[y = mx + b\] where

  • \(y\) is the predicted value
  • \(m\) is the slope
  • \(x\) is the independent variable
  • \(b\) is the intercept
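
For a single predictor, the least-squares slope and intercept have a closed form: \(m = \operatorname{cov}(x, y) / \operatorname{var}(x)\) and \(b = \bar{y} - m\bar{x}\), where \(y\) here stands for the observed outcomes. The following is a minimal sketch in R, assuming the built-in `mtcars` data with `cyl` as \(x\) and `hp` as \(y\); the variable names `x`, `y`, `m`, `b`, and `fit` are chosen for illustration only.

```r
# Closed-form least-squares estimates for y = m*x + b,
# using the built-in mtcars data (x = cyl, y = hp).
x <- mtcars$cyl
y <- mtcars$hp

m <- cov(x, y) / var(x)       # slope
b <- mean(y) - m * mean(x)    # intercept

# Cross-check against the estimates from lm()
fit <- lm(hp ~ cyl, data = mtcars)
print(c(intercept = b, slope = m))
print(coef(fit))
```

The two printed vectors should agree up to floating-point error, since `lm()` solves the same least-squares problem.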

Example: Cylinders vs Horsepower

This example examines the relationship between the number of cylinders and gross horsepower for automobiles in the mtcars dataset. The first 10 rows of the data:

                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4

R Code: Cylinders vs Horsepower

Load the data:

data(mtcars)

Fit linear model:

model <- lm(hp ~ cyl, data = mtcars)

Display rows:

head(mtcars, 10)
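
Once the model is fitted, the estimated intercept and slope can be read off and used for prediction. The snippet below is a sketch of one way to do this with base R functions; `new_cars` and `preds` are illustrative names, and the choice of 4, 6, and 8 cylinders simply mirrors the values that appear in the rows shown earlier.

```r
# Refit the model so this block runs on its own.
model <- lm(hp ~ cyl, data = mtcars)

# Estimated intercept (b) and slope (m)
print(coef(model))

# Proportion of the variance in hp explained by cyl
print(summary(model)$r.squared)

# Predicted horsepower for 4-, 6-, and 8-cylinder cars
new_cars <- data.frame(cyl = c(4, 6, 8))
preds <- predict(model, newdata = new_cars)
print(preds)
```

`summary(model)` prints the full coefficient table, including standard errors and p-values, if more detail is needed.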

3D Plotly Plot - Cylinders vs Horsepower

Cylinders vs Horsepower (ggplot) - Part 1

[Figure: ggplot of cylinders vs horsepower with a geom_smooth() fit (formula y ~ x)]

Miles per Gallon vs Horsepower (ggplot) - Part 2

[Figure: ggplot of miles per gallon vs horsepower with a geom_smooth() fit (formula y ~ x)]
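
The original plotting code is not included, so the following is only a rough sketch of how the two ggplot figures might be produced. It assumes the ggplot2 package is installed, that the smoother is a linear fit (`method = "lm"`), and that horsepower is on the vertical axis in both plots; the object names `p1` and `p2` and the axis labels are illustrative.

```r
library(ggplot2)

# Part 1: horsepower against number of cylinders (assumed axes)
p1 <- ggplot(mtcars, aes(x = cyl, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Number of cylinders", y = "Gross horsepower")

# Part 2: horsepower against miles per gallon (assumed axes)
p2 <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Miles per gallon", y = "Gross horsepower")

print(p1)
print(p2)
```

Passing `formula = y ~ x` explicitly keeps `geom_smooth()` from printing its informational message about the formula it chose.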

References

1. https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/mtcars