What is Simple Linear Regression?

Simple linear regression is a statistical method used to study the relationship between two quantitative variables.

In this presentation, we use the built-in mtcars dataset.

We study whether a car’s weight can help predict miles per gallon.

  • Response variable: mpg (miles per US gallon)
  • Explanatory variable: wt (weight, in 1,000 lbs)

The Regression Equation

The simple linear regression model is:

\[ Y = \beta_0 + \beta_1X + \epsilon \]

Where:

  • \(Y\) is the response variable
  • \(X\) is the explanatory variable
  • \(\beta_0\) is the intercept
  • \(\beta_1\) is the slope
  • \(\epsilon\) is the error term
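
The model alone does not say how \(\beta_0\) and \(\beta_1\) are estimated. R's lm() uses ordinary least squares, which picks the estimates \(b_0\) and \(b_1\) that minimize the sum of squared errors \(\sum_{i=1}^{n}(y_i - b_0 - b_1x_i)^2\). With one explanatory variable, the minimizers have a standard closed form:

\[ b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1\bar{x} \]

The second formula implies that the fitted line always passes through the point \((\bar{x}, \bar{y})\).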

Dataset Preview

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

The mtcars dataset, taken from the 1974 Motor Trend US magazine, contains 32 cars and 11 variables, including miles per gallon, weight, horsepower, and other design and performance features.
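
A quick way to confirm the size of the data and look at only the two variables used in this analysis, using base R functions:

```r
# mtcars ships with R's built-in datasets package, so no install or library() call is needed
dim(mtcars)                        # [1] 32 11  (cars, variables)

# Summaries of just the response (mpg) and the explanatory variable (wt)
summary(mtcars[, c("mpg", "wt")])
```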

GGPlot 1: Scatterplot of Weight and MPG

The points trend downward: heavier cars generally get fewer miles per gallon.
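
The downward trend can be summarized with a single number, the correlation coefficient. A short sketch in base R; note that in simple linear regression the squared correlation equals the model's Multiple R-squared:

```r
# Pearson correlation between weight (wt) and fuel efficiency (mpg)
r <- cor(mtcars$wt, mtcars$mpg)
r     # about -0.87: a strong negative linear association

# Squaring r gives the share of variation in mpg explained by wt
r^2   # about 0.7528, matching Multiple R-squared from summary(model)
```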

Fitting the Regression Model

model <- lm(mpg ~ wt, data = mtcars)
summary(model)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

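The wt estimate (-5.3445) is the change in predicted mpg for each one-unit (1,000 lb) increase in weight. Its very small p-value (1.29e-10) is strong evidence that the slope is not zero, and the R-squared of 0.7528 means weight explains about 75% of the variation in mpg.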
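
The numbers quoted above can also be pulled out of the summary object directly, which is handy when reporting them. A sketch using the coefficients, r.squared, and sigma components that summary() returns for an lm fit:

```r
model <- lm(mpg ~ wt, data = mtcars)
s <- summary(model)

# Slope row of the coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
s$coefficients["wt", ]

# The t value is the estimate divided by its standard error
s$coefficients["wt", "Estimate"] / s$coefficients["wt", "Std. Error"]

s$r.squared   # Multiple R-squared, about 0.7528
s$sigma       # Residual standard error, about 3.046
```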

Interpreting the Slope

The fitted regression equation is:

\[ \widehat{mpg} = b_0 + b_1(wt) \approx 37.285 - 5.345(wt) \]

coef(model)
## (Intercept)          wt 
##   37.285126   -5.344472

The estimated slope is -5.34: for each additional 1,000 lbs of vehicle weight, predicted fuel efficiency drops by about 5.34 miles per gallon.
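
One way to see what the slope means is to predict mpg for two hypothetical cars whose weights differ by exactly one unit (1,000 lbs); the weights 3 and 4 below are arbitrary choices for illustration:

```r
model <- lm(mpg ~ wt, data = mtcars)

# Two hypothetical cars, 3,000 lbs and 4,000 lbs
two_cars <- data.frame(wt = c(3, 4))
preds <- predict(model, newdata = two_cars)

# The difference between their predictions is the slope, about -5.34
diff(preds)
```

Because the model is a straight line, any one-unit step in wt gives the same difference, whichever starting weight is chosen.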

GGPlot 2: Regression Line

The regression line shows the estimated relationship between car weight and fuel efficiency.
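
The plotting code for this slide is not shown. A minimal sketch of one way to draw such a plot, assuming the ggplot2 package is installed: geom_smooth(method = "lm") fits the same straight-line model and overlays it on the scatterplot. The axis labels are our own choice.

```r
library(ggplot2)

# Scatterplot of weight vs mpg with the least squares line overlaid;
# se = FALSE hides the shaded confidence band around the line
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Weight (1,000 lbs)", y = "Miles per gallon")

print(p)
```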

Plotly Interactive Plot

This interactive plot lets the viewer hover over each point to see its values.

Slide With R Code Example

This is the R code used to create the regression model:

model <- lm(mpg ~ wt, data = mtcars)
summary(model)
coef(model)

This code fits a linear regression model using mpg as the response variable and wt as the explanatory variable.
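
As a sanity check, the closed-form least squares formulas reproduce the coefficients that lm() reports. This is only a sketch for understanding; lm() itself solves the problem with a QR decomposition rather than these formulas.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Least squares slope: covariation of x and y divided by variation of x
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Intercept chosen so the line passes through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)
c(b0 = b0, b1 = b1)   # about 37.285 and -5.344

# Compare with lm()
model <- lm(mpg ~ wt, data = mtcars)
all.equal(unname(coef(model)), c(b0, b1))   # TRUE
```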

Residuals

A residual is the difference between an observed value of the response and the value the model predicts for it:

\[ e_i = y_i - \hat{y}_i \]

A residual tells us how far each observed point lies vertically from the regression line: positive residuals are above the line and negative residuals are below it.
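
The formula \(e_i = y_i - \hat{y}_i\) can be applied directly in R and compared with the built-in residuals() function:

```r
model <- lm(mpg ~ wt, data = mtcars)

# Observed mpg minus fitted mpg, one residual per car
e <- mtcars$mpg - fitted(model)
all.equal(e, residuals(model))   # TRUE: identical to R's residuals

head(round(e, 3))   # Mazda RX4 is about -2.283 (below the line)
range(e)            # matches Min and Max in the summary output
sum(e)              # least squares residuals sum to zero, up to rounding error
```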

GGPlot 3: Residual Plot

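A residual plot helps us check whether the linear model is reasonable: if it is, the residuals scatter randomly around zero, with no curved pattern and roughly constant spread.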
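
The ggplot2 code behind this slide is not shown. As a swapped-in alternative, a base-R version of the same residuals-versus-fitted check looks like this:

```r
model <- lm(mpg ~ wt, data = mtcars)

# Residuals on the vertical axis, fitted mpg on the horizontal axis
plot(fitted(model), residuals(model),
     xlab = "Fitted mpg", ylab = "Residual",
     main = "Residuals vs Fitted")

# Dashed reference line at zero; points should scatter evenly around it
abline(h = 0, lty = 2)
```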

Prediction Example

Suppose a car weighs 3,000 lbs (wt = 3.0).

new_car <- data.frame(wt = 3.0)
predict(model, newdata = new_car)
##        1 
## 21.25171

For a car weighing 3,000 lbs, the model predicts approximately 21.25 miles per gallon.
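
The prediction is just the fitted equation evaluated at wt = 3.0: \(37.285126 - 5.344472 \times 3.0 = 21.25171\). The same arithmetic in R, checked against predict():

```r
model <- lm(mpg ~ wt, data = mtcars)
b <- coef(model)

# Plug wt = 3.0 into mpg-hat = b0 + b1 * wt
manual <- b[["(Intercept)"]] + b[["wt"]] * 3.0
manual   # about 21.25

# predict() gives the same value
unname(predict(model, newdata = data.frame(wt = 3.0)))
```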

Conclusion

Simple linear regression helps us understand and predict relationships between two variables.

For this dataset:

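  • Car weight and miles per gallon have a strong negative relationship (R-squared = 0.75).
  • Heavier cars tend to have lower fuel efficiency: about 5.34 fewer mpg for each additional 1,000 lbs.
  • The fitted model can be used to make predictions, such as about 21.25 mpg for a 3,000 lb car.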