Introduction & Model

Simple linear regression models the relationship between a predictor variable \(X\) and a dependent variable \(Y\).

We are exploring the relationship between:
- wt: weight of the car (predictor variable)
- mpg: miles per gallon (dependent variable)

The model:

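\[ Y = \beta_0 + \beta_1 X + \varepsilon \]

Here \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\varepsilon\) is the random error term.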

Assumptions

  1. Linearity: The relationship between X and Y is linear.
  2. Independence: Observations are independent of each other.
  3. Homoscedasticity: The variance of the residuals is constant across all fitted values.
  4. Normality: Residuals are normally distributed.
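Two of these assumptions can be checked numerically once a model is fitted. The following is a minimal sketch, not a full diagnostic workflow: it assumes the `mpg ~ wt` model studied in this deck, and the median split used for the spread comparison is an illustrative choice rather than a standard test.

```r
# Fit the simple model this deck studies (mpg explained by weight).
model <- lm(mpg ~ wt, data = mtcars)
res <- residuals(model)

# Normality: Shapiro-Wilk test on the residuals.
# A small p-value would be evidence against normally distributed residuals.
sw <- shapiro.test(res)
print(sw)

# Homoscedasticity (rough check): compare the residual spread in the
# lower and upper halves of the fitted values; similar values are reassuring.
fitted_vals <- fitted(model)
lower_half <- fitted_vals <= median(fitted_vals)
spread <- c(low_fitted = sd(res[lower_half]),
            high_fitted = sd(res[!lower_half]))
print(spread)
```

For graphical versions of the same checks, `plot(model, which = 1:2)` draws the residuals-vs-fitted and normal Q-Q plots from base R.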

Dataset: mtcars

The mtcars dataset was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).

dim(mtcars)
## [1] 32 11

We’ll use:
- mpg: miles per gallon
- wt: weight (1000 lbs)
- hp: horsepower

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

MPG vs Weight

library(ggplot2)

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE, color = "blue") +
  labs(title = "MPG vs Weight", x = "Weight (1000 lbs)", y = "Miles per Gallon")
## `geom_smooth()` using formula = 'y ~ x'

Linear Model Summary

model <- lm(mpg ~ wt, data = mtcars)
summary(model)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
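The summary reports point estimates and standard errors. As a small sketch of how one might read further quantities off the same fit, the code below computes 95% confidence intervals for the coefficients and checks that, in simple linear regression, R-squared equals the squared Pearson correlation between `wt` and `mpg`. The variable names `ci`, `r`, and `r_squared` are illustrative.

```r
model <- lm(mpg ~ wt, data = mtcars)

# 95% confidence intervals for the intercept and slope.
ci <- confint(model, level = 0.95)
print(ci)

# In simple linear regression, R-squared equals the squared
# Pearson correlation between predictor and response.
r <- cor(mtcars$wt, mtcars$mpg)
r_squared <- summary(model)$r.squared
cat("Correlation r:", round(r, 4), "\n")
cat("r^2:", round(r^2, 4),
    " R-squared from summary:", round(r_squared, 4), "\n")
```

If the interval for `wt` lies entirely below zero, that agrees with the very small p-value reported for the slope.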

MPG vs Weight + HP

library(plotly)

plot_ly(mtcars, x = ~wt, y = ~hp, z = ~mpg,
        type = "scatter3d", mode = "markers",
        marker = list(size = 5)) %>%
  layout(title = "MPG vs Weight + HP")
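The 3D scatter plot shows mpg against two predictors at once. One possible numeric companion, offered here as a sketch and not as part of the original analysis, is a regression with both `wt` and `hp` as predictors. The names `model_1` and `model_2` are hypothetical and used only for this comparison.

```r
# Two-predictor model: mpg explained by weight and horsepower.
model_2 <- lm(mpg ~ wt + hp, data = mtcars)
print(coef(model_2))

# Compare the fit with the weight-only model.
model_1 <- lm(mpg ~ wt, data = mtcars)
cat("R-squared, wt only:", round(summary(model_1)$r.squared, 4), "\n")
cat("R-squared, wt + hp:", round(summary(model_2)$r.squared, 4), "\n")
```

R-squared never decreases when a predictor is added, so adjusted R-squared (`summary(model_2)$adj.r.squared`) is the fairer basis for comparing the two models.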

Regression Coefficients w/ Math and Code

\[
\begin{aligned}
\beta_1 &= \frac{\sum (x_i - \overline{x})(y_i - \overline{y})}{\sum (x_i - \overline{x})^2} \\
\beta_0 &= \overline{y} - \beta_1 \overline{x}
\end{aligned}
\]

x <- mtcars$wt
y <- mtcars$mpg

x_bar <- mean(x)
y_bar <- mean(y)

beta_1 <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
beta_0 <- y_bar - beta_1 * x_bar

cat("Estimated slope (β1):", round(beta_1, 4), "\n")
## Estimated slope (β1): -5.3445
cat("Estimated intercept (β0):", round(beta_0, 4), "\n")
## Estimated intercept (β0): 37.2851
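The hand-computed values match the `lm()` output in the model summary. As a cross-check, the sketch below computes the slope a second way, as sample covariance divided by sample variance, and compares both coefficients with `coef()`. The names `beta_1_alt`, `beta_0_alt`, and `manual` are illustrative.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Slope as sample covariance over sample variance; the (n - 1)
# denominators cancel, so this matches the sum-based formula.
beta_1_alt <- cov(x, y) / var(x)
beta_0_alt <- mean(y) - beta_1_alt * mean(x)

# Compare with the coefficients reported by lm().
model <- lm(mpg ~ wt, data = mtcars)
manual <- c(beta_0_alt, beta_1_alt)
print(all.equal(manual, unname(coef(model))))
```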

Residual Plot

ggplot(data = model, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs Fitted", x = "Fitted Values", y = "Residuals")

Interpretation

  • Intercept (β₀ = 37.2851): Estimated mpg when weight = 0.
    This isn’t realistic (no car weighs 0), but it’s needed to anchor the regression line.
  • Slope (β₁ = -5.3445): For every additional 1000 lbs of weight, the car’s fuel efficiency is expected to decrease by about 5.34 mpg.
    Combined with the R-squared of 0.7528 from the model summary, this indicates a strong negative relationship between weight and fuel efficiency.
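The slope interpretation can be illustrated with `predict()`. In this sketch the two car weights are hypothetical values chosen for illustration, and they sit inside the range of weights observed in mtcars.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Hypothetical cars weighing 3000 lbs and 4000 lbs (wt is in 1000 lbs).
new_cars <- data.frame(wt = c(3, 4))
pred <- predict(model, newdata = new_cars)
print(round(pred, 2))

# The difference between the two predictions equals the slope.
cat("Change in predicted mpg per extra 1000 lbs:",
    round(pred[2] - pred[1], 4), "\n")
```

Predictions for weights far outside the observed range, such as the intercept at weight 0, are extrapolations and should be read with caution.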

Residual Interpretations

  • Residual = Actual mpg − Predicted mpg
  • A positive residual means the model underestimated the mpg.
  • A negative residual means the model overestimated the mpg.
  • In a good model, residuals should be randomly scattered around 0 (as we saw in the residual plot).
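The residual definition above can be reproduced directly. The sketch below recomputes the residuals by hand, confirms they match `residuals()`, and prints which cars the model underestimates and overestimates the most. The name `manual_resid` is illustrative.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Residual = actual mpg - predicted mpg.
manual_resid <- mtcars$mpg - fitted(model)
print(all.equal(manual_resid, residuals(model)))

# With an intercept in the model, least-squares residuals sum to
# (numerically) zero.
cat("Sum of residuals:", signif(sum(manual_resid), 3), "\n")

# Cars with the largest positive and negative residuals.
cat("Most underestimated:", names(which.max(manual_resid)), "\n")
cat("Most overestimated:", names(which.min(manual_resid)), "\n")
```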

Summary

  • mpg and wt have a strong negative (inverse) relationship: heavier cars tend to get fewer miles per gallon.
  • The residual plot suggests the model assumptions are reasonably well satisfied.
  • The model provides realistic insight into how fuel efficiency changes with vehicle weight.

Thank You!