2025-02-09

Linear Regression for mtcars Data Set

Simple linear regression is a method used to model the relationship between a dependent variable \(Y\) and an independent variable \(X\).

The equation for simple linear regression is:

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

Where:

  • \(Y\) is the dependent variable.
  • \(\beta_0\) is the intercept.
  • \(\beta_1\) is the slope.
  • \(X\) is the independent variable.
  • \(\epsilon\) is the error term.
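As a minimal sketch of how the intercept and slope in this equation are estimated by ordinary least squares, the closed-form formulas can be checked against `lm()`. The choice of `wt` as \(X\) and `mpg` as \(Y\) follows the pairing used in this report; the object names `x`, `y`, `beta_0`, `beta_1`, and `fit` are illustrative only.

```r
# Closed-form least-squares estimates for Y = beta_0 + beta_1 * X + epsilon.
x <- mtcars$wt   # independent variable X (weight, in 1000 lbs)
y <- mtcars$mpg  # dependent variable Y (miles per gallon)

beta_1 <- cov(x, y) / var(x)          # slope
beta_0 <- mean(y) - beta_1 * mean(x)  # intercept

# lm() minimizes the same sum of squared errors, so the estimates should agree.
fit <- lm(y ~ x)
print(c(beta_0 = beta_0, beta_1 = beta_1))
print(coef(fit))
```

The slope is the sample covariance of \(X\) and \(Y\) divided by the sample variance of \(X\), and the intercept forces the fitted line through the point of means.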



The mtcars dataset contains specifications and performance data for 32 car models (1973-74 models), taken from the 1974 Motor Trend US magazine. The dataset includes 11 variables related to car performance, dimensions, and fuel consumption.
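A quick way to confirm the shape of the data is to inspect it directly. This is a small sketch using only base R; the three columns shown in the preview are the ones this report works with.

```r
# mtcars ships with base R (package "datasets"), so no download is needed.
data(mtcars)

dim(mtcars)     # number of rows (car models) and columns (variables)
names(mtcars)   # variable names

# Preview the variables used in the regressions in this report.
head(mtcars[, c("mpg", "wt", "hp")])
```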

Linear Regression: Weight vs Mileage

The data suggest an inverse relationship between weight and miles per gallon: heavier cars tend to get lower gas mileage.
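One way to check this relationship is to compute the correlation and draw a scatter plot with a fitted line. The sketch below uses base graphics; the axis labels, title, and the name `r_wt_mpg` are assumptions made for illustration and are not taken from the original figure.

```r
# Pearson correlation between weight and fuel economy; a negative value
# indicates that mpg falls as weight rises.
r_wt_mpg <- cor(mtcars$wt, mtcars$mpg)
print(round(r_wt_mpg, 3))

# Scatter plot with the least-squares line overlaid.
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Weight vs Mileage")
abline(lm(mpg ~ wt, data = mtcars), col = "blue")
```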

Linear Regression: Weight vs Horsepower

Cars with more horsepower tend to be heavier than cars with less horsepower.
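A rough numeric check of this statement is to split the cars at the median horsepower and compare average weights. The median split and the names `high_hp` and `mean_wt` are illustrative choices, not part of the original analysis.

```r
# Flag cars whose horsepower is above the sample median.
high_hp <- mtcars$hp > median(mtcars$hp)

# Average weight (1000 lbs) in each horsepower group.
mean_wt <- c(
  "hp <= median" = mean(mtcars$wt[!high_hp]),
  "hp > median"  = mean(mtcars$wt[high_hp])
)
print(round(mean_wt, 2))

# The correlation summarizes the same tendency in a single number.
print(round(cor(mtcars$hp, mtcars$wt), 3))
```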

Simple Linear Regression Model

Based on the relationship between weight and miles per gallon, we fit a simple linear regression with mpg as the dependent variable and wt as the independent variable.

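library(knitr)  # provides kable()

# Fit mpg as a linear function of weight (wt, in 1000 lbs)
slr <- lm(mpg ~ wt, data = mtcars)

model_summary <- summary(slr)

# Round the coefficient table to three decimals for display
coefficients_rounded <- round(model_summary$coefficients, 3)

kable(coefficients_rounded, caption = "Regression Coefficients")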
Regression Coefficients

              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     37.285       1.878   19.858         0
wt              -5.344       0.559   -9.559         0

The p-values appear as 0 because they are smaller than 0.0005 and were rounded to three decimals.
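To show how these coefficients translate into predictions, the sketch below applies the fitted model to a few weights. The weights in `new_cars` are hypothetical values chosen for illustration; the model is refitted here so the block runs on its own.

```r
# Refit the simple model so this example is self-contained.
slr <- lm(mpg ~ wt, data = mtcars)

# Hypothetical car weights, in 1000 lbs.
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5))

# Predicted mpg with 95% confidence intervals for the mean response.
pred <- predict(slr, newdata = new_cars, interval = "confidence")
print(cbind(new_cars, round(pred, 2)))
```

Each prediction is the intercept plus the slope times the weight, so every additional 1000 lbs lowers the predicted mpg by about 5.3.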

Interactive Multivariable Linear Regression
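A minimal sketch of the two-predictor model, assuming it was fitted as mpg on wt and hp (the predictors named in the conclusion). The object names `mlr` and `mlr_summary` are assumptions, and the code mirrors the simple-regression code rather than reproducing the interactive figure.

```r
# Fit mpg as a linear function of weight (wt) and horsepower (hp).
mlr <- lm(mpg ~ wt + hp, data = mtcars)
mlr_summary <- summary(mlr)

# Coefficient table rounded to four decimals.
print(round(mlr_summary$coefficients, 4))

# Share of the variance in mpg explained by the two predictors together.
print(round(mlr_summary$r.squared, 4))
```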

Assessing Model Fit:

Diagnostic Plots of Simple Linear Regression
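The standard diagnostic plots for a fitted `lm` object can be produced with base R's `plot()` method. This is a sketch of one common layout, not necessarily the one used for the original figure; the model is refitted so the block is self-contained.

```r
# Refit the simple model so this example is self-contained.
slr <- lm(mpg ~ wt, data = mtcars)

# Arrange the four default diagnostic plots in a 2 x 2 grid:
# residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage.
old_par <- par(mfrow = c(2, 2))
plot(slr)
par(old_par)  # restore the previous plotting layout
```

The same call works for a multivariable fit by passing that model object to `plot()` instead.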

Assessing Model Fit:

Diagnostic Plots of Multivariable Linear Regression

Conclusion

  • Simple Linear Regression (SLR): \(\small \text{mpg} = 37.2851 - 5.3445 \times \text{wt}\)
    R-squared: 0.7528, explaining 75.3% of the variance in \(\small \text{mpg}\).
    p-value: \(\small 1.294 \times 10^{-10}\), highly significant.

  • Multivariable Linear Regression (MLR): \(\small \text{mpg} = 37.2273 - 3.8778 \times \text{wt} - 0.0318 \times \text{hp}\)
    R-squared: 0.8268, explaining 82.7% of the variance in \(\small \text{mpg}\).
    p-value: \(\small 9.109 \times 10^{-12}\), highly significant.

The MLR model explains 82.7% of the variance in mpg, compared with 75.3% for the SLR model, so including horsepower (hp) improves the model’s fit. Both models are statistically significant, but the multivariable model accounts for both weight (wt) and horsepower (hp) and leaves less of the variance unexplained. Therefore, for predicting mpg from these variables, the MLR model is recommended.
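Because R-squared cannot decrease when a predictor is added, a fairer comparison uses adjusted R-squared and a nested-model F-test. The sketch below is one way to run that check; it is an addition for illustration, and the names `fit_stats` and `comparison` are assumptions.

```r
# Refit both models so this example is self-contained.
slr <- lm(mpg ~ wt, data = mtcars)
mlr <- lm(mpg ~ wt + hp, data = mtcars)

# Side-by-side fit statistics; adjusted R-squared penalizes extra predictors.
fit_stats <- data.frame(
  model         = c("SLR", "MLR"),
  r_squared     = c(summary(slr)$r.squared, summary(mlr)$r.squared),
  adj_r_squared = c(summary(slr)$adj.r.squared, summary(mlr)$adj.r.squared)
)
print(fit_stats)

# F-test of whether adding hp significantly reduces the residual sum of squares.
comparison <- anova(slr, mlr)
print(comparison)
```

If the adjusted R-squared rises and the F-test p-value is small, the improvement from adding hp is unlikely to be an artifact of the extra parameter.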