2025-03-14

Looking at the data

First, let’s load the dataset and take a look at the first rows:

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
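
A preview like the one above can be produced with `head()`. This is a minimal sketch, assuming the built-in `mtcars` data frame from R's `datasets` package; the `dim()` call is an extra check added here for illustration:

```r
# mtcars ships with base R (datasets package), so nothing needs to be downloaded
data(mtcars)

# Print the first six rows of the data frame
head(mtcars)

# Check the size of the dataset (rows = cars, columns = variables)
dim(mtcars)
```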

MPG vs Weight graph

library(ggplot2)

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "green", size = 2) +
  labs(title = "MPG vs. Weight",
       x = "Weight (1000 lbs)",
       y = "Miles per Gallon") +
  theme_minimal()

Linear Model Equation

We will fit the following simple linear model:

\[ \text{mpg} = \beta_0 + \beta_1 (\text{wt}) + \epsilon \]

Where:

  • \(\beta_0\) is the intercept

  • \(\beta_1\) is the slope (change in mpg per 1000 lbs increase in weight)

  • \(\epsilon\) is the error term

Linear Regression (MPG vs Weight)

model <- lm(mpg ~ wt, data = mtcars)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue", size = 2) +
  geom_smooth(method = "lm", se = TRUE, color = "red") +
  labs(title = expression(bold("Regression Line (mpg vs wt)")),
       x = "Weight (1000 lbs)",
       y = "Miles per Gallon") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

Statistical Summary of the Linear Regression model

## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
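
As a sketch of how the summary above can be used, the estimates can be pulled out of the fitted model object and used for prediction. The 3000 lbs car (`new_car`) is a hypothetical input chosen for illustration and is not part of the original analysis:

```r
# Refit the model so this block is self-contained
model <- lm(mpg ~ wt, data = mtcars)

# Named vector holding the intercept and the slope
coef(model)

# 95% confidence intervals for both coefficients
confint(model, level = 0.95)

# Predicted mpg for a hypothetical car weighing 3000 lbs (wt = 3)
new_car <- data.frame(wt = 3)
predict(model, newdata = new_car)
```

Using the estimates in the summary, this prediction works out to 37.2851 - 5.3445 × 3 ≈ 21.25 mpg.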

Slope Estimate:

The least-squares estimate of the slope, \(\hat{\beta}_1\), in a simple linear regression is given by:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})} {\sum_{i=1}^n (x_i - \bar{x})^2} \]

where:

  • \(x_i\) is the predictor (weight)

  • \(y_i\) is the response (mpg)

  • \(\bar{x}\) and \(\bar{y}\) are sample means.
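
The slope formula can be checked by hand against `lm()`. This is a minimal sketch: the variable names (`x`, `y`, `beta1_hat`, `beta0_hat`) are chosen here for illustration, and the intercept line uses the standard least-squares result (response mean minus slope times predictor mean), which is not stated in the slides:

```r
# Predictor (weight) and response (mpg) from the built-in dataset
x <- mtcars$wt
y <- mtcars$mpg

# Slope: sum of cross-deviations divided by sum of squared deviations of x
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: standard OLS result, mean(y) minus slope times mean(x)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Compare the manual estimates with the ones reported by lm()
manual <- c(intercept = beta0_hat, slope = beta1_hat)
fitted <- unname(coef(lm(mpg ~ wt, data = mtcars)))
print(manual)
print(all.equal(unname(manual), fitted))
```

The manual estimates should agree with the `Estimate` column of the regression summary up to rounding.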

Weight vs Horsepower vs MPG 3D plot

Conclusions

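  • Car weight has a statistically significant negative relationship with mpg (slope ≈ -5.34, p ≈ 1.3e-10): on average, each additional 1000 lbs of weight is associated with about 5.3 fewer miles per gallon.
  • Weight alone explains about 75% of the variance in mpg (R-squared = 0.7528). For a more accurate analysis, we could include additional variables from the dataset, such as horsepower, to improve the model fit.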
  • The mtcars dataset is pretty convenient for demonstrating simple and multiple linear regression concepts.
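
The suggestion to include additional variables could be explored along these lines. This is only a sketch: choosing horsepower (`hp`) as the second predictor is an assumption motivated by the 3D plot slide, and the object names `model_wt` and `model_wt_hp` are hypothetical:

```r
# Simple model from the slides: mpg explained by weight only
model_wt <- lm(mpg ~ wt, data = mtcars)

# Candidate extension (assumption): add horsepower as a second predictor
model_wt_hp <- lm(mpg ~ wt + hp, data = mtcars)

# Adjusted R-squared penalizes extra predictors, so it allows a fairer comparison
adj_r2 <- c(wt_only = summary(model_wt)$adj.r.squared,
            wt_and_hp = summary(model_wt_hp)$adj.r.squared)
print(adj_r2)

# F-test for whether adding hp significantly reduces the residual sum of squares
anova(model_wt, model_wt_hp)
```

Adjusted R-squared is used here rather than plain R-squared because the latter never decreases when a predictor is added.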