Overview

This project investigates whether a car’s fuel efficiency, measured in miles per gallon (MPG), can be predicted using two mechanical characteristics: engine displacement and vehicle weight. Using the mtcars dataset and multiple linear regression, the goal is to determine how strongly these variables explain variation in MPG.

Introduction

Fuel efficiency is an important performance measure for vehicles, especially as fuel costs and environmental concerns continue to rise. A common measure of fuel efficiency is miles per gallon (MPG), which indicates how far a car can travel on one gallon of fuel.

Two mechanical characteristics that may influence MPG are:

- Engine displacement (disp) - the size of the engine
- Vehicle weight (wt) - how heavy the car is

The dataset used in this project is mtcars, originally published in Motor Trend Magazine (1974) and included in base R. It contains data on 32 cars and includes variables related to engine characteristics, performance, and fuel efficiency.

Research Question: Can a car’s fuel efficiency (MPG) be predicted using engine displacement and weight?

This project uses multiple linear regression as the required new statistical technique.
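For reference, the multiple linear regression model used in this project can be written as below. The symbols (the coefficients beta and the error term epsilon) are standard textbook notation introduced here for illustration; they do not appear in the original analysis.

```latex
\[
\text{mpg}_i = \beta_0 + \beta_1\,\text{disp}_i + \beta_2\,\text{wt}_i + \varepsilon_i,
\qquad i = 1, \dots, 32
\]
```

Here $\beta_1$ is the expected change in MPG per cubic inch of displacement with weight held constant, and $\beta_2$ is the expected change per 1000 lbs of weight with displacement held constant.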

Exploring the Data

#Load dataset
data(mtcars)

#View structure of the dataset
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
#Full summary statistics for variables of interest 
summary(mtcars[, c("mpg", "disp", "wt")])
##       mpg             disp             wt       
##  Min.   :10.40   Min.   : 71.1   Min.   :1.513  
##  1st Qu.:15.43   1st Qu.:120.8   1st Qu.:2.581  
##  Median :19.20   Median :196.3   Median :3.325  
##  Mean   :20.09   Mean   :230.7   Mean   :3.217  
##  3rd Qu.:22.80   3rd Qu.:326.0   3rd Qu.:3.610  
##  Max.   :33.90   Max.   :472.0   Max.   :5.424

Scatterplots and Correlation Coefficients

#Create scatterplot of MPG vs engine displacement 
plot(mtcars$disp, mtcars$mpg, main = "MPG vs Engine Displacement", xlab = "Engine Displacement (cu. in.)", ylab = "MPG")

#Calculate correlation coefficient between displacement and MPG
cor(mtcars$disp, mtcars$mpg)
## [1] -0.8475514
#Create scatterplot of MPG vs weight 
plot(mtcars$wt, mtcars$mpg, main = "MPG vs Weight", xlab = "Weight (1000 lbs)", ylab = "MPG")

#Calculate correlation coefficient between weight and MPG
cor(mtcars$wt, mtcars$mpg)
## [1] -0.8676594

Correlation

#Correlation matrix for MPG, displacement, and weight
cor(mtcars[, c("mpg", "disp", "wt")])
##             mpg       disp         wt
## mpg   1.0000000 -0.8475514 -0.8676594
## disp -0.8475514  1.0000000  0.8879799
## wt   -0.8676594  0.8879799  1.0000000

Exploration Summary: MPG is negatively correlated with both displacement (-0.85) and weight (-0.87). Heavier cars and cars with larger engines tend to have lower MPG.
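The correlations above describe the strength of each relationship, but not whether it could be due to chance. One possible follow-up, sketched here with base R's cor.test(), is to attach a confidence interval and p-value to each coefficient. The object names (ct_disp, ct_wt, cor_table) are illustrative and not part of the original analysis.

```r
# Sketch: test whether each correlation with MPG differs from zero
data(mtcars)

ct_disp <- cor.test(mtcars$disp, mtcars$mpg)  # Pearson correlation by default
ct_wt   <- cor.test(mtcars$wt, mtcars$mpg)

# Collect the estimate, 95% confidence interval, and p-value in one table
cor_table <- data.frame(
  predictor = c("disp", "wt"),
  r         = unname(c(ct_disp$estimate, ct_wt$estimate)),
  ci_lower  = c(ct_disp$conf.int[1], ct_wt$conf.int[1]),
  ci_upper  = c(ct_disp$conf.int[2], ct_wt$conf.int[2]),
  p_value   = c(ct_disp$p.value, ct_wt$p.value)
)
print(cor_table)
```

An interval that lies entirely below zero would support the negative relationships described in the summary.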

Analysis

Hypotheses:

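H0: Neither engine displacement nor weight is a significant linear predictor of MPG (both regression slopes equal zero).

H1: At least one of engine displacement and weight is a significant linear predictor of MPG (at least one slope is nonzero).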
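These hypotheses correspond to the overall F-test of a regression model. As a minimal sketch, the test can be run explicitly by comparing an intercept-only model against the two-predictor model with anova(). The names fit_null, fit_full, and f_test are illustrative choices, not objects from the original analysis.

```r
# Sketch: overall F-test of H0 (both slopes are zero)
data(mtcars)

fit_null <- lm(mpg ~ 1, data = mtcars)          # intercept only (H0)
fit_full <- lm(mpg ~ disp + wt, data = mtcars)  # both predictors (H1)

# Nested-model comparison; row 2 holds the F statistic and p-value
f_test <- anova(fit_null, fit_full)
print(f_test)

f_stat <- f_test$F[2]
p_val  <- f_test$`Pr(>F)`[2]
```

The F statistic from this comparison should match the one reported at the bottom of the summary() output for the same two-predictor model.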

Simple Linear Regression Model - Model 1: MPG ~ Displacement

#Fit simple regression model: MPG predicted by displacement
lm_disp <- lm(mpg ~ disp, data = mtcars)

#Display full summary of linear model
summary(lm_disp)
## 
## Call:
## lm(formula = mpg ~ disp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.8922 -2.2022 -0.9631  1.6272  7.2305 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 29.599855   1.229720  24.070  < 2e-16 ***
## disp        -0.041215   0.004712  -8.747 9.38e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.251 on 30 degrees of freedom
## Multiple R-squared:  0.7183, Adjusted R-squared:  0.709 
## F-statistic: 76.51 on 1 and 30 DF,  p-value: 9.38e-10

Interpretation: For every additional cubic inch of engine displacement, predicted MPG decreases by about 0.041. Displacement alone explains 71.8% of the variation in MPG (R-squared = 0.7183).
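The displacement slope looks small only because a single cubic inch is a small unit. One way to make it easier to read, sketched here as an optional step, is to refit the model with displacement expressed per 100 cubic inches. The names lm_disp100, slope_per_ci, and slope_per_100 are illustrative.

```r
# Sketch: rescale displacement so the slope is per 100 cubic inches
data(mtcars)

lm_disp    <- lm(mpg ~ disp, data = mtcars)
lm_disp100 <- lm(mpg ~ I(disp / 100), data = mtcars)  # I() protects the arithmetic

slope_per_ci  <- coef(lm_disp)[["disp"]]  # change in MPG per 1 cubic inch
slope_per_100 <- coef(lm_disp100)[[2]]    # change in MPG per 100 cubic inches

print(c(per_cubic_inch = slope_per_ci, per_100_cubic_inches = slope_per_100))
```

Rescaling changes only the units of the slope; the fit, R-squared, and p-value are unaffected.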

Simple Linear Regression Model - Model 2: MPG ~ Weight

#Fit simple regression model: MPG predicted by weight
lm_wt <- lm(mpg ~ wt, data = mtcars)

#Display full summary of linear model 
summary(lm_wt)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

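Interpretation: For every additional 1000 lbs of vehicle weight, predicted MPG decreases by about 5.34. Weight alone explains 75.3% of the variation in MPG (R-squared = 0.7528).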
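To show what this slope means in practice, the weight model can be used to predict MPG for specific cars. The sketch below uses predict() with two hypothetical weights (2.5 and 3.5, in units of 1000 lbs) chosen only for illustration; new_cars, pred, and diff_pred are illustrative names.

```r
# Sketch: predict MPG for two hypothetical cars that differ by 1000 lbs
data(mtcars)

lm_wt <- lm(mpg ~ wt, data = mtcars)

new_cars <- data.frame(wt = c(2.5, 3.5))  # hypothetical weights, 1000 lbs

# Point predictions with 95% prediction intervals for individual cars
pred <- predict(lm_wt, newdata = new_cars, interval = "prediction", level = 0.95)
print(cbind(new_cars, pred))

# The gap between the two predictions equals the fitted slope for wt
diff_pred <- unname(pred[2, "fit"] - pred[1, "fit"])
print(diff_pred)
```

Prediction intervals are used rather than confidence intervals because the question is about the MPG of an individual car, not the average MPG of all cars at that weight.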

Multiple Regression Model

#Fit multiple regression model using displacement and weight 
lm_multi <- lm(mpg ~ disp + wt, data = mtcars)

#Display full summary of the multiple regression model
summary(lm_multi)
## 
## Call:
## lm(formula = mpg ~ disp + wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4087 -2.3243 -0.7683  1.7721  6.3484 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 34.96055    2.16454  16.151 4.91e-16 ***
## disp        -0.01773    0.00919  -1.929  0.06362 .  
## wt          -3.35082    1.16413  -2.878  0.00743 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.917 on 29 degrees of freedom
## Multiple R-squared:  0.7809, Adjusted R-squared:  0.7658 
## F-statistic: 51.69 on 2 and 29 DF,  p-value: 2.744e-10

Interpretation: Holding weight constant, each additional cubic inch of displacement is associated with a decrease of about 0.018 in predicted MPG; holding displacement constant, each additional 1000 lbs is associated with a decrease of about 3.35. Weight is the stronger predictor in this model (p = 0.007), which is plausible because heavier cars need more energy to accelerate, climb, and maintain speed. Displacement also relates to MPG, but it correlates strongly with weight (r = 0.89), so the two variables explain much of the same variation in MPG. When both are included, weight accounts for more of that shared variation, leaving displacement with little additional predictive power. As a result, displacement’s coefficient shrinks and is no longer statistically significant at the 0.05 level (p = 0.064), while weight remains a significant predictor.
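The overlap between the two predictors can be quantified. The sketch below computes a variance inflation factor (VIF) by hand, which for a two-predictor model is 1 / (1 - r^2) with r the correlation between the predictors, and then tests whether displacement adds anything once weight is already in the model. The VIF is a diagnostic introduced here for illustration; it was not part of the original analysis, and the object names are illustrative.

```r
# Sketch: quantify the overlap between displacement and weight
data(mtcars)

# With exactly two predictors, both share the same VIF
r_disp_wt <- cor(mtcars$disp, mtcars$wt)
vif_value <- 1 / (1 - r_disp_wt^2)
print(vif_value)

# Does adding disp improve on the weight-only model?
lm_wt      <- lm(mpg ~ wt, data = mtcars)
lm_multi   <- lm(mpg ~ disp + wt, data = mtcars)
added_disp <- anova(lm_wt, lm_multi)
print(added_disp)
```

The p-value in this comparison is the same as the p-value for disp in the multiple regression summary, because both test whether displacement's slope is zero given weight.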

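Why weight remains the stronger predictor in the multiple model: Because weight and displacement are correlated (r = 0.89), both coefficients shrink when the two are included together (weight from -5.34 to -3.35, displacement from -0.041 to -0.018), but only weight’s coefficient stays statistically significant.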
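The change in each coefficient between the simple and multiple models can be laid out side by side. This is a minimal sketch that refits the three models from this report; coef_compare and its column names are illustrative.

```r
# Sketch: compare each slope in its simple model and in the multiple model
data(mtcars)

lm_disp  <- lm(mpg ~ disp, data = mtcars)
lm_wt    <- lm(mpg ~ wt, data = mtcars)
lm_multi <- lm(mpg ~ disp + wt, data = mtcars)

coef_compare <- data.frame(
  predictor = c("disp", "wt"),
  simple    = c(coef(lm_disp)[["disp"]], coef(lm_wt)[["wt"]]),
  multiple  = c(coef(lm_multi)[["disp"]], coef(lm_multi)[["wt"]])
)

# Fraction of the simple-model slope that remains in the multiple model
coef_compare$ratio <- coef_compare$multiple / coef_compare$simple
print(coef_compare)
```

A ratio below 1 means the slope moved toward zero once the other predictor was held constant.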

Conclusions

These results show that fuel efficiency can be modeled by engine displacement and weight. The overall multiple regression model was statistically significant (F = 51.69, p < 0.001), so the null hypothesis is rejected. Both predictors had negative relationships with MPG: larger and heavier cars tend to have lower fuel efficiency. Weight was the better predictor, both statistically and practically. The multiple regression model explained 78.1% of the variation in MPG, compared with 75.3% for weight alone and 71.8% for displacement alone. Overall, the results support the conclusion that vehicle weight and engine size are related to MPG, with weight being the more consistent predictor.
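The model comparison behind these conclusions can be gathered into a single table. The sketch below refits the three models, reports R-squared and adjusted R-squared for each, and adds 95% confidence intervals for the multiple regression coefficients. The names models, fit_table, and ci_multi are illustrative.

```r
# Sketch: summarize fit across the three models in this report
data(mtcars)

models <- list(
  disp_only = lm(mpg ~ disp, data = mtcars),
  wt_only   = lm(mpg ~ wt, data = mtcars),
  disp_wt   = lm(mpg ~ disp + wt, data = mtcars)
)

fit_table <- data.frame(
  model         = names(models),
  r_squared     = sapply(models, function(m) summary(m)$r.squared),
  adj_r_squared = sapply(models, function(m) summary(m)$adj.r.squared),
  row.names     = NULL
)
print(fit_table)

# 95% confidence intervals for the multiple regression coefficients
ci_multi <- confint(models$disp_wt, level = 0.95)
print(ci_multi)
```

Adjusted R-squared is the fairer comparison here because it penalizes the extra predictor in the two-variable model.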

Limitations

This analysis has several limitations. First, the sample size is small (n = 32). Second, all cars in the sample are 1973-74 models, so the analysis may not reflect more modern vehicle technology. Third, the model considers only two variables, when in reality numerous other factors affect fuel efficiency, such as transmission type, aerodynamics, and driving conditions. Lastly, because weight and displacement are so highly correlated, it is harder to separate their individual effects.
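One of the omitted variables, transmission type, is available in mtcars as am (0 = automatic, 1 = manual). As a hedged sketch of how a follow-up analysis might check whether it adds explanatory power, the model from this report can be compared against a version that includes am. This extension is not part of the original project, and fit_base, fit_am, and cmp are illustrative names.

```r
# Sketch: does transmission type add to the displacement + weight model?
data(mtcars)

fit_base <- lm(mpg ~ disp + wt, data = mtcars)
fit_am   <- lm(mpg ~ disp + wt + factor(am), data = mtcars)  # am treated as a category

# Nested-model F-test for the added transmission term
cmp <- anova(fit_base, fit_am)
print(cmp)

# Adjusted R-squared for both models, for a like-for-like comparison
print(c(base    = summary(fit_base)$adj.r.squared,
        with_am = summary(fit_am)$adj.r.squared))
```

With only 32 cars, any such comparison has limited power, so a non-significant result would not show that transmission type is irrelevant.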

This document was produced as a final project for MAT 143H - Introduction to Statistics (Honors) at North Shore Community College. The course was led by Professor Billy Jackson.

Student Name: Lilyanna Romero
Semester: MAT-143H-L01