Overview

This project investigates whether a car’s fuel efficiency, measured in miles per gallon (MPG), can be predicted using two mechanical characteristics: engine displacement and vehicle weight. Using the mtcars dataset and multiple linear regression, the goal is to determine how strongly these variables explain variation in MPG.

Introduction

Fuel efficiency is an important performance measure for vehicles, especially as fuel costs and environmental concerns continue to rise. A common measure of fuel efficiency is miles per gallon (MPG), which indicates how far a car can travel on one gallon of fuel.

Two mechanical characteristics that may influence MPG are:

- Engine displacement (disp) - the size of the engine
- Vehicle weight (wt) - how heavy the car is

The dataset used in this project is mtcars, originally published in Motor Trend Magazine (1974) and included in base R. It contains data on 32 cars and includes variables related to engine characteristics, performance, and fuel efficiency.

Research Question: Can a car’s fuel efficiency (MPG) be predicted using engine displacement and weight?

This project uses multiple linear regression as the required new statistical technique.

Exploring the Data

#Load dataset
data(mtcars)

#View structure of the dataset
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
#Full summary statistics for variables of interest 
summary(mtcars[, c("mpg", "disp", "wt")])
##       mpg             disp             wt       
##  Min.   :10.40   Min.   : 71.1   Min.   :1.513  
##  1st Qu.:15.43   1st Qu.:120.8   1st Qu.:2.581  
##  Median :19.20   Median :196.3   Median :3.325  
##  Mean   :20.09   Mean   :230.7   Mean   :3.217  
##  3rd Qu.:22.80   3rd Qu.:326.0   3rd Qu.:3.610  
##  Max.   :33.90   Max.   :472.0   Max.   :5.424

Scatterplots and Correlation Coefficients

#Create scatterplot of MPG vs engine displacement 
plot(mtcars$disp, mtcars$mpg, main = "MPG vs Engine Displacement", xlab = "Engine Displacement", ylab = "MPG")

#Calculate correlation coefficient between displacement and MPG
cor(mtcars$disp, mtcars$mpg)
## [1] -0.8475514
#Create scatterplot of MPG vs weight 
plot(mtcars$wt, mtcars$mpg, main = "MPG vs Weight", xlab = "Weight (1000 lbs)", ylab = "MPG")

#Calculate correlation coefficient between weight and MPG
cor(mtcars$wt, mtcars$mpg)
## [1] -0.8676594

Correlation

#Correlation matrix for MPG, displacement, and weight
cor(mtcars[, c("mpg", "disp", "wt")])
##             mpg       disp         wt
## mpg   1.0000000 -0.8475514 -0.8676594
## disp -0.8475514  1.0000000  0.8879799
## wt   -0.8676594  0.8879799  1.0000000

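Exploration Summary: MPG is strongly negatively correlated with both displacement (-0.85) and weight (-0.87). Heavier cars and cars with larger engines tend to have lower MPG. The two predictors are also strongly correlated with each other (0.89), which matters when both enter the same model.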
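The correlations above describe the sample. Whether they differ reliably from zero can be checked with a correlation test. The following is a minimal sketch using `cor.test()` from base R; it is not part of the original analysis, and the object names `ct_wt` and `ct_disp` are chosen here for illustration.

```r
# Load the dataset used throughout the report
data(mtcars)

# Test H0: the true correlation between weight and MPG is zero
ct_wt <- cor.test(mtcars$wt, mtcars$mpg)
ct_wt

# Same test for displacement and MPG
ct_disp <- cor.test(mtcars$disp, mtcars$mpg)
ct_disp
```

For a single predictor, this test is equivalent to the t-test on the slope of the corresponding simple regression, so the p-values should match those reported by `summary()` for the one-predictor models.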

Analysis

Hypotheses:

H0: Neither engine displacement nor weight predicts MPG (both regression coefficients equal zero).

H1: At least one of engine displacement and weight predicts MPG (at least one coefficient differs from zero).
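In symbols, the hypotheses can be written against the multiple regression model. The Greek-letter notation and the error term below are assumptions of this sketch, not notation taken from the report.

```latex
\[
\text{mpg}_i = \beta_0 + \beta_1\,\text{disp}_i + \beta_2\,\text{wt}_i + \varepsilon_i
\]
\[
H_0:\ \beta_1 = \beta_2 = 0
\qquad\text{versus}\qquad
H_1:\ \beta_1 \neq 0 \ \text{or}\ \beta_2 \neq 0
\]
```

The overall F-statistic that `summary()` prints for a fitted `lm` object tests this null hypothesis.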

Simple Linear Regression Model - Model 1: MPG ~ Displacement

#Fit simple regression model: MPG predicted by displacement
lm_disp <- lm(mpg ~ disp, data = mtcars)

#Display full summary of linear model
summary(lm_disp)
## 
## Call:
## lm(formula = mpg ~ disp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.8922 -2.2022 -0.9631  1.6272  7.2305 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 29.599855   1.229720  24.070  < 2e-16 ***
## disp        -0.041215   0.004712  -8.747 9.38e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.251 on 30 degrees of freedom
## Multiple R-squared:  0.7183, Adjusted R-squared:  0.709 
## F-statistic: 76.51 on 1 and 30 DF,  p-value: 9.38e-10

Interpretation: For every additional cubic inch of engine displacement, predicted MPG decreases by about 0.041. Displacement alone explains 71.8% of the variation in MPG.
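A per-cubic-inch slope is hard to picture because engines differ by hundreds of cubic inches. The sketch below rescales the slope to a 100-cubic-inch change and adds a 95% confidence interval. The names `slope_per_100` and `ci_disp` are illustrative, and the 100-unit step is an arbitrary choice for readability.

```r
# Load the data and refit the displacement-only model
data(mtcars)
lm_disp <- lm(mpg ~ disp, data = mtcars)

# Slope per cubic inch, rescaled to a 100-cubic-inch change
slope_per_100 <- unname(coef(lm_disp)["disp"]) * 100
slope_per_100

# 95% confidence interval for the intercept and the slope
ci_disp <- confint(lm_disp, level = 0.95)
ci_disp
```

On this scale, a 100-cubic-inch increase in displacement corresponds to a drop of roughly 4.1 in predicted MPG.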

Simple Linear Regression Model - Model 2: MPG ~ Weight

#Fit simple regression model: MPG predicted by weight
lm_wt <- lm(mpg ~ wt, data = mtcars)

#Display full summary of linear model 
summary(lm_wt)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

Interpretation: For every additional 1000 lbs of weight, predicted MPG decreases by about 5.34. Weight alone explains 75.3% of the variation in MPG.
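The fitted line can also be used to predict MPG for a specific weight. The sketch below uses a hypothetical 3000-lb car, which is not a row in the dataset, and asks `predict()` for a 95% prediction interval. The names `new_car` and `pred_wt` are illustrative.

```r
# Load the data and refit the weight-only model
data(mtcars)
lm_wt <- lm(mpg ~ wt, data = mtcars)

# Hypothetical car weighing 3000 lbs (wt is recorded in thousands of pounds)
new_car <- data.frame(wt = 3)

# Point prediction plus a 95% prediction interval for a single new car
pred_wt <- predict(lm_wt, newdata = new_car, interval = "prediction", level = 0.95)
pred_wt
```

By hand, the point prediction is 37.2851 - 5.3445 × 3 ≈ 21.25 MPG. The prediction interval is wider than a confidence interval for the mean because it also accounts for car-to-car scatter around the line.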

Multiple Regression Model

#Fit multiple regression model using displacement and weight 
lm_multi <- lm(mpg ~ disp + wt, data = mtcars)

#Display full summary of the multiple regression model
summary(lm_multi)
## 
## Call:
## lm(formula = mpg ~ disp + wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4087 -2.3243 -0.7683  1.7721  6.3484 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 34.96055    2.16454  16.151 4.91e-16 ***
## disp        -0.01773    0.00919  -1.929  0.06362 .  
## wt          -3.35082    1.16413  -2.878  0.00743 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.917 on 29 degrees of freedom
## Multiple R-squared:  0.7809, Adjusted R-squared:  0.7658 
## F-statistic: 51.69 on 2 and 29 DF,  p-value: 2.744e-10

Interpretation: Holding displacement constant, each additional 1000 lbs reduces predicted MPG by 3.35. Holding weight constant, each additional cubic inch of displacement reduces predicted MPG by 0.0177, but this effect is not statistically significant at the 0.05 level (p = 0.064). The overall F-test (p = 2.744e-10) rejects H0, and the model explains 78.1% of the variation in MPG (adjusted R-squared: 76.6%).
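One way to judge whether displacement adds anything beyond weight is to compare the nested models directly. The following sketch, which is not part of the original analysis, uses `anova()` for a partial F-test and compares adjusted R-squared. The names `cmp` and `adj_r2` are illustrative.

```r
# Load the data and refit the weight-only and two-predictor models
data(mtcars)
lm_wt    <- lm(mpg ~ wt, data = mtcars)
lm_multi <- lm(mpg ~ disp + wt, data = mtcars)

# Partial F-test: does adding disp improve on the weight-only model?
cmp <- anova(lm_wt, lm_multi)
cmp

# Adjusted R-squared penalizes the extra predictor, so it is the fairer comparison
adj_r2 <- c(wt_only = summary(lm_wt)$adj.r.squared,
            disp_wt = summary(lm_multi)$adj.r.squared)
adj_r2
```

Because only one term is added, the partial F-test gives the same p-value as the t-test for disp in the multiple regression summary (about 0.064).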

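Why weight remains significant while displacement does not: Weight and displacement are strongly correlated (r = 0.89), so they carry much of the same information about MPG. When both are included, each coefficient reflects only the variation not shared with the other predictor, so both shrink relative to the single-predictor models (weight from -5.34 to -3.35, displacement from -0.041 to -0.018) and their standard errors grow. Weight retains enough unique explanatory power to stay significant; displacement does not.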
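The effect of the correlation between the two predictors can be quantified with a variance inflation factor (VIF). The VIF is not part of the original analysis; the sketch below computes it by hand from the predictor correlation, using the two-predictor shortcut VIF = 1 / (1 - r^2). The names `r_pred` and `vif_two` are illustrative.

```r
# Load the dataset
data(mtcars)

# Correlation between the two predictors
r_pred <- cor(mtcars$disp, mtcars$wt)

# With exactly two predictors, each one's VIF is 1 / (1 - r^2)
vif_two <- 1 / (1 - r_pred^2)
vif_two
```

A VIF of about 4.7 means the variance of each slope estimate is roughly 4.7 times what it would be if the predictors were uncorrelated, which helps explain the larger standard errors in the multiple regression output.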

Diagnostics

#Generate the four diagnostic plots for the multiple regression model in a 2x2 grid
par(mfrow = c(2, 2))
plot(lm_multi)
par(mfrow = c(1, 1))

Interpretation of Diagnostic Plots:

- Residuals vs Fitted - points are scattered without a clear pattern, so the linearity assumption is reasonable.
- Normal Q-Q Plot - points fall mostly on the line, so the residuals are approximately normal.
- Scale-Location Plot - the spread appears roughly constant, so homoscedasticity is acceptable.
- Residuals vs Leverage - no extreme influential points appear, so the model is stable.
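Visual readings of diagnostic plots can be backed up with numerical checks. The sketch below, which is not part of the original analysis, runs a Shapiro-Wilk test on the residuals and lists the three largest Cook's distances. The names `sw` and `cooks` are illustrative, and looking at the top three is an arbitrary choice.

```r
# Load the data and refit the multiple regression model
data(mtcars)
lm_multi <- lm(mpg ~ disp + wt, data = mtcars)

# Shapiro-Wilk test: a small p-value would cast doubt on normal residuals
sw <- shapiro.test(residuals(lm_multi))
sw

# Cook's distance: larger values flag cars that pull the fit the most
cooks <- cooks.distance(lm_multi)
head(sort(cooks, decreasing = TRUE), 3)
```

With only 32 observations, these checks have limited power, so they are best read alongside the plots, not in place of them.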

Conclusions

The analysis shows that MPG can be predicted using engine displacement and weight. Both variables have negative relationships with MPG: cars with larger engines and heavier cars both tend to have lower fuel efficiency. Weight was the stronger predictor; once weight is in the model, displacement adds little (p = 0.064). Overall, the regression model explains about 78% of the variation in MPG, supporting the conclusion that these mechanical characteristics are strongly associated with fuel efficiency.

Limitations

The dataset is small (n = 32), which limits generalization. All cars are 1973-74 models, so the results may not reflect modern vehicle technology. Other potentially important predictors of MPG, such as transmission type, fuel type, or aerodynamics, are not included in the model. The dataset is observational, not experimental, so causation can’t be established.

This document was produced as a final project for MAT 143H - Introduction to Statistics (Honors) at North Shore Community College. The course was led by Professor Billy Jackson. Student Name: Lilyanna Romero Semester: MAT-143H-L01