This project investigates whether a car’s fuel efficiency, measured in miles per gallon (MPG), can be predicted using two mechanical characteristics: engine displacement and vehicle weight. Using the mtcars dataset and multiple linear regression, the project aims to determine how much of the variation in MPG these two variables explain.
Fuel efficiency is an important performance measure for vehicles, especially as fuel costs and environmental concerns continue to rise. A common measure of fuel efficiency is miles per gallon (MPG), which indicates how far a car can travel on one gallon of fuel.
Two mechanical characteristics that may influence MPG are:
- Engine displacement (disp): the size of the engine
- Vehicle weight (wt): how heavy the car is
The dataset used in this project is mtcars, originally published in Motor Trend Magazine (1974) and included in base R. It contains data on 32 cars and includes variables related to engine characteristics, performance, and fuel efficiency.
Research Question: Can a car’s fuel efficiency (MPG) be predicted using engine displacement and weight?
This project uses multiple linear regression as the required new statistical technique.
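For reference, the model that multiple linear regression fits to these two predictors can be written as below. The symbols \beta_0, \beta_1, \beta_2 and \varepsilon_i are notation assumed here for illustration; they do not appear in the original analysis.

```latex
\[
\text{mpg}_i = \beta_0 + \beta_1\,\text{disp}_i + \beta_2\,\text{wt}_i + \varepsilon_i,
\qquad i = 1, \dots, 32
\]
```

In this notation, \beta_1 is the expected change in MPG per additional cubic inch of displacement with weight held fixed, \beta_2 is the expected change per additional 1000 lbs with displacement held fixed, and \varepsilon_i is the error term for car i.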
#Load dataset
data(mtcars)
#View structure of the dataset
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
#Full summary statistics for variables of interest
summary(mtcars[, c("mpg", "disp", "wt")])
##       mpg             disp             wt
##  Min.   :10.40   Min.   : 71.1   Min.   :1.513
##  1st Qu.:15.43   1st Qu.:120.8   1st Qu.:2.581
##  Median :19.20   Median :196.3   Median :3.325
##  Mean   :20.09   Mean   :230.7   Mean   :3.217
##  3rd Qu.:22.80   3rd Qu.:326.0   3rd Qu.:3.610
##  Max.   :33.90   Max.   :472.0   Max.   :5.424
Scatterplots and Correlation Coefficients
#Create scatterplot of MPG vs engine displacement
plot(mtcars$disp, mtcars$mpg, main = "MPG vs Engine Displacement", xlab = "Engine Displacement (cu. in.)", ylab = "MPG")
#Calculate correlation coefficient between displacement and MPG
cor(mtcars$disp, mtcars$mpg)
## [1] -0.8475514
#Create scatterplot of MPG vs weight
plot(mtcars$wt, mtcars$mpg, main = "MPG vs Weight", xlab = "Weight (1000 lbs)", ylab = "MPG")
#Calculate correlation coefficient between weight and MPG
cor(mtcars$wt, mtcars$mpg)
## [1] -0.8676594
Correlation
#Correlation matrix for MPG, displacement, and weight
cor(mtcars[, c("mpg", "disp", "wt")])
##             mpg       disp         wt
## mpg   1.0000000 -0.8475514 -0.8676594
## disp -0.8475514  1.0000000  0.8879799
## wt   -0.8676594  0.8879799  1.0000000
Exploration Summary: MPG is negatively correlated with both displacement (-0.85) and weight (-0.87). Heavier cars and cars with larger engines tend to have lower MPG.
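The correlations above are descriptive. As a rough supplementary check, not part of the original analysis, whether each correlation differs from zero could be tested with `cor.test` from base R. The object names `ct_disp` and `ct_wt` are chosen here for illustration.

```r
# Sketch: Pearson correlation tests (two-sided) for each predictor against MPG
data(mtcars)

ct_disp <- cor.test(mtcars$disp, mtcars$mpg)  # displacement vs MPG
ct_wt   <- cor.test(mtcars$wt, mtcars$mpg)    # weight vs MPG

# Estimated correlation, p-value, and 95% confidence interval for each test
ct_disp$estimate; ct_disp$p.value; ct_disp$conf.int
ct_wt$estimate;   ct_wt$p.value;   ct_wt$conf.int
```

The estimates should match the values returned by `cor()` above. The confidence intervals give a sense of how precisely a sample of 32 cars pins down each correlation.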
Hypotheses:
H0: Neither engine displacement nor weight predicts MPG (both regression coefficients equal zero).
H1: At least one of engine displacement and weight predicts MPG (at least one regression coefficient is not zero).
Simple Linear Regression Model - Model 1: MPG ~ Displacement
#Fit simple regression model: MPG predicted by displacement
lm_disp <- lm(mpg ~ disp, data = mtcars)
#Display full summary of linear model
summary(lm_disp)
##
## Call:
## lm(formula = mpg ~ disp, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.8922 -2.2022 -0.9631  1.6272  7.2305
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.599855   1.229720  24.070  < 2e-16 ***
## disp        -0.041215   0.004712  -8.747 9.38e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.251 on 30 degrees of freedom
## Multiple R-squared: 0.7183, Adjusted R-squared: 0.709
## F-statistic: 76.51 on 1 and 30 DF, p-value: 9.38e-10
Interpretation: Each additional cubic inch of engine displacement is associated with a decrease of about 0.041 MPG in predicted fuel efficiency. Displacement alone explains 71.8% of the variation in MPG.
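To make the fitted line concrete, the sketch below predicts MPG for a hypothetical car with a 200 cubic inch engine, once by plugging into the fitted equation and once with `predict()`. The value 200 and the names `new_car`, `pred_by_hand`, and `pred_by_predict` are illustrative assumptions, not part of the original analysis.

```r
# Sketch: predicted MPG for a hypothetical 200 cu. in. engine
data(mtcars)
lm_disp <- lm(mpg ~ disp, data = mtcars)

new_car <- data.frame(disp = 200)  # hypothetical car, chosen for illustration

# By hand: intercept + slope * displacement
pred_by_hand <- coef(lm_disp)[["(Intercept)"]] + coef(lm_disp)[["disp"]] * 200

# Same calculation using predict()
pred_by_predict <- predict(lm_disp, newdata = new_car)

pred_by_hand
pred_by_predict
```

With the coefficients reported above, both come to about 29.60 - 0.0412 × 200 ≈ 21.4 MPG.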
Simple Linear Regression Model - Model 2: MPG ~ Weight
#Fit simple regression model: MPG predicted by weight
lm_wt <- lm(mpg ~ wt, data = mtcars)
#Display full summary of linear model
summary(lm_wt)
##
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.5432 -2.3647 -0.1252  1.4096  6.8727
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
## F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
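Interpretation: Each additional 1000 lbs of weight is associated with a decrease of about 5.34 MPG in predicted fuel efficiency. Weight alone explains 75.3% of the variation in MPG.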
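In a regression with a single predictor, the Multiple R-squared reported in the summary output is the square of the correlation coefficient between the predictor and the response. The sketch below checks that link for the weight model. The names `r_wt`, `r_squared_from_cor`, and `r_squared_from_model` are introduced here for illustration.

```r
# Sketch: R-squared of a one-predictor model equals the squared correlation
data(mtcars)
lm_wt <- lm(mpg ~ wt, data = mtcars)

r_wt <- cor(mtcars$wt, mtcars$mpg)               # correlation from the exploration step
r_squared_from_cor   <- r_wt^2                   # squared correlation
r_squared_from_model <- summary(lm_wt)$r.squared # R-squared stored in the model summary

all.equal(r_squared_from_cor, r_squared_from_model)
```

Squaring -0.8677 gives about 0.7528, which is the R-squared shown for this model.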
Multiple Regression Model
#Fit multiple regression model using displacement and weight
lm_multi <- lm(mpg ~ disp + wt, data = mtcars)
#Display full summary of the multiple regression model
summary(lm_multi)
##
## Call:
## lm(formula = mpg ~ disp + wt, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4087 -2.3243 -0.7683  1.7721  6.3484
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.96055    2.16454  16.151 4.91e-16 ***
## disp        -0.01773    0.00919  -1.929  0.06362 .
## wt          -3.35082    1.16413  -2.878  0.00743 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.917 on 29 degrees of freedom
## Multiple R-squared: 0.7809, Adjusted R-squared: 0.7658
## F-statistic: 51.69 on 2 and 29 DF, p-value: 2.744e-10
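Interpretation: Holding displacement constant, each additional 1000 lbs of weight is associated with a decrease of 3.35 MPG (p = 0.007). Holding weight constant, each additional cubic inch of displacement is associated with a decrease of 0.0177 MPG, but this effect is not statistically significant at the 0.05 level (p = 0.064). The overall F-test (F = 51.69 on 2 and 29 DF, p = 2.744e-10) rejects H0, so at least one of the two predictors is useful for predicting MPG. The model explains 78.1% of the variation in MPG (adjusted R-squared 0.766).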
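One way to look more closely at what displacement adds once weight is already in the model is a nested-model comparison with `anova()`, together with confidence intervals from `confint()`. This is a supplementary sketch rather than a step in the original analysis. The names `cmp` and `ci` are introduced here for illustration.

```r
# Sketch: does adding displacement improve on the weight-only model?
data(mtcars)
lm_wt    <- lm(mpg ~ wt, data = mtcars)
lm_multi <- lm(mpg ~ disp + wt, data = mtcars)

# Partial F-test comparing the nested models (weight only vs weight + displacement)
cmp <- anova(lm_wt, lm_multi)
cmp

# 95% confidence intervals for the multiple regression coefficients
ci <- confint(lm_multi, level = 0.95)
ci
```

Because the two models differ by a single term, the partial F-test gives the same p-value as the t-test for `disp` in the multiple regression summary (about 0.064). The 95% interval for `disp` should include zero, which is consistent with that result.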
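Why displacement loses significance in the multiple model: Weight and displacement are strongly correlated (r = 0.89), so they carry much of the same information about MPG. When both are included, each coefficient reflects only that predictor's unique contribution. Both estimates shrink (weight from -5.34 to -3.35, displacement from -0.041 to -0.018) and both standard errors roughly double, and only weight remains significant at the 0.05 level.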
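A common way to quantify this overlap is the variance inflation factor (VIF). The original analysis does not compute it, so the following is only a sketch. With exactly two predictors, the VIF is the same for both and depends only on their correlation. The names `r_dw` and `vif_two_predictors` are introduced here for illustration.

```r
# Sketch: variance inflation factor for a model with exactly two predictors
data(mtcars)

r_dw <- cor(mtcars$disp, mtcars$wt)       # correlation between the two predictors
vif_two_predictors <- 1 / (1 - r_dw^2)    # VIF = 1 / (1 - R^2 of one predictor on the other)

vif_two_predictors
sqrt(vif_two_predictors)  # approximate factor by which standard errors are inflated
```

With r = 0.888 the VIF is about 4.7, so standard errors are inflated by a factor of roughly 2.2 relative to uncorrelated predictors. That is in line with the near doubling of the standard errors between the simple and multiple models.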
Diagnostics
#Generate diagnostic plots for the multiple regression model
plot(lm_multi)
Interpretation of Diagnostic Plots:
- Residuals vs Fitted: points are randomly scattered, so the linearity assumption is reasonable.
- Normal Q-Q Plot: points fall mostly on the line, so the residuals are approximately normal.
- Scale-Location Plot: variance appears roughly constant, so homoscedasticity is acceptable.
- Residuals vs Leverage: there are no extreme influential points, so the model is stable.
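The visual checks above can be complemented with numeric summaries. The sketch below is an optional addition, not part of the original analysis: it applies a Shapiro-Wilk test to the residuals and lists the largest Cook's distances. The names `sw` and `cd` are introduced here for illustration, and no particular result is assumed.

```r
# Sketch: numeric companions to the diagnostic plots
data(mtcars)
lm_multi <- lm(mpg ~ disp + wt, data = mtcars)

# Shapiro-Wilk test of residual normality (a small p-value suggests non-normal residuals)
sw <- shapiro.test(residuals(lm_multi))
sw

# Cook's distance for each car; larger values indicate more influence on the fit
cd <- cooks.distance(lm_multi)
head(sort(cd, decreasing = TRUE), 3)  # the three most influential cars
```

With only 32 observations these tests have limited power, so they are best read alongside the plots rather than in place of them.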
The analysis shows that MPG can be predicted from engine displacement and weight. Both variables are negatively related to MPG: cars with larger engines and heavier cars tend to have lower fuel efficiency. Weight was the stronger predictor, and once weight is in the model, displacement adds little (p = 0.064). Together the two predictors explain about 78% of the variation in MPG, supporting the conclusion that these mechanical characteristics are strongly associated with fuel efficiency.
The dataset is small (n = 32), which limits generalization. All cars are 1973-74 models, so the results may not reflect modern vehicle technology. Other important predictors of MPG, such as transmission type, fuel type, or aerodynamics, are not included in the model. The dataset is observational, not experimental, so causation cannot be established.
This document was produced as a final project for MAT 143H - Introduction to Statistics (Honors) at North Shore Community College. The course was led by Professor Billy Jackson. Student Name: Lilyanna Romero Semester: MAT-143H-L01