This project investigates whether a car’s fuel efficiency, measured in miles per gallon (MPG), can be predicted using two mechanical characteristics: engine displacement and vehicle weight. Using the mtcars dataset and multiple linear regression, the goal is to determine how strongly these variables explain variation in MPG.
Fuel efficiency is an important performance measure for vehicles, especially as fuel costs and environmental concerns continue to rise. A common measure of fuel efficiency is miles per gallon (MPG), which indicates how far a car can travel on one gallon of fuel.
Two mechanical characteristics that may influence MPG are:
- Engine displacement (disp): the size of the engine, in cubic inches
- Vehicle weight (wt): how heavy the car is, in units of 1,000 lbs
The dataset used in this project is mtcars, originally published in Motor Trend Magazine (1974) and included in base R. It contains data on 32 cars and includes variables related to engine characteristics, performance, and fuel efficiency.
Research Question: Can a car’s fuel efficiency (MPG) be predicted using engine displacement and weight?
This project uses multiple linear regression as the required new statistical technique.
#Load dataset
data(mtcars)
#View structure of the dataset
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
#Full summary statistics for variables of interest
summary(mtcars[, c("mpg", "disp", "wt")])
##       mpg             disp             wt
##  Min.   :10.40   Min.   : 71.1   Min.   :1.513
##  1st Qu.:15.43   1st Qu.:120.8   1st Qu.:2.581
##  Median :19.20   Median :196.3   Median :3.325
##  Mean   :20.09   Mean   :230.7   Mean   :3.217
##  3rd Qu.:22.80   3rd Qu.:326.0   3rd Qu.:3.610
##  Max.   :33.90   Max.   :472.0   Max.   :5.424
Scatterplots and Correlation Coefficients
#Create scatterplot of MPG vs engine displacement
plot(mtcars$disp, mtcars$mpg, main = "MPG vs Engine Displacement", xlab = "Engine Displacement", ylab = "MPG")
#Calculate correlation coefficient between displacement and MPG
cor(mtcars$disp, mtcars$mpg)
## [1] -0.8475514
#Create scatterplot of MPG vs weight
plot(mtcars$wt, mtcars$mpg, main = "MPG vs Weight", xlab = "Weight (1000 lbs)", ylab = "MPG")
#Calculate correlation coefficient between weight and MPG
cor(mtcars$wt, mtcars$mpg)
## [1] -0.8676594
Correlation
#Correlation matrix for MPG, displacement, and weight
cor(mtcars[, c("mpg", "disp", "wt")])
##             mpg       disp         wt
## mpg   1.0000000 -0.8475514 -0.8676594
## disp -0.8475514  1.0000000  0.8879799
## wt   -0.8676594  0.8879799  1.0000000
Exploration Summary: MPG is negatively correlated with both displacement (-0.85) and weight (-0.87). Heavier cars and cars with larger engines tend to have lower MPG.
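The correlations above can also be tested for significance. The following is a minimal sketch, assuming base R's `cor.test()` with its default Pearson method; the object names `ct_disp` and `ct_wt` are chosen here for illustration and are not part of the original analysis.

```r
# Load dataset
data(mtcars)

# Pearson correlation tests (null hypothesis: the true correlation is zero)
ct_disp <- cor.test(mtcars$disp, mtcars$mpg)
ct_wt   <- cor.test(mtcars$wt, mtcars$mpg)

# Point estimates (these match the cor() results above)
ct_disp$estimate
ct_wt$estimate

# 95% confidence intervals for each correlation
ct_disp$conf.int
ct_wt$conf.int

# p-values for each test
ct_disp$p.value
ct_wt$p.value
```

With only 32 cars the confidence intervals are fairly wide, so they give a more honest picture of the uncertainty than the point estimates alone.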
Hypotheses:
H0: Engine displacement and weight do not significantly predict MPG (both regression coefficients equal zero).
H1: Engine displacement and/or weight do significantly predict MPG (at least one coefficient is nonzero).
Simple Linear Regression Model - Model 1: MPG ~ Displacement
#Fit simple regression model: MPG predicted by displacement
lm_disp <- lm(mpg ~ disp, data = mtcars)
#Display full summary of linear model
summary(lm_disp)
##
## Call:
## lm(formula = mpg ~ disp, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.8922 -2.2022 -0.9631 1.6272 7.2305
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.599855   1.229720  24.070  < 2e-16 ***
## disp        -0.041215   0.004712  -8.747 9.38e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.251 on 30 degrees of freedom
## Multiple R-squared: 0.7183, Adjusted R-squared: 0.709
## F-statistic: 76.51 on 1 and 30 DF, p-value: 9.38e-10
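Interpretation: For each additional cubic inch of engine displacement, predicted MPG decreases by about 0.041 on average. The slope is statistically significant (p = 9.38e-10), and displacement alone explains about 71.8% of the variation in MPG (R-squared = 0.7183).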
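Because a one-cubic-inch change is very small, it can help to look at the slope over a larger change. The sketch below compares predicted MPG for two hypothetical engines (200 and 300 cubic inches, values chosen only for illustration) and reports a 95% confidence interval for the slope. The names `new_cars`, `pred`, and `diff_100` are illustrative.

```r
# Load dataset and refit the displacement model
data(mtcars)
lm_disp <- lm(mpg ~ disp, data = mtcars)

# Two hypothetical engine sizes, 100 cubic inches apart (illustrative values)
new_cars <- data.frame(disp = c(200, 300))

# Predicted MPG for each hypothetical car
pred <- predict(lm_disp, newdata = new_cars)
pred

# Difference in predicted MPG for a 100 cubic inch increase
# (equal to 100 times the slope)
diff_100 <- unname(pred[2] - pred[1])
diff_100

# 95% confidence interval for the displacement slope
confint(lm_disp, "disp", level = 0.95)
```

Under this model, a 100 cubic inch increase in displacement corresponds to a drop of roughly 4.1 in predicted MPG.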
Simple Linear Regression Model - Model 2: MPG ~ Weight
#Fit simple regression model: MPG predicted by weight
lm_wt <- lm(mpg ~ wt, data = mtcars)
#Display full summary of linear model
summary(lm_wt)
##
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5432 -2.3647 -0.1252 1.4096 6.8727
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
## F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
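Interpretation: For each additional 1,000 lbs of weight, predicted MPG decreases by about 5.34 on average. The slope is statistically significant (p = 1.29e-10), and weight alone explains about 75.3% of the variation in MPG (R-squared = 0.7528).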
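The weight model can also be used to predict MPG for a single car. The sketch below uses a hypothetical 3,000 lb car (wt = 3, chosen only for illustration) and asks `predict()` for a 95% prediction interval, which reflects how much an individual car's MPG may differ from the fitted line. The names `new_car` and `pi_wt` are illustrative.

```r
# Load dataset and refit the weight model
data(mtcars)
lm_wt <- lm(mpg ~ wt, data = mtcars)

# A hypothetical car weighing 3,000 lbs (wt is measured in 1000 lbs)
new_car <- data.frame(wt = 3)

# Point prediction with a 95% prediction interval
# (columns: fit = predicted MPG, lwr / upr = interval bounds)
pi_wt <- predict(lm_wt, newdata = new_car, interval = "prediction", level = 0.95)
pi_wt
```

The point prediction is about 21.25 MPG (37.2851 - 5.3445 x 3). The interval around it is wide, which is expected with a residual standard error of about 3 MPG and only 32 observations.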
Multiple Regression Model
#Fit multiple regression model using displacement and weight
lm_multi <- lm(mpg ~ disp + wt, data = mtcars)
#Display full summary of the multiple regression model
summary(lm_multi)
##
## Call:
## lm(formula = mpg ~ disp + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4087 -2.3243 -0.7683 1.7721 6.3484
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.96055    2.16454  16.151 4.91e-16 ***
## disp        -0.01773    0.00919  -1.929  0.06362 .
## wt          -3.35082    1.16413  -2.878  0.00743 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.917 on 29 degrees of freedom
## Multiple R-squared: 0.7809, Adjusted R-squared: 0.7658
## F-statistic: 51.69 on 2 and 29 DF, p-value: 2.744e-10
Interpretation: Weight is the stronger predictor of MPG in this multiple regression model. Holding displacement constant, each additional 1,000 lbs is associated with a decrease of about 3.35 MPG (p = 0.007). Holding weight constant, each additional cubic inch of displacement is associated with a decrease of about 0.018 MPG, which is not statistically significant at the 0.05 level (p = 0.064). A plausible physical explanation is that heavier cars need more energy to accelerate, climb, and maintain speed, although a regression on observational data cannot establish this by itself. Displacement also relates to MPG, but it correlates strongly with weight (r = 0.89), so the two variables explain much of the same variation in MPG. When both are included, weight accounts for more of that shared variation, and displacement adds little predictive power beyond it.
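Why weight remains significant in the multiple model: Weight and displacement are strongly correlated (r = 0.89), so when both are included, weight captures more of the shared variation. Both coefficients shrink relative to the simple models (weight from -5.34 to -3.35, displacement from -0.041 to -0.018), but only weight stays significant at the 0.05 level.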
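One way to check how much displacement adds once weight is already in the model is a nested model comparison. The sketch below uses base R's `anova()` to compare the weight-only model with the two-predictor model; the names `cmp`, `f_stat`, `p_val`, and `t_disp` are illustrative.

```r
# Load dataset and refit both models
data(mtcars)
lm_wt    <- lm(mpg ~ wt, data = mtcars)
lm_multi <- lm(mpg ~ disp + wt, data = mtcars)

# Partial F-test: does adding disp improve on the weight-only model?
cmp <- anova(lm_wt, lm_multi)
cmp

# F statistic and p-value for the added predictor (second row of the table)
f_stat <- cmp$F[2]
p_val  <- cmp[2, "Pr(>F)"]

# With a single added predictor, this F statistic equals the square of
# the t value for disp in the multiple regression summary
t_disp <- coef(summary(lm_multi))["disp", "t value"]
c(F = f_stat, t_squared = t_disp^2, p = p_val)
```

The p-value from this comparison is the same as the one reported for disp in the multiple regression summary (about 0.064), so adding displacement does not significantly improve the weight-only model at the 0.05 level.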
These results show that fuel efficiency can be modeled using engine displacement and weight. The overall F-test for the multiple regression model is significant (F = 51.69, p = 2.744e-10), so H0 is rejected: displacement and/or weight significantly predict MPG. Both relationships are negative, so larger and heavier cars tend to have lower fuel efficiency. The multiple regression model explains 78.1% of the variation in MPG (adjusted R-squared = 0.766), compared with 75.3% for weight alone and 71.8% for displacement alone. Weight was the better predictor: it explains more variation by itself and remains significant when displacement is included. Overall, the results support the conclusion that vehicle weight and engine size are associated with MPG, with weight being the more consistent predictor.
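The summary figures quoted here can be pulled directly from the fitted model rather than read off the printed output. The sketch below is one way to do that in base R; the names `s`, `r2`, `adj_r2`, `f`, and `p_overall` are illustrative.

```r
# Load dataset and refit the multiple regression model
data(mtcars)
lm_multi <- lm(mpg ~ disp + wt, data = mtcars)
s <- summary(lm_multi)

# Proportion of variation in MPG explained by the model
r2     <- s$r.squared
adj_r2 <- s$adj.r.squared

# Overall F statistic: a named vector with value, numdf, dendf
f <- s$fstatistic

# p-value for the overall F-test (upper tail of the F distribution)
p_overall <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

round(c(r_squared = r2, adj_r_squared = adj_r2), 4)
p_overall
```

The overall p-value is the one used to decide between H0 and H1, since it tests both predictors together.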
This analysis has several limitations. First, the sample size is very small (n = 32). Second, all cars in the sample are 1973-74 models, so the results may not carry over to vehicles with more modern technology. Third, the model considers only two variables, when in reality numerous other factors affect fuel efficiency, such as transmission type, aerodynamics, and driving conditions. Lastly, because weight and displacement are so highly correlated (r = 0.89), it is harder to separate their individual effects.
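The overlap between weight and displacement can be quantified with a variance inflation factor (VIF). This was not part of the original analysis; the sketch below computes it by hand in base R, using the fact that with exactly two predictors the VIF for each one is 1 / (1 - r^2), where r is the correlation between them. The names `r_dw`, `vif`, and `r2_aux` are illustrative.

```r
# Load dataset
data(mtcars)

# Correlation between the two predictors
r_dw <- cor(mtcars$disp, mtcars$wt)

# VIF for either predictor in a two-predictor model
vif <- 1 / (1 - r_dw^2)
vif

# Equivalent check: R-squared from regressing one predictor on the other
# equals the squared correlation between them
r2_aux <- summary(lm(disp ~ wt, data = mtcars))$r.squared
c(r_squared_aux = r2_aux, r_squared_cor = r_dw^2)
```

The VIF comes out at about 4.7, meaning the variance of each coefficient estimate is roughly 4.7 times what it would be if the predictors were uncorrelated. Cutoffs for a "problematic" VIF are rules of thumb and vary by source, so this is best read as a description of the overlap, not a pass or fail test.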
This document was produced as a final project for MAT 143H - Introduction to Statistics (Honors) at North Shore Community College. The course was led by Professor Billy Jackson. Student Name: Lilyanna Romero Semester: MAT-143H-L01