#1. Introduction
In this project, I take the role of an analyst at a car maker with access to a dataset (mtcars) for studying the fuel economy of cars. My goal is to identify and understand the factors that contribute to better fuel economy. The dataset includes several variables related to fuel economy, such as horsepower, quarter-mile time, and rear axle ratio. Using these data, I develop models to predict fuel economy and draw out insights for improving vehicle efficiency.
#2. Data Loading
library(ggplot2)
library(tidyverse)
data(mtcars)
head(mtcars)  # first six rows of the dataset
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
#3. Pearson’s Correlation
## [1] -0.7761684
## [1] -0.7768859
## [1] -0.7082234
## [1] 0.418684
## [1] 0.6811719
##
## Pearson's product-moment correlation
##
## data: mtcars$mpg and mtcars$hp
## t = -6.7424, df = 30, p-value = 1.788e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.8852686 -0.5860994
## sample estimates:
## cor
## -0.7761684
##
## Pearson's product-moment correlation
##
## data: mtcars$mpg and mtcars$hp + mtcars$qsec + mtcars$drat
## t = -6.7581, df = 30, p-value = 1.713e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.8856589 -0.5872846
## sample estimates:
## cor
## -0.7768859
##
## Pearson's product-moment correlation
##
## data: mtcars$hp and mtcars$qsec
## t = -5.4946, df = 30, p-value = 5.766e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.8475998 -0.4774331
## sample estimates:
## cor
## -0.7082234
##
## Pearson's product-moment correlation
##
## data: mtcars$mpg and mtcars$qsec
## t = 2.5252, df = 30, p-value = 0.01708
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.08195487 0.66961864
## sample estimates:
## cor
## 0.418684
##
## Pearson's product-moment correlation
##
## data: mtcars$mpg and mtcars$drat
## t = 5.096, df = 30, p-value = 1.776e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4360484 0.8322010
## sample estimates:
## cor
## 0.6811719
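The report shows the correlation output but not the calls that produced it. The following is a minimal sketch that should reproduce the values above, assuming base R's `cor()` and `cor.test()` were used. The object names (`r_mpg_hp` and so on) are illustrative and do not come from the original script.

```r
# Sketch: Pearson correlations between fuel economy and candidate predictors.
data(mtcars)

r_mpg_hp   <- cor(mtcars$mpg, mtcars$hp)    # mpg vs horsepower
r_hp_qsec  <- cor(mtcars$hp,  mtcars$qsec)  # horsepower vs quarter-mile time
r_mpg_qsec <- cor(mtcars$mpg, mtcars$qsec)  # mpg vs quarter-mile time
r_mpg_drat <- cor(mtcars$mpg, mtcars$drat)  # mpg vs rear axle ratio

# Correlation of mpg with the SUM hp + qsec + drat. This is one combined
# variable, not a multiple correlation of mpg with three predictors.
r_mpg_sum <- cor(mtcars$mpg, mtcars$hp + mtcars$qsec + mtcars$drat)

# cor.test() adds the t statistic, p-value and 95% confidence interval.
test_mpg_hp <- cor.test(mtcars$mpg, mtcars$hp)
print(test_mpg_hp)
```

Because hp is measured in the tens to hundreds while qsec and drat are much smaller numbers, the sum is dominated by hp, which is why its correlation with mpg (-0.7769) is almost the same as that of hp alone (-0.7762).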
#4. Linear and Multiple Regression Models
In this section I write the general form of each regression model, fit it in R, and report the output. I start with a simple regression of fuel economy (mpg) on horsepower, then add quarter-mile time and rear axle ratio as predictors, both together and one at a time. I also regress horsepower on quarter-mile time and on rear axle ratio to see how strongly the predictors are related to each other. The interaction between horsepower and quarter-mile time is added in Section 5. For each model, the prediction equation follows from the coefficient estimates in the R output.
MPG = β0 + β1 * HP + ϵ
MPG: the dependent variable that I aim to predict.
HP: the independent variable used as a predictor.
β0: the intercept of the regression line, representing the predicted MPG when HP is 0.
β1: the slope coefficient, representing the change in MPG for each one-unit increase in HP.
ϵ: the error term, representing the difference between the observed MPG values and the values predicted by the model.
model <- lm(mpg ~ hp, data = mtcars)
summary(model)
##
## Call:
## lm(formula = mpg ~ hp, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7121 -2.1122 -0.8854 1.5819 8.2360
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.09886 1.63392 18.421 < 2e-16 ***
## hp -0.06823 0.01012 -6.742 1.79e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared: 0.6024, Adjusted R-squared: 0.5892
## F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07
## R-squared: 0.6024373
## Adjusted R-squared: 0.5891853
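The estimates above give the fitted line MPG-hat = 30.09886 - 0.06823 * HP. The sketch below rebuilds that equation from `coef()` and checks it against `predict()`. The 110 hp input and the names `hp_example`, `mpg_by_hand` and `mpg_by_model` are illustrative assumptions, not part of the original script.

```r
# Sketch: turn the fitted coefficients into a prediction equation.
model <- lm(mpg ~ hp, data = mtcars)
b <- coef(model)                       # named vector: (Intercept), hp

hp_example <- 110                      # illustrative input value
mpg_by_hand  <- b[["(Intercept)"]] + b[["hp"]] * hp_example
mpg_by_model <- unname(predict(model, newdata = data.frame(hp = hp_example)))

cat(sprintf("MPG-hat = %.5f + (%.5f) * HP\n", b[["(Intercept)"]], b[["hp"]]))
cat("Predicted mpg at", hp_example, "hp:", round(mpg_by_hand, 2), "\n")
```

Each additional unit of horsepower lowers the predicted fuel economy by about 0.068 mpg, and horsepower alone accounts for about 60% of the variance in mpg (R-squared 0.6024).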
MPG = β0 + β1 * HP + β2 * QSEC + β3 * DRAT + ϵ
MPG: the dependent variable (fuel economy) that I aim to predict.
HP: predictor variable for horsepower.
QSEC: predictor variable for quarter-mile time.
DRAT: predictor variable for rear axle ratio.
β0: the intercept of the regression line, representing the predicted MPG when all predictors are zero.
β1, β2, β3: coefficients representing the change in MPG for a one-unit increase in HP, QSEC, and DRAT, respectively, holding the other predictors constant.
ϵ: the error term, representing the difference between observed and predicted MPG values.
model1 <- lm(mpg ~ hp + qsec + drat, data = mtcars)
summary(model1)
##
## Call:
## lm(formula = mpg ~ hp + qsec + drat, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.7977 -2.4804 -0.4937 1.1381 7.3188
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.73662 13.01979 1.362 0.183968
## hp -0.05797 0.01421 -4.080 0.000339 ***
## qsec -0.28407 0.48923 -0.581 0.566116
## drat 4.42875 1.29169 3.429 0.001897 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.207 on 28 degrees of freedom
## Multiple R-squared: 0.7443, Adjusted R-squared: 0.7168
## F-statistic: 27.16 on 3 and 28 DF, p-value: 1.937e-08
## R-squared: 0.7442512
## Adjusted R-squared: 0.7168495
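Substituting the estimates from the summary above into the general form gives the fitted equation for this model (coefficients rounded as printed):

```latex
\widehat{\mathrm{MPG}} = 17.73662 - 0.05797\,\mathrm{HP} - 0.28407\,\mathrm{QSEC} + 4.42875\,\mathrm{DRAT}
```

Only the HP and DRAT terms are significant at the 5% level (p = 0.000339 and p = 0.001897). The QSEC coefficient (p = 0.566) cannot be distinguished from zero once the other two predictors are in the model.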
HP = β0 + β1 * QSEC + ϵ
HP: the dependent variable that I aim to predict.
QSEC: the independent variable used as a predictor.
β0: the intercept of the regression line, representing the predicted HP when QSEC is 0.
β1: the slope coefficient, representing the change in HP for each one-unit increase in QSEC.
ϵ: the error term, representing the difference between the observed HP values and the values predicted by the model.
model2 <- lm(hp ~ qsec, data = mtcars)
summary(model2)
##
## Call:
## lm(formula = hp ~ qsec, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -86.903 -33.629 5.336 27.925 100.032
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 631.704 88.700 7.122 6.38e-08 ***
## qsec -27.174 4.946 -5.495 5.77e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 49.2 on 30 degrees of freedom
## Multiple R-squared: 0.5016, Adjusted R-squared: 0.485
## F-statistic: 30.19 on 1 and 30 DF, p-value: 5.766e-06
## R-squared: 0.5015804
## Adjusted R-squared: 0.4849664
HP = β0 + β1 * DRAT + ϵ
HP: the dependent variable that I aim to predict.
DRAT: the independent variable used as a predictor.
β0: the intercept of the regression line, representing the predicted HP when DRAT is 0.
β1: the slope coefficient, representing the change in HP for each one-unit increase in DRAT.
ϵ: the error term, representing the difference between the observed HP values and the values predicted by the model.
model3 <- lm(hp ~ drat, data = mtcars)
summary(model3)
##
## Call:
## lm(formula = hp ~ drat, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -89.828 -40.261 -7.934 7.247 185.058
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 353.65 76.05 4.65 6.24e-05 ***
## drat -57.55 20.92 -2.75 0.00999 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 62.28 on 30 degrees of freedom
## Multiple R-squared: 0.2014, Adjusted R-squared: 0.1748
## F-statistic: 7.565 on 1 and 30 DF, p-value: 0.009989
## R-squared: 0.2013847
## Adjusted R-squared: 0.1747642
MPG = β0 + β1 * HP + β2 * QSEC + ϵ
MPG: the dependent variable (fuel economy) that I aim to predict.
HP: predictor variable for horsepower.
QSEC: predictor variable for quarter-mile time.
β0: the intercept, representing the predicted value of MPG when both HP and QSEC are zero.
β1: the coefficient for HP, representing the change in MPG for a one-unit increase in HP while holding QSEC constant.
β2: the coefficient for QSEC, representing the change in MPG for a one-unit increase in QSEC while holding HP constant.
ϵ: the error term, representing the difference between observed and predicted MPG values.
model4 <- lm(mpg ~ hp + qsec, data = mtcars)
summary(model4)
##
## Call:
## lm(formula = mpg ~ hp + qsec, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.1782 -2.6030 -0.5098 1.2866 8.7178
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48.32371 11.10331 4.352 0.000153 ***
## hp -0.08459 0.01393 -6.071 1.31e-06 ***
## qsec -0.88658 0.53459 -1.658 0.108007
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.755 on 29 degrees of freedom
## Multiple R-squared: 0.6369, Adjusted R-squared: 0.6118
## F-statistic: 25.43 on 2 and 29 DF, p-value: 4.176e-07
new_data <- data.frame(hp = 160, qsec = c(1, 2))
prediction_model <- predict(model4, newdata = new_data)
## 1 2
## 33.90224 33.01566
## 1
## 0.8865796
## Overall F-test (F statistic, numerator df, denominator df): 25.43136 2 29
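The three numbers printed for the overall F-test (25.43136, 2, 29) are the F statistic and its numerator and denominator degrees of freedom, as stored in `summary(model4)$fstatistic`. The p-value itself is not stored in the summary object. A minimal sketch of how it could be recovered with `pf()` follows; the helper name `overall_p` is hypothetical.

```r
# Sketch: overall F-test p-value for an lm fit.
overall_p <- function(fit) {
  f <- summary(fit)$fstatistic         # named vector: value, numdf, dendf
  # Upper-tail probability of the F distribution at the observed statistic.
  unname(pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE))
}

model4 <- lm(mpg ~ hp + qsec, data = mtcars)
p4 <- overall_p(model4)
cat("Overall F-test p-value:", signif(p4, 4), "\n")
```

The result should agree with the p-value of 4.176e-07 shown on the last line of the model summary.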
## Car Fitted
## Mazda RX4 Mazda RX4 24.425370
## Mazda RX4 Wag Mazda RX4 Wag 23.928885
## Datsun 710 Datsun 710 23.957305
## Hornet 4 Drive Hornet 4 Drive 21.783362
## Hornet Sportabout Hornet Sportabout 18.430337
## Valiant Valiant 21.514796
## Duster 360 Duster 360 13.554988
## Merc 240D Merc 240D 25.347344
## Merc 230 Merc 230 19.984693
## Merc 280 Merc 280 21.694354
## Merc 280C Merc 280C 21.162406
## Merc 450SE Merc 450SE 17.670472
## Merc 450SL Merc 450SL 17.493156
## Merc 450SLC Merc 450SLC 17.138524
## Cadillac Fleetwood Cadillac Fleetwood 15.041430
## Lincoln Continental Lincoln Continental 14.337352
## Chrysler Imperial Chrysler Imperial 13.423088
## Fiat 128 Fiat 128 25.478859
## Honda Civic Honda Civic 27.505412
## Toyota Corolla Toyota Corolla 25.182223
## Toyota Corona Toyota Corona 22.377722
## Dodge Challenger Dodge Challenger 20.678150
## AMC Javelin AMC Javelin 20.296921
## Camaro Z28 Camaro Z28 13.936217
## Pontiac Firebird Pontiac Firebird 18.403740
## Fiat X1-9 Fiat X1-9 25.984209
## Porsche 914-2 Porsche 914-2 25.819858
## Lotus Europa Lotus Europa 23.781496
## Ford Pantera L Ford Pantera L 13.135737
## Ferrari Dino Ferrari Dino 19.777938
## Maserati Bora Maserati Bora 7.040973
## Volvo 142E Volvo 142E 22.612682
## Car Residuals
## Mazda RX4 Mazda RX4 -3.42536974
## Mazda RX4 Wag Mazda RX4 Wag -2.92888515
## Datsun 710 Datsun 710 -1.15730529
## Hornet 4 Drive Hornet 4 Drive -0.38336246
## Hornet Sportabout Hornet Sportabout 0.26966269
## Valiant Valiant -3.41479557
## Duster 360 Duster 360 0.74501179
## Merc 240D Merc 240D -0.94734397
## Merc 230 Merc 230 2.81530738
## Merc 280 Merc 280 -2.49435367
## Merc 280C Merc 280C -3.36240589
## Merc 450SE Merc 450SE -1.27047184
## Merc 450SL Merc 450SL -0.19315591
## Merc 450SLC Merc 450SLC -1.93852406
## Cadillac Fleetwood Cadillac Fleetwood -4.64142957
## Lincoln Continental Lincoln Continental -3.93735187
## Chrysler Imperial Chrysler Imperial 1.27691194
## Fiat 128 Fiat 128 6.92114100
## Honda Civic Honda Civic 2.89458775
## Toyota Corolla Toyota Corolla 8.71777720
## Toyota Corona Toyota Corona -0.87772164
## Dodge Challenger Dodge Challenger -5.17815035
## AMC Javelin AMC Javelin -5.09692111
## Camaro Z28 Camaro Z28 -0.63621745
## Pontiac Firebird Pontiac Firebird 0.79626007
## Fiat X1-9 Fiat X1-9 1.31579062
## Porsche 914-2 Porsche 914-2 0.18014154
## Lotus Europa Lotus Europa 6.61850442
## Ford Pantera L Ford Pantera L 2.66426292
## Ferrari Dino Ferrari Dino -0.07793834
## Maserati Bora Maserati Bora 7.95902698
## Volvo 142E Volvo 142E -1.21268239
MPG = β0 + β1 * HP + β2 * DRAT + ϵ
MPG: the dependent variable (fuel economy) that I aim to predict.
HP: predictor variable for horsepower.
DRAT: predictor variable for rear axle ratio.
β0: the intercept, representing the predicted value of MPG when both HP and DRAT are zero.
β1: the coefficient for HP, representing the change in MPG for a one-unit increase in HP while holding DRAT constant.
β2: the coefficient for DRAT, representing the change in MPG for a one-unit increase in DRAT while holding HP constant.
ϵ: the error term, representing the difference between observed and predicted MPG values.
model5 <- lm(mpg ~ hp + drat, data = mtcars)
summary(model5)
##
## Call:
## lm(formula = mpg ~ hp + drat, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.0369 -2.3487 -0.6034 1.1897 7.7500
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.789861 5.077752 2.125 0.042238 *
## hp -0.051787 0.009293 -5.573 5.17e-06 ***
## drat 4.698158 1.191633 3.943 0.000467 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.17 on 29 degrees of freedom
## Multiple R-squared: 0.7412, Adjusted R-squared: 0.7233
## F-statistic: 41.52 on 2 and 29 DF, p-value: 3.081e-09
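With four mpg models fitted so far, it may help to line up their adjusted R-squared values, which penalise extra predictors. This is a sketch that refits the models with the same formulas as above; the list name `fits` and the vector name `adj_r2` are illustrative.

```r
# Sketch: compare adjusted R-squared across the mpg models fitted so far.
fits <- list(
  "mpg ~ hp"               = lm(mpg ~ hp, data = mtcars),
  "mpg ~ hp + qsec"        = lm(mpg ~ hp + qsec, data = mtcars),
  "mpg ~ hp + drat"        = lm(mpg ~ hp + drat, data = mtcars),
  "mpg ~ hp + qsec + drat" = lm(mpg ~ hp + qsec + drat, data = mtcars)
)

# Extract adjusted R-squared from each summary and sort from best to worst.
adj_r2 <- sapply(fits, function(fit) summary(fit)$adj.r.squared)
print(round(sort(adj_r2, decreasing = TRUE), 4))
```

On this measure the two-predictor model with HP and DRAT (0.7233) edges out the three-predictor model (0.7168), which is consistent with QSEC not being significant once HP and DRAT are included.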
To see the actual change, I predict MPG for two hypothetical cars that have the same horsepower and differ by one unit of DRAT.
new_data <- data.frame(hp = 160, drat = c(1, 2))
prediction_model2 <- predict(model5, newdata = new_data)
## 1 2
## 7.202155 11.900313
## 1
## -4.698158
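The value -4.698158 above is the first prediction minus the second, that is, MPG at DRAT = 1 minus MPG at DRAT = 2. Reversing the subtraction gives the effect of a one-unit increase in DRAT, which in an additive linear model equals the DRAT coefficient. A short sketch of that check follows; `change_per_unit` is an illustrative name. DRAT values of 1 and 2 lie outside the 2.76 to 4.93 range observed in mtcars, so only the difference, not the predicted levels, should be interpreted.

```r
# Sketch: a one-unit change in drat shifts the prediction by the drat coefficient.
model5 <- lm(mpg ~ hp + drat, data = mtcars)
new_data <- data.frame(hp = 160, drat = c(1, 2))
pred <- predict(model5, newdata = new_data)

# Second prediction minus first: effect of increasing drat by one unit.
change_per_unit <- unname(pred[2] - pred[1])
cat("Change in predicted mpg per unit of drat:", change_per_unit, "\n")
cat("drat coefficient:                        ", coef(model5)[["drat"]], "\n")
```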
## Overall F-test (F statistic, numerator df, denominator df): 41.52167 2 29
#5. Model with Interaction Term and Qualitative Predictor
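General form: MPG = β0 + β1⋅HP + β2⋅QSEC + β3⋅(HP⋅QSEC) + β4⋅CYL6 + β5⋅CYL8 + ϵ
Prediction equation: MPG-hat = β̂0 + β̂1⋅HP + β̂2⋅QSEC + β̂3⋅(HP⋅QSEC) + β̂4⋅CYL6 + β̂5⋅CYL8
Here CYL6 and CYL8 are indicator variables equal to 1 for six-cylinder and eight-cylinder cars, respectively, with four-cylinder cars as the baseline. The prediction equation uses the estimated coefficients and has no error term.
model6 <- lm(mpg ~ hp + qsec + qsec:hp + factor(cyl), data = mtcars)
summary(model6)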
##
## Call:
## lm(formula = mpg ~ hp + qsec + qsec:hp + factor(cyl), data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.0004 -1.6264 -0.2424 1.3322 5.7974
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.505565 13.186080 1.858 0.0745 .
## hp 0.141850 0.079164 1.792 0.0848 .
## qsec 0.531630 0.746717 0.712 0.4828
## factor(cyl)6 -4.408372 1.627676 -2.708 0.0118 *
## factor(cyl)8 -4.580823 2.555742 -1.792 0.0847 .
## hp:qsec -0.012526 0.005251 -2.386 0.0246 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.692 on 26 degrees of freedom
## Multiple R-squared: 0.8327, Adjusted R-squared: 0.8005
## F-statistic: 25.88 on 5 and 26 DF, p-value: 2.526e-09
## Overall F-test (F statistic, numerator df, denominator df): 25.88205 5 26
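Substituting the estimates above gives the fitted equation, and a worked example shows how the interaction and the cylinder indicators enter a prediction. The inputs (175 hp, a 14.2-second quarter mile, six cylinders) are the same values used for the model6 prediction in Section 6. Coefficients are rounded as printed, so the last digits are approximate.

```latex
\begin{aligned}
\widehat{\mathrm{MPG}} &= 24.505565 + 0.141850\,\mathrm{HP} + 0.531630\,\mathrm{QSEC}
  - 0.012526\,(\mathrm{HP}\cdot\mathrm{QSEC}) - 4.408372\,\mathrm{CYL6} - 4.580823\,\mathrm{CYL8} \\[4pt]
\widehat{\mathrm{MPG}}\,(\mathrm{HP}=175,\ \mathrm{QSEC}=14.2,\ \mathrm{CYL6}=1)
  &= 24.505565 + 24.823750 + 7.549146 - 31.127110 - 4.408372 \approx 21.34 \\[4pt]
\frac{\partial\,\widehat{\mathrm{MPG}}}{\partial\,\mathrm{HP}} &= 0.141850 - 0.012526\,\mathrm{QSEC}
  \quad\Rightarrow\quad 0.141850 - 0.012526 \times 14.2 \approx -0.036
\end{aligned}
```

The positive HP main effect is therefore not the effect of horsepower on its own: it applies only at QSEC = 0. The HP slope turns negative once QSEC exceeds 0.141850 / 0.012526, or about 11.3 seconds, which covers every car in mtcars (the fastest quarter-mile time in the data is 14.5 seconds).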
#6. Prediction Models
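new_data1 <- data.frame(hp = 175, qsec = 14.2, drat = 3.91)
prediction_model3 <- predict(model1, newdata = new_data1)  # the argument must be named newdata; a misspelled name such as newdata1 is silently ignored and predict() returns fitted values for all 32 cars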
## Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
## 23.955867 23.796785 24.109210 19.477748
## Hornet Sportabout Valiant Duster 360 Merc 240D
## 16.706976 18.128834 13.249800 24.802909
## Merc 230 Merc 280 Merc 280C Merc 450SE
## 23.084598 22.768097 22.597652 15.954863
## Merc 450SL Merc 450SLC Cadillac Fleetwood Lincoln Continental
## 15.898048 15.784418 13.720749 13.496484
## Chrysler Imperial Fiat 128 Honda Civic Toyota Corolla
## 13.759132 26.448790 31.294723 27.004637
## Toyota Corona Dodge Challenger AMC Javelin Camaro Z28
## 22.815301 16.471698 18.076760 15.674904
## Pontiac Firebird Fiat X1-9 Porsche 914-2 Lotus Europa
## 16.388441 26.610713 27.336415 23.081217
## Ford Pantera L Ferrari Dino Maserati Bora Volvo 142E
## 17.002014 19.220283 9.845972 24.335959
## Car Predict
## Mazda RX4 Mazda RX4 23.955867
## Mazda RX4 Wag Mazda RX4 Wag 23.796785
## Datsun 710 Datsun 710 24.109210
## Hornet 4 Drive Hornet 4 Drive 19.477748
## Hornet Sportabout Hornet Sportabout 16.706976
## Valiant Valiant 18.128834
## Duster 360 Duster 360 13.249800
## Merc 240D Merc 240D 24.802909
## Merc 230 Merc 230 23.084598
## Merc 280 Merc 280 22.768097
## Merc 280C Merc 280C 22.597652
## Merc 450SE Merc 450SE 15.954863
## Merc 450SL Merc 450SL 15.898048
## Merc 450SLC Merc 450SLC 15.784418
## Cadillac Fleetwood Cadillac Fleetwood 13.720749
## Lincoln Continental Lincoln Continental 13.496484
## Chrysler Imperial Chrysler Imperial 13.759132
## Fiat 128 Fiat 128 26.448790
## Honda Civic Honda Civic 31.294723
## Toyota Corolla Toyota Corolla 27.004637
## Toyota Corona Toyota Corona 22.815301
## Dodge Challenger Dodge Challenger 16.471698
## AMC Javelin AMC Javelin 18.076760
## Camaro Z28 Camaro Z28 15.674904
## Pontiac Firebird Pontiac Firebird 16.388441
## Fiat X1-9 Fiat X1-9 26.610713
## Porsche 914-2 Porsche 914-2 27.336415
## Lotus Europa Lotus Europa 23.081217
## Ford Pantera L Ford Pantera L 17.002014
## Ferrari Dino Ferrari Dino 19.220283
## Maserati Bora Maserati Bora 9.845972
## Volvo 142E Volvo 142E 24.335959
## fit lwr upr
## Mazda RX4 23.955867 16.976208 30.93553
## Mazda RX4 Wag 23.796785 16.948899 30.64467
## Datsun 710 24.109210 17.370295 30.84813
## Hornet 4 Drive 19.477748 12.583415 26.37208
## Hornet Sportabout 16.706976 9.933170 23.48078
## Valiant 18.128834 10.962522 25.29515
## Duster 360 13.249800 6.362600 20.13700
## Merc 240D 24.802909 17.944540 31.66128
## Merc 230 23.084598 15.121567 31.04763
## Merc 280 22.768097 16.053721 29.48247
## Merc 280C 22.597652 15.834798 29.36051
## Merc 450SE 15.954863 9.179293 22.73043
## Merc 450SL 15.898048 9.126116 22.66998
## Merc 450SLC 15.784418 9.001975 22.56686
## Cadillac Fleetwood 13.720749 6.835247 20.60625
## Lincoln Continental 13.496484 6.606825 20.38614
## Chrysler Imperial 13.759132 6.854298 20.66397
## Fiat 128 26.448790 19.608904 33.28868
## Honda Civic 31.294723 23.994106 38.59534
## Toyota Corolla 27.004637 20.081856 33.92742
## Toyota Corona 22.815301 15.984610 29.64599
## Dodge Challenger 16.471698 9.291140 23.65226
## AMC Javelin 18.076760 11.259460 24.89406
## Camaro Z28 15.674904 8.690133 22.65967
## Pontiac Firebird 16.388441 9.587063 23.18982
## Fiat X1-9 26.610713 19.773972 33.44745
## Porsche 914-2 27.336415 20.237929 34.43490
## Lotus Europa 23.081217 16.212903 29.94953
## Ford Pantera L 17.002014 9.576316 24.42771
## Ferrari Dino 19.220283 12.308670 26.13190
## Maserati Bora 9.845972 2.239377 17.45257
## Volvo 142E 24.335959 17.554492 31.11743
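The listings above contain 32 rows, one per car in mtcars, which is what `predict()` returns when it receives no `newdata`: they are fitted values and intervals for the training data. For the single new car (175 hp, 14.2-second quarter mile, rear axle ratio 3.91), a sketch of the intended call is shown below. The argument must be named `newdata`, and `interval = "prediction"` is an assumption about the interval type the author wanted. The names `pred_new` and `by_hand` are illustrative.

```r
# Sketch: point prediction and 95% prediction interval for one new car.
model1 <- lm(mpg ~ hp + qsec + drat, data = mtcars)
new_data1 <- data.frame(hp = 175, qsec = 14.2, drat = 3.91)

pred_new <- predict(model1, newdata = new_data1,
                    interval = "prediction", level = 0.95)
print(pred_new)                        # 1 x 3 matrix: fit, lwr, upr

# Same point prediction computed by hand from the coefficients.
b <- coef(model1)
by_hand <- b[["(Intercept)"]] + b[["hp"]] * 175 +
  b[["qsec"]] * 14.2 + b[["drat"]] * 3.91
cat("Point prediction by hand:", round(by_hand, 2), "\n")
```

Using the rounded coefficients from the model1 summary, the point prediction works out to roughly 17.737 - 0.058 * 175 - 0.284 * 14.2 + 4.429 * 3.91, or about 20.9 mpg.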
new_data2 <- data.frame(hp = 175, qsec = 14.2, cyl = 6)  # cyl is numeric in mtcars; factor(cyl) in the model handles the conversion
prediction_model3 <- predict(model6, newdata = new_data2)
## 1
## 21.34244
## Car Predict
## 1 1 21.34244