I am a data scientist at MTcars, currently working on a piece about the automobile industry for a magazine called Motor Trend. This project looks at a data set of cars: my team is interested in how a set of variables relates to fuel efficiency, measured as the miles traveled per gallon of traditional fuel (MPG). This research proceeds through the following steps:
This research will attempt to answer the following research questions:
This study uses the Motor Trend Car Road Tests (mtcars) data set, which was extracted from the 1974 Motor Trend magazine. The data frame records fuel consumption and ten other aspects of design and performance for thirty-two automobiles.
Here is further information about the variables included in the dataset:
Let’s begin by loading the library and the data frame:
library(datasets)
data(mtcars)
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
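As a quick sanity check on the size and storage types of the data, the dimensions and column classes can be inspected directly. This is a minimal sketch using only base R and the built-in copy of mtcars:

```r
data(mtcars)

# Number of rows (cars) and columns (variables)
dim(mtcars)

# Storage class of each column; in the base R copy every column is numeric
sapply(mtcars, class)
```

Every column is stored as a number, including the ones that are categorical in meaning.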
Based on the summary, we have the following variables with the following specifications:
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
| Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
| Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 |
| Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 |
| Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 |
| Merc 280C | 17.8 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.90 | 1 | 0 | 4 | 4 |
| Merc 450SE | 16.4 | 8 | 275.8 | 180 | 3.07 | 4.070 | 17.40 | 0 | 0 | 3 | 3 |
| Merc 450SL | 17.3 | 8 | 275.8 | 180 | 3.07 | 3.730 | 17.60 | 0 | 0 | 3 | 3 |
| Merc 450SLC | 15.2 | 8 | 275.8 | 180 | 3.07 | 3.780 | 18.00 | 0 | 0 | 3 | 3 |
| Cadillac Fleetwood | 10.4 | 8 | 472.0 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 |
| Lincoln Continental | 10.4 | 8 | 460.0 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |
| Chrysler Imperial | 14.7 | 8 | 440.0 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 |
| Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.200 | 19.47 | 1 | 1 | 4 | 1 |
| Honda Civic | 30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 |
| Toyota Corolla | 33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.90 | 1 | 1 | 4 | 1 |
| Toyota Corona | 21.5 | 4 | 120.1 | 97 | 3.70 | 2.465 | 20.01 | 1 | 0 | 3 | 1 |
| Dodge Challenger | 15.5 | 8 | 318.0 | 150 | 2.76 | 3.520 | 16.87 | 0 | 0 | 3 | 2 |
| AMC Javelin | 15.2 | 8 | 304.0 | 150 | 3.15 | 3.435 | 17.30 | 0 | 0 | 3 | 2 |
| Camaro Z28 | 13.3 | 8 | 350.0 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 |
| Pontiac Firebird | 19.2 | 8 | 400.0 | 175 | 3.08 | 3.845 | 17.05 | 0 | 0 | 3 | 2 |
| Fiat X1-9 | 27.3 | 4 | 79.0 | 66 | 4.08 | 1.935 | 18.90 | 1 | 1 | 4 | 1 |
| Porsche 914-2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.140 | 16.70 | 0 | 1 | 5 | 2 |
| Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.90 | 1 | 1 | 5 | 2 |
| Ford Pantera L | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.170 | 14.50 | 0 | 1 | 5 | 4 |
| Ferrari Dino | 19.7 | 6 | 145.0 | 175 | 3.62 | 2.770 | 15.50 | 0 | 1 | 5 | 6 |
| Maserati Bora | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.570 | 14.60 | 0 | 1 | 5 | 8 |
| Volvo 142E | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.780 | 18.60 | 1 | 1 | 4 | 2 |
We can clearly see that the vehicles have 4, 6, or 8 cylinders. Given the research questions, we need to pay attention to the cyl variable to see how fuel efficiency differs by cylinder count. We are also interested in fuel efficiency by transmission type: each vehicle has either an automatic (coded 0) or a manual (coded 1) transmission.
Along the way we will also see how the impact of these variables changes when we include other variables in the model, for example engine type, weight, gross horsepower, number of forward gears, or number of carburetors. The vs variable is binary (0 = V-shaped, 1 = straight engine); gear and carb are categorical variables giving the number of forward gears and the number of carburetors, respectively. Similarly, wt (the weight of the vehicle), hp (gross horsepower), and mpg (miles per US gallon) are continuous variables.
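The coding of these variables can be checked with simple frequency tables. The following is a minimal sketch with base R; the object names cyl_counts and am_counts are illustrative only:

```r
data(mtcars)

# How many cars fall into each cylinder class (4, 6, 8)
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)

# How many cars are automatic (0) versus manual (1)
am_counts <- table(mtcars$am)
print(am_counts)

# Cross-tabulate transmission against engine shape (vs: 0 = V-shaped, 1 = straight)
print(table(am = mtcars$am, vs = mtcars$vs))
```

The cross-tabulation is useful later on, because it shows whether both engine types occur within each transmission group.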
Let’s check the correlation between the variables included in the dataset:
pairs(mtcars, panel=panel.smooth, main="Pairwise Relationship between MTCARS Variables")
The graph shows that most of the variables have some sort of linear relationship with one another. It looks like automobiles with more cylinders get fewer miles per gallon. Similarly, cars with automatic transmissions have lower mpg than cars with manual transmissions. Likewise, fuel efficiency seems to fall as horsepower, weight, displacement, and number of carburetors increase. Conversely, the rear axle ratio, 1/4 mile time, and engine type seem to have positive linear relationships with mpg.
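The visual impression can be backed with numbers. The sketch below goes a little beyond the scatterplot matrix and computes the Pearson correlation of every other variable with mpg; the names cor_matrix and mpg_cor are just illustrative choices:

```r
data(mtcars)

# Full Pearson correlation matrix, then keep only the mpg row
cor_matrix <- cor(mtcars)
mpg_cor <- cor_matrix["mpg", colnames(cor_matrix) != "mpg"]

# Sort from most negative to most positive for easier reading
print(round(sort(mpg_cor), 2))
```

Negative values mark variables where higher values go with lower mpg; positive values mark the opposite.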
Let’s check whether they really behave the way they appear to.
There are some categorical variables in the dataset. We need to convert them into factor variables before we can conduct a regression analysis.
# Recode the categorical variables as labelled factors
transmission <- as.factor(mtcars$am)
levels(transmission) <- c("Automatic", "Manual")   # am: 0 = automatic, 1 = manual
cylinder <- as.factor(mtcars$cyl)
levels(cylinder) <- c("4cyl", "6cyl", "8cyl")
engine <- as.factor(mtcars$vs)
levels(engine) <- c("V-Shaped", "Straight")        # vs: 0 = V-shaped, 1 = straight
GEAR <- as.factor(mtcars$gear)
CARB <- as.factor(mtcars$carb)
fit<-lm(mpg~am, data=mtcars)
summary(fit)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## am 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
As the results above show, the model is statistically significant. The intercept, 17.147, is the average miles per gallon for cars with automatic transmission (am = 0), not for all cars. Compared to cars with automatic transmission, cars with manual transmission had better fuel efficiency: on average they travel 7.245 more miles per gallon of traditional fuel, and this difference is statistically significantly greater than zero.
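Because am is a 0/1 variable, the regression coefficients can be reproduced from the two group means. A small sketch of that check follows; the object names group_means and coefs are illustrative:

```r
data(mtcars)

# Mean mpg within each transmission group (0 = automatic, 1 = manual)
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
print(group_means)

# Refit the simple regression and pull out its coefficients
fit <- lm(mpg ~ am, data = mtcars)
coefs <- coef(fit)

# Intercept = automatic mean; slope = manual mean minus automatic mean
print(c(intercept = unname(coefs["(Intercept)"]),
        automatic_mean = unname(group_means["0"])))
print(c(slope = unname(coefs["am"]),
        difference = unname(group_means["1"] - group_means["0"])))
```

Each printed pair should agree, which is what lets us read the slope as the difference in average mpg between the two transmission types.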
The following box plot supports this finding.
library(ggplot2)
g <- ggplot(mtcars, aes(x = transmission, y = mpg)) +
  geom_boxplot(aes(fill = transmission)) +
  xlab("Cars by Transmission Types") +
  ylab("Miles per Gallon")
print(g)
Clearly, automatic automobiles get a median of approximately 17 miles per gallon, while manual automobiles get slightly more than 22. If you want to drive a car and save some dimes at the same time, you gotta walk away with a manual transmission.
library(ggplot2)
fit1<-lm(mpg~vs, data=mtcars)
summary(fit1)
##
## Call:
## lm(formula = mpg ~ vs, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.757 -3.082 -1.267 2.828 9.383
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.617 1.080 15.390 8.85e-16 ***
## vs 7.940 1.632 4.864 3.42e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.581 on 30 degrees of freedom
## Multiple R-squared: 0.4409, Adjusted R-squared: 0.4223
## F-statistic: 23.66 on 1 and 30 DF, p-value: 3.416e-05
g1 <- ggplot(mtcars, aes(x = engine, y = mpg)) +
  geom_boxplot(aes(fill = engine)) +
  xlab("Cars by Engine Types") +
  ylab("Miles per Gallon")
print(g1)
The results show that the average for cars with a V-shaped engine (the intercept, vs = 0) is 16.617 miles per gallon. Compared to these, straight-engine cars have statistically significantly higher fuel efficiency, by 7.94 miles per gallon. The box plot supports this result: straight-engine automobiles have a median of slightly more than 22 miles per gallon, while V-shaped cars get a little more than 15.
Now, let’s use both transmission and engine type as predictors of mpg and see how the aforementioned statistics change.
fit2<-lm(mpg~am+vs, data=mtcars)
summary(fit2)
##
## Call:
## lm(formula = mpg ~ am + vs, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1905 -2.5988 0.2222 2.7315 6.3095
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.5944 0.9262 15.758 9.35e-16 ***
## am 6.0667 1.2748 4.759 4.96e-05 ***
## vs 6.9294 1.2621 5.490 6.50e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.491 on 29 degrees of freedom
## Multiple R-squared: 0.6861, Adjusted R-squared: 0.6644
## F-statistic: 31.69 on 2 and 29 DF, p-value: 5.056e-08
The results show that this model is statistically significant. Comparing R-squared across these models, the model with both predictors has the best value, 0.6861, against 0.3598 and 0.4409 for the single-predictor models. The intercept has dropped to 14.59, which is the predicted miles per gallon for the baseline group: cars with an automatic transmission and a V-shaped engine. Both transmission type and engine type are statistically significant predictors of fuel efficiency in a car.
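To see what the two-predictor model implies, it can be evaluated for each combination of transmission and engine type. This is a sketch using predict(); the names combos and r2 are hypothetical and not part of the analysis above:

```r
data(mtcars)
fit2 <- lm(mpg ~ am + vs, data = mtcars)

# All four combinations of transmission (am) and engine shape (vs)
combos <- expand.grid(am = c(0, 1), vs = c(0, 1))

# Predicted mpg for each combination under the additive model
combos$predicted_mpg <- predict(fit2, newdata = combos)
print(combos)

# R-squared of the one- and two-predictor models side by side
r2 <- c(am_only = summary(lm(mpg ~ am, data = mtcars))$r.squared,
        vs_only = summary(lm(mpg ~ vs, data = mtcars))$r.squared,
        am_and_vs = summary(fit2)$r.squared)
print(round(r2, 4))
```

The first row of combos (am = 0, vs = 0) reproduces the intercept, and the other rows add the am and vs coefficients to it.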
library(car)
## Loading required package: carData
car::vif(fit2)
## am vs
## 1.029167 1.029167
The variance inflation factor (VIF) measures how much the variance of a coefficient estimate increases because that predictor is correlated with the other predictors in the model. Here both values are about 1.03, very close to 1, so we see practically no inflation after we include vs in the model. So it’s okay, in fact better, to keep both predictors in the model.
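For reference, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the remaining predictors. The sketch below computes it by hand without the car package; r2_am and vif_am are illustrative names:

```r
data(mtcars)

# Regress one predictor on the other and take the R-squared
r2_am <- summary(lm(am ~ vs, data = mtcars))$r.squared

# Variance inflation factor for am; with only two predictors, vs gets the same value
vif_am <- 1 / (1 - r2_am)
print(vif_am)
```

A value near 1 means am and vs share very little linear information, so neither coefficient loses much precision from having the other in the model.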
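anova(fit, fit2)   # does adding vs improve on the am-only model?
anova(fit1, fit2)  # does adding am improve on the vs-only model?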
The decision that the two-predictor model is better than a one-predictor model is confirmed by a further ANOVA comparison. Based on the results, adding the second predictor gives a statistically significant reduction in the residual sum of squares, showing that the two-predictor model fits better.
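When two nested models differ by a single predictor, the ANOVA F statistic equals the square of that predictor's t statistic in the larger model. A sketch of that link, comparing the am-only model with the two-predictor model (cmp, f_stat and t_vs are illustrative names):

```r
data(mtcars)
fit  <- lm(mpg ~ am, data = mtcars)
fit2 <- lm(mpg ~ am + vs, data = mtcars)

# F test for adding vs to the am-only model
cmp <- anova(fit, fit2)
print(cmp)

# The F statistic matches the squared t value of vs in the larger model
f_stat <- cmp$F[2]
t_vs <- summary(fit2)$coefficients["vs", "t value"]
print(c(F = f_stat, t_squared = t_vs^2))
```

This is why the significance of vs in the summary of the two-predictor model and the ANOVA comparison tell the same story.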
par(mfrow=c(2,2))
plot(fit2)
Looking at the upper left-hand plot, Residuals vs. Fitted, the residuals fall both above and below the horizontal reference line at 0 without a clear pattern, suggesting that they average around 0. In the Normal Q-Q plot, the points lie roughly along the 45-degree line, which suggests the residuals are approximately normally distributed.
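The visual checks can be supplemented numerically. The sketch below is an addition, not part of the analysis above: it confirms that the residuals average to zero and runs a Shapiro-Wilk test (shapiro.test from base R) as one possible formal check of normality.

```r
data(mtcars)
fit2 <- lm(mpg ~ am + vs, data = mtcars)
res <- residuals(fit2)

# With an intercept in the model, least-squares residuals average to zero
print(mean(res))

# Shapiro-Wilk test: a small p-value would point to non-normal residuals
sw <- shapiro.test(res)
print(sw)
```

With only thirty-two cars the test has limited power, so it is best read alongside the Q-Q plot rather than instead of it.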
Thanks