Executive Summary

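This work analyses the “mtcars” dataset mainly to answer two questions: which transmission type, automatic or manual, gives better fuel efficiency (mpg), and how large is the difference between them?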

The mtcars dataset contains details of 32 car models based on 11 variables.

Preliminary data analysis indicates that cars with manual transmission show much better mpg than cars with automatic transmission. The average mpg of manual cars is 7.245 mpg higher than that of automatic cars, which makes them 42.25% more efficient on average. Further analysis shows that other variables, such as the weight of the car, also heavily affect fuel efficiency.

Exploratory Data Analysis and Preprocessing

Here, we first import the dataset.

data(mtcars)

View the first few rows to get a general feel for the data.

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Here it can be observed that the rows represent different models of cars. The columns represent different variables.

summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
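The summary treats am and vs as ordinary numbers, although both only take the values 0 and 1. A minimal sketch of one possible preprocessing step follows: it labels am as a factor in a copy of the data, assuming (as the labels used elsewhere in this report do) that 0 means automatic and 1 means manual. The names cars and transmission are illustrative and not part of the original analysis.

```r
# Work on a copy so the built-in dataset is left untouched.
data(mtcars)
cars <- mtcars

# Assumption: am == 0 is automatic and am == 1 is manual (see ?mtcars).
cars$transmission <- factor(cars$am, levels = c(0, 1),
                            labels = c("Automatic", "Manual"))

# Count the cars in each transmission group.
transmission_counts <- table(cars$transmission)
print(transmission_counts)
```

Keeping the labelled factor in its own column leaves the numeric am column available for the regressions.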

Preliminary Data Analysis

Here we check the direct effect of transmission type on average mpg.

Avg_MPG <- aggregate(mpg ~ factor(mtcars$am,labels=c('Automatic','Manual')),
                     data = mtcars, mean)
Avg_MPG
##   factor(mtcars$am, labels = c("Automatic", "Manual"))      mpg
## 1                                            Automatic 17.14737
## 2                                               Manual 24.39231

Next, we calculate the difference between the two mean mpg values and the corresponding percentage increase.

Overall_Increase <- Avg_MPG[2, 2] - Avg_MPG[1, 2]
print(Overall_Increase)
## [1] 7.244939
Percentage_Increase <- ((Avg_MPG[2, 2] - Avg_MPG[1, 2])/Avg_MPG[1, 2])*100
Percentage_Increase
## [1] 42.25103

We can see here that, in this direct comparison, cars with manual transmission are much more fuel efficient on average than cars with automatic transmission.
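Whether a gap of this size could be due to chance in a sample of 32 cars can be checked with a two-sample test. The sketch below uses Welch's t-test from base R. It is one reasonable choice rather than a step taken in the original analysis, and it assumes roughly normal mpg values within each group.

```r
data(mtcars)

# Welch two-sample t-test of mpg by transmission type (am: 0 = automatic, 1 = manual).
mpg_test <- t.test(mpg ~ am, data = mtcars)

# Group means, the 95% confidence interval for their difference, and the p-value.
print(mpg_test$estimate)
print(mpg_test$conf.int)
print(mpg_test$p.value)
```

The interval is reported for automatic minus manual, so a manual advantage shows up as negative values.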

Regression Analysis

From the above analysis, we can hypothesize that an automatic transmission adversely affects mpg. This can be explored further by fitting a linear regression model to the data.

summary(lm(mpg ~ am, data = mtcars))
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am             7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

It can be observed here that the R-squared value is approximately 36%. In other words, am alone explains only about a third of the variance in mpg, so factors other than am must account for the rest.
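In a regression on a single 0/1 predictor, the intercept is the mean of the group coded 0 and the slope is the difference between the two group means. That is why the am estimate above matches the 7.245 mpg gap found earlier. A short sketch that checks this and extracts R-squared directly; the object names are illustrative.

```r
data(mtcars)
fit_am <- lm(mpg ~ am, data = mtcars)

# Difference between the manual (am == 1) and automatic (am == 0) group means.
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
mean_gap <- unname(group_means["1"] - group_means["0"])

# The slope on am should equal that difference.
slope_am <- unname(coef(fit_am)["am"])
print(c(mean_gap = mean_gap, slope_am = slope_am))

# Share of the variance in mpg explained by am alone.
r_squared <- summary(fit_am)$r.squared
print(r_squared)
```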

Next, we fit a multiple regression on all the variables in the dataset and check the coefficients.

summary(lm(mpg~., data = mtcars))$coefficients
##                Estimate  Std. Error    t value   Pr(>|t|)
## (Intercept) 12.30337416 18.71788443  0.6573058 0.51812440
## cyl         -0.11144048  1.04502336 -0.1066392 0.91608738
## disp         0.01333524  0.01785750  0.7467585 0.46348865
## hp          -0.02148212  0.02176858 -0.9868407 0.33495531
## drat         0.78711097  1.63537307  0.4813036 0.63527790
## wt          -3.71530393  1.89441430 -1.9611887 0.06325215
## qsec         0.82104075  0.73084480  1.1234133 0.27394127
## vs           0.31776281  2.10450861  0.1509915 0.88142347
## am           2.52022689  2.05665055  1.2254035 0.23398971
## gear         0.65541302  1.49325996  0.4389142 0.66520643
## carb        -0.19941925  0.82875250 -0.2406258 0.81217871

It can be seen here that variables other than am heavily affect mpg. For example, wt has a strong negative effect: holding the other variables constant, each additional unit of wt (1000 lbs) is associated with a drop of about 3.7 mpg. Fuel efficiency therefore falls as weight increases.
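One likely reason that no single coefficient stands out in the full model is that several predictors carry overlapping information, which inflates the standard errors. The sketch below is an added check rather than part of the original analysis. It looks at how strongly wt correlates with a few engine-related variables.

```r
data(mtcars)

# Pairwise Pearson correlations between weight and related predictors.
wt_correlations <- cor(mtcars[, c("wt", "disp", "cyl", "hp")])
print(round(wt_correlations, 2))
```

Correlations close to 1 or -1 mean the model cannot cleanly separate the effects of those variables.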

Let us fit the model again using the three variables with the largest absolute t values (and so the smallest p-values) in the full model: wt, am and qsec.
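Picking variables from a coefficient table is one option. An automated alternative is stepwise selection by AIC with step() from base R's stats package. The sketch below is a different technique from the one used in this report and is shown only as a cross-check. Stepwise selection has well-known limitations, so its result should not be treated as definitive.

```r
data(mtcars)

# Start from the model with every predictor and let AIC decide which to drop or re-add.
full_model <- lm(mpg ~ ., data = mtcars)
selected_model <- step(full_model, direction = "both", trace = 0)

# Formula of the model that step() settled on.
print(formula(selected_model))
```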

summary(lm(mpg ~ am + wt + qsec, data = mtcars))
## 
## Call:
## lm(formula = mpg ~ am + wt + qsec, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## am            2.9358     1.4109   2.081 0.046716 *  
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11

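It can be seen that the R-squared value is almost 85%, and all three predictors are significant at the 5% level. With wt and qsec held constant, the estimated advantage of a manual transmission shrinks to about 2.94 mpg (p = 0.047).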
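Two quick follow-ups help put this model in context: an F-test comparing it with the am-only model, and a confidence interval for the am coefficient. Both use base R functions (anova() and confint()). The object names are illustrative.

```r
data(mtcars)
fit_am    <- lm(mpg ~ am, data = mtcars)
fit_three <- lm(mpg ~ am + wt + qsec, data = mtcars)

# F-test: do wt and qsec jointly improve on the am-only model?
comparison <- anova(fit_am, fit_three)
print(comparison)

# 95% confidence interval for the manual-vs-automatic difference,
# holding wt and qsec constant.
am_interval <- confint(fit_three, "am", level = 0.95)
print(am_interval)
```

An interval that excludes zero agrees with the am coefficient being significant at the 5% level.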

Discussion

From the above analysis, it can be inferred that an automatic transmission is associated with lower mpg. Transmission type is not the only factor, however: the weight of the car and the 1/4 mile time also affect mpg.

It is fairly straightforward to see that an increase in weight puts more load on the engine and thus reduces mileage. What is more interesting is that slower 1/4 mile times go with higher mpg. This may or may not be attributable to the power (hp) of the car or to its torque (which is not included in the data), but the analysis of qsec against other variables is outside the scope of this project.
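Although a full analysis of qsec is out of scope, the suggested link with engine power can be glanced at with a single correlation. This is only a rough, added check and says nothing about causation.

```r
data(mtcars)

# Correlation between 1/4 mile time (qsec) and gross horsepower (hp).
qsec_hp_cor <- cor(mtcars$qsec, mtcars$hp)
print(qsec_hp_cor)
```

A negative value means that more powerful cars tend to have shorter 1/4 mile times.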

Figure 2 in the Appendix shows the residual analysis. The points in the first plot (residuals vs fitted) are randomly scattered, indicating that the residuals are independent of the fitted values. The standardized residuals lie within [-2, 2] and Cook's distance is below 1 for every point. This indicates that the model is a good fit, assuming the residuals are normal. Normality can be checked in the second plot (normal Q-Q), where most of the points fall on the line.
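The same checks can be made numerically rather than read off the plots. The sketch below refits the model and reports the range of the standardized residuals, the largest Cook's distance, and a Shapiro-Wilk normality test. The Shapiro-Wilk test is an addition here and not part of the original analysis.

```r
data(mtcars)
fit <- lm(mpg ~ wt + am + qsec, data = mtcars)

# Standardized residuals: values far outside [-2, 2] flag poorly fitted cars.
std_resid_range <- range(rstandard(fit))
print(std_resid_range)

# Cook's distance: values near or above 1 flag highly influential cars.
max_cooks <- max(cooks.distance(fit))
print(max_cooks)

# Shapiro-Wilk test: a small p-value would cast doubt on normal residuals.
normality <- shapiro.test(residuals(fit))
print(normality$p.value)
```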

Conclusion

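Cars with manual transmission are 42% more fuel efficient on average than cars with automatic transmission. After accounting for weight and 1/4 mile time, the estimated advantage of a manual transmission is about 2.9 mpg.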

Appendix

Fig 1 - Boxplot of mpg by transmission type

boxplot(mtcars$mpg ~ factor(mtcars$am,labels=c('Automatic','Manual')),
        ylab = "Miles Per Gallon",
        xlab = "Transmission Type")

Fig 2 - Residual Analysis

residual_analysis <- lm(mpg ~ wt + am + qsec, data = mtcars)
par(mfrow = c(2, 2))
plot(residual_analysis, 1:4)