MOTOR TREND ANALYTICAL REPORT

Authored by: Paul Vinod

Executive Summary

This report examines the relationship between transmission type and fuel economy, measured in miles per gallon (MPG), in the mtcars dataset and builds a regression model for MPG. It addresses two questions:

  1. Is an automatic or a manual transmission better for MPG?
  2. How large is the MPG difference between automatic and manual transmissions?

Collection of Data from Repo

library(ggplot2)
library(datasets)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
require(GGally)
## Loading required package: GGally
## Warning: package 'GGally' was built under R version 4.0.3
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
data(mtcars)
summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000

Exploratory analysis of the Data

We first explore mpg by transmission type (automatic vs. manual) with a box plot, shown in Appendix A. A two-sample t-test then checks whether the mean mpg of the two groups differs.

d_manual <- mtcars$mpg[mtcars$am == 1]
d_automatic <- mtcars$mpg[mtcars$am == 0]
t_anyz <- t.test(d_manual, d_automatic)
  • The t-test gives a p-value of 0.00137, so we reject the null hypothesis of equal means at the 5% level: mean mpg differs significantly between the two transmission types.
  • The 95% confidence interval for the difference in means (manual minus automatic), 3.2097 to 11.2802, does not include 0, which leads to the same conclusion.
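
The figures quoted above can be read directly from the object returned by t.test. The following is a minimal sketch, assuming the default Welch (unequal variance) test that t.test runs when no other options are given; the names p_value, ci and group_means are introduced here for illustration only.

```r
library(datasets)
data(mtcars)

# Same split as in the report: am == 1 is manual, am == 0 is automatic
d_manual    <- mtcars$mpg[mtcars$am == 1]
d_automatic <- mtcars$mpg[mtcars$am == 0]

# Welch two-sample t-test (the default for t.test with two vectors)
t_anyz <- t.test(d_manual, d_automatic)

p_value     <- t_anyz$p.value    # two-sided p-value
ci          <- t_anyz$conf.int   # 95% CI for mean(manual) - mean(automatic)
group_means <- t_anyz$estimate   # sample means of the two groups

print(p_value)
print(ci)
print(group_means)
```

Because d_manual is passed first, the interval is for manual minus automatic, so a strictly positive interval points to higher mpg for manual cars.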

Regression Model

Given this difference, we start with a simple linear regression of mpg on transmission (am).

fit1 <- lm(mpg ~ am, data = mtcars)
s_fit <- summary(fit1)
s_fit
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am             7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

From this fit we can make the following interpretations:

  • The adjusted R-squared is 0.3385, so transmission alone explains only about a third of the variation in mpg. A multivariable regression is needed to improve the model.
  • Automatic transmission (am = 0) is the reference level, so the intercept, 17.147, is the mean mpg of automatic cars, and the am coefficient is the manual minus automatic difference: 7.245 MPG.
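
With a single 0/1 predictor, the regression coefficients are just group means in disguise. The sketch below illustrates this under that assumption; group_means and mean_diff are illustrative names, not part of the original analysis.

```r
library(datasets)
data(mtcars)

fit1 <- lm(mpg ~ am, data = mtcars)

# Mean mpg per transmission group; names are "0" (automatic) and "1" (manual)
group_means <- tapply(mtcars$mpg, mtcars$am, mean)

# Difference in group means: manual minus automatic
mean_diff <- unname(group_means["1"] - group_means["0"])

# The intercept should match the automatic mean, the slope the difference
print(coef(fit1))
print(group_means)
print(mean_diff)
```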

Because the R-squared is low, we need a better model. The predictors to include are identified in Appendix B.

Better Regression Model

Appendix B identifies the additional variables that mpg depends on: cyl, disp, hp and wt. We add them to am in a second model.

fit2 <- lm(mpg ~ cyl + disp + hp + wt + am, data = mtcars)

To check whether the larger model improves on the first, we run an analysis of variance (ANOVA) test on the two nested models.

anova(fit2, fit1)
## Analysis of Variance Table
## 
## Model 1: mpg ~ cyl + disp + hp + wt + am
## Model 2: mpg ~ am
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     26 163.12                                  
## 2     30 720.90 -4   -557.78 22.226 4.507e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value is 4.507e-08, which indicates that the multivariable model fits significantly better than the initial one.
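
The F statistic in the ANOVA table can be reproduced by hand from the two residual sums of squares. This is a sketch of the standard nested-model F-test, which applies here because fit1 uses a subset of the predictors in fit2; the variable names rss1, rss2, df_extra, f_stat and p_value are illustrative.

```r
library(datasets)
data(mtcars)

fit1 <- lm(mpg ~ am, data = mtcars)
fit2 <- lm(mpg ~ cyl + disp + hp + wt + am, data = mtcars)

# For lm objects, deviance() returns the residual sum of squares
rss1 <- deviance(fit1)
rss2 <- deviance(fit2)

# Number of extra parameters in the larger model
df_extra <- df.residual(fit1) - df.residual(fit2)

# F = (drop in RSS per extra parameter) / (residual variance of larger model)
f_stat  <- ((rss1 - rss2) / df_extra) / (rss2 / df.residual(fit2))
p_value <- pf(f_stat, df_extra, df.residual(fit2), lower.tail = FALSE)

print(f_stat)
print(p_value)
```

The order in which the two models are passed to anova only flips the sign of the Df and Sum of Sq columns; the F statistic and p-value are the same either way.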

n_fit <- summary(fit2)
n_fit
## 
## Call:
## lm(formula = mpg ~ cyl + disp + hp + wt + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5952 -1.5864 -0.7157  1.2821  5.5725 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 38.20280    3.66910  10.412 9.08e-11 ***
## cyl         -1.10638    0.67636  -1.636  0.11393    
## disp         0.01226    0.01171   1.047  0.30472    
## hp          -0.02796    0.01392  -2.008  0.05510 .  
## wt          -3.30262    1.13364  -2.913  0.00726 ** 
## am           1.55649    1.44054   1.080  0.28984    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.505 on 26 degrees of freedom
## Multiple R-squared:  0.8551, Adjusted R-squared:  0.8273 
## F-statistic:  30.7 on 5 and 26 DF,  p-value: 4.029e-10

Conclusion

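  1. The multivariable model has an adjusted R-squared of 0.83 and fits the data much better than the transmission-only model (0.34).
  2. With cyl, disp, hp and wt held constant, manual cars are estimated to get 1.56 MPG more than automatic cars. This coefficient is not statistically significant (p = 0.29), so the adjusted difference cannot be distinguished from zero, unlike the unadjusted difference of 7.25 MPG.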
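
To quantify the uncertainty around the adjusted difference, one option is a confidence interval for the am coefficient. The sketch below uses confint from base R at the conventional 95% level, which is an assumption rather than a choice stated in the report; ci_am is an illustrative name.

```r
library(datasets)
data(mtcars)

fit2 <- lm(mpg ~ cyl + disp + hp + wt + am, data = mtcars)

# 95% confidence interval for the am coefficient (manual minus automatic,
# holding cyl, disp, hp and wt fixed); returns a 1 x 2 matrix
ci_am <- confint(fit2, "am", level = 0.95)

print(coef(fit2)["am"])
print(ci_am)
```

An interval that spans zero is consistent with the non-significant p-value for am in the model summary.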

Appendix A

mtcars$am_factor <- factor(mtcars$am, labels = c("automatic","manual"))
# Use the labelled factor so the x axis and fill legend show the two groups
g <- ggplot(data = mtcars, aes(x = am_factor, y = mpg, fill = am_factor))
g <- g + geom_boxplot()
g

The plot suggests that mpg depends on whether the transmission is manual or automatic. A t-test can confirm this.

Appendix B

The correlation between mpg and the other variables is examined with the ggcorr function from GGally.

ggcorr(mtcars)
## Warning in ggcorr(mtcars): data in column(s) 'am_factor' are not numeric and
## were ignored

From the correlation heat map, the variables whose absolute correlation with mpg exceeds 0.75 are cyl, disp, hp and wt.
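
The same selection can be made numerically rather than by eye. The following is a small sketch using cor from base R; the 0.75 cutoff comes from the text above, while the names num_cols, r_mpg and strong are introduced here for illustration.

```r
library(datasets)
data(mtcars)

# Keep numeric columns only (a factor column such as am_factor would be dropped)
num_cols <- mtcars[sapply(mtcars, is.numeric)]

# Pearson correlation of every variable with mpg, excluding mpg itself
r_mpg <- cor(num_cols)["mpg", ]
r_mpg <- r_mpg[names(r_mpg) != "mpg"]

# Variables whose absolute correlation with mpg exceeds 0.75
strong <- r_mpg[abs(r_mpg) > 0.75]

print(round(sort(r_mpg), 3))
print(names(strong))
```

All four selected variables are negatively correlated with mpg, which is why the cutoff is applied to the absolute value.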

Appendix C

Residual diagnostics. The following are the residual plots for the simple linear regression model (fit1).

par(mfrow = c(2,2))
plot(fit1)

These are the plots for linear regression between mpg and am.

The following are the residual plots for the multivariable regression (fit2).

par(mfrow = c(2,2))
plot(fit2)
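
As a numeric complement to the diagnostic plots, the spread of the residuals of the two models can be compared directly. This is only a sketch: it uses sigma from base R (available since R 3.3) to extract the residual standard error, and the names se1, se2 and mean_resid2 are illustrative.

```r
library(datasets)
data(mtcars)

fit1 <- lm(mpg ~ am, data = mtcars)
fit2 <- lm(mpg ~ cyl + disp + hp + wt + am, data = mtcars)

# Residual standard error of each model, as reported by summary()
se1 <- sigma(fit1)
se2 <- sigma(fit2)

# Least-squares residuals average to zero when the model has an intercept
mean_resid2 <- mean(resid(fit2))

print(c(fit1 = se1, fit2 = se2))
print(mean_resid2)
```

A smaller residual standard error for fit2 matches the tighter residual range seen in its summary output.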