mtcars Data Analysis

This is an analysis report for the Coursera regression models project. It examines how MPG (miles per gallon) is affected by automatic versus manual transmission and addresses two questions:

1. Which transmission is better for fuel economy? Manual transmissions are commonly thought to give better MPG; let's see whether that holds in this data.
2. Quantify the MPG difference between manual and automatic transmissions.

Exploratory Data Analysis

Loading the required library and the data

library(ggplot2)  
## Warning: package 'ggplot2' was built under R version 3.4.4
data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Transforming certain variables into factors

# Treat the categorical variables as factors
mtcars$cyl  <- factor(mtcars$cyl)
mtcars$vs   <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
# am is coded 0 = automatic, 1 = manual
mtcars$am   <- factor(mtcars$am, labels = c("Automatic", "Manual"))
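Converting cyl and am to factors means lm() uses treatment (dummy) coding: the first level of each factor becomes the baseline and every other level gets its own 0/1 column. That is why the regression output below reports amManual, cyl6 and cyl8, with Automatic and 4 cylinders as the baselines. A minimal sketch, working on a fresh copy of datasets::mtcars under the illustrative name cars so it does not depend on the earlier steps:

```r
# Fresh copy of the built-in data; 'cars' is just an illustrative name
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am, labels = c("Automatic", "Manual"))

# Group sizes per transmission type
table(cars$am)

# Treatment coding: the first level of each factor is the baseline,
# every other level becomes its own 0/1 column in the design matrix
mm <- model.matrix(~ am + cyl, data = cars)
colnames(mm)
head(mm, 3)
```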

Regression Analysis

Quantifying the difference between automatic and manual transmissions

aggregate(mpg ~ am, data = mtcars, FUN = mean)
##          am      mpg
## 1 Automatic 17.14737
## 2    Manual 24.39231

From these group means, manual-transmission cars average about 7.24 MPG more than automatic ones (24.39 vs 17.15). To test whether this difference is significant, we use a Welch two-sample t-test:

D_auto   <- mtcars[mtcars$am == "Automatic", ]
D_manual <- mtcars[mtcars$am == "Manual", ]
t.test(D_auto$mpg, D_manual$mpg)
## 
##  Welch Two Sample t-test
## 
## data:  D_auto$mpg and D_manual$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean of x mean of y 
##  17.14737  24.39231
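To make the t-test less of a black box, here is a minimal sketch that recomputes the Welch statistic by hand, t = (mean_A - mean_M) / sqrt(s_A^2/n_A + s_M^2/n_M), together with the Welch-Satterthwaite degrees of freedom, and checks it against R's t.test(). It assumes a fresh copy of datasets::mtcars, where am is still coded 0 = automatic, 1 = manual; the variable names are illustrative. The hand-computed values should match t = -3.7671 and df = 18.332 above.

```r
# Fresh copy of the built-in data (am: 0 = automatic, 1 = manual)
cars   <- datasets::mtcars
auto   <- cars$mpg[cars$am == 0]
manual <- cars$mpg[cars$am == 1]

# Squared standard error of each group mean: s^2 / n
se2_auto   <- var(auto) / length(auto)
se2_manual <- var(manual) / length(manual)

# Welch t statistic (automatic minus manual)
t_stat <- (mean(auto) - mean(manual)) / sqrt(se2_auto + se2_manual)

# Welch-Satterthwaite approximation to the degrees of freedom
df_ws <- (se2_auto + se2_manual)^2 /
  (se2_auto^2 / (length(auto) - 1) + se2_manual^2 / (length(manual) - 1))

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_stat), df_ws)

# Built-in Welch test via the formula interface (var.equal = FALSE by default)
tt <- t.test(mpg ~ am, data = cars)
c(t = t_stat, df = df_ws, p = p_val)
```

Because the group variances differ (the manual group is more spread out), Welch's test is the safer default here over the pooled-variance Student t-test.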

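Since the p-value is 0.0014, well below 0.05, the difference is statistically significant; the 95% confidence interval for the automatic-minus-manual difference runs from -11.28 to -3.21 MPG. To quantify the effect, we fit a linear regression of MPG on transmission type: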

fit <- lm(mpg ~ am, data = mtcars)
summary(fit)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## amManual       7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

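The regression agrees with the t-test: the amManual coefficient of 7.245 (p = 0.000285) is the manual-minus-automatic difference in mean MPG, and the intercept of 17.147 is the automatic-group mean. However, R² is only 0.36, so transmission type alone explains about 36% of the variance in MPG. As a result, we need a multivariable analysis.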
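Two facts make the single-predictor output easy to check by hand: with one 0/1 predictor, the amManual slope equals the difference in group means, and R² = 1 - RSS/TSS, which in this case also equals the squared correlation between mpg and the manual indicator. A minimal sketch on a fresh copy of datasets::mtcars (the names cars, rss, tss and so on are illustrative):

```r
# Fresh copy of the built-in data with am as a labelled factor
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))
fit <- lm(mpg ~ am, data = cars)

# The dummy-variable slope is the manual mean minus the automatic mean
group_means <- tapply(cars$mpg, cars$am, mean)
diff_means  <- unname(group_means["Manual"] - group_means["Automatic"])

# R^2 = 1 - residual sum of squares / total sum of squares
rss <- sum(residuals(fit)^2)
tss <- sum((cars$mpg - mean(cars$mpg))^2)
r2_manual <- 1 - rss / tss

# With a single 0/1 predictor, R^2 is also the squared correlation
r2_cor <- cor(cars$mpg, as.numeric(cars$am == "Manual"))^2

round(c(diff = diff_means, r2 = r2_manual, r2_cor = r2_cor), 4)
```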

Exploring the data (see the pairs plot in the Appendix), cyl, disp, hp and wt show a stronger relationship with mpg than the remaining variables, so we leave the other variables out of the multivariable analysis. We build a model with these four variables plus am and compare it with the previous model using the anova function.
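One way to back up this variable choice numerically is to rank every variable by the absolute value of its correlation with mpg. This sketch uses the untouched numeric datasets::mtcars, so cyl is treated as a number here, which is acceptable for a quick screen; on this data the top four should be wt, cyl, disp and hp.

```r
# Numeric copy of the built-in data so every column works with cor()
cars <- datasets::mtcars

# Correlation of each variable with mpg, excluding mpg itself
mpg_cor <- cor(cars)[, "mpg"]
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]

# Sort by strength of the linear relationship, ignoring the sign
ranked <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
round(ranked, 3)
```

Correlation only screens for marginal linear association; the strongly correlated predictors are also correlated with each other, which is why their individual coefficients in fit1 carry large standard errors.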

fit1 <- lm(mpg ~ am + cyl + disp + hp + wt, data = mtcars)

This model has an R² of 0.87 (adjusted R² 0.83, from summary(fit1) below), so it explains about 87% of the variance in MPG. We now compare it with the previous model using the anova function:

anova(fit,fit1)
## Analysis of Variance Table
## 
## Model 1: mpg ~ am
## Model 2: mpg ~ am + cyl + disp + hp + wt
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     30 720.90                                  
## 2     25 150.41  5    570.49 18.965 8.637e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
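The anova() comparison is a nested-model F-test: it asks whether the drop in residual sum of squares (RSS) from adding cyl, disp, hp and wt is large relative to the residual variance of the larger model. Plugging in the values from the table above (Model 1 = fit, Model 2 = fit1) reproduces the reported F = 18.965 up to rounding of the RSS values; under the null hypothesis the statistic follows an F distribution with 5 and 25 degrees of freedom.

```latex
F = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2)/(\mathrm{df}_1 - \mathrm{df}_2)}{\mathrm{RSS}_2/\mathrm{df}_2}
  = \frac{(720.90 - 150.41)/(30 - 25)}{150.41/25}
  = \frac{570.49/5}{150.41/25}
  \approx 18.96
```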
summary(fit1)
## 
## Call:
## lm(formula = mpg ~ am + cyl + disp + hp + wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9374 -1.3347 -0.3903  1.1910  5.0757 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 33.864276   2.695416  12.564 2.67e-12 ***
## amManual     1.806099   1.421079   1.271   0.2155    
## cyl6        -3.136067   1.469090  -2.135   0.0428 *  
## cyl8        -2.717781   2.898149  -0.938   0.3573    
## disp         0.004088   0.012767   0.320   0.7515    
## hp          -0.032480   0.013983  -2.323   0.0286 *  
## wt          -2.738695   1.175978  -2.329   0.0282 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.453 on 25 degrees of freedom
## Multiple R-squared:  0.8664, Adjusted R-squared:  0.8344 
## F-statistic: 27.03 on 6 and 25 DF,  p-value: 8.861e-10

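The ANOVA p-value of 8.6e-08 shows that adding cyl, disp, hp and wt significantly improves on the transmission-only model, and fit1 explains about 87% of the variance in MPG. Once these variables are included, the estimated transmission effect shrinks from 7.24 to 1.81 MPG: manual cars get an estimated 1.81 MPG more than automatic cars, but this coefficient is not statistically significant (p = 0.2155), so the data give only weak evidence of a transmission effect once cylinders, displacement, horsepower and weight are held constant.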
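A point estimate alone hides how uncertain the adjusted effect is. This minimal sketch rebuilds fit1 on a fresh copy of datasets::mtcars (the name cars is illustrative; vs, gear and carb are not in the model, so only cyl and am need converting) and asks confint() for the 95% interval of the amManual coefficient. Given the standard error of 1.421 on 25 degrees of freedom, the interval should run from roughly -1.1 to 4.7 MPG, so it includes zero, consistent with p = 0.2155.

```r
# Fresh copy of the built-in data with the factors used by fit1
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am, labels = c("Automatic", "Manual"))

# Same specification as fit1 in the report
fit1 <- lm(mpg ~ am + cyl + disp + hp + wt, data = cars)

# 95% confidence interval for the adjusted manual-vs-automatic difference
ci <- confint(fit1, "amManual", level = 0.95)
ci
```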

Appendix

Boxplot of MPG by transmission type

boxplot(mpg ~ am, data = mtcars, col = c("Red", "Blue"),
        xlab = "Transmission Type", ylab = "Miles Per Gallon")

Pairs plot for the dataset

pairs(mpg~.,data = mtcars)

Checking the residuals

par(mfrow = c(2,2))
plot(fit1)