READING DATA
The data are read in to check all the variables in the data set.
carsdata<-mtcars
head(carsdata,3)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
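Since our interest is in miles per gallon (mpg) versus transmission type (the am variable), we first fit a linear model of mpg on am. The -1 in the formula drops the intercept, so each coefficient is the mean mpg of one transmission group.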
basicmodel<-lm(mpg~factor(am)-1,data=carsdata)
summary(basicmodel)
##
## Call:
## lm(formula = mpg ~ factor(am) - 1, data = carsdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## factor(am)0 17.147 1.125 15.25 1.13e-15 ***
## factor(am)1 24.392 1.360 17.94 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.9487, Adjusted R-squared: 0.9452
## F-statistic: 277.2 on 2 and 30 DF, p-value: < 2.2e-16
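The basic model reports a high R-squared (94.87%), and the F-statistic and its p-value indicate that the regression is significant. Because this model has no intercept, however, R computes the R-squared against zero rather than against the mean mpg, so the figure overstates how much of the variation in mpg is explained by transmission.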
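To make the R-squared caveat concrete, the sketch below recomputes it both ways for the same residuals: once with the total sum of squares taken around zero (what summary() reports for a no-intercept model) and once around the mean of mpg (the usual definition, which is what summary() reports once an intercept is included). The object names are illustrative only.

```r
# Same grouping fitted with and without an intercept; both give identical fitted values
no_int   <- lm(mpg ~ factor(am) - 1, data = mtcars)  # one mean per transmission group
with_int <- lm(mpg ~ factor(am), data = mtcars)      # intercept plus group difference

rss <- sum(residuals(no_int)^2)

# Total sum of squares around zero: used by summary() when there is no intercept
r2_uncentred <- 1 - rss / sum(mtcars$mpg^2)
# Total sum of squares around the mean: the usual R-squared
r2_centred <- 1 - rss / sum((mtcars$mpg - mean(mtcars$mpg))^2)

c(reported_no_intercept   = summary(no_int)$r.squared,
  uncentred               = r2_uncentred,
  centred                 = r2_centred,
  reported_with_intercept = summary(with_int)$r.squared)
```

The centred value is the one that answers "how much of the spread in mpg does transmission explain".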
basicmodel$coef
## factor(am)0 factor(am)1
## 17.14737 24.39231
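In mtcars, am = 0 denotes an automatic and am = 1 a manual transmission. The coefficients are therefore the group means: 17.15 mpg for automatic cars and 24.39 mpg for manual cars, so in this sample manual cars average about 7.2 more miles per gallon than automatic cars.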
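One way to avoid misreading the 0/1 coding is to give the factor levels names before fitting. The sketch below assumes the coding documented in ?mtcars (0 = automatic, 1 = manual); the column name `transmission` is a hypothetical addition, not part of the original analysis. It also checks that the no-intercept coefficients are simply the two group means.

```r
# Label am levels explicitly (0 = automatic, 1 = manual, per ?mtcars)
cars <- mtcars
cars$transmission <- factor(cars$am, levels = c(0, 1),
                            labels = c("automatic", "manual"))

# No-intercept fit: one coefficient per transmission group
fit <- lm(mpg ~ transmission - 1, data = cars)

# Group means computed directly, for comparison
group_means <- tapply(cars$mpg, cars$transmission, mean)

coef(fit)     # transmissionautomatic, transmissionmanual
group_means   # the same two numbers
```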
Transmission is not the only variable that affects miles per gallon, so the remaining variables in the data set need to be considered as well; the scatter plot matrix in the appendix shows how they relate to mpg and to each other.
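A quick numeric companion to the scatter plot matrix is to rank every variable by its correlation with mpg. This is a rough screen only: it treats coded categorical columns such as cyl and am as plain numbers, and correlation says nothing about which variables matter once the others are held fixed.

```r
# Correlation of each mtcars variable with mpg, from most negative to most positive
cors <- cor(mtcars)[, "mpg"]
cors <- sort(cors[names(cors) != "mpg"])
round(cors, 3)
```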
Stepwise selection with step(), which adds and drops terms to minimise the AIC, can help identify a well-fitting model.
The individual selection steps are suppressed here to save space; the final selected model is shown below.
Before running the model selection, we convert the categorical variables to factors (cyl and am appear as factors in the output below).
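The conversion code itself is not shown in the report. A minimal sketch, assuming that the five coded columns cyl, vs, am, gear and carb are the ones treated as categorical (the original may have converted a different set), could look like this; `factor_cols` is a hypothetical helper name.

```r
# Assumption: these five columns are the categorical ones
carsdata <- mtcars
factor_cols <- c("cyl", "vs", "am", "gear", "carb")

# Convert each listed column to a factor, leaving the others numeric
carsdata[factor_cols] <- lapply(carsdata[factor_cols], factor)

str(carsdata[factor_cols])
```

Passing `trace = 0` to step() is one way to run the selection without printing every intermediate step.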
Function used: opti<-step(lm(mpg~.,data=carsdata))
print(summary(opti))
##
## Call:
## lm(formula = mpg ~ cyl + hp + wt + am, data = carsdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9387 -1.2560 -0.4013 1.1253 5.0513
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.70832 2.60489 12.940 7.73e-13 ***
## cyl6 -3.03134 1.40728 -2.154 0.04068 *
## cyl8 -2.16368 2.28425 -0.947 0.35225
## hp -0.03211 0.01369 -2.345 0.02693 *
## wt -2.49683 0.88559 -2.819 0.00908 **
## am1 1.80921 1.39630 1.296 0.20646
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared: 0.8659, Adjusted R-squared: 0.8401
## F-statistic: 33.57 on 5 and 26 DF, p-value: 1.506e-10
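To see why the selection stopped at this model, drop1() reports the AIC obtained by removing each term in turn, which is the backward half of what step() evaluates. The sketch refits the selected formula directly (with cyl and am as factors, matching the output above) rather than rerunning step(). If this is where step() stopped, no single removal should give a lower AIC than the `<none>` row; given am's t value of 1.296 on 26 degrees of freedom, the am row should sit only marginally above it.

```r
# Refit the selected model with cyl and am as factors
cars <- mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am)
selected <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# AIC after removing each term; "<none>" is the model as fitted
d <- drop1(selected)
d[order(d$AIC), ]
```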
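The selected model keeps cylinders, horsepower and weight alongside transmission, and all three enter with negative coefficients. With these held fixed, the estimated advantage of a manual transmission (am1) shrinks to 1.81 mpg and is not statistically significant (p = 0.206), while the model reaches an adjusted R-squared of 0.84. Much of the raw mpg gap between the transmission groups therefore appears to be accounted for by differences in cylinders, horsepower and weight.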
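Two standard checks can back up this comparison: a nested F-test of whether cyl, hp and wt add explanatory power beyond transmission alone, and a confidence interval for the transmission coefficient in the adjusted model. The sketch uses an intercept model for the transmission-only fit so that the two models are nested; object names are illustrative.

```r
# Refit both models on the same data, with cyl and am as factors
cars <- mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am)

simple   <- lm(mpg ~ am, data = cars)                  # transmission only
adjusted <- lm(mpg ~ cyl + hp + wt + am, data = cars)  # stepwise model

# Nested F-test: do cyl, hp and wt improve on transmission alone?
cmp <- anova(simple, adjusted)
cmp

# 95% interval for the manual-minus-automatic difference after adjustment
ci <- confint(adjusted)["am1", ]
ci
```

An interval that straddles zero is consistent with the non-significant p-value for am1 in the summary above.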
t.test(carsdata$mpg~carsdata$am)
##
## Welch Two Sample t-test
##
## data: carsdata$mpg by carsdata$am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean in group 0 mean in group 1
## 17.14737 24.39231
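The Welch two-sample t-test gives p = 0.0014, and its 95% confidence interval for the difference in means (automatic minus manual) runs from -11.28 to -3.21 mpg, excluding zero. Since group 0 is automatic and group 1 is manual, this indicates that manual-transmission cars have significantly higher average MPG than automatic cars in this data set; the plots in the appendix support the same comparison. Note that the two group means reported here are exactly the coefficients of the basic model.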
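The link between the t-test and the basic regression can be made exact. t.test() defaults to Welch's test, which does not assume equal variances; the pooled-variance (Student) version, obtained with `var.equal = TRUE`, gives the same t statistic in magnitude as the am coefficient in a regression with an intercept. The sketch below compares all three.

```r
# Pooled-variance t-test, Welch t-test (the default used above), and regression
pooled <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
welch  <- t.test(mpg ~ am, data = mtcars)
reg    <- summary(lm(mpg ~ factor(am), data = mtcars))

# t.test reports group 0 minus group 1, the regression reports group 1 minus group 0,
# so the pooled and regression statistics differ only in sign
c(pooled_t     = unname(pooled$statistic),
  regression_t = reg$coefficients["factor(am)1", "t value"],
  welch_t      = unname(welch$statistic))
```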
APPENDIX
Scatter plot matrix of all variables:
pairs(mtcars)
The box plot shows that vehicles with manual transmission (am = 1) have higher miles per gallon on average than vehicles with automatic transmission (am = 0).
library(ggplot2)
ggplot(mtcars,aes(factor(am),mpg,fill=factor(am)))+geom_boxplot()+labs(x="Transmission (0 = automatic, 1 = manual)",y="Miles per gallon",fill="am")
Residual diagnostic plots for the stepwise model (opti) and the basic model:
plot(opti)
plot(basicmodel)
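By default, plot() on an lm object draws four diagnostic plots one after another (residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage). A small sketch for viewing them on a single page is to set a 2 x 2 panel layout first. Here the output goes to a temporary PDF (a hypothetical destination chosen so the snippet runs non-interactively); in an interactive session the `pdf()` and `dev.off()` lines can be dropped.

```r
# Refit the stepwise model with cyl and am as factors
fit <- lm(mpg ~ factor(cyl) + hp + wt + factor(am), data = mtcars)

out <- tempfile(fileext = ".pdf")  # hypothetical output file
pdf(out)
op <- par(mfrow = c(2, 2))         # 2 x 2 grid of panels
plot(fit)                          # the four default lm diagnostic plots
par(op)                            # restore the previous layout
invisible(dev.off())
```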