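This report explores the relationship between a set of variables and miles per gallon (MPG), the outcome. It focuses on two questions: whether an automatic or a manual transmission is better for MPG (Q1), and which regression model best explains MPG (Q2).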

The dataset (mtcars) consists of 32 observations on 11 variables:

- [, 1] mpg: Miles/(US) gallon
- [, 2] cyl: Number of cylinders
- [, 3] disp: Displacement (cu.in.)
- [, 4] hp: Gross horsepower
- [, 5] drat: Rear axle ratio
- [, 6] wt: Weight (1000 lbs)
- [, 7] qsec: 1/4 mile time (seconds)
- [, 8] vs: Engine shape (0 = V-shaped, 1 = straight)
- [, 9] am: Transmission (0 = automatic, 1 = manual)
- [,10] gear: Number of forward gears
- [,11] carb: Number of carburetors

Executive Summary

The analysis starts with data exploration: after inspecting the data, the cor() function is used to check the correlation between each pair of variables. Next, the relationship between mpg and am (automatic or manual transmission) is examined. Then the important variables are selected and candidate regression models are built. Finally, the models are diagnosed and the best one is chosen.


Data exploration:

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Having looked at the first rows of the data, let's check the correlation between each pair of variables:

cor(mtcars)
##             mpg        cyl       disp         hp        drat         wt
## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684  0.68117191 -0.8676594
## cyl  -0.8521620  1.0000000  0.9020329  0.8324475 -0.69993811  0.7824958
## disp -0.8475514  0.9020329  1.0000000  0.7909486 -0.71021393  0.8879799
## hp   -0.7761684  0.8324475  0.7909486  1.0000000 -0.44875912  0.6587479
## drat  0.6811719 -0.6999381 -0.7102139 -0.4487591  1.00000000 -0.7124406
## wt   -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065  1.0000000
## qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476 -0.1747159
## vs    0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846 -0.5549157
## am    0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113 -0.6924953
## gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013 -0.5832870
## carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980  0.4276059
##             qsec         vs          am       gear        carb
## mpg   0.41868403  0.6640389  0.59983243  0.4802848 -0.55092507
## cyl  -0.59124207 -0.8108118 -0.52260705 -0.4926866  0.52698829
## disp -0.43369788 -0.7104159 -0.59122704 -0.5555692  0.39497686
## hp   -0.70822339 -0.7230967 -0.24320426 -0.1257043  0.74981247
## drat  0.09120476  0.4402785  0.71271113  0.6996101 -0.09078980
## wt   -0.17471588 -0.5549157 -0.69249526 -0.5832870  0.42760594
## qsec  1.00000000  0.7445354 -0.22986086 -0.2126822 -0.65624923
## vs    0.74453544  1.0000000  0.16834512  0.2060233 -0.56960714
## am   -0.22986086  0.1683451  1.00000000  0.7940588  0.05753435
## gear -0.21268223  0.2060233  0.79405876  1.0000000  0.27407284
## carb -0.65624923 -0.5696071  0.05753435  0.2740728  1.00000000

So the first seven columns of the dataset (mpg together with cyl, disp, hp, drat, wt and qsec) are selected to build the models. Before going further, let's look at the basic relationships between these variables. As the plots show, cyl, disp, hp and wt decrease as mpg increases, whereas drat and qsec increase with mpg; however, none of these relationships falls exactly on a straight line. Moreover, disp, hp and drat clearly have bimodal distributions.
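Which variables count as the "top" ones can be made explicit by ranking each predictor by the absolute value of its correlation with mpg. A minimal sketch in base R using only the built-in mtcars data; the object names `r` and `ranked` are illustrative:

```r
# Correlation of every column with mpg (mtcars ships with base R)
r <- cor(mtcars)[, "mpg"]

# Drop mpg's correlation with itself
r <- r[names(r) != "mpg"]

# Order predictors from strongest to weakest linear association with mpg
ranked <- r[order(abs(r), decreasing = TRUE)]
print(round(ranked, 3))
```

According to the correlation matrix above, this ordering starts with wt (-0.868), cyl (-0.852) and disp (-0.848) and ends with qsec (0.419).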

Q1:

Is an automatic or manual transmission better for MPG?

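# Boxplots of mpg: all cars, manual (am == 1) and automatic (am == 0)
boxplot(mtcars$mpg, mtcars$mpg[mtcars$am == 1], mtcars$mpg[mtcars$am == 0],
        ylab = "mpg", names = c("overall", "manual", "automatic"))

# Red line: overall mean mpg
abline(h = mean(mtcars$mpg), lwd = 2, col = "red")

# Blue line: mean mpg of automatic cars
abline(h = mean(mtcars$mpg[mtcars$am == 0]), lwd = 2, col = "blue")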

t.test(mtcars$mpg~mtcars$am)
## 
##  Welch Two Sample t-test
## 
## data:  mtcars$mpg by mtcars$am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group 0 mean in group 1 
##        17.14737        24.39231

Both the boxplot and the t-test suggest a difference in mpg between manual and automatic transmissions. The p-value (0.0014) is low enough to reject the null hypothesis of no difference in mean mpg between the two groups, and the sample means (24.39 mpg for manual versus 17.15 mpg for automatic) indicate that manual transmissions get better mpg.
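The direction and size of the effect can be read off directly by computing the difference in group means and listing the manual group first in t.test(), so the confidence interval is for (manual minus automatic). A minimal sketch; `auto`, `manual` and `mean_diff` are illustrative names:

```r
# Split mpg by transmission type (am: 0 = automatic, 1 = manual)
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Difference in mean mpg, manual minus automatic
mean_diff <- mean(manual) - mean(auto)

# Welch two-sample t-test (the default, var.equal = FALSE) with manual first,
# so the confidence interval is for (manual - automatic) and comes out positive
tt <- t.test(manual, auto)
print(mean_diff)
print(tt$conf.int)
```

Swapping the group order only flips the sign of the interval; the t statistic's magnitude and the p-value are the same as in the output above.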

Q2:

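# Subset: mpg plus the six predictors cyl, disp, hp, drat, wt, qsec
mtcars1 <- mtcars[, 1:7]
fit1 <- lm(mpg ~ ., data = mtcars)    # all ten predictors
fit2 <- lm(mpg ~ ., data = mtcars1)   # six predictors
fit3 <- lm(mpg ~ cyl + (disp + hp + drat + wt + qsec)^2, data = mtcars)  # plus pairwise interactions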

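From a practical point of view, we expect disp, hp, drat, wt and qsec to interact with one another, so fit3 includes all pairwise interactions among them (the ^2 in the formula), with cyl entering as a main effect only.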
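To see exactly which terms the ^2 operator produces, R's terms() can expand the fit3 formula without fitting anything. A minimal sketch; `f` and `labs` are illustrative names:

```r
# The formula used for fit3
f <- mpg ~ cyl + (disp + hp + drat + wt + qsec)^2

# Term labels generated from it: ^2 expands to every main effect
# plus every two-way interaction among the five bracketed variables
labs <- attr(terms(f), "term.labels")
print(labs)
print(length(labs))
```

That is cyl, five main effects and ten interactions (16 terms). With an intercept, fit3 therefore estimates 17 coefficients from 32 observations, which matches the 15 residual degrees of freedom in the ANOVA table below.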

anova(fit1,fit2,fit3)
## Analysis of Variance Table
## 
## Model 1: mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
## Model 2: mpg ~ cyl + disp + hp + drat + wt + qsec
## Model 3: mpg ~ cyl + (disp + hp + drat + wt + qsec)^2
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     21 147.49                           
## 2     25 163.48 -4   -15.982 0.5795 0.6821
## 3     15 103.43 10    60.045 0.8708 0.5772

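fit3 has the lowest residual sum of squares (103.43, versus 147.49 for fit1 and 163.48 for fit2), so it is chosen as the final model. Note, however, that neither F-test is significant at the 5% level (p = 0.68 and p = 0.58): neither the four extra variables in fit1 nor the ten interaction terms in fit3 significantly improve the fit over fit2.

Regression Diagnostics: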
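Within nested models, RSS can only go down as terms are added, so on its own it always favours the larger model. Penalised criteria such as adjusted R² and AIC are an alternative way to compare the three fits; this is a swapped-in technique, not the one used above. A minimal sketch, rebuilding `mtcars1` as the first seven columns of mtcars (an assumption inferred from the Model 2 formula); `fits` and `comparison` are illustrative names:

```r
# Rebuild the three candidate models
mtcars1 <- mtcars[, 1:7]   # assumed: mpg, cyl, disp, hp, drat, wt, qsec
fit1 <- lm(mpg ~ ., data = mtcars)
fit2 <- lm(mpg ~ ., data = mtcars1)
fit3 <- lm(mpg ~ cyl + (disp + hp + drat + wt + qsec)^2, data = mtcars)
fits <- list(fit1 = fit1, fit2 = fit2, fit3 = fit3)

# RSS, adjusted R^2 (higher is better) and AIC (lower is better)
comparison <- data.frame(
  RSS    = sapply(fits, deviance),
  adj_r2 = sapply(fits, function(f) summary(f)$adj.r.squared),
  AIC    = sapply(fits, AIC)
)
print(comparison)
```

If the model with the lowest RSS is not also the one with the best adjusted R² or AIC, that is a sign the extra terms are mostly fitting noise.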

par(mfrow = c(2, 2))  # 2 x 2 grid for the four standard lm diagnostic plots
plot(fit3)

The normal Q-Q plot suggests that the residuals are approximately normally distributed, and the residuals vs fitted plot shows no obvious pattern, so the model appears to fit reasonably well.
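The visual checks can be complemented with a formal normality test on the residuals and a look at leverage, which matters here because fit3 spends 17 coefficients on 32 observations. A minimal sketch using base R's shapiro.test() and hatvalues(); the 17/32 average-leverage benchmark comes from a general property of least squares (hat values sum to the number of coefficients), not from the report:

```r
fit3 <- lm(mpg ~ cyl + (disp + hp + drat + wt + qsec)^2, data = mtcars)

# Shapiro-Wilk test of the residuals (null hypothesis: normality)
sw <- shapiro.test(residuals(fit3))
print(sw)

# Leverage: the average hat value is 17/32 here; cars far above it
# have an outsized influence on their own fitted values
h <- hatvalues(fit3)
print(sort(h, decreasing = TRUE)[1:5])
```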

Conclusion:

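In this sample, cars with a manual transmission get a higher MPG than cars with an automatic one: 24.39 mpg versus 17.15 mpg on average, a difference of about 7.2 mpg that is significant in the Welch t-test (p = 0.0014).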
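The t-test compares raw group means, and the correlation matrix shows that am is also correlated with wt (-0.69). A minimal sketch of checking how much of the gap remains once weight is held fixed, by comparing the am coefficient with and without wt in a linear model; adding wt is an illustrative choice, not the report's model:

```r
# Unadjusted: the am coefficient equals the difference in group means
unadj <- lm(mpg ~ am, data = mtcars)

# Adjusted for weight, which is correlated with am (r = -0.69)
adj <- lm(mpg ~ am + wt, data = mtcars)

print(coef(unadj)["am"])
print(coef(adj)["am"])
```

A large drop in the am coefficient after adding wt would indicate that much of the raw difference reflects heavier cars being more often automatic, rather than the transmission itself.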