Executive Summary

This report analyzes the mtcars dataset, taken from the 1974 Motor Trend magazine, to explore the relationship between miles per gallon (MPG) and the other variables in this eleven-variable dataset, with a particular focus on transmission type (automatic versus manual).

Using hypothesis testing and simple linear regression, we find a significant difference between the mean MPG of automatic and manual transmission cars: on average, manual transmission cars get 7.245 more miles per gallon. To adjust for confounding variables such as weight, number of cylinders, and horsepower, a multivariate regression model is then fitted, and the simple and multivariate models are compared with an ANOVA test. In the multivariate model, manual transmission cars get on average 1.48 more miles per gallon than automatic transmission cars with the other variables held fixed, and this adjusted difference is no longer statistically significant (p = 0.314).

Data Analysis

class(mtcars)
## [1] "data.frame"
colnames(mtcars)
##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb"
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
tail(mtcars)
##                 mpg cyl  disp  hp drat    wt qsec vs am gear carb
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
## Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
## Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
## Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
## Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2
seq(mtcars)
##  [1]  1  2  3  4  5  6  7  8  9 10 11
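
As a small complement to the inspection above, the following sketch checks the size of the data frame, the number of cars per transmission type, and whether any values are missing. The variable names `am_counts` and `na_counts` are illustrative and not part of the original analysis; the coding 0 = automatic, 1 = manual follows the interpretation used throughout this report.

```r
# Dimensions of the data frame: rows are car models, columns are variables
dim(mtcars)

# Number of cars per transmission type (am: 0 = automatic, 1 = manual)
am_counts <- table(mtcars$am)
am_counts

# Count of missing values in each column
na_counts <- colSums(is.na(mtcars))
na_counts
```

The group sizes matter later on: with unequal and fairly small groups, the Welch version of the t-test (the default in `t.test`) is a reasonable choice.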

Exploratory Data Analysis

Because we are interested in the relationship between a set of variables and miles per gallon (mpg), mpg is our dependent variable. We first plot it to check its distribution.

The distribution of mpg, shown in the graphs (Plot, 1 & 2), is approximately normal, and the data do not appear to contain any outliers. Next, we use a box plot to check how mpg differs between automatic and manual transmissions.

The box plot (Plot, 3) suggests a clear hypothesis: automatic cars seem to have lower miles per gallon, and so lower fuel efficiency, than manual cars. However, this apparent pattern could have arisen by random chance, if the sample simply happened to contain automatic cars with low efficiency and manual cars with high efficiency. To check whether that is the case, we use a statistical test.
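
As a numerical complement to the visual check of the mpg distribution, one option is a Shapiro-Wilk test. This is only a sketch and was not part of the original analysis; the object name `sw` is illustrative. The null hypothesis of the test is that the data come from a normal distribution, so a p-value above the chosen significance level means there is no evidence against normality, which is weaker than proof of normality.

```r
# Shapiro-Wilk normality test on the dependent variable
sw <- shapiro.test(mtcars$mpg)

# Test statistic W (values close to 1 are consistent with normality)
sw$statistic

# p-value of the test
sw$p.value
```

With only 32 observations the test has limited power, so it is best read together with the histogram and density plot rather than on its own.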

T-test

t.test(mpg ~ am, data=mtcars)
## 
##  Welch Two Sample t-test
## 
## data:  mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group 0 mean in group 1 
##        17.14737        24.39231

The t-test output is informative. The p-value (0.001374) is very small, which is strong evidence of a real difference between the two groups. In the output, group 0 is automatic and group 1 is manual, so the 95% confidence interval indicates that the mean mpg of automatic cars is between 3.2 and 11.3 lower than that of manual cars. Hence the null hypothesis is rejected: there is a significant difference in mean MPG between manual transmission cars and automatic transmission cars.
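
The sign of the interval is easy to misread, because `t.test` reports group 0 minus group 1. The sketch below pulls the pieces out of the test object and re-expresses them as manual minus automatic. The variable names are illustrative; `estimate` and `conf.int` are standard components of the object returned by `t.test`.

```r
# Welch two-sample t-test, as in the analysis above
tt <- t.test(mpg ~ am, data = mtcars)

# Group means: first element is am = 0 (automatic), second is am = 1 (manual)
group_means <- tt$estimate

# Difference expressed as manual minus automatic
diff_manual_minus_auto <- unname(group_means[2] - group_means[1])

# The reported interval is for automatic minus manual; flip the sign and order
ci_manual_minus_auto <- rev(-tt$conf.int)

diff_manual_minus_auto
ci_manual_minus_auto
```

Expressed this way, the interval is entirely positive, which matches the reading that manual cars have the higher mean mpg.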

Correlation Test

To determine the relationships between the variables, and to find out which variables should be included in our model, we compute the correlation of every variable with mpg and then the full correlation matrix.

data(mtcars)
sort(cor(mtcars)[1,])
##         wt        cyl       disp         hp       carb       qsec 
## -0.8676594 -0.8521620 -0.8475514 -0.7761684 -0.5509251  0.4186840 
##       gear         am         vs       drat        mpg 
##  0.4802848  0.5998324  0.6640389  0.6811719  1.0000000
cor(mtcars, use="complete.obs", method="pearson")
##             mpg        cyl       disp         hp        drat         wt
## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684  0.68117191 -0.8676594
## cyl  -0.8521620  1.0000000  0.9020329  0.8324475 -0.69993811  0.7824958
## disp -0.8475514  0.9020329  1.0000000  0.7909486 -0.71021393  0.8879799
## hp   -0.7761684  0.8324475  0.7909486  1.0000000 -0.44875912  0.6587479
## drat  0.6811719 -0.6999381 -0.7102139 -0.4487591  1.00000000 -0.7124406
## wt   -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065  1.0000000
## qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476 -0.1747159
## vs    0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846 -0.5549157
## am    0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113 -0.6924953
## gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013 -0.5832870
## carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980  0.4276059
##             qsec         vs          am       gear        carb
## mpg   0.41868403  0.6640389  0.59983243  0.4802848 -0.55092507
## cyl  -0.59124207 -0.8108118 -0.52260705 -0.4926866  0.52698829
## disp -0.43369788 -0.7104159 -0.59122704 -0.5555692  0.39497686
## hp   -0.70822339 -0.7230967 -0.24320426 -0.1257043  0.74981247
## drat  0.09120476  0.4402785  0.71271113  0.6996101 -0.09078980
## wt   -0.17471588 -0.5549157 -0.69249526 -0.5832870  0.42760594
## qsec  1.00000000  0.7445354 -0.22986086 -0.2126822 -0.65624923
## vs    0.74453544  1.0000000  0.16834512  0.2060233 -0.56960714
## am   -0.22986086  0.1683451  1.00000000  0.7940588  0.05753435
## gear -0.21268223  0.2060233  0.79405876  1.0000000  0.27407284
## carb -0.65624923 -0.5696071  0.05753435  0.2740728  1.00000000

The values obtained from the correlation test and correlation matrix (Plot 4) show that wt, cyl, disp, and hp are highly correlated with the dependent variable mpg (each with an absolute correlation above 0.77), so they are candidates for the regression model. The correlation matrix also shows that cyl and disp are highly correlated with each other (r = 0.90). To limit the problem of collinearity, only one of these two variables is included in the model.
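
One way to make the collinearity concern concrete is a variance inflation factor (VIF) for each candidate predictor, computed here by hand in base R as 1 / (1 - R^2), where R^2 comes from regressing that predictor on the other candidates. This is a sketch that was not part of the original analysis; `predictors` and `vif_manual` are illustrative names.

```r
# Candidate predictors identified from the correlation matrix
predictors <- c("wt", "cyl", "disp", "hp")

# Pairwise correlations among the candidates only
round(cor(mtcars[, predictors]), 2)

# VIF for each predictor: regress it on the remaining candidates
vif_manual <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  aux_fit <- lm(reformulate(others, response = p), data = mtcars)
  1 / (1 - summary(aux_fit)$r.squared)
})

round(vif_manual, 2)
```

A VIF of 1 means a predictor is uncorrelated with the others, and larger values mean its coefficient is estimated less precisely. Common rules of thumb flag values above 5 or 10, but these thresholds are conventions rather than strict cutoffs.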

Regression Analysis

Simple Linear Regression Model

Correlation is a suitable measure when we want to quantify the strength of the linear relationship between two continuous variables. When we want to find out how one variable predicts the behavior of another, regression analysis is used. We start by fitting a simple linear regression of mpg on am.

fit <-lm(mpg~am, data=mtcars)
summary(fit)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am             7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

The simple regression adds little beyond what the t-test already showed. The null hypothesis is again rejected, so there is a linear relationship between the response mpg and the predictor am. The intercept and coefficient mean that, on average, automatic transmission cars get 17.147 MPG and manual transmission cars get 24.392 MPG (17.147 + 7.245). The R^2 is 0.3598 (adjusted R^2 0.3385), so this model explains only about 36% of the variance in mpg.
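
The link between the regression coefficients and the group means can be verified directly. The sketch below refits the same simple model under the illustrative name `fit_am` and compares the coefficients with the raw means of the two groups.

```r
# Same simple model as above, refitted so the example is self-contained
fit_am <- lm(mpg ~ am, data = mtcars)

# The intercept is the mean mpg of automatic cars (am = 0)
auto_mean <- unname(coef(fit_am)["(Intercept)"])

# Intercept plus slope is the mean mpg of manual cars (am = 1)
manual_mean <- unname(coef(fit_am)["(Intercept)"] + coef(fit_am)["am"])

# Raw group means for comparison
raw_auto <- mean(mtcars$mpg[mtcars$am == 0])
raw_manual <- mean(mtcars$mpg[mtcars$am == 1])

# Multiple and adjusted R-squared, reported separately by summary()
r2 <- summary(fit_am)$r.squared
adj_r2 <- summary(fit_am)$adj.r.squared

c(auto = auto_mean, manual = manual_mean, r2 = r2, adj_r2 = adj_r2)
```

Because am is a 0/1 indicator, a regression on it alone can only reproduce the two group means, which is why it tells the same story as the t-test.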

Multivariate Regression Model

The correlation test showed that other variables in the dataset are also linearly correlated with the dependent variable, so we fit a multivariate regression of mpg on am, wt, cyl, and hp.

ANOVA Test

Since we now have two nested models of the same data, an ANOVA test is run to compare them and to find out whether they are significantly different.

mfit <- lm(mpg~am + cyl + wt + hp, data = mtcars)
anova(fit, mfit)
## Analysis of Variance Table
## 
## Model 1: mpg ~ am
## Model 2: mpg ~ am + cyl + wt + hp
##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
## 1     30 720.9                                  
## 2     27 170.0  3     550.9 29.166 1.274e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

With a p-value of 1.274e-08, the null hypothesis is rejected, and it can be claimed that the multivariate regression model fits significantly better than the simple model.
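
The F statistic in the ANOVA table can be reproduced by hand from the residual sums of squares: the drop in RSS per added parameter, divided by the residual variance of the larger model. The sketch below does this with illustrative object names (`fit_small`, `fit_large`), refitting both models so that it runs on its own.

```r
# The two nested models compared above
fit_small <- lm(mpg ~ am, data = mtcars)
fit_large <- lm(mpg ~ am + cyl + wt + hp, data = mtcars)

# Residual sums of squares of each model
rss_small <- sum(resid(fit_small)^2)
rss_large <- sum(resid(fit_large)^2)

# Number of parameters added by the larger model
df_diff <- df.residual(fit_small) - df.residual(fit_large)

# F = (reduction in RSS per added parameter) / (residual variance of larger model)
f_stat <- ((rss_small - rss_large) / df_diff) /
  (rss_large / df.residual(fit_large))

# Upper-tail probability under the F distribution
p_val <- pf(f_stat, df_diff, df.residual(fit_large), lower.tail = FALSE)

c(F = f_stat, p = p_val)
```

With the rounded values from the table this is (550.9 / 3) / (170.0 / 27), which is about 29.2 and agrees with the reported F of 29.166.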

Before analyzing the model in detail, it is important to examine the residuals. To look for any indication of non-normality or heteroskedasticity, the diagnostic plots (Plot, 5), including the residuals vs. fitted values plot, are examined.

After examining the plots, we can say that the residuals appear approximately normally distributed and homoskedastic. The estimates from the multivariate model can now be interpreted.
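
The visual checks can be supplemented with formal tests. The following is a sketch and was not part of the original analysis: it applies a Shapiro-Wilk test to the residuals and computes a studentized (Koenker) Breusch-Pagan statistic by hand in base R, as the number of observations times the R^2 of a regression of the squared residuals on the model's predictors. All object names are illustrative, and the results should be read alongside the plots, not as a replacement for them.

```r
# Refit the multivariate model so the example is self-contained
mfit_check <- lm(mpg ~ am + cyl + wt + hp, data = mtcars)
res <- resid(mfit_check)

# Normality of residuals: null hypothesis is that they are normal
sw_res <- shapiro.test(res)

# Heteroskedasticity: regress squared residuals on the same predictors
sq_res <- res^2
aux <- lm(sq_res ~ am + cyl + wt + hp, data = mtcars)

# Studentized Breusch-Pagan statistic: n * R^2 of the auxiliary regression,
# compared with a chi-squared distribution with 4 df (one per predictor)
bp_stat <- nobs(aux) * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 4, lower.tail = FALSE)

c(shapiro_p = sw_res$p.value, bp_stat = bp_stat, bp_p = bp_p)
```

In both tests a small p-value would be evidence against the assumption being checked, while a large one only means the data do not contradict it.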

summary(mfit)
## 
## Call:
## lm(formula = mpg ~ am + cyl + wt + hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4765 -1.8471 -0.5544  1.2758  5.6608 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 36.14654    3.10478  11.642 4.94e-12 ***
## am           1.47805    1.44115   1.026   0.3142    
## cyl         -0.74516    0.58279  -1.279   0.2119    
## wt          -2.60648    0.91984  -2.834   0.0086 ** 
## hp          -0.02495    0.01365  -1.828   0.0786 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.509 on 27 degrees of freedom
## Multiple R-squared:  0.849,  Adjusted R-squared:  0.8267 
## F-statistic: 37.96 on 4 and 27 DF,  p-value: 1.025e-10

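The multivariate regression model explains 84.9% of the variance (adjusted R^2 0.8267). Furthermore, it can be seen that wt and, to some extent, hp confound the relationship between am and mpg. Holding cyl, wt, and hp fixed, manual transmission cars get on average 1.48 MPG more than automatic transmission cars, but this coefficient is not statistically significant (p = 0.3142). The estimates therefore still favor manual over automatic transmissions for MPG, although once weight, cylinders, and horsepower are accounted for, the data cannot rule out that the difference is zero.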
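
The uncertainty around the adjusted transmission effect is easiest to see from a confidence interval for the am coefficient. The sketch below refits the multivariate model under the illustrative name `mfit_ci` and uses `confint` to obtain the 95% interval.

```r
# Refit the multivariate model so the example is self-contained
mfit_ci <- lm(mpg ~ am + cyl + wt + hp, data = mtcars)

# 95% confidence interval for the am coefficient only
ci_am <- confint(mfit_ci, "am", level = 0.95)
ci_am

# TRUE if the interval contains zero, i.e. no significant adjusted effect
contains_zero <- ci_am[1, 1] < 0 && ci_am[1, 2] > 0
contains_zero
```

The interval is the estimate plus or minus the t quantile with 27 degrees of freedom times the standard error, roughly 1.48 ± 2.05 × 1.44, so it spans both negative and positive values.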

Appendix

Plot, 1

par(mfrow = c(1,2))
x <- mtcars$mpg
h<-hist(x,breaks=10, col="blue", xlab="Miles Per Gallon", main="Histogram of Miles Per Gallon")
xfit<-seq(min(x),max(x),length=40)
yfit<-dnorm(xfit,mean=mean(x),sd=sd(x))
yfit <- yfit*diff(h$mids[1:2])*length(x)
lines(xfit, yfit, type="l", col="red", lwd=2)

Plot, 2

d <- density(mtcars$mpg)
plot(d, xlab ="MPG", main ="Density Plot of MPG")

Plot, 3

boxplot(mpg ~ am, data = mtcars, names = c("Automatic", "Manual"), col = "blue", xlab = "Transmission", ylab = "Miles per Gallon", main = "MPG by Transmission Type")

Plot 4, Correlation Matrix

plot(mtcars)

Plot, 5

par(mfrow = c(2,2))
plot(mfit)