###INTRODUCTION

For the final project I chose “mtcars”, a data set that comes preloaded with R. I loaded the data set and performed the analysis on it.

The goal is to explore the relationship between a set of variables and miles per gallon (mpg), which is the outcome, by addressing two questions: “Is an automatic or manual transmission better for MPG?” and “Quantify the MPG difference between automatic and manual transmissions.”

Using linear regression and hypothesis testing, it can be concluded that there is a significant difference between the MPG of automatic and manual transmission cars.

To quantify the MPG difference between automatic and manual transmission cars, a linear regression model was used that took into account the weight, the type of transmission and the acceleration (qsec, the quarter-mile time). Controlling for these factors, manual transmission cars get about 2.94 MPG more than automatic transmission cars.

###LOADING NECESSARY LIBRARIES

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

###LOADING NECESSARY DATASET

data(mtcars)
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

###PROCESSING THE DATA

mtcars$am<-as.factor(mtcars$am)
levels(mtcars$am)<-c("AT","MT")

###EXPLORATORY DATA ANALYSIS

Mean MPG of automatic and manual transmission cars:

aggregate(mpg~am,data=mtcars,mean)
##   am      mpg
## 1 AT 17.14737
## 2 MT 24.39231

The mean MPG of manual transmission cars is 7.245 MPG higher than that of automatic transmission cars.

Running a t-test:

atData<-mtcars[mtcars$am=="AT",]
mtData<-mtcars[mtcars$am=="MT",]
t.test(atData$mpg,mtData$mpg)
## 
##  Welch Two Sample t-test
## 
## data:  atData$mpg and mtData$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean of x mean of y 
##  17.14737  24.39231

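The p-value of the test is 0.001374, which is below the 0.05 significance level, and the 95% confidence interval for the difference in means (-11.28 to -3.21) excludes zero. Hence there is a significant difference between the mean MPG of the two groups of cars. Note that this test compares the raw group means and does not control for any other variables.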
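The same test can be rerun with the formula interface, which makes it easy to pull out the p-value and confidence interval directly. This is a minimal sketch; the names `cars` and `tt` are illustrative and not part of the original analysis, and a fresh copy of the data is used so the block runs on its own.

```r
# Sketch: Welch two-sample t-test via the formula interface.
# `cars` and `tt` are illustrative names, not from the original report.
cars <- datasets::mtcars                          # fresh copy of the built-in data
cars$am <- factor(cars$am, labels = c("AT", "MT"))

tt <- t.test(mpg ~ am, data = cars)               # Welch test is the default (var.equal = FALSE)

tt$p.value           # compare against the 0.05 significance level
tt$conf.int          # 95% CI for mean(AT) - mean(MT)
tt$p.value < 0.05    # TRUE means significant at the 5% level
```

The decision rule is to compare the p-value with the chosen significance level (here 0.05), or equivalently to check whether the confidence interval contains zero.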

###HISTOGRAM OF THE MPG FOR AT AND MT CARS

ggplot(data=mtcars,aes(mpg))+geom_histogram()+facet_grid(.~am)+labs(x="Miles per Gallon",y="Frequency",title="MPG Histogram for AT and MT cars.")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

###CORRELATIONS

corr<-select(mtcars,mpg,cyl,disp,wt,qsec,am)
pairs(corr)
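
As a numeric complement to the scatterplot matrix, the pairwise correlations of the same variables can be tabulated. This is a sketch under the assumption that `am` is kept in its original 0/1 coding (the built-in data is read directly from the `datasets` package for that reason); `vars` and `cor_mat` are illustrative names.

```r
# Sketch: Pearson correlation matrix for the variables shown in the pairs plot.
# Uses the untouched built-in data so that am is still numeric (0 = AT, 1 = MT).
vars <- c("mpg", "cyl", "disp", "wt", "qsec", "am")
cor_mat <- cor(datasets::mtcars[, vars])

round(cor_mat, 2)          # full matrix, rounded for readability
round(cor_mat["mpg", ], 2) # correlations of each variable with mpg
```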

###LINEAR MODELS

MODEL 1: regressing mpg on am:

fit_1<-lm(mpg~am,data=mtcars)
summary(fit_1)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## amMT           7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

From this simple linear regression of mpg on am, manual transmission cars get 7.245 MPG more than automatic transmission cars on average. The R^2 of this model is 0.3598, meaning that it explains only 35.98% of the variance in mpg.

MODEL 2: using the step function:
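
With a single two-level factor as the predictor, the intercept is the mean of the reference group (AT) and the slope is the difference between the two group means. The sketch below checks this; `cars`, `fit` and `group_means` are illustrative names.

```r
# Sketch: the am coefficient in mpg ~ am equals the difference in group means.
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("AT", "MT"))

fit <- lm(mpg ~ am, data = cars)
group_means <- tapply(cars$mpg, cars$am, mean)

coef(fit)[["(Intercept)"]]                      # mean mpg of AT cars
coef(fit)[["amMT"]]                             # MT mean minus AT mean
unname(group_means["MT"] - group_means["AT"])   # same value, computed directly
summary(fit)$r.squared                          # share of variance explained
```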

fit_2<-step(lm(data=mtcars,mpg~.),trace=0,steps=10000)
summary(fit_2)
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## amMT          2.9358     1.4109   2.081 0.046716 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11

The model is chosen by the stepwise selection algorithm in step(): starting from the model with all predictors, it removes variables as long as doing so lowers the AIC. Of the candidate variables, it retains the weight, the acceleration (qsec) and the transmission mode.
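
To see what the selection achieved, the AIC of the full model can be compared with that of the selected model. This is a sketch; `cars`, `full` and `selected` are illustrative names, and `extractAIC()` is used because it is the criterion `step()` itself evaluates.

```r
# Sketch: compare the AIC of the full model with the model chosen by step().
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("AT", "MT"))

full <- lm(mpg ~ ., data = cars)       # all ten predictors
selected <- step(full, trace = 0)      # backward selection by AIC (no scope given)

extractAIC(full)       # c(equivalent degrees of freedom, AIC)
extractAIC(selected)   # lower AIC with far fewer terms
formula(selected)      # the retained predictors
```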

Based on this multivariable regression model, a manual transmission car has a fuel efficiency about 2.94 MPG higher than that of an automatic transmission car of the same weight and quarter-mile time (p = 0.047). The adjusted R^2 of the model is 0.834, meaning that about 83% of the variance in mpg is explained by this model.
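
Because the p-value for the transmission term is close to 0.05, it helps to look at a confidence interval for the estimate rather than the point value alone. The sketch below uses `confint()`; `cars`, `fit` and `ci` are illustrative names.

```r
# Sketch: 95% confidence interval for the adjusted manual-vs-automatic difference.
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("AT", "MT"))

fit <- lm(mpg ~ wt + qsec + am, data = cars)
ci <- confint(fit, level = 0.95)

coef(fit)[["amMT"]]   # point estimate of the adjusted difference
ci["amMT", ]          # lower and upper bounds of the 95% interval
```

An interval whose lower bound is only slightly above zero indicates that the adjusted difference is significant at the 5% level but estimated with limited precision.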

ANOVA OF THE 2 MODELS:

fit_step<-lm(mpg~am+wt+qsec,data=mtcars)
anova(fit_1,fit_step)
## Analysis of Variance Table
## 
## Model 1: mpg ~ am
## Model 2: mpg ~ am + wt + qsec
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)    
## 1     30 720.90                                 
## 2     28 169.29  2    551.61 45.618 1.55e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

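The p-value indicates that we should reject the null hypothesis that the coefficients of the added terms (wt and qsec) are both zero, that is, that the larger model fits no better than the model with am alone. In other words, the weight and the acceleration of the car have a significant impact on its MPG.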
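
The F statistic in the table can be reproduced by hand from the residual sums of squares: F = ((RSS1 - RSS2) / 2) / (RSS2 / 28) = ((720.90 - 169.29) / 2) / (169.29 / 28) ≈ 45.62. The sketch below does the same computation in R; `small`, `large` and the other names are illustrative.

```r
# Sketch: reproduce the nested-model F test from the residual sums of squares.
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("AT", "MT"))

small <- lm(mpg ~ am, data = cars)
large <- lm(mpg ~ am + wt + qsec, data = cars)

rss_small <- deviance(small)                           # RSS of the smaller model
rss_large <- deviance(large)                           # RSS of the larger model
df_extra  <- df.residual(small) - df.residual(large)   # number of added terms

f_stat <- ((rss_small - rss_large) / df_extra) / (rss_large / df.residual(large))
p_val  <- pf(f_stat, df_extra, df.residual(large), lower.tail = FALSE)

f_stat   # matches the F column of anova(small, large)
p_val    # matches the Pr(>F) column
```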

###CONCLUSION

In conclusion, holding the weight and acceleration (qsec) of the car constant, manual transmission cars offer about 2.94 MPG better fuel efficiency than automatic transmission cars.

###APPENDIX: MODEL RESIDUALS

par(mfrow=c(2,2))
plot(fit_2)