###INTRODUCTION
For the final project I chose “mtcars”, a data set preloaded in R. I loaded the data set and performed the analysis on it.
The goal is to explore the relationship between a set of variables and miles per gallon (MPG), which is the outcome, and to address two points: “Is an automatic transmission better for MPG?” and “Quantify the MPG difference between automatic and manual transmissions.”
Using linear regression and hypothesis testing, it can be concluded that there is a significant difference between the MPG of automatic and manual transmission cars.
To quantify the MPG difference between automatic and manual transmission cars, a linear regression model was used that took into account the weight, the type of transmission and the acceleration (qsec, the quarter-mile time). Controlling for these factors, manual transmission cars achieve about 2.94 MPG better fuel efficiency than automatic transmission cars.
###LOADING NECESSARY LIBRARIES
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
###LOADING NECESSARY DATASET
data(mtcars)
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
###PROCESSING THE DATA
mtcars$am<-as.factor(mtcars$am)
levels(mtcars$am)<-c("AT","MT")
###EXPLORATORY DATA ANALYSIS
Mean MPG of automatic and manual transmission cars:
aggregate(mpg~am,data=mtcars,mean)
## am mpg
## 1 AT 17.14737
## 2 MT 24.39231
The mean MPG of manual transmission cars is 7.245 MPG higher than that of automatic transmission cars.
Running a t-test:
atData<-mtcars[mtcars$am=="AT",]
mtData<-mtcars[mtcars$am=="MT",]
t.test(atData$mpg,mtData$mpg)
##
## Welch Two Sample t-test
##
## data: atData$mpg and mtData$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean of x mean of y
## 17.14737 24.39231
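The p-value of the test is 0.001374, which is below the 0.05 significance level, and the 95% confidence interval for the difference in means (-11.28 to -3.21) does not contain zero. Hence there is a significant difference between the mean MPG of the two groups. Note that this test compares the raw group means only; it does not control for any other variables.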
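The same comparison can be sketched with the formula interface of t.test, which avoids splitting the data by hand and makes the p-value and confidence interval available as named elements. The names cars, tt and alpha are illustrative and not part of the original analysis.

```r
# Work on a local copy so the built-in data set stays untouched.
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("AT", "MT"))

# Formula interface: compares mpg between the two levels of am
# (Welch test by default, as in the call above).
tt <- t.test(mpg ~ am, data = cars)

tt$p.value    # p-value of the test
tt$conf.int   # 95% confidence interval for mean(AT) - mean(MT)
tt$estimate   # the two group means

# Decision rule: reject the null of equal means when p < alpha.
alpha <- 0.05
tt$p.value < alpha
```

The decision rests on the p-value being smaller than alpha, or equivalently on the confidence interval excluding zero.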
###HISTOGRAM OF THE MPG FOR AT AND MT CARS
ggplot(data=mtcars,aes(mpg))+geom_histogram()+facet_grid(.~am)+labs(x="Miles per Gallon",y="Frequency",title="MPG Histogram for AT and MT cars.")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
###CORRELATIONS
corr<-select(mtcars,mpg,cyl,disp,wt,qsec,am)
pairs(corr)
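As a numeric complement to the scatterplot matrix, the correlations with mpg can be computed directly. This is a minimal sketch: it starts from a fresh copy of the built-in data with am in its original 0/1 coding, because cor() needs numeric columns and am was converted to a factor above. The names vars and cor_mat are illustrative.

```r
# Same six variables as in the pairs plot, taken from the built-in data
# where am is still numeric (0 = automatic, 1 = manual).
vars <- datasets::mtcars[, c("mpg", "cyl", "disp", "wt", "qsec", "am")]

# Pearson correlation matrix of all selected variables.
cor_mat <- cor(vars)

# Correlation of each variable with mpg, rounded for readability.
round(cor_mat["mpg", ], 2)
```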
###LINEAR MODELS
MODEL 1: regress mpg against am:
fit_1<-lm(mpg~am,data=mtcars)
summary(fit_1)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## amMT 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
From this simple linear regression model of mpg against am, manual transmission cars get 7.24 MPG more than automatic transmission cars on average. The R^2 of this model is 0.3598, meaning that it explains only 35.98% of the variance in mpg.
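To make the R^2 figure concrete, it can be recomputed from the residual and total sums of squares. The sketch below refits the simple model on a local copy; the names cars, fit, rss, tss and r2_manual are illustrative.

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("AT", "MT"))
fit <- lm(mpg ~ am, data = cars)

rss <- sum(residuals(fit)^2)                # residual sum of squares
tss <- sum((cars$mpg - mean(cars$mpg))^2)   # total sum of squares around the mean
r2_manual <- 1 - rss / tss                  # share of variance explained

r2_manual
summary(fit)$r.squared   # reported by summary(); should agree with the manual value
```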
MODEL 2: using the step function:
fit_2<-step(lm(data=mtcars,mpg~.),trace=0,steps=10000)
summary(fit_2)
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4811 -1.5555 -0.7257 1.4110 4.6610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.6178 6.9596 1.382 0.177915
## wt -3.9165 0.7112 -5.507 6.95e-06 ***
## qsec 1.2259 0.2887 4.247 0.000216 ***
## amMT 2.9358 1.4109 2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
The step function uses a stepwise selection algorithm, based on AIC, to pick the set of predictors for MPG. Starting from the model with all variables, it retains the weight (wt), the acceleration (qsec, the quarter-mile time) and the transmission type (am).
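A short sketch of what the stepwise search does: it starts from the full model and keeps changing the set of terms for as long as the AIC criterion improves. The names cars, full and selected are illustrative, and the default number of steps is assumed to be sufficient here.

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("AT", "MT"))

full <- lm(mpg ~ ., data = cars)     # all ten predictors
selected <- step(full, trace = 0)    # stepwise search, AIC criterion by default

formula(selected)                    # terms kept by the search

# extractAIC() returns the criterion that step() minimises;
# the second element is the AIC value (lower is better).
c(full = extractAIC(full)[2], selected = extractAIC(selected)[2])
```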
Based on this multivariable regression model, a manual transmission car has a fuel efficiency 2.94 MPG higher than that of an automatic transmission car, holding weight and qsec constant. The adjusted R^2 of the model is 0.834, meaning that about 83% of the variance in mpg can be explained by this model.
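The point estimate for the transmission effect can be supplemented with a confidence interval. The following sketch refits the selected model on a local copy and uses confint(); the names cars, fit and ci are illustrative.

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("AT", "MT"))
fit <- lm(mpg ~ wt + qsec + am, data = cars)

ci <- confint(fit, level = 0.95)   # 95% intervals for every coefficient
ci["amMT", ]                       # interval for the manual vs. automatic difference
```

Since the p-value of amMT (0.0467) is only just below 0.05, the lower end of this interval lies close to zero, so the size of the effect is estimated with considerable uncertainty.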
ANOVA of the two models:
fit_step<-lm(mpg~am+wt+qsec,data=mtcars)
anova(fit_1,fit_step)
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ am + wt + qsec
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 720.90
## 2 28 169.29 2 551.61 45.618 1.55e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value (1.55e-09) indicates that we should reject the null hypothesis that the coefficients of the added terms, wt and qsec, are both zero. That is, adding the weight and the acceleration of the car significantly improves the fit over the model with am alone.
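The F statistic in the table can be reproduced by hand from the two residual sums of squares, which shows what the nested-model comparison measures. This is a sketch with illustrative names (cars, small, large, f_stat, p_val).

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("AT", "MT"))
small <- lm(mpg ~ am, data = cars)
large <- lm(mpg ~ am + wt + qsec, data = cars)

rss_small <- deviance(small)   # residual sum of squares (deviance of an lm fit)
rss_large <- deviance(large)
df_extra <- df.residual(small) - df.residual(large)   # number of added terms

# Reduction in RSS per added term, relative to the residual variance of the larger model.
f_stat <- ((rss_small - rss_large) / df_extra) / (rss_large / df.residual(large))
p_val <- pf(f_stat, df_extra, df.residual(large), lower.tail = FALSE)

c(F = f_stat, p = p_val)
```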
###CONCLUSION
In conclusion, holding the weight and acceleration (qsec) of the car constant, manual transmission cars offer 2.94 MPG better fuel efficiency than automatic transmission cars.
###APPENDIX: MODEL RESIDUALS
par(mfrow=c(2,2))
plot(fit_2)