library(ggplot2)
library(dplyr)
data(mtcars)
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
mtcars$cyl  <- factor(mtcars$cyl)
mtcars$vs   <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
mtcars$am   <- factor(mtcars$am, labels = c("Automatic", "Manual"))
aggregate(mpg ~ am, mtcars, mean)
am mpg
1 Automatic 17.14737
2 Manual 24.39231
manualC    <- mtcars[mtcars$am == "Manual", ]
automaticC <- mtcars[mtcars$am == "Automatic", ]
t.test(manualC$mpg, automaticC$mpg)
Welch Two Sample t-test
data: manualC$mpg and automaticC$mpg
t = 3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
3.209684 11.280194
sample estimates:
mean of x mean of y
24.39231 17.14737
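The same Welch test can also be run through the formula interface, which avoids the two subset data frames. The sketch below works on a copy named `cars` (a name introduced here for illustration, not part of the original analysis). Note that the formula interface orders groups by factor level, so the estimated difference is Automatic minus Manual and the confidence interval has the opposite sign from the call above.

```r
# Work on a copy so the objects used in the report are left untouched.
# `cars` and `tt` are illustrative names, not from the original analysis.
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Formula interface: groups follow the factor level order,
# so the difference is Automatic minus Manual.
tt <- t.test(mpg ~ am, data = cars)

tt$estimate   # group means
tt$conf.int   # 95% CI for Automatic - Manual (sign flipped vs. the two-vector call)
tt$p.value    # same p-value as the two-vector call
```

Storing the result in an object makes the individual pieces (`estimate`, `conf.int`, `p.value`) available for later reporting instead of reading them off the printed output.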
Cdata<- lm(mpg ~ am, mtcars)
summary(Cdata)
Call:
lm(formula = mpg ~ am, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-9.3923 -3.0923 -0.2974 3.2439 9.5077
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   17.147      1.125  15.247 1.13e-15 ***
amManual       7.245      1.764   4.106 0.000285 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.902 on 30 degrees of freedom
Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
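With a single two-level factor as the predictor, the regression coefficients are just group means in disguise: the intercept is the mean of the reference level (Automatic) and `amManual` is the difference between the two group means. A minimal check, using illustrative names (`cars`, `fit`, `group_means`) that are not part of the original analysis:

```r
# Illustrative copy of the data; names here are assumptions for the sketch.
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

fit <- lm(mpg ~ am, data = cars)
group_means <- tapply(cars$mpg, cars$am, mean)

# Intercept equals the mean mpg of the reference level (Automatic).
intercept_ok <- all.equal(unname(coef(fit)["(Intercept)"]),
                          unname(group_means["Automatic"]))

# amManual equals the Manual mean minus the Automatic mean.
slope_ok <- all.equal(unname(coef(fit)["amManual"]),
                      unname(group_means["Manual"] - group_means["Automatic"]))

c(intercept_ok, slope_ok)
```

This is why the 7.245 estimate matches the gap between the two means reported by `aggregate()` earlier.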
multifit_model<- lm(mpg~am + cyl + disp + hp + wt, data = mtcars)
anova(Cdata, multifit_model)
Analysis of Variance Table
Model 1: mpg ~ am
Model 2: mpg ~ am + cyl + disp + hp + wt
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1     30 720.90
2     25 150.41  5    570.49 18.965 8.637e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
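The F statistic in this table compares the drop in residual sum of squares per added degree of freedom with the residual variance of the larger model. The sketch below recomputes it by hand; the object names (`cars`, `small`, `large`, and so on) are illustrative and not part of the original analysis.

```r
# Illustrative copy of the data with the same factor coding as the report.
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am, labels = c("Automatic", "Manual"))

small <- lm(mpg ~ am, data = cars)
large <- lm(mpg ~ am + cyl + disp + hp + wt, data = cars)

rss_small <- deviance(small)   # residual sum of squares of the small model
rss_large <- deviance(large)   # residual sum of squares of the large model
df_diff   <- df.residual(small) - df.residual(large)

# F = (drop in RSS per extra parameter) / (residual variance of the large model)
f_stat <- ((rss_small - rss_large) / df_diff) / (rss_large / df.residual(large))
p_val  <- pf(f_stat, df_diff, df.residual(large), lower.tail = FALSE)

c(F = f_stat, p = p_val)
```

With the values in the table this is (570.49 / 5) / (150.41 / 25), which gives the reported F of about 18.965.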
summary(multifit_model)
Call:
lm(formula = mpg ~ am + cyl + disp + hp + wt, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.9374 -1.3347 -0.3903 1.1910 5.0757
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 33.864276   2.695416  12.564 2.67e-12 ***
amManual     1.806099   1.421079   1.271   0.2155
cyl6        -3.136067   1.469090  -2.135   0.0428 *
cyl8        -2.717781   2.898149  -0.938   0.3573
disp         0.004088   0.012767   0.320   0.7515
hp          -0.032480   0.013983  -2.323   0.0286 *
wt          -2.738695   1.175978  -2.329   0.0282 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.453 on 25 degrees of freedom
Multiple R-squared: 0.8664, Adjusted R-squared: 0.8344
F-statistic: 27.03 on 6 and 25 DF, p-value: 8.861e-10
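A confidence interval makes the uncertainty around the adjusted transmission effect easier to read than the p-value alone. The following is a minimal sketch using `confint()` on a refit of the same model; `cars`, `fit` and `ci` are illustrative names, not part of the original analysis.

```r
# Illustrative refit of the multivariable model on a copy of the data.
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am, labels = c("Automatic", "Manual"))

fit <- lm(mpg ~ am + cyl + disp + hp + wt, data = cars)

# 95% confidence interval for the adjusted Manual-vs-Automatic difference.
ci <- confint(fit, "amManual", level = 0.95)
ci

# TRUE when the interval covers zero, i.e. the effect is not
# distinguishable from zero at the 5% level.
covers_zero <- ci[1, 1] < 0 && ci[1, 2] > 0
covers_zero
```

An interval that covers zero is consistent with the p-value of 0.2155 for `amManual` in the summary above.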
ggplot(data = mtcars, aes(mpg)) +
  geom_histogram() +
  facet_grid(. ~ am) +
  labs(x = "Miles per Gallon", y = "Frequency",
       title = "MPG Histogram for AT and MT cars")
### Plotting the data with a boxplot to explain MPG by transmission type
boxplot(mpg ~ am, data = mtcars, col = c("yellow", "red"), ylab = "Miles per Gallon", xlab = "Transmission Type")
### To understand the correlations between the variables, we use a pairs plot
mtcars_vars <- mtcars[, c(1, 3, 5, 6, 7, 9, 10)]
# Save the current margins, shrink them for the pairs plot, then restore them
mar.orig <- par()$mar
par(mar = c(1, 1, 1, 1))
pairs(mtcars_vars, panel = panel.smooth, col = 9 + mtcars$wt)
par(mar = mar.orig)
### Visualizing the residuals
par(mfrow = c(2,2))
plot(multifit_model)
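The four diagnostic plots can be complemented with a few numeric checks. The sketch below is one possible companion, not part of the original analysis: it refits the model under illustrative names (`cars`, `fit`), extracts leverages with `hatvalues()`, and runs a Shapiro-Wilk test on the residuals. The cutoff of twice the mean leverage is a common rule of thumb and is an assumption here.

```r
# Illustrative refit of the multivariable model on a copy of the data.
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am, labels = c("Automatic", "Manual"))

fit <- lm(mpg ~ am + cyl + disp + hp + wt, data = cars)

# Leverage of each car; the leverages sum to the number of coefficients.
lev <- hatvalues(fit)
sum(lev)

# Rule-of-thumb flag (an assumption, not from the report):
# leverage above twice the mean leverage.
high_leverage <- names(lev)[lev > 2 * mean(lev)]
high_leverage

# Shapiro-Wilk test of residual normality, a numeric companion to the Q-Q plot.
sw <- shapiro.test(residuals(fit))
sw$p.value
```

Cars flagged here are worth locating in the "Residuals vs Leverage" panel before drawing conclusions from the coefficients.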
## Conclusion
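### Under this model the multiple R-squared is 0.8664, so the model explains about 87% of the variance in mpg. Adding cyl, disp, hp and wt changes the estimated relationship between mpg and am: the unadjusted difference of 7.25 MPG in favor of manual transmissions, which was significant (p = 0.000285), shrinks to 1.81 MPG and is no longer statistically significant (p = 0.2155). Hence, in answer to our second question, the adjusted difference between manual and automatic transmissions is estimated at 1.81 MPG, but this model does not provide strong evidence that it differs from zero.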