In this project I look at the relationship between several variables and miles per gallon (MPG) in the mtcars dataset. The goal is to answer whether a car's transmission type (manual or automatic) has a real influence on its MPG, based on the mtcars dataset.
We are using the dataset called “mtcars”. This dataset contains data about cars, described by several variables; its first rows are previewed below. The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
head(mtcars)
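A quick structural check can confirm the description above before any modelling. The sketch below works on a fresh copy of the dataset. The names `cars` and `am_counts` are illustrative local variables only and are not used elsewhere in this report.

```r
# Work on a fresh copy so later changes to mtcars do not affect this check
cars <- datasets::mtcars

# Number of rows (cars) and columns (variables)
dim(cars)

# Cars per transmission type; in mtcars, am = 0 is automatic and am = 1 is manual
am_counts <- table(cars$am)
am_counts
```

The two groups are not the same size, which is worth keeping in mind when comparing them.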
The following plot shows the distribution of MPG for automatic and manual cars. Manual cars clearly tend to have higher MPG than automatic cars. Looking only at this graph, we could assume that a manual car gets better MPG, but let’s do some additional analysis.
ggplot(mtcars, aes(x=factor(am), y=mpg, fill=factor(am))) + geom_boxplot() + theme(legend.position="none") + xlab("Transmission (0 = automatic, 1 = manual)") + ggtitle("Plot to show relationship in MPG between automatic and manual cars")
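The visual impression from the boxplot can be backed up with per-group summaries. This is a minimal sketch using base R only. `cars`, `group_means` and `group_medians` are hypothetical names introduced for illustration.

```r
# Fresh copy of the data, independent of any later changes to mtcars
cars <- datasets::mtcars

# Mean and median MPG per transmission type (0 = automatic, 1 = manual)
group_means   <- tapply(cars$mpg, cars$am, mean)
group_medians <- tapply(cars$mpg, cars$am, median)

group_means
group_medians
```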
A Welch t-test on the two groups gives a low p-value (0.0014), so we reject the hypothesis that automatic and manual cars have the same mean MPG: the unadjusted difference between the groups is statistically significant. Next we display a correlation matrix to decide which columns to include in the analysis. In the matrix, WT, CYL, DISP and HP are the variables most strongly correlated with MPG, so we choose them as predictors.
t.test(mtcars[mtcars$am==0, ]$mpg, mtcars[mtcars$am==1, ]$mpg)
##
## Welch Two Sample t-test
##
## data: mtcars[mtcars$am == 0, ]$mpg and mtcars[mtcars$am == 1, ]$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean of x mean of y
## 17.14737 24.39231
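The same test can also be written with the formula interface of `t.test`, which makes the null hypothesis (equal mean MPG in both transmission groups) easier to read and lets us pull out individual results. A sketch, assuming a fresh copy of the data stored under the illustrative name `cars`:

```r
# Fresh copy of the data, independent of any later changes to mtcars
cars <- datasets::mtcars

# Welch two-sample t-test of mpg by transmission type (0 = automatic, 1 = manual)
tt <- t.test(mpg ~ am, data = cars)

tt$p.value    # p-value for H0: both groups have the same mean MPG
tt$conf.int   # 95% confidence interval for mean(automatic) - mean(manual)
```

Note that this test compares the raw group means only. It does not account for other differences between the cars, such as weight.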
corrplot(cor(mtcars), method="circle")
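Reading exact values off a circle plot is hard, so the choice of columns can be cross-checked numerically. The sketch below ranks the other variables by the absolute value of their correlation with `mpg`. `cars`, `mpg_cor` and `mpg_cor_sorted` are illustrative names, and all columns are assumed to still be numeric, as in the original dataset.

```r
# Fresh copy of the data with all columns still numeric
cars <- datasets::mtcars

# Correlation of every other variable with mpg
mpg_cor <- cor(cars)["mpg", colnames(cars) != "mpg"]

# Sort by absolute strength, strongest first
mpg_cor_sorted <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
round(mpg_cor_sorted, 3)
```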
Below we fit a linear model with the columns selected earlier plus the transmission type. In the summary, the amM coefficient (manual transmission, relative to the automatic baseline) estimates about 1.56 more MPG for manual cars with cyl, disp, hp and wt held constant, but this estimate is not statistically significant (p = 0.29). It is also important to consider that 32 cars may be too little data for a definitive conclusion, and that in those model years manual transmission cars may have been more advanced than automatic ones. I wouldn’t take this model as a real predictor. I also draw a normal Q-Q plot of the residuals, which shows that they lie fairly close to the line, with no obvious outliers.
mtcars$am <- as.factor(mtcars$am)
levels(mtcars$am) <- c("A", "M")
fit <- lm(mpg~cyl+disp+hp+wt+am, mtcars)
summary(fit)
##
## Call:
## lm(formula = mpg ~ cyl + disp + hp + wt + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5952 -1.5864 -0.7157 1.2821 5.5725
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.20280 3.66910 10.412 9.08e-11 ***
## cyl -1.10638 0.67636 -1.636 0.11393
## disp 0.01226 0.01171 1.047 0.30472
## hp -0.02796 0.01392 -2.008 0.05510 .
## wt -3.30262 1.13364 -2.913 0.00726 **
## amM 1.55649 1.44054 1.080 0.28984
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.505 on 26 degrees of freedom
## Multiple R-squared: 0.8551, Adjusted R-squared: 0.8273
## F-statistic: 30.7 on 5 and 26 DF, p-value: 4.029e-10
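A confidence interval for the transmission coefficient gives a more direct sense of the uncertainty than the p-value alone. The sketch below refits the same model on a fresh copy of the data so that it runs on its own. `cars`, `fit_check` and `ci_am` are hypothetical names introduced for illustration.

```r
# Fresh copy of the data with am recoded as in the report (A = automatic, M = manual)
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("A", "M"))

# Same model formula as above
fit_check <- lm(mpg ~ cyl + disp + hp + wt + am, data = cars)

# 95% confidence interval for the manual-vs-automatic coefficient
ci_am <- confint(fit_check)["amM", ]
ci_am
```

If the interval contains zero, the data are compatible with no transmission effect once the other variables are in the model.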
qqnorm(resid(fit))
qqline(resid(fit))
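A Q-Q plot checks normality of the residuals, but it does not show whether they depend on the fitted values. As a complementary sketch, the code below lists the largest standardized residuals and draws a residuals-versus-fitted plot. The model is refitted on a fresh copy of the data, and `cars`, `fit_check` and `std_res` are illustrative names.

```r
# Fresh copy of the data with am recoded as in the report (A = automatic, M = manual)
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("A", "M"))
fit_check <- lm(mpg ~ cyl + disp + hp + wt + am, data = cars)

# Standardized residuals; values beyond roughly +/-3 are a common rule-of-thumb flag
std_res <- rstandard(fit_check)
head(sort(abs(std_res), decreasing = TRUE), 3)

# Residuals vs fitted values; a visible pattern would suggest a misspecified model
plot(fitted(fit_check), resid(fit_check),
     xlab = "Fitted MPG", ylab = "Residual")
abline(h = 0, lty = 2)
```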
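1. From the data, manual transmission cars have a higher mean MPG (24.39 vs. 17.15), but once CYL, DISP, HP and WT are accounted for, the difference (about 1.56 MPG) is not statistically significant.
2. We should not consider this a definitive analysis. More data should be used.
3. CYL, WT, DISP and HP are the columns most strongly correlated with MPG; in the fitted model, WT is the only one significant at the 5% level.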