# Motor Trend Car Road Tests ####
# The data was extracted from the 1974 Motor Trend US magazine, and
# comprises fuel consumption and 10 aspects of automobile design and
# performance for 32 automobiles (1973-74 models).
data("mtcars")
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
plot(mpg~wt,data = mtcars)
Figure 1: Miles per gallon versus weight of the cars.
plot(mpg~hp, data = mtcars)
Figure 2: Miles per gallon versus gross horsepower of the cars.
I think that gross horsepower and weight may both need to be transformed, and that miles per gallon is roughly normal. Below, I check whether each variable actually needs a transformation.
hist(mtcars$hp)
Figure 3: Histogram of the gross horsepower of the cars.
qqnorm(mtcars$hp)
qqline(mtcars$hp)
Figure 4: Q-Q plot of the gross horsepower of the cars.
hist(mtcars$wt)
Figure 5: Histogram of the weight of the cars.
qqnorm(mtcars$wt)
qqline(mtcars$wt)
Figure 6: Q-Q Plot of the weight of the cars.
hist(mtcars$mpg)
Figure 7: Histogram of the miles per gallon of the cars.
qqnorm(mtcars$mpg)
qqline(mtcars$mpg)
Figure 8: Q-Q plot of the miles per gallon of the cars.
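The histograms and Q-Q plots above are visual checks. Below is a minimal sketch of a formal complement using base R's `shapiro.test()`. It is an addition to the original analysis. With only 32 cars the test has limited power, so it supports the plots rather than replacing them. Also note that linear regression assumes normality of the residuals, not of the raw variables.

```r
# Shapiro-Wilk test of normality for each variable inspected above.
# H0: the values come from a normal distribution; a small p-value
# (e.g. below 0.05) is evidence against normality.
vars <- c("mpg", "hp", "wt")
sw_p <- sapply(vars, function(v) shapiro.test(mtcars[[v]])$p.value)
round(sw_p, 4)
```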
The only predictor that needs to be transformed is gross horsepower. Its histogram is right-skewed, looking more like a Poisson distribution than a normal one. A square root transformation brings it closer to a normal distribution.
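The right skew of horsepower in Figure 3 can be expressed as a number. This sketch uses a hand-written `skewness()` helper (hypothetical, not from a package) to compare raw, square-root, and log horsepower. The log column is included only for comparison; it is not the transformation used in this analysis.

```r
# Sample skewness: third central moment divided by the cubed standard
# deviation (population form, no small-sample correction).
# Positive values indicate a long right tail.
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

c(raw  = skewness(mtcars$hp),
  sqrt = skewness(sqrt(mtcars$hp)),
  log  = skewness(log(mtcars$hp)))
```

The closer a value is to 0, the more symmetric the transformed variable.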
hpSqrt<-sqrt(mtcars$hp)
hist(hpSqrt)
Figure 9: Square root transformation check for horsepower via a histogram and a Q-Q plot.
qqnorm(hpSqrt)
qqline(hpSqrt)
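# log10 transform of mpg; the 0.0001 offset only guards against log(0), since every mpg value is positive
mpgLog<-log10(mtcars$mpg+0.0001)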
hist(mpgLog)
Figure 10: Log transformation check for miles per gallon via a histogram and a Q-Q plot.
qqnorm(mpgLog)
qqline(mpgLog)
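A Box-Cox profile offers a cross-check on the log transformation of miles per gallon. This sketch assumes the MASS package is available; it is a recommended package that ships with standard R installations. The names `fit_raw`, `bc`, `best_lambda`, and `ci` are illustrative. On the λ scale, 0 means a log transformation and 0.5 a square root. If the approximate 95% interval contains 0, a log transformation is consistent with the data.

```r
# Box-Cox profile log-likelihood for a power transformation of mpg,
# using the untransformed predictors.
library(MASS)

# y = TRUE stores the response so boxcox() does not need to refit
fit_raw <- lm(mpg ~ hp + wt, data = mtcars, y = TRUE)
bc <- boxcox(fit_raw, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)

# Lambda with the highest log-likelihood, and an approximate 95% interval
# (lambdas within qchisq(0.95, 1) / 2 of the maximum log-likelihood)
best_lambda <- bc$x[which.max(bc$y)]
ci <- range(bc$x[bc$y > max(bc$y) - qchisq(0.95, 1) / 2])
best_lambda
ci
```

With the default `plotit = TRUE`, `boxcox(fit_raw)` draws the profile instead of returning it.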
cars.LM<-lm(mpg~hp+wt, data = mtcars)
plot(cars.LM)
Figure 11: Diagnostic plots for the linear model fitted to the raw data; the residuals show a bow-shaped curve.
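The bow-shaped residual curve in Figure 11 can also be checked numerically. This minimal sketch follows the spirit of Tukey's one-degree-of-freedom test for nonadditivity: it refits the raw model with the squared fitted values as an extra predictor. The names `fit_raw`, `mtcars_aug`, `fit_sq`, and `fit_curv` are illustrative.

```r
# Curvature check for the raw-data model: add the squared fitted values
# as a predictor. A significant coefficient on that term suggests the
# bow-shaped residual pattern reflects real curvature.
fit_raw <- lm(mpg ~ hp + wt, data = mtcars)
mtcars_aug <- transform(mtcars, fit_sq = fitted(fit_raw)^2)
fit_curv <- lm(mpg ~ hp + wt + fit_sq, data = mtcars_aug)

# Estimate, standard error, t value, and p-value for the added term
summary(fit_curv)$coefficients["fit_sq", ]
```

To show all four diagnostic panels in one figure, run `par(mfrow = c(2, 2))` before `plot(cars.LM)`. To show only the residuals-versus-fitted panel, use `plot(cars.LM, which = 1)`.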
cars.LM2<-lm(mpgLog~hpSqrt+wt, data = mtcars)
summary(cars.LM2)
##
## Call:
## lm(formula = mpgLog ~ hpSqrt + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.07654 -0.03460 -0.01095 0.02994 0.11870
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.764760 0.036694 48.094 < 2e-16 ***
## hpSqrt -0.018448 0.004191 -4.402 0.000133 ***
## wt -0.081639 0.011893 -6.864 1.53e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04658 on 29 degrees of freedom
## Multiple R-squared: 0.8786, Adjusted R-squared: 0.8703
## F-statistic: 105 on 2 and 29 DF, p-value: 5.232e-14
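The model above uses `mpgLog` and `hpSqrt` as free-standing vectors in the global environment. This works because `lm()` falls back to the formula's environment when a name is not a column of `mtcars`. A minimal alternative writes the transformations inside the formula, which ties the model to the data frame and lets `predict()` apply them to new data automatically. The name `fit_t` is illustrative. The example car (hp = 110, wt = 2.62, the Mazda RX4's values) is chosen only for illustration.

```r
# Same model as cars.LM2, with the transformations written in the formula
# so every variable is looked up in mtcars. The 0.0001 offset is omitted
# because every mpg value is positive.
fit_t <- lm(log10(mpg) ~ sqrt(hp) + wt, data = mtcars)
round(coef(fit_t), 6)

# Predictions come back on the log10 scale; 10^ converts them to mpg.
new_car <- data.frame(hp = 110, wt = 2.62)
10^predict(fit_t, newdata = new_car)
```

If the errors on the log scale are normal, back-transforming with `10^` estimates the median mpg rather than the mean.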
plot(cars.LM2)
Figure 12: Diagnostic plots for the linear model fitted to the transformed data.
We reject the null hypothesis that neither transformed horsepower nor weight is related to log miles per gallon. The overall F-test p-value (5.232e-14) is far below the 0.05 threshold, so the results are statistically significant. I can't plot the data because I own a Mac.
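A p-value is never exactly 0, so it is clearer to report the value R prints or a bound such as p < 0.0001. This sketch recomputes the overall F-test p-value of the transformed model from the components that `summary()` stores. It rebuilds the transformed variables inside an illustrative data frame `d`, using the same 0.0001 offset as above, so the result should match the summary output.

```r
# Overall F-test for the transformed model: H0 is that both slopes are 0.
d <- transform(mtcars, mpgLog = log10(mpg + 0.0001), hpSqrt = sqrt(hp))
fit2 <- lm(mpgLog ~ hpSqrt + wt, data = d)

# summary.lm stores the F statistic with elements value, numdf, dendf;
# pf() gives the upper-tail probability reported as the p-value.
fstat <- summary(fit2)$fstatistic
p_overall <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]],
                lower.tail = FALSE)
signif(p_overall, 4)
```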
I dropped the interaction term because its p-value (0.22508) is above the 0.05 significance threshold.
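# hpSqrt*wt already expands to hpSqrt + wt + hpSqrt:wt, so the explicit main effects are redundant but harmless
cars.LM3<-lm(mpgLog~hpSqrt+wt+hpSqrt*wt, data=mtcars)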
plot(cars.LM3)
Figure 13: Diagnostic plots for the linear model that adds the hpSqrt:wt interaction to test its significance.
summary(cars.LM3)
##
## Call:
## lm(formula = mpgLog ~ hpSqrt + wt + hpSqrt * wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.068373 -0.035845 -0.005321 0.033176 0.097228
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.928979 0.137283 14.051 3.31e-14 ***
## hpSqrt -0.032288 0.011904 -2.712 0.01129 *
## wt -0.138722 0.047501 -2.920 0.00683 **
## hpSqrt:wt 0.004592 0.003702 1.241 0.22508
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04615 on 28 degrees of freedom
## Multiple R-squared: 0.885, Adjusted R-squared: 0.8726
## F-statistic: 71.81 on 3 and 28 DF, p-value: 2.908e-13
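The decision to drop the interaction can also be framed as a comparison of nested models with `anova()`. This is a sketch; `d`, `additive`, and `with_int` are illustrative names. The two models differ by a single term, so the F statistic equals the square of the interaction's t value and the p-value is the same 0.225 reported in the summary.

```r
# Nested-model F-test: does adding the hpSqrt:wt interaction improve on
# the additive model?
d <- transform(mtcars, mpgLog = log10(mpg + 0.0001), hpSqrt = sqrt(hp))
additive <- lm(mpgLog ~ hpSqrt + wt, data = d)
with_int <- lm(mpgLog ~ hpSqrt * wt, data = d)

cmp <- anova(additive, with_int)
cmp
# p-value for the comparison (second row of the table)
cmp[["Pr(>F)"]][2]
```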
The multiple linear regression of log miles per gallon on square-root horsepower and weight was significant (F = 105 on 2 and 29 DF, p-value < 0.0001; Multiple R^2 = 0.8786, Adjusted R^2 = 0.8703). There is a significant negative relationship between miles per gallon (MPG) and horsepower (p-value < 0.001). There is also a significant negative relationship between MPG and weight (p-value < 0.001). The interaction term was dropped because it was not significant (p = 0.225). Miles per gallon was log-transformed and horsepower was square-root-transformed to approximate normality and homogeneity of variance of the residuals.
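The response is log10(mpg), so the slopes are easiest to read as percent changes. Holding the other predictor fixed, a one-unit increase in a predictor multiplies the fitted (geometric-mean) mpg by 10^β. A minimal sketch follows; the data frame `d` and the model `fit2` are illustrative names.

```r
# Convert log10-scale slopes into percent changes in fitted mpg.
d <- transform(mtcars, mpgLog = log10(mpg + 0.0001), hpSqrt = sqrt(hp))
fit2 <- lm(mpgLog ~ hpSqrt + wt, data = d)

slopes <- coef(fit2)[c("hpSqrt", "wt")]
pct_change <- 100 * (10^slopes - 1)
round(pct_change, 1)
```

With the coefficients reported above, this gives roughly a 17% drop in mpg per additional 1,000 lb of weight (`wt` is recorded in units of 1,000 lb). It also gives roughly a 4% drop per one-unit increase in the square root of horsepower. Both are consistent with the negative slopes.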