The Motor Trend Car Road Tests dataset (mtcars) was explored with regression analysis to answer two questions: whether fuel consumption (MPG) differs between cars with automatic and manual transmissions, and how large that difference is. The findings, supported by data visualization, showed a significant difference in fuel consumption: in the selected regression model, cars with manual transmissions outperformed those with automatic transmissions by 1.81 MPG. Limitations of the analysis were also considered.
## [1] 32 11
T-Test transmission type and MPG
## [1] 0.001373638
The t-test rejects the null hypothesis that the difference in mean MPG between transmission types is 0 (p = 0.0014).
## mean in group 0 mean in group 1
##        17.14737        24.39231
The estimated difference between the two transmission types is 7.24494 MPG in favor of manual transmission (group 1) over automatic transmission (group 0).
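The comparison above can be reproduced with a short sketch. The report does not show its call, so this assumes base R's `t.test()` with its default Welch (unequal variance) setting, applied to the built-in `mtcars` data, where `am` is 0 for automatic and 1 for manual.

```r
# Welch two-sample t-test of MPG by transmission type.
# Assumption: the report used t.test() with default settings.
data(mtcars)

tt <- t.test(mpg ~ am, data = mtcars)

tt$p.value    # p-value of the test
tt$estimate   # group means: am = 0 (automatic), am = 1 (manual)

# Difference in mean MPG, manual minus automatic
mpg_diff <- unname(tt$estimate[2] - tt$estimate[1])
mpg_diff
```

This raw difference ignores every other variable, which is why the regression models that follow report a smaller transmission effect.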
Fit the full model of the data
Since none of the coefficients in the full model has a p-value below 0.05, the full model alone cannot identify which variables are statistically significant predictors of MPG.
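A minimal sketch of such a full model is given here. It assumes the categorical columns were converted to factors before fitting, which is what would produce coefficient names such as `cyl6`, `cyl8` and `am1`; the data frame name `cars` is introduced only for illustration.

```r
# Full model: MPG regressed on every other variable in mtcars.
# Assumption: categorical columns are factor-coded before fitting.
cars <- within(mtcars, {
  cyl  <- factor(cyl)
  vs   <- factor(vs)
  am   <- factor(am)
  gear <- factor(gear)
  carb <- factor(carb)
})

full <- lm(mpg ~ ., data = cars)

# p-values of all coefficients, smallest first
p_values <- summary(full)$coefficients[, "Pr(>|t|)"]
round(sort(p_values), 4)
```

With 16 slope coefficients estimated from 32 observations, the standard errors are large, which is one plausible reason no single term stands out.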
Backward selection was used to identify the most statistically significant variables, yielding a regression model with four predictors (cylinders, horsepower, weight, transmission). Its R-squared value of 0.8659 indicates that the model explains about 87% of the variance in MPG.
The coefficients of this model were judged statistically significant at the 0.05 level. Relative to a 4-cylinder car, having 6 cylinders decreases MPG by 3.03 and having 8 cylinders decreases it by 2.16. Horsepower decreases MPG by 3.21 for every additional 100 horsepower, and weight decreases MPG by 2.5 for each additional 1000 lbs. A manual transmission improves MPG by 1.81.
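The coefficients quoted above can be checked by fitting the four-predictor model directly. This is a sketch that assumes `cyl` and `am` are factor-coded with 4 cylinders and automatic transmission as the baselines; the names `cars`, `reduced` and `effects` are illustrative.

```r
# Reduced model: cylinders, horsepower, weight, transmission.
# Assumption: cyl and am are factors (baselines: 4 cylinders, automatic).
cars <- within(mtcars, {
  cyl <- factor(cyl)
  am  <- factor(am)
})

reduced <- lm(mpg ~ cyl + hp + wt + am, data = cars)
b <- coef(reduced)

summary(reduced)$r.squared   # share of MPG variance explained

# Effects expressed on the scales used in the text
effects <- c(
  cyl6_vs_cyl4   = unname(b["cyl6"]),      # 6 cylinders relative to 4
  cyl8_vs_cyl4   = unname(b["cyl8"]),      # 8 cylinders relative to 4
  per_100_hp     = unname(b["hp"]) * 100,  # hp coefficient is per 1 hp
  per_1000_lbs   = unname(b["wt"]),        # wt is recorded in 1000 lbs
  manual_vs_auto = unname(b["am1"])
)
round(effects, 2)

# p-values for each term, for checking against the 0.05 threshold
round(summary(reduced)$coefficients[, "Pr(>|t|)"], 4)
```

Printing the p-values alongside the estimates makes it easy to verify term by term which effects clear the chosen significance level.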
A stepwise regression was used to determine the best-fit regression model. Variables were iteratively added and removed based on the Akaike Information Criterion (AIC), chosen because of the relatively small sample size. This stepwise process aimed to balance model fit against model complexity.
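One way to run such a search is base R's `step()`, sketched below. The starting model and the `direction` argument are assumptions, since the report does not show the exact call.

```r
# Stepwise selection by AIC with base R's step().
# Assumption: the search starts from the full, factor-coded model.
cars <- within(mtcars, {
  cyl  <- factor(cyl)
  vs   <- factor(vs)
  am   <- factor(am)
  gear <- factor(gear)
  carb <- factor(carb)
})

full <- lm(mpg ~ ., data = cars)

# direction = "both" lets terms be dropped and re-added at each step
selected <- step(full, direction = "both", trace = 0)

formula(selected)                               # terms that survived
c(full = AIC(full), selected = AIC(selected))   # lower AIC is preferred
```

Setting `trace = 1` instead prints the table of candidate moves at every step, in the same layout as the `step()` output reproduced later in this report.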
Several regression models were considered during the analysis. In addition to the linear regression model, polynomial regression was explored to capture potential non-linear relationships, and interaction terms were included to capture interdependent effects between predictor variables.
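The two extensions can be sketched as follows. The interaction formula uses the terms that appear in the coefficient table printed at the end of this report (`wt`, `qsec`, `am1`, `wt:am1`); the quadratic horsepower term is purely an illustrative assumption, as the report does not say which polynomial terms were tried.

```r
# Assumption: am is factor-coded (0 = automatic, 1 = manual).
cars <- within(mtcars, am <- factor(am))

# Interaction model: the weight effect may differ by transmission type
inter <- lm(mpg ~ wt + qsec + am + wt:am, data = cars)
round(coef(summary(inter)), 4)

# Polynomial model (illustrative): quadratic term in horsepower
quad <- lm(mpg ~ poly(hp, 2) + wt + am, data = cars)

# Compare the candidates on AIC; lower is preferred
AIC(inter, quad)
```

In the interaction model the `am1` coefficient is the manual-versus-automatic gap at a weight of zero, so it has to be read together with `wt:am1` rather than on its own.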
Alternative regression models, namely ridge and lasso regression, were considered to address multicollinearity and prevent overfitting. These regularization techniques proved beneficial for this dataset, which contains multiple correlated predictors.
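As a rough illustration of the ridge variant, the sketch below uses `lm.ridge()` from the MASS package, which is distributed with standard R installations. The report does not name the package or penalty values it used, so both the function choice and the `lambdas` grid are assumptions.

```r
library(MASS)  # provides lm.ridge()

# Ridge regression over a grid of penalties (grid is illustrative)
lambdas <- seq(0, 10, by = 0.5)
ridge <- lm.ridge(mpg ~ ., data = mtcars, lambda = lambdas)

# Generalized cross-validation score per lambda; smaller is better
best <- which.min(ridge$GCV)
best_lambda <- lambdas[best]
best_lambda

# Coefficients on the original scale at the chosen penalty
coef(ridge)[best, ]
```

Ridge shrinks correlated coefficients toward zero without removing any of them; lasso, which can drop terms entirely, would need a separate package and is not shown here.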
The results of the exploratory data analysis (EDA), using multiple forms of linear regression, address the questions in this report:
The comparison of “Manual” and “Automatic” transmission types with respect to miles per gallon (MPG) relied on the significance of the coefficients. The coefficients for transmission type differed significantly from zero, and their associated p-values were below the chosen significance level (e.g., 0.05). In short, manual transmission demonstrated significantly better fuel consumption, measured as MPG.
Further analysis of the “mtcars” dataset revealed a significant MPG difference between automatic and manual transmissions. The quantified difference, represented by regression coefficients, indicated a meaningful impact on fuel efficiency. The result is statistically justified, with p-values confirming the significance of the transmission type in explaining the observed MPG variations.
The consideration of various regression models, both linear and non-linear, allowed for a comprehensive evaluation of the dataset and helped ensure that the chosen model provided the best balance of explanatory power and interpretability. However, given the limited number of observations (n = 32), and a visual review showing that unique car model types (excluding the Merc and Toyota entries) make up close to 72% of the dataset, the regression results should be interpreted with caution.
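The composition of the dataset by manufacturer can be checked with a few lines. This sketch assumes the manufacturer is the first word of each row name in `mtcars` (for example "Merc 240D" gives "Merc"); the variable name `make` is illustrative.

```r
# Manufacturer = first word of each row name, e.g. "Merc 240D" -> "Merc"
make <- sub(" .*", "", rownames(mtcars))

# Number of cars per manufacturer, most frequent first
sort(table(make), decreasing = TRUE)

# Share of rows that are neither Merc nor Toyota
share_other <- mean(!make %in% c("Merc", "Toyota"))
share_other   # 0.71875, i.e. 23 of 32 cars
```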
##             Length Class  Mode
## statistic   1      -none- numeric
## parameter   1      -none- numeric
## p.value     1      -none- numeric
## conf.int    2      -none- numeric
## estimate    2      -none- numeric
## null.value  1      -none- numeric
## stderr      1      -none- numeric
## alternative 1      -none- character
## method      1      -none- character
## data.name   1      -none- character
## Start:  AIC=226.88
## hp ~ mpg + wt + drat + qsec
##
##        Df Sum of Sq   RSS    AIC
## - drat  1      94.9 28183 224.98
## - mpg   1    1519.4 29608 226.56
## <none>              28088 226.88
## - wt    1    3861.9 31950 229.00
## - qsec  1   28102.2 56190 247.06
##
## Step:  AIC=224.98
## hp ~ mpg + wt + qsec
##
##        Df Sum of Sq   RSS    AIC
## - mpg   1    1424.5 29608 224.56
## <none>              28183 224.98
## + drat  1      94.9 28088 226.88
## - wt    1    3797.9 31981 227.03
## - qsec  1   29625.1 57808 245.97
##
## Step:  AIC=224.56
## hp ~ wt + qsec
##
##        Df Sum of Sq   RSS    AIC
## <none>              29608 224.56
## + mpg   1      1425 28183 224.98
## + drat  1         0 29608 226.56
## - wt    1     43026 72633 251.28
## - qsec  1     52881 82489 255.35
##
## Call:
## lm(formula = hp ~ wt + qsec, data = mtcars)
##
## Coefficients:
## (Intercept)           wt         qsec
##      441.26        38.67       -23.47
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  9.723053  5.8990407  1.648243 0.1108925394
## wt          -2.936531  0.6660253 -4.409038 0.0001488947
## qsec         1.016974  0.2520152  4.035366 0.0004030165
## am1         14.079428  3.4352512  4.098515 0.0003408693
## wt:am1      -4.141376  1.1968119 -3.460340 0.0018085763
## [1] 0