This work analyzes vehicle fuel consumption. We want to answer two questions: 1. “Is an automatic or manual transmission better for MPG?” and 2. “Quantify the MPG difference between automatic and manual transmissions.” To do so, we built several models. One uses only the categorical transmission variable (automatic or manual), with the intercept removed, to assess mpg. The other finalist model is mpg = 34 - 4.26 * cyl6 - 6.07 * cyl8 - 3.21 * wt, where cyl6 and cyl8 are mutually exclusive binary variables; setting both to zero gives the 4-cylinder level. The wt coefficient indicates that each additional unit of weight (1000 lbs in this dataset) reduces mpg by 3.21 on average, holding the number of cylinders fixed. We conclude that the manual transmission yields on average 42% more mpg than the automatic, so the manual transmission is better for MPG.
The variable “mpg” correlates with all numeric variables (Fig1: Correlation and Boxplot for mpg by wt, gear), which shows that its variability can potentially be explained by this database. Because the explanatory variables also correlate with each other, including all of them can inflate the variance of the estimates. So here we assess which variables are the most explanatory and look for an elegant, streamlined model.
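The correlations mentioned above can be inspected with a few lines of R. This is a minimal sketch that assumes the analysis uses the built-in `mtcars` dataset (the variable names in this report match it); the choice of `wt`, `disp`, `hp` and `cyl` for the second table is only illustrative.

```r
# Sketch: correlation of mpg with every other numeric column of mtcars.
data(mtcars)

cor_mpg <- cor(mtcars)["mpg", ]
# Sort from most negative to most positive, dropping mpg itself.
round(sort(cor_mpg[names(cor_mpg) != "mpg"]), 2)

# Correlation among a few explanatory variables (illustrative selection),
# to see why using all of them together may inflate the variance.
round(cor(mtcars[, c("wt", "disp", "hp", "cyl")]), 2)
```

Strong correlations among the predictors in the second table are the reason a smaller model is preferred later in the report.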
It was necessary to convert the variables “cyl”, “gear”, “carb” and “am” to factors. After this conversion, we performed a visual analysis of consumption against some of the categorical variables in order to understand which of them has explanatory potential.
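A minimal sketch of that conversion, assuming the data are the built-in `mtcars` data frame. The copy named `cars` and the labels "automatic" and "manual" are illustrative choices, not names taken from the report; in `mtcars`, `am = 0` is automatic and `am = 1` is manual.

```r
# Sketch: convert the categorical columns of mtcars to factors.
data(mtcars)
cars <- mtcars  # hypothetical working copy

factor_cols <- c("cyl", "gear", "carb", "am")
cars[factor_cols] <- lapply(cars[factor_cols], factor)

# Give the transmission levels readable labels (0 = automatic, 1 = manual).
levels(cars$am) <- c("automatic", "manual")

str(cars[factor_cols])
```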
We found that MPG is influenced by the number of gears and by weight: performance appears to be better when the vehicle has 4 gears and weighs less than 2000 lbs. Two outliers appeared. One is a Fiat 128 with a yield of about 32 mpg that is recorded in the range above 2000 lbs; this looks like an error in the database, since the manufacturer gives a value close to 1600 lbs. The other is a Mercedes 280C whose values are correct, so it is a genuine extreme point of low fuel economy (Fig2: Vehicles considered outlier).
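The recorded values for the two vehicles can be looked up directly. This sketch assumes the built-in `mtcars` data, where the rows are named "Fiat 128" and "Merc 280C" and `wt` is expressed in units of 1000 lbs.

```r
# Sketch: inspect the two vehicles discussed as outliers.
data(mtcars)

outliers <- mtcars[c("Fiat 128", "Merc 280C"), c("mpg", "wt", "gear", "cyl")]
outliers$wt_lbs <- outliers$wt * 1000  # wt is stored in 1000 lbs
outliers
```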
MPG is also clearly influenced by the number of cylinders (Fig3: Boxplot for mpg by cyl, carb, am). Four-cylinder cars have the best performance. Cars with 5 gears generally do well too, although they do not necessarily have the best consumption, and they show little variability. So, when buying a car, give preference to a manual, 4-cylinder model with 5 gears and 2 carburetors.
So far we have learned that weight, the number of gears and the number of cylinders are the pieces of information that actually differentiate a car's performance. The strategy is therefore to use them to model mpg.
Among the models we fitted, we take fit3 as the reference and use the ANOVA tool to assess whether it is worthwhile to add variables to it. The results suggest that we should keep only the simplest model (mpg ~ cyl + wt). This model has an adjusted R² > 0.8, i.e., weight and number of cylinders alone explain the variability of vehicle performance well.
summary(fit1)$adj.r.squared; summary(fit2)$adj.r.squared; summary(fit3)$adj.r.squared
## [1] 0.7790215
## [1] 0.806939
## [1] 0.8200146
anova(fit3, fit1, fit2)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + wt
## Model 2: mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
## Model 3: mpg ~ cyl + hp + wt + gear + carb
##   Res.Df    RSS  Df Sum of Sq      F Pr(>F)
## 1     28 183.06
## 2     15 120.40  13    62.656 0.6004 0.8190
## 3     20 140.25  -5   -19.853 0.4947 0.7754
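The definitions of fit1, fit2 and fit3 are not shown in this report. One plausible reconstruction, based only on the formulas printed in the ANOVA table, is sketched below. It assumes the built-in `mtcars` data with `cyl`, `gear`, `carb` and `am` converted to factors as described earlier; the data frame name `cars` is hypothetical.

```r
# Sketch: refit the three models named in the ANOVA table.
data(mtcars)
cars <- mtcars  # hypothetical working copy
factor_cols <- c("cyl", "gear", "carb", "am")
cars[factor_cols] <- lapply(cars[factor_cols], factor)

# fit1: all available variables; fit2: intermediate; fit3: reference model.
fit1 <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
           data = cars)
fit2 <- lm(mpg ~ cyl + hp + wt + gear + carb, data = cars)
fit3 <- lm(mpg ~ cyl + wt, data = cars)

# Adjusted R-squared of each model, then the comparison against fit3.
sapply(list(fit1 = fit1, fit2 = fit2, fit3 = fit3),
       function(m) summary(m)$adj.r.squared)
anova(fit3, fit1, fit2)
```

With this coding the residual degrees of freedom are 28, 15 and 20 for fit3, fit1 and fit2, which agrees with the table above. The large p-values mean the extra variables do not significantly reduce the residual sum of squares.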
Using the Shapiro test (p-value = 0.259) we do not reject the hypothesis of normality. The Residuals vs Fitted plot (Fig4: Quality assessment model fit3) suggests a possible quadratic dependency. The Scale-Location plot stays within about 2, so it is reasonably acceptable. In the Residuals vs Leverage plot, some points “pull” on the fit and potentially influence the model, but they are still within an appropriate range. To draw any conclusion we also need a homoscedasticity test. The Breusch-Pagan test gave p-value = 0.06037. How to read this depends on the significance level the researcher accepts in the working hypothesis. For this kind of analysis the conventional “magic number” of 5% is reasonable, so the result is inconclusive, because 6% is very close to 5%. In this case we should increase the sample size to check the consistency of the constant-variance test, or consider studying vehicle categories separately. In any case, it does not seem reasonable to claim that the error variance is inflated in this model, since the Residuals vs Fitted plot shows no “cone” shape. So we can consider the model minimally reasonable and good enough to explain the variability of vehicle performance using only wt and cyl.
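The two tests above can be sketched in base R. The report does not say which function produced the Breusch-Pagan p-value, so the version below is an assumption: it computes Koenker's studentized form by hand (n times the R² of an auxiliary regression of the squared residuals on the same predictors) and may not reproduce the reported value exactly. The names `cars`, `res2` and `aux` are hypothetical.

```r
# Sketch: normality and constant-variance checks for mpg ~ cyl + wt.
data(mtcars)
cars <- mtcars  # hypothetical working copy
cars$cyl <- factor(cars$cyl)
fit3 <- lm(mpg ~ cyl + wt, data = cars)

# Shapiro-Wilk test on the residuals (null hypothesis: normality).
shapiro.test(residuals(fit3))

# Studentized Breusch-Pagan statistic computed by hand:
# regress the squared residuals on the predictors, then use n * R^2.
cars$res2 <- residuals(fit3)^2
aux <- lm(res2 ~ cyl + wt, data = cars)
bp_stat <- nrow(cars) * summary(aux)$r.squared
bp_df <- length(coef(aux)) - 1  # number of predictors excluding the intercept
bp_p <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)
c(statistic = bp_stat, df = bp_df, p.value = bp_p)
```

A small p-value would indicate that the residual variance depends on the predictors; a value near the chosen threshold, as reported above, leaves the question open.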
What we want to answer is whether manual and automatic transmissions give noticeably different mpg. We therefore fitted a specific model for this question. The model has no intercept, so that each coefficient is the average mpg of one transmission type and the two can be compared directly. The coefficients are 17.14737 for automatic and 24.39231 for manual, which is a large difference. Computing 24.39231 / 17.14737 = 1.422510274, i.e., the manual transmission yields on average 42% more mpg than the automatic. With a numerical difference this large, together with the visual information in the appendix, we consider that a hypothesis test is not necessary to conclude that mpg behaves quite differently between the two transmission types. Therefore the manual transmission is better for MPG, and the 42% figure quantifies the difference in fuel economy. If we want an explanatory model for mpg, we can use the relationship mpg = 34 - 4.26 * cyl6 - 6.07 * cyl8 - 3.21 * wt, where cyl6 and cyl8 are mutually exclusive binary variables; setting both to zero gives the 4-cylinder level.
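The no-intercept comparison can be sketched as follows, assuming the built-in `mtcars` data with `am` coded 0 = automatic and 1 = manual. The names `cars`, `fit_am`, `ratio` and `diff_mpg` are illustrative and do not come from the report.

```r
# Sketch: mpg by transmission type, without an intercept.
data(mtcars)
cars <- mtcars  # hypothetical working copy
cars$am <- factor(cars$am, levels = c(0, 1),
                  labels = c("automatic", "manual"))

# Without an intercept, each coefficient is the mean mpg of one group.
fit_am <- lm(mpg ~ am - 1, data = cars)
coefs <- coef(fit_am)
coefs

# Relative and absolute difference between manual and automatic.
ratio <- unname(coefs["ammanual"] / coefs["amautomatic"])
diff_mpg <- unname(coefs["ammanual"] - coefs["amautomatic"])
c(ratio = ratio, difference = diff_mpg)
```

Using the coefficients quoted above, the absolute difference is 24.39231 - 17.14737 = 7.24494 mpg. This model ignores weight and cylinders, so it describes the raw group averages only.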
Fig1: Correlation and Boxplot for mpg by wt, gear.
Fig2: Vehicles considered outlier.
Fig3: Boxplot for mpg by cyl, carb, am.
Fig4: Quality assessment model fit3.