The Regression Models Course Project studies the Motor Trend Car Road Tests dataset, which can be loaded directly from R's built-in datasets. The data was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
Description of variables:
mpg Miles/(US) gallon: how far a car can travel
cyl Number of cylinders
disp Displacement (in cubic inches): the total volume of the engine's cylinders; a good proxy for the amount of power the engine can generate.
hp Gross horsepower(HP)
drat Rear axle ratio: indicates the ratio of the angular velocity of the input gear to the angular velocity of the output gear. A vehicle with a high ratio would provide more torque and thus more towing capability
wt Weight (1000 lbs.)
qsec 1/4 mile time. A performance measure, primarily of acceleration from standstill
am Transmission (0 = automatic, 1 = manual)
gear Number of forward gears
carb Number of carburetors. A carburetor is a device that blends air and fuel for an internal combustion engine
Our main goal is to explore the relationship between a set of variables (predictors) and miles per gallon (outcome). Specifically, we have been asked to answer the following research questions: (1) is an automatic or manual transmission better for MPG, and (2) how large is the MPG difference between automatic and manual transmissions?
library(ggplot2) #R graphics library
library(gridExtra) #Arranges multiple plots in a grid
library(dplyr) #Useful for data transformation
library(plyr) #Useful for data transformation (masks some dplyr functions; handled below)
library(car) #Includes useful statistical functions
library(knitr) #Provides kable() for formatted tables
library(MASS) #Another package for statistics
library(effects) #Package for visualizing model effects (especially regression models)
library(psych) #Useful for correlation tests
library(jtools) #Useful for interaction plots
library(modelr) #Useful for obtaining performance metrics
library(broom) #Converts model output into tidy data frames
#Data can be loaded from the built-in dataset "mtcars". We assign this data frame to another variable called data.cars:
data.cars <- mtcars
head(data.cars,5)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Firstly, we start exploring the dataset:
str(data.cars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
summary(data.cars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
We have just identified that all of our variables are stored as numeric values. However, according to our summarized data, some variables could be treated as factors: cyl, vs, am, gear and carb.
It is important to note that every row of our dataset is a distinct vehicle model. The models vary considerably in engine capacity: for example, gross horsepower has a mean of 146.7, with a minimum of 52 and a maximum of 335. In terms of MPG, the overall sample averages approximately 20 miles per gallon; since the median (19.20) is nearly equal to the mean, we might expect the data to be approximately normally distributed.
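As a quick check on that normality hunch (a sketch added here, using only base R), we can run a Shapiro-Wilk test on mpg:
#Shapiro-Wilk normality test; a large p-value means normality is not rejected
shapiro.test(data.cars$mpg)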
Looking at one of our main predictors, AM, we can see that the variable is binary: the “0” value corresponds to automatic transmission and “1” to manual transmission. So we proceed to give this variable a more meaningful representation:
data.cars$am <- as.factor(data.cars$am)
data.cars$am <- revalue(data.cars$am, c("0"="Automatic", "1"="Manual"))
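For reference, an equivalent one-step alternative in base R (a sketch; it reads the untouched numeric column from mtcars, so it would produce the same factor):
#Base-R equivalent of the two steps above
#data.cars$am <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))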
We can clearly see from our summary that some variables contain outliers. It will be useful to see how variable our data is, looking at the outliers and at the differences between “AM” groups by creating boxplots:
mpg <- ggplot(data.cars, aes(x = am, y = mpg)) +
  geom_boxplot(fill = "cornflowerblue") + xlab("Transmission") + ylab("mpg") +
  theme(plot.title = element_text(hjust = 0.5, size = 15, face = "italic")) +
  labs(title = "Fuel Efficiency (mpg)")
disp <- ggplot(data.cars, aes(x = am, y = disp)) +
  geom_boxplot(fill = "gold") + xlab("Transmission") + ylab("disp") +
  theme(plot.title = element_text(hjust = 0.5, size = 15, face = "italic")) +
  labs(title = "Displacement in Cubic Inches (disp) Boxplot")
hp <- ggplot(data.cars, aes(x = am, y = hp)) +
  geom_boxplot(fill = "darkseagreen1") + xlab("Transmission") + ylab("hp") +
  theme(plot.title = element_text(hjust = 0.5, size = 15, face = "italic")) +
  labs(title = "Engine's Power Output (hp) Boxplot")
drat <- ggplot(data.cars, aes(x = am, y = drat)) +
  geom_boxplot(fill = "lightblue3") + xlab("Transmission") + ylab("drat") +
  theme(plot.title = element_text(hjust = 0.5, size = 15, face = "italic")) +
  labs(title = "Rear Axle Ratio (drat) Boxplot")
wt <- ggplot(data.cars, aes(x = am, y = wt)) +
  geom_boxplot(fill = "khaki2") + xlab("Transmission") + ylab("wt") +
  theme(plot.title = element_text(hjust = 0.5, size = 15, face = "italic")) +
  labs(title = "Weight in 1,000 lbs (wt) Boxplot")
qsec <- ggplot(data.cars, aes(x = am, y = qsec)) +
  geom_boxplot(fill = "lightslateblue") + xlab("Transmission") + ylab("qsec") +
  theme(plot.title = element_text(hjust = 0.5, size = 15, face = "italic")) +
  labs(title = "Acceleration Performance in Seconds (qsec) Boxplot")
grid.arrange(mpg,disp,hp,drat,wt,qsec)
Judging by the variability (the size of the boxes) of every variable, we do see a clear difference between the two transmission types, which is largely due to the mix of car models that happened to be sampled. Moreover, we can see an unbalanced design with respect to those variables:
detach("package:dplyr", character.only = TRUE)
library(dplyr)
data.cars.no <- data.cars %>%
group_by(am) %>%
summarize(n = n())
data.cars.type <- data.cars %>%
mutate(car.model = row.names(data.cars)) %>%
filter(disp > 300 | hp > 200) %>%
select(car.model, disp, hp, am) %>%
arrange(desc(as.character(am)))
print(data.cars.no)
## # A tibble: 2 x 2
## am n
## <fct> <int>
## 1 Automatic 19
## 2 Manual 13
print(data.cars.type)
## car.model disp hp am
## 1 Ford Pantera L 351 264 Manual
## 2 Maserati Bora 301 335 Manual
## 3 Hornet Sportabout 360 175 Automatic
## 4 Duster 360 360 245 Automatic
## 5 Cadillac Fleetwood 472 205 Automatic
## 6 Lincoln Continental 460 215 Automatic
## 7 Chrysler Imperial 440 230 Automatic
## 8 Dodge Challenger 318 150 Automatic
## 9 AMC Javelin 304 150 Automatic
## 10 Camaro Z28 350 245 Automatic
## 11 Pontiac Firebird 400 175 Automatic
The boxplots show that median fuel efficiency (MPG) is higher for manual transmission than for automatic transmission. There also appears to be more variation among manuals (roughly from 23 to 31 MPG); so automatics are more consistent in terms of MPG, but at a lower level than manuals.
However, this is not the case for overall power: the automatic group shows larger engines in terms of both displacement and horsepower. For displacement the gap is substantial, with medians of roughly 275 cubic inches (automatic) versus 120 cubic inches (manual), a difference of about 165 cubic inches.
In terms of drivetrain gearing (rear axle ratio, drat), manual transmissions show noticeably higher values.
There is no large difference in acceleration performance (qsec), or in its variation, between automatic and manual transmissions, with medians of around 17.9 and 17 seconds respectively.
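To put the group comparison on a statistical footing, a Welch two-sample t-test of MPG by transmission type is a natural check (a sketch added here, beyond the original boxplot inspection):
#Group medians of mpg, then a Welch two-sample t-test of the group difference
tapply(data.cars$mpg, data.cars$am, median)
t.test(mpg ~ am, data = data.cars)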
Based on these observations, we will now proceed to answer our research questions, including whether the transmission type AM acts as a confounder when fitting a regression model to establish the difference in MPG between manual and automatic types.
Now that we have a clear idea of the Motor Trend Car Road Tests dataset, our next step is to construct a regression model in order to answer our research questions. We proceed to build a new data frame, “data.cars.mod”:
data.cars.mod <- data.cars %>%
select(mpg, disp, hp, wt, qsec, am)
First of all, we should get to know the relationships between our selected variables. We can do this with a scatterplot matrix from the “psych” package:
pairs.panels(data.cars.mod[c("mpg", "disp", "hp", "wt", "qsec")])
#What is interesting about this package is that it also displays the correlation strength and the distribution of each variable.
From the above matrix, there appears to be a linear relationship between MPG and the rest of the variables. Moreover, our outcome variable MPG is approximately normally distributed, which is good for fitting a model. We can also see that the hp predictor is highly skewed, while qsec and wt look roughly normal. This is not the case for disp, which does not appear to be normally distributed.
Stretched ovals indicate a strong correlation between variables. Our outcome MPG appears related to each predictor variable, which is good for fitting a model. However, highly correlated predictor pairs such as disp-hp, wt-disp and hp-qsec may cause multicollinearity problems in our model.
predict.cor <- cor(data.cars.mod[c("mpg", "disp", "hp", "wt", "qsec")])
kable(predict.cor, format = 'markdown')
|      | mpg        | disp       | hp         | wt         | qsec       |
|------|------------|------------|------------|------------|------------|
| mpg  | 1.0000000  | -0.8475514 | -0.7761684 | -0.8676594 | 0.4186840  |
| disp | -0.8475514 | 1.0000000  | 0.7909486  | 0.8879799  | -0.4336979 |
| hp   | -0.7761684 | 0.7909486  | 1.0000000  | 0.6587479  | -0.7082234 |
| wt   | -0.8676594 | 0.8879799  | 0.6587479  | 1.0000000  | -0.1747159 |
| qsec | 0.4186840  | -0.4336979 | -0.7082234 | -0.1747159 | 1.0000000  |
Effectively, the matrix confirms the high correlations between the variables previously mentioned. If we fit a preliminary model, we obtain the following regression:
fit.cars.1 <- lm( mpg ~ am + disp + hp + wt + qsec, data = data.cars.mod)
summary(fit.cars.1)
##
## Call:
## lm(formula = mpg ~ am + disp + hp + wt + qsec, data = data.cars.mod)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5399 -1.7398 -0.3196 1.1676 4.5534
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.36190 9.74079 1.474 0.15238
## amManual 3.47045 1.48578 2.336 0.02749 *
## disp 0.01124 0.01060 1.060 0.29897
## hp -0.02117 0.01450 -1.460 0.15639
## wt -4.08433 1.19410 -3.420 0.00208 **
## qsec 1.00690 0.47543 2.118 0.04391 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.429 on 26 degrees of freedom
## Multiple R-squared: 0.8637, Adjusted R-squared: 0.8375
## F-statistic: 32.96 on 5 and 26 DF, p-value: 1.844e-10
Automatic.1 <- fit.cars.1$coefficients[1]
Manual.1 <- fit.cars.1$coefficients[1] + fit.cars.1$coefficients[2]
Difference.1 <- fit.cars.1$coefficients[2]
cat("Automatic Transmission - MPG (Average):", Automatic.1)
## Automatic Transmission - MPG (Average): 14.3619
cat("Manual Transmission - MPG (Average):", Manual.1)
## Manual Transmission - MPG (Average): 17.83236
cat("Difference in AM Transmission - MPG (Average):", Difference.1)
## Difference in AM Transmission - MPG (Average): 3.470453
Our model yields significant results: the Multiple R-squared of 86.37% tells us how much of the variation in the outcome MPG is explained by the selected predictors; more important is the Adjusted R-squared (83.75%), since it penalizes the addition of extra predictors to the model.
All else held constant, a 1,000 lbs. increase in weight (wt) is associated with a 4.08 decrease in miles per gallon (MPG).
All else held constant, the estimated average MPG for automatic transmission (the intercept) is 14.36, whereas for manual transmission it is 17.83, a difference of 3.47 MPG between the two AM types.
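Before moving on, and since the car package is already loaded, we can quantify the multicollinearity concern raised earlier with variance inflation factors (a quick sketch; values above roughly 5-10 are commonly taken to flag problematic collinearity):
#Variance inflation factors for the preliminary model (car package)
vif(fit.cars.1)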
We will now run a stepwise procedure that iteratively adds and removes predictors in order to find the subset of variables that yields the best-performing model (lowest prediction error). This step will also show whether our concern about correlated predictors was justified. The key is to find the model that minimizes the AIC (Akaike Information Criterion, an estimator of the relative quality of statistical models):
AIC = 2k - 2 ln(L)
where,
k = number of estimated parameters
L = maximized value of the model's likelihood function
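To make the criterion concrete, we can compute AIC from this definition for the preliminary model and compare it with R's built-in functions (a sketch; note that stepAIC() internally uses extractAIC(), which drops additive constants, so the values it prints differ from AIC() by a fixed offset):
#AIC from the definition: k counts the coefficients plus the residual variance
k <- length(coef(fit.cars.1)) + 1
2 * k - 2 * as.numeric(logLik(fit.cars.1)) #manual AIC
AIC(fit.cars.1) #built-in, should agree with the manual value
extractAIC(fit.cars.1) #the scale reported by stepAIC()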
#We can run a stepwise process using the "MASS" package. Setting direction = "both" combines FORWARD and BACKWARD selection to find the best fit.
step.cars <- stepAIC(fit.cars.1, direction ="both")
## Start: AIC=62.16
## mpg ~ am + disp + hp + wt + qsec
##
## Df Sum of Sq RSS AIC
## - disp 1 6.629 160.07 61.515
## <none> 153.44 62.162
## - hp 1 12.572 166.01 62.682
## - qsec 1 26.470 179.91 65.255
## - am 1 32.198 185.63 66.258
## - wt 1 69.043 222.48 72.051
##
## Step: AIC=61.52
## mpg ~ am + hp + wt + qsec
##
## Df Sum of Sq RSS AIC
## - hp 1 9.219 169.29 61.307
## <none> 160.07 61.515
## + disp 1 6.629 153.44 62.162
## - qsec 1 20.225 180.29 63.323
## - am 1 25.993 186.06 64.331
## - wt 1 78.494 238.56 72.284
##
## Step: AIC=61.31
## mpg ~ am + wt + qsec
##
## Df Sum of Sq RSS AIC
## <none> 169.29 61.307
## + hp 1 9.219 160.07 61.515
## + disp 1 3.276 166.01 62.682
## - am 1 26.178 195.46 63.908
## - qsec 1 109.034 278.32 75.217
## - wt 1 183.347 352.63 82.790
print(step.cars$anova)
## Stepwise Model Path
## Analysis of Deviance Table
##
## Initial Model:
## mpg ~ am + disp + hp + wt + qsec
##
## Final Model:
## mpg ~ am + wt + qsec
##
##
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 26 153.4378 62.1619
## 2 - disp 1 6.628654 27 160.0665 61.5153
## 3 - hp 1 9.219469 28 169.2859 61.3073
Two predictors were dropped during the stepwise process because their removal lowers the AIC: disp and hp. Searching in both directions, the function returns the final model:
mpg ~ am + wt + qsec
fit.cars.2 <- lm( mpg ~ am + qsec + wt , data = data.cars.mod)
summary(fit.cars.2)
##
## Call:
## lm(formula = mpg ~ am + qsec + wt, data = data.cars.mod)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4811 -1.5555 -0.7257 1.4110 4.6610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.6178 6.9596 1.382 0.177915
## amManual 2.9358 1.4109 2.081 0.046716 *
## qsec 1.2259 0.2887 4.247 0.000216 ***
## wt -3.9165 0.7112 -5.507 6.95e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
Automatic.2 <- fit.cars.2$coefficients[1]
Manual.2 <- fit.cars.2$coefficients[1] + fit.cars.2$coefficients[2]
Difference.2 <- fit.cars.2$coefficients[2]
cat("Automatic Transmission - MPG (Average):", Automatic.2)
## Automatic Transmission - MPG (Average): 9.617781
cat("Manual Transmission - MPG (Average):", Manual.2)
## Manual Transmission - MPG (Average): 12.55362
cat("Difference in AM Transmission - MPG (Average):", Difference.2)
## Difference in AM Transmission - MPG (Average): 2.935837
Our Adjusted R-squared is now 83.36%, nearly equal to the first model's. The coefficient estimates are now different since the model equation has changed as well:
All else held constant, the estimated average MPG for automatic transmission is 9.61, whereas for manual transmission it is 12.55, a difference of 2.93 MPG between the two AM types.
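To quantify the uncertainty around that difference, a 95% confidence interval for the amManual coefficient can be obtained with confint() (a short sketch):
#95% CI for the manual-vs-automatic MPG difference in the final model
confint(fit.cars.2, "amManual", level = 0.95)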
Effect plots:
effects.2 <- allEffects(fit.cars.2)
plot(effects.2, multiline = TRUE)
So manual transmission shows the higher estimated MPG. Moreover, we can see graphically the positive effect that qsec has on MPG and the negative effect that wt has on MPG.
Having detected the significant variables, it is worth checking whether the predictors wt and qsec interact with the categorical AM predictor:
#wt and transmission
fit.wt <- lm(mpg ~ wt*am, data = data.cars.mod)
interact_plot(fit.wt, pred = "wt", modx = "am") # the interaction is significant
#qsec and am
fit.qsec <- lm(mpg ~ qsec*am, data = data.cars.mod)
interact_plot(fit.qsec, pred = "qsec", modx = "am") # the interaction is weak
fit.cars.3 <- lm(mpg ~ am*wt + qsec, data = data.cars.mod)
summary(fit.cars.3)
##
## Call:
## lm(formula = mpg ~ am * wt + qsec, data = data.cars.mod)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5076 -1.3801 -0.5588 1.0630 4.3684
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.723 5.899 1.648 0.110893
## amManual 14.079 3.435 4.099 0.000341 ***
## wt -2.937 0.666 -4.409 0.000149 ***
## qsec 1.017 0.252 4.035 0.000403 ***
## amManual:wt -4.141 1.197 -3.460 0.001809 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.084 on 27 degrees of freedom
## Multiple R-squared: 0.8959, Adjusted R-squared: 0.8804
## F-statistic: 58.06 on 4 and 27 DF, p-value: 7.168e-13
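The weight at which the two lines in the wt interaction plot cross can be read directly off the fit.wt coefficients (a sketch; the coefficient names assume am was recoded to Automatic/Manual as above):
#Crossover point: the weight where the manual-vs-automatic gap vanishes,
#i.e. solve amManual + wt:amManual * wt = 0 for wt
b <- coef(fit.wt)
unname(-b["amManual"] / b["wt:amManual"]) #in 1,000 lbs; about 2.8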
The interaction plots show a marked change in the weight (wt) slope between manual and automatic transmissions, while the interaction with qsec is weak. Interestingly enough, when weight is around 2,800 lbs., both transmission types perform similarly. We run an ANOVA test to assess the significance of the interaction:
anova(fit.cars.2, fit.cars.3)
## Analysis of Variance Table
##
## Model 1: mpg ~ am + qsec + wt
## Model 2: mpg ~ am * wt + qsec
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 28 169.29
## 2 27 117.28 1 52.01 11.974 0.001809 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The models are compared through the significance of the added term and through their RSS (residual sum of squares), which evaluates the overall quality of the fitted model. Model 2 (mpg ~ am * wt + qsec) thus fits significantly better than the original Model 1 (mpg ~ am + wt + qsec), with p ≈ 0.0018.
We have to take into account that a linear regression model makes some assumptions about the data: linearity, normality of the residuals, absence of multicollinearity, and homogeneity of the residual variance (constant variance, or homoscedasticity). We have already assessed the multicollinearity issue in previous calculations.
ASSESSING Model 1: mpg ~ am + wt + qsec
#using the car package we can plot four diagnostic plots: a) Residuals vs Fitted, b) Normal QQ plot, c) Scale-location and d) Residuals vs Leverage
par(mfrow = c(2,2))
plot(fit.cars.2, main = "mpg ~ am + wt + qsec")
ASSESSING Model 2: mpg ~ am * wt + qsec
par(mfrow = c(2,2))
plot(fit.cars.3, main = "mpg ~ am * wt + qsec")
Residuals vs Fitted: ideally the fitted line should be nearly horizontal, indicating no systematic pattern left between the residuals and the fitted values.
Normal Q-Q: the residuals appear to be approximately normally distributed.
Scale-Location: by standardizing the residuals we can check for heteroscedasticity; ideally the line should be horizontal.
Residuals vs Leverage: using Cook's distance we can identify high-leverage points and potential outliers.
Model 2 seems to fit noticeably better than Model 1, as can be appreciated in the Residuals vs Fitted plot.
In order to improve the model, it is worth taking a look at the potential outliers and deciding whether we should include them in our predictions. For example, in MODEL 1 the Fiat 128 and the Chrysler Imperial lie near the Cook's distance boundary, whereas in MODEL 2 the Maserati Bora appears as well, alongside the Chrysler Imperial and the Fiat 128.
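Beyond visual inspection, the car package offers formal versions of these checks; a brief sketch applied to Model 2:
#Formal diagnostics from the car package (already loaded)
ncvTest(fit.cars.3) #score test for non-constant error variance
outlierTest(fit.cars.3) #Bonferroni test on the largest studentized residual
influencePlot(fit.cars.3) #combines leverage, residuals and Cook's distance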
After the diagnostics stage, we now assess each model's accuracy using the following metrics:
R-Squared: proportion of variation in the outcome that is explained by the predictor variables
RMSE: Root Mean Squared Error, which measures the average error in predicting the outcome: the typical difference between the observed outcome values and the predicted values.
MAE: Mean Absolute Error, the average absolute difference between observed and predicted outcomes; it is less sensitive to outliers.
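For reference, on the training data these metrics reduce to simple functions of the residuals; a base-R sketch for Model 1, which should match the modelr values computed below:
#RMSE and MAE computed directly from Model 1's residuals
res <- residuals(fit.cars.2)
sqrt(mean(res^2)) #RMSE
mean(abs(res)) #MAE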
m1 <- data.frame(
MODEL = "mpg ~ am + wt + qsec",
R2 = rsquare(fit.cars.2, data = data.cars.mod ),
RMSE = rmse(fit.cars.2, data = data.cars.mod),
MAE = mae(fit.cars.2, data = data.cars.mod) )
m2 <- data.frame(
MODEL = "mpg ~ am * wt + qsec",
R2 = rsquare(fit.cars.3, data = data.cars.mod ),
RMSE = rmse(fit.cars.3, data = data.cars.mod),
MAE = mae(fit.cars.3, data = data.cars.mod) )
metrics <- rbind(m1,m2)
kable(metrics, format = 'markdown')
| MODEL                | R2        | RMSE     | MAE      |
|----------------------|-----------|----------|----------|
| mpg ~ am + wt + qsec | 0.8496636 | 2.300040 | 1.931954 |
| mpg ~ am * wt + qsec | 0.8958514 | 1.914389 | 1.580804 |
From the above results, Model 2 (mpg ~ am * wt + qsec) performs better than Model 1, since both of its error metrics are lower. Its higher R-squared also indicates that about 89.6% of the variation in MPG is explained by the predictor variables.
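Because these metrics are computed on the same data used to fit the models, they can be optimistic. With only 32 rows, a leave-one-out cross-validation comparison is cheap; the helper below (loocv.rmse, a name introduced here for illustration, assuming mpg is the response) is a base-R sketch:
#Leave-one-out cross-validated RMSE (illustrative helper; assumes outcome mpg)
loocv.rmse <- function(formula, data) {
  errs <- sapply(seq_len(nrow(data)), function(i) {
    fit <- lm(formula, data = data[-i, ])
    data$mpg[i] - predict(fit, newdata = data[i, , drop = FALSE])
  })
  sqrt(mean(errs^2))
}
loocv.rmse(mpg ~ am + wt + qsec, data.cars.mod)
loocv.rmse(mpg ~ am * wt + qsec, data.cars.mod)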
It is also helpful to see how the predicted values compare with the actual values for both Model 1 and Model 2:
par(mfrow = c(1,2))
plot(predict(fit.cars.2), data.cars$mpg, col = "Blue", cex = 0.6,
xlab="predicted",ylab="actual", main = "MODEL 1: mpg ~ am + wt + qsec")
plot(predict(fit.cars.3), data.cars$mpg, col = "Red", cex = 0.6,
xlab="predicted",ylab="actual", main = "MODEL 2: mpg ~ am*wt + qsec")
In conclusion, both models perform well in predicting MPG. However, MODEL 2 shows smaller prediction errors, so its predictions are more accurate.