The mtcars dataset contains 32 different car models (rows) and 11 different variables (columns).
hist(mtcars$mpg, main="Miles per Gallon", xlab="mpg")
hist(mtcars$wt, main="Weight", xlab="Weight (1000 lbs)")
hist(mtcars$hp, main="Horsepower", xlab="hp")
hist(mtcars$cyl, main="Cylinders", xlab="cyl")
hist(mtcars$qsec, main="Quarter-mile time", xlab="qsec")
hist(mtcars$disp, main="Displacement", xlab="disp")
plot(mtcars$wt, mtcars$mpg, main="mpg vs Weight", xlab="Weight (1000 lbs)", ylab="mpg")
plot(mtcars$hp, mtcars$mpg, main="mpg vs Horsepower", xlab="Horsepower", ylab="mpg")
plot(mtcars$cyl, mtcars$mpg, main="mpg vs Cylinders", xlab="Cylinders", ylab="mpg")
cor(mtcars)
##             mpg        cyl       disp         hp        drat         wt
## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684  0.68117191 -0.8676594
## cyl  -0.8521620  1.0000000  0.9020329  0.8324475 -0.69993811  0.7824958
## disp -0.8475514  0.9020329  1.0000000  0.7909486 -0.71021393  0.8879799
## hp   -0.7761684  0.8324475  0.7909486  1.0000000 -0.44875912  0.6587479
## drat  0.6811719 -0.6999381 -0.7102139 -0.4487591  1.00000000 -0.7124406
## wt   -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065  1.0000000
## qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476 -0.1747159
## vs    0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846 -0.5549157
## am    0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113 -0.6924953
## gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013 -0.5832870
## carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980  0.4276059
##             qsec         vs          am       gear        carb
## mpg   0.41868403  0.6640389  0.59983243  0.4802848 -0.55092507
## cyl  -0.59124207 -0.8108118 -0.52260705 -0.4926866  0.52698829
## disp -0.43369788 -0.7104159 -0.59122704 -0.5555692  0.39497686
## hp   -0.70822339 -0.7230967 -0.24320426 -0.1257043  0.74981247
## drat  0.09120476  0.4402785  0.71271113  0.6996101 -0.09078980
## wt   -0.17471588 -0.5549157 -0.69249526 -0.5832870  0.42760594
## qsec  1.00000000  0.7445354 -0.22986086 -0.2126822 -0.65624923
## vs    0.74453544  1.0000000  0.16834512  0.2060233 -0.56960714
## am   -0.22986086  0.1683451  1.00000000  0.7940588  0.05753435
## gear -0.21268223  0.2060233  0.79405876  1.0000000  0.27407284
## carb -0.65624923 -0.5696071  0.05753435  0.2740728  1.00000000
cor(mtcars$mpg, mtcars)
##      mpg       cyl       disp         hp      drat         wt     qsec
## [1,]   1 -0.852162 -0.8475514 -0.7761684 0.6811719 -0.8676594 0.418684
##             vs        am      gear       carb
## [1,] 0.6640389 0.5998324 0.4802848 -0.5509251
I was struck by how strong the negative correlation between cylinders and miles per gallon is (r = -0.85). This is not something I had previously considered as a factor in vehicle gas mileage. I was also interested to see that weight and mpg are even more strongly negatively correlated (r = -0.87).
Weight (r = -0.87), cylinder count (-0.85), engine displacement (-0.85), and horsepower (-0.78) are the variables most strongly correlated with mpg. All four reflect engine size and overall vehicle weight, which largely determine how much fuel is needed to move a vehicle.
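The ranking described above can be checked directly rather than read off the matrix by eye. The following is a minimal sketch; the names `cars`, `mpg_cor`, and `ranked` are illustrative and do not appear in the original analysis.

```r
# Rank every predictor by the strength of its correlation with mpg.
cars <- datasets::mtcars                      # untouched copy of the built-in data
mpg_cor <- cor(cars)[, "mpg"]                 # correlation of each column with mpg
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]   # drop mpg's correlation with itself
ranked <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
print(round(ranked, 2))
```

Sorting by absolute value keeps strongly negative predictors such as wt and cyl at the top instead of at the bottom of the list.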
sum(is.na(mtcars))
## [1] 0
summary(mtcars)
##       mpg             cyl             disp             hp
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
##       drat             wt             qsec             vs
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000
##        am              gear            carb
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000
##  Median :0.0000   Median :4.000   Median :2.000
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
The mtcars dataset has no missing values, since sum(is.na(mtcars)) returns 0. There appear to be no major inconsistencies either: summary(mtcars) shows plausible ranges for every variable, and str(mtcars) shows that all 11 columns are numeric.
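The consistency checks above can also be written as explicit tests instead of being read off the printed output. This is a rough sketch under the assumption that "consistent" means no missing values, all-numeric columns, 0/1 coding for vs and am, and strictly positive physical measurements; the variable names are illustrative.

```r
# Explicit data-quality checks on an untouched copy of mtcars.
cars <- datasets::mtcars
na_per_column <- colSums(is.na(cars))                      # missing values per variable
all_numeric <- all(vapply(cars, is.numeric, logical(1)))   # every column stored as numeric
binary_ok <- all(cars$vs %in% c(0, 1)) && all(cars$am %in% c(0, 1))
positive_ok <- all(cars[, c("mpg", "disp", "hp", "wt", "qsec")] > 0)
print(na_per_column)
print(c(all_numeric = all_numeric, binary_ok = binary_ok, positive_ok = positive_ok))
```

Counting missing values per column, rather than only in total, shows where a problem sits if the total is ever nonzero.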
regress <- lm(mpg ~., data = mtcars)
summary(regress)
##
## Call:
## lm(formula = mpg ~ ., data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4506 -1.6044 -0.1196 1.2193 4.6271
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.30337 18.71788 0.657 0.5181
## cyl -0.11144 1.04502 -0.107 0.9161
## disp 0.01334 0.01786 0.747 0.4635
## hp -0.02148 0.02177 -0.987 0.3350
## drat 0.78711 1.63537 0.481 0.6353
## wt -3.71530 1.89441 -1.961 0.0633 .
## qsec 0.82104 0.73084 1.123 0.2739
## vs 0.31776 2.10451 0.151 0.8814
## am 2.52023 2.05665 1.225 0.2340
## gear 0.65541 1.49326 0.439 0.6652
## carb -0.19942 0.82875 -0.241 0.8122
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.65 on 21 degrees of freedom
## Multiple R-squared: 0.869, Adjusted R-squared: 0.8066
## F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07
plot(regress)
est <- predict(regress, mtcars)
mse <- mean((mtcars$mpg - est)^2)
mse
## [1] 4.609201
interact <- lm(mpg ~ wt * am + ., data = mtcars)
summary(interact)
##
## Call:
## lm(formula = mpg ~ wt * am + ., data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0807 -1.4803 -0.4741 1.3226 4.5850
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.070807 17.191096 0.120 0.90532
## wt -3.030663 1.712406 -1.770 0.09200 .
## am 13.334354 4.663324 2.859 0.00969 **
## cyl 0.225567 0.942201 0.239 0.81323
## disp 0.004384 0.016328 0.269 0.79105
## hp -0.006131 0.020359 -0.301 0.76643
## drat 0.359683 1.469370 0.245 0.80911
## qsec 1.109647 0.662235 1.676 0.10938
## vs -0.077414 1.884792 -0.041 0.96764
## gear 1.108383 1.344775 0.824 0.41954
## carb -0.090545 0.740919 -0.122 0.90395
## wt:am -4.137886 1.640318 -2.523 0.02023 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.365 on 20 degrees of freedom
## Multiple R-squared: 0.9006, Adjusted R-squared: 0.846
## F-statistic: 16.48 on 11 and 20 DF, p-value: 1.081e-07
boxplot(mtcars$wt, main="Boxplot of Weight")
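library(DescTools)  # assumed source of Winsorize(); the original does not show which package was loaded
mtcars$wt_wins <- Winsorize(mtcars$wt)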
summary(mtcars$wt_wins)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.736 2.581 3.325 3.222 3.610 5.293
model_winsorized <- lm(mpg ~ . - wt + wt_wins, data = mtcars)
summary(model_winsorized)
##
## Call:
## lm(formula = mpg ~ . - wt + wt_wins, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6280 -1.4760 -0.1924 1.2648 4.5156
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.81707 18.95690 0.729 0.4741
## cyl -0.09028 1.06107 -0.085 0.9330
## disp 0.01202 0.01861 0.646 0.5252
## hp -0.02146 0.02223 -0.965 0.3454
## drat 0.76681 1.66752 0.460 0.6503
## qsec 0.72003 0.73178 0.984 0.3363
## vs 0.51686 2.13095 0.243 0.8107
## am 2.42856 2.09624 1.159 0.2597
## gear 0.75687 1.51096 0.501 0.6216
## carb -0.27079 0.85296 -0.317 0.7540
## wt_wins -3.61419 2.05327 -1.760 0.0929 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.691 on 21 degrees of freedom
## Multiple R-squared: 0.865, Adjusted R-squared: 0.8006
## F-statistic: 13.45 on 10 and 21 DF, p-value: 5.141e-07
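The winsorized fit can be compared with the baseline fit without any extra package. The sketch below assumes winsorizing at the 5th and 95th percentiles, which is consistent with the minimum (1.736) and maximum (5.293) shown in the summary of wt_wins; `cars`, `limits`, `fit_base`, and `fit_wins` are illustrative names.

```r
# Base-R winsorizing of weight, then a side-by-side R-squared comparison.
cars <- datasets::mtcars
limits <- quantile(cars$wt, probs = c(0.05, 0.95))    # assumed cut-offs: 5% and 95%
cars$wt_wins <- pmin(pmax(cars$wt, limits[[1]]), limits[[2]])  # clamp values to the limits

fit_base <- lm(mpg ~ . - wt_wins, data = cars)   # all original predictors, raw weight
fit_wins <- lm(mpg ~ . - wt, data = cars)        # same predictors, winsorized weight
r2 <- c(base = summary(fit_base)$r.squared,
        winsorized = summary(fit_wins)$r.squared)
print(round(r2, 4))
```

Fitting both models on the same data frame keeps the comparison limited to the one variable that changed.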
The intercept (12.30) is the predicted mpg when every predictor equals zero, which does not describe a real car, so it has little meaning on its own. The weight coefficient of -3.72 means that each additional 1000 lb of vehicle weight is associated with a decrease of about 3.72 mpg, holding the other predictors constant. The horsepower coefficient of -0.02 means that each additional unit of horsepower is associated with a decrease of about 0.02 mpg. The transmission coefficient of 2.52 means that manual cars are estimated to get about 2.52 mpg more than automatics with the same values of the other predictors. None of these coefficients is significant at the 5% level in the full model; weight comes closest (p = 0.063).
The diagnostic plots from plot(regress) show roughly linear patterns, which suggests that the linear-model assumptions are reasonably met by this dataset.
The in-sample MSE of the full model is 4.61.
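The MSE reported above and the residual standard error in the model summary are both derived from the same residual sum of squares. The sketch below, with illustrative names `fit`, `rss`, `mse`, and `rse`, shows how they relate.

```r
# In-sample MSE versus the residual standard error reported by summary().
cars <- datasets::mtcars
fit <- lm(mpg ~ ., data = cars)          # same specification as the full model
rss <- sum(residuals(fit)^2)             # residual sum of squares
mse <- rss / nrow(cars)                  # divides by n = 32 observations
rse <- sqrt(rss / df.residual(fit))      # divides by n - p = 21 degrees of freedom
print(c(mse = mse, rse = rse, rse_squared = rse^2))
```

Because the MSE divides by 32 rather than by the 21 residual degrees of freedom, it is smaller than the squared residual standard error and tends to understate the error on new data.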
The interaction model shows that the effect of weight on mpg differs significantly between manual and automatic transmissions (wt:am = -4.14, p = 0.020). Because am = 1 codes a manual transmission, the negative interaction means that manuals lose more mpg than automatics as weight increases: about 7.17 mpg per 1000 lb for manuals versus about 3.03 mpg for automatics.
The boxplot of weight showed two outliers. After winsorizing weight, R^2 went from 0.869 to 0.865 (adjusted R^2 from 0.807 to 0.801), which is a slight decrease rather than an improvement. Winsorizing will therefore most likely not change the predictive performance of this model in any meaningful way.
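The weight slope for each transmission type follows from the interaction coefficients: the wt coefficient applies when am = 0 (automatic), and the wt:am coefficient is added when am = 1 (manual). A minimal sketch, with illustrative names `fit_int`, `slope_automatic`, and `slope_manual`:

```r
# Weight slopes by transmission type from the wt * am interaction model.
cars <- datasets::mtcars
fit_int <- lm(mpg ~ wt * am + ., data = cars)   # same formula as the interaction model
b <- coef(fit_int)
slope_automatic <- b[["wt"]]                    # am = 0: automatic
slope_manual <- b[["wt"]] + b[["wt:am"]]        # am = 1: manual
print(round(c(automatic = slope_automatic, manual = slope_manual), 2))
```

The more negative slope belongs to the manual cars, so their mileage drops faster with weight in this model.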