## corrplot 0.95 loaded
##       mpg             cyl             disp             hp
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
##       drat             wt             qsec             vs
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000
##        am              gear            carb
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000
##  Median :0.0000   Median :4.000   Median :2.000
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
Transmission and weight are correlated: the automatic cars weigh more than the manual cars.
Engine type (vs) has one of the larger positive correlations with miles per gallon (r = 0.664). One possible reason is that the different engine types use different amounts of gas.
##
## Pearson's product-moment correlation
##
## data: mtcars$mpg and mtcars$vs
## t = 4.8644, df = 30, p-value = 3.416e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4103630 0.8223262
## sample estimates:
## cor
## 0.6640389
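The correlation claims above can be checked with a short base R sketch. It ranks every variable by the absolute value of its Pearson correlation with mpg, so that strong negative correlations show up alongside positive ones, and then repeats the test on vs. Ranking by absolute value and the object names (`mpg_cors`, `ranked`, `vs_test`) are choices made here for illustration, not something taken from the original analysis.

```r
# mtcars ships with base R (datasets package), so no extra packages are needed.
data(mtcars)

# Pearson correlation of every other variable with mpg.
mpg_cors <- cor(mtcars)["mpg", colnames(mtcars) != "mpg"]

# Rank by absolute value so strong negative correlations are not overlooked.
ranked <- mpg_cors[order(abs(mpg_cors), decreasing = TRUE)]
print(round(ranked, 3))

# The same test whose output is shown above: mpg against engine type (vs).
vs_test <- cor.test(mtcars$mpg, mtcars$vs)
print(vs_test$estimate)
print(vs_test$p.value)
```

Reading the ranked vector next to the test output helps separate "largest positive correlation" from "largest correlation in magnitude", which are not necessarily the same variable.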
Number of missing values in mtcars:
## [1] 0
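A count like the one above can be produced in base R as follows. This is a minimal sketch; the per-column breakdown and the names `total_missing` and `missing_by_column` are additions for illustration.

```r
data(mtcars)

# Total number of NA cells in the data frame.
total_missing <- sum(is.na(mtcars))
print(total_missing)

# Per-column breakdown, useful when the total is not zero.
missing_by_column <- colSums(is.na(mtcars))
print(missing_by_column)
```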
Outliers appear in four variables: hp, wt, qsec, and carb.
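The report does not say how the outliers were identified. One common approach, assumed here, is the boxplot rule: flag values lying more than 1.5 times the interquartile range beyond the hinges. The sketch below applies that rule to every column; the names `outliers_by_var` and `n_outliers` are illustrative.

```r
data(mtcars)

# Assumption: outliers are defined by the boxplot rule (1.5 * IQR beyond the hinges).
# boxplot.stats() returns the flagged values in its $out component.
outliers_by_var <- lapply(mtcars, function(x) boxplot.stats(x)$out)

# Number of flagged values per variable; show only variables with at least one.
n_outliers <- lengths(outliers_by_var)
print(n_outliers[n_outliers > 0])

# The flagged values themselves.
print(outliers_by_var[n_outliers > 0])
```

A different rule (for example z-scores or percentile cutoffs) could flag a different set of values, so the result depends on this choice.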
##
## Call:
## lm(formula = mpg ~ ., data = mtcars)
##
## Coefficients:
## (Intercept)          cyl         disp           hp         drat           wt
##    12.30337     -0.11144      0.01334     -0.02148      0.78711     -3.71530
##        qsec           vs           am         gear         carb
##     0.82104      0.31776      2.52023      0.65541     -0.19942
##
## Call:
## lm(formula = mpg ~ ., data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4506 -1.6044 -0.1196  1.2193  4.6271
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.30337   18.71788   0.657   0.5181
## cyl         -0.11144    1.04502  -0.107   0.9161
## disp         0.01334    0.01786   0.747   0.4635
## hp          -0.02148    0.02177  -0.987   0.3350
## drat         0.78711    1.63537   0.481   0.6353
## wt          -3.71530    1.89441  -1.961   0.0633 .
## qsec         0.82104    0.73084   1.123   0.2739
## vs           0.31776    2.10451   0.151   0.8814
## am           2.52023    2.05665   1.225   0.2340
## gear         0.65541    1.49326   0.439   0.6652
## carb        -0.19942    0.82875  -0.241   0.8122
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.65 on 21 degrees of freedom
## Multiple R-squared: 0.869, Adjusted R-squared: 0.8066
## F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07
The model assumes that the predictor variables are independent of each other. This dataset may not satisfy that assumption, because several of the variables could affect one another.
## [1] 4.609201
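One way to probe the independence concern is to look at the pairwise correlations among the predictors. The sketch below lists every pair whose absolute correlation exceeds 0.8. The 0.8 cutoff and the names `pred_cor` and `strong_pairs` are assumptions chosen for illustration, not part of the original analysis.

```r
data(mtcars)

# Correlation matrix of the predictors only (everything except mpg).
predictors <- mtcars[, colnames(mtcars) != "mpg"]
pred_cor <- cor(predictors)

# Keep each pair once by blanking the lower triangle and the diagonal.
pred_cor[lower.tri(pred_cor, diag = TRUE)] <- NA

# Hypothetical cutoff: treat |r| > 0.8 as a strongly related pair.
strong <- which(abs(pred_cor) > 0.8, arr.ind = TRUE)
strong_pairs <- data.frame(
  var1 = rownames(pred_cor)[strong[, "row"]],
  var2 = colnames(pred_cor)[strong[, "col"]],
  r    = pred_cor[strong]
)
print(strong_pairs)
```

Pairs that appear in this table are natural candidates for an interaction term or for dropping one of the two variables.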
##
## Call:
## lm(formula = mpg ~ . + cyl * disp, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.1697 -1.6096 -0.1275  1.1873  3.8355
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.976395  18.535141   1.617   0.1215
## cyl         -1.789619   1.183617  -1.512   0.1462
## disp        -0.095947   0.049001  -1.958   0.0643 .
## hp          -0.033409   0.020359  -1.641   0.1164
## drat        -0.541227   1.584761  -0.342   0.7363
## wt          -3.552721   1.717760  -2.068   0.0518 .
## qsec         0.698111   0.664203   1.051   0.3058
## vs           0.828745   1.918957   0.432   0.6705
## am           0.819051   1.997640   0.410   0.6862
## gear         1.554511   1.405425   1.106   0.2818
## carb         0.144212   0.764824   0.189   0.8523
## cyl:disp     0.013762   0.005825   2.363   0.0284 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.401 on 20 degrees of freedom
## Multiple R-squared: 0.8976, Adjusted R-squared: 0.8413
## F-statistic: 15.94 on 11 and 20 DF, p-value: 1.441e-07
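The interaction between cylinders and displacement is significant at the 5% level (p = 0.0284). Adding it raises multiple R^2 from 0.869 to 0.8976 and adjusted R^2 from 0.8066 to 0.8413.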
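The comparison can be reproduced by fitting both models and placing their fit statistics side by side. Because multiple R^2 never decreases when a term is added, the sketch also reports adjusted R^2 and runs an F-test on the nested models. The object names (`base_fit`, `inter_fit`, `fit_stats`) are illustrative.

```r
data(mtcars)

# Model with all main effects, and the same model plus the cyl:disp interaction.
base_fit  <- lm(mpg ~ ., data = mtcars)
inter_fit <- lm(mpg ~ . + cyl * disp, data = mtcars)

# Multiple and adjusted R^2 for both fits.
fit_stats <- rbind(
  base        = c(r2 = summary(base_fit)$r.squared,
                  adj_r2 = summary(base_fit)$adj.r.squared),
  interaction = c(r2 = summary(inter_fit)$r.squared,
                  adj_r2 = summary(inter_fit)$adj.r.squared)
)
print(round(fit_stats, 4))

# F-test for the nested models: does the interaction term improve the fit?
print(anova(base_fit, inter_fit))
```

With a single added term, the F-test p-value matches the t-test p-value reported for cyl:disp in the summary.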
##
## Call:
## lm(formula = mpg ~ ., data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4506 -1.6044 -0.1196  1.2193  4.6271
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.30337   18.71788   0.657   0.5181
## cyl         -0.11144    1.04502  -0.107   0.9161
## disp         0.01334    0.01786   0.747   0.4635
## hp          -0.02148    0.02177  -0.987   0.3350
## drat         0.78711    1.63537   0.481   0.6353
## wt          -3.71530    1.89441  -1.961   0.0633 .
## qsec         0.82104    0.73084   1.123   0.2739
## vs           0.31776    2.10451   0.151   0.8814
## am           2.52023    2.05665   1.225   0.2340
## gear         0.65541    1.49326   0.439   0.6652
## carb        -0.19942    0.82875  -0.241   0.8122
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.65 on 21 degrees of freedom
## Multiple R-squared: 0.869, Adjusted R-squared: 0.8066
## F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07
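Even after winsorizing every variable with outliers, R^2 does not change: the refit reports the same multiple R^2 (0.869) and adjusted R^2 (0.8066) as the original model. The call shown above still uses data = mtcars, so the winsorized values may not have reached the model.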
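A minimal sketch of winsorizing and refitting is shown below. The report does not state which cutoffs were used, so the 5th and 95th percentiles here are an assumption, and `winsorize`, `mtcars_wins`, `fit_orig`, and `fit_wins` are hypothetical names. The key point is that the refit must be given the winsorized data frame rather than mtcars.

```r
data(mtcars)

# Hypothetical helper: clamp values to the 5th and 95th percentiles (assumed cutoffs).
winsorize <- function(x, probs = c(0.05, 0.95)) {
  limits <- quantile(x, probs = probs, names = FALSE)
  pmin(pmax(x, limits[1]), limits[2])
}

# Winsorize only the variables reported as having outliers.
outlier_vars <- c("hp", "wt", "qsec", "carb")
mtcars_wins <- mtcars
mtcars_wins[outlier_vars] <- lapply(mtcars[outlier_vars], winsorize)

# Fit once on the original data and once on the winsorized copy.
fit_orig <- lm(mpg ~ ., data = mtcars)
fit_wins <- lm(mpg ~ ., data = mtcars_wins)

print(c(original   = summary(fit_orig)$r.squared,
        winsorized = summary(fit_wins)$r.squared))
```

If the two values printed are identical to every digit, that is a hint that both fits used the same data.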