Motor Trend Car Road Tests

Linear regression analysis

Correlation analysis

Columns 8 and 9 (vs and am) are yes/no (binary) variables, so they are excluded from the correlation analysis.

library(corrplot)  # provides corrplot.mixed()

M <- cor(mtcars[, -c(8, 9)])
corrplot.mixed(M)
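As a quick check on the column choice above, the binary columns can be located programmatically instead of by position. This is a minimal sketch using base R only; the name `binary_cols` is introduced here for illustration.

```r
# Find the columns of mtcars that take exactly two distinct values
binary_cols <- which(sapply(mtcars, function(x) length(unique(x)) == 2))
binary_cols

# The same correlation matrix, built without hard-coded positions
M_check <- cor(mtcars[, -binary_cols])
```

Selecting by property rather than by index keeps the code valid if the column order ever changes.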

Which pairs of variables have an absolute correlation above 0.7, whether positive or negative?

# Zero the lower triangle and the diagonal so each pair is counted once
M[lower.tri(M, diag = TRUE)] <- 0

threshold <- 0.7

sapply(row.names(M), function(i) { which(abs(M[i,]) > threshold)  })

## $mpg
##  cyl disp   hp   wt 
##    2    3    4    6 
## 
## $cyl
## disp   hp   wt 
##    3    4    6 
## 
## $disp
##   hp drat   wt 
##    4    5    6 
## 
## $hp
## qsec carb 
##    7    9 
## 
## $drat
## wt 
##  6 
## 
## $wt
## named integer(0)
## 
## $qsec
## named integer(0)
## 
## $gear
## named integer(0)
## 
## $carb
## named integer(0)
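The list output above can also be flattened into one table of variable pairs, which may be easier to scan. The following is a sketch under the same 0.7 threshold; the names `M_full`, `idx` and `strong_pairs` are introduced here for illustration.

```r
# Recompute the full correlation matrix (vs and am excluded)
M_full <- cor(mtcars[, -c(8, 9)])

# Row/column positions of strong correlations in the upper triangle
idx <- which(abs(M_full) > 0.7 & upper.tri(M_full), arr.ind = TRUE)

strong_pairs <- data.frame(
  var1 = rownames(M_full)[idx[, "row"]],
  var2 = colnames(M_full)[idx[, "col"]],
  r    = M_full[idx]
)

# Strongest relationships first
strong_pairs[order(-abs(strong_pairs$r)), ]
```

Each row is one pair with its correlation coefficient, so the sign and strength are visible alongside the variable names.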

Scatter plots

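# Draw plots in a 2x2 grid; par() returns the previous settings for later restore
old_par <- par(mfrow = c(2, 2))

# M is upper-triangular at this point, so each strong pair is plotted once
for (i in row.names(M)) {
  for (j in colnames(M)) {
    if (abs(M[i, j]) > threshold) {
      plot(mtcars[, i] ~ mtcars[, j], xlab = j, ylab = i)
    }
  }
}

# Restore the original graphics settings
par(old_par)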

All of these pairs might be candidates for linear regression, but the choice depends on the research questions and hypotheses.

For now, I will fit a linear regression of mpg (miles per gallon) on hp (horsepower), as it may be the easiest relationship to make sense of.

plot(mpg~hp, data=mtcars)

Linear model fits

fit0 <- lm(mpg~hp, data = mtcars)
summary(fit0)

## 
## Call:
## lm(formula = mpg ~ hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7121 -2.1122 -0.8854  1.5819  8.2360 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
## hp          -0.06823    0.01012  -6.742 1.79e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892 
## F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07

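The F-statistic (the ratio of explained to residual variance) is 45.46 on 1 and 30 degrees of freedom, with \(p = 1.79 \times 10^{-7} < 0.05\). Therefore, the association between hp and mpg is statistically significant.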
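The F-test can be reproduced from the model summary. This is a minimal sketch: it refits the same model so the block is self-contained, and the names `s`, `f_stat`, `t_hp` and `p_value` are introduced here for illustration.

```r
fit0 <- lm(mpg ~ hp, data = mtcars)
s <- summary(fit0)

f_stat <- s$fstatistic                       # named vector: value, numdf, dendf
t_hp   <- s$coefficients["hp", "t value"]

# With a single predictor, F equals the squared t statistic of the slope
all.equal(unname(f_stat["value"]), t_hp^2)

# p-value of the F test, taken from the upper tail of the F distribution
p_value <- pf(f_stat["value"], f_stat["numdf"], f_stat["dendf"],
              lower.tail = FALSE)
unname(p_value)
```

In a one-predictor model the F-test and the t-test on the slope therefore carry the same information.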

Adjusted \(R^{2}\) is 0.589 for the regression line \(Y = 30.1 - 0.068X\), so the model explains about 58.9% of the variance in mpg. Each additional unit of horsepower is associated with a reduction of about 0.068 miles per gallon.
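To see the fitted line in use, the sketch below extracts the coefficients with their confidence intervals and predicts mpg for one horsepower value. The value hp = 150 is an arbitrary illustration, not a figure from the analysis, and the model is refitted so the block stands alone.

```r
fit0 <- lm(mpg ~ hp, data = mtcars)

# Point estimates and 95% confidence intervals for intercept and slope
coef(fit0)
confint(fit0)

# Predicted mean mpg for a hypothetical car with 150 hp
pred <- predict(fit0, newdata = data.frame(hp = 150),
                interval = "confidence")
pred
```

By the equation above, the point prediction is roughly 30.1 - 0.068 * 150, or about 19.9 mpg.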

par(mfrow=c(2,2))
plot(fit0)

QQ plot

The QQ plot suggests that the residuals are too peaked in the middle (excess kurtosis), with some right-skewness.
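The visual reading can be supplemented with numbers. The sketch below redraws the QQ plot for the residuals, runs a Shapiro-Wilk test, and computes simple moment-based skewness and excess kurtosis in base R. These checks are an addition for illustration, not part of the original analysis, and other estimators of skewness and kurtosis exist.

```r
fit0 <- lm(mpg ~ hp, data = mtcars)
res <- residuals(fit0)

# QQ plot of the residuals against a normal distribution
qqnorm(res, main = "Normal Q-Q plot of fit0 residuals")
qqline(res)

# Shapiro-Wilk test: a small p-value suggests departure from normality
sw <- shapiro.test(res)
sw

# Moment-based skewness and excess kurtosis of the residuals
z <- (res - mean(res)) / sqrt(mean((res - mean(res))^2))
skewness <- mean(z^3)
excess_kurtosis <- mean(z^4) - 3
c(skewness = skewness, excess_kurtosis = excess_kurtosis)
```

A positive skewness value corresponds to the right tail seen in the plot.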

QQ interpretation

Other plots

The residual variance does not look constant across the fitted values.
The Residuals vs Leverage plot also makes me consider alternative models.
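The points that stand out in the Residuals vs Leverage plot can be listed by name. The sketch below uses Cook's distance and leverage (hat values) from base R. The cutoffs 4/n and 2p/n are common rules of thumb assumed here for illustration, not formal tests.

```r
fit0 <- lm(mpg ~ hp, data = mtcars)

cd  <- cooks.distance(fit0)   # influence of each car on the fitted line
lev <- hatvalues(fit0)        # leverage of each car
n   <- nrow(mtcars)

# Cars with Cook's distance above the rule-of-thumb cutoff 4 / n
influential <- cd[cd > 4 / n]
sort(influential, decreasing = TRUE)

# Cars with leverage above twice the average (2 * p / n, p = 2 coefficients)
lev[lev > 2 * length(coef(fit0)) / n]
```

Refitting without the flagged cars and comparing coefficients is one way to judge how much they drive the result.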


Pacharapol Withayasakpunt

2/11/2021