library(corrplot)
## corrplot 0.84 loaded

Description

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).

Format

A data frame with 32 observations on 11 (numeric) variables.

[, 1]  mpg   Miles/(US) gallon
[, 2]  cyl   Number of cylinders                       -- ordinal (3 levels)
[, 3]  disp  Displacement (cu.in.)
[, 4]  hp    Gross horsepower
[, 5]  drat  Rear axle ratio
[, 6]  wt    Weight (1000 lbs)
[, 7]  qsec  1/4 mile time
[, 8]  vs    Engine (0 = V-shaped, 1 = straight)       -- binary
[, 9]  am    Transmission (0 = automatic, 1 = manual)  -- binary
[,10]  gear  Number of forward gears                   -- ordinal (3 levels)
[,11]  carb  Number of carburetors                     -- ordinal (6 levels)
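The level counts in the table can be checked directly. This is a minimal base-R sketch that counts the distinct values of each discrete column in the built-in `mtcars` data and, optionally, builds a factor-coded copy. The names `discrete_vars`, `n_levels` and `mtcars_f` are hypothetical and used only for illustration.

```r
# Count the distinct values of each discrete column in the built-in mtcars data set
discrete_vars <- c("cyl", "vs", "am", "gear", "carb")
n_levels <- sapply(mtcars[discrete_vars], function(x) length(unique(x)))
print(n_levels)   # cyl = 3, vs = 2, am = 2, gear = 3, carb = 6

# Hypothetical factor-coded copy: binary codes get labels, the rest become ordered factors
mtcars_f <- transform(
  mtcars,
  cyl  = factor(cyl, ordered = TRUE),
  vs   = factor(vs, levels = c(0, 1), labels = c("V-shaped", "straight")),
  am   = factor(am, levels = c(0, 1), labels = c("automatic", "manual")),
  gear = factor(gear, ordered = TRUE),
  carb = factor(carb, ordered = TRUE)
)
str(mtcars_f[discrete_vars])
```

The analysis below works with the numeric columns. The factor-coded copy is only useful if these variables are later entered into a model as categories.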

Linear regression analysis

Correlation analysis

Columns 8 and 9 (vs and am) are binary (0/1) variables, so they are excluded from the correlation screen for linear regression.

M <- cor(mtcars[, -c(8,9)])
corrplot.mixed(M)
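Dropping columns by position depends on the column order. A slightly more robust sketch drops vs and am by name instead. It then tests one coefficient (mpg vs hp) with `cor.test()`, because the matrix above contains only point estimates. In a simple regression, the Pearson test uses the same t statistic with \(n - 2\) degrees of freedom as the slope test, so its p-value should match the one reported for the slope later on. The names `num_vars`, `M_named` and `ct` are illustrative.

```r
# Drop vs and am by name instead of by position (same matrix as cor(mtcars[, -c(8, 9)]))
num_vars <- setdiff(names(mtcars), c("vs", "am"))
M_named  <- cor(mtcars[, num_vars])
stopifnot(isTRUE(all.equal(M_named, cor(mtcars[, -c(8, 9)]))))

# Pearson product-moment test for a single pair: mpg vs hp
ct <- cor.test(mtcars$mpg, mtcars$hp, method = "pearson")
ct$estimate   # point estimate of r
ct$conf.int   # 95% confidence interval for r
ct$p.value    # two-sided p-value for H0: r = 0
```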

Which pairs of variables have a correlation coefficient above 0.7 in absolute value (strongly positive or strongly negative)?

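# Zero the lower triangle and the diagonal so each pair is reported only once
M[lower.tri(M, diag = TRUE)] <- 0
threshold <- 0.7
# For each variable, list the columns whose |r| with it exceeds the threshold
sapply(row.names(M), function(i) which(abs(M[i, ]) > threshold))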
## $mpg
##  cyl disp   hp   wt 
##    2    3    4    6 
## 
## $cyl
## disp   hp   wt 
##    3    4    6 
## 
## $disp
##   hp drat   wt 
##    4    5    6 
## 
## $hp
## qsec carb 
##    7    9 
## 
## $drat
## wt 
##  6 
## 
## $wt
## named integer(0)
## 
## $qsec
## named integer(0)
## 
## $gear
## named integer(0)
## 
## $carb
## named integer(0)
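The list above is keyed by row variable, which makes the pairs hard to compare. As an alternative, this sketch collects each strongly correlated pair once into a data frame sorted by \(|r|\), using `which(..., arr.ind = TRUE)`. It recomputes the matrix as `M_full` so the zeroed `M` is left untouched. `M_full`, `idx` and `strong_pairs` are hypothetical names.

```r
# Recompute the full correlation matrix without vs (col 8) and am (col 9)
M_full <- cor(mtcars[, -c(8, 9)])
threshold <- 0.7

# TRUE where |r| exceeds the threshold in the upper triangle (each pair once, no diagonal)
strong <- abs(M_full) > threshold & upper.tri(M_full)
idx <- which(strong, arr.ind = TRUE)   # two columns: "row" and "col"

strong_pairs <- data.frame(
  var1 = rownames(M_full)[idx[, "row"]],
  var2 = colnames(M_full)[idx[, "col"]],
  r    = M_full[idx]                    # matrix indexing with (row, col) pairs
)
# Strongest association first, whatever its sign
strong_pairs <- strong_pairs[order(-abs(strong_pairs$r)), ]
rownames(strong_pairs) <- NULL
print(strong_pairs, digits = 3)
```

This should give the same 13 pairs as the list above. The strongest is cyl with disp.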

Scatter plots

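# Draw scatter plots in a 2 x 2 grid, saving the previous graphics settings
old_par <- par(mfrow = c(2, 2))

# M now holds only the upper triangle, so each strongly correlated pair is plotted once
for (i in row.names(M)) {
  for (j in colnames(M)) {
    if (abs(M[i, j]) > threshold) {
      plot(mtcars[, i] ~ mtcars[, j], xlab = j, ylab = i)
    }
  }
}

# Restore the previous graphics settings
par(old_par)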

Any of these pairs could be a candidate for linear regression analysis. Which one to model depends on the research question and hypotheses.

For now, I fit a simple linear regression of mpg (miles per gallon) on hp (gross horsepower), because this relationship is the easiest to interpret.

plot(mpg~hp, data=mtcars)
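Before fitting, it can help to overlay both a straight least-squares line and a lowess smoother on the scatter plot. If the smoother bends clearly away from the line, a purely linear term may miss some curvature. This is a minimal base-R sketch. The colours, line types and the name `fit_line` are arbitrary choices.

```r
# Scatter plot of mpg against hp
plot(mpg ~ hp, data = mtcars,
     xlab = "Gross horsepower (hp)", ylab = "Miles per US gallon (mpg)")

# Least-squares line (solid red)
fit_line <- lm(mpg ~ hp, data = mtcars)
abline(fit_line, col = "red")

# Locally weighted smoother (dashed blue) to check for curvature
lines(lowess(mtcars$hp, mtcars$mpg), col = "blue", lty = 2)

legend("topright", legend = c("least squares", "lowess"),
       col = c("red", "blue"), lty = c(1, 2))
```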

Linear model fits

fit0 <- lm(mpg~hp, data = mtcars)
summary(fit0)
## 
## Call:
## lm(formula = mpg ~ hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7121 -2.1122 -0.8854  1.5819  8.2360 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
## hp          -0.06823    0.01012  -6.742 1.79e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892 
## F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07

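The F-statistic (the ratio of the model mean square to the residual mean square) is 45.46 on 1 and 30 degrees of freedom, with \(p = 1.79 \times 10^{-7} < 0.05\). Therefore, the association between hp and mpg is statistically significant.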
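The numbers quoted above can be pulled out of the model object instead of being read off the printout. This sketch also checks a standard identity: with a single predictor, \(F = t^{2}\) for the slope, and \((-6.742)^{2} \approx 45.46\). `s`, `f_value`, `p_value` and `t_hp` are illustrative names.

```r
fit0 <- lm(mpg ~ hp, data = mtcars)
s <- summary(fit0)

# summary.lm stores the F-statistic as a named vector c(value, numdf, dendf)
f <- s$fstatistic
f_value <- unname(f["value"])
p_value <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# With one predictor, F equals the square of the slope's t statistic
t_hp <- coef(s)["hp", "t value"]
all.equal(f_value, t_hp^2)   # TRUE

cat(sprintf("F(%d, %d) = %.2f, p = %.3g\n", f["numdf"], f["dendf"], f_value, p_value))
```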

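The fitted regression line is \(Y = 30.10 - 0.068X\), where \(Y\) is mpg and \(X\) is hp. The multiple \(R^{2}\) is 0.602 and the adjusted \(R^{2}\) is 0.589, so horsepower accounts for roughly 59% to 60% of the variance in mpg. Each additional unit of horsepower is associated with an average decrease of about 0.068 miles per gallon.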
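As a worked check of the slope interpretation, two cars 10 hp apart should have predicted mpg values that differ by \(10 \times (-0.06823) \approx -0.68\). This sketch uses `predict()` for that and `confint()` for an interval on the slope. The horsepower values 100 and 110 are arbitrary, and `new_cars`, `pred` and `ci` are hypothetical names.

```r
fit0 <- lm(mpg ~ hp, data = mtcars)

# Predicted mpg at two horsepower values 10 hp apart
new_cars <- data.frame(hp = c(100, 110))
pred <- predict(fit0, newdata = new_cars)
print(pred)                      # first value is about 30.10 - 0.06823 * 100 = 23.28
print(diff(pred))                # change in predicted mpg for +10 hp
print(10 * coef(fit0)[["hp"]])   # the same number, computed from the slope directly

# 95% confidence interval for the slope; an interval excluding 0 agrees with the t-test
ci <- confint(fit0, "hp", level = 0.95)
print(ci)

# Adjusted R-squared as reported by summary()
print(summary(fit0)$adj.r.squared)
```

The predictions should stay within the observed hp range (52 to 335 hp). Extrapolating the line beyond it is not supported by the data.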

par(mfrow=c(2,2))
plot(fit0)

QQ plot

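The normal QQ plot of the residuals departs from the reference line. The residuals look too peaked in the middle (excess kurtosis) and somewhat right-skewed. The right skew is consistent with the residual summary above, where the maximum (8.24) is larger in magnitude than the minimum (-5.71) and the median (-0.89) is below zero.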

[Figure: QQ interpretation]
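To put numbers on what the QQ plot shows, this sketch computes moment-based skewness and excess kurtosis of the residuals and runs a Shapiro-Wilk normality test. These are simple population-moment estimators without a small-sample correction, so they may differ slightly from functions in add-on packages. `r`, `d`, `m2`, `skewness` and `excess_kurtosis` are illustrative names.

```r
fit0 <- lm(mpg ~ hp, data = mtcars)
r <- residuals(fit0)

# Moment-based shape statistics (population moments, no small-sample correction)
d  <- r - mean(r)
m2 <- mean(d^2)
skewness        <- mean(d^3) / m2^(3/2)   # > 0 means a longer right tail
excess_kurtosis <- mean(d^4) / m2^2 - 3   # > 0 means heavier tails than a normal distribution
print(round(c(skewness = skewness, excess_kurtosis = excess_kurtosis), 3))

# Shapiro-Wilk test; null hypothesis: the residuals come from a normal distribution
print(shapiro.test(r))

# Redraw only the normal QQ plot of the raw residuals
qqnorm(r)
qqline(r)
```

With only 32 observations, these statistics are noisy. Treat them as a complement to the plot, not a replacement for it.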

Other plots

  • The spread of the residuals does not look constant across fitted values, which suggests heteroscedasticity.
  • The Residuals vs Leverage plot also makes me consider alternative models.
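Both observations can be followed up numerically with base R. This sketch computes Koenker's studentized Breusch-Pagan statistic by hand: it regresses the squared residuals on hp, and under constant variance \(n R^{2}\) of that auxiliary regression is approximately \(\chi^{2}_{1}\). It then lists observations with high leverage or a large Cook's distance. The cut-offs \(2p/n\) and \(4/n\) are common rules of thumb, not formal tests. `bp_data`, `aux`, `diag_tab` and `flagged` are hypothetical names.

```r
fit0 <- lm(mpg ~ hp, data = mtcars)
n <- nobs(fit0)
p <- length(coef(fit0))   # number of estimated coefficients (intercept + slope)

# 1) Studentized (Koenker) Breusch-Pagan test, computed by hand:
#    regress squared residuals on hp; n * R^2 is approximately chi-squared(1) under H0
bp_data <- data.frame(sq_res = residuals(fit0)^2, hp = mtcars$hp)
aux     <- lm(sq_res ~ hp, data = bp_data)
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
print(c(BP = bp_stat, p.value = bp_p))

# 2) Leverage (hat values) and influence (Cook's distance)
h  <- hatvalues(fit0)
cd <- cooks.distance(fit0)
# Rules of thumb only: leverage > 2p/n or Cook's distance > 4/n
diag_tab <- data.frame(leverage = h, cooks_d = cd)
flagged  <- diag_tab[h > 2 * p / n | cd > 4 / n, ]
print(flagged[order(-flagged$cooks_d), ], digits = 3)
```

Cars with extreme horsepower, such as the Maserati Bora (335 hp), have high leverage by construction. Whether they should change the model is a substantive judgement, not something these cut-offs decide.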