library(corrplot)
## corrplot 0.84 loaded
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
A data frame with 32 observations on 11 (numeric) variables.
[, 1] mpg Miles/(US) gallon
[, 2] cyl Number of cylinders -- ordinal (3)
[, 3] disp Displacement (cu.in.)
[, 4] hp Gross horsepower
[, 5] drat Rear axle ratio
[, 6] wt Weight (1000 lbs)
[, 7] qsec 1/4 mile time
[, 8] vs Engine (0 = V-shaped, 1 = straight) -- ordinal
[, 9] am Transmission (0 = automatic, 1 = manual) -- ordinal
[,10] gear Number of forward gears -- ordinal (3)
[,11] carb Number of carburetors -- ordinal (5)
Columns 8 and 9 (vs and am) are binary (yes/no) variables, so they are left out of the correlation check used to pick candidates for linear regression.
M <- cor(mtcars[, -c(8,9)])
corrplot.mixed(M)
Which pairs of variables have an absolute correlation above 0.7, whether positive or negative?
# Zero out the lower triangle and the diagonal so each pair is reported only once
M[lower.tri(M, diag = TRUE)] <- 0
threshold <- 0.7
sapply(row.names(M), function(i) { which(abs(M[i,]) > threshold) })
## $mpg
## cyl disp hp wt
## 2 3 4 6
##
## $cyl
## disp hp wt
## 3 4 6
##
## $disp
## hp drat wt
## 4 5 6
##
## $hp
## qsec carb
## 7 9
##
## $drat
## wt
## 6
##
## $wt
## named integer(0)
##
## $qsec
## named integer(0)
##
## $gear
## named integer(0)
##
## $carb
## named integer(0)
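The same question can also be answered with a flat table of pairs, which is easier to sort and read than the nested list above. The following is a minimal sketch, not the method used in this report: the names `idx` and `strong_pairs` are made up for illustration, and the correlation matrix is recomputed so the block runs on its own.

```r
# Sketch: list each variable pair whose absolute correlation exceeds the threshold.
M <- cor(mtcars[, -c(8, 9)])   # drop the binary columns vs and am
threshold <- 0.7

# Row/column positions of upper-triangle cells above the threshold
# (upper.tri() excludes the diagonal, so each pair appears once)
idx <- which(upper.tri(M) & abs(M) > threshold, arr.ind = TRUE)

# strong_pairs is a hypothetical name used only in this sketch
strong_pairs <- data.frame(
  var1 = rownames(M)[idx[, "row"]],
  var2 = colnames(M)[idx[, "col"]],
  r    = round(M[idx], 3)
)

# Strongest relationships first, regardless of sign
strong_pairs <- strong_pairs[order(-abs(strong_pairs$r)), ]
print(strong_pairs, row.names = FALSE)
```

Keeping the sign of `r` in the table shows at a glance which relationships are negative, which the index-only output does not.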
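# Draw a scatterplot for every pair above the threshold, four panels per page
mult_draw <- par(mfrow = c(2, 2))   # par() returns the previous settings
for (i in row.names(M)) {
  for (j in colnames(M)) {
    if (abs(M[i, j]) > threshold) {
      plot(mtcars[, i] ~ mtcars[, j], xlab = j, ylab = i)
    }
  }
}
par(mult_draw)                      # restore the previous layout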
Any of these pairs could be a candidate for linear regression analysis; which ones to model depends on the research questions and hypotheses.
For now, I will fit a linear regression of mpg (miles per gallon) on hp (horsepower), as that relationship is the easiest to make sense of.
plot(mpg~hp, data=mtcars)
fit0 <- lm(mpg~hp, data = mtcars)
summary(fit0)
##
## Call:
## lm(formula = mpg ~ hp, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7121 -2.1122 -0.8854 1.5819 8.2360
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.09886 1.63392 18.421 < 2e-16 ***
## hp -0.06823 0.01012 -6.742 1.79e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared: 0.6024, Adjusted R-squared: 0.5892
## F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07
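The F-statistic (the ratio of explained to residual variance) is 45.46 on 1 and 30 degrees of freedom, with \(p = 1.79 \times 10^{-7} < 0.05\). Therefore, the association between hp and mpg is statistically significant.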
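Adjusted \(R^{2}\) is 0.589 for the regression line \(Y = 30.1 - 0.068X\), where \(Y\) is mpg and \(X\) is hp. The model therefore explains about 58.9% of the variance in mpg after adjusting for the number of predictors. Each additional unit of horsepower is associated with a reduction of about 0.068 miles per gallon.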
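The figures in the summary can be cross-checked by hand. The sketch below refits the same model so it is self-contained, then recomputes \(R^{2}\) and adjusted \(R^{2}\) from their definitions, and adds a confidence interval for the slope. The 150 hp car in the last step is a made-up example value, not something from the data description.

```r
# Sketch: re-derive the summary statistics of the mpg ~ hp model.
fit0 <- lm(mpg ~ hp, data = mtcars)
s <- summary(fit0)

# With a single predictor, R-squared equals the squared Pearson correlation
r2 <- cor(mtcars$mpg, mtcars$hp)^2
print(all.equal(r2, s$r.squared))

# Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - p - 1), with p predictors
n <- nrow(mtcars)
p <- 1
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(all.equal(adj_r2, s$adj.r.squared))

# 95% confidence intervals for the intercept and the slope
print(confint(fit0))

# Predicted mean mpg for a hypothetical 150 hp car, with a 95% confidence interval
pred <- predict(fit0, newdata = data.frame(hp = 150), interval = "confidence")
print(pred)
```

The interval for the slope is often more informative than the p-value alone, since it shows the range of per-horsepower changes in mpg that is compatible with the data.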
par(mfrow=c(2,2))
plot(fit0)
The Q-Q plot suggests that the residuals are too peaked in the middle (excess kurtosis), with some right skew.
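A visual reading of a Q-Q plot is subjective, so it may help to back it up with numbers. The sketch below is one possible check and not part of the original analysis: it computes simple moment-based skewness and excess kurtosis of the residuals in base R (the variable names are chosen for illustration) and runs a Shapiro-Wilk test with `shapiro.test()`.

```r
# Sketch: numeric checks on the residuals of the mpg ~ hp model.
fit0 <- lm(mpg ~ hp, data = mtcars)
res <- residuals(fit0)

# Standardise using the population (divide-by-n) standard deviation
z <- (res - mean(res)) / sqrt(mean((res - mean(res))^2))

# Moment-based estimates: both are 0 for a normal distribution
skewness <- mean(z^3)             # > 0 suggests a longer right tail
excess_kurtosis <- mean(z^4) - 3  # > 0 suggests heavier tails than normal

print(c(skewness = skewness, excess_kurtosis = excess_kurtosis))

# Shapiro-Wilk test: a small p-value is evidence against normal residuals
sw <- shapiro.test(res)
print(sw)
```

With only 32 observations, these estimates are noisy, so they are best read together with the plot and not as a replacement for it.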