Sameer Mathur
Linearity in regression analysis of mtcars dataset
Regression Diagnostics
---
Motor Trend Car Road Tests
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
Data Description
mpg   Miles/(US) gallon
cyl   Number of cylinders
disp  Displacement (cu.in.)
hp    Gross horsepower
drat  Rear axle ratio
wt    Weight (1000 lbs)
qsec  ¼ mile time
vs    Engine (0 = V-shaped, 1 = straight)
am    Transmission (0 = automatic, 1 = manual)
gear  Number of forward gears
carb  Number of carburetors
# importing data
data(mtcars)
# attaching data columns (optional: the models below pass data = mtcars explicitly)
attach(mtcars)
# data rows and columns
dim(mtcars)
[1] 32 11
# first few rows
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# descriptive statistics
library(psych)
describe(mtcars)[, c(1:5, 8:9)]
     vars  n   mean     sd median   min    max
mpg     1 32  20.09   6.03  19.20 10.40  33.90
cyl     2 32   6.19   1.79   6.00  4.00   8.00
disp    3 32 230.72 123.94 196.30 71.10 472.00
hp      4 32 146.69  68.56 123.00 52.00 335.00
drat    5 32   3.60   0.53   3.70  2.76   4.93
wt      6 32   3.22   0.98   3.33  1.51   5.42
qsec    7 32  17.85   1.79  17.71 14.50  22.90
vs      8 32   0.44   0.50   0.00  0.00   1.00
am      9 32   0.41   0.50   0.00  0.00   1.00
gear   10 32   3.69   0.74   4.00  3.00   5.00
carb   11 32   2.81   1.62   2.00  1.00   8.00
# fitting multiple linear model (five predictors)
fitmtcarsModel <- lm(mpg ~ am + wt + hp + disp + cyl, data = mtcars)
# summary of the fitted model
summary(fitmtcarsModel)
Call:
lm(formula = mpg ~ am + wt + hp + disp + cyl, data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max
-3.5952 -1.5864 -0.7157  1.2821  5.5725
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 38.20280    3.66910  10.412 9.08e-11 ***
am           1.55649    1.44054   1.080  0.28984
wt          -3.30262    1.13364  -2.913  0.00726 **
hp          -0.02796    0.01392  -2.008  0.05510 .
disp         0.01226    0.01171   1.047  0.30472
cyl         -1.10638    0.67636  -1.636  0.11393
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.505 on 26 degrees of freedom
Multiple R-squared: 0.8551, Adjusted R-squared: 0.8273
F-statistic: 30.7 on 5 and 26 DF, p-value: 4.029e-10
# residual plots of OLS model
par(mfrow=c(2,2))
plot(fitmtcarsModel)
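The linearity assumption can be checked by inspecting the Residuals vs Fitted plot (the first of the four diagnostic plots). If the relationship is linear, the residuals scatter evenly around zero and the red smoothed line stays roughly flat.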
# residual vs. fitted plot
plot(fitmtcarsModel, 1)
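To make clear what this plot contains, here is a minimal sketch that rebuilds it by hand in base R. The names `fit`, `fittedVals`, `residVals` and `smoothLine` are illustrative and not part of the original analysis, and the lowess curve is only similar in spirit to the red line drawn by `plot()`, not guaranteed to be identical.

```r
# Sketch: rebuild the Residuals vs Fitted plot by hand (base R only).
# Refits the same model so the block is self-contained.
fit <- lm(mpg ~ am + wt + hp + disp + cyl, data = mtcars)

fittedVals <- fitted(fit)   # predicted mpg for each car
residVals  <- resid(fit)    # observed mpg minus predicted mpg

# scatter of residuals against fitted values, with a zero reference line
plot(fittedVals, residVals,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs Fitted (manual)")
abline(h = 0, lty = 2)

# lowess smoother through the residuals; a bend in this curve
# suggests the linear specification is missing some curvature
smoothLine <- lowess(fittedVals, residVals)
lines(smoothLine, col = "red")
```

A curve that dips in the middle and rises at both ends (or the reverse) is the pattern to look for when judging linearity.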
# estimating the Box-Cox transformation for mpg
library(caret)
mpgTrans <- BoxCoxTrans(mtcars$mpg)
mpgTrans
Box-Cox Transformation
32 data points used to estimate Lambda
Input data summary:
Min. 1st Qu. Median Mean 3rd Qu. Max.
10.40 15.43 19.20 20.09 22.80 33.90
Largest/Smallest: 3.26
Sample Skewness: 0.611
Estimated Lambda: 0
With fudge factor, Lambda = 0 will be used for transformations
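For reference, the one-parameter Box-Cox family is usually written as follows. This is the standard textbook form, stated here as background rather than taken from the caret output. With the estimated lambda of 0 it reduces to the natural logarithm of mpg.

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[6pt]
\log y, & \lambda = 0,
\end{cases}
\qquad y > 0 .
```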
# append the transformed variable to mtcars
mtcars <- cbind(mtcars, mpgNew = predict(mpgTrans, mtcars$mpg))
# first few rows of the dataset
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
                    mpgNew
Mazda RX4         3.044522
Mazda RX4 Wag     3.044522
Datsun 710        3.126761
Hornet 4 Drive    3.063391
Hornet Sportabout 2.928524
Valiant           2.895912
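Because the estimated lambda is 0, the mpgNew values shown above should simply be the natural log of mpg. A quick sketch of that check in base R, without caret (the name `mpgLog` is illustrative only):

```r
# Sketch: a Box-Cox transform with lambda = 0 is the natural logarithm.
# Uses the built-in mtcars data; does not modify it.
mpgLog <- log(mtcars$mpg)
names(mpgLog) <- rownames(mtcars)

# first six values, rounded to the precision printed by head()
round(head(mpgLog), 6)
```

The printed values can be compared directly with the mpgNew column.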
The new regression model is fitted to the transformed response, mpgNew.
# fitting multiple linear model
fitmtcarsTransModel <- lm(mpgNew ~ am + wt + hp + disp + cyl, data = mtcars)
# summary of the fitted model
summary(fitmtcarsTransModel)
Call:
lm(formula = mpgNew ~ am + wt + hp + disp + cyl, data = mtcars)
Residuals:
     Min       1Q   Median       3Q      Max
-0.15374 -0.07660 -0.03053  0.07504  0.24276
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.8592092  0.1656755  23.294  < 2e-16 ***
am           0.0275168  0.0650464   0.423  0.67575
wt          -0.1718588  0.0511888  -3.357  0.00243 **
hp          -0.0011999  0.0006286  -1.909  0.06739 .
disp         0.0001290  0.0005286   0.244  0.80910
cyl         -0.0345400  0.0305403  -1.131  0.26840
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1131 on 26 degrees of freedom
Multiple R-squared: 0.879, Adjusted R-squared: 0.8557
F-statistic: 37.78 on 5 and 26 DF, p-value: 4.03e-11
# residual vs. fitted plot
plot(fitmtcarsTransModel, 1)
Before Box-Cox transformation
# residual vs. fitted plot
plot(fitmtcarsModel, 1)
After Box-Cox transformation
# residual vs. fitted plot
plot(fitmtcarsTransModel, 1)
We can see that after the Box-Cox transformation the red line becomes flatter. Since the estimated lambda of 0 corresponds to a log transformation of mpg, this implies that the log-linear model comes closer to satisfying the linearity assumption than the linear-linear model.
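The visual comparison of the two red lines can be backed by a rough numerical check. The sketch below regresses each model's residuals on a quadratic in its fitted values and reports the squared term. `curvatureCheck()` is a hypothetical helper written for this illustration, not a function from any package, and its p-value is only an informal indicator. The log model is refitted with `log(mpg)`, which assumes lambda = 0 as estimated earlier.

```r
# Sketch: informal curvature check on the residuals of both models.
# curvatureCheck() is a hypothetical helper, not part of any package.
curvatureCheck <- function(fit) {
  f <- fitted(fit)
  r <- resid(fit)
  # auxiliary regression of residuals on a quadratic in the fitted values
  aux <- lm(r ~ f + I(f^2))
  # return the estimate and p-value of the squared term
  summary(aux)$coefficients["I(f^2)", c("Estimate", "Pr(>|t|)")]
}

fitLinear <- lm(mpg ~ am + wt + hp + disp + cyl, data = mtcars)
fitLog    <- lm(log(mpg) ~ am + wt + hp + disp + cyl, data = mtcars)

curvatureCheck(fitLinear)   # linear-linear model
curvatureCheck(fitLog)      # log-linear model
```

The two estimates are on different scales (mpg versus log mpg), so the p-values are the more comparable quantity. A larger p-value for the log model would be consistent with the flatter red line.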