Linearity in Regression Analysis

Sameer Mathur

Linearity in regression analysis of the mtcars dataset

Regression Diagnostics

---

mtcars

Motor Trend Car Road Tests

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).

Source mtcars data

Data Description

  1. mpg Miles/(US) gallon
  2. cyl Number of cylinders
  3. disp Displacement (cu.in.)
  4. hp Gross horsepower
  5. drat Rear axle ratio
  6. wt Weight (1000 lbs)
  7. qsec ¼ mile time
  8. vs Engine (0 = V-shaped, 1 = straight)
  9. am Transmission (0 = automatic, 1 = manual)
  10. gear Number of forward gears
  11. carb Number of carburetors

Importing data

# importing data
data(mtcars)
# attaching data columns
attach(mtcars)
# data rows and columns
dim(mtcars)
[1] 32 11

First few rows of the mtcars dataset

# first few rows
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Descriptive statistics

# descriptive statistics
library(psych)
describe(mtcars)[, c(1:5, 8:9)]
     vars  n   mean     sd median   min    max
mpg     1 32  20.09   6.03  19.20 10.40  33.90
cyl     2 32   6.19   1.79   6.00  4.00   8.00
disp    3 32 230.72 123.94 196.30 71.10 472.00
hp      4 32 146.69  68.56 123.00 52.00 335.00
drat    5 32   3.60   0.53   3.70  2.76   4.93
wt      6 32   3.22   0.98   3.33  1.51   5.42
qsec    7 32  17.85   1.79  17.71 14.50  22.90
vs      8 32   0.44   0.50   0.00  0.00   1.00
am      9 32   0.41   0.50   0.00  0.00   1.00
gear   10 32   3.69   0.74   4.00  3.00   5.00
carb   11 32   2.81   1.62   2.00  1.00   8.00
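
If the psych package is not available, the same columns can be cross-checked in base R. This is only a sketch: the helper name basicStats is hypothetical and is not part of psych or of the original analysis.

```r
# Base-R cross-check of the describe() columns shown above.
# basicStats is a hypothetical helper name introduced for illustration.
basicStats <- function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x),
    median = median(x), min = min(x), max = max(x))
}

# sapply() returns one column per variable; t() puts variables in rows,
# and round() matches the two-decimal display of describe()
statsTable <- round(t(sapply(mtcars, basicStats)), 2)
statsTable
```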

Regression Model

Multiple linear regression

# fitting multiple linear model
fitmtcarsModel <- lm(mpg ~ am + wt + hp + disp + cyl, data = mtcars)
# summary of the fitted model
summary(fitmtcarsModel)

Call:
lm(formula = mpg ~ am + wt + hp + disp + cyl, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5952 -1.5864 -0.7157  1.2821  5.5725 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 38.20280    3.66910  10.412 9.08e-11 ***
am           1.55649    1.44054   1.080  0.28984    
wt          -3.30262    1.13364  -2.913  0.00726 ** 
hp          -0.02796    0.01392  -2.008  0.05510 .  
disp         0.01226    0.01171   1.047  0.30472    
cyl         -1.10638    0.67636  -1.636  0.11393    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.505 on 26 degrees of freedom
Multiple R-squared:  0.8551,    Adjusted R-squared:  0.8273 
F-statistic:  30.7 on 5 and 26 DF,  p-value: 4.029e-10
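
The coefficient table above can also be read programmatically. The following is a minimal sketch that refits the same model and lists the terms whose p-value falls below 0.05. The 0.05 cutoff and the object names fit, coefTable and sigTerms are assumptions made for illustration.

```r
# Refit the same model and pull out the coefficient table as a matrix
fit <- lm(mpg ~ am + wt + hp + disp + cyl, data = mtcars)
coefTable <- coef(summary(fit))

# Terms with p-value below an assumed 0.05 cutoff
sigTerms <- rownames(coefTable)[coefTable[, "Pr(>|t|)"] < 0.05]
sigTerms
```

In the summary shown above, only the intercept and wt fall below this cutoff; hp is just above it at 0.0551.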

Diagnostic Plots

# residual plots of OLS model
par(mfrow=c(2,2))
plot(fitmtcarsModel)

The linearity assumption can be checked by inspecting the Residuals vs Fitted plot (1st plot) from the Diagnostic Plots.

[Figure: the four default diagnostic plots for fitmtcarsModel]

Linearity of the data (Residual vs. Fitted Plot)

# residual vs. fitted plot
plot(fitmtcarsModel, 1)

[Figure: Residuals vs Fitted plot for fitmtcarsModel]

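If the linearity assumption holds, the red line (a smoothed trend through the residuals) should be approximately horizontal and close to zero. A clear curve in the red line suggests that the relationship is not linear.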
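
To make the role of the red line concrete, the plot can be rebuilt by hand. This is a rough sketch, assuming a lowess smoother with default settings (base R's panel.smooth is lowess-based, but the exact settings used by plot() may differ, so the curve need not match exactly). The object names are illustrative.

```r
# Rebuild the Residuals vs Fitted plot by hand
fit <- lm(mpg ~ am + wt + hp + disp + cyl, data = mtcars)
fittedVals <- fitted(fit)
resids <- residuals(fit)

plot(fittedVals, resids,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs Fitted (by hand)")
abline(h = 0, lty = 3)              # reference line at zero

# Smoothed trend of the residuals; assumed lowess defaults
smoothLine <- lowess(fittedVals, resids)
lines(smoothLine, col = "red")
```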

Rectifying Non-linearity

Box-Cox Transformation

library(caret)
mpgTrans <- BoxCoxTrans(mtcars$mpg)
mpgTrans
Box-Cox Transformation

32 data points used to estimate Lambda

Input data summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  10.40   15.43   19.20   20.09   22.80   33.90 

Largest/Smallest: 3.26 
Sample Skewness: 0.611 

Estimated Lambda: 0 
With fudge factor, Lambda = 0 will be used for transformations
# append the transformed variable to mtcars
mtcars <- cbind(mtcars, mpgNew = predict(mpgTrans, mtcars$mpg))
# first few rows of the dataset
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
                    mpgNew
Mazda RX4         3.044522
Mazda RX4 Wag     3.044522
Datsun 710        3.126761
Hornet 4 Drive    3.063391
Hornet Sportabout 2.928524
Valiant           2.895912
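
An estimated lambda of 0 corresponds to the natural log. The Box-Cox transform is (y^lambda - 1) / lambda for lambda not equal to 0, and log(y) for lambda equal to 0. A minimal base-R sketch of this, where boxCoxByHand is a hypothetical helper and not a caret function:

```r
# Box-Cox transform written out by hand.
# boxCoxByHand is a hypothetical helper introduced for illustration.
boxCoxByHand <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# With lambda = 0 the transform is just the natural log of mpg
mpgLog <- boxCoxByHand(mtcars$mpg, 0)
head(mpgLog)

# A lambda very close to 0 gives nearly the same values
mpgNearLog <- boxCoxByHand(mtcars$mpg, 1e-6)
max(abs(mpgNearLog - mpgLog))
```

The first value, log(21.0), matches the mpgNew entry for the Mazda RX4 shown above.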

Regression model on transformed data

The new regression model will be based on the transformed data.

Multiple Linear Regression

# fitting multiple linear model
fitmtcarsTransModel <- lm(mpgNew ~ am + wt + hp + disp + cyl, data = mtcars)
# summary of the fitted model
summary(fitmtcarsTransModel)

Call:
lm(formula = mpgNew ~ am + wt + hp + disp + cyl, data = mtcars)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.15374 -0.07660 -0.03053  0.07504  0.24276 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.8592092  0.1656755  23.294  < 2e-16 ***
am           0.0275168  0.0650464   0.423  0.67575    
wt          -0.1718588  0.0511888  -3.357  0.00243 ** 
hp          -0.0011999  0.0006286  -1.909  0.06739 .  
disp         0.0001290  0.0005286   0.244  0.80910    
cyl         -0.0345400  0.0305403  -1.131  0.26840    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1131 on 26 degrees of freedom
Multiple R-squared:  0.879, Adjusted R-squared:  0.8557 
F-statistic: 37.78 on 5 and 26 DF,  p-value: 4.03e-11
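
Because the response is now on the log scale, each slope is read as an approximate proportional change in mpg rather than a change in miles per gallon. A hedged sketch, assuming that mpgNew equals log(mpg) (which follows from lambda = 0) and using illustrative object names:

```r
# Same model as above, with log(mpg) written directly in the formula
# (assumes mpgNew == log(mpg), i.e. lambda = 0)
fitLog <- lm(log(mpg) ~ am + wt + hp + disp + cyl, data = mtcars)

# Percent change in mpg for a one-unit increase in each predictor,
# holding the others fixed; the intercept is dropped
pctChange <- 100 * (exp(coef(fitLog)[-1]) - 1)
round(pctChange, 2)
```

For wt, the coefficient of about -0.172 translates to roughly a 16% drop in mpg per additional 1000 lbs, other predictors held fixed.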

Residual versus Fitted plot after transformation

# residual vs. fitted plot
plot(fitmtcarsTransModel, 1)

[Figure: Residuals vs Fitted plot for fitmtcarsTransModel]

Comparing Residual versus Fitted plots before and after Box-Cox transformation

Before Box-Cox transformation

# residual vs. fitted plot
plot(fitmtcarsModel, 1)

[Figure: Residuals vs Fitted plot for fitmtcarsModel]

After Box-Cox transformation

# residual vs. fitted plot
plot(fitmtcarsTransModel, 1)

[Figure: Residuals vs Fitted plot for fitmtcarsTransModel]

We can see that after the Box-Cox transformation the red line becomes flatter. Since the estimated lambda is 0, the transformed response is log(mpg), so this implies that the log-linear model is closer to satisfying the linearity assumption than the linear-linear model.
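
The two plots are easier to compare when drawn side by side. A minimal sketch, assuming the transformed response is log(mpg); fitLin and fitLog are illustrative names for refits of the two models above.

```r
# Refit both models so the example is self-contained
fitLin <- lm(mpg ~ am + wt + hp + disp + cyl, data = mtcars)
fitLog <- lm(log(mpg) ~ am + wt + hp + disp + cyl, data = mtcars)

# Draw the two Residuals vs Fitted plots in one row
oldPar <- par(mfrow = c(1, 2))
plot(fitLin, which = 1, main = "Before: mpg")
plot(fitLog, which = 1, main = "After: log(mpg)")
par(oldPar)                         # restore the previous layout
```

Note that the residuals of the two models are on different scales (mpg versus log mpg), so only the shape of the red line should be compared, not the size of the residuals.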