Setup, libraries

knitr::opts_chunk$set(echo = TRUE, message=F)
library(lmSupport)
library(car)
## Loading required package: carData
source('http://psych.colorado.edu/~jclab/R/mcSummaryLm.R') 

data

?mtcars
d <- mtcars

Vars of interest:

DV = hp (gross horsepower)

IV1 = am (Transmission: 0 = automatic, 1 = manual)

IV2 = cyl (cylinders in the engine: 4, 6, or 8)

Cov = qsec (1/4 mile time)

ANOVA: Code for simple effects slide

Deriving prediction formula for example on slides

# Codes
## transmission
d$Transmission <- NA
d$Transmission[d$am == 0] <- -.5
d$Transmission[d$am == 1] <- .5

## cylinders
d$cyl6.48 <- NA
d$cyl6.48[d$cyl == 4] <- -1/3
d$cyl6.48[d$cyl == 6] <- 2/3
d$cyl6.48[d$cyl == 8] <- -1/3

d$cyl4.8 <- NA
d$cyl4.8[d$cyl == 4] <- -.5
d$cyl4.8[d$cyl == 6] <- 0
d$cyl4.8[d$cyl == 8] <- .5

# ANOVA model, not using the covariate/moderator of qsec
m1 <- lm(hp ~ Transmission * (cyl4.8 + cyl6.48), data = d)
summary(m1) 
## 
## Call:
## lm(formula = hp ~ Transmission * (cyl4.8 + cyl6.48), data = d)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -44.17 -19.17  -7.75  14.46  50.83 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           151.188      6.208  24.354  < 2e-16 ***
## Transmission           39.653     12.416   3.194  0.00366 ** 
## cyl4.8                163.562     14.911  10.970 2.98e-11 ***
## cyl6.48               -41.594     13.420  -3.099  0.00462 ** 
## Transmission:cyl4.8   108.125     29.821   3.626  0.00123 ** 
## Transmission:cyl6.48  -34.854     26.839  -1.299  0.20547    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29.22 on 26 degrees of freedom
## Multiple R-squared:  0.8477, Adjusted R-squared:  0.8184 
## F-statistic: 28.94 on 5 and 26 DF,  p-value: 7.646e-10

So, looking at the output, we get this prediction equation: \(Horsepower_i = 151.19 + 39.65*Transmission_i + 163.56*cyl4.8_i – 41.59*cyl6.48_i\) \(+ 108.13*Transmission*cyl4.8i – 34.85*Transmission*cyl6.48_i + \epsilon_i\)

Simple effect for Transmission by hand would be: \((39.65 + 108.13*cyl4.8 – 34.85*cyl6.48 )*Transmission_i\)

And for a given level of cylinder (4, in this case): \((39.65 + 108.13*(-1/2) – 34.85*(-1/3) )*Transmission_i\)

And then solve! Simple effect of transmission for a 4 cylinder car is:

(39.65 + 108.13*(-1/2) - 34.85*(-1/3))
## [1] -2.798333

And double checking using dummy codes:

# Codes for cyl = 4
d$four_6cyl <- NA
d$four_6cyl[d$cyl == 4] <- 0
d$four_6cyl[d$cyl == 6] <- 1
d$four_6cyl[d$cyl == 8] <- 0

d$four_8cyl <- NA
d$four_8cyl[d$cyl == 4] <- 0
d$four_8cyl[d$cyl == 6] <- 0
d$four_8cyl[d$cyl == 8] <- 1

# model
m1.2 <- lm(hp ~ Transmission * (four_6cyl + four_8cyl), data = d)
mcSummary(m1.2)
## lm(formula = hp ~ Transmission * (four_6cyl + four_8cyl), data = d)
## 
## Omnibus ANOVA
##                   SS df        MS EtaSq      F p
## Model      123529.75  5 24705.950 0.848 28.939 0
## Error       22197.12 26   853.736               
## Corr Total 145726.88 31  4700.867               
## 
##    RMSE AdjEtaSq
##  29.219    0.818
## 
## Coefficients
##                            Est  StErr      t     SSR(3) EtaSq   tol  CI_2.5
## (Intercept)             83.271  9.891  8.419  60515.186 0.732    NA  62.940
## Transmission            -2.792 19.781 -0.141     17.004 0.001 0.283 -43.452
## four_6cyl               40.187 14.911  2.695   6201.735 0.218 0.702   9.538
## four_8cyl              163.562 14.911 10.970 102730.335 0.822 0.488 132.913
## Transmission:four_6cyl  19.208 29.821  0.644    354.202 0.016 0.551 -42.090
## Transmission:four_8cyl 108.125 29.821  3.626  11223.375 0.336 0.353  46.827
##                        CI_97.5     p
## (Intercept)            103.601 0.000
## Transmission            37.869 0.889
## four_6cyl               70.837 0.012
## four_8cyl              194.212 0.000
## Transmission:four_6cyl  80.507 0.525
## Transmission:four_8cyl 169.423 0.001

We also get -2.79! (off by a little b/c of rounding, but close enough)

# also this will just grab the coefficient you're interested in:
m1.2$coefficients[2]
## Transmission 
##    -2.791667

ANCOVA? Homogeneity of Regression Assumption

I think a car’s quarter mile time will be a useful control in my model above! I’ll enter it as a covariate and re-run the model.

# model
m2 <- lm(hp ~ Transmission * (cyl4.8 + cyl6.48) + qsec, data = d)
summary(m2)
## 
## Call:
## lm(formula = hp ~ Transmission * (cyl4.8 + cyl6.48) + qsec, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -47.295 -15.315  -2.711  22.488  39.019 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           355.248     98.616   3.602 0.001365 ** 
## Transmission            9.037     18.840   0.480 0.635632    
## cyl4.8                119.208     25.596   4.657 9.06e-05 ***
## cyl6.48               -41.677     12.642  -3.297 0.002929 ** 
## qsec                  -11.480      5.538  -2.073 0.048642 *  
## Transmission:cyl4.8   107.293     28.096   3.819 0.000788 ***
## Transmission:cyl6.48  -38.666     25.351  -1.525 0.139747    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 27.53 on 25 degrees of freedom
## Multiple R-squared:   0.87,  Adjusted R-squared:  0.8388 
## F-statistic: 27.89 on 6 and 25 DF,  p-value: 6.35e-10

Looks like the effect of transmission was reduced when I entered in my covariate, indicating that the effect of transmission is due in part to its collinearity with a car’s quarter mile time.

But wait!! Is it actually a covariate? Or should it, in reality, be a moderator?

In other words, does our covariate interact with our predictors to predict our outcome?

Heterogeneity of Regression Assumption model! (i.e., fully interactive model)

m3 <- lm(hp ~ Transmission * (cyl4.8 + cyl6.48) * qsec, data = d) # qsec added as MODERATOR in this model, to check this assumption
summary(m3)
## 
## Call:
## lm(formula = hp ~ Transmission * (cyl4.8 + cyl6.48) * qsec, data = d)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -49.34 -10.85   0.00  12.41  41.10 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                -1340.16     879.53  -1.524   0.1432  
## Transmission               -3219.55    1759.05  -1.830   0.0822 .
## cyl4.8                     -4895.15    2625.91  -1.864   0.0770 .
## cyl6.48                     2903.79    1338.11   2.170   0.0422 *
## qsec                         104.70      60.31   1.736   0.0980 .
## Transmission:cyl4.8       -10886.16    5251.82  -2.073   0.0513 .
## Transmission:cyl6.48        5664.10    2676.22   2.116   0.0470 *
## Transmission:qsec            225.48     120.62   1.869   0.0763 .
## cyl4.8:qsec                  348.92     180.33   1.935   0.0673 .
## cyl6.48:qsec                -199.04      91.37  -2.178   0.0415 *
## Transmission:cyl4.8:qsec     746.59     360.66   2.070   0.0516 .
## Transmission:cyl6.48:qsec   -390.64     182.74  -2.138   0.0451 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.48 on 20 degrees of freedom
## Multiple R-squared:  0.9109, Adjusted R-squared:  0.8619 
## F-statistic:  18.6 on 11 and 20 DF,  p-value: 3.79e-08

Look at all the significant interactions that qsec has with our other predictors! This is a violation of our assumption, so qsec would not be an appropriate covariate in this model. Clearly works better as a moderator!

Also look at the effect of transmission in this model: It has increased in significance slightly, such that it’s now a marginal effect (.05 < p < .1). So we get a clearer picture of how these variables are related to each other.