Setup, libraries
knitr::opts_chunk$set(echo = TRUE, message=F)
library(lmSupport)
library(car)## Loading required package: carData
source('http://psych.colorado.edu/~jclab/R/mcSummaryLm.R') data
?mtcars
d <- mtcarsVars of interest:
DV = hp (gross horsepower)
IV1 = am (Transmission: 0 = automatic, 1 = manual)
IV2 = cyl (cylinders in the engine: 4, 6, or 8)
Cov = qsec (1/4 mile time)
ANOVA: Code for simple effects slide
Deriving prediction formula for example on slides
# Codes
## transmission
d$Transmission <- NA
d$Transmission[d$am == 0] <- -.5
d$Transmission[d$am == 1] <- .5
## cylinders
d$cyl6.48 <- NA
d$cyl6.48[d$cyl == 4] <- -1/3
d$cyl6.48[d$cyl == 6] <- 2/3
d$cyl6.48[d$cyl == 8] <- -1/3
d$cyl4.8 <- NA
d$cyl4.8[d$cyl == 4] <- -.5
d$cyl4.8[d$cyl == 6] <- 0
d$cyl4.8[d$cyl == 8] <- .5
# ANOVA model, not using the covariate/moderator of qsec
m1 <- lm(hp ~ Transmission * (cyl4.8 + cyl6.48), data = d)
summary(m1) ##
## Call:
## lm(formula = hp ~ Transmission * (cyl4.8 + cyl6.48), data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.17 -19.17 -7.75 14.46 50.83
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 151.188 6.208 24.354 < 2e-16 ***
## Transmission 39.653 12.416 3.194 0.00366 **
## cyl4.8 163.562 14.911 10.970 2.98e-11 ***
## cyl6.48 -41.594 13.420 -3.099 0.00462 **
## Transmission:cyl4.8 108.125 29.821 3.626 0.00123 **
## Transmission:cyl6.48 -34.854 26.839 -1.299 0.20547
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 29.22 on 26 degrees of freedom
## Multiple R-squared: 0.8477, Adjusted R-squared: 0.8184
## F-statistic: 28.94 on 5 and 26 DF, p-value: 7.646e-10
So, looking at the output, we get this prediction equation: \(Horsepower_i = 151.19 + 39.65*Transmission_i + 163.56*cyl4.8_i – 41.59*cyl6.48_i\) \(+ 108.13*Transmission*cyl4.8i – 34.85*Transmission*cyl6.48_i + \epsilon_i\)
Simple effect for Transmission by hand would be: \((39.65 + 108.13*cyl4.8 – 34.85*cyl6.48 )*Transmission_i\)
And for a given level of cylinder (4, in this case): \((39.65 + 108.13*(-1/2) – 34.85*(-1/3) )*Transmission_i\)
And then solve! Simple effect of transmission for a 4 cylinder car is:
(39.65 + 108.13*(-1/2) - 34.85*(-1/3))## [1] -2.798333
And double checking using dummy codes:
# Codes for cyl = 4
d$four_6cyl <- NA
d$four_6cyl[d$cyl == 4] <- 0
d$four_6cyl[d$cyl == 6] <- 1
d$four_6cyl[d$cyl == 8] <- 0
d$four_8cyl <- NA
d$four_8cyl[d$cyl == 4] <- 0
d$four_8cyl[d$cyl == 6] <- 0
d$four_8cyl[d$cyl == 8] <- 1
# model
m1.2 <- lm(hp ~ Transmission * (four_6cyl + four_8cyl), data = d)
mcSummary(m1.2)## lm(formula = hp ~ Transmission * (four_6cyl + four_8cyl), data = d)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 123529.75 5 24705.950 0.848 28.939 0
## Error 22197.12 26 853.736
## Corr Total 145726.88 31 4700.867
##
## RMSE AdjEtaSq
## 29.219 0.818
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5
## (Intercept) 83.271 9.891 8.419 60515.186 0.732 NA 62.940
## Transmission -2.792 19.781 -0.141 17.004 0.001 0.283 -43.452
## four_6cyl 40.187 14.911 2.695 6201.735 0.218 0.702 9.538
## four_8cyl 163.562 14.911 10.970 102730.335 0.822 0.488 132.913
## Transmission:four_6cyl 19.208 29.821 0.644 354.202 0.016 0.551 -42.090
## Transmission:four_8cyl 108.125 29.821 3.626 11223.375 0.336 0.353 46.827
## CI_97.5 p
## (Intercept) 103.601 0.000
## Transmission 37.869 0.889
## four_6cyl 70.837 0.012
## four_8cyl 194.212 0.000
## Transmission:four_6cyl 80.507 0.525
## Transmission:four_8cyl 169.423 0.001
We also get -2.79! (off by a little b/c of rounding, but close enough)
# also this will just grab the coefficient you're interested in:
m1.2$coefficients[2]## Transmission
## -2.791667
ANCOVA? Homogeneity of Regression Assumption
I think a car’s quarter mile time will be a useful control in my model above! I’ll enter it as a covariate and re-run the model.
# model
m2 <- lm(hp ~ Transmission * (cyl4.8 + cyl6.48) + qsec, data = d)
summary(m2)##
## Call:
## lm(formula = hp ~ Transmission * (cyl4.8 + cyl6.48) + qsec, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -47.295 -15.315 -2.711 22.488 39.019
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 355.248 98.616 3.602 0.001365 **
## Transmission 9.037 18.840 0.480 0.635632
## cyl4.8 119.208 25.596 4.657 9.06e-05 ***
## cyl6.48 -41.677 12.642 -3.297 0.002929 **
## qsec -11.480 5.538 -2.073 0.048642 *
## Transmission:cyl4.8 107.293 28.096 3.819 0.000788 ***
## Transmission:cyl6.48 -38.666 25.351 -1.525 0.139747
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 27.53 on 25 degrees of freedom
## Multiple R-squared: 0.87, Adjusted R-squared: 0.8388
## F-statistic: 27.89 on 6 and 25 DF, p-value: 6.35e-10
Looks like the effect of transmission was reduced when I entered in my covariate, indicating that the effect of transmission is due in part to its collinearity with a car’s quarter mile time.
But wait!! Is it actually a covariate? Or should it, in reality, be a moderator?
In other words, does our covariate interact with our predictors to predict our outcome?
Heterogeneity of Regression Assumption model! (i.e., fully interactive model)
m3 <- lm(hp ~ Transmission * (cyl4.8 + cyl6.48) * qsec, data = d) # qsec added as MODERATOR in this model, to check this assumption
summary(m3)##
## Call:
## lm(formula = hp ~ Transmission * (cyl4.8 + cyl6.48) * qsec, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -49.34 -10.85 0.00 12.41 41.10
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1340.16 879.53 -1.524 0.1432
## Transmission -3219.55 1759.05 -1.830 0.0822 .
## cyl4.8 -4895.15 2625.91 -1.864 0.0770 .
## cyl6.48 2903.79 1338.11 2.170 0.0422 *
## qsec 104.70 60.31 1.736 0.0980 .
## Transmission:cyl4.8 -10886.16 5251.82 -2.073 0.0513 .
## Transmission:cyl6.48 5664.10 2676.22 2.116 0.0470 *
## Transmission:qsec 225.48 120.62 1.869 0.0763 .
## cyl4.8:qsec 348.92 180.33 1.935 0.0673 .
## cyl6.48:qsec -199.04 91.37 -2.178 0.0415 *
## Transmission:cyl4.8:qsec 746.59 360.66 2.070 0.0516 .
## Transmission:cyl6.48:qsec -390.64 182.74 -2.138 0.0451 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.48 on 20 degrees of freedom
## Multiple R-squared: 0.9109, Adjusted R-squared: 0.8619
## F-statistic: 18.6 on 11 and 20 DF, p-value: 3.79e-08
Look at all the significant interactions that qsec has with our other predictors! This is a violation of our assumption, so qsec would not be an appropriate covariate in this model. Clearly works better as a moderator!
Also look at the effect of transmission in this model: It has increased in significance slightly, such that it’s now a marginal effect (.05 < p < .1). So we get a clearer picture of how these variables are related to each other.