Welcome to week 7! This week is all about moderation and mediation and our examples will mostly be applied.
First thing we do is load the necessary libraries
Next, read in the data for our moderation example.
Prompt: We are interested in number of publications faculty members have at the end of their first year. We survey 29 faculty members at the beginning of their first year on a number of measures including the Big Five and a measure of their general writing ability. At the end of year one we followed up and asked them to report the total number of published papers on their CV.
1. Create the following variables and then correlate them, what does this tell us about multiccolinearity? Which terms should we use? 1) An interaction term between RAW agreeableness and ability. 2) Agreeableness centered 3) Ability centered 4) An interaction from CENTERED agreeableness and ability.
# Note: In r, we don't have to manually create the interaction
# terms to do regression, but we'll do both to show you
# they are equivalent
data$agree_ability_i <- data$ability * data$agree
data$agree_c <- scale(data$agree,scale=F)
data$ability_c <- scale(data$ability,scale=F)
data$agreec_abilityc_i <- data$ability_c * data$agree_c
cor(data[,c("agree","ability","agree_ability_i","agree_c"
,"ability_c","agreec_abilityc_i")])
## agree ability agree_ability_i agree_c
## agree 1.00000000 -0.02753552 0.7272881 1.00000000
## ability -0.02753552 1.00000000 0.6501399 -0.02753552
## agree_ability_i 0.72728812 0.65013988 1.0000000 0.72728812
## agree_c 1.00000000 -0.02753552 0.7272881 1.00000000
## ability_c -0.02753552 1.00000000 0.6501399 -0.02753552
## agreec_abilityc_i 0.47965423 -0.09802281 0.4202844 0.47965423
## ability_c agreec_abilityc_i
## agree -0.02753552 0.47965423
## ability 1.00000000 -0.09802281
## agree_ability_i 0.65013988 0.42028443
## agree_c -0.02753552 0.47965423
## ability_c 1.00000000 -0.09802281
## agreec_abilityc_i -0.09802281 1.00000000
Answer: This shows that the raw interaction term between agreeableness and ability are highly correlated, reflecting high multicollinearity. Centering the variables weakens the correlation. We should use the centered terms, as it reduces multicollinearity.
2. Regress publications on agree, ability and the interaction between agree and ability (all uncentered). Then, regress edits on agree, ability and the interaction between agreement and ability (all centered). How does this impact our results? How is this related to multicollinearity?
model.uncentered <- lm(pubs ~ agree * ability, data=data)
summary(model.uncentered)
##
## Call:
## lm(formula = pubs ~ agree * ability, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.6174 -2.1973 -0.0545 2.3517 6.5404
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.5336 12.5903 -0.757 0.4560
## agree 2.2717 2.7334 0.831 0.4138
## ability 5.6610 2.4170 2.342 0.0274 *
## agree:ability -0.7068 0.5257 -1.344 0.1909
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.7 on 25 degrees of freedom
## Multiple R-squared: 0.5611, Adjusted R-squared: 0.5085
## F-statistic: 10.66 on 3 and 25 DF, p-value: 0.000107
model.uncentered2 <- lm(pubs ~ agree + ability + agree_ability_i, data=data)
summary(model.uncentered2)
##
## Call:
## lm(formula = pubs ~ agree + ability + agree_ability_i, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.6174 -2.1973 -0.0545 2.3517 6.5404
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.5336 12.5903 -0.757 0.4560
## agree 2.2717 2.7334 0.831 0.4138
## ability 5.6610 2.4170 2.342 0.0274 *
## agree_ability_i -0.7068 0.5257 -1.344 0.1909
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.7 on 25 degrees of freedom
## Multiple R-squared: 0.5611, Adjusted R-squared: 0.5085
## F-statistic: 10.66 on 3 and 25 DF, p-value: 0.000107
model.centered <- lm(pubs ~ agree_c * ability_c,data=data)
summary(model.centered)
##
## Call:
## lm(formula = pubs ~ agree_c * ability_c, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.6174 -2.1973 -0.0545 2.3517 6.5404
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.8974 0.6875 17.305 1.99e-15 ***
## agree_c -0.9455 0.6023 -1.570 0.129
## ability_c 2.4194 0.5199 4.654 9.14e-05 ***
## agree_c:ability_c -0.7068 0.5257 -1.344 0.191
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.7 on 25 degrees of freedom
## Multiple R-squared: 0.5611, Adjusted R-squared: 0.5085
## F-statistic: 10.66 on 3 and 25 DF, p-value: 0.000107
model.centered2 <- lm(pubs ~ agree_c + ability_c + agreec_abilityc_i, data=data)
summary(model.centered2)
##
## Call:
## lm(formula = pubs ~ agree_c + ability_c + agreec_abilityc_i,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.6174 -2.1973 -0.0545 2.3517 6.5404
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.8974 0.6875 17.305 1.99e-15 ***
## agree_c -0.9455 0.6023 -1.570 0.129
## ability_c 2.4194 0.5199 4.654 9.14e-05 ***
## agreec_abilityc_i -0.7068 0.5257 -1.344 0.191
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.7 on 25 degrees of freedom
## Multiple R-squared: 0.5611, Adjusted R-squared: 0.5085
## F-statistic: 10.66 on 3 and 25 DF, p-value: 0.000107
Answer: Regressing edits on the centered terms shows significant effects on the intercept because the intercept is now set at the average rather than at zero, and a lower standard error. This is a reflection of reduced multicollinearity.
3. How does the model fit?
model.centered.sum <- summary(model.centered)
sqrt(model.centered.sum$r.square)
## [1] 0.7490943
Answer: The model is a good fit, with multiple R at ~0.75.
4. Does the interaction add significantly to our ability to predict edits? Explain. Anwer: No, it does not. This is reflected in the output that shows a p-value of .191.
5. What is wrong with automatically calculated betas when working with interactions? How can we get the correct ones? What are they?
data.z <- as.data.frame(scale(data))
model.centered <- lm(pubs ~ agree_c * ability_c, data=data.z)
summary(model.centered)
##
## Call:
## lm(formula = pubs ~ agree_c * ability_c, data = data.z)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.01182 -0.41635 -0.01033 0.44561 1.23929
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.00637 0.13027 -0.049 0.961
## agree_c -0.23708 0.15103 -1.570 0.129
## ability_c 0.61975 0.13317 4.654 9.14e-05 ***
## agree_c:ability_c -0.23960 0.17822 -1.344 0.191
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7011 on 25 degrees of freedom
## Multiple R-squared: 0.5611, Adjusted R-squared: 0.5085
## F-statistic: 10.66 on 3 and 25 DF, p-value: 0.000107
describe(data.z$agree_c)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 29 0 1 0.31 -0.02 1.12 -1.95 1.82 3.78 0.04 -1.03 0.19
describe(data.z$pubs)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 29 0 1 -0.18 0 1.12 -1.88 1.72 3.6 0.15 -1.08 0.19
Answer: The problem is with the statistical software, as it does not accurately standardize terms. To get the correct ones, we should z-score X1 and X2, and then multiply them in order to get the interaction term. Doing so will allow us to use the model as intended, producing standardized betas and an accurate interaction term. The betas here are agree=-.23; ability=.61; and interaction=-.23
6. Graph the data, what does this indicate and should we be interpreting the interaction?
First we will need to create separate objects with our predicted data
predict(model.centered, data.frame(agree_c=0, ability_c=0))
## 1
## -0.006369963
#cool its our intercept (because the intercept is = to when the predictors are at 0)
#now we will find a range of values to plot at medium, low, and high values of
#agree_c which is z-scored so its really "agree_z.."
#cool its our intercept (because the intercept is = to when the predictors are at 0)
#now we will find a range of values to plot at medium, low, and high values of
#agree_c which is z-scored so its really "agree_z.."
p1<-predict(model.centered, data.frame(agree_c=0, ability_c=-2:2))
p2<-predict(model.centered, data.frame(agree_c=-1, ability_c=-2:2))
p3<-predict(model.centered, data.frame(agree_c=1, ability_c=-2:2))
#note that each p1, p2, and p3 will have five predicted values (corresponding to the five
#numbers between -2 to 2 - only gives whole numbers but we dont need every point since its a line)
p1
## 1 2 3 4 5
## -1.245860990 -0.626115476 -0.006369963 0.613375551 1.233121065
#note that the "3" point for this p1 above is actually where ability_c=0
#(and reflects the intercept again, where agree=0 and ability=0)
#now lets look at the rest of our predicted values across low and high agree
p2
## 1 2 3 4 5
## -1.4879815 -0.6286377 0.2307060 1.0900498 1.9493935
p3
## 1 2 3 4 5
## -1.0037405 -0.6235932 -0.2434460 0.1367013 0.5168486
Now we can plot. Be sure to run this chunk altogether or else you will run into errors
#Ok now we will do a somewhat ugly plot
plot(pubs~ability_c, data.z)
lines(-2:2, p1)
#this line shows the relationship between ability and publications for AVERAGE agreeablenes
lines(-2:2, p2)
#this line shows the relationship between ability and publications for LOW agreeablenes
lines(-2:2, p3)
#this line shows the relationship between ability and publications for HIGH agreeablenes
#so try to interpret this interaction! (if need be, re run to make lines appear again)
Answer: The graph does show an interaction, but as we observed in the model estimates, these are not significant.
We will be using a new prompt and new data set for our mediation example.
Prompt: We want to know if people that believe in luck are luckier than people that do not believe in luck. Participants were asked how much they believe in luck. They were then allowed to play up to eight games of chance, but had to play at least one. Look at the relationship between belief in luck, games played, and wins.
data2 <- read.csv("HW7Data_2.csv", header=T, stringsAsFactors = FALSE, na.strings=c("","NA"))
1. Does belief in luck predict wins?
data2.z <- as.data.frame(scale(data2))
model.luck <- lm(win ~ luck, data = data2.z)
summary(model.luck)
##
## Call:
## lm(formula = win ~ luck, data = data2.z)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.91169 -0.59399 0.04979 0.69357 1.98113
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.974e-17 8.027e-02 0.000 1
## luck 4.829e-01 8.061e-02 5.991 2.32e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8794 on 118 degrees of freedom
## Multiple R-squared: 0.2332, Adjusted R-squared: 0.2267
## F-statistic: 35.89 on 1 and 118 DF, p-value: 2.322e-08
# what did we just estimate?? the TOTAL EFFECT or aka c!
Answer: It does, interestingly enough.
2. Skeptical of this relationship, you realize those that believe in luck may not actually win more, but may instead participate in games of chance more often (possibly believing the outcome that should happen always will or something like that). What is this called? Draw this model.
ModelVis <- c(0, "'.553'", 0,
0, 0, 0,
"'.812'", "'.032*'", 0)
M2<- matrix (nrow=3, ncol=3, byrow = TRUE, data=ModelVis)
plot<- plotmat(M2, pos=c(1,2),
name= c( "Play","Luck", "Win"),
box.type = "rect", box.size = 0.12, box.prop=0.5, curve=0)
Answer: this is called a mediation model.
3. Estimate each of the path coefficients and add them to the model.
# first lets estimante path a from luck to play (using betas)
model.a <- lm(play ~ luck, data = data2.z)
summary(model.a)
##
## Call:
## lm(formula = play ~ luck, data = data2.z)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0639 -0.6006 0.1213 0.8236 2.2869
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.760e-16 7.640e-02 0.000 1
## luck 5.527e-01 7.672e-02 7.204 5.96e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8369 on 118 degrees of freedom
## Multiple R-squared: 0.3054, Adjusted R-squared: 0.2996
## F-statistic: 51.89 on 1 and 118 DF, p-value: 5.961e-11
# now lets estimate b and c'
model.bc <- lm(win ~ luck + play, data = data2.z)
summary(model.bc)
##
## Call:
## lm(formula = win ~ luck + play, data = data2.z)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.85048 -0.14978 -0.03446 0.54099 0.71041
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.350e-16 5.117e-02 0.000 1.00
## luck 3.418e-02 6.165e-02 0.554 0.58
## play 8.120e-01 6.165e-02 13.170 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5605 on 117 degrees of freedom
## Multiple R-squared: 0.6911, Adjusted R-squared: 0.6858
## F-statistic: 130.9 on 2 and 117 DF, p-value: < 2.2e-16
#we can also do this using lavaan
path.mediate <-'win ~ play + luck
play ~ luck'
model.path = lavaan::sem(model=path.mediate, data=data2)
summary(model.path, standardized=T, fit.measure = T)
## lavaan 0.6-7 ended normally after 16 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 5
##
## Number of observations 120
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Model Test Baseline Model:
##
## Test statistic 184.710
## Degrees of freedom 3
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -339.143
## Loglikelihood unrestricted model (H1) -339.143
##
## Akaike (AIC) 688.287
## Bayesian (BIC) 702.224
## Sample-size adjusted Bayesian (BIC) 686.417
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value RMSEA <= 0.05 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## win ~
## play 0.910 0.068 13.337 0.000 0.910 0.812
## luck 0.067 0.120 0.561 0.575 0.067 0.034
## play ~
## luck 0.973 0.134 7.265 0.000 0.973 0.553
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .win 0.739 0.095 7.746 0.000 0.739 0.309
## .play 1.322 0.171 7.746 0.000 1.322 0.695
#we dont need extra fit measures (beyond the scope of the class) but I wanted to keep in case you might
#want it one day, below commented out
#fitMeasures(model.path, fit.measures='all')
#ok so in the output, if you note the "Std.all" collumn, this should match exactly with what we calculated
#in a piece-wise fashion (above) because this model is just identified.
#You would also get these Beta values under "Estimate" if you entered the "data2.Z" instead of "data2"
#now we will remove the c' path (the regression of play on luck) to give us a df to test our model
path.mediate2 <-'win ~ play + luck
play~~0*luck' #I am saying here, set the covariance here to 0 (drop the path)
model.path2 = lavaan::sem(model=path.mediate2, data=data2)
summary(model.path2, standardized=T, fit.measures = T)
## lavaan 0.6-7 ended normally after 14 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 5
##
## Number of observations 120
##
## Model Test User Model:
##
## Test statistic 43.739
## Degrees of freedom 1
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 184.710
## Degrees of freedom 3
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.765
## Tucker-Lewis Index (TLI) 0.294
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -502.036
## Loglikelihood unrestricted model (H1) -480.167
##
## Akaike (AIC) 1014.072
## Bayesian (BIC) 1028.010
## Sample-size adjusted Bayesian (BIC) 1012.202
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.597
## 90 Percent confidence interval - lower 0.454
## 90 Percent confidence interval - upper 0.754
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.291
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## win ~
## play 0.910 0.057 16.004 0.000 0.910 0.825
## luck 0.067 0.100 0.674 0.501 0.067 0.035
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## play ~~
## luck 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .win 0.739 0.095 7.746 0.000 0.739 0.319
## play 1.903 0.246 7.746 0.000 1.903 1.000
## luck 0.614 0.079 7.746 0.000 0.614 1.000
#Interpret the Chi Square, also known as the "badness of fit statistic"
#does this model fit well without that path? or not?
Answer: It fits well with and without the path, as reflected in the p-values.
4. Do we have evidence of mediation? Explain. Answer: C’ is not significant, but A, B, and C are, so there is partial mediation.
5. How can we test that the path between luck and wins has actually decreased? (Hint: there are two ways.) Answer: We can use the bootstrap estimates and test the difference between the betas.
6. How can we get an effect size for the amount of mediation? Answer: We could use the Sobel test.