Welcome to week 7! This week is all about moderation and mediation and our examples will mostly be applied.

Moderation

First thing we do is load the necessary libraries

library(ggplot2)
library(psych)
install.packages("lavaan")
library(lavaan)
require(MASS)
install.packages("diagram")
library(diagram) #for creating mediation visual

Next, read in the data for our moderation example.

Prompt: We are interested in number of publications faculty members have at the end of their first year. We survey 29 faculty members at the beginning of their first year on a number of measures including the Big Five and a measure of their general writing ability. At the end of year one we followed up and asked them to report the total number of published papers on their CV.

1. Create the following variables and then correlate them, what does this tell us about multiccolinearity? Which terms should we use? 1) An interaction term between RAW agreeableness and ability. 2) Agreeableness centered 3) Ability centered 4) An interaction from CENTERED agreeableness and ability.

# Note: In r, we don't have to manually create the interaction
# terms to do regression, but we'll do both to show you
# they are equivalent
data$agree_ability_i <- data$ability * data$agree
data$agree_c <- scale(data$agree,scale=F)
data$ability_c <- scale(data$ability,scale=F)
data$agreec_abilityc_i <- data$ability_c * data$agree_c
cor(data[,c("agree","ability","agree_ability_i","agree_c"
            ,"ability_c","agreec_abilityc_i")])
##                         agree     ability agree_ability_i     agree_c
## agree              1.00000000 -0.02753552       0.7272881  1.00000000
## ability           -0.02753552  1.00000000       0.6501399 -0.02753552
## agree_ability_i    0.72728812  0.65013988       1.0000000  0.72728812
## agree_c            1.00000000 -0.02753552       0.7272881  1.00000000
## ability_c         -0.02753552  1.00000000       0.6501399 -0.02753552
## agreec_abilityc_i  0.47965423 -0.09802281       0.4202844  0.47965423
##                     ability_c agreec_abilityc_i
## agree             -0.02753552        0.47965423
## ability            1.00000000       -0.09802281
## agree_ability_i    0.65013988        0.42028443
## agree_c           -0.02753552        0.47965423
## ability_c          1.00000000       -0.09802281
## agreec_abilityc_i -0.09802281        1.00000000

Answer: We should use agreeableness centered, ability centered, and an interaction term with centered agreeableness and ability. Centering the IV’s before crossing them reduces multicolinearity.

2. Regress publications on agree, ability and the interaction between agree and ability (all uncentered). Then, regress edits on agree, ability and the interaction between agreement and ability (all centered). How does this impact our results? How is this related to multicollinearity?

model.uncentered <- lm(pubs ~ agree * ability, data=data)
summary(model.uncentered)
## 
## Call:
## lm(formula = pubs ~ agree * ability, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.6174  -2.1973  -0.0545   2.3517   6.5404 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)    -9.5336    12.5903  -0.757   0.4560  
## agree           2.2717     2.7334   0.831   0.4138  
## ability         5.6610     2.4170   2.342   0.0274 *
## agree:ability  -0.7068     0.5257  -1.344   0.1909  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.7 on 25 degrees of freedom
## Multiple R-squared:  0.5611, Adjusted R-squared:  0.5085 
## F-statistic: 10.66 on 3 and 25 DF,  p-value: 0.000107
model.uncentered2 <- lm(pubs ~ agree + ability + agree_ability_i, data=data)
summary(model.uncentered2)
## 
## Call:
## lm(formula = pubs ~ agree + ability + agree_ability_i, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.6174  -2.1973  -0.0545   2.3517   6.5404 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)  
## (Intercept)      -9.5336    12.5903  -0.757   0.4560  
## agree             2.2717     2.7334   0.831   0.4138  
## ability           5.6610     2.4170   2.342   0.0274 *
## agree_ability_i  -0.7068     0.5257  -1.344   0.1909  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.7 on 25 degrees of freedom
## Multiple R-squared:  0.5611, Adjusted R-squared:  0.5085 
## F-statistic: 10.66 on 3 and 25 DF,  p-value: 0.000107
model.centered <- lm(pubs ~ agree_c * ability_c,data=data)
summary(model.centered)
## 
## Call:
## lm(formula = pubs ~ agree_c * ability_c, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.6174  -2.1973  -0.0545   2.3517   6.5404 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        11.8974     0.6875  17.305 1.99e-15 ***
## agree_c            -0.9455     0.6023  -1.570    0.129    
## ability_c           2.4194     0.5199   4.654 9.14e-05 ***
## agree_c:ability_c  -0.7068     0.5257  -1.344    0.191    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.7 on 25 degrees of freedom
## Multiple R-squared:  0.5611, Adjusted R-squared:  0.5085 
## F-statistic: 10.66 on 3 and 25 DF,  p-value: 0.000107
model.centered2 <- lm(pubs ~ agree_c + ability_c + agreec_abilityc_i, data=data)
summary(model.centered2)
## 
## Call:
## lm(formula = pubs ~ agree_c + ability_c + agreec_abilityc_i, 
##     data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.6174  -2.1973  -0.0545   2.3517   6.5404 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        11.8974     0.6875  17.305 1.99e-15 ***
## agree_c            -0.9455     0.6023  -1.570    0.129    
## ability_c           2.4194     0.5199   4.654 9.14e-05 ***
## agreec_abilityc_i  -0.7068     0.5257  -1.344    0.191    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.7 on 25 degrees of freedom
## Multiple R-squared:  0.5611, Adjusted R-squared:  0.5085 
## F-statistic: 10.66 on 3 and 25 DF,  p-value: 0.000107

Answer: The F-statistic [F(3, 25) = 10.66] and p-value are the same (p < .01); however, the interpretation of the terms is different with the centered variables. Multicollinearity is reduced with the centered variables.

3. How does the model fit?

model.centered.sum <- summary(model.centered)
sqrt(model.centered.sum$r.square)
## [1] 0.7490943

Answer: Model fit is strong and accounts for 75% of variation in the outcome.

4. Does the interaction add significantly to our ability to predict edits? Explain. Answer: The interaction term is not significant (p=.19), and therefore, does not add significantly to our ability to predict edits.

5. What is wrong with automatically calculated betas when working with interactions? How can we get the correct ones? What are they?

data.z <- as.data.frame(scale(data))
model.centered <- lm(pubs ~ agree_c * ability_c, data=data.z)
summary(model.centered)

describe(data.z$agree_c)
describe(data.z$pubs)

Answer: If we want standardized beta coefficients, we must z-score our variables first. B0 = 0, B1 = -0.23, B2 = 0.62, B4 (interaction term) = -0.24.

6. Graph the data, what does this indicate and should we be interpreting the interaction?

First we will need to create separate objects with our predicted data

predict(model.centered, data.frame(agree_c=0, ability_c=0))

#cool its our intercept (because the intercept is = to when the predictors are at 0)
#now we will find a range of values to plot at medium, low, and high values of 
#agree_c which is z-scored so its really "agree_z.."

#cool its our intercept (because the intercept is = to when the predictors are at 0)
#now we will find a range of values to plot at medium, low, and high values of 
#agree_c which is z-scored so its really "agree_z.."

p1<-predict(model.centered, data.frame(agree_c=0, ability_c=-2:2))
p2<-predict(model.centered, data.frame(agree_c=-1, ability_c=-2:2))
p3<-predict(model.centered, data.frame(agree_c=1, ability_c=-2:2))
#note that each p1, p2, and p3 will have five predicted values (corresponding to the five
#numbers between -2 to 2 - only gives whole numbers but we dont need every point since its a line)
p1

#note that the "3" point for this p1 above is actually where ability_c=0 
#(and reflects the intercept again, where agree=0 and ability=0)

#now lets look at the rest of our predicted values across low and high agree
p2
p3

Now we can plot. Be sure to run this chunk altogether or else you will run into errors

#Ok now we will do a somewhat ugly plot

plot(pubs~ability_c, data.z)
lines(-2:2, p1) 
#this line shows the relationship between ability and publications for AVERAGE agreeablenes
lines(-2:2, p2)
#this line shows the relationship between ability and publications for LOW agreeablenes
lines(-2:2, p3)
#this line shows the relationship between ability and publications for HIGH agreeablenes
#so try to interpret this interaction! (if need be, re run to make lines appear again)

Answer: It seems there is an interaction between ability and agreeableness in predicting number of publications, as indicated by the crossing, non-parallel lines.

Mediation

We will be using a new prompt and new data set for our mediation example.

Prompt: We want to know if people that believe in luck are luckier than people that do not believe in luck. Participants were asked how much they believe in luck. They were then allowed to play up to eight games of chance, but had to play at least one. Look at the relationship between belief in luck, games played, and wins.

data2 <- read.csv("HW7Data_2.csv", header=T, stringsAsFactors = FALSE, na.strings=c("","NA"))

1. Does belief in luck predict wins?

data2.z <- as.data.frame(scale(data2))

model.luck <- lm(win ~ luck, data = data2.z)
summary(model.luck)
## 
## Call:
## lm(formula = win ~ luck, data = data2.z)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.91169 -0.59399  0.04979  0.69357  1.98113 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.974e-17  8.027e-02   0.000        1    
## luck         4.829e-01  8.061e-02   5.991 2.32e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8794 on 118 degrees of freedom
## Multiple R-squared:  0.2332, Adjusted R-squared:  0.2267 
## F-statistic: 35.89 on 1 and 118 DF,  p-value: 2.322e-08
# what did we just estimate??  the TOTAL EFFECT or aka c!

Answer: Belief in luck predicts wins, F(1, 118) = 35.89, p < .01.

2. Skeptical of this relationship, you realize those that believe in luck may not actually win more, but may instead participate in games of chance more often (possibly believing the outcome that should happen always will or something like that). What is this called? Draw this model.

ModelVis <- c(0, "'.553'", 0,
           0, 0, 0,
           "'.812'", "'.032*'", 0)
M2<- matrix (nrow=3, ncol=3, byrow = TRUE, data=ModelVis)
plot<- plotmat(M2, pos=c(1,2),
               name= c( "Play","Luck", "Win"),
               box.type = "rect", box.size = 0.12, box.prop=0.5,  curve=0)

This is called mediation, or an indirect effect, where belief in luck affects number of wins through number of plays (the mediator).

3. Estimate each of the path coefficients and add them to the model.

# first lets estimante path a from luck to play (using betas)
model.a <- lm(play ~ luck, data = data2.z)
summary(model.a)
## 
## Call:
## lm(formula = play ~ luck, data = data2.z)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0639 -0.6006  0.1213  0.8236  2.2869 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.760e-16  7.640e-02   0.000        1    
## luck        5.527e-01  7.672e-02   7.204 5.96e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8369 on 118 degrees of freedom
## Multiple R-squared:  0.3054, Adjusted R-squared:  0.2996 
## F-statistic: 51.89 on 1 and 118 DF,  p-value: 5.961e-11
# now lets estimate b and c' 
model.bc <- lm(win ~ luck + play, data = data2.z)
summary(model.bc)
## 
## Call:
## lm(formula = win ~ luck + play, data = data2.z)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.85048 -0.14978 -0.03446  0.54099  0.71041 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.350e-16  5.117e-02   0.000     1.00    
## luck         3.418e-02  6.165e-02   0.554     0.58    
## play         8.120e-01  6.165e-02  13.170   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5605 on 117 degrees of freedom
## Multiple R-squared:  0.6911, Adjusted R-squared:  0.6858 
## F-statistic: 130.9 on 2 and 117 DF,  p-value: < 2.2e-16
#we can also do this using lavaan
path.mediate <-'win ~ play + luck  
                play ~ luck'

model.path = lavaan::sem(model=path.mediate, data=data2)

summary(model.path, standardized=T, fit.measure = T)
## Length  Class   Mode 
##      1 lavaan     S4
#we dont need extra fit measures (beyond the scope of the class) but I wanted to keep in case you might 
#want it one day, below commented out
#fitMeasures(model.path, fit.measures='all') 
#ok so in the output, if you note the "Std.all" collumn, this should match exactly with what we calculated 
#in a piece-wise fashion (above) because this model is just identified.
#You would also get these Beta values under "Estimate" if you entered the "data2.Z" instead of "data2"

#now we will remove the c' path (the regression of play on luck) to give us a df to test our model

path.mediate2 <-'win ~ play + luck
                  play~~0*luck' #I am saying here, set the covariance here to 0 (drop the path)

model.path2 = lavaan::sem(model=path.mediate2, data=data2)
summary(model.path2, standardized=T, fit.measures = T)
## Length  Class   Mode 
##      1 lavaan     S4
#Interpret the Chi Square, also known as the "badness of fit statistic"
#does this model fit well without that path? or not?

Answer: The path from luck to play is .06, p < .01. The path from play to win is .08, p < .01.

4. Do we have evidence of mediation? Explain. Answer: We have evidence of mediation, as the relationship between luck and wins decreased when play was added to the model, and the coefficient became insignificant. (r = .06, p <.01 –> r =.003, p = .58)

5. How can we test that the path between luck and wins has actually decreased? (Hint: there are two ways.) Answer: Compare changes in coefficients and the significance of these coefficients after the suspected mediator is added to the model.

6. How can we get an effect size for the amount of mediation? Answer: To estimate the effect size for amount of mediation we may use ratios comparing direct effect with total effect or comparing the indirect effect with the direct effect. Other effect sizes are possible beyond ratios.