Welcome to week 7! This week is all about moderation and mediation and our examples will mostly be applied.
First thing we do is load the necessary libraries
Next, read in the data for our moderation example.
Prompt: We are interested in number of publications faculty members have at the end of their first year. We survey 29 faculty members at the beginning of their first year on a number of measures including the Big Five and a measure of their general writing ability. At the end of year one we followed up and asked them to report the total number of published papers on their CV.
1. Create the following variables and then correlate them, what does this tell us about multiccolinearity? Which terms should we use? 1) An interaction term between RAW agreeableness and ability. 2) Agreeableness centered 3) Ability centered 4) An interaction from CENTERED agreeableness and ability.
# Note: In r, we don't have to manually create the interaction
# terms to do regression, but we'll do both to show you
# they are equivalent
data$agree_ability_i <- data$ability * data$agree
data$agree_c <- scale(data$agree,scale=F)
data$ability_c <- scale(data$ability,scale=F)
data$agreec_abilityc_i <- data$ability_c * data$agree_c
cor(data[,c("agree","ability","agree_ability_i","agree_c"
,"ability_c","agreec_abilityc_i")])
## agree ability agree_ability_i agree_c
## agree 1.00000000 -0.02753552 0.7272881 1.00000000
## ability -0.02753552 1.00000000 0.6501399 -0.02753552
## agree_ability_i 0.72728812 0.65013988 1.0000000 0.72728812
## agree_c 1.00000000 -0.02753552 0.7272881 1.00000000
## ability_c -0.02753552 1.00000000 0.6501399 -0.02753552
## agreec_abilityc_i 0.47965423 -0.09802281 0.4202844 0.47965423
## ability_c agreec_abilityc_i
## agree -0.02753552 0.47965423
## ability 1.00000000 -0.09802281
## agree_ability_i 0.65013988 0.42028443
## agree_c -0.02753552 0.47965423
## ability_c 1.00000000 -0.09802281
## agreec_abilityc_i -0.09802281 1.00000000
Answer: It’s best to use the terms that have been centered prior to calculating the interaction, as this makes our interaction regression coefficients more interpretable.
2. Regress publications on agree, ability and the interaction between agree and ability (all uncentered). Then, regress edits on agree, ability and the interaction between agreement and ability (all centered). How does this impact our results? How is this related to multicollinearity?
model.uncentered <- lm(pubs ~ agree * ability, data=data)
summary(model.uncentered)
##
## Call:
## lm(formula = pubs ~ agree * ability, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.6174 -2.1973 -0.0545 2.3517 6.5404
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.5336 12.5903 -0.757 0.4560
## agree 2.2717 2.7334 0.831 0.4138
## ability 5.6610 2.4170 2.342 0.0274 *
## agree:ability -0.7068 0.5257 -1.344 0.1909
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.7 on 25 degrees of freedom
## Multiple R-squared: 0.5611, Adjusted R-squared: 0.5085
## F-statistic: 10.66 on 3 and 25 DF, p-value: 0.000107
model.uncentered2 <- lm(pubs ~ agree + ability + agree_ability_i, data=data)
summary(model.uncentered2)
##
## Call:
## lm(formula = pubs ~ agree + ability + agree_ability_i, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.6174 -2.1973 -0.0545 2.3517 6.5404
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.5336 12.5903 -0.757 0.4560
## agree 2.2717 2.7334 0.831 0.4138
## ability 5.6610 2.4170 2.342 0.0274 *
## agree_ability_i -0.7068 0.5257 -1.344 0.1909
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.7 on 25 degrees of freedom
## Multiple R-squared: 0.5611, Adjusted R-squared: 0.5085
## F-statistic: 10.66 on 3 and 25 DF, p-value: 0.000107
model.centered <- lm(pubs ~ agree_c * ability_c,data=data)
summary(model.centered)
##
## Call:
## lm(formula = pubs ~ agree_c * ability_c, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.6174 -2.1973 -0.0545 2.3517 6.5404
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.8974 0.6875 17.305 1.99e-15 ***
## agree_c -0.9455 0.6023 -1.570 0.129
## ability_c 2.4194 0.5199 4.654 9.14e-05 ***
## agree_c:ability_c -0.7068 0.5257 -1.344 0.191
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.7 on 25 degrees of freedom
## Multiple R-squared: 0.5611, Adjusted R-squared: 0.5085
## F-statistic: 10.66 on 3 and 25 DF, p-value: 0.000107
model.centered2 <- lm(pubs ~ agree_c + ability_c + agreec_abilityc_i, data=data)
summary(model.centered2)
##
## Call:
## lm(formula = pubs ~ agree_c + ability_c + agreec_abilityc_i,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.6174 -2.1973 -0.0545 2.3517 6.5404
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.8974 0.6875 17.305 1.99e-15 ***
## agree_c -0.9455 0.6023 -1.570 0.129
## ability_c 2.4194 0.5199 4.654 9.14e-05 ***
## agreec_abilityc_i -0.7068 0.5257 -1.344 0.191
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.7 on 25 degrees of freedom
## Multiple R-squared: 0.5611, Adjusted R-squared: 0.5085
## F-statistic: 10.66 on 3 and 25 DF, p-value: 0.000107
Answer: Our R squared for each model is the same, however our centered variables adjust for the intercept. When centered, we see an intercept of 11.90, which is more meaningful than a negative value. This output is related to multicollinearity in that our regressors are otherwise correlated with one another, and scaling them will help to avoid these issues.
3. How does the model fit? - the model explains roughly 75% of the variance for publications.
model.centered.sum <- summary(model.centered)
sqrt(model.centered.sum$r.square)
## [1] 0.7490943
4. Does the interaction add significantly to our ability to predict edits? Explain. Anwer: The interaction does not add much to our ability to predict, as our b coefficient for the interaction is very small and negative, which is not meaningful in this framework.
5. What is wrong with automatically calculated betas when working with interactions? How can we get the correct ones? What are they?
data.z <- as.data.frame(scale(data))
model.centered <- lm(pubs ~ agree_c * ability_c, data=data.z)
summary(model.centered)
##
## Call:
## lm(formula = pubs ~ agree_c * ability_c, data = data.z)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.01182 -0.41635 -0.01033 0.44561 1.23929
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.00637 0.13027 -0.049 0.961
## agree_c -0.23708 0.15103 -1.570 0.129
## ability_c 0.61975 0.13317 4.654 9.14e-05 ***
## agree_c:ability_c -0.23960 0.17822 -1.344 0.191
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7011 on 25 degrees of freedom
## Multiple R-squared: 0.5611, Adjusted R-squared: 0.5085
## F-statistic: 10.66 on 3 and 25 DF, p-value: 0.000107
describe(data.z$agree_c)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 29 0 1 0.31 -0.02 1.12 -1.95 1.82 3.78 0.04 -1.03 0.19
describe(data.z$pubs)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 29 0 1 -0.18 0 1.12 -1.88 1.72 3.6 0.15 -1.08 0.19
Answer: If we automatically standardize our b values into betas, this will not be an accurate method. To do this correctly, we must first z-score agreeableness and ability, and THEN calculate agree*ability. This gives us a better estimate.
6. Graph the data, what does this indicate and should we be interpreting the interaction?
First we will need to create separate objects with our predicted data
predict(model.centered, data.frame(agree_c=0, ability_c=0))
## 1
## -0.006369963
#cool its our intercept (because the intercept is = to when the predictors are at 0)
#now we will find a range of values to plot at medium, low, and high values of
#agree_c which is z-scored so its really "agree_z.."
#cool its our intercept (because the intercept is = to when the predictors are at 0)
#now we will find a range of values to plot at medium, low, and high values of
#agree_c which is z-scored so its really "agree_z.."
p1<-predict(model.centered, data.frame(agree_c=0, ability_c=-2:2))
p2<-predict(model.centered, data.frame(agree_c=-1, ability_c=-2:2))
p3<-predict(model.centered, data.frame(agree_c=1, ability_c=-2:2))
#note that each p1, p2, and p3 will have five predicted values (corresponding to the five
#numbers between -2 to 2 - only gives whole numbers but we dont need every point since its a line)
p1
## 1 2 3 4 5
## -1.245860990 -0.626115476 -0.006369963 0.613375551 1.233121065
#note that the "3" point for this p1 above is actually where ability_c=0
#(and reflects the intercept again, where agree=0 and ability=0)
#now lets look at the rest of our predicted values across low and high agree
p2
## 1 2 3 4 5
## -1.4879815 -0.6286377 0.2307060 1.0900498 1.9493935
p3
## 1 2 3 4 5
## -1.0037405 -0.6235932 -0.2434460 0.1367013 0.5168486
Now we can plot. Be sure to run this chunk altogether or else you will run into errors
#Ok now we will do a somewhat ugly plot
plot(pubs~ability_c, data.z)
lines(-2:2, p1)
#this line shows the relationship between ability and publications for AVERAGE agreeablenes
lines(-2:2, p2)
#this line shows the relationship between ability and publications for LOW agreeablenes
lines(-2:2, p3)
#this line shows the relationship between ability and publications for HIGH agreeablenes
#so try to interpret this interaction! (if need be, re run to make lines appear again)
Answer: This indicates the regression of centered ability on publications. We can interpret from this that at any value of ability, it positively predicts publication.
We will be using a new prompt and new data set for our mediation example.
Prompt: We want to know if people that believe in luck are luckier than people that do not believe in luck. Participants were asked how much they believe in luck. They were then allowed to play up to eight games of chance, but had to play at least one. Look at the relationship between belief in luck, games played, and wins.
data2 <- read.csv("HW7Data_2.csv", header=T, stringsAsFactors = FALSE, na.strings=c("","NA"))
1. Does belief in luck predict wins?
data2.z <- as.data.frame(scale(data2))
model.luck <- lm(win ~ luck, data = data2.z)
summary(model.luck)
##
## Call:
## lm(formula = win ~ luck, data = data2.z)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.91169 -0.59399 0.04979 0.69357 1.98113
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.974e-17 8.027e-02 0.000 1
## luck 4.829e-01 8.061e-02 5.991 2.32e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8794 on 118 degrees of freedom
## Multiple R-squared: 0.2332, Adjusted R-squared: 0.2267
## F-statistic: 35.89 on 1 and 118 DF, p-value: 2.322e-08
# what did we just estimate?? the TOTAL EFFECT or aka c!
2. Skeptical of this relationship, you realize those that believe in luck may not actually win more, but may instead participate in games of chance more often (possibly believing the outcome that should happen always will or something like that). What is this called? Draw this model.
ModelVis <- c(0, "'.553'", 0,
0, 0, 0,
"'.812'", "'.032*'", 0)
M2<- matrix (nrow=3, ncol=3, byrow = TRUE, data=ModelVis)
plot<- plotmat(M2, pos=c(1,2),
name= c( "Play","Luck", "Win"),
box.type = "rect", box.size = 0.12, box.prop=0.5, curve=0)
3. Estimate each of the path coefficients and add them to the model.
# first lets estimante path a from luck to play (using betas)
model.a <- lm(play ~ luck, data = data2.z)
summary(model.a)
##
## Call:
## lm(formula = play ~ luck, data = data2.z)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0639 -0.6006 0.1213 0.8236 2.2869
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.760e-16 7.640e-02 0.000 1
## luck 5.527e-01 7.672e-02 7.204 5.96e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8369 on 118 degrees of freedom
## Multiple R-squared: 0.3054, Adjusted R-squared: 0.2996
## F-statistic: 51.89 on 1 and 118 DF, p-value: 5.961e-11
# now lets estimate b and c'
model.bc <- lm(win ~ luck + play, data = data2.z)
summary(model.bc)
##
## Call:
## lm(formula = win ~ luck + play, data = data2.z)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.85048 -0.14978 -0.03446 0.54099 0.71041
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.350e-16 5.117e-02 0.000 1.00
## luck 3.418e-02 6.165e-02 0.554 0.58
## play 8.120e-01 6.165e-02 13.170 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5605 on 117 degrees of freedom
## Multiple R-squared: 0.6911, Adjusted R-squared: 0.6858
## F-statistic: 130.9 on 2 and 117 DF, p-value: < 2.2e-16
#we can also do this using lavaan
path.mediate <-'win ~ play + luck
play ~ luck'
model.path = lavaan::sem(model=path.mediate, data=data2)
summary(model.path, standardized=T, fit.measure = T)
## lavaan 0.6-7 ended normally after 16 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 5
##
## Number of observations 120
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Model Test Baseline Model:
##
## Test statistic 184.710
## Degrees of freedom 3
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -339.143
## Loglikelihood unrestricted model (H1) -339.143
##
## Akaike (AIC) 688.287
## Bayesian (BIC) 702.224
## Sample-size adjusted Bayesian (BIC) 686.417
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value RMSEA <= 0.05 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## win ~
## play 0.910 0.068 13.337 0.000 0.910 0.812
## luck 0.067 0.120 0.561 0.575 0.067 0.034
## play ~
## luck 0.973 0.134 7.265 0.000 0.973 0.553
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .win 0.739 0.095 7.746 0.000 0.739 0.309
## .play 1.322 0.171 7.746 0.000 1.322 0.695
#we dont need extra fit measures (beyond the scope of the class) but I wanted to keep in case you might
#want it one day, below commented out
#fitMeasures(model.path, fit.measures='all')
#ok so in the output, if you note the "Std.all" collumn, this should match exactly with what we calculated
#in a piece-wise fashion (above) because this model is just identified.
#You would also get these Beta values under "Estimate" if you entered the "data2.Z" instead of "data2"
#now we will remove the c' path (the regression of play on luck) to give us a df to test our model
path.mediate2 <-'win ~ play + luck
play~~0*luck' #I am saying here, set the covariance here to 0 (drop the path)
model.path2 = lavaan::sem(model=path.mediate2, data=data2)
summary(model.path2, standardized=T, fit.measures = T)
## lavaan 0.6-7 ended normally after 14 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 5
##
## Number of observations 120
##
## Model Test User Model:
##
## Test statistic 43.739
## Degrees of freedom 1
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 184.710
## Degrees of freedom 3
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.765
## Tucker-Lewis Index (TLI) 0.294
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -502.036
## Loglikelihood unrestricted model (H1) -480.167
##
## Akaike (AIC) 1014.072
## Bayesian (BIC) 1028.010
## Sample-size adjusted Bayesian (BIC) 1012.202
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.597
## 90 Percent confidence interval - lower 0.454
## 90 Percent confidence interval - upper 0.754
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.291
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## win ~
## play 0.910 0.057 16.004 0.000 0.910 0.825
## luck 0.067 0.100 0.674 0.501 0.067 0.035
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## play ~~
## luck 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .win 0.739 0.095 7.746 0.000 0.739 0.319
## play 1.903 0.246 7.746 0.000 1.903 1.000
## luck 0.614 0.079 7.746 0.000 0.614 1.000
#Interpret the Chi Square, also known as the "badness of fit statistic"
#does this model fit well without that path? or not?
Answer:
4. Do we have evidence of mediation? Explain. Answer: Partially, because our c’ beta is not equal to zero, but a and b (luck and plays) are not equal to zero, which suggests partial mediation of luck on wins. I am not sure if I understand this output correctly.
5. How can we test that the path between luck and wins has actually decreased? (Hint: there are two ways.) Answer: Could we use monte carlo estimation or some other type of bootstrapping method?
6. How can we get an effect size for the amount of mediation? Answer: We can use the Sobel test to identify the size of the effect.