##Mediation and Moderation HW 2
#import data
library(readr)
library(knitr)
stress <- read_csv("C:/Users/rlutt/Downloads/stress_withdraw.csv")
## Rows: 262 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (7): tenure, stress, depress, withdraw, sex, age, efficacy
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# run a causal mediation analysis in which stress= main IV, depress=mediator, and withdraw intention= DV
library(mediation)
## Loading required package: MASS
## Loading required package: Matrix
## Loading required package: mvtnorm
## Loading required package: sandwich
## mediation: Causal Mediation Analysis
## Version: 4.5.0
med.simple<- lm(withdraw~stress, data=stress)
out.simple<-lm(withdraw~ stress + depress, data=stress)
med.out.simple<- mediate(med.simple, out.simple, treat="stress", mediator= "depress", boot=T, sims=1000, boot.ci.type= "bca")
## Running nonparametric bootstrap
summary(med.out.simple)
##
## Causal Mediation Analysis
##
## Nonparametric Bootstrap Confidence Intervals with the BCa Method
##
## Estimate 95% CI Lower 95% CI Upper p-value
## ACME 4.32e-02 -6.23e-02 0.14 0.36
## ADE -7.68e-02 -1.80e-01 0.04 0.21
## Total Effect -3.37e-02 -2.46e-01 0.16 0.77
## Prop. Mediated -1.28e+00 -1.75e+03 -0.27 0.86
##
## Sample Size Used: 262
##
##
## Simulations: 1000
This analysis tests whether the outcome, withdraw is affected by the main independent variable, stress, and if stress’s effect acts through a mediator variable, depress. The indirect effect represents the effect of stress multiplied by depress on the outcome withdraw. The direct effect is only the effect of stress on the outcome, withdraw.
These results can be interpreted as follows. The indirect effect is the ACME, which shown by the table above has a positive coefficient and p-value above 0.05, deeming it as not significant. The direct effect is the ADE, which also shown above has a negative coefficient and a p-value that is also not significant. The total effect is coefficients of the ADE and ACME added together, which is also not significant and is negative.
Therefore, as none of these results are significant, we cannot conclude that the effect of stress operates through the mediator, depress on the outcome withdraw.
#plot & sensitivity analyses
plot(med.out.simple)
s.out<-medsens(med.out.simple)
summary(s.out)
##
## Mediation Sensitivity Analysis for Average Causal Mediation Effect
##
## Sensitivity Region
##
## Rho ACME 95% CI Lower 95% CI Upper R^2_M*R^2_Y* R^2_M~R^2_Y~
## [1,] -0.9 0.0812 -0.0724 0.2348 0.81 0.6612
## [2,] -0.8 0.0764 -0.0682 0.2210 0.64 0.5224
## [3,] -0.7 0.0718 -0.0642 0.2079 0.49 0.4000
## [4,] -0.6 0.0674 -0.0603 0.1952 0.36 0.2939
## [5,] -0.5 0.0632 -0.0566 0.1830 0.25 0.2041
## [6,] -0.4 0.0591 -0.0530 0.1711 0.16 0.1306
## [7,] -0.3 0.0550 -0.0495 0.1595 0.09 0.0735
## [8,] -0.2 0.0510 -0.0460 0.1481 0.04 0.0327
## [9,] -0.1 0.0471 -0.0426 0.1368 0.01 0.0082
## [10,] 0.0 0.0432 -0.0392 0.1255 0.00 0.0000
## [11,] 0.1 0.0392 -0.0358 0.1142 0.01 0.0082
## [12,] 0.2 0.0353 -0.0323 0.1029 0.04 0.0327
## [13,] 0.3 0.0313 -0.0288 0.0915 0.09 0.0735
## [14,] 0.4 0.0273 -0.0253 0.0799 0.16 0.1306
## [15,] 0.5 0.0231 -0.0217 0.0680 0.25 0.2041
## [16,] 0.6 0.0189 -0.0180 0.0558 0.36 0.2939
## [17,] 0.7 0.0145 -0.0142 0.0431 0.49 0.4000
## [18,] 0.8 0.0099 -0.0101 0.0299 0.64 0.5224
## [19,] 0.9 0.0051 -0.0059 0.0161 0.81 0.6612
##
## Rho at which ACME = 0: 0.9
## R^2_M*R^2_Y* at which ACME = 0: 0.81
## R^2_M~R^2_Y~ at which ACME = 0: 0.6612
plot(s.out, "rho")
plot(s.out, "R2")
Sensitivity analyses are useful when determining if any omitted
variables may be affecting the results. The first plot shows the rho
values. The line is straight and diagonal and the 95% confidence
interval covers nearly the entire plot. The fact that the confidence
interval is covering nearly the entire plot means that an omitted
variable can negate the strength of this analysis. As the results are
not significant anyways, these results makes sense.
Next, the R2 values plot also leads to a conclusion that the strength of an omitted variable confounder only has to explain a small amount of the remaining variance in Y (.01-.04, shown in the plot) so that ACME is going to zero. This means we accept the null hypothesis, deeming ACME as not having an effect on the outcome.
The average maternal and paternal age, both shown to be in early to mid 20’s, suggesting that they’ve just begun childbearing and that their child may be young. Parental education is mixed, as a value of 0.35 presents that around 35% of parents in the sample have at least a tertiary education. Assuming that family size represents number of children, the sample has an average of just below 2 children per family. Lastly, it appears 46% of the children in the sample are non-white, meaning that white children are the majority (if this value is measured in a binary way (white and non-white)).
Model 5 includes parental age in the model, leading me to conclude that this model is only for fathers. The sample sizes for the two models are also different, strengthening my claim that these models are gender/parent-specific. Model 5 shows a significant effect (B=0.05) for parental age. This can be interpreted as: every yearly increase in paternal age is associated with a 5% increase in their child’s educational attainment. High parental education is also significant in this model, showing a stronger effect (B=0.89) than in model 4, suggesting that gender may have a moderating effect on parental age. The interaction term of high parental education and paternal age is not significant, however.
This conceptual model maps out a moderated mediation analysis. Future orientation mediates the effect of neighborhood disadvantage on the outcome, academic achievement. A moderator that affects the pathway between neighborhood disadvantage and future orientation (a) is parental support. This moderator also has affects on the direct pathway (c) of neighborhood disadvantage’s effect on academic achievement. In the right corner a potential confounder is listed, multiple group race/ethnicity, which could have an effect on the likelihood of a person’s neighborhood disadvantage as well as on their academic achievement. The analysis should control for multiple group race and ethnicity to avoid biased results.
#import my own data for moderation analysis
# I use 2016 DHS data from Uganda for this. This data set samples women in child-rearing years.
library(haven)
uganda16 <- read_dta("C:/Users/rlutt/Downloads/UGIR7BFL.DTA")
uganda16<-zap_labels(uganda16)
My hypothesis is that women with high education will desire less children than women with low education, however whether or not they are using contraception will have a moderating effect on this. Women who aren’t using contraception will be more likely to desire another child than women who are regardless of their education, as not using is a sign that a woman may be trying to get pregnant.
The way I measure fertility preferences for this question is continuous: a woman’s ideal number of children.
I measure education as no education, primary school educational attainment, and secondary school educational attainment or above. I categorize contraceptive use as none or yes, I use a modern method of birth control.
The first control variable I include is involvement in financial decision-making. This is as an indicator for empowerment, which is shown to affect fertility preferences. I utilized this measure in a dichotomous fashion: whether a women is involved in financial decision-making or not. Second, I control for age, in 5 year groups from 15-49. And lastly, I control for whether the respondent already has a child or not at the time of survey.
#recode variables
#fertility preferences
uganda16$idealnumchil<-car::recode(uganda16$v614, recodes= "1='1'; 2='2'; 3='3'; 4='4'; 5='5'; 6='6'; 7='NA'", as.factor=T)
uganda16$ideal<-as.numeric(uganda16$idealnumchil)
#education level
uganda16$educationlevel <- car::recode(uganda16$v106,
recodes = "0 = 'none'; 1 = 'primary'; 2:3='secondary and above'; else='NA' ",
as.factor=T)
#contraception
uganda16$contraception<-as.factor(uganda16$v313)
uganda16$contraception<-car::Recode(uganda16$v313, recodes= "0='no'; 3= 'yes'; else='NA'", as.factor=T)
#financial decision-making
uganda16$decide_money <- ifelse(uganda16$v739 %in% c(1, 2), 1, 0)
#age groups
uganda16$agegroup <- car::Recode(uganda16$v013, recodes = "1='15-19'; 2='20-24';3='25-29'; 4 = '30-34';5='35-39';6='40-44';7='45+' ", as.factor=T)
#has another kid
uganda16$currentchildren<-uganda16$v202+uganda16$v203
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
##
## select
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
ug16<-select(uganda16, educationlevel, contraception, decide_money, agegroup, idealnumchil)
library(gtsummary)
##
## Attaching package: 'gtsummary'
## The following object is masked from 'package:MASS':
##
## select
# summarize the data with our package
table1 <-
ug16 %>%
tbl_summary()
table1
Characteristic | N = 18,5061 |
---|---|
educationlevel | |
none | 2,071 (11%) |
primary | 10,893 (59%) |
secondary and above | 5,542 (30%) |
contraception | |
NA | 504 (2.7%) |
no | 13,088 (71%) |
yes | 4,914 (27%) |
decide_money | 6,451 (35%) |
agegroup | |
15-19 | 4,276 (23%) |
20-24 | 3,782 (20%) |
25-29 | 3,014 (16%) |
30-34 | 2,600 (14%) |
35-39 | 2,029 (11%) |
40-44 | 1,621 (8.8%) |
45+ | 1,184 (6.4%) |
idealnumchil | |
0 | 140 (0.8%) |
1 | 125 (0.7%) |
2 | 1,187 (6.4%) |
3 | 1,574 (8.5%) |
4 | 7,329 (40%) |
5 | 2,115 (11%) |
6 | 5,569 (30%) |
NA | 467 (2.5%) |
1 n (%) |
#Note: The descriptives table presents ideal number of children numbered 0-6 or more. For simplicity in the analysis, ideal number of children will be a continuous value of 0-6 only.The table also shows the percentage of the sample invovled in financial decision making. NA's are included for the contraception and ideal number of children variables but are excluded in the analysis.
#drop NA's from contraception variable
uganda<-uganda16%>%
filter(contraception == "no" | contraception== "yes")
#employ complex survey design
# survey design variables
library(survey)
## Loading required package: grid
## Loading required package: survival
##
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
##
## dotchart
uganda$psu <- uganda$v021
uganda$strata <- uganda$v022
uganda$pwt <- uganda$v005/1000000
desi<-svydesign(ids = ~ psu, strata = ~ strata, weights =~ pwt, data=uganda)
#perform moderation analysis
#first without interactions
library(stats)
model1 <- svyglm(ideal ~currentchildren + contraception + factor(educationlevel) + factor(agegroup) +decide_money, design=desi)
model1%>%
tbl_regression()
Characteristic | Beta | 95% CI1 | p-value |
---|---|---|---|
sons at home | 0.13 | 0.11, 0.14 | <0.001 |
contraception | |||
no | — | — | |
no | — | — | |
yes | -0.13 | -0.18, -0.08 | <0.001 |
factor(educationlevel) | |||
none | — | — | |
primary | -0.30 | -0.38, -0.22 | <0.001 |
secondary and above | -0.71 | -0.81, -0.61 | <0.001 |
factor(agegroup) | |||
15-19 | — | — | |
20-24 | 0.04 | -0.03, 0.10 | 0.3 |
25-29 | 0.18 | 0.10, 0.26 | <0.001 |
30-34 | 0.31 | 0.22, 0.40 | <0.001 |
35-39 | 0.40 | 0.30, 0.50 | <0.001 |
40-44 | 0.58 | 0.48, 0.68 | <0.001 |
45+ | 0.70 | 0.59, 0.80 | <0.001 |
decide_money | 0.06 | 0.01, 0.11 | 0.023 |
1 CI = Confidence Interval |
#then, with the interaction
model2 <- svyglm(ideal ~ currentchildren + contraception + factor(educationlevel) + factor(agegroup) +decide_money + factor(educationlevel)*factor(contraception),design=desi)
model2%>%
tbl_regression()
Characteristic | Beta | 95% CI1 | p-value |
---|---|---|---|
sons at home | 0.13 | 0.11, 0.14 | <0.001 |
contraception | |||
no | — | — | |
no | — | — | |
yes | -0.31 | -0.46, -0.17 | <0.001 |
factor(educationlevel) | |||
none | — | — | |
primary | -0.33 | -0.43, -0.24 | <0.001 |
secondary and above | -0.77 | -0.89, -0.66 | <0.001 |
factor(agegroup) | |||
15-19 | — | — | |
20-24 | 0.03 | -0.03, 0.10 | 0.3 |
25-29 | 0.17 | 0.09, 0.26 | <0.001 |
30-34 | 0.30 | 0.21, 0.40 | <0.001 |
35-39 | 0.40 | 0.30, 0.50 | <0.001 |
40-44 | 0.58 | 0.48, 0.68 | <0.001 |
45+ | 0.69 | 0.58, 0.80 | <0.001 |
decide_money | 0.06 | 0.01, 0.11 | 0.027 |
factor(educationlevel) * factor(contraception) | |||
primary * yes | 0.17 | 0.01, 0.33 | 0.033 |
secondary and above * yes | 0.25 | 0.09, 0.42 | 0.003 |
1 CI = Confidence Interval |
The results of this analysis show significant negative effects for our variables of interest on ideal fertility: education level and contraceptive use in model 1. This leads me to accept the start of my hypothesis.
The results for model 2 show a significant moderating effect by contraceptive use on education’s effect on ideal fertility in model, but not in the direction I hypothesized. Having primary coupled with using contraception is associated with a 17% (p<.05) increase in overall ideal fertility. Having secondary or higher education coupled with using contraception is associated with a 25% (p<.01) increase in overall ideal fertilty. Plotting these effects can help us understand further.
#plot the moderating effects
library(ggplot2)
uganda$edu<-as.numeric(uganda$educationlevel)
p <- ggplot(uganda, aes(x = edu, y = ideal, color = contraception)) +
geom_line() + scale_x_continuous(n.breaks = 3)+
scale_color_manual(values = c("no" = "blue", "yes" = "red"))
p
The moderating effects shown in this plot partially support/ partially
contradict my hypothesis. At the primary level of education, women using
contraception are more likely to have higher ideal fertility. This could
be due to the fact women using contraception are currently postponing
future desired births to later in life. Meaning, they may still have
high ideal fertility, even if they are using contraceptives at this
time. At the secondary or higher level of education, women using
contraception have lower ideal fertility than women not using. This is
consistent with the literature showing that women using contraceptives
and women with higher education tend to have lower ideal fertility.