##Mediation and Moderation HW 2

#import data
library(readr)
library(knitr)
stress <- read_csv("C:/Users/rlutt/Downloads/stress_withdraw.csv")
## Rows: 262 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (7): tenure, stress, depress, withdraw, sex, age, efficacy
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# run a causal mediation analysis in which stress= main IV, depress=mediator, and withdraw intention= DV

library(mediation)
## Loading required package: MASS
## Loading required package: Matrix
## Loading required package: mvtnorm
## Loading required package: sandwich
## mediation: Causal Mediation Analysis
## Version: 4.5.0
med.simple<- lm(withdraw~stress, data=stress)
out.simple<-lm(withdraw~ stress + depress, data=stress)
med.out.simple<- mediate(med.simple, out.simple, treat="stress", mediator= "depress", boot=T, sims=1000, boot.ci.type= "bca")
## Running nonparametric bootstrap
summary(med.out.simple)
## 
## Causal Mediation Analysis 
## 
## Nonparametric Bootstrap Confidence Intervals with the BCa Method
## 
##                 Estimate 95% CI Lower 95% CI Upper p-value
## ACME            4.32e-02    -6.23e-02         0.14    0.36
## ADE            -7.68e-02    -1.80e-01         0.04    0.21
## Total Effect   -3.37e-02    -2.46e-01         0.16    0.77
## Prop. Mediated -1.28e+00    -1.75e+03        -0.27    0.86
## 
## Sample Size Used: 262 
## 
## 
## Simulations: 1000

This analysis tests whether the outcome, withdraw is affected by the main independent variable, stress, and if stress’s effect acts through a mediator variable, depress. The indirect effect represents the effect of stress multiplied by depress on the outcome withdraw. The direct effect is only the effect of stress on the outcome, withdraw.

These results can be interpreted as follows. The indirect effect is the ACME, which shown by the table above has a positive coefficient and p-value above 0.05, deeming it as not significant. The direct effect is the ADE, which also shown above has a negative coefficient and a p-value that is also not significant. The total effect is coefficients of the ADE and ACME added together, which is also not significant and is negative.

Therefore, as none of these results are significant, we cannot conclude that the effect of stress operates through the mediator, depress on the outcome withdraw.

#plot & sensitivity analyses
plot(med.out.simple)

s.out<-medsens(med.out.simple)
summary(s.out)
## 
## Mediation Sensitivity Analysis for Average Causal Mediation Effect
## 
## Sensitivity Region
## 
##        Rho   ACME 95% CI Lower 95% CI Upper R^2_M*R^2_Y* R^2_M~R^2_Y~
##  [1,] -0.9 0.0812      -0.0724       0.2348         0.81       0.6612
##  [2,] -0.8 0.0764      -0.0682       0.2210         0.64       0.5224
##  [3,] -0.7 0.0718      -0.0642       0.2079         0.49       0.4000
##  [4,] -0.6 0.0674      -0.0603       0.1952         0.36       0.2939
##  [5,] -0.5 0.0632      -0.0566       0.1830         0.25       0.2041
##  [6,] -0.4 0.0591      -0.0530       0.1711         0.16       0.1306
##  [7,] -0.3 0.0550      -0.0495       0.1595         0.09       0.0735
##  [8,] -0.2 0.0510      -0.0460       0.1481         0.04       0.0327
##  [9,] -0.1 0.0471      -0.0426       0.1368         0.01       0.0082
## [10,]  0.0 0.0432      -0.0392       0.1255         0.00       0.0000
## [11,]  0.1 0.0392      -0.0358       0.1142         0.01       0.0082
## [12,]  0.2 0.0353      -0.0323       0.1029         0.04       0.0327
## [13,]  0.3 0.0313      -0.0288       0.0915         0.09       0.0735
## [14,]  0.4 0.0273      -0.0253       0.0799         0.16       0.1306
## [15,]  0.5 0.0231      -0.0217       0.0680         0.25       0.2041
## [16,]  0.6 0.0189      -0.0180       0.0558         0.36       0.2939
## [17,]  0.7 0.0145      -0.0142       0.0431         0.49       0.4000
## [18,]  0.8 0.0099      -0.0101       0.0299         0.64       0.5224
## [19,]  0.9 0.0051      -0.0059       0.0161         0.81       0.6612
## 
## Rho at which ACME = 0: 0.9
## R^2_M*R^2_Y* at which ACME = 0: 0.81
## R^2_M~R^2_Y~ at which ACME = 0: 0.6612
plot(s.out, "rho")

plot(s.out, "R2")

Sensitivity analyses are useful when determining if any omitted variables may be affecting the results. The first plot shows the rho values. The line is straight and diagonal and the 95% confidence interval covers nearly the entire plot. The fact that the confidence interval is covering nearly the entire plot means that an omitted variable can negate the strength of this analysis. As the results are not significant anyways, these results makes sense.

Next, the R2 values plot also leads to a conclusion that the strength of an omitted variable confounder only has to explain a small amount of the remaining variance in Y (.01-.04, shown in the plot) so that ACME is going to zero. This means we accept the null hypothesis, deeming ACME as not having an effect on the outcome.

  1. The fourth question is about interpreting a descriptives table showing mean and standard deviation values in the US for data related to child education and parent’s demographics. As the focus of this study is children’s educational attainment, I will start by explaining this measure. The z signifies that this measure is z-scored. The result of z-scoring a variable leaves the mean close to zero and the standard deviation close to 1.

The average maternal and paternal age, both shown to be in early to mid 20’s, suggesting that they’ve just begun childbearing and that their child may be young. Parental education is mixed, as a value of 0.35 presents that around 35% of parents in the sample have at least a tertiary education. Assuming that family size represents number of children, the sample has an average of just below 2 children per family. Lastly, it appears 46% of the children in the sample are non-white, meaning that white children are the majority (if this value is measured in a binary way (white and non-white)).

  1. As the earlier part of question four stated that the principal focus is children’s educational attainment, I’m going to assume that is also the outcome. The significant predictor in model 4 is high parental education (B=0.82). This can be interpreted as: compared to parents of lower education, parents with a high education have an 82% higher child’s education. Controls in this model are high parental education interacting with maternal/paternal age (not significant) and maternal age, which leads me to think that this model only includes mothers.

Model 5 includes parental age in the model, leading me to conclude that this model is only for fathers. The sample sizes for the two models are also different, strengthening my claim that these models are gender/parent-specific. Model 5 shows a significant effect (B=0.05) for parental age. This can be interpreted as: every yearly increase in paternal age is associated with a 5% increase in their child’s educational attainment. High parental education is also significant in this model, showing a stronger effect (B=0.89) than in model 4, suggesting that gender may have a moderating effect on parental age. The interaction term of high parental education and paternal age is not significant, however.

  1. These graphs show the interaction terms between maternal and paternal age with high parental education. All of these variables are positively correlated with the outcome, however the slopes of the lines in the maternal age graph (left) are steeper than the lines in the paternal age graph (right)
  1. This conceptual model maps out a moderated mediation analysis. Future orientation mediates the effect of neighborhood disadvantage on the outcome, academic achievement. A moderator that affects the pathway between neighborhood disadvantage and future orientation (a) is parental support. This moderator also has affects on the direct pathway (c) of neighborhood disadvantage’s effect on academic achievement. In the right corner a potential confounder is listed, multiple group race/ethnicity, which could have an effect on the likelihood of a person’s neighborhood disadvantage as well as on their academic achievement. The analysis should control for multiple group race and ethnicity to avoid biased results.

#import my own data for moderation analysis
# I use 2016 DHS data from Uganda for this. This data set samples women in child-rearing years.

library(haven)

uganda16 <- read_dta("C:/Users/rlutt/Downloads/UGIR7BFL.DTA")
uganda16<-zap_labels(uganda16)
  1. Research Question: Is the effect of educational attainment on women’s fertility preferences moderated by her usage of contraception?

My hypothesis is that women with high education will desire less children than women with low education, however whether or not they are using contraception will have a moderating effect on this. Women who aren’t using contraception will be more likely to desire another child than women who are regardless of their education, as not using is a sign that a woman may be trying to get pregnant.

  1. The DHS is a nationally representative survey that collects key maternal health and child health information from women in their child-rearing years. I use 2016 because it is the most recently available women’s file for Uganda.

The way I measure fertility preferences for this question is continuous: a woman’s ideal number of children.

I measure education as no education, primary school educational attainment, and secondary school educational attainment or above. I categorize contraceptive use as none or yes, I use a modern method of birth control.

The first control variable I include is involvement in financial decision-making. This is as an indicator for empowerment, which is shown to affect fertility preferences. I utilized this measure in a dichotomous fashion: whether a women is involved in financial decision-making or not. Second, I control for age, in 5 year groups from 15-49. And lastly, I control for whether the respondent already has a child or not at the time of survey.

#recode variables

#fertility preferences

uganda16$idealnumchil<-car::recode(uganda16$v614, recodes= "1='1'; 2='2'; 3='3'; 4='4'; 5='5'; 6='6'; 7='NA'", as.factor=T)

uganda16$ideal<-as.numeric(uganda16$idealnumchil)
  
#education level
uganda16$educationlevel <- car::recode(uganda16$v106, 
                                       recodes = "0 = 'none'; 1 = 'primary'; 2:3='secondary and above'; else='NA' ",
                                       as.factor=T)

#contraception
uganda16$contraception<-as.factor(uganda16$v313)

uganda16$contraception<-car::Recode(uganda16$v313, recodes= "0='no'; 3= 'yes'; else='NA'", as.factor=T)

#financial decision-making
uganda16$decide_money <- ifelse(uganda16$v739 %in% c(1, 2), 1, 0)

#age groups
uganda16$agegroup <- car::Recode(uganda16$v013, recodes = "1='15-19'; 2='20-24';3='25-29'; 4 = '30-34';5='35-39';6='40-44';7='45+' ", as.factor=T)

#has another kid
uganda16$currentchildren<-uganda16$v202+uganda16$v203

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
## 
##     select
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
ug16<-select(uganda16, educationlevel, contraception, decide_money, agegroup, idealnumchil)
library(gtsummary)
## 
## Attaching package: 'gtsummary'
## The following object is masked from 'package:MASS':
## 
##     select
# summarize the data with our package
table1 <- 
  ug16 %>%
  tbl_summary()

table1
Characteristic N = 18,5061
educationlevel
    none 2,071 (11%)
    primary 10,893 (59%)
    secondary and above 5,542 (30%)
contraception
    NA 504 (2.7%)
    no 13,088 (71%)
    yes 4,914 (27%)
decide_money 6,451 (35%)
agegroup
    15-19 4,276 (23%)
    20-24 3,782 (20%)
    25-29 3,014 (16%)
    30-34 2,600 (14%)
    35-39 2,029 (11%)
    40-44 1,621 (8.8%)
    45+ 1,184 (6.4%)
idealnumchil
    0 140 (0.8%)
    1 125 (0.7%)
    2 1,187 (6.4%)
    3 1,574 (8.5%)
    4 7,329 (40%)
    5 2,115 (11%)
    6 5,569 (30%)
    NA 467 (2.5%)
1 n (%)
#Note: The descriptives table presents ideal number of children numbered 0-6 or more. For simplicity in the analysis, ideal number of children will be a continuous value of 0-6 only.The table also shows the percentage of the sample invovled in financial decision making. NA's are included for the contraception and ideal number of children variables but are excluded in the analysis.
#drop NA's from contraception variable

uganda<-uganda16%>%
filter(contraception == "no" | contraception== "yes")


#employ complex survey design
# survey design variables

library(survey)
## Loading required package: grid
## Loading required package: survival
## 
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
## 
##     dotchart
uganda$psu <- uganda$v021
uganda$strata <- uganda$v022
uganda$pwt <- uganda$v005/1000000
desi<-svydesign(ids = ~ psu, strata = ~ strata, weights =~ pwt, data=uganda)

#perform moderation analysis

#first without interactions
library(stats)
model1 <- svyglm(ideal ~currentchildren + contraception + factor(educationlevel) + factor(agegroup) +decide_money, design=desi)
model1%>%
  tbl_regression()
Characteristic Beta 95% CI1 p-value
sons at home 0.13 0.11, 0.14 <0.001
contraception
    no
    no
    yes -0.13 -0.18, -0.08 <0.001
factor(educationlevel)
    none
    primary -0.30 -0.38, -0.22 <0.001
    secondary and above -0.71 -0.81, -0.61 <0.001
factor(agegroup)
    15-19
    20-24 0.04 -0.03, 0.10 0.3
    25-29 0.18 0.10, 0.26 <0.001
    30-34 0.31 0.22, 0.40 <0.001
    35-39 0.40 0.30, 0.50 <0.001
    40-44 0.58 0.48, 0.68 <0.001
    45+ 0.70 0.59, 0.80 <0.001
decide_money 0.06 0.01, 0.11 0.023
1 CI = Confidence Interval
#then, with the interaction
model2 <- svyglm(ideal ~ currentchildren + contraception + factor(educationlevel) + factor(agegroup) +decide_money + factor(educationlevel)*factor(contraception),design=desi)
model2%>%
  tbl_regression()
Characteristic Beta 95% CI1 p-value
sons at home 0.13 0.11, 0.14 <0.001
contraception
    no
    no
    yes -0.31 -0.46, -0.17 <0.001
factor(educationlevel)
    none
    primary -0.33 -0.43, -0.24 <0.001
    secondary and above -0.77 -0.89, -0.66 <0.001
factor(agegroup)
    15-19
    20-24 0.03 -0.03, 0.10 0.3
    25-29 0.17 0.09, 0.26 <0.001
    30-34 0.30 0.21, 0.40 <0.001
    35-39 0.40 0.30, 0.50 <0.001
    40-44 0.58 0.48, 0.68 <0.001
    45+ 0.69 0.58, 0.80 <0.001
decide_money 0.06 0.01, 0.11 0.027
factor(educationlevel) * factor(contraception)
    primary * yes 0.17 0.01, 0.33 0.033
    secondary and above * yes 0.25 0.09, 0.42 0.003
1 CI = Confidence Interval

The results of this analysis show significant negative effects for our variables of interest on ideal fertility: education level and contraceptive use in model 1. This leads me to accept the start of my hypothesis.

The results for model 2 show a significant moderating effect by contraceptive use on education’s effect on ideal fertility in model, but not in the direction I hypothesized. Having primary coupled with using contraception is associated with a 17% (p<.05) increase in overall ideal fertility. Having secondary or higher education coupled with using contraception is associated with a 25% (p<.01) increase in overall ideal fertilty. Plotting these effects can help us understand further.

#plot the moderating effects
library(ggplot2)

uganda$edu<-as.numeric(uganda$educationlevel)

p <- ggplot(uganda, aes(x = edu, y = ideal, color = contraception)) +
            geom_line() + scale_x_continuous(n.breaks = 3)+
            scale_color_manual(values = c("no" = "blue", "yes" = "red"))
p

The moderating effects shown in this plot partially support/ partially contradict my hypothesis. At the primary level of education, women using contraception are more likely to have higher ideal fertility. This could be due to the fact women using contraception are currently postponing future desired births to later in life. Meaning, they may still have high ideal fertility, even if they are using contraceptives at this time. At the secondary or higher level of education, women using contraception have lower ideal fertility than women not using. This is consistent with the literature showing that women using contraceptives and women with higher education tend to have lower ideal fertility.