We will use data from the National Evaluation of Welfare-to-Work Strategies (NEWWS) study. The study randomly assigned 694 participants, most of whom were low-income single mothers with young children. Of these, 208 were assigned to the Labor Force Attachment (LFA) program, which aimed to move low-income parents from welfare to work by providing employment-focused incentives and services. The remaining 486 were assigned to the control group, which received aid from the Aid to Families with Dependent Children (AFDC) program without a work requirement.
library(foreign)
# Read the Stata data file; adjust the path to where newws.dta is saved
newws <- read.dta("E:\\R\\newws.dta")
Research question: How does the impact of LFA on maternal depression vary by the number of children at baseline?
Treatment:
treat: 1 if a participant was assigned to LFA and 0 otherwise.
Outcome:
depression: maternal depression at the end of the second year after randomization, a summary score of 12 items measuring depressive symptoms during the past week on a 0-3 scale.
Moderator:
CHCNT: 1 if had 1 child, 2 if had 2 children, and 3 if had 3 or more children before randomization.
Pretreatment covariates:
nevmar: 1 if never married and 0 otherwise.
emp_prior: 1 if employed and 0 otherwise.
hispanic: 1 if Hispanic and 0 otherwise.
ADCPC: welfare amount in the year before randomization.
attitude: a composite score of two attitude items (so many family problems that I cannot work at a full-time or part-time job; so much to do during the day that I cannot go to a school or job training program), measured on a scale of 1-4.
depress_prior: a composite score of three depressive symptom items (sad, depressed, blues, and lonely) in the week before randomization, measured on a scale of 1-4.
workpref: one's level of preference for taking care of the family full time rather than working, on a scale of 1-4.
nohsdip: 1 if had never obtained a high school diploma or a General Educational Development (GED) certificate and 0 otherwise.
We first explore the research question by visualizing the data. Note that such a plot can be misleading when the treatment is not randomized within levels of the moderator, because the treatment-outcome relationship may then be confounded. This is not a concern here because treatment is randomized.
# If the treatment and the categorical moderator are both coded as numeric values,
# convert them to factors first, before visualizing the data or fitting models.
# In model fitting, a binary 0/1 variable can be left numeric.
newws$i.treat = as.factor(newws$treat)
newws$i.CHCNT = as.factor(newws$CHCNT)
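Why factorizing matters can be seen in the design matrix that `lm()` builds. The sketch below uses a short, hypothetical vector of moderator values (not the NEWWS data): a numeric CHCNT enters the model as a single slope, i.e., a linear trend across 1, 2, 3 children, whereas a factor enters as one dummy per non-reference level. It also checks that a numeric 0/1 treatment produces the same column as its factorized version, which is why a binary variable need not be factorized for model fitting.

```r
# Hypothetical moderator values coded 1, 2, 3, as CHCNT is
CHCNT <- c(1, 2, 3, 1, 2, 3)
# Hypothetical binary treatment coded 0/1
treat <- c(0, 1, 0, 1, 0, 1)

# Numeric coding: intercept plus one slope column (a linear trend across levels)
X_num <- model.matrix(~ CHCNT)
# Factor coding: intercept plus one dummy per non-reference level (level 1 is the reference)
X_fac <- model.matrix(~ factor(CHCNT))
print(colnames(X_num))
print(colnames(X_fac))

# For a 0/1 variable, the numeric column and the factor dummy are identical
same_binary <- all(model.matrix(~ treat)[, 2] == model.matrix(~ factor(treat))[, 2])
print(same_binary)
```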
library(ggplot2)
ggplot(newws) +
aes(x = i.treat, y = depression, fill = i.CHCNT) +
geom_boxplot()
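A boxplot is easier to read alongside the cell sizes and the medians it displays. The sketch below computes, for each number-of-children level, the median outcome by treatment group and the treated-minus-control difference in medians. It runs on hypothetical simulated data named `sim` (the bounding of the score to 0-36 assumes 12 items each scored 0-3); with the real data, `sim` would be replaced by `newws`.

```r
# Hypothetical simulated data standing in for newws (not the NEWWS data)
set.seed(1)
n <- 300
sim <- data.frame(
  treat = rbinom(n, 1, 0.3),
  CHCNT = sample(1:3, n, replace = TRUE)
)
# Depression-like score bounded to 0-36 (assumed: 12 items x 0-3)
sim$depression <- round(pmin(36, pmax(0, rnorm(n, mean = 10, sd = 7.7))))

# Median outcome in each treatment-by-moderator cell:
# rows are treat (0, 1), columns are CHCNT (1, 2, 3)
cell_median <- tapply(sim$depression,
                      list(treat = sim$treat, CHCNT = sim$CHCNT), median)
# Cell sizes, to judge how stable each median is
cell_n <- table(treat = sim$treat, CHCNT = sim$CHCNT)
# Treated-minus-control difference in medians within each moderator level
median_diff <- cell_median["1", ] - cell_median["0", ]

print(cell_n)
print(median_diff)
```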
The plot suggests that, among participants with one child or with three or more children at baseline, those assigned to LFA had lower levels of depression than their counterparts in the control group. In contrast, among those with two children at baseline, LFA appears to have increased the level of depression.
Because the number of children was measured before randomization and treatment was randomized for the whole sample, treatment is also randomized within each level of the number of children at baseline. Therefore, the identification assumption is satisfied.
To increase estimation efficiency, we can further adjust for pretreatment covariates that predict the outcome. If the relationship between a covariate and the outcome differs across levels of the moderator, the model should also include the interaction between that covariate and the moderator. Such decisions can be based on theoretical reasoning. If you are unsure and the sample size is sufficient, you may adjust for all available pretreatment covariates and their interactions with the moderator.
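The efficiency gain from covariate adjustment can be illustrated with a small simulation. This is a minimal sketch on hypothetical simulated data (not the NEWWS data), assuming one covariate `x` that strongly predicts the outcome and a randomized treatment independent of `x`. Adjusting for `x` removes outcome variance that `x` explains, so the standard error of the treatment coefficient shrinks while the estimand stays the same.

```r
# Hypothetical simulated data (not the NEWWS data)
set.seed(123)
n <- 500
x <- rnorm(n, sd = 3)              # pretreatment covariate that predicts the outcome
treat <- rbinom(n, 1, 0.3)         # randomized treatment, independent of x
y <- 1 * treat + 2 * x + rnorm(n)  # assumed true treatment effect = 1

# Standard error of the treatment coefficient without and with adjustment for x
se_unadj <- summary(lm(y ~ treat))$coefficients["treat", "Std. Error"]
se_adj   <- summary(lm(y ~ treat + x))$coefficients["treat", "Std. Error"]
print(c(unadjusted = se_unadj, adjusted = se_adj))
```

How much precision is gained depends on how strongly the covariates predict the outcome; in the NEWWS model the adjusted R-squared is modest, so the gain there is likely smaller than in this simulation.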
# Multiple regression with interaction
mod = lm(depression ~ i.CHCNT * (treat + nevmar + emp_prior + hispanic + ADCPC + attitude + depress_prior + workpref + nohsdip), data = newws)
# Print results
summary(mod)
##
## Call:
## lm(formula = depression ~ i.CHCNT * (treat + nevmar + emp_prior +
## hispanic + ADCPC + attitude + depress_prior + workpref +
## nohsdip), data = newws)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.496 -5.094 -1.915 3.743 26.445
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.5504134 2.8956698 -0.535 0.59254
## i.CHCNT2 9.6754597 4.1562876 2.328 0.02022 *
## i.CHCNT3 9.4657952 3.7756359 2.507 0.01241 *
## treat -0.0853537 1.1507255 -0.074 0.94089
## nevmar 2.3124889 1.0666098 2.168 0.03051 *
## emp_prior 1.7179735 1.1107451 1.547 0.12242
## hispanic 3.1148937 1.1676247 2.668 0.00782 **
## ADCPC -0.0002824 0.0001844 -1.531 0.12616
## attitude 1.4776292 0.8876905 1.665 0.09647 .
## depress_prior 1.6976549 0.5854195 2.900 0.00386 **
## workpref -0.5462867 0.7758199 -0.704 0.48159
## nohsdip 0.7485183 1.0551033 0.709 0.47831
## i.CHCNT2:treat 2.2091787 1.5881301 1.391 0.16467
## i.CHCNT3:treat -0.8030672 1.5611645 -0.514 0.60714
## i.CHCNT2:nevmar -1.8564583 1.5073971 -1.232 0.21855
## i.CHCNT3:nevmar -1.0172542 1.5543422 -0.654 0.51304
## i.CHCNT2:emp_prior -1.0763722 1.5865357 -0.678 0.49773
## i.CHCNT3:emp_prior -2.8088565 1.6425156 -1.710 0.08772 .
## i.CHCNT2:hispanic -2.5276778 1.6154229 -1.565 0.11813
## i.CHCNT3:hispanic -2.2756702 1.6091053 -1.414 0.15776
## i.CHCNT2:ADCPC 0.0001669 0.0002419 0.690 0.49051
## i.CHCNT3:ADCPC 0.0002439 0.0002196 1.110 0.26730
## i.CHCNT2:attitude -2.4911672 1.2698138 -1.962 0.05020 .
## i.CHCNT3:attitude -3.1147211 1.2096840 -2.575 0.01024 *
## i.CHCNT2:depress_prior 0.3400050 0.8605735 0.395 0.69290
## i.CHCNT3:depress_prior -0.1400105 0.8420546 -0.166 0.86799
## i.CHCNT2:workpref -0.1488115 1.0423771 -0.143 0.88652
## i.CHCNT3:workpref 0.8358643 1.0628641 0.786 0.43190
## i.CHCNT2:nohsdip -0.7524831 1.5119425 -0.498 0.61887
## i.CHCNT3:nohsdip -0.2521325 1.5018347 -0.168 0.86673
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.506 on 664 degrees of freedom
## Multiple R-squared: 0.09991, Adjusted R-squared: 0.0606
## F-statistic: 2.542 on 29 and 664 DF, p-value: 2.009e-05
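Under the identification assumption that the treatment is randomized within levels of the moderator, and the model-based assumptions of linearity and additivity, the coefficient of treat is an unbiased estimate of the LFA effect among participants with one child at baseline (the reference level). The coefficients of the interactions i.CHCNT2:treat and i.CHCNT3:treat are unbiased estimates of the moderated treatment effects, that is, of how much the LFA effect among those with two children, and among those with three or more children, differs from the effect among those with one child.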
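The treatment effect within a given moderator level is a linear combination of coefficients (for example, treat plus i.CHCNT2:treat for two children), and its standard error follows from the coefficient covariance matrix returned by `vcov()`. The sketch below shows this on hypothetical simulated data with one covariate `x`; the helper `subgroup_effect()` is a hypothetical name introduced for illustration. Because every term is interacted with the moderator, as in the model above, each subgroup point estimate equals the treat coefficient from a separate regression within that subgroup; only the standard errors differ, since the interaction model pools the residual variance.

```r
# Hypothetical simulated data (not the NEWWS data): 3-level moderator,
# randomized binary treatment, one pretreatment covariate x
set.seed(42)
n <- 600
sim <- data.frame(
  CHCNT = factor(sample(1:3, n, replace = TRUE)),
  treat = rbinom(n, 1, 0.3),
  x = rnorm(n)
)
true_effect <- c(0, 2, -1)  # assumed treatment effect at CHCNT = 1, 2, 3
sim$y <- 10 + true_effect[as.integer(sim$CHCNT)] * sim$treat +
  1.5 * sim$x + rnorm(n, sd = 7)

# Fully interacted model, with the same structure as mod in the text
fit <- lm(y ~ CHCNT * (treat + x), data = sim)

# Hypothetical helper: treatment effect within one moderator level, with its SE
subgroup_effect <- function(fit, level, moderator = "CHCNT",
                            treatment = "treat", reference = "1") {
  b <- coef(fit)
  w <- setNames(numeric(length(b)), names(b))  # weights on the coefficients
  w[treatment] <- 1
  if (level != reference) {
    term <- paste0(moderator, level, ":", treatment)
    stopifnot(term %in% names(b))  # guard against a misspelled term name
    w[term] <- 1
  }
  est <- sum(w * b)
  se <- sqrt(drop(w %*% vcov(fit) %*% w))
  c(estimate = est, se = se)
}

effects <- sapply(c("1", "2", "3"), function(l) subgroup_effect(fit, l))
print(effects)

# Same point estimate as a separate regression within the CHCNT = 2 subgroup
sep2 <- lm(y ~ treat + x, data = subset(sim, CHCNT == "2"))
print(coef(sep2)["treat"])
```

For the model in the text, the call would be, for example, `subgroup_effect(mod, "2", moderator = "i.CHCNT")`.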
Dividing the effect estimates by the standard deviation of the outcome gives the estimated effect sizes.
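Written out, each effect size is a coefficient divided by $s_Y$, the sample standard deviation of depression. The same printed coefficients also give the treatment effect within each subgroup, $\hat\tau_k$ for $k = 1, 2, 3$ children. The numbers below are arithmetic on the `summary(mod)` output; $s_Y \approx 7.745$ is back-calculated from the printed effect sizes rather than computed from the data.

```latex
\widehat{ES} = \frac{\hat\beta}{s_Y}, \qquad
s_Y \approx \frac{2.2091787}{0.2852453} \approx 7.745

\hat\tau_1 = \hat\beta_{\text{treat}} = -0.0854

\hat\tau_2 = \hat\beta_{\text{treat}} + \hat\beta_{\text{CHCNT2:treat}}
           = -0.0854 + 2.2092 = 2.1238 \quad (\approx 0.27\, s_Y)

\hat\tau_3 = \hat\beta_{\text{treat}} + \hat\beta_{\text{CHCNT3:treat}}
           = -0.0854 - 0.8031 = -0.8884 \quad (\approx -0.11\, s_Y)
```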
coef(mod)["treat"]/sd(newws$depression)
## treat
## -0.01102072
coef(mod)["i.CHCNT2:treat"]/sd(newws$depression)
## i.CHCNT2:treat
## 0.2852453
coef(mod)["i.CHCNT3:treat"]/sd(newws$depression)
## i.CHCNT3:treat
## -0.1036906
The results indicate that, compared with participants who had one child at baseline, the LFA impact on depression is higher (that is, less favorable) among those with two children at baseline. The difference is estimated to be 2.21 (p = 0.16), which corresponds to 0.29 standard deviations of the outcome. In contrast, the LFA impact among those with three or more children at baseline is lower than that among those with one child. The difference is estimated to be -0.80 (p = 0.61), or -0.10 standard deviations of the outcome. The magnitudes of the two moderated treatment effects are not negligible; nevertheless, neither is statistically significant. A likely reason is that the sample size is not large enough to provide adequate power for detecting moderated treatment effects of this size.
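The power argument can be made concrete with a rough, back-of-the-envelope calculation using the printed standard errors. This sketch assumes a normal approximation (reasonable with 664 residual degrees of freedom), a two-sided test at alpha = 0.05, 80% power, and standard errors that shrink in proportion to 1/sqrt(n); it is a planning heuristic, not a formal power analysis. Under these assumptions the smallest detectable interaction is roughly 4.4 points (about 0.56 to 0.57 SD), about twice the estimated 2.21 for the two-children contrast, which would need roughly four times the current sample to reach 80% power.

```r
# Estimates and standard errors of the two interaction terms, copied from summary(mod)
est <- c(CHCNT2 = 2.2091787, CHCNT3 = -0.8030672)
se  <- c(CHCNT2 = 1.5881301, CHCNT3 = 1.5611645)
sd_y <- 7.745  # back-calculated sd(newws$depression), from coefficient / effect size

alpha <- 0.05
target_power <- 0.80
# Normal-approximation multiplier: z_{1 - alpha/2} + z_{power}, about 2.80
multiplier <- qnorm(1 - alpha / 2) + qnorm(target_power)

# Minimum detectable difference with 80% power, in points and in outcome SD units
mde <- multiplier * se
mde_sd <- mde / sd_y

# Approximate factor by which n must grow for |est| to equal the MDE,
# assuming SEs shrink in proportion to 1 / sqrt(n)
n_factor <- (mde / abs(est))^2

print(rbind(mde = mde, mde_sd = mde_sd, n_factor = n_factor))
```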