Final Project Scaffolding Assignment #2 - Statistical Analysis Plan
Author
Jingyi Yang
Questions
Primary Research Question: List out your primary research question for this research overall.
How is work-family conflict related to health outcomes, after controlling for multiple variables?
Who/what is included in the data you are using? What units are in your data? What population does your data generalize to? e.g. countries, adult Americans, etc. State who is included in your sample here.
The unit of analysis is the individual respondent. The population is adult Americans, and the sample includes 4,149 respondents who are representative of the United States.
Geographic Areas Included in the Analysis
List out the geographies that are included in your analysis. Is there grouping in your data that requires you to use a multilevel model? Or do you have interest in how a group level predictor might influence individual level behavior?
The data cover a single country (the United States), so there is no country-level grouping that would require a multilevel model. In future work, I would like to use General Social Survey data from other countries, such as China, in which case country would become a group-level predictor.
Dependent Variable
List out each dependent variable you plan on including in your final project. Then show a distribution of the responses with any NAs/non-substantive answers removed.
The dependent variables are respondents' physical health status and mental health status, each measured as the number of days in poor physical or mental health. Both response distributions are concentrated at low values, such as 1 or 2, which means most respondents reported few days in poor physical or mental health.
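The distribution itself is not shown here. Below is a minimal sketch of one way to tabulate it with missing and non-substantive answers removed. The vector `raw_days` is simulated. The codes 98 and 99 are hypothetical placeholders for non-substantive answers such as "don't know" or "no answer"; the real codes should be taken from the survey codebook.

```r
set.seed(10)
# Simulated stand-in for one outcome: days in poor health (0-8), plus
# HYPOTHETICAL non-substantive codes (98, 99) and one missing value
raw_days <- c(sample(0:8, 200, replace = TRUE,
                     prob = c(30, 20, 15, 10, 8, 6, 5, 3, 3)),
              98, 99, NA)

# Recode the hypothetical non-substantive codes to NA, then drop all NAs
raw_days[raw_days %in% c(98, 99)] <- NA
valid_days <- raw_days[!is.na(raw_days)]

# Counts and proportions for every response level, keeping empty levels
day_counts <- table(factor(valid_days, levels = 0:8))
day_props <- prop.table(day_counts)
print(round(day_props, 3))
```

The resulting proportions can be plotted with `barplot(day_props)`, or with ggplot2's `geom_bar()` to match the rest of the report.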
Hypotheses
List all your hypotheses here including any interactive hypotheses that you might want to test.
First, job interference with family life (work-to-family conflict) will be positively associated with poor health. Second, family life interference with the job (family-to-work conflict) will be positively associated with poor mental health. Third, some control variables, such as working status and marital status, may be associated with work-family conflict.
Primary Modeling Approach
What regression modeling approach - i.e. OLS, binary logit/probit, ordered logit/probit, count model, multilevel model, etc. - will you pursue with this type of dependent variable? If you have multiple DVs listed above, you should have multiple modeling approaches here so for each DV explicitly state the approach you think you will use.
For both dependent variables, physical health status and mental health status, I will use ordered logit/probit models. I will complement these with matching and balancing procedures (exact matching, coarsened exact matching, and entropy balancing).
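For reference, here is the standard cumulative-logit form of the ordered logit, which is the parameterization used by `MASS::polr()`. Here $Y_i$ is the ordered health outcome with categories $j = 1, \dots, J$, $\mathbf{x}_i$ holds the work-family conflict and control variables, and $\tau_j$ are the estimated thresholds. The ordered probit replaces the logistic CDF $\Lambda$ with the standard normal CDF $\Phi$.

```latex
P(Y_i \le j \mid \mathbf{x}_i) = \Lambda(\tau_j - \mathbf{x}_i^\top \boldsymbol\beta),
\qquad \Lambda(z) = \frac{1}{1 + e^{-z}}, \qquad j = 1, \dots, J - 1

P(Y_i = j \mid \mathbf{x}_i) = \Lambda(\tau_j - \mathbf{x}_i^\top \boldsymbol\beta)
  - \Lambda(\tau_{j-1} - \mathbf{x}_i^\top \boldsymbol\beta),
\qquad \tau_0 = -\infty, \quad \tau_J = +\infty
```

Under this sign convention, a positive coefficient in $\boldsymbol\beta$ shifts probability toward higher categories, which here means more days in poor health. The proportional-odds assumption is that each coefficient is the same across all thresholds $\tau_j$.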
Primary Explanatory Independent Variables
List your primary explanatory IVs here including how they are measured. Should each IV be inputted as a numeric or factor? Or is that something that you will evaluate empirically? For each primary IV, list that specifically.
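The primary independent variables are work interfering with family (WIF) and family interfering with work (FIW). Each is measured on a four-point scale (never, rarely, sometimes, often). Both are entered as factors in the ordered logit/probit models. For the matching and balancing procedures, which require a binary treatment, they are recoded as binary indicators (binary_wif and binary_fiw).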
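Below is a minimal sketch of the three codings of one IV: an ordered factor, a 1 to 4 numeric score, and a binary treatment indicator. The response vector is a toy example. The cutoff that defines "treated" (sometimes or often) is a hypothetical choice for illustration, not the rule used to build `binary_wif` in this project.

```r
# Four-point WIF responses; NA stands for a missing answer (toy data)
wif_levels <- c("NEVER", "RARELY", "SOMETIMES", "OFTEN")
wif <- factor(c("NEVER", "OFTEN", "SOMETIMES", "RARELY", NA, "OFTEN"),
              levels = wif_levels, ordered = TRUE)

# Numeric version: 1 = NEVER ... 4 = OFTEN
wif_numeric <- as.integer(wif)

# Binary treatment for matching/balancing. HYPOTHETICAL cutoff:
# "SOMETIMES" or "OFTEN" = treated (1); "NEVER" or "RARELY" = control (0)
binary_wif <- as.integer(wif >= "SOMETIMES")   # missing answers stay NA

print(data.frame(wif, wif_numeric, binary_wif))
```

If `wif` is used as a predictor in `polr()` or `lm()` while it is an ordered factor, R applies polynomial contrasts (`.L`, `.Q`, `.C` terms). `factor(wif, ordered = FALSE)` gives one dummy coefficient per level relative to NEVER instead.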
Primary Control Variables
List your primary control variables here. For survey data, everyone should at a minimum have key demographic control variables listed here including gender/sex identity, age, education, and race/ethnicity (depending on the geographic location). There are likely other key control variables to include as well so do not stop at those listed.
The control variables I plan to use include 1) respondents' working status, 2) respondents' marital status, 3) the number of people in the household, 4) respondents' age, 5) respondents' education level, 6) respondents' family income, 7) respondents' race, and 8) respondents' sex.
I chose these variables based on prior articles on this topic. The table below shows how often each control variable appeared across the eight articles reviewed.
| Variable name | Notes | Frequency |
|---|---|---|
| age | | 8 |
| gender | | 5 |
| education | | 5 |
| marital status | | 5 |
| working hours | | 5 |
| race | | 3 |
| number of children living at home | | 3 |
| family income (income) | | 2 |
| income | | 2 |
| body mass (weight) | | 2 |
| parental status | 1 = child(ren) living at home; 2 = no children at home | 2 |
| living arrangement | living with spouse; living with others | 2 |
| family history of heart disease | | 1 |
| heavy drinking | | 1 |
| Socioeconomic Index for Occupations | | 1 |
| location | | 1 |
| presence of a long-term disease | | 1 |
| work schedule | | 1 |
| work environment | | 1 |
| psychological job demands | | 1 |
| decision latitude | | 1 |
| social support at work | | 1 |
| emotional demands | | 1 |
| changing domestic roles | | 1 |
| changing work characteristics | | 1 |
| job category | | 1 |
| shift work | | 1 |
| socioeconomic position | | 1 |
| height | | 1 |
# Prepare: load packages
library(haven)      # Import survey data files (e.g., SPSS/Stata)
library(dplyr)
library(tidyverse)
library(semPlot)
library(lavaan)
library(psych)
library(skimr)
library(corrplot)
library(patchwork)  # Merge ggplots together
library(ggplot2)    # Graphing
library(stargazer)  # Tabular regression results
library(jtools)     # Tabular regression results
library(descr)      # Easy frequency tables
library(stats)      # Base R statistics: t.test(), lm()
library(ggeffects)  # Predicted probabilities from regressions
library(boot)       # Create CIs for multinomial modeling
library(cem)        # Coarsened exact matching; imbalance()
library(MatchIt)    # Exact and coarsened exact matching via matchit()
library(Hmisc)      # General modeling; wtd.mean() and wtd.var(). Masks dplyr::summarize(), not summarise()
library(ebal)       # Entropy balancing
library(survey)     # Applying EB weights
ggplot(c_physical_health, aes(x = response.level, y = predicted, fill = factor(x))) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +  # Bar plot
  # Add error bars for confidence intervals, dodged by the same width as the bars
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.4,
                position = position_dodge(width = 0.9)) +
  theme_minimal(base_size = 13) +
  labs(x = "Response Level: Physical Health Status", y = "Predicted Probability",
       title = "Predicted Probabilities of Physical Health Status\nby Job Interference with Family Life") +
  scale_fill_discrete(labels = c("1" = "NEVER", "2" = "RARELY", "3" = "SOMETIMES", "4" = "OFTEN")) +
  scale_x_discrete(labels = c("0" = "None", "1" = "One", "2" = "Two", "3" = "Three", "4" = "Four",
                              "5" = "Five", "6" = "Six", "7" = "Seven", "8" = "Over Eight")) +
  guides(fill = guide_legend(title = "Job interferes with family life", nrow = 1), color = "none") +
  theme(legend.position = "bottom")
ggplot(d_physical_health, aes(x = response.level, y = predicted, fill = factor(x))) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +  # Bar plot
  # Add error bars for confidence intervals, dodged by the same width as the bars
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.4,
                position = position_dodge(width = 0.9)) +
  theme_minimal(base_size = 13) +
  labs(x = "Response Level: Physical Health Status", y = "Predicted Probability",
       title = "Predicted Probabilities of Physical Health Status\nby Family Life Interfering with the Job") +
  scale_fill_discrete(labels = c("1" = "NEVER", "2" = "RARELY", "3" = "SOMETIMES", "4" = "OFTEN")) +
  scale_x_discrete(labels = c("0" = "None", "1" = "One", "2" = "Two", "3" = "Three", "4" = "Four",
                              "5" = "Five", "6" = "Six", "7" = "Seven", "8" = "Over Eight")) +
  guides(fill = guide_legend(title = "Family life interferes with the job"), color = "none") +
  theme(legend.position = "bottom")
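The plots draw on predicted-probability objects such as `c_physical_health`, whose code is not shown. They were presumably produced with ggeffects, which returns one row per outcome level and predictor value. Below is a minimal sketch of where such probabilities come from, using simulated data, `MASS::polr()`, and `predict(type = "probs")`. The data, effect size, and outcome category labels are all hypothetical.

```r
library(MASS)  # polr(): proportional-odds (ordered) logistic regression

set.seed(2024)
n <- 600
wif_levels <- c("NEVER", "RARELY", "SOMETIMES", "OFTEN")
# Simulated data: higher WIF shifts a latent health-problem score upward
wif <- factor(sample(wif_levels, n, replace = TRUE), levels = wif_levels)
latent <- 0.5 * as.integer(wif) + rlogis(n)
# HYPOTHETICAL outcome categories cut from the latent score
health <- cut(latent, breaks = c(-Inf, 1, 2, 3, Inf),
              labels = c("0", "1-2", "3-5", "6+"), ordered_result = TRUE)
sim <- data.frame(health, wif)

# Hess = TRUE stores the Hessian, so summary() and vcov() do not re-fit the model
fit <- polr(health ~ wif, data = sim, method = "logistic", Hess = TRUE)

# Predicted probability of each outcome category at each WIF level
new_wif <- data.frame(wif = factor(wif_levels, levels = wif_levels))
pred_probs <- predict(fit, newdata = new_wif, type = "probs")
rownames(pred_probs) <- wif_levels
print(round(pred_probs, 3))
```

Each row of `pred_probs` sums to 1 across outcome categories. The bar charts show the same kind of quantity, plus the confidence intervals that ggeffects adds.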
ggplot(c_mental_health, aes(x = response.level, y = predicted, fill = factor(x))) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +  # Bar plot
  # Add error bars for confidence intervals, dodged by the same width as the bars
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.4,
                position = position_dodge(width = 0.9)) +
  theme_minimal(base_size = 13) +
  labs(x = "Response Level: Mental Health Status", y = "Predicted Probability",
       title = "Predicted Probabilities of Mental Health Status\nby Job Interference with Family Life") +
  scale_fill_discrete(labels = c("1" = "NEVER", "2" = "RARELY", "3" = "SOMETIMES", "4" = "OFTEN")) +
  scale_x_discrete(labels = c("0" = "None", "1" = "One", "2" = "Two", "3" = "Three", "4" = "Four",
                              "5" = "Five", "6" = "Six", "7" = "Seven", "8" = "Over Eight")) +
  guides(fill = guide_legend(title = "Job interferes with family life", nrow = 1), color = "none") +
  theme(legend.position = "bottom")
ggplot(d_mental_health, aes(x = response.level, y = predicted, fill = factor(x))) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +  # Bar plot
  # Add error bars for confidence intervals, dodged by the same width as the bars
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.4,
                position = position_dodge(width = 0.9)) +
  theme_minimal(base_size = 13) +
  labs(x = "Response Level: Mental Health Status", y = "Predicted Probability",
       title = "Predicted Probabilities of Mental Health Status\nby Family Life Interfering with the Job") +
  scale_fill_discrete(labels = c("1" = "NEVER", "2" = "RARELY", "3" = "SOMETIMES", "4" = "OFTEN")) +
  scale_x_discrete(labels = c("0" = "None", "1" = "One", "2" = "Two", "3" = "Three", "4" = "Four",
                              "5" = "Five", "6" = "Six", "7" = "Seven", "8" = "Over Eight")) +
  guides(fill = guide_legend(title = "Family life interferes with the job"), color = "none") +
  theme(legend.position = "bottom")
## What is the effect of work interfering with family (WIF) on health?
# Testing for imbalance between groups
check <- test_data_new_wif %>%
  group_by(binary_wif) %>%
  summarise_at(vars(physical_health_status, mental_health_status, marriage_status,
                    total_people_in_household, age_group, education, family_income,
                    race, sex, working_status),
               list(mean = mean, var = var))
round(t(check), 3)
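# Note: physical_health_status and mental_health_status are the outcomes. Including them as
# matching covariates forces matched treated and control units to share identical outcome
# values, so they should be dropped from the formula before estimating treatment effects
match_exact <- matchit(binary_wif ~ physical_health_status + mental_health_status + marriage_status +
                         total_people_in_household + age_group + education + family_income +
                         race + sex + working_status,
                       method = "exact", data = test_data_new_wif)
data_exact <- match.data(match_exact)  # Creates new dataframe that only includes the matched cases
summary(match_exact)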
Call:
matchit(formula = binary_wif ~ physical_health_status + mental_health_status +
marriage_status + total_people_in_household + age_group +
education + family_income + race + sex + working_status,
data = test_data_new_wif, method = "exact")
Summary of Balance for All Data:
Means Treated Means Control Std. Mean Diff.
physical_health_status 1.8958 1.5027 0.1441
mental_health_status 3.1157 2.0860 0.3076
marriage_status 3.2940 3.1398 0.0860
total_people_in_household 1.8889 1.8611 0.0195
age_group 2.1123 2.1935 -0.1063
education 3.1701 2.9498 0.1736
family_income 11.7083 11.5565 0.1137
race 1.4931 1.5341 -0.0542
sex 1.4965 1.5036 -0.0141
working_status 1.2176 1.3082 -0.1737
Var. Ratio eCDF Mean eCDF Max
physical_health_status 1.0618 0.0437 0.1027
mental_health_status 1.2291 0.1144 0.1606
marriage_status 0.9988 0.0308 0.0528
total_people_in_household 0.9320 0.0095 0.0301
age_group 0.7941 0.0288 0.0675
education 1.0185 0.0441 0.0867
family_income 0.7161 0.0127 0.0511
race 0.9616 0.0137 0.0289
sex 1.0003 0.0035 0.0071
working_status 0.8729 0.0313 0.0923
Summary of Balance for Matched Data:
Means Treated Means Control Std. Mean Diff.
physical_health_status 0.2857 0.2857 0
mental_health_status 0.8057 0.8057 -0
marriage_status 3.5314 3.5314 0
total_people_in_household 1.5371 1.5371 0
age_group 2.1771 2.1771 0
education 3.1371 3.1371 -0
family_income 12.0000 12.0000 0
race 1.3829 1.3829 -0
sex 1.3829 1.3829 -0
working_status 1.0343 1.0343 0
Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
physical_health_status 0.9981 0 0 0
mental_health_status 0.9981 0 0 0
marriage_status 0.9981 0 0 0
total_people_in_household 0.9981 0 0 0
age_group 0.9981 0 0 0
education 0.9981 0 0 0
family_income . 0 0 0
race 0.9981 0 0 0
sex 0.9981 0 0 0
working_status 0.9981 0 0 0
Sample Sizes:
Control Treated
All 1116. 864
Matched (ESS) 131.13 175
Matched 208. 175
Unmatched 908. 689
Discarded 0. 0
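The `Std. Mean Diff.` column is the main balance diagnostic in these summaries. Below is a minimal sketch of how it is computed for an ATT comparison: the difference in group means divided by the treated group's standard deviation. This is my understanding of MatchIt's default denominator for the ATT; its `summary()` documentation (`s.d.denom`) is the authority. The function name `smd_att` and the toy vectors are hypothetical.

```r
# Standardized mean difference for the ATT:
# (treated mean - control mean) / SD of the treated group
smd_att <- function(x, treat) {
  x_t <- x[treat == 1]
  x_c <- x[treat == 0]
  (mean(x_t) - mean(x_c)) / sd(x_t)
}

# Toy example
treat <- c(1, 1, 1, 1, 0, 0, 0, 0)
x     <- c(1, 2, 3, 4, 0, 0, 0, 0)
smd_att(x, treat)
```

A common rule of thumb treats absolute values below about 0.1 as adequate balance. By that standard, several covariates above (for example education and working status) are imbalanced before matching.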
imbalance_exact <- imbalance(group = data_exact$binary_wif, data = data_exact,
                             drop = c("physical_health_status", "binary_wif", "weights", "subclass"))  # With matched data, always drop the weights and subclass columns created by match.data()
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
imbalance_exact
Multivariate Imbalance Measure: L1=0.270
Percentage of local common support: LCS=100.0%
Univariate Imbalance Measures:
statistic type L1 min 25% 50% 75% max
mental_health_status 0.67555599 (Chi2) 0.006263736 NA NA NA NA NA
marriage_status 0.57735084 (Chi2) 0.024258242 NA NA NA NA NA
total_people_in_household 1.20128547 (Chi2) 0.036813187 NA NA NA NA NA
age_group 1.42665653 (Chi2) 0.057912088 NA NA NA NA NA
education 8.16204702 (Chi2) 0.132664835 NA NA NA NA NA
family_income 2.84334204 (Chi2) 0.000000000 NA NA NA NA NA
race 0.63276543 (Chi2) 0.023846154 NA NA NA NA NA
sex 0.02220519 (Chi2) 0.012664835 NA NA NA NA NA
working_status 0.41031836 (Chi2) 0.009890110 NA NA NA NA NA
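The matching above includes the outcomes in the formula, so the matched outcome means are equal by construction and cannot show a treatment effect. Below is a minimal base-R sketch of exact matching on covariates only, followed by an ATT estimate. Each matched control is weighted by the ratio of treated to control units in its stratum. The sketch mirrors the logic of `method = "exact"`, but MatchIt may scale the control weights differently. Multiplying all control weights by a constant does not change the weighted mean. The function `exact_match_att` and the toy data are hypothetical.

```r
# Exact matching on covariates only (the outcome is NOT used for matching),
# followed by an ATT estimate from stratum-weighted control means
exact_match_att <- function(data, treat, outcome, covariates) {
  stratum <- as.character(interaction(data[covariates], drop = TRUE))
  tr <- data[[treat]]
  tab <- table(stratum, tr)            # control ("0") and treated ("1") counts per stratum
  n_c <- tab[, "0"]
  n_t <- tab[, "1"]
  matched_strata <- names(n_t)[n_t > 0 & n_c > 0]
  keep <- stratum %in% matched_strata  # drop strata lacking either group
  treated <- keep & tr == 1
  control <- keep & tr == 0
  # Treated units get weight 1; each control gets (treated / controls) in its stratum
  w <- numeric(nrow(data))
  w[treated] <- 1
  w[control] <- as.numeric(n_t[stratum[control]] / n_c[stratum[control]])
  y <- data[[outcome]]
  list(att = mean(y[treated]) - weighted.mean(y[control], w[control]),
       n_treated = sum(treated), n_control = sum(control), weights = w)
}

# Toy data: the true treatment effect is 2, and x confounds treatment and outcome
toy <- data.frame(
  x     = c(rep(0, 10), rep(1, 10), rep(2, 10), rep(3, 3)),
  treat = c(rep(1, 8), rep(0, 2), rep(1, 2), rep(0, 8), rep(1, 5), rep(0, 5), rep(0, 3))
)
toy$y <- 2 * toy$treat + 3 * toy$x
res <- exact_match_att(toy, "treat", "y", "x")
naive <- mean(toy$y[toy$treat == 1]) - mean(toy$y[toy$treat == 0])
res$att
naive
```

In this toy example, the matched estimate recovers the true effect of 2, while the naive difference in means is -0.1. The three controls with `x = 3` have no treated counterpart and are discarded.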
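### Match coarsened exact
### Perform the matching with regression-style formula syntax
# Note: as with exact matching, the outcomes are included as matching covariates here, which
# forces matched groups to have (near-)identical outcome distributions; drop them before estimating effects
match_cem <- matchit(binary_wif ~ physical_health_status + mental_health_status + marriage_status +
                       total_people_in_household + age_group + education + family_income +
                       race + sex + working_status,
                     method = "cem", data = test_data_new_wif)
data_cem <- match.data(match_cem)  # Creates new dataframe that only includes the matched cases
summary(match_cem)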
Call:
matchit(formula = binary_wif ~ physical_health_status + mental_health_status +
marriage_status + total_people_in_household + age_group +
education + family_income + race + sex + working_status,
data = test_data_new_wif, method = "cem")
Summary of Balance for All Data:
Means Treated Means Control Std. Mean Diff.
physical_health_status 1.8958 1.5027 0.1441
mental_health_status 3.1157 2.0860 0.3076
marriage_status 3.2940 3.1398 0.0860
total_people_in_household 1.8889 1.8611 0.0195
age_group 2.1123 2.1935 -0.1063
education 3.1701 2.9498 0.1736
family_income 11.7083 11.5565 0.1137
race 1.4931 1.5341 -0.0542
sex 1.4965 1.5036 -0.0141
working_status 1.2176 1.3082 -0.1737
Var. Ratio eCDF Mean eCDF Max
physical_health_status 1.0618 0.0437 0.1027
mental_health_status 1.2291 0.1144 0.1606
marriage_status 0.9988 0.0308 0.0528
total_people_in_household 0.9320 0.0095 0.0301
age_group 0.7941 0.0288 0.0675
education 1.0185 0.0441 0.0867
family_income 0.7161 0.0127 0.0511
race 0.9616 0.0137 0.0289
sex 1.0003 0.0035 0.0071
working_status 0.8729 0.0313 0.0923
Summary of Balance for Matched Data:
Means Treated Means Control Std. Mean Diff.
physical_health_status 0.2857 0.2857 0
mental_health_status 0.8057 0.8057 -0
marriage_status 3.5314 3.5314 0
total_people_in_household 1.5371 1.5371 0
age_group 2.1771 2.1771 0
education 3.1371 3.1371 -0
family_income 12.0000 12.0000 0
race 1.3829 1.3829 -0
sex 1.3829 1.3829 -0
working_status 1.0343 1.0343 0
Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
physical_health_status 0.9981 0 0 0
mental_health_status 0.9981 0 0 0
marriage_status 0.9981 0 0 0
total_people_in_household 0.9981 0 0 0
age_group 0.9981 0 0 0
education 0.9981 0 0 0
family_income . 0 0 0
race 0.9981 0 0 0
sex 0.9981 0 0 0
working_status 0.9981 0 0 0
Sample Sizes:
Control Treated
All 1116. 864
Matched (ESS) 131.13 175
Matched 208. 175
Unmatched 908. 689
Discarded 0. 0
imbalance_cem <-imbalance(group = data_cem$binary_wif, data = data_cem, drop =c("physical_health_status", "binary_wif", "weights", "subclass"))
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
imbalance_cem
Multivariate Imbalance Measure: L1=0.270
Percentage of local common support: LCS=100.0%
Univariate Imbalance Measures:
statistic type L1 min 25% 50% 75% max
mental_health_status 0.67555599 (Chi2) 0.006263736 NA NA NA NA NA
marriage_status 0.57735084 (Chi2) 0.024258242 NA NA NA NA NA
total_people_in_household 1.20128547 (Chi2) 0.036813187 NA NA NA NA NA
age_group 1.42665653 (Chi2) 0.057912088 NA NA NA NA NA
education 8.16204702 (Chi2) 0.132664835 NA NA NA NA NA
family_income 2.84334204 (Chi2) 0.000000000 NA NA NA NA NA
race 0.63276543 (Chi2) 0.023846154 NA NA NA NA NA
sex 0.02220519 (Chi2) 0.012664835 NA NA NA NA NA
working_status 0.41031836 (Chi2) 0.009890110 NA NA NA NA NA
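Coarsened exact matching first coarsens each covariate into bins and then matches exactly on the bins. MatchIt's `method = "cem"` chooses the bins automatically unless `cutpoints` are supplied. The covariates here are already coded as a few ordinal categories, so the automatic bins probably left them effectively uncoarsened. That would explain why the CEM results above are identical to the exact-matching results. Below is a minimal base-R sketch of the two steps with simulated data. The age cutpoints are hypothetical.

```r
set.seed(99)
n <- 300
d <- data.frame(
  treat = rbinom(n, 1, 0.45),
  age   = round(runif(n, 18, 80)),          # continuous: needs coarsening
  educ  = sample(1:4, n, replace = TRUE)    # already coarse: used as-is
)

# Step 1: coarsen age into analyst-chosen bins (HYPOTHETICAL cutpoints)
d$age_bin <- cut(d$age, breaks = c(17, 30, 45, 60, 80))

# Step 2: exact-match on the coarsened values; keep strata with both groups
stratum <- interaction(d$age_bin, d$educ, drop = TRUE)
tab <- table(stratum, d$treat)
matched_strata <- rownames(tab)[tab[, "0"] > 0 & tab[, "1"] > 0]
d$matched <- as.character(stratum) %in% matched_strata

# Step 3: how many treated and control units survive the matching
table(treated = d$treat, matched = d$matched)
```

Wider bins keep more units but tolerate more imbalance within each stratum. Narrower bins approach exact matching and discard more units.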
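# Compare t-tests of the DV across treatment groups, with no control variables
# Note: passing two vectors compares the mean of physical_health_status with the mean of
# binary_wif (the share treated), which is what the output below reports. To compare the
# outcome between treated and control respondents, use
# t.test(physical_health_status ~ binary_wif, data = test_data_new_wif)
t.test(test_data_new_wif$physical_health_status, test_data_new_wif$binary_wif)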
Welch Two Sample t-test
data: test_data_new_wif$physical_health_status and test_data_new_wif$binary_wif
t = 20.142, df = 2113.5, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
1.117356 1.358402
sample estimates:
mean of x mean of y
1.6742424 0.4363636
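In the output above, `mean of y` (0.436) equals 864 / 1,980, the share of treated respondents. The test therefore does not compare treated and control respondents. Below is a minimal sketch with simulated data that contrasts the two-vector call with the formula interface, which splits the outcome by group.

```r
set.seed(42)
# Simulated data: 50 control and 50 treated respondents
d <- data.frame(treat = rep(c(0, 1), each = 50))
d$health <- rpois(100, lambda = 1.5 + 0.5 * d$treat)

# Two vectors: compares mean(health) with mean(treat), i.e. the share treated
two_vectors <- t.test(d$health, d$treat)

# Formula: compares mean(health) between the control and treated groups (Welch by default)
by_group <- t.test(health ~ treat, data = d)

two_vectors$estimate
by_group$estimate
```

The formula version reports "mean in group 0" and "mean in group 1". Its confidence interval refers to the difference in health between the two groups.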
# Estimate linear regression on raw (unmatched) data
lm1 <- lm(physical_health_status ~ binary_wif + marriage_status + total_people_in_household +
            age_group + education + family_income + race + sex + working_status,
          data = test_data_new_wif)
summary(lm1)
####### Entropy balancing
# Names of the treatment, covariates, and outcome
treatment_var <- "binary_wif"
covariates_vars <- c("marriage_status", "total_people_in_household", "age_group", "education",
                     "family_income", "race", "sex", "working_status")
dependent_var <- "physical_health_status"
# Prepare treatment and covariates
treatment <- test_data_new_wif$binary_wif
covariates <- test_data_new_wif[, covariates_vars]
# Run entropy balancing
e_bal <- ebalance(Treatment = treatment, X = covariates,
                  max.iterations = 200, constraint.tolerance = 1)
Converged within tolerance
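`ebalance()` reweights the control group so that its covariate means match the treated group's means while staying as close as possible to uniform weights. Below is a minimal base-R sketch of that idea. It solves the convex dual problem with `optim()` and assumes only mean (first-moment) constraints and uniform base weights. It illustrates the technique and is not a reimplementation of the ebal package, whose weights may be normalized differently. The function name and simulated data are hypothetical.

```r
# Entropy balancing (first moments only): find positive control weights summing
# to 1, closest to uniform in entropy, whose weighted covariate means equal
# `target` (the treated-group means)
entropy_balance <- function(X_control, target) {
  Z <- sweep(as.matrix(X_control), 2, target)   # center controls at the target means
  weights_from <- function(lambda) {
    eta <- drop(Z %*% lambda)
    p <- exp(eta - max(eta))                      # subtract max for numerical stability
    p / sum(p)
  }
  # Dual objective: log-sum-exp of Z %*% lambda (convex in lambda)
  dual <- function(lambda) {
    eta <- drop(Z %*% lambda)
    max(eta) + log(sum(exp(eta - max(eta))))
  }
  # Gradient of the dual: weighted mean of Z, which is zero at the solution
  grad <- function(lambda) drop(crossprod(Z, weights_from(lambda)))
  fit <- optim(rep(0, ncol(Z)), dual, grad, method = "BFGS",
               control = list(reltol = 1e-12, maxit = 1000))
  weights_from(fit$par)
}

set.seed(123)
X_control <- cbind(age = rnorm(500), educ = rnorm(500))   # simulated controls
target <- c(age = 0.5, educ = -0.3)                       # simulated treated means
w <- entropy_balance(X_control, target)
round(colSums(X_control * w), 4)   # weighted control means, approximately equal to target
```

At the solution the gradient is zero, which is exactly the balance condition: the weighted control means equal the target. This is why entropy balancing reports convergence "within tolerance" rather than an imbalance statistic.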
# Add the weights back to the analysis data
test_data_new_wif$eb_weight <- NA
test_data_new_wif$eb_weight[test_data_new_wif[[treatment_var]] == 1] <- 1        # Treated units get weight = 1
test_data_new_wif$eb_weight[test_data_new_wif[[treatment_var]] == 0] <- e_bal$w  # Control units get EB weights
# Final data for regression: now has a weight called 'eb_weight' that can be used in analysis
eb_data <- test_data_new_wif %>%
  filter(!is.na(eb_weight))  # Exclude unmatched cases, if any
# Check that the two groups are now balanced on age
eb_data %>%
  group_by(binary_wif) %>%
  summarise(age_weighted_mean = wtd.mean(age_group, weights = eb_weight),
            age_weighted_variance = wtd.var(age_group, weights = eb_weight))
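Once `eb_weight` exists, the balanced comparison is usually estimated as a weighted regression. The survey package loaded earlier, via `svydesign()` and `svyglm()`, should give the same point estimates as `lm()` with weights, but with design-based standard errors. Below is a minimal base-R sketch with simulated data and hypothetical stand-in weights. It shows that, with only the treatment indicator in the model, the weighted coefficient equals the weighted difference in means.

```r
set.seed(7)
n <- 200
sim <- data.frame(binary_wif = rbinom(n, 1, 0.4))
# HYPOTHETICAL weights: treated = 1, controls = stand-ins for entropy-balancing weights
sim$eb_weight <- ifelse(sim$binary_wif == 1, 1, runif(n, 0.2, 2))
sim$physical_health_status <- rpois(n, lambda = 1.5 + 0.5 * sim$binary_wif)

# Weighted least squares using the balancing weights
fit_eb <- lm(physical_health_status ~ binary_wif, data = sim, weights = eb_weight)
coef(fit_eb)[["binary_wif"]]

# The same number computed as a weighted difference in means
treated <- sim$binary_wif == 1
diff_w <- weighted.mean(sim$physical_health_status[treated], sim$eb_weight[treated]) -
  weighted.mean(sim$physical_health_status[!treated], sim$eb_weight[!treated])
diff_w
```

The default `lm()` standard errors treat the weights as precision weights. Robust or survey-based standard errors are generally preferred for balancing weights.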
## What is the effect of family interfering with work (FIW) on health?
# Testing for imbalance between groups
check <- test_data_new_fiw %>%
  group_by(binary_fiw) %>%
  summarise_at(vars(physical_health_status, mental_health_status, marriage_status,
                    total_people_in_household, age_group, education, family_income,
                    race, sex, working_status),
               list(mean = mean, var = var))
round(t(check), 3)
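# Note: the outcomes are again included as matching covariates, which forces matched treated
# and control units to share identical outcome values; drop them before estimating effects
match_exact_fiw_1 <- matchit(binary_fiw ~ physical_health_status + mental_health_status + marriage_status +
                               total_people_in_household + age_group + education + family_income +
                               race + sex + working_status,
                             method = "exact", data = test_data_new_fiw)
data_exact_fiw_1 <- match.data(match_exact_fiw_1)  # Creates new dataframe that only includes the matched cases
summary(match_exact_fiw_1)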
Call:
matchit(formula = binary_fiw ~ physical_health_status + mental_health_status +
marriage_status + total_people_in_household + age_group +
education + family_income + race + sex + working_status,
data = test_data_new_fiw, method = "exact")
Summary of Balance for All Data:
Means Treated Means Control Std. Mean Diff.
physical_health_status 2.0478 1.5007 0.1929
mental_health_status 3.1799 2.2359 0.2849
marriage_status 3.4506 3.0939 0.2014
total_people_in_household 1.9920 1.8180 0.1172
age_group 2.1290 2.1716 -0.0570
education 3.1449 3.0000 0.1124
family_income 11.6194 11.6243 -0.0032
race 1.5334 1.5081 0.0322
sex 1.5111 1.4956 0.0312
working_status 1.2611 1.2722 -0.0198
Var. Ratio eCDF Mean eCDF Max
physical_health_status 1.1892 0.0608 0.1114
mental_health_status 1.1331 0.1049 0.1517
marriage_status 0.9734 0.0713 0.1093
total_people_in_household 1.0761 0.0226 0.0844
age_group 0.7738 0.0328 0.0642
education 1.0559 0.0290 0.0637
family_income 1.0335 0.0039 0.0126
race 1.0783 0.0084 0.0226
sex 1.0004 0.0078 0.0156
working_status 1.0733 0.0124 0.0241
Summary of Balance for Matched Data:
Means Treated Means Control Std. Mean Diff.
physical_health_status 0.3411 0.3411 0
mental_health_status 1.1628 1.1628 0
marriage_status 3.5271 3.5271 0
total_people_in_household 1.5736 1.5736 0
age_group 2.1783 2.1783 0
education 3.1628 3.1628 -0
family_income 12.0000 12.0000 0
race 1.4496 1.4496 0
sex 1.3876 1.3876 0
working_status 1.0388 1.0388 0
Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
physical_health_status 1.0001 0 0 0
mental_health_status 1.0001 0 0 0
marriage_status 1.0001 0 0 0
total_people_in_household 1.0001 0 0 0
age_group 1.0001 0 0 0
education 1.0001 0 0 0
family_income . 0 0 0
race 1.0001 0 0 0
sex 1.0001 0 0 0
working_status 1.0001 0 0 0
Sample Sizes:
Control Treated
All 1352. 628
Matched (ESS) 131.16 129
Matched 205. 129
Unmatched 1147. 499
Discarded 0. 0
imbalance_exact_fiw_1 <- imbalance(group = data_exact_fiw_1$binary_fiw, data = data_exact_fiw_1,
                                   drop = c("physical_health_status", "binary_fiw", "weights", "subclass"))  # With matched data, always drop the weights and subclass columns created by match.data()
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
imbalance_exact_fiw_1
Multivariate Imbalance Measure: L1=0.276
Percentage of local common support: LCS=100.0%
Univariate Imbalance Measures:
statistic type L1 min 25% 50% 75% max
mental_health_status 5.1320068 (Chi2) 0.060351673 NA NA NA NA NA
marriage_status 0.4797745 (Chi2) 0.003970505 NA NA NA NA NA
total_people_in_household 2.2527992 (Chi2) 0.049725846 NA NA NA NA NA
age_group 0.7456901 (Chi2) 0.033692569 NA NA NA NA NA
education 2.4581016 (Chi2) 0.084363774 NA NA NA NA NA
family_income 17.2934132 (Chi2) 0.000000000 NA NA NA NA NA
race 2.4630622 (Chi2) 0.075061448 NA NA NA NA NA
sex 0.3086324 (Chi2) 0.036377387 NA NA NA NA NA
working_status 0.1768549 (Chi2) 0.014369446 NA NA NA NA NA
### Match coarsened exact
### Perform the matching with regression-style formula syntax
# Note: the outcomes are included as matching covariates here; drop them before estimating effects
match_cem_fiw_1 <- matchit(binary_fiw ~ physical_health_status + mental_health_status + marriage_status +
                             total_people_in_household + age_group + education + family_income +
                             race + sex + working_status,
                           method = "cem", data = test_data_new_fiw)
data_cem_fiw_1 <- match.data(match_cem_fiw_1)  # Creates new dataframe that only includes the matched cases
summary(match_cem_fiw_1)
Call:
matchit(formula = binary_fiw ~ physical_health_status + mental_health_status +
marriage_status + total_people_in_household + age_group +
education + family_income + race + sex + working_status,
data = test_data_new_fiw, method = "cem")
Summary of Balance for All Data:
Means Treated Means Control Std. Mean Diff.
physical_health_status 2.0478 1.5007 0.1929
mental_health_status 3.1799 2.2359 0.2849
marriage_status 3.4506 3.0939 0.2014
total_people_in_household 1.9920 1.8180 0.1172
age_group 2.1290 2.1716 -0.0570
education 3.1449 3.0000 0.1124
family_income 11.6194 11.6243 -0.0032
race 1.5334 1.5081 0.0322
sex 1.5111 1.4956 0.0312
working_status 1.2611 1.2722 -0.0198
Var. Ratio eCDF Mean eCDF Max
physical_health_status 1.1892 0.0608 0.1114
mental_health_status 1.1331 0.1049 0.1517
marriage_status 0.9734 0.0713 0.1093
total_people_in_household 1.0761 0.0226 0.0844
age_group 0.7738 0.0328 0.0642
education 1.0559 0.0290 0.0637
family_income 1.0335 0.0039 0.0126
race 1.0783 0.0084 0.0226
sex 1.0004 0.0078 0.0156
working_status 1.0733 0.0124 0.0241
Summary of Balance for Matched Data:
Means Treated Means Control Std. Mean Diff.
physical_health_status 0.3411 0.3411 0
mental_health_status 1.1628 1.1628 0
marriage_status 3.5271 3.5271 0
total_people_in_household 1.5736 1.5736 0
age_group 2.1783 2.1783 0
education 3.1628 3.1628 -0
family_income 12.0000 12.0000 0
race 1.4496 1.4496 0
sex 1.3876 1.3876 0
working_status 1.0388 1.0388 0
Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
physical_health_status 1.0001 0 0 0
mental_health_status 1.0001 0 0 0
marriage_status 1.0001 0 0 0
total_people_in_household 1.0001 0 0 0
age_group 1.0001 0 0 0
education 1.0001 0 0 0
family_income . 0 0 0
race 1.0001 0 0 0
sex 1.0001 0 0 0
working_status 1.0001 0 0 0
Sample Sizes:
Control Treated
All 1352. 628
Matched (ESS) 131.16 129
Matched 205. 129
Unmatched 1147. 499
Discarded 0. 0
imbalance_cem_fiw_1 <-imbalance(group = data_cem_fiw_1$binary_fiw, data = data_cem_fiw_1, drop =c("physical_health_status", "binary_fiw", "weights", "subclass"))
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
imbalance_cem_fiw_1
Multivariate Imbalance Measure: L1=0.276
Percentage of local common support: LCS=100.0%
Univariate Imbalance Measures:
statistic type L1 min 25% 50% 75% max
mental_health_status 5.1320068 (Chi2) 0.060351673 NA NA NA NA NA
marriage_status 0.4797745 (Chi2) 0.003970505 NA NA NA NA NA
total_people_in_household 2.2527992 (Chi2) 0.049725846 NA NA NA NA NA
age_group 0.7456901 (Chi2) 0.033692569 NA NA NA NA NA
education 2.4581016 (Chi2) 0.084363774 NA NA NA NA NA
family_income 17.2934132 (Chi2) 0.000000000 NA NA NA NA NA
race 2.4630622 (Chi2) 0.075061448 NA NA NA NA NA
sex 0.3086324 (Chi2) 0.036377387 NA NA NA NA NA
working_status 0.1768549 (Chi2) 0.014369446 NA NA NA NA NA
Welch Two Sample t-test
data: test_data_new_fiw$physical_health_status and test_data_new_fiw$binary_fiw
t = 22.125, df = 2097.5, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
1.236784 1.477357
sample estimates:
mean of x mean of y
1.6742424 0.3171717
# Estimate linear regression on raw (unmatched) data
lm1_fiw_p <- lm(physical_health_status ~ binary_fiw + marriage_status + total_people_in_household +
                  age_group + education + family_income + race + sex + working_status,
                data = test_data_new_fiw)
summary(lm1_fiw_p)
####### Entropy balancing
# Names of the treatment, covariates, and outcome
treatment_var_fiw_1 <- "binary_fiw"
covariates_vars_fiw_1 <- c("marriage_status", "total_people_in_household", "age_group", "education",
                           "family_income", "race", "sex", "working_status")
dependent_var_fiw_1 <- "physical_health_status"
# Prepare treatment and covariates
treatment_fiw_1 <- test_data_new_fiw$binary_fiw
covariates_fiw_1 <- test_data_new_fiw[, covariates_vars_fiw_1]
# Run entropy balancing
e_bal_fiw_1 <- ebalance(Treatment = treatment_fiw_1, X = covariates_fiw_1,
                        max.iterations = 200, constraint.tolerance = 1)
Converged within tolerance
# Add the weights back to the analysis data
test_data_new_fiw$eb_weight_fiw_1 <- NA
test_data_new_fiw$eb_weight_fiw_1[test_data_new_fiw[[treatment_var_fiw_1]] == 1] <- 1              # Treated units get weight = 1
test_data_new_fiw$eb_weight_fiw_1[test_data_new_fiw[[treatment_var_fiw_1]] == 0] <- e_bal_fiw_1$w  # Control units get EB weights
# Final data for regression: now has a weight called 'eb_weight_fiw_1' that can be used in analysis
eb_data_fiw_1 <- test_data_new_fiw %>%
  filter(!is.na(eb_weight_fiw_1))  # Exclude unmatched cases, if any
# Check that the two groups are now balanced on age
eb_data_fiw_1 %>%
  group_by(binary_fiw) %>%
  summarise(age_weighted_mean_fiw_1 = wtd.mean(age_group, weights = eb_weight_fiw_1),
            age_weighted_variance_fiw_1 = wtd.var(age_group, weights = eb_weight_fiw_1))
## What is the effect of work interfering with family (WIF) on mental health?
# Testing for imbalance between groups
check_wif_m <- test_data_new_wif_m %>%
  group_by(binary_wif) %>%
  summarise_at(vars(mental_health_status, marriage_status, total_people_in_household,
                    age_group, education, family_income, race, sex, working_status),
               list(mean = mean, var = var))
round(t(check_wif_m), 3)
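# Note: mental_health_status is the outcome for this model; including it as a matching covariate
# forces matched treated and control units to share identical outcome values, so drop it before estimating effects
match_exact_wif_2 <- matchit(binary_wif ~ mental_health_status + marriage_status +
                               total_people_in_household + age_group + education + family_income +
                               race + sex + working_status,
                             method = "exact", data = test_data_new_wif_m)
data_exact_wif_2 <- match.data(match_exact_wif_2)  # Creates new dataframe that only includes the matched cases
summary(match_exact_wif_2)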
Call:
matchit(formula = binary_wif ~ mental_health_status + marriage_status +
total_people_in_household + age_group + education + family_income +
race + sex + working_status, data = test_data_new_wif_m,
method = "exact")
Summary of Balance for All Data:
Means Treated Means Control Std. Mean Diff.
mental_health_status 3.1157 2.0860 0.3076
marriage_status 3.2940 3.1398 0.0860
total_people_in_household 1.8889 1.8611 0.0195
age_group 2.1123 2.1935 -0.1063
education 3.1701 2.9498 0.1736
family_income 11.7083 11.5565 0.1137
race 1.4931 1.5341 -0.0542
sex 1.4965 1.5036 -0.0141
working_status 1.2176 1.3082 -0.1737
Var. Ratio eCDF Mean eCDF Max
mental_health_status 1.2291 0.1144 0.1606
marriage_status 0.9988 0.0308 0.0528
total_people_in_household 0.9320 0.0095 0.0301
age_group 0.7941 0.0288 0.0675
education 1.0185 0.0441 0.0867
family_income 0.7161 0.0127 0.0511
race 0.9616 0.0137 0.0289
sex 1.0003 0.0035 0.0071
working_status 0.8729 0.0313 0.0923
Summary of Balance for Matched Data:
Means Treated Means Control Std. Mean Diff.
mental_health_status 1.6055 1.6055 0
marriage_status 3.6125 3.6125 0
total_people_in_household 1.4844 1.4844 0
age_group 2.1938 2.1938 0
education 3.3149 3.3149 0
family_income 11.9965 11.9965 -0
race 1.3391 1.3391 0
sex 1.3945 1.3945 0
working_status 1.0484 1.0484 0
Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
mental_health_status 0.9985 0 0 0
marriage_status 0.9985 0 0 0
total_people_in_household 0.9985 0 0 0
age_group 0.9985 0 0 0
education 0.9985 0 0 0
family_income 0.9985 0 0 0
race 0.9985 0 0 0
sex 0.9985 0 0 0
working_status 0.9985 0 0 0
Sample Sizes:
Control Treated
All 1116. 864
Matched (ESS) 202.12 289
Matched 317. 289
Unmatched 799. 575
Discarded 0. 0
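The "Matched (ESS)" row for controls is smaller than the matched count because exact matching gives controls non-uniform weights. Below is a minimal base-R sketch of how such stratum weights and an effective sample size can be computed. It assumes MatchIt's convention for subclass-based ATT weights: treated units get weight 1, and each control gets the ratio of treated to control units in its stratum, rescaled so control weights sum to the number of matched controls. It also assumes the ESS row uses Kish's formula, (sum w)^2 / sum(w^2). `exact_match_att()` and `ess()` are hypothetical helpers, and the toy data are made up.

```r
# Hypothetical helper: exact matching on discrete covariates with ATT weights.
exact_match_att <- function(data, treat, covars) {
  stratum <- interaction(data[covars], drop = TRUE)   # one level per covariate combination
  n_t <- tapply(data[[treat]] == 1, stratum, sum)
  n_c <- tapply(data[[treat]] == 0, stratum, sum)
  keep_strata <- names(n_t)[n_t > 0 & n_c > 0]        # strata with both groups present
  in_keep <- stratum %in% keep_strata
  matched <- data[in_keep, , drop = FALSE]
  s <- as.character(stratum[in_keep])
  is_t <- matched[[treat]] == 1
  w <- ifelse(is_t, 1, n_t[s] / n_c[s])
  # Rescale control weights so they sum to the number of matched controls
  w[!is_t] <- w[!is_t] * sum(!is_t) / sum(w[!is_t])
  matched$weights <- unname(w)
  matched$subclass <- s
  matched
}

# Kish effective sample size of a weight vector
ess <- function(w) sum(w)^2 / sum(w^2)

toy <- data.frame(treat = c(1, 1, 0, 1, 0, 0, 0, 1),
                  sex   = c(1, 1, 1, 2, 2, 2, 2, 2),
                  educ  = c(1, 1, 1, 1, 1, 1, 1, 3))
m <- exact_match_att(toy, "treat", c("sex", "educ"))
ess(m$weights[m$treat == 0])
```

In this toy example, the last treated unit has no control in its stratum and is dropped. The weighted control mean of each matching variable then equals the treated mean, which mirrors the zero standardized differences in the matched summary.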
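# With matched data, always add weights and subclass to the drop list.
# The matching weights are only dropped here, not passed to imbalance(), so the
# L1 statistics below describe the unweighted matched sample.
imbalance_exact_wif_2 <- imbalance(group = data_exact_wif_2$binary_wif, data = data_exact_wif_2,
                                   drop = c("mental_health_status", "binary_wif", "weights", "subclass"))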
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may be incorrect (repeated 4 times)
imbalance_exact_wif_2
Multivariate Imbalance Measure: L1=0.219
Percentage of local common support: LCS=100.0%
Univariate Imbalance Measures:
statistic type L1 min 25% 50% 75% max
marriage_status 2.006001e+00 (Chi2) 0.055723533 NA NA NA NA NA
total_people_in_household 1.923340e+00 (Chi2) 0.032069684 NA NA NA NA NA
age_group 3.474368e+00 (Chi2) 0.039470381 NA NA NA NA NA
education 7.140104e+00 (Chi2) 0.099985810 NA NA NA NA NA
family_income 7.705697e-32 (Chi2) 0.000000000 NA NA NA NA NA
race 1.839743e-01 (Chi2) 0.013251394 NA NA NA NA NA
sex 8.779728e-01 (Chi2) 0.040867562 NA NA NA NA NA
working_status 5.372074e-02 (Chi2) 0.002641547 NA NA NA NA NA
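For discrete covariates such as these, a univariate L1 statistic in the spirit of the cem package's measure can be sketched in base R. It is half the summed absolute differences between treated and control category shares, so 0 means identical distributions and 1 means no overlap. This sketch assumes the covariate is already discrete and skips the binning that `imbalance()` applies to continuous variables. The optional weights show how applying matching weights changes the measure. `l1_discrete()` is a hypothetical helper and the data are made up.

```r
# Hypothetical helper: univariate L1 imbalance for a discrete covariate.
l1_discrete <- function(x, treat, w = rep(1, length(x))) {
  lev <- sort(unique(x))
  tot_t <- tapply(w[treat == 1], factor(x[treat == 1], levels = lev), sum)
  tot_c <- tapply(w[treat == 0], factor(x[treat == 0], levels = lev), sum)
  tot_t[is.na(tot_t)] <- 0  # categories absent from a group
  tot_c[is.na(tot_c)] <- 0
  0.5 * sum(abs(tot_t / sum(tot_t) - tot_c / sum(tot_c)))
}

x     <- c(1, 1, 2, 1, 2, 2)
treat <- c(1, 1, 1, 0, 0, 0)
l1_discrete(x, treat)                               # unweighted shares
l1_discrete(x, treat, w = c(1, 1, 1, 2, 0.5, 0.5))  # controls reweighted
```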
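### Match Coarsened Exact
### Perform the matching here with code that resembles most regressions
# Note: mental_health_status is the outcome. Including it in the matching
# formula balances the outcome itself (the matched summary below shows identical
# treated and control means), so it should be dropped from the formula before
# estimating effects.
match_cem_fiw_2 <- matchit(binary_wif ~ mental_health_status + marriage_status +
                             total_people_in_household + age_group + education +
                             family_income + race + sex + working_status,
                           method = "cem", data = test_data_new_wif_m)
data_cem_fiw_2 <- match.data(match_cem_fiw_2) # Creates new dataframe that only includes the matched cases
summary(match_cem_fiw_2)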
Call:
matchit(formula = binary_wif ~ mental_health_status + marriage_status +
total_people_in_household + age_group + education + family_income +
race + sex + working_status, data = test_data_new_wif_m,
method = "cem")
Summary of Balance for All Data:
Means Treated Means Control Std. Mean Diff.
mental_health_status 3.1157 2.0860 0.3076
marriage_status 3.2940 3.1398 0.0860
total_people_in_household 1.8889 1.8611 0.0195
age_group 2.1123 2.1935 -0.1063
education 3.1701 2.9498 0.1736
family_income 11.7083 11.5565 0.1137
race 1.4931 1.5341 -0.0542
sex 1.4965 1.5036 -0.0141
working_status 1.2176 1.3082 -0.1737
Var. Ratio eCDF Mean eCDF Max
mental_health_status 1.2291 0.1144 0.1606
marriage_status 0.9988 0.0308 0.0528
total_people_in_household 0.9320 0.0095 0.0301
age_group 0.7941 0.0288 0.0675
education 1.0185 0.0441 0.0867
family_income 0.7161 0.0127 0.0511
race 0.9616 0.0137 0.0289
sex 1.0003 0.0035 0.0071
working_status 0.8729 0.0313 0.0923
Summary of Balance for Matched Data:
Means Treated Means Control Std. Mean Diff.
mental_health_status 1.6055 1.6055 0
marriage_status 3.6125 3.6125 0
total_people_in_household 1.4844 1.4844 0
age_group 2.1938 2.1938 0
education 3.3149 3.3149 0
family_income 11.9965 11.9965 -0
race 1.3391 1.3391 0
sex 1.3945 1.3945 0
working_status 1.0484 1.0484 0
Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
mental_health_status 0.9985 0 0 0
marriage_status 0.9985 0 0 0
total_people_in_household 0.9985 0 0 0
age_group 0.9985 0 0 0
education 0.9985 0 0 0
family_income 0.9985 0 0 0
race 0.9985 0 0 0
sex 0.9985 0 0 0
working_status 0.9985 0 0 0
Sample Sizes:
Control Treated
All 1116. 864
Matched (ESS) 202.12 289
Matched 317. 289
Unmatched 799. 575
Discarded 0. 0
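Coarsened exact matching first bins each covariate and then matches exactly within the strata formed by the bins. Below is a minimal base-R sketch of the coarsening step. The cutpoints are made up for illustration; `matchit(method = "cem")` chooses cutpoints automatically unless they are supplied. `coarsen_strata()` is a hypothetical helper.

```r
# Hypothetical helper: CEM-style strata from coarsened covariates.
# breaks: named list with one vector of cutpoints per covariate.
coarsen_strata <- function(data, breaks) {
  binned <- Map(function(x, b) cut(x, breaks = b, include.lowest = TRUE),
                data[names(breaks)], breaks)
  interaction(binned, drop = TRUE)  # one level per occupied combination of bins
}

toy <- data.frame(income = c(1, 3, 5, 12, 14),
                  age    = c(20, 25, 41, 45, 70))
strata <- coarsen_strata(toy, list(income = c(0, 10, 20),
                                   age    = c(18, 40, 65, 90)))
table(strata)
```

Units that share a stratum are then treated as exact matches, so coarser bins keep more units at the cost of looser balance within each stratum.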
imbalance_cem_fiw_2 <-imbalance(group = data_cem_fiw_2$binary_wif, data = data_cem_fiw_2, drop =c("mental_health_status", "binary_wif", "weights", "subclass"))
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may be incorrect (repeated 4 times)
imbalance_cem_fiw_2
Multivariate Imbalance Measure: L1=0.219
Percentage of local common support: LCS=100.0%
Univariate Imbalance Measures:
statistic type L1 min 25% 50% 75% max
marriage_status 2.006001e+00 (Chi2) 0.055723533 NA NA NA NA NA
total_people_in_household 1.923340e+00 (Chi2) 0.032069684 NA NA NA NA NA
age_group 3.474368e+00 (Chi2) 0.039470381 NA NA NA NA NA
education 7.140104e+00 (Chi2) 0.099985810 NA NA NA NA NA
family_income 7.705697e-32 (Chi2) 0.000000000 NA NA NA NA NA
race 1.839743e-01 (Chi2) 0.013251394 NA NA NA NA NA
sex 8.779728e-01 (Chi2) 0.040867562 NA NA NA NA NA
working_status 5.372074e-02 (Chi2) 0.002641547 NA NA NA NA NA
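# Compare t-tests of DV on treated - no control variables
# Caution: t.test(x, y) with the 0/1 treatment as y compares the mean of
# mental_health_status with the share of treated respondents (0.436 below),
# not the outcome between treated and control groups. The two-group test is
# t.test(mental_health_status ~ binary_wif, data = test_data_new_wif_m).
t.test(test_data_new_wif_m$mental_health_status, test_data_new_wif_m$binary_wif)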
Welch Two Sample t-test
data: test_data_new_wif_m$mental_health_status and test_data_new_wif_m$binary_wif
t = 28.782, df = 2073.7, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
1.955972 2.242008
sample estimates:
mean of x mean of y
2.5353535 0.4363636
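The "mean of y" above (0.436) equals 864/1980, the share of treated respondents. This shows that the two-vector call compares the outcome with the treatment indicator rather than the outcome across groups. A minimal sketch on simulated data contrasts the two forms of `t.test()`. The data and variable names are made up.

```r
# Made-up data: 60 control and 40 treated respondents
set.seed(1)
toy <- data.frame(
  treat = rep(c(0, 1), times = c(60, 40)),
  days  = c(rpois(60, 2), rpois(40, 3))
)

# Two-vector form: compares mean(days) with mean(treat), i.e. the share treated
vec_form <- t.test(toy$days, toy$treat)

# Formula form: compares mean(days) between treat == 0 and treat == 1 (Welch by default)
formula_form <- t.test(days ~ treat, data = toy)

vec_form$estimate      # second element is mean(treat) = 0.4
formula_form$estimate  # mean of days in group 0 and in group 1
```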
# Estimate linear regression on raw data
lm1_wif_m <- lm(mental_health_status ~ binary_wif + marriage_status + total_people_in_household +
                  age_group + education + family_income + race + sex + working_status,
                data = test_data_new_wif_m)
summary(lm1_wif_m)
####### Entropy Balancing
# Create a subset of the dataset with the selected variables
treatment_var_wif_m <- "binary_wif"
covariates_vars_wif_m <- c("marriage_status", "total_people_in_household", "age_group",
                           "education", "family_income", "race", "sex", "working_status")
dependent_var_wif_m <- "mental_health_status"
# Prepare treatment and covariates
treatment_wif_m <- test_data_new_wif_m$binary_wif
covariates_wif_m <- test_data_new_wif_m[, covariates_vars_wif_m]
# Run entropy balancing
e_bal_wif_m <- ebalance(Treatment = treatment_wif_m,
                        X = covariates_wif_m,
                        max.iterations = 200,
                        constraint.tolerance = 1)
Converged within tolerance
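Entropy balancing reweights the control units so that their weighted covariate means equal the treated means, while keeping the weights as close to uniform as possible in the entropy sense. Below is a minimal base-R sketch of that idea. It solves the convex dual problem with `optim()` and does not reproduce the ebal package's own solver. It balances first moments only, and it normalizes the weights to sum to 1, which is a choice made for this sketch. `entropy_balance()` is a hypothetical helper and the data are made up.

```r
# Hypothetical helper: entropy-balancing weights for control units (first moments only).
entropy_balance <- function(X_control, target, base_w = rep(1, nrow(X_control))) {
  Z <- sweep(as.matrix(X_control), 2, target)  # center controls at the treated means
  dual <- function(lambda) {
    eta <- drop(Z %*% lambda)
    a <- max(eta)                              # log-sum-exp shift for stability
    a + log(sum(base_w * exp(eta - a)))
  }
  grad <- function(lambda) {
    eta <- drop(Z %*% lambda)
    p <- base_w * exp(eta - max(eta))
    p <- p / sum(p)
    drop(crossprod(Z, p))                      # weighted mean of Z; zero at the optimum
  }
  fit <- optim(rep(0, ncol(Z)), dual, grad, method = "BFGS",
               control = list(reltol = 1e-12, maxit = 500))
  eta <- drop(Z %*% fit$par)
  w <- base_w * exp(eta - max(eta))
  w / sum(w)
}

# Made-up control covariates and treated-group means
X <- cbind(age  = c(1, 2, 3, 1, 2, 3, 1, 2),
           educ = c(1, 1, 1, 2, 2, 2, 3, 3))
target <- c(2.2, 2.0)
w <- entropy_balance(X, target)
colSums(X * w)  # weighted control means, close to target
```

A solution exists only when the target means lie strictly inside the range spanned by the control units. Otherwise the dual problem has no minimum and the optimizer fails to converge.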
# Add weights back to the data
test_data_new_wif_m$eb_weight_wif_m <- NA
test_data_new_wif_m$eb_weight_wif_m[test_data_new_wif_m[[treatment_var_wif_m]] == 1] <- 1 # Treated units get weight = 1
test_data_new_wif_m$eb_weight_wif_m[test_data_new_wif_m[[treatment_var_wif_m]] == 0] <- e_bal_wif_m$w # Control units get EB weights
# Final data for regression
eb_data_wif_m <- test_data_new_wif_m %>%
  filter(!is.na(eb_weight_wif_m)) # Exclude unmatched if any
# The data for analysis now has a weight called 'eb_weight_wif_m' that can be used in analysis
# Check that the two groups are now equal on age_group
eb_data_wif_m %>%
  group_by(binary_wif) %>%
  summarise(
    age_weighted_mean = wtd.mean(age_group, weights = eb_weight_wif_m),
    age_weighted_variance = wtd.var(age_group, weights = eb_weight_wif_m)
  )
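The code above produces a weight that can be used in analysis. One common use is a weighted regression of the outcome on the treatment indicator, sketched here with simulated data and placeholder weights that stand in for `eb_weight_wif_m`. With a single binary regressor, the weighted least-squares coefficient equals the difference in weighted group means, which is the ATT estimate implied by the balancing weights. `lm()` treats `weights` as precision weights, so its default standard errors are not designed for balancing weights. Robust standard errors are a common alternative.

```r
# Simulated data; the weights below are placeholders, not real EB weights
set.seed(42)
n   <- 200
toy <- data.frame(treat = rbinom(n, 1, 0.4),
                  age_group = sample(1:4, n, replace = TRUE))
toy$days <- 1 + 0.5 * toy$age_group + 1.0 * toy$treat + rnorm(n)
toy$w <- ifelse(toy$treat == 1, 1, runif(n, 0.5, 1.5))  # treated units keep weight 1

# Weighted regression of the outcome on treatment
fit <- lm(days ~ treat, data = toy, weights = w)
est <- unname(coef(fit)["treat"])

# The same number as the difference in weighted group means
diff_wmeans <- with(toy, weighted.mean(days[treat == 1], w[treat == 1]) -
                         weighted.mean(days[treat == 0], w[treat == 0]))
```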
## Is family life interfering with work related to mental health?
# Testing for imbalance between groups
check_fiw_m <- test_data_new_fiw_m %>%
  group_by(binary_fiw) %>%
  summarise_at(vars(mental_health_status, marriage_status, total_people_in_household,
                    age_group, education, family_income, race, sex, working_status),
               list(mean = mean, var = var))
round(t(check_fiw_m), 3)
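# Note: mental_health_status is the outcome. Including it in the matching
# formula balances the outcome itself (the matched summary below shows identical
# treated and control means), so it should be dropped from the formula before
# estimating effects.
match_exact_fiw_m <- matchit(binary_fiw ~ mental_health_status + marriage_status +
                               total_people_in_household + age_group + education +
                               family_income + race + sex + working_status,
                             method = "exact", data = test_data_new_fiw_m)
data_exact_fiw_m <- match.data(match_exact_fiw_m) # Creates new dataframe that only includes the matched cases
summary(match_exact_fiw_m)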
Call:
matchit(formula = binary_fiw ~ mental_health_status + marriage_status +
total_people_in_household + age_group + education + family_income +
race + sex + working_status, data = test_data_new_fiw_m,
method = "exact")
Summary of Balance for All Data:
Means Treated Means Control Std. Mean Diff.
mental_health_status 3.1799 2.2359 0.2849
marriage_status 3.4506 3.0939 0.2014
total_people_in_household 1.9920 1.8180 0.1172
age_group 2.1290 2.1716 -0.0570
education 3.1449 3.0000 0.1124
family_income 11.6194 11.6243 -0.0032
race 1.5334 1.5081 0.0322
sex 1.5111 1.4956 0.0312
working_status 1.2611 1.2722 -0.0198
Var. Ratio eCDF Mean eCDF Max
mental_health_status 1.1331 0.1049 0.1517
marriage_status 0.9734 0.0713 0.1093
total_people_in_household 1.0761 0.0226 0.0844
age_group 0.7738 0.0328 0.0642
education 1.0559 0.0290 0.0637
family_income 1.0335 0.0039 0.0126
race 1.0783 0.0084 0.0226
sex 1.0004 0.0078 0.0156
working_status 1.0733 0.0124 0.0241
Summary of Balance for Matched Data:
Means Treated Means Control Std. Mean Diff.
mental_health_status 2.0694 2.0694 0
marriage_status 3.6481 3.6481 0
total_people_in_household 1.6481 1.6481 0
age_group 2.1204 2.1204 0
education 3.3009 3.3009 0
family_income 11.9954 11.9954 0
race 1.4213 1.4213 0
sex 1.4259 1.4259 0
working_status 1.0370 1.0370 0
Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
mental_health_status 0.9993 0 0 0
marriage_status 0.9993 0 0 0
total_people_in_household 0.9993 0 0 0
age_group 0.9993 0 0 0
education 0.9993 0 0 0
family_income 0.9993 0 0 0
race 0.9993 0 0 0
sex 0.9993 0 0 0
working_status 0.9993 0 0 0
Sample Sizes:
Control Treated
All 1352. 628
Matched (ESS) 188.4 216
Matched 329. 216
Unmatched 1023. 412
Discarded 0. 0
imbalance_exact_fiw_m <-imbalance(group = data_exact_fiw_m$binary_fiw, data = data_exact_fiw_m, drop =c("mental_health_status", "binary_fiw", "weights", "subclass")) #With matched data, always add weights and subclass here
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may be incorrect (repeated 3 times)
imbalance_exact_fiw_m
Multivariate Imbalance Measure: L1=0.237
Percentage of local common support: LCS=100.0%
Univariate Imbalance Measures:
statistic type L1 min 25% 50% 75% max
marriage_status 2.357093e+00 (Chi2) 0.041286727 NA NA NA NA NA
total_people_in_household 6.476332e+00 (Chi2) 0.080673759 NA NA NA NA NA
age_group 2.769106e+00 (Chi2) 0.063210627 NA NA NA NA NA
education 3.274957e+00 (Chi2) 0.073581560 NA NA NA NA NA
family_income 4.121486e-29 (Chi2) 0.000000000 NA NA NA NA NA
race 3.458818e+00 (Chi2) 0.066461218 NA NA NA NA NA
sex 0.000000e+00 (Chi2) 0.003433525 NA NA NA NA NA
working_status 3.217730e-02 (Chi2) 0.006641900 NA NA NA NA NA
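### Match Coarsened Exact
### Perform the matching here with code that resembles most regressions
# Note: mental_health_status is the outcome. Including it in the matching
# formula balances the outcome itself (the matched summary below shows identical
# treated and control means), so it should be dropped from the formula before
# estimating effects.
match_cem_fiw_m <- matchit(binary_fiw ~ mental_health_status + marriage_status +
                             total_people_in_household + age_group + education +
                             family_income + race + sex + working_status,
                           method = "cem", data = test_data_new_fiw_m)
data_cem_fiw_m <- match.data(match_cem_fiw_m) # Creates new dataframe that only includes the matched cases
summary(match_cem_fiw_m)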
Call:
matchit(formula = binary_fiw ~ mental_health_status + marriage_status +
total_people_in_household + age_group + education + family_income +
race + sex + working_status, data = test_data_new_fiw_m,
method = "cem")
Summary of Balance for All Data:
Means Treated Means Control Std. Mean Diff.
mental_health_status 3.1799 2.2359 0.2849
marriage_status 3.4506 3.0939 0.2014
total_people_in_household 1.9920 1.8180 0.1172
age_group 2.1290 2.1716 -0.0570
education 3.1449 3.0000 0.1124
family_income 11.6194 11.6243 -0.0032
race 1.5334 1.5081 0.0322
sex 1.5111 1.4956 0.0312
working_status 1.2611 1.2722 -0.0198
Var. Ratio eCDF Mean eCDF Max
mental_health_status 1.1331 0.1049 0.1517
marriage_status 0.9734 0.0713 0.1093
total_people_in_household 1.0761 0.0226 0.0844
age_group 0.7738 0.0328 0.0642
education 1.0559 0.0290 0.0637
family_income 1.0335 0.0039 0.0126
race 1.0783 0.0084 0.0226
sex 1.0004 0.0078 0.0156
working_status 1.0733 0.0124 0.0241
Summary of Balance for Matched Data:
Means Treated Means Control Std. Mean Diff.
mental_health_status 2.0694 2.0694 0
marriage_status 3.6481 3.6481 0
total_people_in_household 1.6481 1.6481 0
age_group 2.1204 2.1204 0
education 3.3009 3.3009 0
family_income 11.9954 11.9954 0
race 1.4213 1.4213 0
sex 1.4259 1.4259 0
working_status 1.0370 1.0370 0
Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
mental_health_status 0.9993 0 0 0
marriage_status 0.9993 0 0 0
total_people_in_household 0.9993 0 0 0
age_group 0.9993 0 0 0
education 0.9993 0 0 0
family_income 0.9993 0 0 0
race 0.9993 0 0 0
sex 0.9993 0 0 0
working_status 0.9993 0 0 0
Sample Sizes:
Control Treated
All 1352. 628
Matched (ESS) 188.4 216
Matched 329. 216
Unmatched 1023. 412
Discarded 0. 0
imbalance_cem_fiw_m <-imbalance(group = data_cem_fiw_m$binary_fiw, data = data_cem_fiw_m, drop =c("mental_health_status", "binary_fiw", "weights", "subclass"))
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may be incorrect (repeated 3 times)
imbalance_cem_fiw_m
Multivariate Imbalance Measure: L1=0.237
Percentage of local common support: LCS=100.0%
Univariate Imbalance Measures:
statistic type L1 min 25% 50% 75% max
marriage_status 2.357093e+00 (Chi2) 0.041286727 NA NA NA NA NA
total_people_in_household 6.476332e+00 (Chi2) 0.080673759 NA NA NA NA NA
age_group 2.769106e+00 (Chi2) 0.063210627 NA NA NA NA NA
education 3.274957e+00 (Chi2) 0.073581560 NA NA NA NA NA
family_income 4.121486e-29 (Chi2) 0.000000000 NA NA NA NA NA
race 3.458818e+00 (Chi2) 0.066461218 NA NA NA NA NA
sex 0.000000e+00 (Chi2) 0.003433525 NA NA NA NA NA
working_status 3.217730e-02 (Chi2) 0.006641900 NA NA NA NA NA
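# Compare t-tests of DV on treated - no control variables
# Caution: t.test(x, y) with the 0/1 treatment as y compares the mean of
# mental_health_status with the share of treated respondents (0.317 below),
# not the outcome between treated and control groups. The two-group test is
# t.test(mental_health_status ~ binary_fiw, data = test_data_new_fiw_m).
t.test(test_data_new_fiw_m$mental_health_status, test_data_new_fiw_m$binary_fiw)
Welch Two Sample t-test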
data: test_data_new_fiw_m$mental_health_status and test_data_new_fiw_m$binary_fiw
t = 30.459, df = 2062.4, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
2.075363 2.361000
sample estimates:
mean of x mean of y
2.5353535 0.3171717
# Estimate linear regression on raw data
lm1_fiw_m <- lm(mental_health_status ~ binary_fiw + marriage_status + total_people_in_household +
                  age_group + education + family_income + race + sex + working_status,
                data = test_data_new_fiw_m)
summary(lm1_fiw_m)
####### Entropy Balancing
# Create a subset of the dataset with the selected variables
treatment_var_fiw_m <- "binary_fiw"
covariates_vars_fiw_m <- c("marriage_status", "total_people_in_household", "age_group",
                           "education", "family_income", "race", "sex", "working_status")
dependent_var_fiw_m <- "mental_health_status"
# Prepare treatment and covariates
treatment_fiw_m <- test_data_new_fiw_m$binary_fiw
covariates_fiw_m <- test_data_new_fiw_m[, covariates_vars_fiw_m]
# Run entropy balancing
e_bal_fiw_m <- ebalance(Treatment = treatment_fiw_m,
                        X = covariates_fiw_m,
                        max.iterations = 200,
                        constraint.tolerance = 1)
Converged within tolerance
test_data_new_fiw_m$eb_weight_fiw_m <- NA
test_data_new_fiw_m$eb_weight_fiw_m[test_data_new_fiw_m[[treatment_var_fiw_m]] == 1] <- 1 # Treated units get weight = 1
test_data_new_fiw_m$eb_weight_fiw_m[test_data_new_fiw_m[[treatment_var_fiw_m]] == 0] <- e_bal_fiw_m$w # Control units get EB weights
# Final data for regression
eb_data_fiw_m <- test_data_new_fiw_m %>%
  filter(!is.na(eb_weight_fiw_m)) # Exclude unmatched if any
# The data for analysis now has a weight called 'eb_weight_fiw_m' that can be used in analysis
# Check that the two groups are now equal on age_group
eb_data_fiw_m %>%
  group_by(binary_fiw) %>%
  summarise(
    age_weighted_mean_fiw_m = wtd.mean(age_group, weights = eb_weight_fiw_m),
    age_weighted_variance_fiw_m = wtd.var(age_group, weights = eb_weight_fiw_m)
  )