Final Project Scaffolding Assignment #2 - Statistical Analysis Plan

Author

Jingyi Yang

Questions

Primary Research Question: List our your primary research question for this research overall

How work-family conflict is related to health outcomes: control multiple variables.

Who/what is included in the data you are using? What units are in your data? What population does your data generalize to? e.g. countries, adult Americans, etc. State who is included in your sample here.

The units are per person, and the population is Americans, and the sample includes the 4149 representatives across America.

Geographic Areas Included in the Analysis List out the geographies that are included in your analysis. Is there grouping in your data that requires you to use a multilevel model? Or do you have interest in how a group level predictor might influence individual level behavior?

The population is American, so the data do not require a multilevel model. However, in the future, I would like to try the General Social Survey in other countries, like China, which will make the country a group-level predictor.

Dependent Variable List out each dependent variable you plan on including in your final project. Then show a distribution of the responses with any NAs/non-substantive answers removed.

The dependent variables are physical health and mental health conditions. The response distribution is concentrated in lower numbers, like 1 or 2, which means for most of the respondents, the number of days in bad physical and mental condition is low.

Hypotheses List all your hypotheses here including any interactive hypotheses that you might want to test.

The job interferes with family life and will have a positive relationship with poor health conditions. Family life interfering with jobs will have a positive relationship with poor mental health conditions. Some control variables, like working status and marriage status, might have relationships with family and work conflict.

Primary Modeling Approach What regression modeling approach - i.e. OLS, binary logit/probit, ordered logit/probit, count model, multilevel model, etc. - will you pursue with this type of dependent variable? If you have multiple DVs listed above, you should have multiple modeling approaches here so for each DV explicitly state the approach you think you will use.

I will use Ordered Logit/Probit and Matching and/or Balancing Procedures to build models for both dependent variables, physical health status, and mental health status.

Primary Explanatory Independent Variables List your primary explanatory IVs here including how they are measured. Should each IV be inputted as a numeric or factor? Or is that something that you will evaluate empirically? For each primary IV, list that specifically.

The independent variables are work interrupt family and family interrupt family. They are factors when doing the Ordered Logit/Probit model and changed to numerical for the Matching and/or Balancing Procedures model to match the model calculation.

Primary Control Variables List your primary control variables here. For survey data. everyone should at a minimum have key demographic control variables listed here including gender/sex identity, age, education, and race/ethnicity (depending on the geographic location). There are likely other key control variables to include as well so do not stop at those listed.

The control variables I might use are include 1) respondents’ working status, 2) Respondents’ marriage status,3) number for people in the household, 4) respondents’ age, 5) respondents’ education level, 6) respondents’ family income, 7) respondents’ race, and 8) respondents’ sex.

I chose these variables based on the articles related to the topic. Here is the frequency table about the control variables appeared in the eight articles.

Variables Name	Notes	Frequency Number
age		8
gender		5
education		5
Marital status		5
Working hours		5
race		3
number of children living at home		3
family income (income)		2
income		2
body mass (wight)		2
Parental status	1 = child(ren) livingat home and 2 = no children at home	2
living arrangement	Live with spouse, and others	2
family history of heart disease		1
heavy drinking		1
Socioeconomic Index for Occupations		1
Location		1
presence of a long term disease		1
Work schedule		1
work environment		1
psychological job demands		1
decision latitude		1
social support at work		1
Emotional demands		1
changing domestic roles		1
changing work characteristics		1
job category		1
Shift work		1
Socioeconomic position		1
height		1

#Prepare

library(haven)
library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

library(tidyverse)

Warning: package 'tidyverse' was built under R version 4.4.3

Warning: package 'ggplot2' was built under R version 4.4.3

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.0     ✔ readr     2.1.5
✔ ggplot2   3.5.1     ✔ stringr   1.5.1
✔ lubridate 1.9.3     ✔ tibble    3.2.1
✔ purrr     1.0.2     ✔ tidyr     1.3.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(semPlot)
library(lavaan)

This is lavaan 0.6-19
lavaan is FREE software! Please report any bugs.

library(psych)


Attaching package: 'psych'

The following object is masked from 'package:lavaan':

    cor2cov

The following objects are masked from 'package:ggplot2':

    %+%, alpha

library(skimr) 
library(corrplot)

corrplot 0.95 loaded

library(patchwork) #Merge GGPlots together 
library(ggplot2) #Graphing
library(stargazer) #Tabular Regression Results


Please cite as: 

 Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2.3. https://CRAN.R-project.org/package=stargazer

library(jtools) #Tabular Regression Results

Warning: package 'jtools' was built under R version 4.4.3

library(descr) #Easy Frequency Tables

Warning: package 'descr' was built under R version 4.4.3

library(stats) #Imports survey data 
library(ggeffects) #Predicted Probabilities from Regressions

Warning: package 'ggeffects' was built under R version 4.4.3

library(nnet) #For multinomial models
library(MASS) #For ordered models


Attaching package: 'MASS'

The following object is masked from 'package:patchwork':

    area

The following object is masked from 'package:dplyr':

    select

library(brant) #Test parallel regression assumption

Warning: package 'brant' was built under R version 4.4.3

library(boot) #Create CIs for Multinomial Modeling


Attaching package: 'boot'

The following object is masked from 'package:psych':

    logit

library(cem) #Coarsened Exact Matching

Warning: package 'cem' was built under R version 4.4.3

Loading required package: tcltk
Loading required package: lattice

Attaching package: 'lattice'

The following object is masked from 'package:boot':

    melanoma


How to use CEM? Type vignette("cem")

library(MatchIt)  #Coarsened Exact Matching

Warning: package 'MatchIt' was built under R version 4.4.3

library(stargazer)  #Regression Output
library(jtools)  #Regression Output
library(WeightIt) #Entropy Balancing

Warning: package 'WeightIt' was built under R version 4.4.3

library(Hmisc) #General Modeling


Attaching package: 'Hmisc'

The following object is masked from 'package:jtools':

    %nin%

The following object is masked from 'package:psych':

    describe

The following objects are masked from 'package:dplyr':

    src, summarize

The following objects are masked from 'package:base':

    format.pval, units

library(ebal) #Entropy Balancing

##
## ebal Package: Implements Entropy Balancing.

## See http://www.stanford.edu/~jhain/ for additional information.

library(survey) #Applying EB weights

Loading required package: grid
Loading required package: Matrix

Attaching package: 'Matrix'

The following objects are masked from 'package:tidyr':

    expand, pack, unpack

Loading required package: survival

Attaching package: 'survival'

The following object is masked from 'package:boot':

    aml


Attaching package: 'survey'

The following object is masked from 'package:Hmisc':

    deff

The following object is masked from 'package:WeightIt':

    calibrate

The following object is masked from 'package:graphics':

    dotchart

Data Import

GSS2022 <- read_dta("研一下/DACSS 790Q/Final Project/2022/GSS2022.dta")
head(GSS2022)

# A tibble: 6 × 1,185
  year         id wrkstat hrs1        hrs2        evwork      wrkslf  occ10     
  <dbl+lbl> <dbl> <dbl+l> <dbl+lbl>   <dbl+lbl>   <dbl+lbl>   <dbl+l> <dbl+lbl> 
1 2022          1 1 [wor…    40       NA(i) [iap] NA(i) [iap] 2 [som…  430 [man…
2 2022          2 5 [ret… NA(i) [iap] NA(i) [iap]     1 [yes] 2 [som…   50 [mar…
3 2022          3 1 [wor…    52       NA(i) [iap] NA(i) [iap] 2 [som… 4610 [per…
4 2022          4 3 [wit… NA(i) [iap]    25       NA(i) [iap] 2 [som… 4120 [foo…
5 2022          5 8 [oth… NA(i) [iap] NA(i) [iap]     1 [yes] 2 [som… 7330 [ind…
6 2022          6 1 [wor…    50       NA(i) [iap] NA(i) [iap] 2 [som… 4610 [per…
# ℹ 1,177 more variables: prestg10 <dbl+lbl>, prestg105plus <dbl+lbl>,
#   indus10 <dbl+lbl>, marital <dbl+lbl>, martype <dbl+lbl>, divorce <dbl+lbl>,
#   widowed <dbl+lbl>, spwrksta <dbl+lbl>, sphrs1 <dbl+lbl>, sphrs2 <dbl+lbl>,
#   spevwork <dbl+lbl>, cowrksta <dbl+lbl>, coevwork <dbl+lbl>,
#   cohrs1 <dbl+lbl>, cohrs2 <dbl+lbl>, spwrkslf <dbl+lbl>, sppres80 <dbl+lbl>,
#   spocc10 <dbl+lbl>, sppres10 <dbl+lbl>, sppres105plus <dbl+lbl>,
#   spind10 <dbl+lbl>, coocc10 <dbl+lbl>, coind10 <dbl+lbl>, …

Setting data set

# Change age to age group! 
age_breaks <- c(18, 30, 50, 65, 100)

GSS2022$age_group <- cut(GSS2022$age, 
                      age_breaks, labels = c("18-29", "30-49", "50-64", "65+"), 
                      include.lowest = TRUE) 
                      
GSS2022<- GSS2022 %>% mutate(age_group= recode(age_group,"18-29"="1", "30-49"="2", "50-64"="3", "65+"="4"))

test_data <- GSS2022 %>% dplyr::select(c(wkvsfam, famvswk, physhlth, mntlhlth, marital,hompop_exp, age_group, degree, income, race, sex, wrkstat))%>% mutate(wkvsfam= as.numeric(wkvsfam),famvswk= as.numeric(famvswk), physhlth= as.numeric(physhlth), mntlhlth= as.numeric(mntlhlth), marital= as.numeric(marital), hompop_exp= as.numeric(hompop_exp), age_group= as.numeric(age_group), degree= as.numeric(degree), income= as.numeric(income), race= as.numeric(race), sex= as.numeric(sex), wrkstat=as.numeric(wrkstat))

new_names <- c("job_interfere_with_family", "family_interfere_with_job", "physical_health_status", "mental_health_status", "marriage_status", "total_people_in_household", "age_group","education", "family_income", "race", "sex", "working_status")

colnames(test_data) <- new_names #Apply new names to your data frame
skim(test_data)

Data summary
Name	test_data
Number of rows	4149
Number of columns	12
_______________________
Column type frequency:
numeric	12
________________________
Group variables	None

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
job_interfere_with_family	1767	0.57	2.69	0.93	1	2	3	3	4	▂▇▁▇▅
family_interfere_with_job	1779	0.57	2.94	0.86	1	2	3	4	4	▁▆▁▇▆
physical_health_status	1844	0.56	2.67	6.11	0	0	0	2	30	▇▁▁▁▁
mental_health_status	1870	0.55	4.53	7.93	0	0	0	5	30	▇▁▁▁▁
marriage_status	16	1.00	2.77	1.74	1	1	3	5	5	▇▁▃▁▆
total_people_in_household	8	1.00	1.75	1.67	0	1	1	2	49	▇▁▁▁▁
age_group	256	0.94	2.46	1.03	1	2	2	3	4	▅▇▁▅▅
education	2	1.00	1.82	1.27	0	1	1	3	4	▂▇▂▃▂
family_income	484	0.88	11.22	2.07	1	12	12	12	12	▁▁▁▁▇
race	64	0.98	1.51	0.76	1	1	1	2	3	▇▁▂▁▂
sex	23	0.99	1.54	0.50	1	1	2	2	2	▇▁▁▁▇
working_status	9	1.00	3.07	2.34	1	1	2	5	8	▇▁▃▁▂

final_data <- test_data %>% drop_na()
head(final_data)

# A tibble: 6 × 12
  job_interfere_with_family family_interfere_with_job physical_health_status
                      <dbl>                     <dbl>                  <dbl>
1                         2                         3                     30
2                         3                         3                      0
3                         1                         4                      4
4                         1                         3                      2
5                         3                         3                      0
6                         2                         1                      5
# ℹ 9 more variables: mental_health_status <dbl>, marriage_status <dbl>,
#   total_people_in_household <dbl>, age_group <dbl>, education <dbl>,
#   family_income <dbl>, race <dbl>, sex <dbl>, working_status <dbl>

skim(final_data)

Data summary
Name	final_data
Number of rows	1980
Number of columns	12
_______________________
Column type frequency:
numeric	12
________________________
Group variables	None

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
job_interfere_with_family	1	2.67	0.91	1	2	3	3	4	▂▇▁▇▅
family_interfere_with_job	1	2.93	0.86	1	2	3	4	4	▁▆▁▇▆
physical_health_status	1	2.73	6.12	0	0	0	2	30	▇▁▁▁▁
mental_health_status	1	4.60	7.93	0	0	0	5	30	▇▁▁▁▁
marriage_status	1	2.79	1.80	1	1	3	5	5	▇▁▃▁▆
total_people_in_household	1	1.90	1.74	0	1	1	2	43	▇▁▁▁▁
age_group	1	2.16	0.82	1	2	2	3	4	▃▇▁▅▁
education	1	2.05	1.27	0	1	2	3	4	▁▇▂▅▃
family_income	1	11.62	1.48	1	12	12	12	12	▁▁▁▁▇
race	1	1.52	0.77	1	1	1	2	3	▇▁▂▁▂
sex	1	1.50	0.50	1	1	2	2	2	▇▁▁▁▇
working_status	1	1.27	0.54	1	1	1	1	3	▇▁▂▁▁

# Reverse Coding
final_data <- final_data %>%
  mutate(
    job_interfere_with_family = case_when(
      job_interfere_with_family == 1 ~ 4,
      job_interfere_with_family == 2 ~ 3,
      job_interfere_with_family == 3 ~ 2,
      job_interfere_with_family == 4 ~ 1,
      job_interfere_with_family == "NA"~ NA_real_),
    job_interfere_with_family = labelled(job_interfere_with_family, 
      c(`Never` = 1, `Rarely` = 2, `Sometimes` = 3, `Often` = 4))) %>% 
  mutate(family_interfere_with_job = as.numeric(family_interfere_with_job),
    family_interfere_with_job = case_when(
      family_interfere_with_job == 1 ~ 4,
      family_interfere_with_job == 2 ~ 3,
      family_interfere_with_job == 3 ~ 2,
      family_interfere_with_job == 4 ~ 1,
      family_interfere_with_job == "NA"~ NA_real_),
    family_interfere_with_job = labelled(family_interfere_with_job, 
      c(`Never` = 1, `Rarely` = 2, `Sometimes` = 3, `Often` = 4))) %>%
  mutate(physical_health_status = case_when(
    physical_health_status == 0 ~ 0,
    physical_health_status == 1 ~ 1,
    physical_health_status == 2 ~ 2,
    physical_health_status == 3 ~ 3,
    physical_health_status == 4 ~ 4,
    physical_health_status == 5 ~ 5,
    physical_health_status == 6 ~ 6,
    physical_health_status == 7 ~ 7,
    physical_health_status >= 8 ~ 8,
    physical_health_status =="NA"~ NA_real_),
    physical_health_status = labelled(physical_health_status, 
      c(`NONE`=0, `1`=1, `2`=2, `3`=3, `4`=4, `5`=5, `6`=6, `7`=7, `8+`=8)))%>%
  mutate(mental_health_status= case_when(
    mental_health_status== 0 ~ 0,
    mental_health_status== 1 ~ 1,
    mental_health_status== 2 ~ 2,
    mental_health_status== 3 ~ 3,
    mental_health_status== 4 ~ 4,
    mental_health_status== 5 ~ 5,
    mental_health_status== 6 ~ 6,
    mental_health_status== 7 ~ 7,
    mental_health_status>= 8 ~ 8,
    mental_health_status== "NA"~ NA_real_),
    mental_health_status=labelled(mental_health_status, c(`NONE`=0, `1`=1, `2`=2, `3`=3, `4`=4, `5`=5, `6`=6, `7`=7, `8+`=8)))%>%
  mutate(marriage_status= case_when(
    marriage_status== 1~5, 
    marriage_status== 2~4,
    marriage_status== 3~3,
    marriage_status== 4~2,
    marriage_status== 5~1,
    marriage_status== "NA"~ NA_real_),
    marriage_status=labelled(marriage_status, c(`Never married`=1, `Separated`=2, `Divorced`=3, `Widowed`=4, `Married`=5))) %>%
  mutate(age_group= case_when(
    age_group== 1 ~ 1,
    age_group== 2 ~ 2, 
    age_group== "3"~ 3,
    age_group== "4" ~ 4,
    age_group== "NA"~ NA_real_),
    age_group=labelled(age_group, c(`18-29 YEARS OLD`=1, `30-49 YEARS OLD`=2, `50-64 YEARS OLD`=3, `64 YEARS OLD OR OVER`=4)))%>% 
  mutate(education= case_when(
    education== 0 ~ 1,
    education== 1 ~ 2,
    education== 2 ~ 3, 
    education== 3 ~ 4, 
    education== 4 ~ 5,
    education== "NA"~ NA_real_),
    education= labelled(education, c(`Less THAN HIGH SCHOOL`= 1, `HIGH SCHOOL`= 2, `ASSOCIATE/JUNIOR COLLEGE`=3, `BACHELORS`=4, `GRADUATE`= 5)))%>%
  mutate(family_income= case_when(
    family_income== 1 ~ 1, 
    family_income== 2 ~ 2, 
    family_income== 3 ~ 3, 
    family_income== 4 ~ 4, 
    family_income== 5 ~ 5, 
    family_income== 6 ~ 6, 
    family_income== 7 ~ 7, 
    family_income== 8 ~ 8, 
    family_income== 9 ~ 9,
    family_income== 10 ~ 10, 
    family_income== 11 ~ 11, 
    family_income== 12 ~ 12,
    family_income== "NA"~ NA_real_),
    family_income= labelled(family_income, c(`UNDER$1,000`= 1,`$1,000 TO $2,999`= 2, `$3,000 TO $3,999`=3, `$4,000 TO $4,999`=4, `$5,000 TOlibra $5,999`= 5, `$6,000 TO $6,999`=6, `$7,000 TO $7,999`=7, `$8,000 TO $9,999`=8, `$10,000 TO $14,999`= 9, `$15,000 TO $19,999`= 10,`$20,000 TO $24,999`= 11,  `$25,000 OR MORE`= 12))) %>%
  mutate(race= case_when(
    race== 1 ~ 1, 
    race== 2 ~ 2,
    race== 3 ~ 3,
    race== "NA"~ NA_real_),
  race= labelled(race, c(`WHITE`= 1, `BLACK`= 2, `OTHER`=3))) %>%
  mutate(sex= case_when(
    sex== 1 ~ 1, 
    sex== 2 ~ 2,
    sex== "NA"~ NA_real_),
    sex= labelled(sex, c(`MALE`=1, `FEMALE`=2)))%>%
  mutate(total_people_in_household = case_when(
    total_people_in_household == 0 ~ 0,
    total_people_in_household == 1 ~ 1,
    total_people_in_household == 2 ~ 2,
    total_people_in_household == 3 ~ 3,
    total_people_in_household == 4 ~ 4,
    total_people_in_household == 5 ~ 5,
    total_people_in_household == 6 ~ 6,
    total_people_in_household == 7 ~ 7,
    total_people_in_household >= 8 ~ 8,
    total_people_in_household == "NA"~ NA_real_),
    total_people_in_household = labelled(total_people_in_household, 
      c(`NONE`=0, `1`=1, `2`=2, `3`=3, `4`=4, `5`=5, `6`=6, `7`=7, `8+`=8)))%>%
  mutate(working_status = case_when(
    working_status == 1 ~ 1,
    working_status == 2 ~ 2,
    working_status == 3 ~ 3,
    working_status == 4 ~ 4,
    working_status == 5 ~ 5,
    working_status == 6 ~ 6,
    working_status == 7 ~ 7,
    working_status == 8 ~ 8,
    working_status == "NA"~ NA_real_),
    working_status = labelled(working_status,
      c(`WORKING FULL TIME`=1, `WORKING PART TIME`=2,`WITH A JOB, BUT NOT AT WORK BECAUSE OF TEMPORARY ILLNESS,VACATION, STRIKE`=3, `UNEMPLOYED, LAID OFF, LOOKING FOR WORK`=4, `RETIRED`=5, `IN SCHOOL`=6, `KEEPING HOUSE`=7,`OTHER`=8)))
head(final_data)

# A tibble: 6 × 12
  job_interfere_with_family family_interfere_with_job physical_health_status
  <dbl+lbl>                 <dbl+lbl>                 <dbl+lbl>             
1 3 [Sometimes]             2 [Rarely]                8 [8+]                
2 2 [Rarely]                2 [Rarely]                0 [NONE]              
3 4 [Often]                 1 [Never]                 4 [4]                 
4 4 [Often]                 2 [Rarely]                2 [2]                 
5 2 [Rarely]                2 [Rarely]                0 [NONE]              
6 3 [Sometimes]             4 [Often]                 5 [5]                 
# ℹ 9 more variables: mental_health_status <dbl+lbl>,
#   marriage_status <dbl+lbl>, total_people_in_household <dbl+lbl>,
#   age_group <dbl+lbl>, education <dbl+lbl>, family_income <dbl+lbl>,
#   race <dbl+lbl>, sex <dbl+lbl>, working_status <dbl+lbl>

skim(final_data)

Data summary
Name	final_data
Number of rows	1980
Number of columns	12
_______________________
Column type frequency:
numeric	12
________________________
Group variables	None

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
job_interfere_with_family	1	2.33	0.91	1	2	2	3	4	▅▇▁▇▂
family_interfere_with_job	1	2.07	0.86	1	1	2	3	4	▆▇▁▆▁
physical_health_status	1	1.67	2.69	0	0	0	2	8	▇▂▁▁▂
mental_health_status	1	2.54	3.21	0	0	0	5	8	▇▂▁▁▃
marriage_status	1	3.21	1.80	1	1	3	5	5	▆▁▃▁▇
total_people_in_household	1	1.87	1.45	0	1	1	2	8	▇▃▁▁▁
age_group	1	2.16	0.82	1	2	2	3	4	▃▇▁▅▁
education	1	3.05	1.27	1	2	3	4	5	▁▇▂▅▃
family_income	1	11.62	1.48	1	12	12	12	12	▁▁▁▁▇
race	1	1.52	0.77	1	1	1	2	3	▇▁▂▁▂
sex	1	1.50	0.50	1	1	2	2	2	▇▁▁▁▇
working_status	1	1.27	0.54	1	1	1	1	3	▇▁▂▁▁

Separate two scales

data_physical_health<-final_data%>% dplyr::select(c(job_interfere_with_family, family_interfere_with_job, physical_health_status, marriage_status, total_people_in_household,  age_group,education, family_income, race, sex, working_status))

head(data_physical_health)

# A tibble: 6 × 11
  job_interfere_with_family family_interfere_with_job physical_health_status
  <dbl+lbl>                 <dbl+lbl>                 <dbl+lbl>             
1 3 [Sometimes]             2 [Rarely]                8 [8+]                
2 2 [Rarely]                2 [Rarely]                0 [NONE]              
3 4 [Often]                 1 [Never]                 4 [4]                 
4 4 [Often]                 2 [Rarely]                2 [2]                 
5 2 [Rarely]                2 [Rarely]                0 [NONE]              
6 3 [Sometimes]             4 [Often]                 5 [5]                 
# ℹ 8 more variables: marriage_status <dbl+lbl>,
#   total_people_in_household <dbl+lbl>, age_group <dbl+lbl>,
#   education <dbl+lbl>, family_income <dbl+lbl>, race <dbl+lbl>,
#   sex <dbl+lbl>, working_status <dbl+lbl>

data_mental_health <- final_data%>% dplyr::select(c(job_interfere_with_family, family_interfere_with_job, mental_health_status,marriage_status, total_people_in_household,  age_group,education, family_income, race, sex, working_status))

head(data_mental_health)

# A tibble: 6 × 11
  job_interfere_with_family family_interfere_with_job mental_health_status
  <dbl+lbl>                 <dbl+lbl>                 <dbl+lbl>           
1 3 [Sometimes]             2 [Rarely]                8 [8+]              
2 2 [Rarely]                2 [Rarely]                0 [NONE]            
3 4 [Often]                 1 [Never]                 8 [8+]              
4 4 [Often]                 2 [Rarely]                8 [8+]              
5 2 [Rarely]                2 [Rarely]                0 [NONE]            
6 3 [Sometimes]             4 [Often]                 5 [5]               
# ℹ 8 more variables: marriage_status <dbl+lbl>,
#   total_people_in_household <dbl+lbl>, age_group <dbl+lbl>,
#   education <dbl+lbl>, family_income <dbl+lbl>, race <dbl+lbl>,
#   sex <dbl+lbl>, working_status <dbl+lbl>

Ordered Model

Job interfere family

ordered__physical_health_l<- polr(factor(physical_health_status)~job_interfere_with_family+family_interfere_with_job+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, data=data_physical_health, 
           na.action = na.exclude, method = "logistic") 

brant(ordered__physical_health_l)

------------------------------------------------------------ 
Test for            X2  df  probability 
------------------------------------------------------------ 
Omnibus             100.78  70  0.01
job_interfere_with_family   7.19    7   0.41
family_interfere_with_job   1.31    7   0.99
marriage_status     6.3 7   0.5
total_people_in_household   9.91    7   0.19
age_group           8.95    7   0.26
education           16.15   7   0.02
family_income           4.4 7   0.73
race                5.47    7   0.6
sex             8.26    7   0.31
working_status          20.09   7   0.01
------------------------------------------------------------ 

H0: Parallel Regression Assumption holds

summary(ordered__physical_health_l)


Re-fitting to get Hessian

Call:
polr(formula = factor(physical_health_status) ~ job_interfere_with_family + 
    family_interfere_with_job + marriage_status + total_people_in_household + 
    age_group + education + family_income + race + sex + working_status, 
    data = data_physical_health, na.action = na.exclude, method = "logistic")

Coefficients:
                             Value Std. Error t value
job_interfere_with_family  0.19097    0.05864  3.2566
family_interfere_with_job  0.19520    0.06183  3.1570
marriage_status           -0.08711    0.02806 -3.1049
total_people_in_household -0.02185    0.03290 -0.6642
age_group                  0.03027    0.05972  0.5068
education                  0.01863    0.03700  0.5037
family_income             -0.03478    0.02987 -1.1647
race                      -0.12590    0.06093 -2.0663
sex                        0.39306    0.09252  4.2486
working_status             0.37649    0.08301  4.5354

Intercepts:
    Value   Std. Error t value
0|1  1.6507  0.4426     3.7297
1|2  1.8831  0.4431     4.2501
2|3  2.2862  0.4442     5.1466
3|4  2.5800  0.4452     5.7956
4|5  2.7380  0.4458     6.1423
5|6  3.1232  0.4474     6.9811
6|7  3.1895  0.4477     7.1244
7|8  3.4165  0.4489     7.6105

Residual Deviance: 5342.019 
AIC: 5378.019

ordered_physical_health_p<- polr(factor(physical_health_status)~job_interfere_with_family+family_interfere_with_job+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, data=data_physical_health, 
           na.action = na.exclude, method = "probit") 

brant(ordered_physical_health_p)

------------------------------------------------------------ 
Test for            X2  df  probability 
------------------------------------------------------------ 
Omnibus             100.78  70  0.01
job_interfere_with_family   7.19    7   0.41
family_interfere_with_job   1.31    7   0.99
marriage_status     6.3 7   0.5
total_people_in_household   9.91    7   0.19
age_group           8.95    7   0.26
education           16.15   7   0.02
family_income           4.4 7   0.73
race                5.47    7   0.6
sex             8.26    7   0.31
working_status          20.09   7   0.01
------------------------------------------------------------ 

H0: Parallel Regression Assumption holds

summary(ordered_physical_health_p)


Re-fitting to get Hessian

Call:
polr(formula = factor(physical_health_status) ~ job_interfere_with_family + 
    family_interfere_with_job + marriage_status + total_people_in_household + 
    age_group + education + family_income + race + sex + working_status, 
    data = data_physical_health, na.action = na.exclude, method = "probit")

Coefficients:
                              Value Std. Error t value
job_interfere_with_family  0.103555    0.03461  2.9918
family_interfere_with_job  0.109271    0.03657  2.9880
marriage_status           -0.051186    0.01657 -3.0897
total_people_in_household -0.015420    0.01937 -0.7960
age_group                  0.027511    0.03545  0.7761
education                  0.002272    0.02196  0.1035
family_income             -0.020732    0.01823 -1.1375
race                      -0.076628    0.03580 -2.1405
sex                        0.225815    0.05485  4.1171
working_status             0.228849    0.04905  4.6660

Intercepts:
    Value   Std. Error t value
0|1  0.9376  0.2652     3.5357
1|2  1.0790  0.2654     4.0662
2|3  1.3193  0.2658     4.9641
3|4  1.4894  0.2661     5.5966
4|5  1.5788  0.2663     5.9276
5|6  1.7896  0.2669     6.7063
6|7  1.8249  0.2669     6.8362
7|8  1.9434  0.2673     7.2707

Residual Deviance: 5346.804 
AIC: 5382.804

ols_physical_health_p<-glm(as.numeric(physical_health_status)~job_interfere_with_family+family_interfere_with_job+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, data=data_physical_health) #PID treated as factor
summary(ols_physical_health_p)


Call:
glm(formula = as.numeric(physical_health_status) ~ job_interfere_with_family + 
    family_interfere_with_job + marriage_status + total_people_in_household + 
    age_group + education + family_income + race + sex + working_status, 
    data = data_physical_health)

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)                0.31347    0.59893   0.523   0.6008    
job_interfere_with_family  0.18328    0.07658   2.393   0.0168 *  
family_interfere_with_job  0.20888    0.08143   2.565   0.0104 *  
marriage_status           -0.08703    0.03649  -2.385   0.0172 *  
total_people_in_household -0.02183    0.04197  -0.520   0.6031    
age_group                  0.07287    0.07846   0.929   0.3531    
education                 -0.04222    0.04856  -0.869   0.3848    
family_income             -0.03262    0.04175  -0.781   0.4347    
race                      -0.17351    0.07818  -2.219   0.0266 *  
sex                        0.48306    0.12083   3.998 6.63e-05 ***
working_status             0.55923    0.11189   4.998 6.31e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 6.970247)

    Null deviance: 14313  on 1979  degrees of freedom
Residual deviance: 13724  on 1969  degrees of freedom
AIC: 9476.4

Number of Fisher Scoring iterations: 2

export_summs(ordered__physical_health_l, ordered_physical_health_p, ols_physical_health_p,digits=3)


Re-fitting to get Hessian


Re-fitting to get Hessian

Warning in FUN(X[[i]], ...): tidy() does not return p values for models of
class data.frame; significance stars not printed.
Warning in FUN(X[[i]], ...): tidy() does not return p values for models of
class data.frame; significance stars not printed.

	Model 1	Model 2	Model 3
job_interfere_with_family	0.191	0.104	0.183 *
	(0.059)	(0.035)	(0.077)
family_interfere_with_job	0.195	0.109	0.209 *
	(0.062)	(0.037)	(0.081)
marriage_status	-0.087	-0.051	-0.087 *
	(0.028)	(0.017)	(0.036)
total_people_in_household	-0.022	-0.015	-0.022
	(0.033)	(0.019)	(0.042)
age_group	0.030	0.028	0.073
	(0.060)	(0.035)	(0.078)
education	0.019	0.002	-0.042
	(0.037)	(0.022)	(0.049)
family_income	-0.035	-0.021	-0.033
	(0.030)	(0.018)	(0.042)
race	-0.126	-0.077	-0.174 *
	(0.061)	(0.036)	(0.078)
sex	0.393	0.226	0.483 ***
	(0.093)	(0.055)	(0.121)
working_status	0.376	0.229	0.559 ***
	(0.083)	(0.049)	(0.112)
0\|1	1.651	0.938
	(0.443)	(0.265)
1\|2	1.883	1.079
	(0.443)	(0.265)
2\|3	2.286	1.319
	(0.444)	(0.266)
3\|4	2.580	1.489
	(0.445)	(0.266)
4\|5	2.738	1.579
	(0.446)	(0.266)
5\|6	3.123	1.790
	(0.447)	(0.267)
6\|7	3.189	1.825
	(0.448)	(0.267)
7\|8	3.416	1.943
	(0.449)	(0.267)
(Intercept)			0.313
			(0.599)
N	1980.000	1980.000	1980
AIC	5378.019	5382.804	9476.434
BIC	5478.654	5483.440	9543.524
Pseudo R2			0.041
* p < 0.001; p < 0.01; * p < 0.05.

stargazer(ordered__physical_health_l, ordered_physical_health_p, ols_physical_health_p,digits=3, type="text")


====================================================================================
                                             Dependent variable:                    
                          ----------------------------------------------------------
                          physical_health_status  as.numeric(physical_health_status)
                            ordered     ordered                 normal              
                           logistic     probit                                      
                              (1)         (2)                    (3)                
------------------------------------------------------------------------------------
job_interfere_with_family  0.191***    0.104***                0.183**              
                            (0.059)     (0.035)                (0.077)              
                                                                                    
family_interfere_with_job  0.195***    0.109***                0.209**              
                            (0.062)     (0.037)                (0.081)              
                                                                                    
marriage_status            -0.087***   -0.051***               -0.087**             
                            (0.028)     (0.017)                (0.036)              
                                                                                    
total_people_in_household   -0.022      -0.015                  -0.022              
                            (0.033)     (0.019)                (0.042)              
                                                                                    
age_group                    0.030       0.028                  0.073               
                            (0.060)     (0.035)                (0.078)              
                                                                                    
education                    0.019       0.002                  -0.042              
                            (0.037)     (0.022)                (0.049)              
                                                                                    
family_income               -0.035      -0.021                  -0.033              
                            (0.030)     (0.018)                (0.042)              
                                                                                    
race                       -0.126**    -0.077**                -0.174**             
                            (0.061)     (0.036)                (0.078)              
                                                                                    
sex                        0.393***    0.226***                0.483***             
                            (0.093)     (0.055)                (0.121)              
                                                                                    
working_status             0.376***    0.229***                0.559***             
                            (0.083)     (0.049)                (0.112)              
                                                                                    
Constant                                                        0.313               
                                                               (0.599)              
                                                                                    
------------------------------------------------------------------------------------
Observations                 1,980       1,980                  1,980               
Log Likelihood                                                -4,727.217            
Akaike Inf. Crit.                                             9,476.434             
====================================================================================
Note:                                                    *p<0.1; **p<0.05; ***p<0.01

c_physical_health <- ggpredict(ordered__physical_health_l, terms="job_interfere_with_family")
print(c_physical_health)

# Predicted probabilities of physical_health_status

physical_health_status: 0

job_interfere_with_family | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.69 | 0.48, 0.84
                        2 |      0.64 | 0.43, 0.81
                        3 |      0.60 | 0.38, 0.78
                        4 |      0.55 | 0.33, 0.75

physical_health_status: 1

job_interfere_with_family | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.05 | 0.02, 0.10
                        2 |      0.05 | 0.02, 0.11
                        3 |      0.05 | 0.02, 0.12
                        4 |      0.06 | 0.02, 0.13

physical_health_status: 2

job_interfere_with_family | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.07 | 0.03, 0.15
                        2 |      0.08 | 0.03, 0.17
                        3 |      0.08 | 0.04, 0.18
                        4 |      0.09 | 0.04, 0.20

physical_health_status: 3

job_interfere_with_family | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.04 | 0.02, 0.09
                        2 |      0.05 | 0.02, 0.10
                        3 |      0.05 | 0.02, 0.12
                        4 |      0.06 | 0.02, 0.13

physical_health_status: 4

job_interfere_with_family | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.02 | 0.01, 0.04
                        2 |      0.02 | 0.01, 0.05
                        3 |      0.02 | 0.01, 0.06
                        4 |      0.03 | 0.01, 0.07

physical_health_status: 5

job_interfere_with_family | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.04 | 0.02, 0.09
                        2 |      0.04 | 0.02, 0.10
                        3 |      0.05 | 0.02, 0.11
                        4 |      0.06 | 0.02, 0.13

physical_health_status: 6

job_interfere_with_family | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.01 | 0.00, 0.01
                        2 |      0.01 | 0.00, 0.02
                        3 |      0.01 | 0.00, 0.02
                        4 |      0.01 | 0.00, 0.02

physical_health_status: 7

job_interfere_with_family | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.02 | 0.01, 0.04
                        2 |      0.02 | 0.01, 0.05
                        3 |      0.02 | 0.01, 0.05
                        4 |      0.03 | 0.01, 0.06

physical_health_status: 8

job_interfere_with_family | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.07 | 0.03, 0.15
                        2 |      0.09 | 0.04, 0.18
                        3 |      0.10 | 0.05, 0.22
                        4 |      0.12 | 0.05, 0.26

Adjusted for:
* family_interfere_with_job =  2.07
*           marriage_status =  3.21
* total_people_in_household =  1.87
*                 age_group =  2.16
*                 education =  3.05
*             family_income = 11.62
*                      race =  1.52
*                       sex =  1.50
*            working_status =  1.27

ggplot(c_physical_health, aes(x = response.level, y = predicted, fill = factor(x))) +
  geom_bar(stat = "identity", position = "dodge") +  # Bar plot
  # Add error bars for confidence intervals
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.4, position = position_dodge(width = 0.7))+
  theme_minimal(base_size = 13)+
  labs(x = "Response Level: Physical Health Status", y = "Predicted Probability", 
       title = "Predicted Probability about Physical Health Status with Job interfere the family life")+
 scale_fill_discrete(labels= c("1"= "NEVER","2"="RARELY","3"="SOMETIMES","4"="OFTEN"))+
  scale_x_discrete(labels= c("0"="None","1"="One","2"="Two", "3"="Three", "4"="Four", "5"="Five", "6"= "Six", "7"= "Seven", "8"="Over Eight"))+
  guides(fill = guide_legend(title = "Job interfere the family life", nrow=1), color = "none")+
   theme(legend.position = "bottom")

d_physical_health<- ggpredict(ordered__physical_health_l, terms="family_interfere_with_job")
print(d_physical_health)

# Predicted probabilities of physical_health_status

physical_health_status: 0

family_interfere_with_job | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.68 | 0.47, 0.83
                        2 |      0.63 | 0.42, 0.80
                        3 |      0.59 | 0.37, 0.77
                        4 |      0.54 | 0.32, 0.74

physical_health_status: 1

family_interfere_with_job | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.05 | 0.02, 0.11
                        2 |      0.05 | 0.02, 0.12
                        3 |      0.06 | 0.02, 0.12
                        4 |      0.06 | 0.02, 0.13

physical_health_status: 2

family_interfere_with_job | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.07 | 0.03, 0.16
                        2 |      0.08 | 0.04, 0.17
                        3 |      0.09 | 0.04, 0.19
                        4 |      0.09 | 0.04, 0.20

physical_health_status: 3

family_interfere_with_job | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.04 | 0.02, 0.10
                        2 |      0.05 | 0.02, 0.11
                        3 |      0.05 | 0.02, 0.12
                        4 |      0.06 | 0.02, 0.14

physical_health_status: 4

family_interfere_with_job | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.02 | 0.01, 0.05
                        2 |      0.02 | 0.01, 0.05
                        3 |      0.03 | 0.01, 0.06
                        4 |      0.03 | 0.01, 0.07

physical_health_status: 5

family_interfere_with_job | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.04 | 0.02, 0.09
                        2 |      0.05 | 0.02, 0.10
                        3 |      0.05 | 0.02, 0.12
                        4 |      0.06 | 0.02, 0.14

physical_health_status: 6

family_interfere_with_job | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.01 | 0.00, 0.01
                        2 |      0.01 | 0.00, 0.02
                        3 |      0.01 | 0.00, 0.02
                        4 |      0.01 | 0.00, 0.02

physical_health_status: 7

family_interfere_with_job | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.02 | 0.01, 0.04
                        2 |      0.02 | 0.01, 0.05
                        3 |      0.02 | 0.01, 0.06
                        4 |      0.03 | 0.01, 0.07

physical_health_status: 8

family_interfere_with_job | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.08 | 0.03, 0.16
                        2 |      0.09 | 0.04, 0.19
                        3 |      0.11 | 0.05, 0.23
                        4 |      0.13 | 0.06, 0.27

Adjusted for:
* job_interfere_with_family =  2.33
*           marriage_status =  3.21
* total_people_in_household =  1.87
*                 age_group =  2.16
*                 education =  3.05
*             family_income = 11.62
*                      race =  1.52
*                       sex =  1.50
*            working_status =  1.27

ggplot(d_physical_health, aes(x = response.level, y = predicted, fill = factor(x))) +
  geom_bar(stat = "identity", position = "dodge") +  # Bar plot
  # Add error bars for confidence intervals
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.4, position = position_dodge(width = 0.7))+
  theme_minimal(base_size = 13)+
   labs(x = "Response Level: Physical Health Status", y = "Predicted Probability", 
       title = "Predicted Probability about Physical Health Status \n with Family life interfere the Job")+
 scale_fill_discrete(labels= c("1"= "NEVER","2"="RARELY","3"="SOMETIMES","4"="OFTEN"))+
  scale_x_discrete(labels= c("0"="None","1"="One","2"="Two", "3"="Three", "4"="Four", "5"="Five", "6"= "Six", "7"= "Seven", "8"="Over Eight"))+
  guides(fill = guide_legend(title = "Family life interfere the Job"), color = "none")+
   theme(legend.position = "bottom")

Family interfere job

ordered_mental_health_l<- polr(factor(mental_health_status)~job_interfere_with_family+family_interfere_with_job+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, data=data_mental_health, 
           na.action = na.exclude, method = "logistic") 

brant(ordered_mental_health_l)

------------------------------------------------------------ 
Test for            X2  df  probability 
------------------------------------------------------------ 
Omnibus             104.97  70  0
job_interfere_with_family   9.52    7   0.22
family_interfere_with_job   9.59    7   0.21
marriage_status     6.37    7   0.5
total_people_in_household   8.35    7   0.3
age_group           15.23   7   0.03
education           31.63   7   0
family_income           6.19    7   0.52
race                8.32    7   0.31
sex             2.94    7   0.89
working_status          2.78    7   0.9
------------------------------------------------------------ 

H0: Parallel Regression Assumption holds

summary(ordered_mental_health_l)


Re-fitting to get Hessian

Call:
polr(formula = factor(mental_health_status) ~ job_interfere_with_family + 
    family_interfere_with_job + marriage_status + total_people_in_household + 
    age_group + education + family_income + race + sex + working_status, 
    data = data_mental_health, na.action = na.exclude, method = "logistic")

Coefficients:
                             Value Std. Error  t value
job_interfere_with_family  0.34172    0.05771   5.9218
family_interfere_with_job  0.23714    0.06015   3.9422
marriage_status           -0.08133    0.02681  -3.0338
total_people_in_household -0.02933    0.03107  -0.9441
age_group                 -0.60914    0.06065 -10.0430
education                 -0.02017    0.03625  -0.5564
family_income             -0.04240    0.03041  -1.3940
race                      -0.32039    0.05989  -5.3499
sex                        0.60467    0.08985   6.7295
working_status             0.18643    0.08189   2.2766

Intercepts:
    Value    Std. Error t value 
0|1  -0.1414   0.4443    -0.3182
1|2   0.0662   0.4444     0.1490
2|3   0.4033   0.4446     0.9071
3|4   0.6334   0.4447     1.4242
4|5   0.7574   0.4448     1.7026
5|6   1.1895   0.4453     2.6712
6|7   1.2575   0.4454     2.8233
7|8   1.3937   0.4457     3.1274

Residual Deviance: 5728.532 
AIC: 5764.532

ordered_mental_health_p<- polr(factor(mental_health_status)~job_interfere_with_family+family_interfere_with_job+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, data=data_mental_health, 
           na.action = na.exclude, method = "probit") 

brant(ordered_mental_health_p)

------------------------------------------------------------ 
Test for            X2  df  probability 
------------------------------------------------------------ 
Omnibus             104.97  70  0
job_interfere_with_family   9.52    7   0.22
family_interfere_with_job   9.59    7   0.21
marriage_status     6.37    7   0.5
total_people_in_household   8.35    7   0.3
age_group           15.23   7   0.03
education           31.63   7   0
family_income           6.19    7   0.52
race                8.32    7   0.31
sex             2.94    7   0.89
working_status          2.78    7   0.9
------------------------------------------------------------ 

H0: Parallel Regression Assumption holds

summary(ordered_mental_health_p)


Re-fitting to get Hessian

Call:
polr(formula = factor(mental_health_status) ~ job_interfere_with_family + 
    family_interfere_with_job + marriage_status + total_people_in_household + 
    age_group + education + family_income + race + sex + working_status, 
    data = data_mental_health, na.action = na.exclude, method = "probit")

Coefficients:
                             Value Std. Error t value
job_interfere_with_family  0.19746    0.03399  5.8092
family_interfere_with_job  0.13401    0.03590  3.7324
marriage_status           -0.04959    0.01615 -3.0711
total_people_in_household -0.01819    0.01867 -0.9742
age_group                 -0.36014    0.03602 -9.9984
education                 -0.01752    0.02168 -0.8083
family_income             -0.02742    0.01818 -1.5084
race                      -0.18687    0.03545 -5.2722
sex                        0.35866    0.05370  6.6783
working_status             0.10915    0.04912  2.2221

Intercepts:
    Value   Std. Error t value
0|1 -0.1534  0.2648    -0.5795
1|2 -0.0276  0.2648    -0.1041
2|3  0.1757  0.2648     0.6633
3|4  0.3132  0.2649     1.1824
4|5  0.3868  0.2649     1.4602
5|6  0.6395  0.2650     2.4137
6|7  0.6787  0.2650     2.5613
7|8  0.7564  0.2650     2.8540

Residual Deviance: 5737.719 
AIC: 5773.719

ols_mental_health_p<-glm(as.numeric(mental_health_status)~job_interfere_with_family+family_interfere_with_job+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, data=data_mental_health, family=gaussian(link="identity")) #PID treated as factor
summary(ols_mental_health_p)


Call:
glm(formula = as.numeric(mental_health_status) ~ job_interfere_with_family + 
    family_interfere_with_job + marriage_status + total_people_in_household + 
    age_group + education + family_income + race + sex + working_status, 
    family = gaussian(link = "identity"), data = data_mental_health)

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)                3.15995    0.67695   4.668 3.25e-06 ***
job_interfere_with_family  0.51472    0.08655   5.947 3.22e-09 ***
family_interfere_with_job  0.27537    0.09204   2.992  0.00281 ** 
marriage_status           -0.12581    0.04125  -3.050  0.00232 ** 
total_people_in_household -0.03206    0.04744  -0.676  0.49932    
age_group                 -0.85738    0.08867  -9.669  < 2e-16 ***
education                 -0.10617    0.05489  -1.934  0.05322 .  
family_income             -0.07050    0.04719  -1.494  0.13533    
race                      -0.42737    0.08836  -4.836 1.42e-06 ***
sex                        0.87432    0.13657   6.402 1.91e-10 ***
working_status             0.31398    0.12647   2.483  0.01312 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 8.904412)

    Null deviance: 20353  on 1979  degrees of freedom
Residual deviance: 17533  on 1969  degrees of freedom
AIC: 9961.3

Number of Fisher Scoring iterations: 2

stargazer(ordered_mental_health_l, ordered_mental_health_p, ols_mental_health_p,digits=3, type="text")


================================================================================
                                           Dependent variable:                  
                          ------------------------------------------------------
                          mental_health_status  as.numeric(mental_health_status)
                           ordered    ordered                normal             
                           logistic    probit                                   
                             (1)        (2)                   (3)               
--------------------------------------------------------------------------------
job_interfere_with_family  0.342***   0.197***              0.515***            
                           (0.058)    (0.034)               (0.087)             
                                                                                
family_interfere_with_job  0.237***   0.134***              0.275***            
                           (0.060)    (0.036)               (0.092)             
                                                                                
marriage_status           -0.081***  -0.050***             -0.126***            
                           (0.027)    (0.016)               (0.041)             
                                                                                
total_people_in_household   -0.029     -0.018                -0.032             
                           (0.031)    (0.019)               (0.047)             
                                                                                
age_group                 -0.609***  -0.360***             -0.857***            
                           (0.061)    (0.036)               (0.089)             
                                                                                
education                   -0.020     -0.018               -0.106*             
                           (0.036)    (0.022)               (0.055)             
                                                                                
family_income               -0.042     -0.027                -0.070             
                           (0.030)    (0.018)               (0.047)             
                                                                                
race                      -0.320***  -0.187***             -0.427***            
                           (0.060)    (0.035)               (0.088)             
                                                                                
sex                        0.605***   0.359***              0.874***            
                           (0.090)    (0.054)               (0.137)             
                                                                                
working_status             0.186**    0.109**               0.314**             
                           (0.082)    (0.049)               (0.126)             
                                                                                
Constant                                                    3.160***            
                                                            (0.677)             
                                                                                
--------------------------------------------------------------------------------
Observations                1,980      1,980                 1,980              
Log Likelihood                                             -4,969.664           
Akaike Inf. Crit.                                          9,961.329            
================================================================================
Note:                                                *p<0.1; **p<0.05; ***p<0.01

export_summs (ordered_mental_health_l, ordered_mental_health_p, ols_mental_health_p)


Re-fitting to get Hessian


Re-fitting to get Hessian

Warning in FUN(X[[i]], ...): tidy() does not return p values for models of
class data.frame; significance stars not printed.
Warning in FUN(X[[i]], ...): tidy() does not return p values for models of
class data.frame; significance stars not printed.

	Model 1	Model 2	Model 3
job_interfere_with_family	0.34	0.20	0.51 ***
	(0.06)	(0.03)	(0.09)
family_interfere_with_job	0.24	0.13	0.28 **
	(0.06)	(0.04)	(0.09)
marriage_status	-0.08	-0.05	-0.13 **
	(0.03)	(0.02)	(0.04)
total_people_in_household	-0.03	-0.02	-0.03
	(0.03)	(0.02)	(0.05)
age_group	-0.61	-0.36	-0.86 ***
	(0.06)	(0.04)	(0.09)
education	-0.02	-0.02	-0.11
	(0.04)	(0.02)	(0.05)
family_income	-0.04	-0.03	-0.07
	(0.03)	(0.02)	(0.05)
race	-0.32	-0.19	-0.43 ***
	(0.06)	(0.04)	(0.09)
sex	0.60	0.36	0.87 ***
	(0.09)	(0.05)	(0.14)
working_status	0.19	0.11	0.31 *
	(0.08)	(0.05)	(0.13)
0\|1	-0.14	-0.15
	(0.44)	(0.26)
1\|2	0.07	-0.03
	(0.44)	(0.26)
2\|3	0.40	0.18
	(0.44)	(0.26)
3\|4	0.63	0.31
	(0.44)	(0.26)
4\|5	0.76	0.39
	(0.44)	(0.26)
5\|6	1.19	0.64
	(0.45)	(0.26)
6\|7	1.26	0.68
	(0.45)	(0.26)
7\|8	1.39	0.76
	(0.45)	(0.27)
(Intercept)			3.16 ***
			(0.68)
N	1980.00	1980.00	1980
AIC	5764.53	5773.72	9961.33
BIC	5865.17	5874.35	10028.42
Pseudo R2			0.14
* p < 0.001; p < 0.01; * p < 0.05.

c_mental_health <- ggpredict(ordered_mental_health_l, terms="job_interfere_with_family")
print(c_mental_health)

# Predicted probabilities of mental_health_status

mental_health_status: 0

job_interfere_with_family | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.63 | 0.43, 0.80
                        2 |      0.55 | 0.34, 0.74
                        3 |      0.47 | 0.27, 0.68
                        4 |      0.38 | 0.20, 0.61

mental_health_status: 1

job_interfere_with_family | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.05 | 0.02, 0.10
                        2 |      0.05 | 0.02, 0.11
                        3 |      0.05 | 0.02, 0.12
                        4 |      0.05 | 0.02, 0.12

mental_health_status: 2

job_interfere_with_family | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.07 | 0.03, 0.15
                        2 |      0.08 | 0.03, 0.17
                        3 |      0.08 | 0.04, 0.18
                        4 |      0.08 | 0.04, 0.19

mental_health_status: 3

job_interfere_with_family | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.04 | 0.02, 0.09
                        2 |      0.05 | 0.02, 0.11
                        3 |      0.05 | 0.02, 0.12
                        4 |      0.06 | 0.02, 0.13

mental_health_status: 4

job_interfere_with_family | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.02 | 0.01, 0.05
                        2 |      0.02 | 0.01, 0.05
                        3 |      0.03 | 0.01, 0.06
                        4 |      0.03 | 0.01, 0.07

mental_health_status: 5

job_interfere_with_family | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.06 | 0.03, 0.13
                        2 |      0.07 | 0.03, 0.15
                        3 |      0.09 | 0.04, 0.18
                        4 |      0.10 | 0.04, 0.21

mental_health_status: 6

job_interfere_with_family | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.01 | 0.00, 0.02
                        2 |      0.01 | 0.00, 0.02
                        3 |      0.01 | 0.00, 0.03
                        4 |      0.01 | 0.01, 0.03

mental_health_status: 7

job_interfere_with_family | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.01 | 0.01, 0.03
                        2 |      0.02 | 0.01, 0.04
                        3 |      0.02 | 0.01, 0.05
                        4 |      0.03 | 0.01, 0.06

mental_health_status: 8

job_interfere_with_family | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.11 | 0.05, 0.23
                        2 |      0.15 | 0.07, 0.29
                        3 |      0.20 | 0.09, 0.37
                        4 |      0.26 | 0.12, 0.46

Adjusted for:
* family_interfere_with_job =  2.07
*           marriage_status =  3.21
* total_people_in_household =  1.87
*                 age_group =  2.16
*                 education =  3.05
*             family_income = 11.62
*                      race =  1.52
*                       sex =  1.50
*            working_status =  1.27

ggplot(c_mental_health, aes(x = response.level, y = predicted, fill = factor(x))) +
  geom_bar(stat = "identity", position = "dodge") +  # Bar plot
  # Add error bars for confidence intervals
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.4, position = position_dodge(width = 0.7))+
  theme_minimal(base_size = 13)+
  labs(x = "Response Level: Mental Health Status", y = "Predicted Probability", 
       title = "Predicted Probability about Mental Health Status with Job interfere the family life")+
 scale_fill_discrete(labels= c("1"= "NEVER","2"="RARELY","3"="SOMETIMES","4"="OFTEN"))+
  scale_x_discrete(labels= c("0"="None","1"="One","2"="Two", "3"="Three", "4"="Four", "5"="Five", "6"= "Six", "7"= "Seven", "8"="Over Eight"))+
  guides(fill = guide_legend(title = "Job interfere the family life", nrow=1), color = "none")+
   theme(legend.position = "bottom")

d_mental_health <- ggpredict(ordered_mental_health_l, terms="family_interfere_with_job")
print(d_mental_health)

# Predicted probabilities of mental_health_status

mental_health_status: 0

family_interfere_with_job | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.59 | 0.38, 0.77
                        2 |      0.53 | 0.32, 0.73
                        3 |      0.47 | 0.27, 0.68
                        4 |      0.41 | 0.22, 0.64

mental_health_status: 1

family_interfere_with_job | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.05 | 0.02, 0.11
                        2 |      0.05 | 0.02, 0.11
                        3 |      0.05 | 0.02, 0.12
                        4 |      0.05 | 0.02, 0.12

mental_health_status: 2

family_interfere_with_job | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.07 | 0.03, 0.16
                        2 |      0.08 | 0.03, 0.17
                        3 |      0.08 | 0.04, 0.18
                        4 |      0.08 | 0.04, 0.19

mental_health_status: 3

family_interfere_with_job | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.04 | 0.02, 0.10
                        2 |      0.05 | 0.02, 0.11
                        3 |      0.05 | 0.02, 0.12
                        4 |      0.06 | 0.02, 0.13

mental_health_status: 4

family_interfere_with_job | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.02 | 0.01, 0.05
                        2 |      0.02 | 0.01, 0.06
                        3 |      0.03 | 0.01, 0.06
                        4 |      0.03 | 0.01, 0.07

mental_health_status: 5

family_interfere_with_job | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.07 | 0.03, 0.14
                        2 |      0.08 | 0.03, 0.16
                        3 |      0.09 | 0.04, 0.18
                        4 |      0.09 | 0.04, 0.21

mental_health_status: 6

family_interfere_with_job | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.01 | 0.00, 0.02
                        2 |      0.01 | 0.00, 0.02
                        3 |      0.01 | 0.00, 0.03
                        4 |      0.01 | 0.01, 0.03

mental_health_status: 7

family_interfere_with_job | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.02 | 0.01, 0.04
                        2 |      0.02 | 0.01, 0.04
                        3 |      0.02 | 0.01, 0.05
                        4 |      0.03 | 0.01, 0.06

mental_health_status: 8

family_interfere_with_job | Predicted |     95% CI
--------------------------------------------------
                        1 |      0.13 | 0.06, 0.26
                        2 |      0.16 | 0.07, 0.31
                        3 |      0.20 | 0.09, 0.37
                        4 |      0.24 | 0.11, 0.44

Adjusted for:
* job_interfere_with_family =  2.33
*           marriage_status =  3.21
* total_people_in_household =  1.87
*                 age_group =  2.16
*                 education =  3.05
*             family_income = 11.62
*                      race =  1.52
*                       sex =  1.50
*            working_status =  1.27

ggplot(d_mental_health, aes(x = response.level, y = predicted, fill = factor(x))) +
  geom_bar(stat = "identity", position = "dodge") +  # Bar plot
  # Add error bars for confidence intervals
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.4, position = position_dodge(width = 0.7))+
  theme_minimal(base_size = 13)+
  labs(x = "Response Level: Mental Health Status", y = "Predicted Probability", 
       title = "Predicted Probability about Mental Health Status \n with Family life interfere the Job")+
 scale_fill_discrete(labels= c("1"= "NEVER","2"="RARELY","3"="SOMETIMES","4"="OFTEN"))+
  scale_x_discrete(labels= c("0"="None","1"="One","2"="Two", "3"="Three", "4"="Four", "5"="Five", "6"= "Six", "7"= "Seven", "8"="Over Eight"))+
  guides(fill = guide_legend(title = "Family life interfere the Job"), color = "none")+
   theme(legend.position = "bottom")

Matching Model

 test_data_new <- final_data %>%
  mutate(binary_wif=case_when(
  job_interfere_with_family == 1 ~ 0, 
  job_interfere_with_family == 2 ~ 0,
  job_interfere_with_family == 3 ~ 1,
  job_interfere_with_family == 4 ~ 1),
  binary_wif = labelled(binary_wif, 
      c(`No` = 0, `Yes` = 1)))%>% mutate(binary_fiw=case_when(
  family_interfere_with_job == 1 ~ 0, 
  family_interfere_with_job == 2 ~ 0,
  family_interfere_with_job == 3 ~ 1,
  family_interfere_with_job == 4 ~ 1),
  binary_fiw = labelled(binary_fiw, 
      c(`No` = 0, `Yes` = 1)))

head(test_data_new)

# A tibble: 6 × 14
  job_interfere_with_family family_interfere_with_job physical_health_status
  <dbl+lbl>                 <dbl+lbl>                 <dbl+lbl>             
1 3 [Sometimes]             2 [Rarely]                8 [8+]                
2 2 [Rarely]                2 [Rarely]                0 [NONE]              
3 4 [Often]                 1 [Never]                 4 [4]                 
4 4 [Often]                 2 [Rarely]                2 [2]                 
5 2 [Rarely]                2 [Rarely]                0 [NONE]              
6 3 [Sometimes]             4 [Often]                 5 [5]                 
# ℹ 11 more variables: mental_health_status <dbl+lbl>,
#   marriage_status <dbl+lbl>, total_people_in_household <dbl+lbl>,
#   age_group <dbl+lbl>, education <dbl+lbl>, family_income <dbl+lbl>,
#   race <dbl+lbl>, sex <dbl+lbl>, working_status <dbl+lbl>,
#   binary_wif <dbl+lbl>, binary_fiw <dbl+lbl>

test_data_new <- test_data_new %>% mutate(job_interfere_with_family= as.numeric(job_interfere_with_family), family_interfere_with_job= as.numeric(family_interfere_with_job), physical_health_status= as.numeric(physical_health_status), mental_health_status= as.numeric(mental_health_status),  marriage_status= as.numeric(marriage_status), total_people_in_household= as.numeric(total_people_in_household), age_group=as.numeric(age_group), education= as.numeric(education),family_income= as.numeric(family_income), race= as.numeric(race), sex= as.numeric(sex), working_status = as.numeric(working_status), binary_wif= as.numeric(binary_wif),binary_fiw= as.numeric(binary_fiw))
head(test_data_new)

# A tibble: 6 × 14
  job_interfere_with_family family_interfere_with_job physical_health_status
                      <dbl>                     <dbl>                  <dbl>
1                         3                         2                      8
2                         2                         2                      0
3                         4                         1                      4
4                         4                         2                      2
5                         2                         2                      0
6                         3                         4                      5
# ℹ 11 more variables: mental_health_status <dbl>, marriage_status <dbl>,
#   total_people_in_household <dbl>, age_group <dbl>, education <dbl>,
#   family_income <dbl>, race <dbl>, sex <dbl>, working_status <dbl>,
#   binary_wif <dbl>, binary_fiw <dbl>

Job interrupt the family and Physical health status

test_data_new_wif<-test_data_new %>% dplyr::select(-c(job_interfere_with_family,family_interfere_with_job,binary_fiw))

head(test_data_new_wif)

# A tibble: 6 × 11
  physical_health_status mental_health_status marriage_status
                   <dbl>                <dbl>           <dbl>
1                      8                    8               3
2                      0                    0               3
3                      4                    8               1
4                      2                    8               1
5                      0                    0               5
6                      5                    5               1
# ℹ 8 more variables: total_people_in_household <dbl>, age_group <dbl>,
#   education <dbl>, family_income <dbl>, race <dbl>, sex <dbl>,
#   working_status <dbl>, binary_wif <dbl>

##What is the impact of employment training on earnings?
#Testing for Imbalance Between Groups
check<-test_data_new_wif %>%
  group_by(binary_wif) %>%
  summarise_at(vars(physical_health_status,mental_health_status,marriage_status,total_people_in_household,age_group,education,family_income,race,sex, working_status), 
               list(mean = mean,
                    var = var))

round(t(check), 3)

                                 [,1]   [,2]
binary_wif                      0.000  1.000
physical_health_status_mean     1.503  1.896
mental_health_status_mean       2.086  3.116
marriage_status_mean            3.140  3.294
total_people_in_household_mean  1.861  1.889
age_group_mean                  2.194  2.112
education_mean                  2.950  3.170
family_income_mean             11.556 11.708
race_mean                       1.534  1.493
sex_mean                        1.504  1.497
working_status_mean             1.308  1.218
physical_health_status_var      7.009  7.442
mental_health_status_var        9.117 11.206
marriage_status_var             3.220  3.216
total_people_in_household_var   2.170  2.022
age_group_var                   0.736  0.584
education_var                   1.581  1.611
family_income_var               2.493  1.785
race_var                        0.595  0.572
sex_var                         0.250  0.250
working_status_var              0.312  0.272

match_exact <- matchit(binary_wif ~physical_health_status+mental_health_status+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, 
                       method = "exact", data = test_data_new_wif)
data_exact <- match.data(match_exact) #Creates new dataframe that only includes the matched cases
summary(match_exact)


Call:
matchit(formula = binary_wif ~ physical_health_status + mental_health_status + 
    marriage_status + total_people_in_household + age_group + 
    education + family_income + race + sex + working_status, 
    data = test_data_new_wif, method = "exact")

Summary of Balance for All Data:
                          Means Treated Means Control Std. Mean Diff.
physical_health_status           1.8958        1.5027          0.1441
mental_health_status             3.1157        2.0860          0.3076
marriage_status                  3.2940        3.1398          0.0860
total_people_in_household        1.8889        1.8611          0.0195
age_group                        2.1123        2.1935         -0.1063
education                        3.1701        2.9498          0.1736
family_income                   11.7083       11.5565          0.1137
race                             1.4931        1.5341         -0.0542
sex                              1.4965        1.5036         -0.0141
working_status                   1.2176        1.3082         -0.1737
                          Var. Ratio eCDF Mean eCDF Max
physical_health_status        1.0618    0.0437   0.1027
mental_health_status          1.2291    0.1144   0.1606
marriage_status               0.9988    0.0308   0.0528
total_people_in_household     0.9320    0.0095   0.0301
age_group                     0.7941    0.0288   0.0675
education                     1.0185    0.0441   0.0867
family_income                 0.7161    0.0127   0.0511
race                          0.9616    0.0137   0.0289
sex                           1.0003    0.0035   0.0071
working_status                0.8729    0.0313   0.0923

Summary of Balance for Matched Data:
                          Means Treated Means Control Std. Mean Diff.
physical_health_status           0.2857        0.2857               0
mental_health_status             0.8057        0.8057              -0
marriage_status                  3.5314        3.5314               0
total_people_in_household        1.5371        1.5371               0
age_group                        2.1771        2.1771               0
education                        3.1371        3.1371              -0
family_income                   12.0000       12.0000               0
race                             1.3829        1.3829              -0
sex                              1.3829        1.3829              -0
working_status                   1.0343        1.0343               0
                          Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
physical_health_status        0.9981         0        0               0
mental_health_status          0.9981         0        0               0
marriage_status               0.9981         0        0               0
total_people_in_household     0.9981         0        0               0
age_group                     0.9981         0        0               0
education                     0.9981         0        0               0
family_income                      .         0        0               0
race                          0.9981         0        0               0
sex                           0.9981         0        0               0
working_status                0.9981         0        0               0

Sample Sizes:
              Control Treated
All           1116.       864
Matched (ESS)  131.13     175
Matched        208.       175
Unmatched      908.       689
Discarded        0.         0

imbalance_exact <- imbalance(group = data_exact$binary_wif, data = data_exact, 
          drop = c("physical_health_status", "binary_wif", "weights",  "subclass")) #With matched data, always add weights and subclass here

Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect

imbalance_exact


Multivariate Imbalance Measure: L1=0.270
Percentage of local common support: LCS=100.0%

Univariate Imbalance Measures:

                           statistic   type          L1 min 25% 50% 75% max
mental_health_status      0.67555599 (Chi2) 0.006263736  NA  NA  NA  NA  NA
marriage_status           0.57735084 (Chi2) 0.024258242  NA  NA  NA  NA  NA
total_people_in_household 1.20128547 (Chi2) 0.036813187  NA  NA  NA  NA  NA
age_group                 1.42665653 (Chi2) 0.057912088  NA  NA  NA  NA  NA
education                 8.16204702 (Chi2) 0.132664835  NA  NA  NA  NA  NA
family_income             2.84334204 (Chi2) 0.000000000  NA  NA  NA  NA  NA
race                      0.63276543 (Chi2) 0.023846154  NA  NA  NA  NA  NA
sex                       0.02220519 (Chi2) 0.012664835  NA  NA  NA  NA  NA
working_status            0.41031836 (Chi2) 0.009890110  NA  NA  NA  NA  NA

###Match Coarsened Exact 
###Perform the matching here with code that resembles most regressions
match_cem <- matchit(binary_wif ~physical_health_status+mental_health_status+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, 
                       method = "cem", data = test_data_new_wif)
data_cem <- match.data(match_cem) #Creates new dataframe that only includes the matched cases
summary(match_cem)


Call:
matchit(formula = binary_wif ~ physical_health_status + mental_health_status + 
    marriage_status + total_people_in_household + age_group + 
    education + family_income + race + sex + working_status, 
    data = test_data_new_wif, method = "cem")

Summary of Balance for All Data:
                          Means Treated Means Control Std. Mean Diff.
physical_health_status           1.8958        1.5027          0.1441
mental_health_status             3.1157        2.0860          0.3076
marriage_status                  3.2940        3.1398          0.0860
total_people_in_household        1.8889        1.8611          0.0195
age_group                        2.1123        2.1935         -0.1063
education                        3.1701        2.9498          0.1736
family_income                   11.7083       11.5565          0.1137
race                             1.4931        1.5341         -0.0542
sex                              1.4965        1.5036         -0.0141
working_status                   1.2176        1.3082         -0.1737
                          Var. Ratio eCDF Mean eCDF Max
physical_health_status        1.0618    0.0437   0.1027
mental_health_status          1.2291    0.1144   0.1606
marriage_status               0.9988    0.0308   0.0528
total_people_in_household     0.9320    0.0095   0.0301
age_group                     0.7941    0.0288   0.0675
education                     1.0185    0.0441   0.0867
family_income                 0.7161    0.0127   0.0511
race                          0.9616    0.0137   0.0289
sex                           1.0003    0.0035   0.0071
working_status                0.8729    0.0313   0.0923

Summary of Balance for Matched Data:
                          Means Treated Means Control Std. Mean Diff.
physical_health_status           0.2857        0.2857               0
mental_health_status             0.8057        0.8057              -0
marriage_status                  3.5314        3.5314               0
total_people_in_household        1.5371        1.5371               0
age_group                        2.1771        2.1771               0
education                        3.1371        3.1371              -0
family_income                   12.0000       12.0000               0
race                             1.3829        1.3829              -0
sex                              1.3829        1.3829              -0
working_status                   1.0343        1.0343               0
                          Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
physical_health_status        0.9981         0        0               0
mental_health_status          0.9981         0        0               0
marriage_status               0.9981         0        0               0
total_people_in_household     0.9981         0        0               0
age_group                     0.9981         0        0               0
education                     0.9981         0        0               0
family_income                      .         0        0               0
race                          0.9981         0        0               0
sex                           0.9981         0        0               0
working_status                0.9981         0        0               0

Sample Sizes:
              Control Treated
All           1116.       864
Matched (ESS)  131.13     175
Matched        208.       175
Unmatched      908.       689
Discarded        0.         0

imbalance_cem <- imbalance(group = data_cem$binary_wif, data = data_cem, 
          drop = c("physical_health_status", "binary_wif", "weights",  "subclass"))

Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect

imbalance_cem


Multivariate Imbalance Measure: L1=0.270
Percentage of local common support: LCS=100.0%

Univariate Imbalance Measures:

                           statistic   type          L1 min 25% 50% 75% max
mental_health_status      0.67555599 (Chi2) 0.006263736  NA  NA  NA  NA  NA
marriage_status           0.57735084 (Chi2) 0.024258242  NA  NA  NA  NA  NA
total_people_in_household 1.20128547 (Chi2) 0.036813187  NA  NA  NA  NA  NA
age_group                 1.42665653 (Chi2) 0.057912088  NA  NA  NA  NA  NA
education                 8.16204702 (Chi2) 0.132664835  NA  NA  NA  NA  NA
family_income             2.84334204 (Chi2) 0.000000000  NA  NA  NA  NA  NA
race                      0.63276543 (Chi2) 0.023846154  NA  NA  NA  NA  NA
sex                       0.02220519 (Chi2) 0.012664835  NA  NA  NA  NA  NA
working_status            0.41031836 (Chi2) 0.009890110  NA  NA  NA  NA  NA

#Compare t-tests of DV on treated - no control variables 
t.test(test_data_new_wif$physical_health_status, test_data_new_wif$binary_wif)


    Welch Two Sample t-test

data:  test_data_new_wif$physical_health_status and test_data_new_wif$binary_wif
t = 20.142, df = 2113.5, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 1.117356 1.358402
sample estimates:
mean of x mean of y 
1.6742424 0.4363636

#Estimate Linear Regression on Raw Data
lm1<-lm(physical_health_status~ binary_wif+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, 
        test_data_new_wif)
summary(lm1)


Call:
lm(formula = physical_health_status ~ binary_wif + marriage_status + 
    total_people_in_household + age_group + education + family_income + 
    race + sex + working_status, data = test_data_new_wif)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.3126 -1.6850 -1.1607  0.7808  7.3931 

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)                0.87791    0.58260   1.507 0.132000    
binary_wif                 0.46913    0.12111   3.874 0.000111 ***
marriage_status           -0.07870    0.03647  -2.158 0.031066 *  
total_people_in_household -0.01235    0.04200  -0.294 0.768753    
age_group                  0.05731    0.07846   0.730 0.465220    
education                 -0.03314    0.04859  -0.682 0.495220    
family_income             -0.02947    0.04178  -0.705 0.480765    
race                      -0.16968    0.07829  -2.167 0.030330 *  
sex                        0.49082    0.12100   4.057 5.18e-05 ***
working_status             0.55782    0.11208   4.977 7.03e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.646 on 1970 degrees of freedom
Multiple R-squared:  0.03625,   Adjusted R-squared:  0.03185 
F-statistic: 8.233 on 9 and 1970 DF,  p-value: 4.168e-12

#Estimate Linear Regression on Coarsened Exact Matched Data
lm2<-lm(physical_health_status~ binary_wif+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, 
        data_cem, weights = weights)
summary(lm2)


Call:
lm(formula = physical_health_status ~ binary_wif + marriage_status + 
    total_people_in_household + age_group + education + family_income + 
    race + sex + working_status, data = data_cem, weights = weights)

Weighted Residuals:
    Min      1Q  Median      3Q     Max 
-1.1258 -0.4269 -0.2605 -0.0800 11.6683 

Coefficients: (1 not defined because of singularities)
                            Estimate Std. Error t value Pr(>|t|)  
(Intercept)                1.350e+00  5.219e-01   2.587   0.0101 *
binary_wif                -1.890e-16  1.396e-01   0.000   1.0000  
marriage_status           -3.851e-03  4.394e-02  -0.088   0.9302  
total_people_in_household -8.599e-02  6.545e-02  -1.314   0.1897  
age_group                 -1.748e-01  1.083e-01  -1.614   0.1074  
education                 -7.212e-02  5.674e-02  -1.271   0.2045  
family_income                     NA         NA      NA       NA  
race                      -1.941e-01  1.023e-01  -1.898   0.0585 .
sex                        2.462e-01  1.533e-01   1.606   0.1091  
working_status            -3.713e-01  3.359e-01  -1.105   0.2698  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.361 on 374 degrees of freedom
Multiple R-squared:  0.04053,   Adjusted R-squared:  0.02001 
F-statistic: 1.975 on 8 and 374 DF,  p-value: 0.04852

 #Compare results 
plot_summs(lm1, lm2)

Warning in base$statistic[!is.na(base$statistic)] <- x$coeftable[, stat_col]:
number of items to replace is not a multiple of replacement length

Warning in base[["p.value"]][!is.na(base$statistic)] <- x$coeftable[, "p"]:
number of items to replace is not a multiple of replacement length

stargazer(lm1, lm2, type = "text", digits = 3, 
          dep.var.labels = c("Earnings Post Training"),
          column.labels = c("Full Data", "Matched Data"))


=======================================================================
                                       Dependent variable:             
                          ---------------------------------------------
                                     Earnings Post Training            
                                 Full Data            Matched Data     
                                    (1)                    (2)         
-----------------------------------------------------------------------
binary_wif                       0.469***                -0.000        
                                  (0.121)                (0.140)       
                                                                       
marriage_status                  -0.079**                -0.004        
                                  (0.036)                (0.044)       
                                                                       
total_people_in_household         -0.012                 -0.086        
                                  (0.042)                (0.065)       
                                                                       
age_group                          0.057                 -0.175        
                                  (0.078)                (0.108)       
                                                                       
education                         -0.033                 -0.072        
                                  (0.049)                (0.057)       
                                                                       
family_income                     -0.029                               
                                  (0.042)                              
                                                                       
race                             -0.170**                -0.194*       
                                  (0.078)                (0.102)       
                                                                       
sex                              0.491***                 0.246        
                                  (0.121)                (0.153)       
                                                                       
working_status                   0.558***                -0.371        
                                  (0.112)                (0.336)       
                                                                       
Constant                           0.878                 1.350**       
                                  (0.583)                (0.522)       
                                                                       
-----------------------------------------------------------------------
Observations                       1,980                   383         
R2                                 0.036                  0.041        
Adjusted R2                        0.032                  0.020        
Residual Std. Error          2.646 (df = 1970)      1.361 (df = 374)   
F Statistic               8.233*** (df = 9; 1970) 1.975** (df = 8; 374)
=======================================================================
Note:                                       *p<0.1; **p<0.05; ***p<0.01

#######Entropy Balancing
# Create a subset of the dataset with the selected variables
treatment_var <- "binary_wif"
covariates_vars <- c("marriage_status", "total_people_in_household", "age_group","education", "family_income", "race", "sex", "working_status")
dependent_var <- "physical_health_status"

# Prepare treatment and covariates
treatment <- test_data_new_wif$binary_wif
covariates <- test_data_new_wif[, covariates_vars]

# Run entropy balancing
e_bal <- ebalance(
  Treatment = treatment,
  X = covariates,
  max.iterations = 200,
  constraint.tolerance = 1)

Converged within tolerance

# Add weights back to LL
test_data_new_wif$eb_weight <- NA
test_data_new_wif$eb_weight[test_data_new_wif[[treatment_var]] == 1] <- 1  # Treated units get weight = 1
test_data_new_wif$eb_weight[test_data_new_wif[[treatment_var]] == 0] <- e_bal$w  # Control units get EB weights

# Final data for regression
eb_data <- test_data_new_wif %>% filter(!is.na(eb_weight))  # Exclude unmatched if any

#data for analysis 
#Now have a weight called 'eb_weight' that can be used in analysis 

##Let's check that the two groups are equal now 
eb_data %>%
  group_by(binary_wif) %>%
  summarise(
    age_weighted_mean = wtd.mean(age_group, weights = eb_weight), 
    age_weighted_variance = wtd.var(age_group, weights = eb_weight)
  )

# A tibble: 2 × 3
  binary_wif age_weighted_mean age_weighted_variance
       <dbl>             <dbl>                 <dbl>
1          0              2.11                 0.676
2          1              2.11                 0.584

check <- eb_data %>%
  group_by(binary_wif) %>%
  summarise(across(.cols = c(marriage_status, total_people_in_household, age_group,education, family_income, race, sex, working_status),
                   .fns = list(
                     weighted_mean = ~wtd.mean(., weights = eb_weight),
                     weighted_variance = ~wtd.var(., weights = eb_weight)),
                   .names = "{.col}_{.fn}"),
            .groups = "drop")

check_t<- round(t(check), 2)

print(check_t)

                                             [,1]  [,2]
binary_wif                                   0.00  1.00
marriage_status_weighted_mean                3.29  3.29
marriage_status_weighted_variance            3.24  3.22
total_people_in_household_weighted_mean      1.89  1.89
total_people_in_household_weighted_variance  2.13  2.02
age_group_weighted_mean                      2.11  2.11
age_group_weighted_variance                  0.68  0.58
education_weighted_mean                      3.17  3.17
education_weighted_variance                  1.62  1.61
family_income_weighted_mean                 11.71 11.71
family_income_weighted_variance              1.53  1.79
race_weighted_mean                           1.49  1.49
race_weighted_variance                       0.57  0.57
sex_weighted_mean                            1.50  1.50
sex_weighted_variance                        0.25  0.25
working_status_weighted_mean                 1.22  1.22
working_status_weighted_variance             0.23  0.27

lm3<-lm(physical_health_status~ binary_wif+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, data=eb_data, weights = eb_weight)
lm4<-lm(physical_health_status~ binary_wif+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, data=eb_data)

stargazer(lm3, lm4, type = "text", digits = 3, 
          dep.var.labels = c("Earnings Post Training"),
          column.labels = c("Entropy Balanced", "Raw Data"))


============================================================
                                    Dependent variable:     
                                ----------------------------
                                   Earnings Post Training   
                                 Entropy Balanced  Raw Data 
                                       (1)            (2)   
------------------------------------------------------------
binary_wif                           0.475***      0.469*** 
                                     (0.118)        (0.121) 
                                                            
marriage_status                      -0.070*       -0.079** 
                                     (0.036)        (0.036) 
                                                            
total_people_in_household             -0.027        -0.012  
                                     (0.042)        (0.042) 
                                                            
age_group                             0.044          0.057  
                                     (0.080)        (0.078) 
                                                            
education                             -0.044        -0.033  
                                     (0.048)        (0.049) 
                                                            
family_income                         -0.022        -0.029  
                                     (0.047)        (0.042) 
                                                            
race                                 -0.155**      -0.170** 
                                     (0.078)        (0.078) 
                                                            
sex                                  0.521***      0.491*** 
                                     (0.120)        (0.121) 
                                                            
working_status                       0.490***      0.558*** 
                                     (0.120)        (0.112) 
                                                            
Constant                              0.866          0.878  
                                     (0.638)        (0.583) 
                                                            
------------------------------------------------------------
Observations                          1,980          1,980  
R2                                    0.035          0.036  
Adjusted R2                           0.031          0.032  
Residual Std. Error (df = 1970)       2.446          2.646  
F Statistic (df = 9; 1970)           7.932***      8.233*** 
============================================================
Note:                            *p<0.1; **p<0.05; ***p<0.01

Family interrupt and Physical health status

test_data_new_fiw<-test_data_new %>% dplyr::select(-c(job_interfere_with_family,family_interfere_with_job,binary_wif))

head(test_data_new_fiw)

# A tibble: 6 × 11
  physical_health_status mental_health_status marriage_status
                   <dbl>                <dbl>           <dbl>
1                      8                    8               3
2                      0                    0               3
3                      4                    8               1
4                      2                    8               1
5                      0                    0               5
6                      5                    5               1
# ℹ 8 more variables: total_people_in_household <dbl>, age_group <dbl>,
#   education <dbl>, family_income <dbl>, race <dbl>, sex <dbl>,
#   working_status <dbl>, binary_fiw <dbl>

##What is the impact of employment training on earnings?
#Testing for Imbalance Between Groups
check<-test_data_new_fiw %>%
  group_by(binary_fiw) %>%
  summarise_at(vars(physical_health_status,mental_health_status,marriage_status,total_people_in_household,age_group,education,family_income,race,sex, working_status), 
               list(mean = mean,
                    var = var))

round(t(check), 3)

                                 [,1]   [,2]
binary_fiw                      0.000  1.000
physical_health_status_mean     1.501  2.048
mental_health_status_mean       2.236  3.180
marriage_status_mean            3.094  3.451
total_people_in_household_mean  1.818  1.992
age_group_mean                  2.172  2.129
education_mean                  3.000  3.145
family_income_mean             11.624 11.619
race_mean                       1.508  1.533
sex_mean                        1.496  1.511
working_status_mean             1.272  1.261
physical_health_status_var      6.765  8.046
mental_health_status_var        9.687 10.977
marriage_status_var             3.224  3.138
total_people_in_household_var   2.050  2.206
age_group_var                   0.723  0.559
education_var                   1.574  1.662
family_income_var               2.167  2.239
race_var                        0.571  0.616
sex_var                         0.250  0.250
working_status_var              0.290  0.311

match_exact_fiw_1 <- matchit(binary_fiw ~physical_health_status+mental_health_status+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, 
                       method = "exact", data = test_data_new_fiw)
data_exact_fiw_1 <- match.data(match_exact_fiw_1) #Creates new dataframe that only includes the matched cases
summary(match_exact_fiw_1)


Call:
matchit(formula = binary_fiw ~ physical_health_status + mental_health_status + 
    marriage_status + total_people_in_household + age_group + 
    education + family_income + race + sex + working_status, 
    data = test_data_new_fiw, method = "exact")

Summary of Balance for All Data:
                          Means Treated Means Control Std. Mean Diff.
physical_health_status           2.0478        1.5007          0.1929
mental_health_status             3.1799        2.2359          0.2849
marriage_status                  3.4506        3.0939          0.2014
total_people_in_household        1.9920        1.8180          0.1172
age_group                        2.1290        2.1716         -0.0570
education                        3.1449        3.0000          0.1124
family_income                   11.6194       11.6243         -0.0032
race                             1.5334        1.5081          0.0322
sex                              1.5111        1.4956          0.0312
working_status                   1.2611        1.2722         -0.0198
                          Var. Ratio eCDF Mean eCDF Max
physical_health_status        1.1892    0.0608   0.1114
mental_health_status          1.1331    0.1049   0.1517
marriage_status               0.9734    0.0713   0.1093
total_people_in_household     1.0761    0.0226   0.0844
age_group                     0.7738    0.0328   0.0642
education                     1.0559    0.0290   0.0637
family_income                 1.0335    0.0039   0.0126
race                          1.0783    0.0084   0.0226
sex                           1.0004    0.0078   0.0156
working_status                1.0733    0.0124   0.0241

Summary of Balance for Matched Data:
                          Means Treated Means Control Std. Mean Diff.
physical_health_status           0.3411        0.3411               0
mental_health_status             1.1628        1.1628               0
marriage_status                  3.5271        3.5271               0
total_people_in_household        1.5736        1.5736               0
age_group                        2.1783        2.1783               0
education                        3.1628        3.1628              -0
family_income                   12.0000       12.0000               0
race                             1.4496        1.4496               0
sex                              1.3876        1.3876               0
working_status                   1.0388        1.0388               0
                          Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
physical_health_status        1.0001         0        0               0
mental_health_status          1.0001         0        0               0
marriage_status               1.0001         0        0               0
total_people_in_household     1.0001         0        0               0
age_group                     1.0001         0        0               0
education                     1.0001         0        0               0
family_income                      .         0        0               0
race                          1.0001         0        0               0
sex                           1.0001         0        0               0
working_status                1.0001         0        0               0

Sample Sizes:
              Control Treated
All           1352.       628
Matched (ESS)  131.16     129
Matched        205.       129
Unmatched     1147.       499
Discarded        0.         0

imbalance_exact_fiw_1 <- imbalance(group = data_exact_fiw_1$binary_fiw, data = data_exact_fiw_1, 
          drop = c("physical_health_status", "binary_fiw", "weights",  "subclass")) #With matched data, always add weights and subclass here

Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect

imbalance_exact_fiw_1


Multivariate Imbalance Measure: L1=0.276
Percentage of local common support: LCS=100.0%

Univariate Imbalance Measures:

                           statistic   type          L1 min 25% 50% 75% max
mental_health_status       5.1320068 (Chi2) 0.060351673  NA  NA  NA  NA  NA
marriage_status            0.4797745 (Chi2) 0.003970505  NA  NA  NA  NA  NA
total_people_in_household  2.2527992 (Chi2) 0.049725846  NA  NA  NA  NA  NA
age_group                  0.7456901 (Chi2) 0.033692569  NA  NA  NA  NA  NA
education                  2.4581016 (Chi2) 0.084363774  NA  NA  NA  NA  NA
family_income             17.2934132 (Chi2) 0.000000000  NA  NA  NA  NA  NA
race                       2.4630622 (Chi2) 0.075061448  NA  NA  NA  NA  NA
sex                        0.3086324 (Chi2) 0.036377387  NA  NA  NA  NA  NA
working_status             0.1768549 (Chi2) 0.014369446  NA  NA  NA  NA  NA

###Match Coarsened Exact 
###Perform the matching here with code that resembles most regressions
match_cem_fiw_1 <- matchit(binary_fiw ~physical_health_status+mental_health_status+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, 
                       method = "cem", data = test_data_new_fiw)
data_cem_fiw_1 <- match.data(match_cem_fiw_1) #Creates new dataframe that only includes the matched cases
summary(match_cem_fiw_1)


Call:
matchit(formula = binary_fiw ~ physical_health_status + mental_health_status + 
    marriage_status + total_people_in_household + age_group + 
    education + family_income + race + sex + working_status, 
    data = test_data_new_fiw, method = "cem")

Summary of Balance for All Data:
                          Means Treated Means Control Std. Mean Diff.
physical_health_status           2.0478        1.5007          0.1929
mental_health_status             3.1799        2.2359          0.2849
marriage_status                  3.4506        3.0939          0.2014
total_people_in_household        1.9920        1.8180          0.1172
age_group                        2.1290        2.1716         -0.0570
education                        3.1449        3.0000          0.1124
family_income                   11.6194       11.6243         -0.0032
race                             1.5334        1.5081          0.0322
sex                              1.5111        1.4956          0.0312
working_status                   1.2611        1.2722         -0.0198
                          Var. Ratio eCDF Mean eCDF Max
physical_health_status        1.1892    0.0608   0.1114
mental_health_status          1.1331    0.1049   0.1517
marriage_status               0.9734    0.0713   0.1093
total_people_in_household     1.0761    0.0226   0.0844
age_group                     0.7738    0.0328   0.0642
education                     1.0559    0.0290   0.0637
family_income                 1.0335    0.0039   0.0126
race                          1.0783    0.0084   0.0226
sex                           1.0004    0.0078   0.0156
working_status                1.0733    0.0124   0.0241

Summary of Balance for Matched Data:
                          Means Treated Means Control Std. Mean Diff.
physical_health_status           0.3411        0.3411               0
mental_health_status             1.1628        1.1628               0
marriage_status                  3.5271        3.5271               0
total_people_in_household        1.5736        1.5736               0
age_group                        2.1783        2.1783               0
education                        3.1628        3.1628              -0
family_income                   12.0000       12.0000               0
race                             1.4496        1.4496               0
sex                              1.3876        1.3876               0
working_status                   1.0388        1.0388               0
                          Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
physical_health_status        1.0001         0        0               0
mental_health_status          1.0001         0        0               0
marriage_status               1.0001         0        0               0
total_people_in_household     1.0001         0        0               0
age_group                     1.0001         0        0               0
education                     1.0001         0        0               0
family_income                      .         0        0               0
race                          1.0001         0        0               0
sex                           1.0001         0        0               0
working_status                1.0001         0        0               0

Sample Sizes:
              Control Treated
All           1352.       628
Matched (ESS)  131.16     129
Matched        205.       129
Unmatched     1147.       499
Discarded        0.         0

imbalance_cem_fiw_1 <- imbalance(group = data_cem_fiw_1$binary_fiw, data = data_cem_fiw_1, 
          drop = c("physical_health_status", "binary_fiw", "weights",  "subclass"))

Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect

imbalance_cem_fiw_1


Multivariate Imbalance Measure: L1=0.276
Percentage of local common support: LCS=100.0%

Univariate Imbalance Measures:

                           statistic   type          L1 min 25% 50% 75% max
mental_health_status       5.1320068 (Chi2) 0.060351673  NA  NA  NA  NA  NA
marriage_status            0.4797745 (Chi2) 0.003970505  NA  NA  NA  NA  NA
total_people_in_household  2.2527992 (Chi2) 0.049725846  NA  NA  NA  NA  NA
age_group                  0.7456901 (Chi2) 0.033692569  NA  NA  NA  NA  NA
education                  2.4581016 (Chi2) 0.084363774  NA  NA  NA  NA  NA
family_income             17.2934132 (Chi2) 0.000000000  NA  NA  NA  NA  NA
race                       2.4630622 (Chi2) 0.075061448  NA  NA  NA  NA  NA
sex                        0.3086324 (Chi2) 0.036377387  NA  NA  NA  NA  NA
working_status             0.1768549 (Chi2) 0.014369446  NA  NA  NA  NA  NA

t.test(test_data_new_fiw$physical_health_status, test_data_new_fiw$binary_fiw)


    Welch Two Sample t-test

data:  test_data_new_fiw$physical_health_status and test_data_new_fiw$binary_fiw
t = 22.125, df = 2097.5, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 1.236784 1.477357
sample estimates:
mean of x mean of y 
1.6742424 0.3171717

#Estimate Linear Regression on Raw Data
lm1_fiw_p<-lm(physical_health_status~ binary_fiw+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, 
        test_data_new_fiw)
summary(lm1_fiw_p)


Call:
lm(formula = physical_health_status ~ binary_fiw + marriage_status + 
    total_people_in_household + age_group + education + family_income + 
    race + sex + working_status, data = test_data_new_fiw)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4470 -1.6637 -1.1503  0.8487  7.4235 

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)                0.91308    0.58059   1.573   0.1160    
binary_fiw                 0.59148    0.12862   4.599 4.52e-06 ***
marriage_status           -0.08843    0.03655  -2.420   0.0156 *  
total_people_in_household -0.01976    0.04196  -0.471   0.6377    
age_group                  0.05467    0.07826   0.699   0.4849    
education                 -0.02902    0.04843  -0.599   0.5490    
family_income             -0.02174    0.04171  -0.521   0.6023    
race                      -0.18362    0.07816  -2.349   0.0189 *  
sex                        0.48038    0.12083   3.976 7.27e-05 ***
working_status             0.53193    0.11165   4.764 2.03e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.642 on 1970 degrees of freedom
Multiple R-squared:  0.03922,   Adjusted R-squared:  0.03483 
F-statistic: 8.936 on 9 and 1970 DF,  p-value: 2.596e-13

#Estimate Linear Regression on Coarsened Exact Matched Data
lm2_fiw_p<-lm(physical_health_status~ binary_fiw+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, 
        data_cem_fiw_1, weights = weights)
summary(lm2_fiw_p)


Call:
lm(formula = physical_health_status ~ binary_fiw + marriage_status + 
    total_people_in_household + age_group + education + family_income + 
    race + sex + working_status, data = data_cem_fiw_1, weights = weights)

Weighted Residuals:
    Min      1Q  Median      3Q     Max 
-1.5902 -0.4906 -0.2798  0.0279  9.0255 

Coefficients: (1 not defined because of singularities)
                            Estimate Std. Error t value Pr(>|t|)  
(Intercept)                6.716e-01  5.983e-01   1.123   0.2624  
binary_fiw                 1.250e-16  1.594e-01   0.000   1.0000  
marriage_status           -7.946e-02  5.293e-02  -1.501   0.1343  
total_people_in_household -1.249e-01  6.841e-02  -1.826   0.0687 .
age_group                 -5.155e-02  1.255e-01  -0.411   0.6814  
education                  1.112e-01  6.781e-02   1.639   0.1021  
family_income                     NA         NA      NA       NA  
race                      -2.249e-01  1.107e-01  -2.031   0.0431 *
sex                        4.372e-01  1.697e-01   2.576   0.0104 *
working_status            -3.598e-01  4.289e-01  -0.839   0.4021  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.418 on 325 degrees of freedom
Multiple R-squared:  0.07481,   Adjusted R-squared:  0.05204 
F-statistic: 3.285 on 8 and 325 DF,  p-value: 0.001277

 #Compare results 
plot_summs(lm1_fiw_p, lm2_fiw_p)

Warning in base$statistic[!is.na(base$statistic)] <- x$coeftable[, stat_col]:
number of items to replace is not a multiple of replacement length

Warning in base[["p.value"]][!is.na(base$statistic)] <- x$coeftable[, "p"]:
number of items to replace is not a multiple of replacement length

stargazer(lm1_fiw_p, lm2_fiw_p, type = "text", digits = 3, 
          dep.var.labels = c("Earnings Post Training"),
          column.labels = c("Full Data", "Matched Data"))


========================================================================
                                       Dependent variable:              
                          ----------------------------------------------
                                      Earnings Post Training            
                                 Full Data             Matched Data     
                                    (1)                    (2)          
------------------------------------------------------------------------
binary_fiw                       0.591***                 0.000         
                                  (0.129)                (0.159)        
                                                                        
marriage_status                  -0.088**                 -0.079        
                                  (0.037)                (0.053)        
                                                                        
total_people_in_household         -0.020                 -0.125*        
                                  (0.042)                (0.068)        
                                                                        
age_group                          0.055                  -0.052        
                                  (0.078)                (0.125)        
                                                                        
education                         -0.029                  0.111         
                                  (0.048)                (0.068)        
                                                                        
family_income                     -0.022                                
                                  (0.042)                               
                                                                        
race                             -0.184**                -0.225**       
                                  (0.078)                (0.111)        
                                                                        
sex                              0.480***                0.437**        
                                  (0.121)                (0.170)        
                                                                        
working_status                   0.532***                 -0.360        
                                  (0.112)                (0.429)        
                                                                        
Constant                           0.913                  0.672         
                                  (0.581)                (0.598)        
                                                                        
------------------------------------------------------------------------
Observations                       1,980                   334          
R2                                 0.039                  0.075         
Adjusted R2                        0.035                  0.052         
Residual Std. Error          2.642 (df = 1970)       1.418 (df = 325)   
F Statistic               8.936*** (df = 9; 1970) 3.285*** (df = 8; 325)
========================================================================
Note:                                        *p<0.1; **p<0.05; ***p<0.01

#######Entropy Balancing
# Create a subset of the dataset with the selected variables
treatment_var_fiw_1 <- "binary_fiw"
covariates_vars_fiw_1 <- c("marriage_status", "total_people_in_household", "age_group","education", "family_income", "race", "sex", "working_status")
dependent_var_fiw_1 <- "physical_health_status"

# Prepare treatment and covariates
treatment_fiw_1  <- test_data_new_fiw$binary_fiw
covariates_fiw_1<- test_data_new_fiw[, covariates_vars_fiw_1]

# Run entropy balancing
e_bal_fiw_1  <- ebalance(
  Treatment = treatment_fiw_1,
  X = covariates_fiw_1,
  max.iterations = 200,
  constraint.tolerance = 1)

Converged within tolerance

test_data_new_fiw$eb_weight_fiw_1 <- NA
test_data_new_fiw$eb_weight_fiw_1[test_data_new_fiw[[treatment_var_fiw_1]] == 1] <- 1  # Treated units get weight = 1
test_data_new_fiw$eb_weight_fiw_1[test_data_new_fiw[[treatment_var_fiw_1]] == 0] <- e_bal_fiw_1$w  # Control units get EB weights
# Final data for regression
eb_data_fiw_1 <- test_data_new_fiw %>% filter(!is.na(eb_weight_fiw_1))  # Exclude unmatched if any

#data for analysis 
#Now have a weight called 'eb_weight' that can be used in analysis 

##Let's check that the two groups are equal now 
eb_data_fiw_1 %>%
  group_by(binary_fiw) %>%
  summarise(
    age_weighted_mean_fiw_1 = wtd.mean(age_group, weights = eb_weight_fiw_1), 
    age_weighted_variance_fiw_1 = wtd.var(age_group, weights = eb_weight_fiw_1)
  )

# A tibble: 2 × 3
  binary_fiw age_weighted_mean_fiw_1 age_weighted_variance_fiw_1
       <dbl>                   <dbl>                       <dbl>
1          0                    2.13                       0.676
2          1                    2.13                       0.559

check_fiw_1 <- eb_data_fiw_1 %>%
  group_by(binary_fiw) %>%
  summarise(across(.cols = c(marriage_status, total_people_in_household, age_group,education, family_income, race, sex, working_status),
                   .fns = list(
                     weighted_mean_fiw_1 = ~wtd.mean(., weights = eb_weight_fiw_1),
                     weighted_variance_fiw_1 = ~wtd.var(., weights = eb_weight_fiw_1)),
                   .names = "{.col}_{.fn}"),
            .groups = "drop")

check_t_fiw_1<- round(t(check_fiw_1), 2)

print(check_t_fiw_1)

                                                   [,1]  [,2]
binary_fiw                                         0.00  1.00
marriage_status_weighted_mean_fiw_1                3.45  3.45
marriage_status_weighted_variance_fiw_1            3.12  3.14
total_people_in_household_weighted_mean_fiw_1      1.99  1.99
total_people_in_household_weighted_variance_fiw_1  2.43  2.21
age_group_weighted_mean_fiw_1                      2.13  2.13
age_group_weighted_variance_fiw_1                  0.68  0.56
education_weighted_mean_fiw_1                      3.14  3.14
education_weighted_variance_fiw_1                  1.64  1.66
family_income_weighted_mean_fiw_1                 11.62 11.62
family_income_weighted_variance_fiw_1              2.41  2.24
race_weighted_mean_fiw_1                           1.53  1.53
race_weighted_variance_fiw_1                       0.61  0.62
sex_weighted_mean_fiw_1                            1.51  1.51
sex_weighted_variance_fiw_1                        0.25  0.25
working_status_weighted_mean_fiw_1                 1.26  1.26
working_status_weighted_variance_fiw_1             0.29  0.31

lm3_fiw_1<-lm(physical_health_status~ binary_fiw+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, data=eb_data_fiw_1, weights = eb_weight_fiw_1)
lm4_fiw_1<-lm(physical_health_status~ binary_fiw+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, data=eb_data_fiw_1)

stargazer(lm3_fiw_1, lm4_fiw_1, type = "text", digits = 3, 
          dep.var.labels = c("Earnings Post Training"),
          column.labels = c("Entropy Balanced", "Raw Data"))


============================================================
                                    Dependent variable:     
                                ----------------------------
                                   Earnings Post Training   
                                 Entropy Balanced  Raw Data 
                                       (1)            (2)   
------------------------------------------------------------
binary_fiw                           0.589***      0.591*** 
                                     (0.120)        (0.129) 
                                                            
marriage_status                      -0.084**      -0.088** 
                                     (0.037)        (0.037) 
                                                            
total_people_in_household             -0.024        -0.020  
                                     (0.040)        (0.042) 
                                                            
age_group                             -0.017         0.055  
                                     (0.081)        (0.078) 
                                                            
education                             -0.036        -0.029  
                                     (0.048)        (0.048) 
                                                            
family_income                         -0.013        -0.022  
                                     (0.041)        (0.042) 
                                                            
race                                 -0.194**      -0.184** 
                                     (0.077)        (0.078) 
                                                            
sex                                  0.454***      0.480*** 
                                     (0.122)        (0.121) 
                                                            
working_status                       0.563***      0.532*** 
                                     (0.112)        (0.112) 
                                                            
Constant                              0.996*         0.913  
                                     (0.580)        (0.581) 
                                                            
------------------------------------------------------------
Observations                          1,980          1,980  
R2                                    0.043          0.039  
Adjusted R2                           0.038          0.035  
Residual Std. Error (df = 1970)       2.120          2.642  
F Statistic (df = 9; 1970)           9.788***      8.936*** 
============================================================
Note:                            *p<0.1; **p<0.05; ***p<0.01

Job interrupt the family and mental health status

test_data_new_wif_m<-test_data_new %>% dplyr::select(-c(job_interfere_with_family,family_interfere_with_job,binary_fiw, physical_health_status))

head(test_data_new_wif_m)

# A tibble: 6 × 10
  mental_health_status marriage_status total_people_in_household age_group
                 <dbl>           <dbl>                     <dbl>     <dbl>
1                    8               3                         1         4
2                    0               3                         3         3
3                    8               1                         1         1
4                    8               1                         3         1
5                    0               5                         2         2
6                    5               1                         2         2
# ℹ 6 more variables: education <dbl>, family_income <dbl>, race <dbl>,
#   sex <dbl>, working_status <dbl>, binary_wif <dbl>

##What is the impact of employment training on earnings?
#Testing for Imbalance Between Groups
check_wif_m<-test_data_new_wif_m %>%
  group_by(binary_wif) %>%
  summarise_at(vars(mental_health_status,marriage_status,total_people_in_household,age_group,education,family_income,race,sex, working_status), 
               list(mean = mean,
                    var = var))

round(t(check_wif_m), 3)

                                 [,1]   [,2]
binary_wif                      0.000  1.000
mental_health_status_mean       2.086  3.116
marriage_status_mean            3.140  3.294
total_people_in_household_mean  1.861  1.889
age_group_mean                  2.194  2.112
education_mean                  2.950  3.170
family_income_mean             11.556 11.708
race_mean                       1.534  1.493
sex_mean                        1.504  1.497
working_status_mean             1.308  1.218
mental_health_status_var        9.117 11.206
marriage_status_var             3.220  3.216
total_people_in_household_var   2.170  2.022
age_group_var                   0.736  0.584
education_var                   1.581  1.611
family_income_var               2.493  1.785
race_var                        0.595  0.572
sex_var                         0.250  0.250
working_status_var              0.312  0.272

match_exact_wif_2 <- matchit(binary_wif~mental_health_status+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, 
                       method = "exact", data =test_data_new_wif_m)
data_exact_wif_2 <- match.data(match_exact_wif_2) #Creates new dataframe that only includes the matched cases
summary(match_exact_wif_2)


Call:
matchit(formula = binary_wif ~ mental_health_status + marriage_status + 
    total_people_in_household + age_group + education + family_income + 
    race + sex + working_status, data = test_data_new_wif_m, 
    method = "exact")

Summary of Balance for All Data:
                          Means Treated Means Control Std. Mean Diff.
mental_health_status             3.1157        2.0860          0.3076
marriage_status                  3.2940        3.1398          0.0860
total_people_in_household        1.8889        1.8611          0.0195
age_group                        2.1123        2.1935         -0.1063
education                        3.1701        2.9498          0.1736
family_income                   11.7083       11.5565          0.1137
race                             1.4931        1.5341         -0.0542
sex                              1.4965        1.5036         -0.0141
working_status                   1.2176        1.3082         -0.1737
                          Var. Ratio eCDF Mean eCDF Max
mental_health_status          1.2291    0.1144   0.1606
marriage_status               0.9988    0.0308   0.0528
total_people_in_household     0.9320    0.0095   0.0301
age_group                     0.7941    0.0288   0.0675
education                     1.0185    0.0441   0.0867
family_income                 0.7161    0.0127   0.0511
race                          0.9616    0.0137   0.0289
sex                           1.0003    0.0035   0.0071
working_status                0.8729    0.0313   0.0923

Summary of Balance for Matched Data:
                          Means Treated Means Control Std. Mean Diff.
mental_health_status             1.6055        1.6055               0
marriage_status                  3.6125        3.6125               0
total_people_in_household        1.4844        1.4844               0
age_group                        2.1938        2.1938               0
education                        3.3149        3.3149               0
family_income                   11.9965       11.9965              -0
race                             1.3391        1.3391               0
sex                              1.3945        1.3945               0
working_status                   1.0484        1.0484               0
                          Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
mental_health_status          0.9985         0        0               0
marriage_status               0.9985         0        0               0
total_people_in_household     0.9985         0        0               0
age_group                     0.9985         0        0               0
education                     0.9985         0        0               0
family_income                 0.9985         0        0               0
race                          0.9985         0        0               0
sex                           0.9985         0        0               0
working_status                0.9985         0        0               0

Sample Sizes:
              Control Treated
All           1116.       864
Matched (ESS)  202.12     289
Matched        317.       289
Unmatched      799.       575
Discarded        0.         0

imbalance_exact_wif_2 <- imbalance(group = data_exact_wif_2$binary_wif, data = data_exact_wif_2, 
          drop = c("mental_health_status", "binary_wif", "weights",  "subclass")) #With matched data, always add weights and subclass here

Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect

imbalance_exact_wif_2


Multivariate Imbalance Measure: L1=0.219
Percentage of local common support: LCS=100.0%

Univariate Imbalance Measures:

                             statistic   type          L1 min 25% 50% 75% max
marriage_status           2.006001e+00 (Chi2) 0.055723533  NA  NA  NA  NA  NA
total_people_in_household 1.923340e+00 (Chi2) 0.032069684  NA  NA  NA  NA  NA
age_group                 3.474368e+00 (Chi2) 0.039470381  NA  NA  NA  NA  NA
education                 7.140104e+00 (Chi2) 0.099985810  NA  NA  NA  NA  NA
family_income             7.705697e-32 (Chi2) 0.000000000  NA  NA  NA  NA  NA
race                      1.839743e-01 (Chi2) 0.013251394  NA  NA  NA  NA  NA
sex                       8.779728e-01 (Chi2) 0.040867562  NA  NA  NA  NA  NA
working_status            5.372074e-02 (Chi2) 0.002641547  NA  NA  NA  NA  NA

###Match Coarsened Exact 
###Perform the matching here with code that resembles most regressions
match_cem_fiw_2 <- matchit(binary_wif ~mental_health_status+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, 
                       method = "cem", data = test_data_new_wif_m)
data_cem_fiw_2 <- match.data(match_cem_fiw_2) #Creates new dataframe that only includes the matched cases
summary(match_cem_fiw_2)


Call:
matchit(formula = binary_wif ~ mental_health_status + marriage_status + 
    total_people_in_household + age_group + education + family_income + 
    race + sex + working_status, data = test_data_new_wif_m, 
    method = "cem")

Summary of Balance for All Data:
                          Means Treated Means Control Std. Mean Diff.
mental_health_status             3.1157        2.0860          0.3076
marriage_status                  3.2940        3.1398          0.0860
total_people_in_household        1.8889        1.8611          0.0195
age_group                        2.1123        2.1935         -0.1063
education                        3.1701        2.9498          0.1736
family_income                   11.7083       11.5565          0.1137
race                             1.4931        1.5341         -0.0542
sex                              1.4965        1.5036         -0.0141
working_status                   1.2176        1.3082         -0.1737
                          Var. Ratio eCDF Mean eCDF Max
mental_health_status          1.2291    0.1144   0.1606
marriage_status               0.9988    0.0308   0.0528
total_people_in_household     0.9320    0.0095   0.0301
age_group                     0.7941    0.0288   0.0675
education                     1.0185    0.0441   0.0867
family_income                 0.7161    0.0127   0.0511
race                          0.9616    0.0137   0.0289
sex                           1.0003    0.0035   0.0071
working_status                0.8729    0.0313   0.0923

Summary of Balance for Matched Data:
                          Means Treated Means Control Std. Mean Diff.
mental_health_status             1.6055        1.6055               0
marriage_status                  3.6125        3.6125               0
total_people_in_household        1.4844        1.4844               0
age_group                        2.1938        2.1938               0
education                        3.3149        3.3149               0
family_income                   11.9965       11.9965              -0
race                             1.3391        1.3391               0
sex                              1.3945        1.3945               0
working_status                   1.0484        1.0484               0
                          Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
mental_health_status          0.9985         0        0               0
marriage_status               0.9985         0        0               0
total_people_in_household     0.9985         0        0               0
age_group                     0.9985         0        0               0
education                     0.9985         0        0               0
family_income                 0.9985         0        0               0
race                          0.9985         0        0               0
sex                           0.9985         0        0               0
working_status                0.9985         0        0               0

Sample Sizes:
              Control Treated
All           1116.       864
Matched (ESS)  202.12     289
Matched        317.       289
Unmatched      799.       575
Discarded        0.         0

imbalance_cem_fiw_2 <- imbalance(group = data_cem_fiw_2$binary_wif, data = data_cem_fiw_2, 
          drop = c("mental_health_status", "binary_wif", "weights",  "subclass"))

Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect

imbalance_cem_fiw_2


Multivariate Imbalance Measure: L1=0.219
Percentage of local common support: LCS=100.0%

Univariate Imbalance Measures:

                             statistic   type          L1 min 25% 50% 75% max
marriage_status           2.006001e+00 (Chi2) 0.055723533  NA  NA  NA  NA  NA
total_people_in_household 1.923340e+00 (Chi2) 0.032069684  NA  NA  NA  NA  NA
age_group                 3.474368e+00 (Chi2) 0.039470381  NA  NA  NA  NA  NA
education                 7.140104e+00 (Chi2) 0.099985810  NA  NA  NA  NA  NA
family_income             7.705697e-32 (Chi2) 0.000000000  NA  NA  NA  NA  NA
race                      1.839743e-01 (Chi2) 0.013251394  NA  NA  NA  NA  NA
sex                       8.779728e-01 (Chi2) 0.040867562  NA  NA  NA  NA  NA
working_status            5.372074e-02 (Chi2) 0.002641547  NA  NA  NA  NA  NA

#Compare t-tests of DV on treated - no control variables 
t.test(test_data_new_wif_m$mental_health_status, test_data_new_wif_m$binary_wif)


    Welch Two Sample t-test

data:  test_data_new_wif_m$mental_health_status and test_data_new_wif_m$binary_wif
t = 28.782, df = 2073.7, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 1.955972 2.242008
sample estimates:
mean of x mean of y 
2.5353535 0.4363636

#Estimate Linear Regression on Raw Data
lm1_wif_m<-lm(mental_health_status~ binary_wif+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, 
        test_data_new_wif_m)
summary(lm1_wif_m)


Call:
lm(formula = mental_health_status ~ binary_wif + marriage_status + 
    total_people_in_household + age_group + education + family_income + 
    race + sex + working_status, data = test_data_new_wif_m)

Residuals:
   Min     1Q Median     3Q    Max 
-5.598 -2.226 -1.012  2.324  7.834 

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)                4.32239    0.66156   6.534 8.15e-11 ***
binary_wif                 1.02060    0.13753   7.421 1.72e-13 ***
marriage_status           -0.11247    0.04142  -2.715  0.00668 ** 
total_people_in_household -0.01621    0.04769  -0.340  0.73400    
age_group                 -0.88963    0.08909  -9.985  < 2e-16 ***
education                 -0.09085    0.05517  -1.647  0.09979 .  
family_income             -0.06044    0.04745  -1.274  0.20289    
race                      -0.42634    0.08890  -4.796 1.74e-06 ***
sex                        0.87945    0.13739   6.401 1.93e-10 ***
working_status             0.30306    0.12727   2.381  0.01735 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.005 on 1970 degrees of freedom
Multiple R-squared:  0.1261,    Adjusted R-squared:  0.1221 
F-statistic: 31.58 on 9 and 1970 DF,  p-value: < 2.2e-16

lm2_wif_m<-lm(mental_health_status~ binary_wif+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, 
        data_cem_fiw_2, weights = weights)
summary(lm2_wif_m)


Call:
lm(formula = mental_health_status ~ binary_wif + marriage_status + 
    total_people_in_household + age_group + education + family_income + 
    race + sex + working_status, data = data_cem_fiw_2, weights = weights)

Weighted Residuals:
    Min      1Q  Median      3Q     Max 
-4.9496 -1.6149 -0.7626  0.2902 12.7326 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)                5.200e+01  2.236e+01   2.326 0.020365 *  
binary_wif                -1.749e-14  2.170e-01   0.000 1.000000    
marriage_status            5.088e-02  6.899e-02   0.737 0.461130    
total_people_in_household -2.026e-01  1.075e-01  -1.885 0.059959 .  
age_group                 -1.319e+00  1.689e-01  -7.813 2.54e-14 ***
education                 -1.381e-01  8.818e-02  -1.566 0.117955    
family_income             -3.936e+00  1.867e+00  -2.108 0.035412 *  
race                      -7.630e-01  1.635e-01  -4.667 3.78e-06 ***
sex                        8.236e-01  2.382e-01   3.457 0.000585 ***
working_status             1.593e-01  4.541e-01   0.351 0.725816    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.668 on 596 degrees of freedom
Multiple R-squared:  0.1654,    Adjusted R-squared:  0.1528 
F-statistic: 13.13 on 9 and 596 DF,  p-value: < 2.2e-16

 #Compare results 
plot_summs(lm1_wif_m, lm2_wif_m)

stargazer(lm1_wif_m,lm2_wif_m, type = "text", digits = 3, 
          dep.var.labels = c("Earnings Post Training"),
          column.labels = c("Full Data", "Matched Data"))


==========================================================================
                                        Dependent variable:               
                          ------------------------------------------------
                                       Earnings Post Training             
                                 Full Data              Matched Data      
                                    (1)                      (2)          
--------------------------------------------------------------------------
binary_wif                        1.021***                 -0.000         
                                  (0.138)                  (0.217)        
                                                                          
marriage_status                  -0.112***                  0.051         
                                  (0.041)                  (0.069)        
                                                                          
total_people_in_household          -0.016                  -0.203*        
                                  (0.048)                  (0.108)        
                                                                          
age_group                        -0.890***                -1.319***       
                                  (0.089)                  (0.169)        
                                                                          
education                         -0.091*                  -0.138         
                                  (0.055)                  (0.088)        
                                                                          
family_income                      -0.060                 -3.936**        
                                  (0.047)                  (1.867)        
                                                                          
race                             -0.426***                -0.763***       
                                  (0.089)                  (0.163)        
                                                                          
sex                               0.879***                0.824***        
                                  (0.137)                  (0.238)        
                                                                          
working_status                    0.303**                   0.159         
                                  (0.127)                  (0.454)        
                                                                          
Constant                          4.322***                52.004**        
                                  (0.662)                 (22.360)        
                                                                          
--------------------------------------------------------------------------
Observations                       1,980                     606          
R2                                 0.126                    0.165         
Adjusted R2                        0.122                    0.153         
Residual Std. Error          3.005 (df = 1970)        2.668 (df = 596)    
F Statistic               31.581*** (df = 9; 1970) 13.127*** (df = 9; 596)
==========================================================================
Note:                                          *p<0.1; **p<0.05; ***p<0.01

#######Entropy Balancing
# Create a subset of the dataset with the selected variables
treatment_var_wif_m <- "binary_wif"
covariates_vars_wif_m <- c("marriage_status", "total_people_in_household", "age_group","education", "family_income", "race", "sex", "working_status")
dependent_var_wif_m <- "mental_health_status"

# Prepare treatment and covariates
treatment_wif_m <- test_data_new_wif_m$binary_wif
covariates_wif_m <- test_data_new_wif_m[, covariates_vars_wif_m]

# Run entropy balancing
e_bal_wif_m <- ebalance(
  Treatment = treatment_wif_m,
  X = covariates_wif_m,
  max.iterations = 200,
  constraint.tolerance = 1)

Converged within tolerance

# Add weights back to LL
test_data_new_wif_m$eb_weight_wif_m <- NA
test_data_new_wif_m$eb_weight_wif_m[test_data_new_wif_m[[treatment_var_wif_m]] == 1] <- 1  # Treated units get weight = 1
test_data_new_wif_m$eb_weight_wif_m[test_data_new_wif[[treatment_var_wif_m]] == 0] <- e_bal$w  # Control units get EB weights

# Final data for regression
eb_data_wif_m <- test_data_new_wif_m %>% filter(!is.na(eb_weight_wif_m))  # Exclude unmatched if any

#data for analysis 
#Now have a weight called 'eb_weight' that can be used in analysis 

##Let's check that the two groups are equal now 
eb_data_wif_m %>%
  group_by(binary_wif) %>%
  summarise(
    age_weighted_mean = wtd.mean(age_group, weights = eb_weight_wif_m), 
    age_weighted_variance = wtd.var(age_group, weights = eb_weight_wif_m)
  )

# A tibble: 2 × 3
  binary_wif age_weighted_mean age_weighted_variance
       <dbl>             <dbl>                 <dbl>
1          0              2.11                 0.676
2          1              2.11                 0.584

check_wif_m_1 <- eb_data_wif_m %>%
  group_by(binary_wif) %>%
  summarise(across(.cols = c(marriage_status, total_people_in_household, age_group,education, family_income, race, sex, working_status),
                   .fns = list(
                     weighted_mean = ~wtd.mean(., weights = eb_weight_wif_m),
                     weighted_variance = ~wtd.var(., weights = eb_weight_wif_m)),
                   .names = "{.col}_{.fn}"),
            .groups = "drop")

check_t_wif_m<- round(t(check_wif_m_1), 2)

print(check_t_wif_m)

                                             [,1]  [,2]
binary_wif                                   0.00  1.00
marriage_status_weighted_mean                3.29  3.29
marriage_status_weighted_variance            3.24  3.22
total_people_in_household_weighted_mean      1.89  1.89
total_people_in_household_weighted_variance  2.13  2.02
age_group_weighted_mean                      2.11  2.11
age_group_weighted_variance                  0.68  0.58
education_weighted_mean                      3.17  3.17
education_weighted_variance                  1.62  1.61
family_income_weighted_mean                 11.71 11.71
family_income_weighted_variance              1.53  1.79
race_weighted_mean                           1.49  1.49
race_weighted_variance                       0.57  0.57
sex_weighted_mean                            1.50  1.50
sex_weighted_variance                        0.25  0.25
working_status_weighted_mean                 1.22  1.22
working_status_weighted_variance             0.23  0.27

lm3_wif_m<-lm(mental_health_status~ binary_wif+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, data=eb_data_wif_m, weights = eb_weight_wif_m)
lm4_wif_m<-lm(mental_health_status~ binary_wif+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, data=eb_data_wif_m)

stargazer(lm3_wif_m, lm4_wif_m, type = "text", digits = 3, 
          dep.var.labels = c("Earnings Post Training"),
          column.labels = c("Entropy Balanced", "Raw Data"))


============================================================
                                    Dependent variable:     
                                ----------------------------
                                   Earnings Post Training   
                                Entropy Balanced   Raw Data 
                                       (1)           (2)    
------------------------------------------------------------
binary_wif                          1.040***       1.021*** 
                                     (0.135)       (0.138)  
                                                            
marriage_status                     -0.108***     -0.112*** 
                                     (0.041)       (0.041)  
                                                            
total_people_in_household            -0.011         -0.016  
                                     (0.048)       (0.048)  
                                                            
age_group                           -0.889***     -0.890*** 
                                     (0.092)       (0.089)  
                                                            
education                            -0.092*       -0.091*  
                                     (0.055)       (0.055)  
                                                            
family_income                        -0.064         -0.060  
                                     (0.054)       (0.047)  
                                                            
race                                -0.448***     -0.426*** 
                                     (0.090)       (0.089)  
                                                            
sex                                 0.936***       0.879*** 
                                     (0.138)       (0.137)  
                                                            
working_status                       0.280**       0.303**  
                                     (0.137)       (0.127)  
                                                            
Constant                            4.297***       4.322*** 
                                     (0.733)       (0.662)  
                                                            
------------------------------------------------------------
Observations                          1,980         1,980   
R2                                    0.126         0.126   
Adjusted R2                           0.122         0.122   
Residual Std. Error (df = 1970)       2.811         3.005   
F Statistic (df = 9; 1970)          31.480***     31.581*** 
============================================================
Note:                            *p<0.1; **p<0.05; ***p<0.01

Family interrupt and Mental health status

test_data_new_fiw_m<-test_data_new %>% dplyr::select(-c(job_interfere_with_family,family_interfere_with_job,binary_wif, physical_health_status))

head(test_data_new_fiw)

# A tibble: 6 × 12
  physical_health_status mental_health_status marriage_status
                   <dbl>                <dbl>           <dbl>
1                      8                    8               3
2                      0                    0               3
3                      4                    8               1
4                      2                    8               1
5                      0                    0               5
6                      5                    5               1
# ℹ 9 more variables: total_people_in_household <dbl>, age_group <dbl>,
#   education <dbl>, family_income <dbl>, race <dbl>, sex <dbl>,
#   working_status <dbl>, binary_fiw <dbl>, eb_weight_fiw_1 <dbl>

##What is the impact of employment training on earnings?
#Testing for Imbalance Between Groups
check_fiw_m<-test_data_new_fiw_m %>%
  group_by(binary_fiw) %>%
  summarise_at(vars(mental_health_status,marriage_status,total_people_in_household,age_group,education,family_income,race,sex, working_status), 
               list(mean = mean,
                    var = var))

round(t(check_fiw_m), 3)

                                 [,1]   [,2]
binary_fiw                      0.000  1.000
mental_health_status_mean       2.236  3.180
marriage_status_mean            3.094  3.451
total_people_in_household_mean  1.818  1.992
age_group_mean                  2.172  2.129
education_mean                  3.000  3.145
family_income_mean             11.624 11.619
race_mean                       1.508  1.533
sex_mean                        1.496  1.511
working_status_mean             1.272  1.261
mental_health_status_var        9.687 10.977
marriage_status_var             3.224  3.138
total_people_in_household_var   2.050  2.206
age_group_var                   0.723  0.559
education_var                   1.574  1.662
family_income_var               2.167  2.239
race_var                        0.571  0.616
sex_var                         0.250  0.250
working_status_var              0.290  0.311

match_exact_fiw_m <- matchit(binary_fiw~mental_health_status+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, 
                       method = "exact", data = test_data_new_fiw_m)
data_exact_fiw_m <- match.data(match_exact_fiw_m) #Creates new dataframe that only includes the matched cases
summary(match_exact_fiw_m)


Call:
matchit(formula = binary_fiw ~ mental_health_status + marriage_status + 
    total_people_in_household + age_group + education + family_income + 
    race + sex + working_status, data = test_data_new_fiw_m, 
    method = "exact")

Summary of Balance for All Data:
                          Means Treated Means Control Std. Mean Diff.
mental_health_status             3.1799        2.2359          0.2849
marriage_status                  3.4506        3.0939          0.2014
total_people_in_household        1.9920        1.8180          0.1172
age_group                        2.1290        2.1716         -0.0570
education                        3.1449        3.0000          0.1124
family_income                   11.6194       11.6243         -0.0032
race                             1.5334        1.5081          0.0322
sex                              1.5111        1.4956          0.0312
working_status                   1.2611        1.2722         -0.0198
                          Var. Ratio eCDF Mean eCDF Max
mental_health_status          1.1331    0.1049   0.1517
marriage_status               0.9734    0.0713   0.1093
total_people_in_household     1.0761    0.0226   0.0844
age_group                     0.7738    0.0328   0.0642
education                     1.0559    0.0290   0.0637
family_income                 1.0335    0.0039   0.0126
race                          1.0783    0.0084   0.0226
sex                           1.0004    0.0078   0.0156
working_status                1.0733    0.0124   0.0241

Summary of Balance for Matched Data:
                          Means Treated Means Control Std. Mean Diff.
mental_health_status             2.0694        2.0694               0
marriage_status                  3.6481        3.6481               0
total_people_in_household        1.6481        1.6481               0
age_group                        2.1204        2.1204               0
education                        3.3009        3.3009               0
family_income                   11.9954       11.9954               0
race                             1.4213        1.4213               0
sex                              1.4259        1.4259               0
working_status                   1.0370        1.0370               0
                          Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
mental_health_status          0.9993         0        0               0
marriage_status               0.9993         0        0               0
total_people_in_household     0.9993         0        0               0
age_group                     0.9993         0        0               0
education                     0.9993         0        0               0
family_income                 0.9993         0        0               0
race                          0.9993         0        0               0
sex                           0.9993         0        0               0
working_status                0.9993         0        0               0

Sample Sizes:
              Control Treated
All            1352.      628
Matched (ESS)   188.4     216
Matched         329.      216
Unmatched      1023.      412
Discarded         0.        0

imbalance_exact_fiw_m <- imbalance(group = data_exact_fiw_m$binary_fiw, data = data_exact_fiw_m, 
          drop = c("mental_health_status", "binary_fiw", "weights",  "subclass")) #With matched data, always add weights and subclass here

Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect

imbalance_exact_fiw_m


Multivariate Imbalance Measure: L1=0.237
Percentage of local common support: LCS=100.0%

Univariate Imbalance Measures:

                             statistic   type          L1 min 25% 50% 75% max
marriage_status           2.357093e+00 (Chi2) 0.041286727  NA  NA  NA  NA  NA
total_people_in_household 6.476332e+00 (Chi2) 0.080673759  NA  NA  NA  NA  NA
age_group                 2.769106e+00 (Chi2) 0.063210627  NA  NA  NA  NA  NA
education                 3.274957e+00 (Chi2) 0.073581560  NA  NA  NA  NA  NA
family_income             4.121486e-29 (Chi2) 0.000000000  NA  NA  NA  NA  NA
race                      3.458818e+00 (Chi2) 0.066461218  NA  NA  NA  NA  NA
sex                       0.000000e+00 (Chi2) 0.003433525  NA  NA  NA  NA  NA
working_status            3.217730e-02 (Chi2) 0.006641900  NA  NA  NA  NA  NA

###Match Coarsened Exact 
###Perform the matching here with code that resembles most regressions
match_cem_fiw_m <- matchit(binary_fiw ~mental_health_status+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, 
                       method = "cem", data = test_data_new_fiw_m)
data_cem_fiw_m <- match.data(match_cem_fiw_m) #Creates new dataframe that only includes the matched cases
summary(match_cem_fiw_m)


Call:
matchit(formula = binary_fiw ~ mental_health_status + marriage_status + 
    total_people_in_household + age_group + education + family_income + 
    race + sex + working_status, data = test_data_new_fiw_m, 
    method = "cem")

Summary of Balance for All Data:
                          Means Treated Means Control Std. Mean Diff.
mental_health_status             3.1799        2.2359          0.2849
marriage_status                  3.4506        3.0939          0.2014
total_people_in_household        1.9920        1.8180          0.1172
age_group                        2.1290        2.1716         -0.0570
education                        3.1449        3.0000          0.1124
family_income                   11.6194       11.6243         -0.0032
race                             1.5334        1.5081          0.0322
sex                              1.5111        1.4956          0.0312
working_status                   1.2611        1.2722         -0.0198
                          Var. Ratio eCDF Mean eCDF Max
mental_health_status          1.1331    0.1049   0.1517
marriage_status               0.9734    0.0713   0.1093
total_people_in_household     1.0761    0.0226   0.0844
age_group                     0.7738    0.0328   0.0642
education                     1.0559    0.0290   0.0637
family_income                 1.0335    0.0039   0.0126
race                          1.0783    0.0084   0.0226
sex                           1.0004    0.0078   0.0156
working_status                1.0733    0.0124   0.0241

Summary of Balance for Matched Data:
                          Means Treated Means Control Std. Mean Diff.
mental_health_status             2.0694        2.0694               0
marriage_status                  3.6481        3.6481               0
total_people_in_household        1.6481        1.6481               0
age_group                        2.1204        2.1204               0
education                        3.3009        3.3009               0
family_income                   11.9954       11.9954               0
race                             1.4213        1.4213               0
sex                              1.4259        1.4259               0
working_status                   1.0370        1.0370               0
                          Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
mental_health_status          0.9993         0        0               0
marriage_status               0.9993         0        0               0
total_people_in_household     0.9993         0        0               0
age_group                     0.9993         0        0               0
education                     0.9993         0        0               0
family_income                 0.9993         0        0               0
race                          0.9993         0        0               0
sex                           0.9993         0        0               0
working_status                0.9993         0        0               0

Sample Sizes:
              Control Treated
All            1352.      628
Matched (ESS)   188.4     216
Matched         329.      216
Unmatched      1023.      412
Discarded         0.        0

imbalance_cem_fiw_m <- imbalance(group = data_cem_fiw_m$binary_fiw, data = data_cem_fiw_m, 
          drop = c("mental_health_status", "binary_fiw", "weights",  "subclass"))

Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect
Warning in chisq.test(cbind(t1[keep], t2[keep])): Chi-squared approximation may
be incorrect

imbalance_cem_fiw_m


Multivariate Imbalance Measure: L1=0.237
Percentage of local common support: LCS=100.0%

Univariate Imbalance Measures:

                             statistic   type          L1 min 25% 50% 75% max
marriage_status           2.357093e+00 (Chi2) 0.041286727  NA  NA  NA  NA  NA
total_people_in_household 6.476332e+00 (Chi2) 0.080673759  NA  NA  NA  NA  NA
age_group                 2.769106e+00 (Chi2) 0.063210627  NA  NA  NA  NA  NA
education                 3.274957e+00 (Chi2) 0.073581560  NA  NA  NA  NA  NA
family_income             4.121486e-29 (Chi2) 0.000000000  NA  NA  NA  NA  NA
race                      3.458818e+00 (Chi2) 0.066461218  NA  NA  NA  NA  NA
sex                       0.000000e+00 (Chi2) 0.003433525  NA  NA  NA  NA  NA
working_status            3.217730e-02 (Chi2) 0.006641900  NA  NA  NA  NA  NA

t.test(test_data_new_fiw_m$mental_health_status, test_data_new_fiw_m$binary_fiw)


    Welch Two Sample t-test

data:  test_data_new_fiw_m$mental_health_status and test_data_new_fiw_m$binary_fiw
t = 30.459, df = 2062.4, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 2.075363 2.361000
sample estimates:
mean of x mean of y 
2.5353535 0.3171717

#Estimate Linear Regression on Raw Data
lm1_fiw_m<-lm(mental_health_status~ binary_fiw+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, 
        test_data_new_fiw_m)
summary(lm1_fiw_m)


Call:
lm(formula = mental_health_status ~ binary_fiw + marriage_status + 
    total_people_in_household + age_group + education + family_income + 
    race + sex + working_status, data = test_data_new_fiw_m)

Residuals:
   Min     1Q Median     3Q    Max 
-5.283 -2.201 -1.038  2.282  7.782 

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)                4.48384    0.66220   6.771 1.68e-11 ***
binary_fiw                 0.96660    0.14670   6.589 5.66e-11 ***
marriage_status           -0.12499    0.04169  -2.998  0.00275 ** 
total_people_in_household -0.02861    0.04786  -0.598  0.55000    
age_group                 -0.90521    0.08926 -10.142  < 2e-16 ***
education                 -0.07671    0.05523  -1.389  0.16505    
family_income             -0.04567    0.04758  -0.960  0.33724    
race                      -0.45322    0.08915  -5.084 4.04e-07 ***
sex                        0.86251    0.13782   6.258 4.76e-10 ***
working_status             0.24460    0.12734   1.921  0.05491 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.013 on 1970 degrees of freedom
Multiple R-squared:  0.121, Adjusted R-squared:  0.117 
F-statistic: 30.14 on 9 and 1970 DF,  p-value: < 2.2e-16

#Estimate Linear Regression on Coarsened Exact Matched Data
lm2_fiw_m<-lm(mental_health_status~ binary_fiw+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, 
        data_cem_fiw_m, weights = weights)
summary(lm2_fiw_m)


Call:
lm(formula = mental_health_status ~ binary_fiw + marriage_status + 
    total_people_in_household + age_group + education + family_income + 
    race + sex + working_status, data = data_cem_fiw_m, weights = weights)

Weighted Residuals:
    Min      1Q  Median      3Q     Max 
-6.4630 -1.7506 -0.8811  0.8171 13.6884 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)                1.684e+01  2.174e+01   0.774 0.439110    
binary_fiw                 1.734e-15  2.479e-01   0.000 1.000000    
marriage_status           -1.845e-02  8.051e-02  -0.229 0.818793    
total_people_in_household -3.166e-01  1.064e-01  -2.976 0.003056 ** 
age_group                 -1.647e+00  1.951e-01  -8.442 2.94e-16 ***
education                 -3.496e-02  1.012e-01  -0.346 0.729731    
family_income             -7.855e-01  1.812e+00  -0.433 0.664916    
race                      -9.229e-01  1.711e-01  -5.393 1.04e-07 ***
sex                        8.707e-01  2.586e-01   3.366 0.000817 ***
working_status            -1.040e+00  6.612e-01  -1.573 0.116375    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.831 on 535 degrees of freedom
Multiple R-squared:  0.2282,    Adjusted R-squared:  0.2153 
F-statistic: 17.58 on 9 and 535 DF,  p-value: < 2.2e-16

 #Compare results 
plot_summs(lm1_fiw_m,lm2_fiw_m)

stargazer(lm1_fiw_m, lm2_fiw_m, type = "text", digits = 3, 
          dep.var.labels = c("Earnings Post Training"),
          column.labels = c("Full Data", "Matched Data"))


==========================================================================
                                        Dependent variable:               
                          ------------------------------------------------
                                       Earnings Post Training             
                                 Full Data              Matched Data      
                                    (1)                      (2)          
--------------------------------------------------------------------------
binary_fiw                        0.967***                  0.000         
                                  (0.147)                  (0.248)        
                                                                          
marriage_status                  -0.125***                 -0.018         
                                  (0.042)                  (0.081)        
                                                                          
total_people_in_household          -0.029                 -0.317***       
                                  (0.048)                  (0.106)        
                                                                          
age_group                        -0.905***                -1.647***       
                                  (0.089)                  (0.195)        
                                                                          
education                          -0.077                  -0.035         
                                  (0.055)                  (0.101)        
                                                                          
family_income                      -0.046                  -0.785         
                                  (0.048)                  (1.812)        
                                                                          
race                             -0.453***                -0.923***       
                                  (0.089)                  (0.171)        
                                                                          
sex                               0.863***                0.871***        
                                  (0.138)                  (0.259)        
                                                                          
working_status                     0.245*                  -1.040         
                                  (0.127)                  (0.661)        
                                                                          
Constant                          4.484***                 16.836         
                                  (0.662)                 (21.744)        
                                                                          
--------------------------------------------------------------------------
Observations                       1,980                     545          
R2                                 0.121                    0.228         
Adjusted R2                        0.117                    0.215         
Residual Std. Error          3.013 (df = 1970)        2.831 (df = 535)    
F Statistic               30.139*** (df = 9; 1970) 17.581*** (df = 9; 535)
==========================================================================
Note:                                          *p<0.1; **p<0.05; ***p<0.01

#######Entropy Balancing
# Create a subset of the dataset with the selected variables
treatment_var_fiw_m <- "binary_fiw"
covariates_vars_fiw_m <- c("marriage_status", "total_people_in_household", "age_group","education", "family_income", "race", "sex", "working_status")
dependent_var_fiw_m <- "mental_health_status"

# Prepare treatment and covariates
treatment_fiw_m  <- test_data_new_fiw_m$binary_fiw
covariates_fiw_m <- test_data_new_fiw_m[, covariates_vars_fiw_m]

# Run entropy balancing
e_bal_fiw_m  <- ebalance(
  Treatment = treatment_fiw_m,
  X = covariates_fiw_m,
  max.iterations = 200,
  constraint.tolerance = 1)

Converged within tolerance

test_data_new_fiw_m$eb_weight_fiw_m <- NA
test_data_new_fiw_m$eb_weight_fiw_m[test_data_new_fiw_m[[treatment_var_fiw_m]] == 1] <- 1  # Treated units get weight = 1
test_data_new_fiw_m$eb_weight_fiw_m[test_data_new_fiw_m[[treatment_var_fiw_m]] == 0] <- e_bal_fiw_m$w  # Control units get EB weights
# Final data for regression
eb_data_fiw_m <- test_data_new_fiw_m %>% filter(!is.na(eb_weight_fiw_m))  # Exclude unmatched if any

#data for analysis 
#Now have a weight called 'eb_weight' that can be used in analysis 

##Let's check that the two groups are equal now 
eb_data_fiw_m %>%
  group_by(binary_fiw) %>%
  summarise(
    age_weighted_mean_fiw_m = wtd.mean(age_group, weights = eb_weight_fiw_m), 
    age_weighted_variance_fiw_m = wtd.var(age_group, weights = eb_weight_fiw_m)
  )

# A tibble: 2 × 3
  binary_fiw age_weighted_mean_fiw_m age_weighted_variance_fiw_m
       <dbl>                   <dbl>                       <dbl>
1          0                    2.13                       0.676
2          1                    2.13                       0.559

check_fiw_m <- eb_data_fiw_m %>%
  group_by(binary_fiw) %>%
  summarise(across(.cols = c(marriage_status, total_people_in_household, age_group,education, family_income, race, sex, working_status),
                   .fns = list(
                     weighted_mean_fiw_m = ~wtd.mean(., weights = eb_weight_fiw_m),
                     weighted_variance_fiw_m = ~wtd.var(., weights = eb_weight_fiw_m)),
                   .names = "{.col}_{.fn}"),
            .groups = "drop")

check_t_fiw_m<- round(t(check_fiw_m), 2)

print(check_t_fiw_m)

                                                   [,1]  [,2]
binary_fiw                                         0.00  1.00
marriage_status_weighted_mean_fiw_m                3.45  3.45
marriage_status_weighted_variance_fiw_m            3.12  3.14
total_people_in_household_weighted_mean_fiw_m      1.99  1.99
total_people_in_household_weighted_variance_fiw_m  2.43  2.21
age_group_weighted_mean_fiw_m                      2.13  2.13
age_group_weighted_variance_fiw_m                  0.68  0.56
education_weighted_mean_fiw_m                      3.14  3.14
education_weighted_variance_fiw_m                  1.64  1.66
family_income_weighted_mean_fiw_m                 11.62 11.62
family_income_weighted_variance_fiw_m              2.41  2.24
race_weighted_mean_fiw_m                           1.53  1.53
race_weighted_variance_fiw_m                       0.61  0.62
sex_weighted_mean_fiw_m                            1.51  1.51
sex_weighted_variance_fiw_m                        0.25  0.25
working_status_weighted_mean_fiw_m                 1.26  1.26
working_status_weighted_variance_fiw_m             0.29  0.31

lm3_fiw_m<-lm(mental_health_status~ binary_fiw+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, data=eb_data_fiw_m, weights = eb_weight_fiw_m)
lm4_fiw_m<-lm(mental_health_status~ binary_fiw+marriage_status+ total_people_in_household+age_group+education+family_income+race+sex+working_status, data=eb_data_fiw_m)

stargazer(lm3_fiw_m, lm4_fiw_m, type = "text", digits = 3, 
          dep.var.labels = c("Earnings Post Training"),
          column.labels = c("Entropy Balanced", "Raw Data"))


============================================================
                                    Dependent variable:     
                                ----------------------------
                                   Earnings Post Training   
                                Entropy Balanced   Raw Data 
                                       (1)           (2)    
------------------------------------------------------------
binary_fiw                          0.994***       0.967*** 
                                     (0.137)       (0.147)  
                                                            
marriage_status                     -0.096**      -0.125*** 
                                     (0.042)       (0.042)  
                                                            
total_people_in_household            -0.036         -0.029  
                                     (0.046)       (0.048)  
                                                            
age_group                           -0.939***     -0.905*** 
                                     (0.093)       (0.089)  
                                                            
education                            -0.054         -0.077  
                                     (0.055)       (0.055)  
                                                            
family_income                        -0.022         -0.046  
                                     (0.047)       (0.048)  
                                                            
race                                -0.489***     -0.453*** 
                                     (0.088)       (0.089)  
                                                            
sex                                 0.846***       0.863*** 
                                     (0.139)       (0.138)  
                                                            
working_status                      0.370***        0.245*  
                                     (0.128)       (0.127)  
                                                            
Constant                            4.018***       4.484*** 
                                     (0.661)       (0.662)  
                                                            
------------------------------------------------------------
Observations                          1,980         1,980   
R2                                    0.123         0.121   
Adjusted R2                           0.119         0.117   
Residual Std. Error (df = 1970)       2.419         3.013   
F Statistic (df = 9; 1970)          30.668***     30.139*** 
============================================================
Note:                            *p<0.1; **p<0.05; ***p<0.01