The purpose of the present research was to understand the differences in citizenship status in estimating the probabilities of poverty by the variation in educational attainment among employed and never married female in . The research question was tested for this purpose. 1. Does the estimated probability of poverty vary in terms of the differences in citizenship status among three different groups of educational attainment.
The sample of this research was consisted of employed and never married female between the age 18 and 45 years. The sample of the present research was nationally representative. The sample was consisted of the American community Survey data 2016, which was available in pumps.org. Although the extracted data consisted of both US census and American community survey from the year 2002 to 2016, only the year 2016 was used for the present analysis.
Education: Education was re coded as 3 categories(Less than high school, High school, College) out of existing codes. Race: The variable race was re coded as 5 broad categories(White, Black, American Indian, Asian, Other) of race out of the existing codes. Chinese, Japanese, and other Asian were re coded under Asian category, all the other races and multiracial are re coded under Other category. Citizen: Citizenship status was re-coding as “yes” for those with neutralized citizenship and those who born abroad of American parents, and “no”" for those who were no a citizen and who were not citizen but received the first paper and “NA” for rest of the codes(Foreign born, citizenship status not reported, NA).
Poverty:Poverty in the data set was recorded as a poverty thresholds. Thresholds range between 1 and 500 was re coded as below poverty thresholds and 501 or more as above poverty thresholds.
Binary logistic regression analysis was conducted by using 5 different models to get the best fitted model. Zelig 4 was used to estimate the probabilities of poverty by the difference in citizenship status among different groups of educational attainment.
getwd()
## [1] "/Users/mzaliny/work"
setwd("/Users/mzaliny/work")
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1 ✔ purrr 0.2.4
## ✔ tibble 1.4.2 ✔ dplyr 0.7.4
## ✔ tidyr 0.8.0 ✔ stringr 1.2.0
## ✔ readr 1.1.1 ✔ forcats 0.2.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
F1<-read_csv("family_ipums.csv")
## Parsed with column specification:
## cols(
## .default = col_integer()
## )
## See spec(...) for full column specifications.
F2<-subset(F1,YEAR=="2016")
head(F2)
## # A tibble: 6 x 28
## YEAR DATANUM SERIAL HHWT GQ PERNUM PERWT FAMSIZE NCHILD ELDCH YNGCH
## <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 2016 1 3 159 1 3 184 4 0 99 99
## 2 2016 1 6 226 1 1 226 2 0 99 99
## 3 2016 1 6 226 1 2 155 2 0 99 99
## 4 2016 1 17 78 1 2 142 3 0 99 99
## 5 2016 1 17 78 1 3 142 3 0 99 99
## 6 2016 1 18 71 1 1 71 3 2 20 16
## # ... with 17 more variables: SEX <int>, AGE <int>, MARST <int>,
## # RACE <int>, RACED <int>, CITIZEN <int>, LANGUAGE <int>,
## # LANGUAGED <int>, EDUC <int>, EDUCD <int>, EMPSTAT <int>,
## # EMPSTATD <int>, UHRSWORK <int>, INCTOT <int>, INCOTHER <int>,
## # POVERTY <int>, DISABWRK <int>
library(tidyverse)
F2<-F2[c(1, 8, 12:15, 17, 20, 22, 27)]
library(tidyverse)
F2<-F2 %>%
mutate(race_r = sjmisc::rec(RACE, rec = "1=1; 2=2; 3=3; 4:6=4; 7:9=5 "))%>%
mutate(citizen_r=sjmisc::rec(CITIZEN, rec = "0=0; 1:2=1; 3:4=2; 5=0"))%>%
mutate(poverty_r=sjmisc::rec(POVERTY, rec = "1:500=1; 501=0"))%>%
mutate(edu_r=sjmisc::rec(EDUC, rec = "0:5=1; 6=2; 7:11=3"))
head(F2)
## # A tibble: 6 x 14
## YEAR FAMSIZE SEX AGE MARST RACE CITIZEN EDUC EMPSTAT POVERTY
## <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 2016 4 1 20 6 1 0 6 3 261
## 2 2016 2 1 25 6 1 0 7 1 154
## 3 2016 2 2 23 6 1 0 8 1 153
## 4 2016 3 2 25 6 2 0 7 1 326
## 5 2016 3 1 24 6 2 0 7 1 326
## 6 2016 3 2 37 6 2 0 7 1 122
## # ... with 4 more variables: race_r <dbl>, citizen_r <dbl>,
## # poverty_r <dbl>, edu_r <dbl>
library(tidyverse)
F2$SEX<-factor(F2$SEX, levels = c(1, 2),
labels = c("Male","Female"))
F2$race_r<-factor(F2$race_r, levels = c(1, 2, 3, 4, 5),
labels = c("White", "Black", "AmericanIndian", "Asian", "Other"))
F2$MARST<-factor(F2$MARST, levels = c(1, 2, 3, 4, 5, 6),
labels = c("Married_SP", "Married_SA","Separated", "Divorced", "Widowed", "NM/single"))
F2$citizen_r<-factor(F2$citizen_r, levels = c(0, 1, 2),
labels = c("NA", "yes", "no"))
F2$EMPSTAT<-factor(F2$EMPSTAT, levels = c(0, 1, 2, 3),
labels = c("NA", "Employed", "Unemployed", "Other" ))
F2$edu_r<-factor(F2$edu_r, levels = c(1, 2, 3),
labels = c("Less than High School", "High School", "College"))
head(F2)
## # A tibble: 6 x 14
## YEAR FAMSIZE SEX AGE MARST RACE CITIZEN EDUC EMPSTAT POVERTY
## <int> <int> <fct> <int> <fct> <int> <int> <int> <fct> <int>
## 1 2016 4 Male 20 NM/single 1 0 6 Other 261
## 2 2016 2 Male 25 NM/single 1 0 7 Employ… 154
## 3 2016 2 Female 23 NM/single 1 0 8 Employ… 153
## 4 2016 3 Female 25 NM/single 2 0 7 Employ… 326
## 5 2016 3 Male 24 NM/single 2 0 7 Employ… 326
## 6 2016 3 Female 37 NM/single 2 0 7 Employ… 122
## # ... with 4 more variables: race_r <fct>, citizen_r <fct>,
## # poverty_r <dbl>, edu_r <fct>
library(tidyverse)
F2<-F2[c(3:5,9,11:14)]
head(F2)
## # A tibble: 6 x 8
## SEX AGE MARST EMPSTAT race_r citizen_r poverty_r edu_r
## <fct> <int> <fct> <fct> <fct> <fct> <dbl> <fct>
## 1 Male 20 NM/single Other White NA 1.00 High School
## 2 Male 25 NM/single Employed White NA 1.00 College
## 3 Female 23 NM/single Employed White NA 1.00 College
## 4 Female 25 NM/single Employed Black NA 1.00 College
## 5 Male 24 NM/single Employed Black NA 1.00 College
## 6 Female 37 NM/single Employed Black NA 1.00 College
library(dplyr)
Female.data <- subset(F2, SEX=="Female" & EMPSTAT=="Employed",
select=SEX:edu_r)
head(Female.data)
## # A tibble: 6 x 8
## SEX AGE MARST EMPSTAT race_r citizen_r poverty_r edu_r
## <fct> <int> <fct> <fct> <fct> <fct> <dbl> <fct>
## 1 Female 23 NM/single Employed White NA 1.00 College
## 2 Female 25 NM/single Employed Black NA 1.00 College
## 3 Female 37 NM/single Employed Black NA 1.00 College
## 4 Female 28 NM/single Employed Black NA 1.00 College
## 5 Female 18 NM/single Employed White NA 1.00 High School
## 6 Female 26 NM/single Employed White NA 1.00 Less than Hi…
library(tidyverse)
Female.data<-Female.data[-which(Female.data$citizen_r == "NA"), ]
head(Female.data)
## # A tibble: 6 x 8
## SEX AGE MARST EMPSTAT race_r citizen_r poverty_r edu_r
## <fct> <int> <fct> <fct> <fct> <fct> <dbl> <fct>
## 1 Female 27 NM/single Employed Asian yes 1.00 College
## 2 Female 22 NM/single Employed Black yes NA College
## 3 Female 43 NM/single Employed White yes 1.00 College
## 4 Female 31 NM/single Employed Black no 1.00 College
## 5 Female 31 NM/single Employed Black no 1.00 College
## 6 Female 31 NM/single Employed Asian no 0 College
library(tidyverse)
Female.data<-na.omit(Female.data)
head(Female.data)
## # A tibble: 6 x 8
## SEX AGE MARST EMPSTAT race_r citizen_r poverty_r edu_r
## <fct> <int> <fct> <fct> <fct> <fct> <dbl> <fct>
## 1 Female 27 NM/single Employed Asian yes 1.00 College
## 2 Female 43 NM/single Employed White yes 1.00 College
## 3 Female 31 NM/single Employed Black no 1.00 College
## 4 Female 31 NM/single Employed Black no 1.00 College
## 5 Female 31 NM/single Employed Asian no 0 College
## 6 Female 31 NM/single Employed Black no 1.00 College
m0f<-glm(poverty_r~citizen_r,family = binomial, data = Female.data)
summary(m0f)
##
## Call:
## glm(formula = poverty_r ~ citizen_r, family = binomial, data = Female.data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0647 0.5026 0.5026 0.8093 0.8093
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.94820 0.02449 38.72 <2e-16 ***
## citizen_rno 1.05691 0.04139 25.53 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 16766 on 16869 degrees of freedom
## Residual deviance: 16067 on 16868 degrees of freedom
## AIC: 16071
##
## Number of Fisher Scoring iterations: 4
m1f<-glm(poverty_r~citizen_r+edu_r+race_r,family = binomial, data = Female.data)
summary(m1f)
##
## Call:
## glm(formula = poverty_r ~ citizen_r + edu_r + race_r, family = binomial,
## data = Female.data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.7982 0.2628 0.4821 0.7833 1.0749
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.57031 0.12225 21.025 < 2e-16 ***
## citizen_rno 0.77849 0.04338 17.946 < 2e-16 ***
## edu_rHigh School -0.74569 0.12781 -5.834 5.40e-09 ***
## edu_rCollege -1.80087 0.12073 -14.917 < 2e-16 ***
## race_rBlack 0.24212 0.06990 3.464 0.000533 ***
## race_rAmericanIndian 0.87846 0.43395 2.024 0.042938 *
## race_rAsian -0.52350 0.04674 -11.200 < 2e-16 ***
## race_rOther 0.54590 0.07068 7.724 1.13e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 16766 on 16869 degrees of freedom
## Residual deviance: 14857 on 16862 degrees of freedom
## AIC: 14873
##
## Number of Fisher Scoring iterations: 6
m2f<-glm(poverty_r~citizen_r*edu_r+race_r,family = binomial, data = Female.data)
summary(m2f)
##
## Call:
## glm(formula = poverty_r ~ citizen_r * edu_r + race_r, family = binomial,
## data = Female.data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.9568 0.2079 0.5042 0.7740 1.0550
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.96568 0.16172 12.155 < 2e-16 ***
## citizen_rno 1.85784 0.23993 7.743 9.70e-15 ***
## edu_rHigh School -0.27156 0.17168 -1.582 0.113692
## edu_rCollege -1.16434 0.16248 -7.166 7.73e-13 ***
## race_rBlack 0.25072 0.06974 3.595 0.000324 ***
## race_rAmericanIndian 0.88967 0.43383 2.051 0.040293 *
## race_rAsian -0.50647 0.04667 -10.853 < 2e-16 ***
## race_rOther 0.53520 0.07078 7.561 3.99e-14 ***
## citizen_rno:edu_rHigh School -0.74164 0.26227 -2.828 0.004687 **
## citizen_rno:edu_rCollege -1.19594 0.24470 -4.887 1.02e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 16766 on 16869 degrees of freedom
## Residual deviance: 14821 on 16860 degrees of freedom
## AIC: 14841
##
## Number of Fisher Scoring iterations: 6
m3f<-glm(poverty_r~citizen_r*race_r+edu_r,family = binomial, data = Female.data)
summary(m3f)
##
## Call:
## glm(formula = poverty_r ~ citizen_r * race_r + edu_r, family = binomial,
## data = Female.data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.8346 0.2614 0.4515 0.7703 1.0624
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.51703 0.12532 20.085 < 2e-16 ***
## citizen_rno 0.84238 0.06816 12.359 < 2e-16 ***
## race_rBlack 0.31330 0.08292 3.778 0.000158 ***
## race_rAmericanIndian 0.81611 0.54331 1.502 0.133067
## race_rAsian -0.47303 0.05775 -8.191 2.60e-16 ***
## race_rOther 0.48421 0.08911 5.434 5.52e-08 ***
## edu_rHigh School -0.72063 0.12827 -5.618 1.93e-08 ***
## edu_rCollege -1.76726 0.12152 -14.543 < 2e-16 ***
## citizen_rno:race_rBlack -0.24838 0.15287 -1.625 0.104203
## citizen_rno:race_rAmericanIndian 0.16306 0.90958 0.179 0.857729
## citizen_rno:race_rAsian -0.14675 0.09721 -1.510 0.131152
## citizen_rno:race_rOther 0.15554 0.14717 1.057 0.290562
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 16766 on 16869 degrees of freedom
## Residual deviance: 14850 on 16858 degrees of freedom
## AIC: 14874
##
## Number of Fisher Scoring iterations: 6
m4f<-glm(poverty_r~citizen_r*(edu_r+race_r),family = binomial, data = Female.data)
summary(m4f)
##
## Call:
## glm(formula = poverty_r ~ citizen_r * (edu_r + race_r), family = binomial,
## data = Female.data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.9752 0.2089 0.4834 0.7589 1.0514
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.96181 0.16263 12.063 < 2e-16 ***
## citizen_rno 1.85216 0.24409 7.588 3.25e-14 ***
## edu_rHigh School -0.27555 0.17163 -1.606 0.108377
## edu_rCollege -1.17166 0.16252 -7.210 5.62e-13 ***
## race_rBlack 0.30724 0.08254 3.722 0.000197 ***
## race_rAmericanIndian 0.83765 0.54056 1.550 0.121240
## race_rAsian -0.48628 0.05751 -8.455 < 2e-16 ***
## race_rOther 0.49579 0.08866 5.592 2.24e-08 ***
## citizen_rno:edu_rHigh School -0.71914 0.26261 -2.738 0.006174 **
## citizen_rno:edu_rCollege -1.15428 0.24664 -4.680 2.87e-06 ***
## citizen_rno:race_rBlack -0.20549 0.15345 -1.339 0.180540
## citizen_rno:race_rAmericanIndian 0.13892 0.91045 0.153 0.878730
## citizen_rno:race_rAsian -0.06093 0.09842 -0.619 0.535853
## citizen_rno:race_rOther 0.10400 0.14778 0.704 0.481586
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 16766 on 16869 degrees of freedom
## Residual deviance: 14818 on 16856 degrees of freedom
## AIC: 14846
##
## Number of Fisher Scoring iterations: 6
lmtest::lrtest(m0f, m1f,m2f, m3f, m4f)
## Likelihood ratio test
##
## Model 1: poverty_r ~ citizen_r
## Model 2: poverty_r ~ citizen_r + edu_r + race_r
## Model 3: poverty_r ~ citizen_r * edu_r + race_r
## Model 4: poverty_r ~ citizen_r * race_r + edu_r
## Model 5: poverty_r ~ citizen_r * (edu_r + race_r)
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 2 -8033.4
## 2 8 -7428.6 6 1209.637 < 2.2e-16 ***
## 3 10 -7410.3 2 36.610 1.122e-08 ***
## 4 12 -7425.2 2 29.671 3.606e-07 ***
## 5 14 -7408.8 2 32.696 7.946e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(Zelig)
## Loading required package: survival
##
## Attaching package: 'Zelig'
## The following object is masked from 'package:purrr':
##
## reduce
zfemale <- zelig(poverty_r~citizen_r*edu_r+race_r, model = "logit", data = Female.data, cite = F)
summary(zfemale)
## Model:
##
## Call:
## z5$zelig(formula = poverty_r ~ citizen_r * edu_r + race_r, data = Female.data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.9568 0.2079 0.5042 0.7740 1.0550
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.96568 0.16172 12.155 < 2e-16
## citizen_rno 1.85784 0.23993 7.743 9.70e-15
## edu_rHigh School -0.27156 0.17168 -1.582 0.113692
## edu_rCollege -1.16434 0.16248 -7.166 7.73e-13
## race_rBlack 0.25072 0.06974 3.595 0.000324
## race_rAmericanIndian 0.88967 0.43383 2.051 0.040293
## race_rAsian -0.50647 0.04667 -10.853 < 2e-16
## race_rOther 0.53520 0.07078 7.561 3.99e-14
## citizen_rno:edu_rHigh School -0.74164 0.26227 -2.828 0.004687
## citizen_rno:edu_rCollege -1.19594 0.24470 -4.887 1.02e-06
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 16766 on 16869 degrees of freedom
## Residual deviance: 14821 on 16860 degrees of freedom
## AIC: 14841
##
## Number of Fisher Scoring iterations: 6
##
## Next step: Use 'setx' method
x.cy <- setx(zfemale, citizen_r="yes")
x.cn <- setx(zfemale, citizen_r="no")
s.citizen <- sim(zfemale, x = x.cy, x1=x.cn)
summary(s.citizen)
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 0.6901251 0.008076961 0.6903511 0.6738736 0.7058212
## pv
## 0 1
## [1,] 0.303 0.697
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 0.8116545 0.007255409 0.8116289 0.796806 0.8255204
## pv
## 0 1
## [1,] 0.192 0.808
## fd
## mean sd 50% 2.5% 97.5%
## [1,] 0.1215293 0.008519054 0.1215659 0.1042979 0.1384006
plot(s.citizen)
xlessh <- setx(zfemale, citizen_r = "yes", edu_r = "Less than High School")
x1lessh <- setx(zfemale, citizen_r = "no", edu_r = "Less than High School")
slessh <- sim(zfemale, x = xlessh, x1 = x1lessh)
plot(slessh)
xhigh <- setx(zfemale, citizen_r = "yes", edu_r = "High School")
x1high <- setx(zfemale, citizen_r = "no", edu_r = "High School")
shigh <- sim(zfemale, x = xhigh, x1 = x1high)
plot(shigh)
xcollege <- setx(zfemale, citizen_r = "yes", edu_r = "College")
x1college <- setx(zfemale, citizen_r = "no", edu_r = "College")
scollege <- sim(zfemale, x = xcollege, x1 = x1college)
plot(scollege)
fd1 <- slessh$get_qi(xvalue="x1", qi="fd")
fd2 <- shigh$get_qi(xvalue="x1", qi="fd")
fd3 <- scollege$get_qi(xvalue="x1", qi="fd")
edufd <- as.data.frame(cbind(fd1, fd2, fd3))
head(edufd)
## V1 V2 V3
## 1 0.09180455 0.10272978 0.1279951
## 2 0.10987893 0.10719154 0.1046025
## 3 0.12549160 0.10324767 0.1222664
## 4 0.07864366 0.08847176 0.1304338
## 5 0.10662803 0.10324674 0.1253150
## 6 0.11110825 0.10268249 0.1150790
library(tidyr)
tidd <- edufd %>%
gather(edu_r, simv, 1:3)
head(tidd)
## edu_r simv
## 1 V1 0.09180455
## 2 V1 0.10987893
## 3 V1 0.12549160
## 4 V1 0.07864366
## 5 V1 0.10662803
## 6 V1 0.11110825
library(dplyr)
tidd %>%
group_by(edu_r) %>%
summarise(mean = mean(simv), sd = sd(simv))
## # A tibble: 3 x 3
## edu_r mean sd
## <chr> <dbl> <dbl>
## 1 V1 0.102 0.0175
## 2 V2 0.0989 0.00960
## 3 V3 0.122 0.00903
library(ggplot2)
ggplot(tidd, aes(simv)) + geom_histogram() + facet_grid(~edu_r)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The results suggested that the probability of poverty can be estimated as 0.102 by the differences between citizenship status within female group of less than high school education. The probabilities were 0.099 and 0.122 for female group with high school education and college education respectively. So, the probability of estimating poverty by the differences in citizenship status was highest among female with college education than other groups of educational attainment.