Proposal Draft EHA
Bachelors Attainment Between Hispanics and Non Hispanic Whites - An Analysis
Introduction
This analysis examines differences in educational attainment between Hispanics and Non Hispanic whites. Educational attainment is measured by whether the respondent earned a Bachelors degree. It is hypothesized that:
H1) Non Hispanic whites are at greater risk of earning a Bachelors degree than Hispanics.
The data come from the National Longitudinal Survey of Youth 1997 (NLSY97) cohort. Three waves of data are used: 2004, 2010, and 2019. In 1997 many of the respondents were still in high school; they are followed through high school graduation and through the rest of their life course. The sample is nationally representative.
The study begins with the 2004 wave, a few years after the survey started, to allow time for respondents to finish high school and be able to earn a Bachelors degree. I present three sets of survival curves, one for each wave of data, to examine the cross-sectional relationship between ethnicity and the age at which the respondent earned a Bachelors degree. Next, a series of Cox regression models is estimated to examine possible interaction effects on the risk of earning a Bachelors degree between Hispanics and Non Hispanic whites. The covariates are sex, whether the father earned a Bachelors degree, whether the mother earned a Bachelors degree, the respondent's perceived skin tone, and the respondent's region of residence in 2019.
The Variables
library(car)        # Recode()
library(dplyr)      # filter(), rename(), select(), mutate(), lead(), %>%
library(tidyr)      # pivot_longer()
library(survival)   # Surv(), survfit(), cox.zph()
library(survey)     # svydesign(), svycoxph()
library(ggsurvfit)  # ggsurvfit()
myvars <- c("ID", "HDEGREE04", "HDEGREE2010", "HDEGREE2019", "ETHNICITY",
            "SEX", "BIOFTHIGD", "BIOMTHIGD", "BDATEY", "VSTRAT", "VPSU",
            "samplingweight", "DATEBA", "Rskintone", "Rregion2019")
dat97 <- dat97[, myvars]
# Filter out missing-data codes (negative values)
dat97 <- dat97 %>%
  filter(HDEGREE04 >= 0, HDEGREE2010 >= 0, HDEGREE2019 >= 0, DATEBA >= 0,
         ETHNICITY >= 0, SEX >= 0, VSTRAT >= 0, Rskintone >= 0, Rregion2019 >= 0)
Time Constant Variables
Year of Bachelors 2004
Bachelors degree or higher = 1 & all lesser educations are labeled 0.
dat97$Bachelors_1 <- Recode(dat97$HDEGREE04, recodes = "0:3 = 0; 4:7 = 1; else=NA", as.factor=T)
Year of Bachelors 2010
Bachelors degree or higher = 1 & all lesser educations are labeled 0.
dat97$Bachelors_2 <- Recode(dat97$HDEGREE2010, recodes = "0:3 = 0; 4:7 = 1; else=NA", as.factor=T)
Year of Bachelors 2019
Bachelors degree or higher = 1 & all lesser educations are labeled 0.
dat97$Bachelors_3 <- Recode(dat97$HDEGREE2019, recodes = "0:3 = 0; 4:7 = 1; else=NA", as.factor=T)
Hispanic
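Hispanics are coded as 0 & Non Hispanic whites are coded as 1; all other ethnicities are excluded.
dat97$Hispanic <- Recode(dat97$ETHNICITY, recodes = "2 = 0; 4 = 1; else=NA", as.factor=T)
# Hispanic == 1 is the Non Hispanic white category under the recode above, so this ifelse()
# assigns the label "Hispanic" to that category; the output below carries these labels.
dat97$his1 <- as.factor(ifelse(dat97$Hispanic==1, "Hispanic", "Non Hispanic"))
Sex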
Men are coded as 0 & Women are coded as 1.
dat97$sex1 <- Recode(dat97$SEX, recodes = "1 = 0; 2 = 1; else=NA", as.factor=T)
dat97$sex11 <- as.factor(ifelse(dat97$sex1==1, "Women", "Men"))
Bachelors Degree of the Father
Dads Bachelors degree or higher = 1 & all lesser educations are labeled 0.
dat97$DADBA <- Recode(dat97$BIOFTHIGD, recodes = "0:15 = 0; 16:20 = 1; else=NA", as.factor=T)
Bachelors Degree of the Mother
Moms Bachelors degree or higher = 1 & all lesser educations are labeled 0.
dat97$MOMBA <- Recode(dat97$BIOMTHIGD, recodes = "0:15 = 0; 16:20 = 1; else=NA", as.factor=T)
Respondents Skin Tone
Respondent's perceived skin tone. Ratings of 0 to 3 (lighter skin tones) are coded as 0, and ratings of 4 to 10 (darker skin tones) are coded as 1.
dat97$skintone <- Recode(dat97$Rskintone, recodes = "0:3 = 0; 4:10 = 1; else=NA", as.factor=T)
Respondents Region of Residence in 2019 living in the Southern US.
Respondent's region of residence in 2019. Regions 1 (Northeast), 2 (North Central), and 4 (West) are coded as 0, and region 3 (South) is coded as 1.
dat97$south <- Recode(dat97$Rregion2019, recodes = "1:2 = 0; 4 = 0; 3 = 1; else=NA", as.factor=T)
Censoring
#dat97<- dat97%>%filter(DATEBA>0)
dat97$BAYR_1 <- ifelse(dat97$HDEGREE04 == 4,
                       2004 - dat97$BDATEY,
                       ifelse(dat97$HDEGREE04 >= 4, dat97$DATEBA / 12, NA)) ## NA: censored because they do not have a Bachelors degree yet
## For the wave of 2004
Time varying variables
dat97$BAYR_2 <- ifelse(dat97$HDEGREE2010 == 2,
                       2010 - dat97$BDATEY,
                       ifelse(dat97$HDEGREE2010 >= 4, dat97$DATEBA / 12, NA)) ## NA: censored because they do not have a Bachelors degree yet
## For the wave of 2010
dat97$BAYR_3 <- ifelse(dat97$HDEGREE2019 == 2,
                       2019 - dat97$BDATEY,
                       ifelse(dat97$HDEGREE2019 >= 4, dat97$DATEBA / 12, NA)) ## NA: censored because they do not have a Bachelors degree yet
## For the wave of 2019
Results
Initial Survival Analysis Observations
dat97<- data.frame(dat97)
fit<-survfit(Surv(time = BAYR_1, event = as.numeric(Bachelors_1) )~his1,
data = dat97)
summary(fit)
Call: survfit(formula = Surv(time = BAYR_1, event = as.numeric(Bachelors_1)) ~
his1, data = dat97)
1178 observations deleted due to missingness
his1=Hispanic
time n.risk n.event survival std.err lower 95% CI upper 95% CI
21.0 270 2 0.993 0.00522 0.982 1.000
22.0 268 50 0.807 0.02400 0.762 0.856
22.5 218 1 0.804 0.02417 0.758 0.853
23.0 217 117 0.370 0.02939 0.317 0.433
23.4 100 1 0.367 0.02933 0.313 0.429
24.0 99 99 0.000 NaN NA NA
his1=Non Hispanic
time n.risk n.event survival std.err lower 95% CI upper 95% CI
21 37 1 0.973 0.0267 0.922 1.000
22 36 3 0.892 0.0510 0.797 0.998
23 33 16 0.459 0.0819 0.324 0.652
24 17 17 0.000 NaN NA NA
fit %>%
ggsurvfit()+
xlim(18, 25)
Warning: Removed 2 row(s) containing missing values (geom_path).
## Wave of 2004
This first survival model covers 2004 and the age interval 18 to 25. It shows essentially no difference between Hispanics and Non Hispanic whites in the risk of earning a Bachelors degree.
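The survival column in the summary(fit) output above follows the Kaplan-Meier product-limit rule, in which the estimate at each time is the running product of (1 - n.event / n.risk). As a quick check, the sketch below recomputes that column for the first printed group from its n.risk and n.event values. The object names are chosen here for illustration only.

```r
# n.risk and n.event copied from the first group in the 2004 summary(fit) output
n_risk  <- c(270, 268, 218, 217, 100, 99)
n_event <- c(2, 50, 1, 117, 1, 99)

# Kaplan-Meier product-limit estimate: cumulative product of (1 - d_i / n_i)
km_surv <- cumprod(1 - n_event / n_risk)
print(round(km_surv, 3))
```

Rounded to three decimals, the result agrees with the survival column printed for that group.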
fit1<-survfit(Surv(time = BAYR_2, event = as.numeric(Bachelors_2) )~his1,
data = dat97)
summary(fit1)
Call: survfit(formula = Surv(time = BAYR_2, event = as.numeric(Bachelors_2)) ~
his1, data = dat97)
359 observations deleted due to missingness
his1=Hispanic
time n.risk n.event survival std.err lower 95% CI upper 95% CI
21.4 918 1 0.99891 0.00109 0.996779 1.0000
22.0 917 3 0.99564 0.00217 0.991391 0.9999
22.1 914 1 0.99455 0.00243 0.989804 0.9993
22.3 913 2 0.99237 0.00287 0.986763 0.9980
22.4 911 22 0.96841 0.00577 0.957161 0.9798
22.5 889 6 0.96187 0.00632 0.949565 0.9743
22.6 883 1 0.96078 0.00641 0.948309 0.9734
22.7 882 1 0.95969 0.00649 0.947056 0.9725
23.0 881 8 0.95098 0.00713 0.937116 0.9651
23.3 873 2 0.94880 0.00727 0.934651 0.9632
23.4 871 74 0.86819 0.01116 0.846582 0.8904
23.5 797 11 0.85621 0.01158 0.833810 0.8792
23.6 786 1 0.85512 0.01162 0.832651 0.8782
23.7 785 6 0.84858 0.01183 0.825710 0.8721
24.0 779 27 0.81917 0.01270 0.794650 0.8445
24.1 752 1 0.81808 0.01273 0.793504 0.8434
24.2 751 2 0.81590 0.01279 0.791215 0.8414
24.3 749 8 0.80719 0.01302 0.782069 0.8331
24.4 741 76 0.72440 0.01475 0.696066 0.7539
24.5 665 4 0.72004 0.01482 0.691578 0.7497
24.6 661 1 0.71895 0.01484 0.690456 0.7486
24.7 660 11 0.70697 0.01502 0.678133 0.7370
24.8 649 1 0.70588 0.01504 0.677014 0.7360
25.0 648 28 0.67538 0.01545 0.645761 0.7064
25.1 620 2 0.67320 0.01548 0.643535 0.7042
25.3 618 11 0.66122 0.01562 0.631301 0.6926
25.4 607 90 0.56318 0.01637 0.531993 0.5962
25.5 517 6 0.55664 0.01640 0.525419 0.5897
25.6 511 2 0.55447 0.01640 0.523229 0.5876
25.7 509 13 0.54031 0.01645 0.509009 0.5735
25.8 496 1 0.53922 0.01645 0.507916 0.5724
26.0 495 35 0.50109 0.01650 0.469767 0.5345
26.1 445 1 0.49996 0.01650 0.468641 0.5334
26.3 444 5 0.49433 0.01651 0.463013 0.5278
26.4 439 97 0.38511 0.01616 0.354697 0.4181
26.5 342 18 0.36484 0.01600 0.334784 0.3976
26.6 324 3 0.36146 0.01597 0.331472 0.3942
26.7 321 9 0.35133 0.01588 0.321544 0.3839
26.8 312 3 0.34795 0.01584 0.318238 0.3804
26.9 309 3 0.34457 0.01581 0.314934 0.3770
27.0 306 44 0.29502 0.01520 0.266688 0.3264
27.1 246 1 0.29382 0.01518 0.265520 0.3251
27.3 245 8 0.28423 0.01506 0.256188 0.3153
27.4 237 62 0.20987 0.01377 0.184553 0.2387
27.5 175 4 0.20508 0.01366 0.179977 0.2337
27.7 171 6 0.19788 0.01349 0.173126 0.2262
27.8 165 2 0.19548 0.01344 0.170846 0.2237
27.9 163 2 0.19308 0.01338 0.168567 0.2212
28.0 161 22 0.16670 0.01268 0.143616 0.1935
28.1 130 1 0.16542 0.01264 0.142402 0.1922
28.2 129 2 0.16285 0.01258 0.139976 0.1895
28.2 127 1 0.16157 0.01254 0.138764 0.1881
28.3 126 1 0.16029 0.01251 0.137553 0.1868
28.4 125 27 0.12567 0.01145 0.105122 0.1502
28.5 98 2 0.12310 0.01135 0.102743 0.1475
28.6 96 1 0.12182 0.01131 0.101555 0.1461
28.7 95 2 0.11925 0.01121 0.099181 0.1434
28.8 93 1 0.11797 0.01117 0.097995 0.1420
28.8 92 1 0.11669 0.01112 0.096811 0.1407
29.0 91 18 0.09361 0.01016 0.075664 0.1158
29.1 59 1 0.09202 0.01011 0.074187 0.1141
29.3 58 1 0.09044 0.01006 0.072713 0.1125
29.4 57 14 0.06822 0.00918 0.052411 0.0888
29.5 43 1 0.06664 0.00910 0.050988 0.0871
29.6 42 2 0.06346 0.00894 0.048153 0.0836
29.7 40 2 0.06029 0.00877 0.045335 0.0802
29.8 38 1 0.05870 0.00868 0.043933 0.0784
29.8 37 3 0.05394 0.00840 0.039754 0.0732
30.0 34 7 0.04284 0.00765 0.030189 0.0608
30.4 18 11 0.01666 0.00575 0.008468 0.0328
30.5 7 1 0.01428 0.00540 0.006805 0.0300
30.7 6 2 0.00952 0.00453 0.003747 0.0242
31.3 4 1 0.00714 0.00397 0.002399 0.0212
31.4 3 1 0.00476 0.00328 0.001231 0.0184
33.5 2 1 0.00238 0.00235 0.000343 0.0165
36.1 1 1 0.00000 NaN NA NA
his1=Non Hispanic
time n.risk n.event survival std.err lower 95% CI upper 95% CI
22.4 208 3 0.9856 0.00827 0.96951 1.0000
22.5 205 1 0.9808 0.00952 0.96228 0.9996
22.6 204 1 0.9760 0.01062 0.95537 0.9970
23.0 203 1 0.9712 0.01161 0.94867 0.9942
23.1 202 1 0.9663 0.01250 0.94215 0.9912
23.2 201 1 0.9615 0.01333 0.93576 0.9880
23.2 200 1 0.9567 0.01411 0.92948 0.9848
23.4 199 6 0.9279 0.01794 0.89339 0.9637
23.5 193 3 0.9135 0.01949 0.87604 0.9525
23.6 190 2 0.9038 0.02044 0.86466 0.9448
23.7 188 2 0.8942 0.02132 0.85340 0.9370
23.9 186 1 0.8894 0.02174 0.84781 0.9331
24.0 185 3 0.8750 0.02293 0.83119 0.9211
24.3 182 2 0.8654 0.02367 0.82022 0.9130
24.4 180 8 0.8269 0.02623 0.77708 0.8800
24.5 172 1 0.8221 0.02652 0.77175 0.8758
25.0 171 4 0.8029 0.02758 0.75060 0.8588
25.2 167 1 0.7981 0.02783 0.74535 0.8545
25.2 166 1 0.7933 0.02808 0.74010 0.8503
25.3 165 1 0.7885 0.02832 0.73487 0.8460
25.4 164 16 0.7115 0.03141 0.65256 0.7758
25.5 148 2 0.7019 0.03172 0.64243 0.7669
25.8 146 2 0.6923 0.03200 0.63234 0.7580
25.8 144 2 0.6827 0.03227 0.62228 0.7490
26.0 142 9 0.6394 0.03329 0.57739 0.7081
26.3 122 1 0.6342 0.03343 0.57193 0.7032
26.4 121 20 0.5294 0.03517 0.46472 0.6030
26.5 101 4 0.5084 0.03531 0.44369 0.5825
26.6 97 1 0.5032 0.03533 0.43846 0.5774
26.7 96 1 0.4979 0.03535 0.43323 0.5722
27.0 95 6 0.4665 0.03537 0.40204 0.5412
27.4 83 11 0.4046 0.03526 0.34112 0.4800
27.6 72 1 0.3990 0.03521 0.33565 0.4744
27.7 71 1 0.3934 0.03516 0.33019 0.4687
27.8 70 1 0.3878 0.03510 0.32474 0.4631
28.0 69 7 0.3484 0.03455 0.28690 0.4232
28.1 56 2 0.3360 0.03442 0.27488 0.4107
28.2 54 1 0.3298 0.03434 0.26890 0.4044
28.2 53 1 0.3236 0.03425 0.26293 0.3982
28.4 52 7 0.2800 0.03336 0.22169 0.3537
28.5 45 1 0.2738 0.03320 0.21587 0.3472
29.0 44 5 0.2427 0.03221 0.18708 0.3148
29.1 32 2 0.2275 0.03193 0.17279 0.2995
29.4 30 3 0.2047 0.03132 0.15171 0.2763
29.5 27 2 0.1896 0.03078 0.13791 0.2606
29.7 25 2 0.1744 0.03013 0.12432 0.2447
29.8 23 2 0.1592 0.02936 0.11096 0.2286
30.0 21 3 0.1365 0.02795 0.09138 0.2039
30.3 11 1 0.1241 0.02803 0.07971 0.1932
30.4 10 6 0.0496 0.02225 0.02061 0.1195
30.6 4 1 0.0372 0.01985 0.01309 0.1059
31.0 3 1 0.0248 0.01667 0.00665 0.0926
32.4 2 1 0.0124 0.01210 0.00184 0.0839
32.6 1 1 0.0000 NaN NA NA
fit1 %>%
ggsurvfit()+
xlim(18, 30)
Warning: Removed 15 row(s) containing missing values (geom_path).
## Wave of 2010
The second survival model covers 2010, with the age interval now 18 to 30. The curves show essentially no difference between ages 18 and about 23. After that age the gap widens, with Hispanics at greater risk of not earning a Bachelors degree than Non Hispanic whites. This pattern was not present in the first survival model for 2004.
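One way to make the widening visible is to compare the two printed survival estimates at a few shared times. The sketch below copies four rows from the summary(fit1) output, keeping the his1 labels exactly as printed, and computes the vertical distance between the curves. The data frame and column names are illustrative.

```r
# Survival estimates copied from the 2010 summary(fit1) output, by printed his1 label
km_2010 <- data.frame(
  time         = c(23.0, 25.0, 27.0, 29.0),
  hispanic     = c(0.95098, 0.67538, 0.29502, 0.09361),  # his1=Hispanic
  non_hispanic = c(0.9712, 0.8029, 0.4665, 0.2427)       # his1=Non Hispanic
)

# Vertical distance between the two curves at each time
km_2010$gap <- km_2010$non_hispanic - km_2010$hispanic
print(round(km_2010, 3))
```

The gap is about 0.02 at time 23.0 and between roughly 0.13 and 0.17 at the later times.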
fit2<-survfit(Surv(time = BAYR_3, event = as.numeric(Bachelors_3) )~his1,
data = dat97)
summary(fit2)
Call: survfit(formula = Surv(time = BAYR_3, event = as.numeric(Bachelors_3)) ~
his1, data = dat97)
303 observations deleted due to missingness
his1=Hispanic
time n.risk n.event survival std.err lower 95% CI upper 95% CI
21.4 959 1 0.99896 0.00104 0.996917 1.00000
22.0 958 3 0.99583 0.00208 0.991758 0.99992
22.1 955 1 0.99479 0.00233 0.990239 0.99935
22.3 954 2 0.99270 0.00275 0.987328 0.99810
22.4 952 22 0.96976 0.00553 0.958982 0.98066
22.5 930 6 0.96350 0.00606 0.951708 0.97545
22.6 924 1 0.96246 0.00614 0.950506 0.97457
22.7 923 1 0.96142 0.00622 0.949306 0.97369
23.0 922 8 0.95308 0.00683 0.939785 0.96655
23.3 914 2 0.95099 0.00697 0.937425 0.96475
23.4 912 74 0.87383 0.01072 0.853062 0.89510
23.5 838 11 0.86236 0.01113 0.840825 0.88444
23.6 827 1 0.86131 0.01116 0.839715 0.88347
23.7 826 6 0.85506 0.01137 0.833064 0.87763
24.0 820 27 0.82690 0.01222 0.803302 0.85120
24.1 793 1 0.82586 0.01225 0.802204 0.85021
24.2 792 2 0.82377 0.01230 0.800010 0.84825
24.3 790 8 0.81543 0.01253 0.791245 0.84036
24.4 782 76 0.73618 0.01423 0.708813 0.76461
24.5 706 4 0.73201 0.01430 0.704510 0.76059
24.6 702 1 0.73097 0.01432 0.703435 0.75958
24.7 701 11 0.71950 0.01451 0.691621 0.74850
24.8 690 1 0.71846 0.01452 0.690548 0.74749
25.0 689 28 0.68926 0.01494 0.660583 0.71918
25.1 661 2 0.68717 0.01497 0.658448 0.71715
25.3 659 11 0.67570 0.01512 0.646717 0.70599
25.4 648 90 0.58186 0.01593 0.551460 0.61393
25.5 558 6 0.57560 0.01596 0.545153 0.60775
25.6 552 2 0.57351 0.01597 0.543052 0.60569
25.7 550 13 0.55996 0.01603 0.529406 0.59227
25.8 537 1 0.55892 0.01603 0.528358 0.59124
26.0 536 35 0.52242 0.01613 0.491743 0.55501
26.1 501 1 0.52138 0.01613 0.490700 0.55397
26.3 500 5 0.51616 0.01614 0.485483 0.54878
26.4 495 97 0.41502 0.01591 0.384974 0.44740
26.5 398 18 0.39625 0.01579 0.366468 0.42844
26.6 380 3 0.39312 0.01577 0.363388 0.42528
26.7 377 9 0.38373 0.01570 0.354157 0.41578
26.8 368 3 0.38060 0.01568 0.351083 0.41261
26.9 365 3 0.37748 0.01565 0.348010 0.40944
27.0 362 44 0.33160 0.01520 0.303099 0.36277
27.1 318 1 0.33055 0.01519 0.302081 0.36171
27.3 317 8 0.32221 0.01509 0.293950 0.35319
27.4 309 62 0.25756 0.01412 0.231319 0.28678
27.5 247 4 0.25339 0.01405 0.227303 0.28247
27.7 243 6 0.24713 0.01393 0.221286 0.27600
27.8 237 2 0.24505 0.01389 0.219282 0.27384
27.9 235 2 0.24296 0.01385 0.217279 0.27168
28.0 233 22 0.22002 0.01338 0.195304 0.24787
28.1 211 1 0.21898 0.01335 0.194308 0.24678
28.2 210 2 0.21689 0.01331 0.192316 0.24461
28.2 208 1 0.21585 0.01329 0.191321 0.24352
28.3 207 1 0.21481 0.01326 0.190325 0.24244
28.4 206 27 0.18665 0.01258 0.163552 0.21302
28.5 179 2 0.18457 0.01253 0.161577 0.21083
28.6 177 1 0.18352 0.01250 0.160590 0.20973
28.7 176 2 0.18144 0.01244 0.158616 0.20755
28.8 174 1 0.18040 0.01242 0.157630 0.20645
28.8 173 1 0.17935 0.01239 0.156644 0.20536
29.0 172 18 0.16058 0.01186 0.138950 0.18559
29.1 154 1 0.15954 0.01182 0.137970 0.18448
29.3 153 1 0.15850 0.01179 0.136991 0.18338
29.4 152 14 0.14390 0.01133 0.123315 0.16792
29.5 138 1 0.14286 0.01130 0.122341 0.16681
29.6 137 2 0.14077 0.01123 0.120395 0.16460
29.7 135 2 0.13869 0.01116 0.118450 0.16238
29.8 133 1 0.13764 0.01113 0.117478 0.16127
29.8 132 3 0.13452 0.01102 0.114564 0.15794
30.0 129 7 0.12722 0.01076 0.107782 0.15015
30.3 122 1 0.12617 0.01072 0.106815 0.14904
30.4 121 11 0.11470 0.01029 0.096208 0.13675
30.5 110 1 0.11366 0.01025 0.095247 0.13563
30.7 109 2 0.11157 0.01017 0.093326 0.13339
31.0 107 6 0.10532 0.00991 0.087577 0.12665
31.3 101 2 0.10323 0.00983 0.085665 0.12440
31.4 99 7 0.09593 0.00951 0.078993 0.11651
31.5 92 1 0.09489 0.00946 0.078043 0.11538
31.6 91 1 0.09385 0.00942 0.077093 0.11424
31.7 90 1 0.09281 0.00937 0.076144 0.11311
31.8 89 2 0.09072 0.00927 0.074247 0.11085
32.0 87 1 0.08968 0.00923 0.073300 0.10971
32.2 86 1 0.08863 0.00918 0.072354 0.10858
32.4 85 3 0.08551 0.00903 0.069519 0.10517
32.5 82 1 0.08446 0.00898 0.068576 0.10403
32.6 81 1 0.08342 0.00893 0.067633 0.10289
32.7 80 2 0.08133 0.00883 0.065750 0.10061
32.9 78 2 0.07925 0.00872 0.063871 0.09833
33.0 76 5 0.07404 0.00845 0.059188 0.09261
33.2 71 1 0.07299 0.00840 0.058254 0.09146
33.3 70 1 0.07195 0.00834 0.057321 0.09031
33.4 69 6 0.06569 0.00800 0.051744 0.08340
33.5 63 3 0.06257 0.00782 0.048971 0.07993
33.6 60 1 0.06152 0.00776 0.048048 0.07877
33.7 59 1 0.06048 0.00770 0.047127 0.07761
34.0 58 3 0.05735 0.00751 0.044372 0.07413
34.3 55 1 0.05631 0.00744 0.043456 0.07296
34.4 54 6 0.05005 0.00704 0.037991 0.06594
34.5 48 1 0.04901 0.00697 0.037085 0.06477
34.6 47 1 0.04797 0.00690 0.036181 0.06359
34.8 46 3 0.04484 0.00668 0.033480 0.06005
35.0 43 4 0.04067 0.00638 0.029905 0.05530
35.4 39 2 0.03858 0.00622 0.028130 0.05292
36.0 37 1 0.03754 0.00614 0.027246 0.05172
36.1 36 1 0.03650 0.00606 0.026365 0.05052
36.2 35 1 0.03545 0.00597 0.025485 0.04932
36.4 34 3 0.03233 0.00571 0.022864 0.04570
36.5 31 2 0.03024 0.00553 0.021131 0.04327
36.7 29 1 0.02920 0.00544 0.020270 0.04206
36.8 28 1 0.02815 0.00534 0.019411 0.04084
36.9 27 1 0.02711 0.00524 0.018557 0.03961
37.0 26 2 0.02503 0.00504 0.016859 0.03715
37.1 24 1 0.02398 0.00494 0.016016 0.03591
37.2 23 1 0.02294 0.00483 0.015178 0.03467
37.4 22 3 0.01981 0.00450 0.012694 0.03092
37.6 19 1 0.01877 0.00438 0.011877 0.02966
37.8 18 1 0.01773 0.00426 0.011067 0.02839
37.9 17 2 0.01564 0.00401 0.009467 0.02584
38.0 15 2 0.01356 0.00373 0.007900 0.02326
38.2 13 1 0.01251 0.00359 0.007132 0.02196
38.4 12 1 0.01147 0.00344 0.006374 0.02064
38.7 11 1 0.01043 0.00328 0.005629 0.01932
38.8 10 1 0.00938 0.00311 0.004898 0.01798
38.9 9 1 0.00834 0.00294 0.004184 0.01663
39.0 8 2 0.00626 0.00255 0.002818 0.01389
39.2 6 1 0.00521 0.00233 0.002175 0.01250
39.4 5 1 0.00417 0.00208 0.001569 0.01109
39.5 4 1 0.00313 0.00180 0.001011 0.00968
39.6 3 1 0.00209 0.00147 0.000522 0.00833
39.7 2 1 0.00104 0.00104 0.000147 0.00740
40.0 1 1 0.00000 NaN NA NA
his1=Non Hispanic
time n.risk n.event survival std.err lower 95% CI upper 95% CI
22.4 223 3 0.98655 0.00771 0.97154 1.0000
22.5 220 1 0.98206 0.00889 0.96480 0.9996
22.6 219 1 0.97758 0.00991 0.95834 0.9972
23.0 218 1 0.97309 0.01084 0.95209 0.9946
23.1 217 1 0.96861 0.01168 0.94599 0.9918
23.2 216 1 0.96413 0.01245 0.94002 0.9888
23.2 215 1 0.95964 0.01318 0.93416 0.9858
23.4 214 6 0.93274 0.01677 0.90043 0.9662
23.5 208 3 0.91928 0.01824 0.88422 0.9557
23.6 205 2 0.91031 0.01913 0.87357 0.9486
23.7 203 2 0.90135 0.01997 0.86304 0.9413
23.9 201 1 0.89686 0.02037 0.85782 0.9377
24.0 200 3 0.88341 0.02149 0.84227 0.9266
24.3 197 2 0.87444 0.02219 0.83201 0.9190
24.4 195 8 0.83857 0.02464 0.79164 0.8883
24.5 187 1 0.83408 0.02491 0.78666 0.8844
25.0 186 4 0.81614 0.02594 0.76685 0.8686
25.2 182 1 0.81166 0.02618 0.76193 0.8646
25.2 181 1 0.80717 0.02642 0.75702 0.8607
25.3 180 1 0.80269 0.02665 0.75212 0.8567
25.4 179 16 0.73094 0.02970 0.67499 0.7915
25.5 163 2 0.72197 0.03000 0.66550 0.7832
25.8 161 2 0.71300 0.03029 0.65604 0.7749
25.8 159 2 0.70404 0.03057 0.64660 0.7666
26.0 157 9 0.66368 0.03164 0.60448 0.7287
26.3 148 1 0.65919 0.03174 0.59983 0.7244
26.4 147 20 0.56951 0.03316 0.50809 0.6383
26.5 127 4 0.55157 0.03330 0.49001 0.6209
26.6 123 1 0.54709 0.03333 0.48550 0.6165
26.7 122 1 0.54260 0.03336 0.48100 0.6121
27.0 121 6 0.51570 0.03347 0.45410 0.5856
27.4 115 11 0.46637 0.03341 0.40528 0.5367
27.6 104 1 0.46188 0.03339 0.40087 0.5322
27.7 103 1 0.45740 0.03336 0.39647 0.5277
27.8 102 1 0.45291 0.03333 0.39208 0.5232
28.0 101 7 0.42152 0.03307 0.36145 0.4916
28.1 94 2 0.41256 0.03297 0.35275 0.4825
28.2 92 1 0.40807 0.03291 0.34841 0.4780
28.2 91 1 0.40359 0.03285 0.34407 0.4734
28.4 90 7 0.37220 0.03237 0.31387 0.4414
28.5 83 1 0.36771 0.03229 0.30957 0.4368
29.0 82 5 0.34529 0.03184 0.28820 0.4137
29.1 77 2 0.33632 0.03164 0.27969 0.4044
29.4 75 3 0.32287 0.03131 0.26698 0.3905
29.5 72 2 0.31390 0.03108 0.25854 0.3811
29.7 70 2 0.30493 0.03083 0.25012 0.3718
29.8 68 2 0.29596 0.03057 0.24173 0.3624
30.0 66 3 0.28251 0.03015 0.22919 0.3482
30.3 63 1 0.27803 0.03000 0.22503 0.3435
30.4 62 6 0.25112 0.02904 0.20019 0.3150
30.6 56 1 0.24664 0.02887 0.19608 0.3102
31.0 55 1 0.24215 0.02869 0.19198 0.3054
31.4 54 1 0.23767 0.02850 0.18788 0.3006
31.5 53 2 0.22870 0.02812 0.17972 0.2910
31.9 51 1 0.22422 0.02793 0.17565 0.2862
32.3 50 2 0.21525 0.02752 0.16753 0.2766
32.4 48 4 0.19731 0.02665 0.15142 0.2571
32.5 44 1 0.19283 0.02642 0.14741 0.2522
32.6 43 1 0.18834 0.02618 0.14342 0.2473
32.8 42 1 0.18386 0.02594 0.13944 0.2424
33.0 41 2 0.17489 0.02544 0.13151 0.2326
33.4 39 1 0.17040 0.02518 0.12756 0.2276
33.5 38 1 0.16592 0.02491 0.12362 0.2227
33.7 37 1 0.16143 0.02464 0.11970 0.2177
33.9 36 1 0.15695 0.02436 0.11579 0.2128
34.0 35 1 0.15247 0.02407 0.11189 0.2078
34.4 34 4 0.13453 0.02285 0.09644 0.1877
34.5 30 1 0.13004 0.02252 0.09261 0.1826
35.0 29 1 0.12556 0.02219 0.08880 0.1775
35.4 28 2 0.11659 0.02149 0.08124 0.1673
35.7 26 1 0.11211 0.02113 0.07749 0.1622
35.8 25 1 0.10762 0.02075 0.07375 0.1571
36.0 24 3 0.09417 0.01956 0.06268 0.1415
36.3 21 1 0.08969 0.01913 0.05904 0.1362
36.4 20 2 0.08072 0.01824 0.05183 0.1257
36.8 18 1 0.07623 0.01777 0.04828 0.1204
37.4 17 2 0.06726 0.01677 0.04126 0.1097
37.7 15 1 0.06278 0.01624 0.03781 0.1042
38.0 14 3 0.04933 0.01450 0.02772 0.0878
38.4 11 2 0.04036 0.01318 0.02128 0.0765
38.6 9 1 0.03587 0.01245 0.01817 0.0708
39.0 8 1 0.03139 0.01168 0.01514 0.0651
39.4 7 2 0.02242 0.00991 0.00943 0.0533
39.5 5 1 0.01794 0.00889 0.00679 0.0474
39.6 4 1 0.01345 0.00771 0.00437 0.0414
39.7 3 1 0.00897 0.00631 0.00226 0.0356
40.0 2 2 0.00000 NaN NA NA
fit2 %>%
ggsurvfit()+
xlim(18, 45)
Warning: Removed 2 row(s) containing missing values (geom_path).
## Wave of 2019
The third survival model covers the age interval 18 to 45. The difference in the risk of earning a Bachelors degree persists, with Hispanics at greater risk of not earning a Bachelors degree than Non Hispanic whites. The two curves converge, and the difference is no longer apparent, by about age 40.
Risk Set
dat97 <- dat97 %>%
  filter(!is.na(Bachelors_1) &
         !is.na(Bachelors_2) &
         !is.na(Bachelors_3) &
         !is.na(BAYR_1) &
         !is.na(BAYR_2) &
         !is.na(BAYR_3) &
         Bachelors_1 != 1)
Pivot
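Before the full reshaping code, a toy example may help show what the wide-to-long pivot and the transition indicator do. The data frame below is made up for illustration (two hypothetical respondents, two waves) and is not NLSY97 data. It also simplifies the lead() call by relying on row order within each ID rather than order_by.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per respondent, one column per variable and wave
toy <- data.frame(
  ID          = c(1, 2),
  his1        = c("Hispanic", "Non Hispanic"),
  BAYR_1      = c(22.0, 23.5),
  BAYR_2      = c(24.0, 25.5),
  Bachelors_1 = c(0, 0),
  Bachelors_2 = c(1, 0)
)

toy_long <- toy %>%
  # ".value" keeps the part before "_" as the column name; the part after becomes `wave`
  pivot_longer(cols = c(-ID, -his1),
               names_to = c(".value", "wave"),
               names_sep = "_") %>%
  group_by(ID) %>%
  # Look ahead one wave and flag a 0 -> 1 change in Bachelors
  mutate(nexBA  = lead(Bachelors, n = 1),
         BAtran = ifelse(nexBA == 1 & Bachelors == 0, 1, 0)) %>%
  ungroup()

print(toy_long)
```

Each respondent now contributes one row per wave, and BAtran equals 1 only on the row that precedes the wave in which the degree is first recorded.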
e.long1 <- dat97 %>%
#rename
rename(wt = samplingweight,strata= VSTRAT, psu = VPSU)%>%
select(ID, psu,wt,strata,his1,sex11,DADBA,MOMBA,DATEBA,south,skintone, #time constant
BAYR_1, BAYR_2, BAYR_3, #t-varying variables
Bachelors_1, Bachelors_2, Bachelors_3)%>%
pivot_longer(cols = c(-ID,-psu, -wt, -strata, -his1, -sex11, -MOMBA, -DADBA, -DATEBA, -south, -skintone), #error here
names_to = c(".value", "wave"), #make wave variable and put t-v vars into columns
names_sep = "_")%>% #all t-v variables have _ between name and time, like age_1, age_2
group_by(ID)%>%
mutate(age_enter = DATEBA,
age_exit = lead(DATEBA, 1, order_by=ID))%>%
mutate(nexBA = dplyr::lead(Bachelors,n=1, order_by = ID))%>%
mutate(BAtran = ifelse(nexBA == 1 & Bachelors == 0, 1, 0))%>%
filter(is.na(age_exit)==F)%>%
ungroup()%>%
filter(complete.cases(age_enter, age_exit, BAtran,
his1, MOMBA, DADBA, sex11, psu, south, skintone))
options(survey.lonely.psu = "adjust")
des2<-svydesign(ids = ~psu, #error here
strata = ~strata,
weights =~wt,
data=e.long1,
nest=T)
Cox Regression Hispanics and Skin Tone
f1 <- survfit(Surv(time = age_enter, event = BAtran)~his1+skintone, data=e.long1)
f1%>%
ggsurvfit()
#Fit the model
fitskintone<-svycoxph(Surv(time = age_enter, event = BAtran)~his1*skintone, design=des2)
summary(fitskintone)
Stratified 1 - level Cluster Sampling design (with replacement)
With (168) clusters.
svydesign(ids = ~psu, strata = ~strata, weights = ~wt, data = e.long1,
nest = T)
Call:
svycoxph(formula = Surv(time = age_enter, event = BAtran) ~ his1 *
skintone, design = des2)
n= 1758, number of events= 638
coef exp(coef) se(coef) robust se z
his1Non Hispanic -0.29195 0.74681 0.13857 0.15224 -1.918
skintone1 0.12395 1.13196 0.29353 0.38821 0.319
his1Non Hispanic:skintone1 0.04464 1.04565 0.44616 0.50153 0.089
Pr(>|z|)
his1Non Hispanic 0.0551 .
skintone1 0.7495
his1Non Hispanic:skintone1 0.9291
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
his1Non Hispanic 0.7468 1.3390 0.5541 1.006
skintone1 1.1320 0.8834 0.5289 2.423
his1Non Hispanic:skintone1 1.0457 0.9563 0.3913 2.794
Concordance= 0.516 (se = 0.006 )
Likelihood ratio test= NA on 3 df, p=NA
Wald test = 3.96 on 3 df, p=0.3
Score (logrank) test = NA on 3 df, p=NA
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
The first Cox regression models the risk of earning a Bachelors degree for Hispanics and Non Hispanic whites in association with skin tone. Because the model includes an interaction term (*), the ethnicity coefficient applies to respondents in the lighter skin tone category (skintone = 0). It is marginally significant (p = 0.055), with a hazard ratio of 0.75: Hispanics have about a 25% lower hazard of earning a Bachelors degree than Non Hispanic whites. Neither skin tone (p = 0.750) nor the interaction between ethnicity and skin tone (p = 0.929) is statistically significant.
Cox Regression Hispanics and region of residency in the South
f2 <- survfit(Surv(time = age_enter, event = BAtran)~his1+south, data=e.long1)
f2%>%
ggsurvfit()
#Fit the model
fitSouth<-svycoxph(Surv(time = age_enter, event = BAtran)~his1*south, design=des2)
summary(fitSouth)
Stratified 1 - level Cluster Sampling design (with replacement)
With (168) clusters.
svydesign(ids = ~psu, strata = ~strata, weights = ~wt, data = e.long1,
nest = T)
Call:
svycoxph(formula = Surv(time = age_enter, event = BAtran) ~ his1 *
south, design = des2)
n= 1758, number of events= 638
coef exp(coef) se(coef) robust se z Pr(>|z|)
his1Non Hispanic -0.28344 0.75319 0.16428 0.19090 -1.485 0.138
south1 -0.09592 0.90854 0.08482 0.10466 -0.916 0.359
his1Non Hispanic:south1 0.05319 1.05463 0.26314 0.23421 0.227 0.820
exp(coef) exp(-coef) lower .95 upper .95
his1Non Hispanic 0.7532 1.3277 0.5181 1.095
south1 0.9085 1.1007 0.7400 1.115
his1Non Hispanic:south1 1.0546 0.9482 0.6664 1.669
Concordance= 0.524 (se = 0.011 )
Likelihood ratio test= NA on 3 df, p=NA
Wald test = 4.84 on 3 df, p=0.2
Score (logrank) test = NA on 3 df, p=NA
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
The second Cox regression models the risk of earning a Bachelors degree for Hispanics and Non Hispanic whites in association with living in the Southern region of the US. None of the variables in the model are statistically significant.
Cox Regression Hispanics only
f3 <- survfit(Surv(time = age_enter, event = BAtran)~his1, data=e.long1)
f3%>%
ggsurvfit()
#Fit the model
fithis<-svycoxph(Surv(time = age_enter, event = BAtran)~his1, design=des2)
summary(fithis)
Stratified 1 - level Cluster Sampling design (with replacement)
With (168) clusters.
svydesign(ids = ~psu, strata = ~strata, weights = ~wt, data = e.long1,
nest = T)
Call:
svycoxph(formula = Surv(time = age_enter, event = BAtran) ~ his1,
design = des2)
n= 1758, number of events= 638
coef exp(coef) se(coef) robust se z Pr(>|z|)
his1Non Hispanic -0.2695 0.7637 0.1282 0.1390 -1.939 0.0526 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
his1Non Hispanic 0.7637 1.309 0.5816 1.003
Concordance= 0.515 (se = 0.006 )
Likelihood ratio test= NA on 1 df, p=NA
Wald test = 3.76 on 1 df, p=0.05
Score (logrank) test = NA on 1 df, p=NA
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
The third Cox regression models the risk of earning a Bachelors degree with ethnicity as the only covariate. The ethnicity coefficient is marginally significant (p = 0.053), with a hazard ratio of 0.76: Hispanics have about a 24% lower hazard of earning a Bachelors degree than their Non Hispanic white counterparts.
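The hazard ratio, confidence interval, and p-value in the output above all follow from the coefficient and its robust standard error. The sketch below reproduces them from those two printed numbers, assuming the usual normal (Wald) approximation. The variable names are illustrative.

```r
coef_his1 <- -0.2695   # coef for his1Non Hispanic, from summary(fithis)
robust_se <- 0.1390    # robust se from the same row

hr      <- exp(coef_his1)                                        # hazard ratio
ci      <- exp(coef_his1 + c(-1, 1) * qnorm(0.975) * robust_se)  # 95% confidence interval
z       <- coef_his1 / robust_se                                 # Wald z statistic
p_value <- 2 * pnorm(-abs(z))                                    # two-sided p-value
pct     <- 100 * (hr - 1)                                        # percent change in hazard

print(round(c(hr = hr, lower = ci[1], upper = ci[2],
              z = z, p = p_value, pct = pct), 4))
```

Up to rounding of the inputs, these values agree with the exp(coef), lower .95, upper .95, z, and Pr(>|z|) entries in the printed summary, and the percent change is where the figure of about 24% comes from.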
Cox Regression Hispanics and Bachelors Degree of the Father
f4 <- survfit(Surv(time = age_enter, event = BAtran)~his1+DADBA, data=e.long1)
f4%>%
ggsurvfit()
#Fit the model
fitDADBA<-svycoxph(Surv(time = age_enter, event = BAtran)~his1*DADBA, design=des2)
summary(fitDADBA)
Stratified 1 - level Cluster Sampling design (with replacement)
With (168) clusters.
svydesign(ids = ~psu, strata = ~strata, weights = ~wt, data = e.long1,
nest = T)
Call:
svycoxph(formula = Surv(time = age_enter, event = BAtran) ~ his1 *
DADBA, design = des2)
n= 1758, number of events= 638
coef exp(coef) se(coef) robust se z Pr(>|z|)
his1Non Hispanic -0.14940 0.86123 0.15119 0.20259 -0.737 0.461
DADBA1 0.38760 1.47344 0.08091 0.07996 4.848 1.25e-06
his1Non Hispanic:DADBA1 -0.25017 0.77867 0.29109 0.33251 -0.752 0.452
his1Non Hispanic
DADBA1 ***
his1Non Hispanic:DADBA1
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
his1Non Hispanic 0.8612 1.1611 0.5790 1.281
DADBA1 1.4734 0.6787 1.2597 1.723
his1Non Hispanic:DADBA1 0.7787 1.2842 0.4058 1.494
Concordance= 0.546 (se = 0.012 )
Likelihood ratio test= NA on 3 df, p=NA
Wald test = 25.43 on 3 df, p=1e-05
Score (logrank) test = NA on 3 df, p=NA
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
The fourth Cox regression models the risk of earning a Bachelors degree for Hispanics and Non Hispanic whites in association with having a father who earned a Bachelors degree. Father's Bachelors degree is statistically significant at the p < 0.001 level (***): respondents whose father has a Bachelors degree have about a 47% higher hazard of earning a Bachelors degree themselves than respondents whose father does not. The variable Hispanic is not statistically significant in this model (p = 0.461), and the interaction between the two variables is also not statistically significant (p = 0.452).
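Because this model includes an interaction term, the DADBA1 coefficient describes the effect of the father's degree only at the reference level of his1; for the other level, the interaction coefficient is added before exponentiating. The sketch below works this out from the printed coefficients, using the his1 labels as they appear in the output. The variable names are illustrative.

```r
# Coefficients copied from summary(fitDADBA)
b_his1  <- -0.14940   # his1Non Hispanic
b_dadba <-  0.38760   # DADBA1
b_inter <- -0.25017   # his1Non Hispanic:DADBA1

# Hazard ratio for DADBA = 1 vs 0 within each printed level of his1
hr_dadba_ref   <- exp(b_dadba)             # reference level of his1
hr_dadba_other <- exp(b_dadba + b_inter)   # level labelled "Non Hispanic"

# Hazard ratio for the his1 contrast within each level of DADBA
hr_his1_dad0 <- exp(b_his1)                # father without a Bachelors degree
hr_his1_dad1 <- exp(b_his1 + b_inter)      # father with a Bachelors degree

print(round(c(dadba_ref = hr_dadba_ref, dadba_other = hr_dadba_other,
              his1_dad0 = hr_his1_dad0, his1_dad1 = hr_his1_dad1), 3))
```

These are point estimates only; the interaction coefficient is not statistically significant, so the difference between the two DADBA hazard ratios should be read with that in mind.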
Cox Regression Hispanics and Bachelors Degree of the Mother
f5 <- survfit(Surv(time = age_enter, event = BAtran)~his1+MOMBA, data=e.long1)
f5%>%
ggsurvfit()
#Fit the model
fitMOMBA<-svycoxph(Surv(time = age_enter, event = BAtran)~his1*MOMBA, design=des2)
summary(fitMOMBA)
Stratified 1 - level Cluster Sampling design (with replacement)
With (168) clusters.
svydesign(ids = ~psu, strata = ~strata, weights = ~wt, data = e.long1,
nest = T)
Call:
svycoxph(formula = Surv(time = age_enter, event = BAtran) ~ his1 *
MOMBA, design = des2)
n= 1758, number of events= 638
coef exp(coef) se(coef) robust se z Pr(>|z|)
his1Non Hispanic -0.11864 0.88813 0.15090 0.16979 -0.699 0.484708
MOMBA1 0.35608 1.42772 0.07984 0.09342 3.811 0.000138
his1Non Hispanic:MOMBA1 -0.39651 0.67266 0.29044 0.32562 -1.218 0.223336
his1Non Hispanic
MOMBA1 ***
his1Non Hispanic:MOMBA1
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
his1Non Hispanic 0.8881 1.1260 0.6367 1.239
MOMBA1 1.4277 0.7004 1.1888 1.715
his1Non Hispanic:MOMBA1 0.6727 1.4866 0.3553 1.273
Concordance= 0.552 (se = 0.012 )
Likelihood ratio test= NA on 3 df, p=NA
Wald test = 21.44 on 3 df, p=9e-05
Score (logrank) test = NA on 3 df, p=NA
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
The fifth Cox regression models the risk of earning a Bachelors degree for Hispanics and Non Hispanic whites in association with having a mother who earned a Bachelors degree. Mother's Bachelors degree is statistically significant at the p < 0.001 level (***): respondents whose mother has a Bachelors degree have about a 43% higher hazard of earning a Bachelors degree themselves than respondents whose mother does not. The variable Hispanic is not statistically significant in this model (p = 0.485), and the interaction between the two variables is also not statistically significant (p = 0.223).
Cox Regression Hispanics and sex
f6 <- survfit(Surv(time = age_enter, event = BAtran)~his1+sex11, data=e.long1)
f6 %>%
  ggsurvfit()
#Fit the model
fitsex<-svycoxph(Surv(time = age_enter, event = BAtran)~his1*sex11, design=des2)
summary(fitsex)
Stratified 1 - level Cluster Sampling design (with replacement)
With (168) clusters.
svydesign(ids = ~psu, strata = ~strata, weights = ~wt, data = e.long1,
nest = T)
Call:
svycoxph(formula = Surv(time = age_enter, event = BAtran) ~ his1 *
sex11, design = des2)
n= 1758, number of events= 638
                                coef exp(coef) se(coef) robust se      z Pr(>|z|)
his1Non Hispanic            -0.31902   0.72686  0.17542   0.20024 -1.593    0.111
sex11Women                  -0.03268   0.96785  0.07875   0.08456 -0.386    0.699
his1Non Hispanic:sex11Women  0.10524   1.11098  0.25660   0.28409  0.370    0.711
                            exp(coef) exp(-coef) lower .95 upper .95
his1Non Hispanic               0.7269     1.3758    0.4909     1.076
sex11Women                     0.9679     1.0332    0.8200     1.142
his1Non Hispanic:sex11Women    1.1110     0.9001    0.6366     1.939
Concordance= 0.506 (se = 0.012 )
Likelihood ratio test= NA on 3 df, p=NA
Wald test = 4.09 on 3 df, p=0.3
Score (logrank) test = NA on 3 df, p=NA
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
The final Cox regression measures the risk of earning a Bachelors degree for Hispanics and Non Hispanic whites in association with the respondent's sex. None of the variables in this model, including the interaction term, is statistically significant, and the overall Wald test is also not significant (p = 0.3).
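The confidence limits in the output can be reproduced from each coefficient and its robust standard error, which makes it easy to see why none of the terms is significant: every interval contains 1. A minimal sketch using the values printed above, assuming a normal approximation for the 95% interval. The object names (`coefs`, `rse`, `ci`, `includes_one`) are illustrative only.

```r
# Values copied from the summary(fitsex) output above
coefs <- c(nonhisp = -0.31902, women = -0.03268, nonhisp_women = 0.10524)
rse   <- c(nonhisp =  0.20024, women =  0.08456, nonhisp_women = 0.28409)

# Normal-approximation 95% interval on the log-hazard scale,
# then exponentiated to the hazard ratio scale
z  <- qnorm(0.975)
ci <- cbind(
  hr    = exp(coefs),
  lower = exp(coefs - z * rse),
  upper = exp(coefs + z * rse)
)
round(ci, 4)

# TRUE for every term whose interval contains 1 (no significant effect)
includes_one <- ci[, "lower"] < 1 & ci[, "upper"] > 1
includes_one
```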
Grambsch and Therneau (1994) Test
Overall, none of the Grambsch and Therneau tests is statistically significant, so there is no evidence that any variable violates the proportional hazards assumption. In other words, none of the variables' effects is correlated with the timing of the transition. This supports the reliability of the Cox regression models presented above.
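For readers unfamiliar with the test, the following minimal sketch shows how its result table is read. It uses the `lung` dataset shipped with the survival package purely for illustration, not the NLSY97 data, and the object names (`fit_demo`, `zph_demo`, `flagged`) are hypothetical. A small p-value for a term would indicate that its effect changes over time.

```r
library(survival)

# Illustrative data only: the lung dataset from the survival package,
# not the NLSY97 data used in this analysis
fit_demo <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Grambsch and Therneau test of the proportional hazards assumption
zph_demo <- cox.zph(fit_demo)
zph_demo$table

# Terms (and the GLOBAL test) with evidence against proportionality at 0.05
flagged <- rownames(zph_demo$table)[zph_demo$table[, "p"] < 0.05]
flagged
```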
skintone
fit.test1<-cox.zph(fitskintone)
fit.test1
                 chisq df    p
his1          5.24e-04  1 0.98
skintone      1.00e-03  1 0.97
his1:skintone 5.83e-05  1 0.99
GLOBAL        1.48e-03  3 1.00
plot(fit.test1, df=2)
South Region Residency
fit.test2<-cox.zph(fitSouth)
fit.test2
              chisq df    p
his1       5.68e-04  1 0.98
south      4.92e-06  1 1.00
his1:south 4.03e-05  1 0.99
GLOBAL     6.11e-04  3 1.00
plot(fit.test2, df=2)
Hispanics
fit.test3<-cox.zph(fithis)
fit.test3
          chisq df    p
his1   0.000536  1 0.98
GLOBAL 0.000536  1 0.98
plot(fit.test3, df=2)
Bachelors Degree for Fathers
fit.test4<-cox.zph(fitDADBA)
fit.test4
              chisq df    p
his1       7.54e-04  1 0.98
DADBA      7.53e-05  1 0.99
his1:DADBA 1.84e-05  1 1.00
GLOBAL     8.16e-04  3 1.00
plot(fit.test4, df=2)
Bachelors Degree for Mothers
fit.test5<-cox.zph(fitMOMBA)
fit.test5
              chisq df    p
his1       7.40e-04  1 0.98
MOMBA      9.06e-06  1 1.00
his1:MOMBA 1.56e-04  1 0.99
GLOBAL     8.64e-04  3 1.00
plot(fit.test5, df=2)
sex
fit.test6<-cox.zph(fitsex)
fit.test6
              chisq df    p
his1       5.62e-04  1 0.98
sex11      2.78e-04  1 0.99
his1:sex11 1.18e-05  1 1.00
GLOBAL     9.25e-04  3 1.00
plot(fit.test6, df=2)
Conclusions
Overall, the hypothesis (h1) was not supported by the Cox regression models. Only in a couple of models was there even a marginal effect between Hispanics and Non Hispanic whites in the risk of earning a Bachelors degree. In model one, it is notable that including skin tone did not remove the marginal effect of the Hispanic variable. This suggests that, even after accounting for respondents' skin tone, a slight disparity between the two groups remains in earning a Bachelors degree over the life course in the US. This is controlled for with the interaction term. The marginal Hispanic effect appears again in the model that includes only the Hispanic variable.
The two variables with the strongest statistical significance were the mother's and the father's educational attainment, each measured as whether the parent earned a Bachelors degree. For both variables, respondents whose parent held a Bachelors degree were over 40% more likely to earn one themselves. The interaction terms between these variables and Hispanic ethnicity were not statistically significant in either model.
The Grambsch and Therneau (1994) tests conducted for each model found no evidence of non-proportional hazards, which supports the reliability of the results.
Although some slight differences in educational attainment between Hispanics and Non Hispanic whites may be found between 2004 and 2019, those differences fade, especially as the cohort ages. The importance of parental educational attainment appears throughout these models for both the mother and the father.
Future research should examine the different pathways of entering college, the quality of education, and occupational attainment after graduation.