Observing Educational Differences Between Hispanics and Non-Hispanic Whites: College Educational Outcomes
The event variable is whether the respondent earns a bachelor's degree.
The duration (time) variable is the age at which respondents earn their bachelor's degree, measured separately for the 2004, 2010, and 2019 waves.
The censoring indicator flags respondents who have already earned their bachelor's degree by a given wave; respondents who have not yet earned one are treated as censored.
The two groups being compared are Hispanics and Non-Hispanic Whites.
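As an illustration of how these three pieces fit together, here is a minimal sketch that builds a survival object from a duration and an event indicator and then estimates Kaplan-Meier curves by group. The data frame `toy` and its columns `age`, `earned`, and `group` are hypothetical and are not taken from the NLSY97 file. The sketch uses the unweighted `survival` package, which may differ from the survey-weighted approach used for the results in this report.

```r
library(survival)

# Toy data (hypothetical values, not from the NLSY97 file):
# age    = age at BA for completers, age at interview for everyone else
# earned = 1 if the BA was earned by the wave, 0 if still censored
toy <- data.frame(
  age    = c(22, 23, 24, 24, 26, 22.5, 24, 25),
  earned = c(1,  1,  0,  1,  0,  1,    0,  1),
  group  = c("Hispanic", "Hispanic", "Hispanic", "Hispanic",
             "Non-Hispanic White", "Non-Hispanic White",
             "Non-Hispanic White", "Non-Hispanic White")
)

# Surv() pairs each duration with its event indicator;
# censored observations print with a trailing "+"
toy_surv <- with(toy, Surv(age, earned))
print(toy_surv)

# Kaplan-Meier estimate of the share who have not yet earned a BA, by group
toy_km <- survfit(Surv(age, earned) ~ group, data = toy)
summary(toy_km)
```

In this setup the survival curve is the proportion of each group that has not yet earned a bachelor's degree at each age, so a curve that drops earlier means earlier degree completion.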
The survival functions are shown below.
When t-tests were conducted for the survival functions below, the first survival function (2004) showed no statistically significant difference. The other two, for 2010 and 2019, showed a statistically significant difference between Hispanics and Non-Hispanic Whites in the timing of receiving a bachelor's degree. In the statistically significant models, Hispanics tended not to earn their bachelor's degrees as early as Non-Hispanic Whites.
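The code for the tests reported above is not shown in this report. As a rough sketch of how a two-group comparison of survival curves can be run, the example below applies an unweighted log-rank test with `survival::survdiff()` to simulated data. This is a substitute technique for illustration, not necessarily the test used here, and it ignores the survey weights, strata, and PSUs. All variable names, rates, and the censoring age of 30 are assumptions.

```r
library(survival)

set.seed(97)  # arbitrary seed so the toy example is reproducible
n <- 200
toy <- data.frame(
  group = rep(c("Hispanic", "Non-Hispanic White"), each = n / 2)
)

# Hypothetical timing: completion ages drawn from shifted exponentials
toy$age_ba <- 21 + rexp(n, rate = ifelse(toy$group == "Hispanic", 0.10, 0.20))

# Everyone is observed only up to age 30 (administrative censoring)
toy$earned <- as.integer(toy$age_ba <= 30)
toy$age    <- pmin(toy$age_ba, 30)

# Log-rank test of equal survival curves across the two groups
lr <- survdiff(Surv(age, earned) ~ group, data = toy)

# Chi-squared p-value with (number of groups - 1) degrees of freedom
p_value <- 1 - pchisq(lr$chisq, df = length(lr$n) - 1)

print(lr)
p_value
```

A small p-value would suggest that the timing of degree completion differs between the two simulated groups. With complex survey data such as the NLSY97, a design-based test would be the more appropriate choice.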
In the Kaplan-Meier survival analysis, the first plot (2004) shows that Hispanics had a slight advantage over Non-Hispanic Whites, earning bachelor's degrees at slightly higher proportions between the ages of 20 and 22.
In the 2010 cohort, the differences begin to widen, with Non-Hispanic Whites earning their bachelor's degrees at much higher proportions than Hispanics between the ages of 18 and 35. The two groups come close to equal again in the late 20s and early 30s.
In the 2019 cohort, the educational differences widen even more, with Non-Hispanic Whites tending to earn their bachelor's degrees earlier and at higher proportions than Hispanics. Apart from the early years in the 2004 wave, the two groups were never truly close to each other; the gap widened over the whole duration. The curves come close only around age 40, and even there a difference between the two remains.
Loading required package: grid
Loading required package: Matrix
Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':
expand, pack, unpack
Attaching package: 'survey'
The following object is masked from 'package:graphics':
dotchart
library(ggsurvfit)
library(janitor)
Attaching package: 'janitor'
The following objects are masked from 'package:stats':
chisq.test, fisher.test
Rows: 8984 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (13): ID, SEX, BDATEM, BDATEY, SAMPLETYPE, ETHNICITY, HDEGREE04, HDEGREE...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Bachelor's degree or higher = 1; all lower levels of education are labeled 0
dat97$Bachelors04 <- Recode(dat97$HDEGREE04, recodes = "0:3 = 0; 4:7 = 1; else=NA", as.factor = TRUE)
#tabyl(Bachelors04)
## Bachelor's degree or higher = 1; all lower levels of education are labeled 0
dat97$Bachelors10 <- Recode(dat97$HDEGREE2010, recodes = "0:3 = 0; 4:7 = 1; else=NA", as.factor = TRUE)
## Bachelor's degree or higher in 2010
dat97 %>% tabyl(Bachelors10)
Bachelors10 n percent valid_percent
0 5570 0.6199911 0.7503705
1 1853 0.2062556 0.2496295
<NA> 1561 0.1737533 NA
## Bachelor's degree or higher = 1; all lower levels of education are labeled 0
dat97$Bachelors19 <- Recode(dat97$HDEGREE2019, recodes = "0:3 = 0; 4:7 = 1; else=NA", as.factor = TRUE)
## Bachelor's degree or higher in 2019
dat97 %>% tabyl(Bachelors19)
Bachelors19 n percent valid_percent
0 4772 0.5311665 0.6893961
1 2150 0.2393143 0.3106039
<NA> 2062 0.2295191 NA
## Hispanics are coded as 0 and Non-Hispanic Whites as 1; all other ethnicities are excluded
dat97$Hispanic <- Recode(dat97$ETHNICITY, recodes = "2 = 0; 4 = 1; else=NA", as.factor = TRUE)
## Note: under the recode above, Hispanic == 1 is Non-Hispanic White, so the his1 labels are reversed relative to that coding
dat97$his1 <- as.factor(ifelse(dat97$Hispanic == 1, "Hispanic", "Non Hispanic"))
## Hispanics and Non-Hispanic Whites coded
dat97 %>% tabyl(his1)
his1 n percent valid_percent
Hispanic 4665 0.5192565 0.7104782
Non Hispanic 1901 0.2115984 0.2895218
<NA> 2418 0.2691451 NA
summary(dat97)
ID SEX BDATEM BDATEY
Min. : 1 Min. :1.000 Min. : 1.000 Min. :1980
1st Qu.:2249 1st Qu.:1.000 1st Qu.: 3.000 1st Qu.:1981
Median :4502 Median :1.000 Median : 7.000 Median :1982
Mean :4504 Mean :1.488 Mean : 6.556 Mean :1982
3rd Qu.:6758 3rd Qu.:2.000 3rd Qu.:10.000 3rd Qu.:1983
Max. :9022 Max. :2.000 Max. :12.000 Max. :1984
SAMPLETYPE ETHNICITY HDEGREE04 HDEGREE2010
Min. :0.0000 Min. :1.000 Min. :-5.00 Min. :-5.000
1st Qu.:1.0000 1st Qu.:1.000 1st Qu.: 0.00 1st Qu.: 0.000
Median :1.0000 Median :4.000 Median : 2.00 Median : 2.000
Mean :0.7511 Mean :2.788 Mean : 0.66 Mean : 1.045
3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.: 2.00 3rd Qu.: 3.000
Max. :1.0000 Max. :4.000 Max. : 7.00 Max. : 7.000
HDEGREE2019 VSTRAT VPSU samplingweight
Min. :-5.00 Min. : 1.00 Min. :1.00 Min. : 0
1st Qu.: 0.00 1st Qu.: 21.00 1st Qu.:1.00 1st Qu.: 0
Median : 2.00 Median : 41.00 Median :1.00 Median : 0
Mean : 0.85 Mean : 46.56 Mean :1.49 Mean : 215700
3rd Qu.: 3.00 3rd Qu.: 65.00 3rd Qu.:2.00 3rd Qu.: 518286
Max. : 7.00 Max. :117.00 Max. :2.00 Max. :2773108
DATEBA Bachelors04 Bachelors10 Bachelors19 Hispanic
Min. : -4.00 0 :6867 0 :5570 0 :4772 0 :1901
1st Qu.: -4.00 1 : 568 1 :1853 1 :2150 1 :4665
Median : -4.00 NA's:1549 NA's:1561 NA's:2062 NA's:2418
Mean : 81.82
3rd Qu.:269.00
Max. :481.00
his1
Hispanic :4665
Non Hispanic:1901
NA's :2418
summary(dat97$DATEBA)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-4.00 -4.00 -4.00 81.82 269.00 481.00
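dat97 <- dat97 %>% filter(DATEBA > 0)
## HDEGREE04 == 2: censored because they don't have a bachelor's degree yet, so use age in 2004
## HDEGREE04 == 4: use the date the bachelor's degree was received (months converted to years)
dat97$BAYR <- ifelse(dat97$HDEGREE04 == 2, (2004 - dat97$BDATEY),
                     ifelse(dat97$HDEGREE04 == 4, dat97$DATEBA / 12, NA))
summary(dat97$BAYR)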
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
20.00 21.00 22.00 22.04 23.42 36.08 369
## For the wave of 2004
## HDEGREE2010 == 2: censored because they don't have a bachelor's degree yet, so use age in 2010
dat97$BAYR1 <- ifelse(dat97$HDEGREE2010 == 2, (2010 - dat97$BDATEY),
                      ifelse(dat97$HDEGREE2010 == 4, dat97$DATEBA / 12, NA))
summary(dat97$BAYR1)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
21.42 25.33 26.42 26.48 28.00 36.08 663
## For the wave of 2010
## HDEGREE2019 == 2: censored because they don't have a bachelor's degree yet, so use age in 2019
dat97$BAYR2 <- ifelse(dat97$HDEGREE2019 == 2, (2019 - dat97$BDATEY),
                      ifelse(dat97$HDEGREE2019 == 4, dat97$DATEBA / 12, NA))
summary(dat97$BAYR2)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
21.42 25.42 26.92 28.06 29.67 40.08 972
plot(fit.s[[2]], ylim = c(0, 1), xlim = c(18, 25), col = 1, ci = FALSE)
lines(fit.s[[1]], col = 2)
title(main = "Survival Function for Obtaining a Bachelors Between Hispanics and Non Hispanic Whites 2004",
      sub = "Hispanics vs Non-Hispanic Whites")
legend("topright", legend = c("Hispanics", "Non-Hispanic Whites"), col = c(1, 2), lty = 1)
plot(fit.s1[[2]], ylim = c(0, 1), xlim = c(18, 35), col = 1, ci = FALSE)
lines(fit.s1[[1]], col = 2)
title(main = "Survival Function for Obtaining a Bachelors Between Hispanics and Non Hispanic Whites 2010",
      sub = "Hispanics vs Non-Hispanic Whites")
legend("topright", legend = c("Hispanics", "Non-Hispanic Whites"), col = c(1, 2), lty = 1)
plot(fit.s2[[2]], ylim = c(0, 1), xlim = c(18, 45), col = 1, ci = FALSE)
lines(fit.s2[[1]], col = 2)
title(main = "Survival Function for Obtaining a Bachelors Between Hispanics and Non Hispanic Whites 2019",
      sub = "Hispanics vs Non-Hispanic Whites")
legend("topright", legend = c("Hispanics", "Non-Hispanic Whites"), col = c(1, 2), lty = 1)