HW 3 EHA

Author

Brandon Flores

  1. I am using a Proportional Hazards model because for the data that I am evaluating it conceptually makes sense to observe the amount of hazard between Hispanics and Non Hispanic Whites regarding earning a Bachelors Degree. Observing the amount of time, the respondents age of, earning a Bachelors between the two groups.

  2. The Parametric Distribution I am using is the Piecewise Exponential Model because you are able to have set intervals of observations to obtain a more specified outcomes. As opposed to the other models that assume more of a continuation throughout without much variation.

Analysis

For this analysis added predictors will be added to the model along with the variable Hispanics. The parental education for both the mother and father are added to the models. For parental education it is observed whether or not they earned a Bachelors degree. The sex of the respondent is also added to the model.

When observing the AIC, the model that was the best fit for the data was the Piecewise model with the lowest shown AIC. When observing the regression summary of the Cox model it shows that when controlling for parental education of earning a Bachelors and the respondents sex there is no statistically significant relationship between Hispanics and Non Hispanic Whites. A Cox model was also conducted without those other control variables and a statistical significant result was still not found. This is different with typical results showing a statistically significant result between Hispanics and Non Hispanic Whites concerning educational attainment.

library(car)
Loading required package: carData
library(haven)
library(survival)
library(ggplot2)
library(tidyverse)
── Attaching packages
───────────────────────────────────────
tidyverse 1.3.2 ──
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.2      ✔ forcats 0.5.2 
✔ purrr   0.3.4      
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ dplyr::recode() masks car::recode()
✖ purrr::some()   masks car::some()
library(survey)
Loading required package: grid
Loading required package: Matrix

Attaching package: 'Matrix'

The following objects are masked from 'package:tidyr':

    expand, pack, unpack


Attaching package: 'survey'

The following object is masked from 'package:graphics':

    dotchart
library(ggsurvfit)
library(janitor)

Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test
library(muhaz)
library(eha)
dat97<-read_csv("C:\\Users\\BTP\\Desktop\\97cohortNLSYRnum.csv")
Rows: 8984 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (15): ID, SEX, BDATEM, BDATEY, SAMPLETYPE, ETHNICITY, HDEGREE04, HDEGREE...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
myvars<-c( "ID","HDEGREE04", "HDEGREE2010", "HDEGREE2019","ETHNICITY",
           "SEX", "BIOFTHIGD", "BIOMTHIGD", "BDATEY","VSTRAT", "VPSU",
           "samplingweight","DATEBA")

dat97<-dat97[,myvars]

Time Constant Variables

dat97$Bachelors_1 <-Recode(dat97$HDEGREE04, recodes = "0:3 = 0; 4:7 = 1; else=NA", as.factor=T)

## Bachelors degree or higher = 1 & all lesser educations are labled 0
dat97$Bachelors_2 <-Recode(dat97$HDEGREE2010, recodes = "0:3 = 0; 4:7 = 1; else=NA", as.factor=T)

## Bachelors degree or higher = 1 & all lesser educations are labled 0

## Bachelors degree or higher in 2010
dat97$Bachelors_3 <-Recode(dat97$HDEGREE2019, recodes = "0:3 = 0; 4:7 = 1; else=NA", as.factor=T)

## Bachelors degree or higher = 1 & all lesser educations are labled 0

## Bachelors degree or higher in 2019
dat97$Hispanic<-Recode(dat97$ETHNICITY, recodes = "2 = 0; 4 = 1; else=NA", as.factor=T)

## Hispanics are coded as 0 & Non Hipanic whites are coded as 1, all other ethnicities are excluded

dat97$his1<-as.factor(ifelse(dat97$Hispanic==1, "Hispanic", "Non Hispanic"))
dat97$sex1<-Recode(dat97$SEX, recodes = "1 = 0; 2 = 1; else=NA", as.factor=T)

## Women are coded as 0 & Men  are coded as 1

dat97$sex11<-as.factor(ifelse(dat97$sex1==1, "Women", "Men"))
dat97$DADBA <-Recode(dat97$BIOFTHIGD, recodes = "0:15 = 0; 16:20 = 1; else=NA", as.factor=T)

## Dads Bachelors degree or higher = 1 & all lesser educations are labled 0

## Dads Bachelors degree or higher 
dat97$MOMBA <-Recode(dat97$BIOMTHIGD, recodes = "0:15 = 0; 16:20 = 1; else=NA", as.factor=T)

## Moms Bachelors degree or higher = 1 & all lesser educations are labled 0

## Moms Bachelors degree or higher

Censoring

dat97<- dat97%>%filter(DATEBA>0)

dat97$BAYR_1<-ifelse(dat97$HDEGREE04==2,
                   (2004-dat97$BDATEY),
                   ifelse(dat97$HDEGREE04==4,dat97$DATEBA/12,NA)) ## For Censored because they dont have a bachelors degree yet

                  ## For the wave of 2004

Time varying variables

dat97$BAYR_2<-ifelse(dat97$HDEGREE2010==2,
                   (2010-dat97$BDATEY),
                   ifelse(dat97$HDEGREE2010==4,dat97$DATEBA/12,NA)) ## For Censored because they dont have a bachelors degree yet

                  ## For the wave of 2010
dat97$BAYR_3<-ifelse(dat97$HDEGREE2019==2,
                   (2019-dat97$BDATEY),
                   ifelse(dat97$HDEGREE2019==4,dat97$DATEBA/12,NA)) ## For Censored because they dont have a bachelors degree yet

                  ## For the wave of 2019

Risk Set

dat97<-dat97%>% filter(is.na(Bachelors_1)==F &
                  is.na(Bachelors_2)==F &
                  is.na(Bachelors_3)==F & 
                  is.na(BAYR_1)==F &
                  is.na(BAYR_2)==F &
                  is.na(BAYR_3)==F &
                    Bachelors_1!=1)

Pivot

e.long1 <- dat97 %>%
  #rename
  rename(wt = samplingweight,strata= VSTRAT, psu = VPSU)%>%
  select(ID,his1,sex11,DADBA,MOMBA,DATEBA,   #time constant
        BAYR_1, BAYR_2, BAYR_3, #t-varying variables
         Bachelors_1, Bachelors_2, Bachelors_3)%>%
   pivot_longer(cols = c(-ID, -his1, -sex11, -MOMBA, -DADBA, -DATEBA), #time constant variables go here
               names_to  = c(".value", "wave"), #make wave variable and put t-v vars into columns
               names_sep = "_") #all t-v variables have _ between name and time, like age_1, age_2

Transition variable

e.long1 <- e.long1%>%
  group_by(ID)%>%
  mutate(nexBA = dplyr::lead(Bachelors,n=1, order_by = ID))%>%
  mutate(BAtran = ifelse(nexBA == 1 & Bachelors == 0, 1, 0))%>%
  ungroup()




#find which people failed in the first time period and remove them from the second risk period risk set
failed1<-which(is.na(e.long1$BAtran)==T)
#e.long1<-e.long1[-failed1,] # this makes you have 0 cases

print(e.long1, n = 100)
# A tibble: 2,406 × 11
       ID his1         sex11 DADBA MOMBA DATEBA wave   BAYR Bache…¹ nexBA BAtran
    <dbl> <fct>        <fct> <fct> <fct>  <dbl> <chr> <dbl> <fct>   <fct>  <dbl>
  1    11 Non Hispanic Women 0     0        432 1      22   0       0          0
  2    11 Non Hispanic Women 0     0        432 2      28   0       1          1
  3    11 Non Hispanic Women 0     0        432 3      36   1       <NA>       0
  4    28 <NA>         Women 0     0        317 1      21   0       1          1
  5    28 <NA>         Women 0     0        317 2      26.4 1       1          0
  6    28 <NA>         Women 0     0        317 3      26.4 1       <NA>       0
  7    34 Hispanic     Women 0     1        329 1      20   0       1          1
  8    34 Hispanic     Women 0     1        329 2      27.4 1       1          0
  9    34 Hispanic     Women 0     1        329 3      27.4 1       <NA>       0
 10    35 Hispanic     Men   <NA>  <NA>     321 1      21   0       1          1
 11    35 Hispanic     Men   <NA>  <NA>     321 2      26.8 1       1          0
 12    35 Hispanic     Men   <NA>  <NA>     321 3      26.8 1       <NA>       0
 13    41 Hispanic     Women 0     0        317 1      21   0       1          1
 14    41 Hispanic     Women 0     0        317 2      26.4 1       1          0
 15    41 Hispanic     Women 0     0        317 3      26.4 1       <NA>       0
 16    48 <NA>         Women <NA>  0        325 1      21   0       1          1
 17    48 <NA>         Women <NA>  0        325 2      27.1 1       1          0
 18    48 <NA>         Women <NA>  0        325 3      27.1 1       <NA>       0
 19    55 Non Hispanic Women 0     0        336 1      23   0       1          1
 20    55 Non Hispanic Women 0     0        336 2      28   1       1          0
 21    55 Non Hispanic Women 0     0        336 3      28   1       <NA>       0
 22    63 <NA>         Women 0     0        380 1      20   0       0          0
 23    63 <NA>         Women 0     0        380 2      26   0       1          1
 24    63 <NA>         Women 0     0        380 3      31.7 1       <NA>       0
 25    70 <NA>         Men   0     1        324 1      22   0       1          1
 26    70 <NA>         Men   0     1        324 2      27   1       1          0
 27    70 <NA>         Men   0     1        324 3      27   1       <NA>       0
 28    91 Non Hispanic Women 0     1        349 1      22   0       1          1
 29    91 Non Hispanic Women 0     1        349 2      29.1 1       1          0
 30    91 Non Hispanic Women 0     1        349 3      29.1 1       <NA>       0
 31   108 Hispanic     Men   1     1        317 1      20   0       1          1
 32   108 Hispanic     Men   1     1        317 2      26.4 1       1          0
 33   108 Hispanic     Men   1     1        317 3      26.4 1       <NA>       0
 34   114 Hispanic     Women 0     0        368 1      22   0       1          1
 35   114 Hispanic     Women 0     0        368 2      30.7 1       1          0
 36   114 Hispanic     Women 0     0        368 3      30.7 1       <NA>       0
 37   128 Hispanic     Women 1     1        300 1      22   0       1          1
 38   128 Hispanic     Women 1     1        300 2      25   1       1          0
 39   128 Hispanic     Women 1     1        300 3      25   1       <NA>       0
 40   157 Non Hispanic Women 0     0        317 1      22   0       1          1
 41   157 Non Hispanic Women 0     0        317 2      26.4 1       1          0
 42   157 Non Hispanic Women 0     0        317 3      26.4 1       <NA>       0
 43   170 Hispanic     Women 1     0        401 1      20   0       0          0
 44   170 Hispanic     Women 1     0        401 2      26   0       1          1
 45   170 Hispanic     Women 1     0        401 3      33.4 1       <NA>       0
 46   182 Hispanic     Women 1     1        316 1      20   0       1          1
 47   182 Hispanic     Women 1     1        316 2      26.3 1       1          0
 48   182 Hispanic     Women 1     1        316 3      26.3 1       <NA>       0
 49   183 Hispanic     Women 1     1        324 1      24   0       1          1
 50   183 Hispanic     Women 1     1        324 2      27   1       1          0
 51   183 Hispanic     Women 1     1        324 3      27   1       <NA>       0
 52   187 Non Hispanic Men   0     0        317 1      21   0       1          1
 53   187 Non Hispanic Men   0     0        317 2      26.4 1       1          0
 54   187 Non Hispanic Men   0     0        317 3      26.4 1       <NA>       0
 55   199 Hispanic     Men   1     1        346 1      22   0       1          1
 56   199 Hispanic     Men   1     1        346 2      28.8 1       1          0
 57   199 Hispanic     Men   1     1        346 3      28.8 1       <NA>       0
 58   203 Hispanic     Women 0     0        324 1      20   0       1          1
 59   203 Hispanic     Women 0     0        324 2      27   1       1          0
 60   203 Hispanic     Women 0     0        324 3      27   1       <NA>       0
 61   215 Non Hispanic Men   0     0        456 1      23   0       0          0
 62   215 Non Hispanic Men   0     0        456 2      29   0       1          1
 63   215 Non Hispanic Men   0     0        456 3      38   1       <NA>       0
 64   229 Hispanic     Women <NA>  1        317 1      20   0       1          1
 65   229 Hispanic     Women <NA>  1        317 2      26.4 1       1          0
 66   229 Hispanic     Women <NA>  1        317 3      26.4 1       <NA>       0
 67   230 Non Hispanic Men   0     0        329 1      20   0       1          1
 68   230 Non Hispanic Men   0     0        329 2      27.4 1       1          0
 69   230 Non Hispanic Men   0     0        329 3      27.4 1       <NA>       0
 70   237 Hispanic     Men   0     0        329 1      21   0       1          1
 71   237 Hispanic     Men   0     0        329 2      27.4 1       1          0
 72   237 Hispanic     Men   0     0        329 3      27.4 1       <NA>       0
 73   238 Hispanic     Women <NA>  <NA>     313 1      23   0       1          1
 74   238 Hispanic     Women <NA>  <NA>     313 2      26.1 1       1          0
 75   238 Hispanic     Women <NA>  <NA>     313 3      26.1 1       <NA>       0
 76   240 Hispanic     Women 1     1        305 1      21   0       1          1
 77   240 Hispanic     Women 1     1        305 2      25.4 1       1          0
 78   240 Hispanic     Women 1     1        305 3      25.4 1       <NA>       0
 79   248 Hispanic     Women 0     0        323 1      21   0       1          1
 80   248 Hispanic     Women 0     0        323 2      26.9 1       1          0
 81   248 Hispanic     Women 0     0        323 3      26.9 1       <NA>       0
 82   251 Non Hispanic Men   0     0        305 1      22   0       1          1
 83   251 Non Hispanic Men   0     0        305 2      25.4 1       1          0
 84   251 Non Hispanic Men   0     0        305 3      25.4 1       <NA>       0
 85   286 Hispanic     Women 0     0        325 1      20   0       1          1
 86   286 Hispanic     Women 0     0        325 2      27.1 1       1          0
 87   286 Hispanic     Women 0     0        325 3      27.1 1       <NA>       0
 88   293 Non Hispanic Women 0     0        341 1      21   0       1          1
 89   293 Non Hispanic Women 0     0        341 2      28.4 1       1          0
 90   293 Non Hispanic Women 0     0        341 3      28.4 1       <NA>       0
 91   296 Non Hispanic Women 0     0        317 1      23   0       1          1
 92   296 Non Hispanic Women 0     0        317 2      26.4 1       1          0
 93   296 Non Hispanic Women 0     0        317 3      26.4 1       <NA>       0
 94   300 Hispanic     Women 0     0        301 1      20   0       1          1
 95   300 Hispanic     Women 0     0        301 2      25.1 1       1          0
 96   300 Hispanic     Women 0     0        301 3      25.1 1       <NA>       0
 97   311 Hispanic     Men   <NA>  <NA>     324 1      22   0       1          1
 98   311 Hispanic     Men   <NA>  <NA>     324 2      27   1       1          0
 99   311 Hispanic     Men   <NA>  <NA>     324 3      27   1       <NA>       0
100   329 Hispanic     Men   0     0        317 1      20   0       1          1
# … with 2,306 more rows, and abbreviated variable name ¹​Bachelors
#Exponential
#interval censored
fitl1<-phreg(Surv(time = BAYR+1, event = BAtran)~his1+MOMBA+DADBA+sex11, data=e.long1,
             dist = "weibull",
             shape=1)
#Weibull
fitl2<-phreg(Surv(time = BAYR+1, event = BAtran)~his1+MOMBA+DADBA+sex11, data=e.long1
             , dist = "weibull")
#Piecewise constant
fitl3<-pchreg(Surv(time = BAYR, event = BAtran)~his1+MOMBA+DADBA+sex11,data=e.long1, cuts = c(20,25, 30, 35))


#AIC for exponential
AIC(fitl1)
[1] 6004.528
AIC(fitl2)
[1] 5049.012
-2*fitl3$loglik[2]+length(coef(fitl3))
[1] 3338.596
#Empirical (Cox)
fitle<-coxreg(Surv(time = BAYR, event = BAtran)~his1+DADBA+MOMBA+sex11, data=e.long1)

fitle2<-coxreg(Surv(time = BAYR, event = BAtran)~his1, data=e.long1)

check.dist(fitle, fitl1, main = "Exponential")

summary(fitle)
Covariate             Mean       Coef     Rel.Risk   S.E.    LR p
his1                                                         0.418 
        Hispanic      0.783     0         1 (reference)
    Non Hispanic      0.217    -0.086     0.918     0.107
DADBA                                                        0.759 
               0      0.609     0         1 (reference)
               1      0.391     0.030     1.030     0.097
MOMBA                                                        0.948 
               0      0.617     0         1 (reference)
               1      0.383     0.006     1.006     0.097
sex11                                                        0.775 
             Men      0.477     0         1 (reference)
           Women      0.523     0.025     1.025     0.086

Events                    557 
Total time at risk         42851 
Max. log. likelihood      -3930.1 
LR test statistic         0.99 
Degrees of freedom        4 
Overall p-value           0.911008
summary(fitle2)
Covariate             Mean       Coef     Rel.Risk   S.E.    LR p
his1                                                         0.268 
        Hispanic      0.765     0         1 (reference)
    Non Hispanic      0.235    -0.103     0.902     0.094

Events                    646 
Total time at risk         49868 
Max. log. likelihood      -4643.8 
LR test statistic         1.22 
Degrees of freedom        1 
Overall p-value           0.2684
AIC(fitle)
[1] 7868.295
check.dist(fitle, fitl2, main = "Weibull")

check.dist(fitle, fitl3, main = "Piecewise Exponential")