HW 5 EHA 2

Author

Brandon Flores

For HW 5 I continued to use the National Longitudinal Survey of Youth (1997 cohort). For this analysis I used just one time period rather than the longitudinal methods previously explored. The year that will be observed will be the year of 2004; the first initial wave of the data. The wave for 2019 was initially analyzed but continued running into errors for this portion of the code

” Plot the hazard function on the probability scale} haz<-1/(1+exp(-coef(fit.0))) time<-seq(1,30,1) “

with the time portion of the code I was running into errors. When I ran the data with the year 2004 it was able to run. Other errors occured thorughout the process of fitting the data; usually missing observations as the errors for different variables. I was able to correct these different errors. The only code I was not able to run was

” dat<-expand.grid(year=seq(1,20,1)) dat$genmod<-predict(fit.0, newdata=data.frame(year=as.factor(1:20 )), type=“response”) dat$lin<-predict(fit.l, newdata=dat, type=“response”) dat$sq<-predict(fit.s, newdata=dat, type=“response”) dat$cub<-predict(fit.c, newdata=dat, type=“response”) dat$quart<-predict(fit.q, newdata=dat, type=“response”) dat$spline<-predict(fit.sp, newdata=expand.grid(year=seq(1,20,1)), type=“response”) “

It continued to give me the error of not being able to find my variable BAYR in the dat$sq line. I’ve checked the chunk and everything looked correct and was able to run.

The variables in the models are earning a Bachelors in 2004 (Bachelors04), if the person is Hispanic (his1), respondents sex (sex1), if Bachelors is attained by the father (DADBA), if a Bachelors is attained by the mother (MOMBA), and the age of the respondent when they earned their Bachelors (BAYR).

When observing fit.0 (General Model) when observing all of the variables in the model, the respondents sex has a marginal effect being that women are still at a slightly higher risk of earning a Bachelors when compared to men when observed with all other variables in the model.

This effect is lost with no other variables being statistically significant (outside of BAYR; time constant variables), for fit.l (Linear Model) and fit.s (Square Model). Fit.c (Cubic Model) brought about a much higher statistical significance for sex showing the same results as fit.0 but also DADBA is statistically significant showing that the respondents risk of earning a Bachelors lessens if their father does not have a Bachelors degree.

When observing fit.q (Quartic Model) the statistical significance with sex was lost now showing only a marginal effect. It shows a reverse trend with men tending to be at slightly higher risk than women when earning a Bachelors. With the Splines method it did not show any statistical significance with the variable BAYR.

When observing the AIC’s of all of the models, the Quartic Model exhibited the lowest AIC (797.02). The key variable of being Hispanic was not found to be statistically significant in any model. The only time it was found to be significant was during a Cox Regression but then was controlled away with sex. For future analysis possibly stratifying the Hispanic variable itself with the current data can be able to observe more specified differences as opposed to comparing Hispanics to Non Hispanic whites; which as been found many times through many models to not be significant. Sex has been consistently found to be significant so that could possibly be a key variable to continue to have especially when stratifying the data looking within the Hispanic variable. Also fathers educational attainment should continued to be observed being that it was also found to be statistically significant in one of the models that was conducted in this assignment.

library(car)

Loading required package: carData

library(haven)
library(survival)
library(ggplot2)
library(tidyverse)

── Attaching packages
───────────────────────────────────────
tidyverse 1.3.2 ──

✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.2      ✔ forcats 0.5.2 
✔ purrr   0.3.4      
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ dplyr::recode() masks car::recode()
✖ purrr::some()   masks car::some()

library(survey)

Loading required package: grid
Loading required package: Matrix

Attaching package: 'Matrix'

The following objects are masked from 'package:tidyr':

    expand, pack, unpack


Attaching package: 'survey'

The following object is masked from 'package:graphics':

    dotchart

library(ggsurvfit)
library(janitor)


Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test

dat97<-read_csv("C:\\Users\\BTP\\Desktop\\97cohortNLSYRnum.csv")

Rows: 8984 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (15): ID, SEX, BDATEM, BDATEY, SAMPLETYPE, ETHNICITY, HDEGREE04, HDEGREE...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

myvars<-c( "ID","HDEGREE04", "HDEGREE2010", "HDEGREE2019","ETHNICITY",
           "SEX", "BIOFTHIGD", "BIOMTHIGD", "BDATEY","VSTRAT", "VPSU",
           "samplingweight","DATEBA")

dat97<-dat97[,myvars]

dat97<- dat97 %>%
  filter(HDEGREE04 >=0, HDEGREE2010>=0, HDEGREE2019>=0, DATEBA >=0, ETHNICITY>=0, SEX>=0, VSTRAT>=0, samplingweight>=0, BIOFTHIGD>=0, BIOMTHIGD>=0, ID>=0, BDATEY>=0  ) #filter missing data codes

dat97$Bachelors04 <-Recode(dat97$HDEGREE04, recodes = "0:3 = 0; 4:7 = 1; else=NA", as.factor=T)

## Bachelors degree or higher = 1 & all lesser educations are labled 0

#tabyl(Bachelors04)

dat97$Bachelors10 <-Recode(dat97$HDEGREE2010, recodes = "0:3 = 0; 4:7 = 1; else=NA", as.factor=T)

## Bachelors degree or higher = 1 & all lesser educations are labled 0

dat97 %>%
  tabyl(Bachelors10) ## Bachelors degree or higher in 2010

 Bachelors10    n   percent
           0  213 0.1418109
           1 1289 0.8581891

dat97$Bachelors19 <-Recode(dat97$HDEGREE2019, recodes = "0:3 = 0; 4:7 = 1; else=NA", as.factor=T)

## Bachelors degree or higher = 1 & all lesser educations are labled 0

dat97 %>%
  tabyl(Bachelors19) ## Bachelors degree or higher in 2019

 Bachelors19    n percent
           1 1502       1

dat97$Hispanic<-Recode(dat97$ETHNICITY, recodes = "2 = 0; 4 = 1; else=NA", as.factor=T)

## Hispanics are coded as 0 & Non Hipanic whites are coded as 1, all other ethnicities are excluded

dat97$his1<-as.factor(ifelse(dat97$Hispanic==1, "Hispanic", "Non Hispanic"))


dat97 %>% 
  tabyl(his1) ## Hispanics  and Non Hispanic whites coded

         his1    n   percent valid_percent
     Hispanic 1031 0.6864181     0.8321227
 Non Hispanic  208 0.1384820     0.1678773
         <NA>  263 0.1750999            NA

summary(dat97)

       ID         HDEGREE04      HDEGREE2010     HDEGREE2019      ETHNICITY    
 Min.   :   9   Min.   :0.000   Min.   :0.000   Min.   :4.000   Min.   :1.000  
 1st Qu.:1894   1st Qu.:2.000   1st Qu.:4.000   1st Qu.:4.000   1st Qu.:2.000  
 Median :3644   Median :2.000   Median :4.000   Median :4.000   Median :4.000  
 Mean   :3962   Mean   :2.559   Mean   :3.939   Mean   :4.438   Mean   :3.218  
 3rd Qu.:5873   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:5.000   3rd Qu.:4.000  
 Max.   :9016   Max.   :5.000   Max.   :7.000   Max.   :7.000   Max.   :4.000  
      SEX         BIOFTHIGD       BIOMTHIGD         BDATEY         VSTRAT      
 Min.   :1.00   Min.   : 2.00   Min.   : 1.00   Min.   :1980   Min.   :  1.00  
 1st Qu.:1.00   1st Qu.:12.00   1st Qu.:12.00   1st Qu.:1981   1st Qu.: 19.00  
 Median :2.00   Median :14.00   Median :14.00   Median :1982   Median : 39.00  
 Mean   :1.57   Mean   :14.24   Mean   :14.06   Mean   :1982   Mean   : 41.83  
 3rd Qu.:2.00   3rd Qu.:16.00   3rd Qu.:16.00   3rd Qu.:1983   3rd Qu.: 59.00  
 Max.   :2.00   Max.   :20.00   Max.   :20.00   Max.   :1984   Max.   :116.00  
      VPSU       samplingweight       DATEBA      Bachelors04 Bachelors10
 Min.   :1.000   Min.   :     0   Min.   :257.0   0:1113      0: 213     
 1st Qu.:1.000   1st Qu.:180962   1st Qu.:296.0   1: 389      1:1289     
 Median :1.000   Median :530019   Median :317.0                          
 Mean   :1.476   Mean   :411945   Mean   :326.9                          
 3rd Qu.:2.000   3rd Qu.:595821   3rd Qu.:341.0                          
 Max.   :2.000   Max.   :936477   Max.   :481.0                          
 Bachelors19 Hispanic              his1     
 1:1502      0   : 208   Hispanic    :1031  
             1   :1031   Non Hispanic: 208  
             NA's: 263   NA's        : 263

summary(dat97$DATEBA)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  257.0   296.0   317.0   326.9   341.0   481.0

dat97<- dat97 %>% filter(DATEBA>0)


dat97$BAYR<-ifelse(dat97$HDEGREE04==2,
                   (2004-dat97$BDATEY),
                   ifelse(dat97$HDEGREE04==4,dat97$DATEBA/12,NA)) ## For Censored because they dont have a bachelors degree yet


summary(dat97$BAYR)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  20.00   21.00   22.00   22.04   23.42   36.08      95

                  ## For the wave of 2004

dat97$BAYR1<-ifelse(dat97$HDEGREE2010==2,
                   (2010-dat97$BDATEY),
                   ifelse(dat97$HDEGREE2010==4,dat97$DATEBA/12,NA)) ## For Censored because they dont have a bachelors degree yet


summary(dat97$BAYR1)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  21.42   25.00   26.42   26.39   27.50   36.08     289

                  ## For the wave of 2010

dat97$BAYR2<-ifelse(dat97$HDEGREE2019==2,
                   (2019-dat97$BDATEY),
                   ifelse(dat97$HDEGREE2019==4,dat97$DATEBA/12,NA)) ## For Censored because they dont have a bachelors degree yet


summary(dat97$BAYR2)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  21.42   25.33   26.58   27.80   29.00   40.08     489

                  ## For the wave of 2019

dat97$sex1<-Recode(dat97$SEX, recodes = "1 = 0; 2 = 1; else=NA", as.factor=T)

## Women are coded as 0 & Men  are coded as 1

dat97$sex11<-as.factor(ifelse(dat97$sex1==1, "Women", "Men"))

dat97$DADBA <-Recode(dat97$BIOFTHIGD, recodes = "0:15 = 0; 16:20 = 1; else=NA", as.factor=T)

## Dads Bachelors degree or higher = 1 & all lesser educations are labled 0

## Dads Bachelors degree or higher

dat97$MOMBA <-Recode(dat97$BIOMTHIGD, recodes = "0:15 = 0; 16:20 = 1; else=NA", as.factor=T)

## Moms Bachelors degree or higher = 1 & all lesser educations are labled 0

## Moms Bachelors degree or higher

library(survival)
pp<-survSplit(Surv(BAYR, Bachelors04)~. ,
data = dat97[dat97$BAYR>0,],
cut=seq(20,32, 5),
episode="year_BA")

pp$BAYR <- pp$BAYR-1
pp<-pp[order(pp$ID, pp$BAYR),]
knitr::kable(head(pp[, c("ID", "BAYR", "Bachelors04", "his1", "MOMBA", "DADBA", "sex1")], n=40))

ID	BAYR	Bachelors04	his1	MOMBA	DADBA	sex1
9	19.00000	censor	Hispanic	0	0	0
9	21.00000	censor	Hispanic	0	0	0
11	19.00000	censor	Non Hispanic	0	0	1
11	21.00000	censor	Non Hispanic	0	0	1
23	19.00000	censor	Non Hispanic	0	0	1
23	20.00000	censor	Non Hispanic	0	0	1
28	19.00000	censor	NA	0	0	1
28	20.00000	censor	NA	0	0	1
31	19.00000	censor	Hispanic	0	0	0
31	23.41667	1	Hispanic	0	0	0
32	19.00000	censor	Hispanic	1	1	1
32	22.41667	1	Hispanic	1	1	1
33	19.00000	censor	Hispanic	0	1	1
33	22.50000	1	Hispanic	0	1	1
34	19.00000	censor	Hispanic	1	0	1
41	19.00000	censor	Hispanic	0	0	1
41	20.00000	censor	Hispanic	0	0	1
55	19.00000	censor	Non Hispanic	0	0	1
55	22.00000	censor	Non Hispanic	0	0	1
56	19.00000	censor	Hispanic	1	1	1
56	23.66667	1	Hispanic	1	1	1
62	19.00000	censor	NA	0	0	0
62	21.41667	1	NA	0	0	0
63	19.00000	censor	NA	0	0	1
67	19.00000	censor	NA	1	0	0
67	22.00000	1	NA	1	0	0
70	19.00000	censor	NA	1	0	0
70	21.00000	censor	NA	1	0	0
86	19.00000	censor	Hispanic	0	0	1
86	23.00000	censor	Hispanic	0	0	1
89	19.00000	censor	Hispanic	0	0	0
89	23.08333	1	Hispanic	0	0	0
90	19.00000	censor	Non Hispanic	1	0	0
90	22.00000	censor	Non Hispanic	1	0	0
91	19.00000	censor	Non Hispanic	1	0	1
91	21.00000	censor	Non Hispanic	1	0	1
97	19.00000	censor	Non Hispanic	0	0	0
97	22.16667	1	Non Hispanic	0	0	0
104	19.00000	censor	Hispanic	0	1	1
104	24.00000	censor	Hispanic	0	1	1

pp<- pp %>%
  filter(HDEGREE04 >=0, HDEGREE2010>=0, HDEGREE2019>=0, DATEBA >=0, ETHNICITY>=0, SEX>=0, VSTRAT>=0, samplingweight>=0, BIOFTHIGD>=0, BIOMTHIGD>=0, ID>=0, BDATEY>=0  ) #filter missing data codes

options(survey.lonely.psu = "adjust")
des<-survey::svydesign(ids=~VPSU,
strata=~VSTRAT,
data=pp,
weight=~samplingweight,
nest=TRUE)

fit.0<-svyglm(Bachelors04~as.factor(BAYR)-1,
design=des,
family=binomial(link="cloglog"))

Warning in eval(family$initialize): non-integer #successes in a binomial glm!

summary(fit.0)


Call:
svyglm(formula = Bachelors04 ~ as.factor(BAYR) - 1, design = des, 
    family = binomial(link = "cloglog"))

Survey design:
survey::svydesign(ids = ~VPSU, strata = ~VSTRAT, data = pp, weight = ~samplingweight, 
    nest = TRUE)

Coefficients: (2 not defined because of singularities)
                                  Estimate Std. Error  t value Pr(>|t|)    
as.factor(BAYR)19               -21.589859   0.043381 -497.685  < 2e-16 ***
as.factor(BAYR)20               -21.544983   0.069037 -312.077  < 2e-16 ***
as.factor(BAYR)20.4166666666667   3.142452   0.043177   72.781  < 2e-16 ***
as.factor(BAYR)21                -3.686846   0.588064   -6.269 3.15e-08 ***
as.factor(BAYR)21.0833333333333   3.141734   0.043208   72.712  < 2e-16 ***
as.factor(BAYR)21.3333333333333   3.141808   0.024959  125.880  < 2e-16 ***
as.factor(BAYR)21.4166666666667   3.140085   0.009053  346.840  < 2e-16 ***
as.factor(BAYR)21.5               3.140616   0.020585  152.570  < 2e-16 ***
as.factor(BAYR)21.5833333333333   3.139851   0.030617  102.553  < 2e-16 ***
as.factor(BAYR)22                -2.388249   0.337959   -7.067 1.23e-09 ***
as.factor(BAYR)22.0833333333333   3.137388   0.026033  120.518  < 2e-16 ***
as.factor(BAYR)22.1666666666667   3.138566   0.027923  112.400  < 2e-16 ***
as.factor(BAYR)22.25              3.128833   0.043769   71.485  < 2e-16 ***
as.factor(BAYR)22.3333333333333   3.138059   0.035269   88.974  < 2e-16 ***
as.factor(BAYR)22.4166666666667   3.138899   0.006565  478.129  < 2e-16 ***
as.factor(BAYR)22.5               3.138780   0.013138  238.910  < 2e-16 ***
as.factor(BAYR)22.5833333333333   3.138399   0.024288  129.217  < 2e-16 ***
as.factor(BAYR)22.6666666666667   3.137952   0.016730  187.561  < 2e-16 ***
as.factor(BAYR)22.9166666666667   3.130165   0.043711   71.611  < 2e-16 ***
as.factor(BAYR)23                -0.742472   0.222452   -3.338  0.00139 ** 
as.factor(BAYR)23.0833333333333   3.135656   0.032279   97.142  < 2e-16 ***
as.factor(BAYR)23.1666666666667   3.141424   0.043221   72.682  < 2e-16 ***
as.factor(BAYR)23.3333333333333   3.138704   0.014154  221.747  < 2e-16 ***
as.factor(BAYR)23.4166666666667   3.137915   0.004334  723.941  < 2e-16 ***
as.factor(BAYR)23.5               3.135907   0.012461  251.665  < 2e-16 ***
as.factor(BAYR)23.5833333333333   3.137404   0.026283  119.371  < 2e-16 ***
as.factor(BAYR)23.6666666666667   3.136592   0.011664  268.914  < 2e-16 ***
as.factor(BAYR)24               -21.759442   1.000000  -21.759  < 2e-16 ***
as.factor(BAYR)29               -21.759442   1.000000  -21.759  < 2e-16 ***
as.factor(BAYR)35.0833333333333   3.140970   0.043241   72.639  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 0.1877941)

Number of Fisher Scoring iterations: 20

#Plot the hazard function on the probability scale
haz<-1/(1+exp(-coef(fit.0)))
time<-seq(1,30,1)
plot(haz~time, type="l", ylab="h(t)")
title(main="Discrete Time Hazard Function for Bachelors Attainment 2004")

St<-NA
time<-1:length(haz)
St[1]<-1-haz[1]
for(i in 2:length(haz)){
St[i]<-St[i-1]* (1-haz[i])
}
St<-c(1, St)
time<-c(0, time)
plot(y=St,x=time, type="l",
main="Survival function for earning a Bachelors 2004")

#Linear term for time
fit.0<-svyglm(Bachelors04~his1++sex1+DADBA+MOMBA+1,
design=des ,
family=binomial(link="cloglog"))

Warning in eval(family$initialize): non-integer #successes in a binomial glm!

summary(fit.0)


Call:
svyglm(formula = Bachelors04 ~ his1 + +sex1 + DADBA + MOMBA + 
    1, design = des, family = binomial(link = "cloglog"))

Survey design:
survey::svydesign(ids = ~VPSU, strata = ~VSTRAT, data = pp, weight = ~samplingweight, 
    nest = TRUE)

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      -1.79273    0.12608 -14.219   <2e-16 ***
his1Non Hispanic -0.27747    0.20718  -1.339   0.1848    
sex11             0.21216    0.11250   1.886   0.0635 .  
DADBA1            0.09959    0.11663   0.854   0.3961    
MOMBA1           -0.03069    0.11259  -0.273   0.7859    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1.092358)

Number of Fisher Scoring iterations: 5

1/(1+exp(-coef(fit.0)))

     (Intercept) his1Non Hispanic            sex11           DADBA1 
       0.1427387        0.4310753        0.5528408        0.5248774 
          MOMBA1 
       0.4923272

#Linear term for time
fit.l<-svyglm(Bachelors04~BAYR+his1+sex1+DADBA+MOMBA,
design=des ,
family=binomial(link="cloglog"))

Warning in eval(family$initialize): non-integer #successes in a binomial glm!

Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

summary(fit.l)


Call:
svyglm(formula = Bachelors04 ~ BAYR + his1 + sex1 + DADBA + MOMBA, 
    design = des, family = binomial(link = "cloglog"))

Survey design:
survey::svydesign(ids = ~VPSU, strata = ~VSTRAT, data = pp, weight = ~samplingweight, 
    nest = TRUE)

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)      -29.67934    2.46906 -12.021   <2e-16 ***
BAYR               1.30170    0.10970  11.866   <2e-16 ***
his1Non Hispanic  -0.11653    0.29472  -0.395    0.694    
sex11              0.20006    0.17987   1.112    0.270    
DADBA1             0.09722    0.22850   0.425    0.672    
MOMBA1             0.12137    0.22747   0.534    0.595    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 3.753416e+12)

Number of Fisher Scoring iterations: 11

fit.s<-svyglm(Bachelors04~BAYR+his1+sex1+DADBA+MOMBA+I(BAYR^2),
design=des ,
family=binomial(link="cloglog"))

Warning in eval(family$initialize): non-integer #successes in a binomial glm!

Warning: glm.fit: algorithm did not converge

Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

summary(fit.s)


Call:
svyglm(formula = Bachelors04 ~ BAYR + his1 + sex1 + DADBA + MOMBA + 
    I(BAYR^2), design = des, family = binomial(link = "cloglog"))

Survey design:
survey::svydesign(ids = ~VPSU, strata = ~VSTRAT, data = pp, weight = ~samplingweight, 
    nest = TRUE)

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)       6.171e+01  7.135e+12       0        1
BAYR             -7.035e+00  6.560e+11       0        1
his1Non Hispanic -1.038e-01  2.515e+09       0        1
sex11             2.336e-01  1.307e+08       0        1
DADBA1            1.488e-01  2.186e+09       0        1
MOMBA1            2.661e-02  2.218e+09       0        1
I(BAYR^2)         1.898e-01  1.504e+10       0        1

(Dispersion parameter for binomial family taken to be 3.753416e+12)

Number of Fisher Scoring iterations: 25

fit.c<-svyglm(Bachelors04~BAYR+his1+sex1+DADBA+MOMBA+I(BAYR^2)+I(BAYR^3 ),
design=des ,
family=binomial(link="cloglog"))

Warning in eval(family$initialize): non-integer #successes in a binomial glm!

Warning: glm.fit: algorithm did not converge

Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

summary(fit.c)


Call:
svyglm(formula = Bachelors04 ~ BAYR + his1 + sex1 + DADBA + MOMBA + 
    I(BAYR^2) + I(BAYR^3), design = des, family = binomial(link = "cloglog"))

Survey design:
survey::svydesign(ids = ~VPSU, strata = ~VSTRAT, data = pp, weight = ~samplingweight, 
    nest = TRUE)

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)      -3.228e+17  2.544e+16 -12.688  < 2e-16 ***
BAYR              3.803e+16  3.186e+15  11.934  < 2e-16 ***
his1Non Hispanic -9.494e+12  1.435e+14  -0.066  0.94746    
sex11            -2.722e+14  8.602e+13  -3.165  0.00233 ** 
DADBA1           -2.533e+14  8.251e+13  -3.070  0.00309 ** 
MOMBA1            3.095e+13  8.252e+13   0.375  0.70879    
I(BAYR^2)        -1.466e+15  1.293e+14 -11.336  < 2e-16 ***
I(BAYR^3)         1.845e+13  1.691e+12  10.912  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 5.027167e+14)

Number of Fisher Scoring iterations: 25

fit.q<-svyglm(Bachelors04~BAYR+his1+sex1+DADBA+MOMBA+I(BAYR^2)+I(BAYR^3 )+I(BAYR^4),
design=des ,
family=binomial(link="cloglog"))

Warning in eval(family$initialize): non-integer #successes in a binomial glm!

summary(fit.q)


Call:
svyglm(formula = Bachelors04 ~ BAYR + his1 + sex1 + DADBA + MOMBA + 
    I(BAYR^2) + I(BAYR^3) + I(BAYR^4), design = des, family = binomial(link = "cloglog"))

Survey design:
survey::svydesign(ids = ~VPSU, strata = ~VSTRAT, data = pp, weight = ~samplingweight, 
    nest = TRUE)

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)  
(Intercept)      -7.561e+02  2.975e+02  -2.542   0.0134 *
BAYR              9.801e+01  4.637e+01   2.113   0.0384 *
his1Non Hispanic -1.899e-01  2.472e-01  -0.768   0.4450  
sex11             2.749e-01  1.385e-01   1.985   0.0513 .
DADBA1            1.687e-01  1.469e-01   1.149   0.2547  
MOMBA1            5.090e-02  1.556e-01   0.327   0.7446  
I(BAYR^2)        -4.602e+00  2.691e+00  -1.710   0.0919 .
I(BAYR^3)         9.204e-02  6.886e-02   1.337   0.1859  
I(BAYR^4)        -6.538e-04  6.546e-04  -0.999   0.3215  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 0.3968849)

Number of Fisher Scoring iterations: 10

#Spline
library(splines)
fit.sp<-svyglm(Bachelors04~ns(BAYR, df = 3),
design=des ,
family=binomial(link="cloglog"))

Warning in eval(family$initialize): non-integer #successes in a binomial glm!

Warning: glm.fit: algorithm did not converge

summary(fit.sp)


Call:
svyglm(formula = Bachelors04 ~ ns(BAYR, df = 3), design = des, 
    family = binomial(link = "cloglog"))

Survey design:
survey::svydesign(ids = ~VPSU, strata = ~VSTRAT, data = pp, weight = ~samplingweight, 
    nest = TRUE)

Coefficients: (1 not defined because of singularities)
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        -2.8762    28.2200  -0.102    0.919
ns(BAYR, df = 3)1  11.3137    13.9656   0.810    0.420
ns(BAYR, df = 3)2   0.5812    55.4097   0.010    0.992

(Dispersion parameter for binomial family taken to be 0.4329859)

Number of Fisher Scoring iterations: 25

#AIC table
aic<-round(c(
fit.l$deviance+2*length(fit.l$coefficients),
fit.s$deviance+2*length(fit.s$coefficients),
fit.c$deviance+2*length(fit.c$coefficients),
fit.q$deviance+2*length(fit.q$coefficients),
fit.sp$deviance+2*length(fit.sp$coefficients),
fit.0$deviance+2*length(fit.0$coefficients)),2)
#compare all aics to the one from the general model
dif.aic<-round(aic-aic[6],2)
data.frame(model =c( "linear","square", "cubic", "quartic","spline", "general"),
aic=aic,
aic_dif=dif.aic)

    model      aic  aic_dif
1  linear   970.14 -1119.10
2  square  1099.51  -989.73
3   cubic 17960.80 15871.56
4 quartic   797.02 -1292.22
5  spline   913.84 -1175.40
6 general  2089.24     0.00