Homework 4

##thoughts of suicide wave 1
mergeddata4$sui_1<-Recode(mergeddata4$H1SU1, recodes="1=1; 0=0; else=NA")
## age, if friend attempted suicide, if family attempted, if live alone, education, thoughts of suicide, sex, race, closeness to mentor, and thoughts on likelihood of divorce at wave 3
mergeddata4$age_3<-(mergeddata4$IYEAR3 - mergeddata4$H3OD1Y)
mergeddata4$fri_sui3<-Recode(mergeddata4$H3TO133, recodes="1=1; 0=0; else=NA")
mergeddata4$fam_sui3 <-Recode(mergeddata4$H3TO135, recodes="1=1; 0=0; else=NA")
mergeddata4$live_03<-Recode(mergeddata4$H3HR5, recodes="1=1; 2=0; else=NA")
mergeddata4$H3ED1 <- as.numeric(mergeddata4$H3ED1)
mergeddata4$educ_re3<-Recode(mergeddata4$H3ED1, recodes= "6:11='lesshigh'; 
                              12='high'; 13:15='somecol'; 16:22='col'; else=NA", as.factor=T)
mergeddata4$sui_3<-Recode(mergeddata4$H3TO130, recodes="1=1; 0=0; else=NA")
mergeddata4$male_3<-Recode(mergeddata4$BIO_SEX3, recodes="1=1; 2=0; else=NA")
mergeddata4$H3IR4 <- as.numeric(mergeddata4$H3IR4)
mergeddata4$race_3<-Recode(mergeddata4$H3IR4, recodes= "1='nwhite'; 
                              2='nhblack'; 3:5='other'; else=NA", as.factor=T)
mergeddata4$H3MN11 <- as.numeric(mergeddata4$H3MN11)
mergeddata4$clo_ment3<-Recode(mergeddata4$H3MN11, recodes= "0='no'; 
                              1:2='somewhat'; 3:4='very'; else=NA", as.factor=T)
##frequency of church activities
mergeddata4$H3RE30<- as.numeric(mergeddata4$H3RE30)
mergeddata4$rel_impor3<-Recode(mergeddata4$H3RE30, recodes= "0='nope'; 
                              1='somewhat'; 2='very'; 4='most_imp'; else=NA", as.factor=T)
## religion is important
mergeddata4$H3RE25 <- as.numeric(mergeddata4$H3RE25)
mergeddata4$act_chur3<-Recode(mergeddata4$H3RE25, recodes= "0='never'; 
                              1:2='few_year'; 3='one_month'; 4:5='few_mon';  6='mul_days'; else=NA", as.factor=T)
##profound religious experience
mergeddata4$pro_exp3<-Recode(mergeddata4$H3RE34, recodes="1=1; 0=0; else=NA")
## religion guides me
mergeddata4$H3RE38<- as.numeric(mergeddata4$H3RE38)
mergeddata4$rel_guide3<-Recode(mergeddata4$H3RE38, recodes= "1:2='agree'; 3='unsure'; 4:5='disagree'; else=NA", as.factor=T)
##angels watch me
mergeddata4$H3RE39<- as.numeric(mergeddata4$H3RE39)
mergeddata4$ang_watch3<-Recode(mergeddata4$H3RE39, recodes= "1:2='agree'; 3='unsure'; 4:5='disagree'; else=NA", as.factor=T)
## thoughts of suicide at wave four
mergeddata4$sui_4<-Recode(mergeddata4$H4SE1, recodes="1=1; 0=0; else=NA")
## use scale ( ) function for continuous, makes a z score
## in order to make an interaction term you do the following (variable name * variable name 2) within the regression model, do not attempt to make a unique variable
## no strata value, does not effect standard errors that much

##Selecting my variables
myvars<-c( "AID", "age_3", "clo_ment3", "male_3", "fri_sui3", "educ_re3", "act_chur3", "rel_impor3", "pro_exp3", "rel_guide3", "ang_watch3", "race_3", "fam_sui3", "sui_1", "sui_3", "sui_4", "GSWGT134", "CLUSTER2")
mergeddata4<-mergeddata4[,myvars]
##subsetting data
sam <- mergeddata4 %>% 
  filter(complete.cases(.))
##suicide transition variable
sam<-sam %>% 
mutate(suitran_1 =ifelse(sui_1==0 & sui_3 ==0, 0,1),
       sui_tran2 =ifelse(suitran_1==1, NA,
                         ifelse(sui_3==0 & sui_4==0,0,1)))
#suicide thought transition based on self reported education at wave 3]

options(survey.lonely.psu = "adjust")
des<-svydesign(ids =~CLUSTER2, weights = ~GSWGT134,
    data = sam, nest = T)


##Fit the Cox model the data. Include all main effects in the model. Test for an interaction between at least two of the predictors.

fitl<-svycoxph(formula = Surv(time = age_3, event = suitran_1) ~ 
    male_3 + educ_re3, design = des)
summary(fitl)

## 1 - level Cluster Sampling design (with replacement)
## With (131) clusters.
## svydesign(ids = ~CLUSTER2, weights = ~GSWGT134, data = sam, nest = T)

## Call:
## svycoxph(formula = Surv(time = age_3, event = suitran_1) ~ male_3 + 
##     educ_re3, design = des)
## 
##   n= 2559, number of events= 472 
## 
##                      coef exp(coef) se(coef) robust se      z Pr(>|z|)    
## male_3           -0.42799   0.65182  0.09326   0.13394 -3.195   0.0014 ** 
## educ_re3high      0.93560   2.54874  0.15591   0.17397  5.378 7.53e-08 ***
## educ_re3lesshigh  1.13577   3.11357  0.17939   0.20499  5.541 3.02e-08 ***
## educ_re3somecol   0.93487   2.54689  0.15114   0.14585  6.410 1.46e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                  exp(coef) exp(-coef) lower .95 upper .95
## male_3              0.6518     1.5342    0.5013    0.8475
## educ_re3high        2.5487     0.3924    1.8124    3.5843
## educ_re3lesshigh    3.1136     0.3212    2.0834    4.6532
## educ_re3somecol     2.5469     0.3926    1.9137    3.3897
## 
## Concordance= 0.642  (se = 0.015 )
## Likelihood ratio test= NA  on 4 df,   p=NA
## Wald test            = 48.82  on 4 df,   p=6e-10
## Score (logrank) test = NA  on 4 df,   p=NA
## 
##   (Note: the likelihood ratio and score tests assume independence of
##      observations within a cluster, the Wald and robust score tests do not).

fitl1<-svycoxph(formula = Surv(time = age_3, event = suitran_1) ~ 
    male_3 + fri_sui3 + fam_sui3 + educ_re3 + act_chur3 + rel_impor3 + rel_guide3, design = des)
summary(fitl1)

## 1 - level Cluster Sampling design (with replacement)
## With (131) clusters.
## svydesign(ids = ~CLUSTER2, weights = ~GSWGT134, data = sam, nest = T)

## Call:
## svycoxph(formula = Surv(time = age_3, event = suitran_1) ~ male_3 + 
##     fri_sui3 + fam_sui3 + educ_re3 + act_chur3 + rel_impor3 + 
##     rel_guide3, design = des)
## 
##   n= 2559, number of events= 472 
## 
##                        coef exp(coef) se(coef) robust se      z Pr(>|z|)    
## male_3             -0.47418   0.62240  0.09407   0.13054 -3.632 0.000281 ***
## fri_sui3            0.85858   2.35980  0.13681   0.16853  5.095 3.50e-07 ***
## fam_sui3            0.55567   1.74310  0.19213   0.21545  2.579 0.009904 ** 
## educ_re3high        0.95420   2.59659  0.15670   0.17496  5.454 4.93e-08 ***
## educ_re3lesshigh    1.07153   2.91983  0.18025   0.21658  4.948 7.51e-07 ***
## educ_re3somecol     0.90977   2.48374  0.15157   0.14793  6.150 7.75e-10 ***
## act_chur3few_year   0.93108   2.53725  0.40796   0.51200  1.819 0.068983 .  
## act_chur3mul_days   1.61291   5.01739  0.60322   0.73564  2.193 0.028341 *  
## act_chur3never      0.82144   2.27376  0.39598   0.50730  1.619 0.105394    
## act_chur3one_month  0.91612   2.49958  0.56467   0.65852  1.391 0.164173    
## rel_impor3somewhat -0.18542   0.83075  0.12998   0.16270 -1.140 0.254430    
## rel_impor3very     -0.57439   0.56305  0.14813   0.18830 -3.050 0.002285 ** 
## rel_guide3disagree  0.07377   1.07656  0.13900   0.21338  0.346 0.729547    
## rel_guide3unsure    0.23979   1.27098  0.11322   0.14765  1.624 0.104363    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                    exp(coef) exp(-coef) lower .95 upper .95
## male_3                0.6224     1.6067    0.4819    0.8039
## fri_sui3              2.3598     0.4238    1.6960    3.2834
## fam_sui3              1.7431     0.5737    1.1427    2.6590
## educ_re3high          2.5966     0.3851    1.8428    3.6587
## educ_re3lesshigh      2.9198     0.3425    1.9099    4.4638
## educ_re3somecol       2.4837     0.4026    1.8586    3.3191
## act_chur3few_year     2.5373     0.3941    0.9301    6.9211
## act_chur3mul_days     5.0174     0.1993    1.1866   21.2156
## act_chur3never        2.2738     0.4398    0.8413    6.1455
## act_chur3one_month    2.4996     0.4001    0.6876    9.0867
## rel_impor3somewhat    0.8308     1.2037    0.6039    1.1428
## rel_impor3very        0.5631     1.7760    0.3893    0.8144
## rel_guide3disagree    1.0766     0.9289    0.7086    1.6356
## rel_guide3unsure      1.2710     0.7868    0.9516    1.6975
## 
## Concordance= 0.691  (se = 0.015 )
## Likelihood ratio test= NA  on 14 df,   p=NA
## Wald test            = 126.2  on 14 df,   p=<2e-16
## Score (logrank) test = NA  on 14 df,   p=NA
## 
##   (Note: the likelihood ratio and score tests assume independence of
##      observations within a cluster, the Wald and robust score tests do not).

fitl2<-svycoxph(formula = Surv(time = age_3, event = suitran_1) ~ 
    male_3 + fri_sui3 +fam_sui3 + educ_re3 + act_chur3 + rel_impor3 + rel_guide3 +male_3*fri_sui3 + +male_3*educ_re3, design = des)
summary(fitl2)

## 1 - level Cluster Sampling design (with replacement)
## With (131) clusters.
## svydesign(ids = ~CLUSTER2, weights = ~GSWGT134, data = sam, nest = T)

## Call:
## svycoxph(formula = Surv(time = age_3, event = suitran_1) ~ male_3 + 
##     fri_sui3 + fam_sui3 + educ_re3 + act_chur3 + rel_impor3 + 
##     rel_guide3 + male_3 * fri_sui3 + +male_3 * educ_re3, design = des)
## 
##   n= 2559, number of events= 472 
## 
##                             coef exp(coef) se(coef) robust se      z Pr(>|z|)
## male_3                   0.41584   1.51564  0.27183   0.27317  1.522 0.127946
## fri_sui3                 0.71949   2.05338  0.18135   0.23840  3.018 0.002545
## fam_sui3                 0.54165   1.71885  0.19350   0.21158  2.560 0.010464
## educ_re3high             1.40003   4.05534  0.23367   0.21839  6.411 1.45e-10
## educ_re3lesshigh         1.76320   5.83106  0.25069   0.26625  6.622 3.53e-11
## educ_re3somecol          1.35311   3.86945  0.22436   0.18734  7.223 5.09e-13
## act_chur3few_year        0.92030   2.51004  0.40850   0.51456  1.789 0.073692
## act_chur3mul_days        1.62825   5.09494  0.60367   0.72980  2.231 0.025676
## act_chur3never           0.81004   2.24799  0.39612   0.50751  1.596 0.110468
## act_chur3one_month       0.90668   2.47609  0.56465   0.65712  1.380 0.167656
## rel_impor3somewhat      -0.15448   0.85686  0.13004   0.15712 -0.983 0.325521
## rel_impor3very          -0.54123   0.58203  0.14826   0.18004 -3.006 0.002646
## rel_guide3disagree       0.06969   1.07217  0.13972   0.21323  0.327 0.743808
## rel_guide3unsure         0.26008   1.29703  0.11345   0.14213  1.830 0.067268
## male_3:fri_sui3          0.30881   1.36180  0.26654   0.29532  1.046 0.295702
## male_3:educ_re3high     -0.95936   0.38314  0.31402   0.33086 -2.900 0.003736
## male_3:educ_re3lesshigh -1.61669   0.19855  0.37572   0.43065 -3.754 0.000174
## male_3:educ_re3somecol  -0.94977   0.38683  0.30782   0.30021 -3.164 0.001558
##                            
## male_3                     
## fri_sui3                ** 
## fam_sui3                *  
## educ_re3high            ***
## educ_re3lesshigh        ***
## educ_re3somecol         ***
## act_chur3few_year       .  
## act_chur3mul_days       *  
## act_chur3never             
## act_chur3one_month         
## rel_impor3somewhat         
## rel_impor3very          ** 
## rel_guide3disagree         
## rel_guide3unsure        .  
## male_3:fri_sui3            
## male_3:educ_re3high     ** 
## male_3:educ_re3lesshigh ***
## male_3:educ_re3somecol  ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                         exp(coef) exp(-coef) lower .95 upper .95
## male_3                     1.5156     0.6598   0.88730    2.5889
## fri_sui3                   2.0534     0.4870   1.28689    3.2764
## fam_sui3                   1.7188     0.5818   1.13539    2.6021
## educ_re3high               4.0553     0.2466   2.64322    6.2219
## educ_re3lesshigh           5.8311     0.1715   3.46032    9.8260
## educ_re3somecol            3.8694     0.2584   2.68032    5.5861
## act_chur3few_year          2.5100     0.3984   0.91556    6.8813
## act_chur3mul_days          5.0949     0.1963   1.21879   21.2984
## act_chur3never             2.2480     0.4448   0.83138    6.0784
## act_chur3one_month         2.4761     0.4039   0.68300    8.9765
## rel_impor3somewhat         0.8569     1.1671   0.62974    1.1659
## rel_impor3very             0.5820     1.7181   0.40898    0.8283
## rel_guide3disagree         1.0722     0.9327   0.70593    1.6284
## rel_guide3unsure           1.2970     0.7710   0.98168    1.7137
## male_3:fri_sui3            1.3618     0.7343   0.76338    2.4293
## male_3:educ_re3high        0.3831     2.6100   0.20032    0.7328
## male_3:educ_re3lesshigh    0.1986     5.0364   0.08537    0.4618
## male_3:educ_re3somecol     0.3868     2.5851   0.21477    0.6967
## 
## Concordance= 0.697  (se = 0.015 )
## Likelihood ratio test= NA  on 18 df,   p=NA
## Wald test            = 182.7  on 18 df,   p=<2e-16
## Score (logrank) test = NA  on 18 df,   p=NA
## 
##   (Note: the likelihood ratio and score tests assume independence of
##      observations within a cluster, the Wald and robust score tests do not).

##Evaluate the assumptions of the Cox model, including proportionality of hazards of covariates, and if you use a continuous variable, evaluate linearity of the effect using Martingale residuals
##use schoenfeld
schoenresid<-resid(fitl1, type="schoenfeld")

fit.sr<-lm(schoenresid~des$variables$age_3[des$variables$suitran_1==1])

fit.sr%>%
  broom::tidy()%>%
  filter(term != "(Intercept)")%>%
  select(response, estimate, statistic, p.value)

## # A tibble: 14 × 4
##    response             estimate statistic p.value
##    <chr>                   <dbl>     <dbl>   <dbl>
##  1 male_3             -0.00144    -0.115    0.908 
##  2 fri_sui3            0.0160      1.84     0.0658
##  3 fam_sui3            0.00614     0.940    0.348 
##  4 educ_re3high        0.00577     0.480    0.632 
##  5 educ_re3lesshigh    0.000897    0.102    0.918 
##  6 educ_re3somecol    -0.00451    -0.350    0.726 
##  7 act_chur3few_year   0.00661     0.721    0.471 
##  8 act_chur3mul_days   0.00220     0.923    0.356 
##  9 act_chur3never     -0.0105     -1.05     0.295 
## 10 act_chur3one_month  0.00215     0.681    0.496 
## 11 rel_impor3somewhat -0.000697   -0.0549   0.956 
## 12 rel_impor3very      0.00574     0.443    0.658 
## 13 rel_guide3disagree -0.00834    -0.768    0.443 
## 14 rel_guide3unsure    0.0000480   0.00394  0.997

schoenresid<-resid(fitl2, type="schoenfeld")

fit.sr<-lm(schoenresid~des$variables$age_3[des$variables$suitran_1==1])

fit.sr%>%
  broom::tidy()%>%
  filter(term != "(Intercept)")%>%
  select(response, estimate, statistic, p.value)

## # A tibble: 18 × 4
##    response                 estimate statistic p.value
##    <chr>                       <dbl>     <dbl>   <dbl>
##  1 male_3                  -0.00126    -0.101   0.920 
##  2 fri_sui3                 0.0161      1.85    0.0652
##  3 fam_sui3                 0.00608     0.930   0.353 
##  4 educ_re3high             0.00597     0.496   0.620 
##  5 educ_re3lesshigh         0.000761    0.0870  0.931 
##  6 educ_re3somecol         -0.00429    -0.333   0.740 
##  7 act_chur3few_year        0.00661     0.722   0.471 
##  8 act_chur3mul_days        0.00218     0.916   0.360 
##  9 act_chur3never          -0.0105     -1.05    0.295 
## 10 act_chur3one_month       0.00214     0.679   0.497 
## 11 rel_impor3somewhat      -0.000998   -0.0786  0.937 
## 12 rel_impor3very           0.00596     0.460   0.646 
## 13 rel_guide3disagree      -0.00824    -0.759   0.448 
## 14 rel_guide3unsure        -0.000163   -0.0134  0.989 
## 15 male_3:fri_sui3          0.000187    0.0297  0.976 
## 16 male_3:educ_re3high     -0.00147    -0.168   0.866 
## 17 male_3:educ_re3lesshigh -0.00242    -0.496   0.620 
## 18 male_3:educ_re3somecol   0.00600     0.660   0.510

fit.test<-cox.zph(fitl1)
fit.test

##               chisq df    p
## male_3     0.001355  1 0.97
## fri_sui3   0.001537  1 0.97
## fam_sui3   0.000075  1 0.99
## educ_re3   0.015940  3 1.00
## act_chur3  0.002712  4 1.00
## rel_impor3 0.001770  2 1.00
## rel_guide3 0.002397  2 1.00
## GLOBAL     0.027794 14 1.00

plot(fit.test, df=2)

fit.test<-cox.zph(fitl2)
fit.test

##                    chisq df    p
## male_3          1.12e-03  1 0.97
## fri_sui3        1.56e-03  1 0.97
## fam_sui3        5.54e-05  1 0.99
## educ_re3        1.70e-02  3 1.00
## act_chur3       2.65e-03  4 1.00
## rel_impor3      1.69e-03  2 1.00
## rel_guide3      2.14e-03  2 1.00
## male_3:fri_sui3 4.67e-05  1 0.99
## male_3:educ_re3 7.15e-04  3 1.00
## GLOBAL          2.87e-02 18 1.00

plot(fit.test, df=2)

## As the cox model fitted to the data was based on testing for proportionality Schoenfeld Residuals were used along with the Grambsch and Therneau (1994) test for non-proportional effects were used. Based on the result there appears to be no dependence between the residuals and the outcome, or are correlated with the timing of the transition and thus is most likely proportional.

##Produce plots of the survival function and the cumulative hazard function for different “risk profiles”, as done in the notes
## For these models i decided to use only the predictors that have been consistent thus far. A third model was made just to include items on sex, and education,  having made a suicide attempt.

sam$risk<-exp(fitl$linear.predictors)
#highest risk child
sam[which.max(sam$risk),c("AID", "educ_re3", "male_3",  "risk")]

## # A tibble: 1 × 4
##   AID      educ_re3 male_3     risk
##   <chr>    <fct>    <dbl+lbl> <dbl>
## 1 57101310 lesshigh 0          3.11

#lowest risk child
sam[which.min(sam$risk),c("AID", "educ_re3", "male_3", "risk")]

## # A tibble: 1 × 4
##   AID      educ_re3 male_3        risk
##   <chr>    <fct>    <dbl+lbl>    <dbl>
## 1 57232895 col      1 [(1) Male] 0.652

sam<-sam%>%
   mutate(rrisk= round(risk, 4))%>%
   arrange(risk)
e.u<-unique(sam$rrisk)
e.u[order(e.u)]

## [1] 0.6518 1.0000 1.6601 1.6613 2.0295 2.5469 2.5487 3.1136

head(sam[which.max(sam$rrisk),c("AID", "educ_re3", "male_3", "risk")], n=20)

## # A tibble: 1 × 4
##   AID      educ_re3 male_3     risk
##   <chr>    <fct>    <dbl+lbl> <dbl>
## 1 57101310 lesshigh 0          3.11

tail(sam[which.min(sam$rrisk),c("AID", "educ_re3", "male_3", "risk")], n=20)

## # A tibble: 1 × 4
##   AID      educ_re3 male_3        risk
##   <chr>    <fct>    <dbl+lbl>    <dbl>
## 1 57232895 col      1 [(1) Male] 0.652

plot(survfit(fitl, conf.int = F),
     ylab="S(t)",
     xlab="AGE",
     xlim=c(20,22))


lines(survfit(fitl, newdata = data.frame(male_3=1, educ_re3="col"),
              conf.int=F) ,col="red", lty=1)
lines(survfit(fitl, newdata = data.frame(male_3=0, educ_re3="col"),
              conf.int=F) ,col="red", lty=2)


lines(survfit(fitl, newdata = data.frame(male_3=1, educ_re3="somecol"),
              conf.int=F) ,col="green", lty=1)
lines(survfit(fitl, newdata = data.frame(male_3=0, educ_re3="somecol"),
              conf.int=F) ,col="green", lty=2)

lines(survfit(fitl, newdata = data.frame(male_3=1, educ_re3="high"),
              conf.int=F) ,col="blue", lty=1)
lines(survfit(fitl, newdata = data.frame(male_3=0, educ_re3="high"),
              conf.int=F) ,col="blue", lty=2)

lines(survfit(fitl, newdata = data.frame(male_3=1, educ_re3="lesshigh"),
              conf.int=F) ,col="purple", lty=1)
lines(survfit(fitl, newdata = data.frame(male_3=0, educ_re3="lesshigh"),
              conf.int=F) ,col="purple", lty=2)


title(main=c("Survival function for suicidal thought transition between",
             "ages 20 and 22"))
legend("bottomleft", 
       legend=c("mean", "College, Male ", "College, Female","Some College, Male ",
                "Some College, Female", "Highschool, Male ", "Highschool Female", "Less High, Male ", "Less High, Female"),
       col=c(1,"red","red", "green", "green", "blue", "blue", "purple", "purple"),
       lty=c(1,1,2,1,2,1,2,1,2), cex=.8)

sf1<-survfit(fitl)
sf2<-survfit(fitl,
  newdata=data.frame(educ_re3="col",
                     male_3=1,
                     age_3="(20,22]"))

sf3<-survfit(fitl,
  newdata=data.frame(educ_re3="col",
                     male_3=0,
                     age_3="(20,22]"))

sf4<-survfit(fitl,
  newdata=data.frame(educ_re3="somecol",
                     male_3=1,
                     age_3="(20,22]"))

sf5<-survfit(fitl,
  newdata=data.frame(educ_re3="somecol",
                     male_3=0,
                     age_3="(20,22]"))

H1<--log(sf1$surv)
H2<--log(sf2$surv)
H3<--log(sf3$surv)
H4<--log(sf4$surv)
H5<--log(sf5$surv)

test<- data.frame(H=c(H1, H2, H3),  #error -
                  group=c(rep("means", length(H1)),
                          rep("high", length(H1)),
                          rep("low edu", length(H1))),
                  times=sf1$time)

##and the hazard function
test %>%  
  ggplot(aes(x=times, y=H,color=group))+ 
  geom_line()

times<- sf1$time
hs1<-loess(diff(c(0,H1))~times, degree=2, span=.25)

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 17.955

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 1.045

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 1.092

hs2<-loess(diff(c(0,H2))~times, degree=2, span=.25)

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 17.955

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 1.045

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 1.092

hs3<-loess(diff(c(0,H3))~times, degree=2, span=.25)

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 17.955

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 1.045

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 1.092

hs4<-loess(diff(c(0,H4))~times, degree=1, span=.25)

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 17.955

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 1.045

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 1.092

hs5<-loess(diff(c(0,H5))~times, degree=1, span=.25)

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 17.955

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 1.045

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0

## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 1.092

plot(predict(hs1)~times, ylab="smoothed h(t)", xlab="AGE")
     title(main="Smoothed hazard plots")
     lines(predict(hs2)~times, col="red")
     lines(predict(hs3)~times,  col="green")
     lines(predict(hs4)~times, col="blue")
     lines(predict(hs5)~times,  col="purple")
     legend("topright",
            legend=c("Means of Covariates","College Male, age 20-22", "College Female, age 20-22", "Some College Male,    age 20-22", "Some College Female, age 20-22"),
            lty=1, col=c(1,"red", "green", "blue", "purple"), cex=,8)

##Cox regression models
##1) Carry out the following analysis:
##Define your outcome as in HW 1. Also consider what covariates are hypothesized to affect the outcome variable. Define these and construct a parametric model for your outcome.



##The following parametric model was fitted to the main dependent variable of having thoughts of suicide at wave three. Thoughts of suicide at wave 1,3,4 were used to construct a thoughts of suicide transition variable in order to predict future thoughts of suicide. With various researchers hypothesizing which covariates/risk factors may increase or decrease chances of suicidal thoughts, or attempts emerging in people. With Durkheim (1987) focusing on the importance of religion and social integration, others have hypothesized the following variables including whites compared to minorities (Molock et al 2008), males more often than females (Leeners 1996), exposure to suicide from family members (Kline et al 2021), younger and older people (Bridge et al 2015; Steele et al 2018). The following will be variables were then constructed in order to test the predictive power of a future risk of suicide variables of closeness to a mentor, if family or friend had recently attempted suicide in the past year, highest level of education reached including college, some college, highschool, and less than highschool, age at time of survey, as common demographic variables, and sex being male vs female. Race provided fascinating results in the last analyses with it being found to be not significant at predicting risk of future suicidal thoughts, it will be kept for the present research, with two models being made, one with and one without race. Race is thus coded as Black, White and Other. Religious variables were also added either as binary or categorical based variables. 

## The results were run multiple times with some of the religious variables ( angels watching, profoundness of religion) were not significant in predicting a future thoughts of suicide and were thus removed from the model. Race consistently was tested by itself, and without other potential covariates and was also thus removed from the models. 

##Interaction term

##As sex is listed in most if not all the research regarding suicide, sex will be used as an interaction term with a few variables including the education and friend attempted suicide item.

## Model 0 indicates a simplistic model compared to 1 and 2 which include many more covariates. The results indicate that males, and those with higher amounts of education have lower risk of suicide compared to females and those with lower educational attainment.


##Model 1 indicates a cox proportional hazard model for the event or risk of having thoughts of suicide. Interestingly, males were at much less risk of having thoughts of suicide than females. The race was removed from the primary models as that variable had no significant relationship with the risk of thoughts of suicide. If friends and family had attempted suicide, respondents were at 2.3 to 1.7 times the risk of thinking about suicide compared to those who did not. In terms of education compared to people with college degrees, those without were above 2 times the risk. While the religious variables provided some interesting questions for future research, such as multiple days per week of activities increasing the risk of thoughts of suicide, which begs the question of why that might be. However, if religion was important for respondents, they were 44% less likely to be at risk of thoughts of suicide compared to those who reported religion was not important. This considering religion being a guiding force provided no significant power, brings about more questions, and could potentially be removed from the model.

#3Model 2 includes all main effects along with 2 interaction terms, males multiplied by if friends hat attempted suicide, and males by education. In this model, the statistical significance of sex has disappeared, but the statistical and risk magnitude for thoughts of suicide in education greatly decreased. What was once above 2 times the risk compared to people with a college education, is now 5.8, 4, and 3.86 times the risk for people with less than high school, high school, and some college respectively. The magnitude of risk increased by and decreased by .3 for both friends and family making a suicide attempt. With multiple days of church activities per month receiving a negligible boost of about .02%, similar results were found for those who thought religion was important compared to those who did not. Lastly, the interaction terms indicated no statistical predicting power for sex by a friend who had attempted suicide. However, by both being male and having less than, high school, and some college, they were statistically at lower risk compared to females who had similar levels with a college education. However, compared to model one wherein less than high school predicted the highest risk,  both highschool and some college predicted the lowest risk of having future suicidal thoughts.

##Risk profiles were also generated. The child with the highest risk of thoughts of suicide was male with a college education in which case the risk was 3.1 times higher than baseline, with the lowest risk child also being male with a college degree and a 65% lower risk compared to the baseline. The smoothed hazard function generally follows the same pattern, but instead the risk profiles are as follows. As educational attainment decreases chances for suicidal thoughts increases, this pattern is observed in both men and women, with men having lower risk overall.

## Religious variables will be removed from future research, instead using adverse child experiences.

Homework 4

Joshua Reyna

2022-09-22