Homework 2 Event History

library(haven)
library(survival)
library(car)

## Loading required package: carData

library(muhaz)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following object is masked from 'package:car':
## 
##     recode

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(survival)
library(forcats)
library(kableExtra)

## 
## Attaching package: 'kableExtra'

## The following object is masked from 'package:dplyr':
## 
##     group_rows

library(tidyverse)

## ── Attaching packages
## ───────────────────────────────────────
## tidyverse 1.3.2 ──

## ✔ ggplot2 3.3.6     ✔ readr   2.1.2
## ✔ tibble  3.1.8     ✔ purrr   0.3.4
## ✔ tidyr   1.2.0     ✔ stringr 1.4.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()          masks stats::filter()
## ✖ kableExtra::group_rows() masks dplyr::group_rows()
## ✖ dplyr::lag()             masks stats::lag()
## ✖ dplyr::recode()          masks car::recode()
## ✖ purrr::some()            masks car::some()

library(ggsurvfit)
library(gtsummary)
library(survey)

## Loading required package: grid
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## 
## Attaching package: 'survey'
## 
## The following object is masked from 'package:graphics':
## 
##     dotchart

library(rio)
library(haven)
X21600_0008_Data <- read_sav("21600-0008-Data.sav")
#View(X21600_0008_Data)

X21600_0022_Data <- read_sav("21600-0022-Data.sav")
#View(X21600_0022_Data)

X21600_0001_Data <- read_sav("21600-0001-Data.sav")
#View(X21600_0001_Data)

mergeddata<-dplyr::left_join( X21600_0008_Data, X21600_0001_Data, by=c("AID"= "AID"))
mergeddata2<-dplyr::left_join(mergeddata, X21600_0022_Data, by=c("AID"= "AID"))
rm(X21600_0008_Data,X21600_0022_Data,X21600_0001_Data,mergeddata);gc()

##            used  (Mb) gc trigger  (Mb) max used  (Mb)
## Ncells  2466774 131.8    4397960 234.9  4397960 234.9
## Vcells 31220129 238.2   94593822 721.7 94593019 721.7

## I apologize for the mess, the addhealth gave me alot of trouble, i enlisted samson to help me with the coding as it was proving to be too taxing on my mind. I will clean up the data later.

##1  Define your event variable 
##The Add Health Survey data includes various measures of suicide of particular interest are the following variables H4SE1, H3TO130, and H1SU1 which indicate if the respondent thought about suicide within the last year. As indicated with the code 0= not thought about suicide, while 1= thought about suicide. We are testing to see what people would have repeated instances of having suicidal thoughts. We also wanted to include those who did not have suicidal thoughts at point 1, but would develop them in later waves.

##2 Define a duration or time variable 

##The duration in this case would be a year, as the question is not specific enough with where in the year.

##3 Define a censoring indicator 

##The censoring indicator would consist of several items, primarily if the respondent did not attempt suicide, was missing from the survey, or legitimately skipped the question. Additionally, people were censored if they for some reason did not show up in the survey at different waves. In which case they did not or are assumed not have thought about suicide within the last year. We tried not to censor people within the sample as everyone should ideally be at risk of thinking about suicide.


##Define a grouping variable, this can be dichotomous or categorical.
   ##Below 2 grouping variables have been used, Males as indicated by sex male female= 0, and race wherein coding for 1=White, 2=Black, and due to lower observations across other racial categories 3=other.

 ##Do you have a research hypothesis about the survival patterns for the levels of the categorical variable?  State it.
## It is hypothesized that males, and whites will have higher risk of thinking about suicide in the previous wave, for each wave. Similarly, Whites should have increased chances of having suicidal thoughts.

mergeddata2$sui_1<-Recode(mergeddata2$H1SU1, recodes="1=1; 0=0; else=NA")
mergeddata2$male_1<-Recode(mergeddata2$S2, recodes="1=1; 2=0; else=NA")

mergeddata2$H1GI8 <- as.numeric(mergeddata2$H1GI8)
mergeddata2$race_1<-Recode(mergeddata2$H1GI8, recodes= "1='nwhite'; 
                              2='nhblack'; 3:5='other'; else=NA", as.factor=T)

mergeddata2$sui_3<-Recode(mergeddata2$H3TO130, recodes="1=1; 0=0; else=NA")
mergeddata2$male_3<-Recode(mergeddata2$BIO_SEX3, recodes="1=1; 2=0; else=NA")

mergeddata2$H3IR4 <- as.numeric(mergeddata2$H3IR4)
mergeddata2$race_3<-Recode(mergeddata2$H3IR4, recodes= "1='nwhite'; 
                              2='nhblack'; 3:5='other'; else=NA", as.factor=T)


mergeddata2$sui_4<-Recode(mergeddata2$H4SE1, recodes="1=1; 0=0; else=NA")
mergeddata2$male_4<-Recode(mergeddata2$BIO_SEX4, recodes="1=1; 2=0; else=NA")
mergeddata2$H4IR4 <- as.numeric(mergeddata2$H4IR4)
mergeddata2$race_4<-Recode(mergeddata2$H4IR4, recodes= "1='nwhite'; 
                              2='nhblack'; 3:5='other'; else=NA", as.factor=T)

mergeddata2$age_3<-(mergeddata2$IYEAR3 - mergeddata2$H3OD1Y)
mergeddata2$age_4<-(mergeddata2$IYEAR4 - mergeddata2$H4OD1Y)
mergeddata2$age_1<-(mergeddata2$IYEAR - mergeddata2$H1GI1Y)


myvars<-c( "AID","age_1", "age_3", "age_4", "male_1", "male_3", "male_4",
"race_3", "race_4", "sui_1", "sui_3", "sui_4")
mergeddata2<-mergeddata2[,myvars]

sam <- mergeddata2 %>% 
  filter(complete.cases(.))


sam <- import("e.long2.rds") %>%

  pivot_longer(c(-AID,-race,-male), names_to = c(".value","wave"),names_sep = "_") %>%

  group_by(AID) %>%

  mutate(nextsui=lead(sui,n=1,order_by = AID),

         suiran=ifelse(nextsui==1&sui==0,1,0)) %>%

  filter(!is.na(suiran))

## Warning: `sui_1` and `sui_2` have conflicting value labels.
## ℹ Labels for these values will be taken from `sui_1`.
## ✖ Values: 0, 6, 8, and 9

## Warning: `sui_1` and `sui_3` have conflicting value labels.
## ℹ Labels for these values will be taken from `sui_1`.
## ✖ Values: 0, 6, and 8

#suicide transition


fit <- survfit(Surv(time=as.numeric(wave), event=suiran) ~ 1,

                  conf.type = "log",

                   data = sam)
fit %>%
  ggsurvfit(risk.table=T) + 
  labs(title="Survival Function For Transition To Thoughts Of Suicide Waves 1-3")

## Warning: Ignoring unknown parameters: risk.table

summary(fit)

## Call: survfit(formula = Surv(time = as.numeric(wave), event = suiran) ~ 
##     1, data = sam, conf.type = "log")
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1   6232     127    0.980 0.00179        0.976        0.983
##     2   3207     139    0.937 0.00392        0.930        0.945

##Estimate the survival function for your outcome and plot it
             
fit2<- survfit(Surv(time = as.numeric(wave), event = suiran) ~ race,
data = sam)
summary(fit2)

## Call: survfit(formula = Surv(time = as.numeric(wave), event = suiran) ~ 
##     race, data = sam)
## 
##                 race=nhblack 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1   1548      32    0.979 0.00362        0.972        0.986
##     2    790      27    0.946 0.00723        0.932        0.960
## 
##                 race=nwhite 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1   4378      92    0.979 0.00217        0.975        0.983
##     2   2260     104    0.934 0.00478        0.925        0.943
## 
##                 race=other 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1    306       3     0.99 0.00563        0.979        1.000
##     2    157       8     0.94 0.01818        0.905        0.976

fit3<- survfit(Surv(time = as.numeric(wave), event = suiran) ~ male,
data = sam)
summary(fit3)

## Call: survfit(formula = Surv(time = as.numeric(wave), event = suiran) ~ 
##     male, data = sam)
## 
##                 male=0 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1   3555      63    0.982 0.00221        0.978        0.987
##     2   1836      89    0.935 0.00535        0.924        0.945
## 
##                 male=1 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1   2677      64    0.976 0.00295        0.970        0.982
##     2   1371      50    0.940 0.00570        0.929        0.952

##Carry out the following analysis

    ##Kaplan-Meier survival analysis of the outcome
    
    ##Comparison of Kaplan-Meier survival across grouping variables in your data. Interpret your results.
    ##Plot the survival function for the analysis for each level of the group variable



fit2 %>%
ggsurvfit(risk.table = T,
ylim = c(.5, 1)) +
labs(title="Survival Function For Transition To Thoughts Of Suicide Waves 1-3",
subtitle ="By Race" )

## Warning: Ignoring unknown parameters: risk.table, ylim

fit3 %>%
ggsurvfit(risk.table = T,
ylim = c(.5, 1)) +
labs(title="Survival Function For Transition To Thoughts Of Suicide Waves 1-3",
subtitle ="By Sex" )

## Warning: Ignoring unknown parameters: risk.table, ylim

## For simplicity, the waves were renamed on the graphs to 1,2, and 3, when they actually reflect 1,3, and 4 respectively. It was questioned how thoughts of suicide might be prevalent across different waves in the Add Health Data. From waves 1 to 3 it has been estimated that about 2% of the respondents developed thoughts of suicide during this time. However, these odds did increase over time, for instance between waves 3 and 4 this percentage has increased to about 6% of respondents an increase of about 3 times. It is evident more people are experiencing the event over time.

## A comparison in thoughts of suicide by groups such as race and sex were then performed. In terms of race it appears that Blacks and Whites have near comparable chances of experiencing thoughts of suicide in wave one, and into wave three. From wave three to four this line then diverges to the point Blacks then have the highest chances of not having these thoughts. With Other races having the lowest odds from wave 1 to 3, and between Blacks and Whites from waves 3 to 4. With, Whites having highest chances of having suicidal thoughts during these time periods. Which is shown by the summary statistics, in Blacks the thoughts went down between the two time points, with whites having a marked increase in thoughts. Others have a negligible increase in thoughts. In terms of sex, females have lower chances of having suicidal thoughts from wave 3 to four, while men have those odds wave 1 to 3. Interestingly comparing the two statistics indicates that males actually had lower instances of suicidal thoughts in the waves 3 to 4, but also have much less people in that group compared to females.

Homework 2 Event History

Joshua Reyna

2022-09-10