nhis <-read_ipums_ddi("C:/Users/ozd504/OneDrive - University of Texas at San Antonio/classes/dem7223/dem7223_22/data/nhis_00015.xml")nhis_dat <-read_ipums_micro(nhis)
Use of data from IPUMS NHIS is subject to conditions including that users
should cite the data appropriately. Use command `ipums_conditions()` for more
details.
Code
nhis_dat <- haven::zap_labels(nhis_dat)
Filter only NHIS respondents who are eligible for mortality follow up:
Code
nhis_dat <- nhis_dat %>%filter(MORTELIG ==1)
Age at death
This example shows the use of an age-at-death outcome. If the respondent died during the follow up period, their age at death is the difference between their year of mortality and their year of birth, calculated as the year of the survey minus their age.
The current year of mortality follow up is 2014, that is the censoring year for mortality
This example shows the use of a time-to-death outcome. If the respondent died within 5 years of the survey, they are coded as experiencing the event, they are otherwise censored.
ggsurvfit() will give us the \(F(t)\), or the cumulative distribution function, and \(f(t) = \frac{dF(t)}{dt}\), so we can use this to get the hazard function:
Rows: 101 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): age
dbl (3): age_n, qx, ex
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
test<- test%>%filter(x!=0& x>=18& x<=100)mdat<-left_join(test, uslt14, by=c("x"="age_n"))mdat%>%ggplot()+aes(x=x, y=haz) +geom_line(color ="blue")+ylim(0, .3)+geom_line(aes(x=x, y = qx), color ="green")
---title: "IPUMS - NHIS Mortality Example"author: "Corey Sparks, PhD"format: html: code-fold: true code-tools: true code-link: true df-print: paged self-contained: trueeditor: visual---This example is intended to show the basics of coding mortality and censoring using the [IPUMS NHIS](https://nhis.ipums.org/nhis/) mortality data.## Read in IPUMS IHIS extract```{r}library(tidyverse)library(ipumsr)``````{r}nhis <-read_ipums_ddi("C:/Users/ozd504/OneDrive - University of Texas at San Antonio/classes/dem7223/dem7223_22/data/nhis_00015.xml")nhis_dat <-read_ipums_micro(nhis)nhis_dat <- haven::zap_labels(nhis_dat)```Filter only NHIS respondents who are eligible for mortality follow up:```{r}nhis_dat <- nhis_dat %>%filter(MORTELIG ==1)```## Age at deathThis example shows the use of an age-at-death outcome. If the respondent died during the follow up period, their age at death is the difference between their year of mortality and their year of birth, calculated as the year of the survey minus their age.The current year of mortality follow up is **2014**, that is the censoring year for mortality```{r}nhis_dat <- nhis_dat %>%mutate(death_age =ifelse( MORTSTAT ==1, MORTDODY - (YEAR - AGE), 2014- (YEAR - AGE)), d.event =ifelse(MORTSTAT ==1, 1, 0))library(survival)age_fit <-survfit(Surv(death_age, d.event) ~1, data = nhis_dat)library(ggsurvfit)age_fit %>%ggsurvfit() +add_confidence_interval(type ="ribbon") +add_quantile() ```## Time to deathThis example shows the use of a time-to-death outcome. If the respondent died within 5 years of the survey, they are coded as experiencing the event, they are otherwise censored.```{r}nhis_dat <- nhis_dat %>%mutate(death_time =ifelse( MORTSTAT ==1, MORTDODY - YEAR , 2014- YEAR ), d5.event =ifelse(MORTSTAT ==1& death_time <=5, 1, 0))library(survival)time_fit <-survfit(Surv(death_time, d5.event) ~1, conf.type ="log",data = nhis_dat)library(ggsurvfit)time_fit %>%ggsurvfit()+xlim(-1, 6) +add_censor_mark() +add_confidence_interval()+add_quantile(y_value = .975) ```## Estimating the hazard function from the survival function```{r}age_fit%>% broom::tidy() %>%select(time, estimate) %>%mutate(cumhaz =-log(estimate)) %>%mutate(haz =c(0, diff(cumhaz)))%>%ggplot()+aes(x=time, y=haz)+geom_line()```## Alternative way to get hazardRemeber $$h(t) = \frac{f(t)}{S(t)}$$`ggsurvfit()` will give us the $F(t)$, or the cumulative distribution function, and $f(t) = \frac{dF(t)}{dt}$, so we can use this to get the hazard function:```{r}p<-age_fit %>%ggsurvfit(type ="risk") test<-ggplot_build(p)$data %>%as.data.frame() %>%mutate(ft =c(0, diff(y)))%>%mutate(st =1-y)%>%mutate(haz = ft/st)test %>%ggplot() +aes(x = x, y = ft) +geom_line()```## Comparison to $q_x$ from life tableThis is the [2014 US life table](https://www.cdc.gov/nchs/data/nvsr/nvsr66/nvsr66_04.pdf) from NCHS```{r}uslt14<-read_csv("../data/uslt_2014.csv")test<- test%>%filter(x!=0& x>=18& x<=100)mdat<-left_join(test, uslt14, by=c("x"="age_n"))mdat%>%ggplot()+aes(x=x, y=haz) +geom_line(color ="blue")+ylim(0, .3)+geom_line(aes(x=x, y = qx), color ="green")```