IPUMS - NHIS Mortality Example

Author

Corey Sparks, PhD

This example is intended to show the basics of coding mortality and censoring using the IPUMS NHIS mortality data.

Read in IPUMS IHIS extract

Code

library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.0      ✔ stringr 1.4.1 
✔ readr   2.1.2      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Code

library(ipumsr)

Code

nhis <- read_ipums_ddi("C:/Users/ozd504/OneDrive - University of Texas at San Antonio/classes/dem7223/dem7223_22/data/nhis_00015.xml")

nhis_dat <- read_ipums_micro(nhis)

Use of data from IPUMS NHIS is subject to conditions including that users
should cite the data appropriately. Use command `ipums_conditions()` for more
details.

Code

nhis_dat <- haven::zap_labels(nhis_dat)

Filter only NHIS respondents who are eligible for mortality follow up:

Code

nhis_dat <- nhis_dat %>%
  filter(MORTELIG == 1)

Age at death

This example shows the use of an age-at-death outcome. If the respondent died during the follow up period, their age at death is the difference between their year of mortality and their year of birth, calculated as the year of the survey minus their age.

The current year of mortality follow up is 2014, that is the censoring year for mortality

Code

nhis_dat <- nhis_dat %>%
  mutate(death_age = ifelse( MORTSTAT ==1, 
                             MORTDODY - (YEAR - AGE), 
                             2014 - (YEAR - AGE)), 
         d.event = ifelse(MORTSTAT == 1, 1, 0))

library(survival)

age_fit <- survfit(Surv(death_age, d.event) ~ 1, 
                   data = nhis_dat)

library(ggsurvfit)

age_fit %>%
  ggsurvfit() +
  add_confidence_interval(type = "ribbon") +
  add_quantile()

Time to death

This example shows the use of a time-to-death outcome. If the respondent died within 5 years of the survey, they are coded as experiencing the event, they are otherwise censored.

Code

nhis_dat <- nhis_dat %>%
  mutate(death_time = ifelse( MORTSTAT ==1, 
                             MORTDODY - YEAR , 
                             2014 - YEAR ), 
         d5.event = ifelse(MORTSTAT == 1 & death_time <= 5, 1, 0))

library(survival)

time_fit <- survfit(Surv(death_time, d5.event) ~ 1, 
                  conf.type = "log",
                   data = nhis_dat)

library(ggsurvfit)

time_fit %>%
  ggsurvfit()+
  xlim(-1, 6) +
  add_censor_mark() +
  add_confidence_interval()+
  add_quantile(y_value = .975)

Warning: Removed 23 row(s) containing missing values (geom_path).

Estimating the hazard function from the survival function

Code

age_fit%>%
 broom::tidy() %>%
  select(time, estimate) %>%
  mutate(cumhaz = -log(estimate)) %>%
  mutate(haz = c(0, diff(cumhaz)))%>%
  ggplot()+
  aes(x=time, y=haz)+
  geom_line()

Alternative way to get hazard

Remeber \[h(t) = \frac{f(t)}{S(t)}\]

ggsurvfit() will give us the \(F(t)\), or the cumulative distribution function, and \(f(t) = \frac{dF(t)}{dt}\), so we can use this to get the hazard function:

Code

p<-age_fit %>%
  ggsurvfit(type = "risk") 

test<- ggplot_build(p)$data  %>%
  as.data.frame() %>%
  mutate(ft = c(0, diff(y)))%>%
  mutate(st = 1-y)%>%
  mutate(haz = ft/st)

test %>%
  ggplot() + 
  aes(x = x, y = ft) + 
  geom_line()

Comparison to \(q_x\) from life table

This is the 2014 US life table from NCHS

Code

uslt14<- read_csv("../data/uslt_2014.csv")

Rows: 101 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): age
dbl (3): age_n, qx, ex

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Code

test<- test%>%
  filter(x!=0 & x>=18 & x<=100)

mdat<- left_join(test, uslt14, by=c("x"="age_n"))

mdat%>%
  ggplot()+aes(x=x, y=haz) +
  geom_line(color = "blue")+
  ylim(0, .3)+
  geom_line(aes(x=x, y = qx), color = "green")

Warning: Removed 2 row(s) containing missing values (geom_path).