Event History Analysis HW 2

For this homework I will use Kaplan-Meier estimation to create and plot survival functions and make comparisons between groups.

library(haven)
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.2      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(car)
Loading required package: carData

Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

The following object is masked from 'package:purrr':

    some
loki <- read_stata("C:/Users/maman/OneDrive - University of Texas at San Antonio/Event History Analysis Data/nhis_00002.dta.gz")
View(loki)
loki <- haven::zap_labels(loki)

loki <- loki %>%
  filter(mortelig == 1)

loki_16 <- loki %>% 
  filter(loki$year == 2016)

For this project data from the National Health Interview Survey (NHIS) will be used.

1. Define the event variable

The event variable that will be used in this analysis is an individuals final mortality status, labeled “mortstat”.

2. Define a duration or time variable

The time variable will be the age in years of the respondents

3. Define a censoring indicator

The variable “mortelig” will be used to filter for respondents who are eligible for the mortality follow-up (non-children). The year of the survey, 2016, is the censoring year for mortality.

4. Estimate the survival function of your outcome and plot it.

The plotted survival function for the age-at-death for the 2016 NHIS with mortality follow-up is plotted below:

loki_16 <- loki_16 %>%
  mutate(death_age = ifelse( mortstat ==1, 
                             mortdody - (year - age), 
                             2016 - (year - age)), 
         d.event = ifelse(mortstat == 1, 1, 0))

library(survival)

age_fit <- survfit(Surv(death_age, d.event) ~ 1, 
                   data = loki_16)

library(ggsurvfit)

age_fit %>%
  ggsurvfit() +
  add_confidence_interval(type = "ribbon") +
  add_quantile() 

5a. Perform a Kaplan-Meier survival analysis of the outcome.

summary(age_fit)
Call: survfit(formula = Surv(death_age, d.event) ~ 1, data = loki_16)

 time n.risk n.event survival  std.err lower 95% CI upper 95% CI
   20  31625       1    1.000 3.16e-05        1.000        1.000
   21  31190       2    1.000 5.53e-05        1.000        1.000
   22  30759       1    1.000 6.41e-05        1.000        1.000
   24  29957       2    1.000 7.96e-05        1.000        1.000
   25  29504       2    1.000 9.29e-05        1.000        1.000
   26  29013       1    1.000 9.91e-05        1.000        1.000
   27  28544       3    1.000 1.16e-04        0.999        1.000
   28  28036       1    1.000 1.22e-04        0.999        1.000
   30  27034       2    0.999 1.32e-04        0.999        1.000
   32  26041       1    0.999 1.38e-04        0.999        1.000
   33  25558       1    0.999 1.43e-04        0.999        1.000
   34  25038       1    0.999 1.49e-04        0.999        1.000
   36  23962       2    0.999 1.60e-04        0.999        1.000
   37  23431       2    0.999 1.71e-04        0.999        1.000
   38  22956       3    0.999 1.87e-04        0.999        0.999
   39  22458       3    0.999 2.02e-04        0.999        0.999
   40  22005       4    0.999 2.21e-04        0.998        0.999
   41  21569       7    0.998 2.53e-04        0.998        0.999
   42  21136       2    0.998 2.62e-04        0.998        0.999
   43  20691       1    0.998 2.66e-04        0.998        0.999
   44  20249       2    0.998 2.75e-04        0.998        0.999
   45  19784       2    0.998 2.84e-04        0.998        0.999
   46  19281      12    0.997 3.36e-04        0.997        0.998
   47  18752       2    0.997 3.44e-04        0.997        0.998
   48  18274       5    0.997 3.65e-04        0.996        0.998
   49  17839       4    0.997 3.82e-04        0.996        0.998
   50  17357       9    0.996 4.19e-04        0.996        0.997
   51  16861      11    0.996 4.62e-04        0.995        0.997
   52  16347       9    0.995 4.96e-04        0.994        0.996
   53  15782      14    0.994 5.49e-04        0.993        0.995
   54  15172       8    0.994 5.79e-04        0.993        0.995
   55  14615      13    0.993 6.29e-04        0.992        0.994
   56  14079      13    0.992 6.77e-04        0.991        0.993
   57  13494      17    0.991 7.41e-04        0.989        0.992
   58  12888       9    0.990 7.76e-04        0.988        0.992
   59  12328      13    0.989 8.27e-04        0.987        0.991
   60  11735      21    0.987 9.11e-04        0.985        0.989
   61  11130      22    0.985 1.00e-03        0.983        0.987
   62  10561      12    0.984 1.05e-03        0.982        0.986
   63   9984      19    0.982 1.13e-03        0.980        0.984
   64   9379      25    0.980 1.24e-03        0.977        0.982
   65   8820      27    0.977 1.37e-03        0.974        0.979
   66   8214      23    0.974 1.48e-03        0.971        0.977
   67   7668      33    0.970 1.64e-03        0.966        0.973
   68   7080      40    0.964 1.85e-03        0.961        0.968
   69   6449      24    0.961 1.98e-03        0.957        0.965
   70   5884      34    0.955 2.19e-03        0.951        0.959
   71   5408      40    0.948 2.44e-03        0.943        0.953
   72   4997      43    0.940 2.72e-03        0.935        0.945
   73   4579      35    0.933 2.95e-03        0.927        0.938
   74   4142      41    0.923 3.26e-03        0.917        0.930
   75   3795      36    0.915 3.54e-03        0.908        0.922
   76   3441      31    0.906 3.80e-03        0.899        0.914
   77   3124      37    0.896 4.15e-03        0.888        0.904
   78   2837      31    0.886 4.46e-03        0.877        0.895
   79   2549      40    0.872 4.90e-03        0.862        0.882
   80   2274      33    0.859 5.30e-03        0.849        0.870
   81   2037      39    0.843 5.82e-03        0.832        0.854
   82   1799      32    0.828 6.29e-03        0.816        0.840
   83   1589      39    0.808 6.93e-03        0.794        0.821
   84   1385      33    0.788 7.53e-03        0.774        0.803
   85   1201      84    0.733 9.09e-03        0.716        0.751
   86    397     135    0.484 1.84e-02        0.449        0.521
   87    262     135    0.235 1.74e-02        0.203        0.271
   88    127     127    0.000      NaN           NA           NA

5b. Define a grouping variable.

United States citizenship status (yes/no) will be used as a grouping variable.

5c. State a research hypothesis about the survival patterns by the grouping variable.

People who are U.S. citizens will have a higher survival rate compared to those who are not citizens.

5d. Compare Kaplan-Meier survival across grouping variables in your data. Interpret your results.

Below are the Kaplan-Meier estimates by citizenship status:

loki_16$cit <- Recode(loki_16$citizen,
                      recodes= "1='no'; 2='yes'; else=NA")

fit <-survfit(Surv(death_age, d.event)~cit, data=loki_16)

summary(fit)
Call: survfit(formula = Surv(death_age, d.event) ~ cit, data = loki_16)

36 observations deleted due to missingness 
                cit=no 
 time n.risk n.event survival  std.err lower 95% CI upper 95% CI
   20   1902       1    0.999 0.000526       0.9984        1.000
   39    947       1    0.998 0.001178       0.9961        1.000
   43    757       1    0.997 0.001767       0.9936        1.000
   46    630       1    0.996 0.002369       0.9909        1.000
   48    561       1    0.994 0.002956       0.9880        1.000
   50    492       1    0.992 0.003574       0.9847        0.999
   52    435       1    0.989 0.004231       0.9812        0.998
   53    404       1    0.987 0.004878       0.9775        0.997
   55    357       1    0.984 0.005593       0.9733        0.995
   56    332       1    0.981 0.006313       0.9690        0.994
   59    268       1    0.978 0.007274       0.9634        0.992
   61    224       1    0.973 0.008450       0.9568        0.990
   71     95       1    0.963 0.013182       0.9375        0.989
   72     86       1    0.952 0.017137       0.9188        0.986
   76     60       1    0.936 0.023052       0.8918        0.982
   78     50       1    0.917 0.029219       0.8617        0.976
   81     28       1    0.884 0.042762       0.8045        0.972
   83     22       1    0.844 0.056647       0.7402        0.963
   85     15       2    0.732 0.088889       0.5767        0.928
   86      3       1    0.488 0.207770       0.2117        1.000
   87      2       1    0.244 0.201332       0.0484        1.000
   88      1       1    0.000      NaN           NA           NA

                cit=yes 
 time n.risk n.event survival  std.err lower 95% CI upper 95% CI
   21  29274       2    1.000 4.83e-05        1.000        1.000
   22  28874       1    1.000 5.94e-05        1.000        1.000
   24  28141       2    1.000 7.78e-05        1.000        1.000
   25  27728       2    1.000 9.30e-05        1.000        1.000
   26  27284       1    1.000 1.00e-04        1.000        1.000
   27  26855       3    1.000 1.19e-04        0.999        1.000
   28  26398       1    1.000 1.25e-04        0.999        1.000
   30  25501       2    0.999 1.37e-04        0.999        1.000
   32  24608       1    0.999 1.42e-04        0.999        1.000
   33  24193       1    0.999 1.48e-04        0.999        1.000
   34  23720       1    0.999 1.54e-04        0.999        1.000
   36  22781       2    0.999 1.66e-04        0.999        1.000
   37  22330       2    0.999 1.78e-04        0.999        1.000
   38  21918       3    0.999 1.95e-04        0.999        0.999
   39  21495       2    0.999 2.05e-04        0.999        0.999
   40  21089       4    0.999 2.26e-04        0.998        0.999
   41  20702       7    0.998 2.60e-04        0.998        0.999
   42  20317       2    0.998 2.69e-04        0.998        0.999
   44  19518       2    0.998 2.78e-04        0.998        0.999
   45  19095       2    0.998 2.88e-04        0.998        0.999
   46  18641      11    0.998 3.38e-04        0.997        0.998
   47  18151       2    0.997 3.47e-04        0.997        0.998
   48  17704       4    0.997 3.65e-04        0.996        0.998
   49  17302       4    0.997 3.82e-04        0.996        0.998
   50  16856       8    0.996 4.17e-04        0.996        0.997
   51  16390      11    0.996 4.63e-04        0.995        0.997
   52  15904       8    0.995 4.96e-04        0.994        0.996
   53  15370      13    0.994 5.47e-04        0.993        0.996
   54  14785       8    0.994 5.79e-04        0.993        0.995
   55  14254      12    0.993 6.27e-04        0.992        0.994
   56  13744      12    0.992 6.75e-04        0.991        0.994
   57  13184      17    0.991 7.42e-04        0.990        0.992
   58  12596       9    0.990 7.78e-04        0.989        0.992
   59  12057      12    0.989 8.27e-04        0.988        0.991
   60  11486      21    0.987 9.15e-04        0.986        0.989
   61  10903      21    0.986 1.00e-03        0.984        0.988
   62  10349      12    0.984 1.05e-03        0.982        0.986
   63   9783      19    0.983 1.14e-03        0.980        0.985
   64   9192      25    0.980 1.26e-03        0.977        0.982
   65   8648      27    0.977 1.38e-03        0.974        0.979
   66   8055      23    0.974 1.50e-03        0.971        0.977
   67   7534      33    0.970 1.66e-03        0.966        0.973
   68   6952      40    0.964 1.87e-03        0.960        0.968
   69   6330      24    0.960 2.01e-03        0.957        0.964
   70   5773      34    0.955 2.22e-03        0.950        0.959
   71   5313      39    0.948 2.47e-03        0.943        0.953
   72   4911      42    0.940 2.75e-03        0.934        0.945
   73   4503      35    0.932 2.99e-03        0.927        0.938
   74   4072      41    0.923 3.30e-03        0.917        0.930
   75   3731      36    0.914 3.59e-03        0.907        0.921
   76   3381      30    0.906 3.85e-03        0.898        0.914
   77   3070      37    0.895 4.20e-03        0.887        0.903
   78   2787      30    0.885 4.51e-03        0.877        0.894
   79   2506      40    0.871 4.96e-03        0.862        0.881
   80   2238      33    0.858 5.37e-03        0.848        0.869
   81   2009      38    0.842 5.88e-03        0.831        0.854
   82   1776      32    0.827 6.35e-03        0.815        0.840
   83   1567      38    0.807 6.98e-03        0.793        0.821
   84   1367      33    0.788 7.59e-03        0.773        0.803
   85   1186      82    0.733 9.14e-03        0.715        0.751
   86    394     134    0.484 1.85e-02        0.449        0.521
   87    260     134    0.234 1.75e-02        0.203        0.271
   88    126     126    0.000      NaN           NA           NA

A statistical test for differences between groups is shown below. The results suggest that there is no statistical difference in the survival rates between citizens and non-citizens.

survdiff(Surv(death_age,d.event)~cit,data=loki_16)
Call:
survdiff(formula = Surv(death_age, d.event) ~ cit, data = loki_16)

n=32451, 36 observations deleted due to missingness.

            N Observed Expected (O-E)^2/E (O-E)^2/V
cit=no   1935       23     23.3  4.31e-03   0.00484
cit=yes 30516     1433   1432.7  7.02e-05   0.00484

 Chisq= 0  on 1 degrees of freedom, p= 0.9 

5e. Plot the survival function for the analysis for each level of the group variable

fit%>% ggsurvfit() + labs(title = "Mortality survival plot", subtitle = "By Citizenship Status", x = "Years", y = "Survival Probability")