Data: IPUMS NHIS
# libraries
library(haven)
library(survival)
library(car)
Loading required package: carData
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6 ✔ purrr 0.3.4
✔ tibble 3.1.8 ✔ dplyr 1.0.10
✔ tidyr 1.2.1 ✔ stringr 1.4.1
✔ readr 2.1.2 ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
✖ dplyr::recode() masks car::recode()
✖ purrr::some() masks car::some()
library(ipumsr)
ddi <- read_ipums_ddi("nhis_00004.xml")
data_nhis <- read_ipums_micro(ddi)
Use of data from IPUMS NHIS is subject to conditions including that users
should cite the data appropriately. Use command `ipums_conditions()` for more
details.
data_nhis <- haven::zap_labels(data_nhis)
nams <- names(data_nhis)
head(nams, n = 20)
[1] "YEAR" "SERIAL" "STRATA" "PSU" "NHISHID"
[6] "HHWEIGHT" "PERNUM" "NHISPID" "HHX" "FMX"
[11] "PX" "PERWEIGHT" "SAMPWEIGHT" "FWEIGHT" "ASTATFLG"
[16] "CSTATFLG" "OWNERSHIP" "MORTELIG" "MORTSTAT" "MORTDODY"
newnames <- tolower(gsub(pattern = "_", replacement = "", x = nams))
names(data_nhis) <- newnames
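Recode
# Keep records eligible for mortality follow-up (mortelig == 1) with mortdody below 9999
data_nhis <- data_nhis %>%
  filter(mortelig == 1, mortdody < 9999)
library(dplyr)
# Home ownership: codes 10 to 12 = own, 20 = rent, anything else = missing
data_nhis$home <- Recode(data_nhis$ownership, recodes = "10:12='own'; 20='rent'; else=NA", as.factor = TRUE)
summary(data_nhis$home)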
own rent NA's
8506 2388 363
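The recode rule can also be sketched in base R, which may help when checking what the recode string does. The vector `ownership_codes` below is made-up illustration data, not NHIS values; only the mapping (10 to 12 = own, 20 = rent, everything else missing) is taken from the call above.

```r
# Hypothetical ownership codes, for illustration only
ownership_codes <- c(10, 11, 12, 20, 20, 30, NA)

# 10:12 -> "own", 20 -> "rent", everything else (including NA) -> NA
home <- ifelse(ownership_codes %in% 10:12, "own",
               ifelse(ownership_codes %in% 20, "rent", NA_character_))
home <- factor(home, levels = c("own", "rent"))

# Counts per level, with missing values reported as NA's
summary(home)
```

Using `%in%` rather than `==` keeps missing input codes from propagating into the test, so they fall through to the final `NA` branch.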
Comparing Survival Times Between Groups
Define a duration or time variable
Duration/Time Variable
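Time variable: death_time, the number of years from the survey year to the year of death (mortdody - year); a record without mortality eligibility would instead be followed to 2002 (2002 - year).
Event: death within 5 years of the survey year.
Define a censoring indicator:
d5.event, coded 1 if the respondent is mortality-eligible and dies within 5 years of the survey year (death_time <= 5), and 0 (censored) otherwise.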
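As a minimal sketch of how a duration and a censoring indicator of this kind are built, the example below uses four made-up records. The data frame `toy` and its values are hypothetical; the column names `year`, `mortdody`, `death_time`, and `d5.event` follow the ones used in this report.

```r
# Hypothetical records: survey year and year of death (illustration only)
toy <- data.frame(
  year     = c(1997, 1997, 1998, 1999),
  mortdody = c(1999, 2006, 1998, 2004)
)

# Duration: years from the survey year to the year of death
toy$death_time <- toy$mortdody - toy$year

# Event indicator: 1 = died within 5 years, 0 = censored for the 5-year analysis
toy$d5.event <- ifelse(toy$death_time <= 5, 1, 0)

toy
```

The second record dies 9 years after the survey, so it contributes to the risk set through year 5 but counts as censored rather than as an event.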
Estimate the survival function for your outcome and plot it
Survival Function
data_nhis <- data_nhis %>%
  mutate(death_time = ifelse(mortelig == 1,
                             mortdody - year,
                             2002 - year),
         d5.event = ifelse(mortelig == 1 & death_time <= 5, 1, 0))
library(survival)
time_fit <- survfit(Surv(death_time, d5.event) ~ 1,
                    conf.type = "log",
                    data = data_nhis)
library(ggsurvfit)
time_fit %>%
  ggsurvfit() +
  xlim(-1, 6) +
  add_censor_mark() +
  add_confidence_interval() +
  add_quantile(y_value = .975)
Warning: Removed 11 row(s) containing missing values (geom_path).
Carry out the following analysis:
Kaplan-Meier Survival Analysis Outcome
Call: survfit(formula = Surv(death_time, d5.event) ~ 1, data = data_nhis,
conf.type = "log")
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    0  11257     240    0.979 0.00136        0.976        0.981
    1  11017     519    0.933 0.00236        0.928        0.937
    2  10498     553    0.883 0.00302        0.878        0.889
    3   9945     624    0.828 0.00356        0.821        0.835
    4   9321     578    0.777 0.00393        0.769        0.784
    5   8743     602    0.723 0.00422        0.715        0.732
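The survival column in this table can be checked by hand. The sketch below applies the Kaplan-Meier product-limit formula and Greenwood's standard error to the n.risk and n.event counts copied from the table; the object names (`n_risk`, `n_event`, `km_surv`, `km_se`) are chosen here for illustration.

```r
# Counts copied from the Kaplan-Meier table above
n_risk  <- c(11257, 11017, 10498, 9945, 9321, 8743)
n_event <- c(240, 519, 553, 624, 578, 602)

# Kaplan-Meier estimate: cumulative product of the conditional
# probabilities of surviving each year, 1 - d_i / n_i
km_surv <- cumprod(1 - n_event / n_risk)

# Greenwood's formula for the standard error of the survival estimate
km_se <- km_surv * sqrt(cumsum(n_event / (n_risk * (n_risk - n_event))))

data.frame(time = 0:5,
           survival = round(km_surv, 3),
           std.err  = signif(km_se, 3))
```

Rounded to three digits, the result agrees with the survival and std.err columns printed by survfit.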
Define a grouping variable (dichotomous/categorical)
The grouping variable is Housing/Home ownership (own or rent). The variable is categorical.
Research hypothesis about the survival patterns for the levels of the variable
Hypothesis: Individuals in rental housing will have a lower survival probability than individuals who own a home.
Comparison of Kaplan-Meier survival across grouping variables, interpret results.
The x-axis is time in years since the survey year; the y-axis is the survival probability.
In the overall sample, the estimated survival probability is 0.933 at one year and drops to 0.828 at three years.
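Survival probabilities at chosen time points can be read from a fitted object with the `times` argument of `summary()`. The sketch below rebuilds a stand-in data set from the counts in the overall Kaplan-Meier table, since the NHIS extract itself is not included here. One assumption is needed: respondents still at risk after year 5 are placed at time 6 as censored, because their actual times are not shown in this report.

```r
library(survival)

# Rebuild one record per respondent from the table counts.
# Assumption: everyone still at risk after year 5 is censored at time 6.
deaths   <- c(240, 519, 553, 624, 578, 602)   # events at times 0..5
censored <- 11257 - sum(deaths)               # remaining respondents
rebuilt <- data.frame(
  death_time = c(rep(0:5, times = deaths), rep(6, censored)),
  d5.event   = c(rep(1, sum(deaths)), rep(0, censored))
)

fit <- survfit(Surv(death_time, d5.event) ~ 1,
               conf.type = "log",
               data = rebuilt)

# Estimated survival probability at 1 and 3 years
at_times <- summary(fit, times = c(1, 3))
round(at_times$surv, 3)
```

The same `summary(fit, times = ...)` call works on a fit with a grouping variable, in which case the result carries a `strata` component identifying each group.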
Plot the survival function for the analysis for each level of the group variable.
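# Log-rank test of survival by home ownership.
# Note: the time argument here is `year`, whereas the survival fits in this report use `death_time`.
survdiff(Surv(year, d5.event) ~ home, data = data_nhis)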
Call:
survdiff(formula = Surv(year, d5.event) ~ home, data = data_nhis)
n=10894, 363 observations deleted due to missingness.
             N Observed Expected (O-E)^2/E (O-E)^2/V
home=own  8506     2321     2339     0.143     0.898
home=rent 2388      675      657     0.508     0.898

 Chisq= 0.9  on 1 degrees of freedom, p= 0.3
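The p-value printed by survdiff is the upper tail of a chi-square distribution with (number of groups - 1) degrees of freedom. A short check, using the (O-E)^2/V value from the output above (the names `chisq`, `df`, and `p_value` are illustrative):

```r
# Chi-square statistic and degrees of freedom as printed by survdiff above
chisq <- 0.898
df    <- 1

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)
round(p_value, 1)
```

Rounded to one decimal place this gives the 0.3 shown in the output, well above the conventional 0.05 threshold.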
library(ggsurvfit)
newfit <- survfit(Surv(death_time, d5.event) ~ home,
                  conf.type = "log",
                  data = data_nhis)
summary(newfit)
Call: survfit(formula = Surv(death_time, d5.event) ~ home, data = data_nhis,
conf.type = "log")
363 observations deleted due to missingness
                home=own
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    0   8506     192    0.977 0.00161        0.974        0.981
    1   8314     390    0.932 0.00274        0.926        0.937
    2   7924     398    0.885 0.00346        0.878        0.892
    3   7526     460    0.831 0.00407        0.823        0.839
    4   7066     426    0.781 0.00449        0.772        0.789
    5   6640     455    0.727 0.00483        0.718        0.737

                home=rent
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    0   2388      39    0.984 0.00259        0.979        0.989
    1   2349     114    0.936 0.00501        0.926        0.946
    2   2235     136    0.879 0.00667        0.866        0.892
    3   2099     141    0.820 0.00786        0.805        0.835
    4   1958     121    0.769 0.00862        0.753        0.786
    5   1837     124    0.717 0.00921        0.700        0.736
newfit %>%
  ggsurvfit() +
  xlim(-1, 6) +
  add_censor_mark() +
  add_confidence_interval() +
  add_quantile(y_value = .975)
Warning: Removed 22 row(s) containing missing values (geom_path).
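Results indicate only a small difference in survival probability between individuals who own a home and those who rent. From year 2 onward, owners have a slightly higher estimated survival probability: at five years it is 0.727 (95% CI 0.718 to 0.737) for owners and 0.717 (95% CI 0.700 to 0.736) for renters. The confidence intervals overlap at every time point, and the log-rank test reported above (chi-square = 0.9 on 1 degree of freedom, p = 0.3) does not indicate a statistically significant difference, so the hypothesis that renters have lower survival is not clearly supported.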
Source:
Lynn A. Blewett, Julia A. Rivera Drew, Miriam L. King, Kari C.W. Williams, Natalie Del Ponte and Pat Convey. IPUMS Health Surveys: National Health Interview Survey, Version 7.1 [dataset]. Minneapolis, MN: IPUMS, 2021.
https://doi.org/10.18128/D070.V7.1