Introduction and Data

According to the source, a large HMO wished to evaluate the survival time of its HIV+ members using a follow-up study. Subjects were enrolled in the study from January 1, 1989 to December 31, 1991. The study ended on December 31, 1995. After a confirmed diagnosis of HIV, members were followed until death due to AIDS-related complications, until the end of the study, or until the subject was lost to follow-up. We assumed that there were no deaths due to other causes. The primary outcome variable of interest is survival time after a confirmed diagnosis of HIV. Since subjects entered the study at different times over a 3-year period, the maximum possible follow-up time differs for each study participant. Possible predictors of survival time were collected at enrollment into the study.

The variables are as follows:

TIME = the follow-up time, i.e. the number of months between the entry date and the end date

AGE = the age of the subject at the start of follow-up (in years)

DRUG = history of prior IV drug use (1 = Yes, 0 = No)

CENSOR = vital status at the end of the study (1 = death due to AIDS, 0 = lost to follow-up or alive)
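As a minimal sketch of how TIME can be derived, the snippet below uses the start and end dates of the first three subjects printed by head(hmohiv). The variable names (`start_date`, `end_date`, `study_end`) are illustrative, and the 30.5 days-per-month conversion is the same approximation the analysis uses later. It also shows why the maximum possible follow-up differs by subject: it is the gap between each entry date and the fixed study end date.

```r
# Start and end dates of subjects 1-3 (two-digit years, hence %y)
start_date <- as.Date(c("5/15/90", "9/19/89", "4/21/91"), format = "%m/%d/%y")
end_date   <- as.Date(c("10/14/90", "3/20/90", "12/20/91"), format = "%m/%d/%y")
study_end  <- as.Date("1995-12-31")  # study closing date given in the text

# Observed follow-up in days, then in approximate months
days_followed <- as.numeric(difftime(end_date, start_date, units = "days"))
time_months   <- days_followed / 30.5

# Longest follow-up each subject could have had, given the entry date
max_followup_months <-
  as.numeric(difftime(study_end, start_date, units = "days")) / 30.5

data.frame(start_date, end_date, days_followed,
           time_months = round(time_months, 1),
           max_followup_months = round(max_followup_months, 1))
```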


columname <- c("ID", "StartDate","EndDate","Age","Drug","Censor")
type_col <- c("integer", rep("factor",2),rep("integer",3)) 
hmohiv <- read.csv("/Users/azizur/Desktop/stat 755/others work/hmohiv .csv")
head(hmohiv)
##   ID        StartDate           EndDate Age Drug Censor
## 1  1 5/15/90 12:00 AM 10/14/90 12:00 AM  46    0      1
## 2  2 9/19/89 12:00 AM  3/20/90 12:00 AM  35    1      0
## 3  3 4/21/91 12:00 AM 12/20/91 12:00 AM  30    1      1
## 4  4  1/3/91 12:00 AM   4/4/91 12:00 AM  30    1      1
## 5  5 9/18/89 12:00 AM  7/19/91 12:00 AM  36    0      1
## 6  6 3/18/91 12:00 AM  4/17/91 12:00 AM  32    1      0

Nonparametric estimation of the survival function

a) Kaplan-Meier Survival Curve with 95% Confidence Interval

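library(survival)
# Dates look like "5/15/90 12:00 AM": two-digit years need %y, not %Y.
# as.Date() ignores the trailing time of day.
Start.d <- as.Date(hmohiv$StartDate, format = "%m/%d/%y")
End.d <- as.Date(hmohiv$EndDate, format = "%m/%d/%y")
# Follow-up time in days, stored as a plain numeric vector
Survival_days <- as.numeric(difftime(End.d, Start.d, units = "days"))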

hmohiv.km <- survfit(Surv(Survival_days, hmohiv$Censor) ~ 1, conf.type = "log-log")

plot(hmohiv.km, xlab = "time (days)", ylab = "survival probability",
     main = "KME of survival probability for HMO-HIV+ data",
     conf.int = TRUE)
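To illustrate what the Kaplan-Meier curve computes, here is a minimal base-R sketch of the product-limit estimator on a small hypothetical data set (the five times and statuses are made up for illustration and are not taken from the HMO-HIV+ data). At each distinct death time the estimate is multiplied by the fraction of at-risk subjects who survive that time; censored subjects leave the risk set without contributing a death.

```r
# Hypothetical toy data: follow-up time and status (1 = death, 0 = censored)
time   <- c(2, 3, 3, 5, 8)
status <- c(1, 1, 0, 1, 0)

# Distinct times at which at least one death occurred
event_times <- sort(unique(time[status == 1]))

# Number still under observation just before each death time
n_at_risk <- sapply(event_times, function(t) sum(time >= t))

# Number of deaths at each death time
n_events <- sapply(event_times, function(t) sum(time == t & status == 1))

# Product-limit estimate: cumulative product of conditional survival fractions
km <- cumprod(1 - n_events / n_at_risk)

data.frame(time = event_times, n_at_risk, n_events, survival = km)
```

On these toy data the estimate steps down to 0.8, 0.6 and 0.3 at times 2, 3 and 5, which should match the `surv` values that survfit() returns for the same input.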

b) The median survival time and its 95% confidence interval

hmohiv.km
## Call: survfit(formula = Surv(Survival_days, hmohiv$Censor) ~ 1, conf.type = "log-log")
## 
##       n  events  median 0.95LCL 0.95UCL 
##     100      80     212     152     273

We can see that the median survival time is 212 days, with a 95% confidence interval from 152 to 273 days.
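The reported median is the earliest time at which the Kaplan-Meier estimate falls to 0.5 or below. As a hedged sketch, the same quantity can be pulled out of a fitted object with the quantile() method for survfit objects from the survival package; the toy data below are hypothetical and only serve to keep the example self-contained.

```r
library(survival)

# Hypothetical toy data: follow-up time and status (1 = death, 0 = censored)
time   <- c(2, 3, 3, 5, 8)
status <- c(1, 1, 0, 1, 0)

fit <- survfit(Surv(time, status) ~ 1, conf.type = "log-log")

# Median survival time plus lower and upper confidence limits.
# With so few observations the limits may be NA.
med <- quantile(fit, probs = 0.5, conf.int = TRUE)
med$quantile
med$lower
med$upper
```

Applied to `hmohiv.km`, the same call should reproduce the median and limits shown in the printed output above.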


Comparison of two survival distributions:

A patient’s history of IV drug use may be a risk factor for survival. Compare the estimated survival curves for the two groups: users (DRUG=1) and non-users (DRUG=0).

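# Convert days to months (30.5 days per month is an approximation)
Survival_month <- as.numeric(Survival_days) / 30.5
drug <- factor(hmohiv$Drug)
levels(drug) <- list(No_Drug = as.character(0), Drug = as.character(1))
library(tibble)
hmohiv1 <- add_column(hmohiv, drug, .after = 6)

# Include the censoring indicator; without it every subject is treated as a death
plot(survfit(Surv(Survival_month, hmohiv$Censor) ~ drug), xlab = "Time in months",
     ylab = "Survival probability", col = c("blue", "red"), lwd = 2)
# Legend order follows the factor levels: No_Drug (blue), then Drug (red)
legend("topright", legend = c("No Drug", "Drug"),
       col = c("blue", "red"), lwd = 2)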

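Yes. The estimated curves indicate that non-users of IV drugs have a different survival experience from users, although statistical significance should be confirmed with a formal test such as the log-rank test rather than read off the plot.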
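A formal comparison of two survival curves is usually done with the log-rank test, available as survdiff() in the survival package. The sketch below runs it on a small hypothetical data set (the `toy` data frame and its values are invented for illustration); it is not the result for the HMO-HIV+ data.

```r
library(survival)

# Hypothetical toy data: months of follow-up, death indicator, IV drug group
toy <- data.frame(
  months = c(1, 2, 3, 4, 5, 6, 6, 9, 12, 15, 20, 24),
  death  = c(1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1),
  drug   = factor(rep(c("Drug", "No_Drug"), each = 6))
)

# Log-rank test of equal survival in the two groups
lr <- survdiff(Surv(months, death) ~ drug, data = toy)

# p-value from the chi-square statistic (degrees of freedom = groups - 1)
p_value <- 1 - pchisq(lr$chisq, df = length(lr$n) - 1)

lr
p_value
```

For the analysis in this section, the corresponding call would be survdiff(Surv(Survival_month, hmohiv$Censor) ~ drug); a small p-value would support the conclusion that the two groups differ.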


References