Problem 1

Consider the survival data given in the following table:

Year of follow-up Number alive at the beginning of interval Number dying in the interval
0-1 1100 240
1-2 860 180
2-3 680 185
3-4 495 137
4-5 358 118
5-6 240 60
6-7 180 52
7-8 128 44
8-9 84 32
more than 9 52 28

i.) Compute the estimated survival function, the probability density function and the hazard function.

ii.) Also compute acturial estimate of hazard rate and compare with previous one.

iii.) Estimate median survival time from the graph.

Solution:

Given data:

b_y = 0:9
e_y = c(1:9,Inf)
n_b = c(1100,860,680,495,358,240,180,128,84,52)
n_d = c(240,180,185,137,118,60,52,44,32,28)
df = data.frame(year_beg = b_y, year_e = e_y, alive_beg = n_b, die_int = n_d )
df
##    year_beg year_e alive_beg die_int
## 1         0      1      1100     240
## 2         1      2       860     180
## 3         2      3       680     185
## 4         3      4       495     137
## 5         4      5       358     118
## 6         5      6       240      60
## 7         6      7       180      52
## 8         7      8       128      44
## 9         8      9        84      32
## 10        9    Inf        52      28
df$Surv = df$alive_beg/1100
df$Pdf = c(-diff(df$Surv),NA)
df$Haz =  df$Pdf/df$Surv
df$Act_Haz = df$die_int/(df$alive_beg - df$die_int/2)

Survival Curve, PDF CURV

plot(df$year_beg,df$Surv,main = "Survival Curve", xlab = "Survival Times(t)",ylab = "Survival Function S(t)")
segments(df$year_beg,df$Surv,df$year_e,df$Surv)
segments(df$year_e,df$Surv,df$year_beg[-1],df$Surv[-1], lty = 4)
abline(h = 0.5, lty = 2, col = "red")

Problem 2

Suppose in a research design one particular treatment was administered to the patients and enty or exit of patients at any time throughout the course of the study was permissible. The experimenter stopped the study after a period of 500 days. Their survival times (in days) and event status of patients (i.e. censored/ uncensored) were recorded in the following table.

Patients Suvival Times (in days) Event status of patients
1 50 event occured
2 100 event occured
3 150 event occured
4 200 lost to follow-up
5 250 event occured
6 300 lost to follow-up
7 350 event occured
8 400 event occured
9 450 event not occured at the end of the study
10 500 event not occured at the end of the study
  1. Usind Kaplan-Meier method, estimate the survival function for the given data.

  2. Plot the survival function based on the estimates obtained in (i)

  3. Assuming the survival time follows exponential distribution with survival rate $\lambda$, estimate the mean survival time.

Solution:

data = data.frame(patients = 1:10)
data$surv_times = seq(50,500,50)
data$delta = rep(c(1,0,1,0,1,0),c(3,1,1,1,2,2))
m = 1
for(i in 1:10){
  if(data$delta[i] == 1){
    data$r[i] = data$patients[i]
    data$p[i] = round((10-data$r[i])/(10-data$r[i] + 1),4)
    m = data$p[i] * m 
    #print(c(i,data$p[i],m))
    data$St[i] = m
  } else{
    data$r[i] = NA
    data$p[i] = NA
    data$St[i] = NA
  }
}
data
##    patients surv_times delta  r      p        St
## 1         1         50     1  1 0.9000 0.9000000
## 2         2        100     1  2 0.8889 0.8000100
## 3         3        150     1  3 0.8750 0.7000087
## 4         4        200     0 NA     NA        NA
## 5         5        250     1  5 0.8333 0.5833173
## 6         6        300     0 NA     NA        NA
## 7         7        350     1  7 0.7500 0.4374880
## 8         8        400     1  8 0.6667 0.2916732
## 9         9        450     0 NA     NA        NA
## 10       10        500     0 NA     NA        NA

Plot Survival Curve

val = data[!is.na(data$St),c(2,6)]
val = rbind(c(0,1),val)
val = cbind(val,end = c(val$surv_times[-1],NA))
plot(val$surv_times,val$St, main = "Survival Curve",xlab = "Time(t)",ylab="Survival Function S(t)")
segments(val$surv_times,val$St,val$end,val$St)
segments(val$end,val$St,val$surv_times[-1],val$St[-1],lty=3)

When we assume that the given survival times(T) follows an exponential distribution then the corresponding likelihood is given by:

\[ L \propto \prod_{i=1}^{10} \left( f(t_i) \right)^{\delta_i} \left( S(t_i) \right)^{1-\delta_i} \\ = \prod_{i=1}^{10} \left( \lambda*e^{-\lambda t_i} \right)^{\delta_i} \left( e^{-\lambda t_i} \right)^{1-\delta_i} \]

The log-likelihood can then be written as:

\[ log(L) \propto \sum_{exact} log(\lambda) - \lambda \sum t_i \]

The mle can thus be calculated as:

\[ \lambda_{mle} = \frac{r}{\sum t_i} \]

r = sum(data$delta)
sum = sum(data$surv_times)
mle = r/sum
mean = 1/mle
mle
## [1] 0.002181818
mean
## [1] 458.3333