library(survival)
## Loading required package: splines
Burns=read.csv("Burns.csv")
To find the columns of the actuarial table needed to estimate the time-to-infection survival curve, as well as plot the “straight line” and “step function” estimates:
LifeTable=function(time,status,breaks,finite) {
failed=c(0,hist(time[status==1],breaks=breaks,plot=F)$counts)
censored=c(0,hist(time[status==0],breaks=breaks,plot=F)$counts)
alivestart=length(time)-cumsum(failed[-length(failed)])-cumsum(censored[-length(censored)])
atrisk=alivestart-c(censored[-1]/2)
failrate=failed[-1]/atrisk
survrate=1-failrate
survest=c(1,cumprod(1-failrate)[1:(length(failrate))])
if (finite==0) return(list(Failed=failed[-1],Censored=censored[-1],AliveStart=alivestart,
AtRisk=atrisk[-length(atrisk)],FailRate=failrate[-length(failrate)],
SurvRate=survrate[-length(survrate)],SurvEst=survest[-length(survest)]))
if (finite==1) return(list(Failed=failed[-1],Censored=censored[-1],AliveStart=alivestart,
AtRisk=atrisk,FailRate=failrate,SurvRate=survrate,SurvEst=survest)) }
l=LifeTable(Burns$Time, Burns$Status, c(0, 10, 20, 30, 40, 50, 60,70, 80, max(Burns$Time)), 0)
step = stepfun( c(10,20,30,40,50,60,70,80) , l$SurvEst )
plot(step , do.points=F , ylab="Survival" , xlab="Days" , main="" , ylim=c(0.4,1))
lines( c(0,10,20,30,40,50,60,70,80) , l$SurvEst , col='red' )
To add the Kaplan-Meier curve:
KM = survfit( Surv(Time,Status)~1 , data=Burns )
plot(step , do.points=F , ylab="Survival" , xlab="Days" , main="" , ylim=c(0.4,1))
lines( c(0,10,20,30,40,50,60,70,80) , l$SurvEst , col='red' )
lines( KM , mark.time=F , col='blue' , conf.int=F )
The actuarial method, using straight lines, adequately estimates the Kaplan-Meier.
We MUST use actuarial data when the data is aggregated, i.e. we don’t know the exact time of failure. The actuarial method works when we know an interval of time of failure, whereas the Kaplan-Meier requires the exact time of both failure and censoring.
The actuarial method assumes that half of the censored data is included in the “At Risk” group.
The actuarial estimate drops six times in the plot from (1). The estimate only drops six times because three of the \(Failed\) values are 0, i.e. there are three intervals where there are zero fails.
To estimate the hazard function for time-to-infection:
SurvDrop = l$SurvEst[1:8] - l$SurvEst[2:9]
HazardEst = ( SurvDrop/10 ) / l$SurvEst[1:8]
hazstep = stepfun( c(0,10,20,30,40,50,60,70,80) , c(0,HazardEst,0) )
plot(hazstep,do.points=F,ylab='h(k)',xlab='k',main="")
The risk of infection appears to be bimodal: risk is highest between 10 and 20 days, and again between 40 and 50 days. After 60 days, the risk of infection drops to almost 0. In general, the risk is higher at first, then decreases, then increases again, then decreases to essentialy zero.
Risk in an Exponential model must always be flat. In a Normal model, it must always increase. Risk for a Weibull model is more flexible, but cannot increase and then decrease. The data in the plot does not display any of these trends, so a parametric model is probably not our best bet.
To fit and plot the Weibull model:
survreg( Surv( Time , Status ) ~ 1 , dist='weibull' , data=Burns)
## Call:
## survreg(formula = Surv(Time, Status) ~ 1, data = Burns, dist = "weibull")
##
## Coefficients:
## (Intercept)
## 4.421
##
## Scale= 1.185
##
## Loglik(model)= -250.9 Loglik(intercept only)= -250.9
## n= 154
plot(hazstep,do.points=F,ylab='h(k)',xlab='k',main="")
curve(dweibull(x,shape=1/1.185073,scale=exp(4.421479))/(1-pweibull(x,shape=1/1.185073, scale=exp(4.421479))),add=T,col='red')
As expected, the Weibull model is not a good fit for the data.
To draw a piecewise constant estimate of the density:
DensityEst = ( SurvDrop / 10 )
denstep = stepfun( c(0,10,20,30,40,50,60,70,80) , c(0,DensityEst,0) )
plot( denstep , do.points=F , ylab='f(k)' , xlab='k' , main="")
This plot is also bimodal, with the highest risk of infection falling in the same regions: 10-20 and 40-50. However, the density of both regions is close to the same in the hazard plot, whereas the density of the first region is almost twice as much the density of the second region. This is due to the number of people that remain in the study as it goes on. The hazard function examines how many people die out of the people who make it to that point, but the density function examines how many people die out of all the people in the survey. This means that as time goes on, the denominator for the density function will increase while the denominator for the hazard function remains the same, causing a drop in density estimates and an inflation in hazard estimates.
KM = survfit( Surv(Time,Status)~1 , data=Burns )
summary(KM)
## Call: survfit(formula = Surv(Time, Status) ~ 1, data = Burns)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1 154 1 0.994 0.00647 0.981 1.000
## 2 153 3 0.974 0.01282 0.949 0.999
## 3 150 4 0.948 0.01788 0.914 0.984
## 4 146 5 0.916 0.02240 0.873 0.961
## 5 141 6 0.877 0.02650 0.826 0.930
## 6 134 2 0.864 0.02767 0.811 0.920
## 7 132 3 0.844 0.02927 0.788 0.903
## 8 126 2 0.831 0.03030 0.773 0.892
## 9 121 2 0.817 0.03132 0.758 0.881
## 10 117 2 0.803 0.03230 0.742 0.869
## 11 112 3 0.781 0.03374 0.718 0.850
## 13 102 1 0.774 0.03426 0.709 0.844
## 14 96 1 0.766 0.03484 0.700 0.837
## 16 88 1 0.757 0.03552 0.690 0.830
## 17 82 2 0.738 0.03697 0.669 0.815
## 18 76 2 0.719 0.03847 0.647 0.799
## 19 70 1 0.709 0.03927 0.636 0.790
## 21 65 1 0.698 0.04015 0.623 0.781
## 23 55 1 0.685 0.04137 0.609 0.771
## 32 35 1 0.666 0.04458 0.584 0.759
## 42 19 1 0.631 0.05428 0.533 0.746
## 44 15 1 0.589 0.06493 0.474 0.731
## 47 12 1 0.539 0.07581 0.410 0.711
## 51 9 1 0.480 0.08795 0.335 0.687
To find the interval estimate, look at both intervals and see when each falls below .8. Our interval estimate of the 20th percentile is (7, 18).
To graph Kaplan-Meier curves stratified by treatment:
KMTrt = survfit( Surv( Time , Status )~Treatment , Burns)
plot( KMTrt , mark.time=F , conf.int=T , lty=1:2, col=1:2)
summary(KMTrt)
## Call: survfit(formula = Surv(Time, Status) ~ Treatment, data = Burns)
##
## Treatment=0
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1 70 1 0.986 0.0142 0.958 1.000
## 3 69 3 0.943 0.0277 0.890 0.999
## 4 66 4 0.886 0.0380 0.814 0.963
## 5 62 1 0.871 0.0400 0.796 0.953
## 6 60 2 0.842 0.0436 0.761 0.932
## 7 58 3 0.799 0.0481 0.710 0.899
## 8 53 1 0.784 0.0495 0.693 0.887
## 9 50 2 0.752 0.0522 0.657 0.862
## 10 47 1 0.736 0.0535 0.639 0.849
## 11 44 2 0.703 0.0561 0.601 0.822
## 13 40 1 0.685 0.0574 0.582 0.808
## 19 33 1 0.665 0.0593 0.558 0.791
## 21 30 1 0.642 0.0613 0.533 0.774
## 23 27 1 0.619 0.0635 0.506 0.756
## 32 18 1 0.584 0.0686 0.464 0.735
## 44 8 1 0.511 0.0910 0.361 0.725
## 47 6 1 0.426 0.1086 0.258 0.702
## 51 4 1 0.320 0.1231 0.150 0.680
##
## Treatment=1
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 2 84 3 0.964 0.0202 0.925 1.000
## 3 81 1 0.952 0.0232 0.908 0.999
## 4 80 1 0.940 0.0258 0.891 0.992
## 5 79 5 0.881 0.0353 0.814 0.953
## 8 73 1 0.869 0.0369 0.800 0.944
## 10 70 1 0.856 0.0384 0.784 0.935
## 11 68 1 0.844 0.0398 0.769 0.926
## 14 57 1 0.829 0.0418 0.751 0.915
## 16 50 1 0.812 0.0441 0.730 0.904
## 17 47 2 0.778 0.0485 0.688 0.879
## 18 41 2 0.740 0.0531 0.643 0.852
## 42 9 1 0.658 0.0907 0.502 0.862
Our estimate for the routine bathing group is [0.464, 0.735], and our estimate for the group bathing with the new chlorhexidine solution is [0.643, 0.852]. There is significant overlap between these two intervals, so there does not appear to be a difference in the 30-day survival rate between the two treatments.
If there is difference between the treatment groups at all time points, the Kaplan-Meier curves should run in a generally parallel direction to each other. Overlap between the two groups at any time point would suggest similarity.