library(survival)
## Loading required package: splines
cell=read.csv("Cell.csv")
LifeTable=function(time,status,breaks,finite) {
failed=c(0,hist(time[status==1],breaks=breaks,plot=F)$counts)
censored=c(0,hist(time[status==0],breaks=breaks,plot=F)$counts)
alivestart=length(time)-cumsum(failed[-length(failed)])-cumsum(censored[-length(censored)])
atrisk=alivestart-c(censored[-1]/2)
failrate=failed[-1]/atrisk
survrate=1-failrate
survest=c(1,cumprod(1-failrate)[1:(length(failrate))])
if (finite==0) return(list(Failed=failed[-1],Censored=censored[-1],AliveStart=alivestart,
AtRisk=atrisk[-length(atrisk)],FailRate=failrate[-length(failrate)],
SurvRate=survrate[-length(survrate)],SurvEst=survest[-length(survest)]))
if (finite==1) return(list(Failed=failed[-1],Censored=censored[-1],AliveStart=alivestart,
AtRisk=atrisk,FailRate=failrate,SurvRate=survrate,SurvEst=survest)) }
l=LifeTable(c(1:15), rep(1,16), c(1:15), 0)
SurvDrop = l$SurvEst[1:14] - l$SurvEst[2:15]
HazardEst = ( SurvDrop ) / l$SurvEst[2:15]
hazstep = stepfun( c(1:15) , c(0,HazardEst,0) )
plot(hazstep,do.points=F,ylab='h(k)',xlab='k',main="")
We cannot fit an Exponential model to this data because we do not have the raw data itself. In order to fit any parametric model, we need the exact data. Estimates will not suffice.
This risk in this plot is not stable, indicating that an Exponential model is not a good fit for this data.
We cannot fit a Kaplan-Meier curve to this data because we do now know when (in what time interval) subjects were lost to follow up.
| Age Interval | # of CHD Events [Failure] | # Lost to Follow-Up | Alive at Start [Total-(Failure+# Lost to Follow-Up)] | At Risk [Alive at Start - .5(# Lost to Followup)] | Interval Failure Rate [Failures/At Risk] | Interval Survival Rate [1-Int. Failure Rate] | Survival at left endpt. |
|---|---|---|---|---|---|---|---|
| 45-50 | 17 | 29 | 1571 | 1562.5 | .011 | .989 | 1 |
| 50-55 | 36 | 60 | 1525 | 1507 | (36/1507=).024 | .976 | (.989x1=).989 |
| 55-60 | 62 | 83 | 1429 | 1398 | (62/1398=).044 | .956 | (.976x.989=).965 |
| 60-65 | 76 | 441 | 1284 | 1246 | (76/1284=).061 | .939 | (.956x.965=).922 |
| 65-70 | 50 | 439 | 767 | 742 | (50/742=).067 | .933 | (.922x.939=).866 |
| 70-75 | 9 | 262 | 278 | 267.5 | (9/267.5=).034 | .966 | (.933x.866=).808 |
| 75-80 | 0 | 7 | 7 | – | – | – | (.966x.808=).781 |
To plot the actuarial estimate:
l=LifeTable(c(45:80), rep(1,8), c(0,45,50,55,60,65,70,75,80), 0)
step = stepfun( seq(50,80,5) , l$SurvEst )
plot(step , do.points=F , ylab="Survival" , xlab="Days" , main="" , ylim=c(0.4,1)) #step function method
lines( seq(45,80,5) , l$SurvEst , col='red' ) #straight-line method
\(h(t)\) measures the number of people who failed against the number of people who survived up to a certain point and \(f(t)\) measures the number of people who failed against the total number of people in the story. The only way these will be equal is if the total number of people in the study is equal to the number of people who survive up to any given point, i.e. there are no failures.
The difference between h(t) and f(t) will be negligible for small values of t, but increases as t increases.
For an Exponential function, \(h(t)=\lambda\). Previous events are not considered when calculating risk at a certain point for an Exponential model, so it makes sense that risk () would be static no matter where (in what interval) risk is being calculated.
a.
kmcell=survfit(Surv(cell$Time)~1, conf.type='plain', data=cell)
plot(kmcell, mark.time=F, conf.int=F)
summary(kmcell)
## Call: survfit(formula = Surv(cell$Time) ~ 1, data = cell, conf.type = "plain")
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 296 17 1 0.9412 0.0571 0.8293 1.000
## 660 16 1 0.8824 0.0781 0.7292 1.000
## 670 15 1 0.8235 0.0925 0.6423 1.000
## 728 14 1 0.7647 0.1029 0.5631 0.966
## 775 13 1 0.7059 0.1105 0.4893 0.922
## 797 12 1 0.6471 0.1159 0.4199 0.874
## 841 11 1 0.5882 0.1194 0.3543 0.822
## 869 10 1 0.5294 0.1211 0.2921 0.767
## 999 9 1 0.4706 0.1211 0.2333 0.708
## 1006 8 1 0.4118 0.1194 0.1778 0.646
## 1035 7 1 0.3529 0.1159 0.1258 0.580
## 1169 6 1 0.2941 0.1105 0.0775 0.511
## 1193 5 1 0.2353 0.1029 0.0337 0.437
## 1415 4 1 0.1765 0.0925 0.0000 0.358
## 1424 3 1 0.1176 0.0781 0.0000 0.271
## 1500 2 1 0.0588 0.0571 0.0000 0.171
## 1540 1 1 0.0000 NaN NaN NaN
Our interval estimate of the three-year survival probability is [.1258, .0775], and our estimate of the median failure time is [728, 1193].
The derivative of the numerator is \(1-2S(k)\). Setting this equal to 0 (to find critical points) reveals that this value is maximized at S(k)=.5. The denominator is in proportion to the variance, so it does not affect our calculations.
To fit and plot the Weibull model:
survreg( Surv( Time ) ~ 1 , dist='weibull' , data=cell)
## Call:
## survreg(formula = Surv(Time) ~ 1, data = cell, dist = "weibull")
##
## Coefficients:
## (Intercept)
## 7.012
##
## Scale= 0.3012
##
## Loglik(model)= -122.8 Loglik(intercept only)= -122.8
## n= 17
plot(hazstep,do.points=F,ylab='h(k)',xlab='k',main="")
curve(dweibull(x,shape=1/0.3012274,scale=exp(7.011883))/(1-pweibull(x,shape=1/0.3012274, scale=exp(7.011883))),add=T,col='red')
The Weibull model is flat, indicating no wear and tear over time.
We assume that half of the censored observations survive any particular stage.
We could make our approximation better if we knew exactly how many points were censored, and at what times.
We can estimate the hazard function from a life-table by either a) finding how far S drops in a particular interval, calculating the density in that interval and then dividing by interval length or b) dividing the interval failure rate by the interval length.
When two groups S1 and S2 are stochastically ordered, S1(k) >= S2(k), for all time points k, i.e. survival is more likely at all points.
False
False
True
False
True
True
True
True
False